UPCUnit - A unit test extension for UPC

Development October 10th, 2006

UPCUnit is a CUnit test extension for the Unified Parallel C (UPC) programming language. It addresses UPC-specific issues that CUnit does not cover, such as collectively testing the values returned on all threads and reorganizing the threads' interleaved output. Currently, only the basic non-interactive console interface is released.

UPCUnit depends on CUnit and, like CUnit, is released under the LGPL.

It is nearly full-fledged for daily use. Since it targets test automation, do we really care about the interface and output when debugging? If there is feedback or feature requests from the community, I might consider adding other interfaces, such as ncurses or XML.

You can download the tarball from here.

2nd PGASCON Overview

HPC October 4th, 2006

NOTICE: PGASCON is not the official name of The Second Conference on Partitioned Global Address Space Programming Models, and this post states only my personal opinion, which does not represent GWU HPCL.

PGAS stands for Partitioned Global Address Space, also known as Distributed Shared Memory. This memory model has been adopted by DARPA's High Productivity Computing Systems (HPCS) program. The shared address space eliminates the tedious overhead of explicit message passing, while the partitioning improves performance by exploiting affinity, that is, by keeping data close to the thread that uses it.

There are a few challenges in parallel computing that PGAS aims to address:

How to express the parallelism more naturally?
People tend to think about a problem sequentially unless they are born hardware designers. PGAS languages add new keywords to declare shared variables or arrays and adopt the SPMD (Single Program, Multiple Data) execution model. Users still need to handle synchronization and atomicity in the parallel environment, with little help from the compiler.

How to map shared vectors?
In UPC, due to the limitations of old-style C arrays, the user can control the memory layout only through the block size, while in CAF (Co-Array Fortran) the user may specify the layout with a memory vector. One interesting approach is pMatlab: a mapper object is constructed at runtime and passed as the last argument to other MATLAB functions. pMapper goes even further, with automatic and semi-automatic memory mapping.

How to optimize shared variable access?
This is a really BIG challenge for compiler and runtime developers. On the software side, a carefully designed cache and TLB for shared variables may improve the hit rate and shorten address translation, while lazy evaluation and aggregated packet passing help reduce memory bandwidth contention. On the architecture side, new technology, for example Sun's optically linked chips, may improve overall performance as well.