ACM Transactions on Computer Systems (TOCS), Volume 22, Issue 3, August 2004

Cluster communication protocols for parallel-programming systems
Kees Verstoep, Raoul A. F. Bhoedjang, Tim Rühl, Henri E. Bal, Rutger F. H. Hofman
Pages: 281-325
DOI: 10.1145/1012268.1012269
Clusters of workstations are a popular platform for high-performance computing. For many parallel applications, efficient use of a fast interconnection network is essential for good performance. Several modern System Area Networks include...

A study of source-level compiler algorithms for automatic construction of pre-execution code
Dongkeun Kim, Donald Yeung
Pages: 326-379
DOI: 10.1145/1012268.1012270
Pre-execution is a promising latency tolerance technique that uses one or more helper threads running in spare hardware contexts ahead of the main computation to trigger long-latency memory operations early, hence absorbing their latency on behalf of...