Interconnection Networks: Supercomputer networks are a key component for delivering system-level scalability. Current approaches to network interface design will not deliver latencies and message throughputs commensurate with the levels of bandwidth that can (and will) be built. Researchers are striving to close this gap by developing technologies that accelerate the processing of MPI messages. The initial IAA research project in this area will develop techniques to achieve 750 ns latency and 15 million messages per second to a single socket by the end of 2008. In addition, the project will explore network-interface-level techniques to enable alternative programming models (e.g., UPC and Co-Array Fortran) on parallel platforms based on commodity microprocessors. This work is based on validated simulation models of the Red Storm network interface, modified to incorporate novel MPI processing techniques.
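To give a feel for what the two targets imply when taken together, the following minimal sketch converts the message rate into a per-message time budget and applies Little's law (average concurrency = throughput x latency). It assumes, purely for illustration, that both targets hold simultaneously in steady state; the function names are hypothetical and not part of any IAA code.

```python
# Targets quoted in the text above.
LATENCY_S = 750e-9          # 750 ns latency
MESSAGE_RATE_PER_S = 15e6   # 15 million messages per second to one socket


def per_message_budget_ns(rate_per_s: float) -> float:
    """Average time available per message at the given rate, in nanoseconds."""
    return 1e9 / rate_per_s


def messages_in_flight(rate_per_s: float, latency_s: float) -> float:
    """Little's law: average number of messages in flight in steady state.

    Assumption (ours, for illustration): every message sees the same latency
    while the interface sustains the full message rate.
    """
    return rate_per_s * latency_s


budget_ns = per_message_budget_ns(MESSAGE_RATE_PER_S)
in_flight = messages_in_flight(MESSAGE_RATE_PER_S, LATENCY_S)
print(f"per-message budget: {budget_ns:.1f} ns")
print(f"messages in flight: {in_flight:.2f}")
```

Under these assumptions the interface has roughly 67 ns per message and must keep about 11 messages in flight at once, which suggests why pipelined or offloaded MPI processing in the network interface is of interest.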
Looking forward, it is clear that memory bandwidth is becoming the bottleneck for single-node performance. Unfortunately, modern network architectures drive application developers to copy noncontiguous data into contiguous buffers before sending it over the interconnect, wasting valuable memory bandwidth. The messaging-rate research helps in this regard, but it is vital that we find more efficient means of moving data through the system.
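The copy described above can be illustrated with a minimal sketch that stages one column of a row-major matrix into a contiguous send buffer. The helper names are hypothetical, and the traffic estimate is a deliberate simplification (one read plus one write per element, ignoring cache-line granularity); it is meant only to show that the staging copy consumes memory bandwidth before any data reaches the network.

```python
from array import array


def pack_column(matrix: array, cols: int, col: int) -> array:
    """Copy one column of a row-major matrix into a contiguous buffer.

    The extended slice walks memory with a stride of `cols` elements and
    returns a new, densely packed array (the staging copy).
    """
    return matrix[col::cols]


def staging_traffic_bytes(buf: array) -> int:
    """Simplified estimate: bytes read from the source plus bytes written
    to the staging buffer (assumption: no cache-line effects)."""
    return 2 * len(buf) * buf.itemsize


rows, cols = 4, 3
matrix = array("d", range(rows * cols))  # row-major 4x3 matrix of doubles
send_buf = pack_column(matrix, cols, col=1)

print(list(send_buf))
print(staging_traffic_bytes(send_buf), "bytes of extra memory traffic")
```

A network interface that could read the strided elements directly would avoid this staging traffic, which is the kind of more efficient data movement the paragraph above calls for.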
Memory Subsystems: The computer industry's shift to multi-core processors and accelerator-based designs is rapidly intensifying the imbalance between processor and memory system technologies. It has become clear that a change is needed in the architecture of processor memory subsystems. The initial IAA project in this area will investigate placing advanced data-movement functionality in the memory subsystem hardware to support advanced atomic memory operations and distributed memory operations such as gather/scatter. The project will also explore accelerating sparse memory accesses by exposing the underlying data structure to the CPU more intelligently. Longer-term efforts based on 3D stacking will focus on significantly improving bandwidth and lowering power.
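For readers unfamiliar with the terms, the semantics of gather and scatter can be sketched in a few lines. This is a software model only, with hypothetical function names; it says nothing about how the operations would be implemented in memory subsystem hardware.

```python
from typing import List


def gather(memory: List[float], indices: List[int]) -> List[float]:
    """Collect memory[i] for each index into a dense vector."""
    return [memory[i] for i in indices]


def scatter(memory: List[float], indices: List[int], values: List[float]) -> None:
    """Write each dense value back to its indexed location in memory."""
    for i, v in zip(indices, values):
        memory[i] = v


memory = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
indices = [5, 1, 3]                       # irregular, sparse access pattern

dense = gather(memory, indices)           # dense copy of the selected elements
updated = [x + 1.0 for x in dense]        # compute on the dense vector
scatter(memory, indices, updated)         # write results back in place

print(dense)
print(memory)
```

In this model the CPU issues one memory access per index. Moving the index-driven loop into the memory subsystem is one way the sparse accesses mentioned above might be accelerated.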
Microprocessor Architecture: Commodity microprocessors are constantly evolving based on ongoing research throughout industry and academia. Studies have indicated that the scientific and engineering applications in use at the DOE differ significantly from most of the benchmarks typically used in this research. Sandia and ORNL are actively engaged in an effort to consider microprocessor extensions that will enhance the performance of DOE applications. In future projects, the IAA may expand these efforts and draw in the broader research community to focus on how cutting-edge conventional architecture techniques (e.g., value prediction, clustered microarchitectures, advanced prefetching) can be adapted to better serve DOE applications.
RAS/Resilience: Reliability is a significant issue for leadership systems. The overall system has to be highly resilient to the failure of individual components. The number of parts is increasing dramatically, and the mean time between interrupts scales inversely with this number. Without breakthroughs in this area, a single application running on an entire Exascale supercomputer might execute for only minutes between interrupts. Reliability, availability, and serviceability (RAS) will need improvements in capability and functionality to support running a single large problem on millions of cores. Reliability for advanced architectures will require a holistic approach that facilitates communication and cooperation among the operating system, runtime system, application software, and parallel file system when failures occur.
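The inverse scaling noted above can be made concrete with a minimal sketch. It assumes independent components with identical, constant failure rates, so that the system mean time between interrupts is the component mean time between failures divided by the number of components. The 5-year component figure and the part counts are hypothetical values chosen for illustration, not measurements from any system.

```python
HOURS_PER_YEAR = 365 * 24


def system_mtbi_hours(component_mtbf_hours: float, num_components: int) -> float:
    """System mean time between interrupts, in hours.

    Assumption: components fail independently at the same constant rate,
    so any single failure interrupts the whole system.
    """
    return component_mtbf_hours / num_components


# Hypothetical component reliability: one failure every 5 years on average.
component_mtbf = 5 * HOURS_PER_YEAR

for n in (10_000, 100_000, 1_000_000):
    minutes = system_mtbi_hours(component_mtbf, n) * 60
    print(f"{n:>9,d} components -> {minutes:8.1f} minutes between interrupts")
```

With these illustrative numbers, a million-component system is interrupted every few minutes, consistent with the concern that an application spanning an entire Exascale machine might run for only minutes between interrupts.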