Design of Storage Hierarchy in Multithreaded Architectures

Lucas Roh, Walid A. Najjar


The multithreaded execution model combines aspects of dataflow-like execution with von Neumann execution. Its main objective is to mask the latency of inter-processor communication and remote memory accesses in large-scale multiprocessors. The model has been proposed in a variety of forms: with large or small threads, with blocking or non-blocking threads, and with strict or non-strict execution.

An important issue in the analysis and evaluation of multithreaded execution is the design and performance of the storage hierarchy. Because of the sequential execution of threads, the locality of access *within* an executing thread can be exploited using registers and caches. At the *inter-thread* level, however, the locality of accesses to memory is not yet well understood and may depend on the execution model and the compilation strategy. This paper presents an analysis of inter-thread level locality from the memory access point of view in a non-blocking, strictly executing multithreaded model. The results show a high degree of locality that can be exploited efficiently by a relatively simple storage hierarchy design.


multithreaded architectures, storage hierarchy, synchronization, caching
