Architecture Reading List

Classic Machines and Papers

[Thor64]
J. E. Thornton, "Parallel Operation in the Control Data 6600",
AFIPS Proceedings of the Spring Joint Computer Conference,
part II, vol. 26 (1964), pp. 33-40.

[Moor65]
G. E. Moore, "Cramming more components onto integrated circuits",
Electronics, pp. 114-117, April 1965.

[Toma67]
R. M. Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units",
IBM Journal of Research and Development, Vol. 11 Issue 1 (January 1967), pp. 25-33.

[AnST67]
D. W. Anderson, F. J. Sparacio, and R. M. Tomasulo,
"IBM System/360 Model 91: Machine Philosophy and Instruction-handling",
IBM Journal of Research and Development, Vol. 11 Issue 1 (January 1967), pp. 8-24.

[Amda67]
G. M. Amdahl,
"Validity of the single-processor approach to achieving large scale computing capabilities",
AFIPS Conference Proceedings, April 1967, pp. 483-485.

[Thor70]
J. E. Thornton, "Design of a Computer: The Control Data 6600",
Glenview, IL: Scott Foresman, 1970.

[CRAY77]
"CRAY-1 Computer System Hardware Reference Manual",
CRAY Research Incorporated,
Publication No. 224004, Rev. C, November 1977.

[Russ78]
R. M. Russell, "The CRAY-1 Computer System",
Communications of the ACM, vol. 21, no. 1 (January 1978), pp. 63-72.

[Smit81]
J. E. Smith, "A Study of Branch Prediction Strategies",
Proceedings of the 8th Annual International Symposium on Computer Architecture,
May 1981, pp. 135-148.

[Kolo81]
J. S. Kolodzey, "The CRAY-1 Computer Technology",
IEEE Transactions on Components, Hybrids, and Manufacturing Technology,
vol. CHMT-4, no. 2 (June 1981), pp. 181-186.

[EmCl84]
J. S. Emer and D. W. Clark,
"A Characterization of Processor Performance in the VAX-11/780",
Proceedings of the 11th Annual International Symposium on Computer Architecture,
June 1984, pp. 301-330.

[SmPl85]
J. E. Smith and A. R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors",
Proceedings of the 12th Annual International Symposium on Computer Architecture,
Boston, MA (June 1985), pp. 36-44.

[PaHS85]
Y. N. Patt, W. M. Hwu, and M. Shebanow,
"HPS, a new microarchitecture: rationale and introduction",
Proceedings of the 18th Annual Workshop on Microprogramming,
Pacific Grove, CA (December 1985), pp. 103-108.

[SoVa87]
G. S. Sohi and S. Vajapeyam,
"Instruction Issue Logic for High-Performance, Interruptable Pipelined Processors",
Proceedings of the 14th Annual International Symposium on Computer Architecture,
Pittsburgh, PA (June 1987), pp. 27-34.

[RaFi92]
B. R. Rau and J. A. Fisher,
"Instruction-Level Parallel Processing: History, Overview, and Perspective",
Hewlett-Packard Laboratories Tech Report HPL-92-132,
October 1992.

Instruction Sets

[PaDi80]
D. A. Patterson and D. R. Ditzel, "The Case for the Reduced Instruction Set Computer",
ACM SIGARCH Computer Architecture News,
Vol. 8 (October 1980), pp. 25-33.

[Wulf81]
W. A. Wulf, "Compilers and Computer Architecture",
IEEE Computer, Vol. 14, issue 7 (July 1981), pp. 41-47.

[Radi82]
G. Radin, "The 801 Minicomputer",
Proceedings of the 1st International Conference on Architectural Support for Programming Languages and Operating Systems,
Palo Alto, CA (March 1982), pp. 39-47.

[CHJS86]
R. P. Colwell, C. Y. Hitchcock III, E. D. Jensen, H. M. B. Sprunt, and C. P. Kollar,
"Instruction Sets and Beyond: Computers, Complexity, and Controversy",
IEEE Computer, Vol. 18, issue 9 (September 1985), pp. 8-19.

Decoupled Processing

[Rich99]
Kevin Rich, "Compiler Techniques for Evaluating and Extending Decoupled Architectures",
(Ph.D. Dissertation) University of California at Davis, Davis, California (December 1999).

[Tyso97]
Gary Tyson, "Evaluation of a Scalable Decoupled Microprocessor Design",
(Ph.D. Dissertation) University of California at Davis, Davis, California (August 1997).

[TyFa94]
G. Tyson and M. Farrens, "Code Scheduling for Multiple Instruction Stream Architectures",
International Journal of Parallel Programming, vol. 22, no. 3 (1994), pp. 243-272.

[TyFa93]
G. Tyson and M. Farrens, "Techniques for Extracting Instruction Level Parallelism on MIMD Architectures",
Proceedings of the 26th Annual International Symposium on Microarchitecture, Austin, Texas
(December 1-3, 1993), pp. 128-137.

[SmWP86]
J. E. Smith, S. Weiss and N. Y. Pang, "A Simulation Study of Decoupled Architecture Computers",
IEEE Transactions on Computers, vol. C-35, no. 8 (August 1986), pp. 692-702.

Methods

[DeBK01]
Rajagopalan Desikan, Doug Burger, and Stephen Keckler,
"Measuring Experimental Error in Microprocessor Simulation",
Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA01),
Goteborg, Sweden (July 1-4th, 2001), pp. 266-277.

[OsCF00]
Mark Oskin, Frederic T. Chong, Matthew Farrens,
"HLS: Combining Statistical and Symbolic Simulation to Guide Microprocessor Designs",
Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA00),
Vancouver, Canada (June 10-14th, 2000), pp. 71-82.

Branch Prediction

[YeP91]
T. Yeh and Y. Patt, "Two-level adaptive training branch prediction",
Proceedings of the 24th Annual International Symposium on Microarchitecture,
Albuquerque, New Mexico (November 18-20, 1991), pp. 51-61.

[YeP92]
T. Yeh and Y. Patt, "Alternative Implementations of Two-Level Adaptive Training Branch Prediction",
Proceedings of the Nineteenth Annual International Symposium on Computer Architecture,
Queensland, Australia (May 19-21, 1992), pp. 124-134.

[McFa93]
Scott McFarling, "Combining branch predictors",
Digital Equipment Corporation WRL Technical Note TN-36, June 1993.

[CHYP94]
P. Chang, E. Hao, T. Yeh and Y. Patt,
"Branch Classification: A New Mechanism for Improving Branch Predictor Performance",
Proceedings of the 27th Annual International Symposium on Microarchitecture,
San Jose, CA (November 30-December 2, 1994), pp. 22-31.

[YoGS95]
C. Young, N. Gloy and M. D. Smith,
"A Comparative Analysis of Schemes for Correlated Branch Prediction",
Proceedings of the 22nd Annual International Symposium on Computer Architecture,
Santa Margherita Ligure, Italy (June 22-24, 1995), pp. 276-286.

[ChCM96]
I. K. Chen, J. T. Coffey and T. N. Mudge,
"Analysis of Branch Prediction via Data Compression",
Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems,
Cambridge, MA (October 1996), pp. 128-137.

[EmGl97]
Joel Emer and Nikolas Gloy,
"A Language for Describing Predictors and its Application to Automatic Synthesis",
Proceedings of the 24th Annual International Symposium on Computer Architecture,
Denver, Colorado (June 2-4, 1997), pp. 304-314.

[EPCP98]
M. Evers, S. J. Patel, R. S. Chappell and Y. N. Patt,
"An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work",
Proceedings of the 25th Annual International Symposium on Computer Architecture,
Barcelona, Spain (June 29-July 1, 1998), pp. 52-61.

[JuSN98]
T. Juan, S. Sanjeevan and J. J. Navarro,
"Dynamic History-Length Fitting: A third level of adaptivity for branch prediction",
Proceedings of the 25th Annual International Symposium on Computer Architecture,
Barcelona, Spain (June 29-July 1, 1998), pp. 155-166.

[StEP98]
J. Stark, M. Evers and Y. N. Patt,
"Variable Length Path Branch Prediction",
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems,
San Jose, CA (October 3-7, 1998), pp. 170-179.

[EdMu98]
A. N. Eden and T. Mudge, "The YAGS Branch Prediction Scheme",
Proceedings of the 31st Annual International Symposium on Microarchitecture,
Dallas, Texas (November 30-December 2, 1998), pp. 69-77.

[KiT98]
S. P. Kim and G. S. Tyson, "Analyzing the Working Set Characteristics of Branch Execution",
Proceedings of the 31st Annual International Symposium on Microarchitecture,
Dallas, Texas (November 30-December 2, 1998), pp. 49-58.

[HeSS99]
T. Heil, Z. Smith and J. E. Smith,
"Improving Branch Predictors by Correlating on Data Values",
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 32),
Haifa, Israel (November 16-18, 1999), pp. 28-37.

[HaSF00]
Michael Haungs, Phil Sallee and Matthew Farrens,
"Branch Transition Rate: A New Metric for Improved Branch Classification Analysis",
Proceedings of the 6th International Symposium on High-Performance Computer Architecture,
Toulouse, France (January 8-12, 2000), pp. 241-250.

[SkMC00]
Kevin Skadron, Margaret Martonosi, and Douglas Clark,
"A Taxonomy of Branch Mispredictions, and Alloyed Prediction as a Robust Solution to Wrong-History Mispredictions",
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques,
Philadelphia, PA (October 15-19, 2000), pp. 199-206.

[ERSM01]
A. Eden, J. Ringenberg, S. Sparrow, and T. Mudge, "Hybrid myths in branch prediction",
Proceedings of the 5th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2001)
and the 7th International Conference on Information Systems Analysis and Synthesis (ISAS 2001),
Orlando, FL, July 2001.

Confidence Predictors

[JaRS96]
Erik Jacobsen, Eric Rotenberg, and James E. Smith,
"Assigning Confidence to Conditional Branch Predictions",
Proceedings of the 29th Annual International Symposium on Microarchitecture,
Paris, France (December 2-4, 1996), pp. 142-152.

[GKMP98]
D. Grunwald, A. Klauser, S. Manne and A. Pleszkun,
"Confidence Estimation for Speculation Control",
Proceedings of the 25th Annual International Symposium on Computer Architecture,
Barcelona, Spain (June 29-July 1, 1998), pp. 122-131.

[AGGG01]
J.L. Aragon, J. Gonzalez, J.M. Garcia and A. Gonzalez,
"Selective Branch Prediction Reversal by Correlating with Data Values and Control Flow",
Proceedings of the 19th IEEE International Conference on Computer Design,
Austin, Texas (September 24-26, 2001), pp. 228-233.

Advanced Caching Techniques

[BuGK96]
D. Burger, J. R. Goodman and A. Kagi,
"Memory Bandwidth Limitations of Future Microprocessors",
Proceedings of the 23rd Annual International Symposium on Computer Architecture,
Philadelphia, PA (May 22-24, 1996), pp. 78-89.

[TFMP97]
G. Tyson, M. Farrens, J. Matthews and A. Pleszkun,
"Managing Data Caches using Selective Cache Line Replacement",
International Journal of Parallel Programming,
vol. 25, no. 3 (June 1997), pp. 213-242.

[KuWi98]
S. Kumar and C. Wilkerson,
"Exploiting Spatial Locality in Data Caches using Spatial Footprints",
Proceedings of the 25th Annual International Symposium on Computer Architecture,
Barcelona, Spain (June 29-July 1, 1998), pp. 357-368.

[VTGN99]
A. Veidenbaum, W. Tang, R. Gupta, A. Nicolau, and X. Ji,
"Adapting Cache Line Size to Application Behavior",
Proceedings of the 13th ACM International Conference on Supercomputing,
Rhodes, Greece (June 20-25, 1999), pp. 145-154.

[TRST99]
Edward S. Tam, Jude A. Rivers, Vijayalakshmi Srinivasan, Gary S. Tyson and Edward S. Davidson,
"Active Management of Data Caches by Exploiting Reuse Information",
IEEE Transactions on Computers, Vol 48, No 11, pp. 1244-1259, Nov 1999.

[HaRe00]
Erik G. Hallnor and Steven K. Reinhardt,
"A Fully Associative Software-Managed Cache Design",
Proceedings of the 27th Annual International Symposium on Computer Architecture,
Vancouver, British Columbia (June 10-14, 2000), pp. 107-116.

[JaMu01]
B. Jacob and T. Mudge, "Uniprocessor virtual memory without TLBs",
IEEE Transactions on Computers, vol. 50, no. 5, May 2001, pp. 482-499.

Instruction Fetch Issues

[CMMP95]
T. M. Conte, K. N. Menezes, P. M. Mills and B. A. Patel,
"Optimization of Instruction Fetch Mechanisms for High Issue Rates",
Proceedings of the 22nd Annual International Symposium on Computer Architecture,
Santa Margherita Ligure, Italy (June 22-24, 1995), pp. 333-344.

[RoBS96]
E. Rotenberg, S. Bennett and J. E. Smith,
"Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching",
Proceedings of the 29th Annual International Symposium on Microarchitecture,
Paris, France (December 2-4, 1996), pp. 24-35.

[PoTM99]
M. Postiff, G. Tyson and T. Mudge, "Performance Limits of Trace Caches",
Journal of Instruction Level Parallelism, vol. 1, no. 5 (October 1999).

[BlRS99]
B. Black, B. Rychlik and J. P. Shen, "The Block-based Trace Cache",
Proceedings of the 26th Annual International Symposium on Computer Architecture,
Atlanta, GA (May 2-4, 1999), pp. 196-207.

[Rein01]
G. Reinman, "Hardware Optimizations Enabled by a Decoupled Fetch Architecture",
(Ph.D. Dissertation) University of California at San Diego, San Diego, California (August 2001).

Interesting Ideas

[Aust99]
Todd Austin, "DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design",
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 32),
Haifa, Israel (November 16-18, 1999), pages 28-37.

[Aust00]
Todd Austin, "DIVA: A Dynamic Approach to Microprocessor Verification",
Journal of Instruction Level Parallelism, Vol. 2, no. 11 (May 2000).

[EKDP03]
Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham,
Conrad Ziesler, David Blaauw, Todd Austin, Krisztian Flautner, Trevor Mudge,
"Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation",
Proceedings of the 36th Annual International Symposium on Microarchitecture,
San Diego, CA (Dec. 3-5, 2003), pp. 7-18.

Branch Elimination

[MLCH92]
S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank and R. A. Bringmann,
"Effective Compiler Support for Predicated Execution Using the
Hyperblock", Proceedings of the 25th Annual International Symposium on
Microarchitecture, Portland, Oregon (December 1-4, 1992), pp. 45-54.

[HMC93]
W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A.
Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G.
Holm and D. M. Lavery, "The Superblock: An Effective Technique for
VLIW and Superscalar Compilation", Journal of Supercomputing, vol.
7, no. 1/2 (1993), pp. 229-248.

[MuWh95]
F. Mueller and D. B. Whalley, "Avoiding Conditional Branches by Code
Replication", Proceedings of the ACM SIGPLAN Conference on
Programming Language Design and Implementation, La Jolla, CA (June
18-21, 1995), pp. 56-66.

[BoGS97]
R. Bodik, R. Gupta and M. L. Soffa, "Interprocedural Conditional
Branch Elimination", Proceedings of the ACM SIGPLAN Conference
on Programming Language Design and Implementation, Las Vegas, Nevada
(June 15-18, 1997), pp. 146-158.

[YaUW98]
M. Yang, G. Uh and D. B. Whalley, "Improving Performance by Branch
Reordering", Proceedings of the ACM SIGPLAN Conference on
Programming Language Design and Implementation, Montreal, Canada (June
17-19, 1998), pp. 130-141.

[ASPM99]
D. I. August, J. W. Sias, J. Puiatti, S. A. Mahlke, D. A. Conners, K.
M. Crozier and W. W. Hwu, "The Program Decision Logic Approach to
Predicated Execution", Proceedings of the 26th Annual International
Symposium on Computer Architecture, Atlanta, GA (May 2-4, 1999), pp.
208-219.

Prefetching

[Joup90b]
N. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition
of a Small Fully-Associative Cache and Prefetch Buffers", Proceedings
of the Seventeenth Annual International Symposium on Computer
Architecture, vol. 18, no. 2 (May 1990), pp. 364-373.

[Joup90a]
N. Jouppi, "Reducing Compulsory and Capacity Misses", Digital Western
Research Laboratory Technical Note TN-53 (August 1990).

[ChBa95]
T. Chen and J. Baer, "Effective Hardware Based Data Prefetching for
High-Performance Processors", IEEE Transactions on Computers, vol.
44, no. 5 (May 1995), pp. 609-623.

[Eben98]
A. Ebenezer, "Hardware Based Prefetching Methods", Master's Thesis,
Department of Electrical and Computer Engineering, University of
California-Davis, Davis, California (December 1998).

[Joup98]
N. Jouppi, "Retrospective: Improving Direct-Mapped Cache Performance
by the Addition of a Small Fully-Associative Cache and Prefetch
Buffers", 25 Years of the International Symposium on Computer
Architecture - Selected Papers (1998), pp. 71-73.

[JoGr99]
D. Joseph and D. Grunwald, "Prefetching Using Markov Predictors", IEEE
Transactions on Computers, vol. 48, no. 2 (February 1999), pp. 121-133.

Value Prediction