Project description for ECS201A

Proposal due: Friday, November 2nd
Status Report due: Tuesday, November 21st
Final Report due: Midnight Wednesday, December 13th

Intro/Overviews

The purpose of this project is to help you develop your research, independent thinking, and presentation skills. Your assignment is to pick some topic (ideally one that you find interesting) and study it in more detail. For example, you might choose to evaluate some proposal of your own, or examine an extension to a paper studied in class, or re-validate the data in some paper by writing your own simulator. Keep in mind that there is an inverse relationship between creativity and detail - if you choose a project with very little creative contribution, such as re-validating an existing work, then a more detailed evaluation will be expected. If the project has a high creativity factor, then correspondingly less rigor will be acceptable. You can work in groups of 2 or 3, but I expect approximately equal (and substantial) contributions from all members - therefore, if you are doing a re-validation, for example, your group size will have to be small (2) since there is not that much work to do.

These projects will be graded on roughly 4 different things:

How well the problem is defined and motivated
How extensive the survey of previous work is
The experimental technique used
The quality of the presentation of the results

The paper should be similar in style to the conference papers that we will read in class or that are referenced in the back of each chapter of the text. Your goal should be to produce a publishable-quality paper. However, since many conference papers represent a significant part of a Ph.D. student's graduate work, conference-quality originality and results are not expected. They are desired, but not required.

For those of you who are not familiar with what a conference paper looks like, here are 5 examples - reading these, you can get the feel for how a paper should be put together.

M. Farrens and A. Park, "Dynamic Base Register Caching: A Technique for Reducing Address Bus Width", Proceedings of the 18th Annual International Symposium on Computer Architecture, Toronto, Canada (May 27-30, 1991).

M. Farrens and A. Pleszkun, "Strategies for Achieving Improved Processor Throughput", Proceedings of the 18th Annual International Symposium on Computer Architecture, Toronto, Canada (May 27-30, 1991).

P. Sallee, M. Haungs and M. Farrens, "Branch Transition Rate: A New Metric for Improved Branch Classification Analysis", Proceedings of the 6th International IEEE Symposium on High Performance Computer Architecture, Toulouse, France (January 10-12, 2000), pp. 241-250.

H. Lee, G. Tyson and M. Farrens, "Eager Writeback - a Technique for Improving Bandwidth Utilization", Proceedings of the 33rd Annual International Symposium on Microarchitecture, Monterey, CA (December 10-13, 2000), pp. 11-21.

A. N. Eden and T. Mudge, "The YAGS Branch Prediction Scheme", Proceedings of the 31st Annual International Symposium on Microarchitecture, Dallas, Texas (November 30-December 2, 1998), pp. 69-77.

In addition, there are many more examples at this web site.

There are three milestones associated with this task: The Proposal, the Status Report, and the Final Report.

Milestone 1 - The Proposal

Proposals should be 1 to 2 pages long and should include:

A description of the topic
A statement of why the topic is interesting or important
A description of the methods to be used for evaluating the proposed idea (for projects with original research)
References to at least 3 relevant papers you have obtained and read. The course text and readings cite many papers. Some other important venues for publishing relevant work on Architecture:

Proceedings of the International Symposium on Computer Architecture (ISCA)

Proceedings of the Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

Proceedings of the International Symposium on Microarchitecture (MICRO)

Proceedings of the High Performance Computer Architecture Symposium (HPCA)

International Journal of Parallel Processing

ACM Transactions on Computer Systems

IEEE Transactions on Computers

IEEE Computer Magazine

IEEE Micro

Microprocessor Report

I will read these proposals and give you feedback regarding the acceptability of the proposal. For example, your proposal may be too ambitious to get done in the given time frame, or it may be too easy to be a 3-person project but acceptable as a 2-person project, or it may be already done and a different "spin" will be required ... The proposal deadline is given above. However, proposals turned in earlier than the deadline will get feedback sooner. (Remember - up to means less than! :-)

Milestone 2 - The Status Report

In order to help ensure work on the projects is moving forward in a timely fashion, a 1 to 2 page status report is due midway between the proposal submission and Final Report due dates. This report should clearly describe the progress you are making, so that I can provide some feedback on how you are doing and suggest any mid-course corrections that might be advisable. The status report will not be graded, but should be viewed as an important part of the project.

Milestone 3 - The Final Report

As stated above, your Final Report should be similar in style to a conference paper - an abstract, body, and optional appendices. The abstract should summarize the contributions of the report in one or two paragraphs, while the length of the body should be limited to approximately 5000 words (15-20 pages of double-spaced 10-point text). If you need more space, you can put additional supporting material in appendices.

Project Talks

15-25 minute presentations of your results will be scheduled during finals week, with the in-class finals time being the latest possible available time (2006 might be different, since I have a conference M-W of finals week ...). This should be viewed as an opportunity to practice your presentation skills - the ability to convey your ideas and results to your peers is critically important in our communication age, and a central part of the research process that should be of interest to those pursuing an advanced degree.

Possible Research Topics

Ideally, you should come up with your own topic, one that you find particularly interesting and related to your own interests. For example, if you have an interest in compilers, then code scheduling for instruction level parallelism might be a good topic. If you are more interested in Operating Systems, then the design of a processor to support the OS might be more to your liking. However, I realize that often at this point you do not yet know what you find most interesting, so to help you along a list of example projects follows. This is by no means an exhaustive list, nor is it a particularly good one. Examples, mainly.

Currently, almost all machines use 32-bit instructions. What if you had a 64-bit instruction? What could you do with that? (Add hints to the instruction set to support potential underlying hardware, for example.)
There is a big problem with fetching sufficient instructions to feed machines with high ILP - propose a new way (or evaluate existing ways) to deal with this problem.
Write a cycle-level simulator for an existing architecture, and then evaluate various performance enhancements. For example, what happens to the performance if Out Of Order (OOO) issue is added (or removed)?
You can now fabricate 500 million transistors on a chip. What is the best use of these transistors?
Compare and contrast different approaches to exploiting instruction level parallelism - for example, decoupled vs. VLIW, vectors vs. superscalar, VLIW vs. superscalar, decoupled vs. superscalar, etc.
Suggest modifications to the decoupled architecture approach that might help provide prefetch capabilities.
Evaluate the maximum amount of parallelism available in a representative set of benchmark programs.
Look at ways to increase the effective bandwidth between processor and external memory.
Study the "bursty" nature of pipelines; are averages really useful? Is there a way to more accurately model bursty behaviour? (Interesting note - 99% of the human population has more than the average number of legs ...)
Analyze program basic block size, and look at the branch problem. Evaluate the technique of predicated execution, and give some examples of how it can be used to increase basic block size.
Architectures/implementations for non-load/store architectures. For example, how might a stack or accumulator architecture be implemented to go fast? Can performance advantages be identified?
Look at instruction set enhancements and their effect on performance (e.g., update-mode addressing, conditional register-to-register moves, and multiply-add instructions).
Analyze the static and dynamic instruction frequencies for 3-4 different architectures. Also look at instruction couples and triples. Based on this information, can you propose any new instructions?
Architectural support of operating systems (e.g., user-level traps for lightweight threads).
Revisit the concept of an OS co-processor. What should such a co-processor look like? (What OS tasks could use specific hardware support, how often would it have to be used to be effective, etc.) "Design" the processor (define the instruction set, word size, datapath, number of ALUs, registers, etc.). What does this specially designed OS co-processor give you that a 68000 used in a similar manner wouldn't?
What would an OS for a decoupled machine like MISC look like?
Programs exhibit a lot of predictability and redundancy. Often entire blocks of code have the same inputs each time, meaning they do not have to be reexecuted. How might you identify these blocks and exploit this information?
Is it really necessary to use all the existing transistors just to improve performance? Currently it takes months/years to develop software packages/systems, which are full of bugs and potential security holes. What kind of hardware support might you add to a processor in order to help facilitate the job of writing correct programs?
What does the distribution of data values look like?
What is the average lifetime of a cache location?
What is the distribution of hard to predict branches? Do they cluster or are they evenly distributed? Can you use this information?
Study cache implementations, especially non-blocking caches - design methods and performance, for example.
Various memory system enhancements, including victim caches, stream buffers, address hashing, etc.
Extend the current research that has been done on new ways to manage a cache (evaluate and improve the effectiveness of C/NA, for example).
Look at what are called spatial/temporal caches - does it make sense to treat data differently based on the type of locality it exhibits?
How about a "compressed" cache? We would expect there to be lots of redundancy in the cache itself - could you maybe have 1 cache that is really small that holds compressed data, and another cache that holds uncompressed data?
Some load instructions are more "important" than others - in other words, some load instructions need to find their data in the first level cache, while others can afford to have the data be in the second level cache with no impact on performance. How might you identify these different types of loads, and do something useful with that information?
Methods and performance of various predictors, both value and branch, including ones you propose yourself.
A study of confidence predictors, why they are important and how they might be improved.
The importance of and techniques for predicting multiple branches in a single cycle.
Is there really a "memory wall", and if so, what do we do about it?
What's all this noise about Processors In Memory (PIM), anyway?
What is speculative execution, how important is it, how is it implemented, what kind of performance can it provide, etc.
Compiler transformations to improve pipeline/superscalar performance.
Compiler transformations to improve memory behavior.
The effect of changing technology on architecture (e.g. flash memories, fiber optics), and the most likely technology changes in the near future.
High-performance I/O (e.g. RAID and ATM networks).
Prefetching, both data and instruction.
Value Speculation, what it is and how it works.
Power-aware processing, what challenges designers are facing and how these problems might be overcome.
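
Several of the topics above (writing your own simulator, evaluating branch predictors) start from a small trace-driven model. As a rough illustration of the scale of such a starting point, here is a minimal sketch of a table of 2-bit saturating counters evaluated against a synthetic trace. Everything in it is an assumption made for illustration and not part of the assignment: the function names, the (pc, taken) trace format, the 4-byte instruction alignment, and the table size. A real project would read traces from an actual benchmark run and compare several schemes.

```python
from typing import Iterable, Iterator, Tuple


def simulate_two_bit_predictor(trace: Iterable[Tuple[int, bool]],
                               table_bits: int = 10) -> float:
    """Return the misprediction rate of a table of 2-bit saturating counters.

    trace yields (pc, taken) pairs; this format is hypothetical.
    """
    size = 1 << table_bits
    # Counter values run 0..3; values >= 2 predict "taken".
    # Every entry starts at 1 (weakly not-taken).
    counters = [1] * size
    total = 0
    misses = 0
    for pc, taken in trace:
        # Index by low-order PC bits, assuming 4-byte aligned instructions.
        index = (pc >> 2) & (size - 1)
        predicted_taken = counters[index] >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counters[index] = min(3, counters[index] + 1)
        else:
            counters[index] = max(0, counters[index] - 1)
        total += 1
    return misses / total if total else 0.0


def synthetic_loop_trace(iterations: int = 10,
                         repeats: int = 100) -> Iterator[Tuple[int, bool]]:
    """Yield a made-up trace: one loop branch, taken on all but the last pass."""
    for _ in range(repeats):
        for i in range(iterations):
            yield (0x400, i < iterations - 1)


if __name__ == "__main__":
    rate = simulate_two_bit_predictor(synthetic_loop_trace())
    print(f"misprediction rate: {rate:.3f}")
```

A synthetic loop like this is only a sanity check; the interesting questions (which branches are hard to predict, whether they cluster) only show up on real program traces.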

OR:

Take any paper in any of the major conferences (ISCA, MICRO, ASPLOS, HPCA, etc.) and extend, expand, rebut, or verify it. You will need to tell me which paper you are working on, so that we don't wind up with multiple groups working on the same paper. The list of papers referenced above (this one) is a good place to start. There is also a link on the 201A main web page to the home pages of several of these conferences. The papers are almost always available online, and if not I have both hard and electronic copies of ISCA and MICRO papers (and hard copies of most ASPLOS and several HPCA).

OR:

You may write a survey paper of an area within computer architecture. These papers should contain:

A summary of previous work in an area, including extensive references
A presentation of opinions of other authors both for and against various options (again, with references)
A conclusion containing your opinion of the strengths and weaknesses of the arguments presented above

Since a survey paper has no creative content and therefore is less risky than a research project, the survey papers will be expected to meet a much higher standard (both of completeness and of analysis of the literature). A survey paper is also an individual project - no team survey papers will be accepted. You need to read at least 10 papers on the subject you are surveying. Here is an example of a survey paper, written by a Masters Degree student a few years back.