

## Lecture 2: Caches and Advanced Pipelining

Prof. Fred Chong  
ECS 250A Computer Architecture  
Winter 1999

(Adapted from Patterson CS252 Copyright 1998 UCB)

FTC.W99 1


## Review, #1

- Designing to Last through Trends

|           | Capacity        | Speed           |
|-----------|-----------------|-----------------|
| Logic     | 2x in 3 years   | 2x in 3 years   |
| DRAM      | 4x in 3 years   | 2x in 10 years  |
| Disk      | 4x in 3 years   | 2x in 10 years  |
| Processor | (n.a.)          | 2x in 1.5 years |

- **Time to run the task**
  - Execution time, response time, latency
- **Tasks per day, hour, week, sec, ns, ...**
  - Throughput, bandwidth
- “X is n times faster than Y” means

$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$


## Review, #2

- **Amdahl's Law:**

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

- **CPI Law:**

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycles}}$$

- Execution time is the **REAL** measure of computer performance!
- Good products are created when you have:
  - Good benchmarks
  - Good ways to summarize performance
- **Die cost goes roughly with (die area)<sup>4</sup>**
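
As a rough illustration of the two laws above, the sketch below evaluates them for made-up inputs. The function names and all numeric values are assumptions chosen for the example, not figures from the lecture.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instructions, cpi, cycle_time_s):
    """CPU time = instructions x cycles per instruction x seconds per cycle."""
    return instructions * cpi * cycle_time_s


# Hypothetical case: 40% of the run time is made 10x faster.
print(amdahl_speedup(0.4, 10.0))

# Hypothetical case: 1e9 instructions, CPI of 1.5, 5 ns clock cycle (200 MHz).
print(cpu_time(1e9, 1.5, 5e-9))
```

Even a 10x enhancement yields well under 2x overall here, because the untouched 60% of the run time dominates.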


## Recap: Who Cares About the Memory Hierarchy?

### Processor-DRAM Memory Gap (latency)




## Levels of the Memory Hierarchy




## The Principle of Locality

- **The Principle of Locality:**
  - Programs access a relatively small portion of the address space at any instant of time.
- **Two Different Types of Locality:**
  - **Temporal Locality** (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - **Spatial Locality** (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straightline code, array access)
- For the last 15 years, HW has relied on locality for speed




## 4 Questions for Memory Hierarchy

- Q1: Where can a block be placed in the upper level?  
*(Block placement)*
- Q2: How is a block found if it is in the upper level?  
*(Block identification)*
- Q3: Which block should be replaced on a miss?  
*(Block replacement)*
- Q4: What happens on a write?  
*(Write strategy)*


## Q1: Where can a block be placed in the upper level?

- direct mapped - 1 place
- n-way set associative - n places
- fully-associative - any place
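
The three placement policies can be sketched as one function that lists the cache frames a block may occupy. This is a minimal sketch: the function name, the frame numbering (sets laid out contiguously), and the 8-frame cache are assumptions for illustration.

```python
def candidate_frames(block_addr, num_frames, ways):
    """Return the cache frames where a memory block may be placed.

    ways == 1 is direct mapped; ways == num_frames is fully associative.
    Assumes num_frames is a multiple of ways.
    """
    num_sets = num_frames // ways
    set_index = block_addr % num_sets      # the set is chosen by block address
    first = set_index * ways               # frames of a set are contiguous here
    return list(range(first, first + ways))


# Memory block 12 in a hypothetical 8-frame cache.
print(candidate_frames(12, 8, 1))  # direct mapped: [4]
print(candidate_frames(12, 8, 2))  # 2-way set associative: [0, 1]
print(candidate_frames(12, 8, 8))  # fully associative: all 8 frames
```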




## Q2: How is a block found if it is in the upper level?

- Tag on each block
  - No need to check index or block offset
- Increasing associativity shrinks the index and expands the tag
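
To make the index/tag trade-off concrete, the following sketch computes the field widths for a hypothetical 16 KB cache with 32-byte blocks and 32-bit addresses. The sizes and function names are assumptions for the example.

```python
def field_widths(addr_bits, cache_bytes, block_bytes, ways):
    """Return (tag bits, index bits, block-offset bits); sizes must be powers of two."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    return addr_bits - index_bits - offset_bits, index_bits, offset_bits


def split_address(addr, cache_bytes, block_bytes, ways, addr_bits=32):
    """Split a byte address into its (tag, index, block offset) fields."""
    _, index_bits, offset_bits = field_widths(addr_bits, cache_bytes, block_bytes, ways)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Each doubling of associativity moves one bit from the index to the tag.
for ways in (1, 2, 4):
    print(ways, field_widths(32, 16 * 1024, 32, ways))
```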




## Q3: Which block should be replaced on a miss?

- Easy for Direct Mapped
- Set Associative or Fully Associative:
  - Random
  - LRU (Least Recently Used)

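| Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|--------|-----------|--------------|-----------|--------------|-----------|--------------|
| 16 KB  | 5.2%      | 5.7%         | 4.7%      | 5.3%         | 4.4%      | 5.0%         |
| 64 KB  | 1.9%      | 2.0%         | 1.5%      | 1.7%         | 1.4%      | 1.5%         |
| 256 KB | 1.15%     | 1.17%        | 1.13%     | 1.13%        | 1.12%     | 1.12%        |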
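
LRU replacement can be sketched with a small miss counter. This is an illustrative model only: the trace and cache geometry are made up, and it does not reproduce the measured figures above.

```python
from collections import OrderedDict


def lru_misses(trace, num_sets, ways):
    """Count misses for a sequence of block addresses under LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[block] = True
    return misses


# Hypothetical trace in which blocks 0, 4 and 8 all map to the same set.
trace = [0, 4, 0, 8, 4, 0]
print(lru_misses(trace, num_sets=4, ways=1))  # direct mapped
print(lru_misses(trace, num_sets=4, ways=2))  # 2-way set associative
print(lru_misses(trace, num_sets=1, ways=4))  # fully associative, 4 frames
```

A random policy would replace the `popitem(last=False)` call with the eviction of a randomly chosen resident block.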


## Q4: What happens on a write?

- **Write through**: The information is written to both the block in the cache and the block in the lower-level memory.
- **Write back**: The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  - Is the block clean or dirty?
- **Pros and cons of each?**
  - WT: read misses cannot result in writes
  - WB: no repeated writes to same location
- **WT is always combined with write buffers so that the processor doesn't wait for lower-level memory**
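
The "no repeated writes" advantage of write back can be seen in a toy model that counts writes reaching lower-level memory. The single-frame cache, write-allocate behaviour, final flush, and function name are all simplifying assumptions for this sketch.

```python
def memory_writes(stores, policy):
    """Count writes that reach lower-level memory for a one-frame cache.

    stores: sequence of block addresses written by the processor.
    policy: "write-through" or "write-back".
    """
    if policy == "write-through":
        return len(stores)               # every store also goes to memory
    if policy != "write-back":
        raise ValueError("unknown policy: " + policy)
    writes = 0
    resident, dirty = None, False
    for block in stores:
        if block != resident:
            if dirty:
                writes += 1              # dirty block written back on replacement
            resident, dirty = block, False
        dirty = True                     # the store updates only the cache
    return writes + (1 if dirty else 0)  # flush the last dirty block


stores = [7, 7, 7, 9]                    # three stores to block 7, then one to block 9
print(memory_writes(stores, "write-through"))
print(memory_writes(stores, "write-back"))
```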


## Write Buffer for Write Through



- **A Write Buffer is needed between the Cache and Memory**
  - Processor: writes data into the cache and the write buffer
  - Memory controller: writes contents of the buffer to memory
- **Write buffer is just a FIFO:**
  - Typical number of entries: 4
  - Works fine if: Store frequency (w.r.t. time) << 1 / DRAM write cycle
- **Memory system designer's nightmare:**
  - Store frequency (w.r.t. time) → 1 / DRAM write cycle
  - Write buffer saturation
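
Saturation can be illustrated with a small FIFO model. It is a sketch under strong assumptions: stores arrive at a fixed interval, DRAM retires one entry per write cycle, and a store that finds the buffer full is only counted (a real processor would stall instead). The function name and parameters are hypothetical.

```python
def count_full_buffer_stores(store_interval, dram_write_cycle, num_stores, entries=4):
    """Count stores that find the FIFO write buffer full."""
    occupancy = 0
    busy_until = 0                 # cycle at which the oldest buffered write finishes
    blocked = 0
    for i in range(num_stores):
        now = i * store_interval
        # Retire every buffered write that DRAM has finished by now.
        while occupancy > 0 and busy_until <= now:
            occupancy -= 1
            busy_until += dram_write_cycle
        if occupancy == entries:
            blocked += 1           # buffer saturated
            continue
        if occupancy == 0:
            busy_until = now + dram_write_cycle
        occupancy += 1
    return blocked


# Stores every 10 cycles, 5-cycle DRAM writes: the buffer keeps up.
print(count_full_buffer_stores(10, 5, 100))
# Stores every cycle, 10-cycle DRAM writes: the 4-entry buffer saturates.
print(count_full_buffer_stores(1, 10, 20))
```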


### Impact of Memory Hierarchy on Algorithms

- Today CPU time is a function of (ops, cache misses) vs. just f(ops): What does this mean to Compilers, Data structures, Algorithms?
- "The Influence of Caches on the Performance of Sorting" by A. LaMarca and R.E. Ladner. *Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms*, January, 1997, 370-379.
- Quicksort: fastest comparison based sorting algorithm when all keys fit in memory
- Radix sort: also called "linear time" sort because for keys of fixed length and fixed radix a constant number of passes over the data is sufficient independent of the number of keys
- For Alphastation 250, 32 byte blocks, direct mapped L2 2MB cache, 8 byte keys, from 4000 to 40000000







## Case Study: MIPS R4000 (200 MHz)

### 8 Stage Pipeline:

- IF—first half of fetching of instruction; PC selection happens here as well as initiation of instruction cache access.
- IS—second half of access to instruction cache.
- RF—instruction decode and register fetch, hazard checking and also instruction cache hit detection.
- EX—execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
- DF—data fetch, first half of access to data cache.
- DS—second half of access to data cache.
- TC—tag check, determine whether the data cache access hit.
- WB—write back for loads and register-register operations.

### 8 Stages: What is impact on Load delay? Branch delay? Why?


## Case Study: MIPS R4000

TWO Cycle Load Latency




THREE Cycle Branch Latency

(conditions evaluated during EX phase)  
Delay slot plus two stalls  
Branch likely cancels delay slot if not taken

## MIPS R4000 Floating Point

- FP Adder, FP Multiplier, FP Divider
- Last step of FP Multiplier/Divider uses FP Adder HW

### 8 kinds of stages in FP units:

| Stage | Functional unit | Description                |
|-------|-----------------|----------------------------|
| A     | FP adder        | Mantissa ADD stage         |
| D     | FP divider      | Divide pipeline stage      |
| E     | FP multiplier   | Exception test stage       |
| M     | FP multiplier   | First stage of multiplier  |
| N     | FP multiplier   | Second stage of multiplier |
| R     | FP adder        | Rounding stage             |
| S     | FP adder        | Operand shift stage        |
| U     |                 | Unpack FP numbers          |


## MIPS FP Pipe Stages

| FP Instr       | 1 | 2   | 3                    | 4               | 5   | 6   | 7   | 8   | ...  |
|----------------|---|-----|----------------------|-----------------|-----|-----|-----|-----|------|
| Add, Subtract  | U | S+A | A+R                  | R+S             |     |     |     |     |      |
| Multiply       | U | E+M | M                    | M               | M   | N   | N+A | R   |      |
| Divide         | U | A   | R                    | D<sup>28</sup>  | ... | D+A | D+R | D+R | A, R |
| Square root    | U | E   | (A+R)<sup>108</sup>  | ...             | A   | R   |     |     |      |
| Negate         | U | S   |                      |                 |     |     |     |     |      |
| Absolute value | U | S   |                      |                 |     |     |     |     |      |
| FP compare     | U | A   | R                    |                 |     |     |     |     |      |

Stages: A = mantissa ADD stage; D = divide pipeline stage; E = exception test stage; M = first stage of multiplier; N = second stage of multiplier; R = rounding stage; S = operand shift stage; U = unpack FP numbers


## R4000 Performance

- Not ideal CPI of 1:
  - Load stalls (1 or 2 clock cycles)
  - Branch stalls (2 cycles + unfilled slots)
  - FP result stalls: RAW data hazard (latency)
  - FP structural stalls: Not enough FP hardware (parallelism)
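
The way these stall sources add up can be sketched as an ideal CPI plus the average stall cycles per instruction from each source. The per-source values below are invented placeholders, not R4000 measurements.

```python
def effective_cpi(ideal_cpi, stall_cycles_per_instruction):
    """Effective CPI = ideal CPI + average stall cycles per instruction from each source."""
    return ideal_cpi + sum(stall_cycles_per_instruction.values())


# Hypothetical average stall cycles per instruction (illustrative numbers only).
stalls = {
    "load": 0.10,
    "branch": 0.36,
    "fp result": 0.20,
    "fp structural": 0.05,
}
print(effective_cpi(1.0, stalls))
```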




## Advanced Pipelining and Instruction Level Parallelism (ILP)

- ILP: overlap execution of unrelated instructions
- gcc: 17% of instructions are control transfers
  - On average 5 instructions + 1 branch per basic block
  - Must look beyond a single block to get more instruction level parallelism
- Loop level parallelism is one opportunity, in both SW and HW
- Do examples first, then explain nomenclature
- DLX floating point as example
  - Measurements suggest R4000 FP execution performance has room for improvement


### FP Loop: Where are the Hazards?

```
Loop: LD   F0,0(R1)   ;F0=vector element
      ADDD F4,F0,F2   ;add scalar from F2
      SD   0(R1),F4   ;store result
      SUBI R1,R1,8    ;decrement pointer 8B (DW)
      BNEZ R1,Loop    ;branch R1!=zero
      NOP             ;delayed branch slot
```

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |
| Load double                  | Store double             | 0                       |
| Integer op                   | Integer op               | 0                       |

- Where are the stalls?


### FP Loop Hazards

```

Loop: LD F0,0(R1) ;F0=vector element
      ADDD F4,F0,F2 ;add scalar in F2
      SD 0(R1),F4 ;store result
      SUBI R1,R1,8 ;decrement pointer 8B (DW)
      BNEZ R1,Loop ;branch R1!=zero
      NOP          ;delayed branch slot
  
```

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |
| Load double                  | Store double             | 0                       |
| Integer op                   | Integer op               | 0                       |


### FP Loop Showing Stalls

```

1 Loop: LD F0,0(R1) ;F0=vector element
2 stall
3 ADDD F4,F0,F2 ;add scalar in F2
4 stall
5 stall
6 SD 0(R1),F4 ;store result
7 SUBI R1,R1,8 ;decrement pointer 8B (DW)
8 BNEZ R1,Loop ;branch R1!=zero
9 stall          ;delayed branch slot
  
```

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |

- 9 clocks: rewrite the code to minimize stalls?


### Revised FP Loop Minimizing Stalls

```

1 Loop: LD F0,0(R1)
2 stall
3 ADDD F4,F0,F2
4 SUBI R1,R1,8
5 BNEZ R1,Loop ;delayed branch
6 SD 8(R1),F4 ;altered when move past SUBI
  
```

#### Swap BNEZ and SD by changing address of SD

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |

6 clocks: unroll the loop 4 times to make the code faster?
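
The 9-clock and 6-clock counts can be checked with a small in-order, single-issue model driven by the latency table. This is a sketch: the instruction encoding, class names, and `loop_clocks` helper are assumptions, and any producer/consumer pair not in the table is treated as latency 0.

```python
# Latency table from the slides: (producer class, consumer class) -> stall cycles.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
}


def loop_clocks(instrs):
    """Clock cycles for one pass over a loop body, issuing one instruction per cycle.

    instrs: list of (class, destination register or None, tuple of source registers).
    """
    produced = {}                  # register -> (producer class, issue cycle)
    cycle = 0
    for kind, dest, sources in instrs:
        issue = cycle + 1
        for reg in sources:
            if reg in produced:
                prod_kind, prod_cycle = produced[reg]
                issue = max(issue, prod_cycle + 1 + LATENCY.get((prod_kind, kind), 0))
        cycle = issue
        if dest is not None:
            produced[dest] = (kind, issue)
    return cycle


original = [
    ("load", "F0", ("R1",)),           # LD   F0,0(R1)
    ("fp_alu", "F4", ("F0", "F2")),    # ADDD F4,F0,F2
    ("store", None, ("R1", "F4")),     # SD   0(R1),F4
    ("int", "R1", ("R1",)),            # SUBI R1,R1,8
    ("branch", None, ("R1",)),         # BNEZ R1,Loop
    ("nop", None, ()),                 # unfilled delay slot
]
scheduled = [
    ("load", "F0", ("R1",)),           # LD   F0,0(R1)
    ("fp_alu", "F4", ("F0", "F2")),    # ADDD F4,F0,F2
    ("int", "R1", ("R1",)),            # SUBI R1,R1,8
    ("branch", None, ("R1",)),         # BNEZ R1,Loop
    ("store", None, ("R1", "F4")),     # SD   8(R1),F4 in the delay slot
]
print(loop_clocks(original))   # 9
print(loop_clocks(scheduled))  # 6
```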


### Unroll Loop Four Times (straightforward way)

```

1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SD 0(R1),F4 ;drop SUBI & BNEZ
4 LD F6,-8(R1)
5 ADDD F8,F6,F2
6 SD -8(R1),F8 ;drop SUBI & BNEZ
7 LD F10,-16(R1)
8 ADDD F12,F10,F2
9 SD -16(R1),F12 ;drop SUBI & BNEZ
10 LD F14,-24(R1)
11 ADDD F16,F14,F2
12 SD -24(R1),F16
13 SUBI R1,R1,#32 ;alter to 4*8
14 BNEZ R1,Loop
15 NOP
  
```

Rewrite loop to minimize stalls?

15 + 4 x (1+2) = 27 clock cycles, or 6.8 per iteration  
Assumes the number of loop iterations is a multiple of 4
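
The cycle count can be unpacked as a short worked calculation. The labels are an interpretation based on the latency table from the earlier slides: each of the four copies pays one stall after its load and two stalls before its store.

$$\underbrace{15}_{\text{instruction slots}} + 4 \times \big(\underbrace{1}_{\text{LD} \to \text{ADDD}} + \underbrace{2}_{\text{ADDD} \to \text{SD}}\big) = 27 \text{ clock cycles}, \qquad \frac{27}{4} = 6.75 \approx 6.8 \text{ per iteration}$$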


### Unrolled Loop That Minimizes Stalls

```

1 Loop: LD F0,0(R1)
2 LD F6,-8(R1)
3 LD F10,-16(R1)
4 LD F14,-24(R1)
5 ADDD F4,F0,F2
6 ADDD F8,F6,F2
7 ADDD F12,F10,F2
8 ADDD F16,F14,F2
9 SD 0(R1),F4
10 SD -8(R1),F8
11 SD -16(R1),F12
12 SUBI R1,R1,#32
13 BNEZ R1,Loop
14 SD 8(R1),F16 ; 8-32 = -24
  
```

- What assumptions were made when the code was moved?
  - OK to move the store past SUBI even though SUBI changes the register the store uses
  - OK to move loads before stores: do we still get the right data?
  - When is it safe for the compiler to make such changes?

14 clock cycles, or 3.5 per iteration  
When is it safe to move instructions?


## Compiler Perspectives on Code Movement

- Definitions: the compiler is concerned about dependencies in the **program**; whether a dependence causes a HW hazard depends on the given **pipeline**
- Try to schedule to avoid hazards
- (True) **Data dependencies** (RAW if a hazard for HW)
  - Instruction i produces a result used by instruction j, or
  - Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
- If dependent, can't execute in parallel
- Easy to determine for registers (fixed names)
- Hard for memory:
  - Does  $100(R4) = 20(R6)$ ?
  - From different loop iterations, does  $20(R6) = 20(R6)$ ?


## Where are the name dependencies?

```

1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SD 0(R1),F4 ;drop SUBI & BNEZ
4 LD F0,-8(R1)
5 ADDD F4,F0,F2
6 SD -8(R1),F4 ;drop SUBI & BNEZ
7 LD F0,-16(R1)
8 ADDD F4,F0,F2
9 SD -16(R1),F4 ;drop SUBI & BNEZ
10 LD F0,-24(R1)
11 ADDD F4,F0,F2
12 SD -24(R1),F4
13 SUBI R1,R1,#32 ;alter to 4*8
14 BNEZ R1,Loop
15 NOP

```

How can we remove them?


## Where are the data dependencies?

```

1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SUBI R1,R1,8
4 BNEZ R1,Loop ;delayed branch
5 SD 8(R1),F4 ;altered when move past SUBI

```


## Compiler Perspectives on Code Movement

- Another kind of dependence called **name dependence**: two instructions use same name (register or memory location) but don't exchange data
- **Antidependence** (WAR if a hazard for HW)
  - Instruction j writes a register or memory location that instruction i reads from and instruction i is executed first
- **Output dependence** (WAW if a hazard for HW)
  - Instruction i and instruction j write the same register or memory location; ordering between instructions must be preserved.
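
For register names, the three kinds of dependence can be detected by intersecting read and write sets. The sketch below assumes a hypothetical encoding in which each instruction is a pair (names written, names read), with the first instruction earlier in program order.

```python
def classify_dependences(first, second):
    """Classify dependences from an earlier instruction to a later one.

    Each instruction is (set of names written, set of names read).
    """
    writes_i, reads_i = first
    writes_j, reads_j = second
    found = []
    if writes_i & reads_j:
        found.append("true data dependence (RAW)")
    if reads_i & writes_j:
        found.append("antidependence (WAR)")
    if writes_i & writes_j:
        found.append("output dependence (WAW)")
    return found


ld_1 = ({"F0"}, {"R1"})           # LD   F0,0(R1)
addd = ({"F4"}, {"F0", "F2"})     # ADDD F4,F0,F2
ld_2 = ({"F0"}, {"R1"})           # LD   F0,-8(R1)

print(classify_dependences(ld_1, addd))  # the ADDD reads the F0 that the LD wrote
print(classify_dependences(addd, ld_2))  # the second LD overwrites the F0 the ADDD reads
print(classify_dependences(ld_1, ld_2))  # both loads write F0
```

Renaming the second load's destination (for example to F6, as in the unrolled loop) removes the last two results, which is why name dependences are not true dependences.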


## Compiler Perspectives on Code Movement

- Again Name Dependence is Hard for Memory Accesses
  - Does  $100(R4) = 20(R6)$ ?
  - From different loop iterations, does  $20(R6) = 20(R6)$ ?
- Our example required the compiler to know that if R1 doesn't change then:

$0(R1) \neq -8(R1) \neq -16(R1) \neq -24(R1)$

There were no dependencies between some loads and stores, so they could be moved past each other


## Compiler Perspectives on Code Movement

- Final kind of dependence called **control dependence**
- Example
 

```
if p1 {S1;};
if p2 {S2;};
```

S1 is control dependent on p1 and S2 is control dependent on p2 but not on p1.


## Compiler Perspectives on Code Movement

- Two (obvious) constraints on control dependences:
  - An instruction that is **control dependent** on a branch cannot be moved **before** the branch so that its execution is no longer controlled by the branch.
  - An instruction that is **not control dependent** on a branch cannot be moved to **after** the branch so that its execution is controlled by the branch.
- Control dependencies relaxed to get parallelism; get same effect if preserve order of exceptions (address in register checked by branch before use) and data flow (value in register depends on branch)


## Where are the control dependencies?

```

1 Loop: LD  F0,0(R1)
2      ADDD  F4,F0,F2
3      SD  0(R1),F4
4      SUBI  R1,R1,8
5      BEQZ  R1,exit
6      LD  F0,0(R1)
7      ADDD  F4,F0,F2
8      SD  0(R1),F4
9      SUBI  R1,R1,8
10     BEQZ  R1,exit
11     LD  F0,0(R1)
12     ADDD  F4,F0,F2
13     SD  0(R1),F4
14     SUBI  R1,R1,8
15     BEQZ  R1,exit
....
```


## When Safe to Unroll Loop?

- Example: Where are data dependencies? (A,B,C distinct & nonoverlapping)
 

```
for (i=1; i<=100; i+=1) {
  A[i+1] = A[i] + C[i];   /* S1 */
  B[i+1] = B[i] + A[i+1]; /* S2 */
}
```

1. S2 uses the value A[i+1] computed by S1 in the same iteration.
2. S1 uses a value computed by S1 in an earlier iteration, since iteration i computes A[i+1], which is read in iteration i+1. The same is true of S2 for B[i] and B[i+1]. This is a "loop-carried dependence": a dependence between iterations.

- Implies that iterations are dependent, and can't be executed in parallel
- Not the case for our prior example; each iteration was distinct
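
One way to see the loop-carried dependence is to run the iterations in a different order and compare results. In this sketch the array contents (all ones) and the `run` helper are assumptions made for illustration.

```python
def run(order):
    """Execute the loop body for the given iteration order; return the final A and B."""
    A = [1] * 102
    B = [1] * 102
    C = [1] * 102
    for i in order:
        A[i + 1] = A[i] + C[i]        # S1
        B[i + 1] = B[i] + A[i + 1]    # S2
    return A, B


forward = run(range(1, 101))          # original program order
backward = run(range(100, 0, -1))     # same iterations, reversed order

# The results differ, so the iterations cannot be reordered or run in parallel.
print(forward == backward)
print(forward[0][101], backward[0][101])
```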


## HW Schemes: Instruction Parallelism

- Why in HW at run time?
  - Works when can't know real dependence at compile time
  - Compiler simpler
  - Code for one machine runs well on another
- Key idea: Allow instructions behind stall to proceed
 

```
DIVD  F0,F2,F4
ADDD  F10,F0,F8
SUBD  F12,F8,F14
```

  - Enables out-of-order execution => out-of-order completion
  - ID stage checked both for structural and data hazards
  - Scoreboard dates to CDC 6600 in 1963


## HW Schemes: Instruction Parallelism

- Out-of-order execution divides ID stage:
  - Issue—decode instructions, check for structural hazards
  - Read operands—wait until no data hazards, then read operands
- Scoreboards allow an instruction to execute as soon as both checks pass (no structural hazard, no data hazard), without waiting for prior instructions
- CDC 6600: In order issue, out of order execution, out of order commit (also called completion)




## Scoreboard Example: Cycle 1

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     |               |                    |              |
| LD F2       | 45+ | R3 |       |               |                    |              |
| MULT F0     | F2  | F4 |       |               |                    |              |
| SUBD F8     | F6  | F2 |       |               |                    |              |
| DIVD F10    | F0  | F6 |       |               |                    |              |
| ADDD F6     | F8  | F2 |       |               |                    |              |

Functional unit status:

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj ready?) | Rk (Fk ready?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------------|----------------|
|      | Integer | Yes  | Load | F6        |         | R2      |               |               |                | Yes            |
|      | Mult1   | No   |      |           |         |         |               |               |                |                |
|      | Mult2   | No   |      |           |         |         |               |               |                |                |
|      | Add     | No   |      |           |         |         |               |               |                |                |
|      | Divide  | No   |      |           |         |         |               |               |                |                |

Register result status:

| Clock | F0 | F2 | F4 | F6      | F8 | F10 | F12 | ... |
|-------|----|----|----|---------|----|-----|-----|-----|
| 1     |    |    |    | Integer |    |     |     |     |


## Scoreboard Example: Cycle 2

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             |                    |              |
| LD F2       | 45+ | R3 |       |               |                    |              |
| MULT F0     | F2  | F4 |       |               |                    |              |
| SUBD F8     | F6  | F2 |       |               |                    |              |
| DIVD F10    | F0  | F6 |       |               |                    |              |
| ADDD F6     | F8  | F2 |       |               |                    |              |

Functional unit status:

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj ready?) | Rk (Fk ready?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------------|----------------|
|      | Integer | Yes  | Load | F6        |         | R2      |               |               |                | Yes            |
|      | Mult1   | No   |      |           |         |         |               |               |                |                |
|      | Mult2   | No   |      |           |         |         |               |               |                |                |
|      | Add     | No   |      |           |         |         |               |               |                |                |
|      | Divide  | No   |      |           |         |         |               |               |                |                |

Register result status:

| Clock | F0 | F2 | F4 | F6      | F8 | F10 | F12 | ... |
|-------|----|----|----|---------|----|-----|-----|-----|
| 2     |    |    |    | Integer |    |     |     |     |


- Issue 2nd LD?

## Scoreboard Example: Cycle 3

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  |              |
| LD F2       | 45+ | R3 |       |               |                    |              |
| MULT F0     | F2  | F4 |       |               |                    |              |
| SUBD F8     | F6  | F2 |       |               |                    |              |
| DIVD F10    | F0  | F6 |       |               |                    |              |
| ADDD F6     | F8  | F2 |       |               |                    |              |

Functional unit status:

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj ready?) | Rk (Fk ready?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------------|----------------|
|      | Integer | Yes  | Load | F6        |         | R2      |               |               |                | No             |
|      | Mult1   | No   |      |           |         |         |               |               |                |                |
|      | Mult2   | No   |      |           |         |         |               |               |                |                |
|      | Add     | No   |      |           |         |         |               |               |                |                |
|      | Divide  | No   |      |           |         |         |               |               |                |                |

Register result status:

| Clock | F0 | F2 | F4 | F6      | F8 | F10 | F12 | ... |
|-------|----|----|----|---------|----|-----|-----|-----|
| 3     |    |    |    | Integer |    |     |     |     |


- Issue MULT?

## Scoreboard Example: Cycle 4

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 |       |               |                    |              |
| MULT F0     | F2  | F4 |       |               |                    |              |
| SUBD F8     | F6  | F2 |       |               |                    |              |
| DIVD F10    | F0  | F6 |       |               |                    |              |
| ADDD F6     | F8  | F2 |       |               |                    |              |

Functional unit status:

| Time | Name    | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj ready?) | Rk (Fk ready?) |
|------|---------|------|----|-----------|---------|---------|---------------|---------------|----------------|----------------|
|      | Integer | No   |    |           |         |         |               |               |                |                |
|      | Mult1   | No   |    |           |         |         |               |               |                |                |
|      | Mult2   | No   |    |           |         |         |               |               |                |                |
|      | Add     | No   |    |           |         |         |               |               |                |                |
|      | Divide  | No   |    |           |         |         |               |               |                |                |

Register result status:

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... |
|-------|----|----|----|----|----|-----|-----|-----|
| 4     |    |    |    |    |    |     |     |     |


## Scoreboard Example: Cycle 5

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     |               |                    |              |
| MULT F0     | F2  | F4 |       |               |                    |              |
| SUBD F8     | F6  | F2 |       |               |                    |              |
| DIVD F10    | F0  | F6 |       |               |                    |              |
| ADDD F6     | F8  | F2 |       |               |                    |              |

## Scoreboard Example: Cycle 7

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  |              |
| MULT F0     | F2  | F4 | 6     |               |                    |              |
| SUBD F8     | F6  | F2 | 7     |               |                    |              |
| DIVD F10    | F0  | F6 |       |               |                    |              |
| ADDD F6     | F8  | F2 |       |               |                    |              |

Functional unit status:

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj ready?) | Rk (Fk ready?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------------|----------------|
|      | Integer | Yes  | Load | F2        |         | R3      |               |               |                | No             |
|      | Mult1   | Yes  | Mult | F0        | F2      | F4      | Integer       |               | No             | Yes            |
|      | Mult2   | No   |      |           |         |         |               |               |                |                |
|      | Add     | Yes  | Sub  | F8        | F6      | F2      |               | Integer       | Yes            | No             |
|      | Divide  | No   |      |           |         |         |               |               |                |                |

Register result status:

| Clock | F0    | F2      | F4 | F6 | F8  | F10 | F12 | ... |
|-------|-------|---------|----|----|-----|-----|-----|-----|
| 7     | Mult1 | Integer |    |    | Add |     |     |     |

- Read multiply operands?
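
The question above comes down to the read-operands check. The following is a minimal sketch, not the scoreboard hardware itself: the dictionary layout and the helper names `can_read_operands` and `waiting_on` are assumptions made for illustration, and the two entries are copied from the cycle 7 functional unit status.

```python
# Hypothetical encoding of two functional-unit entries at cycle 7.
# Qj/Qk name the unit that will produce a source; Rj/Rk say whether it is ready.
fu_status = {
    "Mult1": {"op": "Mult", "Fi": "F0", "Fj": "F2", "Fk": "F4",
              "Qj": "Integer", "Qk": None, "Rj": False, "Rk": True},
    "Add":   {"op": "Sub", "Fi": "F8", "Fj": "F6", "Fk": "F2",
              "Qj": None, "Qk": "Integer", "Rj": True, "Rk": False},
}

def can_read_operands(entry):
    """A unit may read its operands only when both sources are ready (no RAW hazard)."""
    return entry["Rj"] and entry["Rk"]

def waiting_on(entry):
    """Names of the units whose results this entry is still waiting for."""
    pairs = ((entry["Qj"], entry["Rj"]), (entry["Qk"], entry["Rk"]))
    return [unit for unit, ready in pairs if not ready]

for name, entry in fu_status.items():
    print(name, can_read_operands(entry), waiting_on(entry))
```

Both the multiply and the subtract are blocked on F2, which the Integer unit has not yet written, so neither can read operands in cycle 7.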


### Scoreboard Example Cycle 8a

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | |
| MULT F0 | F2 | F4 | 6 | | | |
| SUBD F8 | F6 | F2 | 7 | | | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | Yes |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | Integer | Yes | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | FU | Mult1 | Integer | | | Add | Divide | | | |


### Scoreboard Example Cycle 8b

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | | | |
| SUBD F8 | F6 | F2 | 7 | | | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | FU | Mult1 | | | | Add | Divide | | | |


### Scoreboard Example Cycle 9

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 10 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | FU | Mult1 | | | | Add | Divide | | | |

- Read operands for MULT & SUBD? Issue ADDD?
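
Whether ADDD can issue depends only on the issue-stage checks. A minimal sketch under stated assumptions: `can_issue` is a hypothetical helper, and the `busy` and `register_result` dictionaries are an illustrative encoding of the cycle 9 state in which SUBD still holds the Add unit.

```python
def can_issue(unit, dest, busy, register_result):
    """Issue needs a free functional unit (no structural hazard)
    and no pending write to the destination register (no WAW hazard)."""
    return not busy[unit] and register_result.get(dest) is None

# Assumed state at cycle 9: SUBD occupies Add, MULT occupies Mult1, DIVD occupies Divide.
busy = {"Mult1": True, "Mult2": False, "Add": True, "Divide": True}
register_result = {"F0": "Mult1", "F8": "Add", "F10": "Divide"}

print(can_issue("Add", "F6", busy, register_result))   # ADDD F6,F8,F2 at cycle 9

# Once SUBD has written F8 (cycle 12 in the tables), the Add unit is free again.
busy["Add"] = False
del register_result["F8"]
print(can_issue("Add", "F6", busy, register_result))   # ADDD in the following cycle
```

The first call reports a stall because the single Add unit is busy (a structural hazard); the second succeeds, which is consistent with ADDD issuing in cycle 13.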

### Scoreboard Example Cycle 11

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 8 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 0 | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 11 | FU | Mult1 | | | | Add | Divide | | | |


### Scoreboard Example Cycle 12

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 7 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | FU | Mult1 | | | | | Divide | | | |

- Read operands for DIVD?


### Scoreboard Example Cycle 13

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 6 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | FU | Mult1 | | | Add | | Divide | | | |

### Scoreboard Example Cycle 14

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | 14 | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 5 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | FU | Mult1 | | | Add | | Divide | | | |

### Scoreboard Example Cycle 15

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | 14 | | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 4 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 1 | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 15 | FU | Mult1 | | | Add | | Divide | | | |

### Scoreboard Example Cycle 17

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 2 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 17 | FU | Mult1 | | | Add | | Divide | | | |
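
In cycle 17 ADDD has finished executing (cycle 16) but has no write-result time, because DIVD has not yet read F6. The following is a minimal sketch of that write-after-read (WAR) check; the function name `can_write_result` and the dictionary encoding are assumptions for illustration, with the flag values taken from the cycle 17 functional unit status.

```python
def can_write_result(unit, fu_status):
    """A unit may write its destination only if no other unit still has to
    read the old value of that register (WAR hazard check)."""
    dest = fu_status[unit]["Fi"]
    for name, other in fu_status.items():
        if name == unit:
            continue
        if (other["Fj"] == dest and other["Rj"]) or (other["Fk"] == dest and other["Rk"]):
            return False
    return True

# Busy units at cycle 17 (illustrative encoding of the table).
fu_status = {
    "Mult1":  {"Fi": "F0",  "Fj": "F2", "Fk": "F4", "Rj": True,  "Rk": True},
    "Add":    {"Fi": "F6",  "Fj": "F8", "Fk": "F2", "Rj": True,  "Rk": True},
    "Divide": {"Fi": "F10", "Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True},
}

print(can_write_result("Add", fu_status))    # ADDD must stall: DIVD still needs the old F6

# After DIVD has read its operands, its ready flag for F6 is cleared.
fu_status["Divide"]["Rk"] = False
print(can_write_result("Add", fu_status))    # ADDD may now write F6
```

Clearing the flag after the read follows the usual textbook convention and is assumed here; the tables in these notes do not show that step explicitly.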

### Scoreboard Example Cycle 18

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |

### Scoreboard Example Cycle 19

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | 19 | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 0 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 19 | FU | Mult1 | | | Add | | Divide | | | |


### Scoreboard Example Cycle 20

Instruction status

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 | |

Functional unit status

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | | | Yes | Yes |

Register result status

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | FU | | | | Add | | Divide | | | |
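
The change between cycles 19 and 20 is the bookkeeping done when MULT writes F0. A minimal sketch of that write-result step, assuming a hypothetical `write_result` helper and a dictionary encoding of the two tables involved:

```python
def write_result(unit, fu_status, register_result):
    """Broadcast a result: mark waiting operands ready, clear the
    register-result entry for the destination, and free the unit."""
    dest = fu_status[unit]["Fi"]
    for other in fu_status.values():
        if other.get("Qj") == unit:
            other["Qj"], other["Rj"] = None, True
        if other.get("Qk") == unit:
            other["Qk"], other["Rk"] = None, True
    if register_result.get(dest) == unit:
        del register_result[dest]
    fu_status[unit] = {"Busy": False}

# Illustrative state at cycle 19, just before MULT writes its result.
fu_status = {
    "Mult1":  {"Busy": True, "Fi": "F0", "Fj": "F2", "Fk": "F4",
               "Qj": None, "Qk": None, "Rj": True, "Rk": True},
    "Divide": {"Busy": True, "Fi": "F10", "Fj": "F0", "Fk": "F6",
               "Qj": "Mult1", "Qk": None, "Rj": False, "Rk": True},
}
register_result = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

write_result("Mult1", fu_status, register_result)
print(fu_status["Divide"]["Rj"], register_result)
```

After the call, the Divide entry no longer waits on Mult1 and F0 has no pending writer, which is why DIVD can read its operands in the next cycle.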


### Scoreboard Example Cycle 21

| Instruction status |     | <i>j</i> | <i>k</i> | Read  |                   | Execution Write |    |
|--------------------|-----|----------|----------|-------|-------------------|-----------------|----|
| Instruction        |     |          |          | Issue | operands complete | Result          |    |
| LD F6              | 34+ | R2       |          | 1     | 2                 | 3               | 4  |
| LD F2              | 45+ | R3       |          | 5     | 6                 | 7               | 8  |
| MULT:F0            | F2  | F4       |          | 6     | 9                 | 19              | 20 |
| SUBD:F8            | F6  | F2       |          | 7     | 9                 | 11              | 12 |
| DIVD:F10           | F0  | F6       |          | 8     | 21                |                 |    |
| ADD:DF6            | F8  | F2       |          | 13    | 14                | 16              |    |

  

| Functional unit status | Time | Name    | Busy | Op  | dest Fi | S1 Fj | S2 Fk | FU for j | FU for k | Fj? | Fk? |
|------------------------|------|---------|------|-----|---------|-------|-------|----------|----------|-----|-----|
|                        |      | Integer | No   |     |         |       |       |          |          |     |     |
|                        | 0    | Mult1   | No   |     |         |       |       |          |          |     |     |
|                        |      | Mult2   | No   |     |         |       |       |          |          |     |     |
|                        |      | Add     | Yes  | Add | F6      | F8    | F2    |          |          | Yes | Yes |
|                        |      | Divide  | Yes  | Div | F10     | F0    | F6    |          |          | Yes | Yes |

  

| Register result status | Clock | F0 | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|------------------------|-------|----|----|----|-----|----|--------|-----|-----|-----|
| FU                     | 21    |    |    |    | Add |    | Divide |     |     |     |
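
The reason ADDD has finished executing but has no Write result entry can be sketched in code. The block below is a minimal illustration, not the CDC 6600 logic itself: the class `FUState` and the function `can_write_result` are hypothetical names, and the fields `fi`, `fj`, `fk`, `rj`, `rk` stand for the dest, S1, S2, Fj? and Fk? columns of the functional unit status table. It assumes the usual scoreboard rule that a unit may not write its destination while another busy unit still has that register as an unread source (a WAR hazard).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FUState:
    """One row of the functional unit status table (hypothetical helper)."""
    name: str
    busy: bool = False
    fi: Optional[str] = None  # destination register (dest)
    fj: Optional[str] = None  # first source register (S1)
    fk: Optional[str] = None  # second source register (S2)
    rj: bool = False          # Fj? column: source is ready and not yet read
    rk: bool = False          # Fk? column: source is ready and not yet read


def can_write_result(unit: FUState, all_units: List[FUState]) -> bool:
    """Return True if `unit` can write Fi without overwriting an unread source."""
    for other in all_units:
        if other is unit or not other.busy:
            continue
        if other.fj == unit.fi and other.rj:
            return False  # WAR: `other` has not read Fj yet
        if other.fk == unit.fi and other.rk:
            return False  # WAR: `other` has not read Fk yet
    return True


# State copied from the Add and Divide rows of the table.
add = FUState("Add", True, "F6", "F8", "F2", True, True)
divide = FUState("Divide", True, "F10", "F0", "F6", True, True)
units = [add, divide]

blocked = not can_write_result(add, units)  # Divide still needs the old F6

# Assumption: once DIVD has read its operands, its ready flags are cleared.
divide.rj = divide.rk = False
released = can_write_result(add, units)
```

With the tabulated state, `blocked` is true because Divide lists F6 as a source it has not yet read. After the flags are cleared, `released` is true and ADDD is free to write F6.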


### Scoreboard Example Cycle 22

| Instruction status | j   | k  | Issue | Read operands | Execution complete | Write result |
|--------------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6              | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2              | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULTD F0           | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8            | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10           | F0  | F6 | 8     | 21            |                    |              |
| ADDD F6            | F8  | F2 | 13    | 14            | 16                 |              |

  

| Functional unit status | Time | Name    | Busy | Op  | dest Fi | S1 Fj | S2 Fk | FU for j | FU for k | Fj? | Fk? |
|------------------------|------|---------|------|-----|---------|-------|-------|----------|----------|-----|-----|
|                        |      | Integer | No   |     |         |       |       |          |          |     |     |
|                        | 0    | Mult1   | No   |     |         |       |       |          |          |     |     |
|                        |      | Mult2   | No   |     |         |       |       |          |          |     |     |
|                        |      | Add     | Yes  | Add | F6      | F8    | F2    |          |          | Yes | Yes |
|                        |      | Divide  | Yes  | Div | F10     | F0    | F6    |          |          | Yes | Yes |

  

| Register result status | Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|------------------------|-------|----|----|----|----|----|-----|-----|-----|-----|
| FU                     | 22    |    |    |    |    |    |     |     |     |     |


### Scoreboard Example Cycle 61

| Instruction status | j   | k  | Issue | Read operands | Execution complete | Write result |
|--------------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6              | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2              | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULTD F0           | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8            | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10           | F0  | F6 | 8     | 21            |                    |              |
| ADDD F6            | F8  | F2 | 13    | 14            | 16                 |              |

  

| Functional unit status | Time | Name    | Busy | Op  | dest Fi | S1 Fj | S2 Fk | FU for j | FU for k | Fj? | Fk? |
|------------------------|------|---------|------|-----|---------|-------|-------|----------|----------|-----|-----|
|                        |      | Integer | No   |     |         |       |       |          |          |     |     |
|                        | 0    | Mult1   | No   |     |         |       |       |          |          |     |     |
|                        |      | Mult2   | No   |     |         |       |       |          |          |     |     |
|                        |      | Add     | Yes  | Add | F6      | F8    | F2    |          |          | Yes | Yes |
|                        |      | Divide  | Yes  | Div | F10     | F0    | F6    |          |          | Yes | Yes |

  

| Register result status | Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|------------------------|-------|----|----|----|----|----|-----|-----|-----|-----|
| FU                     | 61    |    |    |    |    |    |     |     |     |     |
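
The cycle number in this slide's title can be related to the Read operands column with a little arithmetic. The sketch below derives each unit's execution latency from the entries recorded in the instruction status table, then projects when DIVD would finish. The 40-cycle divide latency is an assumption (it is not stated in this section) and was chosen because it matches the cycle in the title; the dictionary names are illustrative only.

```python
# Read-operands cycles copied from the instruction status table.
read_operands = {"MULTD": 9, "SUBD": 9, "ADDD": 14, "DIVD": 21}

# Execution-complete cycles that the table records.
exec_complete = {"MULTD": 19, "SUBD": 11, "ADDD": 16}

# Latency implied by the table: completion cycle minus read-operands cycle.
latency = {op: exec_complete[op] - read_operands[op] for op in exec_complete}

# Assumption: the divide unit takes 40 cycles (not given in this section).
DIVIDE_LATENCY = 40
divd_complete = read_operands["DIVD"] + DIVIDE_LATENCY

print(latency)        # implied latencies for multiply, subtract, add
print(divd_complete)  # projected completion cycle for DIVD
```

Under that assumption the multiply takes 10 cycles, the add and subtract take 2 each, and DIVD, which read its operands in cycle 21, would complete execution in cycle 61.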

## CDC 6600 Scoreboard

- Speedup of 1.7 from the compiler, 2.5 by hand, BUT slow memory (no cache) limits the benefit
- Limitations of 6600 scoreboard:
  - No forwarding hardware
  - Limited to instructions in basic block (small *window*)
  - Small number of functional units (structural hazards), especially integer and load/store units
  - Do not issue on structural hazards
  - Wait for WAR hazards
  - Prevent WAW hazards
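
The last three limitations describe checks made at issue and write time. As a minimal sketch of the issue-side checks (not the actual 6600 hardware), the function below refuses to issue when the required functional unit is busy (structural hazard) or when another active instruction already has the same destination register pending (WAW hazard). The function name `can_issue` and both dictionaries are hypothetical; the sample state is an illustrative one with the Add and Divide units busy.

```python
from typing import Dict, Optional


def can_issue(fu_busy: Dict[str, bool],
              result_status: Dict[str, Optional[str]],
              fu_name: str,
              dest: str) -> bool:
    """Return True if an instruction needing `fu_name` and writing `dest` may issue."""
    if fu_busy[fu_name]:
        return False  # structural hazard: the functional unit is occupied
    if result_status.get(dest) is not None:
        return False  # WAW hazard: an earlier instruction will write `dest`
    return True


# Illustrative state: Add is producing F6, Divide is producing F10.
fu_busy = {"Integer": False, "Mult1": False, "Mult2": False,
           "Add": True, "Divide": True}
result_status = {"F6": "Add", "F10": "Divide"}

ok = can_issue(fu_busy, result_status, "Mult1", "F0")           # free unit, free dest
structural = can_issue(fu_busy, result_status, "Add", "F12")    # Add unit is busy
waw = can_issue(fu_busy, result_status, "Mult1", "F10")         # F10 already pending
```

Because issue is in order, a failed check here stalls every later instruction as well, which is why a small number of functional units hurts so much.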


## Summary

- Instruction Level Parallelism (ILP) in SW or HW
- Loop level parallelism is easiest to see
- Dependencies are a property of the program (SW); they become hazards only if the HW cannot resolve them
- SW dependencies/compiler sophistication determine if compiler can unroll loops
  - Memory dependencies hardest to determine
- HW exploiting ILP
  - Works when dependences cannot be known at compile time
  - Code for one machine runs well on another
- Key idea of Scoreboard: Allow instructions behind stall to proceed (Decode => Issue instr & read operands)
  - Enables out-of-order execution => out-of-order completion
  - ID stage checked for both structural and data hazards
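
To illustrate why memory dependencies are the hardest to determine, consider a loop of the form `a[i + k] = a[i] + 1`. Whether one iteration reads a value written by an earlier one depends on `k`, which may only be known when the program runs. The sketch below is a hypothetical checker (the name `has_loop_carried_dependence` is not from the lecture) that walks the index pattern for a given `n` and a non-negative `k`.

```python
def has_loop_carried_dependence(n: int, k: int) -> bool:
    """Model the loop `for i in range(n): a[i + k] = a[i] + 1` for k >= 0.

    Return True if some iteration reads an element that an earlier
    iteration wrote, which would prevent running iterations in parallel.
    """
    written = set()
    for i in range(n):
        if i in written:
            return True  # a[i] was produced by an earlier iteration
        written.add(i + k)
    return False


same_element = has_loop_carried_dependence(8, 0)  # each iteration touches only a[i]
short_stride = has_loop_carried_dependence(8, 2)  # iteration 2 reads what iteration 0 wrote
long_stride = has_loop_carried_dependence(8, 8)   # writes land beyond every read
```

The same loop body is independent for `k = 0` or `k = 8` but carries a dependence for `k = 2`, so a compiler that cannot see `k` must assume the worst, while hardware that checks addresses as the program runs does not have to.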
