

### Motorola's Next PowerPC™ Microarchitecture with AltiVec™ Technology

#### Naras Iyengar

#### Senior Member of Technical Staff, Somerset Design Center, Motorola, Inc.

PowerPC



### **Goals of the New Microarchitecture**

### **The Microarchitecture**




#### First G4 PowerPC microarchitecture disclosure at Microprocessor Forum (MPF) 1998

- MPC7400, first product based on this microarchitecture, announced August 1999
- Additional products based on original microarchitecture to be announced at a later date

#### Second G4 PowerPC microarchitecture disclosure at MPF 1999

- Features supported in this new microarchitecture detailed today
- Feature set of first product based on new microarchitecture to be disclosed next year






### **Goals of the New Microarchitecture**

#### Build on capabilities of first G4 PowerPC microarchitecture

- Expand the pipeline structures to support higher frequencies while maintaining or improving instructions per cycle (IPC)
- Increase performance of execution units
- Increase performance of memory subsystem
- Establish a modular design concept
- Introduce additional features
- Provide full compatibility with MPC7400






### **Microarchitecture Overview**

- New seven-stage pipeline
- Increased peak dispatch from three to four instructions per cycle
  - Two additional execution units and an enhanced AltiVec engine
- Faster, wider memory subsystem
  - High-bandwidth, 256-bit internal memory subsystem between L1 and L2 caches
  - Provides on-chip L2 cache with parity
  - Supports large backside L3 cache with 64-bit or 128-bit datapath
  - Provides multiple system bus options
- Supports embedded applications through additional features
- Maintains full MPC7400 compatibility









#### **Core Block Diagram**



#### **Eleven execution units:**

- Three simple fixed-point units
- Complex fixed-point unit
- Floating-point unit
- Branch execution unit
- Load/store unit
- Four AltiVec units
  - Simple
  - Complex
  - Float
  - Permute




### Processor Pipeline Comparison

#### **MPC7400 Pipeline**

| Stage 1 | Stage 2  | Stage 3 | Stage 4    |
|---------|----------|---------|------------|
| Fetch   | Dispatch | Execute | Write Back |

#### **New Pipeline**




#### Four instructions per cycle

- Three instructions dispatched plus one branch

#### Additional resources to improve IPC

- Additional execution units
- 12-entry instruction buffer
- 16 completion buffers
- 16 GPR, 16 FPR, and 16 AltiVec rename registers
- 2048-entry Branch History Table (BHT)
- 128-entry, four-way set associative Branch Target Instruction Cache (BTIC)
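The BHT size above can be made concrete with a small model. This is a minimal sketch, not Motorola's design: it assumes each of the 2048 entries is a 2-bit saturating counter indexed by instruction-address bits, which the slide does not state. The class and constant names are hypothetical.

```python
# Minimal sketch of a 2048-entry branch history table (BHT).
# Assumptions (not from the slides): each entry is a 2-bit saturating
# counter, indexed by instruction-address bits above the 4-byte alignment.

BHT_ENTRIES = 2048  # entry count from the slide

STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)


class BranchHistoryTable:
    def __init__(self, entries: int = BHT_ENTRIES) -> None:
        # A power-of-two size lets the index be formed with a bit mask.
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        self.counters = [WEAK_NOT_TAKEN] * entries

    def _index(self, pc: int) -> int:
        # PowerPC instructions are 4 bytes, so drop the two low-order bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        """Return True if the branch at `pc` is predicted taken."""
        return self.counters[self._index(pc)] >= WEAK_TAKEN

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter one step toward the actual outcome, saturating."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, STRONG_TAKEN)
        else:
            self.counters[i] = max(self.counters[i] - 1, STRONG_NOT_TAKEN)
```

With 2-bit counters, a loop-closing branch that is taken repeatedly saturates at "strongly taken", so a single not-taken loop exit only weakens the prediction instead of flipping it.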



### **Higher-Performance AltiVec Implementation**

- Can now dispatch up to two AltiVec instructions per clock cycle, each to any of the four AltiVec execution units
- Full 128-bit implementation of the AltiVec instruction set
- Four fully pipelined execution units
  - Simple, Complex, Floating-Point, Permute
- Separate 32-entry, 128-bit-wide vector register file
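To illustrate what a full 128-bit vector operation means, the sketch below emulates, in portable Python, the lane-wise behavior of the AltiVec `vadduwm` (vector add unsigned word modulo) instruction on a 128-bit register viewed as four 32-bit words. It models the data layout only; it is not AltiVec code and says nothing about how the execution units are built. The helper names are hypothetical.

```python
# Portable emulation of one 128-bit AltiVec-style operation.
# Models the lane-wise, modulo-2**32 semantics of vadduwm on four 32-bit
# words. Illustration of data layout only, not AltiVec code.

VECTOR_BITS = 128
WORD_BITS = 32
LANES = VECTOR_BITS // WORD_BITS  # four words per 128-bit register
WORD_MASK = (1 << WORD_BITS) - 1


def pack(words):
    """Pack four 32-bit words into one 128-bit integer, word 0 most significant."""
    assert len(words) == LANES
    value = 0
    for w in words:
        value = (value << WORD_BITS) | (w & WORD_MASK)
    return value


def unpack(value):
    """Split a 128-bit integer back into four 32-bit words, word 0 first."""
    return [(value >> (WORD_BITS * (LANES - 1 - i))) & WORD_MASK for i in range(LANES)]


def vadduwm(a, b):
    """Add two 128-bit vectors lane by lane; each lane wraps modulo 2**32."""
    return pack([(x + y) & WORD_MASK for x, y in zip(unpack(a), unpack(b))])
```

Note that a carry out of one lane never propagates into its neighbor: adding 1 to a lane holding `0xFFFFFFFF` yields 0 in that lane and leaves the other three untouched.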









### **Memory Subsystem: L1 Cache**

#### L1 non-blocking caches

- 32KB eight-way set associative instruction cache
- 32KB eight-way set associative data cache
- Data integrity protected by on-chip parity
  - Byte parity on data cache
  - Word parity on instruction cache
- Cache locking supported for any combination of ways in both instruction and data caches
- 256-bit datapath between L1 and L2 caches
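The cache parameters above determine how an address is split into tag, set index, and byte offset. A minimal sketch, assuming 32-byte cache lines (the line size is not given on this slide); the constant and function names are hypothetical.

```python
# How a 32 KB, eight-way set-associative L1 cache splits an address.
# Assumption (not stated on the slide): 32-byte cache lines.

CACHE_BYTES = 32 * 1024   # 32 KB, from the slide
WAYS = 8                  # eight-way set associative, from the slide
LINE_BYTES = 32           # assumed line size

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)    # 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # log2(32) = 5
INDEX_BITS = SETS.bit_length() - 1           # log2(128) = 7


def split_address(addr: int):
    """Return (tag, set_index, byte_offset) for a physical address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under that assumption, addresses 4 KB apart (128 sets × 32 bytes) map to the same set and compete for its eight ways, which is the situation per-way cache locking lets software control.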



### **Memory Subsystem: L2 Cache**

#### On-chip L2 cache

- Unified, non-blocking L2 cache
- 256KB, eight-way set associative L2 cache
- Operates in copy back mode, supports cache coherency
- Six-cycle penalty, single-cycle throughput
- Loads and stores performed for an entire cache line in one cycle using 256-bit internal data bus
- Data integrity protected by on-chip byte parity
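Byte parity stores one extra bit per data byte, so a 256-bit transfer carries 32 parity bits. The sketch below assumes even parity, which the slide does not specify; the function names are hypothetical.

```python
# Byte parity: one parity bit per data byte.
# Assumption: even parity (the slide does not say even or odd).


def byte_parity(b: int) -> int:
    """Even-parity bit for one byte: 1 if the byte has an odd number of 1 bits."""
    assert 0 <= b <= 0xFF
    return bin(b).count("1") & 1


def parity_bits(data: bytes) -> list:
    """One parity bit per byte, e.g. 32 bits for a 256-bit transfer."""
    return [byte_parity(b) for b in data]


def bad_bytes(data: bytes, stored: list) -> list:
    """Indices of bytes whose recomputed parity does not match the stored bit."""
    return [i for i, (b, p) in enumerate(zip(data, stored)) if byte_parity(b) != p]
```

Parity detects any single flipped bit within a byte but cannot correct it, and an even number of flips within the same byte goes unnoticed.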





### **Memory Subsystem: L3 Cache**

#### L3 cache interface

- On-chip tags to support up to 2MB of off-chip cache
- Critical double-word forwarding to reduce latency
- Support for both 128-bit and 64-bit data transfers
- Data and address parity supported for L3 cache
- Support for high-performance MSUG2 DDR SRAM and late-write SRAMs
- Support for cost-sensitive PB2 SRAM and PC-DDR SRAMs
- L3 tag disable option
  - Allows the L3 SRAM to be used as directly addressed memory rather than as a cache
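Critical double-word forwarding returns the data the processor actually requested before the rest of the line. A minimal sketch of the resulting beat order for the 64-bit and 128-bit L3 datapaths, assuming 32-byte lines and wrap-around ordering after the critical beat (both are assumptions, not stated on the slide); the function name is hypothetical.

```python
# Critical-double-word-first (wrap-around) line-fill ordering.
# Assumptions: 32-byte lines; the beat holding the requested address comes
# first, then the remaining beats in wrap-around order.

LINE_BYTES = 32  # assumed line size


def fill_order(addr: int, bus_bits: int) -> list:
    """Return the line byte offset of each data beat, critical beat first."""
    beat_bytes = bus_bits // 8           # 8 bytes for 64-bit, 16 for 128-bit
    beats = LINE_BYTES // beat_bytes     # 4 beats or 2 beats per line
    first = (addr % LINE_BYTES) // beat_bytes
    return [((first + i) % beats) * beat_bytes for i in range(beats)]
```

For a load at line offset 24, a 64-bit datapath delivers offsets 24, 0, 8, 16; a 128-bit datapath delivers the same line in two beats (16, then 0).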



Latency comparison chart (figure not reproduced): MPC7400 L2 latency is based on a 2:1 L2 bus ratio; the new PowerPC's L3 latency is based on a 2:1 L3 bus ratio.




### **Other Memory Features**

- Software tablewalk, in addition to hardware tablewalk
- Supports 36-bit physical addressing
  - Allows 64 GB of physically addressable memory
- Extensive support for multiprocessing environments
  - Supports multiple bus protocols
    - Both 128-bit and 64-bit data transfers supported for the MPX bus architecture
    - Support for the legacy 60x bus architecture
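The 64 GB figure follows directly from the address width: 2^36 bytes = 2^6 × 2^30 bytes = 64 GB (binary gigabytes), 16 times the 4 GB reachable with 32-bit physical addresses. A trivial check, with hypothetical names:

```python
# Physical address space implied by 36-bit addressing.
PHYS_ADDR_BITS = 36
ADDRESSABLE_BYTES = 1 << PHYS_ADDR_BITS   # 2**36 bytes
GIB = 1 << 30                             # one binary gigabyte


def fits_in_physical_space(addr: int) -> bool:
    """True if `addr` is representable as a 36-bit physical address."""
    return 0 <= addr < ADDRESSABLE_BYTES
```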



### **Process Technology and Other Features**

#### Technology:

- Designed for Motorola's 0.13 µm process technology
- Internal voltage: 1.5V
- Supports 1.8V and 2.5V I/O

#### Design features:

- Operating frequency: 700+ MHz
- Support for extensive system and L3 bus ratios
- Power consumption: less than 10W (typical)
- Number of transistors: over 33 million





#### Motorola's second G4 PowerPC microarchitecture:

- Expanded the pipeline from 4 to 7 stages without loss of IPC
- Increased peak dispatch from three to four instructions per cycle
- Moved the L2 cache on-die and implemented a 256-bit datapath to the L1 caches
- Added a high-speed backside L3 cache interface supporting multiple configurations
- Added 36-bit physical addressing
- Added features to support additional applications

