Power Architecture

A High-Performance Architecture with a History

	Introduction
	Broadening the Application Market
	PowerPC Architecture
	The Slimming of POWER Architecture
	Completing PowerPC Architecture
	Summary

Introduction
IBM began delivery of RS/6000 products in February of 1990 [1,2]. IBM developed these products in response to customer needs for workstations and midrange systems with UNIX operating systems. The processors in these products were implementations of the POWER Architecture, a second-generation reduced instruction set computer (RISC) architecture.

The POWER Architecture incorporated characteristics common to most other RISC architectures. Instructions had a fixed length (4 bytes) and consistent formats, permitting a simple instruction decoding mechanism. Load and store instructions provided all of the accesses to memory. The architecture provided a set of general purpose registers (GPRs) for fixed-point computation, including the computation of memory addresses. It provided a separate set of floating-point registers (FPRs) for floating-point computation. Each computation read its source operands from one register set and placed its result in that same register set. Most instructions performed one simple operation.
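To illustrate why fixed-length instructions with consistent formats are simple to decode, the sketch below splits one 32-bit instruction word into fields using only shifts and masks. The field positions (a 6-bit primary opcode, two 5-bit register fields, and a 16-bit signed displacement) are an assumption modeled on the load/store format; the text above does not give the layout, and the function name is hypothetical.

```python
def decode_d_form(word):
    """Split a 32-bit instruction word into (opcode, rt, ra, displacement)."""
    opcode = (word >> 26) & 0x3F   # primary opcode: top 6 bits
    rt = (word >> 21) & 0x1F       # target (or source) register number
    ra = (word >> 16) & 0x1F       # base address register number
    d = word & 0xFFFF              # 16-bit displacement
    if d & 0x8000:                 # sign-extend the displacement
        d -= 0x10000
    return opcode, rt, ra, d


print(decode_d_form(0x80640008))  # (32, 3, 4, 8)
```

Because every instruction occupies exactly 4 bytes, the decoder never has to inspect one instruction to find where the next one begins.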

The POWER Architecture was unique among the existing RISC architectures in that it was functionally partitioned, separating the functions of program flow control, fixed-point computation, and floating-point computation. The architecture's partitioning facilitated the implementation of superscalar designs, in which multiple functional units concurrently executed independent instructions (see Figure 1).
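The benefit of partitioning can be sketched with a toy issue model. This is a hypothetical illustration, not a description of any POWER implementation: it assumes one unit per function (labeled "branch", "fixed", and "float" here), issues instructions in order, and ignores data dependencies entirely.

```python
def cycles_to_issue(instruction_units):
    """Count cycles needed to issue a stream of instructions in order.

    Each cycle issues consecutive instructions until one needs a unit
    that has already been used in that cycle.
    """
    cycles = 0
    busy = set()
    for unit in instruction_units:
        if not busy or unit in busy:
            cycles += 1      # start a new cycle
            busy = set()
        busy.add(unit)
    return cycles


stream = ["fixed", "float", "branch", "fixed", "float", "fixed", "fixed"]
print(cycles_to_issue(stream))  # 4, versus 7 if only one instruction issued per cycle
```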

The POWER Architecture diverged somewhat from the path taken by most other RISC architectures. The primary objective of those architectures was to be simple enough that implementations could have a very short cycle time, yielding processors that execute instructions at the fastest possible clock rate. The designers of the POWER Architecture chose instead to minimize the total time required to complete a task. That total time is the product of three components: path length (the number of instructions executed), the number of cycles needed to complete an instruction, and cycle time.
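The trade-off can be made concrete with a small calculation. The sketch below is illustrative only: the function name and both sets of numbers are hypothetical, chosen to show that a design with a longer cycle time can still finish a task sooner if its path length is shorter.

```python
def total_time_ns(path_length, cycles_per_instruction, cycle_time_ns):
    """Total task time = path length x cycles per instruction x cycle time."""
    return path_length * cycles_per_instruction * cycle_time_ns


# Hypothetical design A: very short cycle time, but a longer path length.
design_a = total_time_ns(path_length=1_000_000, cycles_per_instruction=1.0, cycle_time_ns=20)

# Hypothetical design B: longer cycle time, but fewer instructions,
# for example because some instructions each do more work.
design_b = total_time_ns(path_length=750_000, cycles_per_instruction=1.0, cycle_time_ns=25)

print(design_a)  # 20000000.0
print(design_b)  # 18750000.0
```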

Loads and stores account for 20%-30% of the instructions executed by most programs. In addition, many applications manipulate arrays, for which the pattern of memory accesses is often regular (for example, every "nth" element). Based on these observations, the designers included update forms of most load and store instructions. (The update forms perform the memory access and place the updated address in the base address register.) The use of these instructions avoids the need for a separate address computation after each access.
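The behavior of an update-form load can be sketched with a small register-and-memory model. Everything here is a simplified assumption for illustration: memory is a Python list indexed by word, registers are dictionary entries, and the function and register names are hypothetical.

```python
def load_with_update(memory, regs, rt, ra, displacement):
    """Update-form load: access memory, then write the address back to regs[ra]."""
    address = regs[ra] + displacement
    regs[rt] = memory[address]
    regs[ra] = address  # the base register now holds the updated address


def sum_every_nth(memory, start, stride, count):
    """Sum `count` elements spaced `stride` apart using only update-form loads."""
    # Start one stride before the first element so the first update lands on it.
    regs = {"base": start - stride, "value": 0, "total": 0}
    for _ in range(count):
        load_with_update(memory, regs, "value", "base", stride)
        regs["total"] += regs["value"]  # no separate address-increment step
    return regs["total"]


memory = list(range(100, 120))  # hypothetical memory: word i holds 100 + i
print(sum_every_nth(memory, start=2, stride=3, count=4))  # 426
```

The loop body contains one load and one add; stepping to the next element is folded into the load itself.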

In addition, a common operation in many floating-point computational algorithms is adding a value to the product of two other values. This observation prompted the inclusion of a floating-point multiply-add instruction, which performs this operation.
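A dot product shows why this operation is so common: each step adds a product to a running sum. The sketch below models a multiply-add as a single operation that rounds only once, at the end. That single rounding is an assumption not stated in the text above, and the exact rational arithmetic used to model it is for illustration only; hardware does not work this way.

```python
from fractions import Fraction


def multiply_add(a, b, c):
    """Return a*b + c, rounded to a float once at the end.

    Assumption: the product is kept exact before the addition.
    Fraction arithmetic models that exactly.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def dot(xs, ys):
    """Dot product built from one multiply-add per element pair."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = multiply_add(x, y, acc)
    return acc


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0

# With a separate multiply and add, the product is rounded before the add.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
print(a * b + c)              # 0.0
print(multiply_add(a, b, c))  # -2**-60, about -8.67e-19
```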

The first POWER implementations provided exceptional performance. The processor complex (including a memory controller, instruction cache, and data cache) consisted of seven or nine chips, depending on the system model. The seven-chip complex provided a 32KB data cache and the nine-chip complex provided a 64KB data cache. These systems contained an 8KB instruction cache, which later systems expanded to 32KB.

A second POWER processor design [3] was made possible by silicon technology advances that enabled designers to put more than a million transistors on a single chip. This RISC Single Chip (RSC) design integrated a simple branch unit, a fixed-point unit, a floating-point unit, a unified cache, a memory controller, and an I/O controller on a single silicon chip for use in low-cost desktop systems. Delivery of products using this processor began in April of 1992.

The POWER2 design, described in "POWER2: Next Generation of the RS/6000 Family" [4], delivers more performance than the earlier POWER designs by doubling the number of execution units. Delivery of products using this processor began in October of 1993.

Broadening the Application Market
Performance is not the only characteristic customers consider when purchasing computer systems. They purchase systems to improve the productivity of their employees and to reduce the cost of running their business. A key to achieving these objectives is the number of different tasks the systems can perform. Customers prefer systems for which a broad range of applications is available. With these systems, customers can choose applications based on performance, cost, employee skills, application integration requirements, and other individual needs.

In 1990, five RISC architectures were competing for market share. It was unlikely that all five would succeed in the workstation market. The application marketplace would be less attractive to application developers if it were fragmented by five architectures because each version of an application would attract fewer customers.

In addition, customers began to feel that multiprocessor systems provided extra value because they offered a better price-to-performance ratio and a higher ultimate performance than uniprocessor systems. Due to these market forces, Apple, Motorola, and IBM formed a partnership whose foundation is the use of a common architecture derived from the POWER Architecture.

PowerPC Architecture
Early in 1991, processor architects, compiler experts, operating system developers, processor designers, system architects, and system designers from the three companies worked together to develop an architecture that would meet the needs of the alliance. Because it would have been impossible to develop a completely new architecture in time to satisfy the needs of their customers, the companies decided to use the POWER Architecture as the starting point. They made changes to achieve a number of specific goals. The architecture had to:

Permit a broad range of implementations, from low-cost controllers to high-performance processors
Be simple enough to permit the design of processors that have a very short cycle time
Minimize effects that hinder the design of aggressive superscalar implementations
Include multiprocessor features
Define a 64-bit architecture that is a superset of the 32-bit architecture, providing application binary compatibility for 32-bit applications

By analyzing the needs of applications and operating systems, considering typical instruction mixes, and inspecting application and operating system traces, the architecture group reached a consensus on the definition of the PowerPC Architecture [5]. This architecture achieves the goals previously listed, yet permits POWER customers to run their existing applications on new systems and to run new applications on their existing systems.

The PowerPC Architecture includes most of the POWER instructions. Nearly all of the excluded POWER instructions execute infrequently, and the compiler can replace each of them with several instructions that are in both architectures. On PowerPC processors, an excluded instruction causes an Illegal Instruction type Program Interrupt and is emulated by the AIX operating system. Most POWER applications will benefit from the improved performance of new PowerPC processors. Applications that frequently perform the following operations, which use the excluded instructions, will produce correct results on PowerPC systems but may run slowly unless they are recompiled:

Extended-precision bit string computation
Extended-precision multiplication
Integer division
Generation or modification of instructions about to be executed
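The trap-and-emulate behavior described above can be sketched as follows. This is a minimal model under stated assumptions, not AIX code: the operation names (`add`, `sub`, `abs_diff`) and all function names are hypothetical, and a Python exception stands in for the Illegal Instruction type Program Interrupt.

```python
class IllegalInstruction(Exception):
    """Stands in for the Illegal Instruction type Program Interrupt."""


# Hypothetical operations that the modeled processor implements directly.
NATIVE = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def execute_native(op, a, b):
    """Run an operation in 'hardware', or trap if it is not implemented."""
    if op not in NATIVE:
        raise IllegalInstruction(op)
    return NATIVE[op](a, b)


def emulate(op, a, b):
    """Operating-system handler: build the excluded operation from native ones."""
    if op == "abs_diff":  # hypothetical excluded operation
        diff = execute_native("sub", a, b)
        return diff if diff >= 0 else execute_native("sub", 0, diff)
    raise IllegalInstruction(op)


def run(op, a, b):
    """Try the hardware first; fall back to emulation on a trap."""
    try:
        return execute_native(op, a, b)
    except IllegalInstruction:
        return emulate(op, a, b)


print(run("add", 2, 3))        # 5, executed directly
print(run("abs_diff", 3, 10))  # 7, correct but reached through the slower trap path
```

The result is the same either way; the cost is the extra trip through the handler, which is why frequently used excluded operations are worth recompiling away.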

The first PowerPC processor, the 601, implements all but two of the nonprivileged POWER instructions. One goal for this bridge processor was to allow application vendors additional time to recompile their products for PowerPC systems. Most existing POWER applications will run well on 601-based systems. Applications that generate or modify code might use the POWER cache flush instruction, which the 601 does not implement. Such programs will produce correct results (AIX emulates the operation), but they may run slowly without recompilation. Although the 601 implements nearly all of the excluded nonprivileged POWER instructions, new applications should not use them because other PowerPC processors will not implement them.

PowerPC Architecture is a 64-bit architecture. This architecture extends addressing and fixed-point computation to 64 bits, and supports dynamic switching between the 64-bit mode and the 32-bit mode. In 32-bit mode, a 64-bit PowerPC processor will execute application binaries compiled for the 32-bit subset architecture. Because a description of the entire architecture is too large to be addressed here, this paper concentrates on the descriptions of the changes that affect the user-mode 32-bit subset architecture.
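One way to picture the two modes is through address computation. The sketch below assumes that in 32-bit mode only the low-order 32 bits of a computed address are used; the text above does not spell out this detail, and the function and parameter names are hypothetical.

```python
MASK_32 = 0xFFFF_FFFF
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF


def effective_address(base, displacement, sixty_four_bit_mode):
    """Compute base + displacement in 64-bit registers, then apply the mode."""
    address = (base + displacement) & MASK_64
    if not sixty_four_bit_mode:
        address &= MASK_32  # assumption: 32-bit mode uses only the low 32 bits
    return address


print(hex(effective_address(0xFFFF_FFFC, 8, sixty_four_bit_mode=True)))   # 0x100000004
print(hex(effective_address(0xFFFF_FFFC, 8, sixty_four_bit_mode=False)))  # 0x4
```

Under this assumption a 32-bit application sees addresses wrap exactly as they would on a 32-bit processor, which is consistent with running unmodified 32-bit binaries.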
