Dreamcast Technical Pages
Nintendo's GameCube Technical Overview
Nintendo Gamecube Console
Yes, it looks like a cute little toy, but a very powerful console lies underneath.

Release Dates

Nintendo released the GameCube on Sept 14th, 2001 (Japan) and Nov 18th, 2001 (North America). That's nearly three years after the Dreamcast's Japanese release in November 1998.

The GameCube was released in North America at a price of $199 US.

Specifications

The specifications were first released by Nintendo at its Spaceworld show in Tokyo, Japan, on August 24th, 2000. Nintendo announced revised specs on its website on May 15th, 2001, during the 2001 E3 show in LA: the "Gekko" CPU was upgraded from 400 MHz to 485 MHz, and the "Flipper" GPU was downgraded from 202.5 MHz to 162 MHz.

CPU - "Gekko"
CPU: IBM Power PC "Gekko"
Manufacturing Process: 0.18 microns Copper Wire Technology
Clock Frequency: 485 MHz
CPU Capacity: 1125 Dmips (Dhrystone 2.1)
Internal Data Precision: 32-bit Integer & 64-bit Floating-point
External Bus Bandwidth: 1.3 GB/second peak bandwidth (32-bit address, 64-bit data bus at 162 MHz)
Internal Cache: L1: Instruction 32 KB, Data 32 KB (8 way) L2: 256 KB (2 way)

GPU - "Flipper" (system LSI)
Manufacturing Process: 0.18 microns NEC Embedded DRAM Process
Clock Frequency: 162 MHz
Embedded Frame Buffer/Z Buffer: Approx. 2 MB, Sustainable Latency: 6.2 ns  (1T-SRAM)
Embedded Texture Cache: Approx. 1 MB, Sustainable Latency: 6.2 ns  (1T-SRAM)
Texture Read Bandwidth: 10.4 GB/second (Peak)
Main Memory Bandwidth: 2.6 GB/second (Peak)
Color depth, Z Buffer depth: 24-bits
Image Processing Function: Fog, Subpixel Anti-aliasing, 8 Hardware Lights, Alpha Blending, Virtual Texture Design, Multi-texturing, Bump Mapping, Environment Mapping, MIP Mapping, Bilinear Filtering, Trilinear Filtering, Anisotropic Filtering, and Real-time Hardware Texture Decompression (S3TC).
Other: Real-time Decompression of Display List, Hardware Motion Compensation Capability, and HW 3-line Deflickering filter.

(The following sound related functions are all incorporated into the System LSI)

Sound Processor: Custom Macronix 16-bit DSP
Instruction Memory: 8 KB RAM + 8 KB ROM
Data Memory: 8 KB RAM + 4 KB ROM
Clock Frequency: 81 MHz
Maximum Number of Simultaneously Produced Sounds: 64 simultaneous channels, ADPCM encoding
Sampling Frequency: 48 KHz

System Floating-point Arithmetic Capability: 10.5 GFLOPS (Peak) (MPU, Geometry Engine, Hardware Lighting Total)
Estimated raw display capability: 32 million polygons/second (Peak) (no texture, Gouraud shading)
Actual Display Capability: 6 to 12 million polygons/second (Assuming actual game conditions with complex models, fully textured, fully lit, etc.)
Total System Memory: 40 MB
Main Memory: 24 MB with sustainable latency of 10 ns or lower (1T-SRAM)
A-Memory: 16 MB 81 MHz DRAM

Disc Drive: CAV (Constant Angular Velocity) System
Average Access Time: 128ms
Data Transfer Speed: 16Mbps to 25Mbps
Media: 3 inch NINTENDO GAMECUBE Disc based on Matsushita's Optical Disc Technology
Capacity: Approx. 1.5GB

Input/Output: 4 Controller Ports, 2 Memory Card Slots, Analog AV Output, Digital AV Output, 2 High-Speed Serial Ports, High-speed Parallel Port
Power Supply: AC Adapter DC12V x 3.5A
Dimensions: 4.3"(H) x 5.9"(W) x 6.3"(D)

*The peak figures listed are all for maximum instantaneous performance and cannot be achieved in an actual game. However, following the conventions of the game industry, they are listed for your reference.

First off, let's point out that Nintendo is being very conservative in its rating of 6 to 12 million polygons/second: the developer Factor 5, which is developing Star Wars: Rogue Squadron, is already doing 12 million polygons/second, and claims to be using only 50 percent of the Gamecube's power. Factor 5 indicated they could get 20 million polygons/second with all effects. "Effects" here means texture layers, not polygonal lighting.

Two videos of Star Wars: Rogue Squadron running on Gamecube at cube.ign.com available here.

Here is information on Star Wars: Rogue Squadron and the Gamecube, as provided by Julian Eggebrecht (President) of Factor 5. It was originally presented on the forum of this German video game site: Maniac Online.

  • Both videos run in real-time and only use 50% of the hardware.
  • The X-Wing model is the original model used by Industrial Light and Magic (ILM) for special effects in the Star Wars film, and includes ILM's textures and shaders. The X-Wing alone is made up of 30,000 polygons and the pilot has 4000 polygons!
  • Both demos run at a constant 60 fps, double buffered, true color, and with full screen anti-aliasing and deflickering.
  • The surface of the second Death Star from the film was rebuilt accurately at 1:1 ratio. The simple shapes on the Death Star surface have up to 300 polygons. Every element has a 512x512 true color texture. You can see 25 of them in the demo. There are 70 Tie Fighters and X-Wings onscreen. So there are roughly 200,000 polygons at 60 fps with up to 8 light sources along with gloss, dirt, and bump maps. (Note that the number and type of light sources can have a huge effect on the polygon rate.)
  • Texturing allows all these effects: (alpha, bump mapping, gloss mapping, specular highlights, etc.) in the same cycle. It can do 8 layers in a single pass. (It has been previously reported that Julian said single "cycle", but that is not possible since Flipper has 4 pipelines with one texel unit per pipeline)
  • The 2 MB frame buffer is a render buffer - when a frame is done, it is then sent to main memory. Anti-aliasing and deflickering are done during that copy to main memory.
  • The 1 MB texture cache is automatically filled during rendering, as the T&L engine triggers the swap. Textures that are used often can be "locked" into the cache.
  • A-Memory is for audio but can also be used as a buffer for other items that don't need the speed of main RAM.
  • The 2 MB on-chip frame buffer contains only the data for the current frame being rendered, and the z-buffer. Double/triple buffers are stored in main RAM, because the video DAC gets the image data directly from main RAM. To render an image, polygonal data is sent to the T&L unit, which automatically loads/swaps textures into the texture cache. Textures are decompressed during rendering, so uncompressed textures never take up memory.
Specs are one thing, but the Star Wars: Rogue Squadron videos that Nintendo showed prove it has quite an amazing new console. This one game looks outstanding, and it is still very early in development.
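
As a rough sanity check on the Factor 5 notes above, the polygon count for the demo scene can be turned into a per-second rate. This is only a sketch: it assumes the 200,000 figure is a per-frame total, which the notes imply but do not state outright, and the variable names are just for illustration.

```python
# Figures quoted in the Factor 5 notes above.
polygons_per_frame = 200_000   # assumption: the quoted total is per frame
frames_per_second = 60

polygons_per_second = polygons_per_frame * frames_per_second
print(f"{polygons_per_second / 1e6:.0f} million polygons/second")
```

That works out to the same 12 million polygons/second that Factor 5 is reported to be pushing.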

A side note on Factor 5 is that they will be providing the sound tools for Gamecube development. The software is called MusyX and you can find details on it here.

Motherboard
 
gamecube motherboard

A beautiful piece of engineering in terms of size, number of large components (only 5 large semiconductors: CPU, GPU and three memory chips), low manufacturing cost, and awesome performance. The entire board is roughly the size of a compact disc jewel case!

Flipper Die

flipper die photo

PLL: Phase Lock Loop
eFB: Embedded Frame Buffer
eTM: Embedded Texture Memory
TF: Texture Filter
TC: Texture Coordinate Generator
TEV: Texture Environment
RASx: Rasterizer
C/Z: Color/Z Calculator
PEC: Pixel Copy Engine
SU: Triangle Setup
CP: Command Processor
DSP: Audio DSP
XF: Triangle Transform Engine
NB: Northbridge - all system logic including CPU interface, Video Interface, Memory Controller, I/O Interface

CPU "Gekko" Details

An article released some details on the Gekko CPU here:

Unlike Sony's Playstation 2 Emotion Engine, the Gekko MPU was not built from the ground up. It's a derivative of the PowerPC 750 RISC processor and includes some 50 new instructions. Based on 0.18-micron copper wire process technology, the device runs at 485 MHz and has an external bus to the Flipper device with a peak of 1.3 Gbytes/s. The chip has a performance rating of 1125 DMips (Dhrystone 2.1). 

The GameCube team chose to build off the existing PowerPC design to leverage the available tool chain, such as compilers and optimizers. IBM claims this has given developers a jump on creating new games. "Developers have been making software for the GameCube a long time before people knew we were doing the silicon," said IBM's West. "If you were to take code written for a PowerPC you could essentially run it on this device. We didn't deviate from what is a well-understood architecture by a large amount." 

One of the modifications it made was to cut the 64-bit floating point unit in half, allowing it to do two 32-bit floating point operations every cycle. "Conventional wisdom is that four-way is actually better, but this is not necessarily true," West said. "Two-way is actually pretty much as powerful as four-way, plus it takes up less silicon and it's easier to make it go fast. We're going to try to complete two instructions every cycle." 

To improve the internal data flow, IBM tried to eliminate "cache trashing," or wasting cache space on transient data. The 256-Kbit Level-2 cache can be locked down so that it retains only the data that needs to be reused. There's also an internal direct memory access that moves data from the cache while allowing the device to process a different set of data. This mechanism helps mitigate the incremental latency associated with compressing and decompressing the data. 

"You often get into a mode of cache trashing and filling it up with useless data," said PowerPC architect Peter Sandon. "We tried to optimize the data movement so that we don't see the cache misses you would otherwise see." 

Chip Sizes

Nintendo has released the chip size for "Gekko" and "Flipper" as indicated in this article entitled "Designers bring practical touch to GameCube" at EE Times on Sept. 7th, 2000.

Although MoSys was perhaps the most strategic partner, the GameCube project involved several alliances. IBM Corp. provided the so-called Gekko CPU, a custom version of the 400-MHz* PowerPC with 256 kbytes of secondary cache all made with a 0.18-micron copper CMOS process. Despite the large secondary cache, IBM was able to build the chip on a 43-mm2 die, said Takeda.

The 3-D graphics technology is from ArtX Inc. (Palo Alto, Calif.), which is fabricating the embedded system chip, a 120-mm2 device made with 0.18-micron technology, which contains the SRAM embedded memory as well as the ArtX graphics engine and a sound generator. Volume production of the part is to begin next month at the No. 9 fab at NEC Kyushu. And Matsushita is supplying a proprietary 8-cm optical disk for the games.

*Note: CPU is now 485 MHz due to revised specs.

Small chip size is important because it determines the number of chips produced from a silicon wafer. The more chips per wafer, the lower the cost of production. Smaller circuit sizes can have a huge effect on the size of the chip: the IBM PowerPC CPU (0.18 micron) with 256 KB of secondary cache is the same size as the Hitachi SH-4 CPU (0.25 micron) used in the Dreamcast!

The "Flipper" GPU contains 51 million transistors, about half of which are used by the on-chip memory.

The Gamecube has a massive heat sink that covers all the chips, and it does have a fan on one of the air vents on its side. It has been reported that the fan and the drive are very quiet.

Motherboard Datapath
 
motherboard datapath
The motherboard datapath diagram is not official as Nintendo has not released that information yet.

All the datapaths listed above are bidirectional (read and write). The 81 MHz A-memory has low bandwidth, but it is more than adequate for supporting all the sound channels that the Gamecube is capable of, as this quick calculation shows:

64 (channels) x 48,000 (sample rate) x 16-bits (data size) = roughly 6 MB/sec
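
The same calculation as a small Python sketch. It assumes uncompressed 16-bit samples on every channel, as the line above does; the variable names are just for illustration.

```python
channels = 64              # simultaneous channels from the specs
sample_rate_hz = 48_000    # sampling frequency from the specs
bits_per_sample = 16       # assumption: uncompressed 16-bit samples

bytes_per_second = channels * sample_rate_hz * bits_per_sample // 8
print(bytes_per_second)                           # 6144000
print(f"{bytes_per_second / 1e6:.1f} MB/sec")     # 6.1 MB/sec
```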

With an 81 MHz DSP and 16 MB of sound memory, the quality of sound in Gamecube games should be outstanding!

Someone from Nintendo has confirmed that the sound chip on "Flipper" has its own data pins to the 81 MHz DRAM, separate from the main memory bus. This means that sound accesses will not have any negative effect on graphics operations.

Note that the CPU also has access to the 81 MHz DRAM, so it can be used for other storage besides sound data. It is a good place to store information that does not need lots of bandwidth, such as selection screens.

Graphics Processing Unit (GPU) Datapath
 
flipper data path
The "Flipper" chip datapath diagram above is not official, and was created based on speculation.

The sound chip was not included in the above diagram, as focus will concentrate on the most bandwidth intensive aspects of the "Flipper" chip.

The frame buffer contains the draw frame buffer (640x480x24-bits = 921,600 bytes), and the z-buffer (640x480x24-bits = 921,600 bytes). The display buffer is stored in main RAM.
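
The buffer sizes above can be checked with a few lines of Python. This is only arithmetic on the figures in the text; the comparison with the embedded memory assumes 2 MB means 2 x 1024 x 1024 bytes.

```python
width, height = 640, 480
bytes_per_pixel = 24 // 8          # 24-bit color and 24-bit Z

color_bytes = width * height * bytes_per_pixel
z_bytes = width * height * bytes_per_pixel
total_bytes = color_bytes + z_bytes

embedded_bytes = 2 * 1024 * 1024   # assumption: "approx. 2 MB" in binary units
print(color_bytes, z_bytes, total_bytes)
print("fits in embedded buffer:", total_bytes <= embedded_bytes)
```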

The texturing rate was released on March 16, 2001 by cube.ign, who indicate that they have access to Nintendo's official GameCube Hardware Overview documentation, which states a pixel rate of 648 MPixels/sec. It has come to my attention from someone who has access to the Gamecube developer documentation that there is a single texel unit per pipeline.
 
4 Pipelines @ 162 MHz        Pixel Rate         Texel Rate
1 texel unit per pipeline    648 MPixels/sec    648 MTexels/sec
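
The 648 figure follows directly from the pipeline count and the clock. A minimal sketch, assuming each pipeline outputs one pixel per clock (that is how the quoted numbers work out, not something the documentation excerpt spells out):

```python
pipelines = 4
clock_mhz = 162
texel_units_per_pipeline = 1

pixel_rate_mpixels = pipelines * clock_mhz                       # one pixel per pipeline per clock
texel_rate_mtexels = pixel_rate_mpixels * texel_units_per_pipeline

print(pixel_rate_mpixels, "MPixels/sec")   # 648 MPixels/sec
print(texel_rate_mtexels, "MTexels/sec")   # 648 MTexels/sec
```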

Polygon Rate

Information below comes from this article at cube.ign, and they got the information from Nintendo's official GameCube Hardware Overview documentation.
 
Features                                         Performance
1 vertex color + 1 light + 1 texture             20M polygons/sec
no vertex color + 1 texture                      26.4M polygons/sec
1 vertex color + no texture (Gouraud shading)    32M polygons/sec

Note: above figures were changed to reflect revised specs. Flipper chip is now 162 MHz and not 202.5 MHz.

As you can see the Gamecube can push a lot of polygons per second, and it can do 26.4 million textured polygons per second maximum. Note that increasing the number of local lights in a scene would cause the polygonal rate to go down.
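
One way to read the peak rates above is as a budget of GPU clocks per polygon. The sketch below simply divides the 162 MHz Flipper clock by each quoted rate; it is a reading of the table, not a figure from Nintendo's documentation, and the function name is made up for illustration.

```python
FLIPPER_CLOCK_MHZ = 162.0  # revised Flipper clock from the specs

# Peak rates from the table above, in millions of polygons per second.
peak_rates = {
    "1 vertex color + 1 light + 1 texture": 20.0,
    "no vertex color + 1 texture": 26.4,
    "1 vertex color + no texture": 32.0,
}

def clocks_per_polygon(mpolys_per_sec, clock_mhz=FLIPPER_CLOCK_MHZ):
    """GPU clock cycles available per polygon at a given peak rate."""
    return clock_mhz / mpolys_per_sec

for features, rate in peak_rates.items():
    print(f"{features}: {clocks_per_polygon(rate):.1f} clocks per polygon")
```

Adding a light or a texture stage raises the clocks needed per polygon, which is why the rate drops as features are switched on.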

Flipper (GPU) Instruction Set

Info from a Beyond3D message board thread, posted by a Japanese individual. It lists the different instructions provided by Flipper's Transformation and Lighting (T&L) unit.

Here is some information about Flipper from Japanese magazine "Nikkei Electronics 2000/10/9"

Flipper
51M transistors total
Logic portion: 26 M transistors
Memory portion: 25 M transistors
Frame Buffer: 2.1 MB 128 banks
Texture Cache: 1 MB 512 banks

T&L pipeline information below. I'm not good at English, and I'm not a software engineer, so there may be some misunderstandings. Sorry. "*ct*" means "I could not translate it."

1. Geometry, Texture

view transform / multiply*3, add*3 / 32-bit / 1 cycle /
perspective transform / multiply*2, add*1 / 32-bit / 1 cycle /
clipping / add*1 / 32-bit / 1 cycle /
(*ct* ) / division*1 / 32-bit / 1 cycle /
(*ct* ) / multiply*1 / 32-bit / 1 cycle /
viewport scaling / multiply*1, add*1 / 32-bit / 1 cycle /
texture coordinate / multiply*3, add*3 / 32-bit / 1 cycle /
re-normalization of texture coordinate / multiply*2, add*1 / 32-bit / 1 cycle /

2. Lighting

transform normal vector / multiply*3, add*3 / 20-bit / 1 cycle /
calculate light vector / add*1 / 20-bit / 1 cycle /
(*ct* some lighting calculations) / multiply*3, add*3 / 20-bit / 1 cycle /

(*ct* . pointlight/spotlight/specular section .)
(lighting) / multiply*5, add*2 / 20-bit / 4 cycle /
(normalize) / division of square root*1 / 20-bit / 4 cycle /
(lighting) / multiply*5, add*2 / 20-bit / 4 cycle /
(normalize) / division of square root*1 / 20-bit / 4 cycle /
(lighting) / division*2 / 20-bit / 4 cycle /
(normalize) / multiply*2 / 20-bit / 4 cycle /
(lighting) / multiply*1 / 20-bit / 4 cycle /
(lighting) / multiply*1 / 20-bit / 4 cycle /
(lighting) / multiply*1 / 20-bit / 4 cycle /

(*ct* sum)
(lighting) / multiply*1 / 20-bit / 1 cycle /
(lighting) / add*x1 / 20-bit / 1 cycle /

(*ct* some bump mapping related calculations)
(normalize) / multiply*1 / 20-bit / 1 cycle /
(scaling) / multiply*1 / 20-bit / 4 cycle /
(add offset) / add*1 / 32-bit / 4 cycle /
(normalize) / division of square root*1 / 20-bit / 4 cycle /

(lighting) / float to int*1 / 32-bit / 1 cycle /

Flipper Total: 46.5 floating point operations per cycle, so 46.5 * 202.5 MHz = 9.4 Gflops

Gekko (CPU): FMAC * 2 = 1.6 Gflops

System Total: 9.4 + 1.6 = 11.0 Gflops
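
The totals above can be reproduced as follows. Note that the quoted post uses the original 202.5 MHz Flipper clock. The Gekko line is an assumption on my part: it treats "FMAC * 2" as two multiply-accumulate units, each counted as two floating point operations per cycle, at the original 400 MHz clock, since that is what gives 1.6 Gflops.

```python
def gflops(flops_per_cycle, clock_mhz):
    """Peak GFLOPS for a unit doing flops_per_cycle operations at clock_mhz."""
    return flops_per_cycle * clock_mhz / 1000.0

# Flipper: 46.5 operations per cycle at the original 202.5 MHz clock.
flipper = gflops(46.5, 202.5)

# Gekko (assumption): 2 FMAC units x 2 operations each, at the original 400 MHz.
gekko = gflops(2 * 2, 400.0)

total = flipper + gekko
print(f"Flipper {flipper:.1f} + Gekko {gekko:.1f} = {total:.1f} Gflops")
```

The same function can be called with the revised clocks (162 MHz and 485 MHz) to see how the clock changes shift the balance between the two chips.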

As you can see, the Gamecube's Flipper GPU has a very rich instruction set for transformations, texturing, lighting, and bump mapping, and is very powerful in the number of instructions it can do in parallel, while being easy to program.

Texture Compression

The Gamecube's GPU can use S3TC compressed textures, which provide a 6:1 compression ratio for 24-bit textures. For 16-bit textures the ratio is 4:1, and for 8-bit textures the ratio is 2:1.

Let us consider how much compressed texture data the Gamecube can hold in its 24 MB of main memory, given different memory size requirements for game code/geometry/etc.
 
Code/Geometry/etc.    Free Texture Space    24-bit Textures (compressed 6:1)
6 MB                  18 MB                 108 MB
8 MB                  16 MB                 96 MB
10 MB                 14 MB                 84 MB
12 MB                 12 MB                 72 MB
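
The table above is just the free memory multiplied by the 6:1 ratio. A small sketch that regenerates it (the function name is made up for illustration):

```python
MAIN_MEMORY_MB = 24
S3TC_RATIO_24BIT = 6   # 6:1 ratio for 24-bit textures, as quoted above

def equivalent_texture_mb(code_mb):
    """Uncompressed 24-bit texture data that fits once code_mb is set aside."""
    free_mb = MAIN_MEMORY_MB - code_mb
    return free_mb * S3TC_RATIO_24BIT

for code_mb in (6, 8, 10, 12):
    free_mb = MAIN_MEMORY_MB - code_mb
    print(f"{code_mb} MB code -> {free_mb} MB free -> {equivalent_texture_mb(code_mb)} MB of textures")
```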

As you can see, the Gamecube can store lots of textures in its memory using S3TC's texture compression format. Note that another benefit of using compressed textures is that the bandwidth requirements also decrease by the same ratio as the actual compression. At a ratio of 6:1, the memory bus can pass 6 times more textures. That means the GPU's texture cache bus of 10.4 GB/sec can pass the equivalent of 62.4 GB of 24-bit textures each second, and the external bus of 2.6 GB/sec can pass the equivalent of 15.6 GB each second!
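
The effective bandwidth figures are the peak bus bandwidth multiplied by the compression ratio. A quick sketch, assuming every byte on the bus is 6:1 compressed texture data (a best case that a real game would not hit):

```python
S3TC_RATIO_24BIT = 6

def effective_gb_per_s(bus_gb_per_s, ratio=S3TC_RATIO_24BIT):
    """Equivalent uncompressed texture throughput for a bus carrying compressed data."""
    return bus_gb_per_s * ratio

print(f"texture cache bus: {effective_gb_per_s(10.4):.1f} GB/sec")  # 62.4 GB/sec
print(f"main memory bus:   {effective_gb_per_s(2.6):.1f} GB/sec")   # 15.6 GB/sec
```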

Should a developer use 16-bit textures over 24-bit textures in order to save space? Let us compare:
 
Texture Size 512 x 512    16-bit    24-bit
Uncompressed              524 KB    786 KB
Compressed                131 KB    131 KB
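
The sizes in this comparison come from width x height x bytes per pixel, divided by the compression ratio quoted earlier (4:1 for 16-bit, 6:1 for 24-bit). A short sketch of that arithmetic:

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of an uncompressed texture."""
    return width * height * bits_per_pixel // 8

raw_16 = texture_bytes(512, 512, 16)
raw_24 = texture_bytes(512, 512, 24)

compressed_16 = raw_16 // 4   # 4:1 ratio for 16-bit textures
compressed_24 = raw_24 // 6   # 6:1 ratio for 24-bit textures

print(raw_16, raw_24)                  # 524288 786432
print(compressed_16, compressed_24)    # 131072 131072
```

Both compressed sizes come out identical, which is the point of the comparison.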

As you can see with the greater ratio of 6:1 for 24-bit textures, it makes more sense for the developer to use only 24-bit compressed textures as they are the same size as 16-bit compressed textures.

S3TC also allows texture compression of transparencies, which the Vector Quantization (VQ) texture compression on the Dreamcast could not do. This will allow the Gamecube to store lots of transparencies in its main memory.

Thanks to S3TC's texture compression, the Gamecube's 1 MB of texture cache can hold the equivalent of 6 MB of 24-bit textures. That's roughly 8 x (512 x 512) 24-bit textures, or 32 x (256 x 256) 24-bit textures for example.
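
The texture counts above can be checked like this. The sketch assumes the cache is exactly 1 MB in binary units (1024 x 1024 bytes) and that the whole cache is available for compressed texture data.

```python
CACHE_BYTES = 1024 * 1024     # assumption: 1 MB texture cache in binary units
S3TC_RATIO_24BIT = 6

def compressed_bytes(width, height):
    """Size of a 24-bit texture after 6:1 compression."""
    return width * height * 3 // S3TC_RATIO_24BIT

print(CACHE_BYTES // compressed_bytes(512, 512))   # 8
print(CACHE_BYTES // compressed_bytes(256, 256))   # 32
```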

Hidden Surface Removal

The Gamecube does perform hidden surface removal (HSR) through an early z-buffer check that discards hidden pixels as it renders from front to back. The front-to-back sorting has to be done by the game engine, or the check will not be effective, so results will vary from developer to developer. This HSR is not as effective as PowerVR's infinite planes, as the PowerVR method does not need the developer to render objects in any particular order. This information on Gamecube's HSR was provided by someone who has access to Gamecube's developer documentation.
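
To show why draw order matters for an early z check, here is a toy model in Python. It is not how Flipper is programmed; it is a one-dimensional "screen" with made-up spans, and it only counts how many pixels pass the depth test and would therefore have to be shaded.

```python
def shaded_pixels(spans, screen_width=16):
    """Count pixels that pass the depth test and therefore get shaded.

    Each span is (start, end, depth); a smaller depth is nearer the camera.
    """
    zbuffer = [float("inf")] * screen_width
    shaded = 0
    for start, end, depth in spans:
        for x in range(start, end):
            if depth < zbuffer[x]:   # early z test: hidden pixels are rejected here
                zbuffer[x] = depth
                shaded += 1
    return shaded

# Three overlapping objects (hypothetical scene).
scene = [(0, 16, 9.0), (2, 14, 5.0), (4, 12, 1.0)]

front_to_back = sorted(scene, key=lambda span: span[2])
back_to_front = sorted(scene, key=lambda span: span[2], reverse=True)

print(shaded_pixels(front_to_back))   # 16
print(shaded_pixels(back_to_front))   # 36
```

Drawn front to back, each screen pixel is shaded once; drawn back to front, every overlapping pixel is shaded again, so the z check saves nothing.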

Virtual Texturing

Virtual texturing is a hardware feature that manages textures by breaking them up into smaller blocks. This can save quite a lot of bandwidth and make more efficient use of the texture cache. Sky textures are a good example, since in most games only half of the sky can be seen in most scenes. Keeping the most used texture blocks in the texture cache allows main memory bandwidth to be used more efficiently. All of this is done automatically, and does not have to be coded by the developer.

Hardware Lighting

8 hardware lights are supported. Every polygon in a scene can be affected by as many as 8 lights. The number of polygons that the Gamecube can do with 8 lights depends on whether those lights are local or infinite, and you also have to consider what type of lighting is being used. The number of polygons with light sources could vary greatly depending on these variables.

Vertex Compression

Gamecube supports vertex compression: vertex data can be represented by bytes (8-bits) or shorts (16-bits) instead of floating point numbers (32-bits) if the particular game engine can get away with less accurate polygon positioning. The transformation unit automatically unpacks the integer data and converts it to floating point values before processing it.
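
Here is a rough illustration of the idea in Python. The fixed-point scale (FRAC_BITS), the big-endian packing, and the helper names are my own assumptions for the sketch, not details from the Gamecube documentation; the point is only that 16-bit integers take half the space of 32-bit floats and can be expanded back to floats before transformation.

```python
import struct

FRAC_BITS = 8   # hypothetical: number of fractional bits in the fixed-point format

def quantize(value, frac_bits=FRAC_BITS):
    """Convert a float to a fixed-point integer."""
    return round(value * (1 << frac_bits))

def dequantize(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

position = (1.25, -3.5, 0.125)

packed_floats = struct.pack(">3f", *position)                         # 32-bit floats
packed_shorts = struct.pack(">3h", *(quantize(v) for v in position))  # 16-bit shorts

unpacked = tuple(dequantize(q) for q in struct.unpack(">3h", packed_shorts))

print(len(packed_floats), len(packed_shorts))   # 12 6
print(unpacked)                                 # (1.25, -3.5, 0.125)
```

These sample values survive the round trip exactly because they are multiples of 1/256; values that are not would be rounded, which is the loss of accuracy the paragraph above refers to.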

Main memory

One of the most amazing aspects of the Gamecube is its 24 MBytes of 1T-SRAM. 1T-SRAM was invented by MoSys, Inc., and you can find specific information on this memory here at MoSys's website.

With Gamecube's main memory of 1T-SRAM and its sustainable latency of 10 ns or lower, it should be faster than any other affordable memory technology out there when it comes to repeated non-linear accesses. This memory will shine with the repeated random accesses that complex game AI may introduce, rather than with general texture accesses, since textures are stored in memory linearly. Texture access speed will not suffer though, since the main memory has a bandwidth of 2.6 GB/sec, and there is also that 1 MB of on-chip texture cache to help keep the most repeated textures near the rendering unit.

The Gamecube also has an extra 16 MB of 81 MHz DRAM, and this memory would be great for data that does not need the access speed of the 1T-SRAM main memory, such as sound and selection screens.

More info on the memory from here:

Here again the GameCube uses 1T-SRAM, this time as 24 Mbytes of external memory. Operating at a 324-MHz clock speed, the memory moves data at 2.6 Gbytes/s, with a sustained latency of 10 ns. But unlike the Playstation 2 or the forthcoming XBox, GameCube's memory subsystem does not rely on a Rambus or Double-Data Rate (DDR) interface to boost the bandwidth. Instead, Mosys developed a proprietary active termination I/O that resides near the pads and eliminates the need for placing a bank of resistors on the board, saving area and cost.

Texture Cache

Here is an interesting article from AsiaBizTech that provides some information on the number of simultaneous accesses that can occur with the texture cache:

Parallel Processing of 32 Access Transactions

The Flipper LSI has two units of the 1T-SRAM memory integrated, namely, 2.1MB for a frame-buffer and Z-buffer and 1MB for a texture cache. NEC Corp. manufactures the LSI. 

It was necessary to enhance random access performance of 1T-SRAM applied to a texture cache which will be frequently accessed, thus making it faster than that used for the frame-buffer and Z-buffer. To meet this need, the entire bank was divided into 512 pieces. Fu-Chien Hsu, chairman and CEO of MoSys said, "Of those component banks, 32 banks can be accessed simultaneously." On the other hand, the frame and Z buffer was designed to have 128 banks, since there was no strong need to offer high operation with this buffer. 

The main memory of the Gamecube consists of two sets of 96Mbits 1T-SRAM. As it can drive a 64-bit data-bus at 400MHz*, the machine transfers data at up to 3.2GB* per second, which is the same rate the PlayStation2 has achieved through the Direct Rambus interface consisting of two channels.

However, latency on random access to the main memory is slower than that of the 1T-SRAM being embedded in the Flipper LSI, which results from a configuration of the main memory being externally attached. Nonetheless, the memory access is completed in less than 10ns, sufficiently faster than multi-purpose DRAM.

*Note: Data bus transfer is now 2.6 GB/sec due to revised specs, and the external 64-bit data-bus runs at 324 MHz now.
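
The bandwidth figures before and after the revision follow from bus width times clock. A minimal sketch (the function name is made up, and one transfer per clock is assumed):

```python
def bus_bandwidth_gb_per_s(bus_bits, clock_mhz):
    """Peak bandwidth of a bus that moves bus_bits once per clock."""
    return bus_bits / 8 * clock_mhz / 1000.0

print(f"{bus_bandwidth_gb_per_s(64, 400):.2f} GB/sec")   # original main memory spec
print(f"{bus_bandwidth_gb_per_s(64, 324):.2f} GB/sec")   # revised main memory spec
print(f"{bus_bandwidth_gb_per_s(64, 162):.2f} GB/sec")   # Gekko external bus from the specs
```

These come out at 3.20, 2.59 and 1.30 GB/sec, matching the 3.2, 2.6 and 1.3 GB/sec figures quoted on this page.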

More info on the internal bandwidth of the caches from here:

Both internal memory buffers have a sustained latency of under 5 nanoseconds. The frame and z-buffer memory is capable of 7.68 Gbytes/second of bandwidth. The texture buffer boasts an even faster bandwidth of 10.4 Gbytes/s because it's divided into 32 independent macros, each 16 bits wide for a total I/O of 512 bits. This gives each macro its own address bus, so that all 32 macros can be accessed simultaneously, said Mark-Eric Jones, vice president of marketing for Mosys. 
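
The 10.4 Gbytes/s texture buffer figure can be reconstructed from the macro layout described above. The article does not state the clock, so the sketch assumes the macros run at Flipper's revised 162 MHz and transfer once per clock.

```python
macros = 32
bits_per_macro = 16
clock_mhz = 162            # assumption: macros run at the Flipper core clock

total_io_bits = macros * bits_per_macro
bandwidth_gb_per_s = total_io_bits / 8 * clock_mhz / 1000.0

print(total_io_bits)                         # 512
print(f"{bandwidth_gb_per_s:.1f} GB/sec")    # 10.4 GB/sec
```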

Drive

From the specs:
Disc Drive: CAV (Constant Angular Velocity) System
Average Access Time: 128ms
Data Transfer Speed: 16Mbps to 25Mbps
Media: 8cm NINTENDO GAMECUBE Disc based on Matsushita's Optical Disc Technology
Capacity: Approx. 1.5GB

Speed is roughly comparable to a 13x to 20x CD-ROM drive. Developers would store information on the outer tracks first, so 20x speed would be more common. Not all that fast, but not all that slow.
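
The CD-ROM comparison works out as follows, taking 1x CD-ROM as 150 KB/sec (153,600 bytes/sec) and assuming Mbps in the specs means millions of bits per second.

```python
CD_1X_BYTES_PER_S = 150 * 1024             # 1x CD-ROM data rate
CD_1X_BITS_PER_S = CD_1X_BYTES_PER_S * 8   # 1,228,800 bits/sec

def cd_speed_equivalent(mbps):
    """Express a transfer rate in Mbps as a multiple of 1x CD-ROM speed."""
    return mbps * 1_000_000 / CD_1X_BITS_PER_S

print(f"{cd_speed_equivalent(16):.1f}x")   # 13.0x (inner tracks)
print(f"{cd_speed_equivalent(25):.1f}x")   # 20.3x (outer tracks)
```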

Links

ArtX's site
cube.ign.com Gamecube FAQ
cube.ign.com Gamecube sound article
cube.ign.com Gamecube tech article
cube.ign.com Gamecube Flipper article
Article on S3TC Texture Compression
MoSys's 1T-SRAM information page