GameCube Technical Overview
Yes, it looks like a cute little toy, but a very powerful console lies underneath.
Nintendo released the GameCube on Sept 14th, 2001 (Japan) and Nov 18th, 2001 (North America). That's nearly three years after the Dreamcast's Japanese release in November 1998.
The GameCube was released in North America at a price of $199 US.
The GameCube was first unveiled by Nintendo at its Spaceworld show on August 24th, 2000 in Tokyo, Japan. Nintendo announced new specs on its website on May 15th, 2001, during the 2001 E3 show in LA: the "Gekko" CPU was upgraded from 400 MHz to 485 MHz, and the "Flipper" GPU was downgraded from 202.5 MHz to 162 MHz.
CPU - "Gekko"
GPU - "Flipper" (system LSI)
(The following sound related functions are all incorporated into the System LSI)
Sound Processor: Custom Macronix
System Floating-point Arithmetic Capability: 10.5 GFLOPS (Peak) (MPU, Geometry Engine, Hardware Lighting)
Disc Drive: CAV (Constant Angular Velocity) System
Input/Output: 4 Controller Ports, 2 Memory Card Slots, Analog AV Output, Digital AV Output, 2 High-Speed Serial Ports, High-speed Parallel Port
*The peak figures listed are all for maximum instantaneous performance and cannot be achieved in an actual game. However, following the conventions of the game industry, they are listed for your reference.
First off, let's point out that Nintendo is being very conservative in its rating of 6 to 12 million polygons/second, as the developer Factor 5, which is developing Star Wars: Rogue Squadron, is already doing 12 million polygons/second, and the developer claims it is only using 50 percent of the Gamecube's power. Factor 5 indicated it could get 20 million polygons/second with all effects. "Effects" here means texture layers, not polygonal lighting.
Here is information on Star Wars: Rogue Squadron and the Gamecube as provided by Julian Eggebrecht, President of Factor 5. It was originally presented on the forum of the German video game site Maniac Online.
A side note on Factor 5 is that they will be providing the sound tools for Gamecube development. The software is called MusyX and you can find details on it here.
A beautiful piece of engineering in terms of size, number of large components (only 5 large semiconductors: the CPU, the GPU and three memory chips), low manufacturing cost, and awesome performance. The entire board is roughly the size of a compact disc jewel case!
PLL: Phase Lock Loop
CPU "Gekko" Details
Article released some details on the Gekko CPU here:
Unlike Sony's Playstation 2 Emotion Engine, the Gekko MPU was not built from the ground up. It's a derivative of the PowerPC 750 RISC processor and includes some 50 new instructions. Based on 0.18-micron copper wire process technology, the device runs at 485 MHz and has an external bus to the Flipper device with a peak of 1.3 Gbytes/s. The chip has a performance rating of 1125 DMips (Dhrystone 2.1).
Chip Sizes
Nintendo has released the chip size for "Gekko" and "Flipper" as indicated in this article entitled "Designers bring practical touch to GameCube" at EE Times on Sept. 7th, 2000.
Although MoSys was perhaps the most strategic partner, the GameCube project involved several alliances. IBM Corp. provided the so-called Gekko CPU, a custom version of the 400-MHz* PowerPC with 256 kbytes of secondary cache, all made with a 0.18-micron copper CMOS process. Despite the large secondary cache, IBM was able to build the chip on a 43-mm² die, said Takeda.
*Note: CPU is now 485 MHz due to revised specs.
Small chip size is important because it determines the number of chips produced on a silicon wafer. The more chips per wafer, the cheaper the cost of production. Smaller circuit sizes can have a huge effect on the size of the chip, as the IBM PowerPC CPU (0.18 micron) with 256 KB of secondary cache is the same size as the Hitachi SH-4 CPU (0.25 micron) used in the Dreamcast!
The "Flipper" GPU contains 51 million transistors, half of which are used by the on-chip memory.
The Gamecube has a massive heat sink that covers all the chips, and it does have a fan on one of the air vents on its side. It has been reported that the fan and the drive are very quiet.
All the datapaths listed above are bidirectional (read and write). The 81 MHz A-memory has low bandwidth, but it is more than adequate for supporting all the sound channels that the Gamecube is capable of, as this quick calculation shows:
64 (channels) x 48,000 (sample rate) x 16-bits (data size) = roughly 6 MB/sec
With an 81 MHz DSP and 16 MB of sound memory, the quality of sound in Gamecube games should be outstanding!
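The quick calculation above can be checked with a few lines of Python. This is only a sketch: it assumes uncompressed samples with every channel playing at once, and the function name is made up for illustration.

```python
# Figures quoted in the text: 64 channels, 48 kHz sample rate, 16-bit samples.
CHANNELS = 64
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16


def sound_bandwidth_bytes_per_sec(channels, sample_rate_hz, bits_per_sample):
    """Raw sample traffic in bytes per second if every channel plays at once."""
    return channels * sample_rate_hz * bits_per_sample // 8


total = sound_bandwidth_bytes_per_sec(CHANNELS, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(f"{total:,} bytes/sec = {total / 1_000_000:.2f} MB/sec")
```

That works out to 6,144,000 bytes per second, which matches the "roughly 6 MB/sec" figure.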
Someone from Nintendo has confirmed that the sound chip on "Flipper" has its own data pins to the 81 MHz DRAM that are separate from the main memory bus. This means that sound accesses will not have any negative effect on graphic operations.
Note that the CPU also has access to the 81 MHz DRAM, so it can be used for other storage besides sound data. It is a good place to store information that does not need lots of bandwidth, like selection screens.
Graphics Processing Unit (GPU) Datapath
The sound chip was not included in the above diagram, as the focus here is on the most bandwidth-intensive aspects of the "Flipper" chip.
The frame buffer contains the draw frame buffer (640x480x24 bits = 921,600 bytes) and the z-buffer (640x480x24 bits = 921,600 bytes). The display buffer is stored in main RAM.
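The buffer sizes quoted above are easy to verify. Here is a minimal Python sketch of the arithmetic, with a helper name chosen just for this example:

```python
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 24  # used for both the draw buffer and the z-buffer in the text


def buffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of one full-screen buffer."""
    return width * height * bits_per_pixel // 8


draw_buffer = buffer_bytes(WIDTH, HEIGHT, BITS_PER_PIXEL)
z_buffer = buffer_bytes(WIDTH, HEIGHT, BITS_PER_PIXEL)
print(f"draw buffer: {draw_buffer:,} bytes")
print(f"z-buffer:    {z_buffer:,} bytes")
print(f"together:    {draw_buffer + z_buffer:,} bytes")
```

Together the two buffers come to 1,843,200 bytes.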
The texturing rate was released on March 16, 2001 by cube.ign, who indicate that they have access to Nintendo's official GameCube Hardware Overview documentation, which states a pixel rate of 648 MPixels/sec. It has come to my attention from someone who has access to the Gamecube developer documentation that there is a single texel unit per pipeline.
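The 648 MPixels/sec figure lines up with the revised 162 MHz Flipper clock if the chip has four pixel pipelines. The pipeline count is not stated in the text above, so treat it as an assumption in this small Python sketch:

```python
FLIPPER_CLOCK_MHZ = 162        # revised clock speed quoted in the text
PIXEL_PIPELINES = 4            # assumption: not stated in the text above
TEXEL_UNITS_PER_PIPELINE = 1   # "a single texel unit per pipeline"


def pixel_rate_mpixels(clock_mhz, pipelines):
    """Peak pixels per second (in millions), one pixel per pipeline per clock."""
    return clock_mhz * pipelines


def texel_rate_mtexels(clock_mhz, pipelines, texel_units):
    """Peak texels per second (in millions)."""
    return clock_mhz * pipelines * texel_units


print(pixel_rate_mpixels(FLIPPER_CLOCK_MHZ, PIXEL_PIPELINES), "MPixels/sec")
print(texel_rate_mtexels(FLIPPER_CLOCK_MHZ, PIXEL_PIPELINES,
                         TEXEL_UNITS_PER_PIPELINE), "MTexels/sec")
```

With one texel unit per pipeline, the peak texel rate equals the peak pixel rate.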
The information below comes from cube.ign, which got it from Nintendo's official GameCube Hardware Overview documentation.
Note: the above figures were changed to reflect the revised specs. The Flipper chip is now 162 MHz and not 202.5 MHz.
As you can see the Gamecube can push a lot of polygons per second, and it can do 26.4 million textured polygons per second maximum. Note that increasing the number of local lights in a scene would cause the polygonal rate to go down.
Flipper (GPU) Instruction Set
Info from a Beyond3D message board thread by a Japanese individual. It lists the different instructions available in Flipper's Transformation and Lighting (T&L) unit.
Here is some information about Flipper from the Japanese magazine "Nikkei Electronics 2000/10/9"
As you can see, the Gamecube's Flipper GPU has a very rich instruction set for transformations, texturing, lighting, and bump mapping, and is very powerful in the number of instructions it can do in parallel, while being easy to program.
The Gamecube's GPU can use S3TC compressed textures, which provide a 6:1 compression ratio for 24-bit textures. For 16-bit textures the ratio is 4:1, and for 8-bit textures the ratio is 2:1.
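These ratios all follow from a single fixed compressed size. The Python sketch below assumes the base S3TC format, which stores each 4x4 block of texels in 64 bits, or 4 bits per texel. That per-texel figure and the helper names are not taken from the text above.

```python
S3TC_BITS_PER_TEXEL = 4  # assumption: base S3TC format, 64 bits per 4x4 block


def compression_ratio(source_bits_per_texel):
    """Ratio N in 'N:1' for a texture of the given source colour depth."""
    return source_bits_per_texel // S3TC_BITS_PER_TEXEL


def compressed_bytes(width, height):
    """Compressed size of a texture; it does not depend on the source depth."""
    return width * height * S3TC_BITS_PER_TEXEL // 8


for bits in (24, 16, 8):
    print(f"{bits}-bit source -> {compression_ratio(bits)}:1")

print(f"256x256 texture compressed: {compressed_bytes(256, 256):,} bytes")
```

Because the compressed size is the same whatever the source depth, a 24-bit texture and a 16-bit texture of the same dimensions take up the same space once compressed.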
Let us consider how many compressed textures the Gamecube can hold in its 24 MB of main memory if we consider different memory size requirements for game code/geometry/etc.
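As a rough illustration of that trade-off, here is a small Python sketch. The reserved sizes of 8, 12 and 16 MB are hypothetical examples, not figures from Nintendo or any developer, and the function name is made up for this example.

```python
MAIN_MEMORY_MB = 24
S3TC_RATIO_24BIT = 6  # 6:1 ratio for 24-bit textures, per the text


def texture_equivalent_mb(reserved_mb):
    """MB of uncompressed 24-bit textures that fit, in compressed form, in
    the main memory left after reserving space for code, geometry, etc."""
    return (MAIN_MEMORY_MB - reserved_mb) * S3TC_RATIO_24BIT


# Hypothetical splits; real games will differ.
for reserved in (8, 12, 16):
    free = MAIN_MEMORY_MB - reserved
    print(f"{reserved} MB for code/geometry -> {free} MB of compressed textures "
          f"= {texture_equivalent_mb(reserved)} MB of 24-bit textures")
```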
As you can see, the Gamecube can store lots of textures in its memory using S3TC's texture compression format. Note that another benefit of using compressed textures is that the bandwidth requirements also decrease by the same ratio as the compression. At a ratio of 6:1, the memory bus can pass 6 times as many textures. That means the GPU's texture cache bus of 10.4 GB/sec can pass the equivalent of 62.4 GB of 24-bit textures each second in compressed form, and the external bus of 2.6 GB/sec can pass the equivalent of 15.6 GB each second!
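The effective bandwidth figures are just the bus rates multiplied by the compression ratio. A minimal Python sketch, with an illustrative function name:

```python
S3TC_RATIO_24BIT = 6  # 6:1 ratio for 24-bit textures, per the text


def effective_gb_per_sec(bus_gb_per_sec, ratio=S3TC_RATIO_24BIT):
    """Uncompressed-equivalent texture data moved per second over a bus."""
    return bus_gb_per_sec * ratio


texture_cache_bus = effective_gb_per_sec(10.4)
external_bus = effective_gb_per_sec(2.6)
print(f"texture cache bus: {texture_cache_bus:.1f} GB/sec equivalent")
print(f"external bus:      {external_bus:.1f} GB/sec equivalent")
```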
Should a developer use 16-bit textures over 24-bit textures in order to save space? Let us compare.
As you can see, with the greater ratio of 6:1 for 24-bit textures, it makes more sense for the developer to use only 24-bit compressed textures, as they are the same size as 16-bit compressed textures.
S3TC also allows texture compression of transparencies, which the Vector Quantization (VQ) texture compression on the Dreamcast could not do. This will allow the Gamecube to store lots of transparencies in its main memory.
Thanks to S3TC's texture compression, the Gamecube's 1 MB of texture cache can hold the equivalent of 6 MB of 24-bit textures. That's roughly 8 x (512 x 512) 24-bit textures, or 32 x (256 x 256) 24-bit textures for example.
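The texture counts above can be checked with a short Python sketch. The function name is only for illustration:

```python
CACHE_BYTES = 1 * 1024 * 1024  # 1 MB of texture cache
S3TC_RATIO_24BIT = 6           # 6:1 ratio for 24-bit textures, per the text


def textures_that_fit(width, height, bytes_per_texel=3):
    """How many 24-bit textures of this size fit in the cache once compressed."""
    equivalent_bytes = CACHE_BYTES * S3TC_RATIO_24BIT
    return equivalent_bytes // (width * height * bytes_per_texel)


print(textures_that_fit(512, 512), "textures of 512 x 512")
print(textures_that_fit(256, 256), "textures of 256 x 256")
```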
Hidden Surface Removal
The Gamecube does hidden surface removal (HSR) with an early z-buffer check that discards hidden pixels as it renders from front to back. The front-to-back sorting has to be done by the game engine, as written by the game developer, or the check will not be effective. Of course, the results will then vary from developer to developer. This HSR is not as effective as PowerVR's infinite planes, as the PowerVR method does not need the developer to render objects in any particular order. This information on the Gamecube's HSR was provided by someone who has access to the Gamecube's developer documentation.
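To see why draw order matters for an early z check, here is a small Python sketch. It is a toy software rasterizer, not how Flipper works internally: the 8x8 screen, the three quads and the render function are all made up for illustration. It counts how many pixels pass the depth test, and so would go on to be textured and shaded, for each draw order.

```python
def render(quads, width=8, height=8):
    """Draw axis-aligned quads in the given order with an early z test.

    Each quad is (x0, y0, x1, y1, depth); a smaller depth is nearer.
    Returns how many pixels passed the depth test and were shaded.
    """
    zbuf = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for x0, y0, x1, y1, depth in quads:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if depth < zbuf[y][x]:  # early z check: skip hidden pixels
                    zbuf[y][x] = depth
                    shaded += 1
    return shaded


scene = [
    (0, 0, 8, 8, 3.0),  # far background that fills the screen
    (1, 1, 7, 7, 2.0),  # object in the middle distance
    (2, 2, 6, 6, 1.0),  # nearest object
]

front_to_back = sorted(scene, key=lambda quad: quad[4])
back_to_front = sorted(scene, key=lambda quad: quad[4], reverse=True)

print("front to back:", render(front_to_back), "pixels shaded")
print("back to front:", render(back_to_front), "pixels shaded")
```

With this scene, front-to-back order shades 64 pixels while back-to-front order shades 116, so the sorting done by the game engine nearly halves the pixel work.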
Virtual texturing is a hardware feature that manages textures by breaking them up into smaller blocks. This can save quite a bit of bandwidth and make more efficient use of the texture cache for textures such as sky textures, where in most games only half of the sky can be seen in most scenes. Keeping the most used texture blocks in the texture cache allows main memory bandwidth to be used more efficiently. All of this is done automatically and does not have to be coded by the developer.
8 hardware lights supported. Every polygon in a scene can be affected by as many as 8 lights. The number of polygons that the Gamecube can do with 8 lights depends on whether those lights are local or infinite, and you also have to consider what type of lighting is being used. The number of polygons with light sources could vary greatly depending on these variables.
The Gamecube supports vertex compression, as it allows vertex data to be represented by bytes (8 bits) or shorts (16 bits) instead of floating point numbers (32 bits) if the particular game engine can get away with using less accurate polygon positioning. The transformation unit automatically unpacks the integer data and converts it to floating point values before processing it.
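Here is a rough Python sketch of the idea, packing vertex positions into 16-bit shorts and converting them back to floats. The fixed-point layout with 8 fractional bits, the big-endian byte order and the function names are assumptions made for this example. The text above only says that bytes or shorts are converted to floating point.

```python
import struct

FRAC_BITS = 8  # hypothetical: fractional bits in the fixed-point format


def pack_vertex(xyz, frac_bits=FRAC_BITS):
    """Quantize three floats to signed 16-bit fixed point (6 bytes total)."""
    scale = 1 << frac_bits
    ints = [max(-32768, min(32767, round(v * scale))) for v in xyz]
    return struct.pack(">3h", *ints)


def unpack_vertex(data, frac_bits=FRAC_BITS):
    """Convert the packed shorts back to floats, as the hardware would."""
    scale = 1 << frac_bits
    return tuple(i / scale for i in struct.unpack(">3h", data))


vertex = (1.5, -20.25, 100.125)
packed = pack_vertex(vertex)
print(len(packed), "bytes packed vs", len(struct.pack(">3f", *vertex)), "as floats")
print(unpack_vertex(packed))
```

The packed vertex takes 6 bytes instead of 12. The price is precision: with 8 fractional bits, positions are stored in steps of 1/256 and must stay within roughly -128 to +128.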
One of the most amazing aspects of the Gamecube is its 24 MBytes of 1T-SRAM. 1T-SRAM was invented by MoSys, Inc., and you can find specific information on this memory here at MoSys's website.
With the Gamecube's main memory of 1T-SRAM and its sustainable latency of 10 ns or lower, it should be faster than any other affordable memory technology out there when it comes to repeated non-linear accesses. This memory will shine with the repeated random accesses that complex game AI may introduce, rather than with general texture accesses, since textures are stored in memory linearly. Texture access speed will not suffer though, since the main memory has a bandwidth of 2.6 GB/sec, and there is also that 1 MB of on-chip texture cache to help keep the most frequently used textures near the rendering unit.
The Gamecube also has an extra 16 MB of 81 MHz DRAM, and this memory would be great for data that does not need the access speed of the 1T-SRAM main memory, such as sound and selection screens.
More info on the memory from here:
Here again the GameCube uses 1T-SRAM, this time as 24 Mbytes of external memory. Operating at a 324-MHz clock speed, the memory moves data at 2.6 Gbytes/s, with a sustained latency of 10 ns. But unlike the Playstation 2 or the forthcoming XBox, GameCube's memory subsystem does not rely on a Rambus or Double-Data Rate (DDR) interface to boost the bandwidth. Instead, Mosys developed a proprietary active termination I/O that resides near the pads and eliminates the need for placing a bank of resistors on the board, saving area and cost.
Texture Cache
Parallel Processing of 32 Access Transactions
*Note: Data bus transfer is now 2.6 GB/sec due to revised specs, and the external 64-bit data-bus runs at 324 MHz now.
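The 2.6 GB/sec figure follows directly from the bus width and clock speed. This Python sketch assumes one transfer per clock, which fits the statement that the memory does not use a DDR interface. The function name is only illustrative.

```python
BUS_WIDTH_BITS = 64          # external data bus width, per the note above
BUS_CLOCK_HZ = 324_000_000   # 324 MHz, per the revised specs


def bus_bandwidth_bytes_per_sec(width_bits, clock_hz):
    """Peak bandwidth with one transfer of width_bits per clock cycle."""
    return width_bits // 8 * clock_hz


peak = bus_bandwidth_bytes_per_sec(BUS_WIDTH_BITS, BUS_CLOCK_HZ)
print(f"{peak:,} bytes/sec = {peak / 1_000_000_000:.2f} GB/sec")
```

That gives 2,592,000,000 bytes per second, which rounds to the quoted 2.6 GB/sec.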
More info on the internal bandwidth of the caches from here:
Both internal memory buffers have a sustained latency of under 5 nanoseconds. The frame and z-buffer memory is capable of 7.68 Gbytes/second of bandwidth. The texture buffer boasts an even faster bandwidth of 10.4 Gbytes/s because it's divided into 32 independent macros, each 16 bits wide for a total I/O of 512 bits. This gives each macro its own address bus, so that all 32 macros can be accessed simultaneously, said Mark-Eric Jones, vice president of marketing for Mosys.
Drive
From the specs:
Speed roughly comparable to a 13x to 20x CD-ROM. Developers would store information on the outer tracks first, so 20x speed would be more common. Not all that fast, but not all that slow.
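To put those CD-ROM multiples into data rates, here is a quick Python sketch. It assumes the usual 1x CD-ROM rate of 150 KB/sec, which is not stated above, and the function name is made up for this example.

```python
CD_1X_KB_PER_SEC = 150  # assumption: standard 1x CD-ROM data rate


def cd_multiple_to_kb_per_sec(multiple):
    """Data rate in KB/sec for a drive rated at the given CD-ROM multiple."""
    return multiple * CD_1X_KB_PER_SEC


inner = cd_multiple_to_kb_per_sec(13)
outer = cd_multiple_to_kb_per_sec(20)
print(f"inner tracks (13x): {inner:,} KB/sec")
print(f"outer tracks (20x): {outer:,} KB/sec")
```

The spread comes from the CAV design: the disc spins at a constant rate, so the outer tracks pass under the pickup faster and deliver more data per second than the inner ones.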