High-performance rendering processors have recently been adopted in most PCs. Even in post-PC devices such as PDAs, DTVs, and mobile phones, 3D graphics rendering processors are embedded in the SoC. Three-dimensional graphics provide more realistic images through techniques such as lighting, shading, fogging, and sharpness and correctness adjustments. These techniques require a graphics accelerator to be processed at a reasonable speed.
However, memory performance is increasingly the factor that limits overall performance, because improvements in memory performance have not kept pace with those in rendering computation. Continual improvements in semiconductor technology have made computation relatively cheap, but memory access remains costly. Caches have been studied as a way to overcome these problems. To reduce memory latency and bandwidth requirements, most rendering processors include a pixel cache that stores z-data and color data and a texture cache that stores texture data.
This dissertation focuses on the design of a cache memory system for a rendering processor. It covers an effective pixel cache architecture and the implementation of a cache controller. The pixel cache architecture is proposed to improve the performance of rendering processors. Z-data are selectively stored in either the main cache or an auxiliary buffer based on the result of a z-test, while color data are stored in the auxiliary buffer. Simulation results show that the proposed 16-Kbyte cache architecture provides better performance than a conventional 32-Kbyte cache architecture.
A pixel rasterization architecture that performs two depth tests is also proposed. Because the first depth test is performed before texture mapping, memory bandwidth is not wasted on fetching texture data for obscured pixels. The proposed architecture also reduces the miss penalty of the pixel cache by using a prefetch scheme: frame memory accesses caused by cache misses in the first depth test are performed concurrently with texture mapping. The proposed pixel rasterization architecture utilizes memory bandwidth efficiently, reduces power consumption, and achieves a high level of performance.
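The benefit of testing depth before texture mapping can be illustrated with a small software model. This is only a minimal sketch under stated assumptions, not the hardware architecture proposed in the dissertation: the function `rasterize`, the fragment tuple layout `(x, y, z, uv)`, the `texture_lookup` callback, and the "smaller z is closer" convention are all hypothetical, and neither the pixel cache nor the prefetch scheme is modeled.

```python
def rasterize(fragments, width, height, texture_lookup):
    """Process fragments with a depth test before and after texture mapping.

    Hypothetical model: each fragment is (x, y, z, uv), and a smaller z
    means the fragment is closer to the viewer.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    texture_fetches = 0

    for x, y, z, uv in fragments:
        # First depth test: discard obscured fragments before any
        # texture data is fetched, so no bandwidth is spent on them.
        if z >= depth[y][x]:
            continue

        texel = texture_lookup(uv)
        texture_fetches += 1

        # Second depth test: in a pipelined design the depth buffer may
        # have changed while texturing was in flight. In this sequential
        # model the test always passes; it is kept to show the structure.
        if z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = texel

    return color, depth, texture_fetches


if __name__ == "__main__":
    # Three fragments cover the same pixel; the second one is obscured.
    frags = [(0, 0, 0.5, "a"), (0, 0, 0.9, "b"), (0, 0, 0.2, "c")]
    _, _, fetches = rasterize(frags, 1, 1, lambda uv: uv.upper())
    print("texture fetches:", fetches)
```

In this model an obscured fragment costs one depth comparison and no texture fetch, which is the bandwidth saving the paragraph above describes.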
The implementation of the cache controller for a CalmRISC32™ embedded processor is also described. Because a dual cache structure offers design flexibility, a cooperative cache is used to improve performance and reduce power consumption. This cooperative cache system serves as both the data cache and the instruction cache.