http://chineseinput.net/에서 pinyin(병음)방식으로 중국어를 변환할 수 있습니다.
변환된 중국어를 복사하여 사용하시면 됩니다.
Workload Characteristics-based L1 Data Cache Switching-off Mechanism for GPUs
Thuan Cong Do(두콩튜안),Gwang Bok Kim(김광복),Cheol Hong Kim(김철홍) 한국컴퓨터정보학회 2018 韓國컴퓨터情報學會論文誌 Vol.23 No.10
Modern graphics processing units (GPUs) have become one of the most attractive platforms in exploiting high thread level parallelism with the support of new programming tools such as CUDA and OpenCL. Recent GPUs has applied cache hierarchy to support irregular memory access patterns; however, L1 data cache (L1D) exhibits poor efficiency in the GPU. This paper shows that the L1D does not always positively affect the applications in terms of performance and energy efficiency for the GPU. The performance of the GPU is even harmed by using the L1D for lots of applications. Our proposed technique exploits the characteristics of the currently-executed applications to predict the performance impact of the L1D on the GPU and then decides whether to continuously use the cache for the application or not. Our experimental results show that the proposed technique improves the GPU performance by 9.4% and saves up to 52.1% of the power consumption in the L1D.
A New Cache Replacement Policy for Improving Last Level Cache Performance
Cong Thuan Do(두콩튜안),Dong Oh Son(손동오),Jong Myon Kim(김종면),Cheol Hong Kim(김철홍) Korean Institute of Information Scientists and Eng 2014 정보과학회논문지 Vol.41 No.11
Cache replacement algorithms have been developed in order to reduce miss counts. In modern processors, the performance gap between the processor and main memory has been increasing, creating a more important role for cache replacement policies. The Least Recently Used (LRU) policy is one of the most common policies used in modern processors. However, recent research has shown that the performance gap between the LRU and the theoretical optimal replacement algorithm (OPT) is large. Although LRU replacement has been proven to be adequate over and over again, the OPT/LRU performance gap is continuously widening as the cache associativity becomes large. In this study, we observed that there is a potential chance to improve cache performance based on existing LRU mechanisms. We propose a method that enhances the performance of the LRU replacement algorithm based on the access proportion among the lines in a cache set during a period of two successive replacement actions that make the final replacement action. Our experimental results reveals that the proposed method reduced the average miss rate of the baseline 512KB L2 cache by 15 percent when compared to conventional LRU. In addition, the performance of the processor that applied our proposed cache replacement policy improved by 4.7 percent over LRU, on average.