Profiling-Based Phase Prediction and Adaptive Warp Scheduler for GPUs
박종현 (Jong Hyun Park), 윤명국 (Myung Kuk Yoon), 김민수 (Minsu Kim), 노원우 (Won Woo Ro). The Institute of Electronics and Information Engineers (대한전자공학회), 2015 Conference, Vol. 2015, No. 11
In the many-core era, Graphics Processing Units (GPUs) have become important for processing large volumes of data, since General-Purpose computation on GPUs (GPGPU) achieves high performance. To improve GPGPU performance, many studies have proposed warp scheduling policies. However, different warp scheduling policies perform differently on different kernels, since each kernel shows different characteristics. To address this problem, this paper proposes an adaptive warp scheduling policy based on profiling information. Our experimental results show that the adaptive warp scheduling policy achieves an average performance improvement of 8.7% over the baseline GPU architecture.
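As a rough illustration of profiling-based policy selection, the sketch below picks, for each kernel, the warp scheduling policy with the highest profiled throughput. This is only a minimal sketch under stated assumptions, not the method of the paper: the abstract does not say which metric or policies are used, so the use of IPC as the metric, the policy names (`LRR`, `GTO`), the kernel names, and all numbers are hypothetical.

```python
def select_policy(profiled_ipc):
    """Return the policy with the highest profiled IPC for one kernel.

    profiled_ipc maps a policy name to the instructions-per-cycle value
    measured during a short profiling run (an assumed metric).
    """
    if not profiled_ipc:
        raise ValueError("no profiling data available")
    return max(profiled_ipc, key=profiled_ipc.get)


def select_policies(profiles):
    """Choose one policy per kernel from per-kernel profiling data."""
    return {kernel: select_policy(ipc) for kernel, ipc in profiles.items()}


# Hypothetical profiling results: kernel -> {policy -> IPC}.
profiles = {
    "kernel_a": {"LRR": 1.10, "GTO": 1.35},
    "kernel_b": {"LRR": 0.92, "GTO": 0.80},
}
chosen = select_policies(profiles)
```

Here `chosen` assigns a different policy to each kernel, which reflects the observation in the abstract that no single policy is best for every kernel.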
신현준 (Hyun-jun Shin), 윤명국 (Myung Kuk Yoon), 노원우 (Won Woo Ro). The Institute of Electronics and Information Engineers (대한전자공학회), 2017 Conference, Vol. 2017, No. 6
Neural network applications are both memory intensive and computation intensive. Network models that achieve over 90% accuracy on the ImageNet dataset have at least 27 MB of parameters and require at least 1.6 GOPS to classify an image. However, mobile platforms have limited hardware resources for processing images in real time. To solve this problem, hardware architectural support using parallel computing is required. Based on our simulation results, the convolutional layers dominate the computational resources in neural network applications. A convolutional layer can be replaced with a matrix multiplication by lowering. In this paper, we estimate the parallelism of each network model. This estimation can be used to determine the number of SIMD lanes for a neural network accelerator.
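The lowering step mentioned above can be sketched as follows for a single-channel image and a single filter. This is a minimal illustration, assuming stride 1, no padding, and the cross-correlation form commonly used in CNNs; the function names `im2col` and `conv2d_by_lowering` are illustrative and do not come from the paper.

```python
def im2col(image, kh, kw):
    """Lower a 2-D image into a matrix with one flattened kh x kw patch per row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_by_lowering(image, kernel):
    """Compute a stride-1, unpadded convolution as a matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    # Each output element is an independent dot product of one patch row
    # with the flattened kernel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    out_w = len(image[0]) - kw + 1
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_by_lowering(image, kernel)
```

Every row of the lowered matrix yields one output element without depending on any other row, so the row count gives a crude upper bound on the dot products that could run in parallel for one filter.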
An Analysis of Ray Tracing Using Reduced Precision Floating Point
정은수 (Eun Soo Jung), 정연희 (Yeonhee Jung), 윤명국 (Myung Kuk Yoon). The Institute of Electronics and Information Engineers (대한전자공학회), 2023 Conference, Vol. 2023, No. 6
To address the inefficiency of single or double precision floating point (FP) operations, which demand significant memory bandwidth and energy in ray tracing rendering, this study proposes applying a single reduced precision FP format to ray tracing. Reduced precision FP refers to an FP data type that uses fewer exponent and mantissa bits than the existing FP formats. Although this gives higher computational and memory efficiency than the existing formats, it has the disadvantage of lower accuracy, which results in more data loss. This paper demonstrates that even when reduced precision FP operations are applied to ray tracing, the resulting images are comparable to those generated by the conventional method. Furthermore, based on experiments and analyses of various precisions, this paper proposes a precision that is suitable for ray tracing.
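The accuracy cost of a narrower mantissa can be explored with a small emulation. The sketch below is only an assumption about how such an experiment might be set up, not the method of the paper: it takes an IEEE 754 single precision value, keeps the top `mantissa_bits` of its 23 mantissa bits, and truncates the rest. It leaves the 8-bit exponent unchanged, so it does not model a reduced exponent, and it truncates rather than rounds.

```python
import struct


def truncate_mantissa(x, mantissa_bits):
    """Emulate a reduced-precision float by truncating the float32 mantissa.

    Keeps the sign, the 8 exponent bits, and the top `mantissa_bits`
    of the 23 mantissa bits; the remaining low bits are cleared.
    """
    if not 0 <= mantissa_bits <= 23:
        raise ValueError("mantissa_bits must be between 0 and 23")
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mantissa_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def relative_error(x, mantissa_bits):
    """Relative error introduced by truncating x to `mantissa_bits`."""
    full = truncate_mantissa(x, 23)  # x rounded to float32, no truncation
    reduced = truncate_mantissa(x, mantissa_bits)
    return abs(full - reduced) / abs(full)


# A 7-bit mantissa with an 8-bit exponent matches the bfloat16 bit layout.
approx = truncate_mantissa(0.1, 7)
```

Sweeping `mantissa_bits` over ray tracing inputs such as ray directions or intersection distances would show how quickly the error grows as bits are removed.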
A Study on Preloading to Improve Graphics Processing Unit Performance
박은성 (Eun Seong Park), 정은비 (Eunbi Jeong), 윤명국 (Myung Kuk Yoon). The Institute of Electronics and Information Engineers (대한전자공학회), 2023 Conference, Vol. 2023, No. 6
This paper proposes a new GPU architecture that aims to solve two problems of previous prefetching architectures. The first is cache eviction caused by the additional prefetch memory requests. The second is the performance limit imposed by the extra access cycles required to load prefetched data from the L1 cache into the register file. The proposed preloading architecture addresses these problems by prefetching data into dedicated storage, from which it can be loaded directly into the register file when a demand memory request accesses that storage. According to the evaluation results, the proposed architecture achieves a performance improvement of about 11% over the baseline.