RISS Academic Research Information Service

      • Coordinated thread block scheduling and warp scheduler for workload distribution

        Vo Viet Tan, Jihoon Lee, Pyungkoo Park — Korea Digital Contents Society, 2020, The Journal of Contents Computing Vol.2 No.2

        Throughput-oriented applications require specialized systems such as graphics processing units (GPUs) to achieve high efficiency. Complex applications are normally divided into smaller units of work that are executed in parallel. However, the workload cannot be distributed equally across all GPU resources, leading to resource underutilization. This paper proposes a simple approach to balancing the workload distribution: new Cooperative Thread Arrays (CTAs) stop being issued to streaming multiprocessors (SMs) that have already received enough CTAs. Our proposal increases resource utilization by prioritizing the warps belonging to the last issued CTA. Experimental results show a performance improvement of 1.26% on average over the baseline GTO (Greedy Then Oldest) warp scheduler.
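The CTA-capping idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation; the function name, the round-robin issue order, and the fixed per-SM cap are all assumptions made for illustration.

```python
# Illustrative sketch of CTA-issue throttling (hypothetical, not the paper's design):
# issue CTAs round-robin across SMs, but skip any SM that already holds `cap` CTAs,
# so no SM receives more than its share of the workload.

def issue_ctas(num_ctas, num_sms, cap):
    """Assign CTA ids to SMs round-robin, capping each SM at `cap` CTAs."""
    sms = [[] for _ in range(num_sms)]
    sm = 0
    for cta in range(num_ctas):
        tried = 0
        # find the next SM with spare capacity
        while len(sms[sm]) >= cap and tried < num_sms:
            sm = (sm + 1) % num_sms
            tried += 1
        if tried == num_sms:
            break  # every SM is saturated; remaining CTAs wait for completions
        sms[sm].append(cta)
        sm = (sm + 1) % num_sms
    return sms

print(issue_ctas(10, 4, 2))  # -> [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With 10 CTAs, 4 SMs, and a cap of 2, each SM ends up with exactly two CTAs and the last two CTAs wait, rather than piling extra work onto the first SMs.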

      • Exit CTA Conscious Warp Scheduling

        Viet Tan Vo, Xuan Thien Cao, Jinsul Kim — Korea Digital Contents Society, 2021, The Journal of Contents Computing Vol.3 No.1

        Modern GPU architectures achieve significant performance improvements over previous GPU generations. Unfortunately, the upgraded hardware resources are still underutilized. This paper targets this problem by reducing the time warps spend waiting as they approach the exit point at the end of a CTA (Cooperative Thread Array). Our proposed exit-CTA-conscious warp scheduler prioritizes warps in the CTA that has the largest number of warps waiting at the exit point. Experiments show that our implementation improves performance by 8.4% on average over the baseline warp scheduler.
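The selection policy described above can be sketched as follows. The warp record layout `(warp_id, cta_id, at_exit)` and the GTO-like fallback are assumptions for illustration, not the paper's hardware design.

```python
from collections import Counter

# Hypothetical sketch of exit-CTA-conscious warp selection: prefer a
# still-running warp from the CTA with the most warps stalled at its exit
# point, so that whole CTA can retire and free its resources sooner.

def pick_warp(warps):
    """warps: list of (warp_id, cta_id, at_exit). Return the warp id to schedule."""
    exit_counts = Counter(cta for _, cta, at_exit in warps if at_exit)
    if exit_counts:
        target_cta, _ = exit_counts.most_common(1)[0]
        for wid, cta, at_exit in warps:
            if cta == target_cta and not at_exit:
                return wid  # run the laggard warp of the nearly finished CTA
    return warps[0][0]  # fallback: oldest warp, GTO-style

warps = [(0, 0, False), (1, 1, True), (2, 1, True), (3, 1, False), (4, 0, False)]
print(pick_warp(warps))  # -> 3: CTA 1 has two warps already waiting at exit
```

Scheduling warp 3 first lets CTA 1 retire, after which a fresh CTA can be issued in its place.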

      • KCI-indexed

        KAWS: Coordinate Kernel-Aware Warp Scheduling and Warp Sharing Mechanism for Advanced GPUs

        Viet Tan Vo, Cheol Hong Kim — Korea Information Processing Society, 2021, Journal of Information Processing Systems Vol.17 No.6

        Modern graphics processing unit (GPU) architectures offer significant hardware resource enhancements for parallel computing. However, without software optimization, GPUs continuously exhibit hardware resource underutilization. In this paper, we indicate the need to alter warp scheduling schemes during different kernel execution periods to improve resource utilization. Existing warp schedulers are unaware of kernel progress and so cannot provide an effective scheduling policy. In addition, we identified the potential for improving resource utilization in multiple-warp-scheduler GPUs by sharing stalling warps with selected warp schedulers. To address the efficiency issues of present GPUs, we coordinated a kernel-aware warp scheduler and a warp-sharing mechanism (KAWS). The proposed warp scheduler tracks the execution progress of the running kernel and adapts to a more effective scheduling policy when the kernel progress reaches a point of resource underutilization. Meanwhile, the warp-sharing mechanism distributes stalling warps to warp schedulers whose execution pipeline units are ready. Our design achieves performance that is, on average, 7.97% higher than that of the traditional warp scheduler, with marginal additional hardware overhead.
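The two mechanisms in the abstract can be sketched together. The function names, the progress threshold, and the dictionary-based scheduler records are all hypothetical; the abstract does not specify the switch point or the alternative policy.

```python
# Sketch of the two KAWS ideas as described in the abstract (all names and
# the 0.8 threshold are assumptions for illustration):
# 1) kernel-aware scheduling: switch policy once kernel progress passes the
#    point where resources start going underutilized;
# 2) warp sharing: hand a stalling warp to another scheduler whose execution
#    pipeline unit is ready.

def choose_policy(issued_ctas, total_ctas, switch_point=0.8):
    """Return the active scheduling policy based on kernel progress."""
    progress = issued_ctas / total_ctas
    return "GTO" if progress < switch_point else "kernel-aware"

def share_warp(stalled_warp, schedulers):
    """Move a stalling warp to the first scheduler with a ready pipeline."""
    for sched in schedulers:
        if sched["pipeline_ready"]:
            sched["warps"].append(stalled_warp)
            return sched["id"]
    return None  # no scheduler ready; the warp keeps stalling in place

print(choose_policy(90, 100))  # -> kernel-aware (progress past the switch point)
schedulers = [{"id": 0, "pipeline_ready": False, "warps": []},
              {"id": 1, "pipeline_ready": True, "warps": []}]
print(share_warp("w7", schedulers))  # -> 1 (scheduler 1's pipeline is free)
```

The point of the coordination is that late in a kernel, fewer CTAs remain to hide stalls, so both adapting the policy and redistributing stalled warps target the same underutilization window.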
