RISS Academic Research Information Service

      • Throttling-Based Resource Management in High Performance Multithreaded Architectures

        Lee, S.-W.; Gaudiot, J.-L. IEEE 2006 IEEE Transactions on Computers Vol.55 No.9

        Up to now, the power problems caused by the huge amount of hardware resources in modern systems have not been a primary concern. More recently, however, power consumption has begun to limit the number of resources which can be safely integrated into a single package, lest heat dissipation exceed physical limits (short of actual package meltdown). At the same time, new architectural techniques such as simultaneous multithreading (SMT), whose goal is to efficiently use the resources of a superscalar machine without introducing excessive additional control overhead, have appeared on the scene. In this paper, we present a new resource management scheme which enables an efficient low-power mode in SMT architectures. The proposed scheme is based on a modified pipeline throttling technique which introduces a throttling point at the last stage of the processor pipeline in order to reduce power consumption. We demonstrate that resource utilization plays an important role in efficient power management and that our strategy can significantly improve performance in the power-saving mode. Since the proposed resource management scheme tests the processor condition cycle by cycle, we evaluate its performance by setting a target IPC as an immediate measure of power. Our analysis shows that an SMT processor with our dynamic resource management scheme can yield significantly higher overall performance.
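The cycle-by-cycle, target-IPC test described in the abstract can be sketched as follows. This is a toy software model, not the authors' hardware design; the function name and trace format are illustrative assumptions. Each cycle, the running IPC is compared to the target, and the commit (last) stage is throttled whenever the target is exceeded:

```python
# Toy sketch of target-IPC-driven pipeline throttling (illustrative only):
# throttle the last pipeline stage whenever measured IPC exceeds the target.

def throttle_schedule(ipc_trace, target_ipc):
    """Return a per-cycle list of booleans: True = throttle the commit stage.

    ipc_trace[i] is the number of instructions the core could commit in
    cycle i if it were not throttled.
    """
    committed = 0
    decisions = []
    for cycle, ipc_this_cycle in enumerate(ipc_trace, start=1):
        running_ipc = committed / cycle       # processor condition, tested every cycle
        throttle = running_ipc > target_ipc   # above target: enter low-power mode
        decisions.append(throttle)
        if not throttle:
            committed += ipc_this_cycle       # commit proceeds normally this cycle
    return decisions
```

With a target IPC of 1, a core that could commit 2 instructions every cycle is throttled roughly every other cycle, trading peak throughput for power.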

      • An Energy and Performance Efficient DVFS Scheme for Irregular Parallel Divide-and-Conquer Algorithms on the Intel SCC

        Yu-Liang Chou; Shaoshan Liu; Eui-Young Chung; Gaudiot, Jean-Luc IEEE 2014 IEEE computer architecture letters Vol.13 No.1

        The divide-and-conquer paradigm can be used to express many computationally significant problems, but an important subset of these applications is inherently load-imbalanced. Load balancing is a challenge for irregular parallel divide-and-conquer algorithms, and efficiently solving these applications will be a key requirement for future many-core systems. To address the load imbalance issue, instead of attempting to dynamically balance the workloads, this paper proposes an energy- and performance-efficient Dynamic Voltage and Frequency Scaling (DVFS) scheduling scheme which takes into account the load imbalance behavior exhibited by these applications. More specifically, we examine the core of the divide-and-conquer paradigm and determine that the base-case-reached point, where recursion stops, is a suitable place to apply the proposed DVFS scheme. To evaluate the proposed scheme, we implement four representative irregular parallel divide-and-conquer algorithms (tree traversal, quicksort, finding primes, and the n-queens puzzle) on the Intel Single-chip Cloud Computer (SCC) many-core machine. We demonstrate that, on average, the proposed scheme can improve performance by 41% while reducing energy consumption by 36% compared to the baseline running the whole computation at the default frequency configuration (400 MHz).
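The key idea, issuing a DVFS request exactly at the base-case-reached point of the recursion, can be sketched with one of the paper's benchmarks (finding primes). This is a hypothetical illustration: `set_frequency` stands in for the SCC's per-tile frequency control, and the frequency values are placeholders.

```python
# Hypothetical sketch: in an irregular divide-and-conquer computation, the
# point where recursion hits its base case is where a DVFS request is issued.
# set_frequency is a stand-in for a platform frequency-control hook.

events = []

def set_frequency(mhz):
    events.append(mhz)                # stub: record the requested frequency

def count_primes(lo, hi, low_mhz=100):
    if hi - lo <= 1:                  # base-case-reached point: request low frequency
        set_frequency(low_mhz)
        return sum(1 for n in range(lo, hi) if n > 1 and
                   all(n % d for d in range(2, int(n ** 0.5) + 1)))
    mid = (lo + hi) // 2              # divide step runs at the default frequency
    return count_primes(lo, mid) + count_primes(mid, hi)
```

Because the base cases are where load imbalance manifests (some subtrees finish early), scaling frequency down there saves energy on idle-bound cores without slowing the critical path.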

      • Complexity-Effective Contention Management with Dynamic Backoff for Transactional Memory Systems

        Seung Hun Kim; Dongmin Choi; Won Woo Ro; Gaudiot, Jean-Luc IEEE 2014 IEEE Transactions on Computers Vol.63 No.7

        Reducing memory access conflicts is a crucial part of the design of Transactional Memory (TM) systems, since the number of running threads increases and long-latency transactions gradually appear: without efficient contention management, there will be repeated aborts and wasteful rollback operations. In this paper, we present a dynamic backoff control algorithm developed for complexity-effective, distributed contention management in Hardware Transactional Memory (HTM) systems. Our approach aims at controlling the restart intervals of aborted transactions and can easily be applied to various TM systems. To this end, we have profiled the applications of the STAMP benchmark suite and identified those “problem” transactions which repeatedly cause aborts in the applications with the attendant high contention rate. The proposed algorithm alleviates the impact of these repeated aborts by dynamically adjusting the initial exponent value of the traditional backoff approach. In addition, the proposed scheme decreases the number of wasted cycles down to 82% on average compared to the baseline TM system. Our design has been integrated into LogTM-SE, where we observed an average performance improvement of 18%.
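The mechanism named in the abstract, adjusting the initial exponent of exponential backoff for repeatedly aborting transactions, can be sketched as below. The class layout, thresholds, and parameter names are assumptions for illustration, not the paper's design:

```python
import random

# Hedged sketch of dynamic backoff: aborted transactions restart after an
# exponentially growing random delay, and transactions profiled as repeat
# offenders ("problem" transactions) start from a larger initial exponent.

class DynamicBackoff:
    def __init__(self, base_exponent=1, max_exponent=10, hot_threshold=3):
        self.base_exponent = base_exponent
        self.max_exponent = max_exponent
        self.hot_threshold = hot_threshold
        self.abort_counts = {}            # per-transaction abort profile

    def delay_cycles(self, txn_id):
        """Delay (in cycles) before restarting an aborted transaction."""
        aborts = self.abort_counts.get(txn_id, 0) + 1
        self.abort_counts[txn_id] = aborts
        exponent = self.base_exponent
        if aborts >= self.hot_threshold:  # problem transaction: boost initial exponent
            exponent += aborts - self.hot_threshold + 1
        exponent = min(self.max_exponent, exponent)
        return random.randrange(1, 2 ** exponent + 1)
```

Spreading out the restarts of high-conflict transactions is what cuts the repeated-abort cycles; low-conflict transactions keep the small default exponent and restart almost immediately.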

      • KCI-indexed

        Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing

        Songwen Pei; Junge Zhang; Linhua Jiang; Myoung-seo Kim; Jean-Luc Gaudiot Korean Society for Internet Information 2016 KSII Transactions on Internet and Information Syst Vol.10 No.7

        As processor design has been transitioning from homogeneous to heterogeneous multicore processors, the traditional Amdahl's law cannot meet the new challenges of asymmetric multicore systems. In order to further investigate the factors related to the Overhead of Data Preparation (ODP) in asymmetric multicore systems, we evaluate an asymmetric multicore system built with a CPU and GPU by measuring the overheads of memory transfer, computing kernel, cache misses, and synchronization. This paper demonstrates that decreasing the overhead of data preparation is a promising approach to improving the overall performance of heterogeneous systems.
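Why data preparation limits heterogeneous speedup can be seen in a small Amdahl-style model. This is a hypothetical sketch, not the paper's exact formulation: it simply adds an ODP term (transfer, launch, cache-miss, and synchronization time) to the classic speedup denominator.

```python
# Hypothetical extended-Amdahl model: f_parallel of the work is offloaded to
# a GPU that is accel_factor times faster, but each offload also pays a
# data-preparation overhead equal to odp_fraction of the original runtime.

def speedup_with_odp(f_parallel, accel_factor, odp_fraction):
    serial = 1.0 - f_parallel                 # fraction that stays on the CPU
    return 1.0 / (serial + f_parallel / accel_factor + odp_fraction)
```

For example, offloading 90% of the work to a 9x-faster GPU gives a 5x speedup with zero ODP, but an ODP of just 10% of the runtime drags that down to about 3.3x, which is why shrinking ODP is as valuable as a faster kernel.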

      • An effective pre-store/pre-load method exploiting intra-request idle time of NAND flash-based storage devices

        Kim, Jin-Young; You, Tae-Hee; Seo, Hyeokjun; Yoon, Sungroh; Gaudiot, Jean-Luc; Chung, Eui-Young Elsevier 2017 Microprocessors and microsystems Vol.50 No.-

        NAND flash-based storage devices (NFSDs) are widely employed owing to their superior characteristics compared to hard disk drives. However, NAND flash memory (NFM) still exhibits drawbacks, such as a limited lifetime and an erase-before-write requirement. Along with effective software management, the implementation of a cache buffer is one of the most common solutions to overcome these limitations. However, the read/write performance becomes saturated, primarily because the eviction overhead caused by limited DRAM capacity significantly impacts overall NFSD performance. This paper therefore proposes a method that hides the eviction overhead and overcomes the saturation of the read/write performance. The proposed method exploits the new intra-request idle time (IRIT) in NFSDs and employs a new data management scheme. The new pre-store eviction scheme stores dirty page data from the cache to the NFMs in advance; this reduces the eviction overhead by maintaining a sufficient number of clean pages in the cache. Further, the new pre-load insertion scheme improves the read performance by loading frequently read data into the cache in advance. Unlike previous methods with large migration overhead, our scheme does not cause any eviction/insertion overhead because it exploits the IRIT to its advantage. We verified the effectiveness of our method by integrating it into two cache management strategies, which were then compared. Our proposed method reduced read latency by 43% in read-intensive traces, reduced write latency by 40% in write-intensive traces, and reduced read/write latency by 21% and 20%, respectively, on average compared to an NFSD with a conventional write cache buffer.
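The pre-store/pre-load pair can be sketched with a toy write cache. The class, method names, and LRU policy are illustrative assumptions, not the paper's implementation: during idle time, dirty pages are flushed (pre-store) so later evictions are free, and predicted-hot pages are fetched (pre-load) so later reads hit.

```python
from collections import OrderedDict

# Illustrative sketch of a write cache that uses intra-request idle time to
# pre-store dirty pages to flash and pre-load frequently read pages, so that
# evictions and reads on the critical path pay no flash-write penalty.

class IdleTimeCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()        # page -> dirty flag, in LRU order
        self.flash_writes = 0             # write-backs actually issued to flash

    def write(self, page):
        self.pages[page] = True           # mark dirty
        self.pages.move_to_end(page)      # most recently used
        if len(self.pages) > self.capacity:
            victim, dirty = self.pages.popitem(last=False)
            if dirty:
                self.flash_writes += 1    # eviction overhead on the critical path

    def on_idle(self, hot_pages):
        # pre-store: flush dirty pages off the critical path
        for page, dirty in self.pages.items():
            if dirty:
                self.flash_writes += 1
                self.pages[page] = False  # now clean, cheap to evict later
        # pre-load: bring predicted-hot read pages into spare cache slots
        for page in hot_pages:
            if page not in self.pages and len(self.pages) < self.capacity:
                self.pages[page] = False
```

Because `on_idle` runs inside gaps the host would otherwise waste, the flash writes it issues are "free" relative to the request latency the user observes.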

      • Network Variation and Fault Tolerant Performance Acceleration in Mobile Devices with Simultaneous Remote Execution

        Keunsoo Kim; Cho, Benjamin Y.; Won Woo Ro; Gaudiot, Jean-Luc IEEE 2015 IEEE Transactions on Computers Vol. No.

        As mobile applications provide increasingly richer features to end users, it has become imperative to overcome the constraints of resource-limited mobile hardware. Remote execution is one promising technique to resolve this important problem. Using this technique, the computation-intensive part of the workload is migrated to resource-rich servers; once the computation is completed, the results are returned to the client devices. To enable this operation, strong wireless connectivity is required. However, unstable wireless connections are a staple of real life. This makes performance unpredictable, sometimes offsetting the benefits brought by this technique and leading to performance degradation. To address this problem, in this paper we present a Simultaneous Remote Execution (SRE) model for mobile devices. Our SRE model performs concurrent executions both locally and remotely; therefore, the worst-case execution time under fluctuating network conditions is significantly reduced. In addition, SRE provides inherent tolerance of abrupt network failures. We designed and implemented an SRE-based offloading system consisting of a real smartphone and a remote server connected via 3G and Wi-Fi networks. The experimental results under various real-life network variation scenarios show that SRE outperforms the alternative schemes in highly fluctuating network environments.

      • C-Lock: Energy Efficient Synchronization for Embedded Multicore Systems

        Seung Hun Kim; Sang Hyong Lee; Minje Jun; Byunghoon Lee; Won Woo Ro; Eui-Young Chung; Gaudiot, Jean-Luc IEEE 2014 IEEE Transactions on Computers Vol.63 No.8

        Data synchronization among multiple cores has been one of the critical issues which must be resolved in order to optimize the parallelism of multicore architectures. Data synchronization schemes can be classified as lock-based methods (“pessimistic”) and lock-free methods (“optimistic”). However, none of these methods considers the nature of embedded systems, which have demanding and sometimes conflicting requirements not only for high performance but also for low power consumption. As an answer to these problems, we propose C-Lock, an energy- and performance-efficient data synchronization method for multicore embedded systems. C-Lock achieves balanced energy and performance efficiency by combining the advantages of lock-based methods and transactional memory (TM) approaches: in C-Lock, a core is blocked only when true conflicts exist (an advantage of TM), while roll-back operations, which can cause huge overhead in both performance and energy, are avoided (an advantage of locks). Also, in order to save more energy, C-Lock disables the clocks of the cores which are blocked waiting for access to the shared data until the shared data become available. We compared our C-Lock approach against traditional locks and transactional memory systems and found that C-Lock can reduce the energy-delay product by up to 1.94 times and 13.78 times compared to the baseline and TM, respectively.
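The behavioral idea, block only on a true data conflict and clock-gate the blocked core, can be sketched as a software model. All names and structures here are assumptions for illustration; the real C-Lock is a hardware mechanism:

```python
# Hedged software model of the C-Lock idea: a core entering a critical
# section declares the addresses it will touch; it blocks only when those
# addresses truly overlap with a section already in flight (as in TM), and a
# blocked core is clock-gated rather than spinning. No rollback is needed.

class CLock:
    def __init__(self):
        self.in_flight = {}               # core id -> set of addresses in use
        self.gated = set()                # cores whose clocks are gated

    def try_enter(self, core, addresses):
        """Enter the critical section unless a true conflict exists."""
        addresses = set(addresses)
        for owned in self.in_flight.values():
            if owned & addresses:         # true conflict: block and clock-gate
                self.gated.add(core)
                return False
        self.in_flight[core] = addresses  # disjoint accesses proceed in parallel
        return True

    def leave(self, core):
        del self.in_flight[core]
        self.gated.clear()                # wake gated cores so they can retry
```

Unlike a single coarse lock, two cores touching disjoint data both enter at once; unlike TM, a conflicting core simply waits (with its clock gated) instead of executing speculatively and rolling back.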
