Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing
Songwen Pei, Junge Zhang, Linhua Jiang, Myoung-seo Kim, Jean-Luc Gaudiot. Korean Society for Internet Information (한국인터넷정보학회), 2016. KSII Transactions on Internet and Information Systems Vol.10 No.7
As processor design has been transitioning from homogeneous to heterogeneous multicore processors, traditional Amdahl's law cannot meet the new challenges posed by asymmetric multicore systems. In order to further investigate the factors related to the Overhead of Data Preparation (ODP) for asymmetric multicore systems, we evaluate an asymmetric multicore system built with a CPU-GPU pair by measuring the overheads of memory transfer, computing kernel, cache misses, and synchronization. This paper demonstrates that decreasing the overhead of data preparation is a promising approach to improving the overall performance of heterogeneous systems.
Jin-Young Kim, Tae-Hee You, Hyeokjun Seo, Sungroh Yoon, Jean-Luc Gaudiot, Eui-Young Chung. Elsevier, 2017. Microprocessors and Microsystems Vol.50
NAND flash-based storage devices (NFSDs) are widely employed owing to their superior characteristics when compared to hard disk drives. However, NAND flash memory (NFM) still exhibits drawbacks, such as a limited lifetime and an erase-before-write requirement. Along with effective software management, the implementation of a cache buffer is one of the most common solutions to overcome these limitations. However, the read/write performance becomes saturated primarily because the eviction overhead caused by limited DRAM capacity significantly impacts overall NFSD performance. This paper therefore proposes a method that hides the eviction overhead and overcomes the saturation of the read/write performance. The proposed method exploits the new intra-request idle time (IRIT) in NFSD and employs a new data management scheme. In addition, the new pre-store eviction scheme stores dirty page data in the cache to NFMs in advance. This reduces the eviction overhead by maintaining a sufficient number of clean pages in the cache. Further, the new pre-load insertion scheme improves the read performance by frequently loading data that needs to be read into the cache in advance. Unlike previous methods with large migration overhead, our scheme does not cause any eviction/insertion overhead because it actually exploits the IRIT to its advantage. We verified the effectiveness of our method by integrating it into two cache management strategies, which were then compared. Our proposed method reduced read latency by 43% in read-intensive traces, reduced write latency by 40% in write-intensive traces, and reduced read/write latency by 21% and 20%, respectively, on average compared to an NFSD with a conventional write cache buffer.
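The pre-store eviction idea above can be illustrated with a minimal sketch (not the paper's implementation): during idle time, dirty cache pages are written back to flash in advance, so later evictions find clean pages and incur no flush on the critical path. All names (`PreStoreCache`, `pre_store`) are illustrative.

```python
from collections import OrderedDict

class PreStoreCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> dirty flag, in LRU order
        self.flash_writes = 0       # count of NFM program operations

    def write(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        elif len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page_id] = True  # mark dirty

    def _evict(self):
        victim, dirty = self.pages.popitem(last=False)
        if dirty:                   # flush on the critical path: the overhead to hide
            self.flash_writes += 1

    def pre_store(self, budget):
        """Use intra-request idle time to clean up to `budget` LRU pages."""
        for page_id, dirty in list(self.pages.items())[:budget]:
            if dirty:
                self.flash_writes += 1       # write-back happens off the critical path
                self.pages[page_id] = False  # page is now clean; eviction is free
```

After `pre_store`, evicting a cleaned page costs no additional flash write, which is the mechanism the abstract credits for hiding the eviction overhead.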
Complexity-Effective Contention Management with Dynamic Backoff for Transactional Memory Systems
Seung Hun Kim, Dongmin Choi, Won Woo Ro, Jean-Luc Gaudiot. IEEE, 2014. IEEE Transactions on Computers Vol.63 No.7
Reducing memory access conflicts is a crucial part of the design of Transactional Memory (TM) systems, since the number of running threads increases and long-latency transactions gradually appear: without efficient contention management, there will be repeated aborts and wasteful rollback operations. In this paper, we present a dynamic backoff control algorithm developed for complexity-effective and distributed contention management in Hardware Transactional Memory (HTM) systems. Our approach aims at controlling the restart intervals of aborted transactions and can be easily applied to various TM systems. To this end, we have profiled the applications of the STAMP benchmark suite and have identified those "problem" transactions which repeatedly cause aborts in the applications with the attendant high contention rate. The proposed algorithm alleviates the impact of these repeated aborts by dynamically adjusting the initial exponent value of the traditional backoff approach. In addition, the proposed scheme decreases the number of wasted cycles down to 82% on average compared to the baseline TM system. Our design has been integrated in LogTM-SE, where we observed an average performance improvement of 18%.
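The core mechanism, adjusting the initial exponent of exponential backoff per transaction, can be sketched as follows. This is a minimal illustration under assumed parameters (`base_exp`, `abort_threshold`), not the paper's hardware design: transactions observed to abort repeatedly start from a larger exponent, so their restart intervals grow sooner.

```python
import random

class DynamicBackoff:
    def __init__(self, base_exp=1, max_exp=16, abort_threshold=4):
        self.base_exp = base_exp
        self.max_exp = max_exp
        self.abort_threshold = abort_threshold
        self.abort_counts = {}   # transaction id -> observed aborts

    def on_abort(self, txn_id):
        self.abort_counts[txn_id] = self.abort_counts.get(txn_id, 0) + 1

    def initial_exponent(self, txn_id):
        """'Problem' transactions (many aborts) get a larger starting exponent."""
        aborts = self.abort_counts.get(txn_id, 0)
        exp = self.base_exp + aborts // self.abort_threshold
        return min(exp, self.max_exp)

    def backoff_cycles(self, txn_id, retry):
        """Randomized exponential wait, in cycles, before the next restart."""
        exp = min(self.initial_exponent(txn_id) + retry, self.max_exp)
        return random.randrange(1, 2 ** exp + 1)
```

A transaction with no abort history backs off briefly, while a repeatedly aborting one starts from a larger window, spreading out contending restarts.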
Keunsoo Kim, Benjamin Y. Cho, Won Woo Ro, Jean-Luc Gaudiot. IEEE, 2015. IEEE Transactions on Computers
As mobile applications provide increasingly richer features to end users, it has become imperative to overcome the constraints of resource-limited mobile hardware. Remote execution is one promising technique to resolve this important problem. Using this technique, the computation-intensive part of the workload is migrated to resource-rich servers, and once the computation is completed, the results are returned to the client devices. To enable this operation, strong wireless connectivity is required. However, unstable wireless connections are a staple of real life. This makes performance unpredictable, sometimes offsetting the benefits brought by this technique and leading to performance degradation. To address this problem, in this paper, we present a Simultaneous Remote Execution (SRE) model for mobile devices. Our SRE model performs concurrent executions both locally and remotely. Therefore, the worst-case execution time under fluctuating network conditions is significantly reduced. In addition, SRE provides inherent tolerance for abrupt network failure. We designed and implemented an SRE-based offloading system consisting of a real smartphone and a remote server connected via 3G and WiFi networks. The experimental results under various real-life network variation scenarios show that SRE outperforms the alternative schemes in highly fluctuating network environments.
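The simultaneous-execution idea can be sketched with threads: the same task is started both locally and "remotely" (simulated here by a delay), and the first result wins, bounding worst-case latency when the network path stalls. Names and delays are illustrative assumptions, not the paper's system.

```python
import threading
import queue
import time

def run_simultaneously(task, arg, remote_delay):
    """Start the task locally and remotely; return (origin, result) of the winner."""
    results = queue.Queue()

    def local():
        results.put(("local", task(arg)))

    def remote():
        time.sleep(remote_delay)  # simulated network transfer + server time
        results.put(("remote", task(arg)))

    for target in (local, remote):
        threading.Thread(target=target, daemon=True).start()
    return results.get()          # block until the first finisher reports
```

If the network is fast, the remote result arrives first and the device saves work; if the connection stalls or drops, the local execution still completes, which is the inherent failure tolerance the abstract describes.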
C-Lock: Energy Efficient Synchronization for Embedded Multicore Systems
Seung Hun Kim, Sang Hyong Lee, Minje Jun, Byunghoon Lee, Won Woo Ro, Eui-Young Chung, Jean-Luc Gaudiot. IEEE, 2014. IEEE Transactions on Computers Vol.63 No.8
Data synchronization among multiple cores has been one of the critical issues which must be resolved in order to optimize the parallelism of multicore architectures. Data synchronization schemes can be classified as lock-based methods ("pessimistic") and lock-free methods ("optimistic"). However, none of these methods consider the nature of embedded systems, which have demanding and sometimes conflicting requirements not only for high performance but also for low power consumption. As an answer to these problems, we propose C-Lock, an energy- and performance-efficient data synchronization method for multicore embedded systems. C-Lock achieves balanced energy- and performance-efficiency by combining the advantages of lock-based methods and transactional memory (TM) approaches: in C-Lock, a core is blocked only when true conflicts exist (an advantage of TM), while avoiding rollback operations, which can cause huge overhead with regard to both performance and energy (an advantage of locks). Also, in order to save more energy, C-Lock disables the clocks of cores which are blocked waiting for access to the shared data until that data becomes available. We compared our C-Lock approach against traditional locks and transactional memory systems and found that C-Lock can reduce the energy-delay product by up to 1.94 times and 13.78 times compared to the baseline and TM, respectively.
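The "block only on true conflicts" idea can be sketched in software: a core stalls (and, in hardware, could have its clock gated) only when its shared-address set actually overlaps one held by another core; disjoint accesses proceed concurrently with no rollback. This is purely illustrative, not the C-Lock hardware; `ConflictGate` and its methods are assumed names.

```python
class ConflictGate:
    def __init__(self):
        self.active = {}   # core id -> set of shared addresses in use

    def try_enter(self, core, addrs):
        """Return True if the core may proceed; False means stall (gate its clock)."""
        addrs = set(addrs)
        for other, held in self.active.items():
            if other != core and held & addrs:  # true conflict: same addresses
                return False
        self.active[core] = addrs
        return True

    def leave(self, core):
        """Release the core's addresses so stalled cores can retry."""
        self.active.pop(core, None)
```

Unlike a single coarse lock, cores touching disjoint data never stall; unlike TM, a conflicting core simply waits instead of executing speculatively and rolling back.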