Low-Power Hybrid Memory Cubes With Link Power Management and Two-Level Prefetching
Junwhan Ahn,Sungjoo Yoo,Kiyoung Choi IEEE 2016 IEEE transactions on very large scale integration Vol.24 No.2
The hybrid memory cube (HMC) is a 3-D-stacked DRAM architecture designed for substantially improved memory bandwidth. In particular, its I/O interface achieves up to 320 GB/s of external bandwidth through high-speed serial links. However, this bandwidth comes at the cost of high static power in the off-chip links, which dominates the total power consumption of HMCs. In this paper, we propose an adaptive mechanism that partially disables the off-chip links of HMCs to reduce their energy consumption. To determine how many links to disable under a given application load, we develop a simple hardware module, called the link delay monitor, that simulates all link configurations simultaneously and finds the largest number of links that can be disabled while satisfying a given performance constraint. We also present two-level prefetching with in-HMC prefetch buffers to further improve the efficiency of our link power management scheme in the presence of prefetching. Evaluations show that our scheme reduces the energy consumption of HMCs by 52% on average with negligible performance degradation.
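The selection step described in this abstract can be illustrated with a minimal Python sketch. It assumes that a monitor has already produced an estimated slowdown for each candidate number of active links. The function name `pick_active_links`, the input dictionary, and the 5% budget are hypothetical and are not taken from the paper.

```python
def pick_active_links(est_slowdown, max_slowdown):
    """Return the smallest number of active links whose estimated
    slowdown stays within the budget (hypothetical helper).

    est_slowdown: dict mapping active-link count -> estimated slowdown
                  relative to all links enabled (0.0 means no slowdown).
    max_slowdown: performance constraint, e.g. 0.05 for 5%.
    """
    feasible = [n for n, s in est_slowdown.items() if s <= max_slowdown]
    if not feasible:
        # No configuration meets the constraint: keep every link enabled.
        return max(est_slowdown)
    # Fewest active links means the largest number of disabled links.
    return min(feasible)


# Illustrative numbers only, not measurements from the paper.
estimates = {1: 0.30, 2: 0.08, 3: 0.03, 4: 0.0}
print(pick_active_links(estimates, max_slowdown=0.05))  # prints 3
```

With these made-up estimates, three active links is the smallest configuration within the 5% budget, so one link would be disabled.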
Prediction Hybrid Cache: An Energy-Efficient STT-RAM Cache Architecture
Junwhan Ahn,Sungjoo Yoo,Kiyoung Choi IEEE 2016 IEEE Transactions on Computers Vol. No.
Spin-transfer torque RAM (STT-RAM) has emerged as an energy-efficient and high-density alternative to SRAM for large on-chip caches. However, its high write energy has been considered a serious drawback. Hybrid caches mitigate this problem by incorporating a small SRAM cache for write-intensive data alongside an STT-RAM cache. In such architectures, choosing which cache blocks to place into the SRAM cache is the key to energy efficiency. This paper proposes a new hybrid cache architecture called the prediction hybrid cache. The key idea is to predict the write intensity of cache blocks at the time of cache misses and to determine block placement based on the prediction. We design a write intensity predictor that realizes this idea by exploiting a correlation between the write intensity of blocks and the memory access instructions that incur cache misses on those blocks. It includes a mechanism to dynamically adapt the predictor to application characteristics. We also design a hybrid cache architecture in which write-intensive blocks identified by the predictor are placed into the SRAM region. Evaluations show that our scheme reduces the energy consumption of hybrid caches by 28 percent (31 percent) on average compared to the existing hybrid cache architecture in a single-core (quad-core) system.
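The prediction idea can be sketched in a few lines of Python. This is a generic saturating-counter predictor indexed by the address of the instruction that caused the miss. The table size, counter width, threshold, and all names (`WriteIntensityPredictor`, `predict`, `train`) are assumptions made for illustration. The sketch does not reproduce the paper's design or its dynamic adaptation mechanism.

```python
class WriteIntensityPredictor:
    """Hypothetical table of saturating counters indexed by miss PC."""

    def __init__(self, entries=1024, max_count=3, threshold=2):
        self.entries = entries
        self.max_count = max_count
        self.threshold = threshold
        self.table = [0] * entries

    def _index(self, pc):
        # Simple modulo hash of the instruction address (an assumption).
        return pc % self.entries

    def predict(self, pc):
        """Return True if the missing block should go to the SRAM region."""
        return self.table[self._index(pc)] >= self.threshold

    def train(self, pc, was_write_intensive):
        """Update the counter once the block's actual behavior is known."""
        i = self._index(pc)
        if was_write_intensive:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


predictor = WriteIntensityPredictor()
pc = 0x400A10                    # made-up instruction address
print(predictor.predict(pc))     # False: counters start at zero
predictor.train(pc, True)
predictor.train(pc, True)
print(predictor.predict(pc))     # True: counter reached the threshold
```

In this sketch, a block would be placed in SRAM only after the instruction that fetched it has repeatedly been associated with write-intensive blocks.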
Isomorphism-Aware Identification of Custom Instructions With I/O Serialization
Junwhan Ahn,Kiyoung Choi IEEE 2013 IEEE transactions on computer-aided design of inte Vol.32 No.1
Extensible processors have been widely used to meet the conflicting demands for performance improvement, low power consumption, and flexibility. As extensible processors have become more popular, several algorithms have been proposed for automatically identifying instruction-set extensions in order to reduce the effort of manual design and verification. However, most of them focus on finding large and complex instructions that are used only once, rather than ones that are used repeatedly. Moreover, other approaches that do consider recurrence are limited to finding small instructions. This paper proposes a novel algorithm that considers instruction reusability as well as input/output (I/O) serialization. To overcome the high complexity of the problem, we develop a canonical-form construction algorithm for fast isomorphism detection on directed acyclic graphs and an incremental template generation algorithm that identifies the best custom instruction in terms of a user-defined fitness function. Moreover, our algorithm serializes I/O operations so that the numbers of inputs and outputs of custom instructions are not limited by the microarchitecture. This paper also proposes an algorithm for identifying multiple custom instructions that utilizes a well-known iterative selection algorithm. Last, it presents a hybrid algorithm that combines our algorithm with a previous algorithm that does not consider reusability. Experimental results show that our isomorphism-aware algorithm achieves significant improvement over previous approaches in terms of algorithm runtime as well as the performance gain obtained from custom instructions.
AIM: Energy-Efficient Aggregation Inside the Memory Hierarchy
Ahn, Junwhan,Yoo, Sungjoo,Choi, Kiyoung Association for Computing Machinery 2016 ACM transactions on architecture and code optimiza Vol.13 No.4
In this article, we propose Aggregation-in-Memory (AIM), a new processing-in-memory system designed for energy efficiency and near-term adoption. To perform aggregation efficiently, we implement simple aggregation operations in main memory and develop a locality-adaptive host architecture for in-memory aggregation, called cache-conscious aggregation. Through this, AIM executes aggregation at the most energy-efficient location among all levels of the memory hierarchy. Moreover, AIM requires only minimal changes to existing sequential programming models and provides a fully automated compiler toolchain, thereby allowing unmodified legacy software to use AIM. Evaluations show that AIM greatly improves the energy efficiency of main memory and the system performance.
Fast Generation of Multiple Custom Instructions under Area Constraints
Di Wu,Imyong Lee,Junwhan Ahn,Kiyoung Choi The Institute of Electronics Engineers of Korea (대한전자공학회) 2011 Journal of semiconductor technology and science Vol.11 No.1
Extensible processors provide an efficient mechanism to boost the performance of the whole system without losing much flexibility. However, due to the intense demand for low cost and low power consumption, customizing an embedded system has become more difficult than ever. In this paper, we present a framework for custom instruction generation that considers both area constraints and resource sharing. We also present how the process can be sped up through pruning and library-based design space exploration.