Efficient embedded code generation with multiple load/store instructions
Paek, Yunheung,Ahn, Minwook,Cho, Doosan,Kim, Taehwan John Wiley & Sons Ltd 2007 Software Vol.37 No.11
<P>In a recent study, we discovered that many single load/store operations in embedded applications can be parallelized and thus encoded simultaneously in a single-instruction multiple-data instruction, called the multiple load/store (MLS) instruction. In this work, we investigate the problem of utilizing MLS instructions to produce optimized machine code, and propose an effective approach to the problem. Specifically, we formalize the MLS problem, that is, the problem of maximizing the use of MLS instructions with an unlimited register file size. Based on this analysis, we show that we can solve the problem efficiently by translating it into a variant of the problem of finding a maximum weighted path cover in a dynamic weighted graph. To handle the more realistic case of a finite register file, our solution is then extended to take into account the constraints of register sequencing in MLS instructions and the limited register resources available in the target processor. We demonstrate the effectiveness of our approach experimentally by using a set of benchmark programs. In summary, our approach can reduce the number of loads/stores by 13.3% on average, compared with the code generated by existing compilers. The total code size reduction is 3.6%. This code size reduction comes at almost no cost because the overall increase in compilation time as a result of our technique remains minimal. Copyright © 2007 John Wiley & Sons, Ltd.</P>
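As a rough illustration of the opportunity this abstract describes, the Python sketch below merges word loads at adjacent offsets from one base address into a single multiple-load candidate. It is not the paper's path-cover formulation: the function name, `word_size`, and `max_group` are hypothetical, and the register sequencing and register pressure constraints mentioned in the abstract are ignored.

```python
def group_consecutive_loads(offsets, word_size=4, max_group=4):
    """Group load offsets (in bytes, same base register) into runs of adjacent words.

    Each returned group is a candidate for one multiple-load instruction.
    word_size and max_group are assumed values, not taken from the paper.
    """
    groups = []
    for off in sorted(set(offsets)):
        last = groups[-1] if groups else None
        # Extend the current run only if this offset is the next word
        # and the run has not reached the assumed instruction limit.
        if last and off == last[-1] + word_size and len(last) < max_group:
            last.append(off)
        else:
            groups.append([off])
    return groups


loads = [8, 0, 4, 20, 16, 32]          # hypothetical offsets in one basic block
groups = group_consecutive_loads(loads)
print(len(loads), "single loads ->", len(groups), "instructions:", groups)
```

A real code generator would also have to respect data dependences between the memory operations and the ordering rules the target imposes on the registers named in one instruction, which is where the graph formulation in the abstract comes in.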
Register coalescing techniques for heterogeneous register architecture with copy sifting
Ahn, Minwook,Paek, Yunheung Association for Computing Machinery 2009 ACM transactions on embedded computing systems Vol.8 No.2
<P>Optimistic coalescing has proven to be an elegant and effective technique that provides better chances of safely coloring more registers in register allocation than other coalescing techniques. Its algorithm originally assumes homogeneous registers, which are all gathered in the same register file. Although this register architecture is still common in most general-purpose processors, embedded processors often contain heterogeneous registers, which are scattered across physically different register files, each dedicated to a different purpose and use. In this work, we show that optimistic coalescing is also useful for an embedded processor to better handle such heterogeneity of the register architecture, and develop a modified algorithm for optimal coalescing that helps a register allocator. In the experiment, an existing register allocator was able to achieve up to a 13.0% reduction in code size through our coalescing, and to avoid many spills that would have been generated without our scheme.</P>
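Research Trends in FPGA-Based Hardware Accelerators for Homomorphic Encryption
Yongseok Lee,Yunheung Paek Korea Information Processing Society 2021 Proceedings of the Korea Information Processing Society Conference Vol.28 No.2
Homomorphic encryption, which has recently drawn attention as a way to protect personal information, allows addition and multiplication to be performed on data while it remains encrypted, so the data can be processed without being decrypted for computation. Homomorphic encryption is therefore emerging as a privacy-protection method. Fully homomorphic encryption in particular supports both addition and multiplication and places no limit on the number of valid operations, so it is expected to be widely used in applications. However, in fully homomorphic encryption the ciphertext is much larger than the plaintext, and addition and multiplication on ciphertexts composed of polynomials are complex, so acceleration is needed. FPGA-based homomorphic encryption accelerators are accordingly being actively studied, and through them this paper aims to understand the characteristics of homomorphic encryption operations and to survey research trends in accelerators.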
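To make the cost of polynomial ciphertext arithmetic concrete, the Python sketch below adds and multiplies polynomials in a ring of the form Z_q[x]/(x^n + 1). The abstract does not name a scheme or a ring, so this choice, the function names, and the tiny parameters are all assumptions for illustration. Addition is linear in n, while the schoolbook multiplication here is quadratic, which is the kind of operation accelerators usually target.

```python
def poly_add(a, b, q):
    """Coefficient-wise addition modulo q (O(n))."""
    return [(x + y) % q for x, y in zip(a, b)]


def poly_mul_negacyclic(a, b, q):
    """Schoolbook multiplication in Z_q[x]/(x^n + 1) (O(n^2)).

    Terms of degree >= n wrap around with a sign flip because x^n = -1.
    """
    n = len(a)
    result = [0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            k = i + j
            if k < n:
                result[k] = (result[k] + x * y) % q
            else:
                result[k - n] = (result[k - n] - x * y) % q
    return result


# Toy parameters (assumed): n = 4 coefficients, modulus q = 17.
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(poly_add(a, b, 17))
print(poly_mul_negacyclic(a, b, 17))
```

Practical implementations generally replace the quadratic loop with a number-theoretic transform and use far larger n and q than shown here.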
A dynamic per-context verification of kernel address integrity from external monitors
Lee, Hojoon,Kim, Minsu,Paek, Yunheung,Kang, Brent Byunghoon Elsevier 2018 Computers & security Vol.77 No.-
<P><B>Abstract</B></P> <P>The introduction of the <I>Address Translation Redirection Attack (ATRA)</I> has revealed a critical weakness in all existing hardware-based <I>external</I> kernel integrity monitors. The attack redefines the system's memory mappings in favor of the attacker so that the monitored kernel regions are relocated out of the monitor's sight. We provide a generalized approach and a prototype evaluation to validate our proposed scheme for ensuring the integrity of kernel address mapping from external monitors.</P> <P>With a slight modification to the hardware side of the host, we were able to enable the monitor to continuously trace the <I>Page Table Base Register (PTBR)</I> of the host, which is an essential capability in monitoring the integrity of the host memory mapping.</P> <P>In conjunction with this hardware feature, we incorporate our findings on the invariants of the kernel memory mapping patterns to implement a dynamic per-context page table monitoring scheme. Our experiment demonstrates the viability of our work in terms of its effectiveness against memory mapping manipulation attacks such as ATRA, and shows that it satisfies the time constraints required by the proposed per-context mapping verification scheme.</P>
Preprocessing Methods for Effective Modulo Scheduling on High Performance DSPs
Doosan Cho,Yunheung Paek Korean Institute of Information Scientists and Engineers 2007 Journal of KIISE: Software and Applications Vol.34 No.5
To achieve high resource utilization on multi-issue DSPs, production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, unduly restrict modulo scheduling freedom. As a result, the replicated functional units in multi-issue DSPs are often under-utilized. To address this resource under-utilization problem, this paper describes a novel compiler preprocessing strategy for effective modulo scheduling. The proposed preprocessing strategy relies on two new transformations, referred to as cloning and dismantling. The strategy has been implemented in, and validated on, a compiler for the StarCore SC140 DSP.
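The effect of cyclic dependences on modulo scheduling can be seen in the standard lower bound on the initiation interval, MII = max(ResMII, RecMII). The Python sketch below computes that bound. It does not implement the cloning or dismantling transformations from the abstract, and the operation counts, unit counts, and recurrence cycle are hypothetical numbers, not SC140 data.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def res_mii(op_counts, unit_counts):
    """Resource bound: operations per iteration divided by units of each kind."""
    return max(ceil_div(n, unit_counts[kind]) for kind, n in op_counts.items())


def rec_mii(cycles):
    """Recurrence bound: for each dependence cycle, latency / iteration distance."""
    return max((ceil_div(lat, dist) for lat, dist in cycles), default=1)


def min_initiation_interval(op_counts, unit_counts, cycles):
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))


# Hypothetical loop body: 6 ALU ops and 4 memory ops per iteration
# on a machine with 4 ALUs and 2 memory units.
ops = {"alu": 6, "mem": 4}
units = {"alu": 4, "mem": 2}
# One dependence cycle with total latency 7 spanning 1 iteration.
cycles = [(7, 1)]
print(res_mii(ops, units), rec_mii(cycles), min_initiation_interval(ops, units, cycles))
```

In this made-up case the resources alone would allow a new iteration every 2 cycles, but the recurrence forces an interval of 7, leaving most functional units idle. That is the under-utilization the abstract sets out to reduce.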
Jeonghun Cho,Yunheung Paek,Junsik Choi Korean Institute of Information Scientists and Engineers 2003 Proceedings of the KIISE Conference Vol.30 No.2Ⅰ
Virtually every digital signal processor (DSP) supports on-chip multi-memory banks that allow the processor to access multiple words of data from memory in a single instruction cycle. Also, all existing fixed-point DSPs have an irregular architecture with heterogeneous registers, which contains multiple register files that are distributed and dedicated to different sets of instructions. Although several studies have been conducted on efficiently assigning data to multi-memory banks, most of them assumed processors with relatively simple, homogeneous general-purpose registers. As a result, several vendor-provided compilers for DSPs have been unable to assign data efficiently to multiple data memory banks, and so often fail to generate highly optimized code for their machines. This paper presents an algorithm that helps the compiler assign data efficiently to multi-memory banks. Our algorithm differs from previous work in that it assigns variables to memory banks in separate, decoupled code generation phases, instead of a single, tightly coupled phase. The experimental results show that our decoupled algorithm greatly simplifies the code generation process; our compiler therefore runs extremely fast, yet generates target code that is comparable in quality to the code generated by a coupled approach.
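One simple way to picture the bank assignment problem is as a graph partition: variables that are often accessed together should land in different banks so that both can be fetched in one cycle. The Python sketch below is a greedy two-bank heuristic under that assumption. It is not the decoupled algorithm from the paper, and the function name, the weight table, and the variable names are hypothetical.

```python
def assign_banks(variables, pair_weights):
    """Greedy 2-bank assignment.

    pair_weights maps (u, v) tuples of distinct variables to how often
    the two are accessed together (assumed profile data).
    Returns a dict mapping each variable to bank 0 or 1.
    """
    # Total co-access weight per variable, used to place heavy variables first.
    total = {v: 0 for v in variables}
    for (u, v), w in pair_weights.items():
        total[u] += w
        total[v] += w

    bank = {}
    for var in sorted(variables, key=lambda x: (-total[x], x)):
        # cost[b] = weight of already-placed partners that sit in bank b.
        cost = [0, 0]
        for (u, v), w in pair_weights.items():
            if var == u and v in bank:
                cost[bank[v]] += w
            elif var == v and u in bank:
                cost[bank[u]] += w
        # Choose the bank that conflicts least with the placed partners.
        bank[var] = 0 if cost[0] <= cost[1] else 1
    return bank


weights = {("a", "b"): 5, ("b", "c"): 3, ("a", "c"): 1}
print(assign_banks(["a", "b", "c"], weights))
```

Here the heavily paired variables `a` and `b` end up in different banks, and `c` goes opposite `b`, its heavier partner.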