Cache Conscious Parallel Pattern Matching for Aho-Corasick Algorithm on a GPU
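쟌 느앗-프엉(Nhat-Phuong Tran), 이명호(Myungho Lee), 홍석원, 최동훈. Korean Institute of Next Generation Computing, 2012. The Journal of Korean Institute of Next Generation Computing, Vol.8 No.1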
Pattern matching is a common and important operation in many applications, including network security and bioinformatics. Among the many pattern matching algorithms, the Aho-Corasick (AC) algorithm is used intensively in these applications. To speed up the AC algorithm and meet real-time performance requirements, an efficient parallelization technique is essential. In this paper, we develop a new parallelization approach that caches both the input text data and the reference data, organized as a 2-dimensional table, in the on-chip memories (or caches) of the Graphics Processing Unit (GPU). The new approach also schedules memory accesses carefully to minimize the overhead of loading data into the on-chip shared memory. The approach significantly cuts the memory latency of loading the data and leads to a substantial performance improvement. Experimental results on an NVidia GT9500 GPU show up to a 15x speedup compared with a serial version on a 2.2 GHz Intel Core2Duo processor.
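To make the "reference data organized as a 2-dimensional table" concrete, the following is a minimal CPU-side sketch in Python of an AC automaton stored as a dense state-by-byte transition table. It is not the paper's GPU kernel: the function names `build_table` and `search`, the 256-column byte alphabet, and the output format are assumptions made for illustration. On a GPU, a table of this shape is the kind of read-only lookup data that would be staged in on-chip memory alongside a tile of the input text.

```python
from collections import deque

ALPHABET = 256  # assumption: one table column per byte value


def build_table(patterns):
    """Build a dense AC table so that table[state][byte] gives the next state."""
    table = [[0] * ALPHABET]  # row 0 is the root state
    outputs = [set()]         # patterns recognized on entering each state

    # Phase 1: insert every pattern as a trie path.
    # A 0 entry means "no edge yet", since the root is never a child.
    for p in patterns:
        state = 0
        for b in p:
            if table[state][b] == 0:
                table.append([0] * ALPHABET)
                outputs.append(set())
                table[state][b] = len(table) - 1
            state = table[state][b]
        outputs[state].add(p)

    # Phase 2: breadth-first pass that folds failure links into the table,
    # so matching needs exactly one lookup per input byte.
    fail = [0] * len(table)
    queue = deque(s for s in table[0] if s != 0)
    while queue:
        state = queue.popleft()
        for b in range(ALPHABET):
            nxt = table[state][b]
            if nxt != 0:
                fail[nxt] = table[fail[state]][b]
                outputs[nxt] |= outputs[fail[nxt]]
                queue.append(nxt)
            else:
                table[state][b] = table[fail[state]][b]
    return table, outputs


def search(text, table, outputs):
    """Return sorted (start_index, pattern) pairs for every match in text."""
    state = 0
    hits = []
    for i, b in enumerate(text):
        state = table[state][b]
        for p in outputs[state]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


table, outputs = build_table([b"he", b"she", b"his", b"hers"])
print(search(b"ushers", table, outputs))
```

Because failure transitions are precomputed into the table, the matching loop is a chain of table reads with no branching on failure links, which is why the memory latency of those reads dominates and caching the table matters.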
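Parallel Optimization for the Boyer-Moore Algorithm on a GPU
정요상(Yosang Jeong), 쟌 느앗-프엉(Nhat-Phuong Tran), 이명호(Myungho Lee), 남덕윤(Dukyun Nam), 김직수(Jik-Soo Kim), 황순욱(Soonwook Hwang). Korean Institute of Information Scientists and Engineers, 2015. KIISE Transactions on Computing Practices, Vol.21 No.2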
The Boyer-Moore (BM) algorithm is a single-pattern string matching algorithm that is widely used in applications such as computer and internet security and bioinformatics. Searching a massive amount of input data for one specific pattern in real time is computationally demanding, so parallel processing and performance optimization are essential. In this paper, we propose a parallelization and performance optimization methodology for the BM algorithm on a GPU. Our methodology adopts an algorithmic cascading technique, which significantly reduces the run-time mapping overhead for the threads participating in the parallel string matching. It also makes efficient use of the multithreading capability of the GPU, which improves the load balancing among threads. Our experimental results show that this approach achieves a speedup of up to 45 times compared with serial execution.
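As a rough illustration of how a single-pattern search can be split into independent pieces of work, the sketch below runs a simplified Boyer-Moore search (bad-character rule only, without the good-suffix rule) over fixed-size chunks of the input. It is a sequential Python sketch, not the authors' GPU method: the names `bm_search` and `chunked_search`, the `chunk_size` parameter, and the idea of one chunk per thread are assumptions. Each chunk reports only matches that start inside it and may read up to `len(pattern) - 1` bytes past its end, so matches that straddle a chunk boundary are not lost.

```python
def bad_char_table(pattern):
    """Map each byte to its last index in the pattern (bad-character rule)."""
    return {b: i for i, b in enumerate(pattern)}


def bm_search(text, pattern, start, end):
    """Return match positions p with start <= p < end.

    Simplified Boyer-Moore: only the bad-character rule is used.
    """
    m = len(pattern)
    last = bad_char_table(pattern)
    hits = []
    pos = start
    limit = min(end, len(text) - m + 1)
    while pos < limit:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
            pos += 1
        else:
            # Align the mismatched text byte with its last occurrence
            # in the pattern, always moving forward by at least one.
            pos += max(1, j - last.get(text[pos + j], -1))
    return hits


def chunked_search(text, pattern, chunk_size):
    """Search chunk by chunk; each chunk stands in for one thread's work."""
    hits = []
    for start in range(0, len(text), chunk_size):
        hits.extend(bm_search(text, pattern, start, start + chunk_size))
    return hits


text = b"abracadabra abracadabra"
print(chunked_search(text, b"abra", 5))  # [0, 7, 12, 19]
```

Having each worker scan a whole chunk sequentially, rather than one position per worker, is loosely in the spirit of the cascading idea mentioned in the abstract, but the chunk size and work assignment used in the paper are not given here.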