http://chineseinput.net/에서 pinyin(병음)방식으로 중국어를 변환할 수 있습니다.
변환된 중국어를 복사하여 사용하시면 됩니다.
Fast Computation of Rank and Select Functions for Succinct Representation
NA, Joong Chae,KIM, Ji Eun,PARK, Kunsoo,KIM, Dong Kyue The Institute of Electronics, Information and Comm 2009 IEICE transactions on information and systems Vol.92 No.10
<P>Succinct representation is a space-efficient method to represent <I>n</I> discrete objects using space close to the information-theoretic lower bound. In order to directly access the <I>i</I>th object of succinctly represented data structures in constant time, two fundamental functions, rank and select, are commonly used. In this paper we propose two implementations supporting rank and select in constant time for non-compressed bit strings. One uses $O(n \\log \\log n / \\sqrt{\\log n})$bits of extra space and the other uses $n+O(n \\log \\log n / \\log n)$bits of extra space in the worst case. The former is rather a theoretical algorithm and the latter is a practical algorithm which works faster and uses less space in practice.</P>
A new graph model and algorithms for consistent superstring problems
Na, Joong Chae,Cho, Sukhyeun,Choi, Siwon,Kim, Jin Wook,Park, Kunsoo,Sim, Jeong Seop The Royal Society 2014 Philosophical transactions. Series A, Mathematical Vol.372 No.2016
Problems related to string inclusion and non-inclusion have been vigorously studied in diverse fields such as data compression, molecular biology and computer security. Given a finite set of positive strings P and a finite set of negative strings N, a string alpha is a consistent superstring if every positive string is a substring of alpha and no negative string is a substring of alpha. The shortest (resp. longest) consistent superstring problem is to find a string alpha that is the shortest (resp. longest) among all the consistent superstrings for the given sets of strings. In this paper, we first propose a new graph model for consistent superstrings for given P and N. In our graph model, the set of strings represented by paths satisfying some conditions is the same as the set of consistent superstrings for P and N. We also present algorithms for the shortest and the longest consistent superstring problems. Our algorithms solve the consistent superstring problems for all cases, including cases that are not considered in previous work. Moreover, our algorithms solve in polynomial time the consistent superstring problems for more cases than the previous algorithms. For the polynomially solvable cases, our algorithms are more efficient than the previous ones.
Na, Joong Chae,Park, Kunsoo Elsevier 2007 Theoretical computer science Vol.385 No.1-3
<P><B>Abstract</B></P><P>The <I>suffix array</I> is a fundamental index data structure in string algorithms and bioinformatics, and the <I>compressed suffix array (CSA)</I> and the <I>FM-index</I> are its compressed versions. Many algorithms for constructing these index data structures have been developed. Recently, Hon et al. [W.K. Hon, K. Sadakane, W.K. Sung, Breaking a time-and-space barrier in constructing full-text indices, in: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, 2003, pp. 251–260] proposed a construction algorithm using O(nloglog|Σ|) time and O(nlog|Σ|)-bit working space, which is the fastest algorithm using O(nlog|Σ|)-bit working space.</P><P>In this paper we give an efficient algorithm to construct the index data structures. Our algorithm constructs the suffix array, the CSA, the FM-index, and Burrows–Wheeler transform using alphabet-independent O(n) time and O(nlog|Σ|log|Σ|αn)-bit working space, where α=<SUB>log3</SUB>2. Our algorithm takes less time and more space than Hon et al.’s algorithm. Our algorithm uses least working space among alphabet-independent linear-time algorithms.</P>
FM-index of alignment with gaps
Na, Joong Chae,Kim, Hyunjoon,Min, Seunghwan,Park, Heejin,Lecroq, Thierry,Lé,onard, Martine,Mouchard, Laurent,Park, Kunsoo Elsevier 2018 Theoretical computer science Vol.710 No.-
<P><B>Abstract</B></P> <P>Recently a compressed index for similar strings, called the <I>FM-index of alignment</I> (FMA), has been proposed with the functionalities of pattern search and random access. The FMA is quite efficient in space requirement and pattern search time, but it is applicable only for an alignment of strings without gaps. In this paper we propose the <I>FM-index of alignment with gaps</I>, a realistic index for similar strings, which allows gaps in their alignment. For this, we design a new version of the suffix array of alignment by using an alignment transformation and a new definition of the alignment-suffix. The new suffix array of alignment enables us to support the LF-mapping and backward search, the key functionalities of the FM-index, regardless of gap existence in the alignment. We experimentally compared our index with RLCSA due to Mäkinen et al. and related indexes GCSA due to Sirén et al. and GCSA2 due to Sirén on genome sequences from the 1000 Genomes Project. The index size of our index is smaller than those of other indexes.</P>
나중채 ( Joong Chae Na ) 한국정보처리학회 2009 한국정보처리학회 학술대회논문집 Vol.16 No.1
접미사 트리는 주어진 하나의 문자열의 모든 접미사를 표현하는 트리로, 문자열 처리, 압축 등 다양한 분야에서 활용된다. 접미사 트리는 문자열 집합에 대한 자료구조로 확장될 수 있는데, 이를 일반화된 접미사 트리라 부른다. 본 논문에서는 일반화된 접미사 트리를 동반적이면서 온라인으로 생성하는 문제를 다룬다. 기존의 생성 알고리즘은 정방향의 문자열이 아닌 역방향의 문자열들에 대한 일반화된 접미사 트리를 생성하여, 부자연스럽다. 본 논문에서는 정방향 문자열들의 일반화된 접미사 트리를 동반적이면서 온라인으로 생성하는 알고리즘을 제시한다.
품질 정보를 이용한 서열 배치 알고리즘 (pp.578-587)
나중채(Joong Chae Na),노강호(Kangho Roh),박근수(Kunsoo Park) 한국정보과학회 2005 정보과학회논문지 : 시스템 및 이론 Vol.32 No.11·12
본 논문에서 다루는 문제는 품질 정보를 가지는 서열을 배치(alignment)하는 알고리즘이다. 시퀀싱(sequencing) 작업의 일부인 염기 결정 프로그램(base-calling program)에 의해서 생성되는 DNA 서열은 각 염기가 어느 정도 신뢰할 수 있는 가를 나타내는 품질 정보를 가진다. 그러나 지금까지 개발된 서열 배치 알고리즘들은 이러한 품질 정보를 고려하지 않았다. 본 논문에서는 품질 정보를 가지는 두 서열의 배치를 평가하는 기준을 제시한다. 이 평가 기준에 의한 최적의 서열 배치는 동적 프로그래밍(dynamic programming) 기법에 의해서 찾을 수 있다. In this paper we consider the problem of sequence alignment with quality scores. DNA sequences produced by a base-calling program (as part of sequencing) have quality scores which represent the confidence level for individual bases. However, previous sequence alignment algorithms do not consider such quality scores. To solve sequence alignment with quality scores, we propose a measure of an alignment of two sequences with quality scores. We show that an optimal alignment in this measure can be found by dynamic programming.
나중채 ( Joong Chae Na ) 한국정보처리학회 2009 한국정보처리학회 학술대회논문집 Vol.16 No.1
절단 접미사 트리(truncated suffix tree)는 접미사 트리의 절단 버전으로, 주어진 문자열의 부분 문자열중 일정 길이 이하인 것들만을 표현하는 자료구조이다. 절단 접미사 트리는 일정 길이 이하의 문자열들만을 고려하는 응용에 유용한데, 특히 LZ77 압축과 같이 온라인 생성 알고리즘이 필요한 응용들도 있다. 본 논문에서는 절단 접미사 트리를 온라인으로 생성하는 새로운 알고리즘을 제시한다.