Analysis of k-mer with NGS sequencing data for genome assembly (유전체 서열 조합을 위한 차세대 염기서열 자료의 k-mer 분석)

Korean Abstract

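DNA sequencing is the most basic step in interpreting genome information and is important for understanding an organism. With the recent development of next-generation sequencing (NGS), large amounts of genome information can now be obtained cheaply and quickly. Genome information obtained by NGS has high coverage relative to the actual genome size, but its accuracy is a problem. Reads are therefore put through a trimming step, which cuts part of a low-accuracy read or discards the whole read, and a correction step, which finds erroneous bases within a read and replaces them with the correct ones. The resulting higher-accuracy data are then used for genome assembly. k-mer analysis is a method of analyzing genome information using statistics of subsequences of length k. It can be used to judge whether trimming and correction were done well and to predict genome size, which reduces the time and cost of genome assembly.
In this study, the trend of the k-mer distribution was examined using the genome sequences of the prokaryote Escherichia coli and the eukaryote Saccharomyces cerevisiae, and the k-mer distributions of the rice genome sequences IRGSP 1.0 and BGI 93-11 were also examined. On the basis of these results, k-mer analysis was applied to NGS data of the domestic (Korean) Ilmi rice before trimming, after trimming, and after both trimming and correction, in order to check accuracy and predict genome size.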
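
As an illustration of the k-mer statistics described above, the following is a minimal Python sketch and not the programs used in the thesis. The function names `count_kmers` and `kmer_spectrum` and the toy reads are assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers that contain an ambiguous base call.
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map each occurrence count (depth) to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


# Toy reads for illustration only; not data from the thesis.
reads = ["ACGTACGT", "CGTACGTA"]
counts = count_kmers(reads, 4)
spectrum = kmer_spectrum(counts)
```

In real NGS data, k-mers created by sequencing errors tend to appear only once or twice, so the low-depth end of the spectrum is one place where the effect of trimming and correction can be seen.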


Multilingual Abstract

DNA sequencing is the basic process of interpreting genome information and is important for understanding an organism. Next-generation sequencing (NGS) now provides genome information much faster and more cheaply than conventional sequencing. Genome information from NGS has high coverage relative to the actual genome size, but its accuracy is an issue. The data therefore have to be pre-processed by trimming and correction before they are used for genome assembly. Trimming removes low-quality bases or whole reads, and correction replaces erroneous bases. Before assembly, k-mer (subsequence of length k) analysis can verify the accuracy of the pre-processed genome information. k-mer analysis is a method that uses statistics of the number of k-mer occurrences. It can serve as a criterion for determining the appropriate level of trimming and correction, and genome size can be estimated by analyzing the k-mer distribution. k-mer analysis of genome information can thus reduce the labor and time consumed during genome assembly. In this study, k-mer distribution trends were analyzed using the genome sequences of a prokaryote (Escherichia coli), a eukaryote (Saccharomyces cerevisiae), and rice (Oryza sativa, IRGSP 1.0 and BGI 93-11). On the basis of these results, NGS data of Ilmi rice were analyzed in untrimmed, trimmed, and trimmed-and-corrected form to assess accuracy and estimate genome size.
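
One common way to estimate genome size from a k-mer distribution is to divide the total number of k-mers by the depth at the main peak of the distribution. The sketch below assumes this approach. The abstract does not state the exact formula used in the thesis, and `estimate_genome_size`, `min_depth`, and the toy spectrum are hypothetical names and values chosen for illustration.

```python
def estimate_genome_size(spectrum, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak depth).

    spectrum maps a depth (occurrence count) to the number of distinct
    k-mers observed at that depth. Depths below min_depth are ignored
    on the assumption that they are dominated by sequencing errors.
    """
    solid = {d: n for d, n in spectrum.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Peak depth: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(solid, key=lambda d: solid[d])
    # Total k-mer occurrences among the retained depths.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy spectrum for illustration only; not data from the thesis.
toy_spectrum = {1: 500, 9: 10, 10: 80, 11: 10}
size, peak = estimate_genome_size(toy_spectrum)
```

The choice of `min_depth` is a judgment call: setting it too low lets error k-mers inflate the estimate, and setting it too high discards genuine low-coverage k-mers.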


Table of Contents

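Introduction
Materials and Methods
1. Acquisition of genome sequence data
2. k-mer
3. Programs used
4. Genome size measurement
Results
1. Statistical analysis of k-mer frequency in model genome sequences
2. Comparison of Ilmi rice NGS data after trimming and correction
3. Data selection according to the degree of high quality in the trimming step
4. Genome size prediction for Ilmi rice NGS data using k-mer frequency
Conclusion
References
Abstract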
