단일염기변이 데이터상에서의 효율적인 분류 방법 : An Efficient Classification for Single Nucleotide Polymorphism (SNP) Dataset

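Korean Abstract

Recently, single nucleotide polymorphisms (SNPs), a form of genetic variation, have drawn attention in connection with complex diseases. Various machine learning techniques have been applied to SNP data. However, because of the data format and the very large number of features, SNP analysis is a complicated task. This work proposes an effective method for classifying SNP data. Its aim is to find an effective way to analyze SNP data by combining several existing techniques. The experiments were carried out on four SNP datasets from NCBI GEO (Gene Expression Omnibus). The analysis consists of three stages: first, the feature dimension is reduced and informative SNPs are selected; next, a feature is generated from the selected SNPs; finally, classification and validation are performed. The first stage uses four feature selection algorithms: ReliefF, FSDD, RFS, and CBFS. The new data are generated with the Feature fusion and R-value methods proposed in earlier papers. SVM, KNN, and Alpha are used as learning algorithms. The experimental results are presented in tables and charts. The proposed approach proved effective at distinguishing the two groups; accuracy was high in most cases and sometimes reached 100%.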

Multilingual Abstract

Recently, the single nucleotide polymorphism (SNP), a unit of genetic variation, has attracted much attention because of its association with complex diseases. Various machine learning techniques have been applied to SNP data to distinguish individuals affected by a disease from healthy ones or to predict their predisposition. However, because of its data format and enormous feature space, SNP analysis is a complicated task. This research proposes an efficient method to facilitate SNP data classification. The aim was to find the most effective way of analyzing SNP data by combining various existing techniques.
The experiment was conducted on four SNP datasets obtained from the NCBI Gene Expression Omnibus (GEO) website: two come from patients with mental disorders and their healthy parents, and the other two are cancer-related. The analysis process consists of three stages: first, reduction of the feature space and selection of informative SNPs; next, generation of an artificial feature from the selected SNPs; and finally, classification and validation. For the first stage, we used four feature selection algorithms: ReliefF, Distance Discriminant Feature Selection (FSDD), R-value based Feature Selection (RFS), and the Algorithm based on Feature Clearness (CBFS). To construct the new data, we used the Feature Fusion and R-value evaluation methods proposed in our previous work. As learning algorithms, we employed Support Vector Machines (SVM), k-Nearest Neighbor (KNN), and Alpha. The performance of all methods is compared in tables and charts. The proposed approach proved effective, distinguishing the two groups of individuals with high accuracy, in some cases reaching 100%.
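The three-stage process described above can be illustrated with a small sketch. It is not the authors' implementation: the data are a deterministic toy matrix rather than GEO data, the scoring function is basic Relief (a simpler relative of ReliefF), and `fuse_placeholder` is a hypothetical stand-in for the Feature Fusion Method, whose details the abstract does not give. All function names are assumptions made for illustration.

```python
from collections import Counter


def make_toy_snp_data(n_per_class=10, n_snps=6, informative=(1, 4)):
    """Deterministic toy genotypes (0/1/2 = minor-allele count). Not real data."""
    X, y = [], []
    for i in range(2 * n_per_class):
        label = 0 if i < n_per_class else 1  # 0 = control, 1 = case
        row = []
        for j in range(n_snps):
            if j in informative:
                row.append(2 if label == 1 else 0)  # planted signal
            else:
                row.append((i // (j + 1)) % 3)  # arbitrary filler pattern
        X.append(row)
        y.append(label)
    return X, y


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def relief_scores(X, y):
    """Basic Relief: reward features that differ on the nearest miss
    (other class) and agree on the nearest hit (same class)."""
    w = [0.0] * len(X[0])
    for i, xi in enumerate(X):
        hits = [x for k, x in enumerate(X) if k != i and y[k] == y[i]]
        misses = [x for k, x in enumerate(X) if y[k] != y[i]]
        hit = min(hits, key=lambda x: manhattan(xi, x))
        miss = min(misses, key=lambda x: manhattan(xi, x))
        for j in range(len(w)):
            w[j] += abs(xi[j] - miss[j]) - abs(xi[j] - hit[j])
    return w


def select_top_k(scores, k):
    """Stage 1: indices of the k highest-scoring SNPs."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(order[:k])


def fuse_placeholder(X, selected):
    """Stage 2 stand-in (NOT the authors' method): keep the selected SNPs
    and append their sum as one artificial feature."""
    return [[r[j] for j in selected] + [sum(r[j] for j in selected)] for r in X]


def knn_predict(train_X, train_y, x, k=3):
    nearest = sorted(range(len(train_X)),
                     key=lambda i: manhattan(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Stage 3: kNN classification validated by leave-one-out."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)


X, y = make_toy_snp_data()
selected = select_top_k(relief_scores(X, y), k=2)
X_new = fuse_placeholder(X, selected)
accuracy = leave_one_out_accuracy(X_new, y)
print("selected SNP indices:", selected)
print("leave-one-out accuracy:", accuracy)
```

Because the two informative SNPs are planted with perfect class separation, this toy run says nothing about performance on real genotype data; it only shows how selection, feature generation, and validated classification chain together.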

Table of Contents

1. Introduction 1
2. Related work 3
2.1 Feature Selection 3
2.1.1 ReliefF 4
2.1.2 Feature Selection based on Distance Discriminant (FSDD) 4
2.1.3 Feature Selection based on R-value (RFS) 5
2.1.4 Algorithm based on Feature Clearness (CBFS) 5
2.2 Classification 6
2.2.1 K-Nearest Neighbor (KNN) 6
2.2.2 Support Vector Machines (SVM) 6
2.2.3 Artificial Gene Making Method (AGM/Alpha) 7
2.3 Feature Fusion 8
3. Methods and Datasets 9
3.1 Datasets 9
3.2 Methods 11
3.2.1 Feature Fusion Method (FFM) 11
3.2.2 R-value Evaluation 12
4. Results 14
4.1 GSE16125 series 14
4.2 GSE16619 series 18
4.3 GSE13117 series 21
4.4 GSE9222 series 25
5. Discussion and Conclusion 28
6. References 30
Appendix 1 Result Tables 32
