RISS Search - Thesis Details

Abstract (Korean)

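Big data technology refers to techniques for extracting value from large volumes of structured or unstructured data and analyzing the results. Big data has many application areas, but this thesis focuses on big data retrieval. A representative example is Google's speech recognition service. By analyzing the correlation between data volume and retrieval accuracy, Google has shown that the more data is available, the better the speech recognition service it can provide. In line with this trend, this thesis approaches big data retrieval by interpreting it as a one-class problem.
Unlike two-class or n-class problems, which require positive and negative data, a one-class problem measures the distribution of the given training data and finds the representation that best fits that distribution. The one-class support vector machine (SVM) is a well-known approach to this problem, and representative work includes the standard one-class SVM and the LS (least squares) one-class SVM.
The LS one-class SVM performs well as a proximity measure, but it requires computing a matrix inverse, so it is computationally expensive, and for big data the computation becomes infeasible. In contrast, the standard one-class SVM can be computed quickly, and its computation is independent across training data, which makes parallel processing possible; however, it only finds the optimal boundary for the training data and does not consider the data distribution inside the boundary. This property suits classification and novelty detection, but as a proximity measure the standard one-class SVM performs worse than the LS one-class SVM.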
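For orientation, the two models are commonly written as follows in the literature. This is a sketch under assumptions: the symbols (w, ρ, ξ, ν, C, φ, K, α) are the usual textbook notation and are not taken from this record, and the thesis may use a different but equivalent formulation. The inequality constraint of the standard model means that only boundary points end up with nonzero multipliers, while the equality constraint of the LS model involves every training point and leads to a dense (n+1) × (n+1) linear system, which is where the matrix inversion comes from.

```latex
% Standard (hyperplane-based) one-class SVM: inequality constraints
\min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad \text{s.t.}\quad w^{\top}\phi(x_i) \ge \rho - \xi_i,\quad \xi_i \ge 0

% LS one-class SVM: equality constraints with squared slack
\min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\lVert w\rVert^{2}
  - \rho + \frac{C}{2}\sum_{i=1}^{n}\xi_i^{2}
\quad \text{s.t.}\quad w^{\top}\phi(x_i) = \rho - \xi_i

% Optimality conditions of the LS model (w = \sum_i \alpha_i \phi(x_i))
\begin{bmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & K + C^{-1} I \end{bmatrix}
\begin{bmatrix} -\rho \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 1 \\ \mathbf{0} \end{bmatrix},
\qquad K_{ij} = k(x_i, x_j)
```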
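This thesis proposes a new one-class SVM, the DH (dual hyperplane) one-class SVM, that uses the computation scheme of the standard one-class SVM while approaching the proximity-measure performance of the LS one-class SVM. The proposed one-class SVM represents the minimal region containing the data with two parallel hyperplanes and uses the plane midway between them as the reference for proximity measurement, thereby achieving performance close to that of the LS one-class SVM. In addition, by designing the implementation of the DH one-class SVM for the MapReduce and BSP (Bulk Synchronous Parallel) models, the algorithm gains the scalability to process big data in parallel in a distributed environment. In experiments, the classification performance of the proposed algorithm was lower than that of the standard one-class SVM, but its proximity-measure performance was better than that of the standard one-class SVM, kernel mean, and kernel PCA, and close to that of the LS one-class SVM.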
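The map/reduce pattern mentioned above can be illustrated on the simplest parallel piece of a kernel method: a score of the form f(x) = Σ αᵢ k(xᵢ, x) is a sum over training points, so each worker can sum over its own shard and a reducer can add the partial results. The sketch below is only an illustration of that pattern in plain Python; it is not the thesis's MapReduce or BSP design, and the function names, the RBF kernel, the `gamma` value, and the data are all hypothetical.

```python
import math
from functools import reduce


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def map_partial_score(shard, query):
    """Map step: one worker sums alpha_i * k(x_i, query) over its own shard."""
    return sum(alpha * rbf(x, query) for x, alpha in shard)


def reduce_scores(partials):
    """Reduce step: add the per-shard partial sums into a single score."""
    return reduce(lambda a, b: a + b, partials, 0.0)


# Hypothetical (point, alpha) pairs, split across two workers.
shards = [
    [((0.0, 0.0), 0.25), ((1.0, 0.0), 0.25)],
    [((0.0, 1.0), 0.25), ((1.0, 1.0), 0.25)],
]
query = (0.5, 0.5)

score = reduce_scores(map_partial_score(shard, query) for shard in shards)
print(score)
```

Because the partial sums are independent, the sharded result matches a single pass over all training points up to floating-point rounding.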

Multilingual Abstract

Unlike two-class or multi-class classification problems, which use positive and negative samples, the one-class classification problem is to build a description of a set of training objects and to detect which objects resemble this training set. Many studies address this problem, and the one-class SVM (support vector machine) is a well-known approach. There are two different approaches to the one-class SVM: one is the standard one-class SVM, and the other is the LS (least squares) one-class SVM.

The performance of the LS one-class SVM on relevance ranking is superior to that of traditional methods, including the standard one-class SVM. However, the LS one-class SVM is expensive and difficult to compute because it requires a matrix inversion. The standard one-class SVM, on the other hand, allows fast, parallel computation to extract the regions where a certain fraction of the training objects may lie. As a proximity measure, however, the standard one-class SVM performs worse than the LS one-class SVM. This is because, in the standard one-class SVM, the training objects inside the regions may not contribute to the construction of the regions.

In this paper, we reformulate the standard one-class SVM and derive a new method, called the DH (dual hyperplane) one-class SVM, that keeps the computation of the standard one-class SVM while approximating the performance of the LS one-class SVM. The DH one-class SVM extracts two parallel hyperplanes in a kernel feature space such that a given fraction of the training objects may reside between the two hyperplanes, while at the same time the hyperplanes have maximal distance to the origin and minimal gap. We demonstrate the performance of the DH one-class SVM on relevance ranking with positive examples, and also present a comparison with traditional methods, including the standard one-class SVM and the LS one-class SVM. We also demonstrate the scalability of the DH one-class SVM with big data on distributed systems by implementing it in the MapReduce and BSP (bulk synchronous parallel) programming models. The experimental results indicate the efficacy of the DH one-class SVM.
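One possible reading of the mid-plane idea can be sketched as follows. The abstract does not give the DH objective or its decision function, so everything here is an assumption made for illustration: a linear kernel, an already-trained normal vector `w`, two offsets `rho_lower` and `rho_upper` for the parallel hyperplanes, and a score defined as the negative distance of w·x from the midpoint of the two offsets. The names `midplane_proximity` and `inside_slab` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors (linear kernel, assumed)."""
    return sum(a * b for a, b in zip(u, v))


def midplane_proximity(x, w, rho_lower, rho_upper):
    """Score a point by how close w.x lies to the plane midway between the
    two parallel hyperplanes; a higher (less negative) score means closer."""
    mid = 0.5 * (rho_lower + rho_upper)
    return -abs(dot(w, x) - mid)


def inside_slab(x, w, rho_lower, rho_upper):
    """True if the point lies between the two hyperplanes (inclusive)."""
    return rho_lower <= dot(w, x) <= rho_upper


# Hypothetical trained parameters and candidate items to rank.
w = (1.0, 0.0)
rho_lower, rho_upper = 1.0, 3.0
candidates = {"a": (2.0, 5.0), "b": (2.9, 0.0), "c": (4.0, 1.0)}

ranking = sorted(
    candidates,
    key=lambda name: midplane_proximity(candidates[name], w, rho_lower, rho_upper),
    reverse=True,
)
print(ranking)
```

In this toy setup the slab test gives only a yes/no answer, while the mid-plane score also orders the points that fall inside the slab, which is the property a relevance ranking needs.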

Table of Contents

Abstract i
Table of Contents iii
List of Figures v
List of Tables viii
List of Abbreviations x

Chapter 1 Introduction 1
Chapter 2 One-Class Support Vector Machine 5
2.1 Sphere-based one-class SVM 7
2.2 Hyperplane-based one-class SVM 10
2.3 One-class optimization 13
2.3.1 Optimization using SMO 13
2.3.2 Optimization using ISMO 15
Chapter 3 One-Class SVM for Retrieval 18
3.1 LS (least squares) one-class SVM 19
3.2 LS one-class SVM for proximity measure 22
3.2.1 LS one-class SVM experiment using synthetic data 22
3.2.2 Proximity measure experiment using text data 24
Chapter 4 One-Class SVM for Video Summarization 28
4.1 Importance-based fuzzy one-class SVM 29
4.2 Optimization of the importance-based fuzzy one-class SVM 32
4.3 Importance-based fuzzy one-class SVM experiments 34
4.4 Video summarization using the importance-based fuzzy one-class SVM 37
4.4.1 Importance measurement 37
4.4.2 Scalable video summarization 40
4.4.3 Video summarization experiments 41
Chapter 5 Dual-Hyperplane-Based One-Class SVM 47
5.1 Dual Hyperplane one-class SVM 48
5.2 Optimization of the DH one-class SVM using SMO 53
5.3 Optimization of the DH one-class SVM using SMO 55
5.4 DH one-class SVM experiments 59
5.4.1 DH one-class SVM experiment using synthetic data 59
5.4.2 Classification using the DH one-class SVM 60
5.4.3 DH one-class SVM for proximity measure 65
Chapter 6 Parallel Computation of One-Class SVM for Distributed Environments 69
6.1 MapReduce design 69
6.2 BSP (Bulk Synchronous Parallel) design 74
6.3 Large-scale data experiments 78
Chapter 7 Conclusion 80
References 82

빅 데이터 검색을 위한 원 클래스 서포트 벡터 머신 = Large Scale One-Class Support Vector Machine for Big Data Retrieval
