
Abstract (translated from Korean)


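The Rand index (Rand, 1971) measures the agreement between two clustering results in the data-splitting approach to evaluating the reproducibility of a clustering, but it applies only to clusterings in which each unit is assigned unambiguously to a single cluster. Fuzzy K-means clustering, the subject of this study, instead reports a degree of membership of each unit in every cluster, so the Rand index cannot be used in its original form.
The aim of this study is to extend the Rand index so that it can be used to assess agreement between fuzzy K-means clustering results. The proposed method is summarized as follows.
1) Apply the fuzzy K-means clustering rule obtained from a training data set to each unit of the test data to obtain K (= number of clusters) fuzzy memberships. Apply the fuzzy K-means clustering rule obtained from another, independent training data set to the same test units to obtain a second set of K fuzzy memberships.
2) Under each fuzzy clustering rule, defuzzify by randomly and independently assigning each test unit to one of the K clusters with probability proportional to its cluster memberships, which produces a hard partition.
3) Compute the usual Rand index (or the corrected Rand index of Hubert and Arabie (1985)) from the two corresponding hard partitions.
4) Repeat the preceding two steps a set number of times to obtain a Monte Carlo distribution of the Rand index. The mean of this distribution is defined as the extended Rand index.
The extended Rand index can be used to choose the number of clusters K in fuzzy K-means clustering. Several application examples are presented and discussed.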
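
Steps 2) to 4) can be written compactly in symbols. The notation below (n, u_{ik}, v_{ik}, z_i, w_i, B, ERI) is introduced here for illustration only and is not taken from the paper; the Rand index is written in its usual pair-agreement form.

```latex
% Notation assumed for illustration (not from the paper):
%   n               number of test units
%   u_{ik}, v_{ik}  membership of unit i in cluster k under rule 1 and rule 2
%   z_i, w_i        hard labels drawn for unit i under rule 1 and rule 2
%   B               number of Monte Carlo repetitions
\begin{align*}
% Step 2: defuzzification, drawn independently for each unit and each rule
\Pr\bigl(z_i^{(b)} = k\bigr) &= \frac{u_{ik}}{\sum_{l=1}^{K} u_{il}},
\qquad
\Pr\bigl(w_i^{(b)} = k\bigr) = \frac{v_{ik}}{\sum_{l=1}^{K} v_{il}}, \\
% Step 3: Rand index = share of unit pairs on which the two hard partitions agree
R^{(b)} &= \binom{n}{2}^{-1} \sum_{i<j}
  \mathbf{1}\Bigl[\,\mathbf{1}\bigl(z_i^{(b)} = z_j^{(b)}\bigr)
                 = \mathbf{1}\bigl(w_i^{(b)} = w_j^{(b)}\bigr)\Bigr], \\
% Step 4: Monte Carlo mean over B repetitions
\mathrm{ERI} &= \frac{1}{B} \sum_{b=1}^{B} R^{(b)}.
\end{align*}
```

Because the Rand index depends only on whether pairs of units share a cluster, the cluster labels of the two rules do not need to be matched to each other.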

Multilingual Abstract

The Rand index (Rand, 1971) measures the agreement between two clustering rules, so it can be used to assess whether clustering patterns are likely to be reproduced on future data. The index cannot, however, be applied directly to fuzzy K-means clustering, which has clear merits in dealing with overlapping clusters, because each unit receives a degree of membership in every cluster rather than a single cluster label.
The aim of this study is to extend the Rand index, or the corrected Rand index of Hubert and Arabie (1985), for use in fuzzy K-means clustering. The proposed method can be summarized as follows:
Step 1: Partition the data into three parts: two training samples and one test sample. Derive a fuzzy K-means clustering rule from the first training sample and another rule from the second training sample. Then apply both rules separately to the test sample units to obtain, for each unit, a pair of cluster membership vectors.
Step 2: Defuzzify, that is, reverse the fuzzification: generate a pair of hard partitions by randomly assigning each test unit to one cluster with probability proportional to its memberships under the respective fuzzy partition.
Step 3: Compute the Rand index, or the corrected Rand index of Hubert and Arabie (1985), from the pair of hard partitions.
Step 4: Repeat Steps 2 and 3 a sufficient number of times to obtain a batch of Rand indices. Define the Extended Rand Index as the average of these Rand indices.
The Extended Rand Index may be used to determine the number of clusters K in fuzzy K-means clustering. Several examples are given.
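
Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy and assuming the two membership matrices from Step 1 are already available as arrays with one row per test unit and one column per cluster. The function names (`rand_index`, `defuzzify`, `extended_rand_index`) and the parameters `n_rep` and `seed` are hypothetical and are not taken from the paper. The sketch uses the plain Rand index; the corrected index could be substituted in the same place.

```python
import numpy as np


def rand_index(labels_a, labels_b):
    """Plain Rand index: share of unit pairs on which two hard partitions agree.

    Assumes at least two units.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    same_a = a[:, None] == a[None, :]      # pair shares a cluster in partition A
    same_b = b[:, None] == b[None, :]      # pair shares a cluster in partition B
    upper = np.triu_indices(a.shape[0], k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


def defuzzify(memberships, rng):
    """Step 2: draw one hard label per unit, proportional to its memberships."""
    u = np.asarray(memberships, dtype=float)
    probs = u / u.sum(axis=1, keepdims=True)
    cum = np.cumsum(probs, axis=1)
    cum[:, -1] = 1.0                       # guard against rounding error
    draws = rng.random((u.shape[0], 1))    # one uniform draw per unit
    return (draws < cum).argmax(axis=1)    # first cluster whose cumulative prob exceeds the draw


def extended_rand_index(u1, u2, n_rep=200, seed=0):
    """Steps 2 to 4: Monte Carlo mean of the Rand index over repeated defuzzification."""
    rng = np.random.default_rng(seed)
    values = [
        rand_index(defuzzify(u1, rng), defuzzify(u2, rng))  # independent draws per rule
        for _ in range(n_rep)
    ]
    return float(np.mean(values))


if __name__ == "__main__":
    # Made-up memberships for five test units and K = 2, for illustration only.
    u1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.2, 0.8], [0.1, 0.9]])
    u2 = np.array([[0.2, 0.8], [0.1, 0.9], [0.4, 0.6], [0.9, 0.1], [0.7, 0.3]])
    print(extended_rand_index(u1, u2))
```

To use this for choosing K, one would repeat Step 1 for each candidate K, compute the index for each, and compare the values; how the comparison is made is left to the paper's examples.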


Title: Reproducibility Evaluation of Fuzzy K-Means Clustering (Korean title: 퍼지 K-평균 군집화의 재현성 평가)
