A Study on Keyword Extraction From a Single Document Using Term Clustering (용어 클러스터링을 이용한 단일문서 키워드 추출에 관한 연구)

Multilingual Abstract

In this study, a new keyword extraction algorithm based on term clustering is applied to a single document. The document is divided into multiple passages, and two ways of calculating the similarity between two terms are investigated: first-order similarity and second-order distributional similarity. In the first experiment, the best clustering performance was achieved with 50-term passages using second-order distributional similarity. Based on this result, second-order distributional similarity was then applied to various keyword extraction methods that use statistical information about terms. In the second experiment, paragraph frequency and term frequency × inverse paragraph frequency produced the best keyword extraction results. These results show that the algorithm satisfies the two conditions that good keywords should meet: topicality and an even frequency distribution.
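
The contrast between the two similarity measures can be illustrated with a small sketch. This is a minimal Python illustration under stated assumptions, not the authors' implementation: passages are assumed to be consecutive, non-overlapping windows of a fixed number of terms (50 by default, following the passage size reported above); first-order similarity is assumed to be the cosine between two terms' frequency vectors over passages; and second-order similarity is assumed to be the cosine between the terms' first-order similarity profiles against all other terms. All function names are hypothetical, and the exact measures used in the study may differ.

```python
import math
from collections import Counter


def split_passages(tokens, size=50):
    """Split a token list into consecutive, non-overlapping passages of `size` terms (assumed scheme)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def term_passage_vectors(passages):
    """Map each term to its frequency vector over the passages."""
    vocab = sorted({t for p in passages for t in p})
    counts = [Counter(p) for p in passages]
    return {t: [c[t] for c in counts] for t in vocab}


def cosine(u, v):
    """Cosine of two equal-length numeric vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def first_order(vectors):
    """First-order similarity (assumed cosine): terms are similar if they share passages."""
    terms = list(vectors)
    return {a: {b: cosine(vectors[a], vectors[b]) for b in terms} for a in terms}


def second_order(sim1):
    """Second-order similarity (assumed form): compare each term's first-order
    profile against all terms, with self-similarity zeroed out."""
    terms = list(sim1)
    profile = {a: [0.0 if a == b else sim1[a][b] for b in terms] for a in terms}
    return {a: {b: cosine(profile[a], profile[b]) for b in terms} for a in terms}


# Usage: two-term passages -> [a b] [a c] [d e]
tokens = "a b a c d e".split()
s1 = first_order(term_passage_vectors(split_passages(tokens, size=2)))
s2 = second_order(s1)
print(s1["b"]["c"], round(s2["b"]["c"], 6))  # 0.0 1.0
```

Here `b` and `c` never share a passage, so their first-order similarity is 0, yet both co-occur with `a`, so their second-order similarity is 1. This indirect association is what a second-order measure can capture and a first-order measure cannot.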

Korean Abstract

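This study proposes an algorithm that extracts keywords from a single document using term clustering. A single document was divided into paragraph units, and term clustering was performed using first-order similarity and second-order distributional similarity; the best performance was obtained by applying second-order distributional similarity to 50-word passages. Next, to extract keywords from the document using the term clustering results, various keyword extraction formulas were derived from combinations of simple and relative frequencies and applied. The best results were obtained with paragraph frequency and with term frequency × inverse paragraph frequency. These results confirm that the proposed algorithm can effectively extract keywords from a single document in terms of the two conditions that good keywords should satisfy: topicality and an even frequency distribution.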
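
The two best-performing statistics, abbreviated here as PF and TF×IPF, can be sketched as follows. This is a minimal illustration under assumptions the abstract does not spell out: PF is taken as the number of passages that contain a term; IPF is assumed to follow the IDF pattern, log(N / PF), where N is the number of passages; and keywords are assumed to be chosen by keeping the highest-scoring term of each term cluster. The selection rule and all function names are hypothetical, not the study's published formulas.

```python
import math
from collections import Counter


def term_stats(passages):
    """Return (tf, pf): total term counts and the number of passages containing each term."""
    tf = Counter(t for p in passages for t in p)
    pf = Counter(t for p in passages for t in set(p))
    return tf, pf


def pf_score(term, tf, pf, n):
    """Paragraph frequency: how many passages the term occurs in (tf and n unused)."""
    return pf[term]


def tf_ipf_score(term, tf, pf, n):
    """TF x IPF, assuming IPF = log(N / PF) by analogy with IDF."""
    return tf[term] * math.log(n / pf[term])


def extract_keywords(passages, clusters, score):
    """Hypothetical rule: keep the top-scoring term of each cluster.

    Assumes every clustered term occurs in at least one passage (pf > 0).
    Ties go to the alphabetically first term.
    """
    tf, pf = term_stats(passages)
    n = len(passages)
    return [max(sorted(c), key=lambda t: score(t, tf, pf, n)) for c in clusters]


# Usage: one cluster, three passages
passages = [["x", "y", "y", "y"], ["x", "z"], ["x", "z"]]
clusters = [["x", "y", "z"]]
print(extract_keywords(passages, clusters, pf_score))      # ['x']
print(extract_keywords(passages, clusters, tf_ipf_score))  # ['y']
```

In this toy input, `x` occurs in every passage, so PF ranks it first, while its IPF is log(3/3) = 0 and TF×IPF instead favors `y`, which is concentrated in one passage. The two statistics therefore reward different distribution patterns.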

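Journal evaluation history
Date	Type	Details
2023	Evaluation scheduled	Eligible for continuing evaluation (registration maintained)
2018-01-01	Evaluation	Selected as an excellent registered journal (continuing evaluation)
2015-01-01	Evaluation	Registered journal status maintained
2013-01-01	Evaluation	Registered journal status maintained
2010-01-01	Evaluation	Registered journal status maintained
2008-01-01	Evaluation	Registered journal status maintained
2006-01-01	Evaluation	Registered journal status maintained
2004-01-01	Evaluation	Registered journal status maintained
2001-07-01	Evaluation	Selected as a registered journal (candidate, second-year evaluation)
1999-01-01	Evaluation	Selected as a registration-candidate journal (new evaluation)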

Base year	WOS-KCI integrated IF (2-year)	KCIF (2-year)	KCIF (3-year)	KCIF (4-year)	KCIF (5-year)	Centrality index (3-year)	Immediacy index
2016	0.59	0.59	0.68	0.69	0.67	0.952	0.33
