Comparative Analysis of Various Korean Morpheme Embedding Models using Massive Textual Resources

Korean Abstract

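Word embedding is a transformation technique that lets computers process natural language, and it is widely used in machine-learning-based natural language processing tasks such as machine translation and named-entity recognition. Many word embedding models exist, but few studies have compared their performance under identical conditions. In this paper, we compare and analyze the performance of widely used models, namely Word2Vec's Skip-Gram and CBOW, GloVe, and FastText, on Korean text in which spaces are inserted at morpheme boundaries. Experiments on a large news corpus and the Sejong corpus showed that FastText achieved the highest performance.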
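
To make the difference between the two Word2Vec variants concrete, the Python sketch below shows how Skip-Gram and CBOW turn one morpheme-segmented sentence into training examples. It is a simplified illustration, not the paper's setup: the example sentence and the helper names `skipgram_pairs` and `cbow_examples` are hypothetical, the window size is fixed, and subsampling, negative sampling and the network itself are omitted.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-Gram: predict each context morpheme from the center morpheme.

    Every (center, context) pair inside the window is one training example.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: predict the center morpheme from all of its context morphemes.

    Each position yields one example: (context morphemes, center morpheme).
    """
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


# Hypothetical sentence "나는 학교에 간다" with spaces at morpheme boundaries.
sentence = "나 는 학교 에 가 ㄴ다".split()

print(len(skipgram_pairs(sentence, window=1)))  # 10
print(cbow_examples(sentence, window=1)[2])     # (['는', '에'], '학교')
```

With a window of 1, the six-morpheme sentence yields 10 Skip-Gram pairs but only 6 CBOW examples: Skip-Gram makes one prediction per (center, context) pair, while CBOW combines the whole context into a single prediction per position.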


Multilingual Abstract

Word embedding is a transformation technique that enables a computer to process natural language. It is used in various machine-learning-based natural language processing fields such as machine translation and named-entity recognition. Various word-embedding models are available; however, few studies have compared their performance under identical conditions. In this paper, we compare and analyze the performance of the widely used Word2Vec Skip-Gram and CBOW models, GloVe, and FastText on Korean text segmented with spaces at morpheme boundaries. In experiments on a large news corpus and the Sejong corpus, FastText yielded the best performance among Word2Vec Skip-Gram, Word2Vec CBOW, GloVe, and FastText.
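
FastText differs from the other three models in that it represents each token through character n-grams, so morphemes that share characters also share n-gram vectors. The Python sketch below extracts those subword units for a single morpheme. It assumes the `<` and `>` boundary markers and the 3-to-6 character range that fastText uses by default; the helper name `char_ngrams` and the example morphemes are hypothetical, and the hyperparameters actually used in the paper are not stated in this abstract.

```python
from typing import List


def char_ngrams(token: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of a token wrapped in '<' and '>'."""
    wrapped = f"<{token}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    # Also keep the whole bracketed token as its own unit if no n-gram
    # already covers it (e.g. for tokens longer than maxn - 2 characters).
    if wrapped not in grams:
        grams.append(wrapped)
    return grams


print(char_ngrams("학교"))  # ['<학교', '학교>', '<학교>']
shared = set(char_ngrams("학교")) & set(char_ngrams("학교장"))
print(shared)  # {'<학교'}
```

In FastText a token's vector is built as the sum of the vectors of such n-grams, which is also what allows it to produce vectors for morphemes never seen during training; here `학교` and `학교장` share the n-gram `<학교`.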


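Date	History type	Details
2021	Evaluation scheduled	Eligible to apply for continuing evaluation (accreditation maintained)
2016-01-01	Evaluation	Selected as an excellent KCI-accredited journal (continuing evaluation)
2015-01-01	Evaluation	KCI-accredited journal status maintained (accreditation maintained)
2002-01-01	Evaluation	Journal merger (accreditation maintained)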

Base year	WOS-KCI combined IF (2-yr)	KCIF (2-yr)	KCIF (3-yr)	KCIF (4-yr)	KCIF (5-yr)	Centrality index (3-yr)	Immediacy index
2016	0.19	0.19	0.19	0.2	0.18	0.373	0.07
