Rediscovering the Sejong Modern Korean Corpus: The SJ-RIKS Corpus, Extended Edition
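Il Hwan Kim (김일환), Gyeong Yong Yang (양경용). Sogang University Research Institute of Language and Information (서강대학교 언어정보연구소). 2015. Language and Information Society (언어와 정보 사회), Vol. 24, No. -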
Through part-of-speech (POS) annotation of the entire corpus built by the 21st Century Sejong Project (1998-2007), this research aims to develop a basic tool for using the annotated corpus. The 21st Century Sejong Project was a long-term project to computerize the Korean language. Its corpus, the first result of the project, is massive in scale, exceeding 130 million words. However, most of the corpus lacks annotation, and no basic tool for using it exists. As a result, research has usually been limited to the 15-million-word subset annotated with POS tags. To carry forward the purpose of the 21st Century Sejong Project, this paper develops and introduces a tool that annotates the entire corpus with POS tags and makes the result usable. In particular, to increase the accuracy of the analysis, we examined the corpus texts one by one and corrected the errors we detected, which improved the reliability of the corpus. This process brings us one step closer to the project's original goal: a national-scale corpus comparable to the British National Corpus (BNC). Furthermore, we expect the resulting system to be widely used in the future not only in Korean linguistics but also in Korean language informatics.
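As an illustration of the kind of basic utility a POS-annotated corpus makes possible, the following is a minimal Python sketch that counts POS tags. It is not the authors' tool. It assumes a hypothetical tab-separated layout of sentence ID, eojeol, and morphological analysis, with morphemes written as `form/TAG` and joined by ` + ` (in the style of the Sejong tagged files); the function names and sample lines are invented for illustration and are not taken from the corpus.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_analysis(analysis: str) -> List[Tuple[str, str]]:
    """Split an analysis such as '세계/NNG + 적/XSN' into (form, tag) pairs.

    Assumed format: morphemes joined by ' + ', each written as 'form/TAG'.
    """
    pairs = []
    for chunk in analysis.split(" + "):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the LAST slash so a morpheme that is itself '/' ('//SP') still parses.
        form, sep, tag = chunk.rpartition("/")
        if not sep or not form:
            continue  # malformed chunk: skip it rather than guess
        pairs.append((form, tag))
    return pairs


def tag_frequencies(lines: Iterable[str]) -> Counter:
    """Count POS tags over lines in the assumed 'id<TAB>eojeol<TAB>analysis' layout."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip headers, blank lines, or rows with an unexpected shape
        _sent_id, _eojeol, analysis = fields
        for _form, tag in parse_analysis(analysis):
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines only (hypothetical IDs), not quoted from the corpus.
    sample = [
        "EX-0001\t프랑스의\t프랑스/NNP + 의/JKG",
        "EX-0001\t세계적인\t세계/NNG + 적/XSN + 이/VCP + ㄴ/ETM",
        "EX-0001\t의상\t의상/NNG",
    ]
    print(tag_frequencies(sample).most_common(1))  # prints [('NNG', 2)]
```

Splitting on the last slash rather than the first keeps symbol morphemes intact, and skipping rows that do not have exactly three fields lets the counter run over raw files that contain headers or blank lines.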