Verifying the Applicability of Korean Pretrained Language Models to Syntactic Research Using Minimal-Pair Sentences
박권식 (Kwonsik Park), 김성태 (Seongtae Kim), 송상헌 (Sanghoun Song). The Korean Society for Language and Information, 2021. Language and Information (언어와 정보), Vol. 25, No. 3.
Syntactic studies use minimal-pair sentences as an argumentation tool, because such pairs allow us to isolate the constraint of interest. Likewise, sets of minimal pairs are useful in deep learning-based experiments that assess the syntactic abilities of neural language models. In this context, this study verifies whether deep learning Korean models can properly distinguish well-formed expressions from their ill-formed counterparts. At the same time, the study examines the feasibility of using a language resource constructed by the Korean government for deep learning architectures. The research is three-fold. First, we conducted an acceptability judgment test to verify whether and to what extent the language resource used in this study is trustworthy. The results indicate that the judgments provided in the resource converge well with the judgments collected in our own experiment. Second, we employed four Korean models, namely mBERT, KoBERT, KR-BERT, and KorBERT, to evaluate how well the language resource predicts the well-formedness of Korean expressions. The different models yield different results, and the reasons for this divergence are discussed in detail. Third, we used an independent test set to evaluate the deep learning systems. The results remain challenging, which implies that current Korean models still have room for improvement in handling syntactic phenomena.
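The abstract does not specify how the models were scored on the minimal pairs. As one hedged illustration of the general technique, the sketch below ranks a (hypothetical) Korean minimal pair by pseudo-log-likelihood under a masked language model, using HuggingFace's `transformers` and mBERT (`bert-base-multilingual-cased`); this is a common scoring scheme for such evaluations, not necessarily the authors' actual pipeline, and the example sentences are illustrative only.

```python
# Minimal sketch: scoring a minimal pair with a masked LM.
# Assumptions: mBERT via HuggingFace transformers; pseudo-log-likelihood
# scoring (mask each token in turn and sum its conditional log-probability).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log P(token | rest of sentence), masking one token at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# A model "passes" a pair when the well-formed member scores higher.
good = pseudo_log_likelihood("고양이가 물을 마신다.")    # well-formed (hypothetical pair)
bad = pseudo_log_likelihood("고양이가 물을 마시는다.")   # ill-formed
print("correct" if good > bad else "incorrect")
```

Accuracy over a whole evaluation set would then be the proportion of pairs on which the well-formed member receives the higher score.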
DeepKLM: A Computational Language Model Library for Syntactic Experiments
이규민 (Gyu-min Lee), 김성태 (Seongtae Kim), 김현수 (Hyunsoo Kim), 박권식 (Kwonsik Park), 신운섭 (Unsub Shin), 왕규현 (Guehyun Wang), 박명관 (Myung-kwan Park), 송상헌 (Sanghoun Song). Institute of Language and Information Studies, Yonsei University, 2021. 언어사실과 관점 (Language Facts and Perspectives), Vol. 52.
This paper introduces DeepKLM, a deep learning library for syntactic experiments. The library enables researchers to use state-of-the-art computational language models based on BERT (Bidirectional Encoder Representations from Transformers). Written in Python, the library fills the masked part of a sentence with a specific token, similar to the Cloze task in traditional language experiments. The surprisal values it outputs relate to human language processing in terms of speed and complexity. The library additionally provides two visualization tools: a heatmap and an attention-head visualization. The article also presents two case studies, on NPIs and reflexives, that employ the library. The library has room for improvement in that its BERT-based components are not entirely on a par with those of human language sentences. Despite such limits, the case studies imply that the library enables us to assess the language abilities of humans and deep learning machines.
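The abstract describes DeepKLM's core operation (Cloze-style mask filling with surprisal output) but not its interface. The following self-contained sketch reproduces that computation with HuggingFace's `transformers` and mBERT; it is not DeepKLM's own API, and the NPI sentences are hypothetical illustrations in the spirit of the paper's case study.

```python
# Minimal sketch of masked-token surprisal, the quantity DeepKLM reports.
# Assumptions: mBERT via HuggingFace transformers; surprisal of a candidate
# token w at the [MASK] slot is defined as -log2 P(w | context).
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def surprisal(masked_sentence: str, candidate: str) -> float:
    """Return -log2 P(candidate | context) at the [MASK] position."""
    inputs = tokenizer(masked_sentence, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    prob = torch.softmax(logits, dim=-1)[tokenizer.convert_tokens_to_ids(candidate)]
    return -math.log2(prob.item())

# Hypothetical NPI illustration: "ever" should be less surprising under a
# negative licensor ("No one") than without one ("Everyone").
print(surprisal(f"No one has {tokenizer.mask_token} been there.", "ever"))
print(surprisal(f"Everyone has {tokenizer.mask_token} been there.", "ever"))
```

Because surprisal is a per-token reading-difficulty estimate, differences between such sentence pairs can be compared against human processing measures, which is the use case the abstract describes.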