A WWMBERT-Based Method for Improving Chinese Text Classification
Xinyuan Wang, Inwhee Joe. Proceedings of the Korea Information Processing Society Conference, Vol. 28, No. 1, 2021.
In the NLP field, the pre-trained BERT model released by the Google team in 2018 has shown remarkable results across a wide range of tasks. Many variant models have since been derived from the original BERT, such as RoBERTa and ERNIE. In this paper, the WWMBERT (Whole Word Masking BERT) model, which is well suited to Chinese text tasks, is used as the baseline of our experiments. The experiments aim to improve text-level Chinese text classification, chiefly by combining TAPT (Task-Adaptive Pretraining) with the Multi-Sample Dropout method. For comparability, the data sets and the scoring standard are kept consistent with the official WWMBERT model, with Accuracy as the metric; the maximum and average of multiple runs are reported as the experimental scores. The official WWMBERT model scored 97.70% (97.50%) on the development set and 97.70% (97.50%) on the test set of the text-level Chinese text classification task. Compared with these results, our method improved the development set by 0.35% (0.5%) and the test set by 0.31% (0.48%), a clear improvement over the original baseline model.
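One of the two techniques named above, Multi-Sample Dropout, can be illustrated without any deep-learning framework: the same feature vector is passed through several independent dropout masks, a loss is computed for each masked copy, and the losses are averaged. The sketch below is a minimal pure-Python illustration of that idea, not the paper's actual implementation; the function names, the toy loss, and all parameter values are illustrative assumptions.

```python
import random

def dropout(vec, p, rng):
    # Zero each element with probability p and rescale survivors by 1/(1-p)
    # (inverted dropout), so the expected value of the vector is unchanged.
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vec]

def multi_sample_dropout_loss(features, loss_fn, p=0.5, num_samples=4, seed=0):
    """Multi-Sample Dropout (illustrative sketch): draw several independent
    dropout masks over the same features, evaluate the loss once per mask,
    and return the average of those losses."""
    rng = random.Random(seed)  # fixed seed for a reproducible demo
    losses = [loss_fn(dropout(features, p, rng)) for _ in range(num_samples)]
    return sum(losses) / len(losses)

# Toy example: the "loss" is just the mean of the (masked) feature vector.
feats = [1.0, 2.0, 3.0, 4.0]
avg_loss = multi_sample_dropout_loss(feats, loss_fn=lambda v: sum(v) / len(v))
```

In a real classifier the loss function would be the cross-entropy of the classification head applied to each masked copy of the pooled BERT features; averaging over masks gives the regularizing effect at essentially no extra forward-pass cost through the encoder.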