Korean Dependency Parsing Applying a Machine Reading Comprehension-based Subtree Linking Method
민진우, 나승훈, 신종훈, 김영길, 김강일 · Korean Institute of Information Scientists and Engineers (KIISE), 2022, Journal of KIISE Vol.49 No.8
In Korean dependency parsing, biaffine attention models have shown state-of-the-art performance; they first obtain head-level and modifier-level representations by applying two multi-layer perceptrons (MLPs) to the encoded contextualized word representations, perform attention by treating the modifier-level representation as a query and the head-level one as a key, and take the resulting attention score as the probability of forming a dependency arc between the corresponding two words. However, given two target words (i.e., a candidate head and modifier), biaffine attention methods are basically limited to their word-level representations and are not aware of the explicit boundaries of their phrases or subtrees. Thus, without relying on semantically and syntactically enriched phrase-level and subtree-level representations, biaffine attention methods may not be effective when determining a dependency arc is complicated rather than simple, such as identifying a dependency between far-distant words; such cases often require subtree- or phrase-level information surrounding the target words. To address this drawback, this paper presents a dependency parsing framework based on machine reading comprehension (MRC) that explicitly utilizes subtree-level information by mapping a given child subtree and its parent subtree to a question and an answer, respectively. Experimental results on standard Korean dependency parsing datasets show that MRC-based dependency parsing outperforms the biaffine attention model. In particular, the results further show that the performance improvements are especially strong on long sentences compared to short ones. Korean dependency parsing has been studied along two lines: transition-based and graph-based approaches. Among graph-based dependency parsers, the representative model is the biaffine attention model, which encodes the input sentence, applies head and modifier MLPs to obtain the respective representations, computes graph scores for all word pairs via biaffine attention, and generates a tree from these scores.
In the biaffine attention model, each word in the sentence stands in for a subtree of the syntactic tree, but because the model only judges the dependency between two words, it cannot effectively exploit subtree information. To address this shortcoming, this study applies an MRC-based dependency parsing model that directly models subtree information as Span-to-Span (subtree-to-subtree) links to Korean syntactic analysis datasets, obtaining improved results over the existing biaffine attention dependency parsing model.
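The biaffine arc-scoring step the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the dimensions are arbitrary, the "MLPs" are single layers for brevity, and a real parser would add a ROOT token and mask self-loops before decoding a tree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 5, 8, 4          # words, encoder dim, MLP output dim

# Contextualized word representations from an encoder (random stand-in here).
X = rng.standard_normal((n, d))

# Two "MLPs" (single layers for brevity) give head- and modifier-level views.
W_head = rng.standard_normal((d, k))
W_mod = rng.standard_normal((d, k))
H = np.tanh(X @ W_head)    # key: each word as a candidate head
M = np.tanh(X @ W_mod)     # query: each word as a candidate modifier

# Biaffine scoring: S[i, j] = score of word j heading word i.
# Appending a bias column to M folds the head-only linear term into U.
U = rng.standard_normal((k + 1, k))
M1 = np.hstack([M, np.ones((n, 1))])
S = M1 @ U @ H.T           # (n, n) arc score matrix

# Softmax over candidate heads turns scores into arc probabilities.
P = np.exp(S - S.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)
pred_heads = P.argmax(axis=1)   # greedy head choice per word
```

The limitation the paper targets is visible here: the score for arc (i, j) depends only on the two word vectors, with no explicit view of the subtrees they anchor.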
Korean Natural Language Processing Using LUKE: Named Entity Recognition and Entity Linking
민진우, 나승훈, 김현호, 김선훈, 강인호 · Korean Institute of Information Scientists and Engineers (KIISE), 2022, KIISE Transactions on Computing Practices Vol.28 No.3
Transformer-based language models (LMs) such as BERT, trained on large amounts of unlabeled corpora using self-supervised learning, have shown remarkable performance improvements on various natural language processing (NLP) tasks. Despite these marked improvements, classical pretrained language models do not directly incorporate external real-world knowledge bases such as the Wikipedia knowledge graph or triples. To inject real-world knowledge bases into a pretrained language model, many studies on "knowledge-enhanced" pretrained language models have been conducted. Among them, LUKE attaches a sequence of entities to the sequence of original input tokens and performs entity-aware self-attention using entity embeddings, leading to noticeably improved results on entity-related tasks and state-of-the-art performance on the SQuAD dataset. In this paper, we present a Korean version of LUKE pretrained on a large Korean Wikipedia corpus and show its application to entity-related Korean tasks. In particular, we newly propose a way of applying LUKE to the entity linking task, which had not been explored in previous work using LUKE. Experimental results on both Korean named entity recognition and entity linking show improvements over RoBERTa-based models. Transformer-based language models such as BERT are trained on large unlabeled corpora via self-supervised learning and then applied to various NLP tasks, yielding remarkable performance gains. However, such language models cannot represent real-world knowledge, and various studies have attempted to resolve this problem by incorporating knowledge bases into language models. In this study, we trained the LUKE model, which defines an entity sequence and entity embeddings in addition to the word sequence and performs self-attention with separate query parameters for each pairing of word and entity sequences, on Korean Wikipedia, then applied it to the entity-related tasks of named entity recognition and entity linking, obtaining performance gains of 0.5%p and 1.05%p, respectively, over existing RoBERTa-based models.
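The entity-aware self-attention the abstract attributes to LUKE, with a separate query projection for each (query type, key type) pair over word and entity tokens, can be sketched as below. This is an illustrative single-head NumPy toy with arbitrary dimensions, not the pretrained model: LUKE shares the key and value projections across token types and varies only the query.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
words = rng.standard_normal((4, d))   # word token embeddings
ents = rng.standard_normal((2, d))    # entity token embeddings
X = np.vstack([words, ents])
is_entity = np.array([0, 0, 0, 0, 1, 1])

# One key/value projection shared by all tokens; four query projections,
# selected by the (query type, key type) pair: w2w, w2e, e2w, e2e.
K_proj = rng.standard_normal((d, d))
V_proj = rng.standard_normal((d, d))
Q = {(a, b): rng.standard_normal((d, d)) for a in (0, 1) for b in (0, 1)}

K = X @ K_proj
V = X @ V_proj

n = X.shape[0]
scores = np.empty((n, n))
for i in range(n):
    for j in range(n):
        # The query vector depends on whether query/key are words or entities.
        q = X[i] @ Q[(is_entity[i], is_entity[j])]
        scores[i, j] = q @ K[j] / np.sqrt(d)

A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)
out = A @ V                # (n, d) attended representations
```

The type-dependent query is what lets the model treat word-to-entity interactions differently from word-to-word ones, which is the mechanism credited for the gains on entity-related tasks.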
김태철, 이규승, 오범룡, 민진우 · Institute of Environmental Research, Chungnam National University, 1998, Environmental Research (環境硏究) Vol.16
Non-point source pollution in rural areas strongly affects stream water quality. It is difficult to improve stream water quality, technically because of the complex pollutant loads and economically because of the cost of treating the deteriorated water. The best way to reduce non-point source pollution is to check the water quality at the inlet of the irrigation channel and control the water quality at the outlet of the drainage channel. Stream water quality in rural areas is closely related to fertilizers, pesticides, and livestock wastewater. The treatment rate and retention time were estimated for BOD, COD, SS, NH₄-N, T-N, NO₃-N, T-P, and PO₄-P using materials such as gravel, crushed stone, used tires, geotextiles, and concrete blocks, both individually and in combination. The treatment rate for BOD, COD, and SS is high, but that for NH₄-N, T-N, NO₃-N, T-P, and PO₄-P is low. The optimal retention time in the natural contact channel was 90 minutes.
민진우 (Min Jin Woo), 문종필 (Moon Jong Pil), 김영식 (Kim Young Sik), 박승기 (Park Seung Ki), 김태철 (Kim Tai Cheol) · Korean Society of Agricultural Engineers, 1998, KSAE Conference Abstracts Vol.1998
It is difficult to know how to restrict the amount of water supplied in a drought season, because there are no objective standard rules. The purpose of this study is to present management rules for overcoming drought in an irrigation reservoir by forecasting the water level and restricting water supply according to the operation rule curve and the pattern of the rotation-irrigation system. From the operation rule curve, drawn up by analyzing observed reservoir water levels, water supply rules and rotation-irrigation patterns using the WWW and GIS are suggested.