RISS Search - Domestic Journal Articles

1
Tense Markers Do Not Impact Presupposition Projection In Korean

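Unsub Shin (신운섭), Sanghoun Song (송상헌). Discourse and Cognitive Linguistics Society of Korea (담화·인지언어학회), 2022. Proceedings of the Discourse and Cognitive Linguistics Society Conference (담화·인지언어학회 학술대회 발표논문집), Vol. 2022, No. 11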
2
Commonsense Reasoning in Language AI and the Current State of Evaluation Frameworks (언어 인공지능의 상식추론과 평가 체계 현황)

Unsub Shin (신운섭), Sanghoun Song (송상헌). Institute of Humanities and Social Sciences, Pukyong National University (부경대학교 인문사회과학연구소), 2022. 인문사회과학연구, Vol. 23, No. 3
As the performance of artificial intelligence (AI) has improved dramatically in recent years, some have claimed that AI is approaching human linguistic ability. For example, GPT-3 has been reported to write at a level indistinguishable from human writing. Depending on the specific area of evaluation, however, a large gap remains between AI and humans. Commonsense reasoning is a representative case. For example, whether Younghee (영희), wearing a school backpack, is going to school or to a nightclub is evident from common sense rather than from logic. Because commonsense reasoning requires broad knowledge of the experienced world, it is a highly challenging task for AI, which must derive factual knowledge from the distributional information of strings. With this in mind, quantitative evaluation frameworks, or benchmarks, have recently been released to assess whether AI has learned commonsense reasoning. Benchmarks, which originate in the Turing test, are in effect question banks of tens of thousands of items that quantitatively verify AI's commonsense reasoning on the basis of accuracy and similarity. This paper broadly reviews the current state of AI commonsense reasoning and its evaluation frameworks, and attempts a critical understanding from the perspective of the humanities and social sciences. Specifically, it gives a conceptual account of how neural language models or word embeddings in natural language processing learn from strings. It also analyzes the construction methodology and example sentences of the evaluation frameworks, or NLP benchmarks, that verify the inferential knowledge AI has learned. To this end, it presents an analysis of KLUE, a recently released Korean AI benchmark, as a case study. It also analyzes the representative benchmarks SWAG, CosmosQA, and CommonGen. Finally, it points out the growing environmental, economic, and ethical concerns raised by the recent development of large-scale AI, and discusses the inherent limits of quantitative evaluation frameworks, centering on the linguistic Turing test.

Recent advances in artificial intelligence (AI) have shown that a language model, i.e., a probability estimator of word occurrences in context, may capture human reasoning ability. This surprising finding builds on previous research on the empirical evaluation of the reasoning ability of AIs. The empirical evaluation measures the performance gap between AIs and human speakers on "well-curated" datasets, often called NLP benchmarks. Recently, many researchers have proposed new NLP benchmarks that test commonsense reasoning, e.g., "What should you do if you encounter a grizzly bear chasing the baby?" The question is tricky because the decision depends on the situation: "baby" may refer to a human baby or simply to a baby cow. Thus, it has been noted that benchmarking commonsense reasoning is critical for AI's human-like performance in intelligent tasks. In this paper, we review some types of commonsense reasoning and newly released evaluation benchmarks, suggesting that the reliability of a dataset hinges on the data curation method.
We first briefly introduce how a language model using neural networks learns or predicts word probability. We then review how workers or annotators curate NLP benchmarks, focusing on the collection of human intuitions about the text examples. Although these curation methods empirically show the diverse reasoning abilities of AI, there are concerns about the negative social impacts of extremely large AIs. Importantly, NLP benchmarks are sometimes misleading because the AI may simply capture the shallow surface structure of language, meaning that it has merely learned to map texts to texts. Overall, we suggest that constructing colossal AIs is not a silver bullet for commonsense reasoning, since AIs are not free from data bias.
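The abstract describes a language model as a probability estimator of word occurrences in context. As a rough illustration of that idea only, the sketch below estimates the probability of a word given the previous word from raw counts. It is a count-based bigram model, not the neural models the paper reviews, and the toy corpus and the function names `train_bigram` and `word_probability` are assumptions introduced here.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is a hypothetical start-of-sentence marker.
        tokens = ["<s>"] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def word_probability(counts, prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 for unseen contexts."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


# Toy corpus (invented for illustration, not taken from any benchmark).
corpus = ["she goes to school", "she goes to work", "he goes to school"]
model = train_bigram(corpus)
print(word_probability(model, "to", "school"))
```

In this toy corpus "school" follows "to" in two of three cases, so the estimate reflects only the distribution of strings, which is the limitation the abstract raises for commonsense reasoning.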
3
DeepKLM: A Computational Language Model Library for Syntactic Experiments

Lee Gyu-min (이규민), Seongtae Kim (김성태), Hyunsoo Kim (김현수), Kwonsik Park (박권식), Unsub Shin (신운섭), Guehyun Wang (왕규현), Myung-kwan Park (박명관), Sanghoun Song (송상헌). Institute of Language and Information Studies, Yonsei University (연세대학교 언어정보연구원, formerly 연세대학교 언어정보개발원), 2021. 언어사실과 관점, Vol. 52
This paper introduces DeepKLM, a deep learning library for syntactic experiments. The library enables researchers to use a state-of-the-art deep computational language model based on BERT (Bidirectional Encoder Representations from Transformers). The library, written in Python, fills the masked part of a sentence with a specific token, similar to the cloze task in traditional language experiments. The output value, surprisal, is related to human language processing in terms of speed and complexity. The library additionally provides two visualization tools: a heatmap and an attention-head visualization. This article also presents two case studies, on NPIs and reflexives, that employ the library. The library has room for improvement in that the BERT-based components are not entirely on par with those of human language sentences. Despite these limits, the case studies suggest that the library enables us to assess the language ability of both humans and deep learning machines.
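The abstract reports surprisal as the output for a token placed in a masked position. The sketch below shows one common way such a value can be computed, as the negative log probability of the token under a softmax over candidate scores. It does not use or reproduce DeepKLM's API: the function names, the candidate tokens, the scores, and the choice of base-2 logarithms (bits) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over candidate tokens."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def surprisal(probs, token):
    """Surprisal in bits: -log2 P(token). Lower means more expected."""
    return -math.log2(probs[token])


# Hypothetical scores for three candidates at a masked position.
# A real masked language model would score its whole vocabulary.
mask_logits = {"anyone": 2.0, "someone": 0.5, "nobody": -1.0}
probs = softmax(mask_logits)

for token in mask_logits:
    print(token, round(surprisal(probs, token), 3))
```

Under these made-up scores, the highest-scoring candidate receives the lowest surprisal, which is the direction of comparison a cloze-style experiment would rely on.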
