Kang, Beomil ( 강범일 ) Institute of Language and Information Studies, Yonsei University 2021 Language Facts and Perspectives (언어사실과 관점) Vol.54 No.-
This paper introduces the process of building a Korean diachronic corpus based on articles in Chosun Ilbo and Donga Ilbo from 1920 to 2019. Newspapers reflect not only the social but also the linguistic reality of their time, as they convey a variety of information and thoughts in the language of ordinary people. To understand this linguistic reality effectively, such data must be processed into a form that can be analyzed quantitatively. To that end, the spacing and spelling of some vocabulary items were modified to meet current norms, and vocabulary listed in various dictionaries was added to the dictionary referenced by the morphological analyzer to improve the detection of vocabulary units. After this pre-processing, changes in linguistic form were investigated to demonstrate an application of the corpus. The mean number of syllables per word decreased, and sentence length decreased continuously. In addition, the proportion of Chinese characters in articles dropped, while the use of Hangul and the Latin alphabet increased.
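The form-level measures mentioned in this abstract (script proportions and mean syllables per word) can be approximated with a short sketch. This is not the paper's pipeline: the function names are hypothetical, words are split on whitespace rather than by a morphological analyzer, and the Unicode ranges used (precomposed Hangul syllables U+AC00 to U+D7A3, CJK Unified Ideographs U+4E00 to U+9FFF) are a simplifying assumption that ignores rarer blocks.

```python
# Hypothetical helpers; the paper's actual tooling is not described here.

def is_hangul_syllable(ch: str) -> bool:
    """True for precomposed Hangul syllable blocks (U+AC00..U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3


def script_proportions(text: str) -> dict:
    """Share of Hangul, Chinese-character (Hanja), and Latin letters in text.

    Only these three scripts are counted; digits, punctuation, and
    whitespace are ignored.
    """
    counts = {"hangul": 0, "hanja": 0, "latin": 0}
    for ch in text:
        if is_hangul_syllable(ch):
            counts["hangul"] += 1
        elif 0x4E00 <= ord(ch) <= 0x9FFF:  # basic CJK ideograph block only
            counts["hanja"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


def mean_syllables_per_word(text: str) -> float:
    """Mean number of Hangul syllables per whitespace-separated word.

    Each precomposed Hangul character is one syllable; words containing
    no Hangul are skipped.
    """
    lengths = [sum(1 for ch in word if is_hangul_syllable(ch))
               for word in text.split()]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

Applied to the articles of each decade, functions like these would give one value per period, which is the kind of series the reported trends describe.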
Kang, Beomil ( 강범일 ) Institute of Language and Information Studies, Yonsei University 2023 Language Facts and Perspectives (언어사실과 관점) Vol.59 No.-
This study traces the development of dispersion measures through earlier discussions of dispersion in corpus linguistics, and analyzes the dispersion of words occurring in a Korean corpus, focusing on two measures of the DP (Deviation of Proportions) family, which are considered to have high validity among dispersion measures. The results show that DPnofreq, which removes the effect of frequency, has the lowest correlation with frequency and yields discriminating values even for low-frequency words, so that it provides information distinct from frequency. These results suggest that in corpus-based statistical studies, dispersion should be considered alongside frequency when judging the importance or commonness of words and other linguistic units.
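As a rough illustration of the DP family, the sketch below computes the basic Deviation of Proportions, DP = 0.5 × Σ|v_i − s_i|, where s_i is the share of the corpus contained in part i and v_i is the share of the word's occurrences found in part i, together with the normalized variant DP / (1 − min s_i). These are the standard definitions from the dispersion literature; the abstract does not state which formulas the study used, and DPnofreq is deliberately not implemented here because its definition is not given in this text. The function names are hypothetical.

```python
# Hypothetical helpers illustrating the basic DP measures only.

def deviation_of_proportions(word_counts, part_sizes):
    """Basic DP: 0 = perfectly even distribution, values near 1 = clumped.

    word_counts[i] is the word's frequency in corpus part i;
    part_sizes[i] is the total number of tokens in part i.
    """
    if len(word_counts) != len(part_sizes):
        raise ValueError("word_counts and part_sizes must have equal length")
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    if total_word == 0 or total_size == 0:
        raise ValueError("word and corpus must both be non-empty")
    diff_sum = sum(
        abs(count / total_word - size / total_size)  # |v_i - s_i|
        for count, size in zip(word_counts, part_sizes)
    )
    return 0.5 * diff_sum


def normalized_dp(word_counts, part_sizes):
    """DP divided by its maximum attainable value, 1 - min(s_i)."""
    total_size = sum(part_sizes)
    smallest_share = min(size / total_size for size in part_sizes)
    return deviation_of_proportions(word_counts, part_sizes) / (1 - smallest_share)
```

For four equally sized parts, a word spread evenly across them gets DP = 0, while a word confined to a single part gets DP = 0.75 and a normalized value of 1.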
Corpus Linguistics Meets Statistics - Vaclav Brezina (2018), Statistics in Corpus Linguistics -
Kang, Beomil ( 강범일 ) Institute of Language and Information Studies, Yonsei University 2022 Language Facts and Perspectives (언어사실과 관점) Vol.57 No.-
This article discusses Vaclav Brezina’s book Statistics in Corpus Linguistics: A Practical Guide (2018). Although many linguists find statistics difficult, rigorous statistical procedures must be applied to generalize the findings from a corpus to the language as a whole. This book introduces statistical procedures related to a variety of topics in linguistics. Concepts from basic to complex are explained in easy language, and various learning materials are provided through a companion website. The book also includes the latest statistical methodologies and various visualization examples for linguistics research. Thus, this book can be recommended for linguistic researchers studying statistics for the first time.
Kim, Ha-Soo (김하수), Son, Hyunjung (손현정), Lee, Jae Yun (이재윤), Kang, Beomil (강범일) The Discourse and Cognitive Linguistics Society of Korea (담화·인지언어학회) 2013 Discourse and Cognition (담화와 인지) Vol.20 No.1
This paper presents a linguistic analysis of the discourse of three Korean politicians who ran for the Korean presidency in 2012. Data on their speech and writing were collected from TV talk shows on which they appeared and from the books they wrote. Several methods are used to analyse the data: word frequency analysis, network analysis, and a method for measuring lexical repetition. Firstly, we extract frequent words from each politician’s subcorpus and identify statistically distinctive words that each politician used more frequently than the others. Secondly, we construct networks of co-occurring words and analyse the differences in their structure. Finally, a new method for measuring lexical repetition is used to uncover pragmatic differences in their discourses. Applying these methods to the discourse data allows the linguistic characteristics of each politician’s discourse to be described more effectively.
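The second step, building a network of co-occurring words, can be sketched as follows. The abstract does not specify the co-occurrence window or the weighting scheme, so this sketch assumes that two words co-occur when they appear in the same sentence and weights each edge by the number of such sentences. The function name and the sample tokens are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count sentence-level co-occurrences of word pairs.

    sentences is an iterable of token lists. Each unordered pair of
    distinct words is counted at most once per sentence, and pairs are
    stored in alphabetical order so (a, b) and (b, a) share one key.
    """
    edges = Counter()
    for tokens in sentences:
        for pair in combinations(sorted(set(tokens)), 2):
            edges[pair] += 1
    return edges


# Hypothetical, already tokenized input for one speaker's subcorpus.
sample = [
    ["economy", "jobs", "welfare"],
    ["economy", "jobs"],
]
network = cooccurrence_edges(sample)
```

The resulting weighted edge list can be passed to a graph library to compare network structure across speakers.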