Ambiguity Resolution in Chinese Word Segmentation
(Sun Maosong) Korean Society for Language and Information, 1995 International Workshop, Vol. 1995, No. -
This paper proposes a new method for Chinese word segmentation, named Conditional F&BMM (Forward and Backward Maximal Matching), which incorporates both bigram statistics (i.e., mutual information and difference of t-test between Chinese characters) and linguistic rules for ambiguity resolution. The key characteristics of this model are the use of (i) statistics that can be automatically derived from any raw corpus, and (ii) a rule base for disambiguation, consistent and of controlled size, that is built up in a systematic way.
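The abstract gives no implementation details, so the following is only a minimal sketch of the two ingredients it names: forward and backward maximal matching against a lexicon, and a character-bigram mutual information score estimated from raw text. The toy lexicon, the toy corpus, the `max_len` parameter, and all function names are assumptions made for illustration; the t-test difference and the rule base mentioned in the abstract are not modeled here.

```python
import math
from collections import Counter


def forward_mm(text, lexicon, max_len):
    """Greedy left-to-right matching: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text, lexicon, max_len):
    """Greedy right-to-left matching: take the longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def char_stats(corpus):
    """Character unigram and bigram counts from an unsegmented (raw) corpus."""
    unigrams = Counter(corpus)
    bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    return unigrams, bigrams


def mutual_information(a, b, unigrams, bigrams):
    """Pointwise mutual information log2(p(ab) / (p(a) * p(b))) of adjacent characters."""
    p_ab = bigrams[a + b] / sum(bigrams.values())
    if p_ab == 0:
        return float("-inf")  # the pair never occurs in the corpus
    n = sum(unigrams.values())
    return math.log2(p_ab / ((unigrams[a] / n) * (unigrams[b] / n)))


# Hypothetical toy lexicon and sentence, chosen so that the two passes disagree.
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"
fwd = forward_mm(sentence, lexicon, max_len=3)
bwd = backward_mm(sentence, lexicon, max_len=3)
ambiguous = fwd != bwd  # disagreement marks a span that needs disambiguation

# Hypothetical toy corpus for the bigram statistics.
unigrams, bigrams = char_stats("研究生命起源研究生命")
```

With this toy data the forward pass yields 研究生 / 命 / 起源 and the backward pass yields 研究 / 生命 / 起源, so the sentence is flagged as ambiguous. How such a disagreement is then resolved using the bigram statistics and the rule base is specific to the paper and is not reproduced here.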
Identification of Chinese Personal Names in Unrestricted Texts
(Lawrence Cheung), (Benjamin K. Tsou), (Maosong Sun) Korean Society for Language and Information, 2002 International Workshop, Vol. 2002, No. -
Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation and, if not properly addressed, can degrade segmentation itself as well as other NLP tasks such as information retrieval. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and (3) presents the relevant processing strategies. The geographical differences in Chinese personal names between Beijing and Hong Kong are highlighted at the end. The comparison shows that variation in names across different Chinese communities is a critical factor in designing Chinese personal name identification algorithms.