http://chineseinput.net/에서 pinyin(병음)방식으로 중국어를 변환할 수 있습니다.
변환된 중국어를 복사하여 사용하시면 됩니다.
Automatic Bunsetsu Segmentation of Japanese Sentences Using a Classification Tree
( Yujie Zhang ),( Kazuhiko Ozeki ) 한국언어정보학회 1998 국제 워크샵 Vol.1998 No.-
Bunsetsu, which is comprised of a content word followed by, possibly 0, function words, is a convenient unit for dependency structure analysis of Japanese. There are, however, no spaces indicating bunsetsu boundaries in the orthographic writing of Japanese. Thus a sentence must be segmented into bunsetsu``s by some means prior to dependency structure analysis. Conventionally, such segmentation has been performed by using some kind of hand-crafted rules. This paper describes a novel segmentation method using a classification tree, by which knowledge about bunsetsu boundaries is automatically acquired from a labeled corpus. The method enables quick and easy adaptation to a new task domain, and also to a new system of morpheme categorization without the need of changing the algorithm. Effectiveness of this method is shown through experiments on an ATR corpus and an EDR corpus.