The purpose of this study is to identify the properties of special-words and to show the process of extracting them from a large corpus. A special-word corresponds to the notion of an unknown word, that is, a word not covered by the lexical database of a Natural Language Processing (NLP) system. Unknown words generally introduce considerable ambiguity and thus lower the accuracy of NLP systems. In this work, special-words include expressions about current events or trends, abbreviated words, and naturalized words. We propose a semi-automatic procedure for constructing a special-word dictionary, based mainly on language-dependent heuristics. We also expect, however, that statistical considerations such as frequencies and probability distributions may be required to extract unknown words in a more fully automatic fashion.