Estimating the total number of words in a language is very hard. Recently, however, large corpora have been under construction; a corpus is a body of written, spoken, or other material that is regarded as representative of a language. This makes it possible to estimate the number of words in a language from a corpus. In this paper we propose a method for estimating the number of Korean words from a Korean corpus and apply it to obtain an estimate. We also estimate the number of Korean names, which make up a large part of proper nouns. To estimate the total number of distinct Korean words and names, we applied a generalized linear estimation method. Using a corpus of 10 million phrases, we estimate the number of Korean words at 1,062,392; the estimated number of Korean names is 1,493,003.