In this paper, we design and implement a categorization system for web documents, which are diverse and noisy. Because web documents have no consistent form or content, designing and implementing such a system is not easy. The presented system adopts a neural network, a method well suited to learning from and tolerating noisy input, to determine the category of a web document. The system consists of a Korean morphological analyzer, a sense extractor, a sense disambiguator, and a category determiner. The morphological analyzer extracts the nouns from the document. The sense extractor acquires the candidate senses of those words. The sense disambiguator resolves the ambiguity of words that have more than one sense. Finally, the category determiner decides the category of the input document using the neural network. Because the sense disambiguator resolves word ambiguity before the category is determined, we can obtain better categorization quality.
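The four-stage flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the system's actual implementation: the lexicon `SENSE_LEXICON` is a hypothetical toy that uses English words in place of Korean, `extract_nouns` is a lexicon lookup standing in for a real morphological analyzer, `disambiguate` uses a simple cue-word overlap heuristic, and the category determiner is a single-layer softmax network trained on four made-up sentences.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption): noun -> {sense id: cue words}.
SENSE_LEXICON = {
    "bank": {"bank/finance": {"money", "loan", "account"},
             "bank/river": {"river", "water", "shore"}},
    "loan": {"loan/finance": set()},
    "money": {"money/finance": set()},
    "river": {"river/nature": set()},
    "water": {"water/nature": set()},
}
SENSES = sorted(s for senses in SENSE_LEXICON.values() for s in senses)
CATEGORIES = ["finance", "nature"]

def extract_nouns(text):
    """Stand-in for the morphological analyzer: keep tokens found in the lexicon."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return [t for t in tokens if t in SENSE_LEXICON]

def extract_senses(nouns):
    """Sense extractor: look up the candidate senses of each noun."""
    return {n: SENSE_LEXICON[n] for n in nouns}

def disambiguate(nouns, candidates):
    """Sense disambiguator: pick the sense whose cue words overlap the
    document's nouns most (ties go to the first listed sense)."""
    context = set(nouns)
    return [max(candidates[n], key=lambda s: len(candidates[n][s] & context))
            for n in nouns]

def featurize(senses):
    """Turn the chosen senses into a count vector over all known senses."""
    counts = Counter(senses)
    return [float(counts[s]) for s in SENSES]

def analyze(text):
    nouns = extract_nouns(text)
    return featurize(disambiguate(nouns, extract_senses(nouns)))

def scores(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def train(examples, epochs=200, lr=0.5):
    """Single-layer softmax network trained by stochastic gradient descent."""
    weights = [[0.0] * len(SENSES) for _ in CATEGORIES]
    for _ in range(epochs):
        for x, label in examples:
            probs = softmax(scores(weights, x))
            for k, row in enumerate(weights):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j, xj in enumerate(x):
                    row[j] -= lr * grad * xj
    return weights

def categorize(text, weights):
    """Category determiner: return the highest-scoring category."""
    s = scores(weights, analyze(text))
    return CATEGORIES[s.index(max(s))]

# Made-up training sentences (assumption): (text, index into CATEGORIES).
TRAINING = [
    ("The bank approved the loan.", 0),
    ("Money in the bank account.", 0),
    ("The river bank was under water.", 1),
    ("Water flows in the river.", 1),
]
WEIGHTS = train([(analyze(t), y) for t, y in TRAINING])
```

With these toy weights, `categorize("She took a loan from the bank.", WEIGHTS)` returns `"finance"` and `categorize("We walked along the bank of the river.", WEIGHTS)` returns `"nature"`. The word "bank" is mapped to a different sense in each sentence, so the network receives distinct features for the two documents, which is the role the sense disambiguator plays in the pipeline.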