Namwon Kim (김남원), Jinsoo Park (박진수). Korea Intelligent Information Systems Society, 2012. Journal of Intelligence and Information Systems (지능정보연구), Vol.18 No.1
As the Internet has become more popular, many people use it to communicate. With the growing number of personal homepages, blogs, and social network services, people often expose their personal information online. Although these services are clearly useful, their negative aspects, such as personal information leakage, deserve attention. Because it is impossible to manually review every past record posted by every user, an automatic method for detecting personal information is needed. This study proposes a method to detect, or classify, online documents that contain personal information by analyzing features common to such documents and learning them with the Naive Bayes algorithm. To select the document classification algorithm, the Naive Bayes classification algorithm was compared with the Vector Space classification algorithm. The results showed that Naive Bayes achieves higher precision, recall, F-measure, and accuracy than Vector Space. However, the performance of the Naive Bayes classification algorithm is still insufficient for real-world use. Lewis, a learning algorithm researcher, states that it is important to improve the quality of category features when applying learning algorithms to a specific domain. He proposes a way to add features that depend on related documents incrementally, in a step-wise manner. In another experiment, the algorithm learns the additional dependent features, thereby reducing noise in the features. As a result, the latter experiment shows better performance on these measures than the former.
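The classification and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing over pre-tokenized documents, and the function names (`train_naive_bayes`, `predict`, `evaluate`), the class labels, and the toy documents are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect class frequencies, per-class token counts, and the vocabulary."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(gold, pred, positive):
    """Compute precision, recall, F-measure, and accuracy for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = sum(g != positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Hypothetical toy data: documents are already tokenized.
train_docs = [
    ["name", "phone", "address", "birthday"],
    ["email", "phone", "resident", "number"],
    ["weather", "forecast", "rain"],
    ["soccer", "match", "score"],
]
train_labels = ["personal", "personal", "other", "other"]

model = train_naive_bayes(train_docs, train_labels)
test_docs = [["phone", "address"], ["rain", "score"], ["name", "email", "weather"]]
test_gold = ["personal", "other", "personal"]
predictions = [predict(model, doc) for doc in test_docs]
metrics = evaluate(test_gold, predictions, positive="personal")
print(predictions, metrics)
```

In this sketch, adding dependent features step by step would amount to extending the token lists passed to `train_naive_bayes` and re-running `evaluate` to see whether the four measures improve.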