RISS (Research Information Sharing Service)

      • Text Representation Based on Key Terms of Document for Text Categorization

        Jieming Yang, Zhiying Liu, Zhaoyang Qu. Security Engineering Research Support Center (보안공학연구지원센터), 2016. International Journal of Database Theory and Appli Vol.9 No.4

        The “bag of words” text representation, or vector space model, is used by most classifiers in text categorization. Every document fed into the classifier is represented as a vector in a vector space made up of all the terms extracted from the training set. Because this space is high-dimensional, a feature selection algorithm is usually applied to reduce its dimensionality. After feature selection, each document is represented by a set of representative terms extracted from the training set. Although classification results based on this representation are better, some documents inevitably contain few or even none of the representative terms, and these documents are bound to be misclassified. In this paper, we propose a new text representation method, KT-of-DOC, which represents each document using key terms extracted from that document itself. We select the key terms of each document using six feature selection algorithms in turn, Improved Gini Index (GINI), Information Gain (IG), Mutual Information (MI), Odds Ratio (OR), Ambiguity Measure (AM) and DIA association factor (DIA), and evaluate the performance of two classifiers, Support Vector Machines (SVM) and K-Nearest Neighbors (KNN), on three benchmark collections: 20-Newsgroups, Reuters-21578 and WebKB. The results show that the proposed representation method can significantly improve classifier performance.
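The idea of representing each document by its own key terms can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how many key terms are kept or how ties are broken, so the parameter `k`, the tie-breaking rule, and the function names `information_gain` and `key_terms` are assumptions made for illustration. Information Gain, one of the six scoring methods listed, is computed here in its standard form over binary term presence.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Score every term by information gain, using binary term presence.

    docs   : list of token lists (the training set)
    labels : list of class labels, one per document
    """
    n = len(docs)
    class_counts = Counter(labels)
    classes = sorted(class_counts)
    h_c = entropy([class_counts[c] for c in classes])

    # For each term, count the documents of each class that contain it.
    present = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            present.setdefault(term, Counter())[label] += 1

    scores = {}
    for term, with_term in present.items():
        n_with = sum(with_term.values())
        n_without = n - n_with
        h_with = entropy([with_term[c] for c in classes])
        h_without = entropy([class_counts[c] - with_term[c] for c in classes])
        scores[term] = h_c - (n_with / n) * h_with - (n_without / n) * h_without
    return scores


def key_terms(tokens, scores, k):
    """Keep the k highest-scoring distinct terms of one document.

    Assumption: ties are broken alphabetically; unseen terms score 0.
    """
    ranked = sorted(set(tokens), key=lambda t: (-scores.get(t, 0.0), t))
    return ranked[:k]


# Toy data, invented for illustration only.
docs = [
    ["ball", "goal", "the"],
    ["goal", "team", "the"],
    ["vote", "law", "the"],
    ["law", "party", "the"],
]
labels = ["sport", "sport", "politics", "politics"]

scores = information_gain(docs, labels)
print(key_terms(docs[0], scores, k=2))
```

Because the terms are chosen per document, every non-empty document keeps at least one term, which is the property the abstract contrasts with selecting a single global set of representative terms.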

      • A Novel Feature Selection Based Gravitation for Text Categorization

        Jieming Yang, Zhiying Liu, Zhaoyang Qu. Security Engineering Research Support Center (보안공학연구지원센터), 2016. International Journal of Database Theory and Appli Vol.9 No.3

        The high dimensionality of the feature space is a major hurdle in applying many sophisticated methods to text categorization. Feature selection is one of the methods used to reduce this dimensionality. In this paper, we propose a new feature selection algorithm based on gravitation, named GFS. GFS regards a feature occurring in one category as an object, so that all the objects corresponding to a feature occurring in various categories constitute a gravitational field. The gravitation that all objects in this field exert on a feature with an unknown category label is then used for feature selection. We evaluated GFS on three benchmark datasets (20-Newsgroups, Reuters-21578 and WebKB) using two classification algorithms, Naïve Bayes (NB) and Support Vector Machines (SVM), and compared it with four well-known feature selection algorithms (information gain, document frequency, orthogonal centroid feature selection and Poisson distribution). The experiments show that GFS performs significantly better than the other feature selection algorithms in terms of micro F1, macro F1 and accuracy.
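The abstract reports results in micro F1 and macro F1 without defining them. The sketch below shows the standard definitions for single-label classification; it is not taken from the paper, and the function name `f1_scores` and the toy labels are hypothetical. Micro F1 pools true positives, false positives and false negatives over all classes before computing F1, while macro F1 averages the per-class F1 values, so rare classes weigh as much as frequent ones.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    classes = sorted(set(y_true) | set(y_pred))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy labels, invented for illustration only.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In the single-label setting every misclassification counts as one false positive and one false negative, so micro F1 coincides with accuracy; the two differ in multi-label settings such as some Reuters-21578 configurations.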
