RISS (Research Information Sharing Service)

      KCI-listed, SCOPUS-indexed

      Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

      https://www.riss.kr/link?id=A106494747

      Additional Information

      Multilingual Abstract

      Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new one, we attempt to change how the data is used. Classifier performance is usually affected by the quality of the training data, since the classifier is built from those data. We assume that data from different domains may carry different noise characteristics, which can be exploited when learning the classifier. We therefore attempt to make the classifier more robust, and thereby more accurate, by artificially injecting heterogeneous data into the learning process. A semi-supervised approach was applied to use the heterogeneous data when learning the document classifier. However, unlabeled data may also degrade the document classifier's performance. We therefore further propose an algorithm that extracts only those documents that contribute to improving the classifier's accuracy.
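The abstract combines self-training on unlabeled, heterogeneous documents with a filter that keeps only documents that help accuracy. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm. It assumes a single multinomial Naive Bayes base classifier (the title's multiple self-training classifiers are not reproduced), a hypothetical confidence `threshold`, and a held-out validation set as the yardstick for "contributes to accuracy". The names `NaiveBayes`, `self_train`, `threshold`, and `max_rounds` are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed stand-in base classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(doc):
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exps = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def accuracy(model, docs, labels):
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(labels)


def self_train(labeled, unlabeled, validation, threshold=0.9, max_rounds=5):
    """Self-training that keeps a pseudo-labeled document only if it does not
    lower validation accuracy (an assumed proxy for 'contributes to accuracy')."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    val_docs = [d for d, _ in validation]
    val_labels = [y for _, y in validation]
    pool = list(unlabeled)

    model = NaiveBayes().fit(docs, labels)
    best = accuracy(model, val_docs, val_labels)
    for _ in range(max_rounds):
        added = False
        still_unsure = []
        for doc in pool:
            proba = model.predict_proba(doc)
            label = max(proba, key=proba.get)
            if proba[label] < threshold:
                still_unsure.append(doc)  # low confidence: retry next round
                continue
            # Retrain with the pseudo-labeled candidate and check validation accuracy.
            candidate = NaiveBayes().fit(docs + [doc], labels + [label])
            score = accuracy(candidate, val_docs, val_labels)
            if score >= best:
                docs.append(doc)
                labels.append(label)
                model, best, added = candidate, score, True
            # Confident documents that would lower accuracy are dropped.
        pool = still_unsure
        if not added:
            break
    return model, len(docs) - len(labeled)
```

`self_train(labeled, unlabeled, validation)` takes `(text, label)` pairs for the labeled and validation sets and plain strings for the unlabeled pool, and returns the final model plus the number of pseudo-labeled documents kept. Retraining once per candidate is quadratic in the pool size, which is acceptable for a sketch but not at scale. Requiring `score > best` would match "contributes to improvement" more strictly, but with a small validation set it tends to reject nearly every document.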

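      Journal History

      Date | Type | Details | Listing status
      2023 | Evaluation scheduled | Eligible to apply for overseas-DB journal evaluation (evaluation of overseas-indexed journals) |
      2020-01-01 | Evaluation | Listed-journal status maintained (evaluation of overseas-indexed journals) | KCI-listed
      2017-01-01 | Evaluation | Listed-journal status maintained (continuing evaluation) | KCI-listed
      2013-01-01 | Evaluation | Listing review, first round: FAIL (listing maintenance) | KCI-listed
      2010-01-01 | Evaluation | Listed-journal status maintained (listing maintenance) | KCI-listed
      2009-03-05 | Journal name change | Korean title: 경영정보학 연구 -> Asia Pacific Journal of Information Systems; English title: The Journal of MIS Research -> Asia Pacific Journal of Information Systems | KCI-listed
      2008-01-01 | Evaluation | Listed-journal status maintained (listing maintenance) | KCI-listed
      2006-01-01 | Evaluation | Listed-journal status maintained (listing maintenance) | KCI-listed
      2004-01-01 | Evaluation | Listed-journal status maintained (listing maintenance) | KCI-listed
      2001-01-01 | Evaluation | Selected as a listed journal (candidate, second round) | KCI-listed
      1998-07-01 | Evaluation | Selected as a candidate journal (new evaluation) | KCI candidate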

      Journal Citation Information

      Base year | WOS-KCI combined IF (2-yr) | KCIF (2-yr) | KCIF (3-yr) | KCIF (4-yr) | KCIF (5-yr) | Centrality index (3-yr) | Immediacy index
      2016 | 0.49 | 0.49 | 0.69 | 0.73 | 0.7 | 0.808 | 0.1
