Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

https://www.riss.kr/link?id=A106494747

Authors: William Xiu Shun Wong (Biz Consulting Team, Datasolution Inc); Donghoon Lee (Cafe24 Corp.); 김남규 (Kookmin University)
Publisher: Korea Society of Management Information Systems (한국경영정보학회)
Journal: Asia Pacific Journal of Information Systems
Volume/Issue: Vol.29 No.4 [2019]
Publication year: 2019
Language: English
Keywords: Text Mining; Text Classification; Heterogeneity Learning; Semi-Supervised Learning; Ensemble Learning
Indexing: KCI-listed, SCOPUS
Document type: Academic journal
Pages: 789-816 (28 pages)
KCI citation count: 0
DOI: http://dx.doi.org/10.14329/apjis.2019.29.4.789
Provider: ScienceON, eArticle


Multilingual Abstract

Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new one, we attempt to modify how the data are used. Classifier performance is usually affected by the quality of the training data, because the classifier is built from these data. We assume that data from different domains may have different noise characteristics, which can be exploited when learning the classifier. Therefore, we attempt to enhance the robustness of the classifier, and thereby improve classification accuracy, by artificially injecting heterogeneous data into the learning process. A semi-supervised approach was applied to utilize the heterogeneous data in learning the document classifier. However, the performance of the document classifier might be degraded by the unlabeled data. Therefore, we further propose an algorithm that extracts only the documents that contribute to improving the accuracy of the classifier.
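
The general idea in the abstract, self-training on unlabeled documents from another domain while keeping only those that do not hurt accuracy, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. The `NaiveBayes` class, the `self_train` function, the `threshold` parameter, the held-out validation check, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, doc):
        n, v = sum(self.prior.values()), len(self.vocab)
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c] / n)
            for w in doc.split():
                if w in self.vocab:  # ignore words never seen in training
                    lp += math.log((self.counts[c][w] + 1) / (total + v))
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(lp - m) for lp in logp.values())
        return {c: math.exp(lp - m) / z for c, lp in logp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)

def accuracy(model, docs, labels):
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(docs)

def self_train(labeled, unlabeled, validation, threshold=0.8):
    """Pseudo-label unlabeled docs; keep one only if validation accuracy does not drop."""
    docs, labels = list(labeled[0]), list(labeled[1])
    model = NaiveBayes().fit(docs, labels)
    best = accuracy(model, *validation)
    added = []
    for doc in unlabeled:
        proba = model.predict_proba(doc)
        label = max(proba, key=proba.get)
        if proba[label] < threshold:
            continue  # skip low-confidence pseudo-labels
        candidate = NaiveBayes().fit(docs + [doc], labels + [label])
        score = accuracy(candidate, *validation)
        if score >= best:  # reject documents that degrade the classifier
            docs.append(doc)
            labels.append(label)
            added.append(doc)
            model, best = candidate, score
    return model, added

# Toy data (hypothetical): labeled reviews, unlabeled text from another domain.
labeled = (["good great film", "bad awful film"], ["pos", "neg"])
unlabeled = ["great great good show", "awful bad bad show", "show film"]
validation = (["good show", "bad show"], ["pos", "neg"])
model, added = self_train(labeled, unlabeled, validation)
```

In this toy run the ambiguous document "show film" is never pseudo-labeled because the classifier's confidence stays below `threshold`. Refitting after every candidate document is expensive; a practical variant would evaluate documents in batches.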





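Journal history

Date	Type	Details	Listing status
2023	Evaluation scheduled	Eligible to apply for international-DB journal evaluation (evaluation of internationally indexed journals)
2020-01-01	Evaluation	Listed-journal status maintained (evaluation of internationally indexed journals)
2017-01-01	Evaluation	Listed-journal status maintained (continuing evaluation)
2013-01-01	Evaluation	Listing, first round: FAIL (listing maintenance)
2010-01-01	Evaluation	Listed-journal status maintained (listing maintenance)
2009-03-05	Journal name change	Korean name: 경영정보학 연구 -> Asia Pacific Journal of Information Systems; foreign-language name: The Journal of MIS Research -> Asia Pacific Journal of Information Systems
2008-01-01	Evaluation	Listed-journal status maintained (listing maintenance)
2006-01-01	Evaluation	Listed-journal status maintained (listing maintenance)
2004-01-01	Evaluation	Listed-journal status maintained (listing maintenance)
2001-01-01	Evaluation	Selected as a listed journal (listing candidate, second round)
1998-07-01	Evaluation	Selected as a listing-candidate journal (new evaluation)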

Journal citation information

Base year	WOS-KCI integrated IF (2-year)	KCIF (2-year)	KCIF (3-year)	KCIF (4-year)	KCIF (5-year)	Centrality index (3-year)	Immediacy index
2016	0.49	0.49	0.69	0.73	0.7	0.808	0.1
