The guideline for choosing the right-size of tree for boosting algorithm (부스팅 트리에서 적정 트리사이즈의 선택에 관한 연구)

Multilingual Abstract

This article aims to find the size of decision tree that works best with the boosting algorithm. We first defined the tree size D as the depth of a decision tree. We then compared experimentally how the boosting algorithm performs across different tree sizes. It is common practice to keep the trees in boosting small. Even so, we found that the choice of D has a significant influence on the performance of boosting, and that for some datasets D needs to be sufficiently large. The experimental results show that each dataset has an optimal D, and that choosing the right D is important for improving the performance of boosting. We also tried to build a model that estimates the right D for boosting from variables describing the nature of a given dataset. The proposed model indicates that the optimal tree size D for a given dataset can be estimated from four variables: the error rate of a stump tree, the number of classes, the depth of a single tree, and the Gini impurity.
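The abstract names four dataset characteristics used to estimate D. As a rough sketch, three of them can be computed as follows. The sketch rests on two assumptions, because the abstract does not give the paper's exact definitions:

- The Gini impurity is taken over the class distribution of the target.
- The stump error rate is the training error of the best single axis-aligned split over numeric features.

The fourth variable, the depth of a single tree, depends on how that tree is grown and pruned, so it is not sketched. All function names here are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of the class distribution of `labels`.

    Assumption: computed over the target classes of the whole dataset.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def num_classes(labels: Sequence[Hashable]) -> int:
    """Number of distinct classes K in the target variable."""
    return len(set(labels))


def _misclassified(labels: List[Hashable]) -> int:
    """Errors made by predicting the majority class of `labels`."""
    return len(labels) - max(Counter(labels).values())


def stump_error_rate(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> float:
    """Training error of the best depth-1 tree (decision stump).

    Assumption: numeric features only, axis-aligned splits x[j] <= t with t at
    midpoints between sorted distinct values, and a majority-class prediction
    in each leaf. The split is chosen to minimize misclassifications directly;
    a Gini-grown stump could pick a different split.
    """
    best = _misclassified(list(y))  # baseline: no split at all
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            best = min(best, _misclassified(left) + _misclassified(right))
    return best / len(y)


# XOR-style data: no single split separates the two classes.
X_xor = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y_xor = [0, 1, 1, 0]
print(gini_impurity(y_xor), num_classes(y_xor), stump_error_rate(X_xor, y_xor))
# prints 0.5 2 0.5
```

On this XOR-style data the stump error is 0.5, so a depth-1 tree does no better than chance. This is one illustration of the kind of dataset where, as the abstract notes, D may need to be larger than a stump.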


Korean Abstract

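Among data mining methods for predicting a categorical target variable, ensemble techniques that combine several single classifiers have recently come into wide use. Among these, boosting is known to reduce error more effectively than other ensemble techniques. During resampling it raises the weights of observations that are hard to classify, so the next classifier concentrates on them. In a boosting tree model, where the component classifiers are decision trees, the size of each tree must be chosen. This study assumes that the best tree size for boosting trees may differ from dataset to dataset, and discusses how to estimate the tree size suited to a given dataset. First, to check whether tree size has an important effect on the accuracy of boosting trees, we ran experiments on 28 datasets. The results confirmed that the choice of tree size plays a substantial role in the performance of the whole model. Building on these results, we defined several characteristic variables that we judged likely to affect the optimal tree size. We then used them to construct a model explaining the optimal tree size for boosting trees. The optimal tree size of each dataset may also depend on characteristics peculiar to that dataset. For that reason, the estimation method proposed here is best used as a starting point or guideline for determining the optimal tree size. Previously, the number of classes of the target variable was used as the size of boosting trees. Boosting trees built with the tree size estimated by the proposed model improved classification accuracy significantly over this conventional approach.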
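The reweighting of hard-to-classify observations described above can be sketched with the SAMME update of multi-class AdaBoost (Zhu et al., 2009). The abstract does not say which boosting variant the study used, so treat this as an assumed example, and `samme_update` as a hypothetical helper. The update works in two steps:

1. It computes the classifier weight alpha = log((1 - err)/err) + log(K - 1), where err is the weighted training error of the current tree and K is the number of classes.
2. It multiplies each misclassified observation's weight by exp(alpha) and then renormalizes.

```python
import math
from typing import Hashable, List, Sequence, Tuple


def samme_update(
    weights: Sequence[float],
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    n_classes: int,
) -> Tuple[float, List[float]]:
    """One SAMME boosting round (assumed variant): returns (alpha, new weights).

    Misclassified observations are up-weighted by exp(alpha), so the next
    tree concentrates on the observations that were hard to classify.
    """
    total = sum(weights)
    missed = [t != p for t, p in zip(y_true, y_pred)]
    err = sum(w for w, m in zip(weights, missed) if m) / total
    # SAMME needs 0 < err < 1 - 1/K for alpha > 0; a perfect tree (err == 0)
    # would usually just stop the boosting loop instead.
    if not 0.0 < err < 1.0 - 1.0 / n_classes:
        raise ValueError(f"weighted error {err:.3f} outside (0, 1 - 1/K)")
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    boosted = [w * math.exp(alpha) if m else w for w, m in zip(weights, missed)]
    z = sum(boosted)
    return alpha, [w / z for w in boosted]


# Four equally weighted observations, the last one misclassified, K = 3.
alpha, new_w = samme_update([0.25] * 4, [0, 1, 2, 2], [0, 1, 2, 0], n_classes=3)
print(round(alpha, 4), [round(w, 4) for w in new_w])
# prints 1.7918 [0.1111, 0.1111, 0.1111, 0.6667]
```

With K = 2 the log(K - 1) term vanishes and the update reduces to discrete AdaBoost. The `ValueError` guards the case where the tree does no better than random guessing, for which SAMME would give a non-positive alpha.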




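Journal evaluation history
Date	Type	Details
2022	Scheduled evaluation	Eligible to apply for continuing evaluation (accreditation maintained)
2017-01-01	Evaluation	Selected as an excellent accredited journal (continuing evaluation)
2013-01-01	Evaluation	Accredited journal status maintained (accreditation maintained)
2010-01-01	Evaluation	Accredited journal status maintained (accreditation maintained)
2008-01-01	Evaluation	Accredited journal status maintained (accreditation maintained)
2005-01-01	Evaluation	Selected as an accredited journal (accreditation candidate, 2nd stage)
2004-01-01	Evaluation	Passed accreditation-candidate 1st-stage evaluation (accreditation candidate, 1st stage)
2003-01-01	Evaluation	Accreditation-candidate status maintained (accreditation candidate, 2nd stage)
2002-01-01	Evaluation	Passed accreditation-candidate 1st-stage evaluation (accreditation candidate, 1st stage)
2001-01-01	Evaluation	Selected as an accreditation-candidate journal (new evaluation)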

Journal citation indices
Base year	WOS-KCI integrated IF (2-yr)	KCIF (2-yr)	KCIF (3-yr)	KCIF (4-yr)	KCIF (5-yr)	Centrality index (3-yr)	Immediacy index
2016	1.18	1.18	1.07	1.01	0.91	0.911	0.35
