Multilingual Abstract

Logistic regression is widely used in binary classification because of its flexibility and high classification accuracy. However, when analyzing imbalanced data with different class sizes, the classification accuracy in the minority class (sensitivity) may drop significantly because the logistic regression classifier is biased toward the majority class and classifies almost all observations into it. Therefore, we study logistic regression with various sampling techniques to increase classification accuracy in the minority class. Furthermore, we study lasso logistic regression for imbalanced data, not only to increase classification accuracy but also to select important explanatory variables. We demonstrate the effectiveness of the proposed methods through simulation studies and a real data analysis in terms of classification accuracy and model selection.

Korean Abstract

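Logistic regression is widely used across many fields for classifying binary categorical data because of its high classification accuracy and flexibility. However, when classifying imbalanced data, in which the numbers of observations in the minority and majority classes differ markedly, logistic regression estimates a classification function biased toward the majority class and assigns most observations to it, so classification accuracy for the minority class drops sharply. To raise minority-class accuracy when classifying imbalanced data with logistic regression, this paper studies logistic regression methods that use various sampling techniques. For imbalanced data with high-dimensional explanatory variables, it also studies a method that applies sampling techniques to lasso logistic regression in order to remove noise variables and select the important explanatory variables into the model. Simulation studies and a real data analysis confirm the strong performance and usefulness of the proposed methods in terms of classification accuracy and model parsimony.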
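
The idea described in the abstracts can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the paper's procedure: random oversampling stands in for the "various sampling techniques" (the abstracts do not enumerate them), the lasso penalty is handled by a simple proximal gradient step, and the synthetic data, function names, step size, and penalty value are all hypothetical. Sensitivity is computed in-sample for brevity.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_logistic(X, y, l1=0.0, lr=0.5, epochs=300):
    """Proximal gradient descent; returns [intercept, coef_1, ..., coef_p].

    l1 = 0 gives plain logistic regression; l1 > 0 adds a lasso penalty
    on the coefficients (the intercept is not penalized).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
        w[1:] = [soft_threshold(wj, lr * l1) for wj in w[1:]]
    return w


def predict(w, X, threshold=0.5):
    """Classify as 1 when the fitted probability reaches the threshold."""
    return [
        int(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) >= threshold)
        for xi in X
    ]


def sensitivity(y_true, y_pred):
    """Fraction of class-1 (minority) observations classified as class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pos = sum(y_true)
    return tp / pos if pos else 0.0


def random_oversample(X, y, rng):
    """Resample class-1 rows with replacement until both classes are equal.

    Assumes class 1 is the minority class.
    """
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def demo(seed=0):
    """Return (sensitivity without resampling, sensitivity with oversampling)."""
    rng = random.Random(seed)
    # Hypothetical data: 475 majority rows and 25 minority rows. The first
    # feature is informative; the second is pure noise.
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(475)]
    X += [[rng.gauss(1.5, 1.0), rng.gauss(0.0, 1.0)] for _ in range(25)]
    y = [0] * 475 + [1] * 25

    w_plain = fit_logistic(X, y)                 # fit on the imbalanced data
    X_bal, y_bal = random_oversample(X, y, rng)  # balance the training set
    w_over = fit_logistic(X_bal, y_bal, l1=0.05) # lasso fit on balanced data
    return sensitivity(y, predict(w_plain, X)), sensitivity(y, predict(w_over, X))


if __name__ == "__main__":
    sens_plain, sens_over = demo()
    print("sensitivity, imbalanced fit:", sens_plain)
    print("sensitivity, oversampled lasso fit:", sens_over)
```

Balancing the training set moves the fitted intercept away from the majority class, which is why the second sensitivity is expected to be the larger of the two. In practice the penalty value would be chosen by cross-validation, and sensitivity would be measured on held-out data rather than on the training rows.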
