RISS Academic Research Information Service

      KCI Excellent Accredited Journal

      Principles and methods for model assessment in psychological research in the era of big-data and machine learning (Korean title: 빅 데이터와 기계 학습의 시대 심리학 연구 모형의 평가 원칙과 방법)


      https://www.riss.kr/link?id=A107955684


      Multilingual Abstract

      The objective of the present article is to explain principles of estimation and assessment for statistical models in psychological research. These principles have been actively discussed over the past few decades in the field of mathematical and quantitative psychology. The essence of the discussion is as follows: 1) candidate models are to be considered not the true model but approximating models, 2) the discrepancy between a candidate model and the true model will not disappear even in the population, and therefore 3) it would be best to select the approximating model exhibiting the smallest discrepancy with the true model. The discrepancy between the true model and a candidate model estimated in the sample has been referred to as overall discrepancy in quantitative psychology. In the field of machine learning, models are assessed by the extent to which their performance generalizes to new, unseen samples, rather than being limited to the training samples. In machine learning, a model's ability to generalize is quantified by the generalization error or prediction error. The present article elucidates the point that the principle of model assessment based on overall discrepancy advocated in quantitative psychology is identical to the model assessment principle based on generalization/prediction error firmly adopted in machine learning. Another objective of the present article is to help readers appreciate that questionable data analytic practices widely tolerated in psychology, such as HARKing (Kerr, 1998) and questionable research practices (QRPs; Simmons et al., 2011), have been likely causes of the problem known as overfitting in individual studies, which, in turn, have collectively resulted in the recent debates over the replication crisis in psychology. As a remedy against the questionable practices, this article reintroduces cross-validation methods, whose initial discussion dates back at least to the 1950s in psychology (Mosier, 1951), by couching them in terms of estimators of the generalization/prediction error, in the hope of reducing overfitting problems in psychological research.
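      The contrast the abstract draws between fit in the training sample and generalization/prediction error can be illustrated with a small simulation. The sketch below is not taken from the article: the data-generating function (a sine curve with Gaussian noise), the polynomial candidate models, the 5-fold split, and every function name are assumptions chosen only for illustration. None of the candidates is the true model, so each is an approximating model in the abstract's sense.

```python
import math
import random


def predict(w, x):
    """Evaluate the polynomial with coefficients w (lowest order first) at x."""
    return sum(c * x ** p for p, c in enumerate(w))


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit(xs, ys, degree):
    """Ordinary least squares for a polynomial, via the normal equations X'X w = X'y."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def train_mse(xs, ys, degree):
    """Mean squared error on the same sample used for estimation."""
    w = fit(xs, ys, degree)
    return sum((y - predict(w, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def kfold_mse(xs, ys, degree, k=5):
    """k-fold cross-validation estimate of the prediction error."""
    n = len(xs)
    sq_err = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th case forms one fold
        train = [i for i in range(n) if i not in held_out]
        w = fit([xs[i] for i in train], [ys[i] for i in train], degree)
        sq_err += sum((ys[i] - predict(w, xs[i])) ** 2 for i in held_out)
    return sq_err / n


# Hypothetical data: the generating process is a sine curve plus noise,
# so no polynomial candidate is the true model.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(40)]
ys = [math.sin(3.0 * x) + random.gauss(0.0, 0.3) for x in xs]

results = {d: (train_mse(xs, ys, d), kfold_mse(xs, ys, d)) for d in range(7)}
best = min(results, key=lambda d: results[d][1])

for d, (tr, cv) in results.items():
    print(f"degree {d}: training MSE = {tr:.4f}, 5-fold CV MSE = {cv:.4f}")
print("degree with the smallest cross-validated error:", best)
```

      Because the candidates are nested, the training error can only decrease as the degree grows, so it always favors the most complex model. The cross-validated error is computed on cases withheld from estimation, and under these assumptions it is the quantity one would compare across candidates when selecting a model.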

      Korean Abstract

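      This article aims to introduce psychological researchers to the principles of model estimation and assessment that have been discussed steadily over the past several decades in quantitative psychology. The core of that discussion is that 1) candidate models are not the true model but approximating models, 2) the discrepancy between the true model and an approximating model does not vanish even as the amount of data grows without bound, and therefore 3) it is desirable to select, among the candidate models, the approximating model estimated to have the smallest discrepancy from the true model. The article explains that this principle of model selection is the same as the model assessment principle adopted in machine learning, a field that, in the era of the Fourth Industrial Revolution, is expanding across many academic disciplines. That is, machine learning selects models by estimating the generalization or prediction error, meaning the performance of a model on new cases that were not exposed during training, and the article explains that this is the same as the principle in quantitative psychology that models should be selected by estimating the overall discrepancy, an estimate of the discrepancy between an approximating model and the true model. The second aim of the article, building on an understanding of this model selection principle, is to discuss the overfitting problem caused by the current practice in psychology of "thoroughly" analyzing the data at hand, and ways to address it. In particular, the article introduces cross-validation, which is the most widely used method in machine learning and has also long been discussed in quantitative psychology (Mosier, 1951), from the perspective of an estimator of the generalization error, and urges its use.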

      References

      1 김청택, "Psychological research methods using big data" (빅데이터를 이용한 심리학 연구 방법) Korean Psychological Association 38 (38): 519-548, 2019

      2 Breiman, L., "Wadsworth Statistics/ Probability Series" Wadsworth Advanced Books and Software 1984

      3 Shi, D., "Understanding the model size effect on SEM fit indices" 79 (79): 310-334, 2019

      4 Wherry, R. J., "Underprediction from overfitting : 45 years of shrinkage" 28 (28): 1-18, 1975

      5 Cattell, R. B., "The scree test for the number of factors" 1 (1): 245-276, 1966

      6 Wiggins, B. J., "The replication crisis in psychology : An overview for theoretical and philosophical psychology" 39 (39): 202-217, 2019

      7 Allen, D. M., "The relationship between variable selection and data agumentation and a method for prediction" 16 (16): 125-127, 1974

      8 Preacher, K. J., "The problem of model selection uncertainty in structural equation modeling" 17 (17): 1-, 2012

      9 Geisser, S., "The predictive sample reuse method with applications" 70 (70): 320-328, 1975

      10 Hastie, T., "The elements of statistical learning: Data mining, inference, and prediction" Springer 2009


      11 Browne, M. W., "Testing structural equation models" Sage 136-162, 1993

      12 Mosier, C. I., "Symposium: The need and means of cross-validation. I. Problems and designs of cross-validation" 11 (11): 5-11, 1951

      13 Chapman, B. P., "Statistical learning theory for high dimensional prediction : Application to criterion-keyed scale development" 21 (21): 603-, 2016

      14 Mallows, C. L., "Some comments on Cp" 28 : 313-319, 1973

      15 Browne, M. W., "Single sample cross-validation indices for covariance structures" 24 (24): 445-455, 1989

      16 MacCallum, R. C., "Representing sources of error in the common-factor model : Implications for theory and practice" 109 (109): 502-511, 1991

      17 Hurvich, C. M., "Regression and time series model selection in small samples" 76 : 297-307, 1989

      18 Rocca, R., "Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction"

      19 Shrout, P. E., "Psychology, science, and knowledge construction : Broadening perspectives from the replication crisis" 69 : 487-510, 2018

      20 Pek, J., "Parameter uncertainty in structural equation models : Confidence sets and fungible estimates" 23 (23): 635-653, 2018

      21 Lee, T., "Parameter influence in structural equation modeling" 22 (22): 102-114, 2015

      22 Yuan, K. -H., "Outliers, leverage observations, and influential cases in factor analysis : Using robust procedures to minimize their effect" 38 : 329-368, 2008

      23 Press, W. H., "Numerical Recipes with Source Code CD-ROM 3rd Edition: The Art of Scientific Computing" Cambridge University Press 2007

      24 Burnham, K. P., "Multimodel inference : understanding AIC and BIC in model selection" 33 (33): 261-304, 2004

      25 Ronald L. Wasserstein, "Moving to a world beyond “p< 0.05”" Informa UK Limited 73 (73): 1-19, 2019

      26 Chatfield, C., "Model uncertainty, data mining and statistical inference" 158 (158): 419-444, 1995

      27 Cudeck, R., "Model selection in covariance structures analysis and the "problem" of sample size : A clarification" 109 (109): 512-519, 1991

      28 Vrieze, S. I., "Model selection and psychological theory : A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC)" 17 (17): 228-243, 2012

      29 Linhart, H., "Model selection" Wiley 1986

      30 Burnham, K. P., "Model Selection and Inference : A Practical Information-Theoretic Approach" Springer 1998

      31 Prendez, J. Y., "Measuring Parameter Uncertainty by Identifying Fungible Estimates in SEM" 26 (26): 893-904, 2019

      32 Dempster, A. P., "Maximum likelihood from incomplete data via the EM algorithm" 39 (39): 1-22, 1977

      33 Klein, R. A., "Many Labs 2 : Investigating variation in replicability across samples and settings" 1 (1): 443-490, 2018

      34 Mitchell, T. M., "Machine Learning" McGraw-Hill 1997

      35 Akaike, H., "Information theory and an extension of the maximum likelihood principle" Akademia Kiado 1973

      36 Lubke, G. H., "Inference based on the best-fitting model can contribute to the replication crisis : Assessing model selection uncertainty using a bootstrap approach" 23 (23): 479-490, 2016

      37 Wherry, R. J., "IV. Comparison of cross-validation with statistical inference of betas and multiple R from a single sample" 11 (11): 23-28, 1951

      38 Kerr, N. L., "HARKing : Hypothesizing after the results are known" 2 (2): 196-217, 1998

      39 Myung, I. J., "Guest editors' introduction : Special issue on model selection" 44 (44): 1-2, 2000

      40 Waller, N. G., "Fungible weights in multiple regression" 73 (73): 691-703, 2008

      41 Jones, J. A., "Fungible weights in logistic regression" 21 (21): 241-260, 2016

      42 Lee, T., "Fungible parameter estimates in structural equation modeling" 23 (23): 58-75, 2018

      43 Miller, P. J., "Finding structure in data using multivariate tree boosting" 21 (21): 583-602, 2016

      44 Simmons, J. P., "False-positive psychology : Undisclosed flexibility in data collection and analysis allows presenting anything as significant" 22 (22): 1359-1366, 2011

      45 Agler, R. A., "Factors associated with sensitive regression weights : A fungible parameter approach" 52 (52): 207-222, 2020

      46 Tukey, J. W., "Exploratory data analysis" Addison-Wesley 1977

      47 Schwarz, G., "Estimating the dimension of a model" 461-464, 1978

      48 Takeuchi, K., "Distribution of informational statistics and a criterion of model fitting" 153 : 12-18, 1976

      49 Nylund, K. L., "Deciding on the number of classes in latent class analysis and growth mixture modeling : A Monte Carlo simulation study" 14 (14): 535-569, 2007

      50 Klein, R., "Data from investigating variation in replicability: A “many labs” replication project" 2 (2): 2014

      51 Hu, L. T., "Cutoff criteria for fit indexes in covariance structure analysis : Conventional criteria versus new alternatives" 6 (6): 1-55, 1999

      52 Stone, M., "Cross-validatory choice and assessment of statistical predictions" 36 (36): 111-133, 1974

      53 Krstajic, D., "Cross-validation pitfalls when selecting and assessing regression and classification models" 6 (6): 1-15, 2014

      54 Cudeck, R., "Cross-validation of covariance structures" 18 (18): 147-167, 1983

      55 Browne, M. W., "Cross-validation methods" 44 (44): 108-132, 2000

      56 Yarkoni, T., "Choosing prediction over explanation in psychology : Lessons from machine learning" 12 (12): 1100-1122, 2017

      57 Varma, S., "Bias in error estimation when using cross-validation for model selection" 7 (7): 1-8, 2006

      58 Raftery, A. E., "Bayesian Model Selection in Social Research" 25 : 111-163, 1995

      59 Enders, C. K., "Assessing the fit of structural equation models with multiply imputed data" 23 (23): 76-93, 2018

      60 Kuhn, M., "Applied predictive modeling" Springer 2013

      61 James, G., "An introduction to statistical learning" Springer 2013

      62 Zucchini, W., "An introduction to model selection" 44 (44): 41-61, 2000

      63 Stone, M., "An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion" 39 (39): 44-47, 1977

      64 Bozdogan, H., "Akaike's information criterion and recent developments in information complexity" 44 (44): 62-91, 2000

      65 Kuha, J., "AIC and BIC : Comparisons of assumptions and performance" 33 (33): 188-229, 2004

      66 Arlot, S., "A survey of cross-validation procedures for model selection" 4 : 40-79, 2010

      67 Chung, H. Y., "A note on bootstrap model selection criterion" 26 (26): 35-41, 1996

      68 Akaike, H., "A new look at the statistical model identification" 19 (19): 716-723, 1974

      69 Lee, T., "A comparison of full information maximum likelihood and multiple imputation in structural equation modeling with missing data" 26 (26): 466-485, 2021


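      Journal history

      Date | Type | Details | Accreditation status
      2022 | Evaluation scheduled | Subject to application for continuing evaluation (accreditation maintained) |
      2017-01-01 | Evaluation | Selected as Excellent Accredited journal (continuing evaluation) |
      2015-01-01 | Evaluation | Accredited journal status maintained (accreditation maintained) | KCI accredited
      2011-01-01 | Evaluation | Accredited journal status maintained (accreditation maintained) | KCI accredited
      2009-01-01 | Evaluation | Accredited journal status maintained (accreditation maintained) | KCI accredited
      2008-06-23 | Journal name change | Foreign-language name: The Korean Journal of Psychology -> Korean Journal of Psychology: General | KCI accredited
      2006-01-01 | Evaluation | Selected as accredited journal (candidate, 2nd round) | KCI accredited
      2005-01-01 | Evaluation | Passed 1st round as accreditation candidate (candidate, 1st round) | KCI candidate
      2003-01-01 | Evaluation | Selected as accreditation candidate journal (new evaluation) | KCI candidate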

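      Journal citation information

      Base year | WOS-KCI integrated IF (2-year) | KCIF (2-year) | KCIF (3-year) | KCIF (4-year) | KCIF (5-year) | Centrality index (3-year) | Immediacy index
      2016 | 1.31 | 1.31 | 1.63 | 2.13 | 2.1 | 2.669 | 0.8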
