The purpose of this study is to present a model that predicts the sustainability of small businesses using the RandomForest machine learning algorithm, to evaluate the model's predictive performance, and to identify the characteristics of the factors that contribute to the prediction. The empirical analysis uses data from the “2019 Small Business Survey” jointly prepared by the Ministry of SMEs and Startups and the National Statistical Office.
The predictive model uses 45 explanatory variables that previous studies found to affect the start-up preparation, management performance, and sustainability of small business owners. The model classified whether small business owners would continue operating with an accuracy of 91.06%. Its sensitivity, the share of owners intending to continue operating who were correctly predicted as such, was 95.0%. Its precision, the probability that an owner predicted to intend sustainable operation actually does, was 88.21%; the F1-score, the harmonic mean of sensitivity and precision, was 91.48%; and the ROC-AUC score, which summarizes the trade-off between the true positive rate and the false positive rate, was 91.02%. These results are comparable or superior to those reported in social science studies that use random forests.
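The relationship among these metrics can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and are not taken from the survey; only the sensitivity (95.0%) and precision (88.21%) passed to the final check come from the reported results. The function names are likewise illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # share of all cases classified correctly
    sensitivity = tp / (tp + fn)      # share of actual positives that were found
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    f1 = harmonic_f1(precision, sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


def harmonic_f1(precision, sensitivity):
    """F1-score: the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, chosen only for illustration (not the study's data).
metrics = classification_metrics(tp=95, fp=13, fn=5, tn=87)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Consistency check using the reported precision and sensitivity.
reported_f1 = harmonic_f1(precision=0.8821, sensitivity=0.95)
print(f"F1 from reported values: {reported_f1 * 100:.2f}%")
```

The last line shows that the reported F1-score of 91.48% follows directly from the reported sensitivity and precision.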
The explanatory variables that contribute most to the prediction are, in order of importance, sales, start-up motive (expecting higher income), importance of start-up preparation activities (experience in the same industry), total start-up cost, and start-up motive (wanting to run one's own business). Consistent with previous studies, sales, the variable most often used as an indicator of business performance, is the factor with the greatest influence on the sustainability of small businesses.
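A ranking of this kind can be obtained from a fitted random forest. The following is a minimal sketch using scikit-learn on synthetic data; the feature names, the data-generating rule, and the hyperparameters are all assumptions made for illustration and do not reproduce the study's variables, data, or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names, loosely echoing the variables named above.
feature_names = [
    "sales",
    "startup_cost",
    "motive_income",
    "industry_experience",
    "motive_own_business",
]

# Synthetic data: the label depends mostly on the first feature ("sales").
rng = np.random.default_rng(0)
X = rng.normal(size=(600, len(feature_names)))
noise = 0.1 * rng.normal(size=600)
y = (X[:, 0] + 0.3 * X[:, 1] + noise > 0).astype(int)

# Fit a random forest; hyperparameters here are illustrative defaults.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1; sort them from largest to smallest.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances such as these describe how much each variable contributes to the model's splits; they do not by themselves establish a causal effect on sustainability.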
The academic contribution of this study is that it derives a predictive model with many explanatory variables using a machine learning algorithm (RandomForest) that previous studies rarely used. Most previous studies were also limited by a restricted sample composition or by their total sample size. A further contribution is the use of 45 diverse explanatory variables, selected from those verified in previous studies.
As a practical implication, the importance ranking of the 45 explanatory variables can give the national government or local governments a basis for designing countermeasures or support systems for the sustainability of small businesses. Because start-up preparation is key to business success, consulting provided by the national government or local governments that helps owners prepare more thoroughly should improve the sustainability of small businesses.
A limitation of this study is that only 14,692 of the 38,169 records in the “2019 Small Business Survey” were used, so the analysis does not cover the entire data set. In addition, because the analysis relies on cross-sectional data collected at a single point in time, it cannot clearly establish the relationship between the business preparation stage and the business operation stage.