Background: Nonalcoholic fatty liver disease (NAFLD) is one of the most common chronic liver diseases worldwide. Recent attention has focused on the screening and prediction of NAFLD, and machine learning techniques are powerful and promising tools for this purpose.

Methods: A cross-sectional study was performed among 10,508 subjects who attended their annual health examination at the First Affiliated Hospital, College of Medicine, Zhejiang University, China, in 2010. Questionnaires, physical examinations, laboratory tests, and liver ultrasonography were performed, and 20 features (e.g., age and laboratory results) were extracted. Machine learning techniques were implemented in the open-source software Weka for two tasks: feature selection and classification. Feature selection removed redundant features to build a screening model; classification was used to build a prediction model, which was evaluated by the F-measure. Nine machine learning techniques were investigated: logistic regression, k-nearest neighbor, support vector machine, naive Bayes, Bayesian network, decision tree, AdaBoost, bagging, and random forest.

Results: A total of 2,522 subjects (24%) fulfilled the diagnostic criteria for NAFLD. Feature selection identified BMI, serum triglyceride, ALT, GGT, and uric acid as the top five features contributing most to NAFLD. Classification performance was evaluated by 10-fold cross-validation: subjects were randomly divided into 10 folds, 9 folds were used to build a prediction model, and the remaining fold was used for evaluation. This process was repeated 10 times and the average performance was recorded. Among the nine state-of-the-art machine learning techniques, the Bayesian network performed best, achieving an accuracy, specificity, sensitivity, and F-measure of 83%, 0.787, 0.678, and 0.665, respectively.
Compared with logistic regression, the Bayesian network improved the F-measure by 10.83%.

Conclusions: Novel machine learning techniques may have screening and predictive value for NAFLD.
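The reported accuracy, sensitivity, specificity, and F-measure all derive from a 2×2 confusion matrix with NAFLD as the positive class. The sketch below assumes the standard balanced F-measure (F1), the harmonic mean of precision and sensitivity (recall); the abstract does not state which F-measure variant or class averaging was used, and the counts in the usage line are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary task; 'positive' means NAFLD present."""
    tp: int  # NAFLD cases predicted as NAFLD
    fp: int  # non-NAFLD subjects predicted as NAFLD
    tn: int  # non-NAFLD subjects predicted as non-NAFLD
    fn: int  # NAFLD cases predicted as non-NAFLD


def metrics(c: Confusion) -> dict[str, float]:
    """Compute common screening metrics (assumes no denominator is zero)."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # also called recall
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts, for illustration only.
m = metrics(Confusion(tp=60, fp=20, tn=100, fn=20))
# f_measure 0.75, sensitivity 0.75, specificity about 0.833, accuracy 0.8
print(m)
```

Because F1 ignores true negatives, it is not inflated by the larger non-NAFLD group the way accuracy can be, which is one reason to report it alongside accuracy when only about a quarter of subjects are positive.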
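The 10-fold cross-validation protocol described in the Results (random split into 10 folds, train on 9, evaluate on the held-out fold, average over the repetitions) can be sketched in plain Python. This is a minimal, non-stratified illustration under stated assumptions: the seed, the fold-dealing scheme, and the majority-class baseline (`fit_majority`, `accuracy_of`) are hypothetical stand-ins, not the study's Weka configuration. Weka's own cross-validation stratifies folds by class when the target is nominal, which this sketch does not do.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 1) -> list[list[int]]:
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X: Sequence, y: Sequence, fit: Callable, score: Callable,
                   k: int = 10, seed: int = 1) -> float:
    """Train on k-1 folds, score on the held-out fold, repeat k times, return the mean."""
    scores = []
    for test in k_fold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return mean(scores)


# Hypothetical baseline "model": the majority label of the training folds.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def accuracy_of(model, X_test, y_test):
    return sum(label == model for label in y_test) / len(y_test)


X = [[0.0]] * 100          # placeholder features, ignored by the baseline
y = [1] * 24 + [0] * 76    # toy labels with a 24% positive rate
print(cross_validate(X, y, fit_majority, accuracy_of))  # approximately 0.76
```

Any real classifier can be plugged in through the `fit` and `score` callables; a majority-class baseline like this one is a useful floor, since on a 24%-positive dataset it already reaches roughly 76% accuracy while detecting no NAFLD cases at all.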
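The abstract does not name the feature-selection method used to rank BMI, serum triglyceride, ALT, GGT, and uric acid. As one common filter approach, offered here only as an assumption, features can be ranked by information gain with respect to the NAFLD label; this is the idea behind Weka's `InfoGainAttributeEval` evaluator combined with the `Ranker` search. The binarized feature names (`bmi_high`, `tg_high`, `noise`) and their values are hypothetical, and continuous measurements such as BMI would first need discretization.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature: list, labels: list) -> float:
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups: dict = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns: dict, labels: list, top: int = 5) -> list:
    """Return up to `top` (name, gain) pairs, highest information gain first."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical toy data: 1 = NAFLD, 0 = no NAFLD.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "bmi_high": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "tg_high":  [1, 1, 1, 0, 0, 0, 0, 1],  # partially informative
    "noise":    [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
}
for name, gain in rank_features(columns, labels):
    print(f"{name}: {gain:.3f}")
```

Information gain scores each feature independently, so it does not by itself remove redundant features; a subset evaluator such as Weka's `CfsSubsetEval`, which favors features correlated with the class but not with each other, targets redundancy explicitly.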