Objective To construct random forest and decision tree models for predicting recurrence of minor ischemic stroke (MIS) within two years, and to evaluate their predictive performance.
Methods The medical records of 520 MIS patients who visited the Department of Neurology of Shanxi Cardiovascular Hospital from July 1 to December 31, 2020 were retrospectively collected. Patients were divided into a recurrent group and a non-recurrent group according to whether recurrence occurred within two years. Missing values were imputed using the missing forest method. Predictive variables were selected on the basis of a literature search and expert discussion, followed by univariate analysis, and class imbalance was addressed with the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC). Random forest and decision tree models were constructed using Bayesian optimization with 10-fold cross-validation and compared with a logistic regression model. Discrimination and calibration were evaluated using the area under the receiver operating characteristic curve (AUC), the Brier score (BS), and calibration curves. The predictions of the best-performing model were explained using the SHapley Additive exPlanations (SHAP) method.
Results A total of 93 patients (17.9%) experienced recurrence within two years. There were statistically significant differences between the two groups in age, smoking, diabetes, circulation location of the infarct, multiple cerebral infarctions, diastolic blood pressure, hematocrit, platelet count, and low-density lipoprotein (P < 0.05). On the test set, the AUCs (95% CI) of the logistic regression, decision tree, and random forest models for predicting recurrence within two years in patients with MIS were 0.764 (0.691, 0.835), 0.743 (0.668, 0.818), and 0.892 (0.843, 0.941), and the BS values were 0.200, 0.211, and 0.142, respectively. The random forest model had the best predictive performance, with an accuracy of 0.822, a sensitivity of 0.818, a positive predictive value of 0.808, and a negative predictive value of 0.835. SHAP analysis showed that the five most important variables in the random forest model were age, low-density lipoprotein, smoking, diabetes, and diastolic blood pressure.
Conclusions Compared with the decision tree and logistic regression models, the random forest model performs better in predicting recurrence of MIS within two years.
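The two evaluation metrics named in the Methods, AUC for discrimination and the Brier score for calibration, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `brier_score` and `auc` and the five-patient toy data are assumptions made up for illustration, and the AUC is computed by simple pairwise comparison rather than with any particular library.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0.0 means perfect probabilistic predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    Requires at least one positive and one negative case.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = recurrence, 0 = no recurrence); not from the study.
y_true = [1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.75, 0.7, 0.1]

print(f"AUC = {auc(y_true, y_prob):.3f}")
print(f"Brier score = {brier_score(y_true, y_prob):.4f}")
```

A higher AUC indicates better separation of recurrent from non-recurrent patients, while a lower Brier score indicates predicted probabilities closer to the observed outcomes, which is the direction in which the random forest model differs from the other two models in the Results.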
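Cite this article
Mo Qiuhong, Ding Xiaobo, Zhang Yanbo, Li Weirong. Application of random forest and decision tree models in predicting recurrence in patients with minor ischemic stroke [J]. Journal of Neuroscience and Mental Health (神经疾病与精神卫生), 2024, 24(2): DOI: 10.3969/j.issn.1009-6574.2024.02.001.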