Recurrence prediction of patients with minor ischemic stroke based on random forest and decision tree models

Fund Project:

Key Research and Development Program of Shanxi Province (2021XM14)

    Abstract:

    Objective To construct random forest and decision tree models for predicting recurrence within two years in patients with minor ischemic stroke (MIS), and to analyze the predictive performance and clinical application value of the models.

    Methods The medical records of 520 patients with MIS who visited the Department of Neurology of Shanxi Cardiovascular Hospital from July 1 to December 31, 2020 were retrospectively collected. Patients were divided into a recurrence group and a non-recurrence group according to whether they had a recurrence within two years. Missing data were imputed with missForest. Predictors were selected on the basis of a literature search and expert discussion, and univariate analysis was performed. Class imbalance was handled with the synthetic minority over-sampling technique for nominal and continuous features (SMOTE-NC). Random forest and decision tree models were built using Bayesian optimization with 10-fold cross-validation and compared with a Logistic regression model. Discrimination was evaluated with the area under the receiver operating characteristic curve (AUC), and calibration with the Brier score (BS) and calibration curves. The predictions of the best-performing model were explained with SHapley Additive exPlanations (SHAP).

    Results A total of 93 patients (17.9%) had a recurrence within two years. The two groups differed significantly (P<0.05) in age; in the proportions of smokers, patients with diabetes, and patients with multiple cerebral infarctions; in infarct location by circulation territory; and in diastolic blood pressure, hematocrit, platelet count, and low-density lipoprotein level. In the test set, the AUCs (95% CI) of the Logistic regression, decision tree, and random forest models for predicting recurrence within two years were 0.764 (0.691, 0.835), 0.743 (0.668, 0.818), and 0.892 (0.843, 0.941), respectively, and their BSs were 0.200, 0.211, and 0.142. The random forest model performed best, with an accuracy of 0.822, a sensitivity of 0.818, a positive predictive value of 0.808, and a negative predictive value of 0.835. SHAP analysis showed that the five most important variables in the random forest model were, in order, age, low-density lipoprotein, smoking, diabetes, and diastolic blood pressure.

    Conclusions Compared with the decision tree and Logistic regression models, the random forest model performed better in predicting recurrence of MIS within two years.
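The abstract names SMOTE-NC but does not say how it was implemented; the authors may well have used a library class such as imbalanced-learn's `SMOTENC`. As a rough, non-authoritative illustration of the idea, the sketch below generates synthetic minority-class (recurrence) rows from scratch: continuous features are interpolated between a minority row and one of its nearest minority neighbours, nominal features take the most frequent value among those neighbours, and each mismatched nominal feature adds the median standard deviation of the continuous features (squared) to the distance. The row layout, the feature values, and the helper names `smote_nc`, `_distance` and `_median_std` are all hypothetical.

```python
import math
import random
import statistics
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical row layout: (continuous features, nominal features)
Row = Tuple[List[float], List[str]]


def _median_std(minority: Sequence[Row]) -> float:
    """Median of the per-feature standard deviations of the continuous features."""
    n_cont = len(minority[0][0])
    stds = [statistics.pstdev([r[0][j] for r in minority]) for j in range(n_cont)]
    return statistics.median(stds)


def _distance(a: Row, b: Row, med: float) -> float:
    """Euclidean distance on continuous features plus med**2 per mismatched nominal feature."""
    cont = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    nom = sum(med ** 2 for u, v in zip(a[1], b[1]) if u != v)
    return math.sqrt(cont + nom)


def smote_nc(minority: Sequence[Row], n_new: int, k: int = 5, seed: int = 0) -> List[Row]:
    """Generate n_new synthetic minority-class rows in the style of SMOTE-NC."""
    if len(minority) < 2:
        raise ValueError("need at least two minority-class rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    med = _median_std(minority)
    synthetic: List[Row] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the chosen row (excluding the row itself)
        others = [r for j, r in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda r: _distance(base, r, med))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        # Continuous features: a random point on the segment between base and neighbour
        cont = [x + gap * (y - x) for x, y in zip(base[0], nn[0])]
        # Nominal features: most frequent value among the k neighbours (ties: first seen)
        nom = [Counter(r[1][j] for r in neighbours).most_common(1)[0][0]
               for j in range(len(base[1]))]
        synthetic.append((cont, nom))
    return synthetic


# Hypothetical minority-class rows: ([age, LDL], [smoking, diabetes])
recurrent = [
    ([68.0, 3.4], ["yes", "no"]),
    ([72.0, 3.9], ["yes", "yes"]),
    ([59.0, 2.8], ["no", "yes"]),
    ([75.0, 4.1], ["yes", "yes"]),
]
for row in smote_nc(recurrent, n_new=3, k=2):
    print(row)
```

For scale: bringing the full cohort of 93 recurrent and 427 non-recurrent patients to a 1:1 ratio would require 334 synthetic rows, but the abstract reports neither the target ratio nor which split was oversampled. Oversampling is normally restricted to the training data so that synthetic rows never reach the evaluation set.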
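The abstract reports discrimination as AUC and overall probabilistic accuracy as the Brier score (BS). A minimal, standard-library sketch of how these two metrics are conventionally computed from predicted recurrence probabilities is shown below; the function names and toy data are hypothetical, not the authors' code or data. The AUC is computed in its Mann-Whitney form: the probability that a randomly chosen recurrent patient receives a higher predicted risk than a randomly chosen non-recurrent one, with ties counted as one half.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_mann_whitney(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 0.5.
    O(n_pos * n_neg), adequate for small samples."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = recurrence within two years, 0 = no recurrence
y = [1, 1, 0, 0, 0]
p = [0.9, 0.4, 0.6, 0.2, 0.1]
print(round(brier_score(y, p), 3))       # 0.156
print(round(auc_mann_whitney(y, p), 3))  # 0.833
```

The Brier score rewards predicted probabilities that are both well ranked and close to the observed outcomes, which is why it complements the purely rank-based AUC.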
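Calibration curves compare the mean predicted probability with the observed event rate among patients with similar predictions; points on the diagonal indicate good calibration. The abstract does not state how predictions were grouped, so the sketch below simply assumes equal-width probability bins. The function name `calibration_table` and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_table(y_true: Sequence[int], y_prob: Sequence[float],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width bins on [0, 1].
    Returns (mean predicted probability, observed event rate, count) for each non-empty bin."""
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in sums if n > 0]


# Hypothetical predictions for six patients, grouped into two bins
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
for mean_p, observed, n in calibration_table(y, p, n_bins=2):
    print(f"predicted {mean_p:.2f}  observed {observed:.2f}  n={n}")
```

Plotting `observed` against `mean_p` gives the calibration curve. With few patients per bin the observed rates are noisy, so the number of bins is a trade-off between resolution and stability.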
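Accuracy, sensitivity, positive predictive value (PPV) and negative predictive value (NPV) all come from the confusion matrix obtained after turning predicted probabilities into yes/no predictions. The abstract does not report the cut-off used, so the sketch below assumes a threshold of 0.5; `threshold_metrics` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, PPV and NPV after dichotomising probabilities at `threshold`."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of true recurrences that were flagged
        "ppv": ratio(tp, tp + fp),          # share of flagged patients who did recur
        "npv": ratio(tn, tn + fn),          # share of unflagged patients who did not recur
    }


# Hypothetical toy data: 1 = recurrence within two years
y = [1, 1, 1, 0, 0, 0, 0]
p = [0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
print({k: round(v, 3) for k, v in threshold_metrics(y, p).items()})
# {'accuracy': 0.714, 'sensitivity': 0.667, 'ppv': 0.667, 'npv': 0.75}
```

Unlike sensitivity, PPV and NPV depend on the proportion of recurrences in the sample being evaluated, so they change when the same model is applied to a population with a different recurrence rate.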
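SHAP attributes a model's prediction to individual features using Shapley values from cooperative game theory. Libraries such as `shap` use much faster algorithms (for example TreeSHAP via `shap.TreeExplainer` for tree ensembles), and the abstract does not say which SHAP variant was used. The brute-force sketch below computes exact Shapley values for a small number of features, "switching off" absent features by averaging over a background sample, and then ranks features by mean absolute Shapley value, a common way to produce an importance order like the one reported (assumed here, not confirmed by the source). The toy additive model, the feature names and both function names are hypothetical.

```python
import itertools
import math
from functools import lru_cache
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def exact_shapley(f: Model, x: Sequence[float],
                  background: Sequence[Sequence[float]]) -> List[float]:
    """Exact Shapley values of f at x.
    v(S) = mean over background rows b of f(z), where z takes x's values on S and b's elsewhere.
    Cost grows as 2**len(x), so this is only practical for a handful of features."""
    m = len(x)

    @lru_cache(maxsize=None)
    def value(subset: frozenset) -> float:
        total = 0.0
        for b in background:
            z = [x[j] if j in subset else b[j] for j in range(m)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m!
            w = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for s in itertools.combinations(others, size):
                s_set = frozenset(s)
                phi[i] += w * (value(s_set | {i}) - value(s_set))
    return phi


def rank_by_mean_abs_shap(f: Model, X: Sequence[Sequence[float]],
                          background: Sequence[Sequence[float]],
                          names: Sequence[str]) -> List[Tuple[str, float]]:
    """Global importance = mean |Shapley value| over the explained rows, sorted descending."""
    totals = [0.0] * len(names)
    for x in X:
        for j, v in enumerate(exact_shapley(f, x, background)):
            totals[j] += abs(v)
    return sorted(((n, t / len(X)) for n, t in zip(names, totals)),
                  key=lambda item: item[1], reverse=True)


# Hypothetical additive risk score (not the study's model)
def toy_model(z: Sequence[float]) -> float:
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
x = [2.0, 0.0, 1.0]
print(exact_shapley(toy_model, x, background))
print(rank_by_mean_abs_shap(toy_model, [x, [1.0, 2.0, 0.0]], background,
                            ["feature_a", "feature_b", "feature_c"]))
```

For an additive model each Shapley value reduces to weight × (feature value − background mean), and the values always sum to the prediction minus the mean background prediction, which makes a convenient sanity check.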

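Cite this article

Mo Qiuhong, Ding Xiaobo, Zhang Yanbo, Li Weirong. Recurrence prediction of patients with minor ischemic stroke based on random forest and decision tree models[J]. Journal of Neuroscience and Mental Health (神经疾病与精神卫生), 2024, 24(2):
DOI: 10.3969/j.issn.1009-6574.2024.02.001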
History
  • Published online: 2024-02-27