Screening and diagnosis of hub genes for Parkinson disease based on bioinformatics and machine learning algorithms
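Fund Project: National Natural Science Foundation of China (81960243); Central Government Guided Local Science and Technology Development Special Fund Project (ZYD2022C17)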


    Abstract:

    Objective To explore biomarkers for the diagnosis of Parkinson disease (PD) and their correlation with immune infiltration using bioinformatics and machine learning algorithms.
    Methods The GSE20164, GSE20314, GSE20333, and GSE24378 datasets were obtained from the Gene Expression Omnibus (GEO) database and analyzed to identify genes differentially expressed in the substantia nigra of PD patients compared with healthy controls. Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, LASSO logistic regression, and the random forest algorithm were used to screen hub genes, and the area under the receiver operating characteristic (ROC) curve (AUC) of the hub genes for diagnosing PD was calculated. Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORTx) was used to evaluate the infiltration of 22 immune cell types in PD patients.
    Results A total of 20 PD-related differentially expressed genes were identified: 5 upregulated and 15 downregulated. GO and KEGG enrichment analyses showed that these 20 genes were involved in dopamine biosynthesis, amine biosynthesis, response to toxic substances, tyrosine metabolism, the dopaminergic synapse, PD, and the synaptic vesicle cycle, among other processes. LASSO logistic regression and random forest identified three diagnostic hub genes: KCNMB3, SDC1, and EPYC. ROC curve analysis showed that the combined AUC of the three hub genes for diagnosing PD was 0.783. Immune infiltration analysis showed that the proportions of naive B cells and monocytes were significantly higher in the PD group than in the healthy control group (P < 0.05), and that naive NK cells were positively correlated with activated CD4+ T cells (P < 0.05).
    Conclusions The hub genes KCNMB3, SDC1, and EPYC, identified by the LASSO and random forest algorithms, showed good performance in the diagnosis of PD.
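The abstract does not state which test or cut-offs were used to call the 20 differentially expressed genes (R's limma package is a common choice for GEO microarrays). As an illustration only, the sketch below swaps in Welch's t-test per gene with Benjamini-Hochberg FDR correction and a |log2 fold change| filter; the thresholds (|log2FC| ≥ 1, FDR < 0.05) and the function names `screen_degs` and `benjamini_hochberg` are hypothetical, not the authors' settings.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the p-values sorted in ascending order
    ranked = p[order] * n / np.arange(1, n + 1)
    # make adjusted values monotone, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def screen_degs(expr_pd, expr_ctrl, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Flag up- and downregulated genes (hypothetical helper).

    expr_pd, expr_ctrl: genes x samples arrays of log2-scale expression;
    row i must refer to the same gene in both arrays.
    """
    # on log2-scale data, a difference of means is a log2 fold change
    log2fc = expr_pd.mean(axis=1) - expr_ctrl.mean(axis=1)
    # Welch's t-test per gene (row), a stand-in for limma's moderated t-test
    _, pvals = stats.ttest_ind(expr_pd, expr_ctrl, axis=1, equal_var=False)
    fdr = benjamini_hochberg(pvals)
    keep = (np.abs(log2fc) >= lfc_cutoff) & (fdr < fdr_cutoff)
    up = np.flatnonzero(keep & (log2fc > 0))
    down = np.flatnonzero(keep & (log2fc < 0))
    return up, down, log2fc, fdr
```

Calling `screen_degs(pd_matrix, ctrl_matrix)` on merged, batch-corrected data would return row indices of upregulated and downregulated genes; merging four GEO series normally also requires probe-to-gene mapping and batch correction, which this sketch leaves out.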
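The hub-gene step combines an L1-penalized (LASSO) logistic regression with a random forest. The abstract does not say how the two gene lists were merged, so the sketch below assumes a common approach: keep the genes that LASSO retains with nonzero coefficients and that also rank in the random forest's top-k by impurity importance, then score a logistic model on those genes by AUC. It assumes scikit-learn; `top_k=10`, 5-fold cross-validation, and the names `select_hub_genes` and `combined_auc` are hypothetical choices, not the authors' parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_hub_genes(X, y, gene_names, top_k=10, seed=0):
    """Intersect LASSO-selected genes with the random forest's top-k genes.

    X: samples x genes expression matrix; y: 0 = control, 1 = PD.
    """
    Xs = StandardScaler().fit_transform(X)
    # L1-penalized logistic regression; C is chosen by 5-fold CV on AUC
    lasso = LogisticRegressionCV(
        Cs=20, cv=5, penalty="l1", solver="liblinear",
        scoring="roc_auc", random_state=seed,
    ).fit(Xs, y)
    lasso_set = {g for g, w in zip(gene_names, lasso.coef_.ravel()) if w != 0}

    # rank genes by mean decrease in impurity and keep the top_k
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X, y)
    ranked = np.argsort(rf.feature_importances_)[::-1][:top_k]
    rf_set = {gene_names[i] for i in ranked}
    return sorted(lasso_set & rf_set)


def combined_auc(X_hub, y, seed=0):
    """Cross-validated AUC of a logistic model built on the hub genes only."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # out-of-fold probabilities avoid scoring a sample the model was fit on
    scores = cross_val_predict(model, X_hub, y, cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, scores)
```

The intersection keeps only genes that both a sparse linear model and a nonlinear ensemble consider informative, which trims false positives from either method alone. Selecting genes and estimating AUC on the same samples still inflates the estimate, so an independent validation dataset gives a more trustworthy figure than the cross-validated AUC computed here.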
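CIBERSORTx itself runs as a web or container tool and is not reproduced here; it outputs a samples x cell-types matrix of estimated fractions. The abstract does not name the tests used downstream, so the sketch below assumes a Mann-Whitney U (Wilcoxon rank-sum) test for PD versus control differences and Spearman correlation between cell types. The function names are hypothetical, and no multiple-testing correction is applied, which a full analysis of 22 cell types (231 pairs) would need.

```python
import numpy as np
from scipy import stats


def compare_fractions(frac_pd, frac_ctrl, cell_types, alpha=0.05):
    """Mann-Whitney U test per cell type (columns of samples x cells arrays).

    Returns (cell type, median difference PD - control, p-value) for hits.
    """
    hits = []
    for j, cell in enumerate(cell_types):
        _, p = stats.mannwhitneyu(frac_pd[:, j], frac_ctrl[:, j],
                                  alternative="two-sided")
        if p < alpha:
            diff = np.median(frac_pd[:, j]) - np.median(frac_ctrl[:, j])
            hits.append((cell, float(diff), float(p)))
    return hits


def correlated_pairs(fractions, cell_types, alpha=0.05):
    """Pairwise Spearman correlations between cell-type fraction columns."""
    pairs = []
    n = len(cell_types)
    for i in range(n):
        for j in range(i + 1, n):
            rho, p = stats.spearmanr(fractions[:, i], fractions[:, j])
            # a constant column gives rho = p = nan, which fails p < alpha
            if p < alpha:
                pairs.append((cell_types[i], cell_types[j], float(rho), float(p)))
    return pairs
```

Rank-based tests suit CIBERSORTx output because the estimated fractions are bounded, often zero-inflated, and rarely normally distributed.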

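Cite this article:

WANG Zihao, XIA Huan, FENG Tingting, ZHANG Mingyang, YANG Xinling. Screening and diagnosis of hub genes for Parkinson disease based on bioinformatics and machine learning algorithms[J]. Journal of Neuroscience and Mental Health, 2023, 23(12).
DOI: 10.3969/j.issn.1009-6574.2023.12.001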

History
  • Published online: 2024-01-08