Abstract: Objective To construct and validate an extreme gradient boosting (XGBoost) prediction model for early identification of the risk of olanzapine treatment failure at 8 weeks in patients with schizophrenia, so as to support individualized treatment decisions.
Methods This study included 200 patients with schizophrenia who received olanzapine and completed an 8-week follow-up at Xiamen Xianyue Hospital between January 2023 and December 2024. Treatment failure was defined as a Positive and Negative Syndrome Scale (PANSS) reduction rate of <30% after 8 weeks of treatment. Candidate predictors included demographic characteristics (such as age, gender, and disease duration), clinical features (baseline PANSS score, comorbidities, and past medication history), laboratory indicators [such as complete blood count, liver and kidney function, and serum interleukin-6 (IL-6)], and genetic markers (DRD2 rs1076560). Missing values were handled with multiple imputation by chained equations (MICE, m=5). The sample was randomly divided into a training set (n=140) and a testing set (n=60) in a 7:3 ratio. XGBoost hyperparameters were tuned on the training set by grid search with 5-fold cross-validation. When necessary, the synthetic minority oversampling technique (SMOTE) was used to address class imbalance. Model performance was evaluated in the independent testing set using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, calibration (calibration curve and Hosmer-Lemeshow test), and decision curve analysis (DCA), and variable importance was interpreted using Shapley additive explanations (SHAP) values. Statistical analysis was performed using Python (XGBoost, scikit-learn) and R software.
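The outcome definition in Methods can be sketched as follows. The abstract does not state which reduction-rate formula was used, so this sketch assumes the commonly used floor-corrected form, (baseline - week 8) / (baseline - 30), where 30 is the minimum possible PANSS total score, and also exposes the uncorrected form for comparison. The function names and the example scores are hypothetical and are not taken from the study.

```python
# Minimal sketch of the treatment-failure label (PANSS reduction rate < 30%).
# ASSUMPTION: the floor-corrected formula is used by default; the abstract
# does not say which variant the authors applied.

PANSS_MIN_TOTAL = 30  # 30 items, each scored 1-7, so the lowest total is 30


def panss_reduction_rate(baseline: float, week8: float,
                         floor_corrected: bool = True) -> float:
    """Return the fractional PANSS reduction from baseline to week 8."""
    denominator = baseline - PANSS_MIN_TOTAL if floor_corrected else baseline
    if denominator <= 0:
        raise ValueError("baseline score too low to compute a reduction rate")
    return (baseline - week8) / denominator


def is_treatment_failure(baseline: float, week8: float,
                         threshold: float = 0.30,
                         floor_corrected: bool = True) -> bool:
    """Label a patient as a treatment failure if the reduction is below threshold."""
    return panss_reduction_rate(baseline, week8, floor_corrected) < threshold


if __name__ == "__main__":
    # Hypothetical patient: baseline total 90, week-8 total 66.
    print(is_treatment_failure(90, 66))                         # floor-corrected
    print(is_treatment_failure(90, 66, floor_corrected=False))  # uncorrected
```

For the hypothetical patient above, the two variants disagree (a 40% reduction when floor-corrected, about 27% when uncorrected), which is why the choice of formula should be reported alongside the 30% cut-off.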
Results There were no statistically significant differences between the training and testing sets in gender, age, disease duration, baseline PANSS scores, proportion of patients with comorbid anxiety disorder, serum IL-6 levels, or DRD2 rs1076560 genotype distribution (all P>0.05). The XGBoost model identified five important predictors during training and validation: baseline PANSS positive symptom score, disease duration, serum IL-6 level, DRD2 rs1076560 genotype, and comorbid anxiety disorder. In the testing set, the model achieved an accuracy of 0.833, a sensitivity of 0.794, a specificity of 0.885, and an AUC of 0.897 [95% CI (0.808, 0.986)]. The Hosmer-Lemeshow test (P=0.620) showed no significant lack of fit, indicating good calibration. DCA indicated that when the threshold probability exceeded 0.25, the model provided a greater clinical net benefit than a single predictor.
Conclusions The XGBoost prediction model established in this retrospective study effectively identified patients at high risk of olanzapine treatment failure at 8 weeks within this cohort. The key predictors were symptom severity, disease duration, an inflammatory indicator, a genetic polymorphism, and comorbidity. The model must be validated in external cohorts before it can be used for clinical decision support.
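As a rough arithmetic check on the reported testing-set metrics, the sketch below computes sensitivity, specificity, and accuracy from a confusion matrix, together with the net-benefit quantity that decision curve analysis plots. The counts (27 true positives, 7 false negatives, 23 true negatives, 3 false positives) are an inference: they are one set of integers for n=60 that reproduces the reported 0.794, 0.885, and 0.833, and they are not reported in the abstract. The classification cut-off behind those metrics is also not stated, so the value 0.25 in the net-benefit call is only an example. All function names are hypothetical.

```python
# Minimal sketch: metrics from a confusion matrix and DCA net benefit.
# ASSUMPTION: the counts below are inferred to match the reported metrics;
# they are not data from the study.


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity and accuracy for a 2x2 confusion matrix."""
    n = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n,
    }


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit of the model at a given threshold probability.

    tp and fp must be counted using `threshold` as the classification cut-off.
    """
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Net benefit of the reference strategy that treats every patient as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    tp, fn, tn, fp = 27, 7, 23, 3  # inferred counts, n = 60
    metrics = classification_metrics(tp, fn, tn, fp)
    print({name: round(value, 3) for name, value in metrics.items()})

    n = tp + fn + tn + fp
    pt = 0.25  # example threshold probability only
    print(round(net_benefit(tp, fp, n, pt), 4))
    print(round(net_benefit_treat_all((tp + fn) / n, pt), 4))
```

A full decision curve repeats the net-benefit calculation over a range of threshold probabilities, recounting true and false positives at each one, and compares the model with the treat-all and treat-none (net benefit 0) strategies.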