Dr. Korea University Anam Hospital Seoul, Seoul-t'ukpyolsi, Republic of Korea
Objectives: Osteoarthritis (OA) is a common musculoskeletal disorder in older adults, characterized by joint pain, stiffness, and muscle weakness that limit independence and reduce quality of life (QoL). This study applied existing machine learning models to clinical data to generate interpretable insights that can support more effective management of QoL in individuals with OA.
Design: Clinical data from 1,104 patients with osteoarthritis were analyzed using 29 clinical variables, with the EQ-5D score binarized at 0.7 as the target. Class imbalance was managed through class weighting. Seven machine learning models were tested: logistic regression, four ensemble methods (Random Forest, XGBoost, LightGBM, CatBoost), and TabNet. Data were split 80:20 (train:test) with stratified sampling, and models were trained using three-fold cross-validation. Performance was evaluated with accuracy, precision, recall, F1-score, and AUC, with F1 as the primary metric. Model interpretability was examined using SHAP values and TabNet attention weights to highlight clinically relevant biomechanical features.
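The data-preparation steps in the design (binarizing the target, weighting classes, and stratified splitting) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say which side of the 0.7 cut-off is the positive class or which weighting scheme was used, so the sketch assumes that scores below 0.7 are labeled 1 (low QoL) and that weights follow the common "balanced" heuristic, n_samples / (n_classes × class_count). The scores are synthetic, and all function names are hypothetical.

```python
import random
from collections import Counter

THRESHOLD = 0.7  # EQ-5D cut-off stated in the abstract


def binarize(scores, threshold=THRESHOLD):
    """Assumption: 1 = low QoL (score below threshold), 0 = otherwise."""
    return [1 if s < threshold else 0 for s in scores]


def balanced_class_weights(labels):
    """Assumed 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test_frac from each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Synthetic EQ-5D-like scores for illustration only (not study data).
_rng = random.Random(42)
scores = [round(_rng.uniform(0.2, 1.0), 3) for _ in range(1104)]

labels = binarize(scores)
weights = balanced_class_weights(labels)
train_idx, test_idx = stratified_split(labels, test_frac=0.2)

print("class weights:", weights)
print("train / test sizes:", len(train_idx), len(test_idx))
```

In practice the class weights would be passed to each model's own weighting option, and the three-fold cross-validation would be run on the training indices only, keeping the held-out 20% untouched until final evaluation.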
Results: CatBoost achieved the best performance, with an accuracy of 0.87 ± 0.02 and an F1-score of 0.81 ± 0.02. CatBoost and logistic regression both reached the highest AUC of 0.94. The confusion matrix showed that CatBoost provided balanced sensitivity and specificity. SHAP analysis highlighted WOMAC Function, operative side, WOMAC Pain, visual analog scale (VAS), and six-minute walk distance (6MWD) as the most influential predictors. Higher impairment and pain scores increased predicted risk, and poorer functional test results (6MWD, Timed Up and Go [TUG]) were likewise linked to worse outcomes. Biomechanical factors such as joint torque and range of motion exerted a moderate influence.
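For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall (sensitivity), specificity, and F1 follow from the four cells of a binary confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's confusion matrix, and the function name is an illustrative choice.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the study).
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall and ignores true negatives, it can sit well below accuracy when the positive class is the minority, which is one reason to prefer it as the primary metric on imbalanced data.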
Conclusion: The findings emphasize the importance of patient-reported outcomes, surgical details, and functional assessments in predicting quality of life in osteoarthritis patients. They also demonstrate the clinical relevance and interpretability of machine learning models, with CatBoost showing particular promise as a decision-support tool for OA management.