Title: Fractures under the lens: How smart are our machines?
Abstract:
Objective: To systematically evaluate the diagnostic accuracy of Artificial Intelligence (AI) models for detecting paediatric appendicular fractures on plain radiographs.
Methods: This systematic review and meta-analysis followed the PRISMA-DTA guidelines. MEDLINE, Scopus, Cochrane Library, and Web of Science were searched from inception to May 2025. Eligible studies enrolled paediatric patients (<21 years) whose plain radiographs were assessed for fractures by AI models, with human readers as the reference standard. Primary outcomes were pooled sensitivity, specificity, Diagnostic Odds Ratio (DOR), positive Likelihood Ratio (LR+), and negative Likelihood Ratio (LR−). Risk of bias was assessed using QUADAS-2. Random-effects models and Hierarchical Summary Receiver Operating Characteristic (HSROC) curves were applied.
Results: Seventeen studies met the inclusion criteria, with 11 contributing to the meta-analysis (over 10,000 radiographs). Pooled sensitivity was 0.92 (95% CI: 0.89–0.94) and specificity was 0.90 (95% CI: 0.85–0.94), with a false-positive rate of 0.10 (95% CI: 0.06–0.15). The HSROC curve demonstrated high overall discriminative ability. Subgroup analyses showed comparable diagnostic performance for upper extremity fractures (sensitivity 0.91, specificity 0.89) and lower extremity fractures (sensitivity 0.89, specificity 0.94). The pooled DOR was 104.6 (SD 31.3), LR+ was 9.32 (SD 2.22), and LR− was 0.089 (SD 0.016). Most studies were assessed as low risk of bias. However, notable limitations included the predominance of retrospective, single-centre designs and limited external validation.
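For readers less familiar with these summary measures, the relationship between sensitivity, specificity, the likelihood ratios, and the DOR can be illustrated with a minimal sketch. It uses the standard textbook definitions applied directly to the pooled sensitivity and specificity reported above. This is an assumption for illustration only and not the review's pooling method, which relied on random-effects and HSROC models; the function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> dict:
    """Derive LR+, LR- and DOR from a single sensitivity/specificity pair.

    Standard definitions (illustrative only, not the review's pooling model):
      LR+ = sensitivity / (1 - specificity)
      LR- = (1 - sensitivity) / specificity
      DOR = LR+ / LR-
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return {"lr_pos": lr_pos, "lr_neg": lr_neg, "dor": lr_pos / lr_neg}


# Pooled point estimates taken from the Results above.
summary = likelihood_ratios(sensitivity=0.92, specificity=0.90)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the sketch gives an LR+ of about 9.2, an LR− of about 0.089, and a DOR of about 103.5, close to the pooled values reported above. Small differences are to be expected, since the reported figures are pooled across studies rather than derived from the summary sensitivity and specificity.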
Conclusion: AI models, particularly deep learning architectures, demonstrate high diagnostic accuracy for detecting paediatric appendicular fractures on radiographs, approaching expert-level performance and improving the diagnostic abilities of junior clinicians. Despite promising results, most evidence comes from retrospective and internally validated studies, raising concerns about generalizability. Future research should prioritize prospective multicentre validation, workflow integration, and assessment of clinical impact before widespread clinical adoption. AI has the potential to become a valuable adjunct in paediatric fracture diagnosis, enhancing detection accuracy and optimizing care pathways, but its implementation must be guided by robust evidence, ethical oversight, and clear clinical protocols.