Abstract:
Fracture detection on musculoskeletal (MSK) radiographs is critical for both emergency and routine care, yet diagnostic errors remain common due to high workloads and limited radiological expertise. This study evaluates the diagnostic performance of an artificial intelligence (AI) system (Carebot AI Bones 1.8.10; Carebot s.r.o.) in detecting fractures on MSK X-rays and compares it with that of six radiologists of varying experience levels in a blinded multi-reader, multi-case (MRMC) study. A total of 489 radiographs from routine clinical practice were retrospectively analyzed, with ground truth established for 448 images through consensus among three experienced radiologists. Diagnostic performance was assessed using sensitivity (Se), specificity (Sp), positive likelihood ratio (PLR), and negative likelihood ratio (NLR); statistical analysis used McNemar’s test with Holm’s method to correct for multiple comparisons. The AI system achieved a sensitivity of 0.921 (95% CI: 0.846–0.961) and a specificity of 0.897 (95% CI: 0.861–0.924). Radiologists’ sensitivity ranged from 0.663 to 0.933, and their specificity ranged from 0.916 to 0.989. The AI demonstrated consistently high sensitivity across body parts, particularly for elbow and hand/wrist fractures, often exceeding radiologists’ performance. Its specificity (0.897) fell slightly below that of the radiologists (0.916–0.989) but remained acceptable, supporting AI’s potential as a complementary diagnostic tool. These findings highlight the clinical utility of AI in MSK fracture detection, particularly in settings with limited resources or high diagnostic workloads. Future research should validate these results in larger, multicenter studies to ensure broader generalizability and should evaluate AI integration in real-world workflows.
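For readers less familiar with the metrics named above, the following is a minimal Python sketch of how likelihood ratios follow from Se and Sp, and how a paired comparison with Holm correction could be computed. It is an illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, the exact (binomial) form of McNemar's test is an assumption because the abstract does not say which variant was used, and the p-values in the usage lines are made-up placeholders.

```python
from math import comb


def likelihood_ratios(se, sp):
    """Return (PLR, NLR) using the standard definitions.

    PLR = Se / (1 - Sp), NLR = (1 - Se) / Sp.
    """
    return se / (1.0 - sp), (1.0 - se) / sp


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: cases reader A got right and reader B got wrong; c: the reverse.
    Assumption: exact binomial variant (the source does not specify one).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), cap at 1,
        # and enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted


# Likelihood ratios from the AI system's reported Se and Sp.
plr, nlr = likelihood_ratios(0.921, 0.897)
print(f"PLR = {plr:.2f}, NLR = {nlr:.3f}")

# Hypothetical p-values (NOT from the study), e.g. one per AI-vs-reader test.
print(holm_adjust([0.01, 0.04, 0.03]))
```

With six readers, one paired test per reader would yield six p-values to pass to `holm_adjust`; the discordant counts for each test would come from the per-image agreement tables, which the abstract does not report.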