Assessing the Performance of Artificial Intelligence Models: Insights from the American Society of Functional Neuroradiology Artificial Intelligence Competition.

Bin Jiang, Burak Berksu Ozkara, Guangming Zhu, Derek Boothroyd, Jason W Allen, Daniel P Barboriak, Peter Chang, Cynthia Chan, Ruchir Chaudhari, Hui Chen, Anjeza Chukus, Victoria Ding, David Douglas, Christopher G Filippi, Adam E Flanders, Ryan Godwin, Syed Hashmi, Christopher Hess, Kevin Hsu, Yvonne W Lui, Joseph A Maldjian, Patrik Michel, Sahil S Nalawade, Vishal Patel, Prashant Raghavan, Haris I Sair, Jody Tanabe, Kirk Welker, Chris Whitlow, Greg Zaharcuk, Max Wintermark

AJNR. American Journal of Neuroradiology 2024 April 26

BACKGROUND AND PURPOSE: Artificial intelligence (AI) models in radiology are frequently developed and validated using datasets from a single institution and are rarely tested on independent, external datasets, raising questions about their generalizability and applicability in clinical practice. The American Society of Functional Neuroradiology (ASFNR) organized a multicenter AI competition to evaluate how well submitted models identify various pathologies on noncontrast CT (NCCT), assess age-based normality, and estimate medical urgency.

MATERIALS AND METHODS: In total, 1201 anonymized, full-head NCCT clinical scans from five institutions were pooled to form the dataset. The dataset encompassed normal studies as well as pathologies including acute ischemic stroke, intracranial hemorrhage, traumatic brain injury, and mass effect (task 1: detection of these pathologies). NCCTs were also assessed to determine whether findings were consistent with expected brain changes for the patient's age (task 2: age-based normality assessment) and to identify any abnormalities requiring immediate medical attention (task 3: evaluation of findings for urgent intervention). Five neuroradiologists labeled each NCCT, with consensus interpretations serving as the ground truth. The competition was announced online, inviting academic institutions and companies. Independent central analysis assessed each model's performance. For each AI model, accuracy, sensitivity, specificity, and positive and negative predictive values were calculated, and receiver operating characteristic (ROC) curves were generated along with the area under the ROC curve (AUROC).
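The metrics named above can be illustrated with a minimal sketch. It is not the competition's analysis code: the function names `confusion_metrics` and `auroc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUROC is computed here through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so a value near 0.5 corresponds to chance-level discrimination.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV for binary labels (1 = finding present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: consensus labels and model scores for eight scans.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    model_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
    predictions = [1 if s >= 0.5 else 0 for s in model_scores]  # assumed threshold
    print(confusion_metrics(labels, predictions))
    print(auroc(labels, model_scores))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a dataset of roughly a thousand scans; a rank-based implementation would be preferable for much larger sets.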

RESULTS: In total, 1177 studies were processed by four teams. The median patient age was 62 years, with an interquartile range of 33. Nineteen teams from various academic institutions registered for the competition; of these, four submitted final results. No commercial entities participated. For task 1, AUROCs ranged from 0.49 to 0.59. For task 2, two teams completed the task, with AUROC values of 0.57 and 0.52. For task 3, teams had little to no agreement with the ground truth.

CONCLUSIONS: To assess the performance of AI models in real-world clinical scenarios, we analyzed their performance in the ASFNR AI Competition. The first ASFNR Competition underscored the gap between expectation and reality; the models largely fell short in their assessments. As the integration of AI tools into clinical workflows increases, neuroradiologists must carefully recognize the capabilities, constraints, and consistency of these technologies. Before institutions adopt these algorithms, thorough validation is essential to ensure acceptable levels of performance in clinical settings.

ABBREVIATIONS: AI = artificial intelligence; ASFNR = American Society of Functional Neuroradiology; AUROC = area under the receiver operating characteristic curve; DICOM = Digital Imaging and Communications in Medicine; GEE = generalized estimation equation; IQR = interquartile range; NPV = negative predictive value; PPV = positive predictive value; ROC = receiver operating characteristic; TBI = traumatic brain injury.
