Comparing the Quality of Domain-Specific Versus General Language Models for Artificial Intelligence-Generated Differential Diagnoses in PICU Patients.

Alireza Akhondi-Asl, Youyang Yang, Matthew Luchette, Jeffrey P Burns, Nilesh M Mehta, Alon Geva

Pediatric Critical Care Medicine 2024 February 9

OBJECTIVES: Generative language models (LMs) are being evaluated in a variety of tasks in healthcare, but pediatric critical care studies are scant. Our objective was to evaluate the utility of generative LMs in the pediatric critical care setting and to determine whether domain-adapted LMs can outperform much larger general-domain LMs in generating a differential diagnosis from the admission notes of PICU patients.

DESIGN: Single-center retrospective cohort study.

SETTING: Quaternary 40-bed PICU.

PATIENTS: Notes from all patients admitted to the PICU between January 2012 and April 2023 were used for model development. One hundred thirty randomly selected admission notes were used for evaluation.

INTERVENTIONS: None.

MEASUREMENTS AND MAIN RESULTS: Five experts in critical care used a 5-point Likert scale to independently evaluate the overall quality of differential diagnoses: 1) written by the clinician in the original notes, 2) generated by two general LMs (BioGPT-Large and LLaMa-65B), and 3) generated by two fine-tuned models (fine-tuned BioGPT-Large and fine-tuned LLaMa-7B). Differences among differential diagnoses were compared using mixed methods regression models. We used 1,916,538 notes from 32,454 unique patients for model development and validation. The mean quality scores of the differential diagnoses generated by the clinicians and by fine-tuned LLaMa-7B, the best-performing LM, were 3.43 and 2.88, respectively (absolute difference 0.54 units [95% CI, 0.37-0.72], p < 0.001). Fine-tuned LLaMa-7B performed better than LLaMa-65B (absolute difference 0.23 units [95% CI, 0.06-0.41], p = 0.009) and BioGPT-Large (absolute difference 0.86 units [95% CI, 0.69-1.0], p < 0.001). The differential diagnoses generated by clinicians and by fine-tuned LLaMa-7B were ranked as the highest quality in 144 (55%) and 74 (29%) cases, respectively.

CONCLUSIONS: A smaller LM fine-tuned using notes of PICU patients outperformed much larger models trained on general-domain data. LMs currently remain inferior to human clinicians but may serve as an adjunct to them in real-world tasks using real-world data.
