Chatbot Reliability in Managing Thoracic Surgical Clinical Scenarios.

Joseph J Platz, Darren S Bryan, Keith S Naunheim, Mark K Ferguson

Annals of Thoracic Surgery 2024 April 3

BACKGROUND: Chatbot use in medicine is growing, and concerns have been raised regarding chatbot accuracy. This study assessed the performance of 4 different chatbots in managing thoracic surgical clinical scenarios.

METHODS: Topic domains were identified and clinical scenarios were developed within each domain. Each scenario included 3 stems using Key Feature methods related to diagnosis, evaluation, and treatment. Twelve scenarios were presented to ChatGPT-4 (OpenAI), Bard (recently renamed Gemini; Google), Perplexity (Perplexity AI), and Claude 2 (Anthropic) in 3 separate runs. Up to 1 point was awarded for each stem, yielding a potential of 3 points per scenario. Critical failures were identified before scoring; if they occurred, the stem and overall scenario scores were adjusted to 0. We arbitrarily established a threshold of ≥2 points mean adjusted score per scenario as a passing grade and established a critical fail rate of ≥30% as failure to pass.
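The scoring rules described in METHODS can be illustrated with a minimal sketch. It is not the authors' code. The names `Scenario` and `summarize_run` are hypothetical, the stem scores in the usage note are invented for illustration, and two readings are assumptions: a critical failure in any stem zeroes the whole scenario's adjusted score, and a run passes only if it meets both the mean adjusted score threshold (≥2) and the critical fail rate limit (<30%).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class Scenario:
    """One clinical scenario scored for one chatbot in one run (hypothetical type)."""
    stem_scores: Sequence[float]  # diagnosis, evaluation, treatment; each 0 to 1
    critical_failure: bool        # identified before scoring

    @property
    def raw(self) -> float:
        # Raw score: sum of the 3 stem scores, up to 3 points.
        return float(sum(self.stem_scores))

    @property
    def adjusted(self) -> float:
        # Assumption: any critical failure sets the scenario score to 0.
        return 0.0 if self.critical_failure else self.raw


def summarize_run(scenarios, pass_mean=2.0, fail_rate=0.30):
    """Summarize one run: mean raw and adjusted scores, critical fail rate, pass flag."""
    mean_raw = mean(s.raw for s in scenarios)
    mean_adjusted = mean(s.adjusted for s in scenarios)
    critical_rate = sum(s.critical_failure for s in scenarios) / len(scenarios)
    # Assumption: both criteria must be met for a passing grade.
    passed = mean_adjusted >= pass_mean and critical_rate < fail_rate
    return {
        "mean_raw": mean_raw,
        "mean_adjusted": mean_adjusted,
        "critical_rate": critical_rate,
        "passed": passed,
    }
```

For example, with four invented scenarios scored (1, 1, 1), (1, 0.5, 0.5) with a critical failure, (1, 1, 0), and (0.5, 0.5, 0.5), the mean raw score is 2.125, the mean adjusted score is 1.625, the critical fail rate is 25%, and the run fails on the adjusted score threshold.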

RESULTS: The bot performances varied considerably within each run, and their overall performance was a fail on all runs (critical mean scenario fails of 83%, 71%, and 71%). The bots trended toward "learning" from the first to the second run, but without improvement in overall raw (1.24 ± 0.47 vs 1.63 ± 0.76 vs 1.51 ± 0.60; P = .29) and adjusted (0.44 ± 0.54 vs 0.80 ± 0.94 vs 0.76 ± 0.81; P = .48) scenario scores after all runs.

CONCLUSIONS: Chatbot performance in managing clinical scenarios was insufficient to provide reliable assistance. This is a cautionary note against reliance on the current accuracy of chatbots in complex thoracic surgery medical decision making.
