Kai Xiong Cheong, Chenxi Zhang, Tien-En Tan, Beau J Fenner, Wendy Meihua Wong, Kelvin Yc Teo, Ya Xing Wang, Sobha Sivaprasad, Pearse A Keane, Cecilia Sungmin Lee, Aaron Y Lee, Chui Ming Gemmy Cheung, Tien Yin Wong, Yun-Gyung Cheong, Su Jeong Song, Yih Chung Tham
BACKGROUND/AIMS: To compare the performance of generative versus retrieval-based chatbots in answering patient inquiries regarding age-related macular degeneration (AMD) and diabetic retinopathy (DR). METHODS: In a cross-sectional study, we evaluated four chatbots: three generative models (ChatGPT-4, ChatGPT-3.5 and Google Bard) and one retrieval-based model (OcularBERT). We evaluated and compared the accuracy of their responses to 45 questions (15 AMD, 15 DR and 15 others). Three masked retinal specialists graded the responses on a three-point Likert scale: 2 (good, error-free), 1 (borderline) or 0 (poor, with significant inaccuracies)...
May 15, 2024: British Journal of Ophthalmology