State Institution ‘The Filatov Institute of Eye Diseases and Tissue Therapy of NAMS of Ukraine’

Exploring AI-chatbots’ capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases

11.03.2024

A study assessed the ability of three widely accessible large language models, Chat Generative Pre-trained Transformer (ChatGPT-3.5), ChatGPT-4, and Google Gemini, to analyze cases of retinal detachment and propose optimal surgical treatments.

The analysis included 54 records of retinal detachment cases, which were entered into the ChatGPT and Gemini interfaces. Each model was asked, “Please indicate the surgical treatment you would recommend and the possible intraocular tamponade,” and its responses were collected. The level of agreement with the shared opinion of three expert vitreoretinal surgeons was evaluated. Additionally, the answers from ChatGPT and Gemini were rated on a scale of 1–5 (from poor to excellent quality) using the Global Quality Score (GQS).

Results: After exclusion of 4 controversial cases, 50 cases remained. Overall, the surgical choices made by ChatGPT-3.5, ChatGPT-4, and Google Gemini agreed with those of the vitreoretinal surgeons in 40/50 (80%), 42/50 (84%), and 35/50 (70%) cases, respectively. Google Gemini failed to provide an answer in five cases. Contingency analysis showed a significant difference between ChatGPT-4 and Gemini (p=0.03). The GQS was 3.9±0.8 for ChatGPT-3.5 and 4.2±0.7 for ChatGPT-4, while Gemini scored 3.5±1.1. There was no statistically significant difference between the two ChatGPT versions (p=0.22), but both outperformed Gemini (p=0.03 and p=0.002, respectively). The primary source of error was the choice of endotamponade (14% for both ChatGPT-3.5 and ChatGPT-4, and 12% for Google Gemini). Only ChatGPT-4 suggested a combined phacovitrectomy approach.
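The agreement percentages above follow directly from the reported counts. As a minimal sketch of that arithmetic (the variable and function names are illustrative assumptions, not part of the study, and the statistical tests are not reproduced here):

```python
# Agreement counts as reported in the results, out of 50 included cases.
TOTAL_CASES = 50
agreed = {"ChatGPT-3.5": 40, "ChatGPT-4": 42, "Google Gemini": 35}


def agreement_percent(n_agreed: int, n_total: int) -> float:
    """Share of cases in which the model matched the surgeons, in percent."""
    return 100.0 * n_agreed / n_total


# Hypothetical helper structure: model name -> agreement rate in percent.
rates = {model: agreement_percent(n, TOTAL_CASES) for model, n in agreed.items()}

for model, pct in rates.items():
    print(f"{model}: {agreed[model]}/{TOTAL_CASES} = {pct:.0f}%")
```

The denominator is the 50 included cases for every model, as in the reported figures.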

In conclusion, Google Gemini and ChatGPT evaluated the records of patients with vitreoretinal pathology coherently, showing a good level of agreement with experienced surgeons. According to the GQS, ChatGPT’s recommendations were rated significantly higher in quality than Gemini’s.
