Chat Generative Pretrained Transformer (ChatGPT-4) Passes the Vascular Education and Self-Assessment Program (VESAP-5)
Robert Tatum, Robyn Guinto, Daniel Bertges, Matthew Alef.
University of Vermont Medical Center, Winooski, VT, USA.

OBJECTIVES: Large language models (LLMs) have attracted increasing attention and scrutiny, partly because of the technological progress and commercial success of ChatGPT and other sophisticated generative language systems. Deep-learning models like ChatGPT carry tremendous potential as education tools for vascular surgeons. We sought to characterize the performance of the premier LLMs (ChatGPT versions 3.5 and 4.0) when presented with questions from the Vascular Education and Self-Assessment Program (VESAP-5) to better understand their present capabilities and to identify specific strengths and weaknesses within vascular surgery.
METHODS: VESAP-5 includes ten modules composed of multiple-choice questions. Three researchers manually transcribed all questions from nine modules into ChatGPT 3.5 and ChatGPT 4.0 to generate responses. If a question included an image, a researcher wrote a description of the image and included it in the transcribed question. Performance records of ChatGPT 3.5 and 4.0 were kept for each of the ten completed VESAP-5 modules. Overall and module-specific performance was calculated and compared.
RESULTS: Overall, we transcribed 678 VESAP-5 questions into ChatGPT 3.5 and ChatGPT 4.0. ChatGPT 4.0 answered 75.1% of questions correctly, and ChatGPT 3.5 answered 53.2% correctly. The module for which ChatGPT 4.0 obtained the highest score was venous and lymphatic disease (89% correct), followed by radiation (82%) and renal and mesenteric (82%). The module for which ChatGPT 4.0 obtained the lowest score was dialysis (63%). ChatGPT 3.5 performed best for vascular medicine (58%) and worst for aortoiliac disease (48%).
CONCLUSIONS: LLMs have demonstrated advanced capabilities when presented with exam-style clinical vascular surgery questions, as evidenced by ChatGPT 4.0 earning a passing score on the VESAP-5. The potential impact of LLMs on didactic education and surgical training within vascular surgery is tremendous and will depend on advancements in two general domains: complex image-recognition capabilities, and integration of data processing capacities with response generation in ways that produce correct answers to complex questions with greater frequency.
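
The abstract does not say how the overall and module-specific scores were tallied. As a minimal sketch, assuming each graded response is stored as a (module, model, correct) record and that the overall score pools all questions rather than averaging the module percentages, the calculation might look like the following. The function names, record layout, and sample records are hypothetical and are not data from the study.

```python
from collections import defaultdict


def score_by_module(records):
    """Return percent correct for each (module, model) pair.

    records: list of (module, model, is_correct) tuples (assumed layout).
    """
    tally = defaultdict(lambda: [0, 0])  # (module, model) -> [correct, total]
    for module, model, is_correct in records:
        entry = tally[(module, model)]
        entry[0] += int(is_correct)
        entry[1] += 1
    return {key: 100.0 * correct / total
            for key, (correct, total) in tally.items()}


def overall_score(records, model):
    """Return percent correct for one model, pooled over all questions."""
    graded = [ok for _, m, ok in records if m == model]
    return 100.0 * sum(graded) / len(graded)


# Made-up records for illustration only; NOT results from VESAP-5.
records = [
    ("venous", "gpt-4", True),
    ("venous", "gpt-4", True),
    ("venous", "gpt-3.5", False),
    ("venous", "gpt-3.5", True),
    ("dialysis", "gpt-4", False),
    ("dialysis", "gpt-4", True),
    ("dialysis", "gpt-3.5", False),
    ("dialysis", "gpt-3.5", False),
]

per_module = score_by_module(records)
for (module, model), pct in sorted(per_module.items()):
    print(f"{module:10s} {model:8s} {pct:5.1f}%")
print(f"overall gpt-4:   {overall_score(records, 'gpt-4'):.1f}%")
print(f"overall gpt-3.5: {overall_score(records, 'gpt-3.5'):.1f}%")
```

Pooling weights each module by its number of questions, so the overall figure can differ from the simple mean of the module percentages when modules vary in length.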
