Generative Artificial Intelligence Chatbots May Provide Appropriate Informational Responses To Common Vascular Surgery Questions
Ethan Chervonski, B.A.1, Keerthi B. Harish, B.A.1, Caron B. Rockman, M.D.2, Mikel Sadek, M.D.2, Katherine A. Teter, M.D.2, Glenn R. Jacobowitz, M.D.2, Todd L. Berland, M.D.2, Thomas S. Maldonado, M.D.2.
1New York University Grossman School of Medicine, New York, NY, USA, 2New York University Langone Medical Center, New York, NY, USA.

Generative artificial intelligence (AI) has emerged as a promising tool to engage with patients. The objective of this study was to assess the quality and accessibility of AI responses to common patient questions regarding vascular surgery diseases.
OpenAI’s ChatGPT3.5 and Google’s Bard were queried with 24 mock patient questions spanning seven vascular surgery disease domains. Six experienced vascular surgery faculty at a tertiary academic center independently graded AI responses on their accuracy (rated 1-4 from completely inaccurate to completely accurate), completeness (rated 1-4 from totally incomplete to totally complete), and appropriateness (binary). Responses were also evaluated with three readability scales.
ChatGPT responses were rated more accurate than Bard responses (mean (SD): 3.08 (0.33) versus 2.82 (0.40), p<0.01). ChatGPT responses were also scored more complete than Bard responses (mean (SD): 2.98 (0.34) versus 2.62 (0.36), p<0.01). The inter-rater reliability (IRR) was fair among faculty ratings of accuracy (W_ChatGPT=0.20, W_Bard=0.25) and completeness (W_ChatGPT=0.23, W_Bard=0.28). Most ChatGPT responses (75.0%, n=18) and almost half of Bard responses (45.8%, n=11) were unanimously deemed appropriate. While the IRR for Bard appropriateness ratings was fair (κ=0.20), the IRR for ChatGPT ratings was poor (κ=0.01). The mean Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index of ChatGPT responses were 29.4 (SD: 10.8), 14.5 (SD: 2.2), and 17.7 (SD: 3.1), respectively, indicating that responses were readable with a post-secondary education. Bard’s mean readability scores were 58.9 (SD: 10.5), 8.2 (SD: 1.7), and 11.0 (SD: 2.0), respectively, indicating that responses were readable with a high-school education (p<0.0001 for all three metrics). Table 1 summarizes the findings.
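For readers unfamiliar with the three readability scales, the sketch below shows the standard published formulas for the Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index. It is illustrative only: the abstract does not state which tool the authors used, and the `count_syllables` and `readability` helpers are hypothetical, relying on a rough vowel-group heuristic that will not reproduce the reported scores exactly.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Higher scores mean easier text (roughly 0-100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Approximate U.S. school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Approximate years of formal education needed; complex = 3+ syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def count_syllables(word):
    # Assumption: a naive heuristic that counts groups of consecutive vowels.
    # Dedicated readability tools use more careful rules.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Score a passage with all three scales using the naive counts above."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    # Assumption: each run of ., ! or ? ends one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables_per_word = [count_syllables(w) for w in words]
    syllables = sum(syllables_per_word)
    complex_words = sum(1 for s in syllables_per_word if s >= 3)
    n = len(words)
    return {
        "flesch_reading_ease": flesch_reading_ease(n, sentences, syllables),
        "flesch_kincaid_grade": flesch_kincaid_grade(n, sentences, syllables),
        "gunning_fog": gunning_fog(n, sentences, complex_words),
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 5 sentences, 150 syllables, 10 complex words.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
    print(gunning_fog(100, 5, 10))
```

Under these formulas, longer sentences and more syllables per word lower the Flesch Reading Ease and raise both grade-level indices, which is the direction of the difference reported between ChatGPT and Bard responses.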
AI offers a novel means of educating patients that avoids the inundation of information from “Dr. Google” and the time barriers of physician-patient encounters. ChatGPT provides largely valid, though imperfect, responses to myriad patient questions at the expense of reader accessibility. While Bard responses are more readable, their quality is poorer.

| Domain | Question | Accuracy^a (Mean Rating), ChatGPT | Accuracy^a (Mean Rating), Bard | Completeness^b (Mean Rating), ChatGPT | Completeness^b (Mean Rating), Bard | Appropriateness^c (Fraction of Raters), ChatGPT | Appropriateness^c (Fraction of Raters), Bard |
|---|---|---|---|---|---|---|---|
| SVS Website “Common Questions” | 1. What is a vascular surgeon? | 3.50 | 3.17 | 3.17 | 2.83 | 6/6 | 6/6 |
| | 2. What is vascular disease? | 3.33 | 3.17 | 3.17 | 2.83 | 6/6 | 6/6 |
| | 3. What is the vascular system? | 3.50 | 3.33 | 3.33 | 2.67 | 6/6 | 6/6 |
| Peripheral artery disease | 4. I have leg pain. Is this peripheral arterial disease? | 3.00 | 2.67 | 3.00 | 2.17 | 5/6 | 5/6 |
| | 5. How is peripheral arterial disease diagnosed? | 3.50 | 3.00 | 3.33 | 2.83 | 6/6 | 6/6 |
| | 6. How can I prevent my peripheral arterial disease from getting worse? | 3.33 | 3.33 | 3.50 | 3.00 | 6/6 | 6/6 |
| Abdominal Aortic Aneurysm | 7. I do not have any symptoms from my abdominal aortic aneurysm. Do I need to consider having it fixed? | 3.67 | 3.00 | 3.67 | 2.83 | 6/6 | 6/6 |
| | 8. Should everyone be checked to see if they have an abdominal aortic aneurysm? | 3.17 | 2.33 | 3.33 | 2.17 | 5/6 | 5/6 |
| | 9. How are abdominal aortic aneurysms treated? | 2.67 | 2.83 | 2.50 | 2.67 | 5/6 | 5/6 |
| Carotid artery disease | 10. I am dizzy. Is this from carotid artery disease? | 2.83 | 2.50 | 2.67 | 2.83 | 4/6 | 4/6 |
| | 11. How much carotid artery blockage is considered significant? | 2.83 | 2.00 | 2.83 | 1.83 | 1/6 | 1/6 |
| | 12. If I have no symptoms from my carotid artery disease, should I have it fixed? | 3.33 | 2.67 | 3.00 | 2.17 | 5/6 | 5/6 |
| Deep venous thrombosis | 13. What are the symptoms of a deep venous thrombosis? | 3.00 | 3.50 | 3.17 | 3.33 | 6/6 | 6/6 |
| | 14. Can a deep venous thrombosis be detected on physical exam? | 3.17 | 2.33 | 3.00 | 2.00 | 4/6 | 4/6 |
| | 15. Does my deep venous thrombosis require surgery? | 2.50 | 2.17 | 2.67 | 2.33 | 2/6 | 2/6 |
| Varicose veins | 16. I have an ulcer on my leg. What is this and is it related to my varicose veins? | 2.67 | 2.50 | 2.50 | 2.50 | 4/6 | 4/6 |
| | 17. You can see my varicose veins. What else will my vascular surgeon look for on a physical exam? | 2.67 | 3.00 | 2.83 | 2.83 | 6/6 | 6/6 |
| | 18. How do I know if the visible veins on my legs are just a “cosmetic” concern or if they represent a real medical condition? | | | | | | |
| Chronic kidney disease | 19. Are there any symptoms in my arms or legs that I should mention to my vascular surgeon before deciding on an access procedure? | 3.17 | 2.33 | 2.83 | 2.33 | 4/6 | 4/6 |
| | 20. How do I know which types of dialysis access are best for me? | 3.00 | 2.83 | 2.33 | 2.83 | 4/6 | 5/6 |
| | 21. How do I maintain and care for my dialysis access? | 3.33 | 3.00 | 3.17 | 2.83 | 6/6 | 6/6 |
| Thoracic outlet syndrome | 22. What are the symptoms of thoracic outlet syndrome caused from compression of the nerve? | 2.50 | 2.83 | 2.67 | 2.67 | 5/6 | 6/6 |
| | 23. What tests can be done to diagnose thoracic outlet syndrome? | 3.33 | 3.17 | 3.17 | 2.67 | 6/6 | 6/6 |
| | 24. What is the treatment of thoracic outlet syndrome caused from compression of the nerve? | | | | | | |
a: Accuracy rater scale: 1- completely inaccurate, 2- somewhat accurate, 3- mostly accurate, 4- completely accurate;
b: Completeness rater scale: 1- totally incomplete, 2- somewhat complete, 3- mostly complete, 4- totally complete;
c: Appropriateness rater scale: appropriate or inappropriate (binary).

Table 1: Expert faculty ratings of accuracy, completeness, and appropriateness of ChatGPT and Bard responses to vascular surgery questions posed by patients
