Objective:
Generative artificial intelligence (AI) has emerged as a promising tool to engage with patients. The objective of this study was to assess the quality and accessibility of AI responses to common patient questions regarding vascular surgery diseases.
Methods:
OpenAI’s ChatGPT3.5 and Google’s Bard were queried with 24 mock patient questions spanning seven vascular surgery disease domains. Six experienced vascular surgery faculty at a tertiary academic center independently graded AI responses on their accuracy (rated 1-4 from completely inaccurate to completely accurate), completeness (rated 1-4 from totally incomplete to totally complete), and appropriateness (binary). Responses were also evaluated with three readability scales.
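The Methods do not spell out the three readability scales; the Results name them as Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index. As a minimal sketch, the standard published formulas for these scales can be written as follows. The function names are hypothetical, and the word, sentence, syllable, and complex-word counts are assumed to come from whatever text-analysis tool is used. The abstract does not state which tool the authors used.

```python
# Sketch of the standard readability formulas, computed from raw counts.
# Assumption: "complex words" for Gunning Fog are words of three or more
# syllables, counted upstream by the caller.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Approximate years of formal education needed to read the text."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))
```

For a hypothetical 100-word passage with 5 sentences, 150 syllables, and 10 complex words, these functions would be called as `flesch_reading_ease(100, 5, 150)`, `flesch_kincaid_grade(100, 5, 150)`, and `gunning_fog(100, 5, 10)`.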
Results:
ChatGPT responses were rated more accurate than Bard responses (x̄ (SD): 3.08 (0.33) versus 2.82 (0.40), p<0.01). ChatGPT responses were also scored more complete than Bard responses (x̄ (SD): 2.98 (0.34) versus 2.62 (0.36), p<0.01). The inter-rater reliability (IRR) was fair among faculty ratings of accuracy (W_ChatGPT=0.20, W_Bard=0.25) and completeness (W_ChatGPT=0.23, W_Bard=0.28). Most ChatGPT responses (75.0%, n=18) and almost half of Bard responses (45.8%, n=11) were unanimously deemed appropriate. While the IRR for Bard appropriateness ratings was fair (κ=0.20), the IRR was poor for ChatGPT ratings (κ=0.01). The mean Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index of ChatGPT responses were 29.4 (SD: 10.8), 14.5 (SD: 2.2), and 17.7 (SD: 3.1), respectively, indicating that responses were readable with a post-secondary education. Bard’s mean readability scores were 58.9 (SD: 10.5), 8.2 (SD: 1.7), and 11.0 (SD: 2.0), respectively, indicating that responses were readable with a high-school education (p<0.0001 for all three metrics). Table 1 summarizes the findings.
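The abstract reports agreement on the ordinal accuracy and completeness ratings as a statistic W without defining it. Assuming W denotes Kendall's coefficient of concordance (an assumption, not something the abstract states), a minimal tie-corrected sketch for a raters-by-items score matrix might look like this. The function names and input layout are hypothetical.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # 1-based average position of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """ratings[j][i] is rater j's score for item i; returns tie-corrected W in [0, 1]."""
    m = len(ratings)        # number of raters
    n = len(ratings[0])     # number of items
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over every group of tied scores, per rater.
    ties = sum(sum(c ** 3 - c for c in Counter(row).values()) for row in ratings)
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater gives all items the same score")
    return 12 * s / denom
```

Because a 1-4 scale applied to 24 questions necessarily produces many tied scores per rater, the tie correction matters here. Without it, W would be biased downward.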
Conclusions:
AI offers a novel means of educating patients that avoids the inundation of information from “Dr. Google” and the time barriers of physician-patient encounters. ChatGPT provides largely valid, though imperfect, responses to myriad patient questions at the expense of reader accessibility. While Bard responses are more readable, their quality is poorer.
Table 1:
Domain | Question | Accuracy^a (Mean Rating) | | Completeness^b (Mean Rating) | | Appropriateness^c (Fraction of Raters) |
 | | ChatGPT | Bard | ChatGPT | Bard | Bard | Bard
SVS Website “Common Questions” | 1. What is a vascular surgeon? | 3.50 | 3.17 | 3.17 | 2.83 | 6/6 | 6/6
SVS Website “Common Questions” | 2. What is vascular disease? | 3.33 | 3.17 | 3.17 | 2.83 | 6/6 | 6/6
SVS Website “Common Questions” | 3. What is the vascular system? | 3.50 | 3.33 | 3.33 | 2.67 | 6/6 | 6/6
Peripheral artery disease | 4. I have leg pain. Is this peripheral arterial disease? | 3.00 | 2.67 | 3.00 | 2.17 | 5/6 | 5/6
Peripheral artery disease | 5. How is peripheral arterial disease diagnosed? | 3.50 | 3.00 | 3.33 | 2.83 | 6/6 | 6/6
Peripheral artery disease | 6. How can I prevent my peripheral arterial disease from getting worse? | 3.33 | 3.33 | 3.50 | 3.00 | 6/6 | 6/6
Abdominal Aortic Aneurysm | 7. I do not have any symptoms from my abdominal aortic aneurysm. Do I need to consider having it fixed? | 3.67 | 3.00 | 3.67 | 2.83 | 6/6 | 6/6
Abdominal Aortic Aneurysm | 8. Should everyone be checked to see if they have an abdominal aortic aneurysm? | 3.17 | 2.33 | 3.33 | 2.17 | 5/6 | 5/6
Abdominal Aortic Aneurysm | 9. How are abdominal aortic aneurysms treated? | 2.67 | 2.83 | 2.50 | 2.67 | 5/6 | 5/6
Carotid artery disease | 10. I am dizzy. Is this from carotid artery disease? | 2.83 | 2.50 | 2.67 | 2.83 | 4/6 | 4/6
Carotid artery disease | 11. How much carotid artery blockage is considered significant? | 2.83 | 2.00 | 2.83 | 1.83 | 1/6 | 1/6
Carotid artery disease | 12. If I have no symptoms from my carotid artery disease, should I have it fixed? | 3.33 | 2.67 | 3.00 | 2.17 | 5/6 | 5/6
Deep venous thrombosis | 13. What are the symptoms of a deep venous thrombosis? | 3.00 | 3.50 | 3.17 | 3.33 | 6/6 | 6/6
Deep venous thrombosis | 14. Can a deep venous thrombosis be detected on physical exam? | 3.17 | 2.33 | 3.00 | 2.00 | 4/6 | 4/6
Deep venous thrombosis | 15. Does my deep venous thrombosis require surgery? | 2.50 | 2.17 | 2.67 | 2.33 | 2/6 | 2/6
Varicose veins | 16. I have an ulcer on my leg. What is this and is it related to my varicose veins? | 2.67 | 2.50 | 2.50 | 2.50 | 4/6 | 4/6
Varicose veins | 17. You can see my varicose veins. What else will my vascular surgeon look for on a physical exam? | 2.67 | 3.00 | 2.83 | 2.83 | 6/6 | 6/6
Varicose veins | 18. How do I know if the visible veins on my legs are just a “cosmetic” concern or if they represent a real medical condition? | 3.00 | 3.00 | 3.00 | 2.83 | 5/6 | 5/6
Chronic kidney disease | 19. Are there any symptoms in my arms or legs that I should mention to my vascular surgeon before deciding on an access procedure? | 3.17 | 2.33 | 2.83 | 2.33 | 4/6 | 4/6
Chronic kidney disease | 20. How do I know which types of dialysis access are best for me? | 3.00 | 2.83 | 2.33 | 2.83 | 4/6 | 5/6
Chronic kidney disease | 21. How do I maintain and care for my dialysis access? | 3.33 | 3.00 | 3.17 | 2.83 | 6/6 | 6/6
Thoracic outlet syndrome | 22. What are the symptoms of thoracic outlet syndrome caused from compression of the nerve? | 2.50 | 2.83 | 2.67 | 2.67 | 5/6 | 6/6
Thoracic outlet syndrome | 23. What tests can be done to diagnose thoracic outlet syndrome? | 3.33 | 3.17 | 3.17 | 2.67 | 6/6 | 6/6
Thoracic outlet syndrome | 24. What is the treatment of thoracic outlet syndrome caused from compression of the nerve? | 3.00 | 3.00 | 2.67 | 2.83 | 5/6 | 4/6
a: Accuracy rater scale: 1 = completely inaccurate, 2 = somewhat accurate, 3 = mostly accurate, 4 = completely accurate; b: Completeness rater scale: 1 = totally incomplete, 2 = somewhat complete, 3 = mostly complete, 4 = totally complete; c: Appropriateness rater scale: appropriate or inappropriate.