Back to 2025 Display Posters
Human vs. Artificial Intelligence: Analyzing CPT Coding Accuracy in Vascular Surgeries
Brandon Dean Madris, MD, Keivan Ranjbar, MD, Andre Critsinelis, MD, Stephanie Talutis, MD, Shivani Kumar, MD, Payam Salehi, MD.
Tufts Medical Center, Boston, MA, USA.
OBJECTIVES: Current Procedural Terminology (CPT) coding is important for medical billing and reimbursement. With recent advances in AI, there is growing interest in using it to improve coding accuracy and efficiency. This study compares CPT coding accuracy between human finance teams and AI applications, focusing on vascular surgeries performed in the Tufts Medicine health system over one month.
METHODS: We compared the CPT codes assigned by ChatGPT and Perplexity AI for a dataset of 120 surgical cases performed in April 2024. Analyses were performed in SPSS version 29. For each AI system, each case was scored against the standard charge codes provided by the finance department as a match, partial match, or non-match. Agreement between the two AI systems was assessed using cross-tabulation and Cohen's Kappa statistic. We calculated the percentage of matches, partial matches, and non-matches for each AI system, and the frequency of agreement and disagreement with the standard codes.
RESULTS: The study included 111 patients who underwent 120 surgeries by five vascular surgeons. Of the 257 CPT codes from the finance department, 98 were unique. Perplexity AI accurately matched 38.3% of cases and partially matched 35.0%, while ChatGPT accurately matched 27.5% and partially matched 47.5%. Non-matches were similar: Perplexity AI at 26.7% and ChatGPT at 25.0%. Radiofrequency Ablation and Endarterectomy were most frequently matched correctly by both AI systems, while Endovascular Aneurysm Repair (EVAR) was most frequently mismatched. Agreement between AI systems occurred in 64.2% of cases, with a Cohen’s Kappa value of 0.458 (p < 0.001), indicating moderate agreement.
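The reported Kappa can be checked against the other figures in the results. The sketch below is a consistency check, not the authors' SPSS procedure. It assumes the per-category counts can be back-calculated from the reported percentages of 120 cases (for example, 38.3% of 120 is 46 cases, and 64.2% agreement is 77 cases); those counts and the function name are illustrative and do not appear in the abstract.

```python
def cohens_kappa_from_marginals(n_cases, n_agree, rater_a_counts, rater_b_counts):
    """Cohen's kappa from the number of agreeing cases and each rater's category totals.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is the
    agreement expected by chance given the two raters' category frequencies.
    """
    if sum(rater_a_counts) != n_cases or sum(rater_b_counts) != n_cases:
        raise ValueError("category counts must sum to n_cases")
    p_observed = n_agree / n_cases
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_expected = sum(a * b for a, b in zip(rater_a_counts, rater_b_counts)) / n_cases**2
    return (p_observed - p_expected) / (1 - p_expected)


# Assumption: counts back-calculated from the reported percentages of 120 cases,
# in the order (match, partial match, non-match).
perplexity_counts = [46, 42, 32]  # 38.3%, 35.0%, 26.7%
chatgpt_counts = [33, 57, 30]     # 27.5%, 47.5%, 25.0%
n_agree = 77                      # 64.2% of 120 cases

kappa = cohens_kappa_from_marginals(120, n_agree, perplexity_counts, chatgpt_counts)
print(round(kappa, 3))  # 0.458
```

Under these assumed counts the computed value agrees with the reported 0.458, which suggests the percentages and the Kappa statistic in the results are mutually consistent.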
CONCLUSIONS: The study shows significant differences in coding accuracy between AI applications and human billing teams. While Perplexity AI showed slightly higher accuracy than ChatGPT, both tools leave room for improvement, and further refinement and training could enhance their coding efficiency and accuracy. Future research should explore collaborative approaches between human billers and AI applications, and the potential to improve accuracy by providing an AI learning model with a comprehensive dataset of billed codes.