ISSN: 2229-371X
A Commentary on Computerized Diagnostic Decision Support Systems: A Comparative Performance Study of Isabel Pro versus ChatGPT4
This paper compares the diagnostic performance of the commercially available diagnostic decision support system, Isabel Pro, with that of the OpenAI generative pre-trained artificial intelligence system, ChatGPT4. The study used 201 cases, each with a confirmed diagnosis. Identical inputs were submitted to both systems, each system was asked to return a differential diagnosis listing, and the rank of the correct diagnosis was compared using Mean Reciprocal Rank (MRR) and Recall at Rank for ranks 1, 5, 10, 20, 30, and 40. ChatGPT4 was also asked to provide a complete reference citation for each diagnosis returned in its differential. An MRR of 1.0 would mean the correct diagnosis appeared as the first-ranked diagnosis in every case. ChatGPT4 returned an MRR of 0.428, while Isabel Pro returned an MRR of 0.389. ChatGPT4 outperformed at Ranks 1, 5, and 10, while Isabel Pro outperformed at Ranks 20, 30, and 40. The 201 cases were insufficient to conclude that the systems are equivalent. The concerning issue for the clinical use of ChatGPT4 is "What reference substantiates the correct diagnosis?" ChatGPT4 fabricated over 12% of the references cited and almost 70% of the DOIs. The study concludes that while the promise of artificial intelligence is high, the fabrication of references will limit the clinical use of these models until they achieve absolute accuracy.
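For clarity on the metrics: MRR averages the reciprocal of the rank at which the correct diagnosis appears across cases, and Recall at Rank k is the fraction of cases in which the correct diagnosis appears within the top k of the differential. The Python sketch below is a minimal illustration of how these metrics are computed; it is not the study's scoring code, and the case ranks shown are hypothetical.

    # Minimal sketch (not the study's code): Mean Reciprocal Rank (MRR) and
    # Recall at Rank k over ranked differential-diagnosis lists.
    from typing import List, Optional

    def reciprocal_rank(rank: Optional[int]) -> float:
        """Return 1/rank of the correct diagnosis, or 0.0 if it never appeared."""
        return 0.0 if rank is None else 1.0 / rank

    def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
        """Average reciprocal rank across all cases."""
        return sum(reciprocal_rank(r) for r in ranks) / len(ranks)

    def recall_at_k(ranks: List[Optional[int]], k: int) -> float:
        """Fraction of cases whose correct diagnosis appears within the top k."""
        return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

    # Hypothetical example: rank of the confirmed diagnosis in each case's
    # differential (None means the correct diagnosis was not returned at all).
    ranks = [1, 3, None, 2, 7, 1, None, 15]

    print(f"MRR = {mean_reciprocal_rank(ranks):.3f}")
    for k in (1, 5, 10, 20, 30, 40):
        print(f"Recall@{k} = {recall_at_k(ranks, k):.3f}")

An MRR of 1.0 under this definition corresponds exactly to the case described in the abstract: the correct diagnosis ranked first in every case.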
Joe M Bridges