LLMs have clinically significant error rates

Jun 3

Of the models tested on a standardized set of oncology questions, GPT-4 performed best. Although this performance is impressive, all LLMs continue to have clinically significant error rates, including instances of overconfidence and consistent inaccuracies. Given the enthusiasm for integrating these new implementations of AI into clinical practice, continued standardized evaluation of the strengths and limitations of these products will be critical to guide both patients and medical professionals.

Link to study

Ari Gray
