ChatGPT's Performance on US Medical Licensing Exam (USMLE)

A study by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth found that ChatGPT, a large language model (LLM) and a type of artificial intelligence (AI) system, can score at or near the passing threshold for the United States Medical Licensing Exam (USMLE). The findings were published on February 9, 2023, in the open-access journal PLOS Digital Health.

Understanding ChatGPT’s Capabilities

ChatGPT generates text that mimics human writing by using its internal processes to predict relationships between words. Unlike most chatbots, it does not search the internet; its responses come from these predictions alone.

Performance Evaluation on USMLE

The researchers assessed ChatGPT’s performance on the USMLE, a series of three standardized exams (Steps 1, 2CK, and 3) required for medical licensure in the US. These exams cover a broad spectrum of medical knowledge, ranging from biochemistry to diagnostic reasoning to bioethics.

The study tested ChatGPT on 350 questions from the June 2022 USMLE release, with image-based questions excluded. After indeterminate responses were removed, ChatGPT scored between 52.4% and 75.0% across the three exams, a range that sits at or near the passing threshold of approximately 60% rather than uniformly above it. It also demonstrated a high level of concordance (94.6%) across its responses and produced at least one significant insight in 88.9% of them.
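
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the label names ("correct", "incorrect", "indeterminate"), the function name, and the toy grades are all hypothetical, and only the 52.4%, 75.0%, and approximately 60% figures come from the article.

```python
from collections import Counter

PASSING_THRESHOLD = 0.60  # approximate passing threshold cited in the article


def accuracy_excluding_indeterminate(grades):
    """Return the fraction of correct answers among determinate responses."""
    counts = Counter(grades)
    # Indeterminate responses are dropped from the denominator entirely.
    determinate = counts["correct"] + counts["incorrect"]
    if determinate == 0:
        raise ValueError("no determinate responses to score")
    return counts["correct"] / determinate


# Made-up grades for illustration only; these are not data from the study.
toy_grades = ["correct"] * 6 + ["incorrect"] * 3 + ["indeterminate"] * 1
toy_accuracy = accuracy_excluding_indeterminate(toy_grades)

# The score range reported in the article, compared against the threshold.
reported_low, reported_high = 0.524, 0.750
range_straddles_threshold = reported_low < PASSING_THRESHOLD <= reported_high
```

In the toy data, 6 of the 9 determinate responses are correct, so the accuracy is about 66.7%. The final comparison shows that the reported range of 52.4% to 75.0% straddles the approximately 60% threshold: the lowest score falls below it and the highest falls above it.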

Outperforming Biomedical Domain Models

ChatGPT also outperformed PubMedGPT, a model trained specifically on biomedical literature, which scored 50.8% on an older dataset of USMLE-style questions.

Implications for Medical Education and Practice

Although the relatively small input size limited the depth and range of the analyses, the study points to ChatGPT’s potential to augment medical education and, eventually, clinical practice. At AnsibleHealth, clinicians already use ChatGPT to rewrite jargon-heavy medical reports so that patients can understand them more easily.

Dr. Tiffany Kung emphasized ChatGPT’s integral role in the research process, stating that it contributed significantly to manuscript writing and offered valuable insights akin to a colleague’s input.

The authors view ChatGPT’s achievement in reaching the passing score for a challenging expert exam without human reinforcement as a significant milestone in the advancement of clinical AI. They foresee ChatGPT’s continued evolution benefiting medical professionals and ultimately enhancing patient care and outcomes.