How well does OpenAI’s GPT-5 answer health questions?

Thu 11 September 2025
AI
News

A new generative artificial intelligence model, GPT-5 by OpenAI, was advertised as a significant leap in genAI development. Indeed, it introduces several key improvements: it is no longer overly optimistic when giving mental health advice, hallucinations are up to ten times less frequent, and diagnostic accuracy has doubled.

Since the launch of ChatGPT, patients have been eager to ask it about diagnoses, treatments, and how to manage their health. Some have even uploaded lab test results and entire medical files for a second opinion. Sometimes they received misleading answers; sometimes they got a clue that finally let them pin down the cause of an illness after years of going from doctor to doctor.

This was ChatGPT’s biggest problem: while earlier models often provided surprisingly accurate responses, they occasionally stated false information with undue confidence and sometimes doubled down on those inaccuracies.
GPT-5 promises to address these shortcomings with more in-depth, nuanced responses and enhanced contextual understanding. This makes it a potentially valuable assistant for both patients and healthcare professionals.

“For the first time, I feel like I’m talking to an expert in any field, such as a doctor of science,” said OpenAI CEO Sam Altman during the GPT-5 launch.

Health-related queries are particularly challenging for AI, as even slight nuances in symptoms or medical history can significantly impact outcomes. According to OpenAI, GPT-5’s improved reasoning and decision-making considerably reduce the likelihood of hallucinations, making it a safer and more reliable tool. Here is what we know about the new capabilities of GPT-5.

Multimodal medical reasoning

GPT-5 was evaluated on HealthBench, a benchmark of 5,000 realistic health scenarios validated by medical professionals. The results demonstrated a substantial improvement over previous models.
With the “think longer” option enabled, GPT-5’s accuracy is double that of GPT-4o, and hallucination rates on medical questions drop from 15.8% for GPT-4o to just 1.6% for GPT-5. Even without this option, which free users can invoke only once per day, hallucinations are still reduced roughly fourfold.
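
As a quick consistency check (simple arithmetic, not a figure from OpenAI), the drop from 15.8% to 1.6% corresponds to roughly a tenfold reduction, which matches the headline claim above:

```python
# Back-of-the-envelope check of the reported hallucination rates.
gpt4o_rate = 15.8  # % hallucination rate reported for GPT-4o
gpt5_rate = 1.6    # % hallucination rate reported for GPT-5 with "think longer"

factor = gpt4o_rate / gpt5_rate
print(f"Reduction factor: {factor:.1f}x")  # ~9.9x, i.e. close to tenfold
```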

GPT-5 extends beyond text analysis with built-in multimodal medical reasoning. It can interpret patient data in various formats, including test results and medical images. GPT-4 had only moderate success in this area.
On medical exams such as the USMLE, GPT-5 outperformed human experts. Its multimodal capabilities allow it to combine textual descriptions with images, ask follow-up questions, and provide preliminary assessments.
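
For developers, this kind of multimodal input is accessible through OpenAI’s standard chat API, which accepts mixed text-and-image messages. The sketch below is a minimal illustration, not an official example; the model identifier "gpt-5" and the image URL are placeholders:

```python
# Minimal sketch: sending a medical image plus a text question to the model.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name and image URL below are assumptions, not verified values.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Here are my recent lab results. What should I discuss with my doctor?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/lab-results.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

In practice, the model may answer with follow-up questions of its own, which is exactly the conversational pattern described above.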

GPT-5 also demonstrates an advantage over general search engines. Classical Google Search tends to prioritize popular content and sometimes overstates risks, classifying minor symptoms as severe conditions; GPT-5 offers more cautious guidance, often grounded in recent evidence. There is a subtler improvement as well: the model now focuses on practical next steps and suggests professional consultation when necessary, rather than assuming the role of a doctor.

More empathy and less overoptimism

OpenAI has incorporated safeguards to prevent GPT-5 from spreading misinformation. Health advice is now tailored to the user’s knowledge level, cultural context, and geographic region. Built-in limitations ensure the model refrains from responding to queries beyond its expertise or addressing ethically sensitive topics, prioritizing responsible guidance.

A recent Harvard Business Review study found that users often turn to ChatGPT for therapy or companionship. GPT-5 builds on this trend with more empathetic and scientifically grounded responses.

Unlike GPT-4, which sometimes offered overly optimistic or unrealistic advice, GPT-5 emphasizes professional consultation. It guides users to seek qualified support when necessary and provides practical strategies for discussing sensitive issues. This makes it a safer and more supportive tool for users experiencing anxiety, depression, or trauma.

GPT-5 is ready for AI agents that orchestrate patient care

GPT-5 makes a good impression from the very first chat: the answers feel more natural. The good news is that it is free to use, although conversation length is limited, and features such as the “think longer” option and document analysis are available to free users only once per day. Paid subscriptions, ChatGPT Plus (€23/month) and ChatGPT Pro (€229/month), remove these restrictions and add tools for extended research and in-depth analysis.

GPT-5 represents a significant advancement over GPT-4, functioning as an active conversational partner that proactively identifies potential issues and asks relevant follow-up questions—a feature that GPT-4 struggled to provide.

The model is more cautious in its medical reasoning, encourages professional consultation, and hallucinates far less often, a major step toward the safe use of AI in healthcare. Just three years after ChatGPT’s debut, GPT-5’s heightened sensitivity to mental health topics, improved accuracy, and multimodal capabilities amount to a remarkable leap forward, making it a more reliable, nuanced, and empathetic AI companion for both patients and clinicians.

This progress is reflected in its global adoption: ChatGPT now engages over 120 million daily users and 800 million weekly users, handling more than a billion prompts every day. With OpenAI planning to release a more open model in the coming months and preparing its first custom AI chips for deployment by 2026, GPT-5 is ready for the next step in genAI evolution: AI agents tailored to orchestrate patient journeys and care coordination.