The integration of visual capabilities into AI-powered medical scribes may significantly improve the accuracy of clinical documentation while reducing administrative burden for healthcare professionals. That is the conclusion of a study by researchers from Flinders University.
AI scribes are increasingly used to record and transcribe patient consultations in real time, helping clinicians reduce time spent on documentation. However, current systems rely primarily on audio input, which may limit their ability to capture important clinical details.
Adding visual context to consultations
The research team investigated whether combining audio with visual data could enhance documentation quality. Using a setup that combined Google Gemini with Ray-Ban Meta smart glasses, they developed a vision-enabled AI scribe capable of analyzing both spoken and visual information during consultations. Findings were published in npj Digital Medicine.
“AI scribes are already helping clinicians by listening to consultations, but health care involves far more than spoken words,” says Bradley Menz, academic pharmacist and lead author of the study. “A lot of clinically important information is visual. Important visual cues during consultations include patients' medicine containers, prescriptions and devices, as well as their body language. When an AI system can use both what it hears and what it sees in these consultations, it captures more of the details that matter for patient care.”
Higher accuracy in medication data
In the study, 10 clinical pharmacists conducted 110 simulated medication-history interviews involving more than 100 different medicine containers, including tablets, capsules, injections and creams. The consultations were recorded using smart glasses, after which the AI system processed the data.
The results show a clear improvement: the vision-enabled AI scribe achieved 98% accuracy, compared with 81% for the same system using audio alone, a gain of 17 percentage points.
A notable improvement was seen in capturing medication strength and form, critical factors for safe dosing. With video input, the system correctly identified this information in 97% of cases, compared with just 28% using audio-only data.
Support tool for AI-assisted documentation
According to the researchers, the technology is intended to support clinicians rather than replace their judgment. “This is an augmented tool, not a replacement for clinical judgment,” says Menz. “The clinician still needs to review and sign off the document. The AI scribe can contain a verification step, take screenshots of medication packages, and generate a full spoken transcript, giving the health professional a much stronger basis for checking what the AI has produced.”
Senior author Ashley Hopkins notes that AI scribes are gaining traction because they help reduce documentation workload and free up time for patient care. The addition of visual capabilities could represent the next phase in their development. “These findings suggest that the next step, when the scribe can see as well as hear, produces a more accurate and complete draft,” says Hopkins. “This means less time editing AI documentation and even more time focusing on patient care.”
The researchers emphasize that broader implementation will require careful consideration of privacy, consent, data security and integration into clinical workflows. Human oversight remains essential, and further research is needed to validate the technology in real-world healthcare settings.