Study shows that LLMs act like medical devices in clinical care

Picture this: you're in a medical emergency, no doctor in sight, and someone nearby whips out their phone and asks a chatbot for help. That chatbot might just answer like a seasoned physician. Sound far-fetched? It's not. A new study out of Penn by Dr. Gary Weissman shows that large language models (LLMs) like GPT-4 and Llama-3 are already dishing out advice that walks and talks like clinical decision-making — even when explicitly told not to.

"These models are already being used in ways they weren't built for," Weissman says. And if they're acting like medical devices, shouldn't they be regulated like them?

The U.S. FDA hasn't caught up yet. LLMs technically aren't approved for clinical use, but they're slipping into that role anyway, and regulators have yet to step in.

How does it work?

At the heart of Weissman's study is a clever simulation. Think of it as a medical improv exercise, but with machines instead of actors.

  • Researchers fed GPT-4 and Llama-3 evolving clinical scenarios, asking for advice as a case progressed.
  • They repeated each scenario five times to test the models' consistency (or lack thereof); a rough sketch of that setup follows just after this list.
  • What they found was startling: even with guardrails, the AIs often gave responses that looked an awful lot like medical recommendations.
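
Here's a minimal sketch of what that repeated-prompt design might look like in code. It is not the authors' actual harness: it assumes the OpenAI Python SDK (v1+) with an API key in the environment, and the system prompt, scenario stages, and `run_scenario` helper are invented placeholders standing in for the study's real prompts.

```python
# Sketch of a repeated-prompt consistency test (illustrative, not the study's code).
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "You are a general-purpose assistant. You are not a medical professional."

# Hypothetical stages of one unfolding emergency; the paper's actual prompts differ.
SCENARIO_STAGES = [
    "A man at the next table clutches his chest and collapses. What should I do?",
    "He is unresponsive and I can't feel a pulse. Now what?",
    "Paramedics say they are ten minutes away. Anything else I should try?",
]

def run_scenario(model: str = "gpt-4") -> list[str]:
    """Walk the model through one scenario stage by stage, keeping the chat history."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    replies = []
    for stage in SCENARIO_STAGES:
        messages.append({"role": "user", "content": stage})
        response = client.chat.completions.create(model=model, messages=messages)
        answer = response.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        replies.append(answer)
    return replies

# Rerun the identical scenario five times, as the study did, to expose run-to-run drift.
runs = [run_scenario() for _ in range(5)]
```

The same loop could, in principle, be pointed at a Llama-3 deployment that exposes a compatible chat API.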

That's where things get tricky. Traditional medical software is deterministic: it behaves the same way every time. LLMs are stochastic, which is a fancy way of saying their answers vary from run to run. Ask the same question ten times and you might get ten different answers, like asking ten teenagers what they want for dinner.
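
If you want to see that variability for yourself, a toy check is to send one identical question many times and count how many distinct answers come back; a deterministic program would always return one. The sketch below assumes the same SDK as above, the question and crude normalization are placeholders, and this is not how the study judged the responses.

```python
# Toy illustration of stochastic output, not the study's evaluation method.
from openai import OpenAI

client = OpenAI()
QUESTION = "The person next to me just collapsed and isn't breathing. What do I do?"

def ask_once(model: str = "gpt-4") -> str:
    """Send the identical single-turn question and return the model's reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
    )
    return response.choices[0].message.content

def normalize(text: str) -> str:
    """Collapse case and whitespace so trivially different wordings compare equal."""
    return " ".join(text.lower().split())

answers = [ask_once() for _ in range(10)]
unique = {normalize(a) for a in answers}
print(f"{len(unique)} distinct answers from {len(answers)} identical prompts")
# Deterministic software would print 1 here; an LLM usually won't.
```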

"In some cases, they recommended CPR and calling 911 — great advice for a bystander. But in other cases, they suggested inserting IVs or giving oxygen, which is only appropriate if you're a trained clinician," Weissman noted.

So the output isn't just clinical — it's contextually clinical. And that means it's entering the realm of medical devices, whether the FDA likes it or not.

Why does it matter?

If you've ever Googled a symptom at 2 a.m., you know how tempting it is to trust the internet over your instincts. Now imagine an AI that sounds like a doctor and acts like a doctor, but has no medical training, no accountability, and no medical license.

That's the danger.

  • These AIs could save lives in emergencies — if used right.
  • But they could also cause harm by giving advice they're not equipped to give.
  • Worse, they don't know the difference between a panicked bystander and a board-certified ER doc.

"This tech is like a Swiss Army knife with no instruction manual," Weissman says. "We need to define what's safe, what's allowed, and who gets to use it."

That means regulators need to draw some hard lines. Quickly.

The context

The FDA still regulates AI tools under a framework built for old-school devices: pacemakers, thermometers, you name it. But LLMs don't play by the same rules. They don't just map fixed inputs to fixed outputs; they generate new answers on the fly.

Here's what Weissman thinks should change:

  • Clamp down on "off-label" AI use. Right now, LLMs will answer anything you throw at them, even if it's outside their safe zone.
  • Create new pathways for "general-purpose" AI tools. If an AI can help in emergencies, maybe it deserves its own category — not a square peg forced into a round regulatory hole.
  • Differentiate between users. A model used by a trained physician should be regulated differently than one used by a teenager with a smartphone and a Wi-Fi signal.

In short: the FDA's playbook is outdated. LLMs are evolving faster than bureaucracy can blink. And that puts patients — and the promise of AI in health care — at risk.

Still, Weissman is hopeful. "These technologies are here to stay. If we think smart, act fast, and regulate wisely, we can harness them for good."

But if we don't? Well, don't say the machines didn't warn us.
