Prof. Stern: We cannot regulate AI with 50-year-old rules

Europe is investing heavily in AI, health data infrastructure and digital transformation. Yet according to Prof. Ariel Dora Stern, Alexander von Humboldt Professor for Digital Health, Economics and Policy at the Hasso Plattner Institute, healthcare regulation is still built on assumptions from a different technological era.

In this interview, Stern discusses why access to health data remains a challenge, why AI cannot be governed using frameworks designed half a century ago, and why healthcare risks becoming what she calls a “broken digital system” if incentives fail to change.

Why Europe?

Question: In 2024, you moved from Harvard Business School to the Hasso Plattner Institute in Germany – at a time when Europe is basically questioning whether it can still compete in digital innovation. Research in Europe is considered difficult due to restrictive regulations and data protection requirements, and Europe still lags behind China and the USA in digital innovation. With all my love for Europe: why?

Prof. Ariel Dora Stern: It was equal parts personal and professional.

On the personal side, I used to live in Berlin. My kids were born here. Raising a family in a place that you genuinely believe is a better place to live was definitely part of the decision.

On the professional side, I knew what I was getting into. I spent a sabbatical year working in Berlin with the Health Innovation Hub of the Ministry of Health in 2020 and 2021. I really had a front-row seat to the first wave of digital transformation in the German healthcare system and in German health policy. I experienced firsthand the momentum we have been seeing here in Germany since late 2019.

At the same time, for me, the experience was also about connecting the dots with what had happened earlier in the United States. That meant looking at examples like the rollout of electronic health records in the US and asking what they did better, what is actually working, and, perhaps even more importantly, what is not working in a system that began its digitalization journey roughly a decade earlier.

Just as a benchmark, the HITECH Act in the United States was introduced in February 2009 as part of Barack Obama’s economic recovery package. The German equivalent, the Krankenhauszukunftsgesetz (Hospital Future Act), was passed in October 2020. There were over eleven years between the two. So yes, Europe is in many ways more than a decade behind, but that also means there is an enormous amount to learn from.

During my time with the Health Innovation Hub, I learned a ton. It was really a crash course in German health policy and digital health policymaking. But it also created opportunities to bring perspectives and lessons from the US that perhaps were not yet as widely discussed in Germany.

Then I received the Alexander von Humboldt Professorship, a unique opportunity with outstanding resources to build a lab and team here. I had already been a visiting professor at the institute, so in the end it simply felt like too interesting and too entrepreneurial an opportunity to pass up.

The health data challenge

Question: Is there something you miss about the US when it comes to health and research?

Prof. Ariel Dora Stern: Absolutely. And let me be careful here, because whenever I am asked some version of the question, “Which system is better, the US or Europe?”, my standard answer is always the same: both systems are messed up, just in different ways. And I think that is important to remember.

From a research perspective, the thing I miss most in Europe is actually what has brought me back into conversations with policymakers and writing more about policy again: access to health data for research.

In the United States, we have projects looking at care for patients with chronic conditions, including digitally enabled care pathways such as telemonitoring and remote patient monitoring. Of course, there are safeguards in place, and we have to be careful, but ultimately, we are able to work with claims data from the entire Medicare population. Medicare essentially covers all Americans over 65, which means we can analyze the use of these services and their downstream effects across millions and millions of people.

Now, in Germany, we finally have the promise of doing similar work through the new Health Data Lab, and there are many other exciting developments underway. But frankly, one of my biggest frustrations is that we are still building the kind of research infrastructure I was already using a decade ago in the US. There, the infrastructure simply already existed.

That does not mean the US has solved everything when it comes to health data access. Many of the conversations we are now having, including at the Digital Health Innovation Forum at the Hasso Plattner Institute, are also happening in the United States. There is still a real opportunity to shape what this infrastructure and access should look like going forward, and to make it more useful and more researcher-friendly.

But right now, it is still very much a struggle.

Why regulation no longer fits AI

Question: Let’s leave aside the debate about which healthcare system is better, the European or the US. But we have to talk about regulation. This is also the topic of one of your recent studies, in which you examined the challenges of commercialization and deployment in digital health technologies. What worries you more right now: that Europe regulates too aggressively, or that Silicon Valley companies are entering healthcare with too little understanding of medicine?

Prof. Ariel Dora Stern: There is a long list of things that worry me about regulation. But maybe the biggest issue is this: the problem is not really overregulation or underregulation. I could easily give examples of both, and companies have been complaining about both for years.

The real challenge is that our regulatory system was designed for a completely different era of technology. Most algorithms today legally qualify as medical devices, but medical device regulation was never built for complex AI systems that can evolve, adapt, and perform multiple tasks at once. As we move from individual algorithms to large-scale generative systems, the gap between the technologies we are developing and the rules designed to govern them is becoming impossible to ignore.

This could be about large language models, but it does not have to be. It could also involve image-based models or other AI systems that support diagnosis, treatment recommendations, or broader care guidance.

Regulation works reasonably well when an algorithm is designed for one relatively narrow clinical purpose, because that is how we have traditionally understood medical technologies and medical devices. But we do not yet have a good framework for thinking about generative systems that can do many things at once, perhaps almost anything. They may perform exceptionally well in some areas and much less reliably in others.

So the question becomes: how do we responsibly put this kind of software into the hands of doctors, or even patients, knowing all of that?

This is something I first started thinking about in a short article late last year and then explored further in policy work with colleagues, especially in the US. What would sensible regulation for these systems actually look like?

Could we imagine a world in which we regulate AI systems more like medical professionals than like traditional medical devices? Because with medical licensure, on both sides of the Atlantic, we actually have a framework that works. We know what training looks like. We know what continuing education looks like. We have clear moments in time where performance is evaluated.

Doctors take licensing exams before they can practice medicine. They are required to continue their education throughout their careers. In Germany, for example, physicians must continuously collect Fortbildungspunkte (continuing education credits).

So, is there a future in which we ask something similar of medical algorithms? A set of structures in which systems are trained and tested under defined conditions, evaluated and re-evaluated regularly, and expected to keep pace with the evolving state of medicine?

It is not a perfect one-to-one analogy, but there are important lessons we could take from medical licensure and apply to the future scope of practice for algorithms.

This is something medical professional societies, whether the American Medical Association or the German Medical Association, will inevitably have to engage with. But in the near term, we are still constrained by regulatory frameworks built for a different technological reality. And that is really where the tension comes from: the growing mismatch between what we now want these systems to do and the regulatory structures we inherited from the last century.

I will give you one more example. The legal definition of a medical device in the United States comes from legislation passed by Congress in May 1976. In other words, the definition of a medical device is now fifty years old.

So when people say, “This framework no longer works,” well, of course, it does not fully work. It was written with the best intentions at a time when software was not part of medical technology, let alone algorithms or generative AI systems. That is really the core challenge we are facing today.

The OpenEvidence dilemma

Question: And there is a big mismatch between regulation and the technologies that we already have on the market. A symbol of this conflict is the case of OpenEvidence, a system used daily by millions of doctors in the US. But in Europe, it has not been available since April 2026. So is this a symbol of overregulation?

Prof. Ariel Dora Stern: Through our partnership at the Hasso Plattner Institute, and also through my faculty role at Mount Sinai School of Medicine in New York, where they are currently rolling out OpenEvidence to doctors, trainees, nurses, and nurse practitioners, I have been thinking about this a lot.

I do not think this is the first time a US company has been somewhat intimidated by European regulation. And I think, out of caution, OpenEvidence essentially decided: “We need to be careful here.”

At the same time, they have enormous room for growth and many opportunities in other markets, so waiting out the European situation is probably a strategic decision as well. They simply do not want to create unnecessary regulatory complications during a period of rapid expansion.

And by the way, we do have alternatives in Europe.

But let me say this clearly: we should absolutely celebrate the use of evidence-based tools. For me, the fact that a system like OpenEvidence is built around evidence-based medicine is a much more promising approach to using large language models in clinical decision-making than simply imagining doctors turning to the free version of any major LLM.

Because that is actually the real risk. In the absence of access to better and more evidence-based tools, we are effectively pushing clinicians in Europe toward the shadow use of AI, and that is without any training in prompt engineering or in how to use LLMs most effectively.

And honestly, these stories are no longer even surprising. Doctors are printing patient records, blacking out names with a marker, taking photos with their smartphones, and uploading them into free summarization tools. This is not only highly inefficient but also a far less desirable way for clinicians to work than using properly designed, evidence-based clinical AI systems.

So despite all of this, I remain optimistic. I think putting evidence-based tools into the hands of clinicians, and not only doctors but healthcare professionals more broadly, is absolutely part of the future we should be building toward.

And this is exactly why we urgently need a better regulatory framework.

Continuous evaluation instead of one-time approval

Question: One of the most provocative ideas in your paper is the notion of adaptive post-market evidence loops. Are we moving toward a future in which medical technologies are no longer approved once and for all, but are constantly re-evaluated in real time? Recently, there has also been debate about whether we need a new approach to generating evidence for generative AI solutions. Nature Medicine published an article in April 2026 arguing that we need more evidence for these solutions. Then OpenAI introduced ChatGPT for Clinicians and HealthBench, which lets us measure model performance in real time.

Prof. Ariel Dora Stern: Let me start by saying that this idea of one-time approval was already becoming misaligned in the software era, even before the arrival of AI and generative AI models.

Together with my Harvard colleague Will Gordon, I wrote an article in 2019 about software-driven medical devices and the unique challenges that emerge the moment you start putting software into medical technologies, not only from a regulatory perspective, but also from the perspective of reimbursement and payers. And there are many challenges there. One of the biggest is responsibility.

If I, as a manufacturer, give you a hip implant today, and it is a good implant, but my company goes bankrupt next year, that is probably never going to become a problem for you. You have the implant, it works, and you do not really need to think about it anymore.

But if I give you an implantable, software-driven medical device, such as a pacemaker, and the manufacturer disappears, then it suddenly matters a great deal. What happens if the pacemaker needs firmware updates? What happens if somebody discovers a safety issue? The company is no longer there to maintain the software.

Questions like these already existed long before AI. AI simply amplifies them and adds entirely new layers of complexity.

In the past, in the context of relatively simple digital medical technologies such as digital therapeutics or digital health applications, I argued for the concept of dynamic health technology assessment. In other words, you launch a product based on strong evidence of safety and effectiveness, but then continue to evaluate its performance and value over time.

This matters because we also want to create incentives for manufacturers to improve their products. If a company can improve safety or improve outcomes for patients with chronic diseases, we should absolutely want that to happen. So if we establish regular moments where products can be re-evaluated, manufacturers are incentivized to continue improving performance and patient outcomes. Reimbursement models can then evolve alongside the product itself.

That was already difficult before AI. But now, as we move toward much more complex systems, we have to think about continuous performance testing and much more sophisticated benchmarking. And to be clear, benchmarks themselves are also a problem. Right now, everybody is creating their own benchmarks.

Then there is another issue, which you already mentioned. If you use something like the US medical licensing exam as a benchmark, as many people have pointed out, it is not even obvious that this is the right way to evaluate these systems in the first place.

So while I strongly support the development of better benchmarks, we also need to standardize the benchmarks themselves. We cannot meaningfully compare the performance of different systems if every company is using a completely different evaluation framework. This is just another reason why we need fit-for-purpose regulation. Simply continuing to do things the way we have for decades is no longer a viable strategy. Honestly, that may be the core takeaway from this entire conversation: we cannot continue using frameworks designed fifty years ago and expect them to work in the age of AI.

AI is improving workflows, not transforming healthcare

Question: Let us move from regulation and policy to real-life clinical applications. Many new AI systems are emerging now, such as ambient scribes and predictive systems. Which recent advances in AI in medicine surprise or impress you the most?

Prof. Ariel Dora Stern: What has surprised me most is actually the proliferation of administrative solutions. I am on the advisory board of the Peterson Health Technology Institute in the United States, and they recently published a report that I was not personally involved in but that strongly reinforces this message.

They looked at the use of AI tools for administrative purposes: prior authorization of medical claims, note-taking, billing, report generation, and similar tasks.

One of my favorite takeaways from the report is that ambient AI systems do appear to have real benefits. They can reduce at least some of the so-called “pajama time” for doctors, meaning the hours physicians spend writing notes after work. Doctors also report somewhat lower levels of burnout. But the deeper truth is that when you layer AI onto a healthcare system with poor incentives, you simply create a more technologically advanced system with the same poor incentives. And that is essentially what the report concludes.

It reminds me of a phrase that smart healthcare experts often repeat, some version of: if you layer digital technology onto a broken system, you get a broken digital system.

And honestly, that is deeply true.

A lot of what we are seeing right now is simply the digitization of a healthcare system that already contains many perverse incentives. And maybe that is also what disappoints me slightly compared to what I hoped we would see five years ago. We still have not seen truly transformative innovation in healthcare delivery itself.

What we have today is, in many ways, actually pretty boring. We were already writing clinical notes. We were already doing prior authorizations. We were already billing insurance companies. We were already making diagnoses. Now we are simply doing some of these things faster and more efficiently.

But we are not fundamentally redesigning healthcare delivery. We are mostly layering technology onto existing workflows.

What still makes me optimistic, however, is the opportunity to ask a much bigger question: what would a care pathway look like if we started from zero? If we designed it completely from scratch, greenfield style, using today’s technologies and today’s capabilities?

Question: But the technology can already predict diseases, while no health system in the world really pays for prevention.

Prof. Ariel Dora Stern: And this brings us back to one of the most fundamental ideas in health economics, something thought leaders in the field have been arguing since long before I even started my studies: we should be paying for outcomes and value, not simply for doing more things.

Historically, healthcare systems have paid for activities, procedures, and interventions because measuring outcomes was incredibly difficult. Now, for the first time, we actually have much better data and far better ways of measuring outcomes, and we are starting to see the limitations of the old model much more clearly.

But we still need many, many more experiments with payment models and incentives.

Will AI replace doctors?

Question: When we hear pitches from AI companies and startups, they always start with: “We do not want to replace doctors.” But what is the honest answer to the question: Will AI replace doctors?

Prof. Ariel Dora Stern: I believe that people currently in medical school should not worry about job security. But I also believe that what their jobs will look like in ten years may be very different from what those jobs look like today.

Healthcare has always been, and still is, perhaps five or even ten years behind the technology industry when it comes to adopting new technologies. In some ways, that actually helps manage transitions. It acts as a kind of safety mechanism. But at the same time, it also makes the system incredibly frustrating. What I think we really need to focus on now, and this is something we are actively researching, is the question of clinical deskilling.

If I received a serious diagnosis today, I would absolutely want my doctor to have access to the best information available at their fingertips. But at the same time, we are already seeing signs that overreliance on AI systems may gradually deskill clinicians.

You already see examples of this. Radiologists, for instance, may stop paying attention to certain findings unless the system explicitly points them out. Clinicians may start accepting AI-generated clinical notes without stopping to ask themselves whether that is actually how they would have interpreted or documented the case.

So for me, the next important phase is really about understanding human-AI interactions in medicine.

We are still quite far from algorithms replacing humans entirely. And frankly, I would not want to be treated by an algorithm trained only on historical medical data, even if it were a world-class model trained on all available medical information.

At the same time, I do think we can probably do better than the current state of medicine.

But getting there requires understanding something much more nuanced: where humans are better, where computers are better, and how the two can actually complement one another in clinical settings. And honestly, we still have not studied these dynamics deeply enough.

Understanding how clinicians improve when augmented by algorithms is something we are only beginning to learn. The doctors of the future will almost certainly have very different jobs from doctors today. But I do not think we yet have enough evidence to confidently describe what that future should look like. We should be careful not to pretend that we already know.

We still do not have enough data to build a truly evidence-based vision of the future of medicine and of what ultimately leads to the best possible patient care. And that, in the end, should always remain the central question.

Human-AI collaboration in medicine

Question: Robert Wachter, in his recent book A Giant Leap, questions whether we always need to keep doctors in the loop of decision-making. What if keeping humans in the loop could actually reduce the algorithm's performance?

Prof. Ariel Dora Stern: What we absolutely have to do is think about these questions in a highly context-specific way.

One of my frustrations with the broader AI debate in medicine is that people often talk about healthcare as if it were one single thing. It is not. Medicine comprises many distinct environments, workflows, and decision-making settings.

And I actually agree that there are probably situations where we can, and perhaps should, allow algorithms to operate more independently.

Earlier, we talked about licensing systems. So, imagine we have AI systems licensed for relatively narrow and clearly defined tasks where we know they perform extremely well. In those situations, maybe we do not always need a human constantly supervising every single decision.

I think “human in the loop” has become, in many ways, a politically comforting phrase. But it also hides an enormous amount of variation between different medical contexts.

So yes, it sounds reassuring to say we always want humans involved, but the more important question is: in which settings, for which tasks, and where does human involvement actually improve outcomes?

And this is exactly where expertise matters.

We absolutely need doctors, but we also need people who deeply understand how healthcare delivery and clinical workflows actually function today. We need to understand where human time genuinely adds value.

Simply adding technology for the sake of technology is wasteful. But in the same way, insisting on human involvement purely for symbolic reasons can also become wasteful.

What we ultimately need are systems that are truly fit for purpose.

Prevention and the future of care

Question: Beyond technologies for doctors and clinicians, there is also huge progress in consumer technologies. Do you think that in ten years, let’s say in 2036, Apple – with its smartwatch and health-related features – will do more for prevention than healthcare systems do?

Prof. Ariel Dora Stern: I hope they will be working together more than competing. And honestly, I hope everyone is doing more prevention. I do not really care whether it is Apple or some combination of big tech and smaller companies. If they can meaningfully improve prevention and do it well, that would address an area where healthcare systems have historically failed quite badly.

Big tech companies are extremely good at growth hacking. If they can growth-hack their way into better prevention, then frankly, please go ahead, because healthcare systems have not exactly been very successful at this.

So I take a fairly pragmatic view.

Healthcare systems on both sides of the Atlantic have struggled to prioritize preventive medicine under current incentive structures. If technology companies, whether large or small, can genuinely help advance prevention, I would absolutely welcome it.

At the same time, we still need much better incentives. Prevention is still not properly rewarded within formal healthcare delivery systems.

And I think this is one reason why we are now seeing this entire wave, particularly in the United States, of wellness culture and increasingly extreme prevention-related trends. In many ways, it reflects a broader feeling that the healthcare system is not proactively helping people manage their health.

So what happens when formal healthcare systems fail to provide the evidence-based preventive tools people are looking for? You end up with a marketplace full of snake oil: wild marketing for non-evidence-based supplements, questionable therapies, and all kinds of dubious health optimization products.

In some ways, it is similar to the shadow use of AI we discussed earlier. When formal systems fail to provide safe and effective tools, people start improvising. And while a lot of that is deeply problematic, it also reflects something important: these trends reveal needs that healthcare systems are currently failing to meet.

Rapid fire: healthcare in 2036

Question: It’s time for some rapid-fire questions. Are you ready?

Prof. Ariel Dora Stern: Yes, let’s go.

Question: Which matters more right now: better algorithms or better healthcare data?

Prof. Ariel Dora Stern: Better data.

Question: Will hospitals eventually employ more AI engineers than administrative staff?

Prof. Ariel Dora Stern: I hope so.

Question: What is more dangerous in medicine today: AI hallucinations or biased data?

Prof. Ariel Dora Stern: Biased data.

Question: Who currently understands healthcare AI best: tech giants or ministries of health?

Prof. Ariel Dora Stern: Academic researchers (laughs).

Question: Ten years from now, will AI reduce healthcare costs or simply create more expensive medicine?

Prof. Ariel Dora Stern: If we do not change incentives, it will create more expensive medicine.

Question: In 2036, will we have keyboardless doctors’ offices?

Prof. Ariel Dora Stern: I think we will probably have some version of that. Maybe not with holograms, I do not know what the interface will look like, but I do think we will have much better interfaces than keyboards.

Question: In 2036, will patients who want to see a doctor first have to go through some AI triage to determine whether they actually need a human doctor?

Prof. Ariel Dora Stern: Yes, I believe there will be much more triage.

Question: It is the first day of the 22nd century. Has AI already replaced some doctors?

Prof. Ariel Dora Stern: Yes.

Question: What is more dangerous: adopting AI too fast or too slowly?

Prof. Ariel Dora Stern: Too slow.

Question: AI that hallucinates or no doctor at all? For example, if the system does everything possible to prevent patients from seeing a doctor?

Prof. Ariel Dora Stern: Probably no doctor at all.

Question: And the last question: will the patient journey in ten years be completely different from today, or mostly the same with just some additional technologies?

Prof. Ariel Dora Stern: Mostly the same, but significantly different in certain areas, especially chronic care.