AI Better Than Doctors At Diagnosing Patients

Microsoft, best known for its Windows desktop operating system, is expanding into the medical field. The company has developed an artificial intelligence (AI)-based medical program that can diagnose diseases four times more accurately than human doctors and at a lower cost, according to a study conducted by the tech giant.

With the Microsoft AI Diagnostic Orchestrator (MAI-DxO), the company has taken “a genuine step toward medical superintelligence,” Mustafa Suleyman, CEO of Microsoft AI, told Wired.

Suleyman and other AI experts who previously worked at Google developed the MAI-DxO. Suleyman worked as an executive at Google before moving on to head up Microsoft’s AI division.

The Microsoft research team wanted to test the MAI-DxO to determine whether the tool could accurately diagnose a patient’s illness, just like a human doctor would. The MAI-DxO interacts with other top AI models, such as OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, Meta’s Llama, and xAI’s Grok, in a manner similar to how several human experts collaborate.

The research involved using 304 real case studies from the New England Journal of Medicine. The cases were described in Microsoft AI’s blog post as “the most diagnostically complex and intellectually demanding in clinical medicine, often requiring multiple specialists and diagnostic tests to reach a definitive diagnosis.”

The team developed a test called the Sequential Diagnosis Benchmark. A language model broke down each case into a step-by-step process that a doctor would follow to reach a diagnosis. For example:

• The doctor asks questions and may order blood tests and a chest X-ray.

• The doctor reviews the tests and X-ray before feeling confident enough to diagnose the patient with pneumonia.
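The step-by-step process in the example above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Microsoft's benchmark code: the Case and Episode classes, the run_episode and toy_decide functions, the findings, and the cost figures are all hypothetical and were made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """Hypothetical case record: findings stay hidden until requested."""
    opening: str
    findings: dict[str, str]  # action name -> result revealed on request
    costs: dict[str, int]     # action name -> made-up cost units
    diagnosis: str            # reference diagnosis used for scoring


@dataclass
class Episode:
    """Tracks what has been revealed so far and the running cost."""
    revealed: dict[str, str] = field(default_factory=dict)
    total_cost: int = 0


def request(case: Case, episode: Episode, action: str) -> str:
    """Reveal one finding and add its cost to the running total."""
    result = case.findings.get(action, "not available")
    episode.revealed[action] = result
    episode.total_cost += case.costs.get(action, 0)
    return result


def run_episode(case, plan, decide):
    """Take actions one at a time; stop as soon as `decide` commits."""
    episode = Episode()
    for action in plan:
        request(case, episode, action)
        guess = decide(episode.revealed)
        if guess is not None:
            return guess, episode
    return None, episode


def toy_decide(revealed):
    """Toy stand-in for a doctor or model: commit only after the X-ray."""
    if "consolidation" in revealed.get("chest_xray", ""):
        return "pneumonia"
    return None


case = Case(
    opening="Adult with cough and fever",
    findings={
        "history": "productive cough for five days",
        "blood_test": "elevated white cell count",
        "chest_xray": "right lower lobe consolidation",
    },
    costs={"history": 0, "blood_test": 50, "chest_xray": 100},
    diagnosis="pneumonia",
)

guess, episode = run_episode(
    case, ["history", "blood_test", "chest_xray"], toy_decide
)
print(guess == case.diagnosis, episode.total_cost)
```

Scoring a run on both the final diagnosis and the accumulated cost mirrors the two things the study reports: accuracy and the expense of the tests ordered along the way.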

The study found that the AI system accurately diagnosed 80 percent of the cases. Microsoft researchers also recruited 21 practicing physicians from the United Kingdom and the United States, each with 5 to 20 years of clinical experience. The doctors were given the same tasks as the AI system and reached an average accuracy of 20 percent across the cases they completed.
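The "four times more accurately" figure follows directly from the two percentages reported above. The short calculation below is a sketch using only those rounded numbers; the variable names are illustrative, and because the doctors' figure covers only the cases they completed, the two averages may not be drawn from identical case sets.

```python
# Rounded accuracy figures as reported in this article, in percent.
ai_accuracy_pct = 80
physician_accuracy_pct = 20

# Relative comparison: how many times more accurate the AI system was.
ratio = ai_accuracy_pct / physician_accuracy_pct

# Absolute comparison: the gap in percentage points.
gap_points = ai_accuracy_pct - physician_accuracy_pct

print(f"{ratio:.0f}x as accurate, a gap of {gap_points} percentage points")
```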

The researchers also reported that the AI system not only diagnosed cases more accurately but also reduced costs by 20 percent compared with the doctors by choosing less expensive tests and procedures.

“This orchestration mechanism—multiple agents that work together in this chain-of-debate style—that’s what’s going to drive us closer to medical superintelligence,” Suleyman told Wired.

The company also believes that AI could help reduce healthcare costs, a significant problem currently facing the United States.

“Our model performs incredibly well, both getting to the diagnosis and getting to that diagnosis very cost-effectively,” Dominic King, a vice president at Microsoft who is involved with the project, told Wired.

AI is already widely used in certain areas of the healthcare industry in the United States, despite concerns about its use, such as bias from training data that’s skewed toward particular demographics.

In recent years, both Microsoft and Google have conducted research to show that large language models can accurately diagnose medical conditions using patient records. However, Microsoft’s new study differs from previous work in that it more accurately reflects how human doctors diagnose diseases by analyzing symptoms, ordering tests, and conducting further analysis until a diagnosis is determined.

Experts Weigh In On Microsoft AI Study

David Sontag, a scientist at MIT and cofounder of Layer Health, a startup that builds medical AI tools, called the study “quite exciting.”

Sontag told Wired that Microsoft AI’s work is significant because it more accurately reflects how doctors work and thoroughly addresses potential issues with the underlying methodology. “That’s what makes this paper strong,” Sontag said.

Eric Topol, a scientist at the Scripps Research Institute, described the study as “impressive” because it “tackles highly complex cases for diagnosis.” Showing, even in theory, that AI can reduce the cost of medical care is novel, Topol told Wired.

Both Topol and Sontag suggest that the next step Microsoft should take to validate its AI system before widespread use is to test it in a clinical trial and compare its results with those of real doctors treating real patients. “Then you can get a very rigorous evaluation of cost,” Sontag said.

Limitations of the Study Highlighted

Although the results are impressive, Sontag says they should be treated with caution because the doctors in the study were instructed not to use any additional tools to help with their diagnosis, which may not reflect how they operate in real-life settings.

Sontag also questioned whether the AI system would significantly reduce costs in practice. For example, the doctors in the study may have considered factors that the AI could not, such as a patient’s tolerance for a procedure or the availability of a specific medical instrument.

Additionally, the doctors did not have access to their colleagues, textbooks, or even generative AI, which they may use in their everyday clinical practice. “This was done to enable a fair comparison to raw human performance,” according to the Microsoft AI blog post about the study.

While the MAI-DxO exceeded Microsoft researchers’ expectations in tackling the most complex diagnostic challenges, they acknowledged that further testing is needed to determine how it performs on everyday routine cases.

The researchers also said that there are still challenges that need to be overcome before generative AI can be “safely and responsibly” used in healthcare. This is why Microsoft AI officials say they plan to partner with top health organizations to thoroughly “test and validate” the methods before implementing them on a broader scale.

“What you’ll see over the next couple of years is us doing more and more work proving these systems out in the real world,” Suleyman told Wired.

In addition to evidence from real clinical settings, Microsoft AI leaders say the technology needs “proper governance and regulatory frameworks to guarantee reliability, safety, and effectiveness.” Nonetheless, they remain enthusiastic about the study results.

“For us, this is just the first step,” Microsoft AI said in its blog post. “We’re energized by the opportunities ahead.”

Source Links:

https://www.wired.com/story/microsoft-medical-superintelligence-diagnosis/
https://microsoft.ai/new/the-path-to-medical-superintelligence/