Mayank Daswani

Member of Technical Staff, Microsoft AI

At Microsoft AI, I co-developed MAI-DxO, a multi-agent diagnostic system that matches specialist-level accuracy on complex NEJM cases. Across six years at DeepMind Health, Google Health Research, and Gemini, I took mammography screening AI through a regulatory submission, shipped post-training for factuality on Gemini 2.5, and led research showing that consumer hardware — a smartphone camera, a cheap radar — can replace clinical-grade cardiovascular sensing.

I hold a PhD in reinforcement learning from ANU under Marcus Hutter, where I worked on principled state abstraction and learned forgetting.

Full publication list on Google Scholar. Reach me on LinkedIn or at mayankdaswani@gmail.com.

Microsoft AI

Member of Technical Staff

I work on health AI within Microsoft AI, focused on using large language models for clinical applications. My most notable project is MAI-DxO (MAI Diagnostic Orchestrator), a model-agnostic orchestrator that simulates a panel of physicians to propose differential diagnoses and strategically select high-value, cost-effective diagnostic tests. When paired with OpenAI’s o3 model, MAI-DxO achieves 80% diagnostic accuracy on complex NEJM clinicopathological conference cases, four times the 20% average achieved by human clinicians on the same cases.

This work was published as Sequential Diagnosis with Language Models (arXiv, 2025) and introduced SDBench, an interactive benchmark for evaluating AI diagnostic agents through realistic sequential clinical encounters drawn from 304 consecutive NEJM cases.

Google / Google DeepMind

Senior Software Engineer / Senior Researcher

Nov 2024 – May 2025 · Gemini, Google DeepMind

For my last six months at Google, I moved to the Gemini team, working on post-training for factuality for Gemini 2.5. Gemini 2.5 Pro achieves state-of-the-art performance on factuality benchmarks including SimpleQA and FACTS Grounding.

Jul 2019 – Nov 2024 · Google Health Research (originally joined as DeepMind Health)

I worked across medical imaging and physiological sensing, adapting ResNet-family architectures (2D, 1D, and a 2D+1D hybrid) to each modality. Key projects:

  • Mammography screening AI: Helped take a mammography system through a regulatory submission — built out-of-distribution classifiers and led the statistical analysis plan.
  • PPG-based cardiovascular risk prediction: Showed that PPG from a fingertip device can predict 10-year cardiovascular risk with performance non-inferior to traditional office-based screening. Published in PLOS Global Public Health (2024).
  • Radar-based heart rate monitoring: First cross-radar transfer learning (FMCW → IR-UWB) for contactless heart rate monitoring — sub-1 bpm error on FMCW and 25% error reduction on UWB. Published on arXiv (2025).

QuintessenceLabs

Software Developer, Technical Lead

I worked on qCrypt, the flagship enterprise key management product, backed by a quantum random number generator (QRNG) and hardware security module (HSM). Notable projects include designing a multi-master replication system, third-party HSM integration, and backup/restore improvements.

Australian National University

PhD Computer Science

PhD supervised by Marcus Hutter and Peter Sunehag. Thesis: Generic Reinforcement Learning beyond Small MDPs.

I worked on Feature Reinforcement Learning — automatically compressing large, partially observable environments into tractable MDPs via learned state abstractions. The focus was empirical, aimed at making general RL practical.

Australian National University

Bachelor of Computer Science (Honours), University Medal

Custom program spanning Computer Science and Mathematics at ANU. Honours thesis: “An Empirical Evaluation of ΦMDP Agents” (reinforcement learning). Awarded the University Medal.

Plots Unlock Time-Series Understanding in Multimodal Models

arXiv · Google Research Blog

Demonstrates that plotting health time-series as images for multimodal models dramatically outperforms text-based representations, by up to 150% on consumer health tasks including fall detection, activity recognition, and readiness, with a 90% reduction in API cost. No additional training is required; standard multimodal models benefit directly.

Sequential Diagnosis with Language Models

arXiv

Introduces SDBench, an interactive benchmark for evaluating AI diagnostic agents through 304 NEJM clinicopathological cases. Also presents MAI-DxO, a diagnostic orchestrator that achieves 80% accuracy on complex clinical cases, four times the human clinician baseline, while managing cost-effectiveness via strategic test ordering.

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

arXiv

Technical report introducing the Gemini 2.X model family (Gemini 2.5 Pro and Flash). Gemini 2.5 Pro achieves state-of-the-art performance on factuality benchmarks including SimpleQA and FACTS Grounding. I contributed to the factuality work during my last six months at Google.

UWB Radar-based Heart Rate Monitoring: A Transfer Learning Approach

arXiv

First demonstration of transfer learning between FMCW and IR-UWB radar systems for vital sign monitoring. Using a novel 2D+1D ResNet architecture, the approach achieves an MAE of 0.85 bpm for heart rate with FMCW radar and a 25% MAE reduction on a small IR-UWB dataset via transfer learning, enabling accurate contactless heart rate monitoring via consumer electronics.

Predicting Cardiovascular Disease Risk using Photoplethysmography and Deep Learning

PLOS Global Public Health

Demonstrates that a deep learning model using PPG signals from a fingertip device, combined with age, sex, and smoking status, can predict 10-year risk of major adverse cardiovascular events (MACE) with a C-statistic of 71.1%, non-inferior to traditional office-based screening that requires blood pressure, BMI, and cholesterol measurements. A proof-of-concept for accessible CVD screening in resource-limited settings.

I’m an avid indoor boulderer. I also make electronic music — you can find some of it on SoundCloud.