Our progress on generative AI in health
Last year at Google Health’s Check Up event, we introduced Med-PaLM 2, our large language model (LLM) fine-tuned for healthcare. Since introducing that research, the model has become available to a set of global customer and partner organizations that are building solutions for a range of uses — including streamlining nurse handoffs and supporting clinicians’ documentation. At the end of last year, we introduced MedLM, a family of foundation models for healthcare built on Med-PaLM 2, and made it more broadly available through Google Cloud’s Vertex AI platform.
Since then, our work on generative AI for healthcare has progressed — from the new ways we’re training our health AI models to our latest research on applying AI to the healthcare industry.
New modalities in models for healthcare
Medicine is a multimodal discipline; it’s made up of different types of information stored across formats — like radiology images, lab results, genomics data, environmental context and more. To get a fuller understanding of a person’s health, we need to build technology that understands all of this information.
We’re bringing new capabilities to our models with the hope of making generative AI more helpful to healthcare organizations and people’s health. We just introduced MedLM for Chest X-ray, which has the potential to transform radiology workflows by assisting with the classification of chest X-rays for a variety of use cases. We’re starting with chest X-rays because they are critical in detecting lung and heart conditions. MedLM for Chest X-ray is now available to trusted testers in an experimental preview on Google Cloud.
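To make this concrete, here is a minimal sketch of what querying a deployed chest X-ray classification model on Vertex AI could look like. The project ID, endpoint ID, request payload and output format below are all hypothetical placeholders; the actual MedLM for Chest X-ray interface is available only to trusted testers.

```python
# Hypothetical sketch: querying a deployed chest X-ray classifier on Vertex AI.
# The endpoint ID, payload schema and label names are illustrative, not the
# actual MedLM for Chest X-ray API (which is in experimental preview).
import base64

from google.cloud import aiplatform

aiplatform.init(project="my-health-project", location="us-central1")  # hypothetical project

# Hypothetical ID of an already-deployed classification endpoint.
endpoint = aiplatform.Endpoint(endpoint_name="1234567890")

with open("chest_xray.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Assumed request shape: one instance per image, base64-encoded.
response = endpoint.predict(instances=[{"image_bytes": {"b64": image_b64}}])

for prediction in response.predictions:
    print(prediction)  # e.g. {"label": "cardiomegaly", "score": 0.87} (illustrative)
```

In a real workflow, classifications like these would support a radiologist’s read rather than replace it.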
Research on fine-tuning our models for the medical domain
Approximately 30% of the world’s data volume is generated by the healthcare industry, and that volume is growing at a rate of 36% annually. It includes large quantities of text, images, audio and video. What’s more, important information about a patient’s history is often buried deep in the medical record, making it difficult to find relevant information quickly.
For these reasons, we’re researching how a version of the Gemini model, fine-tuned for the medical domain, can unlock new capabilities for advanced reasoning, understanding a high volume of context and processing multiple modalities. Our latest research achieved state-of-the-art performance on a benchmark of U.S. Medical Licensing Exam (USMLE)-style questions, with 91.1% accuracy, as well as on MedVidQA, a medical video question-answering dataset.
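As a rough illustration of how accuracy on USMLE-style benchmarks is typically computed, the sketch below scores a model’s answers to multiple-choice questions. The file name, record schema and `ask_model` stub are hypothetical; this is not Google’s evaluation harness.

```python
# Illustrative multiple-choice benchmark scoring, not Google's actual harness.
# Each JSONL record is assumed to look like:
#   {"question": "...", "options": {"A": "...", "B": "...", ...}, "answer": "C"}
import json

def ask_model(question: str, options: dict[str, str]) -> str:
    """Placeholder: prompt a model with the question and options, parse its letter."""
    # A real harness would call the model under evaluation here.
    return "A"

def evaluate(path: str) -> float:
    correct = total = 0
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            if ask_model(item["question"], item["options"]) == item["answer"]:
                correct += 1
            total += 1
    return correct / total

# A 91.1% score means the model answered about 911 of every 1,000 such
# questions correctly.
# print(evaluate("usmle_style_questions.jsonl"))  # hypothetical file
```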
And because our Gemini models are multimodal, we were able to apply this fine-tuned model to other clinical benchmarks — including answering questions about chest X-ray images and genomics information. We’re also seeing promising results from our fine-tuned models on complex tasks such as report generation for 2D images like X-rays, as well as for 3D images like brain CT scans, representing a step change in our medical AI capabilities. While this work is still in the research phase, there’s potential for generative AI in radiology to bring assistive capabilities to health organizations.
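The general multimodal pattern, pairing an image with a text instruction in a single request, can be sketched with the public Gemini API. The medically fine-tuned model described above is research-only, so the model name, prompt and image file below are stand-ins.

```python
# Sketch of the multimodal image-plus-text pattern using the public Gemini API.
# The medically fine-tuned Gemini is research-only; the model name, prompt and
# image file here are illustrative stand-ins.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

model = genai.GenerativeModel("gemini-1.5-pro")  # public model, not the medical fine-tune
xray = Image.open("chest_xray.png")  # hypothetical local image

response = model.generate_content([
    "Draft a structured report for this image with Findings and Impression "
    "sections. For demonstration only; not medical advice.",
    xray,
])
print(response.text)
```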
A Personal Health LLM for personalized coaching and recommendations
Fitbit and Google Research are working together to build a Personal Health Large Language Model that can power personalized health and wellness features in the Fitbit mobile app, helping people get even more insights and recommendations from the data collected by their Fitbit and Pixel devices. This model is being fine-tuned to deliver personalized coaching capabilities, like actionable messages and guidance, that can be individualized based on personal health and fitness goals. For example, the model may be able to analyze variations in your sleep patterns and sleep quality, and then suggest how you might adjust the intensity of your workout based on those insights.
This model is being built on Gemini models and fine-tuned on a de-identified, diverse set of health signals from high-quality research case studies. The studies are being collected and validated in partnership with accredited coaches and wellness experts, enabling the model to reason effectively over physiological and behavioral data. For example, we’re testing performance using practice tests modeled on sleep medicine certification exams, and the model is already performing well. We’ll continue to iterate and learn as we build this new Personal Health Large Language Model, and will share more research soon.
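One way to picture the coaching interaction described above: summarize a user’s recent signals into a prompt and send it to the model. The field names, thresholds and helper function below are a hypothetical sketch, not the Fitbit implementation.

```python
# Hypothetical sketch of turning wearable signals into a coaching prompt.
# Field names and thresholds are illustrative; the actual Personal Health LLM
# is fine-tuned on de-identified research data and is not publicly available.
from statistics import mean

def build_coaching_prompt(sleep_hours: list[float], workout_minutes: list[int]) -> str:
    """Summarize a week of signals into a prompt for a coaching model."""
    avg_sleep = mean(sleep_hours)
    summary = (
        f"Average sleep over the past {len(sleep_hours)} nights: {avg_sleep:.1f} hours. "
        f"Daily workout minutes: {workout_minutes}."
    )
    if avg_sleep < 7.0:  # illustrative goal; would come from the user's settings
        summary += " Sleep has been below the user's 7-hour goal."
    return (
        "You are a health and fitness coaching assistant. Based on this summary, "
        "suggest how the user might adjust tomorrow's workout intensity:\n" + summary
    )

prompt = build_coaching_prompt([6.2, 5.9, 7.1, 6.4, 6.0, 6.8, 6.5],
                               [30, 0, 45, 20, 0, 60, 25])
print(prompt)  # this prompt would then be sent to the fine-tuned model
```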
Better understanding the assistive capabilities of AI
Generative AI is already working as an assistive tool for clinicians, helping them with administrative tasks, like documentation, that typically take up hours of their time. We’re not stopping there: we’re now building on our work with partners to explore what’s possible.
Earlier this year, we introduced AMIE (Articulate Medical Intelligence Explorer), a research AI system built on an LLM and optimized for diagnostic reasoning and clinical conversations. We explored the LLM’s performance by simulating text-based consultations with patient actors, adapting the well-known framework of Objective Structured Clinical Examinations (OSCEs) to a consumer-facing user interface. In a randomized comparison with real primary care clinicians performing the same simulated text consultations, the appropriately trained LLM was rated on par with or higher than the clinicians on measures such as diagnostic accuracy, empathy and the helpfulness of its explanations.
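A highly simplified way to picture these simulated consultations is a turn-by-turn loop between a clinician agent and a patient actor, with the transcript graded afterward against OSCE-style criteria. The sketch below is a generic two-agent loop with placeholder responses, not the AMIE system itself.

```python
# Generic two-agent consultation loop, loosely mirroring the OSCE-style setup
# described above. The turn functions are placeholders, not AMIE's actual code.

def clinician_turn(transcript: list[str]) -> str:
    # Placeholder: a real system would prompt a diagnostic LLM with the
    # conversation so far and return its next question or explanation.
    return "Can you tell me more about when the symptoms started?"

def patient_turn(transcript: list[str], scenario: str) -> str:
    # Placeholder: a patient actor (human or LLM) answers in character.
    return "The discomfort started two days ago, mostly after climbing stairs."

def run_consultation(scenario: str, max_turns: int = 2) -> list[str]:
    transcript: list[str] = []
    for _ in range(max_turns):
        transcript.append("Clinician: " + clinician_turn(transcript))
        transcript.append("Patient: " + patient_turn(transcript, scenario))
    return transcript  # later scored on diagnostic accuracy, empathy, etc.

for line in run_consultation("patient actor with exertional chest discomfort"):
    print(line)
```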
Our next step is to test this with a healthcare organization to see how an LLM like AMIE can be helpful in supporting clinical conversations. We hope to learn how clinicians and patients perceive its use within the care experience, with oversight from medical professionals.
Healthcare presents some of society’s most complex challenges. We’re working alongside our partners to explore how AI can help overcome them and improve care.