Simplifying Multi-modal & Multi-omics Analysis with AWS for Health

August 8, 2022 Stephanie Black

New AWS for Health Guidance: Multi-modal and Multi-omics

The new era of personalized health relies on data to guide more customized patient treatments, therapeutics, and diagnoses. Genomics sits at the core of personalized health: by taking into account the individual variability among people and diseases, clinicians can create more personalized care journeys and targeted treatments. Across clinical and research disciplines, combining and analyzing different modalities, including multiple molecular data types and imaging data, is powering a more holistic view of patients and more robust insights into an area of study.

A great example of this is the work being done by Philips to incorporate multi-modal data into its Philips Healthsuite Platform, which was recently presented at the 2022 AWS Industry Innovators: Healthcare & Life Sciences event. To help determine the best treatment options on an individualized basis, Philips created a platform on AWS that integrates the different modalities of medical data involved in cancer treatment, including genomic, imaging, digital pathology, and clinical data. As a result, leading healthcare organizations like MD Anderson Cancer Center can now run more data-driven, personalized oncology treatments and clinical trial matching.

While the promise of multi-modal and multi-omics is becoming evident, the integration and analysis of varying forms of structured and unstructured data pose a unique set of challenges, including:

  • Addressing the influx of diverse data types and formats
  • Extracting insights from unstructured data, such as voice and imaging
  • Ingesting, normalizing, structuring, and formatting differing data types for consumption
  • Creating cohorts and defining relevant data subsets

To reduce barriers to handling and analyzing multi-modal and multi-omics data, AWS for Health has released the new Guidance for Multi-Omics and Multi-Modal Data Integration and Analysis on AWS.

It is a prescriptive deep dive on how to prepare genomic, clinical, mutation, expression, and imaging data for large-scale analysis and perform interactive queries, using The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA) as example datasets. The ETL code provided in this guidance can be customized to ingest and transform additional datasets.
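To give a feel for what a customized transform step might look like, here is a minimal sketch in plain Python. It is not the guidance's ETL code: the function names, the sample columns, and the set of missing-value markers are all assumptions made for illustration. It reads a tab-separated clinical table, normalizes the column names, and maps missing-value markers to nulls so the rows are ready for a structured format.

```python
import csv
import io
import json

# Hypothetical markers treated as "missing"; adjust for your own dataset.
MISSING = {"", "NA", "N/A", "[Not Available]"}


def normalize_header(name: str) -> str:
    """Lower-case a column name and replace non-alphanumerics with underscores."""
    cleaned = "".join(ch.lower() if ch.isalnum() else "_" for ch in name.strip())
    while "__" in cleaned:  # collapse runs of underscores
        cleaned = cleaned.replace("__", "_")
    return cleaned.strip("_")


def transform_clinical_tsv(text: str) -> list[dict]:
    """Parse tab-separated text into dicts with clean keys and None for missing values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if key is None:  # skip surplus fields that have no header
                continue
            value = (value or "").strip()
            row[normalize_header(key)] = None if value in MISSING else value
        rows.append(row)
    return rows


# Made-up sample input, not real patient data.
SAMPLE = (
    "Patient ID\tAge at Diagnosis\tTumor Stage\n"
    "P-001\t61\tStage II\n"
    "P-002\tNA\tStage I\n"
)

records = transform_clinical_tsv(SAMPLE)
for record in records:
    print(json.dumps(record))  # one JSON object per line
```

In a real pipeline the same kind of function would typically run inside a managed ETL job and write a columnar format rather than printing JSON lines.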

This comprehensive guidance provides step-by-step instructions and recommendations for:

  • optimizing data formats and structures,
  • querying and accessing data from different sources with ease, and
  • integrating and analyzing genomics data together with other omics (for example, epigenomics, proteomics, transcriptomics, metabolomics) as well as other modalities of data (for example, X-rays, health records, recorded audio, wearables data).

Following the six pillars of the AWS Well-Architected Framework, the guidance is designed to help healthcare and life sciences organizations build a secure, resilient, and scalable environment on AWS. It shows how to prepare genomic, clinical, mutation, expression, and imaging data for large-scale analysis and perform interactive queries against a data lake.

The modern data architecture (Image 1) in this guidance demonstrates how to ingest common multi-omics datasets into a centralized data lake and work with that data using Amazon Athena and low-code Jupyter Notebooks. There are example ingestion pipelines for clinical, mutation, gene expression, and copy number data (TCGA), imaging metadata (TCIA), genomic variant call data (1000 Genomes), annotation data (ClinVar), and an individual Variant Call Format (VCF) file.
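For readers who have not queried Athena from a notebook before, the sketch below shows one way to submit a query and wait for it to finish using the boto3 Athena client. The helper name `run_athena_query` and the database, table, and bucket names in the usage comment are placeholders, not names from the guidance. The client is passed in as a parameter so the helper can be exercised without AWS access.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and poll until it reaches a terminal state.

    `client` is expected to be a boto3 Athena client (or any object with the
    same two methods). Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


# Example usage (requires boto3 and AWS credentials; all names are placeholders):
#
#   import boto3
#   athena = boto3.client("athena")
#   query_id, state = run_athena_query(
#       athena,
#       "SELECT * FROM clinical_patient LIMIT 10",
#       database="my_multiomics_db",
#       output_location="s3://my-results-bucket/athena/",
#   )
#   if state == "SUCCEEDED":
#       rows = athena.get_query_results(QueryExecutionId=query_id)
```

Production code would usually add a timeout and surface the failure reason from the query status instead of polling indefinitely.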

Image 1: AWS for Health Guidance: The Modern Data Architecture


This guidance demonstrates how to:

  • Build, package, and deploy libraries
  • Provision serverless data ingestion pipelines for multi-modal data preparation and cataloging
  • Visualize and explore clinical data through an interactive interface
  • Run interactive analytics queries against a multi-modal data lake
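As a rough illustration of what an interactive query across modalities can look like, the sketch below builds a small cohort by joining clinical, expression, and imaging tables on a shared patient identifier. It uses Python's built-in SQLite as a local stand-in for a data lake query engine. The table names, columns, thresholds, and rows are invented for the example and are not taken from the guidance or from real patient data.

```python
import sqlite3

# In-memory database standing in for cataloged data lake tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clinical (patient_id TEXT, tumor_stage TEXT);
    CREATE TABLE expression (patient_id TEXT, gene TEXT, value REAL);
    CREATE TABLE imaging (patient_id TEXT, modality TEXT);

    INSERT INTO clinical VALUES
        ('P-001', 'Stage II'), ('P-002', 'Stage I'), ('P-003', 'Stage II');
    INSERT INTO expression VALUES
        ('P-001', 'EGFR', 8.2), ('P-002', 'EGFR', 3.1), ('P-003', 'EGFR', 7.5);
    INSERT INTO imaging VALUES
        ('P-001', 'CT'), ('P-003', 'MR');
    """
)

# Cohort: patients at a given stage, with expression of a gene above a
# threshold, who also have at least one imaging study.
COHORT_SQL = """
SELECT c.patient_id, e.value AS expression_value, i.modality
FROM clinical c
JOIN expression e ON e.patient_id = c.patient_id
JOIN imaging i ON i.patient_id = c.patient_id
WHERE c.tumor_stage = ? AND e.gene = ? AND e.value > ?
ORDER BY c.patient_id
"""

cohort = conn.execute(COHORT_SQL, ("Stage II", "EGFR", 7.0)).fetchall()
print(cohort)  # [('P-001', 8.2, 'CT'), ('P-003', 7.5, 'MR')]
```

The point of the sketch is the shape of the query: once each modality is cataloged as a table keyed on the same patient identifier, defining a cohort becomes a join plus a few filters.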

This guidance was built in collaboration with AWS for Health featured consulting partner BioTeam. BioTeam is a scientific IT consulting company with expertise in applying strategies, advanced technologies, and IT services to solve the most challenging research, technical, and operational problems in the life sciences. They can help implement and customize this guidance to ingest custom datasets.

The full guidance is now available here: Guidance for Multi-Omics and Multi-Modal Data Integration and Analysis on AWS

