Data and Informatics Programs

Advanced Phenotyping for Precision Health

Phenotyping refers to the characterization of a population of interest based on clinical, behavioral, social, economic, or other non-genotypic features. For precision health, a phenotype is a well-specified characterization of a patient cohort of interest (e.g., patients with a particular disease or patients who have received a particular treatment), defined using explicit concepts such as diagnosis codes, medication codes, laboratory values, or other standardized data. A second major effort is deep phenotyping, which draws on the rapidly growing diversity of data types (e.g., sensor data) and on machine learning techniques that can cluster patients into a common phenotype.
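To make the idea of a phenotype built from explicit concepts concrete, the following is a minimal sketch of a rule-based cohort definition. It assumes a hypothetical `Patient` record with ICD-10-CM diagnosis codes, normalized medication names, and recent lab values. The type 2 diabetes rule (an E11.* diagnosis plus either a diabetes medication or HbA1c ≥ 6.5%) is illustrative only and is not a validated phenotype algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Minimal EHR-derived record; field names are hypothetical."""
    patient_id: str
    diagnosis_codes: set[str] = field(default_factory=set)  # ICD-10-CM codes
    medications: set[str] = field(default_factory=set)      # normalized drug names
    labs: dict[str, float] = field(default_factory=dict)    # most recent lab values


# Illustrative type 2 diabetes definition; NOT a validated phenotype algorithm.
T2D_DX_PREFIXES = ("E11",)                 # ICD-10-CM E11.* = type 2 diabetes mellitus
T2D_MEDICATIONS = {"metformin", "glipizide"}
HBA1C_THRESHOLD = 6.5                      # percent


def meets_t2d_phenotype(p: Patient) -> bool:
    """Require a diagnosis code plus supporting medication or lab evidence."""
    has_dx = any(code.startswith(T2D_DX_PREFIXES) for code in p.diagnosis_codes)
    on_med = bool(p.medications & T2D_MEDICATIONS)
    high_a1c = p.labs.get("HbA1c", 0.0) >= HBA1C_THRESHOLD
    return has_dx and (on_med or high_a1c)


def build_cohort(patients: list[Patient]) -> list[str]:
    """Return the IDs of patients who satisfy the phenotype definition."""
    return [p.patient_id for p in patients if meets_t2d_phenotype(p)]


patients = [
    Patient("p1", {"E11.9"}, {"metformin"}),
    Patient("p2", {"E11.9"}, set(), {"HbA1c": 5.9}),
    Patient("p3", {"E11.65"}, set(), {"HbA1c": 7.2}),
    Patient("p4", {"I10"}, {"metformin"}, {"HbA1c": 7.0}),
]
print(build_cohort(patients))  # ['p1', 'p3']
```

Requiring corroborating evidence beyond a single code is one common way to make an explicit definition more specific; deep phenotyping would instead learn groupings from richer data.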

Clinical Decision Support

The volume and complexity of precision health data will soon exceed human cognitive capacity by orders of magnitude, removing any doubt that the future of health care is entwined with computers and software. Leveraging electronic health records (EHRs) with integrated clinical decision support (CDS) will be vital to achieving the key goals of this proposal, including creating a Precision Health cohort, expanding the Biobank with subjects who meet particular study criteria, creating a research pipeline to medical providers across Indiana, and catalyzing highly competitive precision medicine research.
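One way EHR-integrated CDS could help expand the Biobank is a point-of-care rule that flags patients who meet a study's criteria and have not yet consented. The sketch below assumes hypothetical `Encounter` and `StudyCriteria` types, a simple age-plus-diagnosis eligibility rule, and an illustrative study ID; a production rule would run inside the EHR's CDS framework rather than as standalone Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Encounter:
    """Hypothetical snapshot of a patient at a point-of-care visit."""
    patient_id: str
    age: int
    diagnosis_codes: frozenset[str]
    biobank_consented: bool


@dataclass(frozen=True)
class StudyCriteria:
    """Hypothetical, deliberately simple study eligibility criteria."""
    study_id: str
    min_age: int
    required_dx_prefix: str  # ICD-10-CM prefix, e.g. "C50" (malignant neoplasm of breast)


def biobank_recruitment_alert(enc: Encounter, criteria: StudyCriteria) -> Optional[str]:
    """Return an advisory message if the patient may qualify, else None."""
    if enc.biobank_consented:
        return None  # already enrolled; do not interrupt the clinician
    if enc.age < criteria.min_age:
        return None
    if not any(code.startswith(criteria.required_dx_prefix) for code in enc.diagnosis_codes):
        return None
    return (f"Patient {enc.patient_id} may be eligible for study {criteria.study_id}; "
            "consider offering Biobank enrollment.")


criteria = StudyCriteria("HYPOTHETICAL-01", min_age=18, required_dx_prefix="C50")
print(biobank_recruitment_alert(Encounter("p7", 52, frozenset({"C50.911"}), False), criteria))
```

Returning `None` rather than an empty alert keeps the rule silent when it has nothing actionable to say, which matters for limiting alert fatigue.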

Functional Annotation of Mutations

People’s genomes differ at millions of sites, and interpreting how this variation affects phenotypes and disease risk is extremely challenging. High-throughput sequencing has greatly increased our capacity to identify genetic variants, but the lack of computational and experimental approaches for interpreting these variants is already a bottleneck in Precision Health. This focus will have two emphases: (1) Coding and Noncoding Variant Function Prediction, which seeks to understand the function of biological macromolecules and how variants in them affect organismal phenotypes, with a focus on variants that control gene expression and splicing; and (2) Genetic Basis of Mutation Rate Variation Determination, which seeks to determine the genetic basis of variation in mutation rates. De novo mutations, which are passed from parents to offspring, are a source of hereditary disease, while somatic mutations that occur during an individual’s lifetime are a major cause of cancer.
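A basic first step in coding variant interpretation is determining how a single-nucleotide variant changes the encoded amino acid. The sketch below assumes an uppercase, in-frame coding sequence (CDS) and a 0-based variant position, and uses the standard genetic code. It only labels the consequence (synonymous, missense, nonsense, stop-lost); predicting the functional impact of a missense change requires additional models and is not shown.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG; "*" = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(cds: str, pos: int, alt: str) -> str:
    """Classify an SNV at 0-based position `pos` of an in-frame, uppercase CDS."""
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


cds = "ATGGCTTGGTAA"  # hypothetical CDS encoding Met-Ala-Trp-Stop
print(classify_coding_snv(cds, 5, "C"))  # synonymous (GCT -> GCC, Ala -> Ala)
print(classify_coding_snv(cds, 4, "A"))  # missense (GCT -> GAT, Ala -> Asp)
print(classify_coding_snv(cds, 8, "A"))  # nonsense (TGG -> TGA, Trp -> Stop)
```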

Systems Biology and Systems Pharmacology

Complex diseases involve more than one molecular mechanism. This heterogeneity exists not only across a disease population but also within a single patient. Using omics data, these heterogeneous disease mechanisms can be characterized at the system level, an approach known as “systems biology.” This systems biology theme will have a major impact on the genomic medicine, cell and gene therapy, and chemical biology components through the following innovative projects: Cancer Systems Pharmacology for Single Drug and Drug Combinatory Effect Predictions (systems pharmacology models to predict single-drug and drug-combination effects), and Integrative Multi-Omics Analysis in Cancer Genomics (an expansion of proteomic and glycomic data analyses for novel drug target and pathway identification).

Advanced Medical Records Mining Using Statistical Relational Learning

Our grand vision is to use our research in statistical artificial intelligence to build clinical decision support systems. Two key research questions drive this work: (1) how to present knowledge to the clinician (for example, at the point of care), which requires considering both what the algorithms can effectively learn and what would be useful to the clinician; and (2) what can be learned from high-dimensional medical record data. Our goal is to develop robust and interpretable learning algorithms that provide trustworthy support and understandable decisions. We will also develop and characterize algorithms that can learn and reason from combined clinical, genomic, and lifestyle data.

Privacy, Security and High-Performance Computing

High-throughput data, such as next-generation sequencing (NGS) data, are critical for precision medicine. We will apply novel privacy-preserving techniques to analyze and share sensitive human genomic data without undermining participants’ privacy, and we will use novel technologies to develop a practical, parallelizable security platform. We will also design and prototype the hardware and software needed for an optimized data management system and a closely linked data analytics system.
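The specific privacy-preserving techniques are not named here; as one well-known example, the sketch below releases per-variant allele counts under ε-differential privacy using the Laplace mechanism. It assumes diploid participants (so adding or removing one person changes each count by at most 2) and hypothetical variant IDs. Noise is calibrated to the L1 sensitivity of the whole released vector.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two Exponential(rate=1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_allele_counts(counts: dict[str, int], epsilon: float,
                          rng: random.Random) -> dict[str, float]:
    """Release per-variant allele counts under epsilon-differential privacy.

    Adding or removing one diploid participant changes each count by at most 2,
    so the L1 sensitivity of the whole vector is 2 * len(counts).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 2 * len(counts) / epsilon
    return {variant: c + laplace_noise(scale, rng) for variant, c in counts.items()}


rng = random.Random(42)  # seeded only to make the demo reproducible
print(release_allele_counts({"rs0001": 120, "rs0002": 45}, epsilon=1.0, rng=rng))
```

Smaller ε gives stronger privacy but noisier counts, and releasing more variants at once spreads the same budget across more values; that trade-off is why practical platforms combine such mechanisms with access controls and secure computation.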

Biomedical Informatics Ph.D. Training Program

The mission of this Ph.D. program is to train the next generation of informaticians for academia as well as the health care, biotech, and pharmaceutical industries, who will pursue precision health research and service. The program will take advantage of existing informatics strengths at Indiana University, such as genomic variant function prediction, systems pharmacology, data mining, phenotyping, and clinical decision support. It will also expand into emerging areas such as chemical informatics and immunology bioinformatics. The program will address the need for translational research in precision medicine; in particular, it will supply the bioinformatics expertise needed by precision medicine clinics and basic science research at Indiana University.

Informatics Infrastructure

The precision health initiative will create a novel infrastructure that is: (1) flexible enough to manage a wide array of data in many different formats and descriptions; (2) powerful and scalable enough to manage large data sets and complex analyses; (3) accessible to investigators, care providers, and patients; and (4) integrative, supporting broad new applications and uses as described in earlier sections. The Informatics Infrastructure will have the following components:

  • The Precision Health Data Commons. Built on a high-throughput spinning disk array, the Data Commons will be the infrastructure for collecting, describing, and managing data gathered through PH activities, whether contributed by researchers, care providers, or patients. It will be architected as a Virtual Logical Data Warehouse that allows data to be brought in and described in their native format, with the ability to add descriptive layers and logical linkages among diverse data sets for integrated analyses.
  • The Precision Health Integration and Analytics Platform. Composed of high capacity compute nodes and connected to the Data Commons at high speed, this computational array will support data integration and analytics, software applications, and workflows required for PHI research.
  • Portals and Patient Engagement. To bring together the communities of researchers, care providers, and patients involved in precision health, the Infrastructure Program will create ways for these communities to easily engage in the precision health program and collaborate. This may include returning patient data to care providers; collecting patient-contributed data; enabling patients to engage more meaningfully in collecting and validating their data and in participating in PH research or care; and communicating more effectively with patients about our research and care efforts.
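The Virtual Logical Data Warehouse idea behind the Data Commons (data kept in their native format, with descriptive layers and logical linkages added on top) can be illustrated with a toy in-memory catalog. Everything below is a hypothetical sketch: the `DataCommons` class, its methods, and the example data sets are assumptions for illustration, and only CSV and JSON readers are included. Data are parsed only when read, and linkage is an inner join computed at query time rather than a physical merge.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    native_format: str                                          # "csv" or "json" here
    raw: str                                                    # stored as-is, never converted
    description: dict[str, str] = field(default_factory=dict)  # descriptive layer


class DataCommons:
    """Toy catalog: data stay in native format and are parsed only when read."""

    def __init__(self) -> None:
        self._datasets: dict[str, DatasetEntry] = {}
        self._readers = {
            "csv": lambda raw: list(csv.DictReader(io.StringIO(raw))),
            "json": json.loads,
        }

    def register(self, name: str, native_format: str, raw: str, **description: str) -> None:
        if native_format not in self._readers:
            raise ValueError(f"no reader for format {native_format!r}")
        self._datasets[name] = DatasetEntry(name, native_format, raw, dict(description))

    def describe(self, name: str) -> dict[str, str]:
        return dict(self._datasets[name].description)

    def read(self, name: str) -> list[dict]:
        entry = self._datasets[name]
        return self._readers[entry.native_format](entry.raw)

    def link(self, left: str, right: str, key: str) -> list[dict]:
        """Logical linkage: inner join of two data sets on a shared key at query time."""
        index: dict[str, list[dict]] = {}
        for row in self.read(right):
            index.setdefault(str(row[key]), []).append(row)
        return [{**row, **match}
                for row in self.read(left)
                for match in index.get(str(row[key]), [])]


commons = DataCommons()
commons.register("labs", "csv", "patient_id,hba1c\np1,7.2\np2,5.4\n", source="EHR")
commons.register("wearables", "json", '[{"patient_id": "p1", "daily_steps": 4200}]',
                 source="patient-contributed")
print(commons.link("labs", "wearables", "patient_id"))
# [{'patient_id': 'p1', 'hba1c': '7.2', 'daily_steps': 4200}]
```

Keeping the raw payload untouched means new descriptive layers or linkages can be added later without re-ingesting the data; a real deployment would replace the in-memory strings with storage on the disk array and the joins with the analytics platform.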