UCI and NYU Researchers Explore Machine Learning for Disease Risk Prediction with $1.3M NIH Grant
Padhraic Smyth, Chancellor’s Professor in the Computer Science and Statistics Departments in the Donald Bren School of Information and Computer Sciences (ICS), is co-principal investigator on a new research project award from the National Institutes of Health (NIH). The $1.3 million project is an interdisciplinary effort involving computer scientists, statisticians, and medical experts at both New York University and UC Irvine. UCI computer science Ph.D. students Yuxin Chang and Preston Putzel (a Student Fellow of the HPI Research Center in Machine Learning and Data Science at UCI) will be participating in this research. The four-year project will focus on the development of new machine learning and statistical methods for building predictive models that use electronic health record (EHR) data to predict individual patient risk over time for clinical outcomes such as incidence of cardiovascular disease.
The project is motivated by the challenges of using EHR data to build useful predictive models. There has been significant interest in clinical medicine over the past decade in using EHR data as the basis for risk prediction of chronic disease. Such data are appealing to researchers because they can offer large sample sizes, timely information, and a wealth of clinical information beyond that obtained from either health surveys or administrative data. However, while large EHR datasets include millions of patient records, the data are often biased in various ways. For example, there is often over-representation of individuals who have insurance and relatively easy access to healthcare resources. Biases are also implicit in the general characteristics of the patient populations that a specific hospital serves. These biases mean, for example, that a predictive model developed on data from a single hospital will often not generalize well to broader patient populations.
This joint NYU-UCI project will focus on addressing these types of systematic problems, combining statistical and machine learning approaches to develop risk prediction algorithms that are more robust and accurate than current approaches. The team will develop and evaluate its approaches using EHR and outcome data based on 12 million patient encounters from 20 different health institutions.