Informatics Professor Kai Zheng Helps UCI Leverage Health Data
UCI’s Institute for Clinical and Translational Science (ICTS) is part of a nationwide Clinical and Translational Science Awards program aimed at advancing scientific discovery and medical breakthroughs, and the ICTS Center for Biomedical Informatics (CBMI) is a key enabler of this work. CBMI leverages informatics to transform how biomedical data is collected, shared and analyzed, helping researchers in health sciences realize the potential of big data.
“There has always been a high demand for medical data, particularly patient records and imaging data,” says Kai Zheng, a professor of informatics in the Donald Bren School of Information and Computer Sciences (ICS) who serves as the director of CBMI. “This year, we have made a lot of progress in providing access to our health system data [and] establishing UCI as a place where health and data science really get together and produce something magnificent.”
Over the past five months, CBMI has not only helped make UCI’s health data more accessible but has also held a series of virtual workshops to ensure UCI researchers understand how to access that real-world data and apply it to their work.
De-Identified Electronic Health Records Data
Last summer, Zheng started collaborating with UCI Health and the Susan and Henry Samueli College of Health Sciences on a new project involving UCI’s De-Identified Clinical Data Warehouse (DeID CDW). Professor Steve Goldstein, vice chancellor for health affairs; Professor Pramod Khargonekar, vice chancellor for research; and Tom Andriola, vice chancellor for information technology and data, were put in charge of the collaboration, while Zheng was tasked with setting up the health data science infrastructure.
“Basically, we now have a clinical data warehouse that is readily accessible to researchers, which contains de-identified clinical data of all patients seen at UCI Health since 2009,” says Zheng. “It’s a huge data set that contains over 700,000 patients and all of their past medical history from diagnosis to lab tests to medications.”
In April, Zheng helped CBMI host a virtual workshop for clinical and translational researchers, as well as for data scientists, on how to access the data. One of the advantages of using de-identified data is that researchers don’t need to obtain individual Institutional Review Board (IRB) approval to use the data in their work. “Anybody — any UCI student or faculty — can use the data,” says Zheng. “They just need to sign an agreement without the need to have an IRB.”
The workshop presented an overview of the DeID CDW and tutorials on how to access it through a protected virtual desktop environment. Real-world examples illustrated how to write queries to retrieve data from the DeID CDW and use it in statistical or machine learning software tools. “A significant number of researchers, many from ICS, are now using that data,” notes Zheng.
The DeID CDW is part of a larger, UC-wide effort that aims to provide researchers with easy access to de-identified EHR data contributed by all UC medical campuses.
UC COVID Research Data Set
As the global pandemic became more widespread in the spring, CBMI also assisted in contributing UCI data to a UC COVID Research Data Set (UC CORDS). Coordinated by the UC Office of the President (UCOP), the data set contains SARS-CoV-2 testing results and inpatient COVID-19 treatment information from across UC Health, including medical providers at UCI as well as UC Davis, UC San Diego, UCLA and UCSF.
“All of the UCs submit data weekly to a central place, the UC Health Data Warehouse, so I’ve been involved with that for a while and the COVID data is being derived based on that centralized database,” explains Zheng. “They send the data to each campus, and I’m in charge of making that data available to UCI researchers.”
The data set currently has records from approximately 150,000 patients and is accessible to all UC researchers; researchers affiliated with the UCs can also request access through UCOP.
In June, CBMI set up another virtual workshop, this one focused solely on helping UCI faculty, students and fellows use the CORDS data to answer COVID-related research questions. “It is also available to engineering and ICS faculty and students conducting large-data machine learning research using medical records,” says Zheng.
National Institutes of Health “All of Us” Data
The National Institutes of Health (NIH) All of Us Research Program has been collecting health data and biospecimens from participants nationwide for a couple of years now, and UCI is one of the Healthcare Provider Organizations (HPOs) funded by the program. The program aims to gather information from at least 1 million people living in the U.S. as part of an effort to accelerate health research and medical breakthroughs. UCI’s involvement means that UCI researchers have access to the data, which currently includes surveys, physical measurements and electronic health records from 250,000 participants.
In early August, CBMI collaborated with the UCI All of Us Research Program to host a virtual workshop designed to help UCI researchers access and use the All of Us data and research hub, including the data browser and researcher workbench.
“Understanding how to access this gigantic dataset collected through the NIH All of Us Research Program,” says Zheng, “and learning how to apply the data is very important to helping us become a productive place for using data-driven methods to solve healthcare problems.”
— Shani Murray