The widespread adoption of electronic health records (EHRs) and their subsequent linkage to specimen biorepositories has generated massive amounts of routinely collected medical data for use in translational research. These integrated data sets have the potential to drive real-world predictive modeling of disease risk and progression. However, the analysis of EHR data remains both practically and methodologically challenging due to data heterogeneity and quality issues. In this talk, I will discuss methods that bridge classical statistical theory and modern machine learning tools to help extract reliable insights from imperfect EHR data. I will focus primarily on (i) the challenges in obtaining annotated outcome data from patient records and (ii) how leveraging unlabeled examples to improve model estimation and evaluation can reduce the annotation burden.
Jesse Gronsbell is an Assistant Professor in the Department of Statistical Sciences at the University of Toronto. Her primary interest is in the development of statistical methods for modern digital health data sources such as electronic health records and mobile health data. Prior to joining U of T, Jesse spent a couple of years as a data scientist in the Mental Health Research and Development Group at Alphabet’s Verily Life Sciences.