Beyond Curve Fitting: Transfer Learning and Causal Reasoning

When and Where

Thursday, October 24, 2019 3:30 pm to 4:30 pm
Room 409
Stewart Building
149 College Street, Toronto, ON M5T 1P5


Xu Shi, University of Michigan


The growing availability of administrative healthcare data such as electronic health record (EHR) data is opening new opportunities for research. EHRs are routinely collected longitudinal data containing demographics, medical diagnoses and procedures, medications, immunizations, laboratory test results, radiology images, vital signs, and billing information. Ongoing efforts have been made to integrate large-scale EHR data across healthcare systems, as well as to link EHR data with biobanks, insurance claims, registries, and death indices. With such cost-effective data sources, the health of an individual is now characterized with unprecedented precision and depth, facilitating contemporary research that makes the transition from data to knowledge.

However, EHR data are not collected for research purposes. Critical issues such as data quality, system heterogeneity, unmeasured and mismeasured confounding, high-dimensional covariates, and patient privacy concerns naturally arise. In this talk, I will detail the problem of inconsistent “languages” used by different healthcare systems, which limits the transportability of phenotyping algorithms and statistical methods across systems. I will present an automated data quality control and harmonization pipeline that aims to overcome this challenge. I will also present a tailored causal inference method that leverages the unique pool of information in EHR data to mitigate unmeasured confounding.

Please register for the event.

About Xu Shi

Xu Shi is an Assistant Professor in the Department of Biostatistics at the University of Michigan. Her research focuses on developing novel statistical methods that provide insights from high-volume, high-variability administrative healthcare data such as electronic health record (EHR) data.

She is particularly interested in developing causal inference methods tailored to EHR data, automated knowledge extraction, data harmonization across healthcare systems, post-marketing drug safety surveillance, and high-throughput comparison of healthcare utilization.

