With the increasing availability of electronic health records (EHR) data, it is important to effectively integrate evidence from multiple data sources to enable reproducible scientific discovery. However, practical challenges in data integration remain, such as protecting data privacy, the high dimensionality of features, and heterogeneity across datasets. To facilitate efficient multi-institutional data analysis without sharing individual patient-level data (IPD), we developed a toolbox of Privacy-preserving Distributed Algorithms (PDA) that conduct distributed learning and inference for various models, including logistic regression, the Cox model, the Poisson model, and more. Our algorithms do not require iterative communication across sites and can account for heterogeneity across hospitals. In addition, PDA outperforms meta-analysis methods in many settings, such as pharmacovigilance applications. The validity and efficiency of PDA are also demonstrated with real-world use cases in the Penn Medicine Biobank (PMBB), Observational Health Data Sciences and Informatics (OHDSI), and a Pediatric Learning Health System (PEDSnet).
Please join the event.
About Yong Chen
Dr. Yong Chen is a tenured Associate Professor of Biostatistics at the University of Pennsylvania. His research focuses on statistical inference under non-standard conditions, robust inference, variations of likelihoods, evidence synthesis, and data integration. In recent years, he has been developing novel federated learning algorithms and distributed inference methods, with applications to integrating electronic health records data and biobank data across institutions. He was elected a Fellow of the American Statistical Association in 2020, an elected member of the International Statistical Institute in 2018, and an elected member of the Society for Research Synthesis Methodology in 2018. His research has been continuously funded by NIH, PCORI, and AHRQ.