Abhishek Chakrabortty (Department of Statistics, The Wharton School, University of Pennsylvania)
The abundance of large and complex datasets in the current big data era has created a host of novel statistical challenges for properly harnessing such rich (but often incomplete) information. One such challenge is statistical inference in semi-supervised (SS) settings, where, apart from a supervised (labeled) dataset L of moderate size, one also has access to a much larger unsupervised (unlabeled) dataset U. Such datasets arise naturally when the response, unlike the covariates, is difficult and/or expensive to obtain, a frequent scenario in modern studies involving large databases, including biomedical data such as electronic health records (EHR). It is natural to ask whether and how the information in U can be exploited to improve efficiency over a given supervised approach.
In this talk, I will consider SS inference for a class of standard Z-estimation problems. I will first discuss the subtleties and associated challenges that necessitate a semi-parametric perspective. I will then present a family of SS Z-estimators that are robust and adaptive, ensuring that they are always at least as efficient as the supervised estimator, and strictly more efficient (optimal in some cases) whenever the information in U actually relates to the parameter of interest. These properties are crucial for advocating the 'safe' use of the unlabeled data U, and they are often left unaddressed. Our framework provides a much-needed unified understanding of these problems. Several EHR data applications are also presented to exhibit the practical benefits of our estimators.

In the later part of the talk, I will consider SS inference in high-dimensional settings and demonstrate the remarkable benefits of the unlabeled data in seamlessly obtaining a family of SS estimators with asymptotic linear expansions, without directly requiring any of the sparsity conditions or debiasing techniques needed in supervised settings. This, in particular, facilitates high-dimensional inference under minimal assumptions.
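To give a flavor of the efficiency gains at stake, the sketch below simulates a classic (textbook) semi-supervised estimator of a mean, not the speaker's actual Z-estimation method: a working linear model is fit on the small labeled set L, predictions are averaged over the large unlabeled set U, and the labeled residuals debias the result. All variable names and simulation settings here are hypothetical, chosen only to illustrate that the SS estimator can dominate the supervised sample mean when the covariate is informative, while the residual correction keeps it consistent even if the working model is wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_replication(n=100, N=10_000):
    """One draw of labeled data L (size n) and unlabeled data U (size N)."""
    # Labeled data L: (X, Y) with Y strongly related to X; true E[Y] = 0.
    x_l = rng.normal(size=n)
    y_l = 2.0 * x_l + rng.normal(scale=0.5, size=n)
    # Unlabeled data U: covariates only.
    x_u = rng.normal(size=N)

    # Supervised estimator: the sample mean of Y on L alone.
    theta_sup = y_l.mean()

    # SS estimator: fit a working linear model on L, average its
    # predictions over U, then add the mean labeled residual as a
    # debiasing term (so consistency holds even under misspecification).
    beta = np.polyfit(x_l, y_l, deg=1)
    theta_ss = np.polyval(beta, x_u).mean() + (y_l - np.polyval(beta, x_l)).mean()
    return theta_sup, theta_ss

# Monte Carlo comparison of the two estimators' mean squared errors.
reps = np.array([one_replication() for _ in range(2000)])
mse_sup, mse_ss = (reps ** 2).mean(axis=0)  # true theta = E[Y] = 0
print(f"supervised MSE: {mse_sup:.4f}, semi-supervised MSE: {mse_ss:.4f}")
```

In this toy setup the supervised MSE is roughly Var(Y)/n, while the SS estimator's error is driven by the much smaller residual variance, mirroring the "at least as efficient, often strictly better" property discussed in the talk.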