Modern genetics datasets continue to increase in size and complexity, with publicly accessible sources such as the UK Biobank now offering information on hundreds of thousands of subjects and tens of thousands of outcomes. In these settings, the challenges of detecting weak and sparse signals and accounting for arbitrary correlation structures are well known. Thus, researchers have increasingly relied on set-based inference strategies as an alternative to testing individual features. Set-based tests possess obvious advantages, including lower multiple testing burdens, the ability to aggregate multiple small effects into a stronger signal, and potentially improved interpretability. However, the unique characteristics of varied genetics settings often require bespoke statistical development. This talk will present some general strategies for rare-weak inference as well as approaches for time-to-event data, situations where the global null is not the null hypothesis of interest, and switching between testing multiple explanatory factors and multiple outcomes.
Ryan Sun is an Assistant Professor in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center. He received his PhD in Biostatistics from Harvard University in 2017 and joined MD Anderson in 2019. His research interests lie in developing novel statistical methodology that enables researchers to extract knowledge and insights from increasingly complex biomedical datasets. He also emphasizes applying these methods and disseminating the tools to the broader biomedical research community.