Yihong Wu: Besting Good-Turing for probability estimation over large domains
Description
When faced with a small sample from a large universe of possible outcomes, scientists often turn to the venerable Good-Turing estimator. Despite its pedigree, however, this estimator comes with considerable drawbacks, such as the need to hand-tune smoothing parameters and the lack of a precise optimality guarantee. We introduce a parameter-free estimator that bests Good-Turing in both theory and practice. Our method marries two classic ideas, namely Robbins’s empirical Bayes and Kiefer-Wolfowitz non-parametric maximum likelihood estimation (NPMLE), to learn an implicit prior from data and then convert it into probability estimates. We prove that the resulting estimator attains the optimal instance-wise risk up to logarithmic factors in the competitive framework of Orlitsky and Suresh, and that the Good-Turing estimator is strictly suboptimal in the same framework. Our simulations on synthetic data and experiments with English corpora and U.S. Census data show that our estimator consistently outperforms both the Good-Turing estimator and explicit Bayes procedures. This is based on joint work with Yanjun Han (NYU), Jonathan Niles-Weed (NYU) and Yandi Shen (CMU), available at https://arxiv.org/abs/2509.07355
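For readers unfamiliar with the baseline discussed above, the following is a minimal sketch of the classical, unsmoothed Good-Turing estimator. It is not the estimator proposed in the talk; the function name `good_turing` and the toy sample are assumptions made purely for illustration. With N_r denoting the number of distinct symbols seen exactly r times in a sample of size n, a symbol seen r times is assigned probability (r + 1) N_{r+1} / (N_r n), and the total mass of unseen symbols is estimated as N_1 / n.

```python
from collections import Counter


def good_turing(sample):
    """Classical (unsmoothed) Good-Turing estimates for a list of symbols.

    Returns (missing_mass, probs), where missing_mass estimates the total
    probability of symbols never observed, and probs maps each observed
    symbol to its estimated probability.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")

    counts = Counter(sample)                  # symbol -> count r
    freq_of_freq = Counter(counts.values())   # r -> N_r (symbols seen r times)

    # Estimated total probability of unseen symbols: N_1 / n.
    missing_mass = freq_of_freq[1] / n

    probs = {}
    for symbol, r in counts.items():
        n_r = freq_of_freq[r]            # at least 1 for any observed r
        n_r_plus_1 = freq_of_freq[r + 1]  # 0 if no symbol was seen r + 1 times
        # Adjusted count r* = (r + 1) * N_{r+1} / N_r, divided by n.
        probs[symbol] = (r + 1) * n_r_plus_1 / (n_r * n)
    return missing_mass, probs


# Hypothetical toy sample: a x3, b x2, c x1, d x2, e x1 (n = 9).
toy_sample = list("aaabbcdde")
toy_missing_mass, toy_probs = good_turing(toy_sample)
```

On this toy sample the most frequent symbol, `a`, receives probability zero because no symbol was seen four times (N_4 = 0). This behavior of the raw formula is one reason practical implementations smooth the N_r sequence, which is the hand-tuning the abstract refers to.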
BIO: Yihong Wu is the James A. Attwood Professor and Department Chair of Statistics and Data Science at Yale University. He received his B.E. degree from Tsinghua University in 2006 and his Ph.D. degree from Princeton University in 2011. He was a postdoctoral fellow with the Statistics Department in The Wharton School at the University of Pennsylvania from 2011 to 2012 and an assistant professor in the Department of ECE at the University of Illinois at Urbana-Champaign from 2013 to 2016. His research interests are in the theoretical and algorithmic aspects of high-dimensional statistics, information theory, and optimization. He received the NSF CAREER Award in 2017 and the Sloan Research Fellowship in Mathematics in 2018, and was elected an IMS Fellow in 2023.