Publications: Computational Statistics

Adaptive Component-wise Multiple-Try Metropolis

by JinyoungYang, Evgeny Levi, R.V. Craiu and J.S. Rosenthal

Journal of Computational and Graphical Statistics | 2018

Short Summary: Adaptive MCMC for targets with irregular characteristics

Read more



Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

by Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, and David Duvenaud

International Conference on Learning Representations | 2018

Short Summary: We learn low-variance, unbiased gradient estimators for any function of random variables. We backprop through a neural net surrogate of the original function, which is optimized to minimize gradient variance during the optimization of the original objective. We train discrete latent-variable models, and do continuous and discrete reinforcement learning with an adaptive, action-conditional baseline.

Read more



Global Non-convex Optimization with Discretized Diffusions

by Murat A. Erdogdu, Lester Mackey, and Ohad Shamir

Advances in Neural Information Processing Systems | 2018 (to appear)

Short summary: An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. Our non-asymptotic analysis delivers computable optimization and integration error bounds based on easily accessed properties of the objective and chosen diffusion. Central to our approach are new explicit Stein factor bounds on the solutions of Poisson equations. We complement these results with improved optimization guarantees for targets other than the standard Gibbs measure.

Read more



Neural Ordinary Differential Equations

by Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud

Advances in Neural Information Processing Systems | 2018

Short Summary: We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.

Read more



Scalable Approximations for Generalized Linear Problems

by Murat A. Erdogdu, Mohsen Bayati, and Lee H. Dicker

Journal of Machine Learning Research | 2018 (to appear)

Short Summary: In stochastic optimization, the population risk is generally approximated by the empirical risk. However, in the large-scale setting, minimization of the empirical risk may be computationally restrictive. In this paper, we design an efficient algorithm to approximate the population risk minimizer in generalized linear problems such as binary classification with surrogate losses and generalized linear regression models. We focus on large-scale problems, where the iterative minimization of the empirical risk is computationally intractable, i.e., the number of observations $n$ is much larger than the dimension of the parameter $p$, i.e. $n \gg p \gg 1$. We show that under random sub-Gaussian design, the true minimizer of the population risk is approximately proportional to the corresponding ordinary least squares (OLS) estimator. Using this relation, we design an algorithm that achieves the same accuracy as the empirical risk minimizer through iterations that attain up to a quadratic convergence rate, and that are computationally cheaper than any batch optimization algorithm by at least a factor of $\mathcal{O}(p)$. We provide theoretical guarantees for our algorithm, and analyze the convergence behavior in terms of data dimensions.

Read more



Stability of Adversarial Markov Chains, with an Application to Adaptive MCMC Algorithms

by R.V. Craiu, L. Gray, K. Latuszynski, N. Madras, G.O. Roberts, and J.S. Rosenthal

Annals of Applied Probability | 2015 | Vol. 25(6), pp. 3592-3623

Short Summary: Provides a simple way to verify the correct convergence of adaptive MCMC algorithms, thus opening up new avenues for computational progress and accurate estimation.

Read more