Statistics and Computing | 2022 | accepted
One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance. The method is divided into two parts: the contraction phase, when the chains converge in expected distance and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. One-shot coupling does not require the use of any exogenous variables like a drift function or a minorization constant. In this paper, we summarize the one-shot coupling method into the One-Shot Coupling Theorem. We then apply the theorem to two families of Markov chains: the random functional autoregressive process and the autoregressive conditional heteroscedastic (ARCH) process. We provide multiple examples of how the theorem can be used on various models including ones in high dimensions. These examples illustrate how the theorem’s conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.
Journal of Theoretical Probability | 2023 | accepted
This paper gathers together different conditions which are all equivalent to geometric ergodicity of time-homogeneous Markov chains on general state spaces. A total of 34 different conditions are presented (27 for general chains plus 7 for reversible chains), some old and some new, in terms of such notions as convergence bounds, drift conditions, spectral properties, etc., with different assumptions about the distance metric used, finiteness of function moments, initial distribution, uniformity of bounds, and more. Proofs of the connections between the different conditions are provided, somewhat self-contained but using some results from the literature where appropriate.
By G.O. Roberts and J.S. Rosenthal
Canadian Journal of Statistics | 2023 | accepted
This paper considers the challenge of designing football group draw mechanisms which have the uniform distribution over all valid draw assignments, but are also entertaining, practical, and transparent. Although this problem is trivial in completely symmetric problems, it becomes challenging when there are draw constraints which are not exchangeable across each of the competing teams, so that symmetry breaks down. We explain how to simulate the FIFA Sequential Draw method, and compute the non-uniformity of its draws by comparison to a uniform Rejection Sampler. We then propose two practical methods of achieving the uniform distribution while still using balls and bowls in a way which is suitable for a televised draw. The solutions can also be carried out interactively. The general methodology we provide can readily be transported to different competition draws and is not restricted to football events.
By Yu-Jui Huang, and Yuchong Zhang
Journal of Machine Learning Research | 2023 | 24: 1-40
This paper approaches the unsupervised learning problem by gradient descent in the space of probability density functions. A main result shows that along the gradient flow induced by a distribution-dependent ordinary differential equation (ODE), the unknown data dis- tribution emerges as the long-time limit. That is, one can uncover the data distribution by simulating the distribution-dependent ODE. Intriguingly, the simulation of the ODE is shown equivalent to the training of generative adversarial networks (GANs). This equiva- lence provides a new “cooperative” view of GANs and, more importantly, sheds new light on the divergence of GANs. In particular, it reveals that the GAN algorithm implicitly minimizes the mean squared error (MSE) between two sets of samples, and this MSE fitting alone can cause GANs to diverge. To construct a solution to the distribution-dependent ODE, we first show that the associated nonlinear Fokker-Planck equation has a unique weak solution, by the Crandall-Liggett theorem for differential equations in Banach spaces. Based on this solution to the Fokker-Planck equation, we construct a unique solution to the ODE, using Trevisan’s superposition principle. The convergence of the induced gradient flow to the data distribution is obtained by analyzing the Fokker-Planck equation.
By T.-K. L. Wong, and J. Zhang
IEEE Transactions on Information Theory | 2022 | 68 (8), 5353 - 5373
Tsallis and Rényi entropies, which are monotone transformations of each other, are deformations of the celebrated Shannon entropy. Maximization of these deformed entropies, under suitable constraints, leads to the q-exponential family which has applications in non-extensive statistical physics, information theory and statistics. We show that a generalized λ -duality, where λ=1−q is to be interpreted as the constant information-geometric curvature, leads to a generalized exponential family which is essentially equivalent to the q-exponential family and has deep connections with Rényi entropy and optimal transport. Using this generalized convex duality and its associated logarithmic divergence, we show that our λ-exponential family satisfies properties that parallel and generalize those of the exponential family. Under our framework, the Rényi entropy and divergence arise naturally, and we give a new proof of the Tsallis/Rényi entropy maximizing property of the q-exponential family. We also introduce a λ-mixture family which may be regarded as the dual of the λ-exponential family, and connect it with other mixture-type families.
by Jun Yang, and Jeffrey S. Rosenthal
Annals of Applied Probability | 2022 (accepted) | To appear.
Short Summary: This paper considers how to obtain MCMC quantitative convergence bounds which can be translated into tight complexity bounds in high-dimensional settings. We propose a modified drift-and-minorization approach, which establishes generalized drift conditions defined in subsets of the state space. The subsets are called the ``large sets'', and are chosen to rule out some ``bad'' states which have poor drift property when the dimension of the state space gets large. Using the ``large sets'' together with a ``fitted family of drift functions'', a quantitative bound can be obtained which can be translated into a tight complexity bound. As a demonstration, we analyze several Gibbs samplers and obtain complexity upper bounds for the mixing time. In particular, for one example of Gibbs sampler which is related to the James--Stein estimator, we show that the number of iterations required for the Gibbs sampler to converge is constant under certain conditions on the observed data and the initial state. It is our hope that this modified drift-and-minorization approach can be employed in many other specific examples to obtain complexity bounds for high-dimensional Markov chains.
by Gareth O. Roberts, Jeffrey S. Rosenthal, Nick Tawn
Journal of Applied Probability | 2021 (accepted) | To appear.
Short Summary: Simulated tempering is a popular method of allowing MCMC algorithms to move between modes of a multimodal target density. The paper [TawnEtAl2020] introduced the Annealed Leap-Point Sampler (ALPS) to allow for rapid movement between modes. In this paper, we prove that, under appropriate assumptions, a suitably scaled version of the ALPS algorithm converges weakly to skew Brownian motion. Our results show that under appropriate assumptions, the ALPS algorithm mixes in time O(d [log d]^2) or O(d), depending on which version is used.
North American Actuarial Journal | 2021 (forthcoming)
Short Summary: This paper proposes a transformed Gamma logit-weighted mixture of experts (TG-LRMoE) model for severity regression.
by Marcel Nutz, and Yuchong Zhang
Annals of Applied Probability | 2020 | 30(4), 1669-1692
Short Summary: Inspired by recent work of P.-L. Lions on conditional optimal control, we introduce a problem of optimal stopping under bounded rationality: the objective is the expected payoff at the time of stopping, conditioned on another event. For instance, an agent may care only about states where she is still alive at the time of stopping, or a company may condition on not being bankrupt. We observe that conditional optimization is time-inconsistent due to the dynamic change of the conditioning probability and develop an equilibrium approach in the spirit of R. H. Strotz’ work for sophisticated agents in discrete time. Equilibria are found to be essentially unique in the case of a finite time horizon whereas an infinite horizon gives rise to nonuniqueness and other interesting phenomena. We also introduce a theory which generalizes the classical Snell envelope approach for optimal stopping by considering a pair of processes with Snell-type properties.
by Soumik Pal, and Ting-Kam Leonard Wong
Probability and Related Fields | 2020 | 178, 613–654
Short Summary: We study the Dirichlet optimal transport problem on the unit simplex which can be regarded as a multiplicative analogue of the Wasserstein transport.
by Peter Baxendale, and Ting-Kam Leonard Wong
Annals of Applied Probabiltiy | 2021 (Accepted)
Short Summary: We construct and study probability measures supported on spaces of concave functions. The random concave functions are constructed by taking a suitably scaled (mollified, or soft) minimum of random hyperplanes. Depending on the regime of the parameters, we show that as the number of hyperplanes tends to infinity there are several possible limiting behaviors.
by Jin Ma, Ting-Kam Leonard Wong, and Jianfeng Zhang
Mathematics of Operations Research | 2021
Short Summary: When the underlying probability distorted by a weighting function, we construct a nonlinear conditional expectation such that the tower property remains valid. This construction is of interest in time-inconsistent stochastic optimization problem.
by H. Duanmu, J.S. Rosenthal, and W. Weiss
Memoirs of the American Mathematical Society | 2019 (to appear)
Short Summary: Provides an alternative approach to proving convergence of Markov chains to stationary distributions, allowing for greater applicability including for more general MCMC algorithms.
by Soumik Pal and Ting-Kam Leonard Wong | 2018
Annals of Probability | Vol. 46, Number 2 (2018), 1070-1113
Short Summary: This paper uncovers deep connections between optimal transport and information geometry. It develops the dual geometry of L-divergence which extends the classical Bregman divergence. Our geometry can be applied to determine the optimal rebalancing frequency of portfolios.
Annals of Applied Probability | 2015 | Vol. 25(6), pp. 3592-3623
Short Summary: Provides a simple way to verify the correct convergence of adaptive MCMC algorithms, thus opening up new avenues for computational progress and accurate estimation.