Xunyu Zhou: Reward-Directed Score-Based Diffusion Models via q-Learning 

When and Where

Thursday, October 30, 2025 11:00 am to 1:00 pm
9014
9th Floor, 700 University Ave, Toronto, Ontario, M5G 1Z5

Speakers

Xunyu Zhou, Columbia University

Description


We propose a new reinforcement learning (RL) formulation for training continuous-time score-based diffusion models for generative AI to generate samples that maximize reward functions while keeping the generated distributions close to the unknown target data distributions. Unlike most existing studies, ours does not involve any pretrained model for the unknown score functions of the noise-perturbed data distributions, nor does it attempt to learn the score functions. Instead, we formulate the problem as entropy-regularized continuous-time RL and show that the optimal stochastic policy is Gaussian with a known covariance matrix. Based on this result, we parameterize the mean of the Gaussian policies and develop an actor–critic type (little) q-learning algorithm to solve the RL problem. A key ingredient in our algorithm design is obtaining noisy observations of the unknown score function via a ratio estimator. Our formulation can also be adapted to pure score matching and to fine-tuning pretrained models. Numerically, we demonstrate the effectiveness of our approach by comparing its performance with two state-of-the-art RL methods that fine-tune pretrained models on several generative tasks, including high-dimensional image generation. Finally, we discuss extensions of our RL formulation to probability flow ODE implementations of diffusion models and to conditional diffusion models. Joint work with Xuefeng Gao and Jiale Zha.
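To make the setup concrete, the sketch below simulates generation as a controlled reverse-time SDE: at each Euler–Maruyama step, an action is drawn from a Gaussian policy whose mean is parameterized (here, a stand-in function) and whose covariance is fixed, mirroring the "optimal policy is Gaussian with known covariance" structure described in the abstract. This is a minimal illustration under assumed dynamics, not the authors' algorithm; `sample_reverse_sde`, the toy `policy_mean`, and all parameter choices are hypothetical.

```python
import numpy as np

def sample_reverse_sde(policy_mean, sigma, T=1.0, n_steps=100, dim=2, seed=0):
    """Euler-Maruyama simulation of a controlled (reverse-time) diffusion.

    Illustrative only: at each step the action is sampled from a Gaussian
    policy with learned mean `policy_mean(t, x)` and fixed standard
    deviation `sigma` (the "known covariance"), and the sampled action
    enters the SDE as the drift. The exact dynamics in the paper differ.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.standard_normal(dim)  # initialize from the prior (standard normal)
    for k in range(n_steps):
        t = T - k * dt  # reverse-time clock
        mu = policy_mean(t, x)  # mean of the Gaussian policy (the actor)
        action = mu + sigma * rng.standard_normal(dim)  # entropy-regularized (stochastic) action
        # drift = sampled action; diffusion coefficient matches the policy noise scale
        x = x + action * dt + sigma * np.sqrt(dt) * rng.standard_normal(dim)
    return x

# Toy policy mean pulling samples toward a target point,
# standing in for a trained actor network.
target = np.array([1.0, -1.0])
sample = sample_reverse_sde(lambda t, x: target - x, sigma=0.5)
```

In the actual method, the policy mean would be a neural network trained by the (little) q-learning actor–critic updates, with noisy score observations supplied by the ratio estimator; the sketch only shows where such a policy plugs into the sampling loop.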

BIO: Xunyu Zhou is the Liu Family Professor of Industrial Engineering and Operations Research at Columbia University in New York. Before joining Columbia, he was the Nomura Professor of Mathematical Finance, the Director of the Nomura Centre for Mathematical Finance, and the Director of the Oxford–Nie Financial Big Data Lab at the University of Oxford during 2007-2016, and the Choh-Ming Li Professor of Systems Engineering and Engineering Management at The Chinese University of Hong Kong during 2013-2014.

He is well known for his work on indefinite stochastic LQ control theory and its application to dynamic mean–variance portfolio selection, on asset allocation and pricing under cumulative prospect theory, and on general time-inconsistent problems. His current research focuses on reinforcement learning for controlled diffusion processes and applications to generative AI and intelligent wealth management solutions. He directs the Nie Center for Intelligent Asset Management, a research center funded by a FinTech company, at Columbia. He addressed the 2010 International Congress of Mathematicians, and has received the Wolfson Research Award from The Royal Society (UK), the Outstanding Paper Prize from the Society for Industrial and Applied Mathematics, the Alexander von Humboldt Research Fellowship, and the Distinguished Faculty Teaching Award at Columbia University, and has been named a Humboldt Distinguished Lecturer and the Archimedes Lecturer at Columbia. He is both an IEEE Fellow and a SIAM Fellow.

Professor Zhou received his Ph.D. in Operations Research and Control Theory from Fudan University in China in 1989.
