Undergraduate students “pick up the gauntlet” to predict the 2021 federal election

October 8, 2021 by Eran Vijayakumar

Just in time for the 2021 Canadian election, the Canadian Election Study, CANSSI Ontario, the Faculty of Information and the Department of Statistical Sciences challenged students to predict the winners of Canada’s 338 federal ridings. The competition invited teams to use innovative strategies that combined statistical modelling with political understanding.

Third-year statistics student Johnson Vo predicted the most correct seats, and the team with the most exciting approach was that of Eric Zhu, Brian Diep, Ashely (Jing Yuan) Zhang, Kristin (Xi Yu) Huang and Tanvir Hyder. On October 1, 2021, the winning teams presented their work at the Toronto Data Workshop, a weekly event series that welcomes everyone with an interest in data science.

Vo came impressively close, predicting a Liberal government with 188 seats, compared with the 160 seats the Liberal Party ended up winning. Using census results, the model analyzed each riding’s gender, age, and education data. Vo’s advice to future students is to “start by branching out and thinking about all the different options they have and see which one is the most realistic and most practical for their purposes.”

The winning team for “most exciting approach” consisted of Eric Zhu, Brian Diep, Ashely (Jing Yuan) Zhang, Kristin (Xi Yu) Huang and Tanvir Hyder. They also predicted a Liberal majority government. Their approach involved examining a variety of tools and varying their model as new challenges arose.

“One might want the most complicated model possible, thinking that will provide the best result,” says Rohan Alexander, an assistant professor at the Department of Statistical Sciences and the Faculty of Information, and one of the judges in this competition. “However, often what we find in practice is that trading away some of that complication is worth doing if it means we are able to exchange it for things like faster iteration, or a better understanding of the model.”

While the team’s model focused on simplicity, the data proved to be extremely complex. The model’s input predictors included province, age, highest education level, household income, homeownership status, English or French language spoken, immigration status and age category by decade. In the end, their model included over 7.7 million rows of data.

“Both teams were absolutely incredible and speak to the high quality of the undergraduates that we have at the Department of Statistical Sciences,” says Alexander.

The Toronto Data Workshop: a data science forum for everyone

The Toronto Data Workshop (TDW) is a weekly series of data science workshops started by Assistant Professor Rohan Alexander and Professor Kelly Lyons. Both recognized the need for a forum to discuss best practices in data science not covered at conferences or in academic papers. Academic papers tend to cover high-level ideas without addressing everyday data handling challenges, such as inconsistent date formats, that data scientists frequently face.

Since late 2019, the community has grown to over 1,000 students and professionals from academia and industry.

“When the pandemic hit and we shifted online, one of the things that surprised me was the interest from folks outside of the University of Toronto,” says Alexander. “We grew rapidly in various workplaces and on social media. Roughly 50 per cent of our participants are now outside the University. It’s been lovely to get to know folks doing data science in academia, industry and government in Toronto, and even more broadly.”

Aside from offering knowledge sharing and informal learning, the TDW is also a great place for students to advance their careers: at least two students have landed jobs through their participation in the workshop.

It’s also a great opportunity to learn and make connections for anyone who doesn’t have a typical data science background but has an interest in the field, says Alexander.

“Data science is multi-disciplinary and increasingly critical. That’s why it must reflect our world. There is a pressing need for a diversity of backgrounds, approaches and disciplines in data science. I hope that the workshop plays a small role in providing a community for everyone.”