With a PhD in Economics from the Australian National University and published work in the Journal of Economic History, Rohan joined U of T in July 2020 as an assistant professor. Jointly appointed to the Department of Statistical Sciences and the Faculty of Information, he is well placed to contribute to both areas and to further strengthen the ties between statistics and the iSchool. Rohan's research interests include making political polling better, understanding and developing the role of ethics in data science, and improving statistical workflows. With the founding of his incredibly popular Toronto Data Workshop and the recent launch of stipends and support for visible minorities, Rohan has already made an impact on U of T’s academic community, with much more to come as the academic year and Fall classes begin!
U of T Statistical Sciences: Welcome to the StatsSci team and congrats on your new role! What got you interested in U of T and the Stats Department?
I feel very lucky to be here and am looking forward to using my position to help others. U of T has an incredible number of faculties and departments that are extremely strong, so I am thrilled to have the opportunity to be a part of this community.
From a historical perspective, DoSS has been associated with some incredibly influential thinkers, like Ross Prentice and Rob Tibshirani, and that spirit continues in the current faculty. As a department, DoSS builds on strong foundations and a philosophy of doing really innovative statistics that can be applied to areas of great importance.
The remarkable thing about U of T is that almost every faculty/department has this strong historical background. For instance, my other appointment – the Faculty of Information – has many people who changed the course of disciplines, and of course this is also the case further afield in areas such as CS, maths, law, and political science to name a few.
It’s certainly that opportunity to work between faculties/departments that attracted me. For instance, my formal role between the Faculty of Information and DoSS is to bring the two closer together in terms of teaching, research, and community. I’ve started that process through the Toronto Data Workshop -- which everyone is welcome to join, by the way -- and joint courses in Winter 2021, such as ‘Ethics and Data Science.’
Both DoSS and Information have such proud histories that I’m excited about what we can do together in the future.
When did you move to Toronto? Tell us about your family’s path from Australia to Canada!
My partner, Monica, who is a faculty member in DoSS and Sociology, and I are both Australian. I grew up in the north (Queensland) and Monica in the south (Tasmania), and we met while living in Sydney. We lived in the US for five years and have been in Toronto for the past two years. It certainly feels like home, and we love it here.
After moving to Toronto, I began a post-doc in the Faculty of Information, working with Professor Kelly Lyons. It was working with Professor Lyons for a year that made me realize that some combination of DoSS and Information was exactly where I wanted to be and I’m very grateful that it worked out.
How do you like Toronto and U of T so far?
I love Toronto and U of T and am very grateful for the opportunities that I have been given. The spirit of this place was certainly revealed when we shut down in March. Despite the considerable pressure on them, the students reacted remarkably well. The quality of the work they turned in was extremely high, and it was lovely how many students would stay on extended Zoom calls just to have a chat. One of the challenges moving forward will be continuing that spirit even when we may not have met in person.
Speaking of the shutdown: what has life been like for you and Monica these past few months at home, managing the challenges of a tenure-track position with a toddler around?
The initial change was a bit of a shock, but we’ve gradually adapted. We don’t have a car, so at first there was a lot of walking up to our offices at U of T to ‘borrow’ chairs, computer screens, and books for our apartment. I promise I’ll return these! After that, it was just a lot of staying at home, or wearing a mask while walking the baby and keeping far away from anyone else.
After a month or two we had a routine working, and then Monica broke her foot. It turned out she broke the exact same bone as Michael Jordan, so I’m looking forward to her imminent six NBA championships!
Having Edward, our son, around while working full time adds another layer, of course. I try to get up early to work, but that seems to just encourage him to get up early, too. We have been trying to split the days as equally as possible, which means that if you schedule a meeting with me between 9 and noon, then I’m likely to be talking to you while I’m walking with him at the park. We feel very fortunate not only to both still have jobs, but to have jobs with the flexibility to allow this, and so it is important to both of us that we try to make others’ lives a little easier where we can.
You’re an economic historian by background; how did you end up with a joint appointment in the iSchool and Statistics?
As you mention, I had a bit of a funny path to DoSS, and unlike a lot of people, I know exactly when I fell in love with statistics as an academic discipline: 31 July 2016, the first day of the first JSM (Joint Statistical Meetings) that I went to. It was in Chicago, and I was just blown away by how exciting everything was, so I quickly transitioned my work from economics to statistics.
One of the issues with this was that technically I was still enrolled in an economics PhD. My panel (John Tang, Martine Mariotti, Tim Hatton, and Zach Ward) was very supportive, but ultimately the rules made it impossible to include my more statistical papers in my thesis. So, I ended up essentially writing two theses: a collection of papers that would have made up a statistics thesis, as well as my actual thesis, which is more traditional economics. In hindsight, it probably would have been better to drop out of the economics PhD and start one in a field more focused on statistics or information science. Writing two theses, especially when one of them is focused on a different field, is not something that I’d recommend, but it worked out in the end.
I hope to share my love of statistics and information science with students and colleagues, and to encourage others like me, who are statistics-adjacent, to see what the field has to offer. The data that we get to explore and the open source tools that we get to use are just incredible. It’s just such an exciting and versatile field and I only wish that I had realised this as an undergraduate! Everywhere we look these days we see statistics in the wild – and the wonderful part is that to a large extent the discipline is generous with its resources. Plenty of professors put entire courses online for free, R and Python are free, many exciting datasets are freely available, many of the important textbooks are free. It’s incredibly accessible and exciting and I’m so glad to be a small part of it, and I hope to do what I can to encourage more people into the field.
What about your personal background? Can you tell us more about where you grew up and how you ended up in academia?
I’ve taken a round-about path to academia including stints working in government and at my own start-up.
I grew up in Central Queensland, Australia. All of your preconceived notions of Australia are true – there were snakes and kangaroos and we would go off into the bush by ourselves all day and then come back for dinner.
My first job out of university was at the Reserve Bank of Australia, the RBA, in Sydney. Among other things, the RBA is responsible for printing banknotes, so my role was a fun mix of quantitative projects, such as forecasting banknote demand, and dealing with the public, such as teaching school children about banknotes!
After three years I moved to Australia’s capital, Canberra, to work full time as an economic consultant while also studying full time for a master’s degree at the Australian National University (ANU).
All this time, in the background, some friends from university and I had been working on a side-project, and eventually we were accepted into an accelerator. Accelerators give you some money and guidance, in exchange for equity, in the hope that you will be able to grow your business aggressively. That side-project became Go1, which is sort of like Quercus for businesses, and now has offices in five countries, and household-name investors such as M12, the venture capital arm of Microsoft. We’re all still close friends and it’s lovely now that our partners have become friends and our children and dogs play together.
My partner, Monica, started a PhD at Berkeley in California, and so I started a PhD back at the ANU, largely remotely, as I wanted to have the opportunity to explore some of the deeper questions that my master’s had inspired. Luckily, my chair, John Tang, had himself done his PhD at Berkeley, so he ended up spending a lot of time there, but there was still a lot of travel between Berkeley and Canberra during those years. I did my best to catch up, but Monica finished her PhD a year before me, and so we moved to Toronto, where I finished mine and then was lucky enough to start a post-doc at the Faculty of Information, which transitioned into this joint role between Information and Statistical Sciences.
Tell us more about your research; in particular, about your work around elections and polling!
My background and research interests are quite varied but fall at the intersection of information and statistical science. I am interested not only in statistical science, which draws quantitative inferences from real-world data, but also in information science, which focuses on the sources of those data: how they were collected and how they are managed. This shows in my emphasis on all aspects of obtaining, preparing, storing, and sharing data.
My research typically involves first constructing new datasets in a reproducible way, drawing on methods including digitization, record matching, survey collection, and web-scraping. After constructing datasets, I then use statistical methods to analyze the large amounts of information contained in them, drawing on techniques from natural language processing, machine learning, and Bayesian methods. As my work is focused on individuals, issues of privacy and consent are of critical importance, and they sit within my broader research interest in the intersection of data and ethics. Additionally, I develop open science best practices that help enhance reproducibility, replicability, and understanding.
My work is arranged around three tranches: first, multilevel regression with post-stratification, or MRP; second, natural language processing, or NLP; and, lastly, workflow analysis.
I’ll focus on MRP here. The basic idea behind surveys and sampling is that we want our survey to be as similar as possible to the population that we’re interested in. So, traditionally, you run your poll and then use those findings directly. However, it is becoming increasingly difficult to obtain surveys that are representative of the broader population, and we are increasingly interested in sub-populations. MRP is a technique to overcome these issues, allowing non-representative surveys to be adjusted to make inferences about the population at large.
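To make the two steps concrete, here is a minimal sketch in Python using entirely made-up survey and census numbers. It is not the model used in this research: the "multilevel regression" step is replaced by a simple shrinkage rule (the hypothetical `partial_pool` function, with an assumed `prior_strength` of 20 pseudo-respondents) that mimics how a multilevel model pulls small cells toward the overall mean, whereas a real MRP analysis would typically fit a hierarchical logistic regression across many demographic and geographic variables. The hypothetical `post_stratify` function then reweights each cell by its share of the (also made-up) population.

```python
"""Toy illustration of the two steps in MRP. All data here are hypothetical.

Step 1 (stand-in for the multilevel regression): estimate support in each
demographic cell, pulling small-sample cells toward the overall mean.
Step 2 (post-stratification): reweight the cell estimates by how common
each cell is in the population, for example according to a census.
"""


def partial_pool(yes, n, prior_mean, prior_strength):
    """Shrink a cell's raw proportion yes/n toward prior_mean.

    prior_strength behaves like a count of pseudo-respondents: cells with
    few real respondents are pulled strongly toward prior_mean, while large
    cells stay close to their raw proportion.
    """
    return (yes + prior_strength * prior_mean) / (n + prior_strength)


def post_stratify(cell_estimates, population_counts):
    """Population-weighted average of the cell-level estimates."""
    total = sum(population_counts[cell] for cell in cell_estimates)
    return sum(
        cell_estimates[cell] * population_counts[cell] for cell in cell_estimates
    ) / total


# Hypothetical, non-representative survey: (yes answers, respondents) per cell.
survey = {"18-34": (30, 50), "35-54": (40, 100), "55+": (140, 250)}

# Hypothetical population counts for the same cells.
census = {"18-34": 3000, "35-54": 3500, "55+": 3500}

# Raw survey average, ignoring who was actually sampled.
overall = sum(y for y, _ in survey.values()) / sum(n for _, n in survey.values())

# Step 1: partially pooled estimate for each cell.
cells = {
    cell: partial_pool(y, n, prior_mean=overall, prior_strength=20)
    for cell, (y, n) in survey.items()
}

# Step 2: reweight to the population's age mix.
mrp_estimate = post_stratify(cells, census)
print(f"raw: {overall:.3f}, post-stratified: {mrp_estimate:.3f}")
```

Because the age mix of this made-up sample does not match the made-up census, the post-stratified estimate differs from the raw survey average. Applying the same reweighting to only the cells belonging to one region or group is what gives estimates for sub-populations.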
A common application of MRP techniques is in forecasting political elections. MRP tries to understand the relationship between who you’re going to vote for and variables that you may have gathered in your survey, such as age-group, gender, and location. I first applied MRP in 2016 to the Australian Federal Election. Like Canada, Australia has a parliamentary system so what is important is the number of seats that a party will win. We forecast that the ruling conservative party would win 80-85 seats out of a possible 150. They ended up with 76. The seats of Bass and Braddon in Tasmania, and Mayo in South Australia, haunt my dreams to this day, as we unexpectedly got them wrong.
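Because a parliamentary forecast is ultimately about seats, seat-level estimates have to be aggregated into a seat total. A minimal sketch, assuming we already have a win probability for one party in each of 150 seats (the probabilities below are invented and are not the 2016 forecast): the expected seat count is the sum of those probabilities, and a simple simulation gives a range. The hypothetical `simulate_seat_counts` function treats seats as independent, which is a strong simplification; real forecasts usually allow polling errors to be correlated across similar seats, which widens the range.

```python
import random


def expected_seats(win_probs):
    """Expected number of seats won: the sum of the per-seat win probabilities."""
    return sum(win_probs)


def simulate_seat_counts(win_probs, n_sims=10_000, seed=2016):
    """Simulate seat totals, treating every seat as an independent coin flip.

    Independence is a simplifying assumption made for this sketch only.
    """
    rng = random.Random(seed)
    # Each draw counts how many seats were won in one simulated election.
    return [sum(rng.random() < p for p in win_probs) for _ in range(n_sims)]


def interval(draws, lower=0.05, upper=0.95):
    """Rough percentile interval taken from sorted simulation draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Hypothetical win probabilities for one party across a 150-seat chamber:
# 60 safe seats, 30 leaning, 20 leaning away, and 40 unlikely.
win_probs = [0.95] * 60 + [0.6] * 30 + [0.4] * 20 + [0.05] * 40

print(round(expected_seats(win_probs), 1))
print(interval(simulate_seat_counts(win_probs)))
```

The simulation adds little beyond the expected value when seats are independent, but the same structure extends naturally to correlated errors, which is where a seat range becomes much more informative than a single number.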
As Isabel Ott put it, "[I] think most non-academics underestimate how much research projects are driven by sheer annoyance with how wrong everybody else is." And certainly, in the case of my interest in MRP, that’s absolutely true. The current state of polling back home in Australia, but also here in Canada, drives me mad. We have all these way better ways of doing things, but, with a few notable exceptions, they’re barely used. I’m really hoping to change that!
Lauren Kennedy, who is a lecturer at Monash University, and I run an MRP Reading Group. We’d love for more people to join and anyone who is interested should feel free to get in touch with me. And of course, everyone who is in my class this term will be learning an awful lot about all this!
What courses will you be teaching this Fall, and how do you feel about the new online format?
I am teaching a section of STA304, which focuses on surveys, sampling, and observational data: the very stuff of statistical science! We will approach these topics from a practical perspective. Students will actually run surveys and learn how messy it is to put them together. They will learn how to think about sampling, how to implement it, and why the details matter. They will forecast the 2020 US Presidential Election!
With regard to the online format, like everyone, I am trying to adjust to the fact that there is not going to be a quick fix for our current circumstances. Online approaches for teaching, especially for courses with larger enrollments, are going to be with us for some time, and I am developing the course with that in mind. I hope that the students are as comfortable with the idea that the baby may walk into the room in the middle of a lecture as I am with the fact that they might be listening to me while trying to feed their own child, or dealing with some other situation at home.
I’ve always seen my role as a teacher to act as a conduit to knowledge – identifying which resources are useful and important and opening opportunities for interested students to pursue their own research – rather than being the repository of all knowledge. The resources that are being made available online will provide students with opportunities that would have been otherwise difficult in-person.
Applied statistics isn’t for everyone, but I love it and I hope to impart that love to the students. The best outcome would be for a student in a year or two to email and say that they realise why we did the things that we’ll do in this course; and for them to email again in four or five years to point out all the things that we missed and that I got wrong! Because if students don’t continue learning and developing beyond what I teach them, then I’ve not done my job.
Adding on to that: what do you hope to accomplish in your tenure at U of T?
The work of applied statisticians, regardless of their specific job title or area of application, is the most important work in the world right now. The ability to gather data, analyze it, and communicate your understanding of the underlying process is incredibly valuable. At U of T there are many different people doing that at an incredibly high level across a range of faculties/departments, and I hope I can help be a catalyst for more of this, both in terms of research and teaching. The amazing response to, and enthusiasm for, the Toronto Data Workshop shows the demand for increased collaboration in data science, both within academia and beyond to industry.
From a research perspective, I intend to continue my work of putting MRP on a firmer statistical footing and broadening its use. I will continue exploring how NLP and data cleaning and preparation can interact and build on each other. And I will continue my research that develops the idea of the entire statistical workflow.
From a teaching perspective, I would love to encourage students to build on the foundation that I’ll provide in class and for them to iterate on it given their background and interests. Students in DoSS have all of these remarkable skills and I hope to provide them with opportunities to do the best work of their lives.
Finally, and from a broader perspective, I’ve been incredibly lucky in so many aspects of my life, and I’m aware that many others are not so lucky. So broadly, I hope to use my position to help others, particularly to work towards improving representation of minority groups in statistics and data science. With the help of Radu Craiu, who is the Chair of the Department of Statistical Sciences, and Dean Wendy Duff from the Faculty of Information, Kelly Lyons and I set up the Toronto Data Workshop stipends for BIPoC, to provide support for students from these groups to work on data projects. I hope that this will be the start of a larger program to improve support for under-represented students.
What do you and your family like to do in your spare time?
Our son takes up most of our non-work time at the moment. He’s walking now and a confident little explorer, so I spend a lot of time in Grange Park, and also the various greenspaces at U of T, especially the college quadrangles, which tend to be a tad quieter. As he was born here, and so is actually Canadian, I’m a little worried that he’s soon going to be embarrassed by my lack of hockey knowledge, so we are reading books like ‘M is for Maple Leafs’ and ‘1, 2, 3, Cheers for the Toronto Maple Leafs’, both of which I’ve found concerningly informative despite their intended audience of toddlers.