This May, 17 teams of undergraduate students competed in the annual ASA DataFest@UofT event. Now in its sixth consecutive year, the event drew teams from all three campuses for the first time.
DataFest invites teams to analyze data curated and provided by the American Statistical Association (ASA) and to find and present unique insights. Teams were judged on their video presentation, written report and analysis of the data. Like a hackathon, DataFest is open-ended: teams relied on their creativity, as well as their statistical toolkit, to uncover insights and present their findings to a panel of industry and academic judges.
This year’s challenge focused on data provided by Rocky Mountain Poison & Drug Safety. “I love how the organization gave us a data set that's really complicated. But at the same time, it's actually really interesting to clean the data and then to find the patterns,” says Rain Wu, a third-year applied statistics student.
Rain, whose team Library(Purrr) won Best Insight, explains that the team’s initial exploration of the data was essential to framing their research questions. “At first we had this list of all the possible variables that we're interested in, and then after some basic data visualization and cleaning, we actually thought that maybe we have three main research questions,” says Rain.
According to fellow team member Tina Wang, a third-year statistics student, the team started simple when investigating their research questions, beginning with basic data visualization. “We started with really simple linear regressions and then we expanded on building more complex models and a bit of machine learning techniques such as random forest models,” she says.
Judges from TD Insurance, KPMG and other industry and academic partners evaluated teams on their communication, analysis and findings, and they noted that communication was key. “A lot of students don’t realize this, but in our jobs, we have to justify or rationalize why you're doing certain things, and so I think that really stood out most with the judges,” says Assistant Professor Samantha-Jo Caetano, the organizer of this year’s event.
Team Prospective Analytics won Best Visualization for a heat map of the United States showing predicted drug use across geographic regions. Rain, of team Library(Purrr), says the winning visualization taught her a lot. “I learned how to produce a good heatmap for geographic location, and perhaps I'll apply it for my future projects,” she says.
As their prize for Best Insight, Rain and Tina look forward to having their résumés critiqued by Amazon’s 2020 top data science interviewer.
Rain and Tina also shared some advice for students thinking of participating in next year’s competition, especially those who worry that they lack the necessary experience.
“My top advice would be to not hesitate to sign up for it. Neither of us has had any previous DataFest experience prior to our very first data competition in our second year. We gradually learned by participating in those competitions and learning from our competitors,” says Rain.
And Tina adds: “It's a good way for you to connect your education to a possible career in the future and see what people in the industry actually do. You get to practice knowledge from school on actual industry data. You also learn what your interests are and build great networks with not just your teammates, but also the mentors or judges that participate. So never hesitate on those valuable possibilities.”
Congratulations to the winning teams and many thanks to everyone who joined DataFest this year!
2021 Team Awards
Best Insight
Library(Purrr): Tina Wang, Rain Wu, George Huang & Ben Min
Best Visualization
Prospective Analytics: Jing Yuan Zhang, Eric Zhu, Muhammad Tsany & Sergio Steven Zheng Zhou
Best Use of External Data
Bad Boys: Haoluan Chen, Dawei Dong, Yujie Li & Tu Wu
GO!: Hanrui Dou (Lisa), Yuying Chen (Kate), Aichen Liu & Yuyuan Liu (Livia)