When your field of study is the known universe and everything in it, you generate a lot of data.
This has never been more true for astronomers, as the current generation of telescopes observes not just a single object or tiny patch of sky, but the entire sky for an entire night, night after night after night.
In the past 10 years, projects like the Canadian Hydrogen Intensity Mapping Experiment (CHIME) in British Columbia — in which the University of Toronto is a collaborator — and space telescope missions like the Milky Way galaxy-mapping Gaia and the exoplanet-hunting Kepler have all been part of a “data-driven” revolution in astronomy.
When it begins operation in the 2020s, the Square Kilometre Array (SKA) will be the largest radio telescope ever built. As it scans the sky, it will generate 600 petabytes of data a year. If you had to store that much data on typical laptop computers with 500 gigabytes of storage each, you’d need more than a million laptops.
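As a rough check on the laptop comparison, the arithmetic can be sketched in a few lines of Python. This assumes decimal units (1 petabyte = 10^15 bytes, 1 gigabyte = 10^9 bytes), and the variable names are illustrative only.

```python
# Back-of-the-envelope check, assuming decimal (SI) units.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes

yearly_bytes = 600 * PETABYTE   # data volume quoted for one year
laptop_bytes = 500 * GIGABYTE   # storage of one "typical" laptop

# Ceiling division: a partly filled laptop still counts as a laptop.
laptops_needed = -(-yearly_bytes // laptop_bytes)

print(f"{laptops_needed:,} laptops")
```

Under these assumptions the figure comes to about 1.2 million laptops, in line with the "million laptops" comparison.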
Big data isn’t better data
The age of really big data in astronomy places even greater importance on the tools used to analyze and make sense of this scientific trove. Techniques based on machine learning can handle the classification aspect of the task — sifting through the data to identify asteroids, variable stars, quasars, etc. — but investigating these objects and discovering their true nature still relies on a well-equipped statistical toolbox.
“The LSST will collect terabytes of data,” says Gwen Eadie. “But big data is not necessarily going to answer all your questions. Having big data is great but in order to understand its properties, you need rigorous statistical practices.”
Eadie is an astrostatistician — a rare breed of scientist with one foot firmly in astronomy and the other firmly in statistics.
“The data science revolution has had a deep and rapid impact on academia and industry,” says Radu Craiu, chair of the statistics department. “We have taken a follow-the-data approach and initiated a sustained campaign of joint hires with relevant departments like astronomy and astrophysics who have rich, data-driven research programs.”