When people hear “data science,” their minds typically go to big business and tech giants using data to make money and further their businesses. That’s because we’ve all seen how Amazon leverages our purchase data to suggest what else we might want to add to our cart, how Comcast recommends which on-demand movies we should buy, how Starbucks uses data to predict the success of every new store opening, and how Wal-Mart recently partnered with Hewlett Packard to store data that helps it detect patterns to manage its inventory and supply chains.
But the power of big data is far greater than its commercial applications. Leading organizations, and the data scientists who work for them, are increasingly focused on the power of data to do good. This is the future of data science — and it’s time that people and companies catch on.
Take, for example, Erin Boehmer, a graduate of the datascience@berkeley program, who rather than heading into the high-paying tech sector upon graduation packed her bags and moved from Boston to Kampala, Uganda. There, Erin works for Fenix International, a company that has developed a pay-as-you-go solar kit that collects data from people in rural environments who do not otherwise generate a data footprint. Through the data Fenix International collects, Erin is able to identify how the company can more effectively replace dangerous light sources with safer, more affordable options.
Erin’s not the only one creating positive change in her community. Data scientist Jake Porway attended a hackathon that changed his path entirely. Even though he was paired with some remarkable and talented people at the hackathon, Porway found that they were creating more of the same: apps to “sell stuff” rather than to create change. Seeing these skills used in this way led Jake to create DataKind, which partners social organizations with data scientists across the globe.
Erin and Jake are just getting started — and are proof of how just a few years have made a world of difference in the industry.
From Data to Big Data
To grasp how the use of big data has evolved to this point, it’s important to understand when this information transformed from just data to “big data.” Although data has been captured for centuries, the term “big data” truly came to light in 2008, when Wired published “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” The article argues that Google, one of the first companies to use consumer data in this way, proved that the scientific method of hypothesize/model/test is a thing of the past. Rather than spending time on hypotheses about why consumers do what they do, Google skipped ahead: it used algorithms to find patterns in the data and went from there. Data became the start of the equation, not the end.
Since 2008, the evolution of data has hit many other milestones. For example, according to McKinsey’s report “Big data: The next frontier for innovation, competition, and productivity,” in 2009 the average U.S. company with over 1,000 employees was storing more than 200 terabytes of data. And in 2010, Google’s CEO Eric Schmidt announced that we create as much data every two days as we did from the beginning of civilization up to 2003.
The Fastest Growing Job in America
Since Schmidt’s announcement, data science has transformed. A few years ago, only a certain subset of companies was using this information. That is changing: the proportion of data scientists employed by startups dropped from 29 percent in 2014 to 14 percent in 2015, as industries like healthcare, nonprofit, and government expand their data science capabilities.
In fact, data scientist is now the fastest growing job in America, and the U.S. Department of Labor estimates there will be more than 1.4 million new computing-related job openings by 2020. These 1.4 million job openings are not just at Google and Amazon. They are also at hospitals such as Memorial Sloan Kettering Cancer Center, which is using data to assist doctors in cancer treatment choices, and at organizations like Crisis Text Line, which analyzes the texts it receives to predict when crises are most likely to occur.
Fostering Data for Good
And now, schools (like the one Erin Boehmer graduated from) are creating specific programs to further foster this idea of “data for good.” For instance, recognizing that one of the areas that has been underrepresented is the use of data science to benefit society as a whole, UC Berkeley launched the Jack Larson Data for Good Fellowship, a $50,000 fellowship designed to support MIDS (Master of Information and Data Science) students in their pursuit of using data science for good. The fellowship will award $8,500 to six different MIDS students over the next two years, with the hope that students like these will start to shape the future of data science.
Programs like datascience@berkeley are creating this opportunity, and the Erins and the Jakes of the world, by applying their talent to community purposes instead of purely commercial endeavors, are setting the stage for a better future and a better society. And this is just the beginning.