While usage of the phrase data scientist may be growing exponentially, with the number of data scientist job requisitions following a similar trend, the definition of the term is hard to pin down precisely.
I wasn’t sure I could define it well until I watched Dirty Secrets of Data Science, a talk given at a NYC meetup by Hilary Mason, former chief scientist at Bitly. During the presentation, she highlighted a chart created by the Data Community DC team that demystifies the term data scientist.
Using a survey, the DCDC team recorded the major activities of 250 people calling themselves data scientists. Once the responses were collected, the team bucketed the activities (business, learning & big data, math, programming and stats) and named four different behavioral segments of data scientists: data businessperson, data creative, data engineer and data researcher.
Each of these roles emphasizes different skills. The data businessperson focuses on answering business questions, such as forecasting the revenue for the business over the next year. Data creatives create data exploration experiences, like the ones Amanda Cox and her team built at the NYTimes. The data engineer builds the infrastructure, e.g. Hadoop, needed to answer questions for the business. Data researchers model data and develop algorithms to predict important things, including which movie a consumer will want to see next, what ad should be displayed to a particular user at a particular time, and how much discount a user should be given to maximize loyalty.
The diversity of these examples makes clear that the day-to-day work of people calling themselves data scientists varies widely across five disciplines. Unfortunately, the term data scientist conflates all five types of work. For startups, this ambiguity makes hiring the right kind of data scientist a challenge.
Asked which kind they need, most founders would likely point to the data businessperson. But the next question is what data infrastructure that person will need to perform their analyses. Is a data engineer required to build that infrastructure, or will an existing engineer support them?
Computer science has formalized and refined roles for Ops, Back End Engineers, Front End Engineers, UX Designers, UI Designers and Product Managers. It seems to me the same process will happen with data scientists as we discover the different key roles and functions they serve in startups. DCDC’s analysis is a great first step toward defining those roles and creating a lexicon for distinguishing among them.