The earnest editorial staff here at Forbes urge us contributors not to get too clever with headlines, but since headline wit is one of the few genuine and unadulterated pleasures of blogging, I occasionally (okay, often) succumb to temptation and sacrifice SEO best practices for a bit of a private chuckle.
Besides, being at a major data-related conference like Strata, and listening to data geeks salivating over prospects of ruling the world by 2015 using large-scale automated analysis of Web content, I’ve realized that it is up to us bad punners, over-clever reference droppers and metaphor-mixers to prevent the data geeks from accidentally creating Skynet. Or at least delay the takeover. Giving up a few ounces of SEO juice and some traffic is a small price to pay. Call it the Skynet Resistance Tax.
Speaking of data geeks, that’s the topic of this post. I’ve spent much of my first couple of days at Strata idly observing them (and wondering whether I am one of them), and speculating about the various species in the data scene (people-watching is how you stay awake between the interesting slides at conferences). So here is my informal taxonomy and anthropological survey of data-land.
A Taxonomy for Data Land
The taxonomy part is simple. Apparently the list of species in data land is very short. It has only one item:
Okay, I am exaggerating a bit, but that’s what it feels like when you listen to the talks and hallway conversations. IT admins, Six Sigma types rushing to the data bandwagon, ex-BI types, visualization and infographic geeks, analytics geeks, programmers, old-school statisticians, Hadoop wranglers: they all seem to be calling themselves data scientists now. There are more complicated taxonomies floating around, but everybody appears wary of accepting them.
Which brings us to the informal anthropology of what’s going on.
Everybody is a Data Scientist
I was gratified to learn in an early talk on the first day that I am apparently not the newbie to the field that I thought I was. I too am a data scientist, and a grizzled veteran at that. Apparently, I am among the vast numbers of people who’ve been doing data science all along without realizing it. All it takes is some rudimentary experience running statistical tests on a data set, mucking around trying to prove a couple of hypotheses, and generating a few visualizations. You don’t need to know that Hadoop is a toy elephant to qualify for the title, apparently. Having used Google Analytics or Search Insights counts as having experience with Big Data.
I am not making that case. One of the Strata speakers who can genuinely lay claim to the title made the case.
Extremely generous and inclusive, you say? Therein lies a story.
If you’ve been in the technology world for a while and have surfed a couple of hype cycles, you are probably familiar with all the types of identity angst you encounter around any new technology trend. At any given technology conference, you will find the following types:
All in all, every technology trend is a seething drama of identity angst. And besides individuals, corporations have a stake in the labels-and-titles game as well. Big and established players like IBM, Microsoft and EMC must contend with an endless army of feisty startups trying to redefine the game in their own interests. Multi-million dollar deals can be won or lost based on whether you use the term “Big Data” or “Data Warehousing” in your sales pitch.
What makes the data scene interesting is that this soul-searching for identity appears to be happening with a peculiar urgency, and the data community has reached a certain bizarre consensus position that I’ve never seen before in technology: they’ve decided that everybody is a data scientist. I’ve never seen quite this level of title ambiguity before (the one field that might have more title ambiguity is “user experience”). It’s like the South Park episode about the alien race that uses the word “marklar” for everything.
Comments
Great article, except the Republican/Democrat part. That didn’t add anything to the story, and painting all Republicans as wanting to employ only people at minimum wage, well, that’s just silly. Big Data can solve major problems in the world, but not if we’re going to paint with a broad brush and ascribe traits to people based solely on their political affiliation. A single variable with no data to back it up – that’s not data science. :)
That was my weak attempt at humor. Which of course means the next presidential election will take it seriously.
Interesting article. I’m a data scientist (it says so right on my business card), and I mostly agree with what you have to say. I thought the observation of the democratic and inclusive notion of the category was insightful. (As an aside, I used your Grit blog post to write about Data Scientists previously, in the sense of an untraditional career path based on hard work and pivoting.)
You may also be interested that another of those consulting firms, CapGemini, proposed a 3-level categorization of what might be called Data Science: Descriptive, Predictive, and Prescriptive Analytics. Hasn’t taken off outside of the OR community, which sponsored the study. http://www.informs.org/ORMS-Today/Public-Articles/October-Volume-37-Number-5/INFORMS-News-INFORMS-to-Officially-Join-Analytics-Movement
The article was fun to read. One more observation about the term “data scientist”. Data science was a promised science, a science to come. Now some people of the promised folk found themselves to be the promised scientists.
I have a soft spot for ready-mades. It looks like cheating but it can be a positive power.