My 10-year-old son recently asked me what a data scientist does. I’m a visual guy and like to paint a picture, so I thought about how best to explain this. I liked an explanation I came across a while back at http://hadoop-karma.blogspot.com/2010/03/how-much-data-is-generated-on-internet.html, describing the relationship between data, information and knowledge. I would take it one step further, because that’s what we do here at Dataspora. We take data and transform it into actionable intelligence.
A data scientist is someone who takes your data and transforms it into actionable intelligence. But how do you explain that to a 10-year-old? Well, it just so happens that this 10-year-old is starting to play the clarinet, so music seemed like a good example to use.
Let’s say the note this aspiring Benny Goodman squeaks out of his shiny new instrument is a piece of data. All alone, hanging in the air in my living room, echoing off the walls, it doesn’t mean a whole lot. It’s raw, unadulterated data. Now, what if we do something with it to put it in context? I can play a chord on the piano and have him play his (squeaky) note, and suddenly we can tell if he’s sharp or flat or in tune. We can measure how long the note is. His note is no longer alone, but has some context. We have some information. We know he was playing middle C. This is the equivalent of step 2. We turned data into information by giving it context.
Is that enough? Perhaps. If that’s all you want to know, sure. But you probably want more. My son has a pretty good ear, and can pick up a rhythm fairly quickly. If he were to play several notes in a particular rhythm, we could have a motif. Wow. That’s more useful than a single note. He’s now taken several notes, with varying durations, added some pauses and made something larger – a motif. In this analogy, the motif is equivalent to a bit of knowledge. More motifs = more knowledge. If I were a composer, I could combine various motifs to make a symphony (really important knowledge). The more skilled the composer (data scientist), the better the symphony (= better knowledge).
This is great! But we’re still just at the knowledge stage, and I want to do something useful. I have a score sitting in front of me that started from a 10-year-old boy blowing on his rented clarinet. What can I do with it? This is where things get interesting. If I were a conductor, I could choose how and when to present this new score. Do I give it to my 5th-grade band to play, or pass it on to the Philharmonic? Maybe I change a few things and prepare it for a string quartet. The choice of action is up to me. This is the final stage – ACTION.
So, some might say that a data scientist fiddles around with data, but I prefer to look at the larger picture. A data scientist transforms data into actionable intelligence, picking and choosing what’s useful and what’s not. It doesn’t really matter if you have the data if you can’t actually do something with it – even if your choice is to do nothing at all.



