Archive for August, 2010

The Seven Secrets of Successful Data Scientists

Friday, August 27th, 2010

At O’Reilly’s “Making Data Work” seminar earlier this summer, I teamed up with a few other folks (data diva Hilary Mason, R extraordinaire Joe Adler, and visualization guru Ben Fry) to talk about data.

What follows is a blog-ified and amended version of that talk, originally entitled “Secrets of Successful Data Scientists.”

1. Choose The Right-Sized Tool

Or, as I like to say, you don’t need a chainsaw to cut butter.

If you’ve got 600 lines of CSV data that you need to work with on a one-time basis, paste it into Excel or Emacs and just do it (yes, curse the Flying Spaghetti Monster, I’ve just endorsed that dull knife called Excel).

In fact, Excel’s and Emacs’ program-by-example keyboard macros can be fantastic tool for quick and dirty data clean-up.
(more…)