At O’Reilly’s “Making Data Work” seminar earlier this summer, I teamed up with a few other folks (data diva Hilary Mason, R extraordinaire Joe Adler, and visualization guru Ben Fry) to talk about data.
What follows is a blog-ified and amended version of that talk, originally entitled “Secrets of Successful Data Scientists.”
1. Choose The Right-Sized Tool
Or, as I like to say, you don’t need a chainsaw to cut butter.
If you’ve got 600 lines of CSV data that you need to work with on a one-time basis, paste it into Excel or Emacs and just do it (yes, curse the Flying Spaghetti Monster, I’ve just endorsed that dull knife called Excel).
In fact, Excel’s and Emacs’ program-by-example keyboard macros can be fantastic tools for quick and dirty data clean-up.

