dataspora

 

Winning with open source tools

The world's leading financial, technology, and pharmaceutical firms are using open source tools.

Not because they are free software, but because they are the best software.

These tools are on the leading edge of analytics. They represent the collective efforts of some of the best minds in academia and industry. But they aren't always easy to deploy.

Yet in the right hands they are more powerful than any software that money can buy.

Using Hadoop for Big Data

Hadoop is a reliable, scalable, distributed computing platform for storing and analyzing massive data sets. It is based on a system originally developed at Google, and is widely deployed at Yahoo and Facebook to crunch petabyte-scale data sets (one petabyte is one million gigabytes).
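The Google system that Hadoop builds on is the MapReduce programming model, which splits a job into a map step, a grouping step, and a reduce step. As a rough illustration only, the sketch below counts words in plain Python within a single process. It does not use Hadoop's API, and the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, the role a framework plays between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; on a real cluster the lines would come from
    # files spread across many machines.
    sample = ["big data big tools", "open source tools"]
    print(word_count(sample))
```

On a cluster, the map and reduce steps would run in parallel on many nodes, each working on its own slice of the data, which is where the scalability comes from. The single-process version here only shows the shape of the computation.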

We have successfully deployed Hadoop on 32-node client clusters, using Cloudera's distribution, cutting the run time of analysis workflows from days to hours.

Using R for Analytics and Visualization

R is an open source programming language for statistical computing, data analysis, and graphical visualization. R has an estimated one million users worldwide, and its user base is growing. While it is most commonly used within academia, in fields such as computational biology and applied statistics, it has also gained currency in commercial areas such as quantitative finance and business intelligence.

Among R's strengths as a language are its powerful built-in tools for inferential statistics, its compact modeling syntax, its data visualization capabilities, and its easy connectivity to persistent data stores, from databases to flat files.

In addition, R's open source nature and its extensibility via add-on "packages" have allowed it to keep up with the leading edge of academic research.