Good post on "big data" from a speaker at the upcoming Money:Tech conference in New York:
At present, much of the world’s Big Data is iceberg-like: frozen and mostly underwater. It’s frozen because format and meta-data standards make it hard to flow from one place to another: comparing the SEC’s financial data with that of Europe’s requires common formats and labels (ahem, XBRL) that don’t yet exist. Data is “underwater” when, whether reasons of competitiveness, privacy, or sheer incompetence it’s not shared: US medical records may contain a wealth of data, but much of it is on paper and offline (not so in Europe, enabling studies with huge cohorts).
Yet there’s a slow thaw underway as evidenced by a number of initiatives: Aaron Swartz’s theinfo.org, Flip Kromer’s infochimps, Carl Malamud’s bulk.resource.org, as well as Numbrary, Swivel, Freebase, and Amazon’s public data sets.