Data Should be the Intel Outside

The ever-quotable Tim O’Reilly is fond of saying that in Web 2.0 “data is the Intel inside”. He is right, but I think his forward-looking phrase is set to become inverted.
To start at the start, Tim’s (correct) point is that data is the special sauce in Web 2.0 software. It isn’t the logic; it isn’t (necessarily) the UI. What makes Gmap/Flickr/ great is the data. Granted, the data can come from all sorts of places, ranging from community contributions to licensed stuff, but it is still the data that puts the bums in the Aerons.
So here is the problem: the more people figure out that it’s not about proprietary applications but about proprietary data, the more we merely move from one walled garden to another. A walled garden of data, however pretty, still has walls, just like the old algorithmic garden of shrink-wrapped software. All that we are doing by over-focusing on data being the Intel inside is ensuring that data becomes the Intel inside in every sense: another proprietary layer for litigation, instead of something that can be readily extracted and combined to create better apps & services.
Where should we be going? Call it “data as the Intel outside”, where the innovation engine is how easily data can be recombined outside any one application. Turning things inside-out should be the Web 2.0 goal (or Web 3.0, as Steve Mallett puts it on his DataLibre site). We have open-source software messing up markets for shrink-wrap vendors of proprietary software; why shouldn’t open-source data vendors mess up the market for would-be Web 2.0 vendors who are trying to Balkanize things by locking up data inside their own apps?
Scott White has a nice proposal related to this. He is calling for a kind of EnergyStar seal on Web 2.0 apps, one that shows the vendor is playing nicely with others and sharing data so that people can freely mix data across apps, pulling together Amazon reviews and auction ratings and news and geo-tagged data. That is a much more interesting path forward than where we seem currently headed, which is toward a new world of proprietary apps, albeit one based on data rather than binary installs.
Anyone want to take up Scott’s challenge and start agitating for an Open Data seal on Web 2.0 apps? The time to do it is now, not later.


  1. yo I think you have a point about the Open Data seal. Just I think it already exists. look at audioscribbler and musicbrainz etc, or Flickr or Music for Dozens, where the data is licensed under Creative Commons licenses and freely available through the API.
    perhaps Creative Commons should release a license that specifically relates to data, but for now the good old Attribution License seems to work well.

  2. I also find the CC attribution license fine, but there are some practices that must follow from it, like an easy way to extract your data.
    There are a lot of different aspects to how to handle your data in the datalibre universe. Slapping a license on an app isn’t an all-encompassing answer.

  3. idea of a community certification is very interesting. but tagsonomy-related approaches aside, how do we poll the openness? or who is the certifying body? O’Reilly maybe?
    a community vote- how to wisdom the crowds?
    great question though. thanks Paul.

  4. Proprietary Data: Intel Outside?

    The Infectious Greed blog (as noted on tackles the proprietary data issue. The writer offers that Tim O’Reilly’s take that “data is the Intel inside of Web 2.0” should be flipped. Data should be “Intel outside.” He goes on to offer that

  5. James: I think an opendata definition wouldn’t be too far off the mark as the open source definition has stood up well.
    I’ve been mulling these things over for months, but haven’t written up a formal doc yet as the battle over data is really just starting to be a pale dot on the horizon for only a few people.
    After living through the open vs free semantics fight I did purposely choose “libre” for liberation. It is, I think, more to the point of the purpose since data’s primary function will be to move and be used freely.

  6. Yes. I think there needs to be a strong display about applications and services that preserve our ownership of data, documents, and everything else of our own creation, without any captivity under private formats or services.
    I was sensitized to this in a talk that Eliot Kimber gave at an xDev event in July, 1999. It has always stuck with me to ask who owns my data and (these days) even who owns my computer.
    I just looked up the notes that I was smart enough to take at the event in San Jose and I am going to see if Eliot has published anything about it. I think I’ll post my notes if he doesn’t mind.

  7. I wholeheartedly favor open data initiatives. But I think that means, as well, that the practical ownership of our own content be preserved for us. That means to me that whatever custody is entrusted to some service or software, the service should warrant that our content can be recovered in a public format unencumbered by proprietary and captive formats, databases, and so on.
    Note that this is different than licensing the use of the material (as by a Creative Commons license, for example).
    I have been wary of who owns our data and (these days of spy-ware and presumptuous mobile code from web sites) our computers ever since I was sensitized to it by a talk Eliot Kimber gave at a San Jose xDev event in July, 1999. I just dug out my notes from the presentation and I’m going to see if Eliot has preserved his presentation somewhere. If not, I’ll see if I can post my notes.

  8. Great post! On a related note, I do think there’s a distinction to be drawn between data that is by its nature from the public, and ultimately for the benefit of the public (e.g. Amazon reviews), vs. data from private communities (e.g. company teams) that aren’t working in the open.
    Both need open data, but the dynamics are slightly different. I think it becomes harder to operate in a drive-by data mode (your term: — love it) simply because the community is orders of magnitude smaller, and there are fewer tinkerers to put the data to work. The tools that generate the data in the first place need to step it up to make re-mixing and sharing easier.

  9. Peter Glaskowsky says:

    The Creative Commons license doesn’t need an external “certifying body” because the licensing terms are self-documenting. If the IP owner says the IP is available for a certain kind of reuse, it just is.
    So could we do the same thing for other kinds of information? If you display the “Open Data” logo, you’re telling the world it’s free to reuse your data (according to your rules, if any).
    There could still be an issue with someone using the logo but not actually making the data easily accessible, but this isn’t much of a problem. The logo would convey explicit permission to scrape out the data and use it according to the license terms. Also, it just wouldn’t make sense; it would be like using the Creative Commons logo on a song that can’t be downloaded. Who would bother?