Tech: Scraping Location

Pardon the geek-ish interruption to my usual financial harrumphing, but I have a lazy web question:

Given an arbitrary list of URLs (representing companies), is there a straightforward way of figuring out the underlying company’s real-world location? Note: I don’t mean the GeoIP of the domain.

I got a short way into writing a scraping script, and then decided to ask before going further.

Comments

  1. AJ says:

    You could try to scrape their WHOIS data:
    http://whois.domaintools.com/kedrosky.com
    One of the three records above shows La Jolla as a possible place of business for the ‘kedrosky.com’ ‘company’.

  2. Janis says:

    Hi, I made a site called Beer Hunter and I didn’t find any easier way than to screen scrape the site addresses and then geocode the results with geocoder.ca’s XML API via a little PHP script. If the locations that you want to geocode are U.S. addresses, I believe that geocoder.us provides the same sort of public API for the States.

  3. Rod Edwards says:

    Yep – I use geocoder.ca to geocode real estate listings scraped from WebView360. Take a look at Winnipeg, Edmonton or Vancouver over on http://www.blockrocker.com, and you’ll see results parsed out in this painful method.
    Problem being, the webview listings are in a consistent format from one page to another, whereas separate companies will not be.
    Perhaps scraping a business category listing from Canada411.com would be faster?

  4. Thanks all. Unfortunately, sounds like scraping is it. Too bad that companies are so bad about uniform “Contact Us” pages.

  5. Brian says:

    “Too bad that companies are so bad about uniform ‘Contact Us’ pages.”
    Suggest a standard, by all means. Unless there is one?

  6. Rod Edwards says:

    Brian – downloadable vCards?

  7. Marc says:

    I needed to come up with a solution to the “what’s a company’s real address based on their URL” problem, and the solution worked as Brian suggested, but without needing standardized Contact Us pages. It simply looked for pages anywhere on the site that contain a phone number (in any of the various possible formats), a known city, a known country name or code, a zip or postal code, a few other identifiers, and a mention of the word “contact” (in the known Western-language forms; I didn’t have Unicode parsing for double-byte characters).
    It’s easier to implement than you might think.

  8. John K says:

    Some search company out there does this – specializes in contact us pages. Of course, I can’t remember who it is…
    How’s that for a lazy lazy web answer?

  9. Brian says:

    “Brian – downloadable vCards?”
    D’oh. Focused on ‘what a page should look like’ and forgot about vCards.
    Something else to add to the to-do list.
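
For anyone who wants to try the heuristic Marc describes in comment 7, here is a rough sketch of how it might look. It is not Marc's code. The regular expressions, the score weights, the tiny city list, and the names score_page and best_page are all assumptions made up for illustration, and the patterns only cover North American phone numbers, U.S. zip codes, and Canadian postal codes. It takes page text that has already been fetched and stripped of HTML, so no scraping happens here.

```python
import re

# Assumed patterns (illustrative only): North American phone numbers,
# U.S. zip codes, Canadian postal codes, and "contact" in a few
# Western languages.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
POSTAL_RE = re.compile(r"\b[A-Z]\d[A-Z][ -]?\d[A-Z]\d\b")
CONTACT_RE = re.compile(r"\b(contact|kontakt|contacto|contatto)\b", re.IGNORECASE)

# Hypothetical stand-in for a real gazetteer of known city names.
KNOWN_CITIES = {"la jolla", "winnipeg", "edmonton", "vancouver"}


def score_page(text, known_cities=KNOWN_CITIES):
    """Score how likely a page's text is to hold a company address.

    The weights are arbitrary assumptions: phone numbers and postal
    codes count for more than a city name or the word "contact".
    """
    lowered = text.lower()
    score = 0
    if PHONE_RE.search(text):
        score += 2
    if ZIP_RE.search(text) or POSTAL_RE.search(text):
        score += 2
    if any(city in lowered for city in known_cities):
        score += 1
    if CONTACT_RE.search(text):
        score += 1
    return score


def best_page(pages):
    """Given a dict of {url: page_text}, return the highest-scoring URL.

    Returns None when there are no pages or nothing scores above zero.
    """
    if not pages:
        return None
    url = max(pages, key=lambda u: score_page(pages[u]))
    return url if score_page(pages[url]) > 0 else None
```

A five-digit pattern will happily match things that are not zip codes, so a real version would want a proper city list and some sanity checks before handing the winning page's address to a geocoder.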