Tech: Scraping Location

Pardon the geek-ish interruption to my usual financial harrumphing, but I have a lazy web question:

Given an arbitrary list of URLs (representing companies), is there a straightforward way of figuring out the underlying company’s real-world location? Note: I don’t mean the GeoIP of the domain.

I got a short way into writing a scraping script, and then decided to ask before going further.


  1. You could try to scrape their WHOIS data:
    One of the three records above shows La Jolla as a possible place of business for the ‘’ ‘company’.

  2. Hi, I made a site called Beer Hunter and I didn’t find any easier way than to screen scrape the site addresses and then geocode the results with’s XML API via a little PHP script. If the locations that you want to geocode are U.S. addresses, I believe that provides the same sort of public API for the states.

  3. Yep – I use the to scrape real estate listings from WebView360. Take a look at Winnipeg, Edmonton or Vancouver over on, and you’ll see results parsed out in this painful method.
    Problem being, the webview listings are in a consistent format from one page to another, whereas separate companies will not be.
    Perhaps scraping a business category listing from would be faster?

  4. Thanks all. Unfortunately, sounds like scraping is it. Too bad that companies are so bad about uniform “Contact Us” pages.

  5. Too bad that companies are so bad about uniform “Contact Us” pages.
    Suggest a standard, by all means. Unless there is one?

  6. Brian – downloadable vCards?

  7. I once needed to solve the “what’s a company’s real address based on their URL” problem, and the solution worked much as Brian suggested, but without needing standardized Contact Us pages. It simply looked for pages anywhere on the site that contained a phone number (in any of the various possible formats), a known city, a known country name or code, a zip or postal code, a few other identifiers, and a mention of the word “contact” (in the known Western-language forms; it didn’t have Unicode parsing for double-byte characters).
    It’s easier to implement than you might think.

  8. Some search company out there does this – specializes in contact us pages. Of course, I can’t remember who it is…
    How’s that for a lazy lazy web answer?

  9. Brian – downloadable vCards?
    D’oh. Focused on ‘what a page should look like’ and forgot about vCards.
    Something else to add to the to-do list.
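
For anyone who wants to try the approach from comment 7, here is a rough Python sketch of the idea: score each page by how many "this looks like an address" signals it contains, then pick the best-scoring page. It is not the commenter's actual code. The regexes, the tiny `KNOWN_CITIES` list, and the function names `score_page` and `best_candidate` are all assumptions made up for illustration, and the patterns only cover North American phone numbers and US/Canadian postal codes.

```python
import re

# Assumed patterns: North American phone formats, US ZIP codes and
# Canadian postal codes only. Real sites will need many more variants.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}")
US_ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
CA_POSTAL_RE = re.compile(r"\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b")
CONTACT_RE = re.compile(r"\bcontact\b", re.IGNORECASE)

# Hypothetical, deliberately tiny city list; a real one would come from a gazetteer.
KNOWN_CITIES = {"la jolla", "winnipeg", "edmonton", "vancouver"}


def score_page(text, known_cities=KNOWN_CITIES):
    """Return (score, signals) for the plain text of one page.

    The score is simply the number of signals that fired.
    """
    lowered = text.lower()
    signals = {
        "phone": bool(PHONE_RE.search(text)),
        "postal_code": bool(US_ZIP_RE.search(text) or CA_POSTAL_RE.search(text)),
        "city": any(city in lowered for city in known_cities),
        "contact_word": bool(CONTACT_RE.search(text)),
    }
    return sum(signals.values()), signals


def best_candidate(pages):
    """Given {url: page_text}, return the URL with the highest score.

    Returns None if no page triggers any signal at all.
    """
    best_url, best_score = None, 0
    for url, text in pages.items():
        score, _ = score_page(text)
        if score > best_score:
            best_url, best_score = url, score
    return best_url
```

You would feed it the already-fetched, tag-stripped text of each page on a site, for example `best_candidate({"/about": about_text, "/contact": contact_text})`. Any five-digit number counts as a ZIP code here, so expect false positives; weighting the signals or requiring two or more of them would be the obvious next step.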