« Revisiting the Tech-Subprime Connection | Main | To All iPhone Customers: Whoops! »

Latest Stories

September 6, 2007

LazyWeb Tech: XPath Query to Extract Earnings Dates

A geekish digression: I've been trying to write an XPath expression to exploit a new feature in Google Spreadsheets whereby you can extract live content from other sites. I want to pull the next quarterly earnings date for public companies from another site, and then embed those in a Google Spreadsheet where I can do some subsequent analysis.

Here, using Cisco as the example, is the source url: http://www.earningswhispers.com/stocks.asp?symbol=CSCO&artno=&ref=434. And here is the XPath query I have (using XPath Checker) to pull the earnings date from the preceding page: id('Indices')/tbody/tr/td/table/tbody/tr[6]/td[2]. The Google syntax is =importXML("URL","query").

I can't for the life of me get it to stop giving me errors. Thoughts? I do minimal amounts with XPath, so it's entirely possible my expression is hosed.

Sphere It   |  Digg this! Digg it   |  Bookmark this! Bookmark it   |  Stumble It! Stumble it

Comments

a quick look at the source page, but i don't see a tbody tag anywhere on the page?

I think you may want something like:

//table[@id='Indices']/tr/td/table/tr[6]/td[2]

However, I don't know if the page conforms to proper XML, so the parser may barf on the page itself, even though the XPATH may be correct.

Actually, I think google is having an issue with the URL for some reason. It can't even retrieve the page.

The first expression grabs the right td in Firebug with $x("id('Indices')/tbody/tr/td/table/tbody/tr[6]/td[2]")

I forget how to grab the actual text - and am wondering if the carriage return in the value might be causing the Google Spreadsheet to hickup.

Yeah, I think that's the problem. The URL is blowing up. Maybe bot blocking. Damn.

Didn't see James' answer; in any case the expression I was looking for was:
$x("id('Indices')/tbody/tr/td/table/tbody/tr[6]/td[2]/text()")

Why use Google Spreadsheets to do this as opposed to Excel? Just for the purposes of sharing it?

Partly that, and partly because I wanted to test out the new ImportXML function in Google Spreadsheets.

Firebug also lies about XPath statements putting in tbody elements where there are none - http://blogs.nitobi.com/dave/index.php/2007/06/06/firebug-copy-xpath/