« Revisiting the Tech-Subprime Connection | Main | To All iPhone Customers: Whoops! »
Latest Stories
- Interview with a (Fake) Hedge Fund Manager
- Reason # 7,732 Why Yahoo Management is Delusional
- First Faux Apple 3G iPhone Teardown
- New York as (Financial) Tech Startup Hub
- Companies Mutate or Die
September 6, 2007
LazyWeb Tech: XPath Query to Extract Earnings Dates
A geekish digression: I've been trying to write an XPath expression to exploit a new feature in Google Spreadsheets whereby you can extract live content from other sites. I want to pull the next quarterly earnings date for public companies from another site, and then embed those in a Google Spreadsheet where I can do some subsequent analysis.Here, using Cisco as the example, is the source url: http://www.earningswhispers.com/stocks.asp?symbol=CSCO&artno=&ref=434. And here is the XPath query I have (using XPath Checker) to pull the earnings date from the preceding page: id('Indices')/tbody/tr/td/table/tbody/tr[6]/td[2]. The Google syntax is =importXML("URL","query").
I can't for the life of me get it to stop giving me errors. Thoughts? I do minimal amounts with XPath, so it's entirely possible my expression is hosed.
Sphere It
|
Digg it
|
Bookmark it
|
Stumble it
I think you may want something like:
//table[@id='Indices']/tr/td/table/tr[6]/td[2]
However, I don't know if the page conforms to proper XML, so the parser may barf on the page itself, even though the XPATH may be correct.
Actually, I think google is having an issue with the URL for some reason. It can't even retrieve the page.
The first expression grabs the right td in Firebug with $x("id('Indices')/tbody/tr/td/table/tbody/tr[6]/td[2]")
I forget how to grab the actual text - and am wondering if the carriage return in the value might be causing the Google Spreadsheet to hickup.
Yeah, I think that's the problem. The URL is blowing up. Maybe bot blocking. Damn.
Didn't see James' answer; in any case the expression I was looking for was:
$x("id('Indices')/tbody/tr/td/table/tbody/tr[6]/td[2]/text()")
Why use Google Spreadsheets to do this as opposed to Excel? Just for the purposes of sharing it?
Partly that, and partly because I wanted to test out the new ImportXML function in Google Spreadsheets.
Firebug also lies about XPath statements putting in tbody elements where there are none - http://blogs.nitobi.com/dave/index.php/2007/06/06/firebug-copy-xpath/









a quick look at the source page, but i don't see a tbody tag anywhere on the page?