A week ago I linked scoffingly to a paper, now getting lots of attention, on how Twitter supposedly predicts the stock market. The scoffing is now getting louder and more rigorous, with Michael Bommarito pointing out a potentially crucial methodological problem in the paper's use of in-sample versus out-of-sample data.
With Michael’s permission, his post follows after the jump.
As I noted when I first linked to this paper on arXiv, I think there may be an issue with the claim of prediction. Here is the portion of text that raises some serious questions in my mind. Emphasis is mine.
Note then that the assessment of predictive power later uses these z-scores, which are clearly not out-of-sample since they incorporate $k$ periods of future knowledge. Figure 3 and its caption below drive this point home, as they clearly indicate that $Z_t$ is used here.
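To make the concern concrete, here is a minimal Python sketch of the difference between a z-score that peeks ahead and one that does not. It is an illustration under stated assumptions, not the authors' code: I assume the normalization window runs $k$ periods on either side of $t$, and the function names and the `mood` series are made up for the example.

```python
from statistics import mean, pstdev


def zscore(value, window):
    """Standardize `value` against the mean and std. dev. of `window`."""
    sd = pstdev(window)
    return 0.0 if sd == 0 else (value - mean(window)) / sd


def centered_zscore(series, t, k):
    # ASSUMPTION: window spans t-k .. t+k, so it includes k future periods.
    window = series[max(0, t - k): t + k + 1]
    return zscore(series[t], window)


def trailing_zscore(series, t, k):
    # Out-of-sample alternative: window spans t-k .. t, past and present only.
    window = series[max(0, t - k): t + 1]
    return zscore(series[t], window)


def uses_future(zfunc, series, t, k):
    """Return True if changing the value at t+1 changes the z-score at t."""
    perturbed = list(series)
    perturbed[t + 1] += 10.0
    return zfunc(series, t, k) != zfunc(perturbed, t, k)


# Hypothetical daily mood readings, invented for illustration only.
mood = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5]
t, k = 3, 2

print(uses_future(centered_zscore, mood, t, k))  # True: look-ahead present
print(uses_future(trailing_zscore, mood, t, k))  # False: no look-ahead
```

If tomorrow's value can move today's z-score, then any forecast built on that z-score has already seen part of the future it claims to predict. A trailing window avoids this, at the cost of a noisier normalization.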
The remainder of the text is somewhat ambiguous.
I’ve emailed the authors twice over the last week, and although they visited my personal homepage through the email, I’ve received no response. In the meantime, I think the jury is still out on whether Twitter can actually be used to rigorously predict the stock market out-of-sample.