Using the Twitter Search API to Refine TOC Conference Tweet Data

By Andrew Savikas
February 15, 2009 | Comments: 1

I didn't see it coming from this audience, but the Twitter chatter was thunderous during the TOC Conference this year. As things wound up, a lot of attendees were looking for a single list of all the conference tweets. Not having played with the Twitter API before, I didn't know about the apparent limit to the number of tweets returned in a search, but fortunately hashtags had a history extending back at least to late in the day before the conference started; from that I was able to get a raw text file.

With a bit of regex on the file in TextMate and then an (admittedly ugly) Ruby script, I generated an XML file listing everyone who tweeted with the hashtag #toc during the conference, along with most of their #toc tweets. (Not all of them, though: as far as I could tell, the API imposes some limit that I'm not aware of, not that I spent a ton of time looking.) Here are some more details for anyone interested in getting their feet wet with the Twitter search API:

The simple XML I built from the raw text looked like this:

<tweet user="golfgal">Reading a great article about the coolest gadget and PressDisplay by
   Robert Ivan of Metaprinter at #TOC -</tweet>
<tweet user="jmandala">@robotech_master #TOC, not sure wht 2 make of King *choosing* 2
   evangelize the kindle? Seems irresponsible, I guess.</tweet>
<tweet user="ericrumsey">@timoreilly Google trends - "pdf download ebooks" via @basirat Top
   countries : 3 i's: Iran, India, Indonesia #toc</tweet>
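
For anyone who would rather script the text-to-XML step than do it with find-and-replace in an editor, here is a minimal sketch of the idea. It is not the regex used for the post. The raw line format ("username: tweet text", one tweet per line) is an assumption, and the helper names xml_escape and tweets_to_xml are hypothetical.

```ruby
# Escape the characters that are special in XML text and attribute values.
def xml_escape(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Convert lines of the ASSUMED form "username: tweet text" into <tweet> elements.
# Lines that do not match that pattern are skipped.
def tweets_to_xml(raw)
  elements = raw.each_line.map do |line|
    match = line.strip.match(/\A(\w+):\s+(.+)\z/)
    next unless match
    %(<tweet user="#{xml_escape(match[1])}">#{xml_escape(match[2])}</tweet>)
  end
  elements.compact.join("\n")
end

raw = <<~TEXT
  alice: Enjoying the keynote #toc
  bob: @alice agreed, great talk & "big" ideas #toc
TEXT

puts tweets_to_xml(raw)
```

Escaping the ampersands, angle brackets, and quotes up front keeps an XML parser from choking on tweet text later. The output has the same shape as the sample above.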

I fed that into this Ruby script:


require 'rubygems'
require 'rexml/document'
require 'twitter'
require 'builder'

urlbase = ""

all = open('toctweets.xml').read
allxml = REXML::Document.new(all) # parse the XML-ified tweets

toc_twitterers = REXML::XPath.match(allxml, '//@user').uniq

x = Builder::XmlMarkup.new(:target => $stdout)
x.toctweeters do |tweeters|
  toc_twitterers.each do |username|
    x.tweeter("user" => username) do |tweeter|
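      # assumed reconstruction of the search call: this user's tweets tagged #toc
      toc_tweets = Twitter::Search.new.from(username).hashed('toc')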
      toc_tweets.each do |t|
        x.created_at t.created_at
        x.url "#{urlbase}#{username}/status/#{}"
        x.text t.text
      end #tweet
      STDOUT.flush # just for better feedback while running
      sleep(0.5) # be nice to the API
    end #tweeter
  end #username
end #tweeters

After it became clear that the script above didn't actually return all of a user's #toc tweets, I also posted the XML-ified version of the raw text file. (BTW, the best visualization of this stuff wins a free pass to TOC 2010 -- details here).
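
Since the XML-ified raw file already carries a user attribute on every tweet, a per-user listing can also be built from it without calling the API at all. The following is a minimal sketch of that idea using REXML, not something from the original script. It assumes the tweet elements are wrapped in a single root element (named tweets here, a hypothetical name), and the helper name tweets_by_user is also made up for illustration.

```ruby
require 'rexml/document'

# Group tweet texts by the user attribute on each <tweet> element.
# ASSUMPTION: the <tweet> elements sit inside one root element.
def tweets_by_user(xml)
  doc = REXML::Document.new(xml)
  grouped = Hash.new { |hash, user| hash[user] = [] }
  REXML::XPath.each(doc, '//tweet') do |tweet|
    grouped[tweet.attributes['user']] << tweet.text.to_s.strip
  end
  grouped
end

sample = <<~XML
  <tweets>
    <tweet user="alice">First post #toc</tweet>
    <tweet user="bob">Hello from the keynote #toc</tweet>
    <tweet user="alice">Second post #toc</tweet>
  </tweets>
XML

tweets_by_user(sample).each do |user, tweets|
  puts "#{user}: #{tweets.length} tweet(s)"
end
```

Grouping locally sidesteps whatever limit the search API applies, at the cost of losing the timestamps and status URLs that only the API results carry.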


