Using the Twitter Search API to Refine TOC Conference Tweet Data

By Andrew Savikas
February 15, 2009 | Comments: 1

I didn't see it coming from this audience, but the Twitter chatter was thunderous during the TOC Conference this year. As things wound up, a lot of attendees were looking for a single list of all the conference tweets. Not having played with the Twitter API before, I didn't know about the apparent limit to the number of tweets returned in a search, but fortunately hashtags had a history extending back at least to late in the day before the conference started; from that I was able to get a raw text file.

With a bit of regex on the file in TextMate and then an (admittedly ugly) Ruby script, I generated an XML file listing everyone who tweeted with the hashtag #toc during the conference, along with most of their #toc tweets. (Not all of them, though: as far as I could tell, I ran into some limit on the API that I'm not aware of, not that I spent a ton of time looking.) Here are some more details for anyone interested in getting their feet wet with the Twitter search API:

The simple XML I built from the raw text looked like this:

<tweet user="golfgal">Reading a great article about the coolest gadget and PressDisplay by
   Robert Ivan of Metaprinter at #TOC -</tweet>
<tweet user="jmandala">@robotech_master #TOC, not sure wht 2 make of King *choosing* 2
   evangelize the kindle? Seems irresponsible, I guess.</tweet>
<tweet user="ericrumsey">@timoreilly Google trends - "pdf download ebooks" via @basirat Top
   countries : 3 i's: Iran, India, Indonesia #toc
</tweet>
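
For anyone who would rather script the regex step than do it by hand in an editor, here is a minimal sketch in Ruby. It is not the TextMate search-and-replace I actually ran. The raw format it assumes (one tweet per line, written as "username: tweet text"), the RAW_LINE pattern, the two helper methods, and the wrapping <tweets> root element are all hypothetical and would need adjusting to match a real export.

```ruby
require 'cgi'

# Hypothetical raw format: one tweet per line, "username: tweet text".
RAW_LINE = /\A(\w+):\s+(.*)\z/

# Turn one raw line into a <tweet> element, or nil if the line doesn't match.
def tweet_to_xml(line)
  match = RAW_LINE.match(line.strip)
  return nil unless match
  user, text = match.captures
  # Escape &, <, > and quotes so the output stays well-formed XML.
  %(<tweet user="#{CGI.escapeHTML(user)}">#{CGI.escapeHTML(text)}</tweet>)
end

# Convert every matching line and wrap the result in a single (assumed) root element.
def tweets_to_xml(raw)
  body = raw.each_line.map { |line| tweet_to_xml(line) }.compact
  "<tweets>\n#{body.join("\n")}\n</tweets>"
end

sample = <<~RAW
  golfgal: Reading a great article at #TOC
  jmandala: @robotech_master #TOC, not sure wht 2 make of King & the kindle?
RAW

puts tweets_to_xml(sample)
```

Lines that don't fit the assumed pattern are silently skipped, which is probably worth logging if the goal is a complete list.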

I fed that XML file into this Ruby script:

#!/usr/bin/ruby

require 'rubygems'
require 'rexml/document'
require 'twitter'
require 'builder'

urlbase = "http://twitter.com/"

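# Load the XML built from the raw text file.
allxml = REXML::Document.new(File.read('toctweets.xml'))

# Collect each distinct username. Map to the attribute's string value so
# uniq compares plain strings rather than REXML::Attribute objects.
toc_twitterers = REXML::XPath.match(allxml, '//@user').map { |attr| attr.value }.uniq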

x = Builder::XmlMarkup.new(:target => $stdout)
x.toctweeters do |tweeters|
  toc_twitterers.each do |username|
    x.tweeter("user" => username) do |tweeter|
      toc_tweets = Twitter::Search.new.from(username).hashed('toc')
      toc_tweets.each do |t|
        x.id t.id
        x.created_at t.created_at
        x.url "#{urlbase}#{username}/status/#{t.id}"
        x.text t.text
      end
      STDOUT.flush # just for better feedback while running
      sleep(0.5) # be nice to the API
    end #tweeter
  end #username
end #tweeters

After it was clear that the above didn't actually produce all of a user's #toc tweets, I also posted the XML-ified version of the raw text file. (BTW, the best visualization of this stuff wins a free pass to TOC 2010; details here.)
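
As a possible starting point for working with that XML-ified file, here is a rough sketch that counts tweets per user with REXML. It assumes the <tweet user="..."> elements sit inside a single root element (called <tweets> here, which is my assumption and not something shown in the excerpt above), and tweets_per_user is a hypothetical helper name.

```ruby
require 'rexml/document'

# Count tweets per user in a document of <tweet user="..."> elements.
def tweets_per_user(xml)
  doc = REXML::Document.new(xml)
  counts = Hash.new(0)
  REXML::XPath.each(doc, '//tweet') do |tweet|
    counts[tweet.attributes['user']] += 1
  end
  counts
end

# Small inline sample; the <tweets> root element is an assumption.
sample = <<~XML
  <tweets>
    <tweet user="golfgal">Reading a great article at #TOC</tweet>
    <tweet user="jmandala">@robotech_master #TOC, not sure</tweet>
    <tweet user="golfgal">Another one #toc</tweet>
  </tweets>
XML

# Print users from most to least active, breaking ties alphabetically.
tweets_per_user(sample)
  .sort_by { |user, count| [-count, user] }
  .each { |user, count| puts "#{user}\t#{count}" }
```

To run it against the real file, swapping the inline sample for something like File.read('toctweets.xml') should be enough, provided the file has a single root element.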

1 Comment

You can learn a lot more about how to use the Twitter API in the nearly finished Twitter API: Up and Running.
