A lot of developers have been contacting us with feedback on the launch of our O'Reilly Product Metadata Interface (OPMI), the web service that provides product data and information about O'Reilly's titles as RDF/XML. While we've got some rough help text up already, I just answered someone's question on how to start using the service and thought I should share it.
"How do I just get titles and ISBNs of all your books?" It seemed a reasonable enough question.
Technically, I'm a Python guy now, but I can write Ruby one-liners in my sleep, so...
- Save the snapshot of ISBNs for the more than 1,100 titles available on 12 February 2009 from the O'Reilly Store (orm_complete_12_feb_2009.txt) in a new folder
- In a terminal with curl and Ruby installed, change to the folder containing that file
- Run the following Ruby (making sure it's all on one line)
ruby -r 'rexml/document' -e 'ARGV.each {|isbn|
rdf = isbn + ".rdf" ;
system("curl -so #{rdf} http://opmi.labs.oreilly.com/product/#{isbn}");
doc = REXML::Document.new(File.new(rdf));
title = REXML::XPath.first(doc, "//dc:title"); puts "#{title.text}\t#{isbn}"
}'
`head orm_complete_12_feb_2009.txt `
- ???
- Profit
The above should give you the first 10 titles and ISBNs (because I used head; to get the whole list, use cat or similar):
Palm OS Network Programming 9780596000059
Physics for Game Developers 9780596000066
Windows 2000 Pro: The Missing Manual 9780596000103
Managing IMAP 9780596000127
Windows 2000 Quick Fixes 9780596000172
Oracle and Open Source 9780596000189
Java Internationalization 9780596000196
Programming Perl 9780596000271
MCSE: Windows 2000 Exams in a Nutshell 9780596000301
Beyond Contact 9780596000370
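The one-liner shells out to curl and interpolates each argument straight into the command string. If you would rather stay inside Ruby, here is a minimal sketch using the standard net/http library. The helper names (isbn13?, product_uri, fetch_rdf) are made up for illustration, the 13-digit check is my own assumption about what the ISBN list contains, and it assumes the service still answers at the URL used above.

```ruby
require 'net/http'
require 'uri'

# Base URL taken from the one-liner above.
OPMI_BASE = 'http://opmi.labs.oreilly.com/product/'

# True when the argument looks like a bare 13-digit ISBN (assumed format).
def isbn13?(value)
  value.match?(/\A\d{13}\z/)
end

# Build the OPMI URL for one ISBN, rejecting anything that is not 13 digits.
def product_uri(isbn)
  raise ArgumentError, "not a 13-digit ISBN: #{isbn.inspect}" unless isbn13?(isbn)
  URI("#{OPMI_BASE}#{isbn}")
end

# Download the RDF/XML for one ISBN and return it as a string.
def fetch_rdf(isbn)
  Net::HTTP.get(product_uri(isbn))
end

# Save each ISBN given on the command line to <isbn>.rdf, as the one-liner does.
ARGV.each do |isbn|
  File.write("#{isbn}.rdf", fetch_rdf(isbn))
end
```

Saved as, say, fetch_opmi.rb (a hypothetical name), it takes the same backticked head command as its arguments. Checking the ISBN first means a stray line in the list raises an error instead of ending up in a URL or a filename.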
Just to be explicit, here's what's going on in the code above:
# Run the Ruby interpreter
ruby
# and require Ruby's REXML XML-processing library
-r 'rexml/document'
# evaluate the string ("run it")
-e
# take each argument (at the end) and call it "isbn"
'ARGV.each {|isbn|
# isbn.rdf will be our filename
rdf = isbn + ".rdf" ;
# ask curl to download the OPMI data to our file
system("curl -so #{rdf} http://opmi.labs.oreilly.com/product/#{isbn}");
# Open the RDF/XML on our filesystem as a REXML document
doc = REXML::Document.new(File.new(rdf));
# Find the first Dublin Core title in the document (lazy!)
title = REXML::XPath.first(doc, "//dc:title");
# and print it to the screen with a tab and then the ISBN
puts "#{title.text}\t#{isbn}"
# close the loop
}'
# put 10 ISBNs as arguments
`head orm_complete_12_feb_2009.txt `
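For anyone who wants the title lookup as a reusable piece rather than a one-liner, here is a rough sketch of the same REXML step. The sample document is a made-up stand-in, not a real OPMI response, and the Dublin Core namespace URI is an assumption about what the dc prefix is bound to in the service's output. The helper names are hypothetical too. Unlike the one-liner, it returns nil instead of raising when a document has no title.

```ruby
require 'rexml/document'

# Assumed binding for the "dc" prefix; check it against a real OPMI document.
DC_NS = 'http://purl.org/dc/elements/1.1/'

# Hypothetical sample standing in for a downloaded .rdf file.
SAMPLE_RDF = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="urn:isbn:9780596000271">
      <dc:title>Programming Perl</dc:title>
    </rdf:Description>
  </rdf:RDF>
XML

# Return the text of the first Dublin Core title, or nil if there is none.
def title_from_rdf(rdf_xml)
  doc = REXML::Document.new(rdf_xml)
  node = REXML::XPath.first(doc, '//dc:title', { 'dc' => DC_NS })
  node && node.text
end

# Format one output line the way the one-liner does: title, tab, ISBN.
def listing_line(rdf_xml, isbn)
  title = title_from_rdf(rdf_xml) || '(no title found)'
  "#{title}\t#{isbn}"
end

puts listing_line(SAMPLE_RDF, '9780596000271')
```

Passing the namespace hash explicitly keeps the XPath working even if a document binds Dublin Core to a different prefix than dc.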
It's great that you guys published your book data in RDF. Now is definitely a good time to join the Linked Data cloud.
I see one problem, though: at the moment there is no way to reference any object in your data. For example, there is no way for me to state that I have read the book "iPhone: The Missing Manual".
Even though there is an RDF document that describes the book (http://opmi.labs.oreilly.com/product/9780596521677), there is no dereferenceable URI (like http://opmi.labs.oreilly.com/product/9780596521677#book, for example) that I can use to point at it in a statement like:
Also, I wonder whether you have any effort under way to produce data about O'Reilly conferences as well. I run the http://www.techpresentations.org project and consider O'Reilly one of the most web-advanced conference organizers, so I hope that conference and presentation data will be available on the data web some day.
Thank you,
Sergey