O'Reilly Product Metadata Interface (OPMI) Usage At Tweet Length

By Keith Fahlgren
February 12, 2009 | Comments: 2

A lot of developers have been contacting us with feedback on the launch of our O'Reilly Product Metadata Interface (OPMI), the web service that provides product data and information about O'Reilly's titles as RDF/XML. While we've got some rough help text up already, I just answered someone's question on how to start using the service and thought I should share it.

"How do I just get titles and ISBNs of all your books?" It seemed a reasonable enough question.

Technically, I'm a Python guy now, but I can write Ruby one-liners in my sleep, so...


  1. Save the snapshot of more than 1100 titles available on 12 February, 2009 from the O'Reilly Store in a new folder

  2. In a terminal with curl and Ruby installed, change to the folder containing that file

  3. Run the following Ruby (making sure it's all one line)
    ruby -r 'rexml/document'  -e 'ARGV.each {|isbn| 
    rdf = isbn + ".rdf" ;
    system("curl -so #{rdf} http://opmi.labs.oreilly.com/product/#{isbn}");
    doc = REXML::Document.new(File.new(rdf));
    title = REXML::XPath.first(doc, "//dc:title"); puts "#{title.text}\t#{isbn}"
    }'
    `head orm_complete_12_feb_2009.txt `


  4. ???

  5. Profit

The one-liner should give you the first 10 titles and ISBNs (because I used head; to get the whole list, use cat or similar):


Palm OS Network Programming 9780596000059
Physics for Game Developers 9780596000066
Windows 2000 Pro: The Missing Manual 9780596000103
Managing IMAP 9780596000127
Windows 2000 Quick Fixes 9780596000172
Oracle and Open Source 9780596000189
Java Internationalization 9780596000196
Programming Perl 9780596000271
MCSE: Windows 2000 Exams in a Nutshell 9780596000301
Beyond Contact 9780596000370
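
As for going past the first 10: if head and cat aren't handy, or you want to choose how many ISBNs to process, the list can be read from Ruby instead. This is only a small sketch. It assumes the snapshot file holds one ISBN per line, and the helper name isbns_from is made up for this example.

```ruby
# Read ISBNs from a text file, one per line (assumed layout).
# Blank lines are skipped. limit = nil returns every ISBN;
# limit = 10 behaves roughly like `head`.
def isbns_from(path, limit = nil)
  isbns = File.foreach(path).map(&:strip).reject(&:empty?)
  limit ? isbns.first(limit) : isbns
end

# Only runs when a filename is given on the command line
if ARGV[0]
  puts isbns_from(ARGV[0], 10)
end
```

Dropping the second argument returns the whole list, which is the cat case.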

Just to be explicit, here's what's going on in the one-liner from step 3:


# Run the Ruby interpreter
ruby

# and require Ruby's REXML XML-processing library
-r 'rexml/document'

# evaluate the string ("run it")
-e

# take each argument (at the end) and call it "isbn"
'ARGV.each {|isbn|

# isbn.rdf will be our filename
rdf = isbn + ".rdf" ;

# ask curl to download the OPMI data to our file
system("curl -so #{rdf} http://opmi.labs.oreilly.com/product/#{isbn}");

# Open the RDF/XML on our filesystem as a REXML document
doc = REXML::Document.new(File.new(rdf));

# Find the first Dublin Core title in the document (lazy!)
title = REXML::XPath.first(doc, "//dc:title");

# and print it to the screen with a tab and then the ISBN
puts "#{title.text}\t#{isbn}"

# close the loop
}'

# put 10 ISBNs as arguments
`head orm_complete_12_feb_2009.txt `
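
For anyone who would rather keep this in a file than on one line, here is a rough script version of the same steps. Treat it as a sketch rather than a drop-in replacement: it swaps curl for Ruby's Net::HTTP, skips the intermediate .rdf files, and prints a placeholder when no dc:title turns up (in the one-liner, title.text raises NoMethodError if the XPath lookup returns nil). The helper names opmi_url, fetch_rdf, and first_title are invented for this example, and it assumes the endpoint from the one-liner is reachable.

```ruby
require 'net/http'
require 'uri'
require 'rexml/document'

# URL prefix taken from the one-liner
OPMI_BASE = "http://opmi.labs.oreilly.com/product/"

# Build the OPMI URL for one ISBN (hypothetical helper)
def opmi_url(isbn)
  URI.join(OPMI_BASE, isbn.strip)
end

# Fetch the RDF/XML for one ISBN; nil unless the server answers 2xx.
# Network errors are not rescued here.
def fetch_rdf(isbn)
  response = Net::HTTP.get_response(opmi_url(isbn))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end

# Text of the first dc:title in an RDF/XML string, or nil if there is
# none or the input is not well-formed XML
def first_title(rdf_xml)
  doc = REXML::Document.new(rdf_xml)
  node = REXML::XPath.first(doc, "//dc:title")
  node && node.text
rescue REXML::ParseException
  nil
end

# ISBNs come in as arguments, exactly as in the one-liner
ARGV.each do |isbn|
  rdf = fetch_rdf(isbn)
  title = rdf && first_title(rdf)
  puts "#{title || '(no title found)'}\t#{isbn}"
end
```

Saved under any name you like (say, titles.rb), it takes the same arguments as before: ruby titles.rb `head orm_complete_12_feb_2009.txt`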


2 Comments

It's great that you guys published your book data in RDF - now is definitely the best time to join the Linked Data cloud.

I see one problem though - there is no way at the moment to reference any object in your data. For example, there is no way for me to state that I read the book "iPhone: The Missing Manual".

Even though there is an RDF document that describes the book (http://opmi.labs.oreilly.com/product/9780596521677), there is no dereferenceable URI (like http://opmi.labs.oreilly.com/product/9780596521677#book, for example) that I can use to point at it in a statement like:


<foaf:Person rdf:about="http://www.sergeychernyshev.com/sergey">
  <book:read>
    <book:Book rdf:about="http://opmi.labs.oreilly.com/product/9780596521677#book"></book:Book>
  </book:read>
</foaf:Person>

Also, is there any effort under way to publish data about O'Reilly conferences as well? I run the http://www.techpresentations.org project and consider O'Reilly one of the most web-advanced conference organizers, so I hope that conference and presentation data will be available on the data web some day.

Thank you,

Sergey

[ Probably talking to nobody by now, but... ]

Sergey,

You can use the ISBN to query search.oreilly.com, which should return 1 result.

For example:

Hadoop: The Definitive Guide, Second Edition (PRINT) is ISBN 978-1-4493-8973-4 and a search for it is:

http://search.oreilly.com/?q=978-1-4493-8973-4
