O'Reilly Product Metadata Interface

Most publishers are familiar with the ONIX standard for exchanging metadata about books among trading partners. Anyone who's actually spent time working with ONIX knows that its syntax is abstruse at best. While ONIX does use XML, there are more modern, more general, and more immediately comprehensible standards out there, particularly for the basic details like "author," "title," and "edition."

One of those standards is RDF, or "Resource Description Framework." This experimental O'Reilly Product Metadata Interface (OPMI) exposes RDF for all of O'Reilly's titles, organized by ISBN. Here's a snippet of the RDF metadata for iPhone: The Missing Manual, 2e from the OPMI at http://opmi.labs.oreilly.com/product/9780596521677:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <om:Product xmlns:om="http://purl.oreilly.com/ns/meta/" 
              rdf:about="urn:x-domain:oreilly.com:product:9780596521677.BOOK" 
              xmlns:dc="http://purl.org/dc/terms/"
              xml:lang="en">
    <dc:isFormatOf rdf:resource="urn:x-domain:oreilly.com:product:955988693.IP"/>
    <dc:issued>2008-08-13</dc:issued>
    <dc:creator>
      <rdf:Seq rdf:ID="creator">
        <rdf:li rdf:resource="urn:x-domain:oreilly.com:agent:pdb:350"/>
      </rdf:Seq>
    </dc:creator>
    <dc:rightsHolder>David Pogue</dc:rightsHolder>
    <dc:description>The new iPhone 3G is here, and bestselling author David Pogue 
      is back with a thoroughly updated edition of &lt;em&gt;iPhone: The Missing 
      Manual&lt;/em&gt;. With its faster downloads, touch-screen iPod, and best-ever 
      mobile Web browser, the new affordable iPhone is packed with possibilities.
      But without an objective guide like this one, you'll never unlock all it can
      do for you. Each custom designed page helps you accomplish specific tasks for
      everything from web browsing, to new apps, to watching videos.</dc:description>
    <dc:extent>376 pages</dc:extent>
    <dc:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
    <dc:format>6 x 9 in</dc:format>
...

Our RDF uses many Dublin Core terms (DCMI Metadata Terms), including:

  • title
  • creator
  • subject
  • description
  • abstract
  • publisher
  • issued
  • type
  • format
  • identifier
  • language
  • rights
  • rightsHolder
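As a rough illustration of reading those terms, here is a minimal Python sketch that uses only the standard library. It assumes a trimmed copy of the snippet above, closed off so that it parses; the helper name literal_terms is hypothetical, and it only picks up terms whose value is plain text (not ones like dc:creator that point at another resource).

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the snippet above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "om": "http://purl.oreilly.com/ns/meta/",
    "dc": "http://purl.org/dc/terms/",
}

# A trimmed copy of the snippet above, with closing tags added so it parses.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <om:Product xmlns:om="http://purl.oreilly.com/ns/meta/"
              rdf:about="urn:x-domain:oreilly.com:product:9780596521677.BOOK"
              xmlns:dc="http://purl.org/dc/terms/"
              xml:lang="en">
    <dc:issued>2008-08-13</dc:issued>
    <dc:rightsHolder>David Pogue</dc:rightsHolder>
    <dc:extent>376 pages</dc:extent>
    <dc:format>6 x 9 in</dc:format>
  </om:Product>
</rdf:RDF>
"""


def literal_terms(rdf_xml):
    """Return {term: text} for Dublin Core children that hold plain text."""
    root = ET.fromstring(rdf_xml)
    product = root.find("om:Product", NS)
    prefix = "{" + NS["dc"] + "}"
    terms = {}
    for child in product:
        # Keep only dc:* elements with non-empty text content.
        if child.tag.startswith(prefix) and child.text and child.text.strip():
            terms[child.tag[len(prefix):]] = child.text.strip()
    return terms


terms = literal_terms(SAMPLE)
for name in sorted(terms):
    print(name, "=", terms[name])
```

A full RDF library would handle resources, sequences, and language tags properly; this sketch is only meant to show how little is needed to pull out the simple fields.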

In addition to Dublin Core, the OPMI includes elements from:

  • FOAF ("Friend of a Friend") for describing people that contribute to a title
  • MARC Relator codes, used to describe how a person contributed to a work, such as the Cover Designer or Editor
  • MODS (Metadata Object Description Schema), used for all sorts of... no, not really, it's used for exactly one thing: to specify the edition of the work

You'll also find data on all of the available product formats. Here are the rdf:about attributes that identify the Print, Safari, Ebook, and iPhone App versions of this title:

rdf:about="urn:x-domain:oreilly.com:product:9780596521677.SAF"
rdf:about="urn:x-domain:oreilly.com:product:9780596153960.EBOOK"
rdf:about="urn:x-domain:oreilly.com:product:9780596521677.BOOK"
rdf:about="urn:x-domain:oreilly.com:product:9780596801007.APP"
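Judging only from the four values above, each product URN appears to end in an identifier, a dot, and a format code. A small Python sketch that splits them apart follows; the function name parse_product_urn is hypothetical, and the pattern is an assumption drawn from these examples rather than a documented rule.

```python
# Prefix shared by the four product URNs shown above.
PRODUCT_PREFIX = "urn:x-domain:oreilly.com:product:"


def parse_product_urn(urn):
    """Split a product URN into (identifier, format_code)."""
    if not urn.startswith(PRODUCT_PREFIX):
        raise ValueError("not a product URN: %r" % urn)
    # Split on the last dot so the format code is whatever follows it.
    identifier, dot, format_code = urn[len(PRODUCT_PREFIX):].rpartition(".")
    if not dot or not identifier or not format_code:
        raise ValueError("missing identifier or format code: %r" % urn)
    return identifier, format_code


URNS = [
    "urn:x-domain:oreilly.com:product:9780596521677.SAF",
    "urn:x-domain:oreilly.com:product:9780596153960.EBOOK",
    "urn:x-domain:oreilly.com:product:9780596521677.BOOK",
    "urn:x-domain:oreilly.com:product:9780596801007.APP",
]

# Map each format code to its identifier.
formats = {code: ident for ident, code in map(parse_product_urn, URNS)}
for code in sorted(formats):
    print(code, formats[code])
```

Note that in this example the Print and Safari versions share one ISBN, while the Ebook and iPhone App versions each have their own.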

The URLs are structured by ISBN. Once you have the ISBN for an O'Reilly book, you can get the full metadata with an HTTP request to:

http://opmi.labs.oreilly.com/product/ISBN
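Building that URL can be sketched in a few lines of Python. The helper names below are hypothetical, and the sketch assumes you are working with 13-digit ISBNs like the ones on this page; it uses the standard ISBN-13 check digit rule to catch typos before any request is made.

```python
BASE_URL = "http://opmi.labs.oreilly.com/product/"


def is_valid_isbn13(isbn):
    """Check length, digits, and the standard ISBN-13 check digit."""
    if len(isbn) != 13 or not set(isbn) <= set("0123456789"):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must end in 0.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
    return total % 10 == 0


def opmi_url(isbn):
    """Return the OPMI URL for an ISBN-13, ignoring hyphens and spaces."""
    digits = isbn.replace("-", "").replace(" ", "")
    if not is_valid_isbn13(digits):
        raise ValueError("not a valid ISBN-13: %r" % isbn)
    return BASE_URL + digits


print(opmi_url("9780596521677"))
```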

To get you started, here are direct links to the public RDF for our current top-5 bestsellers:

After working through those five, start your exploration with the ISBN of a title you find interesting. Identifying which ISBNs are interesting is up to you for now, but we're brainstorming on filtering and querying. Here's a snapshot of more than 1100 titles available on 12 February, 2009 from the O'Reilly Store, if you'd like to go whole hog (be kind to our servers, please). You can find an O'Reilly ISBN on the back of all your O'Reilly books, while reading on Safari Books Online, on the oreilly.com Store, or on Amazon in the "Product Details" section. For this example, let's use Practical RDF's ISBN, 9780596002633.

Using your favorite browser, programming language, or command-line utility, do an HTTP GET of http://opmi.labs.oreilly.com/product/9780596002633.

    Client                                     Server
        |                                           |
        |  1.) GET to OPMI URI                      |
        |------------------------------------------>|
        |                                           |
        |  2.) 200 OK                               |
        |      RDF Product Representation           |
        |<------------------------------------------|
        |                                           |
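The exchange in the diagram can be sketched in Python with the standard library. This is a minimal sketch, assuming the service is reachable; the function names build_request and fetch_rdf are hypothetical. Nothing is sent until fetch_rdf is called.

```python
import urllib.request

BASE_URL = "http://opmi.labs.oreilly.com/product/"


def build_request(isbn):
    """Step 1 of the diagram: a plain GET to the OPMI URI for this ISBN."""
    return urllib.request.Request(BASE_URL + isbn, method="GET")


def fetch_rdf(isbn, timeout=10):
    """Send the GET and return the response body as bytes.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so
    reaching the return statement corresponds to step 2, the 200 OK.
    """
    with urllib.request.urlopen(build_request(isbn), timeout=timeout) as response:
        return response.read()


request = build_request("9780596002633")
print(request.get_method(), request.full_url)
```

Calling fetch_rdf("9780596002633") would then perform the request and hand back the raw RDF/XML for parsing.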

You'll get back an RDF/XML document containing all the metadata for not only the ISBN of the product you asked about, but all of its directly related products too. For example, while we asked about the Print form of Practical RDF, the document we get back will also include information about the eBook and Safari Books Online versions of the product. There will also be a foaf:Person record for every person who contributed to the work, as well as an author biography.
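To give a feel for walking such a document, here is a small Python sketch. The sample below is hypothetical: it reuses URNs shown earlier on this page, and the foaf:name child and the overall layout are assumptions about what a response might look like, not a copy of real output. The function name summarize is also made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OM = "http://purl.oreilly.com/ns/meta/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical response shape, built from URNs that appear on this page.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:om="http://purl.oreilly.com/ns/meta/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <om:Product rdf:about="urn:x-domain:oreilly.com:product:9780596521677.BOOK"/>
  <om:Product rdf:about="urn:x-domain:oreilly.com:product:9780596153960.EBOOK"/>
  <om:Product rdf:about="urn:x-domain:oreilly.com:product:9780596521677.SAF"/>
  <foaf:Person rdf:about="urn:x-domain:oreilly.com:agent:pdb:350">
    <foaf:name>David Pogue</foaf:name>
  </foaf:Person>
</rdf:RDF>
"""


def summarize(rdf_xml):
    """Return (product URNs, {person URN: name}) found anywhere in the document."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % RDF
    products = [p.get(about) for p in root.iter("{%s}Product" % OM)]
    people = {
        person.get(about): person.findtext("{%s}name" % FOAF)
        for person in root.iter("{%s}Person" % FOAF)
    }
    return products, people


products, people = summarize(SAMPLE)
print(len(products), "products,", len(people), "person record(s)")
```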

If you're frightened by RDF, we have a few links that might help. The RDF Primer from the W3C is an excellent, if dense, guide to getting started with RDF. Our own Practical RDF was read extensively by our developers while working on our applications. We've posted the second chapter, RDF: Heart and Soul, an overview of RDF, to get your feet wet. For more advanced topics, including reasoning and the use of OWL in conjunction with RDF, we found Semantic Web for the Working Ontologist to be invaluable.

Happily, there are a large number of open source tools for working with RDF. Some of the ones we use here at O'Reilly are:

  • The Tabulator Extension, a Firefox extension that allows for good visualization and browsing of RDF data.
  • The Jena framework, for our Java-based applications.
  • RDFLib, for our Python-based applications.

There's a lot more we'll be doing here to both provide more data and to add some human-friendly views into the data, but we wanted to let this out in the wild, if a bit unpolished, in time with the 2009 TOC Conference. If you're familiar with XML and RDF and don't mind poking around among the angle brackets, we'd love to hear what you come up with!

We've described a very simple use in O'Reilly Product Metadata Interface (OPMI) Usage At Tweet Length.

Stay tuned to the O'Reilly Labs blog for updates and more information on experimental projects from O'Reilly.

Projects

Bookworm

The free platform for reading EPUB books online from any device.

Integrated with O'Reilly Labs 02/09/09.

First translations added 03/11/09.

Feedbooks integration & one-click addition added 07/29/09.

Beta Projects

Open Feedback Publishing System (OFPS)

Participate in collaborative community feedback to help refine in-progress, open manuscripts like Building iPhone Apps with HTML, CSS, and JavaScript or the published Programming Scala.

Released 05/20/09.

O'Reilly Product Metadata Interface (OPMI)

Want to know all we know about an O'Reilly book? Give us an ISBN and we'll let you in on our (RDF) secrets!

Released 02/09/09.

Open Source

DocBook-XSL 1.74.3 with Improved ePub Output

Keith Fahlgren (O'Reilly Media) helped ship the stable 1.74.3 release of the open source DocBook-XSL project and improved the EPUB generation stylesheets. Paul Norton (Adobe) and Liza Daly (Threepress) provided very helpful patches.

Released 02/17/09.

DocBook-XSL 1.74.0 EPUB Output

Paul Norton (Adobe) and Keith Fahlgren (O'Reilly Media) have contributed code to the 1.74.0 release of the open source DocBook-XSL project that generates EPUB documents from DocBook. An alpha-quality reference implementation in Ruby has also been provided.

EPUB is an open standard of the International Digital Publishing Forum (IDPF) and something O'Reilly is trying to help gain wider adoption.

Released 06/02/08.