Overview of OAI-PMH and support in Samvera

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) facilitates the sharing of metadata via an endpoint, or base URL, that exposes metadata to harvesting by HTTP requests. Librarians working with metadata rely on OAI-PMH to make structured data available to a variety of platforms with differing requirements, so flexibility in how data can be structured via OAI-PMH is key. Current use cases for OAI-PMH include, but are not limited to, sharing metadata with DPLA, EBSCO Discovery Service, WorldCat’s Digital Collections Gateway, and local catalogs (Primo VE), and importing metadata and files with Bulkrax.

Although support for OAI-PMH isn’t present by default in all Samvera repository solutions, it can be configured for use in Hyrax using the BlacklightOAIProvider plug-in and the ruby-oai library. The current version of Hyku (version 5) has Blacklight OAI built into it. Note that the only metadata format supported out of the box by this plug-in is Dublin Core (metadata prefix oai_dc). Other established and custom metadata formats can be configured. There is no OAI-PMH support in Avalon.

Configure OAI with Hyrax

Detailed step-by-step instructions for creating an OAI-PMH service endpoint in a Hyrax application can be found in the README.md file of the BlacklightOAIProvider plug-in's GitHub repository.

As indicated in the README, customization of OAI-PMH provider parameters is done in the catalog controller (app/controllers/catalog_controller.rb). The OAI-PMH endpoint URL receives requests from metadata harvesters as HTTP GET or POST requests carrying key=value arguments, then sends back the response serialized as XML. To customize the fields that appear in the XML, define field_semantics in the Solr document model (app/models/solr_document.rb).

Configure OAI with Hyku

The current version of Hyku has Blacklight OAI built into it. As in Hyrax, it is configured in the catalog controller (app/controllers/catalog_controller.rb), with metadata field specifics in the field_semantics section of the Solr document model (app/models/solr_document.rb).

...

Example of custom metadata format display:

...

OAI-PMH Feed Display & Queries

The OAI-PMH feed can be accessed by appending /catalog/oai to the base URL of your Samvera-based app (e.g. [URL]/catalog/oai).*

...

As per the OAI-PMH specification, there are various queries that can be performed on the records in your Samvera repository. The string [URL] displayed in these examples should be replaced with the repository’s URL when following the query’s syntax.

Identify

To view general information about your OAI-PMH feed, including your Request URL, click the Identify link or enter the following query.

[URL]/catalog/oai?verb=Identify
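
The queries in this section all share the same shape: the feed path followed by a verb and optional arguments. As a minimal sketch of that pattern, the helper below assembles such a URL. The function name oai_url and the host repo.example.org are illustrative assumptions, not part of any Samvera code.

```python
from urllib.parse import urlencode


def oai_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for a feed mounted at /catalog/oai."""
    # Skip arguments that were not supplied so the query string stays minimal.
    query = {"verb": verb, **{k: v for k, v in params.items() if v is not None}}
    return f"{base_url.rstrip('/')}/catalog/oai?{urlencode(query)}"


# repo.example.org is a placeholder; substitute your repository's URL.
print(oai_url("https://repo.example.org", "Identify"))
# https://repo.example.org/catalog/oai?verb=Identify
print(oai_url("https://repo.example.org", "ListIdentifiers", metadataPrefix="oai_dc"))
# https://repo.example.org/catalog/oai?verb=ListIdentifiers&metadataPrefix=oai_dc
```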

List records

To view all records in the repository, click the ListRecords link or enter the following query. 

...

[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_hyku
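
A harvester usually reads the XML response programmatically rather than in a browser. The sketch below pulls the header of each record (identifier, datestamp, set membership) out of a ListRecords response. Record headers are defined by the OAI-PMH specification and do not depend on the metadata prefix. The sample response, its identifier, and its set name are made up for illustration, and the function name record_headers is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, heavily trimmed response; real feeds also carry
# responseDate, request, and a metadata element per record.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example:abc123</identifier>
        <datestamp>2023-01-15T12:00:00Z</datestamp>
        <setSpec>example_set</setSpec>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def record_headers(xml_text):
    """Return one dict per record header found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            "sets": [s.text for s in header.findall("oai:setSpec", OAI_NS)],
        })
    return headers


for entry in record_headers(SAMPLE):
    print(entry["identifier"], entry["datestamp"], entry["sets"])
```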

List sets

Whether the repository uses collections, admin sets, or both, records can be queried by set. For a list of the sets, click the ListSets link or enter the following query.

[URL]/catalog/oai?verb=ListSets
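
The setSpec values returned by ListSets are the strings a harvester later passes in the set argument, so it can be handy to extract them. The following is a rough sketch that maps each setSpec to its setName. The sample response and the set values in it are invented for illustration, and real set identifiers depend on how a given repository is configured.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed ListSets response.
SAMPLE_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>example_a</setSpec><setName>Example Collection A</setName></set>
    <set><setSpec>example_b</setSpec><setName>Example Collection B</setName></set>
  </ListSets>
</OAI-PMH>
"""


def parse_sets(xml_text):
    """Map each setSpec in a ListSets response to its human-readable setName."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext("oai:setSpec", namespaces=OAI_NS): s.findtext("oai:setName", namespaces=OAI_NS)
        for s in root.iterfind(".//oai:ListSets/oai:set", OAI_NS)
    }


for spec, name in parse_sets(SAMPLE_SETS).items():
    print(spec, "->", name)
```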

List metadata formats

To view all the metadata formats in the OAI feed, click the link for ListMetadataFormats or enter the following query.

...

Unless other metadata formats have been configured, the default metadata prefix is oai_dc.

List Identifiers

To view all identifiers in the repository, click the ListIdentifiers link or enter the following query.

...

[URL]/catalog/oai?verb=ListIdentifiers&metadataPrefix=oai_dc

List records by set

Sets are configured locally by each institution, so set identifiers vary by implementation. To view all records in a set, start from the ListSets link or enter a query using the syntax below, substituting the set identifier used in your implementation.

...

[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc&set=unit:[internal identifier]

Resumption Tokens

OAI-PMH feeds typically contain a large amount of data, so list responses are returned in batches. For most queries, you will need resumption tokens to retrieve the entire data set: use the Resume link, or copy the resumptionToken string from the response into the next query to continue pulling results.
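
Copying tokens by hand gets tedious for large feeds, so harvesters normally loop until no token is returned. The sketch below does this for ListIdentifiers, assuming the endpoint follows the OAI-PMH 2.0 rules that a follow-up request carries only the verb and the resumptionToken, and that an absent or empty resumptionToken element marks the last batch. The names http_get and harvest_identifiers are illustrative, and the fetch function is injectable so the loop can be tried without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Clark-notation prefix for the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(endpoint, metadata_prefix="oai_dc", fetch_page=http_get):
    """Yield every record identifier, following resumption tokens to the end."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch_page(f"{endpoint}?{urlencode(params)}"))
        for identifier in root.iter(f"{OAI}identifier"):
            yield identifier.text
        token = root.find(f".//{OAI}resumptionToken")
        # A missing or empty token means this was the final batch.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests send only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
```

With a real feed, the call would look like list(harvest_identifiers("[URL]/catalog/oai")), with [URL] replaced by the repository's URL.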

Challenges

The out-of-the-box OAI service defaults to Dublin Core. Additional metadata fields will not show up in the Blacklight feed unless custom development work is done. In Hyku, SoftServ has added the oai_hyku prefix, and the code simplifies the process of adding further customized metadata fields. Documentation on how to map to schemas other than DC is lacking.

...

Some version upgrades are not backwards compatible with existing OAI-PMH code. For example, code written for Hyrax version 2 is incompatible with Hyrax version 3.

Examples in the Samvera community

Oregon Digital:

https://github.com/OregonDigital/oregondigital/wiki/OAI-Documentation  

...