OAI-PMH Documentation

Overview of OAI-PMH and support in Samvera

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) facilitates the sharing of metadata via an endpoint, or base URL, that exposes metadata for harvesting through HTTP requests. Librarians working with metadata rely on OAI-PMH to make structured data available to a variety of platforms with differing requirements, so flexibility in how data can be structured via OAI-PMH is key. Current use cases for OAI-PMH include, but are not limited to, sharing metadata with DPLA, EBSCO Discovery Service, WorldCat’s Digital Collections Gateway, and local catalogs (Primo VE), as well as importing metadata and files with Bulkrax.

Although support for OAI-PMH isn’t present by default in all Samvera repository solutions, it can be configured for use in Hyrax using the BlacklightOAIProvider plug-in and the ruby-oai library. The current version of Hyku (version 5) has Blacklight OAI built into it. Note that the only metadata format supported out of the box by this plug-in is Dublin Core (metadata prefix oai_dc). Other established and custom metadata formats can be configured. There is no OAI-PMH support in Avalon.

Configure OAI with Hyrax

Detailed step-by-step instructions for creating an OAI-PMH service endpoint in a Hyrax application can be found in the README.md file of the BlacklightOAIProvider plug-in's GitHub repository.

As indicated in the README, customization of OAI-PMH provider parameters is done in the catalog controller (app/controllers/catalog_controller.rb). The OAI-PMH endpoint URL receives requests from metadata harvesters as HTTP GET or POST requests carrying key=value arguments, then sends back information serialized as XML. To customize how metadata fields appear in the XML, define field_semantics in the Solr document (app/models/solr_document.rb).

Configure OAI with Hyku

The current version of Hyku has Blacklight OAI built into it. As in the Hyrax instructions, it is configured in the catalog controller (app/controllers/catalog_controller.rb), with the metadata field specifics in the field_semantics section of the Solr document (app/models/solr_document.rb).

Samvera defaults to oai_dc as its metadata format in the OAI-PMH feed. However, many repositories have custom formats. SoftServ has implemented code in Hyku for easier customization of OAI-PMH data (https://playbook-staging.notch8.com/en/samvera/oai-feeds). For example, Hyku for Consortia uses the custom metadata prefix oai_hyku in its Hyku Commons repository. This custom prefix has been added to Hyku. See the pull request: https://github.com/samvera/hyku/pull/1934/files

The standard OAI XSLT transformation does not fully support displaying custom metadata prefixes (see the example below). However, the raw XML remains available: right-click the page and choose View Page Source. The full XML for a custom prefix, including the custom header specifications and custom encodings, can therefore be retrieved relatively easily, or a developer can configure the display. The information in the feed can be generated from a Solr query.

Example of custom metadata format display:

Screen grab of an OAI 2.0 Request Results screen for customized Hyku metadata. The image shows that the OAI XSLT transformation had a problem converting the OAI feed and displays some records with the message “Unknown Metadata Format.”

OAI-PMH Feed Display & Queries

The OAI-PMH feed can be accessed by appending /catalog/oai to the URL of your Samvera-based app (e.g. [URL]/catalog/oai).*

*Note: Because no verb is supplied in the request, the initial OAI-PMH feed page returns the error code “badVerb”.

This screen grab shows the OAI 2.0 Request Results page with an OAI Error Code of “badVerb.”
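A harvester sees the same error as XML rather than as a rendered page. The following is a minimal sketch of how a script might detect it, using only the Python standard library. The sample response is illustrative: the request URL and the error message text are assumptions, and only the error code “badVerb” comes from the page described above.

```python
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative response for a request sent without a verb argument.
# The URL and message text are made up for this example.
SAMPLE_ERROR_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request>https://repository.example.edu/catalog/oai</request>
  <error code="badVerb">Illegal or missing OAI verb</error>
</OAI-PMH>
""".strip()


def oai_errors(xml_text):
    """Return a list of (code, message) pairs for each error element."""
    root = ET.fromstring(xml_text)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall("oai:error", OAI_NS)
    ]


for code, message in oai_errors(SAMPLE_ERROR_RESPONSE):
    print(f"{code}: {message}")
```

A response without error elements yields an empty list, so the same check can be run on every response before its records are processed.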

As per the OAI-PMH specification, there are various queries that can be performed on the records in your Samvera repository. The string [URL] displayed in these examples should be replaced with the repository’s URL when following the query’s syntax.
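For scripted harvesting, the queries in the following sections can be assembled programmatically. This is a minimal sketch in Python, assuming the /catalog/oai path used throughout this page; the function name build_oai_url and the host repository.example.edu are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Path used by the examples in this document; some repositories differ.
OAI_PATH = "/catalog/oai"


def build_oai_url(repository_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    # Drop a trailing slash so the path is not doubled up.
    endpoint = repository_url.rstrip("/") + OAI_PATH
    # The verb goes first, followed by any further arguments.
    params = {"verb": verb, **arguments}
    return endpoint + "?" + urlencode(params)


print(build_oai_url("https://repository.example.edu", "Identify"))
print(build_oai_url("https://repository.example.edu", "ListRecords",
                    metadataPrefix="oai_dc"))
```

Passing the arguments through urlencode rather than concatenating strings means that any special characters in argument values are percent-encoded automatically.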

Identify

To view general information about your OAI-PMH feed, including your Request URL, click the Identify link or enter the following query.

[URL]/catalog/oai?verb=Identify

List records

To view all records in the repository, click the ListRecords link or enter the following query. 

[URL]/catalog/oai?verb=ListRecords

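If the repository offers more than one metadata prefix, specify the prefix explicitly, as in the following queries. Note that the OAI-PMH specification treats metadataPrefix as a required argument for ListRecords, so harvesters will generally send it.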

[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc

[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_hyku

List sets

Records can be queried by set, whether the repository exposes collections, admin sets, or both as sets. For a list of the sets, click the ListSets link or enter the following query.

[URL]/catalog/oai?verb=ListSets
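The ListSets response pairs each setSpec (the value used in queries) with a human-readable setName. As a rough sketch, the pairs can be pulled out with the Python standard library. The sample XML below is invented for illustration, and its set specs and names are assumptions rather than values from any real repository.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative ListSets response; the sets shown here are hypothetical.
SAMPLE_LIST_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>collection:example-collection</setSpec>
      <setName>Example Collection</setName>
    </set>
    <set>
      <setSpec>collection:another-collection</setSpec>
      <setName>Another Collection</setName>
    </set>
  </ListSets>
</OAI-PMH>
""".strip()


def list_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = node.findtext("oai:setName", default="", namespaces=OAI_NS)
        pairs.append((spec, name))
    return pairs


for spec, name in list_sets(SAMPLE_LIST_SETS):
    print(f"{spec}\t{name}")
```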

List metadata formats

To view all the metadata formats in the OAI feed, click the link for ListMetadataFormats or enter the following query.

[URL]/catalog/oai?verb=ListMetadataFormats

Unless other metadata formats have been configured, the default metadata prefix is oai_dc.
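Before harvesting with a custom prefix, a script can confirm that the feed actually offers it. The sketch below reads the prefixes from a ListMetadataFormats response using the Python standard library. The sample XML is illustrative, and the schema and namespace URLs given for oai_hyku are placeholders, not the real values.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative response. The oai_hyku schema and namespace are placeholders.
SAMPLE_FORMATS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_hyku</metadataPrefix>
      <schema>https://example.org/oai_hyku.xsd</schema>
      <metadataNamespace>https://example.org/oai_hyku/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
""".strip()


def metadata_prefixes(xml_text):
    """Return the metadata prefixes advertised by the feed, in order."""
    root = ET.fromstring(xml_text)
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [(node.text or "").strip() for node in root.findall(path, OAI_NS)]


prefixes = metadata_prefixes(SAMPLE_FORMATS)
# Fall back to Dublin Core when the custom prefix is not offered.
chosen = "oai_hyku" if "oai_hyku" in prefixes else "oai_dc"
print(prefixes, "->", chosen)
```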

List identifiers

To view all identifiers in the repository, click the ListIdentifiers link or enter the following query.

[URL]/catalog/oai?verb=ListIdentifiers

[URL]/catalog/oai?verb=ListIdentifiers&metadataPrefix=oai_dc

List records by set

Sets are configured locally by each institution. To view the records in a set, click the ListSets link to find the set, or enter a query that follows the set syntax used by the particular implementation, as in the following examples.

[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc&set=collection:[collection name]

[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc&set=unit:[internal identifier]
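Set specs often contain characters such as colons or spaces that should be percent-encoded when the query is sent from a script. A minimal Python sketch follows, in which the function name records_by_set_url, the endpoint, and the set name are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def records_by_set_url(oai_endpoint, set_spec, metadata_prefix="oai_dc"):
    """Return a ListRecords URL limited to one set, safely encoded."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,  # urlencode escapes ':' and spaces here
    })
    return f"{oai_endpoint}?{query}"


url = records_by_set_url(
    "https://repository.example.edu/catalog/oai",
    "collection:Example Collection",
)
print(url)

# Decoding the query string recovers the original set spec.
print(parse_qs(urlsplit(url).query)["set"][0])
```

The encoded form looks different from the hand-typed examples above, but the server decodes it back to the same set value.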

Resumption Tokens

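OAI-PMH feeds typically contain a large amount of data, so responses are returned in batches. For most queries, you will need resumption tokens to retrieve the entire data set. Use the Resume link, or copy the resumptionToken string from the response and pass it as the resumptionToken argument of the next query (for example, [URL]/catalog/oai?verb=ListRecords&resumptionToken=[token]) to continue pulling results.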
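The copy-and-paste cycle can be automated. The following is a minimal sketch of a harvesting loop in Python that keeps requesting pages until no resumption token is returned. It assumes a ListRecords harvest and uses only the standard library; the function names and the example endpoint are hypothetical, and error handling and rate limiting are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch(url):
    """Download one OAI-PMH response and return its body as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(oai_endpoint, metadata_prefix="oai_dc", fetch_page=fetch):
    """Yield every record identifier, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        page = fetch_page(oai_endpoint + "?" + urlencode(params))
        root = ET.fromstring(page)
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            yield header.findtext("oai:identifier", namespaces=OAI_NS)
        token = root.findtext(
            ".//oai:resumptionToken", default="", namespaces=OAI_NS
        ).strip()
        if not token:
            break  # a missing or empty token marks the last page
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Example use (performs network requests, so it is left commented out):
# for identifier in harvest_identifiers("https://repository.example.edu/catalog/oai"):
#     print(identifier)
```

The fetch_page parameter exists so that the loop can be tried against saved responses before it is pointed at a live feed.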

Challenges

The out-of-the-box OAI service defaults to Dublin Core. Additional metadata fields will not show up in the Blacklight feed unless custom development work is done. In Hyku, the oai_hyku prefix has been added by SoftServ, and the code simplifies the process of adding further customized metadata fields. Documentation on how to map to schemas other than Dublin Core is lacking.

When using the oai_dc feed, thumbnails are not included. Development work is needed to pull thumbnails into the feed. Among Hyrax repositories, Lafayette College and The Ohio State University Libraries provide the thumbnail link in the dc:identifier field as [repository url]/downloads/[file id]?file=thumbnail. The link is built by pulling thumbnail_path_ss from Solr.

Example: https://library.osu.edu/dc/downloads/7h149s49p?file=thumbnail 
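On the harvesting side, a thumbnail delivered this way has to be picked out from the other dc:identifier values. Here is a minimal Python sketch that relies on the ?file=thumbnail suffix described above. The sample record and its URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Clark-notation tag for dc:identifier in the Dublin Core namespace.
DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"

# Illustrative oai_dc record; the title and URLs are hypothetical.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example item</dc:title>
  <dc:identifier>https://repository.example.edu/concern/works/abc123</dc:identifier>
  <dc:identifier>https://repository.example.edu/downloads/abc123?file=thumbnail</dc:identifier>
</oai_dc:dc>
""".strip()


def thumbnail_links(xml_text):
    """Return dc:identifier values that point at a thumbnail download."""
    root = ET.fromstring(xml_text)
    values = [(node.text or "").strip() for node in root.iter(DC_IDENTIFIER)]
    return [value for value in values if value.endswith("?file=thumbnail")]


print(thumbnail_links(SAMPLE_RECORD))
```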

In Hyku, Hyku for Consortia/Hyku Commons provides the thumbnail link in the thumbnail_url field of its custom oai_hyku format.

Example: https://au-archives.hykucommons.org/downloads/5d20d75f-0744-4f81-bcb0-998b1ff08e70?file=thumbnail

Some version upgrades are not backwards compatible with existing OAI-PMH code. For example, code written for Hyrax version 2 is incompatible with Hyrax version 3.

Examples in the Samvera community


Oregon Digital:

https://github.com/OregonDigital/oregondigital/wiki/OAI-Documentation  

Base URL: http://oregondigital.org/catalog/oai?verb=Identify

Note: The GitHub documentation is out of date but provides other useful insights.


Hyku Commons (Hyku):

Accessing the hyku OAI feed

Base URL: https://TENANT.hykucommons.org/catalog/oai?verb=Identify

Example: https://au-archives.hykucommons.org/catalog/oai?verb=Identify


Shared Research Repository (Hyku)

Base URL: https://iro.bl.uk/catalog/oai?verb=Identify


Carleton University (Hyrax)

Base URL: https://repository.library.carleton.ca/catalog/oai?verb=Identify


Duke University (Hyrax)

Base URL: https://research.repository.duke.edu/catalog/oai?verb=Identify


Lafayette College (Hyrax)

Base URL: https://ldr.lafayette.edu/catalog/oai?verb=Identify


National Institute for Materials Science (Hyrax)

Base URL: https://mdr.nims.go.jp/catalog/oai?verb=Identify


The Ohio State University Libraries (Hyrax)

Base URL: https://library.osu.edu/dc/api/oai?verb=Identify

Note: This repository uses api rather than catalog in the repository’s URL.


Northwestern University Libraries (custom application):

Base URL: https://api.dc.library.northwestern.edu/api/v2/oai?verb=Identify

Note: Implemented as an AWS Lambda function using the Node.js runtime, as part of Northwestern's public digital collections API: https://github.com/nulib/dc-api-v2/blob/main/src/handlers/oai.js


University of North Carolina at Chapel Hill (Hyrax)

Base URL: https://cdr.lib.unc.edu/catalog/oai?verb=Identify