OAI-PMH Documentation
Overview of OAI-PMH and support in Samvera
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) facilitates the sharing of metadata through an endpoint, or base URL, that exposes metadata for harvesting via HTTP requests. Librarians working with metadata rely on OAI-PMH to make structured data available to a variety of platforms with differing requirements, so flexibility in how data can be structured via OAI-PMH is key. Current use cases for OAI-PMH include, but are not limited to, sharing metadata with DPLA, EBSCO Discovery Service, WorldCat’s Digital Collections Gateway, and local catalogs (Primo VE), and importing metadata and files with Bulkrax.
Although OAI-PMH support is not built into every Samvera repository solution, it can be configured for use in Hyrax with the BlacklightOAIProvider plug-in and the ruby-oai library. The current version of Hyku (version 5) has Blacklight OAI built in. Note that the only metadata prefix this plug-in supports out of the box is Dublin Core (oai_dc); other established and custom metadata formats can be configured. There is no OAI-PMH support in Avalon.
Configure OAI with Hyrax
Detailed step-by-step instructions for creating an OAI-PMH service endpoint in a Hyrax application can be found in the README.md file of the BlacklightOAIProvider plug-in's GitHub repository.
As indicated in the README, OAI-PMH provider parameters are customized in the catalog controller (app/controllers/catalog_controller.rb). The OAI-PMH endpoint receives requests from metadata harvesters as HTTP GET or POST requests whose arguments are key=value pairs (for example, verb and metadataPrefix), and returns its responses serialized as XML. To customize how fields appear in the XML, define the field_semantics in the Solr document (app/models/solr_document.rb).
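As a rough orientation, the catalog controller configuration might look like the fragment below, modeled on the blacklight_oai_provider README. The repository name, URLs, email address, sample_id, and the Solr field used to build sets (member_of_collections_ssim) are placeholder assumptions, and option names can change between gem versions, so check the README of the version you install.

```ruby
# app/controllers/catalog_controller.rb (fragment; all values are placeholders)
class CatalogController < ApplicationController
  include BlacklightOaiProvider::Controller

  configure_blacklight do |config|
    config.oai = {
      provider: {
        repository_name: 'Example Repository',
        repository_url: 'https://repository.example.edu/catalog/oai',
        record_prefix: 'oai:repository.example.edu',
        admin_email: 'admin@example.edu',
        sample_id: 'abc123'
      },
      document: {
        limit: 25, # records per response before a resumptionToken is issued
        # Assumption: expose collections as OAI sets via a Hyrax Solr field.
        set_fields: [
          { label: 'collection', solr_field: 'member_of_collections_ssim' }
        ]
      }
    }
  end
end
```

This is a configuration fragment, not a standalone program; it only runs inside a Hyrax application with the gem installed.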
Configure OAI with Hyku
The current version of Hyku has Blacklight OAI built into it. As with Hyrax, it is configured in the catalog controller (app/controllers/catalog_controller.rb), with metadata field specifics in the field_semantics section of the Solr document (app/models/solr_document.rb).
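The field_semantics mapping can be sketched as follows. This assumes the BlacklightOaiProvider::SolrDocument mixin described in the plug-in README and Blacklight's Dublin Core extension; the Solr field names are typical Hyrax names chosen for illustration and may differ in a given Hyrax or Hyku application, whose generated solr_document.rb may already contain some of these lines.

```ruby
# app/models/solr_document.rb (fragment; Solr field names are illustrative)
class SolrDocument
  include Blacklight::Solr::Document
  include BlacklightOaiProvider::SolrDocument

  # Assembles the oai_dc record from the semantic mappings below.
  use_extension(Blacklight::Document::DublinCore)

  # Dublin Core element => Solr stored field
  field_semantics.merge!(
    title: 'title_tesim',
    creator: 'creator_tesim',
    subject: 'subject_tesim',
    date: 'date_created_tesim'
  )
end
```

Each key becomes a dc: element in the oai_dc output, populated from the named Solr field of the record.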
Samvera applications default to oai_dc as the metadata format in the OAI-PMH feed. However, many repositories use custom formats. SoftServ has implemented code in Hyku for easier customization of OAI-PMH data (OAI Feeds). For example, Hyku for Consortia uses the custom metadata prefix oai_hyku in its Hyku Commons repository. This custom prefix has been added to Hyku; see the pull request: Extended oai by sephirothkod · Pull Request #1934 · samvera/hyku
The standard OAI XSLT stylesheet, which renders the feed in a browser, does not fully support displaying custom metadata prefixes (see the example below). To see the raw XML, right-click the page and select View Page Source. All of the XML for the custom prefixes, including the custom header specifications and encodings, can be retrieved this way with little effort, or a developer can configure that output. The feed content is generated from a Solr query.
Example of custom metadata format display:
OAI-PMH Feed Display & Queries
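The OAI-PMH feed can be accessed by appending /catalog/oai to the URL of your Samvera-based application (e.g. [URL]/catalog/oai).*
*Note: Opening the base URL by itself returns the error code “badVerb”. This is expected: every OAI-PMH request must include a verb argument, and the specification defines badVerb as the response when the verb is missing or illegal.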
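Because an OAI-PMH error such as badVerb comes back as a well-formed XML document, a harvesting script should look for error elements rather than rely on the HTTP status alone. A minimal sketch using Ruby's REXML library; the method name oai_errors and the sample response are hypothetical, and the wording of the error message varies by provider.

```ruby
require 'rexml/document'

OAI = { 'oai' => 'http://www.openarchives.org/OAI/2.0/' }.freeze

# Returns [code, message] pairs for every <error> element in an OAI-PMH
# response; an empty array means no OAI-PMH error was reported.
def oai_errors(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//oai:error', OAI).map do |err|
    [err.attributes['code'], err.text.to_s.strip]
  end
end

# Hypothetical response to a request that carries no verb argument.
no_verb_response = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <responseDate>2024-01-01T00:00:00Z</responseDate>
    <request>https://repository.example.edu/catalog/oai</request>
    <error code="badVerb">Illegal OAI verb</error>
  </OAI-PMH>
XML

p oai_errors(no_verb_response) # => [["badVerb", "Illegal OAI verb"]]
```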
The OAI-PMH specification defines several requests (verbs) that can be performed on the records in your Samvera repository. Replace the string [URL] in these examples with the repository’s URL when following the query syntax.
Identify
To view general information about your OAI-PMH feed, including your Request URL, click the Identify link or enter the following query.
[URL]/catalog/oai?verb=Identify
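The Identify response carries values that harvesters such as DPLA typically ask for, notably baseURL, and the granularity that governs date-based queries. A minimal sketch that pulls a few of these fields out with REXML; the method name and the sample response (including its repository name and dates) are hypothetical.

```ruby
require 'rexml/document'

OAI = { 'oai' => 'http://www.openarchives.org/OAI/2.0/' }.freeze

# Extracts a few commonly checked fields from an Identify response.
def parse_identify(xml)
  doc = REXML::Document.new(xml)
  %w[repositoryName baseURL protocolVersion earliestDatestamp granularity].to_h do |name|
    node = REXML::XPath.first(doc, "//oai:Identify/oai:#{name}", OAI)
    [name, node && node.text]
  end
end

# Hypothetical response; a real one comes from [URL]/catalog/oai?verb=Identify.
identify_response = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <responseDate>2024-01-01T00:00:00Z</responseDate>
    <request verb="Identify">https://repository.example.edu/catalog/oai</request>
    <Identify>
      <repositoryName>Example Repository</repositoryName>
      <baseURL>https://repository.example.edu/catalog/oai</baseURL>
      <protocolVersion>2.0</protocolVersion>
      <adminEmail>admin@example.edu</adminEmail>
      <earliestDatestamp>2020-01-01T00:00:00Z</earliestDatestamp>
      <deletedRecord>no</deletedRecord>
      <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
    </Identify>
  </OAI-PMH>
XML

info = parse_identify(identify_response)
puts info['baseURL'] # => https://repository.example.edu/catalog/oai
```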
List records
To view all records in the repository, click the ListRecords link or enter the following query.
[URL]/catalog/oai?verb=ListRecords
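The OAI-PMH specification makes metadataPrefix a required argument for ListRecords. The provider may fall back to oai_dc when it is omitted, but if the repository offers more than one metadata prefix, specify the prefix explicitly, as in the following queries.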
[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc
[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_hyku
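To see what an oai_dc ListRecords page contains, the sketch below turns each record into its OAI identifier plus a hash of Dublin Core element names to values. It assumes the standard oai_dc and dc namespaces; the method name and the sample record are hypothetical, and a custom prefix such as oai_hyku would need its own namespace and element handling.

```ruby
require 'rexml/document'

NS = {
  'oai' => 'http://www.openarchives.org/OAI/2.0/',
  'oai_dc' => 'http://www.openarchives.org/OAI/2.0/oai_dc/',
  'dc' => 'http://purl.org/dc/elements/1.1/'
}.freeze

# Turns each <record> of an oai_dc ListRecords page into
# { identifier: '...', dc: { 'title' => [...], 'subject' => [...] } }.
def parse_oai_dc_records(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//oai:ListRecords/oai:record', NS).map do |record|
    identifier = REXML::XPath.first(record, 'oai:header/oai:identifier', NS).text
    dc = {}
    # Deleted records carry a header but no <metadata>, so this may be nil.
    container = REXML::XPath.first(record, 'oai:metadata/oai_dc:dc', NS)
    container&.each_element do |el|
      next unless el.namespace == NS['dc'] # keep only Dublin Core elements
      (dc[el.name] ||= []) << el.text.to_s.strip
    end
    { identifier: identifier, dc: dc }
  end
end

# Hypothetical one-record page.
list_records_response = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <ListRecords>
      <record>
        <header>
          <identifier>oai:repository.example.edu:abc123</identifier>
          <datestamp>2024-01-01T00:00:00Z</datestamp>
        </header>
        <metadata>
          <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                     xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:title>Sample title</dc:title>
            <dc:creator>Doe, Jane</dc:creator>
            <dc:subject>Maps</dc:subject>
            <dc:subject>Ohio</dc:subject>
          </oai_dc:dc>
        </metadata>
      </record>
    </ListRecords>
  </OAI-PMH>
XML

records = parse_oai_dc_records(list_records_response)
p records.first[:dc]['subject'] # => ["Maps", "Ohio"]
```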
List sets
If the repository exposes collections and/or admin sets as OAI sets, records can be queried by set. For a list of the sets, click the ListSets link or enter the following query.
[URL]/catalog/oai?verb=ListSets
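Each set in the ListSets response pairs a setSpec, the value used in a set= query, with a human-readable setName. A minimal sketch that maps one to the other; the method name and the set values in the sample are hypothetical, since sets are configured locally.

```ruby
require 'rexml/document'

OAI = { 'oai' => 'http://www.openarchives.org/OAI/2.0/' }.freeze

# Maps each setSpec (the value used in set=...) to its setName.
def parse_sets(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//oai:ListSets/oai:set', OAI).to_h do |set|
    spec = REXML::XPath.first(set, 'oai:setSpec', OAI).text
    name = REXML::XPath.first(set, 'oai:setName', OAI)
    [spec, name && name.text]
  end
end

# Hypothetical ListSets response.
list_sets_response = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <ListSets>
      <set><setSpec>collection:maps</setSpec><setName>Historic Maps</setName></set>
      <set><setSpec>collection:photographs</setSpec><setName>Photographs</setName></set>
    </ListSets>
  </OAI-PMH>
XML

sets = parse_sets(list_sets_response)
puts sets['collection:maps'] # => Historic Maps
```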
List metadata formats
To view all the metadata formats in the OAI feed, click the link for ListMetadataFormats or enter the following query.
[URL]/catalog/oai?verb=ListMetadataFormats
Unless other metadata formats have been configured, the default metadata prefix is oai_dc.
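A harvesting script can read ListMetadataFormats first and pick a prefix from what the repository actually advertises. A minimal sketch; the method name is hypothetical, and the second format in the sample (my_custom, with example.edu schema and namespace URLs) is a placeholder standing in for a locally configured prefix.

```ruby
require 'rexml/document'

OAI = { 'oai' => 'http://www.openarchives.org/OAI/2.0/' }.freeze

# Returns { metadataPrefix => metadataNamespace } for every advertised format.
def metadata_formats(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//oai:ListMetadataFormats/oai:metadataFormat', OAI).to_h do |fmt|
    prefix = REXML::XPath.first(fmt, 'oai:metadataPrefix', OAI).text
    namespace = REXML::XPath.first(fmt, 'oai:metadataNamespace', OAI)
    [prefix, namespace && namespace.text]
  end
end

# Hypothetical response; the second format and its URLs are placeholders.
formats_response = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <ListMetadataFormats>
      <metadataFormat>
        <metadataPrefix>oai_dc</metadataPrefix>
        <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
        <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      </metadataFormat>
      <metadataFormat>
        <metadataPrefix>my_custom</metadataPrefix>
        <schema>https://repository.example.edu/schemas/my_custom.xsd</schema>
        <metadataNamespace>https://repository.example.edu/ns/my_custom/</metadataNamespace>
      </metadataFormat>
    </ListMetadataFormats>
  </OAI-PMH>
XML

formats = metadata_formats(formats_response)
# Prefer the custom prefix when the repository offers it, else fall back to oai_dc.
prefix = formats.key?('my_custom') ? 'my_custom' : 'oai_dc'
puts prefix # => my_custom
```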
List Identifiers
To view all identifiers in the repository, click the ListIdentifiers link or enter the following query.
[URL]/catalog/oai?verb=ListIdentifiers
[URL]/catalog/oai?verb=ListIdentifiers&metadataPrefix=oai_dc
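ListIdentifiers returns only record headers, which makes it a lighter way to inventory a repository than ListRecords. A minimal sketch that reads each header's identifier, datestamp, set memberships, and deletion flag (set with status="deleted" when the repository reports deletions); the Header struct, method name, and sample data are hypothetical.

```ruby
require 'rexml/document'

OAI = { 'oai' => 'http://www.openarchives.org/OAI/2.0/' }.freeze

Header = Struct.new(:identifier, :datestamp, :set_specs, :deleted)

# Reads the <header> elements of a ListIdentifiers page.
def parse_headers(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//oai:ListIdentifiers/oai:header', OAI).map do |h|
    Header.new(
      REXML::XPath.first(h, 'oai:identifier', OAI).text,
      REXML::XPath.first(h, 'oai:datestamp', OAI).text,
      REXML::XPath.match(h, 'oai:setSpec', OAI).map(&:text),
      h.attributes['status'] == 'deleted'
    )
  end
end

# Hypothetical page with one live and one deleted record.
list_identifiers_response = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <ListIdentifiers>
      <header>
        <identifier>oai:repository.example.edu:abc123</identifier>
        <datestamp>2024-01-01T00:00:00Z</datestamp>
        <setSpec>collection:maps</setSpec>
      </header>
      <header status="deleted">
        <identifier>oai:repository.example.edu:def456</identifier>
        <datestamp>2024-02-01T00:00:00Z</datestamp>
      </header>
    </ListIdentifiers>
  </OAI-PMH>
XML

headers = parse_headers(list_identifiers_response)
headers.each { |h| puts "#{h.identifier} deleted=#{h.deleted}" }
```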
List records by set
Sets are configured locally by each institution, so set names differ between implementations. To view all records in a set, use the ListSets link to find the set's identifier, then enter a query following the syntax of these examples.
[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc&set=collection:[collection name]
[URL]/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc&set=unit:[internal identifier]
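Set values such as collection names can contain spaces or other characters that must be encoded before they go into a URL. A minimal sketch using Ruby's standard URI.encode_www_form; the method name and the set value are hypothetical, and the set value should be copied from a setSpec in the ListSets response.

```ruby
require 'uri'

# Builds a ListRecords request limited to one set. encode_www_form
# percent-encodes reserved characters such as ':' and turns spaces into '+'.
def list_records_url(base_url, set_spec, metadata_prefix: 'oai_dc')
  query = URI.encode_www_form(verb: 'ListRecords', metadataPrefix: metadata_prefix, set: set_spec)
  "#{base_url}?#{query}"
end

url = list_records_url('https://repository.example.edu/catalog/oai', 'collection:maps')
puts url
# => https://repository.example.edu/catalog/oai?verb=ListRecords&metadataPrefix=oai_dc&set=collection%3Amaps
```

Encoding the colon is optional in a query string, but encoding every value consistently avoids surprises when a set name contains spaces or ampersands.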
Resumption Tokens
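OAI-PMH feeds typically contain a large amount of data, so list responses are returned in batches and most queries will need resumption tokens to retrieve the entire data set. Use the Resume link, or copy the resumptionToken string into the next query to continue pulling results. The token is an exclusive argument, so send it with the verb only (e.g. [URL]/catalog/oai?verb=ListRecords&resumptionToken=[token]); the list is complete when a response carries an empty resumptionToken or none at all.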
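The resumption-token loop can be sketched in a few lines. This is a minimal illustration, not a production harvester: the method name harvest is hypothetical, there is no retry or error handling, and the HTTP call is passed in as a callable (for example one wrapping Net::HTTP.get) so that the loop can be exercised against two hypothetical pages without a network.

```ruby
require 'rexml/document'
require 'uri'

OAI = { 'oai' => 'http://www.openarchives.org/OAI/2.0/' }.freeze

# Collects every page of a ListRecords or ListIdentifiers result by following
# resumption tokens. `fetch` takes a URL string and returns the response body,
# e.g. ->(url) { Net::HTTP.get(URI(url)) } after require 'net/http'.
def harvest(base_url, verb, fetch, metadata_prefix: 'oai_dc')
  pages = []
  query = { verb: verb, metadataPrefix: metadata_prefix }
  loop do
    body = fetch.call("#{base_url}?#{URI.encode_www_form(query)}")
    pages << body
    node = REXML::XPath.first(REXML::Document.new(body), '//oai:resumptionToken', OAI)
    token = node ? node.text.to_s.strip : ''
    break if token.empty? # empty or missing token: the list is complete
    # resumptionToken is exclusive: send it with the verb only.
    query = { verb: verb, resumptionToken: token }
  end
  pages
end

# Two hypothetical pages standing in for a live endpoint.
page1 = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <ListIdentifiers>
      <header><identifier>oai:repository.example.edu:abc123</identifier><datestamp>2024-01-01T00:00:00Z</datestamp></header>
      <resumptionToken completeListSize="2" cursor="0">page2token</resumptionToken>
    </ListIdentifiers>
  </OAI-PMH>
XML
page2 = <<~XML
  <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <ListIdentifiers>
      <header><identifier>oai:repository.example.edu:def456</identifier><datestamp>2024-01-02T00:00:00Z</datestamp></header>
      <resumptionToken completeListSize="2" cursor="1"/>
    </ListIdentifiers>
  </OAI-PMH>
XML

requested = []
fake_fetch = lambda do |url|
  requested << url
  url.include?('resumptionToken=') ? page2 : page1
end

pages = harvest('https://repository.example.edu/catalog/oai', 'ListIdentifiers', fake_fetch)
puts pages.size # => 2
puts requested
```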
Challenges
The out-of-the-box OAI service defaults to Dublin Core. Additional metadata fields will not show up in the Blacklight feed unless custom development work is done. In Hyku, SoftServ has added the oai_hyku prefix, and the code simplifies adding further customized metadata fields. Documentation on how to map to schemas other than Dublin Core is lacking.
When using the oai_dc feed, thumbnails are not included; development work is needed to pull thumbnails into the feed. For Hyrax, Lafayette College and The Ohio State University Libraries provide the thumbnail link in the dc:identifier field as [repository url/]downloads/[file id]?file=thumbnail, drawing on the thumbnail_path_ss field from Solr.
Example: https://library.osu.edu/dc/downloads/7h149s49p?file=thumbnail
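The thumbnail URL pattern above can be expressed as a small helper. The method name and signature are hypothetical; how the file id is obtained (for example from thumbnail_path_ss) and how the resulting value is added to dc:identifier differ between implementations and are not shown here.

```ruby
# Builds [repository url/]downloads/[file id]?file=thumbnail.
def thumbnail_url(repository_url, file_id)
  "#{repository_url.chomp('/')}/downloads/#{file_id}?file=thumbnail"
end

puts thumbnail_url('https://library.osu.edu/dc/', '7h149s49p')
# => https://library.osu.edu/dc/downloads/7h149s49p?file=thumbnail
```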
In Hyku, Hyku for Consortia/Hyku Commons provides the thumbnail link in its custom oai_hyku format in the thumbnail_url field (this is custom).
Example: https://au-archives.hykucommons.org/downloads/5d20d75f-0744-4f81-bcb0-998b1ff08e70?file=thumbnail
Some version upgrades are not backwards compatible with existing OAI-PMH code. For example, code written for Hyrax version 2 is incompatible with Hyrax version 3.
Examples in the Samvera community
Oregon Digital:
Base URL: http://oregondigital.org/catalog/oai?verb=Identify
Note: The GitHub documentation is out-of-date but provides other useful insights.
Hyku Commons (Hyku):
Base URL: https://TENANT.hykucommons.org/catalog/oai?verb=Identify
Example: https://au-archives.hykucommons.org/catalog/oai?verb=Identify
Shared Research Repository (Hyku)
Base URL: https://iro.bl.uk/catalog/oai?verb=Identify
Carleton University (Hyrax)
Base URL: https://repository.library.carleton.ca/catalog/oai?verb=Identify
Duke University (Hyrax)
Base URL: https://research.repository.duke.edu/catalog/oai?verb=Identify
Lafayette College (Hyrax)
Base URL: https://ldr.lafayette.edu/catalog/oai?verb=Identify
National Institute for Materials Science (Hyrax)
Base URL: https://mdr.nims.go.jp/catalog/oai?verb=Identify
The Ohio State University Libraries (Hyrax)
Base URL: https://library.osu.edu/dc/api/oai?verb=Identify
Note: This repository uses api rather than catalog in the repository’s URL.
Northwestern University Libraries (custom application):
Base URL: https://api.dc.library.northwestern.edu/api/v2/oai?verb=Identify
Note: Implemented as an AWS Lambda using the Node.js runtime as part of Northwestern’s public digital collections API: https://github.com/nulib/dc-api-v2/blob/main/src/handlers/oai.js
University of North Carolina at Chapel Hill (Hyrax)
Base URL: https://cdr.lib.unc.edu/catalog/oai?verb=Identify