Hydra North Use Cases

Since we haven't got our contributor agreements filed yet, I'm parking these here rather than submitting a PR. - pbinkley

METS/ALTO Newspaper Issues

Given a newspaper issue that has been digitized in the METS/ALTO format, with article-level MODS records embedded in the METS

As a repository I want to

  • store the digital images and the METS/ALTO files, maintaining their relationships

  • store the article-level MODS as discrete items, maintaining their relationships to zones on pages as specified in the METS/ALTO structures

  • search and display individual articles

  • extract the text of a given article from its zones for indexing

  • display the zones for a given article

So that articles can be presented as top-level objects in the discovery system with views of the appropriate zones on the appropriate pages.
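
The text-extraction step above can be sketched as follows. This is a minimal illustration, not a proposed implementation: it assumes the METS structure has already been resolved into an ordered list of ALTO TextBlock IDs for the article, and the names `article_text` and `SAMPLE_ALTO`, along with the sample content, are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so any ALTO schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def article_text(alto_xml, block_ids):
    """Return the text of the given TextBlocks, in the order the IDs are listed."""
    root = ET.fromstring(alto_xml)
    blocks = {el.get("ID"): el for el in root.iter() if local(el.tag) == "TextBlock"}
    lines = []
    for block_id in block_ids:
        block = blocks.get(block_id)
        if block is None:
            continue  # zone is not on this page
        for line in block.iter():
            if local(line.tag) == "TextLine":
                words = [s.get("CONTENT", "") for s in line if local(s.tag) == "String"]
                lines.append(" ".join(words))
    return "\n".join(lines)


# Hypothetical page: TB1 and TB3 belong to one article, TB2 to another.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="TB1"><TextLine>
      <String CONTENT="Council"/><SP/><String CONTENT="meets"/>
    </TextLine></TextBlock>
    <TextBlock ID="TB2"><TextLine>
      <String CONTENT="Weather"/><SP/><String CONTENT="report"/>
    </TextLine></TextBlock>
    <TextBlock ID="TB3"><TextLine>
      <String CONTENT="on"/><SP/><String CONTENT="Monday"/>
    </TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

print(article_text(SAMPLE_ALTO, ["TB1", "TB3"]))
```

An article that spans several pages would need one such call per ALTO file, with the results concatenated in the order given by the METS structure.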

METS/ALTO Monographs

Given a monograph that has been digitized in the METS/ALTO format, with metadata in the form of a MODS record

As a repository I want to

  • store the digital images and the METS/ALTO files, maintaining their relationships

  • extract the full text from the ALTO files for indexing

  • display search results in the form of page images with overlaid highlighting to show the positions of the search terms (i.e. the nesting of text blocks within the page is used in the display)

  • enable specifying sections of a work (e.g. chapters) which may be discoverable (i.e. have their own metadata) or merely navigable (e.g. from a table of contents in the monograph-level metadata record)

  • display a table of contents, with links to individual pages

  • display the book in the Internet Archive bookreader, with full-text search enabled

So that the book is navigable and searchable.
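
The highlighting requirement above depends on the word coordinates that ALTO records on each String element. The sketch below shows one way to turn a search term into rectangles for an overlay. The function name `hit_boxes` and the sample page are hypothetical, matching is a naive case-insensitive whole-word comparison, and the coordinates are returned in whatever unit the ALTO file declares.

```python
import xml.etree.ElementTree as ET


def hit_boxes(alto_xml, term):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) for every word matching the term."""
    needle = term.casefold()
    boxes = []
    for el in ET.fromstring(alto_xml).iter():
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        # Ignore case and surrounding punctuation when comparing words.
        word = el.get("CONTENT", "").casefold().strip(".,;:!?\"'()")
        if word == needle:
            boxes.append(
                tuple(float(el.get(k, 0)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            )
    return boxes


# Hypothetical page with two occurrences of the search term.
SAMPLE_PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="TB1"><TextLine>
    <String CONTENT="The" HPOS="100" VPOS="200" WIDTH="60" HEIGHT="30"/>
    <String CONTENT="river," HPOS="170" VPOS="200" WIDTH="90" HEIGHT="30"/>
    <String CONTENT="River" HPOS="100" VPOS="240" WIDTH="85" HEIGHT="30"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(hit_boxes(SAMPLE_PAGE, "river"))
```

A production system would more likely get hit positions from the search index than rescan the ALTO at display time. The sketch only shows that the stored ALTO carries enough information for the overlay.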


Scrapbooks

Given a digitized scrapbook which has been scanned at the page level as well as at the level of each item (clipping, picture, etc.) attached to a given page, and for which MODS records have been created both at the scrapbook level and at the attached item level

As a repository I want to

  • make the scrapbook browsable at the page level, with the ability to view individual items independently

  • make the scrapbook and the individual items discoverable as top-level items in the discovery system

  • display the individual items in the context of their page and independently

So that the contents of the scrapbook are accessible in as rich a way as possible.

Research Data Sets

Given a research project with multiple researchers producing a number of datasets

As a data curator I want to

  • archive these datasets while maintaining relationships between a project and its datasets and sub-datasets, if any, so that at any point in the future users are able to pull all related datasets

  • archive metadata at the project level as well as at each dataset level

  • archive researcher information and maintain relationships with datasets they produced so that users are able to see contributions of a particular researcher

  • version datasets at the object level, the file level, or both

  • optimize storage when multiple versions of large objects or files are kept

So that future users are able to discover, interpret and reuse these research contributions.
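
One possible reading of the storage requirement above is content-addressed storage, where each version of a file is recorded as a checksum and identical content is kept only once. The sketch below is a toy in-memory illustration of that idea only. The class `VersionStore`, its methods, and the sample file are hypothetical and do not describe how any particular repository platform stores versions.

```python
import hashlib


class VersionStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}     # SHA-256 hex digest -> file content
        self.versions = {}  # file name -> list of digests, oldest first

    def add_version(self, name, data):
        """Record a new version and return its 1-based version number."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # no second copy if already stored
        self.versions.setdefault(name, []).append(digest)
        return len(self.versions[name])

    def read(self, name, version):
        """Return the content of the given 1-based version."""
        return self.blobs[self.versions[name][version - 1]]


store = VersionStore()
store.add_version("survey.csv", b"id,score\n1,10\n")
store.add_version("survey.csv", b"id,score\n1,10\n2,12\n")
store.add_version("survey.csv", b"id,score\n1,10\n")  # reverts to the first content

print(len(store.versions["survey.csv"]), "versions,", len(store.blobs), "stored blobs")
```

Whole-file deduplication like this saves space only when a version is byte-identical to an earlier one. Large files that change slightly between versions would need chunk-level or delta storage, which is outside the scope of this sketch.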

Archival documents

Given a digitized archival collection with multiple object types and formats

As an archivist I want to

  • provide access to manuscript letters, where multiple text pages may be on a single folded physical page, and the order may not be consistent

  • allow flexibility in storing archival units (e.g. some at folder level, some at item level)

So that archival collections can be presented to users with as much richness of content and navigation as we can afford in our digitization and metadata work.

Annotated items

Given a research project making heavy use of linked open data to manage relationships among objects, which may contain an entity (person, organization, place, custom, bib-record, work, annotation, event) and its representation (document, event, image, map, audio, video)

As a researcher I want to

  • maintain the relationship between an entity and all its representations and annotations

  • store aggregation details, whether the aggregation is at the collection level or the project level

  • store information about a representation object pointing to multiple entities

  • archive all major revisions of entities/representations/annotations

So that an annotated object can be properly presented and associated with all its representations.
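
The relationships above are many-to-many: an entity can have several representations, and one representation can point to several entities. A minimal sketch of that shape, using subject/predicate/object triples in the spirit of linked data, is given below. The class `LinkStore`, the predicate names `represents` and `annotates`, and all identifiers are hypothetical placeholders rather than terms from any real vocabulary.

```python
class LinkStore:
    """Toy triple store for entity, representation and annotation links."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """Everything the subject points to through the predicate."""
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)

    def subjects(self, predicate, obj):
        """Everything that points to the object through the predicate."""
        return sorted(s for s, p, o in self.triples if p == predicate and o == obj)


links = LinkStore()
# One image represents two entities; one place has two representations.
links.add("rep:image-7", "represents", "entity:person-1")
links.add("rep:image-7", "represents", "entity:place-3")
links.add("rep:map-2", "represents", "entity:place-3")
links.add("anno:9", "annotates", "rep:image-7")

print(links.subjects("represents", "entity:place-3"))  # representations of the place
print(links.objects("rep:image-7", "represents"))      # entities shown in the image
print(links.subjects("annotates", "rep:image-7"))      # annotations on the image
```

Revision history and aggregation membership could be expressed as further predicates in the same structure, though a real deployment would presumably rely on an RDF store rather than an in-memory set.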

Website in a WARC

Given a WARC file for a website which is part of a bigger collection, e.g. Government Websites

As a web archiving coordinator I want to

  • keep WARC files from a particular area (e.g. Government Websites) under one collection, so that users have access to all the related WARC files from one collection

  • provide access to all the PDFs harvested within a single WARC file, so that users interested in a specific object can find and use it

  • provide bulk or otherwise efficient access to the collection

So that researchers can study or perform computational analysis (e.g. data mining).
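
Providing access to the PDFs inside a WARC file means walking its records and picking out the HTTP responses whose Content-Type is application/pdf. The sketch below does this for an uncompressed WARC stream using only the standard library. The function names, the helper `make_record`, and the example.org URLs are hypothetical, and the parser handles only the simple record layout built here.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return
        if not line.startswith(b"WARC/"):
            continue  # blank lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        body = stream.read(int(headers.get("content-length", 0)))
        yield headers, body


def pdf_urls(stream):
    """Return the target URIs of response records served as application/pdf."""
    urls = []
    for headers, body in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        http_head = body.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1").lower()
        for line in http_head.split("\r\n")[1:]:  # skip the HTTP status line
            name, _, value = line.partition(":")
            if name.strip() == "content-type" and value.strip().startswith("application/pdf"):
                urls.append(headers.get("warc-target-uri"))
    return urls


def make_record(uri, content_type, payload):
    """Build a minimal response record for the demonstration (hypothetical helper)."""
    http = b"HTTP/1.1 200 OK\r\nContent-Type: " + content_type.encode() + b"\r\n\r\n" + payload
    head = "WARC/1.0\r\nWARC-Type: response\r\nWARC-Target-URI: %s\r\nContent-Length: %d\r\n\r\n" % (uri, len(http))
    return head.encode() + http + b"\r\n\r\n"


sample = make_record("http://example.org/report.pdf", "application/pdf", b"%PDF-1.4 ...") \
    + make_record("http://example.org/index.html", "text/html", b"<html></html>")

print(pdf_urls(io.BytesIO(sample)))
```

Real harvests are usually stored as gzip-compressed .warc.gz files and contain record types and edge cases this sketch ignores, so a dedicated WARC library would be the safer choice in practice.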