Notes on Stanford, Hull, and MediaShelf Demos

Submitted by Bill Parod

Stanford Demonstrations
Argo - Michael Klein
Mostly Blacklight.
Facets: Project Name, Workflows
Workflows facet: each specific workflow (count), with its steps (count of items within each step), completed (count), and errors (count).
Facets can be 'rotated' to reorder the child structure, for example so that the child facet under a workflow (WF) is status rather than step.
Workflow grid view: shows all workflows with their status and steps. Clicking any entry drills down to that item with its facets showing, and those facets can be rotated into different views for that item.
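A minimal sketch of what "rotating" a facet means, assuming each item carries flat workflow, step, and status values (the records and field names below are hypothetical). The same items are counted into a two-level tree, and rotating simply changes which field sits at the child level. This is a client-side illustration, not Argo's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical item records; in Argo these values would come from the Solr index.
ITEMS = [
    {"workflow": "digitization", "step": "initiate", "status": "completed"},
    {"workflow": "digitization", "step": "digitize", "status": "waiting"},
    {"workflow": "digitization", "step": "digitize", "status": "error"},
    {"workflow": "accession", "step": "shelve", "status": "completed"},
]

def pivot(items, parent, child):
    """Nested facet counts shaped as {parent_value: {child_value: count}}."""
    tree = defaultdict(Counter)
    for item in items:
        tree[item[parent]][item[child]] += 1
    return {key: dict(counts) for key, counts in tree.items()}

# Same data, two "rotations": steps under each workflow, or statuses under each workflow.
by_step = pivot(ITEMS, "workflow", "step")
by_status = pivot(ITEMS, "workflow", "status")
```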
When an object is registered, it is associated with a specific workflow. Most items use the manual generic digitization workflow (initiate, digitize, start accession). Workflows can chain, so that completion of one workflow passes an item to another workflow. Some workflow steps are completed manually, e.g. by human approval, perhaps recorded in other systems such as PeopleSoft.
How is the workflow orchestrated and steps accomplished? What is the process that actually does the work?
Robots are periodically running services that query the workflow queues: "show me the items that are ready for me." Robots define the prerequisites and the work that has to be done. Robots used to be complex; they have been simplifying the work that robots actually do by adding behaviors to the item objects themselves, which the robots invoke at the appropriate time.
Tom Cramer: Every object has a workflow datastream. For an ETD, the object has to be submitted, processed, shelved, …: perhaps 7 discrete steps. Each step corresponds to a robot. The datastream references a series of queues; the robots look at those queues for work and, when finished, declare their work complete, which potentially satisfies prerequisites for other robots. The workflow datastream is an external datastream whose content is provided by an Oracle query.
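The robot pattern above can be sketched as follows, assuming a simplified in-memory workflow per object (the step names, item ids, and the `work` callback are hypothetical; at Stanford the workflow state lives in an Oracle-backed external datastream). Each robot asks for objects whose step is waiting and whose prerequisites are complete, does its work, and marks the step complete, which can in turn make the next robot's step ready.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    prerequisites: list[str] = field(default_factory=list)
    status: str = "waiting"  # "waiting" | "completed" | "error"

def etd_workflow():
    """Hypothetical three-step ETD workflow; the real one has about 7 steps."""
    return {
        "submit": Step("submit"),
        "process": Step("process", ["submit"]),
        "shelve": Step("shelve", ["process"]),
    }

def ready_items(objects, step_name):
    """The robot's query: objects whose step is waiting and whose prerequisites are done."""
    ready = []
    for item_id, workflow in objects.items():
        step = workflow[step_name]
        if step.status == "waiting" and all(
            workflow[p].status == "completed" for p in step.prerequisites
        ):
            ready.append(item_id)
    return ready

def run_robot(objects, step_name, work):
    """One robot pass: do the work for each ready object, then record the outcome."""
    for item_id in ready_items(objects, step_name):
        try:
            work(item_id)
            objects[item_id][step_name].status = "completed"
        except Exception:  # any failure is recorded as an error on the step
            objects[item_id][step_name].status = "error"
```

Keeping `ready_items` separate from `run_robot` mirrors the split the notes describe: the queue answers "what is ready for me," and the robot only performs and reports its own step.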
URL for Robots provided by Tom Cramer: https://wiki.duraspace.org/display/hydra/Stanford+Digital+Library+Workflow+%28Workdo%29
Argo is a Blacklight dashboard for exposing workflow status.
There are currently 6 workflows.
-------------------
Registration: 
SULair
Pop-up dialog for the person doing the registration to fill in.
Example registration:
Select project (McLaughlin Maps)
Select object type
Select where metadata comes from (Metadata Toolkit, …)
Select metadata toolkit form (MODS McLaughlin)
Can also add ad-hoc tags.
Admin Policy Object (APO): defines properties for a collection and stores administrative information for the objects governed by that policy. It defines rights defaults for objects that don't have their own rights datastream, and also includes common RELS-EXT relationships and policy agreement documents.
The APO defines possible workflows that can be selected. 
To register:
Add a row (metadata, source id, druid, label).
Enter its metadata id. If a DRUID is also entered, the Label will be obtained from the DRUID.
Can also use text view to paste in a list of ids. 
Then press the 'register' button, which invokes an Ajax process that submits 10 items at a time. This initiates the workflow (taken from the APO), obtains descriptive metadata from the Metadata Toolkit (based on the id provided), and creates identity metadata and the project (provided at registration).
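The "10 items at a time" submission can be sketched as simple batching, assuming the rows are dictionaries and `submit` stands in for the hypothetical Ajax request (the real registration UI is browser-side JavaScript). Batching keeps each request small while still letting a long pasted list of ids be registered in one pass.

```python
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 10  # the demo registers 10 items per request

def batches(rows: Sequence[dict], size: int = BATCH_SIZE) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def register_all(rows: Sequence[dict], submit: Callable[[Sequence[dict]], None]) -> int:
    """Submit every row batch by batch; `submit` is a hypothetical stand-in for the Ajax call.
    Returns the number of requests made."""
    requests = 0
    for batch in batches(rows):
        submit(batch)
        requests += 1
    return requests
```

For example, 25 rows produce three requests of 10, 10, and 5 items.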
The APO is referenced by a hydra:isGovernedBy assertion in RELS-EXT.
The APO is an object referenced by any objects that are registered to be governed by it. Parts of the APO are copied into the object at accession/registration time.
For example, if rights are changed in the APO, the change will affect objects that have not yet been registered, but not objects that have already been registered unless the change is retroactively applied to them.
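A minimal sketch of the copy-at-registration behavior, assuming rights are a plain dictionary (the `rights` values and the `governed_by` key are hypothetical; in Fedora the link is the hydra:isGovernedBy assertion in RELS-EXT). Each object keeps a reference to its APO but holds a snapshot of the rights, so later APO changes reach existing objects only through an explicit retroactive step.

```python
import copy
from dataclasses import dataclass

@dataclass
class APO:
    rights: dict  # default rights for governed objects (hypothetical shape)

def register(apo: APO, label: str) -> dict:
    """Create an object governed by `apo`, copying the APO's rights at registration time."""
    return {
        "label": label,
        "governed_by": apo,                   # a reference, analogous to hydra:isGovernedBy
        "rights": copy.deepcopy(apo.rights),  # a snapshot, not a live link
    }

def reapply_rights(apo: APO, objects: list[dict]) -> None:
    """Retroactive application: push the APO's current rights into already-registered objects."""
    for obj in objects:
        if obj["governed_by"] is apo:
            obj["rights"] = copy.deepcopy(apo.rights)
```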
Tracking Sheets - generates a PDF with title, identifier, and barcode for registered items.
Metadata Toolkit - an Orbeon XForms application.
-------
Hypatia - Naomi Dushay 
Supports archival accessioning and arrangement
Built on the Hydra Head plugin (Rails 2).
Showed hand-created fixture objects: a collection object and its items (components), with data from Fedora. Can navigate the first level of relationships to top-level components.
Loading content from Forensic Toolkit (FTK). 
FTK is useful for accessioning a hard drive: it can examine a disk for files and file types.
FTK produces a folder containing all the individual files extracted from the disk(s), for example from a set of 5 floppy disks. Its report can be difficult to read or process.
FTK provides that report in XSL Formatting Objects (XSL-FO) XML format, which can be turned into a PDF. FTK can detect duplicates and rename them in the report, and it generates checksums and created/modified dates. Stanford has been writing code to extract that data from the FO for better access, preservation, and indexing. Some selection can be done in the tool to obtain specific files.
FTK will also provide HTML versions of WordPerfect files.
Users will be able to access and download files through the Blacklight/Fedora interface.
For archiving, Stanford creates a sub-series for 'born digital' records, which collects the output from FTK. The EAD description above that level is provided by archivists. Some archivists want to author and edit the sub-series description.
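Checksum-based duplicate detection, one of the things FTK reports, can be sketched over a folder of extracted files. This is not FTK's code or report format; it is an illustration that assumes MD5 as the digest (any `hashlib` algorithm works) and groups files whose contents hash identically, which could be used, for instance, to cross-check extracted output.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_checksum(path: Path, algorithm: str = "md5") -> str:
    """Hash a file in 64 KiB chunks so large disk images don't need to fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(folder: Path) -> dict[str, list[Path]]:
    """Map each checksum shared by two or more files to the list of those files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return {checksum: paths for checksum, paths in groups.items() if len(paths) > 1}
```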
Roadmap: Hypatia with Fixtures; Importing content from Stanford, UVa, Hull, Yale; Redesigning the interface. 
Observing discrepancies in EAD markup, which make creating a single ingest/migration program challenging.
Adam Wead: Do you start with an existing EAD document? Yes, at this time. Currently the EADs are hand-created for this purpose. The expectation is that arrangement and description will happen in Hypatia over time. It may be possible to reveal or suppress sub-series to the public as they become ready.
Roadmap: September - loading AIMS collections (Mellon grant ends in September); refactor Hydra Head; apply extensions that UVa and Stanford are working on. Stanford's Special Collections and Archives are very interested, so Stanford will probably continue development after the grant. May also factor the custom FTK code out of Hypatia for use in other workflows/systems.
http://hypatia-test.stanford.edu
------------
Hydra Head Rails 2 (New Hydrangea) - Jessie Keck
Started with the workflow for a MODS asset, beginning with an unadorned, JavaScript-free workflow to facilitate thorough, frequent automated testing and to support progressive enhancement.
Incorporates more of Blacklight than the previous version.
Will apply more styling later when integrated with Hypatia.
Adding a new asset (MODS) includes various form validations.
Edit view is very similar to browse view.
Display is easily configured in model configuration.
Validation can check current and previous work steps. Validation methods are in controller classes. Date validation uses Ruby's Date library to convert dates to the preferred Fedora format.
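The date validation step can be sketched in Python's `datetime`, as an analogue of the Ruby Date conversion mentioned above. Both the accepted input formats and the target format are assumptions: the notes don't say what the "preferred Fedora format" is, so this sketch normalizes to ISO 8601 (YYYY-MM-DD) and rejects anything it cannot parse.

```python
from datetime import datetime

# Accepted input formats (assumed, not from the demo). "%m/%d/%Y" reads
# ambiguous dates such as 01/02/2011 as US month/day. "%B" expects English
# month names under the default C locale.
INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y")

def normalize_date(text: str) -> str:
    """Return `text` as an ISO 8601 date string; raise ValueError if no format matches."""
    cleaned = text.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date: {text!r}")
```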
The intention is to be flexible. If you need to create a new content type, you can use your own methods or leverage bundled code.
Includes permission controls.
Includes HTML5 validation. A Cucumber step, "And the page should be HTML5 valid", submits the page to the W3C validator.
Is Flash being used for any interface widgets, such as multiple upload? Not yet.
There are tests written for all code that supports progressive enhancement with JavaScript.
------------
Related collection project for Hypatia and Registration.
Revs Archive at Stanford - Collier Auto Collection materials. Initially a 1 million item archive, now 1.6 million items. Interested in digitization workflows for prints and negatives. Want to arrange and describe them, support exhibits, and have a full digital asset management system. Stanford will be working on exhibits functionality in Blacklight to support this.
------
University of Hull - Richard Green
hydra-test.hull.ac.uk
UH developers working with MediaShelf.
Hull has an existing repository with a Muradora interface, to be replaced with a Hydra-based implementation. Want some consistency with the existing system, as it is familiar to users. Must support various levels of access control.
Content includes (among other types):
Theses and dissertations
Learning materials
Conference papers or abstracts
Presentations
Journal articles
Photographs
Planning to migrate the main library catalog to Blacklight.
Using MODS as the primary metadata schema; converting to the UKETD schema dynamically.
Typically employs an atomistic object model, but occasionally a compound model, for example when access must be limited to materials in the context of a specific ETD.
Edit forms for MODS allow catalogers to drill down to a fine level of detail in the MODS schema.
New objects go into a 'port' queue, a protected space for editing and enhancement. They then go into a QA queue.
Supports a 'set' structure of membership relationships that allows multiple hierarchies, for access policy inheritance, display structure, item granularity, collection linking and 'breadcrumbing',…
Content ingested into Fedora from other systems also arrives in the QA queue.
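The port/QA flow described above can be sketched as a small state machine. The queue names after QA and the rule that QA can send an item back to the port queue are assumptions for illustration; the notes only say that new objects start in the port queue, move to QA, and that externally sourced content arrives directly in QA.

```python
from enum import Enum

class Queue(Enum):
    PORT = "port"            # protected space for editing / enhancement
    QA = "qa"                # quality assurance review
    PUBLISHED = "published"  # generally available (name assumed)

# Allowed moves: everything passes through QA before becoming available.
TRANSITIONS = {
    Queue.PORT: {Queue.QA},
    Queue.QA: {Queue.PUBLISHED, Queue.PORT},  # assumed: QA may return items for more work
    Queue.PUBLISHED: set(),
}

def move(current: Queue, target: Queue) -> Queue:
    """Return the new queue, or raise ValueError for a move the workflow doesn't allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target

def entry_queue(from_external_system: bool) -> Queue:
    """Locally created objects start in the port queue; content from other systems lands in QA."""
    return Queue.QA if from_external_system else Queue.PORT
```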
Eddie Shin: Structural sets for policy are like APOs, but hierarchical.
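Hierarchical policy inheritance through structural sets can be sketched as a walk up the parent chain: a set without its own policy uses the nearest ancestor's, and the same walk yields the breadcrumb trail. This is a simplification that assumes a single parent per set (Hull's structure supports multiple hierarchies), and the set names and policy dictionaries are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructuralSet:
    name: str
    parent: Optional["StructuralSet"] = None
    policy: Optional[dict] = None  # None means "inherit from the parent set"

    def effective_policy(self) -> dict:
        """The nearest policy found walking up the hierarchy; empty if none is set."""
        node = self
        while node is not None:
            if node.policy is not None:
                return node.policy
            node = node.parent
        return {}

    def breadcrumbs(self) -> list[str]:
        """Set names from the root down to this set."""
        trail = []
        node = self
        while node is not None:
            trail.append(node.name)
            node = node.parent
        return list(reversed(trail))
```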
Solr full-text indexing of PDFs.
Departmental secretaries submit exam materials for their department
A small group of others have submission privileges.
All submissions (including from catalogers) go into quality assurance before they're generally available. 
Matt Zumwalt: Can you speak to the difference between a generic content model and a generic display?
Richard Green: Hull wants to stay as close to the core Hydra specification as possible.
Roadmap: expect to have a read-only version up by the end of September 2011. Rev 3 and JavaScript soon, but not in the next 2 weeks. Specialist pages are coming and will trickle in. OAI-PMH still to sort out: having problems with the old configuration. Want to integrate with the library catalog in Blacklight. Expect these within 2012, perhaps by the start of the next academic year.
Content in the existing repository is all held in external datastreams.
-----
NARM "The National Association of Recording Merchandisers"
Matt Zumwalt
Aggregates catalog files from music publishers, in UPC425 format.
Project parses UPC425 files and creates Fedora objects indexed in Solr.
Performs lookups in Freebase to enhance records, and notes which enhancement processes were performed.
Analyzes the content to gather releases into albums and associate manifestations.
Blacklight is used for searching/faceting. 
Using DDEX and EAC-CPF.
Using Redis / Resque (a Ruby gem for background job workers backed by Redis) for queuing batch processing.
Using Hydra technologies outside of a 'Head'
Marpa Foundation Project supporting Tibetan script, Wylie transliteration, and phonetic Tibetan, as well as Tibetan translation. Contributors often have Sanskrit names, which appear sometimes in script, sometimes transliterated with diacritics, and sometimes without diacritics. Media objects, described in PBCore, can be associated.
Hydra Camp - Rails 3 with RSpec, Cucumber, and Git; MVC and writing tickets; creating Fedora objects and indexing them in Solr; Blacklight.
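Matching names that are sometimes transliterated with diacritics and sometimes without can be sketched with Unicode normalization: decompose to NFD, drop combining marks, and casefold. This is a generic technique, not the Marpa project's code. It applies only to Latin transliterations; names in Tibetan or Devanagari script would need real transliteration first, and stripping combining marks from those scripts would remove vowel signs.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Decompose to NFD, drop combining marks, and casefold, for loose matching
    of Latin-script transliterations only."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def same_name(a: str, b: str) -> bool:
    """True when two transliterated names differ only in diacritics or case."""
    return fold_diacritics(a) == fold_diacritics(b)
```

For example, "Śāntideva" and "Santideva" compare equal, and "Mañjuśrī" folds to "manjusri".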