CDM to Hydra

CDM to Hydra breakout notes

Kinza Masood & Brian McBride
University of Utah
http://www.lib.utah.edu/collections/digital-library.php (digital home)
http://content.lib.utah.edu/cdm/search (inside CDM public UI)
    •    Two Servers - 2.5 million objects total
    ⁃    Newspapers: 1.5 million pages, 19 million articles
    ⁃    Everything else: .5 million
    •    Version 6 is much better received than 5 by users (esp. zoom)
    •    Scalability issues
    ⁃    Indexing problems
    ⁃    Export only works sometimes
    ⁃    Can’t find and replace metadata
    •    Also spend a lot of local development time supporting the vended CDM solution
    •    Local development breaks with each upgrade
    •    Evaluated other systems
    ⁃    Took a year, with a large committee and partners
    ⁃    External partners and paying customers
    ⁃    Looked at Hydra among other systems
    ⁃    Scored systems against criteria across 9 dimensions
    ⁃    Conclusion: no vendor solution would serve needs/requirements criteria
    •    Scale
    •    Kinza can share the scored document upon request
    •    Need article-level newspaper support
    •    Hoping to partner with someone in Hydra community to help
    •    Next steps
    ⁃    Where to start?
    ⁃    How to show to upper administrators?
    ⁃    Newspaper article-level partners?

Katherine Lynch
Temple University
http://digital.library.temple.edu/cdm/
    •    Hosted CDM
    •    30 collections, several thousand items
    •    Motivation to move out
    ⁃    Problems with metadata entry; hard to keep metadata clean
    ⁃    Workarounds needed for compound objects and various other object types
    ⁃    ADA
    •    Looked at Hydra for other projects, but thought CDM would be a good pilot to test Hydra
    •    Requirements gathering complete
    ⁃    Created a Scrum task list and defined a Phase 1
    ⁃    Phase 1 will have base requirements from CDM and critical improvements
    ⁃    Timeline: SOON! (maybe 1 year)
    •    Next steps
    ⁃    Re-skinning front-end of CDM
    ⁃    Didn’t want to introduce both a new front-end and a new workflow UI at the same time
    ⁃    Wanted a proof of concept first, to give people confidence in Hydra

Patricia Hswe
Penn State
http://collection1.libraries.psu.edu/
    •    100 collections in CDM and some other platforms
    •    PH doesn’t work directly with CDM
    •    No migration project planned
    •    Might be a culture shift in the library, in terms of both internal workflow and moving from a vendor product to open source with internal support
    •    [Newspapers in Olive]
    •    Interested in hearing folks' next steps
    ⁃    Listening, figuring out what other people are planning, and taking that information back to Penn State

Karen Estlund and Tom Johnson
University of Oregon and Oregon State University
http://oregondigital.org/
    •    Just under 300,000 items and collections in various states
    •    [Newspapers on LC Newspaper Viewer; doesn’t use article-level]
    •    Working on migrating metadata to RDF
    •    Will launch Hydra replacement at the end of March 2014
    •    Hydra and CDM will be run concurrently
    ⁃    New collections go into Hydra
    ⁃    Old collections: metadata and uncorrupted archival files are retrieved for ingest into Hydra
    •    Tools
    ⁃    Wrote a gem (HyBag) to move objects/metadata to Bags; available at http://github.com/osulp/hybag
    ⁃    Ingest and Export of Bags from Hydra
    ⁃    Command line but soooo easy
    ⁃    Should be easy to work with Fedora 4 as long as ActiveFedora does, too; there may be RDF questions with Fedora 4, such as whether to load data as a datastream or use it as a Linked Data Platform server
    ⁃    CDM2Bag, https://github.com/OregonDigital/cdm2bag
    ⁃    Transforms metadata from CDM internal metadata to RDF predicates
    ⁃    Magic to call custom Ruby methods for complex transformations, such as mapping geographic values to GeoNames
    ⁃    Transformed metadata gets put into the bag as a metadata stream
    ⁃    File gets put into the bag
    •    Collections
    ⁃    Created collection landing pages, which render based on a metadata field (configurable)
    ⁃    Would need UI developer time, but it is a basic Rails template
    ⁃    Blacklight gives faceted browse of controlled vocabulary collection terms for free
    •    Next Steps
    ⁃    Provide tools
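
The CDM2Bag steps listed under Tools above (map exported fields to RDF predicates, call a custom method for complex values such as places, then put the metadata stream and the file into a bag) can be sketched roughly as follows. This is an illustration only, not the cdm2bag or hybag code: the field names, the FIELD_MAP table, the place lookup and the example.org URIs are all hypothetical, and the bag is a bare-bones BagIt layout (bagit.txt, data/, manifest-md5.txt).

```python
import hashlib
import tempfile
from pathlib import Path

DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical place lookup; a real workflow would reconcile against GeoNames.
PLACE_URIS = {"Eugene (Or.)": "http://example.org/places/eugene"}


def literal(value):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def place_term(value):
    """Custom transformation: use a URI when the place is known, else a literal."""
    uri = PLACE_URIS.get(value)
    return f"<{uri}>" if uri else literal(value)


# Hypothetical mapping: exported field name -> (RDF predicate, converter).
FIELD_MAP = {
    "title": (DCTERMS + "title", literal),
    "creator": (DCTERMS + "creator", literal),
    "place": (DCTERMS + "spatial", place_term),
}


def record_to_ntriples(subject, record):
    """Turn one exported record (a dict) into N-Triples, skipping unmapped fields."""
    lines = []
    for field, value in record.items():
        if field not in FIELD_MAP or not value:
            continue
        predicate, convert = FIELD_MAP[field]
        lines.append(f"<{subject}> <{predicate}> {convert(value)} .")
    return "\n".join(lines) + "\n"


def write_bag(bag_dir, payload):
    """Write payload files (name -> bytes) under data/ with an MD5 manifest."""
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    manifest = []
    for name, content in sorted(payload.items()):
        (bag / "data" / name).write_bytes(content)
        manifest.append(f"{hashlib.md5(content).hexdigest()}  data/{name}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    (bag / "manifest-md5.txt").write_text(
        "\n".join(manifest) + "\n", encoding="utf-8")
    return bag


record = {"title": "Main Street, 1910", "place": "Eugene (Or.)", "unmapped": "x"}
triples = record_to_ntriples("http://example.org/item/1", record)
bag = write_bag(tempfile.mkdtemp(), {
    "descMetadata.nt": triples.encode("utf-8"),  # metadata stream
    "content.tif": b"placeholder image bytes",   # the archival file
})
```

Keeping the mapping in a table means a new collection mostly needs new table entries, with custom functions reserved for the fields that need real logic.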

Exhibits
    •    Utah - OmniUpdate (small Utah company) for exhibits
    ⁃    Duplicated objects
    •    Oregon - Will use external system (Omeka, Atrium, Drupal, whatever) for curated exhibits and DH projects

William Ying
ARTstor
    •    SharedShelf
    ⁃    Maybe make SharedShelf a development project and use Fedora
    ⁃    Most Hydra users are ARTstor users, and many are SharedShelf participants
    ⁃    Expose content through Blacklight?
    ⁃    A lot of partners want to change but are worried about newspapers
    •    IIIF
    ⁃    Going to make public content from ARTstor available
    ⁃    …and OAC http://www.openannotation.org/
    •    Next Steps
    ⁃    Want to share content, metadata, development
    ⁃    Want to make content more available

Alicia Cozine
DCE
    •    Getting a lot of calls asking whether DCE can help with moving from CDM to Hydra
    •    Provide support for moving to Hydra!

Tools
    •    UO tools may not work for everyone
    •    Hosted CDM institutions can’t get at their files together with metadata in structured form, or at compound object structure files

What would be helpful?
    •    What did Oregon learn, and what would it do differently?
    ⁃    Nothing; we’re awesome!
    ⁃    If we had started a year later, may have combined ingest UI with workflow UIs from other places
    ⁃    If possible, would have fewer programmers working full-time rather than so many at part time or a small percentage of FTE
    •    Should wait for Fedora 4?
    ⁃    Only if you need Fedora 4
    ⁃    Newspapers may need to wait for Fedora 4 (scalability and speed)

Questions
    •    Can you have a DC XML data stream and RDF in Hydra?
    ⁃    Yes! As many as you want.
    ⁃    Oregon Digital is working on a crosswalk tool, ActiveFedoraCrosswalks: https://github.com/osulp/active_fedora-crosswalks
    ⁃    We track dc:isPartOf for Fedora relationships so that it goes to RELS-EXT and syncs down to the descriptive metadata stream
    ⁃    Can have phantom data streams that don’t exist in Fedora at all but can respond to methods and build their own content
    ⁃    E.g. define relationship between your RDF datastream and OAI-dc and have OAI-dc datastream build itself
    •    Tips
    ⁃    Use DCE to get started (1 week sprint), especially if you don’t have Ruby programmers
    •    Staffing
    ⁃    At Oregon
    ⁃    4 programmers at .10-.30 FTE (ouch!)
    ⁃    System Admins as needed
    ⁃    Metadata
    ⁃    Project Mgt
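
Going back to the crosswalk answer under Questions: the idea of a "phantom" OAI-dc datastream that is never stored but builds itself from the RDF metadata can be illustrated with a small sketch. This is not the ActiveFedora or active_fedora-crosswalks API; the DigitalObject class, the CROSSWALK table and the choice to map dcterms:isPartOf to dc:relation are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use readable prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical crosswalk: RDF predicate -> simple Dublin Core element name.
CROSSWALK = {
    DCTERMS + "title": "title",
    DCTERMS + "creator": "creator",
    DCTERMS + "isPartOf": "relation",
}


class DigitalObject:
    """Holds RDF-style statements; the OAI-dc view is derived, never stored."""

    def __init__(self, statements):
        # Each statement is a (predicate URI, value) pair about this object.
        self.statements = list(statements)

    @property
    def oai_dc(self):
        """Build the OAI-dc XML on demand from the current statements."""
        root = ET.Element(f"{{{OAI_DC}}}dc")
        for predicate, value in self.statements:
            element = CROSSWALK.get(predicate)
            if element is None:
                continue  # no crosswalk rule for this predicate
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
        return ET.tostring(root, encoding="unicode")


obj = DigitalObject([
    (DCTERMS + "title", "Main Street, 1910"),
    (DCTERMS + "isPartOf", "http://example.org/collection/streets"),
])
first_view = obj.oai_dc

# Changing the RDF changes the derived view; there is nothing to keep in sync.
obj.statements.append((DCTERMS + "creator", "Unknown"))
second_view = obj.oai_dc
```

Because the derived view is rebuilt each time it is requested, the RDF stays the single source of truth and the XML cannot drift out of date.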