Hydra Tech Call 2015-09-16
Time: 9:00am PDT / Noon EDT
Call-In Info: 1-641-715-3660, access code 651025
Moderator: Carolyn Cole
Notetaker: Nikitas Tampakis
Attendees:
Carolyn Cole (Penn State)
Nikitas Tampakis (Princeton)
Lakeisha Robinson (Yale)
Anna Headley (Chemical Heritage Foundation)
Colin Gross (UMich)
Lynette Rayle (Cornell)
Corey Harper (NYU)
Steven Ng (Temple)
Justin Coyne (Data Curation Experts)
Mike Giarlo (Penn State)
Trey Terrell (Princeton)
Drew Myers (WGBH)
Agenda:
Call for agenda items
Derivatives - current examples for generating derivatives: https://gist.github.com/elrayle/9a72ffc0c879927b327b
a la carte API - Lynette to create tickets in hydra-works and hydra-derivatives to make full-text extraction a configurable derivative in hydra-derivatives.
Currently, full-text extraction isn't in hydra-derivatives; it's in hydra-works (recently pulled down from curation concerns).
Characterization currently encapsulates full-text extraction. It was suggested to move full-text extraction out of characterization and into derivatives, so that it is possible to specify which formats should have the full-text extraction service run on them.
When using the Hydra Works PersistOutputFile service (https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/services/generic_file/persist_derivative.rb), defining a custom makes_derivatives proc currently appends to the set of derivatives defined in hydra-works: https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/models/concerns/generic_file/derivatives.rb#L12-L22. It was suggested that custom derivatives should override the defaults instead of being added to them.
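The difference between appending to and overriding the default derivatives can be sketched in plain Ruby. This is only an illustration of the two behaviors discussed, not the hydra-works or hydra-derivatives API: the class SketchFile, the derivatives class method, the mode: option, and the derivative names are all hypothetical.

```ruby
# Hypothetical stand-in for a file model; NOT the hydra-works API.
class SketchFile
  # Defaults a library might define for every file (illustrative names).
  DEFAULT_DERIVATIVES = [:thumbnail, :full_text].freeze

  class << self
    # Register custom derivative names on the calling class.
    # mode: :append   keeps the defaults (the current behavior described in the notes).
    # mode: :override drops the defaults (the behavior suggested on the call).
    def derivatives(*names, mode: :append)
      @custom = names
      @mode = mode
    end

    # The effective list of derivatives for this class.
    def derivative_set
      custom = @custom || []
      return custom if @mode == :override && !custom.empty?
      (DEFAULT_DERIVATIVES + custom).uniq
    end
  end
end

# Custom derivative is added on top of the defaults.
class AppendingFile < SketchFile
  derivatives :webm
end

# Custom derivative replaces the defaults entirely.
class OverridingFile < SketchFile
  derivatives :webm, mode: :override
end
```

Under these assumptions, AppendingFile.derivative_set contains the defaults plus :webm, while OverridingFile.derivative_set contains only :webm. With override semantics, a model that does not want full-text extraction would simply leave it out of its own list.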
side-loading and derivatives - Discuss in the next tech call
Nathan Rogers not on the call. He expressed interest in minimizing calls to Fedora when batch ingesting files.
Note: calling create_derivatives isn't required to create the derivatives.
Dive into Hydra PCDM - review https://github.com/projecthydra-labs/hydra-pcdm/wiki/Dive-into-Hydra-PCDM
Characterization - Colin to continue working on moving characterization from curation concerns into hydra-works. Follow-up discussion to continue on the hydra-tech e-mail list.
E-mail thread: https://groups.google.com/forum/#!topic/hydra-tech/KWH-bUo1F3s
Should the generic file model in curation concerns include all characterization properties? Or should different formats be included a la carte? Consensus seemed to be to include a top-level characterization base class, and then allow behaviors for specific formats to be included in addition to the base.
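One way to picture the consensus (a shared characterization base plus format-specific behaviors included a la carte) is with Ruby modules. This is a minimal sketch under stated assumptions: the module names, class names, and property names below are hypothetical and are not taken from curation concerns or hydra-works. The notes mention a base class; modules are used here only because they compose with include.

```ruby
# Hypothetical base: terms every characterized file would carry.
module CharacterizationBase
  def characterization_terms
    [:mime_type, :file_size, :checksum]
  end
end

# Hypothetical format-specific behaviors, each extending the base list via super.
module ImageCharacterization
  def characterization_terms
    super + [:width, :height]
  end
end

module AudioCharacterization
  def characterization_terms
    super + [:duration, :sample_rate]
  end
end

# A model picks the base plus only the format behaviors it needs.
class SketchImageFile
  include CharacterizationBase
  include ImageCharacterization
end

class SketchAudioFile
  include CharacterizationBase
  include AudioCharacterization
end

# A model with no format-specific needs includes only the base.
class SketchPlainFile
  include CharacterizationBase
end
```

In this sketch the base must be included before any format module so that super resolves to the base list. A file model for a given format then carries only the properties relevant to it, rather than every characterization property.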
Next call
Date: September 30, 2015 (skipping 9/23 due to HydraConnect)
Moderator: Justin Coyne
Notetaker: Colin Gross