Hydra Tech Call 2015-09-30

Time: 9:00am PDT / Noon EDT

Call-In Info: 1-641-715-3660, access code 651025

Moderator: Justin Coyne

Notetaker: Colin Gross

Attendees:

  • Peter from Alberta
  • Ryan from Oregon State
  • Justin from DCE
  • Colin from UMich
  • Drew from WGBH
  • Adam from PSU
  • Carolyn from PSU
  • Anna from Chemical Heritage Foundation
  • Trey from Princeton
  • Steven from Temple


Agenda:

  1. Call for agenda items
  2. Hydra Connect wrap up
  3. Side loading and derivatives (hydra-derivatives 3.0)
    1. https://github.com/projecthydra/hydra-derivatives/pull/84
    2. https://github.com/projecthydra-labs/curation_concerns/pull/347
  4. Other urgent things:
    1. https://github.com/projecthydra/hydra-derivatives/pull/83
  5. Full Text extraction Solr errors. Are these correct?
    1. https://github.com/projecthydra/sufia/blob/master/sufia-models/lib/tasks/sufia-models_tasks.rake#L39-L70
    2. https://github.com/projecthydra-labs/hydra-works/blob/fae0dcae973957ab9acbcb0a28400b623d9a6907/lib/tasks/hydra-works_tasks.rake#L6-L38
  6. Next call
    1. Date: October 7, 2015
    2. Moderator: Steven Ng
    3. Notetaker: Drew Myers

Call for agenda items

  • ActiveFedora PR #901 from Trey: contained RDF sources.

Hydra Connect wrap up

  • Slides and notes are being added to the wiki.
  • No talks were recorded.
  • Future Hydra Connect talks could be recorded. Open questions: expense and volunteers.
    At the very least, audio recording plus slides.
  • There is a survey for attendees.
  • DPLA & Stanford did lots of work for HyBox.
    See project in projecthydra-labs for feature requests.

Side loading and derivatives

  • Hydra-derivatives 3.0 by PR #84
    • Breaks backwards compatibility
    • Addresses Nathan Rogers' use case: ingesting lots of big objects.
    • Derivatives require a lot of network traffic when pulling bitstreams from Fedora.
    • The alternative is to keep files in a staging area on disk.
    • Potential drawback: the staging location has to be visible to the characterization job.
    • OutputFileService defaults to being on disk
    • The use case where some derivatives go into the repository and others stay on disk/cache still needs to be tested.
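
As a rough illustration of the side-loading idea discussed above (not the hydra-derivatives API, which is Ruby), the sketch below prefers a staged copy on disk and only falls back to pulling the bitstream from the repository when no staged copy exists. The names `resolve_source`, `staging_dir`, and `fetch_from_repo` are hypothetical and chosen for this example only.

```python
import tempfile
from pathlib import Path
from typing import Callable


def resolve_source(file_id: str, staging_dir: Path,
                   fetch_from_repo: Callable[[str], bytes]) -> Path:
    """Return a local path that a derivative job can read from.

    Hypothetical helper: file_id is assumed to be a plain file name.
    """
    staged = staging_dir / file_id
    if staged.is_file():
        # Side-loaded case: read the staged copy directly, no network traffic.
        return staged
    # Fallback: pull the bitstream from the repository and write it to a
    # temporary file so downstream tools can still work with a path.
    data = fetch_from_repo(file_id)
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix="-" + file_id)
    with tmp:
        tmp.write(data)
    return Path(tmp.name)
```

The staged branch only works if the job's host can see `staging_dir`, which is the visibility drawback noted in the discussion.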

Other urgent things

  • Hydra-derivatives PR #83.
    Closes the IO hanging issue.
  • Full text extraction Solr errors
    • Update FULLTEXT_JARS according to the Sufia issue.
    • Jars need to be updated for Solr 4.10 compatibility.
    • Should the jars be obtained from the Solr distro or from Maven?
  • There is room for improvement around hydra-jetty and configuration management.