Samvera Newspapers Interest Group Call: 2018-04-05

Time: 1:00 PM EST / 10:00 AM PST

Call-In Info: 712-775-7035 (Access Code: 960009)

Moderator: Eben English (Boston Public Library)

Notetaker: Brian McBride (Etherpad link: https://etherpad.wikimedia.org/p/Samvera_Newspapers_Interest_Group_Call__2018-04-05)

Attendees:

Agenda (with notes)

  1. IMLS Grant Update
    1. Finishing up modeling
      1. Dealing with the Hyrax metadata model and how to add/modify a new newspaper model
      2. Issues with making the newspaper page a FileSet; will make it a regular PCDM object instead
    2. Getting ready to look at ingest
    3. Originally planned for the project to produce a gem, but have been working within an application instead; now extracting the code back into a gem

  2. JSON serialization of ALTO
    1. open-oni: open-oni_OCR-word-coordinates.json
    2. IIIF Annotation list: http://dams.llgc.org.uk/iiif/3320863/annotation/list/ART7.json
    3. performance benchmarking: https://gist.github.com/ebenenglish/e9b381ba8867b383b34b16ac1c9635e7
    4. open-oni or IIIF options; performance is fairly similar
    5. Gordon will check with Roger on how this is done in Michigan Daily code
    6. Roger says: We transformed the ALTO into a "words" data structure inspired by what open-oni was doing: basically a hash with words as keys, each pointing to an array of coordinates. Mostly for performance: given an array of highlighted terms, it is easy to find all the coordinates.
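
Illustrative sketch of the "words" structure Roger describes (not the Michigan Daily code, and not the exact open-oni JSON layout). It reads the standard ALTO `String` attributes (`CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) and builds a hash of word to coordinate boxes. The function names, the lowercased keys, and the `[x, y, width, height]` box order are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny hand-written ALTO fragment, for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Boston" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="20"/>
    <SP/>
    <String CONTENT="Daily" HPOS="190" VPOS="200" WIDTH="60" HEIGHT="20"/>
  </TextLine><TextLine>
    <String CONTENT="boston" HPOS="100" VPOS="240" WIDTH="78" HEIGHT="20"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def alto_to_word_coords(alto_xml):
    """Map each word (lowercased, an assumption) to a list of [x, y, w, h] boxes."""
    root = ET.fromstring(alto_xml)
    words = defaultdict(list)
    for el in root.iter():
        # Tags look like "{namespace}String"; compare the local name only,
        # so the sketch does not depend on a particular ALTO version.
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        content = el.get("CONTENT")
        if not content:
            continue
        box = [int(float(el.get(k, 0))) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")]
        words[content.lower()].append(box)
    return dict(words)


def coords_for_terms(words, terms):
    """Look up the boxes for each highlighted term with one hash access per term."""
    return {term: words.get(term.lower(), []) for term in terms}


if __name__ == "__main__":
    words = alto_to_word_coords(SAMPLE_ALTO)
    print(coords_for_terms(words, ["Boston", "Daily"]))
```

Because the lookup is keyed by word, finding the boxes for a set of highlighted search terms costs one hash access per term and does not require re-parsing the ALTO on each request.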

  3. Content Examples: https://drive.google.com/drive/folders/0BwKKtxaBVqjEbE5zMFdWUEU4WGM?usp=sharing
    1. Still need: CONTENTdm, TEI, Olive

  4. Intel sharing from other groups/projects
    1. Europeana newspapers full-text profile: https://docs.google.com/document/d/1t5yGEzQ0KV2rqU0sFDoKnI2bIDBGrmj0f1gSOCRUgJ4/edit#

  5. Next meeting: Thursday May 3, 1 PM EST