...
Moderator: Eben English (Boston Public Library)
Notetaker: Brian McBride (Etherpad link: https://etherpad.wikimedia.org/p/Samvera_Newspapers_Interest_Group_Call__2018-04-05)
Attendees:
- Gordon Leacock (Univ. of Michigan)
- Cliff Wulfman (Princeton)
- Brian McBride (University of Utah)
- Eben English (BPL)
Agenda (with notes)
- IMLS Grant Update
- Finishing up modeling
- Dealing with the Hyrax metadata model and how to add/modify a new newspaper model
- Issues with making the newspaper page a fileset; will make a regular PCDM object instead
- Getting ready to look at ingest
- Originally planned for the project to produce a gem, but have been working within an application instead; now extracting the code back into a gem
- JSON serialization of ALTO
- open-oni: open-oni_OCR-word-coordinates.json
- IIIF Annotation list: http://dams.llgc.org.uk/iiif/3320863/annotation/list/ART7.json
- performance benchmarking: https://gist.github.com/ebenenglish/e9b381ba8867b383b34b16ac1c9635e7
- open-oni or IIIF options; performance is fairly similar
- Gordon will check with Roger on how this is done in Michigan Daily code
- Roger says: We transformed the ALTO into a "words" data structure inspired by what open-oni was doing --- basically a hash with words as keys, each pointing to an array of coordinates. Mostly for performance: given an array of highlighted terms, it is easy to find all the coordinates.
- Content Examples: https://drive.google.com/drive/folders/0BwKKtxaBVqjEbE5zMFdWUEU4WGM?usp=sharing
- Still need: CONTENTdm, TEI, Olive
- Intel sharing from other groups/projects
- Europeana newspapers full-text profile: https://docs.google.com/document/d/1t5yGEzQ0KV2rqU0sFDoKnI2bIDBGrmj0f1gSOCRUgJ4/edit#
- Next meeting: Thursday May 3, 1 PM EST
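
Illustration for the "JSON serialization of ALTO" item: a minimal sketch of the kind of "words" structure Roger describes (a hash with words as keys, each pointing to an array of coordinates). This is not the Michigan Daily or open-oni code. The sample ALTO fragment, the function names build_word_index and coordinates_for, the [x, y, width, height] box order, and the choice to lowercase keys are all assumptions made for the example; only the ALTO String attributes (CONTENT, HPOS, VPOS, WIDTH, HEIGHT) come from the ALTO schema.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed ALTO page used only for this illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Boston" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="20"/>
    <String CONTENT="Daily" HPOS="190" VPOS="200" WIDTH="60" HEIGHT="20"/>
    <String CONTENT="boston" HPOS="100" VPOS="400" WIDTH="78" HEIGHT="20"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def build_word_index(alto_xml):
    """Map each word to a list of [x, y, width, height] boxes.

    Assumption: keys are lowercased so lookups are case-insensitive.
    """
    index = defaultdict(list)
    root = ET.fromstring(alto_xml)
    for el in root.iter():
        # Strip the namespace so any ALTO schema version is accepted.
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        word = el.get("CONTENT", "").lower()
        if not word:
            continue
        box = [int(float(el.get(attr, 0)))
               for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT")]
        index[word].append(box)
    return dict(index)


def coordinates_for(index, terms):
    """Return the boxes for each highlighted term (empty list if absent)."""
    return {term: index.get(term.lower(), []) for term in terms}


index = build_word_index(SAMPLE_ALTO)
# Serialize once at ingest time; a viewer can then load the JSON directly.
print(json.dumps(index))
print(coordinates_for(index, ["Boston", "missing"]))
```

With this shape, highlighting costs one dictionary lookup per search term rather than a pass over the ALTO XML, which matches the performance motivation given in the notes.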