Time: 1:00 PM EST / 10:00 AM PST
Call-In Info: 712-775-7035 (Access Code: 960009)
Moderator: Eben English (Boston Public Library)
Notetaker: TK (Etherpad link: https://etherpad.wikimedia.org/p/Samvera_Newspapers_Interest_Group_Call__2018-04-05)
Attendees:
Agenda
- IMLS Grant Update
- Finishing up modeling
- Dealing with the Hyrax metadata model and how to add/modify a new newspaper model
- Issues with making the newspaper page a fileset; will make a regular PCDM object instead
- Getting ready to look at ingest
- Originally planned project to produce a gem but have been working within an application instead; now extracting back into a gem
- JSON serialization of ALTO
- open-oni: open-oni_OCR-word-coordinates.json
- IIIF Annotation list: http://dams.llgc.org.uk/iiif/3320863/annotation/list/ART7.json
- performance benchmarking: https://gist.github.com/ebenenglish/e9b381ba8867b383b34b16ac1c9635e7
- open-oni or IIIF options; performance is fairly similar
- Gordon will check with Roger on how this is done in Michigan Daily code
- Roger says: We transformed the ALTO into a "words" data structure inspired by what open-oni was doing: basically a hash with words as keys, each pointing to an array of coordinates. Mostly for performance: given an array of highlighted terms, it is easy to find all the coordinates.
- Content Examples: https://drive.google.com/drive/folders/0BwKKtxaBVqjEbE5zMFdWUEU4WGM?usp=sharing
- Still need: CONTENTdm, TEI, Olive
- Intel sharing from other groups/projects
- Europeana newspapers full-text profile: https://docs.google.com/document/d/1t5yGEzQ0KV2rqU0sFDoKnI2bIDBGrmj0f1gSOCRUgJ4/edit#
- Next meeting: Thursday May 3, 1 PM EST
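
Illustrative sketch for the "JSON serialization of ALTO" item: Roger's description of a "words" structure (a hash with words as keys, each pointing to an array of coordinates) could look roughly like the following. This is a minimal sketch and not the Michigan Daily or open-oni code. The function names, the sample ALTO fragment, the lowercasing of words, and the `[x, y, width, height]` box layout are all assumptions made for illustration. Only the ALTO `String` attributes (`CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) come from the ALTO schema itself.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed ALTO fragment used only for this example.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Boston" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="20"/>
    <String CONTENT="Daily" HPOS="190" VPOS="200" WIDTH="60" HEIGHT="20"/>
    <String CONTENT="Boston" HPOS="100" VPOS="400" WIDTH="82" HEIGHT="21"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def alto_to_word_coords(alto_xml):
    """Build {word: [[x, y, width, height], ...]} from an ALTO document."""
    coords = defaultdict(list)
    root = ET.fromstring(alto_xml)
    for el in root.iter():
        # Compare local names so any ALTO namespace version is accepted.
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        # Assumption: words are lowercased so lookups are case-insensitive.
        word = el.get("CONTENT", "").strip().lower()
        if not word:
            continue
        box = [int(float(el.get(key, 0)))
               for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")]
        coords[word].append(box)
    return dict(coords)


def coords_for_terms(word_coords, terms):
    """Return the boxes for each highlighted term (empty list if absent)."""
    return {term: word_coords.get(term.lower(), []) for term in terms}


word_coords = alto_to_word_coords(SAMPLE_ALTO)
serialized = json.dumps(word_coords)  # what would be stored per page
highlights = coords_for_terms(word_coords, ["Boston", "missing"])
```

Finding the boxes for a highlighted term is then a single dictionary lookup per term, which is the performance point made in the notes. A production version would also need to decide how to handle punctuation, hyphenated words, and coordinate units.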