One tap mobile +16699006833,,993200218# US (San Jose) +16465588656,,993200218# US (New York)
Dial by your location +1 669 900 6833 US (San Jose) +1 646 558 8656 US (New York) Meeting ID: 993 200 218 Find your local number: https://zoom.us/u/adKGAdrW7F
Should create (or link existing, based on LCCN) title/publication work, ingest reels, issues, pages
UI:
Calendar-based browsing for any given title/publication
Basic working functionality, later goals to style and improve look
Search results: when a result links to the show view of a page/issue, the search term is highlighted within the IIIF viewer
Semantic URLs:
for publication/title (using LCCN)
for issues, pages (not sequential)
not included: article URLs
Search within a newspaper title
Upcoming:
Search interface just for newspaper content (like "advanced search" in the sense of fielded search, but not necessarily with Boolean operators)
Search within a title, select a date range, choose a language (facet), article type, etc.
Question (Gordon): is this based on Blacklight advanced search?
Eben: This is TBD; trying to avoid collisions in view configuration across multiple advanced search interfaces.
Batch ingest for PDF, TIFF
Article segmentation in METS-ALTO?
Testing
Still a work in progress
Vagrant instance is likely best choice
Test site is up, works, provides the PDF ingest functionality, but still needs most of the UI features deployed to it.
Behind current master (of newspaper_works gem).
Hoping soon for feedback.
Improving documentation in wiki of `newspaper_works` repository.
Article-segmented ALTO?
Open query to the people on the call:
- Anyone in the group with experience with this?
- Thinking about metadata extraction (e.g. headline extraction, text classification)?
- Are there any projects anyone is aware of that have tackled this?
- Eben: "people like the idea of this, but very few have paid to have this done" ... ergo "little information on best practices".
- Title extraction: if there is something in the ALTO with cues for font size, "this block of text might be the title".
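The font-size cue mentioned for title extraction could be sketched roughly as follows. This is an illustration only, not something the group has built: it assumes the ALTO file declares `TextStyle` elements with a `FONTSIZE` attribute and that blocks, lines, or strings point at them via `STYLEREFS`. The function names, the `ratio` threshold, and the sample XML are all hypothetical.

```python
import statistics
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def block_font_size(block, sizes):
    """Largest declared font size referenced by a block or its descendants."""
    found = []
    for el in block.iter():  # iter() includes the block itself
        for ref in (el.get("STYLEREFS") or "").split():
            if ref in sizes:
                found.append(sizes[ref])
    return max(found) if found else None


def headline_candidates(alto_xml, ratio=1.5):
    """Return the text of blocks set noticeably larger than the median block."""
    root = ET.fromstring(alto_xml)
    # Map TextStyle IDs to the font sizes declared in the Styles section.
    sizes = {
        el.get("ID"): float(el.get("FONTSIZE"))
        for el in root.iter()
        if local(el.tag) == "TextStyle" and el.get("FONTSIZE")
    }
    blocks = []
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        size = block_font_size(block, sizes)
        if size is None:
            continue  # no usable style cue for this block
        words = [s.get("CONTENT", "") for s in block.iter() if local(s.tag) == "String"]
        blocks.append((size, " ".join(words)))
    if not blocks:
        return []
    threshold = ratio * statistics.median(size for size, _ in blocks)
    return [text for size, text in blocks if size >= threshold]


# Hypothetical, heavily trimmed ALTO sample for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Styles>
    <TextStyle ID="S1" FONTSIZE="8"/>
    <TextStyle ID="S2" FONTSIZE="24"/>
  </Styles>
  <Layout><Page><PrintSpace>
    <TextBlock ID="B1" STYLEREFS="S2"><TextLine><String CONTENT="CITY"/><String CONTENT="FLOODS"/></TextLine></TextBlock>
    <TextBlock ID="B2" STYLEREFS="S1"><TextLine><String CONTENT="Rain"/><String CONTENT="fell"/></TextLine></TextBlock>
    <TextBlock ID="B3" STYLEREFS="S1"><TextLine><String CONTENT="More"/><String CONTENT="text"/></TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

print(headline_candidates(SAMPLE))
```

A relative threshold (against the median block size) is used rather than a fixed point size, since body text size varies between titles and scans. OCR output that omits style information would yield no candidates with this approach.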
TIFF batch ingest
Does anyone have experience with ingest workflows for directories of TIFF newspaper pages or PDFs of issues?
Gordon: ingesting of PDFs, mostly.
Eben: was there some kind of manifest that gave hints to the files?
Gordon: I think it might have been XML descriptions of the files (will message Eben details)
Eben: Leaning toward solution that is based on stipulated folder and file naming convention, instead of inventing some kind of required manifest.
Each PDF would need to carry the issue date, and the folder containing the PDFs would need to specify something like the LCCN (or some other clue to publication name or identifier?).
- Enough metadata for a batch ingest with a useful result.
- Naming convention does not seem an arduous requirement.
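As a rough illustration of how little metadata such a convention needs to carry, the sketch below derives an LCCN and issue date from a path alone. The exact convention is not decided in these notes; the layout assumed here (folder named with a normalized LCCN, PDF named with an ISO `YYYY-MM-DD` date) and every name in the code, including the example LCCN, are hypothetical.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Assumption: folder name is a normalized LCCN (0-3 letters, then 8 or 10 digits).
LCCN_RE = re.compile(r"^[a-z]{0,3}(\d{8}|\d{10})$")
# Assumption: the PDF file name contains an ISO date such as 1910-05-21.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def describe_pdf(path):
    """Derive publication and issue metadata from folder and file names only."""
    p = PurePosixPath(path)
    lccn = p.parent.name
    if not LCCN_RE.match(lccn):
        raise ValueError(f"folder name is not an LCCN: {lccn!r}")
    match = DATE_RE.search(p.stem)
    if not match:
        raise ValueError(f"no issue date in file name: {p.name!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() raises ValueError for impossible dates such as 1910-13-40.
    return {"lccn": lccn, "issue_date": date(year, month, day), "file": p.name}


# "sn99999999" is a made-up identifier used only for this example.
print(describe_pdf("batch/sn99999999/1910-05-21.pdf"))
```

Failing loudly on a folder or file that does not match keeps mis-named material out of the ingest rather than creating orphaned works.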
Could have some kind of nesting, for TIFFs, with folder per date, possibly in folders for year, in a parent folder for the publication.
- Would have to presume some kind of lexical order of the file naming for the pages within each issue.
- Files would not need issue date if the parent folder had the issue date in its directory name.
Hypothetically not complex to prepare materials for this structure.
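A minimal sketch of reading such a nested TIFF layout is shown below, assuming paths are given relative to the batch root, the top-level folder identifies the publication, and each issue folder is named with an ISO date. Intermediate folders (a year, or a reel as in NDNP-style batches) are simply ignored. The function name and example paths are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath

# Assumption: issue folders are named exactly YYYY-MM-DD.
DATE_DIR = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def group_pages(paths):
    """Map (publication folder, issue date) to page file names in lexical order."""
    issues = defaultdict(list)
    for raw in paths:
        p = PurePosixPath(raw)
        if p.suffix.lower() not in (".tif", ".tiff"):
            continue  # skip anything that is not a page image
        match = DATE_DIR.match(p.parent.name)
        if not match:
            continue  # page is not directly inside a date-named issue folder
        year, month, day = (int(part) for part in match.groups())
        publication = p.parts[0]  # top-level folder, e.g. an LCCN
        issues[(publication, date(year, month, day))].append(p.name)
    # Page sequence is presumed from lexical order of the file names.
    return {key: sorted(names) for key, names in issues.items()}


example = [
    "sn99999999/1910/1910-05-21/0002.tif",
    "sn99999999/1910/1910-05-21/0001.tif",
    "sn99999999/1910/1910-05-22/0001.tif",
    "sn99999999/notes.txt",
]
for (publication, issue_date), pages in sorted(group_pages(example).items()):
    print(publication, issue_date.isoformat(), pages)
```

Because ordering is lexical, page file names would need zero padding (`0002.tif` rather than `2.tif`), otherwise page 10 sorts ahead of page 2.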
Nicholas: may have some YML manifest example.
Sean: possible to eventually do configuration and/or manifest as a progressive enhancement at a future date.
Eben: how arduous is it to create the YML file vs. creating the file/folder structure/naming?
e.g. NDNP: a folder named with the LCCN of a title, containing folders for reels, with issue directories under each reel named by date. While the reel folder is somewhat superfluous to this example, the LCCN and the date are enough for an MVP (ingested pages linked to parent issue and title works, without being orphaned).
Newspapers project looking at this in next 3-4 weeks.
Content examples, intel sharing
Standing item for content examples. We have not been receiving these materials so far, so we presume we are in okay shape for now for representative materials.