
...

For the newspaper-issue object, you might have a PDF or full text of that object, so there's a fileset for those, too.

Question: where would the OCR for a particular article or a particular page go?

Answer: A page is a PCDM fileset that contains whatever combination of files you have for it. That covers the page level. At the article level, you might have OCR for an article without any image. The OCR could be stored in a format that keeps word coordinates rather than as plain text; i.e., wouldn't that simply be another file in the fileset (an ALTO file, an hOCR file, etc.)? Plain text makes indexing and reindexing much easier, though.

Maybe the page isn't the right level for articles: perhaps change "Newspaper article image" to "Newspaper Article Fileset"?
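One way to picture the structure under discussion is a small in-memory model: an issue-level fileset (PDF, full text), page filesets holding any mix of image and OCR files, and articles modeled as filesets that may hold OCR with no image. This is a hypothetical sketch; the class names, file names, and MIME types are illustrative assumptions, not PCDM ontology terms or the API of any repository framework:

```python
from dataclasses import dataclass, field


@dataclass
class File:
    name: str
    mime_type: str


@dataclass
class FileSet:
    """Hypothetical stand-in for a PCDM fileset: a labeled bag of files."""
    label: str
    files: list[File] = field(default_factory=list)

    def find(self, mime_type: str) -> list[File]:
        return [f for f in self.files if f.mime_type == mime_type]


@dataclass
class NewspaperIssue:
    title: str
    issue_files: FileSet                                   # whole-issue PDF / full text
    pages: list[FileSet] = field(default_factory=list)     # one fileset per page
    articles: list[FileSet] = field(default_factory=list)  # proposed: article as fileset


def text_files(issue: NewspaperIssue) -> list[str]:
    """Collect every plain-text file name across issue, page, and article filesets."""
    sets = [issue.issue_files, *issue.pages, *issue.articles]
    return [f.name for fs in sets for f in fs.find("text/plain")]


issue = NewspaperIssue(
    title="Example Gazette (made-up)",
    issue_files=FileSet("issue", [File("issue.pdf", "application/pdf"),
                                  File("issue.txt", "text/plain")]),
    pages=[FileSet("page 1", [File("p1.tif", "image/tiff"),
                              File("p1.alto.xml", "application/xml"),
                              File("p1.hocr.html", "text/html"),
                              File("p1.txt", "text/plain")])],
    # Article OCR with no image file at all.
    articles=[FileSet("article 1", [File("a1.txt", "text/plain")])],
)

print(text_files(issue))
```

Treating the article as a fileset rather than an image keeps it on the same footing as a page: it can hold whichever OCR formats exist, and an image is optional.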

Version	Old Version 12	New Version 13
Changes made by	Clifford Wulfman	Clifford Wulfman
Saved on	Jan 04, 2018	Jan 04, 2018
