...

For the newspaper-issue object, you might have a PDF or full text of that object, so there's a fileset for those, too.


Question: where would the OCR for a particular article or a particular page go?

Answer: the page is a PCDM fileset that contains any combination of files you might have. That covers the page level. At the article level, you might have OCR for an article without an image. The OCR might be stored in a coordinate-based format, or something else other than plain text; i.e., wouldn't that just be another file in the fileset (an ALTO file, an hOCR file, etc.)? Plain text makes indexing or reindexing much easier, though.
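To make the answer above concrete, here is a minimal Python sketch of a fileset that holds whatever files exist at a given level. It is only an illustration of the idea discussed, not a PCDM implementation: the class names `File` and `FileSet`, the `role` labels, the file names, and the `text_for_indexing` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class File:
    name: str
    mime_type: str
    role: str  # hypothetical label, e.g. "image", "ocr-coordinates", "ocr-text"


@dataclass
class FileSet:
    label: str
    files: List[File] = field(default_factory=list)

    def text_for_indexing(self) -> Optional[File]:
        """Return the first plain-text file, if any, since plain text is easiest to index."""
        for f in self.files:
            if f.mime_type == "text/plain":
                return f
        return None


# Page level: image plus OCR in both a coordinate format (ALTO) and plain text.
page_fileset = FileSet("Page 1", [
    File("page1.tif", "image/tiff", "image"),
    File("page1.alto.xml", "application/xml", "ocr-coordinates"),
    File("page1.txt", "text/plain", "ocr-text"),
])

# Article level: OCR text only, with no image.
article_fileset = FileSet("Article 3", [
    File("article3.txt", "text/plain", "ocr-text"),
])
```

In this sketch an ALTO or hOCR file is just one more member of the fileset, and an indexer can ask for the plain-text member when one exists.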

Maybe the page isn't the right level for articles: perhaps change "Newspaper article image" to "Newspaper Article Fileset"?