04. Manifest Validation Service
A pre-ingest validation service is required to ensure that assets that begin their preservation lifecycle in Archivematica can then successfully continue as DIPs.
If a Manifest has been created and assets have been migrated into Archivematica but then fail to upload to Avalon because of a manifest error, the manifest file has to be corrected and the entire process restarted from the beginning: the material must be deleted from Archivematica and the files pulled in again. Archivematica does support a metadata-only re-ingest of assets, but this only works with Archivematica-compliant metadata.csv files, not with manifest files that carry over to the Avalon Media System. Archivematica also supports a full re-ingest option, but a full re-ingest will not pick up modified files in the temporary Transfer Source. Validating the Avalon Manifest file before Archivematica ingest helps avoid this situation, so that files do not have to be re-ingested.
Manifest file rules
Following the Batch Ingest Package Format guidelines, the rules below are written using the requirement keywords of IETF RFC 2119 and are the minimum that must be applied for Manifest file validation:
- Row 1, Column A MUST NOT be empty (SHOULD contain a batch reference name).
- Row 1, Column B MUST NOT be empty (SHOULD contain batch author contact information).
- Row 2 MUST contain metadata field names.
- Only the following metadata field names are allowed: "Bibliographic ID", "Bibliographic ID Label", "Other Identifier", "Other Identifier Type", "Title", "Creator", "Contributor", "Genre", "Publisher", "Date Created", "Date Issued", "Abstract", "Language", "Physical Description", "Related Item URL", "Related Item Label", "Topical Subject", "Geographic Subject", "Temporal Subject", "Terms of Use", "Table of Contents", "Statement of Responsibility", "Note", "Note Type", "Publish", "Hidden", "File", "Label", "Offset", "Skip Transcoding", "Absolute Location", "Date Ingested"
- Row 2 MUST contain the three required fields: Title, Date Issued, and File.
- Exception to the above rule: If "Bibliographic ID" ingest is used, only "File" is then required.
- Row 2 field names MUST NOT have leading or trailing blanks.
- The following fields MUST NOT be repeated: Title, Bibliographic ID, Bibliographic ID Label, Date Created, Date Issued, Abstract, Physical Description, Terms of Use. All other fields are repeatable.
- Values under File MUST include a file extension.
- Values under File MUST NOT include more than one period.
- Exception to the above rule: preceding derivative files MAY include an additional period if following a specific format: file.{high, medium, low}.ext.
- The following fields MUST be paired if present: Other Identifier with Other Identifier Type, Related Item URL with Related Item Label, Note with Note Type.
- Publish and Hidden fields, if present, MUST be "yes" or "no".
- The following fields, if used, MUST be repeated for each attached file: File, Label, Offset, Skip Transcoding, Absolute Location, Date Ingested.
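As an illustration of how the header and file name rules above could be checked, here is a minimal Python sketch. It is not the validator's actual implementation: the function names check_header and check_file_value and the error messages are hypothetical, and the field lists are copied from the rules above.

```python
import csv
import io
import re

ALLOWED_FIELDS = {
    "Bibliographic ID", "Bibliographic ID Label", "Other Identifier",
    "Other Identifier Type", "Title", "Creator", "Contributor", "Genre",
    "Publisher", "Date Created", "Date Issued", "Abstract", "Language",
    "Physical Description", "Related Item URL", "Related Item Label",
    "Topical Subject", "Geographic Subject", "Temporal Subject",
    "Terms of Use", "Table of Contents", "Statement of Responsibility",
    "Note", "Note Type", "Publish", "Hidden", "File", "Label", "Offset",
    "Skip Transcoding", "Absolute Location", "Date Ingested",
}
NON_REPEATABLE = {
    "Title", "Bibliographic ID", "Bibliographic ID Label", "Date Created",
    "Date Issued", "Abstract", "Physical Description", "Terms of Use",
}
# name.ext, or name.{high,medium,low}.ext for derivative files
FILE_VALUE = re.compile(r"^[^.]+(\.(high|medium|low))?\.[^.]+$")


def check_header(manifest_text):
    """Return a list of rule violations found in rows 1 and 2 (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(manifest_text)))
    errors = []
    row1 = rows[0] if rows else []
    if len(row1) < 2 or not row1[0].strip() or not row1[1].strip():
        errors.append("Row 1 must include a reference name and author.")
    if len(rows) < 2:
        return errors + ["Row 2 must contain metadata field names."]
    names = [n for n in rows[1] if n.strip()]  # ignore empty padding cells
    for name in names:
        if name != name.strip():
            errors.append(f"Field name has leading or trailing blanks: {name!r}")
        elif name not in ALLOWED_FIELDS:
            errors.append(f"Field name is not allowed: {name!r}")
    stripped = [n.strip() for n in names]
    for name in sorted(NON_REPEATABLE):
        if stripped.count(name) > 1:
            errors.append(f"Field must not be repeated: {name!r}")
    # With Bibliographic ID ingest, only File is required.
    if "Bibliographic ID" in stripped:
        required = {"File"}
    else:
        required = {"Title", "Date Issued", "File"}
    for name in sorted(required - set(stripped)):
        errors.append(f"Required field is missing: {name!r}")
    return errors


def check_file_value(value):
    """True if a File value has an extension and no disallowed extra period."""
    return FILE_VALUE.match(value) is not None
```

A caller would pass the manifest text to check_header and treat an empty list as "no header violations found"; check_file_value would then be applied to each value in the File columns.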
Manifest file recommendations
- The value in the Bibliographic ID Label and Other Identifier Type fields must be an acceptable value in the list of valid values configured within the Avalon system. The following are the defaults, but others could be added: "local", "oclc", "lccn", "issue number", "matrix number", "music publisher", "video recording identifier", and "other". Users should take care when straying from these default values.
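Because this is a recommendation rather than a rule, a checker might report a warning instead of failing the manifest. The sketch below assumes an exact, case-sensitive comparison against the default list, which may not match how a given Avalon instance is configured; the function name identifier_type_warnings is hypothetical.

```python
DEFAULT_IDENTIFIER_TYPES = {
    "local", "oclc", "lccn", "issue number", "matrix number",
    "music publisher", "video recording identifier", "other",
}
TYPE_FIELDS = {"Bibliographic ID Label", "Other Identifier Type"}


def identifier_type_warnings(field_names, row):
    """Return warnings for identifier types outside the default list.

    field_names is the list of Row 2 names; row is one data row.
    Assumption: values are compared exactly as written (case-sensitive).
    """
    warnings = []
    for name, value in zip(field_names, row):
        if name in TYPE_FIELDS and value and value not in DEFAULT_IDENTIFIER_TYPES:
            warnings.append(f"{name}: {value!r} is not one of the default types")
    return warnings
```

An instance that has added its own identifier types would extend DEFAULT_IDENTIFIER_TYPES to match its configuration.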
Validation service call
Avalon Media System users who want to validate their Manifest files prior to moving through the Archivematica preservation process can test their Manifest CSVs by sending the CSV to an endpoint in the Archivematica API.
Usage example: (assuming that path/to/Manifest.csv exists)
curl http://127.0.0.1:62080/api/v2beta/validate/avalon \
  --data-binary "@path/to/Manifest.csv" \
  --header "Authorization: ApiKey test:test" \
  --header "Content-Type: text/csv; charset=utf-8"
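Note on the above example: use --data-binary, not --data. When reading from a file, --data strips carriage returns and newlines, which would collapse the rows of the CSV into a single line.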
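The same request could be built from a script. The sketch below uses Python's standard urllib.request and mirrors the curl example: the URL, API key, and headers are the placeholder values from that example, and build_validation_request is a hypothetical helper name. The file is passed as raw bytes so that line endings are preserved, as with --data-binary.

```python
import urllib.request


def build_validation_request(csv_bytes,
                             base_url="http://127.0.0.1:62080",
                             user="test", api_key="test"):
    """Build (but do not send) a POST request for the avalon validator."""
    return urllib.request.Request(
        url=f"{base_url}/api/v2beta/validate/avalon",
        data=csv_bytes,  # raw bytes: newlines are sent unchanged
        method="POST",
        headers={
            "Authorization": f"ApiKey {user}:{api_key}",
            "Content-Type": "text/csv; charset=utf-8",
        },
    )
```

To send it, read the manifest in binary mode (open(path, "rb")) and pass the resulting request to urllib.request.urlopen. Note that urlopen raises urllib.error.HTTPError for 400 and 404 responses, so the JSON body of a failed validation has to be read from the exception.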
Response examples:
200 OK (Response was valid)
{ "valid": true }
400 Bad Request (Expect reason to include different validation errors, or that the file could not be found)
{ "valid": false, "reason": "Administrative data must include reference name and author." }
404 Not Found (Validator was not found.)
{ "error": true, "message": "Unknown validator. Accepted values: avalon" }
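A client script might reduce these three responses to a pass/fail result plus a message. The following is a minimal sketch based only on the response shapes shown above; summarize_response is a hypothetical name, and any other status code is treated as unexpected.

```python
import json


def summarize_response(status, body):
    """Map an HTTP status and JSON body to (is_valid, message)."""
    payload = json.loads(body)
    if status == 200 and payload.get("valid") is True:
        return True, "manifest is valid"
    if status == 400:
        return False, payload.get("reason", "no reason given")
    if status == 404:
        return False, payload.get("message", "validator not found")
    return False, f"unexpected response (HTTP {status})"
```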
This request can be performed manually for each manifest, or scheduled to run regularly on all files in a folder using the cron service or another script scheduler.
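A scheduled run over a folder could look roughly like the sketch below. It assumes manifests are the .csv files directly inside the folder, and it takes the submission step as a parameter: submit is any function that accepts the file's bytes and returns an (ok, reason) pair, for example one that posts to the validation endpoint. Both validate_folder and submit are hypothetical names.

```python
from pathlib import Path


def validate_folder(folder, submit):
    """Run submit() on every .csv file in folder.

    Returns a dict mapping file name to the (ok, reason) pair
    returned by submit, in alphabetical order of file name.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # Read as bytes so line endings reach the validator unchanged.
        results[path.name] = submit(path.read_bytes())
    return results
```

A wrapper script could print the failed entries and exit with a non-zero status so that the scheduler reports the problem.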