Data Migration from DSpace to Hydra
The primary goal of this Interest Group is to identify, document, and communicate complete workflow solutions for migrating data out of DSpace and into Hydra. With that goal in mind, use the table below to see what is being worked on, add your own information, and contact anyone listed who may be able to help you.
Use the template below to share your data migration path:
Example:
Institution: Miskatonic University
Data Migration Approach Narrative: We called on the Great Old Ones to use their cosmic magics to move the data.
Major Issues Encountered: Collaboration with Cthulhu is difficult and can be dangerous.
Code: github.com/chulhu_code_repo/dspace-sufia-migration
Institution | Data Migration Approach & Architecture | Major/Minor Issues Encountered | Status | Example | Code |
---|---|---|---|---|---|
California State University | DSpace AIP Packager export (see the DSpace documentation "Importing and Exporting Content via Packages"). A custom import rake task processes each AIP package. Community and collection names are stored in metadata fields for faceting, currently coverage and sponsorship respectively, though that is likely to change. | Major: The volume of data and the frequency of calls in this approach appear to overload Fedora, leading to timeouts after roughly 150 items. We will likely need to address system resources on the Fedora server during migration. Major: Bitstream naming is ambiguous. Bitstream files are exported as bitstream_[number].[extension]. The original file name is in the metadata as "dc.title" with the dspaceType "BITSTREAM", but matching names to files would be very difficult for items with multiple bitstreams. SOLUTION (complete): the original file names can be resolved from the AIP package's mets.xml. Minor: Dublin Core fields are duplicated for "system" data vs. item metadata. Minor: Embargoes work somewhat differently in Sufia. During an item's embargo period it is completely private. That may be the expected behavior in some cases, but in most of our cases DSpace allows the metadata to be viewed. This functionality will need to be adapted into Sufia, but it is not yet a high priority. | In Progress | https://github.com/scholarworks/dspace_packager/blob/master/lib/tasks/packager.rake | |
University of Michigan | I have a Perl script that uses the "./dspace export" command to export 5 items from each of our nearly 400 collections, one item per directory. The script then creates a YAML (.yml) file for each item, which a rake task uses to import the items into Hyrax as works. | We are doing this mainly to stress test Hyrax, so for now we are not concerning ourselves with item or bitstream permissions, where to store bitstream descriptions, or mapping items to their respective collections. There are other issues we are also tabling for the time being. The one issue we did encounter is that the DSpace exports contain many numeric character references in the dublin_core.xml file. To convert them back to UTF-8 characters we use the htmlentities gem. This link helped me with it: https://makandracards.com/makandra/898-encode-or-decode-html-entities | In Progress | These files: https://github.com/mlibrary/deep-blue/blob/master/lib/append_content_service.rb https://github.com/mlibrary/deep-blue/blob/master/lib/build_content_service.rb https://github.com/mlibrary/deep-blue/blob/master/lib/tasks/populate_dev_app.rake | |
| | DSpace Replicate plugin (dspace-replicate) exports Collection and Item BAGs. A standalone Ruby application, with configurations, maps the metadata from each BAG into properly formed data to publish works into a Hyrax server "through the front door". | We are focused on striking a balance in how to operate the application; some of the concepts in mind are: | In Progress | https://github.com/osulp/dspace2hydra (See README.md) | |
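For anyone hitting the same numeric character reference problem as the University of Michigan, here is a minimal Python sketch. It swaps the Ruby htmlentities gem for the standard library's `html.unescape`, which does the equivalent decoding. It assumes the DSpace Simple Archive Format layout of `dublin_core.xml` (`<dcvalue element="..." qualifier="...">` entries) and that some references survive XML parsing, for example because they were escaped twice (`&amp;#233;`); a standard XML parser already decodes single-escaped ones. The function name `read_dublin_core` is hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from collections import defaultdict


def read_dublin_core(xml_text: str) -> dict[str, list[str]]:
    """Collect dcvalue entries keyed as 'schema.element[.qualifier]'.

    Assumes the Simple Archive Format layout:
    <dublin_core schema="dc"><dcvalue element=".." qualifier="..">..</dcvalue>
    """
    root = ET.fromstring(xml_text)
    schema = root.get("schema", "dc")
    fields: dict[str, list[str]] = defaultdict(list)
    for node in root.iter("dcvalue"):
        key = f"{schema}.{node.get('element')}"
        qualifier = node.get("qualifier")
        # DSpace writes qualifier="none" for unqualified fields.
        if qualifier and qualifier != "none":
            key += f".{qualifier}"
        # The XML parser has already decoded single-escaped references
        # such as &#237;; html.unescape catches double-escaped ones.
        fields[key].append(html.unescape(node.text or ""))
    return dict(fields)
```

The resulting dictionary could then be written out as the per-item YAML file a rake task reads, though the YAML layout Michigan's rake task expects is not described here.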
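California State University resolved the bitstream_[number].[extension] naming problem through the AIP package's mets.xml. A minimal Python sketch of that lookup follows, under stated assumptions: each `mets:file` has an `mets:FLocat` whose `xlink:href` is the packaged file name, and its `ADMID` refers (directly or via a parent section) to administrative metadata containing a DIM record with `dspaceType="BITSTREAM"` whose `dc.title` field holds the original name. The namespace URIs below are the ones commonly seen in DSpace AIPs; verify them, and the structure, against your own packages. The function name `bitstream_names` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for a DSpace AIP mets.xml; check your own packages.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
    "dim": "http://www.dspace.org/xmlns/dspace/dim",
}
HREF = "{http://www.w3.org/1999/xlink}href"
TITLE_PATH = ".//dim:dim[@dspaceType='BITSTREAM']/dim:field[@element='title']"


def bitstream_names(mets_xml: str) -> dict[str, str]:
    """Map packaged file names (bitstream_N.ext) to original file names."""
    root = ET.fromstring(mets_xml)
    # Index every element carrying an ID so ADMID references resolve whether
    # they point at an amdSec or at one of its techMD children.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    names: dict[str, str] = {}
    for file_el in root.iter(f"{{{NS['mets']}}}file"):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is None or flocat.get(HREF) is None:
            continue
        # ADMID may list several space-separated IDs; use the first that
        # leads to a BITSTREAM DIM record with a title.
        for ref in (file_el.get("ADMID") or "").split():
            section = by_id.get(ref)
            if section is None:
                continue
            title = section.find(TITLE_PATH, NS)
            if title is not None and title.text:
                names[flocat.get(HREF)] = title.text.strip()
                break
    return names
```

Building a name map per package, rather than relying on bitstream order, keeps items with multiple bitstreams unambiguous.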
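The University of Michigan row tables the question of where to store bitstream descriptions. In a "./dspace export" item directory, those descriptions usually sit in the `contents` file rather than in dublin_core.xml. A minimal Python sketch for reading it, assuming the Simple Archive Format convention of one bitstream per line with tab-separated `key:value` options (for example `bundle:ORIGINAL` and `description:...`); the function name `parse_contents` is hypothetical.

```python
def parse_contents(text: str) -> list[dict[str, str]]:
    """Parse a Simple Archive Format 'contents' file.

    Assumed line format: filename<TAB>key:value<TAB>key:value ...
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, *options = line.split("\t")
        entry = {"filename": name.strip()}
        for option in options:
            # Split on the first colon only, so descriptions may contain colons.
            key, sep, value = option.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        entries.append(entry)
    return entries
```

Each entry's `description` value (when present) could then be attached to the corresponding file set once a storage location is decided.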
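Several rows involve mapping DSpace metadata onto Hyrax work attributes (for example, the configurable mapping from each BAG to Hyrax). A minimal Python sketch of a configuration-driven mapping follows. The `FIELD_MAP` contents are a hypothetical example, not any institution's actual configuration; the target names are standard Hyrax basic metadata attributes. Returning the unmapped keys makes it easy to review fields such as DSpace's "system" Dublin Core data (e.g. `dc.date.accessioned`, `dc.description.provenance`) before deciding whether to drop or keep them.

```python
# Hypothetical mapping: DSpace qualified Dublin Core keys -> Hyrax attributes.
FIELD_MAP: dict[str, str] = {
    "dc.title": "title",
    "dc.contributor.author": "creator",
    "dc.description.abstract": "description",
    "dc.subject": "keyword",
    "dc.date.issued": "date_created",
    "dc.publisher": "publisher",
}


def map_to_work(
    fields: dict[str, list[str]], field_map: dict[str, str] = FIELD_MAP
) -> tuple[dict[str, list[str]], list[str]]:
    """Return (work attributes, sorted list of DSpace keys with no mapping)."""
    work: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for key, values in fields.items():
        target = field_map.get(key)
        if target is None:
            unmapped.append(key)
            continue
        # Several DSpace fields may feed the same multi-valued attribute.
        work.setdefault(target, []).extend(values)
    return work, sorted(unmapped)
```

Keeping the mapping in data rather than code lets each collection or institution swap in its own table without touching the import logic.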