Code for Indexing
Where are collections indexed now?
- https://github.com/samvera/hyrax/blob/master/app/indexers/hyrax/collection_indexer.rb
- https://github.com/samvera/hyrax/blob/master/app/indexers/hyrax/collection_with_basic_metadata_indexer.rb
- https://github.com/samvera/hyrax/blob/master/app/indexers/hyrax/basic_metadata_indexer.rb
- https://github.com/samvera/hyrax/blob/master/app/indexers/hyrax/admin_set_indexer.rb
- https://github.com/samvera/hydra-pcdm/blob/master/lib/hydra/pcdm/collection_indexer.rb
- https://github.com/samvera/hydra-pcdm/blob/master/lib/hydra/pcdm/pcdm_indexer.rb
- https://github.com/samvera/active_fedora/blob/master/lib/active_fedora/indexing_service.rb
- https://github.com/samvera/active_fedora/blob/master/lib/active_fedora/rdf/indexing_service.rb
Thoughts on indexing nested collections
At Notre Dame, we have implemented Nested Collections and leveraged the following indexing strategy:
- https://github.com/ndlib/curate-indexer (which could be better named)
- We have one method for reindexing a single relationship (https://github.com/ndlib/curate-indexer/blob/master/lib/curate/indexer.rb#L20) and another for reindexing the whole repository (https://github.com/ndlib/curate-indexer/blob/master/lib/curate/indexer.rb#L36)
- The gem was developed without concern for the persistence layer. Instead it relies on an adapter whose interface is defined in the AbstractAdapter (the gem itself is tested via an InMemoryAdapter).
- CurateND's implementation of the indexing adapter is found in Curate::LibraryCollectionIndexingAdapter. We also added a module for IsMemberOfLibraryCollection.
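To illustrate the adapter approach described above, here is a minimal sketch of how an indexer can stay independent of the persistence layer. Every class and method name in it (`SketchAbstractAdapter`, `SketchInMemoryAdapter`, `find_parent_ids`, `write_index_document`, `reindex_relationship`) is an assumption made for illustration. This is not the interface defined by curate-indexer, so consult the gem's AbstractAdapter for the real contract.

```ruby
# Illustrative only: these names are hypothetical and do not mirror
# the actual curate-indexer interface.
class SketchAbstractAdapter
  # Return the ids of the collections that directly contain `id`.
  def find_parent_ids(id)
    raise NotImplementedError, "#{self.class} must implement #find_parent_ids"
  end

  # Persist the index document for `id`.
  def write_index_document(id, attributes)
    raise NotImplementedError, "#{self.class} must implement #write_index_document"
  end
end

# An in-memory adapter lets the indexing logic be tested without
# Fedora or Solr running.
class SketchInMemoryAdapter < SketchAbstractAdapter
  attr_reader :index

  def initialize(parents_by_id)
    @parents_by_id = parents_by_id
    @index = {}
  end

  def find_parent_ids(id)
    @parents_by_id.fetch(id, [])
  end

  def write_index_document(id, attributes)
    @index[id] = attributes
  end
end

# The indexing logic talks only to the adapter, never to the
# persistence layer directly.
def reindex_relationship(id, adapter:)
  adapter.write_index_document(id, { parent_ids: adapter.find_parent_ids(id) })
end
```

An application would supply its own adapter subclass that reads from and writes to its real persistence layer, in the way CurateND does with its own adapter.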
Potential Pitfalls
Nested collections can create infinite loops (e.g. A is in B, which is in C, which is in A). At Notre Dame we adopted a maximum depth (aka time_to_live) for graph traversal. Another option is cycle detection, but that might not perform well. Also, with nested collections, reindexing everything can take considerable time, so full reindexing should be relegated to background jobs.
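The maximum-depth guard can be sketched as follows. This is a simplified illustration, not the curate-indexer implementation: the `MEMBER_OF` hash, the `ancestors_of` method, and the `MaxDepthExceeded` error are all hypothetical names, and the graph lives in memory rather than in a repository.

```ruby
# Hypothetical in-memory graph: each key is a collection id and each
# value lists the collections it is a direct member of.
MEMBER_OF = {
  "a" => ["b"],
  "b" => ["c"],
  "c" => ["a"] # closes the cycle a -> b -> c -> a
}.freeze

class MaxDepthExceeded < StandardError; end

# Collect every ancestor of `id`, walking upward one level at a time.
# `time_to_live` is the maximum number of ancestor levels allowed; if
# deeper levels remain once it reaches zero, the walk is aborted
# instead of looping forever.
def ancestors_of(id, graph:, time_to_live: 10)
  found = []
  frontier = graph.fetch(id, [])
  until frontier.empty?
    raise MaxDepthExceeded, "gave up while resolving ancestors of #{id}" if time_to_live <= 0

    found.concat(frontier)
    frontier = frontier.flat_map { |node| graph.fetch(node, []) }
    time_to_live -= 1
  end
  found.uniq
end

begin
  ancestors_of("a", graph: MEMBER_OF, time_to_live: 5)
rescue MaxDepthExceeded => e
  warn "Skipping reindex: #{e.message}"
end
```

The guard does not identify which relationship closes the loop. It only bounds the work done, which is the trade-off against explicit cycle detection mentioned above. A legitimately deep hierarchy also trips it, so the limit has to exceed the deepest nesting the repository is expected to hold.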