Our current approach to indexing data from PDC Discovery creates a new Solr collection, indexes into it, and then swaps Solr to point to the new collection. This was needed because of the way we were indexing data from two sources (PDC Describe and DataSpace), but once we stop indexing DataSpace we won't need this process anymore.
I suggest dropping this process once we stop harvesting from DataSpace, since collection creation has caused issues before: Solr stopped accepting requests to create and delete collections (although reading from collections still worked).
We should also figure out how to handle withdrawals from PDC (how to remove withdrawn records from the index). One approach is to tag every record with a timestamp indicating when it was indexed and, at the end of the index process, delete all records whose timestamp is older than the time the rake task was run (that is, delete the Solr documents that were not touched during this indexing run).
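The timestamp approach could look roughly like the sketch below. It is only an illustration: the field name `indexed_at_dt` and the helper names are hypothetical, and an in-memory hash stands in for the Solr collection so the logic can be run without a Solr instance.

```ruby
require "time"

# Hypothetical field name; assumed to be defined as a date field in the Solr schema.
TIMESTAMP_FIELD = "indexed_at_dt"

# Stamp every harvested document with the time the indexing run started.
def stamp(docs, run_started_at)
  stamp_value = run_started_at.utc.iso8601
  docs.map { |doc| doc.merge(TIMESTAMP_FIELD => stamp_value) }
end

# Build a Solr delete-by-query that matches documents not touched in this run.
# The closing "}" makes the upper bound of the range exclusive.
def stale_documents_query(run_started_at)
  "#{TIMESTAMP_FIELD}:[* TO #{run_started_at.utc.iso8601}}"
end

# In-memory stand-in for the index (a hash keyed by document id),
# used here only to show the effect of the stamp-then-delete cycle.
def simulate_run(index, harvested_docs, run_started_at)
  stamp(harvested_docs, run_started_at).each { |doc| index[doc["id"]] = doc }
  cutoff = run_started_at.utc.iso8601
  # UTC ISO 8601 strings in the same format compare correctly as plain strings.
  index.delete_if { |_id, doc| doc[TIMESTAMP_FIELD] < cutoff }
  index
end
```

Against a real collection, the string returned by `stale_documents_query` would be sent as a delete-by-query followed by a commit, for example with RSolr's `delete_by_query` if that is the client in use. The delete should probably run only after the harvest has completed successfully, since a run that fails partway would otherwise remove records that are still valid.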
Related to #684