This repository has been archived by the owner on Nov 26, 2019. It is now read-only.

Better mass index #412

Merged
merged 1 commit into from
Mar 31, 2017


jrochkind
Contributor

  • Uses batching and a progress bar.
  • Indexes permissions objects first, which should eliminate the need to index twice.
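
The two points above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the code from this PR or from active-fedora: `Record`, `PERMISSION_MODELS`, `permissions_first`, and `mass_index` are hypothetical names, the model names are placeholders, and the actual indexing call is left to a block supplied by the caller.

```ruby
# Hypothetical stand-in for a repository object; only the fields this sketch needs.
Record = Struct.new(:uri, :model)

# Assumed model names that count as "permissions objects" (placeholders only).
PERMISSION_MODELS = ["Permission", "AccessControl"].freeze

# Move permissions objects ahead of everything else, keeping relative order
# within each group.
def permissions_first(records)
  priority, rest = records.partition { |r| PERMISSION_MODELS.include?(r.model) }
  priority + rest
end

# Index records in fixed-size batches and report progress after each batch.
# The block receives one batch at a time and is responsible for the real
# indexing work; this sketch never talks to Fedora or Solr itself.
def mass_index(records, batch_size: 50, progress: $stderr, &index_batch)
  ordered = permissions_first(records)
  total = ordered.size
  done = 0
  ordered.each_slice(batch_size) do |batch|
    index_batch.call(batch)
    done += batch.size
    progress.puts format("indexed %d/%d (%d%%)", done, total, done * 100 / total)
  end
  done
end
```

A caller would pass a block that sends each batch to the index, for example `mass_index(records, batch_size: 100) { |batch| my_indexer.add(batch) }`, where `my_indexer` is likewise hypothetical. Ordering before batching means every permissions object has been handed to the indexer before any other object is processed.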

Based on code already in active-fedora master.

Closes #348

@hackmaster.a I'm not sure how to test to confirm it eliminates the need
for a double index. Do you want to do that, or tell me how (what the
problem case looked like before)?

@jrochkind jrochkind force-pushed the chf_indexer branch 2 times, most recently from 21fdcb3 to 25fb93e Compare March 28, 2017 02:38
@jrochkind
Contributor Author

jrochkind commented Mar 28, 2017

Total time to reindex on staging: a bit over 2 hours. Of that, 30 minutes is the "getting all URLs from Fedora" stage, which is the hardest part to speed up with concurrency (although it may be possible).

It ought to require only one indexing pass, even on an empty index, but I'm not sure how to verify that.

But this speed-up is maybe good enough for now?

@hackartisan
Contributor

@jrochkind To test whether a second index is required, you have to start from an empty index. Run the script and then try to edit something you previously had edit rights for (if you don't own any objects you may have to ask someone else to do this for you, maybe @sanfordd). If you cannot edit it, that means you need the second index.

Are you testing this again with the time stamping?

@jrochkind
Contributor Author

Wasn't planning on testing again with time-stamping, but I can if you'd like. I can and will also easily test the permissions thing locally in my dev copy with like 4 items. Once I figure out how to clean out my solr index, heh.

@hackartisan
Contributor

Are you still using solr_wrapper? `solr_wrapper clean` should work.

@jrochkind
Contributor Author

Okay, after fixing a typo I found when it didn't work, I verified in dev that this does maintain permissions with only one indexing run on an empty Solr.

Will run again on staging to verify timing. Still pretty sure it's ~2 hours; that's a lot faster than before, yes?

@jrochkind
Contributor Author

Finished on staging again in 2 hours and 2 minutes. Only need to run once even on an empty index!

Rebased into one commit to clean up history.

Ready for merge! I think this is a good enough index improvement for now, so #348 can be closed?

@hackartisan
Contributor

👍 sorry it took so long for me to merge; my brain has been distracted and/or scrambled from conference and travel but I wanted to actually be able to read the code first!

@hackartisan hackartisan merged commit fde924c into master Mar 31, 2017
@hackartisan hackartisan deleted the chf_indexer branch March 31, 2017 13:59