-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SRCH-5154 Bulk delete zombie records from Elastic Search #1743
base: main
Are you sure you want to change the base?
Conversation
e1c91c7
to
30e403c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plase make sure rspec, cucumber, and codeclimate checks pass.
I have to make one more change to delete the record with |
be72efe
to
52117bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krbhavith don't forget to fix the test coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure file exists where the job will read it. And avoid using rescue as much.
BulkZombieUrlUploaderJob.perform_later( | ||
current_user, | ||
@file.original_filename, | ||
@file.tempfile.path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach won't work on staging or production. When the request comes in the tempfile gets created in the app
servers but the workers will try to perform the upload from the crawl
server.
This file needs to be send to S3 and the job needs to read it from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing similar functionality at https://github.com/GSA/search-gov/blob/main/app/controllers/admin/bulk_url_upload_controller.rb#L33 working fine now in production
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the worker picking up the tempfile?
0d18682
to
c83b0ed
Compare
c83b0ed
to
7b4929d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not need to use.send
to call methods plase change that. There is a few places where logging can be improved.
BulkZombieUrlUploaderJob.perform_later( | ||
current_user, | ||
@file.original_filename, | ||
@file.tempfile.path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the worker picking up the tempfile?
let(:row) { { 'URL' => 'https://example.com', 'DOC_ID' => nil } } | ||
|
||
it 'logs a missing document ID error' do | ||
uploader.send(:process_row, row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use send
to call methods. If the method is private test it through the parent or make it public.
uploader.process_row, row
context 'when document ID is present' do | ||
it 'handles URL processing' do | ||
allow(uploader).to receive(:handle_url_processing) | ||
uploader.send(:process_row, row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a need to use send
?
|
||
describe '#handle_processing_error' do | ||
subject(:handle_processing_error) do | ||
uploader.send(:handle_processing_error, error, url, document_id, row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not use send
.
|
||
describe '#initialize_results' do | ||
it 'initializes the results object' do | ||
uploader.send(:initialize_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use send
. private methods do no need to be unit tested. The
uploadmethod in here should have tests for what
initialize_results` is doing instead.
# frozen_string_literal: true | ||
|
||
class BulkZombieUrls::FileValidator | ||
MAXIMUM_FILE_SIZE = 4.megabytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where this this restriction comes from? I don't see it as part of the ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the requirement from the ticket, but I am following this as per the previous other upload processes.
Summary
SearchgovUrl
table and from Kibana.BulkUploadHandler
to be used by all bulk upload controllers.Checklist
Please ensure you have addressed all concerns below before marking a PR "ready for review" or before requesting a re-review. If you cannot complete an item below, replace the checkbox with the⚠️
:warning:
emoji and explain why the step was not completed.Functionality Checks
You have merged the latest changes from the target branch (usually
main
) into your branch.Your primary commit message is of the format SRCH-#### <description> matching the associated Jira ticket.
PR title is either of the format SRCH-#### <description> matching the associated Jira ticket (i.e. "SRCH-123 implement feature X"), or Release - SRCH-####, SRCH-####, SRCH-#### matching the Jira ticket numbers in the release.
Automated checks pass. If Code Climate checks do not pass, explain reason for failures:
Process Checks