
SRCH-5154 Bulk delete zombie records from Elastic Search #1743

Open

wants to merge 12 commits into base: main from SRCH-5154/zombie_url_upload
Conversation

@krbhavith (Contributor) commented Nov 26, 2024

Summary

  • Upload a list of zombie URLs through the Super Admin page to delete them from the SearchgovUrl table and from Kibana.
  • Added a BulkUploadHandler concern to be used by all bulk upload controllers.
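The shared concern described above can be sketched roughly as follows. This is plain Ruby rather than a Rails `ActiveSupport::Concern`, so the flow is runnable in isolation; the error class, method names, and messages are illustrative assumptions, not the PR's actual API.

```ruby
# Minimal sketch of a shared bulk-upload concern (assumed shape, plain Ruby).
module BulkUploadHandler
  class MissingFileError < StandardError; end

  # Validate the uploaded file and hand it off to a background job; return a
  # result hash the including controller can turn into a flash message.
  def handle_bulk_upload(file)
    raise MissingFileError, 'Please choose a file to upload.' if file.nil?

    enqueue_job(file)
    { ok: true, message: "Successfully queued #{file[:filename]} for processing." }
  rescue MissingFileError => e
    { ok: false, message: e.message }
  end
end

# Hypothetical controller stand-in, only for demonstration.
class FakeUploadsController
  include BulkUploadHandler

  attr_reader :enqueued

  def enqueue_job(file)
    @enqueued = file
  end
end
```

Any controller that mixes in the concern only has to supply its own `enqueue_job` (here a fake); the validation and error handling live in one place.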

Checklist

Please ensure you have addressed all concerns below before marking a PR "ready for review" or before requesting a re-review. If you cannot complete an item below, replace the checkbox with the ⚠️ :warning: emoji and explain why the step was not completed.

Functionality Checks

  • You have merged the latest changes from the target branch (usually main) into your branch.

  • Your primary commit message is of the format SRCH-#### <description> matching the associated Jira ticket.

  • PR title is either of the format SRCH-#### <description> matching the associated Jira ticket (i.e. "SRCH-123 implement feature X"), or Release - SRCH-####, SRCH-####, SRCH-#### matching the Jira ticket numbers in the release.

  • Automated checks pass. If Code Climate checks do not pass, explain reason for failures:

Process Checks

  • You have specified at least one "Reviewer".

@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch from e1c91c7 to 30e403c on November 26, 2024 17:24
@stevenbarragan (Contributor) left a comment

Please make sure the rspec, cucumber, and codeclimate checks pass.

@krbhavith (Contributor, Author)

> Please make sure the rspec, cucumber, and codeclimate checks pass.

I have one more change to make: delete the record by document_id in I14yDocument when the URL is not present in the searchgov_urls table.

@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch 4 times, most recently from be72efe to 52117bf on December 10, 2024 16:47
@stevenbarragan (Contributor) left a comment

@krbhavith don't forget to fix the test coverage

@stevenbarragan (Contributor) left a comment

Please make sure the file exists where the job will read it, and avoid overusing rescue.
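The point about rescue can be illustrated by narrowing a blanket rescue to the one failure the caller can actually handle. The class and method names below are made up for the example, not taken from the PR.

```ruby
# Sketch: instead of a blanket `rescue => e`, which also swallows bugs like
# NoMethodError, rescue only the expected, recoverable failure.
class DocumentNotFound < StandardError; end

# Hypothetical delete that fails for one specific, expected reason.
def delete_document(doc_id)
  raise DocumentNotFound, "no document #{doc_id}" if doc_id == 'missing'

  "deleted #{doc_id}"
end

def process(doc_id, errors)
  delete_document(doc_id)
rescue DocumentNotFound => e
  errors << e.message # expected failure: record it and continue the batch
end
```

Unexpected errors still surface loudly, while the known "zombie already gone" case is collected for the results report.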

app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
spec/services/bulk_zombie_url_uploader_spec.rb (resolved)
app/controllers/concerns/bulk_upload_handler.rb (resolved)
app/controllers/concerns/bulk_upload_handler.rb (resolved)
BulkZombieUrlUploaderJob.perform_later(
  current_user,
  @file.original_filename,
  @file.tempfile.path
)
Contributor

This approach won't work on staging or production. When the request comes in, the tempfile gets created on the app servers, but the workers will try to perform the upload from the crawl server.

This file needs to be sent to S3, and the job needs to read it from there.
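The S3 hand-off described here could look roughly like the sketch below. The bucket name, key scheme, and job arguments are illustrative assumptions; only the key-building helper is runnable as-is, while the controller/job halves (which need the aws-sdk-s3 gem) are shown as comments.

```ruby
require 'securerandom'

# Pure helper: build a unique, date-sorted S3 key for an uploaded file.
def bulk_upload_s3_key(filename, uuid: SecureRandom.uuid, now: Time.now.utc)
  "bulk_zombie_urls/#{now.strftime('%Y%m%d')}/#{uuid}/#{filename}"
end

# Controller side (assumed flow; requires the aws-sdk-s3 gem, not run here):
#   key = bulk_upload_s3_key(@file.original_filename)
#   Aws::S3::Client.new.put_object(bucket: BUCKET, key: key,
#                                  body: @file.tempfile)
#   BulkZombieUrlUploaderJob.perform_later(current_user, key)
#
# Job side, on whichever server picks it up:
#   body = Aws::S3::Client.new.get_object(bucket: BUCKET, key: key).body.read
```

Because the job receives only the S3 key, it no longer matters which server created the tempfile.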

Contributor Author

Existing similar functionality at https://github.com/GSA/search-gov/blob/main/app/controllers/admin/bulk_url_upload_controller.rb#L33 is working fine in production now.

Contributor

How is the worker picking up the tempfile?

app/jobs/bulk_zombie_url_uploader_job.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch from 0d18682 to c83b0ed on December 17, 2024 04:14
@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch from c83b0ed to 7b4929d on December 17, 2024 14:23
@stevenbarragan (Contributor) left a comment

We should not need to use .send to call methods; please change that. There are a few places where logging can be improved.

app/controllers/concerns/bulk_upload_handler.rb (resolved)
BulkZombieUrlUploaderJob.perform_later(
  current_user,
  @file.original_filename,
  @file.tempfile.path
)
Contributor

How is the worker picking up the tempfile?

app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
let(:row) { { 'URL' => 'https://example.com', 'DOC_ID' => nil } }

it 'logs a missing document ID error' do
uploader.send(:process_row, row)
Contributor

Do not use send to call methods. If the method is private, test it through the parent or make it public.

uploader.process_row(row)

context 'when document ID is present' do
it 'handles URL processing' do
allow(uploader).to receive(:handle_url_processing)
uploader.send(:process_row, row)
Contributor

Is there a need to use send?


describe '#handle_processing_error' do
subject(:handle_processing_error) do
uploader.send(:handle_processing_error, error, url, document_id, row)
Contributor

Do not use send.


describe '#initialize_results' do
it 'initializes the results object' do
uploader.send(:initialize_results)
Contributor

We should not use send. Private methods do not need to be unit tested. The `upload` method here should have tests for what `initialize_results` is doing instead.
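The pattern the reviewer is asking for can be sketched like this: exercise the private setup through the public `upload` method and assert on its observable result. The class below is a simplified assumption, not the PR's actual uploader.

```ruby
# Simplified stand-in for the uploader under review.
class Uploader
  attr_reader :results

  # Public entry point: the only surface the specs need to touch.
  def upload(rows)
    initialize_results
    rows.each { |_row| @results[:total] += 1 }
    @results
  end

  private

  # Private setup detail: covered indirectly by asserting on #upload's
  # output, never invoked via .send in a spec.
  def initialize_results
    @results = { total: 0, errors: [] }
  end
end
```

If `initialize_results` ever breaks, the assertions on `upload`'s return value fail, so the private method stays covered without reaching past the public interface.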

# frozen_string_literal: true

class BulkZombieUrls::FileValidator
MAXIMUM_FILE_SIZE = 4.megabytes
Contributor

Where does this restriction come from? I don't see it as part of the ticket.

Contributor Author

It's not a requirement from the ticket, but I am following the same limit as the other existing upload processes.
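The size check being discussed can be sketched without Rails so the arithmetic behind `4.megabytes` is explicit (4 * 1024 * 1024 bytes). The error class and message wording are assumptions for illustration.

```ruby
module BulkZombieUrls
  # Minimal sketch of a file-size validator; assumed shape, not the PR's code.
  class FileValidator
    MAXIMUM_FILE_SIZE = 4 * 1024 * 1024 # bytes; mirrors Rails' 4.megabytes

    class FileTooBigError < StandardError; end

    # Raise if the upload exceeds the limit; return true otherwise.
    def validate!(size_in_bytes)
      return true if size_in_bytes <= MAXIMUM_FILE_SIZE

      raise FileTooBigError,
            "File too big: #{size_in_bytes} bytes (limit #{MAXIMUM_FILE_SIZE})"
    end
  end
end
```

Raising a dedicated error lets the controller rescue it narrowly and turn it into a user-facing message, consistent with the other bulk upload flows.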
