
SRCH-5154 Bulk delete zombie records from Elastic Search #1743

Open

wants to merge 12 commits into base: main from SRCH-5154/zombie_url_upload
Conversation

@krbhavith (Contributor) commented Nov 26, 2024

Summary

  • Upload a list of zombie URLs through the Super Admin page to delete them from the SearchgovUrl table and from Kibana.
  • Added a BulkUploadHandler concern to be used by all bulk upload controllers.
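The shared concern described above can be sketched roughly as follows. This is plain Ruby rather than a Rails `ActiveSupport::Concern`, so the flow is runnable in isolation; the error class, method names, and messages are illustrative assumptions, not the PR's actual API.

```ruby
# Minimal sketch of a shared bulk-upload concern (assumed shape, plain Ruby).
module BulkUploadHandler
  class MissingFileError < StandardError; end

  # Validate the uploaded file and hand it off to a background job; return a
  # result hash the including controller can turn into a flash message.
  def handle_bulk_upload(file)
    raise MissingFileError, 'Please choose a file to upload.' if file.nil?

    enqueue_job(file)
    { ok: true, message: "Successfully queued #{file[:filename]} for processing." }
  rescue MissingFileError => e
    { ok: false, message: e.message }
  end
end

# Hypothetical controller stand-in, only for demonstration.
class FakeUploadsController
  include BulkUploadHandler

  attr_reader :enqueued

  def enqueue_job(file)
    @enqueued = file
  end
end
```

Any controller that mixes in the concern only has to supply its own `enqueue_job` (here a fake); the validation and error handling live in one place.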

Checklist

Please ensure you have addressed all concerns below before marking a PR "ready for review" or before requesting a re-review. If you cannot complete an item below, replace the checkbox with the ⚠️ :warning: emoji and explain why the step was not completed.

Functionality Checks

  • You have merged the latest changes from the target branch (usually main) into your branch.

  • Your primary commit message is of the format SRCH-#### <description> matching the associated Jira ticket.

  • PR title is either of the format SRCH-#### <description> matching the associated Jira ticket (i.e. "SRCH-123 implement feature X"), or Release - SRCH-####, SRCH-####, SRCH-#### matching the Jira ticket numbers in the release.

  • Automated checks pass. If Code Climate checks do not pass, explain reason for failures:

Process Checks

  • You have specified at least one "Reviewer".

@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch from e1c91c7 to 30e403c on November 26, 2024 17:24
@stevenbarragan (Contributor) left a comment

Please make sure the rspec, cucumber, and codeclimate checks pass.

@krbhavith (Contributor, Author)

> Please make sure the rspec, cucumber, and codeclimate checks pass.

I have one more change to make: delete the record by document_id in I14yDocument when the URL is not present in the searchgov_urls table.

@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch 4 times, most recently from be72efe to 52117bf on December 10, 2024 16:47
@stevenbarragan (Contributor) left a comment

@krbhavith don't forget to fix the test coverage

@stevenbarragan (Contributor) left a comment

Please make sure the file exists where the job will read it, and avoid overusing rescue.
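The point about rescue can be illustrated by narrowing a blanket rescue to the one failure the caller can actually handle. The class and method names below are made up for the example, not taken from the PR.

```ruby
# Sketch: instead of a blanket `rescue => e`, which also swallows bugs like
# NoMethodError, rescue only the expected, recoverable failure.
class DocumentNotFound < StandardError; end

# Hypothetical delete that fails for one specific, expected reason.
def delete_document(doc_id)
  raise DocumentNotFound, "no document #{doc_id}" if doc_id == 'missing'

  "deleted #{doc_id}"
end

def process(doc_id, errors)
  delete_document(doc_id)
rescue DocumentNotFound => e
  errors << e.message # expected failure: record it and continue the batch
end
```

Unexpected errors still surface loudly, while the known "zombie already gone" case is collected for the results report.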

app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
spec/services/bulk_zombie_url_uploader_spec.rb (resolved)
app/controllers/concerns/bulk_upload_handler.rb (resolved)
app/controllers/concerns/bulk_upload_handler.rb (resolved)
BulkZombieUrlUploaderJob.perform_later(
  current_user,
  @file.original_filename,
  @file.tempfile.path
)
Contributor

This approach won't work on staging or production. When the request comes in, the tempfile gets created on the app servers, but the workers will try to perform the upload from the crawl server.

This file needs to be sent to S3, and the job needs to read it from there.
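The S3 hand-off described here could look roughly like the sketch below. The bucket name, key scheme, and job arguments are illustrative assumptions; only the key-building helper is runnable as-is, while the controller/job halves (which need the aws-sdk-s3 gem) are shown as comments.

```ruby
require 'securerandom'

# Pure helper: build a unique, date-sorted S3 key for an uploaded file.
def bulk_upload_s3_key(filename, uuid: SecureRandom.uuid, now: Time.now.utc)
  "bulk_zombie_urls/#{now.strftime('%Y%m%d')}/#{uuid}/#{filename}"
end

# Controller side (assumed flow; requires the aws-sdk-s3 gem, not run here):
#   key = bulk_upload_s3_key(@file.original_filename)
#   Aws::S3::Client.new.put_object(bucket: BUCKET, key: key,
#                                  body: @file.tempfile)
#   BulkZombieUrlUploaderJob.perform_later(current_user, key)
#
# Job side, on whichever server picks it up:
#   body = Aws::S3::Client.new.get_object(bucket: BUCKET, key: key).body.read
```

Because the job receives only the S3 key, it no longer matters which server created the tempfile.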

Contributor Author

Existing similar functionality at https://github.com/GSA/search-gov/blob/main/app/controllers/admin/bulk_url_upload_controller.rb#L33 is working fine in production now.

Contributor

How is the worker picking up the tempfile?

app/jobs/bulk_zombie_url_uploader_job.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch from 0d18682 to c83b0ed on December 17, 2024 04:14
@krbhavith force-pushed the SRCH-5154/zombie_url_upload branch from c83b0ed to 7b4929d on December 17, 2024 14:23
@stevenbarragan (Contributor) left a comment

We should not need to use .send to call methods; please change that. There are a few places where logging can be improved.

app/controllers/concerns/bulk_upload_handler.rb (resolved)
BulkZombieUrlUploaderJob.perform_later(
  current_user,
  @file.original_filename,
  @file.tempfile.path
)
Contributor

How is the worker picking up the tempfile?

app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
app/services/bulk_zombie_url_uploader.rb (resolved)
let(:row) { { 'URL' => 'https://example.com', 'DOC_ID' => nil } }

it 'logs a missing document ID error' do
uploader.send(:process_row, row)
Contributor

Do not use send to call methods. If the method is private, test it through the parent or make it public.

uploader.process_row(row)

context 'when document ID is present' do
it 'handles URL processing' do
allow(uploader).to receive(:handle_url_processing)
uploader.send(:process_row, row)
Contributor

Is there a need to use send?


describe '#handle_processing_error' do
subject(:handle_processing_error) do
uploader.send(:handle_processing_error, error, url, document_id, row)
Contributor

Do not use send.


describe '#initialize_results' do
it 'initializes the results object' do
uploader.send(:initialize_results)
Contributor

We should not use send. Private methods do not need to be unit tested. The `upload` method here should have tests for what `initialize_results` is doing instead.
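The pattern the reviewer is asking for can be sketched like this: exercise the private setup through the public `upload` method and assert on its observable result. The class below is a simplified assumption, not the PR's actual uploader.

```ruby
# Simplified stand-in for the uploader under review.
class Uploader
  attr_reader :results

  # Public entry point: the only surface the specs need to touch.
  def upload(rows)
    initialize_results
    rows.each { |_row| @results[:total] += 1 }
    @results
  end

  private

  # Private setup detail: covered indirectly by asserting on #upload's
  # output, never invoked via .send in a spec.
  def initialize_results
    @results = { total: 0, errors: [] }
  end
end
```

If `initialize_results` ever breaks, the assertions on `upload`'s return value fail, so the private method stays covered without reaching past the public interface.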

# frozen_string_literal: true

class BulkZombieUrls::FileValidator
MAXIMUM_FILE_SIZE = 4.megabytes
Contributor

Where does this restriction come from? I don't see it as part of the ticket.

Contributor Author

It's not a requirement from the ticket, but I am following the same limit as the other existing upload processes.
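The size check being discussed can be sketched without Rails so the arithmetic behind `4.megabytes` is explicit (4 * 1024 * 1024 bytes). The error class and message wording are assumptions for illustration.

```ruby
module BulkZombieUrls
  # Minimal sketch of a file-size validator; assumed shape, not the PR's code.
  class FileValidator
    MAXIMUM_FILE_SIZE = 4 * 1024 * 1024 # bytes; mirrors Rails' 4.megabytes

    class FileTooBigError < StandardError; end

    # Raise if the upload exceeds the limit; return true otherwise.
    def validate!(size_in_bytes)
      return true if size_in_bytes <= MAXIMUM_FILE_SIZE

      raise FileTooBigError,
            "File too big: #{size_in_bytes} bytes (limit #{MAXIMUM_FILE_SIZE})"
    end
  end
end
```

Raising a dedicated error lets the controller rescue it narrowly and turn it into a user-facing message, consistent with the other bulk upload flows.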
