# WIP - Reject duplicate submissions #876

## Conversation
```diff
@@ -52,7 +52,7 @@ jobs:
       - name: Install Python dependencies
         run: pip-sync dependencies/pip/dev_requirements.txt
       - name: Run pytest
-        run: pytest -vv -rf
+        run: pytest -vv -rf --disable-warnings
```
🤔
```diff
@@ -32,6 +32,7 @@ test:
     POSTGRES_PASSWORD: kobo
     POSTGRES_DB: kobocat_test
     SERVICE_ACCOUNT_BACKEND_URL: redis://redis_cache:6379/4
+    GIT_LAB: "True"
```
Maybe something more descriptive, like `SKIP_TESTS_WITH_CONCURRENCY`?
```diff
@@ -40,7 +41,7 @@ test:
   script:
     - apt-get update && apt-get install -y ghostscript gdal-bin libproj-dev gettext openjdk-11-jre
     - pip install -r dependencies/pip/dev_requirements.txt
-    - pytest -vv -rf
+    - pytest -vv -rf --disable-warnings
```
🤔
```python
    fakeredis.FakeStrictRedis(),
):
    with patch(
        'onadata.apps.django_digest_backends.cache.RedisCacheNonceStorage._get_cache',
        fakeredis.FakeStrictRedis,
```
Why is one instantiated and the other isn't?
```python
        results[result] += 1

    assert results[status.HTTP_201_CREATED] == 1
    assert results[status.HTTP_409_CONFLICT] == DUPLICATE_SUBMISSIONS_COUNT - 1
```
Does the OpenRosa spec allow returning a 409? And do Enketo and Collect handle a 409 properly? I can't find the code, but I remember wanting to return a 40x that wasn't 400, and being forced to use 400 only because without it Collect wouldn't display the error message I was sending. It could've been Enketo, though, or I might be misremembering entirely.
```python
    # The start-time requirement protected submissions with identical responses
    # from being rejected as duplicates *before* KoBoCAT had the concept of
    # submission UUIDs. Nowadays, OpenRosa requires clients to send a UUID (in
    # `<instanceID>`) within every submission; if the incoming XML has a UUID
    # and still exactly matches an existing submission, it's certainly a
    # duplicate (https://docs.opendatakit.org/openrosa-metadata/#fields).
```
We should reject outright any submission without a UUID
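To illustrate the "reject outright any submission without a UUID" idea, here is a minimal sketch. The helper names (`extract_instance_id`, `validate_submission`) are hypothetical and not KoBoCAT's real parsing code; it only assumes the OpenRosa metadata layout, where the UUID lives in `<meta><instanceID>`:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_instance_id(xml_bytes: bytes) -> Optional[str]:
    """Return the text of <meta><instanceID>, or None if absent.

    Namespaces vary between form builders, so we match on the local
    tag name only.
    """
    root = ET.fromstring(xml_bytes)
    for elem in root.iter():
        if elem.tag.split('}')[-1] == 'instanceID':
            return (elem.text or '').strip() or None
    return None


def validate_submission(xml_bytes: bytes) -> str:
    """Reject submissions that carry no <instanceID> at all."""
    instance_id = extract_instance_id(xml_bytes)
    if not instance_id:
        raise ValueError('Submission rejected: missing <instanceID> UUID')
    return instance_id
```

A submission with `<meta><instanceID>uuid:1234</instanceID></meta>` would pass; one with no `<meta>` block would raise.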
```python
    if existing_instance:
        existing_instance.check_active(force=False)
        # ensure we have saved the extra attachments
        new_attachments = save_attachments(existing_instance, media_files)
        if not new_attachments:
            raise DuplicateInstanceError()
```
Wouldn't `ConflictingXMLHashInstanceError` be raised already and block us from getting here? Nevermind; I misunderstood, and that exception only has to do with concurrent processing of submissions with identical XML.
```python
    if is_postgresql:
        cur = connection.cursor()
        cur.execute('SELECT pg_try_advisory_lock(%s::bigint);', (int_lock,))
        acquired = cur.fetchone()[0]
    else:
        prefix = os.getenv('KOBOCAT_REDIS_LOCK_PREFIX', 'kc-lock')
        key_ = f'{prefix}:{int_lock}'
        redis_lock = settings.REDIS_LOCK_CLIENT.lock(key_, timeout=60)
        acquired = redis_lock.acquire(blocking=False)
    yield acquired
```
I don't think support for something other than PostgreSQL is needed. We already depend on Postgres for many things.

Moot point, then, but just to say it: I think all `os.getenv()` calls for configuration (and their default values) should be in the settings files.
```python
    except DuplicateInstance:
        response = OpenRosaResponse(t("Duplicate submission"))
    except ConflictingXMLHashInstanceError:
        response = OpenRosaResponse(t('Conflict with already existing instance'))
        response.status_code = 409
        response['Location'] = request.build_absolute_uri(request.path)
        error = response
    except DuplicateInstanceError:
        response = OpenRosaResponse(t('Duplicate instance'))
```
A conflicting XML hash with no additional attachments is the same thing as a duplicate instance. If the XML is the same but there are additional attachments, no exception should be raised; this is normal operation. I don't think a new exception class is needed. I also don't think that `DuplicateInstanceError` can be reached anymore [this was based on a misunderstanding], but that's addressed by a different comment.
The thing that confused me about this is that the verbiage is wrong. It's not really a conflict with an existing instance; it's unwanted concurrent processing of two (or more) instances with identical XML.
I think ideally we should wait (a short amount of time) to try to acquire the lock before returning an error. I don't know what happens in Enketo now, but it's easy to imagine that a client would asynchronously send the following requests concurrently:

- Submission XML (hash abc123) + dog.jpg
- Submission XML (hash abc123) + cat.jpg
- Submission XML (hash abc123) + gecko.jpg

Ideally, all three requests would succeed. Let's assume request 1 arrives first and is still processing while requests 2 and 3 are received by the server: what I understand this PR would do is reject requests 2 and 3 immediately with an error code. I think we should wait (again, briefly) for request 1 to finish¹ before returning an immediate rejection.

If the waiting takes too long, then we do have to reject in order to avoid sapping the worker pool with useless spinning. The message we return should effectively be "try again later", not something about a conflict with an existing instance. The HTTP code has to depend on the OpenRosa specification and be compatible with Enketo and ODK Collect. Hopefully those requirements are one and the same, but we'll have to test.
Footnotes

¹ Addendum: Whoops, we don't really need to spin until request 1 finishes; we just need to wait until the row has been created in `logger_instance`. Imagine one `POST` has 50 files: locking while all 50 are written to storage is not what we want. We should still avoid storing the same attachment multiple times for a submission, so within a single submission we should have some kind of attachment uniqueness constraint (if we don't already).
> Let's assume request 1 arrives first and is still processing while requests 2 and 3 are received by the server: what I understand this PR would do is reject requests 2 and 3 immediately with an error code.

So my interpretation of the spec would be that we only need to (immediately!) reject if the same submission XML is subsequently received [i.e. a duplicate hash] and it includes no attachments (!), because a client should never attempt to resend just the submission XML on its own more than once.

Whereas if the same submission XML hash is received but it includes an attachment (or several!), then we can decide whether or not to reject based on whether any of those attachments have already been received (or are currently being processed) [and just throw away the XML, since we know it has already been processed, or is currently being processed].

So any locking would minimally only need to cover the duration of calculating the submission XML hash, then checking for an existing one, then storing it if absent, right?
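The minimal critical section described above (hash, check, store) can be sketched as a check-and-store step that would run under the lock, with attachment writes happening afterwards, outside it. Everything here is hypothetical naming, and the in-memory `set` stands in for the real database lookup; the hash algorithm is also illustrative, not necessarily what KoBoCAT stores:

```python
import hashlib


def xml_hash(xml_bytes: bytes) -> str:
    """Digest of the raw submission XML (algorithm is illustrative)."""
    return hashlib.sha256(xml_bytes).hexdigest()


def register_submission_hash(seen_hashes: set, xml_bytes: bytes) -> bool:
    """Check-and-store the XML hash; True if new, False if already seen.

    In the scheme above, only this function would need to hold the lock;
    storing attachments for an already-registered submission would not.
    """
    digest = xml_hash(xml_bytes)
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True
```

Keeping the lock to this narrow window avoids holding it while, say, 50 attachments are written to storage.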
```diff
@@ -1,6 +1,7 @@
 # coding: utf-8
 import os

+from fakeredis import FakeConnection, FakeStrictRedis, FakeServer
```
Ah, was the support for locking without PostgreSQL exclusively for unit testing?
Unit testing, at least on GitLab, already uses PostgreSQL, so I don't think we need to support any non-Postgres locking mechanism for this.
Closed in favor of kobotoolbox/kpi#5047
## Summary

Implemented logic to detect and reject duplicate submissions.

## Description

We have identified a race condition in submission processing that causes duplicate submissions with identical UUIDs and XML hashes. This issue is particularly problematic when multiple remote devices submit forms simultaneously over unreliable networks. To address it, a PR has been raised with the following proposed changes:

- **Race condition resolution:** A locking mechanism has been added to prevent the race condition when checking for existing instances and creating new ones. This aims to eliminate duplicate submissions.
- **UUID enforcement:** Submissions without a UUID are now explicitly disallowed. This ensures that every submission is uniquely identifiable and further mitigates the risk of duplicate entries.
- **Introduction of `root_uuid`:**
  - To ensure a consistent submission UUID throughout its lifecycle and prevent duplicate submissions with the same UUID, a new `root_uuid` column has been added to the `Instance` model with a unique constraint (`root_uuid` per `xform`).
  - If `<meta><rootUuid>` is present in the submission XML, it is stored in the `root_uuid` column.
  - If `<meta><rootUuid>` is not present, the value from `<meta><instanceID>` is used instead.
  - This approach guarantees that `root_uuid` remains constant across the lifecycle of a submission, providing a reliable identifier for all instances.
- **UUID handling improvement:** Updated the logic to strip only the `uuid:` prefix while preserving custom, non-UUID ID schemes (e.g., `domain.com:1234`). This ensures compliance with the OpenRosa spec and prevents potential ID collisions with custom prefixes.
- **Error handling:**
  - 202 Accepted: returned when the content is identical to an existing submission and was successfully processed.
  - 409 Conflict: returned when a duplicate UUID is detected but the content differs.

These changes should improve the robustness of the submission process and prevent both race conditions and invalid submissions.

## Notes

- Implemented a fix to address the race condition that leads to duplicate submissions with the same UUID and XML hash.
- Incorporated improvements from existing work, ensuring consistency and robustness in handling concurrent submissions.
- The fix aims to prevent duplicate submissions, even under high load and unreliable network conditions.

## Related issues

Supersedes kobotoolbox/kobocat#876 and kobotoolbox/kobocat#859

Co-authored-by: Olivier Leger <[email protected]>
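The `root_uuid` fallback and the "strip only the `uuid:` prefix" behavior described in the summary can be sketched as below. The helper names are hypothetical, not the kpi PR's actual function names:

```python
def strip_uuid_prefix(value: str) -> str:
    """Strip only the literal 'uuid:' prefix, preserving custom ID
    schemes such as 'domain.com:1234' (per the summary's UUID-handling
    note)."""
    prefix = 'uuid:'
    return value[len(prefix):] if value.startswith(prefix) else value


def resolve_root_uuid(root_uuid, instance_id):
    """Mirror the described fallback: use <meta><rootUuid> when present,
    otherwise fall back to <meta><instanceID>."""
    return strip_uuid_prefix(root_uuid or instance_id)
```

With a unique `(root_uuid, xform)` constraint, this derived value stays constant across edits of a submission, so resubmissions with the same `root_uuid` but different content can be detected and answered with a 409.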
## Description

TBC

## Additional info

Would supersede #859