
Submission duplication: add advisory lock on xml_hash #859

Open — wants to merge 1 commit into main
Conversation

RobertoMaurizzi
The problem

During a project we're managing for a national government, we noticed several duplicated submissions with the same uuid and xml_hash, as described in #470.

After some investigation, we found a race condition caused by transaction isolation between different sessions: in https://github.com/kobotoolbox/kobocat/blob/main/onadata/libs/utils/logger_tools.py#L177 there is a check for other submissions with the same xml_hash from the same user. However, this check cannot detect other submissions being created concurrently, because each request runs in its own database transaction (ATOMIC_REQUESTS is set to True).

We therefore need another mechanism to detect whether other documents with the same xml_hash are being submitted but not yet completely written to the database.

Reproducing duplication reliably

The main problem with testing a fix for this is that the duplication usually happens only when many remote devices submit forms with attachments over unreliable networks.
Since we suspected a race condition, we wrote an async script that submits the same form and its attachments 5 times from "parallel" async tasks (in hindsight, using requests with multiprocessing would have been a better use of my time, but hey, async is fashionable... 😅).
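The race itself can be illustrated without any database at all. The following self-contained sketch (not the actual repro script, which is in the zip below) shows how a non-atomic "check then insert" around an await point lets every concurrent task pass the duplicate check before any of them writes:

```python
import asyncio

accepted = []  # stands in for the Instance table

async def submit(xml_hash: str):
    # 1. duplicate check (like the xml_hash lookup in logger_tools.py)
    is_duplicate = xml_hash in accepted
    # 2. simulate I/O (attachment upload, DB round-trip) before the write
    await asyncio.sleep(0.01)
    # 3. write only if the earlier check found nothing
    if not is_duplicate:
        accepted.append(xml_hash)

async def main():
    # five "parallel" submissions of the same form, as in the repro script
    await asyncio.gather(*(submit("deadbeef") for _ in range(5)))
    return accepted

duplicates = asyncio.run(main())
print(len(duplicates))  # 5: every task passed the check before any wrote
```

All five tasks run their duplicate check before the first one reaches the write, so all five insert, which is exactly the behaviour seen with concurrent requests in separate transactions.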

Possible solution

A PostgreSQL advisory lock is perfect for the task: it works across transaction and session boundaries and ensures atomic access keyed on an integer ID. So we took the xml_hash, converted it to an integer, then kept its 60 least significant bits to obtain a (hopefully still) unique ID for the lock.
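The key derivation is a one-liner; here is a sketch (the helper name is hypothetical, not the PR's actual function). PostgreSQL advisory locks take a signed 64-bit key, so keeping 60 bits of the hash stays safely in range:

```python
def xml_hash_to_lock_key(xml_hash: str) -> int:
    """Map a hex xml_hash (e.g. a SHA hexdigest) to a 60-bit lock key."""
    return int(xml_hash, 16) & ((1 << 60) - 1)

key = xml_hash_to_lock_key("ff")
print(key)  # 255
```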

The strategy is:

  1. open a sub-transaction to ensure the lock isn't lost (we're using a transaction-bound lock)
  2. try to acquire a lock on the (partial) xml_hash of the current submission
  3. if the lock is already taken, some other thread is already submitting an entry with the same xml_hash (and likely receiving its attachments), so raise a DuplicateInstance exception
  4. otherwise, execution continues with the same code as before, including the check for an existing document in the database, which is still needed because this might be a continuation of a previous submission that was interrupted by network issues.
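The steps above can be sketched as follows. This is illustrative, not the PR's actual code: the stub cursor only records the SQL a real cursor would run, and in a real implementation the query would go through Django's connection inside a `transaction.atomic()` block:

```python
class DuplicateInstance(Exception):
    pass

class StubCursor:
    """Pretends to be a DB cursor; returns `locked` for the lock query."""
    def __init__(self, locked: bool):
        self.locked = locked
        self.executed = []
    def execute(self, sql, params=None):
        self.executed.append(sql)
    def fetchone(self):
        return (self.locked,)

def check_and_lock(cursor, xml_hash: str) -> None:
    lock_key = int(xml_hash, 16) & ((1 << 60) - 1)
    # pg_try_advisory_xact_lock returns immediately: True if we got the
    # lock, False if another transaction already holds it. Because it is
    # transaction-scoped, Postgres releases it automatically on commit or
    # rollback, so the lock cannot be "lost" if the request errors out.
    cursor.execute("SELECT pg_try_advisory_xact_lock(%s)", [lock_key])
    if not cursor.fetchone()[0]:
        # another request is submitting the same xml_hash right now
        raise DuplicateInstance("Duplicate submission in progress")
    # ...otherwise fall through to the existing duplicate check / insert

check_and_lock(StubCursor(locked=True), "ff")  # lock acquired, no error
try:
    check_and_lock(StubCursor(locked=False), "ff")
except DuplicateInstance:
    print("duplicate rejected")
```

Using the `_xact_` variant (rather than `pg_advisory_lock`/`pg_advisory_unlock`) is what makes step 1 safe: there is no explicit unlock to forget.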

Testing the problem and the fix

If you extract the files from this duplication_repro.zip file, you'll find:

  1. a sample_form.xml that you can import in your kf instance (a local development instance works fine)
  2. 2 sample attachment files under testfiles/
  3. the multiple_submission_test.py script

First of all, import the test form into your kf application, taking note of its ID (something like the a7G7MvQMmzARfb2irNsyn3 already in the script).

Create and activate a virtual environment if you're not already in one, run pip install aiohttp aiofiles dict2xml, then edit the script: in the first few lines after the imports, change the following variables to match your test instance:

# Replace the data in the following variables with the ones for your test environment
server_domain = "kobo.local"
form_id = "a7G7MvQMmzARfb2irNsyn3"
kc_token = "6e223fa57f8ec6a122e76bd7b1bd6c07afb74e1a"  # from http://kf.kobo.local/token
kc_submission_url = f"http://kc.{server_domain}/api/v1/submissions.xml"

Run the script from the directory in which you extracted it: on an unpatched system, most of the 5 submissions (often all of them) will be accepted and you'll end up with 5 new submissions with exactly the same content.

If you switch to a branch that includes this PR, you should get 1 accepted submission and 4 Duplicate submission responses.

Remaining issues and questions

Most important: I'm not sure "DuplicateSubmission" is the best exception to raise; maybe telling the client to retry (how?) would be a safer approach.
In my opinion, though, the big question is why it's so common to receive multiple submissions at the same time:

  • Submission retry code in Android app too "aggressive"?
  • Sessions that get stuck open? Shouldn't they time out after a minute or so?

This was an obvious race condition, but there could be more. It took us a long time to write a reproduction script that works reliably both locally and remotely, so we were unable to deploy the fix to our production system in time (our data collection effort was practically finished by the time we were done).

P.S.
Many thanks to whoever wrote the clean_duplicated_submissions admin command (probably @noliveleger)

@RobertoMaurizzi
Author

RobertoMaurizzi commented Jan 23, 2023

This has been tested on what we have in production, version 2.022.24a
The PR is rebased against main, but it hasn't had much testing against that branch yet.

@jnm jnm self-assigned this Jan 23, 2023
@jnm jnm self-requested a review January 23, 2023 22:49
rajpatel24 added a commit to kobotoolbox/kpi that referenced this pull request Jan 15, 2025
## Summary
Implemented logic to detect and reject duplicate submissions.

## Description

We have identified a race condition in the submission processing that
causes duplicate submissions with identical UUIDs and XML hashes. This
issue is particularly problematic under conditions with multiple remote
devices submitting forms simultaneously over unreliable networks.

To address this issue, a PR has been raised with the following proposed
changes:

- Race Condition Resolution: A locking mechanism has been added to
prevent the race condition when checking for existing instances and
creating new ones. This aims to eliminate duplicate submissions.

- UUID Enforcement: Submissions without a UUID are now explicitly
disallowed. This ensures that every submission is uniquely identifiable
and further mitigates the risk of duplicate entries.

- Introduction of `root_uuid`:

- To ensure a consistent submission UUID throughout its lifecycle and
prevent duplicate submissions with the same UUID, a new `root_uuid`
column has been added to the `Instance` model with a unique constraint
(`root_uuid` per `xform`).

- If the `<meta><rootUuid>` is present in the submission XML, it is
stored in the `root_uuid` column.

- If `<meta><rootUuid>` is not present, the value from
`<meta><instanceID>` is used instead.

- This approach guarantees that the `root_uuid` remains constant across
the lifecycle of a submission, providing a reliable identifier for all
instances.

- UUID Handling Improvement: Updated the logic to strip only the `uuid:`
prefix while preserving custom, non-UUID ID schemes (e.g.,
domain.com:1234). This ensures compliance with the OpenRosa spec and
prevents potential ID collisions with custom prefixes.

- Error Handling:
- 202 Accepted: Returns when content is identical to an existing
submission and successfully processed.
- 409 Conflict: Returns when a duplicate UUID is detected but with
differing content.

These changes should improve the robustness of the submission process
and prevent both race conditions and invalid submissions.
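The `root_uuid` fallback described above can be sketched with the stdlib `xml.etree.ElementTree`; the element paths and the helper name are assumptions for illustration, not the actual kpi implementation:

```python
import xml.etree.ElementTree as ET

def extract_root_uuid(submission_xml: str) -> str:
    """Prefer <meta><rootUuid>, fall back to <meta><instanceID>."""
    meta = ET.fromstring(submission_xml).find("meta")
    node = meta.find("rootUuid")
    if node is None or not (node.text or "").strip():
        node = meta.find("instanceID")
    value = node.text.strip()
    # strip only the OpenRosa "uuid:" prefix; keep custom schemes
    # like "domain.com:1234" intact
    return value[5:] if value.startswith("uuid:") else value

xml_with_root = (
    "<data><meta><instanceID>uuid:abc</instanceID>"
    "<rootUuid>uuid:def</rootUuid></meta></data>"
)
xml_without = "<data><meta><instanceID>domain.com:1234</instanceID></meta></data>"
print(extract_root_uuid(xml_with_root))  # def
print(extract_root_uuid(xml_without))    # domain.com:1234
```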

## Notes

- Implemented a fix to address the race condition that leads to
duplicate submissions with the same UUID and XML hash.
- Incorporated improvements from existing work, ensuring consistency and
robustness in handling concurrent submissions.
- The fix aims to prevent duplicate submissions, even under high load
and unreliable network conditions.

## Related issues

Supersedes kobotoolbox/kobocat#876 and kobotoolbox/kobocat#859

---------

Co-authored-by: Olivier Leger <[email protected]>