Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional deadlocks on DB #1675

Open
dukhaSlayer opened this issue Jan 14, 2025 · 1 comment
Open

Occasional deadlocks on DB #1675

dukhaSlayer opened this issue Jan 14, 2025 · 1 comment

Comments

@dukhaSlayer
Copy link
Contributor

dukhaSlayer commented Jan 14, 2025

Sooo...
After submitting this PR #1674 I looked at the code and it seems there are more issues of this nature.

The problem is that sometimes code that holds an open connection to the DB calls another component that tries to open another connection.

As an example:
AccountRecordProcessor is constructed with a Connection, yet in a call to StorageSyncModels.localToRemoteRecord it passes account.getConfigurationStore(), which will try to create a new Connection with keyValueStore.

This particular example might not be critical but there are more issues like that and under load they produce errors while working with storage now and then.

For us, it leads to complete session degradation within a week with the client being completely unable to decrypt incoming messages from known contacts.

I modifier Database.getConnection and found some more deadlocks (basically, SQLBUSY exceptions on store modifications):

    private static final Map<String, String> OpenConnections = new LinkedHashMap<>();

    public final Connection getConnection() throws SQLException {
        var id = UUID.randomUUID().toString();
        var stack = Arrays.stream(Thread.currentThread().getStackTrace()).map(Objects::toString).collect(Collectors.joining(", "));
        logger.trace("getConnection: {} on {}", id, stack);
        OpenConnections.put(id, stack);
        final Connection connection;
        try {
            connection = dataSource.getConnection();
        } catch (SQLException e) {
            logger.error("getConnection() failed for {}, possible deadlock. Open connections (the last one is failed):\n {}",
                id,
                OpenConnections.entrySet().stream().map(kv -> kv.getKey() + ": " + kv.getValue()).collect(Collectors.joining("; \n")),
                e);
            throw e;
        }
        return wrap(logger, id, connection);
    }

    public static Connection wrap(Logger logger, String id, Connection originalConnection) {
        return (Connection) Proxy.newProxyInstance(
            originalConnection.getClass().getClassLoader(),
            new Class[]{Connection.class},
            (proxy, method, args) -> {
                if ("close".equals(method.getName())) {
                    logger.trace("Connection.close() for {}", id);
                    OpenConnections.remove(id);
                }
                return method.invoke(originalConnection, args);
            }
        );
    }

There are many ways to address the issue, e.g.

  • "open" a connection for each operation never holding/passing it around
  • using interfaces/inheritance/wrappers to hide the fact that we're using an existing connection from other components (can be done in many ways). I have done some exercises with the code using this approach and so far I don't see error logs. Will do more testing today/tomorrow. Can submit a patch from what I did so far if you are interested.
  • the best way I can come up with is using an "opaque" db/store context which can provide a connection. The top-level context is constructed in jobs or on message arrival and we don't store it to account but rather pass it around to all methods that require access to DB (to avoid cluttering the code we can pass some other useful things in that context, like loggers, account itself etc). Then when we need an ad-hoc connection, there's a context.withConnection(c -> { ...}) method and if we want to do multiple operations we try (var ctx = ctx.connect()) {} which produces a context of the same type but with an already opened connection. This will allow to have functions, components, stores and helpers that can access DB but are unaware in which context they are called (with or without opened connection). So the code will be really simple and safe.
@AsamK
Copy link
Owner

AsamK commented Jan 14, 2025

The signal-cli database implementation started the first point, i.e. get a new connection for each operation. Then when implementing the storage sync it was extended to use existing connections so reading from storage could be done in a single transaction to prevent partial updates in case of failures.
The current model with some methods taking a connection and some creating a new one is a result of that. And that has indeed been a bit error prone ...

I'd be interested to see your changes.
Adding an additional context would probably be a bigger change, though maybe it could also be implemented with ThreadLocals or ScopedValues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants