Deadlock committing to the object log when a request that performs a call:answer: hits a commit conflict #71

Closed
jbrichau opened this issue Apr 13, 2015 · 15 comments

@jbrichau (Member)

GRGemStonePlatform>>transactionMutex is a TransientRecursionLock that guards the processing of a Seaside request. A TransientRecursionLock keeps a reference to the process that currently holds its critical section, so that recursive waits on the mutex by that same process are allowed.

The Seaside call/answer implementation in GemStone manipulates processes and can end up switching the active process. When that happens, the process that leaves the critical section is no longer the one that entered it: its active process identifier has changed.

When a request hits a commit conflict, an entry describing the conflict is committed to the object log. That commit is guarded by the same transactionMutex: the process that already holds the lock may re-enter, but any other process is blocked.

As a result, when a request that performed a call/answer hits a commit conflict, the commit to the object log blocks and the gem deadlocks. See the discussion at http://forum.world.st/Glass-Recursion-deadlock-with-TransientRecursionLock-td4819261.html
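
To make the failure mode concrete, here is a minimal sketch of the kind of check that goes wrong (this is not the actual TransientRecursionLock code; #waitForLock and #releaseLock are hypothetical helpers): recursive entry is only granted when the asking process is identical to the one recorded at lock time, so a process switch inside the critical section defeats it.

critical: aBlock
  "Sketch only: grant recursive entry when the active process is the recorded owner."
  Processor activeProcess == lockingProcess
    ifTrue: [ ^ aBlock value ].
  self waitForLock. "hypothetical helper: any other process blocks here -- the deadlock described above"
  lockingProcess := Processor activeProcess.
  ^ [ aBlock value ] ensure: [
      lockingProcess := nil.
      self releaseLock "hypothetical helper" ]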

I reproduce this by randomly returning false from GRPlatform>>doTransaction: and using an application with a WATask that uses call:answer:. That makes the issue easy to trigger. In production, we randomly hit the issue every xx weeks per Seaside gem (as mentioned in the mail thread).
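
For illustration, the reproduction override might look roughly like this sketch (#reallyDoTransaction: and #shouldSimulateConflict are hypothetical stand-ins for the original method body and for any coin flip; they are not real selectors):

doTransaction: aBlock
  "TESTING ONLY: perform the real transaction, then sometimes report failure so that
   the commit-conflict path (which commits to the object log under the same
   transactionMutex) is exercised by requests that go through call:answer:."
  (self reallyDoTransaction: aBlock) ifFalse: [ ^ false ].
  ^ self shouldSimulateConflict not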

@jbrichau (Member Author)

A possible fix (proposed in the mail thread) is to manually set the lockingProcess of the transactionMutex when returning from Seaside request processing. I think this is safe to do because we are still inside the mutex's critical section, so no other process can have slipped in.
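
A minimal sketch of that idea (selector names such as #lockingProcess: and #seasideProcessRequest: are assumptions for illustration, not the committed change):

seasideProcessRequestWithRetry: aNativeRequest resultBlock: resultBlock
  "Sketch only: after request processing returns -- possibly in a different process than
   the one that entered the mutex -- re-point the lock at the now-active process, so the
   later object-log commit under the same transactionMutex is still treated as a
   recursive entry instead of blocking."
  ^ self transactionMutex critical: [
      [ resultBlock value: (self seasideProcessRequest: aNativeRequest) ]
        ensure: [ self transactionMutex lockingProcess: Processor activeProcess ] ]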

Another fix might be to stop using doTransaction: for object log commits (essentially no longer taking the transactionMutex for those commits). Of course, that would change behaviour for applications that create object log entries themselves from within the app (I believe we do this as well), but those would possibly hit the same blocking issue anyway.

So, this thread is here to discuss...

@dalehenrich (Member)

Good point about not using doTransaction: in selected spots ... of course the recursion lock is supposed to work, but the recursion lock isn't necessary when we "know" that we are already in a transaction, so an alternate implementation of doTransaction: might be:

doTransaction: aBlock
  "Evaluate aBlock in a transaction.
   Return true if the transaction succeeds and false if the transaction fails.
   Nested calls are allowed and will always return true.
   See System class>>transactionConflicts for dealing with failed transactions."

  "Ensure that each block evaluation is mutually exclusive: https://code.google.com/p/glassdb/issues/detail?id=355"

  | commitResult |
  System inTransaction
    ifTrue: [ 
      "We are already in a transaction, so just evaluate the block."
      aBlock value.
      ^ true ]
    ifFalse: [ 
      "Take the transactionMutex and perform the transaction."
      self transactionMutex
        critical: [ 
          [ 
          self doBeginTransaction.
          aBlock value ]
            ensure: [ 
              "Workaround for Bug 42963: ensure: block executed twice (don't return from the ensure: block)."
              commitResult := self doCommitTransaction ] ].
      ^ commitResult ]
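
With that variant, a nested call would no longer touch the mutex at all. For example (assuming GRPlatform current answers the GemStone platform object, as in Grease, and anEntry stands for any WAObjectLogEntry):

GRPlatform current doTransaction: [
  "System inTransaction is true here, so this inner call just evaluates its block and answers true"
  GRPlatform current doTransaction: [ GRPlatform current saveLogEntry: anEntry ] ]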

@dalehenrich (Member)

Ahhh, but in GRGemStonePlatform>>seasideProcessRequestWithRetry:resultBlock: we are not in a transaction, so while my suggestion is not a bad thing to do, it wouldn't help the current situation ... and @jbrichau's suggestion might be the best fix available ...

@jbrichau jbrichau added this to the 3.1.4.1 milestone Apr 14, 2015
@jbrichau jbrichau self-assigned this Apr 14, 2015
@dalehenrich dalehenrich modified the milestones: 3.1.4.1, 3.1.4.2 Jun 5, 2015
jbrichau pushed a commit that referenced this issue Jun 6, 2015
@jbrichau (Member Author) commented Jun 6, 2015

The good news here is that since we are running this fix in production, our Seaside gem failures have been reduced to almost zero. This is an important improvement for stability.

@marianopeck (Contributor)

@jbrichau uhhh it would have been cool to include this in 3.1.4.1 :(

@jbrichau (Member Author) commented Jun 6, 2015

Version numbers are cheap :)
I wonder why the build fails though.

@dalehenrich (Member)

which build is failing ... I've seen lots o green recently .... perhaps we should push out 3.1.4.2 shortly? I've got nothing pending ...

@marianopeck (Contributor)

@dalehenrich I would love to have this fix as part of a 3.1.4.2!

@jbrichau (Member Author) commented Jun 9, 2015

There was a failed 3.2.6 build, but I refreshed it and then it was green. Random failures seem to happen sometimes.

@jbrichau (Member Author) commented Jun 9, 2015

Mind that issue #72 is now exposed and needs to be fixed too. These were the two issues queued up for milestone 3.1.4.2

@dalehenrich (Member)

@marianopeck, picking up the bugfixes that you want is a major argument for using a local clone of the GitHub repository ... then you don't have to wait for Johan or me (or you) to fix and tag bugs :)

dalehenrich added a commit that referenced this issue Jun 19, 2015
@marianopeck (Contributor)

Hi @jbrichau @dalehenrich
I found out that the implementation of #seasideProcessRequestWithRetry:resultBlock: from this commit: https://github.com/GsDevKit/Seaside31/blob/793303f0252d7a268fb719e9eb1d5ae31453b206/repository/Seaside-GemStone-Core.package/GRGemStonePlatform.extension/instance/seasideProcessRequestWithRetry.resultBlock..st

does something that looks wrong to me:

self
  saveLogEntry:
    (WAObjectLogEntry
      warn: 'Commit failure - retrying'
      request: aNativeRequest url
      object: conflicts)

Note that you are passing "aNativeRequest url" as the #request: argument. That then fails when you try to use WAObjectLog, which sends #requestString to the WAObjectLogEntry; #requestString looks like this:

requestString
  self request == nil ifTrue: [ ^ super requestString ].
  ^ self request url asString

So... I recommend passing "aNativeRequest" rather than "aNativeRequest url".

Thoughts?

@jbrichau (Member Author)

This is because of issue #64, but you are right that it leads to an error in the WAObjectLog app.

@marianopeck (Contributor)

Hi @jbrichau
But the error would occur for any user of #requestString, since the 'request' instVar is assigned a string rather than a request object. And yes, WAObjectLog is one of those users. Maybe we can fix #requestString to do an ... isString kind of test? ....
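
A minimal sketch of that isString guard (not necessarily the fix that was ultimately adopted):

requestString
  self request == nil ifTrue: [ ^ super requestString ].
  self request isString ifTrue: [ ^ self request ].
  ^ self request url asString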

@dalehenrich (Member)

Because of Issue #64, it's not a good idea to persist a #nativeRequest. I'm inclined to add an extra instance variable called requestString and change the calls to pass an url instead of the nativeRequest (it wasn't a WARequest either :) ... then the request field can be used by folks who need that information and are willing to put up with possibly persisting a socket when using FastCGI ... that should take care of things ... a bit more
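
A rough sketch of the direction described above, assuming a new requestString instance variable on WAObjectLogEntry (the actual change may differ):

requestString
  "Prefer the persisted url string; fall back to the old behaviour for entries that
   still carry a request object."
  requestString == nil ifFalse: [ ^ requestString ].
  self request == nil ifTrue: [ ^ super requestString ].
  ^ self request url asString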
