Deadlock committing to the object log when a request that performs a call:answer: hits a commit conflict #71
Comments
A possible fix (proposed in the mail thread) is to manually set the lockingProcess in the transactionMutex when returning from the Seaside request processing. I think this is safe to do because we are still in the mutex's critical section, so no other process can have entered it. Another fix might be not to use doTransaction: for performing object log commits (essentially not using the transactionMutex anymore for those commits). Of course, this might change behavior where applications manually create the object log entries from within the app (I believe we do this as well), but those would possibly hit the same blocking issue. So, this thread is here to discuss... |
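The first proposed fix can be modelled outside GemStone. The following is only a sketch in Python, not the GLASS implementation: `RecursionLockModel`, `enter`, and the process names are hypothetical stand-ins, and only the `locking_process` attribute mirrors the lockingProcess mentioned above. It shows that handing ownership to the now-active process, while still inside the critical section, lets the later object log commit re-enter instead of blocking.

```python
import threading


class RecursionLockModel:
    """Hypothetical stand-in for a recursion lock that records its locking process."""

    def __init__(self):
        self._lock = threading.Lock()
        self.locking_process = None  # label of the process that owns the lock
        self._depth = 0

    def enter(self, process_id, timeout=0.1):
        """Return True if process_id gets in, False if it would block."""
        if self.locking_process == process_id:
            self._depth += 1  # recursive entry by the recorded owner
            return True
        if not self._lock.acquire(timeout=timeout):
            return False  # owned by a different process: would block forever
        self.locking_process = process_id
        self._depth = 1
        return True


def object_log_commit_enters(apply_fix):
    mutex = RecursionLockModel()
    mutex.enter("process-A")  # request handling starts in process A
    active = "process-B"      # assumption: call/answer changed the active process
    if apply_fix:
        # Proposed fix: we are still inside the critical section,
        # so re-point the lock at the process that is active now.
        mutex.locking_process = active
    return mutex.enter(active)  # the object log commit re-enters the mutex


without_fix = object_log_commit_enters(apply_fix=False)
with_fix = object_log_commit_enters(apply_fix=True)
```

In this model the commit is refused without the hand-over and admitted with it. The timeout exists only so the sketch terminates; the real lock would wait indefinitely.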
Good point about not using doTransaction: in selected spots ... of course the recursion lock is supposed to work, but the recursion lock isn't necessary when we "know" that we are already in a transaction, so an alternate implementation for doTransaction: might be:

doTransaction: aBlock
    "Evaluate aBlock in a transaction.
    Return true if the transaction succeeds and false if the transaction fails.
    Nested calls are allowed and will always return true.
    See System class>>transactionConflicts for dealing with failed transactions."

    "Ensure that each block evaluation is mutually exclusive: https://code.google.com/p/glassdb/issues/detail?id=355"

    "commitResult is a method temporary so that it is still in scope for the final return"
    | commitResult |
    System inTransaction
        ifTrue: [
            "We already are in a transaction, so just evaluate the block"
            aBlock value.
            ^ true ]
        ifFalse: [
            self transactionMutex critical: [
                "Get the transactionMutex, and perform the transaction."
                [
                    self doBeginTransaction.
                    aBlock value ]
                    ensure: [
                        "workaround for Bug 42963: ensure: block executed twice (don't return from ensure: block)"
                        commitResult := self doCommitTransaction ] ].
            ^ commitResult ] |
Ahhh but in GRGemStonePlatform>>seasideProcessRequestWithRetry:resultBlock: we are not in a transaction, so while my suggestion is not a bad thing to do, it wouldn't help the current situation ... and @jbrichau's suggestion might be the best fix available ... |
The good news here is that since we are running this fix in production, our Seaside gem failures have been reduced to almost zero. This is an important improvement for stability. |
@jbrichau uhhh it would have been cool to include this in 3.1.4.1 :( |
Version numbers are cheap :) Johan (sent from my mobile)
|
which build is failing ... I've seen lots o green recently .... perhaps we should push out 3.1.4.2 shortly? I've got nothing pending ... |
@dalehenrich I would love to have this fix as part of a 3.1.4.2! |
There was a failed 3.2.6 build but I refreshed it and then it was green. Random failure seems to happen sometimes.
|
Mind that issue #72 is now exposed and needs to be fixed too. These were the two issues queued up for milestone 3.1.4.2 |
@marianopeck, picking up the bugfixes that you want is a major argument for using a local clone of the github repository ... then you don't have to wait for Johan or me (or you) to fix and tag bugs :) |
Hi @jbrichau @dalehenrich does something that looks wrong to me:
Note that you are setting "aNativeRequest url" to #request:. Then that fails when you try to use WAObjectLog and sends #requestString to WAObjectLogEntry which looks like this:
So... I recommend to set "aNativeRequest" rather than "aNativeRequest url". Thoughts? |
this is because of issue #64 but you are right that this leads to an error in The WAObjectLog app |
Hi @jbrichau |
Because of Issue #64, it's not a good idea to persist a #nativeRequest, I'm inclined to add an extra IV called |
The GRGemStonePlatform>>transactionMutex is a TransientRecursionLock that guards the processing of a Seaside request. Such a TransientRecursionLock keeps a reference to the process that is locking its critical section, to allow recursive wait calls on the mutex.
The Seaside call/answer implementation in GemStone manipulates processes, eventually leading to a change of the active process. In such a case, when the request handling leaves the critical section, its active process identifier will have changed.
When a request hits a commit conflict, a commit should be done on the object log. This commit is guarded by the same transactionMutex. For the same process, entry is thus allowed. For any other process, entry is blocked.
As a result, when a request processes a call/answer and hits a commit conflict, the commit to the object log will block. See discussion http://forum.world.st/Glass-Recursion-deadlock-with-TransientRecursionLock-td4819261.html
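The blocking described above can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the TransientRecursionLock code: `OwnerTrackingLock`, `try_enter`, and the process labels are hypothetical names, and a short timeout stands in for a wait that would never return in the real system.

```python
import threading


class OwnerTrackingLock:
    """Hypothetical model of a recursion lock that remembers which process holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None  # label of the process inside the critical section
        self.depth = 0

    def try_enter(self, process_id, timeout=0.1):
        """Return True on entry, False when the caller would block."""
        if self.owner == process_id:
            self.depth += 1  # same process: recursive entry is allowed
            return True
        if not self._lock.acquire(timeout=timeout):
            return False  # any other process: entry is blocked
        self.owner = process_id
        self.depth = 1
        return True

    def leave(self, process_id):
        assert self.owner == process_id, "only the recorded owner may leave"
        self.depth -= 1
        if self.depth == 0:
            self.owner = None
            self._lock.release()


mutex = OwnerTrackingLock()

# Request handling enters the critical section as one process.
entered = mutex.try_enter("request-process")

# A nested entry by the same process succeeds, as a recursion lock intends.
reentered_same = mutex.try_enter("request-process")
mutex.leave("request-process")

# Assumption: call/answer has swapped the active process. The object log
# commit now arrives under a different identity and cannot get in.
blocked = not mutex.try_enter("continuation-process")
```

Here `reentered_same` is true while `blocked` is also true: the lock is still recorded as owned by the original process, so the commit issued under the new process identity waits on a lock that its own request is holding.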
I can reproduce this by randomly returning false from GRPlatform>>doTransaction: and using an app that has a WATask in which call:answer: is used. In production, we randomly hit the issue every xx weeks per Seaside gem (as mentioned in the mail thread).