-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support cancellation of timer operations #110
Conversation
Looking good, I do have a question though and it's a long one. Basically, I want to ensure that Concurrent ML's
To explain In fibers, we have considered Now back to
And then the example goes like this (translated): (define (annotate-operation op nack-fn)
(withNack
(lambda (nack-op)
(spawn-fiber (lambda ()
(nack-fn (perform-operation nack-op))))
op))) The idea is that when you make a nack-event, your nack guard function will be called on a fresh event (operation), as if from a The fiber spawned by the nack guard function could then, say, send a message back to a remote server to release some kind of resource. The corner cases come in for
Here we want the nack condition to signal if op-a synchronizes, but not to signal if either of op-b or op-c synchronize. In https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=9393e5ba6fa5cdcd981fee71c3bdfee8841c047d §5.2, Donnely and Fluet show how to implement So, again, my questions:
Also, when we settle on the answer, can I request some documentation, please? Thank you :) |
Hey @wingo, I'm afraid I cannot answer your questions, this being my first time hearing about What I can say is that this change preserves If we cannot answer the question of whether cancellation functions would prevent Regarding documentation, the commit updates the Thanks for your detailed review! |
(put-message channel 'hello) | ||
(loop (+ i 1)))))) | ||
|
||
(let ((initial-heap-size (heap-size))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're checking heap sizes, best do a gc beforehand, otherwise some heavy activity could lead to false negatives (as in 'no bug detected even though it exists').
Would it be possible (and sufficiently informative & meaningful) to instead check the length of the timer wheel? Seems less finicky to me (e.g. what if in the future tests are run in parallel in a single process).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if something finicky like heap sizes is avoided, I imagine the number of iterations could be reduced a lot (good for test performance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that when #109 is present, the heap would grow way beyond the 2x limit that's tested here; it would not go unnoticed. (Maybe we could make the test faster but it was already reasonably fast in my experience.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How fast is 'reasonably fast'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that when #109 is present, the heap would grow way beyond the 2x limit that's tested here; it would not go unnoticed. (Maybe we could make the test faster but it was already reasonably fast in my experience.)
That's going to lead to false positives (*) in case of concurrent tests, or a hypothetical future 'Guile OS' where all Guile is run in a single process (with appropriate isolation, but also with a shared heap and GC).
(*) where positive = "there is a bug"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't answered yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this comment as well. Again, I really don't think there's going to be false positives here; please judge for yourself by commenting out the "cancel" function of timers to see where it goes.
That said we can always add a comment in the test to clarify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refuse to [rewrite the foundations of operating systems, Guile processes or Fiber's test suite just to test this elementary logical conclusion]. (brackets added for clarity]. And why clarify things when you can just fix things? Surely a length check of the timer wheel would be straightforward, and less noisy than heap size information - it should even be feasible to check the exact length (two iterations should be sufficient, could be increased a little 'just in case').
Also, the question on speed remains unanswered.
fibers/operations.scm
Outdated
;; (sched) -> () | ||
(cancel-fn base-op-cancel-fn)) | ||
|
||
(define* (make-base-operation wrap-fn try-fn block-fn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ABI reasons, there needs to be a way to tell to Guile 'don't inline this' (not blocking this PR).
I have just seen this issue and have only skimmed the paper you linked so far, but I wrote about the lack of In particular, I'd highlight § 7 of Flatt and Findler, “Kill-Safe Synchronization Abstractions” (PLDI 2004):
I haven't read Donnely and Fluet closely enough (or re-read the implementation of I wrote in #97 that I don't think The mention of the fact in Flatt and Findler that “continuation jumps back into a guard are always blocked by our definition of |
Hey @LiberalArtist, Thanks for the explanation and for the reference. It would seem to me that what I called "cancellation" here could actually help implement At any rate, the goal of this PR is to fix #109, which I consider serious. As long as it doesn't prevent future work such as implementing Would you object, @LiberalArtist, @wingo, or @emixa-d? |
Good point to avoid blocking this bug fix, if possible, while thinking through deeper questions. If you went with this suggestion:
do I understand correctly that this fix would not add any new API, just fix a problem with the current implementation? Or is If there's no new API, I don't see how this could possibly pose a (new) problem. One subtlety for cancellation and #lang racket
(define saved-nack #f)
(sync (nack-guard-evt (λ (gave-up-evt)
(set! saved-nack gave-up-evt)
always-evt)))
(sync saved-nack) ; must never return |
(define wheel-entry | ||
;; If true, this is the currently active timer entry for this operation. | ||
#f) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a problem. Previously, operation objects are reusable, and nothing suggested you need to re-create the operation object (convenient for loops!). Now they aren't, and worse it's undocumented.
If state is needed, it needs to be moved into arguments of one of the 'lambdas' below (is API change, but you could define separate 'make-base-operation' and 'make-base-operation/stateful'). Another option is to rename 'flag' to state', make it a pair of (atomic box with flag . wheel-entry) and allow overriding the default construction of the state ((fibers operations)
initialises to an atomic box, but it doesn't actually use its contents -- doing something with it is left entirely to the individual implementations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that passing state around would be more elegant. In practice though, the current approach is okay IMO because the variable is closed over by the closures of the operation.
In this case, I fixed the problem you mentioned (being able to reuse a timer operation after it's been "canceled") simply by resetting the timer-wheel
variable upon cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice though, the current approach is okay IMO because the variable is closed over by the closures of the operation.
AFAICT, there is still a problem if a single timer operation value is used concurrently from multiple fibers. (Having guard-operation
would make this pattern work.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, there is still a problem if a single timer operation value is used concurrently from multiple fibers. (Having guard-operation would make this pattern work.)
Apologies @LiberalArtist, I missed this comment of yours.
Would using an atomic box (as I suggested in the first message) solve the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it might be possible to adjust the timer wheel implementation to use atomic boxes everywhere + compare and swap, but that seems inefficient, and because two pointers need to be replaced, difficult to do correctly and verify for correctness. It seems simpler & less error-prone to me to simply add a state argument.
In case of 'replace #f' by 'atomic box containing #f', no. We need to assign things to the right thread (in particular, the right scheduler, because of work stealing(?)) and atomic boxes don't do such things, they impose ordering constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Note: IIUC, if/when guard
is created, an explicit 'state' argument could be eliminated, although a 'state' operation-construction API could still be provided for convenience.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, re atomicity, I was thinking about the (set! timer-wheel ...)
bit, but you're saying the timer wheel implementation itself should be made atomic?
I must say I'm unclear on that, though my understanding is that there's one timer wheel per scheduler and one scheduler per thread, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying that it should be made atomic. The state version is also an option.
though my understanding is that there's one timer wheel per scheduler and one scheduler per thread, no?
And that's why, if you don't go for the state version, it needs to be atomic -- you are saving a timer wheel entry of the current scheduler in the closure, and due to work stealing the fiber might migrate to another scheduler (IIUC), so the cancellation can be run from another scheduler, and fiddling with another threads data structures leads to trouble unless designed for that.
I don't think questions on whether it is sufficient Also see my comments on code and tests. |
No. It adds API to Where it is necessary (or not strictly necessary, but very convenient for implementation or performance), I'm not against making operations single-use, but then it should actually be documented that the particular operation is single-use (and it should also be defined what single-use means: only once per call to
Why? I gather that this is Racket semantics, but I don't see why this limitation should be included in Guile -- the second Edit: right, I imagine this might perhaps be useful for complex combinations of operations and nesting, but this still is a serious incompatible change that shouldn't be swept under the rug (in a previous version of Scheme-GNUnet, I constructed a complex operation before a loop and then reused it as an optimisation). So, should be a v2.0 version thing. |
I've read (part of) this again, and I think (but I could definitely be wrong!) that the "lower-level primitive" needed is type CMLEvt a = IO ([AckVar], Evt ([AckVar], IO a)) I'm also not sure to what extent Donnely and Fluet's system makes sense in a language with impure side-effects. (It might indeed make sense, I just haven't figured it out for myself one way or the other yet.) If you're interested in transactions and three-way rendezvous, I remember someone (maybe Sam Tobin-Hochstadt?) saying Aaron Turon's “reagents” would be a good place to look (but I haven't read it in any kind of detail).
It's not for me to say what the goal of Fibers should be. But if the goal is to provide an implementation of CML primitives—and I do think that is a useful goal!—then I would argue that Fibers should plan to provide both missing CML primitives,
This is the Concurrent ML semantics of (define (guard-operation thunk)
(nack-guard-operation (lambda (ignored-nack)
(thunk))) This is why I wrote in #97:
As far as reuse:
Yes, operations should be reusable, but maybe in a subtler sense than one might initially guess. I think the Donnely and Fluet paper explained this well on p. 653, in particular sentence I've put in bold here:
(I'd have to look more closely to say whether the implementation in this PR maintains the needed invariants.) For a concrete example, I strongly recommend the paper “Kill-Safe Synchronization Abstractions”, which walks through the implementation of the (define ach
(make-async-channel))
(define e
(async-channel-put-evt ach "message")) The Of course, it is possible to implement an operation that won't become ready more than once: (define (once-operation val)
(define ch (make-channel))
(spawn-fiber (lambda ()
(put-message ch val)))
(get-operation ch))
Sorry, that may not have been totally clear: I was assuming at that point taking @civodul's suggestion in #110 (comment) to not change But is
This is a separate, important question that, as I said, I'd have to look again more closely to try to answer. |
Note that this reply and what you replied to don't have to do with each other - fixing cancellation shouldn't hinder implementation of extra operations, and if those extra operations need changes to cancellation that that can simply be done, as mentioned previously ... Also, it is for you to say what the goal should be as much as it is for anyone else here. Also, I don't consider that the goal of Fibers. For me the goal of Fibers is to provide a useful concurrency library based on fibers and composable operations, not to reimplement CML the exact same way (neither is it to not do that, and neither is the goal to reinvent the wheel).
"Q: Why do this semantics of P for R? A: because that's the semantics of Q too." Your description of this semantics doesn't explain why these semantics (to people who don't share your goal of implementing Concurrent ML'). 'X being particular to a Y' could perhaps also be handled by ... what I mentioned in my comment about state (and elsewhere about extra arguments to procedures in the
This does not follow. Why would they need to distinguish between each call, and why would it need to be fresh? This does not at all seem necessary to get underlying operation of the 'guard-evt' thunk (which does not actually need to be fresh -- because of its nature it probably is fresh, but there is no actual requirement for freshness, and in case of 'only perform once' operations, it sometimes might perhaps be useful to allow for returning the same operation multiples times -- in short, it's none of Going by the documentation of
-- all the I can imagine that re-use of the
I recommend against that paper, because the following paragraph from the paper is untrue in Fibers
(there is no such GC behaviour in Fibers yet -- fibers are strongly references to IIRC). Also, that behaviour is wrong -- if it's inside a
This is not an implementation of the thing, see my previous paragraph.
There is no list of public/non-public in Fibers. I would treat it as non-public.
It's the other way around -- this is not the separate question, all the questions about 'guard' and 'nack' are the separate questions (see: title of this PR). |
027411a
to
19e992b
Compare
The update I just pushed introduces |
* fibers/operations.scm (<base-op>): Rename constructor to ‘%make-base-operation’. [cancel-fn]: New field. (make-base-operation/internal, cancel-other-operations): New procedures. (perform-operation)[block]: Define ‘resume’ to call ‘cancel-other-operations’ because calling the real ‘resume’.
* fibers/timer-wheel.scm (timer-wheel-remove!): New procedure.
Fixes wingo#109. Previously, an operation like: (choice-operation (sleep-operation 1234) (get-operation channel)) would accumulate timer wheel entries every time the ‘get’ operation wins over the ‘sleep’ operation, potentially leading to unbounded memory usage (each ‘sleep’ timer and its associated continuation would remain on the wheel for 1234 seconds in this case). This commit fixes it by removing the timer wheel entry as soon as the timer operation is canceled. * fibers/timers.scm (timer-operation)[wheel-entry]: New variable. Set it in block function. Use ‘make-timer-operation/internal’ and add cancel function. * fibers/scheduler.scm (scheduler-timers): Export. * tests/cancel-timer.scm: New file. * Makefile.am (TESTS): Add it.
19e992b
to
6facb6a
Compare
Oops, I messed up with the default value of the |
@@ -141,6 +142,18 @@ | |||
(else | |||
(timer-wheel-add! (or outer (add-outer-wheel! wheel)) t obj))))))) | |||
|
|||
(define (timer-wheel-remove! wheel entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument wheel
is unused
This patch series fixes #109 by supporting a "cancel" function for base operations and using it to remove timer wheel entries when a timer "loses" in a choice operation.
I'd like feedback in particular on the last commit: can someone confirm
set!
good enough, or should we use an atomic box instead?I ran the test suite of the Shepherd and that of Cuirass against this branch: the former has good coverage though it uses a single POSIX thread, the latter has not-so-good coverage but uses multiple POSIX threads. Both passed.
Thoughts?