Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a quota handler callback #423

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

rg0now
Copy link
Contributor

@rg0now rg0now commented Nov 18, 2024

Note: the PR is on top of #419, see the changes on top of that PR here.

RFC 8656 includes the following text:

To mitigate either intentional or unintentional denial-of-service attacks against the server by clients with valid usernames and passwords, it is RECOMMENDED that the server impose limits on both the number of allocations active at one time for a given username and on the amount of bandwidth those allocations can use. The server should reject new allocations that would exceed the limit on the allowed number of allocations active at one time with a 486 (Allocation Quota Exceeded [sic!]) [error].

(Note the mistake in the text: the name of the error is in fact "Allocation Quota Reached", not "Allocation Quota Exceeded".)

This PR adds a quota handler callback function which, if specified, is called by the server just before making an allocation for a user. The handler should return a single bool: if true then the allocation request can proceed, otherwise the request is rejected with the 486 (Allocation Quota Reached) error. Then, the lifecycle API can be used to track the number of active allocations per user and this callback can be leveraged to reject allocation requests that would exceed the user's quota.

Note that the other DoS mitigation recommendation given in the RFC (limiting the amount of bandwidth a single user can use) is not targeted by this PR.

@rg0now rg0now requested a review from Sean-Der November 18, 2024 20:33
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 91.50000% with 17 lines in your changes missing coverage. Please review.

Project coverage is 68.94%. Comparing base (3ff9392) to head (9927614).

Files with missing lines Patch % Lines
internal/server/util.go 72.00% 5 Missing and 2 partials ⚠️
internal/allocation/five_tuple.go 60.00% 4 Missing ⚠️
server.go 57.14% 2 Missing and 1 partial ⚠️
internal/server/nonce.go 50.00% 0 Missing and 1 partial ⚠️
relay_address_generator_range.go 0.00% 1 Missing ⚠️
server_config.go 97.43% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
+ Coverage   66.63%   68.94%   +2.31%     
==========================================
  Files          43       43              
  Lines        2919     3104     +185     
==========================================
+ Hits         1945     2140     +195     
+ Misses        807      798       -9     
+ Partials      167      166       -1     
Flag Coverage Δ
go 68.94% <91.50%> (+2.31%) ⬆️
wasm 25.54% <3.00%> (-1.59%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rg0now rg0now force-pushed the quota-handler branch 3 times, most recently from 14ea578 to 9927614 Compare December 5, 2024 14:07
@JoeTurki
Copy link
Member

JoeTurki commented Jan 11, 2025

@rg0now Sorry for causing conflicts in your PRs due to the linter updates [1]. Would you like me to fix them for all of your branches?


On a side note, I’ve been reviewing past contributions, to understand more about Pion, and I’m a big fan of your reviews. I'm going to get more involved with Pion and would love to submit PRs to implement some of the missing TURN specs (after I finish my todolist for other repos). I’m really looking forward to getting your reviews, can't wait :)

Thank you for all the work you do on Pion/Turn

  1. Impr/more linters .goassets#220

@rg0now
Copy link
Contributor Author

rg0now commented Jan 15, 2025

@JoeTurki Just fixed the conflicts, no worries. I super-appreciate your work on pion, what you're doing is so immensely useful! If we are at it, can you please also take a look at #419 ?

And thanks for the nice words too. Whenever I have time I come here and do core-reviews. My only worry is that sometimes we may get too picky and this can scare people off (I guess this is how we lost @AndyLc , see #315). I guess sometimes we should just let unfinished code in and fix stuff later in order to encourage people to contribute...

@JoeTurki
Copy link
Member

JoeTurki commented Jan 17, 2025

@rg0now

Thanks :)

can you please also take a look at #419 ?

I did actually take a look, I do like it.

My main perspective on this is from a developer experience (DX) standpoint as a library user.
And going through all the Pion repos, I believe we need an abstraction for defining events, systematically:

  1. Blocking and behavior-changing events (like the control mechanisms in your PR), That allow us to change a setting or cancel an action etc.
  2. Non-blocking telemetry events that allow users to inspect ongoing activity.
  3. Events can send custom state.

Something like an event emitter library could help establish consistent behavior for library users, similar to the web EventEmitter, but more tailored to Pion's, I think we need this.

However, I'm not sure, I can starting working on drafting this right now and we can consider postponing it until v5, but I believe we can introduce this without making breaking changes.

My only worry is that sometimes we may get too picky and this can scare people off (I guess this is how we lost @AndyLc , see #315).

Feel free to go hard on me as you can, I just want to do things right :)

cc @jech maybe this is something we should add to the v5 wiki?

// allocation's lifecycle. All events are reported with the context that identifies the allocation
// triggering the event (source and destination address, protocol, username and realm used for
// authenticating the allocation). It is OK to handle only a subset of the callbacks.
type EventHandlers struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called Callbacks rather than EventHandlers?

Also, should we call it ServerCallbacks? It's not going to appear in a lot of code, perhaps we should be more explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is top-level API I guess we'll always use this as turn.XXX from the caller site. This suggests that maybe ServerCallbacks would be the right name, WDYT?

@jech
Copy link
Member

jech commented Jan 17, 2025

As a general rule, I'm suspicious of callbacks, I prefer direct style whenever possible, especially in Go. In this commit, you're adding a whole bunch of callbacks, so naturally I'm suspicious a whole bunch ;-)

Are all of these callbacks necessary? Perhaps some of these can be omitted, and others can be converted to direct code?

@@ -82,6 +90,21 @@ func (a *Allocation) AddPermission(perms *Permission) {
a.permissions[fingerprint] = perms
a.permissionsLock.Unlock()

if a.callback != nil {
if u, ok := perms.Addr.(*net.UDPAddr); ok {
a.callback(EventHandlerArgs{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code repeats a bunch of times. Factor it out into a helper function?

@@ -27,7 +27,8 @@ type Request struct {
NonceHash *NonceHash

// User Configuration
AuthHandler func(username string, realm string, srcAddr net.Addr) (key []byte, ok bool)
AuthHandler func(username string, realm string, srcAddr net.Addr) (key []byte, ok bool)
QuotaHandler func(username string, realm string, srcAddr net.Addr) (ok bool)
Copy link
Member

@jech jech Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be returning error here. There should be a specific error value (tested with error.Is) that causes the server to return Quota Exceeded, and the other values should cause it to return Internal server error (or the TURN equivalent, I haven't checked).

This has two advantages: it will allow us to return other errors in the future without breaking the API, for other return codes, and it will provide better logging by allowing the to wrap the distinguished error value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why we decided for this API is (1) the error returned by the user is largely ignored by the server (we always return the same standard STUN error code stun.CodeAllocQuotaReached to the client regardless of the error) and (2) because this is very similar to the AuthHandler API. Happy to change though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be returning error here. There should be a specific error value (tested with error.Is) that causes the server to return Quota Exceeded, and the other values should cause it to return Internal server error (or the TURN equivalent, I haven't checked).

AuthHandler has the exact same structure in that it returns a bool instead of an error. There's an in-progress PR to redo the design, should we change the signature there to likewise return an error?

relayAddr, peer net.Addr, channelNumber uint16)
}

func genericEventHandler(handlers EventHandlers) allocation.EventHandler { //nolint:cyclop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this dispatch function? It looks to me like complexity for the sake of complexity. Why not have a bunch of individual handlers stored in a record, and directly call them, without need for the arg.Type field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EventHandlers struct here is a top-level API that we must access across several subpackages without creating circular imports. I see 3 ways to solve this:

  • Define the struct in a subpackage and then re-export from server_config.go. This does not feel idiomatic in pion, plus we lose the top-level docs (Kubernetes does this a lot though, see https://github.com/kubernetes-sigs/controller-runtime/blob/v0.20.0/alias.go).
  • Repeat the same def in a stub subpackage that we import everywhere we need it, as well as in the top level server_config.go. This just feels wrong though.
  • The "dispatcher" approach above. I agree this is ugly but at least it avoids the above isues.

Happy to adopt approach you see fit.

@JoeTurki
Copy link
Member

As a general rule, I'm suspicious of callbacks, I prefer direct style whenever possible, especially in Go. In this commit, you're adding a whole bunch of callbacks, so naturally I'm suspicious a whole bunch ;-)

I do agree, But what design pattern do you suggest for making overridable / controllable behavior? Should we expose more of the library and make users define their own Allocation here for example?

@jech
Copy link
Member

jech commented Jan 17, 2025

First of all, we shold consider which of these callbacks are actually needed. My suspicion is that there's only one or two that are actually useful, and that the others are just here for completeness. We should only implement the ones that are provably needed, since we can always add more later, but once we've added them, we won't be able to remove them.

Second, we should consider whether the full generality of a callback is required. Perhaps we can just add a boolean or integer option somewhere, and avoid a callback?

@rg0now
Copy link
Contributor Author

rg0now commented Jan 17, 2025

As a general rule, I'm suspicious of callbacks, I prefer direct style whenever possible, especially in Go. In this commit, you're adding a whole bunch of callbacks, so naturally I'm suspicious a whole bunch ;-)

I do agree, But what design pattern do you suggest for making overridable / controllable behavior?

I think your distinction between callbacks that implement "overridable/controllable behavior" and ones used for telemetry is crucial. For the former (like the auth-handler or the quota-handler) synchronous callbacks seem the best fit, while for the latter (like the event-handler API in #419) something asynchronous (like a channel for emitting events) feels way more "right" than the synchronous callbacks we propose above.

Oh, and I agree: pion needs a generic event emitter lib for generalizing this. Do you have any good reference design from a major Go lib we could copy? We could try to prototype the API here in a branch and then, once we're happy with the design we can move it into a separate module.

@rg0now
Copy link
Contributor Author

rg0now commented Jan 17, 2025

First of all, we shold consider which of these callbacks are actually needed. My suspicion is that there's only one or two that are actually useful, and that the others are just here for completeness. We should only implement the ones that are provably needed, since we can always add more later, but once we've added them, we won't be able to remove them.
Second, we should consider whether the full generality of a callback is required. Perhaps we can just add a boolean or integer option somewhere, and avoid a callback?

Essentially all the server events defined in #419 have been asked or proposed in some way or another earlier: authentication events in #402, allocation lifecycle events in #324, channel lifecycle events in #360, etc., and our OpenTelemetry reference design uses each.

In general I see no reason not to be liberal in defining our events: the hooks come with zero runtime overhead and the more data we surface with events the more future proof pion becomes.

@jech
Copy link
Member

jech commented Jan 17, 2025

Essentially all the server events defined in #419 have been asked or proposed in some way or another earlier:

Oh, good.

In general I see no reason not to be liberal in defining our events: the hooks come with zero runtime overhead and the more data we surface with events the more future proof pion becomes.

If we define a callback now, and then it turns out we need more data in the callback, we cannot easily add a new parameter without breaking compatibility.

If we refrain from defining a callback until there's a demonstrated need for it, there's an increased chance that we'll get it right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants