Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differentiate between GCRA burst capacity and tolerance #251

Conversation

jonasmalacofilho
Copy link
Contributor

In GCRA terms, the total capacity of the rate limiter is Τ + τ, where Τ is the nominal cell arrival interval and τ is (extra) tolerance allowed for bursts.

The current implementation has a subtle issue, caused by conflating capacity and tolerance, which results in one extra cell being allowed after arrival intervals equal or greater than "capacity + 1", as reported in #107 and #249. The problem was partially hidden in other cases by a compensating addition of Τ to initial TAT in the case of the first cell (i.e. when the rate limiter has no state recorded for it).

To fix the issue, correct the value tau used for making the decision, making it the tolerance and not the capacity, and remove the compensation for the first cell that's no longer necessary.

This PR also adds a new specific test for the "capacity + 1" interval scenario, as well as fixes an existing test that incorrectly expected one of these extra cells. Finally, some needless lifetimes reported by clippy are also removed since the project opts for #[deny(warnings)].

Fixes: #107
Fixes: #249

A rate limiter constructed with burst capacity 2 should never allow more
than 2 cells in a short burst.
In GCRA terms, the total capacity of the rate limiter is `Τ + τ`, where
`Τ` is the nominal cell arrival interval and `τ` is (extra) tolerance
allowed for bursts.

The current implementation has a subtle issue, caused by conflating
capacity and tolerance, which results in one extra cell being allowed
after arrival intervals equal or greater than "capacity + 1", as
reported in boinkor-net#107 and boinkor-net#249.

This problem was partially hidden by a compensating addition of Τ to
initial TAT in the case of the first cell (i.e. when the rate limiter
has no state recorded for it).

To fix the issue, correct the value `tau` used for making the decision,
making it the tolerance, and remove the adjustment for the first cell
that's no longer necessary.

Fixes: boinkor-net#107
Fixes: boinkor-net#249
@jonasmalacofilho
Copy link
Contributor Author

The cargo deny failure is unrelated to these changes, but I can allow-list the "Unicode-3.0" license for unicode-ident, if acceptable to you.

Relevant except from the cargo deny run log:

error[rejected]: failed to satisfy license requirements
  ┌─ unicode-ident 1.0.14 (registry+https://github.com/rust-lang/crates.io-index):4:36
  │
4 │ license = "(MIT OR Apache-2.0) AND Unicode-3.0"
  │            ------------------------^^^^^^^^^^^
  │            │                       │
  │            │                       rejected: not explicitly allowed
  │            license expression retrieved via Cargo.toml `license`
  │
  = Unicode-3.0 - Unicode License v3:
  =   - OSI approved
  = unicode-ident v1.0.14
    ├── proc-macro2 v1.0.92
    │   ├── quote v1.0.37
    │   │   ├── serde_derive v1.0.215
    │   │   │   ├── criterion v0.5.1
    │   │   │   │   └── (dev) governor v0.7.0
    │   │   │   └── serde v1.0.215
    │   │   │       ├── ciborium v0.2.2
    │   │   │       │   └── criterion v0.5.1 (*)
    │   │   │       ├── criterion v0.5.1 (*)
    │   │   │       ├── serde_json v1.0.133
    │   │   │       │   ├── criterion v0.5.1 (*)
    │   │   │       │   └── tinytemplate v1.2.1
    │   │   │       │       └── criterion v0.5.1 (*)
    │   │   │       └── tinytemplate v1.2.1 (*)
    │   │   ├── syn v2.0.90
    │   │   │   ├── serde_derive v1.0.215 (*)
    │   │   │   ├── wasm-bindgen-backend v0.2.99
    │   │   │   │   └── wasm-bindgen-macro-support v0.2.99
    │   │   │   │       └── wasm-bindgen-macro v0.2.99
    │   │   │   │           └── wasm-bindgen v0.2.99
    │   │   │   │               ├── js-sys v0.3.[76](https://github.com/boinkor-net/governor/actions/runs/12218164850/job/34093914103#step:5:77)
    │   │   │   │               │   └── web-sys v0.3.76
    │   │   │   │               │       ├── plotters v0.3.7
    │   │   │   │               │       │   └── criterion v0.5.1 (*)
    │   │   │   │               │       └── quanta v0.12.3
    │   │   │   │               │           └── governor v0.7.0 (*)
    │   │   │   │               ├── plotters v0.3.7 (*)
    │   │   │   │               └── web-sys v0.3.76 (*)
    │   │   │   ├── wasm-bindgen-macro-support v0.2.99 (*)
    │   │   │   └── zerocopy-derive v0.7.35
    │   │   │       └── zerocopy v0.7.35
    │   │   │           └── ppv-lite[86](https://github.com/boinkor-net/governor/actions/runs/12218164850/job/34093914103#step:5:87) v0.2.20
    │   │   │               └── rand_chacha v0.3.1
    │   │   │                   ├── proptest v1.5.0
    │   │   │                   │   └── (dev) governor v0.7.0 (*)
    │   │   │                   └── rand v0.8.5
    │   │   │                       ├── governor v0.7.0 (*)
    │   │   │                       └── proptest v1.5.0 (*)
    │   │   ├── wasm-bindgen-backend v0.2.99 (*)
    │   │   ├── wasm-bindgen-macro v0.2.99 (*)
    │   │   ├── wasm-bindgen-macro-support v0.2.99 (*)
    │   │   └── zerocopy-derive v0.7.35 (*)
    │   ├── serde_derive v1.0.215 (*)
    │   ├── syn v2.0.[90](https://github.com/boinkor-net/governor/actions/runs/12218164850/job/34093914103#step:5:91) (*)
    │   ├── wasm-bindgen-backend v0.2.99 (*)
    │   ├── wasm-bindgen-macro-support v0.2.99 (*)
    │   └── zerocopy-derive v0.7.35 (*)
    └── syn v2.0.90 (*)

@antifuchs
Copy link
Collaborator

Thanks for the contribution - your explanation and the change seems really good on first glance (haven't read through all of it yet, but soon)!

I'm OK with allowlisting the unicode license - it's a bit annoying that ~every project needs to add that specific one combination of licenses to its list, but oh well (:

Copy link
Collaborator

@antifuchs antifuchs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I have one open/question that I'm sure you thought of already (and that my morning brain just can't reason through fully yet), but otherwise I'm super excited to include this and fix all those bugs (:

.into();
let t: Nanos = quota.replenish_1_per.into();
let t: Nanos = cmp::max(quota.replenish_1_per, Duration::from_nanos(1)).into();
let tau: Nanos = t * (quota.max_burst.get() - 1).into();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, hm, hm. I think this might cause tau to become 0 if you pick a burst capacity of 1... I'm slightly convinced that won't cause problems in the tat calculation below, but not fully certain. a tau=0 case is fine, right?

Copy link
Contributor Author

@jonasmalacofilho jonasmalacofilho Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will cause tau to be zero, intentionally. If the burst capacity is one, we want to reject any cells that arrive before the TAT. Only with capacities greater than one that we want to allow for non-zero tolerance tau.

The TAT is (ignoring the max operation) the time the last cell arrived + the nominal interval t. So if tau was greater than zero for a burst capacity of one, it would cause us to allow a cell to be accepted before the nominal interval has passed since the last cell, effectively accepting a burst of two and violating the burst capacity.

Copy link
Contributor Author

@jonasmalacofilho jonasmalacofilho Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot to answer your actual question: yes, tau=0 should be fine in the decision calculations.

What I'm less sure about is if I adjusted all places where tau is used outside of the gcra module... in fact, let me take another look today and see if I didn't miss anything the tests also didn't catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad I did another pass, as I found an issue in gcra::test_n_all_and_update. Fixed it and adjusted the relevant test case to cover that case too.

@jonasmalacofilho jonasmalacofilho force-pushed the dont-allow-extra-cell-after-longer-interval branch from fe0a3dc to a527973 Compare December 9, 2024 18:49
The second GCRA parameter is the tolerance `tau`. The capacity of the
bucket is `t + tau`.

The `tau` parameter was accidentally renamed to `capacity` in
273f77b as left over changes from a previous experiment.

Fixes: 273f77b ("Differentiate between GCRA burst capacity and tolerance")
Since 273f77b, both relate to the cells *in addition* to the first
cell.

Fixes: 273f77b ("Differentiate between GCRA burst capacity and tolerance")
@jonasmalacofilho jonasmalacofilho force-pushed the dont-allow-extra-cell-after-longer-interval branch from a527973 to d4685a3 Compare December 9, 2024 18:56
Copy link
Collaborator

@antifuchs antifuchs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks really good. I'm extra glad that this makes the additional_weight handling easier!

@antifuchs antifuchs added this pull request to the merge queue Dec 10, 2024
Merged via the queue into boinkor-net:master with commit a028daf Dec 10, 2024
16 checks passed
@jonasmalacofilho
Copy link
Contributor Author

Thanks!

@antifuchs
Copy link
Collaborator

Thank you! I'll cut a release containing this soon & will credit you in the changelog (:

antifuchs added a commit that referenced this pull request Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Getting a free cell after clock advances Unexpected Behaviour for Batched Cells
2 participants