-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] fix zero based range performance #3932
base: nightly
Are you sure you want to change the base?
Conversation
8b2b7ed
to
20668b0
Compare
Signed-off-by: Seth Stadick <[email protected]>
20668b0
to
6a60e10
Compare
The CI failure should be resolved by #3922. It is unrelated to the change I'm making and seems due to this change: 4903cf9#diff-f8da59dec02a848d700736433c150c14adfea4c79cb453ded5c247c1c597b654. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sstadick thanks for contributing :)
return self.end - curr | ||
var start = self.start | ||
self.start += 1 | ||
return start | ||
|
||
@always_inline | ||
fn __has_next__(self) -> Bool: | ||
return self.__len__() > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place where we can add a micro-optimization :D
return self.__len__() > 0 | |
return self.end > self.start |
|
||
@always_inline | ||
fn __has_next__(self) -> Bool: | ||
return self.__len__() > 0 | ||
|
||
@always_inline | ||
fn __len__(self) -> Int: | ||
return self.curr | ||
return max(0, self.end - self.start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could make this branchless. No idea how often this gets called or if the optimization would have a positive effect. Would you be willing to measure the diff using something like
return (self.end - self.start) & -int(self.end > self.start)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally! I'll post some numbers when I have them
@martinvuyk, I tried both suggestions, but did not get an improvement. The bench numbers look near identical, and more importantly, the output assembly vs What I tried: import benchmark
from benchmark import keep, Unit
from memory import Span
from ir_utils.dump import dump_ir
from builtin._stubs import _IntIterable
from builtin.range import _StridedRange
@register_passable("trivial")
struct _ZeroStartingRange(Sized, ReversibleRange, _IntIterable):
var start: Int
var end: Int
@always_inline
@implicit
fn __init__(out self, end: Int):
self.start = 0
self.end = end
@always_inline
fn __iter__(self) -> Self:
return self
@always_inline
fn __next__(mut self) -> Int:
var start = self.start
self.start += 1
return start
@always_inline
fn __has_next__(self) -> Bool:
return self.end > self.start
@always_inline
fn __len__(self) -> Int:
return (self.end - self.start) & -int(self.end > self.start)
@always_inline
fn __getitem__(self, idx: Int) -> Int:
debug_assert(idx < self.__len__(), "index out of range")
return self.start + index(idx)
@always_inline
fn __reversed__(self) -> _StridedRange:
return range(self.end - 1, self.start - 1, -1)
fn main() raises:
var input = List[UInt8]()
for i in range(1000000):
input.append(i)
var data = Span(input)
var s1 = 0
for i in _ZeroStartingRange(len(data)):
s1 += int(data[i])
var s2 = 0
for i in range(0, len(data)):
s2 += int(data[i])
if s1 != s2:
raise "Invalid custom iterator"
@parameter
fn range_custom_iter_implicit_zero() raises:
var sum = 0
for i in _ZeroStartingRange(len(data)):
sum += int(data[i])
keep(sum)
@parameter
fn range_without_start() raises:
var sum = 0
for i in range(len(data)):
sum += int(data[i])
keep(sum)
@parameter
fn range_with_start() raises:
var sum = 0
for i in range(0, len(data)):
sum += int(data[i])
keep(sum)
@parameter
fn range_iter() raises:
var sum = 0
for value in data:
sum += int(value[])
keep(sum)
# dump_ir[range_with_start, "range_with_start"]()
# dump_ir[range_without_start, "range_without_start"]()
# dump_ir[
# range_custom_iter_implicit_zero, "range_custom_iter_implicit_zero"
# ]()
print("Custom Iter")
var report = benchmark.run[range_custom_iter_implicit_zero]()
report.print(Unit.ms)
print("Without start")
report = benchmark.run[range_without_start]()
report.print(Unit.ms)
print("With start")
report = benchmark.run[range_with_start]()
report.print(Unit.ms)
print("With direct iter")
report = benchmark.run[range_iter]()
report.print(Unit.ms) Output: Custom Iter
--------------------------------------------------------------------------------
Benchmark Report (ms)
--------------------------------------------------------------------------------
Mean: 0.324559812458138
Total: 2422.839
Iters: 7465
Warmup Total: 0.321
Fastest Mean: 0.324559812458138
Slowest Mean: 0.324559812458138
Without start
--------------------------------------------------------------------------------
Benchmark Report (ms)
--------------------------------------------------------------------------------
Mean: 0.43432440260680666
Total: 2399.208
Iters: 5524
Warmup Total: 0.428
Fastest Mean: 0.43432440260680666
Slowest Mean: 0.43432440260680666
With start
--------------------------------------------------------------------------------
Benchmark Report (ms)
--------------------------------------------------------------------------------
Mean: 0.32141025290498976
Total: 2351.116
Iters: 7315
Warmup Total: 0.31
Fastest Mean: 0.32141025290498976
Slowest Mean: 0.32141025290498976
With direct iter
--------------------------------------------------------------------------------
Benchmark Report (ms)
--------------------------------------------------------------------------------
Mean: 0.3155641093357982
Total: 2389.767
Iters: 7573
Warmup Total: 0.31
Fastest Mean: 0.31556410933579826 Assembly: |
I'm guessing the difference will only be evident for functional style code where the iterator length or |
Totally makes sense! Which version of this would you prefer:
|
I think someone from the stdlib team would have to chime in. But if you want a small change merged, then I'd leave the code as is (plus the micro-optimizations if you want to). But, if you don't have much problem waiting then I'd say you could completely drop |
Ha! I thought you were one of the stdlib devs! I'll leave it as is for now till I get some more feedback, and then maybe a separate PR to remove |
See #3931 all details.
This is the minimal possible change to get
range(end)
performance matchingrange(0, end)
. The cleaner approach would be to haverange(end)
just callrange(0, end)
, most likely.The potential optimization in the original version of the
_ZeroStartingRange
that could skip a call tomax
seems to make it harder for the compiler to generate efficient assembly for loops (see linked issue for example assembly). This is possibly a compiler bug, and_ZeroStartingRange
could be re-instated to its original implementation.