Different approach with a few benefits #230
It would help to see the |
There you go

Original

New cumulative
The prototype doesn't yet count the number of hits (not difficult to add) |
That looks promising, but the tracing seems to have a 4x overhead based on the return function in the base case (assuming it was hit 3 times in your cumulative version). While that's not too significant in this case, it can be very significant in tighter loops, and it would be beneficial to create a tight loop to test this in. |
How do you infer a 4x overhead from that output? I just tested

```python
@profile
def f():
    for _ in range(int(1e7)):
        g()

@profile
def g():
    pass

f()
```

and it takes 22.37s with v4.0.3 and 18.53s with the new approach. The prototype runs the new approach in addition to the original one, so I temporarily disabled the original part for this crude benchmark. |
Ah, I looked at the return statement in the base case and it looked like it took 1.2us per hit in your version whereas the original took 0.3us per hit. Usually, very simple lines like a plain return statement are virtually instant in Python, and almost all the time taken is in the profiler. |
Here's a benchmark that shows around an 11x slowdown with the current line_profiler vs unprofiled code (total time 500ms unprofiled vs 5.4s profiled):

Time Elapsed tight: 494.96ms
Time Elapsed tight: 5390.15ms
Slowdown from profiling: 4895.19ms, 10.89x
Total time: 5.39021 s
File: tight.py
Function: main at line 3
Line # Hits Time Per Hit % Time Line Contents
==============================================================
3 def main():
4 1 3.1 3.1 0.0 start = time.perf_counter()
5 20000001 2628245.8 0.1 48.8 for x in range(20000000):
6 20000000 2761886.3 0.1 51.2 y = x
7 1 31.5 31.5 0.0 elapsed_ms = round((time.perf_counter()-start)*1000, 2)
8 1 42.2 42.2 0.0 print(f"Time Elapsed tight: {elapsed_ms}ms")
9 1 1.3 1.3 0.0 return elapsed_ms

Your code gives these results:

Time Elapsed tight: 498.77ms
Time Elapsed tight: 18367.76ms
Slowdown from profiling: 17868.989999999998ms, 36.83x
Wrote profile results to tight.py.lprof
Timer unit: 1e-06 s
Total time: 5.91982 s
File: /home/theel/Coding/line_profiler/line_profiler_cumulative/tight.py
Function: main at line 3
Line # Hits Time Per Hit % Time Line Contents
==============================================================
3 def main():
4 1 1.7 1.7 0.0 start = time.perf_counter()
5 20000001 2961819.0 0.1 50.0 for x in range(20000000):
6 20000000 2957938.7 0.1 50.0 y = x
7 1 18.6 18.6 0.0 elapsed_ms = round((time.perf_counter()-start)*1000, 2)
8 1 39.2 39.2 0.0 print(f"Time Elapsed tight: {elapsed_ms}ms")
9 1 0.6 0.6 0.0 return elapsed_ms

I believe the total time elapsed shown by line_profiler is incorrect, as I definitely waited at least 14 seconds for yours to finish (based on watching the clock). This indicates that your code has a ~37x slowdown, which is ~3x more overhead than the current line_profiler.
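For reference, the profiled function in tight.py can be reconstructed from the Line Contents column in the outputs above. The import line, the @profile placement, and the surrounding harness (which runs an unprofiled baseline and prints the slowdown) are assumptions, as they aren't shown in the thread:

```python
import time

@profile  # injected by kernprof; exact placement assumed
def main():
    start = time.perf_counter()
    for x in range(20000000):
        y = x
    elapsed_ms = round((time.perf_counter()-start)*1000, 2)
    print(f"Time Elapsed tight: {elapsed_ms}ms")
    return elapsed_ms
```
|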
The thing to remember (which I forgot in my first comment here!) is that the times in line_profiler's output don't actually include the profiling overhead, as line_profiler calls hpTimer() to try to measure its own overhead and eliminate it from the output.
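In pure-Python terms, that compensation works roughly like this (a conceptual sketch only; the real code is C and timestamps with hpTimer() inside its trace callback, and all names here are illustrative):

```python
import sys
import time

class CompensatingTracer:
    def __init__(self):
        self.times = {}            # lineno -> accumulated user time
        self.prev_line = None
        self.callback_exit = None  # when the callback last returned

    def trace(self, frame, event, arg):
        entry = time.perf_counter()
        if event == "line":
            if self.prev_line is not None:
                # Only the span between the previous callback's exit and
                # this callback's entry counts as user code, so time spent
                # inside the callback itself is excluded from the report.
                self.times[self.prev_line] = (
                    self.times.get(self.prev_line, 0.0)
                    + (entry - self.callback_exit)
                )
            self.prev_line = frame.f_lineno
        self.callback_exit = time.perf_counter()
        return self.trace

# Assumed usage: sys.settrace(CompensatingTracer().trace)
```
|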
I fail to reproduce your results. Could you try running them again with the new commit? I changed a few things so you can get a fair comparison with your benchmark. |
This version is significantly better, but still about 1.5x slower:

Time Elapsed tight: 513.24ms
Time Elapsed tight: 9631.74ms
Slowdown from profiling: 9118.5ms, 18.77x
Wrote profile results to tight.py.lprof
Timer unit: 1e-06 s
Total time: 4.17773 s
File: /home/theel/Coding/line_profiler/line_profiler_cumulative/tight.py
Function: main at line 3
Line # Hits Time Per Hit % Time Line Contents
==============================================================
3 def main():
4 1 1.7 1.7 0.0 start = time.perf_counter()
5 1 2005221.9 2005221.9 48.0 for x in range(20000000):
6 1 2172444.6 2172444.6 52.0 y = x
7 1 21.0 21.0 0.0 elapsed_ms = round((time.perf_counter()-start)*1000, 2)
8 1 38.0 38.0 0.0 print(f"Time Elapsed tight: {elapsed_ms}ms")
9 1 0.7 0.7 0.0 return elapsed_ms |
The weird thing is, my results are the complete opposite: v4.0.3 gives me "Time Elapsed tight: 12030.22ms" while the new commit gives "Time Elapsed tight: 6169.69ms" (triple-checked). My process
|
Great to hear :) I think most improvements can be carried over. I'll give it a try sometime next week |
See notEvil#1 |
Nice, thanks! So if I understand correctly, performance is fine. Regarding the differences from the current approach, the implementation at the bottom of |
The latest commit now counts the number of line hits and provides the times in a backwards compatible way.
"Total time" reported for |
An example of what call stats would allow: graph.dot.pdf |
I'm happy to announce the next iteration :) Getting useful numbers in case of recursive code was a lot more difficult than anticipated, but here they are nonetheless. The last commit
It is impossible to derive

Now the output for the initial example is
Tested with

Code

```python
import time


def wait(n):
    time.sleep(n * 0.01)


@profile
def f(n):
    f1(n)
    f2(n)
    f3(n)
    f4(n)
    f5(n)
    f6(n)


@profile
def f1(n):
    if n == 0:
        return
    wait(1)
    f1(n - 1)


@profile
def f2(n):
    if n == 0:
        return
    wait(1)
    (f2(n - 1), f2(n // 2))


@profile
def f3(n):
    if n == 0:
        return
    wait(1)
    f3_2(n - 1)


@profile
def f3_2(n):
    if n == 0:
        return
    wait(1)
    f3(n - 1)


@profile
def f4(n):
    if n == 0:
        return
    wait(1)
    (f4_2(n - 1), f4_2(n // 2))


@profile
def f4_2(n):
    if n == 0:
        return
    wait(1)
    (f4(n - 1), f4(n // 2))


@profile
def f5(n):
    if n == 0:
        return
    wait(1)
    f5_2(n - 1)


@profile
def f5_2(n):
    if n == 0:
        return
    wait(1)
    f5_3(n - 1)


@profile
def f5_3(n):
    if n == 0:
        return
    wait(1)
    f5(n - 1)


@profile
def f6(n):
    if n == 0:
        return
    wait(1)
    (f6_2(n - 1), f6_2(n // 2))


@profile
def f6_2(n):
    if n == 0:
        return
    wait(1)
    (f6_3(n - 1), f6_3(n // 2))


@profile
def f6_3(n):
    if n == 0:
        return
    wait(1)
    (f6(n - 1), f6(n // 2))


f(6)
```

Output
|
Thanks for this great work! I'm going to rebase my PR to your fork on this latest work; could you take a look at it once I do that? After this work of yours is merged, I can start working on adding CPython 3.12 support to line_profiler. |
Thanks :) |
I just tested the performance, and it is indeed significantly worse than before. Previously, notEvil#1 was able to pull it down to 13x overhead over no profiling in the tight loop case from above, whereas now, with the same optimizations applied, it's 16x overhead over no profiling. That being said, I believe I can optimize it further, as some of your data structure choices may be suboptimal performance-wise. |
@notEvil thanks for sharing this! Many years ago I wrote a line profiler for another dynamic language runtime, and I think I used a similar call/return stacking approach. If you get a chance, would you mind taking a look at the merge conflicts pulling in the latest |
@tmm1 sorry for not responding! I will check the conflicts soon, I promise ^^

I've pushed a new commit* which should improve confidence in the solution. The previous version was manually tested on a few examples with manual inspection of the output. The testing is not perfect:
What I like:
So now that it "works" and won't get any more complex, I'd say it's a good time to review and benchmark. https://github.com/notEvil/line_profiler/tree/sub

* changes
** If you don't know Hypothesis yet, I'd definitely recommend checking it out!
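For context, a minimal Hypothesis property test looks like this (a generic illustration of the library, not the actual test suite from the branch):

```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    # Hypothesis generates many random inputs and, on failure,
    # shrinks them to a minimal counterexample.
    assert sorted(sorted(xs)) == sorted(xs)
```
|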
@tmm1 just merged main and except for trivial conflicts it went smoothly. The randomized tests still pass :) @Theelx you created a PR back then for better performance and comparisons. I didn't merge it because I didn't want to mix the two proposals, and it looked like it was based off #203 but without a git reference (merge). Would you consider the PR mature enough to merge with sub on a test branch? |
Just a quick update: I tried to measure and improve performance and got some interesting results. The benchmark is https://gitlab.com/notEvil/code_collapse applied to itself, which spends most of the time in Black and the remaining time in nested generators (tree traversal) and LibCST.
As you can see, I wasn't able to significantly improve the performance of sub. Maybe time opt is worth the tradeoff (it creates some bias), what do you think? I will try to figure out why Python 3.12 is performing so badly compared to 3.11 on my project.

Regarding the bad performance when line profiling on 3.12, can you reproduce this? e.g. jsonpickle

Update 2023-11-04: Python 3.12 is a bit slower than 3.11 in general, but the large difference is due to Black not compiling for 3.12 (due to a bug in Mypyc). So the benchmark is flawed insofar as it presents a measure of slowdown that is not representative (most time is spent in compiled code, which doesn't add any overhead). Maybe we could run pyperformance somehow.

Also regarding key opt, it turns out it is not uncommon for short functions to have the exact same bytecode (

* either directly from #203 or inspired by |
Good news I guess:
Average factor (geometric mean, complete case only)
I managed to adjust pyperformance so it uses line_profiler (with autoprofile) and ran it*. See https://github.com/notEvil/line_profiler/tree/sub/pyperformance and https://github.com/notEvil/pyperformance/tree/line_profiler if you want to do the same. Set up the Pipenv environment and run

The column

Update 2023-11-06: Added the opt variants and sorted by average factor. Interestingly, the success of time opt indicates benchmarks where the entry check fails a lot. However, map opt is on par and doesn't add bias. So overall my take is: preshed is not worth it**, time opt might be, and map opt, tss opt and key opt are.

* on Intel 4570S with 3 of them in parallel (1 idle core). Most nan are due to conflicts (e.g. bind to port). |
Hi,
I've been tinkering with line_profiler and implemented a different approach to line profiling:
https://github.com/notEvil/line_profiler/tree/sub
Essentially, it keeps track of "calls" (PyTrace_CALL and PyTrace_RETURN are used for generators and async as well) and by doing so records "total time": time spent within a function call, generator or async step, excluding time spent in line profiled functions that were called. From there it calculates "cumulative time".

This approach has multiple benefits:
Potential downside is increased overhead, but in my testing it wasn't much slower.
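A rough pure-Python sketch of the call/return bookkeeping described above (illustrative only; the actual implementation is in C via the tracing C-API, and it only excludes time spent in other line-profiled functions rather than in every callee):

```python
import sys
import time

class TotalTimeTracker:
    """Accumulates per-function "total time": time inside a call,
    excluding time spent in tracked callees."""

    def __init__(self):
        self.stack = []   # [code, banked_time, resumed_at] per active call
        self.total = {}   # code object -> total time

    def trace(self, frame, event, arg):
        now = time.perf_counter()
        if event == "call":
            if self.stack:
                # Pause the caller: bank what it has run since last resume.
                entry = self.stack[-1]
                entry[1] += now - entry[2]
            self.stack.append([frame.f_code, 0.0, now])
        elif event == "return" and self.stack:
            code, banked, resumed_at = self.stack.pop()
            self.total[code] = self.total.get(code, 0.0) + banked + (now - resumed_at)
            if self.stack:
                self.stack[-1][2] = now   # resume the caller's clock
        return self.trace

# Assumed usage: sys.settrace(TotalTimeTracker().trace)
```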
I've also created a Neovim plugin (not published yet) to see the differences:
Original
New cumulative
New total
Tested on a real application (40s runtime, no async)