Evaluate python build sources #49

Open
ankush opened this issue Jan 1, 2025 · 3 comments

ankush commented Jan 1, 2025

We currently use the deadsnakes builds, which might not be optimal: they are built without LTO or PGO, both of which people have reported to help quite a bit with performance.


ankush commented Jan 1, 2025

I have no idea how to interpret this, what I did wrong, or whether it's just compilers defying expectations 😄

Comparison:

| Source | deadsnakes | uv | build from src |
|---|---|---|---|
| version | 3.12.8 | 3.12.8 | 3.12.8 |
| compiler | GCC 11.4.0 | Clang 18.1.8 | GCC 11.4.0 |
| PGO | No | Yes | Yes |
| LTO | No | Yes | Yes |

Compilation flags:

  • deadsnakes: '--enable-shared' '--prefix=/usr' '--libdir=/usr/lib/x86_64-linux-gnu' '--enable-ipv6' '--enable-loadable-sqlite-extensions' '--with-dbmliborder=bdb:gdbm' '--with-computed-gotos' '--without-ensurepip' '--with-system-expat' 'MKDIR_P=/bin/mkdir -p' 'CC=x86_64-linux-gnu-gcc' 'CFLAGS=-g -fstack-protector-strong -Wformat -Werror=format-security ' 'LDFLAGS=-Wl,-Bsymbolic-functions -g -fwrapv -O2 ' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
  • uv: '--build=x86_64-unknown-linux-gnu' '--host=x86_64-unknown-linux-gnu' '--prefix=/install' '--with-openssl=/tools/deps' '--with-system-expat' '--with-system-libmpdec' '--without-ensurepip' '--with-readline=editline' '--enable-shared' '--enable-optimizations' '--enable-bolt' '--with-lto' '--with-build-python=/tools/host/bin/python3.12' '--with-dbmliborder=bdb' 'build_alias=x86_64-unknown-linux-gnu' 'host_alias=x86_64-unknown-linux-gnu' 'CC=clang' 'CFLAGS= -fPIC ' 'LDFLAGS= -Wl,--exclude-libs,ALL -LModules/_hacl' 'CPPFLAGS= -fPIC '
  • build from src: '--prefix=/home/ankush/.pyenv/versions/3.12.8' '--enable-shared' '--libdir=/home/ankush/.pyenv/versions/3.12.8/lib' '--enable-optimizations' '--with-lto' 'CFLAGS=-march=native -mtune=native' 'LDFLAGS=-L/home/ankush/.pyenv/versions/3.12.8/lib -Wl,-rpath,/home/ankush/.pyenv/versions/3.12.8/lib' 'LIBS=-L/home/ankush/.pyenv/versions/3.12.8/lib -Wl,-rpath,/home/ankush/.pyenv/versions/3.12.8/lib' 'CPPFLAGS=-I/home/ankush/.pyenv/versions/3.12.8/include'

Python CFLAGS:

  • deadsnakes: -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2
  • uv: -fno-strict-overflow -Wsign-compare -Wunreachable-code -DNDEBUG -g -O3 -Wall -fPIC
  • build from src: -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -march=native -mtune=native

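
The comment does not say how the build details above were collected. One plausible way, sketched here as an assumption rather than the method actually used, is to ask each interpreter for its recorded configure arguments through the standard `sysconfig` and `platform` modules. The `describe_build` helper and its PGO/LTO substring checks are illustrative only; `CONFIG_ARGS` and `PY_CFLAGS` are recorded on POSIX builds and may be absent elsewhere.

```python
import platform
import sysconfig


def describe_build() -> dict:
    """Summarise how the running interpreter was configured.

    CONFIG_ARGS and PY_CFLAGS come from the Makefile recorded at build
    time; on platforms without one they are None, so fall back to "".
    """
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    py_cflags = sysconfig.get_config_var("PY_CFLAGS") or ""
    return {
        "version": platform.python_version(),
        "compiler": platform.python_compiler(),
        "config_args": config_args,
        "py_cflags": py_cflags,
        # Heuristic (an assumption of this sketch): these configure
        # switches are the ones that request PGO and LTO.
        "pgo": "--enable-optimizations" in config_args,
        "lto": "--with-lto" in config_args,
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key}: {value}")
```

Running the same script under each of the three interpreters gives one column of the comparison per run.
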
Benchmark deadsnakes uv native
database_delete_value_simple 202 us 235 us: 1.16x slower 243 us: 1.20x slower
database_empty_transaction_cycling 235 us 254 us: 1.08x slower 271 us: 1.15x slower
database_get_cached_value_simple 118 us 166 us: 1.41x slower 169 us: 1.44x slower
database_get_single_value 665 us 763 us: 1.15x slower 799 us: 1.20x slower
database_get_value_simple 2.88 ms 3.27 ms: 1.14x slower 3.42 ms: 1.19x slower
database_get_value_with_dict_filters 311 us 353 us: 1.13x slower 366 us: 1.18x slower
database_get_value_with_list_filters 448 us 513 us: 1.15x slower 535 us: 1.20x slower
database_select_star 1.38 ms 1.60 ms: 1.17x slower 1.67 ms: 1.21x slower
database_set_value_simple 418 us 466 us: 1.12x slower 481 us: 1.15x slower
database_sql_select_many_rows 38.1 ms 48.5 ms: 1.27x slower 50.3 ms: 1.32x slower
orm_doc_to_dict 55.3 us 66.2 us: 1.20x slower 74.3 us: 1.34x slower
orm_get_all 36.4 us 50.5 us: 1.39x slower 50.9 us: 1.40x slower
orm_get_all_with_filters 241 us 304 us: 1.26x slower 320 us: 1.33x slower
orm_get_all_with_many_fields 243 us 310 us: 1.27x slower 324 us: 1.33x slower
orm_get_cached_doc 705 us 769 us: 1.09x slower 804 us: 1.14x slower
orm_get_doc 3.43 ms 3.96 ms: 1.15x slower 4.05 ms: 1.18x slower
orm_get_list 94.8 us 128 us: 1.35x slower 131 us: 1.38x slower
orm_get_local_cached_doc 15.5 us 22.1 us: 1.43x slower 23.2 us: 1.50x slower
orm_get_user 6.40 ms 7.34 ms: 1.15x slower 7.53 ms: 1.18x slower
orm_new_doc 46.0 us 60.5 us: 1.32x slower 63.9 us: 1.39x slower
orm_save_doc 1.50 ms 1.71 ms: 1.14x slower 1.74 ms: 1.16x slower
qb_qb_get_query 98.1 us 125 us: 1.28x slower 135 us: 1.38x slower
qb_qb_get_query_multiple_fields 134 us 171 us: 1.27x slower 184 us: 1.37x slower
qb_qb_select_multiple_fields 46.0 us 61.1 us: 1.33x slower 67.3 us: 1.46x slower
qb_qb_select_star 31.7 us 43.2 us: 1.36x slower 47.4 us: 1.50x slower
qb_qb_simple_get_query 98.6 us 127 us: 1.29x slower 136 us: 1.38x slower
redis_make_key 708 ns 1.15 us: 1.62x slower 1.19 us: 1.67x slower
redis_redis_get_local_value 15.5 us 22.1 us: 1.43x slower 23.3 us: 1.50x slower
redis_redis_get_set_delete_cycle 39.0 ms 42.7 ms: 1.10x slower 44.9 ms: 1.15x slower
utils_cint_on_string 122 ns 210 ns: 1.72x slower 187 ns: 1.53x slower
utils_flt_explicit_rounding 1.74 us 2.54 us: 1.46x slower 2.70 us: 1.55x slower
utils_flt_no_rounding 94.4 ns 127 ns: 1.35x slower 139 ns: 1.47x slower
utils_flt_str 2.53 us 3.73 us: 1.47x slower 3.90 us: 1.54x slower
utils_flt_typical 2.21 us 3.32 us: 1.51x slower 3.41 us: 1.55x slower
utils_frappe_dict_getattr 42.8 ns 86.4 ns: 2.02x slower 98.4 ns: 2.30x slower
utils_frappe_dict_setattr 75.9 ns 124 ns: 1.63x slower 139 ns: 1.83x slower
utils_get_system_settings 403 ns 557 ns: 1.38x slower 612 ns: 1.52x slower
utils_no_translation_required 2.80 us 3.85 us: 1.37x slower 4.11 us: 1.47x slower
utils_parse_datetime 736 ns 1.10 us: 1.49x slower 1.17 us: 1.58x slower
utils_redis_cache_deco_with_local_cache 156 us 229 us: 1.47x slower 242 us: 1.55x slower
utils_redis_cache_deco_without_local_cache 5.56 ms 6.02 ms: 1.08x slower 6.25 ms: 1.12x slower
utils_request_cache_many_args 378 ns 541 ns: 1.43x slower 585 ns: 1.55x slower
utils_site_cache_many_args 544 ns 760 ns: 1.40x slower 816 ns: 1.50x slower
utils_site_cache_no_arg 546 ns 762 ns: 1.40x slower 819 ns: 1.50x slower
utils_site_cache_with_ttl 686 ns 919 ns: 1.34x slower 999 ns: 1.46x slower
utils_unknown_translations 2.81 us 3.84 us: 1.37x slower 4.11 us: 1.46x slower
utils_valid_translation 2.81 us 3.86 us: 1.37x slower 4.10 us: 1.46x slower
web_requests_desk_page_render 29.6 ms 35.2 ms: 1.19x slower 36.1 ms: 1.22x slower
web_requests_list_view_count_query 4.32 ms 4.80 ms: 1.11x slower 4.95 ms: 1.15x slower
web_requests_list_view_query 5.16 ms 5.86 ms: 1.14x slower 6.01 ms: 1.16x slower
web_requests_login_page_render 80.7 ms 100.0 ms: 1.24x slower 104 ms: 1.28x slower
web_requests_request_authed_overheads 3.15 ms 3.51 ms: 1.12x slower 3.60 ms: 1.15x slower
web_requests_request_getdoc 17.4 ms 19.4 ms: 1.12x slower 20.1 ms: 1.16x slower
web_requests_request_overheads 2.90 ms 3.25 ms: 1.12x slower 3.34 ms: 1.15x slower
web_requests_request_socketio_auth 3.13 ms 3.48 ms: 1.11x slower 3.59 ms: 1.15x slower
web_requests_request_socketio_perm_check 4.13 ms 4.64 ms: 1.12x slower 4.70 ms: 1.14x slower
Geometric mean (ref) 1.29x slower 1.35x slower
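
To make the summary row easier to read: assuming it is the geometric mean of the per-benchmark time ratios against the reference column, it can be sketched as follows. The rows are a small sample copied from the table (converted to nanoseconds), so the result is not expected to match the 1.29x reported for the full set. `SAMPLE_NS` and `slowdown_geomean` are names made up for this illustration.

```python
from statistics import geometric_mean

# (benchmark, deadsnakes time, uv time) in nanoseconds, copied from a few
# rows of the table. Only a subset, so this will not reproduce the
# full-table summary value.
SAMPLE_NS = [
    ("redis_make_key", 708, 1150),
    ("utils_cint_on_string", 122, 210),
    ("utils_frappe_dict_getattr", 42.8, 86.4),
]


def slowdown_geomean(rows):
    """Geometric mean of candidate/reference time ratios (>1 means slower)."""
    ratios = [candidate / reference for _, reference, candidate in rows]
    return geometric_mean(ratios)


if __name__ == "__main__":
    print(f"{slowdown_geomean(SAMPLE_NS):.2f}x slower on the sampled rows")
```

A geometric mean is used instead of an arithmetic one so that a 2x slowdown and a 2x speedup cancel out to 1.0.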


ankush commented Jan 1, 2025

TODO:

  • evaluate all 3 on official pyperformance test suite
  • "reproduce" deadsnakes build
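
A rough sketch of how the first TODO item could be scripted. It only builds and prints the command lines; nothing is executed. The interpreter paths are assumptions: the deadsnakes and pyenv paths are guessed from the `--prefix` values in the comparison, and the uv path is a placeholder. `pyperformance run` and `pyperf compare_to --table` are the real subcommands; the helper names are made up.

```python
import shlex

# Assumed interpreter locations (not stated in the issue); adjust as needed.
INTERPRETERS = {
    "deadsnakes": "/usr/bin/python3.12",
    "uv": "/path/to/uv/python3.12",  # placeholder
    "native": "/home/ankush/.pyenv/versions/3.12.8/bin/python3.12",
}


def run_command(name: str, python: str) -> list[str]:
    """argv for one pyperformance run, writing results to <name>.json."""
    return ["pyperformance", "run", f"--python={python}", "-o", f"{name}.json"]


def compare_command(reference: str, *others: str) -> list[str]:
    """argv for a pyperf table comparing result files against a reference."""
    files = [f"{name}.json" for name in (reference, *others)]
    return ["python", "-m", "pyperf", "compare_to", *files, "--table"]


if __name__ == "__main__":
    for name, python in INTERPRETERS.items():
        print(shlex.join(run_command(name, python)))
    print(shlex.join(compare_command("deadsnakes", "uv", "native")))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run` once the paths are confirmed.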


ankush commented Jan 4, 2025

deadsnakes.json

Performance version: 1.11.0
Report on Linux-6.8.0-51-generic-x86_64-with-glibc2.39
Number of logical CPUs: 8
Start date: 2025-01-04 13:26:43.127026
End date: 2025-01-04 14:16:55.409136

uv.json

Performance version: 1.11.0
Report on Linux-6.8.0-51-generic-x86_64-with-glibc2.39
Number of logical CPUs: 8
Start date: 2025-01-04 15:32:22.626617
End date: 2025-01-04 16:26:02.399614


Benchmarks with tag 'apps':

Benchmark deadsnakes uv
chameleon 7.61 ms 7.09 ms: 1.07x faster
docutils 2.21 sec 2.33 sec: 1.05x slower
html5lib 55.7 ms 51.2 ms: 1.09x faster
Geometric mean (ref) 1.02x faster

Benchmark hidden because not significant (2): 2to3, tornado_http

Benchmarks with tag 'asyncio':

Benchmark deadsnakes uv
async_tree_none 363 ms 372 ms: 1.02x slower
async_tree_cpu_io_mixed 565 ms 601 ms: 1.06x slower
async_tree_cpu_io_mixed_tg 557 ms 609 ms: 1.09x slower
async_tree_eager 118 ms 121 ms: 1.03x slower
async_tree_eager_cpu_io_mixed 385 ms 417 ms: 1.08x slower
async_tree_eager_cpu_io_mixed_tg 341 ms 374 ms: 1.10x slower
async_tree_eager_io 885 ms 907 ms: 1.02x slower
async_tree_eager_io_tg 889 ms 909 ms: 1.02x slower
async_tree_eager_memoization 243 ms 245 ms: 1.01x slower
async_tree_eager_memoization_tg 202 ms 197 ms: 1.02x faster
async_tree_eager_tg 82.6 ms 81.6 ms: 1.01x faster
async_tree_io 828 ms 851 ms: 1.03x slower
async_tree_io_tg 839 ms 863 ms: 1.03x slower
async_tree_memoization_tg 446 ms 454 ms: 1.02x slower
async_tree_none_tg 332 ms 339 ms: 1.02x slower
Geometric mean (ref) 1.03x slower

Benchmark hidden because not significant (1): async_tree_memoization

Benchmarks with tag 'math':

Benchmark deadsnakes uv
float 84.3 ms 93.0 ms: 1.10x slower
nbody 117 ms 120 ms: 1.02x slower
pidigits 192 ms 214 ms: 1.12x slower
Geometric mean (ref) 1.08x slower

Benchmarks with tag 'regex':

Benchmark deadsnakes uv
regex_compile 114 ms 111 ms: 1.03x faster
regex_dna 207 ms 179 ms: 1.16x faster
regex_effbot 3.00 ms 3.15 ms: 1.05x slower
regex_v8 24.4 ms 24.2 ms: 1.01x faster
Geometric mean (ref) 1.04x faster

Benchmarks with tag 'serialize':

Benchmark deadsnakes uv
json_dumps 9.98 ms 11.6 ms: 1.16x slower
json_loads 21.6 us 25.6 us: 1.18x slower
pickle 10.3 us 10.6 us: 1.03x slower
pickle_dict 24.4 us 19.5 us: 1.25x faster
pickle_list 3.67 us 3.28 us: 1.12x faster
pickle_pure_python 293 us 282 us: 1.04x faster
tomli_loads 2.42 sec 2.12 sec: 1.14x faster
unpickle 13.5 us 15.4 us: 1.14x slower
unpickle_list 3.93 us 4.47 us: 1.14x slower
unpickle_pure_python 229 us 211 us: 1.09x faster
xml_etree_parse 132 ms 274 ms: 2.08x slower
xml_etree_iterparse 91.5 ms 146 ms: 1.60x slower
xml_etree_generate 84.5 ms 85.5 ms: 1.01x slower
xml_etree_process 60.8 ms 62.4 ms: 1.03x slower
Geometric mean (ref) 1.09x slower

Benchmarks with tag 'startup':

Benchmark deadsnakes uv
python_startup 11.5 ms 13.8 ms: 1.20x slower
python_startup_no_site 7.90 ms 9.97 ms: 1.26x slower
Geometric mean (ref) 1.23x slower

Benchmarks with tag 'template':

Benchmark deadsnakes uv
django_template 36.9 ms 36.7 ms: 1.01x faster
genshi_text 24.7 ms 24.1 ms: 1.02x faster
genshi_xml 54.7 ms 52.6 ms: 1.04x faster
mako 12.5 ms 11.9 ms: 1.05x faster
Geometric mean (ref) 1.03x faster

All benchmarks:

Benchmark deadsnakes uv
async_generators 386 ms 475 ms: 1.23x slower
async_tree_none 363 ms 372 ms: 1.02x slower
async_tree_cpu_io_mixed 565 ms 601 ms: 1.06x slower
async_tree_cpu_io_mixed_tg 557 ms 609 ms: 1.09x slower
async_tree_eager 118 ms 121 ms: 1.03x slower
async_tree_eager_cpu_io_mixed 385 ms 417 ms: 1.08x slower
async_tree_eager_cpu_io_mixed_tg 341 ms 374 ms: 1.10x slower
async_tree_eager_io 885 ms 907 ms: 1.02x slower
async_tree_eager_io_tg 889 ms 909 ms: 1.02x slower
async_tree_eager_memoization 243 ms 245 ms: 1.01x slower
async_tree_eager_memoization_tg 202 ms 197 ms: 1.02x faster
async_tree_eager_tg 82.6 ms 81.6 ms: 1.01x faster
async_tree_io 828 ms 851 ms: 1.03x slower
async_tree_io_tg 839 ms 863 ms: 1.03x slower
async_tree_memoization_tg 446 ms 454 ms: 1.02x slower
async_tree_none_tg 332 ms 339 ms: 1.02x slower
asyncio_tcp 352 ms 355 ms: 1.01x slower
asyncio_tcp_ssl 1.06 sec 1.41 sec: 1.33x slower
asyncio_websockets 580 ms 1.43 sec: 2.47x slower
chameleon 7.61 ms 7.09 ms: 1.07x faster
chaos 64.7 ms 63.9 ms: 1.01x faster
comprehensions 16.8 us 16.2 us: 1.03x faster
bench_thread_pool 934 us 938 us: 1.00x slower
coroutines 26.0 ms 25.6 ms: 1.02x faster
coverage 67.7 ms 75.7 ms: 1.12x slower
crypto_pyaes 70.2 ms 74.0 ms: 1.05x slower
deepcopy 365 us 358 us: 1.02x faster
deepcopy_memo 40.1 us 37.8 us: 1.06x faster
deltablue 3.53 ms 3.17 ms: 1.11x faster
django_template 36.9 ms 36.7 ms: 1.01x faster
docutils 2.21 sec 2.33 sec: 1.05x slower
dulwich_log 46.1 ms 52.8 ms: 1.15x slower
fannkuch 407 ms 380 ms: 1.07x faster
float 84.3 ms 93.0 ms: 1.10x slower
create_gc_cycles 1.13 ms 1.35 ms: 1.19x slower
gc_traversal 3.21 ms 5.08 ms: 1.58x slower
generators 34.0 ms 36.5 ms: 1.07x slower
genshi_text 24.7 ms 24.1 ms: 1.02x faster
genshi_xml 54.7 ms 52.6 ms: 1.04x faster
go 159 ms 136 ms: 1.17x faster
hexiom 6.39 ms 6.09 ms: 1.05x faster
html5lib 55.7 ms 51.2 ms: 1.09x faster
json_dumps 9.98 ms 11.6 ms: 1.16x slower
json_loads 21.6 us 25.6 us: 1.18x slower
logging_format 6.91 us 6.80 us: 1.02x faster
logging_silent 105 ns 88.6 ns: 1.19x faster
logging_simple 6.33 us 6.10 us: 1.04x faster
mako 12.5 ms 11.9 ms: 1.05x faster
mdp 2.47 sec 2.80 sec: 1.13x slower
meteor_contest 105 ms 101 ms: 1.04x faster
nbody 117 ms 120 ms: 1.02x slower
nqueens 87.6 ms 90.1 ms: 1.03x slower
pathlib 23.1 ms 23.8 ms: 1.03x slower
pickle 10.3 us 10.6 us: 1.03x slower
pickle_dict 24.4 us 19.5 us: 1.25x faster
pickle_list 3.67 us 3.28 us: 1.12x faster
pickle_pure_python 293 us 282 us: 1.04x faster
pidigits 192 ms 214 ms: 1.12x slower
pprint_safe_repr 747 ms 813 ms: 1.09x slower
pprint_pformat 1.55 sec 1.66 sec: 1.07x slower
pyflate 494 ms 457 ms: 1.08x faster
python_startup 11.5 ms 13.8 ms: 1.20x slower
python_startup_no_site 7.90 ms 9.97 ms: 1.26x slower
raytrace 267 ms 268 ms: 1.01x slower
regex_compile 114 ms 111 ms: 1.03x faster
regex_dna 207 ms 179 ms: 1.16x faster
regex_effbot 3.00 ms 3.15 ms: 1.05x slower
regex_v8 24.4 ms 24.2 ms: 1.01x faster
richards 50.4 ms 43.8 ms: 1.15x faster
richards_super 56.7 ms 48.1 ms: 1.18x faster
scimark_fft 346 ms 324 ms: 1.07x faster
scimark_lu 116 ms 104 ms: 1.11x faster
scimark_monte_carlo 80.2 ms 62.4 ms: 1.29x faster
scimark_sor 152 ms 126 ms: 1.21x faster
scimark_sparse_mat_mult 4.79 ms 4.71 ms: 1.02x faster
spectral_norm 117 ms 125 ms: 1.07x slower
sqlglot_normalize 106 ms 113 ms: 1.07x slower
sqlglot_optimize 51.1 ms 55.3 ms: 1.08x slower
sqlglot_parse 1.26 ms 1.21 ms: 1.04x faster
sqlglot_transpile 1.56 ms 1.49 ms: 1.04x faster
sqlite_synth 2.34 us 3.46 us: 1.48x slower
sympy_expand 372 ms 398 ms: 1.07x slower
sympy_integrate 18.6 ms 18.8 ms: 1.01x slower
sympy_sum 116 ms 122 ms: 1.05x slower
sympy_str 217 ms 230 ms: 1.06x slower
telco 7.64 ms 8.88 ms: 1.16x slower
tomli_loads 2.42 sec 2.12 sec: 1.14x faster
typing_runtime_protocols 158 us 160 us: 1.01x slower
unpack_sequence 53.6 ns 49.8 ns: 1.08x faster
unpickle 13.5 us 15.4 us: 1.14x slower
unpickle_list 3.93 us 4.47 us: 1.14x slower
unpickle_pure_python 229 us 211 us: 1.09x faster
xml_etree_parse 132 ms 274 ms: 2.08x slower
xml_etree_iterparse 91.5 ms 146 ms: 1.60x slower
xml_etree_generate 84.5 ms 85.5 ms: 1.01x slower
xml_etree_process 60.8 ms 62.4 ms: 1.03x slower
Geometric mean (ref) 1.04x slower

Benchmark hidden because not significant (6): 2to3, async_tree_memoization, bench_mp_pool, dask, deepcopy_reduce, tornado_http
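
With this many rows, the outliers are easier to spot when sorted. Below is a minimal sketch that parses rows in the two-column layout used here (`name  reference-time  candidate-time: ratio direction`) and ranks them by slowdown. The regular expression, helper names, and sample rows are illustrative assumptions about this text layout, not part of pyperf.

```python
import re

# One row: "<name> <ref value> <ref unit> <cand value> <cand unit>: <ratio>x slower|faster"
ROW = re.compile(
    r"^(?P<name>\S+) \S+ \S+ \S+ \S+: "
    r"(?P<ratio>\d+\.\d+)x (?P<direction>slower|faster)$"
)


def parse_row(line):
    """Return (name, ratio), or None for lines that are not benchmark rows.

    The ratio is normalised so that >1 means the candidate is slower and
    <1 means it is faster than the reference.
    """
    match = ROW.match(line.strip())
    if match is None:
        return None
    ratio = float(match["ratio"])
    if match["direction"] == "faster":
        ratio = 1 / ratio
    return match["name"], ratio


def worst_regressions(lines, limit=5):
    """The `limit` rows with the largest slowdown, worst first."""
    rows = [row for row in map(parse_row, lines) if row is not None]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


# A few rows copied from the table, plus a summary line that is skipped.
SAMPLE = """\
asyncio_websockets 580 ms 1.43 sec: 2.47x slower
xml_etree_parse 132 ms 274 ms: 2.08x slower
pickle_dict 24.4 us 19.5 us: 1.25x faster
Geometric mean (ref) 1.04x slower
""".splitlines()

if __name__ == "__main__":
    for name, ratio in worst_regressions(SAMPLE):
        print(f"{name}: {ratio:.2f}")
```

On the sample rows this ranks asyncio_websockets first and xml_etree_parse second, and ignores the summary line.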
