[crashtracking] improve poll waiting logic #754

sanchda · 2024-11-22T18:46:24Z

What does this PR do?

The original implementation accidentally had a mutable array with immutable objects, causing the interface to always throw errors. Since this part of the code is in the critical path for handling zombie processes, this condition had an adverse side-effect on customer infrastructure.

This code also used a BorrowedFd, which is supposed to track an OwnedFd. This was problematic in some conditions, since the underlying implementation would use prctl() to check file descriptor liveness and panic in some edge-cases. The code has been ported to libc, using exclusively RawFd, in order to prevent this condition.

Finally, this patch grants some additional time to the act of reaping a PID. When a receiver process exceeds its timeout budget, it's sent a SIGKILL. However, the old behavior was to send SIGKILL, then immediately call waitpid(pid, ..., WNOHANG). On a saturated system (i.e., precisely the kind of system where a timeout might be necessary!), it may take some time for the receiver PID to respond to the SIGKILL, so an immediate non-blocking wait can return before the process has actually exited.

In general, there's no way to provide a bounded guarantee for the duration of this reap operation, so an arbitrary number of scheduler slices is chosen as the maximum reaping wait duration.
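
As an illustration of the kill-then-bounded-reap idea described above, here is a minimal sketch using only the Rust standard library. It is not the code in this PR: `kill_and_reap` and its `budget` parameter are hypothetical names, and `Child::try_wait` stands in for `waitpid(pid, ..., WNOHANG)`. It also sleeps between attempts, which the crash handler itself may not want to do.

```rust
use std::io;
use std::process::Child;
use std::time::{Duration, Instant};

/// Hypothetical helper: kill `child`, then keep trying to reap it until
/// `budget` has elapsed.
///
/// Returns `Ok(true)` if the child was reaped, and `Ok(false)` if it had
/// still not exited when the budget ran out (it may then linger as a zombie
/// until something else waits on it).
fn kill_and_reap(child: &mut Child, budget: Duration) -> io::Result<bool> {
    // On Unix, `Child::kill` sends SIGKILL.
    child.kill()?;

    let deadline = Instant::now() + budget;
    loop {
        // `try_wait` never blocks; it is the std counterpart of a
        // non-blocking waitpid.
        if child.try_wait()?.is_some() {
            return Ok(true);
        }
        if Instant::now() >= deadline {
            return Ok(false);
        }
        // Give the scheduler a chance to deliver the signal before retrying.
        std::thread::sleep(Duration::from_millis(1));
    }
}
```

Calling `kill_and_reap(&mut child, Duration::from_millis(100))` right after a timeout gives the killed process a short, bounded window to exit instead of a single non-blocking attempt.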

Motivation

Fix zombies

pr-commenter · 2024-11-22T18:48:02Z

Benchmarks

Comparison

Benchmark execution time: 2024-11-25 20:51:20

Comparing candidate commit 8074728 in PR branch sanchda/fix_poll_zombies with baseline commit bdbbd73 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.

Candidate

Candidate benchmark details

Group 1

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
two way interface	execution_time	18.279µs	24.451µs ± 14.485µs	18.529µs ± 0.084µs	18.910µs	47.313µs	50.310µs	157.177µs	748.29%	5.095	38.179	59.09%	1.024µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
two way interface	execution_time	[22.444µs; 26.459µs] or [-8.210%; +8.210%]	None	None	None

Group 2

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
sql/obfuscate_sql_string	execution_time	68.939µs	69.078µs ± 0.144µs	69.058µs ± 0.040µs	69.101µs	69.194µs	69.296µs	70.775µs	2.49%	8.559	95.753	0.21%	0.010µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
sql/obfuscate_sql_string	execution_time	[69.058µs; 69.098µs] or [-0.029%; +0.029%]	None	None	None

Group 3

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
credit_card/is_card_number/	execution_time	4.623µs	4.632µs ± 0.004µs	4.632µs ± 0.003µs	4.635µs	4.638µs	4.642µs	4.656µs	0.52%	0.814	4.307	0.09%	0.000µs	1	200
credit_card/is_card_number/	throughput	214760651.790op/s	215875063.962op/s ± 195540.286op/s	215871407.835op/s ± 135440.082op/s	216018713.084op/s	216158066.801op/s	216284750.093op/s	216324363.504op/s	0.21%	-0.798	4.221	0.09%	13826.786op/s	1	200
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	90.788µs	91.770µs ± 0.671µs	91.687µs ± 0.334µs	92.037µs	92.744µs	92.978µs	98.291µs	7.20%	4.792	42.734	0.73%	0.047µs	1	200
credit_card/is_card_number/ 3782-8224-6310-005	throughput	10173846.773op/s	10897360.266op/s ± 77162.762op/s	10906677.627op/s ± 39854.989op/s	10946144.293op/s	10977464.386op/s	11006785.374op/s	11014664.864op/s	0.99%	-4.341	36.931	0.71%	5456.231op/s	1	200
credit_card/is_card_number/ 378282246310005	execution_time	83.848µs	84.077µs ± 0.372µs	83.998µs ± 0.043µs	84.086µs	84.304µs	84.885µs	88.772µs	5.68%	10.331	125.472	0.44%	0.026µs	1	200
credit_card/is_card_number/ 378282246310005	throughput	11264858.016op/s	11894055.091op/s ± 50373.942op/s	11905093.594op/s ± 6034.973op/s	11910120.029op/s	11916260.214op/s	11922160.575op/s	11926305.853op/s	0.18%	-10.058	120.548	0.42%	3561.976op/s	1	200
credit_card/is_card_number/37828224631	execution_time	4.612µs	4.627µs ± 0.006µs	4.627µs ± 0.003µs	4.629µs	4.634µs	4.642µs	4.686µs	1.28%	4.134	34.541	0.14%	0.000µs	1	200
credit_card/is_card_number/37828224631	throughput	213407435.660op/s	216119953.021op/s ± 300385.283op/s	216129973.890op/s ± 128759.917op/s	216267594.440op/s	216502749.974op/s	216687204.414op/s	216833761.719op/s	0.33%	-4.054	33.638	0.14%	21240.447op/s	1	200
credit_card/is_card_number/378282246310005	execution_time	81.037µs	81.207µs ± 0.122µs	81.181µs ± 0.045µs	81.232µs	81.445µs	81.655µs	81.785µs	0.74%	1.998	5.169	0.15%	0.009µs	1	200
credit_card/is_card_number/378282246310005	throughput	12227214.782op/s	12314193.910op/s ± 18501.917op/s	12318138.578op/s ± 6816.619op/s	12324203.377op/s	12335607.574op/s	12338683.247op/s	12340018.870op/s	0.18%	-1.983	5.096	0.15%	1308.283op/s	1	200
credit_card/is_card_number/37828224631000521389798	execution_time	58.994µs	59.196µs ± 0.127µs	59.159µs ± 0.072µs	59.264µs	59.455µs	59.564µs	59.622µs	0.78%	1.027	0.694	0.21%	0.009µs	1	200
credit_card/is_card_number/37828224631000521389798	throughput	16772437.229op/s	16893170.466op/s ± 36260.575op/s	16903686.699op/s ± 20603.765op/s	16919739.838op/s	16937463.637op/s	16946668.856op/s	16950869.560op/s	0.28%	-1.016	0.664	0.21%	2564.010op/s	1	200
credit_card/is_card_number/x371413321323331	execution_time	6.832µs	6.844µs ± 0.004µs	6.843µs ± 0.002µs	6.845µs	6.851µs	6.856µs	6.874µs	0.45%	2.042	11.292	0.06%	0.000µs	1	200
credit_card/is_card_number/x371413321323331	throughput	145474504.101op/s	146123544.609op/s ± 93963.996op/s	146134991.854op/s ± 44864.562op/s	146178873.500op/s	146231430.255op/s	146309969.641op/s	146361085.010op/s	0.15%	-2.024	11.157	0.06%	6644.258op/s	1	200
credit_card/is_card_number_no_luhn/	execution_time	4.617µs	4.631µs ± 0.005µs	4.631µs ± 0.003µs	4.634µs	4.639µs	4.642µs	4.646µs	0.33%	0.155	0.130	0.11%	0.000µs	1	200
credit_card/is_card_number_no_luhn/	throughput	215241746.376op/s	215938487.180op/s ± 230355.301op/s	215942332.601op/s ± 146336.679op/s	216088980.049op/s	216315197.200op/s	216422922.874op/s	216607519.109op/s	0.31%	-0.149	0.127	0.11%	16288.580op/s	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	73.051µs	73.694µs ± 0.165µs	73.700µs ± 0.073µs	73.773µs	73.916µs	74.041µs	74.546µs	1.15%	-0.087	5.195	0.22%	0.012µs	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	13414497.620op/s	13569778.853op/s ± 30436.014op/s	13568543.367op/s ± 13510.832op/s	13582053.357op/s	13620689.594op/s	13675573.636op/s	13689054.142op/s	0.89%	0.135	5.117	0.22%	2152.151op/s	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	64.736µs	65.096µs ± 0.210µs	65.091µs ± 0.155µs	65.245µs	65.494µs	65.598µs	65.618µs	0.81%	0.388	-0.462	0.32%	0.015µs	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	15239687.630op/s	15362065.305op/s ± 49395.660op/s	15363086.348op/s ± 36567.904op/s	15399770.935op/s	15433387.764op/s	15444169.585op/s	15447409.645op/s	0.55%	-0.374	-0.478	0.32%	3492.801op/s	1	200
credit_card/is_card_number_no_luhn/37828224631	execution_time	4.617µs	4.632µs ± 0.004µs	4.632µs ± 0.003µs	4.635µs	4.639µs	4.641µs	4.644µs	0.25%	-0.071	0.145	0.09%	0.000µs	1	200
credit_card/is_card_number_no_luhn/37828224631	throughput	215335732.333op/s	215891683.590op/s ± 204315.941op/s	215873670.744op/s ± 141078.451op/s	216041620.424op/s	216189108.183op/s	216343195.231op/s	216575136.934op/s	0.32%	0.077	0.150	0.09%	14447.319op/s	1	200
credit_card/is_card_number_no_luhn/378282246310005	execution_time	62.738µs	63.531µs ± 0.161µs	63.572µs ± 0.076µs	63.633µs	63.721µs	63.771µs	63.783µs	0.33%	-1.467	2.965	0.25%	0.011µs	1	200
credit_card/is_card_number_no_luhn/378282246310005	throughput	15678104.307op/s	15740410.022op/s ± 40080.747op/s	15730144.787op/s ± 18671.866op/s	15756546.597op/s	15825375.097op/s	15849784.068op/s	15939301.526op/s	1.33%	1.489	3.073	0.25%	2834.137op/s	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	59.026µs	59.244µs ± 0.136µs	59.240µs ± 0.087µs	59.297µs	59.482µs	59.708µs	59.906µs	1.12%	1.288	3.008	0.23%	0.010µs	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	16692932.519op/s	16879402.512op/s ± 38578.081op/s	16880406.931op/s ± 24813.401op/s	16910888.926op/s	16927507.371op/s	16934041.573op/s	16941642.557op/s	0.36%	-1.265	2.898	0.23%	2727.882op/s	1	200
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	6.831µs	6.841µs ± 0.004µs	6.842µs ± 0.002µs	6.844µs	6.847µs	6.850µs	6.856µs	0.20%	-0.268	0.415	0.06%	0.000µs	1	200
credit_card/is_card_number_no_luhn/x371413321323331	throughput	145866709.156op/s	146167371.011op/s ± 89332.511op/s	146161902.902op/s ± 51205.760op/s	146214435.021op/s	146343828.910op/s	146378248.892op/s	146382553.683op/s	0.15%	0.273	0.413	0.06%	6316.762op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
credit_card/is_card_number/	execution_time	[4.632µs; 4.633µs] or [-0.013%; +0.013%]	None	None	None
credit_card/is_card_number/	throughput	[215847963.958op/s; 215902163.965op/s] or [-0.013%; +0.013%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	[91.677µs; 91.863µs] or [-0.101%; +0.101%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	throughput	[10886666.250op/s; 10908054.283op/s] or [-0.098%; +0.098%]	None	None	None
credit_card/is_card_number/ 378282246310005	execution_time	[84.026µs; 84.129µs] or [-0.061%; +0.061%]	None	None	None
credit_card/is_card_number/ 378282246310005	throughput	[11887073.747op/s; 11901036.435op/s] or [-0.059%; +0.059%]	None	None	None
credit_card/is_card_number/37828224631	execution_time	[4.626µs; 4.628µs] or [-0.019%; +0.019%]	None	None	None
credit_card/is_card_number/37828224631	throughput	[216078322.510op/s; 216161583.532op/s] or [-0.019%; +0.019%]	None	None	None
credit_card/is_card_number/378282246310005	execution_time	[81.190µs; 81.224µs] or [-0.021%; +0.021%]	None	None	None
credit_card/is_card_number/378282246310005	throughput	[12311629.722op/s; 12316758.098op/s] or [-0.021%; +0.021%]	None	None	None
credit_card/is_card_number/37828224631000521389798	execution_time	[59.178µs; 59.213µs] or [-0.030%; +0.030%]	None	None	None
credit_card/is_card_number/37828224631000521389798	throughput	[16888145.099op/s; 16898195.833op/s] or [-0.030%; +0.030%]	None	None	None
credit_card/is_card_number/x371413321323331	execution_time	[6.843µs; 6.844µs] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number/x371413321323331	throughput	[146110522.103op/s; 146136567.115op/s] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number_no_luhn/	execution_time	[4.630µs; 4.632µs] or [-0.015%; +0.015%]	None	None	None
credit_card/is_card_number_no_luhn/	throughput	[215906562.150op/s; 215970412.209op/s] or [-0.015%; +0.015%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	[73.671µs; 73.716µs] or [-0.031%; +0.031%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	[13565560.714op/s; 13573996.992op/s] or [-0.031%; +0.031%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	[65.067µs; 65.125µs] or [-0.045%; +0.045%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	[15355219.542op/s; 15368911.069op/s] or [-0.045%; +0.045%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	execution_time	[4.631µs; 4.633µs] or [-0.013%; +0.013%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	throughput	[215863367.365op/s; 215919999.814op/s] or [-0.013%; +0.013%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	execution_time	[63.509µs; 63.553µs] or [-0.035%; +0.035%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	throughput	[15734855.216op/s; 15745964.828op/s] or [-0.035%; +0.035%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	[59.225µs; 59.263µs] or [-0.032%; +0.032%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	[16874055.961op/s; 16884749.063op/s] or [-0.032%; +0.032%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	[6.841µs; 6.842µs] or [-0.008%; +0.008%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	throughput	[146154990.384op/s; 146179751.638op/s] or [-0.008%; +0.008%]	None	None	None

Group 4

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	620.330µs	621.642µs ± 0.762µs	621.553µs ± 0.267µs	621.832µs	622.376µs	626.648µs	627.086µs	0.89%	4.982	32.457	0.12%	0.054µs	1	200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	1594676.587op/s	1608645.270op/s ± 1960.700op/s	1608872.568op/s ± 691.469op/s	1609532.035op/s	1610487.097op/s	1611017.933op/s	1612045.202op/s	0.20%	-4.947	32.143	0.12%	138.642op/s	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	466.243µs	467.103µs ± 0.316µs	467.084µs ± 0.198µs	467.283µs	467.683µs	467.888µs	468.137µs	0.23%	0.410	0.373	0.07%	0.022µs	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	2136126.097op/s	2140857.798op/s ± 1445.963op/s	2140942.770op/s ± 907.296op/s	2141821.983op/s	2142910.293op/s	2143700.746op/s	2144805.771op/s	0.18%	-0.405	0.369	0.07%	102.245op/s	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	191.179µs	191.749µs ± 0.191µs	191.748µs ± 0.128µs	191.859µs	192.030µs	192.167µs	192.759µs	0.53%	0.679	3.182	0.10%	0.014µs	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	5187827.282op/s	5215161.570op/s ± 5198.981op/s	5215189.413op/s ± 3484.594op/s	5218927.834op/s	5222871.372op/s	5226089.271op/s	5230701.428op/s	0.30%	-0.665	3.122	0.10%	367.623op/s	1	200
normalization/normalize_service/normalize_service/[empty string]	execution_time	46.811µs	47.141µs ± 0.115µs	47.138µs ± 0.074µs	47.218µs	47.336µs	47.404µs	47.487µs	0.74%	0.111	0.054	0.24%	0.008µs	1	200
normalization/normalize_service/normalize_service/[empty string]	throughput	21058260.644op/s	21213074.683op/s ± 51754.535op/s	21214387.747op/s ± 33111.851op/s	21245382.637op/s	21293046.651op/s	21333361.560op/s	21362497.053op/s	0.70%	-0.096	0.052	0.24%	3659.598op/s	1	200
normalization/normalize_service/normalize_service/test_ASCII	execution_time	51.468µs	51.666µs ± 0.091µs	51.657µs ± 0.052µs	51.709µs	51.825µs	51.919µs	52.065µs	0.79%	0.988	2.404	0.18%	0.006µs	1	200
normalization/normalize_service/normalize_service/test_ASCII	throughput	19206665.300op/s	19354972.833op/s ± 33992.185op/s	19358595.110op/s ± 19487.497op/s	19376893.807op/s	19398378.247op/s	19427710.394op/s	19429566.222op/s	0.37%	-0.971	2.344	0.18%	2403.610op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	[621.536µs; 621.748µs] or [-0.017%; +0.017%]	None	None	None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	[1608373.536op/s; 1608917.004op/s] or [-0.017%; +0.017%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	[467.059µs; 467.146µs] or [-0.009%; +0.009%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	[2140657.402op/s; 2141058.195op/s] or [-0.009%; +0.009%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	[191.722µs; 191.775µs] or [-0.014%; +0.014%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	[5214441.041op/s; 5215882.098op/s] or [-0.014%; +0.014%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	execution_time	[47.125µs; 47.157µs] or [-0.034%; +0.034%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	throughput	[21205902.002op/s; 21220247.363op/s] or [-0.034%; +0.034%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	execution_time	[51.654µs; 51.679µs] or [-0.024%; +0.024%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	throughput	[19350261.843op/s; 19359683.823op/s] or [-0.024%; +0.024%]	None	None	None

Group 5

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching deserializing traces from msgpack to their internal representation	execution_time	60.344ms	60.707ms ± 0.197ms	60.676ms ± 0.084ms	60.747ms	61.128ms	61.456ms	61.672ms	1.64%	2.101	5.967	0.32%	0.014ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching deserializing traces from msgpack to their internal representation	execution_time	[60.680ms; 60.734ms] or [-0.045%; +0.045%]	None	None	None

Group 6

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
tags/replace_trace_tags	execution_time	2.699µs	2.740µs ± 0.013µs	2.739µs ± 0.007µs	2.747µs	2.769µs	2.773µs	2.777µs	1.38%	0.454	0.930	0.47%	0.001µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
tags/replace_trace_tags	execution_time	[2.738µs; 2.742µs] or [-0.065%; +0.065%]	None	None	None

Group 7

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching string interning on wordpress profile	execution_time	137.209µs	137.870µs ± 0.261µs	137.843µs ± 0.124µs	137.978µs	138.266µs	138.702µs	139.017µs	0.85%	1.006	3.481	0.19%	0.018µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching string interning on wordpress profile	execution_time	[137.834µs; 137.906µs] or [-0.026%; +0.026%]	None	None	None

Group 8

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
redis/obfuscate_redis_string	execution_time	38.113µs	38.906µs ± 1.279µs	38.305µs ± 0.075µs	38.488µs	41.686µs	41.714µs	41.813µs	9.16%	1.686	0.887	3.28%	0.090µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
redis/obfuscate_redis_string	execution_time	[38.728µs; 39.083µs] or [-0.455%; +0.455%]	None	None	None

Group 9

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_trace/test_trace	execution_time	298.278ns	310.104ns ± 13.165ns	305.457ns ± 4.981ns	312.348ns	344.190ns	347.503ns	350.171ns	14.64%	1.636	1.720	4.23%	0.931ns	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_trace/test_trace	execution_time	[308.279ns; 311.928ns] or [-0.588%; +0.588%]	None	None	None

Group 10

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
concentrator/add_spans_to_concentrator	execution_time	9.153ms	9.191ms ± 0.015ms	9.189ms ± 0.010ms	9.199ms	9.215ms	9.226ms	9.276ms	0.94%	0.999	4.156	0.16%	0.001ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
concentrator/add_spans_to_concentrator	execution_time	[9.189ms; 9.193ms] or [-0.023%; +0.023%]	None	None	None

Group 11

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	299.229µs	304.009µs ± 1.645µs	303.959µs ± 1.121µs	305.149µs	306.475µs	307.376µs	307.536µs	1.18%	-0.174	-0.275	0.54%	0.116µs	1	200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	3251652.622op/s	3289474.887op/s ± 17815.405op/s	3289914.391op/s ± 12093.385op/s	3301472.695op/s	3321469.782op/s	3331962.021op/s	3341920.832op/s	1.58%	0.201	-0.255	0.54%	1259.739op/s	1	200
normalization/normalize_name/normalize_name/bad-name	execution_time	28.089µs	28.288µs ± 0.104µs	28.284µs ± 0.054µs	28.336µs	28.444µs	28.731µs	28.893µs	2.15%	1.955	8.413	0.37%	0.007µs	1	200
normalization/normalize_name/normalize_name/bad-name	throughput	34610516.871op/s	35351520.445op/s ± 129003.217op/s	35355533.890op/s ± 67691.691op/s	35433198.978op/s	35520213.863op/s	35568833.429op/s	35601185.930op/s	0.69%	-1.884	7.969	0.36%	9121.905op/s	1	200
normalization/normalize_name/normalize_name/good	execution_time	16.586µs	16.703µs ± 0.053µs	16.705µs ± 0.041µs	16.740µs	16.788µs	16.812µs	16.829µs	0.74%	-0.008	-0.758	0.31%	0.004µs	1	200
normalization/normalize_name/normalize_name/good	throughput	59419486.849op/s	59870324.532op/s ± 188889.357op/s	59862026.274op/s ± 145730.933op/s	60027158.561op/s	60164550.056op/s	60256907.334op/s	60291408.474op/s	0.72%	0.020	-0.761	0.31%	13356.495op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	[303.781µs; 304.237µs] or [-0.075%; +0.075%]	None	None	None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	[3287005.843op/s; 3291943.930op/s] or [-0.075%; +0.075%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	execution_time	[28.273µs; 28.302µs] or [-0.051%; +0.051%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	throughput	[35333641.840op/s; 35369399.050op/s] or [-0.051%; +0.051%]	None	None	None
normalization/normalize_name/normalize_name/good	execution_time	[16.696µs; 16.710µs] or [-0.044%; +0.044%]	None	None	None
normalization/normalize_name/normalize_name/good	throughput	[59844146.284op/s; 59896502.780op/s] or [-0.044%; +0.044%]	None	None	None

Group 12

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`8074728`	1732567225	sanchda/fix_poll_zombies

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
write only interface	execution_time	1.403µs	3.261µs ± 1.445µs	3.105µs ± 0.020µs	3.123µs	3.150µs	14.256µs	15.303µs	392.77%	7.629	58.122	44.21%	0.102µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
write only interface	execution_time	[3.060µs; 3.461µs] or [-6.143%; +6.143%]	None	None	None

Baseline

Omitted due to size.

codecov-commenter · 2024-11-22T19:02:27Z

Codecov Report

Attention: Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.

Project coverage is 70.47%. Comparing base (bdbbd73) to head (8074728).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #754      +/-   ##
==========================================
- Coverage   70.49%   70.47%   -0.02%     
==========================================
  Files         297      297              
  Lines       43401    43411      +10     
==========================================
  Hits        30595    30595              
- Misses      12806    12816      +10

Components	Coverage Δ
crashtracker	`43.48% <0.00%> (-0.14%)`	⬇️
crashtracker-ffi	`8.41% <ø> (ø)`
datadog-alloc	`98.73% <ø> (ø)`
data-pipeline	`89.09% <ø> (ø)`
data-pipeline-ffi	`0.00% <ø> (ø)`
ddcommon	`83.46% <ø> (ø)`
ddcommon-ffi	`69.12% <ø> (ø)`
ddtelemetry	`59.05% <ø> (ø)`
ddtelemetry-ffi	`22.13% <ø> (ø)`
dogstatsd	`89.45% <ø> (ø)`
dogstatsd-client	`79.77% <ø> (ø)`
ipc	`82.76% <ø> (ø)`
profiling	`84.30% <ø> (ø)`
profiling-ffi	`77.46% <ø> (ø)`
serverless	`0.00% <ø> (ø)`
sidecar	`38.01% <ø> (ø)`
sidecar-ffi	`0.00% <ø> (ø)`
spawn-worker	`50.36% <ø> (ø)`
tinybytes	`94.77% <ø> (ø)`
trace-mini-agent	`72.36% <ø> (ø)`
trace-normalization	`98.23% <ø> (ø)`
trace-obfuscation	`95.77% <ø> (ø)`
trace-protobuf	`77.67% <ø> (ø)`
trace-utils	`93.29% <ø> (ø)`

pawelchcki · 2024-11-25T17:43:43Z

crashtracker/src/collector/crash_handler.rs

-            _ => Err(anyhow::anyhow!("poll returned unexpected result")),
-        },
+    let mut poll_fds = [pollfd {
+        fd: target_fd,


+1 - BorrowedFd should preferably be used in conjunction with OwnedFd. borrow_raw, without any guarantees of FD lifetime, is problematic.

Probably the safest option would be to dup the fd and own it within the context of this function.

Otherwise the code looks correct, but it is "C'ish" Rust :)

#758 to track
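
For reference, the "dup the fd and own it" suggestion in this thread could look roughly like the sketch below. It is only an assumption about the shape of such a change, not what #758 implements; `dup_to_owned` is a hypothetical name. It uses `BorrowedFd::try_clone_to_owned` from the standard library, which duplicates the descriptor into a new `OwnedFd`.

```rust
use std::io;
use std::os::fd::{BorrowedFd, OwnedFd, RawFd};

/// Hypothetical helper: duplicate `raw_fd` so the caller holds its own
/// `OwnedFd`, whose lifetime no longer depends on whoever owns `raw_fd`.
fn dup_to_owned(raw_fd: RawFd) -> io::Result<OwnedFd> {
    // `borrow_raw` must not be given -1, so reject negative values up front.
    if raw_fd < 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "negative file descriptor",
        ));
    }
    // SAFETY: this assumes `raw_fd` stays open for the duration of this
    // call; the borrow does not outlive the function.
    let borrowed = unsafe { BorrowedFd::borrow_raw(raw_fd) };
    // Creates a new descriptor referring to the same open file description.
    borrowed.try_clone_to_owned()
}
```

The returned `OwnedFd` closes its duplicate on drop, so the function can poll it without caring when the original descriptor is closed.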

pawelchcki · 2024-11-25T17:49:26Z

crashtracker/src/collector/crash_handler.rs

+        let reaping_allowed_ms = std::cmp::min(
+            timeout_ms.saturating_sub(start_time.elapsed().as_millis() as u32),
+            DD_CRASHTRACK_MINIMUM_REAP_TIME_MS,
+        );

        let _ = reap_child_non_blocking(receiver_pid_as_pid, reaping_allowed_ms);


libdatadog/crashtracker/src/collector/crash_handler.rs

Line 159 in 25079f5

return Err(anyhow::anyhow!("Timeout waiting for child process to exit"));

In sidecar, we send kill and term when the timeout ends.

And it looks like we're not doing that here either way, so a non-zero timeout will only reduce the incidence of zombies, not prevent them.

Don't we do that just above? https://github.com/DataDog/libdatadog/blob/main/crashtracker/src/collector/crash_handler.rs#L483

pawelchcki

Approved, because it's an improvement over the previous code. But it looks like some issues with zombies can still show up from time to time.

danielsn · 2024-11-25T20:46:34Z

crashtracker/src/collector/crash_handler.rs

-            _ => Err(anyhow::anyhow!("poll returned unexpected result")),
-        },
+    let mut poll_fds = [pollfd {
+        fd: target_fd,


#758 to track

danielsn · 2024-11-25T20:51:00Z

crashtracker/src/collector/crash_handler.rs

+        revents: 0,
+    }];
+
+    match unsafe { poll(poll_fds.as_mut_ptr(), 1, timeout_ms) } {


style: this should be .len(), not the constant 1

danielsn · 2024-11-25T20:53:30Z

crashtracker/src/collector/crash_handler.rs

-            revents if revents.contains(PollFlags::POLLHUP) => Ok(true),
-            _ => Err(anyhow::anyhow!("poll returned unexpected result")),
-        },
+    let mut poll_fds = [pollfd {


comment to explain the meaning of the boolean result
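
To make the two style points in this review concrete (passing the array length instead of a literal 1, and documenting the boolean), here is a rough sketch of a poll wrapper. It is not the PR's code: `wait_for_hangup` is a hypothetical name, the FFI declarations are written by hand so the sketch needs no crates (real code would use the `libc` crate's `poll`, `pollfd`, and `nfds_t`), and the meaning given to the boolean is an assumption based on the removed `POLLHUP => Ok(true)` arm in the diff.

```rust
use std::io;
use std::os::fd::RawFd;

// Hand-written stand-in for `libc::pollfd`.
#[repr(C)]
struct PollFd {
    fd: i32,
    events: i16,
    revents: i16,
}

// `nfds_t` is `unsigned long` on Linux and `unsigned int` on most other Unixes.
#[cfg(target_os = "linux")]
type NfdsT = std::os::raw::c_ulong;
#[cfg(not(target_os = "linux"))]
type NfdsT = std::os::raw::c_uint;

extern "C" {
    fn poll(fds: *mut PollFd, nfds: NfdsT, timeout: i32) -> i32;
}

// Same value on Linux and macOS.
const POLLHUP: i16 = 0x010;

/// Waits up to `timeout_ms` for the peer of `target_fd` to hang up.
///
/// * `Ok(true)`: POLLHUP was reported, i.e. the peer closed its end.
/// * `Ok(false)`: the timeout expired without a hangup.
/// * `Err(_)`: poll itself failed (including EINTR) or reported some
///   other condition on the descriptor.
fn wait_for_hangup(target_fd: RawFd, timeout_ms: i32) -> io::Result<bool> {
    // POLLHUP is always reported, so no events need to be requested.
    let mut poll_fds = [PollFd { fd: target_fd, events: 0, revents: 0 }];
    // Derive the count from the array rather than hard-coding 1.
    let nfds = poll_fds.len() as NfdsT;
    let ret = unsafe { poll(poll_fds.as_mut_ptr(), nfds, timeout_ms) };
    match ret {
        -1 => Err(io::Error::last_os_error()),
        0 => Ok(false),
        _ => {
            let revents = poll_fds[0].revents;
            if revents & POLLHUP != 0 {
                Ok(true)
            } else {
                Err(io::Error::new(
                    io::ErrorKind::Other,
                    format!("poll reported unexpected revents: {:#x}", revents),
                ))
            }
        }
    }
}
```

Deriving `nfds` from `poll_fds.len()` keeps the call correct if the array ever grows, and the doc comment spells out what each boolean value means to callers.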

danielsn · 2024-11-25T21:03:27Z

crashtracker/src/collector/crash_handler.rs

+        let reaping_allowed_ms = std::cmp::min(
+            timeout_ms.saturating_sub(start_time.elapsed().as_millis() as u32),
+            DD_CRASHTRACK_MINIMUM_REAP_TIME_MS,
+        );

        let _ = reap_child_non_blocking(receiver_pid_as_pid, reaping_allowed_ms);


Don't we do that just above? https://github.com/DataDog/libdatadog/blob/main/crashtracker/src/collector/crash_handler.rs#L483

Fix const, port to libc

f569525

sanchda requested a review from a team as a code owner November 22, 2024 18:46

sanchda requested a review from danielsn November 22, 2024 18:48

Also give some extra time for reaping

d17c01f

sanchda enabled auto-merge (squash) November 22, 2024 20:27

pawelchcki reviewed Nov 25, 2024

pawelchcki approved these changes Nov 25, 2024

Merge branch 'main' into sanchda/fix_poll_zombies

8074728

danielsn mentioned this pull request Nov 25, 2024

[crashtracker]: Dup fd to ensure proper ownership and lifetimes #758

Open

sanchda merged commit 6fe032f into main Nov 25, 2024
32 checks passed

sanchda deleted the sanchda/fix_poll_zombies branch November 25, 2024 21:04

danielsn reviewed Nov 25, 2024

This was referenced Nov 25, 2024

[crashtracker] Small style improvements to 754 #759

Merged

Bump version to 14.3.0 in preperation for release #760

Merged

rochdev mentioned this pull request Dec 17, 2024

update libdatadog to 14.3.1 DataDog/libdatadog-nodejs#42

Merged
