-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more siginfo_t context #726
base: main
Are you sure you want to change the base?
Conversation
BenchmarksComparisonBenchmark execution time: 2024-11-12 16:14:07 Comparing candidate commit dac1014 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics. CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
BaselineOmitted due to size. |
… into sanchda/add_faultinfo
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #726 +/- ##
==========================================
+ Coverage 71.16% 71.20% +0.04%
==========================================
Files 287 288 +1
Lines 42739 42835 +96
==========================================
+ Hits 30416 30502 +86
- Misses 12323 12333 +10
|
"faulting_address": 0, | ||
"pid": 0, | ||
"si_addr": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be in hex
// developers who are querying for specific faults. However, we should probably decide how | ||
// this gets canonized and which component is responsible. | ||
// For now, double-ship the address. | ||
if siginfo.si_signo == libc::SIGSEGV { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not SIGBUS
?
} | ||
} | ||
|
||
// These are defined in siginfo.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these stable across OS/Architecture?
#[serde(skip_serializing_if = "Option::is_none")] | ||
#[serde(default)] | ||
pub signame: Option<String>, | ||
pub signame: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ we normally keep fields alphabetical unless there is a reason not to
@@ -509,8 +501,7 @@ fn handle_posix_signal_impl(signum: i32, sig_info: *mut siginfo_t) -> anyhow::Re | |||
config, | |||
config_str, | |||
metadata_string, | |||
signum, | |||
faulting_address, | |||
unsafe { *siginfo }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling by value not reference?
"faulting_address": 0, | ||
"pid": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only collect the fields that are relevent for the given signal https://man7.org/linux/man-pages/man2/sigaction.2.html
This should be synchronized with RFC5. |
What does this PR do?
This adds some more siginfo_t context to crashes. In addition to the SIGBUS codes, I added some logic to make sense of things like SIGSYS, since I would like to add support for that soon.
One possible point of contention that I was unable to resolve: currently, we emit a
faulting_address
, which is a representation of thesi_addr
from thesiginfo_t
only when the fault is a segfault. This makes a lot of sense in the current product, which does no special normalization in the telemetry backend. However, from a data transmission perspective, I think it makes more sense to represent the struct-as-it-is-written and defer to the backend for normalization. Moreover,si_addr
has context outside ofSIGSEGV
.This decision duplicates an address for now. Oh well.
Motivation
While exploring the behavior of crashtracking for system tests, and as part of my experiments in other branches, I started relying on
strace
's comprehension forsiginfo_t
. We also had a recent bug that could really only have been comprehended with content fromsiginfo_t
(stack overflow by hitting the crashtracking guard page).I think this information is really important.