[FR] - Log reason when node switches tip to a fork #6014
Comments
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days. |
Please stay as an open feature request. |
Yes, please: if this is considered impractical to add I would like to hear why. Our SPO operation believes this is important for quality assurance purposes. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days. |
@carbolymer what kind of engagement do the developers expect for this issue not to be This log info has been long & vitally requested for diagnostic purposes and the need for it is never going to go away. Those who need it most would be small stake pools with infrequent blocks... and therefore needed more in the near future if & when the If this issue keeps being dismissed, the implication is that QA isn't very important for stake pools and/or chain density isn't that important for Cardano itself. I don't think either of these things are true and I can state from SPO experience dating back to the beginning of the Shelley period that implementing this feature (which we arguably should have had from day 1) would be helpful for SPOs and node infrastructure operators at all levels of endeavour. |
There was a slack discussion about this issue some time ago. |
The trace messages already contain enough detail to deduce the reason why a node changes its chain. Concretely, whenever we change our selection, we trace the so-called " In Conway, the chain order works like this, via a lexicographic combination of the following criteria, when deciding whether to switch from the current chain to a candidate chain:
By comparing the old and new Let's look at an example:
{
  "at": "2024-12-13T13:50:58.295305594Z",
  "ns": "ChainDB.AddBlockEvent.AddedToCurrentChain",
  "data": {
    "kind": "AddedToCurrentChain",
    "newTipSelectView": {
      "chainLength": 11214658,
      "issueNo": 16,
      "issuerHash": "8e0bb9b126acd8e65a023602377d9a7a6f15af394d1f7bdd852182b8",
      "kind": "PraosChainSelectView",
      "slotNo": 142531567,
      "tieBreakVRF": "f8f31aabb555124392485cbd7ba02bded4e4eac8980fc36fde0c9a21d7cf2d51c6fd4ddda8113153639db862781dafacbdd77da4e072452a330c32509506762e"
    },
    "newtip": "17d5b0f21e86503b41851640e9558610ebe9b7a62c96a08873eb4d0c7961a460@142531567",
    "oldTipSelectView": {
      "chainLength": 11214657,
      "issueNo": 20,
      "issuerHash": "c63dab6d780a74cbae2a27696c9723f55b3092b2bd001256df03827f",
      "kind": "PraosChainSelectView",
      "slotNo": 142531554,
      "tieBreakVRF": "905ff37db665e86d60d93dabc92ec86124b1a22165e4968facca226e790d9b7dd3f60502fa84dcd30d95ac28a9278e5e72b8b2eb451759ab59c92fc65909a2b4"
    }
  },
  "sev": "Notice",
  "thread": "76",
  "host": "ramify"
}
Here, we can see that the If these were equal, one would have to look at rule 2 next (and then rule 3) to see whether it is responsible. For convenience/simplicity, we could further enrich the trace message to directly include e.g. a cc @mgmeier as you might have already started with this. |
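For readers who want to automate the comparison described above, here is a minimal Python sketch that names the first criterion separating the old and new select views. The three criteria and their order are assumptions on my part, since the list itself is not reproduced in this thread: (1) longer chain, (2) same issuer with a different `issueNo`, (3) lower `tieBreakVRF`, applied only when the two slots are close together. The function name `switch_reason` and the `max_slot_gap` default of 5 are hypothetical and not taken from the node; check them against the consensus code before relying on the output.

```python
def switch_reason(old, new, max_slot_gap=5):
    """Name the first criterion that separates `new` from `old`.

    Both arguments are dicts shaped like oldTipSelectView / newTipSelectView.
    The event itself tells us the node switched; this only says which rule
    is the likely cause, under the assumed ordering below.
    """
    # Criterion 1 (assumed): chain length decides first.
    if new["chainLength"] != old["chainLength"]:
        return "chain length"
    # Criterion 2 (assumed): for the same issuer, the issue number decides.
    if new["issuerHash"] == old["issuerHash"] and new["issueNo"] != old["issueNo"]:
        return "issue number"
    # Criterion 3 (assumed): the VRF tie-break, only when slots are close.
    slots_close = abs(new["slotNo"] - old["slotNo"]) <= max_slot_gap
    if slots_close and bytes.fromhex(new["tieBreakVRF"]) != bytes.fromhex(old["tieBreakVRF"]):
        return "VRF tie-break"
    return "undetermined"


# Values copied from the trace message above.
old_view = {
    "chainLength": 11214657,
    "issueNo": 20,
    "issuerHash": "c63dab6d780a74cbae2a27696c9723f55b3092b2bd001256df03827f",
    "slotNo": 142531554,
    "tieBreakVRF": "905ff37db665e86d60d93dabc92ec86124b1a22165e4968facca226e790d9b7dd3f60502fa84dcd30d95ac28a9278e5e72b8b2eb451759ab59c92fc65909a2b4",
}
new_view = {
    "chainLength": 11214658,
    "issueNo": 16,
    "issuerHash": "8e0bb9b126acd8e65a023602377d9a7a6f15af394d1f7bdd852182b8",
    "slotNo": 142531567,
    "tieBreakVRF": "f8f31aabb555124392485cbd7ba02bded4e4eac8980fc36fde0c9a21d7cf2d51c6fd4ddda8113153639db862781dafacbdd77da4e072452a330c32509506762e",
}

print(switch_reason(old_view, new_view))
```

For the example message, the chain lengths differ, so the sketch stops at the first criterion and never inspects the VRF values.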
That sort of log information would absolutely suit my needs. I wasn't aware we could get the "tieBreakVRF" values logged ordinarily. What do I need to configure in my node config.json file so that such information is logged in my systemd journal? |
In both the old and new tracing system, you should see this when you enable JSON output and use a severity of at least |
Thanks @amesgen for summarizing the discussion we had (BTW: needed to edit your post to change my GH handle to @mgmeier - the correct one). I'll add some more
|
Thanks for catching that!
In the code, it looks like cardano-node/cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs Lines 511 to 518 in 1774c93
|
Apologies, yes, the info will be traced at default detail level. |
@amesgen @mgmeier our pool has 4+ years of homebrewed automation driven from text-based output ( Does it follow from #6014 (comment) that |
Which tracing system are you using (i.e., what is your config value for |
Whatever the default would be for the node: we don't have this setting in any of our config files. |
Then it is the old system, as currently it defaults to However, here's an idea about how to scrape that info without breaking existing automation. In the node config, you could set up an additional scribe to a file, having Would that work for you somehow? |
yes, thanks @mgmeier - to log these events we'll fork off another log as you suggest. 🙏 If whatever changes are contemplated here produce effects in |
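For anyone taking the separate-log route, the following Python sketch shows one way to pull the relevant events out of such a file without touching the existing text-based automation. It assumes the additional scribe writes one JSON object per line, and that the events of interest are those whose `data` object carries both `oldTipSelectView` and `newTipSelectView`, as in the example message earlier in this thread. The function name `tip_change_events` is hypothetical.

```python
import json


def tip_change_events(lines):
    """Yield (timestamp, old_view, new_view) for each log line that
    carries both select views; all other lines are skipped."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON object, e.g. a plain-text log line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        data = event.get("data")
        if not isinstance(data, dict):
            continue
        old = data.get("oldTipSelectView")
        new = data.get("newTipSelectView")
        if old is not None and new is not None:
            yield event.get("at"), old, new


# Small in-memory sample; in practice, pass an open file object instead.
sample = [
    "some plain text line",
    '{"at": "2024-12-13T13:50:58.295305594Z", "data": {"kind": "AddedToCurrentChain", '
    '"oldTipSelectView": {"chainLength": 11214657}, '
    '"newTipSelectView": {"chainLength": 11214658}}}',
]
for at, old, new in tip_change_events(sample):
    print(at, old["chainLength"], "->", new["chainLength"])
```

Because the function accepts any iterable of lines, the same code works on an open log file or on lines piped in from another tool.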
Internal/External
External: Stake pool operator
Area
Other: Logging
Describe the feature you'd like
Please log the reason why the node preferred another fork
Describe alternatives you've considered
The independent tool cncli provides logging of block data, including the block VRF value, into a sqlite database. This database can be queried for insight into whether the likely reason the node preferred another fork was that the other fork's terminal block had a lower block VRF and the tie-break rule was applied.
Additional context / screenshots
The node knows the reason for switching tip to a competing fork, but this reason isn't logged. Rather, the logs only confirm that the tip was switched. But was the reason that the competing fork was longer? Or was it that the other fork's terminal block had a lower VRF, causing the tie-break rule to be applied? Or did the node prefer another fork for some other reason?
This information would be really helpful when trying to identify what caused your own block to not be adopted.