-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Summarize compact snapshot logs to reduce noise #467
Conversation
The significantly quieter logs may surprise some users, especially when updating the collector. Some alternatives:
|
I like that idea - I do often use the logs line ("Submitted compact logs snapshot successfully") in particular to debug whether logs are being sent successfully, but wouldn't be a big deal if that gets sent once a minute vs right away. It would be a tad harder to work with if that didn't get written at all by default (since usually the background logs are not in verbose mode, and with systemd its a bit of effort to turn on verbose logging for the background process). |
4eac449
to
6637bfd
Compare
Okay, the logs now look like this: ./pganalyze-collector
2023/10/25 11:20:00 I [server1] Submitted snapshot successfully
2023/10/25 11:20:02 I [server1] Compact snapshots submitted over the past minute: 6 activity
2023/10/25 11:21:10 I [server1] Compact snapshots submitted over the past minute: 6 activity, 4 logs
2023/10/25 11:22:10 I [server1] Compact snapshots submitted over the past minute: 5 activity, 2 logs
2023/10/25 11:23:10 I [server1] Compact snapshots submitted over the past minute: 5 activity, 2 logs
2023/10/25 11:24:20 I [server1] Compact snapshots submitted over the past minute: 6 activity Note that it currently doesn't track the stats per-server. It wasn't clear to me what server identity value should be used for the |
@lfittl per our discussion, this has been updated to have a shorter log message and to track the stats per-server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, but looks good.
output/compact.go
Outdated
details += ", " | ||
} | ||
} | ||
logger.PrintInfo("Compact snapshots submitted: " + details) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we ever get here with an empty CompactLogStats
map? If so, that would print a weird
Compact snapshots submitted:
message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK that would only happen if all snapshots failed, which would be logged as errors separately. I'll go ahead and add a presence check here.
output/compact.go
Outdated
server.CompactLogTime = time.Now() | ||
server.CompactLogStats = make(map[string]uint8) | ||
} else if time.Now().Sub(server.CompactLogTime) > time.Minute { | ||
server.CompactLogTime = time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but since this is part of resetting the log stats state, it feels like it belongs together with resetting the map below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally didn't do that, to avoid clock skew from the time it takes to build the log message. But it should never take a long time and this code path is non-critical so I'll move it down to improve readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 a way to avoid the timestamp changing over time would be to truncate it to the nearest even minute, instead of simply using time.Now()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would ensure we always print the log at an even minute. Though it does mean that when initially starting up the collector you may have to wait was long as 1m59s before anything is logged.
Or is it 1m30s? I'm not 100% sure what Truncate
does
@@ -140,7 +140,7 @@ func submitCompactSnapshot(ctx context.Context, server *state.Server, collection
} else if !quiet {
logger.PrintVerbose("Submitted compact %s snapshot successfully", kind)
if server.CompactLogTime.IsZero() {
- server.CompactLogTime = time.Now()
+ server.CompactLogTime = time.Now().Truncate(time.Minute)
server.CompactLogStats = make(map[string]uint8)
} else if time.Now().Sub(server.CompactLogTime) > time.Minute {
var keys []string
@@ -158,7 +158,7 @@ func submitCompactSnapshot(ctx context.Context, server *state.Server, collection
if len(details) > 0 {
logger.PrintInfo("Compact snapshots submitted: " + details)
}
- server.CompactLogTime = time.Now()
+ server.CompactLogTime = time.Now().Truncate(time.Minute)
server.CompactLogStats = make(map[string]uint8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example output:
./pganalyze-collector
2023/11/08 15:50:00 I [server1] Compact snapshots submitted: 5 activity
2023/11/08 15:50:00 I [server1] Submitted snapshot successfully
2023/11/08 15:51:00 I [server1] Compact snapshots submitted: 5 activity
2023/11/08 15:52:00 I [server1] Compact snapshots submitted: 5 activity
2023/11/08 15:53:00 I [server1] Compact snapshots submitted: 5 activity, 1 logs
2023/11/08 15:54:00 I [server1] Compact snapshots submitted: 5 activity, 1 logs
2023/11/08 15:55:00 I [server1] Compact snapshots submitted: 5 activity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and added this change; it's nice that the logs show up reliably at the same time every minute.
The high frequency of compact snapshots causes them to create the vast majority of log lines in the default (info) log level. That makes it difficult to find logs related to errors when debugging an issue.
This PR summarizes the 6+ compact snapshots sent per minute into a single log line: