fix: when the shard is created, disk IOPS is high, causing HTTP write … #24584
+28
−3
I have identified an issue with our InfluxDB service that recurs daily at midnight. When the new shard is created each day (shardGroupDuration=24h), disk I/O utilization spikes to 100%, the number of goroutines grows to about 200k, and the system occasionally runs out of memory (OOM). As a result, HTTP write requests fail with timeouts, which significantly impacts the usability of the service.
After analysis and tracing, I have pinpointed the root cause. In the log_file.AddSeriesList function, if the seriesID of a point is not present in the seriesSet, the function calls f.FlushAndSync(), which causes a significant increase in disk IOPS. I suggest increasing the probability of performing this operation from the current 10%, allowing more flush operations to be handled by the operating system to alleviate disk pressure.
System: CentOS Linux release 7.9.2009 (Core)
Version: InfluxDB 1.8.6
Please review the code and suggest improvements.