graphdb batch writer resiliency #295

Merged: 13 commits into main from zenithar/graphdb_batch_writer_resiliency on Dec 2, 2024



@Zenithar Zenithar commented Nov 28, 2024

Context

The graphdb entity writers can be exposed to unexpected behaviour from the JanusGraph backend (e.g. the backend not responding because of runtime errors).

The writers have been modified to be resilient to such unexpected situations:

  • The lock-free microbatcher pattern has been externalised to a dedicated instance to prevent design issues
  • Batch processing is bound in time so the writer can react to an unresponsive backend (default: 60s)
  • A retry pattern allows a batch to be retried up to a capped limit (default: 3). Batches are lost once the retry maximum is reached.
  • The writer supports parallel insertion workers. (default: 10)
  • When a timeout error occurs, the batch size is divided by 2 to shrink each attempt and prevent GraphBinary errors caused by overly large requests.
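
A minimal sketch of how parallel insertion workers and a per-batch time bound could fit together, assuming batches arrive on a channel. The names `writeFunc`, `runWorkers` and `exampleRun` are hypothetical and not taken from the KubeHound code base, and the simulated backend simply blocks until the deadline expires:

```go
package main

import (
	"context"
	"sync"
	"time"
)

// writeFunc stands in for the call that sends one batch to the graph backend.
// Hypothetical type, not part of KubeHound.
type writeFunc func(ctx context.Context, batch []string) error

// runWorkers starts workerCount goroutines that drain the batches channel.
// Each write gets its own timeout so a hung backend cannot block a worker
// forever. It returns the number of batches whose write failed.
func runWorkers(batches <-chan []string, workerCount int, timeout time.Duration, write writeFunc) int {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				ctx, cancel := context.WithTimeout(context.Background(), timeout)
				err := write(ctx, batch)
				cancel() // release the timer as soon as the write returns
				if err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	wg.Wait()
	return failed
}

// exampleRun feeds three batches to two workers. The simulated backend
// never answers for the batch starting with "slow", so that write times out.
func exampleRun() int {
	batches := make(chan []string, 3)
	batches <- []string{"a", "b"}
	batches <- []string{"slow"}
	batches <- []string{"c"}
	close(batches)

	write := func(ctx context.Context, batch []string) error {
		if batch[0] != "slow" {
			return nil
		}
		<-ctx.Done() // simulate an unresponsive backend
		return ctx.Err()
	}
	return runWorkers(batches, 2, 20*time.Millisecond, write)
}
```

With the defaults listed above, such a pool would be started with a worker count of 10 and a timeout of 60s. The sketch only counts failures; it leaves out the retry and batch-splitting behaviour.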

Sample warning emitted when an unresponsive backend triggers a retry with a split batch:

11:58:02 WARN Retrying write operation with smaller vertex batch (n:500 -> 250, r:0) app=kubehound dd.span_id=0 dd.trace_id=0
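
The retry path behind a warning like this could look roughly as follows. This is a sketch under stated assumptions, not the merged implementation: `writeWithRetry` and `exampleRetry` are hypothetical names, `r` is assumed to be the retry counter, and the choice to retry both halves of a split batch (rather than re-queue them) is an assumption.

```go
package main

import (
	"context"
	"errors"
	"log"
)

// writeFunc stands in for the call that sends one batch to the graph backend.
// Hypothetical type, not part of KubeHound.
type writeFunc func(ctx context.Context, batch []string) error

// writeWithRetry writes batch and returns the number of items lost.
// On a timeout the batch is split in two and each half is retried;
// other errors retry the same batch. Once retry reaches maxRetries
// the remaining items are dropped.
func writeWithRetry(ctx context.Context, batch []string, retry, maxRetries int, write writeFunc) int {
	if len(batch) == 0 {
		return 0
	}
	err := write(ctx, batch)
	if err == nil {
		return 0
	}
	if retry >= maxRetries {
		return len(batch) // retry budget exhausted: the batch is lost
	}
	if !errors.Is(err, context.DeadlineExceeded) || len(batch) == 1 {
		// Not a timeout, or nothing left to split: retry the batch as-is.
		return writeWithRetry(ctx, batch, retry+1, maxRetries, write)
	}
	half := len(batch) / 2
	log.Printf("Retrying write operation with smaller vertex batch (n:%d -> %d, r:%d)",
		len(batch), half, retry)
	lost := writeWithRetry(ctx, batch[:half], retry+1, maxRetries, write)
	lost += writeWithRetry(ctx, batch[half:], retry+1, maxRetries, write)
	return lost
}

// exampleRetry writes 8 items against a simulated backend that times out
// on any batch larger than 2 items. It reports lost items and write attempts.
func exampleRetry(maxRetries int) (lost int, attempts int) {
	batch := make([]string, 8)
	write := func(_ context.Context, b []string) error {
		attempts++
		if len(b) > 2 {
			return context.DeadlineExceeded
		}
		return nil
	}
	lost = writeWithRetry(context.Background(), batch, 0, maxRetries, write)
	return lost, attempts
}
```

In this sketch, a retry limit of 3 lets the 8-item batch shrink to accepted 2-item batches, while a limit of 1 exhausts the budget at 4 items and drops everything.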

The batcher settings have been updated to reflect this new state:

  • Insertion batch 500 -> 250
    • tunable with KH_BUILDER_VERTEX_BATCH_SIZE / KH_BUILDER_EDGE_BATCH_SIZE env. variables
    • or the builder.edge/vertex.batch_size config key
  • Concurrent insertion 1 -> 10
    • tunable with KH_JANUSGRAPH_WRITER_WORKER_COUNT
    • or janusgraph.writer_worker_count config key
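
As an illustration of how these environment variables map onto the new defaults, here is a small standalone sketch. It does not reproduce how KubeHound loads its configuration; `writerSettings`, `intFromEnv` and `loadWriterSettings` are hypothetical helpers, and falling back to the default on an invalid value is an assumption.

```go
package main

import (
	"os"
	"strconv"
)

// Defaults taken from the PR description.
const (
	defaultBatchSize   = 250
	defaultWorkerCount = 10
)

// lookupFunc matches the signature of os.LookupEnv so tests can inject a map.
type lookupFunc func(key string) (string, bool)

// writerSettings groups the tunables mentioned above (hypothetical struct).
type writerSettings struct {
	VertexBatchSize int
	EdgeBatchSize   int
	WorkerCount     int
}

// intFromEnv returns the positive integer stored under name, or fallback
// when the variable is unset, not a number, or not positive.
func intFromEnv(lookup lookupFunc, name string, fallback int) int {
	raw, ok := lookup(name)
	if !ok {
		return fallback
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

// loadWriterSettings reads the three variables through lookup.
func loadWriterSettings(lookup lookupFunc) writerSettings {
	return writerSettings{
		VertexBatchSize: intFromEnv(lookup, "KH_BUILDER_VERTEX_BATCH_SIZE", defaultBatchSize),
		EdgeBatchSize:   intFromEnv(lookup, "KH_BUILDER_EDGE_BATCH_SIZE", defaultBatchSize),
		WorkerCount:     intFromEnv(lookup, "KH_JANUSGRAPH_WRITER_WORKER_COUNT", defaultWorkerCount),
	}
}

// loadFromProcessEnv reads the settings from the real process environment.
func loadFromProcessEnv() writerSettings {
	return loadWriterSettings(os.LookupEnv)
}
```

For example, exporting `KH_BUILDER_VERTEX_BATCH_SIZE=100` before calling `loadFromProcessEnv` would lower only the vertex batch size and leave the other two values at their defaults.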

@Zenithar Zenithar requested a review from a team as a code owner November 28, 2024 14:14
@Zenithar Zenithar self-assigned this Nov 28, 2024

@jt-dd jt-dd left a comment


Currently not working, as the async synchronisation is not done properly. Also, maybe add a dedicated function for adding elements to the queue.

Review threads (all resolved):
  • pkg/kubehound/storage/graphdb/janusgraph_edge_writer.go (3 threads, outdated)
  • pkg/kubehound/storage/graphdb/provider.go (2 threads)
  • pkg/kubehound/storage/graphdb/janusgraph_vertex_writer.go (2 threads, outdated)
@Zenithar Zenithar requested a review from jt-dd November 29, 2024 09:51

@jt-dd jt-dd left a comment


Nice. LGTM, can you add this to the reference configuration file with the default value?

@Zenithar Zenithar merged commit d8de160 into main Dec 2, 2024
8 checks passed
@Zenithar Zenithar deleted the zenithar/graphdb_batch_writer_resiliency branch December 2, 2024 19:17
2 participants