Slow down consensus (increase timeouts) when vertex store is close to being full #859

Open
wants to merge 3 commits into
base: feature/vertex-store-overflow-mitigations
16 changes: 11 additions & 5 deletions core/src/main/java/com/radixdlt/RadixNodeModule.java
@@ -74,6 +74,7 @@
import com.radixdlt.consensus.ProposalLimitsConfig;
import com.radixdlt.consensus.bft.*;
import com.radixdlt.consensus.epoch.EpochsConsensusModule;
import com.radixdlt.consensus.liveness.PacemakerTimeoutCalculatorConfig;
import com.radixdlt.consensus.sync.BFTSyncPatienceMillis;
import com.radixdlt.consensus.vertexstore.VertexStoreConfig;
import com.radixdlt.environment.*;
@@ -150,11 +151,16 @@ protected void configure() {
.annotatedWith(BFTSyncPatienceMillis.class)
.to(properties.get("bft.sync.patience", 200));

// Max timeout = (1.2^8)×3 ~= 13s
bindConstant().annotatedWith(PacemakerBaseTimeoutMs.class).to(3000L);
bindConstant().annotatedWith(PacemakerBackoffRate.class).to(1.2);
bindConstant().annotatedWith(PacemakerMaxExponent.class).to(8);
bindConstant().annotatedWith(AdditionalRoundTimeIfProposalReceivedMs.class).to(30_000L);
/* Default timeouts config:
Max exponential timeout (based on consecutive timeout occurrences) = (1.2^8)×3 ~= 13s.
Additionally, when the vertex store reaches about 2/3 of its max capacity (threshold 0.66, that is: ~100 MB of the default 150 MB)
we start multiplying the timeout by a linearly increasing value, up to 10x.
So the maximum theoretical timeout is ~130s. */
bind(PacemakerTimeoutCalculatorConfig.class)
.toInstance(new PacemakerTimeoutCalculatorConfig(3000L, 1.2, 8, 30_000L, 0.66, 10));
Contributor commented:

I'm finding this a little hard to read, because none of these values have names any more.

Perhaps we should have a PacemakerTimeoutCalculatorConfig::default() and a PacemakerTimeoutCalculatorConfig::testing()? And inside those methods, we can label the values with variable names, and then pass the variables into the constructor?
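
A minimal sketch of that suggestion, for illustration only. It assumes the config is a Java record whose components match the names listed in the follow-up comment; the record shape and the factory name `defaults()` are assumptions, not the PR's actual code. A `testing()` factory could follow the same pattern with test-specific values.

```java
// Hypothetical sketch: the record shape and the factory name are assumptions,
// not the PR's actual code. The values are the ones bound in RadixNodeModule.
record PacemakerTimeoutCalculatorConfig(
    long baseTimeoutMs,
    double consecutiveTimeoutSlowdownRate,
    int consecutiveTimeoutMaxExponent,
    long additionalRoundTimeIfProposalReceivedMs,
    double vertexStoreMultiplierThreshold,
    double maxVertexStoreMultiplier) {

  // `default` is a reserved word in Java, so the factory is named defaults() here.
  static PacemakerTimeoutCalculatorConfig defaults() {
    // Each value gets a name before it reaches the positional constructor.
    final long baseTimeoutMs = 3000L;
    final double consecutiveTimeoutSlowdownRate = 1.2;
    final int consecutiveTimeoutMaxExponent = 8;
    final long additionalRoundTimeIfProposalReceivedMs = 30_000L;
    final double vertexStoreMultiplierThreshold = 0.66;
    final double maxVertexStoreMultiplier = 10;
    return new PacemakerTimeoutCalculatorConfig(
        baseTimeoutMs,
        consecutiveTimeoutSlowdownRate,
        consecutiveTimeoutMaxExponent,
        additionalRoundTimeIfProposalReceivedMs,
        vertexStoreMultiplierThreshold,
        maxVertexStoreMultiplier);
  }
}
```

With something like this, the binding could read `bind(PacemakerTimeoutCalculatorConfig.class).toInstance(PacemakerTimeoutCalculatorConfig.defaults());`.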

Contributor commented:

For review purposes, these are:

    long baseTimeoutMs: 3 000
    double consecutiveTimeoutSlowdownRate: 1.2
    int consecutiveTimeoutMaxExponent: 8 // 1.2^8 = 4.29981696
    long additionalRoundTimeIfProposalReceivedMs: 30 000
    double vertexStoreMultiplierThreshold: 0.66
    double maxVertexStoreMultiplier: 10
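
With those values, the arithmetic from the config comment can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's `MultiFactorPacemakerTimeoutCalculator`: the class and method names are hypothetical, the ramp is assumed to go linearly from 1x at the threshold to the max multiplier at a full vertex store, and the sketch does not cover how `additionalRoundTimeIfProposalReceivedMs` is applied.

```java
// Hypothetical sketch of the timeout arithmetic; names and the exact ramp
// shape are assumptions, not the PR's implementation.
final class TimeoutSketch {

  // Assumed linear ramp: 1.0 at or below the threshold, maxMultiplier when
  // the vertex store is full (utilization = 1.0).
  static double vertexStoreMultiplier(
      double utilization, double threshold, double maxMultiplier) {
    if (utilization <= threshold) {
      return 1.0;
    }
    final double clamped = Math.min(utilization, 1.0);
    final double progress = (clamped - threshold) / (1.0 - threshold);
    return 1.0 + progress * (maxMultiplier - 1.0);
  }

  // base * rate^min(consecutiveTimeouts, maxExponent) * multiplier
  static long timeoutMs(
      long baseTimeoutMs,
      double rate,
      int maxExponent,
      int consecutiveTimeouts,
      double multiplier) {
    final int exponent = Math.min(Math.max(consecutiveTimeouts, 0), maxExponent);
    final double exponential = baseTimeoutMs * Math.pow(rate, exponent);
    return Math.round(exponential * multiplier);
  }
}
```

For example, `TimeoutSketch.timeoutMs(3000L, 1.2, 8, 8, 1.0)` gives roughly 12.9 s, and with `TimeoutSketch.vertexStoreMultiplier(1.0, 0.66, 10)` as the last argument it gives roughly 129 s, in line with the ~13s and ~130s figures in the config comment.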


// Delayed resolution is disabled for now.
Contributor commented:

Might it be worth explaining why in this comment? (i.e. because we cannot create QCs on fallback vertices: we sign not just the ledger header but also the BFT header, which captures the previous certificate chain, and every node has a different certificate chain for its fallback vertex)

// TODO: consider reviving this feature or clean it up
bindConstant().annotatedWith(TimeoutQuorumResolutionDelayMs.class).to(0L);

final var vertexStoreConfig =
@@ -70,8 +70,8 @@
import com.radixdlt.api.system.generated.models.BFTConfiguration;
import com.radixdlt.api.system.generated.models.MempoolConfiguration;
import com.radixdlt.api.system.generated.models.SystemConfigurationResponse;
import com.radixdlt.consensus.bft.PacemakerBaseTimeoutMs;
import com.radixdlt.consensus.bft.Self;
import com.radixdlt.consensus.liveness.PacemakerTimeoutCalculatorConfig;
import com.radixdlt.consensus.sync.BFTSyncPatienceMillis;
import com.radixdlt.crypto.ECDSASecp256k1PublicKey;
import com.radixdlt.mempool.MempoolThrottleMs;
@@ -81,7 +81,7 @@

public final class ConfigurationHandler extends SystemGetJsonHandler<SystemConfigurationResponse> {

private final long pacemakerTimeout;
private final PacemakerTimeoutCalculatorConfig pacemakerTimeoutCalculatorConfig;
private final int bftSyncPatienceMillis;
private final long mempoolThrottleMs;
private final SyncRelayConfig syncRelayConfig;
@@ -93,7 +93,7 @@ public final class ConfigurationHandler extends SystemGetJsonHandler<SystemConfi
@Inject
ConfigurationHandler(
@Self ECDSASecp256k1PublicKey self,
@PacemakerBaseTimeoutMs long pacemakerTimeout,
PacemakerTimeoutCalculatorConfig pacemakerTimeoutCalculatorConfig,
@BFTSyncPatienceMillis int bftSyncPatienceMillis,
@MempoolThrottleMs long mempoolThrottleMs,
SyncRelayConfig syncRelayConfig,
@@ -102,7 +102,7 @@ public final class ConfigurationHandler extends SystemGetJsonHandler<SystemConfi
ProtocolConfig protocolConfig) {
super();
this.self = self;
this.pacemakerTimeout = pacemakerTimeout;
this.pacemakerTimeoutCalculatorConfig = pacemakerTimeoutCalculatorConfig;
this.bftSyncPatienceMillis = bftSyncPatienceMillis;
this.mempoolThrottleMs = mempoolThrottleMs;
this.syncRelayConfig = syncRelayConfig;
@@ -120,7 +120,7 @@ public SystemConfigurationResponse handleRequest() {
.bft(
new BFTConfiguration()
.bftSyncPatience(bftSyncPatienceMillis)
.pacemakerTimeout(pacemakerTimeout))
.pacemakerTimeout(pacemakerTimeoutCalculatorConfig.baseTimeoutMs()))
.mempool(new MempoolConfiguration().maxSize(0).throttle(mempoolThrottleMs))
.sync(systemModelMapper.syncConfiguration(syncRelayConfig))
.networking(systemModelMapper.networkingConfiguration(self, p2PConfig))

This file was deleted.

This file was deleted.

This file was deleted.

@@ -95,8 +95,8 @@
public class EpochsConsensusModule extends AbstractModule {
@Override
protected void configure() {
bind(ExponentialPacemakerTimeoutCalculator.class).in(Scopes.SINGLETON);
bind(PacemakerTimeoutCalculator.class).to(ExponentialPacemakerTimeoutCalculator.class);
bind(MultiFactorPacemakerTimeoutCalculator.class).in(Scopes.SINGLETON);
bind(PacemakerTimeoutCalculator.class).to(MultiFactorPacemakerTimeoutCalculator.class);

OptionalBinder.newOptionalBinder(
binder(), EpochManager.class); // So that this is consistent with tests