Gateway refuses to route to non-adhoc group if adhoc group is unhealthy #587

Open
abridgett opened this issue Jan 10, 2025 · 5 comments

@abridgett

trino-gateway v13. I created these clusters:

  • cluster1 -> adhoc group
  • cluster2 -> adhoc group
  • cluster3 -> cluster3 group

Then I added routing rules to send queries to cluster3. I confirmed this by running select * from system.runtime.nodes, which always returns cluster3.

I then set cluster1 inactive, and everything was fine. I then stopped cluster2, and now the queries fail even though they should be routed to cluster3. This was unexpected. I'm not using analyzeRequest.

config snippet showing active modules:

modules:
  - io.trino.gateway.ha.module.HaGatewayProviderModule
  - io.trino.gateway.ha.module.ClusterStateListenerModule
  - io.trino.gateway.ha.module.ClusterStatsMonitorModule
  - io.trino.gateway.ha.module.QueryCountBasedRouterProvider

managedApps:
  - io.trino.gateway.ha.clustermonitor.ActiveClusterMonitor

stacktrace:

2025-01-10T07:11:37.624-0500    WARN    http-worker-73  org.eclipse.jetty.ee10.servlet.ServletChannel   /v1/statement
jakarta.servlet.ServletException: io.trino.gateway.ha.router.RouterException: did not find any cluster for the adhoc routing group
        at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:412)
        at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:349)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:312)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
        at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
        at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
        at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
        at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:824)
        at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:436)
        at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:597)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1060)
        at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
        at org.eclipse.jetty.server.handler.EventsHandler.handle(EventsHandler.java:81)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:151)
        at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
        at org.eclipse.jetty.server.handler.EventsHandler.handle(EventsHandler.java:81)
        at org.eclipse.jetty.server.Server.handle(Server.java:182)
        at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:662)
        at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:414)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
        at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
        at org.eclipse.jetty.util.thread.MonitoredQueuedThreadPool$1.run(MonitoredQueuedThreadPool.java:73)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
        at java.base/java.lang.Thread.run(Thread.java:1575)
Caused by: io.trino.gateway.ha.router.RouterException: did not find any cluster for the adhoc routing group
        at io.trino.gateway.ha.router.QueryCountBasedRouter.lambda$provideAdhocBackend$5(QueryCountBasedRouter.java:227)
        at java.base/java.util.Optional.orElseThrow(Optional.java:403)
        at io.trino.gateway.ha.router.QueryCountBasedRouter.provideAdhocBackend(QueryCountBasedRouter.java:227)
        at io.trino.gateway.ha.router.QueryCountBasedRouter.provideBackendForRoutingGroup(QueryCountBasedRouter.java:234)
        at io.trino.gateway.ha.handler.RoutingTargetHandler.getBackendFromRoutingGroup(RoutingTargetHandler.java:96)
        at io.trino.gateway.ha.handler.RoutingTargetHandler.lambda$getRoutingDestination$0(RoutingTargetHandler.java:72)
        at java.base/java.util.Optional.orElseGet(Optional.java:364)
        at io.trino.gateway.ha.handler.RoutingTargetHandler.getRoutingDestination(RoutingTargetHandler.java:72)
        at io.trino.gateway.proxyserver.RouteToBackendResource.postHandler(RouteToBackendResource.java:68)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
        at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:274)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:253)
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:696)
        at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:397)
        ... 33 more
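
One detail in the trace may be worth noting: provideAdhocBackend is called directly from provideBackendForRoutingGroup, with no Optional.orElseGet frame between them (the RoutingTargetHandler frames do show one). That would be consistent with the adhoc fallback being evaluated eagerly, for example as the argument to Optional.orElse, although the trace alone does not confirm this. The sketch below is a hypothetical, simplified model and not the gateway's actual code; the names FallbackSketch, findBackend, eagerFallback and lazyFallback are made up for illustration. It shows how an eagerly evaluated fallback can throw even when the requested group has a healthy cluster.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a routing-group lookup with an adhoc fallback.
// Not the trino-gateway implementation; all names are illustrative only.
class FallbackSketch {
    // Healthy clusters per routing group: adhoc has none, cluster3 has one.
    static final Map<String, List<String>> HEALTHY = Map.of(
            "adhoc", List.<String>of(),
            "cluster3", List.of("cluster3"));

    static Optional<String> findBackend(String group) {
        return HEALTHY.getOrDefault(group, List.<String>of()).stream().findFirst();
    }

    static String adhocBackend() {
        return findBackend("adhoc").orElseThrow(() -> new IllegalStateException(
                "did not find any cluster for the adhoc routing group"));
    }

    // Eager: adhocBackend() is evaluated before orElse() is invoked,
    // so this throws even though cluster3 is available.
    static String eagerFallback(String group) {
        return findBackend(group).orElse(adhocBackend());
    }

    // Lazy: the supplier only runs when the requested group has no backend.
    static String lazyFallback(String group) {
        return findBackend(group).orElseGet(FallbackSketch::adhocBackend);
    }

    public static void main(String[] args) {
        System.out.println(lazyFallback("cluster3"));
        try {
            eagerFallback("cluster3");
        } catch (IllegalStateException e) {
            System.out.println("eager fallback failed: " + e.getMessage());
        }
    }
}
```

If the real code has this shape, switching the fallback to a lazily evaluated supplier would avoid touching the adhoc group when the requested group resolves; whether that applies here needs checking against QueryCountBasedRouter itself.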
@vishalya
Member

What version are you using?

@abridgett
Author

It was v13

@andythsu
Member

Which ClusterStatsMonitor are you using? Right now we have:

        return switch (clusterStatsConfig.getMonitorType()) {
            case INFO_API -> new ClusterStatsInfoApiMonitor(httpClient, config.getMonitor());
            case UI_API -> new ClusterStatsHttpMonitor(config.getBackendState());
            case JDBC -> new ClusterStatsJdbcMonitor(config.getBackendState(), config.getMonitor());
            case NOOP -> new NoopClusterStatsMonitor();
        };
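
For readers who want to try the selection logic in isolation, here is a minimal, self-contained sketch of the same switch-expression pattern. Only the enum constant names and the monitor class names come from the snippet above; MonitorSelectionSketch, the local MonitorType enum and monitorClassFor are hypothetical stand-ins, and the method returns a class name as a string rather than constructing a real monitor.

```java
// Hypothetical stand-in for the monitor selection shown above.
// It maps each monitor type to the class name the snippet instantiates.
class MonitorSelectionSketch {
    enum MonitorType { INFO_API, UI_API, JDBC, NOOP }

    static String monitorClassFor(MonitorType type) {
        // No default branch: the compiler rejects this switch expression
        // if an enum constant is left unhandled.
        return switch (type) {
            case INFO_API -> "ClusterStatsInfoApiMonitor";
            case UI_API -> "ClusterStatsHttpMonitor";
            case JDBC -> "ClusterStatsJdbcMonitor";
            case NOOP -> "NoopClusterStatsMonitor";
        };
    }

    public static void main(String[] args) {
        System.out.println(monitorClassFor(MonitorType.JDBC));
    }
}
```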

@andythsu
Member

Also may I know your routing rule's condition? Is it something like request.getHeader("") == ""?

@abridgett
Author

Sure, apologies - I should have realised that you might need that info!

We're using the JDBC monitorType.

The routing rules are a set like this:

---
name: "cluster 3"
condition: 'request.getServerName() ~= "^cluster3(\\..*)?$"'
actions:
  - 'result.put("routingGroup", "cluster3")'
---
name: "cluster 4"
condition: 'request.getServerName() ~= "^cluster4(\\..*)?$"'
actions:
  - 'result.put("routingGroup", "cluster4")'
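
To make clear which server names these conditions accept, the following sketch checks the "cluster 3" pattern with java.util.regex. It assumes the ~= operator behaves like a full regular-expression match; ServerNameRuleSketch, routingGroupFor and the example host names are hypothetical and only illustrate the pattern, not how the gateway evaluates rules.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical check of the "cluster 3" rule's regular expression.
class ServerNameRuleSketch {
    // Same pattern as in the rule's condition, written as a Java string literal.
    static final Pattern CLUSTER3 = Pattern.compile("^cluster3(\\..*)?$");

    // Returns the routing group the rule would set, or empty if the rule does not fire.
    static Optional<String> routingGroupFor(String serverName) {
        return CLUSTER3.matcher(serverName).matches()
                ? Optional.of("cluster3")
                : Optional.empty();
    }

    public static void main(String[] args) {
        for (String host : new String[] {
                "cluster3", "cluster3.example.com", "cluster30", "cluster4.example.com"}) {
            System.out.println(host + " -> " + routingGroupFor(host).orElse("(no match)"));
        }
    }
}
```

Under that assumption the rule fires for the bare name cluster3 and for any name starting with "cluster3.", but not for names such as cluster30, so requests addressed to a cluster3 host name should get the cluster3 routing group.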
