This repository has been archived by the owner on Sep 2, 2020. It is now read-only.

Queue full error when trying to send data to Cyanite #284

Open
dancb10 opened this issue Dec 8, 2017 · 4 comments

Comments

@dancb10

dancb10 commented Dec 8, 2017

Hello,
We are trying Cyanite for the first time in our testing environment with two Cyanite+Graphite API instances (16G RAM, 8 cores and SSD) and a three-node Cassandra cluster (64G RAM, 8 cores and SSD), but it seems it cannot handle the load.
We see a lot of queue full errors in the Cyanite log as soon as we start our load test. We are using the graphite-stresser tool to load test the environment:

java8 -jar build/libs/graphite-stresser-0.1.jar loadbalancer 2003 100 975 10 true
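(If I am reading graphite-stresser's arguments correctly, that is: host, port, 100 simulated hosts, 975 timers per host, a 10-second publish interval, and debug output enabled; since each timer emits 15 metrics, that works out to roughly 100 * 975 * 15 = 1,462,500 metrics every 10 seconds.)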

We have set the heap size of Cyanite to 12 GB:

java -Xms12g -Xmx12g -jar cyanite-0.5.1-standalone-fix.jar --path cyanite.yaml

The cyanite.yaml file also contains the following ingest and write queue settings:

queues:
  defaults:
    ingestq:
      pool-size: 100
      queue-capacity: 2000000
    writeq:
      pool-size: 100
      queue-capacity: 2000000

The load test has barely started when the following errors begin appearing continuously. NOTE that even when we stop the load test, the log file keeps filling with this type of error non-stop:

WARN [2017-12-08 09:46:13,337] nioEventLoopGroup-2-15 - io.netty.channel.DefaultChannelPipeline An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98)
	at io.cyanite.engine.queue.EngineQueue.engine_event_BANG_(queue.clj:44)
	at io.cyanite.engine.Engine.enqueue_BANG_(engine.clj:108)
	at io.cyanite.input.carbon$pipeline$fn__16369.invoke(carbon.clj:40)
	at io.cyanite.input.carbon.proxy$io.netty.channel.ChannelInboundHandlerAdapter$ff19274a.channelRead(Unknown Source)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:565)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:479)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)
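
For what it's worth, the exception itself is just the standard JDK behaviour of a bounded queue whose add() is called while it is full. A minimal sketch (not Cyanite's actual code, just illustrating the contract that produces this exact message):

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class QueueFullDemo {
    public static void main(String[] args) {
        // Bounded queue with capacity 2, standing in for the ingest queue.
        Queue<String> q = new ArrayBlockingQueue<>(2);
        q.add("metric-1");
        q.add("metric-2");

        // offer() reports back-pressure by returning false ...
        System.out.println(q.offer("metric-3"));   // prints: false

        // ... while add() throws the exception seen in the log above:
        // java.lang.IllegalStateException: Queue full
        q.add("metric-3");
    }
}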

Do you have any ideas how to fix this? @pyr @ifesdjeen. I see that this issue was addressed here previously, but has it been fixed completely? Note that we have compiled Cyanite up to this commit because the latest version does not return multiple metrics in a single query. I have submitted an issue for that bug as well.

@dancb10 changed the title from "Queue error full when trying to send data to Cyanite" to "Queue full error when trying to send data to Cyanite" on Dec 8, 2017
@dancb10
Author

dancb10 commented Dec 8, 2017

We have tried load testing with the following data:
hosts * timers * metrics      interval   result
10 * 64 * 15   =   9,600 per 10 seconds    10s   OK
10 * 975 * 15  = 146,250 per 10 seconds    10s   OK
10 * 1956 * 15 = 293,400 per 10 seconds    10s   FAILED

This was done with two Cyanite nodes (16G RAM, 8 cores and SSD). The Cyanite daemon fails while we still have plenty of resources left.

@ifesdjeen
Collaborator

@dancb10 you can increase queue capacity.

What's your ingestion rate? How many events per second does Cyanite get approximately?

@dancb10
Author

dancb10 commented Dec 11, 2017

The numbers are in my previous comment: there are 10 instances, each sending 1956 * 15 metrics every 10 seconds. So 293,400 metrics are sent every 10 seconds, which is 29,340 per second.
Note that we are using pretty powerful instances and we have also split writes and reads across multiple instances: two instances that write and two that read, each pair behind its own ELB.
Instances are c3.2xlarge (8 vCPUs, 15 GB RAM, 2 x 80 GB SSD).
We are using a queue size of 2 million, so @ifesdjeen are you saying to increase this number? I will try with 20 million.
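
For reference, that would mean bumping the earlier cyanite.yaml snippet to something like this (20 million is just the value I am going to test with, not a recommended setting):

queues:
  defaults:
    ingestq:
      pool-size: 100
      queue-capacity: 20000000
    writeq:
      pool-size: 100
      queue-capacity: 20000000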

@ifesdjeen
Collaborator

> We are using a queue size of 2 million, so @ifesdjeen are you saying to increase this number?

Hm, no, actually 2M should usually be OK...
