This repository has been archived by the owner on Sep 2, 2020. It is now read-only.

Queue full error when trying to send data to Cyanite #284

Open
dancb10 opened this issue Dec 8, 2017 · 4 comments

Comments

@dancb10

dancb10 commented Dec 8, 2017

Hello,
We are trying Cyanite for the first time in our testing environment with two Cyanite+Graphite API instances (16G RAM, 8 cores and SSD) and a three-node Cassandra cluster (64G RAM, 8 cores and SSD), but it seems it cannot handle the load.
We see a lot of queue full errors in the Cyanite log as soon as we start our load test. We are using the graphite-stresser tool to load test the environment:

java8 -jar build/libs/graphite-stresser-0.1.jar loadbalancer 2003 100 975 10 true
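(If I am reading graphite-stresser's arguments correctly, that is: host, port, 100 simulated hosts, 975 timers per host, a 10-second publish interval, and debug output enabled; since each timer emits 15 metrics, that works out to roughly 100 * 975 * 15 = 1,462,500 metrics every 10 seconds.)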

We have set the heap size of Cyanite to 12 GB:

java -Xms12g -Xmx12g -jar cyanite-0.5.1-standalone-fix.jar --path cyanite.yaml

The cyanite.yaml file also contains the following ingest and write queue settings:

queues:
  defaults:
    ingestq:
      pool-size: 100
      queue-capacity: 2000000
    writeq:
      pool-size: 100
      queue-capacity: 2000000

The load test has barely started when the following errors begin appearing continuously. NOTE that even when we stop the load test, the log file keeps filling with this type of error non-stop:

WARN [2017-12-08 09:46:13,337] nioEventLoopGroup-2-15 - io.netty.channel.DefaultChannelPipeline An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98)
	at io.cyanite.engine.queue.EngineQueue.engine_event_BANG_(queue.clj:44)
	at io.cyanite.engine.Engine.enqueue_BANG_(engine.clj:108)
	at io.cyanite.input.carbon$pipeline$fn__16369.invoke(carbon.clj:40)
	at io.cyanite.input.carbon.proxy$io.netty.channel.ChannelInboundHandlerAdapter$ff19274a.channelRead(Unknown Source)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:565)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:479)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)
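
For what it's worth, the exception itself is just the standard JDK behaviour of a bounded queue whose add() is called while it is full. A minimal sketch (not Cyanite's actual code, just illustrating the contract that produces this exact message):

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class QueueFullDemo {
    public static void main(String[] args) {
        // Bounded queue with capacity 2, standing in for the ingest queue.
        Queue<String> q = new ArrayBlockingQueue<>(2);
        q.add("metric-1");
        q.add("metric-2");

        // offer() reports back-pressure by returning false ...
        System.out.println(q.offer("metric-3"));   // prints: false

        // ... while add() throws the exception seen in the log above:
        // java.lang.IllegalStateException: Queue full
        q.add("metric-3");
    }
}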

Do you have any ideas how to fix this? @pyr @ifesdjeen. I see that this issue was addressed here previously, but has it been fixed completely? Note that we have compiled Cyanite up to this commit because the latest version does not return multiple metrics in a single query. I have submitted an issue for that bug as well.

@dancb10 changed the title from "Queue error full when trying to send data to Cyanite" to "Queue full error when trying to send data to Cyanite" on Dec 8, 2017
@dancb10
Author

dancb10 commented Dec 8, 2017

We have tried load testing with the following data:
hosts * timers * metrics      interval   result
10 * 64 * 15   =   9,600 per 10 seconds    10s   OK
10 * 975 * 15  = 146,250 per 10 seconds    10s   OK
10 * 1956 * 15 = 293,400 per 10 seconds    10s   FAILED

This was done with two Cyanite nodes (16G RAM, 8 cores and SSD). The Cyanite daemon fails while we still have plenty of resources left.

@ifesdjeen
Collaborator

@dancb10 you can increase queue capacity.

What's your ingestion rate? How many events per second does Cyanite get approximately?

@dancb10
Author

dancb10 commented Dec 11, 2017

The numbers are in my previous comment: there are 10 instances, each sending 1956 * 15 metrics every 10 seconds. So 293,400 metrics are sent every 10 seconds, which is 29,340 per second.
Note that we are using pretty powerful instances and we have also split writes and reads across multiple instances: two instances that write and two that read, each pair behind its own ELB.
Instances are c3.2xlarge (8 vCPUs, 15 GB RAM, 2 x 80 GB SSD).
We are using a queue size of 2 million, so @ifesdjeen are you saying to increase this number? I will try with 20 million.
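
For reference, that would mean bumping the earlier cyanite.yaml snippet to something like this (20 million is just the value I am going to test with, not a recommended setting):

queues:
  defaults:
    ingestq:
      pool-size: 100
      queue-capacity: 20000000
    writeq:
      pool-size: 100
      queue-capacity: 20000000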

@ifesdjeen
Collaborator

> We are using a queue size of 2 million, so @ifesdjeen are you saying to increase this number?

Hm, no, actually 2M should usually be OK...
