FusekiTuning
If you use Fuseki (with Skosmos or otherwise) in a production system, it can be useful to tune it for better performance and resilience. This page contains tips and advice for Fuseki tuning.
By default, the Fuseki startup script sets the JVM option -Xmx1200M, which limits the Java heap of the Fuseki process to 1200 MB (about 1.2 GB). This can be too low and may cause Fuseki to run out of memory. Memory use also grows with the number of requests processed in parallel, so if memory is scarce it makes sense to limit the number of Jetty threads aggressively (see below).
You can adjust the -Xmx setting either by editing the startup script or by setting the JVM_ARGS environment variable.
Finto.fi uses -Xmx8G on a machine with 16GB of memory, with the Jetty thread pool set to 4-6 threads.
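To check what heap limit a given -Xmx value actually gives a JVM, you can run a tiny standalone Java program with the same flag. This is only a sketch and is not part of Fuseki: the class name HeapCheck is made up for illustration, and it checks the flag value in a separate JVM, not the running Fuseki process. Runtime.maxMemory() may report somewhat less than the -Xmx value, depending on the garbage collector.

```java
// HeapCheck.java (hypothetical helper): prints the maximum heap size that
// the current JVM will attempt to use. With Java 11+ you can run it directly:
//   java -Xmx8G HeapCheck.java
public class HeapCheck {
    // Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
        if (maxBytes == Long.MAX_VALUE) {
            System.out.println("No heap limit reported");
        } else {
            System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
        }
    }
}
```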
When there are many parallel requests, Fuseki is easily overwhelmed: it uses too much memory and the JVM garbage collector starts thrashing. By default, Jetty (the servlet container that runs Fuseki) places no limit on the number of parallel requests. The queue for incoming requests is unbounded as well, so even when the load eases there may be a long backlog of requests left to process. See the Jetty thread pool tuning documentation for more information.
Fuseki can load a custom Jetty configuration file (with the --jetty-config=jetty.xml parameter), in which you can limit both the thread count and the size of the queue of waiting requests. A thread count close to the number of CPU cores makes the most sense: queries are generally CPU bound, because the TDB database usually fits in the disk cache. More threads than that will generally just increase Fuseki's memory consumption without improving performance.
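The rule of thumb of matching the thread count to the number of CPU cores can be turned into a small calculation. The sketch below encodes an assumption, not a documented Fuseki or Jetty rule: it sets minThreads to the number of logical processors the JVM reports and maxThreads to two more than that, which on a four-core machine reproduces the 4-6 range mentioned above. The class name ThreadPoolSizing and the +2 margin are hypothetical. It prints a ThreadPool fragment in the same Jetty XML form used on this page.

```java
// ThreadPoolSizing (hypothetical): derives Jetty thread pool bounds from the
// CPU count and renders them as a Jetty XML <Set name="ThreadPool"> fragment.
public class ThreadPoolSizing {
    // Assumed heuristic: one thread per logical processor, at least one.
    static int minThreads(int cores) {
        return Math.max(1, cores);
    }

    // Assumed heuristic: allow a small margin of two extra threads.
    static int maxThreads(int cores) {
        return minThreads(cores) + 2;
    }

    // Build the fragment with a bounded ArrayBlockingQueue of queueSize slots.
    static String threadPoolXml(int cores, int queueSize) {
        return String.join("\n",
            "<Set name=\"ThreadPool\">",
            "  <New class=\"org.eclipse.jetty.util.thread.QueuedThreadPool\">",
            "    <Arg>",
            "      <New class=\"java.util.concurrent.ArrayBlockingQueue\">",
            "        <Arg type=\"int\">" + queueSize + "</Arg>",
            "      </New>",
            "    </Arg>",
            "    <Set name=\"minThreads\">" + minThreads(cores) + "</Set>",
            "    <Set name=\"maxThreads\">" + maxThreads(cores) + "</Set>",
            "  </New>",
            "</Set>");
    }

    public static void main(String[] args) {
        // availableProcessors() counts logical CPUs, including hyperthreads.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(threadPoolXml(cores, 100));
    }
}
```

Because availableProcessors() counts logical CPUs, the result can be higher than the number of physical cores on machines with hyperthreading; adjust by hand if that matters for your workload.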
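This custom jetty.xml configuration sets the thread count to between 4 and 6 and the size of the request queue to 100. Its class names come from Jetty 7/8 (for example SelectChannelConnector, which was removed in Jetty 9), so Fuseki versions that run on a newer Jetty need an adapted file: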
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
  "http://www.eclipse.org/jetty/configure.dtd">
<!--
  Reference: http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax
             http://wiki.eclipse.org/Jetty/Reference/jetty.xml
-->
<Configure id="Fuseki" class="org.eclipse.jetty.server.Server">
  <Call name="addConnector">
    <Arg>
      <!-- org.eclipse.jetty.server.nio.BlockingChannelConnector -->
      <!-- org.eclipse.jetty.server.nio.SelectChannelConnector -->
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <!-- BlockingChannelConnector specific:
        <Set name="useDirectBuffer">false</Set>
        -->
        <!-- Only listen to interface ...
        <Set name="host">localhost</Set>
        -->
        <Set name="port">3030</Set>
        <Set name="maxIdleTime">0</Set>
        <!-- All connectors -->
        <Set name="requestHeaderSize">65536</Set> <!-- 64*1024 -->
        <Set name="requestBufferSize">5242880</Set> <!-- 5*1024*1024 -->
        <Set name="responseBufferSize">5242880</Set> <!-- 5*1024*1024 -->
      </New>
    </Arg>
  </Call>
  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <!-- specify a bounded queue -->
      <Arg>
        <New class="java.util.concurrent.ArrayBlockingQueue">
          <Arg type="int">100</Arg>
        </New>
      </Arg>
      <Set name="minThreads">4</Set>
      <Set name="maxThreads">6</Set>
      <Set name="detailedDump">false</Set>
    </New>
  </Set>
</Configure>
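The bounded queue in this configuration is a plain java.util.concurrent.ArrayBlockingQueue with a fixed capacity of 100. The sketch below exercises that class on its own, outside Jetty, to show what "bounded" means: once the queue holds 100 items, the non-blocking offer() method returns false instead of letting the queue grow. What Jetty then does with a request it cannot queue depends on the Jetty version and is not shown here; the class name BoundedQueueDemo is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// BoundedQueueDemo (hypothetical): fills an ArrayBlockingQueue of Runnables,
// like the one passed to QueuedThreadPool above, and counts accepted items.
public class BoundedQueueDemo {
    static int countAccepted(int capacity, int attempts) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() does not block: it returns false when the queue is full.
            if (queue.offer(() -> { })) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Nothing consumes from the queue here, so it simply fills up.
        System.out.println("accepted " + countAccepted(100, 150) + " of 150");
    }
}
```

Running it prints `accepted 100 of 150`: the 50 surplus items are refused rather than added to an ever-growing backlog.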
Using a caching reverse proxy (e.g. Varnish or nginx) is recommended, either in front of Skosmos, in front of the SPARQL endpoint, or both. Most SPARQL queries performed by Skosmos are HTTP GET requests, which are easily cached.
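GET requests cache well because, with the SPARQL Protocol, the entire query is encoded into the request URL, which a proxy such as Varnish or nginx can use as its cache key. The sketch below builds such a URL; the endpoint http://localhost:3030/skosmos/sparql is a hypothetical example (use your own dataset's query URL), and the class name SparqlGetUrl is made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// SparqlGetUrl (hypothetical): builds a SPARQL Protocol GET URL by putting
// the URL-encoded query text into the "query" parameter.
public class SparqlGetUrl {
    // Hypothetical endpoint; replace with your Fuseki dataset's query URL.
    static final String ENDPOINT = "http://localhost:3030/skosmos/sparql";

    static String buildUrl(String endpoint, String query) {
        // URLEncoder uses form encoding: spaces become '+', '{' becomes %7B.
        return endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
        System.out.println(buildUrl(ENDPOINT, query));
    }
}
```

With a cache keyed on the full URL, only byte-identical query strings produce cache hits; two queries that differ merely in whitespace or variable names are cached separately.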