osma edited this page Sep 30, 2014 · 8 revisions

If you use Fuseki (with Skosmos or otherwise) in a production system, it can be useful to tune it for better performance and resilience. This page contains tips and advice for Fuseki tuning.

Setting JVM memory usage limits

By default, the Fuseki startup script sets the JVM option -Xmx1200M, which caps the Java heap of the Fuseki process at about 1.2 GB. This can be too low for production use and may cause Fuseki to run out of memory. The amount of memory required also grows with the number of requests processed in parallel, so if memory is scarce, it makes sense to limit the number of Jetty threads aggressively (see below).

You can adjust the -Xmx setting either by editing the startup script or by setting the JVM_ARGS environment variable.

Finto.fi uses -Xmx8G on a machine with 16GB of memory, with the Jetty thread pool set to 4-6 threads.
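
As a minimal sketch of the environment-variable route, the snippet below sets JVM_ARGS before Fuseki is started. The 8G value is simply the Finto.fi figure quoted above. The launch command is left commented out, and its dataset location and name (/path/to/tdb, /ds) are placeholders rather than values from this page.

```shell
# Override the default -Xmx1200M heap limit via the JVM_ARGS environment variable.
# 8G mirrors the Finto.fi setup on a 16GB machine; adjust it to your own hardware.
export JVM_ARGS="-Xmx8G"
echo "Fuseki will be started with JVM_ARGS=$JVM_ARGS"

# Start Fuseki from the same shell so that it inherits the variable.
# The dataset location and name below are placeholders (assumptions):
# ./fuseki-server --loc=/path/to/tdb /ds
```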

Tuning Jetty

When many requests arrive in parallel, Fuseki is easily overwhelmed: memory consumption grows and the JVM garbage collector starts thrashing. By default, Jetty (the servlet container for Fuseki) places no limit on the number of parallel requests. The queue for incoming requests is also unbounded, so even after the load eases there may be a long backlog of requests to process. See the Jetty thread pool tuning documentation for more information.

Fuseki can use a custom Jetty configuration (given with the --jetty-config=jetty.xml parameter) that limits the thread count and the size of the queue for waiting requests. A Jetty thread count close to the number of CPU cores in the system makes the most sense, because queries are generally CPU bound: the TDB database usually fits in the disk cache. A higher thread count will generally just increase Fuseki's memory consumption without improving performance.
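
Since the recommended thread count follows the number of CPU cores, it can help to check that number on the server first. The following is only a sketch: it assumes getconf supports _NPROCESSORS_ONLN (widely available on Linux and macOS), and the variable name CORES is made up for illustration.

```shell
# Count the CPU cores currently online, as a starting point for the Jetty thread limit.
# Fall back to 1 if getconf cannot report the value on this system.
CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "CPU cores: $CORES (candidate value for maxThreads)"
```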

This custom jetty.xml configuration sets the thread count to between 4 and 6 and the size of the request queue to 100:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
"http://www.eclipse.org/jetty/configure.dtd">
 
<!-- 
  Reference: http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax
  http://wiki.eclipse.org/Jetty/Reference/jetty.xml
-->

<Configure id="Fuseki" class="org.eclipse.jetty.server.Server">
  <Call name="addConnector">
    <Arg>
      <!-- org.eclipse.jetty.server.nio.BlockingChannelConnector -->
      <!-- org.eclipse.jetty.server.nio.SelectChannelConnector -->
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <!-- BlockingChannelConnector specific:
             <Set name="useDirectBuffer">false</Set>
        -->
        <!-- Only listen to interface ...
        <Set name="host">localhost</Set>
        -->
        <Set name="port">3030</Set>
        <Set name="maxIdleTime">0</Set>
        <!-- All connectors -->
        <Set name="requestHeaderSize">65536</Set>       <!-- 64*1024 -->
        <Set name="requestBufferSize">5242880</Set>     <!-- 5*1024*1024 -->
        <Set name="responseBufferSize">5242880</Set>    <!-- 5*1024*1024 -->
      </New>
    </Arg>
  </Call>
  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <!-- Bounded queue: at most 100 requests wait for a free thread -->
      <Arg>
        <New class="java.util.concurrent.ArrayBlockingQueue">
          <Arg type="int">100</Arg>
        </New>
      </Arg>
      <!-- Keep the thread count close to the number of CPU cores -->
      <Set name="minThreads">4</Set>
      <Set name="maxThreads">6</Set>
      <Set name="detailedDump">false</Set>
    </New>
  </Set>
</Configure>
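
To put the configuration into use, Fuseki is started with the --jetty-config parameter mentioned above. The sketch below assumes the file is saved as jetty.xml next to the fuseki-server script. The dataset location and name (/path/to/tdb, /ds) are placeholders, not values from this page.

```shell
# Path to the custom Jetty configuration (assumed to be in the current directory).
JETTY_CONFIG="jetty.xml"

# Launch Fuseki only if both the startup script and the config file are present.
# The dataset location and name (/path/to/tdb, /ds) are placeholders.
if [ -x ./fuseki-server ] && [ -f "$JETTY_CONFIG" ]; then
  ./fuseki-server --jetty-config="$JETTY_CONFIG" --loc=/path/to/tdb /ds
else
  echo "fuseki-server or $JETTY_CONFIG not found in $(pwd)" >&2
fi
```

Note that the port is then taken from the connector defined in jetty.xml (3030 in the example).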

HTTP caching

Using a caching reverse proxy (e.g. Varnish or nginx) is recommended, either in front of Skosmos, in front of the SPARQL endpoint, or both. Most SPARQL queries performed by Skosmos are HTTP GET requests, which are easily cached.
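
As a rough illustration of the nginx option, the snippet below writes a small caching proxy configuration to a scratch file for review. Everything specific in it is an assumption: the endpoint path /ds/sparql, the cache directory, the cache sizes and the 10-minute lifetime are illustrative, and the file is meant to be included from the http block of an existing nginx configuration.

```shell
# Write a sketch of an nginx caching proxy config to a scratch file for review.
CONF_FILE="${TMPDIR:-/tmp}/fuseki-cache.conf"

cat > "$CONF_FILE" <<'EOF'
# Cache storage: 10 MB of keys, up to 1 GB of responses (illustrative sizes).
proxy_cache_path /var/cache/nginx/sparql levels=1:2 keys_zone=sparql:10m max_size=1g;

server {
    listen 80;

    # /ds/sparql is a placeholder for the dataset's SPARQL query endpoint.
    location /ds/sparql {
        proxy_pass http://localhost:3030;
        proxy_cache sparql;
        proxy_cache_methods GET HEAD;   # cache only safe requests
        proxy_cache_valid 200 10m;      # keep successful results for 10 minutes
    }
}
EOF

echo "Wrote $CONF_FILE"
```

A short lifetime keeps cached results reasonably fresh after vocabulary updates. The right value depends on how often the data changes.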