Converts Netwitness log parser configuration to Logstash configuration
Disclaimer: Vincent Maury or Elastic cannot be held responsible for the use of this script! Use it at your own risk.
The purpose of this tool is to convert an existing configuration made for RSA Netwitness Log Parser software (ingestion piece of the RSA SIEM) into a Logstash configuration that can ingest logs to Elasticsearch.
RSA uses one configuration file per device source (product). For example, one file will handle F5 ASM, another one will handle F5 APM, etc.
Please note that RSA released the configuration files for 300 devices on github with the Apache 2.0 license. So if you are not an RSA user, you can still pass any of these configuration files to the rsa2elk tool to generate the corresponding Logstash pipeline.
These instructions will get you a copy of the project up and running on your local machine.
This script has no prerequisite other than Python 3. It should work on any platform (tested on Windows so far). No additional library is needed.
Just clone this repository and run the script.
git clone https://github.com/blookot/rsa2elk
cd rsa2elk
python rsa2elk.py -h
The script has several options:
- -h will display help.
- -i or --input-file FILE to enter the absolute path to the RSA XML configuration file. The alternative is --url.
- -u or --url URL to enter the URL to the RSA XML configuration file. If no file or URL is provided, this program will run on a sample XML file located in the RSA repo.
- -o or --output-file FILE to enter the absolute path to the Logstash .conf file (default: logstash-[device].conf).
- -c or --check-config runs a check of the generated configuration with logstash -f (default: false).
- -l or --logstash-path to enter the absolute path to the logstash bin executable (default is my local path!).
- -n or --no-grok-anchors removes the beginning (^) and end ($) anchors in grok (default: false, i.e. the anchors are kept).
- -a or --add-stop-anchors adds hard stop anchors in grok to ignore in-between chars, see explanation below. Should be set as a series of plain characters, only escaping " and \. Example: \"()[] (default: "").
- -s or --single-space-match to only match 1 space in the log if there is 1 space in the RSA parser (default: false, i.e. match 1-N spaces, aka [\s]+).
- -p or --parse-url adds a pre-defined filter block (see filter-url.conf) to parse URLs into domain, query, etc. (default: false).
- -q or --parse-ua adds a pre-defined filter block (see filter-ua.conf) to parse User Agents (default: false).
- -r or --remove-parsed-fields removes the event.original and message fields if correctly parsed (default: false).
- -d or --debug to enable debug mode, more verbose (default: false).
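As an illustration of how these options combine, here is a minimal Python sketch that assembles a command line using only the flags documented above. The helper name build_rsa2elk_command and the /tmp paths are hypothetical, chosen for the example; the sketch only builds and prints the command, it does not run it.

```python
import shlex
import sys


def build_rsa2elk_command(input_file, output_file=None,
                          parse_url=False, parse_ua=False, debug=False):
    """Assemble an rsa2elk.py command line from the documented flags.

    Hypothetical helper: only -i, -o, -p, -q and -d are covered here.
    """
    cmd = [sys.executable, "rsa2elk.py", "-i", input_file]
    if output_file:
        cmd += ["-o", output_file]
    if parse_url:
        cmd.append("-p")
    if parse_ua:
        cmd.append("-q")
    if debug:
        cmd.append("-d")
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_rsa2elk_command("/tmp/device.xml",
                            output_file="/tmp/logstash-device.conf",
                            parse_url=True)
print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

The resulting list can be passed to subprocess.run from the directory where the repository was cloned.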
The tool mostly generates the filter part of the Logstash configuration. The input and output sections are copied from the input.conf and output.conf files that you can customize.
Note: the filter-url.conf file adds a section at the end of the Logstash configuration to deal with urls. The filter-ua.conf parses user agents. Both files can be customized, partially commented... In particular, the user-agent parsing can be resource intensive.
You can grab the logstash-[device].conf file (or the custom name you defined) generated by this script.
When the check-config flag has been activated, this configuration file is automatically tested by Logstash. The output of Logstash can be checked in the output-logstash-[device]-configtest.txt file that is created in the same directory as the RSA XML input file.
RSA Netwitness Log Parser is the piece of software ingesting data in the Netwitness platform. It comes with a nice UI (see the user guide). Elastic also provides 2 ways to ingest data into Elasticsearch: Logstash - as an ETL - and the Elasticsearch ingest pipelines. This tool focuses on Logstash, as a way to ease ingest (capturing data via syslog, files, etc and writing to elasticsearch or other destinations) but the plan is to port this tool to the Elasticsearch ingest pipeline (leveraging Filebeat as syslog termination).
The syntax of the XML configuration file is specific to RSA and falls into 2 parts mainly:
- headers, describing headers of logs, capturing the first fields that are common to many types of messages. These headers then point (using the messageid field) to the appropriate message parser
- messages, parsing the whole log line, extracting fields, computing the event time (EVNTTIME function), concatenating strings and fields to generate new ones (STRCAT function), setting additional fields with static or dynamic values, etc.
In both, the content attribute describes how the log is parsed. The syntax supports alternatives {a|b}, field extraction <fld1> and static strings.
The transform.py module does the core of the conversion by reading this content line character by character and computing the corresponding grok pattern.
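To make the character-by-character reading concrete, here is a minimal Python sketch of a tokenizer for the content syntax. It is not the code of transform.py: the function name tokenize and the token kinds are assumptions made for this example, and it assumes alternatives are not nested.

```python
def tokenize(content):
    """Split an RSA-style content string into (kind, value) tokens.

    Illustrative only: assumes well-formed input and non-nested alternatives.
    """
    tokens = []
    static = []  # characters of the static string being accumulated

    def flush_static():
        # Emit the pending static text, if any, as a single token.
        if static:
            tokens.append(("static", "".join(static)))
            static.clear()

    i = 0
    while i < len(content):
        ch = content[i]
        if ch == "<":  # field extraction, e.g. <fld1>
            end = content.index(">", i)
            flush_static()
            tokens.append(("field", content[i + 1:end]))
            i = end + 1
        elif ch == "{":  # alternatives, e.g. {a|b}
            end = content.index("}", i)
            flush_static()
            tokens.append(("alternatives", content[i + 1:end].split("|")))
            i = end + 1
        else:  # anything else is static text
            static.append(ch)
            i += 1
    flush_static()
    return tokens


tokens = tokenize("<fld1> {a|b} end")
print(tokens)
```

Each token kind can then be translated separately: fields into named captures, alternatives into a regex alternation, and static strings into escaped literals.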
The whole idea of the grok pattern is to capture fields with any character but the one after the field. For example, <fld1> <fld2> in RSA will result in (?<fld1>[^\s]*)[\s]+(?<fld2>.*) in grok. Note that the [\s]+ in the middle is quite permissive because many products use several spaces to tabularize their logs. The -s flag can be used to change this behavior to strictly match the log according to the exact number of spaces in the RSA configuration. This flag will replace the [\s]+ by a simple \s.
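The rule above can be sketched in a few lines of Python. This is a simplified illustration, not the actual logic of transform.py: the function name content_to_grok and its single_space parameter are assumptions, alternatives are not handled, and fields are assumed to be separated by static text. Since Python's re module writes named groups as (?P<name>...), the grok-style (?<name>...) syntax is converted before testing the pattern.

```python
import re


def content_to_grok(content, single_space=False):
    """Build a grok-style pattern from fields and static text only."""
    # Split while keeping the <field> markers as separate items.
    parts = [p for p in re.split(r"(<[^>]+>)", content) if p]
    out = []
    for idx, part in enumerate(parts):
        if part.startswith("<") and part.endswith(">"):
            name = part[1:-1]
            nxt = parts[idx + 1] if idx + 1 < len(parts) else None
            if nxt is None:
                # Last field: capture the rest of the line.
                out.append(f"(?<{name}>.*)")
            else:
                # Capture any character but the one following the field.
                stop = r"\s" if nxt[0].isspace() else re.escape(nxt[0])
                out.append(f"(?<{name}>[^{stop}]*)")
        else:
            # Static text: whitespace runs become a space pattern,
            # everything else is escaped literally.
            for chunk in re.split(r"(\s+)", part):
                if not chunk:
                    continue
                if chunk.isspace():
                    out.append(r"\s" * len(chunk) if single_space else r"[\s]+")
                else:
                    out.append(re.escape(chunk))
    return "".join(out)


grok = content_to_grok("<fld1> <fld2>")
print(grok)

# Convert to Python syntax to try the pattern on a sample line.
match = re.match(grok.replace("(?<", "(?P<"), "aaa   bbb ccc")
print(match.groupdict())
```

With single_space=True, the same input produces \s in place of [\s]+, mirroring the -s flag.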
RSA can also handle missing fields when reading specific characters. For example, this RSA parser <fld1> "<fld99>" will match both aaa "zzz" (where fld1='aaa') and aaa bbb "zzz" (where fld1='aaa bbb').
The -a flag lets the user input specific characters that will serve as anchors, so that when they are found, the grok will jump over the unexpected fields. Using the above example, the grok will look like (?<fld1>[^\s]*)[\s]+(?<anchorfld>[^\"]*)\"(?<fld99>[^\"]+)\". Please note that we are adding an anchorfld field to capture the possible characters before the anchor, so for aaa bbb "zzz", the anchorfld field will only have 'bbb'. Which is what you would expect I think ;-)
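The anchored pattern can be tried out in Python as a rough check. The sketch below copies the grok pattern from the paragraph above and converts the named groups to Python's (?P<name>...) syntax; the helper name parse is hypothetical. Note that in this plain regex the raw anchorfld capture keeps the whitespace that precedes the quote, so the sketch strips the values; how Logstash or the generated configuration treats that whitespace is not covered here.

```python
import re

# Pattern from the text, with a double quote used as the stop anchor.
GROK = r'(?<fld1>[^\s]*)[\s]+(?<anchorfld>[^\"]*)\"(?<fld99>[^\"]+)\"'
PY_PATTERN = re.compile(GROK.replace("(?<", "(?P<"))


def parse(line):
    """Return the captured fields (whitespace-stripped), or None if no match."""
    m = PY_PATTERN.match(line)
    if m is None:
        return None
    return {name: value.strip() for name, value in m.groupdict().items()}


short = parse('aaa "zzz"')
longer = parse('aaa bbb "zzz"')
print(short)
print(longer)
```

In both cases fld1 and fld99 receive the same values; only anchorfld differs, absorbing the unexpected text before the quote.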
Note: dissect (see documentation) is faster and easier to read but doesn't support alternatives. Could be an improvement though.
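As a sketch of what such an improvement might look like, the Python snippet below turns a content line without alternatives into a dissect mapping string using the %{field} syntax. The function name content_to_dissect is hypothetical and this is not part of the tool. Keep in mind that dissect matches delimiters literally, so it is stricter about spacing than the [\s]+ used in the grok output.

```python
import re


def content_to_dissect(content):
    """Convert <field> markers to dissect %{field} keys.

    Illustrative only: rejects content lines that use alternatives,
    since dissect cannot express them.
    """
    if "{" in content or "}" in content:
        raise ValueError("alternatives are not supported by dissect")
    return re.sub(r"<([^>]+)>", r"%{\1}", content)


mapping = content_to_dissect('<fld1> "<fld99>"')
print(mapping)
```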
RSA uses specific field names in the configuration files that map to meta keys, as described here.
Elastic also defined a set of meta fields called ECS, see documentation.
The rsa2ecs.txt file is used to map RSA meta fields to ECS naming (as well as field types).
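Conceptually, the mapping amounts to a lookup from RSA meta key to ECS field name and type. The Python sketch below uses an in-memory dictionary standing in for rsa2ecs.txt, whose actual format is not shown here; the three entries and the helper name rename_fields are illustrative assumptions, not contents of the real file.

```python
# Hypothetical entries: RSA meta key -> (ECS field name, field type).
RSA_TO_ECS = {
    "saddr": ("source.ip", "ip"),
    "daddr": ("destination.ip", "ip"),
    "username": ("user.name", "keyword"),
}


def rename_fields(event, mapping=RSA_TO_ECS):
    """Rename the keys of a parsed event; unmapped keys are kept as-is."""
    renamed = {}
    for key, value in event.items():
        ecs_name, _field_type = mapping.get(key, (key, None))
        renamed[ecs_name] = value
    return renamed


event = rename_fields({"saddr": "10.0.0.1", "custom": "x"})
print(event)
```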
There are still a few ideas to improve rsa2elk:
- for content lines that don't use alternatives, generate a dissect instead of a grok
- input a custom ecat.ini (RSA customers)
- input a custom table-map.xml and table-map-custom.xml (RSA customers)
- support additional custom enrichment with external files (RSA customers)
- generate the Elasticsearch index mapping (template) based on the ecs map
- use the DEVICEMESSAGES in the XML file to set the device name and group
- port this converter to Elasticsearch ingest pipeline (see documentation), especially since Elasticsearch 7.5 added an enrichment processor
- Vincent Maury - Initial commit - blookot
This project is licensed under the Apache 2.0 License - see the LICENSE.md file for details
- First things first, I should thank RSA for sharing such content and helping the community with great resources!
- Many thanks to my Elastic colleagues for their support, in particular @andsel, @jsvd and @yaauie from the Logstash team, as well as @webmat and @melvynator for the ECS mapping
- Thanks also to my dear who let me work nights and weekends on this project :-*