© Copyright 2016 EMC Corporation.

Maximizing NetWitness Performance


Sean Ennis – Principal SE, Seattle

#RSACharge © Copyright 2016 EMC Corporation.


Agenda
• Overall Concept
• Optimizing Usage
– Processing Pipeline
– Feeds and App Rules and Lists, Oh My.
• Optimizing the Datastore (mostly index)
– Database & Data Flow
– Index, Index, and more Index
• Group Aggregation
• Monitoring Performance – Case Study



Overall Concept
Disclaimer: There are lots of knobs to turn, and RSA tries to minimize the need to turn them. This
presentation focuses on the most common concepts. If you are having serious performance issues,
please engage your friendly RSA {SE, PS, CS} representative.

It might help to think of performance optimization in 3.5 categories:

- Usage: processing pipeline, best practices, etc.
- Datastore Tuning: IndexDB
- Capture Tuning: parser load, aggregation delays, packet drops, etc.



Optimizing Usage



Query Architecture

Note: 10.4+ uses “partial results”, so results are loaded as they come in. Feels faster.

A query is not complete until all constituent concentrators/brokers return their results. So one slow
concentrator can ruin the whole party.
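To make the fan-out behaviour concrete, here is a minimal Python sketch. It assumes a broker sends one query to each concentrator in parallel and that per-device response times are already known; the device names and timings are hypothetical. With partial results, rows appear as each device answers, but the query is only complete when the slowest one finishes.

```python
from typing import Dict, List, Tuple


def broker_timeline(latencies: Dict[str, float]) -> Tuple[List[Tuple[str, float]], float, str]:
    """Model a broker fanning one query out to several concentrators in parallel.

    `latencies` maps a concentrator name to its response time in seconds.
    Returns the arrival order of partial results, the completion time of the
    whole query, and the device that determined that completion time.
    """
    if not latencies:
        raise ValueError("a broker query needs at least one concentrator")
    # With partial results (10.4+), each device's rows appear as soon as it answers.
    arrivals = sorted(latencies.items(), key=lambda item: item[1])
    # The query as a whole is only complete when the slowest device has answered.
    slowest_device, completion = arrivals[-1]
    return arrivals, completion, slowest_device


if __name__ == "__main__":
    # Hypothetical timings: two log concentrators and one slow packet concentrator.
    order, done, culprit = broker_timeline({"LC1": 0.4, "LC2": 0.6, "PC1": 12.0})
    for device, seconds in order:
        print(f"partial results from {device} at {seconds:.1f}s")
    print(f"query complete at {done:.1f}s (waiting on {culprit})")
```

In this model, keeping a log-only query away from PC1 would bring completion down from 12.0 s to 0.6 s.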



Query Architecture (cont.)

Takeaways:

- Be aware of how many concentrators your query touches (log data only? Then there is no need to query packet concentrators).
- Turn on Debug in Investigation.
- In multi-site/large environments, consider using Brokers to break concentrators up into queryable groups.



Processing Pipeline
(slightly simplified)

Pre-processing:
- Decoder: writes packets/logs (raw data)
- Concentrator: writes meta & index

Post-processing:
- Reporting Engine: lists, reports, and RE alerts, all of which are queries
- Investigation: queries that consume meta/session ranges
- ESA: meta aggregation + stream processing

Data stores: Raw | Meta | Index



Operator Impact

Note: Optimizations were made in 10.5 to the underlying logic engine (OR).

Takeaways:

- Move as much as possible to pre-processing. App rules and feeds are your best friends: they result in single keys to query.
- Use feeds instead of Reporting Engine lists whenever possible (RE lists effectively break up into many logical OR statements).
- Don't use meta groups with ALL keys open. Break the problem down and open the minimal number of keys to start (every open key is a query).
- Use smaller, more specific meta groups.
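The point about lists can be illustrated with a small Python sketch that mimics how a list turns into one OR term per entry, compared with a single-key condition once a feed or app rule has tagged the same sessions at ingest. The `critical.host` key, the addresses, and the exact query strings are hypothetical and only approximate the query syntax used elsewhere in this deck.

```python
from typing import Iterable


def list_as_or_clause(key: str, values: Iterable[str]) -> str:
    """Expand a list into one equality test per entry, joined with ||."""
    terms = [f"{key}={value}" for value in values]
    if not terms:
        raise ValueError("cannot build a clause from an empty list")
    return "(" + " || ".join(terms) + ")"


def single_key_clause(key: str, value: str) -> str:
    """Once a feed or app rule has tagged sessions at ingest, one condition suffices."""
    return f"{key}='{value}'"


if __name__ == "__main__":
    critical_hosts = ["10.1.1.10", "10.1.1.11", "10.1.2.20", "10.1.3.30"]
    or_clause = list_as_or_clause("ip.dst", critical_hosts)
    print(or_clause)
    print(or_clause.count("||") + 1, "conditions per evaluation")
    # 'critical.host' is a hypothetical key that a feed would populate.
    print(single_key_clause("critical.host", "yes"))
```

The OR clause grows with every list entry, while the tagged version stays a single lookup no matter how many hosts the feed covers.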



Ex. App rules
Use case: Very frequently looking for users logging in to certain hosts with admin accounts from a particular subnet.

Investigator query (post-processing) with no pre-processing app rule:

ip.src=172.16.14.5/16 && user.dst begins 'adm' && host.dst begins 'dc'

Instead, what about creating an app rule (Admin -> Decoder -> Config -> App Rules) to move the processing earlier in the pipeline and create a single meta value?

New Investigator query (or RE rule) to get the same data:

alert='tag_abnormaladminlogin'

Note: This could be optimized even further by using a FEED to track admin accounts and critical hosts, which would save logic processing.
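The difference between post-processing and pre-processing can be sketched in Python. This is not how a Decoder implements app rules; it only mimics the idea that the rule's conditions are evaluated once per session at ingest and leave behind a single `alert` value, so later queries become one equality test. Sessions are modelled as plain dicts (a hypothetical representation), and the rule reuses the subnet and prefixes from the query above.

```python
import ipaddress
from typing import Any, Dict, List

# Same subnet as the query above; strict=False turns 172.16.14.5/16 into 172.16.0.0/16.
ADMIN_SUBNET = ipaddress.ip_network("172.16.14.5/16", strict=False)
TAG = "tag_abnormaladminlogin"


def apply_app_rule(session: Dict[str, Any]) -> None:
    """Ingest-time sketch: evaluate the three conditions once and tag the session."""
    src = ipaddress.ip_address(session.get("ip.src", "0.0.0.0"))
    user = session.get("user.dst", "")
    host = session.get("host.dst", "")
    # str.startswith is case-sensitive; this sketch does not model 'begins' exactly.
    if src in ADMIN_SUBNET and user.startswith("adm") and host.startswith("dc"):
        session.setdefault("alert", []).append(TAG)


def query_alert(sessions: List[Dict[str, Any]], value: str) -> List[int]:
    """Query-time sketch: a single equality test on one key."""
    return [s["id"] for s in sessions if value in s.get("alert", [])]


if __name__ == "__main__":
    sessions = [
        {"id": 1, "ip.src": "172.16.14.5", "user.dst": "admin1", "host.dst": "dc01"},
        {"id": 2, "ip.src": "10.0.0.5", "user.dst": "admin1", "host.dst": "dc01"},
        {"id": 3, "ip.src": "172.16.99.1", "user.dst": "bob", "host.dst": "dc02"},
    ]
    for s in sessions:
        apply_app_rule(s)
    print(query_alert(sessions, TAG))  # [1]
```

The three-part condition is paid once per session when it arrives; every later query only checks for the tag.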



Ex. Feeds vs Lists
Use case: Daily report for traffic to/from a list of critical internal hosts
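For the critical-hosts report, one option is to maintain the host list as a feed rather than an RE list. The Python sketch below generates such a file under an assumed layout: column 1 is the IP address the feed definition would match against ip.src/ip.dst, and column 2 is a label written to a hypothetical `critical.host` key. The column-to-key mapping itself lives in the feed definition, not in this file, and the sample hosts are made up.

```python
import csv
import io
import ipaddress
from typing import Iterable, Tuple


def build_feed_csv(hosts: Iterable[Tuple[str, str]]) -> str:
    """Return feed CSV text with one 'ip,label' row per critical host (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for ip, label in hosts:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        writer.writerow([ip, label])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_feed_csv([("10.1.1.10", "dc01"), ("10.1.1.11", "payroll-db")]), end="")
```

With the feed applied at ingest, the daily report's WHERE clause can test one key (for example, critical.host exists) instead of an OR over every address.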



Optimizing the Datastore



Databases

- sessionDB: meta and packet IDs, plus a few stats
- packetDB: raw logs or packets
- metaDB: metadata
- indexDB: the query engine (ValueMap, SummaryDB, PageDB)



Databases by Decoder/Concentrator
Decoder (raw data):
- sessionDB
- packetDB
- metaDB (temp)
- indexDB (‘time’ only)

Concentrator (metadata):
- sessionDB
- metaDB
- indexDB

Most important for our purposes are:

- packetDB (Decoder)
- metaDB (Concentrator)
- indexDB (Concentrator) ** Heavily impacts performance



Data Model

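Each session record holds:
- Meta ID Start and Meta ID End: the bounds of the session's meta array (1-n entries, stored sequentially)
- Packet/Log ID Start: the session's first packet (e.g., a session made up of packets 1, 4, 12, and 127) or, for log data, its single log (e.g., Log 1)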
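The session record can be sketched as a Python dataclass. The field names are hypothetical and the packet list is a simplification of however the real packetDB links packets; the point is that a session stores only ranges and pointers: a contiguous range of meta IDs plus the IDs of its packets, which need not be contiguous, or a single log ID.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Illustrative session record: pointers and ranges, not the data itself."""
    session_id: int
    meta_id_start: int  # first entry of this session's meta array
    meta_id_end: int    # last entry; the meta array is stored sequentially
    packet_ids: List[int] = field(default_factory=list)  # packets, or one log ID

    @property
    def packet_id_start(self) -> int:
        # The record points at the first packet (or the single log).
        return self.packet_ids[0]

    def meta_ids(self) -> range:
        # Contiguous, so a range is enough to locate all of the session's meta.
        return range(self.meta_id_start, self.meta_id_end + 1)


if __name__ == "__main__":
    packet_session = Session(7, meta_id_start=100, meta_id_end=109, packet_ids=[1, 4, 12, 127])
    log_session = Session(8, meta_id_start=110, meta_id_end=118, packet_ids=[1])  # a single log
    print(len(packet_session.meta_ids()), packet_session.packet_id_start)  # 10 1
    print(len(log_session.meta_ids()))  # 9
```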



Indexing – the DB

- One file per key, per slice.

The indexDB is made up of three parts:
- Value maps: contain all unique values seen during that period (up to the defined valueMax). For each value, there's a link to the summaryDB.
- summaryDB: for each unique value, contains various counts/stats and a link to the pageDB.
- pageDB: compressed storage of session IDs, used to locate the actual sessions that reference a meta key/value.
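The lookup path through one key's index in one slice can be sketched in Python as three dictionaries: the value map points to a summary entry, and the summary entry carries stats plus a pointer to the page of session IDs. The structures and names are illustrative assumptions, not the on-disk format (pageDB stores session IDs compressed; this sketch uses plain lists).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Summary:
    session_count: int  # stands in for the "various counts/stats"
    page_id: int        # link into pageDB


@dataclass
class KeySlice:
    """Index for one meta key in one slice (illustrative, uncompressed)."""
    value_map: Dict[str, int]       # value -> summary id
    summary_db: Dict[int, Summary]  # summary id -> stats + page link
    page_db: Dict[int, List[int]]   # page id -> session IDs

    def count_for(self, value: str) -> int:
        # In this model a count needs only the value map and summary, not the pages.
        summary_id = self.value_map.get(value)
        return 0 if summary_id is None else self.summary_db[summary_id].session_count

    def sessions_for(self, value: str) -> List[int]:
        summary_id = self.value_map.get(value)
        if summary_id is None:
            return []  # value not in this slice's value map
        return self.page_db[self.summary_db[summary_id].page_id]


if __name__ == "__main__":
    host_dst = KeySlice(
        value_map={"dc01": 0, "dc02": 1},
        summary_db={0: Summary(3, 10), 1: Summary(1, 11)},
        page_db={10: [5, 9, 42], 11: [17]},
    )
    print(host_dst.count_for("dc01"), host_dst.sessions_for("dc01"))  # 3 [5, 9, 42]
```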



Indexing – the DB (cont.)



Indexing – the DB. SLICES.
- NW holds the current slice in memory (fast) but needs to flush (save) it to disk after a period of time OR a number of sessions.
- Pre-10.5: a scheduled job saves a slice every 8 hours. Slice = /var/netwitness/concentrator/index/managed-values-X
- Post-10.5: a slice is saved every 600,000,000 sessions.
- Note: If upgraded through 10.5, the default 8-hour schedule persists. Fresh 10.5+ installs default to session-count saves.
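The two save triggers can be expressed as a small Python sketch. It assumes a slice is flushed when either the configured interval has elapsed (the 8-hour schedule) or the session count reaches the threshold (600,000,000 sessions); the class and parameter names are hypothetical, and a real concentrator would normally be set up with one trigger or the other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlicePolicy:
    max_age_hours: Optional[float] = None  # e.g. 8 for the scheduled-save style
    max_sessions: Optional[int] = None     # e.g. 600_000_000 for count-based saves

    def should_flush(self, age_hours: float, sessions_in_slice: int) -> bool:
        """True when the in-memory slice should be saved to disk as a new slice."""
        if self.max_age_hours is not None and age_hours >= self.max_age_hours:
            return True
        if self.max_sessions is not None and sessions_in_slice >= self.max_sessions:
            return True
        return False


if __name__ == "__main__":
    time_based = SlicePolicy(max_age_hours=8)
    count_based = SlicePolicy(max_sessions=600_000_000)
    # A low-volume device: 2 million sessions after 8 hours.
    print(time_based.should_flush(8, 2_000_000))   # True: a new, small slice
    print(count_based.should_flush(8, 2_000_000))  # False: keeps filling the slice
```

On a quiet device the time trigger keeps producing small slices, which is exactly what drives the slice count up.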




Indexing – Optimizations
(1) Index at the right level

- IndexKeys: optimized for exists/!exists conditions
- IndexValues: optimized for searches/comparisons of actual values
- IndexNone: key defined, but no index

If a key needs to be searched often, you likely need IndexValues.

In Investigator, you can still manually query values where the index level = IndexKeys, but it will be SLOW.

Note: For Reporting Engine rules, meta in the “WHERE” clause must be indexed at some level; meta in the “SELECT” clause does not need to be indexed.

PSA: Do NOT index “msg”!!
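The index-level guidance can be turned into a quick sanity check. The Python sketch below takes a hypothetical mapping of meta keys to index levels plus the conditions used in a WHERE clause, and flags conditions that are unindexed or will fall back to a slow scan. The operator grouping follows the rules of thumb on this slide (exists/!exists can be answered at IndexKeys; value comparisons want IndexValues) and is not an exhaustive list of query operators.

```python
from typing import Dict, List, Tuple

LEVEL_RANK = {"IndexNone": 0, "IndexKeys": 1, "IndexValues": 2}
KEY_ONLY_OPS = {"exists", "!exists"}  # answerable from an IndexKeys index


def check_where(index_levels: Dict[str, str], conditions: List[Tuple[str, str]]) -> Dict[str, str]:
    """Classify each (key, operator) WHERE condition as ok, slow, or not indexed."""
    verdicts: Dict[str, str] = {}
    for key, op in conditions:
        rank = LEVEL_RANK.get(index_levels.get(key, "IndexNone"), 0)
        needed = LEVEL_RANK["IndexKeys"] if op in KEY_ONLY_OPS else LEVEL_RANK["IndexValues"]
        if rank == 0:
            verdict = "not indexed: must be indexed to use in a WHERE clause"
        elif rank < needed:
            verdict = "slow: value comparison on an IndexKeys key"
        else:
            verdict = "ok"
        verdicts[f"{key} {op}"] = verdict
    return verdicts


if __name__ == "__main__":
    levels = {"ip.src": "IndexValues", "user.dst": "IndexKeys", "msg": "IndexNone"}
    where = [("ip.src", "="), ("user.dst", "begins"), ("user.dst", "exists"), ("msg", "contains")]
    for condition, verdict in check_where(levels, where).items():
        print(f"{condition:<20} {verdict}")
```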



Indexing – Optimizations
(2) Keep the slice count LOW (~200-300?)

- Installed at >= 10.6: defaults to 600M-session slices instead of time-based slices.
- Installed at <= 10.5: defaults to 8 hours; you must change the setting & remove the scheduled job (concentrator -> files -> scheduler).

Any low-volume devices initially installed at 10.5 or earlier?
- 1 slice every 8 hours. 300 days of metadata = 900 slices (~1000) = SLOW.
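The arithmetic behind that slice count is easy to reproduce. The Python sketch below compares time-based slicing (one slice every 8 hours) with count-based slicing (one slice per 600M sessions) over a retention period; the daily session rates are hypothetical inputs, and counts are rounded up to whole slices.

```python
import math


def slices_time_based(retention_days: float, hours_per_slice: float = 8) -> int:
    """Slices on disk when a slice is saved every `hours_per_slice` hours."""
    return math.ceil(retention_days * 24 / hours_per_slice)


def slices_count_based(retention_days: float, sessions_per_day: int,
                       sessions_per_slice: int = 600_000_000) -> int:
    """Slices on disk when a slice is saved every `sessions_per_slice` sessions."""
    return math.ceil(retention_days * sessions_per_day / sessions_per_slice)


if __name__ == "__main__":
    print(slices_time_based(300))                # 900, regardless of volume
    print(slices_count_based(300, 5_000_000))    # 3 for a low-volume device
    print(slices_count_based(300, 500_000_000))  # 250 for a busy one
```

Time-based slicing ties the slice count to retention alone, while count-based slicing ties it to volume, which keeps quiet devices at a handful of slices.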



Indexing – Optimizations
(2) Keep the slice count LOW (~200-300?) (cont.)

What can you do if the slice count is high?

1) Age out data for low-volume devices if you can. On 10.5+, a timeroll on the metaDB will truncate the index after the next index save point.
2) Orphaned slices? Open a support ticket and delete the files.
3) On >= 10.6, make sure slicing is configured by session count and remove the time-based slice save schedule.



Indexing – Optimizations
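(3) # unique values per key, per slice < valueMax

If the number of unique values for a key in a slice exceeds the configured valueMax, the values seen after the limit is reached are not indexed in that slice and become unsearchable.

<key description="ACME Location" format="Text" level="IndexValues" name="acme.loc" valueMax="5"/>

slice1 (value: session IDs with that value):
Atlanta: 1-5,21 | New York: 6,7,50-51 | Seattle: 8,24 | LA: 11-16,18 | Cleveland: 25,27,28 | Miami: 29-32 | Chicago: 45

slice2 (value: session IDs with that value):
Seattle: 76,77 | Chicago: 79,81 | LA: 85-90 | New York: 82,90-92 | Cleveland: 83-84 | Atlanta: 86 | Miami: 99,101-103

Query> acme.loc = 'Miami'     Result Session IDs = NIL
Query> acme.loc = 'Chicago'   Result Session IDs = 79, 81

With valueMax=5, only the first five values in each slice are indexed: Miami is past the limit in both slices, so it is never found, while Chicago is indexed only in slice2.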
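The slice tables above can be replayed in a short Python sketch. It assumes, as the example implies, that once a slice's value map for a key holds valueMax distinct values, any further new values in that slice are simply not indexed; values are fed in the order shown and ID ranges are expanded to individual session IDs.

```python
from typing import Dict, List, Tuple


def expand(ids: str) -> List[int]:
    """Turn '1-5,21' into [1, 2, 3, 4, 5, 21]."""
    out: List[int] = []
    for part in ids.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            out.extend(range(int(lo), int(hi) + 1))
        else:
            out.append(int(part))
    return out


def build_slice(observed: List[Tuple[str, str]], value_max: int) -> Dict[str, List[int]]:
    """Index values in arrival order, dropping new values once value_max is reached."""
    index: Dict[str, List[int]] = {}
    for value, ids in observed:
        if value in index or len(index) < value_max:
            index.setdefault(value, []).extend(expand(ids))
    return index


def query(slices: List[Dict[str, List[int]]], value: str) -> List[int]:
    return sorted(sid for s in slices for sid in s.get(value, []))


SLICE1 = [("Atlanta", "1-5,21"), ("New York", "6,7,50-51"), ("Seattle", "8,24"),
          ("LA", "11-16,18"), ("Cleveland", "25,27,28"), ("Miami", "29-32"), ("Chicago", "45")]
SLICE2 = [("Seattle", "76,77"), ("Chicago", "79,81"), ("LA", "85-90"),
          ("New York", "82,90-92"), ("Cleveland", "83-84"), ("Atlanta", "86"),
          ("Miami", "99,101-103")]

slices = [build_slice(SLICE1, value_max=5), build_slice(SLICE2, value_max=5)]
print(query(slices, "Miami"))    # []
print(query(slices, "Chicago"))  # [79, 81]
```

The same model also drops Atlanta from slice2, so a search for Atlanta would silently miss session 86.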



Indexing – Optimizations
(3) # unique values per key, per slice < valueMax (cont.)

So how do you check? Use index inspect/language queries (API):

(1) Check the config for the key's valueMax. (2) Check the current slice (or all slices) to get the number of unique values for the key.

Example: alias.host has 406 unique values against a valueMax of 2,500,000 = OK.

(You can also check the index-concentrator.xml and index-concentrator-custom.xml files.)

Note: There are some user-generated scripts to automate this. Check with your local SE.
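For the config side of that check, a Python sketch using the standard-library `xml.etree.ElementTree` can read `<key>` elements in the format shown earlier and compare each key's valueMax with an observed unique-value count. How the observed counts are obtained (index inspect or the API) is left out; the sample XML, the counts, and the 80% warning threshold are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical excerpt in the <key .../> format shown earlier.
SAMPLE_INDEX = """<language>
  <key description="Alias Host" format="Text" level="IndexValues" name="alias.host" valueMax="2500000"/>
  <key description="ACME Location" format="Text" level="IndexValues" name="acme.loc" valueMax="5"/>
</language>"""


def value_max_by_key(xml_text: str) -> Dict[str, int]:
    """Map each <key> name to its configured valueMax (keys without one are skipped)."""
    root = ET.fromstring(xml_text)
    return {k.get("name"): int(k.get("valueMax"))
            for k in root.iter("key") if k.get("valueMax") is not None}


def check_usage(limits: Dict[str, int], observed: Dict[str, int], warn_at: float = 0.8) -> Dict[str, str]:
    """Compare observed unique-value counts for a slice against valueMax."""
    report: Dict[str, str] = {}
    for key, count in observed.items():
        limit = limits.get(key)
        if limit is None:
            report[key] = f"{count:,} unique values, no valueMax configured"
        elif count >= limit:
            # A slice that reports exactly valueMax values has most likely hit the cap.
            report[key] = f"{count:,}/{limit:,} = AT LIMIT (later values are not indexed)"
        elif count >= warn_at * limit:
            report[key] = f"{count:,}/{limit:,} = WARN"
        else:
            report[key] = f"{count:,}/{limit:,} = OK"
    return report


if __name__ == "__main__":
    limits = value_max_by_key(SAMPLE_INDEX)
    for key, line in check_usage(limits, {"alias.host": 406, "acme.loc": 5}).items():
        print(key, line)
```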
Indexing – Optimizations
(4) Index Age

Prior to 10.5, nothing cleaned up old index slices.

Result: Index age > meta age (there is no point having an index for data that doesn't exist).
Issue: With time-based slicing, this means more slices = more overhead = slower queries.

Note: In 10.5 and later, the index timerolls with the metaDB, so this is not an issue.

Note: The other problem is when index age < meta age = unqueryable data. Too much indexing?
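Comparing the two ages is simple, but a small Python sketch makes both failure modes explicit. The ages in days are assumed inputs (for example, derived from the index time.begin and metaDB oldest-file statistics used in the case study); the one-day tolerance is an arbitrary choice.

```python
def compare_ages(index_age_days: float, meta_age_days: float, tolerance_days: float = 1.0) -> str:
    """Classify the index/meta age relationship for one concentrator."""
    gap = index_age_days - meta_age_days
    if gap > tolerance_days:
        # Index covers time with no meta left: extra slices, extra query overhead.
        return f"index older than meta by {gap:.0f} days: stale slices slow queries"
    if gap < -tolerance_days:
        # Meta exists that the index no longer covers: it cannot be queried.
        return f"meta older than index by {-gap:.0f} days: that meta is not queryable"
    return "ok: index and meta ages line up"


if __name__ == "__main__":
    print(compare_ages(400, 300))
    print(compare_ages(200, 300))
```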
Group Aggregation



Group Aggregation
https://sadocs.emc.com/0_en-us/088_SA106/100_Dep/20GrpAggreg

- Effectively multiplies compute for queries
- Concentrators SPLIT the sessions between themselves (NOT HA)
- Fewer sessions per concentrator given the same amount of ingest

N:M relationship.

The most common group is 2 Concentrators -> 1 Decoder.
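The effect of splitting sessions can be sketched in Python. The round-robin assignment is purely an assumption for illustration (the deck does not describe the distribution mechanism), and the predicate on session IDs stands in for a real meta condition. What the sketch shows is that each member holds roughly 1/N of the sessions, and that a query still has to be answered by every member, since no single concentrator holds all the data.

```python
from typing import Callable, Dict, List


def split_sessions(session_ids: List[int], concentrators: List[str]) -> Dict[str, List[int]]:
    """Assign each session to one group member (round-robin, an assumption)."""
    groups: Dict[str, List[int]] = {name: [] for name in concentrators}
    for i, sid in enumerate(session_ids):
        groups[concentrators[i % len(concentrators)]].append(sid)
    return groups


def query_group(groups: Dict[str, List[int]], predicate: Callable[[int], bool]) -> List[int]:
    """Every member searches only its own share; the broker merges the results."""
    return sorted(sid for share in groups.values() for sid in share if predicate(sid))


if __name__ == "__main__":
    groups = split_sessions(list(range(1, 11)), ["Concentrator1", "Concentrator2"])
    print({name: len(share) for name, share in groups.items()})  # 5 sessions each
    print(query_group(groups, lambda sid: sid % 5 == 0))         # [5, 10]
```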



Monitoring Performance
A Real-World Study



Case Study – Noname Inc.

Symptoms:
1) Analysts: “We can’t use the system – it’s too slow”
2) Reports timing out (blank reports in the morning)
3) Inconsistent reporting against meta keys (gaps in data where certain values should exist)

Environment:
- 3 x Log Decoder/Concentrator Stacks
- 3 x Packet Decoder/Concentrator Stacks
- 1 x Global Broker
- 2 x Type Broker (1 x Log, 1 x Packet)

Packet requirements: 30 days of metadata, 7 days of raw
Log requirements: 60 days of metadata, 60 days of raw
Case Study – Noname Inc.

Checklist (against the symptoms above):
- Query time statistics (query distribution) + analysis
- Configure app rules for common queries
- Check Reporting Engine config
- Check index slices
- Check index age (vs meta age)
- Check index depth/configuration



Query (in)Sanity - topQuery

- The most useful build ships with 10.6 (part of NwConsole; the rpm can be installed standalone on any CentOS host and pointed at a live NW stack)
- Run against query logs or a direct live API call
- Many options to narrow the range, query type, etc.

Returns the poorest-performing queries based on overall execution time, for both Investigation (SDK-Values) and RE (SDK-Query), plus the query time distribution of the result set.

(CLI) > NwConsole -c login concentratorIP:50005:[ssl] admin netwitness -c topQuery days=7 top=20



Case Study – Noname Inc.
topQuery Results



Case Study – Noname Inc.
topQuery Results

# 781001 audit 2016-Oct-10 21:48:27 SDK-Values User admin (session 1390049, 192.168.1.212:60144) has finished values
(channel 1390059, queued 00:00:00, execute 00:00:05, 192.168.1.213:50005=00:00:00 192.168.1.215:56005=00:00:05):
id1=9877205 id2=254187287 size=15 fieldName=ioc.malware where="(time='2016-Oct-10 21:20:00'-'2016-Oct-10 21:29:59') && (ioc.malware
exists)" flags=sessions,sort-total,order-descending threshold=0/sdk values id1=9877205 id2=254187287 size=15 fieldName=ioc.malware
where="(time='2016-Oct-10 21:20:00'-'2016-Oct-10 21:29:59') && (ioc.malware exists)" flags=sessions,sort-total,order-descending
threshold=0

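Broker query time is only as fast as its slowest concentrator (Concentrator 1 vs. Concentrator 2). In the log above, 192.168.1.213:50005 answered in 00:00:00 and 192.168.1.215:56005 in 00:00:05, matching the overall execute time of 00:00:05.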

Observations (from real environment, not above):

1) Terribly inefficient queries (multiple contains, regex, begins, logical statements)


2) Slowest top level queries for log data (most of the reports were log-based) showed 1 of 2 things:
- The same log concentrator always responsible (DC1LC1)
or
- A packet concentrator was responsible
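To spot the slow device in audit lines like the one above, a Python sketch can pull out the overall execute time and each device's host:port=HH:MM:SS timing with regular expressions. The patterns are written against that single sample (shortened below) and are an assumption about the log format, not a documented grammar; topQuery remains the tool for real analysis.

```python
import re
from typing import Dict, Tuple

EXECUTE_RE = re.compile(r"execute (\d{2}):(\d{2}):(\d{2})")
DEVICE_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}:\d+)=(\d{2}):(\d{2}):(\d{2})")


def to_seconds(h: str, m: str, s: str) -> int:
    return int(h) * 3600 + int(m) * 60 + int(s)


def parse_audit(line: str) -> Tuple[int, Dict[str, int]]:
    """Return (total execute seconds, {device: seconds}) for one audit line."""
    execute = EXECUTE_RE.search(line)
    if execute is None:
        raise ValueError("no execute time found")
    devices = {m.group(1): to_seconds(*m.group(2, 3, 4)) for m in DEVICE_RE.finditer(line)}
    return to_seconds(*execute.groups()), devices


# Shortened from the sample audit line above.
SAMPLE = ("SDK-Values User admin (session 1390049, 192.168.1.212:60144) has finished values "
          "(channel 1390059, queued 00:00:00, execute 00:00:05, "
          "192.168.1.213:50005=00:00:00 192.168.1.215:56005=00:00:05): fieldName=ioc.malware")

total, per_device = parse_audit(SAMPLE)
slowest = max(per_device, key=per_device.get)
print(total, slowest)  # 5 192.168.1.215:56005
```

Run over many lines, counting which device is most often the slowest would surface patterns like the "same log concentrator always responsible" observation.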



Case Study – Noname Inc.
Index Slices & Index/Meta Age
Charts: slices.total vs. slices on disk (file count), and meta age vs. index age (days), for DC1LC1, DC2LC1, DC3LC1, DC1PC1, DC2PC1, DC3PC1.

Slice counts:
(API) https://concentrator:50105/index/stats/slices.total
(disk)> find /var/netwitness/concentrator/index -mindepth 1 -type d | wc -l

Ages:
(API) https://concentrator:50105/index/stats/time.begin (index age)
(API) https://concentrator:50105/database/stats/meta.oldest.file.time (meta age)

Observations:
1) Too many slices on disk: DC1LC1, DC2LC1
2) Disparity between the API-reported value and slices on disk: DC1LC1, DC2LC1, DC3LC1
3) Index age > meta age on DC1LC1 (and both are much larger than the business requirement)
4) Index age < meta age on DC2LC1 = ~100 days of meta that isn't queryable
5) Packet stacks all look good.

Corrective Actions:
1) CRON job to timeRoll the MetaDB (10.5 should also roll the index), consistent across all devices
2) Clean up/delete old index slices (delete from disk)
3) Remove the scheduled task for time-based slicing; use the session-count config.
4) Engage Customer Support (a re-index might be needed here)
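The on-disk side of that comparison can also be done from Python. This sketch counts every directory below the index path, which approximates what the find command above counts (all nested directories, not just the top level), and compares it with a slices.total value you have already fetched from the API. Fetching that value is left out, and the 10% tolerance is a hypothetical choice.

```python
import os


def count_dirs(root: str) -> int:
    """Count every directory below root, like: find root -mindepth 1 -type d | wc -l."""
    # os.walk lists (but does not follow) symlinks to directories, while find -type d
    # skips them, so the counts can differ if the index path contains symlinks.
    return sum(len(dirnames) for _, dirnames, _ in os.walk(root))


def compare_slices(api_slices_total: int, disk_dirs: int, tolerance: float = 0.10) -> str:
    """Flag a gap between slices.total (from the API) and the on-disk directory count."""
    if api_slices_total <= 0:
        return "API reports no slices; check the index path and service"
    drift = abs(disk_dirs - api_slices_total) / api_slices_total
    if drift > tolerance:
        return f"mismatch: API {api_slices_total}, disk {disk_dirs}; look for orphaned or missing slices"
    return f"consistent: API {api_slices_total}, disk {disk_dirs}"


if __name__ == "__main__":
    index_path = "/var/netwitness/concentrator/index"  # path from the find command above
    if os.path.isdir(index_path):
        api_value = 300  # hypothetical: paste the slices.total value from the API here
        print(compare_slices(api_value, count_dirs(index_path)))
```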
Case Study – Noname Inc.
Index Depth/Configuration

Observations:

1) Lines up with the “Data is missing” complaint: low alias.host max values, and ip.dst randomly restricted to 10,000 on DC2PC1.
2) Note (not shown): DC3LC1 had a HUGE index defined. Many unnecessary IndexValues and large valueMax settings = too much data in the index, so the index space filled up before the metaDB did.
** This was done due to a misunderstanding of the Reporting Engine. Only meta in the “WHERE” clause must be indexed, not the “SELECT” clause.

Corrective Actions:

1) Full index review (remove unnecessary indexes, remove completely unique indexes like ‘msg’, increase valueMax for alias.host)
2) Make sure indexes are consistent across like decoders
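Consistency across like devices is easy to check mechanically. The Python sketch below parses two index definitions in the `<key .../>` format shown earlier with `xml.etree.ElementTree` and reports keys whose level or valueMax differ, or that exist on only one side. The device names and XML excerpts are hypothetical (loosely modelled on the ip.dst observation above); in practice you would compare each device's custom index file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

KeyDef = Tuple[Optional[str], Optional[str]]  # (level, valueMax)

# Hypothetical excerpts from two like packet concentrators.
DC1PC1 = """<language>
  <key name="ip.dst" format="IPv4" level="IndexValues"/>
  <key name="alias.host" format="Text" level="IndexValues" valueMax="2500000"/>
</language>"""
DC2PC1 = """<language>
  <key name="ip.dst" format="IPv4" level="IndexValues" valueMax="10000"/>
  <key name="alias.host" format="Text" level="IndexValues" valueMax="2500000"/>
</language>"""


def key_defs(xml_text: str) -> Dict[str, KeyDef]:
    """Map each <key> name to its (level, valueMax) attributes."""
    root = ET.fromstring(xml_text)
    return {k.get("name"): (k.get("level"), k.get("valueMax")) for k in root.iter("key")}


def diff_index(first: str, second: str) -> List[str]:
    """List keys that are missing on one side or configured differently."""
    left, right = key_defs(first), key_defs(second)
    problems: List[str] = []
    for name in sorted(set(left) | set(right)):
        if name not in right:
            problems.append(f"{name}: only on the first device")
        elif name not in left:
            problems.append(f"{name}: only on the second device")
        elif left[name] != right[name]:
            problems.append(f"{name}: {left[name]} vs {right[name]}")
    return problems


if __name__ == "__main__":
    for problem in diff_index(DC1PC1, DC2PC1):
        print(problem)
```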



Case Study – Noname Inc.
Reporting Engine Configuration – Careful where you point that thing.

Observations:

1) Every single report, whether log or packet, was pointed at the Primary Broker
2) Log reports were timing out mainly due to packet concentrators taking a long time to respond to the query !!
3) Many, many inefficient queries, using lists when feeds would be better, etc.

Corrective Actions:

1) Go through each report, point log reports at log devices, packet reports at packet devices
2) Fixed overlapping report ranges (e.g., weekly reports asking for 30 days of data)
3) Moved as much logic to app rules as possible, moved most (but not all) lists to feeds
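The overlapping-range problem can be caught with a simple rule: a scheduled report's query range should not be much longer than its schedule interval. The Python sketch below applies that rule to a hypothetical list of report definitions; the field names, sample reports, and the 1.0 ratio are assumptions, and some reports may legitimately look back further than they run.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class Report:
    name: str
    schedule: timedelta  # how often the report runs
    lookback: timedelta  # how much data each run queries


def overlapping(reports: List[Report], max_ratio: float = 1.0) -> List[str]:
    """Names of reports whose lookback exceeds max_ratio x their schedule interval."""
    return [r.name for r in reports if r.lookback > r.schedule * max_ratio]


# Hypothetical report definitions.
REPORTS = [
    Report("weekly-admin-logins", schedule=timedelta(days=7), lookback=timedelta(days=30)),
    Report("daily-critical-hosts", schedule=timedelta(days=1), lookback=timedelta(days=1)),
]

if __name__ == "__main__":
    print(overlapping(REPORTS))  # ['weekly-admin-logins']
```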
Case Study – Noname Inc.
After things got happy again:

- Meta timeroll
- Index slices -> session count
- Queries -> app rules
- Reports -> split log/packet



Please Complete Session Evaluation
