Traffic Analysis Using Streaming Queries
Mike Fisk
Los Alamos National Laboratory
Outline
Intro to Continuous Query Systems
a.k.a. Streaming Databases
Relevance to data networks
Performance Comparisons
Observations
Traffic analysis tools are data-type-specific
Flowtools: netflow
Snort: pcap
Psad: iptables logs
Example datasets:
Sensor data (temperature, traffic, etc.)
Stock exchange transactions
Packets, flows, logs
Example systems:
NiagaraCQ (Wisc), Telegraph (Berkeley), SMACQ, etc.
Commercial: StreamBase, etc.
Type Model
Stream of dynamically & heterogeneously typed objects
Each object can have a different type
Types need not be statically defined in advance
[Figure: pcaplive (input) -> == (stateless filtering) -> uniq (stateful filtering) -> print (output)]
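The four stages in the figure can be sketched as a chain of Python generators. This is only an illustration of the stateless/stateful distinction, not SMACQ's implementation; the `where` and `uniq` helpers, the dictionary record layout, and the sample packets are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: each object is a dict and may carry different fields.
Record = Dict[str, object]


def where(records: Iterable[Record], pred: Callable[[Record], bool]) -> Iterator[Record]:
    """Stateless filter: every record is judged on its own."""
    for rec in records:
        if pred(rec):
            yield rec


def uniq(records: Iterable[Record], *fields: str) -> Iterator[Record]:
    """Stateful filter: pass only the first record seen for each field combination."""
    seen = set()
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            yield rec


def run(source: Iterable[Record]) -> List[Tuple[object, object]]:
    """Input -> stateless filter -> stateful filter -> output (collected in a list)."""
    stage = where(source, lambda r: r.get("dstport") == 80)
    return [(r["srcip"], r["dstip"]) for r in uniq(stage, "srcip", "dstip")]


# Made-up input; the fourth object has a different type from the rest.
packets = [
    {"srcip": "10.0.0.1", "dstip": "10.0.0.9", "dstport": 80},
    {"srcip": "10.0.0.1", "dstip": "10.0.0.9", "dstport": 80},
    {"srcip": "10.0.0.2", "dstip": "10.0.0.9", "dstport": 25},
    {"type": "log", "msg": "no ports here"},
    {"srcip": "10.0.0.3", "dstip": "10.0.0.9", "dstport": 80},
]
result = run(packets)
```

Because each stage is a generator, records flow through one at a time, and only `uniq` holds state between records.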
Continuous-query optimization:
Executing many queries simultaneously
Minimize resource consumption per unit of data input
Maximize data throughput
Why is multiple query processing important?
Approximately 8 new rules each week
[Figure: Packet Capture feeding the tests sport=80? and ip=x?, with results sent to a Reporter]
Snort Approach
[Roesch, LISA 99]
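[Figure: Packet Capture feeds one branch per unique 5-tuple (srcip=x?, sport=80?; srcip=y?, sport=80?; srcip=*?, sport=80?); matching branches continue to the content test (contains BOO?) and then to the Reporter]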
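A minimal sketch of this style of evaluation follows: rules are grouped by header, each unique header is tested once, and content tests run only for headers that matched. The rule names, the `(srcip, sport)` header (a stand-in for the full 5-tuple), and the packet layout are illustrative assumptions, not Snort's actual data structures.

```python
from collections import defaultdict

WILDCARD = "*"

# Hypothetical rules: (name, srcip, sport, required payload content).
rules = [
    ("r1", "x", 80, b"BOO"),
    ("r2", "y", 80, b"BOO"),
    ("r3", WILDCARD, 80, b"BOO"),
    ("r4", "x", 80, b"FOO"),
]


def build_tree(rule_list):
    """Group rules under their unique header so each header is tested once."""
    tree = defaultdict(list)
    for name, srcip, sport, content in rule_list:
        tree[(srcip, sport)].append((name, content))
    return tree


def match(tree, pkt):
    """Return the names of matching rules, skipping content tests when a header fails."""
    hits = []
    for (srcip, sport), options in tree.items():
        if srcip != WILDCARD and pkt["srcip"] != srcip:
            continue  # short-circuit: none of this branch's content tests run
        if pkt["sport"] != sport:
            continue
        for name, content in options:
            if content in pkt["payload"]:
                hits.append(name)
    return hits


tree = build_tree(rules)
web_hits = match(tree, {"srcip": "x", "sport": 80, "payload": b"..BOO.."})
mail_hits = match(tree, {"srcip": "x", "sport": 25, "payload": b"..BOO.."})
```

The benefit depends on how often a header test fails early: a packet that fails every header test triggers no payload searches at all.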
Counting Approach
[Carzaniga & Wolf, SIGCOMM 03]
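Example: 7 Tests
[Figure: Packet Capture feeds the unique sub-expressions (ip=x?, sport=80?, ip=y?, contains BOO?); each rule/query checks its count of satisfied sub-expressions against its total (rule (x, 80): total=2?; rule sport=80: total=1?) and notifies the Reporter]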
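The counting idea can be sketched as follows: every unique sub-expression is evaluated exactly once per packet, each satisfied sub-expression increments a counter for the rules that use it, and a rule matches when its counter reaches its number of constraints. The rule set below is hypothetical and smaller than the slide's example; it illustrates the technique rather than reproducing the cited algorithm.

```python
# Unique sub-expressions, each evaluated once per packet (names are illustrative).
predicates = {
    "ip=x": lambda p: p["srcip"] == "x",
    "ip=y": lambda p: p["srcip"] == "y",
    "sport=80": lambda p: p["sport"] == 80,
    "contains BOO": lambda p: b"BOO" in p["payload"],
}

# Hypothetical rules, each a conjunction of sub-expressions.
rules = {
    "(x, 80)": ["ip=x", "sport=80"],
    "sport=80": ["sport=80"],
    "(y, BOO)": ["ip=y", "contains BOO"],
}

# Reverse index: which rules use each sub-expression.
users = {name: [] for name in predicates}
for rule_name, parts in rules.items():
    for part in parts:
        users[part].append(rule_name)


def count_match(pkt):
    """Evaluate every predicate once, then report rules whose count equals their total."""
    totals = dict.fromkeys(rules, 0)
    for pred_name, test in predicates.items():
        if test(pkt):
            for rule_name in users[pred_name]:
                totals[rule_name] += 1
    return [name for name, parts in rules.items() if totals[name] == len(parts)]


matched = count_match({"srcip": "x", "sport": 80, "payload": b"hello"})
```

Note that there is no short-circuiting here: all predicates run for every packet, which is why the approach does best when short-circuiting would rarely help.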
[Figure: Packet Capture feeding the tests sport=80?, ip=y?, and contains BOO?, with results sent to a Reporter]
1. Common roots
2. Common leaves
3. Common upstream graphs
4. Common downstream graphs
Performance Comparison
[Plot; axis label: Total Constraints]
Vector Functions
Most optimizations in stream analysis have employed a class of
algorithms that can be characterized as vector functions:
$f(x, \vec{v}) = \langle f(x, v_1), f(x, v_2), \ldots, f(x, v_n) \rangle$
Vector version is typically O(1) or O(log n) instead of O(n)
Examples
Set of equality tests becomes a single lookup in a hash table
Set of string matches becomes a single DFA to traverse
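[Figure: the separate tests dstport==80 (leading to X) and dstport==25 (leading to Y) are replaced by a single lookup on dstport with branches 80 (to X) and 25 (to Y)]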
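As a rough illustration of the hash-table case, the sketch below compares n separate equality tests with a single dictionary lookup. The function names and the `X`/`Y` targets are placeholders taken from the figure, not part of any real system.

```python
# Each test pairs a dstport value with the downstream target it leads to.
tests = [(80, "X"), (25, "Y")]


def scalar_tests(dstport, test_list):
    """O(n): evaluate every equality test separately."""
    return [target for port, target in test_list if dstport == port]


def build_lookup(test_list):
    """Fold the whole set of equality tests into one hash table."""
    table = {}
    for port, target in test_list:
        table.setdefault(port, []).append(target)
    return table


def vector_test(dstport, table):
    """Expected O(1): one lookup answers all of the tests at once."""
    return table.get(dstport, [])


table = build_lookup(tests)
same_answer = all(
    scalar_tests(port, tests) == vector_test(port, table) for port in (80, 25, 443)
)
```

The table is built once when the queries are compiled, so the per-packet cost no longer grows with the number of tests.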
print srcip, dstip from (cflow where dstport==80 and uniq(srcip, dstip))
Misplaced belief that since SQL is well defined, people can just use it
Deeply nested queries make you wish you were merely nested in s-expressions
cflow | where dstport==80 | uniq srcip dstip | print srcip, dstip
[Figure: pcaplive (input) -> == (stateless filtering) -> uniq (stateful filtering) -> print (output)]
Join Models
DFA module
Define a state machine whose transitions are specified as Boolean tests on new inputs
SQL style
Example: print running cross-product
print a.ipid b.ipid from pcapfile([email protected]) a, b where a.ipid != b.ipid
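One way to picture a running cross-product is the sketch below: each arriving record is paired with every record seen so far, and pairs are emitted in both orders, mirroring the two aliases `a` and `b`. This is an assumed reading of the query's semantics for illustration only; the function name and record layout are hypothetical.

```python
def running_cross_product(stream):
    """Pair each new record with all earlier ones whose ipid differs."""
    seen = []  # assumption: unbounded history; a real system would bound or window it
    for new in stream:
        for old in seen:
            if old["ipid"] != new["ipid"]:
                yield (old["ipid"], new["ipid"])  # a=old, b=new
                yield (new["ipid"], old["ipid"])  # a=new, b=old
        seen.append(new)


# Made-up packets carrying only the field the query uses.
packets = [{"ipid": 1}, {"ipid": 2}, {"ipid": 1}]
pairs = list(running_cross_product(packets))
```

Since the retained state grows with the stream, a join like this is usually practical only over a bounded input or a window.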
Usage Experience
Online detection & automated response systems
Ad-hoc queries for forensic analysis and data exploration
Feature extraction for other software
Conclusions
Continuous Queries provide a common query syntax,
software infrastructure, and optimization framework for traffic analysis
Conclusions
Performance Analysis:
Counting is preferable when short-circuiting is rare
Data-flow outperforms counting when short-circuiting is significant
When breadth of graph is reduced with vector functions, actual IDS workload benefits significantly from short-circuiting