
Efficient Regular Expression Evaluation: Theory To Practice: Michela Becchi and Patrick Crowley

The document proposes techniques to efficiently evaluate regular expressions on different architectures. It combines default transition compression, alphabet reduction, and stride multiplying on DFAs and extends these techniques to NFAs. An FPGA implementation uses one-hot encoding with logic minimization for alphabet reduction and decoding. The techniques reduce memory usage by 98% and increase throughput by implementing stride-2 matching. For ASICs, the techniques could achieve 4-8 Gbps throughput with compressed representations stored in SRAM.


Efficient Regular Expression Evaluation: Theory to Practice

Michela Becchi and Patrick Crowley

ANCS08

Motivation

The size and complexity of rule-sets have increased in recent years


Snort, as of November 2007:
8536 rules, 5549 Perl Compatible Regular Expressions
99% with character ranges ([c1-ck], \s, \w)
16.3% with dot-star terms (.*, [^c1..ck]*)
44% with counting constraints (.{n,m}, [^c1..ck]{n,m})

Several proposals to accelerate regular expression matching


FPGA
Memory-centric architectures

Michela Becchi 2/27/2008 11/06/2008

Objectives

Can we converge distinct algorithmic techniques into a single proposal that also works for large data-sets?
Can we apply techniques intended for memory-centric architectures on FPGAs as well?

Provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of choice


Target Architectures
Regex-Matching Engine

Memory-centric architectures

FPGA logic

General purpose processors

Network processors

FPGA / ASIC + memory

available parallelism


Challenges
Automaton types: DFA, NFA

FPGA logic
Logic cell utilization
Clock frequency

Memory-centric architectures (general purpose processors, network processors, FPGA / ASIC + memory)
Memory space
Memory bandwidth


D2FA: default transition compression

Observations:
DFA state: set of |Σ| next-state pointers
Transition redundancy

Idea:
Differential state representation through use of non-consuming default transitions
[Figure: example DFA fragment over states s1-s6 with transitions on a, b, c, and the same fragment with default transitions. In general: a default path across states with labeled transitions on c1-c4]
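The differential state representation can be illustrated with a minimal sketch. The encoding below (a dictionary of labeled transitions plus an optional default target per state) and the names compress_state and next_state are assumptions made for illustration; the toy transitions are not the ones from the slide's figure.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: state -> (labeled transitions, default target or None).
Labeled = Dict[str, str]
D2FA = Dict[str, Tuple[Labeled, Optional[str]]]


def compress_state(full: Dict[str, Labeled], state: str, default: str) -> Labeled:
    """Keep only the transitions of `state` that differ from those of `default`."""
    return {c: t for c, t in full[state].items() if full[default].get(c) != t}


def next_state(d2fa: D2FA, state: str, char: str) -> Tuple[str, int]:
    """Return (next state, memory accesses) for one input character."""
    accesses = 1
    labeled, default = d2fa[state]
    while char not in labeled:
        if default is None:
            raise KeyError(f"no transition on {char!r}")
        # Default transitions do not consume the input character.
        labeled, default = d2fa[default]
        accesses += 1
    return labeled[char], accesses


# Toy DFA fragment (illustrative only): s2 differs from s1 only on 'c'.
full = {
    "s1": {"a": "s3", "b": "s4", "c": "s5"},
    "s2": {"a": "s3", "b": "s4", "c": "s6"},
}
d2fa: D2FA = {
    "s1": (full["s1"], None),
    "s2": (compress_state(full, "s2", "s1"), "s1"),
}
print(d2fa["s2"])                    # one labeled transition plus a default
print(next_state(d2fa, "s2", "c"))   # labeled transition: one access
print(next_state(d2fa, "s2", "a"))   # through the default: two accesses
```

In this toy case s2 stores one transition instead of three, at the cost of one extra memory access whenever the default transition is followed.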

D2FA algorithms

Problem: set default transitions so as to

1. Maximize memory compression
2. Minimize memory bandwidth overhead

[Kumar et al, SIGCOMM06]
Bound dpMAX on the maximum default path length
O(dpMAX+1) memory accesses per input char
Better compression for higher dpMAX
Memory bandwidth = O((dpMAX+1)N)
Time complexity = O(n^2 log n)
Space complexity = O(n^2)

vs.

[Becchi et al, ANCS07]
Only backward-directed default transitions (skipping k levels)
Amortized memory bandwidth O(((k+1)/k)N) on N input chars
Depth-first traversal at DFA creation
Memory bandwidth = O(((k+1)/k)N)
Time complexity = O(n^2)
Space complexity = O(n)

Compression w/ k=1 ~ compression w/ dpMAX=
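The backward-directed idea can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes a complete DFA, uses breadth-first distance from the start state as the depth, and lets each state default to the shallower state that shares the most transitions (the k=1 case). All function names are hypothetical. Because a labeled transition increases depth by at most one and a default transition decreases it by at least one, the traversal needs at most 2N memory accesses on N input characters.

```python
from collections import deque


def bfs_depth(dfa, start):
    """Depth of each reachable state = shortest distance from the start state."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def backward_defaults(dfa, start):
    """Give each state a default transition to a strictly shallower state."""
    depth = bfs_depth(dfa, start)
    d2fa = {}
    for s in depth:
        best, shared = None, 0
        for t in depth:
            if depth[t] < depth[s]:
                common = sum(1 for c, n in dfa[s].items() if dfa[t].get(c) == n)
                if common > shared:
                    best, shared = t, common
        if best is None:
            labeled = dict(dfa[s])
        else:
            labeled = {c: n for c, n in dfa[s].items() if dfa[best].get(c) != n}
        d2fa[s] = (labeled, best)
    return d2fa


def run(d2fa, start, text):
    """Traverse the compressed automaton and count memory accesses."""
    state, accesses = start, 0
    for ch in text:
        accesses += 1
        labeled, default = d2fa[state]
        while ch not in labeled:
            state = default
            labeled, default = d2fa[state]
            accesses += 1
        state = labeled[ch]
    return state, accesses


# Toy complete DFA for .*ab over the alphabet {a, b}; state 2 is accepting.
dfa = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 0}}
d2fa = backward_defaults(dfa, 0)
text = "aabab"
final_state, accesses = run(d2fa, 0, text)
print(final_state, accesses, "<=", 2 * len(text))
```

On this toy input the compressed automaton keeps 3 of the 6 original transitions and uses 7 memory accesses for 5 characters, within the 2N bound.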



DFA alphabet reduction


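Effective for:
Ignore-case regex
Char-ranges
Never used chars

[Figure: example DFA with transitions labeled [a-z], [a-zA-Z], [0-9B-Z], A and [B-Z] (states 3/1, 4/2 and 5/3 are accepting), and the same DFA relabeled over the reduced alphabet with labels 0, 1, 2, [0-2] and [2-3]]

Alphabet translation table:
[a-z] -> 0
A -> 1
[B-Z] -> 2
[0-9] -> 3
[^0-9a-zA-Z] -> 4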
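A minimal sketch of how such a translation table could be derived is shown below. It assumes a complete DFA stored as nested dictionaries and merges only characters whose transition columns are identical in every state; this exact-equivalence grouping is a simplification chosen for the example, and reduce_alphabet is a hypothetical name.

```python
def reduce_alphabet(dfa, alphabet):
    """Group characters that behave identically in every state."""
    states = sorted(dfa)
    classes = {}    # transition column -> class id
    tx_table = {}   # character -> class id (the alphabet translation table)
    for ch in alphabet:
        column = tuple(dfa[s][ch] for s in states)
        tx_table[ch] = classes.setdefault(column, len(classes))
    # Rewrite every state over the reduced alphabet.
    reduced = {s: {tx_table[ch]: dfa[s][ch] for ch in alphabet} for s in states}
    return tx_table, reduced


def row(a, b, other):
    """Transitions of one state of a case-insensitive toy DFA."""
    return {"a": a, "A": a, "b": b, "B": b, "x": other}


# Toy DFA for a case-insensitive .*ab over the alphabet "aAbBx".
dfa = {0: row(1, 0, 0), 1: row(1, 2, 0), 2: row(1, 0, 0)}
tx_table, reduced = reduce_alphabet(dfa, "aAbBx")
print(tx_table)
print(reduced)
```

Here the five input characters collapse into three classes, so each state stores three transitions instead of five, and the matcher translates each input character through tx_table before the state lookup.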


Multiple-stride DFAs

[Brodie et al, ISCA 2006]

Idea:
Process stride input chars at a time

[Figure: example DFA (states 0-8, accepting states 4/1 and 8/2) and the equivalent DFA with stride 2, whose transitions are labeled with character pairs such as [a-f]a, [a-cef]a, [b-f]b, ab, bc, da and dd]

Observations:

Mechanism used on small DFAs (1-2 regex)
No distinct accepting state handling
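Stride doubling can be sketched as composing every transition with its successor. The code below is an illustration under stated assumptions: the DFA is complete, the input length is even, and a match that ends on the first character of a pair is recorded in a separate set (mid_hits), which is a simplification chosen for this example. The names double_stride and count_matches are hypothetical.

```python
from itertools import product


def double_stride(dfa, alphabet, accepting):
    """Build a stride-2 DFA with one transition per pair of input characters."""
    dfa2 = {s: {} for s in dfa}
    mid_hits = set()  # (state, pair) whose intermediate state is accepting
    for s in dfa:
        for c1, c2 in product(alphabet, repeat=2):
            mid = dfa[s][c1]
            dfa2[s][(c1, c2)] = dfa[mid][c2]
            if mid in accepting:
                mid_hits.add((s, (c1, c2)))
    return dfa2, mid_hits


def count_matches(dfa2, mid_hits, accepting, start, text):
    """Consume two characters per step; assumes len(text) is even."""
    state, matches = start, 0
    for i in range(0, len(text) - 1, 2):
        pair = (text[i], text[i + 1])
        matches += (state, pair) in mid_hits   # match ended mid-stride
        state = dfa2[state][pair]
        matches += state in accepting          # match ended on the stride boundary
    return state, matches


# Toy complete DFA for .*ab over {a, b}; state 2 is accepting.
dfa = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 0}}
dfa2, mid_hits = double_stride(dfa, "ab", accepting={2})
print(len(dfa2[0]), "transitions per state")
print(count_matches(dfa2, mid_hits, {2}, 0, "abab"))
print(count_matches(dfa2, mid_hits, {2}, 0, "baba"))
```

The number of states is unchanged, while the number of transitions per state grows from |Σ| to |Σ|^2 (2 to 4 in this toy case).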

Multiple stride + alphabet reduction

Stride s -> alphabet Σ^s
Σ = ASCII alphabet: |Σ^2| = 256^2 = 65,536; |Σ^4| = 256^4 ≈ 4,294M

Effective alphabet much smaller
Char grouping: [a-cef]a, [b-f]b

[Figure: example DFA and the corresponding 2-DFA, whose transitions are labeled with character-pair groups such as [a-f]a, [a-cef]a, [b-f]b, ab, bc, da and dd]

Alphabet reduction may be necessary to make stride doubling feasible on large DFAs

DFA -> alphabet reduction (TxTable1) -> stride doubling -> alphabet reduction (TxTable2,1) -> 2-DFA -> stride doubling -> alphabet reduction (TxTable4,2,1) -> 4-DFA
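To make the interplay concrete, the sketch below starts from a DFA that has already been reduced to a few stride-1 classes and computes a second-level table that maps each pair of classes to a stride-2 class. It is an illustration under assumptions, not the authors' implementation: pairs are merged only when they lead to the same state from every state and pass through an accepting intermediate state in exactly the same cases. The name stride2_classes is hypothetical.

```python
from itertools import product


def stride2_classes(reduced, num_classes, accepting):
    """Map each pair of stride-1 classes to a stride-2 class id."""
    states = sorted(reduced)
    ids = {}          # column signature -> stride-2 class id
    tx_table_2 = {}   # (class, class) -> stride-2 class id
    for c1, c2 in product(range(num_classes), repeat=2):
        # For every state, record whether the intermediate state is accepting
        # and which state is reached after both symbols.
        column = tuple(
            (reduced[s][c1] in accepting, reduced[reduced[s][c1]][c2])
            for s in states
        )
        tx_table_2[(c1, c2)] = ids.setdefault(column, len(ids))
    return tx_table_2


# Toy DFA for .*ab over three stride-1 classes: 0 = 'a', 1 = 'b', 2 = other.
reduced = {
    0: {0: 1, 1: 0, 2: 0},
    1: {0: 1, 1: 2, 2: 0},
    2: {0: 1, 1: 0, 2: 0},
}
tx_table_2 = stride2_classes(reduced, 3, accepting={2})
print(len(tx_table_2), "pairs ->", len(set(tx_table_2.values())), "classes")
```

In this toy case the stride-2 alphabet has 5 symbols, compared with 9 class pairs and with 65,536 raw character pairs; an input pair is translated by two lookups in the stride-1 table followed by one lookup in the pair table.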


Multiple stride + default transitions

Compression
Default transitions eliminate transition redundancy
In multiple stride DFAs:
# of states does not substantially change
# of transitions per state increases exponentially (|Σ|^stride)
Fraction distinct/total transitions decreases
Increased potential for compression!

Accepting state handling

[Figure: example DFA and the corresponding 2-DFA with duplicated accepting states (e.g. 0/1, 1/1, 4/1)]

Duplicated states have same outgoing transitions as original states but different depth
Default transition will remove all outgoing transitions from new accepting states

Multiple stride + default transitions (contd)

Problem:
For large |Σ| and stride, the uncompressed DFA may be infeasible
Out of memory when generating a 2K node, stride 4 DFA on a Linux machine w/ 4GB memory

Solution
Perform default transition compression during DFA creation
Use [Becchi et al, ANCS07] compression algorithm
In the situation above, only 10% memory used

DFA -> alphabet reduction (TxTable1) -> stride doubling + compression -> alphabet reduction (TxTable2,1) -> compressed 2-DFA -> stride doubling + compression -> alphabet reduction (TxTable4,2,1) -> compressed 4-DFA


Putting everything together


Data-sets: 1-22 regex, 48-1,940 states

Stride 1:
DFA -> alphabet reduction (|Σ|=25-44) -> default transition compression (96.3-98.5% transitions removed) -> Compressed DFA (avg 1-2 labeled tx/state)

Stride 2:
DFA -> Stride-2 transformation -> Stride-2 DFA -> alphabet reduction (|Σ|=53-470) -> default transition compression (97.9-99.5% transitions removed) -> Compressed Stride-2 DFA (avg 3-5 labeled tx/state)

Same memory bandwidth requirement
Initial size = 40X-80X final size



NFA
Example rule-set:
1. ab+cd
2. ab+ce
3. ab+c.*f
4. b[d-f]a
5. bdc

[Figure: two NFAs for the rule-set above, one with states 0-18 and a more compact one with states 0-12; accepting states are labeled state/rule, e.g. 4/1, 8/2, 12/3, 15/4, 18/5]

Multiple stride + alphabet reduction

Stride doubling:
Avoid new state creation
Keep multiple transitions on the same symbol separated

[Figure: example NFA (accepting states 4/1 and 6/2) and the corresponding 2-NFA, whose transitions are labeled with character pairs such as ab, b., .c, bc, d., .a, ac, cc, ce and e.]

Alphabet reduction:
Clustering-based algorithm as for DFA, but sets of target states are compared


FPGA implementation
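[Figure: block diagram in which INIT, INPUT and CLK feed an alphabet translation stage (Alphabet Tx), a decoder, and the NFA logic, which drives the MATCH output; signal widths shown on the slide: k·log|Σ|, log|Σ|, |Σ|, r]

Alphabet translation and decoder: Quine-McCluskey like minimization scheme
Example labels: -{bBcCdD}={aA}; (c1=b OR c1=B) AND NOT (c2=a OR c2=A)

NFA: one-hot encoding [Sidhu & Prasanna] + logic reduction schemes

[Figure: one-hot encoding of states S1, S2 and S3 with transitions on ci, ck, cm and cn, and logic reduction examples including a negated character set -{ci,ck} and a reset input]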
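The behaviour of a one-hot encoded NFA can be simulated in software with one bit per state. The sketch below is only an illustration: it assumes the start state is re-armed on every cycle so that a match may begin at any input position, it uses the pattern b[d-f]a from the NFA example with an arbitrary state numbering, and match_positions is a hypothetical name.

```python
def match_positions(text, transitions, start, accepting):
    """Simulate a one-hot encoded NFA: one bit (flip-flop) per state."""
    active = 1 << start
    hits = []
    for i, char in enumerate(text):
        # Assumption: the start state is active on every cycle.
        nxt = 1 << start
        for src, label, dst in transitions:
            # Bit dst is set when bit src is active AND the input character
            # belongs to the label of the transition.
            if (active >> src) & 1 and char in label:
                nxt |= 1 << dst
        active = nxt
        if any((active >> s) & 1 for s in accepting):
            hits.append(i)  # index of the last character of the match
    return hits


# NFA for b[d-f]a: 0 -b-> 1 -[def]-> 2 -a-> 3 (accepting).
transitions = [(0, "b", 1), (1, "def", 2), (2, "a", 3)]
print(match_positions("xbeab", transitions, start=0, accepting={3}))    # [3]
print(match_positions("bda bfa", transitions, start=0, accepting={3}))  # [2, 6]
```

In hardware, each iteration of the inner loop corresponds to an AND gate between a state flip-flop and a decoded character signal, and all of them are evaluated in parallel within one clock cycle.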


FPGA Results - throughput


[Figure: bar chart of throughput (Gbps, axis 0-8) per rule-set (any_99, mail_79, http_406) for three configurations: stride 1 with full alphabet, stride 1 with reduced alphabet, and stride 2 with reduced alphabet]


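FPGA Results - logic utilization

[Figure: bar chart of logic utilization (# slices, axis 0-4000) per rule-set (any_99, mail_79, http_406) for stride 1 with full alphabet, stride 1 with reduced alphabet, and stride 2 with reduced alphabet. Annotations on the chart (number of states, |Σ| at stride 1, |Σ| at stride 2): #s=7,864, 64, 2,206; #s=2,086, 78, 1,969; #s=2,147, 68, 1,640]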

Utilization:
8-46% on XC5VLX50 device (7,400 slices)
XC5VLX330 device has 51,840 slices


ASIC projected results


Regex partitioning into multiple DFAs

Stride = 1 (memory footprint split into compressed states and full states)
Rule-set      |Σ|    #states    Compressed states   Full states
k-NFA any     78     2,086      -                   -
k-DFA any1    59     23,846     505KB               200 KB
k-DFA any2    45     86,977     2.9 MB              55 KB
k-DFA any3    60     14,084     299MB               48 KB

Stride = 2 (memory footprint split into compressed states and full states)
Rule-set      |Σ|    #states    Compressed states   Full states
k-NFA any     1969   2,091      -                   -
k-DFA any1    850    28,223     356KB               32MB
k-DFA any2    579    102,940    1.27MB              81MB
k-DFA any3    627    19,344     244KB               16 MB

Content addressing w/ 64 bit words:
98% states compressed w/ stride 1
82% states compressed w/ stride 2

Throughput: SRAM @ 500 MHz
2-4 Gbps for stride 1
4-8 Gbps for stride 2

Alternative representation: decoders in ASIC or instruction memory
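The projected throughput ranges are consistent with a simple back-of-the-envelope calculation. The sketch below rests on two assumptions that the slides do not state explicitly: the SRAM serves one access per clock cycle, and each processed symbol costs between one and two memory accesses (the upper value matching a 2N access bound). The function name is hypothetical.

```python
def throughput_gbps(clock_mhz, stride, accesses_per_step):
    """Input bits consumed per second, in Gbps, for 8-bit characters."""
    bits_per_step = 8 * stride
    steps_per_second = clock_mhz * 1e6 / accesses_per_step
    return bits_per_step * steps_per_second / 1e9


for stride in (1, 2):
    worst = throughput_gbps(500, stride, accesses_per_step=2)
    best = throughput_gbps(500, stride, accesses_per_step=1)
    print(f"stride {stride}: {worst:.0f}-{best:.0f} Gbps")
```

Under these assumptions the calculation gives 2-4 Gbps for stride 1 and 4-8 Gbps for stride 2.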


Conclusion

Algorithm:
Combination of default transition compression, alphabet reduction and stride multiplying on potentially large DFAs
Extension of alphabet reduction and stride multiplying to NFAs

FPGA Implementation:
Use of one-hot encoding w/ incremental improvement schemes
Logic minimization scheme for alphabet reduction & decoding

Additional aspects:
Multiple flow handling: FPGA vs. memory centric architectures
Design improvements tailored to specific architectures and data-sets:
Clustering into smaller NFAs and DFAs to allow smaller alphabets w/ larger strides


Thank you!

Questions?

http://regex.wustl.edu
