Hash-Based Pattern Matching
(DDECS)
Abstract: Regular expression matching is a complex task that is widely used in network security monitoring applications. With the growing speed of network links and the growing number of regular expressions, pattern matching architectures have to be improved to retain wire-speed processing. Multi-striding is a well-known technique for increasing processing speed, but it requires a lot of FPGA resources. Therefore, we focus on the design of a new hardware architecture for fast pre-filtering of network traffic. The proposed pre-filter performs fast hash-based matching of short strings that are specific to the matched regular expressions. As the proposed pre-filter significantly reduces input traffic, exact pattern matching can operate at significantly lower speeds. The exact pattern matching can then be done by a CPU or by a slow automaton with few hardware resources. The paper provides analyses of false-positive detections of the pre-filter with respect to the length of the matching strings. The number of false positives is low even if the selected strings are short. Therefore, input traffic can be significantly reduced. For 100 Gb links, the pre-filter reduced the input data to 1.83 Gbps using four-symbol strings.

Index Terms: regular expression matching, pattern matching, hash function, high speed network, network security

I. INTRODUCTION

Regular expression (RE) matching is a widely used operation in network Intrusion Detection Systems (IDS), lexical analysers, protocol analysers, and many other network applications. As the number of network devices grows, the speed of network links grows continuously as well. Current processors are not powerful enough to keep up with high-speed network links, because the throughput of one processor core is limited to less than one Gbps [1]. Therefore, hardware acceleration of RE matching is needed.

Many hardware architectures have been designed to accelerate pattern matching for IDS against thousands of REs. Several architectures take advantage of massively parallel processing in FPGAs to map RE matching using finite automata [2]–[5]. Sidhu and Prasanna introduced a technique to map an NFA to FPGA logic [6], where all non-deterministic paths are processed simultaneously and therefore the time complexity is linear.

Many applications need incremental and fast updates of the RE set. For example, mitigation of a network attack has to be performed as soon as the attack is detected. It means that the RE that mitigates the attack has to be uploaded very fast. If the RE set is represented by an automaton mapped to the FPGA logic, any update of the RE set requires dynamic reconfiguration of the FPGA. Moreover, the new RE set has to be mapped to the FPGA by synthesis tools, which is too slow for network security applications.

The RE-NFA architecture [7] was created to avoid the synthesis process for dynamic updates of the RE set. Takaguchi et al. [8] designed an architecture consisting of configurable matching units and a configurable interconnection network (feedback plane), which forms connections between matching engines. Matching engines are dedicated to comparing several input symbols. Strings are used to trigger transitions in the automaton, and the feedback plane is intended to execute transitions and activate the next states. As the feedback plane allows a connection between any two matching engines, hardware resources grow with quadratic complexity. For most real-world automata, the number of inward and outward transitions of each state is small. It means that the feedback plane can be significantly reduced. The drawback of this approach is the limited set of mappable automata [9].

Architectures based on deterministic finite automata (DFA) [10]–[12] store the transition table in memory. The current automaton state is stored in a register, and the next state is determined by the register value (current state) and the input symbol, which are used together as an address into the transition table. If the RE set is changed, only new content is written into memory. However, the algorithm that constructs a DFA from an NFA has exponential time complexity and can cause an exponential growth of states and transitions. The increase in transitions has a direct impact on memory requirements.

To reduce memory requirements, RE optimization can be used [3], [13]. Some architectures combine DFA and NFA to cope with the exponential growth of transitions caused mainly by the dot-star construction in REs [14]. Delayed input DFAs (D2FA) [10] reduce the number of transitions: several transitions are replaced by one default transition. A default transition is triggered when there is no standard transition for the combination of the current state and the current input symbol. Default transitions are placed in a separate table. The table contains few items because each state of the automaton can have at most one default transition. On the other hand, when a default transition is used, no input character is processed, and therefore the time complexity is quadratic. The use of default transitions has to be minimized
to reduce the impact on time complexity. Several equivalent D2FAs exist for one DFA, so the one most suitable for processing real data has to be chosen [10], [15].

With the growing speed of network links, it is necessary to increase matching speed. Matching speed primarily depends on the clock frequency and the number of input symbols processed per clock cycle. The FPGA frequency is limited by the technology and increases only slightly over time, so more input symbols have to be processed at once.

The first architecture accepting multiple input symbols was introduced by Brodie et al. [16]. Becchi and Crowley presented a general technique [17] that is widely used to transform an NFA or DFA into a multi-striding automaton. However, multi-striding causes exponential growth of transitions, so memory and logic requirements increase rapidly.

The pipelined automata architecture [15] consists of parallel automata directly connected to a shared buffer. As states circulate in the automata pipeline, input data do not have to be multiplexed and a complex crossbar is not needed. The architecture is fully utilized and achieves maximum throughput only if every automaton is processing input data. It means that the input buffer has to store packets for every automaton in the pipeline, so a large input buffer is needed. Moreover, the architecture has a long latency, because only one byte of the input packet is processed within one clock cycle.

The processing speed can also be scaled up by smart pre-filtering of network traffic. As pre-filtering can tolerate false-positive matches of REs, network traffic pre-filtering can utilize an approximated automaton [18]. The approximation of the automaton significantly reduces the number of states and transitions, which has a direct impact on FPGA hardware resources. However, the approximation of an automaton is a computation-intensive task that takes a long time. The approximate automaton cannot be used if dynamic updates of RE sets are needed. Therefore, we focus on hardware pre-filtering of network traffic for fast pattern matching with dynamic RE sets.

Fig. 1: Creation of an automaton for significant strings matching: (a) original automaton, (b) reduced automaton

The proposed architecture consists of two parts: a pre-filter and a standard automaton. The pre-filter performs matching against a set of strings Str(M). Only packets that contain some string from the set Str(M) are sent to the automaton, which verifies the detections made by the pre-filter. Input traffic is substantially reduced by the pre-filter. Thus, the speed of the automaton can be lower to save FPGA resources.

A. Hardware Architecture

The pre-filter consists of a table containing all strings from the set Str(M) and a unit that computes a hash function. The hash is computed from N input symbols and the result is used as an address into the table. Then the addressed item is read from the table and compared with the input symbols. The comparison is necessary to eliminate hash collisions.

To increase throughput, the pre-filter processes S symbols at once. A search string can start at any of the S symbols, so S hashes are computed in parallel. All hash units share one table, which reduces memory requirements S times. The table has to allow S parallel accesses. In an FPGA, the table can be implemented using registers or on-chip block RAM (BRAM).

[Figure residue: input symbols a1 a2 a3 a4 a5 a6 a7 ..., hash units hash2 and hash4, and the shared Table. Caption not recovered.]
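The lookup described in this section can be sketched in software as follows. This is a minimal illustrative model under stated assumptions, not the hardware design: the hash function `toy_hash`, the table size `TABLE_BITS`, and the value `S = 8` are hypothetical choices made for the example, since the text does not fix them here. The inner loop over lanes stands in for the S hash units that would operate in parallel on the FPGA.

```python
from typing import Iterable, List, Optional

N = 4             # string length in symbols (four-symbol strings)
S = 8             # symbols consumed per modelled clock cycle (assumed)
TABLE_BITS = 10   # 2**10 = 1024 table rows (assumed)


def toy_hash(window: bytes) -> int:
    """Hypothetical hash; stands in for the unspecified hardware hash unit."""
    h = 0
    for byte in window:
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h & ((1 << TABLE_BITS) - 1)


def build_table(strings: Iterable[bytes]) -> List[Optional[bytes]]:
    """Store each N-symbol string of Str(M) at the address given by its hash."""
    table: List[Optional[bytes]] = [None] * (1 << TABLE_BITS)
    for s in strings:
        if len(s) != N:
            raise ValueError("every string must be exactly N symbols long")
        addr = toy_hash(s)
        if table[addr] not in (None, s):
            # Simplification: a real design has to resolve such collisions.
            raise ValueError("hash collision between stored strings")
        table[addr] = s
    return table


def prefilter(packet: bytes, table: List[Optional[bytes]]) -> bool:
    """Return True if the packet should be passed on to exact matching."""
    for base in range(0, len(packet) - N + 1, S):   # one iteration = one cycle
        for lane in range(S):                       # models S parallel hash units
            start = base + lane
            if start + N > len(packet):
                break
            window = packet[start:start + N]
            # Compare the stored item with the input to rule out hash collisions.
            if table[toy_hash(window)] == window:
                return True
    return False


table = build_table([b"GET ", b"SSH-"])
print(prefilter(b"xxGET /index.html", table))   # True
print(prefilter(b"hello world", table))         # False
```

Because the stored item is compared with the input window, a hash collision between an input window and a stored string cannot produce a hit in this model; only a genuine occurrence of a string from Str(M) passes the packet on.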
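[Figure residue: pre-filter block diagram with blocks HASH1, dec1, Arbiter1, BRAM1, REG1, Queue, and comparators; signals Addr and Result. Caption not recovered.]
[Figure residue: plots with axis labels "Accesses" (·10^7) and "Collisions [%]" (·10^−2). Captions not recovered.]
1 https://www.cesnet.cz/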
TABLE II: Comparison of the prefix method and the significant strings method

          Regular expressions  Prefixes                      Significant strings
Protocol  Matches  Passed [%]  Strings  Matches  Passed [%]  Strings  Matches  Passed [%]  Improvement [%]
DNS         15301      1.2335       12  1240443         100        1    30018      2.4199            97.58
FTP             0           0        3     3396      0.2738        1      846      0.0682            75.09
HTTP         6206      0.5003        2    30138      2.4296        4    17175      1.3846            43.12
IMAP            4      0.0003        4    12672      1.0216        2       49      0.0040            99.61
POP3            3      0.0002        2       88      0.0071        2       58      0.0047            34.09
SMTP           53      0.0043        2     3396      0.2738        2     3105      0.2503             8.57
SSH          1707      0.1376        1     3722      0.3001        1     2094      0.1688            43.74
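The derived columns of Table II can be cross-checked with a few lines of arithmetic. The sketch below rests on assumptions that are not stated in the text: that "Passed [%]" is the match count divided by the total number of packets, that this total equals 1240443 (inferred from the DNS prefix row, which passes 100 % of the traffic), and that "Improvement [%]" is the relative reduction in matches when moving from prefixes to significant strings. The function names are hypothetical.

```python
TOTAL_PACKETS = 1240443  # assumed: the DNS prefix row passes 100 % of packets


def passed_pct(matches: int, total: int = TOTAL_PACKETS) -> float:
    """Share of packets passed on by a method, in percent."""
    return 100.0 * matches / total


def improvement_pct(prefix_matches: int, significant_matches: int) -> float:
    """Relative reduction in matches from prefixes to significant strings."""
    return 100.0 * (1.0 - significant_matches / prefix_matches)


# DNS row: regular expressions, then prefixes versus significant strings.
print(round(passed_pct(15301), 4))                 # 1.2335
print(round(improvement_pct(1240443, 30018), 2))   # 97.58
# IMAP row.
print(round(improvement_pct(12672, 49), 2))        # 99.61
```

Under these assumptions the recomputed improvement agrees with the listed values for DNS, FTP, IMAP, POP3, SMTP, and SSH; for HTTP the same formula gives about 43.01 rather than the listed 43.12.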
IV. PREFIX LENGTH ANALYSIS

The set of search strings Str(M) can be constructed as the set of all prefixes of length N. These strings are not the most appropriate ones, but they are very easy to create. The precision of the pre-filtration depends on the length of the prefixes. Longer prefixes produce fewer false-positive detections. On the other hand, the hash is more complex to compute, longer strings have to be stored in the table, the comparators have to compare longer strings, and so on. Therefore, it is necessary to find an optimal prefix length.

The graph in Fig. 8 shows the percentage of false-positive detections for different prefix lengths. For comparison, the percentage of packets matched by the regular expressions themselves is 0.48 %. The test was performed on the REs for L7 filter (footnote 2) marked as "Great" (without the rule for DNS).

[Fig. 8 residue: y-axis "Error [%]"; visible data labels 100, 99.33, 43.56. Caption not recovered.]

TABLE III: Number of prefixes depending on prefix length

Prefix length  Snort backdoor  L7 Great  L7 Good
            1              49        55       16
            2             169       112       21
            3             297       169       27
            4             446       256       37
            5             641       411       52
            6             908       670       75
            7            1270      1072      107
            8            1763      1644      155
            9            2446      2437      226
           10            3383      3488      328

V. SIGNIFICANT STRINGS

For some REs, prefixes cause a lot of false-positive detections. The RE for detecting the DNS protocol in the L7 filter rules starts with several arbitrary symbols, so one of its four-symbol prefixes consists of four arbitrary symbols. With this prefix, all packets are marked as potentially interesting traffic, which causes up to 100 % false-positive detections.
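The prefix construction of Str(M) and the failure case just described can be illustrated with a small sketch. It assumes a hypothetical NFA encoding (a dictionary from a state to its outgoing `(symbol, next_state)` pairs) and uses the marker `ANY` for an arbitrary input symbol; neither the encoding nor the two toy automata come from the source.

```python
from typing import Dict, List, Set, Tuple

ANY = "."  # marker for "arbitrary input symbol" (assumed notation)

# Hypothetical NFA encoding: state -> list of (symbol, next_state).
Nfa = Dict[int, List[Tuple[str, int]]]


def prefixes(nfa: Nfa, start: int, n: int) -> Set[Tuple[str, ...]]:
    """Collect the symbol sequences of all paths of length n from the start state."""
    frontier = {((), start)}
    for _ in range(n):
        frontier = {
            (path + (symbol,), nxt)
            for path, state in frontier
            for symbol, nxt in nfa.get(state, [])
        }
    return {path for path, _ in frontier}


def matches_everything(prefix: Tuple[str, ...]) -> bool:
    """A prefix made only of arbitrary symbols passes every packet."""
    return all(symbol == ANY for symbol in prefix)


# Toy rule starting with a literal string: "GET " followed by more symbols.
literal_rule: Nfa = {0: [("G", 1)], 1: [("E", 2)], 2: [("T", 3)], 3: [(" ", 4)]}
# Toy rule starting with four arbitrary symbols, as in the DNS case above.
wildcard_rule: Nfa = {0: [(ANY, 1)], 1: [(ANY, 2)], 2: [(ANY, 3)], 3: [(ANY, 4)],
                      4: [("\x01", 5)]}

print(prefixes(literal_rule, 0, 4))    # {('G', 'E', 'T', ' ')}
print(prefixes(wildcard_rule, 0, 4))   # {('.', '.', '.', '.')}
print(any(matches_everything(p) for p in prefixes(wildcard_rule, 0, 4)))  # True
```

In this simplified model a rule shorter than n symbols yields no prefix at all, which a real construction would have to handle separately.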
Vivado 2018.1. One BRAM has a capacity of 1024 rows of four-symbol words; one additional BRAM is used for the input queue. The increase in LUT utilization is caused mainly by the large number of comparators. The increase can be reduced by using shared comparators (S comparators).

TABLE IV: FPGA resource utilization of the pre-filter

                     Multi-striding level
                     2              4              8
BRAMs (table items)  LUT    FF      LUT    FF      LUT    FF
2 (2048)              236    630     595    762    1103   1040
4 (4096)              354   1034    1017   1186    1905   1480
8 (8192)              578   1838    1849   2026    3485   2344
16 (16384)           1024   3442    3501   3698    7821   4056
32 (32768)           1952   6646    6877   7034   16397   7464
64 (65536)           3784  13050   13585  13698   24546  14264

VII. CONCLUSION

We have introduced a new hardware pre-filter for fast pattern matching. As the amount of FPGA resources grows significantly with the matching speed and the number of REs, it is important to reduce the hardware resources of fast pattern matching architectures. Therefore, the proposed hardware pre-filter uses only RE prefixes and specific substrings of REs to identify packets in which REs can occur. The results show that even short strings can be used in the pre-filter to decrease the amount of input network traffic significantly. For strings with a length of four bytes, the pre-filter was able to reduce input traffic to 1.83 %, which is approximately 1.83 Gbps for a 100 Gb link. The exact pattern matching can then be done by a CPU or by a slow automaton directly mapped to the FPGA. As only short strings are matched at high speed and exact pattern matching is provided by the CPU, hardware resources are reduced significantly compared to high-speed exact pattern matching. Moreover, as short strings can be determined from the RE set very fast, the proposed hardware pre-filter can be easily configured at runtime and supports dynamic updates of RE sets.

ACKNOWLEDGEMENTS

This research has been supported by the project Smart Application Aware Embedded Probes, project number VI20152019001, granted by the Ministry of Interior of the Czech Republic, and by the Brno University of Technology grant number FIT-S-17-3994.

REFERENCES

[1] M. Becchi, C. Wiseman, and P. Crowley, “Evaluating regular expression matching engines on network and general purpose processors,” in Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’09. New York, NY, USA: ACM, 2009, pp. 30–39. [Online]. Available: http://doi.acm.org/10.1145/1882486.1882495
[2] R. Sidhu and V. K. Prasanna, “Fast regular expression matching using FPGAs,” in Field-Programmable Custom Computing Machines, Annual IEEE Symposium on (FCCM), vol. 00, 04 2001, pp. 227–238. [Online]. Available: doi.ieeecomputersociety.org/10.1109/FCCM.2001.22
[3] I. Sourdis, J. Bispo, J. M. P. Cardoso, and S. Vassiliadis, “Regular expression matching in reconfigurable hardware,” Journal of Signal Processing Systems, vol. 51, no. 1, pp. 99–121, Apr 2008. [Online]. Available: https://doi.org/10.1007/s11265-007-0131-0
[4] S. Yun and K. Lee, “Optimization of regular expression pattern matching circuit using at-most two-hot encoding on FPGA,” in 2010 International Conference on Field Programmable Logic and Applications, Aug 2010, pp. 40–43.
[5] C. R. Clark and D. E. Schimmel, “Scalable pattern matching for high speed networks,” in 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, April 2004, pp. 249–257.
[6] R. Sidhu and V. K. Prasanna, “Fast regular expression matching using FPGAs,” in The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’01), March 2001, pp. 227–238.
[7] Y. Yang and V. Prasanna, “High-performance and compact architecture for regular expression matching on FPGA,” IEEE Transactions on Computers, vol. 61, no. 7, pp. 1013–1025, July 2012.
[8] H. Takaguchi, Y. Wakaba, S. Wakabayashi, S. Nagayama, and M. Inagi, “An NFA-based programmable regular expression matching engine highly suitable for FPGA implementation,” 2013.
[9] V. Košař and J. Kořenek, “Dynamically reconfigurable architecture with atomic configuration updates for flexible regular expressions matching in FPGA,” in 2016 Euromicro Conference on Digital System Design (DSD), Aug 2016, pp. 591–598.
[10] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. Turner, “Algorithms to accelerate multiple regular expressions matching for deep packet inspection,” SIGCOMM Comput. Commun. Rev., vol. 36, no. 4, pp. 339–350, Aug. 2006. [Online]. Available: http://doi.acm.org/10.1145/1151659.1159952
[11] F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz, “Fast and memory-efficient regular expression matching for deep packet inspection,” in Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ser. ANCS ’06. New York, NY, USA: ACM, 2006, pp. 93–102. [Online]. Available: http://doi.acm.org/10.1145/1185347.1185360
[12] M. Becchi and P. Crowley, “A-DFA: A time- and space-efficient DFA compression algorithm for fast regular expression evaluation,” ACM Trans. Archit. Code Optim., vol. 10, no. 1, pp. 4:1–4:26, Apr. 2013. [Online]. Available: http://doi.acm.org/10.1145/2445572.2445576
[13] C. Lin, C. Huang, C. Jiang, and S. Chang, “Optimization of pattern matching circuits for regular expression on FPGA,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 12, pp. 1303–1310, Dec 2007.
[14] J. Kořenek, “Fast regular expression matching using FPGA,” Information Sciences and Technologies Bulletin of the ACM Slovakia, vol. 2, no. 2, pp. 103–111, 2010. [Online]. Available: http://www.fit.vutbr.cz/research/view_pub.php?id=9511
[15] D. Matoušek, J. Kubiš, J. Matoušek, and J. Kořenek, “Regular expression matching with pipelined delayed input DFAs for high-speed networks,” in Proceedings of the 2018 Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’18. New York, NY, USA: ACM, 2018, pp. 104–110. [Online]. Available: http://doi.acm.org/10.1145/3230718.3230730
[16] B. C. Brodie, D. E. Taylor, and R. K. Cytron, “A scalable architecture for high-throughput regular-expression pattern matching,” in 33rd International Symposium on Computer Architecture (ISCA’06), June 2006, pp. 191–202.
[17] M. Becchi and P. Crowley, “Efficient regular expression evaluation: Theory to practice,” in Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’08. New York, NY, USA: ACM, 2008, pp. 50–59. [Online]. Available: http://doi.acm.org/10.1145/1477942.1477950
[18] M. Češka, V. Havlena, L. Holík, O. Lengál, and T. Vojnar, “Approximate reduction of finite automata for high-speed network intrusion detection,” in Tools and Algorithms for the Construction and Analysis of Systems, D. Beyer and M. Huisman, Eds. Cham: Springer International Publishing, 2018, pp. 155–175.
[19] F. Yamaguchi and H. Nishi, “Hardware-based hash functions for network applications,” in 2013 19th IEEE International Conference on Networks (ICON), Dec 2013, pp. 1–6.
[20] J. Deepakumara, H. M. Heys, and R. Venkatesan, “FPGA implementation of MD5 hash algorithm,” in Canadian Conference on Electrical and Computer Engineering 2001. Conference Proceedings (Cat. No.01TH8555), vol. 2, May 2001, pp. 919–924 vol.2.