Prefix-Based Multi-Pattern Matching On FPGA
I. INTRODUCTION
Multi-pattern matching is an important task in data mining. A multi-pattern matching algorithm searches a target text for multiple string patterns and reports the position of each pattern in the text. The algorithm requires many comparisons for each matching window and therefore performs poorly on sequential-execution processors. Several works have proposed algorithms for multi-pattern matching on CPU, such as hash-based multi-pattern matching [1] and multi-string searching based on an improved prefix tree [2]. However, these algorithms still require many comparisons, and their execution times scale with the number of patterns.

We believe that multi-pattern matching can be accelerated on FPGA by executing comparisons in parallel. However, executing all comparisons in hardware consumes a large amount of hardware resources as the number and the length of the patterns increase. Yuichiro Utan [3] implemented comparisons on FPGA, but for approximate regular expression matching. Tomas Fukac [4] used a hash-based pre-filter on FPGA to reduce the input traffic before matching on CPU. In this work, we implement the whole exact multi-pattern matching on FPGA. We propose to compare only the prefix of each pattern with the prefix of the matching window. These comparisons are executed in parallel. If a prefix comparison matches, the rest of the pattern is compared with the rest of the matching window. Otherwise, the matching window is shifted to a new string for the next matching.

Our main contributions in this work are the hardware architecture for multi-pattern matching on FPGA and the analysis of its hardware consumption and latency. We also present the experimental results and address future work.

Figure 1. Prefix matching architecture

II. MULTI-PATTERN MATCHING ARCHITECTURE
The multi-pattern matching architecture is divided into two parts. The first part is the prefix matching part. The second part is the body matching part. Before presenting the two parts, we define several concepts as follows:
• N is the number of patterns.
• Prefix of a string is the sequence of the first several characters of the string. k is defined as the length of the prefix.
• Pattern body is the rest of a pattern after its prefix.
• Matching window is the text window in the target text that is matched against the patterns. The matching window slides from the beginning to the end of the text. We assume that all patterns have the same length of L characters.
• Prefix window is the first k characters of the matching window, that is, as many characters as the prefixes have.
• Body window is the rest of the matching window after the prefix window. The length of the body window is L-k.
• Input bandwidth is the number of characters arriving per clock cycle, defined as M.
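
The definitions above, together with the prefix-then-body comparison proposed in the introduction, can be illustrated with a small software model. This is only a minimal sketch under stated assumptions, not the hardware design: the function name prefix_body_match is hypothetical, the window is assumed to slide by one character per step, and the loop over patterns stands in for the N prefix comparisons that the architecture executes in parallel. The input bandwidth M is not modeled.

```python
def prefix_body_match(text, patterns, k):
    """Software model of prefix-then-body matching (hypothetical helper).

    text     -- target text to search
    patterns -- list of N patterns, all of the same length L
    k        -- prefix length, with 0 < k <= L
    Returns a list of (position, pattern_index) pairs in scan order.
    """
    if not patterns:
        return []
    L = len(patterns[0])
    if any(len(p) != L for p in patterns):
        raise ValueError("all patterns must have the same length L")
    if not 0 < k <= L:
        raise ValueError("prefix length k must satisfy 0 < k <= L")

    # Split every pattern into its prefix (k chars) and body (L - k chars).
    prefixes = [p[:k] for p in patterns]
    bodies = [p[k:] for p in patterns]

    matches = []
    # Assumption: the matching window slides one character at a time.
    for pos in range(len(text) - L + 1):
        prefix_window = text[pos:pos + k]
        body_window = text[pos + k:pos + L]

        # Prefix matching part: N comparisons of k characters each.
        # Sequential here; in hardware these would run in parallel.
        hits = [i for i in range(len(patterns)) if prefixes[i] == prefix_window]

        # Body matching part: only patterns whose prefix matched are checked.
        for i in hits:
            if bodies[i] == body_window:
                matches.append((pos, i))
    return matches


if __name__ == "__main__":
    found = prefix_body_match("abcabdabc", ["abc", "abd", "bca"], k=2)
    for pos, idx in found:
        print(f"pattern {idx} found at position {pos}")
```

In this model a window whose prefix matches no pattern costs only the N short prefix comparisons, and the longer body comparison of L-k characters is performed only for the patterns that pass the prefix stage.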