YAPC::NA 2013: sregex

The document discusses the sregex library, which performs streaming regex matching and substitution. It builds on Thompson's construction, which compiles a regular expression into a nondeterministic finite automaton (NFA) that can be run over a data stream without backtracking. The library includes two regex engines (Thompson VM and Pike VM) and passes the relevant test cases in the PCRE and Perl test suites. The Thompson VM has a JIT compiler, and the library is used in the ngx_replace_filter Nginx module.

Streaming regex matching and substitution by the sregex library

Yichun Zhang (agentzh)
agentzh@gmail.com

2013.06.03

In efficient web servers, request bodies and response bodies are processed in data chunks.

We usually use a fixed-size buffer even when we are processing a much larger data stream.
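
The chunked processing described above can be sketched in C roughly as follows. This is only an illustration: `feed_in_chunks`, `chunk_handler` and `count_bytes` are hypothetical names invented for this sketch, not part of sregex or Nginx, and a real server would read each chunk from a socket rather than slice an in-memory array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives one chunk plus a flag telling
 * whether it is the last chunk of the stream. */
typedef void (*chunk_handler)(const unsigned char *chunk, size_t len,
                              int last_buf, void *user);

/* Walk a large input in pieces of at most buf_size bytes, the way a
 * server sees a body through a fixed-size buffer.
 * Returns the number of chunks delivered to the handler. */
size_t feed_in_chunks(const unsigned char *data, size_t total,
                      size_t buf_size, chunk_handler handler, void *user)
{
    size_t off = 0, n = 0;

    assert(data != NULL && buf_size > 0 && handler != NULL);

    do {
        size_t len = total - off;
        if (len > buf_size) {
            len = buf_size;              /* never hand out more than one buffer */
        }
        handler(data + off, len, off + len == total, user);
        off += len;
        n++;
    } while (off < total);

    return n;
}

/* Example handler: adds up the chunk sizes into a size_t counter. */
void count_bytes(const unsigned char *chunk, size_t len,
                 int last_buf, void *user)
{
    (void) chunk;
    (void) last_buf;
    *(size_t *) user += len;
}
```

The point of the sketch is that the handler never sees the whole stream at once, so anything it does (including regex matching) has to carry its own state from one call to the next.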

Backtracking regex engines suck.

Thompson's Construction Algorithm comes to the rescue!
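
To see why an automaton-based matcher suits streams, here is a minimal sketch of simulating an NFA by tracking the set of live states. It is not sregex code: the NFA is hand-built for the hypothetical pattern `ab*c` instead of being generated from a regex, and `stream_matcher`, `matcher_init` and `matcher_feed` are names made up for this example. The only state kept between chunks is a small bit set, so a match can span a chunk boundary and no input is ever rescanned.

```c
#include <assert.h>
#include <stddef.h>

/* NFA states for the pattern ab*c, one bit per state. */
enum { S_START = 1u << 0, S_AFTER_A = 1u << 1 };

typedef struct {
    unsigned live;      /* set of NFA states alive after the bytes seen so far */
    long     consumed;  /* total bytes consumed across all chunks */
} stream_matcher;

void matcher_init(stream_matcher *m)
{
    assert(m != NULL);
    m->live = S_START;
    m->consumed = 0;
}

/* Feed one chunk. Returns the absolute end offset of the first match,
 * or -1 if more input is needed. Bytes after a match are left unread. */
long matcher_feed(stream_matcher *m, const char *chunk, size_t len)
{
    size_t i;

    assert(m != NULL && chunk != NULL);

    for (i = 0; i < len; i++) {
        char     c = chunk[i];
        unsigned next = S_START;   /* unanchored: a match may start at any byte */

        m->consumed++;

        if ((m->live & S_START) && c == 'a') {
            next |= S_AFTER_A;
        }
        if (m->live & S_AFTER_A) {
            if (c == 'b') {
                next |= S_AFTER_A; /* loop on b* */
            }
            if (c == 'c') {
                m->live = S_START; /* accept, then reset for the next search */
                return m->consumed;
            }
        }
        m->live = next;
    }

    return -1;
}
```

Feeding `"xxab"` and then `"bcyy"` reports a match ending at stream offset 6, even though the match `abbc` straddles the two chunks. Each byte is examined exactly once, which is what a backtracking engine cannot promise.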

It also supports submatch captures!

DFAs cannot find the beginnings of submatch captures without matching backwards.

I created the sregex library based on Russ Cox's re1 library.

sregex is written in pure C.

sregex includes two engines: Thompson VM & Pike VM.
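
In Russ Cox's re1, the Pike VM differs from the Thompson VM in that every thread carries its own copy of the capture positions as it advances. The sketch below illustrates that idea only; it is not taken from sregex or re1. It hard-codes the hypothetical pattern `ab*c`, for which at most one thread is ever alive, and `capture_matcher`, `capture_init` and `capture_feed` are names invented here. Because the start offset travels forward with the thread, the beginning of the match is known without matching backwards.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int  alive;     /* is a thread currently in the "after a" state? */
    long start;     /* capture carried by that thread: where its match began */
    long consumed;  /* total bytes consumed across all chunks */
} capture_matcher;

void capture_init(capture_matcher *m)
{
    assert(m != NULL);
    m->alive = 0;
    m->start = -1;
    m->consumed = 0;
}

/* Feed one chunk. Returns 1 and fills the half-open range *from .. *to
 * (absolute stream offsets) on the first match of ab*c, or 0 if more
 * input is needed. */
int capture_feed(capture_matcher *m, const char *chunk, size_t len,
                 long *from, long *to)
{
    size_t i;

    assert(m != NULL && chunk != NULL && from != NULL && to != NULL);

    for (i = 0; i < len; i++) {
        char c = chunk[i];
        long pos = m->consumed++;

        if (m->alive) {
            if (c == 'b') {
                continue;          /* thread loops on b*, keeping its capture */
            }
            m->alive = 0;          /* any other byte ends this thread */
            if (c == 'c') {
                *from = m->start;
                *to = pos + 1;
                return 1;
            }
        }
        if (c == 'a') {            /* spawn a new thread that remembers its start */
            m->alive = 1;
            m->start = pos;
        }
    }

    return 0;
}
```

With the chunks `"xxab"` and `"bcyy"`, the second call reports the range 2 to 6, i.e. the bytes `abbc`. A general Pike VM keeps a list of such threads, ordered by priority, with one capture vector per thread.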

^ $ \A \z \b \B . \c
[0-9a-z] [^0-9a-z]
\d \D \s \S \h \H \v \V \w \W
\cK \N
ab a|b (a) (?:a)
a? a* a+ a?? a*? a+?
a{n} a{n,m} a{n,} a{n}? a{n,m}? a{n,}?
\t \n \r \f ...

Passing all the related test cases in both the official PCRE 8.32 and Perl 5.16.2 test suites.

#include <sregex/sregex.h>
...
rc = sre_vm_pike_exec(vm_ctx, pos, len, last_buf, &pending_matched);

The Thompson VM has a simple Just-in-Time (JIT) compiler targeting x86_64.

The regex JIT compiler uses DynASM, which powers LuaJIT's interpreter.

Still a lot of important optimizations to do.

My Nginx C module ngx_replace_filter is the first user of sregex.

location ~ '\.cpp$' {
    # proxy_pass ... / fastcgi_pass ...

    # remove all those ugly C/C++ comments:
    replace_filter '/\*.*?\*/|//[^\n]*' '' g;
}

# skip C/C++ string literals:
replace_filter "'(?:\\[^\n]|[^'\n])*'" $& g;
replace_filter '"(?:\\[^\n]|[^"\n])*"' $& g;

replace_filter_max_buffered_size 8k;

Thank you!
