Regular expressions: What is a regular expression?, sequence characters in regular expressions, quantifiers in regular expressions, special characters in regular expressions, using regular expression on files, retrieving information from an HTML file.

Chapter 2: Regular Expressions
2.1 What is a Regular Expression?
2.1.1 Regex Functions/Methods
2.2 Sequence Characters in Regular Expressions
2.3 Quantifiers in Regular Expressions
2.4 Special Characters in Regular Expressions
2.5 Using Regular Expression on Files
2.6 Retrieving Information from an HTML File
Chapter 3: Threads in Python

REGULAR EXPRESSIONS
2.1 What is a Regular Expression?
• A regular expression is a powerful tool in the text processing toolbox. A regular expression has a parser which matches a pattern against text. As we know, when there are multiple files to name on the command line, a single expression can specify a string that matches many different possible file names.
• For this we use the * and ? wildcard characters. We can use these characters with the dir command in Windows and the ls command on Unix-like systems.
• For example, applying the command to mydir/* returns the files and the directories present in the mydir directory. The * is known as a wildcard.
• There are two main differences between a regular expression and a simple wildcard:
1. A regular expression can match multiple times anywhere in a longer string.
2. As compared to simple wildcards, regular expressions are much more complicated and much richer.
• The important point about a regular expression is that a string always matches itself. For example, the pattern 'AAA' will always match itself in 'PQRAAAPQR'. In short, we are finding strings inside other strings.
• The wildcard '.' (dot) matches any character in a string. For example, 'A.A' will match strings such as 'AAA' or 'ABA'.
• The problem arises with the dot wildcard when we have to find something that contains a literal dot, i.e. 'A.A'. Regular expressions make you escape special characters by adding a backslash in front of them. For example, to match 'A.A' and only 'A.A', use the pattern 'A\.A'; the special meaning of the dot is eliminated.
• Python uses the backslash for escape sequences, like '\r' for carriage return and '\t' for the tab character. So the backslash will be a problem in normal string processing. To avoid this problem, regular expressions are specified as raw strings: you tack an 'r' onto the front of the string constant, and then Python leaves the backslashes untouched.
• So finally we match 'A.A' by specifying the pattern r'A\.A'.
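The dot wildcard, the escaped dot and the raw-string prefix can be tried out in a few lines. This is a minimal sketch; the variable names are illustrative choices and are not taken from the text.

```python
import re

# Unescaped dot: '.' stands for any single character (except a newline).
dot_matches_b = bool(re.search(r"A.A", "ABA"))
dot_matches_dot = bool(re.search(r"A.A", "A.A"))

# Escaped dot: only a literal '.' is accepted between the two A's.
escaped_matches_b = bool(re.search(r"A\.A", "ABA"))
escaped_matches_dot = bool(re.search(r"A\.A", "A.A"))

# A pattern without special characters matches itself inside a longer string.
finds_itself = bool(re.search(r"AAA", "PQRAAAPQR"))

# Raw strings keep the backslash as typed: "\t" is one tab character,
# while r"\t" is two characters, a backslash followed by 't'.
plain_len, raw_len = len("\t"), len(r"\t")

print(dot_matches_b, dot_matches_dot)          # True True
print(escaped_matches_b, escaped_matches_dot)  # False True
print(finds_itself)                            # True
print(plain_len, raw_len)                      # 1 2
```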
The re module
• The re module is used to support regular expressions in Python. The re.error exception is raised if an error is encountered while compiling the regular expression. The re module has two important functions:
1. match()
2. search()
Before studying these functions, we learn a few small things first, like the various characters and their meaning when we use them in a regular expression. We will use raw strings as regular expressions.
Types of Regular Expression
There are two types of regular expressions:
1) Basic regular expressions
2) Extended regular expressions
Fig. 2.1.1: Types of Regular Expression
1. Basic regular expressions
The basic regular expression operators are \?, \+, \{, \|, \( and \), as used by tools such as sed and grep. In basic regular expressions the backslash is required for {} and (), e.g. \(\).
2. Extended regular expressions
The extended regular expression operators are ?, +, {, |, ( and ), as used by tools such as egrep and awk. In extended regular expressions the backslash is not required for {} and ().
2.1.1 Regex Functions/Methods
Here, we are going to discuss the following regular expression methods:
1. match()
2. fullmatch()
3. search()
4. findall()
5. finditer()
6. sub()
7. subn()
8. split()
9. compile()
1. match()
The match function can be used to check the given pattern at the start of the target string. If a match is found, we will be given a Match object; otherwise we will be given None.
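The behaviour described above can be sketched as follows. The pattern and the target string are illustrative assumptions, not the ones from the original screenshot.

```python
import re

# match() only looks at the beginning of the target string.
m = re.match(r"abc", "abcdef")
print(m.group(), m.start(), m.end())   # abc 0 3

# The same target contains "def", but not at the start, so match() gives None.
no_match = re.match(r"def", "abcdef")
print(no_match)                        # None
```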
2. fullmatch()
The fullmatch function can be used to match a pattern against the entire target string, i.e. the entire string should match the given pattern. This function returns a Match object if the entire string was matched; otherwise it returns None.
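A short sketch of the difference between a complete and a partial match, using assumed sample strings:

```python
import re

# The whole string consists of lowercase letters, so fullmatch() succeeds.
full = re.fullmatch(r"[a-z]+", "python")
print(full.group())   # python

# The trailing digit is not covered by the pattern, so the result is None.
partial = re.fullmatch(r"[a-z]+", "python3")
print(partial)        # None
```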
3. search()
The search function can be used to find the given pattern in the target string. If a match is found, a Match object is returned, which represents the first occurrence of the match. If a match is not available, it returns None.
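As a minimal sketch (the sample strings are assumptions), search() reports only the first occurrence even when the pattern occurs more than once:

```python
import re

# Two numbers are present; search() returns the first one only.
found = re.search(r"\d+", "room 42, floor 7")
print(found.group(), found.start(), found.end())   # 42 5 7

# No digits at all: search() returns None.
missing = re.search(r"\d+", "no digits here")
print(missing)                                     # None
```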
5. finditer()
Returns an iterator that yields a Match object for each match. We can use the start(), end(), and group() functions on each match object. This method has been used in many examples since the beginning of the chapter.

    import re
    # Visit every lowercase letter found in the target string
    itr = re.finditer("[a-z]", "a7b9c5k8z")
    for m in itr:
        print(m.start(), "...", m.end(), "...", m.group())
6. sub()
Substitution or replacement is what sub means.
    re.sub(regex, replacement, target string)
Every matched pattern in the target string will be replaced with the provided replacement.

    import re
    # Replace every lowercase letter with '#'
    s = re.sub("[a-z]", "#", "a7b9c5k8z")
    print(s)

Output:
    #7#9#5#8#
7. subn()
It is identical to sub, with the exception that it also returns the number of replacements. This function returns a tuple with the result string as the first element and the number of replacements as the second: (result string, number of replacements).

    import re
    # subn() returns both the new string and the replacement count
    t = re.subn("[a-z]", "#", "a7b9c5k8z")
    print(t)
    print("The Result string:", t[0])
    print("The number of replacements:", t[1])

Output:
    ('#7#9#5#8#', 5)
    The Result string: #7#9#5#8#
    The number of replacements: 5
8. split()
The split function should be used if we want to split the given target string according to a specific pattern. A list of all tokens is returned by this function.
[Screenshot: split() program listing and its IDLE output; only the final statement print(l) is legible in the source.]
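Since the original listing is not readable, here is a minimal sketch of split() under assumed inputs. The separator pattern and the sample string are illustrative only.

```python
import re

# Split on a comma or semicolon, together with any spaces that follow it.
tokens = re.split(r"[,;]\s*", "red, green;blue,  yellow")
print(tokens)   # ['red', 'green', 'blue', 'yellow']
```

Because the separator is itself a pattern, a single call handles several delimiters and irregular spacing at once.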
9. compile()
Regular expressions are compiled into pattern objects, which have methods for searching for pattern matches and performing string substitutions, among other things.
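The following sketch shows one compiled pattern object being reused for several operations. The pattern and the target string are assumptions chosen for illustration.

```python
import re

# Compile once, then call methods on the resulting pattern object.
pattern = re.compile(r"[a-z]")
target = "a7b9c5k8z"

letters = pattern.findall(target)    # every lowercase letter
masked = pattern.sub("#", target)    # letters replaced by '#'
first = pattern.search(target)       # first occurrence only

print(letters)         # ['a', 'b', 'c', 'k', 'z']
print(masked)          # #7#9#5#8#
print(first.group())   # a
```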
[Screenshot: compile() program listing and its IDLE output, a list of six single-character strings; the listing is not legible in the source.]
2.2 Sequence Characters in Regular Expressions
Special sequences either specify where in the search string the match must occur (such as \A or \b) or stand for a whole class of characters (such as \d or \w). They make it easier to write patterns that are frequently used.
Character | Description
\A | It returns a match if the specified characters are present at the beginning of the string.
\b | It returns a match if the specified characters are present at the beginning or the end of a word.
\B | It returns a match if the specified characters are present, but not at the beginning or the end of a word.
\d | It returns a match if the string contains digits (0-9).
\D | It returns a match if the string contains a character that is not a digit.
\s | It returns a match if the string contains any whitespace character.
\S | It returns a match if the string contains any non-whitespace character.
\w | It returns a match if the string contains any word character (a-z, A-Z, 0-9 and the underscore).
\W | It returns a match if the string contains any non-word character.
\Z | It returns a match if the specified characters are at the end of the string.
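A few of the sequences in the table can be checked with a short sketch. The sample strings are assumptions chosen for illustration.

```python
import re

text = "Python 3 rocks"

digits = re.findall(r"\d", text)                  # digit characters
spaces = re.findall(r"\s", text)                  # whitespace characters
at_start = bool(re.search(r"\APython", text))     # anchored at the beginning
at_end = bool(re.search(r"rocks\Z", text))        # anchored at the end

phrase = "The rain in Spain"
word_end = re.findall(r"ain\b", phrase)     # 'ain' at the end of a word
word_start = re.findall(r"\bain", phrase)   # 'ain' at the start of a word
inside = re.findall(r"\Bain", phrase)       # 'ain' not at the start of a word

print(digits, spaces)      # ['3'] [' ', ' ']
print(at_start, at_end)    # True True
print(word_end)            # ['ain', 'ain']
print(word_start)          # []
print(inside)              # ['ain', 'ain']
```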
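Example 2.2.1: Use the special sequence \w inside a regex pattern to find all alphanumeric characters in the string. Use the special sequence \W inside a regex pattern to find all the non-alphanumeric characters.

    import re

    # Target string reconstructed from the printed output (assumption)
    target_str = "Jessa and Kelly!!"

    # \w to match all alphanumeric characters
    result = re.findall(r"\w", target_str)
    print(result)

    # \w{5} to match words of five characters
    result = re.findall(r"\w{5}", target_str)
    print(result)

    # \W to match all non-alphanumeric characters
    result = re.findall(r"\W", target_str)
    print(result)

Output:
    ['J', 'e', 's', 's', 'a', 'a', 'n', 'd', 'K', 'e', 'l', 'l', 'y']
    ['Jessa', 'Kelly']
    [' ', ' ', '!', '!']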
Example 2.2.2: Use the special sequence \s inside a regex pattern to find all whitespace characters in our target string. Use the special sequence \S inside a regex pattern to find all the non-whitespace characters.

    import re

    # Target strings reconstructed from the printed output (assumption)
    target_str = "Jessa \t \n  "

    # \s to match any whitespace
    result = re.findall(r"\s", target_str)
    print(result)

    # \S to match non-whitespace
    result = re.findall(r"\S", target_str)
    print(result)

    # split on whitespace
    result = re.split(r"\s+", "Jessa and Kelly")
    print(result)

    # replace all multiple white-spaces with single space
    result = re.sub(r"\s+", " ", "Jessa   and   \t \t Kelly  ")
    print(result)

Output:
    [' ', '\t', ' ', '\n', ' ', ' ']
    ['J', 'e', 's', 's', 'a']
    ['Jessa', 'and', 'Kelly']
    Jessa and Kelly 
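Example 2.2.3: Use the special sequence \b inside a regex pattern to return a match if the specified characters are present at the beginning or the end of a word. Use \B to return a match if the specified characters are present, but not at the beginning or the end of a word.
[Screenshot: program listing, not legible in the source.]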
Output:
Yes, there is at least one match for x!
No match for y

2.3 Quantifiers in Regular Expressions
The form of a quantifier is {m,n}, where m and n are the minimum and maximum number of times that the expression to which the quantifier applies must match. Quantifiers can be used to specify the number of occurrences that should be matched.
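Quantifier | Description
* | Matches zero or more occurrences of the preceding expression.
+ | Matches one or more occurrences of the preceding expression.
? | Matches zero or one occurrence of the preceding expression.
{m} | Matches exactly m occurrences of the preceding expression.
{m,n} | Matches at least m and at most n occurrences; for example, {2,4} matches a minimum of two and a maximum of four.
? after a quantifier | A ? after a quantifier makes it a reluctant quantifier. It tries to find the smallest match. This makes the regular expression stop at the first match.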
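The quantifiers, and the difference between a greedy and a reluctant quantifier, can be sketched as follows. The target strings are assumptions chosen for illustration.

```python
import re

s = "abaabaaab"

one_or_more = re.findall(r"a+", s)       # runs of one or more 'a'
exactly_two = re.findall(r"a{2}", s)     # exactly two 'a' at a time
two_to_three = re.findall(r"a{2,3}", s)  # between two and three 'a'
optional_b = re.findall(r"ab?", "abaab") # 'a' followed by an optional 'b'

tagged = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", tagged)      # longest possible match
reluctant = re.findall(r"<b>.*?</b>", tagged)  # shortest possible matches

print(one_or_more)    # ['a', 'aa', 'aaa']
print(exactly_two)    # ['aa', 'aa']
print(two_to_three)   # ['aa', 'aaa']
print(optional_b)     # ['ab', 'a', 'ab']
print(greedy)         # ['<b>one</b><b>two</b>']
print(reluctant)      # ['<b>one</b>', '<b>two</b>']
```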
Example 2.3.1: Match the given character with the given string.

    import re
    # Report the position of every 'a' in the target string
    matcher = re.finditer("a", "abaabaaab")
    for match in matcher:
        print(match.start(), "......", match.group())
Example 2.3.2: Check the occurrence of the given character in the given string.

    # The statement that creates matcher is not legible in the source
    for match in matcher:
        print(match.start(), "......", match.group())
Example 2.3.3: Check the occurrence of the given character one or more times.

    import re
    # "a+" matches each run of one or more consecutive 'a' characters
    matcher = re.finditer("a+", "abaabaaabaaaasaa")
    for match in matcher:
        print(match.start(), "......", match.group())
2.4 Special Characters in Regular Expressions
Table 2.4.1 shows the special characters in regular expressions.
Special Character | Description
\ | Escape the special meaning of the character that follows
^ | Match the start of the string
$ | Match the end of the string
. | Match any character except a newline
* | Match zero or more (quantifier)
+ | Match one or more (quantifier)
- | Indicates range in character class (like a-z)
& | Substitute complete match
() | Grouping characters
[] | Character class to match a single character
{} | Range quantifiers
\< \> | Anchors that specify left or right word boundary
? | Match zero or one (quantifier)
| | Specifies a series of alternatives (choices)
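Several of the special characters from the table can be exercised with Python's re module. This is a sketch with assumed sample strings; the & and \< \> entries are left out because they belong to tools such as sed rather than to Python's re syntax.

```python
import re

# [] with a range: one character from a to c
class_hits = re.findall(r"[a-c]", "a1b2c3d4")

# () groups part of the pattern so that it can be extracted on its own
year = re.search(r"(\d{4})-(\d{2})", "date: 2024-05").group(1)

# | separates alternatives
pets = re.findall(r"cat|dog", "cat, dog, cow")

# . matches any character except a newline
dots = re.findall(r"c.t", "cat cot c\nt cut")

# {} gives a repetition range
runs = re.findall(r"\d{2,3}", "7 42 123 4567")

# A backslash removes the special meaning of the character that follows
prices = re.findall(r"\$\d+", "cost $15 or $20")

print(class_hits)   # ['a', 'b', 'c']
print(year)         # 2024
print(pets)         # ['cat', 'dog']
print(dots)         # ['cat', 'cot', 'cut']
print(runs)         # ['42', '123', '456']
print(prices)       # ['$15', '$20']
```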
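Example 2.4.1: We can use the ^ symbol to check whether the given target string starts with the pattern we provided or not. For example, with the pattern ^Learn, if the target string begins with Learn, a Match object will be returned; otherwise, None will be returned.

    import re
    # Target string is an assumed sample; the original value is not legible
    s = "Learning Python is Very Easy"
    res = re.search("^Fast", s)
    if res != None:
        print("Target String starts with Fast")
    else:
        print("Target String Not starts with Fast")

Output:
    Target String Not starts with Fast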
Example 2.4.2: We can use the $ symbol to check whether the given target string ends with the pattern we provided or not. If the target string ends with Easy, a Match object will be returned; otherwise, None will be returned.

    import re
    # Target string is an assumed sample; the original value is not legible
    s = "Learning Python is Very Easy"
    res = re.search("Easy$", s)
    if res != None:
        print("Target String ends with Easy")
    else:
        print("Target String Not ends with Easy")

Output:
    Target String ends with Easy
Example 2.4.3: Program to check that a string contains only a certain set of characters (in this case a-z, A-Z and 0-9).

    import re

    def allowed(string):
        # Search for any character outside the permitted set
        charRe = re.compile(r"[^a-zA-Z0-9]")
        string = charRe.search(string)
        return not bool(string)

    # Sample arguments; the strings in the original listing are not legible
    print(allowed("ABCDEFabcdef123450"))
    print(allowed("*&%@#!}{"))

Output:
    True
    False
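A further listing combines the alternation character | with the sequences \A and \Z to select the list items that start with den or end with ly.

    import re
    # Sample list consistent with the printed output (assumption)
    items = ['lovely', '1\ndentist', '2 lonely', 'eden', 'fly\n', 'dent']
    p = [e for e in items if re.search(r'\Aden|ly\Z', e)]
    print(p)

Output:
    ['lovely', '2 lonely', 'dent']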
2.5 Using Regular Expression on Files
Sometimes it is necessary to extract some matching data from a file and store it in another file or print it. Regular expressions can be used for matching and extracting the data from the file.
Example 2.5.1: Consider the following example. From the given input file we have to extract the valid mobile numbers, and in another file we have to write the extracted mobile numbers.

Input file:
    9095567625
    2356994752
    x548962315
    9905764256

[Screenshot: program listing; only the closing statements f1.close() and f2.close() are legible in the source.]

Output file:
    9095567625
    9905764256

In the above example, out of 4 mobile numbers, the correct mobile numbers are extracted and stored in the output file.
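Because the original listing did not survive, the following is only a sketch of how such a program might look. It assumes that a valid mobile number is exactly 10 digits long and starts with 7, 8 or 9. The function name, the file names and the sample numbers are hypothetical.

```python
import os
import re
import tempfile

# Assumption: a valid mobile number has exactly 10 digits and starts with 7, 8 or 9.
MOBILE = re.compile(r"[7-9]\d{9}")

def extract_mobile_numbers(in_path, out_path):
    """Copy the lines of in_path that are valid mobile numbers to out_path."""
    with open(in_path) as f1, open(out_path, "w") as f2:
        for line in f1:
            number = line.strip()
            # fullmatch() rejects lines with extra or invalid characters
            if MOBILE.fullmatch(number):
                f2.write(number + "\n")

# Demonstration with temporary files (hypothetical names and numbers).
with tempfile.TemporaryDirectory() as folder:
    in_path = os.path.join(folder, "input.txt")
    out_path = os.path.join(folder, "output.txt")
    with open(in_path, "w") as f:
        f.write("9876543210\n2356994752\nx548962315\n9905764256\n")
    extract_mobile_numbers(in_path, out_path)
    with open(out_path) as f:
        valid = f.read().split()

print(valid)   # ['9876543210', '9905764256']
```

Using with blocks closes both files automatically, which replaces the explicit close() calls.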
In the same way, email addresses can be extracted from the text of a file. For an input file whose text contains three email addresses, the program prints:

Output:
    ceeya@somecompany.co.uk
    girish123@gmail.com
    [email protected]
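A minimal sketch of email extraction is shown below. The pattern is a deliberately simplified assumption and does not cover every valid address; the sample text and addresses are hypothetical.

```python
import re

# Simplified pattern (assumption): local part, '@', then dot-separated domain labels.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

text = "Contact alice@example.com or bob.smith@mail.example.org for details."
emails = EMAIL.findall(text)

print(emails)   # ['alice@example.com', 'bob.smith@mail.example.org']
```

To process a file instead of a string, the same pattern can be applied to the text returned by reading the file.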
2.6 Retrieving Information from an HTML File
• Using regular expressions to search for and extract substrings that match a specific pattern is a simple way to parse HTML.
• Here's an example of a simple web page:
    <h1>The First Page</h1>
    <p>
    If you like, you can switch to the
    <a href="http://www.dr-chuck.com/page2.htm">
    Second Page</a>.
    </p>
• To match and extract the link values from the above text, we can use the following regular expression:
    href="http://.+?"
• Our regular expression looks for strings that begin with href="http://, followed by one or more characters (.+?), followed by another double quote. The question mark added to .+? indicates that the match should be done in a "non-greedy" manner rather than a "greedy" manner. A non-greedy match looks for the shortest possible matching string, whereas a greedy match looks for the longest possible matching string.
• We use parentheses in our regular expression to indicate which part of our matched string we would like to extract in the program.
• The findall regular expression method returns only the link text between the double quotes and gives us a list of all the strings that match our regular expression.
• We get the following output when we run the program:
    python urlregex.py
    Enter - http://www.dr-chuck.com/page1.htm
    http://www.dr-chuck.com/page2.htm

    python urlregex.py
    Enter - http://allendowney.com/
    [Output: a long list of the links found on the page; the individual URLs are not legible in the source.]
• When your HTML is well formatted and predictable, regular expressions work very well. However, because there are many "broken" HTML pages out there, a solution based solely on regular expressions may miss some valid links or result in incorrect data. A robust HTML parsing library can be used to solve this problem.
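The link extraction described in this section can be sketched without any network access by applying findall to a string of HTML. The page text and the example.com URLs below are hypothetical placeholders, and this is not the original program.

```python
import re

# Hypothetical page text with two links on one line.
html = (
    '<p>See the <a href="http://www.example.com/page1.htm">first</a> and '
    '<a href="http://www.example.com/page2.htm">second</a> page.</p>'
)

# Non-greedy: the parentheses capture only the text between the double quotes.
links = re.findall(r'href="(http://.+?)"', html)

# Greedy, for comparison: .+ runs on to the last double quote in the text.
greedy = re.findall(r'href="(http://.+)"', html)

print(links)         # ['http://www.example.com/page1.htm', 'http://www.example.com/page2.htm']
print(len(greedy))   # 1
```

The greedy variant returns a single oversized string that swallows both links, which is why the non-greedy form is used for this task.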
Review Questions:
Q. 1 What is a regular expression?
Q. 2 What are the types of regular expression?
Q. 3 Explain the various methods of regular expression with example.
Q. 4 Explain the sequence characters of regular expression with example.
Q. 5 Explain the quantifiers with example.
Q. 6 Explain the special characters of regular expression with example.
Q. 7 How to use regular expressions on a file? Explain with example.
Q. 8 How to extract information from an HTML file? Explain with example.