Advanced Python Programming - Lesson No.002

Regular expressions: What is a regular expression?, sequence characters in regular expressions, quantifiers in regular expressions, special characters in regular expressions, using regular expression on files, retrieving information from an HTML file.

Chapter 2: Regular Expressions
2.1 What is a Regular Expression?
    2.1.1 Regex Functions/Methods
2.2 Sequence Characters in Regular Expressions
2.3 Quantifiers in Regular Expressions
2.4 Special Characters in Regular Expressions
2.5 Using Regular Expression on Files
2.6 Retrieving Information from an HTML File

REGULAR EXPRESSIONS

2.1 What is a Regular Expression?

• A regular expression is a powerful tool in the text-processing toolbox. A regular expression is handled by a parser that matches it against text.
• On the command line we often name several files at once with a wildcard string, which can match many different possible file names, using the "*" and "?" wildcard characters. We can use these characters with the dir command in Windows and the ls command in Linux. For example, mydir/* returns the files and the directories present in the mydir directory. The * is known as a wildcard.
• There are two main differences between a regular expression and a simple wildcard:
  1. A regular expression can match multiple times, anywhere in a longer string.
  2. Compared to simple wildcards, regular expressions are much more complicated and much richer.
• The important point about regular expressions is that an ordinary string always matches itself. For example, the pattern 'AAA' will always match itself in 'PQRAAAPQR'. In short, we are finding strings inside other strings.
• The wildcard . (dot) matches any single character in a string. For example, 'A.A' will match strings such as 'AAA' or 'ABA'.
• A problem arises with the dot wildcard when we have to find something that contains a literal dot, i.e. 'A.A'. Regular expressions let you escape special characters by adding a backslash in front of them. For example, to match 'A.A' and only 'A.A', use the pattern 'A\.A'; the special meaning of the dot is eliminated.
• Python itself uses the backslash for escape sequences: '\r' is a carriage return and '\t' is a tab character. So the backslash is a problem in normal string processing. To avoid this problem, regular expressions are specified as raw strings: you tack an 'r' onto the front of the string constant, and Python then leaves the backslashes untouched.
• So finally we match 'A.A' by specifying the pattern r'A\.A'.

The re module

• The re module is used to support regular expressions in Python. The re.error exception is raised if an error is encountered while compiling a regular expression. The re module has two important functions:
  1. match()
  2. search()
• Before studying these functions, we first learn the basics, such as the various characters and their meaning when we use them in a regular expression. We will use raw strings for the patterns.

Types of Regular Expression

There are two types of regular expressions:
  1) Basic regular expressions
  2) Extended regular expressions

Fig. 2.1.1: Types of Regular Expression

1. Basic regular expressions
The basic regular expression operators are written \? \+ \{ \| \( \) and are used by tools such as sed and grep. In a basic regular expression the backslash is required for ?, +, {, |, ( and ).

2. Extended regular expressions
The extended regular expression operators are written ? + { | ( ) and are used by tools such as egrep and awk. In an extended regular expression the backslash is not required for ?, +, {, |, ( and ).

2.1.1 Regex Functions/Methods

Here we are going to discuss the following regular expression methods: match(), fullmatch(), search(), findall(), finditer(), sub(), subn(), split() and compile().

1. match()
The match function can be used to check for the given pattern at the start of the target string. If a match is found we are given a Match object, otherwise we are given None.

2. fullmatch()
The fullmatch function can be used to match a pattern against the entire target string, i.e. the entire string should match the given pattern. This function returns a Match object if the entire string was matched, otherwise it returns None.

3. search()
The search function can be used to find the given pattern anywhere in the target string. If a match is found, a Match object is returned, which represents the first occurrence of the match. If no match is available, it returns None.

4. findall()
Returns a list of all non-overlapping matches of the pattern in the target string.

5. finditer()
Returns an iterator that yields a Match object for each match. We can use the start(), end() and group() functions on each Match object.

    import re
    itr = re.finditer("[a-z]", "a7b9c5k8z")
    for m in itr:
        print(m.start(), "...", m.end(), "...", m.group())

6. sub()
Substitution, or replacement, is what sub means.

    re.sub(regex, replacement, targetstring)

Every match of the pattern in the target string is replaced with the provided replacement.

    import re
    s = re.sub("[a-z]", "#", "a7b9c5k8z")
    print(s)

7. subn()
It is identical to sub, except that it also returns the number of replacements. This function returns a tuple with the result string as the first element and the number of replacements as the second: (resultstring, number of replacements).

    import re
    t = re.subn("[a-z]", "#", "a7b9c5k8z")
    print(t)
    print("The Result String:", t[0])
    print("The number of replacements:", t[1])

Output:

    ('#7#9#5#8#', 5)
    The Result String: #7#9#5#8#
    The number of replacements: 5

8. split()
The split function should be used if we want to split the given target string according to a specific pattern. A list of all tokens is returned by this function.
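The functions described above can be tried out in one short script. The following is a minimal sketch rather than the book's own listing: the sample string and the variable names are assumptions chosen for illustration, and findall() is included because it appears in the list of methods.

```python
import re

text = "a7b9c5k8z"  # sample string, assumed for illustration

# match(): succeeds only if the pattern matches at the start of the string
print(re.match(r"[a-z]", text) is not None)   # True, 'a' is at index 0
print(re.match(r"\d", text) is not None)      # False, index 0 is not a digit

# fullmatch(): the whole string must match the pattern
print(re.fullmatch(r"[a-z0-9]+", text) is not None)   # True

# search(): first occurrence anywhere in the string
m = re.search(r"\d", text)
print(m.start(), m.group())                   # 1 7

# findall(): list of all non-overlapping matches
digits = re.findall(r"\d", text)
print(digits)                                 # ['7', '9', '5', '8']

# split(): cut the string wherever the pattern matches
tokens = re.split(r"\d", text)
print(tokens)                                 # ['a', 'b', 'c', 'k', 'z']
```

Note that match() and fullmatch() differ only in how much of the string must be covered, while search() and findall() scan the whole string.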
9. compile()
Regular expressions are compiled into pattern objects, which have methods for searching for pattern matches and performing string substitutions, among other things.

2.2 Sequence Characters in Regular Expressions

Special sequences do not stand for one literal character in the string; instead they name a class of characters or specify where the match must occur in the search string. They make it easier to write patterns that are frequently used.

    Character   Description
    \A          Returns a match if the specified characters are at the beginning of the string.
    \b          Returns a match if the specified characters are at the beginning or the end of a word.
    \B          Returns a match if the specified characters are present, but not at the beginning or the end of a word.
    \d          Returns a match where the string contains a digit (0-9).
    \D          Returns a match where the string contains a character that is not a digit.
    \s          Returns a match where the string contains a whitespace character.
    \S          Returns a match where the string contains a character that is not whitespace.
    \w          Returns a match where the string contains a word character (a-z, A-Z, 0-9 or underscore).
    \W          Returns a match where the string contains a character that is not a word character.
    \Z          Returns a match if the specified characters are at the end of the string.

Example 2.2.1: Use the special sequence \w inside a regex pattern to find all alphanumeric characters in the string, and the special sequence \W inside a regex pattern to find all the non-alphanumeric characters.

    import re
    target_str = "Jessa and Kelly"   # sample text
    # \w to match all alphanumeric characters
    result = re.findall(r"\w", target_str)
    print(result)
    # \W to match all non-alphanumeric characters
    result = re.findall(r"\W", target_str)
    print(result)

Example 2.2.2: Use the special sequence \s inside a regex pattern to find all whitespace characters in the target string, and the special sequence \S inside a regex pattern to find all the non-whitespace characters.

    import re
    target_str = "Jessa  and \t Kelly"   # sample text with irregular spacing
    # \s to match any whitespace
    result = re.findall(r"\s", target_str)
    print(result)
    # \S to match non-whitespace
    result = re.findall(r"\S+", target_str)
    print(result)
    # replace multiple white-spaces with a single space
    result = re.sub(r"\s+", " ", target_str)
    print(result)

The last two print statements give:

    ['Jessa', 'and', 'Kelly']
    Jessa and Kelly

Example 2.2.3: Use special sequences inside a regex pattern to check whether the specified characters occur at the beginning or the end of a word (\b), or are present but not at the beginning or the end of a word (\B).

The example program prints:

    Yes, there is at least one match for x!
    No match for y

2.3 Quantifiers in Regular Expressions

The general form of a quantifier is {m,n}, where m and n are the minimum and maximum number of times that the expression to which the quantifier applies must match. Quantifiers can be used to specify the number of occurrences that should be matched.

    Quantifier  Description
    a           Exactly one 'a'.
    a+          At least one 'a'.
    a*          Any number of a's, including zero.
    a?          At most one 'a', i.e. zero or one.
    a{m}        Exactly m a's.
    a{m,n}      A minimum of m and a maximum of n a's; for example, a{2,4} matches a minimum of two and a maximum of four.

Adding ? after a quantifier makes it a reluctant (non-greedy) quantifier. It tries to find the smallest match, so the regular expression stops at the first possible match.

Example 2.3.1: Match the given character with the given string.

    import re
    matcher = re.finditer("a", "abaaceahnbaaab")
    for match in matcher:
        print(match.start(), "......", match.group())

Example 2.3.2: Check the occurrence of the given character in the given string (any number).

    import re
    matcher = re.finditer("a*", "abaaceahnbaaab")
    for match in matcher:
        print(match.start(), "......", match.group())

Example 2.3.3: Check for one or more occurrences of the given character in the given string.

    import re
    matcher = re.finditer("a+", "abaabaaabaaaasaa")
    for match in matcher:
        print(match.start(), "......", match.group())

2.4 Special Characters in Regular Expressions

Table 2.4.1 shows the special characters in regular expressions.

Table 2.4.1

    Special Character   Description
    \                   Escapes the special meaning of a character
    ^                   Matches the beginning of the string
    $                   Matches the end of the string
    .                   Matches any character except a newline
    *                   Match zero or more quantifier
    +                   Match one or more quantifier
    -                   Indicates a range in a character class (like a-z)
    &                   Substitutes the complete match (sed syntax; Python's re.sub uses \g<0>)
    ( )                 Grouping characters
    [ ]                 Character class to match a single character
    { }                 Range quantifiers
    \< \>               Anchors that specify the left or right word boundary (grep syntax; Python uses \b)
    ?                   Match zero or one quantifier
    |                   Specifies a series of alternatives (choices)

Example 2.4.1: We can use the ^ symbol to check whether the given target string starts with the pattern we provided or not. If the target string begins with Learn, a Match object will be returned; otherwise, None will be returned.

    import re
    s = "Learning Python is Very Easy"   # sample text
    res = re.search("^Learn", s)
    if res is not None:
        print("Target String starts with Learn")
    else:
        print("Target String Not starts with Learn")

Example 2.4.2: We can use the $ symbol to check whether the given target string ends with our pattern or not. If the target string ends with Easy, a Match object will be returned; otherwise, None will be returned.

    import re
    s = "Learning Python is Very Easy"   # sample text
    res = re.search("Easy$", s)
    if res is not None:
        print("Target String ends with Easy")
    else:
        print("Target String Not ends with Easy")

Output:
    Target String ends with Easy

Example 2.4.3: Program to check that a string contains only a certain set of characters (in this case a-z, A-Z and 0-9).

    import re
    def allowed(string):
        # any character outside a-z, A-Z and 0-9 makes the string invalid
        charRe = re.compile(r'[^a-zA-Z0-9]')
        string = charRe.search(string)
        return not bool(string)

    print(allowed("ABCDEFabcdef123450"))
    print(allowed("*&%@#!}{"))

Output:

    True
    False

The anchors \A and \Z can be combined with the alternation character |. The following listing keeps only the items that start with 'den' or end with 'ly':

    import re
    items = ['lovely', '1\ndentist', '2 lonely', 'eden', 'fly\n', 'dent']
    p = [w for w in items if re.search(r'\Aden|ly\Z', w)]
    print(p)

Output:

    ['lovely', '2 lonely', 'dent']

2.5 Using Regular Expression on Files

Sometimes it is necessary to extract some matching data from a file and store it in another file or print it. Regular expressions can be used for matching and extracting the data from the file.

Example 2.5.1: Consider the following example. From the given input file we have to read the mobile numbers, and in the other file we have to write only the mobile numbers that are correct.

Input file:

    9095567625
    2356994752
    x548962315
    9905764256

A program along the following lines performs the extraction; the file names input.txt and output.txt and the pattern [7-9]\d{9} (ten digits, the first being 7, 8 or 9) are assumed here.

    import re
    f1 = open("input.txt", "r")
    f2 = open("output.txt", "w")
    for line in f1:
        numbers = re.findall(r"[7-9]\d{9}", line)
        for n in numbers:
            f2.write(n + "\n")
    f1.close()
    f2.close()

Output file:

    9095567625
    9905764256

In the above example, out of the 4 mobile numbers, the correct mobile numbers are extracted and stored in the output file.

In the same way, email addresses can be extracted from lines of text. In the book's example the input lines contain addresses such as one at somecompany.co.uk and girish123@gmail.com, and the program prints each address it finds on a separate line.

2.6 Retrieving Information from an HTML File

• Using regular expressions to search for and extract substrings that match a specific pattern is a simple way to parse HTML.
• Here is an example of a simple web page:

    <h1>The First Page</h1>
    <p>If you like, you can switch to the
    <a href="http://www.dr-chuck.com/page2.htm">Second Page</a>.</p>
• To match and extract the link values from the above text we can use the following regular expression:

    href="http://.+?"

• Our regular expression looks for strings that begin with href="http://, followed by one or more characters (.+?), followed by another double quote. The question mark after .+ indicates that the match should be done in a "non-greedy" manner rather than a "greedy" manner. A non-greedy match looks for the shortest possible matching string, whereas a greedy match looks for the longest possible matching string.
• We use parentheses in our regular expression to indicate which part of the matched string we would like to extract.
• The findall regular expression method then returns only the link text between the double quotes, as a list of all the strings that match our regular expression.
• When the program is run on the sample page, we get the following output:

    Enter - http://www.dr-chuck.com/page1.htm
    http://www.dr-chuck.com/page2.htm

  A second run, against a page on www.allendowney.com, prints a much longer list of links, most of them on the allendowney.com site itself.
• When your HTML is well formatted and predictable, regular expressions work very well. However, because there are so many "broken" HTML pages out there, a solution based solely on regular expressions may miss some valid links or return incorrect data. A robust HTML parsing library can be used to solve this problem.

Review Questions

Q.1 What is a regular expression?
Q.2 What are the types of regular expression?
Q.3 Explain the various methods of regular expressions with examples.
Q.4 Explain the sequence characters of regular expressions with examples.
Q.5 Explain the quantifiers with examples.
Q.6 Explain the special characters of regular expressions with examples.
Q.7 How are regular expressions used on files? Explain with an example.
Q.8 How is information extracted from an HTML file? Explain with an example.
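As a supplement to Section 2.6, the link-extraction steps can be sketched on the sample page itself, without any network access. This is a minimal sketch under stated assumptions: the names SAMPLE_PAGE, regex_links and LinkCollector are invented for this illustration, and html.parser from the Python standard library stands in as one possible "robust HTML parsing library", since the text does not name a particular one.

```python
import re
from html.parser import HTMLParser

# Sample page from Section 2.6, held in a string (assumed variable name)
SAMPLE_PAGE = """<h1>The First Page</h1>
<p>If you like, you can switch to the
<a href="http://www.dr-chuck.com/page2.htm">Second Page</a>.</p>"""

# Regex approach: the parentheses capture only the text between the quotes,
# and .+? is non-greedy, so the match stops at the first closing quote.
regex_links = re.findall(r'href="(http://.+?)"', SAMPLE_PAGE)
print(regex_links)   # ['http://www.dr-chuck.com/page2.htm']


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# The same link written with single quotes defeats the regex above,
# but the parser still finds it.
odd_markup = "<a href='http://www.dr-chuck.com/page2.htm'>Second Page</a>"
print(re.findall(r'href="(http://.+?)"', odd_markup))   # []

collector = LinkCollector()
collector.feed(odd_markup)
print(collector.links)   # ['http://www.dr-chuck.com/page2.htm']
```

The regex is shorter and fine for predictable markup; the parser-based version costs a few more lines but does not depend on the exact quoting used in the page.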
