
Unit 5 - Application Development Using Python

The document discusses strings in Python including creating and accessing strings, string literals, truth values of strings, indexing and slicing strings, and using the in and not in operators with strings. Regular expressions are also introduced for pattern matching within strings.

Uploaded by

Tushar Vaswani
Copyright
© All Rights Reserved

Self-Learning Material

tushar.1801@gmail.com
D0OLHR8SGA

Program: MCA
Specialization: Core
Semester: 3
Course Name: Application Development using Python
Course Code: 21VMT0C301
Unit Name: Strings, Pattern Matching with Regular Expressions

Proprietary content. All rights reserved. Unauthorized use or distribution prohibited.


This file is meant for personal use by tushar.1801@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Table of Contents:
- Working with strings
- String methods
- Methods to work with strings
- Regular expression
- findall() method
- Substituting strings with sub() method
- Pattern matching with regular expressions and finding patterns of text with regular expressions
- Greedy matching
- Non-greedy matching
- Character class
- Wildcard character
- Case-sensitive matching
- Managing complex regexes
- Combining re.IGNORECASE, re.DOTALL, re.VERBOSE
- Program

Unit 5:
Strings, Pattern Matching with Regular Expressions
Unit Overview:
Strings are an important part of data and information. However, strings cannot always be evaluated immediately, especially when they come from user input. This is where string manipulation and handling come into the picture. Regular expressions (also called regexes) play a major role in interpreting strings.
Unit Outcomes:
- Working with strings
- String methods
- Methods to work with strings
- Regular expression
- Pattern matching with regular expression and finding patterns of text with regular
expressions.
- Greedy matching
- Non-greedy matching
- findall() method
- character class
- wildcard character
- case sensitive matching
- substituting strings with sub() method
- managing complex regexes
- combining re.IGNORECASE, re.DOTALL, re.VERBOSE
- Program.

People read words, not the 0s and 1s of binary code. In Python, such text is stored in strings.
Strings in Python are immutable sequences of Unicode characters. A single character in Python is just a string of length 1, since there is no separate character data type. Square brackets are used to access the individual characters of a string.
Creation of strings:
In Python, single, double, or even triple quotes can be used to create strings. In the given code, strings are created using single quotes, double quotes and triple quotes. Each string is stored in a variable called str1, and it is printed by passing the variable str1 to print().

The code for string creation is as follows:
[Figure: code creating strings with single, double and triple quotes]

Output:
[Figure: output of the string creation code]
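The code figure is not reproduced here. As a minimal sketch, the snippet below creates strings with each kind of quote and reuses the variable name str1 from the text. The string contents are illustrative.

```python
# Single and double quotes create the same kind of string
str1 = 'Welcome to Python'
print(str1)

str1 = "Welcome to Python"
print(str1)

# Triple quotes allow a string to span several lines
str1 = '''Welcome
to
Python'''
print(str1)

# There is no separate character type: one character is a string of length 1
ch = str1[0]
print(type(ch), len(ch))   # <class 'str'> 1
```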
Truth values of a string:
Any object can be tested for its truth value, for example by using it as the condition of an if or while statement.
By default, an object is considered true unless its class defines a __bool__() method that returns False or a __len__() method that returns 0. The constants None and False are false.
Zero of any numeric type, such as 0, 0.0, 0j, Decimal(0) and Fraction(0, 1), is false. Empty sequences and collections, such as '', (), [], {}, set() and range(0), are also false. For strings, this means that the empty string '' is false and every non-empty string is true.
The Boolean value False has the integer value 0, and True has the integer value 1.
Following these rules, we can check the truth values of strings with the following example:

[Figure: example code checking the truth values of strings]

The output of the code is as follows:
[Figure: output of the truth value example]
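Here is a small sketch of these rules applied to strings. The sample strings are illustrative and are not taken from the original example.

```python
# An empty string is false; every non-empty string is true,
# even one containing only a space or the character "0"
samples = ["", " ", "0", "False", "Python"]

for s in samples:
    if s:
        print(repr(s), "-> True")
    else:
        print(repr(s), "-> False")

# bool() applies the same rules as the if statement
truth = {s: bool(s) for s in samples}

# True and False behave like the integers 1 and 0
print(True + True, False * 5)   # 2 0
```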
String Literals:
Literals are a notation in source code for expressing fixed values. They can also be described as the raw data or values given to variables or constants. Python has several kinds of literals, including:
- String literals
- Numeric literals
- Boolean literals
- Literal collections
- Special literals
Text (a sequence of characters) becomes a string literal when it is enclosed in single, double, or triple quotes. Triple quotes also allow a string literal to span multiple lines. In the first figure of the unit, we defined string literals in single quotes, double quotes and triple quotes, including a multi-line string. Each of these texts is a string literal.
A character literal is a single character enclosed in single or double quotes. Since Python has no separate character type, a character literal is simply a string of length 1.
Indexing and Slicing:
Indexing can be used to access specific characters within a string. Negative indexes access characters from the end of the string: -1 refers to the last character, -2 to the next-to-last character, and so on.
Accessing an index outside the valid range raises an IndexError. Only integers are allowed as indexes; using a float or any other type raises a TypeError.

Here, indexing the string with [-1] returns the first character counting from the end of the string, that is, the last character. str[12] returns the character at index 12. Python indexing starts from 0, so index 12 is actually the 13th character of the string. An index is simply the position of a character in the string.
To reverse a string, you can use the slice [::-1], for instance:

Slicing is used to access a specific range of characters in a string. A string is sliced with the slicing operator, the colon (:), written as [start:end]. The character at start is included, and the character at end is excluded.
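The following sketch shows indexing, slicing and reversing together. The sample string "Learning Python" is an assumption, chosen so that index 12 exists.

```python
str1 = "Learning Python"

# Positive indexes start at 0; negative indexes count back from the end
print(str1[0])     # L: the first character
print(str1[12])    # h: index 12 is the 13th character
print(str1[-1])    # n: the last character
print(str1[-2])    # o: the next-to-last character

# Slicing [start:end] includes start and excludes end
print(str1[0:8])   # Learning
print(str1[9:])    # Python (from index 9 to the end)
print(str1[-6:])   # Python (the last six characters)

# A step of -1 walks backwards, which reverses the string
print(str1[::-1])  # nohtyP gninraeL

# Out-of-range indexes raise IndexError; non-integer indexes raise TypeError
for bad_index in (100, 1.5):
    try:
        str1[bad_index]
    except (IndexError, TypeError) as err:
        print(type(err).__name__)
```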


The in and not in operators:


The in operator can be used to determine whether a value is present in a collection of values.
It works with iterable types such as lists and strings, checking whether an element is present in the iterable. If the element is found, in returns True; otherwise it returns False.
Similarly, the not in operator (the in operator negated with not) can be used to determine whether a value is not present in a collection. It returns True when the value is absent.
These operators can be used to check whether a substring exists within a string, whether a key exists in a dictionary, and whether a value exists in a list.
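This quick sketch applies in and not in to a string, a list and a dictionary. The sample values, including the hypothetical ages dictionary, are illustrative.

```python
text = "regular expressions in Python"
print("Python" in text)      # True: the substring is present
print("java" not in text)    # True: the substring is absent
print("python" in text)      # False: membership tests are case-sensitive

numbers = [1, 2, 3]
print(4 not in numbers)      # True

ages = {"asha": 31, "ravi": 27}   # hypothetical sample data
print("ravi" in ages)        # True: in checks the dictionary's keys
print(27 in ages)            # False: values are not checked
```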

String Methods:
Python strings are sequences of Unicode characters enclosed in quotation marks. The str type provides many built-in methods that can be used to manipulate strings.
String methods never alter the original string, because strings are immutable. Methods that modify the text return a new string with the changes instead.
Some of the built-in Python string methods are as follows:

[Figure: table of built-in Python string methods]
Source: https://medium.com/@bloggingtech260/what-
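lower(): returns a copy of the string with every uppercase character converted to lowercase.
upper(): returns a copy of the string with every lowercase character converted to uppercase.
title(): returns a copy of the string in title case. Each word starts with an uppercase character, and the remaining characters are lowercase.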
A few string methods are used below:
String concatenation:
String concatenation using the join() method:
The join() method returns a string in which the elements of a sequence are joined together. The string that join() is called on (str) is used as the separator. This procedure combines the strings contained in var1 and var2. It accepts any iterable of strings, such as a list or a tuple, of any size.
String concatenation using the + operator:
The + operator makes concatenating strings simple, and multiple strings can be added together with it. However, both operands must be strings. Since strings cannot be changed, concatenation always creates a new string, which is usually assigned to a variable.
The % operator can be used for string formatting as well as string concatenation. It comes in handy when we need to combine strings and apply basic formatting. The %s placeholder marks where a value, converted to a string, is inserted.
str.format() is another Python method for formatting strings. It supports multiple substitutions and value formatting, and it joins together components of a string through positional formatting. Curly braces {} mark the positions where values are inserted: the first argument of format() fills the first set of curly braces, the second argument fills the second set, and so on.
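The four concatenation techniques above can be sketched side by side. The values stored in var1 and var2 are assumptions.

```python
var1 = "Hello"   # assumed sample values
var2 = "World"

# join(): the separator string is placed between the items of the iterable
joined = " ".join([var1, var2])

# + operator: both operands must be strings
plus = var1 + " " + var2
try:
    var1 + 5
except TypeError:
    print("TypeError: convert numbers with str() first")
plus_num = var1 + str(5)

# % operator: each %s is replaced by a value converted to a string
percent = "%s %s" % (var1, var2)

# str.format(): each {} is filled by the arguments in order
formatted = "{} {}".format(var1, var2)

print(joined, plus, plus_num, percent, formatted, sep="\n")
```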

Changing cases of strings:
isupper() is a built-in Python string method. It returns True if all the cased characters (letters) in the string are uppercase and there is at least one cased character; otherwise it returns False.
Digits, symbols and whitespace are not cased, so they are ignored. For example, "ABC 123!" returns True, while a string that contains only whitespace, digits or symbols (such as "123") returns False. The method does not accept any parameters; passing one raises a TypeError.
islower() is a built-in Python string method. It returns True if all the cased characters in the string are lowercase and there is at least one cased character; otherwise it returns False.
As with isupper(), digits, symbols and whitespace are ignored. For example, "abc 123!" returns True, while a string that contains only whitespace, digits or symbols returns False. The method does not accept any parameters; passing one raises a TypeError.
lower() is a built-in Python string method. It returns a copy of the string in which all uppercase characters are converted to lowercase. If there are no uppercase characters, the returned string is equal to the original.
It does not accept any parameters; passing one raises a TypeError. Only uppercase letters are converted; all other characters are returned unchanged.
upper() is a built-in Python string method. It returns a copy of the string in which all lowercase characters are converted to uppercase. If there are no lowercase characters, the returned string is equal to the original.
It does not accept any parameters; passing one raises a TypeError. Only lowercase letters are converted; digits and symbols are returned unchanged.

None of the four methods take any parameters.

Example:
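The following sketch exercises the four case methods. The sample strings are illustrative and were chosen to show how digits, symbols and whitespace are treated.

```python
samples = ["PYTHON 3!", "python", "Python", "123", "   "]

# Only cased characters (letters) count; digits, symbols and spaces are ignored,
# but at least one letter is required for a True result
for s in samples:
    print(repr(s), "isupper:", s.isupper(), "islower:", s.islower())

# lower(), upper() and title() return new strings; the original is unchanged
word = "pYthon Strings 101"
print(word.lower())   # python strings 101
print(word.upper())   # PYTHON STRINGS 101
print(word.title())   # Python Strings 101
print(word)           # pYthon Strings 101

# None of these methods takes a parameter
try:
    word.upper("x")
except TypeError:
    print("TypeError: upper() takes no arguments")
```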

startswith() method:
The startswith() method returns True if the string begins with the supplied prefix; otherwise it returns False.

str.startswith(prefix, start, end)



Parameters:
prefix: the string (or a tuple of strings) to look for at the start of the string.
start (optional): the position in the string where checking for the prefix begins.
end (optional): the position in the string where checking for the prefix stops.

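endswith() method:
The endswith() method returns True if the string ends with the supplied suffix; otherwise it returns False.

str.endswith(suffix, start, end)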

Parameters:
suffix: the string (or a tuple of strings) to look for at the end of the string.
start (optional): the position in the string where checking for the suffix begins.
end (optional): the position where checking stops; the string is checked up to, but not including, this index.
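This short sketch shows startswith() and endswith(), including the optional start and end positions. The file name is a hypothetical example.

```python
filename = "report_2023.csv"   # hypothetical example string

print(filename.startswith("report"))        # True
print(filename.startswith("2023", 7))       # True: checking starts at index 7
print(filename.endswith(".csv"))            # True
print(filename.endswith("report", 0, 6))    # True: only indexes 0 to 5 are checked
# A tuple of suffixes is also accepted
print(filename.endswith((".txt", ".csv")))  # True
```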

.join() method:
The built-in string method join() joins the elements of a sequence into a single string. It places the string it is called on between the elements as a separator.

stringName.join(iterable)

Here, iterable means an object that can return its members one at a time, for example a list, tuple, set, dictionary or string. All the elements must be strings, otherwise a TypeError is raised. For a dictionary, its keys are joined.
Example:
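The following sketch uses join() with several kinds of iterables. The sample data is illustrative.

```python
words = ["join", "these", "words"]
print("-".join(words))             # join-these-words
print(", ".join(("a", "b")))       # a tuple works too: a, b
print(".".join("abc"))             # a string is an iterable of characters: a.b.c
print("|".join({"x": 1, "y": 2}))  # a dictionary contributes its keys: x|y

# Every element must already be a string
try:
    " ".join([1, 2, 3])
except TypeError as err:
    print("TypeError:", err)

# Convert non-string items first
print(" ".join(str(n) for n in [1, 2, 3]))  # 1 2 3
```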


split() method:
Python's split() string method breaks the given string into a list of strings at each occurrence of the given separator.

string.split(separator, maxsplit)

separator (optional): the delimiter at which the string is split. If it is omitted, any run of whitespace is used as the separator.
maxsplit (optional): the maximum number of splits to perform. If it is not supplied, the default value is -1, which means there is no limit.
Examples of splitting are given below:
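These illustrative split() calls show the default whitespace separator and maxsplit. The input strings are assumptions.

```python
line = "apple banana  cherry"
print(line.split())            # ['apple', 'banana', 'cherry'] (any run of whitespace)
print(line.split(" "))         # ['apple', 'banana', '', 'cherry'] (every single space)

csv_row = "a,b,c,d"
print(csv_row.split(","))      # ['a', 'b', 'c', 'd']
print(csv_row.split(",", 2))   # ['a', 'b', 'c,d'] (at most 2 splits)
print(csv_row.split(",", maxsplit=-1))  # -1, the default, means no limit
```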

partition() method:
The Python String partition() method divides the string at the first occurrence of the
separator and returns a tuple that includes the portion immediately preceding the
separator, the separator, and the part immediately after the separator. The separator in this
case is a string that is supplied as an argument.

string.partition(separator)

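Parameter: separator, the substring at which the string is divided. A tuple with 3 entries is returned: the part of the string before the separator, the separator itself, and the part after the separator. If the separator is not found, the tuple contains the original string followed by two empty strings.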
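The following sketch shows partition(), including the case where the separator is missing. The sample strings are illustrative.

```python
email = "user@example.com"
print(email.partition("@"))    # ('user', '@', 'example.com')

path = "a/b/c"
print(path.partition("/"))     # ('a', '/', 'b/c'): only the first '/' is used

print(email.partition("#"))    # ('user@example.com', '', ''): separator not found
```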

String justification:
rjust():
The rjust() method returns a new string of the specified length in which the original string is right-justified, that is, padded on the left side with the specified fill character.

string.rjust(length, fillchar)

ljust():
The ljust() method returns a new string of the specified length in which the original string is left-justified, that is, padded on the right side with the specified fill character.

string.ljust(length, fillchar)

Parameters:
length: the length of the returned string. If length is less than or equal to the length of the original string, the original string is returned.
fillchar (optional): the character used for padding. If it is omitted, a space is used.
Example:


The center() method returns a new string of the given length in which the original string is centred and padded on both sides with the supplied fill character.

string.center(length[, fillchar])

Parameters:
length: the length of the string after padding.
fillchar (optional): the character used for padding. If it is omitted, a space is used.
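This sketch calls rjust(), ljust() and center() with and without a fill character. The word and the numbers are illustrative, and repr() is used so that the padding spaces are visible.

```python
word = "Python"

print(repr(word.rjust(10)))        # '    Python'
print(repr(word.ljust(10, "*")))   # 'Python****'
print(repr(word.center(10, "-")))  # '--Python--'
print(repr(word.center(4)))        # 'Python': length <= len(word), so unchanged

# Right-justifying numbers with "0" lines them up in a column
for n in (7, 42, 1234):
    print(str(n).rjust(5, "0"))
```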

Removing whitespaces:
strip() returns a new string with all leading and trailing whitespace, including tabs and newlines, removed.
rstrip() returns a new string with trailing whitespace removed. The "r" stands for "right", which makes it easy to remember.
lstrip() returns a new string with leading whitespace, that is, the whitespace on the "left" side, removed.
Examples:
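The following sketch shows the three stripping methods, using repr() to make the removed whitespace visible. The optional chars argument on the last line is an extra feature of strip() that the text above does not cover.

```python
raw = "\t  hello world  \n"
print(repr(raw.strip()))    # 'hello world'
print(repr(raw.rstrip()))   # '\t  hello world'
print(repr(raw.lstrip()))   # 'hello world  \n'

# An optional argument lists the characters to remove instead of whitespace
print("xxhixx".strip("x"))  # hi
```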

replace() method:
Python's replace() method returns a copy of the string in which every occurrence of one substring is replaced by another substring. If count is given, only the first count occurrences are replaced.

string.replace(old, new, count)

Parameters:
old: the substring to be replaced.
new: the substring that takes the place of the old one.
count (optional): the maximum number of occurrences to replace. If it is omitted, all occurrences are replaced.
Example:
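This sketch calls replace() with and without count. The sample sentence is illustrative.

```python
s = "one fish two fish red fish"
print(s.replace("fish", "cat"))      # one cat two cat red cat
print(s.replace("fish", "cat", 2))   # one cat two cat red fish
print(s)                             # the original string is unchanged
```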

Regular Expressions:
A Regular Expression (RegEx) is a special sequence of characters that describes a search pattern, used to find a string or a set of strings. By comparing a text against the pattern, it can determine whether the pattern is present or absent. A pattern can also be divided into one or more sub-patterns (groups). Regex functionality is available in Python through the re module. Its most basic use is a search, which takes a regular expression and a string and returns the first match, or None if there is no match.
The re module is imported using the import statement: import re.
Example:

The initial index and ending index of the string "good" are provided by the code above.
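The original code is not reproduced here. The following minimal sketch performs such a search on an assumed sample sentence that contains "good". start() and end() give the initial and ending indexes of the match, with end() pointing just past the last character.

```python
import re

text = "Python is a good language for beginners"   # assumed sample sentence
match = re.search(r'good', text)

if match:   # search() returns None when the pattern is not found
    print("Start index:", match.start())   # 12
    print("End index:", match.end())       # 16, one past the last character
    print("Span:", match.span())           # (12, 16)
```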
Note that the r prefix in r'good' stands for "raw", not "regex". In a raw string, the backslash \ is not treated as an escape character, which makes it slightly different from a standard string. Raw strings are used for patterns because the regular expression engine uses \ for its own escape sequences. Without the r prefix, each of these backslashes would have to be doubled.
Metacharacters are characters with a special meaning in a regular expression. They are used throughout the functions of the re module, so understanding them is essential. The list of metacharacters is shown below.

[Figure: table of regex metacharacters]
Source: GeeksForGeeks
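The metacharacter table is an image, so the sketch below demonstrates the common metacharacters with re.findall() on illustrative strings instead. It is not a reproduction of the table.

```python
import re

print(re.findall(r'c.t', "cat cot cut c t"))    # . any character except newline
print(re.findall(r'^The', "The end. The"))      # ^ start of the string
print(re.findall(r'end$', "the end"))           # $ end of the string
print(re.findall(r'ab*', "a ab abbb"))          # * zero or more
print(re.findall(r'ab+', "a ab abbb"))          # + one or more
print(re.findall(r'colou?r', "color colour"))   # ? zero or one
print(re.findall(r'\d{3}', "12 345 6789"))      # {m} exactly m repetitions
print(re.findall(r'[aeiou]', "regex"))          # [] character class
print(re.findall(r'cat|dog', "cat and dog"))    # | alternation
print(re.findall(r'\$\d+', "costs $30"))        # \ escapes a metacharacter
```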
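Special sequences are backslash sequences with a predefined meaning. Some of them, such as \A, \b and \Z, specify the position in the string where the match must take place rather than matching an actual character. Others, such as \d, \w and \s, are shorthand for commonly used character classes. Special sequences make frequently used patterns simpler to write.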

[Figure: table of regex special sequences]
Source: GeeksForGeeks
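This sketch demonstrates some common special sequences with re.findall(). The sample strings are illustrative.

```python
import re

text = "Order 66 shipped to Bay 12 on day 3"
print(re.findall(r'\d+', text))              # \d digits: ['66', '12', '3']
print(re.findall(r'\b\w+\b', "hi, you"))     # \w word characters: ['hi', 'you']
print(re.findall(r'\s', "a b\tc"))           # \s whitespace: [' ', '\t']
print(re.findall(r'\bday\b', "day today"))   # \b word boundary: ['day']
print(re.findall(r'\AOrder', text))          # \A only at the very start
print(re.findall(r'3\Z', text))              # \Z only at the very end
print(re.findall(r'\D+', "ab12cd"))          # \D non-digits: ['ab', 'cd']
```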

re.findall():
re.findall() returns all the non-overlapping matches of the pattern in the given string as a list of strings. The string is scanned from left to right, and the matches are returned in the order in which they are found.

Probably the most powerful function in the re module is findall(). Each string in the list it returns is a separate match that was found.

Example:
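The following sketch runs findall() on an illustrative string. The last line shows one extra behaviour worth knowing: when the pattern contains a single group, findall() returns the text of that group rather than the whole match.

```python
import re

text = "Call 98765 43210 or 022-2345-6789"   # illustrative text
numbers = re.findall(r'\d+', text)
print(numbers)   # ['98765', '43210', '022', '2345', '6789']

# Non-overlapping: 'aa' is found twice in 'aaaa', not three times
print(re.findall(r'aa', "aaaa"))   # ['aa', 'aa']

# With one group in the pattern, findall returns only the group's text
print(re.findall(r'(\d+)-', text))  # ['022', '2345']
```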

re.compile():
re.compile() compiles a regular expression into a pattern object. The methods of this object, such as search(), findall() and sub(), are then used to look for pattern matches or to replace strings.

In the example, the first match is the 'a' in "Whatever". Matching is case-sensitive. The next matches are the two 'e's in "Whatever", followed by the 'a' in "are", and the last match is the 'e' in "one".
The backslash '\' is a crucial metacharacter because it introduces special sequences. To match a literal backslash, use \\ in the pattern (written r'\\' as a raw string), so that the backslash loses its special meaning as a metacharacter.
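The following sketch compiles a pattern once and reuses it. The pattern [a-e] and the sample text are assumptions made for illustration, since the original example is not reproduced. The matches therefore differ from the ones described above.

```python
import re

# Compile once, then reuse the pattern object's methods
pattern = re.compile(r'[a-e]')
text = "Whatever you are, be one"   # hypothetical sample text
print(pattern.findall(text))   # ['a', 'e', 'e', 'a', 'e', 'b', 'e', 'e']
print(pattern.findall("ABC"))  # []: matching is case-sensitive

# r'\\' matches one literal backslash
backslash = re.compile(r'\\')
print(backslash.findall(r"C:\Users\demo"))   # two matches
```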
re.sub()
The "sub" in re.sub() stands for substitute. The function searches the given string (3rd parameter) for matches of the regular expression pattern (1st parameter) and replaces each match with repl (2nd parameter). The optional count parameter limits the number of replacements; the default, 0, replaces every match.
Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

Example:
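This sketch calls re.sub() with and without count, and with the flags parameter. The sample text is illustrative.

```python
import re

text = "Cost: 100 Rupees, tax: 18 rupees"   # illustrative text
print(re.sub(r'\d+', '#', text))            # Cost: # Rupees, tax: # rupees
print(re.sub(r'\d+', '#', text, count=1))   # Cost: # Rupees, tax: 18 rupees
# flags=re.IGNORECASE also matches the capitalised "Rupees"
print(re.sub(r'rupees', 'INR', text, flags=re.IGNORECASE))   # Cost: 100 INR, tax: 18 INR
```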

re.subn():
subn() is identical to sub() in all respects except its output. Instead of returning only the new string, it returns a tuple containing the new string and the number of replacements made.

re.subn(pattern, repl, string, count=0, flags=0)

Example:
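Here is the same kind of substitution done with re.subn(), which also reports how many replacements were made. The sample text is illustrative.

```python
import re

text = "Cost: 100 rupees, tax: 18 rupees"
new_text, count = re.subn(r'\d+', '#', text)
print(new_text)   # Cost: # rupees, tax: # rupees
print(count)      # 2 replacements were made
```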

translate() method:
The translate() method returns a copy of the string in which each character is replaced using a translation table, which is usually created with str.maketrans(). Characters that are mapped to None are removed.

string.translate(table)

Example:

In this example, the translation table maps the letters b, e, l, l and e to the letters a, p, p, l and e, respectively. However, the removal string str3 resets the mappings for a, b and e to None.
Therefore, a, b and e are removed when the string is translated using translate(), producing cdf.
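The following sketch reproduces the described result. The original strings are not shown, so the mapping strings 'belle' and 'apple', the removal string 'abe' and the input 'abcdef' are assumptions chosen to be consistent with the explanation. str.maketrans() builds the translation table.

```python
# Assumed inputs, chosen to match the description above
str1 = "belle"   # characters to map from
str2 = "apple"   # characters to map to
str3 = "abe"     # removal string: these characters map to None

table = str.maketrans(str1, str2, str3)
print(table)   # a dict of Unicode code points; removed characters map to None

text = "abcdef"   # assumed input string
print(text.translate(table))   # cdf

# Without a removal string, characters are only swapped
print("belt".translate(str.maketrans("bt", "BT")))   # BelT
```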

Pattern Matching:
You may be used to searching for text by pressing Ctrl-F and typing the words you are looking for. Regular expressions go a step further by letting you describe a pattern of text to search for.
Regular expressions, also known as regexes, are descriptions of a pattern of text. For instance, \d in a regex stands for a digit character, that is, any single numeral from 0 to 9.

Example:
Enter the following into the interactive shell to build a Regex object that matches the pin code pattern (six digits in a row):

import re
pinCodeRegex = re.compile(r'\d\d\d\d\d\d')

A Regex object is now stored in the pinCodeRegex variable.


The search() method of a Regex object searches the string passed to it for the first match of the regex. It returns a Match object if the pattern is found, or None otherwise. The group() method of the Match object returns the actual text that matched the pattern.

Steps for pattern matching:


To utilise regular expressions in Python, follow these steps:
- Use import re to import the regex module.
- Create a Regex object with the re.compile() function. (Remember to use a raw string.)
- Pass the string you want to search to the search() method of the Regex object. It returns a Match object (or None if there is no match).
- Call the group() method of the Match object to get a string containing the actual text that was matched.
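The four steps above can be sketched end to end. The sample sentence and pin code are hypothetical.

```python
import re                                   # step 1: import the module

pinCodeRegex = re.compile(r'\d\d\d\d\d\d')  # step 2: compile a raw-string pattern

mo = pinCodeRegex.search('My pin code is 411001.')  # step 3: search() returns a Match object
print(mo.group())                           # step 4: group() gives the matched text: 411001

# search() returns None when nothing matches, so check before calling group()
if pinCodeRegex.search('No digits here') is None:
    print('No match found')
```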
Grouping with parentheses:
1. Matching objects:
Suppose you want to separate the first three digits of a pin code from the last three digits. Adding parentheses creates groups in the regex: (\d\d\d)-(\d\d\d). The group() method of the Match object can then return the text matched by just one group: group(1) returns the first group, group(2) the second, and group(0) or group() the entire match.

2. Using pc.groups():

Because groups() returns a tuple, you can assign each value to a different variable using multiple assignment, as in the line firstThree, lastThree = mo.groups().

3. Retrieving all groups at once:


Use the groups() method (note the plural name) if you want to retrieve all the groups at once.
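The following sketch shows group(), groups() and multiple assignment. The hyphenated pin code format and the sample sentence are assumptions, and mo is used as the Match object name, as in the text.

```python
import re

pinCodeRegex = re.compile(r'(\d\d\d)-(\d\d\d)')
mo = pinCodeRegex.search('My pin code is 411-001.')   # hypothetical format

print(mo.group(1))   # 411, the first group
print(mo.group(2))   # 001, the second group
print(mo.group(0))   # 411-001, the entire match (same as mo.group())
print(mo.groups())   # ('411', '001'), all groups as a tuple

# groups() returns a tuple, so multiple assignment unpacks it
firstThree, lastThree = mo.groups()
print(firstThree, lastThree)
```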

4. Matching a parenthesis:
Regular expressions give parentheses a special meaning, but what if you need to match a parenthesis in your text? For instance, the first three digits of the pin codes you are trying to match might be written inside parentheses. In that case, the ( and ) characters must be escaped with a backslash, as \( and \). Enter the following into the interactive shell:
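The following sketch escapes parentheses, assuming pin codes written in the hypothetical form (411) 001.

```python
import re

# \( and \) match literal parentheses; the unescaped outer pair still forms group 1
pinCodeRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d)')
mo = pinCodeRegex.search('Pin code: (411) 001')   # hypothetical format
print(mo.group(1))   # (411)
print(mo.group(2))   # 001
```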

Greedy Quantifiers:
?, *, +, and {m,n} are greedy quantifiers: they match as many characters as they can (longest match). For instance, the substrings 'a', 'aa', and 'aaa' all match the regex 'a+'. However, in the string 'aaaa' the regex 'a+' matches as many 'a's as possible, that is, all four.
Non-greedy quantifiers:
A non-greedy quantifier, such as ??, *?, +?, or {m,n}?, matches as few characters as possible (shortest possible match). For instance, in the string 'aaaa' the regex 'a+?' matches as few 'a's as possible, so the first match consists of just the first character, 'a'.

Greedy Matching:
A greedy match occurs when the regex engine matches as many characters as it can in an
effort to discover your pattern in the string.
For instance, in the string 'bbbb' the regex 'b+' matches as many 'b's as possible. The substrings 'b', 'bb', and 'bbb' all match the regex 'b+', but the regex engine does not consider them sufficient: it always tries to match more.
In other words, the greedy quantifiers give you the longest match from a given position in the string.
All the default quantifiers, including ?, *, +, {m}, and {m,n}, are greedy: they match as many characters as they can while the regex pattern is still satisfied.
A shorter match would also satisfy the pattern, but because the regex engine is greedy by default, it does not settle for one.
Example of greedy matching:
In the first example, the zero-or-one regex 'b?' is used. Because it is greedy, it matches one 'b' character whenever possible.
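This sketch compares greedy quantifiers on the string 'bbbb'. The outputs in the comments assume Python 3.7 or later, which changed how empty matches are reported. Note the final empty string produced by the quantifiers that can match zero characters.

```python
import re

text = "bbbb"
print(re.findall(r'b?', text))      # ['b', 'b', 'b', 'b', '']
print(re.findall(r'b*', text))      # ['bbbb', '']
print(re.findall(r'b+', text))      # ['bbbb']
print(re.findall(r'b{1,3}', text))  # ['bbb', 'b']
```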
Non-greedy pattern matching:
A non-greedy match occurs when the regex engine matches the fewest characters feasible
while still being able to match the given string's pattern.
For instance, in the string 'aaaa' the regex 'a+?' matches as few 'a's as possible. As a result, the first match consists of only the first character, 'a'. The second character is then matched on its own, and so on.
In other words, the non-greedy quantifiers give you the shortest match from a given position in the string. You can make the default quantifiers ?, *, +, and {m,n} non-greedy by adding a question mark (?) after them. They then "consume" or "match" as few characters as possible while still satisfying the regex pattern.

Example:

In the first example, the non-greedy zero-or-one quantifier 'a??' is used. Whenever it can, it matches zero 'a's, that is, the empty string. Moving from left to right, it first matches the empty string. Because it cannot match the empty string twice at the same position, it is then forced to match the character 'a'. After that, it can match the empty string again. At each position the empty string is matched first, and the letter 'a' only when necessary, which is why this peculiar pattern appears.
Next, the non-greedy zero-or-more quantifier 'p*?' is used. Again, it matches zero characters whenever it can. Only after it has matched the empty string at a position does it match a single 'p', consume it, and move on.
Finally, the non-greedy one-or-more quantifier 'p+?' is used. It must match at least one character, so the regex engine matches a single 'p', consumes it, and moves on to the next match.
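The following sketch reproduces the three non-greedy examples with re.findall(). The input strings 'aaaa' and 'pppp' are assumptions, and the outputs assume Python 3.7 or later.

```python
import re

print(re.findall(r'a??', 'aaaa'))  # ['', 'a', '', 'a', '', 'a', '', 'a', '']
print(re.findall(r'p*?', 'pppp'))  # ['', 'p', '', 'p', '', 'p', '', 'p', '']
print(re.findall(r'p+?', 'pppp'))  # ['p', 'p', 'p', 'p']
```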
Difference between greedy and non-greedy matching:
A regex search scans the string from left to right and returns the earliest (leftmost) match it can find. By default, regular expressions perform greedy matching. A greedy match is the longest string, starting at that position, that the regex pattern can match.
The greedy match tries to repeat the quantified pattern as many times as it can, whereas the non-greedy match repeats it as few times as possible.
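A common way to see the difference is to match tags in a small HTML-like string (an illustrative input). The greedy .* runs to the last >, while .*? stops at the first one.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
print(re.search(r'<.*>', html).group())    # greedy: <b>bold</b> and <i>italic</i>
print(re.search(r'<.*?>', html).group())   # non-greedy: <b>
print(re.findall(r'<.*?>', html))          # ['<b>', '</b>', '<i>', '</i>']
```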
Character class in Python:
A group of characters enclosed in square brackets is called a "character class" or "character set". The characters that we want to match are placed inside the square brackets, and the regex engine matches exactly one character from the class. For example, the character set [aeiou] matches any single vowel.
The order of the characters inside a character class does not matter; the results are the same.
A hyphen is used to represent a range of characters inside a character class. [0-9] matches a single digit between 0 and 9. Similarly, [A-Za-z] matches a single lowercase or uppercase letter.
Example:
This code prints all the vowels present in the string.
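The original code is not reproduced here. This minimal sketch of such code uses an assumed input string, and it also shows the [0-9] and [A-Za-z] ranges.

```python
import re

text = "Regular expressions are powerful"   # assumed sample string
vowels = re.findall(r'[aeiou]', text)
print(vowels)

# A hyphen gives a range: [0-9] is one digit, [A-Za-z] is one letter
print(re.findall(r'[0-9]', "Room 42B"))      # ['4', '2']
print(re.findall(r'[A-Za-z]+', "Room 42B"))  # ['Room', 'B']
```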

Sometimes you need to match a group of characters, but the shorthand character classes (\d, \w, \s, and so on) are too general. Using square brackets, you can define your own character class.
Square brackets are used to specify a character set, and a hyphen specifies a range of characters inside it. The order of the characters inside the square brackets does not matter. For instance, the regular expression [Aa]n denotes either an uppercase or a lowercase a, followed by the letter n.
In its simplest form, a character class is a group of characters enclosed in square brackets.
For instance, the regular expression [abc]at specifies a character class that accepts "a," "b," or "c" as the first character, followed by "at," so it matches the words "bat" and "cat" but not "hat."
Example of custom character classes:
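The original example is not reproduced here; the sketch below is an assumption-based illustration using made-up word lists. It checks whole words with re.fullmatch() against [abc]at and [Aa]n, and contrasts the general shorthand \d with the narrower custom class [0-5].

```python
import re

# [abc]at: one of 'a', 'b' or 'c', then "at".
words = ["bat", "cat", "hat", "rat"]
abc_at = [w for w in words if re.fullmatch(r"[abc]at", w)]
print(abc_at)  # ['bat', 'cat']

# [Aa]n: an uppercase or lowercase 'a', then a lowercase 'n'.
an_words = [w for w in ["an", "An", "in", "AN"] if re.fullmatch(r"[Aa]n", w)]
print(an_words)  # ['an', 'An']

# \d accepts any digit; a custom class such as [0-5] is narrower.
any_digit = re.findall(r"\d", "2048")     # ['2', '0', '4', '8']
low_digit = re.findall(r"[0-5]", "2048")  # ['2', '0', '4']
```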

Caret symbol and Dollar sign:
Placing a caret (^) at the beginning of a regular expression means that the match must occur at the start of the text being searched.
For example, the regular expression ^a matches a in the input string abc, because a is its first character. However, the regular expression ^b matches nothing in the same input string, because abc does not start with b. Consider another regular expression, ^(S|s)h, which means: the input string starts with either an uppercase S or a lowercase s, followed by a lowercase h.
Adding a dollar sign ($) to the end of a regex indicates that the string must end with that regex pattern.
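A minimal sketch of both anchors, using the ^a, ^b and ^(S|s)h patterns discussed above; the word lists and the ing$ and ^\d+$ patterns are illustrative choices, not from the original.

```python
import re

# ^ anchors the match to the start of the string.
first_a = re.search(r"^a", "abc").group()  # 'a'
starts_b = re.search(r"^b", "abc")         # None: "abc" does not start with b

# ^(S|s)h: starts with an uppercase or lowercase s, followed by h.
starts_with_sh = [s for s in ["She sells", "shells", "Ship", "ash"]
                  if re.search(r"^(S|s)h", s)]

# $ anchors the match to the end of the string.
ends_with_ing = [w for w in ["singing", "ingots", "ring"] if re.search(r"ing$", w)]

# ^ and $ together: the whole string must consist of digits.
all_digits = bool(re.search(r"^\d+$", "12345"))  # True
mixed = bool(re.search(r"^\d+$", "123a5"))       # False

print(first_a, starts_b, starts_with_sh, ends_with_ing, all_digits, mixed)
```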

Wildcard Symbols:
A wildcard is a symbol that can be used in place of one or more characters to represent them. Computer applications, languages, search engines, and operating systems all employ wildcards to make search criteria simpler. The question mark (?) and the asterisk (*) are the most popular wildcards.
Asterisk (*) – An asterisk specifies any number of characters. It is usually added to the end of a root word, which is useful when searching for the variable endings of a root word.
Question mark (?) – A question mark is used anywhere in a word to represent a single character. It is most helpful when a word has multiple spellings and you want to search for all of them at once.
In regular expressions, the role of the ? wildcard is played by the dot (.) character, which matches any single character except a newline.
Similarly, the role of the * wildcard is played by .* (zero or more characters) and .+ (one or more characters).
Example:
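The original example is not reproduced here; this is a minimal sketch, assuming small word lists of our own. It uses re.fullmatch() so that each pattern must cover the whole word, with '.' standing for one character and '.+' or '.*' standing for a variable ending after a root word.

```python
import re

# '.' stands for exactly one character, like the '?' wildcard.
one_char = [w for w in ["cat", "cot", "ct", "coat"] if re.fullmatch(r"c.t", w)]
print(one_char)  # ['cat', 'cot']

roots = ["computer", "computing", "comput", "compute"]

# '.+' requires at least one character after the root "comput".
root_plus = [w for w in roots if re.fullmatch(r"comput.+", w)]
print(root_plus)  # ['computer', 'computing', 'compute']

# '.*' also accepts the bare root.
root_star = [w for w in roots if re.fullmatch(r"comput.*", w)]
print(root_star)  # ['computer', 'computing', 'comput', 'compute']

# By default the dot does not match a newline.
print(re.fullmatch(r"c.t", "c\nt"))  # None
```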


Case-insensitive string comparison in Python:
A case-insensitive comparison means that the two strings must contain the same letters in the same order, but each letter may be written in either uppercase or lowercase (that is, in any mix of cases).
Case-insensitive string comparison with lower() method:
Here, the list item and user_phone are both converted to lowercase and then compared.
Example:
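The original example is not reproduced here. The sketch below is a minimal reconstruction under assumptions: the list phones, its contents, and the helper find_phone_lower() are hypothetical names chosen for illustration; only user_phone comes from the text.

```python
# Hypothetical data: 'phones' and its contents are illustrative only.
phones = ["iPhone", "Galaxy", "Pixel"]
user_phone = "IPHONE"


def find_phone_lower(name, items):
    """Return the first item equal to name when both are lowercased, else None."""
    for item in items:
        # Convert both sides to lowercase before comparing.
        if item.lower() == name.lower():
            return item
    return None


print(find_phone_lower(user_phone, phones))  # iPhone
```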

Case-insensitive string comparison with upper() method:
Here, the list item and user_phone are both converted to uppercase and then compared.
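A minimal sketch of the same check with upper(), again assuming a hypothetical phones list; any() is used here as a compact alternative to an explicit loop.

```python
# Hypothetical data: 'phones' and its contents are illustrative only.
phones = ["iPhone", "Galaxy", "Pixel"]
user_phone = "pixel"

# Convert both sides to uppercase; any() stops at the first equal pair.
found = any(item.upper() == user_phone.upper() for item in phones)
print(found)  # True
```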

Case-insensitive string comparison with casefold() method:
The casefold() method works like lower(), but it is more aggressive: it is intended for caseless comparison and removes all case distinctions present in the string. For example, the German lowercase letter ß is equivalent to "ss"; lower() leaves ß unchanged, whereas casefold() converts it to "ss".
For most text, the string returned by casefold() is simply the lowercase form of the original. Because casefold() converts more characters than lower(), comparing two strings that have both been case-folded can yield more matches than comparing two strings that have only been converted with lower().
Example:
In this example, a count variable is set to 0. When the for loop runs, the 'check' word is searched for in the list using casefold(). If the word is found, the count is incremented by one and the loop breaks.
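A minimal sketch of the loop described above, assuming a hypothetical word list; only the names count and check come from the text. The word "Straße" is chosen because it shows where casefold() and lower() disagree.

```python
# Hypothetical list; 'check' and 'count' mirror the names used in the text.
words = ["Straße", "Apfel", "Haus"]
check = "STRASSE"

count = 0
for word in words:
    # casefold() maps 'ß' to 'ss', so "Straße" and "STRASSE" compare equal.
    if word.casefold() == check.casefold():
        count += 1
        break  # stop at the first match

print(count)  # 1

# lower() keeps 'ß', so the same comparison finds no match.
lower_match = any(word.lower() == check.lower() for word in words)
print(lower_match)  # False
```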

re.VERBOSE:
The VERBOSE flag of the re module lets you write regular expressions that look nicer and are easier to read, by allowing you to visually separate the logical sections of the pattern and to add comments.
Whitespace within the pattern is ignored, except when it is in a character class, when it is preceded by an unescaped backslash, or when it is within tokens such as *?, (?: or (?P<...>. In addition, when a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.


There are two ways in which a verbose regular expression differs from a compact regular expression:
Whitespace is ignored. Spaces, tabs, and carriage returns are not matched as spaces, tabs, and carriage returns; they are not matched at all. To match a space in a verbose regular expression, you must escape it with a backslash (or place it inside a character class).
Comments are ignored. As in Python code, a comment in a verbose regular expression begins with a # character and extends to the end of the line. Here the comment sits inside a multi-line string rather than in the source code itself, but the principle is the same.
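A minimal sketch of re.VERBOSE, assuming a hypothetical phone-number format such as 415-555-1234 or (415) 555-1234. The same pattern is written once in verbose form with comments and once in compact form. Note that the space inside the character class [ -] is kept, while the indentation and line breaks elsewhere are ignored.

```python
import re

# Verbose form: whitespace outside character classes and '#' comments are ignored.
phone = re.compile(r"""
    (\(\d{3}\)|\d{3})   # area code, optionally in parentheses
    [ -]?               # optional space or hyphen (a space inside [] is kept)
    \d{3}               # first three digits
    -                   # literal hyphen
    \d{4}               # last four digits
""", re.VERBOSE)

# Equivalent compact form.
compact = re.compile(r"(\(\d{3}\)|\d{3})[ -]?\d{3}-\d{4}")

for number in ["415-555-1234", "(415) 555-1234", "415 555 1234"]:
    print(number, bool(phone.fullmatch(number)), bool(compact.fullmatch(number)))
```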
re.IGNORECASE:
This flag enables case-insensitive regular expression matching with the supplied string, so
that expressions like [A-Z] will also match lowercase letters. It is often supplied to
re.compile() as an optional argument.

Example:
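The original example is not reproduced here; this minimal sketch uses illustrative strings to show [A-Z] matching lowercase letters once re.IGNORECASE is passed to re.compile(), and the same flag passed to re.search().

```python
import re

# With IGNORECASE, [A-Z] also matches lowercase letters.
pattern = re.compile(r"[A-Z]+", re.IGNORECASE)
with_flag = pattern.findall("Python Regex")
print(with_flag)  # ['Python', 'Regex']

# Without the flag, only the uppercase letters match.
without_flag = re.findall(r"[A-Z]+", "Python Regex")
print(without_flag)  # ['P', 'R']

# The flag can also be passed directly to functions such as re.search().
found = re.search(r"robocop", "RoboCop is here", re.IGNORECASE).group()
print(found)  # RoboCop
```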


re.DOTALL:
By default, Python's "." special character matches any character except a newline. The DOTALL flag extends it so that "." matches any character, including a newline.
When working on real-world projects, we sometimes need to analyse multi-line strings (lines separated by newline characters, "\n"). In these situations, we use re.DOTALL.
Example:
Here the regular expression '.+' matches one or more characters. Without the flag, the engine stops when it reaches the newline character, because the dot does not match line breaks. With the DOTALL flag, the dot also matches the newline, so the match continues past it.
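A minimal sketch of the behaviour described above, using a two-line sample string of our own: '.+' stops at the newline by default and runs to the end of the string with re.DOTALL.

```python
import re

text = "first line\nsecond line"

# Without DOTALL, '.' does not match '\n', so the match stops at the line break.
without_flag = re.search(r".+", text).group()
print(repr(without_flag))  # 'first line'

# With DOTALL, '.' also matches '\n', so the match covers the whole string.
with_flag = re.search(r".+", text, re.DOTALL).group()
print(repr(with_flag))  # 'first line\nsecond line'
```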

Python program:
In the following program, we accept a string from the user and return the total number of
lowercase and uppercase characters.
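The original program listing is not reproduced here. The following is a minimal sketch along the lines the text describes: patterns stored in variables named lower and upper, counted with len(re.findall(...)). The helper name count_cases() is hypothetical, and a fixed string replaces input() so the sketch runs without user interaction. [a-z] and [A-Z] count ASCII letters only.

```python
import re


def count_cases(text):
    """Return (number of lowercase, number of uppercase) ASCII letters in text."""
    lower = r"[a-z]"  # pattern for one lowercase letter
    upper = r"[A-Z]"  # pattern for one uppercase letter
    return len(re.findall(lower, text)), len(re.findall(upper, text))


# A fixed string keeps the sketch runnable; to read from the user instead,
# use: text = input("Enter a string: ")
text = "Hello World"
n_lower, n_upper = count_cases(text)
print("Lowercase characters:", n_lower)  # 8
print("Uppercase characters:", n_upper)  # 2
```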

In the above code, we store the pattern for lowercase letters in the variable 'lower' and the pattern for uppercase letters in the variable 'upper'. We use re.findall(pattern, text) to find all lowercase and all uppercase characters, and wrap each call in the len() function to get the total counts.