BITypes Notes
Built-in Types
Python programs can access the internal data of any object that implements the
buffer protocol through memoryview objects. The built-in memoryview() constructor
wraps such an object (for example bytes or bytearray) so that its byte-oriented
data can be read, and for mutable objects written, directly without copying it.
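A minimal sketch of the idea, assuming a bytearray as the underlying object (the variable names are chosen only for illustration). It shows that a slice of a memoryview is another view, and that writing through the view changes the original object.

```python
# A bytearray supports the buffer protocol and is mutable.
data = bytearray(b"hello world")
view = memoryview(data)

# Slicing a memoryview creates another view, not a copy of the bytes.
first_word = view[0:5]

# Writing through the view changes the underlying bytearray in place.
first_word[0] = ord("H")

print(data)               # the original object now starts with "H"
print(bytes(first_word))  # bytes() makes an explicit copy when one is needed
```

With an immutable object such as bytes, the view is read-only and the assignment would raise a TypeError.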
Python provides a number of functions that are always available without importing
any module. These functions are called built-in functions.
sys Module
The sys module provides functions and variables used to manipulate different parts
of the Python runtime environment.
sys.argv
It is a list (an attribute, not a function) holding the command-line arguments
passed to a Python script. The item at index 0 is the name of the script as it was
given on the command line. The rest of the arguments are stored at the subsequent
indices, always as strings.
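The layout of the list can be sketched as follows. The helper describe_arguments is hypothetical and exists only to make the split between index 0 and the rest explicit.

```python
import sys

def describe_arguments(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    script_name = argv[0]
    arguments = argv[1:]
    return script_name, arguments

if __name__ == "__main__":
    script_name, arguments = describe_arguments(sys.argv)
    print("Script:", script_name)
    print("Arguments:", arguments)
```

Saved as, say, demo.py and run as "python demo.py -v file.txt", this would report demo.py as the script and the two remaining strings as arguments.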
sys.exit
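It raises the SystemExit exception, which ends the script and returns control to
either the Python console or the command prompt unless the exception is caught. An
optional argument sets the exit status (0 or None means success). It is generally
used to stop the program cleanly after an error has been detected.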
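A small sketch of this behaviour, assuming a hypothetical helper run() that gives up when its input cannot be converted. Because sys.exit works by raising SystemExit, the exit request can be caught and inspected.

```python
import sys

def run(values):
    """Hypothetical helper: sum the values, or exit if one is not an integer."""
    try:
        total = sum(int(v) for v in values)
    except ValueError as exc:
        # If SystemExit reaches the top level, a string argument is printed
        # to stderr and the exit status becomes 1.
        sys.exit(f"invalid input: {exc}")
    return total

# SystemExit can be caught like any other exception, which is handy in tests.
try:
    run(["1", "two"])
except SystemExit as exit_request:
    caught = exit_request.code

print(caught)
```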
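sys.maxsize
It is an integer giving the largest value a variable of type Py_ssize_t can hold,
usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform. It is
the upper limit on the size of lists, strings and other containers.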
sys.path
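This is a list of strings that specifies the search path for Python modules. It is
initialized from the directory of the script, the PYTHONPATH environment variable
and installation-dependent defaults, and a program can modify it at run time.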
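Since sys.path is an ordinary list, it can be inspected and extended like one. The directory below is a made-up path used only for illustration.

```python
import sys

# sys.path is a plain list of directory names (strings).
print(type(sys.path))

# Hypothetical directory, used only for illustration.
extra_dir = "/opt/myproject/lib"
if extra_dir not in sys.path:
    sys.path.append(extra_dir)  # searched after the existing entries

print(sys.path[-1])
```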
sys.version
This attribute is a string containing the version number of the current Python
interpreter, together with build information such as the compiler used.
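A short sketch of reading the version. The string in sys.version is meant for display; the related attribute sys.version_info (not covered in these notes) is a tuple that is easier to compare. The 3.8 threshold is an arbitrary example.

```python
import sys

print(sys.version)  # human-readable version string

# sys.version_info holds the same version as a tuple of numbers.
major, minor = sys.version_info[:2]
if (major, minor) < (3, 8):
    print("This script assumes Python 3.8 or newer")
```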
getopt Module
Python provides a getopt module that helps you parse command-line options and
arguments. This module provides two functions (getopt.getopt and
getopt.gnu_getopt) and an exception (getopt.GetoptError) to enable command-line
argument parsing.
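getopt.getopt function:
This function parses command-line options and the parameter list. Syntax for this
function:
getopt.getopt(args, options, [long_options])
Here args is the argument list to parse (usually sys.argv[1:]), options is a string
of short option letters (a letter followed by a colon requires a value), and
long_options is an optional list of long option names (a name followed by =
requires a value). It returns a pair: a list of (option, value) tuples and the list
of arguments left over after the options.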
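The call can be sketched with a fixed argument list standing in for sys.argv[1:]. The option names (-v, -o, --name) and file names are invented for the example.

```python
import getopt

# Hypothetical command line: script.py -v -o out.txt --name=demo input.txt
args = ["-v", "-o", "out.txt", "--name=demo", "input.txt"]

try:
    # "vo:"   : -v takes no value, -o requires one (trailing colon).
    # "name=" : --name requires a value (trailing equals sign).
    opts, remaining = getopt.getopt(args, "vo:", ["name="])
except getopt.GetoptError as err:
    print("usage error:", err)
    raise SystemExit(2)

print(opts)       # list of (option, value) pairs
print(remaining)  # arguments left over after the options
```

An unknown option, or a missing value for -o, raises getopt.GetoptError, which is the exception the module provides.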
Regular Expressions
1. INTRODUCTION:
A Regular Expression (regex) is a special sequence of characters that defines a
search pattern for finding a string or a set of strings. It can detect the presence
or absence of text by matching it against the pattern, and a pattern can be divided
into one or more sub-patterns (groups). The re module supports the use of regex in
Python. One of its main functions is re.search, which takes a regular expression
and a string. It returns a match object for the first match, or None if there is no
match.
Example:
This Python code uses regular expressions to search for the word “University” in
the given string and then prints the start and end indices of the matched word
within the string.
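import re
# Avoid naming the variable "str", which would shadow the built-in type.
text = 'TMU for Computers: A computer science University for IT'
match = re.search(r'University', text)
# re.search returns None when nothing matches, so check before using the result.
if match:
    print('Start Index:', match.start())
    print('End Index:', match.end())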
2. MAIN COMPONENTS
Character:
All characters, except those having special meaning in regex, match
themselves. E.g., the regex x matches substring "x"; regex 9 matches "9";
regex = matches "="; and regex @ matches "@".
Special Regex Characters:
These characters have special meaning in regex (to be discussed
below): ., +, *, ?, ^, $, (, ), [, ], {, }, |, \.
Escape Sequences (\char):
To match a character having special meaning in regex, prefix it with a
backslash (\) to form an escape sequence. E.g., \. matches "."; regex \+
matches "+"; and regex \( matches "(".
Regex recognizes common escape sequences such as \n for newline, \t for
tab, \r for carriage-return, \nnn for an octal number of up to 3 digits, \xhh for a
two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \Uhhhhhhhh
(capital U) for an 8-digit Unicode code point.
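A brief sketch of escaping in practice. Writing patterns as raw strings (r"...") keeps the backslashes intact for the regex engine, and re.escape adds the backslashes automatically when the literal text is not known in advance. The sample strings are invented.

```python
import re

# \$ and \. match a literal dollar sign and a literal dot.
price = re.search(r"\$\d+\.\d+", "Total: $19.99 (incl. tax)")
print(price.group())

# re.escape escapes every special character in an arbitrary literal.
literal = "1+1=2?"
pattern = re.escape(literal)
found = re.search(pattern, "Is 1+1=2? Yes.")
print(found.group())
```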
A Sequence of Characters (or String):
Strings can be matched by combining a sequence of characters (called sub-
expressions). E.g., the regex Saturday matches "Saturday". The matching, by
default, is case-sensitive, but can be set to case-insensitive via a flag such
as re.IGNORECASE.
OR Operator (|): E.g., the regex four|4 accepts strings "four" or "4".
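Case sensitivity and the OR operator can be tried out as follows. The sentence is a made-up sample.

```python
import re

text = "We meet on saturday at four, or at 4 if it rains."

# Matching is case-sensitive by default, so "Saturday" is not found.
strict = re.search(r"Saturday", text)

# The re.IGNORECASE flag makes the same pattern match "saturday".
relaxed = re.search(r"Saturday", text, re.IGNORECASE)

# Alternation: either branch of four|4 may match.
times = re.findall(r"four|4", text)

print(strict, relaxed.group(), times)
```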
Character class (or Bracket List):
[...]: Accept ANY ONE of the characters within the square brackets,
e.g., [aeiou] matches "a", "e", "i", "o" or "u".
[.-.] (Range Expression): Accept ANY ONE of the characters in the range,
e.g., [0-9] matches any digit; [A-Za-z] matches any uppercase or lowercase
letter.
[^...]: NOT ONE of the characters, e.g., [^0-9] matches any non-digit.
Only these four characters require an escape sequence inside the bracket
list: ^, -, ], \.
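The three forms of bracket list, plus escaping inside brackets, can be sketched like this (the input strings are arbitrary samples):

```python
import re

# Any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regular expression")

# Any one character in a range.
letters = re.findall(r"[A-Za-z]", "a1B2")

# Any one character NOT in the list.
non_digits = re.findall(r"[^0-9]", "a1b22c")

# The four characters ^ - ] \ escaped inside a bracket list.
special = re.findall(r"[\^\-\]\\]", r"a^b-c]d\e")

print(vowels, letters, non_digits, special)
```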
Occurrence Indicators (or Repetition Operators):
+: one or more (1+), e.g., [0-9]+ matches one or more digits such
as '123', '000'.
*: zero or more (0+), e.g., [0-9]* matches zero or more digits. It accepts all
those in [0-9]+ plus the empty string.
?: zero or one (optional), e.g., [+-]? matches an optional "+", "-", or an
empty string.
{m,n}: m to n (both inclusive)
{m}: exactly m times
{m,}: m or more (m+)
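The occurrence indicators can be checked one by one. The helper accepts() is hypothetical; it uses re.fullmatch so that the whole input, not just a part of it, has to match the pattern.

```python
import re

def accepts(pattern, text):
    """Hypothetical helper: True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(accepts(r"[0-9]+", "123"))        # + : one or more digits
print(accepts(r"[0-9]+", ""))           # + rejects the empty string
print(accepts(r"[0-9]*", ""))           # * also accepts the empty string
print(accepts(r"[+-]?[0-9]+", "-42"))   # ? : optional sign
print(accepts(r"[0-9]{2,4}", "12345"))  # {m,n} : five digits is too many
print(accepts(r"[0-9]{3}", "123"))      # {m} : exactly three digits
print(accepts(r"[0-9]{2,}", "12345"))   # {m,} : two or more digits
```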
Metacharacters: each one matches a single character
. (dot): ANY ONE character except newline. Same as [^\n]
\d, \D: ANY ONE digit/non-digit character. Digits are [0-9]
\w, \W: ANY ONE word/non-word character. For ASCII, word characters
are [a-zA-Z0-9_]
\s, \S: ANY ONE space/non-space character. For ASCII, whitespace
characters are [ \t\n\r\f\v]
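A quick sketch of the metacharacters applied to a made-up sample string:

```python
import re

sample = "Room 42,\tBuilding_B\n"

digits = re.findall(r"\d", sample)   # single digit characters
words = re.findall(r"\w+", sample)   # runs of word characters
spaces = re.findall(r"\s", sample)   # single whitespace characters

# The dot matches any character except a newline.
no_newline = re.findall(r".", "a\nb")

print(digits, words, spaces, no_newline)
```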
Position Anchors:
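Position anchors do not match a character, but a position such as start-of-line,
end-of-line, start-of-word and end-of-word.
^, $: start-of-line and end-of-line respectively. E.g., ^[0-9]+$ matches a
numeric string (without the +, ^[0-9]$ matches a single digit only). In Python,
^ and $ match at the start and end of the whole string by default, and at the
start and end of every line when the re.MULTILINE flag is set.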
\b: boundary of word, i.e., start-of-word or end-of-word. E.g., \bcat\b
matches the word "cat" in the input string.
\B: Inverse of \b, i.e., non-start-of-word or non-end-of-word.
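\<, \>: start-of-word and end-of-word respectively, similar to \b. E.g., \<cat\>
matches the word "cat" in the input string. These two anchors belong to other
regex flavors (such as GNU grep and Vim). Python's re module does not support
them and treats \< and \> as the literal characters "<" and ">", so use \b in
Python.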
\A, \Z: start-of-input and end-of-input respectively.
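The anchors supported by Python's re module can be sketched as follows. The sample strings are arbitrary, and the re.MULTILINE flag is used to show how ^ changes from start-of-input to start-of-line.

```python
import re

# ^...$ ties the pattern to the whole string.
numeric = re.search(r"^[0-9]+$", "2024") is not None
mixed = re.search(r"^[0-9]+$", "20a24") is not None

# \b matches only where a word starts or ends.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# \B is the inverse: "cat" that does not start a word.
inner = re.findall(r"\Bcat", "cat concat")

# By default ^ matches only at the start of the input;
# with re.MULTILINE it matches at the start of every line.
lines = "one\ntwo"
first_only = re.findall(r"^\w+", lines)
every_line = re.findall(r"^\w+", lines, re.MULTILINE)

# \A and \Z always refer to the start and end of the input.
at_start = re.search(r"\Aone", lines) is not None
at_end = re.search(r"two\Z", lines) is not None

print(numeric, mixed, whole_words, inner, first_only, every_line, at_start, at_end)
```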