Module 4:
Data Analytics Languages - Python
31/07/2024 Slide 1
History
• Python was created by Guido van Rossum in the
Netherlands; work began in late 1989 and the first
public release followed in 1991
• Popular programming language
• Widely used in industry and academia
• Simple, intuitive syntax
• Rich standard library
• Two major versions: Python 2 (end of life since
January 2020) and Python 3
eLahe Technologies 2020
www.elahetech.com
Interpreted Language
• Python is an interpreted language, as opposed
to a compiled one
• An interpreter reads a high-level program and
executes it directly
• A compiler first translates the program into
executable object code, which is then
executed
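The interpreter/compiler split above is a simplification. In CPython, the reference implementation (assumed here), source code is first compiled to bytecode and that bytecode is then interpreted by a virtual machine. The standard-library `dis` module lets you look at it; the small `add` function is hypothetical, and exact opcode names differ between Python versions.

```python
import dis

def add(a, b):
    """Tiny hypothetical function used only to inspect its bytecode."""
    return a + b

# The compiled bytecode is stored on the function's code object as raw bytes
print(type(add.__code__.co_code))

# dis.dis() prints a human-readable listing of the bytecode instructions
dis.dis(add)

# The same instructions are available programmatically
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```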
NumPy
• NumPy is the fundamental package for scientific
computing with Python. It contains, among other
things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random
number capabilities
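A minimal sketch touching each capability listed above: an N-dimensional array, broadcasting, linear algebra, a Fourier transform and random numbers. The values are made up for illustration, and the seeded `np.random.default_rng` generator is one choice among NumPy's random APIs.

```python
import numpy as np

# N-dimensional array: 2 rows x 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)          # (2, 3)

# Broadcasting: the 1-D row is stretched across both rows of a
row = np.array([10, 20, 30])
shifted = a + row
print(shifted)          # [[11 22 33]
                        #  [14 25 36]]

# Linear algebra: solve 2x + y = 5 and x + 3y = 10
m = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([5.0, 10.0])
x = np.linalg.solve(m, rhs)
print(x)                # x = 1, y = 3 (up to floating-point rounding)

# Fourier transform of a unit impulse is flat (all ones)
spectrum = np.fft.fft([1.0, 0.0, 0.0, 0.0])

# Random numbers from a seeded generator, so runs are repeatable
rng = np.random.default_rng(seed=0)
samples = rng.random(3)
```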
Matplotlib
• Matplotlib is a Python 2D plotting library
which produces publication quality figures in
a variety of hardcopy formats and interactive
environments across platforms.
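A minimal sketch of producing a figure, assuming matplotlib is installed. The data is hypothetical, and the non-interactive `Agg` backend is chosen so the example runs without a display; it writes a PNG into memory, but a filename such as `sales.png` or `sales.pdf` could be passed to `savefig` instead to get a hardcopy format.

```python
import io

import matplotlib
matplotlib.use("Agg")  # select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
months = [1, 2, 3, 4]
sales = [10, 14, 9, 17]

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(months, sales, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales (sample data)")

# Render the figure as PNG into an in-memory buffer
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)
plt.close(fig)
```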
pandas
• pandas is an open source, BSD-licensed
library providing high-performance, easy-to-use
data structures and data analysis tools
for Python
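A minimal sketch of the core pandas data structure, the `DataFrame`, and a typical analysis step (grouping and summing). The region and sales values are hypothetical.

```python
import pandas as pd

# Hypothetical sales records for illustration
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "sales": [100, 80, 120, 60, 90],
})

# Summary statistics (count, mean, std, quartiles...) for numeric columns
print(df.describe())

# Total sales per region: split by region, then sum each group
totals = df.groupby("region")["sales"].sum()
print(totals)
```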
Python Regex
Regular Expressions
In computing, a regular expression, also referred to as
"regex" or "regexp", provides a concise and flexible
means for matching strings of text, such as particular
characters, words, or patterns of characters. A regular
expression is written in a formal language
that can be interpreted by a regular expression
processor.
http://en.wikipedia.org/wiki/Regular_expression
Python Regular Expressions
^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character (except a newline, by default)
\s Matches whitespace
\S Matches any non-whitespace character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
[aeiou] Matches a single character in the listed set
[^XYZ] Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
( Indicates where string extraction is to start
) Indicates where string extraction is to end
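A few of the special characters above in action. This is a sketch on a hypothetical mail-header line; the `r'...'` raw-string prefix is a common convention so that backslashes such as `\S` reach the regex engine unchanged.

```python
import re

# Hypothetical mail-header line used for illustration
line = 'From alice@example.com Sat Jan  5 09:14:16 2008'

# [aeiou]: any single character from the listed set
vowels = re.findall(r'[aeiou]', 'regex')

# \S+@\S+: one or more non-whitespace chars, an @, then more non-whitespace
addresses = re.findall(r'\S+@\S+', line)

# ( ): only the part inside the parentheses is extracted
extracted = re.findall(r'^From (\S+@\S+)', line)

# [0-9]+: runs of one or more digits
numbers = re.findall(r'[0-9]+', line)

# $: the pattern must sit at the end of the line
ends_with_year = re.search(r'2008$', line) is not None

print(vowels, addresses, extracted, numbers, ends_with_year)
```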
The Regular Expression Module
• Before you can use regular expressions in your
program, you must import the library using
"import re"
• You can use re.search() to see if a string matches a
regular expression, similar to using the find()
method for strings
• You can use re.findall() to extract portions of a string
that match your regular expression, similar to a
combination of find() and slicing: var[5:10]
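The two comparisons above, side by side, as a sketch on a hypothetical line: `find()` and `re.search()` both answer "is it there?", while slicing needs a known position and `re.findall()` returns every match wherever it occurs.

```python
import re

# Hypothetical line for illustration
line = 'From: alice@example.com Sat Jan 5 09:14:16 2008'

# String method: find() returns an index, or -1 if absent
print(line.find('From:'))              # 0

# Regex: re.search() returns a Match object, or None if absent
if re.search('^From:', line):
    print('line starts with From:')

# Slicing pulls out text at a fixed, known position ...
print(line[6:23])                      # alice@example.com

# ... while re.findall() returns every match as a list of strings
print(re.findall('[0-9]+', line))      # ['5', '09', '14', '16', '2008']
```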
Wild-Card Characters
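• The dot character matches any character
• If you add the asterisk character, the preceding
character can repeat "any number of times"
• In ^X.*: the ^ matches the start of the line, . matches any character, * repeats it many times, and : matches a literal colon
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain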
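Applying ^X.*: to the header lines above with re.search(). The last line in the list is hypothetical and is included only to show a line that does not match, because it does not start with X.

```python
import re

# Header lines from the slide, plus one hypothetical non-matching line
lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-DSPAM-Confidence: 0.8475',
    'X-Content-Type-Message-Body: text/plain',
    'Subject: weekly report',
]

# ^X.*: means: start of line, an X, any characters, then a colon
matched = [line for line in lines if re.search(r'^X.*:', line)]
for line in matched:
    print(line)
```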
Wild-Card Characters
• Depending on how "clean" your data is and the
purpose of your application, you may want to
narrow your match down a bit
• In ^X.*: the ^ matches the start of the line, . matches any character, and * repeats it many times
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain
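Why narrowing matters: ^X.*: also accepts ordinary text that merely starts with X and contains a colon. The sketch below compares it with one possible narrower pattern, ^X-\S+: (an assumption for illustration, not taken from the slide), which requires "X-" followed by non-whitespace characters up to the colon. The "X-Plane" line is hypothetical.

```python
import re

lines = [
    'X-DSPAM-Result: Innocent',
    'X-Plane is behind schedule: two weeks',  # hypothetical: not a header
]

broad = r'^X.*:'      # any characters, including spaces, up to a colon
narrow = r'^X-\S+:'   # "X-", then one or more non-whitespace characters, then a colon

broad_hits = [l for l in lines if re.search(broad, l)]
narrow_hits = [l for l in lines if re.search(narrow, l)]

print(broad_hits)     # both lines match
print(narrow_hits)    # only the real header matches
```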
Greedy Matching
• The repeat characters (* and +) are greedy: they
push outward to match the largest possible string
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']
• In ^F.+: the first character in the match is an F,
.+ matches one or more characters, and the last
character in the match is a :
• Why not 'From:'? Because .+ extends as far as it
can, so the match runs to the last : in the string
Non-Greedy Matching
• Not all regular expression repeat codes are greedy!
If you add a ? character, the + and * chill out a
bit: they match as few characters as possible
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']
• In ^F.+?: the first character in the match is an F,
.+? matches one or more characters but not greedily,
and the last character in the match is a :
Python Slicing
String Slices
• >>> fruit = 'apple'
• >>> fruit[1:3]
• 'pp'
• >>> fruit[1:]
• 'pple'
• >>> fruit[:4]
• 'appl'
• >>> fruit[:]
• 'apple'
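The rule behind these results, as a short sketch: s[i:j] starts at index i and stops just before index j, so it holds j - i characters, and an omitted bound means "from the start" or "to the end".

```python
fruit = 'apple'

# s[i:j] has j - i characters: indices 1 and 2 here
print(fruit[1:3], len(fruit[1:3]))   # pp 2

# Splitting at any index k and re-joining gives back the original string
rejoins = all(fruit[:k] + fruit[k:] == fruit for k in range(len(fruit) + 1))
print(rejoins)                        # True

# A stop index past the end is clipped instead of raising IndexError
print(fruit[3:100])                   # le
```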
List Slices
• >>> b
• [3, 4, 5, 6]
• >>> b[0:3]
• [3, 4, 5]
• b[0:j] with j > 3 (any stop at or past len(b)) and b[0:] give the same list
• >>> b[:2]
• [3, 4]
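A list slice builds a new list, so changing the slice does not change the original. This is the behaviour that NumPy array slicing differs from. A minimal sketch:

```python
b = [3, 4, 5, 6]

# The slice is a new list object: modifying it leaves b untouched
c = b[0:2]
c[0] = 99
print(c)               # [99, 4]
print(b)               # [3, 4, 5, 6]

# b[:] is therefore a common idiom for a shallow copy of a list
d = b[:]
print(d == b, d is b)  # True False
```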
List Slices
• >>> b[2:2]
• []
• b[i:j:k] takes every k-th element of b[i:j],
starting at index i
• >>> b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
• >>> b[0:10:3]
• [1, 4, 7, 10]
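Stepped slices with omitted bounds, as a sketch: b[0:10:3] visits indices 0, 3, 6 and 9, and leaving out start or stop means "from the beginning" or "to the end".

```python
b = list(range(1, 11))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

every_third = b[0:10:3]  # indices 0, 3, 6, 9
evens = b[1::2]          # every second element, starting at index 1
every_fifth = b[::5]     # whole list, step 5: indices 0 and 5

print(every_third)       # [1, 4, 7, 10]
print(evens)             # [2, 4, 6, 8, 10]
print(every_fifth)       # [1, 6]
```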
NumPy array slicing
• 1-d array slicing and indexing is similar to
Python lists
• >>> import numpy as np
• >>> arr1 = np.array([1, 2, 5, 6, 4, 3])
• >>> arr1[2:4] = 99
• >>> arr1
• array([ 1,  2, 99, 99,  4,  3])
• Assigning a scalar to a slice sets every element of the slice to that value
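The assignment arr1[2:4] = 99 is one place where arrays and lists do differ: NumPy broadcasts the scalar to every element of the slice, while a list slice can only be assigned an iterable. A minimal sketch:

```python
import numpy as np

arr1 = np.array([1, 2, 5, 6, 4, 3])
arr1[2:4] = 99            # the scalar is broadcast to both elements of the slice
print(arr1)               # [ 1  2 99 99  4  3]

lst = [1, 2, 5, 6, 4, 3]
raised = False
try:
    lst[2:4] = 99         # lists refuse: a slice needs an iterable on the right
except TypeError as err:
    raised = True
    print('TypeError:', err)

lst[2:4] = [99, 99]       # the list equivalent
print(lst)                # [1, 2, 99, 99, 4, 3]
```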
NumPy array slicing
• Slicing in ndarrays is different from Python lists in that
data is not copied
• Slices are views on the original array!
• >>> arr2 = arr1[2:4]
• >>> arr2[0] = 88
• >>> arr1
• array([ 1,  2, 88, 99,  4,  3])
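A sketch of how to tell a view from a copy, and how to get an independent copy when you need one. `.base`, `.copy()` and `np.shares_memory` are standard NumPy; the array values continue the example above.

```python
import numpy as np

arr1 = np.array([1, 2, 99, 99, 4, 3])

view = arr1[2:4]             # basic slice: a view sharing arr1's memory
print(view.base is arr1)     # True
view[0] = 88
print(arr1)                  # [ 1  2 88 99  4  3]

independent = arr1[2:4].copy()   # .copy() allocates new, separate data
independent[0] = -1
print(arr1)                      # unchanged: [ 1  2 88 99  4  3]
print(np.shares_memory(arr1, independent))   # False
```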
Sets
in and not in
• >>> setA = {1, 3, 5, 7}
• >>> 3 in setA
• True
• >>> 3 not in setA
• False
• >>> 4 not in setA
• True
Subset
• >>> setA = {1, 3, 5, 7}
• >>> setB = {1, 3, 5, 7, 9}
• >>> setC = {1, 3, 5, 9, 10}
• >>> setA.issubset(setB)
• True
• >>> setA.issubset(setC)
• False
Superset
• >>> setA = {1, 3, 5, 7}
• >>> setB = {1, 3, 5, 7, 9}
• >>> setC = {1, 3, 5, 9, 10}
• >>> setA.issuperset(setB)
• False
• >>> setB.issuperset(setA)
• True
• >>> setC.issuperset(setA)
• False
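issubset() and issuperset() also have operator forms, and the strict operators test for a proper subset or superset (the two sets must also differ). A short sketch using the same sets:

```python
setA = {1, 3, 5, 7}
setB = {1, 3, 5, 7, 9}
setC = {1, 3, 5, 9, 10}

print(setA <= setB, setA.issubset(setB))     # True True
print(setA <= setC)                          # False: 7 is not in setC
print(setB >= setA, setB.issuperset(setA))   # True True

# < and > require a *proper* subset/superset
print(setA < setB, setA < setA)              # True False
```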
Set Union
• >>> setA = {1, 3, 5, 7}
• >>> setB = {7, 5, 9}
• >>> setA.union(setB)
• {1, 3, 5, 7, 9}
• >>> setA | setB
• {1, 3, 5, 7, 9}
Set Intersection
• >>> setA = {1, 3, 5, 7}
• >>> setB = {7, 5, 9}
• >>> setA.intersection(setB)
• {5, 7}
• >>> setA & setB
• {5, 7}
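The method and operator forms of union and intersection give the same result for two sets, but they are not fully interchangeable: the methods accept any iterable, while | and & require both operands to be sets. A minimal sketch:

```python
setA = {1, 3, 5, 7}

# Methods accept lists, tuples or any other iterable
union_result = setA.union([7, 5, 9])
inter_result = setA.intersection((7, 5, 9))
print(union_result, inter_result)

# The | operator needs a set on both sides
raised = False
try:
    setA | [7, 5, 9]
except TypeError as err:
    raised = True
    print('TypeError:', err)
```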
Dictionaries
Dictionaries
• Lists index their entries based on their position
in the list
• Dictionaries are like bags: entries are found by
key, not by position (since Python 3.7, dicts also
remember insertion order)
• So we index the things we put in the dictionary
with a "lookup tag"
>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print(purse)
{'money': 12, 'candy': 3, 'tissues': 75}
>>> print(purse['candy'])
3
>>> purse['candy'] = purse['candy'] + 2
>>> print(purse)
{'money': 12, 'candy': 5, 'tissues': 75}
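The update purse['candy'] = purse['candy'] + 2 works because 'candy' is already a key; reading a key that was never added raises KeyError. A common pattern, sketched below with a hypothetical 'gum' key, is dict.get(key, default), which returns the default instead of raising.

```python
purse = {'money': 12, 'candy': 3, 'tissues': 75}

# Updating an existing key, as on the slide
purse['candy'] = purse['candy'] + 2

# 'gum' (hypothetical) was never added, so purse['gum'] would raise KeyError;
# get() falls back to 0 and the new key is then created
purse['gum'] = purse.get('gum', 0) + 1

print('gum' in purse)   # True: 'in' checks the keys
print(purse)            # {'money': 12, 'candy': 5, 'tissues': 75, 'gum': 1}
```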
Comparing Lists and Dictionaries
Dictionaries are like lists except that they use keys instead of
numbers to look up values
>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print(lst)
[21, 183]
>>> lst[0] = 23
>>> print(lst)
[23, 183]

>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print(ddd)
{'age': 21, 'course': 182}
>>> ddd['age'] = 23
>>> print(ddd)
{'age': 23, 'course': 182}
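Because lists look values up by position and dictionaries by key, asking for something that is not there fails differently: a list raises IndexError, a dictionary raises KeyError. A small sketch using the values above (the missing position 5 and key 'name' are hypothetical):

```python
lst = [23, 183]
ddd = {'age': 23, 'course': 182}

print(lst[1])          # look up by position: 183
print(ddd['course'])   # look up by key: 182

# Record which exception each container raises for a missing entry
errors = []
for container, missing in ((lst, 5), (ddd, 'name')):
    try:
        container[missing]
    except (IndexError, KeyError) as err:
        errors.append(type(err).__name__)
        print(type(err).__name__, err)
```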