Day2.2 DataAnalyticsLanguages
Python
10-08-2022 Slide 1
History
Numpy
Matplotlib
pandas
Functions
Types of Functions
Built-in Functions
• >>>max(2,3)
• 3
• >>>max(2,3,4)
• 4
• >>>max(max(2,3),4)
• 4
• >>>max('abc')
• 'c'
• >>>max('abc','bcd')
• 'bcd'
• >>>max(2,'two')
• TypeError (an int and a str cannot be compared)
Function Call
Multiple Arguments
Multiple Return Values
No Return Values
Scope of parameters and variables
• >>>a=1
• >>>def plus4(x):
•     b = x+4
•     return b
• >>>a
• 1
• >>>plus4(5)
• 9
• >>>b
• NameError (b exists only inside plus4)
• >>>x
• NameError (x exists only inside plus4)
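The scoping rule above can be sketched as a small script. This is only an illustration: the names `result` and `b_visible` are not from the slides, and the `try`/`except` is one assumed way to show that `b` is undefined outside the function.

```python
a = 1  # module-level (global) variable

def plus4(x):
    b = x + 4      # x and b are local: they exist only while plus4 runs
    return b

result = plus4(5)

# Outside the function, b was never defined, so looking it up raises NameError.
try:
    b
    b_visible = True
except NameError:
    b_visible = False

print(a, result, b_visible)
```

The global `a` is untouched by the call, and the local `b` is gone once `plus4` returns.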
Recursion
• >>>def fib(n):
•     if(n==0):
•         return 0
•     elif(n==1):
•         return 1
•     else:
•         return(fib(n-1)+fib(n-2))
• >>>fib(6)
• 8
Sanity Check
• >>>def fib(n):
•     if(not isinstance(n, int)):
•         return -1
•     if(n < 0):
•         return -1
•     if(n==0):
•         return 0
•     elif(n==1):
•         return 1
•     else:
•         return(fib(n-1)+fib(n-2))
• >>>fib(6)
• 8
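The sanity check above reports bad input by returning -1. A common alternative, sketched here as an assumption rather than as the slides' method, is to raise an exception so that a bad argument cannot be mistaken for a result. The helper `safe_fib` is hypothetical and exists only to show the caller's side.

```python
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) == 0 and fib(1) == 1."""
    if not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:              # covers the two base cases, 0 and 1
        return n
    return fib(n - 1) + fib(n - 2)

def safe_fib(value):
    """Hypothetical wrapper: turn a rejected argument into a message."""
    try:
        return fib(value)
    except (TypeError, ValueError) as err:
        return f"rejected: {err}"

print(fib(6))
print(safe_fib(-1))
print(safe_fib("6"))
```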
Python Loops
The while loop
• >>>n = 12
• >>>while (n < 20):
•     n += 1
•     print(n)
• 13
• 14
• …
• …
• 20
• NB: ++ operator does not exist in Python
Counting downwards
• >>>n = 256
• >>>while (n > 1):
•     n /= 2
•     print(n)
• 128.0
• 64.0
• …
• …
• 1.0
Infinite loop
• >>>n = 12
• >>>while (n > 2):
•     n += 1
•     print(n)
• n only grows, so the condition n > 2 never becomes false
Exiting loops using break
• >>>n = 2460
• >>>while (n > 1):
•     n //= 2
•     print(n)
•     if(n%2 == 1):
•         break
• 1230
• 615
The continue statement
• >>>n = 2460
• >>>while (n > 1):
•     n //= 2
•     if(n%2 == 1):
•         continue
•     print(n)
• 1230
• 76
• 38
• 4
• 2
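To compare break and continue side by side, the two loops above can be wrapped in functions that collect the values they would print. The function names and the `seen` list are illustrative assumptions, not part of the slides.

```python
def halve_until_odd(n):
    """Halve n repeatedly; stop at the first odd value (break)."""
    seen = []
    while n > 1:
        n //= 2
        seen.append(n)
        if n % 2 == 1:
            break          # leave the loop entirely
    return seen

def halve_skip_odd(n):
    """Halve n repeatedly; record only the even values (continue)."""
    seen = []
    while n > 1:
        n //= 2
        if n % 2 == 1:
            continue       # skip the rest of this iteration only
        seen.append(n)
    return seen

print(halve_until_odd(2460))
print(halve_skip_odd(2460))
```

With break the loop ends at 615; with continue the odd values 615, 307, 153, 19, 9 and 1 are skipped but the loop keeps running until n reaches 1.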
Using for loops
• >>>for i in range(2,8):
•     print(i)
• 2
• 3
• 4
• 5
• 6
• 7
Using for loops
• >>>for i in sequence:
•     print(i)
• In Python sequences are lists, strings, tuples
• >>>for i in (2,4,6,8):
•     print(i)
• >>>for i in [2,4,6,8]:
•     print(i)
• >>>for i in "2468":
•     print(i)
• All print 2, 4, 6, 8 in consecutive lines
Strings
Compound data type
• Strings are made up of smaller units (characters), and we may access the whole or its parts
• A character in Python is a string of size 1
• Both single and double quotes can be used, e.g. "fruits" or 'fruits'
• >>>name = "sachin"
• >>>print(name)
• sachin
• >>>print(name[0])
• s
Length of a string
• Length can be found using the len() function
• >>>fruit = "apple"
• >>>len(fruit)
• 5
• >>>len("apple")
• 5
• >>>fruit[5]
• IndexError (valid indices are 0 to 4)
• >>>fruit[len(fruit)-1]
• 'e'
String Slices
• >>>fruit = "apple"
• >>>fruit[1:3]
• 'pp'
• >>>fruit[1:]
• 'pple'
• >>>fruit[:4]
• 'appl'
• >>>fruit[:]
• 'apple'
String Comparison
• >>>"apple" < "banana"
• True
• >>>"mango" < "banana"
• False
• >>>"mango" < "mango"
• False
• >>>"mango" < "Mango"
• False
• >>>"Mango" < "mango"
• True
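The results above follow from Python comparing strings character by character using each character's code point. A small sketch, with variable names chosen only for illustration:

```python
# Uppercase letters have smaller code points than lowercase ones.
code_M = ord("M")   # 77
code_m = ord("m")   # 109

upper_first = "Mango" < "mango"    # decided by the first characters: 77 < 109
m_before_b = "mango" < "banana"    # False: 'm' comes after 'b'

# To ignore case, normalise both sides before comparing.
same_ignoring_case = "Mango".lower() == "mango".lower()

print(code_M, code_m, upper_first, m_before_b, same_ignoring_case)
```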
Strings are immutable
• >>>fruit = "Mango"
• >>>dance = fruit
• >>>dance
• 'Mango'
• >>>dance[0]
• 'M'
• >>>dance[0]='T'
• TypeError (strings do not support item assignment)
• >>>dance = 'T'+fruit[1:]
• >>>dance
• 'Tango'
String methods
• >>>fruit='mango'
• >>>fruit.find("go")
• 3
• >>>fruit.upper()
• 'MANGO'
Python Lists
Sequences in Python
List Values
• Examples
• [3, 5, 7, 11, 13, 17, 19]
• [2.5, 3, 4, 7.7]
• [2.5, "two", 3, 4, "five", 7]
• Mixed types are allowed in a list
• [2.5, 13, "seven", [8,9,10]]
• Nested lists are also allowed
Constructing List from Ranges
• >>>a = range(3, 7)
• >>>type(a)
• <class 'range'>
• >>>b = list(a)
• >>>b
• [3, 4, 5, 6]
• >>>type(b)
• <class 'list'>
Access
• >>>b = [3, 4, 5, 6]
• >>>b[0]
• 3
• >>>b[3]
• 6
• >>>b[4]
• IndexError (valid indices are 0 to 3)
• >>>b[3-2]
• 4
Length, Min, Max, Sum
• >>>b
• [3, 4, 5, 6]
• >>>len(b)
• 4
• >>>min(b)
• 3
• >>>max(b)
• 6
• >>>sum(b)
• 18
List Membership
• >>>b
• [3, 4, 5, 6]
• >>>3 in b
• True
• >>>"b" in b
• False
• >>>"3" not in b
• True
List Operations
• Concatenation
• >>>b
• [3, 4, 5, 6]
• >>>b+b
• [3, 4, 5, 6, 3, 4, 5, 6]
• Repetition
• >>>2*b
• [3, 4, 5, 6, 3, 4, 5, 6]
• >>>b*2
• [3, 4, 5, 6, 3, 4, 5, 6]
List Slices
• >>>b
• [3, 4, 5, 6]
• >>>b[0:3]
• [3,4,5]
• b[0:j] with j > 3 and b[0:] are same
• >>>b[:2]
• [3,4]
List Slices
• >>>b[2:2]
• []
• b[i:j:k] is a subset of b[i:j] with elements picked in steps of k
• >>>b=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
• >>>b[0:10:3]
• [1, 4, 7, 10]
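A short sketch of how the step in b[i:j:k] picks elements. The variable names are illustrative, and the equivalence shown holds for a positive step k.

```python
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

every_third = b[0:10:3]      # indices 0, 3, 6, 9
two_steps = b[0:10][::3]     # slice first, then take every third element
evens = b[1::2]              # start at index 1, step 2, run to the end
clamped = b[0:100]           # a stop index past the end is clamped to len(b)
empty = b[2:2]               # start == stop gives an empty list

print(every_third, evens, empty)
```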
Python Lists are Mutable
• >>>b
• [13, 4, 5, 6]
• Multiple assignment
• >>>b[1:3] =[14, 15]
• >>>b
• [13, 14, 15, 6]
Python Lists are Mutable
• Adding at the end
• >>>b[5:5] = 16
• TypeError (only an iterable can be assigned to a slice)
• >>>b[5:5] = [16]
• >>>b
• [13, 14, 15, 6, 16]
• Deletion
• >>>b[3:4] = []
• >>>b
• [13, 14, 15, 16]
Copying
• >>>list2=['a','b','c','d']
• >>>list1=list2
• >>>list1
• ['a','b','c','d']
• >>>list2
• ['a','b','c','d']
• >>>list1[0]='e'
• >>>list1
• ['e','b','c','d']
• >>>list2
• ['e','b','c','d']
Cloning
• >>>list2=['a','b','c','d']
• >>>list1=list2[:]
• >>>list1
• ['a','b','c','d']
• >>>list2
• ['a','b','c','d']
• >>>list1[0]='e'
• >>>list1
• ['e','b','c','d']
• >>>list2
• ['a','b','c','d']
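The difference between copying a reference and cloning can be checked with the `is` operator. A minimal sketch; the names `alias` and `clone` are illustrative, not from the slides.

```python
list2 = ['a', 'b', 'c', 'd']

alias = list2        # a second name for the same list object
clone = list2[:]     # a new list holding the same elements

alias[0] = 'e'       # visible through list2, because it is the same object

shares_object = alias is list2    # True
is_separate = clone is not list2  # True

print(list2, clone, shares_object, is_separate)
```

`list(list2)` and `list2.copy()` produce the same kind of shallow copy as `list2[:]`.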
Lists as Function Arguments
• Passing a list as a parameter actually passes a reference, not a copy
• >>>def deleteFirst(list):
•     del(list[0])
• >>>lista=[12, 17, 22]
• >>>lista
• [12, 17, 22]
• >>>deleteFirst(lista)
• >>>lista
• [17, 22]
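Because the function receives a reference, it can change the caller's list. If that is not wanted, one option is to return a new list instead. The sketch below contrasts the two; `without_first` is a hypothetical alternative, not part of the slides.

```python
def delete_first(items):
    """Remove the first element in place; the caller's list changes."""
    del items[0]

def without_first(items):
    """Return a new list without the first element; the caller's list is untouched."""
    return items[1:]

lista = [12, 17, 22]

shorter = without_first(lista)
unchanged = (lista == [12, 17, 22])   # still the original three elements

delete_first(lista)                   # now lista itself is modified

print(shorter, unchanged, lista)
```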
Nested Lists
• >>>lista=['a','b','c',['d','e','f']]
• >>>lista[3]
• ['d','e','f']
• >>>lista[3][1]
• 'e'
• Square brackets evaluate left to right
Python Regex
Regular Expressions
http://en.wikipedia.org/wiki/Regular_expression
Python Regular Expressions
^         Matches the beginning of a line
$         Matches the end of the line
.         Matches any character
\s        Matches whitespace
\S        Matches any non-whitespace character
*         Repeats a character zero or more times
*?        Repeats a character zero or more times (non-greedy)
+         Repeats a character one or more times
+?        Repeats a character one or more times (non-greedy)
[aeiou]   Matches a single character in the listed set
[^XYZ]    Matches a single character not in the listed set
[a-z0-9]  The set of characters can include a range
(         Indicates where string extraction is to start
)         Indicates where string extraction is to end
Regular Expressions Module
Wild-Card Characters
Greedy Matching
Non-Greedy Matching
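A few of the patterns from the table, tried with the standard `re` module. The sample strings are made up for illustration, and this is a sketch of typical usage rather than the examples from the original slides.

```python
import re

text = "From: alice@example.com, bob@example.org"   # made-up sample line

# ^ anchors the match at the beginning of the line.
starts_with_from = re.search(r"^From", text) is not None

# A character set with ranges, repeated one or more times on each side of '@'.
addresses = re.findall(r"[a-z0-9.]+@[a-z0-9.]+", text)

# Parentheses mark the part of the match to extract.
domains = re.findall(r"@([a-z0-9.]+)", text)

# Greedy vs non-greedy: .+ grabs as much as it can, .+? as little as it can.
line = "From: using the : character"
greedy = re.findall(r"^F.+:", line)
non_greedy = re.findall(r"^F.+?:", line)

print(addresses, domains, greedy, non_greedy)
```

The greedy pattern runs to the last colon in the line, while the non-greedy one stops at the first.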
Python Slicing
String Slices
• >>>fruit = "apple"
• >>>fruit[1:3]
• 'pp'
• >>>fruit[1:]
• 'pple'
• >>>fruit[:4]
• 'appl'
• >>>fruit[:]
• 'apple'
List Slices
• >>>b
• [3, 4, 5, 6]
• >>>b[0:3]
• [3,4,5]
• b[0:j] with j > 3 and b[0:] are same
• >>>b[:2]
• [3,4]
List Slices
• >>>b[2:2]
• []
• b[i:j:k] is a subset of b[i:j] with elements picked in steps of k
• >>>b=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
• >>>b[0:10:3]
• [1, 4, 7, 10]
NumPy array slicing
• 1-d array slicing and indexing is similar to Python lists
• import numpy as np
• arr1=np.array([1,2,5,6,4,3])
• arr1[2:4]=99
• arr1
• Out[8]: array([ 1, 2, 99, 99, 4, 3])
NumPy array slicing
• A slice of a NumPy array is a view, not a copy, so changes made through it show up in the original
• arr2=arr1[2:4]
• arr2[0]=88
• arr1
• Out[13]: array([ 1, 2, 88, 99, 4, 3])
Sets
in and not in
• >>>setA= {1,3,5,7}
• >>>3 in setA
• True
• >>>3 not in setA
• False
• >>>4 not in setA
• True
Subset
• >>>setA= {1,3,5,7}
• >>>setB= {1, 3, 5, 7, 9}
• >>>setC = {1,3,5,9,10}
• >>>setA.issubset(setB)
• True
• >>>setA.issubset(setC)
• False
Superset
• >>>setA= {1,3,5,7}
• >>>setB= {1, 3, 5, 7, 9}
• >>>setC = {1,3,5,9,10}
• >>>setA.issuperset(setB)
• False
• >>>setB.issuperset(setA)
• True
• >>>setC.issuperset(setA)
• False
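Subset and superset tests are methods on the set, and Python also offers comparison operators for the same checks. A brief sketch using the sets from the slides; the result names are illustrative.

```python
setA = {1, 3, 5, 7}
setB = {1, 3, 5, 7, 9}
setC = {1, 3, 5, 9, 10}

a_sub_b = setA.issubset(setB)      # method form
a_sub_b_op = setA <= setB          # operator form of issubset
b_super_a = setB >= setA           # operator form of issuperset
a_sub_c = setA <= setC             # False: 7 is not in setC
a_proper_b = setA < setB           # proper subset: subset and not equal

print(a_sub_b, a_sub_b_op, b_super_a, a_sub_c, a_proper_b)
```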
Set Union
• >>>setA= {1,3,5,7}
• >>>setB= {7, 5, 9}
• >>>setA.union(setB)
• {1,3,5,7,9}
• >>>setA | setB
• {1, 3, 5, 7, 9}
Set Intersection
• >>>setA= {1,3,5,7}
• >>>setB= {7, 5, 9}
• >>>setA.intersection(setB)
• {5,7}
• >>>setA & setB
• {5, 7}
Dictionaries
Chapter 9: Python Dictionaries
What is a Collection?
Most of our variables have one value in them - when we put a new
value in the variable - the old value is overwritten
$ python
>>> x = 2
>>> x = 4
>>> print(x)
4
A Story of Two Collections ..
• List
• Dictionary
[Figure labels: calculator, tissue, perfume, money, candy]
http://en.wikipedia.org/wiki/Associative_array
Dictionaries
Dictionaries are like lists except that they use keys instead of
numbers to look up values
One common use of dictionaries is counting how often we “see” something
>>> ccc = dict()
>>> ccc['csev'] = 1
>>> ccc['cwen'] = 1
>>> print(ccc)
{'csev': 1, 'cwen': 1}
>>> ccc['cwen'] = ccc['cwen'] + 1
>>> print(ccc)
{'csev': 1, 'cwen': 2}
Dictionary Tracebacks
>>> ccc = dict()
>>> print(ccc['csev'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'csev'
>>> 'csev' in ccc
False
When We See a New Name
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    if name not in counts:
        counts[name] = 1
    else :
        counts[name] = counts[name] + 1
print(counts)
{'csev': 2, 'cwen': 2, 'zqian': 1}
The get Method for Dictionaries
We can use get() and provide a default value of zero when the key is not yet in the dictionary - and then just add one
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    counts[name] = counts.get(name, 0) + 1
print(counts)
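The get() pattern above can be checked against the standard library's `collections.Counter`, which performs the same tally. Using `Counter` here is an assumption added for comparison; the slides only use get().

```python
from collections import Counter

names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']

# The pattern from the slide: a missing key counts as zero, then add one.
counts = dict()
for name in names:
    counts[name] = counts.get(name, 0) + 1

# get() never raises KeyError; it returns the default for an unknown key.
missing = counts.get('nobody', 0)

# Counter builds the same mapping in one call.
tally = Counter(names)
same_result = (counts == dict(tally))

print(counts, missing, same_result)
```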
Counting Pattern
The general pattern to count the words in a line of text is to split the line into words, then loop through the words and use a dictionary to track the count of each word independently.
counts = dict()
print('Enter a line of text:')
line = input('')
words = line.split()
print('Words:', words)
print('Counting...')
for word in words:
    counts[word] = counts.get(word,0) + 1
print('Counts', counts)
python wordcount.py
Enter a line of text:
the clown ran after the car and the car ran into the tent
and the tent fell down on the clown and the car
http://www.flickr.com/photos/71502646@N00/2526007974/
counts = dict()
line = input('Enter a line of text:')
words = line.split()
print('Words:', words)
print('Counting...')
for word in words:
    counts[word] = counts.get(word,0) + 1
print('Counts', counts)
python wordcount.py
Enter a line of text:
the clown ran after the car and the car ran into the tent and the tent fell down on the clown and the car
Words: ['the', 'clown', 'ran', 'after', 'the', 'car', 'and', 'the', 'car', 'ran', 'into', 'the', 'tent', 'and', 'the', 'tent', 'fell', 'down', 'on', 'the', 'clown', 'and', 'the', 'car']
Counting...
bigcount = None
bigword = None
for word,count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count
print(bigword, bigcount)
python words.py
Enter file: clown.txt
the 7
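The search for the most common word can also be written with the built-in max() and a key function. This is a sketch of an alternative, not the slides' method; it builds the counts from the clown sentence directly instead of reading clown.txt, and the names `top_word` and `top_count` are illustrative.

```python
line = ("the clown ran after the car and the car ran into the tent "
        "and the tent fell down on the clown and the car")

counts = dict()
for word in line.split():
    counts[word] = counts.get(word, 0) + 1

# The explicit loop from the slide.
bigcount = None
bigword = None
for word, count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

# The same result with max(): compare the keys by their counts.
top_word = max(counts, key=counts.get)
top_count = counts[top_word]

print(bigword, bigcount, top_word, top_count)
```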
Using two nested loops
Dictionaries
• Lists index their entries based on the position in the list
• Dictionaries are like bags - entries have no position to index by (since Python 3.7 they do keep insertion order)
• So we index the things we put in the dictionary with a “lookup tag”
>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print(purse)
{'money': 12, 'candy': 3, 'tissues': 75}
>>> print(purse['candy'])
3
>>> purse['candy'] = purse['candy'] + 2
>>> print(purse)
{'money': 12, 'candy': 5, 'tissues': 75}
Comparing Lists and Dictionaries
Dictionaries are like lists except that they use keys instead of numbers to look up values
>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print(lst)
[21, 183]
>>> lst[0] = 23
>>> print(lst)
[23, 183]
>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print(ddd)
{'age': 21, 'course': 182}
>>> ddd['age'] = 23
>>> print(ddd)
{'age': 23, 'course': 182}