DA Python Env Intro
1
Agenda
● Introduction
● Learning algorithm types and ML lifecycle
● Choice of language and environment
● Anaconda Ecosystem vs Colab Ecosystem
● IDLE, PyCharm
● Installation details
● Primer on Python primitives or Base Python 3.x
2
Introduction
● Data analytics is a collection of techniques, tools, and frameworks used to add
value to raw data
● Common terms associated with analytics include statistical techniques,
machine learning (ML), deep learning, and artificial intelligence (AI)
● Put simply, AI is the most general term; it subsumes ML, neural networks,
and many other forms of learning
● Deep learning is a specialized form of artificial neural network (ANN)
● ML is a set of algorithms that identify patterns in data, which can then be
used for classification or, more generally, prediction
● ML shares many of its algorithms with data mining, which can be regarded as
the first class of business systems to apply algorithms to large-scale
business data to unearth interesting patterns
3
Learning Algorithms
● Supervised learning: Requires labels (the known answers) along with the data so
that the system can learn patterns. It usually involves defining a loss function and
minimizing it to identify the feature weights or the appropriate model function
● Unsupervised learning: Algorithms for which the actual outcome is not known
a priori. Examples include PCA, clustering, and Gibbs sampling
● Semi-supervised learning: The middle ground between supervised and
unsupervised learning. It uses a combination of labeled and unlabeled data
during training
● Evolutionary learning: Algorithms that incorporate ideas from biological evolution or
swarm optimization often fall into this category
● Reinforcement learning: Sequential algorithms that learn through experimentation
(trial and error) and incorporate uncertainty in some way
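To make the supervised-learning bullet concrete, here is a minimal sketch of minimizing a loss function to find feature weights. It assumes a one-feature linear model, a mean-squared-error loss, and plain gradient descent; the data, the names (xs, ys, mse_loss, lr), and the step count are made up for illustration and are not from the slides.

```python
# Toy supervised learning: fit y ≈ w*x + b by minimizing mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]   # feature values (made-up data)
ys = [3.0, 5.0, 7.0, 9.0]   # labels, generated here from y = 2x + 1

def mse_loss(w, b):
    """Mean squared error of the linear model on the labeled data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b = 0.0, 0.0   # initial weights
lr = 0.05         # learning rate (an assumed value)

for _ in range(5000):
    # Gradients of the loss with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Step against the gradient to reduce the loss
    w -= lr * grad_w
    b -= lr * grad_b

# w and b should end up close to 2 and 1, with a loss near zero
print(round(w, 3), round(b, 3), mse_loss(w, b))
```

In practice one would normally use a library rather than hand-written gradients; the loop is only meant to show what "minimization of the loss function" refers to.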
4
ML LifeCycle: Well-Architected ML lifecycle (AWS)
5
Design Principles for MLOps
Source:
https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/well-architected-machine-learning-design-principles.html
9
Comparison
Language  Open source  Ease of use/  Integrability with  Suitability for         Scalability
                       learning      other frameworks    production-ready code
R         yes          high          medium              low                     medium
MATLAB    no           high          low                 low                     medium
Octave    yes          high          medium              low                     low
Python    yes          medium        high                medium                  high
Java      yes          low           high                high                    v. high
* Based on the instructor's perspective with regard to usage
10
Environment
● "Environment" could mean a framework, an IDE, packages and libraries, or simply
a user interface or CLI, depending on the context
● It can be simple or complex
● R, for example, offers RGui as a simple interface, whereas Eclipse is a
professional IDE for Java and many other languages geared toward scalable
development
● The environment can have a major impact on learnability and acceptance; a
simpler environment is therefore usually preferred for non-production
applications
11
Two major environments
● For data science applications, two major environments have been used
extensively (AWS SageMaker and Azure ML Studio are other ecosystems)
● Anaconda ecosystem
○ Open-source collection of libraries and interfaces, mostly used on-premises,
though cloud-based options are now available
○ Compatible with most widely used libraries, with high flexibility for
customized environments
○ Privacy and security, since the data stays on the local system
● Colaboratory (Colab), offered by Google
○ Both free and pay-as-you-go options are available on the cloud, depending on
the resources required
○ Easy collaboration, automatic updates, and convenience; the same notebook can
work with other languages such as Go, JS, R, etc.
12
IDLE vs PyCharm vs Spyder
● IDLE is lightweight and simple, and provides basic functionality such as syntax
highlighting in the shell and in Python files. It is free and open source
● It is installed by default with Python 3.x on most Windows and macOS
systems. One can use an interactive session, where the Python interpreter executes
each command as it is entered, or create a new file with the ".py" suffix (which
marks it as a Python file) and execute it by running the code
● PyCharm is a professional IDE intended for larger projects with
multiple Python files (https://www.jetbrains.com/pycharm/). PyCharm comes in two
versions: a community version, which is open source and free to use, and a
professional version, which can be downloaded for a 30-day free trial
● Spyder IDE comes with the Anaconda environment once it is installed on the system
13
Installation on a standalone system
● For Python, download the installer from https://www.python.org/downloads/
and follow the installation instructions for your operating system
(on-premise installation)
● For Anaconda, use the following link and follow the instructions:
https://www.anaconda.com/download/
● Installing Anaconda also installs Jupyter Notebook by default. To launch it, run
jupyter notebook on the command line; for details, refer to
https://test-jupyter.readthedocs.io/en/latest/install.html#id3
● In case you are not able to install, please contact me by email
or get in touch with the Computer Center
14
Installation and overview of Colab
● Go to the Colab webpage (Welcome To Colaboratory), then refer to Overview of
Colaboratory Features
● Upload your .ipynb file or click + Code to insert a new code cell
● Connect to a hosted runtime. To run Python code, add code to a cell and press the
play button at the left of the cell. This runs the selected cell in IPython
● Jupyter notebooks allow you to surround your code with relevant documentation in a
digestible format using Markdown (Markdown Guide - Colaboratory)
● To open a new Markdown cell in Google Colab, press + Text at the top of the
notebook or below any cell you hover over with your mouse, or click Insert >
Text cell
● Colaboratory shares the notion of magics from Jupyter. These are shorthand
annotations that change how a cell's text is executed.
https://nbviewer.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb
15
More about Colab
● On the left panel there are quick links that show:
● Table of contents: shows the Markdown headings of the notebook
● Find and replace: finds and replaces any string or regex across the entire file
● Variable inspector: shows all variables that are currently stored
● File explorer: files and directories available from Colab. One can connect to
Google Drive from Colab to use files already stored there, or to store the
results of scripts. To use files from Google Drive in Colab:
○ from google.colab import drive
○ drive.mount('/content/drive')
● Code snippets: pre-built reusable code snippets
● Search commands: search box for the commands available from the menu
● Terminal: in the Pro version, one can get access to the runtime's terminal
16
Colab shortcuts and other discussion
17
A primer on Python 3.x
● Variable declaration: Python's basic built-in types include int, float, bool, and str
● var1 = 5.0
● var2 = 2
● var3 = True
● var4 = "hello everyone"
● print("greetings",var4)
● greetings hello everyone
● type(var3)
● <class 'bool'>
● For input we can use:
● str1 = input("Enter name")
● print(str1)
● m1 = int(input("input an integer"))
● print(m1)
18
If else in Python
● Python supports if-elif-else statements
● x=2
● y=4
● if(x>y):
●     print('x is greater',x)
● elif(x<y):
●     print('x is smaller',y)
● else:
●     print('x=y',x)
●
● x is smaller 4
● A simpler form (a conditional expression) is as follows
● isGreater = True if x>y else False
● print(isGreater)
● False
19
Sequence or range in Python
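As a minimal sketch of this topic: range(start, stop, step) produces a lazy sequence of integers that excludes stop. The variable names below are illustrative only and are not taken from the slides.

```python
# range(stop): integers from 0 up to, but not including, stop
first_five = list(range(5))

# range(start, stop): integers from start up to, but not including, stop
two_to_nine = list(range(2, 10))

# range(start, stop, step): step may be negative to count down
evens = list(range(0, 10, 2))
countdown = list(range(5, 0, -1))

print(first_five)    # [0, 1, 2, 3, 4]
print(two_to_nine)   # [2, 3, 4, 5, 6, 7, 8, 9]
print(evens)         # [0, 2, 4, 6, 8]
print(countdown)     # [5, 4, 3, 2, 1]
```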
20
Examples of for loop with break and continue
● for n in range(2,10):
●     p1 =0
●     for x in range(2,n):
●         if n % x==0 :
●             print(n,'equals',x,'*',n//x)
●             p1 =1
●             break
●     if(p1==0):
●         print(n,'is prime number')
●
● for num in range(2, 10):
●     if num == 5:
●         continue
●     print("Not Found ", num)
21
Basic functions in Python
● def addElements(a,b):
●     return(a+b)
●
● addElements(3,4)
● 7
● addElements(3.5,4.6)
● 8.1
● addElements('machine','learning')
● 'machinelearning'
●
● def noUse():
●     print('Hello world')
●
● noUse()
● Hello world
●
● def square(x):
●     return(x**2)
●
● square(-2)
● 4
● def fib(n):
●     a, b = 0, 1
●     while a < n:
●         print(a,end=' ') # note: end=' ' keeps the output on one line
●         a, b = b, a+b
● Exercise: can you rewrite fib using a "for" loop?
22
Match (switch like) function
● The match statement requires Python 3.10 or later
● def http_error(status):
●     match status:
●         case 400:
●             return "Bad request"
●         case 404:
●             return "Not found"
●         case 418:
●             return "I'm a teapot"
●         case _:
●             return "Something's wrong with the internet"
● # run
● http_error(200)
● "Something's wrong with the internet"
● Recursion example
● def fac1(n):
●     p1=1
●     if n<2 :
●         p1 =p1*1
●     else:
●         p1=p1*n*fac1(n-1)
●     return(p1)
● Explore more examples of recursion; also write this as an iterative loop
23
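One possible way to approach the "write as iterative loop" exercise for the factorial is sketched below. The function names fac_recursive and fac_iterative are hypothetical and chosen only for this illustration; other solutions are equally valid.

```python
def fac_recursive(n):
    """Factorial by recursion: n! = n * (n-1)!, with 0! = 1! = 1."""
    if n < 2:
        return 1
    return n * fac_recursive(n - 1)

def fac_iterative(n):
    """The same computation as a loop that accumulates the product."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(fac_recursive(5), fac_iterative(5))   # 120 120
```

The iterative form avoids Python's recursion depth limit for large n, which is one reason the comparison is worth making.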
Arrays and associated issues
● import array as arr
● marks =arr.array('i',[1,2,3,4,5])
● marks
● array('i', [1, 2, 3, 4, 5])
● type(marks)
● <class 'array.array'>
● len(marks)
● 5
● marks
● array('i', [1, 2, 3, 4, 5])
● marks.reverse()
● marks
● array('i', [5, 4, 3, 2, 1])
● Other methods include append, extend, pop, remove, index, and insert;
elements can also be deleted with the del statement
● The array module is rarely used in everyday Python; list-based structures
are generally used instead
● In some cases NumPy array structures are used if required
● https://docs.python.org/3/library/array.html
24
Arrays with numpy library
● import numpy as np
● m1 = np.array(range(10))
● m1
● array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
● type(m1)
● <class 'numpy.ndarray'>
● m1[2]
● 2
● m1.reshape((2,5))
● array([[0, 1, 2, 3, 4],
●        [5, 6, 7, 8, 9]])
●
● m2 = np.array([[1,2,3],[4,5,6],[7,8,9]])
● m2
● array([[1, 2, 3],
●        [4, 5, 6],
●        [7, 8, 9]])
● m2[2,2]
● 9
● m2[0,1]
● 2
● subset1=m2[0:2,0:2]
● subset1
● array([[1, 2],
●        [4, 5]])
25
Data Structures: List
● empList =[]
● empList.append(1)
● empList
● [1]
● empList.append(4)
● empList
● [1, 4]
● emptyList = []
● for i in range(1,10):
●     emptyList.append(i)
●
● emptyList
● [1, 2, 3, 4, 5, 6, 7, 8, 9]
● Example from TB
● batsmen=['Dhoni','Rohit','Virat','Rishabh']
● bowlers = ['Bumrah','Shami','Shardul','Siraj']
● batsmen
● ['Dhoni', 'Rohit', 'Virat', 'Rishabh']
● batsmen[2]
● 'Virat'
● bowlers[0]
● 'Bumrah'
● allPlayers=batsmen + bowlers
● allPlayers
● ['Dhoni', 'Rohit', 'Virat', 'Rishabh', 'Bumrah', 'Shami', 'Shardul', 'Siraj']
26
More on List
● len(allPlayers)
● 8
● allPlayers[0]
● 'Dhoni'
● allPlayers[-1]
● 'Siraj'
● 'Shardul' in bowlers
● True
● 'Virat' in allPlayers
● True
● allPlayers.index('Virat')
● 2
● allPlayers.reverse()
● allPlayers
● ['Siraj', 'Shardul', 'Shami', 'Bumrah', 'Rishabh', 'Virat', 'Rohit', 'Dhoni']
● allPlayers.sort()
● allPlayers
● ['Bumrah', 'Dhoni', 'Rishabh', 'Rohit', 'Shami', 'Shardul', 'Siraj', 'Virat']
27
More on List
● list.append(elem) -- adds a single element to the end of the list.
● list.insert(index, elem) -- inserts the element at the given index, shifting elements
to the right.
● list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a
list is similar to using extend().
● list.index(elem) -- searches for the given element from the start of the list and
returns its index. Raises a ValueError if the element does not appear
● list.remove(elem) -- searches for the first instance of the given element and
removes it (raises a ValueError if not present)
● list.sort() -- sorts the list in place (does not return it).
● list.reverse() -- reverses the list in place (does not return it)
● list.pop(index) -- removes and returns the element at the given index. Removes and
returns the rightmost element if index is omitted.
28
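The list methods summarized above can be tried out in sequence as follows. This is only an illustrative sketch; the list nums and the other variable names are made up for the example.

```python
nums = [3, 1, 2]

nums.append(5)             # nums is now [3, 1, 2, 5]
nums.insert(1, 9)          # nums is now [3, 9, 1, 2, 5]
nums.extend([7, 8])        # nums is now [3, 9, 1, 2, 5, 7, 8]
position = nums.index(2)   # 3, the index of the first occurrence of 2
nums.remove(9)             # nums is now [3, 1, 2, 5, 7, 8]
last = nums.pop()          # returns 8; nums is now [3, 1, 2, 5, 7]
first = nums.pop(0)        # returns 3; nums is now [1, 2, 5, 7]

# sort() and reverse() work in place and return None
returned = nums.sort()

# index() and remove() raise ValueError for a missing element
try:
    nums.remove(100)
except ValueError:
    missing = True

print(nums, position, last, first, returned, missing)
```

A common slip is writing nums = nums.sort(), which replaces the list with None because sort() does not return the sorted list.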
Working with “list” like array
● h1 = [[1,2,3],[4,5,6],[7,8,9]]
● h1[1]
● [4, 5, 6]
● h1[1][2]
● 6
● for i in range(3):
●     for j in range(3):
●         print(h1[i][j])
●     print(end='\n')
●
● T = h1
● for r in T:
●     for c in r:
●         print(c,end=' ')
●     print()
● h2=[[0]*3]*3  # beware: all three rows refer to the same inner list
● h2
● [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
● h2[0][1]=1
● h2
● [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
● h2[0]==h2[1]
● True
● h1[0]==h1[1]
● False
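The surprising h2 result above comes from [[0]*3]*3 repeating a reference to one inner list three times. A minimal sketch of the pitfall and one common way around it (a list comprehension that builds a fresh row each time) is shown below; the names shared and independent are illustrative.

```python
# Three references to ONE inner list
shared = [[0] * 3] * 3

# Three separate inner lists, one built per iteration
independent = [[0] * 3 for _ in range(3)]

shared[0][1] = 1
independent[0][1] = 1

print(shared)        # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(independent)   # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]

# "is" checks object identity, which exposes the aliasing directly
print(shared[0] is shared[1])             # True
print(independent[0] is independent[1])   # False
```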
29
List Comprehensions
● squares =[]
● for x in range(10):
●     squares.append(x*x)
●
● squares
● [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
●
● squares = [x**2 for x in range(10)]
● squares
● [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
●
● squares = list(map(lambda x: x*x ,range(10)))
● squares
● [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
30
The lambda operator
● A lambda function is an anonymous function which can take any number of
arguments, but can only have one expression
● x = lambda a : a + 10
● print(x)
● <function <lambda> at 0x7f7f560aeef0>
● print(x(5))
● 15
● x = lambda a, b : a * b
● print(x(5,6))
● 30
● Explore filter() with lambda operator?
○ t1 = filter(lambda x: x%2 == 0,range(1,10))
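As a starting point for the filter() exploration, here is a small sketch. Like map(), filter() returns an iterator, so it is converted to a list to see the values; the names evens, again, and evens_lc are illustrative.

```python
# Keep only the even numbers from 1..9
t1 = filter(lambda x: x % 2 == 0, range(1, 10))

evens = list(t1)   # consumes the iterator
again = list(t1)   # the iterator is now exhausted

# An equivalent list comprehension, often considered more readable
evens_lc = [x for x in range(1, 10) if x % 2 == 0]

print(evens)      # [2, 4, 6, 8]
print(again)      # []
print(evens_lc)   # [2, 4, 6, 8]
```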
31
Using map in Python
● map() returns a map object (an iterator) holding the results of applying the
given function to each item of the given iterable(s) (list, tuple, etc.)
● Syntax: map(fun, iter)
● def addAny(a,b):
●     return(a+b)
●
● n1 = range(6)
● n2 = range(2,7)
● b1 = list(map(addAny,n1,n2))
● b1
● [2, 4, 6, 8, 10]
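Note that n1 has six elements but the result has only five. When map() is given several iterables, it stops as soon as the shortest one is exhausted. The sketch below makes the pairing visible with zip(), which follows the same rule; add_any and pairs are illustrative names.

```python
def add_any(a, b):
    return a + b

n1 = range(6)      # 0, 1, 2, 3, 4, 5  (six values)
n2 = range(2, 7)   # 2, 3, 4, 5, 6     (five values)

# zip() shows which arguments are paired up; the last value of n1 is unused
pairs = list(zip(n1, n2))
b1 = list(map(add_any, n1, n2))

print(pairs)   # [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
print(b1)      # [2, 4, 6, 8, 10]
```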
32
DS: Tuples
● A tuple is an immutable sequence (like a list that cannot be changed), often
used in transactional processing, especially in retail settings
● tran1 =('cid','item','qty')
● type(tran1)
● <class 'tuple'>
● tran1[0]
● 'cid'
● tran1
● ('cid', 'item', 'qty')
● One cannot modify any element of the tuple
● tran1[2]='price'
● Traceback (most recent call last):
●   File "<pyshell#35>", line 1, in <module>
●     tran1[2]='price'
● TypeError: 'tuple' object does not support item assignment
● An existing list, say X1, may be converted into a tuple by typecasting
● X2 = tuple(X1)
● It is better to be careful before any such conversion
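The list-to-tuple conversion and the reason for caution can be sketched as follows. The names x1, x2, and record are made up for illustration; the point is that the tuple is a separate, fixed snapshot, yet a mutable object stored inside a tuple can still change.

```python
x1 = ['cid', 'item', 'qty']
x2 = tuple(x1)      # a new tuple with the same elements

x1[2] = 'price'     # the list is still mutable; x2 is unaffected

try:
    x2[2] = 'price'             # tuples do not allow item assignment
except TypeError as err:
    message = str(err)

# Caveat: immutability is shallow. A list inside a tuple can still be modified.
record = ('c001', ['pen', 'ink'])
record[1].append('paper')

print(x1)        # ['cid', 'item', 'price']
print(x2)        # ('cid', 'item', 'qty')
print(message)
print(record)    # ('c001', ['pen', 'ink', 'paper'])
```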
33
DS: Set
● A set is a mathematical construct: an unordered collection of unique elements
● setofPlayers = {'Tendulkar','Dhoni','Sangkara','Ponting','Lara'}
● setofPlayers
● {'Sangkara', 'Lara', 'Ponting', 'Dhoni', 'Tendulkar'}
● setofPlayers[2]
● Traceback (most recent call last):
●   File "<pyshell#39>", line 1, in <module>
●     setofPlayers[2]
● TypeError: 'set' object is not subscriptable
● Sets support union, intersection, and difference operations
● wc2011 = {'Tendulkar','Dhoni','Sehwag','Gambhir','Yuvraj','Kohli'}
● wc2015 = {'Dhoni','Kohli','Dhawan','Rohit','Raina','Jadeja'}
● t1 = wc2011.union(wc2015)
● t1
● {'Gambhir', 'Dhawan', 'Yuvraj', 'Tendulkar', 'Jadeja', 'Raina', 'Sehwag', 'Kohli', 'Rohit', 'Dhoni'}
● t2 = wc2011.intersection(wc2015)
● t2
● {'Kohli', 'Dhoni'}
34
Methods for Set
● Method -- Description
● add() -- adds an element to the set
● clear() -- removes all the elements from the set
● copy() -- returns a copy of the set
● difference() -- returns a set containing the difference between two or more sets
● difference_update() -- removes the items in this set that are also included in another, specified set
● discard() -- removes the specified item (no error if it is absent)
● intersection() -- returns a set that is the intersection of two or more sets
● intersection_update() -- removes the items in this set that are not present in the other, specified set(s)
● isdisjoint() -- returns whether two sets have no elements in common
● issubset() -- returns whether another set contains this set
● issuperset() -- returns whether this set contains another set
● pop() -- removes and returns an arbitrary element from the set
● remove() -- removes the specified element (raises KeyError if it is absent)
● symmetric_difference() -- returns a set with the symmetric difference of two sets
● symmetric_difference_update() -- updates this set with the symmetric difference of itself and another
● union() -- returns a set containing the union of sets
● update() -- updates the set with another set, or any other iterable
35
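A few of the set methods listed above, sketched on the same World Cup squads used earlier. The names only_2011, only_one, and squad are illustrative; sorted() is used for printing because sets have no fixed order.

```python
wc2011 = {'Tendulkar', 'Dhoni', 'Sehwag', 'Gambhir', 'Yuvraj', 'Kohli'}
wc2015 = {'Dhoni', 'Kohli', 'Dhawan', 'Rohit', 'Raina', 'Jadeja'}

# Players in the 2011 set but not in the 2015 set
only_2011 = wc2011.difference(wc2015)

# Players in exactly one of the two sets
only_one = wc2011.symmetric_difference(wc2015)

# discard() ignores a missing element, remove() raises KeyError
squad = set(wc2011)
squad.discard('Ponting')        # no error, nothing changes
try:
    squad.remove('Ponting')
except KeyError:
    remove_failed = True

print(sorted(only_2011))   # ['Gambhir', 'Sehwag', 'Tendulkar', 'Yuvraj']
print(len(only_one))       # 8
print(wc2011.isdisjoint(wc2015))   # False, they share Dhoni and Kohli
```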
DS: Dictionary
● Python's efficient key/value hash table structure is called a "dict". In
other words, it is a collection of key-value pairs
● dict1 = {}
● type(dict1)
● <class 'dict'>
● dict1['a']='alpha'
● dict1['b']='beta'
● dict1['c']='gamma'
● dict1
● {'a': 'alpha', 'b': 'beta', 'c': 'gamma'}
● dict1['c']=8
● dict1
● {'a': 'alpha', 'b': 'beta', 'c': 8}
● 'a' in dict1
● True
● if 'a' in dict1:
●     print(dict1['a'])
● alpha
●
● if 'z' in dict1:
●     print(dict1['z'])
● # prints nothing, since 'z' is not a key
36
DS: Dictionary
● dict1.keys()
● dict_keys(['a', 'b', 'c'])
● dict1.values()
● dict_values(['alpha', 'beta', 8])
●
● for key in dict1:
●     print(key)
●
● for key in sorted(dict1.keys()):
●     print(key, dict1[key])
●
● a alpha
● b beta
● c 8
● del dict1['b']
● dict1
● {'a': 'alpha', 'c': 8}
●
● users = {'Hans': 'active', 'Dorian': 'inactive', 'Evan': 'active'}
● for user, status in users.copy().items():
●     if status == 'inactive':
●         del users[user]
●
● users
● {'Hans': 'active', 'Evan': 'active'}
37
Dictionary Methods
● Method -- Description
● clear() -- removes all the elements from the dictionary
● copy() -- returns a copy of the dictionary
● fromkeys() -- returns a dictionary with the specified keys and value
● get() -- returns the value of the specified key (None, or a given default, if the key is absent)
● items() -- returns a view containing a (key, value) tuple for each pair
● keys() -- returns a view of the dictionary's keys
● pop() -- removes the element with the specified key and returns its value
● popitem() -- removes and returns the last inserted key-value pair
● setdefault() -- returns the value of the specified key; if the key does not exist,
inserts the key with the specified value
● update() -- updates the dictionary with the specified key-value pairs
● values() -- returns a view of all the values in the dictionary
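Some of the dictionary methods above in action, as an illustrative sketch. The variable names and the letter-counting example are made up for this illustration.

```python
dict1 = {'a': 'alpha', 'b': 'beta'}

missing = dict1.get('z')                 # None, no KeyError is raised
fallback = dict1.get('z', 'n/a')         # 'n/a', the supplied default
value = dict1.setdefault('c', 'gamma')   # inserts 'c' and returns 'gamma'
removed = dict1.pop('b')                 # returns 'beta' and deletes the key
dict1.update({'d': 'delta'})

pairs = list(dict1.items())   # items() is a view; list() materializes it

# get() with a default is a common way to count occurrences
counts = {}
for ch in 'banana':
    counts[ch] = counts.get(ch, 0) + 1

print(pairs)    # [('a', 'alpha'), ('c', 'gamma'), ('d', 'delta')]
print(counts)   # {'b': 1, 'a': 3, 'n': 2}
```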
38
Strings
● string0 = 'python'
● var1 = 'machine learning'
● string0.upper()
● 'PYTHON'
● len(var1)
● 16
● tokens = var1.split(' ')
● tokens
● ['machine', 'learning']
● string0[1:4]
● 'yth'
● pi = 3.14
● text = 'The value of pi is ' + str(pi)
● text
● 'The value of pi is 3.14'
39
String methods (may not work across the platforms)
● s.lower(), s.upper() -- returns the lowercase or uppercase version of the string
● s.strip() -- returns a string with whitespace removed from the start and end
● s.isalpha()/s.isdigit()/s.isspace()... -- tests if all the string chars are in the various
character classes
● s.startswith('other'), s.endswith('other') -- tests if the string starts or ends with the
given other string
● s.find('other') -- searches for the given other string (not a regular expression) within
s, and returns the first index where it begins or -1 if not found
● s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been
replaced by 'new'
● s.split('delim') -- returns a list of substrings separated by the given delimiter
● s.join(list) -- opposite of split(), joins the elements in the given list together using
the string as the delimiter
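The string methods listed above can be sketched on a small example. The sample string and the variable names are illustrative; note that every method returns a new string, since Python strings are immutable.

```python
s = '  Machine Learning  '

clean = s.strip()                        # 'Machine Learning'
lower = clean.lower()                    # 'machine learning'
starts = clean.startswith('Mach')        # True
found = clean.find('Learn')              # 8, index where 'Learn' begins
not_found = clean.find('xyz')            # -1
swapped = clean.replace('Machine', 'Deep')   # 'Deep Learning'
words = clean.split(' ')                 # ['Machine', 'Learning']
joined = '-'.join(words)                 # 'Machine-Learning'
digits = '2024'.isdigit()                # True

print(clean, lower, starts, found, not_found)
print(swapped, words, joined, digits)
print(repr(s))   # the original string is unchanged
```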
40
Module random
Method Description
getstate() Returns the current internal state of the random number generator
choices() Returns a list with a random selection from the given sequence
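A short sketch of the two functions listed above. It assumes a fixed seed purely so that the example is repeatable; the colour and coin lists are made up. getstate() captures the generator's state, and restoring it with setstate() replays the same random choices.

```python
import random

random.seed(42)              # fixed seed, only to make the run repeatable
state = random.getstate()    # snapshot of the generator's internal state

colours = ['red', 'green', 'blue']
first = random.choices(colours, k=3)    # 3 picks, with replacement

random.setstate(state)                  # rewind the generator
second = random.choices(colours, k=3)   # same picks as before

# weights make some outcomes more likely than others
tosses = random.choices(['heads', 'tails'], weights=[3, 1], k=5)

print(first, second, first == second)
print(tosses)
```

Because choices() samples with replacement, the same element can appear more than once in the result.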
46
Thanks !!!
47