Introduction to Python
Arun Kumar
IIT Ropar
November 18, 2017
Outline of the Talk
1 Introduction to Python
Scientific Stack
Quotes about Python
2 Working With Python
Python Containers
Conditionals, Iteration/Looping, Functions and Modules
3 Introduction to NumPy
Solving a system of linear equations
4 Introduction to Matplotlib
Scatter Plot
Plotting a histogram
5 Introduction to Pandas
Pandas Data Structures
6 SciPy
7 More Functions...
What is Python?
1 A flexible, powerful language with a FOSS license
2 Easy and compact syntax
3 Batteries included, i.e. it comes with a large library of useful modules.
4 Free as in "free beer": it costs nothing to download and use.
5 Its designer, Guido van Rossum, took the name from the BBC comedy
series "Monty Python's Flying Circus".
6 Website: http://www.python.org
Scientific Stack
NumPy
provides support for large, multi-dimensional arrays and matrices.
Pandas
pandas builds on NumPy and provides richer classes for the management and analysis of time
series and tabular data.
SciPy
contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT,
signal and image processing, ODE solvers etc.
matplotlib
This is the most popular plotting and visualization library for Python, providing both 2D and 3D
visualization capabilities.
rpy2
The high-level interface in rpy2 is designed to facilitate the use of R by Python programmers.
Quotes about Python
"Python is fast enough for our site and allows us to produce
maintainable features in record times, with a minimum of
developers," said Cuong Do, Software Architect,
YouTube.com.
"Python has been an important part of Google since the
beginning, and remains so as the system grows and
evolves. Today dozens of Google engineers use Python,
and we're looking for more people with skills in this
language," said Peter Norvig, director of search quality at
Google, Inc.
"... Python also shines when it comes to code maintenance.
Without a lot of documentation, it is hard to grasp what is
going on in Java and C++ programs and even with a lot of
documentation, Perl is just hard to read and maintain," says
Friedrich (Senior Project Engineer), United Space Alliance,
NASA's main shuttle support contractor.
Popularity
Table: TIOBE Index for October 2017
Rank (Oct 2016)  Rank (Oct 2017)  Language  Rating   Change
1                1                Java      12.43%   -6.37%
2                2                C          8.37%   -1.46%
3                3                C++        5.01%   -0.79%
4                4                C#         3.86%   -0.51%
5                5                Python     3.80%   +0.03%
19               13               Matlab     1.159%  +0.26%
Table: PYPL Popularity (Worldwide), Oct 2017 compared to a year ago
Rank  Language    Share   Trend
1     Java        22.2%   -0.9%
2     Python      17.6%   +4.3%
3     PHP          8.8%   -1.0%
4     JavaScript   8.0%   +0.5%
5     C#           7.7%   -0.9%
Running Python
Python programs are executed by an interpreter.
The interpreter can be started by simply typing python in a command
shell.
We can also use the IDLE environment to run Python.
When the interpreter starts, it enters an interactive read-evaluate-print loop
(REPL) and shows the >>> prompt.
Variables and Arithmetic Expression
Example (Python as a Calculator)
>>> 2+2
4
Example (Dynamically Typed)
>>> a = 1000
>>> a = 0.05
Remark
A variable is a way of referring to a memory location used by a computer
program: it is a symbolic name for this physical location.
Python is a dynamically typed language: variable names are
bound to different values, possibly of varying types, during program
execution.
The equality sign "=" should be read as "is set to", not as
"is equal to".
For x = 2, y = 2, z = 2, id(x), id(y) and id(z) are the same in CPython,
because small integers are cached and all three names refer to one object.
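A minimal sketch of dynamic typing and object identity: one name is rebound to objects of different types, and == (equal values) is contrasted with id/is (same object). The identity results depend on CPython's small-integer cache, which is an implementation detail rather than a language guarantee; big1 and big2 are illustrative names.

```python
# Dynamic typing: the same name is rebound to objects of different types.
a = 1000
print(type(a).__name__)          # int
a = 0.05
print(type(a).__name__)          # float

# Equality (==) compares values; identity (is / id) compares objects.
x = 2
y = 2
z = 2
print(x == y == z)               # True
print(id(x) == id(y) == id(z))   # True in CPython, which caches small ints

big1 = int("1000")               # created at run time: a new object each call
big2 = int("1000")
print(big1 == big2)              # True: same value
print(big1 is big2)              # typically False in CPython: two objects
```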
Python Containers
Strings
To create string literals, enclose them in single, double, or triple quotes as
follows:
Example
>>> a = "Hello World"; b = 'Python is good'; c = """computer says no"""
Lists
Lists are sequences of arbitrary objects. You create a list by enclosing values
in square brackets, as follows:
Example
>>> names = ['a', 'b', 'c', 'd']
>>> weights = [45, 50, 70, 55]
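A short sketch of how the string and list literals above behave: all three quoting styles give ordinary str objects that can be indexed, while lists are mutable and can be changed in place. The particular operations shown are illustrative choices.

```python
# Three ways to write string literals; all produce str objects.
a = "Hello World"
b = 'Python is good'
c = """computer says no"""
print(a[0], a[-1])        # H d  (index from the front and from the back)
print(b.upper())          # PYTHON IS GOOD

# Lists are mutable sequences: items can be replaced, added, removed.
names = ['a', 'b', 'c', 'd']
weights = [45, 50, 70, 55]
weights[0] = 47           # replace an item in place
weights.append(60)        # grow the list
print(weights)            # [47, 50, 70, 55, 60]
print(len(names), sum(weights) / len(weights))  # 4 56.4
```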
Python Containers Contd...
Tuples
You create a tuple by enclosing a group of values in parentheses. Unlike lists,
the contents of a tuple cannot be modified after creation.
Example
>>> stock = 'GOOG', 100, 490.10
or by using
>>> stock = ('GOOG', 100, 490.1)
Sets
A set is an unordered collection of unique objects. Unlike lists and
tuples, sets cannot be indexed by position, and duplicate elements are
stored only once.
Example
>>> s = set([1, 1, 2, 3, 4, 5, 3])
>>> s
{1, 2, 3, 4, 5}
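A sketch of the tuple and set properties described above: assigning to a tuple item raises TypeError, and sets drop duplicates, support fast membership tests and set algebra, but cannot be indexed. The flag names mutated and indexable are illustrative.

```python
stock = ('GOOG', 100, 490.10)
name, shares, price = stock         # tuple unpacking
try:
    stock[1] = 200                  # attempt to modify a tuple
    mutated = True
except TypeError:
    mutated = False                 # tuples reject item assignment
print(mutated)                      # False

s = set([1, 1, 2, 3, 4, 5, 3])
print(len(s))                       # 5: duplicates are stored once
print(3 in s, 9 in s)               # True False: membership test
print(s & {4, 5, 6})                # {4, 5}: intersection
print(s | {6})                      # {1, 2, 3, 4, 5, 6}: union
try:
    s[0]                            # sets have no positional index
    indexable = True
except TypeError:
    indexable = False
print(indexable)                    # False
```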
Python Containers Contd...
Dictionaries
A dictionary is an associative array that contains objects indexed by keys. A
dictionary can be created as follows:
Example
>>> stock = {'name': 'GOOG', 'shares': 100, 'price': 200.10}
>>> stock['date'] = '18 Nov 2017'
Remark
Formally, a container is any Python object that defines a __contains__
method, which the in operator calls to test membership.
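A sketch of dictionary lookups and of the __contains__ remark: for a dict, in tests the keys, and any class that defines __contains__ works with in. The Evens class is a hypothetical toy container written only for illustration.

```python
stock = {'name': 'GOOG', 'shares': 100, 'price': 200.10}
stock['date'] = '18 Nov 2017'            # add a new key
print(stock['shares'] * stock['price'])  # look up values by key
print('date' in stock)                   # True: `in` tests the keys
print(stock.get('volume', 0))            # 0: default for a missing key


class Evens:
    """Hypothetical toy container that 'contains' every even integer."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


evens = Evens()
print(4 in evens, 7 in evens)            # True False
```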
Conditionals
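>>> temp = 25
>>> if temp > 20 and temp < 28:
...     print("pleasant")
... else:
...     print("extreme")
...
pleasant
>>> names = ["Amitabh", "Aishwarya", "Salman", "Abhishek"]
>>> for name in names:
...     if name[0] in "AEIOU":
...         print(name + " starts with a vowel")
...     else:
...         print(name + " starts with a consonant")
...
Amitabh starts with a vowel
Aishwarya starts with a vowel
Salman starts with a consonant
Abhishek starts with a vowel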
Iteration and Looping
The most widely used looping construct is the for statement, which is used to
iterate over a collection of items.
Example
>>> for n in [1, 2, 3]:
...     print("2 to the %d power is %d" % (n, 2**n))
...
2 to the 1 power is 2
2 to the 2 power is 4
2 to the 3 power is 8
A similar loop can be written with the range function; range(1, 6) yields
1, 2, 3, 4, 5 (the stop value is excluded):
Example
>>> for n in range(1, 6):
...     print("2 to the %d power is %d" % (n, 2**n))
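A small sketch of the range semantics the loop relies on: the stop value is excluded and an optional third argument sets the step. Collecting the powers into a list (the name powers is illustrative) makes the result easy to inspect.

```python
print(list(range(1, 6)))        # [1, 2, 3, 4, 5]: stop value 6 is excluded
print(list(range(1, 6, 2)))     # [1, 3, 5]: optional third argument is the step

powers = []
for n in range(1, 6):
    powers.append(2**n)         # same computation as the loop above
print(powers)                   # [2, 4, 8, 16, 32]
```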
Functions and Modules
Functions
The def statement is used to create a function.
Example
>>> def remainder(a, b):
...     q = a // b       # floor (integer) quotient
...     r = a - q * b    # what is left over
...     return r
...
Modules
A module is a file containing a collection of classes and functions for reuse.
1 Save the function in a file rem.py in a folder, say C:\Users\Admin\Desktop\myModule
2 Append the module's folder to the sys.path list as follows:
>>> import sys
>>> sys.path.append(r'C:\Users\Admin\Desktop\myModule')
3 Import the module and call its function:
>>> import rem
>>> rem.remainder(10, 20)
10
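A sketch checking the remainder function against Python's built-in % operator and divmod. Because // rounds toward negative infinity, a - (a // b) * b should agree with a % b for any nonzero integer b, including negative operands; the sample pairs are arbitrary.

```python
def remainder(a, b):
    q = a // b          # floor division (rounds toward negative infinity)
    r = a - q * b
    return r


for a, b in [(10, 20), (17, 5), (-7, 3)]:
    # remainder(a, b), a % b and the second item of divmod(a, b) all agree
    print(a, b, remainder(a, b), a % b, divmod(a, b))
# 10 20 10 10 (0, 10)
# 17 5 2 2 (3, 2)
# -7 3 2 2 (-3, 2)
```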
Python Objects and Classes
Class
A class is a group or category of things that have some properties or attributes in
common and differ from others by kind, type, or quality. In Python, a class defines
the attributes and methods shared by its instances.
Object
An object is an instance of a class. An object can call the methods
and access the attributes defined in its class.
Classes
Python Class
class Stack(object):
    def __init__(self):
        self.stack = []
    def push(self, item):
        self.stack.append(item)
    def pop(self):
        return self.stack.pop()
    def length(self):
        return len(self.stack)
Remark
self refers to the instance of the class. Through self (a naming convention,
not a keyword) we can access the attributes and methods of the instance.
__init__ is a special method in Python classes. It plays the role of the
constructor in object-oriented terms: it is called when an object is
created from the class, and it lets the class initialize the object's
attributes.
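A usage sketch for the Stack class: creating an instance runs __init__, and push/pop behave last in, first out. The class is repeated so the snippet runs on its own.

```python
class Stack(object):
    def __init__(self):
        self.stack = []              # runs when Stack() is called
    def push(self, item):
        self.stack.append(item)      # add to the top
    def pop(self):
        return self.stack.pop()      # remove and return the top item
    def length(self):
        return len(self.stack)


s = Stack()
s.push(1)
s.push(2)
s.push(3)
print(s.length())   # 3
print(s.pop())      # 3: last in, first out
print(s.length())   # 2
```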
Solving a system of linear equations
Suppose you want to solve the system of linear equations
x + 2y = 5
3x + 4y = 6
We can solve it with the NumPy package as follows:
Example
>>> import numpy as np
>>> A = np.array([[1,2],[3,4]])
>>> b = np.array([[5],[6]])
>>> np.linalg.solve(A,b)
array([[-4. ], [ 4.5]])
Example (Finding determinant and inverse)
>>> np.linalg.det(A)
>>> np.linalg.inv(A)
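A sketch that checks the NumPy solution: substituting x back into A x = b confirms it, det(A) = 1*4 - 2*3 = -2, and multiplying by the inverse gives the same vector. NumPy's documentation generally recommends np.linalg.solve over forming the inverse, for accuracy and speed.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([[5], [6]])
x = np.linalg.solve(A, b)          # column vector [[x], [y]], about [[-4], [4.5]]
print(np.allclose(A @ x, b))       # True: the solution satisfies A x = b
print(np.linalg.det(A))            # about -2.0 (up to floating-point rounding)
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ b, x))   # True: x = A^{-1} b gives the same answer
```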
Matplotlib
This is an object-oriented plotting library. A procedural interface is provided
by the companion pyplot module, which may be imported directly, e.g.:
import matplotlib.pyplot as plt
Figure: 3D plots using matplotlib (four panels, (a) to (d))
Plotting a Function
Example (Plotting a function)
Suppose we want to plot $f(t) = e^{-t}\cos(2\pi t)$.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def f(t):
...     return np.exp(-t) * np.cos(2*np.pi*t)
...
>>> t1 = np.arange(0.0, 5.0, 0.1)
>>> plt.plot(t1, f(t1))
>>> plt.show()
Scatter Plot
A scatter plot helps visualize the association between two random
variables.
Example (Scatter plots)
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> plt.scatter(x, y)
>>> plt.show()
>>> np.corrcoef(x, y)    # off-diagonal entries near 0 for independent samples
Example (Linearly related rvs)
>>> x = np.random.normal(0, 1, 100)
>>> y = 2*x + 5          # vectorized; map() returns an iterator in Python 3
>>> plt.scatter(x, y)
>>> plt.show()
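A sketch of what the correlation coefficient reports for the two scatter examples, without plotting: close to 0 for independent normal samples and 1 (up to rounding) for an exact increasing linear relation. It uses NumPy's newer np.random.default_rng generator, seeded for reproducibility, instead of the legacy np.random functions on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator
x = rng.normal(0, 1, 1000)
y = rng.normal(0, 1, 1000)
r_indep = np.corrcoef(x, y)[0, 1]       # near 0 for independent samples

u = rng.normal(0, 1, 100)
v = 2 * u + 5                           # exact linear relation, positive slope
r_linear = np.corrcoef(u, v)[0, 1]      # 1 up to floating-point rounding
print(round(r_indep, 3), round(r_linear, 6))
```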
Histogram
Example (Standard normal to general normal rv)
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> mu, sigma = 100, 15
>>> x = mu + sigma * np.random.randn(10000)
>>> n, bins, patches = plt.hist(x, 50, density=True, facecolor='g')   # density replaces the removed normed argument
>>> plt.grid(True)
>>> plt.show()
Example (Sampling N(100, 15^2) directly)
>>> x = np.random.normal(100, 15, 10000)
>>> n, bins, patches = plt.hist(x, 50, density=True, facecolor='r')
>>> plt.show()
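A sketch of what density=True means, using np.histogram (which plt.hist relies on) so no plot is needed: bar heights are rescaled so the total bar area is 1, making the histogram comparable with a probability density. The seeded np.random.default_rng generator is a substitution for np.random.randn.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 100, 15
x = mu + sigma * rng.standard_normal(10000)   # standard normal -> N(mu, sigma^2)

# density=True divides counts by (total count * bin width).
heights, edges = np.histogram(x, bins=50, density=True)
widths = np.diff(edges)
print((heights * widths).sum())     # 1.0 up to rounding: total area is one
print(x.mean(), x.std())            # close to 100 and 15
```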
Simulating 3D Brownian Motion
>>> import pandas
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from mpl_toolkits.mplot3d import Axes3D
>>> x = np.random.normal(0,1,100)
>>> x_cumsum = x.cumsum()
>>> y = np.random.normal(0,1,100)
>>> y_cumsum = y.cumsum()
>>> z = np.random.normal(0,1,100)
>>> z_cumsum = z.cumsum()
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111, projection='3d')
>>> ax.plot(x_cumsum, y_cumsum, z_cumsum)
>>> plt.savefig(r'\path\figure.png')   # placeholder path; save before show()
>>> plt.show()
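A vectorized sketch of the same random walk: draw all 100 x, y, z steps as one (100, 3) array and take cumulative sums down the rows, so each column plays the role of x_cumsum, y_cumsum and z_cumsum. The seeded generator and array layout are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
steps = rng.normal(0, 1, size=(100, 3))   # 100 independent N(0, 1) steps in x, y, z
path = steps.cumsum(axis=0)               # running sums = positions of the walk
print(path.shape)                         # (100, 3)
# ax.plot(path[:, 0], path[:, 1], path[:, 2]) would draw the same kind of curve.
print(np.allclose(path[-1], steps.sum(axis=0)))  # True: final position = sum of steps
```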
Figure: 3D Brownian motion
Pandas
pandas helps you carry out your entire data analysis workflow in Python
without having to switch to a more domain-specific language like R.
What People Say About Pandas
Roni Israelov, Portfolio Manager, AQR Capital Management: "pandas
allows us to focus more on research and less on programming. We have found pandas easy
to learn, easy to use, and easy to maintain. The bottom line is that it has increased our
productivity."
David Himrod, Director of Optimization & Analytics: "pandas is the perfect
tool for bridging the gap between rapid iterations of ad-hoc analysis and production quality
code. If you want one tool to be used across a multi-disciplined organization of engineers,
mathematicians and analysts, look no further."
Olivier Pomel, CEO, Datadog: "We use pandas to process time series data on our
production servers. The simplicity and elegance of its API, and its high level of performance
for high-volume datasets, made it a perfect choice for us."
Series and DataFrame
Example (Pandas Series)
>>> import datetime
>>> import numpy as np
>>> import pandas
>>> dt1 = datetime.datetime(2015, 1, 1)
>>> dt2 = datetime.datetime(2015, 1, 10)
>>> dates = pandas.date_range(dt1, dt2, freq='D')
>>> value = np.random.normal(0, 1, 10)
>>> ts = pandas.Series(value, dates)
Example (Pandas DataFrame)
>>> v1 = np.random.normal(0, 1, 10)
>>> v2 = np.random.normal(0, 1, 10)
>>> d = {'col1': v1, 'col2': v2}
>>> df = pandas.DataFrame(d, index=dates)
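A sketch of basic access patterns on the Series and DataFrame built above: label-based lookup with .loc, position-based lookup with .iloc, and column selection, which returns a Series. The seeded generator and the chosen lookups are illustrative assumptions, not part of the original example.

```python
import datetime

import numpy as np
import pandas

dates = pandas.date_range(datetime.datetime(2015, 1, 1),
                          datetime.datetime(2015, 1, 10), freq='D')
rng = np.random.default_rng(0)                 # seeded for reproducibility
ts = pandas.Series(rng.normal(0, 1, 10), index=dates)
df = pandas.DataFrame({'col1': rng.normal(0, 1, 10),
                       'col2': rng.normal(0, 1, 10)}, index=dates)

print(len(dates))                              # 10: both end dates are included
day3 = pandas.Timestamp('2015-01-03')
print(ts.loc[day3] == ts.iloc[2])              # True: label vs. position access
print(df.loc[day3, 'col2'])                    # one cell, by row and column label
print(df['col1'].mean(), df.mean())            # a column is itself a Series
```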
Binomial Distribution
Example
A company drills 10 oil exploration wells, each with an 8% chance of success.
Eight of the ten wells fail. What is the probability of that happening?
>>> import scipy.stats
>>> x = scipy.stats.binom(n=10, p=0.08)
>>> x.pmf(2)
0.14780703546361768
Solving by Simulation
>>> N = 20000
>>> x = scipy.stats.binom(n=10, p=0.08)
>>> rns = x.rvs(N)
>>> (rns == 2).sum() / N    # fraction of samples with exactly 2 successes
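The same answer from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k), plus a stdlib-only simulation, as a cross-check on the SciPy result. math.comb requires Python 3.8 or later; the simulation uses the random module instead of scipy.stats.

```python
import random
from math import comb

n, p, k = 10, 0.08, 2      # 10 wells, 8% success chance each, exactly 2 successes
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(prob)                # about 0.1478, in line with x.pmf(2)

random.seed(0)             # seeded for reproducibility
N = 20000
hits = 0
for _ in range(N):
    successes = sum(random.random() < p for _ in range(n))  # one batch of 10 wells
    if successes == k:
        hits += 1
print(hits / N)            # close to prob
```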
Cubic Spline interpolation
Suppose we want to interpolate between discrete points obtained from the
function $f(x) = e^{-x}\sin(x)$.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy import interpolate
>>> x = np.arange(0, 10, 0.5)
>>> y = np.exp(-x)*np.sin(x)
>>> f = interpolate.interp1d(x, y, kind='cubic')
>>> xnew = np.arange(0, 9, 0.05)
>>> ynew = f(xnew)
>>> plt.plot(x, y, 'o', xnew, ynew, '-')
>>> plt.show()
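A sketch that measures how well a cubic spline reproduces the function: it passes exactly through the sample points, and between them the error against the true $f(x)$ stays small. It uses scipy.interpolate.CubicSpline, which SciPy now suggests in place of the legacy interp1d; the two are similar but not guaranteed to give identical values.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.arange(0, 10, 0.5)
y = np.exp(-x) * np.sin(x)
cs = CubicSpline(x, y)            # piecewise cubic, continuous 1st and 2nd derivatives
xnew = np.arange(0, 9, 0.05)
err = np.max(np.abs(cs(xnew) - np.exp(-xnew) * np.sin(xnew)))
print(np.allclose(cs(x), y))      # True: the spline passes through the data points
print(err)                        # small maximum error between the knots
```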
More Functions...
Expectation and PMF
Example (Expected number of trials to get first head)
>>> import numpy as np
>>> def expectedTrials(noTrials=1000):
...     cnt = 0
...     for i in range(noTrials):
...         br = np.random.binomial(1, 0.5, 500)   # 500 fair coin tosses (1 = head)
...         indx = list(br).index(1) + 1           # position of the first head
...         cnt += indx
...     return cnt / noTrials
...
Example (PMF of number of heads in two coin tosses)
>>> import matplotlib.pyplot as plt
>>> N = 500
>>> heads = np.zeros(N, dtype=int)
>>> for i in range(N):
...     # random integers in [low, high): low inclusive, high exclusive
...     heads[i] = np.random.randint(low=0, high=2, size=2).sum()
...
>>> plt.stem(np.bincount(heads) / N, markerfmt='o')   # relative frequencies
>>> plt.show()
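A sketch comparing both simulations with their exact answers: the number of tosses until the first head is geometric with mean 1/p = 2, and the number of heads in two fair tosses is Binomial(2, 0.5) with pmf 1/4, 1/2, 1/4. It samples the geometric distribution directly with Generator.geometric, a substitution for the slide's search for the first 1.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 0.5

# Tosses until the first head: geometric distribution on 1, 2, 3, ... with mean 1/p.
trials = rng.geometric(p, size=100_000)
print(trials.mean())            # close to 1/p = 2

# Heads in two fair tosses: Binomial(2, 0.5), pmf 1/4, 1/2, 1/4.
heads = rng.integers(0, 2, size=(100_000, 2)).sum(axis=1)
pmf = np.bincount(heads, minlength=3) / len(heads)
print(pmf)                      # close to [0.25, 0.5, 0.25]
```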
Data reading from excel file
Load data into a pandas DataFrame
>>> import pandas
>>> import xlrd    # engine pandas uses to read legacy .xls files
>>> data = pandas.ExcelFile(r'C:\Users\Admin\Desktop\USTREASURY-REALYIELD.xls')
>>> data.sheet_names
['Worksheet1']
>>> df = data.parse('Worksheet1')
Data Download from Google and Yahoo! Finance
Example (Equity Data Download)
>>> import numpy as np
>>> import pandas
>>> import pandas_datareader as pdr
>>> data = pdr.get_data_yahoo('MSFT', start='1/1/2015', end='10/14/2015')
Example (FX Data Download)
>>> import pandas_datareader.data as web
>>> web.get_data_fred("DEXJPUS")
Note: you can check all the FRED symbols on the page http://research.stlouisfed.org/fred2/categories/94
Regression Using sklearn Package
Example (Ordinary Least Square Regression)
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from sklearn import datasets, linear_model
>>> import pandas
>>> import xlrd
>>> data = pandas.ExcelFile(r'/home/arun/Desktop/MyFunctions/PatientData.xlsx')
>>> data.sheet_names
>>> df = data.parse('Sheet1')
>>> Y = df[['ln Urea']].values; X = df[['Age']].values   # 2-D column arrays
>>> plt.scatter(X, Y, color='black')
>>> regr = linear_model.LinearRegression()
>>> regr.fit(X, Y)
>>> regr.coef_, regr.intercept_      # fitted slope and intercept
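Since PatientData.xlsx is not available here, this sketch uses synthetic age and ln-urea values (invented for illustration, not real patient data) to show what an ordinary least squares fit computes. It solves the problem with np.linalg.lstsq and checks it against the textbook formulas slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x); LinearRegression's coef_ and intercept_ should agree with these on the same data.

```python
import numpy as np

rng = np.random.default_rng(3)
age = rng.uniform(20, 80, 200)                          # synthetic ages
ln_urea = 1.2 + 0.01 * age + rng.normal(0, 0.1, 200)    # synthetic response

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(age), age])
beta, *_ = np.linalg.lstsq(X, ln_urea, rcond=None)      # minimises ||X b - y||^2
intercept, slope = beta
print(intercept, slope)                 # close to 1.2 and 0.01

# Same fit via the closed-form formulas.
slope2 = np.cov(age, ln_urea, ddof=1)[0, 1] / np.var(age, ddof=1)
intercept2 = ln_urea.mean() - slope2 * age.mean()
print(np.allclose([intercept, slope], [intercept2, slope2]))  # True
```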
Option Pricing Using Binomial Model
>>> import sys
>>> sys.path.append(r'/home/arun/Desktop/MyFunctions/')
>>> import BinomialPricing
>>> u, d, s0, vol = BinomialPricing.fetchData()
>>> # Instantiate the OptionCRR class: s0, sigma, strike, maturity, rfr, n
>>> noption = BinomialPricing.OptionCRR(230, 0.25, 210, 0.5, 0.04545, 5)
>>> noption.price()
Infosys Option Price
Using n-period Binomial Model
>>> u, d, s0, vol = BinomialPricing.fetchData()
>>> noption = BinomialPricing.Option(s0, u, d, 0.0611/365, 40, 960)
>>> noption.price()
Using CRR Model
>>> import math
>>> u, d, s0, vol = BinomialPricing.fetchData()
>>> annualized_vol = math.sqrt(252)*vol    # daily vol scaled by sqrt(252 trading days)
>>> # arguments: s0, sigma, strike, maturity, rfr, n
>>> noption = BinomialPricing.OptionCRR(s0, annualized_vol, 960, float(40)/365, 0.0611, 100)
>>> noption.price()
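The BinomialPricing module is the author's own and its code is not shown, so the following is only a minimal sketch of a standard Cox-Ross-Rubinstein pricer for a European option, not a reproduction of OptionCRR. The function name crr_price and the is_call flag are hypothetical; the argument order follows the slide's s0, sigma, strike, maturity, rfr, n. It assumes u = e^{sigma sqrt(dt)}, d = 1/u and risk-neutral probability q = (e^{r dt} - d)/(u - d).

```python
import math


def crr_price(s0, sigma, strike, maturity, rfr, n, is_call=True):
    """European option price on an n-step CRR tree (hypothetical sketch)."""
    dt = maturity / n
    u = math.exp(sigma * math.sqrt(dt))      # up factor
    d = 1 / u                                # down factor
    growth = math.exp(rfr * dt)              # one-step risk-free growth
    q = (growth - d) / (u - d)               # risk-neutral up probability
    # Payoffs at maturity, indexed by the number of up moves j.
    values = []
    for j in range(n + 1):
        s_t = s0 * u**j * d**(n - j)
        payoff = s_t - strike if is_call else strike - s_t
        values.append(max(payoff, 0.0))
    # Step back through the tree, discounting risk-neutral expected values.
    for step in range(n, 0, -1):
        values = [(q * values[j + 1] + (1 - q) * values[j]) / growth
                  for j in range(step)]
    return values[0]


call = crr_price(230, 0.25, 210, 0.5, 0.04545, 5)
put = crr_price(230, 0.25, 210, 0.5, 0.04545, 5, is_call=False)
print(round(call, 4), round(put, 4))
# Put-call parity holds exactly on the tree: C - P = S0 - K * exp(-r T)
print(math.isclose(call - put, 230 - 210 * math.exp(-0.04545 * 0.5)))  # True
```

With many steps the tree price should approach the Black-Scholes value; for example crr_price(100, 0.2, 100, 1.0, 0.05, 500) comes out close to 10.45.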
THANK YOU!