Python2_master
Jupyter Notebook
This is a web-based application (it runs in the browser) that is used to write and run Python code.
To add more code cells (or blocks), click the '+' button in the top left corner.
There are 3 cell types in Jupyter:
Code: Used to write Python code
Markdown: Used to write text (explanations and other key information)
Raw NBConvert: Used for raw content that is not run and is passed through unchanged when nbconvert converts a notebook (.ipynb) to other formats (HTML, LaTeX, etc.)
To run the Python code in a specific cell, click the 'Run' button at the top or press Shift + Enter.
The number sign (#) is used to insert comments that leave messages for yourself or others. Everything after a # on a line is a comment: it is not interpreted as code and is ignored when the program runs.
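A small illustration of both kinds of comment, a full-line comment and one that follows code on the same line (the variable names are arbitrary):

```python
# A full-line comment: Python skips this line entirely
price = 20
quantity = 3  # an inline comment after code is also ignored
total = price * quantity
print(total)  # prints 60
```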
Classes
Object-oriented programming is a popular and efficient approach
Classes define real-world things or situations (can be thought of as creating your own data type)
Classes have attributes, which can be of various data types
Functions defined inside a class work the same way as regular functions, but are called methods
Attributes and methods are accessed using the dot operator (e.g. pay1.amount)
Instantiate objects (create instances) of your classes
The __init__() method runs when an object is created and is used to initialize its attributes
1 of 12 11/16/2020, 2:32 PM
Python2_master - Jupyter Notebook http://localhost:8888/notebooks/Documents/BCHPython2/Python2_maste...
In [3]: # Create a Payment class and assign it 3 attributes: payer, payee, amount
class Payment:
    def __init__(self, payer, payee, amount):
        # Store each constructor argument as an attribute of the new object
        self.payer = payer
        self.payee = payee
        self.amount = amount
In [5]: print(pay1.amount)
100
In [6]: print(pay1.payee)
Seamus
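The cell that created pay1 is not shown above. The sketch below reconstructs the idea under stated assumptions: "Alice" is a hypothetical payer name, and describe() is a hypothetical method added only to show a function defined inside a class being called with the dot operator. The amount (100) and payee ("Seamus") match the outputs above.

```python
class Payment:
    def __init__(self, payer, payee, amount):
        # __init__ runs automatically when a Payment object is created
        self.payer = payer
        self.payee = payee
        self.amount = amount

    def describe(self):
        # Hypothetical method: a function defined inside the class
        return f"{self.payer} paid {self.payee} ${self.amount}"


# Instantiate an object; "Alice" is a hypothetical payer name
pay1 = Payment("Alice", "Seamus", 100)

print(pay1.amount)      # attribute access with the dot operator: 100
print(pay1.payee)       # Seamus
print(pay1.describe())  # method call with the dot operator: Alice paid Seamus $100
```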
Pandas
Pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
DataFrames
Series
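A DataFrame is a two-dimensional table of labeled columns, and selecting a single column from it gives a Series. A minimal sketch, assuming pandas is imported as pd (the usual convention) and using a few Date and CPI values that appear in this notebook's later outputs:

```python
import pandas as pd

# A small DataFrame built from Date/CPI values shown later in this notebook
df = pd.DataFrame({
    "Date": ["2/5/2010", "2/12/2010", "2/19/2010"],
    "CPI": [211.096358, 211.242170, 211.289143],
})

# Selecting one column returns a Series (a single labeled column)
cpi = df["CPI"]
print(type(df).__name__)   # DataFrame
print(type(cpi).__name__)  # Series
print(cpi)
```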
#print(df)
Out[9]:
Store Date Temperature Fuel_Price MarkDown1 CPI Unemployment IsHoliday
Out[10]:
Store Date Temperature Fuel_Price MarkDown1 CPI Unemployment IsHoliday
Out[11]: (8190, 9)
Out[15]: 0 211.096358
1 211.242170
2 211.289143
3 211.319643
4 211.350143
Name: CPI, dtype: float64
(8190, 9)
73710
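The (8190, 9) and 73710 outputs fit the relationship between shape and size: shape is a (rows, columns) tuple, and size counts every cell, so it equals rows × columns (8190 × 9 = 73710). A minimal check on a hypothetical 4 × 3 frame:

```python
import pandas as pd

# Hypothetical frame: 4 rows x 3 columns
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [5, 6, 7, 8],
    "c": [9, 10, 11, 12],
})

rows, cols = df.shape            # shape is a (rows, columns) tuple
print(df.shape)                  # (4, 3)
print(df.size)                   # 12: total number of cells
print(rows * cols == df.size)    # True

# The same arithmetic for the notebook's (8190, 9) shape
print(8190 * 9)                  # 73710
```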
Out[17]: 228.9764563
Out[18]: 182.7640032
Out[19]: 126.064
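Single numbers like those in Out[17] through Out[19] are the kind of result a Series reduction returns. The notebook does not show which calls produced them, so the sketch below only illustrates common reductions, applied to the five CPI values printed in Out[15]:

```python
import pandas as pd

# The first five CPI values printed in Out[15]
cpi = pd.Series([211.096358, 211.242170, 211.289143, 211.319643, 211.350143], name="CPI")

# Each reduction collapses the column to a single number
print(cpi.max())       # largest value
print(cpi.min())       # smallest value
print(cpi.mean())      # arithmetic mean
print(cpi.median())    # middle value
print(cpi.describe())  # count, mean, std, min, quartiles and max in one call
```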
features_df["Store"].unique()
Out[22]: 10/22/2010 45
3/1/2013 45
11/30/2012 45
5/28/2010 45
7/2/2010 45
..
9/14/2012 45
12/9/2011 45
1/27/2012 45
6/22/2012 45
4/8/2011 45
Name: Date, Length: 182, dtype: int64
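Out[22] looks like the output of value_counts() on the Date column (182 dates, each appearing 45 times, presumably once per store); the cell's code is not shown. A sketch on a hypothetical two-store frame showing how unique(), nunique() and value_counts() differ:

```python
import pandas as pd

# Hypothetical frame: 2 stores reporting on the same 3 dates
df = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Date": ["2/5/2010", "2/12/2010", "2/19/2010"] * 2,
})

print(df["Store"].unique())       # array of distinct values: [1 2]
print(df["Store"].nunique())      # number of distinct values: 2
print(df["Date"].value_counts())  # each date appears once per store, i.e. 2 times
```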
Out[23]:
Store Date Temp Fuel_Price MD1 CPI Unemployment IsHoliday Status
In [24]: features_df.head()
Out[24]:
Store Date Temp Fuel_Price MD1 CPI Unemployment IsHoliday Status
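Between Out[10] and Out[23], the columns Temperature and MarkDown1 became Temp and MD1. The cell that did this is not shown; renaming is commonly done with DataFrame.rename(columns=...). A sketch on a small frame using the two Temp values shown in Out[41] and an empty MarkDown1 column:

```python
import pandas as pd

# Small frame using the original column names
df = pd.DataFrame({"Temperature": [38.51, 39.93], "MarkDown1": [None, None]})

# rename() returns a new DataFrame; the mapping is {old_name: new_name}
df = df.rename(columns={"Temperature": "Temp", "MarkDown1": "MD1"})
print(list(df.columns))  # ['Temp', 'MD1']
```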
Out[25]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status
Out[26]: Store 0
Date 0
Temp 0
Fuel_Price 0
MD1 4158
CPI 585
Unemployment 585
IsHoliday 0
Status 0
dtype: int64
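Out[26] lists the number of missing values per column, which is what isnull().sum() returns. MD1 is missing in 4158 of the 8190 rows, which may be why it no longer appears in later outputs; the cells that removed it are not shown. A sketch on a hypothetical frame of counting, dropping a column, and dropping incomplete rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some missing (NaN) values
df = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "MD1": [np.nan, np.nan, 10.5, np.nan],
    "CPI": [211.1, np.nan, 210.9, 211.0],
})

# Count missing values in each column: Store 0, MD1 3, CPI 1
print(df.isnull().sum())

# drop() returns a new DataFrame, so assign the result back to keep it
df = df.drop(columns=["MD1"])

# Remove any remaining rows that still contain a missing value
df = df.dropna()
print(df.shape)  # (3, 2)
```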
In [28]: features_df.head()
Out[28]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status
In [30]: features_df.head()
Out[30]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status
In [31]: # Say a colleague of yours asks for a new metric called "customerCost"
# Add a column that is equal to Fuel_Price * CPI
features_df['customerCost'] = features_df['Fuel_Price'] * features_df['CPI']
Indexing
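Indexing a DataFrame with a column name in square brackets (e.g. features_df['CPI']) selects columns by default. To select rows, use the loc method, which selects by index label, or the iloc method, which selects by integer position.
Allowed inputs to iloc include: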
An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
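The three kinds of iloc input listed above, sketched on a hypothetical one-column frame with the default integer index:

```python
import pandas as pd

# Hypothetical frame with a default integer index 0..9
df = pd.DataFrame({"value": range(10, 20)})

print(df.iloc[5])          # an integer: the row at position 5 (returned as a Series)
print(df.iloc[[4, 3, 0]])  # a list of integers: rows in the order given
print(df.iloc[1:7])        # a slice: positions 1 through 6 (the end is excluded)
```

Unlike an iloc slice, a loc slice such as df.loc[1:7] includes its end label.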
Out[32]:
Fuel_Price CPI Unemployment IsHoliday
In [33]: features_df.loc[[100,105]]
Out[33]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status customerCost
Out[34]:
CPI customerCost
Out[35]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status customerCost
Out[36]:
In [37]: # Retrieve all rows with an IsHoliday value of True and a customerCost larger than 550
filt1 = features_df['IsHoliday'] == True
filt2 = features_df['customerCost'] > 550
features_df.loc[filt1 & filt2]
Out[37]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status customerCost
Out[38]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status customerCost
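The filters in In [37] are boolean Series combined with &. A sketch on a hypothetical frame with the same two columns, also showing | (OR) and ~ (NOT); note that comparisons written inline must each be wrapped in parentheses, because & binds more tightly than > in Python:

```python
import pandas as pd

# Hypothetical frame with the two columns used in In [37]
df = pd.DataFrame({
    "IsHoliday": [True, False, True, True],
    "customerCost": [600.0, 700.0, 500.0, 560.0],
})

# Each comparison yields a boolean Series aligned with the rows
filt1 = df["IsHoliday"]            # already boolean, so "== True" is optional
filt2 = df["customerCost"] > 550

print(df.loc[filt1 & filt2])       # AND: rows 0 and 3
print(df.loc[filt1 | filt2])       # OR: all four rows
print(df.loc[~filt1])              # NOT: row 1

# Inline version: each comparison is wrapped in parentheses
print(df.loc[(df["IsHoliday"]) & (df["customerCost"] > 550)])
```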
In [39]: #We may also provide specific row/column values to access specific values
features_df.iloc[0, 1]
Out[39]: '2/5/2010'
Out[40]:
Date Fuel_Price
0 2/5/2010 2.572
2 2/19/2010 2.514
Out[41]:
Store Date Temp
1 1 2/12/2010 38.51
2 1 2/19/2010 39.93
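Out[40] and Out[41] select rows and columns together; their cells are not shown, but selections of that shape can be made with iloc (by position) or loc (by label). A sketch on a frame laid out like the first columns of features_df; the Temp and Fuel_Price numbers are hypothetical:

```python
import pandas as pd

# Frame laid out like the first columns of features_df
df = pd.DataFrame({
    "Store": [1, 1, 1],
    "Date": ["2/5/2010", "2/12/2010", "2/19/2010"],
    "Temp": [40.0, 38.0, 39.0],      # hypothetical values
    "Fuel_Price": [2.5, 2.6, 2.7],   # hypothetical values
})

print(df.iloc[0, 1])                           # one cell by position: 2/5/2010
print(df.iloc[[0, 2], [1, 3]])                 # rows 0 and 2, columns Date and Fuel_Price
print(df.loc[[0, 2], ["Date", "Fuel_Price"]])  # the same selection by label
print(df.iloc[1:3, 0:3])                       # rows 1-2, columns Store, Date, Temp
```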
Formatting Data
To access and format the string values in a DataFrame column, we can use the string methods available through the column's .str accessor.
We may also control how float values are displayed by setting pd.options.display.float_format to a formatting function.
In [42]: # We can access all the same string methods from Python 1 using .str
features_df['Status'] = features_df['Status'].str.upper()
In [43]: features_df.head()
Out[43]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status customerCost
Out[44]:
Store Date Temp Fuel_Price CPI Unemployment IsHoliday Status customerCost
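A sketch of both formatting techniques on a hypothetical frame (the Status strings and customerCost numbers are placeholders). pd.options.display.float_format expects a callable, such as the bound method "{:.2f}".format, and it changes only how floats are printed, not the stored values:

```python
import pandas as pd

df = pd.DataFrame({
    "Status": ["low cost", "high cost"],       # hypothetical values
    "customerCost": [545.123456, 560.987654],  # hypothetical values
})

# String methods live on the column's .str accessor
df["Status"] = df["Status"].str.upper()
df["Status"] = df["Status"].str.replace(" ", "_")

# Show floats with two decimal places in printed output only
pd.options.display.float_format = "{:.2f}".format
print(df)

# The stored value keeps its full precision
print(df["customerCost"].iloc[0])

# Restore the default display
pd.reset_option("display.float_format")
```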