Modules (sometimes called packages or libraries) help group together related
sets of tools in Python. In this exercise, we will examine two modules that are
frequently used by Data Scientists:
1. statsmodels: used for statistical modeling and machine learning; usually aliased as sm
2. seaborn: a visualization library; usually aliased as sns
Pandas
What can pandas do for you?
Load tabular data from different sources
Search for particular rows or columns
Calculate aggregate statistics
Combine data from multiple sources
CSV files:
import pandas as pd
df = pd.read_csv('random.csv')
print(df)
Methods:
print(df.head()) (displays the first five rows of the DataFrame)
df.info() (displays summary information about the DataFrame)
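A minimal sketch of the loading steps above. The CSV content and column names are made up for illustration, and io.StringIO stands in for a file on disk so the example runs on its own; with a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice pd.read_csv usually takes a file path.
csv_text = """name,price,quantity
apple,0.50,10
banana,0.25,12
cherry,3.00,2
date,4.50,1
elderberry,6.00,3
fig,2.75,5
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first five rows only, even though the data has six
df.info()         # column names, dtypes, and non-null counts
```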
Ways to select columns:
1. Selecting with brackets and a string: required when the name contains spaces or special characters
dataframeName['columnname']
2. Selecting with a dot: works only if the name is a valid Python identifier (letters, numbers, underscores; not starting with a number)
dataframeName.columnname
Logical statements:
question == solution (tests whether the two values are equal)
>, >=, <, <=, !=
Booleans: True or False
In a DataFrame, a whole column can be compared against a single value, and the result can be used to filter rows
Ex: credit_records[credit_records.price > 20.00]
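The filtering example above can be sketched in two steps. The rows in credit_records below are hypothetical; only the price column name comes from the notes.

```python
import pandas as pd

# Hypothetical data shaped like the credit_records example.
credit_records = pd.DataFrame({
    "item": ["pen", "lamp", "desk", "mug"],
    "price": [1.50, 25.00, 120.00, 20.00],
})

mask = credit_records.price > 20.00  # Boolean Series, one True/False per row
expensive = credit_records[mask]     # keeps only the rows where mask is True
print(expensive)
```

Note that a price of exactly 20.00 is not selected, because > is a strict comparison.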
matplotlib
Creating a line plot:
from matplotlib import pyplot as plt
plt.plot(x_values, y_values)
plt.show() to display the plot
Adding labels and legends:
Axes and title labels:
plt.xlabel(" ")
plt.ylabel(" ")
plt.title(" ")
plt.xticks() to change the tick positions and labels on the x-axis
plt.yticks() to do the same on the y-axis
Legends:
Add the keyword argument label:
plt.plot(x, y, label=" ")
plt.legend() to show the legend
Arbitrary text: (floating text)
plt.text(x, y, ' ')
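Putting the labelling calls together, a small sketch might look like this. The data is made up, and the Agg backend line is an assumption so the code runs without a display; remove it and call plt.show() to see the figure in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive run; remove to open a window
from matplotlib import pyplot as plt

# Made-up data for illustration.
years = [2018, 2019, 2020, 2021]
sales_a = [10, 12, 9, 15]
sales_b = [7, 8, 11, 13]

plt.figure()
plt.plot(years, sales_a, label="Product A")
plt.plot(years, sales_b, label="Product B")
plt.xlabel("Year")
plt.ylabel("Units sold")
plt.title("Sales per year")
plt.xticks(years)                 # one tick per year instead of fractional ticks
plt.legend()                      # uses the label= given to each plot call
plt.text(2020, 9, "dip in 2020")  # floating text at data coordinates (2020, 9)
```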
Modifying text:
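Change font size:
plt.title(' ', fontsize=20)
Change font color:
plt.title(' ', color='green') (plt.legend() does not accept color; in Matplotlib 3.3 and later, use plt.legend(labelcolor='green') for legend text)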
Adding style:
Changing line color:
plt.plot(x, y, color=" ")
Changing line width:
linewidth=1 (typical values range from 1 to 7)
Changing line style:
linestyle='-' or '--' or '-.' or ':'
Adding markers:
marker='x' or 's' or 'o' or 'd' or '*' or 'h'
Setting a style:
plt.style.use(' ')
'fivethirtyeight' - Based on the color scheme of the popular website
'grayscale' - Great for when you don't have a color printer!
'seaborn' - Based on another Python visualization library (named 'seaborn-v0_8' in Matplotlib 3.6 and later)
'classic' - The default color scheme for Matplotlib before version 2.0
Making a scatter plot:
plt.scatter(x, y)
plt.show()
Changing marker transparency:
alpha=0 to 1 (0 is fully transparent, 1 is fully opaque)
Logarithmic scale:
plt.xscale('log')
Making a bar chart:
plt.bar(x, y)
plt.barh(x, y) to make a horizontal bar chart
Adding error bars:
yerr=dataframeName.error
Stacked bar charts:
Display two different sets of bars, one on top of the other
bottom=df.column (the heights of the first set, used as the base of the second)
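A rough sketch of error bars and stacking together. The DataFrame, its column names, and the values are hypothetical; the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive run; remove to open a window
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical counts per city, plus an error estimate for the dog counts.
df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "dogs": [3, 5, 2],
    "cats": [4, 1, 6],
    "error": [0.5, 0.8, 0.3],
})

plt.figure()
bottom_bars = plt.bar(df.city, df.dogs, yerr=df.error, label="dogs")
# bottom= starts each cat bar where the matching dog bar ends.
top_bars = plt.bar(df.city, df.cats, bottom=df.dogs, label="cats")
plt.legend()
```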
Making a histogram:
plt.hist(data)
Changing bins:
bins=number of bins
Changing range:
range=(xmin, xmax)
Normalizing:
Scales the height of every bar by a constant factor so that the areas of all bars add up
to one.
density=True
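To check the normalization claim, the sketch below draws a histogram of made-up random data and sums the bar areas. plt.hist returns the bar heights and the bin edges, which is what makes the check possible; the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive run; remove to open a window
from matplotlib import pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.rand(1000) * 10  # made-up values between 0 and 10

plt.figure()
heights, edges, _ = plt.hist(data, bins=5, range=(0, 10), density=True)

bar_widths = np.diff(edges)                # width of each of the 5 bins
total_area = (heights * bar_widths).sum()  # sum of height * width over all bars
print(total_area)
```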
Dictionary:
varName = {key: value}
name[key] >> returns the value stored under key
name.keys() >> returns all keys
key in name >> True / False
del name[key] >> removes the key and its value
Differences between lists and dictionaries:
A list is indexed by a range of numbers; order matters
A dictionary is indexed by unique keys; order doesn't matter
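The dictionary operations above, sketched on a small hypothetical dictionary of capitals:

```python
capitals = {"spain": "madrid", "france": "paris"}

print(capitals["france"])      # look up a value by its key
print(list(capitals.keys()))   # all keys
print("italy" in capitals)     # membership is tested against the keys

capitals["italy"] = "rome"     # add a new key-value pair
del capitals["spain"]          # remove a key and its value
print(capitals)
```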
PANDAS:
Pandas is an open source library, providing high-performance, easy-to-use
data structures and data analysis tools for Python.
The DataFrame is one of pandas' most important data structures. It is a way to store
tabular data where you can label the rows and the columns. One way to build a
DataFrame is from a dictionary.
import pandas as pd
To turn a dictionary into a DataFrame:
dataframeName = pd.DataFrame(dictionaryName)
dataframeName.index = …
CSV files:
name = pd.read_csv('', index_col=0)
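As an illustration of the dictionary route, here is a small sketch. The table contents and the two-letter row labels are chosen for the example; each dictionary key becomes a column label.

```python
import pandas as pd

# Keys become column labels; each list becomes one column.
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
}
brics = pd.DataFrame(data)

# Replace the default row labels 0..4 with custom ones.
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)
```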
Index and select data:
Square brackets: selecting columns: dataframeName['colName'] (returns a Series),
dataframeName[['colName']] (returns a DataFrame)
Selecting rows: dataframeName[1:4] (a slice, not a string)
Advanced methods:
1- loc: select parts of your data based on labels: dataframeName.loc['rowLabel'] or dataframeName.loc[['rowLabel']]
Square brackets:
● Column access: brics[["country", "capital"]]
● Row access, only through slicing: brics[1:4]
loc (label-based):
● Row access: brics.loc[["RU", "IN", "CH"]]
● Column access: brics.loc[:, ["country", "capital"]]
● Row & column access: brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
2- iloc: position-based
Like loc, but instead of row and column labels you use integer positions, counting from 0
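The three selection styles can be compared side by side. This sketch assumes a small brics table with two-letter row labels; the contents are illustrative.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Label-based: rows and columns are named.
by_label = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

# Position-based: the same rows and columns, addressed by integer position.
by_position = brics.iloc[[1, 2, 3], [0, 1]]

# Square brackets with a slice select rows by position (the end is excluded).
row_slice = brics[1:4]

print(by_label)
```

All three selections return the same three rows here, which is a quick way to convince yourself that the positions and labels line up.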
Boolean operators: and, or, not
True and True >> True
NumPy operators (element-wise, for arrays):
np.logical_and(), np.logical_or(), np.logical_not()
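The plain and/or operators do not work element by element on arrays, which is why the NumPy functions exist. A short sketch with made-up heights:

```python
import numpy as np

heights = np.array([1.60, 1.75, 1.82, 1.68, 1.90])  # made-up values in metres

# True only where both conditions hold for that element.
mask = np.logical_and(heights > 1.65, heights < 1.85)
print(mask)
print(heights[mask])  # the Boolean array can be used to select elements
```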
Conditional statements:
if, elif, else
if z % 2 == 0:
    print('z is even')
else:
    print('z is odd')
elif (short for "else if") checks another condition when the previous ones were False
Loops:
1- While loop = repeated if statement
Syntax:
while condition:
    expression
2- For loop:
Syntax:
for var in seq:
    expression
Ex:
fam = [1, 2, 4, 5]
for height in fam:
    print(height)
Using enumerate:
for index, height in enumerate(fam):
    print(height + index)
for c in "family":
    print(c)
The loop runs once for each character in the string
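A runnable sketch of the three loop forms above. The values in fam and the starting offset are made up for illustration.

```python
# while: repeats as long as the condition stays True.
offset = 50
steps = 0
while offset > 1:
    offset = offset / 4
    steps += 1

# for with enumerate: gives the position and the value together.
fam = [1.73, 1.68, 1.71, 1.89]
lines = []
for index, height in enumerate(fam):
    lines.append("index " + str(index) + ": " + str(height))

# for over a string: one iteration per character.
letters = []
for c in "family":
    letters.append(c)

print(steps, lines, letters)
```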
Loop data structure:
1- Dictionary: use a method
for key, value in dictName.items():
    print(key, value)
2- NumPy arrays: for a 2D array, use a function
for val in np.nditer(npArr):
    print(val)
3- pandas DataFrame: use a method
for lab, row in dataframeName.iterrows():
    print(lab)
    print(row)
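The three looping patterns, sketched on small hypothetical objects. Results are collected into lists so they are easy to inspect.

```python
import numpy as np
import pandas as pd

# Dictionary: .items() yields (key, value) pairs.
ages = {"ana": 31, "ben": 27}
pairs = []
for name, age in ages.items():
    pairs.append(name + " is " + str(age))

# 2D NumPy array: np.nditer visits every element, not every row.
grid = np.array([[1, 2], [3, 4]])
flat = []
for val in np.nditer(grid):
    flat.append(int(val))

# DataFrame: .iterrows() yields (row label, row as a Series).
df = pd.DataFrame({"country": ["Brazil", "India"]}, index=["BR", "IN"])
labels = []
for lab, row in df.iterrows():
    labels.append(lab + ": " + row["country"])

print(pairs, flat, labels)
```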
Using apply to call a function on every value in a column:
dataframeName["colName"].apply(function)
str(integer) >> converts a number to a string
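A short sketch of apply on one column. The DataFrame and the new column names are hypothetical; apply calls the given function once per value and returns a new Series.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Brazil", "India"]})

# len and str.upper are called once for each value in the column.
df["name_length"] = df["country"].apply(len)
df["COUNTRY"] = df["country"].apply(str.upper)
print(df)
```

This is usually shorter than writing the same thing with iterrows and assigning values one row at a time.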
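Random numbers:
Random generators:
np.random.rand() >> random float from 0 up to (but not including) 1
np.random.seed() >> sets the seed so the results are reproducible
np.random.randint(0, 2) >> generates 0 or 1 (the upper bound is excluded)
arr.append() >> add an item to the end of a list
for x in range(number): >> make a for loop run a given number of times
np.mean() >> average of the values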
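These pieces combine naturally into a simulated series of coin tosses. The sketch below is one possible arrangement; the seed value 123 and the number of tosses are arbitrary choices.

```python
import numpy as np

np.random.seed(123)  # arbitrary seed; the same seed gives the same tosses

tosses = []
for x in range(10):
    # randint(0, 2) returns 0 or 1 because the upper bound is excluded.
    tosses.append(np.random.randint(0, 2))

print(tosses)
print(np.mean(tosses))  # fraction of tosses that came up 1
```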
Python Data Science Toolbox (Part 1)
1- Defining a function:
def funcName(): (function header)
    function body
Calling the function >> funcName()
2- Function parameters:
def funcName(value):
    ……
funcName(---)
3- Return values from functions:
return ---
4- Docstrings:
- describe what your function does; placed on the line immediately after the function header
"""….."""
5- Multiple function parameters
6- Return multiple values (tuples)
A tuple is like a list, but it is immutable and is constructed using ()
even_nums = (2, 4, 6)
a, b, c = even_nums
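The six points above fit into one small function. The name raise_both and its parameters are chosen for this example only.

```python
def raise_both(value1, value2):
    """Raise value1 to the power of value2, and value2 to the power of value1."""
    new_value1 = value1 ** value2
    new_value2 = value2 ** value1
    # Returning a tuple is how a function hands back more than one value.
    return (new_value1, new_value2)

# Tuple unpacking assigns each returned value to its own name.
result1, result2 = raise_both(2, 3)
print(result1, result2)
print(raise_both.__doc__)  # the docstring is stored on the function
```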
Scope and user-defined functions:
There are three scopes: global, local, and built-in (like print())
Python first looks in the local scope, then in any enclosing functions, then in the global scope, and finally in the built-in scope
Using the global keyword lets a function alter the value of a variable defined in the global scope.
import builtins to access the Python built-in scope (for example, dir(builtins) lists its names)
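A minimal sketch of the global keyword and the builtins module, assuming the code runs at module level (for example in a script or a notebook cell). The names counter and increment are hypothetical.

```python
import builtins

counter = 0  # defined in the global scope

def increment():
    global counter         # without this line, the assignment would create a local name
    counter = counter + 1

increment()
increment()
print(counter)

# Names such as print live in the built-in scope.
print("print" in dir(builtins))
```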
Nested Functions:
def outer(..):
    code
    def inner(…):
        code
The outer function is called the enclosing function
Returning functions: the outer function can return the inner function
Closure: anything defined locally in the enclosing scope is available to the inner
function even when the outer function has finished execution.
Use nonlocal to alter the value of a variable defined in the enclosing scope.
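Two short sketches of these ideas; all function names here are made up for the example. The first returns an inner function that remembers n (a closure), and the second uses nonlocal to update a variable in the enclosing scope.

```python
def raise_val(n):
    """Return a function that raises its argument to the power n."""
    def inner(x):
        return x ** n  # n comes from the enclosing scope
    return inner

square = raise_val(2)  # raise_val has finished, but inner still remembers n
cube = raise_val(3)

def make_counter():
    """Return a function that counts how many times it has been called."""
    count = 0
    def step():
        nonlocal count  # rebind count in make_counter, not a new local
        count += 1
        return count
    return step

counter = make_counter()
first = counter()
second = counter()
print(square(4), cube(4), first, second)
```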
Default and flexible arguments
Add a default argument: def funcName(arg=default_value)
Functions with a variable number of positional arguments: (*args), received as a tuple
Functions with a variable number of keyword arguments: (**kwargs), received as a dictionary
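One illustrative function for each of the three argument styles. The function names and behaviour are assumptions made for this sketch.

```python
def power(number, exponent=2):
    """Raise number to exponent; exponent defaults to 2 when omitted."""
    return number ** exponent

def add_all(*args):
    """Sum any number of positional arguments (args is a tuple)."""
    total = 0
    for num in args:
        total += num
    return total

def describe(**kwargs):
    """Format any number of keyword arguments (kwargs is a dictionary)."""
    return [key + "=" + str(value) for key, value in kwargs.items()]

print(power(3), power(3, exponent=3))
print(add_all(1, 2, 3))
print(describe(name="Ana", age=31))
```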
Lambda Functions:
Quicker way to write functions:
varName = lambda x, y: x ** y
varName(1, 2)
Anonymous functions:
map(func, seq)
We can pass a lambda directly to map without defining a named function first.
To convert the result into a list: list(variable)
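A brief sketch of both uses; the variable names are illustrative. In Python 3, map returns a lazy map object, which is why the notes wrap it in list().

```python
# A lambda bound to a name behaves like a small regular function.
raise_to_power = lambda x, y: x ** y
print(raise_to_power(2, 3))

# A lambda passed straight to map, without ever being named.
nums = [1, 2, 3, 4]
squares = list(map(lambda n: n ** 2, nums))
print(squares)
```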
Errors and exceptions:
Exceptions - caught during execution
Catch exceptions with a try-except clause
Runs the code following try
If there’s an exception, runs the code following except
try:
    ….
except TypeError:
    print()
if x < 0:
    raise ValueError('')
try:
    ….
except:
    print()
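One way to combine the fragments above is a square-root helper that raises for negative input and catches a wrong type. This is a sketch; the function name and messages are assumptions. The comparison x < 0 sits inside the try block because comparing a string with a number raises TypeError as well.

```python
def sqrt(x):
    """Return the square root of a non-negative number, or None for a wrong type."""
    try:
        if x < 0:
            raise ValueError("x must be non-negative")
        return x ** 0.5
    except TypeError:
        # Reached when x is, for example, a string.
        print("x must be an int or float")
        return None

print(sqrt(9))
print(sqrt("hi"))

# ValueError is not caught inside sqrt, so the caller handles it.
try:
    sqrt(-4)
except ValueError as err:
    message = str(err)
    print(message)
```

Catching a specific exception type, as here, is generally safer than a bare except, which also hides unrelated errors.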