pandas&numpy
August 4, 2020
1 Week 6. Pandas and NumPy cheat sheet
Pandas (https://pandas.pydata.org) and NumPy (https://numpy.org) are two essential Python
libraries for Data Science. Here we will give you some basics that you need to know for the final
project.
You do not need to know much about these libraries for this course. However, they are worth
learning in more depth if you plan to become a data scientist, so do not hesitate
to search for extra materials and tutorials.
In [2]: import numpy as np
import pandas as pd
1.1 Pandas
Pandas provides easy-to-use data structures and data analysis tools. The basic classes in Pandas are
DataFrame, which you can think of as a matrix with named columns and indexed rows, and
Series, which represents a single column of such a matrix. For example:
In [3]: species = ['falcon', 'dog', 'spider', 'fish']
feature_1 = [2, 4, 8, 0]
feature_2 = [2, 0, 0, 0]
animals = pd.DataFrame({"num_legs": feature_1, "num_specimen_seen": feature_2}, index=species)
print('Type of variable animals is ', type(animals))
print('Type of a column from DataFrame animals is ', type(animals["num_specimen_seen"]))
animals
Type of variable animals is <class 'pandas.core.frame.DataFrame'>
Type of a column from DataFrame animals is <class 'pandas.core.series.Series'>
Out[3]:         num_legs  num_specimen_seen
        falcon         2                  2
        dog            4                  0
        spider         8                  0
        fish           0                  0
In [4]: feature_3 = [2, 0, 0, 0]
animals["num_wings"] = feature_3
animals
Out[4]:         num_legs  num_specimen_seen  num_wings
        falcon         2                  2          2
        dog            4                  0          0
        spider         8                  0          0
        fish           0                  0          0
Now that you have a DataFrame, you can extract data from it as you like. For example, you can take
only some of the columns:
In [5]: X = animals[["num_legs", "num_wings"]]
X
Out[5]:         num_legs  num_wings
        falcon         2          2
        dog            4          0
        spider         8          0
        fish           0          0
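Rows can be extracted in a similar way. The following is a minimal sketch, not part of the original notebook: it rebuilds a two-column version of the animals table and selects rows with `.loc` (by index label) and with a boolean condition. The variable names `dog_row` and `many_legs` are chosen here only for illustration.

```python
import pandas as pd

# Rebuild a two-column version of the animals table from the examples above.
animals = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "num_wings": [2, 0, 0, 0]},
    index=["falcon", "dog", "spider", "fish"],
)

# Select one row by its index label; the result is a Series.
dog_row = animals.loc["dog"]
print(dog_row["num_legs"])  # 4

# Select all rows that satisfy a condition (boolean mask).
many_legs = animals[animals["num_legs"] > 2]
print(list(many_legs.index))  # ['dog', 'spider']
```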
A DataFrame is a fairly complex structure that carries a lot of extra information (e.g. indexes, column
names, etc.). When you only need the numbers, you can convert the DataFrame into another object,
a NumPy ndarray:
In [6]: X_np = X.values
print(type(X_np))
<class 'numpy.ndarray'>
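As a side note, the pandas documentation recommends the `DataFrame.to_numpy()` method over the `.values` attribute for this conversion. A short sketch, assuming the same two-column table as above, shows that both give the same array:

```python
import numpy as np
import pandas as pd

# The same two-column table as X in the example above.
X = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "num_wings": [2, 0, 0, 0]},
    index=["falcon", "dog", "spider", "fish"],
)

# to_numpy() drops the row and column labels and keeps only the numbers.
X_np = X.to_numpy()
print(type(X_np))   # <class 'numpy.ndarray'>
print(X_np.shape)   # (4, 2)

# Both ways of converting give the same values.
print(np.array_equal(X_np, X.values))  # True
```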
1.2 NumPy
NumPy is the fundamental package for scientific computing with Python. ndarray is one of the
most important classes in NumPy. It is a powerful N-dimensional array object. For instance, you
can use it to store a matrix:
In [7]: X_np
Out[7]: array([[2, 2],
[4, 0],
[8, 0],
[0, 0]])
This class has plenty of methods and attributes. The simplest one is the shape attribute, which
gives the size of the ndarray along each of its dimensions:
In [8]: X_np.shape
Out[8]: (4, 2)
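Since `shape` is an ordinary tuple, it can be unpacked into separate variables. The sketch below (an addition to the original notebook, with the matrix typed in by hand) also shows two related attributes, `ndim` and `size`:

```python
import numpy as np

# The same matrix as X_np above, typed in directly.
X_np = np.array([[2, 2], [4, 0], [8, 0], [0, 0]])

# shape is a tuple with one entry per dimension, so it can be unpacked.
n_rows, n_cols = X_np.shape
print(n_rows, n_cols)  # 4 2

print(X_np.ndim)  # 2 (number of dimensions)
print(X_np.size)  # 8 (total number of elements)
```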
You can create new ndarrays by converting Python objects into them or by using functions
provided in NumPy:
In [9]: a = np.array([2, 4, 8, 0])
print(a, type(a))
[2 4 8 0] <class 'numpy.ndarray'>
In [10]: b = np.ones(shape=(1, 4))
print(b, type(b))
[[1. 1. 1. 1.]] <class 'numpy.ndarray'>
In [11]: c = np.zeros(shape=(5, 4))
print(c, type(c))
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]] <class 'numpy.ndarray'>
In [12]: d = np.zeros_like(X_np)
print(d, type(d))
[[0 0]
[0 0]
[0 0]
[0 0]] <class 'numpy.ndarray'>
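Note that the outputs above differ in element type: `np.ones` and `np.zeros` print `1.` and `0.` because they create floating-point arrays by default, while `np.zeros_like` copies the (integer) type of its argument. A small sketch of how to check and control this with `dtype`; the name `b_int` is introduced here only for illustration:

```python
import numpy as np

b = np.ones(shape=(1, 4))
d = np.zeros_like(np.array([[2, 2], [4, 0]]))

print(b.dtype)  # float64
print(d.dtype)  # an integer type (the exact width depends on the platform)

# Pass dtype explicitly to get an integer array of ones.
b_int = np.ones(shape=(1, 4), dtype=int)
print(b_int)  # [[1 1 1 1]]
```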
You can also change an ndarray in many ways, for example by changing its shape (the sizes of
its dimensions):
In [13]: print(a.reshape(2, 2))
[[2 4]
[8 0]]
In [14]: print(a.reshape(2, -1)) # -1 means "count it for me"
[[2 4]
[8 0]]
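Reshaping only works when the new shape holds exactly the same number of elements. The sketch below, added for illustration, uses `-1` for the row count and then shows the `ValueError` raised for an incompatible shape:

```python
import numpy as np

a = np.array([2, 4, 8, 0])

# -1 can stand for any one dimension: here NumPy infers 4 rows.
column = a.reshape(-1, 1)
print(column.shape)  # (4, 1)

# 4 elements cannot be split into 3 equal rows, so this raises ValueError.
try:
    a.reshape(3, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```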
In [15]: e = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(e.T) # .T means transpose
[[1 5]
[2 6]
[3 7]
[4 8]]
In [16]: print("Transposing converts an object of shape", e.shape, "into an object of shape", e.T.shape, ".")
Transposing converts an object of shape (2, 4) into an object of shape (4, 2) .
In [17]: print("X_np\n", X_np)
print("d\n", d)
print("result\n", np.concatenate([X_np, d], axis=1))
X_np
[[2 2]
[4 0]
[8 0]
[0 0]]
d
[[0 0]
[0 0]
[0 0]
[0 0]]
result
[[2 2 0 0]
[4 0 0 0]
[8 0 0 0]
[0 0 0 0]]
In [18]: print("result\n", np.concatenate([X_np, d], axis=0)) # depending on the specified axis
result
[[2 2]
[4 0]
[8 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]]
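A quick way to see what the `axis` argument does in the two concatenation examples above is to compare the shapes of the results. This is a small sketch with the arrays typed in by hand; the names `side_by_side` and `stacked` are introduced here only for illustration:

```python
import numpy as np

X_np = np.array([[2, 2], [4, 0], [8, 0], [0, 0]])
d = np.zeros_like(X_np)

# axis=1 joins the arrays side by side: the number of columns grows.
side_by_side = np.concatenate([X_np, d], axis=1)
print(side_by_side.shape)  # (4, 4)

# axis=0 stacks the arrays on top of each other: the number of rows grows.
stacked = np.concatenate([X_np, d], axis=0)
print(stacked.shape)  # (8, 2)
```

All dimensions other than the one being joined must match, otherwise `np.concatenate` raises a `ValueError`.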
You can also perform matrix and vector operations over ndarray objects:
In [19]: v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print(v1, "+", v2, "=", v1 + v2)
print(v1, "-", v2, "=", v1 - v2)
[1 2 3] + [4 5 6] = [5 7 9]
[1 2 3] - [4 5 6] = [-3 -3 -3]
In [20]: print("<", v1, ", ", v2.T, ".T> =", v1.dot(v2.T))
< [1 2 3] , [4 5 6] .T> = 32
In [21]: print(v1, "*", v2, "=", v1 * v2)
[1 2 3] * [4 5 6] = [ 4 10 18]
In [22]: print(v1, "^2 =", v1**2)
[1 2 3] ^2 = [1 4 9]
In [23]: print(v1, "* 2 =", v1 * 2)
[1 2 3] * 2 = [2 4 6]
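The examples above use vectors only. As a sketch of a matrix operation, the `@` operator computes a matrix product, in contrast to `*`, which multiplies element by element. The vector `w` below is a hypothetical set of weights, not something from the original notebook:

```python
import numpy as np

X_np = np.array([[2, 2], [4, 0], [8, 0], [0, 0]])
w = np.array([1, 10])  # hypothetical weights, one per column of X_np

# Matrix-vector product: one number per row of X_np.
product = X_np @ w
print(product)  # [22  4  8  0]

# For two 1-D vectors, @ gives the same scalar as .dot().
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print(v1 @ v2)  # 32
```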
Here are cheat sheets for each library. Feel free to use them while completing your final project:
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf