4 Introduction to Python Part 3(1)
Ammar Hasan
Department of Electrical Engineering
College of Engineering
Prepared by Dr. Tamer Shanableh, CSE and Dr. Jamal A. Abdalla, CVE
Material mainly based on “Python for Programmers” by Paul Deitel and
Harvey Deitel, Pearson; Illustrated edition, ISBN-10 : 0135224330
Python Libraries
NumPy Library
Pandas
DataFrames
Python Libraries
Popular libraries in Python for data science (we will use the highlighted ones in this course):
Python Libraries for Data Processing and Model Deployment
• Pandas
• NumPy
• SciPy
• Sci-Kit Learn
• PyCaret
• Tensorflow
• OpenCV
Python Libraries for Data Mining and Data Scraping
• SQLAlchemy
• Scrapy
• BeautifulSoup
Python Libraries for Data Visualization
• Matplotlib
• Ggplot
• Plotly
• Altair
• Seaborn
Source: https://www.projectpro.io/article/top-5-libraries-for-data-science-in-python/196
Importing Libraries
import numpy
myarr = numpy.array([1,2,3,4])
import numpy as np
myarr = np.array([1,2,3,4])
Importing a Specific Object
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
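A minimal sketch of importing specific objects from a library rather than the whole library. The import line for this slide did not survive, so the choice of array and mean from NumPy is only an assumption for illustration; x and y are the lists from the slide.

```python
# Import only the names we need instead of the whole library
from numpy import array, mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# The imported names are used directly, without a "numpy." or "np." prefix
x_arr = array(x)
y_arr = array(y)

print(x_arr + y_arr)   # element-wise sum of the two arrays
print(mean(y_arr))     # average of the values in y
```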
NumPy Library
The NumPy Library
import numpy as np
arr_2D = np.array([
[10, 20, 30, 4],
[2, 8, 2, 4],
[30, 12, 67, 44],
[24, 10, 32, 0]
])
print(arr_2D)
print('Shape: ', arr_2D.shape)  # prints the dimensions of the array
NumPy 2D Arrays
import numpy as np
import matplotlib.pyplot as plt
# Create a smiley
smiley_array = np.array([
[1, 1, 1, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 1, 1, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 1],
])
print(smiley_array)
plt.imshow(smiley_array, cmap='binary')
plt.show()
Reshaping NumPy Arrays
You can use the NumPy reshape function to transform a 1D array into a
multidimensional array; the elements are filled in row by row.
Example: we can reshape a 12-element 1D array into a 4x3 2D array.
Reshaping a 12-element 1D array into a 4x4 2D array will not work, because a
4x4 array needs 16 elements; the attempt raises an error.
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
print('arr contains: \n', arr)
arr_2D = arr.reshape(4,3)
print('arr_2D contains: \n', arr_2D)
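The failing case described above can be sketched as follows. This is only an illustration: the use of np.arange, the -1 placeholder, and the variable names arr_4x3 and reshape_failed are assumptions that are not in the original slides.

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1, 2, ..., 12

# Passing -1 asks NumPy to work out that dimension from the array size
arr_4x3 = arr.reshape(4, -1)
print(arr_4x3.shape)

# 12 elements cannot fill a 4x4 array (16 slots), so reshape raises ValueError
try:
    arr.reshape(4, 4)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('Reshape failed:', err)
```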
Transposing NumPy Arrays
You can use the NumPy transpose function, np.transpose, to swap the rows and
columns of a 2D array.
The first row becomes the first column, the second row becomes the second column,
and so forth.
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
print('arr contains: \n', arr)
arr_2D = arr.reshape(4,3)
print('arr_2D contains: \n', arr_2D)
#------------------------------------
arr_2D_transposed = np.transpose(arr_2D)
print('arr_2D_transposed contains: \n',
arr_2D_transposed)
NumPy Sorting
We can use the sum, min, max, mean, std, and var
functions on NumPy arrays. The example below sets up a 2D array of grades to use with these functions.
import numpy as np
grades = np.array([[87,96, 70], [100, 87, 90], [94,77,
90],[100, 81, 82]])
print('The grades are: \n', grades)
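A minimal sketch of applying these functions to the grades array. Reading each row as one student and each column as one exam is an assumption made here for illustration, as are the variable names.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

# Without an axis argument, the function uses every element of the array
total = grades.sum()
print('Sum of all grades:', total)
print('Min:', grades.min(), 'Max:', grades.max())
print('Mean:', grades.mean(), 'Std:', grades.std(), 'Var:', grades.var())

# axis=0 works down the rows, giving one result per column
col_sums = grades.sum(axis=0)
print('Sum per column:', col_sums)

# axis=1 works across the columns, giving one result per row
row_means = grades.mean(axis=1)
print('Mean per row:', row_means)
```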
Pandas is a commonly used library for dealing with such data.
It provides support for:
Series: for 1D collections (enhanced 1D array).
DataFrames: for 2D collections (enhanced 2D array).
Pandas Series and DataFrames (2/2)
[Figure: a Series pairs each index with one value; a DataFrame pairs each index with a row of values under column headers.]
Pandas Series
Pandas Series (1/2)
import pandas as pd
grades = pd.Series([87, 100, 94])
print('Grades Series:\n',grades)
print('First grade: ',grades[0])
import pandas as pd
grades = pd.Series([87, 100, 94],
index=['First', 'Second', 'final'])
print(grades)
Output:
First 87
Second 100
final 94
Accessing Series Using String Indices
In the previous example, a Series with custom indices can be accessed via
square brackets [ ] containing a custom index value:
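import pandas as pd
grades = pd.Series([87, 100, 94], index=['First', 'Second', 'final'])
# Access by custom index label
print('Grade of first = ', grades['First'])
# or access by position using .iloc
print('Grade of first = ', grades.iloc[0])
print('Series values are: ', grades.values)
print('Series indices are: ', grades.index)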
Output:
Grade of first = 87
Grade of first = 87
Series values are: [ 87 100 94]
Series indices are: Index(['First', 'Second', 'final'],
dtype='object')
Pandas DataFrames
• Pandas provides a read_csv() function to read data stored as a .csv file into
a pandas DataFrame.
• Pandas supports many different file formats, including CSV and Excel:
• myDataFrame = pd.read_csv("myfile.csv")
• myDataFrame = pd.read_excel("myfile.xlsx")
• After reading a file, you can display the first 5 rows using
myDataFrame.head() and the last 5 rows using myDataFrame.tail()
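A minimal sketch of reading CSV data and inspecting both ends of the resulting DataFrame. The CSV content and column names are hypothetical, and io.StringIO stands in for a real file such as "myfile.csv" so that the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as "myfile.csv"
csv_text = "name,grade\nAli,87\nSara,100\nOmar,94\nLina,77\nNoor,90\nZaid,81\n"
myDataFrame = pd.read_csv(io.StringIO(csv_text))

first_rows = myDataFrame.head()    # first 5 rows by default
last_rows = myDataFrame.tail(2)    # last 2 rows; tail() alone returns the last 5
print(first_rows)
print(last_rows)
```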
Creating DataFrames From Files in Colab
• We will use the Iris sample data, which contains information on 150
Iris flowers, 50 from each of three Iris species: Setosa,
Versicolour, and Virginica.
• Each flower is characterized by four numeric attributes:
1. sepal_length in centimeters
2. sepal_width in centimeters
3. petal_length in centimeters
4. petal_width in centimeters
• Each flower belongs to one class (species), which is stored in the fifth and
last column of the DataFrame:
(Setosa, Versicolour, Virginica)
Data is available online at: https://archive.ics.uci.edu/dataset/53/iris
Iris Flowers Dataset
Creating DataFrames From Internet Files (2/3)
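import pandas as pd
# Read the Iris data from the web archive (the file has no header row)
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
# Alternatively, read a local copy of the file:
# data = pd.read_csv('iris.data', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
# Display the first 5 rows to make sure that the reading was successful
data.head()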
Creating DataFrames From Internet Files (3/3)
The output:
DataFrames Indexing
Accessing DataFrame’s Columns and Rows (1/4)
#Access one column using a header's name
print('petal_length columns:\n', data['petal_length'])
#Access one row using the .iloc function
print('\n\nFirst row:')
print(data.iloc[0])
Output:
petal_length columns:
0 1.4
1 1.4
2 1.3
3 1.5
4 1.4
...
145 5.2
146 5.0
147 5.2
148 5.4
149 5.1
First row:
sepal_length 5.1
sepal_width 3.5
petal_length 1.4
petal_width 0.2
class Iris-setosa
Accessing DataFrame’s Columns and Rows (2/4)
print('\n\nFirst 5 rows:')
#Print rows 0 up to but not including row 5, and columns 0, 1 and the last column
#.iloc[ rows from:to , [column indices] ]
print(data.iloc[0:5, [0, 1, -1]])
• With Boolean indexing, you can use the .loc indexer to filter rows according to Boolean
criteria.
import pandas as pd
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
# data = pd.read_csv('iris.data')
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
#Select rows where sepal_length >= 5.0 AND sepal_width >= 3.5
rst = data.loc[(data.sepal_length >= 5.0) & (data.sepal_width >= 3.5)]
print('Select row where sepal_length >= 5.0 & data.sepal_width >= 3.5')
print(rst.head())
DataFrames Boolean Indexing (5/5)
Select row where sepal_length >= 5.0 & data.sepal_width >= 3.5
    sepal_length  sepal_width  petal_length  petal_width        class
0            5.1          3.5           1.4          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
10           5.4          3.7           1.5          0.2  Iris-setosa
14           5.8          4.0           1.2          0.2  Iris-setosa
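The same filtering idea extends to OR and NOT conditions. The sketch below uses a small hypothetical DataFrame instead of the full Iris file so that it runs offline; the variable names are illustrative only. Each comparison produces a Boolean Series, and the element-wise operators &, | and ~ combine them (the Python keywords and, or and not do not work element-wise on a Series).

```python
import pandas as pd

# Hypothetical sample rows using two of the Iris column names
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 5.0, 6.3],
    'sepal_width':  [3.5, 3.0, 3.6, 3.3],
})

long_sepal = df.sepal_length >= 5.0    # Boolean Series, one value per row
wide_sepal = df.sepal_width >= 3.5

both = df.loc[long_sepal & wide_sepal]      # rows where both conditions hold
either = df.loc[long_sepal | wide_sepal]    # rows where at least one holds
not_long = df.loc[~long_sepal]              # rows where the first does not hold

print(both)
print(either)
print(not_long)
```

When the comparisons are written inline, as in the slide's code, each one needs its own parentheses because & binds more tightly than >=.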
Summary of Four Types of Indexing in DataFrames
import pandas as pd
#Retrieve data from web archive and add column headers
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
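The surviving slide text does not list the four types of indexing. Assuming they are the ones demonstrated earlier in this section (a column by header name, a row by position, a row and column selection by position, and a Boolean filter), they can be sketched side by side as follows. A small hypothetical DataFrame replaces the downloaded file so that the example runs offline.

```python
import pandas as pd

# Hypothetical rows using three of the Iris column names
data = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 6.3],
    'sepal_width':  [3.5, 3.0, 3.3],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica'],
})

# 1. One column, selected by its header name
col = data['sepal_length']

# 2. One row, selected by its position
row = data.iloc[0]

# 3. A range of rows and a list of column positions
block = data.iloc[0:2, [0, -1]]

# 4. Rows that satisfy a Boolean condition
filtered = data.loc[data.sepal_length >= 5.0]

print(col, row, block, filtered, sep='\n\n')
```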
DataFrames Statistics
DataFrames Statistics (1/2)
As with Series, you can use the mean(), min(), max(), std(), and var()
methods on DataFrames.
In DataFrames, the statistics are calculated per column (for the numeric columns
only).
# numeric_only=True skips the non-numeric 'class' column
print('Avg per col:')
print(data.mean(numeric_only=True))
print('Std per col:')
print(data.std(numeric_only=True))
print('Min per col:')
print(data.min(numeric_only=True))
print('Max per col:')
print(data.max(numeric_only=True))
Output:
Avg per col:
sepal_length 5.843333
sepal_width 3.054000
petal_length 3.758667
petal_width 1.198667
Std per col:
sepal_length 0.828066
sepal_width 0.433594
petal_length 1.764420
petal_width 0.763161
…
There are cases where you need to convert a DataFrame into a NumPy array
and vice versa.
This is needed in machine learning tasks such as classification and regression, which
you will study next.
Let us start by converting a DataFrame into a NumPy array using the to_numpy()
function.
import pandas as pd
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
# Convert the DataFrame into a NumPy array
numpy_from_dataFrame = data.to_numpy()
print(numpy_from_dataFrame)
dataFrame_from_numpy = pd.DataFrame(numpy_from_dataFrame,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
dataFrame_from_numpy.head()
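A minimal round-trip sketch of both conversions, using a small hypothetical DataFrame so that it runs offline. The variable names as_array, features and back are illustrative only. Note that a DataFrame mixing numbers and strings converts to an array of dtype object, so numeric work usually starts from the numeric columns alone.

```python
import pandas as pd

# Hypothetical two-row sample with one numeric and one string column
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9],
    'class': ['Iris-setosa', 'Iris-setosa'],
})

# DataFrame -> NumPy array; mixed column types give a generic object array
as_array = df.to_numpy()
print(as_array.dtype)

# Selecting only the numeric column keeps a floating-point array
features = df[['sepal_length']].to_numpy()
print(features.dtype)

# NumPy array -> DataFrame; the column names must be supplied again
back = pd.DataFrame(as_array, columns=df.columns)
print(back)
```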
Converting Dictionaries to DataFrames
Other Ways of Creating DataFrames (1/2)
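The slide content for this topic did not survive, so the following is only a sketch of one common way to build a DataFrame from a dictionary. The names and grades are hypothetical. Each dictionary key becomes a column header, each list becomes that column's values, and the optional index argument labels the rows.

```python
import pandas as pd

# Hypothetical data: one key per student, one list of test grades per key
grades_dict = {
    'Ali':  [87, 96, 70],
    'Sara': [100, 87, 90],
}

# Keys become column headers; index supplies custom row labels
grades_df = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])
print(grades_df)

# A single value can then be read by row label and column header
print(grades_df.loc['Test2', 'Sara'])
```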