4 Introduction to Python Part 3(1)

This document serves as an introduction to Python programming, focusing on the libraries NumPy and Pandas, which are essential for data science and artificial intelligence. It covers the basics of importing libraries, creating and manipulating arrays with NumPy, and introduces Pandas for handling heterogeneous data through Series and DataFrames. The material is prepared for a course at the American University of Sharjah and is based on the book 'Python for Programmers' by Paul Deitel and Harvey Deitel.

Uploaded by

Yusra Eltilib
Copyright
© All Rights Reserved

Introduction to Python Programming – Part 3

Python Libraries: NumPy and Pandas


Intro to AI and Data Science
NGN 112 – Fall 2024

Ammar Hasan
Department of Electrical Engineering
College of Engineering

American University of Sharjah

Prepared by Dr. Tamer Shanableh, CSE and Dr. Jamal A. Abdalla, CVE
Material mainly based on “Python for Programmers” by Paul Deitel and
Harvey Deitel, Pearson; Illustrated edition, ISBN-10 : 0135224330

Last Updated on: 22nd of August 2024


Table of Contents
2

Python Libraries

NumPy Library

Pandas

DataFrames
Python Libraries
3

Python has many libraries, each of which is a collection of pre-defined functions or pre-written code. A library can be imported into your program, and you can then use all the functions in that library.

You have previously used “import math”, where math is the name of the math library in Python.

A software library is a collection of pre-written code, so that programmers do not have to reinvent the wheel.
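As a small illustration of the idea above, the sketch below imports the math library and uses a function and a constant that it already provides. The radius value is a made-up example.

```python
import math  # the standard-library math module mentioned above

radius = 2.0                   # hypothetical example value
area = math.pi * radius ** 2   # math.pi is a constant provided by the library
root = math.sqrt(16)           # math.sqrt is a pre-written function

print('Area of the circle:', area)
print('Square root of 16:', root)
```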
Python Libraries
4

Popular libraries in Python for Data Science (we will use the highlighted ones in this course):
Python Libraries for Data Processing and Model Deployment
• Pandas
• NumPy
• SciPy
• scikit-learn
• PyCaret
• TensorFlow
• OpenCV
Python Libraries for Data Mining and Data Scraping
• SQLAlchemy
• Scrapy
• BeautifulSoup
Python Libraries for Data Visualization
• Matplotlib
• Ggplot
• Plotly
• Altair
• Seaborn
Source: https://www.projectpro.io/article/top-5-libraries-for-data-science-in-python/196
Importing Libraries
5

Import the whole library:

import numpy
myarr = numpy.array([1,2,3,4])

OR: Import the whole library with an alias:

import numpy as np
myarr = np.array([1,2,3,4])
Importing a Specific Object
6

OR: Import a specific submodule, function, or object from a library:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a line plot
plt.plot(x, y)
plt.show()  # display the figure (optional in notebooks such as Colab)
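The import styles described on these slides can be compared side by side. The sketch below uses the math library only because it needs no installation; the alias m is an arbitrary choice for illustration.

```python
import math                 # import the whole library
import math as m            # import the whole library with an alias
from math import sqrt, pi   # import a specific function and a specific constant

a = math.sqrt(25)   # full library name as prefix
b = m.sqrt(25)      # alias as prefix
c = sqrt(25)        # no prefix needed after "from ... import ..."

print(a, b, c)
print(pi)
```

All three calls run the same function; the styles differ only in how the name is written in your program.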
7

NumPy Library
The NumPy Library
8

• NumPy is a popular open-source library in Python for data science and artificial intelligence.

• It is a standard way of working with numeric data in Python.

• It can be used for creating and manipulating N-dimensional arrays:
▪ 1D for lists of numbers
▪ 2D for tables and grayscale images
▪ 3D for color images (R, G, B)
▪ 4D for videos (a sequence of 3D images)
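The dimensions listed above can be inspected with the ndim and shape attributes of an array. The following is a minimal sketch; the sizes (5 numbers, 32x32 images, 10 frames) are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration
numbers = np.zeros(5)                 # 1D: a list of 5 numbers
gray_image = np.zeros((4, 3))         # 2D: a 4x3 table or grayscale image
color_image = np.zeros((32, 32, 3))   # 3D: height x width x (R, G, B)
video = np.zeros((10, 32, 32, 3))     # 4D: 10 frames of 32x32 color images

arrays = [('numbers', numbers), ('gray_image', gray_image),
          ('color_image', color_image), ('video', video)]
for name, arr in arrays:
    print(name, '-> ndim =', arr.ndim, ', shape =', arr.shape)
```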
Creating NumPy Arrays
9

• Start by importing the NumPy library:

import numpy as np

• Next, create 1D arrays:

# Create a 1D array
numpy_array = np.array([10, 20, 30])

# Create a 1D array from a list of numbers
data = [10, 20, 30, 40, 50]
numpy_array = np.array(data)
NumPy 2D Arrays
10

• Next, let’s create a 2D array. Think of a 2D array as an “array of arrays”, or a matrix.

import numpy as np

arr_2D = np.array([
    [10, 20, 30, 4],
    [2, 8, 2, 4],
    [30, 12, 67, 44],
    [24, 10, 32, 0]
])
print(arr_2D)
print('Shape: ', arr_2D.shape)  # prints the dimensions of the array: (4, 4)
NumPy 2D Arrays
11

import numpy as np
import matplotlib.pyplot as plt

# Create a smiley: each number is one pixel of a 5x5 image
smiley_array = np.array([
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
])

print(smiley_array)
plt.imshow(smiley_array, cmap='binary')  # draw the 2D array as an image
plt.show()
Reshaping NumPy Arrays
12

• You can use the NumPy reshape function to transform a 1D array into a multidimensional array (filled row by row).
• Example: we can reshape a 12-element 1D array into a 4x3 2D array.
• Reshaping a 12-element 1D array into a 4x4 2D array will not work, because a 4x4 array needs 16 elements. NumPy generates an error in this case.

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print('arr contains: \n', arr)

arr_2D = arr.reshape(4, 3)
print('arr_2D contains: \n', arr_2D)
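The failing case mentioned above can be tried safely with try/except. This is a minimal sketch; it also shows that passing -1 for one dimension asks NumPy to infer that dimension from the number of elements. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(1, 13)          # the same 12 elements: 1, 2, ..., 12

ok = arr.reshape(4, 3)          # works: 4 * 3 == 12
inferred = arr.reshape(-1, 3)   # -1 lets NumPy infer the number of rows (4)

try:
    arr.reshape(4, 4)           # fails: 4 * 4 == 16, but there are only 12 elements
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('Reshape failed:', err)

print(ok.shape, inferred.shape)
```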
Transposing NumPy Arrays
14

• You can use the np.transpose function to swap the rows and columns of a 2D array.
• The first row becomes the first column, the second row becomes the second column, and so forth.

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print('arr contains: \n', arr)
arr_2D = arr.reshape(4, 3)
print('arr_2D contains: \n', arr_2D)
#------------------------------------
arr_2D_transposed = np.transpose(arr_2D)
print('arr_2D_transposed contains: \n', arr_2D_transposed)
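As a quick check of the transpose behavior, the sketch below compares np.transpose with the shorthand .T attribute of an array and confirms that the first row of the original becomes the first column of the result.

```python
import numpy as np

arr_2D = np.arange(1, 13).reshape(4, 3)   # same 4x3 array as in the slide

t1 = np.transpose(arr_2D)   # function form
t2 = arr_2D.T               # attribute shorthand for the same operation

print(arr_2D.shape, '->', t1.shape)           # rows and columns are swapped
print(np.array_equal(t1, t2))                 # both forms give the same result
print(np.array_equal(t1[:, 0], arr_2D[0]))    # first row became first column
```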
NumPy Sorting
15

# NumPy example: sort method
# Use the function np.sort(name of array, axis to sort: None|0|1)
import numpy as np

arr_2D = np.array([
    [10, 20, 30, 4],
    [2, 8, 2, 4],
    [30, 12, 67, 44],
    [24, 10, 32, 0]
])
print(arr_2D)

# Sort the whole array
rst = np.sort(arr_2D, axis=None)
print('sort the whole Array: \n', rst)

# Sort row-wise (axis = 1)
rst = np.sort(arr_2D, axis=1)
print('Row-wise sorting: \n', rst)

# Sort column-wise (axis = 0)
rst = np.sort(arr_2D, axis=0)
print('Column-wise sorting: \n', rst)
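To make the effect of the axis argument concrete, here is a minimal sketch on the same array. It also shows that np.sort returns a sorted copy and leaves the original array unchanged. The variable names are illustrative only.

```python
import numpy as np

arr_2D = np.array([
    [10, 20, 30, 4],
    [2, 8, 2, 4],
    [30, 12, 67, 44],
    [24, 10, 32, 0]
])

flat_sorted = np.sort(arr_2D, axis=None)   # flattens to 1D, then sorts all 16 values
row_sorted = np.sort(arr_2D, axis=1)       # each row is sorted independently
col_sorted = np.sort(arr_2D, axis=0)       # each column is sorted independently

print(flat_sorted)
print(row_sorted[0])     # first row, sorted
print(col_sorted[:, 0])  # first column, sorted
print(arr_2D[0])         # the original array is not modified
```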
NumPy Calculation Functions
16

• We can use the sum, min, max, mean, std, and var functions on NumPy arrays. An example of using sum is shown below.

import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
print('The grades are: \n', grades)

# The result is stored in "total" to avoid hiding Python's built-in sum function
total = grades.sum(axis=1)  # row-wise
print('Summation row-wise:\n', total)

total = grades.sum(axis=0)  # col-wise
print('Summation col-wise:\n', total)

total = grades.sum(axis=None)  # all
print('Summation of all grades:\n', total)
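The other calculation functions listed above take the same axis argument as sum. A minimal sketch with mean, max, and std on the same grades array follows; the variable names are illustrative only.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])

row_means = grades.mean(axis=1)   # one average per row
col_max = grades.max(axis=0)      # one maximum per column
overall_mean = grades.mean()      # no axis given: uses all 12 grades
overall_std = grades.std()        # standard deviation of all 12 grades

print('Mean row-wise:', row_means)
print('Max col-wise:', col_max)
print('Mean of all grades:', overall_mean)
print('Std of all grades:', overall_std)
```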
NumPy Calculation Functions
17

• An example of using min is shown below.

import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
print('The grades are: \n', grades)

# The result is stored in "lowest" to avoid hiding Python's built-in min function
lowest = grades.min(axis=1)  # row-wise
print('min row-wise:\n', lowest)

lowest = grades.min(axis=0)  # col-wise
print('min col-wise:\n', lowest)

lowest = grades.min(axis=None)  # all
print('min of all grades:\n', lowest)
Indexing and Slicing (1/4)
18

• Arrays in NumPy use a zero-indexing scheme. This scheme applies to both row and column indexing.

import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
print('The grades are: \n', grades)

#Select one grade using: grades[row index, col index]
print('grades[0,0] = ', grades[0,0])
print('grades[1,2] = ', grades[1,2])

#Select one row of grades using: grades[row index]
print('grades[3] = ', grades[3])
Indexing and Slicing (2/4)
19

• Multiple rows can be selected from a NumPy array.
• Select multiple sequential rows of grades using array_name[row index from : row index to]. Note that this excludes the row at the "to" index, as shown in the example below.

import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])

print('The grades are: \n', grades)

#Select multiple sequential rows of grades using: grades[row index from : row index to]
print('grades[0:2] = \n', grades[0:2])  # up to but not including row 2
Indexing and Slicing (3/4)
20

• You can select a subset of columns in NumPy arrays:
▪ grades[:,0] means select all rows, column 0
▪ grades[:, 0:2] means select all rows, columns 0,1 (up to but not including 2)

import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
print('The grades are: \n', grades)

print('First column | grades[:,0] = \n', grades[:,0])
print('Last 2 columns | grades[:, 1:3] = \n', grades[:,1:3])

Adapted from https://www.w3resource.com/python-exercises/numpy/python-numpy-exercise-104.php
Indexing and Slicing (4/4)
21

• Python allows negative indices in arrays; they count backwards from the end.
• One particularly important case is accessing the last column using the negative column index ‘-1’.

import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
print('The grades are: \n', grades)

print('First column | grades[:,0] = \n', grades[:,0])
print('Last column | grades[:, -1] = \n', grades[:,-1])
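The indexing and slicing rules from the last four slides can be combined in one expression. The sketch below is a short recap on the same grades array; the variable names are illustrative only.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])

last_row = grades[-1]          # negative row index: the last row
last_col = grades[:, -1]       # all rows, last column
block = grades[1:3, 0:2]       # rows 1 and 2, columns 0 and 1
last_two_cols = grades[:, -2:] # all rows, the last two columns

print(last_row)
print(last_col)
print(block)
print(last_two_cols.shape)
```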
22

Pandas - Series & DataFrames


Revisiting countries and fruits example
23

Which representation is better: with headers or without headers?


Heterogeneous data
24

Heterogeneous means data of different types, e.g. strings and ints.

NumPy arrays are not designed for heterogeneous data.

NumPy arrays do not provide built-in support for missing entries.
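A minimal sketch of what happens when mixed types are placed in one NumPy array: NumPy converts every element to a single common type, here text. The values are made up for illustration.

```python
import numpy as np

numbers = np.array([1, 2, 3])        # homogeneous: all integers
mixed = np.array([1, 'apple', 3.5])  # mixed types: NumPy converts everything to strings

print(numbers.dtype)   # an integer type
print(mixed.dtype)     # a Unicode string type
print(mixed)           # the numbers are now stored as text

# Arithmetic works on the numeric array only
print(numbers * 2)
```

Because the numbers in the mixed array are stored as text, numeric operations on it no longer work as expected, which is one reason to use Pandas for such data.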
Pandas Series and DataFrames (1/2)
25

• NumPy arrays are optimized for homogeneous numeric data.

• However, in machine learning (ML) applications, we need to provide:
▪ Support for heterogeneous types (e.g., numbers and strings).
▪ Support for missing data.
▪ Support for headers and indices (as shown in the next slide).

• Pandas is the commonly used library for dealing with such data.
• It provides support for:
▪ Series: for 1D collections (enhanced 1D array).
▪ DataFrames: for 2D collections (enhanced 2D array).
Pandas Series and DataFrames (2/2)
26

[Figure: anatomy of a DataFrame. The top row holds the column headers. The first column is called the “index”; the rest of the columns are called the “values”.]
Pandas: Python Data Analysis
27
28

Pandas Series
Pandas Series (1/2)
29

• A Series is an enhanced 1D array.
• It can be indexed using integers (like NumPy) or strings.

import pandas as pd
grades = pd.Series([87, 100, 94])
print('Grades Series:\n',grades)
print('First grade: ',grades[0])

Output (index and value):


0 87
1 100
2 94
First grade: 87
Pandas Series (2/2)
30

• A Series provides statistical functions like count, mean, min, max, and std.
• For a full numerical summary, you can use the describe function.

import pandas as pd
grades = pd.Series([87, 100, 94])
print('Grades Series:\n', grades)
print('Count: ', grades.count())
print('Mean: ', grades.mean())
print('Min: ', grades.min())
print('Max: ', grades.max())
print('Std: ', grades.std())

# for an overall summary you can use:
print('Description:\n', grades.describe())
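One detail worth knowing when comparing results with the NumPy functions from earlier slides: the Pandas std function computes the sample standard deviation (dividing by n - 1), while the NumPy std function divides by n by default. The sketch below illustrates this on the same three grades; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

grades = pd.Series([87, 100, 94])

pandas_std = grades.std()                 # sample standard deviation (ddof=1)
numpy_std = np.std(grades.to_numpy())     # population standard deviation (ddof=0)
matched = grades.std(ddof=0)              # ask Pandas to divide by n instead

print('Pandas std:', pandas_std)
print('NumPy std: ', numpy_std)
print('Pandas std with ddof=0:', matched)
```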
Series with a Custom Index
31

• You can use custom indices with the index argument.

import pandas as pd
grades = pd.Series([87, 100, 94],
index=['First', 'Second', 'final'])
print(grades)

Output:

First 87
Second 100
final 94
Accessing Series Using String Indices
32

• A Series with custom indices, as in the previous example, can be accessed via square brackets [ ] containing a custom index value:

import pandas as pd
grades = pd.Series([87, 100, 94], index=['First', 'Second', 'final'])
print('Grade of first = ', grades['First'])  # access by label, or
print('Grade of first = ', grades.iloc[0])   # access by position (grades[0] is deprecated for string indices)

#--You can also access all values and all indices
print('Series values are: ', grades.values)
print('Series indices are: ', grades.index)

Output:
Grade of first = 87
Grade of first = 87
Series values are: [ 87 100 94]
Series indices are: Index(['First', 'Second', 'final'], dtype='object')
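Access by label and access by position can be made explicit with the .loc and .iloc indexers. The sketch below uses the same Series and also shows one difference that is easy to miss: a label slice includes its end label, while a position slice excludes its end position.

```python
import pandas as pd

grades = pd.Series([87, 100, 94], index=['First', 'Second', 'final'])

by_label = grades.loc['Second']     # access by index label
by_position = grades.iloc[1]        # access by integer position

label_slice = grades.loc['First':'Second']   # label slices include the end label
position_slice = grades.iloc[0:2]            # position slices exclude the end position

print(by_label, by_position)
print(label_slice)
print(position_slice)
```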
33

Pandas DataFrames
Pandas DataFrames
34

• DataFrames are enhanced 2D arrays.
• They can have custom indices and headers.
• Each column in a DataFrame is a Series.
Creating DataFrames From Files
35

• Pandas provides a read_csv() function to read data stored as a .csv file into a pandas DataFrame.

• Pandas supports many different file formats including csv and excel:
• myDataFrame = pd.read_csv("myfile.csv")
• myDataFrame = pd.read_excel("myfile.xlsx")

• To save data from DataFrames to files use:
• myDataFrame.to_csv("myOutputFile.csv")
• myDataFrame.to_excel("myOutputFile.xlsx")

• After reading a file, you can display the first 5 rows using myDataFrame.head() and the last 5 rows using myDataFrame.tail()
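The save and load steps above can be tried without any existing file. The following is a minimal sketch that writes a small made-up DataFrame to a temporary CSV file and reads it back; the column names, values, and file name are assumptions for illustration only.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data used only for this round trip
df = pd.DataFrame({'name': ['a', 'b', 'c'], 'score': [87, 100, 94]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.csv')
    df.to_csv(path, index=False)   # index=False: do not write the row index as a column
    df_back = pd.read_csv(path)    # read the file back into a new DataFrame

print(df_back.head(2))   # first 2 rows
print(df_back.tail(2))   # last 2 rows
```

Without index=False, to_csv also writes the row index, which then shows up as an extra unnamed column when the file is read back.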
Creating DataFrames From Files in Colab
36

Click to upload a file

I uploaded this file

df2.to_csv('testFileToWrite.csv') # this will create an output file with .csv extension


Creating DataFrames From Internet Files (1/3)

37

• We will use the Iris sample data, which contains information on 150 Iris flowers, 50 from each of three Iris species: Setosa, Versicolour, and Virginica.
• Each flower is characterized by four attributes:
1. sepal_length in centimeters
2. sepal_width in centimeters
3. petal_length in centimeters
4. petal_width in centimeters
• Each flower belongs to one type (class), which is the last column in the DataFrame: (Setosa, Versicolour, Virginica)
Data is available online at: https://archive.ics.uci.edu/dataset/53/iris
Iris Flowers Dataset
38
Creating DataFrames From Internet Files (2/3)
39

import pandas as pd

#The argument header=None says that this dataset does not contain a header yet, so we will add one next
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)

#Alternative if the file was downloaded locally:
# data = pd.read_csv('iris.data', header=None)

#You can then add column headers
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

#And display the first 5 rows to make sure that the reading is successful
data.head()
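The role of header=None can be seen offline on a tiny in-memory sample. The sketch below assumes two rows copied from the Iris output shown in these slides and reads them from a string instead of a URL; io.StringIO makes the string behave like a file.

```python
import io
import pandas as pd

# Two rows in the same format as iris.data (no header line)
csv_text = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"

# Without header=None, the first data row is wrongly used as the header
wrong = pd.read_csv(io.StringIO(csv_text))

# With header=None, both rows are data and the columns get default integer names
sample = pd.read_csv(io.StringIO(csv_text), header=None)
default_names = sample.columns.tolist()

sample.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

print(wrong.shape, sample.shape)
print(default_names)
print(sample)
```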
Creating DataFrames From Internet Files (3/3)

40

The output:
41

DataFrames Indexing
Accessing DataFrame’s Columns and Rows (1/4)
42

#Access one column using a header’s name
print('petal_length columns:\n', data['petal_length'])

Output:
petal_length columns:
0      1.4
1      1.4
2      1.3
3      1.5
4      1.4
...
145    5.2
146    5.0
147    5.2
148    5.4
149    5.1

#Access one row using the .iloc indexer
print('\n\nFirst row:')
print(data.iloc[0])

Output:
First row:
sepal_length            5.1
sepal_width             3.5
petal_length            1.4
petal_width             0.2
class           Iris-setosa
Accessing DataFrame’s Columns and Rows (2/4)
43

#Access a sequential slice of rows using the .iloc indexer
print('\n\nFirst 5 rows:')
print(data.iloc[0:5])  # up to but not including 5

Output:
First 5 rows:
   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
Accessing DataFrame’s Columns and Rows (3/4)
44

#Access a sequential slice of rows and columns using the .iloc indexer
print('\n\nFirst 5 rows and first 2 columns:')

#print up to but not including row 5, up to but not including col 2
#.iloc[ rows from:to , cols from:to ]
print(data.iloc[0:5, 0:2])

Output:
First 5 rows and first 2 columns:
   sepal_length  sepal_width
0           5.1          3.5
1           4.9          3.0
2           4.7          3.2
3           4.6          3.1
4           5.0          3.6
Accessing DataFrame’s Columns and Rows (4/4)
45

#Access a sequential slice of rows and a list of specific columns using the .iloc indexer
print('\n\nFirst 5 rows; columns 0, 1 and the last column:')

#print up to but not including row 5, and cols 0, 1 and the last column
#.iloc[ rows from:to , [cols indices] ]
print(data.iloc[0:5, [0, 1, -1]])

Output:
   sepal_length  sepal_width        class
0           5.1          3.5  Iris-setosa
1           4.9          3.0  Iris-setosa
2           4.7          3.2  Iris-setosa
3           4.6          3.1  Iris-setosa
4           5.0          3.6  Iris-setosa
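The .iloc indexer selects by integer position, while the related .loc indexer selects by label. The sketch below compares the two on a small made-up DataFrame; the column names and the row labels r1 to r3 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {'length': [5.1, 4.9, 4.7], 'width': [3.5, 3.0, 3.2], 'label': ['a', 'b', 'c']},
    index=['r1', 'r2', 'r3']
)

# By position: rows 0 and 1 (2 is excluded), first and last columns
by_position = df.iloc[0:2, [0, -1]]

# By label: rows 'r1' through 'r2' (the end label is included), named columns
by_label = df.loc['r1':'r2', ['length', 'label']]

print(by_position)
print(by_label)
print(by_position.equals(by_label))
```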
46

DataFrames Boolean Indexing


DataFrames Boolean Indexing (1/5)
47

• Pandas provides a powerful selection feature called Boolean indexing.
• That is, you can use a Boolean expression that returns True/False to filter a DataFrame.
• Let us start by extracting the numeric data from our DataFrame:

data_numeric = data.iloc[:, 0:4]
data_numeric.head()

Output:
   sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
DataFrames Boolean Indexing (2/5)
48

# from the previous slide
data_numeric = data.iloc[:, 0:4]
#Filter the DataFrame, locate values >= 5.0
rst = data_numeric[data_numeric >= 5.0]
rst

Output:
     sepal_length  sepal_width  petal_length  petal_width
0             5.1          NaN           NaN          NaN
1             NaN          NaN           NaN          NaN
2             NaN          NaN           NaN          NaN
3             NaN          NaN           NaN          NaN
4             5.0          NaN           NaN          NaN
...           ...          ...           ...          ...
145           6.7          NaN           5.2          NaN
146           6.3          NaN           5.0          NaN
147           6.5          NaN           5.2          NaN
148           6.2          NaN           5.4          NaN
149           5.9          NaN           5.1          NaN
150 rows × 4 columns

• Pandas checks every element to determine whether its value is greater than or equal to 5.0.
• If True, then it includes the element in the new DataFrame (rst in the example above).
• Elements for which the condition is False are represented as NaN (not a number) in the new DataFrame.
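It can help to look at the intermediate True/False table that the Boolean expression produces before it is used as a filter. A minimal sketch on a small made-up DataFrame (the column names a and b are hypothetical):

```python
import pandas as pd

small = pd.DataFrame({'a': [5.1, 4.9], 'b': [3.5, 5.0]})

mask = small >= 5.0      # a DataFrame of True/False values, one per element
filtered = small[mask]   # keeps elements where mask is True, NaN elsewhere

print(mask)
print(filtered)
print('Number of NaN entries:', filtered.isna().sum().sum())
```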
DataFrames Boolean Indexing (3/5)
49

• In Boolean expressions, you can use:
▪ AND, which is the & operator
▪ OR, which is the | operator
• Wrap each condition in parentheses when combining conditions.

data_numeric = data.iloc[:, 0:4]

rst = data_numeric[data_numeric >= 5.0]
rst.head()

#Other examples (data_numeric >= 3.0) AND (data_numeric <= 5.0):
rst = data_numeric[(data_numeric >= 3.0) & (data_numeric <= 5.0)]
rst.head()

#Other examples (data_numeric < 3.0) OR (data_numeric > 5.0):
rst = data_numeric[(data_numeric < 3.0) | (data_numeric > 5.0)]
rst.head()
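The & and | operators are needed because the Python keywords and / or do not work element by element. The sketch below shows both the working form and the failing form on a small made-up Series.

```python
import pandas as pd

s = pd.Series([2.5, 3.0, 4.2, 5.0, 6.1])   # hypothetical values

both = s[(s >= 3.0) & (s <= 5.0)]     # AND: values between 3.0 and 5.0
either = s[(s < 3.0) | (s > 5.0)]     # OR: values outside that range

try:
    s[(s >= 3.0) and (s <= 5.0)]      # the keyword "and" cannot compare element-wise
    and_failed = False
except ValueError as err:
    and_failed = True
    print('Using "and" fails:', err)

print(both.tolist())
print(either.tolist())
```

The parentheses around each condition matter as well: & and | bind more tightly than comparison operators such as >=, so leaving the parentheses out changes the meaning of the expression.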
DataFrames Boolean Indexing (4/5)
50

• You can use the .loc indexer with a Boolean expression to filter rows according to Boolean criteria.

import pandas as pd

data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
#Alternative if the file was downloaded locally:
# data = pd.read_csv('iris.data', header=None)

data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

#Select rows where sepal_length >= 5.0
rst = data.loc[data.sepal_length >= 5.0]
print('Select row where sepal_length >= 5.0')
print(rst.head())

#Select rows where sepal_length >= 5.0 AND sepal_width >= 3.5
rst = data.loc[(data.sepal_length >= 5.0) & (data.sepal_width >= 3.5)]
print('Select row where sepal_length >= 5.0 & data.sepal_width >= 3.5')
print(rst.head())
DataFrames Boolean Indexing (5/5)
51

Output:

Select row where sepal_length >= 5.0
    sepal_length  sepal_width  petal_length  petal_width        class
0            5.1          3.5           1.4          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
7            5.0          3.4           1.5          0.2  Iris-setosa
10           5.4          3.7           1.5          0.2  Iris-setosa

Select row where sepal_length >= 5.0 & data.sepal_width >= 3.5
    sepal_length  sepal_width  petal_length  petal_width        class
0            5.1          3.5           1.4          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
10           5.4          3.7           1.5          0.2  Iris-setosa
14           5.8          4.0           1.2          0.2  Iris-setosa
Summary of Four Types of Indexing in DataFrames
52
import pandas as pd
#Retrieve data from web archive and add column headers
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Indexing with column header name


print(data['sepal_width'])

# Indexing using .iloc, which is similar to numpy indexing


print(data.iloc[2,1]) # index a particular element
print(data.iloc[2]) # index a row
print(data.iloc[2:5]) # index a range of rows
print(data.iloc[:,4]) # index a column
print(data.iloc[:,0:3]) # index a range of columns
print(data.iloc[20:22,0:3]) # index a range of rows and columns
print(data.iloc[0:7:2,3]) # index a range of rows with step size
print(data.iloc[2,-1]) # index with -1 for last column
print(data.iloc[[0,39,45],3]) # index some specific row numbers

# boolean indexing (element wise)


data_numeric = data.iloc[:,0:4] #retrieve only numeric data
print(data_numeric[data_numeric >= 5.0]) #all elements that satisfy boolean condition
print(data_numeric[(data_numeric >= 3.0) & (data_numeric <= 5.0)])

# boolean indexing (row wise) using .loc


print(data.loc[data.sepal_length >= 7]) #all rows that satisfy boolean condition
print(data.loc[(data.sepal_length >= 7) & (data.petal_length <=5)])
53

DataFrames Statistics
DataFrames Statistics (1/2)
54

• Similar to Series, you can use the describe() function to print out statistics.
• In DataFrames, the statistics are calculated by column (for the numeric columns only).

data.describe()

Output:
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000
DataFrames Statistics (2/2)
55

• Similar to Series, you can use the mean(), min(), max(), std(), and var() functions.
• In DataFrames, the statistics are calculated by column (for the numeric columns only). In pandas 2.0 and later, pass numeric_only=True when the DataFrame also contains non-numeric columns such as class.

print('Avg per col:')
print(data.mean(numeric_only=True))
print('Std per col:')
print(data.std(numeric_only=True))
print('Min per col:')
print(data.min(numeric_only=True))
print('Max per col:')
print(data.max(numeric_only=True))

Output:
Avg per col:
sepal_length    5.843333
sepal_width     3.054000
petal_length    3.758667
petal_width     1.198667

Std per col:
sepal_length    0.828066
sepal_width     0.433594
petal_length    1.764420
petal_width     0.763161
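Per-column statistics can be checked by hand on a small made-up DataFrame. The sketch below assumes three rows with two numeric columns and one text column; numeric_only=True tells Pandas to skip the text column.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column
df = pd.DataFrame({
    'length': [5.0, 6.0, 7.0],
    'width': [3.0, 3.5, 4.0],
    'label': ['a', 'b', 'c']
})

col_means = df.mean(numeric_only=True)   # average of each numeric column
col_std = df.std(numeric_only=True)      # sample standard deviation of each numeric column

print('Avg per col:')
print(col_means)
print('Std per col:')
print(col_std)
```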

56

Converting Numpy to DataFrames


DataFrames <-> NumPy (1/3)
57

• There are cases where you need to convert a DataFrame into a NumPy array and vice versa.
• This is needed in machine learning tasks like classification and regression that you will study next.
• Let us start by converting a DataFrame into a NumPy array using the to_numpy() function.

import pandas as pd
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

#Convert a DataFrame into a NumPy array
numpy_from_dataFrame = data.to_numpy()
#print(numpy_from_dataFrame)

#OR: Convert the first 4 columns of a DataFrame into a NumPy array
numpy_from_dataFrame = data.iloc[:, 0:4].to_numpy()
#print(numpy_from_dataFrame)
DataFrames <-> NumPy (2/3)
58

Output of data.to_numpy():

[[5.1 3.5 1.4 0.2 'Iris-setosa']
 [4.9 3.0 1.4 0.2 'Iris-setosa']
 [4.7 3.2 1.3 0.2 'Iris-setosa']
 [4.6 3.1 1.5 0.2 'Iris-setosa']
 [5.0 3.6 1.4 0.2 'Iris-setosa']
 [5.4 3.9 1.7 0.4 'Iris-setosa']
 [4.6 3.4 1.4 0.3 'Iris-setosa']
 [5.0 3.4 1.5 0.2 'Iris-setosa']
 [4.4 2.9 1.4 0.2 'Iris-setosa']
 [4.9 3.1 1.5 0.1 'Iris-setosa']
 [5.4 3.7 1.5 0.2 'Iris-setosa']
 [4.8 3.4 1.6 0.2 'Iris-setosa']
 …

Output of data.iloc[:, 0:4].to_numpy():

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 …
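The difference between the two outputs is also visible in the dtype of the resulting arrays. The sketch below uses a small made-up DataFrame: converting mixed columns gives a general-purpose object array, while converting only the numeric columns gives a float array that supports numeric operations.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column
df = pd.DataFrame({'length': [5.1, 4.9], 'width': [3.5, 3.0], 'label': ['x', 'y']})

mixed = df.to_numpy()                   # numbers and text together
numeric = df.iloc[:, 0:2].to_numpy()    # numeric columns only

print(mixed.dtype)      # object: a general container type
print(numeric.dtype)    # float64: a numeric type
print(numeric.sum())    # numeric operations work on the float array
```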
DataFrames <-> NumPy (3/3)
59

• To convert a NumPy array into a DataFrame, we can use the command pd.DataFrame().
• Notice how you can add column names (the headers) using the argument columns=[…].

# numpy_from_dataFrame must be the full 5-column array produced by data.to_numpy()
dataFrame_from_numpy = pd.DataFrame(numpy_from_dataFrame,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

dataFrame_from_numpy.head()
60

Converting Dictionaries to
DataFrames
Other Ways of Creating DataFrames (1/2)
61

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Gender": ["male", "male", "female"]
    }
)
print(df)
df.describe()

Output:
                       Name  Age  Gender
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female

             Age
count   3.000000
mean   38.333333
std    18.230012
min    22.000000
25%    28.500000
50%    35.000000
75%    46.500000
max    58.000000

https://pandas.pydata.org/
Other Ways of Creating DataFrames (2/2)
62

• The dictionary’s keys become the column names (headers).
• The values become the element values in the corresponding column.

#You can create a DataFrame from an existing dictionary as follows
import pandas as pd
my_dictionary = {
    "Name": [
        "Dr. Sami Batata",
        "Prof. Marwa Halawah",
        "Mr. Fawzi Kamal"
    ],
    "Age": [29, 40, 60],
    "Gender": ["male", "female", "male"]
}
df = pd.DataFrame(my_dictionary)
print(df)

Output:
                  Name  Age  Gender
0      Dr. Sami Batata   29    male
1  Prof. Marwa Halawah   40  female
2      Mr. Fawzi Kamal   60    male
