Introduction to Python Programming – Part 3
Python Libraries: NumPy and Pandas
Intro to AI and Data Science
NGN 112 – Fall 2024
Ammar Hasan
Department of Electrical Engineering
College of Engineering
American University of Sharjah
Prepared by Dr. Tamer Shanableh, CSE and Dr. Jamal A. Abdalla, CVE
Material mainly based on “Python for Programmers” by Paul Deitel and
Harvey Deitel, Pearson; Illustrated edition, ISBN-10 : 0135224330
Last Updated on: 22nd of August 2024
Table of Contents
Python Libraries
NumPy Library
Pandas
DataFrames
Python Libraries
Python has many libraries, each of which is a collection of predefined
functions and pre-written code. A library can be imported
into your program, after which you can use all the functions in that
library.
You have previously used “import math” where math is the
name for the math library in Python.
A software library is a collection of pre-written code such that
programmers do not reinvent the wheel.
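As a small illustration of not reinventing the wheel, the sketch below computes the hypotenuse of a right triangle twice: once with a hand-written formula and once with the hypot function from the math library. The side lengths 3 and 4 are made-up values chosen only for this example.

```python
import math

a, b = 3.0, 4.0  # made-up side lengths, for illustration only

# Writing the formula ourselves
manual = (a ** 2 + b ** 2) ** 0.5

# Reusing the pre-written library function
from_library = math.hypot(a, b)

print('Manual:', manual)
print('math.hypot:', from_library)
```

Both approaches give the same answer; the library version is shorter and already tested.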
Python Libraries
Popular libraries in Python for data science (we will use the highlighted ones in this course):
Python Libraries for Data Processing and Model Deployment
• Pandas
• NumPy
• SciPy
• scikit-learn
• PyCaret
• TensorFlow
• OpenCV
Python Libraries for Data Mining and Data Scraping
• SQLAlchemy
• Scrapy
• BeautifulSoup
Python Libraries for Data Visualization
• Matplotlib
• Ggplot
• Plotly
• Altair
• Seaborn
Source: https://www.projectpro.io/article/top-5-libraries-for-data-science-in-python/196
Importing Libraries
Import the whole library:
import numpy
myarr = numpy.array([1,2,3,4])
OR: Import the whole library with an alias:
import numpy as np
myarr = np.array([1,2,3,4])
Importing a Specific Object
OR: Import a specific module from a library, optionally with an alias (here, the pyplot module of Matplotlib):
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Create a line plot
plt.plot(x, y)
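A further form, not shown on the slide above, uses from ... import ... to pull individual names out of a library so they can be used without any prefix. A minimal sketch with the math library (the radius of 2 is an arbitrary example value):

```python
from math import sqrt, pi

# sqrt and pi can now be used directly, without the "math." prefix
print(sqrt(16))

radius = 2  # arbitrary example value
area = pi * radius ** 2
print(area)
```

This style keeps code short, but the alias style (np., plt.) makes it easier to see which library a function comes from.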
NumPy Library
The NumPy Library
• NumPy is a popular open-source library in Python for
data science and artificial intelligence.
• It is a standard way of working with numeric data in
Python.
• It can be used for creating and manipulating N-
dimensional arrays
▪ 1D for lists of numbers
▪ 2D for tables and grayscale images
▪ 3D for images (R, G, B)
▪ 4D for videos (a sequence of 3D images)
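The dimensions listed above can be inspected with the ndim and shape attributes of an array. The sketch below builds one zero-filled array per case; all sizes (5 numbers, 4x6 pixels, 10 frames) are placeholder values chosen for illustration.

```python
import numpy as np

numbers = np.zeros(5)                # 1D: a list of 5 numbers
gray_image = np.zeros((4, 6))        # 2D: 4 rows x 6 columns
color_image = np.zeros((4, 6, 3))    # 3D: rows x columns x (R, G, B)
video = np.zeros((10, 4, 6, 3))      # 4D: 10 frames, each a 3D image

for name, arr in [('numbers', numbers), ('gray_image', gray_image),
                  ('color_image', color_image), ('video', video)]:
    print(name, '| ndim =', arr.ndim, '| shape =', arr.shape)
```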
Creating NumPy Arrays
Start by importing the NumPy library
import numpy as np
Next, create 1D arrays:
# Create a 1D array
numpy_array = np.array([10,20,30])
# Create a 1D array from a list of numbers
data = [10, 20, 30, 40, 50]
numpy_array = np.array(data)
NumPy 2D Arrays
Next, let’s create a 2D array. Think of a 2D array as an “array of arrays” or
a matrix.
import numpy as np
arr_2D = np.array([
[10, 20, 30, 4],
[2, 8, 2, 4],
[30, 12, 67, 44],
[24, 10, 32, 0]
])
print(arr_2D)
print('Shape: ', arr_2D.shape)  # prints the dimensions of the array
NumPy 2D Arrays
import numpy as np
import matplotlib.pyplot as plt
# Create a smiley
smiley_array = np.array([
[1, 1, 1, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 1, 1, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 1],
])
print(smiley_array)
plt.imshow(smiley_array, cmap='binary')
plt.show()  # display the figure (needed when running outside a notebook)
Reshaping NumPy Arrays
You can use the NumPy reshape function to transform a 1D array into a
multidimensional array (filled row-wise).
Example: we can reshape a 12-element 1D array into a 4x3 2D array.
Reshaping a 12-element 1D array into a 4x4 2D array will not work, because a 4x4
array needs 16 elements; NumPy raises a ValueError.
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
print('arr contains: \n', arr)
arr_2D = arr.reshape(4,3)
print('arr_2D contains: \n', arr_2D)
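The failing 4x4 case can be tried safely by catching the error. The sketch below also shows -1 as a dimension, which asks NumPy to work out that dimension from the total number of elements. The variable names reshape_failed and auto are chosen only for this illustration.

```python
import numpy as np

arr = np.arange(1, 13)  # the 12 values 1..12

try:
    arr.reshape(4, 4)   # 4 x 4 needs 16 elements, but arr has 12
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('Reshape failed:', err)

# -1 lets NumPy infer the number of rows: 12 elements / 3 columns = 4 rows
auto = arr.reshape(-1, 3)
print('Inferred shape:', auto.shape)
```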
Transposing NumPy Arrays
You can use the np.transpose function to swap the rows and columns of a 2D array.
The first row becomes the first column, the second row becomes the second column,
and so forth.
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
print('arr contains: \n', arr)
arr_2D = arr.reshape(4,3)
print('arr_2D contains: \n', arr_2D)
#------------------------------------
arr_2D_transposed = np.transpose(arr_2D)
print('arr_2D_transposed contains: \n',
arr_2D_transposed)
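NumPy arrays also expose the transpose through the .T attribute, which is a common shorthand for np.transpose on 2D arrays. A brief sketch on the same 4x3 data:

```python
import numpy as np

arr_2D = np.arange(1, 13).reshape(4, 3)

# .T returns the transposed array, just like np.transpose(arr_2D)
transposed = arr_2D.T

print('Original shape:  ', arr_2D.shape)
print('Transposed shape:', transposed.shape)
print('First row of the transpose (the original first column):', transposed[0])
```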
NumPy Sorting
# NumPy example: sort method
# Use the function np.sort(name of array, axis to sort: None|0|1)
import numpy as np
arr_2D = np.array([
    [10, 20, 30, 4],
    [2, 8, 2, 4],
    [30, 12, 67, 44],
    [24, 10, 32, 0]
])
print(arr_2D)
# Sort the whole array
rst = np.sort(arr_2D, axis=None)
print('Sort the whole array: \n', rst)
# Sort row-wise (axis = 1)
rst = np.sort(arr_2D, axis=1)
print('Row-wise sorting: \n', rst)
# Sort column-wise (axis = 0)
rst = np.sort(arr_2D, axis=0)
print('Column-wise sorting: \n', rst)
NumPy Calculation Functions
We can use the sum, min, max, mean, std, and var
functions on NumPy arrays. An example using sum is shown below.
import numpy as np
grades = np.array([[87,96, 70], [100, 87, 90], [94,77,
90],[100, 81, 82]])
print('The grades are: \n', grades)
# The result is stored in "total" so that Python's built-in sum() is not overwritten
total = grades.sum(axis=1) # row-wise
print('Summation row-wise:\n', total)
total = grades.sum(axis=0) # col-wise
print('Summation col-wise:\n', total)
total = grades.sum(axis=None) # all
print('Summation of all grades:\n', total)
NumPy Calculation Functions
An example using min is shown below.
import numpy as np
grades = np.array([[87,96, 70], [100, 87, 90], [94,77,
90], [100, 81, 82]])
print('The grades are: \n', grades)
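# The result is stored in "lowest" so that Python's built-in min() is not overwritten
lowest = grades.min(axis=1) # row-wise
print('min row-wise:\n', lowest)
lowest = grades.min(axis=0) # col-wise
print('min col-wise:\n', lowest)
lowest = grades.min(axis=None) # all
print('min of all grades:\n', lowest)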
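The remaining calculation functions follow the same axis convention as sum and min. A short sketch on the same grades array, using mean, max, and std (the result variable names are chosen for this example):

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])

col_mean = grades.mean(axis=0)   # average of each column
row_max = grades.max(axis=1)     # highest grade in each row
overall_std = grades.std()       # standard deviation of all grades

print('Mean col-wise:', col_mean)
print('Max row-wise: ', row_max)
print('Std of all grades:', overall_std)
```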
Indexing and Slicing (1/4)
Arrays in NumPy use a zero-based indexing scheme. This scheme applies to
both row and column indices.
import numpy as np
grades = np.array([[87,96, 70], [100, 87, 90], [94,77,
90],[100,81, 82]])
print('The grades are: \n', grades)
# Select one grade using: grades[row index, col index]
print('grades[0,0] = ', grades[0,0])
print('grades[1,2] = ', grades[1,2])
# Select one row of grades using: grades[row index]
print('grades[3] = ', grades[3])
Indexing and Slicing (2/4)
Multiple rows can be selected from a NumPy array.
Select multiple sequential rows using array_name[start row index : stop row
index]. The row at the stop index is excluded, as shown in the example below.
import numpy as np
grades = np.array([[87,96, 70],[100, 87, 90],[94, 77,
90], [100, 81, 82]])
print('The grades are: \n', grades)
# Select multiple sequential rows of grades using: grades[row index from : row index to]
print('grades[0:2] = \n', grades[0:2])  # up to but not including row 2
Indexing and Slicing (3/4)
You can select a subset of columns in NumPy arrays:
grades[:, 0] means select all rows, column 0
grades[:, 0:2] means select all rows, columns 0 and 1 (up to but not including 2)
import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90],
[94, 77, 90], [100, 81, 82]])
print('The grades are: \n', grades)
print('First column | grades[:,0] = \n', grades[:,0])
print('Last 2 columns | grades[:, 1:3] = \n', grades[:,1:3])
Adapted from https://www.w3resource.com/python-exercises/numpy/python-numpy-exercise-104.php
Indexing and Slicing (4/4)
NumPy arrays, like Python lists, allow negative indices, which count backward from the end.
One particularly important case is accessing the last
column using the negative column index -1.
import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77,
90], [100, 81, 82]])
print('The grades are: \n', grades)
print('First column | grades[:,0] = \n', grades[:,0])
print('Last column | grades[:, -1] = \n', grades[:,-1])
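Negative indices work for rows as well and can be combined with slices. The sketch below uses the same grades array; the variable names are chosen for this illustration.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])

last_row = grades[-1]             # the last row
last_two_cols = grades[:, -2:]    # all rows, last two columns
one_grade = grades[-2, -1]        # second-to-last row, last column

print('Last row:', last_row)
print('Last two columns:\n', last_two_cols)
print('grades[-2, -1] =', one_grade)
```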
Pandas - Series & DataFrames
Revisiting countries and fruits example
Which representation is better: with headers or without headers?
Heterogeneous Data
Heterogeneous means data of different types, e.g., strings and ints.
NumPy arrays are not designed for heterogeneous data: all elements of an array share one data type.
NumPy arrays do not offer convenient support for missing entries.
Pandas Series and DataFrames (1/2)
NumPy arrays are optimized for homogeneous numeric data.
However, in machine learning (ML) applications, we need:
• Support for heterogeneous types (e.g., numbers and strings).
• Support for missing data.
• Support for headers and indices (as shown in the next slide).
Pandas is the commonly used library for dealing with such data.
It provides support for:
• Series: for 1D collections (enhanced 1D array).
• DataFrames: for 2D collections (enhanced 2D array).
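The difference can be seen in a small experiment. In the sketch below, NumPy converts a mixed list to a single type (everything becomes a string), while a pandas DataFrame keeps one type per column and marks the missing grade as NaN. The names and grades are made-up sample data.

```python
import numpy as np
import pandas as pd

# NumPy: mixing a number and a string forces one common type (strings here)
mixed = np.array([87, 'Ali'])
print(mixed, '| dtype:', mixed.dtype)

# pandas: each column keeps its own type, and a missing entry becomes NaN
table = pd.DataFrame({'name': ['Ali', 'Sara'], 'grade': [87, None]})
print(table)
print(table.dtypes)

missing_count = table['grade'].isna().sum()
print('Missing grades:', missing_count)
```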
Pandas Series and DataFrames (2/2)
[Figure: anatomy of a DataFrame. The first column is called the “index” and holds the index values; the remaining columns, each with a header, are called the “values”.]
Pandas: Python Data Analysis
Pandas Series
Pandas Series (1/2)
A Series is an enhanced 1D array.
It can be indexed using integers like NumPy or strings.
import pandas as pd
grades = pd.Series([87, 100, 94])
print('Grades Series:\n',grades)
print('First grade: ',grades[0])
Output (index and value):
0 87
1 100
2 94
First grade: 87
Pandas Series (2/2)
A Series provides statistical functions like count, mean, min, max, and std.
For a full numerical summary, you can use the describe function.
import pandas as pd
grades = pd.Series([87, 100, 94])
print('Grades Series:\n', grades)
print('Count: ', grades.count())
print('Mean: ', grades.mean())
print('Min: ', grades.min())
print('Max: ', grades.max())
print('Std: ', grades.std())
# for an overall summary you can use:
print('Description:\n', grades.describe())
Series with a Custom Index
You can use custom indices with the index argument.
import pandas as pd
grades = pd.Series([87, 100, 94],
index=['First', 'Second', 'final'])
print(grades)
Output:
First 87
Second 100
final 94
Accessing Series Using String Indices
A Series with custom indices, like the one in the previous example, can be accessed via
square brackets [ ] containing a custom index value:
import pandas as pd
grades = pd.Series([87, 100, 94], index=['First',
'Second', 'final'])
print('Grade of first = ', grades['First'])  # or, by position:
print('Grade of first = ', grades.iloc[0])  # grades[0] is deprecated for string-indexed Series
#--You can also access all values and all indices
print('Series values are: ', grades.values)
print('Series indices are: ', grades.index)
Output:
Grade of first = 87
Grade of first = 87
Series values are: [ 87 100 94]
Series indices are: Index(['First', 'Second', 'final'],
dtype='object')
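For Series with custom indices, pandas also offers two explicit indexers: .loc selects by label and .iloc selects by position. The sketch below uses the same grades Series; note that a label slice includes its end label, whereas a position slice excludes its end position.

```python
import pandas as pd

grades = pd.Series([87, 100, 94], index=['First', 'Second', 'final'])

by_label = grades.loc['Second']             # select by index label
by_position = grades.iloc[1]                # select by integer position

label_slice = grades.loc['First':'Second']  # both end labels are included
position_slice = grades.iloc[0:2]           # position 2 is excluded

print(by_label, by_position)
print(label_slice)
print(position_slice)
```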
Pandas DataFrames
Pandas DataFrames
DataFrames are enhanced 2D arrays.
They can have custom indices and headers.
Each column in a DataFrame is a Series.
Creating DataFrames From Files
• Pandas provides a read_csv() function to read data stored as a .csv file into
a pandas DataFrame.
• Pandas supports many different file formats, including CSV and Excel:
• myDataFrame = pd.read_csv("myfile.csv")
• myDataFrame = pd.read_excel("myfile.xlsx")
• To save data from DataFrames to files, use:
• myDataFrame.to_csv("myOutputFile.csv")
• myDataFrame.to_excel("myOutputFile.xlsx")
• After reading a file, you can display the first 5 rows using
myDataFrame.head() and the last 5 rows using myDataFrame.tail()
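The read and write functions can be tried without any existing file. The sketch below writes a small DataFrame to a temporary CSV file and reads it back; the file name scores.csv and the sample data are assumptions made for this illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [90, 85, 77]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.csv')
    df.to_csv(path, index=False)   # index=False: do not write the row index
    loaded = pd.read_csv(path)     # read the same file back

print(loaded.head(2))   # first 2 rows
print(loaded.tail(1))   # last row
```

Passing index=False to to_csv avoids an extra unnamed index column when the file is read back.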
Creating DataFrames From Files in Colab
[Screenshots: uploading a file in Colab (“Click to upload a file”) and the uploaded file shown in the file browser.]
df2.to_csv('testFileToWrite.csv') # this will create an output file with .csv extension
Creating DataFrames From Internet Files (1/3)
• We will use the Iris sample data, which contains information on 150
Iris flowers, 50 from each of three Iris species: Setosa,
Versicolour, and Virginica.
• Each flower is characterized by four numeric attributes:
1. sepal_length in centimeters
2. sepal_width in centimeters
3. petal_length in centimeters
4. petal_width in centimeters
• Each flower belongs to one type (class), which is stored in the last column of the
DataFrame:
(Setosa, Versicolour, Virginica)
Data is available online at: https://archive.ics.uci.edu/dataset/53/iris
Iris Flowers Dataset
Creating DataFrames From Internet Files (2/3)
import pandas as pd
# The argument header=None says that this dataset does not contain a header row, so we will add one next
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
# data = pd.read_csv('iris.data', header=None)  # alternative: read a local copy
# You can then add column headers
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
# And display the first 5 rows to make sure that the reading is successful
data.head()
Creating DataFrames From Internet Files (3/3)
The output:
DataFrames Indexing
Accessing DataFrame’s Columns and Rows (1/4)
# Access one column using a header’s name
print('petal_length columns:\n', data['petal_length'])
Output:
petal_length columns:
0 1.4
1 1.4
2 1.3
3 1.5
4 1.4
...
145 5.2
146 5.0
147 5.2
148 5.4
149 5.1
# Access one row using the .iloc function
print('\n\nFirst row:')
print(data.iloc[0])
Output:
First row:
sepal_length 5.1
sepal_width 3.5
petal_length 1.4
petal_width 0.2
class Iris-setosa
Accessing DataFrame’s Columns and Rows (2/4)
# Access a sequential slice of rows using the .iloc function
print('\n\nFirst 5 rows:')
print(data.iloc[0:5]) # up to but not including 5
First 5 rows:
sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
Accessing DataFrame’s Columns and Rows (3/4)
# Access a sequential slice of rows and columns using the .iloc function
print('\n\nFirst 5 rows and first 2 columns:')
# print up to but not including row 5, up to but not including col 2
# .iloc[ rows from:to , cols from:to ]
print(data.iloc[0:5 , 0:2 ])
First 5 rows and first 2 columns:
sepal_length sepal_width
0 5.1 3.5
1 4.9 3.0
2 4.7 3.2
3 4.6 3.1
4 5.0 3.6
Accessing DataFrame’s Columns and Rows (4/4)
# Access a sequential slice of rows and a list of specific columns using the .iloc function
print('\n\nFirst 5 rows; columns 0, 1, and the last column:')
# print up to but not including row 5, and cols 0, 1 and the last column
# .iloc[ rows from:to , [col indices] ]
print(data.iloc[0:5 , [0,1,-1]])
sepal_length sepal_width class
0 5.1 3.5 Iris-setosa
1 4.9 3.0 Iris-setosa
2 4.7 3.2 Iris-setosa
3 4.6 3.1 Iris-setosa
4 5.0 3.6 Iris-setosa
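The same kind of selection can be written with column names instead of column positions by using .loc. The sketch below uses a three-row sample DataFrame typed in by hand (values copied from the first Iris rows shown above) so that it runs without an internet connection. Note that .loc row slices include the end label.

```python
import pandas as pd

# Small hand-typed sample so the example runs offline
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'sepal_width': [3.5, 3.0, 3.2],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-setosa'],
})

# By position: rows 0 and 1, first and last columns
by_position = sample.iloc[0:2, [0, -1]]

# By label: rows labeled 0 through 1 (inclusive), columns chosen by name
by_label = sample.loc[0:1, ['sepal_length', 'class']]

print(by_position)
print(by_label)
```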
DataFrames Boolean Indexing
DataFrames Boolean Indexing (1/5)
Pandas provides a powerful selection feature called Boolean
indexing.
That is, you can use a Boolean expression that returns True/False
to filter a DataFrame.
Let us start by extracting the numeric data from our DataFrame:
data_numeric = data.iloc[:, 0:4]
data_numeric.head()
sepal_length sepal_width petal_length petal_width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
DataFrames Boolean Indexing (2/5)
# from the previous slide
data_numeric = data.iloc[:, 0:4]
# Filter the DataFrame, locate values >= 5.0
rst = data_numeric[data_numeric >= 5.0]
rst
Output:
sepal_length sepal_width petal_length petal_width
0 5.1 NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 5.0 NaN NaN NaN
... ... ... ... ...
145 6.7 NaN 5.2 NaN
146 6.3 NaN 5.0 NaN
147 6.5 NaN 5.2 NaN
148 6.2 NaN 5.4 NaN
149 5.9 NaN 5.1 NaN
150 rows × 4 columns
• Pandas checks every element to determine whether its value is greater
than or equal to 5.0.
• If True, then it includes it in the new DataFrame (rst in the example above).
• Elements for which the condition is False are represented as NaN (not a number) in
the new DataFrame.
DataFrames Boolean Indexing (3/5)
• In Boolean expressions, you can use:
▪ AND, which is the & operator
▪ OR, which is the | operator
▪ Wrap each condition in parentheses, because & and | take precedence over comparison operators such as >=
data_numeric = data.iloc[:, 0:4]
rst = data_numeric[data_numeric >= 5.0]
rst.head()
# Another example: (data_numeric >= 3.0) AND (data_numeric <= 5.0)
rst = data_numeric[(data_numeric >= 3.0) & (data_numeric <= 5.0)]
rst.head()
# Another example: (data_numeric < 3.0) OR (data_numeric > 5.0)
rst = data_numeric[(data_numeric < 3.0) | (data_numeric > 5.0)]
rst.head()
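It can help to look at the Boolean mask itself before using it as a filter. The sketch below builds the mask on a small hand-typed sample (values taken from the first Iris rows) and then applies it; the variable names mask and filtered are chosen for this illustration.

```python
import pandas as pd

# Small hand-typed sample so the example runs offline
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'sepal_width': [3.5, 3.0, 3.2],
})

# The Boolean expression produces a DataFrame of True/False values
mask = (sample >= 3.2) & (sample <= 5.0)
print(mask)

# Using the mask as an index keeps True elements and replaces the rest with NaN
filtered = sample[mask]
print(filtered)
```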
DataFrames Boolean Indexing (4/5)
• You can use .loc with a Boolean expression to filter rows according to Boolean
criteria.
import pandas as pd
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
# data = pd.read_csv('iris.data', header=None)  # alternative: read a local copy
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
# Select rows where sepal_length >= 5.0
rst = data.loc[ data.sepal_length >= 5.0 ]
print('Select row where sepal_length >= 5.0')
print(rst.head())
# Select rows where sepal_length >= 5.0 AND sepal_width >= 3.5
rst = data.loc[ (data.sepal_length >= 5.0) & (data.sepal_width >= 3.5)]
print('Select row where sepal_length >= 5.0 & data.sepal_width >= 3.5')
print(rst.head())
DataFrames Boolean Indexing (5/5)
Output:
Select row where sepal_length >= 5.0
sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
5 5.4 3.9 1.7 0.4 Iris-setosa
7 5.0 3.4 1.5 0.2 Iris-setosa
10 5.4 3.7 1.5 0.2 Iris-setosa
Select row where sepal_length >= 5.0 & data.sepal_width >= 3.5
sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
5 5.4 3.9 1.7 0.4 Iris-setosa
10 5.4 3.7 1.5 0.2 Iris-setosa
14 5.8 4.0 1.2 0.2 Iris-setosa
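Row-wise conditions can be stored in variables and combined, which keeps longer filters readable. The sketch below uses a four-row hand-typed sample and also shows the ~ operator, which negates a condition (NOT). As in the output above, the selected rows keep their original index labels.

```python
import pandas as pd

# Small hand-typed sample so the example runs offline
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 5.4, 4.6],
    'sepal_width': [3.5, 3.0, 3.9, 3.1],
    'class': ['Iris-setosa'] * 4,
})

long_sepal = sample.sepal_length >= 5.0   # one True/False value per row
wide_sepal = sample.sepal_width >= 3.5

both = sample.loc[long_sepal & wide_sepal]        # AND
neither = sample.loc[~(long_sepal | wide_sepal)]  # NOT (A OR B)

print(both)
print(neither)
```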
Summary of Four Types of Indexing in DataFrames
import pandas as pd
#Retrieve data from web archive and add column headers
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
data.columns=['sepal_length','sepal_width','petal_length','petal_width','class']
# Indexing with column header name
print(data['sepal_width'])
# Indexing using .iloc, which is similar to numpy indexing
print(data.iloc[2,1]) # index a particular element
print(data.iloc[2]) # index a row
print(data.iloc[2:5]) # index a range of rows
print(data.iloc[:,4]) # index a column
print(data.iloc[:,0:3]) # index a range of columns
print(data.iloc[20:22,0:3]) # index a range of rows and columns
print(data.iloc[0:7:2,3]) # index a range of rows with step size
print(data.iloc[2,-1]) # index with -1 for last column
print(data.iloc[[0,39,45],3]) # index some specific row numbers
# boolean indexing (element wise)
data_numeric = data.iloc[:,0:4] #retrieve only numeric data
print(data_numeric[data_numeric >= 5.0]) #all elements that satisfy boolean condition
print(data_numeric[(data_numeric >= 3.0) & (data_numeric <= 5.0)])
# boolean indexing (row wise) using .loc
print(data.loc[data.sepal_length >= 7]) #all rows that satisfy boolean condition
print(data.loc[(data.sepal_length >= 7) & (data.petal_length <=5)])
DataFrames Statistics
DataFrames Statistics (1/2)
Similar to Series, you can use the describe() function to print out
statistics.
In DataFrames, the statistics are calculated by column (for the numeric
columns only).
data.describe()
Output:
sepal_length sepal_width petal_length petal_width
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
DataFrames Statistics (2/2)
Similar to Series, you can use the mean(), min(), max(), std(), and
var() functions.
In DataFrames, the statistics are calculated by column. Pass numeric_only=True to restrict them
to the numeric columns; in pandas 2.0 and later, mean() and std() raise an error if they meet the string column 'class'.
print('Avg per col:')
print(data.mean(numeric_only=True))
print('Std per col:')
print(data.std(numeric_only=True))
print('Min per col:')
print(data.min(numeric_only=True))
print('Max per col:')
print(data.max(numeric_only=True))
Output:
Avg per col:
sepal_length 5.843333
sepal_width 3.054000
petal_length 3.758667
petal_width 1.198667
Std per col:
sepal_length 0.828066
sepal_width 0.433594
petal_length 1.764420
petal_width 0.763161
…
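One detail worth knowing when comparing results across the two libraries: the pandas std() function computes the sample standard deviation (dividing by n - 1) by default, while the NumPy std function divides by n by default. The sketch below shows the difference on a made-up three-value column; the column names length and label are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data: one numeric column and one string column
sample = pd.DataFrame({'length': [2.0, 4.0, 6.0], 'label': ['a', 'b', 'c']})

# numeric_only=True skips the string column
col_means = sample.mean(numeric_only=True)
print(col_means)

pandas_std = sample['length'].std()               # divides by n - 1
numpy_std = np.std(sample['length'].to_numpy())   # divides by n

print('pandas std:', pandas_std)
print('NumPy std: ', numpy_std)
```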
Converting Numpy to DataFrames
DataFrames <-> NumPy (1/3)
There are cases where you need to convert a DataFrame into a NumPy array
and vice versa.
This is needed in machine learning tasks like classification and regression that
you will study next.
Let us start by converting a DataFrame into a NumPy array using the to_numpy()
function.
import pandas as pd
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
#Convert a dataFrame into a numPy array
numpy_from_dataFrame = data.to_numpy()
#print(numpy_from_dataFrame)
#OR: Convert the first 4 columns of a dataFrame into a numPy array
numpy_from_dataFrame = data.iloc[:, 0:4].to_numpy()
#print(numpy_from_dataFrame)
DataFrames <-> NumPy (2/3)
Output of data.to_numpy() (left column) and of data.iloc[:, 0:4].to_numpy() (right column):
[[5.1 3.5 1.4 0.2 'Iris-setosa'] [[5.1 3.5 1.4 0.2]
[4.9 3.0 1.4 0.2 'Iris-setosa'] [4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2 'Iris-setosa'] [4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2 'Iris-setosa'] [4.6 3.1 1.5 0.2]
[5.0 3.6 1.4 0.2 'Iris-setosa'] [5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4 'Iris-setosa'] [5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3 'Iris-setosa'] [4.6 3.4 1.4 0.3]
[5.0 3.4 1.5 0.2 'Iris-setosa'] [5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2 'Iris-setosa'] [4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1 'Iris-setosa'] [4.9 3.1 1.5 0.1]
[5.4 3.7 1.5 0.2 'Iris-setosa'] [5.4 3.7 1.5 0.2]
[4.8 3.4 1.6 0.2 'Iris-setosa'] [4.8 3.4 1.6 0.2]
… …
DataFrames <-> NumPy (3/3)
To convert a NumPy array into a DataFrame, we can use the command
pd.DataFrame().
Notice how you can add column names (which are the headers) using the argument
columns=[…].
# Use the full 5-column array: the number of names in columns must match the number of columns in the array
numpy_from_dataFrame = data.to_numpy()
dataFrame_from_numpy = pd.DataFrame(numpy_from_dataFrame,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
dataFrame_from_numpy.head()
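The round trip can also be checked on a small array that needs no download. The sketch below converts a hand-typed 3x2 array (values taken from the first Iris rows) into a DataFrame and back, then confirms that the values survive unchanged.

```python
import numpy as np
import pandas as pd

# Small hand-typed numeric array so the example runs offline
measurements = np.array([[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]])

# NumPy -> DataFrame: supply one header per column
frame = pd.DataFrame(measurements, columns=['sepal_length', 'sepal_width'])
print(frame)

# DataFrame -> NumPy
back = frame.to_numpy()
print('Same values after the round trip:', np.array_equal(back, measurements))
```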
Converting Dictionaries to DataFrames
Other Ways of Creating DataFrames (1/2)
import pandas as pd
df = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Gender": ["male", "male", "female"]
    }
)
print(df)
df.describe()
Output:
Name Age Gender
0 Braund, Mr. Owen Harris 22 male
1 Allen, Mr. William Henry 35 male
2 Bonnell, Miss. Elizabeth 58 female
Age
count 3.000000
mean 38.333333
std 18.230012
min 22.000000
25% 28.500000
50% 35.000000
75% 46.500000
max 58.000000
Source: https://pandas.pydata.org/
Other Ways of Creating DataFrames (2/2)
The dictionary’s keys become the column names (headers).
The values become the element values in the corresponding column.
# You can create a DataFrame from an existing dictionary as follows
import pandas as pd
my_dictionary = {
    "Name": [
        "Dr. Sami Batata",
        "Prof. Marwa Halawah",
        "Mr. Fawzi Kamal"
    ],
    "Age": [29, 40, 60],
    "Gender": ["male", "female", "male"]
}
df = pd.DataFrame(my_dictionary)
print(df)
Output:
Name Age Gender
0 Dr. Sami Batata 29 male
1 Prof. Marwa Halawah 40 female
2 Mr. Fawzi Kamal 60 male
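A dictionary-based DataFrame can also be given a custom index, just like a Series. The sketch below reuses the names and ages from the example above; the index labels p1, p2, and p3 are made up for this illustration. It also confirms that a single column of a DataFrame is a Series.

```python
import pandas as pd

people = {
    'Name': ['Dr. Sami Batata', 'Prof. Marwa Halawah', 'Mr. Fawzi Kamal'],
    'Age': [29, 40, 60],
}

# index=... assigns custom row labels (made-up labels here)
df = pd.DataFrame(people, index=['p1', 'p2', 'p3'])
print(df)

ages = df['Age']                          # one column of the DataFrame
print(type(ages).__name__)                # the column is a Series
print('Age of p2:', df.loc['p2', 'Age'])  # select by row label and column name
```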