
UNIT - 3 Pandas

Uploaded by

rp402948

UNIT - III

Pandas
What is Pandas?
 Pandas is a Python library used for working with data sets.
 Pandas is used for data analysis in Python and developed by Wes McKinney in 2008.
 Pandas is an open-source library that provides high-performance tools for analyzing,
cleaning, exploring, and manipulating data, and for preparing data for machine learning tasks in Python.
 The name Pandas is derived from the term "panel data", an econometrics term for
multidimensional data sets.

Why Use Pandas?


Pandas is used in Python for the following advantages:

 Pandas allows us to analyze big data and draw conclusions based on statistical theories.
 Pandas can clean messy data sets and make them readable and relevant.
 Relevant data is very important in data science.
 Easily handles missing data
 It uses Series for one-dimensional data structure and DataFrame for multi-dimensional data
structure.
 It provides an efficient way to slice the data
 It provides a flexible way to merge, concatenate or reshape the data

How to Install Pandas?


 The first step in working with pandas is to check whether it is already installed in the
Python environment.
 If it is not, install it using the pip command.
 Open the Command Prompt (type cmd in the search box) and, if pip is not on the system path,
use the cd command to move to the folder where pip is installed.
 Then type the command:
pip install pandas
 After pandas has been installed, you need to import the library. The module is
imported as:
import pandas

Pandas as pd
 Pandas is usually imported under the pd alias.
 alias: In Python, an alias is an alternate name for referring to the same thing.
 Create an alias with the as keyword while importing:
import pandas as pd
 Now the Pandas package can be referred to as pd instead of pandas.

Checking Pandas Version


The version string is stored in the __version__ attribute.

Example

import pandas as pd
print(pd.__version__)
Python Pandas Data Structure

Pandas provides two data structures for processing data, Series and DataFrame, which
are discussed below:

1) Pandas Series
 A Pandas Series is like a column in a table.
 It is defined as a one-dimensional array that is capable of storing various data types.
 The row labels of series are called the index.
 We can easily convert a list, tuple, or dictionary into a Series using the Series() constructor.
Only the data argument is normally needed; the other parameters are optional.
 A Series cannot contain multiple columns.

Syntax:

pandas.Series( data, index, dtype, copy)


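The parameters of the constructor are as follows:
 data: The data can take various forms such as an ndarray, list, dict, or constant (scalar).
 index: Index values must be hashable, and the index must have the same length as data. Duplicate labels are allowed, although unique labels are usually easier to work with.
 dtype: The data type of the Series. If omitted, it is inferred from the data.
 copy: Whether to copy the input data.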
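
The conversions from a list, tuple, and dictionary mentioned above can be sketched as follows. The sample values and variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample values, used only for illustration.
from_list = pd.Series([10, 20, 30])                    # default index 0, 1, 2
from_tuple = pd.Series((1.5, 2.5), index=["x", "y"])   # custom index labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})        # dict keys become the index

print(from_list)
print(from_tuple)
print(from_dict)
```

When a dictionary is passed, no separate index is needed because the keys supply the labels.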
Create an Empty Series

The most basic Series that can be created is an empty Series.

Example

# import the pandas library under the alias pd
import pandas as pd
# pass dtype explicitly; older pandas versions warn when it is omitted
s = pd.Series(dtype='float64')
print(s)

Output:

Series([], dtype: float64)

Note: Calling pd.Series() without a dtype raised the following DeprecationWarning in the pandas
version used for these notes: "The default dtype for empty Series will be 'object' instead of
'float64' in a future version. Specify a dtype explicitly to silence this warning."

Create a Series from ndarray


 If data is an ndarray, then the index passed must be of the same length.
 If no index is passed, the default index is range(n), where n is the array length, i.e.,
[0, 1, 2, ..., len(array) - 1].
Example 1:

Create a simple Pandas Series from a NumPy array:

import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)

Output:

0    P
1    a
2    n
3    d
4    a
5    s
dtype: object

Example 2:

Create a Pandas Series from a NumPy array with a custom index:

import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info, index = [100, 101, 102, 103, 104, 105])
print(a)

Output:

100    P
101    a
102    n
103    d
104    a
105    s
dtype: object

Create a Series from Scalar


 If data is a scalar value, an index must be provided. The value will be repeated to match the
length of index.

Example:

# import the pandas library under the alias pd
import pandas as pd
# the scalar 5 is repeated once for every index label
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

Output:

0    5
1    5
2    5
3    5
dtype: int64

Accessing Data from Series with Position

Data in the series can be accessed similar to that in an ndarray.

Example 1:

Retrieve the first element. As we already know, the counting starts from zero for the array, which
means the first element is stored at zeroth position and so on.

import pandas as pd
s = pd.Series([1,2,3,4,5])
# retrieve the first element
print(s[0])

Output:

1

Example 2 :

Retrieve the first three elements in the Series. If a : is placed before an index, all items up to
(but not including) that index are extracted. If a : is placed after an index, all items from that
index onwards are extracted. If two indexes are given with a : between them, the items between
the two indexes are returned, excluding the item at the second index.

import pandas as pd
s = pd.Series([1,2,3,4,5])
# retrieve the first three elements
print(s[:3])

Output:

0    1
1    2
2    3
dtype: int64

Example 3:

Retrieve the last three elements.

import pandas as pd
s = pd.Series([1,2,3,4,5])
# retrieve the last three elements
print(s[-3:])

Output:

2    3
3    4
4    5
dtype: int64

Retrieve Data Using Label (Index)

A Series is like a fixed-size dict in that you can get and set values by index label.

Example 1:

Retrieve a single element using index label value.

import pandas as pd
s = pd.Series([1,2,3,4,5], index = ['a','b','c','d','e'] )
# retrieve the element with label 'a'
print(s['a'])

Output:

1

Example 2

Retrieve multiple elements using a list of index label values.

import pandas as pd
s = pd.Series([1,2,3,4,5], index = ['a','b','c','d','e'] )
# retrieve the elements with labels 'a', 'b' and 'c'
print(s[['a', 'b', 'c']])

Output:

a    1
b    2
c    3
dtype: int64

Example 3
If a label is not contained, an exception is raised.

import pandas as pd
s = pd.Series([1,2,3,4,5], index = ['a','b','c','d','e'] )
# 'f' is not one of the index labels, so this raises KeyError
print(s['f'])

Output:

KeyError: 'f'
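
When a missing label should not stop the program, the lookup can be guarded. The sketch below reuses the Series from the example above and assumes -1 as a placeholder default; any other value could be used.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# "in" tests membership against the index labels, not the values.
has_f = "f" in s

# get() returns the supplied default instead of raising KeyError.
# The default -1 is an arbitrary placeholder chosen for this sketch.
value = s.get("f", -1)

print(has_f)
print(value)
```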

2) Pandas DataFrame:

 A Pandas DataFrame is a widely used two-dimensional data structure with labeled axes
(rows and columns).
 A DataFrame is a standard way to store data that has two different indexes, i.e., a row
index and a column index.
 It has the following properties:
o The columns can be of heterogeneous types such as int, bool, and so on.
o It can be seen as a dictionary of Series in which both the rows and the columns are
indexed. The column labels are referred to as "columns" and the row labels as "index".

Syntax:

pandas.DataFrame( data, index, columns, dtype, copy)

 The parameters of the constructor are as follows:

 data: The data can take different forms such as an ndarray, Series, map (dict), constants, lists, or arrays.

 index: The row labels. If no index is passed, the default np.arange(n) is used.

 columns: The column labels. If no column labels are passed, the default np.arange(n) is used.

 dtype: The data type to apply to the columns.

 copy: Whether to copy the input data.

Create a DataFrame

We can create a DataFrame in the following ways:

 dict
 Lists
 NumPy ndarrays
 Series

Create an empty DataFrame

To create an empty DataFrame in Pandas:

# importing the pandas library


import pandas as pd
df = pd.DataFrame()
print (df)

Output:

Empty DataFrame

Columns: []

Index: []

Create a DataFrame using List:


The DataFrame can be created using a single list or a list of lists.

Example 1:

# importing the pandas library


import pandas as pd
# a list of strings
x = ['CIVIL', 'EEE', 'MECH','ECE','CSE','AIDS']
# Calling DataFrame constructor on list
df = pd.DataFrame(x)
print(df)

Output:

       0
0  CIVIL
1    EEE
2   MECH
3    ECE
4    CSE
5   AIDS

Example 2:

# importing the pandas library


import pandas as pd
# a list of lists: one [code, name] pair per row
x = [[101,'CIVIL'], [201,'EEE'], [301,'MECH'],[401,'ECE'],[501,'CSE'],[3001,'AIDS']]
# Calling DataFrame constructor on the list of lists
df = pd.DataFrame(x, columns = ['CODE','NAME'])
print(df)

Output:

   CODE   NAME
0   101  CIVIL
1   201    EEE
2   301   MECH
3   401    ECE
4   501    CSE
5  3001   AIDS

Example 3:

# importing the pandas library


import pandas as pd
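# a list of lists: one [code, name] pair per row
x = [[101,'CIVIL'], [201,'EEE'], [301,'MECH'],[401,'ECE'],[501,'CSE'],[3001,'AIDS']]
# Build the DataFrame, then cast only the numeric column to float.
# Passing dtype = 'float' to the constructor applies to every column, and
# recent pandas versions raise an error because the names cannot be converted.
df = pd.DataFrame(x, columns = ['CODE','NAME']).astype({'CODE': 'float'})
print(df)

Output:

     CODE   NAME
0   101.0  CIVIL
1   201.0    EEE
2   301.0   MECH
3   401.0    ECE
4   501.0    CSE
5  3001.0   AIDS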

Create a DataFrame from Dict of ndarrays / Lists

 All the ndarrays (or lists) must be of the same length. If an index is passed, its length
must equal the length of the arrays.
 If no index is passed, then by default, index will be range(n), where n is the array length.
Example 1:

# importing the pandas library

import pandas as pd

x = {'DEPTCODE': [101, 201, 301, 401, 501, 3001],
     'DEPARTMENT NAME': ['CIVIL', 'EEE', 'MECH', 'ECE', 'CSE', 'AIDS']}

df = pd.DataFrame(x)

print(df)

Output:

DEPTCODE DEPARTMENT NAME

0 101 CIVIL

1 201 EEE

2 301 MECH

3 401 ECE

4 501 CSE

5 3001 AIDS
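
The two length rules stated above can be checked with a short sketch. The index labels d1 to d3 are hypothetical, and the second part deliberately passes lists of unequal length to show the error that results.

```python
import pandas as pd

x = {"DEPTCODE": [101, 201, 301],
     "DEPARTMENT NAME": ["CIVIL", "EEE", "MECH"]}

# The index has three labels, matching the three values in each list.
# The labels "d1".."d3" are made up for this illustration.
df = pd.DataFrame(x, index=["d1", "d2", "d3"])
print(df)

# Lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"a": [1, 2], "b": [1, 2, 3]})
    length_error = False
except ValueError as err:
    length_error = True
    print("ValueError:", err)
```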

Create a DataFrame from List of Dicts

List of Dictionaries can be passed as input data to create a DataFrame. The dictionary keys are by
default taken as column names.
Example 1:

import pandas as pd

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

df = pd.DataFrame(data, index = ['row1', 'row2'])

print(df)

Output:

      a   b     c
row1  1   2   NaN
row2  5  10  20.0

The first dictionary has no key 'c', so that cell is filled with NaN.

Column Selection, Addition, and Deletion

Column Selection:

We can select any column from the DataFrame. Here is the code that demonstrates how to select a
column from the DataFrame.

Example:

import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}
d1 = pd.DataFrame(info)
print (d1 ['one'])
Output:

a 1.0

b 2.0

c 3.0

d 4.0

e 5.0

f 6.0

g NaN

h NaN

Name: one, dtype: float64

Column Addition

We can add a new column to an existing DataFrame. The code below demonstrates how:
Example:

# importing the pandas library


import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
# Add a new column to an existing DataFrame object
print ("Add new column by passing series")
df['three']=pd.Series([20,40,60],index=['a','b','c'])
print (df)
print ("Add new column using existing DataFrame columns")
df['four']=df['one']+df['three']
print (df)

Output:

Add new column by passing series

one two three

a 1.0 1 20.0

b 2.0 2 40.0

c 3.0 3 60.0

d 4.0 4 NaN

e 5.0 5 NaN

f NaN 6 NaN

Add new column using existing DataFrame columns

one two three four

a 1.0 1 20.0 21.0

b 2.0 2 40.0 42.0

c 3.0 3 60.0 63.0

d 4.0 4 NaN NaN

e 5.0 5 NaN NaN

f NaN 6 NaN NaN

Column Deletion:

We can delete any column from an existing DataFrame. The code below demonstrates how a column
can be deleted:
Example:

# importing the pandas library


import pandas as pd
info = {'one' : pd.Series([1, 2], index= ['a', 'b']),
'two' : pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
df = pd.DataFrame(info)
print ("The DataFrame:")
print (df)
# using the del statement
print ("Delete the first column:")
del df['one']
print (df)

Output:

The DataFrame:
one two
a 1.0 1
b 2.0 2
c NaN 3
Delete the first column:
two
a 1
b 2
c 3
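
Besides del, pandas offers drop() and pop() for removing columns. The following is a minimal sketch with made-up column values; it is an alternative to the del statement shown above, not a replacement for it.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]})

# drop() returns a new DataFrame and leaves df unchanged.
without_one = df.drop(columns=["one"])

# pop() removes the column from df in place and returns it as a Series.
popped = df.pop("two")

print(without_one)
print(df)
print(popped)
```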

Row Selection, Addition, and Deletion


Row Selection:

We can select, add, or delete any row at any time. We start with row selection, which can be
done in the following ways:

Selection by Label:

We can select any row by passing the row label to the loc indexer.

Example:

# importing the pandas library


import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df.loc['b'])
Output:

one 2.0
two 2.0
Name: b, dtype: float64

Selection by integer location:

The rows can also be selected by passing the integer location to the iloc indexer.

Example:

# importing the pandas library

import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df.iloc[3])

Output:

one 4.0
two 4.0
Name: d, dtype: float64

Slice Rows

Multiple rows can also be selected using the ':' (slice) operator.

Example:

# importing the pandas library


import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df[2:5])
Output:

one two

c 3.0 3

d 4.0 4

e 5.0 5
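
Slices can also be written with iloc and loc. One difference worth noting is that a position slice excludes its end, while a label slice includes it. A small sketch with a simplified one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4, 5]}, index=["a", "b", "c", "d", "e"])

# Position-based slice: positions 2, 3 and 4 (the end position 5 is excluded).
by_position = df.iloc[2:5]

# Label-based slice: labels 'b' through 'd', with the end label included.
by_label = df.loc["b":"d"]

print(by_position)
print(by_label)
```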

Addition of rows:

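We can add new rows to a DataFrame with the pandas.concat() function, which places the new rows
at the end. Older versions of pandas offered DataFrame.append() for this purpose, but it was
deprecated in pandas 1.4 and removed in pandas 2.0.

Example:
# importing the pandas library
import pandas as pd
d = pd.DataFrame([[7, 8], [9, 10]], columns = ['x','y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns = ['x','y'])
# concat replaces the removed d.append(d2)
d = pd.concat([d, d2])
print (d)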
Output:

x y
0 7 8
1 9 10
0 11 12
1 13 14
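
The combined DataFrame above repeats the row labels 0 and 1. If a fresh 0 to n-1 index is preferred, ignore_index=True can be passed to pandas.concat(), as in this sketch:

```python
import pandas as pd

d = pd.DataFrame([[7, 8], [9, 10]], columns=["x", "y"])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns=["x", "y"])

# ignore_index=True discards the original labels and renumbers the rows.
combined = pd.concat([d, d2], ignore_index=True)
print(combined)
```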

Deletion of rows:

We can delete or drop rows from a DataFrame using the index label. If the label is
duplicated, all rows with that label are deleted.

Example:

# importing the pandas library
import pandas as pd
a_info = pd.DataFrame([[4, 5], [6, 7]], columns = ['x','y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns = ['x','y'])
# concat replaces the removed a_info.append(b_info); labels 0 and 1 now appear twice
a_info = pd.concat([a_info, b_info])
# Drop rows with label 0
a_info = a_info.drop(0)
print(a_info)

Output:

    x   y
1   6   7
1  10  11

DataFrame Basic Functionality


The following table lists the important attributes and methods that make up DataFrame basic
functionality.

Sr.No.  Attribute or Method : Description
1       T : Transposes rows and columns.
2       axes : Returns a list with the row axis labels and column axis labels as the only members.
3       dtypes : Returns the dtypes in this object.
4       empty : True if the DataFrame is entirely empty (no items), i.e., if any of the axes has length 0.
5       ndim : Number of axes / array dimensions.
6       shape : Returns a tuple representing the dimensionality of the DataFrame.
7       size : Number of elements in the DataFrame.
8       values : NumPy representation of the DataFrame.
9       head() : Returns the first n rows.
10      tail() : Returns the last n rows.
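
The attributes and methods listed above can be tried on a small DataFrame. The sketch below uses the department codes from the earlier examples, trimmed to three rows.

```python
import pandas as pd

df = pd.DataFrame({"CODE": [101, 201, 301],
                   "NAME": ["CIVIL", "EEE", "MECH"]})

print(df.T)        # rows and columns swapped
print(df.axes)     # [row index, column index]
print(df.dtypes)   # dtype of each column
print(df.empty)    # False, the DataFrame has items
print(df.ndim)     # 2
print(df.shape)    # (3, 2): three rows, two columns
print(df.size)     # 6 elements in total
print(df.values)   # underlying NumPy array
print(df.head(2))  # first two rows
print(df.tail(1))  # last row
```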

DataFrame Functions
There are many functions used with DataFrames, including the following:

Function : Description
Pandas DataFrame.append() : Adds the rows of another DataFrame to the end of the given DataFrame. Removed in pandas 2.0; use pandas.concat() instead.
Pandas DataFrame.apply() : Applies a user-supplied function along an axis of the DataFrame (to each column or each row).
Pandas DataFrame.assign() : Adds new columns to a DataFrame.
Pandas DataFrame.astype() : Casts the pandas object to a specified dtype.
pandas.concat() : Concatenates pandas objects along an axis. This is a top-level pandas function, not a DataFrame method.
Pandas DataFrame.count() : Counts the number of non-NA cells for each column or row.
Pandas DataFrame.describe() : Calculates summary statistics such as percentiles, mean, and std of the numerical values of the Series or DataFrame.
Pandas DataFrame.drop_duplicates() : Removes duplicate rows from the DataFrame.
Pandas DataFrame.groupby() : Splits the data into groups.
Pandas DataFrame.head() : Returns the first n rows of the object based on position.
Pandas DataFrame.hist() : Draws histograms of the numerical columns, dividing the values into "bins".
Pandas DataFrame.iterrows() : Iterates over the rows as (index, Series) pairs.
Pandas DataFrame.mean() : Returns the mean of the values for the requested axis.
Pandas DataFrame.melt() : Unpivots the DataFrame from a wide format to a long format.
Pandas DataFrame.merge() : Merges two DataFrames into one with a database-style join.
Pandas DataFrame.pivot_table() : Aggregates data with calculations such as sum, count, average, max, and min.
Pandas DataFrame.query() : Filters the rows of the DataFrame with a boolean expression.
Pandas DataFrame.sample() : Returns a random sample of rows (or columns) from the DataFrame.
Pandas DataFrame.shift() : Shifts the values by a given number of periods, which makes it possible to subtract the previous row's value from a column.
Pandas DataFrame.sort_values() : Sorts the DataFrame by the values of one or more columns. The older DataFrame.sort() is no longer available.
Pandas DataFrame.sum() : Returns the sum of the values for the requested axis.
Pandas DataFrame.to_excel() : Exports the DataFrame to an Excel file.
Pandas DataFrame.transpose() : Transposes the index and columns of the DataFrame.
Pandas DataFrame.where() : Keeps the values where a condition is True and replaces the others (with NaN by default).
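
A few of the functions listed above are sketched below on a small, made-up marks table. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical marks table.
marks = pd.DataFrame({"dept": ["CSE", "CSE", "ECE", "ECE"],
                      "student": ["A", "B", "C", "D"],
                      "score": [80, 90, 70, 60]})

# groupby(): split by department, then average the scores in each group.
mean_by_dept = marks.groupby("dept")["score"].mean()

# sort_values(): order the rows by score, highest first.
ranked = marks.sort_values("score", ascending=False)

# merge(): join a second table on the shared "dept" column.
depts = pd.DataFrame({"dept": ["CSE", "ECE"], "code": [501, 401]})
merged = marks.merge(depts, on="dept")

print(mean_by_dept)
print(ranked)
print(merged)
```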

Working with CSV files


What is CSV file:
 CSV (Comma Separated Values) is a simple file format used to store tabular data, such as
a spreadsheet or database.
 A CSV file stores tabular data (numbers and text) in plain text.
 Each line of the file is a data record.
 Each record consists of one or more fields, separated by commas.
 Each field corresponds to one cell of the spreadsheet.
 The use of the comma as a field separator is the source of the name for this file format.
Python DataFrame to CSV File
 CSV (comma-separated values) files are text files that allow data to be stored in a table
format.
 Using the .to_csv() method in Python Pandas, we can write a DataFrame to a CSV file.
 The example below writes a DataFrame with pandas.DataFrame.to_csv() and reads it back with pandas.read_csv().

Example:
import pandas as pd
mid_term_marks = {"Student": ["Kamal", "Arun", "David", "Thomas", "Steven"],
"Economics": [10, 8, 6, 5, 8],
"Fine Arts": [7, 8, 5, 9, 6],
"Mathematics": [7, 3, 5, 8, 5]}
mid_term_marks_df = pd.DataFrame(mid_term_marks)
print(mid_term_marks_df)
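# raw strings keep the backslash in the Windows path from being read as an escape;
# index=False keeps the row index out of the file
mid_term_marks_df.to_csv(r"D:\midterm.csv", index=False)
print(pd.read_csv(r"D:\midterm.csv"))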

Output:

Student Economics Fine Arts Mathematics

0 Kamal 10 7 7

1 Arun 8 8 3

2 David 6 5 5

3 Thomas 5 9 8

4 Steven 8 6 5

Python Read CSV file :

 CSV stands for comma-separated values. A CSV file is a delimited text file that uses a
comma to separate values.

 A CSV file stores tabular data in plain text.

 The CSV file format is quite popular and supported by many software applications such as
Notepad, Microsoft Excel and Google Spreadsheet.

 We can create a CSV file using the following ways:

1. Using Notepad: We can create a CSV file using Notepad. Open a new file in Notepad,
separate the values with commas, and save the file with the .csv extension.
2. Using Excel: We can also create a CSV file using Excel. Open a new file in Excel,
enter each value in a different cell, and save it with the file type CSV.

To read data row-wise from a CSV file in Python, we can use the reader object provided by the
built-in csv module.
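
Row-wise reading with the csv module can be sketched as follows. To keep the example self-contained, it reads from an in-memory string (an assumption of this sketch) rather than a file on disk; with a real file, open(path, newline="") would be passed to csv.reader instead.

```python
import csv
import io

# In-memory text standing in for the contents of a CSV file.
text = "Student,Economics\nKamal,10\nArun,8\n"

# csv.reader yields one list of strings per line of input.
rows = list(csv.reader(io.StringIO(text)))

for row in rows:
    print(row)
```

Note that every field is returned as a string; converting "10" to a number is left to the caller.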

Pandas read_csv() Method


 Pandas is an open-source library that allows you to import CSV files in Python and perform data
manipulation. Pandas provides an easy way to create, manipulate, and delete data.
 You must install the pandas library with the command pip install pandas. On
Windows, run this command in the Command Prompt; on Linux, run it in the
Terminal.
 To import a CSV dataset, you can use the function pd.read_csv().

Syntax
pandas.read_csv(filepath_or_buffer, sep=',', names=None, index_col=None, skipinitialspace=False)

 filepath_or_buffer: Path or URL with the data
 sep=',': Defines the delimiter to use
 names=None: Names of the columns. If the dataset has ten columns, you need to pass ten
names
 index_col=None: Column to use as the row index, e.g. index_col=0 for the first column. With the default None, pandas normally creates a default integer index
 skipinitialspace=False: If True, skips spaces after the delimiter.

Example:
import pandas
result = pandas.read_csv(r'D:\data.csv')  # raw string keeps the backslash literal
print(result)
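
The parameters described above can be combined as in the following sketch. It reads from an in-memory string instead of a file so that it runs anywhere; the data and the column names CODE and NAME are taken from the earlier department examples.

```python
import io
import pandas as pd

# Header-less CSV text with a space after each comma.
raw = "101, CIVIL\n201, EEE\n301, MECH\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=",",                  # field delimiter
    names=["CODE", "NAME"],   # the data has no header row, so supply names
    index_col="CODE",         # use the CODE column as the row index
    skipinitialspace=True,    # drop the space that follows each comma
)
print(df)
```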

Pandas - Cleaning Data


Data Cleaning

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted,
duplicate, or incomplete data within a dataset.

Bad data could be:

1. Empty cells
2. Data in wrong format
3. Wrong data
4. Duplicates

1) Cleaning Empty Cells


Empty cells can potentially give you a wrong result when you analyze data.

Remove Rows

One way to deal with empty cells is to remove rows that contain empty cells.

Example

#Return a new DataFrame with no empty cells:

import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())

Note: By default, the dropna() method returns a new DataFrame, and will not change the original.

If you want to change the original DataFrame, use the inplace = True argument:

Example

#Remove all rows with NULL values:

import pandas as pd
df = pd.read_csv('data.csv')
df.dropna(inplace = True)
print(df.to_string())

Note: With inplace = True, dropna() does NOT return a new DataFrame; instead it removes all rows
containing NULL values from the original DataFrame.
Replace Empty Values

 Another way of dealing with empty cells is to insert a new value instead.
 This way you do not have to delete entire rows just because of some empty cells.
 The fillna() method allows us to replace empty cells with a value:

#Replace NULL values with the number 130:

import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)
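
Filling every empty cell with one number is not always appropriate. A common alternative is to fill a single column with a value computed from that column, such as its mean. The sketch below builds a small DataFrame in memory; the Calories column and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one empty cell in each column.
df = pd.DataFrame({"Duration": [60, 45, np.nan],
                   "Calories": [400.0, np.nan, 300.0]})

# Replace empty cells in "Calories" only, using the column mean (350.0).
df["Calories"] = df["Calories"].fillna(df["Calories"].mean())

print(df)
```

The empty cell in "Duration" is left untouched because only the "Calories" column was assigned.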

2) Cleaning Data of Wrong Format:


 Cells with data of wrong format can make it difficult, or even impossible, to analyze data.
 To fix it, you have two options: remove the rows, or convert all cells in the columns into the
same format.

Convert Into a Correct Format:


 In our Data Frame, we have two cells with the wrong format.
 The 'Date' column should be a string that represents a date:
 Let's try to convert all cells in the 'Date' column into dates.
 Pandas has a to_datetime() method for this:

Example

#Convert to date:

import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())

As you can see from the result, the date in row 26 was fixed, but the empty date in row 22 got a NaT
(Not a Time) value, in other words an empty value. One way to deal with empty values is simply
removing the entire row.
Removing Rows
The conversion in the example above gave us a NaT value, which can be handled as a
NULL value, so we can remove the row by using the dropna() method.

Example

#Remove rows with a NULL value in the "Date" column:

df.dropna(subset=['Date'], inplace = True)
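
The convert-then-drop sequence can be tried end to end on a small in-memory DataFrame. The values below are made up, and errors="coerce" is an addition of this sketch: it turns any value that cannot be parsed into NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical data: one valid date, one unparseable string, one empty cell.
df = pd.DataFrame({"Date": ["2020/12/01", "not a date", None],
                   "Duration": [60, 45, 30]})

# Unparseable and empty entries become NaT.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Drop the rows whose "Date" is NaT.
cleaned = df.dropna(subset=["Date"])

print(df)
print(cleaned)
```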


3) Fixing Wrong Data

 "Wrong data" does not have to be "empty cells" or "wrong format", it can just be wrong, like
if someone registered "199" instead of "1.99".
 Sometimes you can spot wrong data by looking at the data set.
 If you take a look at our data set, you can see that in row 7, the duration is 450, but for all the
other rows the duration is between 30 and 60.
Replacing Values
 One way to fix wrong values is to replace them with something else.
 In our example, it is most likely a typo, and the value should be "45" instead of "450", and we
could just insert "45" in row 7:

Example

#Set "Duration" = 45 in row 7:

df.loc[7, 'Duration'] = 45

 For small data sets you might be able to replace the wrong data one by one, but not for big
data sets.
 To replace wrong data for larger data sets you can create some rules, e.g. set some boundaries
for legal values, and replace any values that are outside of the boundaries.

Example

 Loop through all values in the "Duration" column.


 If the value is higher than 120, set it to 120:

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
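
The same rule can be applied without an explicit loop. The sketch below shows two equivalent options on made-up values: a boolean mask with loc, and the clip() method.

```python
import pandas as pd

# Hypothetical durations, two of them above the limit of 120.
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})
capped = df.copy()

# Option 1: select the rows above the limit and overwrite them.
df.loc[df["Duration"] > 120, "Duration"] = 120

# Option 2: clip() caps every value at the given upper bound.
capped["Duration"] = capped["Duration"].clip(upper=120)

print(df)
print(capped)
```

Both forms operate on the whole column at once, which is usually faster than looping on large data sets.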

Removing Rows

 Another way of handling wrong data is to remove the rows that contain wrong data.
 This way you do not have to find out what to replace them with, and there is a good chance
you do not need them to do your analyses.

Example

#Delete rows where "Duration" is higher than 120:

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
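
Rows can also be removed by keeping only those that satisfy a condition. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical durations, two of them above the limit of 120.
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})

# Keep only the rows whose "Duration" is at most 120.
df = df[df["Duration"] <= 120]

print(df)
```

The surviving rows keep their original labels (0 and 2 here); reset_index(drop=True) can be chained if consecutive labels are wanted.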

4) Removing Duplicates
Discovering Duplicates
 Duplicate rows are rows that have been registered more than one time.
 By taking a look at the data set, we may be able to spot rows that appear more than once.
 To discover duplicates, we can use the duplicated() method.
 The duplicated() method returns a Boolean value for each row:

Example

Returns True for every row that is a duplicate, otherwise False:

print(df.duplicated())

Removing Duplicates
To remove duplicates, use the drop_duplicates() method.

Example

#Remove all duplicates:

df.drop_duplicates(inplace = True)

The (inplace = True) will make sure that the method does NOT return a new DataFrame, but it will
remove all duplicates from the original DataFrame.
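
Both steps can be seen together on a small in-memory DataFrame. The Pulse column and the values are hypothetical; the third row repeats the first.

```python
import pandas as pd

# Hypothetical data: row 2 is an exact copy of row 0.
df = pd.DataFrame({"Duration": [60, 45, 60],
                   "Pulse": [110, 117, 110]})

# The first occurrence is not flagged; only later repeats are True.
flags = df.duplicated()

# Without inplace=True, drop_duplicates() returns a new DataFrame.
deduped = df.drop_duplicates()

print(flags)
print(deduped)
```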
