
Difference Between Shallow Copy vs Deep Copy in Pandas DataFrame
One of the most useful data structures in Pandas is the DataFrame, a two-dimensional, table-like structure that stores data in rows and columns. It lets users store and manipulate data much as they would in a spreadsheet or an SQL table.
Pandas also provides the Series, a one-dimensional labelled array that can hold elements of any data type.
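As a quick illustration of the two structures, the sketch below builds a small DataFrame and selects one column from it as a Series. The column names and values are made up for the example.

```python
import pandas as pd

# A DataFrame: a 2-D labelled table with rows and columns
df = pd.DataFrame({"Name": ["Rahul", "Priya"], "Age": [25, 28]})

# A Series: a 1-D labelled array; selecting a single column returns one
ages = df["Age"]

print(df.shape)             # (2, 2)
print(type(ages).__name__)  # Series
```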
Shallow Copy
A shallow copy creates a new DataFrame object that references the original data instead of duplicating it. The copy and the original share the same underlying memory, so a change to the values of one is reflected in the other. This sharing is due to the references held by both objects. Note that it describes pandas without Copy-on-Write; with Copy-on-Write enabled (optional in pandas 2.x, and described in the pandas documentation as the default from pandas 3.0), a write to one object no longer propagates to the other.
Syntax
pandas.DataFrame.copy(deep=False)
If deep=False, a new DataFrame object is created, but the data and index labels are not copied. Instead, the original and the new DataFrame refer to the same data and index labels.
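One way to observe the sharing directly is to compare the underlying NumPy buffers. The sketch below uses made-up numeric columns and NumPy's shares_memory function. It only inspects memory and does not write to either object, because whether a later write propagates depends on the pandas version and its Copy-on-Write setting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [25, 28, 22], "Score": [7.5, 8.0, 6.5]})
shallow = df.copy(deep=False)

# copy(deep=False) returns a new DataFrame object ...
print(shallow is df)  # False

# ... whose column data is backed by the same memory buffer as the original
shares_buffer = np.shares_memory(df["Age"].to_numpy(), shallow["Age"].to_numpy())
print(shares_buffer)
```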
Deep Copy
A deep copy is a fully independent copy of a DataFrame, including all of its data and metadata. In other words, a deep copy creates a new DataFrame object with its own memory space, independent of the original DataFrame.
Syntax
pandas.DataFrame.copy(deep=True)
The deep parameter is optional and defaults to True. With deep=True, a new DataFrame object is created, and all the data and index labels are copied from the original DataFrame into it.
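A minimal sketch of this behaviour, using a made-up single-column DataFrame: both copy() with no argument and copy(deep=True) produce data that does not share memory with the original, so writing to the copy leaves the original unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [25, 28, 22]})

default_copy = df.copy()       # deep=True is the default
deep = df.copy(deep=True)

# Neither copy shares its data buffer with the original
print(np.shares_memory(df["Age"].to_numpy(), default_copy["Age"].to_numpy()))  # False
print(np.shares_memory(df["Age"].to_numpy(), deep["Age"].to_numpy()))          # False

# Writing to the deep copy leaves the original untouched
deep.loc[0, "Age"] = 85
print(df.loc[0, "Age"])    # 25
print(deep.loc[0, "Age"])  # 85
```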
Example
In this example, we create a DataFrame, make a shallow copy and a deep copy of it, and then modify each of the three DataFrames to show how the changes propagate between the original, the shallow copy, and the deep copy.
Algorithm
Import the pandas library.
Define a dictionary containing the DataFrame's data.
Create a DataFrame df using the data and the pd.DataFrame() method.
Using the copy() method with deep=False and deep=True, make a shallow copy and a deep copy of the original DataFrame.
Change one value in each of the three DataFrames to see how the changes propagate.
Print the original DataFrame, the shallow copy, and the deep copy.
Print the id() of the original DataFrame, the shallow copy, and the deep copy.
Example
    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['Rahul', 'Priya', 'Amit'],
            'Age': [25, 28, 22],
            'City': ['Mumbai', 'Delhi', 'Kolkata']}
    df = pd.DataFrame(data)

    # Shallow copy
    shallow_copy = df.copy(deep=False)

    # Deep copy
    deep_copy = df.copy(deep=True)

    # Modify one value in each DataFrame
    df.loc[0, 'Age'] = 30
    shallow_copy.loc[1, 'City'] = 'Chennai'
    deep_copy.loc[0, 'Age'] = 85

    # Print the original DataFrame and both copies
    print("Original DataFrame:\n", df)
    print("Shallow Copy:\n", shallow_copy)
    print("Deep Copy:\n", deep_copy)
    print()

    # Print the ID of each object
    print("Original DataFrame ID:", id(df))
    print("Shallow Copy ID:", id(shallow_copy))
    print("Deep Copy ID:", id(deep_copy))
Output
    Original DataFrame:
          Name  Age     City
    0  Rahul   30   Mumbai
    1  Priya   28  Chennai
    2   Amit   22  Kolkata
    Shallow Copy:
          Name  Age     City
    0  Rahul   30   Mumbai
    1  Priya   28  Chennai
    2   Amit   22  Kolkata
    Deep Copy:
          Name  Age     City
    0  Rahul   85   Mumbai
    1  Priya   28    Delhi
    2   Amit   22  Kolkata

    Original DataFrame ID: 140268239802704
    Shallow Copy ID: 140269600767952
    Deep Copy ID: 140269600767904
Here, the original DataFrame is modified by changing the age in the first row from 25 to 30, and this change is also reflected in the shallow copy. The deep copy's data is not changed by this, because it is independent of the original DataFrame.
Likewise, when the city in the second row of the shallow copy is changed to "Chennai", the change appears in both the shallow copy and the original DataFrame. The deep copy, on the other hand, is completely independent, so changing the age in its first row to 85 does not affect the original DataFrame.
The IDs, which vary from run to run, show that the original DataFrame and its copies are distinct objects. Even so, the shallow copy shares the memory of its data with the original DataFrame; only some metadata is held separately.
Hence we infer that the deep copy is a completely new object with its own data, whereas the shallow copy is a new DataFrame object whose data still refers to the original DataFrame's data. A shallow copy is not the same as an alias: plain assignment such as df2 = df gives a second name for the same object, while copy(deep=False) creates a distinct object.
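The difference between a second name for the same object and a shallow copy can be checked with the is operator. The sketch below uses made-up data; the column names Seen and Flag are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 28, 22]})

alias = df                      # plain assignment: a second name for the same object
shallow = df.copy(deep=False)   # a new DataFrame object

print(alias is df)     # True
print(shallow is df)   # False

# A column added through the alias shows up in df, since both names refer to one object
alias["Seen"] = True
print("Seen" in df.columns)   # True

# A column added to the shallow copy stays local to the shallow copy
shallow["Flag"] = True
print("Flag" in df.columns)   # False
```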
Example
The following code copies a DataFrame of countries and their populations and shows the difference between a shallow copy and a deep copy when changes are made to the original and to each copy.
Algorithm
Import the pandas library.
Create a dictionary data with 'Country' and 'Population (Millions)' as keys and corresponding values.
Create a DataFrame df_original using the dictionary data.
Create a shallow copy of df_original and assign it to df_shallow_copy using the copy() method with deep=False.
Create a deep copy of df_original and assign it to df_deep_copy using the copy() method with deep=True.
Modify the shallow copy and the deep copy by changing the values in specific rows using loc[].
Add a new row to the original DataFrame df_original and to the shallow copy df_shallow_copy using pd.concat(), since the DataFrame.append() method was removed in pandas 2.0.
Print the original DataFrame and its shallow and deep copies.
    import pandas as pd

    # Create a DataFrame
    data = {'Country': ['USA', 'Germany', 'Japan'],
            'Population (Millions)': [328, 83, 126]}
    df_original = pd.DataFrame(data)

    # Shallow copy
    df_shallow_copy = df_original.copy(deep=False)

    # Deep copy
    df_deep_copy = df_original.copy(deep=True)

    # Modify the shallow copy
    df_shallow_copy.loc[0, 'Country'] = 'United States Of America'
    df_shallow_copy.loc[1, 'Population (Millions)'] = 82

    # Modify the deep copy
    df_deep_copy.loc[2, 'Country'] = 'India'
    df_deep_copy.loc[2, 'Population (Millions)'] = 1400

    # Add a new row to the original DataFrame
    # (pd.concat returns a new DataFrame; DataFrame.append was removed in pandas 2.0)
    new_row = pd.DataFrame([{'Country': 'Canada', 'Population (Millions)': 38}])
    df_original = pd.concat([df_original, new_row], ignore_index=True)

    # Print the original DataFrame
    print("Original DataFrame:")
    print(df_original)

    # Print the shallow copy
    print("\nShallow Copy:")
    print(df_shallow_copy)

    # Print the deep copy
    print("\nDeep Copy:")
    print(df_deep_copy)

    # Add a new row to the shallow copy DataFrame
    new_row_shallow = pd.DataFrame([{'Country': 'Australia', 'Population (Millions)': 25}])
    df_shallow_copy = pd.concat([df_shallow_copy, new_row_shallow], ignore_index=True)

    # Print the modified shallow copy DataFrame
    print("\nModified Shallow Copy:")
    print(df_shallow_copy)
Output
    Original DataFrame:
                        Country  Population (Millions)
    0  United States Of America                    328
    1                   Germany                     82
    2                     Japan                    126
    3                    Canada                     38

    Shallow Copy:
                        Country  Population (Millions)
    0  United States Of America                    328
    1                   Germany                     82
    2                     Japan                    126

    Deep Copy:
       Country  Population (Millions)
    0      USA                    328
    1  Germany                     83
    2    India                   1400

    Modified Shallow Copy:
                        Country  Population (Millions)
    0  United States Of America                    328
    1                   Germany                     82
    2                     Japan                    126
    3                 Australia                     25
In the shallow copy, the 'Country' value of the first row was changed from 'USA' to 'United States Of America', and the 'Population (Millions)' value of the second row was changed from 83 to 82. Both changes appear in the shallow copy and in the original DataFrame. In contrast, changing the third row of the deep copy from Japan to India, along with its population, did not affect the original DataFrame.
The newly added rows appear only in the DataFrame they were added to. Adding a row produces a new DataFrame object, so a row added to the shallow copy does not appear in the original DataFrame, and vice versa.
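The reason added rows stay local can be seen in isolation. In the sketch below, which assumes pd.concat as the way to add a row (DataFrame.append was removed in pandas 2.0), the call builds and returns a new DataFrame and leaves its inputs as they were. The name df_extended is illustrative.

```python
import pandas as pd

df_original = pd.DataFrame({"Country": ["USA", "Germany"],
                            "Population (Millions)": [328, 83]})
df_shallow_copy = df_original.copy(deep=False)

new_row = pd.DataFrame([{"Country": "Canada", "Population (Millions)": 38}])

# pd.concat builds and returns a new DataFrame; its inputs are not modified
df_extended = pd.concat([df_original, new_row], ignore_index=True)

print(len(df_extended))      # 3
print(len(df_original))      # 2
print(len(df_shallow_copy))  # 2
```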
Differences between shallow copy and deep copy
Aspect | Shallow Copy | Deep Copy |
---|---|---|
Definition | Creates a new object with references to the same data as the original object. | Creates a completely independent copy with its own data and metadata. |
Data Sharing | Shares data between the original and copied objects. | Does not share data with the original object. |
Memory Space | Shares memory space with the original object. | Has its own memory space separate from the original object. |
Modifiability | Changes to values in the copy can affect the original object, and vice versa. Adding a new row or column to one DataFrame does not affect the other. | Changes made to the copy do not affect the original object, and vice versa. Adding a new row or column likewise stays local to that DataFrame. |
Performance | Faster and requires less memory, as it avoids duplicating the data. | Slower and requires more memory, as it duplicates the data. |
Conclusion
A shallow copy is suitable when we want a new DataFrame that shares the same memory as the original DataFrame. It is efficient when working with large datasets, since it avoids duplicating the data. A deep copy is recommended when we need a copy that is independent of the original DataFrame.
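As one possible use of a deep copy, a function that needs to change values can work on its own copy so that the caller's DataFrame is left alone. This is only a sketch: the helper name clip_negative_ages and the sample data are hypothetical.

```python
import pandas as pd

def clip_negative_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df in which negative ages are set to 0 (hypothetical helper)."""
    result = df.copy(deep=True)   # independent copy: writes below cannot reach the caller's data
    result.loc[result["Age"] < 0, "Age"] = 0
    return result

people = pd.DataFrame({"Name": ["Rahul", "Priya"], "Age": [25, -3]})
cleaned = clip_negative_ages(people)

print(people["Age"].tolist())   # [25, -3]
print(cleaned["Age"].tolist())  # [25, 0]
```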