
Keep Parameter in Pandas Series Drop Duplicates Method
The drop_duplicates() method of the pandas Series class is used to remove duplicate values from a series object. By default it does not alter the original series object. Instead, it returns a new series containing only the rows that were kept, each with its original index label.
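As a quick check of the point that the original series is left untouched, here is a minimal sketch. The variable names original and deduped are chosen only for illustration.

```python
import pandas as pd

# A small series with one repeated value
original = pd.Series([1, 2, 2, 3])

# drop_duplicates() returns a new Series by default
deduped = original.drop_duplicates()

print(len(original))        # 4, the original still holds every row
print(len(deduped))         # 3, the repeated 2 was dropped in the copy
print(deduped is original)  # False, a separate object was returned
```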
One of the important parameters of the drop_duplicates() method is keep. Its default value is 'first', which keeps the first occurrence of each value and deletes the remaining ones. The parameter also accepts 'last' and False.
If keep='last', the method deletes the duplicate values except for the last occurrence. If keep=False, it deletes every occurrence of any value that is duplicated.
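The three settings can be compared side by side on one small series. This is only an illustrative sketch; the series s and the result names are made up for the comparison.

```python
import pandas as pd

# 'a' appears at index 0 and 2, 'b' at index 1 and 4, 'c' only at index 3
s = pd.Series(['a', 'b', 'a', 'c', 'b'])

keep_first = s.drop_duplicates()             # same as keep='first'
keep_last = s.drop_duplicates(keep='last')   # keep the final occurrence
keep_none = s.drop_duplicates(keep=False)    # drop every duplicated value

# The surviving index labels show which occurrences were kept
print(keep_first.index.tolist())  # [0, 1, 3]
print(keep_last.index.tolist())   # [2, 3, 4]
print(keep_none.index.tolist())   # [3]
```

Note that the surviving rows keep their original index labels, so the result is not renumbered from 0.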
Example 1
In the following example, we create a pandas Series from a list of strings using the pd.Series constructor, then apply the drop_duplicates() method with keep='last'.
# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(['Robin', 'John', 'Nori', 'Yi', 'Robin', 'Amal', 'Nori'])
print(series)

# delete duplicate values with keep='last'
result = series.drop_duplicates(keep='last')
print('Output:\n', result)
Output
The output is given below −
0    Robin
1     John
2     Nori
3       Yi
4    Robin
5     Amal
6     Nori
dtype: object
Output:
 1     John
3       Yi
4    Robin
5     Amal
6     Nori
dtype: object
The value "Robin" appears at two index positions, 0 and 4, and the value "Nori" also appears at two positions, 2 and 6.
By setting keep='last', we kept only the last occurrence of each value, so the rows at index positions 0 and 2 were deleted.
Example 2
This example uses the same series, but changes the value of the keep parameter from 'last' to 'first'.
# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(['Robin', 'John', 'Nori', 'Yi', 'Robin', 'Amal', 'Nori'])
print(series)

# delete duplicate values with keep='first'
result = series.drop_duplicates(keep='first')
print('Output:\n', result)
Output
You will get the following output −
0    Robin
1     John
2     Nori
3       Yi
4    Robin
5     Amal
6     Nori
dtype: object
Output:
 0    Robin
1     John
2     Nori
3       Yi
5     Amal
dtype: object
In the above output, the duplicate values at index positions 4 and 6 are deleted, because the values "Robin" and "Nori" first occurred at positions 0 and 2.
Example 3
In this example, we will see how the drop_duplicates() method works with keep=False. We first create a series object from a list of integers and then apply the method.
# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series([1, 2, 1, 3, 4, 2, 6, 4, 5])
print(series)

# delete duplicate values with keep=False
result = series.drop_duplicates(keep=False)
print('Output:\n', result)
Output
The output is given below −
0    1
1    2
2    1
3    3
4    4
5    2
6    6
7    4
8    5
dtype: int64
Output:
 3    3
6    6
8    5
dtype: int64
The resultant series from the drop_duplicates() method has only 3 rows, whereas the original series has 9 rows. This is because keep=False removes every occurrence of a duplicated value. It does not keep even a single occurrence, so only the values that appear exactly once (3, 6, and 5) remain.
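One way to see why those three rows survive is to look at the boolean mask from the related Series.duplicated() method, which accepts the same keep parameter. The sketch below is an illustration of the equivalence, not a description of how pandas implements drop_duplicates() internally; the names mask and manual are made up.

```python
import pandas as pd

series = pd.Series([1, 2, 1, 3, 4, 2, 6, 4, 5])

# With keep=False, duplicated() marks every occurrence of a repeated value
mask = series.duplicated(keep=False)

# Selecting the unmarked rows leaves only values that appear exactly once
manual = series[~mask]

print(manual.index.tolist())  # [3, 6, 8]
print(manual.tolist())        # [3, 6, 5]

# The manual filter gives the same series as drop_duplicates(keep=False)
print(manual.equals(series.drop_duplicates(keep=False)))  # True
```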