PUBLISHED ON: MARCH 16, 2021

Pandas DataFrame drop_duplicates() Method

In this tutorial, we will learn the Python pandas DataFrame.drop_duplicates() method. It returns a DataFrame with duplicate rows removed. By default all columns are compared, but the comparison can optionally be limited to certain columns. Index labels, including time-based indexes, are ignored when identifying duplicates.

The syntax of the DataFrame.drop_duplicates() method is shown below.
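
To make the point about ignored indexes concrete, here is a minimal sketch. The sample data and the two dates used as index labels are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data: both rows hold identical values
# but carry different timestamps as index labels.
df = pd.DataFrame(
    {"Name": ["Navya", "Navya"], "Skills": ["Python", "Python"]},
    index=pd.to_datetime(["2021-03-15", "2021-03-16"]),
)

# The index is not compared, so the second row counts as a duplicate
# and only the first row (with its original label) is returned.
result = df.drop_duplicates()
print(result)
```

Only the column values decide whether two rows are duplicates, so the differing dates do not protect the second row from being dropped.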

Syntax

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)

Parameters

subset: column label or sequence of labels, optional. Only consider certain columns for identifying duplicates; by default, all columns are used.

keep: {‘first’, ‘last’, False}, default ‘first’. Determines which duplicates (if any) to keep.
- first: Drop duplicates except for the first occurrence.
- last: Drop duplicates except for the last occurrence.
- False: Drop all duplicates.

inplace: bool, default False. Whether to drop duplicates in place or to return a copy.

ignore_index: bool, default False. If True, the resulting axis will be labeled 0, 1, …, n - 1.
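
The inplace and ignore_index parameters are not used in the examples of this tutorial, so a short sketch of both may help. The four-row DataFrame below is an assumed sample, not data from the original examples.

```python
import pandas as pd

# Assumed sample data: the row at label 2 repeats the row at label 0.
df = pd.DataFrame({
    "Name": ["Navya", "Vindya", "Navya", "Sinchana"],
    "Skills": ["Python", "Java", "Python", "Java"],
})

# ignore_index=True relabels the surviving rows 0, 1, ..., n - 1.
renumbered = df.drop_duplicates(ignore_index=True)
print(renumbered)

# inplace=True modifies df itself and returns None,
# so the original labels of the surviving rows are kept.
returned = df.drop_duplicates(inplace=True)
print(returned)
print(df)
```

Because inplace=True returns None, assigning its result back to df would lose the data. Use either the returned copy or the in-place form, not both.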

Example 1: Removing duplicate rows using DataFrame.drop_duplicates() Method

The DataFrame.drop_duplicates() method removes duplicate rows. By default, all columns are compared and the first occurrence of each row is kept. The example below shows this.

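import pandas as pd
# Each Name/Skills pair appears twice in the sample data.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print(df)
print("-------After removing duplicate rows------")
# Compare all columns and keep the first occurrence of each row.
print(df.drop_duplicates())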

Once we run the program we will get the following output.


       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
-------After removing duplicate rows------
       Name  Skills
0     Navya  Python
1    Vindya    Java
4  Sinchana    Java

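Example 2: Removing duplicate rows based on specific columns using the subset parameter

The DataFrame.drop_duplicates() method can remove duplicate rows based on specific column(s) only, using the subset parameter. Rows are treated as duplicates when their values match in the listed columns, even if they differ elsewhere. The example below shows this.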

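import pandas as pd
# Three distinct names, but only two distinct values in 'Skills'.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print(df)
print("-------After removing duplicate rows------")
# Compare only the 'Skills' column; keep the first row for each skill.
print(df.drop_duplicates(subset=['Skills']))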

Once we run the program we will get the following output.


       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
-------After removing duplicate rows------
     Name  Skills
0   Navya  Python
1  Vindya    Java
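
Before dropping rows on a subset, it can be useful to see which rows would be removed. The sketch below uses the related DataFrame.duplicated() method, which is not covered in this tutorial but accepts the same subset and keep parameters. The variable names mask and kept are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Navya", "Vindya", "Navya", "Vindya", "Sinchana", "Sinchana"],
    "Skills": ["Python", "Java", "Python", "Java", "Java", "Java"],
})

# True marks the rows that drop_duplicates(subset=["Skills"]) would remove.
mask = df.duplicated(subset=["Skills"])
print(mask.tolist())

# Selecting the unmarked rows gives the same result as drop_duplicates.
kept = df[~mask]
print(kept.equals(df.drop_duplicates(subset=["Skills"])))
```

Note that the row for Sinchana disappears entirely here, because an earlier row already has 'Java' in the Skills column.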

Example 3: Removing duplicate rows and keeping the last occurrence using the keep parameter

The DataFrame.drop_duplicates() method can keep the last occurrence of each duplicated row instead of the first, using the keep parameter. The example below shows this.

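import pandas as pd
# Each Name/Skills pair appears twice in the sample data.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print(df)
print("-------After removing duplicate rows------")
# keep='last' retains the last occurrence, so index labels 2, 3 and 5 remain.
print(df.drop_duplicates(subset=['Name', 'Skills'], keep='last'))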

Once we run the program we will get the following output.


       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
-------After removing duplicate rows------
       Name  Skills
2     Navya  Python
3    Vindya    Java
5  Sinchana    Java
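
The third option for keep, False, is not shown above. The sketch below illustrates it on an assumed variation of the sample data in which one row ("Asha", a hypothetical name added for illustration) has no duplicate.

```python
import pandas as pd

# Hypothetical sample data: "Asha" is an assumed extra row that appears only once.
df = pd.DataFrame({
    "Name": ["Navya", "Vindya", "Navya", "Asha"],
    "Skills": ["Python", "Java", "Python", "Java"],
})

# keep=False drops every row that has a duplicate, including its first occurrence,
# so only rows that were unique to begin with remain.
unique_only = df.drop_duplicates(keep=False)
print(unique_only)
```

With the data used in the three examples above, every row has a duplicate, so keep=False would return an empty DataFrame.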

Conclusion

In this tutorial, we learned the DataFrame.drop_duplicates() method. We covered its syntax and parameters, and worked through examples that apply the method to a DataFrame.



About the author:
I like writing about Python, and frameworks like Pandas, Numpy, Scikit, etc. I am still learning Python. I like sharing what I learn with others through my content.