PUBLISHED ON: MARCH 16, 2021

Pandas DataFrame duplicated() Method

In this tutorial, we will learn the Python pandas DataFrame.duplicated() method. It returns a boolean Series, indexed like the DataFrame, in which each value indicates whether that row is a duplicate of another row. By default all columns are compared, but the check can optionally be restricted to certain columns.

The syntax of the DataFrame.duplicated() method is shown below.

Syntax

DataFrame.duplicated(subset=None, keep='first')

Parameters

subset: column label or sequence of labels, optional

Only consider certain columns for identifying duplicates; by default, all of the columns are used.

keep: {‘first’, ‘last’, False}, default ‘first’

Determines which duplicates (if any) to mark.

  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False: Mark all duplicates as True.
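As a quick illustration of the three keep options, here is a minimal sketch that applies each one to a small one-column DataFrame. The column name x and the variable names toy and comparison are made up for this illustration only.

```python
import pandas as pd

# Hypothetical toy data: the value 'a' appears three times, 'b' once.
toy = pd.DataFrame({'x': ['a', 'b', 'a', 'a']})

# One result column per keep option, so they can be compared side by side.
comparison = pd.DataFrame({
    'x': toy['x'],
    'keep_first': toy.duplicated(keep='first'),
    'keep_last': toy.duplicated(keep='last'),
    'keep_False': toy.duplicated(keep=False),
})
print(comparison)
```

The unique value 'b' is False in every column, while the rows holding 'a' differ only in which occurrence, if any, is left unmarked.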

Example 1: Finding duplicate rows with the default keep='first'

The example below shows that, by default, the first occurrence of each set of duplicated rows is marked False and all later occurrences are marked True.

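import pandas as pd

# Build a small DataFrame in which every row appears exactly twice.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# With the default keep='first', only the repeat occurrences are flagged True.
print(df.duplicated())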

Once we run the program we will get the following output.


-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0    False
1    False
2     True
3     True
4    False
5     True
dtype: bool
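Because the result is a boolean Series aligned with the DataFrame's index, it can be used directly as a mask. The following is a minimal sketch of that idea using the same data; the variable names mask and duplicate_rows are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})

# Boolean Series: True for every repeat of an earlier row.
mask = df.duplicated()

# Boolean indexing keeps only the rows where the mask is True.
duplicate_rows = df[mask]
print(duplicate_rows)

# True counts as 1, so summing the mask counts the flagged rows.
print("Number of duplicate rows:", mask.sum())
```

Here the mask selects rows 2, 3 and 5, which are the second occurrences of each repeated row.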

Example 2: Finding duplicate rows with keep='last'

The example below shows that, with keep='last', the last occurrence of each set of duplicated rows is marked False and all earlier occurrences are marked True.

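import pandas as pd

# Build a small DataFrame in which every row appears exactly twice.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# keep='last' leaves the last occurrence False and flags the earlier ones True.
print(df.duplicated(keep='last'))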

Once we run the program we will get the following output.


-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0     True
1     True
2    False
3    False
4     True
5    False
dtype: bool

Example 3: Marking all duplicate rows with keep=False

The example below shows that, with keep=False, every row that has a duplicate is marked True, including the first and last occurrences.

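import pandas as pd

# Build a small DataFrame in which every row appears exactly twice.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# keep=False flags every row that has a duplicate, with no occurrence spared.
print(df.duplicated(keep=False))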

Once we run the program we will get the following output.


-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0    True
1    True
2    True
3    True
4    True
5    True
dtype: bool
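One common use of keep=False is to pull out whole groups of identical rows for inspection. The sketch below assumes the same data plus one extra unique row ('Asha', 'Go'), which is hypothetical and added only so that not every row is a duplicate; the variable names in_group and groups are also illustrative.

```python
import pandas as pd

# The tutorial's data plus one hypothetical unique row ('Asha', 'Go'),
# added here only so that not every row is a duplicate.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana', 'Asha'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java', 'Go'],
})

# keep=False flags every member of a duplicate group.
in_group = df.duplicated(keep=False)

# Sort the flagged rows so identical rows end up next to each other.
groups = df[in_group].sort_values('Name', kind='mergesort')
print(groups)
```

The unique row is left out of the result, and each pair of identical rows appears together.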

Example 4: Finding duplicates in specific columns with the subset parameter

The example below shows how to find duplicates based on specific column(s) only, by using the subset parameter.

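import pandas as pd

# Build a small DataFrame in which every row appears exactly twice.
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# Compare rows on the 'Skills' column only; the 'Name' column is ignored.
print(df.duplicated(subset=['Skills']))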

Once we run the program we will get the following output.


-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0    False
1    False
2     True
3     True
4     True
5     True
dtype: bool
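Inverting the mask with ~ keeps only the rows that are not flagged, which here means the first row seen for each skill. The sketch below also compares the result with DataFrame.drop_duplicates() called with the same subset; the variable names first_per_skill and same are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})

# ~ inverts the mask, so only the first row for each skill is kept.
first_per_skill = df[~df.duplicated(subset=['Skills'])]
print(first_per_skill)

# drop_duplicates() with the same subset should select the same rows.
same = first_per_skill.equals(df.drop_duplicates(subset=['Skills']))
print(same)
```

Only rows 0 and 1 remain, one for Python and one for Java.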

Conclusion:

In this tutorial, we learned about the Python pandas DataFrame.duplicated() method. We covered its syntax and parameters, and worked through examples that apply it to a DataFrame with each of the keep options and with the subset parameter.



About the author:
I like writing about Python, and frameworks like Pandas, Numpy, Scikit, etc. I am still learning Python. I like sharing what I learn with others through my content.