Pandas DataFrame duplicated() Method
In this tutorial, we will learn the Python pandas DataFrame.duplicated() method. It returns a boolean Series that marks each duplicate row as True. By default all columns are compared, but the comparison can optionally be restricted to certain columns.
The syntax of the DataFrame.duplicated() method is shown below.
Syntax
DataFrame.duplicated(subset=None, keep='first')
Parameters
subset: column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns.
keep: {'first', 'last', False}, default 'first'
Determines which duplicates (if any) to mark.
- 'first': Mark duplicates as True except for the first occurrence.
- 'last': Mark duplicates as True except for the last occurrence.
- False: Mark all duplicates as True.
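The three keep options can be compared side by side. The following is a minimal sketch, assuming a small single-column DataFrame invented here for illustration (the column name x and its values are not part of the tutorial's data):

```python
import pandas as pd

# Hypothetical data: the value 'a' appears three times, 'b' only once.
df = pd.DataFrame({'x': ['a', 'b', 'a', 'a']})

first = df.duplicated(keep='first')  # first 'a' is False, later 'a' rows are True
last = df.duplicated(keep='last')    # last 'a' is False, earlier 'a' rows are True
every = df.duplicated(keep=False)    # every 'a' row is True

# Put the three masks next to the data for comparison.
comparison = pd.DataFrame({
    'x': df['x'],
    "keep='first'": first,
    "keep='last'": last,
    'keep=False': every,
})
print(comparison)
```

The row holding 'b' is False in every column because it has no duplicate.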
Example 1: Finding duplicate rows using the DataFrame.duplicated() method
The example below shows that, by default, the first occurrence of each set of duplicated rows is marked False and all later occurrences are marked True.
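import pandas as pd
# Build a DataFrame that contains repeated (Name, Skills) rows
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# Default keep='first': only the repeats after the first occurrence are True
print(df.duplicated())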
Running the program produces the following output.
-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0    False
1    False
2     True
3     True
4    False
5     True
dtype: bool
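Because the result is a boolean Series, it can be used as a mask to select rows. The following is a minimal sketch using the same data as above; the variable names mask, duplicates, and unique_rows are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})

# Boolean Series: True for every repeat after the first occurrence.
mask = df.duplicated()

# Rows flagged as duplicates (index 2, 3 and 5).
duplicates = df[mask]

# Rows that are not flagged (index 0, 1 and 4).
unique_rows = df[~mask]

# True counts as 1, so summing the mask counts the flagged rows.
print("Number of duplicate rows:", mask.sum())
print(duplicates)
print(unique_rows)
```

With the default settings, selecting the unflagged rows this way gives the same rows as df.drop_duplicates().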
Example 2: Finding duplicate rows using the DataFrame.duplicated() method with keep='last'
The example below shows that, with keep='last', the last occurrence of each set of duplicated rows is marked False and all earlier occurrences are marked True.
import pandas as pd
# Build a DataFrame that contains repeated (Name, Skills) rows
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# keep='last': only the occurrences before the last one are True
print(df.duplicated(keep='last'))
Running the program produces the following output.
-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0     True
1     True
2    False
3    False
4     True
5    False
dtype: bool
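Example 3: Finding duplicate rows using the DataFrame.duplicated() method with keep=False
The example below shows that, with keep set to False, every row that has a duplicate is marked True. In this DataFrame each row appears twice, so all rows are True.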
import pandas as pd
# Build a DataFrame that contains repeated (Name, Skills) rows
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# keep=False: every member of a duplicated set is True
print(df.duplicated(keep=False))
Running the program produces the following output.
-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0    True
1    True
2    True
3    True
4    True
5    True
dtype: bool
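Example 4: Finding duplicate rows using the DataFrame.duplicated() method with subset
The example below shows how to find duplicates in specific column(s) by using the subset parameter. Only the Skills column is compared here, so rows 4 and 5 are both marked True because 'Java' already appears in row 1.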
import pandas as pd
# Build a DataFrame that contains repeated (Name, Skills) rows
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print("-----------DataFrame--------")
print(df)
print("------Finding duplicates rows-------")
# subset=['Skills']: compare only the Skills column, ignoring Name
print(df.duplicated(subset=['Skills']))
Running the program produces the following output.
-----------DataFrame--------
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
------Finding duplicates rows-------
0    False
1    False
2     True
3     True
4     True
5     True
dtype: bool
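The subset and keep parameters can also be combined, and subset accepts a single column label as well as a list. The following is a minimal sketch on the same data; the variable name result is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})

# Compare only the Skills column and keep the last occurrence of each skill.
# 'Python' last appears in row 2 and 'Java' last appears in row 5,
# so those two rows are False and every other row is True.
result = df.duplicated(subset='Skills', keep='last')
print(result)
```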
Conclusion:
In this tutorial, we learned the Python pandas DataFrame.duplicated() method. We covered its syntax and parameters, and worked through examples that apply it to a DataFrame with the default settings, with keep='last', with keep=False, and with subset.