Pandas DataFrame drop_duplicates() Method
In this tutorial, we will learn the Python pandas DataFrame.drop_duplicates()
method. It returns a DataFrame with duplicate rows removed. By default, all columns are compared; restricting the comparison to certain columns is optional. Index labels, including time indexes, are ignored when rows are compared.
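As a minimal sketch of the point about indexes, the snippet below uses a small made-up DataFrame (the data and the index labels "a", "b", "c" are assumptions for illustration) in which two rows hold the same values under different index labels.

```python
import pandas as pd

# Hypothetical data: rows "a" and "b" hold the same values under different index labels.
df = pd.DataFrame(
    {"Name": ["Navya", "Navya", "Vindya"], "Skills": ["Python", "Python", "Java"]},
    index=["a", "b", "c"],
)

# Only the column values are compared, so row "b" counts as a duplicate of row "a".
deduped = df.drop_duplicates()
print(deduped)
```

The surviving rows keep their original index labels; the labels simply play no part in deciding what counts as a duplicate.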
The syntax of the DataFrame.drop_duplicates()
method is shown below.
Syntax
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Parameters
subset: column label or sequence of labels, optional. Only consider certain columns for identifying duplicates; by default, all of the columns are used.
keep: {'first', 'last', False}, default 'first'. Determines which duplicates (if any) to keep.
- first: Drop duplicates except for the first occurrence.
- last: Drop duplicates except for the last occurrence.
- False: Drop all duplicates.
inplace: bool, default False. Whether to drop duplicates in place or to return a copy.
ignore_index: bool, default False. If True, the resulting axis will be labeled 0, 1, …, n - 1.
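The parameters above can be sketched together as follows. The DataFrame here is made up for illustration (the "Asha" row is an assumption, added so that one row has no duplicate), and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical, rows 1 and 3 are unique.
df = pd.DataFrame({
    "Name": ["Navya", "Vindya", "Navya", "Asha"],
    "Skills": ["Python", "Java", "Python", "SQL"],
})

# keep=False drops every row that has a duplicate, leaving only the unique rows.
unique_only = df.drop_duplicates(keep=False)
print(unique_only)

# ignore_index=True relabels the result 0, 1, ..., n - 1.
relabeled = df.drop_duplicates(ignore_index=True)
print(relabeled)

# inplace=True modifies df itself; the call then returns None.
result = df.drop_duplicates(inplace=True)
print(result)
print(df)
```

Note that with inplace=True the return value is None, so the result should not be assigned back to the DataFrame variable.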
Example 1: Removing duplicate rows using DataFrame.drop_duplicates() Method
The DataFrame.drop_duplicates()
method removes duplicate rows, comparing the values in all columns by default. The example below shows this.
import pandas as pd
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print(df)
print("-------After removing duplicate rows------")
print(df.drop_duplicates())
Once we run the program we will get the following output.
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
-------After removing duplicate rows------
       Name  Skills
0     Navya  Python
1    Vindya    Java
4  Sinchana    Java
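To see which rows are treated as duplicates in the example above, one option is the related DataFrame.duplicated() method, which returns a boolean Series marking the repeated rows. The sketch below is only an illustration; the variable names mask and filtered are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Navya", "Vindya", "Navya", "Vindya", "Sinchana", "Sinchana"],
    "Skills": ["Python", "Java", "Python", "Java", "Java", "Java"],
})

# True marks each row that repeats an earlier row (keep='first' is the default).
mask = df.duplicated()
print(mask)

# Filtering out the marked rows gives the same rows as drop_duplicates().
filtered = df[~mask]
print(filtered.equals(df.drop_duplicates()))
```

Inspecting the mask first can be a useful check before rows are actually removed.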
Example 2: Removing duplicate rows based on specific columns using the subset parameter
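The DataFrame.drop_duplicates()
method can remove duplicate rows based on one or more specific columns, which are passed to the subset
parameter. Only the listed columns are compared, and the first row for each distinct value is kept. The example below shows this.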
import pandas as pd
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print(df)
print("-------After removing duplicate rows------")
print(df.drop_duplicates(subset=['Skills']))
Once we run the program we will get the following output.
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
-------After removing duplicate rows------
     Name  Skills
0   Navya  Python
1  Vindya    Java
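Because only the subset columns are compared, the values that survive in the other columns depend on which occurrence is kept. As a small sketch of this (the variable name last_per_skill is hypothetical), the same data can be deduplicated on 'Skills' while keeping the last occurrence instead of the first:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Navya", "Vindya", "Navya", "Vindya", "Sinchana", "Sinchana"],
    "Skills": ["Python", "Java", "Python", "Java", "Java", "Java"],
})

# Only 'Skills' is compared; keep='last' retains the final row seen for each skill.
last_per_skill = df.drop_duplicates(subset=["Skills"], keep="last")
print(last_per_skill)
```

Here the 'Java' row that remains belongs to Sinchana rather than Vindya, since Sinchana's row is the last one with that skill.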
Example 3: Keeping the last occurrence of duplicate rows using the keep parameter
The DataFrame.drop_duplicates()
method can keep the last occurrence of each set of duplicate rows instead of the first, by passing keep='last'. The example below shows this.
import pandas as pd
df = pd.DataFrame({
    'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
    'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java'],
})
print(df)
print("-------After removing duplicate rows------")
print(df.drop_duplicates(subset=['Name', 'Skills'], keep='last'))
Once we run the program we will get the following output.
       Name  Skills
0     Navya  Python
1    Vindya    Java
2     Navya  Python
3    Vindya    Java
4  Sinchana    Java
5  Sinchana    Java
-------After removing duplicate rows------
       Name  Skills
2     Navya  Python
3    Vindya    Java
5  Sinchana    Java
Conclusion
In this tutorial, we learned the DataFrame.drop_duplicates()
method. We covered its syntax and parameters, and worked through examples that apply the method to a DataFrame.