Pandas DataFrame compare() Method
In this tutorial, we will learn the Python pandas DataFrame.compare() method. This method compares one DataFrame with another and returns a DataFrame that shows the differences. By default (align_axis=1), the differences are placed side by side, and the columns of the result form a MultiIndex with ‘self’ and ‘other’ alternating at the inner level. The method raises a ValueError exception when the two DataFrames do not have identical labels or shapes.
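The ValueError case can be illustrated with a minimal sketch. The DataFrames left and right and the variable error_message are hypothetical names chosen for this illustration, and the exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

# Hypothetical data: `right` has one extra row, so its shape and index
# labels differ from those of `left`.
left = pd.DataFrame({"Name": ["Abhishek", "Anurag"], "Marks": [90, 85]})
right = pd.DataFrame({"Name": ["Abhishek", "Anurag", "Chetan"], "Marks": [90, 85, 70]})

try:
    left.compare(right)
    error_message = None
except ValueError as exc:
    # compare() refuses to compare frames that are not identically labeled.
    error_message = str(exc)

print(error_message)
```

If the labels are expected to differ, one option is to align the two DataFrames (for example with reindex) before calling compare().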
The syntax of the DataFrame.compare() method is shown below.
Syntax
DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)
Parameters
other: DataFrame. The object to compare with.
align_axis: The axis on which to align the comparison. 0 or ‘index’ stacks the differences vertically, with rows drawn alternately from self and other. 1 or ‘columns’ places the differences side by side, with columns drawn alternately from self and other. The default value is 1.
keep_shape: bool, default False. If True, all rows and columns are kept. Otherwise, only the rows and columns that contain different values are kept.
keep_equal: bool, default False. If True, the result keeps values that are equal. Otherwise, equal values are shown as NaN.
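As a small sketch of the align_axis parameter, the integer and string forms can be checked against each other. The DataFrames left and right and the result variable names are made up for this illustration.

```python
import pandas as pd

# Illustrative data; `left` and `right` are names chosen for this sketch.
left = pd.DataFrame({"Subject": ["Science", "Science"], "Marks": [90, 85]})
right = pd.DataFrame({"Subject": ["Maths", "Maths"], "Marks": [95, 80]})

# 0 and "index" both stack the differences vertically.
rows_same = left.compare(right, align_axis=0).equals(
    left.compare(right, align_axis="index")
)

# 1 and "columns" both place the differences side by side.
cols_same = left.compare(right, align_axis=1).equals(
    left.compare(right, align_axis="columns")
)

print(rows_same, cols_same)
```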
Example 1: Comparing two DataFrames using the DataFrame.compare() Method
We can compare two DataFrames and see the differences using the DataFrame.compare() method. In the example below, the Subject and Marks columns differ in both rows, so only those two columns appear in the result.
#importing pandas as pd
import pandas as pd
df1 = pd.DataFrame([['Abhishek',100,'Science',90], ['Anurag',101,'Science',85]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
df2 = pd.DataFrame([['Abhishek',100,'Maths',95], ['Anurag',101,'Maths',80]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df1.compare(df2))
Once we run the program we will get the following output.
   Subject         Marks
      self  other   self  other
0  Science  Maths     90     95
1  Science  Maths     85     80
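Because the result has two-level column labels, individual differences can be picked out by label. The following is a minimal sketch using the same data as Example 1; the variable names diff, marks and change are assumptions made for this illustration.

```python
import pandas as pd

columns = ["Name", "Roll No", "Subject", "Marks"]
df1 = pd.DataFrame([["Abhishek", 100, "Science", 90], ["Anurag", 101, "Science", 85]], columns=columns)
df2 = pd.DataFrame([["Abhishek", 100, "Maths", 95], ["Anurag", 101, "Maths", 80]], columns=columns)

diff = df1.compare(df2)

# Each column label is a pair: (original column name, "self" or "other").
print(diff.columns.tolist())

# Selecting the outer label returns the "self" and "other" values for that column.
marks = diff["Marks"]

# How much each mark changed from df1 (self) to df2 (other).
change = marks["other"] - marks["self"]
print(change)
```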
Example 2: Comparing two DataFrames using the DataFrame.compare() Method with align_axis=0
When align_axis=0, the DataFrame.compare() method returns a DataFrame in which the differences are stacked vertically, with rows drawn alternately from self and other.
#importing pandas as pd
import pandas as pd
df1 = pd.DataFrame([['Abhishek',100,'Science',90], ['Anurag',101,'Science',85]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
df2 = pd.DataFrame([['Abhishek',100,'Maths',95], ['Anurag',101,'Maths',75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df1.compare(df2,align_axis=0))
Once we run the program we will get the following output.
         Subject  Marks
0 self   Science     90
  other    Maths     95
1 self   Science     85
  other    Maths     75
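Example 3: Comparing two DataFrames with some equal values using the DataFrame.compare() Method
This example is similar to the previous one, but the Marks value in the second row is now the same (85) in both DataFrames. Equal values are shown as NaN in the result, and because NaN is a floating-point value, the Marks column is displayed as floats.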
#importing pandas as pd
import pandas as pd
df1 = pd.DataFrame([['Abhishek',100,'Science',90], ['Anurag',101,'Science',85]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
df2 = pd.DataFrame([['Abhishek',100,'Maths',95], ['Anurag',101,'Maths',85]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df1.compare(df2,align_axis=0))
Once we run the program we will get the following output.
         Subject  Marks
0 self   Science   90.0
  other    Maths   95.0
1 self   Science    NaN
  other    Maths    NaN
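The NaN entries above come from the default keep_equal=False. As a minimal sketch with the same data, passing keep_equal=True keeps the equal values instead. The variable names masked and kept are assumptions made for this illustration.

```python
import pandas as pd

columns = ["Name", "Roll No", "Subject", "Marks"]
df1 = pd.DataFrame([["Abhishek", 100, "Science", 90], ["Anurag", 101, "Science", 85]], columns=columns)
df2 = pd.DataFrame([["Abhishek", 100, "Maths", 95], ["Anurag", 101, "Maths", 85]], columns=columns)

# Default behaviour: the equal Marks values in row 1 are replaced by NaN.
masked = df1.compare(df2)

# keep_equal=True: the equal Marks values in row 1 are kept as they are.
kept = df1.compare(df2, keep_equal=True)

print(masked)
print(kept)
```

Note that the Name and Roll No columns are still dropped in both results, because keep_shape is left at its default of False.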
Example 4: Comparing two DataFrames using the DataFrame.compare() Method with keep_shape=True
If keep_shape=True, all rows and columns are shown in the resulting DataFrame, with NaN in place of equal values. Otherwise, only the rows and columns that contain different values are shown.
#importing pandas as pd
import pandas as pd
df1 = pd.DataFrame([['Abhishek',100,'Science',90], ['Anurag',101,'Science',85]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
df2 = pd.DataFrame([['Abhishek',100,'Maths',95], ['Anurag',101,'Maths',85]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df1.compare(df2,keep_shape=True))
Once we run the program we will get the following output.
   Name         Roll No         Subject         Marks
   self  other     self  other     self  other   self  other
0   NaN    NaN      NaN    NaN  Science  Maths   90.0   95.0
1   NaN    NaN      NaN    NaN  Science  Maths    NaN    NaN
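The two flags can also be combined. The sketch below, which reuses the data from this example and introduces the hypothetical variable name full, passes both keep_shape=True and keep_equal=True so that every row, column and original value is kept.

```python
import pandas as pd

columns = ["Name", "Roll No", "Subject", "Marks"]
df1 = pd.DataFrame([["Abhishek", 100, "Science", 90], ["Anurag", 101, "Science", 85]], columns=columns)
df2 = pd.DataFrame([["Abhishek", 100, "Maths", 95], ["Anurag", 101, "Maths", 85]], columns=columns)

# Keep every row and column, and show equal values instead of NaN.
full = df1.compare(df2, keep_shape=True, keep_equal=True)
print(full)
```

This form is convenient when the full side-by-side view of both DataFrames is wanted, at the cost of no longer highlighting which cells differ.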
Conclusion
In this tutorial, we learned the Python pandas DataFrame.compare() method. We covered its syntax and parameters, and worked through examples that compare two DataFrames using the align_axis and keep_shape options.