Pandas DataFrame describe() Method
In this tutorial, we will learn about the pandas DataFrame.describe() method. It generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values.
- For mixed data types provided via a DataFrame, the default is to return an analysis of only the numeric columns.
- For numeric data, the result’s index will include count, mean, std, min, and max, as well as the lower, 50th, and upper percentiles. By default, the lower percentile is 25 and the upper percentile is 75.
- For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value, and freq is how often that value occurs.
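The index labels listed above can be checked directly. The following is a minimal sketch, assuming a small hypothetical DataFrame with one numeric column and one string column; the column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: one numeric column and one string column.
df = pd.DataFrame({
    "Marks": [90, 85, 75],
    "Subject": ["Science", "Science", "Maths"],
})

# Numeric summary: count, mean, std, min, the percentiles, and max.
numeric_labels = list(df["Marks"].describe().index)

# String summary: count, unique, top, and freq.
object_labels = list(df["Subject"].describe().index)

print(numeric_labels)
print(object_labels)
```

Describing each column separately, as done here, avoids the NaN padding that appears when numeric and string summaries are combined in one table.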
The syntax of the DataFrame.describe() method is shown below.
Syntax
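DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
Note: the datetime_is_numeric parameter was removed in pandas 2.0, where datetime columns are always summarized as numeric data. On pandas 2.0 and later, omit it.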
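As a minimal sketch of the percentiles parameter, the example below requests the 10th, 50th, and 90th percentiles instead of the defaults. The single-column DataFrame is hypothetical sample data, and the values passed to percentiles must lie between 0 and 1.

```python
import pandas as pd

# Hypothetical sample data with a single numeric column.
df = pd.DataFrame({"Marks": [90, 85, 75]})

# Ask for the 10th, 50th, and 90th percentiles instead of 25/50/75.
summary = df.describe(percentiles=[0.1, 0.5, 0.9])

print(summary)
print(list(summary.index))
```

The requested percentiles replace the default 25% and 75% rows in the result's index.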
Example 1: Describing a DataFrame using the DataFrame.describe() Method
The example below describes a DataFrame using the DataFrame.describe() method. By default, only numeric columns are returned.
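import pandas as pd
# Sample student records with two string columns and two numeric columns.
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# With no arguments, describe() summarizes only the numeric columns.
print(df.describe())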
Once we run the program, we get the following output.
          Roll No      Marks
count    3.000000   3.000000
mean   101.333333  83.333333
std      1.527525   7.637626
min    100.000000  75.000000
25%    100.500000  80.000000
50%    101.000000  85.000000
75%    102.000000  87.500000
max    103.000000  90.000000
Example 2: Describing all columns of a DataFrame using the DataFrame.describe() Method
The example below describes all columns of a DataFrame, regardless of data type, by passing include='all' to the DataFrame.describe() method. Statistics that do not apply to a column's data type are shown as NaN.
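import pandas as pd
# Sample student records with two string columns and two numeric columns.
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# include='all' summarizes every column, whatever its data type.
print(df.describe(include='all'))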
Once we run the program, we get the following output.
            Name     Roll No  Subject      Marks
count          3    3.000000        3   3.000000
unique         3         NaN        2        NaN
top     Abhishek         NaN  Science        NaN
freq           1         NaN        2        NaN
mean         NaN  101.333333      NaN  83.333333
std          NaN    1.527525      NaN   7.637626
min          NaN  100.000000      NaN  75.000000
25%          NaN  100.500000      NaN  80.000000
50%          NaN  101.000000      NaN  85.000000
75%          NaN  102.000000      NaN  87.500000
max          NaN  103.000000      NaN  90.000000
Example 3: Describing a specific column of the DataFrame using the describe() Method
The example below describes a single column of a DataFrame by accessing it as an attribute. The result is a Series rather than a DataFrame.
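import pandas as pd
# Sample student records with two string columns and two numeric columns.
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# df.Marks selects the Marks column as a Series before describing it.
print(df.Marks.describe())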
Once we run the program, we get the following output.
count     3.000000
mean     83.333333
std       7.637626
min      75.000000
25%      80.000000
50%      85.000000
75%      87.500000
max      90.000000
Name: Marks, dtype: float64
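Attribute access only works when the column name is a valid Python identifier. The following sketch, which reuses the same sample data, shows bracket notation for a column such as Roll No whose name contains a space.

```python
import pandas as pd

# Same sample student records as in the example above.
df = pd.DataFrame(
    [["Abhishek", 100, "Science", 90],
     ["Anurag", 101, "Science", 85],
     ["Chetan", 103, "Maths", 75]],
    columns=["Name", "Roll No", "Subject", "Marks"],
)

# "Roll No" contains a space, so attribute access is not possible here;
# bracket notation works for any column name.
roll_summary = df["Roll No"].describe()

print(roll_summary)
```

Bracket notation is the safer choice in general, since it also avoids clashes between column names and existing DataFrame attributes.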
Example 4: Excluding numeric columns using the DataFrame.describe() Method
The example below shows how to describe a DataFrame while excluding its numeric columns, using the DataFrame.describe() method with exclude=np.number.
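import numpy as np
import pandas as pd
# Sample student records with two string columns and two numeric columns.
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# exclude=np.number drops every numeric column from the summary.
print(df.describe(exclude=np.number))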
Once we run the program, we get the following output.
            Name  Subject
count          3        3
unique         3        2
top     Abhishek  Science
freq           1        2
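An alternative route to a similar result, sketched below under the assumption that filtering the columns first is acceptable, is to drop the numeric columns with select_dtypes and then call describe() on what remains. This is an illustration of a different technique, not a replacement for the exclude parameter.

```python
import pandas as pd

# Same sample student records as in the example above.
df = pd.DataFrame(
    [["Abhishek", 100, "Science", 90],
     ["Anurag", 101, "Science", 85],
     ["Chetan", 103, "Maths", 75]],
    columns=["Name", "Roll No", "Subject", "Marks"],
)

# Keep only the non-numeric columns, then summarize them.
non_numeric = df.select_dtypes(exclude="number")
alt_summary = non_numeric.describe()

print(alt_summary)
```

Because the filtered DataFrame has no numeric columns left, describe() falls back to summarizing the remaining columns with count, unique, top, and freq.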
Example 5: Describing a DataFrame containing None values using the DataFrame.describe() Method
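The example below shows how the DataFrame.describe() method describes a DataFrame that contains None values. Missing values are excluded from every statistic, so count reports only the non-missing entries. The std for Roll No is NaN because a sample standard deviation cannot be computed from a single value.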
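import pandas as pd
# Sample student records in which some numeric entries are missing (None).
df = pd.DataFrame([['Abhishek', 101, 'Science', None], ['Anurag', None, 'Science', 85], ['Chetan', None, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# Missing values are skipped when the statistics are computed.
print(df.describe())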
Once we run the program, we get the following output.
       Roll No      Marks
count      1.0   2.000000
mean     101.0  80.000000
std        NaN   7.071068
min      101.0  75.000000
25%      101.0  77.500000
50%      101.0  80.000000
75%      101.0  82.500000
max      101.0  85.000000
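Since count ignores missing values, comparing it with the number of rows gives the number of missing entries per numeric column. The following is a minimal sketch of that check using the same sample data; the variable names are illustrative.

```python
import pandas as pd

# Same sample student records with missing values as in the example above.
df = pd.DataFrame(
    [["Abhishek", 101, "Science", None],
     ["Anurag", None, "Science", 85],
     ["Chetan", None, "Maths", 75]],
    columns=["Name", "Roll No", "Subject", "Marks"],
)

summary = df.describe()

# Total rows minus the non-missing count gives the missing entries per column.
missing = len(df) - summary.loc["count"]

print(missing)
```

Here Roll No has two missing entries and Marks has one, which matches the None values placed in the sample data.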
Conclusion
In this tutorial, we learned about the pandas DataFrame.describe() method. We covered its syntax and parameters, and worked through examples that apply the method to a DataFrame with different arguments.