PUBLISHED ON: MARCH 16, 2021

Pandas DataFrame describe() Method

In this tutorial, we will learn the Python pandas DataFrame.describe() method. It generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.

  • For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns.
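  • For numeric data, the result's index will include count, mean, std, min, and max, as well as the lower, 50th, and upper percentiles. By default, the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.
  • For object data (e.g. strings or timestamps), the result's index will include count, unique, top, and freq. The top is the most common value, and freq is the number of times that value occurs.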

The syntax of the DataFrame.describe() method is shown below.

Syntax

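DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)

Note: the datetime_is_numeric parameter exists in pandas 1.1 through 1.5. It was removed in pandas 2.0, where datetime columns are always summarized as numeric data, so omit it on newer versions.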
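The percentiles parameter in the syntax above is not used in the examples that follow, so here is a minimal sketch of it. The marks DataFrame and its values are hypothetical, chosen only for illustration. The median (0.5) is listed explicitly so that the result does not depend on how a given pandas version treats an omitted median.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the percentiles parameter.
marks = pd.DataFrame({'Marks': [90, 85, 75, 60, 95]})

# Request the 10th, 50th and 90th percentiles instead of the default 25/50/75.
# Percentiles are given as fractions between 0 and 1.
summary = marks.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```

The resulting index replaces the 25% and 75% rows with 10% and 90% rows, while count, mean, std, min, and max are unchanged.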

Example 1: Describing a DataFrame using the DataFrame.describe() Method

The example below describes a DataFrame using the DataFrame.describe() method. By default, only numeric columns are included in the result.

import pandas as pd

# Sample data: Name and Subject are text columns; Roll No and Marks are numeric
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df.describe())

Once we run the program we will get the following output.


          Roll No      Marks
count    3.000000   3.000000
mean   101.333333  83.333333
std      1.527525   7.637626
min    100.000000  75.000000
25%    100.500000  80.000000
50%    101.000000  85.000000
75%    102.000000  87.500000
max    103.000000  90.000000
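To make the rows of this summary less opaque, the sketch below recomputes a few of them for the Marks column with the individual Series methods. It is only an illustration of what each row means, not a description of how describe() is implemented internally.

```python
import pandas as pd

df = pd.DataFrame(
    [['Abhishek', 100, 'Science', 90],
     ['Anurag', 101, 'Science', 85],
     ['Chetan', 103, 'Maths', 75]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'])

marks = df['Marks']
described = df.describe()

# Each statistic can be computed on its own and compared with describe().
print(marks.count(), described.loc['count', 'Marks'])        # non-NaN values
print(marks.mean(), described.loc['mean', 'Marks'])          # arithmetic mean
print(marks.std(), described.loc['std', 'Marks'])            # sample std (ddof=1)
print(marks.quantile(0.25), described.loc['25%', 'Marks'])   # lower percentile
print(marks.median(), described.loc['50%', 'Marks'])         # 50th percentile
```

Note that std is the sample standard deviation, which divides by n - 1 rather than n.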

Example 2: Describing all columns of a DataFrame using the DataFrame.describe() Method

The example below describes all columns of a DataFrame, regardless of data type, by passing include='all' to the DataFrame.describe() method. Statistics that do not apply to a column's data type are shown as NaN.

import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# include='all' summarizes text and numeric columns together
print(df.describe(include='all'))

Once we run the program we will get the following output.


            Name     Roll No  Subject      Marks
count          3    3.000000        3   3.000000
unique         3         NaN        2        NaN
top     Abhishek         NaN  Science        NaN
freq           1         NaN        2        NaN
mean         NaN  101.333333      NaN  83.333333
std          NaN    1.527525      NaN   7.637626
min          NaN  100.000000      NaN  75.000000
25%          NaN  100.500000      NaN  80.000000
50%          NaN  101.000000      NaN  85.000000
75%          NaN  102.000000      NaN  87.500000
max          NaN  103.000000      NaN  90.000000

Example 3: Describing a specific column of the DataFrame using the DataFrame.describe() Method

The example below describes a single column of a DataFrame by accessing it as an attribute. The result is a Series rather than a DataFrame.

import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# df.Marks selects the 'Marks' column as a Series
print(df.Marks.describe())

Once we run the program we will get the following output.


count     3.000000
mean     83.333333
std       7.637626
min      75.000000
25%      80.000000
50%      85.000000
75%      87.500000
max      90.000000
Name: Marks, dtype: float64
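Attribute access only works when the column name is a valid Python identifier. A column such as 'Roll No' contains a space, so it has to be selected with square brackets instead. The sketch below shows this on the same sample data.

```python
import pandas as pd

df = pd.DataFrame(
    [['Abhishek', 100, 'Science', 90],
     ['Anurag', 101, 'Science', 85],
     ['Chetan', 103, 'Maths', 75]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'])

# 'Roll No' contains a space, so df.Roll No is not valid Python syntax.
# Bracket notation works for any column name.
roll_summary = df['Roll No'].describe()
print(roll_summary)
```

Bracket notation is generally the safer choice, since it also avoids clashes between column names and existing DataFrame attributes or methods.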

Example 4: Describing a DataFrame excluding numeric columns using the DataFrame.describe() Method

The example below describes a DataFrame while excluding its numeric columns, by passing exclude=np.number to the DataFrame.describe() method. This requires importing NumPy.

import numpy as np
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# np.number matches every numeric dtype, so only the text columns remain
print(df.describe(exclude=np.number))

Once we run the program we will get the following output.


            Name  Subject
count          3        3
unique         3        2
top     Abhishek  Science
freq           1        2
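For comparison, the include parameter makes the opposite selection. The sketch below passes include=[np.number] to keep only the numeric columns of the same sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['Abhishek', 100, 'Science', 90],
     ['Anurag', 101, 'Science', 85],
     ['Chetan', 103, 'Maths', 75]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'])

# include=[np.number] keeps only columns with a numeric dtype.
numeric_only = df.describe(include=[np.number])
print(numeric_only.columns.tolist())
print(numeric_only)
```

On this data, the selected columns are the same ones that a plain df.describe() call returns, because the DataFrame contains only text and numeric columns.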

Example 5: Describing a DataFrame containing None values using the DataFrame.describe() Method

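The example below shows how the DataFrame.describe() method handles a DataFrame that contains None values. Missing values are excluded from every statistic, so count reports only the non-missing entries in each column.

import pandas as pd

# None in a numeric column is stored as NaN
df = pd.DataFrame([['Abhishek', 101, 'Science', None], ['Anurag', None, 'Science', 85], ['Chetan', None, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df.describe())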

Once we run the program we will get the following output.


       Roll No      Marks
count      1.0   2.000000
mean     101.0  80.000000
std        NaN   7.071068
min      101.0  75.000000
25%      101.0  77.500000
50%      101.0  80.000000
75%      101.0  82.500000
max      101.0  85.000000
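Two details of this output are worth checking directly: count is smaller than the number of rows, and std is NaN for Roll No. The sketch below reproduces both with individual Series methods on the same data.

```python
import pandas as pd

df = pd.DataFrame(
    [['Abhishek', 101, 'Science', None],
     ['Anurag', None, 'Science', 85],
     ['Chetan', None, 'Maths', 75]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'])

print(len(df))                   # total number of rows, missing or not
print(df['Marks'].count())       # only the non-missing Marks values
print(df['Marks'].isna().sum())  # number of missing Marks values
print(df['Roll No'].std())       # NaN: only one non-missing value remains
```

The sample standard deviation divides by n - 1, so it is undefined when a column has a single non-missing value, which is why describe() reports NaN for the std of Roll No.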

Conclusion

In this tutorial, we learned about the Python pandas DataFrame.describe() method. We covered its syntax and worked through examples that apply it to a DataFrame with different parameters, to a single column, and to data containing missing values.


