
Pandas DataFrame groupby() Method

In this tutorial, we will learn about the built-in pandas method DataFrame.groupby(). A groupby() operation involves some combination of splitting the object into groups, applying a function to each group, and combining the results. This method is helpful when we want to compute calculations or statistics on certain groups inside a DataFrame. It returns a DataFrameGroupBy object that holds information about the groups.
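To make the split-apply-combine idea concrete, here is a minimal sketch that performs the three steps by hand and then with a single groupby() call. The DataFrame and its column names team and score are made up for illustration.

```python
import pandas as pd

# small illustrative DataFrame (made-up data)
df = pd.DataFrame({'team': ['a', 'b', 'a', 'b', 'c'],
                   'score': [1, 2, 3, 4, 5]})

# split: one sub-DataFrame per distinct 'team' value
pieces = {key: df[df['team'] == key] for key in df['team'].unique()}

# apply: sum the 'score' column inside each piece
applied = {key: piece['score'].sum() for key, piece in pieces.items()}

# combine: gather the per-group results into one Series, sorted by key
manual = pd.Series(applied, name='score').sort_index()
manual.index.name = 'team'

# the same three steps expressed as a single groupby() call
via_groupby = df.groupby('team')['score'].sum()

print(manual)
print(via_groupby)
print(manual.equals(via_groupby))
```

Both approaches produce the same per-team sums; groupby() simply packages the split, apply and combine steps into one call.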

The syntax of the DataFrame.groupby() method is shown below.

Syntax

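DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

Older pandas versions also accepted a squeeze parameter; it was deprecated in pandas 1.1 and removed in pandas 2.0.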

Parameters

by: mapping, function, label, or list of labels. It is used to determine the groups for groupby. If by is a function, it is called on each value of the object's index. If a dict or Series is passed, the dict or Series values are used to determine the groups. If an array is passed, the values are used as-is to determine the groups. A label or list of labels groups by the named column(s).
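The three non-label forms of by can be seen side by side in the minimal sketch below. The DataFrame, the mapping dict and the group labels x, y, low and high are all made up for illustration; a NumPy array is used for the array case so pandas cannot mistake it for a list of column names.

```python
import numpy as np
import pandas as pd

# illustrative DataFrame indexed by (made-up) student names
df = pd.DataFrame({'Percentage': [72, 98, 81, 87]},
                  index=['Avinash', 'Amrutha', 'Chetana', 'Kartik'])

# by=function: len() is called on each index label, so names are grouped by length
by_func = df.groupby(len)['Percentage'].sum()

# by=dict: each index label is mapped to a group label through the dict values
mapping = {'Avinash': 'x', 'Amrutha': 'y', 'Chetana': 'x', 'Kartik': 'y'}
by_dict = df.groupby(mapping)['Percentage'].sum()

# by=array: the values are used as-is, one group label per row
labels = np.array(['low', 'high', 'low', 'high'])
by_array = df.groupby(labels)['Percentage'].sum()

print(by_func)
print(by_dict)
print(by_array)
```

Here the names of length 7 (Avinash, Amrutha, Chetana) fall into one group and Kartik (length 6) into another.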

axis: {0 or 'index', 1 or 'columns'}, default 0. Split along the rows (0 or 'index') or along the columns (1 or 'columns'). Grouping with axis=1 is deprecated since pandas 2.1.
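Because axis=1 is deprecated in recent releases, a version-independent way to group columns is to transpose the DataFrame, group its rows, and transpose back. This is a minimal sketch; the subject columns, student labels s1 and s2, and the area mapping are hypothetical.

```python
import pandas as pd

# made-up marks for two students; each column is a subject
df = pd.DataFrame({'math': [70, 80], 'physics': [60, 90], 'history': [50, 40]},
                  index=['s1', 's2'])

# hypothetical mapping from column name to subject area
area = {'math': 'science', 'physics': 'science', 'history': 'arts'}

# transpose so the subjects become rows, group those rows with the mapping,
# sum each group, then transpose back so the students are rows again
per_area = df.T.groupby(area).sum().T
print(per_area)
```

Each student ends up with one column per subject area, holding the sum of the subjects in that area.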

level: int, level name, or sequence of such, default None. If the axis is a MultiIndex (hierarchical index), group by a particular level or levels.
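A minimal sketch of grouping by an index level is shown below. It assumes a made-up DataFrame whose rows carry a two-level (Course, Name) MultiIndex; the level can be selected by name or by position.

```python
import pandas as pd

# made-up marks with a two-level (Course, Name) row index
index = pd.MultiIndex.from_tuples(
    [('Arts', 'Avinash'), ('Arts', 'Kartik'), ('BE', 'Nikhil'), ('BE', 'Chetana')],
    names=['Course', 'Name'])
df = pd.DataFrame({'Percentage': [72, 87, 85, 81]}, index=index)

# group by the outer index level, selected by name
by_level = df.groupby(level='Course')['Percentage'].mean()
print(by_level)

# level=0 selects the same outer level by position
by_position = df.groupby(level=0)['Percentage'].mean()
print(by_position.equals(by_level))
```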

Example: groupby() Method in Pandas

The example below shows how the groupby() method splits the rows of a DataFrame into groups based on the values of a column.

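# import the pandas library
import pandas as pd
# build a small DataFrame of student names, percentages and courses
data = {'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE']}
df = pd.DataFrame(data)
print(df)
# group the rows by the values of the 'Course' column
grp = df.groupby('Course')
print(grp)         # the DataFrameGroupBy object itself
print(grp.groups)  # dict mapping each group key to its row labels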

Once we run the program, we will get the following output.


      Name  Percentage  Course
0  Avinash          72    Arts
1  Amrutha          98   B.Com
2  Chetana          81  M.Tech
3   Kartik          87    B.SC
4   Nikhil          85      BE
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002429A175EE0>
{'Arts': [0], 'B.Com': [1], 'B.SC': [3], 'BE': [4], 'M.Tech': [2]}
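Besides the groups attribute, a GroupBy object can be iterated directly; each iteration yields a (key, sub-DataFrame) pair. The short sketch below rebuilds the same data so it runs on its own, and also uses ngroups and size() to count the groups and the rows in each group.

```python
import pandas as pd

data = {'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE']}
df = pd.DataFrame(data)
grp = df.groupby('Course')

# each iteration yields the group key and the rows belonging to that group
for course, group in grp:
    print(course, list(group['Name']))

# number of groups, and number of rows in each group
print(grp.ngroups)
print(grp.size())
```

Because sort=True by default, the groups are visited in sorted key order (Arts, B.Com, B.SC, BE, M.Tech), matching the order of the groups dictionary above.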

Example: Grouping the DataFrame by multiple columns using groupby() Method

In this example, we group the DataFrame by multiple columns by passing a list of column names to the groupby() method.

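# import the pandas library
import pandas as pd
# build a small DataFrame of student names, percentages and courses
data = {'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE']}
df = pd.DataFrame(data)
print(df)
# pass a list of column names to group by both 'Course' and 'Name'
grp = df.groupby(['Course', 'Name'])
print(grp)
print(grp.groups)  # keys are (Course, Name) tuples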

Once we run the program we will get the following output.


      Name  Percentage  Course
0  Avinash          72    Arts
1  Amrutha          98   B.Com
2  Chetana          81  M.Tech
3   Kartik          87    B.SC
4   Nikhil          85      BE
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002429A1759D0>
{('Arts', 'Avinash'): [0], ('B.Com', 'Amrutha'): [1], ('B.SC', 'Kartik'): [3], ('BE', 'Nikhil'): [4], ('M.Tech', 'Chetana'): [2]}
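When a multi-column grouping is aggregated, the group keys become a MultiIndex by default; passing as_index=False (one of the parameters in the syntax above) keeps them as ordinary columns instead. This minimal sketch uses made-up data with a hypothetical Year column so that some (Course, Year) pairs repeat.

```python
import pandas as pd

# made-up marks: some (Course, Year) pairs occur more than once
df = pd.DataFrame({'Course': ['Arts', 'Arts', 'BE', 'BE', 'BE'],
                   'Year': [1, 1, 1, 2, 2],
                   'Percentage': [72, 88, 85, 81, 79]})

# default as_index=True: the group keys become a two-level MultiIndex
indexed = df.groupby(['Course', 'Year'])['Percentage'].mean()
print(indexed)

# as_index=False: the group keys are kept as regular columns
flat = df.groupby(['Course', 'Year'], as_index=False)['Percentage'].mean()
print(flat)
```

The first result is a Series indexed by (Course, Year) pairs, while the second is a flat DataFrame, which is often more convenient for further processing or export.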

Example: Selecting a single group using the get_group() Method

The get_group() method selects a single group by its key. In the example below, the DataFrame is grouped by the Percentage column and the group whose key is 98 is selected.

# import the pandas library
import pandas as pd
# Avinash appears twice and the percentage 98 occurs twice
data = {'Name': ['Avinash', 'Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85, 98],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE', 'M.Tech']}
df = pd.DataFrame(data)
print("--------DATAFRAME------")
print(df)
# group by 'Percentage' and select the group whose key is 98
grp = df.groupby('Percentage')
print("----Selecting single group-----")
print(grp.get_group(98))

Once we run the program we will get the following output.


--------DATAFRAME------
      Name  Percentage  Course
0  Avinash          72    Arts
1  Avinash          98   B.Com
2  Amrutha          81  M.Tech
3  Chetana          87    B.SC
4   Kartik          85      BE
5   Nikhil          98  M.Tech
----Selecting single group-----
      Name  Percentage  Course
1  Avinash          98   B.Com
5   Nikhil          98  M.Tech
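For a single grouping column, get_group() returns the same rows as filtering the DataFrame with a boolean mask, and it raises a KeyError when the requested key is not one of the groups. The sketch below checks both behaviours on the same data; the missing key 50 is just an arbitrary value chosen for illustration.

```python
import pandas as pd

data = {'Name': ['Avinash', 'Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85, 98],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE', 'M.Tech']}
df = pd.DataFrame(data)
grp = df.groupby('Percentage')

# get_group(98) returns the same rows as filtering with a boolean mask
selected = grp.get_group(98)
masked = df[df['Percentage'] == 98]
print(selected.equals(masked))

# asking for a key that is not a group raises KeyError
try:
    grp.get_group(50)
    raised = False
except KeyError:
    raised = True
    print('no group with key 50')
```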

Example: Performing an aggregation operation on the groupby object

Once the groupby object is created, we can perform aggregation operations on the grouped data. In the example below, the mean Percentage is computed for each Course.

# import the pandas library
import pandas as pd
data = {'Name': ['Avinash', 'Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85, 98],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE', 'M.Tech']}
df = pd.DataFrame(data)
# group by 'Course', then compute the mean 'Percentage' of each group
grp = df.groupby('Course')
print("----Performing aggregation operation on groupby object-----")
# the string 'mean' is used instead of np.mean, which pandas 2.1+ warns about
print(grp['Percentage'].agg('mean'))

Once we run the program we will get the following output.


----Performing aggregation operation on groupby object-----
Course
Arts      72.0
B.Com     98.0
B.SC      87.0
BE        85.0
M.Tech    89.5
Name: Percentage, dtype: float64
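agg() is not limited to a single function. The sketch below, using the same data, passes a list of aggregation names to get one output column per statistic, and then uses named aggregation (output_column=(input_column, aggregation)) to choose the output column names. The names avg_pct and students are arbitrary labels picked for this example.

```python
import pandas as pd

data = {'Name': ['Avinash', 'Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85, 98],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE', 'M.Tech']}
df = pd.DataFrame(data)
grp = df.groupby('Course')

# a list of aggregation names gives one output column per aggregation
stats = grp['Percentage'].agg(['mean', 'min', 'max', 'count'])
print(stats)

# named aggregation: output_column=(input_column, aggregation)
summary = grp.agg(avg_pct=('Percentage', 'mean'), students=('Name', 'count'))
print(summary)
```

For M.Tech, the two percentages 81 and 98 give a mean of 89.5, a minimum of 81, a maximum of 98 and a count of 2.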

Conclusion:

In this tutorial, we learned about the pandas DataFrame.groupby() method. We covered its syntax and parameters and worked through examples that group a DataFrame by one or more columns, select a single group with get_group(), and aggregate the grouped data.



About the author:
I like writing about Python and frameworks like Pandas, NumPy, Scikit, etc. I am still learning Python, and I like sharing what I learn with others through my content.