Pandas DataFrame groupby() Method
In this tutorial, we will learn about the pandas built-in method DataFrame.groupby().
The groupby() operation involves some combination of splitting the object, applying a function, and combining the results. It is useful when we want to compute calculations or statistics on particular groups of rows inside the DataFrame. It returns a GroupBy object that holds information about the groups.
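To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and then compares the result with a single groupby() call. The sample data and the variable names (pieces, means, manual, grouped) are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Course": ["Arts", "M.Tech", "Arts", "M.Tech"],
    "Percentage": [72, 81, 90, 98],
})

# Split: collect the rows that belong to each distinct Course value.
pieces = {course: df[df["Course"] == course] for course in df["Course"].unique()}

# Apply: compute a statistic on each piece.
means = {course: piece["Percentage"].mean() for course, piece in pieces.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(means).sort_index()

# groupby() performs the same three steps in one call.
grouped = df.groupby("Course")["Percentage"].mean()

print(manual.tolist())   # [81.0, 89.5]
print(grouped.tolist())  # [81.0, 89.5]
```

Both approaches give the same per-course averages; groupby() simply hides the bookkeeping.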
The syntax of the DataFrame.groupby() method is shown below.
Syntax
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
Parameters
by: mapping, function, label, or list of labels. It is used to determine the groups for the groupby. If by is a function, it is called on each value of the object's index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups. If an array is passed, the values are used as-is to determine the groups.
axis: 0 or 'index' splits the data along rows; 1 or 'columns' splits it along columns. The default value is 0 ('index').
level: int, level name, or sequence of such, default None. If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
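The different forms that by can take, and the level parameter, are easier to see side by side. The sketch below is only an illustration: the data, the team_of mapping, and the "x"/"y" keys are invented, and the names are placed in the index so that a function or dict passed as by has index labels to work on.

```python
import numpy as np
import pandas as pd

# Hypothetical data: student names are the index labels.
df = pd.DataFrame(
    {"Percentage": [72, 98, 81, 87]},
    index=["Avinash", "Amrutha", "Chetana", "Kartik"],
)

# by as a function: called on each index label (here, group by first letter).
by_function = df.groupby(lambda name: name[0])["Percentage"].sum()
print(by_function.to_dict())  # {'A': 170, 'C': 81, 'K': 87}

# by as a dict: each index label is looked up and the dict VALUES become the keys.
team_of = {"Avinash": "A", "Amrutha": "A", "Chetana": "B", "Kartik": "B"}
by_dict = df.groupby(team_of)["Percentage"].sum()
print(by_dict.to_dict())  # {'A': 170, 'B': 168}

# by as an array: the values are used as-is, one key per row, in row order.
by_array = df.groupby(np.array(["x", "y", "x", "y"]))["Percentage"].sum()
print(by_array.to_dict())  # {'x': 153, 'y': 185}

# level: group by one level of a MultiIndex.
multi = pd.DataFrame(
    {"Percentage": [72, 98, 81, 87]},
    index=pd.MultiIndex.from_tuples(
        [("Arts", "Avinash"), ("Arts", "Amrutha"),
         ("M.Tech", "Chetana"), ("M.Tech", "Kartik")],
        names=["Course", "Name"],
    ),
)
by_level = multi.groupby(level="Course")["Percentage"].mean()
print(by_level.to_dict())  # {'Arts': 85.0, 'M.Tech': 84.0}
```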
Example: groupby() Method in Pandas
The example below shows how the groupby() method splits the rows of a DataFrame into groups.
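# import the pandas library
import pandas as pd
# build a small DataFrame of students
data = {'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'], 'Percentage': [72, 98, 81, 87, 85], 'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE']}
df = pd.DataFrame(data)
print(df)
# group the rows by the values in the 'Course' column
grp = df.groupby('Course')
# printing the GroupBy object shows only its type and memory address
print(grp)
# .groups maps each group key to the index labels of its rows
print(grp.groups)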
Once we run the program we will get the following output.
Name Percentage Course
0 Avinash 72 Arts
1 Amrutha 98 B.Com
2 Chetana 81 M.Tech
3 Kartik 87 B.SC
4 Nikhil 85 BE
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002429A175EE0>
{'Arts': [0], 'B.Com': [1], 'B.SC': [3], 'BE': [4], 'M.Tech': [2]}
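Printing the GroupBy object itself shows only its type and address. One way to look inside it, sketched here with the same data, is to iterate over it: each step yields the group key and the sub-DataFrame for that key. The variable names course, rows, and seen are illustrative.

```python
import pandas as pd

data = {
    "Name": ["Avinash", "Amrutha", "Chetana", "Kartik", "Nikhil"],
    "Percentage": [72, 98, 81, 87, 85],
    "Course": ["Arts", "B.Com", "M.Tech", "B.SC", "BE"],
}
df = pd.DataFrame(data)
grp = df.groupby("Course")

# ngroups reports how many distinct groups were found.
print(grp.ngroups)  # 5

# Iterating yields (key, sub-DataFrame) pairs, in sorted key order by default.
seen = []
for course, rows in grp:
    seen.append((course, rows["Name"].tolist()))
    print(course, rows["Name"].tolist())
```

Because every course appears only once in this data, each group holds exactly one row.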
Example: Grouping the DataFrame object using groupby() Method
In this example, we group the DataFrame by multiple columns using the groupby() method. Each group key is then a tuple with one value per column.
# import the pandas library
import pandas as pd
data = {'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik','Nikhil'],'Percentage': [72, 98, 81, 87,85],'Course': ['Arts','B.Com','M.Tech','B.SC','BE']}
df = pd.DataFrame(data)
print(df)
# pass a list of column names to group by more than one column
grp = df.groupby(['Course', 'Name'])
print(grp)
print(grp.groups)
Once we run the program we will get the following output.
Name Percentage Course
0 Avinash 72 Arts
1 Amrutha 98 B.Com
2 Chetana 81 M.Tech
3 Kartik 87 B.SC
4 Nikhil 85 BE
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002429A1759D0>
{('Arts', 'Avinash'): [0], ('B.Com', 'Amrutha'): [1], ('B.SC', 'Kartik'): [3], ('BE', 'Nikhil'): [4], ('M.Tech', 'Chetana'): [2]}
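When an aggregation follows a multi-column groupby(), the result is indexed by a MultiIndex built from the grouping columns. The sketch below uses invented data with a repeated (Course, Name) pair so that the grouping has a visible effect, and shows the as_index=False option from the syntax above as one way to keep the keys as ordinary columns.

```python
import pandas as pd

# Hypothetical data with one repeated (Course, Name) combination.
df = pd.DataFrame({
    "Name": ["Avinash", "Avinash", "Amrutha", "Amrutha"],
    "Course": ["Arts", "Arts", "B.Com", "M.Tech"],
    "Percentage": [72, 80, 98, 81],
})

# Default: the grouping columns become a MultiIndex on the result.
by_both = df.groupby(["Course", "Name"])["Percentage"].mean()
print(by_both)
print(by_both.loc[("Arts", "Avinash")])  # 76.0

# as_index=False: the grouping columns stay as regular columns.
flat = df.groupby(["Course", "Name"], as_index=False)["Percentage"].mean()
print(flat)
```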
Example: Select single group using the get_group() Method
We can select a single group by passing its key to the get_group() method. See the example below.
# import the pandas library
import pandas as pd
data = {'Name': ['Avinash', 'Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85, 98],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE', 'M.Tech']}
df = pd.DataFrame(data)
print("--------DATAFRAME------")
print(df)
grp=df.groupby('Percentage')
print("----Selecting single group-----")
print(grp.get_group(98))
Once we run the program we will get the following output.
--------DATAFRAME------
Name Percentage Course
0 Avinash 72 Arts
1 Avinash 98 B.Com
2 Amrutha 81 M.Tech
3 Chetana 87 B.SC
4 Kartik 85 BE
5 Nikhil 98 M.Tech
----Selecting single group-----
Name Percentage Course
1 Avinash 98 B.Com
5 Nikhil 98 M.Tech
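get_group() raises a KeyError when the requested key is not one of the groups. A minimal sketch of guarding against that is shown below; rows_for is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd

data = {
    "Name": ["Avinash", "Avinash", "Amrutha", "Chetana", "Kartik", "Nikhil"],
    "Percentage": [72, 98, 81, 87, 85, 98],
    "Course": ["Arts", "B.Com", "M.Tech", "B.SC", "BE", "M.Tech"],
}
df = pd.DataFrame(data)
grp = df.groupby("Percentage")


def rows_for(groupby_obj, key):
    """Hypothetical helper: return the rows of one group, or None if the key is absent."""
    # .groups is a dict-like mapping of group key -> row labels.
    if key in groupby_obj.groups:
        return groupby_obj.get_group(key)
    return None


found = rows_for(grp, 98)
print(found["Name"].tolist())  # ['Avinash', 'Nikhil']

missing = rows_for(grp, 50)
print(missing)  # None
```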
Example: Performing an aggregation operation on the groupby() object
Once the groupby() object is created, we can perform several aggregation operations on the grouped data. See the example below.
# import the pandas library
import pandas as pd
data = {'Name': ['Avinash', 'Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85, 98],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE', 'M.Tech']}
df = pd.DataFrame(data)
# group the rows by course
grp = df.groupby('Course')
print("----Performing aggregation operation on groupby object-----")
# passing the name 'mean' as a string selects the built-in groupby mean, so numpy is not needed
print(grp['Percentage'].agg('mean'))
Once we run the program we will get the following output.
----Performing aggregation operation on groupby object-----
Course
Arts 72.0
B.Com 98.0
B.SC 87.0
BE 85.0
M.Tech 89.5
Name: Percentage, dtype: float64
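agg() can also compute several statistics at once. The sketch below, using the same data, passes a list of function names and then uses named aggregation to choose the output column names; avg_pct and top_pct are illustrative names, not anything defined by pandas.

```python
import pandas as pd

data = {
    "Name": ["Avinash", "Avinash", "Amrutha", "Chetana", "Kartik", "Nikhil"],
    "Percentage": [72, 98, 81, 87, 85, 98],
    "Course": ["Arts", "B.Com", "M.Tech", "B.SC", "BE", "M.Tech"],
}
df = pd.DataFrame(data)

# A list of function names gives one output column per function.
summary = df.groupby("Course")["Percentage"].agg(["mean", "max", "count"])
print(summary)

# Named aggregation: output_name=(input_column, function).
named = df.groupby("Course").agg(
    avg_pct=("Percentage", "mean"),
    top_pct=("Percentage", "max"),
)
print(named)
```

For the M.Tech group, which has two rows (81 and 98), this gives a mean of 89.5, a max of 98, and a count of 2.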
Conclusion:
In this tutorial, we learned about the pandas built-in DataFrame.groupby() method. We covered its syntax and parameters, and worked through several examples that apply it to a DataFrame to see how groupby() works.