
Iteration in Pandas

In data science, the most important entity we have is data, which is no great revelation given the name of the field. It is therefore important to have a firm grasp of every data set we work with and of every variable in it. Iteration is one of the main processes that gives us this access. Iterating over a data set lets us visit each of its values in turn, which helps us understand the data and perform more complex operations to handle and manipulate it.

There are several ways to iterate over a given DataFrame:

  • Row wise iteration

  • Column wise iteration

  • Row wise iteration in which each row is returned as a tuple

Iteration lets us visit every member of the data set sequentially and perform logical and mathematical operations on each one as we go.

The Pandas library provides three methods that make iterating over a DataFrame easier:

  1. items(): Iterates over the DataFrame column by column, yielding each column label together with the column's values as a Series. In pandas versions before 2.0 this method was also available as iteritems(), which has since been removed.

  2. iterrows(): Iterates over the DataFrame row by row, yielding each row's index label together with the row's values as a Series.

  3. itertuples(): Iterates over the DataFrame row by row, yielding each row as a named tuple.

The Process of Iterating in Pandas

Before getting started with Pandas, we have to import the library. You can import it using the following command:

import pandas as pd

Now we need a data set to work on. We load a data set of our choice using the read_csv function, which reads a CSV file into a DataFrame. Then we use the .head() method to print the first 5 rows of the data set. This gives us a quick preview of the data we will be working with.

studyTonight = pd.read_csv("https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv")
print(studyTonight.head())
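If you are offline or simply want to experiment without downloading anything, the same two steps can be tried on a small in-memory CSV. The following is only a sketch: the column names and values are made up for illustration and are not taken from the data set above.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk or at a URL.
csv_text = """month,year_a,year_b
JAN,340,360
FEB,318,342
MAR,362,406
APR,348,396
MAY,363,420
JUN,435,472
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; head(n) returns the first n rows.
first_rows = sample.head()
print(first_rows)

# shape is (number of rows, number of columns) for the full DataFrame.
print(sample.shape)  # (6, 3)
```

Note that head() returns a new DataFrame containing only the preview rows; the full data set is still available in the original variable.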

For a better understanding, you can run all the examples in Google Colab (colab.research.google.com).

1. Using items() function (formerly iteritems()):

This is a built-in DataFrame method in Pandas. It lets us traverse the data set column by column: on each iteration it yields the column label and the column's values as a Series. Older tutorials call this method iteritems(), but that name was removed in pandas 2.0, so items() is used here. The following lines of code show how to use it:

for key, values in studyTonight.items():
    print(key, values)

Output:

In the output, each column label is printed, followed by all of the values in that column, because the loop visits the data set one column at a time.
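Column wise iteration is handy when you want to do something with each column as a whole. The sketch below uses a small made-up DataFrame (the column names and numbers are assumptions for illustration) and sums every numeric column while skipping the text column.

```python
import pandas as pd

# Hypothetical data used only for this illustration.
df = pd.DataFrame({
    "month": ["JAN", "FEB", "MAR"],
    "passengers": [340, 318, 362],
})

column_totals = {}
for label, column in df.items():
    # label is the column name, column is a pandas Series holding its values.
    if pd.api.types.is_numeric_dtype(column):
        column_totals[label] = int(column.sum())

print(column_totals)  # {'passengers': 1020}
```

On pandas versions older than 2.0 the same loop could be written with iteritems(), but items() works on both old and new versions.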

2. Using iterrows() function:

This is a built-in DataFrame method in Pandas. It lets us traverse the data set row by row: on each iteration it yields the row's index label and the row's values as a Series. The following lines of code show how to use it:

for row_index,row in studyTonight.iterrows():
    print(row_index, row)

Output:

In the output, each row index is printed, followed by the values of that row labelled by column name, because the loop visits the data set one row at a time.
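Because each row arrives as a Series, its values can be looked up by column name inside the loop. Here is a minimal sketch on a made-up DataFrame (the column names y1958 and y1959 are hypothetical) that computes a per-row difference.

```python
import pandas as pd

# Hypothetical data used only for this illustration.
df = pd.DataFrame({
    "month": ["JAN", "FEB", "MAR"],
    "y1958": [340, 318, 362],
    "y1959": [360, 342, 406],
})

increases = []
for row_index, row in df.iterrows():
    # row is a Series indexed by column name, so values are read by label.
    change = int(row["y1959"] - row["y1958"])
    increases.append((row["month"], change))

print(increases)  # [('JAN', 20), ('FEB', 24), ('MAR', 44)]
```

Two things are worth keeping in mind. Since every row is packed into a single Series, iterrows() does not necessarily preserve the original dtypes of the columns. Also, the row you receive should be treated as read-only: changing it inside the loop is not a reliable way to change the DataFrame itself.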

3. Using itertuples() function:

This is a built-in DataFrame method in Pandas. It creates a named tuple for every row in the data set and yields these tuples one at a time. Each tuple holds the row's index label followed by the values of that row. The following lines of code show how to use it:

for row in studyTonight.itertuples():
    print(row)

Output:

In the output, one tuple is printed per row, containing the row index followed by the values of that row, because the loop visits the data set one row at a time and packs each row into a tuple.
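The tuples produced by itertuples() are named tuples, so fields can be read as attributes. The sketch below uses a small made-up DataFrame (names and values are assumptions for illustration) and also shows the index and name parameters of itertuples().

```python
import pandas as pd

# Hypothetical data used only for this illustration.
df = pd.DataFrame({
    "month": ["JAN", "FEB", "MAR"],
    "passengers": [340, 318, 362],
})

for row in df.itertuples():
    # row.Index is the index label; each column is available as an attribute.
    print(row.Index, row.month, row.passengers)

# Named tuples work with ordinary Python tools such as max().
busiest = max(df.itertuples(), key=lambda r: r.passengers).month
print(busiest)  # MAR

# index=False leaves out the index; name=None yields plain tuples instead.
plain = list(df.itertuples(index=False, name=None))
print(plain)  # [('JAN', 340), ('FEB', 318), ('MAR', 362)]
```

Attribute access only works directly for column names that are valid Python identifiers. Columns with names such as 1958 are renamed to positional names in the tuple, so for those it is usually simpler to access values by position.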

Conclusion:

This tutorial was designed to help you understand what iteration is and the fundamentals involved in iterating over data in pandas. It also covers the different iteration methods that pandas provides and the ways we can use them.

Pay attention to the syntax of the code, and compare your own outputs with the sample outputs while practicing. Iteration can simplify your operations on a data set, so a good command of it can take you a long way in data science.

If you have any further questions, feel free to ask them in the comments section below.



About the author:
I like writing about Python, and frameworks like Pandas, Numpy, Scikit, etc. I am still learning Python. I like sharing what I learn with others through my content.