Pandas DataFrame
A DataFrame is a two-dimensional data structure that stores data in tabular form, that is, in rows and columns.
A Pandas Series holds only a single column of data, which is restrictive when a dataset has several attributes. To address this, Pandas provides the DataFrame, which stores multiple columns and supports operations across them. The DataFrame is arguably the most fundamental data structure of the Pandas library. In this tutorial, we will study what a DataFrame is, its main parameters and features, and how to create one.
What is a DataFrame?
-
In Pandas, a DataFrame is a two-dimensional, labeled array. It is the tabular counterpart of the one-dimensional Series.
-
Put differently, a DataFrame is made up of one or more Series that share the same index, one Series per column.
-
Visually, a DataFrame looks like a table of values.
-
Most datasets are loaded into a DataFrame before Pandas-specific functions are applied to them.
-
Therefore, the DataFrame is one of the most important data structures in Pandas.
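To make the relationship between Series and DataFrame concrete, here is a minimal sketch. The Series names and values are illustrative assumptions, not part of any dataset used in this tutorial.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative data).
labels = ['Jack', 'Mike', 'Rohan']
years = pd.Series([1, 2, 3], index=labels, name='year')
marks = pd.Series([9.8, 6.7, 8.0], index=labels, name='marks')

# Placing them side by side gives a DataFrame with one column per Series.
df = pd.concat([years, marks], axis=1)
print(df)

# Selecting a single column hands back a Series again.
print(type(df['marks']))
```

Here pd.concat with axis=1 is just one way to combine Series; passing a dictionary of Series to pd.DataFrame() would work as well.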
The DataFrame constructor takes 5 main parameters:
-
data: the data the DataFrame will hold (for example an ndarray, a dictionary, or another DataFrame)
-
index: the labels for the rows
-
columns: the labels for the columns
-
dtype: the data type to force on the data; if omitted, it is inferred
-
copy: whether to copy the data from the inputs
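The five parameters can be seen together in a small sketch. The array contents and labels below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative raw data: one row per student, columns are year and marks.
raw = np.array([[1, 9.8], [2, 6.7], [3, 8.0]])

df = pd.DataFrame(
    data=raw,                          # the values to store
    index=['Jack', 'Mike', 'Rohan'],   # row labels
    columns=['year', 'marks'],         # column labels
    dtype='float64',                   # data type forced on all values
    copy=True,                         # store a copy, not a view of raw
)

# Because copy=True, changing the source array does not affect the DataFrame.
raw[0, 0] = 99
print(df.loc['Jack', 'year'])
```

With copy=True the DataFrame keeps its own data, so the last line still prints the original value for Jack.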
Features of DataFrame:
-
Its size is mutable: rows and columns can be added or removed.
-
It has labeled axes (rows and columns).
-
Columns can be of different data types.
-
Arithmetic operations can be performed on rows and columns.
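The features listed above can be sketched in a few lines. The column names and the derived next_year column are hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'student': ['Jack', 'Mike'], 'marks': [9.8, 6.7]})

# Mutable size: add a column, then append a row.
df['year'] = [1, 2]
new_row = pd.DataFrame({'student': ['Rohan'], 'marks': [8.0], 'year': [3]})
df = pd.concat([df, new_row], ignore_index=True)

# Labeled axes: row labels and column labels.
print(list(df.index), list(df.columns))

# Columns of different data types live side by side.
print(df.dtypes)

# Arithmetic on a whole column at once.
df['next_year'] = df['year'] + 1
print(df)
```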
Structure of DataFrame:
Let us now understand the structure of the DataFrame. For this purpose, the example below shows a DataFrame that records student-related data:
Creating a DataFrame
A DataFrame can be created using the DataFrame() constructor, which has the following syntax:
pandas.DataFrame(data, index, columns, dtype, copy)
We have already covered the 5 parameters used while creating a DataFrame object.
Creating an empty DataFrame object, however, is of little use. The Pandas library provides several functions for loading datasets stored in different file formats. The following functions can be used to read data from files:
read_csv: reads comma-separated values. If you are reading from a file, it is typically a .csv file.
read_json: reads data in JSON format.
read_fwf: reads data in fixed-width format.
read_excel: reads data from Excel files.
read_table: reads general delimited text, using a tab as the default separator. Despite its name, it does not read database tables.
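As a rough sketch of how these readers are called, the snippet below feeds each one a small in-memory text through io.StringIO instead of a real file. The sample contents and the fixed-width column positions are assumptions made for the illustration. read_excel is left out because it needs an actual Excel file and an installed engine.

```python
import io
import pandas as pd

# Illustrative in-memory "files"; in practice you would pass a file path.
csv_text = "student,year,marks\nJack,1,9.8\nMike,2,6.7\n"
json_text = '[{"student": "Jack", "year": 1}, {"student": "Mike", "year": 2}]'
fwf_text = "student year\nJack    1\nMike    2\n"
tsv_text = "student\tyear\nJack\t1\nMike\t2\n"

csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))
# colspecs gives the character range of each fixed-width column.
fwf_df = pd.read_fwf(io.StringIO(fwf_text), colspecs=[(0, 7), (8, 12)])
# read_table splits on tabs by default.
tsv_df = pd.read_table(io.StringIO(tsv_text))

for frame in (csv_df, json_df, fwf_df, tsv_df):
    print(frame.shape, list(frame.columns))
```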
To find out the parameters accepted by the above functions in a Jupyter notebook or IPython session, type the function name followed by a question mark and press SHIFT + ENTER, for example:
pd.read_csv? # (press SHIFT + ENTER)
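The question-mark syntax only works in IPython-based environments. In a plain Python script, one possible alternative (a sketch, not the only way) is the standard library's inspect module, or the built-in help() function.

```python
import inspect
import pandas as pd

# Collect the parameter names of pd.read_csv from its signature.
params = inspect.signature(pd.read_csv).parameters
print(list(params)[:5])

# Check whether a particular parameter is accepted.
print('sep' in params)

# help(pd.read_csv) would print the full documentation instead.
```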
The function pd.read_csv() takes a filename as its first parameter and uses a comma as the default separator, which can be changed by passing a custom value to the sep parameter. By default, the first line of the dataset is expected to be the header.
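A short sketch of both behaviours follows. The semicolon-separated text and the column names are invented for the example, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Semicolon-separated data with a header line (illustrative content).
with_header = "student;year;marks\nJack;1;9.8\nMike;2;6.7\n"
df = pd.read_csv(io.StringIO(with_header), sep=';')
print(list(df.columns))

# The same data without a header line: tell read_csv so, and supply names.
without_header = "Jack;1;9.8\nMike;2;6.7\n"
df2 = pd.read_csv(
    io.StringIO(without_header),
    sep=';',
    header=None,
    names=['student', 'year', 'marks'],
)
print(df2.shape)
```

Without header=None, the first data row of the second text would be consumed as column names.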
So if we have a CSV file, say student_data.csv, we can load this dataset into a DataFrame like this:
import pandas as pd
student_df = pd.read_csv('student_data.csv')
print(type(student_df))
Output:
<class 'pandas.core.frame.DataFrame'>
There are other ways to create a DataFrame object, and we will cover a few as we move on.
Making a DataFrame from a dictionary:
For a better understanding of the code: colab.research.google
First, we import the pandas library, and NumPy as well:
import pandas as pd
import numpy as np
Python dictionaries can be converted into DataFrames: the keys become the column names and the values fill the columns.
data = {'student': ['Jack', 'Mike', 'Rohan', 'Zubair'],
        'year': [1, 2, 3, 1],
        'marks': [9.8, 6.7, 8, 9.9]}
The above dictionary can be converted into a DataFrame by using the DataFrame() function.
studyTonight_df = pd.DataFrame(data)
print(studyTonight_df)
The first line of the code creates our DataFrame and the second line prints it. The output is a table with the columns student, year and marks and four rows labeled with the default integer index 0 to 3, as shown in the figure given below:
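Beyond printing it, the new DataFrame can be inspected through a few attributes. The sketch below repeats the dictionary from above so that it runs on its own.

```python
import pandas as pd

data = {'student': ['Jack', 'Mike', 'Rohan', 'Zubair'],
        'year': [1, 2, 3, 1],
        'marks': [9.8, 6.7, 8, 9.9]}
studyTonight_df = pd.DataFrame(data)

# Number of rows and columns.
print(studyTonight_df.shape)

# Column labels come from the dictionary keys.
print(list(studyTonight_df.columns))

# Each column gets its own data type; the integer 8 in marks
# is stored as a float because the other marks are floats.
print(studyTonight_df.dtypes)
```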
Conclusion
DataFrames are the basic building blocks of the Pandas library, so it is important to have a good grasp of them. This lesson covered what a DataFrame is, its constructor parameters and features, and how to create one from a file or from a dictionary. Refer back to the functions mentioned in this lesson whenever in doubt.