Pandas Series
A Series is one of the fundamental building blocks of the Pandas library for handling and manipulating data. Essentially, it is a one-dimensional array in which every element has a label; the full set of labels is called the index.
What is the difference between a NumPy array and a Series?
A Series differs from a NumPy array in that its elements carry labels, which NumPy arrays do not have. You can therefore access an element of a Series either by its label or by its integer position.
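The difference can be sketched with a small example. The values and the labels 'a', 'b' and 'c' below are made up for illustration only.

```python
import numpy as np
import pandas as pd

# A plain NumPy array: elements are reachable by integer position only.
arr = np.array([10, 20, 30])
second_by_position = arr[1]

# The same data as a Series with illustrative labels 'a', 'b', 'c'.
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
second_by_label = ser['b']      # access by label
second_by_iloc = ser.iloc[1]    # access by integer position

print(second_by_position, second_by_label, second_by_iloc)
```

All three lookups reach the same element, 20; only the Series offers the label-based route.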
Parameters in Series:
The pd.Series() constructor takes 4 main parameters:
- data: the values from which the Series will be made
- index: the labels for your data
- dtype: the data type of the elements in the Series
- copy: whether the input data is copied, rather than shared, when the Series is built
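As a rough sketch, all four parameters can be passed together. The array contents and the labels 'x', 'y' and 'z' here are assumptions chosen only for the demonstration.

```python
import numpy as np
import pandas as pd

# Illustrative source data (not from the tutorial).
source = np.array([1.0, 2.0, 3.0])

# data, index, dtype and copy passed explicitly.
prices = pd.Series(data=source, index=['x', 'y', 'z'], dtype='float64', copy=True)

# Because copy=True, changing the source array afterwards
# does not change the values held by the Series.
source[0] = 100.0

print(prices)
print(prices.dtype)
```

With copy=True the Series keeps its own copy of the data, so prices['x'] is still 1.0 after the source array is modified.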
Let's use pandas Series:
Note: For a better understanding of the code, check the code here: colab.research.google
To begin, we import the pandas library:
import pandas as pd
What does "import pandas as pd" signify?
This line of code imports the Pandas library and lets you refer to it by the shorter name pd.
Now let us create a Series:
After importing pandas, you can use the pd.Series() function to create Series data structures. To do this, we pass a list of our choice into pd.Series().
import pandas as pd
studyTonightSeries = pd.Series([3, 4, 5, 7, 8, 9])
print(studyTonightSeries)
0    3
1    4
2    5
3    7
4    8
5    9
dtype: int64
The column on the left-hand side of your output shows the index of each element present in the Series.
A Series can also be created from a NumPy array, as in the code below,
import pandas as pd
import numpy as np
st_ar = np.array(['s', 't', 'u', 'd', 'y'])
studyTonightSeries_ar= pd.Series(st_ar)
studyTonightSeries_ar
0    s
1    t
2    u
3    d
4    y
dtype: object
As we can see, a Series can be made up of elements of any type: strings, integers, floats, etc.
Providing custom index in a Series:
Now let's see how we can set our own custom values for the index of pandas Series. Let's create a list of our index values,
ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
We can map this list onto our pandas Series object so that the list values can serve as an index to the Series object,
ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2,6,-3,1,7], index=ind)
print(studyTonight_arr)
Row 1    2
Row 2    6
Row 3   -3
Row 4    1
Row 5    7
dtype: int64
The index list is passed as the index parameter of the pd.Series() function.
Making Series from a dictionary in Python
If you have a dictionary in Python, you can turn it into a Series. When you convert a Python dictionary to a pandas Series object, the keys of the dictionary become the index of the Series:
studyTonight_dict = { 'Carrot': 12.9, 'Brinjal': 8.4, 'Gourd': 9.7 }
To convert the dictionary above into a pandas Series, we need to pass the dictionary object into the pd.Series() function as a parameter.
studyTonight_arr1 = pd.Series(studyTonight_dict)
print(studyTonight_arr1)
Carrot     12.9
Brinjal     8.4
Gourd       9.7
dtype: float64
Operations on Series:
Now let's see a few operations that we can perform on the Series data structure in the pandas library.
1. Get a Value from Series
To get the value corresponding to any row, we simply pass the label of the row in square brackets, as shown below.
import pandas as pd
ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2,6,-3,1,7], index=ind)
print(studyTonight_arr['Row 2'])
This command will return a value of 6.
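Besides plain square brackets, pandas also offers the explicit accessors .loc (by label) and .iloc (by position), and the .get() method for labels that may not exist. The short sketch below reuses the same series; the label 'Row 9' is a made-up example of a missing label.

```python
import pandas as pd

ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2, 6, -3, 1, 7], index=ind)

by_label = studyTonight_arr.loc['Row 2']     # explicit label lookup
by_position = studyTonight_arr.iloc[1]       # explicit positional lookup

# 'Row 9' does not exist; .get() returns the default instead of raising KeyError.
missing = studyTonight_arr.get('Row 9', None)

print(by_label, by_position, missing)
```

Using .loc and .iloc makes it clear to the reader whether a label or a position is meant.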
We can also filter through a series by providing a condition instead of the index value. The resultant series will be filtered based on the condition which is provided. For example, if we want to get all the data elements stored in the series object with values greater than 2, we can do so like this,
import pandas as pd
ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2,6,-3,1,7], index=ind)
print(studyTonight_arr[studyTonight_arr > 2])
Row 2    6
Row 5    7
dtype: int64
As shown in the output, 6 and 7 are the only values present which are greater than 2.
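To see why this works, it can help to look at the condition on its own: it produces a Series of True/False values (a boolean mask), and only the rows marked True are kept. The sketch below also combines two conditions with &, which is an extra illustration that goes beyond the example above.

```python
import pandas as pd

ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2, 6, -3, 1, 7], index=ind)

# The condition alone gives one boolean per element.
mask = studyTonight_arr > 2
print(mask)

# Conditions can be combined; each one needs its own parentheses.
between = studyTonight_arr[(studyTonight_arr > 0) & (studyTonight_arr < 7)]
print(between)
```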
2. Check if a label is present in Series
To check the presence of an item in a series we can use the in keyword. Note that in tests the index labels of the series, not its values. It returns a boolean value telling us whether a particular label is present in the series or not. Let's take an example,
import pandas as pd
ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2,6,-3,1,7], index=ind)
'Row 3' in studyTonight_arr
The above statement will return True.
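Because the in keyword looks at the index labels of a Series, checking for a stored value needs a slightly different expression. The following sketch shows two common options, the .values attribute and the .isin() method; the value 100 is just a made-up value that is not in the series.

```python
import pandas as pd

ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2, 6, -3, 1, 7], index=ind)

label_present = 'Row 3' in studyTonight_arr      # checks the index labels
value_as_label = 6 in studyTonight_arr           # 6 is a value, not a label
value_present = 6 in studyTonight_arr.values     # checks the stored values

# isin() marks each element that matches any item of the given list.
any_match = studyTonight_arr.isin([6, 100]).any()

print(label_present, value_as_label, value_present, any_match)
```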
3. Mathematical operation on Series
Pandas series can be operated upon mathematically. You can apply addition, subtraction, multiplication and division with a constant to an existing Series, and the operation is carried out on every element.
# to multiply all the values by 3
studyTonight_arr*3
Row 1     6
Row 2    18
Row 3    -9
Row 4     3
Row 5    21
dtype: int64
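The other arithmetic operators follow the same element-by-element pattern. A brief sketch, using the same series and arbitrarily chosen constants:

```python
import pandas as pd

ind = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5']
studyTonight_arr = pd.Series([2, 6, -3, 1, 7], index=ind)

added = studyTonight_arr + 10       # add 10 to every element
subtracted = studyTonight_arr - 1   # subtract 1 from every element
divided = studyTonight_arr / 2      # division produces floats

print(added)
print(subtracted)
print(divided)
```

None of these operations change studyTonight_arr itself; each one returns a new Series.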
4. Demonstration of missing values:
To demonstrate the role of missing values in Series, we will first make a new list of items and use it as the index of a new Series built from our dictionary.
vegies = ['Carrot', 'Brinjal', 'Peas', 'Gourd']
studyTonight_dict = { 'Carrot': 12.9, 'Brinjal': 8.4, 'Gourd': 9.7 }
studyTonight_arr2 = pd.Series(studyTonight_dict, index = vegies)
print(studyTonight_arr2)
In our newly built series, we have a new item "Peas", but the dictionary has no value that corresponds to "Peas". Therefore, its value is automatically represented as NaN.
Carrot     12.9
Brinjal     8.4
Peas        NaN
Gourd       9.7
dtype: float64
NaN (Not a Number) is Pandas' way of representing missing values.
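Once missing values appear, they usually need to be found and handled. The sketch below shows three standard pandas methods for this: isna() to locate them, fillna() to replace them, and dropna() to remove them. Using 0.0 as the replacement value is an assumption made for the example, not a recommendation.

```python
import pandas as pd

vegies = ['Carrot', 'Brinjal', 'Peas', 'Gourd']
studyTonight_dict = {'Carrot': 12.9, 'Brinjal': 8.4, 'Gourd': 9.7}
studyTonight_arr2 = pd.Series(studyTonight_dict, index=vegies)

missing_mask = studyTonight_arr2.isna()   # True where the value is NaN
filled = studyTonight_arr2.fillna(0.0)    # replace NaN with 0.0 (illustrative choice)
dropped = studyTonight_arr2.dropna()      # remove the rows that hold NaN

print(missing_mask)
print(filled)
print(dropped)
```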
5. Adding two series:
You can perform arithmetic operations on 2 or more Series objects too. We can simply use the + operator to add two series objects.
studyTonight_dict = { 'Carrot': 12.9, 'Brinjal': 8.4, 'Gourd': 9.7 }
studyTonight_arr1 = pd.Series(studyTonight_dict)
vegies = ['Carrot', 'Brinjal', 'Peas', 'Gourd']
studyTonight_arr2 = pd.Series(studyTonight_dict, index = vegies)
print(studyTonight_arr1 + studyTonight_arr2)
Brinjal    16.8
Carrot     25.8
Gourd      19.4
Peas        NaN
dtype: float64
Adding two series objects adds the values that share the same index label, provided the values are of types that can be added together, such as integers and floats. If a label is present in only one of the two series, the result for that label is NaN, because there is nothing to add it to. When the two series share no labels at all, the new series contains every label from both series, each with the value NaN.
dict_one = { 'Carrot': 12.9, 'Brinjal': 8.4, 'Gourd': 9.7 }
studyTonight_arr1 = pd.Series(dict_one)
dict_two = { 'Bread': 20.5, 'Eggs': 12.5, 'Milk': 21 }
studyTonight_arr2 = pd.Series(dict_two)
print(studyTonight_arr1 + studyTonight_arr2)
Bread     NaN
Brinjal   NaN
Carrot    NaN
Eggs      NaN
Gourd     NaN
Milk      NaN
dtype: float64
Notice the alphabetical order in which the index values got arranged. All the values are NaN because no label appears in both series.
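If NaN is not the result you want for labels that exist in only one series, the Series.add() method accepts a fill_value that stands in for the missing side. The sketch below assumes that treating a missing entry as 0 is acceptable for your data.

```python
import pandas as pd

dict_one = {'Carrot': 12.9, 'Brinjal': 8.4, 'Gourd': 9.7}
studyTonight_arr1 = pd.Series(dict_one)
dict_two = {'Bread': 20.5, 'Eggs': 12.5, 'Milk': 21}
studyTonight_arr2 = pd.Series(dict_two)

# Labels missing from one series are treated as 0 before adding.
total = studyTonight_arr1.add(studyTonight_arr2, fill_value=0)
print(total)
```

Every label from both series keeps its original value here, since each one is added to 0.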
6. Accessing a range of elements in a series:
The : operator in Python lets us access a segment of a list and, in this case, of our Series objects. Using it, we can access segments like the last 'n' elements, the first 'n' elements, or 'n' elements in between.
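The examples below use the studyTonight_arr2 series from the missing values demonstration, with the labels Carrot, Brinjal, Peas and Gourd. To get the first 2 elements from a Series, we use: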
studyTonight_arr2[:2]
Carrot     12.9
Brinjal     8.4
dtype: float64
To get the last 2 elements from a Series, we use a negative start position, which counts from the end:
studyTonight_arr2[-2:]
Peas     NaN
Gourd    9.7
dtype: float64
Therefore we can understand that slicing works in the form [a:b], where a is the position of the first element of our desired range and b is the position just past its last element, so the element at position b itself is not included. Using this we can take out a range of values from the middle too,
studyTonight_arr2[1:3]
Brinjal    8.4
Peas       NaN
dtype: float64
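The same ranges can be written more explicitly with .iloc (positions) and .loc (labels). One difference worth knowing is sketched below: a position slice leaves out its end position, while a label slice includes its end label.

```python
import pandas as pd

vegies = ['Carrot', 'Brinjal', 'Peas', 'Gourd']
studyTonight_dict = {'Carrot': 12.9, 'Brinjal': 8.4, 'Gourd': 9.7}
studyTonight_arr2 = pd.Series(studyTonight_dict, index=vegies)

middle_by_position = studyTonight_arr2.iloc[1:3]             # positions 1 and 2; 3 is excluded
middle_by_label = studyTonight_arr2.loc['Brinjal':'Peas']    # both end labels are included
last_two = studyTonight_arr2.iloc[-2:]                       # negative positions count from the end

print(middle_by_position)
print(middle_by_label)
print(last_two)
```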
Conclusion:
In this tutorial, we covered the essential parts of the pandas Series object and learned how to perform various operations on this data structure. The Series can be thought of as the second most important data structure in the Pandas library, so it is very important to get the basics clear.