In my previous post, we explored Matplotlib, the Python plotting library used to visualize datasets by plotting graphs. We also saw various attributes that can be used to modify plots and looked at several line graphs. In today's post, we will discuss other types of graphs: bar graph, pie chart, histogram, and scatter plot, as well as 3-D plotting.
1. Matplotlib: Bar graph
It is a type of visualization that helps in representing categorical data. It has rectangular bars (hence the name, bar graph) that can be drawn horizontally or vertically.
In the code for this example, two lists are defined to plot popular programming languages against the number of people using them. These lists are hypothetical and have not been taken from any survey, so ignore the numerical values.
The first function, plot_bar_vertically(), shows a vertical graph, and the second function, plot_bar_horizontally(), shows the same data in a horizontal graph.
The only change needed to draw the graph horizontally is to use barh instead of bar in the code.
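The listing this section describes is not reproduced here, so the following is only a minimal sketch of what the two functions might look like. The function names come from the text; the language names, the user counts, and the choice to return the axes rather than show the figure immediately are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only (not taken from any survey)
languages = ['Python', 'Java', 'C++', 'JavaScript', 'Go']
users = [50, 40, 30, 35, 10]


def plot_bar_vertically():
    """Draw one vertical bar per language and return the axes."""
    fig, ax = plt.subplots()
    ax.bar(languages, users, color='b')
    ax.set_xlabel('Programming language')
    ax.set_ylabel('Number of users')
    ax.set_title('Vertical bar graph')
    return ax


def plot_bar_horizontally():
    """Draw the same data with barh, so the bars run left to right."""
    fig, ax = plt.subplots()
    ax.barh(languages, users, color='g')
    ax.set_xlabel('Number of users')
    ax.set_ylabel('Programming language')
    ax.set_title('Horizontal bar graph')
    return ax


vertical_axes = plot_bar_vertically()
horizontal_axes = plot_bar_horizontally()
```

Calling plt.show() after these lines displays both figures. Note that the axis labels swap along with the bars, since barh puts the categories on the y-axis.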
2. Matplotlib: Pie chart
Also known as a circle chart or Emma chart, it is used to represent proportions of data. The central angle and the area of each slice are proportional to the quantity that slice represents. It is called a pie chart because it resembles a pie that has been cut into slices.
Usually the data shown in a pie chart are in percentages.
import matplotlib.pyplot as plt
slices_usage = [2.5, 4, 1]
persons = ['Person A', 'Person B', 'Person C']
colors = ['r', 'b', 'y']
plt.pie(slices_usage, labels=persons, colors=colors, startangle=90, autopct='%.1f%%')
plt.show()
Output:
The calculation in the above code goes like this:
The slice usage is [2.5, 4, 1].
Since pie chart data is shown as percentages, take the sum of the slice usage, i.e. 7.5, and find what percentage of 7.5 each value is. For 2.5, that is 2.5 / 7.5 × 100 ≈ 33.3%.
The same technique is used for all the other values in the slice usage list.
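The arithmetic above can be reproduced in a few lines of plain Python. This is just a sketch of the same calculation; plt.pie performs the normalization itself, so the variable names total and percentages are only for illustration.

```python
slices_usage = [2.5, 4, 1]

total = sum(slices_usage)  # 7.5

# Share of each slice as a percentage of the total, rounded to one decimal
# place to match the autopct='%.1f%%' format used in the chart
percentages = [round(value / total * 100, 1) for value in slices_usage]

for value, pct in zip(slices_usage, percentages):
    print(f"{value} out of {total} is {pct}%")
```

The three values come out to 33.3%, 53.3%, and 13.3%, which are the labels drawn inside the slices.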
If we want, we can also give our pie chart a title:
daily_life = [8, 2, 9, 1, 4]
activities = ['Sleep', 'Activities-Commute', 'Work', 'Exercise', 'Family-Time']
plt.pie(daily_life, labels=activities, startangle=90, autopct='%.1f%%')
plt.title("An engineer's life")
plt.show()
Output:
3. Matplotlib: Scatter plot
It is also known as scatter graph, scatter chart, scattergram or scatter diagram.
It helps visualize the relationship between two or more variables. In addition, it helps in identifying outliers (abnormalities) that may be present in a dataset, so that the exceptions in the data can be better understood and the reasons behind them investigated.
It uses Cartesian coordinates to display the values of two variables in a dataset. The data is drawn as a collection of points, where each point's horizontal position is given by one variable and its vertical position by the other.
import matplotlib.pyplot as plt
surprise_test_grades = [56, 90, 75, 89, 99, 45, 90, 100, 86, 64]
prepared_test_grades = [10, 92, 80, 48, 100, 48, 77, 99, 68, 77]
grades_range = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
plt.scatter(grades_range, surprise_test_grades, color='r')
plt.scatter(grades_range, prepared_test_grades, color='g')
plt.xlabel('Range')
plt.ylabel('Grades Scored')
plt.show()
Output:
This code has two lists, namely surprise_test_grades and prepared_test_grades. Ten students scored certain marks in the surprise test (surprise_test_grades) and the same ten students scored certain marks in the test for which they were prepared (prepared_test_grades). These marks are plotted as a scatter plot against the grades_range list, which supplies the x-axis position for each student.
The plot clearly shows outliers: a few students scored less in the prepared test than they did in the surprise test.
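The same observation can be checked directly on the two lists. The sketch below assumes that the position in each list identifies the student (student 1 is the first entry, and so on), which the original code implies but does not state.

```python
surprise_test_grades = [56, 90, 75, 89, 99, 45, 90, 100, 86, 64]
prepared_test_grades = [10, 92, 80, 48, 100, 48, 77, 99, 68, 77]

# Collect (student number, surprise grade, prepared grade) for every student
# whose prepared-test grade is lower than the surprise-test grade
dropped = [
    (student, surprise, prepared)
    for student, (surprise, prepared) in enumerate(
        zip(surprise_test_grades, prepared_test_grades), start=1
    )
    if prepared < surprise
]

for student, surprise, prepared in dropped:
    print(f"Student {student}: surprise={surprise}, prepared={prepared}")
```

Students 1 and 4 show by far the largest drops (56 to 10 and 89 to 48), and these are the points that stand apart in the scatter plot.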
4. Matplotlib: Histogram
Histograms help in representing grouped data. The x-axis and the y-axis represent the ranges (bins) and the frequencies, respectively. Strictly, a histogram encodes frequency in the area of each bar rather than its height; the two coincide when all bins have the same width. It usually shows the number of data points (y-axis values) that fall in a particular range (x-axis values).
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(1, 101, 7)  # random_integers is deprecated; randint excludes the upper bound
plt.hist(x, bins=11)
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
Output:
In the code above, a random set of 7 integers between 1 and 100 is generated on every run and passed to the hist function. The bins argument of plt.hist sets how many equal-width ranges the data is grouped into. After this, the x-axis and y-axis are given labels and the plot is displayed on the screen.
Quick note: The number of bins has to be chosen so that the individual bins are neither too narrow nor too wide. If the bins are too narrow (too many bins), the histogram ends up showing too much individual data and misses the underlying pattern. On the other hand, if the bins are too wide (too few bins), the patterns in the data dissolve and we end up observing nothing. Usually the number of bins is in the range of 8 to 15, but this isn't a hard and fast rule.
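To get a feel for this trade-off without drawing anything, np.histogram can be used to compute the same counts that plt.hist would plot. The sketch below is illustrative only: the sample of 200 values, the fixed seed, and the three bin counts are assumptions chosen for the demonstration, not values from the example above.

```python
import numpy as np

# Fixed seed so the sketch is repeatable; the sample itself is made up
rng = np.random.default_rng(0)
data = rng.integers(1, 101, size=200)  # 200 integers from 1 to 100 inclusive

for bins in (3, 11, 100):
    # counts[i] is the number of values that fall between edges[i] and edges[i + 1]
    counts, edges = np.histogram(data, bins=bins)
    width = edges[1] - edges[0]
    print(f"bins={bins:3d}  bin width={width:6.2f}  "
          f"largest count={int(counts.max()):3d}  "
          f"empty bins={int(np.sum(counts == 0))}")
```

With very few bins, each bin is wide and holds many points; with very many bins, each bin is narrow and holds only a handful. Neither extreme says much about the shape of the data.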
5. Matplotlib: 3-D plot
mplot3d is the toolkit that ships with Matplotlib and helps with the 3-dimensional representation of data. In 3-D space, lines as well as points can be represented. An advantage of 3-D plots is that they can be viewed from different angles.
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
axis = plt.axes(projection="3d")
plt.show()
Output:
The above code shows how a simple 3-D projection can be represented.
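Viewing a 3-D plot from a different angle can be done in code with the view_init method of the 3-D axes. The following is a small sketch; the elevation and azimuth values are arbitrary and chosen only for illustration.

```python
from mpl_toolkits import mplot3d  # registers the "3d" projection on older Matplotlib versions
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection="3d")

# elev is the angle above the x-y plane and azim is the rotation about the
# z-axis, both in degrees
ax.view_init(elev=30, azim=45)
```

Calling plt.show() afterwards displays the empty axes from the chosen viewpoint. In an interactive window the plot can also be rotated by dragging it with the mouse.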
Plotting a 3-D graph can be broken down into three steps:
Step 1: Generate the points that will make up the surface of the 3-D plot. Define the points for x and y, and define a function that uses x and y to calculate the z value.
# Height of the surface at each (x, y) position
def z_function(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-5, 10, 20)
y = np.linspace(-5, 10, 20)
X, Y = np.meshgrid(x, y)
Z = z_function(X, Y)
Step 2: Plot a wireframe, which will help in the estimation of the surface for the 3-D plot.
fig = plt.figure()
ax = plt.axes(projection="3d")
ax.plot_wireframe(X, Y, Z, color='green')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Step 3: Draw the surface itself with plot_surface, which fills in the wireframe mesh with colored faces.
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='flag', edgecolor='none')
ax.set_title('3-D plot')
plt.show()
Note: The cmap argument is the color map used to color the surface. The Matplotlib documentation describes the different color maps that are available.
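The names that can be passed as cmap can also be listed from Python. A minimal sketch, assuming only that Matplotlib is installed:

```python
import matplotlib.pyplot as plt

# plt.colormaps() returns the names of all registered color maps
names = list(plt.colormaps())

print(len(names), "color maps available")
print('flag' in names, 'viridis' in names)
```

Any of these names can be substituted for 'flag' in the plot_surface call, for example cmap='viridis'.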
In the upcoming posts, we will look into other Machine Learning algorithms and their implementations using Python.