Hello Everyone!
In this article, I will discuss (with code implementations and explanations) various loss functions used in ML and compare them using a performance graph.
There are various loss functions used in Machine Learning depending on the user's purpose.
Out of them, the loss functions that I am going to cover in this article are:
- Mean Squared Loss
- Mean Absolute Loss
- Mean Log Cosh Loss
- Root Mean Squared Loss
So let's get started!
What are Loss functions in Machine Learning?
Most of the algorithms in Machine Learning rely on Optimizing (minimizing or maximizing) a function, which we call an Objective Function.
Out of these, the functions that we tend to minimize are called Loss Functions.
As the name suggests, a loss function is used to determine the loss of information or error in a particular Machine Learning algorithm under consideration.
Formally speaking:
A loss function is a measure of how good a prediction model does in terms of being able to predict the expected outcome.
Among the many available methods, the most commonly used one for finding the minimum point of a function (the point of minima, since we focus on reducing the error) is Gradient Descent.
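To make this concrete, here is a minimal sketch of a gradient-descent loop for a generic loss (illustrative only; the article's actual training loop appears later inside the LinearRegressor class, and loss_fn/grad_fn stand for any of the loss and gradient functions defined below):

import numpy as np

# Minimal gradient-descent sketch: repeatedly move the weights a small step
# against the gradient of the loss so that the loss keeps decreasing.
def gradient_descent(loss_fn, grad_fn, xdata, ydata, weights, lr=0.01, epochs=1000):
    for _ in range(epochs):
        weights = weights - lr * grad_fn(xdata, ydata, weights)
    return weights, loss_fn(xdata, ydata, weights)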
Categorization of Loss Functions:
All the loss functions can be broadly categorized into 2 types:
- Classification Loss
- Regression Loss
In this article, most of the loss functions that are going to be discussed fall into the category of Regression Loss.
So let's study each one of them, one at a time.
1. Mean Squared Loss/Error:
It is defined as the mean of the squared differences between the actual values and the predicted values.
Formula:
Mathematically, it is defined as below:
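The standard definition, consistent with the code below, is:

\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2

where y_i is the actual value, \hat{y}_i = x_i \cdot w is the predicted value, and N is the number of samples.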
Code Implementation:
Below we have the code implementation,
def mean_squared_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    new = np.dot(xdata, weights)            # predicted values, x*w  [N X 1]
    predict_y = np.subtract(new, ydata)     # residuals: (predicted - actual)
    MSE = np.mean(np.square(predict_y))     # mean of the squared residuals
    return MSE
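For a quick sanity check, here is how the function above can be called on toy data (the values are made up purely for illustration):

import numpy as np

xdata = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]])   # 3 samples, 2 features [N X D]
ydata = np.array([[5.0], [2.0], [4.0]])                   # actual outputs [N X 1]
weights = np.array([[1.0], [2.0]])                        # weight vector [D X 1]

print(mean_squared_loss(xdata, ydata, weights))
# predictions are [5, 2, 5], residuals are [0, 0, 1], so this prints 1/3 ≈ 0.333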
Applications of Mean Square Error:
Out of various use-cases/applications of Mean Square Error, below I have discussed some of the most important ones.
- In Statistical Modelling, the Mean Square Error represents the difference between the actual observation values and the values predicted by the model.
- In Linear Regression, the mathematical benefits of Mean Square Error are particularly evident in its use for analyzing model performance. It helps split the variation in a dataset into two categories: the variation explained by the model and the residual (unexplained) variation.
- The key criterion in selecting estimators is minimizing Mean Square Error. Among unbiased estimators, minimizing the Mean Square Error is equivalent to minimizing the Variance, and the estimator that achieves this is the minimum variance unbiased estimator (see the decomposition below).
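For reference, the decomposition behind that last point is the standard bias-variance identity:

\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2

so for an unbiased estimator the bias term vanishes, and minimizing MSE reduces to minimizing variance.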
Graphical Representation of Mean Square Error:
2. Mean Absolute Loss
It is defined as the mean of the absolute differences between the actual values and the predicted values.
The term absolute difference refers to the magnitude of deviation of the predicted values from the actual values.
Formula:
Mathematically, it is defined as below:
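The standard definition, consistent with the code below, is:

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|

where y_i is the actual value, \hat{y}_i is the predicted value, and N is the number of samples.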
Code Implementation:
Below we have the code implementation,
def mean_absolute_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.subtract(ydata, np.dot(xdata, weights))  # residuals: (y - x*w)
    MAL = np.sum(np.abs(predict_y))   # sum of the absolute residuals
    MAL = MAL / xdata.shape[0]        # divide by the number of samples to get the mean
    return MAL
Applications of Mean Absolute Error:
Out of various use-cases/applications of Mean Absolute Error, below I have discussed some of the most important ones.
- Mean Absolute Error is mostly used to determine the accuracy of industry forecasts.
- It is of great help in strategic planning, as it quantifies the accuracy of predictions and supports relevant recommendations.
Graphical Representation of Mean Absolute Error:
3. Mean Log Cosh Loss:
Log-cosh loss is the mean of the logarithm of the hyperbolic cosine of the prediction error.
Formula:
Mathematically, it is defined as below:
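The standard definition, consistent with the code below, is:

L = \frac{1}{N} \sum_{i=1}^{N} \log\left( \cosh\left( \hat{y}_i - y_i \right) \right)

where \hat{y}_i - y_i is the prediction error for the i-th sample.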
Code Implementation:
Below we have the code implementation,
def mean_log_cosh_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.abs(np.subtract(xdata @ weights, ydata))  # |x*w - y|; cosh is even, so taking abs does not change the result
    MLCL = np.log(np.cosh(predict_y))   # element-wise log(cosh(error))
    MLCL = np.mean(MLCL)                # average over all samples
    return MLCL
Applications of Mean Log Cosh Loss:
Out of various use-cases/applications of Mean Log Cosh Loss, below I have discussed some of the most important ones.
- log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x, and to abs(x) - log(2) for large x (a quick numerical check of this appears after the list below).
- This means that 'logcosh' works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction.
- It has all the advantages of the Huber loss, and it is twice differentiable everywhere, unlike the Huber loss.
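For a quick sanity check of the approximation quoted above, the short snippet below (illustrative only, not part of the article's main code) compares log(cosh(x)) with both approximations:

import numpy as np

# Compare log(cosh(x)) with its small-x and large-x approximations.
for x in [0.05, 0.5, 5.0, 20.0]:
    exact = np.log(np.cosh(x))
    small_x = x ** 2 / 2            # accurate when |x| is small
    large_x = abs(x) - np.log(2)    # accurate when |x| is large
    print(f"x={x}: exact={exact:.4f}, x^2/2={small_x:.4f}, |x|-log(2)={large_x:.4f}")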
Graphical Representation of Mean Log Cosh Loss:
4. Root Mean Squared Loss
Root Mean Square Loss/Error (RMSE) is the standard deviation of the residuals (prediction errors).
Residuals are a measure of how far the data points are from the regression line; RMSE is a measure of how spread out these residuals are.
In simple terms, it tells you how concentrated the data is around the line of best fit.
Formula:
Mathematically, it is defined as below:
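The standard definition, consistent with the code below, is:

\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 }

i.e., it is simply the square root of the Mean Squared Error.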
Code Implementation:
Below we have the code implementation,
def root_mean_squared_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.subtract(np.dot(xdata, weights), ydata)  # residuals: (x*w - y)
    RMSL = np.sqrt(np.mean(predict_y ** 2))                 # square root of the mean squared residual
    return RMSL
Applications of Root Mean Squared Loss:
Root Mean Squared Loss shares the use cases of Mean Square Error; in addition, because it is expressed in the same units as the target variable, it is widely used to report and compare the prediction error of regression models.
Comparison graph for all the Loss/Error Functions discussed above:
We have tested the code below, and this is the output graph we obtained.
Note: In the graph below, the logcosh and Mean Absolute Error curves overlap completely, which shows that the results depend on the type and amount of input data provided to the model. If you run the code below with a different input size, you will get a different graph.
The complete code (putting it all together):
Note: Below we have provided the complete code (programmed in Python) to run and test all the above discussed error functions yourself.
The entire code was developed by our team, so we recommend you go through it carefully, one function at a time; we are confident it will help you get your concepts crystal clear.
Don't get overwhelmed by the size of the code. We have provided comments wherever possible.
Hope you find it helpful.
import numpy as np
import argparse
import csv
import matplotlib.pyplot as plt
import sys
import math
'''
You are only required to learn the following functions
mean_squared_loss
mean_absolute_loss
mean_log_cosh_loss
root_mean_squared_loss
Don't modify any other functions or command line arguments because autograder will be used
Don't modify function declaration (arguments)
'''
def mean_squared_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    new = np.dot(xdata, weights)            # predicted values, x*w  [N X 1]
    predict_y = np.subtract(new, ydata)     # residuals: (predicted - actual)
    MSE = np.mean(np.square(predict_y))     # mean of the squared residuals
    return MSE

def mean_squared_gradient(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1]
    xdata = input feature matrix [N X D]
    ydata = output values [N X 1]
    Return the mean squared gradient
    '''
    predict_y = np.subtract(np.dot(xdata, weights), ydata)  # residuals: (x*w - y)
    predict_y = np.asarray(predict_y)
    MSG = np.dot(xdata.T, predict_y)   # X^T * residuals  [D X 1]
    MSG = MSG / xdata.shape[0]         # average over the N samples
    MSG = 2 * MSG                      # derivative of the square brings down a factor of 2
    return MSG

def mean_absolute_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.subtract(ydata, np.dot(xdata, weights))  # residuals: (y - x*w)
    MAL = np.sum(np.abs(predict_y))   # sum of the absolute residuals
    MAL = MAL / xdata.shape[0]        # divide by the number of samples to get the mean
    return MAL

def mean_absolute_gradient(xdata, ydata, weights):
    # Gradient of the mean absolute loss: X^T * sign(residuals) / N
    mul = np.dot(xdata, weights)
    predict_y = mul - ydata            # residuals: (x*w - y)
    abst = np.abs(predict_y)
    aj = np.divide(predict_y, abst)    # sign of each residual
    MAG = np.dot(xdata.T, aj)
    MAG = MAG / xdata.shape[0]
    return MAG

def mean_log_cosh_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.abs(np.subtract(xdata @ weights, ydata))  # |x*w - y|; cosh is even, so taking abs does not change the result
    MLCL = np.log(np.cosh(predict_y))   # element-wise log(cosh(error))
    MLCL = np.mean(MLCL)                # average over all samples
    return MLCL

def mean_log_cosh_gradient(xdata, ydata, weights):
    # Gradient of the log-cosh loss: X^T * tanh(residuals) / N
    predict_y = np.subtract(np.dot(xdata, weights), ydata)   # residuals: (x*w - y)
    MLCG = (np.dot(np.tanh(predict_y).T, xdata)).T
    MLCG = MLCG / xdata.shape[0]
    return MLCG

def root_mean_squared_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.subtract(np.dot(xdata, weights), ydata)  # residuals: (x*w - y)
    RMSL = np.sqrt(np.mean(predict_y ** 2))                 # square root of the mean squared residual
    return RMSL

def root_mean_squared_gradient(xdata, ydata, weights):
    # Gradient of RMSE: (X^T * residuals / N) / RMSE
    predict_y = np.subtract(np.dot(xdata, weights), ydata)
    numerator = (np.dot(predict_y.T, xdata)) / xdata.shape[0]
    denominator = np.sqrt(np.mean(predict_y ** 2))
    RMSG = np.divide(numerator, denominator)
    RMSG = RMSG.T
    return RMSG
class LinearRegressor:
    def __init__(self, dims):
        # dims is the number of features
        # You can use __init__ to initialise your weights and biases
        # Create all class-related variables here
        self.weights = np.ones((dims, 1))
        self.weights = self.weights.astype('float64')
        return

    def train(self, xtrain, ytrain, loss_function, gradient_function, epoch=100, lr=1.0):
        '''
        xtrain = input feature matrix [N X D]
        ytrain = output values [N X 1]
        learn weight vector [D X 1]
        epoch = scalar parameter epoch
        lr = scalar parameter learning rate
        loss_function = loss function name for linear regression training
        gradient_function = gradient name of loss function
        '''
        # Training loop: record the loss, then take one gradient-descent step per epoch
        ytrain = np.array(ytrain)
        ytrain = np.reshape(ytrain, (xtrain.shape[0], 1))
        ytrain = ytrain.astype('float64')
        arr_err = []
        for iteration in range(epoch):
            err = loss_function(xtrain, ytrain, self.weights)
            self.weights = self.weights - lr * gradient_function(xtrain, ytrain, self.weights)
            # print("error =", err)
            arr_err.append(err)
        return arr_err

    def predict(self, xtest):
        count = np.dot(xtest, self.weights)
        for i in range(xtest.shape[0]):
            adi = int(count[i])
            if adi < 0:
                adi = 0          # clip negative predictions to zero
            print(str(adi))
        '''
        This code is to make the output csv file
        file = open("prediction.csv","w")
        file.write("instance (id),count\n")
        for i in range(xtest.shape[0]):
            row=""
            adi=int(count[i])
            if adi<0:
                adi=0
            row=str(i)+","+str(adi)+"\n"
            print(str(adi))
            file.write(row)
        file.close()
        # This returns your prediction on xtest
        '''
        return count
def read_dataset(trainfile, testfile):
    '''
    Reads the input data from train and test files and
    Returns the matrices Xtrain : [N X D] and Ytrain : [N X 1] and Xtest : [M X D]
    where D is number of features and N is the number of train rows and M is the number of test rows
    '''
    xtrain = []
    ytrain = []
    xtest = []
    with open(trainfile, 'r') as f:
        reader = csv.reader(f, delimiter=',')
        next(reader, None)          # skip the header row
        for row in reader:
            xtrain.append(row[:-1])
            ytrain.append(row[-1])
    with open(testfile, 'r') as f:
        reader = csv.reader(f, delimiter=',')
        next(reader, None)          # skip the header row
        for row in reader:
            xtest.append(row)
    return np.array(xtrain), np.array(ytrain), np.array(xtest)
def preprocess_dataset(xdata, ydata=None):
    '''
    xdata = input feature matrix [N X D]
    ydata = output values [N X 1]
    Convert data xdata, ydata obtained from read_dataset() to a usable format by loss function
    The ydata argument is optional so this function must work for both the calls
    xtrain_processed, ytrain_processed = preprocess_dataset(xtrain,ytrain)
    xtest_processed = preprocess_dataset(xtest)
    NOTE: You can ignore/drop few columns. You can feature scale the input data before processing further.
    '''
    # Keep four numeric columns and append a bias column of ones
    xtrain = xdata[:, [8, 9, 10, 11]]
    n, m = xdata.shape
    X0 = np.ones((n, 1))
    xtrain = np.append(xtrain, X0, axis=1)
    xtrain = np.asarray(xtrain)
    # Standardize column 0 (zero mean, unit variance)
    aa = xtrain[..., 0]
    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[..., 0] = (aa - m1) / sd1
    # Standardize column 1
    aa = xtrain[..., 1]
    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[..., 1] = (aa - m1) / sd1
    # Standardize column 2 (values of 3 are first mapped to 0)
    aa = xtrain[..., 2]
    aa = aa.astype(float)
    for i in range(n):
        if aa[i] == 3:
            aa[i] = 0
    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[..., 2] = (aa - m1) / sd1
    # Standardize column 3 (values of 3 are first mapped to 0)
    aa = xtrain[..., 3]
    aa = aa.astype(float)
    for i in range(n):
        if aa[i] == 3:
            aa[i] = 0
    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[..., 3] = (aa - m1) / sd1
    # One-hot encode the year extracted from the date column
    dt = xdata[:, 1]
    year = np.zeros((n, 3))
    for i in range(n):
        yr = dt[i]
        yr = yr[3:4]
        yr = int(yr)
        year[i][yr - 1] = 1
    xtrain = np.concatenate((xtrain, year), axis=1)
    # One-hot encode the month
    mon = np.zeros((n, 12))
    for i in range(n):
        m = dt[i]
        m = m[5:7]
        m = int(m)
        mon[i][m - 1] = 1
    xtrain = np.concatenate((xtrain, mon), axis=1)
    # One-hot encode the day of the month
    day = np.zeros((n, 31))
    for i in range(n):
        d = dt[i]
        d = d[8:10]
        d = int(d)
        day[i][d - 1] = 1
    xtrain = np.concatenate((xtrain, day), axis=1)
    # One-hot encode the season column
    season = {
        "1": [1, 0, 0, 0],
        "2": [0, 1, 0, 0],
        "3": [0, 0, 1, 0],
        "4": [0, 0, 0, 1]
    }
    sea = xdata[:, 2]
    ss = []
    for i in sea:
        ss.append(season[i])
    ss = np.array(ss)
    xtrain = np.append(xtrain, ss, axis=1)
    # One-hot encode the hour column
    hr = xdata[:, 3]
    hours = np.zeros((n, 24))
    for i in range(n):
        hours[i][int(hr[i])] = 1
    hours = np.asarray(hours)
    xtrain = np.concatenate((xtrain, hours), axis=1)
    # One-hot encode the holiday flag
    hl1 = np.eye(2)[xtrain[:, 4].astype(float).astype(int)]
    xtrain = np.concatenate((xtrain, hl1), axis=1)
    '''
    holi = {
        "0" : [1,0],
        "1" : [0,1]
    }
    hl=xtrain[:,4]
    hl1=[]
    for i in hl:
        hl1.append(holi[i])
    hl1=np.array(hl1)
    xtrain = np.append(xtrain,hl1,axis=1)
    '''
    # One-hot encode the day of the week
    days = {
        "Monday": [1, 0, 0, 0, 0, 0, 0],
        "Tuesday": [0, 1, 0, 0, 0, 0, 0],
        "Wednesday": [0, 0, 1, 0, 0, 0, 0],
        "Thursday": [0, 0, 0, 1, 0, 0, 0],
        "Friday": [0, 0, 0, 0, 1, 0, 0],
        "Saturday": [0, 0, 0, 0, 0, 1, 0],
        "Sunday": [0, 0, 0, 0, 0, 0, 1]
    }
    abc = xdata[:, 5]
    dayss = []
    for i in abc:
        dayss.append(days[i])
    dayss = np.asarray(dayss)
    xtrain = np.append(xtrain, dayss, axis=1)
    # One-hot encode the working-day flag
    wda = np.eye(2)[xtrain[:, 6].astype(float).astype(int)]
    xtrain = np.concatenate((xtrain, wda), axis=1)
    '''
    wda = {
        "0" : [1,0],
        "1" : [0,1]
    }
    wd=xtrain[:,6]
    wd1=[]
    for i in wd:
        wd1.append(wda[i])
    wd1=np.asarray(wd1)
    xtrain = np.append(xtrain,wd1,axis=1)
    '''
    # One-hot encode the situation column
    st = np.eye(2)[xtrain[:, 7].astype(float).astype(int)]
    xtrain = np.concatenate((xtrain, st), axis=1)
    '''
    situation = {
        "1" : [1,0],
        "2" : [0,1]
    }
    st=xtrain[:,7]
    st1 = []
    for i in st:
        st1.append(situation[i])
    st1=np.asarray(st1)
    xtrain = np.append(xtrain,st1,axis=1)
    '''
    xtrain = xtrain.astype('float64')
    return xtrain, ydata
dictionary_of_losses = {
    'mse': (mean_squared_loss, mean_squared_gradient),
    'mae': (mean_absolute_loss, mean_absolute_gradient),
    'rmse': (root_mean_squared_loss, root_mean_squared_gradient),
    'logcosh': (mean_log_cosh_loss, mean_log_cosh_gradient),
}
def main():
    # You are free to modify the main function as per your requirements.
    # Uncomment the below lines and pass the appropriate value
    # mean_squared_loss()
    xtrain, ytrain, xtest = read_dataset(args.train_file, args.test_file)
    xtrainprocessed, ytrainprocessed = preprocess_dataset(xtrain, ytrain)
    xtestprocessed = preprocess_dataset(xtest)
    model1 = LinearRegressor(xtrainprocessed.shape[1])
    mse = model1.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, mean_squared_gradient, args.epoch, args.lr)
    '''
    Code to plot graph for all the 4 errors
    model2 = LinearRegressor(xtrainprocessed.shape[1])
    model3 = LinearRegressor(xtrainprocessed.shape[1])
    model4 = LinearRegressor(xtrainprocessed.shape[1])
    # The loss function is provided by command line argument
    loss_fn, loss_grad = dictionary_of_losses[args.loss]
    mae = model2.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, mean_absolute_gradient, args.epoch, args.lr)
    logcosh = model3.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, mean_log_cosh_gradient, args.epoch, args.lr)
    rmse = model4.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, root_mean_squared_gradient, args.epoch, args.lr)
    # print("MSE",mse,"MAE",mae,"LOGCOSH",logcosh,"RMSE",rmse)
    plt.plot(range(args.epoch),mse,"-",label="mse")
    plt.plot(range(args.epoch),mae,"-", label="mae")
    plt.plot(range(args.epoch),logcosh,"-",label="logcosh")
    plt.plot(range(args.epoch),rmse,"-",label="rmse")
    plt.legend()
    plt.show()
    '''
    # preprocess_dataset(xtest) returns (features, None), so index [0] picks out the feature matrix
    model1.predict(xtestprocessed[0])
if __name__ == '__main__':
    '''
    You can remove the comments and run or test your code on the following input:
    code to test all the 8 functions for the sample data provided on the moodle
    ydata = [-2, 1, 1, 2, 0]
    xdata = [[ 1, 0, 2, -3], [ 1, -1, 0, -3], [-2, -5, 1, -3], [ 0, -5, 3, -3], [ 0, -4, 3, -2]]
    weights = [ 1, 0, -2, -1]
    xdata=np.asarray(xdata)
    ydata=np.asarray(ydata)
    weights = np.asarray(weights)
    ydata=ydata.reshape(ydata.shape[0],1)
    weights=weights.reshape(weights.shape[0],1)
    print(xdata.shape)
    print(ydata.shape)
    print(weights.shape)
    print(mean_squared_loss(xdata, ydata, weights))
    print(mean_squared_gradient(xdata, ydata, weights))
    print(mean_absolute_loss(xdata, ydata, weights))
    print(mean_absolute_gradient(xdata, ydata, weights))
    print(root_mean_squared_loss(xdata, ydata, weights))
    print(root_mean_squared_gradient(xdata, ydata, weights))
    print(mean_log_cosh_loss(xdata, ydata, weights))
    print(mean_log_cosh_gradient(xdata, ydata, weights))
    sys.exit(0)
    '''
    parser = argparse.ArgumentParser()
    parser.add_argument('--loss', default='mse', choices=['mse', 'mae', 'rmse', 'logcosh'], help='loss function')
    parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
    parser.add_argument('--epoch', default=50000, type=int, help='number of epochs')
    parser.add_argument('--train_file', type=str, help='location of the training file')
    parser.add_argument('--test_file', type=str, help='location of the test file')
    args = parser.parse_args()
    main()
Just copy and paste this Python code into your IDE and run it after uncommenting the input parameters.
What to uncomment and how to run it, we have left up to you, so that you can validate your understanding of the code implementation.
In case of any queries, feel free to get them cleared by posting them in our comment section down below!
Happy Learning : )