LAST UPDATED: MARCH 2, 2020

Types of Loss functions in ML

Hello Everyone!

In this article, I will discuss (with code implementations and explanations) the various loss functions used in ML and compare them using a performance graph.

There are various loss functions used in Machine Learning depending on the user's purpose.

Out of them, the loss functions that I am going to cover in this article are:

  1. Mean Squared Loss

  2. Mean Absolute Loss

  3. Mean Log Cosh Loss

  4. Root Mean Squared Loss

So let's get started!

What are Loss functions in Machine Learning?

Most of the algorithms in Machine Learning rely on Optimizing (minimizing or maximizing) a function, which we call an Objective Function.

Out of these, the group of functions that we tend to minimize is called Loss Functions.

As the name suggests, a loss function is used to determine the loss of information or error in a particular Machine Learning algorithm under consideration.

Formally speaking:

A loss function is a measure of how well a prediction model performs in terms of predicting the expected outcome.

Out of the many methods available, the most commonly used for finding the minimum point of a function (the point of minima, since we are focused on reducing the error) is Gradient Descent.
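
As a quick illustration (with w denoting the weight vector, η the learning rate, and L the loss; this notation is introduced here), each Gradient Descent step moves the weights a small distance against the gradient of the loss:

$$w \leftarrow w - \eta \, \nabla_w L(w)$$

This is exactly the update performed in the training loop of the code given at the end of this article.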

Categorization of Loss Functions:

All the loss functions can be broadly categorized into 2 types:

  1. Classification Loss

  2. Regression Loss

In this article, all the loss functions that are going to be discussed fall into the category of Regression Loss.

So let's study each one of them, one at a time.

1. Mean Squared Loss/Error:

It is defined as the mean of the squared differences between the actual values and the predicted values.

Formula:

Mathematically, it is defined as below:
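
Using N for the number of samples, y_i for the actual value, and ŷ_i for the value predicted by the model (this notation is introduced here for clarity), the Mean Squared Error is:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$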

Code Implementation:

Below we have the code implementation,

import numpy as np

def mean_squared_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values

    '''
    new = np.dot(xdata,weights)           # predicted values (w*x)
    predict_y = np.subtract(new,ydata)    # prediction error (w*x - y)
    MSE = np.mean(np.square(predict_y))   # mean of the squared errors

    return MSE
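
As a quick sanity check, here is a minimal usage sketch for the function above (the array values are made up purely for illustration; the shapes follow the docstring: xdata is N X D, weights is D X 1, ydata is N X 1):

xdata = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])       # N = 3 samples, D = 2 features
weights = np.array([[0.5],
                    [1.0]])          # D X 1 weight vector
ydata = np.array([[2.0],
                  [5.0],
                  [9.0]])            # N X 1 actual outputs

print(mean_squared_loss(xdata, ydata, weights))   # prints 0.25 for these toy values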

Applications of Mean Square Error:

Out of various use-cases/applications of Mean Square Error, below I have discussed some of the most important ones.

  • In Statistical Modelling, the Mean Square Error represents the average squared difference between the actual observed values and the values predicted by the model.

  • In Linear Regression, the mathematical benefits of Mean Square Error are particularly evident when it is used to analyze the model's performance. It helps in separating the variation in a dataset into the following two categories:

    • Variation explained by the Model

    • Variation explained by Randomness.

  • The key criterion in selecting an estimator is minimizing Mean Square Error. Among unbiased estimators, minimizing the Mean Square Error is equivalent to minimizing the Variance, and the estimator that achieves this is the minimum variance unbiased estimator.

Graphical Representation of Mean Square Error:

2. Mean Absolute Loss

It is defined as the mean of the absolute differences between the actual values and the predicted values.

The term absolute difference refers to the distance or the amount/magnitude of deflection in the predicted values from the actual values.

Formula:

Mathematically, it is defined as below:
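
With the same notation as before (N samples, actual value y_i, predicted value ŷ_i), the Mean Absolute Error is:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$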

Code Implementation:

Below we have the code implementation,

def mean_absolute_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values

    '''
    predict_y = np.subtract(ydata, np.dot(xdata,weights)) # (y - w*x)
    MAL = np.sum(np.abs(predict_y)) #sum of the absolute difference between the individual values.
    MAL = MAL/xdata.shape[0] #taking the mean by dividing the sum by the number of input values.

    return MAL

Applications of Mean Absolute Error:

Out of various use-cases/applications of Mean Absolute Error, below I have discussed some of the most important ones.

  • Mean Absolute Error is mostly used to determine the accuracy of industry forecasts.
  • It is of great help in the process of strategic planning, as it determines the accuracy of the predictions and provides the relevant recommendations.

Graphical Representation of Mean Absolute Error:

3. Mean Log Cosh Loss:

Log-cosh is the logarithm of the hyperbolic cosine of the prediction error.

Formula:

Mathematically, it is defined as below:
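
With the same notation, the Mean Log Cosh Loss is:

$$L = \frac{1}{N}\sum_{i=1}^{N}\log\left(\cosh\left(\hat{y}_i - y_i\right)\right)$$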

Code Implementation:

Below we have the code implementation,

def mean_log_cosh_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values

    '''
    predict_y = np.abs(np.subtract(xdata@weights,ydata)) # |w*x - y| (the absolute value does not change the result, since cosh is an even function)
    MLCL = np.log(np.cosh(predict_y))
    MLCL = np.mean(MLCL)

    return MLCL

Applications of Mean Log Cosh Loss:

Out of various use-cases/applications of Mean Log Cosh Loss, below I have discussed some of the most important ones.

  • log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x (see the quick numerical check after this list).
  • This means that 'logcosh' works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction.
  • It has all the advantages of Huber loss, and it is twice differentiable everywhere, unlike Huber loss.
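
As a quick numerical sanity check of the two approximations above (the values of x below are chosen arbitrarily for illustration):

import numpy as np

for x in [0.1, 0.5, 5.0, 10.0]:
    # columns: x, log(cosh(x)), x**2/2 (small-x approximation), abs(x) - log(2) (large-x approximation)
    print(x, np.log(np.cosh(x)), x**2 / 2, abs(x) - np.log(2))

For small x the second and third columns nearly match; for large x the second and fourth columns nearly match.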

Graphical Representation of Mean Log Cosh Loss:

4. Root Mean Squared Loss

Root Mean Square Loss/Error (RMSE) is the standard deviation of the residuals (prediction errors).

Residuals are a measure of how far the data points are from the regression line; RMSE is a measure of how spread out these residuals are.

In simple terms, it tells you how concentrated the data is around the line of best fit.

Formula:

Mathematically, it is defined as below:
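
With the same notation, the Root Mean Squared Error is simply the square root of the Mean Squared Error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$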

Code Implementation:

Below we have the code implementation,

def root_mean_squared_loss(xdata, ydata, weights):
    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values

    '''

    predict_y = np.subtract(np.dot(xdata,weights),ydata)
    RMSL = np.sqrt(np.mean((predict_y)**2))
    return RMSL

Applications of Root Mean Squared Loss:

Out of various use-cases/applications of Root Mean Squared Loss, below I have discussed some of the most important ones.

  • climatology

  • forecasting

  • regression analysis to verify experimental results.

Comparison graph for all the Loss/Error Functions discussed above:

We have tested the code given below, and this is the output graph we obtained.

Note: In the graph below, the logcosh and Mean Absolute Error curves overlap completely, which shows that the results depend on the type and amount of input data provided to the model. Try running the code below with a different input size and you will get a different graph.

The complete code (putting it all together):

Note: Below we have provided the complete code (programmed in Python) to run and test all the above discussed error functions yourself.

The entire code was developed by our team, so we recommend you go through it carefully, one function at a time; we are sure it will help you get your concepts crystal clear.

Don't get overwhelmed by the size of the code. We have provided comments wherever possible.

Hope you find it helpful.

import numpy as np
import argparse
import csv
import matplotlib.pyplot as plt
import sys
import math

''' 
You are only required to learn the following functions

mean_squared_loss
mean_absolute_loss
mean_log_cosh_loss
root_mean_squared_loss

Don't modify any other functions or command line arguments because autograder will be used
Don't modify function declaration (arguments)
'''

def mean_squared_loss(xdata, ydata, weights):

    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    new = np.dot(xdata,weights)
    predict_y = np.subtract(new,ydata)
    MSE=np.mean(np.square(predict_y))
    
    return MSE

def mean_squared_gradient(xdata, ydata, weights):

    '''
    weights = weight vector [D X 1]
    xdata = input feature matrix [N X D]
    ydata = output values [N X 1]
    Return the mean squared gradient
    '''
    predict_y = np.subtract(np.dot(xdata,weights),ydata)
    predict_y = np.asarray(predict_y)
    MSG = np.dot(xdata.T,(predict_y))
    MSG=MSG/xdata.shape[0]
    MSG=2*MSG
    
    return MSG

def mean_absolute_loss(xdata, ydata, weights):

    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.subtract(ydata, np.dot(xdata,weights)) # (y - w*x)
    MAL = np.sum(np.abs(predict_y)) #sum of the absolute difference between the individual values.
    MAL = MAL/xdata.shape[0] #taking the mean by dividing the sum by the number of input values.

    return MAL

def mean_absolute_gradient(xdata, ydata, weights):
    predict_y = np.dot(xdata, weights) - ydata   # prediction error (w*x - y)
    signs = np.sign(predict_y)                   # +1 / -1 / 0; avoids dividing by zero when an error is exactly 0
    MAG = np.dot(xdata.T, signs)                 # gradient of |w*x - y| w.r.t. w is sign(error) * x
    MAG = MAG/xdata.shape[0]                     # average over the N samples
    return MAG

def mean_log_cosh_loss(xdata, ydata, weights):

    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''

    predict_y = np.abs(np.subtract(xdata@weights,ydata)) # |w*x - y| (the absolute value does not change the result, since cosh is an even function)
    MLCL = np.log(np.cosh(predict_y))
    MLCL=np.mean(MLCL)

    return MLCL

def mean_log_cosh_gradient(xdata, ydata, weights):
    predict_y = np.subtract(np.dot(xdata,weights),ydata)   # prediction error (w*x - y)
    MLCG = (np.dot(np.tanh(predict_y).T,xdata)).T           # d/dw log(cosh(e)) = tanh(e) * x
    MLCG = MLCG/xdata.shape[0]                               # average over the N samples
    return MLCG

def root_mean_squared_loss(xdata, ydata, weights):

    '''
    weights = weight vector [D X 1] #input weight vector
    xdata = input feature matrix [N X D] #input values
    ydata = output values [N X 1] #actual output values
    '''
    predict_y = np.subtract(np.dot(xdata,weights),ydata)
    RMSL = np.sqrt(np.mean((predict_y)**2))
    return RMSL

def root_mean_squared_gradient(xdata, ydata, weights):

    predict_y = np.subtract(np.dot(xdata,weights),ydata)     # prediction error (w*x - y)
    numerator = (np.dot(predict_y.T,xdata))/xdata.shape[0]   # (1/N) * e^T * X
    denominator = np.sqrt(np.mean((predict_y)**2))           # the RMSE itself
    RMSG = np.divide(numerator,denominator)                  # chain rule: d(RMSE)/dw = ((1/N) * X^T * e) / RMSE
    RMSG = RMSG.T
    return RMSG
 

class LinearRegressor:

    def __init__(self,dims):

        # dims is the number of the features
        # You can use __init__ to initialise your weight and biases
        # Create all class related variables here

        self.weights =np.ones((dims,1))
        self.weights = self.weights.astype('float64')
        return 

    def train(self, xtrain, ytrain, loss_function, gradient_function, epoch=100, lr=1.0):

        '''
        xtrain = input feature matrix [N X D]
        ytrain = output values [N X 1]
        learn weight vector [D X 1]
        epoch = scalar parameter epoch
        lr = scalar parameter learning rate
        loss_function = loss function name for linear regression training
        gradient_function = gradient name of loss function
        '''

        # You need to write the training loop to update weights here

        ytrain = np.array(ytrain)
        ytrain = np.reshape(ytrain, (xtrain.shape[0],1))
        ytrain = ytrain.astype('float64')
        arr_err = []

        for iteration in range(epoch):
            err = loss_function(xtrain, ytrain,self.weights)
            self.weights = self.weights - lr*gradient_function(xtrain,ytrain,self.weights)
            # print("error =",err)
            arr_err.append(err)

        return arr_err
        

    def predict(self, xtest):

        count=np.dot(xtest,self.weights)

        for i in range(xtest.shape[0]):
            adi=int(count[i])
            if adi<0:
                adi=0
            print(str(adi))


        ''' 
        This code is to make the output csv file
        file = open("prediction.csv","w")
        file.write("instance (id),count\n")
        for i in range(xtest.shape[0]):
            row=""
            adi=int(count[i])
            if adi<0:
                adi=0
            row=str(i)+","+str(adi)+"\n"
            print(str(adi))
            file.write(row)

        file.close()

        # This returns your prediction on xtest

        '''

        return count


def read_dataset(trainfile, testfile):

    '''
    Reads the input data from train and test files and 
    Returns the matrices Xtrain : [N X D] and Ytrain : [N X 1] and Xtest : [M X D] 
    where D is number of features and N is the number of train rows and M is the number of test rows
    '''

    xtrain = []
    ytrain = []
    xtest = []

    with open(trainfile,'r') as f:
        reader = csv.reader(f,delimiter=',')
        next(reader, None)
        for row in reader:
            xtrain.append(row[:-1])
            ytrain.append(row[-1])

    with open(testfile,'r') as f:
        reader = csv.reader(f,delimiter=',')
        next(reader, None)
        for row in reader:
            xtest.append(row)

    return np.array(xtrain), np.array(ytrain), np.array(xtest)


def preprocess_dataset(xdata, ydata=None):

    '''
    xdata = input feature matrix [N X D] 
    ydata = output values [N X 1]
    Convert data xdata, ydata obtained from read_dataset() to a usable format by loss function
    The ydata argument is optional so this function must work for the both the calls
    xtrain_processed, ytrain_processed = preprocess_dataset(xtrain,ytrain)
    xtest_processed = preprocess_dataset(xtest) 
    
    NOTE: You can ignore/drop few columns. You can feature scale the input  data before processing further.
    '''

    xtrain = xdata[:,[8,9,10,11]]
    n,m = xdata.shape 
    X0 = np.ones((n,1))
    xtrain = np.append(xtrain,X0,axis=1)
    xtrain = np.asarray(xtrain)
    aa = xtrain[...,0]
    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[...,0] = (aa-m1)/sd1  
    aa = xtrain[...,1]
    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[...,1] = (aa-m1)/sd1
    aa = xtrain[...,2]
    aa = aa.astype(float)
    
    for i in range(n):
        if aa[i]==3:
            aa[i]=0
                
    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[...,2]=(aa-m1)/sd1
    aa = xtrain[...,3]
    aa = aa.astype(float)

    for i in range(n):
        if aa[i]==3:
            aa[i]=0            

    aa = aa.astype(float)
    m1 = np.mean(aa)
    sd1 = np.std(aa)
    xtrain[...,3]=(aa-m1)/sd1

    dt=xdata[:,1]

    year=np.zeros((n,3))

    for i in range(n):
        yr=dt[i]
        yr=yr[3:4]
        yr=int(yr)
        year[i][yr-1]=1

    xtrain = np.concatenate((xtrain,year),axis=1)

    mon=np.zeros((n,12))

    for i in range(n):
        m = dt[i]
        m = m[5:7]
        m = int(m)
        mon[i][m-1]=1

    xtrain = np.concatenate((xtrain,mon),axis=1)

    day=np.zeros((n,31))

    for i in range(n):
        d = dt[i]
        d = d[8:10]
        d = int(d)
        day[i][d-1] = 1   

    xtrain = np.concatenate((xtrain,day),axis=1)

    season = {
        "1" : [1,0,0,0],
        "2" : [0,1,0,0],
        "3" : [0,0,1,0],
        "4" : [0,0,0,1]
    }

    sea = xdata[:,2]
    ss = []

    for i in sea:
        ss.append(season[i])

    ss = np.array(ss)
    xtrain = np.append(xtrain,ss,axis=1)

    hr = xdata[:,3]
    hours = np.zeros((n,24))
 
    for i in range(n):
        hours[i][int(hr[i])]=1

    hours = np.asarray(hours)
    xtrain = np.concatenate((xtrain,hours),axis=1)

    hl1 = np.eye(2)[xtrain[:, 4].astype(float).astype(int)]

    xtrain = np.concatenate((xtrain,hl1),axis=1)

    '''
    holi = {
        "0" : [1,0],
        "1" : [0,1]
    }

    hl=xtrain[:,4]
    hl1=[]

    for i in hl:
        hl1.append(holi[i])

    hl1=np.array(hl1)

    xtrain = np.append(xtrain,hl1,axis=1) 

    '''

    days = {

        "Monday" : [1,0,0,0,0,0,0],

        "Tuesday" : [0,1,0,0,0,0,0],

        "Wednesday" : [0,0,1,0,0,0,0],

        "Thursday" : [0,0,0,1,0,0,0],

        "Friday" : [0,0,0,0,1,0,0],

        "Saturday" : [0,0,0,0,0,1,0],

        "Sunday" : [0,0,0,0,0,0,1]

    }

    abc=xdata[:,5]

    dayss=[]

    for i in abc:
        dayss.append(days[i])

    dayss=np.asarray(dayss)

    xtrain = np.append(xtrain,dayss,axis=1)

    wda=np.eye(2)[xtrain[:, 6].astype(float).astype(int)]

    xtrain = np.concatenate((xtrain,wda),axis=1)

    '''

    wda = {
        "0" : [1,0],
        "1" : [0,1]
    }

    wd=xtrain[:,6]

    wd1=[]

    for i in wd:
        wd1.append(wda[i])

    wd1=np.asarray(wd1)

    xtrain = np.append(xtrain,wd1,axis=1)

    '''

    st = np.eye(2)[xtrain[:, 7].astype(float).astype(int)]
    xtrain = np.concatenate((xtrain,st),axis=1)


    '''
    situation = {
        "1" : [1,0],
        "2" : [0,1]        
    }   

    st=xtrain[:,7]
    st1 = []
    
    for i in st:
        st1.append(situation[i])

    st1=np.asarray(st1)

    xtrain = np.append(xtrain,st1,axis=1)

    '''

    xtrain = xtrain.astype('float64')
    return xtrain, ydata 
  

dictionary_of_losses = {

    'mse':(mean_squared_loss, mean_squared_gradient),

    'mae':(mean_absolute_loss, mean_absolute_gradient),

    'rmse':(root_mean_squared_loss, root_mean_squared_gradient),

    'logcosh':(mean_log_cosh_loss, mean_log_cosh_gradient),

}


def main():

    # You are free to modify the main function as per your requirements.
    # Uncomment the below lines and pass the appropriate value

    #mean_squared_loss()

    xtrain, ytrain, xtest = read_dataset(args.train_file, args.test_file)

    xtrainprocessed, ytrainprocessed = preprocess_dataset(xtrain, ytrain)

    xtestprocessed = preprocess_dataset(xtest)

    model1 = LinearRegressor(xtrainprocessed.shape[1])

    mse = model1.train(xtrainprocessed, ytrainprocessed, mean_squared_loss , mean_squared_gradient, args.epoch, args.lr)

    '''
    Code to plot graph for all the 4 errors

    model2 = LinearRegressor(xtrainprocessed.shape[1])

    model3 = LinearRegressor(xtrainprocessed.shape[1])

    model4 = LinearRegressor(xtrainprocessed.shape[1])

    # The loss function is provided by command line argument    

    loss_fn, loss_grad = dictionary_of_losses[args.loss]

    mae = model2.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, mean_absolute_gradient, args.epoch, args.lr)

    logcosh = model3.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, mean_log_cosh_gradient, args.epoch, args.lr)

    rmse = model4.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, root_mean_squared_gradient, args.epoch, args.lr)

    # print("MSE",mse,"MAE",mae,"LOGCOSH",logcosh,"RMSE",rmse)

    plt.plot(range(args.epoch),mse,"-",label="mse")

    plt.plot(range(args.epoch),mae,"-", label="mae")

    plt.plot(range(args.epoch),logcosh,"-",label="logcosh")

    plt.plot(range(args.epoch),rmse,"-",label="rmse")

    plt.legend()

    plt.show()

    '''

    model1.predict(xtestprocessed[0])

     
if __name__ == '__main__':  

    '''

    You can remove the comments and run or test your code on the following input:

    code to test all the 8 functions for the sample data provided on the moodle

    ydata = [-2,  1,  1,  2,  0]

    xdata = [[ 1,  0,  2, -3], [ 1, -1,  0, -3], [-2, -5,  1, -3], [ 0, -5,  3, -3], [ 0, -4,  3, -2]]

    weights = [ 1, 0, -2, -1]

    xdata=np.asarray(xdata)

    ydata=np.asarray(ydata)

    weights = np.asarray(weights)

    ydata=ydata.reshape(ydata.shape[0],1)

    weights=weights.reshape(weights.shape[0],1)

    ydata=ydata.reshape(ydata.shape[0],1)

    print(xdata.shape)

    print(ydata.shape)

    print(weights.shape)

    print(mean_squared_loss(xdata, ydata, weights))

    print(mean_squared_gradient(xdata, ydata, weights))

    print(mean_absolute_loss(xdata, ydata, weights))

    print(mean_absolute_gradient(xdata, ydata, weights))

    print(root_mean_squared_loss(xdata, ydata, weights))

    print(root_mean_squared_gradient(xdata, ydata, weights) )

    print(mean_log_cosh_loss(xdata, ydata, weights))

    print(mean_log_cosh_gradient(xdata, ydata, weights))

    sys.exit(0)

    '''

    parser = argparse.ArgumentParser()

    parser.add_argument('--loss', default='mse', choices=['mse','mae','rmse','logcosh'], help='loss function')

    parser.add_argument('--lr', default=0.01, type=float, help='learning rate')

    parser.add_argument('--epoch', default=50000, type=int, help='number of epochs')

    parser.add_argument('--train_file', type=str, help='location of the training file')

    parser.add_argument('--test_file', type=str, help='location of the test file')

    args = parser.parse_args()

    main()

Just copy and paste this Python code into your IDE and run it by uncommenting the input parameters.

What to uncomment and how to run it, we have left up to you so that you can validate your understanding of the code implementation.
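
For reference, assuming the code is saved as (say) loss_functions.py and you have training and test CSV files in the format the script expects (the file names below are placeholders), a typical run would look like this:

python loss_functions.py --train_file train.csv --test_file test.csv --loss mse --lr 0.01 --epoch 50000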

In case of any queries, feel free to get them cleared by posting them in the comment section down below!

Happy Learning : )
