Table of contents
1. Introduction
1.1. Prerequisite
2. Hypothesis
3. Cost Function and Gradient Descent
4. Feature Scaling
5. Learning Rate
6. Implementation
7. Frequently Asked Questions
8. Key Takeaways
Last Updated: Mar 27, 2024

Multivariate Linear Regression - Implementation

Author Arun Nawani

Introduction

You must have learned about simple linear regression, or univariate linear regression, and probably used it to solve some problems in the past, for example, a model that predicts house prices in Delhi given their carpet area. Here the house price depends solely on the carpet area. Now suppose that for the same task we are also given the locality and the year of construction of each house. These additional features will also need to be considered when predicting the house price. The linear regression model that takes all these additional features into consideration is no longer a simple linear regression model but a multivariate linear regression model. Multivariate linear regression is just a more generalised form of linear regression, which means simple linear regression is a special case of multivariate linear regression.

Prerequisite 

The blog assumes that you are familiar with linear regression and have basic knowledge about its terminologies and computational working. If not, we strongly suggest you go through our other blog about linear regression.

Hypothesis 

Let’s recall the equation for simple linear regression.

hθ(x) = θ0x + θ1

Where hθ(x) = target/output variable as predicted by our hypothesis

  x = independent variable

  θ0 = linear regression coefficient

  θ1 = y-intercept

In simple linear regression, we have a single independent variable determining the output. In multivariate linear regression, we have several independent variables, and the hypothesis takes the form -

hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θn-1xn-1 + θn

Where x0, x1, x2, ..., xn-1 are the independent variables or features, n is the number of independent variables, and θn is the y-intercept.
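
In vectorised form, this hypothesis is simply a dot product between a feature vector (with a constant 1 appended for the intercept term) and the parameter vector. Below is a minimal NumPy sketch of this idea; the feature values and parameter values are made up purely for illustration.

import numpy as np

# One example with n = 3 features, plus a constant 1 for the intercept term
x = np.array([96.0, 3.0, 12.0, 1.0])      # [x0, x1, x2, 1]
theta = np.array([1.4, -0.8, 0.5, 5.0])   # [θ0, θ1, θ2, θ3 (y-intercept)]

# hθ(x) = θ0*x0 + θ1*x1 + θ2*x2 + θ3
prediction = np.dot(x, theta)
print(prediction)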

Cost Function and Gradient Descent

The cost function in the case of simple linear regression was -

J(θ0, θ1) = (1/(2m)) * Σ ( hθ(x(i)) - y(i) )², summed over i = 1 to m

Here hθ(x(i)) is the output of our hypothesis for the ith training example, y(i) is the actual value of that data point, and m is the number of training examples.

Minimising this cost function gives the values of the linear regression coefficient and y-intercept for which the error between the actual output and the predicted output is the least.

The cost function in the case of multivariate linear regression has the same form -

J(θ) = (1/(2m)) * Σ ( hθ(x(i)) - y(i) )², summed over i = 1 to m

where hθ(x(i)) is now the multivariate hypothesis defined above.

The cost function is minimised using the gradient descent algorithm. Gradient descent is an optimisation algorithm that iteratively adjusts the parameters θ to reduce the cost function.

Far from the minimum, a small change in θ produces a sharp decrease in the value of the cost function J. This decrease in J gets smaller and smaller as θ approaches the global minimum, and beyond that point, increasing θ increases J again. The parameter values at the global minimum are the ones chosen for the final model.

This is an iterative process, and each update step is given by -

θj := θj - α * (1/m) * Σ ( hθ(x(i)) - y(i) ) * xj(i), for every j simultaneously

Here α is the learning rate of the gradient descent.
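
To make the update rule concrete, here is a minimal NumPy sketch of batch gradient descent on a tiny made-up dataset; the data, learning rate, and number of iterations are chosen purely for illustration.

import numpy as np

# Toy data: m = 4 examples, a bias column of ones plus 2 features
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 5.0],
              [1.0, 3.0, 2.0],
              [1.0, 4.0, 4.0]])
# y was generated as 1 + 2*x1 + 3*x2, so θ should approach [1, 2, 3]
y = np.array([[14.0], [18.0], [13.0], [21.0]])

theta = np.zeros((3, 1))   # model parameters, initialised to zero
alpha = 0.05               # learning rate
m = len(X)

for _ in range(10000):
    preds = X @ theta                              # hypothesis hθ(x) for every example
    J = (1 / (2 * m)) * np.sum((preds - y) ** 2)   # cost function
    dtheta = (1 / m) * X.T @ (preds - y)           # partial derivative w.r.t. each θj
    theta -= alpha * dtheta                        # simultaneous parameter update

print(theta.ravel(), J)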

Feature Scaling

The features, or independent variables, need to be scaled to a similar range for gradient descent to work optimally. One way to do this is to divide every feature by the maximum absolute value of that feature column. For example, consider this dummy dataset: {3, 15, 200, 82, 500}. We can divide all the values in the dataset by the absolute maximum value in the dataset (500), so the scaled data lies in the range of -1 to 1: {3/500, 15/500, 200/500, 82/500, 500/500}. This scaling method is called absolute maximum scaling.

Min-max scaling is another popular scaling method, where we subtract the minimum value from every data point in the set and then divide by the range of the set. It is given by -

(D - Dmin) / (Dmax - Dmin)

There are other ways to scale the dataset as well. 
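
As a quick sketch, both of the scaling methods described above can be written in a couple of lines of NumPy, using the dummy values from the example:

import numpy as np

data = np.array([3.0, 15.0, 200.0, 82.0, 500.0])

# Absolute maximum scaling: divide by the largest absolute value
abs_max_scaled = data / np.max(np.abs(data))

# Min-max scaling: (D - Dmin) / (Dmax - Dmin)
min_max_scaled = (data - data.min()) / (data.max() - data.min())

print(abs_max_scaled)
print(min_max_scaled)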

Unscaled features lead to skewed, elongated contours of the cost function, which makes gradient descent oscillate and converge more slowly.

 


Learning Rate
 


Learning rate is the rate at which gradient descent approaches and converges to the global minimum. The learning rate needs to be optimal: a small learning rate leads to slow convergence, which means slower execution and higher computational cost.

For a very high value of the learning rate, the descent might never reach the global minimum and instead keep overshooting it.

So an optimum value of α is necessary for the gradient descent to work efficiently.
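
As a toy illustration (separate from the regression code below), consider gradient descent on the one-variable cost J(θ) = θ², whose minimum is at θ = 0. The learning rates here are only example values:

# Gradient descent on J(θ) = θ², whose derivative is dJ/dθ = 2θ
def descend(alpha, steps=20, theta=5.0):
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

print(descend(0.001))  # too small: after 20 steps θ is still far from 0
print(descend(0.1))    # reasonable: θ gets close to 0
print(descend(1.1))    # too large: every update overshoots and θ moves further away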

 


Implementation

We’ll now have a look at the Python implementation of multivariate linear regression.

The dataset used is the Boston housing prices dataset, imported from the sklearn library. The house price is our target variable. (Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this code requires an older scikit-learn version.)

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from pandas import DataFrame
import sklearn as skt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

 

Creating the data frame and splitting the dataset

class Dataset:
    
    def __str__(self):
        return "Dataset: The 'scikit-learn' Boston House-Prices" 
    
    def __init__(self):
        print(self)        
        
    def load(self):
        boston = load_boston()
        self.data = boston
        self.df = DataFrame(np.concatenate((boston.data, boston.target.reshape(-1, 1)), axis=1), 
                            columns=np.concatenate((boston.feature_names, ["MEDV"])))

        
        X = boston.data
        
        # Choose only a subset of features
        features_limit = ["LSTAT", "RM", "PTRATIO"]
        if len(features_limit)>0:
            features_index = np.where(boston.feature_names == np.array(features_limit).reshape(-1, 1))[1]
            X = X[:, features_index]
        
        X = self.preprocess(X)        
        
        y = boston.target.reshape(-1, 1) # Convert to Column vector        
        
        # Insert bias feature (x0 = 1) into features for all examples
        X = np.concatenate((np.ones((X.shape[0],1)), X), axis=1)
                
        self.X = X
        self.y = y
        
        print("Information:")
        print("\t X ~", np.shape(X))
        print("\t y ~", np.shape(y))
        
        # Sanity check
        assert(len(X)==len(y))
        m = len(X)
        self.m = m
        print("\t m:", m)            
        
        # Number of features
        n = np.shape(X)[1]
        self.n = n
        print("\t n:", n)
        
        # train, dev, test sets split
        X_train_dev, X_test, y_train_dev, y_test = train_test_split(X, y, test_size=.1, random_state=0)
        X_train, X_dev, y_train, y_dev = train_test_split(X_train_dev, y_train_dev, test_size=.1, random_state=0)
                
        # Number of examples
        m_train = len(X_train)
        m_dev = len(X_dev)
        m_test = len(X_test)

        print("\t m_train:", m_train)
        print("\t m_dev:", m_dev)
        print("\t m_test:", m_test)
        
        assert(m_train+m_dev+m_test == m)
            
        return X_train, X_dev, X_test, y_train, y_dev, y_test, m_train, m_dev, m_test, m, n

 
    def preprocess(self, X):
        X_ = (X-np.mean(X, axis=0)) / np.std(X, axis=0) # Z normalization
        return X_

 

dataset = Dataset()
X_train, X_dev, X_test, y_train, y_dev, y_test, m_train, m_dev, m_test, m, n = dataset.load()

 

 

Custom Multivariate Linear Regression Model

class Model:    
    def __str__(self):
        return "Multivariate Linear Regression"
    #--------------------------------------------------------
    # Model parameters and hyperparameters initialization
    def __init__(self):
        print(self)
        self.theta = np.zeros((n, 1)) # Model parameters
        print("theta ~", np.shape(self.theta) )
        self.alpha = 0.06 # Learning rate #0.15 #0.01 #0.001 #0.0001
        self.epochs = 80
        # Cost function values during the training
        self.costs_train = []
        self.costs_dev = []                
    #--------------------------------------------------------    
    # Fitting the model to the training set w.r.t. the development set
    def fit(self, X_train, X_dev, y_train, y_dev):
        for epoch in np.arange(1, self.epochs+1):
            
            # Predictions
            preds_train = self.predict(X_train)
            preds_dev = self.predict(X_dev)
            
            # Costs
            J_train = self.cost(preds_train, y_train)
            J_dev = self.cost(preds_dev, y_dev)            
            
            # Accumulating costs for plotting the learning curves
            self.costs_train.append(J_train)
            self.costs_dev.append(J_dev)
            
            # Derivatives
            dtheta = (1/len(X_train)) * np.matmul(X_train.T, (preds_train - y_train))
            
            # Update parameters
            self.update(dtheta)                        
            
        # Plot the learning curve
        self.plot_learning_curve()
    #--------------------------------------------------------
    # Update the parameters of the model
    def update(self, dtheta):
        self.theta -= self.alpha * dtheta
    #--------------------------------------------------------
    # Model prediction for some examples
    def predict(self, X):
        # Hypothesis
        preds = np.matmul(X, self.theta)
        return preds
    #--------------------------------------------------------
    # Evaluation of model on the test set
    def evaluate(self, X_test, y_test):
        m_test = X_test.shape[0]
        print("test set>")
        preds_test = self.predict(X_test)
        J_test = self.cost(preds_test, y_test)        
        
        se_test = np.sum( np.power( (preds_test-y_test), 2 ) )
        mse_test = (1/m_test)*(se_test)
        rmse_test = np.sqrt( mse_test )
        r2_score = 1-( se_test / np.sum( np.power(y_test-np.mean(y_test),2) ) )
        
        #print("\t", "Cost: ", J_test)
        print("\t", "MSE: ", mse_test)
        print("\t", "RMSE: ", rmse_test)
        print("\t", "r2 score: ", r2_score)
    #--------------------------------------------------------    
    # Use for getting the parameters of the model especially after the training
    def get_parameters(self):
        return self.theta
    #--------------------------------------------------------
    # Plot the learning curve to decision making purposes
    def plot_learning_curve(self):
        def config_figure():
            plt.xlabel("Epoch")
            plt.ylabel("Cost")        
            plt.title("Learning curve [train set]")           
            plt.legend()
        
        data = [[self.costs_train], [self.costs_dev], [self.costs_train, self.costs_dev]]
        labels = [["Training cost"], ["Development (validation) cost"], ["Training cost", "Development (validation) cost"]]
        styles = [["r-"], ["b-"], ["r-", "b-"]]
        
        for i, [dt, label, style] in enumerate(zip(data, labels, styles), 0):
            plt.figure()    
            for j, d in enumerate(dt, 0):
                plt.plot(np.arange(self.epochs), d, style[j], label=label[j])      
            config_figure()
            plt.show() 
    #--------------------------------------------------------    
    def cost(self, preds, y):
        J = (1/2) * np.mean(np.power(preds - y, 2))
        return J        

model=Model()

 

 

model.fit(X_train, X_dev, y_train, y_dev)

 

 

Scikit-learn Multivariate Linear Regression

from sklearn.metrics import mean_squared_error, r2_score
clf = LinearRegression()
clf.fit(X_train, y_train)
preds_test = clf.predict(X_test)

 

Comparing the results of both the models

print("scikit-learn")
print("test set>")
print("\t", "MSE: ", mean_squared_error(y_test, preds_test))
print("\t", "RMSE: ", np.sqrt(mean_squared_error(y_test, preds_test)))
print("\t", "r2 score: ", r2_score(y_test, preds_test))

 

model.evaluate(X_test, y_test)

 

As we can see, our custom model gives results very similar to those of the scikit-learn model.

Frequently Asked Questions

  1. What is the difference between simple linear regression and multivariate linear regression?
    A simple linear regression model makes its predictions using a single feature. Multivariate linear regression is a more generalised form of linear regression with n features. It could be said that simple linear regression is a special case of multivariate linear regression.
     
  2. What is the learning rate of gradient descent?
    Learning rate is the rate at which gradient descent converges to the global minimum. It needs to be an optimum value: a small learning rate leads to slow convergence and a higher computational cost, while a large value can make the descent overshoot and miss the minimum entirely.
     
  3. What are the various regression evaluation metrics? 
    R² score, mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).

Key Takeaways

In this blog, we covered multivariate linear regression in detail, from its mathematical formulation to its Python implementation. We also learned about the nitty-gritty of gradient descent and how it is used to minimise the cost function. We highly recommend you implement the code yourself to better understand how to approach a regression problem.

Happy Learning
