## Introduction

As datasets grow or models become more complex, machine learning algorithms tend to slow down, and generating predictions can take many hours or days. At that point, we usually have to rely on extra computational resources (like GPUs or more CPU cores) to reduce the waiting time.

*Vectorization is the process of reducing the algorithm’s running time without using external computational resources.*

The idea of vectorization is to remove explicit 'for loops' from the code by replacing them with a vectorized implementation that yields the same output.


### How does vectorization work internally?

Every modern CPU has Single Instruction, Multiple Data (SIMD) instruction sets, which means we can apply the same instruction to multiple data points simultaneously, thereby exploiting the data-level parallelism of the CPU. So, if there are N elements in an array, we can execute the same instruction on several elements at once instead of looping N times.
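In practice, libraries like NumPy expose this: an elementwise array expression dispatches to a compiled loop that can use SIMD instructions, replacing an explicit Python loop. A minimal sketch of the two equivalent forms:

```
import numpy as np

a = np.arange(8, dtype=np.float64)  # [0, 1, ..., 7]
b = np.full(8, 2.0)                 # [2, 2, ..., 2]

# One array expression: the compiled loop underneath can use SIMD
c = a * b

# Equivalent explicit loop, one element at a time
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] * b[i]

print(np.array_equal(c, c_loop))  # True
```

Both produce the same array; the difference is purely in how the work is executed.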


### Mathematical Intuition

Let us assume that our CPU can process four data points simultaneously. Then the time taken to process N points would be proportional to N/4 instead of N.

**Non-vectorized**

```
for i in range(0, N):
    prod[i] = array_one[i] * array_two[i]
```

**Vectorized**

```
for i in range(0, N, 4):
    # Conceptually, these four multiplications execute in a single SIMD instruction
    prod[i] = array_one[i] * array_two[i]
    prod[i+1] = array_one[i+1] * array_two[i+1]
    prod[i+2] = array_one[i+2] * array_two[i+2]
    prod[i+3] = array_one[i+3] * array_two[i+3]
```
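In NumPy, this entire unrolled loop collapses into a single elementwise expression; for instance:

```
import numpy as np

array_one = np.array([1.0, 2.0, 3.0, 4.0])
array_two = np.array([5.0, 6.0, 7.0, 8.0])

# One vectorized multiply replaces the whole loop
prod = array_one * array_two
print(prod)  # [ 5. 12. 21. 32.]
```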

## Vectorization in Machine Learning

In __Machine Learning__ or Deep Learning, we often train our models on large datasets, so it is crucial that our code is time-optimized; otherwise, the model may take several hours or days to yield results.

Let us take the example of the Logistic Regression ML algorithm.

Its linear step is the equation

Z = WᵀX + b

The first step in calculating the value of Z is to multiply the two vectors W and X. We can multiply the two vectors in two ways, vectorized and non-vectorized. Let's see the Python implementation of both.

### Importing Libraries

```
import numpy as np # Using numpy for doing operations on the arrays
import time # Using time to determine the time taken by different operations
```

### Initializing both vectors with random values

```
W = np.random.rand(5000000) # one-dimensional array with five million randomly generated numbers
X = np.random.rand(5000000) # one-dimensional array with five million randomly generated numbers
```

### Non-Vectorized Implementation

```
# Non-vectorized demonstration
before_time = time.time()
Z = np.zeros((1,1))
for i in range(0, len(W)):
    Z += W[i]*X[i]
after_time = time.time()
print("Time taken by non-vectorized multiplication is: ", (after_time - before_time)*1000, " ms")
print(round(Z[0][0], 5))
```

### Vectorized Implementation

```
# Vectorized demonstration
before_time = time.time()
Z = np.dot(W.transpose(), X)
after_time = time.time()
print("Time taken by the dot product is: ", (after_time - before_time)*1000, " ms")
print(round(Z, 5))
```

### Result

We can see from the above implementations that the non-vectorized implementation took around 14,000 milliseconds (14 seconds) to give the output. In contrast, the vectorized implementation took only about seven milliseconds, which is roughly 2,000 times faster.

As the size of the vectors or matrices increases, the time difference between the two approaches grows as well. Hence, it is good practice to use vectorization wherever possible and eliminate explicit for loops.
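As a closing sketch, the same idea extends from the single dot product above to logistic regression predictions for an entire batch of samples at once (the feature matrix, weights, and bias below are made-up illustrative values, not a trained model):

```
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 20))  # hypothetical batch: 1000 samples, 20 features
W = rng.random(20)          # hypothetical weight vector
b = 0.5                     # hypothetical bias term

# Vectorized linear step for every sample at once: Z = XW + b
Z = X @ W + b

# Vectorized sigmoid over the whole batch, no loop over samples
predictions = 1.0 / (1.0 + np.exp(-Z))

print(predictions.shape)  # (1000,)
```

One matrix-vector product and one elementwise sigmoid replace a loop over a thousand samples.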
