Table of contents
Collaborative Filtering
Content-Based Filtering
Importing Necessary Libraries 
Loading Data
Frequently Asked Questions  
Key Takeaways 
Last Updated: Mar 27, 2024

Recommendation Engine


In this article, we will implement a sample recommendation engine in Python.

A recommendation engine is a system that filters items and suggests the ones a user is likely to prefer. Such engines are used in varied domains such as movies, books, music, and more.

There are two ways to build a recommendation engine:

(i) Collaborative Filtering 

(ii) Content-based Filtering


Collaborative Filtering

→ The model is built from the user's past behavior, such as the items they have rated.

→ That behavior is compared with the preferences of other, similar users.

→ From this comparison, the model predicts what the user is likely to be interested in.
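
As a minimal sketch of this idea, assuming a tiny made-up rating table (the user names, movie names, and ratings below are hypothetical), we can measure how similar two users are by correlating their ratings:

```python
import pandas as pd

# Hypothetical ratings: rows are users, columns are movies (1-5 scale)
toy_ratings = pd.DataFrame(
    {
        'Movie A': [5, 4, 1],
        'Movie B': [4, 5, 2],
        'Movie C': [1, 2, 5],
    },
    index=['user_1', 'user_2', 'user_3'],
)

# DataFrame.corr() correlates columns, so transpose to put users in columns
user_similarity = toy_ratings.T.corr()

# The user whose ratings correlate most strongly with user_1 (excluding user_1)
most_similar = user_similarity['user_1'].drop('user_1').idxmax()
print(most_similar)
```

Items that the most similar user rated highly would then be candidates to recommend to user_1.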


Content-Based Filtering

→ A set of discrete characteristics of an item is used to recommend additional items with similar properties.

→ The descriptions of items and a profile of the user's preferences are used together.

→ From these, the model recommends items similar to the ones the user preferred in the past.
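
A minimal sketch of content-based matching, assuming each movie is described by a hypothetical genre vector (the movies and features below are made up for illustration), could compare items with cosine similarity:

```python
import numpy as np

# Hypothetical genre features per movie: [action, sci-fi, romance]
features = {
    'Movie A': np.array([1.0, 1.0, 0.0]),
    'Movie B': np.array([1.0, 1.0, 0.0]),
    'Movie C': np.array([0.0, 0.0, 1.0]),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Score every other movie against the one the user liked
liked = 'Movie A'
scores = {
    title: cosine_similarity(features[liked], vec)
    for title, vec in features.items()
    if title != liked
}
best_match = max(scores, key=scores.get)
print(best_match)
```

Here the movie sharing the same genres as the liked movie scores highest, while the movie with no genres in common scores zero.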



Now, let us implement a movie recommendation engine using a few Python libraries.

We will use the following two files to build the recommendation engine.

(i) .tsv file

(ii) Movie_Id_Titles.csv



Importing Necessary Libraries 

First, we will import some basic libraries:

(i) NumPy - for linear algebra.

(ii) Pandas - for data analysis.

(iii) Seaborn - for data visualization.

(iv) Matplotlib - for data visualization.

import numpy as np 

import pandas as pd 

import seaborn as sns

from matplotlib import pyplot as plt


Loading Data

First, we will define the column names for the ratings dataset, namely User_ID, item_id, Rating, and Timestamp.

User_ID: Unique ID of the user.

item_id: Unique ID of the movie.

Rating: Rating given to the movie by a particular user.

Timestamp: Timestamp of the movie rating.

# Creating columns for the dataset

columns = ['User_ID', 'item_id', 'Rating', 'Timestamp']


df = pd.read_csv('file (1).tsv', sep='\t', names=columns)


# Printing the first few rows of the dataset

df.head()







movies = pd.read_csv('Movie_Id_Titles.csv')






We will merge the two datasets to build the movie recommendation engine. The final dataset will have the following features:

→ User_ID - unique ID of the user. 

→ item_id - unique ID of the item. 

→ Rating - rating of the movie assigned by a unique user. 

→ Timestamp - timestamp of the movie rating. 

→ Title - the title of the movie. 

#merging both the datasets 

data = pd.merge(df, movies, on='item_id')
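
To see what the load-and-merge step produces without the real files, here is a minimal sketch that uses small in-memory stand-ins. The sample rows are hypothetical, and the `item_id` and `title` column names in the titles file are assumed from the code above:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two files (tab-separated ratings, CSV titles)
ratings_tsv = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n1\t50\t4\t875693118\n"
titles_csv = 'item_id,title\n50,Star Wars (1977)\n172,"Empire Strikes Back, The (1980)"\n'

columns = ['User_ID', 'item_id', 'Rating', 'Timestamp']

# The ratings file has no header row, so the column names are supplied
df = pd.read_csv(io.StringIO(ratings_tsv), sep='\t', names=columns)

# The titles file is assumed to carry its own header row
movies = pd.read_csv(io.StringIO(titles_csv))

# Inner join on the shared key: every rating row gains its movie title
data = pd.merge(df, movies, on='item_id')
print(data)
```

The merge only works when both frames use the same key name, which is why the ratings column must be called `item_id` exactly as in the titles file.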








First, let us calculate the average rating of each movie.

# Calculating mean rating of all movies in ascending order 
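
A minimal sketch of this step, assuming the merged frame has `title` and `Rating` columns as above. The sample rows are hypothetical, and the sort direction follows the comment:

```python
import pandas as pd

# Hypothetical merged data: one row per (user, movie) rating
data = pd.DataFrame({
    'title': ['Movie A', 'Movie A', 'Movie B', 'Movie C', 'Movie C'],
    'Rating': [5, 3, 2, 4, 5],
})

# Average rating per title, lowest first (sort_values is ascending by default)
mean_ratings = data.groupby('title')['Rating'].mean().sort_values()
print(mean_ratings.head())
```

Keep in mind that a movie with very few ratings can end up with an extreme average, which is why the number of ratings is computed next.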






Secondly, let us calculate the number of ratings each movie has received.

# Counting the ratings of each movie, most-rated first

Total_Ratings = data.groupby('title')['Rating'].count().sort_values(ascending=False).head()





Now, let us create a data frame that lists each movie with its average rating and total number of ratings.

# Create a dataframe of each movie with its average rating and number of ratings

ratings = pd.DataFrame(data.groupby('title')['Rating'].mean())

ratings['Total Ratings'] = data.groupby('title')['Rating'].count()
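
The same table can also be built in a single pass with `agg`. This is a minimal sketch on hypothetical sample rows, not a change to the approach above:

```python
import pandas as pd

# Hypothetical merged data: one row per (user, movie) rating
data = pd.DataFrame({
    'title': ['Movie A', 'Movie A', 'Movie B', 'Movie C', 'Movie C'],
    'Rating': [5, 3, 2, 4, 5],
})

# Compute mean and count together, then rename to the column names used above
ratings = (
    data.groupby('title')['Rating']
    .agg(['mean', 'count'])
    .rename(columns={'mean': 'Rating', 'count': 'Total Ratings'})
)
print(ratings)
```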






Finally, let us plot a histogram of the average ratings to see how they are distributed across movies.

# Plot a histogram of the average rating per movie

plt.figure(figsize=(10, 5))

plt.xlabel('Average Rating')

plt.ylabel('Number of Movies')

ratings['Rating'].hist(bins=70)





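Now, let us build a user-by-movie rating matrix and sort the movies by their number of ratings.

# User-by-movie matrix: one row per user, one column per title (NaN where a user has not rated a movie)

movie_sort = data.pivot_table(index='User_ID', columns='title', values='Rating')




# Sorting movies according to the number of ratings

ratings.sort_values('Total Ratings', ascending=False).head(10)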
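
To make the shape of the pivot table concrete, here is a minimal sketch on hypothetical sample rows. Each user becomes a row, each title a column, and unrated movies show up as NaN:

```python
import pandas as pd

# Hypothetical merged data: user 2 has not rated Movie B, user 3 has not rated Movie A
data = pd.DataFrame({
    'User_ID': [1, 1, 2, 3],
    'title': ['Movie A', 'Movie B', 'Movie A', 'Movie B'],
    'Rating': [5, 3, 4, 2],
})

movie_sort = data.pivot_table(index='User_ID', columns='title', values='Rating')
print(movie_sort)
```

These NaN gaps matter later: a correlation between two movies is computed only over the users who rated both.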




So, now we get the total number of ratings for each movie in descending order. We observe that 'Star Wars' has received the highest number of ratings.

So, let us compute the correlation of the 'Star Wars' ratings with those of other movies and find similar movies for the user. This is the goal of our recommendation engine: to suggest movies similar to 'Star Wars'.

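# Ratings given to Star Wars by each user (the title string 'Star Wars (1977)' is assumed to match the dataset)

starwars_user_ratings = movie_sort['Star Wars (1977)']

# Correlation of Star Wars with every other movie; higher correlation indicates higher similarity

similar_to_starwars = movie_sort.corrwith(starwars_user_ratings)

corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])

# Drop movies whose correlation could not be computed

corr_starwars.dropna(inplace=True)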
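
The behaviour of `corrwith` is easier to follow on a small example. This is a minimal sketch with a hypothetical user-by-movie matrix, where one column is correlated against all the others:

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-movie matrix; NaN marks a movie the user has not rated
movie_matrix = pd.DataFrame(
    {
        'Movie A': [5, 4, 1, 2],
        'Movie B': [4, 5, 2, 1],
        'Movie C': [1, 2, 5, np.nan],
    },
    index=pd.Index([1, 2, 3, 4], name='User_ID'),
)

# Correlate every column with the ratings of the target movie
target_ratings = movie_matrix['Movie A']
similar = movie_matrix.corrwith(target_ratings)

corr = similar.to_frame(name='Correlation').dropna()
print(corr.sort_values('Correlation', ascending=False))
```

Movie B tends to be rated like Movie A and gets a high positive correlation, while Movie C is rated in the opposite way and gets a negative one. The target movie always correlates perfectly with itself.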








Now, let us find movies similar to Star Wars, with the condition that each movie has more than 200 ratings. We do this because a correlation computed from only a handful of ratings is unreliable.

# Movies similar to Star Wars

corr_starwars.sort_values('Correlation', ascending=False).head(10)

corr_starwars = corr_starwars.join(ratings['Total Ratings'])




corr_starwars[corr_starwars['Total Ratings']>200].sort_values('Correlation', ascending = False).head()
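
The steps above can be wrapped into a single helper. This is a minimal sketch: the function name `similar_movies`, its parameters, and the sample data are hypothetical, and the threshold is lowered to suit the toy data:

```python
import pandas as pd


def similar_movies(data, title, min_ratings):
    """Rank movies by rating correlation with `title`, keeping well-rated ones."""
    matrix = data.pivot_table(index='User_ID', columns='title', values='Rating')
    counts = data.groupby('title')['Rating'].count().rename('Total Ratings')

    # Correlate every movie with the target and drop undefined correlations
    corr = matrix.corrwith(matrix[title]).to_frame(name='Correlation').dropna()
    corr = corr.join(counts)

    # Keep movies with enough ratings, and leave out the target movie itself
    popular = corr[corr['Total Ratings'] > min_ratings]
    popular = popular.drop(index=title, errors='ignore')
    return popular.sort_values('Correlation', ascending=False)


# Hypothetical ratings: movie D shares no users with movie A
data = pd.DataFrame({
    'User_ID': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    'title': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'D'],
    'Rating': [5, 4, 1, 4, 5, 2, 1, 2, 5, 3, 3],
})

result = similar_movies(data, 'A', min_ratings=2)
print(result)
```

Movie D is dropped because no user rated both it and the target, so its correlation is undefined.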




Finally, we find the movies (>200 ratings) that are most similar to 'Star Wars'. 'Star Wars' itself tops the list, since every movie correlates perfectly with itself; the movies that follow are:

→ Empire Strikes Back, The (1980)

→ Return of the Jedi (1983)

→ Raiders of the Lost Ark (1981)

→ Sting, The (1973)

Hence, we can recommend movies to a user based on a movie they already like.

Frequently Asked Questions  

Q1. What is a Recommendation Engine?

A recommendation engine is a system that filters items and suggests the ones a user is likely to prefer. Such engines are used in varied domains such as movies, books, music, and more.


Q2. What are the two ways of building a Recommendation Engine?

The two ways of building a recommendation engine are:

(i) Collaborative Filtering 

(ii) Content-Based Filtering 


Q3. What is the importance of Data Visualization?

Data visualization helps to gain insights into data and, in turn, figure out patterns and identify errors. 

Key Takeaways 

Congratulations on making it this far. This blog gave a fundamental overview of building a recommendation engine.

We learned about data loading, data visualization, and finally, generating recommendations using correlation.

