Table of contents
1. Introduction
2. What is BERT?
3. Pre-requisite
4. Requirements
5. Building a Sentence Similarity Finder in Keras
   5.1. Import Libraries
   5.2. Distribution Strategy
   5.3. Model Creation and Scope
   5.4. Defining Input Layers
   5.5. Loading BERT Model
   5.6. Model Architecture
   5.7. Adding Layers
   5.8. Compiling the Model
   5.9. Tokenization and Padding
   5.10. Model Summary
6. Model Demonstration
7. Frequently Asked Questions
   7.1. What is Keras?
   7.2. What is a sentence similarity finder?
   7.3. What is BERT?
   7.4. Can I deploy the BERT-based models for real-time applications?
8. Conclusion
Last Updated: Mar 27, 2024

Building a Sentence Similarity Finder in Keras

Author: Sohail Ali

Introduction

Sentence similarity measures how similar two texts or sentences are, i.e., how closely they express the same meaning. This technique is useful for retrieving and clustering data. BERT is an advanced pre-trained NLP model developed by Google. It is a powerful model that is widely used for text classification, entity recognition, text similarity, and many other tasks.


In this blog, we will be building a sentence similarity finder in Keras using BERT. So without any further wait, let’s start learning!

What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers. It is a pre-trained NLP model based on the transformer architecture, designed to understand the context of words in a sentence by analysing their relationships in both directions.

A few features of the BERT model are as follows:

  • It is available as an open-source model, and it can capture rich language patterns and features.
     
  • It is based on a self-attention mechanism, which is used to model word dependencies.
     
  • It is trained on large unlabeled data, and it can be fine-tuned on specific NLP tasks.
     
  • It supports multiple languages, making it useful for multilingual NLP tasks.
     

Pre-requisite

Below are some pre-requisites to fully understand the sentence similarity finder model in Keras:

  • Familiarity with Python and Keras: You should be familiar with the Python programming language and the Keras library, which runs on top of TensorFlow.
     
  • Deep Learning Basics: You should know deep learning concepts like neural networks, layers, optimizers, etc.
     
  • Transformers Library: You should be familiar with the Hugging Face Transformers library, from which the pre-trained BERT model is loaded.
     
  • Input Sequences: Knowledge of tokenization and padding is necessary, as they are fundamental preprocessing steps for transformer-based models.

Requirements

To build the model in Python, you will need the following libraries installed:

  • NumPy: It is used for numerical computations in Python. The command to install NumPy is:
     
pip install numpy

 

  • TensorFlow: It is an open-source deep learning framework on top of which Keras runs. The command to install TensorFlow is:
     
pip install tensorflow

 

  • Transformers Library: It is the Hugging Face library of pre-trained transformer models. The command to install it is:
     
pip install transformers
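
If you prefer, all three packages can also be installed with a single command:

pip install numpy tensorflow transformers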

Building a Sentence Similarity Finder in Keras

Now, let’s start building our model to predict similarity scores for two different texts/statements. 

Import Libraries

First, let’s start by importing the necessary libraries which will be required for building a sentence similarity finder in Keras.


import tensorflow as tf
import tensorflow.keras as keras
from transformers import TFBertModel, BertTokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences


Here, we imported the TensorFlow and Keras libraries in the first two lines. Then we imported TFBertModel and BertTokenizer to work with the pre-trained BERT model; the tokenizer converts raw string input into the integer inputs that the Keras layers expect. Finally, pad_sequences is imported to pad sequences to a fixed length.
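
To see what pad_sequences does, here is a minimal sketch with made-up token ids (this snippet is only an illustration and is not part of the model):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Two tokenized sentences of different lengths (made-up integer ids).
encoded = [[101, 7592, 2088, 102], [101, 7592, 102]]

# Pad both to a fixed length of 6; zeros are appended at the end with padding='post'.
padded = pad_sequences(encoded, maxlen=6, padding='post')
print(padded)
# [[ 101 7592 2088  102    0    0]
#  [ 101 7592  102    0    0    0]]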

Distribution Strategy

TensorFlow provides built-in support for distribution strategies. A distribution strategy lets you carry out distributed training across several processing units with minimal changes to the code.

Let us create a distribution strategy so that the existing code can be trained in a distributed way with minimal changes. We will use MirroredStrategy, which distributes training across multiple GPUs on a single machine.

strategy = tf.distribute.MirroredStrategy()
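
To confirm how many devices the strategy picked up, you can print the number of replicas; on a machine with a single GPU (or only a CPU) this will simply print 1.

print("Number of devices:", strategy.num_replicas_in_sync)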

Model Creation and Scope

Let us open the scope of the distribution strategy, inside which we will perform all the model-building operations.

with strategy.scope():
	… 
	…

Defining Input Layers

Now, let’s define three Keras ‘Input’ layers to hold the encoded token ids, attention masks, and token-type ids. Each has shape (MAX_LENGTH,), where MAX_LENGTH is the fixed sequence length (set to 100 in the full code later).


input_ids = keras.layers.Input(shape=(MAX_LENGTH,), dtype=tf.int32, name="input_ids")
attention_masks = keras.layers.Input(shape=(MAX_LENGTH,), dtype=tf.int32, name="attention_masks")
token_type_ids = keras.layers.Input(shape=(MAX_LENGTH,), dtype=tf.int32, name="token_type_ids")

Loading BERT Model

Now, load the pre-trained BERT model based on the 'bert-base-uncased' variant. 


bert_pretrained_model = TFBertModel.from_pretrained("bert-base-uncased")
bert_pretrained_model.trainable = False


Here, the trainable attribute is set to False so that the pre-trained BERT weights are frozen and not modified during training.
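
If you later want to fine-tune BERT itself rather than keep it frozen, you could flip this flag back before compiling the model. This is only a sketch of that option; fine-tuning usually calls for a much smaller learning rate (for example, 2e-5).

# Optional sketch: unfreeze the pre-trained weights so they are updated during training.
bert_pretrained_model.trainable = True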

Model Architecture

Now, let us pass the three previously defined input layers to the BERT model and store its output. We also extract the last hidden state, which will feed the layers added in the next step.


bert_output = bert_pretrained_model(input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids)
sequence_output = bert_output.last_hidden_state

Adding Layers

Now, let’s add some trainable layers on top of the frozen BERT output so that the model can adapt to new data.


bi_lstm = keras.layers.Bidirectional(keras.layers.LSTM(64, return_sequences=True))(sequence_output)
avg_pool = keras.layers.GlobalAveragePooling1D()(bi_lstm)
max_pool = keras.layers.GlobalMaxPooling1D()(bi_lstm)
concat = keras.layers.concatenate([avg_pool, max_pool])
dropout = keras.layers.Dropout(0.3)(concat)
output = keras.layers.Dense(3, activation="softmax")(dropout)


In the above code, a bidirectional LSTM processes the BERT sequence output, and its output is summarised by global average pooling and global max pooling. The two pooled vectors are concatenated and passed through a dropout layer. Finally, a dense layer with a three-way softmax produces the final output.

Compiling the Model

Once we define the final model with its inputs and outputs, we are ready to compile it.


bert_model = keras.models.Model(inputs=[input_ids, attention_masks, token_type_ids], outputs=output)

bert_model.compile(
	optimizer=keras.optimizers.Adam(),
	loss="categorical_crossentropy",
	metrics=["acc"],
)


Here, we have compiled the BERT model using the Adam optimizer, categorical cross-entropy loss function and accuracy metric.
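
The article does not include a training step, so here is a rough sketch of what one could look like. It assumes you have tokenized pairs of sentences in a variable train_encodings (built the same way as in the next section, but from lists of sentences) and one-hot labels train_labels of shape (num_examples, 3); both names are hypothetical placeholders, not defined elsewhere in this blog.

# Hypothetical training sketch: train_encodings and train_labels are placeholders.
history = bert_model.fit(
	[train_encodings["input_ids"],
	 train_encodings["attention_mask"],
	 train_encodings["token_type_ids"]],
	train_labels,
	epochs=3,
	batch_size=16,
)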

Tokenization and Padding

We will use the BERT tokenizer to encode and pad the given sentences. It produces the encoded token ids, attention masks, and token-type ids.


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sequences = tokenizer(sentence1, sentence2, padding='max_length', truncation=True, max_length=MAX_LENGTH, return_tensors='tf')
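
To see what the tokenizer returned, you can inspect the encoded batch. This is just a quick check; with return_tensors='tf', each entry is a tensor of shape (1, MAX_LENGTH) for a single sentence pair.

print(sequences.keys())                   # input_ids, token_type_ids, attention_mask
print(sequences["input_ids"].shape)       # (1, MAX_LENGTH)
print(sequences["attention_mask"].shape)  # (1, MAX_LENGTH)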


The final code of our model is:


import tensorflow as tf
import tensorflow.keras as keras
from transformers import TFBertModel, BertTokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Create the model under a distribution strategy scope.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
	# Set the maximum sequence length for padding
	MAX_LENGTH = 100  

	# Encoded token ids.
	input_ids = keras.layers.Input(shape=(MAX_LENGTH,), dtype=tf.int32, name="input_ids")
	# Attention masks.
	attention_masks = keras.layers.Input(shape=(MAX_LENGTH,), dtype=tf.int32, name="attention_masks")
	# Token-type ids.
	token_type_ids = keras.layers.Input(shape=(MAX_LENGTH,), dtype=tf.int32, name="token_type_ids")

	# Load the pre-trained BERT model.
	bert_pretrained_model = TFBertModel.from_pretrained("bert-base-uncased")
	# Freeze the BERT model so its weights are not updated during training.
	bert_pretrained_model.trainable = False

	bert_output = bert_pretrained_model(input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids)
	sequence_output = bert_output.last_hidden_state

	# Add additional layers
	bi_lstm = keras.layers.Bidirectional(keras.layers.LSTM(64, return_sequences=True))(sequence_output)
	avg_pool = keras.layers.GlobalAveragePooling1D()(bi_lstm)
	max_pool = keras.layers.GlobalMaxPooling1D()(bi_lstm)
	concat = keras.layers.concatenate([avg_pool, max_pool])
	dropout = keras.layers.Dropout(0.3)(concat)
	output = keras.layers.Dense(3, activation="softmax")(dropout)
	bert_model = keras.models.Model(inputs=[input_ids, attention_masks, token_type_ids], outputs=output)

	bert_model.compile(
		optimizer=keras.optimizers.Adam(),
		loss="categorical_crossentropy",
		metrics=["acc"],
	)

# Example sentences
sentence1 = ""
sentence2 = ""


# Tokenize sentences and pad sequences
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sequences = tokenizer(sentence1, sentence2, padding='max_length', truncation=True, max_length=MAX_LENGTH, return_tensors='tf')

Model Summary

Now, let’s look at the final model using the summary() method available in Keras. It prints the model architecture, number of trainable parameters, and other related information about the model.

Add the line below to your code to get the summary of the model.

print(bert_model.summary())

 

After executing the code, the model summary is printed, showing each layer, its output shape, and the counts of trainable and non-trainable parameters.

Great! We are finally done building a sentence similarity finder in Keras.

Model Demonstration

Let’s now check how our model works using example statements. We will declare two variables, sentence1 and sentence2, which will hold our example sentences.

Add the lines below to your code.


# Example sentences
sentence1 = "Hello Everyone! Welcome to Coding Ninjas."
sentence2 = "Join Coding Ninjas to become a pro coder!"

# Tokenize sentences and pad sequences
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sequences = tokenizer(sentence1, sentence2, padding='max_length', truncation=True, max_length=MAX_LENGTH, return_tensors='tf')

# Predict similarity
prediction = bert_model.predict([sequences['input_ids'], sequences['attention_mask'], sequences['token_type_ids']])
similarity_score = prediction[0, 1]  # Probability of class index 1, used here as the similarity score
print("The similarity score of two sentences is:", similarity_score)
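
Since the dense layer produces a three-way softmax, prediction[0, 1] is simply the probability of class index 1. If you instead want the most likely class for the pair, you could take the argmax; this is a small optional sketch.

import numpy as np

# Index of the highest-probability class for the sentence pair.
predicted_class = np.argmax(prediction, axis=-1)
print("Predicted class:", predicted_class)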


After executing the code, the similarity score of the two statements is printed to the console.

Frequently Asked Questions

What is Keras?

Keras is an open-source deep-learning library in Python. It provides a high-level neural network API for building deep learning models.

What is a sentence similarity finder?

A sentence similarity finder is a model which is used to predict the similarity between two given sentences based on their semantic meaning.

What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers. It is a pre-trained NLP model used for various language-processing tasks.

Can I deploy the BERT-based models for real-time applications?

Yes, depending on your use case, you can deploy BERT-based models for real-time applications.

Conclusion

This article discussed building a sentence similarity finder in Keras. We covered the BERT model and how to use it to check the similarity between two sentences. We hope this blog has helped you enhance your knowledge of building a sentence similarity finder in Keras. If you want to learn more, check out our other articles.

Refer to our Guided Path to upskill yourself in DSA, Competitive Programming, JavaScript, System Design, and many more! If you want to test your coding ability, you may check out the mock test series and participate in the contests hosted on Coding Ninjas!

But suppose you have just started your learning process and are looking for questions from tech giants like Amazon, Microsoft, Uber, etc. In that case, you must look at the problems, interview experiences, and interview bundles for placement preparations.

However, you may consider our paid courses to give your career an edge over others!

Happy Learning!
