Introduction
A word embedding is a learned representation of text in which words that have a similar meaning have a similar representation. This approach to representing words and documents may be viewed as one of the key breakthroughs of deep learning on challenging natural language processing problems. Word embeddings are a technique for extracting features out of text so that we can feed those features into a machine learning model that works with text data. They attempt to preserve syntactic and semantic information.
OddOneOut Model using Word Embedding
Word embedding
In natural language processing, word embedding is a term used for the representation of words for text analysis, typically as a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning.
Word2Vec, as implemented in Gensim, is a word embedding approach that addresses this problem and enables similar terms to have similar vector representations and, consequently, to end up close together in the vector space.
Word embeddings are learned with a neural network that has one input layer, one hidden layer, and one output layer.
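As a toy illustration of the idea that words closer in the vector space are closer in meaning, consider the sketch below; the three-dimensional vectors are made up for illustration, while real embeddings have tens to hundreds of dimensions.

import numpy as np

# made-up three-dimensional "embeddings"; real models use many more dimensions
dog = np.array([0.9, 0.1, 0.3])
puppy = np.array([0.8, 0.2, 0.3])
car = np.array([0.1, 0.9, 0.7])

# words with similar meaning sit closer together in the vector space
print(np.linalg.norm(dog - puppy))  # small distance
print(np.linalg.norm(dog - car))    # much larger distance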
Odd One Out
The odd one out problem is one of the most interesting and widely used problems for testing a person's logical reasoning abilities. It appears regularly in competitive exams and aptitude tests to check a candidate's analytical skills and decision-making ability. In this article, we will write Python code that can be used to find the odd word among a given set of words.
We will find the average vector of all the given word vectors. Then we compare the cosine similarity of each word vector with the average vector; the word with the least similarity will be our odd word.
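Below is a minimal sketch of that idea with made-up two-dimensional vectors; the actual implementation later in this article uses pre-trained word vectors instead.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# made-up 2-D vectors: three "fruit-like" words and one outlier
words = ['apple', 'mango', 'guava', 'python']
vectors = [np.array([0.9, 0.1]), np.array([0.8, 0.2]),
           np.array([0.85, 0.15]), np.array([0.1, 0.9])]

# average vector of all the word vectors
mean_vector = np.mean(vectors, axis=0)

# the word whose vector is least similar to the average is the odd one out
similarities = [cosine_similarity([v], [mean_vector])[0][0] for v in vectors]
print(words[int(np.argmin(similarities))])  # prints 'python'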
Steps in the process
- Word2Vec in Python.
- Installing modules. We start by installing the 'gensim' and 'nltk' modules.
- Importing libraries: from nltk.tokenize import sent_tokenize, word_tokenize; import gensim; from gensim.models import Word2Vec.
- Reading the text data.
- Preparing the corpus.
- Building the Word2Vec model using Gensim (a minimal sketch of these steps follows this list).
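The sketch below walks through the importing, corpus-preparation, and model-building steps above, assuming a small made-up text in place of a real corpus; note that the vector size parameter is named size in gensim 3.x and vector_size in gensim 4.x.

from nltk.tokenize import sent_tokenize, word_tokenize
from gensim.models import Word2Vec
# nltk.download('punkt') may be required once before tokenizing

text = "Dogs are loyal animals. Puppies are young dogs. Cars need fuel."

# prepare the corpus: a list of tokenized, lower-cased sentences
corpus = [word_tokenize(sentence.lower()) for sentence in sent_tokenize(text)]

# build the Word2Vec model using gensim (use vector_size=100 on gensim 4.x)
model = Word2Vec(corpus, size=100, window=5, min_count=1)

# every word seen in the corpus now has a 100-dimensional vector
print(model.wv['dogs'].shape)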
Using Word2Vec and Gensim
Word2Vec is perhaps the most popular technique for learning word embeddings using a shallow neural network. It was developed by Tomas Mikolov and his colleagues at Google in 2013. Word2vec is a combination of models used to produce distributed representations of words in a corpus C. Word2Vec (W2V) is an algorithm that accepts a text corpus as input and outputs a vector representation for each word. We will use the Google pre-trained model for the Odd One Out task that we will implement shortly. To add the gensim library to Python, install it with pip (pip install gensim) and then use the following code.
import numpy as np
import gensim
from gensim.models import word2vec,KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
# Load the pre-trained Google News vectors (the filename below is assumed; use the path of the model downloaded to your machine)
vector_word_notations = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

Code to define the function that points out the odd word:
def odd_word_out(input_words):
    '''The function accepts a list of words and returns the odd word.'''
    # Generate all word embeddings for the given list of words
    whole_word_vectors = [vector_word_notations[i] for i in input_words]
    # Average vector of all the word vectors
    mean_vector = np.mean(whole_word_vectors, axis=0)
    # Iterate over every word and find its similarity to the average vector
    odd_word = None
    minimum_similarity = 99999.0  # can be any very high value
    for i in input_words:
        similarity = cosine_similarity([vector_word_notations[i]], [mean_vector])[0][0]
        if similarity < minimum_similarity:
            minimum_similarity = similarity
            odd_word = i
        print("cosine similarity score between %s and mean_vector is %.3f" % (i, similarity))
    print("\nThe odd word is: " + odd_word)

The cosine similarity function is essential for implementing this algorithm. It computes similarity as the normalized dot product of X and Y. In short, we can use it to tell how closely two terms are related. Let us look at a concrete example.
Now we can test this with example inputs such as:
input_1 = ['apple','mango','juice','python','orange','guava'] # python is odd word
odd_word_out(input_1)
In this implementation, we have used KeyedVectors (from the gensim module) and the cosine_similarity function (provided by sklearn). For the input above, the function should report 'python' as the odd word.
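The earlier claim that cosine similarity is the normalized dot product of X and Y can be checked directly; the sketch below uses made-up vectors purely for illustration.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 1.0, 0.5])

# normalized dot product of x and y
manual = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# sklearn expects 2-D inputs and returns a 2-D array
from_sklearn = cosine_similarity([x], [y])[0][0]

print(manual, from_sklearn)  # the two values agree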
The algorithm of OddOneOut
What we are doing is passing a list of words to our program. We then take the average of the word vectors of all the words; that is, if the word vectors of the words in the list are v1, v2, v3, ..., vn (n = number of words in the list), the average vector can be found by taking the mean of all the word vectors with np.mean([v1,v2,v3,...,vn], axis=0). Next, we set a variable minimum_similarity and give it a very high initial value, which makes the comparisons that follow straightforward. We then start a loop, iterate over each of the words in the list, and compute the cosine similarity between each word and the average vector we calculated.
Our odd word will be the one with the minimum similarity to the average vector. The average vector is computed from n-k words with similar contexts and k words (where k is a small number) whose context differs from that of the n-k words; because k is small, the average stays close to the majority, so the outlier ends up with the lowest similarity score.
Word2Vec was presented in two papers published in September and October 2013 by a team of researchers at Google. Along with the papers, the researchers released their implementation in C. A Python implementation in Gensim followed soon after the first paper.
The basic assumption of Word2Vec is that two words with similar contexts also share a similar meaning and therefore a similar vector representation in the model. For example, "dog", "puppy", and "pup" are frequently used in similar contexts, with similar surrounding words such as "good", "fluffy", or "cute", and according to Word2Vec they will therefore share a similar vector representation.
From this assumption, Word2Vec can be used to find the relations between words in a dataset, compute the similarity between them, or use the vector representations of those words as input for other applications such as text classification or clustering.
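Assuming the pre-trained vectors loaded earlier as vector_word_notations, the sketch below shows how such relations and similarities might be queried; the example words are only illustrative.

# assuming vector_word_notations is the KeyedVectors object loaded earlier
# similarity between two words (higher means more closely related)
print(vector_word_notations.similarity('dog', 'puppy'))
print(vector_word_notations.similarity('dog', 'car'))

# nearest neighbours of a word in the embedding space
print(vector_word_notations.most_similar('dog', topn=3))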
Libraries that are used in the code are as follows:
- xlrd==1.1.0: reads data from Excel spreadsheets.
- spaCy==2.0.12: natural language processing toolkit for tokenization, tagging, and parsing.
- gensim==3.4.0: topic modelling and word embedding library that provides Word2Vec and KeyedVectors.
- scikit-learn==0.19.1: machine learning library that provides the cosine_similarity function.
- seaborn==0.8: statistical data visualization library built on top of matplotlib.




