Lab Ca3

The document is a Jupyter notebook that demonstrates the use of the Gensim library for natural language processing, specifically for word vector operations. It includes functions to calculate word vectors, dot products, cosine similarity, and to find the most similar words. Additionally, it features a function to complete multiple-choice analogy questions using word vectors, showcasing various examples and their results.

Uploaded by try.nahush
3/14/25, 8:25 PM nlp.ipynb - Colab

!pip install gensim

Requirement already satisfied: gensim in /usr/local/lib/python3.11/dist-packages (4.3.3)


Requirement already satisfied: numpy<2.0,>=1.18.5 in /usr/local/lib/python3.11/dist-packages (from gensim) (1.26.4)
Requirement already satisfied: scipy<1.14.0,>=1.7.0 in /usr/local/lib/python3.11/dist-packages (from gensim) (1.13.1)
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.11/dist-packages (from gensim) (7.1.0)
Requirement already satisfied: wrapt in /usr/local/lib/python3.11/dist-packages (from smart-open>=1.8.1->gensim) (1.17.2)

import numpy as np
import gensim
import gensim.downloader
import math

# modelName = "fasttext-wiki-news-subwords-300"
modelName = "glove-wiki-gigaword-200"
# modelName = "glove-twitter-200"
model = gensim.downloader.load(modelName)

[--------------------------------------------------] 1.5% 3.9/252.1MB downloaded

def getWordVector(word):
    return model[word]

# Dot Product
def getVectorDotProduct(v1, v2):
    return np.dot(v1, v2)

def getWordVectorDotProduct(w1, w2):
    return getVectorDotProduct(getWordVector(w1), getWordVector(w2))

# Vector Length
def getVectorLength(v):
    return math.sqrt(getVectorDotProduct(v, v))

def getWordVectorLength(w):
    return getVectorLength(getWordVector(w))

# Cosine Similarity
def getVectorCosineSimilarity(v1, v2):
    return getVectorDotProduct(v1, v2) / (getVectorLength(v1) * getVectorLength(v2))

def getWordVectorCosineSimilarity(w1, w2):
    return getVectorCosineSimilarity(getWordVector(w1), getWordVector(w2))

# Most Similar Words
def getMostSimilarWord(word):
    return model.most_similar(word)
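As a quick sanity check of the cosine similarity formula used by the helpers above, dot(v1, v2) / (|v1| * |v2|), the following minimal sketch applies it to small hand-made vectors. It is written with plain Python lists so it runs without NumPy or a downloaded model; the names `dot` and `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook.

```python
import math

def dot(v1, v2):
    """Dot product of two equal-length sequences of numbers."""
    return sum(x * y for x, y in zip(v1, v2))

def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2: dot(v1, v2) / (|v1| * |v2|)."""
    return dot(v1, v2) / (math.sqrt(dot(v1, v1)) * math.sqrt(dot(v2, v2)))

# Toy vectors chosen by hand; they are NOT taken from any embedding model.
a = [1.0, 0.0]
b = [2.0, 0.0]   # same direction as a, different length
c = [0.0, 3.0]   # orthogonal to a
d = [-1.0, 0.0]  # opposite direction to a

print(cosine_similarity(a, b))  # 1.0
print(cosine_similarity(a, c))  # 0.0
print(cosine_similarity(a, d))  # -1.0
```

The result depends only on direction, not on length, which is why `a` and `b` score 1.0 even though `b` is twice as long.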

def completeMCQAnalogyList(wordA, wordB, wordC, options):
    # Note: wordA, wordB and wordC are looked up without a vocabulary check,
    # so a word missing from the model raises KeyError here
    answerVector = getWordVector(wordC) + getWordVector(wordB) - getWordVector(wordA)
    bestOption = options[0]
    # Check if bestOption is in the model's vocabulary before getting its vector
    if bestOption in model.key_to_index:
        bestSimilarity = getVectorCosineSimilarity(answerVector, getWordVector(bestOption))
    else:
        bestSimilarity = -1 # Assign a low similarity if the word is not found

    for i in range(0, len(options)):
        # Check if the current option is in the model's vocabulary before getting its vector
        if options[i] in model.key_to_index:
            similarity = getVectorCosineSimilarity(answerVector, getWordVector(options[i]))
            if bestSimilarity < similarity:
                bestSimilarity = similarity
                bestOption = options[i]
    return bestOption # Return after checking all options
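completeMCQAnalogyList returns only the winning option, which hides how close the other candidates were. The sketch below shows the same arithmetic, vec(C) + vec(B) - vec(A), but returns every in-vocabulary option with its score, best first. Everything in it is an assumption made for illustration: the function name `rank_analogy_options`, the plain-dict vocabulary, and the 2-D toy vectors, which are hand-made and not taken from the GloVe model.

```python
import math

def dot(v1, v2):
    return sum(x * y for x, y in zip(v1, v2))

def cosine_similarity(v1, v2):
    return dot(v1, v2) / (math.sqrt(dot(v1, v1)) * math.sqrt(dot(v2, v2)))

def rank_analogy_options(vectors, word_a, word_b, word_c, options):
    """Score each known option against vec(c) + vec(b) - vec(a), best first."""
    target = [c + b - a for a, b, c in
              zip(vectors[word_a], vectors[word_b], vectors[word_c])]
    # Options missing from the vocabulary are skipped rather than scored.
    scored = [(opt, cosine_similarity(target, vectors[opt]))
              for opt in options if opt in vectors]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 2-D "embeddings" (axis 0 ~ adult, axis 1 ~ species), hand-made.
toy_vectors = {
    "calf":  [0.0, 1.0],
    "cow":   [1.0, 1.0],
    "puppy": [0.0, 2.0],
    "dog":   [1.0, 2.0],
    "mare":  [1.0, 0.0],
}

ranking = rank_analogy_options(toy_vectors, "calf", "cow", "puppy",
                               ["dog", "mare", "donkey"])
for word, score in ranking:
    print(f"{word}: {score:.3f}")
print("best:", ranking[0][0])  # best: dog
```

Here "donkey" is deliberately absent from the toy vocabulary, so it is left out of the ranking instead of causing an error.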

completeMCQAnalogyList('squint', 'eye', 'squeeze', ['tongue', 'cloth', 'hand', 'throat'])

'hand'
 

completeMCQAnalogyList('pantry', 'store', 'scullery', ['kitchen', 'cook', 'utensils', 'wash'])


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-46-a468d4ecfff5> in <cell line: 0>()
----> 1 completeMCQAnalogyList('pantry', 'store', 'scullery', ['kitchen', 'cook', 'utensils', 'wash'])

4 frames
/usr/local/lib/python3.11/dist-packages/gensim/models/keyedvectors.py in get_index(self, key, default)
    418             return default
    419         else:
--> 420             raise KeyError(f"Key '{key}' not present")
    421
    422     def get_vector(self, key, norm=False):

KeyError: "Key 'scullery' not present"

completeMCQAnalogyList('calf', 'cow', 'puppy', ['dog', 'bitch', 'donkey', 'mare'])

'dog'
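The KeyError for 'scullery' arises because the three query words are looked up without the vocabulary check that the options receive. One possible guard, sketched below, is to list the missing words before attempting the analogy. The helper name `find_missing_words` is hypothetical, and the small set used here is only a stand-in so the example runs without the model; in the notebook, `model.key_to_index` (already used for the option check) could presumably be passed as the vocabulary instead.

```python
def find_missing_words(vocabulary, words):
    """Return the words that are absent from the vocabulary, in input order."""
    return [w for w in words if w not in vocabulary]

# Stand-in vocabulary for illustration only; NOT the model's real word list.
toy_vocabulary = {"pantry", "store", "kitchen", "cook", "utensils", "wash"}

query = ["pantry", "store", "scullery"]
missing = find_missing_words(toy_vocabulary, query)
if missing:
    # Report the problem instead of letting the vector lookup raise KeyError.
    print("Skipping analogy; not in vocabulary:", missing)
else:
    print("All query words found; safe to compute the analogy.")
```

Checking up front keeps the failure explicit and lets the caller decide whether to skip the question or substitute another word.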
 
 

completeMCQAnalogyList('coal', 'heat', 'wax', ['candle', 'bee', 'energy', 'light'])

'light'
 

completeMCQAnalogyList('traveler', 'journey', 'sailor', ['water', 'voyage', 'ship', 'crew'])

'ship'
 

completeMCQAnalogyList('election', 'manifesto', 'meeting', ['agenda', 'minutes', 'circular', 'preface'])

'agenda'
 

completeMCQAnalogyList('eye', 'myopia', 'teeth', ['pyorrhoea', 'cataract', 'trachoma', 'eczema'])

'eczema'
 

completeMCQAnalogyList('carpenter', 'saw', 'tailor', ['sew', 'cloth', 'needle', 'tape'])

'tape'
 

completeMCQAnalogyList('paw', 'cat', 'hoof', ['horse', 'lion', 'lamb', 'elephant'])

'horse'
 

completeMCQAnalogyList('antiseptic', 'germs', 'antidote', ['allergy', 'poison', 'wound', 'infection'])

'poison'
 
