
Natural Language Processing

Assignment 7
Type of Question: MCQ

Number of Questions: 7    Total Marks: (4×1) + (3×2) = 10

===================================================

Question 1: Suppose you have a raw text corpus and you compute a word
co-occurrence matrix from it. Which of the following algorithm(s) can you utilize
to learn word representations? (Choose all that apply) [1 mark]

a. CBOW
b. SVM
c. PCA
d. Bagging

Answer: a, c
Solution: CBOW learns word representations directly from the raw corpus using context
windows, and PCA (like SVD) can be applied to the co-occurrence matrix to obtain
low-dimensional word vectors. SVM and Bagging are classification/ensemble methods,
not representation-learning algorithms.
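For reference, a minimal Python sketch (with a made-up toy vocabulary and co-occurrence counts) of how PCA can be applied to a co-occurrence matrix to obtain dense word vectors:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy vocabulary and a hand-built word-word co-occurrence matrix
# (rows and columns follow the same word order; counts are illustrative).
vocab = ["dog", "cat", "bone", "fish"]
cooc = np.array([
    [0, 4, 6, 1],
    [4, 0, 1, 7],
    [6, 1, 0, 0],
    [1, 7, 0, 0],
], dtype=float)

# Reduce the sparse count vectors to dense 2-dimensional word representations.
pca = PCA(n_components=2)
word_vectors = pca.fit_transform(cooc)

for word, vec in zip(vocab, word_vectors):
    print(word, vec)
```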

===================================================

Question 2: What is the method for solving word analogy questions of the form
"given A, B and D, find C such that A:B::C:D" using word vectors? [1 mark]

a. vc = va + (vb − vd), then use cosine similarity to find the word closest to vc.
b. vc = va + (vd − vb), then do a dictionary lookup for vc.
c. vc = vd + (va − vb), then use cosine similarity to find the word closest to vc.
d. vc = vd + (va − vb), then do a dictionary lookup for vc.
e. None of the above

Answer: c
Solution: The analogy A:B::C:D implies vb − va = vd − vc, so vc = vd + (va − vb);
then use cosine similarity to find the word closest to vc.
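As an illustration, a small Python sketch of this procedure, assuming a hypothetical dictionary of pre-trained vectors (the values below are made up so that the analogy resolves exactly):

```python
import numpy as np

# Hypothetical pre-trained word vectors (values invented for illustration).
vectors = {
    "king":  np.array([0.8, 0.3, 0.1]),
    "queen": np.array([0.7, 0.9, 0.1]),
    "man":   np.array([0.6, 0.2, 0.3]),
    "woman": np.array([0.5, 0.8, 0.3]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def solve_analogy(a, b, d, vectors):
    """Given A, B, D in A:B::C:D, return the word closest to vc = vd + (va - vb)."""
    target = vectors[d] + (vectors[a] - vectors[b])
    candidates = (w for w in vectors if w not in {a, b, d})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# man:woman :: C:queen  ->  expect C = "king"
print(solve_analogy("man", "woman", "queen", vectors))
```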
===================================================

Question 3: What is the value of PMI(w1, w2) for C(w1) = 250, C(w2) = 1000,
C(w1, w2) = 160, N = 100000?
N: total number of documents.
C(wi): number of documents in which wi appears.
C(wi, wj): number of documents in which both words appear.
Note: Use base 2 for the logarithm. [1 mark]

a. 4
b. 5
c. 6
d. 5.64

Answer: c

Solution:

PMI(w1, w2) = log2 [ P(w1, w2) / (P(w1) * P(w2)) ] = log2 [ (C(w1, w2) * N) / (C(w1) * C(w2)) ]
            = log2 [ (160 * 100000) / (250 * 1000) ] = log2(64) = 6
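The same computation as a short Python check, using the document counts given in the question:

```python
import math

def pmi(c_w1, c_w2, c_w1w2, n):
    """PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ) with document counts."""
    p_joint = c_w1w2 / n
    p_w1, p_w2 = c_w1 / n, c_w2 / n
    return math.log2(p_joint / (p_w1 * p_w2))

print(pmi(250, 1000, 160, 100_000))  # -> 6.0
```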

===================================================

Question 4: Given two binary word vectors w1 and w2 as follows:


w1 = [1010101010]
w2 = [0011111100]
Compute the Dice and Jaccard similarity between them. [2 marks]

a. 6/11, 3/8
b. 10/11, 5/6
c. 4/9, 2/7
d. 5/9, 5/8
Answer: a
Solution:
w1 has 5 ones, w2 has 6 ones, and they share 3 ones (1-indexed positions 3, 5 and 7).
Dice = 2*3 / (5 + 6) = 6/11
Jaccard = 3 / (5 + 6 − 3) = 3/8
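A short Python sketch that reproduces the computation on the two binary vectors:

```python
def dice_jaccard(v1, v2):
    """Dice and Jaccard similarity for binary (0/1) vectors of equal length."""
    both = sum(a & b for a, b in zip(v1, v2))   # |w1 AND w2|
    n1, n2 = sum(v1), sum(v2)                   # |w1|, |w2|
    dice = 2 * both / (n1 + n2)
    jaccard = both / (n1 + n2 - both)
    return dice, jaccard

w1 = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
w2 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(dice_jaccard(w1, w2))  # -> (0.5454..., 0.375), i.e. 6/11 and 3/8
```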

===================================================

Question 5: Let the probability distributions for two words be p and q.

p = [0.20, 0.75, 0.50]
q = [0.90, 0.10, 0.25]
Compute their similarity scores with KL-divergence, i.e. DKL(p || q) and DKL(q || p). [2 marks]
Note: Use base 2 for the logarithm.

a. 4.704, 1.720
b. 1.692, 0.553
c. 2.246, 1.412
d. 3.213, 2.426

Answer: c
Solution:
DKL(p || q) = 0.20*log2(0.20/0.90) + 0.75*log2(0.75/0.10) + 0.50*log2(0.50/0.25)
            ≈ −0.434 + 2.180 + 0.500 = 2.246
DKL(q || p) = 0.90*log2(0.90/0.20) + 0.10*log2(0.10/0.75) + 0.25*log2(0.25/0.50)
            ≈ 1.953 − 0.291 − 0.250 = 1.412
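A short Python sketch of the computation (the vectors are used exactly as given in the question, even though they do not sum to 1):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log2(p_i / q_i)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

p = [0.20, 0.75, 0.50]
q = [0.90, 0.10, 0.25]
print(kl_divergence(p, q))  # -> approx. 2.246
print(kl_divergence(q, p))  # -> approx. 1.412
```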

===================================================

Question 6: Consider the word co-occurrence matrix given below.

        w4   w5   w6
w1       2    8    5
w2       4    9    7
w3       1    2    3

Compute the cosine similarity between
(i) w1 and w2, and (ii) w1 and w3. [2 marks]

a. 0.773, 0.412
b. 0.881, 0.764
c. 0.987, 0.914
d. 0.897, 0.315

Answer: c
Solution:

Cosine-sim(w1, w2) = (2*4 + 8*9 + 5*7) / (√(2*2 + 8*8 + 5*5) * √(4*4 + 9*9 + 7*7))
                   = 115 / (√93 * √146) ≈ 0.987
Cosine-sim(w1, w3) = (2*1 + 8*2 + 5*3) / (√(2*2 + 8*8 + 5*5) * √(1*1 + 2*2 + 3*3))
                   = 33 / (√93 * √14) ≈ 0.914
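A short Python check of both values, using the co-occurrence rows from the matrix above:

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two count vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

w1 = np.array([2, 8, 5])
w2 = np.array([4, 9, 7])
w3 = np.array([1, 2, 3])

print(cosine_sim(w1, w2))  # approx. 0.987
print(cosine_sim(w1, w3))  # approx. 0.914
```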

===================================================

Question 7: Which of the following types of relations can be captured by
word2vec (CBOW or Skip-gram)? [1 mark]
1. Analogy (A:B::C:?)
2. Antonymy
3. Polysemy
4. All of the above

Answer: 1
Solution: Analogies can be recovered through vector arithmetic on word2vec embeddings.
However, word vectors learnt with CBOW or Skip-gram cannot distinguish antonyms, which
tend to occur in very similar contexts, and they assign a single vector per word form,
so they cannot disambiguate polysemous words.

===================================================
