NLP Using Deep Learning Handson
Word2Vec
Learning good word embeddings is of paramount importance in NLP.
A good word embedding is one that represents a word in a minimal vector space while preserving its semantics and its context in the language.
Each word embedding is a point in a vector space, and the transformation of words into their vector representations is called Word2Vec.
CBOW
Consider the sentence: Judith feigned a forgotten wallet to evade paying for dinner, proving she had surpassed frugality and become parsimonious.
If you are not sure about the meaning of the final word, parsimonious, you tend to look at the words surrounding it to guess its meaning (or context).
By looking at the words evade, paying, and frugality (avoiding waste), you might infer that parsimonious means spending less.
CBOW learns the meaning of a given word (as a numeric vector) by looking at a fixed number of words before and after the word of interest; in other words, it learns the context.
The main idea of the CBOW model is to predict a word given its context, as sketched below.
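As a rough illustrative sketch (not part of the original lesson), the CBOW training samples for the example sentence with a window size of 2 can be generated like this; each sample pairs the surrounding context words with the centre word to be predicted:
# illustrative sketch: build (context -> target) CBOW samples with window size 2
sentence = "judith feigned a forgotten wallet to evade paying for dinner".split()
window = 2
cbow_samples = []
for i, target in enumerate(sentence):
    context = sentence[max(0, i - window): i] + sentence[i + 1: i + 1 + window]
    cbow_samples.append((context, target))
print(cbow_samples[6])   # (['wallet', 'to', 'paying', 'for'], 'evade')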
Skip Gram
The skip-gram model is quite different from the CBOW model.
The idea of the skip-gram model is to predict the context given the word.
For a given word, the skip-gram model tries to predict the most probable words that usually surround it.
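Correspondingly, a minimal sketch of skip-gram sampling (again illustrative, not from the lesson) reverses the direction: each centre word is paired with every word in its window as a separate (target, context) training pair:
# illustrative sketch: build (target -> context) skip-gram pairs with window size 2
sentence = "judith feigned a forgotten wallet to evade paying for dinner".split()
window = 2
skipgram_pairs = []
for i, target in enumerate(sentence):
    for context in sentence[max(0, i - window): i] + sentence[i + 1: i + 1 + window]:
        skipgram_pairs.append((target, context))
print(skipgram_pairs[:4])   # [('judith', 'feigned'), ('judith', 'a'), ('feigned', 'judith'), ('feigned', 'a')]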
Global Vectors
The GloVe model is slightly different from the Word2Vec model.
GloVe learns word embeddings by looking at the number of times two words have appeared together, which we call co-occurrence.
It tries to minimize the difference between the similarity of two word vectors and their co-occurrence.
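This objective can be written down explicitly (the formula below comes from the original GloVe paper, not from the lesson text): GloVe minimizes
J = \sum_{i,j} f(X_{ij}) (w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij})^2
where X_{ij} is the number of times word j appears in the context of word i, w_i and \tilde{w}_j are the word and context vectors, b_i and \tilde{b}_j are biases, and f is a weighting function that down-weights very rare and very frequent co-occurrences.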
Let's say that we are trying to learn the word embedding for the word thrones in the example sentence.
Here the target word is thrones, i.e. the word for which we are trying to find the embedding.
The context words for thrones are game, of, you, win, i.e. the two words before and after the target word, provided the window size is 2.
In other words, the words that usually surround the target word become the context words.
Words sharing a similar context share a similar meaning (and similar word vector representations).
Sampling
Once we have initialized the lookup table, it's time to sample the target and context words.
These pairs are analogous to the input features and target labels you see in supervised learning; see the sketch below.
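A small sketch of what that looks like in practice, reusing the illustrative sentence and skipgram_pairs names from the skip-gram sketch above (all names here are illustrative, not from the lesson):
import numpy as np

vocab = sorted(set(sentence))
word_index = {w: i for i, w in enumerate(vocab)}
vocab_size, vector_length = len(vocab), 10

# randomly initialized lookup table, one row per word in the vocabulary
lookup_table = np.random.uniform(-1, 1, size=(vocab_size, vector_length))

# target word indices act as the input features, context word indices as the labels
X = np.array([word_index[target] for target, context in skipgram_pairs])
y = np.array([word_index[context] for target, context in skipgram_pairs])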
Skip-Gram Drawback
One main disadvantage of the skip-gram model is that, for each training sample it sees, the model updates all the weights in every iteration.
If the vocabulary size is large, this can be computationally expensive.
To address this issue, there is another technique called negative sampling, where only a small fraction of the weights are updated for each training sample.
Negative Sampling
In negative sampling, we generate a set of positive and negative samples from the available text.
Each sample has a binary target value that says whether the two words appear in the same context or not.
The number of negative samples K can be chosen arbitrarily: the larger the corpus, the smaller the value of K (usually around 5).
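Keras provides a helper that generates exactly such positive and negative (target, context) pairs; a minimal sketch follows (the encoded sentence values are made up for illustration):
from keras.preprocessing.sequence import skipgrams

# a sentence already encoded as word indices (illustrative values)
encoded = [1, 4, 2, 7, 3, 5]
# negative_samples is the ratio of negative to positive pairs (the K discussed above)
pairs, labels = skipgrams(encoded, vocabulary_size=10, window_size=2, negative_samples=1.0)
# each pair is (target, context); label 1 means the words co-occur, 0 means a negative sample
print(pairs[:3], labels[:3])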
-------------
Which of the following models tries to predict the target word based on the context? CBOW model
Do similar words tend to have similar word embedding representations? Depends on the corpus that is used to train
Which of the following models needs fewer training samples to learn the word embeddings? Negative sampling
Which of the following activations is used in the final layer of the CBOW model to learn word embeddings? Softmax activation
For a window size of two, what would be the maximum number of target words that can be sampled for the skip-gram model? 4
Which of the following options is a drawback of representing text as one-hot encoding? No contextual relationship
Which layer of the skip-gram model has the actual word embedding representation? Hidden layer
---------------
Sample Code
from gensim.models import Word2Vec
# corpus: a list of tokenized sentences
sentences = [["not", "good"], ["climax", "was", "awesome"],
             ["really", "liked", "the", "movie"], ["too", "lengthy"]]
# train the model (the parameter is named size in gensim < 4.0 and vector_size in gensim >= 4.0)
model = Word2Vec(sentences, min_count=1, vector_size=10)
# summarize the vocabulary (model.wv.vocab in gensim < 4.0)
words = list(model.wv.key_to_index)
print(words)
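Once trained, an individual (here 10-dimensional) word vector can be looked up from the model, for example:
# look up the learned vector for a word from the toy corpus
print(model.wv['movie'])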
---------------
glove_file.close()
2)
# cosine similarity between vectors u and v
score = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
3)
best_word = None
w1 = word_to_vec[word_1]
w2 = word_to_vec[word_2]
w3 = word_to_vec[word_3]
words = word_to_vec.keys()
max_cosine_sim = -100
for w in words:
    if w in [word_1, word_2, word_3]:
        continue
    # word analogy: pick the word w whose offset from w3 is most similar to (w2 - w1)
    cosine_sim = np.dot(w2 - w1, word_to_vec[w] - w3) / (
        np.linalg.norm(w2 - w1) * np.linalg.norm(word_to_vec[w] - w3))
    if cosine_sim > max_cosine_sim:
        max_cosine_sim = cosine_sim
        best_word = w
---------------
Which of the following algorithms takes into account the global context of a word to generate word vectors? GloVe model
Which of the following metrics uses the dot product of two vectors to determine the similarity? Cosine distance
Which of the following criteria is used by the GloVe model to learn the word embeddings? Reduce the difference between the similarity of two word vectors and their co-occurrence value
Which of the following models uses a co-occurrence matrix to generate word vectors? GloVe model
Which of the following models learns the word embeddings based on the co-occurrence of the words in the corpus? GloVe model
Which of the following functions in Keras is used to add an embedding layer to the model? keras.layers.Embedding()
Which of the following is the constructor used in gensim to generate word vectors? Word2Vec()
-----------------
Data Preprocessing
When it comes to text data, we first remove all kinds of stop words, if necessary, and then transform each character or word into a one-hot encoding.
The Keras framework has a built-in class called Tokenizer which performs implicit tokenization and indexing of each word in the document.
It also eliminates special characters in the document.
from keras.preprocessing.text import Tokenizer
### collection of text (or corpus)
docs = ["not good", "climax was awesome !", "really liked the movie", "too lengthy"]
t = Tokenizer()
### fit the tokenizer on the documents before querying it
t.fit_on_texts(docs)
### Output the dictionary having words as keys and their unique index as values
t.word_index
### Output the dictionary having words as keys and the number of documents they appear in as values
t.word_docs
Lookup Table
Now each word in the text data is replaced by its respective index.
To train the LSTM model, you do not directly input the word index to the LSTM network.
We first initialize a lookup table of shape (vocab_size, vector_length).
We do this with the Keras Embedding class as follows.
from keras.layers import Embedding
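# a minimal sketch, assuming vocab_size and vector_length are defined elsewhere:
# the Embedding layer holds a trainable lookup table of shape (vocab_size, vector_length)
embedding_layer = Embedding(input_dim=vocab_size, output_dim=vector_length)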
Transform Data
Once you have a unique index for each word in the corpus, the corpus has to be represented as an array of indices in place of words, as shown below.
word_to_id = { "the": 0, "awesome": 1, "movie": 2, "good": 3, "was": 4 }
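For example, a short review can be encoded with this mapping as follows (the review text here is illustrative):
review = "the movie was awesome"
transformed = [word_to_id[w] for w in review.split()]   # gives [0, 2, 4, 1]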
Sequence Padding
The length of a movie review is not fixed; it can be too short or too long.
The model may take a very long time to train if the text data is too long.
For all the reviews, we may consider only the first few words, say 500.
If the text is shorter than 500 words, we pad zeros at the beginning to make up the length to 500 words.
from keras.preprocessing import sequence
max_review_length = 500
# pad shorter reviews with zeros at the beginning so every review has length max_review_length
padded_data = sequence.pad_sequences(transformed_data, maxlen=max_review_length)
------------------
import numpy as np
from keras.datasets import imdb

vocab_size = 5000
# workaround: temporarily allow pickled objects while loading the dataset
np.load.__defaults__ = (None, True, True, 'ASCII')
(X_train, Y_train), (X_test, Y_test) = imdb.load_data(num_words=vocab_size)
np.load.__defaults__ = (None, False, True, 'ASCII')

word_to_id = imdb.get_word_index()
# reserve indices for the padding, start-of-sequence and unknown tokens
word_to_id["PAD"] = 0
word_to_id["START"] = 1
word_to_id["UNK"] = 2
-------------
What is meant by beam width in the Beam search algorithm? The maximum number of words to be sampled at a time by the decoder
What is the tradeoff between the greedy search and beam search algorithms? Not
Computation time