Module 1.2
Regex Examples
Example (email address): \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Example (phone number): ^\d{3}-\d{3}-\d{4}$
Example (date): \b\d{2}/\d{2}/\d{4}\b
Detailed breakdown of how each example matches the
target text.
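As a quick illustration of how these three patterns behave in practice, here is a minimal sketch using Python's standard `re` module (the sample strings are made up for demonstration):

```python
import re

# The three patterns from the examples above.
email_re = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
phone_re = r"^\d{3}-\d{3}-\d{4}$"    # anchored: the whole string must match
date_re = r"\b\d{2}/\d{2}/\d{4}\b"   # two digits / two digits / four digits

# findall returns every non-overlapping match in the text.
print(re.findall(email_re, "Contact alice@example.com or bob@mail.co"))
# → ['alice@example.com', 'bob@mail.co']

# Because of ^ and $, the phone pattern only accepts the exact format.
print(bool(re.match(phone_re, "555-867-5309")))  # → True
print(bool(re.match(phone_re, "5558675309")))    # → False

print(re.findall(date_re, "Due 25/12/2023, shipped 01/01/2024"))
# → ['25/12/2023', '01/01/2024']
```

Note the effect of the anchors: the phone pattern rejects any surrounding text, while the `\b` word boundaries in the other two patterns let them match inside a larger string.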
Understanding Words
Introduction to Corpora
Types of Corpora
Text Normalization
Lowercasing
Removing Punctuation
Stemming
Lemmatization
One-Hot Encoding
In this example, each word is represented by a vector
that is as long as the number of unique words in the
vocabulary. The position of '1' in each vector
corresponds to the index of the word in the vocabulary
list. Although this method is straightforward, it does
not capture any semantic relationships between the
words; all words are equally distant from each other.
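A minimal sketch of such an encoding, using a made-up sample sentence as the vocabulary source:

```python
# Build a vocabulary from a toy sentence (illustrative only).
sentence = "the cat sat on the mat"
vocab = sorted(set(sentence.split()))  # ['cat', 'mat', 'on', 'sat', 'the']

def one_hot(word):
    """Return a vector of vocabulary length with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("cat"))  # → [1, 0, 0, 0, 0]
print(one_hot("the"))  # → [0, 0, 0, 0, 1]
```

Every vector has exactly one 1 and is orthogonal to every other, which is precisely why no similarity between words is captured.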
Word Embeddings
o Definition: Distributed representations of words in a
continuous vector space where semantically similar words
are closer together.
o Purpose: To capture the context and meaning of words,
allowing for more sophisticated analysis.
o Example:
Words like "king," "queen," "man," and "woman"
might be positioned such that the vector difference
between "king" and "queen" is similar to the difference
between "man" and "woman."
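The analogy can be made concrete with hand-picked toy vectors. The two dimensions below (roughly "royalty" and "gender") are purely illustrative assumptions, not learned embeddings:

```python
# Toy 2-dimensional "embeddings" with hand-picked axes: [royalty, gender].
emb = {
    "king":  [1.0, 1.0],
    "queen": [1.0, 0.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, 0.0],
}

def diff(a, b):
    """Component-wise difference between two word vectors."""
    return [x - y for x, y in zip(emb[a], emb[b])]

print(diff("king", "queen"))  # → [0.0, 1.0]
print(diff("man", "woman"))   # → [0.0, 1.0], the same offset
```

The shared offset is what "the vector difference is similar" means: one direction in the space consistently encodes one relationship.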
Example: Word2Vec
Definition: Word2Vec learns embeddings by training a shallow neural
network to predict a word from its surrounding context (CBOW) or the
context from a word (skip-gram).
Example: GloVe
Definition: GloVe stands for Global Vectors and is trained on global
word-word co-occurrence statistics.
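The "global word-word co-occurrence statistics" that GloVe consumes can be sketched as a simple counting pass over a corpus. The three-sentence corpus and the window size of 1 here are illustrative assumptions:

```python
from collections import Counter

# Count symmetric word-word co-occurrences within a +/-1 word window --
# the kind of global statistic GloVe is trained on.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
window = 1

cooc = Counter()
for sent in corpus:
    toks = sent.split()
    for i, w in enumerate(toks):
        # Look at neighbours within the window on either side.
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if i != j:
                cooc[(w, toks[j])] += 1

print(cooc[("the", "cat")])  # → 2 ("the cat ..." occurs twice)
print(cooc[("cat", "sat")])  # → 1
```

GloVe then fits word vectors so that their dot products approximate (logs of) these counts, rather than sliding a predictive window during training as Word2Vec does.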
Limitations of Word Embeddings
Word embeddings are trained on a fixed vocabulary,
so words not in the training set are problematic.
Additionally, words with multiple meanings might
have a single vector that doesn't capture all their uses,
leading to errors in understanding context.
Cosine Similarity
o Example:
Vectors:
A = [1, 2, 3]
B = [4, 5, 6]
Cosine similarity = (1*4 + 2*5 + 3*6) / (sqrt(14) * sqrt(77))
= 32 / 32.83 ≈ 0.975
o Example:
Document 1: [1, 1, 0, 0, 1, 0]
Document 2: [0, 1, 1, 1, 1, 0]
Cosine similarity = 2 / (sqrt(3) * sqrt(4)) ≈ 0.577
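Both examples can be checked with a short standard-library implementation of the formula:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity([1, 2, 3], [4, 5, 6]), 4))  # → 0.9746
print(round(cosine_similarity([1, 1, 0, 0, 1, 0],
                              [0, 1, 1, 1, 1, 0]), 4))    # → 0.5774
```

Note that cosine similarity depends only on the angle between the vectors, not their lengths, which is why A = [1, 2, 3] and B = [4, 5, 6] score so highly despite B being larger.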
Word2Vec
Introduction to Word2Vec
Advantages of Word2Vec
o Semantic Relationships: Captures relationships between
words that go beyond simple co-occurrence.
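The skip-gram framing mentioned above can be sketched as pure data preparation: the corpus is turned into (center, context) pairs, and the network is then trained to predict the context word from the center word. The sentence and window size are illustrative assumptions:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in skip-gram Word2Vec."""
    pairs = []
    for i, center in enumerate(tokens):
        # Every token within `window` positions of the center is a context word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("the quick brown fox".split(), window=1))
```

With window=1, "quick" yields the pairs ("quick", "the") and ("quick", "brown"): each word learns from its immediate neighbors, which is how co-occurrence patterns end up encoded in the vectors.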