AP For NLP-LO2
Word embedding
The basic idea behind GloVe is to create a vector space in which each word is represented by a
dense vector of real-valued numbers. This vector should capture some of the semantic and syntactic
information associated with the word, so that words that are similar in meaning or that tend to occur
in similar contexts will have similar vector representations.
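To make "similar vector representations" concrete, here is a minimal Python sketch that compares embeddings with cosine similarity; the words, the 3-dimensional vectors, and their values are made-up placeholders (real GloVe vectors usually have 50-300 dimensions).

import numpy as np

# Hypothetical 3-d embeddings; real GloVe vectors are much higher dimensional.
emb = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.78, 0.70, 0.12]),
    "apple": np.array([0.10, 0.05, 0.90]),
}

def cosine(u, v):
    # Cosine similarity: close to 1.0 = similar direction, close to 0.0 = unrelated.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["king"], emb["queen"]))  # high: related words
print(cosine(emb["king"], emb["apple"]))  # lower: unrelated words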
Word embedding
The training process for GloVe involves constructing a co-occurrence matrix that records the
number of times each word appears in the same context as every other word in the corpus. A
"context" here is a window of a fixed number of words on either side of the target word.
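A minimal Python sketch of building such a co-occurrence matrix with a symmetric context window; the toy corpus and the window size of 2 are illustrative assumptions, not part of the GloVe description above.

from collections import defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the rug"]  # toy corpus
window = 2  # number of context words counted on either side of the target word

cooc = defaultdict(float)  # (target word, context word) -> co-occurrence count
for sentence in corpus:
    tokens = sentence.split()
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                # GloVe optionally down-weights distant pairs by 1/distance;
                # plain counts are used here to keep the sketch simple.
                cooc[(target, tokens[j])] += 1

print(cooc[("sat", "cat")])  # 1.0: "cat" falls inside the window of "sat" once
print(cooc[("the", "sat")])  # 4.0: accumulated over both toy sentences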
Once the co-occurrence matrix is constructed, GloVe learns the word embeddings through a form
of matrix factorization: it fits low-dimensional word vectors whose dot products approximate the
logarithm of the corresponding co-occurrence counts, so the relative frequencies of word co-
occurrences are preserved. The resulting word embeddings are dense, real-valued vectors that
capture the meaning and context of each word.
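For reference, this factorization is driven by GloVe's weighted least-squares objective, J = Σ f(Xij) (wi · w̃j + bi + b̃j − log Xij)². Below is a minimal numpy sketch of that per-pair loss; the vector dimensionality, initial values, and example count of 12 are placeholders.

import numpy as np

def glove_pair_loss(w_i, w_tilde_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    # Weighting function f(x) down-weights rare co-occurrences and caps very frequent ones.
    f = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
    # Squared error between the dot product (plus biases) and the log co-occurrence count.
    return f * (w_i @ w_tilde_j + b_i + b_j - np.log(x_ij)) ** 2

rng = np.random.default_rng(0)
w_i = rng.normal(scale=0.01, size=50)        # word vector (50-d, illustrative)
w_tilde_j = rng.normal(scale=0.01, size=50)  # context-word vector
print(glove_pair_loss(w_i, w_tilde_j, b_i=0.0, b_j=0.0, x_ij=12.0))

Training sums this loss over all nonzero entries of the co-occurrence matrix and minimizes it with stochastic gradient methods (the original GloVe implementation uses AdaGrad).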
SVM—Support Vector Machines
• A relatively new classification method for both linear and
nonlinear data
• It uses a nonlinear mapping to transform the original training
data into a higher dimension
• With the new dimension, it searches for the linear optimal
separating hyperplane (i.e., “decision boundary”)
• With an appropriate nonlinear mapping to a sufficiently high
dimension, data from two classes can always be separated
by a hyperplane
• SVM finds this hyperplane using support vectors
(“essential” training tuples) and margins (defined by the
support vectors)
SVM—History and Applications
• Vapnik and colleagues (1992)—groundwork from Vapnik &
Chervonenkis’ statistical learning theory in 1960s
• Features: training can be slow but accuracy is high owing
to their ability to model complex nonlinear decision
boundaries (margin maximization)
• Used for: classification and numeric prediction
• Applications:
• handwritten digit recognition, object recognition,
speaker identification, benchmarking time-series
prediction tests
Let’s Learn This Like a Five-Year-Old
SVM—Margins and Support Vectors
Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training
tuple with associated class label yi
There are infinitely many lines (hyperplanes) separating the two classes, but
we want to find the best one (the one that will minimize classification
error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., the
maximum marginal hyperplane (MMH)
SVM—Linearly Separable
A separating hyperplane can be written as
W●X+b=0
where W={w1, w2, …, wn} is a weight vector and b a scalar
(bias)
For 2-D it can be written as
w0 + w1x1 + w2x2 = 0
The hyperplanes defining the sides of the margin:
H1: w0 + w1x1 + w2x2 ≥ 1 for yi = +1, and
H2: w0 + w1x1 + w2x2 ≤ −1 for yi = −1
Any training tuples that fall on hyperplanes H1 or H2 (i.e., the
sides defining the margin) are support vectors
This becomes a constrained (convex) quadratic optimization
problem: a quadratic objective function with linear constraints,
i.e., Quadratic Programming (QP)
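As a concrete illustration of the linearly separable case, a minimal scikit-learn sketch that recovers W, b, the support vectors, and the margin width 2/||W|| from a fitted linear SVM; the toy 2-D points, labels, and the large C value (to approximate a hard margin) are assumptions for illustration.

import numpy as np
from sklearn.svm import SVC

# Toy 2-D linearly separable data: class +1 upper-right, class -1 lower-left.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin QP described above.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("W =", clf.coef_[0])                  # weight vector
print("b =", clf.intercept_[0])             # bias
print("support vectors:\n", clf.support_vectors_)
print("margin width =", 2 / np.linalg.norm(clf.coef_[0]))  # 2 / ||W||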
Why Is SVM Effective on High Dimensional Data?
SVM—Linearly Inseparable
Transform the original input data into a higher dimensional space, then
search for a linear separating hyperplane in the new space
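In practice this transformation is performed implicitly via a kernel function; below is a minimal scikit-learn sketch on toy data that is not linearly separable in the original 2-D space. The ring-shaped data, the choice of RBF kernel, and the gamma/C values are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Inner cluster = class 0, surrounding ring = class 1: no straight line separates them in 2-D.
inner = rng.normal(0, 0.3, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, 50)
ring = np.c_[2 * np.cos(angles), 2 * np.sin(angles)] + rng.normal(0, 0.1, size=(50, 2))
X = np.vstack([inner, ring])
y = np.array([0] * 50 + [1] * 50)

# The RBF kernel implicitly maps the data into a higher-dimensional space
# where a linear separating hyperplane exists.
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))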
[Figure: example decision tree splitting on age (<=30, 31..40, >40), then student? (yes/no) and credit rating? (excellent/fair), with yes/no class labels at the leaves]
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
• Tree is constructed in a top-down recursive divide-and-conquer
manner
• At start, all the training examples are at the root
• Attributes are categorical (if continuous-valued, they are
discretized in advance)
• Examples are partitioned recursively based on selected
attributes
• Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
• Conditions for stopping partitioning
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
• There are no samples left
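A minimal scikit-learn sketch of this top-down induction, using criterion="entropy" so that splits are chosen by information gain as in ID3/C4.5; the hand-encoded toy tuples and their labels below are made up for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy categorical data, one-hot encoded by hand:
# columns = [age<=30, age31..40, age>40, student, credit=excellent]
X = np.array([
    [1, 0, 0, 0, 0],   # young, not a student, fair credit
    [1, 0, 0, 1, 0],   # young student
    [0, 1, 0, 0, 1],   # middle-aged, excellent credit
    [0, 0, 1, 0, 0],   # senior, fair credit
    [0, 0, 1, 1, 1],   # senior student, excellent credit
])
y = np.array([0, 1, 1, 1, 0])   # 1 = positive class, 0 = negative (made-up labels)

# criterion="entropy" selects test attributes by information gain.
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=[
    "age<=30", "age31..40", "age>40", "student", "credit=excellent"]))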
Attribute Selection Measure:
Information Gain (ID3/C4.5)
■ Select the attribute with the highest information gain
■ Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D|
■ Expected information (entropy) needed to classify a tuple in D:
Info(D) = − Σ i=1..m pi log2(pi)
■ Information needed (after using attribute A to split D into v partitions) to classify a tuple:
InfoA(D) = Σ j=1..v (|Dj| / |D|) × Info(Dj)
■ Information gained by branching on attribute A:
Gain(A) = Info(D) − InfoA(D)
Attribute Selection: Information Gain
[Worked example: Info(D), InfoA(D), and Gain(A) computed for each candidate attribute]
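A minimal Python sketch of such a calculation on a hypothetical data set with 9 tuples of one class and 5 of the other, split by some attribute A into three partitions; all of the counts are assumptions for illustration.

import numpy as np
from collections import Counter

def info(labels):
    # Expected information (entropy): Info(D) = -sum_i p_i * log2(p_i)
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain(labels, partitions):
    # Gain(A) = Info(D) - sum_j (|D_j| / |D|) * Info(D_j)
    n = len(labels)
    info_a = sum(len(part) / n * info(part) for part in partitions)
    return info(labels) - info_a

D = ["yes"] * 9 + ["no"] * 5                      # 9 positive, 5 negative tuples
print(round(info(D), 3))                          # ~0.940 bits
# Hypothetical split of D by attribute A into three partitions:
parts = [["yes"] * 2 + ["no"] * 3, ["yes"] * 4, ["yes"] * 3 + ["no"] * 2]
print(round(info_gain(D, parts), 3))              # ~0.247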
Gain Ratio for Attribute Selection (C4.5)
• Information gain measure is biased towards attributes with a large number
of values
• C4.5 (a successor of ID3) uses gain ratio to overcome the problem
(normalization to information gain)
• GainRatio(A) = Gain(A) / SplitInfo(A), where SplitInfoA(D) = − Σ j=1..v (|Dj| / |D|) × log2(|Dj| / |D|)
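A minimal Python sketch of SplitInfo and GainRatio; the partition sizes (5, 4, 5) and the gain value 0.247 are illustrative assumptions carried over from the sketch above.

import math

def split_info(partition_sizes):
    # SplitInfoA(D) = -sum_j (|D_j|/|D|) * log2(|D_j|/|D|)
    n = sum(partition_sizes)
    return -sum((s / n) * math.log2(s / n) for s in partition_sizes if s > 0)

def gain_ratio(gain, partition_sizes):
    # GainRatio(A) = Gain(A) / SplitInfoA(D)
    return gain / split_info(partition_sizes)

# e.g. an attribute that splits 14 tuples into partitions of sizes 5, 4 and 5:
print(round(split_info([5, 4, 5]), 3))         # ~1.577
print(round(gain_ratio(0.247, [5, 4, 5]), 3))  # ~0.157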
Gini Index (CART, IBM IntelligentMiner)
• If a data set D contains examples from n classes, the gini index
gini(D) is defined as
gini(D) = 1 − Σ j=1..n pj², where pj is the relative frequency of class j in D
• If D is split on attribute A into two subsets D1 and D2, then
giniA(D) = (|D1| / |D|) gini(D1) + (|D2| / |D|) gini(D2)
• Reduction in Impurity:
Δgini(A) = gini(D) − giniA(D)
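A minimal Python sketch of the Gini index and the impurity reduction for a binary split; the class counts are illustrative assumptions.

def gini(class_counts):
    # gini(D) = 1 - sum_j p_j^2
    n = sum(class_counts)
    return 1.0 - sum((c / n) ** 2 for c in class_counts)

def gini_reduction(parent_counts, left_counts, right_counts):
    # delta_gini(A) = gini(D) - gini_A(D), where gini_A(D) is the size-weighted
    # average Gini of the two partitions produced by the binary split on A.
    n = sum(parent_counts)
    n_l, n_r = sum(left_counts), sum(right_counts)
    gini_a = (n_l / n) * gini(left_counts) + (n_r / n) * gini(right_counts)
    return gini(parent_counts) - gini_a

print(round(gini([9, 5]), 3))                         # ~0.459
print(round(gini_reduction([9, 5], [6, 1], [3, 4]), 3))  # ~0.092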
Advantages of the Decision Tree
• It is simple to understand because it follows the same process a
human follows when making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think through all the possible outcomes of a
problem.
• It requires less data cleaning than many other algorithms.
Disadvantages of the Decision Tree
• A decision tree can contain many layers, which makes it
complex.
• It may have an overfitting issue, which can be mitigated using
the Random Forest algorithm (see the sketch after this list).
• With more class labels, the computational complexity of the
decision tree may increase.
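Since the list above points to Random Forests as a remedy for overfitting, here is a minimal scikit-learn sketch comparing a single tree with a forest on synthetic data; the data generator and all parameter values are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single unpruned tree tends to overfit; averaging many randomized trees reduces variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("single tree test accuracy:", tree.score(X_te, y_te))
print("random forest test accuracy:", forest.score(X_te, y_te))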
THANKS!
Any questions?