
Machine learning for Natural Language Processing

Department of Computing Science
Institute of Technology
Jimma University
Contents
• Introduction
• Supervised and Unsupervised Machine Learning
• Bayesian Networks
Introduction
• Machine learning is like human learning from past experiences.
• A computer does not have “experiences”.
• A computer system learns from data, which represent some “past experiences” of an application domain.
• Machine learning is programming computers to
  – optimize a performance criterion using example data or past experience.
• Learning is the execution of a computer program to optimize the parameters of a model using the training data or past experience.
Introduction…
• The model may be
  – predictive: to make predictions in the future, or
  – descriptive: to gain knowledge from data, or both.
Introduction…
• Machine learning uses the theory of statistics in building mathematical models, because the core task is making inference from a sample.
• The role of computer science is twofold:
  – In training: efficient algorithms to solve the optimization problem and to store and process the massive amount of data.
  – After training: an efficient representation of, and algorithmic solution for, inference.
Introduction…
• Based on the type of data available, machine learning can be
  – Supervised, or
  – Unsupervised.
Supervised Machine Learning
• Supervised machine learning learns from examples.
  – Supervision: the data (observations, measurements, etc.) are labeled with pre-defined classes.
  – It is as if a “teacher” gives the classes (supervision).
  – Test data are classified into these classes too.
• The task is commonly called:
  – Supervised learning,
  – Classification, or
  – Inductive learning.
Supervised…
• Data: a set of data records (also called examples, instances or cases) described by
  – k attributes: A1, A2, …, Ak, and
  – a class: each example is labelled with a pre-defined class.
• Goal: to learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.
Example
• A credit card company receives thousands of applications for new cards.
• Each application contains information about an applicant:
  – age
  – marital status
  – annual salary
  – outstanding debts
  – credit rating
  – etc.
• Problem: to decide whether an application should be approved, i.e., to classify applications into two categories, approved and not approved.
Example…
(figure: a table of past loan application records, each labelled with its class: Yes (approved) or No (not approved))
Example…
• Learn a classification model from the data.
• Use the model to classify future loan applications into
  – Yes (approved) and
  – No (not approved).
• What is the class for the following case/instance?
Supervised learning process
• Two steps:
  – Learning (training): learn a model using the training data.
  – Testing: test the model using unseen test data to assess the model accuracy.
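
As a concrete illustration, here is a minimal sketch of the two steps using scikit-learn; the records, attributes and labels are hypothetical stand-ins for the loan application data, not values from the slides:

```python
# A minimal sketch of the two-step supervised learning process.
# Hypothetical records with attributes (age, has_job, own_house);
# class 1 = approved, 0 = not approved.
from sklearn.tree import DecisionTreeClassifier

train_X = [[25, 0, 1], [40, 1, 0], [35, 1, 1], [50, 0, 0], [45, 1, 1]]
train_y = [1, 0, 1, 0, 1]

# Step 1: learning (training) - fit a model to the labelled training data
model = DecisionTreeClassifier().fit(train_X, train_y)

# Step 2: testing - classify unseen cases with the learned model
test_X = [[30, 1, 1], [55, 0, 0]]
print(model.predict(test_X))   # predicted classes for the new cases
```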
What do we mean by learning?
• Given
– a data set D,
– a task T, and
– a performance measure M,
• A computer system is said to learn from D
to perform the task T if after learning the
system’s performance on T improves as
measured by M.
• In other words, the learned model helps the
system to perform T better as compared to
no learning.
Example
• Data: loan application data.
• Task: predict whether a loan should be approved or not.
• Performance measure: accuracy.
• No learning: classify all future applications (test data) to the majority class (i.e., Yes):
      Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
Fundamental assumption of learning
• Assumption: the distribution of training examples is identical to the distribution of test examples (including future unseen examples).
• In practice, this assumption is often violated to some degree.
• Strong violations will clearly result in poor classification accuracy.
• To achieve good accuracy on the test data, training examples must be sufficiently representative of the test data.
Evaluation Method
• Predictive accuracy:
      Accuracy = number of correctly classified test cases / total number of test cases
• Efficiency
  – time to construct the model
  – time to use the model
• Robustness: handling noise and missing values
• Scalability: efficiency for disk-resident databases
• Interpretability: the understandability of, and insight provided by, the model
• Compactness of the model: e.g., the number of rules.
Evaluation Method…
• Holdout set: the available data set D is divided into two disjoint subsets,
  – the training set Dtrain (for learning a model), and
  – the test set Dtest (for testing the model).
• Important: the training set should not be used in testing and the test set should not be used in learning.
  – An unseen test set provides an unbiased estimate of accuracy.
• The test set is also called the holdout set. (The examples in the original data set D are all labeled with classes.)
• This method is mainly used when the data set D is large.
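
A sketch of the holdout method with scikit-learn; the iris data set below is only a placeholder for a real data set D:

```python
# Holdout evaluation: split D into disjoint Dtrain and Dtest, learn on
# Dtrain only, and estimate accuracy on the unseen Dtest.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # placeholder for D
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)    # 70% train, 30% holdout

model = DecisionTreeClassifier().fit(X_train, y_train)
print("Holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```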
Evaluation Method…
• n-fold cross-validation:
  – The available data is partitioned into n equal-size disjoint subsets.
  – Each subset in turn is used as the test set, and the remaining n-1 subsets are combined as the training set to learn a classifier.
  – The procedure is run n times, which gives n accuracies.
  – The final estimated accuracy of learning is the average of the n accuracies.
  – 10-fold and 5-fold cross-validation are commonly used.
  – This method is used when the available data is not large.
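
A sketch of 10-fold cross-validation with scikit-learn; the iris data set is again only a placeholder:

```python
# n-fold cross-validation (n = 10): each fold serves once as the test
# set while the other 9 folds train the classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # placeholder data set
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print("Per-fold accuracies:", scores)
print("Estimated accuracy:", scores.mean())  # average of the n accuracies
```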
Evaluation Method…
• Leave-one-out cross-validation: this method is used when the data set is very small.
• It is a special case of cross-validation.
• Each fold of the cross-validation has only a single test example, and all the rest of the data is used in training.
• If the original data has m examples, this is m-fold cross-validation.
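
The same idea expressed with scikit-learn's LeaveOneOut splitter (feasible here only because the placeholder data set is small):

```python
# Leave-one-out cross-validation: m folds for m examples, each fold
# testing on a single held-out case.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # placeholder data set
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", scores.mean())
```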
Evaluation Method…
• Validation set: the available data is divided into three subsets,
  – a training set,
  – a validation set, and
  – a test set.
• A validation set is often used for estimating parameters in learning algorithms.
• In such cases, the parameter values that give the best accuracy on the validation set are used as the final parameter values.
• Cross-validation can be used for parameter estimation as well.
Evaluation Method…
• Precision and recall measures
  – Used in information retrieval and text classification.
  – We use a confusion matrix to introduce them:

                         Classified positive    Classified negative
        Actual positive         TP                     FN
        Actual negative         FP                     TN

  where TP, FN, FP and TN are the numbers of true positives, false negatives, false positives and true negatives, respectively.
Evaluation Method…

• Precision p is the number of correctly classified positive examples divided by the total number of examples that are classified as positive:
      p = TP / (TP + FP)
• Recall r is the number of correctly classified positive examples divided by the total number of actual positive examples in the test set:
      r = TP / (TP + FN)
Evaluation Method…
• It is hard to compare two classifiers using two measures.
  – The F1 score combines precision and recall into one measure:
        F1 = 2pr / (p + r)
  – F1 is the harmonic mean of p and r, and the harmonic mean of two numbers tends to be closer to the smaller of the two.
  – For the F1 value to be large, both p and r must be large.
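
A small sketch computing the three measures from hypothetical predictions, both directly from the confusion-matrix counts and with scikit-learn:

```python
# Precision, recall and F1 for a made-up set of test predictions.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # actual classes (1 = positive)
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # classifier output

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

p = tp / (tp + fp)           # precision
r = tp / (tp + fn)           # recall
f1 = 2 * p * r / (p + r)     # harmonic mean of p and r
print(p, r, f1)              # 0.75 0.75 0.75

# The same values via scikit-learn:
print(precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))
```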
Unsupervised Machine Learning
• The data have no target attribute.
  – We want to explore the data to find some intrinsic structure in them.
• Example: clustering.
Clustering
• Clustering is a technique for finding similarity groups in data, called clusters, i.e.,
  – it groups data instances that are similar to (near) each other into one cluster, and puts data instances that are very different from (far away from) each other into different clusters.
• Clustering is often called an unsupervised learning task because no class values denoting an a priori grouping of the data instances are given, as is the case in supervised learning.
Clustering…
• (figure: a 2-D scatter plot in which the data set has three natural groups of data points, i.e., 3 natural clusters)
What is clustering for?
• Let us see some real-life examples.
• Example 1: group people of similar sizes together to make “small”, “medium” and “large” T-shirts.
  – Tailor-made for each person: too expensive.
  – One-size-fits-all: does not fit all.
• Example 2: in marketing, segment customers according to their similarities
  – to do targeted marketing.
Aspects of clustering
• A clustering algorithm
  – Partitional clustering
  – Hierarchical clustering
• A distance (similarity, or dissimilarity) function
• Clustering quality
  – Inter-cluster distance ⇒ maximized
  – Intra-cluster distance ⇒ minimized
• The quality of a clustering result depends on the algorithm, the distance function, and the application.
K-means clustering
• K-means is a partitional clustering algorithm.
• Let the set of data points (or instances) D be {x1, x2, …, xn}, where xi = (xi1, xi2, …, xir) is a vector in a real-valued space X ⊆ R^r, and r is the number of attributes (dimensions) in the data.
• The k-means algorithm partitions the given data into k clusters.
  – Each cluster has a cluster center, called the centroid.
  – k is specified by the user.
K-means algorithm
• Given k, the k-means algorithm works as follows:
  1) Randomly choose k data points (seeds) to be the initial centroids (cluster centers).
  2) Assign each data point to the closest centroid.
  3) Re-compute the centroids using the current cluster memberships.
  4) If a convergence criterion is not met, go to 2).
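
A minimal numpy sketch of this loop, using Euclidean distance and assuming no cluster ever becomes empty; sklearn.cluster.KMeans is a production-quality alternative:

```python
# A bare-bones k-means, following the four steps above.
import numpy as np

def kmeans(X, k, max_iter=100):
    rng = np.random.default_rng(0)
    # 1) randomly choose k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2) assign each point to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3) re-compute each centroid as the mean of its current members
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4) convergence criterion: the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# three blobs of made-up 2-D points
X = np.vstack([np.random.randn(20, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(X, k=3)
print(centroids)
```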
K-means algorithm…
• (figures: a step-by-step illustration of the k-means iterations on an example data set)
Distance Function
• K-means commonly uses the Euclidean distance between a data point xi and a centroid cj:
      dist(xi, cj) = sqrt((xi1 - cj1)^2 + (xi2 - cj2)^2 + … + (xir - cjr)^2)
Strengths of k-means
• Strengths:
  – Simple: easy to understand and to implement.
  – Efficient: the time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
  – Since both k and t are small, k-means is considered a linear algorithm.
• K-means is the most popular clustering algorithm.
Weakness of k-means
• The algorithm is only applicable if the mean is defined.
  – For categorical data, k-modes is used instead; the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
  – Outliers are data points that are very far away from other data points.
  – Outliers could be errors in the data recording or some special data points with very different values.
Hierarchical Clustering
• Produces a nested sequence of clusters, a tree, also called a dendrogram.
• (figure: an example dendrogram)
Types of hierarchical clustering
• Agglomerative (bottom up) clustering: It builds the
dendrogram (tree) from the bottom level, and
– merges the most similar (or nearest) pair of clusters
– stops when all the data points are merged into a single
cluster (i.e., the root cluster).
• Divisive (top down) clustering: It starts with all data
points in one cluster, the root.
– Splits the root into a set of child clusters. Each child cluster
is recursively divided further
– stops when only singleton clusters of individual data points
remain, i.e., each cluster with only a single point
Agglomerative clustering
• It is more popular than divisive methods.
• At the beginning, each data point forms its own cluster (also called a node).
• Merge the nodes/clusters that have the least distance between them.
• Continue merging.
• Eventually all nodes belong to one cluster.
Agglomerative clustering algorithm
• (figures: pseudo-code of the algorithm and a step-by-step example of its working)
Measuring the distance of two clusters
• There are a few ways to measure the distance between two clusters, resulting in different variations of the algorithm:
  – Single link
  – Complete link
  – Average link
Single link method
• The distance between two clusters is the distance between the two closest data points in the two clusters, one data point from each cluster.
• It can find arbitrarily shaped clusters, but
  – it may cause the undesirable “chain effect” due to noisy points.
Complete link method
• The distance between two clusters is the distance between the two furthest data points in the two clusters.
• It is sensitive to outliers because they are far away.
Example
• Let’s examine how these linkage methods work, using a small, one-dimensional data set:
  – (figure: single-linkage agglomerative clustering on the sample data set)
  – (figure: complete-linkage agglomerative clustering on the sample data set)
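
A sketch of both linkage methods with scipy; the one-dimensional points are made up, since the slide's sample data set is not reproduced here:

```python
# Single vs. complete linkage on a hypothetical 1-D data set.
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1.0], [2.0], [5.0], [6.0], [12.0]])

# Each output row records one merge: (cluster i, cluster j,
# distance at which they merge, size of the new cluster).
print("single:\n", linkage(points, method="single"))
print("complete:\n", linkage(points, method="complete"))

# scipy.cluster.hierarchy.dendrogram(...) can draw the resulting tree.
```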


Reading Assignment
• Distance functions
– Euclidean distance
– Manhattan (city block) distance
– Minkowski distance
– Chebychev distance
– Cosine similarity
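
As a starting point for the reading, here is a short sketch of these distance functions on two example vectors (the values are arbitrary):

```python
# The distance functions listed above, written with only the stdlib.
import math

x = [1.0, 2.0, 3.0]
y = [4.0, 0.0, 3.0]

euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
manhattan = sum(abs(a - b) for a, b in zip(x, y))

def minkowski(p):
    # generalizes Euclidean (p = 2) and Manhattan (p = 1)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

chebychev = max(abs(a - b) for a, b in zip(x, y))

norm = lambda v: math.sqrt(sum(c * c for c in v))
cosine = sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))

print(euclidean, manhattan, minkowski(3), chebychev, cosine)
```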
Bayesian Network
Introduction
Suppose you are trying to determine if a patient has inhalational anthrax. You observe the following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty breathing
Introduction
You would like to determine how likely it is that the patient is infected with inhalational anthrax, given that the patient has a cough, a fever, and difficulty breathing.
• We are not 100% certain that the patient has anthrax because of these symptoms.
• We are dealing with uncertainty!
Introduction

• Now suppose you order an x-ray and observe that the patient has a wide mediastinum.
• Your belief that the patient is infected with inhalational anthrax is now much higher.
Introduction

• In the previous slides, what you observed affected your belief that the patient is infected with anthrax.
• This is called reasoning with uncertainty.
• Wouldn’t it be nice if we had some methodology for reasoning with uncertainty? Why, in fact, we do…
Bayesian Networks
(figure: a Bayesian network in which the node HasAnthrax is the parent of the nodes HasCough, HasFever, HasDifficultyBreathing and HasWideMediastinum)

• In the opinion of many AI researchers, Bayesian networks are the most significant contribution in AI in the last 10 years.
• They are used in many applications, e.g., spam filtering, speech recognition, robotics, diagnostic systems and even syndromic surveillance.
Probability Primer: Random Variables
• A random variable is the basic element of probability.
• It refers to an event, and there is some degree of uncertainty as to the outcome of the event.
• For example, the random variable A could be the event of getting heads on a coin flip.
Boolean Random Variables

• We will start with the simplest type of random variables: Boolean ones.
• They take the values true or false.
• Think of the event as occurring or not occurring.
• Examples (let A be a Boolean random variable):
  – A = Getting heads on a coin flip
  – A = It will rain today
  – A = The Cubs win the World Series in 2007
Probabilities
We will write P(A = true) to mean the probability that A = true.
What is probability? It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions.*

(figure: a region split into two areas, P(A = true) and P(A = false); the sum of the two areas is 1)

* Ahem… there’s also the Bayesian definition, which says probability is your degree of belief in an outcome.
Conditional Probability
• P(A = true | B = true) = out of all the outcomes in which B is true, how many also have A equal to true.
• Read this as: “Probability of A conditioned on B” or “Probability of A given B”.
• Example:
  – H = “Have a headache”
  – F = “Coming down with flu”
  – P(H = true) = 1/10
  – P(F = true) = 1/40
  – P(H = true | F = true) = 1/2
• “Headaches are rare and flu is rarer, but if you’re coming down with flu there’s a 50-50 chance you’ll have a headache.”

(figure: a Venn diagram of the events H = true and F = true)
The Joint Probability Distribution

• We will write P(A = true, B = true) to mean “the probability of A = true and B = true”.
• Notice that:
      P(H = true | F = true) = P(H = true, F = true) / P(F = true)
• In general, P(X | Y) = P(X, Y) / P(Y).
The Joint Probability Distribution
• Joint probabilities can be defined over any number of variables, e.g., P(A = true, B = true, C = true).
• For each combination of variables, we need to say how probable that combination is.
• The probabilities of these combinations need to sum to 1:

      A      B      C      P(A,B,C)
      false  false  false  0.1
      false  false  true   0.2
      false  true   false  0.05
      false  true   true   0.05
      true   false  false  0.3
      true   false  true   0.1
      true   true   false  0.05
      true   true   true   0.15
                           (sums to 1)
The Joint Probability Distribution
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C (using the same table as on the previous slide).
• Examples of things you can compute:
  – P(A = true) = sum of P(A,B,C) over the rows with A = true
  – P(A = true, B = true | C = true) = P(A = true, B = true, C = true) / P(C = true)
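
A small sketch of these computations, with the joint table from the slide stored as a Python dict:

```python
# The joint distribution from the slide, keyed by the (A, B, C) values.
joint = {
    (False, False, False): 0.10, (False, False, True): 0.20,
    (False, True,  False): 0.05, (False, True,  True): 0.05,
    (True,  False, False): 0.30, (True,  False, True): 0.10,
    (True,  True,  False): 0.05, (True,  True,  True): 0.15,
}

# P(A = true): sum the rows with A = true
p_a = sum(p for (a, b, c), p in joint.items() if a)
print("P(A=true) =", p_a)                                            # 0.6

# P(A = true, B = true | C = true) = P(A,B,C all true) / P(C = true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
print("P(A=true,B=true|C=true) =", joint[(True, True, True)] / p_c)  # 0.3
```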
The Problem with the Joint Distribution
• Lots of entries in the table to fill up!
• For k Boolean random variables, you need a table of size 2^k.
• How do we use fewer numbers? We need the concept of independence.
Independence

Variables A and B are independent if any of the following hold:
• P(A, B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
This says that knowing the outcome of A does not tell me anything new about the outcome of B.
Independence

How is independence useful?
• Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn).
• If the coin flips are not independent, you need 2^n values in the table.
• If the coin flips are independent, then
      P(C1, …, Cn) = P(C1) P(C2) … P(Cn)
  Each P(Ci) table has 2 entries, and there are n of them, for a total of only 2n values.
Conditional Independence

Variables A and B are conditionally independent given C if any of the following hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don’t gain anything by knowing A (either because A doesn’t influence B or because knowing C provides all the information knowing A would give).
A Bayesian Network
A Bayesian network is made up of:
1. A directed acyclic graph

   (figure: the DAG A → B, with B → C and B → D)

2. A set of tables, one for each node in the graph:

      A      P(A)         A      B      P(B|A)
      false  0.6          false  false  0.01
      true   0.4          false  true   0.99
                          true   false  0.7
                          true   true   0.3

      B      C      P(C|B)       B      D      P(D|B)
      false  false  0.4          false  false  0.02
      false  true   0.6          false  true   0.98
      true   false  0.9          true   false  0.05
      true   true   0.1          true   true   0.95
A Directed Acyclic Graph

• Each node in the graph is a random variable.
• A node X is a parent of another node Y if there is an arrow from node X to node Y, e.g., A is a parent of B.

   (figure: the DAG A → B, with B → C and B → D)

• Informally, an arrow from node X to node Y means X has a direct influence on Y.
A Set of Tables for Each Node
• Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
• The parameters are the probabilities in these conditional probability tables (CPTs).

   (figure: the DAG A → B, B → C, B → D, annotated with the four CPTs shown on the previous slide)
A Set of Tables for Each Node
Conditional probability distribution for C given B:

      B      C      P(C|B)
      false  false  0.4
      false  true   0.6
      true   false  0.9
      true   true   0.1

• For a given combination of values of the parents (B in this example), the entries for P(C = true | B) and P(C = false | B) must add up to 1, e.g., P(C = true | B = false) + P(C = false | B = false) = 1.
• If you have a Boolean variable with k Boolean parents, this table has 2^(k+1) entries, of which 2^k are independent (each row pair must sum to 1).
Bayesian Networks
Two important properties:
1. Encodes the conditional independence relationships between the variables in the graph structure.
2. Is a compact representation of the joint probability distribution over the variables.
Conditional Independence
The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2).

(figure: a DAG in which P1 and P2 are parents of X, C1 and C2 are children of X, and ND1 and ND2 are non-descendants of X)
The Joint Probability Distribution

Due to the Markov condition, we can compute the joint probability distribution over all the variables X1, …, Xn in the Bayesian net using the formula:

      P(X1 = x1, …, Xn = xn) = Π i=1..n P(Xi = xi | Parents(Xi))

where Parents(Xi) means the values of the parents of the node Xi with respect to the graph.
Using a Bayesian Network Example
Using the network in the example, suppose you want to calculate:

      P(A = true, B = true, C = true, D = true)
      = P(A = true) * P(B = true | A = true) *
        P(C = true | B = true) * P(D = true | B = true)
      = (0.4) * (0.3) * (0.1) * (0.95)
      = 0.0114

• The factorization comes from the graph structure; the numbers come from the conditional probability tables.

(figure: the DAG A → B, with B → C and B → D)
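
A sketch of this chain-rule computation, with the example network's CPTs stored as plain dicts:

```python
# The CPTs from the example network (keys give the conditioning order).
p_A = {True: 0.4, False: 0.6}
p_B_given_A = {(True, True): 0.3,  (True, False): 0.7,
               (False, True): 0.99, (False, False): 0.01}  # key: (A, B)
p_C_given_B = {(True, True): 0.1,  (True, False): 0.9,
               (False, True): 0.6,  (False, False): 0.4}   # key: (B, C)
p_D_given_B = {(True, True): 0.95, (True, False): 0.05,
               (False, True): 0.98, (False, False): 0.02}  # key: (B, D)

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(a) P(b|a) P(c|b) P(d|b)."""
    return p_A[a] * p_B_given_A[(a, b)] * p_C_given_B[(b, c)] * p_D_given_B[(b, d)]

print(joint(True, True, True, True))   # 0.4 * 0.3 * 0.1 * 0.95 = 0.0114
```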
Inference

• Using a Bayesian network to compute probabilities is called inference.
• In general, inference involves queries of the form:
      P( X | E )
  where X is the query variable(s) and E is the evidence variable(s).
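
A sketch of exact inference by enumeration on the example A, B, C, D network (the CPTs repeat the earlier tables); variables that are neither query nor evidence are summed out:

```python
# Exact inference by enumeration over the full joint distribution.
from itertools import product

P_A = {True: 0.4, False: 0.6}
P_B = {(True, True): 0.3, (True, False): 0.7, (False, True): 0.99, (False, False): 0.01}
P_C = {(True, True): 0.1, (True, False): 0.9, (False, True): 0.6, (False, False): 0.4}
P_D = {(True, True): 0.95, (True, False): 0.05, (False, True): 0.98, (False, False): 0.02}

def joint(a, b, c, d):
    return P_A[a] * P_B[(a, b)] * P_C[(b, c)] * P_D[(b, d)]

def query(X, E):
    """P(X | E); X and E are dicts such as {'A': True}."""
    num = den = 0.0
    for vals in product([True, False], repeat=4):
        assign = dict(zip('ABCD', vals))
        if all(assign[v] == t for v, t in E.items()):
            p = joint(*vals)
            den += p                                    # consistent with E
            if all(assign[v] == t for v, t in X.items()):
                num += p                                # ... and with X too
    return num / den

# e.g. P(A = true | C = true); B and D are unobserved and summed out
print(query({'A': True}, {'C': True}))
```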
Inference
(figure: the anthrax Bayesian network, with HasAnthrax as the parent of HasCough, HasFever, HasDifficultyBreathing and HasWideMediastinum)

• An example of a query would be:
      P( HasAnthrax = true | HasFever = true, HasCough = true )
• Note: even though HasDifficultyBreathing and HasWideMediastinum are in the Bayesian network, they are not given values in the query (i.e., they do not appear either as query variables or evidence variables).
• They are treated as unobserved variables.
The Bad News
• Exact inference is feasible in small to medium-sized networks.
• Exact inference in large networks takes a very long time.
• We resort to approximate inference techniques, which are much faster and give pretty good results.
One last unresolved issue…
We still haven’t said where we get the Bayesian network from. There are two options:
• Get an expert to design it.
• Learn it from data.
Assignment-1
1. Write a Python program that reads two paragraphs of a document and then:
   – segments them into a list of sentences and writes the sentences to a secondary storage device, and
   – segments these sentences into words and writes the words to a secondary storage device.
• Note:
   – The list of words should be converted to lower case and freed from any numbers and punctuation marks. (One possible starting point is sketched below.)
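
A minimal sketch using NLTK's tokenizers; "input.txt" and the output file names are placeholders, and the punkt tokenizer models must be downloaded once with nltk.download('punkt'):

```python
# Segment a text into sentences and cleaned, lower-cased words.
import re
import nltk

text = open("input.txt", encoding="utf-8").read()

sentences = nltk.sent_tokenize(text)
with open("sentences.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sentences))

words = []
for s in sentences:
    for w in nltk.word_tokenize(s.lower()):
        w = re.sub(r"[^a-z]", "", w)   # drop numbers and punctuation
        if w:
            words.append(w)

with open("words.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(words))
```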
Assignment -2
2.
   – Take some paragraphs from any source.
   – Write a Python program that displays collocations, i.e., words that frequently occur together. (A possible starting point is sketched below.)
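
A minimal sketch using NLTK's collocation finder; "input.txt" is a placeholder:

```python
# Find the top bigram collocations in a text, ranked by PMI.
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = open("input.txt", encoding="utf-8").read()
words = [w.lower() for w in nltk.word_tokenize(text) if w.isalpha()]

finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(2)               # ignore pairs seen only once

for pair in finder.nbest(BigramAssocMeasures.pmi, 10):
    print(" ".join(pair))
```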

Submission Date: April 8, 2014
