
Midterm practice questions

[with partial solutions]

UMass CS 485, Fall 2023


This version: 11/2

1 Midterm information
The midterm will be in-class during the normal time, on Tuesday 11/7. It’s closed-book,
except you can bring a “cheat sheet,” one page of notes (front and back is fine) that you
write for yourself. Typically the act of writing the notes can be a useful study aid!

2 Topics on the midterm


Anything covered in class, readings, homeworks, or exercises can be on the midterm.
We’re more likely to focus on topics covered in class or referred to in class.
We recommend going through all exercises as practice problems—do them completely,
if you haven’t before.
Topics include the following:

Language concepts

• Regular expressions
• Text normalization, tokenization

Probability, language modeling, classification

• Relative frequency estimation and pseudocount smoothing


• N-gram (Markov) language models
• Naive Bayes
• Logistic regression, binary classification
• Logistic regression, multiclass classification
• Note: gradient derivations will not be included

Evaluation / Annotation

• Classification evaluation metrics: false positives/negatives, precision, recall, F1


• Annotator agreement rates

Syntax / Linguistic structure

• Part of speech tags


• BIO tagging
• Constituency grammars and parses
• Dependency parses
• Parsing (CKY)

3 Classification
Question 3.1. Consider training and predicting with a naive Bayes classifier for two doc-
ument classes, and without pseudocounts. The word “booyah” appears once for class 1,
and never for class 0. When predicting on new data, if the classifier sees “booyah”, what
is the posterior probability of class 1?
[Solution: 1]
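A sketch of why this holds (not part of the original solution): without pseudocounts, P(booyah | y = 0) = 0, so the class 0 term P(x | y = 0)P(y = 0) in Bayes rule is zero for any document containing "booyah". The posterior is then P(x | y = 1)P(y = 1) / (P(x | y = 1)P(y = 1) + 0) = 1.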

Question 3.2. For a probabilistic classifier for a binary classification problem, consider
the prediction rule to predict class 1 if P (y = 1|x) > t, and predict class 0 otherwise. This
assumes some threshold t is set. If the threshold t is increased,

(a) Does precision tend to increase, decrease, or stay the same? [Solution: increase]

(b) Does recall tend to increase, decrease, or stay the same? [Solution: decrease]

Classification example
Here’s a naive Bayes model with the following conditional probability table (each row is
that class’s unigram language model):

word type        a      b      c
P(w | y = 1)     5/10   3/10   2/10
P(w | y = 0)     2/10   2/10   6/10

and the following prior probabilities over classes:


P(y = 1)   P(y = 0)
8/10       2/10

Naive Bayes
Consider a binary classification problem, for whether a document is about the end of the
world (class y = 1), or it is not about the end of the world (class y = 0).

Question 3.3. Consider a document consisting of 2 a’s, and 1 c.


Note: On this practice exam and on the midterm, you do not need to convert to decimal or
simplify fractions. You may find it easier to not simplify the fractions. On the midterm,
we will not penalize simple arithmetic errors. Please show your work.

(a) What is the probability that it is about the end of the world?

(b) What is the probability it is not about the end of the world?
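A sketch of how one might check the arithmetic in Python (not an official solution; the numbers come from the tables above):

# Naive Bayes posterior for a document with 2 a's and 1 c (joint = prior * likelihood).
p_w_y1 = {"a": 5/10, "b": 3/10, "c": 2/10}
p_w_y0 = {"a": 2/10, "b": 2/10, "c": 6/10}
prior = {1: 8/10, 0: 2/10}

joint1 = prior[1] * p_w_y1["a"] ** 2 * p_w_y1["c"]   # = 0.8 * (5/10)^2 * (2/10) = 0.04
joint0 = prior[0] * p_w_y0["a"] ** 2 * p_w_y0["c"]   # = 0.2 * (2/10)^2 * (6/10) = 0.0048
posterior1 = joint1 / (joint1 + joint0)              # (a): about 0.893
posterior0 = 1 - posterior1                          # (b): about 0.107
print(posterior1, posterior0)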

Question 3.4. Now suppose that we know the document is about the end of the world
(y = 1).

(a) True or False, the naive Bayes model is able to tell us the probability of seeing the
document w = (a, a, b, c) under the model.

(b) If True, what is the probability?
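A sketch of one possible answer (not an official solution): True; naive Bayes is a generative model, so under the usual convention of multiplying per-token probabilities, P(w = (a, a, b, c) | y = 1) = (5/10)(5/10)(3/10)(2/10) = 0.015.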

Logistic Regression
Consider a logistic regression model for this same problem (y = 1 means the document is
about the end of the world), with three features. The model has weights β = (0.5, 0.25, 1).
Note: for this problem you will be exponentiating certain quantities. You do not need to write
out your answer as a number, but instead in terms of exp() values, e.g., P = 1 + 2exp(−1).

Question 3.5. A given document has feature vector x = (1, 4, 0).

(a) What is the probability that the document is about the end of the world ?

(b) What is the probability that it is not about the end of the world?
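A sketch of the corresponding computation in Python (not an official solution; assumes no bias term beyond the three weights given):

import math

beta = [0.5, 0.25, 1.0]
x = [1.0, 4.0, 0.0]

z = sum(b * xi for b, xi in zip(beta, x))   # 0.5*1 + 0.25*4 + 1*0 = 1.5
p_end_of_world = 1 / (1 + math.exp(-z))     # (a): 1/(1 + exp(-1.5))
p_not = 1 - p_end_of_world                  # (b)
print(z, p_end_of_world, p_not)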

Question 3.6. Now suppose that we know the document is about the end of the world (y = 1).

(a) True or False, the logistic regression model is able to tell us the probability of seeing
x = (1, 1, 2) under the model.

(b) If True, what is the probability? (again, answer in terms of exp() values).
Question 3.7. Consider a logistic regression model with weights β = (β1 , β2 , β3 ). A given
document has feature vector x = (1, −2, −1).

1. What is a value of the vector β such that the probability of the document being
about the end of the world is 1 (or incredibly close)? [Solution: (1000, 0, 0) will do. It
causes a large z = β·x = 1000, and thus p(y = 1 | x) = 1/(1 + exp(−1000)) ≈ 1, since
exp(−1000) is really tiny!]

2. What is a value of the vector β such that the probability of the document being about
the end of the world is 0 (or incredibly close)? [Solution: (−1000, 0, 0) will do.]

Question 3.8. Show the two standard definitions of the logistic sigmoid function are
equivalent.
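One way to see it (a sketch): multiply the numerator and denominator of 1/(1 + exp(−z)) by exp(z); this gives exp(z)/(exp(z) + 1), which is the other standard form, so the two definitions agree for every z.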

Question 3.9. In Naive Bayes, if you increase the pseudocount hyperparameter, does your
model tend to underfit or overfit more? [Solution: underfit]

Question 3.10. In logistic regression, if you increase the L2 norm regularization hyper-
parameter, does your model tend to underfit or overfit more? [Solution: underfit. This
is referring to the λ when the learning problem is penalized likelihood learning, either
maxβ loglik(β) − λ Σj βj² or minβ −loglik(β) + λ Σj βj², where loglik(β) = Σi log pβ(yi | xi).]

4 Evaluation
Question 4.1. INLP chapter 4, Exercise 2 [Note: too hard to be a test question]

Question 4.2. INLP chapter 4, Exercise 3

5 Misc
Question 5.1. Please write a regular expression to match any word that has 3 or more
instances of the same vowel in a row, like sooooo or haaaaha. (Assume there are 5 vowels:
a, e, i, o, u.)
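One possible pattern, sketched in Python (not an official solution; assumes lowercase input and that a "word" is a \w+ token):

import re

# Matches a word containing the same vowel 3 or more times in a row:
# a captured vowel followed by at least two more copies of itself.
pattern = re.compile(r"\b\w*([aeiou])\1{2,}\w*\b")

for w in ["sooooo", "haaaaha", "soon", "hello"]:
    print(w, bool(pattern.search(w)))
# sooooo True, haaaaha True, soon False, hello False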

Question 5.2. Consider training a supervised document classifier for sentiment, and com-
pare it to a lexicon counting classifier. If you have a very low number of labeled docu-
ments, which model do you expect to be better? If you have a very high number of labeled
documents, which model do you expect to be better? Why?
6 Text preprocessing
Question 6.1. What is the difference between tokenization and word normalization? (Not
vector norms or probability normalization.) Please list a few examples of word normal-
ization.

[Solution: Tokenization refers to breaking text into word-sized pieces. Strictly speak-
ing, tokenization preserves the original string of each word. Word normalization refers to
processing to collapse different tokens into the same word type: for example, lowercas-
ing, replacing numbers with a special symbol, stemming, lemmatization, etc.]
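A minimal sketch of a couple of these normalization steps in Python (illustrative only; assumes tokens have already been produced by whitespace tokenization):

import re

def normalize(token):
    token = token.lower()                               # lowercasing
    token = re.sub(r"^\d+([.,]\d+)*$", "<num>", token)  # map numbers to a special symbol
    return token

print([normalize(t) for t in "The Dow rose 1,000 points".split()])
# ['the', 'dow', 'rose', '<num>', 'points']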

Question 6.2. Why is word normalization used?

[Solution: To reduce sparsity]

Question 6.3. (A) What is the difference between lemmatization and stemming? (B) Give
a justification why lemmatization may be preferred to stemming. (C) Give a justification
why stemming may be preferred to lemmatization.

[Solution: Stemming refers to any algorithm that attempts to remove affixes from a
word. Lemmatization refers to a smarter stemmer that uses grammatical information to
help determine the root word form. Lemmatizers are much more accurate (just look at the
output of the Porter stemmer!), but they require more resources, such as a part of speech
tagger, and can be slower to run.]
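A small sketch contrasting the two in Python, assuming NLTK is installed (the WordNet lemmatizer additionally needs its wordnet data downloaded):

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "running", "better"]:
    # The stemmer chops affixes heuristically; the lemmatizer uses grammatical
    # information (here, treating each word as a verb) to find a dictionary form.
    print(word, stemmer.stem(word), lemmatizer.lemmatize(word, pos="v"))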

Question 6.4. What’s a pro and a con of using n-gram features, as opposed to bag-of-
words features, for a classifier?

[Solution: It’s the usual sparsity tradeoff you see all over NLP. N-grams have more
specific meaning so you get stronger information (e.g. “not good” or “social security” are
much different than the unigram features they produce), but at the cost of sparsity (the
new features may be rare, so there are fewer examples to train on, and they may not
occur much at runtime or in test data).]
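A tiny illustration of the kind of features involved, in plain Python (not tied to any particular library):

tokens = "this movie was not good".split()
unigrams = tokens
bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
print(unigrams)  # ['this', 'movie', 'was', 'not', 'good']
print(bigrams)   # ['this movie', 'movie was', 'was not', 'not good']
# "not good" carries sentiment that neither "not" nor "good" does alone,
# but it will also be much rarer than either unigram.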

7 More questions
Question 7.1. What’s a useful thing you can calculate with NB, without having to calcu-
late p(x)? [Solution: most likely MAP class. or, prob of text given a label.]

Question 7.2. What’s a useful thing you can calculate with NB, but requires you to calcu-
late p(x)? [Solution: probability of a particular class. probability of classes.]

Question 7.3. For Naive Bayes with many classes, consider the case where you only care
about the ratio between the posterior probabilities for two classes, say, class C and D.
Demonstrate (and show your work) that you do not need to calculate the Bayes Rule
normalizer p(x) to calculate this posterior ratio, p(y = C | x) / p(y = D | x).
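A sketch of the key step (not an official solution): by Bayes rule, p(y = C | x) = p(x | y = C) p(y = C) / p(x), and similarly for class D, so the p(x) terms cancel in the ratio:

p(y = C | x) / p(y = D | x) = [p(x | y = C) p(y = C)] / [p(x | y = D) p(y = D)].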
Question 7.4. Say you have NB for a binary classification problem. You retrain the
model lots of times, and each time you make the pseudocount hyperparameter higher
and higher. With each model you do predictions on new data. What happens to Naive
Bayes predicted document posteriors as the pseudocount goes higher? [HINT: you can
just do this intuitively. It may help to focus on the P (w|y) terms. A rigorous, if overkill,
approach, is to use L’Hospital’s rule.]
(a) They all become either 0 or 1.
(b) They all become 0.5.
(c) They all become the class prior.
(d) There is no stable trend in all situations.
[Solution: They all become the prior. The easy way to see this is to imagine a giant alpha,
like a million or a zillion. For any word w,

p(w|y) = (nw,y + α) / (ny + V α) = (nw,y + 1,000,000) / (ny + V · 1,000,000) → α / (V α) = 1/V

where nw,y is the number of tokens among doc class y that are wordtype w, and ny is the
number of tokens for doc class y. Those two numbers are dominated by the giant α, which
causes all words to have the same uniform probability.]
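A quick numeric check of this in Python (hypothetical counts, chosen only for illustration):

n_wy, n_y, V = 3, 50, 4   # made-up word count, class token count, vocabulary size
for alpha in [0.1, 1, 100, 1_000_000]:
    print(alpha, (n_wy + alpha) / (n_y + V * alpha))
# the smoothed estimate approaches 1/V = 0.25 as alpha grows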
Question 7.5. In the typical case for English, how does the number of parameters com-
pare between BOW versus a model where features are counts of character 10-grams?
(a) BOW has more parameters than character 10-grams
(b) Character 10-grams has more parameters than BOW
(c) They are the same
[Solution: Char 10-grams has more. While some English words are longer than length
10, the average is less, and common function words are often very short. For example, the
single character 10-gram "I had a fe" already spans more than 3 word tokens.]
Question 7.6. What is an issue if you want to apply BOW to Chinese documents?
[Solution: Word segmentation—it’s a nontrivial processing step in itself. This is why
character n-grams are a popular feature method for languages that don’t use whitespace
tokenization conventions, including Chinese, Japanese, and Korean.]
Question 7.7. Consider an annotation task with 5 items and 2 annotators, for binary clas-
sification. Both annotators annotated all items. Draw up a 5×2 matrix of their annotations
and fill in any values you like, as long as agreement is less than 100%. For all the follow-
ing questions, show your work. (A) What is the agreement rate? (B) What is the random
chance agreement rate? (Use the overall prevalence of classes among all annotations.) (C)
Calculate Cohen’s kappa.
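A sketch of the calculation in Python for one hypothetical pair of annotators (not an official solution; any matrix with less than 100% agreement works):

ann_a = [1, 1, 0, 0, 1]
ann_b = [1, 0, 0, 1, 1]
n = len(ann_a)

# (A) observed agreement rate
agreement = sum(a == b for a, b in zip(ann_a, ann_b)) / n   # 3/5 = 0.6

# (B) chance agreement from the overall prevalence of each class among all 2n annotations
labels = ann_a + ann_b
p1 = labels.count(1) / len(labels)                          # 0.6
p0 = labels.count(0) / len(labels)                          # 0.4
chance = p1 ** 2 + p0 ** 2                                  # 0.52

# (C) Cohen's kappa
kappa = (agreement - chance) / (1 - chance)                 # 0.08 / 0.48 ≈ 0.167
print(agreement, chance, kappa)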

Question 7.8. What is the range of possible values of Cohen’s kappa?

Question 7.9. Give an example of a task where Cohen’s kappa might be high, and one
where it might be low. Why the difference?

8 Syntax
Question 8.1. Constituency and dependency trees focus on different aspects of a sen-
tence’s syntactic structure. What does a constituency tree focus on? What does a depen-
dency tree focus on?

Question 8.2. Draw a lexicalized constituency tree for an example sentence. (For exam-
ple, use Figure 12.11 in SLP3.) Draw the unlabeled dependency tree it corresponds to.

Question 8.3. Give a simple CFG that can parse the following POS-tagged sentence, with
an analysis conforming to the standard type of grammar used in your readings (broadly
similar to the Penn Treebank style). You can exclude unary expansions from POS tags to
the lexicon; assume POS tags are given to the parser as input. Draw the corresponding
parse tree.
(PRP I) (VB run) (ADJ fast)

Question 8.4. Amend your CFG so it can also parse this sentence. Draw the correspond-
ing parse tree.
(PRP I) (VB run) (ADJ fast) (IN on) (NNPS Mondays)

Question 8.5. CFGs: Eisenstein INLP textbook questions 9.8, 9.9 (pg. 233, pdf pg. 241)

Question 8.6. CFG parsing: Eisenstein INLP textbook questions 10.1–10.4 (pg. 253, pdf
pg. 271)

Question 8.7. PCFGs: Eisenstein INLP textbook question 10.7, and possibly 10.8 (pg. 254,
pdf pg. 272).
