HPA 1
Kennesaw State University Computer Science
Natural Language Processing
Homework and Programming Assignment 1
Total Points: 100
Deadline: Sep 9, 2023
2. [Points 10] Determine the number of tokens, the vocabulary size, and the number of
types in the text below. Please list them in your answer as well.
Text: “I came in in the middle of this film so I had no idea about any credits or even its
title till I looked it up here, where I see that it has received a mixed reception by your
commentators. I'm on the positive side regarding this film but one thing really caught my
attention as I watched: the beautiful and sensitive score written in a Coplandesque
Americana style. My surprise was great when I discovered the score to have been written
by none other than John Williams himself.”
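As a hedged illustration of how tokens and types can be counted, here is a minimal Python sketch. The tokenization convention is an assumption, not something the assignment prescribes: text is lowercased, a token is a run of letters or digits with optional internal apostrophes, and punctuation is dropped. The function name `count_tokens_and_types` and the sample sentence are hypothetical. A different convention (keeping case, counting punctuation) gives different numbers.

```python
import re
from collections import Counter

def count_tokens_and_types(text, lowercase=True):
    """Return (number of tokens, number of types, per-type counts)."""
    if lowercase:
        text = text.lower()
    # Assumed convention: a token is a run of letters/digits, optionally
    # with internal apostrophes (e.g. "i'm"); punctuation is discarded.
    tokens = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)
    type_counts = Counter(tokens)
    return len(tokens), len(type_counts), type_counts

# Hypothetical sample sentence (not the assignment text).
n_tokens, n_types, counts = count_tokens_and_types("The cat saw the other cat.")
print(n_tokens, n_types)        # 6 tokens, 4 types
print(sorted(counts.items()))   # each type with its frequency
```

Here the tokens are every running word, while the types are the distinct word forms, so "the" and "cat" are each counted twice as tokens but once as types.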
3. [Points 10] Write down all the steps of text normalization and give an example for each
step.
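For orientation only, the sketch below chains a few steps that are commonly grouped under text normalization: sentence segmentation, tokenization, case folding, and stemming. This breakdown is an assumption and the list used in the course may differ. All function names are hypothetical, the rules are deliberately naive, and `toy_stem` is a crude suffix stripper, not the Porter algorithm.

```python
import re

def segment_sentences(text):
    # Naive rule: split at whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words (with optional internal apostrophes) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)

def case_fold(tokens):
    # Map every token to lowercase.
    return [t.lower() for t in tokens]

def toy_stem(token):
    # Hypothetical toy stemmer: strips a few suffixes from longer tokens.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Hypothetical sample text (not taken from the assignment).
text = "The dogs barked. They were chasing cats!"
for sentence in segment_sentences(text):
    tokens = case_fold(tokenize(sentence))
    print([toy_stem(t) for t in tokens])
```

The toy stemmer over-strips some words (for example "chasing" becomes "chas"), which is typical of rule-based stemming and one reason lemmatization is sometimes preferred.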
4. [Points 30] We know how to compute the distance between two given strings
using the edit distance algorithm.
a. [Points 20] Please write down the distance matrix for the following strings.
Consider a space “ ” as a single character.
b. [Points 10] List all the operations you need to perform. Please show the
backtrace matrix to validate your answer for the above example strings.
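The following is a minimal sketch of minimum edit distance with a backtrace, written in Python. It is shown on a hypothetical string pair because the assignment's strings are not reproduced here. The costs are assumptions: insertion and deletion cost 1, and the substitution cost is a parameter (`sub_cost=1` gives Levenshtein distance, `sub_cost=2` is another common textbook convention). Use whichever costs the course specifies. A space is an ordinary character in a Python string, so it is handled like any other symbol.

```python
def edit_distance(source, target, sub_cost=1):
    """Return (distance matrix, operation list) for turning source into target."""
    n, m = len(source), len(target)
    # dist[i][j] = minimum cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                               # deletion
                dist[i][j - 1] + 1,                               # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),   # match/substitution
            )
    # Backtrace from the bottom-right cell to the top-left cell.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = source[i - 1] == target[j - 1]
            if dist[i][j] == dist[i - 1][j - 1] + (0 if same else sub_cost):
                ops.append(("match" if same else "substitute",
                            source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", target[j - 1]))
            j -= 1
    ops.reverse()
    return dist, ops

# Hypothetical example pair (not the assignment's strings).
dist, ops = edit_distance("kitten", "sitting")
for row in dist:
    print(row)
print(dist[-1][-1], [op for op in ops if op[0] != "match"])
```

When several paths have the same cost, this backtrace prefers the diagonal move, so it returns one optimal alignment out of possibly many.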
5. [Points 25] Please formulate a character-level language model for the following text.
Treat each character as a single word when formulating your language model. Show the
details of your LM formulation.
Training Text: “aaaa bbb aaa bbb ababab acacac cacacad ccca dcdcdccdddccc cbbcbccb
acac bdbdbd dbdbdb dadaaddadadddaaa ddd ccc bbb cdcdcdcd ccddcd dcdcdcdc”
Testing text: aabcacddbcbbdaadda
a. Unigram language model [Points 15]
b. Compute the perplexity of your model on the testing text [Points 10]
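A minimal sketch of a character-level unigram model and its perplexity is given below, using hypothetical toy strings rather than the assignment data. It assumes unsmoothed maximum-likelihood estimates and counts every character in the training string, spaces included. If spaces are meant only as separators, strip them first. The function names `train_unigram` and `perplexity` are illustrative.

```python
import math
from collections import Counter

def train_unigram(text):
    """Maximum-likelihood unigram model: P(c) = count(c) / total characters."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()}

def perplexity(model, test_text):
    """Perplexity of test_text under the unigram model (no smoothing)."""
    log_prob = 0.0
    for ch in test_text:
        p = model.get(ch, 0.0)
        if p == 0.0:
            # An unseen character has zero probability without smoothing.
            return math.inf
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(test_text))

# Hypothetical toy data: P(a) = 2/3, P(b) = 1/3.
model = train_unigram("aab")
print(model)
print(perplexity(model, "ab"))
```

Working in log space avoids numerical underflow when the test string is long, since the product of many small probabilities is replaced by a sum of logarithms.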
6. [Points 15] You are given a training set of 30 numbers that consists of 21 zeros and 1
each of the other digits 1-9. Now we see the following test set: 0 0 0 0 0 3 0 0 0 0. What
is the unigram perplexity?
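For reference, the general form of unigram perplexity can be written as follows. This assumes unsmoothed maximum-likelihood estimates from the training counts. The symbols $c_{\text{train}}$, $c_{\text{test}}$, $N_{\text{train}}$ and $N$ are notation introduced here, not taken from the assignment.

```latex
P(d) = \frac{c_{\text{train}}(d)}{N_{\text{train}}}, \qquad
\mathrm{PP}(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}}
             = \left( \prod_{i=1}^{N} P(w_i) \right)^{-\frac{1}{N}}
             = \exp\!\left( -\frac{1}{N} \sum_{d} c_{\text{test}}(d) \, \ln P(d) \right)
```

Here $N$ is the number of items in the test set, and the last form groups identical test items so each distinct digit contributes its log probability weighted by how often it occurs.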
Submission Instructions:
Important.
Late submission or Extension: Late homework/assignments will not be accepted unless an extension is approved
by me in advance. Requests for extensions must be made at least three days before the due date with a valid reason. 3
points will be deducted from your grade for each day after the submission deadline, even if you are approved for
an extension. For more details, please see the Homework and Exam Policies section of your syllabus.