Exam Preparation Questions PCL I 2022

The document provides questions for an exam on natural language processing techniques. It includes questions about UNIX commands, regular expressions, Python programming concepts like lists and dictionaries, and NLTK functions. The exam covers material from lectures on introductory topics through more advanced NLP modeling techniques.


Exam Preparation Questions

Programmiertechniken der Computerlinguistik I (PCL-I)


Universität Zürich Fall Semester 2022
Simon Clematide, Martin Volk 19.12.2022

The subject matter includes the chapters of the NLTK book (up to and including the
specified chapters): 1.1-1.4, 2.1-2.5, 3.1-3.6, 4.1-4.2.

Lecture 1 (MV): Introduction to UNIX Commands


1. What does the UNIX command 'wc' do?
2. What is the UNIX command to concatenate the 3 text files a.txt, b.txt, c.txt and
store the combined text in the file result.txt?
3. The file my_text.txt contains a verticalized word list (one word per line). Which
command combination computes an alphabetically sorted list of all word types in
this file?
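
As a rough Python counterpart to questions 1 and 3, the sketch below counts lines, words and characters the way `wc -l`, `wc -w` and `wc -m` do, and builds the sorted list of word types that `sort -u my_text.txt` (or `sort my_text.txt | uniq`) produces on the command line. The function names and the sample text are invented for illustration. Note that plain `wc` reports bytes in its third column, whereas `len()` on a Python `str` counts characters.

```python
def wc(text):
    """Return (lines, words, characters), roughly like wc -l, wc -w and wc -m."""
    lines = text.count('\n')      # wc -l counts newline characters
    words = len(text.split())     # whitespace-separated words
    chars = len(text)             # characters (not bytes) of a str
    return lines, words, chars


def sorted_types(text):
    """Return the alphabetically sorted word types of a one-word-per-line text."""
    return sorted({line for line in text.splitlines() if line})


sample = 'der\nHund\nder\nKatze\n'
print(wc(sample))            # (4, 4, 19)
print(sorted_types(sample))  # ['Hund', 'Katze', 'der']
```

Python's `sorted()` compares code points, so capitalized words come first, as with `LC_ALL=C sort`; `sort` under a language-specific locale may interleave upper- and lowercase words.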

Lecture 2 (MV): The 'grep' command and regular expressions


1. What is the parameter for 'grep' to display the two lines before each matching line as
context?
2. What is the regular expression to find all words beginning with 'ge' and ending
with 't'?
3. What is the regular expression to find all Part-of-Speech tags for personal
pronouns (PPER) and relative pronouns (PRELS and PRELAT) in a file in
3-column format (token, PoS, lemma)?
4. What is the regular expression to find all words in which the same letter occurs three
times in immediate succession?
5. What is the difference between a wildcard search and a regular expression search?
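
Python's `re` module uses a Perl-style syntax close to that of `grep -P`, so candidate patterns for questions 2 to 5 can be tried out directly. The sketch below shows one possible pattern per question under these assumptions: words consist of `\w` characters, the 3-column lines are tab-separated, and the sample words and lines are invented. The last lines contrast a shell-style wildcard (via the standard module `fnmatch`) with the corresponding regular expression.

```python
import fnmatch
import re

text = 'Er hat gesagt, das Gebet sei gerecht und geht schnell.'
# Words beginning with 'ge' and ending with 't' (case-sensitive, \b = word boundary)
ge_t = re.findall(r'\bge\w*t\b', text)

lines = ['Er\tPPER\ter', 'der\tPRELS\tder', 'dessen\tPRELAT\tdessen', 'Haus\tNN\tHaus']
# PoS tags PPER, PRELS, PRELAT in the middle (tab-delimited) column
pronoun_lines = [line for line in lines if re.search(r'\t(PPER|PRELS|PRELAT)\t', line)]

words = ['Schifffahrt', 'Kaffee', 'Seeelefant', 'Baum']
# Same character three times in a row: capture one character, then repeat it twice via \1
triples = [w for w in words if re.search(r'(\w)\1\1', w)]

# Wildcard 'ge*t' (any string in between) vs. the regex 'ge.*t' matched against the whole string
wildcard_hit = fnmatch.fnmatchcase('gesagt', 'ge*t')
regex_hit = re.fullmatch(r'ge.*t', 'gesagt') is not None

print(ge_t, pronoun_lines, triples, wildcard_hit, regex_hit)
```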

Lecture 3 (MV): More Unix Commands


1. What command gives you the first 100 lines of a file? Which command gives
you the last 100 lines of a file? What do these commands do if the file has fewer
than 100 lines?
2. How can you use Unix commands to find all token bigrams in a file in 3-column
format?
3. What are the meanings of the two occurrences of the vertical bar '|' in the
command:
• grep -P '(ADV|JJ)' my_infile.txt | grep -o -P '^\w+'
4. Why is a^n b^n with n ∈ ℕ not a regular language,
while a^n b^n with n ∈ ℕ and n < 100 is indeed a regular language?
5. What is the difference between the following two searches? Give an example
where the result differs.
• grep -P -i '^[a-z]+[^\t]*\d+[^\t]*[a-z]+' my_infile.txt
• grep -P -i '^[a-z]+[^\s]*\d+[^\s]*[a-z]+' my_infile.txt
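
The difference asked about in question 5 lies in the character classes: `[^\t]*` may run across spaces but not across a tab, while `[^\s]*` stops at any whitespace. The Python sketch below applies equivalent patterns with `re.IGNORECASE` (the counterpart of `grep -i`) to two invented 3-column lines; the multiword token `Formel 1 Rennen` matches only the first pattern. It also shows one way to form token bigrams from such lines (question 2) by pairing each token with its successor.

```python
import re

p_tab = re.compile(r'^[a-z]+[^\t]*\d+[^\t]*[a-z]+', re.IGNORECASE)
p_space = re.compile(r'^[a-z]+[^\s]*\d+[^\s]*[a-z]+', re.IGNORECASE)

rows = ['Formel 1 Rennen\tNE\tFormel 1 Rennen',   # token contains spaces
        'MP3-Player\tNN\tMP3-Player']              # token without spaces

for row in rows:
    # [^\t]* may cross the spaces inside the first column, [^\s]* may not
    print(repr(row), bool(p_tab.search(row)), bool(p_space.search(row)))

# Token bigrams: take the first column of each line and pair it with the next token
tagged_rows = ['Tante\tNN\tTante', 'Polly\tNE\tPolly', 'sah\tVVFIN\tsehen']
tokens = [row.split('\t')[0] for row in tagged_rows]
bigrams = list(zip(tokens, tokens[1:]))
print(bigrams)  # [('Tante', 'Polly'), ('Polly', 'sah')]
```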

Lecture 4 (MV): First steps in Python


1. How can you access the last 4 letters of a word (= a string)? How can you make
sure that your program only performs this operation if the word is at least 6
characters long?
2. How do you write a loop that is executed 55 times?
3. How do you write a three-way case distinction (3 alternatives) in Python? If a word is
shorter than 10 letters, write it to a file. If it has at least 10 but fewer than 20 letters,
write it on the screen. Otherwise (20 letters or more), write the word and its length on the
screen.
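
A minimal sketch of the three techniques behind questions 1 to 3: slicing with a negative index, a `for` loop over `range(55)`, and an `if`/`elif`/`else` chain. The function names and word lists are invented, and an `io.StringIO` object stands in for a real file opened with `open(..., 'w', encoding='utf-8')` so the sketch runs without touching the disk.

```python
import io


def last_four(word):
    """Return the last 4 letters of word, but only if it has at least 6 characters."""
    if len(word) >= 6:
        return word[-4:]   # negative index: count from the end
    return None


# A loop body that runs exactly 55 times: range(55) yields 0, 1, ..., 54
runs = 0
for i in range(55):
    runs += 1


def handle_word(word, outfile):
    """Three-way case distinction with if / elif / else."""
    if len(word) < 10:
        outfile.write(word + '\n')      # fewer than 10 letters: write to the file
    elif len(word) < 20:
        print(word)                     # 10 to 19 letters: print the word
    else:
        print(word, len(word))          # 20 or more letters: print word and length


out = io.StringIO()  # stand-in for a real output file
for w in ['Haus', 'Weihnachtsbaum', 'Donaudampfschifffahrtsgesellschaft']:
    handle_word(w, out)

print(last_four('Weihnachten'), last_four('Baum'), runs, repr(out.getvalue()))
```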

Lecture 5 (MV): Python Lists


1. Here is a Python list of 3 words:
my_list = ['Christmas', 'for', 'all']
Which command inserts the string 'Merry' at the beginning of the list?
Which operation inserts the word 'soon' into the list after the word 'Christmas'?
2. How can you determine the length of a list?
3. What are the advantages of Python lists over Python dictionaries?
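
The list operations from questions 1 and 2 can be sketched with the built-in methods `list.insert()` and `list.index()` and the function `len()`. The last lines illustrate one commonly cited advantage of lists (question 3): they keep duplicate items and allow access by numeric position. The token list there is invented.

```python
my_list = ['Christmas', 'for', 'all']

my_list.insert(0, 'Merry')           # index 0 = beginning of the list
pos = my_list.index('Christmas')     # position of the first 'Christmas'
my_list.insert(pos + 1, 'soon')      # insert directly after it

print(my_list)       # ['Merry', 'Christmas', 'soon', 'for', 'all']
print(len(my_list))  # 5

# Lists keep duplicates and are indexed by position; dictionary keys are unique
tokens = ['the', 'cat', 'saw', 'the', 'dog']
print(tokens[3], tokens.count('the'))  # the 2
```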

Lecture 6 (MV): Python Dictionaries


1. Given a dictionary with 5 German words and their part-of-speech tags:
my_hash = {'Weihnachten':'NN', 'steht':'VVFIN', 'vor':'APPR',
'der':'ART', 'Tür':'NN'}
Which operation inserts the word 'Fröhliche' with the PoS tag 'ADJA' into the
dictionary?
2. Suppose you have a text file with 1,500 tokens and a file with 10,000 different first names.
How often must a program access the dictionary of first names if it is to check, for each
word in the text file, whether it is a person's first name? Genitive forms (e.g. Simons,
Martins, Annas) should also be recognized.

3. Suppose a file contains verticalized text in which each word appears with its part
of speech:
Estos DM
30 CARD
símbolos NC
representan VLfin
un ART
sistema NC
fonológico ADJ
de PREP
5 CARD
fonemas NC
vocálicos ADJ
y CC
25 CARD
fonemas NC
consonanticos ADJ
. FS
The two columns are separated by a tab character. Write a Python program that counts
how often each part-of-speech tag occurs.
4. Change the program from question 3 so that only part-of-speech tags beginning with the
letter 'C' are counted.
5. Change the program from question 3 so that only the PoS tags of words containing the
letter sequence 'on' are counted.
6. Please describe what the following Python program does. Comment every
operation of this program.

def main():
    my_lexicon = {'Tante': 'aunt',
                  'Polly': 'Polly',
                  'sah': 'saw',
                  'Mann': 'man',
                  'mit': 'with',
                  'Fernglas': 'telescope',
                  'im': 'in the',
                  'Garten': 'garden',
                  'dem': 'the',
                  'der': 'the',
                  'den': 'the'}

    # test_sentence = 'Tante Polly sah den Mann mit dem Fernglas'
    test_sentence = 'Tante Polly sah den Mann im Garten'

    print(test_sentence)
    test_list = test_sentence.split()

    for word in test_list:
        print(my_lexicon[word], end=' ')

    print()
    print('---------------------')


if __name__ == '__main__':
    main()
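
One possible solution sketch for questions 1 and 3 to 5, assuming the tagged text is available as `word<TAB>tag` lines. The helper `count_pos_tags()` and its parameters `tag_prefix` and `word_part` are hypothetical names; setting them covers the variants in questions 4 and 5. The sample lines from question 3 are embedded so the sketch runs without a file; with a real file you would pass the open file object instead, e.g. `with open('tagged.txt', encoding='utf-8') as f: count_pos_tags(f)` (file name hypothetical).

```python
my_hash = {'Weihnachten': 'NN', 'steht': 'VVFIN', 'vor': 'APPR',
           'der': 'ART', 'Tür': 'NN'}
my_hash['Fröhliche'] = 'ADJA'   # assigning to a new key inserts it


def count_pos_tags(lines, tag_prefix='', word_part=''):
    """Count how often each PoS tag occurs in 'word<TAB>tag' lines.

    tag_prefix: count only tags starting with this string (e.g. 'C').
    word_part:  count only tags of words containing this string (e.g. 'on').
    """
    counts = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue                          # skip empty lines
        word, tag = line.split('\t')
        if tag.startswith(tag_prefix) and word_part in word:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


SAMPLE = '''Estos\tDM
30\tCARD
símbolos\tNC
representan\tVLfin
un\tART
sistema\tNC
fonológico\tADJ
de\tPREP
5\tCARD
fonemas\tNC
vocálicos\tADJ
y\tCC
25\tCARD
fonemas\tNC
consonanticos\tADJ
.\tFS
'''

sample_lines = SAMPLE.splitlines()
all_counts = count_pos_tags(sample_lines)                   # question 3
c_counts = count_pos_tags(sample_lines, tag_prefix='C')     # question 4
on_counts = count_pos_tags(sample_lines, word_part='on')    # question 5
print(all_counts, c_counts, on_counts)
```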

Lecture 7 (MV): Python Tuples, Working with Excel files, Accessing Wikipedia


1. What are the differences between Python lists, Python tuples and Python sets?
2. How can you output a table with 3 columns (token, PoS tag, lemma) into an Excel
file with OpenPyxl?
3. How can you access Wikipedia pages from Python?
4. How can you work with Wikipedia pages from Python when you are not
connected to the Internet?
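
A short sketch of the three collection types from question 1, followed by one way to write a 3-column table with the third-party library openpyxl (`Workbook`, `Workbook.active`, `Worksheet.append()`, `Workbook.save()`). The rows and the file name `tokens.xlsx` are invented; the import sits inside the function so that the first part runs even where openpyxl is not installed.

```python
rows = [('Tante', 'NN', 'Tante'), ('sah', 'VVFIN', 'sehen'), ('Mann', 'NN', 'Mann')]

tags_list = [tag for _, tag, _ in rows]   # list: ordered, mutable, duplicates allowed
tags_tuple = tuple(tags_list)             # tuple: ordered, immutable, hashable if its items are
tags_set = set(tags_list)                 # set: unordered, mutable, no duplicates

try:
    tags_tuple[0] = 'NE'                  # tuples cannot be changed in place
except TypeError as err:
    print('TypeError:', err)


def write_table_to_excel(table, path):
    """Write (token, PoS, lemma) rows to an .xlsx file with openpyxl."""
    from openpyxl import Workbook         # third-party: pip install openpyxl
    wb = Workbook()
    ws = wb.active                        # the default first worksheet
    ws.append(['token', 'PoS', 'lemma'])  # header row
    for row in table:
        ws.append(list(row))              # one spreadsheet row per tuple
    wb.save(path)


print(tags_list, tags_tuple, sorted(tags_set))
# write_table_to_excel(rows, 'tokens.xlsx')  # requires openpyxl
```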

Lecture 8 (SC): Objects, Types and Strings


1. What are types? What are objects? What are attributes? What are methods?
2. How can I determine the character code of a single character string?
3. What is the difference between ISO-8859-1 (Latin-1) and Unicode? What is the
difference between ASCII and Unicode?
4. What are the differences between the various ways of writing string literals in
Python?
5. How to determine the type of a variable? Do all objects have a type?
6. Why do I have to distinguish between UTF-8 and Unicode?
7. What does the optional encoding argument of the standard function open() do?
8. Why should you include a coding comment # -*- coding: utf-8 -*- in the source
code?
9. What do the methods decode() and encode() do? What is the type bytes meant
for?
10. What can be computed with re.findall()? Given a string s and a pattern p, what is
the result of re.findall(p, s)?
11. What is the Regex flag (?x) for?
12. Given strings s and r and a pattern p, what does the expression re.sub(p, r, s)
evaluate to? That is, you must be able to apply regular expressions, including the greedy
repetitions * and + and their non-greedy counterparts *? and +?.
13. What is the module unicodedata good for? How can Unicode character classes
help in regular expressions?
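
The following sketch illustrates several functions named in questions 2 and 9 to 13: `ord()`, `str.encode()` and `bytes.decode()`, `re.findall()`, the verbose flag `(?x)`, greedy versus non-greedy `re.sub()`, and `unicodedata`. The sample strings are invented. Python's built-in `re` does not support `\p{...}` property classes, so the letters-only class below is written as `[^\W\d_]` (word characters minus digits and underscore), which is Unicode-aware for `str` patterns.

```python
import re
import unicodedata

code = ord('ä')                        # 228, the Unicode code point
utf8 = 'ä'.encode('utf-8')             # b'\xc3\xa4': two bytes in UTF-8
latin1 = 'ä'.encode('latin-1')         # b'\xe4': one byte in ISO-8859-1
back = utf8.decode('utf-8')            # bytes -> str

# re.findall returns all non-overlapping matches as a list of strings
numbers = re.findall(r'\d+', 'PCL 1, Kapitel 3 und 14')
# With one group in the pattern, findall returns the group contents instead
tags = re.findall(r'/(\w+)', 'Tante/NN sah/VVFIN')

# (?x) verbose mode: whitespace in the pattern is ignored, # starts a comment
verbose = re.findall(r'''(?x)
    \d+      # one or more digits
    \.       # a literal dot
''', 'am 19. und 20. Dezember')

html = '<b>fett</b> und <i>kursiv</i>'
greedy = re.sub(r'<.*>', '', html)     # .* runs to the LAST '>'
lazy = re.sub(r'<.*?>', '', html)      # .*? stops at the FIRST '>'

name = unicodedata.name('ä')           # 'LATIN SMALL LETTER A WITH DIAERESIS'
category = unicodedata.category('ä')   # 'Ll' = lowercase letter
letters_only = re.findall(r'[^\W\d_]+', 'Grüße 2022 café_au_lait')

print(code, utf8, latin1, back, numbers, tags, verbose)
print(repr(greedy), repr(lazy), name, category, letters_only)
```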

Lecture 9 (SC): NLTK Book Chapter 1


1. What are the advantages of using a toolkit like NLTK? What are the
disadvantages?
2. What are the differences between modules and packages?
3. What is the difference between the two versions of the import statement?
import nltk
from nltk import *
4. What is a dispersion plot? What is a KWIC (keyword in context) concordance?
5. What is the value of a list comprehension of the form x?
6. What is the difference between a set comprehension and a dict comprehension?
7. How do you define functions? What does the return statement do?
8. What constitutes good functions?
9. What is the difference between a local and a global name?
10. Why can the keyword "global" be useful? Why is its use problematic?
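
A compact sketch of the constructs from questions 5 to 10: list, set and dict comprehensions, a function with `return`, a local name, and a global name rebound via the `global` keyword. All names and tokens are invented for illustration.

```python
words = ['Tante', 'Polly', 'sah', 'den', 'Mann', 'den']

lengths = [len(w) for w in words]        # list comprehension -> new list
types = {w.lower() for w in words}       # set comprehension -> duplicates removed
length_of = {w: len(w) for w in words}   # dict comprehension -> key: value pairs


def average_length(tokens):
    """Return the mean token length; 'total' is local to this function."""
    total = sum(len(t) for t in tokens)
    return total / len(tokens)


call_count = 0  # global name, defined at module level


def counted_average(tokens):
    """Like average_length(), but also counts its calls in a global variable."""
    global call_count   # without this line, 'call_count += 1' raises UnboundLocalError
    call_count += 1
    return average_length(tokens)


print(lengths, sorted(types), length_of)
print(counted_average(['Mann', 'den']), counted_average(words), call_count)
```

The `global` variant works, but hidden state such as `call_count` makes a function harder to test and reuse in isolation, which is one reason returning values is usually preferred.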

Lecture 10 (SC): NLTK Book Chapter 2


1. In what different ways can you represent raw text corpora? What are the
advantages? What are the disadvantages?
2. What types of corpora exist? Which ones are normally used in NLP?
3. What are univariate frequency distributions? How can I create them in NLTK?
How can you calculate the frequency distribution of the word lengths of a
tokenized text (as a list of strings)?
4. What are conditional or bivariate frequency distributions? What can they be used
for? How can the frequency distribution of the words of a tagged corpus be
calculated for each part-of-speech tag?
5. What is the difference between statements and expressions?
6. Given a list comprehension e, what is the equivalent code written as statements?
The same applies to list comprehensions with nested for-loops.
7. What can be said about the statement that, in Python, functions are also objects?
8. What is the difference between a class and an object?
9. What is the difference between methods and functions?
10. What is enumerate() useful for? Assuming the name 'text' refers to a list of the tokens
of a text, what is the difference between the following two examples?
for i in range(len(text)):
    print(i, text[i])

for (i, w) in enumerate(text):
    print(i, w)
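
To stay runnable without NLTK installed, this sketch uses the standard library's `collections.Counter`, which `nltk.FreqDist` subclasses; with NLTK the analogous calls would be `FreqDist(len(w) for w in tokens)` and `ConditionalFreqDist((tag, word) for word, tag in tagged)`. It also writes a list comprehension (including a nested one) out as statements, as in question 6, and shows the pairs produced by `enumerate()`. The tagged sample is invented.

```python
from collections import Counter, defaultdict

tokens = ['Tante', 'Polly', 'sah', 'den', 'Mann', 'im', 'Garten']

# Univariate frequency distribution of word lengths
length_dist = Counter(len(w) for w in tokens)

# Bivariate (conditional) distribution: one Counter of words per PoS tag
tagged = [('Tante', 'NN'), ('Polly', 'NE'), ('sah', 'VVFIN'), ('den', 'ART'),
          ('Mann', 'NN'), ('im', 'APPRART'), ('Garten', 'NN')]
words_by_tag = defaultdict(Counter)
for word, tag in tagged:
    words_by_tag[tag][word] += 1

# [w.lower() for w in tokens if len(w) > 3] written as statements
long_lower = []
for w in tokens:
    if len(w) > 3:
        long_lower.append(w.lower())

# [(a, b) for a in 'ab' for b in 'xy']: the for clauses nest from left to right
pairs = []
for a in 'ab':
    for b in 'xy':
        pairs.append((a, b))

# enumerate() yields (index, item) pairs, so no manual text[i] lookup is needed
indexed = list(enumerate(tokens[:3]))

print(length_dist.most_common(2), dict(words_by_tag['NN']), long_lower, pairs, indexed)
```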

Lecture 11 (SC): NLTK Book Chapter 2: Lexical Resources


1. How can stopword lists be applied to text corpora to calculate the non-stopword
vocabulary?
2. What are suitable data structures to represent lexical resources of different
complexity?
3. What are classes conceptually? What is meant by inheritance? What does the
name 'object' mean in Python?
4. How do you define your own classes? What is a constructor for a class?
5. What does __init__() return? What are its side effects?
6. How are methods defined? What does the parameter "self" mean?
7. Why is it important to differentiate between definition time and runtime?
8. Which attributes are stored in the object instances? Where are Python methods
stored?
9. What naming conventions are used to identify private/protected
attributes/methods and public attributes/methods?
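
A minimal sketch for questions 1 and 3 to 9. The stopword list is a tiny invented set (NLTK ships real lists in `nltk.corpus.stopwords`, which must be downloaded first), and the classes `Lexicon` and `TranslationLexicon` are hypothetical examples of a constructor, methods with `self`, inheritance, and the single-underscore convention for internal attributes.

```python
STOPWORDS = {'der', 'die', 'das', 'den', 'im', 'und'}   # tiny illustrative list

tokens = ['Tante', 'Polly', 'sah', 'den', 'Mann', 'im', 'Garten']
content_vocab = {w.lower() for w in tokens} - STOPWORDS   # set difference


class Lexicon:
    """A minimal lexicon mapping words to sets of PoS tags (implicitly inherits from object)."""

    def __init__(self, name):
        # The constructor initializes the new instance and returns None
        self.name = name          # public attribute, stored in the instance
        self._entries = {}        # leading underscore: internal by convention

    def add(self, word, tag):
        """Methods live in the class; 'self' is the instance the method is called on."""
        self._entries.setdefault(word, set()).add(tag)

    def tags(self, word):
        return self._entries.get(word, set())


class TranslationLexicon(Lexicon):
    """Inherits add() and tags() from Lexicon and adds translations."""

    def __init__(self, name):
        super().__init__(name)    # run the parent constructor first
        self._translations = {}

    def add_translation(self, word, translation):
        self._translations[word] = translation


lex = TranslationLexicon('demo')
result = lex.add('Mann', 'NN')     # a method without return gives None
lex.add_translation('Mann', 'man')
print(content_vocab, lex.tags('Mann'), sorted(vars(lex)), 'add' in vars(Lexicon))
```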

Lecture 12/13 (SC): NLTK Book Chapter 3


1. How does the format() method of str work? What are fields for? What do format
strings look like when formatting typical data (rounding floats, padding strings)?
2. What is special about rounding numbers in Python? To what extent does the
rounding of floating-point numbers in Python differ from the rounding rules we
know from school?
3. How can you write a flexibly parameterizable KWIC concordance program with
your own classes?
4. What is a text index?
5. What are generator expressions? What are they suitable for? To what extent are
generator expressions more efficient than list comprehension expressions?
6. What does the function next() do? What distinguishes it from the function iter()?
7. What does the keyword "yield" do in contrast to "return"?
8. Why are generator expressions particularly useful as arguments of functions like
max(), sum() or set()?
9. What is the difference between range objects and generators?
10. How can you sample words randomly from a corpus in Python?
11. Which exceptions are most common? How can exceptions be handled?
12. Why is the 'with' construct useful for files?
13. Which are the main classes of spaCy for a typical pipeline? What is their
function? How can you access information about dependency parses in a parsed
document?
14. Why does spaCy use the Vocab class? What is it good for?
15. What does the Matcher class of spaCy allow us to do? How does it differ from
the DependencyMatcher class? What do the operators of the Semgrex query
language mean? Given a dependency tree, what would a pattern look like that
matches the indicated syntactic relation?
16. How is the vector cosine similarity computed? What is the numeric range of this
similarity measure? On which linguistic levels can we use cosine similarity in
spaCy? What are sense2vec vectors?
17. Which forms of serialization do you know? What is the benefit of using JSON for
serialization? What is the problem with binary serializations?

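A sketch touching questions 1, 2, 5 to 8, 11 and 16: `str.format()` fields, the round-half-to-even behaviour of `round()`, a generator function with `yield`, `iter()` and `next()`, a generator expression passed to `sum()`, exception handling, and cosine similarity computed by hand as the dot product divided by the product of the vector lengths (range -1 to 1). The data and function names are invented, and the sketch uses only the standard library rather than spaCy.

```python
import math

# str.format(): {} fields with an optional format spec after the colon
row = '{:<8}|{:>6.2f}'.format('Polly', 3.14159)   # 'Polly   |  3.14'

# round() rounds exact halves to the nearest even number ("banker's rounding"),
# and binary floats like 2.675 are stored slightly below the written value
halves = [round(0.5), round(1.5), round(2.5)]     # [0, 2, 2]
tricky = round(2.675, 2)                          # 2.67, not 2.68


def tokens_from_lines(lines):
    """Generator function: each yield hands out one token and pauses."""
    for line in lines:
        for token in line.split():
            yield token


gen = tokens_from_lines(['Tante Polly', 'sah den Mann'])
first = next(gen)            # 'Tante'; next() advances the generator
rest = list(gen)             # the remaining tokens; gen is now exhausted

it = iter(['a', 'b'])        # iter() turns an iterable into an iterator
second = (next(it), next(it))

# Generator expression as the only argument: no intermediate list is built
total = sum(len(t) for t in ['Tante', 'Polly'])


def translate(lexicon, word):
    """Return the translation, or a marker instead of crashing on unknown words."""
    try:
        return lexicon[word]
    except KeyError:
        return '<unk>'


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(row, halves, tricky, first, rest, second, total)
print(translate({'Mann': 'man'}, 'Frau'), cosine([1, 2], [2, 4]), cosine([1, 0], [0, 1]))
```
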
Lecture 13/14 (SC): Binding, Identity and Mutable Data Structures


1. What is meant by binding and rebinding in Python? Which Python language
constructs relate to it?
2. What is Garbage Collection? What is disposed?
3. What do the variable names used in for-loops and as function parameters denote? What
happens to mutable data structures when modifications are made through these
variables?
4. What does the id() function show? What is the difference between the operator
"is" and "=="?
5. In which respects does the augmented assignment operator += behave differently for
mutable and immutable objects?
6. How can you copy lists and other mutable objects? What is the difference
between a shallow copy and a deep copy? How does this relate to the notation a =
b[:]?
7. Why is it dangerous to delete elements from a list in a for-loop over the same list?
8. Which functionalities does sorted() offer in connection with dictionaries? How
can you find maximum elements in lists and dictionaries?
9. What should a command line argument parser be able to do?
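
The sketch below illustrates questions 1 and 4 to 9 with invented data: rebinding versus in-place mutation with `+=`, `is` versus `==`, shallow versus deep copies (`b[:]` and `copy.deepcopy()`), why removing items from a list while looping over it skips elements, `sorted()` and `max()` with a `key` function on a dictionary, and a small `argparse` parser. The argument names `infile` and `--top` are hypothetical.

```python
import argparse
import copy

# += mutates a list in place, so every name bound to it sees the change
a = [1, 2]
b = a
b += [3]
print(a, a is b)                  # [1, 2, 3] True

# += on an immutable tuple creates a new object and rebinds only the left-hand name
t = (1, 2)
u = t
u += (3,)
print(t, u, t is u)               # (1, 2) (1, 2, 3) False

# == compares values, is compares identity (the same id())
x, y = [1, 2], [1, 2]
print(x == y, x is y, id(x) == id(y))   # True False False

# A shallow copy (slice) shares the inner lists; deepcopy copies them too
nested = [['Tante', 'Polly'], ['Mann']]
shallow = nested[:]
deep = copy.deepcopy(nested)
nested[0].append('sah')
print(shallow[0], deep[0])        # ['Tante', 'Polly', 'sah'] ['Tante', 'Polly']

# Removing while iterating shifts the remaining items, so one item is skipped
words = ['den', 'den', 'Mann']
for w in words:
    if w == 'den':
        words.remove(w)
print(words)                      # ['den', 'Mann']
safe = [w for w in ['den', 'den', 'Mann'] if w != 'den']   # ['Mann']

# sorted() and max() with a key function on a frequency dictionary
freq = {'der': 5, 'Mann': 2, 'sah': 3}
by_freq = sorted(freq, key=freq.get, reverse=True)   # ['der', 'sah', 'Mann']
most_frequent = max(freq, key=freq.get)               # 'der'

# A minimal command line parser; parse_args() also accepts an explicit list
parser = argparse.ArgumentParser(description='Count words in a corpus file.')
parser.add_argument('infile', help='input text file')
parser.add_argument('-n', '--top', type=int, default=10, help='number of words to show')
args = parser.parse_args(['corpus.txt', '-n', '5'])
print(args.infile, args.top)      # corpus.txt 5
```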
