Exam Preparation Questions PCL I 2022

The document provides questions for an exam on natural language processing techniques. It includes questions about UNIX commands, regular expressions, Python programming concepts like lists and dictionaries, and NLTK functions. The exam covers material from lectures on introductory topics through more advanced NLP modeling techniques.

Uploaded by

Richard Salnikov

Exam Preparation Questions

Programmiertechniken der Computerlinguistik I (PCL-I)

Universität Zürich Fall Semester 2022
Simon Clematide, Martin Volk 19.12.2022

The subject matter includes the chapters of the NLTK book (up to and including the
specified chapters): 1.1-1.4, 2.1-2.5, 3.1-3.6, 4.1-4.2.

Lecture 1 (MV): Introduction to UNIX Commands

1. What does the UNIX command 'wc' do?
2. What is the UNIX command to append 3 text files a.txt, b.txt, c.txt to each other
and store the whole text in the file result.txt?
3. Given a verticalized word list (one word per line) in the file my_text.txt. Which
command combination calculates an alphabetically sorted list of all word types of
this file?

Lecture 2 (MV): The 'grep' command and regular expressions

1. What is the parameter for 'grep' to display the two lines before a hit line as
context?
2. What is the regular expression to find all words beginning with 'ge' and ending
with 't'?
3. What is the regular expression to find all Part-of-Speech tags for personal
pronouns (PPER) and relative pronouns (PRELS and PRELAT) in a file in 3-
column format (token, PoS, lemma)?
4. What is the regular expression to find all words where the same letter occurs 3
times immediately adjacent?
5. What is the difference between a wildcard search and a regular expression search?
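
The exam asks for these patterns in 'grep'; as a hedged illustration only, the sketch below transposes three of them to Python's re module, whose syntax is close to 'grep -P'. The sample words and the three-column lines are made up for the example, and other patterns would be equally valid answers.

```python
import re

# Words beginning with 'ge' and ending with 't' (sample text is invented).
text = "gesagt getan gut gemacht"
ge_t_words = re.findall(r"\bge\w*t\b", text)
print(ge_t_words)

# PoS tags PPER, PRELS and PRELAT in the second column of token<TAB>PoS<TAB>lemma.
lines = ["er\tPPER\ter", "der\tPRELS\tder", "Haus\tNN\tHaus"]
pronoun_pattern = re.compile(r"^[^\t]+\t(PPER|PRELS|PRELAT)\t")
pronoun_tags = [m.group(1) for m in map(pronoun_pattern.search, lines) if m]
print(pronoun_tags)

# The same letter three times in a row: a group followed by two backreferences.
triple = re.compile(r"(\w)\1\1")
print(bool(triple.search("Schifffahrt")), bool(triple.search("Schiffahrt")))
```

The backreference \1 repeats whatever the group actually matched, which is why a plain character class cannot express the third pattern.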

Lecture 3 (MV): More Unix Commands

1. What command gives you the first 100 lines from a file? Which command gives
you the last 100 lines from a file? What do these commands do if the file has fewer
than 100 lines?
2. How can you use Unix commands to find all token bigrams from a file in 3-
column format?
3. What are the meanings of the two occurrences of the vertical bar '|' in the
command:
• grep -P '(ADV|JJ)' my_infile.txt | grep -o -P '^\w+'
4. Why is a^n b^n with n ∈ N not a regular language,
while a^n b^n with n ∈ N and n < 100 is indeed a regular language?
5. What is the difference between the following two searches? Give an example
where the result differs.
• grep -P -i '^[a-z]+[^\t]*\d+[^\t]*[a-z]+' my_infile.txt
• grep -P -i '^[a-z]+[^\s]*\d+[^\s]*[a-z]+' my_infile.txt

Lecture 4 (MV): First steps in Python

1. How can you access the last 4 letters of a word (= a string)? How can you make
sure that your program only performs this operation if the word is at least 6
characters long?
2. How do you write a loop that is executed 55 times?
3. How do you write a distinction in 3 cases (3 alternatives) in Python? If a word is
shorter than 10 letters, write it to a file. If the word is shorter than 20 letters, write
it on the screen. If the word is longer than 20 letters, write it and its length on the
screen.
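
One possible way to approach the three questions above is sketched below. The helper names last_four and classify are hypothetical, classify returns a label instead of actually writing to a file or the screen, and the question leaves a length of exactly 20 unspecified, so treating it like the "longer than 20" case is an assumption.

```python
def last_four(word):
    """Return the last 4 characters, but only for words of at least 6 characters."""
    if len(word) >= 6:
        return word[-4:]   # negative index counts from the end of the string
    return None


def classify(word):
    """Three alternatives with if / elif / else (labels stand in for the output actions)."""
    if len(word) < 10:
        return "file"
    elif len(word) < 20:
        return "screen"
    else:
        # Assumption: exactly 20 letters is handled like "longer than 20".
        return "screen+length"


# range(55) yields 0..54, so the loop body runs exactly 55 times.
counter = 0
for _ in range(55):
    counter += 1

print(last_four("Christmas"), last_four("word"), counter)
print(classify("short"), classify("a" * 15), classify("a" * 25))
```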

Lecture 5 (MV): Python Lists

1. Here is a Python list of 3 words:
my_list = ['Christmas', 'for', 'all']
Which command inserts the string 'Merry' at the beginning of the list?
Which operation inserts the word 'soon' into the list after the word 'Christmas'?
2. How can you determine the length of a list?
3. What are the advantages of Python lists over Python dictionaries?
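
A minimal sketch of the list operations asked about above, using the list from question 1. list.insert() is only one option; slicing or concatenation would work as well.

```python
my_list = ['Christmas', 'for', 'all']

# Insert at index 0 to put 'Merry' at the beginning.
my_list.insert(0, 'Merry')

# Find the position of 'Christmas' and insert 'soon' right after it.
my_list.insert(my_list.index('Christmas') + 1, 'soon')

print(my_list)       # ['Merry', 'Christmas', 'soon', 'for', 'all']
print(len(my_list))  # 5
```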

Lecture 6 (MV): Python Dictionaries

1. Given a dictionary with 5 German words and their part-of-speech tags:
my_hash = {'Weihnachten':'NN', 'steht':'VVFIN', 'vor':'APPR',
'der':'ART', 'Tür':'NN'}
Which operation inserts the word 'Fröhliche' with the PoS tag 'ADJA' into the
dictionary?
2. Given a text file (with 1500 tokens) and a file with 10,000 different first names.
How often must a program access the dictionary with the first names if it is to
check for each word in the text file whether it is a person first name? All genitive
forms (e.g. Simons, Martins, Annas) should also be checked.

3. Given is a file with verticalized text where each word appears with its part of
speech:
Estos DM
30 CARD
símbolos NC
representan VLfin
un ART
sistema NC
fonológico ADJ
de PREP
5 CARD
fonemas NC
vocálicos ADJ
y CC
25 CARD
fonemas NC
consonanticos ADJ
. FS
The two columns are separated by a tabulator. Write a Python program that counts
which part-of-speech tag occurs how often.
4. Change the program under 3 so that only part-of-speech tags beginning with the
letter 'C' are counted.
5. Change the program under 3 so that only PoS tags are counted where the word
contains the letter sequence 'on'.
6. Please describe what the following Python program does. Comment every
operation of this program.

def main():

    my_lexicon = {'Tante':'aunt',
                  'Polly':'Polly',
                  'sah':'saw',
                  'Mann':'man',
                  'mit':'with',
                  'Fernglas':'telescope',
                  'im':'in the',
                  'Garten':'garden',
                  'dem':'the',
                  'der':'the',
                  'den':'the'}

    # test_sentence = 'Tante Polly sah den Mann mit dem Fernglas'
    test_sentence = 'Tante Polly sah den Mann im Garten'

    print(test_sentence)
    test_list = test_sentence.split()

    for word in test_list:
        print(my_lexicon[word], end=' ')

    print()
    print('---------------------')

if __name__ == '__main__':
    main()
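
For the tag-counting questions 3 to 5, one possible shape of a solution is sketched below. It reads from an in-memory sample instead of a real file, the function name count_tags and its keep parameter are hypothetical, and collections.Counter is just one convenient choice; a plain dictionary would do the same job.

```python
from collections import Counter

# A few lines in the two-column format (token TAB tag), taken from the exercise data.
sample = (
    "Estos\tDM\n30\tCARD\nsímbolos\tNC\nsistema\tNC\n"
    "fonológico\tADJ\nfonemas\tNC\ny\tCC\n"
)


def count_tags(lines, keep=lambda token, tag: True):
    """Count PoS tags; 'keep' decides which (token, tag) pairs are counted."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                      # skip empty lines
        token, tag = line.split("\t")     # the columns are tab-separated
        if keep(token, tag):
            counts[tag] += 1
    return counts


lines = sample.splitlines()
all_tags = count_tags(lines)                                        # question 3
c_tags = count_tags(lines, keep=lambda tok, tag: tag.startswith("C"))  # question 4
on_tags = count_tags(lines, keep=lambda tok, tag: "on" in tok)      # question 5

print(all_tags)
print(c_tags)
print(on_tags)
```

With a real file, the same function could be called on the open file object, since iterating over a file yields its lines.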

Lecture 7 (MV): Python Tuples, Working with Excel files, Accessing Wikipedia

1. What is the difference between Python lists and Python tuples and Python sets?
2. How can you output a table with 3 columns (token, PoS tag, lemma) into an Excel
file with OpenPyxl?
3. How can you access Wikipedia pages from Python?
4. How can you work with Wikipedia pages from Python when you are not
connected to the Internet?

Lecture 8 (SC): Objects, Types and Strings

1. What are types? What are objects? What are attributes? What are methods?
2. How can I determine the character code of a single character string?
3. What is the difference between ISO-LATIN-1 and UNICODE? What is the
difference between ASCII and UNICODE?
4. What are the differences between the different Python ways to write string
literals?
5. How to determine the type of a variable? Do all objects have a type?
6. Why do I have to distinguish between UTF-8 and Unicode?
7. What does the optional encoding argument of the standard function open() do?
8. Why should you include a coding comment # -*- coding: utf-8 -*- in the source
code?
9. What do the methods decode() and encode() do? What is the type bytes meant
for?
10. What can be calculated with re.findall()? Given is a string s and a pattern p.
Which result can be found at re.findall(p,s)?
11. What is the Regex flag (?x) for?
12. Given are two strings s and r and a pattern p. To what does the expression
re.sub(p,r,s) evaluate? I.e. one must be able to use regular expressions (incl.
greedy and non-greedy repetitions *? and +?).
13. What is the module unicodedata good for? How can Unicode character classes
help in regular expressions?
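
Several of the questions above can be explored interactively. The following sketch is only an illustration with an invented sample string; it shows re.findall() with a group, greedy versus non-greedy repetition in re.sub(), and the relation between a Unicode character, its code point, and its UTF-8 bytes.

```python
import re

s = "<b>bold</b> and <i>italic</i>"

# With one group in the pattern, findall returns the group contents only.
tags = re.findall(r"<(\w+)>", s)

# Greedy .+ runs from the first '<' to the last '>'; non-greedy .+? stops early.
greedy = re.sub(r"<.+>", "", s)
lazy = re.sub(r"<.+?>", "", s)

# Code point of a single-character string, and its UTF-8 encoded form.
code_point = ord("ä")
encoded = "ä".encode("utf-8")       # type bytes
decoded = encoded.decode("utf-8")   # back to type str

print(tags, repr(greedy), repr(lazy))
print(code_point, encoded, decoded, type(encoded).__name__)
```

Note that one character can occupy more than one byte in UTF-8, which is one reason to keep Unicode (the character set) and UTF-8 (an encoding of it) apart.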

Lecture 9 (SC): NLTK Book Chapter 1

1. What are the advantages of using a toolkit like NLTK? What are the
disadvantages?
2. What are the differences between modules and packages?
3. What is the difference between the two versions of the import statement?
import nltk
from nltk import *
4. What is a dispersion plot and a KWIC?
5. What is the value of a list comprehension of the form x?
6. What is the difference between a set comprehension and a dict comprehension?
7. How do you define functions? What does the return statement do?
8. What constitutes good functions?
9. What is the difference between a local and a global name?
10. Why can the keyword "global" be useful? Why is its use problematic?
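
As a small, hedged illustration of the comprehension and scoping questions above (the word list and the function names bump and shadow are invented for this sketch):

```python
words = ["a", "cat", "a", "house"]

lengths_list = [len(w) for w in words]      # list: keeps order and duplicates
lengths_set = {len(w) for w in words}       # set: unique values only
lengths_dict = {w: len(w) for w in words}   # dict: maps each key to a value

counter = 0  # a global name


def bump():
    global counter   # rebind the module-level name instead of creating a local one
    counter += 1
    return counter


def shadow():
    counter = 100    # a new local name; the global 'counter' is untouched
    return counter


bump()
print(lengths_list, lengths_set, lengths_dict)
print(shadow(), counter)
```

The hidden dependency between bump() and the module-level name is the kind of side effect that makes "global" convenient but hard to reason about.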

Lecture 10 (SC): NLTK Book Chapter 2

1. In what different ways can you represent raw text corpora? What are the
advantages? What are the disadvantages?
2. Which types of corpora exist? What is normally used in NLP?
3. What are univariate frequency distributions? How can I create them in NLTK?
How can you calculate the frequency distribution of the word lengths of a
tokenized text (as a list of strings)?
4. What are conditional or bivariate frequency distributions? What can they be used
for? How can the frequency distribution of the words of a tagged corpus be
calculated for each part-of-speech tag?
5. What is the difference between statements and expressions?
6. Given is a list comprehension e: What is the corresponding code in the form of
statements? Also for nested for-loops...
7. What can be said about the statement that in Python functions are also objects?
8. What is the difference between a class and an object?
9. What is the difference between methods and functions?
10. What is enumerate() useful for? What is the difference between the two
following loops, where the name 'text' is a list of tokens of a text?
for i in range(len(text)):
    print(i, text[i])

for (i,w) in enumerate(text):
    print(i, w)
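
The frequency-distribution questions 3 and 4 can be sketched with the standard library alone. This is a stand-in, not the NLTK solution the lecture has in mind: nltk.FreqDist and nltk.ConditionalFreqDist are the NLTK counterparts, and the token list and tagged pairs below are invented.

```python
from collections import Counter, defaultdict

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Univariate distribution: how often does each word length occur?
length_dist = Counter(len(w) for w in tokens)

# Conditional (bivariate) distribution: word frequencies per PoS tag.
tagged = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("the", "DT"), ("mat", "NN")]
by_tag = defaultdict(Counter)
for word, tag in tagged:
    by_tag[tag][word] += 1

# A list comprehension and its equivalent in the form of statements.
long_words = [w.upper() for w in tokens if len(w) > 2]
long_words_loop = []
for w in tokens:
    if len(w) > 2:
        long_words_loop.append(w.upper())

print(length_dist)
print(dict(by_tag))
print(long_words == long_words_loop)
```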

Lecture 11 (SC): NLTK Book Chapter 2: Lexical Resources

1. How can stopword lists be applied to text corpora to calculate the non-stopword
vocabulary?
2. What are suitable data structures to represent lexical resources of different
complexity?
3. What are classes conceptually? What is meant by inheritance? What does the
name 'object' mean in Python?
4. How do you define your own classes? What is a constructor for a class?
5. What does __init__() return? What are its side effects?
6. How are methods defined? What does the parameter "self" mean?
7. Why is it important to differentiate between definition time and runtime?
8. Which attributes are stored in the object instances? Where are Python methods
stored?
9. What naming conventions are used to identify private/protected
attributes/methods and public attributes/methods?
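
A minimal class sketch touching on questions 4 to 9. The class name Lexicon and its attributes are hypothetical and chosen only for illustration.

```python
class Lexicon:
    """A tiny lexical resource that maps words to a part-of-speech tag."""

    def __init__(self, entries):
        # __init__ returns None; its side effect is to set up the new instance.
        self.entries = dict(entries)  # public instance attribute
        self._lookups = 0             # leading underscore: internal by convention

    def lookup(self, word):
        # 'self' is the instance on which the method was called.
        self._lookups += 1
        return self.entries.get(word)


lex = Lexicon({"Tür": "NN", "der": "ART"})
print(lex.lookup("Tür"), lex.lookup("Haus"))

# Instance attributes live in the instance, methods live in the class.
print(sorted(lex.__dict__))
print("lookup" in Lexicon.__dict__, "lookup" in lex.__dict__)
```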

Lecture 12/13 (SC): NLTK Book Chapter 3

1. How does the format() method of str work? What are fields for? What do format
strings look like when formatting typical data (rounding floats, padding strings)?
2. What's the deal with rounding numbers in Python? To what extent does the
rounding of floating point numbers in Python differ from the rounding rules we
know from school?
3. How can you write a flexibly parameterizable KWIC concordance program with
your own classes?
4. What is a text index?
5. What are generator expressions? What are they suitable for? To what extent are
generator expressions more efficient than list comprehension expressions?
6. What does the function next() do? What distinguishes it from the function iter()?
7. What does the keyword "yield" do in contrast to "return"?
8. Why are generator expressions particularly useful as arguments of functions like
max(), sum() or set()?
9. What is the difference between range objects and generators?
10. How can you sample words randomly from a corpus in Python?
11. Which exceptions are most common? How can exceptions be handled?
12. Why is the 'with' construct useful for files?
13. Which are the main classes of spaCy for a typical pipeline? What is their
function? How can you access information about dependency parses in a parsed
document?
14. Why does spaCy use the Vocab class? What is it good for?
15. What does the Matcher class of spaCy allow us to do? What are the differences to
the DependencyMatcher class? What do the operators of the semgrex query
language mean? Given a dependency tree, what would a pattern look like that
matches the indicated syntactic relation?
16. How is the vector cosine similarity computed? What is the numeric range of this
similarity measure? On which linguistic levels can we use cosine similarity in
spaCy? What are sense2vec vectors?
17. Which forms of serialization do you know? What is the benefit of using JSON for
serialization? What is the problem of binary serializations?
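
The formatting, rounding, and generator questions of this lecture lend themselves to a short experiment. The sketch below uses invented sample words and a hypothetical generator function word_lengths; it does not cover the spaCy questions.

```python
def word_lengths(words):
    """Generator function: 'yield' hands out one value at a time and then resumes."""
    for w in words:
        yield len(w)


words = ["exam", "preparation", "questions"]

gen = word_lengths(words)
first = next(gen)                      # advances the generator by one step
longest = max(len(w) for w in words)   # generator expression as an argument
total = sum(word_lengths(words))       # no intermediate list is built

# round() rounds ties to the nearest even number, unlike the usual school rule.
rounded = [round(0.5), round(1.5), round(2.5)]

# str.format(): float with two decimals, right-aligned and left-aligned padding.
formatted = "{:.2f}|{:>8}|{:<6}|".format(3.14159, "NLTK", "ab")

print(first, longest, total)
print(rounded)
print(formatted)
```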

Lecture 13/14 (SC): Binding, Identity and Mutable Data Structures

1. What is meant by binding and rebinding in Python? Which Python language
constructs relate to it?
2. What is Garbage Collection? What is disposed?
3. What do the variable names denote that are used in for-loops and as function
parameters? What happens with mutable data structures when modifications are
made to these variables?
4. What does the id() function show? What is the difference between the operator
"is" and "=="?
5. In which aspects does the increment operator behave differently for mutable and
immutable objects?
6. How can you copy lists and other mutable objects? What is the difference
between a shallow copy and a deep copy? How does this relate to the notation a =
b[:]?
7. Why is it dangerous to delete elements from a list in a for-loop over the same list?
8. Which functionalities does sorted() offer in connection with dictionaries? How
can you find maximum elements in lists and dictionaries?
9. What should a command line argument parser be able to do?
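
Many of the questions in this lecture can be checked with a few lines at the interpreter. The following sketch, with invented sample data, illustrates aliasing versus copying, the augmented assignment on mutable and immutable objects, deletion while iterating, and sorting a dictionary by value.

```python
import copy

a = [[1, 2], [3]]
alias = a                 # a second name bound to the same object
shallow = a[:]            # new outer list, but the inner lists are shared
deep = copy.deepcopy(a)   # inner lists are copied as well

a[0].append(99)           # visible through alias and shallow, not through deep
print(alias is a, shallow is a, shallow == a, deep == a)

# += modifies a list in place, but rebinds the name for an immutable string.
nums = [1]
same_nums = nums
nums += [2]
text = "a"
old_text = text
text += "b"
print(same_nums, old_text, text)

# Deleting while iterating: loop over a copy, modify the original.
items = [1, 2, 2, 3]
for x in items[:]:
    if x == 2:
        items.remove(x)
print(items)

# Sorting dictionary items by value, and finding the key with the largest value.
freq = {"NN": 4, "ART": 1, "ADJ": 2}
by_value = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
top_tag = max(freq, key=freq.get)
print(by_value, top_tag)
```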
