Exam Preparation Questions PCL I 2022
The subject matter includes the chapters of the NLTK book (up to and including the
specified chapters): 1.1-1.4, 2.1-2.5, 3.1-3.6, 4.1-4.2.
• grep -P -i '^[a-z]+[^\s]*\d+[^\s]*[a-z]+' my_infile.txt
3. Given is a file with verticalized text where each word appears with its part of
speech:
Estos DM
30 CARD
símbolos NC
representan VLfin
un ART
sistema NC
fonológico ADJ
de PREP
5 CARD
fonemas NC
vocálicos ADJ
y CC
25 CARD
fonemas NC
consonanticos ADJ
. FS
The two columns are separated by a tabulator. Write a Python program that counts
which part-of-speech tag occurs how often.
4. Change the program under 3 so that only part-of-speech tags beginning with the
letter 'C' are counted.
5. Change the program under 3 so that only PoS tags are counted where the word
contains the letter sequence 'on'.
6. Please describe what the following Python program does. Comment every
operation of this program.
def main():
    my_lexicon = {'Tante':'aunt',
                  'Polly':'Polly',
                  'sah':'saw',
                  'Mann':'man',
                  'mit':'with',
                  'Fernglas':'telescope',
                  'im':'in the',
                  'Garten':'garden',
                  'dem':'the',
                  'der':'the',
                  'den':'the'}
    print(test_sentence)
    test_list = test_sentence.split()
    print()
    print('---------------------')

if __name__ == '__main__':
    main()
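For questions 3 to 5, one possible approach is sketched below. It is not a model solution: the function name count_tags, its two optional filter parameters, and the in-memory sample lines (standing in for the input file) are assumptions made for illustration.

```python
from collections import Counter


def count_tags(lines, tag_prefix="", word_substring=""):
    """Count PoS tags in tab-separated 'word<TAB>tag' lines.

    tag_prefix and word_substring are hypothetical filter parameters;
    an empty string means 'no restriction'.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        word, tag = line.split("\t")
        # Count the tag only if both (optional) filters are satisfied.
        if tag.startswith(tag_prefix) and word_substring in word:
            counts[tag] += 1
    return counts


# Sample data: the first eight lines of the file shown in question 3.
sample = ["Estos\tDM", "30\tCARD", "símbolos\tNC", "representan\tVLfin",
          "un\tART", "sistema\tNC", "fonológico\tADJ", "de\tPREP"]

all_tags = count_tags(sample)                      # question 3
c_tags = count_tags(sample, tag_prefix="C")        # question 4
on_tags = count_tags(sample, word_substring="on")  # question 5

# Print the tags of question 3, most frequent first.
for tag, freq in all_tags.most_common():
    print(tag, freq)
```

With a real input file, the lines could be read by iterating over a file object opened in a 'with' block instead of using the sample list.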
4. How can you work with Wikipedia pages from Python when you are not
connected to the Internet?
2. Which types of corpora exist? Which are normally used in NLP?
3. What are univariate frequency distributions? How can I create them in NLTK?
How can you calculate the frequency distribution of the word lengths of a
tokenized text (as a list of strings)?
4. What are conditional or bivariate frequency distributions? What can they be used
for? How can the frequency distribution of the words of a tagged corpus be
calculated for each part-of-speech tag?
5. What is the difference between statements and expressions?
6. Given is a list comprehension e: What is the corresponding code in the form of
statements? Also for nested for-loops...
7. What can be said about the statement that in Python functions are also objects?
8. What is the difference between a class and an object?
9. What is the difference between methods and functions?
10. What is enumerate() useful for? What is the difference between the statements in
the two following examples, where the name 'text' refers to a list of tokens of a text?
    for i in range(len(text)):
        print(i, text[i])
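Only one of the two examples for question 10 appears above. As a sketch of what the usual enumerate()-based counterpart looks like (the token list is a made-up example, not taken from the course material):

```python
# Hypothetical token list, used only for illustration.
text = ["Estos", "símbolos", "representan", "un", "sistema"]

# Index-based loop, as in the example above: the index is generated
# first and the token is then looked up by position.
by_index = []
for i in range(len(text)):
    by_index.append((i, text[i]))

# enumerate() yields (index, item) pairs directly, so no indexing is
# needed; it also works for iterables that do not support len().
by_enumerate = []
for i, token in enumerate(text):
    by_enumerate.append((i, token))

# Both loops collect the same (index, token) pairs.
print(by_index == by_enumerate)
```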
3. How can you write a flexibly parameterizable KWIC concordance program with
your own classes?
4. What is a text index?
5. What are generator expressions? What are they suitable for? To what extent are
generator expressions more efficient than list comprehensions?
6. What does the function next() do? What distinguishes it from the function iter()?
7. What does the keyword "yield" do in contrast to "return"?
8. Why are generator expressions particularly useful as arguments of functions like
max(), sum() or set()?
9. What is the difference between range objects and generators?
10. How can you sample words randomly from a corpus in Python?
11. Which exceptions are most common? How can exceptions be handled?
12. Why is the 'with' construct useful for files?
13. Which are the main classes of spaCy for a typical pipeline? What is their
function? How can you access information about dependency parses in a parsed
document?
14. Why does spaCy use the Vocab class? What is it good for?
15. What does the Matcher class of spaCy allow us to do? How does it differ from
the DependencyMatcher class? What do the operators of the Semgrex query
language mean? Given a dependency tree, what would a pattern look like that
matches the indicated syntactic relation?
16. How is the vector cosine similarity computed? What is the numeric range of this
similarity measure? On which linguistic levels can we use cosine similarity in
spaCy? What are sense2vec vectors?
17. Which forms of serialization do you know? What is the benefit of using JSON for
serialization? What is the problem of binary serializations?
18.