Introduction
6 March 2024
Adam Jatowt
Examples of recent NLP/IR research
• Question answering in news article collections
• Automatic hint generation
• Multi-modal document summarization
• Multi-timeline summarization of news articles
• Epidemic information extraction
• Search and recommendation models in news collections
• Sentence temporal validity (a.k.a. information expiry date) estimation
• Text readability and comprehensibility estimation
What is Natural Language Processing?
• Natural language processing (NLP) deals with designing methods and algorithms that take unstructured natural language data as input, or produce it as output
• Natural language processing is focused on the design & analysis of computational algorithms and
representations for processing natural human language
• Human language is our main general-purpose communication tool
• Natural Language Processing is a large field: processing natural language text involves a wide variety of
syntactic, semantic, and pragmatic tasks, among other problems
General things about this course
• Not a linguistics course, but rather a course that includes aspects of language processing,
machine learning and quantitative methods
• We will explore statistical and neural network (NN) techniques for the automatic analysis of natural (human)
language data
• The dominant modeling paradigm is corpus-driven statistical/deep learning, with both supervised
and unsupervised methods
• NLP is a huge field!
• We focus mainly on the core ideas, tasks, and methods behind fundamental technologies and, ultimately,
NLP applications
Course goals
• Survey and study the fundamental tasks in NLP
• Learn some classic and state-of-the-art techniques
• Acquire some research ideas, interest and experience :-)
Housekeeping notes
Schedule, Grading, etc.
Overall effort
• This is a 5 ECTS VU course
• 1 ECTS credit = 25 hours of work [1]
• 125h/course → about 8h/week
• 8h/week - 3h/week in class = roughly 5h/week of work at home (including exam preparation)
[1] https://www.uibk.ac.at/studium/organisation/anerkennung-und-ects-zuteilung/index.html.en
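The workload figures above follow from simple arithmetic, written out here for reference. Two readings are assumptions rather than statements from the slide: the 3 h/week subtracted in the last step is taken to be the weekly in-class time, and dividing 125 h by 8 h/week shows that the estimate spreads the workload over roughly 15 to 16 weeks.

```latex
\begin{align*}
5~\text{ECTS} \times 25~\tfrac{\text{h}}{\text{ECTS}} &= 125~\text{h} \\
\frac{125~\text{h}}{8~\text{h/week}} &\approx 15.6~\text{weeks} \\
8~\tfrac{\text{h}}{\text{week}} - 3~\tfrac{\text{h}}{\text{week}}~(\text{in class}) &= 5~\tfrac{\text{h}}{\text{week}}~(\text{at home})
\end{align*}
```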
Tentative schedule
• Week 1: Course overview
• Week 2: NLP introduction; Basic text pre-processing
• Week 3: N-grams, language models; Spelling error correction
• Week 4-5: Word relations, senses & WordNet; Text classification, Logistic regression
• Week 6: Basics of neural networks for NLP; Vector semantics, word embeddings
• Week 7 (May 8): Mid-term Exam
• Week 8-9: Sentiment analysis & lexicons; Sequence labeling (POS tagging, Named entity recognition)
• Week 10: Document summarization; Information and relation extraction; Event extraction
• Week 11: Question answering, Commonsense knowledge extraction; Chatbots
• Week 12-13: Tutorial presentations
• Week 14 (June 26): Final Exam
Grading scale (points out of 100)
• Less than 50: Not enough
• 50-63: Enough
• 64-77: Satisfactory
• 78-89: Good
• 90-100: Very good
Attendance (5%)
• Attendance is advised but not mandatory
• However, you must attend the mid-term and final exams (weeks 7 and 14) and give a tutorial presentation.
• You may withdraw from the course without any consequences before March 21 (i.e., after the third class).
After this date, you may receive a grade
• Please send me an email if you decide to withdraw from the course
Class Participation (20%)
• Weekly paper presentations and occasional homework assignments
• Paper presentation time: about 15 min presentation + Q&A time
• Q&A, especially active participation in the discussions during the paper presentations and the final
tutorial presentations
Tutorial Presentations (25%)
• Presentations of about 25 min in groups of 2-3 students
• Focused on recent NLP methods, mainly deep learning and large language models (LLMs)
• Done in a tutorial style: give an overview of the method, demonstrate the key functions and code that
implement it, and showcase its results on a dataset of your choice
• Slides must be submitted to OLAT at least one day before the presentation
• Presentation details and topics will be given later
Exams (50%)
• Mid-term exam (25%) on May 8 and final exam (25%) on June 26
• Closed-book exams
• Multiple-choice questions
• Focused on understanding rather than rote memorization of details
• The exam dates will not be changed, so you should not take this course if you anticipate a
schedule conflict
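The component weights listed above (attendance 5%, class participation 20%, tutorial presentations 25%, mid-term and final exams 25% each) add up to 100%, and the resulting total maps onto the grading scale. The sketch below is a hypothetical illustration only. It assumes that each component is scored on a 0-100 scale and that the components are combined as a plain weighted sum. It also assumes that fractional totals falling between bands (e.g. 63.5) are not rounded up. The slides state none of these details. The names `WEIGHTS`, `GRADE_BANDS`, `final_score`, and `grade` are illustrative and not part of any course tool.

```python
# Hypothetical sketch: assumes every component is scored 0-100 and combined linearly.
WEIGHTS = {
    "attendance": 0.05,
    "participation": 0.20,
    "tutorial": 0.25,
    "midterm": 0.25,
    "final": 0.25,
}

# Lower bound of each grade band, checked from highest to lowest.
GRADE_BANDS = [
    (90, "Very good"),
    (78, "Good"),
    (64, "Satisfactory"),
    (50, "Enough"),
]


def final_score(scores: dict[str, float]) -> float:
    """Weighted sum of the component scores (each assumed to lie in 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def grade(points: float) -> str:
    """Map a 0-100 point total onto the grading scale (no rounding assumed)."""
    for lower_bound, label in GRADE_BANDS:
        if points >= lower_bound:
            return label
    return "Not enough"


if __name__ == "__main__":
    example = {"attendance": 100, "participation": 80, "tutorial": 90,
               "midterm": 70, "final": 75}
    total = final_score(example)
    print(f"{total:.2f} -> {grade(total)}")
```

With the example scores, the weighted total falls in the 78-89 band. Storing only the lower bound of each band keeps the gaps between the integer ranges (such as 63 vs. 64) well defined, at the cost of baking in the no-rounding assumption.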
Academic Honesty & Integrity
• We expect you to do your own work unless it is specifically assigned as a group
assignment/project
• Whenever you use someone else’s ideas, software libraries, etc., you must clearly acknowledge and
document them
• You are not allowed to collaborate on answering exam questions
• It is an honor code violation to discuss exam questions with other students
Slides
• Slides will be uploaded to OLAT before each lecture
• Some slides are borrowed from Julia Hockenmaier, Alex Lascarides, Nathan Schneider, Dan Jurafsky, Chris Manning,
David Bamman, Ray Mooney, Yulia Tsvetkov, Taylor Berg-Kirkpatrick, Dan Klein, Diyi Yang, Jannik Strötgen, Vinay Setty,
Anubhav Jangra,...
Books
• H. Lane, C. Howard, H. Max Hapke, Natural Language Processing in Action, Manning, 2019
• Jurafsky and Martin, Speech and Language Processing, 2nd or 3rd Edition
• Manning and Schuetze, Foundations of Statistical Natural Language Processing, MIT Press
• Goldberg, Neural Network Methods for Natural Language Processing, Synthesis Lectures on
Human Language Technologies
• Etc.
Other Relevant Books
• Natural Language Processing with Python (the NLTK book), Bird, Klein & Loper
• https://www.nltk.org/book