Introduction

This document provides an introduction to a course on natural language processing and data analysis. It discusses the topics that will be covered in the course including text preprocessing, language models, word embeddings, text classification, neural networks, and several NLP applications. It also outlines the course structure, schedule, evaluation criteria, and resources that will be used.
Advanced Data Engineering & Analysis: Course Introduction

6 March 2024
Adam Jatowt
[email protected]
Examples of recent NLP/IR research
• Question answering in news article collections
• Automatic hint generation
• Multi-modal document summarization
• Multi-timeline summarization of news articles
• Epidemic information extraction
• Search and recommendation models in news collections
• Sentence temporal validity (a.k.a. information expiry date) estimation
• Text readability and comprehensibility estimation

Anyone interested in doing master’s thesis research on any of these or related NLP/IR topics can contact me for discussion.
Some open topics are listed on the DS website (more to come).
About the Course
What is Natural Language Processing?
• The amount of digital textual data generated every day is huge (e.g., the Web, social media, medical records, digitized books)
• So is the need for understanding, analyzing, organizing, translating, and processing this flood of words and documents

• Natural language processing (NLP) deals with designing methods and algorithms that take as an input, or
produce as an output, unstructured, natural language data
• Natural language processing is focused on the design & analysis of computational algorithms and
representations for processing natural human language
What is Natural Language Processing?
• Human language is our main general-purpose communication tool
• Natural Language Processing
• Large field: processing natural language text involves many different syntactic, semantic, and pragmatic tasks, in addition to other problems
General things about this course
• Not a linguistics course, but rather a course that includes aspects of language processing,
machine learning and quantitative methods
• We will explore statistical and NN techniques for the automatic analysis of natural (human)
language data
• The dominant modeling paradigm is corpus-driven statistical/deep learning, with both supervised
and unsupervised methods
• NLP is a huge field!
• We focus mainly on core ideas, tasks, and methods needed for or behind fundamental technologies,
and eventually for NLP applications
Course goals
• Survey and study fundamental tasks in NLP
• Learn some classic and state-of-the-art techniques
• Acquire some research ideas, interest and experience :-)
Housekeeping notes
Schedule, Grading, etc.
Overall effort
• This is a 5 ECTS VU course
• 1 ECTS credit = 25 hours of work [1]
• 125h/course → about 8h/week
• 8h/week - 3h/week in class = roughly 5h/week of work at home (including exam preparation)

[1] https://www.uibk.ac.at/studium/organisation/anerkennung-und-ects-zuteilung/index.html.en
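The workload arithmetic above can be reproduced with a short sketch. The credit count, hours per credit, and the 3 contact hours per week come from the slide. The slide does not state how many weeks the total is spread over, so `SEMESTER_WEEKS = 15` is an assumption, chosen only because it gives the "about 8h/week" figure. The function name `weekly_workload` is likewise illustrative.

```python
ECTS_CREDITS = 5            # from the slide
HOURS_PER_ECTS = 25         # from the slide, see [1]
CONTACT_HOURS_PER_WEEK = 3  # from the slide (time spent in class)
SEMESTER_WEEKS = 15         # ASSUMPTION: not stated on the slide


def weekly_workload(credits=ECTS_CREDITS,
                    hours_per_credit=HOURS_PER_ECTS,
                    weeks=SEMESTER_WEEKS,
                    contact_hours=CONTACT_HOURS_PER_WEEK):
    """Return (total hours, hours per week, hours per week at home)."""
    total = credits * hours_per_credit
    per_week = total / weeks
    at_home = per_week - contact_hours
    return total, per_week, at_home


total, per_week, at_home = weekly_workload()
print(f"{total} h in total, {per_week:.1f} h/week, {at_home:.1f} h/week at home")
# prints: 125 h in total, 8.3 h/week, 5.3 h/week at home
```

Passing a different `weeks` value shows how sensitive the weekly figure is to that assumption; with 14 weeks, for example, the result is closer to 9 h/week.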
Tentative schedule
• Week 1: Course overview
• Week 2: NLP introduction; Basic text pre-processing
• Week 3: N-grams, language models; Spelling error correction
• Week 4-5: Word relations, senses & WordNet; Text classification, Logistic regression
• Week 6: Basics of neural networks for NLP; Vector semantics, word embeddings
• Week 7 (May 8): Mid-term Exam
• Week 8-9: Sentiment analysis & lexicons; Sequence labeling; POS tagging, Named entity recognition
• Week 10: Document summarization; Information and relation extraction; Event extraction
• Week 11: Question answering, Commonsense knowledge extraction; Chatbots
• Week 12-13: Tutorial presentations
• Week 14 (June 26): Final Exam

(The schedule may be subject to change. Exam and presentation dates are, however, fixed.)
Evaluation
• VU: Continuous assessment course
• Grading is decided based on attendance (5%), class participation (20%), tutorial
presentations (25%), and two exams (50%).
• Participation in the exams is necessary for a positive grade

Points         Grade
Less than 50   Not enough
50 - 63        Enough
64 - 77        Satisfactory
78 - 89        Good
90 - 100       Very good
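As a rough illustration of how the weights and the grade scale combine, here is a minimal sketch. It assumes that each component is scored from 0 to 100 and that the final points are the weighted sum; the slides do not say how the individual components are scored or how fractional totals are rounded. The names `WEIGHTS_PERCENT`, `total_points`, and `grade` are hypothetical, and the band lower bounds are used as thresholds, so a total such as 77.5 falls into "Satisfactory".

```python
# Component weights in percent, as given on the Evaluation slide.
WEIGHTS_PERCENT = {
    "attendance": 5,
    "participation": 20,
    "tutorial": 25,
    "exams": 50,
}

# Lower bound of each band from the grade table, highest first.
GRADE_BANDS = [
    (90, "Very good"),
    (78, "Good"),
    (64, "Satisfactory"),
    (50, "Enough"),
]


def total_points(scores):
    """Weighted sum of component scores (ASSUMPTION: each score is 0-100)."""
    return sum(WEIGHTS_PERCENT[name] * scores[name]
               for name in WEIGHTS_PERCENT) / 100


def grade(points, took_both_exams=True):
    """Map points to a grade; a positive grade requires taking the exams."""
    if not took_both_exams:
        return "Not enough"
    for lower_bound, label in GRADE_BANDS:
        if points >= lower_bound:
            return label
    return "Not enough"


example = {"attendance": 100, "participation": 80, "tutorial": 90, "exams": 70}
points = total_points(example)
print(points, grade(points))  # prints: 78.5 Good
```

In this made-up example the exams contribute 35 of the 78.5 points, which shows why the two exams dominate the final grade.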
Attendance (5%)
• Attendance is advised but not mandatory
• However, you have to be present at the mid-term and final examinations (week 7 and week 14) and give a tutorial presentation.
• You may cancel your registration in the course before March 21 (after the third class) without any consequences. After this date you may receive a grade
• Please send me an email if you withdraw from the course
Class Participation (20%)
• Weekly paper presentations and occasional homework assignments
• Paper presentation time: about 15 min presentation + Q&A time
• Q&A, especially active participation in the discussions during the final tutorial presentations and paper presentations
Tutorial Presentations (25%)
• Presentations of about 25 min in groups of 2-3 students
• Focused on recent NLP methods, mainly Deep Learning and LLMs
• Done in a tutorial style: give an overview of the method, demonstrate key functions and code to implement it, and showcase its results on a dataset of your choice
• Slides to be submitted to OLAT at least one day before the presentation
• Presentation details and topics will be given later
Exams (50%)
• Mid-term exam (25%) on May 8 and final exam (25%) on June 26
• Closed book exams
• Multiple choice questions
• Focused on understanding rather than rote memorization of details
• The exam dates will not be changed, so you should not take this course if you anticipate a schedule conflict
Academic Honesty & Integrity
• We expect you to do your own work unless it is specifically assigned as a group
assignment/project
• Whenever you use someone else’s idea, software library, etc., then it should be clearly
documented
• You are not allowed to collaborate on answering exam questions
• It is an honor code violation to discuss exam questions with other students
Slides
• Slides will be uploaded to OLAT before each lecture
• Some slides are borrowed from Julia Hockenmaier, Alex Lascarides, Nathan Schneider, Dan Jurafsky, Chris Manning, David Bamman, Ray Mooney, Yulia Tsvetkov, Taylor Berg-Kirkpatrick, Dan Klein, Diyi Yang, Jannik Strötgen, Vinay Setty, Anubhav Jangra, ...
Books
• H. Lane, C. Howard, H. M. Hapke, Natural Language Processing in Action, Manning, 2019
• Jurafsky and Martin, Speech and Language Processing, 2nd or 3rd Edition
• Manning and Schütze, Foundations of Statistical Natural Language Processing, MIT Press
• Goldberg, Neural Network Methods for Natural Language Processing, Synthesis Lectures on Human Language Technologies
• Etc.
Other Relevant Books
• Natural Language Processing with Python (the NLTK book), Bird, Klein & Loper.
• https://www.nltk.org/book
• Natural Language Processing, Eisenstein.
• https://tinyurl.com/eisenstein-nlp
• Linguistic Fundamentals for NLP, Bender.
• http://tinyurl.com/bender-nlp
Useful NLP Resources
• https://www.nltk.org/
• https://spacy.io/
• https://openai.com/
• https://allennlp.org/
• An open-source NLP research library, built on PyTorch
• https://huggingface.co/transformers/
• https://towardsdatascience.com/
• http://nlpprogress.com/
• Repository tracking the progress in NLP, including the datasets and the current state of the art for the most common NLP tasks
• https://github.com/flairNLP/flair
Relevant Scientific Conferences (ordered by importance)
• Association for Computational Linguistics (ACL)
• Empirical Methods in Natural Language Processing (EMNLP)
• North American Chapter of the Association for Computational Linguistics (NAACL)
• International Conference on Computational Linguistics (COLING)
• European Chapter of the Association for Computational Linguistics (EACL)
• Conference on Computational Natural Language Learning (CoNLL)
Other Related Scientific Conferences
• CIKM
• WSDM
• WWW
• SIGIR
• ECIR
• AAAI, IJCAI
Example NLP Workshops Associated with Relevant Conferences
• NLP for Building Educational Applications (BEA)
• Fact Extraction and VERification (FEVER)
• Figurative Language Processing (FLP)
• NLP for Conversational AI (NLP4ConvAI)
• Narrative Understanding, Storylines, and Events (NUSE)
• Representation Learning for NLP (RepL4NLP)
• Natural Language Processing for Social Media (SocialNLP)
• Neural Generation and Translation (WNGT)
Workshops recently organized by DS group
• The 6th Int. Workshop on Narrative Extraction from Texts (Text2Story2023) at ECIR 2023 https://text2story23.inesctec.pt/
• The 1st Int. Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT’23) at SIGIR 2023 https://en.sce.ac.il/news/iact23
• The 2nd Int. Workshop on Computational Approaches to Historical Language Change at ACL 2021 https://languagechange.org/events/2021-acl-lchange/
Consultation Hours
• Fridays: 16:00 - 17:30
• Digital Science Center (DiSC), Innrain 15, A-6020 Innsbruck (room 01-09)
• Please schedule a meeting by email in advance
Disabilities Support Office
• The University of Innsbruck offers support through the disabilities office:
• Mag. Bettina Jeschke
[email protected]
• +43 512 507 8887
• You do not need a diagnosis to contact the office
• For students with suspected attention deficit, autism, or learning disabilities, additional support is
provided by the S-AAL project:
• https://www.uibk.ac.at/de/projects/s-aal/
Thank you!