Elective
Since then, humans have spread all across the planet, speak a multitude of languages, and
struggle to communicate with one another. Computers and NLP can bridge this gap.
Introduction
Next, we examine the topic of Gen AI, which has revolutionised the way we get questions
answered, after over two decades of “Googling” for information and assembling the
responses into something workable. The large language models (LLMs) that power ChatGPT and
Bard are the current rage, although their outputs may not be optimal. In the Figure below, there
is a basic fault in the generated Python code (can you spot it?). Regardless, there is
scope for progress, with work increasingly published since 2020. The
remainder of the course is divided between business case studies and guest speaker sessions.
There shall be no emphasis on coding. The Instructor will demonstrate concepts with
code in class where needed. All examples shall be circulated among participants, so they can
execute them on their own if they wish to.
Background
The core of NLP is linguistics; besides tackling syntax and semantics, NLP models must
address the nuances of ambiguity, common sense, context, sarcasm, and so on. Modern-day
LLMs combine machine learning, big data, and AI approaches to realise human-level
accuracy on complex tasks such as speech recognition and answer generation. Voice-based
assistants such as Alexa, Cortana, Google Assistant (“Ok Google”), and Siri have capably exploited all three
aspects of NLP, namely learning, understanding, and producing human language content.¹
While these advancements are speeding up response times for businesses, they are also
transforming the way we live.
NLP powers ground-breaking innovations. During Google I/O 2022, CEO Sundar Pichai
showcased several NLP-driven platforms.² A novel, monolingual approach has allowed the
company to translate a large set of low-resource (LR) Indian languages such as Assamese,
Bhojpuri, Konkani, Maithili, Manipuri, and Sanskrit.
Wearing a pair of Google's prototype AR glasses (see Fig 1), a user can read what is being said,
transcribed in real time; the words can also be translated into the user’s native tongue. This opens up a world of
opportunities for businesses that are focused on inclusion. In the near future, education
may be delivered entirely in one’s native language.
Voice-powered apps on the phone use synthesised speech to establish a critical connection with
the user for handy tasks like route navigation. Google Maps has helped drivers tangibly
reduce carbon emissions by nudging them to choose a slightly slower but more
energy-efficient route. Automated summarisation of lengthy text documents is possible with Google
Docs, and YouTube has begun to incorporate auto-generated chapters.
¹ Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261-266.
² Watch the entire event at https://www.youtube.com/watch?v=nP-nMZpLM1A
We live in a “renaissance era” of automation that is fuelled by easy access to training data,
elastic computation platforms, and huge investments from governments and businesses.
There is a growing emphasis on inclusivity; speech-to-text technologies from Google are
bringing different sections of society together by enabling communication between them.
Hugging Face hosts open-source pre-trained models, BERT among them, for automating NLP tasks
such as machine translation (MT) and sentiment analysis. DALL·E from OpenAI and Midjourney take
creativity to the next level by generating imagery from a text description.
The course is designed with the following specific objectives and learning outcomes:
Text Book(s)
Reference Book(s)
Aggarwal, C. C. (2015). Mining text data. In Data mining (pp. 429-455). Springer,
Cham.
Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with python:
Enabling language-aware data products with machine learning. O'Reilly Media,
Inc.
Hagiwara, M. (2021). Real-World Natural Language Processing: Practical
applications with deep learning. Simon and Schuster.
Hapke, H. M., Lane, H., & Howard, C. (2019). Natural language processing in action.
Manning Publications.
Rao, D., & McMahan, B. (2019). Natural language processing with PyTorch: build
intelligent language applications using deep learning. O'Reilly Media, Inc.
Russell, M. A. (2013). Mining the social web: data mining Facebook, Twitter,
LinkedIn, Google+, GitHub, and more. O'Reilly Media, Inc.
Thomas, A. (2020). Natural Language Processing with Spark NLP: Learning to
Understand Text at Scale. O'Reilly Media, Inc.
Tunstall, L., von Werra, L., & Wolf, T. (2022). Natural Language Processing with
Transformers. O'Reilly Media, Inc.
Vasiliev, Y. (2020). Natural Language Processing with Python and spaCy: A
Practical Introduction. No Starch Press.
Additional Reading(s)
Specified against the topics.
Session Topic
1-2 The Landscape of Natural Language Processing
We discuss the history and evolution of NLP. The initial decades saw
the rise of heuristic approaches such as regular expression matches,
word frequencies, etc. Forensic “fingerprints” helped capture the
Unabomber. Starting with the 1990s, NLP began to benefit from
developments in machine learning and AI. We explore the landscape of
commercial offerings powered by NLP.
Readings:
1. Dale, R. (2019). NLP commercialisation in the last 25 years.
Medium.
2. Davies, D. (2017). FBI profiler says linguistic work was
pivotal in capture of Unabomber. npr.org.
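To make the heuristic era concrete, here is a minimal sketch that uses a regular expression to tokenise text and then compares two texts by the relative frequencies of common function words, a crude stylistic “fingerprint”. It is illustrative only and is not the method used in the Unabomber investigation; the function-word list and the helper names `tokenize`, `profile`, and `similarity` are hypothetical choices.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of function words; real stylometry
# studies use larger, carefully curated lists.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was",
                  "i", "for", "on", "you", "he", "be", "with", "as", "by", "at"]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and pull out word tokens with a regular expression."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text: str) -> list[float]:
    """Relative frequency of each function word: a crude stylistic fingerprint."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(text_a: str, text_b: str) -> float:
    """Compare two texts by their function-word frequency profiles."""
    return cosine(profile(text_a), profile(text_b))


known = "It is the view of the author that the system is at fault, and that it was always so."
disputed = "The author holds that it is the system that is to blame, and it was ever thus."
print(round(similarity(known, disputed), 3))
```

Function words are used because writers choose them largely unconsciously, so their rates tend to vary less with topic than content words do; with texts this short, the score is only a toy signal.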
3 Python primer
4 Linguistics orientation
The first session goes over the basics of machine learning (ML). We
contrast the programmatic approach to deriving regression coefficients
with one that uses AI. We conclude by illustrating some supervised
and unsupervised models.
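One way to read this contrast, assuming “programmatic” means solving for the coefficients analytically and “AI” means learning them from data, is the sketch below. It fits y = a + bx by ordinary least squares with the closed-form formulas, then recovers the same coefficients by gradient descent on mean squared error. The data points, learning rate, and epoch count are made up for illustration.

```python
def closed_form(xs, ys):
    """Ordinary least squares for y = a + b*x, solved analytically."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope = covariance / variance
    a = my - b * mx        # intercept passes the line through the means
    return a, b


def gradient_descent(xs, ys, lr=0.01, epochs=5000):
    """Learn the same coefficients iteratively by minimising mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors of the current model on every point
        errors = [(a + b * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to a and b
        grad_a = 2 * sum(errors) / n
        grad_b = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical toy data, roughly y = 1 + 2x with noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
print(closed_form(xs, ys))
print(gradient_descent(xs, ys))
```

Both routes should agree to several decimal places here. The closed form is exact but specific to this model, whereas the iterative approach generalises to models (such as neural networks) that have no closed-form solution.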
11 Information Extraction
Humans are driven by basic questions such as who, what, where, when,
and why. Named entities form the answers to such questions, and
named entity recognition (NER) plays a key role in extracting relevant
information from unstructured text, which forms the bulk of business
transactions. NER is the first step in extracting keywords,
temporal information, events, and relations. After going over the basics,
we apply transformers to perform NER and build a question-answering
system.
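Transformer models fine-tuned for NER typically assign each token a BIO tag (B- begins an entity, I- continues it, O marks tokens outside any entity). The sketch below shows the post-processing step that groups such tags into entity spans; in the Hugging Face `transformers` library, `pipeline("ner", aggregation_strategy="simple")` performs a comparable grouping. The tags here are written by hand rather than produced by a model, and `bio_to_entities` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_text, entity_type) spans."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts (a stray I- tag is treated as a start too)
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the current entity
            current_tokens.append(token)
        else:
            # "O": close any open entity
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:  # flush an entity that runs to the end of the sentence
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hand-written tags for illustration (not real model output)
tokens = ["Sundar", "Pichai", "is", "the", "CEO", "of", "Google", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "O", "B-ORG", "O"]
print(bio_to_entities(tokens, tags))
```

Treating an I- tag that does not continue the current entity as a fresh start is one common convention for repairing inconsistent model output; other toolkits may handle it differently.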
While NLP helps build nifty applications, monetising them comes with
its own set of challenges. The case explains how an entrepreneur identified
a gap in the market and created a suite of NLP offerings under a
unified platform called Omnitive. The case is focused on strategic
aspects of the business model.
14 Topic Modelling
The final session is dedicated to the prospects that the future holds.