
DATA SCIENCE IN PYTHON

Natural Language Processing
With Expert Data Science Instructor Alice Zhao

*Copyright Maven Analytics, LLC


ABOUT THIS SERIES

This is Part 5 of a 5-Part series designed to take you through several applications of data science
using Python, including data prep & EDA, regression, classification, unsupervised learning & NLP

PART 1: Data Prep & EDA
PART 2: Regression
PART 3: Classification
PART 4: Unsupervised Learning
PART 5: Natural Language Processing



COURSE STRUCTURE

This course is for students looking for a practical, hands-on approach to learning and
applying natural language processing (NLP) concepts and techniques with Python

Additional resources include:

Downloadable PDF to serve as a helpful reference when you’re offline or on the go

Quizzes & Assignments to test and reinforce key concepts, with step-by-step solutions

Interactive demos to keep you engaged and apply your skills throughout the course



COURSE OUTLINE

1. Installation & Setup: Install Anaconda, launch Jupyter Notebook to write Python code, and practice creating and activating conda environments

2. Natural Language Processing 101: Review the history of NLP, traditional ML vs. modern LLM approaches, and common NLP applications and Python libraries

3. Text Preprocessing: Walk through the NLP text preprocessing pipeline, including cleaning, normalization, linguistic analysis and vectorization

4. NLP with Machine Learning: Use traditional ML techniques to apply rules-based, supervised learning and unsupervised learning techniques on text data


5. Neural Networks & Deep Learning: Understand the theory behind how neural networks and deep learning work before moving on to modern DL architectures

6. Transformers & LLMs: Dive into the main parts of a transformer, including embeddings, attention and FFNs, as well as popular LLMs (BERT, GPT, etc.)

7. Hugging Face Transformers: Use pretrained LLMs for sentiment analysis, NER, zero-shot classification, summarization, feature extraction and generation

8. NLP Review & Next Steps: Review the NLP techniques covered in this course, when to use them, and next steps to dive deeper and stay up-to-date



THE COURSE ASSIGNMENTS

MAVEN BOOKS: You'll be using text cleaning, normalization and vectorization techniques on book descriptions to extract common words and insights about the books

THE MOVIE MAVEN: You'll be using machine learning techniques on movie summaries to rank movies by sentiment, predict the gender of a director, and identify movie themes

MAVEN BOOKS: You'll be using pretrained LLMs on book descriptions to extract character names, classify books into categories, create summaries, and recommend similar books



SETTING EXPECTATIONS

This course covers traditional & modern natural language processing (NLP)
• Traditional NLP includes text preprocessing techniques & machine learning algorithms for text data
• Modern NLP includes concepts like neural networks, deep learning, transformers, and large language models (LLMs)

We will use Anaconda as our package and environment manager


• Anaconda is free to download, and the industry standard for conducting data science tasks with Python

We will use Hugging Face to work with Large Language Models (LLMs)
• We’ll use the Model Hub to access pretrained models and the Transformers library in Python to apply them
• We will NOT be doing a deep dive into more advanced transformer topics like fine-tuning, RAG, etc.

You do NOT need to be a Python expert to take this course


• It is strongly recommended that you complete the first course in this series, Data Prep & EDA, but we will
teach the relevant math and Python code for applying NLP techniques throughout this course



INSTALLATION & SETUP



INSTALLATION & SETUP

In this section we’ll install Anaconda, start writing Python code in a Jupyter Notebook,
and learn how to create a new conda environment to get set up for this course

TOPICS WE’LL COVER:
• Anaconda Overview
• Installing Anaconda
• Launching Jupyter
• Conda Environments

GOALS FOR THIS SECTION:
• Learn about Anaconda’s various features
• Install Anaconda and launch Jupyter Notebook
• Get comfortable writing Python code within the Jupyter Notebook interface
• Understand how to create and use conda environments for project organization



ANACONDA

Anaconda is the most popular package and environment manager for data science and machine learning tasks

When you install Anaconda, it comes with the following:
• Coding languages & tools
• Popular packages
• A package & environment manager

We’ll be using Jupyter Notebook to write Python code

We highly recommend using Anaconda as your package and environment manager to follow along with the course demos

You can use pip installs for packages and the venv module for environments as an alternative if you’re already familiar with them



INSTALL ANACONDA (MAC)

1) Go to anaconda.com/download and click “Skip registration”

2) Click on the arrow in the “Download for Mac” button and select the type of computer you have (Apple Silicon for newer computers, Intel for older ones)

3) Launch the downloaded Anaconda pkg file

4) Follow the installation steps (default settings are OK) and click “Continue”, “Agree” and “Install” at the end



INSTALL ANACONDA (PC)

1) Go to anaconda.com/download and click “Skip registration”

2) Click on the “Download” button

3) Launch the downloaded Anaconda exe file

4) Follow the installation steps (default settings are OK) and click “Continue”, “Agree” and “Install” at the end



LAUNCH JUPYTER

1) Open the Terminal (Mac) or Anaconda Prompt (PC) application

2) Type jupyter notebook and hit return



YOUR FIRST JUPYTER NOTEBOOK

1) Once inside the Jupyter interface, create a folder to store your notebooks for the course

NOTE: You can rename the folder by clicking “Rename” in the top left corner

2) Open your new coursework folder and launch your first Jupyter Notebook!

NOTE: You can rename the notebook by clicking on the title at the top of the screen



THE NOTEBOOK SERVER

NOTE: When you launch a Jupyter Notebook, you’ll see a bunch of log data; this is called a notebook server, and it powers the notebook interface

If you close the server window, your notebooks will not run!

Depending on your OS and method of launching Jupyter, you may not see this – as long as you can run your notebooks, don’t worry!



CONDA ENVIRONMENTS

A conda environment is a place on your computer where you can install specific versions of Python and Python packages without affecting other projects

EXAMPLE: My computer has two environments:
• Environment 1 (Python 3.13): “I’m working on a beginner Python 101 project and am learning about built-in Python functions. I’m going to activate Environment 1 and do my Python coding here.”
• Environment 2 (Python 3.10, NumPy 2.2): “I’m taking a NumPy course where the instructor is using Python 3.10 with NumPy 2.2, and I want my code to match his. I’m going to activate Environment 2 and do my Python coding here.”



• In Environment 1, we can use all built-in Python 3.13 functions, but we get an error when importing NumPy because the NumPy library isn’t available in this environment
• In Environment 2, we can use all built-in Python 3.10 functions, and we’re able to import the NumPy library because it’s installed in this environment



DEFAULT VS. NEW ENVIRONMENTS

As a Python beginner, you’ve likely been using the default environment, but advanced users create new environments for each new, complex project
• Creating a new environment gives us a blank slate to freshly install Python packages and make sure the versions and dependencies are correct for each project
• For example, you might keep the default environment and create separate new environments for a sentiment analysis project and an LLM project

If you’re just using a few basic libraries, using the default environment for all your projects is fine



CONDA WORKFLOW

This is the workflow for working with conda environments and packages:

EXAMPLE: Creating a new environment to use Hugging Face’s Transformers library

1. Create a new environment: > conda create --name llm_project_env
   The base environment is active by default; this creates a new, empty environment called llm_project_env

2. Activate the new environment: > conda activate llm_project_env
   You can specify that you want to use this environment

3. Install the packages you need: > conda install transformers
   With just one line of code, the Transformers package is installed along with its dependencies, which are other packages that it uses code from

4. Launch Jupyter within this environment: > jupyter notebook
   The Jupyter Notebook you open will have access to the packages available in the active environment

5. Start writing Python code as usual:
   from transformers import pipeline
   sentiment = pipeline("sentiment-analysis")
   sentiment("I love NLP!")
   The code uses the packages in the active environment

6. Deactivate the environment: > conda deactivate
   Deactivating takes you back to the base environment


CONDA COMMANDS

These are some helpful commands when working with conda environments:

Category    | Command                          | Description
Environment | conda env list                   | Lists all conda environments on your system
Environment | conda create --name test_env     | Creates a new environment called test_env
Environment | conda activate test_env          | Activates the test_env environment
Environment | conda deactivate                 | Deactivates the current environment and returns to the base environment
Package     | conda list                       | Lists installed packages in the active environment
Package     | conda install                    | Installs specified packages into the active environment
YML         | conda env export > test_env.yml  | Exports package names and versions in the active environment into a .yml file
YML         | conda env create -f test_env.yml | Creates a new environment from a .yml file



All conda commands should be written and executed within the Terminal (Mac) or Anaconda Prompt (PC) application
• The (base) prefix tells us we’re in the default environment; you will only see the (base) prefix if you have Anaconda installed
• “conda env list” is the conda command to display all the available environments
• The * signals the active environment
• The list also shows the three NLP environments we’ll be creating throughout this course



ENVIRONMENTS IN THIS COURSE

We will be creating and using four conda environments throughout this course
• While you can still complete the course without utilizing environments, they will help keep you organized and avoid potential version conflicts

Section                            | Environment
1) Installation & Setup            | test_env
2) Natural Language Processing 101 | nlp_basics
3) Text Preprocessing              | nlp_basics
4) NLP with Machine Learning       | nlp_machine_learning
5) Neural Networks & Deep Learning | nlp_machine_learning
6) Transformers & LLMs             | nlp_transformers
7) Hugging Face Transformers       | nlp_transformers
8) NLP Review & Next Steps         | nlp_transformers

If you have experience working with .yml files, you can find the NLP environment .yml files in the “Environments” folder within the course resources. You can use them as reference or to quickly create new conda environments.



NATURAL LANGUAGE PROCESSING 101



NATURAL LANGUAGE PROCESSING 101

In this section we’ll cover the basics of natural language processing (NLP), including key
concepts, the evolution of NLP over the years, and its applications & Python libraries

TOPICS WE’LL COVER:
• NLP Basics
• History of NLP
• Techniques & Applications

GOALS FOR THIS SECTION:
• Understand the basics of NLP
• Learn how NLP has evolved over the years and the techniques that are commonly used today
• Become familiar with the variety of applications, techniques, and Python libraries available for NLP



NLP BASICS

NATURAL LANGUAGE PROCESSING
noun
The application of machine learning algorithms to the analysis, understanding, and manipulation of written or spoken examples of human language*

In short: using computers to work with text data

Customer | Rating | Review
Remy     | 5      | The food at this restaurant was amazing!!
Anton    | 3.5    | Long wait. The customer service was meh.
Colette  | 4.5    | Great ambiance and drinks. Would come back!

It’s easy for humans to read & interpret these reviews, but computers struggle; that’s where NLP comes in!

*Dictionary.com
NLP & AI

Natural Language Processing falls under Artificial Intelligence (AI), which is a field that tries to replicate what humans can do using computers

ARTIFICIAL INTELLIGENCE
• Computer Vision (seeing like a human): image recognition, object detection
• Machine Learning (learning like a human): supervised learning, unsupervised learning
• Natural Language Processing (interpreting language like a human): sentiment analysis, text summarization

Where does data science fit in here?
• Data scientists use data to extract insights, and they can do so by applying and interpreting various computer vision, machine learning, and natural language processing models



HISTORY OF NLP

The field of NLP has evolved significantly over the past 70+ years:

Early NLP (1950s-70s): Researchers were excited, but results didn’t meet expectations as they realized the difficulty of the task
• A Georgetown-IBM experiment translates Russian into English with 250 terms and 6 rules
• ELIZA, the first chatbot, is created at MIT

Traditional NLP (1980s-2000s): Increasing excitement with the addition of algorithms and computing power
• Statistical machine translation (SMT) is done using probabilistic models instead of rules
• The Penn Treebank is created with 4 million annotated words for NLP researchers

Modern NLP (2010s-present, still active): Explosion of research with how well transformers performed
• Deep learning is paired with NLP
• Transformers are introduced and can tackle a variety of NLP tasks



HISTORY OF NLP

The techniques used in each era break down as follows:

Early NLP
• Rules-based techniques: grammar rules, pattern matching

Traditional NLP
• Statistical techniques: statistics, probability
• Machine learning techniques: supervised learning, unsupervised learning

Modern NLP
• Recurrent-based techniques: RNNs, LSTMs, etc.
• Transformer-based techniques: LLMs (GPT, BERT, etc.)



Note that Traditional NLP has largely replaced Early NLP, and Transformer-based NLP has largely replaced Recurrent-based NLP.

There’s a big mindset shift from traditional to modern NLP:
• With traditional NLP, there’s a focus on understanding everything that’s happening
• With modern NLP, there’s no way to understand everything that’s happening, and what matters most is performance, even if it’s all a black box



NLP APPLICATIONS & TECHNIQUES

There are numerous NLP applications & techniques that we’ll cover:

NLP Category | Technique                         | Application
Traditional  | Rules-Based                       | Sentiment Analysis
Traditional  | Supervised Learning (Naïve Bayes) | Text Classification
Traditional  | Unsupervised Learning (NMF)       | Topic Modeling
Modern       | Encoder-Only LLM (BERT)           | Sentiment Analysis, Named Entity Recognition (NER)
Modern       | Encoder-Decoder LLM (BART)        | Zero-Shot Classification, Text Summarization
Modern       | Decoder-Only LLM (GPT)            | Text Generation
Modern       | Embeddings (MiniLM)               | Document Similarity



NLP LIBRARIES IN PYTHON

Most general data science tasks can be done using Pandas and Scikit-learn, but there are many available Python libraries for NLP tasks:

Course Section                  | Library      | Applications
Text Preprocessing              | Pandas       | Cleaning & Normalization
Text Preprocessing              | spaCy        | Cleaning & Normalization, Linguistic Analysis
Text Preprocessing              | Scikit-learn | Vectorization
NLP with Machine Learning       | VADER        | Sentiment Analysis
NLP with Machine Learning       | Scikit-learn | Text Classification, Topic Modeling
Neural Networks & Deep Learning | Scikit-learn | Classification & Regression
Hugging Face Transformers       | Transformers | Text Summarization, Text Generation, etc.

There are other popular NLP libraries that we will NOT be covering as part of this course (NLTK, Gensim, TensorFlow, and PyTorch), as we will focus on simplicity and ease of use.



KEY TAKEAWAYS

Natural language processing allows computers to work with text data


• NLP falls under the umbrella of Artificial Intelligence (AI), which is about making computers imitate human
behaviors (like interpreting their natural language)

NLP techniques have greatly evolved over the past 70+ years
• Starting with rules-based techniques in the 1950s-70s, then moving on to traditional ML techniques in the 1980s-2000s, and currently modern NLP with deep learning and transformer-based techniques

There can be multiple approaches to tackle various NLP problems


• While transformers have been popularized for providing amazing results, simple rules-based or machine learning-based techniques are still important to understand for small to medium data sets

Python is one of the best coding languages for applying NLP techniques
• There are many NLP libraries, such as scikit-learn and transformers, which integrate well into other frameworks



TEXT PREPROCESSING



TEXT PREPROCESSING

In this section we’ll review the text preprocessing steps required before applying machine
learning algorithms, including cleaning, normalization, vectorization, and more

TOPICS WE’LL COVER:
• NLP Pipeline
• Text Preprocessing with Pandas
• Text Preprocessing with spaCy
• Vectorization

GOALS FOR THIS SECTION:
• Learn the standard natural language processing workflow, also called the NLP pipeline
• Apply text cleaning and normalization techniques using Python’s Pandas and spaCy libraries
• Understand how to format text data in a way that a computer can process using vectorization with word counts and TF-IDF scores



DATA SCIENCE WORKFLOW

NLP projects follow the same data science workflow, except there’s an extra text preprocessing step between cleaning and exploring data:

1) Scoping a Project → 2) Gathering Data → 3) Cleaning Data → 4) Exploring Data → 5) Modeling Data → 6) Sharing Insights

For NLP projects, the portion from gathering data through modeling is called the NLP pipeline, which is the series of steps your text data goes through for processing and analysis

*This workflow is discussed in more detail in the Data Prep & EDA course
TEXT PREPROCESSING

Text preprocessing is about preparing raw text data for analysis and modeling:

1) Scoping a project → 2) Gathering data → 3) Cleaning data → 3.5) Text Preprocessing → 4) Exploring data → 5) Modeling data → 6) Sharing insights

Text preprocessing takes generally clean text data through two stages, leaving text data ready for EDA and modeling:

Cleaning & Normalization
• Cleaning: remove unnecessary text
• Normalization: make text consistent
• These steps can be done using a combination of Pandas and spaCy

Vectorization
• Turn text into a matrix of numbers
• Each document is represented by a vector of counts or TF-IDF values
• This can be done with scikit-learn


TEXT PREPROCESSING TECHNIQUES

We’ll be covering these text preprocessing techniques:

Category                  | Concept                       | Description
Cleaning & Normalization* | Lowercasing                   | Convert all text to lowercase
Cleaning & Normalization* | Special Characters            | Remove punctuation & special characters using regular expressions
Cleaning & Normalization* | Tokenization                  | Split text into smaller units, like words or sentences
Cleaning & Normalization* | Stemming / Lemmatization      | Reduce words to their root or base form
Cleaning & Normalization* | Stop Words                    | Remove common, non-essential words
Cleaning & Normalization* | Parts of Speech (POS) Tagging | Identify grammatical roles of words (nouns, verbs, etc.)
Vectorization             | Document-Term Matrix (DTM)    | Represent text by word frequency, also known as Bag of Words
Vectorization             | TF-IDF                        | Extension of DTM that weights words based on their importance

*These text cleaning and normalization steps can be mixed and matched



ASSIGNMENT: CREATE A NEW ENVIRONMENT

NEW MESSAGE (May 15, 2025)
From: Lexi Con (Lead Data Scientist)
Subject: NLP Onboarding

Hi,
I hear you’re the new associate data scientist on the team – welcome!
We’re currently kicking off several natural language processing projects with our client, Maven Books.
I’d like to get you involved ASAP. Can you create a new conda environment on your computer, and install the latest versions of Python and any other NLP libraries you might need?
Talk soon!
Lexi

Key Objectives
1. Open the Terminal (Mac) or Anaconda Prompt (PC) application and create a new conda environment called “nlp_basics”
2. Activate the “nlp_basics” environment
3. Install Python, Jupyter Notebook, Pandas, spaCy, Scikit-learn, and Matplotlib in the environment
4. Launch Jupyter within the environment
5. Write and execute a line of Python code



TEXT PREPROCESSING WITH PANDAS

The Pandas library is used for simple text cleaning and normalization
• Use str.lower() to make all text lowercase
• Use str.replace() to remove text between brackets and replace special characters (punctuation, numbers, etc.)

PRO TIP: Regular expressions (regex) allow you to find patterns; once you understand the basic concept, you can use tools like ChatGPT to generate the syntax

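Here’s a minimal sketch of these Pandas steps (the sample data and regex patterns are illustrative, not from the course files):

import pandas as pd

# Sample text with mixed case, bracketed text, and punctuation (hypothetical)
df = pd.DataFrame({"Description": ["A Story About LEMONS! [Illustrated]",
                                   "The Maven's Guide to Tea."]})

# Make all text lowercase
df["Description"] = df["Description"].str.lower()

# Remove text between square brackets using a regular expression
df["Description"] = df["Description"].str.replace(r"\[.*?\]", "", regex=True)

# Remove punctuation and other special characters
df["Description"] = df["Description"].str.replace(r"[^\w\s]", "", regex=True)

print(df["Description"].tolist())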


ASSIGNMENT: TEXT PREPROCESSING WITH PANDAS

NEW MESSAGE (May 16, 2025)
From: Lexi Con (Lead Data Scientist)
Subject: Text preprocessing request

Hello,
Now that you’re all settled in, let me get you up to speed with our first task for the Maven Books project: text preprocessing.
I hear you’re already familiar with Pandas. We’ve been given a flat file of the top 100 children's books over the past century. Can you use Pandas string functions to do some text normalization and cleaning?
Thank you!
Lexi

Attachment: childrens_books.csv

Key Objectives
1. Read the childrens_books.csv file into a Jupyter Notebook
2. Within the Description column:
   a) Make all the text lowercase
   b) Remove all \xa0 characters
   c) Remove all punctuation



TEXT PREPROCESSING WITH SPACY

The spaCy library can handle many NLP tasks, including tokenization, lemmatization, stop words, and more
• The first step is to turn a text string into a spaCy doc object

When you import spaCy, you need to specify which language model to use – in this case, we’re choosing English, which includes information from a large amount of annotated text

Once a single string has been converted into a spaCy doc (document) object, we can use all the available spaCy NLP methods on the text

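A minimal sketch of this first step (the example sentence is illustrative, and en_core_web_sm is a common choice of English model, assumed here):

import spacy

# Load a small English pipeline (install it first with:
# python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Convert a plain string into a spaCy doc object
doc = nlp("I'm selling lemons and lemonade for $5!")
print(type(doc))  # <class 'spacy.tokens.doc.Doc'>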


TOKENIZATION

Tokenization lets you break text up into smaller units, like words
• Text strings are often split by whitespace to make tokens
• The [] syntax used to collect tokens is called a list comprehension; the way to read it is: for every token in the document, return the token text

spaCy mainly splits on whitespace, but there’s some additional, smarter logic:
• Common contractions are separated (I’m)
• Punctuation is typically separated unless it’s a URL, email address, etc.
• …and much more!

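Continuing the sketch from above, tokenization with a list comprehension might look like this:

# For every token in the doc, return its text
tokens = [token.text for token in doc]

# The contraction and punctuation are split into their own tokens:
# ['I', "'m", 'selling', 'lemons', 'and', 'lemonade', 'for', '$', '5', '!']
print(tokens)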


LEMMATIZATION

Lemmatization reduces words to their base form
• spaCy uses a combination of linguistic rules and statistical models to lemmatize text

With lemmatization:
• “i” has been updated to “I”
• “selling” has been updated to “sell”
• “lemons” has been updated to “lemon”

What’s the difference between lemmatization and stemming?
• They both reduce words to their base form, but lemmatization is the smarter approach and generally performs better – when choosing one, go with lemmatization
• Stemming: am → am, is → is, are → ar; happy → happi, happiness → happi
• Lemmatization: am → be, is → be, are → be; happy → happy, happiness → happy

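In the same sketch, lemmatization is just a different token attribute:

# token.lemma_ returns each token's base form
lemmas = [token.lemma_ for token in doc]

# "'m" becomes "be", "selling" becomes "sell", "lemons" becomes "lemon":
# ['I', 'be', 'sell', 'lemon', 'and', 'lemonade', 'for', '$', '5', '!']
print(lemmas)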


STOP WORDS

Stop words are common words without any significant meaning
• You can view the full stop word list in spaCy with the code print(nlp.Defaults.stop_words)

The logic for removing them is to only return tokens that are not stop words; note that words like “I” and “for” are removed

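Continuing the sketch, stop word removal adds a condition to the list comprehension:

# Only keep tokens that are not stop words
no_stops = [token.text for token in doc if not token.is_stop]

# Stop words like "I", "'m", "and", and "for" are removed:
# ['selling', 'lemons', 'lemonade', '$', '5', '!']
print(no_stops)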


PARTS OF SPEECH TAGGING

Parts of speech (POS) tagging lets you label nouns, verbs, etc. within text data
• This is optional, but is sometimes used as a filtering technique – for example, to only look at nouns and pronouns for analysis

This is a lesser-used technique compared to the others and one of many linguistic analysis capabilities available within spaCy:
• Other types of linguistic analysis include Named Entity Recognition (NER), dependency parsing, and more
• Linguistic analysis techniques work better with raw text
• spaCy uses a combination of linguistic rules and statistical models for linguistic analysis

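Continuing with the nlp pipeline loaded earlier, a minimal POS tagging sketch (the sentence is illustrative, and the tags shown are typical output that may vary by model version):

# token.pos_ returns each token's part-of-speech label
for token in nlp("Maven sells fresh lemonade"):
    print(token.text, token.pos_)

# Typical output:
# Maven PROPN
# sells VERB
# fresh ADJ
# lemonade NOUN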


ASSIGNMENT: TEXT PREPROCESSING WITH SPACY

NEW MESSAGE (May 19, 2025)
From: Lexi Con (Lead Data Scientist)
Subject: RE: Text preprocessing request

Hi again,
Thanks for the first round of text preprocessing you did earlier with Pandas!
Could you do a second round of normalization and cleaning on the Description column with spaCy to tokenize, lemmatize and remove stop words from the text?
Thank you!
Lexi

Key Objectives
1. In addition to the lowercasing and special character removal from the previous assignment, within the cleaned Description column:
   a) Tokenize the text
   b) Lemmatize the text
   c) Remove stop words



VECTORIZATION

Vectorization is the process of converting text data into numeric data so that future data analysis and machine learning techniques can be applied
• Most ML techniques require text data to be cleaned, normalized and in a numeric format
• Some techniques, such as sentiment analysis, require text data to be in its raw text form

We will be covering these vectorization techniques:
• Word Counts
• TF-IDF
• Embeddings (we’ll cover this in modern NLP later!)



DOCUMENT-TERM MATRIX

Clean, normalized text is vectorized as a Document-Term Matrix (DTM)
• Each row represents a document, and each column represents a term
• The values within the DTM can be word counts, TF-IDF scores, or other calculated values – in the simplest case, every value is the count of each term (columns) in each document (rows)

A DTM is a bag of words representation of text, where each document is represented by how often certain words appear, regardless of word order



COUNT VECTORIZER IN PYTHON

Create a Count Vectorizer object to make a Document-Term Matrix in Python

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(stop_words='english', ngram_range=(1,2), min_df=2)

• stop_words: the language to use for removing stop words (default is None)
• ngram_range: the range for the sequence of “n” words to consider as a term in the DTM (default is (1,1)); examples: (1,1) → “data”; (1,2) → “data”, “data science”; (3,3) → “data science workflow”
• min_df: the number OR percent of documents a term needs to appear in to be included in the DTM (default is 1)

You’ll notice that we’re able to tokenize and remove stop words using both spaCy AND sklearn, so it’s your choice which library you use for those steps



With the default parameters, the Count Vectorizer returns the word counts for all 15 terms (columns) across the 8 documents (rows)

With the parameters above – removing all English stop words, returning all one- and two-word terms, and keeping only terms that appear in 2 or more documents – the columns are reduced from the original 15 to 9!

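A minimal sketch of fitting a Count Vectorizer and viewing the DTM (the mini-corpus is illustrative, not the course’s data):

from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

docs = ["i love lemons and lemonade",
        "lemonade is made from lemons",
        "the maven market sells lemonade"]

# Remove English stop words, allow 1- and 2-word terms,
# and keep terms that appear in at least 2 documents
cv = CountVectorizer(stop_words='english', ngram_range=(1, 2), min_df=2)
dtm = cv.fit_transform(docs)  # sparse document-term matrix

# Rows = documents, columns = terms, values = word counts
print(pd.DataFrame(dtm.toarray(), columns=cv.get_feature_names_out()))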


ASSIGNMENT: COUNT VECTORIZER

NEW MESSAGE (May 20, 2025)
From: Lexi Con (Lead Data Scientist)
Subject: Vectorization request

Hello,
Now that you’ve cleaned and normalized the book descriptions using pandas and spaCy, can you create a quick visualization to show the top 10 most common terms in the descriptions?
Could you also share some of the less common terms that appear in multiple book descriptions?
Thanks!
Lexi

Key Objectives
1. Vectorize the cleaned and normalized text using Count Vectorizer with the default parameters
2. Modify the Count Vectorizer parameters to reduce the number of columns:
   a) Remove stop words
   b) Set a minimum document frequency of 10%
3. Use the updated Count Vectorizer to identify the:
   a) Top 10 most common terms
   b) Top 10 least common terms that appear in at least 10% of the documents
4. Create a horizontal bar chart of the top 10 most common terms



TF-IDF

Term Frequency-Inverse Document Frequency (TF-IDF) is an alternative to the word count calculation in a DTM
• It emphasizes important words by reducing the impact of common words

Term Frequency (TF)
• Problem it solves: high counts can dominate, especially for high-frequency words or long documents
• Solution: normalize the counts so they’re all on the same scale

Inverse Document Frequency (IDF)
• Problem it solves: each word is treated equally, even when some might be more important
• Solution: assign more weight to rare words than to common words

$$\text{TF-IDF} = \frac{\text{Term count in document}}{\text{Total terms in document}} \times \log\left(\frac{\text{Total documents} + 1}{\text{Documents with the term} + 1}\right)$$

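A quick worked example of the formula above (the counts are illustrative; note that scikit-learn’s implementation uses slightly different smoothing and normalization):

import math

# Suppose "lemon" appears 2 times in a 10-term document,
# and in 3 of the 8 total documents
tf = 2 / 10                        # term count / total terms in document
idf = math.log((8 + 1) / (3 + 1))  # log((total docs + 1) / (docs with term + 1))

print(round(tf * idf, 3))  # ≈ 0.162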


TF-IDF VECTORIZER IN PYTHON

Create a TF-IDF Vectorizer object in Python to use TF-IDF scores in your DTM
• It has many of the same parameters as the Count Vectorizer

With the default parameters, we get the same 15 terms (columns) across 8 documents (rows), but with TF-IDF scores instead of word counts

With the same parameters as earlier (stop words removed, one- and two-word terms, minimum document frequency of 2), we’re back down to 9 terms instead of 15

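A minimal sketch, reusing the illustrative mini-corpus from the Count Vectorizer example:

from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

docs = ["i love lemons and lemonade",
        "lemonade is made from lemons",
        "the maven market sells lemonade"]

# TfidfVectorizer accepts the same parameters as CountVectorizer
tfidf = TfidfVectorizer(stop_words='english', ngram_range=(1, 2), min_df=2)
dtm = tfidf.fit_transform(docs)

# Rows = documents, columns = terms, values = TF-IDF scores
print(pd.DataFrame(dtm.toarray(), columns=tfidf.get_feature_names_out()))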


COUNTS VS TF-IDF SCORES

Here’s a comparison between word counts & TF-IDF scores from the same data:

Word counts:
1. “lemon”, “market”, and “maven” are all equal
2. The value of 6 for “lemon” skews the results
3. “lemonade” shows up three times and “tea” twice

TF-IDF scores:
1. “maven” and “market” have higher values since they are more rare
2. Everything ranges between 0 and 1
3. “lemonade” is high for rows 0 and 2, but lower than “tea” in row 6



ASSIGNMENT: TF-IDF VECTORIZER

NEW MESSAGE (May 21, 2025)
From: Lexi Con (Lead Data Scientist)
Subject: RE: Vectorization request

Hi again – Can you do the same analysis as last time, but using TF-IDF instead, and compare the two results? Thanks!
---
Hello,
Now that you’ve cleaned and normalized the book descriptions using pandas and spaCy, can you create a quick visualization to show the top 10 most common terms in the descriptions?
Thanks!
Lexi

Key Objectives
1. Vectorize the cleaned and normalized text using TF-IDF Vectorizer with the default parameters
2. Modify the TF-IDF Vectorizer parameters to reduce the number of columns:
   a) Remove stop words
   b) Set a minimum document frequency of 10%
   c) Set a maximum document frequency of 50%
3. Using the updated TF-IDF Vectorizer, create a horizontal bar chart of the top 10 most highly weighted terms
4. Compare the Count Vectorizer bar chart from the previous assignment with the TF-IDF Vectorizer bar chart and note the differences in the top term lists



KEY TAKEAWAYS

NLP projects have an extra text preprocessing step in the DS workflow


• Text preprocessing is a part of the NLP pipeline, which is the series of steps your text data goes through for
processing and analysis, including gathering, cleaning (general and text-specific), exploring and modeling

Text cleaning & normalization can be done using Pandas and spaCy
• Pandas is good for simple tasks like lowercasing and removing text with regular expressions
• spaCy can perform more advanced linguistic tasks like tokenization, lemmatization, removing stop words, and more
• By putting the steps into Python functions, you can better organize your code and create an NLP pipeline

Vectorization is the process of making text numeric for future analysis


• Vectorization starts by creating a document-term matrix and often follows the bag of words model
• The values can be filled with term counts (Count Vectorizer) or TF-IDF scores (TF-IDF Vectorizer)
• Later we’ll talk about embeddings, which is another technique that takes word order and meaning into account



NLP WITH MACHINE LEARNING



NLP WITH MACHINE LEARNING

In this section, we’ll highlight tasks that can be solved using traditional NLP methods,
including rules-based, and supervised & unsupervised machine learning techniques

TOPICS WE’LL COVER: GOALS FOR THIS SECTION:

Machine Learning Traditional NLP • Understand how rules-based sentiment analysis


Refresher Overview techniques work, and become familiar with the
VADER library

Sentiment Analysis Text Classification • Use Naïve Bayes and Logistic Regression as
supervised learning approaches for text
classification the with scikit-learn library
Topic Modeling • Use Non-Negative Matrix Factorization (NMF) as
an unsupervised learning approach for topic
modeling with the scikit-learn library



WHAT IS MACHINE LEARNING?

Data scientists use machine learning algorithms to enable computers to learn and make decisions from data

Machine learning algorithms fall into two broad categories:

Supervised Learning: using historical data to predict the future
• “What will house prices look like for the next 12 months?”
• “How can I flag suspicious emails as spam?”

Unsupervised Learning: finding patterns and relationships in data
• “How can I segment my customers?”
• “What hidden themes are in these product reviews?”



COMMON ML ALGORITHMS

You can use any of these common machine learning algorithms for natural language processing tasks once you’ve preprocessed your text data:

MACHINE LEARNING

Supervised Learning
• Regression: Linear Regression, Regularized Regression, Time Series Analysis
• Classification: K-Nearest Neighbors, Logistic Regression, Decision Trees, Random Forests, Gradient Boosted Trees, Naïve Bayes

Unsupervised Learning
• K-Means Clustering, Hierarchical Clustering, DBSCAN, Isolation Forests, Principal Component Analysis, t-SNE, Singular Value Decomposition, Non-Negative Matrix Factorization

In this section, we’ll introduce two techniques that are commonly used for NLP tasks: Naïve Bayes for text classification and NMF for topic modeling

*The majority of these algorithms are explained in detail in courses 2-4 of this Data Science in Python series (Regression, Classification and Unsupervised Learning)
TRADITIONAL NLP OVERVIEW

These common NLP tasks are often solved using traditional NLP methods, such as simple rules-based techniques or more advanced ML algorithms:

Sentiment Analysis: identifying the positivity or negativity of text
• Technique: rules-based
• Library: VADER
• Input format: raw text

Text Classification: classifying text as one label or another
• Technique: supervised learning (e.g., Naïve Bayes)
• Library: scikit-learn
• Input format: CV / TF-IDF

Topic Modeling: finding themes within a corpus of text
• Technique: unsupervised learning (e.g., NMF)
• Library: scikit-learn
• Input format: CV / TF-IDF



TRADITIONAL VS. MODERN NLP

When should I use traditional vs. modern NLP techniques? In summary, start simple!

What is my NLP goal?
• Sentiment analysis, text classification, and topic modeling can be done with traditional techniques
• Text generation, machine translation, and question answering cannot be done with traditional techniques, so use modern techniques

How much data do I have?
• Small to medium data (<100k rows): try traditional techniques first
• Big data (>1M rows): consider modern techniques



SENTIMENT ANALYSIS

Sentiment analysis is used to determine the positivity or negativity of text
• An overall sentiment score between -1 and +1 is given to each block of text

“A dozen lemons will make a gallon of lemonade.” → Neutral
“When life gives you lemons, make lemonade! ☺” → Positive
“I didn't like the taste of that lemonade at all.” → Negative

Cues like the exclamation point and smiley face hint that text is positive, while phrases like “didn't like” hint that it’s negative

You’ll notice that sentiment analysis is applied on raw text – it’s not cleaned because punctuation matters, and it’s not vectorized because word order matters



SENTIMENT ANALYSIS IN PYTHON

Sentiment analysis can be done using rules-based techniques with libraries like VADER, classification techniques (up next), or modern NLP techniques (later)

In the example output, 0% of the text is negative, 75% is neutral, and 25% is positive, and the overall sentiment score is positive!

VADER assigns predefined sentiment weights to words (amazing = 2.8, horrible = -2.5), incorporates modifiers (not, very, caps, punctuation, etc.), and computes a final score

*The entire 7,000+ word lexicon can be found here: https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt
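A minimal sketch using the vaderSentiment package (the example sentence is illustrative, borrowed from the earlier restaurant reviews):

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Instantiate the VADER analyzer (install with: pip install vaderSentiment)
analyzer = SentimentIntensityAnalyzer()

# Returns neg/neu/pos proportions plus a compound score from -1 to +1
scores = analyzer.polarity_scores("The food at this restaurant was amazing!!")
print(scores)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}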
ASSIGNMENT: SENTIMENT ANALYSIS

NEW MESSAGE (May 22, 2025)
From: Oscar Wynn (The Movie Maven)
Subject: Feel good vs dark movies

Hi there,
We’re a small entertainment news and movie reviews website, focused on data-driven content.
We’re publishing an article on the top 10 most feel-good movies and the top 10 darkest movies according to data. Could you use sentiment analysis to help us come up with movies for these two lists?
Thanks!
Oscar

Attachment: movie_reviews.csv

Key Objectives
1. Create a new “nlp_machine_learning” environment
2. Launch Jupyter Notebook
3. Read in the movie_reviews.csv file
4. Apply sentiment analysis to the movie_info column
5. Sort the sentiment scores to return the top 10 and bottom 10 sentiment scores and their corresponding movie titles



TEXT CLASSIFICATION

Text classification is used to categorize text into groups based on labeled data

• Existing emails have been prelabeled as spam or not spam; given a new email (“Send money ASAP!”), text classification will tell us if it’s spam or not spam
• Existing customer support tickets have been prelabeled as billing issues, tech support and other; given a new ticket (“Help me reset my password”), text classification will tell us what type of ticket it is



TEXT CLASSIFICATION ALGORITHMS

You can input vectorized text data into any classification algorithm:
• KNN, Logistic Regression, Decision Trees, Random Forests, Gradient Boosted Trees, etc.
• Naïve Bayes is another classification algorithm that works especially well on text data

Which classification algorithm should I choose for my text data?
• For small data sets (<10k rows), start with Naïve Bayes and other simple models like Logistic Regression, KNN, etc.
• For medium data sets (<100k rows), start with Logistic Regression and other classification techniques like Decision Trees, Random Forests, Gradient Boosted Trees, etc.
• For large data sets (>1M rows), start with Gradient Boosted Trees and potentially move on to modern NLP techniques with LLMs



NAÏVE BAYES

Naïve Bayes is a technique that’s commonly used for text classification
• It’s based on Bayes’ Theorem, which assumes conditionally independent features
• This independence assumption is naïve, but the algorithm works surprisingly well on text data

EXAMPLE: If an email contains the word “ASAP”, how likely is it to be spam?

$$P(\text{Spam} \mid \text{ASAP}) = \frac{P(\text{ASAP} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{ASAP})}$$

• P(Spam | ASAP): the probability that an email is spam, given it contains the word ASAP
• P(ASAP | Spam): the probability that the word ASAP appears in a spam email
• P(Spam): the probability an email is spam
• P(ASAP): the probability the word ASAP is in any email



EXAMPLE continued: Distribution of 1,000 emails:

              | Spam | Not spam
Contains ASAP |   50 |       10
No ASAP       |  200 |      740

$$P(\text{Spam} \mid \text{ASAP}) = \frac{P(\text{ASAP} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{ASAP})} = \frac{(50/250) \cdot (250/1000)}{60/1000} = \frac{0.2 \times 0.25}{0.06} \approx 0.83$$

There’s an 83% chance an email is spam if it contains the word “ASAP”. When looking at one word, the calculation is just Bayes’ Theorem.

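The same calculation as a quick sketch in Python, using the counts from the table above:

# Counts from the 1,000-email example
p_asap_given_spam = 50 / 250   # P(ASAP | Spam): 50 of 250 spam emails contain ASAP
p_spam = 250 / 1000            # P(Spam)
p_asap = 60 / 1000             # P(ASAP): 60 emails contain ASAP in total

print(p_asap_given_spam * p_spam / p_asap)  # 0.8333... ≈ 0.83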


EXAMPLE: If an email contains the word “ASAP” and the “$” symbol, how likely is it to be spam?

$$P(\text{Spam} \mid \text{ASAP}, \$) = \frac{P(\text{ASAP} \mid \text{Spam}) \cdot P(\$ \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{ASAP}, \$)}$$

• P(Spam | ASAP, $): the probability that an email is spam, given it contains ASAP and $
• P(ASAP | Spam): the probability that the word ASAP appears in a spam email
• P($ | Spam): the probability that $ appears in a spam email
• P(Spam): the probability an email is spam
• P(ASAP, $): the probability that an email contains both ASAP and $

Multiplying P(ASAP | Spam) and P($ | Spam) together is the naïve assumption – the probability that an email contains ASAP and the probability it contains $ are not actually independent, they’re correlated

EXAMPLE continued: Distribution of 1,000 emails:

              | Spam | Not spam
Contains ASAP |   50 |       10
No ASAP       |  200 |      740

              | Spam | Not spam
Contains $    |   80 |       30
No $          |  170 |      720

Emails containing both ASAP and $: 42

$$P(\text{Spam} \mid \text{ASAP}, \$) = \frac{P(\text{ASAP} \mid \text{Spam}) \cdot P(\$ \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{ASAP}, \$)} \approx 0.95$$

There’s a 95% chance an email is spam if it contains “ASAP” and “$”. These probabilities are all calculated automatically by the model!
NAÏVE BAYES IN PYTHON

Use sklearn’s MultinomialNB to perform Naïve Bayes in Python
• The input should be a CountVectorizer or TfidfVectorizer output
• There are no parameters to tune with Naïve Bayes

This follows the typical sklearn process for a supervised learning model*:
1. Instantiate an object
2. Fit a model
3. Make a prediction

We’re using MultinomialNB because the inputs are counts (like you would see in a CountVectorizer output) – for 1/0 values, like in the previous spam example, you would use BernoulliNB instead

*This sklearn process is discussed in much more detail in the Classification course in the Data Science in Python series
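A minimal sketch of that three-step process (the labeled mini-data set is illustrative, not the course’s data):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["send money asap", "meeting at noon", "win cash now asap",
         "lunch tomorrow", "claim your cash prize", "project update attached"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Vectorize the text into counts
cv = CountVectorizer()
X = cv.fit_transform(texts)

# 1. Instantiate an object, 2. fit the model
nb = MultinomialNB()
nb.fit(X, labels)

# 3. Make a prediction on new, unseen text
new_email = cv.transform(["send cash asap"])
print(nb.predict(new_email))  # e.g., [1] (predicted spam)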
TEXT CLASSIFICATION NEXT STEPS

Once you fit your first Naïve Bayes model in Python, you can improve your text classification model by tuning any part of the NLP pipeline:

1. Text preprocessing
• Update any cleaning or normalization steps

2. Vectorization
• Fine-tune the CountVectorizer parameters (stop_words, ngram_range, min_df, etc.)
• Try using TfidfVectorizer instead

3. Feature engineering
• Include non-term values such as text length, sentiment score, time of day sent, etc.

4. Modeling
• Try a different probability cutoff point instead of the default 50% probability
• Try a different classification model (Logistic Regression, Gradient Boosted Trees, etc.)



ASSIGNMENT: TEXT CLASSIFICATION

Key Objectives
NEW MESSAGE
May 23, 2025 1. Clean and normalize the “movie_info” column using
From: Oscar Wynn (The Movie Maven) the “maven_text_preprocessing.py” module
Subject: Female vs male directors 2. Create a Count Vectorizer
• Remove stop words
Hi again,
• Set the minimum document frequency to 10%
Our next piece is going to spotlight female directors, and we
want to see if there are any differences between the types of 3. Create a Naïve Bayes model and a Logistic
movies that female versus male directors create. Regression model to predict which movies are
Could you create a classification model that predicts which directed by women vs men using the CV
movies are directed by females versus males based their
movie descriptions? 4. Compare their accuracy scores and classification
reports
Please also send over a list of the top 5 movies that are most
likely directed by a female according to the model. 5. Using the better performing model, return the top
Thanks! 5 movies that the model predicts are most likely
directed by a woman

*Copyright Maven Analytics, LLC


TOPIC MODELING

Topic modeling is used to find themes in unlabeled text documents


• Topic modeling techniques extract the topics, but it’s up to you to interpret and name them
Machine Learning
Refresher

Topic 1 Topic 2
Traditional NLP
Overview “I like lemons and limes.” 100% 0%

“Puppies and kittens are so cute.” 0% 100%


Sentiment
Analysis
“I’m making cat-shaped cookies.” 50% 50%

Text “My dog loves apples and blueberries.” 67% 33%


Classification

Topic Modeling
What are topics 1 and 2?
• Topic 1: lemons, limes, cookies, apples, blueberries Food
• Topic 2: puppies, kittens, cat, dog Animals

*Copyright Maven Analytics, LLC


TOPIC MODELING ALGORITHMS

You can input vectorized text data into a topic modeling algorithm
Machine Learning
Refresher

Which topic modeling algorithm should I choose for my text data?


Traditional NLP
Overview
• For small data sets (<10k rows), start with Non-Negative Matrix Factorization
(NMF) using the sklearn library
Sentiment
Analysis • For medium data sets (<100k rows), start with Latent Dirichlet Allocation
(LDA) using the gensim library
Text • For large data sets (>1M rows), use modern embedding-based NLP approaches
Classification
such as BERTopic and Top2Vec (embeddings will be discussed later!)

Topic Modeling

In this course, we’ll be demoing NMF because it’s in sklearn. For more details on LDA, you can check
out my YouTube video on LDA using gensim: https://www.youtube.com/watch?v=NYkbqzTlW3w

*Copyright Maven Analytics, LLC


NON-NEGATIVE MATRIX FACTORIZATION

Non-Negative Matrix Factorization (NMF) is a topic modeling technique that


decomposes the document-term matrix (V) into two other matrices:
Machine Learning • A document-topic matrix (W) that shows how much each topic appears in each document
Refresher
• A topic-term matrix (H) that shows how important each word is to each topic
Traditional NLP
Overview
Sentiment
Analysis

Text
Classification

Topic Modeling

[Diagram: the document-term matrix V (Doc 1–5 × Term 1–6) is decomposed into the
document-topic matrix W (Doc 1–5 × Topic 1–2) times the topic-term matrix H
(Topic 1–2 × Term 1–6), i.e. V = W x H]

The H matrix helps understand the terms that make up each topic
The W matrix shows us the topic distributions across the documents

Other matrix factorization techniques include PCA and SVD, but NMF is the only one that returns
all non-negative results, which is needed for text data where negative values wouldn’t make sense

*Copyright Maven Analytics, LLC


NMF IN PYTHON

Use sklearn’s NMF from the decomposition module to perform NMF in Python
Machine Learning • The input should be the output of a CountVectorizer or TfidfVectorizer
Refresher
• Start at 2 components (topics) and increase by 1 until you figure out the best number of topics
Traditional NLP
Overview

Sentiment
This follows the typical sklearn process
Analysis
for an unsupervised learning model*:
1. Instantiate an object
Text 2. Fit and transform the data
Classification 3. View the attributes

Topic Modeling
NMF starts with an initial set of randomized values, so
set a random state to get the same results each time
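Here’s a minimal sketch of that process, reusing the four example sentences from earlier (the number of topics and parameters are illustrative only):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = ['I like lemons and limes.',
        'Puppies and kittens are so cute.',
        'I am making cat-shaped cookies.',
        'My dog loves apples and blueberries.']

tfidf = TfidfVectorizer(stop_words='english')
V = tfidf.fit_transform(docs)                 # document-term matrix (V)

nmf = NMF(n_components=2, random_state=42)    # 1. Instantiate an object (set a random state!)
W = nmf.fit_transform(V)                      # 2. Fit and transform -> document-topic matrix (W)
H = nmf.components_                           # 3. View the attributes -> topic-term matrix (H)

terms = tfidf.get_feature_names_out()
for i, topic in enumerate(H):                 # top 3 terms per topic
    print(f'Topic {i}:', [terms[j] for j in topic.argsort()[::-1][:3]])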

*This sklearn process is discussed in much more detail in the


Unsupervised Learning course in the Data Science in Python series
*Copyright Maven Analytics, LLC
TOPIC MODELING NEXT STEPS

Once you fit your first NMF model in Python, you can improve your topic model
by tuning any part of the NLP pipeline:
Machine Learning
Refresher
1 Text preprocessing

Traditional NLP
• Update any cleaning or normalization steps
Overview
2 Vectorization
Sentiment • Fine-tune the TfidfVectorizer parameters (stop_words, ngram_range, min_df, etc.)
Analysis
• Try using CountVectorizer instead
Text
Classification 3 Modeling
• Modify “n_components” to try out different numbers of topics
Topic Modeling
• Try a different topic modeling technique (Latent Dirichlet Allocation, Latent Semantic
Analysis, BERTopic, Top2Vec, etc.)

BONUS: In the demo, we’ll show an example of how you can mix and match multiple algorithms for
your analysis (in this case, topics + sentiment scores + EDA = sentiment about each topic)

*Copyright Maven Analytics, LLC


ASSIGNMENT: TOPIC MODELING

Key Objectives
NEW MESSAGE
May 27, 2025 1. Using the same preprocessed data as the last
From: Oscar Wynn (The Movie Maven) assignment, create a Tfidf Vectorizer
Subject: Movie themes • Remove stop words
• Start with min_df = 0.05 and max_df=0.2
Hello,
2. Create an NMF model to find the main topics in
Our feel-good movies list and female directors articles were the movie descriptions
both hits over the weekend! Thanks for your help with those.
• Start with n_components=2
Our next goal is to suggest movies based on movie themes.
Could you use topic modeling to find the major themes in our 3. Tweak the model by updating the Tfidf Vectorizer
movie list?
parameters and number of topics
Once you do that, for a few of the themes, can you provide a list
of the top 5 movies that have the theme? 4. Interpret and name the topics
Thanks! 5. For two of the topics, return the top movies that
Oscar contain the topic

*Copyright Maven Analytics, LLC


KEY TAKEAWAYS

Machine learning techniques are a great starting point for NLP tasks
• Any general ML algorithm can be applied for NLP tasks once the text data is cleaned and vectorized
• ML is the preferred approach for small & medium data sets, while modern NLP is preferred for large ones

Sentiment analysis is often done using rules-based techniques


• Popular libraries within Python for sentiment analysis include VADER and TextBlob
• Other techniques include text classification and modern NLP techniques

Naïve Bayes is a popular text classification technique


• Naïve Bayes is a good starting point for classifying text data within small data sets (<10k rows)
• Other techniques include Logistic Regression, Gradient Boosted Trees, and modern NLP techniques

Non-Negative Matrix Factorization is a popular topic modeling technique


• NMF is a good starting point for identifying topics within small data sets (<10k rows)
• Other techniques include Latent Dirichlet Allocation and modern NLP techniques

*Copyright Maven Analytics, LLC


NEURAL NETWORKS & DEEP LEARNING

*Copyright Maven Analytics, LLC


NEURAL NETWORKS & DEEP LEARNING

In this section, we’ll visually break down the concepts behind neural networks and deep
learning, the building blocks of modern NLP techniques

TOPICS WE’LL COVER:
• Modern NLP Overview
• Neural Networks
• Deep Learning

GOALS FOR THIS SECTION:
• Understand how logistic regression works, and build up to a neural network step-by-step
• Become familiar with key terms such as layers, nodes, weights, activation functions, and more
• Learn how to create a neural network in Python with MLPClassifier & MLPRegressor
• Understand the difference between neural networks and deep learning
• Get introduced to deep learning architectures

*Copyright Maven Analytics, LLC


MODERN NLP OVERVIEW

We are now moving from traditional to modern NLP:

Modern NLP
Overview

Neural Networks

Deep Learning

Traditional NLP
• Data: Small to medium data sets
• Techniques: Rules-based, supervised learning (Naïve Bayes), unsupervised learning (NMF)
• Applications: Sentiment analysis, text classification, topic modeling

Modern NLP
• Data: Small to large data sets
• Techniques: Transformers-based LLMs (BERT, GPT, LLaMA, T5, BART)
• Applications: Traditional NLP applications, text summarization, text generation

To understand transformer-based models, we’ll start with the basics: neural networks

*Copyright Maven Analytics, LLC


MODERN NLP OVERVIEW

In the next two sections, we’ll be covering these modern NLP concepts to
understand how LLMs work before applying them using Hugging Face:

Modern NLP
Overview
Neural Networks

Deep Learning

Concepts (in order of complexity):

1 Neural Networks & Deep Learning
  a) Logistic Regression
  b) Neural Networks
  c) Deep Learning

2 Transformers & LLMs
  a) Embeddings
  b) Attention
  c) Transformer-Based LLMs

Key Terms:
• Neural network components: layers, nodes, weights, parameters, activation functions
• Neural network training: forward pass, loss, backpropagation, gradient descent
• Deep learning architectures: FNN, CNN, RNN, LSTM, Transformers
• Embeddings: tokens
• Attention: queries, keys, scores
• Feedforward neural network
• Transformers: encoders vs decoders
• Pretrained LLMs: BERT, GPT and more

*Copyright Maven Analytics, LLC


INTRO TO NEURAL NETWORKS

A neural network is a machine learning model designed to process information in a


way that’s inspired by neurons in the human brain
Modern NLP • Biological neurons communicate by receiving and passing along information to other neurons
Overview
• A neural network processes data through layers of nodes (neurons)

Neural Networks

Deep Learning

[Diagram: layers of connected nodes]
Input layer (Features) → Hidden layer (Parameters & activation functions) → Output layer (Predictions)
*Copyright Maven Analytics, LLC
LOGISTIC REGRESSION

To understand neural networks, let’s start with a simple logistic regression model
• Logistic regression is a classification technique used to predict a true or false outcome
Modern NLP
Overview

EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?


Neural Networks

Deep Learning

[Chart: probability of profit, from No (0) to Yes (1), plotted as a sigmoid curve against today’s temperature (°C)]

p = σ(mx + b)

x = 15 (59F) → p = 22%
x = 25 (77F) → p = 78%
x = 35 (95F) → p = 98%

The higher the temperature today, the more likely my lemonade stand will be profitable

What are σ, m, and b?
• Coming up next…

*Copyright Maven Analytics, LLC


LOGISTIC REGRESSION

To understand neural networks, let’s start with a simple logistic regression model
• Logistic regression is a classification technique used to predict a true or false outcome
Modern NLP
Overview

EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?


Neural Networks

Deep Learning

[Chart: the line y = mx + b plotted against today’s temperature (°C)]

y = mx + b = 0.25x − 5   (m = slope, b = intercept)

x = 15 (59F) → y = -1.25
x = 25 (77F) → y = 1.25
x = 35 (95F) → y = 3.75

How do we come up with values for m and b?
• We fit a logistic regression model in scikit-learn

*Copyright Maven Analytics, LLC


LOGISTIC REGRESSION

To understand neural networks, let’s start with a simple logistic regression model
• Logistic regression is a classification technique used to predict a true or false outcome
Modern NLP
Overview

EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?


Neural Networks

Deep Learning

The sigmoid function transforms the y-values so they fall between 0 and 1

[Chart: probability of profit, from No (0) to Yes (1), plotted as a sigmoid curve against y]

p = σ(y)

x = 15 (59F) → y = -1.25 → p = 22%
x = 25 (77F) → y = 1.25 → p = 78%
x = 35 (95F) → y = 3.75 → p = 98%

These probability values are much more interpretable than the original y-values

*Copyright Maven Analytics, LLC


LOGISTIC REGRESSION

To understand neural networks, let’s start with a simple logistic regression model
• Logistic regression is a classification technique used to predict a true or false outcome
Modern NLP
Overview

EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?


Neural Networks

Deep Learning

[Chart: probability of profit, from No (0) to Yes (1), plotted against today’s temperature (°C)]

p = 1 / (1 + e^(−(mx+b)))

This is the calculation for a sigmoid (σ) transformation (remember, y = mx + b)
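As a quick numeric check of this formula, here’s a sketch using the m = 0.25 and b = −5 values from the previous page:

import numpy as np

def sigmoid(y):
    return 1 / (1 + np.exp(-y))

for x in [15, 25, 35]:
    y = 0.25 * x - 5                              # linear transformation: y = mx + b
    print(x, round(y, 2), round(sigmoid(y), 2))   # -> 22%, 78%, 98%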

*Copyright Maven Analytics, LLC


LOGISTIC REGRESSION

To understand neural networks, let’s start with a simple logistic regression model
• Logistic regression is a classification technique used to predict a true or false outcome
Modern NLP
Overview

EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?


Neural Networks

Deep Learning

[Chart: probability of profit, from No (0) to Yes (1), plotted against today’s temperature (°C)]

p = σ(mx + b)
  1. Linear transformation: mx + b
  2. Non-linear transformation: σ

x = 15 (59F) → y = -1.25 → p = 22%
x = 25 (77F) → y = 1.25 → p = 78%
x = 35 (95F) → y = 3.75 → p = 98%

Why are these steps so important?
• A linear transformation followed by a non-linear transformation is the main calculation of a neural network (coming up next!)
*Copyright Maven Analytics, LLC
LOGISTIC REGRESSION: VISUALLY

A logistic regression model is essentially a very simple neural network

Modern NLP
Overview EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?
Neural Networks

Deep Learning

[Diagram: Today’s temperature (x) → σ node → Probability of profit (p)]

p = σ(wx + b)

This sigmoid function is a type of non-linear transformation, or in NN-speak, an activation function

In a neural network, the slope (m) is called a weight (w), the intercept (b) is called a bias (b), and together, they are called the model parameters

Input layer (Features) → Hidden layer (Parameters & activation functions) → Output layer (Predictions)

Let’s include more inputs, profitability can’t just depend on temperature!

*Copyright Maven Analytics, LLC


LOGISTIC REGRESSION: VISUALLY

A logistic regression model is essentially a very simple neural network

Modern NLP
Overview EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?

Neural Networks

Deep Learning

[Diagram: Today’s temperature (x1) and Weekend (x2) feed into a single σ node (h), which outputs the probability of profit (p)]

h = σ(w1x1 + w2x2 + b)
h = σ(0.25x1 + 1x2 − 5)

These w and b values come from fitting a logistic regression model in scikit-learn

x1 = 15, x2 = 0 → h = 22%
x1 = 15, x2 = 1 → h = 43%
x1 = 25, x2 = 0 → h = 77%
x1 = 25, x2 = 1 → h = 90%
x1 = 35, x2 = 0 → h = 97%
x1 = 35, x2 = 1 → h = 99%

Coming up with more features is hard to do, let’s have an algorithm help us

Input layer (Features) → Hidden layer (Parameters & activation functions) → Output layer (Predictions)
*Copyright Maven Analytics, LLC
NEURAL NETWORKS: VISUALLY
Adding nodes to the hidden layer makes this behave like a true neural network
• You can specify the number of nodes in the hidden layer and the activation function for each

Modern NLP
Overview EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?

Neural Networks

Deep Learning

[Diagram: x1 (today’s temperature) and x2 (weekend) feed into two hidden σ nodes, h1 (thirst level) and h2 (foot traffic), which output the probability of profit (p)]

h1 = σ(w1x1 + w2x2 + b1)
h2 = σ(w3x1 + w4x2 + b2)

These are features the model came up with! (While we can interpret them here, in practice they are less interpretable)

Input layer (Features) → Hidden layer (Parameters & activation functions) → Output layer (Predictions)
*Copyright Maven Analytics, LLC
NEURAL NETWORKS: VISUALLY
Adding nodes to the hidden layer makes this behave like a true neural network
• You can specify the number of nodes in the hidden layer and the activation function for each

Modern NLP
Overview EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?

Neural Networks

Deep Learning

[Diagram: x1 and x2 feed into hidden σ nodes h1 (thirst level) and h2 (foot traffic), which feed into a final σ node that outputs profitability (p)]

h1 = σ(w1x1 + w2x2 + b1)
h2 = σ(w3x1 + w4x2 + b2)
p = σ(w5h1 + w6h2 + b3)

The outputs from the hidden layers are assigned their own weights and bias, and wrapped in a final activation function

Input layer (Features) → Hidden layer (Parameters & activation functions) → Output layer (Predictions)
*Copyright Maven Analytics, LLC
NEURAL NETWORKS: VISUALLY
Adding nodes to the hidden layer makes this behave like a true neural network
• You can specify the number of nodes in the hidden layer and the activation function for each

Modern NLP
Overview EXAMPLE Based on today’s temperature, will my lemonade stand be profitable today?

Neural Networks

Deep Learning

[Diagram: the same network as before, now with fitted weights]

h1 = σ(w1x1 + w2x2 + b1)
h2 = σ(w3x1 + w4x2 + b2)
p = σ(w5h1 + w6h2 + b3)

We are calculating these probabilities from a neural network vs a logistic regression from earlier:

x1 = 15, x2 = 0 → p = 15%
x1 = 15, x2 = 1 → p = 46%
x1 = 25, x2 = 0 → p = 73%
x1 = 25, x2 = 1 → p = 90%
x1 = 35, x2 = 0 → p = 91%
x1 = 35, x2 = 1 → p = 96%

Input layer (Features) → Hidden layer (Parameters & activation functions) → Output layer (Predictions)
*Copyright Maven Analytics, LLC
NEURAL NETWORKS SUMMARY

1 A neural network processes data through layers of nodes


• There are input, hidden and output nodes
Modern NLP • It’s up to you as a data scientist to decide on the number of hidden layers and nodes
Overview

2 The same two step calculation is done at every node


Neural Networks
• Step 1: Calculate a weighted sum of the inputs & add a bias (weights & bias = parameters)
• Step 2: Apply a non-linear transformation to the result, also called an activation function
Deep Learning
3 Neural networks are a type of supervised learning technique
• You input historical data and labels, and get a prediction (Python demo up next!)
• Other supervised learning techniques include Logistic Regression, Naïve Bayes, etc.

4 How do you determine the parameters?


• In practice, you can fit a neural network using scikit-learn (Python demo up next!)
• In theory, the training process involves adjusting parameters (we’ll discuss very soon!)

*Copyright Maven Analytics, LLC


NEURAL NETWORKS IN PYTHON

To create a neural network in Python, use MLPClassifier or MLPRegressor


within sklearn’s neural network module
• MLP stands for Multilayer Perceptron, which is another name for a neural network
Modern NLP
Overview

Neural Networks from sklearn.neural_network import MLPClassifier

nn = MLPClassifier(hidden_layer_sizes=(100,), activation='relu')
Deep Learning

hidden_layer_sizes defines the number of nodes in each hidden layer. Examples:
• (100,) – 1 hidden layer with 100 nodes (default)
• (50,30) – 2 hidden layers with 50 and 30 nodes respectively

activation sets the activation function for the hidden layers. Examples:
• 'relu' (default)
• 'logistic'
• 'tanh'
• 'identity'
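Here’s a minimal end-to-end sketch, using the lemonade stand data that appears in the training walkthrough later in this section (the parameter choices are illustrative only):

from sklearn.neural_network import MLPClassifier

X = [[14, 0], [18, 1], [22, 0], [22, 1], [26, 0],
     [26, 1], [30, 0], [30, 1], [35, 0], [15, 1]]    # temperature, weekend
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]                   # profitable

nn = MLPClassifier(hidden_layer_sizes=(2,), activation='logistic',
                   max_iter=5000, random_state=42)   # 1. Instantiate an object
nn.fit(X, y)                                         # 2. Fit a model
nn.predict_proba([[25, 0]])                          # 3. Make a prediction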

*Copyright Maven Analytics, LLC


NEURAL NETWORKS IN PYTHON

To create a neural network in Python, use MLPClassifier or MLPRegressor


within sklearn’s neural network module
• MLP stands for Multilayer Perceptron, which is another name for a neural network
Modern NLP
Overview

Neural Networks from sklearn.neural_network import MLPClassifier

nn = MLPClassifier(hidden_layer_sizes=(100,), activation='relu')
Deep Learning

[Plots of the four activation function shapes: Identity, Logistic, Tanh, ReLU]

*Copyright Maven Analytics, LLC


NEURAL NETWORKS IN PYTHON

To create a neural network in Python, use MLPClassifier or MLPRegressor


within sklearn’s neural network module
• MLP stands for Multilayer Perceptron, which is another name for a neural network
Modern NLP
Overview

Neural Networks This follows the typical sklearn process for


a supervised learning model:
1. Instantiate an object
Deep Learning 2. Fit a model
3. Make a prediction

When are neural networks used in practice?


• While they could be used for classification and regression, it’s often too much of a black box, and more
interpretable models like Logistic Regression and Linear Regression are primarily used instead
• For modern NLP tasks, neural networks are rarely used on their own, but rather as building blocks for
more complex techniques and architectures (coming up soon!)

*Copyright Maven Analytics, LLC


PRO TIP: NN NOTATION & MATRICES

As we saw in the Python code, the weights and biases of a neural network are
contained in weight matrices and bias vectors
• This is helpful to remember for the next section on Transformers & LLMs, where everything we review will live in matrices

Modern NLP
Overview

Neural Networks

Deep Learning

[Diagram: network with inputs X1, X2, hidden nodes h1, h2, and output p; each connection has a weight such as w(1)_11 or w(2)_21, and each node has a bias such as b(1)_1]

How to read w(1)_11: the superscript is the layer, and the subscripts are the starting node and ending node

h1 = σ(w(1)_11·x1 + w(1)_21·x2 + b(1)_1) = σ(0.3x1 + 0.05x2 − 6.5)
h2 = σ(0.08x1 + 2.5x2 − 1.8)
p = σ(3.5h1 + 3h2 − 3.2)

Weight matrices:
  Layer 1: [[0.3, 0.08], [0.05, 2.5]]    Layer 2: [[3.5], [3]]

Bias vectors:
  Layer 1: [-6.5, -1.8]    Layer 2: [-3.2]
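Here’s a NumPy sketch of the same calculations in matrix form, using the weight matrices and bias vectors above (the input values are illustrative):

import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

W1 = np.array([[0.3, 0.08],
               [0.05, 2.5]])       # weight matrix for the hidden layer (row = starting node, column = ending node)
b1 = np.array([-6.5, -1.8])        # bias vector for the hidden layer
W2 = np.array([[3.5], [3.0]])      # weight matrix for the output layer
b2 = np.array([-3.2])              # bias vector for the output layer

x = np.array([25, 0])              # temperature = 25, weekday
h = sigmoid(x @ W1 + b1)           # h1 = σ(0.3x1 + 0.05x2 − 6.5), h2 = σ(0.08x1 + 2.5x2 − 1.8)
p = sigmoid(h @ W2 + b2)           # p = σ(3.5h1 + 3h2 − 3.2)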

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS

Training a neural network means to calculate its optimal parameters

1. Random start: Start with an initial set of random weights & biases
Modern NLP
Overview
2. Forward pass: Starting from the left, apply all calculations through the neural network
to get to a final set of predicted values
Neural Networks
3. Calculate loss: Compare the predicted and actual values to compute the error, or loss
4. Update parameters: Starting from the right, calculate how much each parameter
Deep Learning contributed to the loss with back propagation, and then use gradient descent (a popular
optimization technique) to adjust the parameters by moving them a step closer to
reducing the loss
5. Repeat: Repeat steps 2-4 until you minimize the loss or reach an iteration limit and lock
in the final model parameters (weights and biases)

The math behind back propagation and gradient descent is beyond the scope of this course, but the key takeaway
is that each iteration moves closer to the optimal parameters, and it does so as efficiently as possible
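To make steps 2 and 3 concrete, here’s a NumPy sketch of one forward pass and loss calculation, using the lemonade stand training data from the next pages and the “random start” of all weights = 1 and biases = 0:

import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

X = np.array([[14, 0], [18, 1], [22, 0], [22, 1], [26, 0],
              [26, 1], [30, 0], [30, 1], [35, 0], [15, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

h1 = sigmoid(X[:, 0] + X[:, 1])        # h1 = σ(1·x1 + 1·x2 + 0)
h2 = sigmoid(X[:, 0] + X[:, 1])        # h2 = σ(1·x1 + 1·x2 + 0)
p = sigmoid(h1 + h2)                   # p = σ(1·h1 + 1·h2 + 0) ≈ 0.88 for every row

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # log loss ≈ 0.93, matching the ≈0.927 up next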

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS

STEP 1: Random start – Start with an initial set of random weights & biases

Modern NLP
Overview Training data:
𝑏1
Temperature Weekend Profitable
𝑤1 (x1) (x2) (y)
𝑤5
Neural Networks X1 h1 𝑏3
14 0 0
𝑤3
18 1 0
𝑏2 p
𝑤2 22 0 0
Deep Learning
22 1 1
X2 h2 𝑤6
𝑤4 26 0 1

26 1 1

30 0 1

30 1 1

35 0 1

15 1 0

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS

STEP 1: Random start – Start with an initial set of random weights & biases

Modern NLP
Overview Training data:
𝑏1
Temperature Weekend Profitable
𝑤1 (x1) (x2) (y)
𝑤5
Neural Networks X1 h1 𝑏3
14 0 0
𝑤3
18 1 0
𝑏2 p
𝑤2 22 0 0
Deep Learning
22 1 1
X2 h2 𝑤6
𝑤4 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏1 ) 30 1 1

35 0 1

15 1 0

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS

STEP 1: Random start – Start with an initial set of random weights & biases

Modern NLP
Overview Training data:
𝑏1
Temperature Weekend Profitable
𝑤1 (x1) (x2) (y)
𝑤5
Neural Networks X1 h1 𝑏3
14 0 0
𝑤3
18 1 0
𝑏2 p
𝑤2 22 0 0
Deep Learning
22 1 1
X2 h2 𝑤6
𝑤4 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏1 ) 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(𝑤3 𝑥1 + 𝑤4 𝑥2 + 𝑏2 )

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS

STEP 1: Random start – Start with an initial set of random weights & biases

Modern NLP
Overview Training data:
𝑏1
Temperature Weekend Profitable
𝑤1 (x1) (x2) (y)
𝑤5
Neural Networks X1 h1 𝑏3
14 0 0
𝑤3
18 1 0
𝑏2 p
𝑤2 22 0 0
Deep Learning
22 1 1
X2 h2 𝑤6
𝑤4 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏1 ) 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(𝑤3 𝑥1 + 𝑤4 𝑥2 + 𝑏2 )

𝑝 = 𝜎(𝑤5 ℎ1 + 𝑤6 ℎ2 + 𝑏3 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 1: Random start – Start with an initial set of random weights & biases

Modern NLP
Overview Training data:
0
Temperature Weekend Profitable
1 (x1) (x2) (y)
1
Neural Networks X1 h1 0
14 0 0
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏1 ) 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(𝑤3 𝑥1 + 𝑤4 𝑥2 + 𝑏2 )

𝑝 = 𝜎(𝑤5 ℎ1 + 𝑤6 ℎ2 + 𝑏3 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 1: Random start – Start with an initial set of random weights & biases

Modern NLP
Overview Training data:
0
Temperature Weekend Profitable
1 (x1) (x2) (y)
1
Neural Networks X1 h1 0
14 0 0
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎(1𝑥1 + 1𝑥2 + 0) 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(1𝑥1 + 1𝑥2 + 0)

𝑝 = 𝜎(1ℎ1 + 1ℎ2 + 0)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 1: Random start – Start with an initial set of random weights & biases

Modern NLP
Overview Training data:
0
Temperature Weekend Profitable
1 (x1) (x2) (y)
1
Neural Networks X1 h1 0
14 0 0
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎 𝑥1 + 𝑥2 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(𝑥1 + 𝑥2 )

𝑝 = 𝜎(ℎ1 + ℎ2 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 2: Forward pass – Starting from the left, apply all calculations through the
neural network to get to a final set of predicted values
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎 𝑥1 + 𝑥2 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(𝑥1 + 𝑥2 )

𝑝 = 𝜎(ℎ1 + ℎ2 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 2: Forward pass – Starting from the left, apply all calculations through the
neural network to get to a final set of predicted values
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎 14 + 0 = 0.99 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(𝑥1 + 𝑥2 )

𝑝 = 𝜎(ℎ1 + ℎ2 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 2: Forward pass – Starting from the left, apply all calculations through the
neural network to get to a final set of predicted values
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎 14 + 0 = 0.99 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎 14 + 0 = 0.99

𝑝 = 𝜎(ℎ1 + ℎ2 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 2: Forward pass – Starting from the left, apply all calculations through the
neural network to get to a final set of predicted values
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎 14 + 0 = 0.99 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎 14 + 0 = 0.99

𝑝 = 𝜎 0.99 + 0.99 = 0.88


*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 2: Forward pass – Starting from the left, apply all calculations through the
neural network to get to a final set of predicted values
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0 0.88
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎 14 + 0 = 0.99 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎 14 + 0 = 0.99

𝑝 = 𝜎 0.99 + 0.99 = 0.88


*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 2: Forward pass – Starting from the left, apply all calculations through the
neural network to get to a final set of predicted values
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0 0.88
1
18 1 0
0 p
1 22 0 0
Deep Learning
22 1 1
X2 h2 1
1 26 0 1

26 1 1

30 0 1

ℎ1 = 𝜎(𝑥1 + 𝑥2 ) 30 1 1

35 0 1

15 1 0
ℎ2 = 𝜎(𝑥1 + 𝑥2 )

𝑝 = 𝜎(ℎ1 + ℎ2 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 2: Forward pass – Starting from the left, apply all calculations through the
neural network to get to a final set of predicted values
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0 0.88
1
18 1 0 0.88
0 p
1 22 0 0 0.88
Deep Learning
22 1 1 0.88
X2 h2 1
1 26 0 1 0.88

26 1 1 0.88

30 0 1 0.88

ℎ1 = 𝜎(𝑥1 + 𝑥2 ) 30 1 1 0.88

35 0 1 0.88

15 1 0 0.88
ℎ2 = 𝜎(𝑥1 + 𝑥2 )
The initial model isn’t sensitive to our inputs and
𝑝 = 𝜎(ℎ1 + ℎ2 ) predicts we’ll be profitable 88% of the time

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS

STEP 3: Calculate loss – Compare the predicted and actual values to compute the
error, or loss
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction
1 (x1) (x2) (y) (p)
1
Neural Networks X1 h1 0
14 0 0 0.88
1
18 1 0 0.88
0 p
1 22 0 0 0.88
Deep Learning
22 1 1 0.88
X2 h2 1
1 26 0 1 0.88

26 1 1 0.88

30 0 1 0.88

ℎ1 = 𝜎(𝑥1 + 𝑥2 ) 30 1 1 0.88

35 0 1 0.88

15 1 0 0.88
ℎ2 = 𝜎(𝑥1 + 𝑥2 )

𝑝 = 𝜎(ℎ1 + ℎ2 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 3: Calculate loss – Compare the predicted and actual values to compute the
error, or loss
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction Error
1 (x1) (x2) (y) (p) (ε)
1
Neural Networks X1 h1 0
14 0 0 0.88 -0.88
1
18 1 0 0.88 -0.88
0 p
1 22 0 0 0.88 -0.88
Deep Learning
22 1 1 0.88 0.12
X2 h2 1
1 26 0 1 0.88 0.12

26 1 1 0.88 0.12

30 0 1 0.88 0.12

ℎ1 = 𝜎(𝑥1 + 𝑥2 ) 30 1 1 0.88 0.12

35 0 1 0.88 0.12

15 1 0 0.88 -0.88
ℎ2 = 𝜎(𝑥1 + 𝑥2 )

=
LOG LOSS: 0.927
𝑝 = 𝜎(ℎ1 + ℎ2 )
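As a sanity check, the same loss can be computed with scikit-learn (the 0.88 predictions here are rounded, which is why the result differs slightly from the 0.927 shown):

from sklearn.metrics import log_loss

y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]   # actual labels
p = [0.88] * 10                      # predicted probabilities (rounded)
log_loss(y, p)                       # ≈ 0.92, close to the 0.927 on this page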
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS
STEP 4: Update parameters – Starting from the right, calculate how much each
parameter contributed to the loss with back propagation, and then use gradient
descent to adjust the parameters by moving them a step closer to reducing the loss
Modern NLP
Overview Training data:
0
Temperature Weekend Profitable Prediction Error
1 (x1) (x2) (y) (p) (ε)
1
Neural Networks X1 h1 0
14 0 0 0.88 -0.88
1
18 1 0 0.88 -0.88
0 p
1 22 0 0 0.88 -0.88
Deep Learning
22 1 1 0.88 0.12
X2 h2 1
1 26 0 1 0.88 0.12

26 1 1 0.88 0.12

30 0 1 0.88 0.12

ℎ1 = 𝜎(𝑥1 + 𝑥2 ) 30 1 1 0.88 0.12

35 0 1 0.88 0.12

15 1 0 0.88 -0.88
ℎ2 = 𝜎(𝑥1 + 𝑥2 )

=
LOG LOSS: 0.927
𝑝 = 𝜎(ℎ1 + ℎ2 )
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS
STEP 4: Update parameters – Starting from the right, calculate how much each
parameter contributed to the loss with back propagation, and then use gradient
descent to adjust the parameters by moving them a step closer to reducing the loss
Modern NLP
Overview Training data:
−6
Temperature Weekend Profitable Prediction Error
0.4 (x1) (x2) (y) (p) (ε)
2.5
Neural Networks X1 h1 −2.5
14 0 0 0.88 -0.88
0.2
18 1 0 0.88 -0.88
−2 p
0.1 22 0 0 0.88 -0.88
Deep Learning
22 1 1 0.88 0.12
X2 h2 2.5
2.2 26 0 1 0.88 0.12

26 1 1 0.88 0.12

30 0 1 0.88 0.12

30 1 1 0.88 0.12

35 0 1 0.88 0.12

15 1 0 0.88 -0.88

=
LOG LOSS: 0.927

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS
STEP 4: Update parameters – Starting from the right, calculate how much each
parameter contributed to the loss with back propagation, and then use gradient
descent to adjust the parameters by moving them a step closer to reducing the loss
Modern NLP
Overview Training data:
−6
Temperature Weekend Profitable Prediction Error
0.4 (x1) (x2) (y) (p) (ε)
2.5
Neural Networks X1 h1 −2.5
14 0 0 0.88 -0.88
0.2
18 1 0 0.88 -0.88
−2 p
0.1 22 0 0 0.88 -0.88
Deep Learning
22 1 1 0.88 0.12
X2 h2 2.5
2.2 26 0 1 0.88 0.12

26 1 1 0.88 0.12

30 0 1 0.88 0.12

ℎ1 = 𝜎(0.4𝑥1 + 0.1𝑥2 − 6) 30 1 1 0.88 0.12

35 0 1 0.88 0.12

15 1 0 0.88 -0.88

=
LOG LOSS: 0.927

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS
STEP 4: Update parameters – Starting from the right, calculate how much each
parameter contributed to the loss with back propagation, and then use gradient
descent to adjust the parameters by moving them a step closer to reducing the loss
Modern NLP
Overview Training data:
−6
Temperature Weekend Profitable Prediction Error
0.4 (x1) (x2) (y) (p) (ε)
2.5
Neural Networks X1 h1 −2.5
14 0 0 0.88 -0.88
0.2
18 1 0 0.88 -0.88
−2 p
0.1 22 0 0 0.88 -0.88
Deep Learning
22 1 1 0.88 0.12
X2 h2 2.5
2.2 26 0 1 0.88 0.12

26 1 1 0.88 0.12

30 0 1 0.88 0.12

ℎ1 = 𝜎(0.4𝑥1 + 0.1𝑥2 − 6) 30 1 1 0.88 0.12

35 0 1 0.88 0.12

15 1 0 0.88 -0.88
ℎ2 = 𝜎(0.2𝑥1 + 2.2𝑥2 − 2)

=
LOG LOSS: 0.927

*Copyright Maven Analytics, LLC


TRAINING NEURAL NETWORKS
STEP 4: Update parameters – Starting from the right, calculate how much each
parameter contributed to the loss with back propagation, and then use gradient
descent to adjust the parameters by moving them a step closer to reducing the loss
Modern NLP
Overview Training data:
−6
Temperature Weekend Profitable Prediction Error
0.4 (x1) (x2) (y) (p) (ε)
2.5
Neural Networks X1 h1 −2.5
14 0 0 0.88 -0.88
0.2
18 1 0 0.88 -0.88
−2 p
0.1 22 0 0 0.88 -0.88
Deep Learning
22 1 1 0.88 0.12
X2 h2 2.5
2.2 26 0 1 0.88 0.12

26 1 1 0.88 0.12

30 0 1 0.88 0.12

ℎ1 = 𝜎(0.4𝑥1 + 0.1𝑥2 − 6) 30 1 1 0.88 0.12

35 0 1 0.88 0.12

15 1 0 0.88 -0.88
ℎ2 = 𝜎(0.2𝑥1 + 2.2𝑥2 − 2)

=
LOG LOSS: 0.927
𝑝 = 𝜎(2.5ℎ1 + 2.5ℎ2 − 2.5)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 2. FORWARD PASS
Overview Training data:
−6
Temperature Weekend Profitable Prediction Error
0.4 (x1) (x2) (y) (p) (ε)
2.5
Neural Networks X1 h1 −2.5
14 0 0 0.88 -0.88
0.2
18 1 0 0.88 -0.88
−2 p
0.1 22 0 0 0.88 -0.88
Deep Learning
22 1 1 0.88 0.12
X2 h2 2.5
2.2 26 0 1 0.88 0.12

26 1 1 0.88 0.12

30 0 1 0.88 0.12

ℎ1 = 𝜎(0.4𝑥1 + 0.1𝑥2 − 6) 30 1 1 0.88 0.12

35 0 1 0.88 0.12

15 1 0 0.88 -0.88
ℎ2 = 𝜎(0.2𝑥1 + 2.2𝑥2 − 2)

=
LOG LOSS: 0.927
𝑝 = 𝜎(2.5ℎ1 + 2.5ℎ2 − 2.5)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 2. FORWARD PASS
Overview Training data: 3. CALCULATE LOSS
−6
Temperature Weekend Profitable Prediction Error
0.4 (x1) (x2) (y) (p) (ε)
2.5
Neural Networks X1 h1 −2.5
14 0 0 0.56
0.2
18 1 0 0.87
−2 p
0.1 22 0 0 0.9
Deep Learning
22 1 1 0.91
X2 h2 2.5
2.2 26 0 1 0.91

26 1 1 0.92

30 0 1 0.92

ℎ1 = 𝜎(0.4𝑥1 + 0.1𝑥2 − 6) 30 1 1 0.92

35 0 1 0.92

15 1 0 0.77
ℎ2 = 𝜎(0.2𝑥1 + 2.2𝑥2 − 2)
This model still estimates we’ll likely be profitable in
each scenario, but the probabilities make more sense
𝑝 = 𝜎(2.5ℎ1 + 2.5ℎ2 − 2.5)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 4. ADJUST PARAMETERS
Overview Training data: 3. CALCULATE LOSS
−6
Temperature Weekend Profitable Prediction Error
0.4 (x1) (x2) (y) (p) (ε)
2.5
Neural Networks X1 h1 −2.5
14 0 0 0.56 -0.56
0.2
18 1 0 0.87 -0.87
−2 p
0.1 22 0 0 0.9 -0.9
Deep Learning
22 1 1 0.91 0.09
X2 h2 2.5
2.2 26 0 1 0.91 0.09

26 1 1 0.92 0.08

30 0 1 0.92 0.08

ℎ1 = 𝜎(0.4𝑥1 + 0.1𝑥2 − 6) 30 1 1 0.92 0.08

35 0 1 0.92 0.08

15 1 0 0.77 -0.77
ℎ2 = 𝜎(0.2𝑥1 + 2.2𝑥2 − 2)

=
LOG LOSS: 0.710
𝑝 = 𝜎(2.5ℎ1 + 2.5ℎ2 − 2.5)
This is down from 0.927!
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 2. FORWARD PASS
Overview Training data:
−6.3
Temperature Weekend Profitable Prediction Error
0.35 (x1) (x2) (y) (p) (ε)
3
Neural Networks X1 h1 −3
14 0 0 0.56 -0.56
0.12
18 1 0 0.87 -0.87
−1.9 p
0.08 22 0 0 0.9 -0.9
Deep Learning
22 1 1 0.91 0.09
X2 h2 2.8
2.3 26 0 1 0.91 0.09

26 1 1 0.92 0.08

30 0 1 0.92 0.08

ℎ1 = 𝜎(0.35𝑥1 + 0.08𝑥2 − 6.3) 30 1 1 0.92 0.08

35 0 1 0.92 0.08

15 1 0 0.77 -0.77
ℎ2 = 𝜎(0.12𝑥1 + 2.3𝑥2 − 1.9)

=
LOG LOSS: 0.710
𝑝 = 𝜎(3ℎ1 + 2.8ℎ2 − 3)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 2. FORWARD PASS
Overview Training data: 3. CALCULATE LOSS
−6.3
Temperature Weekend Profitable Prediction Error
0.35 (x1) (x2) (y) (p) (ε)
3
Neural Networks X1 h1 −3
14 0 0 0.24
0.12
18 1 0 0.76
−1.9 p
0.08 22 0 0 0.79
Deep Learning
22 1 1 0.89
X2 h2 2.8
2.3 26 0 1 0.88

26 1 1 0.93

30 0 1 0.91

ℎ1 = 𝜎(0.35𝑥1 + 0.08𝑥2 − 6.3) 30 1 1 0.94

35 0 1 0.93

15 1 0 0.59
ℎ2 = 𝜎(0.12𝑥1 + 2.3𝑥2 − 1.9)
We’re now predicting we likely won’t be profitable in
low temperature weekdays, which makes sense!
𝑝 = 𝜎(3ℎ1 + 2.8ℎ2 − 3)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 4. ADJUST PARAMETERS
Overview Training data: 3. CALCULATE LOSS
−6.3
Temperature Weekend Profitable Prediction Error
0.35 (x1) (x2) (y) (p) (ε)
3
Neural Networks X1 h1 −3
14 0 0 0.24 -0.24
0.12
18 1 0 0.76 -0.76
−1.9 p
0.08 22 0 0 0.79 -0.79
Deep Learning
22 1 1 0.89 0.11
X2 h2 2.8
2.3 26 0 1 0.88 0.12

26 1 1 0.93 0.07

30 0 1 0.91 0.09

ℎ1 = 𝜎(0.35𝑥1 + 0.08𝑥2 − 6.3) 30 1 1 0.94 0.06

35 0 1 0.93 0.07

15 1 0 0.59 -0.59
ℎ2 = 𝜎(0.12𝑥1 + 2.3𝑥2 − 1.9)

=
LOG LOSS: 0.468
𝑝 = 𝜎(3ℎ1 + 2.8ℎ2 − 3)
This is down from 0.710!
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 4. ADJUST PARAMETERS
Overview Training data:
−6.5
Temperature Weekend Profitable Prediction Error
0.3 (x1) (x2) (y) (p) (ε)
3.5
Neural Networks X1 h1 −3.2
14 0 0 0.24 -0.24
0.08
18 1 0 0.76 -0.76
−1.8 p
0.05 22 0 0 0.79 -0.79
Deep Learning
22 1 1 0.89 0.11
X2 h2 3
2.5 26 0 1 0.88 0.12

26 1 1 0.93 0.07

30 0 1 0.91 0.09

ℎ1 = 𝜎(0.3𝑥1 + 0.05𝑥2 − 6.5) 30 1 1 0.94 0.06

35 0 1 0.93 0.07

15 1 0 0.59 -0.59
ℎ2 = 𝜎(0.08𝑥1 + 2.5𝑥2 − 1.8)

=
LOG LOSS: 0.468
𝑝 = 𝜎(3.5ℎ1 + 3ℎ2 − 3.2)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 2. FORWARD PASS
Overview Training data:
−6.5
Temperature Weekend Profitable Prediction Error
0.3 (x1) (x2) (y) (p) (ε)
3.5
Neural Networks X1 h1 −3.2
14 0 0 0.24 -0.24
0.08
18 1 0 0.76 -0.76
−1.8 p
0.05 22 0 0 0.79 -0.79
Deep Learning
22 1 1 0.89 0.11
X2 h2 3
2.5 26 0 1 0.88 0.12

26 1 1 0.93 0.07

30 0 1 0.91 0.09

ℎ1 = 𝜎(0.3𝑥1 + 0.05𝑥2 − 6.5) 30 1 1 0.94 0.06

35 0 1 0.93 0.07

15 1 0 0.59 -0.59
ℎ2 = 𝜎(0.08𝑥1 + 2.5𝑥2 − 1.8)

=
LOG LOSS: 0.468
𝑝 = 𝜎(3.5ℎ1 + 3ℎ2 − 3.2)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP 2. FORWARD PASS
Overview Training data: 3. CALCULATE LOSS
−6.5
Temperature Weekend Profitable Prediction Error
0.3 (x1) (x2) (y) (p) (ε)
3.5
Neural Networks X1 h1 −3.2
14 0 0 0.13
0.08
18 1 0 0.6
−1.8 p
0.05 22 0 0 0.53
Deep Learning
22 1 1 0.81
X2 h2 3
2.5 26 0 1 0.78

26 1 1 0.92

30 0 1 0.88

ℎ1 = 𝜎(0.3𝑥1 + 0.05𝑥2 − 6.5) 30 1 1 0.95

35 0 1 0.92

15 1 0 0.46
ℎ2 = 𝜎(0.08𝑥1 + 2.5𝑥2 − 1.8)
The probabilities for profit are much more spread
out and sensitive to both input features!
𝑝 = 𝜎(3.5ℎ1 + 3ℎ2 − 3.2)
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP
Overview Training data: 3. CALCULATE LOSS
−6.5
Temperature Weekend Profitable Prediction Error
0.3 (x1) (x2) (y) (p) (ε)
3.5
Neural Networks X1 h1 −3.2
14 0 0 0.13 -0.13
0.08
18 1 0 0.6 -0.6
−1.8 p
0.05 22 0 0 0.53 -0.53
Deep Learning
22 1 1 0.81 0.19
X2 h2 3
2.5 26 0 1 0.78 0.22

26 1 1 0.92 0.08

30 0 1 0.88 0.12

ℎ1 = 𝜎(0.3𝑥1 + 0.05𝑥2 − 6.5) 30 1 1 0.95 0.05

35 0 1 0.92 0.08

15 1 0 0.46 -0.46
ℎ2 = 𝜎(0.08𝑥1 + 2.5𝑥2 − 1.8)

=
LOG LOSS: 0.323
𝑝 = 𝜎(3.5ℎ1 + 3ℎ2 − 3.2)
This is now optimized!
*Copyright Maven Analytics, LLC
TRAINING NEURAL NETWORKS

STEP 5: Repeat – Repeat steps 2-4 until you minimize the loss or reach an
iteration limit and lock in the final model parameters (weights and biases)
Modern NLP
Overview Training data:
−6.5
Temperature Weekend Profitable Prediction Error
0.3 (x1) (x2) (y) (p) (ε)
3.5
Neural Networks X1 h1 −3.2
14 0 0 0.13 -0.13
0.08
18 1 0 0.6 -0.6
−1.8 p
0.05 22 0 0 0.53 -0.53
Deep Learning
22 1 1 0.81 0.19
X2 h2 3
2.5 26 0 1 0.78 0.22

26 1 1 0.92 0.08

30 0 1 0.88 0.12

ℎ1 = 𝜎(0.3𝑥1 + 0.05𝑥2 − 6.5) 30 1 1 0.95 0.05

35 0 1 0.92 0.08

15 1 0 0.46 -0.46
ℎ2 = 𝜎(0.08𝑥1 + 2.5𝑥2 − 1.8)
p = σ(3.5h1 + 3h2 − 3.2)

Today (new data): x1 = 25, x2 = 0 → p = 0.93 → Profitable!

*Copyright Maven Analytics, LLC


DEEP LEARNING

Deep learning refers to a neural network with 3 or more hidden layers


• Deep learning has revolutionized the field of artificial intelligence (including natural
language processing, computer vision, speech recognition, and more) since the 2010s
Modern NLP
Overview • While the math has been around since the 1950s (invention of logistic regression and
perceptron), the computational power of the 2010s has boosted to them to new heights

Neural Networks

Deep Learning

[Diagram: a neural network with one hidden layer vs a deep learning network with 3+ hidden layers]

Neural Network – Sometimes used for:
• Simple classification tasks (0 or 1)
• Medium data sets (thousands of rows)

Deep Learning – Often used for:
• More complex tasks (NLP, CV, ASR, etc.)
• Large data sets (millions of rows)

*Copyright Maven Analytics, LLC


DEEP LEARNING ARCHITECTURES

Deep learning architectures combine deep learning with additional calculations


and specialized operations to perform ground-breaking tasks
Modern NLP • What we’ve been working with so far is a Feedforward Neural Network (FNN), which is
Overview the simplest deep learning architecture (no additional calculations or operations)

Neural Networks
Info flows from left to right

Deep Learning
FNNs are often a piece of other, more
complex, deep learning architectures
(this is an important piece of the next
section on Transformers!)

When every node is connected to every other node,


it’s called a fully connected neural network

*Copyright Maven Analytics, LLC


DEEP LEARNING ARCHITECTURES

These are some of the most popular deep learning architectures and their layers:

Modern NLP
Overview

Neural Networks

Deep Learning

Convolutional Neural Networks (CNNs) – Popularized in 2012
• Applications: Image-related tasks like image classification, object detection, etc.
• Layers: Raw image → Convolutions layer (extracts image features) → Pooling layer (reduces dimensions) → FNN layer (learns patterns) → Prediction (makes predictions)

Recurrent Neural Networks (RNNs) – Popularized in 2013
• Applications: Sequential tasks, such as NLP tasks, time series analysis, etc.
• Layers: Sequential data → Hidden layers with feedback loops that remember info from prior steps → Prediction
• Long Short-Term Memory (LSTMs) are an extension of RNNs that include logic to remember and forget info over time (popularized in 2015)

Transformers – Popularized in 2017
• Applications: NLP, CV, ASR tasks, and much more!
• Layers: Raw text → Embeddings layer → Attention layer → FNN layer → Prediction
• Transformers have replaced RNNs and LSTMs for NLP tasks (more on these in the Transformers section up next!)
*Copyright Maven Analytics, LLC
DEEP LEARNING ARCHITECTURES

Summary of deep learning architectures:

Modern NLP
Overview

Neural Networks

Deep Learning

Architecture – Description – Application

• Feedforward Neural Network (FNN) – Basic fully connected neural network – Building block for other deep learning architectures
• Convolutional Neural Network (CNN) – Extract image features using convolutions, then feed into FNN – Used for image tasks
• Recurrent Neural Network (RNN) / Long Short-Term Memory (LSTM) – Use output as input for next prediction – Used for sequential tasks (but not NLP tasks anymore)
• Transformers – Consists of embeddings, attention, and FNN layers (coming up next!) – Used for NLP, computer vision, speech recognition, etc.

*Copyright Maven Analytics, LLC


DEEP LEARNING IN PRACTICE

Traditional NLP way of thinking: train your model


1. Pick a model that’s good for your problem (predict if a company will be profitable)
Modern NLP 2. Provide your historical data (inputs = company descriptions, labels = profitable 1/0)
Overview
3. Feed it into model to get final parameters – now the model is trained
4. Make predictions using this trained model
Neural Networks
DOWNSIDES: To fit a deep learning model, you need millions of rows of labeled data, along with
significant computational resources
Deep Learning

Modern NLP way of thinking: use a pretrained model (parameters are locked-in)
1. Pick a pretrained model that’s good for your problem (predict if company will be profitable)
2. Make predictions using this pretrained model
3. (Optional) Improve the predictions using transfer learning or fine-tuning

NOTE: Only research labs and large tech companies will train their own deep learning models from
scratch these days, while the majority of data scientists use or start with pretrained models

*Copyright Maven Analytics, LLC


DEEP LEARNING IN PRACTICE

Most data scientists will use pretrained deep learning models for their analysis
• These pretrained models have already been trained on extremely large data sets, so all the
Modern NLP parameters (weights, biases, etc.) are locked in
Overview
• Large Language Models (LLMs) are deep learning models that are pretrained on massive
amounts of text data, including BERT and GPT (much more on this in the next section!)
Neural Networks • To use an LLM, you input your text, and then all the calculations (weighted sums, non-linear
transformations, etc.) are applied to output a final prediction

Deep Learning

This is a big mindset shift for data scientists.


Traditionally, data scientists have been focused on model interpretability, always knowing what’s happening
behind the scenes and avoiding using black box techniques and falling into the “danger zone”.
With NLP tasks these days, pretrained deep learning models work so well that they’re the gold standard for
NLP tasks, and even though it’s a black box, it’s absolutely acceptable and encouraged to use them.
The goal of this section and the next is to break down the components of popular deep learning models at a
high level to give you an idea of how they work before applying them using Hugging Face.

*Copyright Maven Analytics, LLC


PRETRAINED DEEP LEARNING MODELS

There are multiple ways to use a pretrained deep learning model:

These two are covered in the Hugging Face Transformers section of this course

Modern NLP
Overview

Neural Networks

Deep Learning

Pretrained model only
Download and use a pretrained model as is to make predictions
• Parameters are fixed
• Used for sentiment analysis, text summarization, etc.

Pretrained model embeddings
Use a pretrained model’s embeddings as inputs into traditional machine learning models
• Parameters are fixed
• Used for document similarity, document clustering, etc.

Pretrained model with transfer learning
Start with a pretrained model and adjust the parameters by training on task / domain-specific data*
• Parameters are updated in final layers or all layers
• Used for text classification, industry-specific analysis, etc.

Pretrained model with RAGs
Combine pretrained models with external databases to be more up-to-date and context-aware**
• Parameters may or may not be updated
• Used for question answering, fact checking, etc.

*Adjusting weights requires a large amount of data (at least tens of **RAGs (Retrieval Augmented Generation) require building a
thousands of labeled data points) & computational power (more than a structured retrieval database to hold at least tens of thousands of
single computer, many GPUs) external text documents

*Copyright Maven Analytics, LLC


KEY TAKEAWAYS

Neural networks are ML models with input, hidden, and output layers
• They are sometimes called artificial neural networks (ANNs) or multilayer perceptrons (MLPs)
• At each node, the weighted sum of the inputs is calculated and a non-linear transformation applied
• To train a neural network, start with random parameters and slowly adjust them until they become optimal

Deep learning refers to a neural network with three or more hidden layers
• DL is often used for more complex applications such as NLP, computer vision, speech recognition, etc.

Deep learning architectures combine deep learning with extra calculations


• Popular architectures include Transformers for natural language processing, CNNs for computer vision, etc.
• These architectures often include basic FNNs, with additional modifications based on the input data (feature extraction
for image data, loops for sequential data, embeddings and attention for text data, etc.)

Most data professionals use pretrained deep learning models for analysis
• Pretrained models (set parameters) are trained on millions of data points and perform well out-of-the-box
• AI / ML researchers will train and sometimes data scientists will fine-tune models for domain-specific data sets

*Copyright Maven Analytics, LLC


TRANSFORMERS & LLMS

*Copyright Maven Analytics, LLC


TRANSFORMERS & LLMS

In this section, we’ll introduce transformers and their main layers, as well as pretrained
deep learning models specifically for NLP tasks: large language models (LLMs)

TOPICS WE’LL COVER:
• Transformers & LLMs
• Embeddings
• Attention
• FNNs
• Encoders & Decoders
• Pretrained LLMs

GOALS FOR THIS SECTION:
• Become familiar with the main components of the transformer architecture: embeddings, attention, and feedforward neural networks
• Review the differences between encoder-only, decoder-only and encoder-decoder models
• Get introduced to popular large language models, including BERT, GPT, and more

*Copyright Maven Analytics, LLC


RECAP: MODERN NLP CONCEPTS

In this section, we’ll be covering the rest of these modern NLP concepts to
understand how LLMs work before applying them in Hugging Face:
Transformers &
LLMs

Embeddings

Attention

FNNs

Encoders &
Decoders

Pretrained LLMs

Concepts (in order of complexity):

1 Neural Networks & Deep Learning (we’ve covered these now!)
  a) Logistic Regression
  b) Neural Networks
  c) Deep Learning

2 Transformers & LLMs
  a) Embeddings
  b) Attention
  c) Transformer-Based LLMs

Key Terms:
• Neural network components: layers, nodes, weights, parameters, activation functions
• Neural network training: forward pass, loss, backpropagation, gradient descent
• Deep learning architectures: FNN, CNN, RNN, LSTM, Transformers
• Embeddings: tokens
• Attention: queries, keys, scores
• Feedforward neural network
• Transformers: encoders vs decoders
• Pretrained LLMs: BERT, GPT and more

*Copyright Maven Analytics, LLC


TRANSFORMERS & LLMS

Transformers are a deep learning architecture with three main layers:


embeddings, attention, and feedforward neural networks (FNN)
Transformers &
LLMs
Large Language Models (LLMs) are deep learning models that have been
pretrained on a massive amount of text data
Embeddings

Attention

FNNs

Encoders &
Decoders

Pretrained LLMs

[Diagram: Venn diagram – Transformer (Vision, Audio, etc.) overlaps LLM (RNNs, LSTMs, etc.) at Transformer-Based LLMs]

Transformers can be used for many tasks, but are mainly used for NLP applications
LLMs can be based on many deep learning architectures, but they are mainly based on transformers
Transformer-Based LLMs are the most popular DL approach to NLP tasks

*Copyright Maven Analytics, LLC


TRANSFORMER ARCHITECTURE

The transformer architecture refers to the series of layers and computations that
the input data passes through to produce a final result
Transformers &
LLMs • Along the way, the input text is gradually transformed, hence the name transformers

Embeddings

These are the main layers of a transformer:


Attention

Raw text → Embeddings layer → Attention layer → FNN layer → Prediction

FNNs

• Embeddings layer: uses vectors to represent the semantic meaning of words
• Attention layer: adjusts their meanings based on context from surrounding words
• FNN layer: learns patterns from the prior layers and adds complexity
Encoders &
Decoders

Pretrained LLMs

*Copyright Maven Analytics, LLC


EMBEDDINGS

The first layer of a transformer, the embeddings layer, converts text tokens into
meaningful numeric representations
Transformers &
LLMs • It places each token (word) into a high-dimensional space, so words with similar meanings
end up close together, and words with different meanings are farther apart
Embeddings

Attention

FNNs

Encoders &
Decoders

Pretrained LLMs

Embeddings layer: “I love cold lemonade!”

token      dim1    dim2   …   dim768
I          0.16   -0.04   …    0.67
love      -0.21    0.59   …    0.33
cold       0.04   -0.14   …    0.89
lemonade  -0.11    0.35   …   -0.15
!          0.05   -0.03   …   -0.06

Total dimensions vary, but 768 is a common length for LLMs
Tokens include things like punctuation!

[Chart: words plotted in embedding space (e.g., dim540 vs dim143), with related words like Hot, Cold, Summer, and Winter placed by meaning]

The vector (768 numbers) for each token represents its location in space – amazingly, these values have semantic meaning!
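In code, the embeddings layer is just a lookup into a big matrix. Here’s a toy sketch with random values standing in for learned ones (a real LLM’s matrix would be roughly vocabulary size × 768):

import numpy as np

vocab = {'i': 0, 'love': 1, 'cold': 2, 'lemonade': 3, '!': 4}
embeddings = np.random.randn(len(vocab), 768)      # one 768-dim vector per token (learned in a real model)

tokens = ['i', 'love', 'cold', 'lemonade', '!']
vectors = embeddings[[vocab[t] for t in tokens]]   # shape (5, 768): one row per token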

*Copyright Maven Analytics, LLC


EMBEDDINGS

The first layer of a transformer, the embeddings layer, converts text tokens into
meaningful numeric representations
Transformers &
LLMs • It places each token (word) into a high-dimensional space, so words with similar meanings
end up close together, and words with different meanings are farther apart
Embeddings

How are these values generated?


Attention
• Popular word embeddings are trained using shallow neural networks (word2vec) and
matrix factorization (GloVe) (both are concepts you’re now familiar with!)
FNNs • Within an LLM though, these values (weights) are randomly initialized and slowly
updated until they reach their final values (like we saw in the Neural Networks section)
Encoders & • Remember when we talked about how all the parameters of a neural network are
Decoders represented as matrices in Python? This embeddings matrix here is exactly that!

Pretrained LLMs
In the embedding layer alone, given a vocabulary size of 50k and 768 dimensions for each token, the
embedding matrix would have 38 million parameters! And that’s just the start of a transformer…

*Copyright Maven Analytics, LLC


ATTENTION

The second layer of a transformer, the attention layer, adds context by helping
each token absorb additional meaning from other tokens
Transformers &
LLMs

Embeddings

Attention

FNNs

Encoders &
Decoders

Without the attention layer, the word “lemonade” is in the same location in space for all three of these sentences, even though it has a different meaning in each one

• “lemonade” – This will be in an exact location based on the word embedding
• “I love cold lemonade!” – This will be in a slightly different location, since it’s specifically cold lemonade that’s loved
• “I love Beyonce’s Lemonade album.” – This will be in a very different location, since it’s about an album, not a drink

Pretrained LLMs

*Copyright Maven Analytics, LLC


ATTENTION

The second layer of a transformer, the attention layer, adds context by helping
each token absorb additional meaning from other tokens
Transformers &
LLMs

Embeddings

Attention

FNNs

Encoders &
Decoders

Pretrained LLMs

The token here is “lemonade”

“I love cold lemonade!” – Without the attention layer, the meaning of the word “lemonade” isn’t affected by the other words in the sentence

“I love cold lemonade!” – With the attention layer, the word “cold” adds context to “lemonade”
• In technical terms: cold attends to lemonade
• In layman’s terms: this isn’t just any lemonade, it’s a cold one

“I love cold lemonade!” – With the attention layer, the word “love” adds context to “lemonade”
• In technical terms: love attends to lemonade
• In layman’s terms: this isn’t just any lemonade, it’s lemonade that’s loved

“Lemonade” will absorb the meanings of “cold” and “love”, but not so much “I”

*Copyright Maven Analytics, LLC


ATTENTION

The second layer of a transformer, the attention layer, adds context by helping
each token absorb additional meaning from other tokens
Transformers &
LLMs • It does this by creating matrices for queries, keys, and attention scores

Embeddings

Attention

FNNs

Encoders &
Decoders

Queries: Questions about other tokens

token      q1      q2     …   q768
I          0.12   -0.30   …   0.08
love       0.85    0.14   …  -0.22
cold       0.23   -0.10   …   0.16
lemonade   0.11    0.22   …   0.03
!         -0.05    0.09   …  -0.01

The “love” query asks: “Who or what do I love?”

Keys: Answers to those questions

token      k1      k2     …   k768
I         -0.02   -0.25   …   0.04
love       0.18    0.10   …  -0.20
cold       0.42   -0.05   …   0.10
lemonade   0.89    0.19   …   0.01
!         -0.10    0.05   …  -0.02

The “lemonade” key says: “I can be loved!”

Pretrained LLMs
The queries and keys here are one of many query-key pairs in a transformer. Other queries about
love could be “what is expressing the love?”, “what kind of love is being expressed?”, etc.

*Copyright Maven Analytics, LLC


ATTENTION

The second layer of a transformer, the attention layer, adds context by helping
each token absorb additional meaning from other tokens
Transformers &
LLMs • It does this by creating matrices for queries, keys, and attention scores

Embeddings

Attention

FNNs

Encoders &
Decoders

Pretrained LLMs

[Same query and key matrices as the previous page]

The “love” query asks: “Who or what do I love?”

Comparing that query against each token’s key:
• lemonade: “I’m loved the most”
• cold: “I’m somewhat loved”
• the other tokens: “I’m not loved”

All these relationships are summarized in an attention scores matrix (up next!)

*Copyright Maven Analytics, LLC


ATTENTION

The second layer of a transformer, the attention layer, adds context by helping
each token absorb additional meaning from other tokens
Transformers &
LLMs • It does this by creating matrices for queries, keys, and attention scores

Embeddings
Attention scores: Summary of query-key relationships

            I     love   cold   lemonade   !
I           0.2   0.6    0      0          0.2
love        0.1   0.1    0.3    0.4        0.1
cold        0.1   0.1    0.2    0.5        0.1
lemonade    0.3   0.1    0.4    0.1        0.1
!           0.3   0.1    0.1    0          0.5

The “love” is mostly for “lemonade”, and somewhat for “cold”
“Cold” mostly describes “lemonade”
“Lemonade” is getting “love” and is “cold”


Pretrained LLMs

*Copyright Maven Analytics, LLC


ATTENTION

The second layer of a transformer, the attention layer, adds context by helping
each token absorb additional meaning from other tokens
Transformers &
LLMs • It does this by creating matrices for queries, keys, and attention scores

Embeddings
How are these values generated?
Attention • Like embeddings, the query and key values are randomly initialized and slowly updated
until they reach their final values
• What are the additional calculations?
FNNs
• To capture query-key similarity, a dot product (similarity score) is taken
• For attention scores to add up to 1, a softmax normalization function is applied
Encoders &
Decoders

Pretrained LLMs Like how the embeddings layer amazingly captured word meaning, this attention layer
amazingly captures how much each token attends to, or gives context to, other tokens
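To make these two calculations concrete, here is a minimal NumPy sketch: the toy query
and key values below are random stand-ins for the learned weights, and the scaling by
the square root of the dimension follows the standard scaled dot-product convention.

import numpy as np

def softmax(x):
    # subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(42)
Q = rng.normal(size=(5, 4))  # queries for 5 tokens ("I love cold lemonade !"), 4 dims
K = rng.normal(size=(5, 4))  # keys for the same 5 tokens (real models use 768 dims)

similarity = Q @ K.T                                  # dot product: query-key similarity
attention_scores = softmax(similarity / np.sqrt(4))   # scale, then normalize rows to 1

print(attention_scores.round(2))     # a 5x5 matrix, like the one on the previous slide
print(attention_scores.sum(axis=1))  # each row sums to 1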

*Copyright Maven Analytics, LLC


FEEDFORWARD NEURAL NETWORK

The third layer of a transformer, the feedforward neural network (FNN) layer,
learns patterns from the data and adds complexity to the model
Transformers &
LLMs • Embedding layer: words are placed in locations in space that hold some meaning
• Attention layer: the locations are adjusted based on context from surrounding words
Embeddings
• FNN layer: patterns in those contextual relationships are learned and captured

Attention
Typical FNN in a transformer:
• Input layer: 768 nodes
• Hidden layer: 3072 nodes
• Output layer: 768 nodes

The output is a transformed representation of the original tokens with refined meanings


Encoders &
Decoders
One of these nodes could be capturing the idea of a noun-adjective relationship,
another could be capturing the idea of love, etc.

This is just a widely-accepted theory or interpretation of how this works to make it
more understandable – the actual workings are more abstract and less interpretable
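As a rough sketch of this layer’s shape, assuming the 768 → 3072 → 768 sizes above
(the GELU activation and random weights here are illustrative stand-ins for what a
real model learns):

import numpy as np

def gelu(x):
    # tanh approximation of GELU, a common activation inside transformer FNNs
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

d_model, d_hidden = 768, 3072
rng = np.random.default_rng(0)
W1, b1 = 0.02 * rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
W2, b2 = 0.02 * rng.normal(size=(d_hidden, d_model)), np.zeros(d_model)

def fnn(x):
    # x: (num_tokens, 768) contextual embeddings from the attention layer
    return gelu(x @ W1 + b1) @ W2 + b2  # output: (num_tokens, 768) refined meanings

tokens = rng.normal(size=(5, d_model))  # 5 tokens, e.g. "I love cold lemonade !"
print(fnn(tokens).shape)                # (5, 768)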

*Copyright Maven Analytics, LLC


SUMMARY: TRANSFORMERS

To summarize, these are the main layers of a transformer:


Transformers &
LLMs
Raw text Embeddings layer Attention layer FNN layer Prediction

Embeddings Uses vectors to Adjusts their meaning Learns patterns from


represent the semantic based on context from the prior layers and
meaning of words surrounding words adds complexity
Attention

How do transformers work so well? Attention.


FNNs
• Attention captures context – it enriches the meaning of each word based on others
• Attention allows for parallelization – unlike RNNs & LSTMs which process words one at
Encoders &
Decoders a time in a sequence, the attention step processes an entire sequence at once, making it
highly parallelizable for training and able to handle huge data sets
• Attention generalizes well – this core transformers architecture works well for a variety
Pretrained LLMs
of NLP tasks with little fine-tuning

*Copyright Maven Analytics, LLC


SUMMARY: TRANSFORMERS

To summarize, these are the main layers of a transformer:


Transformers &
LLMs
Raw text Embeddings layer Attention layer FNN layer Prediction

Embeddings Uses vectors to Adjusts their meaning Learns patterns from


represent the semantic based on context from the prior layers and
meaning of words surrounding words adds complexity
Attention

How do transformers work so well? Attention.


FNNs

The famous 2017 paper was right!


Encoders &
Decoders

Pretrained LLMs

*Copyright Maven Analytics, LLC


SUMMARY: TRANSFORMERS

In reality, the layers typically follow this order:


Transformers &
LLMs
[Diagram] Raw text → Embeddings layer → Transformer block → Transformer block → … → Prediction

Each transformer block contains 8+ attention layers running in parallel (known as
multi-headed attention), followed by an FNN layer


Pretrained LLMs

This transformer block is often repeated 12+ times, or 24+ times for larger models
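For intuition, PyTorch’s built-in encoder layer mirrors this block (multi-headed
attention followed by an FNN), and stacking it repeats the block. A sketch assuming
BERT-base-like sizes (768 dimensions, 12 heads, 12 blocks):

import torch
import torch.nn as nn

# One transformer block: multi-headed attention + a 768 -> 3072 -> 768 FNN
block = nn.TransformerEncoderLayer(
    d_model=768,           # embedding size per token
    nhead=12,              # attention heads running in parallel
    dim_feedforward=3072,  # hidden size of the FNN layer
    batch_first=True,
)

encoder = nn.TransformerEncoder(block, num_layers=12)  # repeat the block 12 times

x = torch.randn(1, 5, 768)  # 1 sequence of 5 tokens, each a 768-dim embedding
print(encoder(x).shape)     # torch.Size([1, 5, 768])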

*Copyright Maven Analytics, LLC


BREAKING DOWN THE TRANSFORMER DIAGRAM

This is the transformer diagram from the “Attention is All You Need” paper (2017)

You now understand its three core learning layers that capture patterns in text data!
• Embedding
• Attention
• Feedforward neural network

The other components in the diagram (normalization, matrix multiplication, addition
and scaling, and the sine / cosine function, with the blocks repeated multiple times)
are math calculations that support the learning process, but don’t learn patterns
themselves

Attention is All You Need paper: https://arxiv.org/pdf/1706.03762 *Copyright Maven Analytics, LLC
ENCODERS & DECODERS

There are three main categories of transformers: encoder-only models,


decoder-only models, and encoder-decoder models
Transformers &
LLMs
• Different models will use different pieces of the transformer architecture

Embeddings
Encoder-Only Models
Only use the left side of the architecture, aka the encoder

The encoder takes raw text and encodes it as an embedding representation of the text
In short, it understands text

Application: Sentiment Analysis
• Input: “I love cold lemonade!”
• Output: Positive

While encoders embed text, they can be fine-tuned for specific tasks like sentiment
analysis, where an extra classification step is added to get from embedding to output
Pretrained LLMs

*Copyright Maven Analytics, LLC


ENCODERS & DECODERS

There are three main categories of transformers: encoder-only models,


decoder-only models, and encoder-decoder models
Transformers &
LLMs
• Different models will use different pieces of this transformer architecture

Embeddings
Decoder-Only Models
Only use the right side of the architecture, aka the decoder
The decoder takes an input text
sequence and infers* the next word
FNNs
In short, it generates text

Encoders &
Application: Text Generation
Decoders • Input: “I love cold lemonade!”
• Output: “It’s the perfect drink! ”

Pretrained LLMs

For decoder-only models, these are inputs

*With transformers & LLMs, the word inference is typically used instead of the word prediction *Copyright Maven Analytics, LLC
ENCODERS & DECODERS

There are three main categories of transformers: encoder-only models,


decoder-only models, and encoder-decoder models
Transformers &
LLMs
• Different models will use different pieces of this transformer architecture

Encoder-Decoder Models
Use the entire architecture, both the encoder and decoder sides

The encoder-decoder takes two inputs:
1. A text sequence
2. A shifted target sequence

Both are encoded as embeddings and combined to infer the next word
In short, it understands and generates text

Application: Translation
• Input: “I love cold lemonade!”
• Output: “¡Me encanta la limonada fría!”

How do we use these in practice?
• Download a pretrained LLM (coming up next!)

*Copyright Maven Analytics, LLC


LARGE LANGUAGE MODELS (LLMs)

Transformer-based LLMs are models that use the transformer architecture


Transformers &
and are pretrained on huge amounts of text data
LLMs

Embeddings Encoder-Only LLMs Decoder-Only LLMs Encoder-Decoder LLMs

Turns text into embeddings Infers the next token in a text Turns text into other text
Attention
Popular models: Popular models: Popular models:
• BERT – Bidirectional Encoder • GPT – Generative Pre-trained • T5 – Text-to-Text Transfer
Representations from Transformer Transformer
FNNs Transformers • BART – combines BERT and GPT

Encoders & A base LLM can have many variants:


Decoders • RoBERTa – better performance
• DistilBERT – smaller and faster Once trained, all the parameters are set
• BERT-QA – fine-tuned for question answering (embeddings, attention weights, FNN weights,
Pretrained LLMs • BioBERT – fine-tuned for biomedical texts etc.) so that anyone can input in a new text
• LegalBERT – fine-tuned for legal texts sequence and quickly calculate an output

*Copyright Maven Analytics, LLC


TRAINING LLMs

To train an LLM, researchers first choose an architecture, decide how they’ll


Transformers &
structure the inputs and outputs, and then select data sets for training
LLMs
• Training is typically done on billions of words from a variety of sources
• This process is done by large tech companies and can take weeks or months to do on many
Embeddings
GPUs (graphics processing units that allow for parallel processing)

Attention

Encoder-Only LLMs
Often trained with Masked Language Modeling (MLM)
• Input: “I love {MASK} lemonade”
• Output: “cold”

Decoder-Only LLMs
Often trained with Autoregressive Language Modeling
• Input: “I love cold”
• Output: “lemonade”

Encoder-Decoder LLMs
Often trained to handle a variety of tasks by converting everything to text
• Input: EN to ES: “I love lemonade”
• Output: “Me encanta la limonada”
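You can see the MLM objective in action with the fill-mask pipeline from the Hugging
Face transformers library (covered in the next section); a minimal sketch assuming
the bert-base-uncased model, which uses a [MASK] token:

from transformers import pipeline

# BERT was pretrained with Masked Language Modeling, so it can fill in blanks
fill_mask = pipeline(task="fill-mask", model="bert-base-uncased")

for pred in fill_mask("I love [MASK] lemonade."):
    print(round(pred["score"], 3), pred["token_str"])  # top candidate fill-ins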

Pretrained LLMs

*Copyright Maven Analytics, LLC


POPULAR PRETRAINED LLMs
These are some of the most popular pretrained LLMs:
BERT (Encoder-only, created by Google)
• Parameters: 110 million (BERT-base), 340 million (BERT-large)
• Trained on: BooksCorpus (800M words), English Wikipedia (2.5 billion words)

DistilBERT (Encoder-only, created by Hugging Face)
• Parameters: 66 million
• Trained on: BooksCorpus (800M words), English Wikipedia (2.5 billion words)

T5 (Encoder-Decoder, created by Google)
• Parameters: 220 million (T5-base), 770 million (T5-large), 11 billion (T5-11B)
• Trained on: C4 (Colossal Clean Crawled Corpus), ~750GB of text

BART (Encoder-Decoder, created by Meta (Facebook))
• Parameters: 140 million (BART-base), 400 million (BART-large)
• Trained on: CC-News (140GB of news articles), BooksCorpus (800M words), English Wikipedia (2.5 billion words)

GPT (Decoder-only, created by OpenAI)
• Parameters: 1.5 billion (GPT-2), 175 billion (GPT-3)
• Trained on: WebText (40GB of text from Reddit links), BooksCorpus (800M words), English Wikipedia (2.5 billion words)
Encoders &
Decoders
Many of these are open source, but GPT-3 and later versions are proprietary – OpenAI
hasn’t even shared the number of parameters for GPT-4!

Each LLM can be used for multiple applications:
• BERT is used for sentiment analysis, classification, etc.
• T5 is used for summarization, translation, etc.
• GPT is mainly used for generation
• Any of them can be used for question answering, etc.

*Copyright Maven Analytics, LLC


KEY TAKEAWAYS

Transformers are a deep learning architecture with three main layers


• The embeddings layer contains numeric vectors that represent semantic meaning of words
• The attention layer adjusts the meaning of each word based on context from surrounding words
• The feedforward neural network layer learns patterns from the prior layers and adds complexity

Attention is the game changer in the transformer architecture


• It retains context by using attention scores to capture relationships between words
• Matrix calculations allow for parallelization, making it possible to train on huge datasets

Transformer-based LLMs are the most popular approach to NLP tasks


• Large Language Models (LLMs) are deep learning models pretrained on huge text datasets
• Popular LLMs include encoder-only BERT for understanding text, decoder-only GPT for generating text and
encoder-decoders T5 and BART for understanding and converting text

*Copyright Maven Analytics, LLC


HUGGING FACE TRANSFORMERS

*Copyright Maven Analytics, LLC


HUGGING FACE TRANSFORMERS

In this section, we’ll introduce the Hugging Face Transformers library in Python and
walk through examples of how you can use pretrained models to perform NLP tasks

TOPICS WE’LL COVER:
• Hugging Face Overview
• Sentiment Analysis
• Named Entity Recognition
• Zero-Shot Classification
• Text Summarization
• Text Generation
• Document Similarity

GOALS FOR THIS SECTION:
• Become familiar with the Hugging Face syntax and workflow
• Practice applying several types of NLP tasks using Hugging Face’s pretrained models

*Copyright Maven Analytics, LLC


HUGGING FACE

Hugging Face is the company that created the Transformers Python library,
Hugging Face
making it easy for data professionals to access and utilize pretrained LLMs
Overview
• They also host the Model Hub, which contains 1M+ pretrained, open-source models (in
addition to base models, there are variants, fine-tuned models, experimental models, etc.)
Sentiment
Analysis

Named Entity
Recognition

Zero-Shot
Classification

Text
Summarization

Text Generation

Document
Similarity

*Copyright Maven Analytics, LLC


HUGGING FACE WORKFLOW

We’ll be using this Hugging Face workflow in Python for multiple applications
Hugging Face
Overview
across the three main LLM categories, as well as embeddings:

1 Determine your goal

• Encoder-Only: Sentiment Analysis, Named Entity Recognition
• Decoder-Only: Text Generation
• Encoder-Decoder: Zero-Shot Classification, Text Summarization
• Embedding: Document Similarity

*Copyright Maven Analytics, LLC


HUGGING FACE WORKFLOW

We’ll be using this Hugging Face workflow in Python for multiple applications
Hugging Face
Overview
across the three main LLM categories, as well as embeddings:

Sentiment
Analysis 1 Determine your goal (Sentiment analysis, summarization, generation, etc.)

Named Entity
Recognition
2 Identify a pretrained model from Hugging Face’s Model Hub

Zero-Shot
Classification
Sort by popularity!

Text
Summarization

Text Generation

Document
Similarity

*Copyright Maven Analytics, LLC


HUGGING FACE WORKFLOW

We’ll be using this Hugging Face workflow in Python for multiple applications
Hugging Face
Overview
across the three main LLM categories, as well as embeddings:

Sentiment
Analysis 1 Determine your goal (Sentiment analysis, summarization, generation, etc.)

Named Entity
Recognition
2 Identify a pretrained model from Hugging Face’s Model Hub

Zero-Shot 3 Specify your input data (a single string, a Series or column of text data, etc.)
Classification

4 Apply the pretrained model on your input data and view the outputs
Text
Summarization

Text Generation
After using a pretrained model, you have the optional step of improving your results using
transfer learning, fine-tuning, RAGs, and more (reference the Pretrained Deep Learning Models
lesson), but those often require large labeled data sets and additional processing power
Document
Similarity

*Copyright Maven Analytics, LLC


SENTIMENT ANALYSIS

Sentiment analysis is used to determine the positivity or negativity of text


Hugging Face
Overview • The default LLM for sentiment analysis is DistilBERT (encoder-only)

Sentiment
Analysis

Named Entity
Recognition

Zero-Shot
Classification

Text
Summarization
Transformer pipeline steps:
1. Import the pipeline module
2. Specify the task: sentiment-analysis
3. Choose the default model
4. Specify we’re only using our CPU

The predicted sentiment is positive (vs. negative)
This is the model’s confidence in its prediction (from 0 to 1)
This is very much a positive sentence
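Those four steps translate to just a few lines of code; a minimal sketch (the model
name below is the library’s usual default for this task, and the exact score will vary):

from transformers import pipeline  # 1. import the pipeline module

sentiment_analyzer = pipeline(
    task="sentiment-analysis",  # 2. specify the task
    model="distilbert-base-uncased-finetuned-sst-2-english",  # 3. the default model
    device="cpu",               # 4. only use our CPU
)

print(sentiment_analyzer("I love cold lemonade!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.9998}]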
Similarity

*Copyright Maven Analytics, LLC


SENTIMENT ANALYSIS

EXAMPLE Find the sentiment for Pop Chip reviews


Hugging Face
Overview
Add a timer to compare performance
Hide warning messages
Named Entity
Recognition

Speed up the code by using GPU instead of CPU
Text
Summarization

Use .apply to apply sentiment_analyzer to a column of truncated text
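A sketch of that pattern, assuming a hypothetical reviews DataFrame and the
sentiment_analyzer pipeline from earlier (truncating by characters is a rough
stand-in for staying under the model’s 512-token limit):

import pandas as pd

reviews = pd.DataFrame({"review_text": [
    "I love these pop chips, perfect crunch!",
    "Way too salty for me, sadly.",
]})

# Truncate, then apply the pipeline to each row and keep the predicted label
reviews["truncated"] = reviews["review_text"].str.slice(0, 512)
reviews["sentiment"] = reviews["truncated"].apply(
    lambda text: sentiment_analyzer(text)[0]["label"]
)
print(reviews[["review_text", "sentiment"]])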

Document
Similarity

*Copyright Maven Analytics, LLC


PRO TIP: SPEEDING UP TRANSFORMERS CODE

Here are a few tips for speeding up your transformers code:


• For new Macs with an Apple Silicon chip (e.g., M3 chip), choose device='mps' to use the GPU
• For PCs with a dedicated GPU (e.g., gaming PCs), choose device='cuda' to use your GPU
• For older Macs and most PCs, you’ll only be able to use your CPU for computations (default)
Analysis

Named Entity
Recognition

Zero-Shot
While not as good as using a GPU, you can try some of these techniques to speed up
your code if you only have a CPU available
Summarization

I was able to cut my runtime from 1 minute to 15 seconds by trying these 4 things!
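One way to pick the best available device automatically; a sketch using PyTorch’s
availability checks:

import torch
from transformers import pipeline

# Prefer Apple's GPU (mps), then an NVIDIA GPU (cuda), else fall back to CPU
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

sentiment_analyzer = pipeline(task="sentiment-analysis", device=device)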

Document
Similarity

*Copyright Maven Analytics, LLC


ASSIGNMENT: SENTIMENT ANALYSIS WITH LLMs

Key Objectives
NEW MESSAGE
May 28, 2025 1. Create a new “nlp_transformers” environment
From: Oscar Wynn (The Movie Maven) 2. Launch Jupyter Notebook
Subject: RE: Feel good vs dark movies
3. Read in the movie reviews data set including the
VADER sentiment scores
Regarding my earlier message, can you do this using Hugging
Face & LLMs instead of VADER & rules, and compare the 4. Apply sentiment analysis to the “movie_info”
results? Thank you!
column using transformers
---
5. Compare the transformers sentiment scores
We’re publishing an article on the top 10 most feel-good with the VADER sentiment scores
movies and the top 10 darkest movies according to data.
Could you use sentiment analysis to help us come up with
movies for these two lists?
Thanks!
Oscar

movie_reviews_sentiment.csv

*Copyright Maven Analytics, LLC


NAMED ENTITY RECOGNITION (NER)

Named Entity Recognition (NER) is used to find and label important information
(people, places, organizations, dates, etc.) in text
Hugging Face
Overview • The default LLM for NER is BERT (encoder-only)

Sentiment
Analysis

Transformer pipeline steps:
1. Import the pipeline module
2. Specify the task: ner
3. Choose the default model
4. Specify we’re only using our CPU
Classification

By setting aggregation_strategy to “simple”, we’re specifying we want to look at
words, not subwords

Text Generation

We’re 97.8% sure “Springfield” is a named entity identified as a location (LOC) – the
word starts at character 44 and ends at 55
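A minimal sketch of the NER pipeline (the input sentence here is a made-up stand-in,
so the character offsets will differ from the slide):

from transformers import pipeline

ner_analyzer = pipeline(
    task="ner",
    aggregation_strategy="simple",  # group subwords back into whole words
    device="cpu",
)

print(ner_analyzer("Homer bought some pop chips at a store in Springfield."))
# e.g. [{'entity_group': 'PER', 'word': 'Homer', ...},
#       {'entity_group': 'LOC', 'word': 'Springfield', 'start': ..., 'end': ...}]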

*Copyright Maven Analytics, LLC


NAMED ENTITY RECOGNITION (NER)

EXAMPLE Find common terms mentioned in Pop Chip reviews


Hugging Face
Overview

Sentiment
Analysis

Named Entity
Recognition

Zero-Shot
Classification

Text
Summarization

Use .apply to apply ner_analyzer to a column of text, create a list of named entities,
and clean up the list

You can see competitors in the output list
Document
Similarity

*Copyright Maven Analytics, LLC


ASSIGNMENT: NAMED ENTITY RECOGNITION (NER)

Key Objectives
NEW MESSAGE
May 29, 2025 1. Read in the children’s books data set
From: Lexi Con (Lead Data Scientist) 2. Apply NER to the Description column
Subject: Book characters
3. Create a list of all named entities
Hi! 4. Only include the people (PER)
It’s been a while.
5. Extra credit: Exclude the authors as well
Our client would like a rough list of characters from our book
collection.
Could you use NER to extract the named entities from the
book descriptions, and then filter on only people?
Thanks so much!
Lexi

childrens_books.csv

*Copyright Maven Analytics, LLC


ZERO-SHOT CLASSIFICATION

Zero-shot classification is used to quickly categorize text without labels


Hugging Face
Overview • The default LLM for zero-shot classification is BART (encoder-decoder)

Sentiment
Analysis

Same pipeline steps with zero-shot-classification as the task

Zero-Shot
Classification

Text
Summarization
This is a quote!

Text Generation
You provide the label options and the model returns scores
that classify it into one of those labels (adding up to 1)
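A minimal sketch, with a hypothetical input and label options:

from transformers import pipeline

classifier = pipeline(task="zero-shot-classification", device="cpu")

result = classifier(
    "The early bird gets the worm.",               # text to categorize
    candidate_labels=["quote", "news", "review"],  # you provide the label options
)
print(result["labels"])  # labels sorted from most to least likely
print(result["scores"])  # scores add up to 1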
Document
Similarity

*Copyright Maven Analytics, LLC


ZERO-SHOT CLASSIFICATION

EXAMPLE Categorize pop chip reviews into one of five groups


Hugging Face
Overview

Sentiment
Analysis

Use things like domain expertise, EDA, and topic modeling to come up with relevant labels

Text
Summarization

It looks like these labels make sense!

Document
Similarity

*Copyright Maven Analytics, LLC


ASSIGNMENT: ZERO-SHOT CLASSIFICATION

Key Objectives
NEW MESSAGE
May 30, 2025 1. Apply zero-shot classification to the Description
From: Lexi Con (Lead Data Scientist) column
Subject: Book categories 2. Find the number of books in each category and
check a few to see if the results make sense
Hello,
Our client would like to divide their book list into five shelves
at their physical bookstore. Could you label all the books as
one of these categories?
• Adventure & Fantasy
• Animals & Nature
• Mystery
• Humor
• Non-Fiction

Thanks!
Lexi

*Copyright Maven Analytics, LLC


TEXT SUMMARIZATION

Text summarization is used to make long bodies of text more concise


Hugging Face • The default LLM for text summarization is BART (encoder-decoder)
Overview

Sentiment
Analysis
Same pipeline steps with
summarization as the task
Named Entity
Recognition

Zero-Shot
Classification

Text
Summarization

Text Generation

Beyond specifying the min and max length of the summarized text, you can set the
do_sample parameter to False to use the most likely next word (default) or to True
to use a more random and creative next word
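A minimal sketch with a made-up passage (the default BART summarization model is
downloaded automatically):

from transformers import pipeline

summarizer = pipeline(task="summarization", device="cpu")

long_text = (
    "These pop chips are light, crunchy, and perfectly seasoned. "
    "I keep a bag at my desk, another in the pantry, and I have even "
    "started bringing them on road trips because everyone asks for them."
)

print(summarizer(
    long_text,
    min_length=10,    # shortest allowed summary (in tokens)
    max_length=30,    # longest allowed summary
    do_sample=False,  # False = most likely next word; True = more creative
))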
Similarity

*Copyright Maven Analytics, LLC


TEXT SUMMARIZATION

EXAMPLE Use text summarization to reduce text size before sentiment analysis
Hugging Face
Overview

Sentiment
Analysis

Earlier we truncated the data because our text was too long for sentiment analysis,
but we can apply text summarization instead
Zero-Shot
Classification

Text
Summarization

Text Generation
None of these 3 perfectly capture the sentiment
Similarity

*Copyright Maven Analytics, LLC


ASSIGNMENT: TEXT SUMMARIZATION

Key Objectives
NEW MESSAGE
May 31, 2025 1. Apply text summarization to the Description column
From: Lexi Con (Lead Data Scientist) 2. Review the results to see if they make sense
Subject: Book summaries

Hello,
Our client would like a short one-liner for each book.
Could you use text summarization to summarize the
descriptions?

Thanks!
Lexi

*Copyright Maven Analytics, LLC


PRO TIP: TEXT GENERATION

Text generation is used to write new text based on a prompt


Hugging Face • The default LLM for text generation is GPT (decoder-only)
Overview

Sentiment
Analysis

Named Entity
Recognition

Zero-Shot
Classification

Text
Summarization

Text Generation
The do_sample parameter allows you to get more random and creative next words

Text generation is mostly used for creating applications, and better models like
GPT-3 and 4 require using an API with an OpenAI account and credits
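A minimal sketch using GPT-2, the kind of small open-source decoder-only model the
default pipeline serves:

from transformers import pipeline

generator = pipeline(task="text-generation", model="gpt2", device="cpu")

print(generator(
    "I love cold lemonade because",  # the prompt
    max_new_tokens=25,               # how much new text to write
    do_sample=True,                  # sample for more random, creative next words
))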
Similarity

*Copyright Maven Analytics, LLC


DOCUMENT EMBEDDINGS

Embeddings are numeric representations of text that carry semantic meaning


Hugging Face • You can get document embeddings from an LLM using feature extraction
Overview
• The default LLM for feature extraction is BERT (encoder-only), but MiniLM (encoder-only) is
more popular for document similarity
Sentiment
Analysis

Same pipeline steps with feature-extraction as the task, and a non-default, popular model

Zero-Shot
Classification

Text
Now that the sentence has been vectorized, you can apply EDA, clustering, classification, etc.

Feature extraction is the idea of using embeddings from (typically) the last layer of a
pretrained transformer model and inputting them into downstream ML / analysis tasks

MiniLM uses 384 dimensions for embeddings compared to the 768 dimensions from BERT
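A minimal sketch of feature extraction with MiniLM; mean-pooling the token vectors
into a single 384-dimension document embedding is one common convention:

import numpy as np
from transformers import pipeline

extractor = pipeline(
    task="feature-extraction",
    model="sentence-transformers/all-MiniLM-L6-v2",  # popular 384-dim MiniLM model
    device="cpu",
)

token_vectors = np.array(extractor("I love cold lemonade!")[0])  # (num_tokens, 384)
doc_embedding = token_vectors.mean(axis=0)                       # mean-pool tokens
print(doc_embedding.shape)                                       # (384,)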

*Copyright Maven Analytics, LLC


COSINE SIMILARITY

Cosine similarity is a metric used to calculate the similarity between observations


Hugging Face • Values range from -1 (dissimilar) to +1 (similar)
Overview
• This can be used with document embeddings to calculate document similarity
Sentiment
Analysis
EXAMPLE Finding the fruit most similar to mango
Named Entity
Recognition

Zero-Shot
Classification
[Chart: fruits (peach, banana, mango, lime) plotted in space with “sugar” and
“vitamin C” axes, with a 60° angle between the mango and lime vectors]

cos(60°) = 0.5 → Mangos and limes are not very similar
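The formula behind these numbers is cos(θ) = (a · b) / (||a|| ||b||); a tiny NumPy
sketch with made-up 2-D fruit vectors roughly 60° apart:

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

mango = np.array([1.0, 0.0])   # hypothetical sugar / vitamin C coordinates
lime  = np.array([0.5, 0.87])  # roughly 60 degrees away from mango

print(round(cosine_similarity(mango, lime), 2))  # ~0.5, matching cos(60°)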

*Copyright Maven Analytics, LLC


COSINE SIMILARITY

Cosine similarity is a metric used to calculate the similarity between observations


Hugging Face • Values range from -1 (dissimilar) to +1 (similar)
Overview
• This can be used with document embeddings to calculate document similarity
Sentiment
Analysis
EXAMPLE Finding the fruit most similar to mango
Named Entity
Recognition

Zero-Shot
Classification
[Same chart, with a 43° angle between the mango and peach vectors]

cos(43°) = 0.73 → Peaches are more similar to mangos than limes

*Copyright Maven Analytics, LLC


COSINE SIMILARITY

Cosine similarity is a metric used to calculate the similarity between observations


Hugging Face • Values range from -1 (dissimilar) to +1 (similar)
Overview
• This can be used with document embeddings to calculate document similarity
Sentiment
Analysis
EXAMPLE Finding the fruit most similar to mango
Named Entity
Recognition

Zero-Shot
Classification
[Same chart, with a 9° angle between the mango and banana vectors]

cos(9°) = 0.98 → Bananas are the most similar fruit to mangos!

*Copyright Maven Analytics, LLC


COSINE SIMILARITY

Cosine similarity is a metric used to calculate the similarity between observations


Hugging Face • Values range from -1 (dissimilar) to +1 (similar)
Overview
• This can be used with document embeddings to calculate document similarity
Sentiment
Analysis
EXAMPLE Finding the fruit most similar to mango
Named Entity
Recognition

There are many similarity metrics to choose from, but cosine similarity is popular
in machine learning because:
• It focuses on direction instead of magnitude
• It can handle high dimensions
• It works well on sparse data (data containing many 0 values)

*Copyright Maven Analytics, LLC


DOCUMENT SIMILARITY

EXAMPLE Use embeddings and cosine similarity to find similar movies


Hugging Face
Overview

Sentiment
Analysis

Named Entity
Recognition

Zero-Shot
Classification

In layman’s terms, movie_embeddings holds the embedding for each movie

In technical terms, we’re creating a numpy array with:
• 166 elements (one for each movie)
• 384 dimensions (movie vector)
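Putting it together, a sketch of the similarity lookup (the random array is a
stand-in for the real movie_embeddings array built above):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the real (166, 384) movie_embeddings array built above
movie_embeddings = np.random.default_rng(0).normal(size=(166, 384))

target = movie_embeddings[0].reshape(1, -1)              # e.g. Captain Marvel's vector
scores = cosine_similarity(target, movie_embeddings)[0]  # similarity to every movie

top5 = np.argsort(scores)[::-1][1:6]  # skip index 0, the movie itself
print(top5)                           # indices of the 5 most similar movies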
Document
Similarity

*Copyright Maven Analytics, LLC


DOCUMENT SIMILARITY

EXAMPLE Use embeddings and cosine similarity to find similar movies


Hugging Face
Overview

Sentiment
Analysis

Named Entity
Recognition

Zero-Shot
Classification
These are the movies that are most similar to Captain Marvel based on their movie descriptions

Text Generation

Document
Similarity

*Copyright Maven Analytics, LLC


ASSIGNMENT: DOCUMENT SIMILARITY

Key Objectives
NEW MESSAGE
June 1, 2025 1. Turn the Description column into embeddings
From: Lexi Con (Lead Data Scientist) using feature extraction
Subject: Book recommendations 2. Compare the cosine similarity of Harry Potter
and the Sorcerer’s Stone to all the other books
Hello,
3. Return the top 5 most similar books
I have one final request for you.
Our client is a big fan of the first Harry Potter book, Harry
Potter and the Sorcerer's Stone.
What other books would you recommend for them using
document similarity with LLM embeddings?
Thanks for all your help over the past few weeks!
Lexi

*Copyright Maven Analytics, LLC


KEY TAKEAWAYS

The transformers library allows you to use pretrained LLMs in Python


• Most data professionals will use pretrained models, while researchers and specialists will fine-tune models
• The steps are to select a task, specify a model, apply the model on your text data, and view the output
• The code can take a long time to run, so be sure to utilize your GPU, if available (device = ‘mps’ or ‘cuda’)

Hugging Face’s Model Hub contains many NLP tasks to choose from
• The transformers library will provide a default model for various tasks, but you can swap out models
• By filtering on tasks and sorting on downloads, you can find alternative models to test out

There are many applications of LLMs


• In this section, we covered sentiment analysis, named entity recognition, zero-shot classification, text
summarization, text generation and document similarity
• LLMs are just one tool in a data scientist’s toolbox – these techniques can be used as an alternative to
traditional techniques (sentiment analysis) or alongside traditional techniques (cosine similarity)

*Copyright Maven Analytics, LLC


NLP REVIEW & NEXT STEPS

*Copyright Maven Analytics, LLC


NLP REVIEW

These are the NLP techniques & applications that we covered:

NLP Category   Technique                            Application
Traditional    Rules-Based                          Sentiment Analysis
Traditional    Supervised Learning (Naïve Bayes)    Text Classification
Traditional    Unsupervised Learning (NMF)          Topic Modeling
Modern         Encoder-Only LLM (BERT)              Sentiment Analysis, Named Entity Recognition (NER)
Modern         Encoder-Decoder LLM (BART)           Zero-Shot Classification, Text Summarization
Modern         Decoder-Only LLM (GPT)               Text Generation
Modern         Embeddings (MiniLM)                  Document Similarity

*Copyright Maven Analytics, LLC


NLP REVIEW

When should I use traditional vs. modern NLP techniques?


• In summary, start simple!

NLP Review

What is my NLP goal?

Sentiment Analysis, Text Classification, Topic Modeling
• These can be done with traditional techniques
• How much data do I have?
  • Small to medium data (<100k rows): Try traditional techniques first
  • Big data (>1M rows): Consider modern techniques

Text Generation, Machine Translation, Question Answering
• These cannot be done with traditional techniques, so use modern techniques

*Copyright Maven Analytics, LLC


NLP NEXT STEPS

If you enjoyed the technical aspects of this course and want to learn more:

NLP Review

Traditional NLP
You now know the basics of text preprocessing and vectorization, and have practice
applying supervised & unsupervised learning techniques

Next steps:
• Start small: Use these NLP techniques on a small piece of a larger data science project
• Go beyond the basics: Try practicing on text and numeric data, testing out different
  cleaning techniques, tuning model parameters, and mixing & matching techniques
• Learn new algos: You are well-equipped to learn other new machine learning techniques
  you encounter

Modern NLP
You now know the basics of neural networks, deep learning, and the transformer
architecture, and have practice applying pretrained LLMs

Next steps (beyond pretrained models):
• Transfer learning: LLM parameters stay mostly frozen, and only the last layers are
  fine-tuned on a new task (useful if there is limited training data)
• Fine-tuning: All LLM parameters are updated based on new training data (requires
  10k+ labeled examples)
• RAGs (retrieval-augmented generation): Enhances LLM outputs by retrieving relevant
  info from a database before generating answers (requires 10k+ external text documents)

*Copyright Maven Analytics, LLC


NLP NEXT STEPS

Key takeaways from someone who has been a decade-long data scientist:

NLP Review Modern NLP is a huge mindset shift from traditional data science
• With traditional data science, the “danger zone” lies in not understanding everything
NLP Next Steps • With modern NLP, it’s impossible to comprehend everything

The good news:


• It’s incredible how well modern NLP techniques work
• Powerful NLP techniques are now available to the masses

The bad news:


• AI ethics implications (shout out to Chris Bruehl’s Data & AI Ethics course!)
• Sometimes it feels impossible to keep up

*Copyright Maven Analytics, LLC


NLP NEXT STEPS

After this course:


• You can follow the NLP conversation now that you understand the foundations
• There are a lot of advancements, so keep track of the news!
NLP Review

Top organizations and labs:


NLP Next Steps • OpenAI (@OpenAI) – creator of GPT models
• Google AI (@GoogleAI) – creator of BERT, T5 and more
• Meta AI (@MetaAI) – creator of BART, LLaMA and more
• Hugging Face (@huggingface) – hosts NLP models
• Stanford NLP Group (@stanfordnlp) – leader in linguistic research
• DeepMind (@DeepMind) – cutting-edge AI research

Congrats on completing this course!


The NLP field is rapidly evolving and there are always new things to learn. You now have the baseline
knowledge to go out, explore, test out new models and dive deeper into whatever interests you.
Welcome to the field of NLP. It’s an exciting time, and we are just getting started.

*Copyright Maven Analytics, LLC
