
Practical and Effective
Neural Entity Recognition
in spaCy v2.0 and beyond

Matthew Honnibal 💥 Explosion AI


Explosion AI is a digital studio
specialising in Artificial Intelligence
and Natural Language Processing.

spaCy: open-source library for industrial-strength Natural Language Processing

Thinc: spaCy's next-generation Machine Learning library for deep learning with text

Prodigy: a radically efficient data collection and annotation tool, powered by active learning

Coming soon: pre-trained, customisable models for a variety of languages and domains

Matthew Honnibal
CO-FOUNDER

PhD in Computer Science in 2009. 10 years publishing research on state-of-the-art natural language understanding systems. Left academia in 2014 to develop spaCy.

Ines Montani
CO-FOUNDER

Programmer and front-end developer with a degree in media science and linguistics. Has been working on spaCy since its first release. Lead developer of Prodigy.
“I don’t get it. Can you
explain like I’m five?”
Think of us as a boutique kitchen.

free recipes published online → open-source software
catering for select events → consulting
soon: a line of kitchen gadgets → downloadable tools
soon: a line of fancy sauces and spice mixes you can use at home → pre-trained models
spaCy
free, open-source library for Natural Language Processing

helps you build applications that process and “understand” large volumes of text

in use at hundreds of companies


A hopelessly short
introduction to Named
Entity Recognition
What’s NER?

import spacy

nlp = spacy.load('en')
doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

[Figure: BILOU transition tags over the example sentence: U-ORG ("Apple"), U-GPE ("U.K."), B-MONEY ... L-MONEY ("$1 billion")]
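For reference, the snippet prints one line per entity; with the 2017-era English model the output matches the spaCy docs' version of this example:

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY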


spaCy's NER performance

SYSTEM                         TYPE     NER F
spaCy en_core_web_sm (2017)    neural   85.67
spaCy en_core_web_lg (2017)    neural   86.42
Strubell et al. (2017)         neural   86.81
Chiu and Nichols (2016)        neural   86.19
Durrett and Klein (2014)       neural   84.04
Ratinov and Roth (2009)        linear   83.45

alpha.spacy.io
spaCy's English models

MODEL                       TYPE     UAS    NER F   POS    WPS     SIZE
en_core_web_sm (2017) v2    neural   91.4   85.5    97.0   8.2k    36MB
en_core_web_lg (2017) v2    neural   91.9   86.4    97.2   8.1k    667MB
en_core_web_sm (2016) v1    linear   86.6   78.5    96.6   25.7k   50MB
en_core_web_lg (2016) v1    linear   90.6   81.4    96.7   18.8k   1GB

alpha.spacy.io
What’s so hard about
Named Entity Recognition?
Entity recognition is not a great thesis topic.
This makes progress slow.

Structured prediction 🤓 interesting!

Knowledge intensive 🤔 potentially cool?

Mix of easy and hard cases 😫 super frustrating...


Transition-based NER

Lample et al. (2016)

Start with an empty stack, all words on the buffer, no entities

Define actions that change the state

Predict the sequence of actions
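
A minimal sketch of this in plain Python (the NERState class and the BILOU-style action names here are illustrative stand-ins, not spaCy's internals):

class NERState:
    # Toy parser state: a buffer of unread words, a stack holding the
    # currently open entity, and a list of finished entities.
    def __init__(self, words):
        self.buffer = list(words)
        self.stack = []
        self.entities = []

    @property
    def is_finished(self):
        return not self.buffer and not self.stack

def apply_action(state, action, label=None):
    # Mutate the state according to one transition.
    if action == "OUT":                # next word is outside any entity
        state.buffer.pop(0)
    elif action == "UNIT":             # next word is a whole entity
        state.entities.append(([state.buffer.pop(0)], label))
    elif action in ("BEGIN", "IN"):    # open or extend an entity
        state.stack.append(state.buffer.pop(0))
    elif action == "LAST":             # close the open entity
        state.stack.append(state.buffer.pop(0))
        state.entities.append((state.stack, label))
        state.stack = []
    return state

state = NERState("Apple is looking".split())
for action, label in [("UNIT", "ORG"), ("OUT", None), ("OUT", None)]:
    state = apply_action(state, action, label)
print(state.entities)   # [(['Apple'], 'ORG')]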


DEEP LEARNING FOR NLP

Embed. Encode.
Attend. Predict.
Think of data shapes,
not application details.

integer → category label
vector → single meaning
sequence of vectors → multiple meanings
matrix → meanings in context
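
In numpy terms, a toy sketch (the width 128 just matches the layer sizes used later):

import numpy as np

word_id = 3502                  # integer: a category label (e.g. a word id)
vector = np.zeros(128)          # vector: a single meaning
vectors = np.zeros((3, 128))    # sequence of vectors: one meaning per token
sentence = np.zeros((3, 128))   # matrix: same shape, but each row should now
                                # encode its token in context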
EMBED

Learn dense embeddings

“You shall know a word by the company it keeps.”

if it barks like a dog...

word2vec, PMI, LSI etc.


NOTATION
|   function concatenation
>>  function composition

EMBED

features = doc2array([NORM, PREFIX, SUFFIX, SHAPE])  # one column of ids per feature

norm   = get_col(0) >> HashEmbed(128, 7500)  # 128-d rows, 7500-row hash table
prefix = get_col(1) >> HashEmbed(128, 7500)
suffix = get_col(2) >> HashEmbed(128, 7500)
shape  = get_col(3) >> HashEmbed(128, 7500)

embed_word = (
    (norm | prefix | suffix | shape)  # concatenate the four embeddings
    >> Maxout(128, pieces=3)          # mix down to one 128-d word vector
)
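The trick behind HashEmbed can be sketched in plain numpy: instead of one row per vocabulary item, each key is hashed with several seeds into a fixed-size table and the matching rows are summed. This is only an illustration of the idea; Thinc's real layer uses MurmurHash and learns the table during training:

import numpy as np

rng = np.random.default_rng(0)
table = rng.normal(0, 1, size=(7500, 128)).astype("float32")  # fixed-size table

def hash_embed(key, seeds=(0, 1, 2, 3)):
    # Hash the key once per seed and sum the table rows it lands on.
    # Distinct keys may collide on one row, but rarely on all four.
    rows = [hash((seed, key)) % table.shape[0] for seed in seeds]
    return table[rows].sum(axis=0)

print(hash_embed("apple").shape)   # (128,)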
ENCODE

Learn to encode context

encode context-independent vectors into a context-sensitive sentence matrix

LSTM, CNN etc.


ENCODE

trigram_cnn = (
    ExtractWindow(nW=1)        # concatenate each token with its neighbours
    >> Maxout(128, pieces=3)   # and mix the 3x128 window back down to 128
)
encode_context = (
    embed_word
    >> Residual(trigram_cnn)   # four residual CNN blocks: each layer adds
    >> Residual(trigram_cnn)   # one word of context per side, so a token
    >> Residual(trigram_cnn)   # ends up sensitive to 4 words on
    >> Residual(trigram_cnn)   # either side
)
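What ExtractWindow(nW=1) computes can be sketched in numpy: each token's vector is concatenated with its immediate neighbours (zero-padded at the sentence edges), and the following Maxout maps the 3x128 window back down to 128. A rough, illustrative version:

import numpy as np

def extract_window(X, nW=1):
    # (n_tokens, d) -> (n_tokens, (2*nW + 1) * d): each row becomes the
    # token's vector concatenated with nW neighbours on each side.
    n, d = X.shape
    padded = np.vstack([np.zeros((nW, d)), X, np.zeros((nW, d))])
    return np.hstack([padded[i:i + n] for i in range(2 * nW + 1)])

X = np.ones((5, 128))
print(extract_window(X).shape)   # (5, 384)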
ATTEND

Learn what to pay attention to

summarize inputs with respect to query

get global problem-specific representation


ATTEND

state2vec = (
    (
        tensor[state.buffer(0)]       # rows of the sentence matrix for
        | tensor[state.buffer(-1)]    # tokens around the front of the buffer...
        | tensor[state.buffer(1)]
        | tensor[state.entities(0)]   # ...and around the current entities
        | tensor[state.entities(-1)]
        | tensor[state.entities(1)]
    )
    >> Maxout(128)                    # mix the concatenation down to 128-d
)
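The "attention" here is mostly indexing: the parser state determines which rows of the sentence matrix matter right now, and those rows are concatenated into one query-specific summary. A rough numpy rendering (positions hard-coded for illustration):

import numpy as np

def state_features(tensor, positions):
    # Pull out the rows for state-relevant tokens and concatenate them;
    # out-of-range positions (e.g. an empty buffer) become zero vectors.
    d = tensor.shape[1]
    rows = [tensor[i] if 0 <= i < len(tensor) else np.zeros(d)
            for i in positions]
    return np.concatenate(rows)

tensor = np.random.rand(10, 128)                            # encoded sentence
print(state_features(tensor, [4, 3, 5, 0, -1, 1]).shape)    # (768,)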
PREDICT

Learn to predict target values

output class IDs, real values, etc.

standard multi-layer perceptron


PREDICT

tensor = trigram_cnn(embed_word(doc))   # embed the words, then encode context
state_weights = state2vec(tensor)       # precompute per-token state features
state = initialize_state(doc)
while not state.is_finished:
    features = get_features(state, state_weights)
    probs = mlp(features)                                       # score actions
    action = actions[(probs * valid_actions(state)).argmax()]   # best valid one
    state = action(state)                                       # apply it
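
That valid_actions(state) mask is what excludes invalid sequences: zeroing the scores of ill-formed actions before the argmax guarantees a well-formed output. A toy version for a BILOU-style MONEY tagger (the action names and helper are illustrative, not spaCy's):

import numpy as np

ACTIONS = ["OUT", "BEGIN-MONEY", "IN-MONEY", "LAST-MONEY", "UNIT-MONEY"]

def valid_actions(prev):
    # Inside an open entity (after BEGIN/IN) only IN or LAST may follow;
    # otherwise IN and LAST are forbidden. 1.0 = valid, 0.0 = invalid.
    inside = prev.startswith(("BEGIN", "IN"))
    return np.array([a.startswith(("IN", "LAST")) == inside for a in ACTIONS],
                    dtype=float)

probs = np.array([0.40, 0.10, 0.20, 0.25, 0.05])
best = ACTIONS[int((probs * valid_actions("BEGIN-MONEY")).argmax())]
print(best)   # LAST-MONEY: OUT scored highest, but it was masked out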
Advantages of the
transition-based approach

Mostly equivalent to sequence tagging

Convenient to share code with parser

Easily exclude invalid sequences

Easily define arbitrary features


Breaking through the
knowledge acquisition
bottleneck
PROBLEM

We need annotations.

We can definitely pre-train embeddings.

We can probably pre-train CNN.

We can pre-train entities, but should fine-tune.

We must train the output from scratch.

We absolutely need evaluation data.


Prodigy (prodi.gy)

Annotation tool combining insights from Machine Learning and UX to help developers train and evaluate models faster.
START THE PRODIGY SERVER

$ prodigy dataset ner_product "Improve PRODUCT on Reddit data"

✨ Created dataset 'ner_product'.

$ prodigy ner.teach ner_product en_core_web_sm ~/data/RC_2010-01.bz2 --loader reddit --label PRODUCT

✨ Starting the web server on port 8080...


TRAIN AND EVALUATE

$ prodigy ner.batch-train ner_product en_core_web_sm --output /tmp/model --eval-split 0.5 --label PRODUCT

Loaded model en_core_web_sm
Using 50% of examples (883) for evaluation
Using 100% of remaining examples (891) for training

Correct     164
Incorrect    46
Baseline    0.005
Accuracy    0.781
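
The exported directory is a regular spaCy model, so (assuming spaCy v2's path-based loading) trying it out looks like this; the example text is made up:

import spacy

nlp = spacy.load('/tmp/model')   # the --output path from ner.batch-train
doc = nlp(u"Just bought the new Weber grill for the backyard")
print([(ent.text, ent.label_) for ent in doc.ents])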
What’s next?

spaCy v2.0 release candidate – almost ready 🎉

Create training data for more languages and specific genres and domains

Add coreference resolution and entity linking

Use self-training to keep models up-to-date


Thanks!
💥 Explosion AI
explosion.ai

📲 Follow us on Twitter
@honnibal
@_inesmontani
@explosion_ai
