Advanced NLP with spaCy, Chapter 2
Ines Montani
spaCy core developer
Shared vocab and string store (1)
Vocab: stores data shared across multiple documents
coffee_hash = nlp.vocab.strings['coffee']
coffee_string = nlp.vocab.strings[coffee_hash]
Hashes can't be reversed: a string can only be recovered from its hash if the shared vocab has already stored it, which is why we need to provide the shared vocab
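A minimal sketch of the lookup in both directions, and of what happens when a string store has never seen the string. It assumes a blank English pipeline (no trained model needed); the example sentence and the variable name lookup_failed are illustrative only.

```python
from spacy.lang.en import English
from spacy.strings import StringStore

nlp = English()
# Processing a text adds its words to the shared vocab's string store
doc = nlp("I love coffee")

# String -> hash, then hash -> string, both via the same store
coffee_hash = nlp.vocab.strings["coffee"]
coffee_string = nlp.vocab.strings[coffee_hash]

# An empty string store has never seen "coffee", so it cannot
# turn the hash back into text
empty_store = StringStore()
try:
    empty_store[coffee_hash]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(coffee_hash, coffee_string, lookup_failed)
```

The failed lookup on the empty store is the reason objects that need to resolve hashes are created with the shared vocab.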
The Doc object
# Create an nlp object
from spacy.lang.en import English
nlp = English()
Use token attributes if available – for example, token.i for the token index
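As a rough illustration, a Doc can also be built by hand from the shared vocab, and each token then exposes its index as token.i. The words and spaces lists below are made-up example data, not taken from the slides.

```python
from spacy.lang.en import English
from spacy.tokens import Doc

nlp = English()

# Example data: the words and whether each one is followed by a space
words = ["Hello", "world", "!"]
spaces = [True, False, False]

# The Doc takes the shared vocab, the words and the spaces
doc = Doc(nlp.vocab, words=words, spaces=spaces)

# token.i gives the token's index in the Doc, no manual counter needed
indices = [token.i for token in doc]
print(doc.text, indices)
```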
Comparing semantic similarity
spaCy can compare two objects (Doc, Span or Token) and predict how similar they are
Important: needs a model that has word vectors included, for example:
YES: en_core_web_md (medium model)
0.8627204117787385
0.7369546
print(doc.similarity(token))
0.32531983166759537
print(span.similarity(doc))
0.619909235817623
Short phrases are better than long documents with many irrelevant words
print(doc1.similarity(doc2))
0.9501447503553421
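By default, spaCy's similarity score is the cosine similarity of the two objects' vectors. The sketch below shows that calculation in plain Python so it runs without downloading a model. The three-dimensional vectors are invented toy values, not real word vectors, and cosine_similarity is a helper written for this example rather than a spaCy function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumed for illustration; real word vectors have
# hundreds of dimensions and come from the loaded model)
pizza = [0.9, 0.1, 0.3]
pasta = [0.8, 0.2, 0.4]
laptop = [0.1, 0.9, 0.2]

print(cosine_similarity(pizza, pasta))   # vectors pointing the same way score close to 1
print(cosine_similarity(pizza, laptop))  # vectors pointing different ways score lower
```

With a model that includes word vectors, the same comparison is what calls such as doc1.similarity(doc2) report.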
Statistical predictions vs. rules
Statistical models
Real-world examples: product names, person names, subject/object relationships
Rule-based systems
Real-world examples: countries of the world, cities, drug names, dog breeds
matcher = PhraseMatcher(nlp.vocab)
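A short sketch of how the phrase matcher might be used for one of the rule-based cases above, a fixed list of dog breeds. It assumes spaCy v3, where matcher.add takes a list of patterns; the label DOG_BREED, the breed names and the example sentence are illustrative choices.

```python
from spacy.lang.en import English
from spacy.matcher import PhraseMatcher

nlp = English()
matcher = PhraseMatcher(nlp.vocab)

# Patterns are Doc objects; make_doc only runs the tokenizer
breeds = ["Golden Retriever", "Border Collie"]
patterns = [nlp.make_doc(name) for name in breeds]
matcher.add("DOG_BREED", patterns)

doc = nlp("I have a Golden Retriever and a Border Collie")

# Each match is a (match_id, start, end) tuple of token offsets
matched = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(matched)
```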