Natural Language Processing With Python & NLTK Cheat Sheet

This document provides a cheat sheet summary of key Natural Language Processing techniques using Python and the NLTK library. It covers topics such as text handling, corpus access, tokenization, sentence parsing, part-of-speech tagging, named entity recognition, text classification, lemmatization and stemming, and using regular expressions with Pandas for tasks like extraction and replacement. The cheat sheet is intended as a concise reference guide for common NLP pre-processing, analysis and machine learning tasks.

Uploaded by

Ashwani Rathee

Natural Language Processing with Python & nltk Cheat Sheet

by RJ Murray (murenei) via cheatography.com/58736/cs/15485/

Handling Text

text='Some words'                    Assign string
list(text)                           Split text into character tokens
set(text)                            Unique tokens
len(text)                            Number of characters

Accessing corpora and lexical resources

from nltk.corpus import brown        Import CorpusReader object
brown.words(text_id)                 Returns pretokenised document as list of words
brown.fileids()                      Lists docs in Brown corpus
brown.categories()                   Lists categories in Brown corpus

Tokenization

text.split(" ")                      Split by space
nltk.word_tokenize(text)             nltk in-built word tokenizer
nltk.sent_tokenize(doc)              nltk in-built sentence tokenizer

Sentence Parsing

g=nltk.data.load('grammar.cfg')      Load a grammar from a file
g=nltk.CFG.fromstring("""...""")     Manually define grammar
parser=nltk.ChartParser(g)           Create a parser out of the grammar
trees=parser.parse_all(text)         Parse a sentence given as a list of word tokens
for tree in trees: print(tree)
from nltk.corpus import treebank
treebank.parsed_sents('wsj_0001.mrg')   Treebank parsed sentences

Text Classification

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
vect=CountVectorizer().fit(X_train)  Fit bag of words model to data
vect.get_feature_names_out()         Get features (get_feature_names() before scikit-learn 1.0)
vect.transform(X_train)              Convert to doc-term matrix

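The Sentence Parsing entries can be sketched end to end with a small hand-written grammar. The grammar rules and the example sentence below are illustrative assumptions, not taken from the cheat sheet. Note that the parser expects a list of word tokens rather than a raw string.

```python
import nltk

# Hypothetical toy grammar, defined inline instead of loaded from a file.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)

# The parser works on tokens, so split the sentence first.
tokens = "the dog chased the cat".split()
trees = list(parser.parse(tokens))

for tree in trees:
    print(tree)
```

If a token is not covered by the grammar, the parser raises a ValueError, so the grammar's terminals need to match the tokenizer's output.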
Entity Recognition (Chunking/Chinking)

g="NP: {<DT>?<JJ>*<NN>}"             Regex chunk grammar
cp=nltk.RegexpParser(g)              Parse grammar
ch=cp.parse(pos_sent)                Parse tagged sent. using grammar
print(ch)                            Show chunks
ch.draw()                            Show chunks in IOB tree
cp.evaluate(test_sents)              Evaluate against test doc
sents=nltk.corpus.treebank.tagged_sents()
print(nltk.ne_chunk(sent))           Print chunk tree

Lemmatization & Stemming

input="List listed lists listing listings"   Different suffixes
words=input.lower().split(' ')       Normalize (lowercase) words
porter=nltk.PorterStemmer()          Initialise Stemmer
[porter.stem(t) for t in words]      Create list of stems
WNL=nltk.WordNetLemmatizer()         Initialise WordNet lemmatizer
[WNL.lemmatize(t) for t in words]    Use the lemmatizer

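The stemming entries from the Lemmatization & Stemming section can be run as-is once the stemmer is instantiated. The sketch below uses the cheat sheet's own example string; the variable name `raw` is an assumption, chosen to avoid shadowing Python's built-in `input`. The lemmatizer is left out because it needs the WordNet corpus to be downloaded first.

```python
from nltk.stem import PorterStemmer

# Example string from the cheat sheet: one root with different suffixes.
raw = "List listed lists listing listings"

# Normalize to lowercase and split on spaces.
words = raw.lower().split(' ')

# PorterStemmer must be instantiated before calling stem().
porter = PorterStemmer()
stems = [porter.stem(t) for t in words]

print(stems)
```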
Part of Speech (POS) Tagging

nltk.help.upenn_tagset('MD')         Lookup definition for a POS tag
nltk.pos_tag(words)                  nltk in-built POS tagger
<use an alternative tagger to illustrate ambiguity>
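POS tags are the input to the regex chunker listed under Entity Recognition. The sketch below applies the cheat sheet's NP grammar to a sentence that is tagged by hand, in the (word, tag) format that nltk.pos_tag returns, so it runs without downloading a tagger model. The sentence and its tags are illustrative assumptions.

```python
import nltk

# Hand-tagged sentence (hypothetical), in the format nltk.pos_tag produces.
pos_sent = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# NP chunk: optional determiner, any number of adjectives, then a noun.
cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
ch = cp.parse(pos_sent)
print(ch)

# Collect the words of every NP subtree.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in ch.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
```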

By RJ Murray (murenei), cheatography.com/murenei/, tutify.com.au
Published 28th May, 2018. Last updated 29th May, 2018.

RegEx with Pandas & Named Groups

df=pd.DataFrame(time_sents, columns=['text'])
df['text'].str.split().str.len()
df['text'].str.contains('word')
df['text'].str.count(r'\d')
df['text'].str.findall(r'\d')
df['text'].str.replace(r'\w+day\b', '???', regex=True)
df['text'].str.replace(r'(\w)', lambda x: x.groups()[0][:3], regex=True)
df['text'].str.extract(r'(\d?\d):(\d\d)')
df['text'].str.extractall(r'((\d?\d):(\d\d) ?([ap]m))')
df['text'].str.extractall(r'(?P<digits>\d)')
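The following sketch runs several of these string methods on a two-row DataFrame. The sample sentences, the named groups `hour`, `minute` and `period`, and the weekday-shortening pattern are assumptions made for illustration. Recent pandas versions treat the pattern in str.replace as a literal string unless regex=True is passed, so the flag is set explicitly.

```python
import pandas as pd

# Hypothetical sample data standing in for time_sents.
time_sents = ["Monday: The doctor's appointment is at 2:45pm.",
              "Tuesday: The dentist's appointment is at 11:30 am."]
df = pd.DataFrame(time_sents, columns=["text"])

# Number of whitespace-separated tokens per row.
word_counts = df["text"].str.split().str.len()

# Number of digit characters per row.
digit_counts = df["text"].str.count(r"\d")

# Mask weekday names; regex=True makes the pattern a regular expression.
masked = df["text"].str.replace(r"\w+day\b", "???", regex=True)

# Shorten weekday names to three letters with a callable replacement.
shortened = df["text"].str.replace(
    r"(\w+day\b)", lambda m: m.group(1)[:3], regex=True)

# First match only: one column per capture group.
times = df["text"].str.extract(r"(\d?\d):(\d\d)")

# All matches: named groups become the column names.
named = df["text"].str.extractall(
    r"(?P<hour>\d?\d):(?P<minute>\d\d) ?(?P<period>[ap]m)")

print(times)
print(named)
```

extract returns one row per input row, while extractall returns one row per match, indexed by the original row and a match number.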
