
# Natural Language Processing (NLP) CheatSheet

Text Preprocessing

● Lowercase text: text = text.lower()
● Remove punctuation: import string; text = text.translate(str.maketrans("", "", string.punctuation))
● Remove digits: text = ''.join(c for c in text if not c.isdigit())
● Remove stopwords: from nltk.corpus import stopwords; stopwords_list = stopwords.words('english'); text = ' '.join(word for word in text.split() if word not in stopwords_list)
● Remove custom stopwords: custom_stopwords = ['the', 'and', 'is']; text = ' '.join(word for word in text.split() if word not in custom_stopwords)
● Remove short words: text = ' '.join(word for word in text.split() if len(word) > 2)
● Remove long words: text = ' '.join(word for word in text.split() if len(word) < 15)
● Replace specific words: text = text.replace('old_word', 'new_word')
● Remove HTML tags: from bs4 import BeautifulSoup; text = BeautifulSoup(text, 'html.parser').get_text()
● Remove URLs: import re; text = re.sub(r'http\S+', '', text)
● Remove email addresses: import re; text = re.sub(r'\S+@\S+', '', text)
● Expand contractions: import contractions; text = contractions.fix(text)
● Normalize Unicode characters: import unicodedata; text = unicodedata.normalize('NFKD', text)
● Remove accented characters: import unidecode; text = unidecode.unidecode(text)
● Remove extra whitespaces: text = ' '.join(text.split())
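
Chaining several of the steps above into one helper often reads better in practice. A minimal sketch, assuming NLTK's stopword list has been downloaded once; the clean_text name and the sample string are illustrative:

    import re
    import string
    from nltk.corpus import stopwords  # requires nltk.download('stopwords') once

    def clean_text(text):
        """Lowercase, then strip URLs, emails, digits, punctuation, stopwords, and short words."""
        text = text.lower()
        text = re.sub(r'http\S+', '', text)   # drop URLs
        text = re.sub(r'\S+@\S+', '', text)   # drop email addresses
        text = ''.join(c for c in text if not c.isdigit())
        text = text.translate(str.maketrans('', '', string.punctuation))
        stop = set(stopwords.words('english'))
        return ' '.join(w for w in text.split() if w not in stop and len(w) > 2)

    print(clean_text("Visit https://example.com for 20% OFF!! Email us at deals@example.com"))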

Tokenization

● Tokenize text into words (NLTK): from nltk.tokenize import word_tokenize; tokens = word_tokenize(text)
● Tokenize text into words (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); tokens = [token.text for token in doc]
● Tokenize text into sentences (NLTK): from nltk.tokenize import sent_tokenize; sentences = sent_tokenize(text)

● Tokenize text into sentences (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); sentences = [sent.text for sent in doc.sents]
● Tokenize text into n-grams (NLTK): from nltk.util import ngrams; n = 2; ngrams_list = list(ngrams(text.split(), n))  # set n to the desired n-gram size
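
A quick self-contained NLTK run tying the tokenizers together; the sample sentence is illustrative and the tokenizer data must be downloaded once:

    import nltk
    from nltk.tokenize import word_tokenize, sent_tokenize
    from nltk.util import ngrams

    nltk.download('punkt', quiet=True)  # one-time tokenizer data download ('punkt_tab' on newer NLTK)

    sample = "NLP is fun. Tokenization splits text into units."
    print(sent_tokenize(sample))            # ['NLP is fun.', 'Tokenization splits text into units.']
    print(word_tokenize(sample))            # ['NLP', 'is', 'fun', '.', 'Tokenization', ...]
    print(list(ngrams(sample.split(), 2)))  # [('NLP', 'is'), ('is', 'fun.'), ...]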

Part-of-Speech Tagging

● POS tagging (NLTK): from nltk import pos_tag; tagged_tokens = pos_tag(tokens)
● POS tagging (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); tagged_tokens = [(token.text, token.pos_) for token in doc]
● Fine-grained POS tagging (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); tagged_tokens = [(token.text, token.tag_) for token in doc]
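
A minimal NLTK tagging run; it assumes the 'punkt' and 'averaged_perceptron_tagger' data can be downloaded, and the sentence is illustrative:

    import nltk
    from nltk import pos_tag, word_tokenize

    nltk.download('punkt', quiet=True)
    nltk.download('averaged_perceptron_tagger', quiet=True)

    tokens = word_tokenize("The quick brown fox jumps over the lazy dog")
    print(pos_tag(tokens))  # [('The', 'DT'), ('quick', 'JJ'), ...] Penn Treebank tags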

Named Entity Recognition (NER)

● NER (NLTK): from nltk import ne_chunk; ne_chunks = ne_chunk(tagged_tokens)
● NER (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); entities = [(ent.text, ent.label_) for ent in doc.ents]
● NER with custom labels (spaCy): import spacy; from spacy.tokens import Span; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); custom_entities = [Span(doc, start, end, label='CUSTOM') for start, end in custom_entity_offsets]; doc.ents = list(doc.ents) + custom_entities
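
An end-to-end spaCy NER example, assuming en_core_web_sm has been installed (python -m spacy download en_core_web_sm); the sentence is the usual demo string:

    import spacy

    nlp = spacy.load('en_core_web_sm')
    doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
    for ent in doc.ents:
        print(ent.text, ent.label_)  # e.g. Apple ORG, U.K. GPE, $1 billion MONEY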

Text Normalization

● Stemming (NLTK): from nltk.stem import PorterStemmer; stemmer = PorterStemmer(); stemmed_tokens = [stemmer.stem(token) for token in tokens]
● Lemmatization (NLTK): from nltk.stem import WordNetLemmatizer; lemmatizer = WordNetLemmatizer(); lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
● Lemmatization with POS (NLTK): from nltk.stem import WordNetLemmatizer; lemmatizer = WordNetLemmatizer(); lemmatized_tokens = [lemmatizer.lemmatize(token, pos='v') for token, pos in tagged_tokens]  # pos='v' treats every token as a verb; map each Penn tag to a WordNet POS for true POS-aware lemmatization
● Lemmatization (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); lemmatized_tokens = [token.lemma_ for token in doc]

● Lowercase and lemmatize (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text.lower()); lemmatized_tokens = [token.lemma_ for token in doc]
● Lowercase, lemmatize, and remove stopwords (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text.lower()); lemmatized_tokens = [token.lemma_ for token in doc if not token.is_stop]
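
A short stemming-vs-lemmatization comparison, assuming the WordNet data can be downloaded; the word list is illustrative:

    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download('wordnet', quiet=True)

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()
    for word in ["studies", "running", "better"]:
        # stemming chops suffixes ("studi"); lemmatization returns dictionary forms ("study", "run")
        print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos='v'))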

Dependency Parsing

● Dependency parsing (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text)
  for token in doc: print(token.text, token.dep_, token.head.text)
● Extract subjects and objects with their head verbs (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text)
  for token in doc:
      if token.dep_ in ('nsubj', 'dobj'): print(token.text, token.dep_, token.head.text)
● Visualize dependency tree (spaCy): import spacy; from spacy import displacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); displacy.render(doc, style='dep', jupyter=True)

Chunking

● Noun phrase chunking (NLTK): from nltk import RegexpParser; grammar = r'NP: {<DT>?<JJ>*<NN>+}'; chunk_parser = RegexpParser(grammar); chunks = chunk_parser.parse(tagged_tokens)
● Verb phrase chunking (NLTK): from nltk import RegexpParser; grammar = r'VP: {<VB.*><NP|PP|CLAUSE>+$}'; chunk_parser = RegexpParser(grammar); chunks = chunk_parser.parse(tagged_tokens)
● Custom chunking (NLTK): from nltk import RegexpParser; grammar = r'CUSTOM: {<JJ>+<NN>}'; chunk_parser = RegexpParser(grammar); chunks = chunk_parser.parse(tagged_tokens)
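
A complete noun-phrase chunking run with NLTK, assuming the tokenizer and tagger data are available; the sentence and grammar are illustrative:

    import nltk
    from nltk import pos_tag, word_tokenize, RegexpParser

    nltk.download('punkt', quiet=True)
    nltk.download('averaged_perceptron_tagger', quiet=True)

    tagged = pos_tag(word_tokenize("The little yellow dog barked at the cat"))
    parser = RegexpParser(r'NP: {<DT>?<JJ>*<NN>+}')  # optional determiner, adjectives, then nouns
    tree = parser.parse(tagged)
    print(tree)  # NPs appear as subtrees, e.g. (NP The/DT little/JJ yellow/JJ dog/NN)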

Word Embeddings

● Load pre-trained word embeddings (Word2Vec): from gensim.models import KeyedVectors; model = KeyedVectors.load_word2vec_format('path/to/embeddings.bin', binary=True)
● Get word vector: vector = model['word']
● Find similar words: similar_words = model.most_similar('word')
● Find analogies: analogies = model.most_similar(positive=['king', 'woman'], negative=['man'])

● Compute word similarity: similarity = model.similarity('word1', 'word2')
● Compute sentence similarity: import numpy as np; from scipy.spatial.distance import cosine; sentence1_vector = np.mean([model[word] for word in sentence1.split()], axis=0); sentence2_vector = np.mean([model[word] for word in sentence2.split()], axis=0); similarity = 1 - cosine(sentence1_vector, sentence2_vector)
● Train custom Word2Vec embeddings: from gensim.models import Word2Vec; sentences = [['word1', 'word2', 'word3'], ['word4', 'word5', 'word6']]; model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)  # use size=100 in gensim < 4
● Load pre-trained GloVe embeddings: from gensim.models import KeyedVectors; from gensim.scripts.glove2word2vec import glove2word2vec; glove_input_file = 'path/to/glove.txt'; word2vec_output_file = 'path/to/glove.word2vec'; glove2word2vec(glove_input_file, word2vec_output_file); model = KeyedVectors.load_word2vec_format(word2vec_output_file, binary=False)
● Load pre-trained FastText embeddings: from gensim.models.fasttext import load_facebook_vectors; model = load_facebook_vectors('path/to/fasttext.bin')
● Use spaCy's pre-trained word embeddings: import spacy; nlp = spacy.load('en_core_web_lg'); doc = nlp(text); word_vectors = [token.vector for token in doc]
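
Training a tiny Word2Vec model end to end (gensim >= 4 parameter names); the toy corpus is illustrative:

    from gensim.models import Word2Vec

    corpus = [
        ["natural", "language", "processing", "is", "fun"],
        ["word", "embeddings", "capture", "word", "meaning"],
        ["language", "models", "learn", "from", "text"],
    ]
    model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, workers=1, epochs=50)

    print(model.wv["language"].shape)         # (50,) dense vector for one word
    print(model.wv.most_similar("language"))  # nearest neighbours in the toy vector space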

Sentiment Analysis

● TextBlob sentiment analysis: from textblob import TextBlob; blob = TextBlob(text); sentiment = blob.sentiment.polarity
● VADER sentiment analysis: from nltk.sentiment.vader import SentimentIntensityAnalyzer; analyzer = SentimentIntensityAnalyzer(); sentiment = analyzer.polarity_scores(text)
● Flair sentiment analysis: from flair.models import TextClassifier; from flair.data import Sentence; classifier = TextClassifier.load('en-sentiment'); sentence = Sentence(text); classifier.predict(sentence); sentiment = sentence.labels[0].value
● Transformers sentiment analysis (pipeline): from transformers import pipeline; classifier = pipeline('sentiment-analysis'); sentiment = classifier(text)[0]['label']  # the default checkpoint is a DistilBERT model fine-tuned on SST-2
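
A runnable VADER example, assuming the 'vader_lexicon' data can be downloaded; the sentences are illustrative:

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download('vader_lexicon', quiet=True)

    analyzer = SentimentIntensityAnalyzer()
    for sentence in ["I love this movie!", "This was a terrible waste of time."]:
        scores = analyzer.polarity_scores(sentence)
        print(sentence, scores['compound'])  # compound > 0 leans positive, < 0 leans negative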

Text Classification

● Naive Bayes classifier (NLTK): from nltk.classify import NaiveBayesClassifier; train_data = [(dict.fromkeys(text1.split(), True), 'class1'), (dict.fromkeys(text2.split(), True), 'class2'), ...]; classifier = NaiveBayesClassifier.train(train_data); predicted_class = classifier.classify(dict.fromkeys(text.split(), True))  # NLTK classifiers take feature dicts, not raw strings
● Naive Bayes classifier (scikit-learn): from sklearn.feature_extraction.text import CountVectorizer; from sklearn.naive_bayes import MultinomialNB; vectorizer = CountVectorizer(); X = vectorizer.fit_transform(texts); y = labels; classifier = MultinomialNB(); classifier.fit(X, y); predicted_class = classifier.predict(vectorizer.transform([text]))[0]
● Support Vector Machine (SVM) classifier (scikit-learn): from sklearn.feature_extraction.text import TfidfVectorizer; from sklearn.svm import LinearSVC; vectorizer = TfidfVectorizer(); X = vectorizer.fit_transform(texts); y = labels; classifier = LinearSVC(); classifier.fit(X, y); predicted_class = classifier.predict(vectorizer.transform([text]))[0]
● Logistic Regression classifier (scikit-learn): from sklearn.feature_extraction.text import CountVectorizer; from sklearn.linear_model import LogisticRegression; vectorizer = CountVectorizer(); X = vectorizer.fit_transform(texts); y = labels; classifier = LogisticRegression(); classifier.fit(X, y); predicted_class = classifier.predict(vectorizer.transform([text]))[0]
● Random Forest classifier (scikit-learn): from sklearn.feature_extraction.text import TfidfVectorizer; from sklearn.ensemble import RandomForestClassifier; vectorizer = TfidfVectorizer(); X = vectorizer.fit_transform(texts); y = labels; classifier = RandomForestClassifier(); classifier.fit(X, y); predicted_class = classifier.predict(vectorizer.transform([text]))[0]
● FastText classifier: from fasttext import train_supervised; train_data = "train.txt"; model = train_supervised(input=train_data, lr=1.0, epoch=25, wordNgrams=2); predicted_class = model.predict(text)[0][0]
● BERT classifier (Transformers): from transformers import pipeline; classifier = pipeline('text-classification', model='bert-base-uncased'); predicted_class = classifier(text)[0]['label']  # bert-base-uncased has no fine-tuned classification head; swap in a checkpoint fine-tuned for your labels
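
A complete scikit-learn workflow on a made-up toy dataset, using a Pipeline so the same vectorizer is applied at prediction time:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["great product, loved it", "awful service, never again",
             "really happy with the quality", "broken on arrival, very poor"]
    labels = ["pos", "neg", "pos", "neg"]

    # Bundle vectorizer + classifier so raw strings go in at both fit and predict time
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["terrible quality"]))  # expected to lean 'neg' on this toy data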

Topic Modeling

● Latent Dirichlet Allocation (LDA) (gensim): from gensim import corpora, models; dictionary = corpora.Dictionary(texts); corpus = [dictionary.doc2bow(text) for text in texts]; lda_model = models.LdaMulticore(corpus, num_topics=10, id2word=dictionary, passes=10)  # texts is a list of token lists

● Non-Negative Matrix Factorization (NMF) (scikit-learn): from sklearn.feature_extraction.text import TfidfVectorizer; from sklearn.decomposition import NMF; vectorizer = TfidfVectorizer(); X = vectorizer.fit_transform(texts); nmf_model = NMF(n_components=10, random_state=1); nmf_model.fit(X); feature_names = vectorizer.get_feature_names_out(); topic_words = [[feature_names[i] for i in topic.argsort()[::-1][:10]] for topic in nmf_model.components_]  # top 10 words per topic (get_feature_names() in older scikit-learn)
● Hierarchical Dirichlet Process (HDP) (gensim): from gensim import corpora, models; dictionary = corpora.Dictionary(texts); corpus = [dictionary.doc2bow(text) for text in texts]; hdp_model = models.HdpModel(corpus, dictionary)
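
A minimal LDA run on a toy tokenized corpus; the documents and topic count are illustrative:

    from gensim import corpora, models

    docs = [
        ["cat", "dog", "pet", "animal"],
        ["dog", "puppy", "pet", "training"],
        ["python", "code", "programming", "software"],
        ["software", "bug", "code", "testing"],
    ]
    dictionary = corpora.Dictionary(docs)               # token -> id mapping
    corpus = [dictionary.doc2bow(doc) for doc in docs]  # bag-of-words per document
    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20, random_state=42)
    for topic_id, words in lda.print_topics(num_words=4):
        print(topic_id, words)  # each topic is a weighted mix of corpus words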

Text Summarization

● TextRank summarization (gensim): from gensim.summarization import summarize; summary = summarize(text, ratio=0.2)  # gensim < 4 only; the summarization module was removed in gensim 4
● LexRank summarization (sumy): from sumy.parsers.plaintext import PlaintextParser; from sumy.nlp.tokenizers import Tokenizer; from sumy.summarizers.lex_rank import LexRankSummarizer; parser = PlaintextParser.from_string(text, Tokenizer('english')); summarizer = LexRankSummarizer(); summary = summarizer(parser.document, sentences_count=3)
● Luhn summarization (sumy): from sumy.parsers.plaintext import PlaintextParser; from sumy.nlp.tokenizers import Tokenizer; from sumy.summarizers.luhn import LuhnSummarizer; parser = PlaintextParser.from_string(text, Tokenizer('english')); summarizer = LuhnSummarizer(); summary = summarizer(parser.document, sentences_count=3)
● LSA summarization (sumy): from sumy.parsers.plaintext import PlaintextParser; from sumy.nlp.tokenizers import Tokenizer; from sumy.summarizers.lsa import LsaSummarizer; parser = PlaintextParser.from_string(text, Tokenizer('english')); summarizer = LsaSummarizer(); summary = summarizer(parser.document, sentences_count=3)
● BART summarization (Transformers): from transformers import pipeline; summarizer = pipeline("summarization", model="facebook/bart-large-cnn"); summary = summarizer(text, max_length=100, min_length=30, do_sample=False)
● T5 summarization (Transformers): from transformers import pipeline; summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf"); summary = summarizer(text, max_length=100, min_length=30, do_sample=False)

Language Translation

● Google Translate (googletrans, unofficial API client): from googletrans import Translator; translator = Translator(); translated_text = translator.translate(text, dest='fr').text
● Transformers translation (MarianMT): from transformers import pipeline; translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr"); translated_text = translator(text)[0]['translation_text']
● Transformers translation (T5): from transformers import pipeline; translator = pipeline("translation_en_to_de", model="t5-base", tokenizer="t5-base", framework="tf"); translated_text = translator(text)[0]['translation_text']

Text Generation

● GPT-2 text generation: from transformers import pipeline; generator = pipeline('text-generation', model='gpt2'); generated_text = generator(text, max_length=100, num_return_sequences=1)[0]['generated_text']
● XLNet text generation: from transformers import pipeline; generator = pipeline('text-generation', model='xlnet-base-cased'); generated_text = generator(text, max_length=100, num_return_sequences=1)[0]['generated_text']
● CTRL text generation: from transformers import pipeline; generator = pipeline('text-generation', model='ctrl'); generated_text = generator(text, max_length=100, num_return_sequences=1)[0]['generated_text']

Coreference Resolution

● Neural coreference resolution (neuralcoref + spaCy): import spacy; import neuralcoref; nlp = spacy.load('en_core_web_sm'); neuralcoref.add_to_pipe(nlp); doc = nlp(text)
  for cluster in doc._.coref_clusters: print(cluster.main, cluster.mentions)
  (neuralcoref is a neural coreference extension for the spaCy 2.x pipeline; a plain en_core_web_sm model does not expose doc._.coref_clusters on its own)

Keyword Extraction

● TF-IDF keyword extraction (scikit-learn): from sklearn.feature_extraction.text import TfidfVectorizer; vectorizer = TfidfVectorizer(); X = vectorizer.fit_transform(texts); feature_names = vectorizer.get_feature_names_out(); keywords = [feature_names[i] for i in X.toarray()[0].argsort()[::-1][:5]]  # top 5 terms of the first document (get_feature_names() in older scikit-learn)
● RAKE keyword extraction (RAKE-NLTK): from rake_nltk import Rake; rake = Rake(); rake.extract_keywords_from_text(text); keywords = rake.get_ranked_phrases()[:5]
● TextRank keyword extraction (gensim): from gensim.summarization import keywords; top_keywords = keywords(text).split('\n')[:5]  # gensim < 4 only
● YAKE keyword extraction (yake): import yake; kw_extractor = yake.KeywordExtractor(); keywords = kw_extractor.extract_keywords(text)

Named Entity Linking

● Wikifier entity linking: import requests; url = "https://www.wikifier.org/annotate-article"; params = {"text": text, "lang": "en", "userKey": "YOUR_API_KEY"}; response = requests.get(url, params=params); entities = response.json()
● DBpedia Spotlight entity linking: import requests; url = "https://api.dbpedia-spotlight.org/en/annotate"; headers = {"Accept": "application/json"}; params = {"text": text}; response = requests.get(url, headers=headers, params=params); entities = response.json()

Text Similarity

● Cosine similarity (scikit-learn): from sklearn.feature_extraction.text import TfidfVectorizer; from sklearn.metrics.pairwise import cosine_similarity; vectorizer = TfidfVectorizer(); X = vectorizer.fit_transform(texts); similarity_matrix = cosine_similarity(X)
● Jaccard similarity (scikit-learn): from sklearn.feature_extraction.text import CountVectorizer; from sklearn.metrics import jaccard_score; vectorizer = CountVectorizer(binary=True); X = vectorizer.fit_transform(texts).toarray(); jaccard_scores = [jaccard_score(X[0], X[i]) for i in range(1, len(texts))]  # jaccard_score expects binary indicator vectors, hence binary=True and dense arrays
● Levenshtein distance (NLTK): from nltk import edit_distance; distance = edit_distance(text1, text2)  # character-level edit distance between the two strings

● Semantic similarity (spaCy): import spacy; nlp = spacy.load('en_core_web_lg'); doc1 = nlp(text1); doc2 = nlp(text2); similarity = doc1.similarity(doc2)
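
A worked cosine-similarity example on three toy sentences:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    texts = [
        "the cat sat on the mat",
        "a cat was sitting on a mat",
        "stock prices fell sharply today",
    ]
    X = TfidfVectorizer().fit_transform(texts)
    sim = cosine_similarity(X)
    print(sim.round(2))  # the two cat sentences score higher with each other than with the finance one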

Sequence Labeling

● Part-of-Speech (POS) tagging (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); pos_tags = [(token.text, token.pos_) for token in doc]
● Named Entity Recognition (NER) (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); entities = [(ent.text, ent.label_) for ent in doc.ents]
● Chunking (spaCy): import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); chunks = [(chunk.text, chunk.label_) for chunk in doc.noun_chunks]
● Semantic Role Labeling (SRL) (AllenNLP): from allennlp.predictors.predictor import Predictor; predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz"); srl = predictor.predict(sentence=text)

Language Identification

● langdetect: from langdetect import detect; language = detect(text)
● langid: import langid; language, confidence = langid.classify(text)
● fastText language identification: import fasttext; model = fasttext.load_model('lid.176.bin'); language = model.predict(text)[0][0][-2:]

Text Preprocessing (Advanced)

● Spell correction (pyspellchecker): from spellchecker import SpellChecker; spell = SpellChecker(); corrected_text = ' '.join([spell.correction(word) for word in text.split()])
● Text normalization (unidecode): from unidecode import unidecode; normalized_text = unidecode(text)
● Text standardization (ftfy): from ftfy import fix_text; standardized_text = fix_text(text)
● Emoji handling (emoji): import emoji; text_without_emoji = emoji.replace_emoji(text, replace='')  # emoji >= 2.0; older versions used emoji.get_emoji_regexp().sub('', text)

● Hashtag handling (regex): import re; text_without_hashtags = re.sub(r'#\w+', '', text)
● Mention handling (regex): import re; text_without_mentions = re.sub(r'@\w+', '', text)
● URL handling (urllib): from urllib.parse import urlparse
  def is_url(text):
      try:
          result = urlparse(text)
          return all([result.scheme, result.netloc])
      except:
          return False
  text_without_urls = ' '.join([word for word in text.split() if not is_url(word)])
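
The regex steps above can be combined into a single pass. A small sketch; the clean_tweet name and the sample string are illustrative:

    import re

    def clean_tweet(text):
        """Strip URLs, @mentions, #hashtags, and collapse leftover whitespace."""
        text = re.sub(r'http\S+', '', text)  # URLs
        text = re.sub(r'@\w+', '', text)     # mentions
        text = re.sub(r'#\w+', '', text)     # hashtags
        return ' '.join(text.split())        # extra whitespace

    print(clean_tweet("Loving #NLP! Thanks @spacy_io, details at https://example.com"))
    # -> "Loving ! Thanks , details at"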

By: Waleed Mousa
