
# NLP (Natural Language Processing) Cheat Sheet - Part 7

---

## 1. **Natural Language Processing Pipeline**

1. **Typical NLP pipeline** (see the sketch after this list)

- **Data Collection**: gather text data (articles, tweets, emails, etc.).
- **Preprocessing**:
  - Cleaning (removing special characters, normalization).
  - Tokenization (splitting into words or sentences).
  - Stopword removal.
- **Feature Extraction**:
  - TF-IDF, Word2Vec, or embeddings.
- **Modeling**:
  - Supervised models (classifiers).
  - Unsupervised models (clustering).
- **Evaluation**:
  - Metrics such as precision, recall, or F1-score.
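
To tie these stages together, here is a minimal end-to-end sketch, assuming scikit-learn and an invented toy dataset (the example sentences, labels, and the `clean` helper are illustrative, not from the original): cleaning, TF-IDF feature extraction with stopword removal, a supervised classifier, and evaluation with precision, recall, and F1-score.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Data collection step: a tiny made-up corpus with invented labels
texts = [
    "I love this phone, the battery lasts forever!",
    "Terrible service, I will never come back.",
    "Great movie, wonderful acting.",
    "The product broke after two days, very disappointing.",
    "Fantastic experience, highly recommended.",
    "Awful quality and a waste of money.",
]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

# Preprocessing: strip URLs and punctuation, lowercase
def clean(text):
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    text = re.sub(r"[^\w\s]", "", text)
    return text.lower()

texts = [clean(t) for t in texts]

# Feature extraction (TF-IDF with stopword removal) + supervised model
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(),
)

# Evaluation on a held-out split: precision, recall, F1-score per class
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```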

---

## 2. **Text Cleaning and Preprocessing**

1. **Text cleaning with regex**

```python
import re

text = "Hello!!! NLP is amazing... Visit https://example.com"

cleaned_text = re.sub(r'https?://\S+|www\.\S+', '', text)  # Remove URLs
cleaned_text = re.sub(r'[^\w\s]', '', cleaned_text)  # Remove punctuation
print(cleaned_text)
```

2. **Tokenization with NLTK**

```python
from nltk.tokenize import word_tokenize

# nltk.download('punkt')  # uncomment on first run to fetch the tokenizer data

text = "Natural Language Processing enables machines to understand human language."
tokens = word_tokenize(text)
print(tokens)
```

3. **Stopword removal**

```python
from nltk.corpus import stopwords

# nltk.download('stopwords')  # uncomment on first run to fetch the stopword list

stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)
```

---

## 3. **Text Vectorization**

1. **TF-IDF with scikit-learn**


```python
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["I love NLP.", "NLP is amazing.", "I enjoy learning NLP."]
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)
print(vectorizer.get_feature_names_out())
print(tfidf_matrix.toarray())
```

2. **Bag of Words (CountVectorizer)**


```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
bow_matrix = vectorizer.fit_transform(documents)
print(vectorizer.get_feature_names_out())
print(bow_matrix.toarray())
```
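
The pipeline in section 1 also lists Word2Vec and embeddings as feature-extraction options, but no snippet is given for them. Below is a minimal sketch using gensim (an assumed library choice, not named in the original), trained on the same toy `documents` list.

```python
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

documents = ["I love NLP.", "NLP is amazing.", "I enjoy learning NLP."]

# Word2Vec expects tokenized sentences
sentences = [word_tokenize(doc.lower()) for doc in documents]

# Train a tiny Word2Vec model (parameters kept small for the toy corpus)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=20)

print(model.wv["nlp"])               # dense vector for the word "nlp"
print(model.wv.most_similar("nlp"))  # nearest neighbours in the embedding space
```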

---

## 4. **Sentiment Analysis**

1. **Sentiment analysis with VADER**


```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
sentence = "I absolutely love this product. It's fantastic!"
sentiment_score = analyzer.polarity_scores(sentence)
print(sentiment_score)
```

2. **Sentiment analysis with TextBlob**

```python
from textblob import TextBlob

sentence = "This movie is great, but the ending was disappointing."
blob = TextBlob(sentence)
print(blob.sentiment)
```

---

## 5. **Text Classification**

1. **Text classification with a Hugging Face pipeline**


```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
text = "I really enjoyed the movie. It was fantastic!"
result = classifier(text)
print(result)
```

2. **Supervised classification with scikit-learn**

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = ["I love NLP.", "I hate math.", "NLP is fun.", "Math is boring."]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(data, labels)

test_text = ["I enjoy studying NLP."]
print(model.predict(test_text))
```

---

## 6. **Named Entity Recognition (NER)**

1. **Entity extraction with spaCy**

```python
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Apple is looking to buy a startup in the UK for $1 billion."
doc = nlp(text)

for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")
```

2. **NER with Hugging Face**

```python
from transformers import pipeline

ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
text = "Barack Obama was born in Hawaii and became the President of the USA."
entities = ner_pipeline(text)
print(entities)
```

---

## 7. **Machine Translation**

1. **Translation with MarianMT (Hugging Face)**

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Machine learning is the future of technology."
input_ids = tokenizer.encode(text, return_tensors="pt")  # tokenize the source sentence
result = model.generate(input_ids)  # generate the French translation
translated_text = tokenizer.decode(result[0], skip_special_tokens=True)
print(translated_text)
```

2. **Translation with the Google Translate API (googletrans)**

```bash
pip install googletrans==4.0.0-rc1
```

```python
from googletrans import Translator

translator = Translator()
text = "Natural Language Processing is amazing."
translation = translator.translate(text, src="en", dest="fr")
print(translation.text)
```

---

## 8. **Text Summarization**

1. **Summarization with BART**

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = """Natural Language Processing (NLP) is a fascinating field of artificial intelligence.
It focuses on enabling machines to understand, interpret, and respond to human language."""
summary = summarizer(text, max_length=50, min_length=20, do_sample=False)
print(summary[0]['summary_text'])
```

2. **Summarization with T5**

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

text = "Machine learning and artificial intelligence are transforming industries worldwide."
input_text = f"summarize: {text}"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 9. **Language Models and Text Generation**

1. **Text generation with GPT-2**

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator("Natural Language Processing is", max_length=50, num_return_sequences=1)
print(result[0]["generated_text"])
```

2. **Text generation with GPT-3 (OpenAI API)**

```python
import openai

# Uses the legacy Completions API (openai<1.0); text-davinci-003 has since been deprecated
openai.api_key = "YOUR_API_KEY"

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="Explain the importance of NLP in modern technology.",
    max_tokens=100
)
print(response.choices[0].text.strip())
```

---

## 10. **Real-World NLP Applications**

1. **Recommendation Systems**: filter and recommend content based on user preferences.
2. **Chatbots**: virtual assistants that answer customer queries.
3. **Information Retrieval**: extract specific information from a data corpus.
4. **Spam Detection**: identify spam in emails or comments.
5. **Real-Time Translation**: translate conversations across different languages.
6. **Social Media Analysis**: analyze opinions on Twitter or Facebook to understand trends.
