nlp2.ipynb - Colab

import nltk
from sklearn.feature_extraction.text import CountVectorizer
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
True

# Sample data (spelling errors such as "allowded" are left as-is;
# they show up verbatim in the learned vocabulary below)
corpus = [
    "SPPU is the one of the best university in India.",
    "India has already allowded so many new universities.",
    "AICTE is main authority in technical education.",
    "UGC and AICTE allowded technical education in india?",
]

# Create the Bag of Words model
vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(corpus)

# Get feature names and transformed data
feature_names_bow = vectorizer.get_feature_names_out()
bow_matrix = X_bow.toarray()

# Print feature names and BoW matrix
print("Feature Names (BoW):", feature_names_bow)
print("BoW Matrix:\n", bow_matrix)

Feature Names (BoW): ['aicte' 'allowded' 'already' 'and' 'authority' 'best' 'education' 'has'
'in' 'india' 'is' 'main' 'many' 'new' 'of' 'one' 'so' 'sppu' 'technical'
'the' 'ugc' 'universities' 'university']
BoW Matrix:
[[0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 2 0 0 1]
[0 1 1 0 0 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0 0 1 0]
[1 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0]
[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0]]
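Each row of the BoW matrix is one document and each column counts one vocabulary term; for example, "the" occurs twice in the first sentence, which produces the 2 in the first row. A fitted vectorizer can also encode unseen text against the same vocabulary; a minimal sketch (the new sentence is illustrative, not part of the notebook):

# Encode a new sentence with the vocabulary learned above;
# out-of-vocabulary tokens (e.g. "naac") are silently dropped.
new_doc = ["NAAC and UGC are in India."]
print(vectorizer.transform(new_doc).toarray())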

!pip install scikit-learn gensim nltk
from sklearn.feature_extraction.text import TfidfVectorizer

Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)
Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (4.3.2)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.25.2)
Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.11.4)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.4.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (3.4.0)
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.10/dist-packages (from gensim) (6.4.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk) (8.1.7)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk) (4.66.2)

# Create the TF-IDF model
vectorizer_tfidf = TfidfVectorizer()
X_tfidf = vectorizer_tfidf.fit_transform(corpus)

# Get feature names and transformed data
feature_names_tfidf = vectorizer_tfidf.get_feature_names_out()
tfidf_matrix = X_tfidf.toarray()

# Print feature names and TF-IDF matrix
print("Feature Names (TF-IDF):", feature_names_tfidf)
print("TF-IDF Matrix:\n", tfidf_matrix)

Feature Names (TF-IDF): ['aicte' 'allowded' 'already' 'and' 'authority' 'best' 'education' 'has'
'in' 'india' 'is' 'main' 'many' 'new' 'of' 'one' 'so' 'sppu' 'technical'
'the' 'ugc' 'universities' 'university']
TF-IDF Matrix:
[[0.         0.         0.         0.         0.         0.30954541
  0.         0.         0.19757882 0.19757882 0.24404915 0.
  0.         0.         0.30954541 0.30954541 0.         0.30954541
  0.         0.61909081 0.         0.         0.30954541]
 [0.         0.29737611 0.37718389 0.         0.         0.
  0.         0.37718389 0.         0.24075159 0.         0.
  0.37718389 0.37718389 0.         0.         0.37718389 0.
  0.         0.         0.         0.37718389 0.        ]
 [0.35639424 0.         0.         0.         0.4520409  0.
  0.35639424 0.         0.28853185 0.         0.35639424 0.4520409
  0.         0.         0.         0.         0.         0.
  0.35639424 0.         0.         0.         0.        ]
 [0.34242558 0.34242558 0.         0.43432343 0.         0.
  0.34242558 0.         0.27722302 0.27722302 0.         0.
  0.         0.         0.         0.         0.         0.
  0.34242558 0.         0.43432343 0.         0.        ]]
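TfidfVectorizer L2-normalizes each document row by default, so the dot product of two rows is already their cosine similarity. A minimal sketch of comparing the four documents (the cosine_similarity import is an addition, not in the original notebook):

from sklearn.metrics.pairwise import cosine_similarity

# Pairwise document similarity; since rows are unit-length,
# this equals X_tfidf @ X_tfidf.T
print(cosine_similarity(X_tfidf).round(2))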

from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

# Tokenize the documents
tokenized_corpus = [word_tokenize(doc.lower()) for doc in corpus]
# Train the Word2Vec model
model_w2v = Word2Vec(sentences=tokenized_corpus, vector_size=100, window=5, min_count=1, workers=4)
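The trained model's wv attribute maps every token to a 100-dimensional vector and supports similarity queries. A quick check (neighbours will vary from run to run, since the corpus is tiny and the weights are randomly initialized):

# Nearest neighbours of 'india' in the learned embedding space
print(model_w2v.wv.most_similar('india', topn=3))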

# Get the Word2Vec embeddings for each word
embeddings_w2v = [model_w2v.wv[word] for doc in tokenized_corpus for word in doc]

print("Word2Vec Embeddings (Example):", embeddings_w2v[:5])

Word2Vec Embeddings (Example): [array([-4.9724146e-03, -1.2821439e-03,  3.2808294e-03, ...],
dtype=float32), array([-0.0071398 ,  0.00124439, -0.00717616, ...], dtype=float32), ...]
(five 100-dimensional float32 vectors; full printout truncated)
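A flat list of per-word vectors like this is rarely used directly; a common next step is to average each document's word vectors into a single fixed-length document embedding. A minimal sketch (the helper name doc_vector is ours, not from the notebook):

import numpy as np

def doc_vector(tokens, model):
    # Mean of the word vectors; min_count=1 above guarantees
    # every corpus token has an embedding
    return np.mean([model.wv[w] for w in tokens], axis=0)

doc_embeddings = np.array([doc_vector(doc, model_w2v) for doc in tokenized_corpus])
print(doc_embeddings.shape)  # (4, 100)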
