Nlp2.ipynb - Colab
import nltk
from sklearn.feature_extraction.text import CountVectorizer
nltk.download('punkt')
# Sample data
corpus = [
    "SPPU is the one of the best university in India.",
    "India has already allowded so many new universities.",
    "AICTE is main authority in technical education.",
    "UGC and AICTE allowded technical education in india?",
]
Feature Names (BoW): ['aicte' 'allowded' 'already' 'and' 'authority' 'best' 'education' 'has'
'in' 'india' 'is' 'main' 'many' 'new' 'of' 'one' 'so' 'sppu' 'technical'
'the' 'ugc' 'universities' 'university']
BoW Matrix:
[[0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 2 0 0 1]
[0 1 1 0 0 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0 0 1 0]
[1 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0]
[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0]]
Feature Names (TF-IDF): ['aicte' 'allowded' 'already' 'and' 'authority' 'best' 'education' 'has'
'in' 'india' 'is' 'main' 'many' 'new' 'of' 'one' 'so' 'sppu' 'technical'
'the' 'ugc' 'universities' 'university']
TF-IDF Matrix:
[[0. 0. 0. 0. 0. 0.30954541
0. 0. 0.19757882 0.19757882 0.24404915 0.
0. 0. 0.30954541 0.30954541 0. 0.30954541
0. 0.61909081 0. 0. 0.30954541]
[0. 0.29737611 0.37718389 0. 0. 0.
0. 0.37718389 0. 0.24075159 0. 0.
0.37718389 0.37718389 0. 0. 0.37718389 0.
0. 0. 0. 0.37718389 0. ]
[0.35639424 0. 0. 0. 0.4520409 0.
0.35639424 0. 0.28853185 0. 0.35639424 0.4520409
0. 0. 0. 0. 0. 0.
0.35639424 0. 0. 0. 0. ]
[0.34242558 0.34242558 0. 0.43432343 0. 0.
0.34242558 0. 0.27722302 0.27722302 0. 0.
0. 0. 0. 0. 0. 0.
0.34242558 0. 0.43432343 0. 0. ]]
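To see where these numbers come from, the first row can be recomputed by hand. The sketch below assumes the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalisation of each row, which is scikit-learn's documented default. The regular expression is an approximation of its default tokenizer, and all variable names are illustrative.

```python
import math
import re
from collections import Counter

corpus = [
    "SPPU is the one of the best university in India.",
    "India has already allowded so many new universities.",
    "AICTE is main authority in technical education.",
    "UGC and AICTE allowded technical education in india?",
]

# Lowercase and keep tokens of two or more word characters.
docs = [re.findall(r"\b\w\w+\b", text.lower()) for text in corpus]
vocab = sorted({tok for doc in docs for tok in doc})
n_docs = len(docs)

# Document frequency: number of documents that contain each term.
df = Counter(tok for doc in docs for tok in set(doc))

# Smoothed inverse document frequency.
idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}

def tfidf_row(doc):
    """Return the L2-normalised TF-IDF weights of one tokenised document."""
    counts = Counter(doc)
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

rows = [tfidf_row(doc) for doc in docs]
first = dict(zip(vocab, rows[0]))
print(round(first["the"], 4))    # 0.6191
print(round(first["india"], 4))  # 0.1976
```

In the first sentence "the" occurs twice and appears in no other document, so it gets the largest weight, while "india" appears in three of the four documents and is weighted down.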
4/19/24, 4:06 PM nlp2.ipynb - Colab