Assign 9-20U00323 Sec C.ipynb - Colaboratory
Question 1
import re
import string
import nltk
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')
text = """GDP in developing countries suchs as Pakistan will contine growing at a high rate"""
doc = nlp(text)
https://colab.research.google.com/drive/1aMqzCNkPQKpLKbQRUwPPVlhb_ySpjbav?authuser=0#scrollTo=h9NnrrTj_li6&printMode=true 1/5
9/28/23, 1:54 PM Assign 9-20U00323 Sec C.ipynb - Colaboratory
contine --> ROOT --> VERB
growing --> xcomp --> VERB
at --> prep --> ADP
a --> det --> DET
high --> amod --> ADJ
rate --> pobj --> NOUN
displacy.render(doc,style='dep',jupyter=True)
[Figure: displaCy dependency-parse diagram of the sentence; surviving arc labels: pobj, pobj, prep, amod]
# Print each named entity with its label
for i in doc1.ents:
    print(i.text + '===' + i.label_)
Apple===ORG
$3 trillion===MONEY
Monday===DATE
under four years===DATE
Apple===ORG
182.86===MONEY
Apple===ORG
Apple U.S.===ORG
annual===DATE
quarter===DATE
29 percent===PERCENT
year-over-year===DATE
Apple===ORG
25.6===CARDINAL
year-over-year===DATE
more than $18 billion===MONEY
the quarter===DATE
import pandas as pd
import seaborn as sns
import numpy as np
from nltk.tokenize import sent_tokenize
df323.shape
(5171, 4)
graph = sns.countplot(x="label",data=df323)
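from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import CountVectorizer
# Keep only alphanumeric tokens, then build unigram counts without English stop words
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 1), tokenizer=token.tokenize)
text_counts = cv.fit_transform(df323['text'])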
text_counts
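The count-vectorization step above can be illustrated without scikit-learn. The following is a minimal sketch, not the library's implementation: it reuses the same `[a-zA-Z0-9]+` token pattern, but the stop-word set, the helper names `tokenize` and `count_matrix`, and the two sample messages are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-zA-Z0-9]+")   # same pattern as the RegexpTokenizer above
STOP_WORDS = {"the", "a", "is", "to", "and"}  # tiny illustrative list, not scikit-learn's

def tokenize(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    return [t for t in TOKEN_PATTERN.findall(text.lower()) if t not in STOP_WORDS]

def count_matrix(docs):
    """Return (sorted vocabulary, one row of term counts per document)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows

# Hypothetical sample messages, not taken from df323
sample_docs = ["Win a FREE prize, win now!", "Meeting moved to 3pm."]
vocab, rows = count_matrix(sample_docs)
print(vocab)  # ['3pm', 'free', 'meeting', 'moved', 'now', 'prize', 'win']
print(rows)   # [[0, 1, 0, 0, 1, 1, 2], [1, 0, 1, 1, 0, 0, 0]]
```

Each row is one document and each column one vocabulary term, which is the same layout as the sparse matrix returned by `fit_transform`, only stored densely here.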
Collecting metrics
Downloading metrics-0.3.3.tar.gz (18 kB)
Preparing metadata (setup.py) ... done
Collecting Pygments==2.2.0 (from metrics)
Downloading Pygments-2.2.0-py2.py3-none-any.whl (841 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 841.7/841.7 kB 13.0 MB/s eta 0:00:00
Collecting pathspec==0.5.5 (from metrics)
Downloading pathspec-0.5.5.tar.gz (21 kB)
Preparing metadata (setup.py) ... done
Collecting pathlib2>=2.3.0 (from metrics)
Downloading pathlib2-2.3.7.post1-py2.py3-none-any.whl (18 kB)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from pat
Building wheels for collected packages: metrics, pathspec
Building wheel for metrics (setup.py) ... done
Created wheel for metrics: filename=metrics-0.3.3-py2.py3-none-any.whl size=17795 sha
Stored in directory: /root/.cache/pip/wheels/29/7a/e7/1175d9ff10607b8f02aa37c32392cb2
Building wheel for pathspec (setup.py) ... done
Created wheel for pathspec: filename=pathspec-0.5.5-py3-none-any.whl size=24219 sha25
Stored in directory: /root/.cache/pip/wheels/28/d4/11/01ccd690e97b06874998aa554a8b261
Successfully built metrics pathspec
Installing collected packages: Pygments, pathspec, pathlib2, metrics
Attempting uninstall: Pygments
Found existing installation: Pygments 2.16.1
Uninstalling Pygments-2.16.1:
Successfully uninstalled Pygments-2.16.1
ERROR: pip's dependency resolver does not currently take into account all the packages
ipython 7.34.0 requires jedi>=0.16, which is not installed.
nbconvert 6.5.4 requires pygments>=2.4.1, but you have pygments 2.2.0 which is incompat
rich 13.5.2 requires pygments<3.0.0,>=2.13.0, but you have pygments 2.2.0 which is inco
Successfully installed Pygments-2.2.0 metrics-0.3.3 pathlib2-2.3.7.post1 pathspec-0.5.5
WARNING: The following packages were previously imported in this runtime:
[pygments]
You must restart the runtime in order to use newly installed versions.
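from sklearn.tree import DecisionTreeClassifier
# X_train, X_test, y_train, y_test come from a train/test split that is not shown in this export
clf = DecisionTreeClassifier().fit(X_train, y_train)
predicted = clf.predict(X_test)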
from sklearn.metrics import accuracy_score, classification_report
print("Decision tree accuracy:", accuracy_score(y_test, predicted))
print(classification_report(y_test,predicted))
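To make the numbers reported by `accuracy_score` and `classification_report` easier to read, here is a hand-rolled sketch of accuracy, precision, recall and F1 for a single class. The helper names and the "ham"/"spam" label values are assumptions made for illustration; the notebook's actual label values and results are not visible in this export.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels and predictions, not results from the notebook
y_true = ["ham", "ham", "spam", "spam", "ham", "spam"]
y_pred = ["ham", "spam", "spam", "ham", "ham", "spam"]

acc = accuracy(y_true, y_pred)
p, r, f = precision_recall_f1(y_true, y_pred, "spam")
print(f"accuracy={acc:.3f} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy example four of six predictions are correct, and the "spam" class has two true positives, one false positive and one false negative, so precision, recall and F1 all equal 2/3.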
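from sklearn.feature_extraction.text import TfidfVectorizer
# Re-vectorize the text with TF-IDF weights instead of raw counts
tfidfvectorizer = TfidfVectorizer()
tfidf_vectors = tfidfvectorizer.fit_transform(df323['text'])
# X_train, X_test, y_train, y_test: the split for this run is not shown in this export
clf = DecisionTreeClassifier().fit(X_train, y_train)
predicted = clf.predict(X_test)
print("Decision tree accuracy:", accuracy_score(y_test, predicted))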
print(classification_report(y_test,predicted))
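For comparison with the raw counts used earlier, the sketch below shows roughly what the TF-IDF weighting does to a count matrix. It assumes scikit-learn's default settings (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization of each row) and the default token pattern of two or more word characters. The function name `tfidf` and the sample messages are hypothetical.

```python
import math
import re
from collections import Counter

def tfidf(docs):
    """Return (sorted vocabulary, L2-normalized TF-IDF rows) for a list of strings."""
    tokenized = [re.findall(r"\b\w\w+\b", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency (assumed scikit-learn default)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0  # avoid dividing by zero
        rows.append([w / norm for w in raw])
    return vocab, rows

# Hypothetical sample messages, not taken from df323
tfidf_vocab, tfidf_rows = tfidf(["free prize now", "meeting now"])
print(tfidf_vocab)
for row in tfidf_rows:
    print([round(w, 3) for w in row])
```

A term that occurs in every document ("now" here) gets a lower weight than a term that occurs in only one, which is the main difference from the plain counts.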