
Assign 9-20U00323 Sec C.ipynb - Colaboratory

The document discusses Samana Tatheer's assignment on predictive analytics using the spaCy library in Python. It shows the code to import spaCy and load the model, perform part-of-speech tagging, dependency parsing, and named entity recognition on text. It then imports a dataset of ham and spam emails, explores the data, and imports machine learning libraries to classify the emails using a decision tree classifier.

Uploaded by

Samana Tatheer

9/28/23, 1:54 PM Assign 9-20U00323 Sec C.ipynb - Colaboratory

**Samana Tatheer, 20U00323, BSc IV Sec C**



Question 1

pip install spacy

Requirement already satisfied: spacy in /usr/local/lib/python3.10/dist-packages (3.6.1)


Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.0.4)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.0.9)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.0.7)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.0.8)
Requirement already satisfied: thinc<8.2.0,>=8.1.8 in /usr/local/lib/python3.10/dist-packages (from spacy) (8.1.12)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.1.2)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.4.7)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.0.9)
Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (0.9.0)
Requirement already satisfied: pathy>=0.10.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (0.10.2)
Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from spacy) (6.4.0)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (4.66.1)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.23.5)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.31.0)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.10.12)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.1.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy) (67.7.2)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (23.1)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.3.0)
Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2.0
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (202
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy) (0.7.10)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy) (0
Requirement already satisfied: click<9.0.0,>=7.1.1 in /usr/local/lib/python3.10/dist-packages (from typer<0.10.0,>=0.3.0->spacy) (8.1.7
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy) (2.1.3)

Q1. Import the spacy library

import re
import string
import nltk
import spacy
from spacy import displacy

Q2. Import 'en_core_web_sm'

nlp = spacy.load('en_core_web_sm')

Q3. Part-of-speech tagging

text = """GDP in developing countries suchs as Pakistan will contine growing at a high rate"""

doc= nlp(text)

for tok in doc:
    print(tok.text, "-->", tok.dep_, "-->", tok.pos_)

GDP --> nsubj --> PROPN
in --> prep --> ADP
developing --> amod --> VERB
countries --> pobj --> NOUN
suchs --> pobj --> NOUN
as --> mark --> SCONJ
Pakistan --> nsubj --> PROPN
will --> aux --> AUX
contine --> ROOT --> VERB
growing --> xcomp --> VERB
at --> prep --> ADP
a --> det --> DET
high --> amod --> ADJ
rate --> pobj --> NOUN
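
The dependency and part-of-speech labels printed above are abbreviations. A minimal sketch, assuming spaCy is installed, that looks up their descriptions with `spacy.explain`. The label lists are copied from the output above, and `describe` is a helper name introduced here for illustration only.

```python
import spacy

# Labels copied from the tagging output above.
dep_labels = ["nsubj", "prep", "amod", "pobj", "mark", "aux", "ROOT", "xcomp", "det"]
pos_labels = ["PROPN", "ADP", "VERB", "NOUN", "SCONJ", "AUX", "DET", "ADJ"]


def describe(labels):
    """Map each label to spaCy's glossary text (None if the label is unknown)."""
    return {label: spacy.explain(label) for label in labels}


dep_glossary = describe(dep_labels)
pos_glossary = describe(pos_labels)

for label, meaning in {**dep_glossary, **pos_glossary}.items():
    print(f"{label:>6} : {meaning}")
```

The input sentence contains the misspellings "suchs" and "contine", which are kept here as in the original run; correcting them would probably change the parse.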

Q4. Dependency Graph

displacy.render(doc,style='dep',jupyter=True)

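[Figure: displaCy dependency diagram of the sentence. The portion preserved here covers "GDP in developing countries" (PROPN ADP VERB NOUN) with arcs labelled prep, amod and pobj.]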

Q5. Named entity recognition

doc1=nlp("""Apple hit a market cap of $3 trillion during intraday trading on Monday,
tripling its valuation in under four years. Apple broke the barrier when its
share price hit $182.86. The milestone is mostly symbolic but it shows
investors remain bullish on Apple stock and its ability to grow. Apple U.S.
showed annual growth across all of its product categories in its fourth-
quarter earnings, with revenue up 29 percent year-over-year. While the
iPhone is still the biggest sales driver, Apple’s services business grew 25.6
percent year-over-year and delivered more than $18 billion in revenue
during the quarter. And analysts see plenty of room to run.""")

for i in doc1.ents:
    print(i.text+'==='+i.label_)

Apple===ORG
$3 trillion===MONEY
Monday===DATE
under four years===DATE
Apple===ORG
182.86===MONEY
Apple===ORG
Apple U.S.===ORG
annual===DATE
quarter===DATE
29 percent===PERCENT
year-over-year===DATE
Apple===ORG
25.6===CARDINAL
year-over-year===DATE
more than $18 billion===MONEY
the quarter===DATE
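
To summarise the entities listed above, the labels can be tallied. This is a small sketch in plain Python; the `entities` list is copied by hand from the printed output rather than read from `doc1`, so it runs without the model.

```python
from collections import Counter

# (text, label) pairs copied from the output above.
entities = [
    ("Apple", "ORG"), ("$3 trillion", "MONEY"), ("Monday", "DATE"),
    ("under four years", "DATE"), ("Apple", "ORG"), ("182.86", "MONEY"),
    ("Apple", "ORG"), ("Apple U.S.", "ORG"), ("annual", "DATE"),
    ("quarter", "DATE"), ("29 percent", "PERCENT"), ("year-over-year", "DATE"),
    ("Apple", "ORG"), ("25.6", "CARDINAL"), ("year-over-year", "DATE"),
    ("more than $18 billion", "MONEY"), ("the quarter", "DATE"),
]

# How often each entity type occurs.
label_counts = Counter(label for _, label in entities)

# Distinct surface forms tagged as organisations.
org_mentions = sorted({text for text, label in entities if label == "ORG"})

print(label_counts.most_common())
print(org_mentions)
```

Inside the notebook the same tally could be taken directly with `Counter(ent.label_ for ent in doc1.ents)`. Note that "25.6" is tagged CARDINAL rather than PERCENT, possibly because "percent" falls on the next line of the input string.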

Question 2: Predictive Analytics


i. Import required libraries

import pandas as pd
import seaborn as sns
import numpy as np
from nltk.tokenize import sent_tokenize

ii. Import data set

df323 = pd.read_csv('/content/Ham Spam dataset.csv')
df323

      Unnamed: 0  label  text                                               label_num
0     605         ham    Subject: enron methanol ; meter # : 988291\r\n...  0
1     2349        ham    Subject: hpl nom for january 9 , 2001\r\n( see...  0
2     3624        ham    Subject: neon retreat\r\nho ho ho , we ' re ar...  0
3     4685        spam   Subject: photoshop , windows , office . cheap ...  1
4     2030        ham    Subject: re : indian springs\r\nthis deal is t...  0
...   ...         ...    ...                                                ...
5166  1518        ham    Subject: put the 10 on the ft\r\nthe transport...  0
5167  404         ham    Subject: 3 / 4 / 2000 and following noms\r\nhp...  0
5168  2933        ham    Subject: calpine daily gas nomination\r\n>\r\n...  0
5169  1409        ham    Subject: industrial worksheets for august 2000...  0
5170  4807        spam   Subject: important online banking alert\r\ndea...  1

5171 rows × 4 columns

iii. Dataset shape:

df323.shape

(5171, 4)

iv. Bar chart of Ham and Spam emails:

graph = sns.countplot(x="label",data=df323)

v. Import machine learning libraries


from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer

token = RegexpTokenizer(r'[a-zA-Z0-9]+')

vi. Apply the text preprocessing steps

cv= CountVectorizer(lowercase=True,stop_words="english",ngram_range=(1,1),tokenizer=token.tokenize)

text_counts= cv.fit_transform(df323['text'])
text_counts

/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py:528: UserWarning: The parameter 'token_pattern' will not be
  warnings.warn(
<5171x50174 sparse matrix of type '<class 'numpy.int64'>'
with 356187 stored elements in Compressed Sparse Row format>
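
The count matrix above has one row per email and one column per vocabulary term. A hand-rolled sketch of the same idea in plain Python, using the same token pattern as the `RegexpTokenizer` above. The two example documents and the tiny `STOP_WORDS` set are hypothetical and much smaller than scikit-learn's English stop-word list.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-zA-Z0-9]+")   # same pattern as the tokenizer above
STOP_WORDS = {"the", "is", "for", "on", "a"}  # illustrative only, NOT sklearn's list


def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, and count the remaining terms."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


# Hypothetical documents standing in for df323['text'].
docs = [
    "Subject: cheap software, CHEAP prices!",
    "Subject: meeting on Monday for the gas nomination",
]

counts = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*counts))                   # column order
matrix = [[c[term] for term in vocabulary] for c in counts]  # one row per document

print(vocabulary)
for row in matrix:
    print(row)
```

Most entries are zero, which is why scikit-learn stores the real 5171 x 50174 result as a sparse matrix.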

vii. Train test split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(text_counts, df323['label'], test_size=0.3, random_state=1)
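
A quick arithmetic check of the split sizes, assuming scikit-learn rounds the test share up to a whole number of rows (which matches the support of 1552 in the reports further down). The variable names are introduced here for illustration.

```python
import math

n_samples = 5171   # rows reported by df323.shape
test_size = 0.3    # value passed to train_test_split

n_test = math.ceil(test_size * n_samples)   # test rows, rounded up
n_train = n_samples - n_test                # remaining rows used for training

print(n_train, n_test)
```

Because ham and spam are not equally frequent, passing `stratify=df323['label']` to `train_test_split` is an option worth considering; the original run does not use it.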

viii. Decision Tree Classification

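Note: this install is not required for the steps below. accuracy_score and classification_report come from sklearn.metrics, which is already imported in step v. The PyPI package named metrics is an unrelated tool, and the log below shows that installing it downgrades Pygments from 2.16.1 to 2.2.0 and triggers dependency conflicts.

pip install metrics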

Collecting metrics
Downloading metrics-0.3.3.tar.gz (18 kB)
Preparing metadata (setup.py) ... done
Collecting Pygments==2.2.0 (from metrics)
Downloading Pygments-2.2.0-py2.py3-none-any.whl (841 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 841.7/841.7 kB 13.0 MB/s eta 0:00:00
Collecting pathspec==0.5.5 (from metrics)
Downloading pathspec-0.5.5.tar.gz (21 kB)
Preparing metadata (setup.py) ... done
Collecting pathlib2>=2.3.0 (from metrics)
Downloading pathlib2-2.3.7.post1-py2.py3-none-any.whl (18 kB)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from pat
Building wheels for collected packages: metrics, pathspec
Building wheel for metrics (setup.py) ... done
Created wheel for metrics: filename=metrics-0.3.3-py2.py3-none-any.whl size=17795 sha
Stored in directory: /root/.cache/pip/wheels/29/7a/e7/1175d9ff10607b8f02aa37c32392cb2
Building wheel for pathspec (setup.py) ... done
Created wheel for pathspec: filename=pathspec-0.5.5-py3-none-any.whl size=24219 sha25
Stored in directory: /root/.cache/pip/wheels/28/d4/11/01ccd690e97b06874998aa554a8b261
Successfully built metrics pathspec
Installing collected packages: Pygments, pathspec, pathlib2, metrics
Attempting uninstall: Pygments
Found existing installation: Pygments 2.16.1
Uninstalling Pygments-2.16.1:
Successfully uninstalled Pygments-2.16.1
ERROR: pip's dependency resolver does not currently take into account all the packages
ipython 7.34.0 requires jedi>=0.16, which is not installed.
nbconvert 6.5.4 requires pygments>=2.4.1, but you have pygments 2.2.0 which is incompat
rich 13.5.2 requires pygments<3.0.0,>=2.13.0, but you have pygments 2.2.0 which is inco
Successfully installed Pygments-2.2.0 metrics-0.3.3 pathlib2-2.3.7.post1 pathspec-0.5.5
WARNING: The following packages were previously imported in this runtime:
[pygments]
You must restart the runtime in order to use newly installed versions.

clf = DecisionTreeClassifier().fit(X_train,y_train)
predicted = clf.predict(X_test)
print("Decision tree accuracy:", accuracy_score(y_test,predicted))
print(classification_report(y_test,predicted))

Decision tree accuracy: 0.9426546391752577
              precision    recall  f1-score   support

         ham       0.96      0.96      0.96      1101
        spam       0.89      0.91      0.90       451

    accuracy                           0.94      1552
   macro avg       0.93      0.93      0.93      1552
weighted avg       0.94      0.94      0.94      1552
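
For reference, the per-class figures in the report are derived from counts of true positives, false positives and false negatives. A minimal sketch in plain Python: the function `per_class_metrics` and the toy labels are hypothetical and are not taken from the email dataset. The last line only checks that the reported accuracy corresponds to a whole number of correct predictions out of 1552.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels, for illustration only.
y_true = ["ham", "ham", "ham", "spam", "spam", "ham", "spam", "ham"]
y_pred = ["ham", "spam", "ham", "spam", "ham", "ham", "spam", "ham"]

spam_p, spam_r, spam_f1 = per_class_metrics(y_true, y_pred, "spam")
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Reported accuracy times test-set size gives the number of correct predictions.
correct_reported = round(0.9426546391752577 * 1552)

print(spam_p, spam_r, spam_f1, accuracy, correct_reported)
```

The macro average in the report is the plain mean over the two classes, while the weighted average weights each class by its support (1101 ham, 451 spam).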

ix. Using TF-IDF approach

tfidfvectorizer=TfidfVectorizer()
tfidf_vectors= tfidfvectorizer.fit_transform(df323['text'])
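
TF-IDF scales each term count by how rare the term is across documents. The sketch below assumes the weighting that `TfidfVectorizer` is understood to apply by default (smoothed idf, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation of each row). The whitespace tokenizer and the three-document corpus are simplifications introduced here, not what the notebook runs.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (idf, rows): idf per term and one L2-normalised weight dict per doc."""
    tokenized = [doc.lower().split() for doc in docs]   # simplified tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()})
    return idf, rows


docs = ["cheap cheap software", "gas nomination", "cheap gas"]  # hypothetical corpus
idf, rows = tfidf(docs)

for row in rows:
    print({t: round(w, 3) for t, w in row.items()})
```

Terms that appear in fewer documents ("software") receive a larger idf than terms spread across several ("cheap").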

X_train, X_test, y_train, y_test = train_test_split(tfidf_vectors, df323['label'],test_size=0.3, random_state=1)

clf= DecisionTreeClassifier().fit(X_train,y_train)
predicted= clf.predict(X_test)
print("Decision tree accuracy:",accuracy_score(y_test,predicted))
print(classification_report(y_test,predicted))

Decision tree accuracy: 0.9336340206185567
              precision    recall  f1-score   support

         ham       0.96      0.95      0.95      1101
        spam       0.87      0.90      0.89       451

    accuracy                           0.93      1552
   macro avg       0.92      0.92      0.92      1552
weighted avg       0.93      0.93      0.93      1552
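
In both runs above the vectorizer is fitted on all 5171 emails before the split, so the vocabulary (and, for TF-IDF, the idf weights) also reflects the test emails. One possible alternative, sketched under the assumption that scikit-learn is available, is to split the raw text first and wrap the vectorizer and classifier in a pipeline that is fitted on the training part only. The toy `texts` and `labels` below are hypothetical stand-ins for `df323['text']` and `df323['label']`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data, for illustration only.
texts = [
    "cheap software offer", "win cheap pills now", "cheap offer win prize",
    "limited offer cheap pills", "gas nomination for january",
    "meter reading for gas deal", "daily gas nomination attached",
    "deal ticket for meter",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Split the raw text first so the vectorizer never sees the test documents.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=1, stratify=labels
)

# The vectorizer is fitted inside the pipeline, on the training texts only.
model = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=1))
model.fit(X_train, y_train)
predicted = model.predict(X_test)

print(list(zip(X_test, predicted)))
```

Fixing `random_state` on the classifier is a choice made here for repeatability; the original cells leave it unset, so their accuracy figures may differ slightly between runs.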
