
Parts of Speech Tagger

1st project of the ML internship (Xcelerator)

Submitted by:
Akshay Bhoju Kothari
Dhanush Shetty
H.K Nakul
Machine Learning

Definition:
Machine learning is an application of artificial intelligence (AI) that provides systems with the ability to learn and improve automatically from experience without being explicitly programmed.

Applications

1. Virtual Personal Assistants: Siri, Alexa, and Google Now are popular examples.
2. Social Media Services (e.g., Facebook).
3. Email Spam and Malware Filtering.
4. Online Customer Support.
5. Search Engine Result Refining.
6. Product Recommendations.
Challenges faced-

1. Most of the challenges we faced were in extracting the features.
2. During the training phase we initially got low accuracy.
3. As beginners in Python coding, we found the implementation somewhat difficult.
4. Understanding the parts of speech themselves.
Feature Extraction
def Feature_Extraction(sentence, i):  # build a feature dictionary for the word at position i
    features = {'Token': sentence[i],                          # the word itself
                'first_word': i == 0,                          # is it the first word of the sentence?
                'capitalized': sentence[i][0].upper() == sentence[i][0],      # first letter capitalized?
                'All_capitalized': sentence[i].upper() == sentence[i],        # whole word in capitals?
                'numeric': sentence[i].isdigit(),              # is the token a number?
                'prev-word': '' if i == 0 else sentence[i - 1],               # previous word, if any
                'suffix(1)': sentence[i][-1],                  # last character
                'suffix(2)': '' if len(sentence[i]) < 2 else sentence[i][-2:],  # last two characters
                'suffix(3)': '' if len(sentence[i]) < 3 else sentence[i][-3:],  # last three characters
                'prefix(1)': sentence[i][0]}                   # first character
    return features
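
For example, running the function above on a short sentence shows the kind of feature dictionary the classifier will see (the sentence is just an illustration; expected output shown as comments):

sentence = ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday']
print(Feature_Extraction(sentence, 1))
# {'Token': 'Fulton', 'first_word': False, 'capitalized': True,
#  'All_capitalized': False, 'numeric': False, 'prev-word': 'The',
#  'suffix(1)': 'n', 'suffix(2)': 'on', 'suffix(3)': 'ton', 'prefix(1)': 'F'}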
How have we solved our problem-

1. We did a lot of research on identifying the proper features.
2. We read materials and referred to websites on the Xcelerator portal about machine learning and Python coding.
3. We referred to many websites and online learning platforms like Coursera and NPTEL.
4. We chose a proper algorithm to improve efficiency.
5. We discussed within our group to enhance our knowledge.
Importing and downloading necessary libraries and dataset.

import nltk  # importing and downloading the necessary libraries and dataset

nltk.download('brown')
nltk.download('tagsets')
nltk.download('universal_tagset')

from nltk.corpus import brown

lines = brown.sents(categories='news')

feature = []
for sentence in lines:
    for i, word in enumerate(sentence):
        feature.append(Feature_Extraction(sentence, i))  # untagged feature dictionaries
tagged_sents = brown.tagged_sents(categories='news', tagset='universal')

featureset = []
for tagged_sent in tagged_sents:
    untagged_sent = nltk.tag.untag(tagged_sent)  # strip the tags so features come from the raw words
    for i, (word, tag) in enumerate(tagged_sent):
        featureset.append((Feature_Extraction(untagged_sent, i), tag))
# featureset holds the (features, tag) pairs we use for training and testing

size = int(len(featureset) * 0.1)  # hold out the first 10% of the examples
train_set, test_set = featureset[size:], featureset[:size]  # remaining 90% to train, first 10% to test
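
As a quick sanity check on the split (the Brown news category contains roughly 100,000 tagged words, so the numbers below are approximate):

print(len(featureset), len(train_set), len(test_set))
# roughly 100,000 examples in total: about 90,000 for training and 10,000 for testing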
Classifier-
classifier = nltk.NaiveBayesClassifier.train(train_set)
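
Once trained, NLTK's NaiveBayesClassifier can report which features carry the most weight, which is a handy way to check whether the suffix and capitalization features are actually doing the work (a quick inspection, not part of the original pipeline):

classifier.show_most_informative_features(10)  # prints the 10 most informative (feature, value) pairs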

Evaluation using accuracy-

classifier.classify(Feature_Extraction(brown.sents()[0], 9))  # predicts the tag for the word 'of'
print(Feature_Extraction(brown.sents()[0], 9))

accuracy = nltk.classify.accuracy(classifier, test_set)
print(accuracy)  # we get nearly 85% accuracy
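
The same classifier can tag a whole sentence one word at a time; a minimal sketch (the helper tag_sentence below is our illustration, not part of the original code):

def tag_sentence(sentence):  # tag each word of a tokenized sentence
    return [(word, classifier.classify(Feature_Extraction(sentence, i)))
            for i, word in enumerate(sentence)]

print(tag_sentence(['The', 'jury', 'praised', 'the', 'city']))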


Naive Bayes Classifier-
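In brief, a naive Bayes classifier chooses the tag t with the highest posterior probability given the word's features, under the simplifying assumption that the features f_1, ..., f_n are conditionally independent given the tag (a standard statement of the model, written in LaTeX):

\hat{t} = \arg\max_{t} \; P(t) \prod_{i=1}^{n} P(f_i \mid t)

NLTK estimates P(t) and each P(f_i | t) from the frequency counts in train_set, with smoothing for unseen feature values.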
Future Enhancements-

1. We will be able to correct grammatical errors in a sentence.
2. We will be able to do chunking and parsing of text.
3. This can also be used in chatbots as part of the model.
4. By adding some extra features we can turn this model into a sentiment analyser.
References-

1. http://www.nltk.org/book/ch06.html#ref-document-classify-all-words
2. Resources available on the Xcelerator portal.
3. https://docs.python.org/3/library/stdtypes.html (Python documentation)
THANK YOU
