
Parts of Speech Tagger

1st project of the ML internship (Xcelerator)

Submitted by:
Akshay Bhoju Kothari
Dhanush Shetty
H.K Nakul
Machine Learning

Definition:
Machine learning is an application of artificial intelligence (AI) that provides systems with the ability to learn and improve automatically from experience without being explicitly programmed.

Applications

1. Virtual Personal Assistants: Siri, Alexa, and Google Now are popular examples.
2. Social Media Services (e.g., Facebook).
3. Email Spam and Malware Filtering.
4. Online Customer Support.
5. Search Engine Result Refining.
6. Product Recommendations.
Challenges faced-

1. Most of the challenges we faced were in extracting the features.
2. During the training phase we initially got low accuracy.
3. As beginners in Python coding, we found the implementation somewhat difficult.
4. Understanding the parts of speech themselves.
Feature Extraction
def Feature_Extraction(sentence, i):  # build a feature dictionary for the word at position i
    features = {'Token': sentence[i],                          # the word itself
                'first_word': i == 0,                          # is it the first word of the sentence?
                'capitalized': sentence[i][0].upper() == sentence[i][0],      # first letter capitalized?
                'All_capitalized': sentence[i].upper() == sentence[i],        # whole word in capitals?
                'numeric': sentence[i].isdigit(),              # is the token a number?
                'prev-word': '' if i == 0 else sentence[i - 1],               # previous word, if any
                'suffix(1)': sentence[i][-1],                  # last character
                'suffix(2)': '' if len(sentence[i]) < 2 else sentence[i][-2:],  # last two characters
                'suffix(3)': '' if len(sentence[i]) < 3 else sentence[i][-3:],  # last three characters
                'prefix(1)': sentence[i][0]}                   # first character
    return features
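
For example, running the function above on a short sentence shows the kind of feature dictionary the classifier will see (the sentence is just an illustration; expected output shown as comments):

sentence = ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday']
print(Feature_Extraction(sentence, 1))
# {'Token': 'Fulton', 'first_word': False, 'capitalized': True,
#  'All_capitalized': False, 'numeric': False, 'prev-word': 'The',
#  'suffix(1)': 'n', 'suffix(2)': 'on', 'suffix(3)': 'ton', 'prefix(1)': 'F'}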
How have we solved our problem-

1. We did a lot of research on identifying the proper features.
2. We read materials and referred to websites on the Xcelerator portal about machine learning and Python coding.
3. We referred to many websites and online learning platforms like Coursera and NPTEL.
4. We chose a proper algorithm to improve efficiency.
5. We discussed within our group to enhance our knowledge.
Importing and downloading necessary libraries and dataset.

import nltk  # importing and downloading the necessary libraries and dataset

nltk.download('brown')
nltk.download('tagsets')
nltk.download('universal_tagset')

from nltk.corpus import brown

lines = brown.sents(categories='news')

feature = []
for sentence in lines:
    for i, word in enumerate(sentence):
        feature.append(Feature_Extraction(sentence, i))  # untagged feature dictionaries
tagged_sents = brown.tagged_sents(categories='news', tagset='universal')

featureset = []
for tagged_sent in tagged_sents:
    untagged_sent = nltk.tag.untag(tagged_sent)  # strip the tags so features come from the raw words
    for i, (word, tag) in enumerate(tagged_sent):
        featureset.append((Feature_Extraction(untagged_sent, i), tag))
# featureset holds the (features, tag) pairs we use for training and testing

size = int(len(featureset) * 0.1)  # hold out the first 10% of the examples
train_set, test_set = featureset[size:], featureset[:size]  # remaining 90% to train, first 10% to test
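
As a quick sanity check on the split (the Brown news category contains roughly 100,000 tagged words, so the numbers below are approximate):

print(len(featureset), len(train_set), len(test_set))
# roughly 100,000 examples in total: about 90,000 for training and 10,000 for testing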
Classifier-
classifier = nltk.NaiveBayesClassifier.train(train_set)
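
Once trained, NLTK's NaiveBayesClassifier can report which features carry the most weight, which is a handy way to check whether the suffix and capitalization features are actually doing the work (a quick inspection, not part of the original pipeline):

classifier.show_most_informative_features(10)  # prints the 10 most informative (feature, value) pairs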

Evaluation using accuracy-

classifier.classify(Feature_Extraction(brown.sents()[0], 9))  # predicts the tag for the word 'of'
print(Feature_Extraction(brown.sents()[0], 9))

accuracy = nltk.classify.accuracy(classifier, test_set)
print(accuracy)  # we get nearly 85% accuracy
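
The same classifier can tag a whole sentence one word at a time; a minimal sketch (the helper tag_sentence below is our illustration, not part of the original code):

def tag_sentence(sentence):  # tag each word of a tokenized sentence
    return [(word, classifier.classify(Feature_Extraction(sentence, i)))
            for i, word in enumerate(sentence)]

print(tag_sentence(['The', 'jury', 'praised', 'the', 'city']))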


Naive Bayes Classifier-
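In brief, a naive Bayes classifier chooses the tag t with the highest posterior probability given the word's features, under the simplifying assumption that the features f_1, ..., f_n are conditionally independent given the tag (a standard statement of the model, written in LaTeX):

\hat{t} = \arg\max_{t} \; P(t) \prod_{i=1}^{n} P(f_i \mid t)

NLTK estimates P(t) and each P(f_i | t) from the frequency counts in train_set, with smoothing for unseen feature values.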
Future Enhancements-

1. We will be able to correct grammatical errors in a sentence.
2. We will be able to do chunking and parsing of text.
3. This can also be used in chatbots as part of the model.
4. By adding some extra features we can turn this model into a sentiment analyser.
References-

1. http://www.nltk.org/book/ch06.html#ref-document-classify-all-words
2. Resources available on the Xcelerator portal.
3. https://docs.python.org/3/library/stdtypes.html (Python documentation)
THANK YOU
