Sentiment Analysis On Manipuri Language

1) The document discusses sentiment analysis on the Manipuri language using machine learning techniques such as a Naive Bayes classifier, as well as deep learning approaches.
2) It sets out the objective of classifying Manipuri sentences as having positive or negative sentiment, and explores features such as TF-IDF.
3) The proposed model involves data collection, pre-processing (e.g. transliteration), feature extraction using TF-IDF, and sentiment analysis using Naive Bayes and deep learning methods.

Uploaded by RAHUL KUMAR
Copyright © All Rights Reserved

SENTIMENT ANALYSIS ON MANIPURI LANGUAGE

Loitongbam Sanayai Meetei
M Tech Scholar: CS-16-25-101
Department of CSE, NIT Silchar

Project Supervisor:
Dr. Samir Kumar Borgohain
Assistant Professor
Department of CSE, NIT Silchar
CONTENT
• Introduction
• Literature Review
• Problem Statement
• Objective
• Proposed Model
• Future Work
• Reference
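WHAT IS SENTIMENT ANALYSIS?
Sentiment analysis is the task of automatically identifying the opinion or sentiment expressed in a piece of text, for example classifying a sentence as positive or negative.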
LITERATURE SURVEY

1. [1] Kishorjith N. et al.: Verb based Manipuri sentiment analysis
   Findings: The data were processed for Part of Speech (POS) tagging using a Conditional Random Field (CRF). A polarity was annotated for each verb, and the polarity occurring most often decided the sentiment.
   Limitations: Mainly focuses on feature selection, and the sentiment decider was based on a simple counting method. More methods and algorithms could be implemented and explored.

2. [2] Hayeon Jang et al.: Language-specific sentiment analysis in morphologically rich languages
   Findings: Using the SVMLight classifier, comparisons were made on term frequency-inverse document frequency (TF-IDF) and all possible combinations of chunking and shifters.
   Limitations: Contrary to the authors' expectations, the simple classification method achieved higher results.

3. [3] P. Vateekul et al.: A Study of Sentiment Analysis Using Deep Learning Techniques on Thai Twitter Data
   Findings: Two deep learning techniques, a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), were applied to the sentiment classification of Thai Twitter data. Both techniques were found to give significantly higher accuracies than classical techniques.
   Limitations: Since the feature extraction uses a bag of words, fewer signature words may be captured.
PROBLEM STATEMENT:
• Lack of data
• Lack of polarity-tagged data
• Lack of a part of speech (POS) tagger
• Most of the data were in Bengali script

OBJECTIVE
• Sentiment analysis on the Manipuri language, i.e. classifying a sentence as having Positive or Negative sentiment.
• No classification model has yet been applied to sentiment analysis of the Manipuri language [1], so we will explore the Naïve Bayes classifier and a deep learning approach.
ABOUT MANIPURI LANGUAGE
• Manipuri is a Tibeto-Burman language spoken predominantly in Manipur, a northeastern state of India.
• Smaller speech communities exist in the Indian states of Assam, Mizoram, and Tripura, as well as in Bangladesh and Myanmar (Burma).
• Subject Object Verb (SOV) language
  E.g. “Robert na lafoi chakhre” (literal English translation: “Robert banana ate”), which in English, a Subject Verb Object (SVO) language, would be “Robert ate a banana”.
• Agglutinative language

Language   Present    Present continuous   Past perfect
English    go         going                went
Hindi      jata       ja raha              gaya
Manipuri   chatlage   chatli               chatlure
PROPOSED MODEL:
Data collection → Pre-processing → Feature Extraction → Sentiment Analysis
PROPOSED MODEL: Data collection
• Data from a survey and from articles containing Manipuri text from Technology Development for Indian Languages (TDIL)
PROPOSED MODEL: Pre-processing
• Transliteration from Bengali script to Roman (English) script.
• Manual annotation of the polarity of each sentence, for supervised training and as a ground-truth reference.
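The transliteration step can be illustrated with a toy, rule-based Python sketch. It is not the authors' transliteration program: the character table is a hypothetical, deliberately incomplete mapping, and the rule that a consonant carries an inherent "a" unless a vowel sign or virama follows is a simplification of Bengali-script spelling.

```python
import unicodedata

# Toy Bengali-script -> Roman transliterator.
# ASSUMPTION: this small table is illustrative only, not a standard
# romanization scheme, and covers just a subset of Bengali letters.
CONSONANTS = {
    "\u0995": "k", "\u0996": "kh", "\u0997": "g", "\u0998": "gh",
    "\u0999": "ng", "\u099a": "ch", "\u099c": "j", "\u099f": "t",
    "\u09a1": "d", "\u09a3": "n", "\u09a4": "t", "\u09a5": "th",
    "\u09a6": "d", "\u09a8": "n", "\u09aa": "p", "\u09ab": "ph",
    "\u09ac": "b", "\u09ae": "m", "\u09af": "y", "\u09b0": "r",
    "\u09b2": "l", "\u09b6": "sh", "\u09b8": "s", "\u09b9": "h",
}
# Dependent vowel signs replace a consonant's inherent "a".
VOWEL_SIGNS = {
    "\u09be": "a", "\u09bf": "i", "\u09c0": "i", "\u09c1": "u",
    "\u09c2": "u", "\u09c7": "e", "\u09c8": "ai", "\u09cb": "o",
    "\u09cc": "ou",
}
# Independent vowels and other stand-alone signs.
OTHERS = {
    "\u0985": "a", "\u0986": "a", "\u0987": "i", "\u0988": "i",
    "\u0989": "u", "\u098a": "u", "\u098f": "e", "\u0990": "ai",
    "\u0993": "o", "\u0994": "ou",
    "\u09ce": "t",   # khanda ta
    "\u0982": "ng",  # anusvara
}
VIRAMA = "\u09cd"  # suppresses the inherent vowel of the preceding consonant


def transliterate(text: str) -> str:
    # NFC composes split vowel signs (e.g. U+09C7 + U+09BE -> U+09CB).
    text = unicodedata.normalize("NFC", text)
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # Inherent "a" unless a vowel sign or virama follows.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch in OTHERS:
            out.append(OTHERS[ch])
        elif ch == VIRAMA:
            continue
        else:
            out.append(ch)  # spaces, punctuation, unmapped characters
    return "".join(out)


print(transliterate("\u0995\u09cb\u0987\u09ac\u09be \u099a\u09ce\u09aa\u09be"))
# koiba chatpa
```

The sample input is the Bengali-script spelling of "koiba chatpa", two words from the TF-IDF example dataset. A full transliterator would need the complete character set plus rules for conjuncts and spelling exceptions.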
PROPOSED MODEL: Feature Extraction
• Term Frequency – Inverse Document Frequency (TF-IDF)
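TF-IDF
A numerical statistic intended to reflect how important a word is to a document in a collection or corpus.

TF (Term Frequency):
The raw count of a term in a document, i.e. the number of times that term t occurs in document d:
$$\mathrm{tf}(t,d) = f_{t,d}$$

IDF (Inverse Document Frequency):
Calculated as:
$$\mathrm{idf}(t) = \log_{10} \frac{N}{n_t}$$
where N is the total number of documents in the corpus and n_t is the number of documents that contain term t.

Finally,
$$\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)$$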
TF-IDF EXAMPLE
Dataset:
Doc 1: ei koiba chatpa pammi
Doc 2: esei taba nungai amadi esei tabana pothaba fangi
Doc 3: koiba chatpa matam mangni

TF calculation (the words ei, taba, amadi and tabana are not included in the vocabulary):
Term    koiba  chatpa  pammi  esei  nungai  pothaba  fangi  matam  mangni
Doc 1   1      1       1      0     0       0        0      0      0
Doc 2   0      0       0      2     1       1        1      0      0
Doc 3   1      1       0      0     0       0        0      1      1
TF-IDF EXAMPLE
IDF calculation:
Term    koiba  chatpa  pammi  esei  nungai  pothaba  fangi  matam  mangni
Doc 1   0.18   0.18    0.48   -     -       -        -      -      -
Doc 2   -      -       -      0.48  0.48    0.48     0.48   -      -
Doc 3   0.18   0.18    -      -     -       -        -      0.48   0.48

TF-IDF calculation:
Term    koiba  chatpa  pammi  esei  nungai  pothaba  fangi  matam  mangni
Doc 1   0.18   0.18    0.48   0     0       0        0      0      0
Doc 2   0      0       0      0.95  0.48    0.48     0.48   0      0
Doc 3   0.18   0.18    0      0     0       0        0      0.48   0.48
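The TF, IDF and TF-IDF values above can be checked with a short standard-library Python sketch. It assumes a base-10 logarithm for IDF, which reproduces 0.18 (log10(3/2)) and 0.48 (log10(3)), and uses the three documents with ei, taba, amadi and tabana already removed. The helper names `tf`, `idf` and `tfidf` are illustrative.

```python
import math
from collections import Counter


def tf(term: str, doc: list[str]) -> int:
    """Raw count f_{t,d}: how many times `term` occurs in `doc`."""
    return Counter(doc)[term]


def idf(term: str, corpus: list[list[str]]) -> float:
    """log10(N / n_t): N documents in total, n_t of which contain `term`."""
    n_t = sum(1 for doc in corpus if term in doc)
    if n_t == 0:
        return 0.0  # unseen term: this sketch simply gives it zero weight
    return math.log10(len(corpus) / n_t)


def tfidf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    return tf(term, doc) * idf(term, corpus)


# The three example documents, restricted to the nine vocabulary words.
corpus = [
    "koiba chatpa pammi".split(),
    "esei nungai esei pothaba fangi".split(),
    "koiba chatpa matam mangni".split(),
]
vocab = ["koiba", "chatpa", "pammi", "esei", "nungai",
         "pothaba", "fangi", "matam", "mangni"]

for i, doc in enumerate(corpus, start=1):
    print(f"Doc {i}", [round(tfidf(t, doc, corpus), 2) for t in vocab])
# Doc 1 [0.18, 0.18, 0.48, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Doc 2 [0.0, 0.0, 0.0, 0.95, 0.48, 0.48, 0.48, 0.0, 0.0]
# Doc 3 [0.18, 0.18, 0.0, 0.0, 0.0, 0.0, 0.0, 0.48, 0.48]
```

Note that "esei" reaches 0.95 only because it occurs twice in Doc 2 (2 × 0.477). Libraries such as scikit-learn use a natural logarithm and smoothing by default, so their numbers will not match this table exactly.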
PROPOSED MODEL: Sentiment Analysis
• Machine Learning: Naïve Bayes
• Deep Learning
NAIVE BAYES
$$P(C_k \mid A) = \frac{P(A \mid C_k)\,P(C_k)}{P(A)}$$
where,
$P(C_k \mid A)$ = probability that a training pattern with attributes A belongs to class $C_k$ (posterior probability)
$P(A \mid C_k)$ = probability that a training pattern of class $C_k$ has attributes A (conditional probability)
$P(C_k)$ = probability that a training pattern belongs to class $C_k$ (prior probability)
$P(A)$ = probability of a training pattern having attributes A
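Because P(A) is the same for every class, the posteriors of different classes can be compared through the numerator alone. Under the "naive" assumption that the attributes A = (a_1, ..., a_n), here the words of a sentence, are conditionally independent given the class, the decision rule becomes:

$$\hat{C} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(a_i \mid C_k)$$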
EXAMPLE
Type      Doc  Words                                             Class
Training  1    ei koiba chatpa pammi                             pos
Training  2    esei taba nungai amadi esei tabana pothaba fangi  pos
Training  3    koiba chatpa matam mangni                         neg
Testing   4    esei taba matam mangni

TF-IDF:
Term    koiba  chatpa  pammi  esei  nungai  pothaba  fangi  matam  mangni
Doc 1   0.18   0.18    0.48   0     0       0        0      0      0
Doc 2   0      0       0      0.95  0.48    0.48     0.48   0      0
Doc 3   0.18   0.18    0      0     0       0        0      0.48   0.48
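EXAMPLE
Type      Doc  Words                           Class
Training  1    koiba chatpa pammi              pos
Training  2    esei nungai esei pothaba fangi  pos
Training  3    koiba chatpa matam mangni       neg
Testing   4    esei matam mangni

Prior probability:
P(pos) = 2/3, P(neg) = 1/3

Conditional probability (add-one smoothing; numerator: the word's TF-IDF weight in the class multiplied by its count, plus 1; denominator: the number of words in the class plus the vocabulary size, 9):
$$P(\text{esei} \mid \text{pos}) = \frac{(0.95 \times 2) + 1}{8 + 9} = \frac{2.9}{17}$$
$$P(\text{matam} \mid \text{pos}) = \frac{0 + 1}{8 + 9} = \frac{1}{17}$$
$$P(\text{mangni} \mid \text{pos}) = \frac{0 + 1}{8 + 9} = \frac{1}{17}$$
$$P(\text{pos} \mid d_4) \propto \frac{2}{3} \times \frac{2.9}{17} \times \left(\frac{1}{17}\right)^2 \approx 0.000394$$

$$P(\text{esei} \mid \text{neg}) = \frac{0 + 1}{4 + 9} = \frac{1}{13}$$
$$P(\text{matam} \mid \text{neg}) = \frac{0.48 + 1}{4 + 9} = \frac{1.48}{13}$$
$$P(\text{mangni} \mid \text{neg}) = \frac{0.48 + 1}{4 + 9} = \frac{1.48}{13}$$
$$P(\text{neg} \mid d_4) \propto \frac{1}{3} \times \frac{1}{13} \times \left(\frac{1.48}{13}\right)^2 \approx 0.000332$$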
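For comparison, here is a minimal multinomial Naive Bayes sketch in Python with add-one (Laplace) smoothing. It deliberately swaps in plain word counts instead of the TF-IDF-weighted counts used in the worked example, so its probabilities differ from the ones above. The function names (`train_nb`, `log_score`, `predict`) are hypothetical, and the test sentence uses the spelling "mangni" from training document 3.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial Naive Bayes trained on raw word counts."""
    vocab = {w for doc in docs for w in doc}
    priors = {c: n / len(docs) for c, n in Counter(labels).items()}
    counts = defaultdict(Counter)  # class -> word -> count
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    return vocab, priors, counts


def log_score(doc, c, vocab, priors, counts):
    """log P(c) + sum of log P(w | c) with add-one smoothing.

    P(doc) is identical for every class, so it is dropped; the argmax is unchanged.
    """
    total = sum(counts[c].values())  # number of words in class c
    score = math.log(priors[c])
    for w in doc:
        if w in vocab:  # words never seen in training are skipped
            score += math.log((counts[c][w] + 1) / (total + len(vocab)))
    return score


def predict(doc, model):
    vocab, priors, counts = model
    return max(priors, key=lambda c: log_score(doc, c, vocab, priors, counts))


train_docs = [
    "koiba chatpa pammi".split(),
    "esei nungai esei pothaba fangi".split(),
    "koiba chatpa matam mangni".split(),
]
model = train_nb(train_docs, ["pos", "pos", "neg"])
d4 = "esei matam mangni".split()
print(predict(d4, model))  # neg
```

With raw counts, d4 scores 2/3 × 3/17 × (1/17)² ≈ 0.000407 for pos and 1/3 × 1/13 × (2/13)² ≈ 0.000607 for neg. Working in log space avoids numerical underflow when sentences get longer.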
DEEP LEARNING
Fig 1. Artificial Neural Network
Fig 2. Deep neural network


PROGRESS SO FAR
• Data collection: around 2000 sentences in the Manipuri language have been collected
• Implementation of the transliteration program: in progress
• Manual annotation: in progress

FUTURE WORK
• Collect more data
• Implementation of the model


REFERENCE
[1] Kishorjith N., Dilipkumar, K., Wangkheimayum, H., Shinghajith, K., Sivaji B.: Verb based Manipuri sentiment analysis. IJNLC 3(3), 1307–2278, 2014.
[2] Hayeon Jang and Hyopil Shin: Language-specific sentiment analysis in morphologically rich languages. In Coling 2010: Posters, pages 498–506, Beijing, China, August 2010.
[3] P. Vateekul and T. Koomsubha: A Study of Sentiment Analysis Using Deep Learning Techniques on Thai Twitter Data, 2016.
Thank you
