Sentiment Analysis in Java_ Analyzing Multisentence Text Blocks

The article discusses methods for performing sentiment analysis on multisentence text blocks in Java using Stanford CoreNLP. It highlights the importance of calculating a single sentiment score for entire text blocks, suggesting weighted averages to account for the significance of different sentences. Two approaches are presented: one focusing on the first and last sentences in product reviews, and another that increases the weight of sentences as the story progresses.

Java Magazine

Sentiment analysis in Java: Analyzing multisentence text blocks
Yuli Vasiliev | February 18, 2022

One sentence is positive. One sentence is negative. What’s the
sentiment of the entire text block?
Sentiment analysis tells you if text conveys a positive, negative, or neutral message. When applied to a
stream of social media messages from an account or a hashtag, for example, you can determine whether
sentiment is overall favorable or unfavorable. If you examine sentiments over time, you can analyze them
for trends or attempt to correlate them against external data. Based on this analysis, you could then build
predictive models.

This is the second article in a series on performing sentiment analysis in Java by using the sentiment tool
integrated into Stanford CoreNLP, an open source library for natural language processing (NLP).

In the first article, “Perform textual sentiment analysis in Java using a deep learning model,” you learned
how to use this tool to determine the sentiment of a sentence from very negative to very positive. In
practice, however, you might often need to look at a single aggregate sentiment score for the entire text
block, rather than having a set of sentence-level sentiment scores.

Here, I will describe some approaches you can use to perform analysis on an arbitrarily sized text block,
building on the Java code presented in the first article.
Scoring a multisentence text block
When you need to deal with a long, multisentence text block (such as a tweet, an email, or a product
review), you might naturally want to have a single sentiment score for the entire text block rather than
merely receiving a list of sentiment scores for separate sentences.

One simple solution is to calculate the average sentiment score for the entire text block by adding the
sentiment scores of separate sentences and dividing by the number of sentences.
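
As a minimal sketch of that simple average, the snippet below works on sentence scores that have already been extracted, so it runs without CoreNLP. The class name LinearAverageSketch, the method linearAverage(), and the sample scores are illustrative assumptions and are not part of the nlpPipeline class from the first article.

```java
// Illustrative sketch only: the class, method, and sample scores are hypothetical.
final class LinearAverageSketch {

    // Returns the arithmetic mean of per-sentence sentiment classes
    // (CoreNLP uses 0 = very negative through 4 = very positive).
    static double linearAverage(int[] sentenceScores) {
        if (sentenceScores.length == 0) {
            throw new IllegalArgumentException("at least one sentence score is required");
        }
        int sum = 0;
        for (int score : sentenceScores) {
            sum += score;
        }
        // Cast before dividing so the fractional part is kept
        return (double) sum / sentenceScores.length;
    }

    public static void main(String[] args) {
        // Hypothetical scores for a three-sentence block: positive, neutral, neutral
        int[] scores = {3, 2, 2};
        System.out.println("Linear average: " + linearAverage(scores));
    }
}
```

Every sentence contributes equally here, which is exactly the property the weighted approaches discussed later relax.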

However, a plain average is often a poor fit, because different sentences within a text block can affect
the overall sentiment differently. In other words, sentences may have varying degrees of importance when
you calculate the overall sentiment.

There is no single algorithm for identifying the most-important sentences that would work equally well for
all types of texts; perhaps that is why Stanford CoreNLP does not provide a built-in option for identifying
the overall sentiment of a multisentence text block.

Fortunately, you can manually code such functionality to work best for the type of text you are dealing
with. For example, text samples of the same type usually have something in common when it comes to
identifying the most-important sentences.

Imagine you’re dealing with product reviews. The most-important statements—from the standpoint of
the overall review sentiment—typically can be found at the beginning or the end of the review. The first
statement usually expresses the main idea of the review, and the last one summarizes it. While this may
not be true for every review, a significant portion of them look exactly like that. Here is an example.

I would recommend this book for anyone who wants an introduction to natural language
processing. Just finished the book and followed the code all way. I tried the code from the
resource website. I like how it is organized. Well done.

The Stanford CoreNLP sentiment classifier would identify the above sentences as follows:

Sentence: I would recommend this book for anyone who wants an introduction to natural language processing.

Sentiment: Positive(3)

Sentence: Just finished the book and followed the code all way.

Sentiment: Neutral(2)

Sentence: I tried the code from the resource website.

Sentiment: Neutral(2)

Sentence: I like how it is organized.

Sentiment: Neutral(2)

Sentence: Well done.

Sentiment: Positive(3)

As you can see, the first and the last sentences suggest that the review is positive. However, the neutral
sentences outnumber the positive ones, so a linear arithmetic average, which gives the same weight to
each sentence, comes out closer to neutral and does not seem to be a proper way to calculate the overall
sentiment of the review. Instead, you might want to calculate it with more weight assigned to the first and
the last sentences, as implemented in the example discussed below.

The weighted-average approach


Continuing with the sample Java program introduced in the first article, add the following
getReviewSentiment() method to the nlpPipeline class:

import java.util.*;
...

public static void getReviewSentiment(String review, float weight)
{
    int sentenceSentiment;
    int reviewSentimentAverageSum = 0;
    int reviewSentimentWeightedSum = 0;
    Annotation annotation = pipeline.process(review);
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    int numOfSentences = sentences.size();
    // Scale the weight to the length of the review; never let it drop below 1
    int factor = Math.round(numOfSentences*weight);
    if (factor == 0) {
        factor = 1;
    }
    int divisorLinear = numOfSentences;
    int divisorWeighted = 0;

    for (int i = 0; i < numOfSentences; i++)
    {
        Tree tree = sentences.get(i).get(SentimentAnnotatedTree.class);
        sentenceSentiment = RNNCoreAnnotations.getPredictedClass(tree);
        reviewSentimentAverageSum = reviewSentimentAverageSum + sentenceSentiment;
        if (i == 0 || i == numOfSentences - 1) {
            // The first and the last sentences are counted factor times
            reviewSentimentWeightedSum = reviewSentimentWeightedSum + sentenceSentiment * factor;
            divisorWeighted += factor;
        }
        else
        {
            reviewSentimentWeightedSum = reviewSentimentWeightedSum + sentenceSentiment;
            divisorWeighted += 1;
        }
    }
    System.out.println("Number of sentences:\t\t" + numOfSentences);
    System.out.println("Adapted weighting factor:\t" + factor);
    System.out.println("Weighted average sentiment:\t" + Math.round((float) reviewSentimentWeightedSum / divisorWeighted));
    System.out.println("Linear average sentiment:\t" + Math.round((float) reviewSentimentAverageSum / divisorLinear));
}

The getReviewSentiment() method shown above calculates the overall sentiment of a review in two
ways: as a weighted average and, for comparison purposes, as a linear average.

The method takes the text of a review as the first parameter. As the second, you pass a weighting factor to
apply to the first and the last sentences when calculating the overall review sentiment. The weighting
factor is passed in as a real number in the range [0, 1]. To scale it to a particular review, the method
multiplies the passed value by the number of sentences in the review and rounds the result, using 1 if the
result rounds to 0. This adapted weighting factor is the number of times the first and the last sentences
are counted in the weighted average.

To test the getReviewSentiment() method, use the following code:

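public class OverallReviewSentiment
{
    public static void main(String[] args)
    {
        String text = "I would recommend this book for anyone who wants an introduction to natural language processing. "
            + "Just finished the book and followed the code all way. "
            + "I tried the code from the resource website. "
            + "I like how it is organized. Well done.";
        nlpPipeline.init();
        nlpPipeline.getReviewSentiment(text, 0.4f);
    }
}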

This example passes in 0.4 as the weighting factor, but you should experiment with the value passed in.
The higher this value, the more importance is given to the first and last sentences in the review.

To see this approach in action, recompile the nlpPipeline class and compile the newly created
OverallReviewSentiment class. Then, run OverallReviewSentiment, as follows:

$ javac nlpPipeline.java
$ javac OverallReviewSentiment.java
$ java OverallReviewSentiment

This should produce the following results:

Number of sentences: 5

Adapted weighting factor: 2

Weighted average sentiment: 3

Linear average sentiment: 2

As you can see, the weighted average (3, positive) reflects the recommending tone of the review better
than the linear average (2, neutral) does.
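
To make the arithmetic behind these results easier to follow, here is a minimal sketch that applies the same weighting rule to the five sentence classes listed earlier (3, 2, 2, 2, 3) without calling CoreNLP. The class ReviewWeightingSketch and its two helper methods are hypothetical names introduced for illustration; they only mirror the rule used in getReviewSentiment().

```java
// Illustrative sketch only: class and method names are hypothetical.
final class ReviewWeightingSketch {

    // Scales the [0, 1] weight by the sentence count and rounds it, with a floor of 1.
    static int adaptedFactor(int numOfSentences, float weight) {
        return Math.max(1, Math.round(numOfSentences * weight));
    }

    // Weighted average in which the first and last scores count `factor` times
    // and every other score counts once.
    static double weightedAverage(int[] scores, int factor) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one sentence score is required");
        }
        int sum = 0;
        int divisor = 0;
        for (int i = 0; i < scores.length; i++) {
            int w = (i == 0 || i == scores.length - 1) ? factor : 1;
            sum += scores[i] * w;
            divisor += w;
        }
        return (double) sum / divisor;
    }

    public static void main(String[] args) {
        int[] scores = {3, 2, 2, 2, 3};                     // classes reported for the book review
        int factor = adaptedFactor(scores.length, 0.4f);    // round(5 * 0.4) = 2
        double weighted = weightedAverage(scores, factor);  // (6 + 2 + 2 + 2 + 6) / (2 + 3 + 2) = 18 / 7
        double linear = weightedAverage(scores, 1);         // 12 / 5
        System.out.println("Adapted weighting factor: " + factor);
        System.out.println("Weighted average: " + weighted + " -> " + Math.round(weighted));
        System.out.println("Linear average: " + linear + " -> " + Math.round(linear));
    }
}
```

The weighted average is 18/7, or about 2.57, which rounds to 3, while the linear average is 12/5 = 2.4, which rounds to 2. The difference between the two results therefore comes entirely from counting the first and the last sentences twice.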

Sequential increases in weight ratios


When it comes to storylike texts that cover a sequence of events spread over a time span, the importance
of sentences—from the standpoint of the overall sentiment—often increases as the story progresses.
That is, the sentences with the most influence on the overall sentiment conveyed by the story are
typically found at the end, because they describe the most-recent episodes, conclusions, or experiences.

Consider the following tweet:

The weather in the morning was terrible. We decided to go to the cinema. Had a great time.

The sentence-level sentiment analysis of this story gives the following results:

Sentence: The weather in the morning was terrible.

Sentiment: Negative(1)

Sentence: We decided to go to the cinema.

Sentiment: Neutral(2)

Sentence: Had a great time.

Sentiment: Positive(3)

Although the tweet begins with a negative remark, the overall sentiment here is clearly positive due to the
final note about time well spent at the movies. This pattern also works for reviews where customers
describe their experience with a product much like a story, as in the following example:

I love the stories from this publisher. They are always so enjoyable. But this one disappointed
me.

Here is the sentiment analysis for it:


Sentence: I love the stories from this publisher.

Sentiment: Positive(3)

Sentence: They are always so enjoyable.

Sentiment: Positive(3)

Sentence: But this one disappointed me.

Sentiment: Negative(1)

As you can see, more comments here are positive, but the entire block has an overall negative sentiment
due to the final, disapproving remark. As in the previous example, this suggests that in a text block like
this one, later sentences should be weighted more heavily than earlier ones.

For the weight, you might use the index value of each sentence in the text, counting from 1, taking
advantage of the fact that a later sentence has a greater index value. In other words, the importance of a
sentence increases in proportion to its index value.
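
The following minimal sketch shows one way to read that idea: sentence k, counting from 1, gets weight k, and the weighted sum is divided by the sum of the weights. It uses the sentence classes from the two examples above as hard-coded input instead of calling CoreNLP. The class IndexWeightingSketch and the method indexWeightedAverage() are hypothetical names used only for illustration.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class IndexWeightingSketch {

    // Weighted average in which the k-th sentence (1-based) carries weight k.
    static double indexWeightedAverage(int[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one sentence score is required");
        }
        int sum = 0;
        int divisor = 0;
        for (int i = 1; i <= scores.length; i++) {
            sum += scores[i - 1] * i;   // later sentences contribute more
            divisor += i;               // 1 + 2 + ... + n
        }
        return (double) sum / divisor;
    }

    public static void main(String[] args) {
        int[] tweet = {1, 2, 3};            // terrible weather, cinema, great time
        int[] publisherReview = {3, 3, 1};  // love the stories, enjoyable, disappointed
        System.out.println(indexWeightedAverage(tweet));            // (1 + 4 + 9) / 6
        System.out.println(indexWeightedAverage(publisherReview));  // (3 + 6 + 3) / 6
    }
}
```

For the tweet, the weighted average is 14/6, or about 2.33, compared with a linear average of 2.0, so the final positive sentence pulls the result above neutral. For the publisher review, the weighted average drops from about 2.33 to exactly 2.0. With plain index weights the closing negative sentence only brings that text down to neutral, so a steeper weighting may be worth trying when the last sentence should dominate.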

A matter of scale
Another important thing to decide is the scale you’re going to use for sentiment evaluation of each
sentence, as the best solution may vary depending on the type of text blocks you’re dealing with.

To evaluate tweets, for example, you might want to employ all five levels of sentiment available with
Stanford CoreNLP: very negative, negative, neutral, positive, and very positive.

When it comes to product review analysis, you might choose only two levels of sentiment—positive and
negative—rounding all other options to one of these two. Since both the negative and the positive classes
in Stanford CoreNLP are indexed with an odd number (1 and 3, respectively), you can tune the sentiment
evaluation method discussed earlier to round the weighted average being calculated to its nearest odd
integer.
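
One common way to round a value to its nearest odd integer is 2 * floor(x / 2) + 1, which maps every x in [0, 2) to 1 and every x in [2, 4) to 3. The sketch below applies that formula to an already computed weighted average. The class OddRoundingSketch and the method toNegativeOrPositive() are hypothetical, and the final clamp to the range 1..3 is an added assumption so that an average of exactly 4 (all sentences very positive) still maps to the positive class.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class OddRoundingSketch {

    // Maps a weighted average on the 0..4 scale to 1 (negative) or 3 (positive).
    static int toNegativeOrPositive(double average) {
        // Nearest odd integer: [0, 2) -> 1, [2, 4) -> 3, 4.0 -> 5
        int nearestOdd = (int) (2 * Math.floor(average / 2) + 1);
        // Assumption: clamp so that the result is always one of the two target classes
        return Math.max(1, Math.min(3, nearestOdd));
    }

    public static void main(String[] args) {
        System.out.println(toNegativeOrPositive(14.0 / 6.0)); // tweet example, about 2.33
        System.out.println(toNegativeOrPositive(1.9));
        System.out.println(toNegativeOrPositive(2.0));        // a neutral average lands on the positive side
    }
}
```

Note that with this formula an average of exactly 2.0 is classified as positive. If neutral results should be reported separately, or should fall on the negative side, the boundary would need to be handled explicitly.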

To try this, add the following getStorySentiment() method to the nlpPipeline class:

public static void getStorySentiment(String story)
{
    int sentenceSentiment;
    int reviewSentimentWeightedSum = 0;
    Annotation annotation = pipeline.process(story);
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    int divisorWeighted = 0;
    for (int i = 1; i <= sentences.size(); i++)
    {
        // Weight each sentence by its 1-based position in the text
        Tree tree = sentences.get(i-1).get(SentimentAnnotatedTree.class);
        sentenceSentiment = RNNCoreAnnotations.getPredictedClass(tree);
        reviewSentimentWeightedSum = reviewSentimentWeightedSum + sentenceSentiment * i;
        divisorWeighted += i;
    }
    // Round the weighted average to its nearest odd integer
    double weightedAverage = (double) reviewSentimentWeightedSum / divisorWeighted;
    System.out.println("Weighted average sentiment:\t" + (double)(2*Math.floor(weightedAverage/2) + 1));
}

Test the above method with the following code:

public class OverallStorySentiment
{
    public static void main(String[] args)
    {
        String text = "The weather in the morning was terrible. We decided to go to the cinema. Had a great time.";
        nlpPipeline.init();
        nlpPipeline.getStorySentiment(text);
    }
}

Recompile nlpPipeline and compile the newly created OverallStorySentiment class, and run
OverallStorySentiment as follows:

$ javac nlpPipeline.java
$ javac OverallStorySentiment.java
$ java OverallStorySentiment

The result should look as follows:

Weighted average sentiment: 3.0

This test uses a single sample text to test the sentiment-determining method discussed here. For an
example of how to perform such a test against a set of samples, refer back to the first article in this series.

Conclusion
This article looked at two methods of calculating the overall sentiment of a multisentence text block. Both
methods assume different sentences within a text block can affect the overall sentiment differently.
The first method determines the sentiment of customer reviews and is based on the observation that
the most-significant comments in a product review are at the beginning and end.

The second method calculates the overall sentiment by increasing the weight of each sentence as you
move from the beginning to the end of the text. This method may work fine for storylike texts where
the importance of sentences typically increases as the story progresses.

You can (and should) experiment with these and other methods to find the approach that best models the
type of text in your business case.

The final article of this series will show how to train the Stanford CoreNLP sentiment tool with your own
data to understand domain-specific phrases.

Dig deeper
Perform textual sentiment analysis in Java using a deep learning model

Natural language processing at your fingertips with OCI Language


How to program machine learning in Java with the Tribuo library

Performing sentiment analysis using Oracle Text

Yuli Vasiliev

Yuli Vasiliev is a programmer, freelance author, and consultant currently specializing in open source
development; Oracle database technologies; and, more recently, natural-language processing (NLP).