This document provides steps to perform text mining and sentiment analysis in R. It describes how to load the required packages, import text data from a file or URL, and clean the text by transforming it, removing numbers, stopwords and punctuation, and stemming words. It also shows how to build a term-document matrix, generate a word cloud, and analyze the sentiment of text using the SentimentAnalysis package to classify statements as positive, negative or neutral.


Step 1: Create a CSV file containing the text to analyze
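
A minimal sketch of this step (the file name reviews.csv and the column name text are hypothetical, not from the original; Step 3 below instead picks the file interactively with file.choose()):

# Hypothetical example: reviews.csv holds one statement per row in a "text" column
reviews <- read.csv("reviews.csv", stringsAsFactors = FALSE)
text <- reviews$text
head(text)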

Step 2: Install and load the required packages

# For text mining
install.packages("tm")
# For text stemming
install.packages("SnowballC")
# For word-cloud generation
install.packages("wordcloud")
# For colour palettes
install.packages("RColorBrewer")

# Load the packages
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")

Step 3: Text mining

1. Load the text

text <- readLines(file.choose())

2. Load the data as a corpus

docs <- Corpus(VectorSource(text))

3. Inspect the content of the document

inspect(docs)

Text transformation

toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Replace "/", "@" and "|" with spaces
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")

Cleaning the text

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove English common stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stopwords: specify them as a character vector
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
# Text stemming (reduce words to their root form)
docs <- tm_map(docs, stemDocument)

Step 4: Build a term-document matrix

dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
# Sort terms by total frequency, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 10)

Step 5: Generate the word cloud

set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))

Explore frequent terms and their associations

# Terms that appear at least 4 times
findFreqTerms(dtm, lowfreq = 4)

# Terms correlated with "app" (correlation of at least 0.3)
findAssocs(dtm, terms = "app", corlimit = 0.3)


Plot word frequencies

barplot(d[1:10,]$freq, las = 2, names.arg = d[1:10,]$word,
        col = "lightblue", main = "Most frequent words",
        ylab = "Word frequencies")

Read a text file from the internet

filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
How to fetch data from Twitter?

Go to https://developer.twitter.com/en/apps and create your API keys.

library(twitteR)

# Replace the placeholder values below with your own credentials
consumerKey <- "YOUR_CONSUMER_KEY"
consumerSecret <- "YOUR_CONSUMER_SECRET"
accessToken <- "YOUR_ACCESS_TOKEN"
accessTokenSecret <- "YOUR_ACCESS_TOKEN_SECRET"
setup_twitter_oauth(consumerKey, consumerSecret, accessToken, accessTokenSecret)
tweets <- searchTwitter("dearicaipleasechange", n = 1000, lang = "en")

How to save the tweets?

# Convert the list of tweets to a data frame and write it to a CSV file
tweet.df <- twListToDF(tweets)
write.csv(tweet.df, "tweets1.csv")
# getwd() shows the working directory where the file was saved
getwd()
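
To reuse the saved tweets later without calling the Twitter API again, a minimal sketch (assuming the tweets1.csv written above) is:

# Reload the saved tweets from disk
tweet.df <- read.csv("tweets1.csv", stringsAsFactors = FALSE)
head(tweet.df$text)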
Sentiment Analysis
install.packages("SentimentAnalysis")
library(SentimentAnalysis)

# Analyze a single string to obtain a binary response (positive / negative)
sentiment <- analyzeSentiment("Yeah, this was a great soccer game for the German team!")
convertToBinaryResponse(sentiment)$SentimentQDAP

# Create a vector of strings
documents <- c("Wow, I really like the new light sabers!",
               "That book was excellent.",
               "R is a fantastic language.",
               "The service in this restaurant was miserable.",
               "This is neither positive or negative.",
               "The waiter forget about my dessert -- what poor service!")

sentiment <- analyzeSentiment(documents)
sentiment$SentimentQDAP

# View sentiment direction (i.e. positive, neutral and negative)

convertToDirection(sentiment$SentimentQDAP)
The same analysis works on a corpus shipped with a package, for example the "crude" corpus of crude-oil news articles that comes with tm:

library(tm)
data("crude")

# Analyze sentiment
sentiment <- analyzeSentiment(crude)

# Count positive and negative news releases
table(convertToBinaryResponse(sentiment$SentimentLM))

# News releases with the highest and lowest sentiment
crude[[which.max(sentiment$SentimentLM)]]$meta$heading
crude[[which.min(sentiment$SentimentLM)]]$meta$heading

# Visualize the distribution of the standardized sentiment variable
hist(scale(sentiment$SentimentLM))

# Compute cross-correlations between the dictionaries
cor(sentiment[, c("SentimentLM", "SentimentHE", "SentimentQDAP")])

# Extract the timestamps (crude oil news between 1987-02-26 and 1987-03-02)
# and plot the sentiment over the documents
datetime <- do.call(c, lapply(crude, function(x) x$meta$datetimestamp))
plotSentiment(sentiment$SentimentLM)
Twitter Sentiment Analysis

# Install the required packages
install.packages("SnowballC")
install.packages("tm")
install.packages("twitteR")
install.packages("syuzhet")

# Load them
library(SnowballC)
library(tm)
library(twitteR)
library(syuzhet)

tweets.df <- twListToDF(tweets)
head(tweets.df)
head(tweets.df$text)

# Remove URLs, hashtags and other Twitter handles
# (each pattern is greedy: it deletes everything from the first match to the end)
tweets.df2 <- gsub("http.*", "", tweets.df$text)
tweets.df2 <- gsub("https.*", "", tweets.df2)
tweets.df2 <- gsub("#.*", "", tweets.df2)
tweets.df2 <- gsub("@.*", "", tweets.df2)

We will first get the emotion score for each of the tweets. 'Syuzhet' scores text against the NRC lexicon, which covers eight emotions – anger, anticipation, disgust, fear, joy, sadness, surprise and trust – plus two sentiments, negative and positive.
word.df <- as.vector(tweets.df2)
emotion.df <- get_nrc_sentiment(word.df)
emotion.df2 <- cbind(tweets.df2, emotion.df)
head(emotion.df2)
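
To see which emotions dominate the corpus, one option is to sum each emotion column and plot the totals. This is a sketch, not part of the original walkthrough; it assumes only the emotion.df data frame computed above:

# Total score per emotion/sentiment across all tweets
barplot(colSums(emotion.df), las = 2, col = rainbow(10),
        main = "Emotion scores across tweets")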
Most positive and most negative comments

sent.value <- get_sentiment(word.df)

most.positive <- word.df[sent.value == max(sent.value)]
most.positive

most.negative <- word.df[sent.value == min(sent.value)]
most.negative

Sentiment score for each tweet

sent.value
Segregating positive and negative tweets
positive.tweets <- word.df[sent.value > 0]
head(positive.tweets)

negative.tweets <- word.df[sent.value < 0]
head(negative.tweets)

neutral.tweets <- word.df[sent.value == 0]
head(neutral.tweets)
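
A quick way to check the overall balance, as a sketch assuming the sent.value vector computed above, is to tabulate the sign of each score:

# Count negative (-1), neutral (0) and positive (1) tweets
table(sign(sent.value))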
