Text Mining in R: A Tutorial
This tutorial is built for people who want to learn the essential tasks required to process text
for meaningful analysis in R, one of the most popular open-source programming languages
for data science. By the end of this tutorial, you’ll have developed the skills to read in large files
of text and derive meaningful insights you can share from that analysis. You’ll have learned how
to do text mining in R, an essential data mining skill. The tutorial is meant to be followed along with
plenty of tangible code examples. The full repository with all of the files and data is here if you wish
to follow along.
If you don’t have an R environment set up already, the easiest way to follow along is to
use Jupyter with R. Jupyter offers an interactive R environment where you can easily modify inputs
and see the outputs immediately, so you can quickly get up to speed on text mining in R.
Text mining deals with helping computers understand the “meaning” of text. Common
text mining applications include sentiment analysis, e.g. determining whether a Tweet about a movie is
positive or negative, and text classification, e.g. classifying the mail you receive as spam or ham.
In this tutorial, we’ll learn about text mining and use some R libraries to implement some common
text mining techniques. We’ll learn how to do sentiment analysis, how to build word clouds, and
how to process your text so that you can do meaningful analysis with it.
R
R is succinctly described as “a language and environment for statistical computing and graphics,”
which makes it worth knowing if you’re dabbling in data science, statistics, or
exploratory data analysis. R has a wide variety of useful packages.
Here, we’ll focus on the R packages useful for understanding and extracting insights from text.
Text preprocessing
Before we dive into analyzing text, we need to preprocess it. Raw text contains white space,
punctuation, stop words, and other characters that do not convey much information and are hard to
process. For example, English stop words like “the” and “is” tell you little
about the sentiment of the text, the entities mentioned in it, or the relationships between those
entities. Depending upon the task at hand, we deal with such characters differently; removing them
helps focus the analysis on the important words.
Word cloud
A word cloud is a simple yet informative way to understand textual data and to do text analysis. In
this example, we will visualize Hillary Clinton’s emails. This will help us quantify the content
of the emails, derive insights, and better communicate our results. Along the way, we’ll
also learn about some data preprocessing steps that are immensely helpful in other text mining
tasks as well. Let’s start with getting the data. You can head over to Kaggle to download the
dataset.
Let’s read the data and learn to implement the preprocessing steps.
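Here’s a minimal sketch of that step. The table and column names, Emails and ExtractedBodyText, follow the Kaggle dataset’s published schema; adjust them if your copy differs.

library(DBI)      # generic database interface
library(RSQLite)  # SQLite driver

# Connect to the SQLite file downloaded from Kaggle
db <- dbConnect(RSQLite::SQLite(), dbname = "database.sqlite")

# Pull the extracted body text of every non-empty email into a data frame
emails <- dbGetQuery(db, "SELECT ExtractedBodyText FROM Emails
                          WHERE ExtractedBodyText != ''")
dbDisconnect(db)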
The above code reads the “database.sqlite” file into R. SQLite is an embedded SQL database
engine. Unlike most other SQL databases, SQLite does not have a separate server process; it
reads and writes directly to ordinary disk files. So, you can read an SQLite file much as you would
read a CSV or a text file. The same idea applies to any type of CSV, text, or other
input file you can work with in R, though each format calls for a different reading function.
This guide shows how you would read different file formats, such as Excel, R, and .txt files, into R,
along with other data sources (including social media data).
Here, we’ll use the package RSQLite to read in a SQLite file containing all of Hillary Clinton’s emails.
Next, we’ll query the column containing the email text body. Then we’ll be ready to do an
analysis of the Clinton emails that shaped this political season.
We’ll perform the following steps to make sure that the text we’re dealing with is clean:
Convert the text to lower case, so that words like “write” and “Write” are considered the same word
for analysis
Remove numbers
Remove English stopwords, e.g. “the”, “is”, “of”, etc.
Remove punctuation, e.g. “,”, “?”, etc.
Eliminate extra white spaces
Stem our text
Stemming is the process of reducing inflected (or sometimes derived) words to their word stem,
base, or root form, e.g. changing “car”, “cars”, “car’s”, and “cars’” to “car”. This also helps collapse
different verb forms with the same semantic meaning, such as “digs”, “digging”, and “dig”.
One very useful library for performing these text mining steps in R is the “tm”
package. The main structure for managing documents in tm is called a Corpus, which represents a
collection of text documents.
Once we have our email corpus (all of Hillary’s emails) stored in the variable “docs”, we’ll want to
transform the words within the emails using the techniques we discussed above, such as stemming
and stopword removal. With the tm library, this can be done easily. Transformations are
done via the tm_map() function, which applies a function to all elements of the corpus. Basically, all
transformations work on single text documents, and tm_map() just applies them to every document
in a corpus. If you wanted to convert all the text of Hillary’s emails into lowercase at once, you’d
use the tm library and the techniques below to do so easily.
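Here is a sketch of the whole pipeline, assuming the emails data frame from the RSQLite step above. Note that stemDocument relies on the SnowballC package.

library(tm)
library(SnowballC)  # provides the stemmer used by stemDocument

# Build a corpus from the email bodies
docs <- VCorpus(VectorSource(emails$ExtractedBodyText))

# Apply the cleaning steps described above to every document
docs <- tm_map(docs, content_transformer(tolower))       # lowercase
docs <- tm_map(docs, removeNumbers)                      # drop numbers
docs <- tm_map(docs, removeWords, stopwords("english"))  # drop stopwords
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, stripWhitespace)                    # collapse extra spaces
docs <- tm_map(docs, stemDocument)                       # stem each word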
Next, we represent the corpus as a term-document matrix, in which each row corresponds to a term
and each column to a document. Naturally, some documents may not contain a given term, so this
matrix is sparse. The value in each cell of the matrix is the term frequency.
tm makes it very easy to create the term-document matrix. With the term-document matrix made,
we can then proceed to build a word cloud for Hillary’s emails, highlighting which words appear
most frequently.
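A sketch of both steps, assuming the preprocessed docs corpus from above. On the full email set the dense matrix can get large, so you may want to trim rare terms with removeSparseTerms() first.

library(wordcloud)
library(RColorBrewer)  # color palettes for the cloud

# Count how often each term appears across all emails
tdm <- TermDocumentMatrix(docs)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Plot the 100 most frequent words
wordcloud(names(freq), freq, max.words = 100,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))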
Sentiment analysis
Sentiment analysis is the process of determining whether a piece of writing is positive, negative, or
neutral. Here, we’ll work with the package “syuzhet”.
Just as in the previous example, we’ll read the emails from the database.
“syuzhet” uses the NRC Emotion Lexicon, a list of words and their
associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust)
and two sentiments (negative and positive).
The get_nrc_sentiment function returns a data frame in which each row represents a sentence
from the original file. The columns include one for each emotion type as well as the positive and
negative sentiment valence. It allows us to take a body of text and return which emotions it
represents, and also whether the sentiment is positive or negative.
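A minimal sketch, again assuming the emails data frame from the RSQLite step:

library(syuzhet)

# Score each email body against the NRC lexicon
sentiments <- get_nrc_sentiment(emails$ExtractedBodyText)

# One column per emotion, plus negative and positive counts
head(sentiments)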
Now, we’ll use “ggplot2” to create a bar graph. Each bar represents how prominent each
emotion is in the text.
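One way to draw that graph from the sentiments data frame above; the first eight columns returned by get_nrc_sentiment are the emotion counts:

library(ggplot2)

# Total up each emotion column across all emails
totals <- data.frame(
  emotion = names(sentiments)[1:8],
  count = colSums(sentiments[, 1:8])
)

ggplot(totals, aes(x = reorder(emotion, -count), y = count, fill = emotion)) +
  geom_col() +
  labs(x = "Emotion", y = "Total count",
       title = "Emotions in Hillary Clinton's emails") +
  theme(legend.position = "none")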
N-grams
You must have noticed YouTube’s auto-captioning feature. Auto-captioning is a speech recognition
problem, and one part of generating captions automatically from audio is predicting what word
comes after a given sequence of words. For example, given a phrase like “When you get there, give
me a ___”, what word comes next?
Hopefully, you concluded that the next word in the sequence is “call”. We do this by first analyzing
which words frequently co-occur. We formalize this by introducing n-grams. An n-gram is a
contiguous sequence of n items from a given sequence of text or speech. In other words, we’ll be
finding collocations. A collocation is a sequence of words or terms that co-occur more often than
would be expected by chance. An example of this would be the term “very much.”
In this section, we’ll use the R library “quanteda” to compute tri-grams and find commonly occurring
sequences of three words.
We’ll use quanteda’s collocations function to do so. Finally, we’ll remove stopwords from the
collocations so we can get a clear view of the most frequently used three-word
sequences in Hillary’s emails.
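A sketch of this step. Note that the collocations() function from the original post has since been superseded; in current releases of quanteda the equivalent is textstat_collocations() in the companion quanteda.textstats package, which this sketch uses, again starting from the emails data frame.

library(quanteda)
library(quanteda.textstats)  # home of textstat_collocations in current quanteda

# Tokenize the email bodies and drop punctuation, numbers, and stopwords
toks <- tokens(emails$ExtractedBodyText,
               remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(toks, stopwords("english"))

# Score three-word collocations and show the most frequent ones
trigrams <- textstat_collocations(toks, size = 3)
head(trigrams[order(-trigrams$count), ], 10)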
Conclusion
We set out to show you how to do some of the most common text mining tasks in R, with
examples and sample code. Leave a comment below if you think we’re missing something or if you
want to add anything to this discussion of text mining in R!