Text Mining Code

The document discusses loading packages and preparing text data for analysis in R. It shows how to create a corpus from text, clean the text by removing numbers, stopwords, and punctuation, build a term-document matrix, and generate a word cloud visualizing the most frequent terms.


#Install and load the required packages

# for text mining
install.packages("tm")
# for text stemming
install.packages("SnowballC")
# for word-cloud generation
install.packages("wordcloud")
# for colour palettes
install.packages("RColorBrewer")
# for reading Excel files
install.packages("readxl")

# Load
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
library("readxl")

#Text mining

#Load the data as a corpus
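The script passes an object called Text to VectorSource() without ever creating it, even though readxl is loaded. A minimal sketch of the missing step, assuming the text sits in a column named "text" of a file "data.xlsx" (both the file name and the column name are hypothetical, not from the source):

# Read the raw text from an Excel file (hypothetical file and column names)
data <- read_excel("data.xlsx")
Text <- data$text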

docs <- Corpus(VectorSource(Text))

#Cleaning the text

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stopwords, specified as a character vector
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
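SnowballC is installed "for text stemming", but the script never stems the corpus. A hedged sketch of two cleaning steps commonly added at this point; note that stemming changes the words that appear in the cloud, so it is optional:

# Eliminate extra whitespace left behind by the removals
docs <- tm_map(docs, stripWhitespace)
# Reduce words to their root form (uses SnowballC)
docs <- tm_map(docs, stemDocument)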

#Build a term-document matrix

dtm <- TermDocumentMatrix(docs)
# Convert to a plain matrix and sum each term's frequency across documents
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
# Data frame of words and their frequencies, most frequent first
d <- data.frame(word = names(v), freq = v)
head(d, 10)
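head(d, 10) prints the ten most frequent terms as a table; the same information can be drawn as a bar chart with base R. A minimal sketch, assuming d contains at least ten rows:

# Bar plot of the ten most frequent terms
barplot(d[1:10, ]$freq, las = 2, names.arg = d[1:10, ]$word,
        col = "lightblue", main = "Most frequent words",
        ylab = "Word frequencies")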

#Generate the word cloud


# Fix the random layout so the cloud is reproducible
set.seed(1234)
# min.freq drops rarer words, max.words caps the cloud size,
# rot.per sets the share of words rotated 90 degrees
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))

#Explore frequent terms

# Terms that appear at least 4 times across the corpus
findFreqTerms(dtm, lowfreq = 4)
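Beyond raw frequency, tm can also report which terms tend to occur together. A hedged sketch using findAssocs(); the term "data" is a placeholder for any word of interest, not taken from the source:

# Terms correlated with "data" at r >= 0.3 (placeholder term)
findAssocs(dtm, terms = "data", corlimit = 0.3)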

# Save the frequency table to the working directory
write.csv(d, "result.csv")
# Print the working directory so you can locate result.csv
getwd()
