
Text Analysis in R
Table of Contents

1 Text Analysis: Introduction
2 Text Analysis in R
2.1 Text Analysis of a text ta(a)
2.2 Text Analysis of a text ta(b)
2.3 Text Analysis of a text ta(c)
2.4 Sentiment Analysis of a text ta(d)
2.5 Sentiment Analysis of a CSV file ta(e)
2.6 Sentiment Analysis of a PDF file ta(f)
2.7 Sentiment Analysis of Chapter 7.1 (HKD)
2.8 Sentiment Analysis of Chapter 7.2 (HKD)
1 Text Analysis: Introduction
Suppose you have a mountain of text data: customer reviews, news articles, books – the
possibilities are endless. Text analysis is like having a powerful magnifying glass and a set of
tools to sift through this mountain and uncover hidden patterns, understand the underlying
meaning, and extract valuable insights.

Think of it this way:

You have a box full of jigsaw puzzles. Each puzzle piece is a word, and the entire box is a
collection of texts.

Text analysis helps you:

• Find all the corner pieces: Identify the most frequent words (like "the," "a," "is")
– these are common but not always the most meaningful.
• Group similar pieces: Find words that often appear together (like "delicious" and
"food," "fast" and "delivery") to understand themes and topics.
• Determine the overall picture: Analyse the sentiment (positive, negative, neutral)
expressed in the text, identify the main topics discussed, and even predict future
trends.

Let's take a simple example.

Let's say you have a collection of customer reviews for a restaurant. You can use text analysis to:

• Identify common words: "delicious," "tasty," "service," "slow," "friendly," "disappointed."
• Analyse sentiment: Determine if the overall sentiment of the reviews is positive, negative, or neutral.
• Find common themes: Identify recurring themes, such as slow service, delicious food, or friendly staff.

Key Techniques in Text Analysis:

• Tokenisation: Breaking down text into individual words or sentences.
• Sentiment Analysis: Determining the emotional tone of the text (positive, negative, neutral).
• Topic Modeling: Identifying the main topics discussed in the text.
• Named Entity Recognition: Identifying and classifying named entities (people, organizations, locations).

Tools for Text Analysis:

• R: A powerful programming language with many libraries for text analysis (like tidytext, tm, sentimentr); a minimal example follows this list.
• Python: Another popular language with libraries like NLTK, spaCy, and scikit-learn.
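
As a minimal sketch of what tokenisation and word-level sentiment look like in R with tidytext (the data frame, column names, and reviews below are made up for illustration; they are not the examples used later in this document):

library(dplyr)
library(tidytext)

# Illustrative data: three made-up restaurant reviews
toy_reviews <- data.frame(
  id   = 1:3,
  text = c("The food was delicious and the staff friendly",
           "Service was slow and I was disappointed",
           "Tasty dishes and fast delivery")
)

# Tokenisation: one row per word
tokens <- toy_reviews %>% unnest_tokens(word, text)

# Most frequent words after dropping common stop words
tokens %>% anti_join(stop_words, by = "word") %>% count(word, sort = TRUE)

# Word-level sentiment: label words with the Bing lexicon and tally positive vs negative
tokens %>% inner_join(get_sentiments("bing"), by = "word") %>% count(sentiment)

The sections below do the same kind of work with the tm package instead of tidytext.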

Text analysis is a rapidly growing field with applications in various domains, including
business, marketing, social sciences, and even healthcare.

Text Analysis Flowchart (processing steps and the R packages used at each stage):

• Import Data: dplyr, tidyverse, pdftools, VCorpus
• Data Cleaning: tm, tidytext, textstem
• Lemmatisation
• Plot: ggplot2
• Word cloud: wordcloud
• Sentiment Analysis: syuzhet
2 Text Analysis in R
2.1 Text Analysis of a text ta(a)
library(dplyr)
library(tm)
library(ggplot2)
library(tidyverse)
library(tidytext)
library(pdftools)
library(wordcloud)
library(textstem)
# First Text mining Process
text_data <- data.frame(cbind(
  id = 1:6,
  text = c("I am Anubha - finance teacher eager to explore the potential of AI in education.",
           "My goal is to provide the best possible learning experience for my students.",
           "I believe that AI tools can revolutionise the way we teach and learn.",
           "I am open to collaborating with anyone who shares my passion for using AI to enhance education.",
           "I am a firm believer in student-centred learning and inquiry-based learning.",
           "I love data science. Data science is amazing!")
))

#A data frame source interprets each row of the data frame x as a document.
#The first column must be named "doc_id" and contain a unique string identifier for each document.
#The second column must be named "text"
colnames(text_data)=c('doc_id','text')
crp<- VCorpus(DataframeSource(text_data))
#VCorpus stands for Volatile Corpus (VCorpus is suitable for smaller datasets that can be comfortably held in memory)
print(crp[[1]])
#tm_map() allows you to apply a specified function to each document within a corpus.
crp<- tm_map(crp, content_transformer(tolower))
print(crp[[1]]$content)
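
To check the effect of a transformation on every document rather than only the first, a small sketch (not part of the original script; it just loops over the corpus built above) is:

# Print the lower-cased content of each document in the corpus
for (i in seq_along(crp)) {
  cat("Doc", i, ":", content(crp[[i]]), "\n")
}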

2.2 Text Analysis of a text ta(b)


library(dplyr)
library(tm)
library(ggplot2)
library(tidyverse)
library(tidytext)
library(pdftools)
library(wordcloud)
library(textstem)
# First Text mining Process

text_data <- data.frame(cbind(
  id = 1:6,
  text = c("I am Anubha - finance teacher eager to explore the potential of AI in education.",
           "My goal is to provide the best possible learning experience for my students.",
           "I believe that AI tools can revolutionise the way we teach and learn.",
           "I am open to collaborating with anyone who shares my passion for using AI to enhance education.",
           "I am a firm believer in student-centred learning and inquiry-based learning.",
           "I love data science. Data science is amazing!")
))

#A data frame source interprets each row of the data frame x as a document.
#The first column must be named "doc_id" and contain a unique string identifier for each document.
#The second column must be named "text"
colnames(text_data)=c('doc_id','text')
crp<- VCorpus(DataframeSource(text_data))
#VCorpus stands for Volatile Corpus (VCorpus is suitable for smaller datasets that can be comfortably held in memory)
print(crp[[1]])
#tm_map() allows you to apply a specified function to each document within a corpus.
crp<- tm_map(crp, content_transformer(tolower))
print(crp[[1]]$content)
crp<- tm_map(crp, stripWhitespace) # removes whitespaces
crp<- tm_map(crp, removePunctuation) # removes punctuations
crp<- tm_map(crp, removeNumbers) # removes numbers
crp<- tm_map(crp, removeWords, stopwords("english"))
#Examples of common English stop words:
##Articles: a, an, the
##Prepositions: in, on, at, to, from, with, for
##Conjunctions: and, but, or, if, because
##Pronouns: I, you, he, she, it, they, we, me, him, her, them, us
##Other: no, not, only, very, this, that, these, those
mystopwords<- c(stopwords("english"),"anubha")
crp<- tm_map(crp, removeWords, mystopwords)

# Lemmatization (Lemmatization in Natural Language Processing (NLP) is the process of reducing a word to its base or dictionary form, known as the lemma)
crp<- tm_map(crp, content_transformer(lemmatize_strings))
print(crp[[1]]$content)
review_corpus<- crp

tdm <- TermDocumentMatrix(review_corpus, control = list(wordLengths = c(1, Inf)))


# inspect frequent words
freq_terms<- findFreqTerms(tdm, lowfreq=1)

term_freq<- rowSums(as.matrix(tdm))
term_freq<- subset(term_freq, term_freq>=1)
df<- data.frame(term = names(term_freq), freq = term_freq)
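
Before plotting (which the next variant adds), a quick and purely illustrative way to glance at the most frequent terms in the df built above:

# Show the ten most frequent terms, highest count first
head(df[order(-df$freq), ], 10)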

2.3 Text Analysis of a text ta(c)
library(dplyr)
library(tm)
library(ggplot2)
library(tidyverse)
library(tidytext)
library(pdftools)
library(wordcloud)
library(textstem)
# First Text mining Process
text_data <- data.frame(cbind(
  id = 1:6,
  text = c("I am Anubha - finance teacher eager to explore the potential of AI in education.",
           "My goal is to provide the best possible learning experience for my students.",
           "I believe that AI tools can revolutionise the way we teach and learn.",
           "I am open to collaborating with anyone who shares my passion for using AI to enhance education.",
           "I am a firm believer in student-centred learning and inquiry-based learning.",
           "I love data science. Data science is amazing!")
))

#A data frame source interprets each row of the data frame x as a document.
#The first column must be named "doc_id" and contain a unique string identifier for each document.
#The second column must be named "text"
colnames(text_data)=c('doc_id','text')
crp<- VCorpus(DataframeSource(text_data))

print(crp[[1]])

crp<- tm_map(crp, content_transformer(tolower))


print(crp[[1]]$content)
crp<- tm_map(crp, stripWhitespace) # removes whitespaces
crp<- tm_map(crp, removePunctuation) # removes punctuations
crp<- tm_map(crp, removeNumbers) # removes numbers
crp<- tm_map(crp, removeWords, stopwords("english"))
mystopwords<- c(stopwords("english"),"anubha", "godara")
crp<- tm_map(crp, removeWords, mystopwords)

# Lemmatization
crp<- tm_map(crp, content_transformer(lemmatize_strings))
print(crp[[1]]$content)
review_corpus<- crp

tdm <- TermDocumentMatrix(review_corpus, control = list(wordLengths = c(1, Inf)))


# inspect frequent words
freq_terms<- findFreqTerms(tdm, lowfreq=1)

term_freq<- rowSums(as.matrix(tdm))
term_freq<- subset(term_freq, term_freq>=1)
df<- data.frame(term = names(term_freq), freq = term_freq)

#association of words
find_assocs= findAssocs(tdm,"text",corlimit = 0.1)

# Now plotting the top frequent words


library(ggplot2)

df_plot <- df %>%
  top_n(10, freq)

# Plot word frequency

ggplot(df_plot, aes(x = reorder(term, freq), y = freq, fill = freq)) +
  geom_bar(stat = "identity") +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  xlab("Terms") + ylab("Count") +
  coord_flip()

# Create word cloud


wordcloud(words = df$term, freq = df$freq, min.freq = 1,
random.order = FALSE, colors = brewer.pal(8, "Dark2"))
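
The findAssocs() call earlier in this section stores the word associations but never displays them; a quick, illustrative inspection (the result is an empty list if the term "text" does not occur in this small corpus) is simply:

# Print terms correlated with "text" (empty if the term is absent from the TDM)
find_assocs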

2.4 Sentiment Analysis of a text ta(d)


library(dplyr)
library(tm)
library(ggplot2)
library(tidyverse)
library(tidytext)
library(pdftools)
library(wordcloud)
library(textstem)
# First Text mining Process
text_data <- data.frame(cbind(
  id = 1:6,
  text = c("I am Anubha - finance teacher eager to explore the potential of AI in education.",
           "My goal is to provide the best possible learning experience for my students.",
           "I believe that AI tools can revolutionise the way we teach and learn.",
           "I am open to collaborating with anyone who shares my passion for using AI to enhance education.",
           "I am a firm believer in student-centred learning and inquiry-based learning.",
           "I love data science. Data science is amazing!")
))

#A data frame source interprets each row of the data frame x as a document.
#The first column must be named "doc_id" and contain a unique string identifier for each document.
#The second column must be named "text"

colnames(text_data)=c('doc_id','text')
crp<- VCorpus(DataframeSource(text_data))

print(crp[[1]])

crp<- tm_map(crp, content_transformer(tolower))


print(crp[[1]]$content)
crp<- tm_map(crp, stripWhitespace) # removes whitespaces
crp<- tm_map(crp, removePunctuation) # removes punctuations
crp<- tm_map(crp, removeNumbers) # removes numbers
crp<- tm_map(crp, removeWords, stopwords("english"))
mystopwords<- c(stopwords("english"),"anubha")
crp<- tm_map(crp, removeWords, mystopwords)

# Lemmatization
crp<- tm_map(crp, content_transformer(lemmatize_strings))
print(crp[[1]]$content)
review_corpus<- crp

tdm <- TermDocumentMatrix(review_corpus, control = list(wordLengths = c(1, Inf)))


# inspect frequent words
freq_terms<- findFreqTerms(tdm, lowfreq=1)

term_freq<- rowSums(as.matrix(tdm))
term_freq<- subset(term_freq, term_freq>=1)
df<- data.frame(term = names(term_freq), freq = term_freq)

#association of words
find_assocs= findAssocs(tdm,"text",corlimit = 0.1)

# Now plotting the top frequent words


library(ggplot2)

df_plot <- df %>%
  top_n(10, freq)

# Plot word frequency

ggplot(df_plot, aes(x = reorder(term, freq), y = freq, fill = freq)) +
  geom_bar(stat = "identity") +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  xlab("Terms") + ylab("Count") +
  coord_flip()

# Create word cloud


wordcloud(words = df$term, freq = df$freq, min.freq = 1,
random.order = FALSE, colors = brewer.pal(8, "Dark2"))

# Get sentiment lexicon


sentiment_lexicon <- get_sentiments("bing")

colnames(df)[1]='word'
# Perform sentiment analysis
sentiment_analysis <- df %>%
  inner_join(sentiment_lexicon, by = "word") %>%
  count(word, sentiment, sort = TRUE)

# View sentiment analysis


print(sentiment_analysis)
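
The table above lists each matched word with its Bing label. For an overall picture, a small follow-on sketch (reusing df and sentiment_lexicon from above, and weighting each word by how often it occurs) could be:

# Tally positive vs negative words, weighted by word frequency
sentiment_totals <- df %>%
  inner_join(sentiment_lexicon, by = "word") %>%
  count(sentiment, wt = freq)

ggplot(sentiment_totals, aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col() +
  xlab("Sentiment") + ylab("Weighted word count")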

2.5 Sentiment Analysis of a CSV file ta(e)


library(dplyr)
library(tm)
library(ggplot2)
library(tidyverse)
library(tidytext)
library(pdftools)
library(wordcloud)
library(textstem)
#Analysing text from a CSV file of reviews
data=read.csv("C:/Users/ADMIN/OneDrive/Desktop/R/AM/amazon_vfl_reviews_session2.csv")
summary(data)
str(data)
data$sn <- as.character(seq_len(nrow(data)))  # unique string identifier for each review

colnames(data)[c(6, 5)] <- c('doc_id', 'text')
# DataframeSource expects "doc_id" as the first column and "text" as the second
data <- data[, c('doc_id', 'text')]

crp <- VCorpus(DataframeSource(data))

print(crp[[1]])
crp<- tm_map(crp, content_transformer(tolower))
print(crp[[1]]$content)
crp<- tm_map(crp, stripWhitespace) # removes whitespaces
crp<- tm_map(crp, removePunctuation) # removes punctuations
crp<- tm_map(crp, removeNumbers) # removes numbers
crp<- tm_map(crp, removeWords, stopwords("english"))
mystopwords<- c(stopwords("english"),"book","people")
crp<- tm_map(crp, removeWords, mystopwords)

# Lemmatization
crp<- tm_map(crp, content_transformer(lemmatize_strings))
print(crp[[1]]$content)
review_corpus<- crp

tdm <- TermDocumentMatrix(review_corpus, control = list(wordLengths = c(1, Inf)))


# inspect frequent words
freq_terms<- findFreqTerms(tdm, lowfreq=1)

term_freq<- rowSums(as.matrix(tdm))
term_freq<- subset(term_freq, term_freq>=1)
df<- data.frame(term = names(term_freq), freq = term_freq)

#association of words
find_assocs= findAssocs(tdm,"text",corlimit = 0.1)

# Now plotting the top frequent words


library(ggplot2)

df_plot <- df %>%
  top_n(10, freq)

# Plot word frequency

ggplot(df_plot, aes(x = reorder(term, freq), y = freq, fill = freq)) +
  geom_bar(stat = "identity") +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  xlab("Terms") + ylab("Count") +
  coord_flip()

# Create word cloud


wordcloud(words = df$term, freq = df$freq, min.freq = 1,
random.order = FALSE, colors = brewer.pal(8, "Dark2"))

# Get sentiment lexicon


sentiment_lexicon <- get_sentiments("bing")

colnames(df)[1]='word'
# Perform sentiment analysis
sentiment_analysis <- df %>%
  inner_join(sentiment_lexicon, by = "word") %>%
  count(word, sentiment, sort = TRUE)

# View sentiment analysis


print(sentiment_analysis)
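
Beyond word-level Bing labels, an overall sentiment score per review can be computed directly on the raw CSV text. A small sketch using syuzhet (assuming the text column created above; syuzhet appears in the flowchart but is not loaded earlier in this script):

library(syuzhet)

# One sentiment score per review (positive values indicate more positive wording)
review_scores <- get_sentiment(iconv(data$text), method = "syuzhet")
summary(review_scores)
hist(review_scores, main = "Sentiment scores of reviews", xlab = "Score")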

2.6 Sentiment Analysis of a PDF file ta(f)


#Read PDF Files
##Reading PDF Files From location
#identifying multiple pdf files from folder
library(pdftools)
library(tm)
stop_words2=c(stopwords("en"),"makes")
setwd("C:/Users/ADMIN/OneDrive/Desktop/R/AM/PDF")

files<- list.files(pattern = "pdf$")


files #files contain the named vector of pdf files

read_function<- readPDF(control=list(text="-layout"))
read_corpus<- Corpus(URISource(files[1:5]),readerControl = list(reader=read_function))

read_corpus<-tm_map(read_corpus,removePunctuation)

dtm <- DocumentTermMatrix(read_corpus,
                          control = list(removePunctuation = TRUE, stopwords = TRUE,
                                         tolower = TRUE, removeNumbers = TRUE,
                                         stemming = TRUE,
                                         bounds = list(global = c(3, Inf))))

dtm_matrix<-as.matrix(dtm) # converting dtm to a matrix so that data becomes viewable


#some inverted commas, hashtags etc. are not removed by removePunctuation,
# so we can use the textclean package for those cases.

View(dtm_matrix) # running this might take 5 to 10 seconds as it shows the word count of each word in 15 pdfs

dtm_matrix <- t(dtm_matrix) # transpose so that terms are rows and documents are columns

number_occurance <- rowSums(dtm_matrix) # use rowSums, not rowsum, as this is a matrix

number_occurance[1:20] # square brackets limit the output to the first 20 terms

number_occurance_sorted <- sort(number_occurance, decreasing = TRUE)

number_occurance_sorted[1:20] # the 20 most frequent terms

library(wordcloud)
set.seed(123)
wordcloud(names(number_occurance_sorted), number_occurance_sorted, max.words=25,
scale=c(3, .1), colors=brewer.pal(6, "Dark2"))

cor_word <- findAssocs(dtm, "marketing", corlimit = 0.2)

cor_word$marketing[1:20] # as we are correlating with "marketing"

library(treemap)

data_frame<- data.frame(word=names(number_occurance_sorted),
freq=number_occurance_sorted)
data_frame[1:20,]

# Treemap of all words above a minimum frequency
treemap(subset(data_frame, freq > 10), index = c('word'), vSize = 'freq')

# Treemap of a fixed number of top words (here, the 10 most frequent)
treemap(data_frame[1:10, ], index = c('word'), vSize = 'freq')

#cluster analysis (use only the numeric freq column; the words remain as row names / labels)
distance <- dist(data_frame[1:20, "freq", drop = FALSE])

distance
clust <- hclust(distance)
plot(clust) # add hang = -1 inside plot() for symmetric cluster roots
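
This section's heading mentions sentiment analysis, but the code above stops at clustering. A minimal, hedged sketch of how the PDF text in read_corpus could be scored with syuzhet (object names reuse those defined above; this is not part of the original script):

library(syuzhet)

# Collapse each PDF document into a single string
pdf_text_vec <- sapply(read_corpus, function(doc) paste(content(doc), collapse = " "))

# NRC emotion and polarity counts per document, then an overall bar plot
pdf_sent <- get_nrc_sentiment(iconv(pdf_text_vec))
head(pdf_sent)
barplot(colSums(pdf_sent), las = 2, col = rainbow(10),
        ylab = "Count", main = "Sentiment across PDF documents")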

2.7 Sentiment Analysis of Chapter 7.1 (HKD)


#install readxl (run once if not already installed)
#install.packages("readxl")
library(readxl)
#Replace the path below with the actual path to your Excel file
reviews <- read_excel("C:/Users/ADMIN/OneDrive/Desktop/R/R Data/socialmediareviews.xlsx")
#install tm (run once if not already installed)
#install.packages("tm")
library(tm)
review_corp<-VCorpus(VectorSource(reviews$reviews))
review_corp[[1]]$content  # inspect the first review
review_corp <- tm_map(review_corp, removeWords,
                      c("now","Know","took","that's","air","away","war","job","one","like",
                        "actually","new","guy","don't","things","lot","try","bit","anything",
                        "thing","say","also","can","get","used","got","take","just","will",
                        "it's","want","whatever","become","said","given","give","much"))
head(reviews)
tail(reviews)
tdm <- TermDocumentMatrix(review_corp, control = list(removePunctuation = TRUE, stopwords = TRUE))
tdm_matrix <- as.matrix(tdm) # a TermDocumentMatrix already has terms as rows, so no transpose is needed
tdm_matrix[1:20]
number_occurrence <- rowSums(tdm_matrix) # total count of each term across all reviews
number_occurrence[1:20]
number_occurrence_sorted <- sort(number_occurrence, decreasing = TRUE)
number_occurrence_sorted[1:60]

2.8 Sentiment Analysis of Chapter 7.2 (HKD)


#install readxl (run once if not already installed)
#install.packages("readxl")
library(readxl)
#Replace the path below with the actual path to your Excel file
reviews <- read_excel("C:/Users/ADMIN/OneDrive/Desktop/R/R Data/socialmediareviews.xlsx")
#install tm (run once if not already installed)
#install.packages("tm")
library(tm)
review_corp<-VCorpus(VectorSource(reviews$reviews))
review_corp[[1]]$content  # inspect the first review
review_corp <- tm_map(review_corp, removeWords,
                      c("now","Know","took","that's","air","away","war","job","one","like",
                        "actually","new","guy","don't","things","lot","try","bit","anything",
                        "thing","say","also","can","get","used","got","take","just","will",
                        "it's","want","whatever","become","said","given","give","much"))
head(reviews)
tail(reviews)
tdm <- TermDocumentMatrix(review_corp, control = list(removePunctuation = TRUE, stopwords = TRUE))
tdm_matrix <- as.matrix(tdm) # a TermDocumentMatrix already has terms as rows, so no transpose is needed
tdm_matrix[1:20]
number_occurrence <- rowSums(tdm_matrix) # total count of each term across all reviews
number_occurrence[1:20]
number_occurrence_sorted <- sort(number_occurrence, decreasing = TRUE)
number_occurrence_sorted[1:60]

library(wordcloud)
wordcloud(names(number_occurrence_sorted), number_occurrence_sorted, max.words=25,
scale=c(3, .1), colors=brewer.pal(6, "Dark2"))

#association of words
cor_word <- findAssocs(tdm, "time", corlimit = 0.1)
cor_word$time

library(treemap)
data_frame<- data.frame(word=names(number_occurrence_sorted),
freq=number_occurrence_sorted)
data_frame[1:20,]

# Treemap of all words above a minimum frequency
treemap(subset(data_frame, freq > 10), index = c('word'), vSize = 'freq')

# Treemap of a fixed number of top words (here, the 10 most frequent)
treemap(data_frame[1:10, ], index = c('word'), vSize = 'freq')

#cluster analysis (use only the numeric freq column; the words remain as row names / labels)
distance <- dist(data_frame[1:20, "freq", drop = FALSE])
distance
clust<- hclust(distance)
plot(clust)

#sentiment analysis
library(syuzhet)
sent_corpus<- iconv(reviews$reviews)
review_sent<- get_nrc_sentiment(sent_corpus)
head(review_sent)
sentiment_counts <- colSums(review_sent)
barplot(sentiment_counts, las = 2, col = rainbow(10), ylab = 'Count', main = 'Sentiment of reviews')

#Important Notes:
#Accuracy: Sentiment analysis accuracy depends heavily on the quality of the text data, the
# chosen lexicon or model, and the complexity of the text.
#Lexicon Selection: get_nrc_sentiment() from the syuzhet package uses the NRC lexicon. You can
# explore other lexicons (e.g., Bing Liu, AFINN) for potentially better results.
#Advanced Techniques: For more sophisticated sentiment analysis, consider using machine
# learning models like Naive Bayes or Support Vector Machines.
#Error Handling: Implement robust error handling for potential issues like invalid input files
# or unexpected text formats.
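
To act on the lexicon-selection note, a short sketch comparing lexicons with syuzhet's get_sentiment() (reusing sent_corpus from the code above; "afinn", "bing", and "nrc" are methods built into syuzhet) could be:

# Document-level polarity scores under three different lexicons
afinn_scores <- get_sentiment(sent_corpus, method = "afinn")
bing_scores  <- get_sentiment(sent_corpus, method = "bing")
nrc_scores   <- get_sentiment(sent_corpus, method = "nrc")

# Compare overall polarity and how strongly the lexicons agree
summary(data.frame(afinn = afinn_scores, bing = bing_scores, nrc = nrc_scores))
cor(cbind(afinn_scores, bing_scores, nrc_scores))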
