This document outlines the steps to perform sentiment analysis on tweets about Samsung Galaxy phones collected from Seattle, Washington. It establishes a connection to the Twitter API, defines a function to score sentiment, searches Twitter for relevant tweets, cleans the tweet text, loads sentiment dictionaries, scores the tweets, extracts additional tweet metadata, and writes the results to a CSV file.

Uploaded by Samuel Peoples

TwitterMining

December 23, 2017

1 Twitter Text Mining - Required Libraries

In [2]: library(twitteR)
library(ROAuth)
library(RCurl)
library(httr)

library(stringr)
library(plyr)
library(dplyr)
library(tm)

#library(ggmap)
#library(wordcloud)

2 Establishing A Connection - Direct Method

Enter your key, secret, access token, and access secret from your Twitter developer page.

In [ ]: key=" "
secret=" "

atoken = " "


asecret = " "

setup_twitter_oauth(key, secret, atoken, asecret)

3 Sentiment Score Function - approach after J. Breen

In [ ]: library("stringr")
library("plyr")

# Function is called sentimentfun
sentimentfun = function(tweettext, pos, neg, .progress='none')
{
    # Parameters
    # tweettext: vector of text to score
    # pos: vector of words of positive sentiment
    # neg: vector of words of negative sentiment
    # .progress: passed to laply() for control of the progress bar

    # create simple array of scores with laply
    scores = laply(tweettext,
        function(singletweet, pos, neg){
            # remove punctuation - using global substitute
            singletweet = gsub("[[:punct:]]", "", singletweet)
            # remove control characters
            singletweet = gsub("[[:cntrl:]]", "", singletweet)
            # remove digits
            singletweet = gsub("\\d+", "", singletweet)

            # define error-handling wrapper for tolower
            tryTolower = function(x){
                # create missing value
                y = NA
                # tryCatch error
                try_error = tryCatch(tolower(x), error=function(e) e)
                # if not an error
                if (!inherits(try_error, "error"))
                    y = tolower(x)
                # result
                return(y)}
            # use tryTolower with sapply
            singletweet = sapply(singletweet, tryTolower)

            # split sentence into words with str_split (stringr package)
            word.list = str_split(singletweet, "\\s+")
            words = unlist(word.list)

            # compare words to the dictionaries of positive & negative terms;
            # match() gives the position of the matched term or NA,
            # but we just want a TRUE/FALSE
            pos.matches = !is.na(match(words, pos))
            neg.matches = !is.na(match(words, neg))

            # final score
            score = sum(pos.matches) - sum(neg.matches)
            return(score)},
        pos, neg, .progress=.progress)

    # data frame with scores for each sentence
    sentiment.df = data.frame(text=tweettext, score=scores)
    return(sentiment.df)
}
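As a quick sanity check, the word-matching core of the function can be exercised on a toy sentence. The mini `pos`/`neg` vectors here are illustrative only, not the full opinion lexicons loaded later:

```r
# Toy positive/negative vectors (illustrative only, not the real lexicons)
pos <- c("love", "great", "good")
neg <- c("hate", "bad", "broken")

text  <- "I love my new phone but the screen is broken"
words <- unlist(strsplit(tolower(text), "\\s+"))

# same logic as inside sentimentfun: count matches in each dictionary
score <- sum(!is.na(match(words, pos))) - sum(!is.na(match(words, neg)))
score  # one positive ("love") and one negative ("broken") word -> 0
```

One positive and one negative hit cancel out, so the net score is 0; a tweet with only positive matches would score above zero.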

4 Using searchTwitter for our project

• Los Angeles, geocode="34.052,-118.244,200mi"
• New York, geocode="40.713,-74.006,200mi"
• Austin, geocode="30.267,-97.743,500mi"
• Seattle, geocode="47.606,-122.332,500mi"

4.0.1 Searching for 'apple+iphone' or 'samsung+galaxy'

4.0.2 The since date can be at most 14 days prior to the run date, due to API restrictions

In [ ]: tweets = searchTwitter("samsung+galaxy", n=2000,
                       lang="en",
                       geocode="47.606,-122.332,500mi",
                       since = "2017-12-04")

5 Extracting the text

In [ ]: tweettext = sapply(tweets, function(x) x$getText())

5.1 First cleaning stage


In [ ]: tweettext = lapply(tweettext, function(x) iconv(x, "latin1", "ASCII", sub=""))
tweettext = lapply(tweettext, function(x) gsub("htt.*", ' ', x))
tweettext = lapply(tweettext, function(x) gsub("#", '', x))
tweettext = unlist(tweettext)
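The effect of the three cleaning steps can be seen on a made-up tweet (the sample text and shortened URL are invented for illustration):

```r
x <- "Loving my #galaxy so far! https://t.co/abc123"
x <- iconv(x, "latin1", "ASCII", sub = "")  # drop non-ASCII characters (no-op here)
x <- gsub("htt.*", " ", x)                  # strip the URL and everything after it
x <- gsub("#", "", x)                       # drop the hashtag symbol, keep the word
x
```

Note that `"htt.*"` removes everything from the first `htt` to the end of the string, so any text after a link is lost as well.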

6 Getting the opinion lexicons from working directory

In [ ]: pos = readLines("positive_words.txt")
neg = readLines("negative_words.txt")

neg2 = c(neg, "bearish", "fraud"); tail(neg2)  # extended negative lexicon (defined but not used below)

6.1 Apply function sentimentfun


In [ ]: scores = sentimentfun(tweettext, pos, neg, .progress='text')

7 Extracting further elements (besides text) for the export csv

In [ ]: tweetdate = lapply(tweets, function(x) x$getCreated())
tweetdate = sapply(tweetdate, function(x) strftime(
    x, format="%Y-%m-%d %H:%M:%S", tz = "UTC"))

isretweet = sapply(tweets, function(x) x$getIsRetweet())
retweetcount = sapply(tweets, function(x) x$getRetweetCount())
favoritecount = sapply(tweets, function(x) x$getFavoriteCount())

8 Creating the Data Frame

In [ ]: # build the data frame directly; as.data.frame(cbind(...)) would
# coerce every column to character via the intermediate matrix
data = data.frame(ttext = tweettext,
                  date = tweetdate,
                  isretweet = isretweet,
                  retweetcount = retweetcount,
                  favoritecount = favoritecount,
                  score = scores$score,
                  product = "Samsung Galaxy",
                  city = "Seattle",
                  country = "USA")

8.1 Remove duplicates


In [ ]: data2 = duplicated(data[,1])
data$duplicate = data2

8.2 Write file to working directory


In [ ]: write.csv(data, file= "samsung_seattle.csv")
