Amber Heard - Twitter BotScores Machine Learning Analysis Report
ANALYSIS REPORT
“AMBER HEARD”
Machine Learning Approach Over 2020-2021
• The model's original author trained it on the Kaggle twitter-bot-dataset. Unfortunately, that
dataset is no longer available, so I downloaded a new dataset from the Botometer website.
Data processing
The Tweets_2020 data had more than 600k rows, so it needed heavy wrangling.
First, I sorted the rows by created_at_1 (i.e. most recent tweets first) to get each user's last
tweet. Taking the difference between that timestamp and created_at (the account creation date)
gives the account age at the time the dataset was collected.
Then I used user.screen_name to remove duplicate users.
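The sorting, de-duplication and age steps above can be sketched in pandas as follows. This is a minimal sketch, not the notebook's code: the column names come from the text (created_at_1 as the tweet time, created_at as the account creation time, user.screen_name), while the function name account_age_per_user and the output column account_age_days are hypothetical.

```python
import pandas as pd


def account_age_per_user(tweets: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep each user's latest tweet and compute account age.

    Assumes 'created_at_1' holds the tweet time and 'created_at' the account
    creation time, as described in the report.
    """
    df = tweets.copy()
    df["created_at_1"] = pd.to_datetime(df["created_at_1"], utc=True)
    df["created_at"] = pd.to_datetime(df["created_at"], utc=True)
    # Most recent tweets first, so keep="first" retains each user's latest tweet.
    df = df.sort_values("created_at_1", ascending=False)
    df = df.drop_duplicates(subset="user.screen_name", keep="first")
    # Account age at collection time = latest tweet time - account creation time.
    df["account_age_days"] = (df["created_at_1"] - df["created_at"]).dt.days
    return df.reset_index(drop=True)
```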
I did some calculations, as shown in the notebook, to derive the missing features.
Four features used by the model were unavailable even after these calculations, so I used tweepy
to fetch them (favourites count, default profile, default profile image, geo enabled).
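A minimal sketch of the tweepy step, assuming tweepy v4, where `API.lookup_users` takes a `screen_name` list and the underlying users/lookup endpoint accepts up to 100 names per request. The helper name `fetch_missing_features` is hypothetical; rate-limit and error handling are left out, and accounts that no longer resolve simply do not appear in the result.

```python
import pandas as pd

# The four features the model needs but the 2020 data lacks
# (attribute names of tweepy User objects).
TWEEPY_FEATURES = ["favourites_count", "default_profile",
                   "default_profile_image", "geo_enabled"]


def fetch_missing_features(api, screen_names, batch_size=100):
    """Hypothetical helper: look up users in batches and collect the features.

    `api` is assumed to be an authenticated tweepy.API (v4) instance.
    No handling of rate limits or failed requests is included.
    """
    rows = []
    names = list(screen_names)
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for user in api.lookup_users(screen_name=batch):
            row = {"user.screen_name": user.screen_name}
            row.update({f: getattr(user, f) for f in TWEEPY_FEATURES})
            rows.append(row)
    return pd.DataFrame(rows, columns=["user.screen_name"] + TWEEPY_FEATURES)
```

The result can then be merged onto the 2020 user table on user.screen_name.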
Note that the features gathered with tweepy are current (i.e. as of June 2021), while the rest
date from 2020.
Results
Trends
1. Trends occurred on 2 Feb and from 6 to 13 Nov.
2. I merged all trends into one dataframe, together with the bots dataframe.
3. The trends contain about 182,169 tweets in total.
4. About 15,769 tweets were created by bots (as detected by the ML model).
5. About 6,270 tweets were created by accounts less than 10 days old (recently made accounts,
within 24 hours).
6. About 1,349 accounts made more than 10 tweets in the trends.
7. About 93 accounts were less than 10 days old and made 10 or more tweets.
8. About 492 tweets were created by bot accounts less than 10 days old.
9. Jasmine Benedicto, detected as a bot by the ML model, made about 14 tweets and had an account
age of about 1 day.
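The counts in the list above could be computed along these lines. This is a sketch under assumptions, not the report's notebook: `trend_summary`, the boolean `is_bot` column (the ML model's prediction) and the per-tweet `account_age_days` column are hypothetical names, and the thresholds mirror the text (more than 10 tweets in item 6, 10 or more in item 7).

```python
import pandas as pd


def trend_summary(trend_frames, bots, young_days=10, many_tweets=10):
    """Hypothetical helper: merge per-trend tweets with bot labels and count.

    Each trend frame is assumed to have 'user.screen_name' and
    'account_age_days'; `bots` has 'user.screen_name' and a boolean 'is_bot'.
    """
    trends = pd.concat(trend_frames, ignore_index=True)
    df = trends.merge(bots[["user.screen_name", "is_bot"]],
                      on="user.screen_name", how="left")
    # Users missing from the bots table are treated as non-bots.
    df["is_bot"] = df["is_bot"].eq(True)
    young = df["account_age_days"] < young_days
    per_user = df.groupby("user.screen_name").agg(
        n_tweets=("account_age_days", "size"),
        age=("account_age_days", "min"),
    )
    return {
        "total_tweets": len(df),
        "bot_tweets": int(df["is_bot"].sum()),
        "young_account_tweets": int(young.sum()),
        # Item 6 uses "more than", item 7 uses "10 or more".
        "accounts_many_tweets": int((per_user["n_tweets"] > many_tweets).sum()),
        "young_accounts_many_tweets": int(
            ((per_user["age"] < young_days)
             & (per_user["n_tweets"] >= many_tweets)).sum()),
        "young_bot_tweets": int((df["is_bot"] & young).sum()),
    }
```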
Deleted Accounts
1. Of the ~115k accounts from 2020, 5,787 have been deleted.
2. Deleted accounts that also appear in the Botometer 2021 data: 20. The most probable bot among
them was MERKURIUSME, with a probability of 53%.
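One way to derive these deleted-account figures, sketched under assumptions: an account is counted as deleted if its 2020 screen name no longer comes back from a user lookup, which would also sweep in suspended or renamed accounts. The function name `deleted_account_report` and the Botometer column names `screen_name` and `bot_probability` are hypothetical.

```python
import pandas as pd


def deleted_account_report(users_2020, still_alive, botometer_2021):
    """Hypothetical helper: count vanished accounts and rank them by bot score.

    `users_2020` and `still_alive` are iterables of screen names;
    `botometer_2021` is assumed to have 'screen_name' and 'bot_probability'.
    """
    deleted = set(users_2020) - set(still_alive)
    matched = botometer_2021[botometer_2021["screen_name"].isin(deleted)]
    top = None
    if not matched.empty:
        # Row with the highest bot probability among deleted accounts.
        best = matched.loc[matched["bot_probability"].idxmax()]
        top = (best["screen_name"], float(best["bot_probability"]))
    return len(deleted), len(matched), top
```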