Social Media Amharic Fake News Detection Using NLP Techniques With SVM Algorithm

This document discusses social media fake news detection in Amharic using natural language processing (NLP) techniques and support vector machine (SVM) algorithms. It notes that social media has become a primary source of news but lacks verification of information, enabling the spread of fake news. The objectives are to collect and label Amharic news data, explore textual features, extract prominent features, model fake news detection as classification, and use SVM for classification. Research questions focus on relevant Amharic news article features and using features for classification. A literature review found little prior work on Amharic fake news due to a lack of data and the under-resourced nature of Amharic, but some related work on detecting fake news in

Uploaded by

mame


Social media Amharic fake news detection using NLP techniques with SVM algorithm

Chapter 1 Introduction
Background of the study
The remarkable advancement of technology has dramatically shifted communication into the digital
world through the internet. The internet is a global computer network that provides a variety of
information and communication facilities and consists of interconnected networks using standardized
communication protocols. Social media networks are communication facilities built on this
infrastructure. In other words, social media is computer-based technology that facilitates the
sharing of ideas, thoughts, and information through the building of virtual networks and
communities. Users engage with social media on a computer, tablet, or smartphone through web-based
software or web applications, often utilizing it for messaging.
Nowadays social media plays a major role as a source of information all over the world. Social
networks are dedicated websites or other applications that enable users to communicate with each
other by posting information, comments, messages, images, etc. The types of social media used to
exchange information include social networks, bookmarking sites, social news, media sharing
websites, microblogging, blog comments and forums, social review sites, community blogs, and
sharing economy networks.
Social networks: A social networking site is a social media site that allows you to connect with
people who have similar interests and backgrounds. Facebook, Twitter, and Instagram are three of
the most popular examples of social network websites. Most social network sites let users share
thoughts, upload photos and videos, and participate in groups of interest.
Bookmarking sites (Pinterest, Flipboard, Digg) allow users to save and organize links to any number
of online resources and websites. A great feature of these sites is the ability for users to “tag”
links, which makes them easier to search and, invariably, to share with their followers.
StumbleUpon is a popular example of a bookmarking site.
A social news site allows its users to post news links and other items that point to external
articles. Users then vote on these items, and the items with the most votes are displayed most
prominently. A good example of a social news site is Reddit.
Media sharing websites allow users to share different types of media, the two main kinds being
image sharing and video hosting sites. Most of these sites also offer social features, like the
ability to create profiles and the option of commenting on the uploaded images or videos. These
platforms mostly encourage user-generated content, where anyone can create, curate, and share work
that expresses who they are or sparks conversation. YouTube remains the most well-known media
sharing site in the world.
Microblogging: These are just what they sound like: sites that allow users to submit short written
entries, which can include links to product and service sites as well as links to other social
media sites. These entries are then posted on the ‘walls’ of everyone who has subscribed to that
user’s account. The most commonly used microblogging website is Twitter.
Blog comments and forums: An online forum is a site that lets users engage in conversations by
posting and responding to community messages. A blog comment site is the same thing, only a little
more focused: the comments are usually centered on the specific subject of the attached blog.
Google has a popular blogging site aptly titled Blogger. However, there is a seemingly endless
number of blogging sites, particularly because so many of them are niche-based, unlike the
universal appeal of general social media sites.
Social review sites (such as TripAdvisor, Yelp, Foursquare): Review sites like TripAdvisor and
Foursquare show reviews from community members for all sorts of locations and experiences. This
keeps people out of the dark and allows them to plan and decide better, for example when choosing
a restaurant for a date.
Community blogs (like Medium, Tumblr): Sometimes all you want to do is share one message, and not
everyone on the internet wants to invest in running and maintaining a blog on a self-hosted
website. Shared blogging platforms like Medium give people a space to express their thoughts and
voice.
Sharing economy networks (such as Airbnb, Pantheon, Kickstarter): While it might not occur to you
directly, websites like Airbnb aren't just for finding holiday rentals or activities. These sharing
economy networks bring people who have something to share together with the people who need it.
All of the above are taken from https://seopressor.com/social-media-marketing/types-of-social-media/

Currently, social networks like Facebook and Twitter have become the most popular sources of news
feeds.
As an increasing amount of our lives is spent interacting online through social media platforms, more
and more people tend to seek out and consume news from social media rather than traditional news
organizations. The reasons for this change in consumption behaviors are inherent in the nature of
these social media platforms: (i) it is often more timely and less expensive to consume news on social
media compared with traditional news media, such as newspapers or television; and (ii) it is easier to
further share, comment

1. Web space
The website should provide the users free web space to upload content.

2. Web address
The users are given a unique web address that becomes their web identity. They can post and share
all their content on this web address.

3. Build profiles
Users are asked to enter personal details like name, address, date of birth, school/college
education, professional details, etc. The site then mines the personal data to connect individuals.

4. Connect with friends

Users are encouraged to post personal and professional updates about themselves. The site then
becomes a platform to connect friends and relatives.

5. Upload content in real time

Users are provided the tools to post content in real time. This content can be text, images, audio,
video or even symbolic likes and dislikes. The last post comes first, giving the site freshness.

6. Enable conversations
Members are given the rights to comment on posts made by friends and relatives. The conversations
are a great social connection.

7. Posts have time stamp

All posts are time stamped, making it easy to follow posts.

Social media has inherently been adopted because it offers:

Easy access
Interactivity
Rapid dissemination of information
An ideal place for individuals to freely express thoughts, claims, and feelings beyond their screens
Problems: the absence of a service to check the veracity of posts and determine their truth value
Malicious accounts
Politicians
These are causes for the creation of false information, so-called fake news.
Fake news consists of news articles that are intentionally and verifiably false, written to mislead
readers toward the authors' aims, which are usually political gains.
Misleading readers can lead to the death of individuals and to harm to society.
Statement of the problem

Social media has become the main source of information, and even mainstream media have moved
toward it. Its low cost, ease of use, rapid dissemination of information, and quick reaction to
events make social media an ideal place for individuals to express emotions, feelings, claims, and
thoughts. The major drawback of social media is the absence of a service that checks the
truthfulness of posted information for users. This lack of verification has created the problem of
false information, so-called fake news. The extensive spread of fake news can have extremely
negative impacts on individuals and society, reducing the government's credibility and even
endangering national security.
General objective:
To detect Amharic fake news on social media using NLP techniques and support vector
machines.
Specific objectives:
⮚ To collect and prepare manually labeled data of Amharic news articles.
⮚ To explore relevant textual features of Amharic news articles.
⮚ To extract prominent textual features of Amharic news articles.
⮚ To model Amharic fake news detection as a binary classification task.
⮚ To exploit the inherent two-class nature of a support vector machine for Amharic fake
news detection.
⮚ To train and evaluate the model.
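The binary framing in the objectives can be illustrated with a minimal from-scratch sketch of a linear SVM trained by stochastic subgradient descent on the hinge loss. It is not the study's implementation: the toy corpus uses English placeholder sentences standing in for labeled Amharic posts, all function names are hypothetical, and the bias term is omitted for brevity.

```python
import random
from collections import Counter

# Hypothetical toy corpus: English placeholders standing in for labeled
# Amharic posts. +1 marks "fake", -1 marks "real".
TRAIN = [
    ("shocking secret cure revealed", +1),
    ("secret miracle cure shocking", +1),
    ("ministry announces new budget", -1),
    ("parliament approves budget report", -1),
]

def build_vocab(texts):
    """Sorted list of every distinct whitespace-separated token."""
    return sorted({tok for text in texts for tok in text.split()})

def vectorize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.split())
    return [float(counts[tok]) for tok in vocab]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50, seed=0):
    """Minimise the L2-regularised hinge loss by stochastic subgradient descent.

    Simplifications (assumptions of this sketch): no bias term, fixed step size.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regulariser gradient: shrink every weight slightly.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient: only examples inside the margin move w.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(text, w, vocab):
    """Return "fake" when the decision value is positive, otherwise "real"."""
    score = sum(wj * xj for wj, xj in zip(w, vectorize(text, vocab)))
    return "fake" if score > 0 else "real"

vocab = build_vocab(text for text, _ in TRAIN)
X = [vectorize(text, vocab) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
weights = train_linear_svm(X, y)

print(predict("shocking miracle cure", weights, vocab))
print(predict("ministry budget report", weights, vocab))
```

The two classes map directly onto the sign of the decision value, which is why an SVM fits a fake/real task without any multi-class adaptation.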
Research questions
⮚ What are the relevant textual and linguistic features of an Amharic news article?
⮚ How can the relevant features of Amharic news articles be used to classify news articles as
fake or real?
⮚ Which features lead to high accuracy in the classification task?
Chapter 2 Literature Review

Fake news is a major threat that jeopardizes individuals, the public, and government credibility,
and ultimately endangers national security as a whole. Nevertheless, because no Amharic fake news
dataset is publicly available and Amharic is an under-resourced language, little effort has been
made to combat Amharic fake news in online social media using natural language processing and
machine learning. Fake news detection is a hot and new research area that has emerged in recent
years. In this section, related works are organized as follows.
Hussain et al. (2020) worked on the detection of Bangla fake news. In this research, a term
frequency-inverse document frequency (TF-IDF) vectorizer and a count vectorizer were used for
feature extraction, and Support Vector Machine (SVM) and Multinomial Naive Bayes (MNB) classifiers
were used to recognize Bangla fake news. The experiment was conducted on a Bangla-language dataset
collected from different Bangladeshi online news sites, with a total count of 2500 news articles,
1548 real and 940 fake. Results showed that the SVM with a linear kernel achieved 96.64 percent
accuracy, outperforming MNB at 93.32 percent.
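The two feature extraction schemes named above differ only in how a term's count is weighted. The sketch below computes both by hand using the plain textbook formula idf = ln(N / df); library vectorizers usually apply smoothing and normalization on top of this, so their numbers will differ. The function names and toy documents are illustrative assumptions.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Return (vocab, rows) where rows[i][j] is the count of vocab[j] in docs[i]."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows

def tfidf_vectors(docs):
    """Weight each raw count by ln(N / df), the unsmoothed textbook idf."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[row[j] * idf[j] for j in range(len(vocab))] for row in rows]
    return vocab, weighted

docs = ["budget vote budget", "budget rumor"]  # hypothetical toy documents
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)
print(counts)
print(tfidf)
```

A term that occurs in every document, such as "budget" here, receives an idf of zero, so TF-IDF suppresses it while raw counts do not.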
Smitha et al. (2020) focused on evaluating the performance of seven machine learning classifiers,
namely Linear SVM, Random Forest, Logistic Regression, XGBoost, Gradient Boosting, Decision Tree,
and a neural network, on the fake news detection task. These classifiers were evaluated with three
feature engineering methods, namely count vectors, TF-IDF, and word embeddings, on a Kaggle
dataset with two class labels (fake or real). The highest accuracy, 0.94, was obtained both by the
linear SVM with TF-IDF feature extraction and by the neural network with the count vectorizer.
Finally, the authors argued that although the neural network with the count vectorizer matched the
SVM's accuracy, the linear SVM classifier is preferable for their proposed fake news detection
system in terms of running time and complexity.
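Accuracy figures such as the 0.94 above are the fraction of test articles labeled correctly. As a small sketch (the helper name and the toy labels are assumptions, not data from any cited study), accuracy can be computed alongside precision, recall, and F1 for the "fake" class, which are worth reporting when the two classes are not equally frequent:

```python
def evaluate(y_true, y_pred, positive="fake"):
    """Return accuracy, precision, recall and F1 with `positive` as the target class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical gold labels and predictions for five articles.
gold = ["fake", "fake", "real", "real", "real"]
pred = ["fake", "real", "real", "real", "fake"]
print(evaluate(gold, pred))
```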
Abdulaziz et al. (2020) conducted an empirical comparison of four well-known machine learning
algorithms, namely random forest, Naïve Bayes, a neural network, and decision trees, with an
N-gram model for feature extraction on the publicly available multi-class dataset called LIAR. The
experiment showed that the Naïve Bayes classifier with trigram features remarkably outperformed
the other algorithms on this dataset, achieving the highest accuracy of 99%. In addition, the
training times were 54.3 s, 1.3 s, 420.2 s, and 540.1 s for random forest, Naïve Bayes, the neural
network and, presumably, decision trees, respectively. Therefore, the paper argued that Naïve
Bayes performs best, with both the highest accuracy and the lowest computational running time.
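To make the combination of N-gram features with Naïve Bayes concrete, here is a minimal sketch of word N-gram extraction feeding a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. The class name, the toy sentences, and the choice of bigrams in the usage lines are assumptions for illustration and are not taken from the cited paper.

```python
import math
from collections import Counter, defaultdict

def word_ngrams(text, n):
    """All contiguous runs of n whitespace-separated tokens, joined by spaces."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class NgramNaiveBayes:
    """Multinomial Naive Bayes over word n-gram counts with additive smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(word_ngrams(text, self.n))
        self.vocab = {g for counts in self.feature_counts.values() for g in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in word_ngrams(text, self.n):
                if gram in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data; bigrams (n=2) because the sentences are very short.
model = NgramNaiveBayes(n=2).fit(
    ["secret cure revealed today", "ministry announces budget today"],
    ["fake", "real"],
)
print(word_ngrams("a b c d", 3))
print(model.predict("secret cure for all"))
```

Training only counts N-grams per class, which is consistent with the short training time reported for Naïve Bayes relative to the other algorithms.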
Ahmed et al. (2017) conducted their research under the title “Detecting Online Fake News Using
N-Gram Analysis and Machine Learning Techniques”. The paper used N-gram models and TF-IDF for
feature generation and evaluated six machine learning classifiers, including Stochastic Gradient
Descent (SGD), Support Vector Machines (SVM), Linear Support Vector Machines (LSVM), K-Nearest
Neighbor (KNN), and Decision Trees (DT), on the two-class ISOT dataset, which the authors
collected from Kaggle and reuters.com. Results showed that the best-performing model used Term
Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction technique and Linear
Support Vector Machine (LSVM) as the classifier, with an accuracy of 92%.
