
CLICK-BAIT DETECTION SYSTEM

Introduction

A prominent trend in online content today is the prevalence of clickbait: misleading content whose sole aim is to attract viewers' attention and lure them to a web page. Clickbait is characterized by poor content of little value, and the agencies deploying it depend heavily on ad-stream revenue. They therefore create eye-catching titles that entice users into clicking, thereby generating revenue. Often promising a worthwhile experience or an indispensable revelation, these articles exploit human psychology and create a frustrating experience, because users rarely get the quality of content they were expecting. Even though research in clickbait detection is still in an early phase, the problem has drawn considerable attention: the increasing pervasiveness of clickbait in online media and news has triggered significant backlash against the social media platforms where such content appears.
Brief of Project

Clickbaits, in social media, are exaggerated headlines whose main motive is to mislead the reader into "clicking" on them. They degrade the online experience by luring users toward poor content. Online content creators use them increasingly to gain page views, and thereby more ad revenue, without providing the content to back them up. This project proposes a model for clickbait detection based on convolutional neural networks and presents a compiled clickbait corpus.
Project Design:

 This project is built using the following Python libraries:

 NumPy
 • The first is NumPy, an open-source library that is extremely popular in the Machine Learning community. It is mainly used for handling the mathematical formulas and calculations in Machine Learning applications; in addition, NumPy is used for cleaning data-sets and handling NULL values in data cells.
 • NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
 - a powerful N-dimensional array object
 - sophisticated (broadcasting) functions
 - tools for integrating C/C++ and Fortran code
 - useful linear algebra, Fourier transform, and random number capabilities
 • Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined, which allows NumPy to integrate seamlessly and speedily with a wide variety of databases.
 • NumPy is licensed under the BSD license, enabling reuse with few restrictions. A minimal usage sketch follows below.
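To illustrate, here is a minimal sketch (not taken from the project code, and using made-up values) of how NumPy is typically used for the numeric work and NULL handling described above:

```python
import numpy as np

# A small array of hypothetical feature values with one missing entry.
scores = np.array([0.91, 0.72, np.nan, 0.88, 0.65])

# Detect and replace missing (NaN) cells, as one might while cleaning a data-set.
mask = np.isnan(scores)
scores[mask] = np.nanmean(scores)   # fill gaps with the mean of the observed values

# Vectorized mathematics over the whole array, with broadcasting.
normalized = (scores - scores.min()) / (scores.max() - scores.min())
print(normalized)
```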
Pandas
 • The second library is Pandas, the most widely used tool for data munging. It contains high-level data structures and manipulation tools designed to make data analysis fast and easy.
 • Being fast, powerful, flexible and easy to use makes it well suited to data analysis and manipulation. Another big benefit of Pandas is that it is open source.
 • The Pandas library provides a fast and efficient way to manage and explore data. It does this through Series and DataFrames, which help us not only represent data efficiently but also manipulate it in various ways. These features are exactly what make Pandas such an attractive library for data scientists.
 • Labeling of data is of utmost importance, and so is organization, without which data would be impossible to read. These two needs, organization and labeling, are taken care of by Pandas' intelligent alignment and indexing methods.
 • Data is crude in nature, and one of the many problems associated with it is missing values. It is therefore important to handle missing values properly so that they do not adulterate the results of a study. Pandas has this covered, because handling of missing values is built into the library. A short sketch of both ideas follows below.
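As a hedged illustration (the column names and rows below are invented for this sketch, not taken from the project's corpus), the following shows the DataFrame handling and missing-value support mentioned above:

```python
import pandas as pd

# A tiny, hypothetical headline table with one missing label.
df = pd.DataFrame({
    "headline": ["You won't believe what happened next",
                 "Government announces new budget",
                 "10 tricks doctors don't want you to know"],
    "clickbait": [1, 0, None],        # None marks a missing label
})

print(df.isna().sum())                # count missing values per column
df = df.dropna(subset=["clickbait"])  # drop rows whose label is missing
df["clickbait"] = df["clickbait"].astype(int)
print(df.head())
```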

 NLP (Natural Language Processing)

  Natural language processing (NLP) helps computers understand unstructured text and retrieve meaningful pieces of information from it.
  NLP is a sub-field of Artificial Intelligence concerned with the interactions between computers and humans.
  NLP is divided into two approaches:
 Rule-based Natural Language Processing:
 It uses common-sense reasoning for processing tasks. This process can take much time and requires manual effort.

 Statistical Natural Language Processing:
 It uses large amounts of data and tries to derive conclusions from it. Statistical NLP uses machine learning algorithms to train NLP models; after successful training on a large amount of data, the trained model can draw accurate conclusions from new text. A small sketch of this approach follows below.
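As a minimal sketch of the statistical approach (scikit-learn and the toy headlines are assumptions for illustration only, not part of the project), a bag-of-words model can be trained directly from labeled examples instead of hand-written rules:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled headlines: 1 = clickbait, 0 = not clickbait.
headlines = [
    "You won't believe what she did next",
    "Parliament passes the annual budget bill",
    "This one weird trick will change your life",
    "Scientists publish new climate study",
]
labels = [1, 0, 1, 0]

# The pipeline learns word statistics from the data rather than using rules.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(headlines, labels)

print(model.predict(["10 secrets airlines don't want you to know"]))
```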
Innovations and model design/solution
CNN (Convolutional Neural Network):
Click-baits are headlines that exaggerate the facts, or hide part of them, in order to attract user clicks. They deter readers from obtaining information effectively and efficiently in the era of information explosion, and they clearly harm the user experience on news aggregator sites such as Google News and Yahoo News. Detecting and preventing click-baits has therefore become crucial. Previous work achieved remarkable performance on this task using hand-crafted lexical and syntactic features on limited platforms. However, that line of work depends heavily on expert knowledge and cannot easily be applied to languages that do not share such features. To address these issues, we propose a general end-to-end Convolutional Neural Network based approach, which automatically induces useful features for the end task without relying on any external resources. Empirical experiments on English and Chinese corpora show that our method achieves consistent results, demonstrating the effectiveness and robustness of the approach across languages. We will share our annotated corpus, collected from Chinese news sites, on publication.
Datasets
Implementation
Technologies Used:
 Python (NLTK library):

This technology is used in the project for the text-cleaning process. The Natural Language Toolkit (NLTK) Python library has built-in methods for removing stop words; cleaning is performed in several steps (see the sketch after this list).
 Word Embedding:

Word embedding is also used in this project; the method used is Word2Vec embedding. The first layer of the CNN embeds the words into low-dimensional vectors.
 Natural Language Processing:

Natural language processing (NLP) helps computers understand unstructured text and retrieve meaningful pieces of information from it. NLP is a sub-field of Artificial Intelligence concerned with the interactions between computers and humans.
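The sketch below illustrates the kind of NLTK-based text cleaning described in the first item above; the exact cleaning steps used in the project are not specified in the slides, so this is an assumed, typical pipeline:

```python
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def clean_headline(text):
    """Lowercase, strip non-letter characters, split into tokens, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(clean_headline("You Won't BELIEVE What Happened Next!"))
# e.g. ['believe', 'happened', 'next']
```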
MODEL USED:

 CNN MODEL
We use a simple CNN with one convolution layer; the accompanying figure gives a graphical representation of the complete model. The CNN we use is based on the architecture of Kim [9]. Its first layer embeds the words into low-dimensional vectors. For the word embeddings we use two variants: embeddings learnt from scratch, and embeddings initialized from an unsupervised neural language model, which keep evolving as training proceeds. Initializing word vectors from an unsupervised neural language model has been shown to improve performance. We use the word vectors trained by Mikolov, Chen, Corrado and Dean on 100 billion words of Google News; these vectors are publicly available as word2vec.
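Since the slides do not include the model code, the following is a minimal Keras sketch of a Kim-style CNN for headline classification. The hyperparameters (vocabulary size, sequence length, filter widths and counts) are assumptions chosen only for illustration, not the project's actual settings:

```python
import tensorflow as tf

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 20         # assumed maximum headline length (tokens)
EMBED_DIM = 300      # matches the dimensionality of the word2vec vectors

def build_kim_cnn():
    inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    # Embedding layer; in practice its weights can be initialized with the
    # pretrained word2vec vectors and allowed to keep training.
    x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)

    # One convolution layer with several filter widths, as in Kim's architecture.
    pooled = []
    for width in (3, 4, 5):
        conv = tf.keras.layers.Conv1D(100, width, activation="relu")(x)
        pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))
    merged = tf.keras.layers.concatenate(pooled)

    merged = tf.keras.layers.Dropout(0.5)(merged)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(merged)  # clickbait vs. non-clickbait

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_kim_cnn()
model.summary()
```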
CNN Model Used
MODEL DESIGN
Future Scope
1. Finding the features that the model has learnt and identifying the most important ones.
2. Gathering more data for developing better models.
3. Building a server-backed web browser plugin that can harness the power of this model and alert the user about clickbait on the page.
CONCLUSION
The nuisance of clickbait keeps increasing in online media. To curb it, we collected data from multiple sources and created a new corpus of clickbait and non-clickbait headlines. We then developed a CNN-based deep learning model that performs strongly at classifying headlines into clickbait and non-clickbait categories. We achieved an accuracy of 0.90, along with a precision of 0.85 and a recall of 0.88 on the clickbait class. We aim to make this model and the corpus available for further use.
REFERENCES
[1] K. El-Arini and J. Tang, "News feed FYI: Click-baiting," 2014. [Online]. Available: http://newsroom.fb.com/news/2014/08/news-feed-fyi-click-baiting/
[2] J. C. dos Reis, F. Benevenuto, P. O. S. V. de Melo, R. O. Prates, H. Kwak, and J. An, "Breaking the news: First impressions matter on online news," in Proceedings of ICWSM 2015.
[3] G. J. Digirolamo and D. L. Hintzman, "First impressions are lasting impressions: A primacy effect in memory for repetitions," Psychonomic Bulletin & Review, vol. 4, no. 1.
[4] D. J. Dooling and R. Lachman, "Effects of comprehension on retention of prose," Journal of Experimental Psychology, vol. 88, no. 2, pp. 216–222, 1971.
[5] G. Loewenstein, "The psychology of curiosity: A review and reinterpretation," Psychological Bulletin, vol. 116, no. 1, pp. 75–98, July 1994.
[6] B. Gardiner, "You'll be outraged at how easy it was to get you to click on this headline," 2015. [Online]. Available: http://www.wired.com/2015/12/psychology-of-clickbait/
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of NIPS 2012.
[8] A. Graves, A. Mohamed, and G. E. Hinton, "Speech recognition with deep recurrent neural networks," in Proceedings of ICASSP 2013.
[9] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of EMNLP 2014.
Project by:

Chaitanya Dhiman 18BCS1892


Prince 18BCS1251
Jayant 18BCS2050
THANK YOU
