Semantic Summarization of Web News
A Project Report
By
Sunny Jain, Saumitra Shukla, and Vipul Sharma
Contents
DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. LITERATURE REVIEW
3. METHODOLOGY
4. EXPERIMENTAL RESULTS
5. CONCLUSIONS
6. REFERENCES
Declaration
We hereby declare that this submission is our own work and that, to the best of our knowledge and belief, it contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any degree or diploma of a university or other institute of higher learning, except where due acknowledgement has been made in the text. This project has not been submitted by us to any other institute for the requirement of any other degree.
Certificate
This is to certify that the project report titled Semantic Summarization of Web News, presented by Sunny Jain, Saumitra Shukla, and Vipul Sharma in partial fulfilment of the requirements for the award of Bachelor of Technology in Computer Science and Engineering, is a record of work carried out by them under my supervision and guidance at the Department of Computer Science and Engineering, Institute of Engineering and Technology, Lucknow.
It is also certified that, to the best of my knowledge, this project has not been submitted at any other institute for the award of any other degree.
Acknowledgement
I would firstly like to thank my supportive group members, Mr. Sunny Jain, Mr. Vipul Sharma, and Mr. Saumitra Shukla, for working diligently with me day in and day out; without them this project would have been impossible.
I am deeply indebted to my mentors, Dr. Tulika Narang, CSE department, and Dr. Pawan Kumar Tiwari, CSE department, for their valuable guidance, keen interest, constructive criticism, and encouragement at various stages of my training period.
I would like to thank Dr. Promila Bahadur, CSE department, and Dr. Tulika Narang, CSE department, the project monitoring committee members, for delivering the guidelines and organising the online presentations calmly and on time.
Finally, I would like to conclude by extending my heartfelt thanks to my supportive family and friends for motivating me and contributing their ideas to this project.
Saumitra Shukla
Sunny Jain
Vipul Sharma
Abstract
In this thesis we describe our experience of creating a news segmenter for our final year project. We use a variety of methodologies, and the outcomes of these experiments are compared and assessed. Because of the application context of a final year project, we used relaxed error measures for performance evaluation.
As part of our final year project, and to gain expertise in the field of data science, we were expected to write a report on "Semantic Summarization of Web News". The primary goal of completing this project report is to gain an understanding of various software engineering tools.
Much of this essential information is not readily accessible to citizens. The goal of this project is therefore to automate the extraction and presentation of essential information from newspaper articles and make it accessible to a broader audience.
Completing this project report also allowed us to expand our understanding of consumer attitudes towards reading web news. Along the way we gained experience directly linked to the concepts of our theme, and we learned the value of collaboration and the role of dedication to the task.
List of Figures
3.3 Use case diagram of user's possible interaction with the system
3.4 Sequence diagram showing object interactions arranged in time sequence
3.5 The data flow of the proposed framework
4.1 Start Page
4.3 Select News Priority Wise
List of Tables
3.1 Commonly used tags in POS tagging
4.1 Meaning of arguments for sentences
4.2 Experimentation accuracy results
Chapter 1
Introduction
1.1. Background
The World Wide Web is, in effect, a huge database. This large quantity of information gives human users and algorithms rapid access to almost every conceivable kind of content, yet the unstructured nature of most of the available data can pose a major problem.
Although a human being can, in principle, best extract relevant information from published documents and texts, the huge amount of knowledge to be handled demands computerised approaches. The exponential growth of the online industry has apparently made information search and tracking easier and faster, but the massive overload of information requires algorithms and tools that track information in a quick and easy way.
In other words, the huge amount of data is, on the one hand, what allows everyone to access it and, on the other hand, what creates the well-known difficulty of distinguishing valuable from worthless information. It is therefore necessary to summarise the wide range of information available on the Internet so that users can absorb knowledge easily without spending most of their time reading the raw data. This is why we use a summarisation tool: to condense the information and improve the user's reading capacity.[3]
Our proposal is influenced by text summarization models based on the Maximum Coverage Problem, but unlike them we create a technique that blends both the syntactic and the semantic structure of a text. Within the suggested model, the news is divided into several parts according to its sentiment, starting with the most positive and the most negative items.
Applying semantic networks to the examined web source yields a semantic characterisation. As a consequence, the natural-language text is mapped into an abstract representation that identifies the subjects addressed inside the web resource itself. Using this abstract representation, a heuristic algorithm then derives the required text segments from the original document.[3]
At the moment, individuals want to consume as much news as possible, from as many sources as possible, on topics that are essential or of interest to them. Interactivity refers to the innate inclination of the masses to consume news of their own choice. Immediacy is an important characteristic whereby individuals need to be notified about news without delay. The environment in which we live and the technology we are familiar with allow individuals to profit from these qualities by providing them with quick news about occurrences in real time.
Online news sites have evolved efficient techniques for drawing the attention of the public. Online news expresses opinions about news entities, which may be people, places, or objects, while reporting on current occurrences. For this reason, many channels of different news websites offer interactive rating services; that is, a news item might be rated good, bad, or neutral. Sentiment Analysis, or Opinion Mining, is a technique for discovering the polarity or strength of the opinion (positive or negative) expressed in writing, in this report a news item. Manual labelling of sentiment words is a time-consuming procedure.
Sentiment analysis is commonly automated in two prominent ways. The first uses a weighted word lexicon, while the second is based on machine learning approaches. Lexicon-based methods employ a dictionary of sentiment words and match the words of a text against it to detect polarity.
This methodology requires neither preprocessing of data nor the training of a classifier, as opposed to machine learning approaches.
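As a minimal illustration of the lexicon-based approach (the tiny word list and weights below are made-up examples for this sketch, not the lexicon actually used in this project), a text can be scored in Python by matching its tokens against a dictionary of weighted sentiment words:

# Toy sketch of a lexicon-based polarity scorer. The lexicon below is a
# small made-up sample; a real system would use a full resource such as
# the VADER lexicon.
TOY_LEXICON = {
    "good": 1.0, "great": 1.5, "positive": 1.0,
    "bad": -1.0, "terrible": -1.5, "crisis": -1.2,
}

def polarity(text):
    # sum the weights of lexicon words found in the text
    tokens = text.lower().split()
    return sum(TOY_LEXICON.get(tok.strip(".,!?"), 0.0) for tok in tokens)

print(polarity("The response to the crisis was good."))   # about -0.2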
This investigation is based on a lexicon-based news article sentiment analysis technique. The rest of this report is structured as follows:
Chapter 2 presents the literature review of sentiment analysis for news articles. Chapter 3 presents the proposed methodology and experimental setup. Results are presented in Chapter 4, followed by the conclusions in Chapter 5.
This project covers the news domain and the extraction of information of defined interest from newspaper articles into structured templates. For a wide range of consumers it is essential to obtain information in a much better way, with the least expenditure of resources, time, and money; this type of information is extremely important. Along with extracting knowledge, the system stores all the news information gathered and offers it in effective and efficient ways that might assist in making country-level decisions.
Different extraction systems are examined, and the one best suited to the supported system requirements is selected. The extraction process involves expressing the rules in the formalism of the selected extraction system in an interchangeable manner. After the appropriate rules are written, implemented, and tested, their performance is assessed to determine whether the whole process succeeds or fails.[1]
The project was started to meet both the academic and corporate standards and criteria of the IET Lucknow project. Users can utilise the system in various ways, as outlined in the problem statement below.
1.3. Problem Statement:
The world is changing rapidly, and the need to keep up with events is growing for anyone who wishes to engage with globalisation. People therefore need to absorb an enormous quantity of knowledge and understand it in less time. News stories from thousands of internet sources make it increasingly necessary to summarise this information, because not everybody has time to read complete pieces. Readers may want to browse the latest news from different news sources. Our solution reduces this difficulty by gathering and extracting summary information from the news stories, so users do not have to read the full news item to obtain information about an event. A huge quantity of data is available in electronic format in the current digital era; however, we lack the tools and technology essential for summarising this information into meaningful knowledge that can be used for crucial decisions.[1]
We thus seek to develop a platform where users can log in and receive recent news from many reputable sources.
A user may choose the news source he/she wants explicitly.
We provide the user with semantically ordered news:
● Most negative
● Most positive
● Negative medium
● Positive medium
● Neutral
This ordering makes efficient use of the user's limited time and surfaces the most alarming news to them.
● We also provide the user with a means to share the content efficiently on social media to raise awareness among the population and bring the news item to the attention of the authorities concerned.
Chapter 2
Literature Review
Text summarisation refers to the process of extracting or gathering key information from a source text and presenting it in a condensed form. In recent years, the need for summarisation has arisen in a variety of contexts and domains, including news article summaries, email summaries, short-message summaries of reports on mobile devices, data summaries for business people and government, summaries of the relevant pages returned by online search programmes for researchers, and tracking a patient's history in the medical field for further treatment.
Many examples may be found on the internet, such as article summarizers like Microsoft News, Google, or Columbia Newsblaster. A few common biomedical summarising tools include BaseLine, FreqDist, SumBasic, MEAD, AutoSummarize, and SWESUM [6]. Online summarising tools include Text Compactor, Simplify, Tools4Noobs, FreeSummarizer, WikiSummarizer, and SummarizeTool. Among the most commonly used open source summarising programmes are Open Text Summarizer, Classifier4J, NClassifier, and CNGLSummarizer.
The first automated summarizer was introduced in the late 1950s; it chooses significant sentences from the text and puts them together, so that it takes less time to grasp the information inside a large document. The goal of automated text summarization is to reduce the size of long texts while preserving the vital information.
2.2 Works on semantic analysis
The earliest article was published in 1940, under the title "Cross-Out Technique as a Method in Public Opinion Analysis." Articles published in the same quarterly periodical in 1945 and 1947 covered the measurement of public opinion in nations that had experienced World War II (Japan, Italy, and Czechoslovakia). Computer systems started to appear in the mid-90s.
The computer revolution also began to be reflected in research. As an example, in 1995 a paper was published on "Elicitation, assessment, and pooling of expert judgments using possibility theory", which used the pooling of expert opinions in the field of safety as an example[10]. However, the emergence of modern sentiment analysis was still more than ten years away.
Research carried out within the Association for Computational Linguistics, founded in 1962, also influenced the creation of contemporary sentiment analysis. In 1990, and subsequently in 1999, Wiebe proposed a gold standard that is still followed today. Computer-based sentiment analysis largely came into existence within this community, in which Wiebe first presented techniques for detecting subjective sentences in narratives.
2.3 Related Work
Abstractive summaries are closer to what a human would write but require sophisticated language processing techniques, whereas extractive summaries are more practical.[3]
Another important distinction is between summarising a single document and summarising many. Multi-document summarization raises additional problems, since the system must take into account diverse characteristics such as the differences and similarities among the sources, as well as the order in which the information is collected.
To extract the most important sections of a text, the strategy presented exploits term frequency: words that occur frequently in a document but rarely across an excessively large collection of documents tend to be the ones retained in human summaries.[3] In these systems it is also crucial to determine not just which information is typically included in the summary, but also how the relevance of a piece of information changes given what has already been included in the summary.
Another method aims to investigate and collect information in the form of subject/predicate/object triples extracted from the words of the text, organised as semantic graphs. A generic framework is provided that can integrate information on sentence-level structure with semantic similarity.
Finally, an extractive ontology summarization approach has used the RDF sentence as the key unit of summary. The summary is produced by extracting a set of salient RDF sentences according to a novel method.[3]
Chapter 3
Methodology
3.1. System-Architecture
(i) Web search: using the publicly accessible RSS feeds of the most common newspapers, various websites are scraped and the news is saved in a local database.
(ii) Text extractor: enables the extraction of related textual sentences from documents by parsing HTML pages.
(iii) Natural language processing: various NLP algorithms divide the retrieved text into sentences and identify the functional elements (subject, verb, and object) of each phrase, as well as their related form, which supports the extraction and sentiment analysis.
(iv) Summary creation: generates a summary based on the search terms and user preferences.
3.2 Logic-Flow
● We start by picking phrases from the clusters with the highest data score and the least amount of text redundancy.
● We rank the clusters according to their scores, then use a threshold to choose the most essential ones subject to the length constraints.
● We choose the sentences with the most representative semantic clusters while also reducing repetition.
● Because many statements may belong to different semantic clusters, we want to prevent the same sentence from being presented several times in the summary.
● We penalise each cluster's score once we have already covered it during summary generation, and we compare the clusters' cumulative average score to the threshold to see whether they contain any more valuable information.[3]
● After determining the best summary, the sentences are reordered to preserve, as far as possible, their original order inside the article (a sketch of this selection loop follows this list).
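A rough sketch of the selection loop is given below. The function and variable names, the penalty factor, and the word budget are illustrative assumptions rather than the exact values used in our implementation.

# Greedy selection sketch: pick sentences from the highest-scoring semantic
# clusters, penalise clusters already covered to reduce redundancy, stop when
# the length budget is reached, then restore the original sentence order.
def greedy_summary(sentences, clusters_of, cluster_score, max_words, penalty=0.5):
    # sentences: list of (index, text); clusters_of: index -> set of cluster ids;
    # cluster_score: cluster id -> score
    text_of = dict(sentences)
    score = dict(cluster_score)              # working copy of cluster scores
    chosen, used_words = [], 0
    candidates = set(text_of)
    while candidates:
        # rank remaining sentences by the cumulative score of their clusters
        best = max(candidates, key=lambda i: sum(score[c] for c in clusters_of[i]))
        length = len(text_of[best].split())
        if used_words + length > max_words:
            break
        chosen.append(best)
        used_words += length
        for c in clusters_of[best]:          # penalise clusters already covered
            score[c] *= penalty
        candidates.remove(best)
    return [text_of[i] for i in sorted(chosen)]   # original article order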
3.3 Web-mining
The use of knowledge mining techniques to automatically identify and extract information
from web documents and services is known as web mining; application areas include
resource discovery, information selection, generalisation, and data analysis.
Machine-learning approaches, by the way, generally address the final two goals. Web
content mining, web structure mining, and web use mining are the three primary sub-areas
of web mining. The previous section deals with the analysis of web resource contents,
which often include a variety of data sources such as texts, pictures, videos, and audio;
metadata and hyperlinks are frequently classed as text content. It has been shown that
unstructured text makes up a significant portion of web resources, resulting in widespread
use of text mining tools.[2]
There are several studies in the literature that focus on text mining for web page mining. We looked into certain website mining approaches for online search, subject extraction, and web opinion mining. Web content mining can help with tasks such as sentiment categorization, customer review analysis and summarization, template identification, and page segmentation. By establishing a framework for competitive information, web content mining also addresses corporate applications.
Web-content classification and word-level summarising approaches have also been supported by sophisticated classifiers. Unwanted advertising has been detected using a web-page analyzer. One reported study offered a web-page recommendation system in which collaborative filtering techniques and learning methods were combined to provide a web filter for effective user navigation.[2]
The method used in this study differs from previous work in two major ways. First, it uses semantic-based approaches to select and score the single sentences retrieved from the text. Second, it combines website segmentation with summarization. The suggested technique does not fall within the category of semantic web mining, which refers to approaches that deal with particular ontologies that enrich the original website material in a structured fashion. To the authors' knowledge, there are just two studies in the literature that employ semantic information for webpage mining.[2]
One study described customised multimedia management systems and employed semantic, ontology-based contextual data to understand personalised content access and retrieval behaviour. The WordNet semantic network was used to provide novel semantic similarity metrics in a study on semantic-based feature extraction for web mining.
3.3.1 Web-scraping
Web pages are generally built for visual interaction and feature a variety of graphical parts that convey different kinds of material. The goal of web page segmentation is to understand the page structure and divide the information into visual pieces. This can be a difficult task with a significant number of complications. In recent years, several approaches for website segmentation have been proposed.[2]
Web scraping is the process of extracting data from a website. This data is gathered and then exported in a form that is more useful, whether a spreadsheet or an API. Although web scraping can be done manually, automated tools are generally preferable for scraping web data since they are more cost-effective and work faster. Web scraping, however, isn't always an easy process: because websites come in a variety of shapes and sizes, web scrapers differ in their functionality and capabilities. For web scraping, we use BeautifulSoup4.
Beautiful Soup is a Python library for parsing HTML, XML, and other markup languages for data. Suppose you come across some websites that display data important to your study, such as dates or addresses, but do not allow you to download it directly. Beautiful Soup lets you extract specific material from a website, strip away the HTML markup, and save the data. It's a web scraping tool that helps you clean up and parse the pages you've pulled down from the web.[2]
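A minimal sketch of this extraction step is shown below; the URL is a placeholder and the choice of <p> tags is an assumption, since real news pages may require site-specific selectors.

# Sketch: fetch a page and keep only its paragraph text with BeautifulSoup4.
import requests
from bs4 import BeautifulSoup

def extract_paragraphs(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # strip markup and whitespace, drop empty paragraphs
    return [p.get_text(strip=True) for p in soup.find_all("p") if p.get_text(strip=True)]

for line in extract_paragraphs("https://example.com/some-article"):
    print(line)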
Heuristic algorithms are used in web page segmentation methods, which rely primarily on the Document Object Model (DOM) tree structure associated with an online resource. As a result, segmentation algorithms may not work effectively if such auxiliary features are not present or if they do not match the web page's real semantic structure. The technique given in this chapter, by contrast, is based only on the processing of the textual information that can be acquired from an online resource.[2]
3.3.2 RSS-Feedparser
We first attempted scraping the sites directly; however, the scraping results are extremely dependent on the layout of the site, which can vary over time and cause the scraper to fail. As a result, we moved on to RSS.
Rich Site Summary (RSS) is a web feed format that publishes regularly updated content such as blog posts, news headlines, audio, and video. An RSS document (also known as a "feed", "web feed", or "channel") contains full or summarised text as well as metadata such as the date and name of the publication. It presents the data in XML, which looks similar to HTML except that the tag names used in XML are different.[1]
We used the feedparser module to retrieve these RSS feeds into our web app. Feedparser is a Python module that parses feeds in all of the common formats, such as Atom, RSS, and RDF. It supports Python versions 2.4 through 3.3. Because feedparser is automated, it obtains all of the news articles from the specified URL without the need for human intervention.
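A short sketch of this step is given below; the feed URL is a placeholder, and the fields shown are the standard ones exposed by feedparser entries.

# Sketch: pull headlines, summaries, and links from an RSS feed with feedparser.
import feedparser

def fetch_feed(url):
    feed = feedparser.parse(url)
    items = []
    for entry in feed.entries:
        items.append({
            "title": entry.get("title", ""),
            "summary": entry.get("summary", ""),
            "published": entry.get("published", ""),
            "link": entry.get("link", ""),
        })
    return items

for item in fetch_feed("https://example.com/rss.xml"):     # placeholder URL
    print(item["title"], "-", item["link"])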
3.4. Data-Preprocessing
The raw RSS text data is not suitable for direct input into our information extraction engine, so preprocessing is required. We use the following preprocessing steps in this system: sentence tokenization, word tokenization, POS tagging, lemmatization, date removal, Unicode removal, and semantic role labelling.
3.4.1. Sentence-Tokenization
Sentence tokenization is the splitting of the raw text into individual sentences, which are then passed on to the later stages of the pipeline. We perform it with the nltk library's sent_tokenize() method.
3.4.2. Word-Tokenization
The breaking of a sentence into individual words is referred to as word tokenization. This is crucial for POS tagging, which takes the individual words of a phrase as input and assigns each of them a tag[1]. Word tokenization is accomplished using the nltk library's word_tokenize() method.
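A minimal sketch of these two steps with NLTK is shown below; the sample sentence is made up for illustration.

# Sketch: sentence and word tokenization with NLTK.
import nltk
nltk.download("punkt", quiet=True)          # tokenizer models, needed once
from nltk.tokenize import sent_tokenize, word_tokenize

text = "The markets fell sharply today. Analysts expect a recovery next week."
sentences = sent_tokenize(text)             # split the text into sentences
words = [word_tokenize(s) for s in sentences]   # split each sentence into words
print(sentences)
print(words)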
3.4.3. POS-Tagging
Attaching a POS (part of speech) tag to each word in a phrase is referred to as POS tagging[1]. POS tagging is crucial for locating information about a sentence's context. Table 3.1 lists some of the most commonly used tags.
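A small sketch of POS tagging with NLTK's default tagger follows; the tags come from the Penn Treebank set, for example NN (noun), NNS (plural noun), VB/VBD (verb, base/past form), JJ (adjective), RB (adverb), IN (preposition), and DT (determiner).

# Sketch: POS tagging with NLTK's averaged-perceptron tagger.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
from nltk import word_tokenize, pos_tag

tagged = pos_tag(word_tokenize("The markets fell sharply today."))
print(tagged)   # e.g. [('The', 'DT'), ('markets', 'NNS'), ('fell', 'VBD'), ...]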
3.4.4. Semantic-Role-Labeling(SRL)
Semantic role labelling, also known as shallow semantic parsing, is a natural language processing technique that assigns labels to words or phrases in a sentence to identify their semantic role, such as agent, goal, or result[1]. It entails identifying and classifying the semantic arguments associated with a sentence's predicate or verb. The meaning of the arguments of a sentence is shown in Table 4.1.
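To illustrate what the SRL output looks like, the hand-written example below shows PropBank-style argument labels for a single sentence; it illustrates the label scheme only and is not the output of a particular SRL library.

# Hand-written illustration of an SRL frame using PropBank-style labels:
# ARG0 = agent, V = predicate, ARG1 = patient/theme, ARGM-TMP = time modifier.
sentence = "The government announced new subsidies yesterday."
srl_frame = {
    "ARG0": "The government",      # agent: who announced
    "V": "announced",              # the predicate (verb)
    "ARG1": "new subsidies",       # theme: what was announced
    "ARGM-TMP": "yesterday",       # when it happened
}
print(srl_frame)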
3.5 Text-Summarization
A summary is a text created from one or more other texts that conveys the significant information of the originals while being less than half their length. Text summarization approaches attempt to reduce reading effort by increasing the density of the information presented to the reader.
Summarization strategies may be divided into two types: extractive and abstractive. Extractive methods select and assemble existing sentences from the source text, whereas abstractive methods use natural language generation to create original summaries.[2]
Early approaches included word frequency analysis, cue word extraction, and selecting phrases based on their position within the text. More recent studies have employed tf-idf metrics (term frequency - inverse document frequency), graph analysis, latent semantic analysis, machine learning approaches, and fuzzy systems.
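As a small sketch of the tf-idf approach (scikit-learn is used here purely for illustration and is not necessarily the toolkit used in this project), each sentence can be scored by the sum of the tf-idf weights of its terms and the top-scoring sentences kept in their original order:

# Sketch: tf-idf based extractive summarization.
import nltk
nltk.download("punkt", quiet=True)
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_summary(text, n_sentences=2):
    sentences = sent_tokenize(text)
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    scores = tfidf.sum(axis=1).A1                      # one score per sentence
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_sentences])
    return " ".join(sentences[i] for i in top)         # keep original order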
Other techniques took advantage of semantic processing: lexicon analysis and concept extraction have been used to support summarization. Abstractive summarization has also been addressed, with the goal of understanding the major concepts in a document and then expressing those notions in natural language.
3.5.1 Comparisons:
There are several summarising tools available. After examining the efficiency of each summarizer, we picked the following summarizers for comparison: (i) Copernic Summarizer, (ii) Intellexer Summarizer Pro, (iii) Excellent Summary, (iv) Text Compactor, and (v) Tools4Noobs Summarizer [3].
The web application is currently deployed on a localhost server and will be available soon. PostgreSQL is used to store the data.
3.7. Software-Development-Model Info:
The iterative process was used as the software development approach, beginning with a simple implementation of the requirements. The developing version was then improved iteratively until the entire system was implemented [1]. The iterative model is a type of software development life cycle (SDLC) that focuses on a simple initial implementation that gradually grows in complexity and feature set until the final system is complete.
A data extraction system based on the knowledge engineering approach is created by first developing a rule to extract a certain object or event, then implementing and testing it on new types of articles before writing another rule.
When required, a rule is rewritten and re-implemented based on its performance until the desired outcome is achieved. This step-by-step method of rule development guarantees that mistakes are identified and corrected as soon as possible. Iterative development is the most adaptable development approach, allowing new requirements and changes to be easily accommodated.
3.8. Associated-Diagrams:
Various diagrams related to this method are included in this section. Use case diagrams,
entity relation diagrams, sequence diagrams, and several levels of data flowcharts are
among the diagrams provided.
3.8.1. Use-Case-Diagram:
The use case diagram for our system below depicts a set of activities (use cases) that the
system may execute in collaboration with one or more external users (actors).
Figure 3.3 : Use case diagram of user’s possible interaction with the system
3.8.3. Sequence-Diagram:
Figure 3.4 : Sequence diagram showing object interactions arranged in time sequence
3.9 Tools Used By Us:
GitHub (repository):
GitHub will be used for source code management and distributed version control. We will create a public repository for our source code and keep it updated there.
PostgreSQL database:
PostgreSQL is an object-relational database management system (ORDBMS) that focuses on extensibility and compliance with industry standards[1]. In our project we will use Postgres version 9.5, with pgAdmin as the graphical administration application.
Django Framework:
Django is a web framework written in Python. It follows an MVC-style (model-view-template) pattern, which speeds up and simplifies development. We will use Django 3.11 because it includes many of the standard libraries and packages we need for our project.
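A hypothetical Django model for storing the fetched news items in PostgreSQL might look as follows; the field names and app layout are assumptions for illustration, not taken from the project's actual code.

# Hypothetical model for persisting news items (illustrative field names).
from django.db import models

class NewsArticle(models.Model):
    source = models.CharField(max_length=100)                 # e.g. the RSS feed name
    title = models.CharField(max_length=300)
    summary = models.TextField(blank=True)
    link = models.URLField(unique=True)
    published = models.DateTimeField(null=True, blank=True)
    sentiment = models.CharField(max_length=20, blank=True)   # e.g. "most positive"

    def __str__(self):
        return self.title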
BeautifulSoup4:
BeautifulSoup4 is a Python module for extracting information from HTML and XML documents. BeautifulSoup will be used to extract data from the fetched pages, from which the information is then obtained. The feedparser Python module allows syndicated feeds to be downloaded and parsed; it is RSS-capable (Rich Site Summary).
Figure 3.5 : The data flow of the proposed framework
Chapter 4
Experimental Results
This chapter presents a detailed discussion of the experiments conducted on sentiment labelling using the VADER package, based on the news category of the dataset; the prediction of sentiment labels is presented in the classification report.
The sentiment labels are generated using an aspect-based tokenization method. For a better understanding of the calculation, three typical review sentences are highlighted from the news review data set; the aspect terms are the highlighted words in these sentences. A tuple of polarity and subjectivity is collected per word to calculate the polarity scores using aspect-based tokenization. The rule included in the code then labels each item as negative or positive.
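A sketch of this labelling step with NLTK's VADER analyser is shown below, including a mapping of the compound score onto the five categories used by the platform; the bucket thresholds are illustrative assumptions, not the exact cut-offs used in our code.

# Sketch: VADER sentiment labelling with a five-way bucketing of the compound score.
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label(sentence):
    compound = analyzer.polarity_scores(sentence)["compound"]   # value in [-1, 1]
    if compound <= -0.6:
        return "most negative"
    if compound <= -0.05:
        return "negative medium"
    if compound < 0.05:
        return "neutral"
    if compound < 0.6:
        return "positive medium"
    return "most positive"

print(label("Floods devastate the region, dozens feared dead."))
print(label("Local team wins championship in a thrilling final."))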
Figure 4.1 Start Page
Figure 4.3 Select News Priority Wise
Table 4.1: Meaning of arguments for sentences
4.2 Prediction of sentiment labels from the results:
In our approach, the results clearly distinguish between true positive and true negative values, but the overall number of values differs somewhat because of a small subset of neutral values branching off from the true negative and true positive values in the VADER method's results.
The classification report in Table 4.2 shows that the VADER method's positive precision does not entirely outperform its negative precision, and that its negative recall is greater than its positive recall. The following are some of our observations:
Positive recall indicates that the VADER approach selects 63 percent of the positive labels.
Negative recall indicates that negative labels are picked 94 percent of the time, with a weighted average of 93 percent.
The model is decent at classification overall, but it appears weaker at classifying the positive class, with a recall of 63 percent versus 94 percent for the negative class.
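For reference, the precision and recall figures reported in Table 4.2 can be produced with a standard classification report once the predicted and true labels are available; the label lists below are placeholders, not the project's data.

# Sketch: generating a classification report from true vs. predicted labels.
from sklearn.metrics import classification_report

y_true = ["positive", "negative", "negative", "positive", "negative"]   # placeholder labels
y_pred = ["positive", "negative", "neutral",  "negative", "negative"]   # placeholder labels
print(classification_report(y_true, y_pred, zero_division=0))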
Table 4.2: Experimentation accuracy results
Chapter 5
Conclusions
5.1 Conclusions
The study described here provides a paradigm that might help sophisticated web mining technologies function more successfully.
The suggested system analyses the textual data of an internet page and uses semantic networks to accomplish a number of objectives:
1) the identification of the topics addressed in a web resource;
2) the selection of phrases that are more closely related to a specific topic;
3) the automatic summarization of a textual resource.
The final framework makes use of these features to tackle two tasks at once: text summarisation and page segmentation. The suggested technique, which relies on an abstract representation capturing the informative content of the underlying textual resource on a cognitive foundation, includes a semantic characterisation of text as a key component.[2]
However, because it does not rely on semantic information already embedded in web resources, the current technique cannot be classified as Semantic Web mining. Semantic networks are used in the proposed approach to describe the content of a textual resource using semantic domains rather than a conventional bag of words. Experiments have shown that such an approach can yield a coarse-grained level of sense distinctions, which promotes the identification of the themes that are really discussed on the website. In this regard, the test results revealed that the system can mimic human assessors in judging the importance of a text's single sentences.
An interesting feature of this work is that the page segmentation technique is predicated only on the analysis of the textual part of the web resource.[2]
A future direction of this research might be to combine the content-driven segmentation method with traditional segmentation engines, which are more geared toward analysing the underlying structure of the web page. The resulting framework should be able to integrate the results of the two modules to improve the performance of the segmentation procedure.
The use of sentiment analysis in applicable fields will continue to grow, and as a result sentiment analysis techniques will become an integrated element of many services and products. We believe that advances in natural language processing and machine learning will enhance research methodologies. Furthermore, we see a shift away from purely text-based sentiment analysis approaches toward methods that draw on other modalities, such as voice, gaze, and neuro-marker analysis. However, we doubt whether sentiment analysis can repeat, in the next ten years, the roughly 50-fold increase in the number of papers that occurred during the past ten years (2005-2015), since that would result in over 250,000 papers on sentiment analysis being published by the year 2025.
Extractive techniques are the most successful and adaptable methods employed in automatic summarization to date: they attempt to choose the most relevant phrases from a collection of original documents in order to produce a condensed text that conveys the essential pieces of information. As we have seen, these approaches are far from ideal. In multi-document summarization, the selection of phrases from several sources leads to redundancy, which must subsequently be removed. Furthermore, most of the time only a portion of a phrase is relevant, yet extracting only sub-sentences is not practical. Lastly, extracting sentences from various different documents may produce an inconsistent and/or hard-to-read summary.
References
[1] Mafiadoc.com, 2021.
[2] www.intechopen.com
[3] F. Amato, V. Moscato, A. Picariello, G. Sperlí, and A. D'Acierno.
[5] R. McDonald, A study of global inference algorithms in multi-document summarization, Proc. 29th Eur. Conf. IR Res. (2007), pp. 557-564.
[7] R. McDonald and V. Hristidis, A survey of text summarization techniques, Mining Text Data, 43 (2012).
[8] V. Gupta and G. S. Lehal, A survey of text summarization extractive techniques, J. Emerg. Technol. Web Intell., 258 (2010).
[9] D. K. Gaikwad and C. N. Mahender, A review paper on text summarization (2016).
[10] S. A. Sandri, D. Dubois, and H. W. Kalfsbeek, "Elicitation, assessment, and pooling of expert judgments using possibility theory," IEEE Transactions on Fuzzy Systems, vol. 3, no. 3, pp. 313-335, Aug. 1995, doi: 10.1109/91.413236.