
G. H. Patel College of Engineering and Technology
Text Analysis, Summarization and Extraction
Text Classification:
Introduction:

• Unstructured data accounts for over 80% of all data, and text is one of the most common categories. Because its messy nature makes text data difficult and time-consuming to analyze, comprehend, organize, and sift through, most businesses do not exploit it to its full potential despite all the benefits it could bring.
Introduction:

• This is where Machine Learning and text classification come into play. Companies can use text classifiers to quickly and cost-effectively organize all types of relevant content, including emails, legal documents, social media, chatbots, surveys, and more.
Introduction:

• This chapter covers some of the essential models you need to know, how to evaluate those models, and the potential alternatives to developing your own algorithms.
What is a Text Classifier:

• Text classification is a core Machine Learning technique used in Natural Language Processing (NLP) applications such as sentiment analysis, spam detection, and intent detection.
• A text classifier labels unstructured texts into predefined text
categories. Instead of users having to review and analyze vast
amounts of information to understand the context, text
classification helps derive relevant insight.
• The goal of text classification is to categorize or predict a class of
unseen text documents, often with the help of supervised machine
learning.
Text Classification pipeline:
Text classification use cases and application:

• A spam filter is a common application that uses text classification to sort emails into spam and non-spam categories.
Spam Classification:
Classifying news articles and blogs:

• A supervised machine learning model is trained on labeled data,


which includes both the raw text and the target. Once a model is
trained, it is then used in production to obtain a category (label)
on the new and unseen data (articles/blogs written in the future).
Classifying news articles and blogs:
Types of Text classification System:

• There are two types of text classification:

• 1. Rule-based text classification
• 2. Machine learning-based text classification
Rule-based text classification:

• Rule-based techniques use a set of manually constructed language


rules to categorize text into categories or groups.

• For example, imagine you have tons of news articles, and your goal is to assign them to relevant categories such as Sports, Politics, Economy, etc.
Rule based text classification:

• With a rule-based classification system, you will do a human


review of a couple of documents to come up with linguistic rules
like this one:

• If the document contains words such as money, dollar, GDP, or inflation, it belongs to the Economy group (class).
Rule-based text classification:

• To begin with, these systems demand in-depth expertise in the field. They also take a lot of time, since creating rules for a complicated system is difficult and frequently necessitates extensive study and testing.
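A rule-based classifier of the kind described above can be sketched in a few lines of Python. The categories and keyword lists here are illustrative assumptions, not a real production rule set:

```python
# Minimal rule-based text classifier: each category is defined by a
# hand-written keyword set; a document is assigned to the category
# whose keywords it matches most often.
RULES = {
    "Sports": {"match", "goal", "tournament", "player"},
    "Economy": {"money", "dollar", "gdp", "inflation"},
    "Politics": {"election", "parliament", "minister", "vote"},
}

def classify(document: str) -> str:
    tokens = set(document.lower().split())
    # Score each category by the number of keyword hits in the document.
    scores = {cat: len(tokens & keywords) for cat, keywords in RULES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"

print(classify("Inflation and GDP figures pushed the dollar higher"))  # Economy
```

The example also shows the weakness noted above: every new domain requires a human to study documents and extend the keyword sets by hand.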
Machine learning Based text classification:

• Machine learning-based text classification is a supervised machine


learning problem.

• It learns the mapping of input data (raw text) with the labels (also
known as target variables).
Machine learning Based text classification:

• As a supervised machine learning problem, text classification has two phases: training and prediction.
Machine learning Based text classification:
Training Phase:

• A supervised machine learning algorithm is trained on the input-


labeled dataset during the training phase. At the end of this
process, we get a trained model that we can use to obtain
predictions (labels) on new and unseen data.
Training Phase and Prediction phase
Prediction phase

• Once a machine learning model is trained, it can be used to


predict labels on new and unseen data. This is usually done by
deploying the best model from an earlier phase as an API on the
server.
Prediction phase
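The two phases can be illustrated with a minimal, self-contained sketch. A tiny Naive Bayes classifier (one common choice for text, though any supervised algorithm would do) stands in for the trained model; the texts and labels are made up for illustration:

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        # Training phase: count words per label from the labeled dataset.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in text.lower().split():
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        # Prediction phase: pick the label with the highest log-probability.
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_logp = None, float("-inf")
        for label in self.label_counts:
            logp = math.log(self.label_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Training phase: learn from labeled examples.
model = TinyNaiveBayes().fit(
    ["free prize click now", "win money now",
     "meeting agenda attached", "see you at lunch"],
    ["spam", "spam", "ham", "ham"],
)
# Prediction phase: label new, unseen text.
print(model.predict("claim your free money"))  # spam
```

In production, the trained model would be serialized and deployed behind an API, as described above; only the `predict` step runs at serving time.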
Text Preprocessing Pipeline:
Feature Extraction

• The two most common methods for extracting features from text, in other words converting text data (strings) into numeric features so a machine learning model can be trained, are: Bag of Words (a.k.a. CountVectorizer) and TF-IDF.
Bag of Words:

• A bag of words (BoW) model is a simple way of representing text


data as numeric features. It involves creating a vocabulary of
known words in the corpus and then creating a vector for each
document that contains counts of how often each word appears.
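The BoW idea can be sketched with the standard library alone (this reimplements, in miniature, what scikit-learn's CountVectorizer does):

```python
from collections import Counter

def bag_of_words(corpus):
    # Build the vocabulary of known words across the whole corpus.
    vocab = sorted({tok for doc in corpus for tok in doc.lower().split()})
    # One count vector per document, aligned to the vocabulary.
    vectors = []
    for doc in corpus:
        counts = Counter(doc.lower().split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["the cat sat", "the cat sat on the mat"])
print(vocab)    # ['cat', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Note that word order is discarded, which is exactly why the model is called a "bag" of words.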
Bag of Words:
TF-IDF

• The TF-IDF model is different from the bag of words model in that
it takes into account the frequency of the words in the document,
as well as the inverse document frequency. This means that the
TF-IDF model is more likely to identify the important words in a
document than the bag of words model.
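A minimal TF-IDF sketch, using the textbook weighting tf x log(N/df) (scikit-learn's TfidfVectorizer uses a smoothed, normalized variant):

```python
import math
from collections import Counter

def tf_idf(corpus):
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        # tf = term count / document length; idf = log(N / df).
        weights.append({tok: (counts[tok] / len(doc)) * math.log(n / df[tok])
                        for tok in counts})
    return weights

weights = tf_idf(["the cat sat", "the dog ran", "the cat ran"])
# "the" appears in every document, so its weight is log(3/3) = 0.
print(round(weights[0]["the"], 4))  # 0.0
```

This shows the key property claimed above: words common to every document are down-weighted to zero, while rarer words ("sat") score higher than more widespread ones ("cat").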
What is Text summarization:

• Text summarization condenses one or more texts into shorter


summaries for enhanced information extraction.

• Text summarization is the creation of a short, accurate, and fluent


summary of a longer text document.
Types of automatic text summarization:

• 1. Extractive summarization
• 2. Abstractive summarization
Extractive summarization:

• Extractive summarization extracts unmodified sentences from the


original text documents. A key difference between extractive
algorithms is how they score sentence importance while reducing
topical redundancy.
Extractive summarization:

• As with other NLP tasks, text summarization requires text data first
undergo preprocessing. This includes tokenization, stopword removal,
and stemming or lemmatization in order to make the dataset readable
by a machine learning model. After preprocessing, all extractive text
summarization methods follow three general, independent steps:
representation, sentence scoring, and sentence selection.
Extractive summarization(representation)

• Representation models text segments, such as words or sentences, as data points in a vector space, typically with the bag of words model. Large, multi-document datasets often use term frequency-inverse document frequency (TF-IDF), a variant of bag of words that weights each term to reflect its importance within a text set.
Extractive summarization(Sentence Scoring)

• Sentence scoring, per its name, scores each sentence in a text according to its importance to that text.

• There are different methods for sentence scoring, such as the TF-IDF method.
Extractive summarization(Sentence
Selection)

• Having weighted sentences by importance, algorithms select the n most important sentences for a document or collection thereof. These sentences comprise the generated summary.
Extractive summarization(Sentence
Selection)

• The sentence selection step aims to reduce redundancy in the final summaries. Maximal marginal relevance methods employ an iterative approach.
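The scoring and selection steps can be sketched as follows. Scoring here uses summed word frequency for simplicity, and the overlap threshold is a crude illustrative stand-in for a full maximal-marginal-relevance computation:

```python
from collections import Counter

def select_sentences(sentences, n=2, max_overlap=0.5):
    # Sentence scoring: sum the corpus-wide frequency of each word.
    freq = Counter(tok for s in sentences for tok in s.lower().split())
    scored = sorted(sentences,
                    key=lambda s: sum(freq[t] for t in s.lower().split()),
                    reverse=True)
    # Sentence selection: greedily take the top sentences, skipping any
    # that overlap too much with sentences already chosen (redundancy).
    chosen = []
    for sent in scored:
        toks = set(sent.lower().split())
        if any(len(toks & set(c.lower().split())) / len(toks) > max_overlap
               for c in chosen):
            continue  # too similar to an already-selected sentence
        chosen.append(sent)
        if len(chosen) == n:
            break
    return chosen

sents = ["the cat sat on the mat",
         "the cat sat on the mat today",
         "dogs bark loudly"]
print(select_sentences(sents, n=2))
# ['the cat sat on the mat today', 'dogs bark loudly']
```

Without the overlap check, the two nearly identical cat sentences would both be selected; the redundancy step forces the summary to cover the second topic instead.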
Abstractive summarization:

• Abstractive summarization generates original summaries using


sentences not found in the original text documents. Such
generation requires neural networks and large language models
(LLMs) to produce semantically meaningful text sequences.

• Abstractive text summarization is more computationally expensive than extractive summarization.
Abstractive summarization:

• Which methods are used in abstractive summarization?

1. Sentence Compression: humans summarize longer texts by shortening their sentences. There are two general approaches to sentence compression: rule-based and statistical methods.
Abstractive summarization:

2. Information Fusion: summarizes documents by concatenating information from multiple passages into a single sentence or phrase.
What are the Benefits of Text Summarization:
1. Scalable and Quick

• Manually summarizing a short document is fairly easy, but what if you have an article or paper that is hundreds or thousands of pages long?

• Summarization software will analyze all your input text and source documents and provide you with a summary.
Leverage Existing Tools

• Text summarization algorithms are easy to use and available to


make your research and business decision-making process more
efficient and actionable.
Understand Your Customers Better:

• NLP can extract insight from text data, which makes it a perfect tool for keeping track of customer feedback and determining sentiment, whether positive or negative, and to what degree.

• It can monitor reviews in real time and flag the most important or time-sensitive comments, provide timely feedback, and ignore irrelevant information.
Summarize a Text in Different Formats:

• Natural language processing helps you obtain summarized text extracted


from your competitor’s web pages, market research documents,
industry-related articles, etc. Having a clear idea of the market and your
competitors helps you determine actionable steps for presenting your
product or refining your business strategy. This helps you stand out
amongst the competitors and maintain a competitive advantage in the
market.
Summarize a Text in Different Formats:

• NLP platform can provide you with the most relevant sentences
that you can use to communicate your product, important points
to focus on and give you a deep understanding of your
environment.
Ensure all Critical Information is Covered:

• The automated text summarizing approach makes it easy for the


user to read all the most important sentences in a document.
What are the Use cases of Text
Summarization:
• Financial Research with NLP
• Media Monitoring with NLP
Why automatic Text summarization:

• Summaries reduce reading time.

• While researching using various documents, summaries make the selection process easier.

• Automatic summarization improves the effectiveness of indexing.

• Automatic summarization algorithms are less biased than human summarizers.

• Personalized summaries are useful in question-answering systems as they provide personalized


information.

• Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of text documents they are able to process.
Types of Summarization:

• An Extractive summarization method consists of selecting important sentences, paragraphs


etc. from the original document and concatenating them into shorter form.

• An Abstractive summarization builds an understanding of the main concepts in a document and then expresses those concepts in clear natural language.

• The Domain-specific summarization techniques utilize the available knowledge specific to


the domain of text. For example, automatic summarization research on medical text
generally attempts to utilize the various sources of codified medical knowledge and
ontologies.
Types of Text Summarization:

• The Generic summarization focuses on obtaining a generic summary or abstract of the collection of
documents, or sets of images, or videos, news stories etc.

• The Query-based summarization, sometimes called query-relevant summarization, summarizes


objects specific to a query.

• The Multi-document summarization is an automatic procedure aimed at extraction of information


from multiple texts written about the same topic. Resulting summary report allows individual users,
such as professional information consumers, to quickly familiarize themselves with information
contained in a large cluster of documents.

• The Single-document summarization generates a summary from a single source document.


How to do text summarization:

• Text cleaning
• Sentence Tokenization
• Word tokenization
• Word-frequency table
• Summarization
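The five steps above can be sketched end to end with the standard library. Real pipelines would use NLTK or spaCy for tokenization and stopword lists; the stopword set here is a tiny illustrative subset:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it"}

def summarize(text, n=1):
    text = re.sub(r"\s+", " ", text).strip()              # 1. text cleaning
    sentences = re.split(r"(?<=[.!?])\s+", text)           # 2. sentence tokenization
    freq = Counter()                                       # 4. word-frequency table
    for sent in sentences:
        words = re.findall(r"[a-z']+", sent.lower())       # 3. word tokenization
        freq.update(w for w in words if w not in STOPWORDS)

    def score(sent):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sent.lower()))

    ranked = sorted(sentences, key=score, reverse=True)    # 5. summarization
    return " ".join(ranked[:n])

print(summarize("Cats sleep all day. Cats chase mice. Dogs bark.", n=1))
# Cats sleep all day.
```

Sentences whose words recur most often across the document score highest, which is the frequency-table intuition behind this extractive recipe.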
Named Entity Recognition:

• Named Entity Recognition (NER) is a technique in natural language processing (NLP) that focuses on identifying and classifying entities. The purpose of NER is to automatically extract structured information from unstructured text, enabling machines to understand and categorize entities in a meaningful manner for various applications like text summarization, question answering, and knowledge graph construction. This article explores the fundamentals, methods, and implementation of the NER model.
What is Named Entity Recognition (NER)?

• Named entity recognition (NER) is also referred to as entity identification, entity chunking, and entity extraction. NER is the component of information extraction that aims to identify and categorize named entities within unstructured text.

• NER involves the identification of key information in the text and its classification into a set of predefined categories.

• Common categories include person names, organizations, locations, time expressions, quantities, and percentages.
How Named Entity Recognition Works:

• The NER system analyses the entire input text to identify and locate the named entities.

• NER can be trained to classify entire documents into different types, such as invoices,
receipts, or passports. Document classification enhances the versatility of NER, allowing it to
adapt its entity recognition based on the specific characteristics and context of different
document types.

• NER employs machine learning algorithms, including supervised learning, to analyze labeled
datasets. These datasets contain examples of annotated entities, guiding the model in
recognizing similar entities in new, unseen data.
How Named Entity Recognition Works:

• Over multiple training iterations, the model refines its understanding of contextual features, syntactic structures, and entity patterns, continuously improving its accuracy over time.
Named Entity Recognition Methods:

• Lexicon Based Method

• The NER uses a dictionary with a list of words or terms. The process involves
checking if any of these words are present in a given text. However, this approach
isn’t commonly used because it requires constant updating and careful maintenance
of the dictionary to stay accurate and effective.
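A lexicon-based (gazetteer) lookup of this kind can be sketched as follows; the dictionary entries are illustrative, and the point is exactly the maintenance burden noted above — anything not in the dictionary is missed:

```python
# Hand-built gazetteer mapping surface forms to entity types (illustrative).
GAZETTEER = {
    "london": "LOCATION",
    "paris": "LOCATION",
    "google": "ORGANIZATION",
    "alan turing": "PERSON",
}

def lexicon_ner(text):
    tokens = text.lower().replace(".", "").split()
    entities = []
    i = 0
    while i < len(tokens):
        two = " ".join(tokens[i:i + 2])
        # Prefer the longest match: try two-token entries before one-token.
        if i + 1 < len(tokens) and two in GAZETTEER:
            entities.append((two, GAZETTEER[two]))
            i += 2
        elif tokens[i] in GAZETTEER:
            entities.append((tokens[i], GAZETTEER[tokens[i]]))
            i += 1
        else:
            i += 1
    return entities

print(lexicon_ner("Alan Turing worked in London."))
# [('alan turing', 'PERSON'), ('london', 'LOCATION')]
```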
Named Entity Recognition Methods:
• Rule-Based Method

• The rule-based NER method uses a set of predefined rules to guide the extraction of information. These rules are based on patterns and context. Pattern-based rules focus on the structure and form of words, looking at their morphological patterns. Context-based rules, on the other hand, consider the surrounding words or the context in which a word appears within the text document. This combination of pattern-based and context-based rules enhances the precision of information extraction in Named Entity Recognition (NER).
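The two kinds of rules can be sketched with regular expressions: word-shape patterns for percentages and dates, plus one context rule (an honorific before a capitalized word signals a person). The rules are illustrative, not a complete grammar:

```python
import re

def rule_based_ner(text):
    entities = []
    # Pattern rule: word shape of a percentage, e.g. "12%" or "3.5%".
    for m in re.finditer(r"\b\d+(?:\.\d+)?%", text):
        entities.append((m.group(), "PERCENT"))
    # Pattern rule: "<day> <Month> <year>" date expressions.
    for m in re.finditer(r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
                         r"August|September|October|November|December) \d{4}\b", text):
        entities.append((m.group(), "DATE"))
    # Context rule: an honorific immediately before a capitalized word.
    for m in re.finditer(r"(?:Mr\.|Dr\.|Ms\.) ([A-Z][a-z]+)", text):
        entities.append((m.group(1), "PERSON"))
    return entities

print(rule_based_ner("Dr. Patel reported 12% growth on 5 March 2024."))
# [('12%', 'PERCENT'), ('5 March 2024', 'DATE'), ('Patel', 'PERSON')]
```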
Named Entity Recognition Methods:
• Machine learning based method
• Multi-Class Classification with Machine Learning Algorithms
• One way is to train the model for multi-class classification using different machine
learning algorithms, but it requires a lot of labelling. In addition to labelling the
model also requires a deep understanding of context to deal with the ambiguity of
the sentences.
Named Entity Recognition Methods:
• Machine learning based method
• Conditional Random Field (CRF)

• The conditional random field is implemented by both the NLP Speech Tagger and NLTK. It is a probabilistic model that can be used to model sequential data such as sequences of words.
What is Information Extraction:

• Information extraction is the process of extracting information from


unstructured textual sources to enable finding entities as well as
classifying and storing them in a database. Semantically enhanced
information extraction (also known as semantic annotation) couples
those entities with their semantic descriptions and connections from a
knowledge graph. By adding metadata to the extracted concepts, this
technology solves many challenges in enterprise content management
and knowledge discovery.
Information Extraction:

• Textual sources from which information extraction can distill


structured information are legal acts, medical records, social
media interactions and streams, online news, government
documents, corporate reports and more.
How does Information Extraction Work:
What are the Key components of Information
Extraction:
• Named Entity Recognition (NER)
• NER identifies and classifies entities within a text into predefined
categories such as the names of persons, organizations, locations,
dates, etc.

• Relationship Extraction

• This involves identifying and categorizing the relationships


between entities within a text, helping to build a network of
connections and insights.
What are the Key components of Information
Extraction:

• Event Extraction
• Event extraction identifies specific occurrences described in the
text and their attributes, such as what happened, who was
involved, and where and when it occurred.
Information Extraction Techniques in NLP:

• 1. Named Entity Recognition (NER)

• Definition: Identifying and classifying named entities (e.g., persons, organizations,


locations, dates) in text.

• Techniques:

• Rule-based approaches: Utilize predefined rules and patterns.

• Statistical models: Use probabilistic models like Hidden Markov Models (HMM) and
Conditional Random Fields (CRF).

• Deep learning: Leverage neural networks such as BiLSTM-CRF and transformers like BERT.
Information Extraction Techniques in NLP:
• 2. Relation Extraction

• Definition: Identifying and categorizing relationships between entities within a text.

• Techniques:

• Pattern-based: Uses patterns and linguistic rules.

• Supervised learning: Employs labeled data to train classifiers.

• Distant supervision: Uses a large amount of noisy labeled data from knowledge
bases.

• Neural networks: Utilizes CNNs, RNNs, and transformers for relation classification.
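The pattern-based approach can be sketched with a few hand-written "X relation Y" templates over capitalized mentions. The patterns and relation names are illustrative; real systems match against parsed sentence structure rather than raw strings:

```python
import re

# Each rule pairs a surface pattern with the relation label it signals.
PATTERNS = [
    (r"([A-Z][a-z]+) works at ([A-Z][a-z]+)", "EMPLOYED_BY"),
    (r"([A-Z][a-z]+) is located in ([A-Z][a-z]+)", "LOCATED_IN"),
    (r"([A-Z][a-z]+) founded ([A-Z][a-z]+)", "FOUNDER_OF"),
]

def extract_relations(text):
    triples = []
    for pattern, relation in PATTERNS:
        for m in re.finditer(pattern, text):
            # Emit a (subject, relation, object) triple per match.
            triples.append((m.group(1), relation, m.group(2)))
    return triples

print(extract_relations("Alice works at Acme. Acme is located in Pune."))
# [('Alice', 'EMPLOYED_BY', 'Acme'), ('Acme', 'LOCATED_IN', 'Pune')]
```

The extracted triples are exactly the "network of connections" described under relationship extraction: nodes are entities, edges are relations.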
Information Extraction Techniques in NLP:

• 3. Event Extraction

• Definition: Detecting events and their participants, attributes, and temporal information.

• Techniques:

• Template-based: Matches text with pre-defined event templates.

• Machine learning: Uses classifiers and sequence labeling methods.

• Deep learning: Applies RNNs, CNNs, and attention mechanisms to capture event
structures.
Information Extraction Techniques in NLP:

• 4. Coreference Resolution

• Definition: Determining when different expressions in a text refer to the same entity.

• Techniques:

• Rule-based: Employs heuristic rules.

• Machine learning: Trains classifiers using features like gender, number, and syntactic
role.

• Neural networks: Uses deep learning models like BiLSTM and transformers for
coreference chains.
Information Extraction Techniques in NLP:

• 5. Template Filling

• Definition: Extracting specific pieces of information to populate predefined


templates.

• Techniques:

• Rule-based: Matches text to slots based on rules.

• Machine learning: Uses classifiers to fill template slots.

• Hybrid methods: Combine rules and machine learning for better accuracy.
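A rule-based slot filler of the kind listed above can be sketched as follows; the "job posting" template, its slot names, and the regex rules are all illustrative assumptions:

```python
import re

# One extraction rule per slot of the predefined template.
SLOT_RULES = {
    "title":    r"hiring an? ([A-Za-z ]+?) in",
    "location": r"\bin ([A-Z][a-z]+)",
    "salary":   r"salary of \$([\d,]+)",
}

def fill_template(text):
    template = {}
    for slot, rule in SLOT_RULES.items():
        m = re.search(rule, text)
        # Leave the slot empty (None) when no rule matches.
        template[slot] = m.group(1) if m else None
    return template

print(fill_template("We are hiring a Data Engineer in Mumbai with a salary of $90,000."))
# {'title': 'Data Engineer', 'location': 'Mumbai', 'salary': '90,000'}
```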
Information Extraction Techniques in NLP:

• 6. Open Information Extraction (OpenIE)

• Definition: Extracting tuples of arbitrary relations and arguments from text.

• Techniques:

• Pattern-based: Utilizes linguistic patterns to identify relational triples.

• Statistical: Uses probabilistic models to determine the confidence of extracted


relations.

• Neural OpenIE: Leverages deep learning models to improve the extraction process.
What are the challenges in Information
Extraction:
• Ambiguity and Variability of Language: Human language is inherently ambiguous and
varies greatly in structure and style, making accurate extraction challenging.

• Domain-Specific Adaptation: IE systems need to be tailored to specific domains to


achieve high accuracy, requiring substantial effort in training and customization.

• Data Quality and Annotation: The quality of the extracted information heavily
depends on the quality of the training data and the annotations used to train IE
models.
