Exam 2

Uploaded by

JUBAYAD

Week-10

1. What are the 3 differences between primary and secondary data?

2. Different types of application programming interface (API)

a. Public API?
b. REST API? (historical data)
c. Streaming API? (real-time data)
d. Web crawling

Week-11

• Supervised vs unsupervised learning

In this class, we only learn unsupervised ML (where we do not have target variables). If there are target variables, it is supervised ML; if there are no target variables, it is unsupervised ML.

Topic modeling is unsupervised ML.

• Structured vs unstructured data

Structured data - data organized in rows and columns, as in an Excel sheet.

Unstructured data - video, audio, and textual data. Using a term-by-document matrix (TDM), unstructured text data can be converted into structured data, e.g. a bag-of-words representation.
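As a rough illustration of the conversion described above, the following sketch builds a small term-by-document matrix using only the Python standard library. The three-document corpus, the whitespace tokenizer, and the function names are assumptions made for this example; they are not from the course material.

```python
from collections import Counter

# Hypothetical three-document corpus, used only for illustration.
docs = [
    "the food was good",
    "the service was bad",
    "good food and good service",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def term_document_matrix(documents):
    """Return (sorted vocabulary, matrix) with rows = terms, columns = documents."""
    counts = [Counter(tokenize(d)) for d in documents]
    vocab = sorted(set().union(*counts))
    # Each cell holds how often the term occurs in that document.
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

vocab, tdm = term_document_matrix(docs)
for term, row in zip(vocab, tdm):
    print(f"{term:8s} {row}")
```

Each row is a term and each column a document, so the free text becomes a table of counts that ordinary data mining methods can work on. Word order is discarded, which is why this is called a bag of words.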

• Text mining concepts

Define the process of text analysis.

Text analytics = information retrieval + information extraction + data mining + web mining

Text mining = a semi-automated process of extracting knowledge from unstructured data sources, a.k.a. text data mining or knowledge discovery in textual databases

To perform text mining, first impose structure on the data, then mine the structured data.

Text Mining Terminology (Unit of Analysis: Document)

Unstructured or semistructured data

Corpus (plural: corpora) - a collection of documents

Terms - each word is known as a term

Concepts

Stemming - cutting words down to a common stem, so that different forms of the same word are treated alike

Stop words (and include words) - stop words are words we do not need in our analysis, such as articles (a, an, the, etc.)

Synonyms (and polysemes)

Tokenisation - the process of breaking up a given text into units called tokens

Lemmatization - removing inflectional endings only, to return the base or dictionary form of a word

Term dictionary

Word frequency

Part-of-speech tagging

Term-by-document matrix

Occurrence matrix

Transformation
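Several of the terms above (tokenisation, stop words, stemming) can be sketched in a few lines of Python. This is a toy example under stated assumptions: the stop-word list is a short hypothetical one, and naive_stem is a made-up suffix-stripping rule, not the Porter stemmer or any library routine.

```python
import re

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "and", "of", "to", "in"}

def tokenize(text):
    """Break text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip a few common suffixes. A toy rule set, not the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "The students were mining the reviews and tagging the documents"
tokens = tokenize(text)
kept = remove_stop_words(tokens)
stems = [naive_stem(t) for t in kept]
print(stems)
```

With these rules "mining" becomes "min" and "tagging" becomes "tagg", which shows that a stem does not have to be a dictionary word. Lemmatization, by contrast, would return dictionary forms such as "mine" and "tag".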

The Three-Step/Task Text Mining Process

Task 1 - Establish the Corpus: collect and organize the domain-specific unstructured data
Task 2 - Create the Term-Document Matrix: introduce structure to the corpus
Task 3 - Extract Knowledge: discover novel patterns from the T-D matrix

Flow: Data / Text → Task 1 → Task 2 → Task 3 → Knowledge (with feedback loops between the tasks)

Inputs to the process: a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
Output of Task 1: a collection of documents in some digitized format for computer processing
Output of Task 2: a flat file called the term-document matrix, where the cells are populated with the term frequencies
Output of Task 3: a number of problem-specific classification, association, and clustering models and visualizations

TF-IDF

A high weight in tf–idf is reached by a high term frequency (in the given document) and
a low document frequency of the term in the whole collection of documents; the
weights hence tend to filter out common terms.
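One common way to write this weighting is tf-idf(t, d) = tf(t, d) * log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The notes do not say which variant the course uses, so the formula and the sketch below are an assumption (other variants add smoothing or normalization). The corpus is hypothetical.

```python
import math
from collections import Counter

# Hypothetical pre-tokenized corpus, used only for illustration.
docs = [
    ["the", "food", "was", "good"],
    ["the", "service", "was", "bad"],
    ["the", "food", "was", "cold"],
]

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

weights = tf_idf(docs)
for w in weights:
    print({t: round(v, 3) for t, v in w.items()})
```

Here "the" and "was" occur in every document, so log(3 / 3) = 0 and their weight is zero, while "bad" occurs in only one document and receives the highest weight. This is the filtering of common terms described above.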

Week-12

Sentiment Analysis Process

Step 1 - Sentiment Detection (objective vs subjective)
Comes right after the retrieval and preparation of the text documents. It is also called detection of objectivity: fact [= objectivity] versus opinion [= subjectivity].

Step 2 - N-P Polarity Classification (negative vs positive)
Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities: N [= negative] versus P [= positive].

Step 3 - Target Identification
The goal of this step is to accurately identify the target of the expressed sentiment (e.g., a person, a product, an event, etc.).
Level of difficulty → the application domain

Step 4 - Collection and Aggregation
Once the sentiments of all text data points in the document are identified and calculated, they are to be aggregated.
Word → Statement → Paragraph → Document

Workflow:
1. Read data and create corpus
2. Parse and extract words, pre-process (bag of words)
3. Tag the documents using the MPQA positive and negative lists (positive = 1, negative = -1)
4. Tag * TF, where TF is the (absolute) term frequency
5. Aggregate at the review level: how many times + or - words occur
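The tagging and aggregation steps can be sketched as follows. This is a minimal example under stated assumptions: the two word sets are tiny hypothetical stand-ins for a real lexicon such as the MPQA lists, the reviews are invented, and the function names are illustrative.

```python
from collections import Counter

# Tiny hypothetical word lists standing in for a real lexicon such as MPQA.
POSITIVE = {"good", "great", "friendly"}
NEGATIVE = {"bad", "slow", "cold"}

def tag(word):
    """Return 1 for a positive word, -1 for a negative word, 0 otherwise."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0

def review_score(review):
    """Sum tag * absolute term frequency over all words in one review."""
    tf = Counter(review.lower().split())
    return sum(tag(word) * count for word, count in tf.items())

# Invented reviews for illustration only.
reviews = [
    "good food and friendly staff",
    "slow service and cold food",
    "good food but bad service",
]
scores = [review_score(r) for r in reviews]
print(scores)  # [2, -2, 0]
```

A score above zero suggests a positive review, below zero a negative one, and zero means the positive and negative words cancel out or none were found. A pure word count like this cannot handle negation ("not good"), which is a known limitation of simple lexicon-based tagging.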

Week 14 – Future trends

• Definitions
• Deep learning – complex neural networks – non-linear relationships
• Transfer learning – using pre-trained models
• Generative AI – model able to create new content
• Reinforcement learning – reward-based ML
• Federated Models – involves local and global ML
• IoT
• Machine to Machine communications
• Sensors
• Automated algorithms
