Exam 2
Week-11
To perform text mining, first impose structure on the data, then mine the
structured data
Concepts
Stemming: cutting a word down to its stem so that different forms of the
same word are treated as one term
Stop words: words we do not need in our analysis, such as articles (a, an,
the). Include words are the opposite: terms chosen in advance to be kept in
the analysis
Term dictionary
Word frequency
Part-of-speech tagging
Term-by-document matrix
Occurrence matrix
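Several of the concepts above (stop words, stemming, word frequency, term-by-document matrix) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the stop list is a tiny hand-picked set, the stemmer is naive suffix stripping rather than a real algorithm such as Porter's, and all function names are made up for this example. Rows of the matrix are documents and columns are terms.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stop list; real stop lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}

# Assumption: naive suffix stripping stands in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts terms[j] in docs[i]."""
    counts = [Counter(preprocess(d)) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


docs = [
    "The analyst is mining the documents",
    "Analysts mined a document and mined reports",
]
terms, matrix = term_document_matrix(docs)
print(terms)
for row in matrix:
    print(row)
```

Note how "mining" and "mined" collapse to the same stem and end up in a single column, while "the", "is", "a" and "and" never reach the matrix.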
Figure: the three-task text mining process (figure labels: Transformation, Knowledge, Feedback)
Inputs: a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
Output of Task 1: a collection of documents in some digitized format for computer processing
Output of Task 2: a flat file called the term-document matrix, where the cells are populated with the term frequencies
Output of Task 3: a number of problem-specific classification, association, and clustering models and visualizations
TF-IDF
A high weight in tf–idf is reached by a high term frequency (in the given document) and
a low document frequency of the term in the whole collection of documents; the
weights hence tend to filter out common terms.
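A small worked example may make the weighting concrete. The sketch below assumes one common variant: tf is the raw count of the term in the document and idf is log(N / df), where N is the number of documents and df is the number of documents containing the term. Other variants (smoothed or normalized) exist, and the function name `tf_idf` is just a label for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    Assumptions: tf is the raw count of the term in the document and
    idf is log(N / df), with N documents and df documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = [
    ["the", "price", "of", "the", "stock"],
    ["the", "stock", "market"],
    ["the", "weather", "report"],
]
weights = tf_idf(docs)
for w in weights:
    print({t: round(v, 3) for t, v in w.items()})
```

Here "the" appears in every document, so its idf is log(1) = 0 and its weight vanishes even though its term frequency is the highest. "price" appears in only one document and gets the largest weight in the first document.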
Week-12
Step 1 – Sentiment Detection (Objective-Subjective)
Comes right after the retrieval and preparation of the text documents
It is also called detection of objectivity
Fact [= objectivity] versus Opinion [= subjectivity]
Step 2 – N-P Polarity Classification (Negative-Positive)
Given an opinionated piece of text, the goal is to classify the opinion as
falling under one of two opposing sentiment polarities
N [= negative] versus P [= positive]
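The two steps above can be sketched as a toy lexicon-based classifier. Everything here is an assumption made for illustration: the word lists are tiny and hand-made, "contains a sentiment word" is a crude stand-in for real subjectivity detection, and ties in Step 2 default to P.

```python
import re

# Assumption: tiny hand-made lexicons, purely for illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_subjective(tokens):
    """Step 1: treat the text as opinion if it contains any sentiment word."""
    return any(t in POSITIVE or t in NEGATIVE for t in tokens)


def polarity(tokens):
    """Step 2: label an opinionated text P or N by counting lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Assumption: a tie (score of zero) is labeled P.
    return "P" if score >= 0 else "N"


def classify(text):
    """Run Step 1, and only run Step 2 on text judged to be opinion."""
    tokens = tokenize(text)
    if not is_subjective(tokens):
        return "objective"
    return polarity(tokens)


print(classify("The phone weighs 180 grams"))
print(classify("I love this phone, the camera is great"))
print(classify("Terrible battery and poor screen"))
```

The first sentence is a fact and never reaches Step 2; the other two are opinions and are labeled P and N respectively.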