Text Mining
Goal: Decide what you want to achieve from the text, such as finding
patterns, classifying topics, or extracting key information.
For example, do you want to classify text into categories like spam vs.
not spam?
2. Text Collection
Gather text data from various sources like:
Emails
News articles
Customer reviews
b) Converting to Lowercase
Standardize text by making everything lowercase (e.g., "Hello" →
"hello").
c) Tokenization
Break the text into smaller units like words or sentences.
d) Stopword Removal
Remove common words that don't add meaning, like "is", "the", and "and".
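The preprocessing steps above (lowercasing, tokenization, stopword removal) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `preprocess`, the regular expression used for tokenizing, and the tiny `STOPWORDS` set are assumptions made for this example, not part of the original slides. Real projects usually rely on a much larger stopword list.

```python
import re

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"is", "the", "and", "a", "an", "of", "to", "in"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    lowered = text.lower()                       # b) convert to lowercase
    tokens = re.findall(r"[a-z0-9']+", lowered)  # c) tokenize into words
    return [t for t in tokens if t not in STOPWORDS]  # d) remove stopwords

print(preprocess("The food is great and the service is fast."))
# ['food', 'great', 'service', 'fast']
```

The regular expression also discards punctuation, so the final period never becomes a token.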
4. Text Representation
Transform the cleaned text into a format a computer can understand.
Example:
b) N-grams
Represents text as sequences of N words to capture context.
Example:
Input Text: "I love mountains and nature."
Trigrams (N=3): "I love mountains," "love mountains and,"
"mountains and nature"
Application:
Bigrams and trigrams are particularly useful for capturing phrases like
"New York City" or "machine learning."
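As a rough sketch, N-grams can be produced by sliding a window of N words across the tokenized text. The helper name `ngrams` and the simple word-splitting rule are assumptions for this example. NLP libraries offer their own equivalents.

```python
import re

def ngrams(text, n):
    """Return every run of n consecutive words in the text."""
    words = re.findall(r"\w+", text)  # simple word tokens, punctuation dropped
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("I love mountains and nature.", 3))
# ['I love mountains', 'love mountains and', 'mountains and nature']
```

Calling `ngrams(text, 2)` on the same sentence gives the four bigrams, and a text shorter than N words gives an empty list.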
5. Data Analysis
Use statistical or machine-learning methods to discover patterns or
insights:
1. Sentiment Analysis
What it is: Analyzing text to determine if it's positive, negative, or
neutral.
Use case: A restaurant can track reviews to see if customers are happy
with their food and service.
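One very simple way to approximate sentiment analysis is to count positive and negative words. The sketch below assumes two small hand-made word lists (`POSITIVE` and `NEGATIVE`) and a hypothetical `sentiment` function. Production systems typically use trained models instead, and this approach misses negation and sarcasm.

```python
import re

# Assumed word lists for illustration; not taken from any real lexicon.
POSITIVE = {"good", "great", "delicious", "friendly", "happy"}
NEGATIVE = {"bad", "slow", "cold", "rude", "terrible"}

def sentiment(review):
    """Label a review by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The food was delicious and the staff were friendly."))
# positive
```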
2. Spam Detection
What it is: Identifying and filtering out unwanted or harmful emails.
Use case: Email providers like Gmail and Outlook use text mining to
keep your inbox clean.
Use case: An online store can use text mining to understand common
complaints and improve their delivery process.
6. Content Recommendation
What it is: Recommending articles, books, or products based on the
content of previous interactions or user interests.
7. Information Extraction
What it is: Automatically extracting useful data or facts from a large
body of text.
Use case: A research organization uses text mining to pull out data from
scientific papers to create databases.
8. Fraud Detection
What it is: Identifying suspicious or fraudulent text patterns in
transactions, applications, or reports.
9. Healthcare Insights
What it is: Analyzing patient records, research papers, and medical
literature to find trends and improve treatment.
Use case: Hospitals use text mining to analyze patient feedback for
areas to improve patient care.
Impact: Helps customers quickly find what they need while boosting
Amazon's sales.
3. Google Search
Example: If you search for “best pizza near me,” Google analyzes local
restaurant reviews and descriptions to provide the most relevant results.
Impact: Makes it easier for users to find music that matches their mood.
2. Saving Time
Why it matters: Manually reading and analyzing large amounts of text is
time-consuming.
Example: A fashion brand uses text mining on social media to find out
which clothing styles are trending.
5. Reducing Risks
Why it matters: It helps detect issues before they become big problems.
3. Improves Decision-Making
Why it’s good: It provides valuable insights to guide business strategies.
6. Supports Innovation
Why it’s good: Extracts insights from research papers and reports to
create new solutions.
2. Complexity of Language
Why it’s a problem: Words can have multiple meanings, and
understanding context can be tricky.
Example: “Apple” could mean the fruit or the tech company, which can
confuse the system.
Presented by:
Varun Kumar 23LCS001
Amaan Khan 23LCS002
Akshita Sharma 23LCS005
Abhishek Sharma 23LCS007