
web and text mining

Web mining is used to detect fraud, analyze search engine queries and customer behavior, and extract health-related information. Text mining extracts insights from unstructured text data using techniques such as natural language processing, whose sub-tasks include summarization, sentiment analysis, and text categorization. The text mining process consists of data collection, preprocessing, feature extraction, modeling, and evaluation, with applications in sentiment analysis, information retrieval, and spam detection.

Uploaded by

farwajavaid19
Copyright
© All Rights Reserved

Applications of web mining:

1. Detecting fraudulent activity on websites.
2. Analyzing search engine queries and search engine results pages (SERPs).
3. Analyzing customer behavior on websites and social media platforms.
4. Analyzing health-related websites to extract valuable information about diseases,
treatments, and medications.

Text Mining
Text mining, also known as text data mining or text analytics, is the process of extracting
meaningful information and insights from unstructured text data. This involves various
techniques to analyze and interpret text, transforming it into a structured format that can be easily
understood and used for decision-making.

Technique: Natural language processing (NLP)

Natural language processing, which evolved from computational linguistics, uses methods from
various disciplines, such as computer science, linguistics, and data science, to enable computers
to understand human language in both written and spoken forms. By analyzing sentence structure
and grammar, NLP sub-tasks allow computers to “read” text. Common sub-tasks include:

• Summarization: produces a concise, coherent synopsis of a long document’s main points.
• Part-of-Speech (PoS) tagging: assigns a tag to every token in a document based on its
part of speech, for example noun, verb, or adjective.
• Text categorization: also known as text classification; analyzes text documents and
assigns them to predefined topics or categories.
• Sentiment analysis: detects positive or negative sentiment in internal or external data
sources, allowing you to track changes in customer attitudes over time.

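As a rough illustration of the sentiment analysis sub-task, the sketch below scores a text by counting words from two small word lists. This lexicon-based approach is only one simple way to do it and is not taken from the original notes; the word lists `POSITIVE` and `NEGATIVE` and the function name `sentiment` are hypothetical, chosen for the example.

```python
import re

# Hypothetical toy lexicons, for illustration only; real lexicons are far larger.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def sentiment(text):
    """Label text 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["Great product, I love it",
               "Terrible support and slow delivery",
               "It arrived on Tuesday"]:
    print(review, "->", sentiment(review))
```

A word-counting scorer like this ignores negation ("not good") and context, which is one reason practical systems usually rely on trained models instead.
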
Process of Text Mining:

1. Data Collection: Gathering text from various sources like documents, emails, social
media, and web pages.
2. Preprocessing: Cleaning and preparing the data.
o Tokenization: Splitting text into words or phrases.
o Stopword Removal: Eliminating common words (e.g., "and," "the") that add
little value.
o Stemming/Lemmatization: Reducing words to a base form (a truncated stem for
stemming, a dictionary form for lemmatization).
3. Feature Extraction: Transforming text into a structured format
o Bag of Words: Representing text as a frequency count of words.
o TF-IDF (Term Frequency-Inverse Document Frequency): Weighting words by their
frequency in a document relative to how common they are across the corpus.
4. Modeling: Applying statistical and machine learning methods to identify patterns or
make predictions. Common approaches include:
o Classification: Categorizing text into predefined labels.
o Clustering: Grouping similar texts together.
o Topic Modeling: Discovering abstract topics within a collection of documents.
5. Evaluation: Assessing the performance of the models using metrics such as accuracy,
precision, recall, and F1 score.
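
The preprocessing, feature extraction, and evaluation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the stopword list is a tiny hypothetical one, stemming and the modeling step are left out, the TF-IDF formula shown (term count divided by document length, times the natural log of N / df) is just one common variant, and all function names are chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def preprocess(text):
    """Step 2: lowercase, tokenize on runs of letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(tokens):
    """Step 3: bag of words, mapping each term to its count in one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Step 3: TF-IDF with tf = count / doc length and idf = ln(N / df)."""
    n_docs = len(docs_tokens)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = bag_of_words(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(y_true, y_pred, positive=1):
    """Step 5: precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


docs = ["The cat sat on the mat.",
        "The dog chased the cat.",
        "Spam offers and free prizes!"]
tokens = [preprocess(d) for d in docs]
print(tokens)
print(tf_idf(tokens)[0])
print(precision_recall_f1([1, 0, 1, 1], [1, 1, 0, 1]))
```

With this variant, a term that appears in every document gets a weight of zero, which is how TF-IDF downplays words that do not help tell documents apart.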

Applications of text mining:

1. Sentiment Analysis: Understanding opinions and emotions expressed in text.

2. Information Retrieval: Improving search engines and recommendation systems.

3. Customer Feedback Analysis: Gleaning insights from reviews and comments.

4. Spam Detection: Identifying unwanted messages in email and online platforms.

5. Social Media Monitoring: Analyzing trends and public sentiment.
