ISS - Module 3
Data mining is the process of discovering useful patterns, trends, relationships, and
insights from large datasets using statistical, machine learning, and database
techniques.
🔹 Business:
Customer segmentation
Sales forecasting
Fraud detection
🔹 Healthcare:
Recommendation systems
Customer behavior tracking
Inventory optimization
🔹 Education:
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data Transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge Presentation
Use visualization, reports, and summaries to present results.
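The seven steps above can be sketched end to end on a toy dataset. This is a minimal illustration under stated assumptions, not a reference implementation: the records, field names, age cut-off, and spend threshold are all invented for the example, and the "mining" step is only a group-wise average.

```python
from statistics import mean

# Two hypothetical sources; every record and field name is invented for illustration.
crm = [
    {"id": 1, "age": "34", "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},  # missing age, dropped during cleaning
    {"id": 3, "age": "51", "city": "Pune"},
    {"id": 4, "age": "29", "city": "Delhi"},
]
sales = {1: 1200.0, 3: 300.0, 4: 950.0}  # customer id -> total spend


def clean(rows):
    """1. Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def integrate(rows, spend_by_id):
    """2. Data integration: join CRM rows with spend from the second source."""
    return [{**r, "spend": spend_by_id[r["id"]]} for r in rows if r["id"] in spend_by_id]


def select(rows):
    """3. Data selection: keep only the attributes relevant to the question."""
    return [{"age": r["age"], "spend": r["spend"]} for r in rows]


def transform(rows):
    """4. Data transformation: cast age to int and bin it into two groups."""
    return [
        {"group": "under_40" if int(r["age"]) < 40 else "40_plus", "spend": r["spend"]}
        for r in rows
    ]


def mine(rows):
    """5. Data mining: here, simply the average spend per age group."""
    groups = {}
    for r in rows:
        groups.setdefault(r["group"], []).append(r["spend"])
    return {g: mean(v) for g, v in groups.items()}


def evaluate(patterns, min_spend=500.0):
    """6. Pattern evaluation: keep groups whose average spend meets the threshold."""
    return {g: s for g, s in patterns.items() if s >= min_spend}


def present(patterns):
    """7. Knowledge presentation: format the surviving patterns as report lines."""
    return [f"{g}: average spend {s:.2f}" for g, s in sorted(patterns.items())]


patterns = mine(transform(select(integrate(clean(crm), sales))))
report = present(evaluate(patterns))
print(report)
```

Each function maps to one numbered step, so a real project could swap any stage (for example, replacing `mine` with a clustering algorithm) without touching the others.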
🔷 Predictive Methods:
1. Classification
2. Regression
🔷 Descriptive Methods:
1. Clustering
2. Association Rules
3. Anomaly Detection
✅ Summary Table
Method              Purpose                         Example Algorithm
Classification      Predict categories              Decision Trees, SVM
Regression          Predict numeric values          Linear Regression
Clustering          Group similar records           k-Means, DBSCAN
Association Rules   Discover relationships          Apriori, FP-Growth
Anomaly Detection   Detect rare items or outliers   Isolation Forest
Sequential Pattern  Find ordered patterns           GSP, SPADE
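As one concrete instance of the clustering row in the summary table, the following is a small sketch of k-Means on one-dimensional data. The spend values and the starting centroids are assumptions chosen for the example; production code would normally use a library implementation with randomized initialization.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on 1-D data with given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged, nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend values for six customers.
spend = [10, 12, 11, 50, 52, 51]
centroids, clusters = kmeans_1d(spend, centroids=[10, 50])
print(centroids, clusters)
```

The two groups that emerge correspond to the "group similar records" purpose in the table: low spenders and high spenders are separated without any labels being supplied.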
🔧 Popular Tools:
Data preprocessing
Modeling
Evaluation
Visualization
❌ Common Blunders:
Ignoring data cleaning → leads to biased models.
Overfitting → the model fits the training data too well but performs poorly on
new data.
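Overfitting can be made visible by comparing accuracy on training data with accuracy on unseen data. The sketch below is a contrived illustration: the data points, the two deliberately mislabeled training records, and the cut-off of 5 are all assumptions. A 1-nearest-neighbour model memorizes the training set, noise included, while a simple threshold rule ignores the noise.

```python
# Hypothetical data: the true rule is "label 1 when x >= 5".
# Two training labels, (3, 1) and (7, 0), are deliberately wrong to simulate noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
test = [(1.5, 0), (2.6, 0), (4.4, 0), (5.6, 1), (7.4, 1), (8.5, 1)]


def nearest_neighbour(x):
    """Predict the label of the single closest training point (memorizes the data)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def threshold_rule(x, cutoff=5):
    """A much simpler model: one cut-off, no memorization."""
    return 1 if x >= cutoff else 0


def accuracy(predict, data):
    """Fraction of records whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in data) / len(data)


results = {
    "1-NN": (accuracy(nearest_neighbour, train), accuracy(nearest_neighbour, test)),
    "threshold": (accuracy(threshold_rule, train), accuracy(threshold_rule, test)),
}
for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
```

The memorizing model scores perfectly on the training data but loses accuracy on the test points that fall next to the noisy records, whereas the simpler rule scores lower on training data and higher on new data.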
Artificial neural networks (ANNs) are computing systems inspired by the human brain
that learn patterns from data, especially non-linear and complex relationships.
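To see why layers of neurons can represent non-linear relationships, consider XOR, which no single linear threshold can compute. The sketch below is only an illustration: the weights and biases are set by hand rather than learned, and the step activation is an assumption made to keep the arithmetic easy to follow. A trained network would arrive at its weights through an optimization procedure such as backpropagation.

```python
def step(z):
    """Step activation: the neuron fires (1) when its weighted input is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(x1, x2):
    """A 2-2-1 network with hand-picked (not learned) weights that computes XOR."""
    h_or = neuron([x1, x2], [1, 1], -0.5)   # hidden unit behaving like OR
    h_and = neuron([x1, x2], [1, 1], -1.5)  # hidden unit behaving like AND
    # Output fires when OR is true and AND is false, which is exactly XOR.
    return neuron([h_or, h_and], [1, -1], -0.5)


outputs = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

The hidden layer is what makes the difference: each hidden neuron draws one straight boundary, and the output neuron combines them into a region that no single boundary could describe.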
🧠 Key Features:
Fraud detection
Credit scoring
Medical diagnosis
✅ Advantages:
❌ Limitations:
Requires large datasets.
Computationally intensive.
✅ 4. Text Mining
📌 Definition:
🔧 Techniques:
🧠 Applications:
Document classification
Spam detection
Chatbot intelligence
✅ 5. Web Mining
📌 Definition:
Web mining refers to discovering patterns from the World Wide Web, including web
content, structure, and usage.
🌐 Types:
Web Content Mining:
🧠 Applications:
E-commerce personalization
SEO optimization
✅ 1. Data Warehousing
📌 Definition:
A Data Warehouse is a centralized repository that stores data from multiple sources
in a structured, organized, and subject-oriented manner to support decision-making
and business intelligence.
Integrated: Combines data from different sources (databases, flat files, etc.)
Component           Description
Source Systems      OLTP databases, CRM, ERP, etc.
ETL Tools           Extract, Transform, Load – clean and integrate data
Data Staging Area   Temporary storage for processing
Data Warehouse DB   Central data storage system (SQL Server, Oracle)
Metadata            Data about the data (structure, origin, usage)
Data Marts          Department-specific subsets (e.g., finance mart)
OLAP Tools          Online Analytical Processing – for multidimensional queries
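The flow through these components, from source system through ETL into the warehouse and on to an OLAP-style query, can be sketched in miniature. Everything specific here is an assumption for illustration: the order records, the `fact_sales` table name, and the use of an in-memory SQLite database as a stand-in for a real warehouse such as SQL Server or Oracle.

```python
import sqlite3

# Extract: hypothetical rows as they might arrive from a source (OLTP) system.
raw_orders = [
    {"region": " north ", "product": "Laptop", "amount": "1000.50"},
    {"region": "South", "product": "laptop", "amount": "800"},
    {"region": "NORTH", "product": "Phone", "amount": "499.50"},
    {"region": "South", "product": "Phone", "amount": None},  # rejected below
]


def transform(rows):
    """Transform: normalize text, cast amounts to float, drop incomplete rows."""
    for r in rows:
        if r["amount"] is None:
            continue
        yield (r["region"].strip().title(), r["product"].strip().title(), float(r["amount"]))


# Load: an in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (region TEXT, product TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", transform(raw_orders))
warehouse.commit()

# OLAP-style roll-up: aggregate the sales measure along the region dimension.
sales_by_region = dict(
    warehouse.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
    ).fetchall()
)
print(sales_by_region)
```

Changing the `GROUP BY` column to `product` gives the same measure along a different dimension, which is the basic idea behind multidimensional queries.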
🔍 Benefits:
BPM refers to the set of processes, tools, and methodologies used by organizations
to monitor, measure, and improve performance against strategic goals.
🎯 Objectives of BPM:
Component                Description
Strategic Planning       Define vision, mission, objectives
KPI Definition           Identify measurable performance indicators
Data Collection          Collect data from internal/external sources
Analytics & Reporting    Use tools to evaluate and visualize performance
Performance Monitoring   Track ongoing operations and targets
Feedback & Adjustment    Adjust processes or goals based on analysis
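The KPI definition and performance monitoring components can be illustrated with a small sketch that compares actual values against targets. The KPI names, figures, the attainment formula, and the 90% warning threshold are all hypothetical choices for this example, not a standard prescribed by BPM.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True

    def attainment(self) -> float:
        """Fraction of the target achieved; inverted for 'lower is better' KPIs."""
        if self.higher_is_better:
            return self.actual / self.target
        return self.target / self.actual

    def status(self, warn_at: float = 0.9) -> str:
        """Classify the KPI; warn_at is an assumed threshold, not a standard."""
        a = self.attainment()
        if a >= 1.0:
            return "on track"
        return "at risk" if a >= warn_at else "off track"


# Hypothetical indicators and figures, invented for illustration.
kpis = [
    KPI("Quarterly revenue (k$)", actual=950, target=1000),
    KPI("Customer churn (%)", actual=4.0, target=5.0, higher_is_better=False),
    KPI("On-time delivery (%)", actual=70, target=95),
]

scorecard = {k.name: k.status() for k in kpis}
for name, status in scorecard.items():
    print(f"{name}: {status}")
```

The "off track" entries are where the feedback and adjustment component would come in: either the process changes or the target is revisited.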
Balanced Scorecards