Lecture 1
Mining
• Enhances customer relationship management (CRM).
• Essential for AI-driven applications.
Applications of Data Mining
• Business: Market basket analysis, customer segmentation
• Finance: Fraud detection, credit scoring
• Social Media: Sentiment analysis, recommendation systems
Data Mining vs. Knowledge Discovery in Databases (KDD)
• Data Mining: A step in the KDD process that focuses on extracting patterns from data.
KDD Steps:
Data Cleaning → Data Integration → Data Selection → Data Transformation → Data Mining → Pattern Evaluation → Knowledge Presentation
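The seven KDD stages run in a fixed order, and each one consumes the output of the previous stage. The minimal sketch below makes that order concrete by chaining one function per stage over tiny made-up records. Everything in it is a hypothetical illustration rather than part of the lecture: the CRM and SALES tables, the function names, the age buckets and the min_count threshold.

```python
from collections import Counter

# Hypothetical raw sources (illustration only): a CRM table and a sales table
CRM = [
    {"id": 1, "age": 25},
    {"id": 1, "age": 25},    # exact duplicate
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 30},
]
SALES = [
    {"id": 1, "category": "Electronics", "amount": 200},
    {"id": 1, "category": "Electronics", "amount": 150},
    {"id": 3, "category": "Grocery", "amount": 120},
]


def clean(rows):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    out = []
    for r in rows:
        if None not in r.values() and r not in out:
            out.append(r)
    return out


def integrate(left, right, key="id"):
    """Data integration: inner-join two sources on a shared key."""
    return [{**a, **b} for a in left for b in right if a[key] == b[key]]


def select(rows, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in rows]


def transform(rows):
    """Data transformation: bucket age into two coarse groups (assumed rule)."""
    return [{**r, "age": "under 30" if r["age"] < 30 else "30+"} for r in rows]


def mine(rows):
    """Data mining: count (age group, category) co-occurrences as candidate patterns."""
    return Counter((r["age"], r["category"]) for r in rows)


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep patterns seen at least min_count times (assumed threshold)."""
    return {p: c for p, c in patterns.items() if c >= min_count}


def present(patterns):
    """Knowledge presentation: render surviving patterns as readable statements."""
    return [f"age {a} -> buys {cat} ({c} purchases)" for (a, cat), c in sorted(patterns.items())]


rows = transform(select(integrate(clean(CRM), clean(SALES)), ["age", "category"]))
knowledge = present(evaluate(mine(rows)))
print(knowledge)
```

Running this prints a single surviving pattern for customers under 30 buying Electronics. Keeping each stage as a separate function mirrors the diagram and lets any single stage be swapped out, for example replacing the counting step with clustering.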
KDD Process Example
• Dataset: Customer Purchase Behavior
Customer_ID  Age  Income  Purchase_Amount  Category
1            25   30000   200              Electronics
2            40   50000   350              Clothing
3            30   45000   120              Grocery
4            22   27000   400              Electronics
5            35   60000   150              Grocery
• KDD Steps and Example Output
o Selection → Extract relevant features (Age, Income, Purchase_Amount).
o Preprocessing → Handle missing values, remove duplicates.
o Transformation → Normalize Income and Purchase_Amount.
o Data Mining → Apply clustering to find customer segments.
o Interpretation → Identify high-spending customer groups.
• Example Output (Clusters Identified):
o Cluster 1: Young, low-income, high spenders (Electronics)
o Cluster 2: Middle-aged, high-income, moderate spenders (Grocery, Clothing)
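The steps of this example can be sketched in plain Python using only the standard library. The sketch rests on several assumptions. Age is min-max scaled along with Income and Purchase_Amount so that no single feature dominates the Euclidean distance. Clustering is plain k-means with k = 2, seeded deterministically with the two most distant customers, whereas library implementations commonly use random or k-means++ seeding. The function names are hypothetical.

```python
import math

# Dataset from the slide: (Customer_ID, Age, Income, Purchase_Amount, Category)
CUSTOMERS = [
    (1, 25, 30000, 200, "Electronics"),
    (2, 40, 50000, 350, "Clothing"),
    (3, 30, 45000, 120, "Grocery"),
    (4, 22, 27000, 400, "Electronics"),
    (5, 35, 60000, 150, "Grocery"),
]


def select(rows):
    """Selection: keep Customer_ID plus Age, Income, Purchase_Amount."""
    return [(cid, age, income, amount) for cid, age, income, amount, _cat in rows]


def preprocess(rows):
    """Preprocessing: drop rows with missing values and exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        if any(v is None for v in row) or row in seen:
            continue
        seen.add(row)
        clean.append(row)
    return clean


def min_max(values):
    """Scale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi != lo else 0.0 for v in values]


def transform(rows):
    """Transformation: min-max scale each selected feature column."""
    ids = [r[0] for r in rows]
    columns = [min_max([r[j] for r in rows]) for j in range(1, 4)]
    return ids, [list(point) for point in zip(*columns)]


def two_means(points, max_iter=100):
    """Data mining: k-means with k = 2, seeded with the farthest-apart pair (assumption)."""
    pairs = ((a, b) for a in range(len(points)) for b in range(a + 1, len(points)))
    i, j = max(pairs, key=lambda ab: math.dist(points[ab[0]], points[ab[1]]))
    centroids = [points[i], points[j]]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid
        new_labels = [min(range(2), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        new_centroids = []
        for c in range(2):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ever ends up empty
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            )
        centroids = new_centroids
    return labels


def interpret(rows, ids, labels):
    """Interpretation: average the unscaled features per cluster."""
    by_id = {r[0]: r for r in rows}
    summary = {}
    for label in sorted(set(labels)):
        members = [by_id[cid] for cid, lab in zip(ids, labels) if lab == label]
        summary[label] = {
            "customers": [m[0] for m in members],
            "avg_age": sum(m[1] for m in members) / len(members),
            "avg_income": sum(m[2] for m in members) / len(members),
            "avg_purchase": sum(m[3] for m in members) / len(members),
        }
    return summary


selected = preprocess(select(CUSTOMERS))
ids, points = transform(selected)
labels = two_means(points)
summary = interpret(selected, ids, labels)
for label, stats in summary.items():
    print(label, stats)
```

On this data the sketch places customers 1 and 4 in one cluster (average age 23.5, average income 28,500, average purchase 300). It places customers 2, 3 and 5 in the other (average age 35, average income about 51,667, average purchase about 207). These groups are consistent with the two clusters described above. A different seeding rule can converge to a different split on such a small dataset.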
M. Usman Sarwar (Experienced Data Consultant)
Data Mining vs. Machine Learning vs. Statistics
in Data understandable
3 30 750 5000 Yes
4 22 620 15000 No
5 35 720 12000 Yes
• Data Preparation → Handle missing values, scale numerical features.
• Modeling → Apply a Decision Tree Classifier.
• Evaluation → Accuracy: 85%, Confusion Matrix:
• Deployment → Deploy the model for loan approval automation.
• Example Output:
Decision rules from the model:
• If Credit_Score > 700 → Approve Loan.
• If Credit_Score < 650 → Reject Loan.
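The two rules can be applied directly to the three loan rows shown above. This is a hand-written sketch of those rules, not the trained decision tree itself. The table headers are missing from the slide, so the sketch assumes that the third column is Credit_Score and that the last column is the approval outcome. Neither rule covers scores from 650 to 700 inclusive, and the sketch labels them "Undecided".

```python
# Loan rows from the slide. Column meanings are ASSUMED because the headers are
# missing: (customer_id, age, credit_score, loan_amount, approved)
LOANS = [
    (3, 30, 750, 5000, "Yes"),
    (4, 22, 620, 15000, "No"),
    (5, 35, 720, 12000, "Yes"),
]


def loan_decision(credit_score: int) -> str:
    """Apply the two example rules; scores in [650, 700] match neither rule."""
    if credit_score > 700:
        return "Approve"
    if credit_score < 650:
        return "Reject"
    return "Undecided"


decisions = {cid: loan_decision(score) for cid, _age, score, _amount, _approved in LOANS}
print(decisions)  # {3: 'Approve', 4: 'Reject', 5: 'Approve'}

# Compare the rule output with the assumed Yes/No outcome column
expected = {"Yes": "Approve", "No": "Reject"}
agree = sum(decisions[cid] == expected[approved] for cid, *_, approved in LOANS)
print(f"{agree}/{len(LOANS)} rows agree with the rules")
```

On these three rows the rules reproduce the Yes/No column. Agreement on three rows is only a sanity check, not an accuracy estimate.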
• What is SEMMA?
o A methodology developed by SAS for data mining.
o Focuses on the technical aspects of data mining.
o SEMMA: Sample, Explore, Modify, Model, Assess.
Data Mining Tools
• Programming-based tools: Require coding knowledge.
Popular Open-
o Python-based, visual programming tool.
o Great for beginners and interactive
Based Tools
and research use.
Useful Links
and-crisp-dm-fe9d03d3ab6c
• https://www.geeksforgeeks.org/kdd-process-in-data-mining/