Week 1A - Overview and Introduction of Data Mining

1604C331 Data Mining

Week 1: Overview & Introduction

Odd Semester 2024-2025


Informatics Engineering
Faculty of Engineering | Universitas Surabaya
Overview

Reference Book
Introduction to DATA MINING
Pang Ning Tan, Michael Steinbach, Vipin Kumar
Topics
• Introduction to Data Mining
• Data Exploration
• Classification Analysis
• Association Analysis
• Clustering Analysis
Lesson Plan (1st Half of Semester)
Week  Topic(s)                          Description
1     Overview & Introduction, Data     Lecture, Assignment
2     Data Quality                      Lecture, Assignment
3     Similarity & Distance Measure     Lecture, Assignment
4     Classification: Decision Tree 1   Lecture, Assignment
5     Quiz 1                            Online via ULS (Thu, 26 Sep 2024)
6     Classification: Decision Tree 2   Lecture, Assignment
7     Classification: Naïve Bayes       Lecture, Assignment
      Mid-term Exam
Lesson Plan (2nd Half of Semester)
Week  Topic(s)                          Description
8     Pattern Mining: Apriori 1         Lecture, Assignment
9     Pattern Mining: Apriori 2         Lecture, Assignment
10    Pattern Mining: FP-Growth         Lecture, Assignment
11    Clustering: Introduction, k-Means Lecture, Assignment
12    Quiz 2                            Online via ULS (Thu, 28 Nov 2024)
13    Clustering: Hierarchical          Lecture
14    Clustering: DBSCAN                Lecture
      Final Exam
Grading
Mid-term Grade (NTS: Nilai Tengah Semester) =
20% Assignments +
30% Quiz 1 (QTS: Quiz Tengah Semester) +
50% Mid Exam (UTS: Ujian Tengah Semester)

Final-term Grade (NAS: Nilai Akhir Semester) =
20% Assignments +
30% Quiz 2 (QAS: Quiz Akhir Semester) +
50% Final Exam (UAS: Ujian Akhir Semester)

Final Grade (NA: Nilai Akhir) = 40% Mid-term Grade + 60% Final-term Grade
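
The grading formulas above can be worked through in a few lines of Python. This is only a sketch: the function names and the sample scores are invented for illustration, and it assumes every component is scored on a 0-100 scale and that the final-term grade uses the final exam (UAS).

```python
def term_grade(assignments: float, quiz: float, exam: float) -> float:
    """Weighted term grade: 20% assignments + 30% quiz + 50% exam."""
    return 0.20 * assignments + 0.30 * quiz + 0.50 * exam


def final_grade(mid_term: float, final_term: float) -> float:
    """Final grade (NA): 40% mid-term grade (NTS) + 60% final-term grade (NAS)."""
    return 0.40 * mid_term + 0.60 * final_term


# Hypothetical scores, all on a 0-100 scale.
nts = term_grade(assignments=80, quiz=70, exam=90)
nas = term_grade(assignments=85, quiz=75, exam=80)
na = final_grade(nts, nas)
print(f"NTS = {nts:.1f}, NAS = {nas:.1f}, NA = {na:.1f}")
# prints: NTS = 82.0, NAS = 79.5, NA = 80.5
```

Because the final-term grade carries 60% of the weight, the second half of the semester affects the final grade more than the first half.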

Any form of PLAGIARISM or CHEATING will result in a grade of zero.
Introduction

What is Data Mining?
• The processes or techniques of DISCOVERING INTERESTING
PATTERNS, MODELS, and other kinds of knowledge by analyzing
large datasets, providing insights or enabling fast and accurate
decision making.
• Non-trivial extraction of implicit, previously unknown, and potentially
useful information from data.
• Exploration & analysis, by automatic or semi-automatic means, of
large quantities of data in order to discover meaningful patterns.
• Knowledge mining from data.
Why Data Mining? (1)
• Businesses worldwide generate gigantic datasets, including sales
transactions, stock trading records, product descriptions, sales
promotions, company profiles and performance, and customer
feedback.
• Scientific and engineering practices continuously generate data on
the order of petabytes, from remote sensing to
process measuring, scientific experiments, system performance,
engineering observations, and environment surveillance.
Why Data Mining? (2)
• Biomedical research and health industry generate tremendous
amounts of data from gene sequence machines, biomedical
experiment and research reports, medical reports, patient
monitoring, and medical imaging.
• Search engines that support billions of web searches process tens
of petabytes of data daily.
• Social media tools have become increasingly popular, producing a
tremendous number of texts, pictures, and videos, generating
various kinds of web communities and social networks.
Why Data Mining? (3)
• The explosively growing, widely available, and gigantic body of data
makes our time truly the data age.
• Powerful and versatile tools are badly needed to automatically
uncover information from the tremendous amounts of data and to
transform such data into organized knowledge.
Data Age is Here
• “We are living in the information age” or “We are
actually living in the data age”?
• Terabytes or petabytes of data pour into our
computer networks, the WWW, and various kinds of
devices every day, from business, news agencies,
society, science, engineering, medicine, and almost
every other aspect of daily life.
https://www.splunk.com/en_us/campaigns/data-age.html
• This explosive growth of available data volume is a
result of the computerization of our society and the
fast development of powerful computing, sensing,
and data collection, storage, and publication tools.
• This explosively growing, widely available, and gigantic
body of data makes our time truly the data age.
Data Mining Tasks
• Prediction Methods
Use some variables to predict unknown or future values of other
variables.

• Description Methods
Find human-interpretable patterns that describe the data.
Data Mining Tasks
• Classification (PREDICTIVE)
• Clustering (DESCRIPTIVE)
• Association Rule Discovery (DESCRIPTIVE)
• Sequential Pattern Discovery (DESCRIPTIVE)
• Regression (PREDICTIVE)
• Deviation Detection (PREDICTIVE)
Pattern Discovery Techniques
• Classification:
– Decision Trees, Naïve Bayes, Support Vector Machines
• Clustering:
– k-means, Hierarchical Clustering
• Association Rule Mining:
– Apriori Algorithm
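
To make one of the listed techniques concrete, here is a minimal k-means sketch on one-dimensional data. It is an illustration under stated assumptions, not a production implementation: the function name `k_means_1d`, the fixed starting centroids, and the spending values are all made up for this example.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]  # an empty cluster keeps its centroid
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spending values with two visible groups.
spending = [10, 12, 11, 13, 50, 52, 49, 51]
centroids, clusters = k_means_1d(spending, centroids=[10, 50])
print(centroids)  # [11.5, 50.5]
print(clusters)   # [[10, 12, 11, 13], [50, 52, 49, 51]]
```

The result depends on the starting centroids, so practical implementations usually try several random initializations.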
Data Mining in Summary (Shivam Arora, 2024)
Retail and Marketing
(REAL-WORLD EXAMPLES)

• Customer Segmentation
– Retailers use data mining to segment customers based on purchasing
behavior.
– Example: identifying high-value customers who are likely to buy premium
products.
• Market Basket Analysis
– To understand the purchase behavior of customers by finding
associations between different products.
– Example: if customers frequently buy bread and butter together, a store
might place these items next to each other.
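
The bread-and-butter association can be quantified with two standard measures, support and confidence. The sketch below assumes a tiny set of baskets invented for illustration; the helper names `support` and `confidence` are hypothetical and do not come from any particular library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
print(support(baskets, {"bread", "butter"}))       # 3 of 5 baskets
print(confidence(baskets, {"bread"}, {"butter"}))  # 3 of the 4 bread baskets
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter, which is the kind of evidence a store would use to place the two items together.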
E-commerce
(REAL-WORLD EXAMPLES)

• Recommendation Systems:
– E-commerce platforms (e.g. Amazon, Netflix) use data mining to
recommend products and content to users based on their browsing and
purchase history.
– Example: “Customers who bought this also bought …”
recommendations.
• Dynamic Pricing
– Online retailers use data mining to adjust prices dynamically based on
demand, competition, and customer behavior. This helps in maximizing
sales and profits.
Finance and Banking
(REAL-WORLD EXAMPLES)

• Credit Scoring
– Financial institutions use data mining to assess the creditworthiness of
applicants by analyzing historical data on loan repayments, credit card
usage, and financial transactions.
• Fraud Detection
– Banks use anomaly detection techniques to identify unusual patterns in
transactions that may indicate fraudulent activities.
– Example: a sudden large transaction from a foreign country could trigger
a fraud alert.
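
One simple way to flag an unusually large transaction is a z-score test. The sketch below is an illustration only: real fraud systems use far richer features, and the threshold, the function name `flag_unusual`, and the amounts are assumptions made for this example.

```python
from statistics import mean, stdev


def flag_unusual(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds the threshold.

    Assumes at least two amounts that are not all identical.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical transaction amounts for one account.
history = [42, 38, 45, 40, 39, 41, 44, 43, 37, 950]
print(flag_unusual(history))  # [950]
```

A single extreme value inflates both the mean and the standard deviation, so median-based measures are often preferred for this kind of check.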
Healthcare
(REAL-WORLD EXAMPLES)

• Predictive Analytics
– Healthcare providers use data mining to predict disease outbreaks,
patient admission rates, and the likelihood of patient readmissions. This
helps in allocating resources and improving patient care.
• Personalized Treatment Plans
– Healthcare professionals can develop personalized treatment plans
based on the patient’s medical history, genetics, and lifestyle by
analyzing patient data.
Telecommunications
(REAL-WORLD EXAMPLES)

• Churn Prediction
– Telecom companies use data mining to predict which customers are
likely to switch to a competitor.
– Companies can take proactive measures to retain customers by
understanding the factors leading to churn.
– Customer Churn: the number of customers that stopped using the company’s product
or service during a period of time.

• Network Optimization
– Data mining helps in optimizing network performance by analyzing call
data records and detecting issues like dropped calls and network
congestion.
Manufacturing
(REAL-WORLD EXAMPLES)

• Predictive Maintenance
– Manufacturers use data mining to predict equipment failures before they
occur by analyzing sensor data from machinery.
– This helps in scheduling maintenance and reducing downtime.
• Quality Control
– Data mining is used to identify patterns in production data that lead to
defects, allowing manufacturers to improve product quality and reduce
waste.
Energy and Utilities
(REAL-WORLD EXAMPLES)

• Energy Consumption Forecasting
– Utility companies use data mining to predict energy demand based on
historical consumption patterns, weather data, and other factors.
– This helps in efficient energy production and distribution.
• Smart Grid Management
– Data mining helps in managing smart grids by analyzing data from smart
meters to detect anomalies, optimize energy usage, and prevent
outages.
Social Media & Online Platforms
(REAL-WORLD EXAMPLES)

• Sentiment Analysis
– Companies use data mining to analyze social media posts, reviews, and
comments to gauge public sentiment about their products or services.
– This helps in marketing strategy and brand management.
• User Behavior Analysis
– Social media platforms like Facebook and Twitter use data mining to
understand user behavior, preferences, and engagement patterns, which
helps in improving user experience and targeted advertising.
Sports & Entertainment
(REAL-WORLD EXAMPLES)

• Performance Analysis
– Sports teams use data mining to analyze player performance, injury
patterns, and game strategies.
– This helps in making informed decisions on player selection and game
tactics.
• Audience Engagement
– Entertainment companies use data mining to analyze viewer preferences
and engagement patterns, helping in content creation and personalized
recommendations.
Data Mining Tools
Challenges of Data Mining
• Scalability
• High Dimensionality
• Heterogeneous & Complex Data
• Data Quality
• Data Ownership & Distribution
• Non-traditional Analysis
Data mining:
An essential step in knowledge discovery
• Many people treat data mining as a synonym for another popularly
used term, knowledge discovery from data, or KDD.
• Others view data mining as merely an essential step in the overall
process of knowledge discovery.
Knowledge Discovery Process
• Data Collection
• Data Preparation
– Data cleaning: to remove noise and inconsistent data
– Data integration: where multiple data sources may be combined
– Data transformation: where data are transformed and consolidated into forms appropriate for mining by
performing summary or aggregation operations
– Data selection: where data relevant to the analysis task are retrieved from the database
• Data mining: an essential process where intelligent methods are applied to extract
patterns or construct models
• Pattern/model Evaluation and Interpretation: to identify the truly interesting patterns or
models representing knowledge based on interestingness measures
– Metrics for Evaluation: accuracy, precision, recall, F1-score
– Visualization Techniques: confusion matrix, ROC curve
• Knowledge Presentation: where visualization and knowledge representation techniques are
used to present mined knowledge to users
– Tools for Presenting Data: Tableau, Power BI
– Effective Data Visualization
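
The evaluation metrics named above can all be derived from the four counts of a binary confusion matrix. This is a minimal sketch with made-up counts; the function name `evaluate` is hypothetical, and it assumes none of the denominators is zero.

```python
def evaluate(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the predicted positives, how many are correct
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Hypothetical counts for a binary classifier.
acc, prec, rec, f1 = evaluate(tp=40, fp=10, fn=20, tn=30)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
# prints: accuracy=0.700 precision=0.800 recall=0.667 f1=0.727
```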
KDD Process:
A typical view from ML & Statistics

Input Data → Data Pre-Processing → Data Mining → Post-Processing

• Data Pre-Processing
– Data integration
– Normalization
– Feature selection
– Dimension reduction
• Data Mining
– Pattern discovery
– Association & correlation
– Classification
– Clustering
– Outlier analysis
– …
• Post-Processing
– Pattern evaluation
– Pattern selection
– Pattern interpretation
– Pattern visualization
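
Normalization, one of the pre-processing steps listed above, can be illustrated with min-max scaling. This is a sketch of one common variant only; the function name `min_max_normalize` and the sample ages are assumptions made for this example.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values.
ages = [20, 30, 40, 60]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]
```

Putting attributes on a common scale keeps one attribute with a large range from dominating distance-based methods such as k-means.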
A Brief History of Data Mining Society
• 1989 IJCAI Workshop on Knowledge Discovery in Databases
– Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
• 1991-1994 Workshops on Knowledge Discovery in Databases
– Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth,
and R. Uthurusamy, 1996)
• 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining
(KDD’95-98)
– Journal of Data Mining and Knowledge Discovery (1997)
• ACM SIGKDD conferences since 1998 and SIGKDD Explorations
• More conferences on data mining
– PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
• ACM Transactions on KDD starting in 2007
Conferences and Journals on Data Mining
• KDD Conferences
– ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
– SIAM Data Mining Conf. (SDM)
– (IEEE) Int. Conf. on Data Mining (ICDM)
– European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
– Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
– Int. Conf. on Web Search and Data Mining (WSDM)
• Other related conferences
– DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT, …
– Web and IR conferences: WWW, SIGIR, WSDM
– ML conferences: ICML, NIPS
– PR conferences: CVPR
• Journals
– Data Mining and Knowledge Discovery (DAMI or DMKD)
– IEEE Trans. on Knowledge and Data Eng. (TKDE)
– KDD Explorations
– ACM Trans. on KDD
Where to Find References?
• Data mining and KDD (SIGKDD: CDROM)
– Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
– Journal: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD
• Database systems (SIGMOD: ACM SIGMOD Anthology —CD ROM)
– Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
– Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.
• AI & Machine Learning
– Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
– Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
• Web and IR
– Conferences: SIGIR, WWW, CIKM, etc.
– Journals: WWW: Internet and Web Information Systems
• Statistics
– Conferences: Joint Stat. Meeting, etc.
– Journals: Annals of statistics, etc.
• Visualization
– Conference proceedings: CHI, ACM-SIGGraph, etc.
– Journals: IEEE Trans. visualization and computer graphics, etc.
Recommended Reference Books
• S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002
• R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000
• T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003
• U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996
• U. Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001
• J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd ed., 2011
• D. J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001
• T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag, 2009
• B. Liu, Web Data Mining, Springer, 2006
• T. M. Mitchell, Machine Learning, McGraw Hill, 1997
• G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991
• P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Wiley, 2005
• S. M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann, 1998
• I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2nd ed., 2005
Questions?
