01-Introduction To Data Mining

Data mining involves extracting useful patterns from large amounts of data through techniques like classification, clustering, and association rule mining. It draws from multiple disciplines like machine learning, statistics, and database systems to analyze vast, complex datasets. The goals of data mining include prediction, description, and discovering hidden patterns in data to help organizations make better decisions.

Uploaded by

Ku Ha Ku


Introduction to Data Mining

M. Tanzil Furqon, S.Kom., MCompSc.

What Is Data Mining?
• Data mining (knowledge discovery from data)
▫ Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amounts of data
▫ Data mining: a misnomer?
• Alternative names
▫ Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging/cleaning, information harvesting, business
intelligence, etc.
• Watch out: Is everything “data mining”?
▫ Simple search and query processing
▫ (Deductive) expert systems
Why Mine the Data?
• Lots of data is being collected and warehoused
▫ Web data, e-commerce
▫ Purchases at department/grocery stores
▫ Bank/credit card transactions
• Competitive pressure is strong
▫ Provide better, customized services for an edge (e.g.,
in Customer Relationship Management), for example in the
automobile industry (Mitsubishi Xpander)
Why Mine the Data? (contd.)
• Data is collected and stored at enormous speeds (GB/hour)
▫ remote sensors on a satellite
▫ telescopes scanning the skies
▫ microarrays generating gene expression data
▫ scientific simulations generating terabytes of data
• Traditional techniques are infeasible for raw data
• Data mining may help scientists
▫ in classifying and segmenting data
▫ in hypothesis formation
Why Mine the Data? (contd.)
• There is often information “hidden” in the data that is
not readily evident
• Human analysts may take weeks to discover useful
information
• Much of the data is never analyzed at all
Definition of Data Mining
• Non-trivial extraction of implicit, previously unknown
and potentially useful information from data
• Exploration & analysis, by automatic or semi-automatic
means, of large quantities of data in order to discover
meaningful patterns
Data Mining in Business Intelligence
(Figure: layered pyramid; the potential to support business decisions increases toward the top. Layers from top to bottom, with the typical role in parentheses:)
• Decision Making (End User)
• Data Presentation: Visualization Techniques (Business Analyst)
• Data Mining: Information Discovery (Data Analyst)
• Data Exploration: Statistical Summary, Querying, and Reporting (Data Analyst)
• Data Preprocessing/Integration, Data Warehouses (DBA)
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)
KDD Process: A Typical View from ML and Statistics
Input Data → Data Pre-Processing → Data Mining → Post-Processing
• Data pre-processing: data integration, normalization, feature selection, dimension reduction
• Data mining: pattern discovery, association & correlation, classification, clustering, outlier analysis, …
• Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
• This is a view from typical machine learning and statistics communities
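Normalization appears above as a pre-processing step, but the slide does not say which kind. As a minimal sketch, assuming min-max rescaling (one common choice), the function below maps a numeric attribute into [0, 1] before mining. The name `min_max_normalize` is hypothetical; the sample values are the taxable incomes (in K) from the classification example in these slides.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Taxable incomes (in K) from the classification training set.
incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
normalized = min_max_normalize(incomes)
print([round(x, 3) for x in normalized])
```

Rescaling matters for distance-based methods such as clustering: without it, an attribute measured in large units (income) would dominate attributes measured in small ones.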

Data Mining Vs Non Data Mining
• Non data mining (examples)
▫ Look up a phone number in a phone directory
▫ Query a Web search engine for information about “Amazon”
• Data mining (examples)
▫ Certain names are more prevalent in certain US locations
(O’Brien, O’Rourke, O’Reilly… in the Boston area)
▫ Group together similar documents returned by a search engine
according to their context (e.g., Amazon rainforest, Amazon.com)
Data Mining Vs Database
• A DB user knows what they are looking for.
• A DM user might or might not know what they are looking for.
• A DB’s answer to a query is 100% accurate, if the data are correct.
• DM’s effort is to get the answer as accurate as possible.
• DB data are retrieved as stored.
• DM data need to be cleaned (somewhat) before
producing results.
• DB results are a subset of the data.
• DM results are the analysis of the data.
• The meaningfulness of the results is not a concern for a
database, but it is the main issue in data mining.
Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
▫ Object-oriented and object-relational databases
▫ Spatial databases
▫ Time-series data and temporal data
▫ Text databases and multimedia databases
▫ Heterogeneous and legacy databases
▫ WWW
Origins of Data Mining
• Draws ideas from machine learning/AI, pattern
recognition, statistics, and database systems
• Traditional techniques may be unsuitable due to
▫ Enormity of data
▫ High dimensionality of data
▫ Heterogeneous, distributed nature of data
(Figure: Venn diagram with Data Mining at the intersection of Statistics, Machine Learning/AI/Pattern Recognition, and Database systems)
Data Mining: Confluence of Multiple Disciplines
(Figure: Data Mining at the center, surrounded by Machine Learning, Pattern Recognition, Statistics, Visualization, High-Performance Computing, Database Technology, Algorithm, and Applications)
Why Confluence of Multiple Disciplines?
• Tremendous amount of data
▫ Algorithms must be scalable to handle big data
• High dimensionality of data
▫ Microarray data may have tens of thousands of dimensions
• High complexity of data
▫ Data streams and sensor data
▫ Time-series data, temporal data, sequence data
▫ Structured data, graphs, social and information networks
▫ Spatial, spatiotemporal, multimedia, text and Web data
▫ Software programs, scientific simulations
• New and sophisticated applications
Data mining is supported by three sufficiently mature technologies:
• Massive data collections
▫ Commercial databases (using high-performance engines)
are growing at exceptional rates
• Powerful multiprocessor computers
▫ Cost-effective parallel multiprocessor computer technology
• Data mining algorithms
▫ Under development for decades, in research areas such as
statistics, artificial intelligence, and machine learning,
but now implemented as mature, reliable, understandable
tools that consistently outperform older statistical methods
Data Mining Tasks
• Prediction Methods
▫ Use some variables to predict unknown or future
values of other variables.

• Description Methods
▫ Find human-interpretable patterns that describe
the data.
Data Mining Tasks (contd.)
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
1. Classification (Definition)
• Given a collection of records (training set)
▫ Each record contains a set of attributes; one of the
attributes is the class.
• Find a model for the class attribute as a function of
the values of the other attributes.
• Goal: previously unseen records should be
assigned a class as accurately as possible.
Classification (example) (contd.)
Training set (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test set:
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

Training set → learn classifier → model; the model is then applied to the test set.
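The slide does not say which algorithm learns the model. As a sketch only, the code below hard-codes one decision tree (an assumption, not the output of a learner) that classifies all ten training records correctly, measures its training accuracy, and then applies it to the six test records. The function name `classify` is hypothetical; an induction algorithm such as ID3, C4.5, or CART would build a tree like this automatically from the training set.

```python
# Training set from the slide: (Refund, Marital Status, Taxable Income in K, Cheat)
TRAINING = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Test set from the slide: the class (Cheat) is unknown.
TEST = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]


def classify(refund, marital, income):
    """Hypothetical hand-written decision tree (not induced by an algorithm here)."""
    if refund == "Yes":
        return "No"
    if marital == "Married":
        return "No"
    # Remaining records: Single or Divorced with no refund; split on income.
    return "No" if income < 80 else "Yes"


train_acc = sum(classify(r, m, i) == y for r, m, i, y in TRAINING) / len(TRAINING)
predictions = [classify(*record) for record in TEST]
print(train_acc, predictions)
```

A perfect training accuracy says nothing about accuracy on unseen records; in practice a held-out test set with known labels is used to estimate that.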
Classification (application -1)
• Direct Marketing
▫ Goal: Reduce cost of mailing by targeting a set of
consumers likely to buy a new cell-phone product.
▫ Approach:
Use the data for a similar product introduced before.
We know which customers decided to buy and which decided
otherwise. This {buy, don’t buy} decision forms the class
attribute.
Collect various demographic, lifestyle, and company-
interaction related information about all such customers.
Type of business, where they stay, how much they earn, etc.
Use this information as input attributes to learn a classifier
model.
Classification (application-2)
• Fraud Detection
▫ Goal: Predict fraudulent cases in credit card
transactions.
▫ Approach:
Use credit card transactions and the information on the
account holder as attributes.
When a customer buys, what they buy, how often they pay on
time, etc.
Label past transactions as fraud or fair transactions. This
forms the class attribute.
Learn a model for the class of the transactions.
Use this model to detect fraud by observing credit card
transactions on an account.
Classification (application-3)
• Customer Attrition/Churn:
▫ Goal: To predict whether a customer is likely to be
lost to a competitor.
▫ Approach:
Use a detailed record of transactions with each of the
past and present customers to find attributes.
How often the customer calls, where they call, what
time of day they call most, their financial status,
marital status, etc.
Label the customers as loyal or disloyal.
Find a model for loyalty.
2. Clustering (definition)
• Given a set of data points, each having a set of
attributes, and a similarity measure among
them, find clusters such that
▫ Data points in one cluster are more similar to one
another.
▫ Data points in separate clusters are less similar to
one another.
• Similarity Measures:
▫ Euclidean Distance
▫ Cosine similarity, etc.
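The two similarity measures listed above can be sketched in a few lines of Python. This is a minimal illustration assuming points are equal-length numeric sequences; the function names are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_similarity(p, q):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)


print(euclidean_distance([0, 0], [3, 4]))  # → 5.0
print(cosine_similarity([1, 0], [0, 1]))   # → 0.0
print(cosine_similarity([1, 2], [2, 4]))   # close to 1.0: same direction, different length
```

Euclidean distance is sensitive to magnitude, whereas cosine similarity depends only on direction, which is why cosine is often preferred for comparing documents of different lengths.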
Illustration of clustering
(Figure: Euclidean-distance-based clustering in 3-D space; intracluster distances are minimized, intercluster distances are maximized)
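The slide does not name a clustering algorithm. One widely used technique that minimizes intracluster (Euclidean) distances is k-means (Lloyd's algorithm); the sketch below is that technique, not necessarily the one behind the figure. For a deterministic result, centroids are seeded with a farthest-first heuristic, which is an assumption (random or k-means++ seeding is more common). Function names and the sample points are hypothetical.

```python
import math


def farthest_first_seeds(points, k):
    """Pick k initial centroids: start at points[0], then repeatedly take the
    point farthest from all centroids chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(points, k, iterations=100):
    """Lloyd's k-means: returns (labels, centroids)."""
    centroids = farthest_first_seeds(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical 3-D points forming two well-separated groups.
points = [
    (1.0, 1.0, 1.0), (1.2, 0.8, 1.1), (0.9, 1.1, 1.0),
    (8.0, 8.0, 8.0), (8.2, 7.9, 8.1), (7.8, 8.1, 8.0),
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Each iteration can only lower the total squared distance of points to their centroids, so the loop converges, though possibly to a local rather than global optimum.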
Clustering (application -1)
• Market Segmentation:
▫ Goal: subdivide a market into distinct subsets of
customers where any subset may conceivably be
selected as a market target to be reached with a
distinct marketing mix.
▫ Approach:
Collect different attributes of customers based on their
geographical and lifestyle related information.
Find clusters of similar customers.
Measure the clustering quality by observing buying patterns
of customers in same cluster vs. those from different clusters.
Clustering (application-2)
• Document Clustering:
▫ Goal: To find groups of documents that are
similar to each other based on the important
terms appearing in them.
▫ Approach: To identify frequently occurring terms
in each document. Form a similarity measure
based on the frequencies of different terms. Use it
to cluster.
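The approach above (term frequencies, then a similarity measure) can be sketched as follows. This assumes raw term counts and cosine similarity; real systems often weight terms with TF-IDF and remove stop words. The function names and the three sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase word occurs in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(tf1, tf2):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(tf1[t] * tf2[t] for t in tf1.keys() & tf2.keys())
    n1 = math.sqrt(sum(v * v for v in tf1.values()))
    n2 = math.sqrt(sum(v * v for v in tf2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


docs = [
    "data mining finds patterns in large data",
    "mining large data for hidden patterns",
    "the rainforest is home to many species",
]
tfs = [term_frequencies(d) for d in docs]
sim_01 = cosine(tfs[0], tfs[1])  # documents sharing several terms
sim_02 = cosine(tfs[0], tfs[2])  # documents with no terms in common
print(round(sim_01, 3), sim_02)
```

The resulting pairwise similarities can then be fed to any clustering algorithm that accepts a similarity or distance measure.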
3. Association Rule Discovery (definition)
• Given a set of records, each of which contains some
number of items from a given collection,
▫ produce dependency rules that will predict the
occurrence of an item based on occurrences of other
items.

TID  Items
1    Bread, Coke, Milk
2    Coffee, Bread
3    Coffee, Coke, Diaper, Milk
4    Coffee, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Coffee}
Association Rule Discovery (definition) (contd.)
• A rule must have some minimum user-specified
confidence and support
• Support: proportion of transactions in the data
set that contain the itemset
• Confidence(X → Y) = Sup(X ∪ Y) / Sup(X)
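The support and confidence definitions above can be checked on the five transactions of the previous slide. A minimal sketch, assuming itemsets are Python sets; `Fraction` keeps the results exact. The function names are hypothetical.

```python
from fractions import Fraction

# The five transactions from the association rule example.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Coffee", "Bread"},
    {"Coffee", "Coke", "Diaper", "Milk"},
    {"Coffee", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Confidence(X -> Y) = Sup(X ∪ Y) / Sup(X); raises ZeroDivisionError if Sup(X) = 0."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


print(confidence({"Milk"}, {"Coke"}, TRANSACTIONS))             # → 3/4
print(confidence({"Diaper", "Milk"}, {"Coffee"}, TRANSACTIONS))  # → 2/3
```

For example, {Milk} appears in 4 of 5 transactions and {Milk, Coke} in 3 of 5, so Confidence({Milk} → {Coke}) = (3/5) / (4/5) = 3/4.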
Association Rule (application)
• Marketing and Sales Promotion:
▫ Let the rule discovered be
{Bagels, … } --> {Potato Chips}
▫ Potato Chips as consequent => Can be used to
determine what should be done to boost its sales.
▫ Bagels in the antecedent => Can be used to see which
products would be affected if the store discontinues
selling bagels.
▫ Bagels in antecedent and Potato chips in consequent
=> Can be used to see what products should be sold
with Bagels to promote sale of Potato chips!
Association Rule (application-2)
• Supermarket shelf management.
▫ Goal: To identify items that are bought together
by sufficiently many customers.
▫ Approach: Process the point-of-sale data collected
with barcode scanners to find dependencies
among items.
▫ A classic rule --
If a customer buys diaper and milk, then he is very
likely to buy tea.
So, don’t be surprised if you find six-packs stacked
next to diapers!
4. Sequential Pattern Discovery
• Given a set of objects, each associated with its own timeline of
events, find rules that predict strong sequential dependencies among
different events, e.g., the sequence pattern (A B) (C) (D E).
• Rules are formed by first discovering patterns. Event occurrences in the
patterns are governed by timing constraints.
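A basic building block of sequential pattern discovery is testing whether an object's timeline contains a candidate pattern such as (A B) (C) (D E): each pattern element (a set of events occurring together) must be a subset of a distinct timeline element, in the same order. The sketch below is a minimal version of that test; it deliberately ignores the timing constraints mentioned above (e.g., maximum gap), and the name `contains` is hypothetical.

```python
def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both arguments are lists of sets (events that occur together). Timing
    constraints are ignored in this sketch.
    """
    i = 0  # index of the next pattern element still to be matched
    for element in sequence:
        # Greedily match the next pattern element at the earliest possible position.
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


# Hypothetical timeline for one object: events grouped by time stamp.
timeline = [{"A", "B", "F"}, {"C"}, {"G"}, {"D", "E"}]
print(contains(timeline, [{"A", "B"}, {"C"}, {"D", "E"}]))  # → True
print(contains(timeline, [{"C"}, {"A"}]))                   # → False (wrong order)
```

Counting how many objects' timelines contain a pattern gives its support, in the same way as for itemsets in association rule discovery.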
Sequential Pattern (application)
• In telecommunications alarm logs,
▫ (Inverter_Problem Excessive_Line_Current)
(Rectifier_Alarm) --> (Fire_Alarm)
• In point-of-sale transaction sequences,
▫ Computer Bookstore:
(Intro_To_Visual_C) (C++_Primer) -->
(Perl_for_dummies,Tcl_Tk)
▫ Athletic Apparel Store:
(Shoes) (Racket, Racketball) --> (Sports_Jacket)
5. Regression
• Predict a value of a given continuous valued variable
based on the values of other variables, assuming a linear
or nonlinear model of dependency.
• Extensively studied in the statistics and neural network fields.
• Examples:
▫ Predicting sales amounts of new product based on
advertising expenditure.
▫ Predicting wind velocities as a function of temperature,
humidity, air pressure, etc.
▫ Time series prediction of stock market indices.
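For the linear case, the first example above (sales vs. advertising expenditure) can be sketched with ordinary least squares on one input variable. The data values and units below are made up for illustration, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising expenditure vs. sales amount (arbitrary units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 6.0 + intercept  # predicted sales for a new spend of 6.0
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Here the fitted line is sales ≈ 1.97 × spend + 1.09. Nonlinear dependencies call for other models (e.g., polynomial regression or neural networks).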
Applications of Data Mining
• Web page analysis: from web page classification, clustering to
PageRank & HITS algorithms
• Collaborative analysis & recommender systems
• Basket data analysis to targeted marketing
• Biological and medical data analysis: classification, cluster analysis
(microarray data analysis), biological sequence analysis, biological
network analysis
• Data mining and software engineering
• From major dedicated data mining systems/tools (e.g., SAS, MS SQL-
Server Analysis Manager, Oracle Data Mining Tools) to invisible data
mining
Major Issues in Data Mining (1)
• Mining Methodology
▫ Mining various and new kinds of knowledge
▫ Mining knowledge in multi-dimensional space
▫ Data mining: An interdisciplinary effort
▫ Boosting the power of discovery in a networked environment
▫ Handling noise, uncertainty, and incompleteness of data
▫ Pattern evaluation and pattern- or constraint-guided mining
• User Interaction
▫ Interactive mining
▫ Incorporation of background knowledge
▫ Presentation and visualization of data mining results
Major Issues in Data Mining (2)

• Efficiency and Scalability

▫ Efficiency and scalability of data mining algorithms
▫ Parallel, distributed, stream, and incremental mining methods
• Diversity of data types
▫ Handling complex types of data
▫ Mining dynamic, networked, and global data repositories
• Data mining and society
▫ Social impacts of data mining
▫ Privacy-preserving data mining
▫ Invisible data mining
A Brief History of Data Mining Society
• 1989 IJCAI Workshop on Knowledge Discovery in Databases
▫ Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
• 1991-1994 Workshops on Knowledge Discovery in Databases
▫ Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-
Shapiro, P. Smyth, and R. Uthurusamy, 1996)
• 1995-1998 International Conferences on Knowledge Discovery in Databases and Data
Mining (KDD’95-98)
▫ Journal of Data Mining and Knowledge Discovery (1997)
• ACM SIGKDD conferences since 1998 and SIGKDD Explorations
• More conferences on data mining
▫ PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001),
WSDM (2008), etc.
• ACM Transactions on KDD (2007)
Conferences and Journals on Data Mining
• KDD Conferences
▫ ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
▫ SIAM Data Mining Conf. (SDM)
▫ (IEEE) Int. Conf. on Data Mining (ICDM)
▫ European Conf. on Machine Learning and Principles and practices of Knowledge Discovery and Data Mining (ECML-PKDD)
▫ Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
▫ Int. Conf. on Web Search and Data Mining (WSDM)
• Other related conferences
▫ DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT, …
▫ Web and IR conferences: WWW, SIGIR, WSDM
▫ ML conferences: ICML, NIPS
▫ PR conferences: CVPR
• Journals
▫ Data Mining and Knowledge Discovery (DAMI or DMKD)
▫ IEEE Trans. On Knowledge and Data Eng. (TKDE)
▫ KDD Explorations
▫ ACM Trans. on KDD
Where to Find References? DBLP, CiteSeer, Google
• Data mining and KDD (SIGKDD: CDROM)
▫ Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
▫ Journal: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD
• Database systems (SIGMOD: ACM SIGMOD Anthology—CD ROM)
▫ Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
▫ Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.
• AI & Machine Learning
▫ Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
▫ Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI,
etc.
• Web and IR
▫ Conferences: SIGIR, WWW, CIKM, etc.
▫ Journals: WWW: Internet and Web Information Systems,
• Statistics
▫ Conferences: Joint Stat. Meeting, etc.
▫ Journals: Annals of statistics, etc.
• Visualization
▫ Conference proceedings: CHI, ACM-SIGGraph, etc.
▫ Journals: IEEE Trans. visualization and computer graphics, etc.
Data Mining at GOJEK
• How GOJEK Uses Its Users’ Big Data for Business
▫ uses big data through a data science approach
▫ makes various real-time decisions using techniques
such as machine learning, artificial intelligence (AI),
and natural language processing.
Data Science Implementation at GOJEK
• GOJEK applies data science to almost all of its business
and operational processes.
• The implementation is not only for users; the data of
driver partners and merchants are analyzed as well.
Data Science Implementation at GOJEK
• “The driver allocation system is now much better thanks to
machine learning. Previously, an incoming order was always
allocated to the nearest driver. Now, various other
considerations are taken into account as well. As a result,
the pick-up rate is faster and the cancellation rate has
decreased.” (Syafrie, VP of Data Science GOJEK)
“There are drivers who look for orders in the same direction
as their way home. There are also those who like taking
short-distance orders. By using machine learning to predict
that, we can adjust things so that both customers and
drivers are satisfied.”
Talent recruitment is still the main challenge
