
1.1 Introduction To Data Mining

The document introduces data mining by defining it, discussing its significance, and outlining the key views and stages in the data mining process. Data mining involves discovering useful patterns from large datasets and can be applied across many domains to gain insights and make informed decisions.

Uploaded by

Đạt Nguyễn

Chapter 1: Introduction to Data Mining

Objective
● Identify different views of data mining; understand the key issues in data mining
● Introduction to Data Mining:
○ Definition and significance of data mining.
○ Extracting valuable insights from large datasets.
● Data Mining Pipeline:
○ A systematic process for discovering knowledge from data.
○ Four key views: Data, Technique, Knowledge, Application.

Into the Digital Era
● People’s daily lives
○ More than 4 billion internet users
○ Social media, smart devices, …
● Scientific discovery
○ Rubin Observatory: about 20 TB of data per night
● Many application domains…

Why Data Mining?
● Explosive data growth
○ KB, MB, GB, TB, PB, EB, ZB
○ Data creation, transmission, storage, sharing, processing
● Drowning in data but starving for knowledge
● Need automated analysis of massive data

What is Data Mining?
● Knowledge discovery from data
○ Extraction of interesting patterns or knowledge from huge amounts of data
○ Interesting: valid, previously unknown, potentially useful, ultimately understandable by humans
○ Huge amounts of data: methods must be scalable and efficient

The Four Views of Data Mining
● The data mining process encompasses four key views:
○ Data View: Understanding the dataset and its attributes.
○ Technique View: Techniques for discovering patterns and relationships.
○ Knowledge View: Interpreting and evaluating discovered knowledge.
○ Application View: Applying the mined knowledge to achieve business goals.
● These views collectively guide us through the data mining pipeline.

Data View
● The 3Vs, 4Vs, 5Vs
○ 3Vs: Volume, Variety, Velocity
○ 4Vs: adds Veracity
○ 5Vs: adds Value

● Relational, transactional data
○ E.g., student records, bank accounts, store purchases
● Sequential, temporal, streaming data
○ E.g., gene sequences, stock prices, sensor readings
● Spatial, spatial-temporal data
○ E.g., land use, bird migration, traffic conditions
● Text, multimedia, Web data
○ E.g., news articles, audio/video/image, hypertext
● Graph, network data
○ E.g., social network, power grid, co-authorship
● Data may be of a single type or a mixture of multiple data types

Application View
● Market analysis, targeted advertising
○ E.g., customer profiling, product recommendation
● Healthcare, medical research
○ E.g., disease diagnosis, patient care, drug discovery
● Science and engineering
○ E.g., air pollution, marine life, electric vehicles
● Security
○ E.g., surveillance, intrusion/crime, fraud, cyberattack
● Government, nonprofit
○ E.g., urban planning, traffic control, education
● And many more…

Knowledge View
● Frequent pattern, association, correlation
○ E.g., songs listened to together or in a certain sequence
○ E.g., A is (more/less) likely to happen given B
● Categorization
○ E.g., similarity among users with certain purchases
○ E.g., differences between two patient groups
● Anomaly, outliers
○ E.g., sensor errors, fraudulent activities, extreme events
● Changes over time
○ E.g., emerging new patterns, shifts in user interest
● Mined knowledge can be descriptive, predictive, or prescriptive

Technique View
● Frequent pattern analysis
● Classification, prediction
● Clustering
● Anomaly detection
● Trend and evolution analysis

Frequent Pattern Analysis
● Frequent itemset
● Frequent sequence
● Frequent structure
● Association rules
● Correlation analysis
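
As an illustration of frequent itemsets and association rules, here is a minimal Python sketch. The transactions, the function names (frequent_itemsets, rule_metrics), and the support threshold are all hypothetical and not from the slides. It counts itemsets by brute-force enumeration, which only suits tiny inputs; practical miners such as Apriori or FP-growth prune the search instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (illustrative only, not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of size <= max_size
    whose support (fraction of transactions) is >= min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def rule_metrics(transactions, antecedent, consequent):
    """Confidence and lift of the rule antecedent -> consequent.
    Assumes both sides occur in at least one transaction."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)
    n_c = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    confidence = n_both / n_a
    lift = confidence / (n_c / n)  # > 1: positive correlation, < 1: negative
    return confidence, lift

itemsets = frequent_itemsets(transactions, min_support=0.5)
confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(itemsets)
print(confidence, lift)
```

A lift below 1 for bread -> milk in this toy data means milk is slightly less likely given bread than overall, which is the "A is (more/less) likely to happen given B" kind of knowledge.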

Classification
● Predefined classes
● Needs labeled training data
● Build a model to distinguish classes
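
The three points above can be sketched with a nearest-centroid classifier, one of the simplest possible models. This is only an illustrative choice, not a method named in the slides; the training data, labels, and function names are hypothetical.

```python
from collections import defaultdict
from math import dist

def train_centroids(samples):
    """samples: list of (features, label) pairs with predefined labels.
    The 'model' is one mean point (centroid) per class."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }

def classify(centroids, features):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical labeled training data.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 9.5), "high"),
]
model = train_centroids(training)
print(classify(model, (2.0, 2.0)))
print(classify(model, (7.0, 9.0)))
```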

Prediction
● Numerical prediction (continuous value)
○ E.g., weather
○ E.g., stock price
○ E.g., traffic
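
A minimal sketch of numerical prediction, assuming a straight-line relationship fitted by ordinary least squares. The slides do not prescribe a model; the hourly temperature data and the name fit_line are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: temperature measured at hours 0..3.
hours = [0, 1, 2, 3]
temps = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(hours, temps)
predicted = slope * 4 + intercept  # continuous-valued prediction for hour 4
print(slope, intercept, predicted)
```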

Clustering
● No predefined classes
● Maximize intra-cluster similarity
● Maximize inter-cluster dissimilarity
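
One common way to pursue these goals is k-means, sketched below. It is used here only as an example and is not named in the slides. The points, the fixed starting centroids, and the iteration count are assumptions chosen so the run is deterministic.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

No labels are supplied: the grouping emerges from distances alone, which is what separates clustering from classification.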

Anomaly Detection
● Anomaly/outlier
○ Differs from the “norm”
○ E.g., error, noise
○ E.g., fraud
○ E.g., extreme events
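
A simple way to make "differs from the norm" concrete is a z-score test, sketched here under the assumption that the norm is the mean and the spread is the standard deviation. The sensor readings and the threshold of 2.0 are hypothetical. Note that an extreme value inflates the standard deviation itself, so robust alternatives (for example, median-based scores) are often preferred in practice.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations
    from the mean. Returns [] when all values are identical."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one suspicious spike.
readings = [10, 11, 9, 10, 12, 10, 50]
outliers = zscore_outliers(readings)
print(outliers)
```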

Trend and Evolution Analysis
● Changes over time
○ Overall trend
○ Periodic patterns
○ Anomalies
○ …
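
As a small illustration of separating an overall trend from a periodic pattern, the sketch below smooths a series with a moving average. The series is made up: it rises steadily while oscillating with period 2, and a window equal to that period averages the oscillation away.

```python
def moving_average(values, window):
    """Mean of each consecutive run of `window` values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical series: upward trend plus a period-2 oscillation.
series = [10, 14, 12, 16, 14, 18]
trend = moving_average(series, window=2)
print(trend)
```

The smoothed values increase by a constant step, exposing the overall trend that the raw zigzag obscures.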

Introduction to Data Mining Pipeline
Key Components of the Data Mining Pipeline
● The data mining pipeline consists of essential components:
○ Data Collection: Gathering relevant data from various sources.
○ Data Preprocessing: Cleaning, integrating, transforming, and reducing data.
○ Data Mining: Applying algorithms to discover patterns and knowledge.
○ Pattern Evaluation: Assessing the quality and relevance of discovered patterns.
○ Knowledge Presentation: Communicating insights and findings to stakeholders.
● A well-structured pipeline ensures effective data analysis and decision-making.
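
The five components can be sketched as a chain of small functions. Everything here is a hypothetical stand-in: an in-memory list plays the data source, "mining" is a plain item count, and evaluation is a minimum-support filter. A real pipeline would substitute its own sources and algorithms at each stage.

```python
from collections import Counter

def collect():
    """Data Collection: hypothetical in-memory source standing in
    for files, databases, or APIs."""
    return [["milk", "bread"], ["milk", None], ["bread", "milk"],
            [], ["milk", "bread"], ["eggs"]]

def preprocess(records):
    """Data Preprocessing: drop missing values and empty records."""
    cleaned = []
    for record in records:
        items = frozenset(x for x in record if x is not None)
        if items:
            cleaned.append(items)
    return cleaned

def mine(records):
    """Data Mining: count how many records contain each item."""
    counts = Counter()
    for record in records:
        counts.update(record)
    return counts

def evaluate(counts, n_records, min_support):
    """Pattern Evaluation: keep items whose support meets the threshold."""
    return {item: c / n_records for item, c in counts.items()
            if c / n_records >= min_support}

def present(patterns):
    """Knowledge Presentation: readable lines, highest support first."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{item}: {support:.0%}" for item, support in ordered]

raw = collect()
clean = preprocess(raw)
report = present(evaluate(mine(clean), len(clean), min_support=0.5))
print(report)
```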

Summary
● Data mining: Discovering insights from large datasets.
● Significance: Informed decision-making, problem-solving.
● Key Views: Data, Technique, Knowledge, Application.
● Understand the stages for effective data mining.
