1-Introduction To Data Mining-13-12-2024

The document provides an overview of data mining, highlighting its significance due to the exponential growth of data and the need for knowledge extraction. It outlines the evolution of scientific disciplines leading to data science, the knowledge discovery process, and the architecture of data mining systems. Additionally, it discusses the interdisciplinary nature of data mining, its applications, and the challenges posed by traditional data analysis methods.

Uploaded by

naresh.r2021

Introduction to Data Mining

SWE2009 - Data Mining

March 20, 2025 1
Why Data Mining?
• The explosive growth of data: from terabytes to petabytes
  • Data collection and data availability
    • Automated data collection tools, database systems, the Web, computerized society
  • Major sources of abundant data
    • Business: Web, e-commerce, transactions, stocks, …
    • Science: remote sensing, bioinformatics, scientific simulation, …
    • Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• We are data rich, but information poor.
• "Necessity is the mother of invention": data mining is the automated analysis of massive data sets.

Evolution of Sciences
• Before 1600: empirical science
• 1600-1950s: theoretical science
  • Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
• 1950s-1990s: computational science
  • Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
  • Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
• 1990-now: data science
  • The flood of data from new scientific instruments and simulations
  • The ability to economically store and manage petabytes of data online
  • The Internet and computing Grid that make all these archives universally accessible
  • Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!
• Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

Evolution of Database Technology
• 1960s:
  • Data collection, database creation, IMS and network DBMS
• 1970s:
  • Relational data model, relational DBMS implementation
• 1980s:
  • RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  • Application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
  • Data mining, data warehousing, multimedia databases, and Web databases
• 2000s:
  • Stream data management and mining
  • Data mining and its applications
  • Web technology (XML, data integration) and global information systems

What Is Data Mining?
• Data mining (knowledge discovery from data)
  • Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from large amounts of data
• Data mining: a misnomer?
  • The mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining.
  • Likewise, the mining of coal from rocks or sand is referred to as coal mining.
  • By the same logic, "knowledge mining from data" would be the more accurate name.
• Alternative names
  • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Data mining: searching for knowledge (interesting patterns) in your data.

KDD: A Definition
• Simply stated, data mining refers to extracting or "mining" knowledge from large amounts of data, usually automatically gathered.
• KDD is the automatic or semi-automatic extraction of non-obvious, hidden knowledge from large volumes of data.

[Figure: three stages, left to right]
1. 10^6 to 10^12 bytes: we never see the whole data set, so we put it in the memory of computers.
2. Then run data mining algorithms.
3. What is the knowledge? How to represent and use it?

Data, Information, Knowledge
• We often see data as a string of bits, or numbers and symbols, or "objects" which we collect daily.
• Information is data stripped of redundancy and reduced to the minimum necessary to characterize the data.
• Knowledge is integrated information, including facts and their relations, which have been perceived, discovered, or learned as our "mental pictures".
• Knowledge can be considered data at a high level of abstraction and generalization.

From Data to Knowledge

[Figure: example data table annotated with a numerical attribute, a categorical attribute, missing values, and class labels]

IF (Headache = No AND Vomiting = Yes AND Temperature = High) THEN Viral illness = Yes
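
A rule of this IF/THEN form can be checked mechanically against records. The sketch below is only an illustration, assuming each record is a Python dictionary; the sample records and the function name `viral_illness_rule` are hypothetical and do not come from the slides.

```python
# Hypothetical patient records; None marks a missing value.
patients = [
    {"headache": "No", "vomiting": "Yes", "temperature": "High"},
    {"headache": "Yes", "vomiting": "No", "temperature": "Normal"},
    {"headache": "No", "vomiting": "Yes", "temperature": None},
]


def viral_illness_rule(record):
    """Return "Yes" when the IF part of the rule holds, otherwise None.

    IF Headache = No AND Vomiting = Yes AND Temperature = High
    THEN Viral illness = Yes
    """
    if (
        record.get("headache") == "No"
        and record.get("vomiting") == "Yes"
        and record.get("temperature") == "High"
    ):
        return "Yes"
    return None  # the rule does not fire, so it makes no claim


predictions = [viral_illness_rule(p) for p in patients]
print(predictions)
```

Returning None rather than "No" when the rule does not fire is a deliberate choice here: a single rule says nothing about records it does not cover, including the one with a missing temperature.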

Data Rich Knowledge Poor
• People gathered and stored so much data because they think some valuable assets are implicitly coded within it.
• Raw data is rarely of direct benefit. Its true value depends on the ability to extract information useful for decision support.
• Manual data analysis is impractical.
• How to acquire knowledge for knowledge-based systems remains the main difficult and crucial problem.
  • Tradition: via knowledge engineers
  • New trend: via automatic programs

[Figure: a knowledge-based system made of a knowledge base and an inference engine, with "?" marking how the knowledge is acquired]

Knowledge Discovery (KDD) Process
• Data mining: the core of the knowledge discovery process

[Figure: KDD pipeline, bottom to top: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

KDD Process - Steps
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations)
5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
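
The seven KDD steps can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not a real mining system: the two sales sources, the function names, and the "total spend" pattern are all hypothetical, and the mining step is just a ranking.

```python
from collections import defaultdict

# Two hypothetical sources of (customer, amount) sales records.
store_a = [("ann", 20.0), ("bob", None), ("ann", 35.0)]
store_b = [("bob", 15.0), ("cat", 90.0), ("cat", -1.0)]


def clean(records):
    """Step 1: drop records with missing or impossible (negative) amounts."""
    return [(c, a) for c, a in records if a is not None and a >= 0]


def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [rec for src in sources for rec in src]


def select(records, customers):
    """Step 3: keep only the records relevant to the analysis task."""
    return [(c, a) for c, a in records if c in customers]


def transform(records):
    """Step 4: aggregate amounts into one total per customer."""
    totals = defaultdict(float)
    for customer, amount in records:
        totals[customer] += amount
    return dict(totals)


def mine(totals):
    """Step 5: a toy 'pattern': customers ranked by total spend."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def evaluate(patterns, min_total):
    """Step 6: keep only patterns that pass an interestingness threshold."""
    return [(c, t) for c, t in patterns if t >= min_total]


def present(patterns):
    """Step 7: format the surviving patterns for the user."""
    return [f"{c}: {t:.2f}" for c, t in patterns]


data = integrate(clean(store_a), clean(store_b))
data = select(data, customers={"ann", "bob", "cat"})
report = present(evaluate(mine(transform(data)), min_total=50.0))
print(report)
```

Keeping each step as its own function mirrors the slide's ordering and makes it easy to swap in a real technique at any single stage.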

Architecture of a Typical Data Mining System
• Database, data warehouse, World Wide Web, or other information repository:
  • One or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories.
  • Data cleaning and data integration techniques may be performed on the data.
• Database or data warehouse server:
  • Responsible for fetching the relevant data, based on the user's data mining request.

Contd….
• Knowledge base:
  • Domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns.
  • Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
  • Knowledge such as user beliefs, which can be used to assess a pattern's interestingness based on its unexpectedness, may also be included.
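
A concept hierarchy can be pictured as a mapping from a detailed attribute value to a more abstract one. The sketch below assumes a two-level city-to-country hierarchy for a location attribute; the mapping, the sales counts, and the name `roll_up` are hypothetical.

```python
# Hypothetical concept hierarchy for a "location" attribute:
# city -> country. Each level is a coarser abstraction.
CITY_TO_COUNTRY = {
    "Chennai": "India",
    "Mumbai": "India",
    "Chicago": "USA",
    "New York": "USA",
}


def roll_up(sales_by_city, hierarchy):
    """Aggregate city-level counts to the next level of the hierarchy."""
    totals = {}
    for city, count in sales_by_city.items():
        parent = hierarchy[city]
        totals[parent] = totals.get(parent, 0) + count
    return totals


sales_by_city = {"Chennai": 5, "Mumbai": 7, "Chicago": 4, "New York": 6}
sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
print(sales_by_country)
```

Patterns that are too sparse to notice at the city level may become visible after such a roll-up, which is one way the knowledge base guides the search.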

Contd…
• Data mining engine:
  • Consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.
• Pattern evaluation module:
  • Focuses the search toward interesting patterns.
  • Filters out discovered patterns that fail the interestingness measures.
  • May be integrated with the mining module, depending on the implementation of the data mining method used.
  • For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process, so as to confine the search to only the interesting patterns.
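
One familiar way to push interestingness evaluation into the mining process is level-wise frequent itemset search in the Apriori style, where minimum support acts as the interestingness measure and infrequent itemsets are never extended. The slides do not name this algorithm; it is used here only as an illustration, and the transactions and function name are hypothetical.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise search that only extends itemsets meeting min_support."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        # Evaluate interestingness inside the search, not afterwards.
        kept = {}
        for s in level:
            count = support(s)
            if count >= min_support:
                kept[s] = count
        result.update(kept)
        # Build candidates one item larger from surviving itemsets only.
        size = len(next(iter(kept))) + 1 if kept else 0
        candidates = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
        level = list(candidates)
    return result


found = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because "tea" fails the threshold at the first level, no itemset containing it is ever generated or counted, which is the saving the slide refers to.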

Contd….
• User interface:
  • Communicates between users and the data mining system.
  • Allows the user to interact with the system by specifying a data mining query or task.
  • Provides information to help focus the search.
  • Supports exploratory data mining based on the intermediate data mining results.
  • Allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

Data Mining and Business Intelligence

[Figure: layered pyramid; the potential to support business decisions increases toward the top. Layers from top to bottom, with the typical user where the slide names one:]
• Decision Making (End User)
• Data Presentation: Visualization Techniques (Business Analyst)
• Data Mining: Information Discovery (Data Analyst)
• Data Exploration: Statistical Summary, Querying, and Reporting
• Data Preprocessing/Integration, Data Warehouses
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]

Contd….
• Data mining is an interdisciplinary field.
  • It draws on a set of disciplines including database systems, statistics, machine learning, visualization, and information science.
  • Other contributing disciplines: neural networks, fuzzy logic or rough set theory, knowledge representation, etc.

Data Mining
• Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data.
• Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
  • For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. Examples of such systems: trees, neural networks, etc.
• A database is an organized collection of data.
• Artificial intelligence (AI) is the technology and branch of computer science that studies and develops intelligent machines and software.
• Pattern recognition aims to classify data (patterns) based on either a priori knowledge or statistical information extracted from the patterns.
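
The spam example can be made concrete with a very small word-count classifier. The sketch below uses naive Bayes with add-one smoothing, which is one common choice and not something the slides prescribe; the four training messages and the names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter

# Tiny hypothetical training set of (message, label) pairs.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]


def train(examples):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_msgs = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_msgs)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("cheap money", *model))
print(classify("meeting tomorrow", *model))
```

The classifier learns only from statistical information extracted from the training messages, which also makes it a small instance of the pattern recognition idea described above.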

Data Mining: Classification Schemes
• General functionality
  • Descriptive data mining
  • Predictive data mining
• Different views, different classifications
  • Kinds of databases to be mined
  • Kinds of knowledge to be discovered
  • Kinds of techniques utilized
  • Kinds of applications adapted

Data Mining
• Prediction methods
  • Use some variables to predict unknown or future values of other variables.
• Descriptive methods
  • Find human-interpretable patterns that describe the data.

Why Not Traditional Data Analysis?
• Tremendous amount of data
  • Algorithms must be highly scalable to handle data volumes such as terabytes
• High dimensionality of data
  • A microarray may have tens of thousands of dimensions
• High complexity of data
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data
  • Structure data, graphs, social networks, and multi-linked data
  • Heterogeneous databases and legacy databases
  • Spatial, spatiotemporal, multimedia, text, and Web data
  • Software programs, scientific simulations
• New and sophisticated applications

Multi-Dimensional View of Data Mining
• Data to be mined
  • Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW
• Knowledge to be mined
  • Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  • Multiple/integrated functions and mining at multiple levels
• Techniques utilized
  • Machine learning, statistics, visualization, etc.
• Applications adapted
  • Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Multi-Dimensional View of Data Mining
• Data to be mined
1. Relational
2. Data warehouse
3. Transactional
4. Stream
5. Object-oriented
6. Temporal Databases, Sequence Databases, and Time-Series Databases
7. Spatial and Spatiotemporal
8. Heterogeneous Databases and Legacy Databases
9. Text and multi-media
10. WWW

1. Relational
• A database system, also called a database management system (DBMS), consists of:
  • A collection of interrelated data, known as a database.
  • A set of software programs to manage and access the data.
• The software programs involve mechanisms for the definition of database structures; for data storage; for concurrent, shared, or distributed data access; and for ensuring the consistency and security of the information stored, despite system crashes or attempts at unauthorized access.
• A relational database is a collection of tables, each of which is assigned a unique name.
• Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows).
• Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.
• A semantic data model, such as an entity-relationship (ER) data model, is often constructed for relational databases.
• An ER data model represents the database as a set of entities and their relationships.
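
The table, attribute, tuple, and key vocabulary maps directly onto SQL. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `customer` table, its columns, and its rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    " name TEXT,"
    " city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Chennai"), (2, "Ravi", "Mumbai"), (3, "Meena", "Chennai")],
)

# A relational query: retrieve the tuples whose attribute values match.
rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY cust_id", ("Chennai",)
).fetchall()
print(rows)
conn.close()
```

Here `customer` is the uniquely named table, `cust_id`, `name`, and `city` are its attributes, and each inserted row is one tuple identified by its key.
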
2. Data warehouse
• A repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
• Constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
• "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
• Usually modeled by a multidimensional database structure
  • Each dimension corresponds to an attribute or a set of attributes in the schema.
  • Each cell stores the value of some aggregate measure, such as count or sales amount.
• The actual physical structure of a data warehouse may be a relational data store or a multidimensional data cube.
• A data cube provides a multidimensional view of data and allows the precomputation and fast accessing of summarized data.
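
The idea of cells holding precomputed aggregates can be sketched with plain dictionaries. This is a simplified illustration, assuming a tiny fact table with three dimensions (city, quarter, item) and sales amount as the measure; the data and the name `build_cube` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records: (city, quarter, item, sales_amount).
facts = [
    ("Chennai", "Q1", "phone", 100),
    ("Chennai", "Q1", "laptop", 250),
    ("Chennai", "Q2", "phone", 120),
    ("Mumbai", "Q1", "phone", 80),
]


def build_cube(facts, dims):
    """Precompute the sum of sales for each combination of the chosen dimensions.

    dims is a tuple of column indexes (0 = city, 1 = quarter, 2 = item).
    """
    cube = defaultdict(int)
    for row in facts:
        cell = tuple(row[d] for d in dims)
        cube[cell] += row[3]
    return dict(cube)


by_city_quarter = build_cube(facts, dims=(0, 1))
by_city = build_cube(facts, dims=(0,))
print(by_city_quarter[("Chennai", "Q1")])
print(by_city[("Chennai",)])
```

Once the aggregates are built, answering a summary question is a dictionary lookup instead of a scan over the fact records, which is the fast access to summarized data the slide mentions.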

3. Transactional
• A transactional database consists of a file where each record represents a transaction.
• A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction (such as items purchased in a store).
• The transactional database may have additional tables associated with it, which contain other information regarding the sale, such as the date of the transaction, the customer ID number, the ID number of the salesperson and of the branch at which the sale occurred, and so on.
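
The record layout described above, a transaction ID plus an item list with sale details kept in an associated table, can be sketched as two dictionaries keyed by transaction ID. All IDs, items, and field names below are hypothetical.

```python
from collections import Counter

# Each record: transaction ID -> list of items in that transaction.
transactions = {
    "T100": ["bread", "milk"],
    "T200": ["bread", "butter", "jam"],
    "T300": ["milk"],
}

# Hypothetical associated table with other information about each sale.
sale_info = {
    "T100": {"date": "2025-03-01", "customer_id": "C1", "branch_id": "B7"},
    "T200": {"date": "2025-03-01", "customer_id": "C2", "branch_id": "B7"},
    "T300": {"date": "2025-03-02", "customer_id": "C1", "branch_id": "B3"},
}

# How often is each item purchased across all transactions?
item_counts = Counter(item for items in transactions.values() for item in items)

# Join the two structures on the transaction ID: items bought by customer C1.
c1_items = sorted(
    item
    for trans_id, items in transactions.items()
    if sale_info[trans_id]["customer_id"] == "C1"
    for item in items
)
print(item_counts["bread"], c1_items)
```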

4. Stream
• Data flow in and out of an observation platform (or window) dynamically.
• Unique features:
  • Huge or possibly infinite volume
  • Dynamically changing
  • Flowing in and out in a fixed order
  • Allowing only one or a small number of scans
  • Demanding fast (often real-time) response time
• Typical examples of data streams include various kinds of scientific and engineering data, time-series data, and data produced in other dynamic environments, such as power supply, network traffic, stock exchange, telecommunications, Web click streams, video surveillance, and weather or environment monitoring.
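
The one-scan, bounded-window constraint on stream data can be illustrated with a sliding-window average that never stores more than the last few readings. The readings and the function name `sliding_average` are hypothetical, and a generator stands in for a live feed.

```python
from collections import deque


def sliding_average(stream, window_size):
    """Yield the mean of the most recent window_size readings, in one pass."""
    window = deque(maxlen=window_size)  # old readings fall out automatically
    running_sum = 0.0
    for reading in stream:
        if len(window) == window_size:
            running_sum -= window[0]  # value about to be evicted
        window.append(reading)
        running_sum += reading
        yield running_sum / len(window)


# Hypothetical sensor readings; a generator stands in for a live feed.
readings = (x for x in [10, 20, 30, 40, 50])
averages = list(sliding_average(readings, window_size=3))
print(averages)
```

Each reading is seen exactly once and memory use stays fixed at the window size, so the same code would work on an unbounded stream.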

5. Object-oriented
• Each entity is considered as an object.
• Objects that share a common set of properties can be grouped into an object class.
• Each object is an instance of its class.
• Object classes can be organized into class/subclass hierarchies so that each class represents properties that are common to objects in that class.
• For instance, an employee class can contain variables like name, address, and birthdate.
• Suppose that the class sales person is a subclass of the class employee.
• A sales person object would inherit all of the variables pertaining to its superclass, employee.
• In addition, it has all of the variables that pertain specifically to being a salesperson (e.g., commission).
• Such a class inheritance feature benefits information sharing.
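
The employee and sales person example translates almost word for word into code. The sketch below uses Python dataclasses; the class names follow the slide, while the field types and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Variables shared by every employee object."""
    name: str
    address: str
    birthdate: str


@dataclass
class SalesPerson(Employee):
    """Subclass: inherits Employee's variables and adds its own."""
    commission: float = 0.0


sp = SalesPerson(name="Asha", address="12 Lake Road", birthdate="1990-04-01", commission=0.05)
print(sp.name, sp.commission, isinstance(sp, Employee))
```

The `SalesPerson` object carries `name`, `address`, and `birthdate` without redeclaring them, which is the information sharing that class inheritance provides.
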
6. Temporal Databases, Sequence Databases, and Time-Series Databases
• A temporal database typically stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics.
• A sequence database stores sequences of ordered events, with or without a concrete notion of time. Examples include customer shopping sequences, Web click streams, and biological sequences.
• A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly). Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (like temperature and wind).

7. Spatial and Spatiotemporal
• Spatial databases contain spatial-related information.
• Examples include geographic (map) databases, very large-scale integration (VLSI) or computer-aided design databases, and medical and satellite image databases.
• Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps.
  • For example, a 2-D satellite image may be represented as raster data, where each pixel registers the rainfall in a given area.
• Maps can be represented in vector format, where roads, bridges, buildings, and lakes are represented as unions or overlays of basic geometric constructs, such as points, lines, polygons, and the partitions and networks formed by these components.
• A spatial database that stores spatial objects that change with time is called a spatiotemporal database, from which interesting information can be mined.
  • For example, we may be able to group the trends of moving objects and identify some strangely moving vehicles, or distinguish a bioterrorist attack from a normal outbreak of the flu based on the geographic spread of a disease with time.

8. Heterogeneous Databases and Legacy Databases
• A heterogeneous database consists of a set of interconnected, autonomous component databases.
• A legacy database is a group of heterogeneous databases that combines different kinds of data systems.
• The heterogeneous databases in a legacy database may be connected by intra- or inter-computer networks.

9. Text and multi-media
• Text databases are databases that contain word descriptions for objects.
• The descriptions may be words, sentences, or paragraphs (such as product specifications, error or bug reports, warning messages, summary reports, notes, or other documents).
• Text databases may be highly unstructured (such as some Web pages on the World Wide Web).
• Some text databases may be somewhat structured, that is, semi-structured (such as e-mail messages and many HTML/XML Web pages).
• Others are relatively well structured (such as library catalogue databases).
• Text databases with highly regular structures typically can be implemented using relational database systems.
• Example application: document classification.
• Multimedia databases store image, audio, and video data.
• They are used in applications such as picture content-based retrieval, voice-mail systems, video-on-demand systems, the World Wide Web, and speech-based user interfaces that recognize spoken commands.
• Multimedia databases must support large objects, because data objects such as video can require gigabytes of storage.

10. WWW
• Distributed information services, such as Yahoo!, Google, America Online, and AltaVista, provide rich, worldwide, on-line information services, where data objects are linked together to facilitate interactive access.
• Users seeking information of interest traverse from one object via links to another.
• Capturing user access patterns in such distributed information environments is called Web usage mining (or Weblog mining).
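
A first step in Web usage mining is often just counting which links users follow. The sketch below assumes a simplified log of (user, page) pairs in arrival order; real Web server logs carry more fields, and the log entries and the name `count_transitions` are hypothetical.

```python
from collections import Counter

# Hypothetical Web log: (user, page) in the order the requests arrived.
weblog = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"),
    ("u3", "/home"), ("u3", "/about"),
]


def count_transitions(log):
    """Count page-to-page moves (links followed) within each user's session."""
    last_page = {}
    transitions = Counter()
    for user, page in log:
        if user in last_page:
            transitions[(last_page[user], page)] += 1
        last_page[user] = page
    return transitions


transitions = count_transitions(weblog)
print(transitions.most_common(1))
```

The most frequent transition is a simple access pattern; in this toy log it shows that users who land on the home page most often continue to the products page.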
