DM 1 PDF

The document provides an overview of data mining fundamentals. It discusses that data mining helps extract useful information and patterns from large amounts of raw data. Some common uses of data mining mentioned are in healthcare, finance, and retail to learn consumer preferences. The history of data mining is then outlined, from the beginnings of data collection and access in the 1960s-1980s, to data warehousing and decision reports in the 1990s, to the current focus on data mining. Some influential people and events in the development of data mining are also noted. Finally, the document discusses key data mining terminology, components, functionalities, and techniques.

Uploaded by

Rahul Pawar

Data Mining

Fundamentals
Why Data Mining?

Almost everything we do leaves electronic
data behind
We need to extract useful information from
data in order to interpret it
Data mining helps speed up the process of
finding relationships and patterns in raw
data
Uses of Data Mining

Healthcare
Finance
Retail & E-Commerce
Learn about consumer preferences
Countless others!
History of Data Mining

The term “data mining” is relatively new,
but the concepts have been around for
many years
Classical statistics, artificial intelligence
and machine learning converged over the
years and evolved into data mining
History of Data Mining (Cont.)

Data Collection (1960s): the process of storing
information on computers
Technology: computers, tapes and disks
Data Access (1980s): the introduction of
structured query languages and relational
databases helped us learn more about data
Data available at record level dynamically
History of Data Mining (Cont.)

Data Warehousing and Decision Report
(1990s): the process of centralized data
management and retrieval
Maintaining a central location for all
organizational data
Helps you analyze data and concentrate on very
specific characteristics
Dynamic data delivery at multiple levels
Data Mining (present): generalizing
patterns, predictive
Influential People/Events

In 1975, John Henry Holland wrote Adaptation in
Natural and Artificial Systems, a book on genetic
algorithms and an early starting point for data mining
In the 1990s, the term “data mining” appeared in the
database community for the first time
In 2001, William S. Cleveland introduced data
science as an independent discipline
In February 2015, DJ Patil became the first Chief
Data Scientist in the White House
Terminology

Data: facts, numbers or text that can be
processed by a computer
Information: the patterns, associations and
relationships among data
Knowledge: understanding of a subject;
information is synthesized to gain knowledge
about historical patterns and future trends
Data vs. Information vs. Knowledge
Data Mining
Definitions
The process of discovering meaningful new
correlations, patterns and trends by sifting
through large amounts of data stored in
repositories, using pattern recognition techniques
as well as statistical and mathematical
techniques.
The nontrivial extraction of implicit,
previously unknown and potentially useful
information from data.
Data Mining
Definitions
The search for relationships and global patterns
that exist in large databases but are hidden
among vast amounts of data, such as the relationship
between patient data and medical
diagnoses.
The exploration and analysis, by automatic or
semi-automatic means, of large quantities of data in
order to discover meaningful patterns.
Data Mining
What is not Data Mining?
Look up a phone number in a phone directory
Query a Web search engine for information about “Amazon”
What is Data Mining?
Certain names are more prevalent in certain US
locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
Group together similar documents returned by a
search engine according to their context
(e.g. Amazon rainforest, Amazon.com)
Knowledge Discovery from Data

Knowledge discovery consists of an iterative

sequence of the following steps:
data cleaning: to remove noise or irrelevant data
data integration: where multiple data sources may
be combined
data selection: where data relevant to the analysis
task are retrieved from the database
data transformation: where data are transformed
or consolidated into forms appropriate for mining by
performing summary or aggregation operations
data mining: an essential process where intelligent
methods are applied in order to extract data
patterns
Knowledge Discovery from Data

Knowledge discovery consists of an iterative

sequence of the following steps:
pattern evaluation: to identify the truly
interesting patterns representing knowledge
based on some interestingness measures
knowledge presentation: where visualization
and knowledge representation techniques are
used to present the mined knowledge to the user
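The sequence of steps above can be sketched as a chain of small functions. This is only a minimal illustration: the two record sources, the field layout (customer, item, amount), and the thresholds are assumptions made up for the example, not part of any particular system.

```python
from collections import Counter

# Two hypothetical sources of purchase records: (customer, item, amount).
store_a = [("ann", "computer", 900.0), ("bob", "printer", None), ("ann", "software", 120.0)]
store_b = [("cat", "computer", 1100.0), ("cat", "software", 90.0), ("bob", "computer", 950.0)]

def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r[2] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, min_amount):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r[2] >= min_amount]

def transform(records):
    """Data transformation: aggregate records into one item set per customer."""
    baskets = {}
    for customer, item, _ in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Data mining: count in how many customer baskets each item appears."""
    return Counter(item for items in baskets.values() for item in items)

def evaluate(counts, min_count):
    """Pattern evaluation: keep only items that reach a minimum count."""
    return {item: n for item, n in counts.items() if n >= min_count}

records = select(integrate(clean(store_a), clean(store_b)), min_amount=50.0)
patterns = evaluate(mine(transform(records)), min_count=2)
print(patterns)  # knowledge presentation, here reduced to a plain printout
```

In practice the process is iterative: the result of pattern evaluation often sends the analyst back to cleaning or selection with different settings.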
Knowledge Discovery from Data

(Figure: data cleaning, data integration, data selection, and data transformation lead into Data Mining.)
Data Mining
A typical data mining system may have the
following major components:
A database, data warehouse, or other
information repository, which consists of the set
of databases, data warehouses, spreadsheets etc.
A database or data warehouse server which
fetches the relevant data based on users’ data
mining requests.
Data Mining
A typical data mining system may have the
following major components:
A knowledge base that contains the domain
knowledge used to guide the search or to
evaluate the interestingness of resulting
patterns. For example, the knowledge base may
contain metadata which describes data from
multiple heterogeneous sources.
Data Mining
A typical data mining system may have the
following major components:
A data mining engine, which consists of a set
of functional modules for tasks such as
characterization, association, classification, cluster
analysis, and evolution and deviation analysis.
A pattern evaluation module that works in
tandem with the data mining modules by
employing interestingness measures to help
focus the search toward interesting
patterns.
Data Mining
A typical data mining system may have the
following major components:
A graphical user interface that allows the
user an interactive approach to the data mining
system.
Data Mining
Data to be mined
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
• Object‐oriented and object‐relational databases
• Spatial databases
• Time‐series data and temporal data
• Text databases and multimedia databases
• Heterogeneous and legacy databases
• WWW
Data Mining functionalities
Data mining functionalities are used to specify
the kind of patterns to be found in data
mining tasks.

Data mining tasks are either descriptive or predictive:
Descriptive: characterize the general properties
of the data
Predictive: perform inference on the data
in order to make predictions
A predictive model makes predictions
about data values using the results
found from available data. Thus it makes
use of historical data to make predictions.
A descriptive model identifies patterns or
relationships in data. It finds the
properties of existing data and does not
predict new properties.
Data Mining functionalities

Predictive data mining:
Classification
Regression
Time series analysis
Prediction
Data Mining functionalities

Descriptive data mining:
Association rules
Summarization
Clustering
Data Mining functionalities
Concept description: Characterization and
discrimination
Data can be associated with classes or concepts
• Ex. In the AllElectronics store, classes of items for sale
include computers and printers.
A description of a class or concept, called a
class/concept description, can be derived in 2 ways:
• data characterization, by summarizing the data of the
class under study (often called the target class)
• data discrimination, by comparison of the target class
with one or a set of comparative classes (often called
the contrasting classes)
Data Mining functionalities
Data characterization is a summarization of
the general characteristics or features of a
target class of data.
Example : summarizing the characteristics of
customers who spend more than $1,000 a year. The
result could be a general profile of the customers,
such as they are 40–50 years old, employed, and
have excellent credit ratings.
The output of data characterization can be
presented in various forms.
Examples include pie charts, bar charts, curves,
multidimensional data cubes, and multidimensional
tables, including crosstabs.
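A minimal sketch of data characterization for the customer example above. The customer records and field names are invented for illustration; the function simply summarizes the target class (customers spending more than a given amount) into a general profile.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records; the values are illustrative only.
customers = [
    {"age": 45, "employed": True, "credit": "excellent", "spend": 1500},
    {"age": 42, "employed": True, "credit": "excellent", "spend": 2200},
    {"age": 23, "employed": False, "credit": "fair", "spend": 300},
    {"age": 48, "employed": True, "credit": "good", "spend": 1200},
]

def characterize(rows, min_spend):
    """Summarize the target class: customers who spend more than min_spend."""
    target = [r for r in rows if r["spend"] > min_spend]
    return {
        "count": len(target),
        "mean_age": mean(r["age"] for r in target),
        "employed_share": sum(r["employed"] for r in target) / len(target),
        "most_common_credit": Counter(r["credit"] for r in target).most_common(1)[0][0],
    }

profile = characterize(customers, min_spend=1000)
print(profile)
```

The resulting dictionary is the kind of summary that could then be shown as a bar chart or crosstab.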
Data Mining functionalities
Data discrimination is a comparison of the
general features of target class data objects
with the general features of objects from one
or a set of contrasting classes.
The target and contrasting classes can be
specified by the user, and the corresponding
data objects retrieved through database
queries.
Example: two groups of customers, such as
those who shop for computer products regularly
versus those who rarely shop for such products
Data Mining functionalities
Association Rules – Tries to find
relationships between data items. Also called
link analysis or affinity analysis.
The best-known application of this task is association rules,
a model that identifies specific types of data
associations.
• Example: buys(X, “computer”) ⇒ buys(X, “software”)
• Multidimensional association rule example:
• age(X, “20...29”) ∧ income(X, “20K...29K”) ⇒ buys(X, “CD player”)
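Rules such as buys(X, “computer”) ⇒ buys(X, “software”) are commonly scored with support and confidence. These two measures are not defined on the slides, so the sketch below introduces them as an assumption: support is the fraction of transactions containing an item set, and confidence is the support of the whole rule divided by the support of its left-hand side. The transactions are made up.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(rule_support, rule_confidence)
```

Here half of all transactions contain both items, and two out of the three computer buyers also bought software.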
Data Mining functionalities
Classification – the process of
finding a model (or function) that describes and
distinguishes data classes or concepts,
so that the model can be used to predict
the class of objects whose class label is unknown.
The derived model is based on the analysis of a set of
training data (i.e., data objects whose class label is
known).
It maps data onto predefined groups or classes.
This is called supervised learning, as the classes are
decided before examining the data.
Classes are decided based on the characteristics of data
already belonging to each class.
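As one possible illustration of supervised classification, the sketch below uses a nearest-neighbour rule: an unlabeled object receives the class label of the closest training object. The technique, the features (age, income in thousands), and the labels are all assumptions chosen for the example; the slides do not prescribe a specific classifier.

```python
import math

# Hypothetical training data: ((age, income in thousands), class label).
training = [
    ((25, 30), "no"),
    ((45, 80), "yes"),
    ((35, 60), "yes"),
    ((22, 20), "no"),
]

def classify(point):
    """Return the class label of the training object nearest to point."""
    nearest = min(training, key=lambda example: math.dist(example[0], point))
    return nearest[1]

print(classify((40, 70)))
print(classify((24, 25)))
```

The class labels ("yes", "no") are fixed before any new object is examined, which is what makes this supervised learning.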
Data Mining functionalities
Pattern recognition is a type of
classification, where a given pattern is
classified into one of several classes based on
its similarity with predefined patterns.
Data Mining functionalities
Regression – maps a data item to a real-valued
prediction variable. It
assumes that the target data fit some
known type of function and tries to find the
function of that type that best models the given data.
Error analysis is used to determine which
function is the best.
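A small sketch of this idea, assuming the known function type is a straight line fitted by ordinary least squares and that error analysis means comparing the sum of squared errors (SSE) of candidate functions. The data points are invented.

```python
# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def sse(predict, xs, ys):
    """Sum of squared errors of a candidate function on the data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

slope, intercept = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)

sse_line = sse(lambda x: slope * x + intercept, xs, ys)
sse_constant = sse(lambda x: mean_y, xs, ys)  # baseline: always predict the mean
print(slope, intercept, sse_line, sse_constant)
```

The fitted line has a much smaller SSE than the constant baseline, so error analysis would select the line.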
Data Mining functionalities
Prediction – In many data mining applications,
future data is predicted based on current or
past data.
Examples are
Prediction of flooding
Speech recognition
Machine learning
Pattern recognition
Data Mining functionalities
Cluster Analysis:
clustering analyzes data objects without
consulting a known class label
The objects are clustered or grouped based on
the principle of maximizing the intraclass
similarity and minimizing the interclass
similarity.
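One common way to realize this principle is k-means clustering, sketched here for one-dimensional data. The choice of k-means, the starting centers, and the values are assumptions for illustration only.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group values around centers without using any class labels."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

Values inside each resulting cluster are close to one another and far from the values in the other cluster.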
Data Mining functionalities
Outlier Analysis:
A database may contain data objects that do not
comply with the general behaviour or model of
the data. These data objects are outliers.
The analysis of outlier data is referred to as
outlier mining.
Example : Outlier analysis may uncover
fraudulent usage of credit cards by detecting
purchases of extremely large amounts for a
given account number in comparison to regular
charges incurred by the same account.
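The credit card example can be sketched with a simple robust rule: flag a charge when its distance from the account's median is many times the median absolute deviation. The rule, the threshold of 5, and the charge amounts are assumptions for illustration, not a production fraud detector.

```python
from statistics import median

def find_outliers(amounts, threshold=5.0):
    """Return amounts that deviate strongly from the account's typical charge."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)  # median absolute deviation
    if mad == 0:
        return []  # all charges (nearly) identical: nothing to compare against
    return [a for a in amounts if abs(a - med) / mad > threshold]

# Hypothetical charges for one account number.
charges = [42.0, 55.0, 38.0, 61.0, 47.0, 2500.0, 50.0]
suspicious = find_outliers(charges)
print(suspicious)
```

The median is used instead of the mean because a single extreme purchase would otherwise distort the notion of a regular charge.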
Interestingness in pattern
A data mining system has the potential to
generate thousands or even millions of
patterns. So, are all of the patterns
interesting? No.
A pattern is interesting if it is
easily understood by humans
valid on new data with some degree of certainty
potentially useful
novel
A pattern is also interesting if it
validates a hypothesis that the user sought to
confirm
Interestingness in pattern
Can a data mining system generate all of the
interesting patterns?
It is often unrealistic and inefficient for data
mining systems to generate all of the possible
patterns.
User-provided constraints and interestingness
measures should be used to focus the search.
Interestingness in pattern
Can a data mining system generate only
interesting patterns?
This is an optimization problem in data mining.
It is highly desirable for data mining systems to
generate only interesting patterns.
Related technologies
Machine learning
is the field of study that gives computers the
capability to learn without being explicitly
programmed.
is an application of artificial intelligence (AI)
that provides systems with
• the ability to automatically learn and
• improve from experience
The primary aim is to allow computers to learn
automatically, without human intervention or
assistance, and adjust actions accordingly.
Related technologies
Statistics:
is a branch of mathematics working with data
collection, organization, analysis, interpretation
and presentation.
Statistics is a term used to summarize a process
that an analyst uses to characterize a data set.
Related technologies
Visualisation
is any technique for creating images, diagrams,
or animations to communicate a message.
It is the art of making data beautiful
Information science
is the study of processes for storing and retrieving
information.
is a field primarily concerned with the analysis,
collection, classification, manipulation, storage,
retrieval, movement, dissemination, and
protection of information
Related technologies
Database technology
is a computer-based record-keeping system
which is used to record, maintain and retrieve
data.
It is an organized collection of interrelated
(persistent) data.
It facilitates the storage, retrieval, modification,
and deletion of data in conjunction with various
data-processing operations
Related technologies
Other technologies: depending on the
requirements, technologies from other
domains can be incorporated, such as
neural networks
fuzzy logic
rough set theory
inductive logic programming
high-performance computing
Classification of Data Mining
Systems
Classification according to the kinds of
databases mined
Database systems can be classified according to
different criteria (such as data models, or the types
of data or applications involved), each of which may
require its own data mining technique.
• For instance, if classifying according to data models, we
may have a relational, transactional, object-relational, or
data warehouse mining system.
If classifying according to the special types of data
handled, we may have a spatial, time-series, text,
stream data, or multimedia data mining system, or a
World Wide Web mining system.
Classification of Data Mining
Systems
Classification according to the kinds of
knowledge mined:
Data mining systems can be categorized
according to the kinds of knowledge they mine,
that is, based on data mining functionalities.
• Ex: characterization, discrimination, association and
correlation analysis, classification, prediction,
clustering, outlier analysis, and evolution analysis.
Classification of Data Mining
Systems
Classification according to the
granularity or levels of abstraction:
data mining systems can be distinguished based
on the granularity or levels of abstraction of the
knowledge mined.
• generalized knowledge - at a high level of abstraction
• primitive-level knowledge - at a raw data level
• Knowledge at multiple levels (considering several
levels of abstraction).
• An advanced data mining system should facilitate the
discovery of knowledge at multiple levels of
abstraction.
Classification of Data Mining
Systems
Data mining systems can also be categorized
as those that
mine data regularities (commonly occurring
patterns) versus
those that mine data irregularities (such as
exceptions, or outliers).
In general, concept description, association and
correlation analysis, classification, prediction, and
clustering mine data regularities, rejecting
outliers as noise.
Classification of Data Mining
Systems
Classification according to the kinds of
techniques utilized:
Data mining systems can be categorized
according to the underlying data mining
techniques employed.
• database-oriented or data warehouse-oriented
techniques,
• or machine learning, statistics, visualization, pattern
recognition, neural networks, and so on
A sophisticated data mining system will often
adopt multiple data mining techniques
Classification of Data Mining
Systems
Classification according to the applications
adapted:
For example, data mining systems may be
tailored specifically for finance,
telecommunications, DNA, stock markets,
e-mail, and so on.
Different applications often require the
integration of application-specific methods.
Data Mining Task Primitives
A user wants to perform some form of data
analysis.
A data mining task can be specified in the
form of a data mining query.
A data mining query is defined in terms of
data mining task primitives.
These primitives allow the user to
interactively communicate with the data
mining system
Data Mining Task Primitives
The set of task-relevant data to be mined:
This specifies the portions of the database or the
set of data in which the user is interested.
• For example: the database attributes or data
warehouse dimensions of interest. These are also referred
to as the relevant attributes or dimensions.
The kind of knowledge to be mined:
This specifies the data mining functions to be
performed.
• Examples: characterization, discrimination, association
or correlation analysis, classification, prediction,
clustering, outlier analysis, or evolution analysis.
Data Mining Task Primitives
The background knowledge to be used in the
discovery process:
The domain knowledge is useful for guiding the
knowledge discovery process and for evaluating
the patterns found.
Concept hierarchies are a popular form of
background knowledge, which allow data to be
mined at multiple levels of abstraction.
Data Mining Task Primitives
The interestingness measures and
thresholds for pattern evaluation:
They may be used to guide the mining process
or,
• after discovery, to evaluate the discovered patterns.
The expected representation for visualizing
the discovered patterns:
This refers to the form in which discovered
patterns are to be displayed,
• Such as rules, tables, charts, graphs, decision trees,
and cubes.
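The five task primitives can be pictured as the fields of a query object. The sketch below is purely hypothetical: the class name `MiningQuery`, its field names, and the example values are invented here and do not correspond to any real data mining query language.

```python
from dataclasses import dataclass, field

@dataclass
class MiningQuery:
    """Hypothetical container for the five data mining task primitives."""
    task_relevant_data: list[str]   # attributes or dimensions of interest
    knowledge_kind: str             # e.g. "association", "classification"
    background_knowledge: dict[str, list[str]] = field(default_factory=dict)  # concept hierarchies
    interestingness: dict[str, float] = field(default_factory=dict)           # measures and thresholds
    presentation: str = "rules"     # how discovered patterns are displayed

query = MiningQuery(
    task_relevant_data=["age", "income", "item"],
    knowledge_kind="association",
    background_knowledge={"location": ["street", "city", "country"]},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="table",
)
print(query)
```

Grouping the primitives in one object makes it explicit what the user has to tell the system before mining starts.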
Major Issues in Data Mining
Human Interaction:
As data mining problems are not precisely
stated, interfaces may be needed with both
domain and technical experts.
• Technical experts are needed to formulate the queries
and assist in interpreting the results.
• Users are needed to identify training data and desired
results.
Major Issues in Data Mining
Overfitting:
When a model is generated that is associated with a
given database state, it is desirable that the model also
fit future database states.
Overfitting occurs when the model does not fit future
states.
There are 2 common causes of overfitting:
• It may be caused by assumptions that are made about the data, or
• by a training database that is too small.
• For example, a classification model for an employee database
may be developed to classify employees as short, medium, or tall.
If the training database is quite small, the model might
erroneously indicate that a short person is anyone under five feet
eight inches because there is only one entry in the training
database under five feet eight. In this case, many future
employees would be erroneously classified as short.
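The employee example can be reproduced in a few lines. This sketch simplifies the classes to "short" versus "not short" and assumes, purely for illustration, that the true cutoff for "short" is 66 inches; the naive learner and the heights are also invented.

```python
def true_label(height_in):
    """Assumed ground truth for this sketch: short means under 66 inches."""
    return "short" if height_in < 66 else "not short"

def learn_cutoff(training):
    """Naive learner: anything below the shortest 'not short' example is short."""
    return min(h for h, label in training if label == "not short")

# Tiny training database: only one entry under five feet eight (68 inches).
training = [(63, "short"), (68, "not short"), (70, "not short"), (74, "not short")]
cutoff = learn_cutoff(training)

def predict(height_in):
    return "short" if height_in < cutoff else "not short"

def error_rate(heights):
    """Share of heights whose predicted label differs from the assumed truth."""
    return sum(predict(h) != true_label(h) for h in heights) / len(heights)

train_error = error_rate([h for h, _ in training])
future_error = error_rate([64, 66, 67, 69, 72])
print(cutoff, train_error, future_error)
```

The model fits the training database perfectly, yet misclassifies future employees of 66 and 67 inches as short, which is the signature of overfitting.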
Major Issues in Data Mining
Outliers:
There are often many data entries that do not fit
nicely into the derived model.
If a model is developed that includes these
outliers, then the model may not behave well for
data that are not outliers.
Interpretation of results:
The data mining output may require experts to
correctly interpret the results.
Major Issues in Data Mining
Visualization of results:
To easily view and understand the output of
data mining algorithms, visualization of the
results is helpful
Large datasets:
Most real datasets are massive, while
many algorithms are designed for small datasets.
The cost of many modeling algorithms grows exponentially
with the dataset size, so they are too inefficient
for larger datasets.
This is the scalability problem.
Major Issues in Data Mining
High dimensionality:
Not all attributes may be needed to solve a given
data mining problem.
• In fact, the use of some attributes may interfere with the
correct completion of a data mining task.
• Some other attributes may simply increase the overall
complexity and decrease the efficiency of an algorithm.
This problem is referred to as the dimensionality
curse, meaning that there are many attributes
involved and it is difficult to determine which ones
should be used.
One solution to this high dimensionality problem is
to reduce the number of attributes, which is known
as dimensionality reduction. However, determining
which attributes are not needed is not always
easy.
Major Issues in Data Mining
Multimedia data:
Most previous data mining algorithms are targeted
to traditional data types (numeric, character, text,
etc.).
The use of multimedia data or GIS databases
complicates or invalidates many proposed
algorithms.
Missing data:
During the pre-processing phase of KDD, missing
data may be replaced with estimates.
This and other approaches to handling missing data
can lead to invalid results in the data mining step.
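A minimal sketch of replacing missing data with an estimate, assuming the estimate is the mean of the known values (one of several possible choices). The ages are made up. The example also shows one way the replacement can distort later results: the imputed data look less spread out than the values that were actually observed.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace each missing value (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    estimate = mean(known)
    return [estimate if v is None else v for v in values]

ages = [30, None, 50, 40, None]   # hypothetical attribute with missing entries
known_ages = [a for a in ages if a is not None]
filled = impute_mean(ages)

print(filled)
print(pstdev(known_ages), pstdev(filled))  # spread before and after imputation
```

Because every gap receives the same value, the standard deviation shrinks, so a mining step that relies on variability may draw conclusions the raw data do not support.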
Major Issues in Data Mining
Irrelevant data:
Some attributes in the database might not be of
interest to the data mining task being
developed. The question is how to identify them.
Noisy data:
Some attribute values might be invalid or
incorrect.
These values must be corrected before
running data mining applications.
Major Issues in Data Mining
Changing data:
Databases cannot be assumed to be static.
However, most data mining algorithms do assume a
static database.
This requires that the algorithm be completely rerun
anytime the database changes.
Integration:
The KDD process is not currently integrated into normal
data processing activities.
KDD requests may be treated as special, unusual, or
one-time needs.
This makes them inefficient, ineffective, and not general
enough to be used on an ongoing basis.
Integration of data mining functions into traditional
DBMS systems is certainly a desirable goal.
Major Issues in Data Mining
Application:
Determining the intended use for the
information obtained from the data mining
function is a challenge.
Indeed, how business executives can effectively
use the output is sometimes considered the more
difficult part, not the running of the algorithms
themselves.
Because the data are of a type that has not
previously been known, business practices may
have to be modified to determine how to
effectively use the information uncovered.
Data mining metrics
Measuring the effectiveness or usefulness of a
data mining approach is not straightforward.
In fact, different metrics could be used for
different techniques and also based on the
interest level.
From an overall business or usefulness
perspective, a measure such as return on
investment (ROI) could be used.
ROI examines the difference between what the data
mining technique costs and what the savings or
benefits from its use are.
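A short sketch of the ROI calculation. It assumes the common convention of expressing ROI as the net gain divided by the cost; the slide itself only speaks of the difference between costs and benefits, and the figures below are invented.

```python
def roi(benefit, cost):
    """Return on investment as a ratio: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost

# Hypothetical project: the technique costs 100,000 and yields 150,000 in savings.
campaign_roi = roi(benefit=150_000.0, cost=100_000.0)
print(campaign_roi)
```

A value above zero means the benefits exceeded what the data mining effort cost.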
Data mining metrics
But the return is hard to quantify.
It could be measured as increased sales, reduced
advertising expenditure, or both.
In a specific advertising campaign implemented
via targeted catalogue mailings, the percentage
of catalogue recipients and the amount of
purchase per recipient would provide one means
to measure the effectiveness of the mailing.
In a classroom setting, classification accuracy
is the most commonly used metric.
