
UNIT-1

Chapter-3
DATA MINING

 INTRODUCTION

 Data mining is the non-trivial process of identifying valid, novel, potentially useful,
and ultimately understandable patterns in data. With the extensive use of databases
and the explosive growth in their sizes, organizations are facing the problem of
information overload. The effective utilization of these massive volumes of data is
becoming a major problem for all enterprises. Data mining techniques support automatic
exploration of data. Data mining attempts to seek out patterns and trends in the data and
infer rules from these patterns. With these rules, the user will be able to support, review
and examine decisions in some related business or scientific area. This opens up the
possibility of a new way of interacting with databases and data warehouses.
 The evolution of data mining began when business data was first stored in computers and
technologies were developed to allow users to navigate through the data in real time. This
evolution is due to the support of three technologies that are sufficiently mature: massive
data collection, high performance computing and data mining algorithms.

 DATA MINING: DEFINITIONS

 Data mining, the extraction of hidden predictive information from large databases, is
a powerful new technology with great potential to analyze important information in the
data warehouse.
 Definitions: The term ‘data mining’ refers to the finding of relevant and useful
information from databases. Data mining and knowledge discovery in databases is a
new interdisciplinary field, merging ideas from statistics, machine learning, databases and
parallel computing. Researchers have defined the term ‘data mining’ in many ways.
Def-1:
“Data mining or knowledge discovery in databases, as it is also known, is the non-trivial
extraction of implicit, previously unknown and potentially useful information from the data.
This encompasses a number of technical approaches such as clustering, data summarization,
classification, finding dependency networks, analyzing changes and detecting anomalies.”
 Data retrieval attempts to retrieve data that is stored in the database and presents it to the user
in a way that the user can understand. It does not attempt to extract implicit information.
 One may argue that if we store ‘date-of birth’ as a field in the database and extract ‘age’ from
it, the information received from the database is not explicitly available. But all of us would
agree that the information is not ‘non-trivial’. On the other hand, if one attempts to find out
the average age of the employees in a particular company, it can be visualized as a sort of
non-trivial extraction of implicit information.
 Can we then say that extracting the average age of the employees of a department from the
employee database is a data mining task? It is definitely a type of data mining task, but at a
very low level. A higher level task would, for example, be to find correlations between the
average age and average income of individuals in an enterprise.
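To make the distinction concrete, here is a minimal Python sketch; the employee names and dates of birth are hypothetical. Deriving ‘age’ from ‘date-of-birth’ is trivial extraction, while aggregating it across employees is a (low-level) mining task:

```python
from datetime import date

# Hypothetical employee records storing only date-of-birth.
employees = [
    {"name": "Asha",  "dob": date(1985, 4, 12)},
    {"name": "Ravi",  "dob": date(1992, 11, 3)},
    {"name": "Meena", "dob": date(1978, 7, 25)},
]

def age(dob):
    # Deriving 'age' from 'date-of-birth' is trivial extraction of implicit data.
    today = date.today()
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

# Aggregating the derived field is a low-level data mining task.
average_age = sum(age(e["dob"]) for e in employees) / len(employees)
print(f"Average age: {average_age:.1f}")
```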
Def-2:
“Data mining is the search for the relationships and global patterns that exist in large
databases but are hidden among vast amounts of data, such as the relationship between
patient data and their medical diagnosis. This relationship represents valuable knowledge
about the database and the objects in the database, if the database is a faithful mirror of the
real world registered by the database.”
 Consider the employee database, and let us assume that we have a tool to determine the
relationship between fields, say the relationship between age and lunch patterns. Assume, for
example, that we find that most of the employees in their thirties like to eat pizzas, burgers or
Chinese food during their lunch break. Employees in their forties prefer to carry lunch from
their homes, and employees in their fifties take fruits and salads during lunch.
 If our tool finds this pattern from the database which records the lunch activities of all
employees for the last few months, then we can term our tool a data mining tool. Just by
examining the database, it is impossible to notice any relationship between age and lunch
patterns.
Def-3:
“Data mining refers to using a variety of techniques to identify nuggets of information or
decision making knowledge in the database and extracting these in such a way that they can be
put to use in areas such as decision support, prediction, forecasting and estimation. The data is
often voluminous, but it has low value and no direct use can be made of it. It is the hidden
information in the data that is useful.”
 Data mining is a process of finding value from volume. In any enterprise, the amount of
transactional data generated during its day-to-day operations is massive in volume.
 Data mining attempts to extract smaller pieces of valuable information from this massive
database.
Def-4:
“Discovering relations that connect variables in a database is the subject of data mining. The
data mining system self-learns from the previous history of the investigated system, formulating
and testing hypotheses about the rules which the system obeys. When concise and valuable knowledge
about the system of interest is discovered, it can and should be incorporated into some decision
support system which helps the manager to make wise and informed business decisions.”

Def-5:
“Data mining is the process of discovering meaningful new correlations, patterns and trends by
sifting through large amounts of data stored in repositories, using pattern recognition
techniques as well as statistical and mathematical techniques.”

 KDD vs. DATA MINING

 Knowledge Discovery in Database (KDD) was formalized in 1989 with reference to the
general concept of being broad and high level in the pursuit of seeking knowledge from data.
This high-level application technique is used to present and analyze data for decision-
makers.
 Data Mining is only one of the many steps involved in knowledge discovery in databases. The
various steps in the knowledge discovery process include data selection, data cleaning and
preprocessing, and the interpretation of the discovered knowledge.
 Knowledge Discovery in Databases is the process of identifying a valid, potentially useful
and ultimately understandable structure in data. This process involves selecting or
sampling data from a data warehouse, cleaning or preprocessing it, transforming or reducing
it, applying a data mining component to produce a structure, and then evaluating the derived
structure.
 Data Mining is a step in the KDD process concerned with the algorithmic means by which
patterns or structures are enumerated from the data under acceptable computational
efficiency limitations.
 Thus, the structures that are the outcomes of data mining must meet certain conditions so
that these can be considered as knowledge. These conditions are: validity,
understandability, utility, novelty and interestingness.
1. KDD is the overall process of extracting knowledge from data, whereas Data Mining is a
   step inside the KDD process which deals with identifying patterns in data.
2. KDD is the whole process of trying to make sense of data by developing appropriate
   methods or techniques, whereas data mining is the computing process of discovering
   patterns in large data sets involving methods at the intersection of machine learning,
   statistics, and database systems.
3. The goal of the KDD process is to extract knowledge from data in the context of large
   databases, whereas the goals of data mining are defined by the goal of the application,
   namely verification or discovery.
4. KDD is the overall process of converting raw data into useful information, whereas in
   data mining, algorithms are used to extract the information and patterns derived by the
   KDD process.

 STAGES OF KDD:

There are six stages/steps of KDD; they are:


 Data Selection
 Preprocessing
 Transformation
 Data Mining
 Interpretation & Evaluation
 Data Visualization

The stages of KDD, starting with the raw data and finishing with the extracted knowledge, are given
below:
 SELECTION:

o This stage is concerned with selecting the right data, that is, data relevant to some criteria.
o For example, for credit card customer profiling, we extract the types of transactions for each
type of customer, and we may not be interested in the details of the shop where the transaction
takes place.

 PREPROCESSING:

o Preprocessing is the data cleaning stage where unnecessary information is removed.

 TRANSFORMATION:

o The data is not merely transferred across, but transformed in order to be suitable for the task of
data mining.

 DATA MINING:

o This stage is concerned with the extraction of patterns from the data.

 INTERPRETATION AND EVALUATION:

o The patterns obtained in the data mining stage are converted into knowledge, which in turn, is
used to support decision making.

 DATA VISUALIZATION:

o Data visualization makes it possible for the analyst to gain a deeper, more intuitive
understanding of the data and as such can work well alongside data mining.
o Data visualization helps users to examine large volumes of data and detect the patterns
visually.
o Visual displays of data such as maps, charts and other representations allow data to be
presented compactly to the users.
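Read together, the stages form a pipeline. The following is a minimal sketch of that pipeline, assuming the pandas and scikit-learn libraries are available; the customer table, column names and values are invented for illustration:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical raw customer-transaction data.
raw = pd.DataFrame({
    "customer_id":   [1, 2, 3, 4, 5, 6],
    "age":           [23, 45, 31, None, 52, 38],
    "monthly_spend": [120.0, 340.5, 89.9, 410.0, 55.0, 230.0],
    "shop_name":     ["A", "B", "A", "C", "B", "C"],  # irrelevant detail
})

# SELECTION: keep only the attributes relevant to the task.
data = raw[["age", "monthly_spend"]]

# PREPROCESSING: clean the data (here, drop records with missing values).
data = data.dropna()

# TRANSFORMATION: rescale so both attributes are comparable.
X = StandardScaler().fit_transform(data)

# DATA MINING: extract a pattern (here, customer groups via clustering).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# INTERPRETATION & EVALUATION / VISUALIZATION: inspect the discovered groups.
print(data.assign(cluster=labels).groupby("cluster").mean())
```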

 DBMS vs DM

 A DBMS supports query languages, which are useful for query-driven data exploration,
whereas data mining supports automatic data exploration. If we know exactly what
information we are looking for, a DBMS query would be enough; whereas if we are not clear
about the possible correlations or patterns, then data mining techniques are useful.

 Three different ways/approaches in which data mining systems use an RDBMS:


i. 1st Approach:
 The majority of data mining systems do not use any DBMS and have their own memory
and storage management.
 They treat the database simply as a data repository (storehouse) from which data is
expected to be downloaded into their own memory structures, before the data mining
algorithm starts.
 Consequently, these systems forgo the field-proven technologies of DBMS, such as
recovery, concurrency, etc.

ii. 2nd Approach:

 In loosely coupled DBMS, DBMS is used only for storage and retrieval of data.
 For example, one can use a loosely-coupled SQL to fetch data records as required by
the mining algorithm.
 The applications use a SQL select statement to retrieve the set of records of interest
from the database.
 A loop in the application program copies records in the set one by one from the database
address space to the application address space, where computation is performed on
them. This loosely coupled approach does not use the full querying capability provided by the DBMS.
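A minimal sketch of this loosely coupled style, using Python's built-in sqlite3 module; the emp table and the computation performed on it are hypothetical:

```python
import sqlite3

# DBMS used only for storage and retrieval.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, age INTEGER, income REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Asha", 34, 52000.0), ("Ravi", 46, 61000.0), ("Meena", 29, 48000.0)])

# A SELECT statement retrieves the set of records of interest ...
cursor = conn.execute("SELECT age, income FROM emp")

# ... and a loop copies them one by one into the application address space,
# where all computation happens, ignoring the DBMS's querying capability.
total_age, n = 0, 0
for age, income in cursor:
    total_age += age
    n += 1
print("Average age computed in the application:", total_age / n)
```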

iii. 3rd Approach:

 In the tightly coupled approach, portions of the application programs are selectively
pushed to the database system to perform the necessary computation.
 Data are stored in the database and all processing is done at the database end.
 This is different from bringing the data from the database to the data mining area.
 Instead, the data mining application goes where the data naturally reside. This
avoids performance degradation and takes full advantage of database technology.
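By contrast, a sketch of the tightly coupled style over the same hypothetical emp table: the computation is pushed to the database end, so only the result crosses the address-space boundary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, age INTEGER, income REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Asha", 34, 52000.0), ("Ravi", 46, 61000.0), ("Meena", 29, 48000.0)])

# The aggregation runs inside the DBMS itself; only the single result,
# not the individual records, is copied to the application.
(avg_age,) = conn.execute("SELECT AVG(age) FROM emp").fetchone()
print("Average age computed by the database:", avg_age)
```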

1. A DBMS is sometimes just called a database manager, whereas data mining is also known
   as Knowledge Discovery in Data (KDD).
2. A DBMS is a collection of computer programs dedicated to the management (i.e.
   organization, storage and retrieval) of all databases installed in a system (i.e. hard drive
   or network), whereas Data Mining is a field of computer science which deals with the
   extraction of previously unknown and interesting information from raw data.
3. A DBMS is used only for storage and retrieval of data, whereas Data Mining is a step
   inside the KDD process which deals with identifying patterns in data.
 ISSUES AND CHALLENGES IN DM:

 Data mining systems depend on databases to supply the raw input, and this raises
problems, such as the fact that those databases tend to be dynamic, incomplete, noisy and large. Other
problems arise as a result of the inadequacy and irrelevance of the information stored.
 The difficulties in data mining can be categorized as:
o Limited information
o Noise or missing data
o User interaction and prior knowledge
o Uncertainty
o Size, updates and irrelevant fields

1. Limited information:
 A database is often designed for purposes other than data mining and, sometimes,
some attributes which are essential for knowledge discovery of the application domain
are not present in the data. Thus, it may be very difficult to discover significant
knowledge about a given domain.

2. Noise and missing data :


 Attributes that rely on subjective or measurement judgments can give rise to errors, such
that some examples may be misclassified.
 Missing data can be treated in a number of ways: simply disregarding missing values,
omitting the corresponding records, inferring missing values from known values, or
treating missing data as a special value to be included additionally in the attribute
domain (see the sketch below).
 The data should be cleaned so that it is free of errors and missing data.
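These treatments map directly onto common dataframe operations; a minimal sketch assuming the pandas library, with hypothetical columns:

```python
import pandas as pd

# Hypothetical data with missing values in both a numeric and a nominal column.
df = pd.DataFrame({"age": [34, None, 46, 29], "city": ["X", "Y", None, "X"]})

# Omit the corresponding records entirely.
dropped = df.dropna()

# Infer missing numeric values from known values (here, the column mean).
inferred = df.assign(age=df["age"].fillna(df["age"].mean()))

# Treat missing data as a special value added to the attribute domain.
special = df.fillna({"city": "UNKNOWN"})

print(dropped, inferred, special, sep="\n\n")
```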

3. User interaction and prior knowledge:


 An analyst is usually not a KDD expert but simply a person making use of the data by
means of the available KDD techniques.
 Since the KDD process is by definition interactive and iterative, it is challenging to
provide a high performance, rapid-response environment that also assists the users in
the proper selection and matching of the appropriate techniques, to achieve their goals.
 There needs to be more human-computer interaction, supporting both novice and expert
users, and less emphasis on total automation.
 The use of domain knowledge is important in all steps of the KDD process.
 It would be convenient to design a KDD tool which is both interactive and iterative.

4. Uncertainty:
 This refers to the severity of error and the degree of noise in the data. Data precision is
an important consideration in a discovery system.

5. Size, updates and irrelevant fields:


 Databases tend to be large and dynamic, in that their contents keep changing as
information is added, modified or removed.
 The problem with this, from the perspective of data mining, is how to ensure that the
rules are up-to-date and consistent with the most current information.
 OTHER RELATED AREAS:

 STATISTICS:
o Statistics is a theory-rich approach to data analysis.
o Statistics, with its solid theoretical foundation, generates results that can be difficult to interpret.
o Statistics is one of the foundational principles on which data mining technology is built.
o Statistical analysis systems are used by analysts to detect unusual patterns and explain
patterns using statistical models, such as linear models.

 MACHINE LEARNING:
 Machine learning is the automation of a learning process and learning is equivalent to the
construction of rules based on observations.
 This is a broad field which includes not only learning from examples but also
reinforcement learning, learning with a teacher, etc.
 A learning algorithm takes the data set and its additional information as the input and
returns a statement, e.g. a concept representing the results of learning, as output.
 Inductive learning, where the system infers knowledge itself from observing its
environment, has two main strategies: supervised learning and unsupervised learning.

 Supervised Learning:
o Supervised learning means learning from examples, where a training set is given
which acts as examples for the classes.
o The system finds a description of each class. Once the description has been
formulated, it is used to predict the class of previously unseen objects.
o “In supervised learning, the output datasets are provided which are used to train the
machine and get the desired outputs.”
o Supervised learning is where you have input variables (x) and an output variable (Y)
and you use an algorithm to learn the mapping function from the input to the
output.
Y = f(X)
o The goal is to approximate the mapping function so well that when you have new
input data (x) you can predict the output variables (Y) for that data.
o It is called supervised learning because the process of an algorithm learning from the
training dataset can be thought of as a teacher supervising the learning process.
o Supervised learning problems can be further grouped into regression and
classification problems.
o Classification: A classification problem is when the output variable is a category,
such as “red” or “blue” or “disease” and “no disease”.
o Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.
o Some popular examples of supervised machine learning algorithms are (a minimal sketch follows this list):
 Linear regression for regression problems.
 Random forest for classification and regression problems.
 Support vector machines for classification problems.
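A minimal supervised-learning sketch, assuming the scikit-learn library; the input variables (x), class labels (Y) and train/test split are hypothetical toy data:

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: input variables X with known output labels y.
X = [[25, 1], [40, 0], [35, 1], [50, 0], [23, 1], [48, 0], [31, 1], [55, 0]]
y = ["blue", "red", "blue", "red", "blue", "red", "blue", "red"]

# The labelled training set plays the role of the 'teacher'.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Learn an approximation of the mapping function Y = f(X) ...
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# ... and use it to predict the class of previously unseen objects.
print(model.predict(X_test))
print("accuracy:", model.score(X_test, y_test))
```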
 Unsupervised Learning:
o Unsupervised learning is learning from observation and discovery.
o In this mode of learning, there is no training set or prior knowledge of the classes.
The system analyzes the given set of data to observe similarities emerging out of the
subsets of the data.
o “In unsupervised learning no datasets are provided, instead the data is clustered into
different classes.”
o Unsupervised learning is where you only have input data (X) and no corresponding
output variables.
o The goal for unsupervised learning is to model the underlying structure or distribution
in the data in order to learn more about the data.
o This is called unsupervised learning because, unlike supervised learning above,
there are no correct answers and there is no teacher.
o Unsupervised learning problems can be further grouped into clustering and
association problems.
o Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
o Association: An association rule learning problem is where you want to discover
rules that describe large portions of your data, such as people that buy X also tend to
buy Y.
o Some popular examples of unsupervised learning algorithms are (a minimal sketch follows this list):
 k-means for clustering problems.
 Apriori algorithm for association rule learning problems.
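A minimal unsupervised-learning sketch using k-means, again assuming scikit-learn; the unlabelled input points are hypothetical:

```python
from sklearn.cluster import KMeans

# Only input data X; there are no output labels and no 'teacher'.
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural grouping
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]   # another natural grouping

# k-means discovers the inherent groupings in the data.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster labels:", km.labels_)
print("cluster centres:", km.cluster_centers_)
```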

 DM TECHNIQUES:
Researchers identify two goals of data mining: prediction and description.
 Prediction makes use of existing variables in the database in order to predict unknown or
future values of interest.
 Description focuses on finding patterns describing the data and their subsequent
presentation for user interpretation.
 The relative emphasis of both prediction and description differs with respect to the
underlying application and the technique.
 Two different classifications of DM techniques are:
 User-guided or verification-driven data mining, and
 Discovery-driven or automatic discovery of rules.

 Verification Model:
o In this process of data mining, the user makes a hypothesis and tests the hypothesis on
the data to verify its validity.
o The emphasis is on the user, who is responsible for formulating the hypothesis and
issuing the query on the data to affirm or negate the hypothesis.
 In a supermarket, for example, with a limited budget for a mailing campaign to launch a
new product, it is important to identify the section of the population most likely to buy
the new product (i.e. the demand for the new product). The user formulates a hypothesis to
identify potential customers and their common characteristics. Historical data about transactions
and demographic information can then be queried to reveal comparable purchases and the
characteristics shared by those purchasers. The whole operation can be repeated with
successive refinements of the hypothesis until the required limit is reached. Finally, the
user verifies the hypothesis against the database.

 Discovering Model:
 In the discovery model, the system automatically discovers important information hidden
in the data.
 The data is sifted in search of frequently occurring patterns, trends and generalizations
about the data, without intervention or guidance from the user.
 The discovery or data mining tools aim to reveal a large number of facts about the data
in as short a time as possible.
 For example, consider a bank database which is mined to discover the groups of
customers to target for a mailing campaign. The data is searched with no hypothesis in
mind, other than for the system to group the customers according to the common
characteristics found.
 The typical discovery-driven tasks are:
 Discovery of association rules
 Discovery of classification rules
 Clustering
 Discovery of frequent episodes
 Deviation detection

 Discovery of association rules:


 An association rule is an expression of the form X => Y, where X and Y are
sets of items. The intuitive meaning of such a rule is that the transactions of the
database which contain X tend to contain Y. Given a database, the goal is to
discover all the rules that have support and confidence greater than or equal
to the minimum support and confidence, respectively.
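Support and confidence for a rule X => Y can be computed directly from the transactions; a minimal pure-Python sketch over a hypothetical basket database:

```python
# Hypothetical transaction database (each transaction is a set of items).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "jam"},
]

def support_confidence(X, Y, db):
    n_x  = sum(1 for t in db if X <= t)          # transactions containing X
    n_xy = sum(1 for t in db if (X | Y) <= t)    # transactions containing X and Y
    support = n_xy / len(db)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence

# Rule {bread} => {butter}: do baskets containing bread tend to contain butter?
s, c = support_confidence({"bread"}, {"butter"}, transactions)
print(f"support = {s:.2f}, confidence = {c:.2f}")
```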
 Clustering:
 Clustering is a method of grouping data into different groups, so that the data
in each group share similar trends and patterns. Clustering constitutes a major
class of data mining algorithms.
 Clustering is a concept which appears in many disciplines. If a measure of similarity is
available, then there are a number of techniques for forming clusters (a simple such technique
is sketched after the list of objectives below). Another approach is to build set functions that
measure some particular property of groups.
 The objectives of clustering are:
i. To uncover natural groupings
ii. To initiate hypothesis about the data
iii. To find consistent and valid organization of the data.
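A minimal pure-Python sketch of one simple similarity-based technique, greedy threshold grouping; the points and the distance threshold are hypothetical:

```python
import math

def cluster_by_similarity(points, threshold):
    """Greedy single-pass grouping: a point joins the first cluster whose
    seed point is within `threshold`, otherwise it starts a new cluster."""
    clusters = []  # list of (seed, members) pairs
    for p in points:
        for seed, members in clusters:
            if math.dist(p, seed) <= threshold:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return [members for _, members in clusters]

points = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.1), (7.9, 8.3), (1.1, 1.1)]
print(cluster_by_similarity(points, threshold=1.0))
```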

 Discovery of classification rules:


 Classification involves finding rules that partition the data into disjoint groups.
The input for classification is the training data set, whose class labels are
already known. Classification analyzes the training data set, constructs a
model based on the class labels, and aims to assign a class label to future
unlabelled records. Since the class field is known, this type of classification is
known as supervised learning. A set of classification rules is generated by such
a classification process, which can be used to classify future data and develop a
better understanding of each class in the database.
 There are several classification discovery models; they are decision trees, neural
networks, genetic algorithms and statistical models like linear/geometric discriminants.
The applications include credit card analysis, banking, medical applications and the
like. Consider the following example. The domestic flights in our country were at one
time operated only by Indian Airlines. Recently, many other private airlines began their
operations for domestic travel. Some of the customers of Indian Airlines started flying
with these private airlines and, as a result, Indian Airlines lost these customers. Let us
assume that Indian Airlines wants to understand why some customers remain loyal while
others leave. Ultimately, the airline wants to predict which customers it is most likely to
lose to its competitors. Their aim is to build a model based on the historical data of both
loyal customers and customers who have left. This becomes a classification problem.
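A minimal sketch of such a churn classification using a decision tree, assuming scikit-learn; the customer attributes, values and labels here are entirely hypothetical:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical historical data: [flights_per_year, avg_fare_paid, years_as_customer]
X = [[12, 4500, 8], [2, 3200, 1], [15, 5100, 10], [3, 2900, 2],
     [10, 4800, 7], [1, 3000, 1], [14, 5300, 9], [2, 3100, 1]]
y = ["loyal", "left", "loyal", "left", "loyal", "left", "loyal", "left"]

# Build a model from the labelled training data (supervised learning) ...
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# ... the learned rules partition the data into disjoint groups ...
print(export_text(tree, feature_names=["flights", "fare", "years"]))

# ... and can be used to predict which future customers are likely to leave.
print(tree.predict([[4, 3300, 2]]))
```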

 DM APPLICATION AREAS:

1. BUSINESS AND E-COMMERCE DATA


o This is a major source category of data for data mining applications. Back-office, front-
office, and network applications produce large amounts of data about business processes.
Using this data for effective decision making remains a fundamental challenge.
 BUSINESS TRANSACTIONS:
o Modern business processes are consolidating, with millions of customers and billions
of transactions. Business enterprises require necessary information for their
effective functioning in today's competitive world.
 ELECTRONIC COMMERCE:
o Not only does electronic commerce produce large data sets in which the analysis of
marketing patterns and risk patterns is critical, but it is also important to do this in
near-real time, in order to meet the demands of online transactions.

2. SCIENTIFIC, ENGINEERING AND HEALTH CARE DATA


o Scientific data and metadata tend to be more complex in structure than business data.
In addition, scientists and engineers are making increasing use of simulation and
systems with application domain knowledge.

 GENOMIC DATA:
o Genomic sequencing and mapping efforts have produced a number of databases
which are accessible on the web. In addition, there are also a wide variety of other
online databases. Finding relationships between these data sources is another
fundamental challenge for data mining.
 SENSOR DATA:
o Remote sensing data is another source of voluminous data. Remote sensing satellites
and a variety of other sensors produce large amounts of geo-referenced data. A
fundamental challenge is to understand the relationships, including causal
relationships, amongst this data.

 SIMULATION DATA:
o Simulation is now accepted as an important mode of science, supplementing theory
and experiment. Today, not only do experiments produce huge data sets, but so do
simulations. Data mining and, more generally, data intensive computing is proving to
be a critical link between theory, simulation, and experiment.

 HEALTH CARE DATA:


o Hospitals, health care organizations, insurance companies, and the concerned
government agencies accumulate large collections of data about patients and health
care-related details.

 WEB DATA:
o The data on the web is growing not only in volume but also in complexity. Web data
now includes not only text, audio and video material, but also streaming data and
numerical data.

 MULTIMEDIA DOCUMENTS:
o Today's technology for retrieving multimedia items on the web is far from
satisfactory. On the other hand, an increasingly large number of materials are on the
web, and the number of users is also growing explosively. It is becoming harder to
extract meaningful information from the archives of multimedia data as the volume
grows.

 DM APPLICATIONS:

 There is a wide range of well-established business applications for data mining. These
include customer attrition, profiling, promotion forecasting, product cross-selling, fraud
detection, targeted marketing, propensity analysis, credit scoring, risk analysis, etc. We
shall now discuss a few mock case-studies and areas of DM applications.

 HOUSING LOAN PREPAYMENT PREDICTION:


o A home-finance loan actually has an average life-span of only 7 to 10 years, due to
prepayment. Prepayment means that the loan is paid off early, rather than at the end
of, say, 25 years. People prepay loans when they refinance or when they sell their
home. The financial return that a home-finance institution derives from a loan
depends on its life-span. Therefore, it is necessary for the financial institutions to be
able to predict the life-spans of their loans. Rule discovery techniques can be used to
accurately predict the aggregate number of loan prepayments in a given quarter (or,
in a year), as a function of prevailing interest rates, borrower characteristics, and
account data. This information can be used to fine-tune loan parameters such as
interest rates, points, and fees, in order to maximize profits.
 MORTGAGE LOAN DELINQUENCY PREDICTION:
o Loan defaults usually entail expenses and losses for the banks and other lending
institutions. Data mining techniques can be used to predict whether or not a loan
would go delinquent within the succeeding 12 months, based on historical data, on
account information, borrower demographics, and economic indicators. The rules can
be used to estimate and fine-tune loan loss reserves and to gain some business
insight into the characteristics and circumstances of delinquent loans. This will also
help in deciding the funds that should be kept aside to handle bad loans.

 CRIME DETECTION:
o Crime detection is another area one might immediately associate with data mining.
Let us consider a specific case: to find patterns in 'bogus official' burglaries. A
typical example of this kind of crime is when someone turns up at the door
pretending to be from the water board, electricity board, telephone department or gas
company. At the same time as they distract the householder, their partners will search
the premises and steal cash and items of value. Victims of this sort of crime tend to
be the elderly. These cases have no obvious leads, and data mining techniques may
help in providing some unexpected connections to known perpetrators.

 STORE-LEVEL FRUITS PURCHASING PREDICTION:


o A supermarket chain called 'Fruit World' sells fruits of different types and it
purchases these fruits from the wholesale suppliers on a day-to-day basis. The
problem is to analyze fruit-buying patterns, using large volumes of data captured at
the 'basket' level. Because fruits have a short shelf-life, it is important that accurate
store-level purchasing predictions should be made to ensure optimum freshness and
availability.

END OF UNIT-1
