Unit 1 DM
Unit-1
Syllabus
Data Mining:
Data–Types of Data–, Data Mining Functionalities–
Interestingness Patterns–Classification of Data Mining
systems– Data mining Task primitives –Integration of Data
mining system with a Data warehouse–Major issues in Data
Mining–Data Preprocessing.
What is Data Mining?
✔Extracting previously unknown, potentially useful information from large collections of data.
Characteristics of Data Mining:
Non-Trivial: the extraction should involve a non-trivial process; the patterns should not be obvious from the raw data.
Novel: the discovered patterns should be new, i.e., not already known to the user.
Useful: the information retrieved should be useful for decision making.
When mining relational databases, we can go further by searching for trends or data
patterns.
For example, data mining systems can analyze customer data to predict the credit risk of new customers based on their
income, age, and previous credit information.
Data mining systems may also detect deviations—that is, items with sales that are far from those expected in comparison
with the previous year.
2. Data Warehouse Data
✔A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a
single site.
✔Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data
refreshing.
✔A data warehouse is usually modeled by a multidimensional data structure, called a data cube, in which each dimension corresponds to an
attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure such as count or sum(sales
amount). A data cube provides a multidimensional view of data and allows the precomputation and fast access of summarized data.
✔Data are organized around major subjects
e.g. customer, item, supplier and activity.
✔Provide information from a historical perspective
e.g. from the past 5 – 10 years
✔Typically summarized to a higher level
e.g. a summary of the transactions per item type for each store
✔User can perform drill-down or roll-up operation to view the data at
different degrees of summarization.
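The roll-up operation described above can be sketched in plain Python. The fact rows, the city-to-country hierarchy, and the `roll_up` helper below are hypothetical illustrations, not part of any warehouse API:

```python
from collections import defaultdict

# Fact rows at the city level: (city, item_type, sales_amount).
facts = [
    ("Chennai", "phone", 100), ("Chennai", "laptop", 250),
    ("Mumbai",  "phone", 150), ("Mumbai",  "laptop", 300),
]

# Concept hierarchy on the location dimension: city -> country.
city_to_country = {"Chennai": "India", "Mumbai": "India"}

def roll_up(rows):
    """Aggregate sales from the city level up to the country level."""
    totals = defaultdict(int)
    for city, item, amount in rows:
        totals[(city_to_country[city], item)] += amount
    return dict(totals)

print(roll_up(facts))  # {('India', 'phone'): 250, ('India', 'laptop'): 550}
```

Drill-down is the inverse direction: returning from country totals to the finer-grained city rows.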
3. Transactional Data
✔In general, each record in a transactional database captures a transaction, such as a customer’s purchase, a flight
booking, or a user’s clicks on a web page.
✔A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the
transaction, such as the items purchased in the transaction.
✔A transactional database may have additional tables, which contain other information related to the transactions,
such as item description, information about the salesperson or the branch, and so on.
4. Other Kinds of Data
✔Time-related or sequence data (e.g., historical records, stock exchange data, and time-series and biological sequence data)
✔Data streams (e.g., video surveillance and sensor data, which are continuously transmitted)
✔Engineering design data (e.g., the design of buildings, system components, or integrated circuits)
✔Hypertext and multimedia data (including text, image, video, and audio data)
✔The Web (a huge, widely distributed information repository made available by the Internet)
A typical DM System Architecture
What makes a pattern interesting? A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, and novel.
Can a data mining system generate all of the interesting patterns? This refers to the completeness of a data mining algorithm; it is often unrealistic to generate every possible pattern, so user-provided constraints and interestingness measures should be used to focus the search.
Can a data mining system generate only interesting patterns? This is an optimization problem: it is highly desirable for data mining systems to generate only interesting patterns, so that users need not sift through all the mined patterns to find the useful ones.
✔Data mining systems can also be categorized according to the applications they are adapted to.
✔For example, data mining systems may be tailored specifically for finance, telecommunications, the stock market, and so on.
✔The Interestingness measures are used to separate interesting and uninteresting patterns from the knowledge.
✔They may be used to guide the mining process or, after discovery, to evaluate the discovered patterns. Different kinds of knowledge may have different interestingness measures.
For example, interestingness measures for association rules include support and confidence.
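As a minimal sketch, support and confidence for a rule such as {bread} → {butter} can be computed directly; the transactions and helper names below are hypothetical illustrations:

```python
# Hypothetical market-basket transactions, each a set of items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs, db) / support(lhs, db)

print(support({"bread", "butter"}, transactions))       # 0.5
print(confidence({"bread"}, {"butter"}, transactions))  # ~0.667
```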
✔This refers to the form in which discovered patterns are to be displayed. Users can choose from different forms for
knowledge presentation.
No Coupling
❑No coupling means that a data mining system will not use any facility of a database or data warehouse system.
❑It may fetch data from a particular source (such as a file system), process data using some data mining algorithms, and then store the mining results in another file.
Drawbacks of No Coupling
❖First, without using a Database/Data Warehouse system, a Data Mining system may spend a substantial amount of time finding, collecting, cleaning, and transforming data.
❖Second, there are many tested, scalable algorithms and data structures implemented in Database and Data Warehouse systems; without coupling, a Data Mining system cannot take advantage of them.
Loose Coupling
❑In this Loose coupling, the data mining system uses some facilities / services of a database or data warehouse
system. The data is fetched from a data repository managed by these (DB/DW) systems.
❑Data mining approaches are used to process the data, and the processed data is then saved either in a file or in a designated place in a database or data warehouse.
❑Loose coupling is better than no coupling because it can fetch any portion of data stored in Databases or Data Warehouses.
❑It is difficult for loose coupling to achieve high scalability and good performance with large data sets.
Semi-Tight Coupling
✔Semi tight coupling means that besides linking a Data Mining system to a Data Base/Data Warehouse system, efficient
implementations of a few essential data mining primitives can be provided in the DB/DW system.
✔These primitives can include sorting, indexing, aggregation, histogram analysis, multi way join, and pre-computation of
some essential statistical measures, such as sum, count, max, min, standard deviation.
Advantage of Semi-Tight Coupling
This Coupling will enhance the performance of Data Mining systems
Tight Coupling
Tight coupling means that a Data Mining system is smoothly integrated into the Data Base/Data Warehouse system.
The data mining subsystem is treated as one functional component of the information system. Data mining queries and
functions are optimized based on mining query analysis, data structures, indexing schemes, and query processing
methods of a DB or DW system.
Major Issues in Data Mining
Data Reduction
✔Data reduction is a process used in data processing and analysis to reduce the amount of data without significantly affecting its integrity
or quality. The goal is to simplify or compress the dataset to make it easier to store, process, and analyze while retaining the essential
information.
✔Mining on the reduced data set should be more efficient yet to produce same analytical results.
1. Data Compression
2. Dimensionality Reduction
3. Numerosity Reduction
Dimensionality Reduction
->The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
->Dimensionality reduction can be defined as a way of converting a higher-dimensional dataset into a lower-dimensional dataset that provides similar information.
Ex: speech recognition, signal processing, bioinformatics, etc. It can also be used for data
visualization, noise reduction, cluster analysis, etc.
Wavelet Transformation
● The signal is represented by wavelets, which are small, oscillating functions that capture both time and
frequency information.
● The discrete wavelet transform decomposes a signal into a set of basis functions; these basis functions are called wavelets.
● The data vector X is transformed into a numerically different vector, Xo, of wavelet coefficients when the
DWT is applied.
● The two vectors X and Xo must be of the same length. When applying this technique to data reduction, we consider an n-dimensional data tuple X = (x1, x2, …, xn), where n is the number of attributes present in the data.
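One level of the Haar DWT (the simplest wavelet) can be sketched in a few lines. This is an illustrative implementation, assuming an even-length input vector; note the output has the same length as the input, as required above:

```python
import math

def haar_dwt(x):
    """One level of the Haar DWT; len(x) must be even."""
    approx = [(a + b) / math.sqrt(2) for a, b in zip(x[::2], x[1::2])]
    detail = [(a - b) / math.sqrt(2) for a, b in zip(x[::2], x[1::2])]
    return approx + detail

def haar_idwt(c):
    """Inverse of one Haar level: reconstructs the original vector."""
    half = len(c) // 2
    out = []
    for s, d in zip(c[:half], c[half:]):
        out.append((s + d) / math.sqrt(2))
        out.append((s - d) / math.sqrt(2))
    return out

x = [4.0, 6.0, 10.0, 12.0]
coeffs = haar_dwt(x)
# For data reduction, small detail coefficients (the second half) could be
# truncated to zero; keeping them all makes the reconstruction exact.
```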
Principal Component Analysis (PCA)
Covariance Matrix: Calculate the covariance matrix to understand how variables are related.
Eigenvectors and Eigenvalues: Identify the principal components (eigenvectors) and the amount of variance they capture (eigenvalues).
Project Data: Transform the original data into the new principal components.
Applications:
● Data compression: Reduce the number of features while keeping essential information.
● Visualization: PCA can reduce complex datasets (e.g., 10 features) to 2 or 3 components for easy visualization.
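The three steps above can be sketched for the two-variable case, where the 2×2 covariance matrix has a closed-form eigendecomposition. The sample values are hypothetical:

```python
import math

# Hypothetical two-variable sample.
xs = [2.5, 0.5, 2.2, 1.9, 3.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with the n-1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

# Step 1: covariance matrix [[sxx, sxy], [sxy, syy]].
sxx, syy, sxy = cov(xs, xs), cov(ys, ys), cov(xs, ys)

# Step 2: larger eigenvalue of the symmetric 2x2 matrix (the variance
# captured by the first principal component) and its unit eigenvector.
lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
vx, vy = sxy, lam - sxx           # solves (M - lam*I) v = 0 when sxy != 0
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Step 3: project the centered data onto the first component (1-D output).
mx, my = mean(xs), mean(ys)
projected = [(x - mx) * vx + (y - my) * vy for x, y in zip(xs, ys)]
```

The variance of `projected` equals the eigenvalue `lam`, which is exactly what "the amount of variance captured" means in the steps above.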
Numerosity Reduction
● It is the technique to replace the original data by alternative smaller forms of data representation.
Types:
1. Parametric
2. Non-Parametric
1. Parametric
This method assumes a model into which the data fits. Data model parameters are estimated, and only those parameters need to be stored instead of the actual data.
1. Regression
2. Log-Linear Model
Regression: Regression can be a simple linear regression or multiple linear regression. When there is only a
single independent attribute, such a regression model is called simple linear regression. If there are
multiple independent attributes, then such regression models are called multiple linear regression.
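A sketch of simple linear regression used as parametric numerosity reduction: only the slope w and intercept b are stored in place of the raw data. The observations below are hypothetical:

```python
# Hypothetical (x, y) observations that are roughly linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Least-squares estimates of slope and intercept.
w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
b = my - w * mx

def predict(x):
    """The stored (w, b) model replaces the raw data for estimation."""
    return w * x + b

print(round(w, 3), round(b, 3))  # roughly 1.99 and 0.09
```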
Log-Linear Model: The log-linear model discovers the relationship between two or more discrete attributes and estimates the probability of each point in a multidimensional space from a smaller set of lower-dimensional combinations.
2. Non-Parametric
A non-parametric numerosity reduction technique does not assume any
model.
1. Histogram
2. Clustering
3. Sampling
Bottom-up Discretization -
✔Starts by considering all of the continuous values as potential split-points.
✔Removes some by merging neighborhood values to form intervals, and then recursively applies this process to the resulting
intervals.
Concept Hierarchies
✔Discretization can be performed rapidly on an attribute to provide a hierarchical partitioning of the attribute values,
known as a Concept Hierarchy.
✔Concept hierarchies can be used to reduce the data by collecting and replacing low-level concepts with higher-level
concepts.
✔In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple
levels of abstraction defined by concept hierarchies.
✔This organization provides users with the flexibility to view data from different perspectives.
✔Data mining on a reduced data set means fewer input and output operations and is more efficient than mining on a
larger data set.
✔Because of these benefits, discretization techniques and concept hierarchies are typically applied before data
mining, rather than during mining.
Discretization and Concept Hierarchy Generation for Numerical Data
1. Binning
2. Histogram Analysis
3. Cluster Analysis
1] Binning
❑Binning is a top-down splitting technique based on a specified number of bins.
❑Binning is an unsupervised discretization technique because it does not use class information.
❑The sorted values are distributed into a number of buckets, or bins, and each bin value is then replaced by the bin mean or median.
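The equal-depth variant of this smoothing can be sketched as below. The price list is a hypothetical example, and the helper assumes the number of values divides evenly into the bins:

```python
# Hypothetical sorted prices.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]

def smooth_by_bin_means(values, n_bins):
    """Equal-depth binning; assumes len(values) is a multiple of n_bins."""
    values = sorted(values)
    depth = len(values) // n_bins
    smoothed = []
    for i in range(0, len(values), depth):
        bin_ = values[i:i + depth]
        bin_mean = sum(bin_) / len(bin_)
        smoothed.extend([bin_mean] * len(bin_))
    return smoothed

print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```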
2] Histogram Analysis
✔It is an unsupervised discretization technique because histogram analysis does not use class information.
✔Histograms partition the values for an attribute into disjoint ranges called buckets.
It is also further classified into
Equal-width histogram
Equal frequency histogram
The histogram analysis algorithm can be applied recursively to each partition to automatically generate a multilevel concept hierarchy,
with the procedure terminating once a pre-specified number of concept levels has been reached.
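An equal-width partition of an attribute into disjoint buckets can be sketched as follows; the helper name and sample data are illustrative:

```python
def equal_width_buckets(values, n_buckets):
    """Count how many values fall into each of n equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)  # clamp the maximum
        counts[idx] += 1
    return counts

data = [1, 2, 2, 3, 5, 6, 8, 9, 10]
print(equal_width_buckets(data, 3))  # buckets [1,4), [4,7), [7,10] -> [4, 2, 3]
```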
3] Cluster Analysis
✔Cluster analysis is a popular data discretization method.
✔A clustering algorithm can be applied to discretize a numerical attribute of A by partitioning the values of A into clusters or groups.
✔Clustering considers the distribution of A, as well as the closeness of data points, and therefore can produce high-quality discretization
results.
✔Each initial cluster or partition may be further decomposed into several subclusters, forming a lower level of the hierarchy.
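As a sketch of cluster-based discretization, a tiny 1-D k-means (illustrative only, not a production algorithm) can group an attribute's values; the midpoint between adjacent cluster centers then serves as an interval cut point:

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means: returns the final centers and value clusters."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]  # spread seeds
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
# Midpoint between adjacent centers = a natural discretization cut point.
cut_point = (min(centers) + max(centers)) / 2
print(centers, cut_point)  # [2.0, 11.0] 6.5
```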
4. Discretization by Intuitive Partitioning
✔Numerical ranges are partitioned into relatively uniform, easy-to-read intervals that appear intuitive or “natural.”
✔The 3-4-5 rule can be used to segment numerical data into relatively uniform, natural-seeming intervals.
✔In general, the rule partitions a given range of data into 3, 4, or 5 relatively equal-width intervals, recursively and level
by level, based on the value range at the most significant digit.
The rule is as follows:
✔If an interval covers 3, 6, 7, or 9 distinct values at the most significant digit, then partition the range into 3 equal-width intervals.
✔If it covers 2, 4, or 8 distinct values at the most significant digit, then partition the range into 4 equal-width intervals.
✔If it covers 1, 5, or 10 distinct values at the most significant digit, then partition the range into 5 equal-width intervals.
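The choice of 3, 4, or 5 intervals above can be sketched as a lookup on the distinct-value count at the most significant digit. This helper assumes a positive integer range, and its names are illustrative:

```python
def n_intervals_345(low, high):
    """Number of intervals chosen by the 3-4-5 rule for an integer range."""
    span = high - low
    msd_unit = 10 ** (len(str(span)) - 1)  # assumes span is a positive integer
    distinct = round(span / msd_unit)      # distinct values at the msd
    if distinct in (3, 6, 7, 9):
        return 3
    if distinct in (2, 4, 8):
        return 4
    if distinct in (1, 5, 10):
        return 5
    return 3  # fallback for counts outside the rule's table

def partition_345(low, high):
    """Split [low, high] into the equal-width intervals chosen above."""
    n = n_intervals_345(low, high)
    width = (high - low) / n
    return [(low + i * width, low + (i + 1) * width) for i in range(n)]

print(partition_345(0, 1000))  # five $200-wide intervals
```

For a range of $0–$1000, the span covers 1 distinct value at the most significant digit, so the rule yields 5 intervals; for $0–$900 it covers 9, yielding 3 intervals.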
Concept Hierarchy Generation for Nominal Data(Categorical Data )
Categorical data are discrete data.
• Categorical attributes have a finite (but possibly large) number of distinct values, with no ordering among the values.
• Examples include geographic location, job category, and item type.
i) Specification of a partial ordering of attributes explicitly at the schema level by users or experts