
Unit 3

Data Mining: Data mining refers to extracting or mining knowledge from
large amounts of data (big data → useful data).
Put simply, it is the process of discovering knowledge from large amounts
of data.
The information or knowledge extracted through mining can be used for:
1. market analysis,
2. fraud detection,
3. production control,
4. science exploration.
Data mining is also called knowledge discovery in databases (KDD).
The knowledge discovery process includes data cleaning, data integration,
data selection, data transformation, data mining, and knowledge
representation.
Aim of data mining: The primary aim of data mining is to discover hidden
patterns and relationships in the data that can be used to make informed
decisions or predictions.


Kinds of data to be mined:

1. Relational Database,
2. Spatial Database,
3. Flat Files,
4. Time Series Database,
5. Transactional Database,
6. WWW,
7. Data Warehouse,
8. Multidimensional Database
1. Flat File: These are data files in text or binary form with a structure
that can be easily extracted by data mining algorithms. The data stored
in a flat file has no relationships or paths between records. A common
example is a CSV file.
2. Relational Database: It is defined as a collection of data organized in
tables with rows and columns. The physical schema of a relational
database defines the structure of the tables, while the logical schema
defines the relationships between tables. It is queried with SQL and used
in systems such as Oracle Database.
3. Transaction Database: It is a database collection organized by
timestamps, dates, and transactions. It follows the ACID properties of a
DBMS. Each transaction has a unique transaction ID and related fields
such as a seller ID. Applications: banking, online purchases.
4. Data Warehouse: It is a repository of data integrated from multiple
sources, cleaned and organized so that it can be queried and analysed.
5. WWW: It is a collection of documents and resources such as audio,
video, and text.
Applications: online shopping, job search.
6. Time Series: It contains stock exchange data and user-logged
activities. It requires real-time analysis.
7. Spatial Database: Stores geographical information; the data is stored
in the form of coordinates, lines, and polygons.
8. Multimedia: It consists of audio, video, image, and text media. These
can be stored in an object-oriented database.


Data Mining Functionalities

Data mining functionalities are used to specify the kinds of patterns to be
found in data mining tasks.
In general, data mining tasks can be divided into two categories:
1. descriptive and
2. predictive.
1. Descriptive mining. It involves analysing the data in a database to
understand its general properties and characteristics. It aims to provide a
comprehensive overview of the data without making any predictions about
future outcomes.
2. Predictive mining. This mining focuses on using the current data to make
predictions about future outcomes.
Mining functionalities
1. Class description.
• It is used to associate data with a class or concept.
• Characterization. It is one of the methods used in class description.
This method helps to connect data with a certain set of customers by
summarizing their general features.
• Discrimination. It is used to compare the characteristics of two
different classes of customers.
• Example. One of the best examples is the release of the same model
of mobile phone in different variants. This helps companies
satisfy the needs of different customer segments.
2. Frequent patterns. These are patterns that occur frequently in data.
There are many kinds of frequent patterns, including frequent itemsets,
frequent subsequences, and frequent substructures.
3. Classification. It is one of the most important data mining
functionalities; it builds models to predict the trend in the available data.
This method uses if-then rules, decision trees, mathematical
formulae, and neural networks to build the model.
4. Prediction. Finding missing data in a database is very important for the
accuracy of the analysis.
It is one of the most popular data mining functionalities: determining a
missing or unknown element in a data set. Linear regression models
based on previous data are used to make numeric predictions, which
help businesses forecast the outcome of a given event, positive or
negative.
There are two types of predictions:


a. Numeric prediction: predicts a missing or unknown numeric element in
a data set.
b. Class prediction: predicts the class label using a previously built
class model.
5. Cluster analysis. This data mining functionality is similar to
classification, but here the class label is unknown. In cluster analysis,
similar data objects are grouped into a cluster, and there are large
differences between one cluster and another. It is applied in fields like
machine learning, image processing, and pattern recognition.
6. Outlier analysis. It is used to handle data that does not fall under any
class. Data that has no similarity with the attributes of other classes is
called an outlier. Such occurrences are considered to be noise or
exceptions.
7. Association analysis. It is also called market basket analysis. Association
analysis helps to find relationships between elements that frequently occur
together. It is widely used in retail sales. It is a way of discovering the
relationship between various items.
For example, suppose we want to know which items are frequently
purchased together, say in the rule buys(X, computer) ⇒
buys(X, software), where X is a variable representing a customer. A
confidence of 50% means that if a customer buys a computer, there
is a 50% chance that they will buy software as well. A support of 1%
means that 1% of all transactions under analysis show that
computer and software are purchased together. (A worked sketch
follows this list.)
8. Correlation analysis. It is a mathematical technique that can show
whether and how strongly a pair of attributes are related to each other.
For example, taller people tend to have greater weight, so height and
weight are positively correlated. (See the second sketch below.)
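The support and confidence figures in the association example can be
computed directly from a transaction list. Below is a minimal sketch in
Python, assuming a small hypothetical set of transactions; the item names
and counts are illustrative, not from the source.

```python
# Sketch: computing support and confidence for the rule
# buys(X, computer) => buys(X, software), on toy transactions.

transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer", "paper"},
    {"computer", "software", "printer"},
    {"paper"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"computer", "software"} <= t)
computer = sum(1 for t in transactions if "computer" in t)

support = both / n            # fraction of all transactions containing both items
confidence = both / computer  # of those buying a computer, fraction also buying software

print(f"support = {support:.0%}, confidence = {confidence:.0%}")
# support = 40%, confidence = 67%
```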
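For correlation analysis, the strength of the height-weight relationship can
be measured with the Pearson coefficient. A minimal sketch with made-up
numbers; a real analysis would use a statistics library:

```python
import math

# Sketch: Pearson correlation between two attributes (hypothetical data).
height = [150, 160, 165, 170, 180]   # cm
weight = [50, 56, 63, 66, 74]        # kg

n = len(height)
mean_h = sum(height) / n
mean_w = sum(weight) / n

cov = sum((h - mean_h) * (w - mean_w) for h, w in zip(height, weight))
std_h = math.sqrt(sum((h - mean_h) ** 2 for h in height))
std_w = math.sqrt(sum((w - mean_w) ** 2 for w in weight))

r = cov / (std_h * std_w)      # +1: strong positive, 0: none, -1: strong negative
print(f"Pearson r = {r:.3f}")  # close to +1 here: taller people weigh more
```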


Technologies used in data mining:

a) Statistics
b) Machine learning
c) Data warehouse and database
d) Visualization
e) Information retrieval
f) Pattern recognition
g) Algorithms
1. Statistics. It is used for the collection, analysis, explanation, and
presentation of data. Data mining has an inherent connection with
statistics. A statistical model is used for data classes and data modeling;
it describes the behavior of the objects in a class and their probabilities.
Advantages.
It can be used to model noise and missing data values.
It is used for pattern mining.
Disadvantages.
When it is used on large data sets, it increases complexity and cost.
2. Machine Learning. It describes how a computer can learn based on data.
It is a fast-growing discipline which researches how computers
automatically learn from the given input data and make intelligent
decisions.
It has three types: supervised, unsupervised, and semi-supervised.
1) Supervised. It is a synonym of classification. It uses class labels
to predict information.
2) Unsupervised. It is a synonym of clustering. It does not use class
labels to predict information; instead, it discovers new classes within
the data.
3) Semi-supervised. It is a class of machine learning techniques
which uses both labeled and unlabeled data.
3. Database Systems and Data Warehouses. Database systems contribute
query languages, query processing, optimization, and data models. Data
warehousing combines data from multiple sources and various time
frames. It provides OLAP facilities in multi-dimensional databases to
promote multi-dimensional data mining, and it maintains both recent and
historical data.
4. Information retrieval. This technique searches for information in
documents, which may be text or multimedia, or may reside on the web. It
has two features:
1. the searched data is unstructured; 2. the queries are formed by
keywords and don't have complex structures.

5. Pattern Recognition. Patterns are everywhere in the physical world.

A pattern can either be seen physically or be observed
mathematically by applying algorithms; examples include the colours of
clothes, speech patterns, etc. Pattern recognition is the process of
recognizing patterns by using a machine learning algorithm.
6. Algorithms. An algorithm is a set of heuristics and calculations that
creates a model from data.
7. Visualization. It is the process of converting data and information into a
graphical form.
Applications of Data Mining:
1. Healthcare,
2. Market Basket Analysis,
3. Education,
4. Financial Banking,
5. Fraud Detection,
6. Customer Segmentation,
7. Risk Management and Lie Detection,
8. Manufacturing.
1. Healthcare: Data mining helps in improving the quality of the healthcare
system. It is used in healthcare to analyse patient data, identify
risk factors, and develop personalised treatment plans.
2. Market Basket Analysis: It involves the analysis of customer purchase
data to identify patterns and trends in customer behaviour.
3. Education: Data mining is used in education to analyse student
performance data and identify trends and patterns in student behaviour.
4. Finance and Banking: Finance and banking involve the management of
money and investments. Data mining is important in finance and banking
because it can help banks to identify fraud, recognise behaviour patterns,
and analyse customer behaviour.
5. Fraud Detection: Fraud detection involves identifying fraudulent
behaviour in various industries like banking and e-commerce. Data mining
provides meaningful patterns and turns data into information, which helps
in fraud detection.
6. Customer Segmentation: Data mining identifies the common
characteristics of customers who buy the same products from the
company.
7. Lie Detection: Law enforcement may use data mining techniques to bring
out the truth from criminals.


Major issues in data mining.

There are many issues in data mining. Some of these are
1. mining methodology and user interaction issues,
2. performance issues,
3. diverse data type issues.
1. Mining methodology and user interaction issues.
a. Mining different kinds of knowledge in databases: different users
need different kinds of knowledge, so it becomes difficult to cover a
large range of knowledge discovery tasks.
b. Interactive mining of knowledge at multiple levels of abstraction:
the data mining process should be interactive because it is difficult to
know in advance what can be discovered within a database.
c. Incorporation of background knowledge: background knowledge
is used to guide the discovery process.
d. Data mining query languages: the data mining query
language should be well matched with the query language of the data
warehouse.
e. Handling noisy data: data cleaning methods are required to
handle noisy data. If data cleaning methods are missing, the
accuracy of the discovered patterns will be poor.
2. Performance issues.
a. Efficiency and scalability: to effectively extract information from
the huge amount of data in databases, data mining algorithms must be
efficient and scalable.
b. Parallel and distributed mining algorithms: the huge size of many
databases, the wide distribution of data, and the complexity of data
mining methods motivate parallel and distributed data mining
algorithms. Such algorithms divide the data into partitions which are
processed in parallel, and the results are then merged.
3. Diverse data types.
a. Handling relational and complex data types:
many kinds of data are stored in databases, such as multimedia,
text, documents, etc. It is not possible for one system to mine all these
kinds of data.
b. Mining information from heterogeneous databases and global
information systems:
the data is available at different data sources on a LAN or WAN. These
data sources may be structured, semi-structured, or unstructured;
therefore, mining knowledge from them adds challenges to data mining.

Data Pre-processing
Real-world datasets are raw, incomplete, inconsistent, and often unusable.
Data pre-processing is the process of converting such raw data into a
format that is understandable and usable.
Data pre-processing is a technique used to improve the quality of data
before applying mining, so that data mining will give high-quality mining
results.

There are four stages in the pre-processing of data:
data cleaning → data integration → data transformation → data reduction.


Need for pre-processing (why pre-process?)
A. It improves accuracy and reliability.
B. It makes data consistent.
C. It increases the readability of data for algorithms.
Data pre-processing is needed to check the data quality.
The quality can be checked by the following:
A. Accuracy: to check whether the data entered is correct or not.
B. Completeness: to check whether the data is available and recorded.
C. Consistency: to check whether the same data kept in different places
matches or not.
D. Timeliness: the data should be updated correctly.
E. Believability: the data should be trustworthy.
F. Interpretability: the understandability of the data.


Data: a collection of data objects and their attributes; it is how the data
objects and their attributes are stored.
Datasets are made of data objects.
A data object represents an entity; it is also called a sample, example,
instance, tuple, or object.
Attribute: a property or characteristic of an object, e.g., the eye colour
of a person, temperature, etc.
Types of Attributes
1) Qualitative (Nominal, Ordinal, Binary)
2) Quantitative (Numeric: Discrete, Continuous)
Qualitative
1) Nominal Attribute: It provides enough information to differentiate
one object from another. The values of nominal attributes are names of
things or some kind of symbols; they represent categories or states and
do not follow any order. Examples: hair colour (black, white, brown);
gender (male, female).

2) Ordinal Attribute: This attribute contains values that have a meaningful
sequence, ranking, or order between them.
E.g., T-shirt size (XL, L, M, S); grade (O, A+, A, B, C).

3) Binary Attribute: It has two categories/values, 0 and 1; it is also called
a Boolean attribute,
where 0 means the absence of a feature and
1 means the presence of a feature.
It has two sub-categories:
a. Symmetric: both values are equally important, e.g., gender (M or F).
b. Asymmetric: the two values are not equally important, e.g., a
disease test (0, 1).

Quantitative attributes:
(1) Numeric Attribute: it is quantitative because the quantity can be
measured, and it can have integer or real values.
It has two types:
1) Interval-scaled:
• It is measured on a scale of equal-sized units.
• It can have positive, zero, or negative values.


• It allows comparison, such as temperature in °C or °F.

2) Ratio-scaled attribute:
• It is a measurable quantity.
• It has a true zero point.
• The values are ordered, and mean, median, and mode can be
calculated for them.
• E.g., age, length, and weight.
2. Discrete and Continuous attributes
Discrete: it has a finite (or countably infinite) set of possible values,
which may or may not be represented as integers. E.g., hair colour
(black, white).
Continuous: it has real-number (floating-point) values. E.g., years of
experience, salary, etc.
(A sketch of how these attribute types are represented follows.)
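The attribute types above differ in which operations are meaningful on
them. A minimal sketch, using a hypothetical record, of how each type is
typically represented before mining:

```python
# Sketch: representing attribute types (hypothetical record).
record = {"hair_colour": "black", "tshirt_size": "L", "has_disease": 0, "age": 34}

# Nominal: unordered labels; only equality tests are meaningful.
print(record["hair_colour"] == "brown")                      # False

# Ordinal: map to ranks so ordering comparisons become meaningful.
size_rank = {"S": 0, "M": 1, "L": 2, "XL": 3}
print(size_rank[record["tshirt_size"]] > size_rank["M"])     # True: L > M

# Binary (asymmetric): 1 = feature present, 0 = absent.
print("positive" if record["has_disease"] else "negative")   # negative

# Numeric (ratio-scaled): arithmetic such as means is meaningful.
print(record["age"] / 2)                                     # 17.0
```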

Similarity and Dissimilarity

In clustering techniques, similarity and dissimilarity are important
measurements.
Similarity: the similarity between two objects is a numerical measure of
the degree to which the two objects are alike.
# Similarity is higher for pairs of objects that look alike.
# Similarity is usually non-negative and often between 0 (not at all alike)
and 1 (completely alike).
Dissimilarity: the dissimilarity between two objects is a numerical
measure of the degree to which the two objects are different.
# The term distance is used as a synonym for dissimilarity.
# Proximity refers to either similarity or dissimilarity.

Similarity:
• numerical measure of how alike two objects are;
• the value is higher when the objects are alike;
• the range is often [0, 1].
Dissimilarity:
• numerical measure of how different two objects are;
• the value is lower when the objects are alike;
• the minimum is often 0;
• the upper limit varies (it may be ∞).
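A minimal sketch of these two measures, assuming numeric objects,
Euclidean distance as the dissimilarity, and one common way of converting
a distance into a [0, 1] similarity:

```python
import math

def dissimilarity(x, y):
    """Euclidean distance: 0 when identical, grows without bound."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def similarity(x, y):
    """Map distance into (0, 1]: 1 means identical, tends to 0 as objects differ."""
    return 1.0 / (1.0 + dissimilarity(x, y))

p, q = (1.0, 2.0), (4.0, 6.0)
print(dissimilarity(p, q))   # 5.0
print(similarity(p, q))      # ~0.167
print(similarity(p, p))      # 1.0 (complete similarity)
```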


Data Visualization
Data visualization is the representation of information and data in a
graphical or pictorial format.
Visualizations of data can be charts, graphs, and maps.
# Data visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.
# Data visualization tools and technologies are essential for analysing
large amounts of data (information) and making decisions.

Types of data visualization

Chart: a graphical representation of information.

Line graph: it shows the data as a series of points connected by straight
line segments. It is used to show trends, developments, or changes through
time.
Bar chart: it shows colourful rectangular bars whose lengths are
proportional to the values represented.
Scatter plot: a very basic and useful graphical form; it helps to find the
relationship between two variables.
Pie chart: a circular statistical graph in which a single total is divided
into several categories (slices).
Bubble chart: a variant of the scatter plot where the size and colour of the
bubbles, which represent the data points, provide extra information.
Heat map: it uses colours to denote values; great for seeing trends in
huge datasets.
Table: an alternative to charts for presenting exact numerical data.
(A small sketch of two of these chart types follows.)
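A minimal sketch of two of the chart types above (line graph and bar chart)
using matplotlib; the library choice and the monthly figures are
assumptions for illustration, not from the source:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 140, 180]           # hypothetical values

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Line graph: points joined by segments, good for trends over time.
ax1.plot(months, sales, marker="o")
ax1.set_title("Line graph: trend over time")

# Bar chart: bar length proportional to the value represented.
ax2.bar(months, sales)
ax2.set_title("Bar chart: value comparison")

plt.tight_layout()
plt.show()
```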

Why use it (advantages)

(1) Discover trends in data:
the most important thing data visualization does is discover trends in
data. It is much easier to observe data trends when the data is in visual
form than when it is in a table.
(2) Perspective on data:
visualization provides a perspective on data by showing its meaning. It
tells how a particular data point stands against the overall data picture.
(3) Put data in the correct context:
it is easy to understand the context of data with the help of data
visualization.


(4) Save time: it is faster to extract information from data by using data
visualization.
(5) It is used for competitive analysis.
(6) It helps to find relationships and patterns quickly.

Data Cleaning
In the real world, databases are raw, incomplete, inconsistent, and
unusable, so data cleaning cleans the data by filling in missing values,
smoothing noisy data, identifying or removing outliers, and removing
inconsistencies in the data.
OR:
Data cleaning is the process of fixing or removing incorrect, incomplete,
corrupted, or duplicate data from a database.

(A) Handling of missing data:

This situation arises when some data is missing in the data set. It can be
handled in the following ways:
(i) Ignore the tuple: it is suitable when the data set is large.
(ii) Fill in the missing value: by filling in the attribute mean or median,
or by filling in the most probable value. (A sketch follows.)
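A minimal sketch of option (ii), filling missing values with the attribute
mean; None marks the missing entries and the numbers are hypothetical:

```python
from statistics import mean, median

values = [12, None, 15, 20, None, 18]          # None = missing

known = [v for v in values if v is not None]
fill_mean, fill_median = mean(known), median(known)   # 16.25 and 16.5

filled = [v if v is not None else fill_mean for v in values]
print(filled)   # [12, 16.25, 15, 20, 16.25, 18]
```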

(B) Noisy data: it means errors in data that occur during data collection
or data entry.
It is inconsistent data.
It can be handled in the following ways:

1. Binning: first the data is sorted, then the sorted data is distributed
into bins. There are three methods to smooth data in bins (see the sketch
after this list):
(i) smoothing by bin means,
(ii) smoothing by bin medians,
(iii) smoothing by bin boundaries.

2. Regression: data values are smoothed by fitting them to a regression
function, which gives a numerical prediction of the data; it can be linear
or multiple regression.

3. Clustering: similar data items are grouped together in a cluster, and
dissimilar items fall outside the clusters (as outliers).
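A minimal sketch of binning, assuming equal-depth bins of size three and
hypothetical values; each bin is smoothed by its mean and by its
boundaries:

```python
# Sketch: smoothing sorted data in equal-depth bins (size 3).
data = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bins = [data[i:i + 3] for i in range(0, len(data), 3)]

for b in bins:
    by_mean = [round(sum(b) / len(b), 1)] * len(b)
    # Boundary smoothing: replace each value by the nearer bin edge.
    lo, hi = b[0], b[-1]
    by_boundary = [lo if v - lo <= hi - v else hi for v in b]
    print(b, "-> mean:", by_mean, "boundary:", by_boundary)

# [4, 8, 15]   -> mean: [9.0, 9.0, 9.0]    boundary: [4, 4, 15]
# [21, 21, 24] -> mean: [22.0, 22.0, 22.0] boundary: [21, 21, 24]
# [25, 28, 34] -> mean: [29.0, 29.0, 29.0] boundary: [25, 25, 34]
```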


Data Integration
It is the process of combining data from multiple sources into a single
dataset.
OR:
It is the process of merging data from different sources, i.e., databases,
data cubes, and flat files, to avoid inconsistencies and redundancies, so
that the speed and accuracy of data mining improve.
It has two approaches:
(1) Tight coupling:
data is combined together into one physical location.
(2) Loose coupling:
in this, the data remains only in the actual source databases.

# In this method, users are provided with an interface to input their
queries. The interface then transforms each query into a form that the
source databases can understand and sends the query to the source
databases to obtain the results.
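A minimal sketch of the tight-coupling idea: records from two hypothetical
sources are merged into one dataset keyed by a shared customer ID, keeping
the first copy of each redundant field:

```python
# Sketch: tight coupling - combine two sources into one physical dataset.
sales_db = {101: {"name": "Asha", "city": "Pune"},
            102: {"name": "Ravi", "city": "Delhi"}}
crm_file = {101: {"segment": "retail"},
            103: {"name": "Meera", "segment": "corporate"}}

integrated = {}
for source in (sales_db, crm_file):
    for cust_id, fields in source.items():
        record = integrated.setdefault(cust_id, {})
        for key, value in fields.items():
            record.setdefault(key, value)  # keep first value; drop redundant copies

for cust_id, record in sorted(integrated.items()):
    print(cust_id, record)
# 101 {'name': 'Asha', 'city': 'Pune', 'segment': 'retail'}
# 102 {'name': 'Ravi', 'city': 'Delhi'}
# 103 {'name': 'Meera', 'segment': 'corporate'}
```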

Issues in data integration:

(1) Entity identification problem.
(2) Tuple duplication.
(3) Redundancy and inconsistency.
(4) Data value conflict detection and resolution.
(5) Reduced data quality.


Data Reduction
Data reduction is a process applied to obtain a reduced representation of
a dataset that is smaller in volume yet maintains the integrity of the
original data.
# In this, the volume of data is reduced to make analysis easier.

Methods of data reduction:

(1) Dimensionality reduction: it is the process of reducing the number of
variables in the data set, because a large number of variables leads to
poor performance.
(2) Data cube aggregation: in this, data is combined (aggregated) to
construct a data cube; redundant and noisy data are removed.
(3) Attribute subset selection: in this, only highly relevant attributes are
used and the others are discarded.
(4) Numerosity reduction: in this, we replace the original data volume by
an alternative, smaller form of data representation. (A sketch of one such
technique follows.)
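A minimal sketch of one simple numerosity-reduction technique, random
sampling without replacement; the record count and sample size are
illustrative:

```python
import random

# Sketch: numerosity reduction by simple random sampling without replacement.
original = list(range(1, 10_001))        # 10,000 hypothetical records

random.seed(42)                          # reproducible for the example
sample = random.sample(original, k=500)  # keep a 5% representative subset

print(len(original), "->", len(sample))  # 10000 -> 500
```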

Data Transformation
It is a method used to transform the data (e.g., into a smaller range) so
that the mining process can be more efficient and easy.
Methods:
(1) Smoothing: it is used to remove noise from the data, e.g., by
clustering or binning.
(2) Attribute selection/construction: in this, we create new attributes by
using the older attributes.

(3) Aggregation: in this, summary or aggregation operations are applied to
the data.

(4) Normalization: it is done in order to scale the data values into a
specific range (e.g., -1.0 to 1.0, or 0 to 1).

(5) Hierarchy generation: attributes are converted from a low level to a
high level, e.g., city → country.

(6) Data discretization: in this, raw values of numeric attributes are
replaced by interval labels or conceptual labels,
e.g., age (raw) → intervals (0-10, 18-30).
(A sketch of normalization and discretization follows this list.)
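A minimal sketch of method (4), min-max normalization into [0, 1], and
method (6), replacing raw ages with interval labels; the salary values and
interval edges are hypothetical:

```python
# Sketch: min-max normalization to [0, 1] (hypothetical salaries).
salaries = [30_000, 45_000, 60_000, 90_000]
lo, hi = min(salaries), max(salaries)
normalized = [(s - lo) / (hi - lo) for s in salaries]
print(normalized)    # [0.0, 0.25, 0.5, 1.0]

# Sketch: discretization of raw ages into interval labels.
def age_interval(age):
    if age <= 10:
        return "0-10"
    if age <= 30:
        return "11-30"
    return "31+"

print([age_interval(a) for a in [5, 22, 40]])   # ['0-10', '11-30', '31+']
```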


Data Discretization
Data discretization converts a large number of data values into a smaller
number, so that data evaluation and data management become very easy.
Discretization is the process of putting values into buckets so that there
is a limited number of possible states. The buckets themselves are treated
as ordered, discrete values.
There are different methods used for performing data discretization:
1. Supervised discretization: if the data is discretized using class
information, then it is referred to as supervised discretization.
2. Unsupervised discretization: if the data values are reduced by
substituting them with a limited number of interval labels, without using
class information, then it is referred to as unsupervised discretization.
3. Top-down discretization: if the process starts by first finding one or a
few points to split the entire attribute range and then repeats this
recursively on the resulting intervals, then it is called top-down
discretization, or splitting.
4. Bottom-up discretization: if the process starts by considering all of
the continuous values as potential split points and removes some by
merging neighbouring values to form intervals, then it is called bottom-up
discretization, or merging.
Techniques of data discretization:
1. Histogram analysis,
2. Binning,
3. Correlation analysis,
4. Clustering analysis,
5. Decision tree analysis,
6. Equal-width partitioning,
7. Equal-depth partitioning, and
8. Entropy-based discretization.
(A sketch of equal-width and equal-depth partitioning follows.)
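A minimal sketch contrasting equal-width partitioning (intervals of equal
size) with equal-depth partitioning (buckets holding roughly equal numbers
of values); the data is hypothetical:

```python
data = sorted([5, 7, 8, 12, 15, 18, 24, 30, 45])
k = 3  # number of buckets

# Equal-width: split the value range into k intervals of equal size.
width = (data[-1] - data[0]) / k
equal_width = [[v for v in data
                if data[0] + i * width <= v < data[0] + (i + 1) * width]
               for i in range(k)]
equal_width[-1].append(data[-1])     # include the maximum in the last bucket
print("equal-width:", equal_width)   # [[5, 7, 8, 12, 15, 18], [24, 30], [45]]

# Equal-depth: each bucket holds (roughly) the same number of values.
depth = len(data) // k
equal_depth = [data[i:i + depth] for i in range(0, len(data), depth)]
print("equal-depth:", equal_depth)   # [[5, 7, 8], [12, 15, 18], [24, 30, 45]]
```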
