Data Mining Unit 1
What is Data
Data can be defined as a representation of facts, concepts, or instructions in a formalized manner, suitable for
communication, interpretation, or processing by humans or electronic machines.
A data object represents an entity. It is also called a record, sample, example, instance, data point, object, or tuple.
Examples:
In a sales database, the objects may be customers, store items, and sales;
In a medical database, the objects may be patients;
In a university database, the objects may be students, professors, and courses.
Data objects are described by attributes; in other words, a collection of attributes describes an object.
Attributes
An attribute is a data field representing a property or feature of a data object.
It is also known as a dimension, feature, or variable.
Definition 1
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. The data
sources can include databases, data warehouses, the Web, other information repositories, or data that are streamed into
the system dynamically.
(or)
Definition 2
Data Mining is all about discovering hidden, unsuspected, and previously unknown yet valid relationships amongst the
data.
Data mining is needed to extract useful information from large datasets and to use it for predictions or better
decision-making. Nowadays, data mining is used in almost every domain where a large amount of data is stored and
processed.
For example: the banking sector, market basket analysis, and network intrusion detection.
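As a taste of what these applications look like in practice, here is a minimal sketch of the idea behind market basket analysis: measuring how often items are bought together. The transactions and item names are invented for illustration.

```python
# Invented example transactions: each is the set of items in one basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"bread", "milk"}, transactions))  # 2 of 4 baskets -> 0.5
```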
Data mining is also known as Knowledge Discovery from Data, or KDD.
KDD is a process that involves the extraction of useful, previously unknown, and potentially valuable information from
large datasets.
The KDD process is iterative, and it usually requires multiple passes through its steps to extract accurate
knowledge from the data.
Fig.: The KDD process
The KDD process includes the following steps:
Data Cleaning
Data Integration
Data Selection
Data Transformation
Data Mining
Pattern Evaluation
Knowledge Presentation
a) Data cleaning: to remove noise or irrelevant data
b) Data integration: where multiple data sources may be combined
c) Data selection: where data relevant to the analysis task are retrieved from the database
d) Data transformation: where data are transformed or consolidated into forms appropriate for mining by
performing summary or aggregation operations
e) Data mining: an essential process where intelligent methods are applied in order to extract data patterns
f) Pattern evaluation: to identify the truly interesting patterns representing knowledge, based on some
interestingness measures
g) Knowledge presentation: where visualization and knowledge representation techniques are used to present the
mined knowledge to the user.
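The steps above can be sketched end to end on a toy dataset. Everything here (the records, the threshold, and the trivial "pattern") is an invented illustration, not a real mining algorithm:

```python
# Two invented data sources with a missing value in one record.
sales_db = [{"item": "pen", "qty": 3}, {"item": "pen", "qty": None}]
web_log  = [{"item": "book", "qty": 2}]

# a) Data cleaning: drop records with missing values
cleaned = [r for r in sales_db if r["qty"] is not None]

# b) Data integration: combine the two sources
integrated = cleaned + web_log

# c) Data selection: keep only the attributes relevant to the task
selected = [(r["item"], r["qty"]) for r in integrated]

# d) Data transformation: aggregate quantities per item
totals = {}
for item, qty in selected:
    totals[item] = totals.get(item, 0) + qty

# e) Data mining (here, a trivial "pattern"): the best-selling item
best = max(totals, key=totals.get)

# f) Pattern evaluation: keep the pattern only if it passes a threshold
interesting = best if totals[best] >= 2 else None

# g) Knowledge presentation
print(f"Best seller: {interesting}, totals: {totals}")
```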
Data mining is a very important process where potentially useful and previously unknown information is extracted from
large volumes of data. There are several components involved in the data mining process.
The major components of any data mining system are data source, data warehouse server, data mining engine, pattern
evaluation module, graphical user interface and knowledge base.
Data Sources
Database, data warehouse, World Wide Web (WWW), text files and other documents are the actual sources of data.
You need large volumes of historical data for data mining to be successful.
Database or Data Warehouse Server
The database or data warehouse server contains the actual data that is ready to be processed. Hence, the server is
responsible for retrieving the relevant data based on the data mining request of the user.
Data Mining Engine
The data mining engine is the core component of any data mining system. It consists of several modules for performing
data mining tasks, including association analysis, classification, characterization, and clustering.
Pattern Evaluation Modules
The pattern evaluation module is mainly responsible for measuring the interestingness of patterns, typically by
comparing them against a threshold value.
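As a rough sketch of this idea, the function below filters candidate association rules against user-supplied support and confidence thresholds; the rules and threshold values are invented for illustration:

```python
# Invented candidate rules with made-up interestingness measures.
candidate_rules = [
    {"rule": "bread -> butter", "support": 0.40, "confidence": 0.80},
    {"rule": "milk -> eggs",    "support": 0.05, "confidence": 0.90},
    {"rule": "pen -> paper",    "support": 0.30, "confidence": 0.50},
]

def evaluate(rules, min_support=0.10, min_confidence=0.70):
    """Keep only the rules that meet both interestingness thresholds."""
    return [r["rule"] for r in rules
            if r["support"] >= min_support and r["confidence"] >= min_confidence]

print(evaluate(candidate_rules))  # only "bread -> butter" passes both
```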
Graphical User Interface
The graphical user interface module provides the communication between the user and the data mining system. This
module helps the user use the system easily.
Knowledge Base
The knowledge base is helpful in the whole data mining process. It might be useful for guiding the search or evaluating
the interestingness of the result patterns.
As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a target
application.
The following are the most basic forms of data for mining.
Multimedia Database
Spatial Database
World Wide Web
Text data (Flat File)
Time series database
5) Describe the Data Mining Functionalities
Data mining is important because there is so much data out there, and it's impossible for people to look through it all by
themselves.
Data mining uses various functionalities to analyze the data and find patterns, trends, and other information that would
be hard for people to find on their own.
Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks. In general, such
data mining tasks can be classified into two categories: descriptive and predictive.
7) Interesting Patterns
A data mining system has the potential to generate thousands or even millions of patterns, or rules. This raises a
serious question: are all of the patterns interesting? Typically not; only a small fraction of the patterns potentially
generated would be of interest to any given user.
Data Mining is considered as an interdisciplinary field. It includes a set of various disciplines such as statistics, database
systems, machine learning, visualization, and information sciences. Classification of the data mining system helps users
to understand the system and match their requirements with such systems.
Data mining discovers patterns and extracts useful information from large datasets. Organizations need to analyze and
interpret data using data mining systems as data grows rapidly. With an exponential increase in data, active data
analysis is necessary to make sense of it all.
Data mining (DM) systems can be classified based on various factors.
A data mining system can be classified by the type of data it mines, the data model used, or the application the data
serves.
For example: relational databases, transactional databases, multimedia databases, textual data, the World Wide Web
(WWW), etc.
We can classify a data mining system according to the kind of knowledge mined. It means the data mining system is
classified based on functionalities such as
Association Analysis
Classification
Prediction
Cluster Analysis
Characterization
Discrimination
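To make one of these functionalities concrete, here is a minimal classification sketch using a nearest-neighbour rule; the training points and labels are invented for illustration:

```python
# Invented labelled examples: (feature vector, class label).
training = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
            ((5.0, 5.2), "high"), ((4.8, 5.1), "high")]

def classify(point):
    """Assign the label of the closest training example (1-nearest-neighbour)."""
    def dist2(a, b):
        # squared Euclidean distance (enough for comparing distances)
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(training, key=lambda ex: dist2(ex[0], point))[1]

print(classify((1.1, 0.9)))  # nearest neighbours are "low" examples
```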
We can classify a data mining system according to the kind of techniques used. We can describe these techniques
according to the degree of user interaction involved or the methods of analysis employed.
Data Mining systems use various techniques, including Statistics, Machine Learning, Database Systems, Information
retrieval, Visualization, and pattern recognition.
We can classify a data mining system according to the applications adapted. These applications are as follows
Finance
Telecommunications
E-Commerce
Media Sector
Stock Markets
9) Data Mining Task Primitives
A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data
mining query is defined in terms of data mining task primitives. These primitives allow the user to interactively
communicate with the data mining system during the mining process to discover interesting patterns.
Here is the list of data mining task primitives:
The set of task-relevant data to be mined
The kind of knowledge to be mined
The background knowledge to be used in the discovery process
The interestingness measures and thresholds for pattern evaluation
The expected representation for visualizing the discovered patterns
The data mining system is integrated with a database or data warehouse system so that it can perform its tasks
effectively. A data mining system operates in an environment where it needs to communicate with other data systems,
such as a database or data warehouse system.
There are different possible integration (coupling) schemes as follows:
No Coupling
Loose Coupling
Semi-Tight Coupling
Tight Coupling
No Coupling
No coupling means that the data mining system does not utilize any function of a database or data warehouse system.
It may fetch data from a particular source (such as a file system), process the data using some data mining algorithm,
and then store the mining results in another file.
Loose Coupling
In loose coupling, the data mining system uses some facilities or services of a database or data warehouse system.
The data is fetched from a data repository managed by these (DB/DW) systems.
Loose coupling is better than no coupling because it can fetch any portion of data stored in Databases or Data
Warehouses by using query processing, indexing, and other system facilities.
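The loose-coupling idea can be sketched with Python's standard-library SQLite bindings: the mining code delegates selection and aggregation to the database's query processor rather than reimplementing them. The table and values are invented:

```python
import sqlite3

# An in-memory database standing in for the DB/DW system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("pen", 3), ("book", 2), ("pen", 1)])

# Let the DB system do selection and aggregation (its query processing),
# then mine on the returned result set.
rows = conn.execute(
    "SELECT item, SUM(qty) FROM sales GROUP BY item ORDER BY item").fetchall()
print(rows)  # [('book', 2), ('pen', 4)]
```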
Semi-Tight Coupling
Semi-tight coupling means that, besides linking a data mining system to a DB/DW system, efficient
implementations of a few essential data mining primitives can be provided in the DB/DW system. These primitives can
include sorting, indexing, aggregation, histogram analysis, multi-way join, and precomputation of some essential
statistical measures, such as sum, count, max, min, and standard deviation.
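A rough sketch of the kind of precomputed statistical measures such a system might maintain (the values are invented):

```python
import math

values = [4, 8, 6, 2]  # an invented attribute column

# Precompute the simple measures once, as the DB/DW layer might.
stats = {
    "sum":   sum(values),
    "count": len(values),
    "max":   max(values),
    "min":   min(values),
}
mean = stats["sum"] / stats["count"]
# Population standard deviation, derived from the precomputed sum and count.
stats["std"] = math.sqrt(sum((v - mean) ** 2 for v in values) / stats["count"])

print(stats)
```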
Tight Coupling
Tight coupling means that the data mining system is smoothly integrated into the database/data warehouse system.
The data mining subsystem is treated as one functional component of the information system. Data mining queries and
functions are optimized based on mining query analysis, data structures, indexing schemes, and the query processing
methods of the DB or DW system.
Data mining, the process of extracting knowledge from data, has become increasingly important as the amount of data
generated by individuals, organizations, and machines has grown exponentially. Data mining is not an easy task: the
algorithms used can get very complex, and data is not always available in one place; it often needs to be integrated
from various heterogeneous data sources.
These factors can lead to issues in data mining, which fall mainly into three categories: mining methodology and user
interaction issues, performance issues, and issues arising from diverse data types.
Data preprocessing is a crucial step in data mining. It involves transforming raw data into a clean, structured, and
suitable format for mining. Proper data preprocessing helps improve the quality of the data, enhances the performance
of algorithms, and ensures more accurate and reliable results.
Major Tasks in Data Preprocessing
Data preprocessing is an essential step in the knowledge discovery process, because quality decisions must be based on
quality data. And Data Preprocessing involves Data Cleaning, Data Integration, Data Reduction and Data Transformation.
Steps in Data Preprocessing
1. Data Cleaning
2. Data integration
3. Data Transformation
4. Data Reduction
1. Data Cleaning:
● Handling missing values: Dealing with cases where some data points have no values by filling them in or
removing them.
● Smoothing noisy data: Removing or reducing random errors or outliers in the data.
● Removing outliers: Identifying and eliminating data points that significantly deviate from the overall pattern.
● Resolving inconsistencies: Correcting discrepancies or conflicts in codes, names, or values across the data.
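A small sketch of two of these cleaning tasks on invented data: filling a missing value with the attribute mean, and dropping a value outside a simple fixed bound (a real system might use standard deviations or the interquartile range instead):

```python
# Invented ages: None marks a missing value, 180 is an obvious outlier.
ages = [23, 25, None, 24, 180]

# Handling missing values: replace None with the mean of the known values.
known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)
filled = [mean_age if a is None else a for a in ages]

# Removing outliers: here, a simple fixed upper bound.
cleaned = [a for a in filled if a <= 120]

print(cleaned)
```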
2. Data Integration:
● Combining data from multiple sources: Bringing together data from different databases, files, or data cubes
into a single, unified format for analysis.
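A minimal sketch of integration: merging two invented sources that describe the same customers under different attributes:

```python
# Two invented sources keyed by the same customer id.
crm = {"c1": {"name": "Asha"}, "c2": {"name": "Ravi"}}
billing = {"c1": {"total_spend": 120}, "c2": {"total_spend": 45}}

# Build one unified record per customer from both sources.
unified = {}
for cid in crm:
    record = dict(crm[cid])              # start from the CRM attributes
    record.update(billing.get(cid, {}))  # add the billing attributes
    unified[cid] = record

print(unified["c1"])  # {'name': 'Asha', 'total_spend': 120}
```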
3. Data Transformation:
● Normalizing data: Scaling the values of different attributes to a common range, ensuring they are on the same
scale for accurate analysis.
● Aggregating data: Summarizing or grouping data to a higher level of abstraction, such as calculating averages
or totals.
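Both transformation tasks can be sketched in a few lines; the income and sales figures are invented:

```python
def min_max(values):
    """Normalize values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20_000, 50_000, 80_000]
print(min_max(incomes))  # [0.0, 0.5, 1.0]

# Aggregation: invented daily sales rolled up to monthly totals.
daily_sales = {"2024-01-01": 10, "2024-01-02": 12, "2024-02-01": 7}
monthly = {}
for day, amount in daily_sales.items():
    month = day[:7]  # "YYYY-MM" prefix of the date
    monthly[month] = monthly.get(month, 0) + amount
print(monthly)  # {'2024-01': 22, '2024-02': 7}
```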
4. Data Reduction:
● Reducing data volume: Applying techniques to reduce the size of the dataset without losing essential
information.
● Preserving important information: Ensuring that the reduced dataset still retains key
patterns, trends, or characteristics present in the original data.
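Data reduction by simple random sampling can be sketched as follows; the "large" dataset here is synthetic:

```python
import random

population = list(range(1000))           # the synthetic "large" dataset
random.seed(42)                          # fixed seed for reproducibility
sample = random.sample(population, 100)  # a 10% simple random sample

# The reduced dataset should preserve key characteristics, e.g. the mean.
pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(pop_mean, round(sample_mean, 1))
```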