An Introduction To Data Mining

Data mining involves collecting, cleaning, analyzing, and gaining insights from data. There has been an explosion in the amount of data generated from sources like the web, financial transactions, user interactions, and sensors. This deluge of data presents both opportunities and challenges for extracting useful knowledge. Data mining addresses this by applying a multi-step process including data collection, cleaning, transformation, and then applying analytical methods to discover patterns. The type of data, from quantitative to text to graphs, also impacts the mining approaches used.

Uploaded by

borisdblejd

An Introduction to Data Mining

"Education is not the piling on of learning, information, data, facts, skills,
or abilities (that's training or instruction) but is rather making visible
what is hidden as a seed." - Thomas More
1.1 Introduction
Data mining is the study of collecting, cleaning, processing, analyzing, and gaining useful
insights from data. A wide variation exists in terms of the problem domains, applications,
formulations, and data representations that are encountered in real applications. Therefore,
data mining is a broad umbrella term that is used to describe these different aspects of
data processing.
In the modern age, virtually all automated systems generate some form of data either
for diagnostic or analysis purposes. This has resulted in a deluge of data, which has been
reaching the order of petabytes or exabytes. Some examples of different kinds of data are
as follows:
World Wide Web: The number of documents on the indexed Web is now on the order
of billions, and the invisible Web is much larger. User accesses to such documents
create Web access logs at servers and customer behavior profiles at commercial sites.
Furthermore, the linked structure of the Web is referred to as the Web graph, which
is itself a kind of data. These different types of data are useful in various applications.
For example, the Web documents and link structure can be mined to determine associations
between different topics on the Web. On the other hand, user access logs can
be mined to determine frequent patterns of accesses or unusual patterns of possibly
unwarranted behavior.
Financial interactions: Most common transactions of everyday life, such as using an
automated teller machine (ATM) card or a credit card, can create data in an automated
way. Such transactions can be mined for many useful insights, such as fraud.
User interactions: Many forms of user interactions create large volumes of data. For
example, the use of a telephone typically creates a record at the telecommunication
company with details about the duration and destination of the call. Many phone
companies routinely analyze such data to determine relevant patterns of behavior
that can be used to make decisions about network capacity, promotions, pricing, or
customer targeting.
Sensor technologies and the Internet of Things: A recent trend is the development
of low-cost wearable sensors, smartphones, and other smart devices that can communicate
with one another. By one estimate, the number of such devices exceeded the
number of people on the planet in 2008 [30]. The implications of such massive data
collection are significant for mining algorithms.
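As a rough illustration of mining access logs for frequent patterns, the following Python sketch counts page-to-page transitions per user and keeps those that occur often. The log records, the `frequent_transitions` function, and the `min_count` threshold are all hypothetical and chosen only for illustration; real Web logs are far larger and need parsing first.

```python
from collections import Counter

# Hypothetical access log: (user_id, page) pairs in time order.
access_log = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"),
    ("u3", "/home"), ("u3", "/products"), ("u3", "/cart"),
]

def frequent_transitions(log, min_count):
    """Count consecutive page-to-page transitions per user; keep the frequent ones."""
    last_page = {}      # most recent page seen for each user
    counts = Counter()  # (from_page, to_page) -> number of occurrences
    for user, page in log:
        if user in last_page:
            counts[(last_page[user], page)] += 1
        last_page[user] = page
    return {pair: n for pair, n in counts.items() if n >= min_count}

frequent = frequent_transitions(access_log, min_count=2)
print(frequent)
```

Transitions that fall below the threshold are discarded here; the same counts could instead be inspected for rare transitions when the goal is to flag unusual behavior.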
The deluge of data is a direct result of advances in technology and the computerization of
every aspect of modern life. It is, therefore, natural to examine whether one can extract
concise and possibly actionable insights from the available data for application-specific goals.
This is where the task of data mining comes in. The raw data may be arbitrary, unstructured,
or even in a format that is not immediately suitable for automated processing. For example,
manually collected data may be drawn from heterogeneous sources in different formats and
yet somehow need to be processed by an automated computer program to gain insights.
To address this issue, data mining analysts use a pipeline of processing, where the raw
data are collected, cleaned, and transformed into a standardized format. The data may be
stored in a commercial database system and finally processed for insights with the use of
analytical methods. In fact, while data mining often conjures up the notion of analytical
algorithms, the reality is that the vast majority of work is related to the data preparation
portion of the process. This pipeline of processing is conceptually similar to that of an actual
mining process, which goes from a mineral ore to the refined end product. The term "mining"
derives its roots from this analogy.
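The pipeline described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two in-memory sources, their field names, and the functions `collect`, `clean_and_transform`, and `analyze` are hypothetical, and the mean-age computation merely stands in for a real analytical method.

```python
import csv
import io
import json

# Two hypothetical sources describing the same kind of record in different formats.
CSV_SOURCE = "name,age\nAlice,34\nBob,\n"
JSON_SOURCE = '[{"full_name": "Carol", "age_years": "41"}]'

def collect():
    """Collection: read raw records from each source as-is."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += json.loads(JSON_SOURCE)
    return rows

def clean_and_transform(rows):
    """Cleaning and transformation: unify field names, drop incomplete records, cast types."""
    standardized = []
    for row in rows:
        name = row.get("name") or row.get("full_name")
        age = row.get("age") or row.get("age_years")
        if not name or not age:
            continue  # drop records with missing values
        standardized.append({"name": name.strip(), "age": int(age)})
    return standardized

def analyze(records):
    """Analysis: a trivial stand-in for the analytical step (mean age)."""
    return sum(r["age"] for r in records) / len(records)

records = clean_and_transform(collect())
mean_age = analyze(records)
print(records, mean_age)
```

Even in this toy version, most of the code deals with preparing the data rather than analyzing it, which mirrors the point made in the text.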
From an analytical perspective, data mining is challenging because of the wide disparity
in the problems and data types that are encountered. For example, a commercial product
recommendation problem is very different from an intrusion-detection application, even at
the level of the input data format or the problem definition. Even within related classes
of problems, the differences are quite significant. For example, a product recommendation
problem in a multidimensional database is very different from a social recommendation
problem due to the differences in the underlying data type. In spite of these
differences, data mining applications are often closely connected to one of four "superproblems"
in data mining: association pattern mining, clustering, classification, and outlier
detection. These problems are important because they are used as building blocks in a
majority of applications, in one indirect form or another. This is a useful abstraction
because it helps us conceptualize and structure the field of data mining more effectively.
The data may have different formats or types. The type may be quantitative (e.g., age),
categorical (e.g., ethnicity), text, spatial, temporal, or graph-oriented. Although the most
common form of data is multidimensional, an increasing proportion belongs to more complex
data types. While there is a conceptual portability of algorithms between many data types
at a very high level, this is not the case from a practical perspective. The reality is that
the precise data type may affect the behavior of a particular algorithm significantly. As a
result, one may need to design refined variations of the basic approach for multidimensional
data, so that it can be used effectively for a different data type. Therefore, this book will
dedicate different chapters to the various data types to provide a better understanding of
how the processing methods are affected by the underlying data type.
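To make the effect of the data type concrete, consider the notion of distance between two records, which many mining methods rely on. The Python sketch below uses two simple and commonly used choices, Euclidean distance for quantitative attributes and the fraction of mismatching attributes for categorical ones. The function names and sample records are hypothetical, and these are not necessarily the measures developed later in the book.

```python
import math

def euclidean_distance(x, y):
    """Distance between two records with quantitative attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mismatch_distance(x, y):
    """Distance between two records with categorical attributes:
    the fraction of attributes on which the records differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

# Quantitative records: differences of 3 and 4 give a distance of 5.
quantitative = euclidean_distance((30.0, 50.0), (33.0, 54.0))

# Categorical records: subtraction is undefined, so only matches can be counted.
categorical = mismatch_distance(("red", "suv"), ("red", "sedan"))

print(quantitative, categorical)
```

An algorithm written around the first function cannot be applied to categorical records without replacing its distance computation, which is the kind of refinement the text refers to.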
A major challenge has been created in recent years due to increasing data volumes. The
prevalence of continuously collected data has led to an increasing interest in the field of data
streams. For example, Internet traffic generates large streams that cannot even be stored
effectively unless significant resources are spent on storage. This leads to unique challenges
from the perspective of processing and analysis. In cases where it is not possible to explicitly
store the data, all the processing needs to be performed in real time.
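A minimal sketch of this one-pass style of processing is shown below in Python: a running mean is updated as each value arrives, so the stream itself never has to be stored. The `RunningStats` class and the `packet_sizes` generator are hypothetical stand-ins for a real traffic source.

```python
class RunningStats:
    """Maintains the count and mean of a stream in constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean update: no past values need to be kept.
        self.count += 1
        self.mean += (value - self.mean) / self.count

def packet_sizes():
    """Hypothetical stream of packet sizes in bytes."""
    yield from (100, 200, 300, 400)

stats = RunningStats()
for size in packet_sizes():
    stats.update(size)  # each value is seen once and then discarded

print(stats.count, stats.mean)
```

The summary costs the same amount of memory regardless of how long the stream runs, at the price of supporting only those statistics that can be updated incrementally.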
This chapter will provide a broad overview of the different technologies involved in preprocessing
and analyzing different types of data. The goal is to study data mining from the
perspective of different problem abstractions and data types that are frequently encountered.
Many important applications can be converted into these abstractions.
This chapter is organized as follows. Section 1.2 discusses the data mining process, with
particular attention paid to the data preprocessing phase. Different data
types and their formal definitions are discussed in Sect. 1.3. The major problems in data
mining are discussed in Sect. 1.4 at a very high level. The impact of data type on problem
definitions is also addressed in that section. Scalability issues are addressed in Sect. 1.5. In
Sect. 1.6, a few examples of applications are provided. Section 1.7 gives a summary.
