Data Mining and Warehousing - L1 & L2

This document provides an overview of data mining and data warehousing. It discusses key concepts in data mining including data preparation, artificial intelligence, association rule learning, clustering, classification, and machine learning. It also defines data warehousing as collecting and managing varied data sources to provide business insights. Common data warehouse architectures include shared memory, shared disk, and shared nothing. The document also discusses characteristics, processes, and mapping data warehouses to multiprocessor architectures.

Uploaded by

Deepika Garg


Data Mining and Warehousing

Lecture-1,2

Dr. Shweta Sharma

School of Computing Information Technology
Manipal University Jaipur
India
Data Mining
• Data mining is the process of analyzing massive volumes of data to
discover business intelligence that helps companies solve problems,
mitigate risks, and seize new opportunities. This branch of data
science derives its name from the similarities between searching for
valuable information in a large database and mining a mountain for
ore. Both processes require sifting through tremendous amounts of
material to find hidden value.
Data Mining Concepts
Achieving the best results from data mining requires an array of tools and techniques. Some of the most commonly-used functions
include:

• Data cleansing and preparation — A step in which data is transformed into a form suitable for further analysis and processing, such as
identifying and removing errors and missing data.
• Artificial intelligence (AI) — These systems perform analytical activities associated with human intelligence such as planning, learning,
reasoning, and problem-solving.
• Association rule learning — These tools, also known as market basket analysis, search for relationships among variables in a dataset,
such as determining which products are typically purchased together.
• Clustering — A process of partitioning a dataset into a set of meaningful sub-classes, called clusters, to help users understand the
natural grouping or structure in the data.
• Classification — This technique assigns items in a dataset to target categories or classes with the goal of accurately predicting the target
class for each case in the data.
• Data analytics — The process of turning digital information into useful business intelligence.
• Data warehousing — A large collection of business data used to help an organization make decisions. It is the foundational component
of most large-scale data mining efforts.
• Machine learning — A computer programming technique that uses statistical probabilities to give computers the ability to “learn”
without being explicitly programmed.
• Regression — A technique used to predict a range of numeric values, such as sales, temperatures, or stock prices, based on a particular
data set.
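As an illustration of association rule learning from the list above, here is a minimal sketch that computes support and confidence for item pairs. The transactions and the function names (`support`, `confidence`) are made up for this example and do not come from any particular library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (made-up data for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base else 0.0

# Count how often each pair of products is purchased together.
pair_counts = Counter(
    pair
    for t in transactions
    for pair in combinations(sorted(t), 2)
)

print(support({"bread", "butter"}, transactions))       # share of baskets with both
print(confidence({"butter"}, {"bread"}, transactions))  # how often butter implies bread
```

A rule such as "butter implies bread" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.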
Advantages of Data Mining
For example, data mining can tell you which prospects are likely to become profitable customers
based on past customer profiles, and which are most likely to respond to a specific offer. With this
knowledge, you can increase your return on investment (ROI) by making your offer to only those
prospects likely to respond and become valuable customers.
• Increasing revenue.
• Understanding customer segments and preferences.
• Acquiring new customers.
• Improving cross-selling and up-selling.
• Retaining customers and increasing loyalty.
• Increasing ROI from marketing campaigns.
• Detecting fraud.
• Identifying credit risks.
• Monitoring operational performance.
What Is Data Warehousing?
• Data warehousing (DW) is the process of collecting and managing
data from varied sources to provide meaningful business insights. A
data warehouse is typically used to connect and analyze business
data from heterogeneous sources. The data warehouse is the core of
the BI system, which is built for data analysis and reporting.
• It is a blend of technologies and components that aids the strategic
use of data. It is the electronic storage of a large amount of information
by a business, designed for query and analysis rather than
transaction processing. It is a process of transforming data into
information and making it available to users in a timely manner so
that it can make a difference.
Data warehouse architecture
[Figure: data warehouse architecture diagram, not recovered from the source slides]
A data warehouse system is also known by the following names:
• Decision Support System (DSS)
• Executive Information System
• Management Information System
• Business Intelligence Solution
• Analytic Application
• Data Warehouse
How Does a Data Warehouse Work?
A Data Warehouse works as a central repository where information arrives from one or more data
sources. Data flows into a data warehouse from the transactional system and other relational
databases.
Data may be:
1. Structured
2. Semi-structured
3. Unstructured
The data is processed, transformed, and ingested so that users can access the processed data in the
Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data
warehouse merges information coming from different sources into one comprehensive database.
By merging all of this information in one place, an organization can analyze its customers more
holistically. This helps to ensure that it has considered all the information available. Data
warehousing makes data mining possible. Data mining is looking for patterns in the data that may
lead to higher sales and profits.
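The flow described above (data arrives from several sources, is transformed, and is loaded into one queryable store) can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the two sources, the table name `dim_customer`, and the helper names are hypothetical, and a real warehouse would use a dedicated platform.

```python
import sqlite3

# Two hypothetical source extracts with inconsistent formats (made-up data).
crm_rows = [("C001", "Asha Rao", "2023-01-15")]
web_rows = [{"cust": "c002", "full_name": "ravi kumar", "signup": "15/02/2023"}]

def normalize_date(text):
    """Convert DD/MM/YYYY to ISO YYYY-MM-DD; pass ISO dates through unchanged."""
    if "/" in text:
        day, month, year = text.split("/")
        return f"{year}-{month}-{day}"
    return text

def load_warehouse(conn):
    """Transform both sources into one consistent format and load them."""
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_id TEXT PRIMARY KEY, name TEXT, signup_date TEXT, source TEXT)"
    )
    insert = "INSERT INTO dim_customer VALUES (?, ?, ?, ?)"
    for cid, name, signup in crm_rows:
        conn.execute(insert, (cid.upper(), name.title(), normalize_date(signup), "crm"))
    for row in web_rows:
        conn.execute(
            insert,
            (row["cust"].upper(), row["full_name"].title(),
             normalize_date(row["signup"]), "web"),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_warehouse(conn)

# Users can now query the merged data with plain SQL.
rows = conn.execute(
    "SELECT customer_id, name, signup_date FROM dim_customer ORDER BY customer_id"
).fetchall()
print(rows)
```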
Characteristics of data warehousing
• Subject oriented: data are organized by detailed subject, containing only information
relevant for decision support. This provides a more comprehensive view of the organization.
• Integrated: a data warehouse must place data from different sources into a consistent
format.
• Time variant (time series): it contains historical data (daily, weekly, and monthly) in addition
to current (real-time) data.
• Nonvolatile: data cannot be changed or updated after it has entered the data
warehouse. Obsolete (old) data are discarded, and changes are recorded as new data.
• Web based: designed for web-based applications.
• Relational/multidimensional: its structure is either relational or multidimensional.
• Uses client/server: so that it is easy to access.
• Real-time: a characteristic of newer data warehouses.
• Includes metadata: data about data (how the data are organized and how to use
them).
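The time-variant and nonvolatile characteristics can be illustrated with a small append-only store: existing rows are never updated, and a change is recorded as a new row with its own date. This is a sketch only; `PriceFact` and `AppendOnlyStore` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a recorded fact cannot be modified afterwards
class PriceFact:
    product_id: str
    price: float
    valid_from: date

class AppendOnlyStore:
    """Keeps every version of a fact; changes are appended, never overwritten."""

    def __init__(self):
        self._rows = []

    def record(self, fact):
        self._rows.append(fact)

    def history(self, product_id):
        return [r for r in self._rows if r.product_id == product_id]

    def as_of(self, product_id, day):
        """Return the fact that was valid on `day`, or None if there was none."""
        candidates = [r for r in self.history(product_id) if r.valid_from <= day]
        return max(candidates, key=lambda r: r.valid_from, default=None)

store = AppendOnlyStore()
store.record(PriceFact("P1", 10.0, date(2023, 1, 1)))
store.record(PriceFact("P1", 12.0, date(2023, 6, 1)))  # a change is a new row

print(store.as_of("P1", date(2023, 3, 1)).price)  # price valid in March
print(len(store.history("P1")))                   # both versions are kept
```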
• Data mart
A departmental data warehouse that stores only relevant data
(usually smaller than the warehouse)
• Dependent data mart
A subset that is created directly from a data warehouse
• Independent data mart
A small data warehouse designed for a strategic business unit (SBU) or
a department and its source is not the EDW (Enterprise Data
Warehouse)
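One way to picture a dependent data mart is as a subset derived directly from the warehouse. The sketch below models this with a SQL view in an in-memory SQLite database; the table `warehouse_sales`, the view `marketing_mart`, and the rows are hypothetical, and real data marts are often materialized as separate stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide warehouse table (made-up rows).
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("north", "marketing", 100.0),
        ("south", "finance", 250.0),
        ("north", "marketing", 40.0),
    ],
)

# Dependent data mart: only the rows relevant to one department,
# created directly from the warehouse.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY amount"
).fetchall()
print(mart_rows)
```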
• Operational data stores (ODS)
A type of database often used as an interim (temporary) staging area for a data
warehouse, especially for customer information files
• Oper marts
An operational data mart. An oper mart is a small-scale data mart
typically used by a single department or functional area in an
organization when they need to analyze operational data
• Enterprise data warehouse (EDW)
A technology that provides a vehicle for pushing data from source
systems into a data warehouse that is used across the enterprise for
decision support
• Metadata
Data about data. In a data warehouse, metadata describe the
contents of a data warehouse and the manner of its use
Data Warehousing Process Overview
• Organizations continuously collect data, information, and knowledge
at an increasingly accelerated rate and store them in computerized
systems
• The number of users needing to access the information continues to
increase as a result of improved reliability and availability of network
access, especially the Internet
Data Objects and Attribute Types
• Data sets are made up of data objects. A data object represents
an entity—in a sales database, the objects may be customers,
store items, and sales; in a medical database, the objects may be
patients; in a university database, the objects may be students,
professors, and courses. Data objects are typically described by
attributes. Data objects can also be referred to as samples,
examples, instances, data points, or objects. If the data objects are
stored in a database, they are data tuples. That is, the rows of a
database correspond to the data objects, and the columns
correspond to the attributes. In this section, we define attributes
and look at the various attribute types.
• Data: how the data objects and their attributes are stored.

• An attribute is a property or characteristic of an object, for example
a person’s hair color or air humidity.
• An attribute set defines an object. The object is also referred to as a
record, instance, or entity.
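The correspondence between rows and data objects, and between columns and attributes, can be shown with a small sketch. The `Student` type and its values are made up for illustration.

```python
from typing import NamedTuple

class Student(NamedTuple):
    """One data object (a tuple/row); each field is an attribute (a column)."""
    student_id: int
    name: str
    hair_color: str

# A tiny data set: each element is one data object.
students = [
    Student(1, "Meera", "black"),
    Student(2, "Arjun", "brown"),
]

attribute_names = Student._fields                # the columns
hair_colors = [s.hair_color for s in students]   # one attribute across all objects

print(attribute_names)
print(hair_colors)
```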
Mapping the data warehouse to a
multiprocessor architecture
To manage a large number of client requests efficiently, database vendors designed parallel hardware
architectures by implementing multiserver and multithreaded systems. This is called interquery
parallelism, in which different server threads handle multiple requests at the same time.
It can be implemented on SMP systems, where it increases throughput and allows the support
of more concurrent users.
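Interquery parallelism can be sketched with a thread pool in which each worker thread serves one client request. This shows the structure only and is not a performance claim: the `SALES` data and the functions `handle_request` and `serve` are hypothetical, and a real database server manages its own threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only data shared by all server threads (made-up rows).
SALES = [("north", 120), ("south", 80), ("north", 50), ("east", 70)]

def handle_request(region):
    """One client query: total sales for a region."""
    return region, sum(amount for r, amount in SALES if r == region)

def serve(requests, workers=3):
    """Run independent queries concurrently, one request per worker thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(handle_request, requests))

totals = serve(["north", "south", "east"])
print(totals)
```

Each query runs independently of the others, so adding workers lets the server accept more requests at once; no single query becomes faster.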

A data warehouse can be mapped onto different types of architectures, as follows:

• Shared memory architecture

• Shared disk architecture

• Shared nothing architecture

Multiprocessor Architecture
This architecture is simple to implement, and the key idea is
that a single RDBMS server can potentially utilize all
processors, access all memory and access the entire
database.
There are three DBMS software architecture
styles for parallel processing:
1. Shared memory or shared everything Architecture
2. Shared disk architecture
3. Shared nothing architecture
1. Shared Memory Architecture
Tightly coupled shared memory systems, illustrated in the following figure,
have the following characteristics:
• Multiple PUs share memory.
• Each PU has full access to all shared memory through a common bus.
• Communication between nodes occurs via shared memory.
• Performance is limited by the bandwidth of the memory bus.
Parallel processing advantages of shared memory
systems are these:
• Memory access is cheaper than inter-node
communication. This means that internal
synchronization is faster than using the Lock
Manager.
• Shared memory systems are easier to administer
than a cluster.
A disadvantage of shared memory systems for
parallel processing is as follows:
• Scalability is limited by bus bandwidth and latency,
and by available memory.
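As a loose analogy for the shared memory model, the sketch below uses threads in one process: every worker reads and writes the same in-memory structure, and synchronization is an in-process lock rather than messages between nodes. The names `shared_counts` and `scan_partition` are made up, and Python threads only approximate processing units on a shared bus.

```python
import threading

# State visible to every worker, analogous to memory shared by all PUs.
shared_counts = {"rows_scanned": 0}
lock = threading.Lock()  # cheap in-memory synchronization

def scan_partition(rows):
    """Count rows locally, then publish the result to shared memory."""
    local = len(rows)
    with lock:  # protect the shared counter from concurrent updates
        shared_counts["rows_scanned"] += local

# Hypothetical partitions of a table (made-up sizes).
partitions = [list(range(100)), list(range(250)), list(range(50))]

threads = [threading.Thread(target=scan_partition, args=(p,)) for p in partitions]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_counts["rows_scanned"])
```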
Shared Disk Architecture
Shared disk systems are typically loosely coupled.
Such systems, illustrated in the following figure, have
the following characteristics:
• Each node consists of one or more PUs and
associated memory.
• Memory is not shared between nodes.
• Communication occurs over a common high-
speed bus.
• Each node has access to the same disks and other
resources.
• A node can be an SMP if the hardware supports it.
• Bandwidth of the high-speed bus limits the
number of nodes (scalability) of the system.
Shared Nothing Architecture
Advantages
• Shared nothing systems provide for incremental growth.
• System growth is practically unlimited.
• MPPs (massively parallel processing systems) are good for read-only
databases and decision support applications.
• Failure is local: if one node fails, the others stay up.
Disadvantages
• More coordination is required.
• More overhead is required for a process working on a disk belonging
to another node.
• If there is a heavy workload of updates or inserts, as in an online
transaction processing system, it may be worthwhile to consider
data-dependent routing to alleviate contention.
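Data-dependent routing in a shared nothing system can be sketched as hash partitioning: each node owns its own storage, and a request is sent to the one node that owns the key. The classes `Node` and `SharedNothingCluster` and the use of CRC32 as the hash are assumptions made for this example, not a description of any particular product.

```python
import zlib

class Node:
    """A node with private storage; nothing is shared with other nodes."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

class SharedNothingCluster:
    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def route(self, key):
        """Data-dependent routing: a stable hash of the key picks the owner."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def insert(self, key, value):
        self.route(key).rows[key] = value

    def lookup(self, key):
        return self.route(key).rows.get(key)

cluster = SharedNothingCluster(["n0", "n1", "n2"])
for i in range(6):
    cluster.insert(f"cust-{i}", i)

print(cluster.lookup("cust-3"))
print({n.name: len(n.rows) for n in cluster.nodes})
```

Because each key always maps to the same node, an update touches only that node's disk, which avoids the overhead of a process working on a disk that belongs to another node.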
