Data Warehousing and Data Mining - Thara - M.Tech Cse
SMVEC ASSIGNMENT NO 1 – DATA MINING AND DATA WAREHOUSING
D.THARA PARAMESWARI
Task 1 : book names with authors; Task 2 : comparative analysis; Task 3 : brief introduction to data warehousing, data
mining, metadata, and data sets; Task 4 : various tools; Task 5 : big data workshop notes.
S.NO  AUTHOR NAME – TITLE NAME – RACK NO ( STUD SESSION )
1. Jiawei Han and Micheline Kamber – "Data Mining: Concepts and Techniques", Elsevier, 2nd Edition, 2008 – 230, 225
2. Alex Berson and Stephen J. Smith – "Data Warehousing, Data Mining & OLAP" – 204, 225, 217
3. K.P. Soman, Shyam Diwakar and V. Ajay – "Insight into Data Mining: Theory and Practice", Prentice Hall of India, Eastern Economy Edition, 2006 – 217, 230
4. G. K. Gupta – "Introduction to Data Mining with Case Studies", Prentice Hall of India, Eastern Economy Edition, 2006 – 217, 230
5. Pang-Ning Tan, Michael Steinbach and Vipin Kumar – "Introduction to Data Mining", Pearson Education, 2007 – 230
6. Kargupta, Joshi, Sivakumar and Yesha – "Data Mining: Next Generation Challenges and Future Directions" – 209
7. Sam Anahory and Dennis Murray – "Data Warehousing in the Real World" – 230
TASK 1 :
1. DATA Vs INFORMATION
2. OPEN DATA Vs INFORMATIONAL DATA
3. DATABASE SYSTEM Vs INFORMATION SYSTEM
4. PROPRIETARY SYSTEM Vs OPEN SYSTEM
Data Vs Information
Data and information are both important to the design of databases. The term data describes raw facts (not yet
processed) about something or someone: numbers and text with no context, from which the required information is
derived. Information is data with context: processed data, with value added, that has been summarized, organized,
and analyzed.
For Example :
Data : 211215
Information : 21/12/15 is the review date of the first phase of the project;
211215 can also be a salary figure;
211215 can also be a zip code.
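To make the role of context concrete, here is a minimal Python sketch (an illustration added alongside this assignment, not taken from the cited texts; the date format and the three interpretations are assumptions):

```python
# The raw fact "211215" means nothing by itself; each applied context
# turns the same data into different information.
from datetime import datetime

raw = "211215"

# Context 1: a review date in day-month-year form.
as_date = datetime.strptime(raw, "%d%m%y")
print("Review date:", as_date.strftime("%d/%m/%y"))  # 21/12/15

# Context 2: a salary figure.
print("Salary:", int(raw))

# Context 3: a zip code (kept as text, since it is an identifier).
print("Zip code:", raw)
```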
Data : Another example is data representing the growth of a company's stock price. The data are
6.34, 6.45, 6.39, 6.62, 6.57, 6.64, 6.71, 6.82, 7.12, 7.06.
Information : [Figure: a graph of the processed data – the stock price of SIRIUS SATELLITE RADIO INC., rising from about $6.00 to $7.20.]
Another example of information derived from raw data : Sham is 16 years old, is in twelfth standard, and scored 80%
in mathematics.
raw data (input) -> processing -> information (output)
Processing includes summarizing, computing averages, graphing, creating charts, and visualizing data. Some processing
systems are also called navigation systems, for example a specialized geographic information system.
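As a minimal sketch of such processing (added here for illustration; the figures are the stock prices quoted above):

```python
# Processing raw data into information: summarizing and computing
# an average over the stock prices listed above.
from statistics import mean

raw_data = [6.34, 6.45, 6.39, 6.62, 6.57, 6.64, 6.71, 6.82, 7.12, 7.06]

average = mean(raw_data)             # computing the average
change = raw_data[-1] - raw_data[0]  # overall movement across the period
trend = "rising" if change > 0 else "falling"

# The same numbers, now carrying context: a summary a reader can act on.
print(f"Average price: ${average:.2f}")
print(f"Overall change: ${change:+.2f} ({trend})")
```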
Proprietary System Vs Open System
By contrast, proprietary systems are closed-source software that follows few, if any, of the openness requirements of
open-source software. Most proprietary software has a limited license, often costs money, cannot be redistributed, and
cannot be altered. The primary advantage of an open-source system is that it is often at least moderately powerful
compared to a proprietary system, yet free of charge.
For Example :
When it comes to databases, one of the most popular open-source solutions is MySQL. MySQL has many of the
features that can be found in most commercial, proprietary database management systems (Lorini, 2010). MySQL is
robust, has high availability, and has a GUI management system comparable to Microsoft's SQL Server
Management Studio. MSSQL Server, however, does have a few advantages unavailable in MySQL, such as partitioning
and external rights management, features which may not be needed by a small business. As in most cases, the needs of
the user must be identified before a database solution is chosen. Generally speaking, the smaller the business, the more
likely an open-source solution will work best. As a company grows in size, however, more suitable solutions must be
found: solutions that provide more support and a greater number of features. In the end, the user must decide which
solution works best for their purpose.
1. DATA WAREHOUSING
2. DATA MARTING
3. DATA MINING
4. META DATA
5. DATA SETS
Data mining is also known as Knowledge Discovery from Data ( KDD ). Data mining means the extraction of data, or the
finding of interesting patterns, in large data sets. Terms used as synonyms for data mining include knowledge mining
from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. The essential steps in the
process of knowledge discovery are listed below (a small code sketch of the pipeline follows the list):
1. Data cleaning : noise and inconsistencies are removed so that the data are noise free.
2. Data Integration : multiple data sources may be combined, and the required information stored in a coherent
data store, as in data warehousing.
3. Data selection : data relevant to the analysis task are retrieved from the database ( the coherent store ).
4. Data transformation : the data are transformed or consolidated into forms appropriate for mining by
performing summary or aggregation operations.
5. Data Mining : an essential process where intelligent methods are applied in order to extract data patterns.
6. Pattern Evaluation : the truly interesting patterns representing knowledge are identified, based on
interestingness measures.
7. Knowledge Presentation : visualization and knowledge representation techniques are used to present the
mined knowledge to the user.
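The following is a minimal pandas sketch of steps 1 to 4 (the tiny in-memory tables and the column names are illustrative assumptions, not data from the assignment):

```python
# A hypothetical walk through the first knowledge-discovery steps.
import pandas as pd

# Two "sources" waiting to be integrated (step 2).
sales = pd.DataFrame({"cust_id": [1, 2, 2, 3, 3],
                      "amount": [120.0, None, 80.0, 95.0, 95.0]})
customers = pd.DataFrame({"cust_id": [1, 2, 3],
                          "region": ["north", "south", "south"]})

# 1. Data cleaning: drop rows with missing values, remove duplicates.
sales = sales.dropna().drop_duplicates()

# 2. Data integration: combine the sources into one coherent store.
store = sales.merge(customers, on="cust_id")

# 3. Data selection: keep only the columns relevant to the analysis task.
selected = store[["region", "amount"]]

# 4. Data transformation: summarize/aggregate into a form suited for mining.
transformed = selected.groupby("region")["amount"].agg(["count", "mean"])

# Steps 5-7 (mining, pattern evaluation, presentation) would follow;
# as the simplest stand-in we present the aggregated pattern directly.
print(transformed)
```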
[Figure: the knowledge-discovery process – data from databases, data warehouses, the WWW, and other information
repositories is cleaned, integrated, and transformed into an appropriate form for mining; patterns are evaluated against
a knowledge base and presented through the user interface.]
Before designing a data mart, one must make sure that strategies appropriate for the particular situation are chosen.
To reduce cost and make the data mart fit your bill, the following steps are to be followed.
Note : it is recommended that data first be loaded into an enterprise data warehouse and then be data marted.
First, determine whether the business is structured in such a way as to benefit from functionally splitting the data.
For example, consider a retail organization in which each merchant is responsible for maximizing the sales of a group
of products.
This means that the information in a data warehouse will have its value when it contains:
sales transactions at the daily level, to monitor actual sales;
sales forecasts on a weekly basis;
stock positions on a daily basis, to monitor stock levels;
stock movements on a daily basis, to monitor supplier or shrinkage issues.
All this information can form substantial data volumes when, by the nature of the role, the merchant is not
interested in products that he or she is not responsible for.
[Figure: an enterprise data warehouse holding summary information, detailed information, and metadata, feeding
separate data marts for Department 1, Department 2, and Department 3.]
One may therefore consider data marting the subset of data dealing with the product group of interest, because the
merchant is unlikely to query the other products. A big question arises, however, when a product moves from one
department to another.
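A minimal sketch of such a functional split (the table layout and the product groups are assumptions for illustration):

```python
# Splitting warehouse data into a departmental data mart by product group.
import pandas as pd

# Enterprise warehouse: daily sales for all product groups.
warehouse = pd.DataFrame({
    "day": ["2015-12-01", "2015-12-01", "2015-12-02", "2015-12-02"],
    "product_group": ["grocery", "clothing", "grocery", "electronics"],
    "sales": [1200.0, 450.0, 1340.0, 980.0],
})

# Department 1's merchant is responsible only for these groups,
# so the data mart holds just that subset of the warehouse.
dept1_groups = {"grocery"}
dept1_mart = warehouse[warehouse["product_group"].isin(dept1_groups)]

print(dept1_mart)
```

If a product later moves to another department, the subset predicate changes and the mart must be rebuilt, which is exactly the concern raised above.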
With further investigation, the functional departmental split is valid, but it requires additional information. The
advantage of transformation and mapping tools is that they will do all of the above and more. The main disadvantage
of these tools is their cost; another prime disadvantage of many transformation tools is that the code they generate is
not efficient.
Data sets
There are several standard datasets that we will come back to repeatedly. Different datasets tend to expose new issues
and challenges, and it is interesting and instructive to have in mind a variety of problems when considering learning
methods. In fact, the need to work with different datasets is so important that a corpus containing around 100 example
problems has been gathered together so that different algorithms can be tested and compared on the same set of
problems.
Another problem with actual real-life datasets is that they are often proprietary. No corporation is going to share its
customer and product choice database with you so that you can understand the details of its data mining application
and how it works. Corporate data is a valuable asset, one whose value has increased enormously with the development
of data mining techniques.
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely
fictitious, it supposedly concerns the conditions that are suitable for playing some unspecified game. In general,
instances in a dataset are characterized by the values of features, or attributes, that measure different aspects of the
instance. In this case there are four attributes: outlook, temperature, humidity, and windy.
Table 1.2 The weather data
Outlook     Temperature   Humidity   Windy   Play
sunny       hot           high       false   no
sunny       hot           high       true    no
overcast    hot           high       false   yes
rainy       mild          high       false   yes
rainy       cool          normal     false   yes
rainy       cool          normal     true    no
overcast    cool          normal     true    yes
sunny       mild          high       false   no
sunny       cool          normal     false   yes
rainy       mild          normal     false   yes
sunny       mild          normal     true    yes
overcast    mild          high       true    yes
overcast    hot           normal     false   yes
rainy       mild          high       true    no
The rules we have seen so far are classification rules: they predict the classification of the example in terms of whether
or not to play. It is equally possible to disregard the classification and just look for any rules that strongly associate
different attribute values. These are called association rules. Many association rules can be derived from the weather
data in Table 1.2. Some good ones are as follows:
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
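The following is a small, self-contained Python sketch (an illustration, not the book's code) that finds single-attribute association rules holding without exception in the weather data by simple counting; the minimum-support threshold is an assumption:

```python
# Search the weather data for rules "attr = v -> attr2 = w" that hold
# for every matching instance (100% confidence) with a minimum support.
from itertools import permutations

ATTRS = ["outlook", "temperature", "humidity", "windy", "play"]
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]

def association_rules(rows, min_support=3):
    """Yield (lhs_attr, lhs_val, rhs_attr, rhs_val, support) tuples."""
    for a, b in permutations(range(len(ATTRS)), 2):
        for value in {row[a] for row in rows}:
            matching = [row for row in rows if row[a] == value]
            if len(matching) < min_support:
                continue
            consequents = {row[b] for row in matching}
            if len(consequents) == 1:  # the rule holds with no exceptions
                yield ATTRS[a], value, ATTRS[b], consequents.pop(), len(matching)

for la, lv, ra, rv, support in association_rules(ROWS):
    print(f"if {la} = {lv} then {ra} = {rv}  (support {support})")
```

Run on the table above, this prints, among others, "if temperature = cool then humidity = normal (support 4)", the first rule listed.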
TASK 5:
Various tools
General-purpose data mining tools such as Clementine and Enterprise Miner are designed to analyze large
commercial databases. Although these tools were primarily designed to identify customer buying patterns in market
basket data, they have also been used in analyzing scientific and engineering data, astronomical data, multimedia data,
genomic data, and web data.
MLC++ was also designed as an object-oriented library, extensible through algorithms written by a user who could
reuse parts of the library as desired. Command-line interfaces, limited interaction with the data analysis environment,
and textual output of inferred models and their performance scores were not things a physician or medical researcher
would get too excited about. To be optimally useful for researchers, data mining programs needed to provide built-in
data visualization and the ability to easily interact with the program. With the evolution of graphical user interfaces and
the operating systems that supported them, data mining programs started to incorporate these features. MLC++, for
instance, was acquired by Silicon Graphics in the mid-1990s and turned into MineSet, at that time the most
sophisticated data mining environment, with many interesting data and model visualizations. MineSet implemented an
interface whereby the data analysis schema was in a way predefined: the user could change the parameters of the
analysis methods, but not the composition of the complete analysis pathway.
Clementine (http://www.spss.com/clementine), another popular commercial data mining suite, pioneered user control
over the analysis pathway by embedding various data mining tasks within separate components that were placed in the
analysis schema and then linked with each other to construct a particular analysis pathway. Several modern open-source
data mining tools use a similar visual programming approach that, because it is flexible and simple to use, may be
particularly appealing to data analysts and users with backgrounds other than computer science.
Flexibility and extensibility in analysis software arise from being able to use existing code to develop or extend one’s
own algorithms.
For example,
Weka (http://www.cs.waikato.ac.nz/ml/weka/), a popular data mining suite, offers a library of well-documented Java-
based functions and classes that can be easily extended, provided sufficient knowledge of Weka's architecture and Java
programming. A somewhat different approach has been taken by other packages, including R (http://www.r-
project.org), one of the most widely known open-source statistical and data mining suites. Besides being extensible
with functions written in C (the language of its core), R implements its own scripting language with an interface to its
C functions. Most extensions of R are then implemented as scripts, requiring no source-code compilation or use of a
special development environment.
Recently, with advances in the design and performance of general purpose scripting languages and their growing
popularity, several data mining tools have incorporated these languages. The particular benefits of integration with a
scripting language are speed (all computationally intensive routines are still implemented in some fast low-level
programming language and are callable from the scripting language), flexibility (scripts may integrate functions from
the core suite with functions from the scripting language's native library), and extensibility that goes beyond the sole
use of the data mining suite, through the use of other packages that interface with that scripting language. Although
harder to learn and use for novices and those with little expertise in computer science or math than systems driven
completely by graphical user interfaces, scripting in data mining environments is essential for fast prototyping and
development of new techniques and is a key to the success of packages like R.
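As a brief illustration of this style of work (a sketch using scikit-learn, which is an assumption of this rewrite; the cited paper itself discusses suites such as R), a complete analysis takes a few lines of script while the computationally intensive routines run in compiled code:

```python
# A whole mining experiment in a few lines of scripting: the tree
# induction and cross-validation loops execute in fast low-level code.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # a bundled example dataset
model = DecisionTreeClassifier(max_depth=3)  # the "core suite" learner
scores = cross_val_score(model, X, y, cv=10) # 10-fold cross-validation
print(f"10-fold cross-validation accuracy: {scores.mean():.2f}")
```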
Reference :
Blaz Zupan, PhD and Janez Demsar, PhD, "Open-Source Tools for Data Mining", Clinics in Laboratory Medicine, vol. 28,
2008, pp. 37-54.