DBMS, Data Warehousing and Data Mining

This document discusses database management systems (DBMS), data warehousing, and data mining. It defines key terms like bit, byte, field, and record. It describes the functions of a DBMS like storing, manipulating, and presenting data. It outlines characteristics of the DBMS approach like data abstraction and centralized control. It also lists advantages of using a DBMS and classifications of DBMS. Finally, it provides an overview of what constitutes a database including user data, metadata, indexes, and application programs.

Uploaded by

sajjukrish
Copyright
© Attribution Non-Commercial (BY-NC)

DBMS, Data Warehousing and Data Mining


TERMINOLOGY
• Bit
• Byte
• Field
• Record
• File
• Database
• Entity
• Attribute
• Key field
DBMS
• A database management system (DBMS) is a set of computer-based
application programs that support the processes of storing,
manipulating, retrieving and presenting the data within the database.
• Acts as an interface between application programs & physical data files
• The data within the database are organized into tables, records, and fields.
Characteristics of the DBMS Approach
• Self-contained nature
• Program-data independence
• Data abstraction
• Support for multiple views
• Centralised control of the data resource
– reduces redundancy
– avoids inconsistencies
– data can be shared
– standards can be enforced
– security restrictions can be applied
– integrity can be maintained
• Sharing of data
• Multiuser transaction processing
Advantages of DB
• Controlling Redundancy
• Restricting Unauthorized Access
• Providing persistent storage for program objects
• Providing storage structures for efficient query
processing
• Providing backup and recovery
• Providing multi-user interfaces
• Representing complex relationships among data
• Enforcing integrity constraints
• Permitting inferencing and actions using rules
• Additional implications of using the DB approach
Classification of DBMS
• Based on Data Model
– Relational, object, object-relational, hierarchical,
network
• Based on number of users
– Single-user and Multi-user
• Number of sites
– Centralized, Distributed, Homogeneous
• Cost
• Types of Access path
• General purpose or special purpose
Actors on the DBMS Scene
– Data administrator (DA)
– Database administrator (DBA)
– Database designers
– Users
– casual end users
– Application programmers
Contents of a Database
• A Database contains:
– User Data
– Metadata
– Indexes
– Applications
User Data
• End users work directly with the DBMS by entering, updating and
viewing the data. Typically they use a query language such as SQL.
• In a relational DB, data will be generally stored in tables with some
relationships between tables.
• Each table has one or more columns (attributes).
• For example, below is a bank account table.

Customer ID  Acct Number  Acct Type  Date Opened   Balance
1001         9987         Checking   10/12/1998    4000.00
1001         9980         Savings    10/12/1998    2000.00
1002         8811         Savings    1/5/1999     10000.00
1003         4422         Checking   10/1/2000     6000.00
1003         4432         Savings    12/11/2000    9000.00
1004         3294         Savings    8/22/1997      500.00
1004         5445         Checking   11/13/1996     800.00
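As a minimal sketch of how an end user might enter and view this data with SQL, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name bank_account and the column names are assumptions chosen for illustration, and the dates are rewritten in ISO format (reading the slide's dates as month/day/year).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bank_account (
        customer_id INTEGER,
        acct_number INTEGER PRIMARY KEY,
        acct_type   TEXT,
        date_opened TEXT,   -- ISO format YYYY-MM-DD
        balance     REAL
    )
    """
)

# Entering data: the seven rows of the bank account table.
rows = [
    (1001, 9987, "Checking", "1998-10-12", 4000.00),
    (1001, 9980, "Savings",  "1998-10-12", 2000.00),
    (1002, 8811, "Savings",  "1999-01-05", 10000.00),
    (1003, 4422, "Checking", "2000-10-01", 6000.00),
    (1003, 4432, "Savings",  "2000-12-11", 9000.00),
    (1004, 3294, "Savings",  "1997-08-22", 500.00),
    (1004, 5445, "Checking", "1996-11-13", 800.00),
]
conn.executemany("INSERT INTO bank_account VALUES (?, ?, ?, ?, ?)", rows)

# Viewing data: all savings accounts, largest balance first.
savings = conn.execute(
    "SELECT acct_number, balance FROM bank_account "
    "WHERE acct_type = 'Savings' ORDER BY balance DESC"
).fetchall()
print(savings)
```

The same SELECT statement could be typed directly into any SQL console; Python is used here only to keep the example self-contained.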
Metadata
• Data about data.
• Data that describe how user data are stored, in terms
of table name, column name, data type, length, primary
keys, etc.
• Metadata are typically stored in system tables and are
typically only directly accessible by the DBMS or by the
system administrator.
• For example, the metadata for the bank account table
could be:
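One way to look at such metadata in practice is to ask the DBMS itself. The sketch below uses SQLite through Python's sqlite3 module; the table definition (column names, declared types and lengths) is an assumption made for illustration and is not taken from the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical definition of the bank account table.
conn.execute(
    """
    CREATE TABLE bank_account (
        customer_id INTEGER NOT NULL,
        acct_number INTEGER PRIMARY KEY,
        acct_type   VARCHAR(10),
        date_opened DATE,
        balance     DECIMAL(12,2)
    )
    """
)

# sqlite_master is SQLite's system table listing every schema object.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (position, name, declared type, not-null flag, default value, primary-key flag)
columns = conn.execute("PRAGMA table_info(bank_account)").fetchall()

print(tables)
for position, name, declared_type, not_null, default, pk in columns:
    print(f"{name:<12} {declared_type:<14} not_null={not_null} primary_key={pk}")
```

Other DBMSs expose the same kind of information through their own system tables or catalog views.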
Indexes
• Allow users to access a specific record without having to search
through the entire table
– For example, an index would be used to find all customers who
opened an account before 01/01/2000. In this case the bank account
table is indexed on the date opened attribute (see below)
• Indexes provide efficient data access, but they are
expensive to maintain: updating data requires an extra step,
because the index(es) must also be updated.

Customer ID  Acct Number  Acct Type  Date Opened   Balance
1004         5445         Checking   11/13/1996     800.00
1004         3294         Savings    8/22/1997      500.00
1001         9987         Checking   10/12/1998    4000.00
1001         9980         Savings    10/12/1998    2000.00
1002         8811         Savings    1/5/1999     10000.00
1003         4422         Checking   10/1/2000     6000.00
1003         4432         Savings    12/11/2000    9000.00
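The idea can be sketched in a few lines of Python. This is a simplified illustration, not how a real DBMS stores its indexes (those are usually B-trees on disk): the index here is just a sorted list of (date opened, row position) pairs, and the names date_index, opened_before and add_account are hypothetical.

```python
from bisect import bisect_left, insort
from datetime import date

# The table in its original (unsorted) storage order.
accounts = [
    (1001, 9987, "Checking", date(1998, 10, 12), 4000.00),
    (1001, 9980, "Savings",  date(1998, 10, 12), 2000.00),
    (1002, 8811, "Savings",  date(1999, 1, 5),   10000.00),
    (1003, 4422, "Checking", date(2000, 10, 1),  6000.00),
    (1003, 4432, "Savings",  date(2000, 12, 11), 9000.00),
    (1004, 3294, "Savings",  date(1997, 8, 22),  500.00),
    (1004, 5445, "Checking", date(1996, 11, 13), 800.00),
]

# The index: (date_opened, row position) pairs kept sorted by date.
date_index = sorted((row[3], pos) for pos, row in enumerate(accounts))

def opened_before(cutoff):
    """Binary-search the index instead of scanning the whole table."""
    end = bisect_left(date_index, (cutoff,))
    return [accounts[pos] for _, pos in date_index[:end]]

def add_account(row):
    """The maintenance cost: every insert must also update the index."""
    accounts.append(row)
    insort(date_index, (row[3], len(accounts) - 1))

early = [row[1] for row in opened_before(date(2000, 1, 1))]
print(early)
```

The lookup touches only the matching index entries, while add_account shows the extra step that every update has to pay for.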
Forms/Report Generators/Application Programs
• Many DBMSs have the capability to handle forms (for users to
enter/access/update data), reports, and other application
components.
– A report is an organized representation, designed to be printed, of
the information in your tables or queries. You can create a report
from a single table or from a query of two or more tables.
– A query allows you to ask questions of your information. A database
management system, such as Microsoft Access, uses your
questions to generate a subset of the data in your database.
– A form is a convenient way to enter or find information in tables.
• Applications are programs written in various languages
to access and manipulate the data. Each application is
designed for a specific aspect of a given functional area, e.g.,
payroll, accounting, etc.
Data Modeling and Database Design
• Database Schema: The structure of a database that:
– Represents data elements, data types, relationships among data
elements, and constraints on data
– Is independent of any application program
– Typically, changes infrequently
• Data Model:
– A set of primitives for defining the structure of a database.
– A set of operations for specifying retrievals and updates on a database.
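To make the notion of a schema concrete, here is a small sketch in SQL run through Python's sqlite3 module. It declares data elements, data types, a relationship between two tables, and constraints, all independently of any application program. The customer table, the constraint choices and the helper try_insert are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY
    );
    CREATE TABLE bank_account (
        acct_number INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        acct_type   TEXT NOT NULL CHECK (acct_type IN ('Checking', 'Savings')),
        balance     REAL NOT NULL CHECK (balance >= 0)
    );
    """
)
conn.execute("INSERT INTO customer VALUES (1001)")
conn.commit()

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO bank_account VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((9987, 1001, "Checking", 4000.00)),  # valid row
    try_insert((9980, 1001, "Savings", -50.00)),    # violates balance >= 0
    try_insert((8811, 9999, "Savings", 100.00)),    # no such customer
]
print(results)
```

Because the rules live in the schema, every application that uses the database gets the same checks without repeating them in its own code.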
A producer wants to know…
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
Data problems and difficulties
• The amount of data increases exponentially with
time.
• Data are scattered throughout organizations and
are collected by many individuals using several
methods.
• Data security, quality & integrity are critical, yet
are easily jeopardized.
• Selecting data management tools can be a major
problem.
Data warehouse
• The main repository of an organization's
historical data, its corporate memory
• Contains the raw material for
management's decision support system
• A data analyst can perform complex
queries and analysis, such as data mining,
on the information without slowing down
the operational systems
Data warehouse
• In data warehousing, you create stores
of informational data, data that is
extracted from the operational data and
then transformed for decision making.
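The extract-and-transform step described above can be sketched roughly as follows. The operational records, the inconsistent type codes and the function names are all hypothetical; a real warehouse load would read from operational databases and write to warehouse tables rather than in-memory lists.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical operational records, coded inconsistently by different systems.
operational_rows = [
    {"acct": 9987, "type": "CHK",      "date": "10/12/1998", "amount": "250.00"},
    {"acct": 9980, "type": "savings",  "date": "10/30/1998", "amount": "100.00"},
    {"acct": 9987, "type": "Checking", "date": "11/02/1998", "amount": "-75.50"},
]

# Map every source spelling onto one consistent code.
TYPE_CODES = {"chk": "Checking", "checking": "Checking",
              "sav": "Savings", "savings": "Savings"}

def transform(row):
    """Clean one operational record into the warehouse's consistent format."""
    when = datetime.strptime(row["date"], "%m/%d/%Y").date()
    return {
        "acct_type": TYPE_CODES[row["type"].lower()],
        "month": when.strftime("%Y-%m"),
        "amount": float(row["amount"]),
    }

def load(clean_rows):
    """Summarise by month and account type, the form used for decision making."""
    totals = defaultdict(float)
    for row in clean_rows:
        totals[(row["month"], row["acct_type"])] += row["amount"]
    return dict(totals)

warehouse = load(transform(row) for row in operational_rows)
print(warehouse)
```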
Data Warehouse
• A data warehouse is a
– subject-oriented
– integrated
– time-variant
– non-volatile
collection of data that is used primarily in
organizational decision making.
-- Bill Inmon, Building the Data Warehouse 1996
Characteristics
• Organisation: Data are organised by subject & contain
information relevant for DSS only.
• Consistency: Data are coded in a consistent manner.
• Time variant: The data are kept for many years & used for
trends, forecasting, & comparisons.
• Non-volatile: Once entered into the warehouse, data are not
erased.
• Relational: The data warehouse uses a relational structure.
• Client/Server: Gives the end user easy access to its data.
• Web based: Provides an efficient computing environment for
web-based applications.
Advantages of data warehouse
• Enhances end-user access to a wide
variety of data.
• Decision support system users can obtain
specified trend reports, e.g. the item with
the most sales in a particular
area/country within the last two years.
• A data warehouse can be a significant
enabler of commercial business
applications, most notably CRM
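The trend report mentioned above (the item with the most sales in a particular country within the last two years) could be computed along these lines. The sales records, the item names and the function top_item are made up for illustration; in a real warehouse this would be a query over historical fact tables.

```python
from collections import Counter
from datetime import date

# Hypothetical historical sales: (item, country, sale date, units sold).
sales = [
    ("widget", "IN", date(2023, 3, 1),  10),
    ("gadget", "IN", date(2024, 6, 15), 25),
    ("widget", "IN", date(2024, 9, 9),  20),
    ("gadget", "US", date(2024, 1, 5),  99),
    ("gadget", "IN", date(2021, 2, 2),  500),  # too old to count
]

def top_item(sales, country, as_of, years=2):
    """Return (item, units) for the best seller in `country` over the window."""
    # Note: replace() would fail if as_of were 29 February; fine for a sketch.
    start = as_of.replace(year=as_of.year - years)
    totals = Counter()
    for item, sale_country, sale_date, units in sales:
        if sale_country == country and start <= sale_date <= as_of:
            totals[item] += units
    return totals.most_common(1)[0] if totals else None

best = top_item(sales, country="IN", as_of=date(2025, 1, 1))
print(best)
```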
[Figure: Operational data, historical data and external data pass through an Extract & Transform step into the Data Warehouse, which serves queries, reports, OLAP and data mining.]
Capabilities of data mining
• Automated prediction of trends and behaviors.
Data mining automates the process of finding
predictive information in large databases.
A typical example of a predictive problem is
targeted marketing. Data mining uses data on
past promotional mailings to identify the targets
most likely to maximize return on investment in
future mailings.
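A very simple version of the targeted-marketing example can be sketched as below. It assumes customers are already grouped into segments and that the cost per mailing and profit per response are known; the segment labels, the figures and the function names are hypothetical, and real data mining tools use far richer models than a per-segment response rate.

```python
from collections import defaultdict

# Hypothetical history: (customer segment, responded to the mailing?)
past_mailings = [
    ("A", True), ("A", False), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", True), ("B", False), ("B", False),
    ("C", False), ("C", False), ("C", False), ("C", False),
]

def response_rates(history):
    """Fraction of mailed customers in each segment who responded."""
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in history:
        mailed[segment] += 1
        responded[segment] += did_respond  # True counts as 1
    return {segment: responded[segment] / mailed[segment] for segment in mailed}

def pick_targets(rates, cost_per_mail, profit_per_response):
    """Keep segments whose expected profit per mailing exceeds its cost."""
    break_even = cost_per_mail / profit_per_response
    winners = [s for s, rate in rates.items() if rate > break_even]
    return sorted(winners, key=lambda s: rates[s], reverse=True)

rates = response_rates(past_mailings)
targets = pick_targets(rates, cost_per_mail=1.0, profit_per_response=4.0)
print(rates, targets)
```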
Capabilities of data mining (contd.)
• Automated discovery of previously unknown
patterns.
Data mining tools sweep through databases
and identify previously hidden patterns in one
step.
An example of pattern discovery is the
analysis of retail sales data to identify
seemingly unrelated products that are often
purchased together.
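The retail example can be illustrated with a small co-occurrence count. This is only a sketch of the idea behind association analysis, assuming each sale is recorded as a set of items; the baskets and the threshold min_count are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical retail transactions (one set of items per sale).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def frequent_pairs(baskets, min_count):
    """Count how often each pair of items is bought together."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a single canonical ordering
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(baskets, min_count=3)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Pairs such as beer and diapers, which appear unrelated, surface simply because they occur together in enough transactions.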
Techniques
• Case based reasoning
• Neural computing
• Intelligent agents
• Other tools
– Decision trees
– Genetic algorithms
– Nearest neighbor method
– Rule induction
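Of the tools listed, the nearest neighbor method is the easiest to show in a few lines. The sketch below classifies a new item by a majority vote among its k closest already-classified items; the two numeric features, the labels and the function name knn_classify are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (feature vector, known class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify(training, (1.1, 1.0)))
print(knn_classify(training, (5.0, 4.8)))
```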
Types of information obtained from data mining
• Associations
• Sequences
• Classifications
• Clustering
• Forecasting
Associations
• Occurrences linked to a single event
Sequence
• Events are linked over time
Classification
• Recognizes patterns that describe the
group to which an item belongs by
examining existing items that have been
classified and by inferring a set of rules
Clustering
• Works similarly to classification, but is used when no
groups have yet been defined
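As a rough illustration of clustering, the sketch below runs a one-dimensional k-means on the balances from the bank account table and lets the groups emerge from the data. K-means is one common clustering technique chosen here as an example; the slides do not name a specific algorithm, and the function kmeans_1d is hypothetical.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Group numbers into k clusters around their means (requires k >= 2)."""
    # Deterministic start: spread the initial centroids between min and max.
    lo, hi = min(values), max(values)
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return clusters

balances = [4000.0, 2000.0, 10000.0, 6000.0, 9000.0, 500.0, 800.0]
small, large = kmeans_1d(balances, k=2)
print(sorted(small), sorted(large))
```

No group labels were supplied in advance; the split into smaller and larger balances comes only from the values themselves.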
Forecasting
• Uses a series of existing values to predict
what other values will be
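One of the simplest ways to forecast from a series of existing values is a moving average, sketched below. The monthly figures and the window size are hypothetical, and practical forecasting tools would normally also account for trend and seasonality.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if len(series) < window:
        raise ValueError("series is shorter than the averaging window")
    return sum(series[-window:]) / window

# Hypothetical monthly sales figures.
monthly_sales = [100, 120, 110, 130, 125, 135]
forecast = moving_average_forecast(monthly_sales, window=3)
print(forecast)
```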
Data mining applications
• Retailing & Sales
• Banking
• Manufacturing & production
• Insurance
• Computer hardware & software
• Policework
• Government & defense
• Airlines
• Broadcasting