IM III Unit Notes


Unit III

DATABASE MANAGEMENT SYSTEMS

UNIT III DATABASE MANAGEMENT SYSTEMS (syllabus)

DBMS – types and evolution, RDBMS, OODBMS, RODBMS, Data warehousing, Data Mart,
Data mining

Database - defined
A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a database management
system (DBMS).
What is a Database?
A database is a systematic collection of data. Databases support electronic storage and
manipulation of data, and they make data management easy.
Together, the data and the DBMS, along with the applications that are associated with them, are
referred to as a database system, often shortened to just database.
Let us discuss a database example: Facebook. It needs to store, manipulate, and present data
related to members, their friends, member activities, messages, advertisements, and a lot more.
We can provide a countless number of examples for the usage of databases.
Types of Databases
Here are some popular types of databases.
1. Distributed databases:
A distributed database combines data held in a common database with information captured by
local computers. In this type of database system, the data is not kept in one place; it is
distributed across various locations or organizations.
2. Relational databases:
This type of database represents data and relationships in the form of tables. A system that
manages such a database is called a relational DBMS (RDBMS), which is the most popular DBMS type
in the market. Examples of RDBMSs include MySQL, Oracle, and Microsoft SQL Server.
3. Object-oriented databases:
This type of database supports the storage of all data types. The data is stored in the
form of objects. The objects held in the database have attributes and methods that define
what to do with the data. PostgreSQL is an example of an object-relational DBMS.
4. Centralized database:
The data is kept at one centralized location, and users from different backgrounds can access it.
This type of database stores application procedures that help users access the data even from a
remote location.
5. Data warehouses:
A data warehouse provides a single version of truth for a company for decision making and
forecasting. It is an information system that contains historical and cumulative
data from single or multiple sources. The data warehouse concept simplifies the reporting and
analysis process of the organization.
6. NoSQL databases:
A NoSQL database is used for large sets of distributed data. It addresses a few big data
performance problems that relational databases do not handle effectively. This type of database is
very efficient in analyzing large volumes of unstructured data.
7. Graph databases:
A graph-oriented database uses graph theory to store, map, and query relationships. These kinds
of databases are mostly used for analyzing interconnections. For example, an
organization can use a graph database to mine data about customers from social media.
8. OLTP databases:
OLTP (online transaction processing) is another database type, which is able to perform fast
query processing and maintain data integrity in multi-access environments.
9. Personal database:
A personal database stores data on personal computers; such databases are smaller and easily
manageable. The data is mostly used by the same department of the company and is accessed by
a small group of people.
10. Hierarchical:
This type of DBMS employs a “parent-child” relationship for storing data. Its structure is like a
tree, with nodes representing records and branches representing fields. The Windows Registry used
in Windows XP is an example of a hierarchical database.
11. Network DBMS:
This type of DBMS supports many-to-many relations. It usually results in complex database
structures. RDM Server is an example of a database management system that implements the
network model.
Some of the latest databases include
12. Open-source databases:
This kind of database stores information related to operations. It is mainly used in fields such
as marketing, employee relations, and customer service.
13. Cloud databases:
A cloud database is a database that is optimized or built for a virtualized environment.
A cloud database has many advantages; for example, you can pay for storage capacity
and bandwidth as you need them. It also offers scalability on demand, along with high availability.
14. Self-driving databases:
The newest and most groundbreaking type of database, self-driving databases (also known as
autonomous databases) are cloud-based and use machine learning to automate database
tuning, security, backups, updates, and other routine management tasks traditionally
performed by database administrators.
15. Multimodal database:
A multimodal database is a type of data processing platform that supports multiple data models,
each of which defines how the information in a database should be organized and
arranged.
16. Document/JSON database:
In a document-oriented database, the data is kept in document collections, usually in
XML, JSON, or BSON format. One record can store as much data as you want, in any data type
(or types) you prefer.

Database Components

There are five main components of a database:


Hardware:
The hardware consists of physical, electronic devices like computers, I/O devices, storage
devices, etc. This offers the interface between computers and real-world systems.
Software:
This is a set of programs used to manage and control the overall database. This includes the
database software itself, the Operating System, the network software used to share the data
among users, and the application programs for accessing data in the database.
Data:
Data is a raw, unorganized fact that must be processed to make it meaningful. Data
can be simple and yet remain unorganized until it is structured. Generally, data comprises
facts, observations, perceptions, numbers, characters, symbols, images, etc.
Procedure:
Procedures are the set of instructions and rules that help you use the DBMS. They cover designing
and running the database using documented methods, which guide the users who
operate and manage it.
Database Access Language:
A database access language is used to access the data in the database: to enter new data,
update existing data, or retrieve required data. The user writes specific
commands in a database access language and submits these to the DBMS.

DBMS
DBMS stands for Database Management System. We can break it down like this: DBMS =
Database + Management System.
A database management system stores data in such a way that it becomes easier to
retrieve, manipulate, and produce information. A DBMS is a collection of inter-related data
and a set of programs to store and access that data in an easy and effective manner.
What is database software?
Database software is used to create, edit, and maintain database files and records, enabling easier
file and record creation, data entry, data editing, updating, and reporting. The software also
handles data storage, backup and reporting, multi-access control, and security. Database software
is sometimes also referred to as a “database management system” (DBMS).

Database software makes data management simpler by enabling users to store data in a
structured form and then access it. It typically has a graphical interface to help create and
manage the data and, in some cases, users can construct their own databases by using database
software.

What is a Database Management System (DBMS)?


A Database Management System (DBMS) is a collection of programs that enables its users to
access databases, manipulate data, and report on and represent data. It also helps to control
access to the database. Database management systems are not a new concept; they were first
implemented in the 1960s.
Evolution of DBMS
Charles Bachman’s Integrated Data Store (IDS) is said to be the first DBMS in history. Over
time, database technologies evolved a lot, while the usage and expected functionality of databases
increased immensely.
History of Database Management System
Here are the important landmarks from the history:
• 1960 – Charles Bachman designed the first DBMS.
• 1970 – E. F. Codd, working at IBM, proposed the relational model of data.
• 1976 – Peter Chen coined and defined the Entity-relationship model, also known as the ER
model.
• 1980 – The relational model becomes a widely accepted database component.
• 1985 – Object-oriented DBMS develops.
• 1990 – Incorporation of object-orientation in relational DBMS.
• 1991 – Microsoft ships MS Access, a personal DBMS that displaces all other personal
DBMS products.
• 1995 – First Internet database applications.
• 1997 – XML applied to database processing. Many vendors begin to integrate XML into
DBMS products.
Advantages of DBMS
• DBMS offers a variety of techniques to store & retrieve data.
• DBMS serves as an efficient handler to balance the needs of multiple applications using
the same data.
• Uniform administration procedures for data.
• Application programmers are never exposed to details of data representation and storage.
• A DBMS uses various powerful functions to store and retrieve data efficiently.
• Offers Data Integrity and Security.
• The DBMS applies integrity constraints to provide a high level of protection against
prohibited access to data.
• A DBMS schedules concurrent access to the data in such a manner that only one user can
access the same data at a time.
• Reduced Application Development Time.
Disadvantages of DBMS
A DBMS may offer plenty of advantages, but it has certain flaws:
• The cost of hardware and software for a DBMS is quite high, which increases the budget of
your organization.
• Most database management systems are complex, so users need training
to use the DBMS.
• In some organizations, all data is integrated into a single database, which can be damaged
by an electrical failure or by corruption of the storage media.
• Use of the same program by many users at the same time sometimes leads to the loss of some
data.
• DBMS can’t perform sophisticated calculations.

A database typically requires a comprehensive database software program known as a database
management system (DBMS). A DBMS serves as an interface between the database and its end
users or programs, allowing users to retrieve, update, and manage how the information is
organized and optimized. A DBMS also facilitates oversight and control of databases, enabling a
variety of administrative operations such as performance monitoring, tuning, and backup and
recovery.

Examples of popular database software or DBMSs include MySQL, Microsoft Access,
Microsoft SQL Server, FileMaker Pro, Oracle Database, and dBASE.

Need of DBMS

Database systems are basically developed for large amounts of data. When dealing with
a huge amount of data, there are two things that require optimization:
(i) storage of data and (ii) retrieval of data.
Storage: According to the principles of database systems, the data is stored in such a way
that it occupies much less space, because redundant (duplicate) data is removed
before storage. Let’s take a simple example to understand this:
In a banking system, suppose a customer has two accounts: a savings account
and a salary account. Let’s say the bank stores savings account data in one place
(these places are called tables; we will learn about them later) and salary account data in
another. If customer information such as name and address is
stored in both places, storage is wasted (redundancy, or duplication of
data). To organize the data better, the information should be stored in one place
and both accounts should be linked to it. This is what we
achieve in a DBMS.
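The banking example can be sketched with Python's built-in sqlite3 module. This is only an
illustration under assumptions: the table names (customer, account), the column names and the
sample values are hypothetical and do not come from the notes. The customer's name and address
are stored once, and both accounts point to that single row.

```python
import sqlite3

# In-memory database; all table names, column names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL
);
CREATE TABLE account (
    account_no   INTEGER PRIMARY KEY,
    account_type TEXT NOT NULL,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")

# The customer details are stored exactly once.
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Chennai')")
# Both accounts are linked to that one customer row.
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(5001, "savings", 1), (5002, "salary", 1)],
)
conn.commit()

# A join brings the shared customer details back for each account.
rows = conn.execute("""
    SELECT a.account_no, a.account_type, c.name, c.address
    FROM account AS a
    JOIN customer AS c ON c.customer_id = a.customer_id
    ORDER BY a.account_no
""").fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Changing the address now means updating one row in customer, and both accounts see the change.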
Fast Retrieval of data: Along with storing the data in an optimized and systematic
manner, it is also important that we retrieve the data quickly when needed. Database
systems ensure that the data is retrieved as quickly as possible.
Purpose of Database Systems
The main purpose of database systems is to manage data. Consider a university that
keeps data on students, teachers, courses, books, etc. To manage this data we need to
store it somewhere we can add new data, delete unused data, update
outdated data, and retrieve data. To perform these operations we need a database
management system that stores the data in such a way that all these
operations can be performed efficiently.
Applications where we use Database Management Systems are:

• Telecom: There is a database to keep track of information regarding calls
made, network usage, customer details, etc. Without database systems it is hard
to maintain such a huge amount of data, which is updated every millisecond.
• Industry: Whether it is a manufacturing unit, warehouse or distribution centre, each
one needs a database to keep records of ins and outs. For example, a distribution
centre should keep track of the product units supplied to the centre as well
as the products delivered from it each day; this
is where a DBMS comes into the picture.
• Banking System: For storing customer info, tracking day-to-day credit and debit
transactions, generating bank statements, etc. All this work is done with the
help of database management systems.
• Sales: To store customer information, production information and invoice details.
• Airlines: To travel through airlines, we make early reservations; this reservation
information, along with the flight schedule, is stored in a database.
• Education sector: Database systems are frequently used in schools and colleges to
store and retrieve data regarding student details, staff details, course details,
exam details, payroll data, attendance details, fees details, etc. There is a very large
amount of inter-related data that needs to be stored and retrieved in an efficient
manner.
• Online shopping: You must be aware of the online shopping websites such as
Amazon, Flipkart etc. These sites store the product information, your addresses and
preferences, credit details and provide you the relevant list of products based on
your query. All this involves a Database management system.

Drawbacks of File system


• Data redundancy: Data redundancy refers to the duplication of data. Let’s say we
are managing the data of a college where a student is enrolled in two courses; the
same student details will then be stored twice, which takes more storage
than needed. Data redundancy often leads to higher storage costs and poor access
time.
• Data inconsistency: Data redundancy leads to data inconsistency. Take the
same example as above: a student is enrolled in two courses and
the student's address is stored twice. If the student requests a change of
address and the address is changed in one place but not in all the records, the
data becomes inconsistent.
• Data Isolation: Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the appropriate data
is difficult.
• Dependency on application programs: Changing files would lead to changes in
application programs.
• Atomicity issues: Atomicity of a transaction means “all or nothing”: either
all the operations in a transaction execute or none do. This is difficult to
guarantee in file processing systems.
• Data Security: Data should be secured from unauthorised access. For example, a
student in a college should not be able to see the payroll details of the teachers. Such
security constraints are difficult to apply in file processing systems.
Advantage of DBMS over file system
There are several advantages of a database management system over a file system. A few of
them are as follows:

• No redundant data: Redundancy is removed by data normalization. Avoiding data
duplication saves storage and improves access time.
• Data Consistency and Integrity: As discussed earlier, the root cause of data
inconsistency is data redundancy. Since data normalization takes care of data
redundancy, data inconsistency is taken care of as part of it.
• Data Security: It is easier to apply access constraints in database systems so that
only authorized users are able to access the data. Each user has a different set of
access rights, so data is secured from issues such as identity theft, data leaks and
misuse of data.
• Privacy: Limited access means privacy of data.
• Easy access to data: Database systems manage data in such a way that the
data is easily accessible with fast response times.
• Easy recovery: Since database systems keep backups of data, it is easier to do a
full recovery of data in case of a failure.
• Flexible: Database systems are more flexible than file processing systems.

DBMS Architecture
The architecture of DBMS depends on the computer system on which it runs. For
example, in a client-server DBMS architecture, the database systems at server machine
can run several requests made by client machine. We will understand this communication
with the help of diagrams.
Types of DBMS Architecture
There are three types of DBMS architecture:
1. Single tier architecture
2. Two tier architecture
3. Three tier architecture
1. Single tier architecture
In this type of architecture, the database is readily available on the client machine, so a
request made by the client doesn’t require a network connection to perform an action on the
database.
For example, let’s say you want to fetch employee records from the database and
the database is available on your own computer. The request to fetch employee
details is made by your computer, and the records are fetched from the database
by your computer as well. This type of system is generally referred to as a local database
system.
2. Two tier architecture

In two-tier architecture, the database system is present at the server machine and the
DBMS application is present at the client machine. These two machines are connected
with each other through a reliable network, as shown in the above diagram.
Whenever the client machine makes a request to access the database present at the server using a
query language like SQL, the server performs the request on the database and returns the
result to the client. Application connection interfaces such as JDBC and ODBC are
used for the interaction between server and client.
3. Three tier architecture

In three-tier architecture, another layer is present between the client machine and the server
machine. In this architecture, the client application doesn’t communicate directly with the
database system present at the server machine. Instead, the client application
communicates with a server application, and the server application internally communicates
with the database system present at the server.

DBMS – Three Level Architecture


DBMS Three Level Architecture Diagram

This architecture has three levels:


1. External level
2. Conceptual level
3. Internal level
1. External level
It is also called the view level. This level is called “view” because several users
can view their desired data from this level, which is internally fetched from the database with
the help of conceptual and internal level mapping.
The user doesn’t need to know database schema details such as data structures, table
definitions, etc. The user is only concerned with the data that is returned to the view
level after it has been fetched from the database (present at the internal level).
External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as relationships
among data, the schema of data, etc., is described at this level.
Database constraints and security are also implemented at this level of the architecture. This
level is maintained by the DBA (database administrator).
3. Internal level
This level is also known as physical level. This level describes how the data is actually
stored in the storage devices. This level is also responsible for allocating space to the
data. This is the lowest level of the architecture.
View of Data in DBMS
Abstraction is one of the main features of database systems. Hiding irrelevant details
from users and providing them with an abstract view of data helps in easy and efficient
user-database interaction.
To fully understand the view of data, you must have a basic knowledge of data
abstraction and of instance and schema.
Data Abstraction in DBMS
Database systems are made up of complex data structures. To ease user interaction
with the database, the developers hide irrelevant internal details from users. This process of
hiding irrelevant details from the user is called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is actually
stored in database. You can get the complex data structure details at this level.
Logical level: This is the middle level of 3-level data abstraction architecture. It describes
what data is stored in database.
View level: Highest level of data abstraction. This level describes the user interaction
with database system.
Example: Let’s say we are storing customer information in a customer table. At the physical
level these records can be described as blocks of storage (bytes, gigabytes, terabytes etc.)
in memory. These details are often hidden from the programmers.
At the logical level these records can be described as fields and attributes along with their
data types, and the relationships among them can be logically implemented. The
programmers generally work at this level because they are aware of such things about
database systems.
At the view level, users just interact with the system through a GUI and enter details on
the screen. They are not aware of how the data is stored or what data is stored; such
details are hidden from them.
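One way to picture the logical level and the view level is an SQL view. The sketch below
uses Python's built-in sqlite3 module; the customer table, its columns, the view name
customer_contact and the sample row are all hypothetical. The table definition stands for the
logical level, while the view exposes only the columns a particular user needs.

```python
import sqlite3

# Hypothetical customer table (logical level) and a restricted view (view level).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT,
    city         TEXT,
    credit_limit INTEGER
);
INSERT INTO customer VALUES (1, 'Asha', 'Chennai', 50000);

-- The view hides customer_id and credit_limit from its users.
CREATE VIEW customer_contact AS
    SELECT name, city FROM customer;
""")

cursor = conn.execute("SELECT * FROM customer_contact")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

How SQLite lays these rows out in pages on disk (the physical level) is not visible in either
the table definition or the view.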
DBMS Schema
Definition of schema: Design of a database is called the schema. Schema is of three
types: Physical schema, logical schema and view schema.
For example: In the following diagram, we have a schema that shows the relationship
between three tables: Course, Student and Section. The diagram only shows the design of
the database, it doesn’t show the data present in those tables. Schema is only a structural
view(design) of a database as shown in the diagram below.

The design of a database at the physical level is called the physical schema; how the data is
stored in blocks of storage is described at this level.
The design of a database at the logical level is called the logical schema. Programmers and
database administrators work at this level. Here data is described in terms of the types of
data records that are stored in data structures; internal details such as the
implementation of the data structures are hidden at this level (they are available at the
physical level).
The design of a database at the view level is called the view schema. This generally describes
end user interaction with database systems.
DBMS Instance
Definition of instance: The data stored in database at a particular moment of time is
called instance of database. Database schema defines the variable declarations in tables
that belong to a particular database; the value of these variables at a moment of time is
called the instance of that database.
For example, let’s say we have a single table student in the database. Today the table has
100 records, so today the instance of the database has 100 records. If we
add another 100 records to this table by tomorrow, the instance of the database
tomorrow will have 200 records in the table. In short, the data stored in the database at a
particular moment is called the instance, and it changes over time as we add or delete data
in the database.
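The difference between schema and instance can be checked with a small sketch using Python's
built-in sqlite3 module. The column names stu_id and stu_name and the generated student names
are assumptions made for illustration. The schema (the column list) stays the same while the
instance (the stored rows) grows from 100 to 200 records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table database from the example: a student table.
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

def schema(connection):
    """Return the column names of the student table (its structure)."""
    return [row[1] for row in connection.execute("PRAGMA table_info(student)")]

def instance_size(connection):
    """Return how many records the table holds at this moment."""
    return connection.execute("SELECT COUNT(*) FROM student").fetchone()[0]

def add_students(connection, first_id, last_id):
    """Insert placeholder students with ids first_id..last_id inclusive."""
    connection.executemany(
        "INSERT INTO student VALUES (?, ?)",
        [(i, f"student{i}") for i in range(first_id, last_id + 1)],
    )
    connection.commit()

add_students(conn, 1, 100)      # "today": 100 records
schema_today, size_today = schema(conn), instance_size(conn)

add_students(conn, 101, 200)    # "tomorrow": another 100 records
schema_tomorrow, size_tomorrow = schema(conn), instance_size(conn)
```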
DBMS languages
Database languages are used to read, update and store data in a database. There are
several such languages that can be used for this purpose; one of them is SQL (Structured
Query Language).

Types of DBMS languages:


Data Definition Language (DDL)
DDL is used for specifying the database schema. It is used for creating tables, schemas,
indexes, constraints, etc. in a database. Let’s see the operations that we can perform on a
database using DDL:

• To create a database or its objects (tables, indexes, views) – CREATE
• To alter the structure of the database – ALTER
• To drop databases or objects such as tables – DROP
• To remove all records from a table while keeping its structure – TRUNCATE
• To rename database objects – RENAME
• To add comments to the data dictionary – COMMENT
All of these commands either define or update the database schema, which is why they
come under the Data Definition Language.
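A few of these DDL commands can be tried with Python's built-in sqlite3 module, as in the
sketch below. The table names student and learner are hypothetical. SQLite is used only because
it ships with Python; it has no TRUNCATE or COMMENT statement, and it renames a table through
ALTER TABLE ... RENAME TO, so other DBMS products will differ in these details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(connection):
    """List the tables currently defined in the schema."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in connection.execute(query)]

# CREATE: define a new table (hypothetical name and columns).
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

# ALTER: change the structure by adding a column.
conn.execute("ALTER TABLE student ADD COLUMN stu_age INTEGER")

# RENAME: in SQLite this is a form of ALTER TABLE.
conn.execute("ALTER TABLE student RENAME TO learner")
after_rename = table_names(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(learner)")]

# DROP: remove the table and its definition from the schema.
conn.execute("DROP TABLE learner")
after_drop = table_names(conn)
```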
Data Manipulation Language (DML)
DML is used for accessing and manipulating data in a database. The following operations
on a database come under DML:

• To read records from table(s) – SELECT
• To insert record(s) into the table(s) – INSERT
• To update the data in table(s) – UPDATE
• To delete record(s) from the table(s) – DELETE
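The four DML commands can be sketched with Python's built-in sqlite3 module. The student table
and its sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute(
    "CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER)"
)

# INSERT: add records.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(111, "Ashish", 23), (123, "Saurav", 22)],
)

# UPDATE: change data in an existing record.
conn.execute("UPDATE student SET stu_age = 24 WHERE stu_id = 111")

# DELETE: remove the records that match the WHERE condition.
conn.execute("DELETE FROM student WHERE stu_id = 123")
conn.commit()

# SELECT: read what is left in the table.
remaining = conn.execute("SELECT stu_id, stu_name, stu_age FROM student").fetchall()
```

A DELETE without a WHERE clause would remove every record in the table, so the condition matters.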
Data Control language (DCL)
DCL is used for granting and revoking user access on a database –

• To grant access to a user – GRANT
• To revoke access from a user – REVOKE
In practice, the data definition language, data manipulation language and data control
language are not separate languages; rather, they are parts of a single database
language such as SQL.
Transaction Control Language(TCL)
The changes made to the database using DML commands are either made permanent or
rolled back using TCL.
• To persist the changes made by DML commands in the database – COMMIT
• To roll back the changes made to the database – ROLLBACK
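COMMIT and ROLLBACK can be sketched with Python's built-in sqlite3 module, whose connection
object offers commit() and rollback(). The account table, the balances and the transfer()
helper are hypothetical. A transfer is two UPDATE statements; if the second one fails, the
rollback also undoes the first, so the transfer happens completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts; the CHECK constraint forbids a negative balance.
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(connection, source, target, amount):
    """Move amount from source to target; return True if committed."""
    try:
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE account_no = ?",
            (amount, target),
        )
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE account_no = ?",
            (amount, source),
        )
        connection.commit()      # COMMIT: make both updates permanent
        return True
    except sqlite3.IntegrityError:
        connection.rollback()    # ROLLBACK: undo the partial transfer
        return False

def balances(connection):
    query = "SELECT account_no, balance FROM account ORDER BY account_no"
    return dict(connection.execute(query))

ok = transfer(conn, 1, 2, 30)        # enough money in account 1
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)   # would make account 1 negative
after_failed = balances(conn)
```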

Data models in DBMS


A data model is the logical structure of a database. It describes the design of the database
to reflect entities, attributes, relationships among data, constraints, etc.

Types of Data Models


There are several types of data models in DBMS.
I. Object based logical Models – Describe data at the conceptual and view levels.
1. E-R Model 2. Object oriented Model (OODBMS)
II. Record based logical Models – Like object based models, they also describe
data at the conceptual and view levels. These models specify the logical structure
of the database with records, fields and attributes.
1. Relational Model (RDBMS) 2. Hierarchical Model (HDBMS)
3. Network Model – The network model is the same as the hierarchical model except that it has
a graph-like structure rather than a tree-based structure. Unlike the hierarchical model,
this model allows each record to have more than one parent record.
4. Relational-Object DBMS (RODBMS)

Entity Relationship Diagram – ER Diagram in DBMS


An Entity–relationship model (ER model) describes the structure of a database with the
help of a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER
model is a design or blueprint of a database that can later be implemented as a database.
The main components of E-R model are: entity set and relationship set.
What is an Entity Relationship Diagram (ER Diagram)?
An ER diagram shows the relationship among entity sets. An entity set is a group of
similar entities and these entities can have attributes. In terms of DBMS, an entity is a
table or attribute of a table in database, so by showing relationship among tables and their
attributes, ER diagram shows the complete logical structure of a database.
Detailed notes on this are in the Unit II notes (please refer to the Unit II notes).
Relational model in DBMS (RDBMS)
In the relational model, the data and relationships are represented by a collection of
inter-related tables. Each table is a group of columns and rows, where a column represents
an attribute of an entity and a row represents a record.
Sample relational model: a Student table with 3 columns and four records.
Table: Student

Stu_Id   Stu_Name   Stu_Age
111      Ashish     23
123      Saurav     22
169      Lester     24
234      Lou        26

Table: Course

Stu_Id   Course_Id   Course_Name
111      C01         Science
111      C02         DBMS
169      C22         Java
169      C39         Computer Networks

Here Stu_Id, Stu_Name & Stu_Age are attributes of table Student and Stu_Id, Course_Id
& Course_Name are attributes of table Course. The rows with values are the records
(commonly known as tuples).
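The two sample tables can be loaded into SQLite through Python's built-in sqlite3 module to see
how the shared Stu_Id column relates them. This is a sketch under assumptions: the column types
are guesses, and the name of course C39 is read as "Computer Networks".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the notes give only column names and values.
conn.executescript("""
CREATE TABLE Student (
    Stu_Id   INTEGER PRIMARY KEY,
    Stu_Name TEXT,
    Stu_Age  INTEGER
);
CREATE TABLE Course (
    Stu_Id      INTEGER REFERENCES Student(Stu_Id),
    Course_Id   TEXT,
    Course_Name TEXT
);
INSERT INTO Student VALUES
    (111, 'Ashish', 23), (123, 'Saurav', 22), (169, 'Lester', 24), (234, 'Lou', 26);
INSERT INTO Course VALUES
    (111, 'C01', 'Science'), (111, 'C02', 'DBMS'),
    (169, 'C22', 'Java'), (169, 'C39', 'Computer Networks');
""")

# Join the tables on Stu_Id to list the courses taken by one student.
lester_courses = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.Course_Name
        FROM Student AS s
        JOIN Course AS c ON c.Stu_Id = s.Stu_Id
        WHERE s.Stu_Name = ?
        ORDER BY c.Course_Id
        """,
        ("Lester",),
    )
]
```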
Hierarchical model in DBMS (HDBMS)
In the hierarchical model, data is organized into a tree-like structure in which each record
has one parent record and may have many children. The main drawback of this model is that it
can represent only one-to-many relationships between nodes.
Note: Hierarchical models are rarely used now.
Sample Hierarchical Model Diagram:
Let’s say we have a few students and a few courses. A course can be assigned to a single
student only, but a student can take any number of courses, so this relationship is
one to many.
Example of hierarchical data represented as relational tables: The above hierarchical model
can be represented as relational tables like this:

Student Table:

Stu_Id   Stu_Name    Stu_Age
123      Steve       29
367      Chaitanya   27
234      Ajeet       28

Course Table:

Course_Id   Course_Name   Stu_Id
C01         Cobol         123
C21         Java          367
C22         Perl          367
C33         JQuery        234
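Going the other way, the parent-child tree can be rebuilt from the two tables. The sketch
below is plain Python; the dictionary layout and the build_tree() helper are hypothetical
choices made for illustration. Each student becomes a parent node and each course is attached
to exactly one parent through its Stu_Id value.

```python
# Rows copied from the Student and Course tables in the example.
students = {123: "Steve", 367: "Chaitanya", 234: "Ajeet"}
courses = [
    ("C01", "Cobol", 123),
    ("C21", "Java", 367),
    ("C22", "Perl", 367),
    ("C33", "JQuery", 234),
]

def build_tree(student_rows, course_rows):
    """Return {stu_id: {"name": ..., "courses": [...]}} with courses as children."""
    tree = {
        stu_id: {"name": name, "courses": []}
        for stu_id, name in student_rows.items()
    }
    for course_id, course_name, stu_id in course_rows:
        # Every course row names exactly one parent student.
        tree[stu_id]["courses"].append(course_name)
    return tree

tree = build_tree(students, courses)
total_children = sum(len(node["courses"]) for node in tree.values())
```

Because each course row carries a single Stu_Id, a course can never appear under two students,
which is the one-to-many restriction described for this model.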

Types of Data Model


Following are the types of Data Model,
1. Hierarchical Model
2. Relational Model
3. Network Database Model
4. Entity Relationship Model
5. Object Model
1. Hierarchical Model

• The hierarchical model was developed by IBM and North American Rockwell as the
Information Management System (IMS).
• It represents the data in a hierarchical tree structure.
• This model is the first DBMS model.
• In this model, the data is sorted hierarchically.
• It uses pointers to navigate between the stored data.

2. Relational Model

• The relational model is based on first-order predicate logic.
• This model was first proposed by E. F. Codd.
• It represents data as relations or tables.
• A relational database simplifies the database structure by making use of tables and columns.
3. Network Database Model

• The network database model is the same as the hierarchical model; the only difference is
that it allows a record to have more than one parent.
• In this model, there is no need for the strict parent-to-child association of the
hierarchical model.
• It replaces the hierarchical tree with a graph.
• It represents the data as record types and one-to-many relationships.
• This model is easy to design and understand.

4. Entity Relationship Model

• The Entity Relationship Model is a high-level data model.
• It was developed by Chen in 1976.
• This model is useful in developing a conceptual design for the database.
• It makes it simple and easy to design a logical view of the data.
• The developer can easily understand the system by looking at the ER model constructed.

In this diagram,

• A rectangle represents an entity. E.g. Doctor and Patient.
• An ellipse represents an attribute. E.g. DocId, Dname, PId, Pname. Attributes describe each
entity and become a major part of the data stored in the database.
• A diamond represents a relationship in ER diagrams. E.g. Doctor diagnoses the Patient.

5. Object Model

• Object model stores the data in the form of objects, classes and inheritance.
• This model handles more complex applications, such as Geographic Information System
(GIS), scientific experiments, engineering design and manufacturing.
• It is used in File Management System.
• It represents real world objects, attributes and behaviors.
• It provides a clear modular structure.
• It is easy to maintain and modify the existing code.

RDBMS Concepts
RDBMS stands for relational database management system. A relational model can be
represented as a table of rows and columns. A relational database has following major
components:
1. Table
2. Record or Tuple
3. Field or Column name or Attribute
4. Domain
5. Instance
6. Schema
7. Keys

1. Table
A table is a collection of data represented in rows and columns. Each table in a database has a
name. For example, the following table “STUDENT” stores information about students.
Table: STUDENT

Student_Id | Student_Name | Student_Addr     | Student_Age
101        | Chaitanya    | Dayal Bagh, Agra | 27
102        | Ajeet        | Delhi            | 26
103        | Rahul        | Gurgaon          | 24
104        | Shubham      | Chennai          | 25

2. Record or Tuple
Each row of a table is known as a record, or tuple. For example, the
following row is a record taken from the above table.

102 Ajeet Delhi 26


3. Field or Column name or Attribute
The above table “STUDENT” has four fields (or attributes): Student_Id, Student_Name,
Student_Addr & Student_Age.
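
The table, record, and field ideas above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the choice of SQLite, the in-memory database, and the column types are assumptions made for illustration; the notes do not prescribe a particular RDBMS.

```python
import sqlite3

# An in-memory SQLite database stands in for any RDBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENT (
        Student_Id   INTEGER PRIMARY KEY,
        Student_Name TEXT NOT NULL,
        Student_Addr TEXT,
        Student_Age  INTEGER
    )
    """
)
rows = [
    (101, "Chaitanya", "Dayal Bagh, Agra", 27),
    (102, "Ajeet", "Delhi", 26),
    (103, "Rahul", "Gurgaon", 24),
    (104, "Shubham", "Chennai", 25),
]
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", rows)

# One record (tuple) of the table.
record = conn.execute(
    "SELECT * FROM STUDENT WHERE Student_Id = 102"
).fetchone()

# The fields (attributes) of the table, read from the cursor description.
fields = [col[0] for col in conn.execute("SELECT * FROM STUDENT").description]

print(record)
print(fields)
```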
4. Domain
A domain is the set of permitted values for an attribute in a table. For example, a domain of
month-of-year can accept January, February, …, December as values, and a domain of dates
can accept all possible valid dates. We specify the domain of an attribute while creating a
table.
An attribute cannot accept values that are outside its domain. For example, in the
above table “STUDENT”, the Student_Id field has an integer domain, so it cannot
accept values that are not integers; Student_Id cannot have values like “First” or
10.11.
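
One way to see a domain being enforced is with CHECK constraints, sketched here with Python's sqlite3 module. The Birth_Month column and the try_insert helper are hypothetical additions for illustration, and other database systems express domains differently (for example through stricter column types).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENT (
        -- Integer domain for Student_Id.
        Student_Id  INTEGER CHECK (typeof(Student_Id) = 'integer'),
        -- Hypothetical month-of-year domain.
        Birth_Month TEXT CHECK (Birth_Month IN (
            'January', 'February', 'March', 'April', 'May', 'June', 'July',
            'August', 'September', 'October', 'November', 'December'))
    )
    """
)

def try_insert(student_id, month):
    """Return True if the row is accepted, False if a domain check rejects it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?)", (student_id, month))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(105, "January"))      # inside both domains
print(try_insert("First", "January"))  # not an integer
print(try_insert(10.11, "January"))    # not an integer
print(try_insert(106, "Smarch"))       # not a valid month
```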
5. Instance
The data stored in a database at a particular moment in time is called an instance of the
database.
6. Schema
A database schema defines the variable declarations in the tables that belong to a particular
database. The design of a database at the physical level is called the physical schema; it
describes how the data is stored in blocks of storage. The design of a database at the logical
level is called the logical schema. The design of a database at the view level is called the view
schema; it generally describes end-user interaction with the database system.
7. Keys
Keys play an important role in a relational database; they are used for identifying unique rows
in a table. They also establish relationships among tables.
Types of keys in DBMS
Primary Key – A primary key is a column or set of columns in a table that uniquely
identifies tuples (rows) in that table.
Super Key – A super key is a set of one or more columns (attributes) that uniquely identifies
rows in a table.
Candidate Key – A super key with no redundant attribute is known as a candidate key.
Alternate Key – Out of all candidate keys, only one gets selected as the primary key; the
remaining keys are known as alternate or secondary keys.
Composite Key – A key that consists of more than one attribute to uniquely identify
rows (also known as records or tuples) in a table is called a composite key.
Foreign Key – Foreign keys are the columns of a table that point to the primary key of
another table. They act as a cross-reference between tables.
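
The difference between super keys and candidate keys can be checked mechanically on sample data. The sketch below is illustrative only: the rows are hypothetical (names and ages are deliberately repeated), and in practice keys are declared from business rules about all possible data, not inferred from one instance.

```python
from itertools import combinations

# Hypothetical rows; a name and an age are repeated on purpose.
rows = [
    {"Student_Id": 101, "Student_Name": "Chaitanya", "Student_Age": 27},
    {"Student_Id": 102, "Student_Name": "Ajeet", "Student_Age": 26},
    {"Student_Id": 103, "Student_Name": "Rahul", "Student_Age": 26},
    {"Student_Id": 104, "Student_Name": "Ajeet", "Student_Age": 25},
]

def is_super_key(columns, data):
    """A column set is a super key if no two rows share the same values."""
    seen = {tuple(row[c] for c in columns) for row in data}
    return len(seen) == len(data)

def candidate_keys(data):
    """Return the minimal super keys (no redundant attribute), smallest first."""
    columns = sorted(data[0])
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            contains_smaller_key = any(set(key) <= set(combo) for key in found)
            if not contains_smaller_key and is_super_key(combo, data):
                found.append(combo)
    return found

print(is_super_key(("Student_Id", "Student_Name"), rows))  # super key, but redundant
print(candidate_keys(rows))
```

In this sample, Student_Id alone is a candidate key (the natural primary key), while the pair (Student_Age, Student_Name) is a composite candidate key that would be an alternate key.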

Difference between RDBMS and OODBMS


RDBMS and OODBMS are database management systems. RDBMS uses tables to represent
data and their relationships whereas OODBMS represents data in form of objects similar to
Object Oriented Programming.
Following are the important differences between RDBMS and OODBMS.

Sr. No. | Key | RDBMS | OODBMS
1 | Definition | RDBMS stands for Relational DataBase Management System. | OODBMS stands for Object Oriented DataBase Management System.
2 | Data Management | Data is stored as entities defined in tabular format. | Data is stored as objects.
3 | Data Complexity | RDBMS handles simple data. | OODBMS handles large and complex data.
4 | Term | An entity refers to a collection of similar items having the same definition. | A class refers to a group of objects having common relationships, behaviors and properties.
5 | Data Handling | RDBMS handles only data. | OODBMS handles both data and the functions operating on that data.
6 | Objective | To keep data independent from the application program. | To implement data encapsulation.
7 | Key | A primary key uniquely identifies an object in a table. | An object identifier (OID) uniquely represents an object in a group of objects.

Data Warehousing
A data warehouse is constructed by integrating data from multiple heterogeneous sources. It
supports analytical reporting, structured and/or ad hoc queries and decision making.

Understanding a Data Warehouse


• A data warehouse is a database, which is kept separate from the organization's operational
database.
• There is no frequent updating done in a data warehouse.
• It possesses consolidated historical data, which helps the organization to analyze its business.
• A data warehouse helps executives to organize, understand, and use their data to take
strategic decisions.
• Data warehouse systems help to integrate a diversity of application systems.
• A data warehouse system helps in consolidated historical data analysis.

Data Warehouse Features


The key features of a data warehouse are discussed below −
• Subject Oriented − A data warehouse is subject oriented because it provides information
around a subject rather than the organization's ongoing operations. These subjects can be
product, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the
ongoing operations, rather it focuses on modelling and analysis of data for decision making.
• Integrated − A data warehouse is constructed by integrating data from heterogeneous
sources such as relational databases, flat files, etc. This integration enhances the effective
analysis of data.
• Time Variant − The data collected in a data warehouse is identified with a particular time
period. The data in a data warehouse provides information from the historical point of view.
• Non-volatile − Non-volatile means the previous data is not erased when new data is added to
it. A data warehouse is kept separate from the operational database, and therefore frequent
changes in the operational database are not reflected in the data warehouse.

Data Warehouse Applications


As discussed before, a data warehouse helps business executives to organize, analyze, and use
their data for decision making. A data warehouse serves as a sole part of a plan-execute-assess
"closed-loop" feedback system for the enterprise management. Data warehouses are widely
used in the following fields −

• Financial services
• Banking services
• Consumer goods
• Retail sectors
• Controlled manufacturing

Types of Data Warehouse


Information processing, analytical processing, and data mining are the three types of data
warehouse applications that are discussed below −
• Information Processing − A data warehouse allows the data stored in it to be processed. The
data can be processed by means of querying, basic statistical analysis, and reporting using
crosstabs, tables, charts, or graphs.
• Analytical Processing − A data warehouse supports analytical processing of the information
stored in it. The data can be analyzed by means of basic OLAP operations, including slice-
and-dice, drill down, drill up, and pivoting.
• Data Mining − Data mining supports knowledge discovery by finding hidden patterns and
associations, constructing analytical models, performing classification and prediction. These
mining results can be presented using the visualization tools.
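
The basic OLAP operations named above can be sketched on a tiny fact table in plain Python. The fact rows, the dimension names, and the helper functions roll_up and slice_ are hypothetical; a real OLAP server would run these operations over a cube or a warehouse schema.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales).
facts = [
    ("Phone", "North", "Q1", 100),
    ("Phone", "South", "Q1", 80),
    ("Phone", "North", "Q2", 120),
    ("Laptop", "North", "Q1", 200),
    ("Laptop", "South", "Q2", 150),
]
DIMS = ("product", "region", "quarter")

def roll_up(rows, keep):
    """Sum sales while keeping only the dimensions named in `keep`.

    Keeping fewer dimensions is a roll-up (drill up); keeping more is a
    drill down.
    """
    positions = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[3]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == value]

by_product = roll_up(facts, ("product",))
q1_by_region = roll_up(slice_(facts, "quarter", "Q1"), ("region",))
by_product_quarter = roll_up(facts, ("product", "quarter"))  # drill down

print(by_product)
print(q1_by_region)
print(by_product_quarter)
```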

What is Data Warehousing?


Data warehousing is the process of constructing and using a data warehouse. A data warehouse
is constructed by integrating data from multiple heterogeneous sources that support analytical
reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves
data cleaning, data integration, and data consolidation.

Using Data Warehouse Information


There are decision support technologies that help utilize the data available in a data warehouse.
These technologies help executives to use the warehouse quickly and effectively. They can
gather data, analyze it, and take decisions based on the information present in the warehouse.
The information gathered in a warehouse can be used in any of the following domains −
• Tuning Production Strategies − The product strategies can be well tuned by repositioning
the products and managing the product portfolios by comparing the sales quarterly or yearly.
• Customer Analysis − Customer analysis is done by analyzing the customer's buying
preferences, buying time, budget cycles, etc.
• Operations Analysis − Data warehousing also helps in customer relationship management,
and making environmental corrections. The information also allows us to analyze business
operations.

Functions of Data Warehouse Tools and Utilities


The following are the functions of data warehouse tools and utilities −
• Data Extraction − Involves gathering data from multiple heterogeneous sources.
• Data Cleaning − Involves finding and correcting the errors in data.
• Data Transformation − Involves converting the data from legacy format to warehouse format.
• Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building
indices and partitions.
• Refreshing − Involves updating from data sources to warehouse.
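
The functions listed above can be sketched as a very small pipeline. Everything concrete here is an assumption made for illustration: the two in-memory sources, the field names, the rule that unparseable amounts are dropped, and the use of SQLite as the warehouse.

```python
import sqlite3

# Hypothetical heterogeneous sources (stand-ins for files or databases).
crm_rows = [{"cust": " Asha ", "amount": "120.50"}, {"cust": "Vikram", "amount": "n/a"}]
web_rows = [{"cust": "asha", "amount": "30"}]

def extract():
    """Data extraction: gather rows from all sources."""
    return crm_rows + web_rows

def clean(rows):
    """Data cleaning: trim names and drop rows whose amount cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        cleaned.append({"cust": row["cust"].strip(), "amount": amount})
    return cleaned

def transform(rows):
    """Data transformation: convert to the warehouse format (name, cents)."""
    return [(row["cust"].title(), round(row["amount"] * 100)) for row in rows]

def load(conn, rows):
    """Data loading: insert the rows and build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_customer ON sales (customer)")

conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
totals = conn.execute(
    "SELECT customer, SUM(amount_cents) FROM sales GROUP BY customer"
).fetchall()
print(totals)
```

Refreshing would amount to re-running the same steps on new source rows.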

What is OLAP?
Online Analytical Processing (OLAP) is a category of software tools that provide analysis of data
for business decisions. OLAP systems allow users to analyze database information from multiple
database systems at one time.
The primary objective is data analysis and not data processing.
Example of OLAP
Any data warehouse system is an OLAP system. Uses of OLAP are as follows:
• A company might compare its mobile phone sales in September with sales in October,
then compare those results with another location, which may be stored in a separate
database.
• Amazon analyzes purchases by its customers to come up with a personalized homepage
with products that are likely to interest each customer.
What is OLTP?
Online transaction processing, known as OLTP for short, supports transaction-oriented
applications in a 3-tier architecture. OLTP administers the day-to-day transactions of an
organization.
The primary objective is data processing and not data analysis.
Example of OLTP system
An example of an OLTP system is an ATM center. Assume that a couple has a joint account with a
bank. One day both reach different ATM centers at precisely the same time and
want to withdraw the total amount present in their bank account.
However, the person who completes the authentication process first will be able to get the money.
In this case, the OLTP system makes sure that the withdrawn amount will never be more than the
amount present in the account. The key point to note here is that OLTP systems are optimized for
transaction processing rather than data analysis.
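
The ATM scenario can be sketched with a single conditional update inside a transaction. This is an illustration under assumptions: SQLite stands in for the bank's OLTP database, the account table and withdraw function are hypothetical, and the two withdrawals run one after the other rather than truly concurrently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 500)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Debit only if funds suffice; the check and the update are one statement."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        return cursor.rowcount == 1  # one row changed means the money was paid out

first = withdraw(conn, 1, 500)   # first partner withdraws the full balance
second = withdraw(conn, 1, 500)  # second partner's request finds no funds
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(first, second, balance)
```

Putting the balance check in the WHERE clause means the amount withdrawn can never exceed the amount in the account, whichever request is processed first.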
KEY DIFFERENCE between OLTP and OLAP:
• Online Analytical Processing (OLAP) is a category of software tools that analyze data
stored in a database whereas Online transaction processing (OLTP) supports transaction-
oriented applications in a 3-tier architecture.
• OLAP creates a single platform for all types of business analysis needs, which include
planning, budgeting, forecasting, and analysis, while OLTP is useful for administering the
day-to-day transactions of an organization.
• OLAP is characterized by a large volume of data while OLTP is characterized by large
numbers of short online transactions.
• In OLAP, data warehouse is created uniquely so that it can integrate different data
sources for building a consolidated database whereas OLTP uses traditional DBMS.

Below is the difference between OLAP and OLTP in Data Warehouse:

OLTP vs OLAP

Parameters | OLTP | OLAP
Process | It is an online transactional system. It manages database modification. | OLAP is an online analysis and data retrieving process.
Characteristic | It is characterized by large numbers of short online transactions. | It is characterized by a large volume of data.
Functionality | OLTP is an online database modifying system. | OLAP is an online database query management system.
Method | OLTP uses traditional DBMS. | OLAP uses the data warehouse.
Query | Insert, Update, and Delete information from the database. | Mostly select operations.
Table | Tables in OLTP database are normalized. | Tables in OLAP database are not normalized.
Source | OLTP and its transactions are the sources of data. | Different OLTP databases become the source of data for OLAP.
Data Integrity | OLTP database must maintain data integrity constraints. | OLAP database does not get frequently modified. Hence, data integrity is not an issue.
Response time | Its response time is in milliseconds. | Response time is in seconds to minutes.
Data quality | The data in the OLTP database is always detailed and organized. | The data in OLAP process might not be organized.
Usefulness | It helps to control and run fundamental business tasks. | It helps with planning, problem-solving, and decision support.
Operation | Allows read/write operations. | Only read and rarely write.
Audience | It is a customer-oriented process. | It is a market-oriented process.
Query Type | Queries in this process are standardized and simple. | Complex queries involving aggregations.
Back-up | Complete backup of the data combined with incremental backups. | OLAP only needs a backup from time to time. Backup is less important than in OLTP.
Design | DB design is application oriented. Example: database design changes with industry, like Retail, Airline, Banking, etc. | DB design is subject oriented. Example: database design changes with subjects like sales, marketing, purchasing, etc.
User type | It is used by data-critical users such as clerks, DBAs and database professionals. | Used by data knowledge users such as workers, managers, and CEOs.
Purpose | Designed for real-time business operations. | Designed for analysis of business measures by category and attributes.
Performance metric | Transaction throughput is the performance metric. | Query throughput is the performance metric.
Number of users | This kind of database allows thousands of users. | This kind of database allows only hundreds of users.
Productivity | It helps to increase users' self-service and productivity. | It helps to increase the productivity of business analysts.
Challenge | Data Warehouses historically have been a development project which may prove costly to build. | An OLAP cube is not an open SQL server data warehouse. Therefore, technical knowledge and experience are essential to manage the OLAP server.
Process | It provides fast results for daily used data. | It ensures that the response to the query is consistently quicker.
Characteristic | It is easy to create and maintain. | It lets the user create a view with the help of a spreadsheet.
Style | OLTP is designed to have fast response time and low data redundancy, and is normalized. | A data warehouse is created uniquely so that it can integrate different data sources for building a consolidated database.


Data Mart
Data marts contain a subset of organization-wide data that is valuable to specific groups of
people in an organization. In other words, a data mart contains only the data that is specific to
a particular group. For example, the marketing data mart may contain only data related to items,
customers, and sales. Data marts are confined to subjects.

What is Data Mart?


A Data Mart is focused on a single functional area of an organization and contains a subset of
the data stored in a Data Warehouse. A Data Mart is a condensed version of a Data Warehouse and
is designed for use by a specific department, unit or set of users in an organization, e.g.,
Marketing, Sales, HR or Finance. It is often controlled by a single department in an
organization.
A Data Mart usually draws data from only a few sources compared to a Data warehouse. Data
marts are small in size and are more flexible compared to a Data warehouse.
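
The idea that a data mart holds a department-specific subset of warehouse data can be sketched with a CREATE TABLE ... AS SELECT statement. The table names (warehouse_facts, marketing_mart), the columns, and the sample rows are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table covering several departments.
conn.execute("CREATE TABLE warehouse_facts (dept TEXT, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("Marketing", "Campaign A", 900),
        ("HR", "Training", 400),
        ("Marketing", "Campaign B", 600),
        ("Finance", "Audit", 700),
    ],
)

# The marketing data mart keeps only the rows relevant to one department.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT item, amount FROM warehouse_facts WHERE dept = 'Marketing'"
)

mart_rows = conn.execute(
    "SELECT item, amount FROM marketing_mart ORDER BY item"
).fetchall()
print(mart_rows)
```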

The following figures show a graphical representation of data marts.


Why do we need Data Mart?
• A Data Mart helps to improve users' response time due to the reduction in the volume of data.
• It provides easy access to frequently requested data.
• Data marts are simpler to implement when compared to a corporate Data warehouse. At the
same time, the cost of implementing a Data Mart is certainly lower compared with
implementing a full data warehouse.
• Compared to a Data Warehouse, a data mart is agile. In case of a change in model, a data mart
can be rebuilt more quickly due to its smaller size.
• A Data mart is defined by a single Subject Matter Expert (SME). In contrast, a data warehouse
is defined by interdisciplinary SMEs from a variety of domains. Hence, a Data mart is more
open to change compared to a Data warehouse.
• Data is partitioned and allows very granular access control privileges.
• Data can be segmented and stored on different hardware/software platforms.
Types of Data Mart
There are three main types of data mart:
1. Dependent: Dependent data marts are created by drawing data from an existing central data
warehouse.
2. Independent: An independent data mart is created without the use of a central data warehouse,
drawing data directly from operational sources, external sources, or both.
3. Hybrid: This type of data mart can take data from data warehouses or operational systems.
1. Dependent Data Mart
A dependent data mart allows sourcing an organization’s data from a single Data Warehouse. It is
one of the data mart examples that offers the benefit of centralization. If you need to develop
one or more physical data marts, then you need to configure them as dependent data marts.
A dependent data mart in a data warehouse can be built in two different ways: either a user can
access both the data mart and the data warehouse, depending on need, or access is limited only to
the data mart. The second approach is not optimal, as it produces what is sometimes referred to
as a data junkyard. In the data junkyard, all data begins with a common source, but it is
scrapped and mostly junked.

Figure: Dependent Data Mart

2. Independent Data Mart
An independent data mart is created without the use of a central Data warehouse. This kind of
Data Mart is an ideal option for smaller groups within an organization.
An independent data mart has neither a relationship with the enterprise data warehouse nor with
any other data mart. In an independent data mart, the data is input separately, and its analyses
are also performed autonomously.
Implementation of independent data marts is antithetical to the motivation for building a data
warehouse. First of all, you need a consistent, centralized store of enterprise data which can
be analyzed by multiple users with different interests who want widely varying information.

Figure: Independent Data Mart

3. Hybrid Data Mart
A hybrid data mart combines input from a Data warehouse with input from other sources, such as
operational systems. This could be helpful when you want ad-hoc integration, for example after a
new group or product is added to the organization.
It is the data mart type best suited for multiple database environments and fast implementation
turnaround for any organization. It also requires the least data cleansing effort. A hybrid data
mart also supports large storage structures, and its flexibility makes it well suited to smaller
data-centric applications.

Figure: Hybrid Data Mart

Steps in Implementing a Data mart
Implementing a Data Mart is a rewarding but complex procedure. Here are the detailed steps to
implement a Data Mart:
Designing
Designing is the first phase of Data Mart implementation. It covers all the tasks from
initiating the request for a data mart to gathering information about the requirements. Finally,
we create the logical and physical Data Mart design.
The design step involves the following tasks:
• Gathering the business and technical requirements and identifying data sources.
• Selecting the appropriate subset of data.
• Designing the logical and physical structure of the data mart.
Data could be partitioned based on following criteria:
• Date
• Business or Functional Unit
• Geography
• Any combination of above
Data could be partitioned at the application or DBMS level. However, partitioning at the
application level is recommended, as it allows a different data model each year as the business
environment changes.
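
Application-level partitioning by the criteria listed above can be sketched as a grouping function. The row layout, the field names, and the partition helper are hypothetical; a DBMS-level alternative would instead rely on the database's own partitioning features.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows.
rows = [
    {"sold_on": date(2023, 3, 1), "unit": "Retail", "region": "North", "amount": 10},
    {"sold_on": date(2023, 9, 9), "unit": "Online", "region": "North", "amount": 20},
    {"sold_on": date(2024, 1, 5), "unit": "Retail", "region": "South", "amount": 30},
]

def partition(rows, key_fn):
    """Group rows into partitions according to the key computed by key_fn."""
    parts = defaultdict(list)
    for row in rows:
        parts[key_fn(row)].append(row)
    return dict(parts)

by_year = partition(rows, lambda r: r["sold_on"].year)       # date
by_unit = partition(rows, lambda r: r["unit"])                # business unit
by_year_region = partition(                                   # combination
    rows, lambda r: (r["sold_on"].year, r["region"])
)

print(sorted(by_year))
print(sorted(by_unit))
print(sorted(by_year_region))
```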

What Products and Technologies Do You Need?


Simple pen and paper would suffice, though tools that help you create UML or ER
diagrams can also attach metadata to your logical and physical designs.
Constructing
This is the second phase of implementation. It involves creating the physical database and the
logical structures.
This step involves the following tasks:
• Implementing the physical database designed in the earlier phase. For instance, database
schema objects like table, indexes, views, etc. are created.
What Products and Technologies Do You Need?
You need a relational database management system to construct a data mart. An RDBMS has
several features that are required for the success of a Data Mart.
• Storage management: An RDBMS stores and manages the data, letting you create, add, and
delete data.
• Fast data access: With a SQL query you can easily access data based on certain
conditions/filters.
• Data protection: The RDBMS also offers a way to recover from system failures
such as power failures. It also allows restoring data from backups in case a disk
fails.
• Multiuser support: The data management system offers concurrent access, the ability
for multiple users to access and modify data without interfering with or overwriting changes
made by another user.
• Security: The RDBMS also provides a way to regulate access by users to objects
and certain types of operations.
Populating:
In the third phase, data is populated in the data mart.
The populating step involves the following tasks:
• Source data to target data Mapping
• Extraction of source data
• Cleaning and transformation operations on the data
• Loading data into the data mart
• Creating and storing metadata
What Products and Technologies Do You Need?
You accomplish these population tasks using an ETL (Extract, Transform, Load) tool. This tool
allows you to look at the data sources, perform source-to-target mapping, extract the data,
transform and cleanse it, and load it into the data mart.
In the process, the tool also creates some metadata relating to things like where the data came
from, how recent it is, what type of changes were made to the data, and what level of
summarization was done.
Accessing
Accessing is the fourth step, which involves putting the data to use: querying the data, creating
reports and charts, and publishing them. End users submit queries to the database and display the
results of the queries.
The accessing step needs to perform the following tasks:
• Set up a meta layer that translates database structures and objects names into business
terms. This helps non-technical users to access the Data mart easily.
• Set up and maintain database structures.
• Set up API and interfaces if required
What Products and Technologies Do You Need?
You can access the data mart using the command line or GUI. GUI is preferred as it can easily
generate graphs and is user-friendly compared to the command line.
Managing
This is the last step of the Data Mart implementation process. This step covers management tasks
such as:
• Ongoing user access management.
• System optimization and fine-tuning to achieve enhanced performance.
• Adding and managing fresh data in the data mart.
• Planning recovery scenarios and ensuring system availability in case the system
fails.

Best practices for Implementing Data Marts


Following are the best practices that you need to follow during the Data Mart implementation
process:
• The source of a Data Mart should be departmentally structured.
• The implementation cycle of a Data Mart should be measured in short periods of time,
i.e., in weeks instead of months or years.
• It is important to involve all stakeholders in the planning and designing phase, as the data
mart implementation could be complex.
• Data Mart hardware/software, networking and implementation costs should be
accurately budgeted in your plan.
• Even if the Data mart is created on the same hardware, it may need some
different software to handle user queries. Additional processing power and disk storage
requirements should be evaluated for fast user response.
• A data mart may be at a different location from the data warehouse. That is why it is
important to ensure that there is enough networking capacity to handle the data
volumes that need to be transferred to the data mart.
• The implementation cost should budget for the time taken by the data mart loading process.
Load time increases with the complexity of the transformations.

Advantages of a Data Mart

• Data marts contain a subset of organization-wide data. This data is valuable to a specific
group of people in an organization.
• It is a cost-effective alternative to a data warehouse, which can be costly to build.
• A Data Mart allows faster access to data.
• A Data Mart is easy to use as it is specifically designed for the needs of its users. Thus a
data mart can accelerate business processes.
• Data Marts need less implementation time compared to Data Warehouse systems. It is
faster to implement a Data Mart as you only need to concentrate on a subset of the data.
• It contains historical data which enables the analyst to determine data trends.
Disadvantages of a Data Mart
• Enterprises often create too many disparate and unrelated data marts without
much benefit. These can become a big hurdle to maintain.
• A Data Mart cannot provide company-wide data analysis as its data set is limited.

Differences between Data Warehouse and Data Mart

Parameter | Data Warehouse | Data Mart
Definition | A Data Warehouse is a large repository of data collected from different organizations or departments within a corporation. | A data mart is only a subtype of a Data Warehouse. It is designed to meet the needs of a certain user group.
Usage | It helps to take strategic decisions. | It helps to take tactical decisions for the business.
Objective | The main objective of a Data Warehouse is to provide an integrated environment and a coherent picture of the business at a point in time. | A data mart is mostly used in a business division at the department level.
Designing | The designing process of a Data Warehouse is quite difficult. | The designing process of a Data Mart is easy.
Designing (cont.) | May or may not use a dimensional model. However, it can feed dimensional models. | It is built around a dimensional model using a star schema.
Data Handling | Data warehousing covers a large area of the corporation, which is why it takes a long time to process. | Data marts are easy to use, design and implement as they handle only small amounts of data.
Focus | Data warehousing is broadly focused on all the departments. It is possible that it can even represent the entire company. | A Data Mart is subject-oriented, and it is used at a department level.
Data type | The data stored inside the Data Warehouse is always detailed when compared with a data mart. | Data Marts are built for particular user groups. Therefore, the data is short and limited.
Subject-area | The main objective of a Data Warehouse is to provide an integrated environment and a coherent picture of the business at a point in time. | Mostly holds only one subject area, for example, sales figures.
Data storing | Designed to store enterprise-wide decision data, not just marketing data. | Dimensional modeling and star schema design are employed for optimizing the performance of the access layer.
Data type | Time variance and non-volatile design are strictly enforced. | Mostly includes consolidation data structures to meet the subject area's query and reporting needs.
Data value | Read-only from the end users' standpoint. | Transaction data, regardless of grain, fed directly from the Data Warehouse.
Scope | Data warehousing is more helpful as it can bring information from any department. | A data mart contains data of a specific department of a company. There may be separate data marts for sales, finance, marketing, etc. It has limited usage.
Source | In a Data Warehouse, data comes from many sources. | In a Data Mart, data comes from very few sources.
Size | The size of the Data Warehouse may range from 100 GB to 1 TB+. | The size of a Data Mart is less than 100 GB.
Implementation time | The implementation process of a Data Warehouse can extend from months to years. | The implementation process of a Data Mart is restricted to a few months.


Data Mining
Data Mining is defined as the procedure of extracting information from huge sets of data. In
other words, we can say that data mining is mining knowledge from data.

What is Data Mining?


Data mining is looking for hidden, valid, and potentially useful patterns in huge data sets. Data
mining is all about discovering unsuspected or previously unknown relationships amongst the data.
It is a multi-disciplinary skill that uses machine learning, statistics, AI and database technology.
The insights derived via data mining can be used for marketing, fraud detection, scientific
discovery, etc.
Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis,
information harvesting, etc.
Types of Data
Data mining can be performed on the following types of data:
• Relational databases
• Data warehouses
• Advanced DB and information repositories
• Object-oriented and object-relational databases
• Transactional and Spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming database
• Text databases
• Text mining and Web mining

Data Mining Implementation Process

Let's study the Data Mining implementation process in detail

Business understanding:

In this phase, business and data-mining goals are established.

• First, you need to understand business and client objectives. You need to define what
your client wants (which many times even they do not know themselves)
• Take stock of the current data mining scenario. Factor resources, assumptions,
constraints, and other significant factors into your assessment.
• Using business objectives and current scenario, define your data mining goals.
• A good data mining plan is very detailed and should be developed to accomplish both
business and data mining goals.

Data understanding:
In this phase, a sanity check is performed on the data to check whether it is appropriate for
the data mining goals.

• First, data is collected from multiple data sources available in the organization.
• These data sources may include multiple databases, flat files or data cubes. Issues
like object matching and schema integration can arise during the data
integration process. It is quite a complex and tricky process, as data from various sources
is unlikely to match easily. For example, table A contains an entity named cust_no whereas
another table B contains an entity named cust-id.
• It is therefore difficult to be sure whether both of these objects refer to the same
value. Here, metadata should be used to reduce errors in the data integration
process.
• The next step is to explore the properties of the acquired data. A good way to explore the data
is to answer the data mining questions (decided in the business understanding phase) using
query, reporting, and visualization tools.
• Based on the results of the queries, the data quality should be ascertained. Missing data, if
any, should be acquired.
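
The cust_no / cust-id mismatch above can be handled with a small piece of metadata that maps each source's column names to one canonical name. The sketch below is only an illustration: the metadata layout, the canonical name customer_id, the extra columns, and the canonicalize function are all assumptions.

```python
# Hypothetical metadata: for each source table, source column -> canonical name.
COLUMN_METADATA = {
    "table_a": {"cust_no": "customer_id"},
    "table_b": {"cust-id": "customer_id"},
}

def canonicalize(source, row):
    """Rename a row's columns using the metadata for its source table."""
    mapping = COLUMN_METADATA[source]
    return {mapping.get(column, column): value for column, value in row.items()}

row_a = canonicalize("table_a", {"cust_no": 7, "city": "Delhi"})
row_b = canonicalize("table_b", {"cust-id": 7, "total": 99})

# Once both rows use the same key name, they can be matched and combined.
merged = {**row_a, **row_b} if row_a["customer_id"] == row_b["customer_id"] else None
print(merged)
```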

Data preparation:

In this phase, data is made production ready.


The data preparation process consumes about 90% of the time of the project.
The data from different sources should be selected, cleaned, transformed, formatted,
anonymized, and constructed (if required).
Data cleaning is a process to "clean" the data by smoothing noisy data and filling in missing
values.
For example, in a customer demographics profile, age data may be missing; the data is incomplete
and should be filled in. In some cases, there could be outliers; for instance, age has a value
of 300. Data could also be inconsistent; for instance, the name of a customer is different in
different tables.
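
The age example can be sketched as a small cleaning function. The strategy shown (treat ages outside 0 to 120 as outliers and replace both outliers and missing values with the median of the valid ages) is one assumed choice among many, and the sample list is hypothetical.

```python
from statistics import median

# Hypothetical age column with missing values (None) and an outlier (300).
ages = [34, None, 29, 300, 41, None, 38]

def clean_ages(values, low=0, high=120):
    """Replace missing or out-of-range ages with the median of the valid ones.

    The bounds 0..120 are an assumption about what counts as a plausible age.
    """
    def is_valid(v):
        return v is not None and low <= v <= high

    fill = median([v for v in values if is_valid(v)])
    return [v if is_valid(v) else fill for v in values]

cleaned = clean_ages(ages)
print(cleaned)
```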
Data transformation operations change the data to make it useful in data mining. Following
transformation can be applied
Data Transformation:
Data transformation operations would contribute toward the success of the mining process.
Smoothing: It helps to remove noise from the data.
Aggregation: Summary or aggregation operations are applied to the data. For example, the weekly
sales data is aggregated to calculate the monthly and yearly totals.
Generalization: In this step, low-level data is replaced by higher-level concepts with the help
of concept hierarchies. For example, the city is replaced by the county.
Normalization: Normalization is performed when the attribute data are scaled up or scaled down.
Example: Data should fall in the range -2.0 to 2.0 post-normalization.
Attribute construction: New attributes are constructed from the given set of attributes and
added to it when they are helpful for data mining.
The result of this process is a final data set that can be used in modeling.
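
Two of the transformations above, aggregation and normalization, can be sketched as follows.
The sales figures are made up, and min-max scaling is used here as one common way to bring
values into the range -2.0 to 2.0; the notes do not say which normalization method is meant.

```python
from collections import defaultdict

# Made-up weekly sales as (month, weekly_total) pairs.
weekly_sales = [
    ("Jan", 120.0), ("Jan", 80.0),
    ("Feb", 100.0), ("Feb", 150.0), ("Feb", 50.0),
]


def aggregate_by_month(rows):
    """Aggregation: sum the weekly totals into one total per month."""
    totals = defaultdict(float)
    for month, amount in rows:
        totals[month] += amount
    return dict(totals)


def min_max_scale(values, new_min=-2.0, new_max=2.0):
    """Normalization: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so map them to the middle of the target range.
        return [(new_min + new_max) / 2 for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


monthly = aggregate_by_month(weekly_sales)
scaled = min_max_scale([50.0, 100.0, 150.0])

print(monthly)
print(scaled)
```

The smallest input maps to -2.0 and the largest to 2.0, so every scaled value falls inside the
target range.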
Modeling
In this phase, mathematical models are used to determine data patterns.
• Based on the business objectives, suitable modeling techniques should be selected for the
prepared dataset.
• Create a test scenario to check the quality and validity of the model.
• Run the model on the prepared dataset.
• Results should be assessed by all stakeholders to make sure that the model can meet the
data mining objectives.
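
The modeling steps above can be sketched with a very small example: fit a model on part of the
prepared data and check its quality on data it has not seen. The data set, the straight-line
model, and the 7/3 split are all assumptions chosen for illustration, not a recommended
procedure.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Made-up prepared data set that follows y = 2x + 1 exactly.
data = [(x, 2 * x + 1) for x in range(10)]

# Test scenario: hold back the last three records to validate the model.
train, test = data[:7], data[7:]

model = fit_line(*zip(*train))
error = mean_abs_error(model, *zip(*test))

print(model, error)
```

A low error on the held-back records suggests the model generalizes; a high error would be a
reason to revisit the modeling technique or the data preparation.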

Evaluation:
In this phase, patterns identified are evaluated against the business objectives.
• Results generated by the data mining model should be evaluated against the business
objectives.
• Gaining business understanding is an iterative process. In fact, as understanding grows, new
business requirements may be raised because of data mining.
• A go or no-go decision is taken on moving the model to the deployment phase.

Deployment:
In the deployment phase, you ship your data mining discoveries to everyday business operations.
• The knowledge or information discovered during the data mining process should be made
easy to understand for non-technical stakeholders.
• A detailed deployment plan for shipping, maintenance, and monitoring of data mining
discoveries is created.
• A final project report is created with lessons learned and key experiences during the
project. This helps to improve the organization's business policy.
Data Mining Techniques
1. Classification:
This analysis is used to retrieve important and relevant information about data and metadata.
This data mining method helps to classify data into different classes.
2. Clustering:
Clustering analysis is a data mining technique to identify data that are similar to each other.
This process helps to understand the differences and similarities between the data.
3. Regression:
Regression analysis is the data mining method of identifying and analyzing the relationship
between variables. It is used to identify the likelihood of a specific variable, given the presence
of other variables.
4. Association Rules:
This data mining technique helps to find the association between two or more items. It discovers
hidden patterns in the data set.
5. Outlier detection:
This type of data mining technique refers to the observation of data items in the dataset which
do not match an expected pattern or expected behavior. This technique can be used in a variety
of domains, such as intrusion detection, fraud detection, or fault detection. Outlier detection
is also called Outlier Analysis or Outlier Mining.
6. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in transaction
data over a certain period.
7. Prediction:
Prediction uses a combination of the other data mining techniques, such as trends, sequential
patterns, clustering, and classification. It analyzes past events or instances in the right
sequence to predict a future event.
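
As one worked illustration of the techniques listed above, the sketch below computes support
and confidence, the two standard measures behind association rules. The shopping baskets are
made up for this example.

```python
# Made-up transactions: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    items = set(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Bread and milk appear together in 2 of the 4 baskets (support 0.5), and 2 of the 3 baskets
containing bread also contain milk, so the rule "bread implies milk" has a confidence of about
0.67.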

Challenges of Implementing Data Mining:


• Skilled experts are needed to formulate the data mining queries.
• Overfitting: Because of a small training database, a model may not fit future states.
• Data mining needs large databases, which are sometimes difficult to manage.
• Business practices may need to be modified in order to use the information
uncovered.
• If the data set is not diverse, data mining results may not be accurate.
• Integrating the information needed from heterogeneous databases and global information
systems could be complex.
Data mining Examples:
Example 1:
Consider the marketing head of a telecom service provider who wants to increase revenues from
long distance services. For a high ROI on his sales and marketing efforts, customer profiling is
important. He has a vast data pool of customer information like age, gender, income, credit
history, etc. But it is impossible to determine the characteristics of people who prefer long
distance calls with manual analysis. Using data mining techniques, he may uncover patterns
between heavy long distance call users and their characteristics.
For example, he might learn that his best customers are married females between the ages of 45
and 54 who make more than $80,000 per year. Marketing efforts can be targeted to such a
demographic.

Example 2:
A bank wants to find new ways to increase revenues from its credit card operations. It wants
to check whether usage would double if fees were halved.
The bank has multiple years of records on average credit card balances, payment amounts, credit
limit usage, and other key parameters. It creates a model to check the impact of the proposed
new business policy. The results show that cutting fees in half for a targeted customer base
could increase revenues by $10 million.

Benefits of Data Mining:


• Data mining techniques help companies to get knowledge-based information.
• Data mining helps organizations to make profitable adjustments in operation and
production.
• Data mining is a cost-effective and efficient solution compared to other statistical
data applications.
• Data mining helps with the decision-making process.
• It facilitates automated prediction of trends and behaviors as well as automated discovery
of hidden patterns.
• It can be implemented in new systems as well as existing platforms.
• It is a speedy process which makes it easy for users to analyze huge amounts of data
in less time.
Disadvantages of Data Mining
• There is a chance that companies may sell useful information about their customers to other
companies for money. For example, American Express has sold credit card purchases of
their customers to other companies.
• Many data mining analytics software packages are difficult to operate and require advanced
training to work on.
• Different data mining tools work in different manners due to the different algorithms
employed in their design. Therefore, selecting the correct data mining tool is a very
difficult task.
• Data mining techniques are not perfectly accurate, so they can cause serious consequences in
certain conditions.

Data Mining Applications


Application: Usage

Communications: Data mining techniques are used in the communication sector to predict customer
behavior in order to offer highly targeted and relevant campaigns.

Insurance: Data mining helps insurance companies to price their products profitably and promote
new offers to their new or existing customers.

Education: Data mining helps educators to access student data, predict achievement levels, and
find students or groups of students who need extra attention, for example students who are weak
in maths.

Manufacturing: With the help of data mining, manufacturers can predict wear and tear of
production assets. They can anticipate maintenance, which helps them to minimize downtime.

Banking: Data mining helps the finance sector to get a view of market risks and manage regulatory
compliance. It helps banks to identify probable defaulters when deciding whether to issue credit
cards, loans, etc.

Retail: Data mining techniques help retail malls and grocery stores to identify and arrange the
most sellable items in the most prominent positions. It helps store owners to come up with offers
that encourage customers to increase their spending.

Service Providers: Service providers such as the mobile phone and utility industries use data
mining to predict the reasons why a customer leaves their company. They analyze billing details,
customer service interactions, and complaints made to the company to assign each customer a
probability score and offer incentives.

E-Commerce: E-commerce websites use data mining to offer cross-sells and up-sells through their
websites. One of the most famous names is Amazon, which uses data mining techniques to get more
customers into its eCommerce store.

Super Markets: Data mining allows supermarkets to develop rules to predict whether their shoppers
are likely to be expecting. By evaluating their buying patterns, they can find women customers
who are most likely pregnant. They can start targeting products like baby powder, baby shop,
diapers and so on.

Crime Investigation: Data mining helps crime investigation agencies to deploy the police
workforce (where is a crime most likely to happen and when?), decide who to search at a border
crossing, etc.

Bioinformatics: Data mining helps to mine biological data from massive datasets gathered in
biology and medicine.
