UNIT V

1. Database Security:
1.1. Authentication,
1.2. Authorization
1.3. Access control,
1.3.1. DAC,
1.3.2. MAC and
1.3.3. RBAC models,
2. Intrusion detection,
2.1. SQL injection.
3. Advanced topics:
3.1. Object oriented and object relational databases,
3.2. Logical databases,
3.3. Web databases,
3.4. Distributed databases,
3.5. Data warehousing and data mining.

1. Database Security

Three pillars of security

1. Confidentiality: This component is often associated with secrecy and the use of
encryption. Confidentiality in this context means that the data is only available to
authorized parties. When information has been kept confidential it means that it has
not been compromised by other parties; confidential data are not disclosed to people
who do not require them or who should not have access to them. Ensuring
confidentiality means that information is organized in terms of who needs to have
access, as well as the sensitivity of the data. A breach of confidentiality may take
place through different means, for instance hacking or social engineering.
2. Integrity: Data integrity refers to the certainty that the data is not tampered with or
degraded during or after submission. It is the certainty that the data has not been
subject to unauthorized modification, either intentional or unintentional. Integrity
can be compromised at two points: while the data is being uploaded or transmitted,
and while it is stored in the database or collection.
3. Availability: This means that the information is available to authorized users when it
is needed. For a system to demonstrate availability, it must have properly functioning
computing systems, security controls and communication channels. Systems defined
as critical (power generation, medical equipment, safety systems) often have extreme
requirements related to availability. These systems must be resilient against cyber
threats, and have safeguards against power outages, hardware failures and other
events that might impact the system availability.

Database security means keeping sensitive information safe and preventing the loss of data.
The security of a database is controlled by the Database Administrator (DBA). The following
are the main control measures used to secure data in databases:

Prepared by: Minu Choudhary Page 1


• Authentication
• Authorization
• Access control

These are explained below.


1.1. Authentication

Authentication is a method of verifying the identity of a person who is accessing your
database. User authentication makes sure that the person accessing the database is
who he or she claims to be. Authentication can be done at the operating system level or at
the database level itself. Many authentication systems, including biometric ones such as
retina scanners, are used to make sure unauthorized people cannot access the database.
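
As a hedged illustration of authentication at the database level, the sketch below checks a
password against a salted hash instead of storing the password itself. The user store `users`,
the account name `alice` and the iteration count are assumptions made for this example; they
are not taken from any particular DBMS.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


# Hypothetical user store: username -> (salt, digest)
users = {"alice": hash_password("correct horse")}


def authenticate(username, password):
    """Return True only if the user exists and the password matches."""
    record = users.get(username)
    if record is None:
        return False
    salt, expected = record
    _, candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, expected)
```

A call such as `authenticate("alice", "correct horse")` succeeds, while a wrong password or an
unknown user name is rejected.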

1.2. Authorization

Authorization is a privilege granted by the Database Administrator. Users of the
database can only view the contents they are authorized to view. The rest of the
database is out of bounds to them.

The categories of authorization that can be given to users are:

a) System Administrator - This is the highest administrative authorization for a user.
Users with this authorization can also execute some database administrator commands
such as restoring or upgrading a database.
b) System Control - This is the highest control authorization for a user. This allows
maintenance operations on the database but not direct access to data.
c) System Maintenance - This is the lower level of system control authority. It also
allows users to maintain the database but within a database manager instance.
d) System Monitor - Using this authority, the user can monitor the database and take
snapshots of it.



1.3. Access Control

Access control is used to identify a subject (user/human) and to authorize the subject to
access an object (data/resource) based on the required task. These controls are used to protect
resources from unauthorized access and are put into place to ensure that subjects can only
access objects using secure and pre-approved methods.

Three main types of access control systems are:


1.3.1. Discretionary Access Control (DAC),
1.3.2 Role Based Access Control (RBAC), and
1.3.3. Mandatory Access Control (MAC).

1.3.1. Discretionary Access Control (DAC)

Access permissions are set at the user's discretion. As a user, you can create a file and set
its permissions as you want, or share it with whomever you choose. The owner, rather than an
administrator, decides who has access. DAC gives the highest level of flexibility, but that
flexibility comes with the risk of unauthorised information disclosure. It is considered the
least secure method. NTFS in Windows is an example of a DAC implementation.
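
The owner-decides idea can be sketched in a few lines. This is a minimal illustration, not a
real file system or DBMS API: the `Resource` class, its methods and the user names are all
hypothetical. It is similar in spirit to an object owner issuing a GRANT statement in SQL.

```python
class Resource:
    """A hypothetical object whose owner controls its access list."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        # The owner starts with full rights on the object.
        self.acl = {owner: {"read", "write"}}

    def grant(self, grantor, user, permission):
        """Only the owner may hand out permissions (the 'discretion' in DAC)."""
        if grantor != self.owner:
            raise PermissionError(f"{grantor} does not own {self.name}")
        self.acl.setdefault(user, set()).add(permission)

    def can(self, user, permission):
        return permission in self.acl.get(user, set())


report = Resource("marks.xlsx", owner="asha")
report.grant("asha", "ravi", "read")   # the owner shares the file
```

After the grant, `ravi` can read the file but cannot write to it, and `ravi` cannot grant
access to anyone else because he is not the owner.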



1.3.2. Role-Based Access Control (RBAC)

The balance between flexibility and control makes RBAC the most widely used access control
method. If you work in an IT organization, your access is most probably controlled by
RBAC. The administrator assigns the required permissions to different managed groups and
makes users members of a specific group whenever required. Your role at work decides which
group you fall into, and the group already has the permissions you require. Windows OS-based
domains are an example of RBAC.
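
A minimal sketch of the role-to-permission lookup described above is given below. The role
names, permission names and users are assumptions chosen for illustration only.

```python
# Hypothetical roles and the permissions the administrator attached to them
ROLE_PERMISSIONS = {
    "clerk": {"read_orders"},
    "manager": {"read_orders", "approve_orders"},
}

# Hypothetical assignment of users to roles (groups)
USER_ROLES = {
    "asha": {"clerk"},
    "ravi": {"clerk", "manager"},
}


def is_allowed(user, permission):
    """A user is allowed an action if any of the user's roles carries it."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Permissions are never attached to a user directly: changing what a manager may do means editing
one entry in `ROLE_PERMISSIONS`, and every user holding that role is affected at once.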



1.3.3. Mandatory Access Control (MAC)

Users in a MAC environment can't change permissions. Permissions are usually predefined and can
be changed only by administrators. Access is granted based on clearance level. Each object is
labelled with a classification, such as Top Secret or Secret, and a subject needs a matching
clearance to access that object. MAC is considered the most secure method, but also the most
inflexible. It is used in environments where the highest level of confidentiality is
required, such as top-secret government agencies or the military. SELinux is an example
of MAC.
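
The clearance check can be sketched as a comparison of ordered labels. The four levels below
and the rule that a higher clearance also satisfies a lower classification are assumptions of
this sketch; a real MAC system such as SELinux uses a much richer policy language.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical classification levels, ordered from lowest to highest."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_read(subject_clearance, object_label):
    """Allow access only when the subject's clearance is at least the object's label."""
    return subject_clearance >= object_label
```

Neither the subject nor the object owner can change the outcome of `can_read`; only an
administrator who reassigns a clearance or a label can.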



2. Intrusion detection
An Intrusion Detection System (IDS) is a system that monitors network traffic for
suspicious activity and issues alerts when such activity is discovered. It is a software
application that scans a network or a system for harmful activity or policy violations.
Any malicious activity or violation is normally either reported to an administrator or
collected centrally using a security information and event management (SIEM) system. A
SIEM system integrates outputs from multiple sources and uses alarm filtering techniques
to differentiate malicious activity from false alarms.
Although intrusion detection systems monitor networks for potentially malicious activity,
they are also prone to false alarms. Hence, organizations need to fine-tune their IDS
products when they first install them. This means configuring the intrusion detection
systems to recognize what normal traffic on the network looks like as compared to
malicious activity.
Intrusion prevention systems also monitor network packets entering the system, check them
for malicious activity, and immediately send warning notifications.
Classification of Intrusion Detection System:
IDS are classified into 5 types:
1. Network Intrusion Detection System (NIDS): Network intrusion detection systems
(NIDS) are set up at a planned point within the network to examine traffic from all
devices on the network. It performs an observation of passing traffic on the entire
subnet and matches the traffic that is passed on the subnets to the collection of known
attacks. Once an attack is identified or abnormal behavior is observed, the alert can be
sent to the administrator. An example of a NIDS is installing it on the subnet where
firewalls are located in order to see if someone is trying to crack the firewall.
2. Host Intrusion Detection System (HIDS): Host intrusion detection systems (HIDS)
run on independent hosts or devices on the network. A HIDS monitors the incoming
and outgoing packets from the device only and will alert the administrator if suspicious
or malicious activity is detected. It takes a snapshot of existing system files and
compares it with the previous snapshot. If critical system files were edited or
deleted, an alert is sent to the administrator to investigate. An example of HIDS usage
can be seen on mission-critical machines, which are not expected to change their layout.
3. Protocol-based Intrusion Detection System (PIDS): A protocol-based intrusion
detection system (PIDS) comprises a system or agent that consistently resides at
the front end of a server, controlling and interpreting the protocol between a user/device
and the server. It tries to secure the web server by regularly monitoring the HTTPS
protocol stream and accepting the related HTTP protocol. Because HTTPS traffic is decrypted
just before it enters the web presentation layer, the system needs to reside at this
interface in order to inspect HTTPS traffic.
4. Application Protocol-based Intrusion Detection System (APIDS): Application
Protocol-based Intrusion Detection System (APIDS) is a system or agent that generally
resides within a group of servers. It identifies the intrusions by monitoring and
interpreting the communication on application-specific protocols. For example, this
would monitor the SQL protocol explicit to the middleware as it transacts with the
database in the web server.
5. Hybrid Intrusion Detection System: Hybrid intrusion detection system is made by the
combination of two or more approaches of the intrusion detection system. In the hybrid
intrusion detection system, host agent or system data is combined with network
information to develop a complete view of the network system. Hybrid intrusion
detection system is more effective in comparison to the other intrusion detection
system. Prelude is an example of Hybrid IDS.
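
The snapshot comparison performed by a HIDS (item 2 above) can be sketched as follows. To keep
the example self-contained, files are represented as an in-memory dictionary of path to
content; the paths and contents are hypothetical, and a real HIDS would read them from disk.

```python
import hashlib


def snapshot(files):
    """Map each path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}


def compare(old, new):
    """Return a sorted list of (change, path) alerts between two snapshots."""
    alerts = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            alerts.append(("deleted", path))
        elif path not in old:
            alerts.append(("added", path))
        elif old[path] != new[path]:
            alerts.append(("modified", path))
    return alerts


# Hypothetical file contents before and after an incident
baseline = snapshot({"/bin/ls": b"binary-v1", "/etc/passwd": b"root:x:0:0"})
current = snapshot({"/etc/passwd": b"root:x:0:0\nevil:x:0:0", "/tmp/dropper": b"payload"})
alerts = compare(baseline, current)
```

Each entry in `alerts` corresponds to one file an administrator would be asked to investigate.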

Detection Method of IDS:


1. Signature-based Method: A signature-based IDS detects attacks on the basis of
specific patterns, such as the number of bytes or the number of 1's or 0's in the
network traffic. It also detects attacks on the basis of already known malicious instruction
sequences used by malware. The patterns the IDS looks for are known as
signatures.
A signature-based IDS can easily detect attacks whose pattern (signature) already
exists in the system, but it has difficulty detecting new malware attacks because their
pattern (signature) is not yet known.
2. Anomaly-based Method: Anomaly-based IDS was introduced to detect unknown
malware attacks, as new malware is developed rapidly. An anomaly-based IDS uses
machine learning to create a model of trustworthy activity; incoming activity is
compared with that model and declared suspicious if it does not fit the model.
A machine learning-based method generalizes better than a
signature-based IDS, as these models can be trained according to the applications and
hardware configurations.
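
The contrast between the two methods can be sketched in a few lines. The signature list and
the traffic figures are hypothetical, and the anomaly detector below uses a plain statistical
baseline (mean and standard deviation) as a stand-in for the machine learning model mentioned
above.

```python
import statistics

# Hypothetical signatures of known attacks
SIGNATURES = ["' OR 1=1", "../../", "<script>"]


def signature_match(payload):
    """Signature-based: report every known pattern found in the payload."""
    lowered = payload.lower()
    return [sig for sig in SIGNATURES if sig.lower() in lowered]


class AnomalyDetector:
    """Anomaly-based: learn what 'normal' looks like, then flag outliers."""

    def __init__(self, baseline, threshold=3.0):
        self.mean = statistics.mean(baseline)
        self.stdev = statistics.stdev(baseline)
        self.threshold = threshold  # assumed cut-off, in standard deviations

    def is_suspicious(self, value):
        if self.stdev == 0:
            return value != self.mean
        return abs(value - self.mean) / self.stdev > self.threshold


# Hypothetical request sizes (bytes) observed during normal operation
detector = AnomalyDetector([100, 102, 98, 101, 99])
```

The signature matcher only recognizes payloads containing a listed pattern, whereas the
detector flags any request size far from the learned baseline, including attacks it has never
seen before (at the cost of possible false alarms).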

Comparison of IDS with Firewalls: An IDS and a firewall are both related to network security,
but they differ: a firewall looks outward for intrusions in order to stop them from
happening. Firewalls restrict access between networks to prevent intrusion, and they do not
signal an attack that comes from inside the network. An IDS describes a suspected
intrusion once it has happened and then signals an alarm.



2.1. SQL Injection

SQL injection is a technique in which an attacker supplies SQL commands through web page
inputs so that they are executed as part of the application's SQL statements. Malicious
users can use these statements to manipulate the database behind the web application.
• SQL injection is a code injection technique that might destroy your database.
• SQL injection is one of the most common web hacking techniques.
• SQL injection is the placement of malicious code in SQL statements, via web page
input.

Exploitation of SQL Injection in Web Applications

Web servers communicate with database servers any time they need to retrieve or store user
data. The attacker designs SQL statements so that they are executed while the
web server is fetching content from the application server. This compromises the security of
the web application.
Example of SQL Injection

Suppose we have an application based on student records. Any student can view only his or
her own records by entering a unique and private student ID. Suppose we have a field like
the one below:
Student id:

And the student enters the following in the input field:

12222345 or 1=1
So this basically translates to:
SELECT * FROM STUDENT WHERE
STUDENT_ID = 12222345 OR 1 = 1
Because 1 = 1 is always true, the WHERE condition holds for every row and the query returns
all records. So basically, all the student data is compromised. The malicious user can
also delete student records in a similar fashion.

Consider the following SQL query.

SELECT * FROM USER WHERE
USERNAME = '' AND PASSWORD = ''
A malicious user can use the '=' operator in a clever manner to retrieve private and secure
user information. Instead of the query above, the following query, when
executed, retrieves protected data that is not intended to be shown to users.
SELECT * FROM USER WHERE
(USERNAME = '' OR 1=1) AND
(PASSWORD = '' OR 1=1)
Since 1=1 always holds true, user data is compromised.
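
The student-ID example can be reproduced with Python's built-in sqlite3 module. This is a
deliberately vulnerable sketch for study only: the table layout, the sample rows and the
function name `lookup_vulnerable` are assumptions, and the query is built by string
concatenation precisely to show what goes wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(12222345, "Asha"), (12222346, "Ravi"), (12222347, "Meera")],
)


def lookup_vulnerable(user_input):
    # UNSAFE: the input is pasted into the SQL text and parsed as SQL.
    query = "SELECT * FROM student WHERE student_id = " + user_input
    return conn.execute(query).fetchall()


honest = lookup_vulnerable("12222345")          # one student's record
attack = lookup_vulnerable("12222345 OR 1=1")   # every record in the table
```

The honest input returns a single row, while the injected `OR 1=1` makes the WHERE condition
true for every row, so `attack` contains the whole table.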



Impact of SQL Injection

The attacker can retrieve all the user data present in the database, such as user details, credit
card information, and social security numbers, and can also gain access to protected areas like
the administrator portal. It is also possible to delete user data from the tables.
Nowadays, online shopping applications and bank transactions all use back-end database
servers, so if an attacker is able to exploit SQL injection, the entire server may be
compromised.

Preventing SQL Injection

• Input validation and user authentication: Validate input from the user by pre-defining the
length and type of each input field, and authenticate the user.
• Restrict the access privileges of users and define how much data any
outsider can access from the database. Basically, a user should not be granted permission
to access everything in the database.
• Do not use system administrator accounts.
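
In addition to the measures listed above, a widely used defence is the parameterized query,
in which user input is passed to the database as data rather than spliced into the SQL text.
The sketch below combines it with the length and type validation from the first bullet. It
uses Python's built-in sqlite3 module; the table, the sample rows, the ten-digit limit and the
function name `lookup_safe` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(12222345, "Asha"), (12222346, "Ravi"), (12222347, "Meera")],
)

MAX_ID_LENGTH = 10  # assumed maximum length of a student ID


def lookup_safe(user_input):
    # Step 1: validate type and length before touching the database.
    if not user_input.isdigit() or len(user_input) > MAX_ID_LENGTH:
        return []
    # Step 2: bind the value with a placeholder; it is never parsed as SQL.
    query = "SELECT * FROM student WHERE student_id = ?"
    return conn.execute(query, (int(user_input),)).fetchall()
```

With this version, the input `12222345 OR 1=1` fails validation and returns nothing, while a
well-formed ID still returns exactly one record.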

3.1. Object oriented and object relational databases


What is Object Oriented Database

Object-oriented databases represent data in the form of objects and classes. According to the
object-oriented paradigm, an object is a real-world entity. In addition, a class helps to create
objects. Moreover, object-oriented databases follow the principles of object-oriented
programming.

Figure 5.1: Object Oriented Model

In addition, object-oriented databases support OOP concepts such as inheritance and
encapsulation. They also support complex objects such as maps, sets, lists, tuples or
collections of multiple primitive objects. Furthermore, an object-oriented database allows the
user to create persistent objects, which help to overcome database issues such as
concurrency and recovery. These objects continue to exist in the database even after the
program that created them has finished executing.

What is Object Relational Database


An object-relational database is a hybrid between the object-oriented model (OODBMS) and the
relational model (RDBMS). Each of those two models has its strengths and weaknesses. By
combining the two models, a DBMS can take advantage of various strengths from each model.

Different types of Database

Features of object oriented and database capability



How mapping is done?
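
One way to picture the mapping between objects and relational tables is the sketch below,
which stores a Python object as a row and rebuilds the object from that row. It is only an
approximation done in application code with the built-in sqlite3 module; the `Student` class,
its fields and the `save`/`load` helpers are hypothetical, and an object-relational DBMS
would provide comparable features inside the database itself.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Student:
    """Each attribute of the class corresponds to one column of the table."""
    student_id: int
    name: str
    branch: str


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, branch TEXT)"
)


def save(student):
    """Object -> row: flatten the object's attributes into column values."""
    conn.execute(
        "INSERT INTO student VALUES (?, ?, ?)",
        (student.student_id, student.name, student.branch),
    )


def load(student_id):
    """Row -> object: rebuild a Student from the stored columns."""
    row = conn.execute(
        "SELECT student_id, name, branch FROM student WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    return Student(*row) if row else None


save(Student(1, "Asha", "CSE"))
```

Here the class plays the role of the table definition and each object corresponds to one row.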



Difference between Object Oriented Database and Object Relational Database

Definition

An object-oriented database is a database that represents information in the form of objects as
used in object-oriented programming. An object-relational database, on the other hand, is a
database that depends on both the relational model and the object-oriented database model.
This is the main difference between an object-oriented database and an object-relational database.

Based on

An object-oriented database depends on OOP, while an object-relational database
depends on the relational model and the object-oriented database model.

Improvement

Another difference is that the object-relational database builds on both models, and in that
sense is regarded as an improvement over the object-oriented database.

3.2. Logical Database


A logical database is a special type of ABAP (Advanced Business Application
Programming) program that is used to retrieve data from various interrelated tables.
A logical database also provides a read-only view of the data.

Structure of Logical Database:

A logical database uses only a hierarchical structure of tables, i.e. data is organized in a
tree-like structure and is stored as records that are connected to each other through
edges (links). A logical database contains Open SQL statements, which are used to read data
from the database. The logical database reads the data from the database, stores it in the
program if required, and passes it line by line to the application program.

Figure 5.3: Structure of Logical database

Features of Logical Database:

• We can select only the type of data that we need.
• Data authentication is done in order to maintain security.
• A logical database uses a hierarchical structure, which helps maintain data integrity.

Goal Of Logical Database:

The goal of Logical Database is to create well-structured tables that reflect the need of the
user. The tables of the Logical database store data in a non-redundant manner and foreign
keys will be used in tables so that relationships among tables and entities will be supported.

Tasks Of Logical Database:

Below are some important tasks of a logical database:

• With the help of a logical database, we can read the same data from multiple programs.
• A logical database defines the same user interface for multiple programs.
• A logical database ensures authorization checks for the centralized sensitive database.
• A logical database improves performance. For example, a logical database can
use joins instead of multiple SELECT statements, which improves response time and
thereby the performance of the logical database.
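
The last point, a join replacing multiple SELECT statements, can be illustrated outside ABAP.
The sketch below uses Python's built-in sqlite3 module rather than Open SQL, and the `branch`
and `student` tables with their rows are hypothetical. Both functions read the same two-level
hierarchy; the second does it in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name TEXT,
    branch_id INTEGER REFERENCES branch(branch_id)
);
INSERT INTO branch VALUES (1, 'CSE'), (2, 'ECE');
INSERT INTO student VALUES (10, 'Asha', 1), (11, 'Ravi', 1), (12, 'Meera', 2);
""")


def nested_selects():
    """One SELECT for the parent table plus one SELECT per parent row."""
    rows = []
    branches = conn.execute(
        "SELECT branch_id, name FROM branch ORDER BY branch_id"
    ).fetchall()
    for branch_id, branch_name in branches:
        students = conn.execute(
            "SELECT name FROM student WHERE branch_id = ? ORDER BY student_id",
            (branch_id,),
        ).fetchall()
        for (student_name,) in students:
            rows.append((branch_name, student_name))
    return rows


def single_join():
    """The same result from a single statement using a join."""
    return conn.execute(
        "SELECT b.name, s.name FROM branch b "
        "JOIN student s ON s.branch_id = b.branch_id "
        "ORDER BY b.branch_id, s.student_id"
    ).fetchall()
```

`nested_selects` issues one query per branch in addition to the first one, whereas
`single_join` issues exactly one query regardless of how many branches exist.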

Data View Of Logical Database:



A logical database provides a particular view of the logical database tables. A logical database is
appropriate when the structure of the database is large. To work with such databases
efficiently, it is convenient to use the following flow:

• SELECT
• READ
• PROCESS
• DISPLAY

The data of a logical database is hierarchical
in nature. The tables are linked to each other in a foreign key relationship.

Diagrammatically, the Data View of Logical Database is shown as:

Figure 5.4: Data View of Logical Database

Points To Remember:

• Tables must have a foreign key relationship.
• A logical database consists of logically related tables that are arranged in a hierarchical
manner and used for reading or retrieving data.
• A logical database consists of three main elements:
o Structure of the database
o Selections of data from the database
o Database program
• If we want to improve the access time on data, then we use views in the logical database.

Example:
Suppose that in a university or college, an HOD wants to get information about a specific student.
He first retrieves the data about the student's batch and branch from the large amount of
data, and can then easily get the information about the required student, but cannot alter
that information.



Advantages Of Logical Database:

Let us look at some advantages of the logical database:

• In a logical database, we can select meaningful data from a large amount of data.
• A logical database provides central authorization, which checks whether database accesses are
authenticated.
• Less coding is required to retrieve data from the database compared to
other databases.
• Access performance when reading data from the hierarchical structure of the database is good.
• The user interfaces are easy to understand.
• A logical database provides check functions, which verify that user input is complete,
correct, and plausible.

Disadvantages Of Logical Database:

This section shows the disadvantages of the logical database:



• A logical database takes more time when the required data is at the lowest level of the
hierarchy, because all upper-level tables must be read first,
which slows down performance.
• In a logical database, the ENDGET command doesn't exist, so the code block associated
with an event ends with the next event statement.

3.3. Web databases

A web-based database is a system that stores information for online access. It usually
keeps records in a way that's easy to search and retrieve through a browser. All you need to
do is use keywords to find the desired information.

Data Organization

The organization of data in a web-based database is simple. Information is kept in tables that
have different fields. Depending on the system, that can either be in relational or non-
relational format.

The relational model is most common for records that share related fields. For example, a
school's set-up can have a wide range of student details with names, classes, and more. That
way, the administrator can filter the info depending on their needs.

The non-relational option, by contrast, does not organize information into fixed tables. It
uses schema structures that are flexible and robust, which are useful for organizations
that handle large amounts of records.

So, where is the data in a database stored?

Once a system processes the records, it stores them in the root directory, which is a
folder in a computer's storage system.

Database software is also available to organize and correlate various sets of data. Most of the
data is usually in a human-readable format, including text, numbers, and symbols.
Altogether, this streamlines the process of sorting records for quick retrieval.

But where do they get the information?

While this varies depending on the needs of an organization, the majority rely on data
analytics to gather info from multiple sources.

A good example is how Google works with search records from users. It additionally has a
bot that crawls billions of informational websites on the web. From here, it ranks them
depending on the most searched terms on the internet.

What about security?



Securing your web-based database is also of great importance, especially since attackers
access billions of organizational records every year. Protecting your systems isn't a matter
that's up for discussion; it's a must.

Luckily, database management systems (DBMS) offer robust data encryption mechanisms.
Top of that list is the use of complex algorithms for encrypting files. This approach makes
information unreadable to unauthorized users. When you need access, the DBMS decrypts the
records to make them readable.

3.4. Distributed databases


A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.

Features

• Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
• Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
• The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it is not synonymous
with a transaction processing system.

Distributed Database Management System

A distributed database management system (DDBMS) is a centralized software system that
manages a distributed database in a manner as if it were all stored in a single location.

Features

• It is used to create, retrieve, update and delete distributed databases.
• It synchronizes the database periodically and provides access mechanisms by
virtue of which the distribution becomes transparent to the users.
• It ensures that the data modified at any site is universally updated.
• It is used in application areas where large volumes of data are processed and accessed
by numerous users simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the databases.

Factors Encouraging DDBMS

The following factors encourage moving over to DDBMS −

• Distributed Nature of Organizational Units − Most organizations in the current
times are subdivided into multiple units that are physically distributed over the globe.
Each unit requires its own set of local data. Thus, the overall database of the
organization becomes distributed.
• Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This demands
common databases or replicated databases that should be used in a synchronized
manner.
• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems which may have
common data. Distributed database systems aid both kinds of processing by providing
synchronized data.
• Database Recovery − One of the common techniques used in DDBMS is replication
of data across different sites. Replication of data automatically helps in data recovery
if the database at any site is damaged. Users can access data from other sites while the
damaged site is being reconstructed. Thus, database failure may become almost
inconspicuous to users.
• Support for Multiple Application Software − Most organizations use a variety of
application software, each with its specific database support. DDBMS provides a
uniform functionality for using the same data among different platforms.
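
The recovery benefit of replication described under Database Recovery can be sketched with a
toy in-memory model. The `Site` and `ReplicatedStore` classes, the site names and the stored
key are hypothetical; a real DDBMS would add networking, transactions and conflict handling,
none of which is modelled here.

```python
class Site:
    """A hypothetical database site holding a local copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True


class ReplicatedStore:
    """Writes go to every online site; reads are served by any online site."""

    def __init__(self, sites):
        self.sites = sites

    def write(self, key, value):
        for site in self.sites:
            if site.online:
                site.data[key] = value

    def read(self, key):
        for site in self.sites:
            if site.online and key in site.data:
                return site.name, site.data[key]
        raise KeyError(key)

    def recover(self, failed_site):
        """Rebuild a damaged site by copying from another online replica."""
        source = next(s for s in self.sites if s.online and s is not failed_site)
        failed_site.data = dict(source.data)
        failed_site.online = True


delhi, mumbai = Site("delhi"), Site("mumbai")
store = ReplicatedStore([delhi, mumbai])
store.write("order:1", "paid")

# Simulate damage to one site: it goes offline and loses its data.
delhi.online = False
delhi.data.clear()
served_by, value = store.read("order:1")   # still answered by the other site
store.recover(delhi)
```

While one site is down, the read is answered by the surviving replica, and `recover` then
restores the damaged site from that replica.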

Advantages of Distributed Databases

Following are the advantages of distributed databases over centralized databases.

Modular Development − If the system needs to be expanded to new locations or new units,
in centralized database systems, the action requires substantial efforts and disruption in the
existing functioning. However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the distributed
system, with no interruption in current functions.

More Reliable − In case of database failures, the total system of centralized databases comes
to a halt. However, in distributed systems, when a component fails, the system
continues to function, possibly at reduced performance. Hence DDBMS is more reliable.

Better Response − If data is distributed in an efficient manner, then user requests can be met
from local data itself, thus providing faster response. On the other hand, in centralized
systems, all queries have to pass through the central computer for processing, which increases
the response time.

Lower Communication Cost − In distributed database systems, if data is located locally
where it is mostly used, then the communication costs for data manipulation can be
minimized. This is not feasible in centralized systems.

Adversities of Distributed Databases

Following are some of the adversities associated with distributed databases.

• Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several
sites.
• Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the
sites.
• Data integrity − The need for updating data in multiple sites poses problems of data
integrity.
• Overheads for improper data distribution − Responsiveness of queries is largely
dependent upon proper data distribution. Improper data distribution often leads to
very slow response to user requests.

Types of Distributed Databases

Distributed databases can be broadly classified into homogeneous and heterogeneous
distributed database environments, each with further sub-divisions, as shown in the following
illustration.

Homogeneous Distributed Databases

In a homogeneous distributed database, all the sites use identical DBMS and operating
systems. Its properties are −

• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user
requests.
• The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database

There are two types of homogeneous distributed database −

• Autonomous − Each database is independent and functions on its own. They are
integrated by a controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central
or master DBMS co-ordinates data updates across the sites.

Heterogeneous Distributed Databases

In a heterogeneous distributed database, different sites have different operating systems,
DBMS products and data models. Its properties are −

• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs like relational, network,
hierarchical or object oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited co-operation in
processing user requests.

Types of Heterogeneous Distributed Databases

• Federated − The heterogeneous database systems are independent in nature and
integrated together so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through
which the databases are accessed.

Distributed DBMS Architectures

DDBMS architectures are generally developed depending on three parameters −

 Distribution − It states the physical distribution of data across the different sites.
 Autonomy − It indicates the distribution of control of the database system and the
degree to which each constituent DBMS can operate independently.
 Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.

Architectural Models

Some of the common architectural models are −

  Client-Server Architecture for DDBMS
  Peer-to-Peer Architecture for DDBMS
  Multi-DBMS Architecture

Client-Server Architecture for DDBMS

This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions mainly cover the user interface. However, clients also
perform some functions such as consistency checking and transaction management.

The two different client-server architectures are −

 Single Server Multiple Client
 Multiple Server Multiple Client (shown in the following diagram)

Peer-to-Peer Architecture for DDBMS

In these systems, each peer acts both as a client and a server for providing database services.
The peers share their resources with other peers and co-ordinate their activities.

This architecture generally has four levels of schemas −

  Global Conceptual Schema − Depicts the global logical view of data.
  Local Conceptual Schema − Depicts logical data organization at each site.
 Local Internal Schema − Depicts physical data organization at each site.
 External Schema − Depicts user view of data.

Design Alternatives

The distribution design alternatives for the tables in a DDBMS are as follows −

  Non-replicated and non-fragmented
  Fully replicated
 Partially replicated
 Fragmented
 Mixed

Non-replicated & Non-fragmented

In this design alternative, different tables are placed at different sites. Data is placed so that it
is in close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries that need to join information in tables placed at
different sites is low. If an appropriate distribution strategy is adopted, then this design
alternative helps to reduce the communication cost during data processing.

Fully Replicated

In this design alternative, one copy of all the database tables is stored at each site. Since
each site has its own copy of the entire database, queries are very fast and incur negligible
communication cost. On the other hand, the massive redundancy in data makes update operations
very costly, because every copy must be updated. Hence, this is suitable for systems where a
large number of queries must be handled whereas the number of database updates is low.
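
The trade-off between cheap reads and costly updates can be illustrated with a rough message count. The sketch below is only a toy cost model: the function name `remote_messages` and every assumption listed in its docstring are introduced here for illustration and are not part of these notes.

```python
def remote_messages(num_sites, copies, reads, updates):
    """Count remote messages under a deliberately simple model.

    Assumptions (not from the notes): requests arrive evenly at all sites,
    a read is local when the requesting site holds a copy and costs one
    remote message otherwise, and an update is propagated to every copy
    other than the one at the requesting site.
    """
    local_fraction = copies / num_sites
    read_cost = reads * (1 - local_fraction)
    # An update issued at a site with a copy reaches (copies - 1) other
    # copies; issued at a site without a copy it reaches all `copies`.
    update_cost = updates * (
        local_fraction * (copies - 1) + (1 - local_fraction) * copies
    )
    return read_cost + update_cost


# Four sites, read-heavy workload: full replication needs far fewer messages.
print(remote_messages(4, 4, reads=1000, updates=10))   # fully replicated
print(remote_messages(4, 1, reads=1000, updates=10))   # single copy

# Same sites, update-heavy workload: full replication becomes the costly option.
print(remote_messages(4, 4, reads=10, updates=1000))
print(remote_messages(4, 1, reads=10, updates=1000))
```

Under these assumptions the fully replicated design wins clearly when reads dominate and loses when updates dominate, which matches the suitability rule stated above.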

Partially Replicated

Copies of tables or portions of tables are stored at different sites. The tables are
distributed in accordance with the frequency of access. This takes into consideration the fact
that the frequency of accessing the tables varies considerably from site to site. The number of
copies of the tables (or portions) depends on how frequently the access queries execute and
on the sites that generate the access queries.

Fragmented

In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at a different site. This reflects the fact that a given site
seldom needs all the data stored in a table. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
each fragment in the system, i.e. no redundant data.

The three fragmentation techniques are −

 Vertical fragmentation
 Horizontal fragmentation
 Hybrid fragmentation
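
As a minimal sketch of the three techniques, the Python below fragments a small, hypothetical EMPLOYEE table held as a list of dictionaries. The table, the column names and the `dept` predicate are assumptions made for illustration and do not come from these notes. Horizontal fragmentation selects rows, vertical fragmentation selects columns (keeping the key so the table can be rebuilt), and hybrid fragmentation applies both.

```python
# Hypothetical EMPLOYEE table; names and values are illustrative only.
employees = [
    {"emp_id": 1, "name": "Asha", "dept": "Sales", "salary": 52000},
    {"emp_id": 2, "name": "Ravi", "dept": "HR", "salary": 48000},
    {"emp_id": 3, "name": "Meena", "dept": "Sales", "salary": 61000},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy a predicate (a subset of tuples)."""
    return [dict(row) for row in rows if predicate(row)]


def vertical_fragment(rows, columns, key="emp_id"):
    """Keep a subset of columns; the key is always kept so rows can be rejoined."""
    wanted = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in wanted} for row in rows]


def rejoin(left, right, key="emp_id"):
    """Rebuild rows from two vertical fragments by joining on the key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left]


# Horizontal: one fragment for Sales, one for everyone else.
sales = horizontal_fragment(employees, lambda r: r["dept"] == "Sales")
others = horizontal_fragment(employees, lambda r: r["dept"] != "Sales")

# Vertical: general columns at one site, pay data at another.
public = vertical_fragment(employees, ["name", "dept"])
pay = vertical_fragment(employees, ["salary"])

# Hybrid: a horizontal fragment that is then split vertically.
sales_pay = vertical_fragment(sales, ["salary"])

print(len(sales), len(others))
print(sales_pay)
```

Rejoining `public` and `pay` on `emp_id`, or taking the union of `sales` and `others`, gives back the original table, which is the property a fragmentation scheme is expected to preserve.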

Mixed Distribution

This is a combination of fragmentation and partial replication. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially
replicated across the different sites according to the frequency of accessing the fragments.

3.5. Data warehousing and data mining


Data warehousing refers to the process of compiling and organizing data into one common
database, whereas data mining refers to the process of extracting useful data from the
databases. The data mining process depends on the data compiled in the data warehousing
phase to recognize meaningful patterns. A data warehouse is created to support
management systems.

Data Warehouse

A data warehouse is a place where data can be stored for useful mining. It is like a
fast computer system with an exceptionally large data storage capacity. Data from the
organization's various systems is copied to the warehouse, where it can be fetched and conformed
to remove errors. Advanced queries can then be run against the data stored in the warehouse.

A data warehouse combines data from numerous sources in a way that ensures data quality,
accuracy, and consistency. It boosts system performance by separating analytics
processing from transactional databases. Data flows into a data warehouse from different
databases. A data warehouse works by organizing data into a schema that describes the format
and types of the data. Query tools examine the data tables using this schema.

Data warehouses and databases are both relational data systems, but they are built to serve
different purposes. A data warehouse is built to store a huge amount of historical data and
enables fast queries over all the data, typically using Online Analytical Processing
(OLAP). A database is built to store current transactions and allow quick access to specific
transactions for ongoing business processes, commonly known as Online Transaction
Processing (OLTP).
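
The difference in query shape can be seen in a small sketch using Python's built-in `sqlite3` module. The `sales` table and its rows are hypothetical, and a single in-memory database is used for both queries purely to keep the example short; in practice the two workloads would normally run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, year INTEGER, "
    "region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 2022, "North", 120.0),
        (2, 2022, "South", 80.0),
        (3, 2023, "North", 200.0),
        (4, 2023, "South", 50.0),
        (5, 2023, "North", 30.0),
    ],
)

# OLTP-style access: fetch one specific transaction by its key.
one_order = conn.execute(
    "SELECT order_id, year, region, amount FROM sales WHERE order_id = ?", (3,)
).fetchone()

# OLAP-style access: summarise all historical rows by year and region.
summary = conn.execute(
    "SELECT year, region, SUM(amount) FROM sales "
    "GROUP BY year, region ORDER BY year, region"
).fetchall()

print(one_order)
print(summary)
conn.close()
```

The first query touches a single row through the primary key, while the second scans and aggregates the whole history, which is the kind of request a warehouse is organized to answer quickly.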

Important Features of Data Warehouse

The important features of a data warehouse are given below:

1. Subject Oriented

A data warehouse is subject-oriented. It provides useful data about a subject instead of the
company's ongoing operations, and these subjects can be customers, suppliers, marketing,
product, promotion, etc. A data warehouse usually focuses on modeling and analysis of data
that helps the business organization to make data-driven decisions.

2. Time-Variant

The data in the data warehouse provides information for a specific period.

3. Integrated

A data warehouse is built by integrating data from heterogeneous sources, such as relational
databases, flat files, etc.

4. Non-Volatile

This means that once data has been entered into the warehouse, it cannot be changed.

Advantages of Data Warehouse:

  More accurate data access
  Improved productivity and performance
 Cost-efficient
 Consistent and quality data

Data Mining

Data mining refers to the analysis of data. It is the computer-supported process of analyzing
huge sets of data that have either been compiled by computer systems or have been
downloaded into the computer. In the data mining process, the computer analyzes the data
and extracts useful information from it. It looks for hidden patterns within the data set and tries
to predict future behavior. Data mining is primarily used to discover and indicate
relationships among the data sets.

Data mining aims to enable business organizations to view business behaviors, trends, and
relationships that allow the business to make data-driven decisions. It is also known as
Knowledge Discovery in Databases (KDD). Data mining tools utilize AI, statistics, databases,
and machine learning systems to discover the relationships between the data. Data mining
tools can answer business questions that were traditionally too time-consuming to resolve.

Important features of Data Mining:

The important features of Data Mining are given below:

  It uses automated discovery of patterns.
  It predicts the expected results.
  It focuses on large data sets and databases.
  It creates actionable information.

Advantages of Data Mining:

i. Market Analysis:

Data mining can predict market behavior, which helps the business to make decisions. For
example, it predicts who is keen to purchase what type of products.
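
One simple way to look for such purchase patterns is to count which items are bought together. The sketch below is a bare co-occurrence count and not a full association-rule algorithm; the baskets, the function name `frequent_pairs` and the `min_support` parameter are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; the items are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs bought together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
print(pairs)
```

Pairs that clear the support threshold hint at products that tend to sell together, which is the kind of relationship market analysis looks for.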

ii. Fraud detection:

Data mining methods can help to find which cellular phone calls, insurance claims, and credit or
debit card purchases are likely to be fraudulent.
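
Real fraud detection relies on much richer models than can be shown here. As a rough stand-in, the sketch below flags purchase amounts that lie unusually far from the average, using a plain z-score. The amounts, the function name `flag_outliers` and the threshold of 2.0 are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical card purchase amounts; the last one is unusually large.
amounts = [20.0, 25.0, 22.0, 30.0, 24.0, 500.0]


def flag_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


suspicious = flag_outliers(amounts)
print(suspicious)
```

A flagged value is only a candidate for review; whether it is actually fraudulent has to be decided with more evidence than a single statistic.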

iii. Financial Market Analysis:

Data mining techniques are widely used to help model financial markets.

iv. Trend Analysis:

Analyzing the current trends in the marketplace is a strategic benefit because it helps
in reducing costs and aligning the manufacturing process with market demand.

Differences between Data Mining and Data Warehousing:


| Data Mining | Data Warehousing |
|---|---|
| Data mining is the process of determining data patterns. | A data warehouse is a database system designed for analytics. |
| Data mining is generally considered as the process of extracting useful data from a large set of data. | Data warehousing is the process of combining all the relevant data. |
| Business entrepreneurs carry out data mining with the help of engineers. | Data warehousing is entirely carried out by the engineers. |
| In data mining, data is analyzed repeatedly. | In data warehousing, data is stored periodically. |
| Data mining uses pattern recognition techniques to identify patterns. | Data warehousing is the process of extracting and storing data in a way that allows easier reporting. |
| One of the most useful data mining techniques is the detection and identification of unwanted errors that occur in the system. | One of the advantages of the data warehouse is its ability to update frequently. That is why it is ideal for business entrepreneurs who want to stay up to date. |
| Data mining techniques are cost-efficient compared to other statistical data applications. | The responsibility of the data warehouse is to simplify every type of business data. |
| Data mining techniques are not 100 percent accurate. This may lead to serious consequences in certain conditions. | In the data warehouse, there is a high possibility that the data required for analysis by the company may not be integrated into the warehouse. This can lead to loss of data. |
| Companies can benefit from this analytical tool by equipping suitable and accessible knowledge-based data. | A data warehouse stores a huge amount of historical data that helps users to analyze different periods and trends to make future predictions. |
