UNIT V
1. Database Security:
1.1. Authentication,
1.2. Authorization
1.3. Access control,
1.3.1. DAC,
1.3.2. MAC and
1.3.3. RBAC models,
2. Intrusion detection,
2.1. SQL injection.
3. Advanced topics:
3.1. Object oriented and object relational databases,
3.2. Logical databases,
3.3. Web databases,
3.4. Distributed databases,
3.5. Data warehousing and data mining.
1. Database Security
1. Confidentiality: This component is often associated with secrecy and the use of
encryption. Confidentiality in this context means that the data is only available to
authorized parties. When information has been kept confidential it means that it has
not been compromised by other parties; confidential data are not disclosed to people
who do not require them or who should not have access to them. Ensuring
confidentiality means that information is organized in terms of who needs to have
access, as well as the sensitivity of the data. A breach of confidentiality may take
place through different means, for instance hacking or social engineering.
2. Integrity: Data integrity refers to the certainty that the data is not tampered with or
degraded during or after submission. It is the certainty that the data has not been
subject to unauthorized modification, either intentional or unintentional. There are
two points at which integrity can be compromised: during the upload or transmission
of the data, and during its storage in the database or collection.
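One common way to detect the kind of modification described above is to store a digest of
the data and recompute it later. The sketch below uses SHA-256 from Python's standard
library. The record contents and the function names are illustrative assumptions, not part
of any particular DBMS. A plain digest only helps if the attacker cannot also rewrite the
stored digest, so treat this as a minimal illustration rather than a complete control.

```python
import hashlib

def digest(record: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(record).hexdigest()

def is_intact(record: bytes, stored_digest: str) -> bool:
    """Recompute the digest and compare it with the one saved earlier."""
    return digest(record) == stored_digest

# Hypothetical record, digested at the time it is stored.
original = b"student_id=42;grade=B"
saved = digest(original)

# Later: an unchanged record verifies, a modified one does not.
tampered = b"student_id=42;grade=A"
unchanged_ok = is_intact(original, saved)
tampered_ok = is_intact(tampered, saved)
```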
3. Availability: This means that the information is available to authorized users when it
is needed. For a system to demonstrate availability, it must have properly functioning
computing systems, security controls and communication channels. Systems defined
as critical (power generation, medical equipment, safety systems) often have extreme
requirements related to availability. These systems must be resilient against cyber
threats, and have safeguards against power outages, hardware failures and other
events that might impact the system availability.
Database security means keeping sensitive information safe and preventing the loss of data.
The security of a database is controlled by the Database Administrator (DBA). The following
are the main control measures used to provide security of data in databases:
1.2. Authorization
Access control is used to identify a subject (user/human) and to authorize the subject to
access an object (data/resource) based on the required task. These controls are used to protect
resources from unauthorized access and are put into place to ensure that subjects can only
access objects using secure and pre-approved methods.
In discretionary access control (DAC), access permission is usually at the user's discretion.
As a user, you can create a file and set its permissions as you want, or share it with
whomever you decide. The owner, rather than an administrator, decides who gets access. This
gives the highest level of flexibility, but that flexibility comes with the risk of
unauthorised information disclosure. DAC is considered the least secure method.
NTFS in Windows is an example of a DAC implementation.
The balance between flexibility and control makes role-based access control (RBAC) the most
widely used access control method. If you work in an IT organization, your access is most
probably controlled by RBAC. The administrator assigns the required permissions to different
managed groups, and makes users members of a specific group whenever required. Your role at
work decides which group you fall into, and the group already has the required permissions
for you. Windows OS and Windows OS-based domains are examples of RBAC.
Users in a mandatory access control (MAC) environment can't change permissions. Permissions
are usually predefined or can be changed only by administrators. Access is granted based on
clearance level. Each object is labelled with a classification, such as Top Secret or Secret,
and the subject needs the same clearance to access that object. MAC is considered the most
secure method, but it is also the most inflexible. It is used in environments where the
highest level of confidentiality is required, such as top-secret government agencies or the
military. SELinux is an example of MAC.
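The three models can be contrasted in a few lines of Python. This is a minimal sketch under
stated assumptions: the class and function names, the users, the roles and the clearance
levels are all hypothetical. The MAC check uses the common "clearance at least as high as
the label" rule, which is slightly more permissive than the "same clearance" wording above.

```python
from dataclasses import dataclass, field

# Hypothetical clearance ordering for the MAC check (higher = more sensitive).
LEVELS = {"public": 0, "secret": 1, "top_secret": 2}

@dataclass
class DacFile:
    owner: str
    allowed: set = field(default_factory=set)

    def share(self, requester: str, target: str) -> None:
        # DAC: only the owner decides who else gets access.
        if requester != self.owner:
            raise PermissionError("only the owner can share")
        self.allowed.add(target)

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.allowed

def mac_can_read(clearance: str, label: str) -> bool:
    # MAC: the decision depends on administrator-set labels, not on the owner.
    return LEVELS[clearance] >= LEVELS[label]

# RBAC: permissions attach to roles; users obtain them through role membership.
ROLE_PERMS = {"dba": {"read", "write", "grant"}, "analyst": {"read"}}
USER_ROLES = {"alice": {"dba"}, "bob": {"analyst"}}

def rbac_allowed(user: str, permission: str) -> bool:
    return any(permission in ROLE_PERMS[role] for role in USER_ROLES.get(user, ()))

# Usage: alice owns a file and chooses to share it with bob.
report = DacFile(owner="alice")
report.share("alice", "bob")
```

In the DAC case the owner changes access at will, in the MAC case nobody but the
administrator can change LEVELS or the labels, and in the RBAC case access changes only by
editing role membership.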
Comparison of IDS with Firewalls: An IDS and a firewall are both related to network security,
but an IDS differs from a firewall in that a firewall looks outward for intrusions in order
to stop them from happening. Firewalls restrict access between networks to prevent intrusion,
and they do not signal an attack that comes from inside the network. An IDS describes a
suspected intrusion once it has happened and then signals an alarm.
SQL injection is a technique used to exploit user data through web page inputs by injecting
SQL commands as statements. Malicious users can use these statements to manipulate the
application's web server.
SQL injection is a code injection technique that might destroy your database.
SQL injection is one of the most common web hacking techniques.
SQL injection is the placement of malicious code in SQL statements, via web page input.
Web servers communicate with database servers whenever they need to retrieve or store user
data. The attacker designs SQL statements so that they are executed while the web server is
fetching content from the application server. This compromises the security of the web
application.
Example of SQL Injection
Suppose we have an application based on student records. Any student can view only his or
her own records by entering a unique and private student ID. Suppose we have a field like
below:
Student id:
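The attack on such a field can be sketched with Python's built-in sqlite3 module. The table,
the rows and the function name lookup_unsafe are hypothetical, and the code is deliberately
vulnerable: it builds the SQL text by concatenating whatever was typed into the field.

```python
import sqlite3

# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", "A"), (2, "Ravi", "B"), (3, "Meena", "C")],
)

def lookup_unsafe(student_id: str):
    # VULNERABLE: the input is pasted straight into the SQL text.
    query = "SELECT name, grade FROM students WHERE id = " + student_id
    return conn.execute(query).fetchall()

honest = lookup_unsafe("2")          # returns only student 2's record
attack = lookup_unsafe("2 OR 1=1")   # the condition is now always true
```

With the input "2 OR 1=1" the WHERE clause matches every row, so the attacker sees all
student records instead of one.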
The hacker can retrieve all the user data present in the database, such as user details,
credit card information and social security numbers, and can also gain access to protected
areas like the administrator portal. It is also possible to delete the user data from the
tables. Nowadays, online shopping applications and bank transactions all use back-end
database servers, so if a hacker is able to exploit SQL injection, the entire server is
compromised.
User Authentication: Validate input from the user by pre-defining the length and type of
input for each input field, and authenticate the user.
Restrict the access privileges of users and define how much data any outsider can access
from the database. Basically, a user should not be granted permission to access everything
in the database.
Do not use system administrator accounts.
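The input-validation measure can be sketched as follows, again with Python's sqlite3 module.
The schema, the six-digit limit and the name lookup_safe are assumptions made for the
example. In addition to the validation listed above, the sketch uses a parameterized query,
a widely used defence that the list does not mention explicitly.

```python
import sqlite3

# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", "A"), (2, "Ravi", "B")],
)

def lookup_safe(student_id: str):
    # Validate the type and length of the input before touching the database.
    if not (student_id.isascii() and student_id.isdigit() and len(student_id) <= 6):
        raise ValueError("student id must be 1 to 6 digits")
    # The ? placeholder passes the value separately from the SQL text,
    # so the input is never parsed as part of the statement.
    return conn.execute(
        "SELECT name, grade FROM students WHERE id = ?", (int(student_id),)
    ).fetchall()
```

Calling lookup_safe("2") returns one record, while an input such as "2 OR 1=1" is rejected
with a ValueError before any query runs.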
Object-oriented databases represent data in the form of objects and classes. According to the
object-oriented paradigm, an object is a real-world entity. In addition, a class helps to create
objects. Moreover, object-oriented databases follow the principles of object-oriented
programming.
Definition
Based on
Improvement
Another difference between object oriented database and object relational database is that
object-relational database is more improved than object-oriented database.
A logical database uses only a hierarchical structure of tables, i.e. data is organized in a
tree-like structure and stored as records that are connected to each other through
edges (links). A logical database contains Open SQL statements, which are used to read data
from the database.
The goal of a logical database is to create well-structured tables that reflect the needs of
the user. The tables of a logical database store data in a non-redundant manner, and foreign
keys are used in the tables so that relationships among tables and entities are supported.
With the help of a logical database, the same data can be read from multiple programs.
A logical database defines the same user interface for multiple programs.
A logical database ensures authorization checks for the centralized sensitive database.
A logical database improves performance. For example, it uses joins instead of multiple
SELECT statements, which improves response time.
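The point about joins can be illustrated outside any specific logical-database product. The
sketch below uses SQLite through Python's sqlite3 module as a stand-in. The branch and
student tables and both function names are hypothetical. It reads the same hierarchical data
twice: once with one SELECT per parent row, and once with a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        branch_id INTEGER REFERENCES branch(id)
    );
    INSERT INTO branch VALUES (1, 'CSE'), (2, 'ECE');
    INSERT INTO student VALUES (1, 'Asha', 1), (2, 'Ravi', 2), (3, 'Meena', 1);
""")

def with_many_selects():
    # One query for the parent rows, then one more query per branch.
    rows = []
    branches = conn.execute("SELECT id, name FROM branch ORDER BY id").fetchall()
    for branch_id, branch_name in branches:
        students = conn.execute(
            "SELECT name FROM student WHERE branch_id = ? ORDER BY id",
            (branch_id,),
        ).fetchall()
        rows.extend((branch_name, student) for (student,) in students)
    return rows

def with_join():
    # A single statement follows the foreign key and returns the same rows.
    return conn.execute(
        "SELECT b.name, s.name FROM branch b "
        "JOIN student s ON s.branch_id = b.id "
        "ORDER BY b.id, s.id"
    ).fetchall()
```

Both functions return the same rows. The join issues one statement instead of one per
branch, which is where the saving in response time comes from as the number of parent rows
grows.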
[Figure: SELECT, READ, PROCESS, DISPLAY]
The data of a logical database is hierarchical in nature. The tables are linked to each
other by foreign-key relationships.
Points To Remember:
Example:
Suppose that in a university or college, an HOD wants to get information about a specific
student. He first retrieves the data about the student's batch and branch from a large amount
of data, and can then easily get the information about the required student without altering
it.
In a Logical database, we can select meaningful data from a large amount of data.
A logical database has central authorization, which checks whether database accesses are
authenticated.
Less coding is required to retrieve data from the database compared with other databases.
Access performance when reading data from the hierarchical structure of the database is good.
The user interfaces are easy to understand.
A logical database provides check functions that verify that user input is complete,
correct, and plausible.
A web-based database is just a system that stores information for online access. It usually
keeps records in a way that's easy to search and retrieve through a browser. All you need to
do is use various keywords to find the desired information.
Data Organization
The organization of data in a web-based database is simple. Information is kept in tables that
have different fields. Depending on the system, that can either be in relational or non-
relational format.
The relational model is most common for records that share related fields. For example, a
school's set-up can have a wide range of student details with names, classes, and more. That
way, the administrator can filter the info depending on their needs.
Once a system processes the records, it stores them in the root directory. It consists of a
folder in a computer's storage system.
Database software is also available to organize and correlate various sets of data. Most of it is
usually in a natural processing language format, including text, numbers, and symbols.
Altogether, it streamlines the process of sorting records for quick retrieval.
While this varies depending on the needs of an organization, the majority rely on data
analytics to gather info from multiple sources.
A good example is how Google works with search records from users. It additionally has a
bot that crawls billions of informational websites on the web. From here, it ranks them
depending on the most searched terms on the internet.
Luckily, database management systems (DBMS) offer robust data encryption mechanisms.
Top of that list is the use of complex algorithms for encrypting files. This approach makes
information unreadable to unauthorized users. When you need access, it will decrypt the
records to make them readable.
Features
Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous
with a transaction processing system.
Advantages
Modular Development − If the system needs to be expanded to new locations or new units,
in centralized database systems, the action requires substantial efforts and disruption in the
existing functioning. However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the distributed
system, with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases comes
to a halt. However, in distributed systems, when a component fails, the system continues to
function, possibly at reduced performance. Hence a DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met
from local data itself, thus providing faster response. On the other hand, in centralized
systems, all queries have to pass through the central computer for processing, which increases
the response time.
Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several
sites.
In a homogeneous distributed database, all the sites use identical DBMS and operating
systems. Its properties are −
Autonomous − Each database is independent and functions on its own. The databases are
integrated by a controlling application and use message passing to share data updates.
Distribution − It states the physical distribution of data across the different sites.
Autonomy − It indicates the distribution of control of the database system and the
degree to which each constituent DBMS can operate independently.
Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.
Architectural Models
The client-server architecture is a two-level architecture where the functionality is divided
into servers and clients. The server functions primarily encompass data management, query
processing, optimization and transaction management. Client functions mainly include the user
interface. However, clients also have some functions like consistency checking and
transaction management.
In peer-to-peer systems, each peer acts both as a client and a server for imparting database
services. The peers share their resources with other peers and co-ordinate their activities.
The distribution design alternatives for the tables in a DDBMS are as follows −
Non-replicated and Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so
that it is in close proximity to the site where it is used most. It is most suitable for
database systems where the percentage of queries that need to join information in tables
placed at different sites is low. If an appropriate distribution strategy is adopted, this
design alternative helps to reduce the communication cost during data processing.
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since
each site has its own copy of the entire database, queries are very fast and require
negligible communication cost. On the other hand, the massive redundancy in data makes update
operations very costly. Hence, this is suitable for systems where a large number of queries
must be handled whereas the number of database updates is low.
Partially Replicated
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This considers the fact that it seldom
happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
each fragment in the system, i.e. no redundant data.
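Fragmentation can be sketched in plain Python by treating a table as a list of rows. The
EMPLOYEE-style data and the helper names below are hypothetical. The sketch assumes the
usual meanings of the terms: a horizontal fragment keeps a subset of the rows, a vertical
fragment keeps a subset of the columns plus the key, and the original table must be
reconstructible from the fragments.

```python
# Hypothetical table, represented as a list of rows.
employees = [
    {"id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"id": 2, "name": "Ravi", "city": "Mumbai", "salary": 60000},
    {"id": 3, "name": "Meena", "city": "Delhi", "salary": 55000},
]

def horizontal(rows, predicate):
    """Horizontal fragment: a subset of the rows, all columns kept."""
    return [row for row in rows if predicate(row)]

def vertical(rows, columns):
    """Vertical fragment: a subset of the columns; the key 'id' is always kept."""
    keep = ["id"] + [c for c in columns if c != "id"]
    return [{c: row[c] for c in keep} for row in rows]

# Horizontal fragments, one per site, reconstructed by union.
delhi = horizontal(employees, lambda r: r["city"] == "Delhi")
mumbai = horizontal(employees, lambda r: r["city"] == "Mumbai")
rebuilt_h = sorted(delhi + mumbai, key=lambda r: r["id"])

# Vertical fragments, reconstructed by joining on the key.
public = vertical(employees, ["name", "city"])
payroll = vertical(employees, ["salary"])
pay_by_id = {row["id"]: row for row in payroll}
rebuilt_v = [{**p, **pay_by_id[p["id"]]} for p in public]
```

Keeping the key in every vertical fragment is what makes the join back to the full table
possible.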
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially
replicated across the different sites according to the frequency of accessing the fragments.
Data Warehouse
A data warehouse is a place where data can be stored for useful mining. It is like a
quick computer system with exceptionally large data storage capacity. Data from the
organization's various systems is copied to the warehouse, where it can be retrieved and
conformed to remove errors. Advanced queries can then be made against the warehouse's store
of data.
Data warehouses and databases are both data storage systems, but they are built to serve
different purposes. A data warehouse is built to store a huge amount of historical data and
enables fast queries over all of it, typically using Online Analytical Processing
(OLAP). A database is made to store current transactions and allow quick access to specific
transactions for ongoing business processes, commonly known as Online Transaction
Processing (OLTP).
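The difference between the two workloads shows up in the shape of the queries. The sketch
below runs both kinds against one small SQLite table through Python's sqlite3 module. The
sales table and its figures are made up for illustration, and a real warehouse would be a
separate system rather than the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        year INTEGER,
        region TEXT,
        amount REAL
    );
    INSERT INTO sales (year, region, amount) VALUES
        (2022, 'North', 100.0), (2022, 'South', 150.0),
        (2023, 'North', 120.0), (2023, 'South', 180.0);
""")

# OLTP-style: fetch one specific transaction by its key.
one_sale = conn.execute(
    "SELECT region, amount FROM sales WHERE id = ?", (3,)
).fetchone()

# OLAP-style: aggregate over all the historical rows.
by_year = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
```

The OLTP query touches a single row, while the OLAP query scans and summarizes the whole
history, which is the access pattern a data warehouse is built for.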
1. Subject Oriented
A data warehouse is subject-oriented. It provides useful data about a subject instead of the
company's ongoing operations, and these subjects can be customers, suppliers, marketing,
product, promotion, etc. A data warehouse usually focuses on modeling and analysis of data
that helps the business organization to make data-driven decisions.
2. Time-Variant:
The different data present in the data warehouse provides information for a specific period.
3. Integrated
A data warehouse is built by joining data from heterogeneous sources, such as relational
databases, flat files, etc.
4. Non-Volatile
Data Mining
Data mining refers to the analysis of data. It is the computer-supported process of analyzing
huge sets of data that have either been compiled by computer systems or have been
downloaded into the computer. In the data mining process, the computer analyzes the data
and extracts useful information from it. It looks for hidden patterns within the data set.
Data mining aims to enable business organizations to view business behaviors, trends and
relationships that allow the business to make data-driven decisions. It is also known as
Knowledge Discovery in Databases (KDD). Data mining tools utilize AI, statistics, databases,
and machine learning systems to discover the relationships between the data. Data mining
tools can answer business-related questions that were traditionally too time-consuming to
resolve.
i. Market Analysis:
Data mining can predict the market, which helps the business to make decisions. For
example, it predicts who is keen to purchase what type of products.
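A very small instance of this kind of analysis is counting which products are bought
together. The sketch below is only an illustration: the baskets are invented, the name
support is an assumption, and real data mining tools use far more elaborate methods than
simple pair counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of products per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def support(pair):
    """Fraction of baskets that contain both products of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)

top_pair, top_count = pair_counts.most_common(1)[0]
```

Here the pair bread and butter appears in three of the four baskets, so a shop could use
that pattern when deciding which products to promote together.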
Data Mining methods can help to find which cellular phone calls, insurance claims, credit, or
debit card purchases are going to be fraudulent.
Analyzing the current existing trend in the marketplace is a strategic benefit because it helps
in cost reduction and manufacturing process as per market demand.
Data Mining versus Data Warehousing
1. Data mining: Data mining is the process of determining data patterns.
   Data warehousing: A data warehouse is a database system designed for analytics.
2. Data mining: Business entrepreneurs carry out data mining with the help of engineers.
   Data warehousing: Data warehousing is carried out entirely by the engineers.
3. Data mining: Data mining uses pattern recognition techniques to identify patterns.
   Data warehousing: Data warehousing is the process of extracting and storing data in a way
   that allows easier reporting.
4. Data mining: One of the most useful data mining techniques is the detection and
   identification of the unwanted errors that occur in the system.
   Data warehousing: One of the advantages of the data warehouse is its ability to update
   frequently. That is why it is ideal for business entrepreneurs who want to stay up to
   date.
5. Data mining: Companies can benefit from this analytical tool by equipping themselves with
   suitable and accessible knowledge-based data.
   Data warehousing: A data warehouse stores a huge amount of historical data that helps
   users to analyze different periods and trends to make future predictions.