Unit 5

CS3492

Uploaded by Jenifus
© All Rights Reserved

Unit V

Advanced Topics
5.1 Distributed Databases
Definition of distributed databases:
• A distributed database system consists of loosely coupled sites (computers) that share no physical components; each site is associated with a database system.
• The software that maintains and manages the working of distributed databases is called a distributed database management system (DDBMS).
• The database system that runs on each site is independent of the others. Refer Fig. 5.1.1.

The transactions can access data at one or more sites.


Advantages of distributed database system
(1) There is fast data processing as several sites participate in request processing.
(2) Reliability and availability of this system are high.
(3) It has reduced operating cost.
(4) It is easier to expand the system by adding more sites.
(5) It has improved sharing ability and local autonomy.
Disadvantages of distributed database system
(1) The system becomes complex to manage and control.
(2) The security issues must be carefully managed.
(3) The system requires deadlock handling during transaction processing; otherwise the entire system may end up in an inconsistent state.
(4) Some standardization is needed for processing in a distributed database system.
Difference between distributed DBMS and centralized DBMS
Uses of distributed system:
(1) Distributed databases are often used by organizations that have numerous offices in different geographical locations. Typically, an individual branch interacts primarily with the data that pertain to its own operations, with a much less frequent need for general company data. In such situations, distributed systems are useful.
(2) Using distributed system, one can give permissions to single sections of the overall database, for better
internal and external protection.
(3) If we need to add a new location to a business, it is simple to create an additional node within the database,
making distribution highly scalable.

Types of Distributed Databases


There are two types of distributed databases -
(1) Homogeneous databases
• Homogeneous databases are database systems in which all sites run identical software. Refer Fig. 5.1.2.

• In this system, all the sites are aware of the other sites present in the system and they all cooperate in
processing user's request.
• Each site present in the system, surrenders part of its autonomy in terms of right to change schemas or
software.
• The homogeneous database system appears as a single system to the user.
(2) Heterogeneous databases
• Heterogeneous databases are database systems in which different sites have different schemas or software. Refer Fig. 5.1.3.

• The participating sites are not aware of other sites present in the system.
• These sites provide limited facilities for cooperation in transaction processing.

5.1.1 Architecture
• Following is the architecture of a distributed database. In this architecture, a local database is maintained by each site.
• The sites are interconnected by a communication network.
When a user makes a request for particular data at site Si, it is first searched in the local database. If the data is not present in the local database, the request for that data is passed to all the other sites via the communication network. Each site then searches for that data in its local database. When the data is found at a particular site, say Sj, it is transmitted to site Si via the communication network.

5.1.2 Data Storage


There are two approaches to storing a relation r in a distributed database -

(1) Replication: The system maintains multiple copies of the data, stored at different sites, for faster retrieval and fault tolerance.
(2) Fragmentation: The relation is partitioned into several fragments stored at distinct sites.

Data Replication
• Concept: Data replication means storing a copy or replica of a relation or its fragments at two or more sites.
• There are two methods of data replication: (1) Full replication (2) Partial replication.
• Full replication: In this approach the entire relation is stored at all the sites. A fully redundant database is one in which every site contains a copy of the entire database.
• Partial replication: In this approach only some fragments of the relation are replicated at the sites.
Advantages:
(1) Availability: Data replication facilitates increased availability of data.
(2) Parallelism: Queries can be processed by several sites in parallel.
(3) Faster accessing: The relation r is locally available at each site, hence data accessing becomes faster.
Disadvantages:
(1) Increased cost of updates: The major disadvantage of data replication is the increased cost of updates. Whenever a user requests an update to relation r, every replica of r must be updated at all the sites.
(2) Increased complexity in concurrency control: It becomes complex to implement the concurrency
control mechanism for all the sites containing replica.

Data Fragmentation
• Concept: Data fragmentation is the division of a relation r into fragments r1, r2, r3, ..., rn which together contain sufficient information to reconstruct relation r.
• There are two approaches to data fragmentation - (1) Horizontal fragmentation and (2) Vertical fragmentation.
• Horizontal fragmentation: In this approach, each tuple of r is assigned to one or more fragments. If relation R is fragmented into r1 and r2, then to bring these fragments back to R we must use the union operation. That means R = r1 ∪ r2.
• Vertical fragmentation: In this approach, the relation r is fragmented based on one or more columns, with each fragment retaining the primary key so the original relation can be rebuilt. If relation R is fragmented into r1 and r2 using vertical fragmentation, then to bring these fragments back to the original relation R we must use the join operation. That means R = r1 ⋈ r2.
• For example - Consider the following relation r
Student(RollNo, Marks, City)
The values in this schema are inserted as

Horizontal Fragmentation 1:
SELECT * FROM Student WHERE Marks > 50 AND City = 'Pune'
We will get

Horizontal Fragmentation 2:
SELECT * FROM Student WHERE Marks > 50 AND City = 'Mumbai'
We will get

Vertical Fragmentation 1:
SELECT RollNo, Marks FROM Student

Vertical Fragmentation 2:
SELECT RollNo, City FROM Student
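As a sketch, the fragmentation and reconstruction above can be demonstrated with an in-memory SQLite table (the sample rows and marks here are illustrative assumptions, not taken from the figures):

```python
import sqlite3

# In-memory table standing in for the global relation; in a real distributed
# database the fragments below would be stored at different sites.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER, Marks INTEGER, City TEXT)")
rows = [(1, 70, 'Pune'), (2, 40, 'Pune'), (3, 85, 'Mumbai'), (4, 60, 'Mumbai')]
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", rows)

# Horizontal fragments: selections on the relation.
frag1 = conn.execute(
    "SELECT * FROM Student WHERE Marks > 50 AND City = 'Pune'").fetchall()
frag2 = conn.execute(
    "SELECT * FROM Student WHERE Marks > 50 AND City = 'Mumbai'").fetchall()
horizontal_union = frag1 + frag2          # reconstruction by union

# Vertical fragments: projections, each keeping the key RollNo.
v1 = conn.execute("SELECT RollNo, Marks FROM Student").fetchall()
v2 = conn.execute("SELECT RollNo, City FROM Student").fetchall()
city_of = dict(v2)
vertical_join = [(r, m, city_of[r]) for (r, m) in v1]   # reconstruction by join
```

Note that the vertical reconstruction works only because both fragments retain the key RollNo.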

5.1.3 Transaction Processing


Basic Concepts
In distributed system transaction initiated at one site can access or update data at other sites. Let us discuss
various basic concepts used during transaction processing in distributed systems -
• Local and global transactions:
A transaction Ti is said to be local if it is initiated at site Si and accesses or updates data at site Si only. A transaction Ti initiated at site Si is said to be global if it can access or update data at sites Si, Sj, Sk and so on.
• Coordinating and participating sites:
The site at which the transaction is initiated is called coordinating site. The participating sites are those sites at
which the sub-transactions are executing. For example - If site S1 initiates the transaction T1 then it is called
coordinating site. Now assume that transaction T1 (initiated at S1) can access site S2 and S3. Then sites S2
and S3 are called participating sites.
To access the data at site S2, transaction T1 needs another transaction T12 at site S2; similarly, to access the data at site S3, transaction T1 needs some transaction, say T13, at site S3. Transactions T12 and T13 are called sub-transactions. The scenario described above is represented in Fig. 5.1.6.

• Transaction manager:
The transaction manager manages the execution of those transactions (or sub-transactions) that access data stored at a local site. Its tasks are -
(1) Maintaining a log for recovery purposes.
(2) Participating in coordinating the concurrent execution of the transactions executing at that site.
• Transaction coordinator:
The transaction coordinator coordinates the execution of the various transactions (both local and global)
initiated at that site.
The tasks of Transaction coordinator are -
(1) Starting the execution of transactions that originate at the site.
(2) Distributing subtransactions at appropriate sites for execution
Let TC denotes the transaction coordinator and TM denotes the transaction manager, then the system
architecture can be represented as,

Failure Modes
There are four types of failure modes,
1. Failure of site
2. Loss of messages
3. Failure of communication link
4. Network partition
The most common type of failure in a distributed system is loss or corruption of messages. The system uses the Transmission Control Protocol (TCP) to handle such errors. TCP is a standard connection-oriented protocol that provides reliable, ordered delivery of messages from one end to the other.
• If two nodes are not directly connected, messages from one to another must be routed through sequence of
communication links. If the communication link fails, the messages are rerouted by alternative links.
• A network is partitioned if it has been split into two or more subsystems, called partitions, that cannot communicate with each other. Lack of connection between the subsystems also causes failure in a distributed system.

Commit Protocols
Two Phase Commit Protocol
• Atomicity is an important property of any transaction processing. What is this atomicity property? It means that either the transaction executes completely or it does not execute at all.
• The commit protocol ensures atomicity across the sites in the following ways -
i) A transaction which executes at multiple sites must either be committed at all the sites, or aborted at all the sites.
ii) It is not acceptable to have a transaction committed at one site and aborted at another.
• There are two types of important sites involving in this protocol -
• One Coordinating site
• One or more participating sites.
Two phase commit protocol
This protocol works in two phases - i) Voting phase and ii) Decision phase.
Phase 1: Obtaining decision or voting phase
Step 1: Coordinator site Ci asks all participants to prepare to commit transaction Ti.
• Ci adds the records <prepareT> to the log and writes the log to stable storage.
• It then sends prepare T messages to all participating sites at which T will get executed.

Step 2: Upon receiving the message, the transaction manager at a participating site determines whether it can commit the transaction.
• If not, it adds a record <no T> to the log and sends an abort T message to the coordinating site Ci.

• If the transaction can be committed, then :


• Add the record <ready T> to the log
• Force all records for T to stable storage
• Send ready T message to Ci.
Phase 2: Recording decision phase
• T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted.
• The coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto stable storage. Once the record reaches stable storage, the decision is irrevocable (even if failures occur).

• The coordinator sends a message to each participant informing it of the decision (commit or abort).
• Participants take appropriate action locally.
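A minimal in-process sketch of the two phases, assuming simplified Participant/two_phase_commit names and ignoring real networking, stable storage and timeouts:

```python
# Illustrative sketch of two-phase commit; a real system would force log
# records to stable storage and handle message loss and site failures.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []

    def prepare(self):
        # Phase 1: respond to <prepare T> with a vote.
        if self.can_commit:
            self.log.append("<ready T>")
            return "ready"
        self.log.append("<no T>")
        return "abort"

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision locally.
        self.log.append(f"<{decision} T>")

def two_phase_commit(coordinator_log, participants):
    votes = [p.prepare() for p in participants]          # voting phase
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append(f"<{decision} T>")            # decision record
    for p in participants:                               # decision phase
        p.finish(decision)
    return decision
```

A single "no" vote forces a global abort, which is exactly the all-or-nothing property the text describes.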

Failure of site
There are various cases at which failure may occur,
(1) Failure of participating sites
• If any of the participating sites gets failed then when participating site Si recovers, it examines the log entry
made by it to take the decision about executing transaction.
• If the log contains <commit T> record: participating site executes redo (T)
• If the log contains <abort T> record: participating site executes undo (T)
• If the log contains <ready T> record: participating site must consult Coordinating site to take decision
about execution of transaction T.
• If T committed, redo (T)
• If T aborted, undo (T)
• If the log of the participating site contains no record, it means Si failed before responding to the prepare T message from the coordinating site. In this case it must abort T.
(2) Failure of coordinator
• If coordinator fails while the commit protocol for T is executing then participating sites must take decision
about execution of transaction T:
i) If an active participating site contains a <commit T> record in its log, then T must be committed.
ii) If an active participating site contains an <abort T> record in its log, then T must be aborted.
iii) If some active participating site does not contain a <ready T> record in its log, then the failed coordinator
Ci cannot have decided to commit T. Can therefore abort T.
iv) If none of the above cases holds, then all active participating sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). In this case the active sites must wait for the coordinator site Ci to recover to learn the decision.
The two-phase commit protocol has a blocking problem.
What is the blocking problem?
It is a state in which active participating sites may have to wait for a failed coordinator site to recover.
The solution to this problem is to use the three-phase commit protocol.
Three Phase Commit Protocol
• The three-phase commit protocol is an extension of the two-phase commit protocol which eliminates the blocking problem.
• Various assumptions that are made for three phase commit protocol are -
• No network partitioning.
• At any point at least one site must be up.
• At the most k sites (participating as well as coordinating) can fail.
• Phase 1: This phase is similar to phase 1 of two phase protocol. That means Coordinator site Ci asks all
participants to prepare to commit transaction Ti. The coordinator then makes the decision about commit or
abort based on the response from all the participating sites.
• Phase 2: In phase 2 coordinator makes a decision as in 2 Phase Commit which is called the pre-commit
decision <Pre-commit, T>, and records it in multiple (at least K) participating sites.
• Phase 3: In phase 3, the coordinator sends a commit/abort message to all participating sites.
• Under the three-phase protocol, knowledge of the pre-commit decision can be used to commit despite coordinator site failure. That means if the coordinating site fails, one of the participating sites becomes the coordinating site and consults the other participating sites for the pre-commit message they possess. Using this pre-commit message, the decision about commit/abort is taken by the new coordinating site.
• This protocol avoids blocking problem as long as less than k sites fail.
Advantage of three phase commit protocol
(1) It avoids the blocking problem.
Disadvantage of three phase commit protocol
(1) The overhead is increased.

5.1.4 Query Processing and Optimization


Distributed database query is processed using following steps-
(1) Query Mapping:
• The input query on distributed data is specified using query language.
• This query language is then translated into algebraic query.
• During this translation the global conceptual schema is referred.
• During the translation some important actions such as normalization, analysis for semantic errors,
simplification are carried out then input query is restructured into algebraic query.
(2) Localization:
• In this step, the replication of information is handled.
• The distributed query is mapped on the global schema to separate queries on individual fragments.
(3) Global Query Optimization: Optimization means selecting, from a list of candidate query strategies, the one that is closest to optimal. For optimization, the cost is computed. The total cost is a combination of CPU cost, I/O cost and communication cost.
(4) Local Query Optimization: This step is common to all sites in distributed database. The techniques of
local query optimization are similar to those used in centralized systems.
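A toy sketch of the global optimization step: given hypothetical candidate strategies with made-up cost figures, pick the one minimizing total cost (CPU + I/O + communication):

```python
# Illustrative only: plan names and cost numbers are invented to show how a
# global optimizer compares candidate strategies by combined cost.
candidate_plans = {
    "ship_whole_relation":  {"cpu": 5,  "io": 20, "comm": 100},
    "semijoin_reduction":   {"cpu": 15, "io": 25, "comm": 30},
    "execute_at_data_site": {"cpu": 10, "io": 40, "comm": 10},
}

def total_cost(plan):
    # Total cost = CPU cost + I/O cost + communication cost.
    return plan["cpu"] + plan["io"] + plan["comm"]

best = min(candidate_plans, key=lambda name: total_cost(candidate_plans[name]))
```

With these illustrative figures, communication cost dominates, so the plan that moves the least data over the network wins.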
5.2 NOSQL Databases

5.2.1 Introduction
• NoSQL stands for "Not Only SQL".
• It is a non-tabular database system that stores data differently than relational tables. There are various types of NoSQL databases, such as document, key-value, wide-column and graph.
• Using NoSQL we can maintain flexible schemas, and these databases scale easily with large amounts of data.

Need
The NoSQL database technology is usually adopted for the following reasons -
1) The NoSQL databases are often used for handling big data as a part of fundamental architecture.
2) The NoSQL databases are used for storing and modelling structured, semi-structured and unstructured data.
3) For the efficient execution of database with high availability, NoSQL is used.
4) The NoSQL database is non-relational, so it scales out better than relational databases and these can be
designed with web applications.
5) For easy scalability, the NoSQL is used.

Features
1) NoSQL does not follow the relational model.
2) It is either schema-free or has a relaxed schema. That means it does not require a rigid schema definition.
3) Multiple NoSQL databases can be executed in a distributed fashion.
4) It can process both unstructured and semi-structured data.
5) NoSQL databases have higher scalability.
6) It is cost effective.
7) It supports data in the form of key-value pairs, wide columns and graphs.

Comparison between RDBMS and NoSQL

5.2.2 CAP Theorem


• The CAP theorem is also called Brewer's theorem.
• The CAP theorem comprises three components (hence its name) as they relate to distributed data stores:
• Consistency: Every read receives the most recent write or an error.
• Availability: Every request receives a response, but it might not contain the most recent write.
• Partition tolerance: The system continues to operate despite network failures (i.e., dropped messages, slow network connections, or unavailable network connections between nodes).
• The CAP theorem states that it is not possible to guarantee all three of the desirable properties - consistency, availability and partition tolerance - at the same time in a distributed system with data replication.
5.2.3 Types of NoSQL Database
There are four types of NoSQL databases and those are -
1. Key-value store
2. Document store
3. Graph based
4. Wide column store
Let us discuss them in detail.

Key-Value Store
• Key-value pair is the simplest type of NoSQL database.
• It is designed in such a way to handle lots of data and heavy load.
• In the key-value storage the key is unique and the value can be JSON, string or binary objects.
• For example -
{"Customer": [
    {"id": 1, "name": "Ankita"},
    {"id": 2, "name": "Kavita"}
]}
Here id and name are the keys, and 1, 2, "Ankita", "Kavita" are the values corresponding to those keys.
Key-value stores help the developer store schema-less data. They work best for content such as shopping carts.
DynamoDB, Riak and Redis are some famous examples of key-value stores.
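As a rough sketch of the key-value model (a stand-in for stores such as Redis or DynamoDB, not their actual APIs), values are opaque blobs addressed only by a unique key:

```python
import json

# Minimal key-value store sketch: the store only understands get/put/delete
# by unique key; the value itself (here a JSON blob) is opaque to the store.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = json.dumps(value)      # store value as a JSON string

    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("cart:1", {"items": [{"id": 1, "name": "Ankita"}]})
```

There is no query language over value contents here; that absence of structure is what makes key-value stores simple and fast under heavy load.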

Document Based Systems


• A document store makes use of key-value pairs to store and retrieve data.
• Documents are stored in formats such as XML and JSON.
• The document stores appear the most natural among NoSQL database types.
• It is most commonly used due to flexibility and ability to query on any field.
• For example -
{
"id": 101,
"Name": "AAA",
"City" : "Pune"
}
MongoDB and CouchDB are two popular document oriented NoSQL database.

Column Based Systems


• The column store model is similar to a traditional relational database, except that the columns are created for each row rather than being predefined by the table structure.
• In this model the number of columns is not fixed for each record.
• Columns databases can quickly aggregate the value of a given column.
• For example -

Column store databases are widely used for data warehouses and business intelligence. HBase and Cassandra are examples of column-based databases.

Graph Databases
The graph database is typically used in applications where the relationships among the data elements are an important aspect.
The connections between elements are called links or relationships. In a graph database, connections are first-
class elements of the database, stored directly. In relational databases, links are implied, using data to express
the relationships.
The graph database has two components -
1) Node: The entities themselves. For example - people, students.
2) Edge: The relationships among the entities.
For example -

Graph databases are mostly used for social networks, logistics and spatial data. Examples of graph databases are Neo4j, InfiniteGraph and OrientDB.
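A minimal sketch of the graph model, storing relationships directly as first-class data (class and relation names here are illustrative):

```python
# Tiny property-graph sketch: edges are stored directly, so traversing a
# relationship is a lookup rather than a relational join.
class Graph:
    def __init__(self):
        self.nodes = {}        # node id -> properties
        self.edges = []        # (source, relation, target) triples

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def neighbors(self, node_id, relation):
        # All targets reachable from node_id via the given relation.
        return [t for (s, r, t) in self.edges if s == node_id and r == relation]

g = Graph()
g.add_node("ankita", kind="Person")
g.add_node("kavita", kind="Person")
g.add_edge("ankita", "FRIEND_OF", "kavita")
```

In a relational database the FRIEND_OF link would be implied by matching foreign-key values; here it is a stored element of the database itself.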
5.3 Database Security
• Definition: Database security is a technique that deals with the protection of data against unauthorized access and modification.
• Database security is an important aspect for any database management system as it deals with sensitivity of
data and information of enterprise.
• Database security allows or disallows users from performing actions on the database objects.
5.3.1 Security Issues
Types of Security
Database security addresses following issues -
(1) Legal Issues: There are many legal or ethical issues with respect to right to access information. For
example - If some sensitive information is present in the database, then it must not be accessed by
unauthorized person.
(2) Policy Issues: There are government or organizational policies that tell us what kind of information should be made publicly accessible.
(3) System Issues: Under this issue, it is decided whether security function should be handled at hardware
level or at operating system level or at database level.
(4) Data and User Level Issues: In many organizations, multiple security levels are identified to categorize
data and users based on these classifications. The security policy of organization must understand these levels
for permitting access to different levels of users.

Threats to Database
Threats to database will result in loss or degradation of data. There are three kinds of loss that occur due to
threats to database
(1) Loss of Integrity:
• Database integrity means information must be protected from improper modification.
• Modification to database can be performed by inserting, deleting or modifying the data.
• Integrity is lost if unauthorized changes are made to data intentionally or accidently.
• If compromised data integrity is not corrected and work continues, it results in inaccuracy, fraud, or erroneous decisions.
(2) Loss of Availability:
• Database availability means making the database objects available to authorized
users.
(3) Loss of Confidentiality:
• Confidentiality means protection of data from unauthorized disclosure of information.
• The loss of confidentiality results in loss of public confidence, or embarrassment or some legal action against
organization.

Control Measures
• There are four major control measures used to provide security on data in database.
1. Access control
2. Inference control
3. Flow control
4. Data encryption
• Access Control: The most common security problem is unauthorized access to a computer system. Generally this access aims at obtaining information or making malicious changes in the database. The security mechanism of a DBMS must include provisions for restricting access to the database system as a whole. This function is called access control.
• Inference Control: This measure addresses the security problems of statistical databases. Statistical databases provide statistical information based on some criteria; they may contain information about a particular age group, income level, education level and so on. Access to sensitive individual information must be prevented while the statistical databases are in use. The corresponding measure, called inference control, prevents the user from completing any such inference channel.
• Flow Control: It is a kind of control measure which prevents information from flowing in such a way that it
reaches unauthorized users. Channels that are pathways for information to flow implicitly in ways that violate
the security policy of an organization are called covert channels.
• Data Encryption: The data encryption is a control measure used to secure the sensitive data. In this
technique, the data is encoded using some coding algorithm. An unauthorized user who accesses encoded data
will have difficulty deciphering it, but authorized users are given decoding or decrypting algorithms (or keys)
to decipher the data.

Database Security and DBA


• DBA stands for Database Administrator, who is the central authority for managing the database system.
• DBA is responsible for granting privileges to users who want to use the database system.
• The DBA has a DBA account in the DBMS which is sometimes called as system or superuser account. It
provides powerful capabilities that are not made available for regular database accounts and users.
• DBA makes use of some special commands that perform following type of actions –
1. Account Creation: This command helps in creating a new account and password for a single user or for
group of users.
2. Privilege Granting: This command allows the DBA to grant privileges to certain accounts.
3. Privilege Revocation: This command allows the DBA to cancel the privileges to certain accounts.
4. Security Level Assignment: This action assigns user account to the appropriate security clearance level.
Thus the DBA is responsible for the overall security of the database system.
5.3.2 Access Control Based on Privileges or Discretionary Access Control
• Discretionary Access Control (DAC) is an access control mechanism based on privileges.
• Types of discretionary privileges: The DBMS must provide selective access to each relation in the database
on specific accounts. This selective access is known as privileges. There are two levels for assigning
privileges for using Database systems and these are -
• The account level: At this level, the DBA specifies the particular privileges that each account holds
independently of the relations in the database.
• Relation(or table) level: At this level, the DBA can control the privilege to access each individual relation
or view in the database.
• For granting privileges, the access control mechanism follows an authorization model for discretionary privileges known as the access matrix model.
• The access matrix is a table with rows and columns. It defines the access permissions.
• The rows of a matrix M represent subjects (users, accounts, programs).
• The columns represent objects (relations, records, columns, views, operations).
• Each position M(i, j) in the matrix represents the types of privileges (read, write, update) that subject i holds on object j.
• For example -

• Discretionary Access Control allows each user or subject to control access to their own data.
• In DAC, owner of resource restricts access to the resources based on the identity of users.
• DAC is typically the default access control mechanism for most desktop operating systems.
• Each resource object on a DAC-based system has an Access Control List (ACL) associated with it.
• An ACL contains a list of users and groups to which the owner has permitted access, together with the level of access for each user or group.
• For example - An ACL is an object-centered description of access rights, as follows -
test1.doc: {Prajkta: read}
test2.exe: {Ankita: execute}, {Prajkta: execute}
test3.com: {Ankita: execute, read}, {Prajkta: execute, read, write}
• Object access is determined during Access Control List (ACL) authorization and based on user identification
and/or group membership.
• Under DAC a user can only set access permissions for resources which they already own.
• Similarly, a hypothetical user A cannot change the access control for a file that is owned by user B. User A can, however, set access permissions on a file that he/she owns.
• User may transfer object ownership to another user(s).
• User may determine the access type of other users.
• The DAC is easy to implement access control model.
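The ACL example above can be sketched as a simple authorization check (names and permissions are the illustrative ones from the example):

```python
# Sketch of ACL-based authorization under DAC: each object carries a list of
# (user -> permissions) entries set by its owner.
acls = {
    "test1.doc": {"Prajkta": {"read"}},
    "test2.exe": {"Ankita": {"execute"}, "Prajkta": {"execute"}},
    "test3.com": {"Ankita": {"execute", "read"},
                  "Prajkta": {"execute", "read", "write"}},
}

def is_allowed(user, obj, action):
    # Access is granted only if the object's ACL lists the user with the action.
    return action in acls.get(obj, {}).get(user, set())
```

Because the ACL is attached to the object, changing who may read test1.doc is a local edit by its owner; no central policy needs to change, which is both DAC's flexibility and its information-flow weakness.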
Advantages:
(1) It is flexible.
(2) It has simple and efficient access right management.
(3) It is scalable. That means we can add more users without any complexity.
Disadvantages:
(1) It increases the risk that data will be made accessible to users that should not necessarily be given access.
(2) There is no control over information flow as one user can transfer ownership to another user.
5.3.3 Role Based Access Control
• It is based on the concept that privileges and other permissions are associated with organizational roles, rather than individual users. Individual users are then assigned to appropriate roles.
• For example, an accountant in a company will be assigned to the Accountant role, gaining access to all the
resources permitted for all accountants on the system. Similarly, a software engineer might be assigned to the
Developer role.
• In an RBAC system, the roles are centrally managed by the administrator. The administrators determine
what roles exist within their companies and then map these roles to job functions and tasks.
• Roles can effectively be implemented using security groups. The security groups are created representing
each role. Then permissions and rights are assigned to these groups. Next, simply add the appropriate users to
the appropriate security groups, depending on their roles or job functions.
• A user can have more than one role. And more than one user can have the same role.
• Role hierarchies can be used to match natural relations between roles. For example - A lecturer can create a
role student and give it a privilege "read course material".
• Role Based Access Control (RBAC) is also known as non-discretionary access control.
• RBAC security strategy is widely used by most organizations for deployment of commercial and off-the-
shelf products.
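A minimal sketch of RBAC, assuming hypothetical role and user names: permissions attach to roles, and users acquire them only through role membership:

```python
# Illustrative RBAC sketch: the administrator centrally defines roles and
# their permissions; users are mapped to roles, never to permissions directly.
role_permissions = {
    "Accountant": {"read_ledger", "post_entry"},
    "Developer":  {"read_code", "commit_code"},
    "Student":    {"read_course_material"},
}
user_roles = {
    "ankita": {"Accountant"},
    "kavita": {"Developer", "Student"},   # a user can hold more than one role
}

def has_permission(user, permission):
    # A user is authorized if any of their roles grants the permission.
    return any(permission in role_permissions.get(role, set())
               for role in user_roles.get(user, set()))
```

Reassigning kavita from Developer to Accountant is a one-line change to user_roles; none of the permission sets need to be touched, which is the maintenance advantage the text describes.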
Advantages:
(1) The security is more easily maintained by limiting unnecessary access to sensitive information based on
each user's established role within the organization.
(2) All the roles can be aligned with the organizational structure of the business and users can do their jobs
more efficiently and autonomously.
Disadvantages:
(1) It is necessary to understand each user's functionality in depth so that roles can be properly assigned.
(2) If roles are not assigned properly, inappropriate access rights create severe security problems for the database system.
5.3.4 SQL Injection
• SQL injection is a type of code injection technique that can destroy databases.
• In this technique, malicious SQL code is placed via web page input. Such statements can control a database server behind a web application.
• Attackers can use SQL injection vulnerabilities to bypass application security measures. They can go around
authentication and authorization of a web page or web application and retrieve the content of the entire SQL
database. They can also use SQL injection to add, modify and delete records in the database.
• An SQL injection vulnerability may affect any website or web application that uses an SQL database such as
MySQL, Oracle, SQL Server or others.
How SQL Injection Works?
• To carry out an SQL injection attack, an attacker must first find vulnerable user inputs within the web page or web application. A web page or web application that has an SQL injection vulnerability uses such user input directly in an SQL query. The attacker can craft input content. Such content is often called a malicious payload and is the key part of the attack. After the attacker sends this content, malicious SQL commands are executed in the database.
• SQL is a query language that was designed to manage data stored in relational databases. It can be used to access, modify and delete data. Many web applications and websites store all their data in SQL databases. In some cases, SQL commands can even be used to run operating system commands. Therefore, a successful SQL injection attack can have very serious consequences.
Example of SQL Injection
• Following is an example of SQL injection vulnerability works around a simple rid web application having
two input fields - One for user name and another for password.
• This example has a table named users with the columns username and password
uname = request.POST['username']
passwd = request.POST['password']
query = "SELECT id FROM users WHERE username='" + uname + "' AND password='" + passwd + "'"
database.execute(query)
• Here both input fields - the user name and the password - are vulnerable to SQL injection.
• An attacker can use these fields to alter the SQL query and gain access to the database.
• They could use a trick on the password field by appending the statement
OR 1 = 1
to it.
• As a result the query would become (assuming the username 'user1' and password 'password'):
• SELECT id FROM users WHERE username='user1' AND password='password' OR 1 = 1
• Because of the OR 1 = 1 condition, the WHERE clause matches every row, so the query returns the first id from the users table no matter what the username and password are. Even if we enter a wrong username or password, the query still succeeds because the OR 1 = 1 part always evaluates to true.
• The first id returned by this query usually belongs to the administrator. In this way, the attacker not only bypasses authentication but also gains administrator privileges.
How to prevent SQL injection?
• The most important defense is to validate every input field.
• Another method is to use parameterized queries, also known as prepared statements. With prepared statements, the application code never uses the input directly in the SQL text.
• Web Application Firewalls (WAF) can also be used to filter out malicious SQL.
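The contrast between the two approaches can be sketched with Python's built-in sqlite3 module (a hedged illustration: the users table, credentials and payload are invented for this example):

```python
import sqlite3

# Toy in-memory database with an invented users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'admin', 'secret')")

uname = "user1"
passwd = "password' OR 1=1 --"          # malicious payload in the password field

# Vulnerable: user input is concatenated directly into the SQL string,
# so the injected OR 1=1 condition makes the WHERE clause true for every row.
query = ("SELECT id FROM users WHERE username='" + uname +
         "' AND password='" + passwd + "'")
vulnerable_rows = conn.execute(query).fetchall()   # [(1,)] - login bypassed!

# Safe: a parameterized query (prepared statement) treats the whole
# input as data, never as SQL, so the injection attempt finds no rows.
safe_rows = conn.execute(
    "SELECT id FROM users WHERE username=? AND password=?",
    (uname, passwd)).fetchall()                    # []
```

Note that the safe version passes the inputs as a separate tuple; the database driver handles quoting, so the payload can never change the structure of the query.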
5.3.5 Statistical Database Security
• Statistical databases contain statistical data about various populations.
• A population is a set of tuples of a table that satisfy some selection criteria.
• The statistical database may contain confidential data about individuals. For example, the databases maintained by government agencies are statistical databases. Statistical database security helps protect the sensitive information present in a statistical database from user access.
• Users are permitted to access only some portion of the statistical database; they cannot access the complete database. For instance, in an employees database, an ordinary user is not permitted to access information about an individual employee's salary.
• Statistical aggregate functions such as COUNT, MAX, MIN, AVERAGE and STANDARD DEVIATION are used in these queries, which are called statistical queries.
• The possibility of inferring individual information from statistical queries is reduced in the following ways:
• No statistical query is permitted whenever the number of tuples in the population specified by the selection
condition falls below some threshold.
• Prohibit sequence of queries that refer repeatedly to same population of tuples.
• Partition the database. That means records are stored in groups of some minimum size. The query can refer
to any complete group but never to subsets of records within a group.
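The threshold rule above can be sketched as follows (a Python illustration; the employees table, departments and threshold value are invented for this example, and a real statistical database would enforce the check inside the query processor):

```python
# Minimum population size below which statistical queries are refused.
THRESHOLD = 5

employees = [
    {"dept": "sales", "salary": 40000},
    {"dept": "sales", "salary": 45000},
    {"dept": "sales", "salary": 50000},
    {"dept": "sales", "salary": 42000},
    {"dept": "sales", "salary": 48000},
    {"dept": "hr",    "salary": 60000},   # population of one: identifiable!
]

def avg_salary(dept):
    # Select the population of tuples matching the selection condition.
    population = [e["salary"] for e in employees if e["dept"] == dept]
    # Refuse the query if the population is too small to hide individuals.
    if len(population) < THRESHOLD:
        raise PermissionError("population below threshold; query refused")
    return sum(population) / len(population)

print(avg_salary("sales"))    # allowed: 5 tuples in the population
# avg_salary("hr") would raise PermissionError - it would reveal one salary
```

Without the threshold check, AVERAGE over the one-person "hr" population would disclose that individual's exact salary.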
5.3.6 Flow Control
• Flow control is a mechanism that regulates the flow of information among accessible objects.
• A flow between two objects obj1 and obj2 occurs when a program reads values from obj1 and writes values to obj2.
• Flow control checks that information contained in one object does not get transferred to a less protected object.
• The flow policy specifies the channels along which the information is allowed to move.
• A simple flow policy specifies two classes of information - confidential (C) and non-confidential (N). Under this policy, the only flow that is not allowed is from the confidential class to the non-confidential class.
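The two-class policy can be sketched in a few lines (a Python illustration; the class labels and function name are invented for this example):

```python
# Two information classes of the simple flow policy.
CONFIDENTIAL, NON_CONFIDENTIAL = "C", "N"

def flow_allowed(src_class: str, dst_class: str) -> bool:
    # The only forbidden flow is confidential -> non-confidential,
    # since that would leak protected information.
    return not (src_class == CONFIDENTIAL and dst_class == NON_CONFIDENTIAL)

assert flow_allowed(NON_CONFIDENTIAL, CONFIDENTIAL)      # upgrading is fine
assert flow_allowed(CONFIDENTIAL, CONFIDENTIAL)          # same class is fine
assert not flow_allowed(CONFIDENTIAL, NON_CONFIDENTIAL)  # would leak data
```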

Covert Channel
• A covert channel is a type of attack that creates a capability to transfer information between processes that are not supposed to be allowed to communicate.
• A covert channel violates the security policy.
• A covert channel allows information to pass from a higher classification level to a lower classification level through improper means.
• Security experts believe that one way to avoid covert channels is for programmers not to have access to sensitive data.
5.3.7 Encryption and Public Key Infrastructures
Cryptology is the technique of encoding and decoding messages so that they cannot be understood by anybody except the sender and the intended recipient.
There are various encoding and decoding schemes, which are called encryption schemes. The sender and recipient of the message decide on an encoding and decoding scheme and use it for communication.
The process of encoding messages is known as encryption. The original text sent by the sender is called plaintext, and the encrypted form of the plaintext is called ciphertext. This encrypted text travels through the network. When it reaches the receiving computer, the recipient decodes the message to extract the correct meaning out of it. This process is called decryption.

The sender applies the encryption algorithm and the recipient applies the decryption algorithm. Both the sender and the receiver must agree on these algorithms for any meaningful communication. An algorithm basically takes one text as input and produces another as the output; it therefore contains the intelligence for transforming the message.
For example, suppose we want to send a message through e-mail and we wish that nobody except the intended friend should be able to understand it. The message can then be encoded using some intelligence. For example, the alphabet A to Z can be encoded as follows:

Plain : A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher: Z Y X A B C W V U D E F T S R G H I Q P O J K L N M

That is, the last three letters are placed in reverse order, then the first three letters in straight order, and so on alternately until all the letters A to Z are encoded. Now if we write the message
"SEND SOME MONEY"
it will be
QBSA QRTB TRSBN
This coded message is called cipher text.
There are a variety of coding methods that can be used.

Types of Cryptography
There are two types of encryption schemes, based on the keys used for encryption and decryption.
1. Symmetric key encryption: It is also known as secret key encryption. In this method, only one key is used.
The same key is shared by sender and receiver for encryption and decryption of messages. Hence both parties
must agree upon the key before any transmission begins and nobody else should know about it. At the sender's
end, the key is used to change the original message into an encoded form. At the receiver's end using the same
key the encoded message is decrypted and original message is obtained. Data Encryption Standard (DES) uses
this approach. The problem with this approach is that of key agreement and distribution.
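The symmetric idea - one shared key for both directions - can be illustrated with a toy XOR scheme (XOR is used only for demonstration; it is not a secure cipher, and real systems use algorithms such as DES or its successor AES):

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte of the data with the repeating key.
    # Applying the SAME key twice returns the original data,
    # which is exactly the symmetric-key property.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

shared_key = b"k3y"                       # agreed upon by sender and receiver
ciphertext = xor_cipher(b"SEND SOME MONEY", shared_key)   # sender encrypts
plaintext  = xor_cipher(ciphertext, shared_key)           # receiver decrypts
assert plaintext == b"SEND SOME MONEY"
```

The sketch also makes the key-distribution problem visible: both parties must somehow obtain `shared_key` securely before communicating.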
2. Asymmetric key encryption: It is also known as public key encryption. In this method, different keys are
used. One key is used for encryption and other key must be used for decryption. No other key can decrypt the
message-not even the original key used for encryption.
One of the two keys is known as the public key and the other is the private key. Suppose there are two users, X and Y:
• If X wants to send a message to Y, X will convey its public key to Y, but the private key of X will be known to X only.
• Likewise, only Y knows Y's private key, while X knows Y's public key.
When X and Y want to communicate:
1. If X wants to send a message to Y, then first of all X encrypts the message using Y's public key. For that purpose it is necessary that X knows Y's public key.
2. X then sends this encrypted message to Y.
3. Now using Y's private key, Y decrypts X's message. Note that only Y knows his private key. It is not
possible for Y to decrypt the message using X's public key.
4. When Y wants to send a message to X then using X's public key Y will encrypt the message and will send
the encrypted message to X. On the other hand, X will use its own private key to decrypt this message. Here
again Y will not know the private key of X.
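The steps above can be illustrated with textbook RSA arithmetic (a toy sketch; the primes here are far too small for real security and serve only to show that the public key encrypts while only the matching private key decrypts):

```python
# Y's key pair, using the classic small textbook numbers.
p, q = 61, 53
n = p * q            # 3233: the public modulus
e = 17               # public exponent: (e, n) is Y's PUBLIC key, given to X
d = 2753             # private exponent: (d, n) is Y's PRIVATE key, kept secret
# d is valid because (e * d) % lcm(p - 1, q - 1) == 1

message = 65                          # a small number standing in for text

ciphertext = pow(message, e, n)       # step 1: X encrypts with Y's public key
decrypted  = pow(ciphertext, d, n)    # step 3: Y decrypts with Y's private key
assert decrypted == message           # only d recovers the message
```

Note the asymmetry: knowing (e, n) is enough to encrypt, but without d the ciphertext cannot practically be reversed.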

Digital Signature
A digital signature is a mathematical scheme for demonstrating the authenticity of a digital message or
document. If the recipient gets a message with digital signature then he believes that the message was created
by a known sender.
Digital signatures are commonly used for software distribution, financial transactions, and in other cases
where it is important to detect forgery or tampering.
When X and Y want to communicate with each other:
1. X encrypts the original plaintext message into ciphertext by using Y's public key.
2. Then X executes an algorithm on the original plaintext to calculate a message digest, also known as a hash. This algorithm takes the original plaintext in binary format and applies a hashing function; the output is a small string of binary digits. The hashing algorithm is public and anyone can use it. The most popular message digest algorithms are MD5 and SHA-1. X then encrypts the message digest using its own private key.
3. X now combines the ciphertext and its digital signature (i.e. the encrypted message digest), and the result is sent over the network to Y.
4. Y receives the ciphertext and X's digital signature. Y has to decrypt both of these. Y first decrypts
ciphertext back to plaintext. For this, it uses its own private key. Thus, Y gets the message itself in a secure
manner.
5. Now to ensure that the message has come from the intended sender Y takes X's digital signature and
decrypts it. This gives Y the message digest as was generated by X. The X had encrypted the message digest
to form a digital signature using its own private key. Therefore, Y uses X's public key for decrypting the
digital signature.
6. The hash algorithm used to generate the message digest is public, so Y can also use it.
7. Now there are two message digests, one created by X and the other by Y. Y simply compares the two message digests. If they match, Y can be sure that the message came indeed from X and not from someone else.
Thus, with a digital signature, confidentiality, authenticity and message integrity are all assured.
The other important feature supported by digital signature is non-repudiation. That is, a sender cannot refuse
having sent a message. Since the digital signature requires the private key of the sender, once a message is
digitally signed, it can be legally proven that the sender had indeed sent the message.
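The digest computation and comparison in steps 2 and 5-7 can be sketched with Python's hashlib (the message is invented; SHA-256 is shown because MD5 and SHA-1, though mentioned above, are now considered cryptographically broken):

```python
import hashlib

# Step 2 (sender X): compute a message digest of the plaintext.
message = b"transfer 100 dollars to account 42"
digest_sent = hashlib.sha256(message).hexdigest()

# Steps 5-7 (receiver Y): recompute the digest over the received message
# and compare it with the digest recovered from X's digital signature.
received = b"transfer 100 dollars to account 42"
digest_recv = hashlib.sha256(received).hexdigest()
assert digest_recv == digest_sent        # integrity verified: digests match

# Any tampering with the message changes the digest, so the check fails.
tampered = b"transfer 900 dollars to account 42"
assert hashlib.sha256(tampered).hexdigest() != digest_sent
```

In the full scheme the sent digest would travel encrypted under X's private key; this sketch shows only the hashing and comparison, which is what detects tampering.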
5.3.8 Challenges
Following are the challenges faced by the database security system -
(1) Data Quality
• The database community needs solutions for assessing the quality of data. The quality of data can be indicated by simple mechanisms such as quality stamps posted on web sites.
• The database community may also need more effective techniques of integrity semantic verification for assessing the quality of data.
• Application-level recovery techniques are also used to repair incorrect data.
(2) Intellectual Property Rights
• With the increasing use of the internet and intranets everywhere, there are greater chances of unauthorized duplication and distribution of content. Hence digital watermarking techniques are used to protect content from unauthorized access or ownership claims.
• However, research is needed to develop techniques for preventing intellectual property rights violations.
(3) Database Survivability
• It is desired that database systems continue to work even after information warfare attacks.
• The goal of information warfare attacker is to damage the organization's operation.
• Following are the corrective actions for handling this situation:
• Confinement: Take immediate action to eliminate the attacker's access to the system. Isolate the affected components to avoid further spread.
• Damage assessment: Determine the extent of the problem.
• Reconfiguration: Reconfiguration allows the system to remain in operation in a degraded mode while recovery is going on.
• Repair: Recover the corrupted or lost data by repairing or reinstalling the system.
• Fault treatment: Identify the weakness exploited in the attack and take steps to prevent a recurrence.
