Unit III - RDBMS

Relational database management system
UNIT - III  DATABASE DESIGN

4.1 INTRODUCTION

Database design can be defined as a collection of tasks or processes that enhance the designing, development, implementation, and maintenance of an enterprise data management system. A properly designed database reduces maintenance cost, improves data consistency, and makes cost-effective use of disk storage space. The designer should follow the constraints and decide how the elements correlate and what kind of data must be stored.

The main objectives of database design are to produce the logical and physical design models of the proposed database system. The logical model concentrates on the requirements of the data, considered independently of how the data will be physically stored. The physical design model translates the logical design model onto physical media, taking into account the hardware resources and software systems such as the Database Management System (DBMS).

4.2 DATABASE DESIGN PROCESS

The process of designing a database involves the following steps:

1. Determine the purpose of the database
2. Find and organize the information
3. Create tables for the information
4. Establish relationships between the tables
5. Refine your design

Fig 4.1 Steps involved in Database Design Process

(a) Determine the purpose of the database: Decide what the database will be used for, how it is expected to be used, and who we expect to use it. This will help us develop a mission statement and prepare for the remaining steps.

(b) Find and organize the information: Once the purpose of the database has been figured out, we need to gather the data that needs to be stored there. After the necessary information is gathered, we need to organize it. It is usually easiest to organize the information by breaking each piece into its smallest useful parts.

(c) Create tables for the information: Once the information is organized, we divide it into major entities or subjects. Each subject will then become a table; label each table with the subject it holds.

(d) Establish relationships between the tables: It can be hard to use a database with independent or unrelated tables. It is best to look at each individual table and decide how the data within it relates to the data in other tables. We can then add fields to the tables or create new ones to clarify the established relationships so everything is connected.

(e) Refine your design: One of the last database design steps is to take a step back and check whether we have "completed" the database. We need to scan it and analyze the design for any errors. Run the database with the tables and records to see if we get the results we want. Necessary adjustments and refinements are to be made to obtain the desired output.

OBJECTIVES OF DATABASE DESIGN

- The database supports both required and ad hoc information retrieval. The database must store the data necessary to support the information requirements defined during the design process and any possible ad hoc queries that may be posed by a user.
- The tables are constructed properly and efficiently. Each table in the database represents a single subject, is composed of relatively distinct fields, keeps redundant data to an absolute minimum, and is identified throughout the database by a field with unique values.
- Data integrity is imposed at the field, table, and relationship levels. These levels of integrity help guarantee that the data structures and their values will be valid and accurate at all times.
- The database supports business rules relevant to the organization.
The data must provide valid and accurate information that is always meaningful to the business.
- The database lends itself to future growth. The database structure should be easy to modify or expand as the information requirements of the business change and grow.
- The database should be flexible. The database should not be implemented in a rigid manner assuming that the business remains constant forever.
- The database should be efficient. The database design should make full and efficient use of the facilities provided; also, the users must be able to interact with the database without any time delay.

4.3 DATABASE DESIGN TOOLS

- Need for database design tools
- Desired features
- Advantages
- Disadvantages
- Commercial database design tools

4.3.1 NEED FOR DATABASE DESIGN TOOLS

Database design tools are used to automate the task of designing a business system. Many database design tools are available with a variety of features; the design tools are vendor specific.

- Database design tools increase overall productivity because the manual tasks are automated: less time is spent performing tedious tasks and more time is spent thinking about the actual design of the database.
- The quality of the end product is improved by using database design tools.

4.3.2 DESIRED FEATURES OF DATABASE DESIGN TOOLS

Various features of database design tools are as follows:

- The tool should capture the user needs.
- The capability to model the flow of data in an organization.
- The capability to model entities and their relationships.
- The capability to generate Data Definition Language (DDL) to create database objects.
- Support for the full database life cycle.
- The capability to generate reports for documentation and user feedback.

4.3.3 ADVANTAGES

The advantages of using database design tools are as follows:

- The amount of code to be written is reduced; as a result, the database design time is reduced.
- Chances of errors because of manual work are reduced.
- It is easy to convert the business model to a working database model.
- It is easy to ensure that all business requirements are met.
- A higher quality, more accurate product is produced.

4.3.4 DISADVANTAGES

Some of the disadvantages of database design tools are given below:

- More expense is involved for the tool itself.
- Developers may require special training to use the tool.

4.3.5 COMMERCIAL DATABASE DESIGN TOOLS

Various popular database design tools are as follows:

1. HeidiSQL: This free and open-source software is one of the most popular data modeling tools for MariaDB and MySQL worldwide.
2. Archi: An open-source conceptual and physical data modeling tool that uses the ArchiMate modeling language. This language supports the analysis and visualization of various complex database systems.
3. pgModeler: An open-source database modeler that supports multiple PostgreSQL databases.
4. MySQL Workbench: More than just a visual database design tool; it also integrates database administration, performance monitoring, and database migration.
5. ModelSphere: Open ModelSphere is an open-source UML modeling tool that supports all forms of data models - conceptual, logical, and physical. It allows for the conversion of models from one type to another.
6. Database Deployment Manager: Database Deployment Manager (DDM) is an open-source database design tool that allows users - typically programmers - to create models and diagrams. It is also database management software that enables users to create and maintain databases and create ER diagrams between tables.
7. DBDesigner: An online database modeling tool that allows users to design database schemas without writing any SQL code. Its simple and intuitive user interface has features that simplify the modeling process.

4.4 FUNCTIONAL DEPENDENCIES

- Introduction
- Redundancy and data anomaly
- Armstrong's axioms/properties
- Types of functional dependencies

4.4.1 INTRODUCTION

4.4.1.1 REDUNDANCY AND DATA ANOMALY

Redundancy means having multiple copies of the same data in the database. This problem arises when a database is not normalized. Suppose the attributes of a student details table are: student id, student name, college name, college rank, course opted.

As can be observed, the values of the attributes college name, college rank, and course are being repeated, which can lead to problems. The problems caused by redundancy are: insertion anomaly, deletion anomaly, and updation anomaly.

1. Insertion Anomaly: If a student's details have to be inserted but the student's course has not been decided yet, then insertion will not be possible until the course is decided. This problem happens when the insertion of a data record is not possible without adding some additional unrelated data to the record.

2. Deletion Anomaly: If the details of students in this table are deleted, then the details of the college will also get deleted. This anomaly happens when the deletion of a data record results in losing some unrelated information that was stored as part of the deleted record. It is not possible to delete some information without losing some other information in the table as well.

3. Updation Anomaly: Suppose the rank of the college changes; then the change will have to be made all over the database, which is time-consuming and computationally costly. If the updation does not occur at all places, the database will be in an inconsistent state.
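The updation anomaly above can be demonstrated concretely. Below is a minimal sketch using Python's sqlite3 module; the table and values are invented for illustration. Updating the college rank in only one of its redundant copies leaves the table in an inconsistent state.

```python
import sqlite3

# Hypothetical unnormalized table: the college rank is repeated per student.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT, "
             "college TEXT, college_rank INTEGER)")
conn.executemany("INSERT INTO student VALUES (?,?,?,?)",
                 [(1, "Asha", "ABC", 7), (2, "Ravi", "ABC", 7)])

# Updation anomaly: the rank changes, but only one copy is updated.
conn.execute("UPDATE student SET college_rank = 5 WHERE sid = 1")

ranks = {r[0] for r in conn.execute(
    "SELECT college_rank FROM student WHERE college = 'ABC'")}
print(ranks)  # two different ranks for one college: inconsistent state
```

Running the same update against a normalized design, where the rank is stored once in a college table, would make this inconsistency impossible by construction.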
4.4.1.2 WHAT IS FUNCTIONAL DEPENDENCY?

- Functional dependency in DBMS is a relationship between attributes of a table that are dependent on each other. It was introduced by E. F. Codd, and it helps in preventing data redundancy.
- A functional dependency is a constraint that specifies the relationship between two sets of attributes, where one set can accurately determine the value of the other set. It is denoted as X → Y, where X is a set of attributes capable of determining the value of Y. The attribute set on the left side of the arrow, X, is called the determinant, while the set on the right side, Y, is called the dependent.

The above suggests the following:

Functional dependency A → B
A - determinant set
B - dependent attribute (B is functionally dependent on A)

Example:

roll_no   name    dept_name   dept_building
UCS01     Teena   CO          A4
UCS02     Uma     IT          A3
UCS03     Siva    CO          A4
UCS04     Rex     IT          A3
UCS05     Vinu    EC          B2
UCS06     Meena   ME          B2

From the above table we can conclude some valid functional dependencies:

- roll_no → {name, dept_name, dept_building}: roll_no can determine the values of the fields name, dept_name and dept_building, hence this is a valid functional dependency.
- roll_no → dept_name: since roll_no can determine the whole set {name, dept_name, dept_building}, it can determine its subset dept_name also.
- dept_name → dept_building: dept_name identifies dept_building accurately, since each dept_name is associated with exactly one dept_building in this table.

More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.

Here are some invalid functional dependencies:

- name → dept_name: students with the same name can have different dept_names, hence this is not a valid functional dependency.
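Whether a functional dependency holds in a given table can be checked mechanically: X → Y holds if no two rows agree on X but disagree on Y. A small Python sketch follows; the sample rows loosely mirror the table above (with an extra duplicate name added to show an invalid dependency), and the helper name is ours.

```python
# Sample relation as a list of dicts; the last row introduces a duplicate
# name so that name -> dept_name fails, as discussed in the text.
rows = [
    {"roll_no": "UCS01", "name": "Teena", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": "UCS02", "name": "Uma",   "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": "UCS03", "name": "Siva",  "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": "UCS04", "name": "Teena", "dept_name": "IT", "dept_building": "A3"},
]

def holds(rows, X, Y):
    """X -> Y holds iff all rows that agree on X also agree on Y."""
    seen = {}
    for r in rows:
        xv = tuple(r[a] for a in X)
        yv = tuple(r[a] for a in Y)
        if seen.setdefault(xv, yv) != yv:   # same X, different Y: violation
            return False
    return True

print(holds(rows, ["roll_no"], ["name"]))             # True
print(holds(rows, ["name"], ["dept_name"]))           # False: two Teenas
print(holds(rows, ["dept_name"], ["dept_building"]))  # True
```

Note this only tests a dependency against one table instance; a real functional dependency is a constraint that must hold for every legal instance of the relation.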
- dept_building → dept_name: there can be multiple departments in the same building. In the above table, departments ME and EC are in the same building B2, hence dept_building → dept_name is an invalid functional dependency.

More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building → roll_no, etc.

Example: The following example makes functional dependency easier to understand. We have a table with two attributes, DeptId and DeptName.

DeptId - Department ID
DeptName - Department Name

DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute: if you want to know the department name, you first need to have the DeptId.

DeptId   DeptName
001      Finance
002      Marketing
003      HR

Therefore, the functional dependency between DeptId and DeptName can be stated as: DeptName is functionally dependent on DeptId.

DeptId → DeptName

4.4.2 ARMSTRONG'S AXIOMS/PROPERTIES OF FUNCTIONAL DEPENDENCIES

Armstrong's axioms were developed by William Armstrong in 1974 to reason about functional dependencies. The axioms give rules that always hold:

- Reflexivity: A → B, if B is a subset of A.
- Augmentation: if A → B, then AC → BC.
- Transitivity: if A → B and B → C, then A → C, i.e. a transitive relation.

4.4.2.1 Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule. For example, {roll_no, name} → name is valid.

4.4.2.2 Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule. For example, if {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building, dept_name} is also valid.

4.4.2.3 Transitivity: If X → Y and Y → Z are both valid dependencies, then X → Z is also valid by the transitivity rule.
For example, given roll_no → dept_name and dept_name → dept_building, roll_no → dept_building is also valid.

4.4.3 TYPES OF FUNCTIONAL DEPENDENCIES IN DBMS

- Trivial functional dependency
- Non-trivial functional dependency
- Multivalued functional dependency
- Transitive functional dependency

4.4.3.1 Trivial Functional Dependency

In a trivial functional dependency, the dependent is always a subset of the determinant, i.e. if X → Y and Y is a subset of X, then it is called a trivial functional dependency.

For example,

roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of the determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of a trivial functional dependency.

4.4.3.2 Non-trivial Functional Dependency

In a non-trivial functional dependency, the dependent is strictly not a subset of the determinant, i.e. if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.

For example, in the same table, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.

4.4.3.3 Multivalued Functional Dependency

In a multivalued functional dependency, the entities of the dependent set are not dependent on each other, i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.

For example,

roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18
45        abc    19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age are not dependent on each other (i.e. neither name → age nor age → name exists).
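Armstrong's axioms from 4.4.2 can be mechanized to compute the closure of an attribute set, i.e. everything the set determines. Below is a Python sketch; the FDs are taken from the roll_no example earlier, and the function name is ours.

```python
# Each FD is a (lhs, rhs) pair of attribute sets, from the roll_no example.
fds = [
    ({"roll_no"}, {"name", "dept_name"}),
    ({"dept_name"}, {"dept_building"}),
]

def closure(attrs, fds):
    """Attribute closure: start from attrs (reflexivity) and repeatedly
    apply every FD whose left side is already covered (transitivity,
    with augmentation implicit in the subset test)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

print(sorted(closure({"roll_no"}, fds)))
# ['dept_building', 'dept_name', 'name', 'roll_no']
```

Because the closure of {roll_no} covers every attribute of the relation, roll_no is a candidate key; this is exactly the test used in the normal-form discussion that follows.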
4.4.3.4 Transitive Functional Dependency

In a transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e. if a → b and b → c, then by the axiom of transitivity, a → c. This is a transitive functional dependency.

For example,

roll_no   name    dept_name   dept_building
UCS01     Teena   CO          A4
UCS02     Uma     IT          A3
UCS03     Siva    CO          A4
UCS04     Rex     IT          A3

Here, roll_no → dept_name and dept_name → dept_building. Hence, by the axiom of transitivity, roll_no → dept_building is a valid functional dependency. This is an indirect functional dependency, hence it is called a transitive functional dependency.

4.5 NORMALIZATION

- Introduction
- 1NF
- 2NF
- 3NF
- BCNF
- Denormalization

4.5.1 INTRODUCTION

A large database defined as a single relation may result in data duplication. This repetition of data may result in:

- Making relations very large.
- Difficulty in maintaining and updating data, as it would involve searching many records in the relation.
- Wastage and poor utilization of disk space and resources.
- An increased likelihood of errors and inconsistencies.

To handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, well-structured relations that satisfy desirable properties. Normalization is a process of decomposing relations into relations with fewer attributes.

Database normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.

Normalization is used mainly for two purposes:

- Eliminating redundant (useless) data.
- Ensuring data dependencies make sense, i.e. data is logically stored.
4.5.2 ADVANTAGES AND DISADVANTAGES OF NORMALIZATION

4.5.2.1 Advantages of Normalization

- Normalization helps to minimize data redundancy.
- Greater overall database organization.
- Data consistency within the database.
- A much more flexible database design.
- Enforces the concept of relational integrity.

4.5.2.2 Disadvantages of Normalization

- You cannot start building the database before knowing what the user needs.
- Performance degrades when normalizing the relations to higher normal forms, i.e. 4NF, 5NF.
- It is very time-consuming and difficult to normalize relations of a higher degree.
- Careless decomposition may lead to a bad database design, leading to serious problems.

4.5.3 TYPES OF NORMAL FORMS

Normalization works through a series of stages called normal forms. The normal forms apply to individual relations. A relation is said to be in a particular normal form if it satisfies the corresponding constraints.

Figure 4.2 Types of Normal Forms

Normal Form   Description
1NF           A relation is in 1NF if it contains only atomic values.
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF          A stronger definition of 3NF, known as Boyce-Codd normal form.
4NF           A relation is in 4NF if it is in BCNF and has no multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

4.5.3.1 First Normal Form (1NF)

- A relation is in 1NF if it contains only atomic valued attributes or columns.
- An attribute of a table cannot hold multiple values; it must hold only single values.
- Values stored in a column should be of the same domain.
- All the columns in a table should have unique names.
- First normal form disallows multi-valued attributes, composite attributes, and their combinations.

Example: The relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID   EMP_NAME   EMP_PHONE                EMP_STATE
14       John       7272826385, 9064738238   UP
20       Harry      8574783832               Bihar
12       Sam        7390372389, 8589830302   Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID   EMP_NAME   EMP_PHONE    EMP_STATE
14       John       7272826385   UP
14       John       9064738238   UP
20       Harry      8574783832   Bihar
12       Sam        7390372389   Punjab
12       Sam        8589830302   Punjab
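The 1NF decomposition above, splitting the multi-valued EMP_PHONE into one atomic value per row, can be sketched with Python's sqlite3 (table name and schema are ours):

```python
import sqlite3

# Unnormalized input: EMP_PHONE holds a comma-separated list of values.
raw = [
    (14, "John",  "7272826385,9064738238", "UP"),
    (20, "Harry", "8574783832",            "Bihar"),
    (12, "Sam",   "7390372389,8589830302", "Punjab"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_1nf (emp_id INT, emp_name TEXT, "
             "emp_phone TEXT, emp_state TEXT)")

# 1NF decomposition: emit one row per atomic phone number.
for emp_id, name, phones, state in raw:
    for phone in phones.split(","):
        conn.execute("INSERT INTO employee_1nf VALUES (?,?,?,?)",
                     (emp_id, name, phone, state))

n = conn.execute("SELECT COUNT(*) FROM employee_1nf").fetchone()[0]
print(n)  # 5 rows, matching the decomposed table shown above
```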
All the columns in a table should have unique names. . First normal form disallows the multi-valued attribute, composite attribute, and their combinations. Example: Relation EMPLOYEE is not in INF because of multi-valued attribute EMP_PHONE. EMPLOYEE table: | EMP_ID EMP_NAME EMP_PHONE EMP_STATE | 4 John 7272826385, uP 9064738238 | 20 Harry 8574783832 Bihar 12 Sam 7390372389, Punjab 8589830302 The decomposition of the EMPLOYEE table into INF has been shown below: EMP_ID EMP_NAME EMP_PHONE EMP_STATE I: tT . John 7272826385 UP | a a John 9064738238 uP C a Harry 8574783832 Bihar a Sam 7390372389 Punjab e Sam 8589830302 Punjab 418 Databasy Design 4.5.3.2 Second Normal Form QNEF) . In the 2NF, relational must be in INE . In the second normal form, all non-key attributes are fully functional dependent on the primary key Example: Let's assume, a school can store the data of teachers and the subjects they teach. Ina school. @ teacher can teach more than one subject. TEACHER table reacHert> | supsect | TEACHERAGE 3 Chemistry 30 | 435 Biology 30 7 oa English : 35 ‘| 8 Maths : 38 7 : | Computer 38 | In the given table, non-prime attribute TEACHER_AGE_ is dependent on TEACHER ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF.To convert the given table into 2NF, we decompose it into two tables: TEACHER DETAIL table: TEACHER ID TEACHER_AGE 25 30 + . _ 47 35 { 83 | 38 Relational Database Management System — 4.19 TEACHER_SUBJECT table: Be TEACHER_ID SUBJECT 25 Chemistry | 25 Biology = 47 English 83 Math : 83 Computer 4.5.3.3 Third Normal Form (3NF) A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency. . 3NF is used to reduce the data duplication. It is also used to achieve the data integrity. . If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form. 
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example: EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY
222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal

Super keys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.

Candidate key: {EMP_ID}

Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form. That is why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP table:

EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal

4.5.3.4 Boyce-Codd Normal Form (BCNF)

- BCNF is an advanced version of 3NF; it is stricter than 3NF.
- A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
- For BCNF, the table should be in 3NF, and for every FD, the left-hand side must be a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549

In the above table, the functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a key. To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID   EMP_COUNTRY
264      India
364      UK

EMP_DEPT table:

EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549

EMP_DEPT_MAPPING table:

EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies in the decomposed tables:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
- For the first table: EMP_ID
- For the second table: EMP_DEPT
- For the third table: {EMP_ID, EMP_DEPT}

Now this is in BCNF, because the left-hand side of both functional dependencies is a key.

4.5.3.5 Fourth Normal Form (4NF)

- A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
- For a dependency A → B, if multiple values of B exist for a single value of A, then the relation has a multi-valued dependency.

STUDENT table:

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is no relationship between COURSE and HOBBY. In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
To convert the above table into 4NF, we decompose it into two tables:

STUDENT_COURSE table:

STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics

STUDENT_HOBBY table:

STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey

4.5.3.6 Fifth Normal Form (5NF)

- A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
- 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
- 5NF is also known as project-join normal form (PJ/NF).

SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1

In the above table, John takes both Computer and Math classes for Semester 1, but he does not take the Math class for Semester 2. In this case, a combination of all these fields is required to identify a valid record. Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking that subject, so we would leave LECTURER and SUBJECT as NULL. But all three columns together act as the primary key, so we cannot leave the other two columns blank.

To convert the above table into 5NF, we can decompose it into three relations P1, P2 and P3:

P1:

SEMESTER     SUBJECT
Semester 1   Computer
Semester 1   Math
Semester 1   Chemistry
Semester 2   Math

P2:

SUBJECT     LECTURER
Computer    Anshika
Computer    John
Math        John
Math        Akash
Chemistry   Praveen

P3:

SEMESTER     LECTURER
Semester 1   Anshika
Semester 1   John
Semester 2   Akash
Semester 1   Praveen

4.6 DENORMALIZATION

Denormalization is a database optimization technique in which we add redundant data to the database to get rid of complex join operations. This is done to speed up database access.
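The idea of denormalization can be sketched with Python's sqlite3: copy a frequently joined column into the referencing table so that the common query no longer needs a join. Table and column names here are illustrative.

```python
import sqlite3

# Normalized starting point: student references branch via a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_id INT PRIMARY KEY, branch_name TEXT)")
conn.execute("CREATE TABLE student (roll_no INT, student_name TEXT, branch_id INT)")
conn.execute("INSERT INTO branch VALUES (10, 'CSE')")
conn.execute("INSERT INTO student VALUES (1, 'Andrew', 10)")

# Denormalization step: add the redundant column and fill it from branch.
conn.execute("ALTER TABLE student ADD COLUMN branch_name TEXT")
conn.execute("""UPDATE student SET branch_name =
                (SELECT branch_name FROM branch
                 WHERE branch.branch_id = student.branch_id)""")

# The frequent query now reads a single table, with no join.
row = conn.execute("SELECT student_name, branch_name FROM student").fetchone()
print(row)  # ('Andrew', 'CSE')
```

The trade-off is exactly the one discussed below: the copied branch_name must now be kept in sync whenever the branch table changes.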
Denormalization is done after normalization to improve the performance of the database. Data from one table is included in another table to reduce the number of joins in a query, which helps speed up performance.

Example: Suppose after normalization we have two tables, a Student table and a Branch table. The student has the attributes Roll_no, Student_name, Age, and Branch_id.

Student table:

Roll_no   Student_name   Age   Branch_id
1         Andrew         18    10
2         Angel          19    10
3         Priya          20    10
4         Analisa        21    11
5         Anna           22    12

Branch table:

Branch_id   Branch_name   HOD
10          CSE           Meat
11          EC            Dee
12          EX            Drpar

The Branch table is related to the Student table with Branch_id as the foreign key in the Student table. If we want the names of students along with the name of their branch, we need to perform a join operation. The problem here is that if the tables are large, the join operations take a lot of time. So, we can add the Branch_name data from the Branch table to the Student table; this will reduce the time that would have been spent in the join operation and thus optimize the database.

4.6.1 ADVANTAGES OF DENORMALIZATION

1. Query execution is fast, since we have to join fewer tables. Fetching queries in a normalized database generally requires joining a large number of tables, and the more joins, the slower the query. To overcome this, we can add redundancy to the database by copying values between parent and child tables, minimizing the number of joins needed for a query.

2. It makes the database more convenient to manage. A normalized database does not store the calculated values required by applications; calculating these values on the fly takes longer, slowing down the execution of the query. Thus, with denormalization, fetching queries can be simpler because we need to look at fewer tables.

3. It facilitates and accelerates reporting. Suppose you need certain statistics very frequently. Creating them from live data requires a long time and slows down the entire system. For example, if you want to monitor client revenues over a certain year for any or all clients, generating such reports from live data would require "searching" throughout the entire database, significantly slowing it down.

4.6.2 DISADVANTAGES OF DENORMALIZATION

The following are the disadvantages of denormalization:

- It takes more storage due to data redundancy.
- It makes updates and inserts of data in a table expensive.
- It makes update and insert code harder to write.
- Since data can be modified in several ways, data can become inconsistent; we need to update every piece of duplicate data. This can be enforced using triggers, transactions, and/or procedures for all operations that must be performed together.

4.6.3 HOW IS DENORMALIZATION DIFFERENT FROM NORMALIZATION?

Denormalization differs from normalization in the following ways:

- Denormalization is a technique used to merge data from multiple tables into a single table that can be queried quickly. Normalization, on the other hand, is used to delete redundant data from a database and replace it with non-redundant and reliable data.
- Denormalization is used when joins are costly and queries are run regularly on a large number of tables.
Normalization, on the other hand, is typically used when a large number of insert/update/delete operations are performed and the joins between the tables are not expensive.

TRANSACTION PROCESSING AND DATABASE SECURITY

5.1 TRANSACTION PROCESSING

- Introduction
- Transaction operations
- Transaction states
- Properties of transactions
- Schedules and conflicts
  - Serial schedule
  - Parallel schedule
- Serializability
- Anomalies due to interleaved transactions
  - WR conflicts
  - RW conflicts
  - WW conflicts
- Lock-based concurrency control
  - Lock-based protocol
  - Timestamp-based protocol

5.1.1 INTRODUCTION

A transaction is a program comprising a collection of database operations, executed as a logical unit of data processing. The operations performed in a transaction include one or more database operations such as insert, delete, update or retrieve. It is an atomic process that is either performed to completion entirely or is not performed at all.

Each high level operation can be divided into a number of low level tasks or operations. For example, a data update operation can be divided into three tasks:

- read_item(): reads the data item from storage into main memory.
- modify_item(): changes the value of the item in main memory.
- write_item(): writes the modified value from main memory to storage.

Database access is restricted to read_item() and write_item() operations; likewise, for all transactions, read and write form the basic database operations.

5.1.2 TRANSACTION OPERATIONS

The low level operations performed in a transaction are:

- begin_transaction: a marker that specifies the start of transaction execution.
- read_item or write_item: database operations that may be interleaved with main memory operations as part of the transaction.
- end_transaction: a marker that specifies the end of the transaction.
- commit: a signal to specify that the transaction has been successfully completed in its entirety and will not be undone.
- rollback — A signal to specify that the transaction has been unsuccessful and so all temporary changes in the database are undone. A committed transaction cannot be rolled back.

5.1.3 TRANSACTION STATES

A transaction may go through a subset of five states: active, partially committed, committed, failed, and aborted.

- Active — The initial state where the transaction enters is the active state. The transaction remains in this state while it is executing read, write, or other operations.
- Partially Committed — The transaction enters this state after the last statement of the transaction has been executed.
- Committed — The transaction enters this state after successful completion of the transaction, once the system checks have issued the commit signal.
- Failed — The transaction goes from the partially committed state or active state to the failed state when it is discovered that normal execution can no longer proceed or system checks fail.
- Aborted — This is the state after the transaction has been rolled back after failure and the database has been restored to the state it was in before the transaction began.

The following state transition diagram depicts the states in the transaction and the low-level transaction operations that cause changes in states.

Figure 5.1 State Transition Diagram — States in a Transaction

5.1.4 DESIRABLE PROPERTIES OF TRANSACTIONS

Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation, and Durability.

- Atomicity — This property states that a transaction is an atomic unit of processing, that is, either it is performed in its entirety or not performed at all. No partial update should exist.
- Consistency — A transaction should take the database from one consistent state to another consistent state. It should not adversely affect any data item in the database.
- Isolation — A transaction should be executed as if it is the only one in the system. There should not be any interference from the other concurrent transactions that are simultaneously running.
- Durability — If a committed transaction brings about a change, that change should be durable in the database and not lost in case of any failure.

5.1.5 SCHEDULES AND CONFLICTS

In a system with a number of simultaneous transactions, a schedule is the total order of execution of operations. Given a schedule S comprising n transactions, say T1, T2, T3, ..., Tn, for any transaction Ti, the operations in Ti must execute as laid down in the schedule S.

Types of Schedules

There are two types of schedules:

- Serial Schedules — In a serial schedule, at any point of time only one transaction is active, i.e. there is no overlapping of transactions. This is depicted in the following graph.

Figure 5.2 Serial Schedule

- Parallel Schedules — In parallel schedules, more than one transaction is active simultaneously, i.e. the transactions contain operations that overlap in time. This is depicted in the following graph.

Figure 5.3 Parallel Schedule

5.1.6 CONFLICTS IN SCHEDULES

In a schedule comprising multiple transactions, a conflict occurs when two active transactions perform non-compatible operations. Two operations are said to be in conflict when all of the following three conditions exist simultaneously:

- The two operations are parts of different transactions.
- Both the operations access the same data item.
- At least one of the operations is a write_item() operation, i.e. it tries to modify the data item.

5.1.7
SERIALIZABILITY

A serializable schedule of 'n' transactions is a parallel schedule which is equivalent to a serial schedule comprising the same 'n' transactions. A serializable schedule retains the correctness of a serial schedule while attaining the better CPU utilization of a parallel schedule.

Equivalence of Schedules

Equivalence of two schedules can be of the following types:

- Result equivalence — Two schedules producing identical results are said to be result equivalent.
- View equivalence — Two schedules that perform similar actions in a similar manner are said to be view equivalent.
- Conflict equivalence — Two schedules are said to be conflict equivalent if both contain the same set of transactions and have the same order of conflicting pairs of operations.

5.1.8 ANOMALIES DUE TO INTERLEAVED TRANSACTIONS

When the read and write operations are done alternately, there is a possibility of some types of anomalies. These are classified into three categories.

1. Write-Read Conflicts (WR Conflicts)

This conflict occurs when a transaction reads data which has been written by another transaction but not yet committed. This happens when transaction T2 tries to read an object A that has been modified by another transaction T1, which has not yet completed (committed). This type of read is called a dirty read.

Figure 5.4 Write-Read Conflicts (WR Conflicts)

Suppose the transactions are interleaved according to the above schedule. The account transfer program T1 deducts $100 from account A; then the interest deposit program T2 reads the current values of accounts A and B and adds 6% interest to each; finally the transfer program T1 credits $100 to account B. The outcome of this schedule is different from the normal execution in which the two transactions run one by one. This type of anomaly leaves the database in an inconsistent state.
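The WR anomaly above can be reproduced with a small, deterministic simulation (a sketch only — the in-memory "database", the starting balances, and the hard-coded interleaving order are assumptions chosen to mirror the example):

```python
# Minimal simulation of the WR (dirty read) anomaly described above.
# T1 transfers $100 from A to B; T2 adds 6% interest to both accounts.
# In the interleaved run, T2 reads A's uncommitted value (a dirty read).

def run_interleaved(db):
    db["A"] -= 100                       # T1: deduct $100 from A (uncommitted)
    db["A"] = round(db["A"] * 1.06, 2)   # T2: reads dirty A, adds 6% interest
    db["B"] = round(db["B"] * 1.06, 2)   # T2: adds 6% interest to B, commits
    db["B"] += 100                       # T1: finishes by crediting B
    return db

def run_serial(db):
    # T1 runs to completion, then T2.
    db["A"] -= 100
    db["B"] += 100
    db["A"] = round(db["A"] * 1.06, 2)
    db["B"] = round(db["B"] * 1.06, 2)
    return db

print(run_interleaved({"A": 1000.0, "B": 1000.0}))  # {'A': 954.0, 'B': 1160.0}
print(run_serial({"A": 1000.0, "B": 1000.0}))       # {'A': 954.0, 'B': 1166.0}
```

In the interleaved run the $100 credited to B never earns its interest, so the totals of the two runs differ: exactly the inconsistency described in the text.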
2. Read-Write Conflicts (RW Conflicts)

This conflict occurs when a transaction writes data which has previously been read by another transaction. In this case the anomalous behaviour is that a transaction T2 could change the value of an object A that has been read by a transaction T1, while T1 is still in progress. If T1 tries to read A again it will get a different result. This type of read is called an unrepeatable read.

Figure 5.5 Read-Write Conflicts (RW Conflicts)

Suppose, after T2's update, fewer funds remain in account A than T1 originally read. Now T1 will try to reduce it by $100 based on the stale value it read earlier. This makes the database inconsistent.

3. Write-Write Conflicts (WW Conflicts)

This conflict occurs when the data updated by a transaction is overwritten by another transaction, which might lead to loss of data updates. The third type of anomalous behaviour is that one transaction updates an object while another transaction is also in progress. This type of write is called a blind write.

Figure 5.6 Write-Write Conflicts (WW Conflicts)

If A and B are two accounts whose values have to be kept equal always, suppose transaction T1 updates both objects to 3,000 and T2 updates both objects to 2,000. First T1 updates the value of object A to 3,000. Immediately T2 makes A 2,000 and B 2,000 and commits. After the completion of T2, T1 updates B to 3,000. Now the value of A is 2,000 and the value of B is 3,000; they are not equal. The constraint is violated in this case due to the interleaved scheduling.

5.1.9 LOCK BASED CONCURRENCY CONTROL

In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:

- Lock based protocols
- Timestamp based protocols

5.1.9.1 LOCK-BASED PROTOCOLS

Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:

- Binary Locks — A lock on a data item can be in two states; it is either locked or unlocked.
- Shared/Exclusive — This type of locking mechanism differentiates the locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more than one transaction to write on the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.

There are four types of lock protocols available:

(a) Simplistic Lock Protocol

Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation is performed. Transactions may unlock the data item after completing the 'write' operation.

(b) Pre-claiming Lock Protocol

Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.

Figure 5.7 Pre-claiming Lock Protocol

(c) Two-Phase Locking (2PL)

This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks.
As soon as the transaction releases its first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it only releases the acquired locks.

Figure 5.8 Two-Phase Locking Protocol

Two-phase locking has two phases: one is growing, where all the locks are being acquired by the transaction; and the second phase is shrinking, where the locks held by the transaction are being released. To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.

(d) Strict Two-Phase Locking

The first phase of Strict-2PL is the same as 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it. Strict-2PL holds all the locks until the commit point and releases all the locks at one time.

Figure 5.9 Strict Two-Phase Locking Protocol

Strict-2PL does not have cascading aborts as 2PL does.

5.1.9.2 TIMESTAMP-BASED PROTOCOLS

The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.

Lock-based protocols manage the order between the conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.

Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all other transactions that come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger, and priority would be given to the older one.
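The timestamp assignment described above can be sketched with a logical counter (a minimal illustration; the class and function names are assumptions, not part of any real DBMS API):

```python
import itertools

# A monotonically increasing logical counter serves as the timestamp source.
_ts_counter = itertools.count(1)

class Transaction:
    """Each transaction receives its timestamp at creation time."""
    def __init__(self, name):
        self.name = name
        self.ts = next(_ts_counter)  # smaller timestamp = older transaction

def older(t1, t2):
    """Return the transaction with higher priority (the older one)."""
    return t1 if t1.ts < t2.ts else t2

tx = Transaction("x")      # created first, ts = 1
ty = Transaction("y")      # created later, ts = 2
print(older(tx, ty).name)  # prints "x": the older transaction gets priority
```

Because the counter never repeats or decreases, any two transactions are totally ordered by age, which is what the ordering rules in the next subsection rely on.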
In addition, every data item is given the latest read and write timestamp. This lets the system know when the last read and write operation was performed on the data item.

5.1.9.2.1 Timestamp Ordering Protocol

The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol system that the conflicting pair of tasks is executed according to the timestamp values of the transactions.

- The timestamp of transaction Ti is denoted as TS(Ti).
- The read timestamp of data item X is denoted by R-timestamp(X).
- The write timestamp of data item X is denoted by W-timestamp(X).

The timestamp ordering protocol works as follows:

- If a transaction Ti issues a read(X) operation:
  o If TS(Ti) < W-timestamp(X), the operation is rejected.
  o If TS(Ti) >= W-timestamp(X), the operation is executed.
  o All data-item timestamps are updated.
- If a transaction Ti issues a write(X) operation:
  o If TS(Ti) < R-timestamp(X), the operation is rejected.
  o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
  o Otherwise, the operation is executed.

5.1.9.2.2 Thomas' Write Rule

This rule concerns the case where TS(Ti) < W-timestamp(X), in which the operation is rejected and Ti is rolled back. The timestamp ordering rules can be modified to make the schedule view serializable: instead of rolling Ti back, the 'write' operation itself is ignored.

5.2 DATABASE SECURITY
> INTRODUCTION
> COMMON THREATS & CHALLENGES
> CONTROL MEASURES TO PROVIDE SECURITY
> BEST PRACTICES FOR EVALUATING DATABASE SECURITY
> CLASSIFICATION OF DATABASE SECURITY
  o TYPES
  o DIFFERENT LEVELS

5.2.1 INTRODUCTION

Database security refers to the range of tools, controls, and measures designed to establish and preserve database confidentiality, integrity, and availability. It also means keeping sensitive information safe and preventing the loss of data. Security of the database is controlled by the Database Administrator (DBA).

5.2.2 COMMON THREATS AND CHALLENGES

Many software misconfigurations, vulnerabilities, or patterns of carelessness or misuse of databases can result in breaches. The following are among the most common types or causes of database security attacks:

Insider threat

An insider threat is a security threat from any one of three sources with privileged access to the database:

- A malicious insider who intends to do harm
- A negligent insider who makes errors that make the database vulnerable to attack
- An infiltrator — an outsider who somehow obtains credentials via a scheme such as phishing or by gaining access to the credential database itself

Insider threats are among the most common causes of database security breaches and are often the result of allowing too many employees to hold privileged user access credentials.

Exploitation of database software vulnerabilities

Hackers make their living by finding and targeting vulnerabilities in all kinds of software, including database management software. All major commercial database software vendors and open source database management platforms issue regular security patches to address these vulnerabilities, but failure to apply these patches in a timely fashion can increase your exposure.

SQL/NoSQL injection attacks

A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack strings into database queries served by web applications or HTTP headers. Organizations that don't follow secure web application coding practices and perform regular vulnerability testing are open to these attacks.

Buffer overflow exploitations

Buffer overflow occurs when a process attempts to write more data to a fixed-length block of memory than it is allowed to hold.
Attackers may use the excess data, stored in adjacent memory addresses, as a foundation from which to launch attacks.

Denial of service (DoS/DDoS) attacks

In a denial of service (DoS) attack, the attacker floods the target server — in this case the database server — with so many requests that the server can no longer fulfil legitimate requests from actual users, and, in many cases, the server becomes unstable or crashes.

Malware

Malware is software written specifically to exploit vulnerabilities or otherwise cause damage to the database. Malware may arrive via any endpoint device connecting to the database's network.

Attacks on backups

Organizations that fail to protect backup data with the same stringent controls used to protect the database itself can be vulnerable to attacks on backups.

5.2.3 CONTROL MEASURES TO PROVIDE DATABASE SECURITY

The following are the main control measures used to provide security of data in databases:

1. Authentication
2. Access control
3. Inference control
4. Flow control
5. Database security applying statistical methods
6. Encryption

These are explained below.

1. Authentication: Authentication is the process of confirming that a user logs in only according to the rights provided to him to perform the activities of the database. A particular user can log in only up to his privilege; he cannot access other sensitive data. The privilege of accessing sensitive data is restricted by using authentication. Authentication tools using biometrics, such as retina scans and fingerprints, can protect the database from unauthorized or malicious users.

2. Access Control: The security mechanism of a DBMS must include some provisions for restricting access to the database by unauthorized users. Access control is done by creating user accounts and controlling the login process in the DBMS, so that access to sensitive data is possible only for those people (database users) who are allowed to access such data, and is restricted for unauthorized persons. The database system must also keep track of all operations performed by a given user throughout the entire login time.

3. Inference Control: This method is known as the countermeasure to the statistical database security problem. It is used to prevent the user from completing any inference channel. This method protects sensitive information from indirect disclosure. Inferences are of two types: identity disclosure and attribute disclosure.

4. Flow Control: This prevents information from flowing in a way that it reaches unauthorized users. Pathways that let information flow implicitly in ways that violate the privacy policy of a company are called covert channels.

5. Database Security applying Statistical Methods: Statistical database security focuses on the protection of confidential individual values stored in and used for statistical purposes, and allows retrieving summaries of values based on categories. It does not permit retrieving individual information. This allows access to the database to obtain statistical information, such as the number of employees in the company, but not to access detailed confidential or personal information about a specific individual employee.

6. Encryption: This method is mainly used to protect sensitive data (such as credit card numbers and OTP numbers). The data is encoded using encoding algorithms. An unauthorized user who tries to access this encoded data will face difficulty in decoding it, but authorized users are given decoding keys to decode the data.
5.2.4 BEST PRACTICES FOR EVALUATING DATABASE SECURITY

Consider each of the following areas:

- Physical security: Whether your database server is on-premise or in a cloud data center, it must be located within a secure, climate-controlled environment.
- Administrative and network access controls: The practical minimum number of users should have access to the database, and their permissions should be restricted to the minimum levels necessary for them to do their jobs. Likewise, network access should be limited to the minimum level of permissions necessary.
- End user account/device security: Always be aware of who is accessing the database and when and how the data is being used. Data monitoring solutions can alert you if data activities are unusual or appear risky. All user devices connecting to the network housing the database should be physically secure and subject to security controls at all times.
- Encryption: All data, including data in the database and credential data, should be protected with best-in-class encryption while at rest and in transit. All encryption keys should be handled in accordance with best-practice guidelines.
- Database software security: Always use the latest version of your database management software, and apply all patches as soon as they are issued.
- Application/web server security: Any application or web server that interacts with the database can be a channel for attack and should be subject to ongoing security testing and best practice management.
- Backup security: All backups, copies, or images of the database must be subject to the same (or equally stringent) security controls as the database itself.
- Auditing: Record all logins to the database server and operating system, and log all operations performed on sensitive data as well. Database security standard audits should be performed regularly.
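The auditing practice above can be sketched as a thin wrapper that records every operation on sensitive data (a minimal illustration; the in-memory log, function names, and log format are assumptions — a real audit trail would use an append-only, protected store):

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = []  # stand-in for an append-only, tamper-protected audit store

def audited(user, operation, item):
    """Record who performed which operation on which data item, and when."""
    entry = {
        "user": user,
        "operation": operation,
        "item": item,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(entry)
    logging.info("AUDIT %s %s on %s", user, operation, item)

audited("alice", "read", "salaries")
audited("bob", "update", "salaries")
print(len(audit_log))  # 2
```

Regular security audits then amount to reviewing this trail for unusual or risky activity, as the text recommends.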
5.2.5 CLASSIFICATION OF DATABASE SECURITY

Database security is broadly classified into physical and logical security. Database recovery is the way of restoring a database to a correct state in the event of a failure.

- Physical security — Physical security refers to the security of the hardware associated with the system and the protection of the site where the computer resides. Natural events like fire, floods, and earthquakes can be considered some of the physical threats. It is advisable to have backup copies of databases in the face of massive disasters.
- Logical security — Logical security refers to the security measures present in the operating system or the DBMS designed to handle threats to the data. Logical security is far more difficult to accomplish.

5.2.5.1 DATABASE SECURITY AS PER THE LEVELS

Database security is performed at different levels. This is explained below.

(a) Database Security at Design Level

It is necessary to take care of database security at the stage of database design. Some guidelines to implement the most secure system are:

- The database design should be simple.
- The database must be normalized.
- Create a unique key for each user or group of users.

(b) Database Security at Maintenance Level

Once the database is designed, the administrator plays an important role in the maintenance of the database. The security issues at maintenance level can be classified into the following:

- Operating system issues and availability
- Confidentiality and accountability through authorization rules
- Encryption
- Authentication schemes

(c) Database Security through Access Control

A database for an enterprise contains a great deal of information and usually has several groups of users. Most users need to access only the small portion of the database which is allocated to them. The DBMS should provide mechanisms to control access to the data; in particular, a way to control the data accessible by a given user.
The mechanisms for access control at the DBMS level are as follows:

- Discretionary access control: DAC is identity-based access control. DAC mechanisms are controlled by user identification such as username and password. DAC is discretionary because owners can transfer objects or any information to other users. In simple words, the owner can determine the access privileges.
- Mandatory access control: In MAC, the operating system provides access to the user based on their identities and data. For gaining access, the users have to submit their personal information. It is very secure because the rules and restrictions are imposed by the admin and will be strictly followed. MAC settings and policy management are established in a secure network and are limited to system administrators.
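Discretionary access control as described above can be sketched with a small grant/revoke table, loosely modelled on SQL's GRANT and REVOKE statements (an illustration only; the class and method names are assumptions, not a real DBMS interface):

```python
class DACRegistry:
    """Tracks which user holds which privilege on which object."""
    def __init__(self):
        self.grants = set()  # entries of the form (user, privilege, obj)

    def grant(self, user, privilege, obj):
        """The owner grants a privilege to another user (discretionary)."""
        self.grants.add((user, privilege, obj))

    def revoke(self, user, privilege, obj):
        """The owner withdraws a previously granted privilege."""
        self.grants.discard((user, privilege, obj))

    def is_allowed(self, user, privilege, obj):
        return (user, privilege, obj) in self.grants

acl = DACRegistry()
acl.grant("alice", "SELECT", "employees")
print(acl.is_allowed("alice", "SELECT", "employees"))  # True
print(acl.is_allowed("bob", "SELECT", "employees"))    # False
acl.revoke("alice", "SELECT", "employees")
print(acl.is_allowed("alice", "SELECT", "employees"))  # False
```

The discretionary aspect is that the owner decides who appears in the table; under MAC, by contrast, such decisions would be fixed by system-wide policy rather than by individual owners.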
