Unit 1
Introduction
Disadvantages of Conventional File-Processing Systems
Data redundancy: Multiple file formats, duplication of information in different files.
• E.g.: If a student has a double major (music and mathematics), the address and telephone number of that student may appear
in a file that consists of student records of students in the Music department and in a file that consists of student records
of students in the Mathematics department.
• This redundancy leads to higher storage and access cost.
Data inconsistency: The various copies of the same data may no longer agree.
• E.g.: A changed student address may be reflected in the Music department records but not elsewhere in the system.
Difficulty in accessing data: Need to write a new program to carry out each new task.
• Conventional file-processing environments do not allow needed data to be retrieved in a convenient and
efficient manner.
• More responsive data-retrieval systems are required for general use.
Data isolation: Because data are scattered in various files, and files may be in different formats, writing new application
programs to retrieve the appropriate data is difficult.
Integrity problems: The data values stored in the database must satisfy certain types of consistency constraints.
• E.g.: account balance >= 0 (the balance must never fall below zero).
• Hard to add new constraints or change existing ones.
Atomicity problems: Failures may leave the database in an inconsistent state with partial updates carried out.
• E.g.: Transfer of funds from one account to another should either be complete or not happen at all.
• Difficult to ensure atomicity in a conventional file-processing system.
Concurrent-access anomalies: For the sake of overall performance of the system and faster response, many
systems allow multiple users to update the data simultaneously.
• Uncontrolled concurrent accesses can lead to inconsistencies.
• E.g.: Two people reading a balance and updating it at the same time.
• Consider department A, with an account balance of $10,000. If two department clerks debit the account
balance ($500 and $100) of department A at almost exactly the same time the result of the concurrent
executions may leave the budget in an incorrect (or inconsistent) state.
• If the two programs run concurrently, they may both read the value $10,000, and write back $9500 and
$9900, respectively. Depending on which one writes the value last, the account balance of department A may
contain either $9500 or $9900, rather than the correct value of $9400.
Security problems: Not every user of the database system should be able to access all the data.
• E.g.: in a university, payroll personnel need to see only that part of the database that has financial
information. They do not need access to information about academic records.
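The concurrent-access anomaly described above (two clerks debiting department A) can be reproduced with a small simulation. This is only a sketch: the `read` and `write` helpers and the in-memory `balance` dictionary are hypothetical stand-ins for file accesses, and no real DBMS is involved.

```python
# Hypothetical simulation of the lost-update anomaly; not a real file system or DBMS.
balance = {"A": 10_000}

def read(account):
    return balance[account]

def write(account, value):
    balance[account] = value

# Uncontrolled interleaving: both clerks read before either one writes.
seen_by_clerk1 = read("A")            # both clerks see 10000
seen_by_clerk2 = read("A")
write("A", seen_by_clerk1 - 500)      # first clerk writes 9500
write("A", seen_by_clerk2 - 100)      # second clerk writes 9900, losing the first debit
interleaved_result = balance["A"]

# Serial execution: the second debit starts only after the first has finished.
balance["A"] = 10_000
write("A", read("A") - 500)
write("A", read("A") - 100)
serial_result = balance["A"]

print(interleaved_result, serial_result)  # 9900 9400
```

With the write order reversed, the interleaved run would end at $9500 instead; neither outcome matches the correct $9400.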
• A database schema corresponds to the variable declarations (along with associated type definitions) in a program.
• The values of the variables in a program at a point in time correspond to an instance of a database schema.
• Physical Data Independence: The ability to modify the physical schema without changing the logical schema.
• Applications depend on the logical schema.
• In general, the interfaces between the various levels and components should be well defined so that
changes in some parts do not seriously influence others.
Data Models
• A collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints.
• A data model provides a way to describe the design of a database
at the physical, logical, and view levels.
• Semi-structured Data Model: The semi-structured data model permits the specification of data where
individual data items of the same type may have different sets of attributes.
• The Extensible Markup Language (XML) is widely used to represent semi-structured data.
The network data model and the hierarchical data model preceded the relational data model.
Relational Model
• Tuple – one row of a relation
• Attribute – one column of a relation
• Relation – the whole table
• Domain of an attribute – all the values that the attribute may take
Entity-Relationship Model
• E-R model of real world
• Entities (objects)
• E.g. Teacher, Department.
• Relationships between entities
• The relationship set "teaches in" associates a teacher with a department.
• Entity sets are represented by a rectangular box with the entity set
name in the header and the attributes listed below it.
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also called nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data.
• A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language.
Database Languages
• The output of the DDL is placed in the data dictionary which contains metadata - that is, data about data.
• SQL provides a rich DDL that allows one to define tables, integrity constraints, assertions, etc.
create table department (dept_name char(20), building char(15), budget numeric(12,2));
• Execution of the above DDL statement creates the department table with three columns: dept_name,
building, and budget, each of which has a specific data type associated with it.
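As a rough illustration of how a DDL statement ends up as metadata, the sketch below runs the department definition against an in-memory SQLite database and reads the result back from SQLite's catalog. SQLite is an assumption made here for convenience; other systems expose their data dictionary through different catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department ("
    " dept_name CHAR(20), building CHAR(15), budget NUMERIC(12,2))"
)

# SQLite keeps its metadata (its data dictionary) in the sqlite_master table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column: (cid, name, declared type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(department)")]

print(tables)
print(columns)
```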
CREATING DATABASE TABLE
• CREATE TABLE is used to create a table by defining its structure: the name and data type of the various
columns, the relationships with columns of other tables, etc.
• E.g.:
CREATE TABLE Employee(Name varchar2(20), DOB date, Salary number(6));
ALTER - Add a new attribute or Modify the characteristics of some existing attribute.
• ALTER TABLE table_name ADD (column_name1 data_type (size), column_name2 data_type (size),
….., column_nameN data_type (size));
E.g.:
ALTER TABLE Employee ADD (Address varchar2(20));
ALTER TABLE Employee ADD (Designation varchar2(20), Dept varchar2(3));
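The ADD examples above use Oracle-style syntax (`varchar2`, a parenthesized column list). As a runnable sketch, the same three columns can be added in SQLite, which is assumed here only because it ships with Python; SQLite spells the clause `ADD COLUMN` and accepts one column per statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name VARCHAR(20), DOB DATE, Salary NUMERIC(6))")

# SQLite adds one column per ALTER TABLE statement.
for column_def in ("Address VARCHAR(20)", "Designation VARCHAR(20)", "Dept VARCHAR(3)"):
    conn.execute(f"ALTER TABLE Employee ADD COLUMN {column_def}")

column_names = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
print(column_names)
```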
ALTER TABLE table_name MODIFY (column_name data_type(new_size));
E.g.:
TRUNCATE TABLE Employee_details;
Data Manipulation Language
E.g.:
• INSERT INTO Employee VALUES ('ashok', '16-mar-1998', 30000);
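A minimal sketch of the same INSERT issued from a program, assuming SQLite and Python's `sqlite3` module. The `?` placeholders and the ISO date format are choices made for this sketch, not part of the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, DOB TEXT, Salary INTEGER)")

# Values are bound through placeholders instead of being pasted into the SQL text.
conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", ("ashok", "1998-03-16", 30000))

rows = conn.execute("SELECT Name, DOB, Salary FROM Employee").fetchall()
print(rows)  # [('ashok', '1998-03-16', 30000)]
```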
• Naive users: unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously.
• Specialized users: sophisticated users who write specialized database applications that do not fit into
the traditional data-processing framework.
Database Administrator
• One of the main reasons for using DBMSs is to have central control of both the data and the programs that
access those data.
• A person who has such central control over the system is called a database administrator (DBA).
• Schema definition: The DBA creates the original database schema by executing a set of data definition
statements in the DDL.
• Schema and physical-organization modification: The DBA carries out changes to the schema and
physical organization to reflect the changing needs of the organization, or to alter the physical
organization to improve performance.
• Granting of authorization for data access: By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access.
• Routine maintenance: Periodically backing up the database, ensuring that enough free disk space is
available, and monitoring jobs running on the database.
Transaction Management
• Atomicity - all-or-none requirement.
• E.g.: In a funds transfer, one department account (A) is debited and another department account (B) is credited.
• Either both the credit and the debit occur, or neither occurs.
• Consistency - it is essential that the execution of the funds transfer preserve the consistency of the
database. The value of the sum of the balances of A and B must be preserved.
• This is called the correctness requirement.
• Durability - After the successful execution of a funds transfer, the new values of the balances of
accounts A and B must persist, despite the possibility of system failure.
• This is called the persistence requirement.
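The funds-transfer requirements above can be sketched with SQLite transactions. The `account` table, the `transfer` and `balances` helpers, and the CHECK constraint are assumptions for illustration; the credit is deliberately applied first so that a failed debit leaves a partial update to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 500)])
conn.commit()

def transfer(conn, source, target, amount):
    """Credit target and debit source as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False  # the debit violated the CHECK, so the credit was undone too

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

first_ok = transfer(conn, "A", "B", 200)     # succeeds
second_ok = transfer(conn, "A", "B", 5000)   # would overdraw A, so nothing changes
final = balances(conn)
print(first_ok, second_ok, final)
```

After both calls the sum of the two balances is unchanged, which is the consistency condition stated above.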
Storage Manager
• The storage manager is the component of a database system that provides the interface between the low-
level data stored in the database and the application programs and queries submitted to the system.
• The storage manager is responsible for the interaction with the file manager.
The storage manager components include
• Authorization and integrity manager: which tests for the satisfaction of integrity constraints and checks
the authority of users to access data.
• Transaction manager: which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
• File manager: which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
• Buffer manager: which is responsible for fetching data from disk storage into main memory, and deciding
what data to cache in main memory.
• The buffer manager is a critical part of the database system, since it enables the database to handle data
sizes that are much larger than the size of main memory.
• The storage manager implements several data structures as part of the physical system implementation
• Data files: which store the database itself.
• Data dictionary: which stores metadata about the structure of the database, in particular the schema of the
database.
• Indices: which can provide fast access to data items.
• Like the index in this textbook, a database index provides pointers to those data items that hold a particular
value.
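A small sketch of an index in practice, assuming SQLite. The `student` table, its rows, and the index name `idx_student_name` are hypothetical, and the exact wording of the query plan differs between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(i, f"student{i}", 16 + i % 5) for i in range(1000)],
)

# The index maps each name to the rows that hold it.
conn.execute("CREATE INDEX idx_student_name ON student (name)")

# Ask SQLite how it would evaluate an equality lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE name = ?", ("student42",)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```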
The Query Processor
• The query processor components include
• DDL interpreter: which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler: which translates DML statements in a query language into an evaluation plan consisting of
low-level instructions that the query evaluation engine understands.
• A query can usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization; that is, it picks the lowest cost evaluation
plan from among the alternatives.
• Query evaluation engine: which executes low-level instructions generated by the DML compiler.
DML Commands
Command – Explanation
Select – The SELECT operation selects a subset of the tuples according to a given selection condition.
Projection – Projection eliminates all attributes of the input relation except those mentioned in the projection list.
Union – UNION is symbolized by the ∪ symbol. It includes all tuples that are in A or in B.
Intersection – Intersection defines a relation consisting of the set of all tuples that are in both A and B.
Cartesian Product – The Cartesian product is helpful for merging columns from two relations.
Inner Join – An inner join includes only those tuples that satisfy the matching criteria.
Outer Join – An outer join includes, along with the tuples that satisfy the matching criteria, some or all of the tuples that do not.
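The operations listed above can be sketched over plain Python sets of tuples. The relations `A` and `B`, their contents, and the helper names `select` and `project` are made up for illustration; this shows the set semantics only, not how a DBMS implements them.

```python
# Relations modeled as sets of tuples; attribute order is (id, name).
A = {(1, "Asha"), (2, "Ravi"), (3, "Meena")}
B = {(3, "Meena"), (4, "John")}

def select(relation, predicate):
    """Keep only the tuples that satisfy the selection condition."""
    return {t for t in relation if predicate(t)}

def project(relation, positions):
    """Keep only the listed attribute positions; duplicates collapse in a set."""
    return {tuple(t[i] for i in positions) for t in relation}

selected = select(A, lambda t: t[0] >= 2)
projected = project(A, [1])
union = A | B
intersection = A & B
product = {a + b for a in A for b in B}                     # every pairing
inner_join = {a + b for a in A for b in B if a[0] == b[0]}  # pairings with equal ids

print(sorted(selected))
print(sorted(intersection))
print(len(union), len(product))
```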
UNION Operator
• UNION operator is used to combine the result sets of 2 or more SELECT statements. It removes duplicate
rows between the various SELECT statements.
• Each SELECT statement within the UNION operator must have the same number of fields in the
result sets with similar data types.
SELECT expression1, expression2, ... expression_n FROM tables [WHERE conditions] UNION SELECT expression1,
expression2, ... expression_n FROM tables [WHERE conditions];
• The SQLite INTERSECT operator returns the intersection of 2 or more datasets. Each dataset is defined by a SELECT statement.
• If a record exists in both data sets, it will be included in the INTERSECT results.
• If a record exists in one data set and not in the other, it will be omitted from the INTERSECT results.
SELECT expression1, expression2, ... expression_n FROM tables [WHERE conditions] INTERSECT SELECT expression1,
expression2, ... expression_n FROM tables [WHERE conditions];
SELECT department_id FROM departments WHERE department_id >= 25 INTERSECT SELECT department_id FROM
employees WHERE last_name = 'Anderson';
• Example - With Multiple Expressions
SELECT contact_id, last_name, first_name FROM contacts WHERE contact_id > 50 INTERSECT SELECT
customer_id, last_name, first_name FROM customers WHERE last_name <> 'Peterson';
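To see UNION and INTERSECT side by side, the sketch below runs the department example against a few made-up rows in SQLite. The table contents are assumptions chosen so that the two result sets overlap only partly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER);
    CREATE TABLE employees (department_id INTEGER, last_name TEXT);
    INSERT INTO departments VALUES (10), (25), (30), (40);
    INSERT INTO employees VALUES
        (30, 'Anderson'), (40, 'Anderson'), (10, 'Anderson'), (25, 'Smith');
""")

left = "SELECT department_id FROM departments WHERE department_id >= 25"
right = "SELECT department_id FROM employees WHERE last_name = 'Anderson'"

# UNION keeps every id from either query, with duplicates removed.
union_ids = [r[0] for r in conn.execute(f"{left} UNION {right} ORDER BY department_id")]

# INTERSECT keeps only the ids returned by both queries.
intersect_ids = [r[0] for r in conn.execute(f"{left} INTERSECT {right} ORDER BY department_id")]

print(union_ids)      # [10, 25, 30, 40]
print(intersect_ids)  # [30, 40]
```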
• ORDER BY clause is used to sort the data in an ascending or descending order, based on one or more columns.
SELECT column-list FROM table_name [WHERE condition] [ORDER BY column1, column2, .. columnN] [ASC | DESC];
SELECT column-list FROM table_name WHERE [ conditions ] GROUP BY column1, column2....columnN ORDER BY
column1, column2....columnN
SELECT column1, column2, columnN FROM table_name WHERE [condition1] AND [condition2]...AND [conditionN];
SELECT * FROM COMPANY WHERE AGE >= 25 AND SALARY >= 65000;
• The OR operator is also used to combine multiple conditions in a SQL statement's WHERE clause.
[condition1] OR [condition2] will be true if either condition1 or condition2 is true.
SELECT column1, column2, columnN FROM table_name WHERE [condition1] OR [condition2]...OR [conditionN];
• The SELECT TOP clause is used to specify the number of records to return.
• The SELECT TOP clause is useful on large tables with thousands of records. Returning a large
number of records can impact performance.
Not all database systems support the SELECT TOP clause. MySQL supports the LIMIT clause to
select a limited number of records, while Oracle uses ROWNUM.
• SELECT * FROM Student LIMIT 3;
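The AND, OR, ORDER BY, and LIMIT clauses described above can be tried on a small table. The sketch assumes SQLite, and the `Student` rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (student_id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Sam", 17), (2, "Anu", 16), (3, "Sita", 19), (4, "Raj", 18), (5, "Kiran", 16)],
)

# AND: both conditions must hold.
both = [r[0] for r in conn.execute(
    "SELECT student_id FROM Student WHERE age >= 17 AND student_id > 1 ORDER BY student_id")]

# OR: either condition is enough.
either = [r[0] for r in conn.execute(
    "SELECT student_id FROM Student WHERE age = 16 OR age = 19 ORDER BY student_id")]

# ORDER BY plus LIMIT: the three oldest students.
top3 = [r[0] for r in conn.execute(
    "SELECT name FROM Student ORDER BY age DESC, name ASC LIMIT 3")]

print(both, either, top3)
```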
MIN() and MAX() Functions
• The MIN() function returns the smallest value of the selected column.
• The MAX() function returns the largest value of the selected column.
MIN() Syntax
• SELECT MIN(column_name) FROM table_name WHERE condition;
MAX() Syntax
• SELECT MAX(column_name) FROM table_name WHERE condition;
SELECT MIN(age) FROM Student;
MAX() Functions
• SELECT MAX(age) FROM Student;
COUNT(), AVG() and SUM() Functions
• The COUNT() function returns the number of rows that matches a specified criterion.
• The AVG() function returns the average value of a numeric column.
• The SUM() function returns the total sum of a numeric column.
COUNT() Syntax
• SELECT COUNT(column_name) FROM table_name WHERE condition;
AVG() Syntax
• SELECT AVG(column_name) FROM table_name WHERE condition;
SUM() Syntax
• SELECT SUM(column_name) FROM table_name WHERE condition;
SELECT COUNT(student_id) FROM Student;
• SELECT AVG(age) FROM Student;
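A short sketch that evaluates the five aggregate functions in one query, assuming SQLite and a made-up `Student` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (student_id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Sam", 17), (2, "Anu", 16), (3, "Sita", 19), (4, "Raj", 18), (5, "Kiran", 16)],
)

# One row comes back, with one value per aggregate.
count, average, total, youngest, oldest = conn.execute(
    "SELECT COUNT(student_id), AVG(age), SUM(age), MIN(age), MAX(age) FROM Student"
).fetchone()

print(count, average, total, youngest, oldest)
```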
• The BETWEEN operator selects values within a given range. The values can be numbers, text, or
dates.
• The BETWEEN operator is inclusive: begin and end values are included.
BETWEEN Syntax
• SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND
value2;
SELECT * FROM student WHERE age BETWEEN 16 AND 18;
NOT BETWEEN Operator
• To display the students outside the age range of the previous example, use NOT BETWEEN.
• SELECT * FROM student WHERE age NOT BETWEEN 16 AND 18;
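The inclusive behaviour of BETWEEN can be checked with a few ages on either side of the range. The sketch assumes SQLite and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Anu", 15), ("Sam", 16), ("Raj", 17), ("Sita", 18), ("Kiran", 19)],
)

# Both end values (16 and 18) are part of the range.
inside = [r[0] for r in conn.execute(
    "SELECT age FROM student WHERE age BETWEEN 16 AND 18 ORDER BY age")]
outside = [r[0] for r in conn.execute(
    "SELECT age FROM student WHERE age NOT BETWEEN 16 AND 18 ORDER BY age")]

print(inside)   # [16, 17, 18]
print(outside)  # [15, 19]
```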
LIKE Operator
• The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
• There are two wildcards often used in conjunction with the LIKE operator:
• % - The percent sign represents zero, one, or multiple characters.
• _ - The underscore represents a single character.
• LIKE Syntax
• SELECT column1, column2, ... FROM table_name WHERE columnN LIKE pattern;
• SELECT * FROM student WHERE Name LIKE 'S%';
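A sketch of the two LIKE wildcards, assuming SQLite and invented names. Note that SQLite's LIKE ignores case for ASCII letters by default, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (Name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?)",
    [("Sam",), ("Sita",), ("Anu",), ("Raj",), ("Sanjay",)],
)

def names_like(pattern):
    """Return the names matching the LIKE pattern, sorted alphabetically."""
    return [r[0] for r in conn.execute(
        "SELECT Name FROM student WHERE Name LIKE ? ORDER BY Name", (pattern,))]

starts_with_s = names_like("S%")   # % matches any run of characters
three_letters = names_like("S_m")  # _ matches exactly one character
second_is_a = names_like("_a%")    # any first character, then 'a', then anything

print(starts_with_s, three_letters, second_is_a)
```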
• Table aliases: consider a query that selects all the orders from the customer with CustomerID=4 (Around the Horn).
• We use the "Customers" and "Orders" tables, and give them the table aliases of "c" and "o"
respectively (Here we use aliases to make the SQL shorter).
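The statement that this passage describes is not shown, so the following is only one plausible sketch of such an alias query, run in SQLite. The column names (`CustomerName`, `OrderID`, `OrderDate`) and all rows are assumptions; only the table names, the aliases `c` and `o`, and CustomerID=4 come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (4, 'Around the Horn'), (5, 'Other Customer');
    INSERT INTO Orders VALUES
        (1001, 4, '2024-01-05'), (1002, 5, '2024-01-20'), (1003, 4, '2024-02-10');
""")

# "c" and "o" stand in for the full table names throughout the query.
orders_for_customer = conn.execute("""
    SELECT o.OrderID, o.OrderDate, c.CustomerName
    FROM Customers AS c, Orders AS o
    WHERE c.CustomerID = 4 AND c.CustomerID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()

print(orders_for_customer)
```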
• The CASE statement goes through conditions and returns a value when the first condition is met
(like an IF-THEN-ELSE statement).
• So, once a condition is true, it will stop reading and return the result.
• If no conditions are true, it returns the value in the ELSE clause.
• If there is no ELSE part and no conditions are true, it returns NULL.
CASE Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN conditionN THEN resultN
ELSE result
END;
CASE Statement
SELECT orderid, quantity,
CASE
WHEN quantity > 30 THEN 'The quantity is greater than 30'
WHEN quantity = 30 THEN 'The quantity is 30'
ELSE 'The quantity is under 30'
END AS QuantityText
FROM orderdetails;
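To see which branch each row takes, the CASE query above can be run on three sample quantities. SQLite and the `orderdetails` rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderdetails (orderid INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO orderdetails VALUES (?, ?)", [(1, 12), (2, 30), (3, 45)])

# Conditions are tested top to bottom; the first true one decides the result.
labelled = conn.execute("""
    SELECT orderid, quantity,
           CASE
               WHEN quantity > 30 THEN 'The quantity is greater than 30'
               WHEN quantity = 30 THEN 'The quantity is 30'
               ELSE 'The quantity is under 30'
           END AS QuantityText
    FROM orderdetails
    ORDER BY orderid
""").fetchall()

for row in labelled:
    print(row)
```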
DELETE Statement
• It is possible to delete all rows in a table without deleting the table. This means that the table
structure, attributes, and indexes will be intact.
• DELETE FROM table_name;
DELETE FROM orderdetails;
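A quick check that DELETE without a WHERE clause empties the table but leaves its definition in place. The sketch assumes SQLite and a few made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderdetails (orderid INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO orderdetails VALUES (?, ?)", [(1, 12), (2, 30), (3, 45)])

rows_before = conn.execute("SELECT COUNT(*) FROM orderdetails").fetchone()[0]
conn.execute("DELETE FROM orderdetails")  # no WHERE clause: every row is removed
rows_after = conn.execute("SELECT COUNT(*) FROM orderdetails").fetchone()[0]

# The table itself is still listed in SQLite's catalog.
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'orderdetails'"
).fetchone()[0] == 1

print(rows_before, rows_after, table_still_exists)  # 3 0 True
```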
Different Types of SQL JOINs
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the
right table.
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from
the left table.
• FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table.
JOIN
• A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
• Example tables: Orders table and Customer table.
SELECT customers.customer_id, orders.order_id, orders.order_date FROM customers FULL OUTER JOIN orders ON
customers.customer_id = orders.customer_id ORDER BY customers.customer_id;
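The difference between join types is easiest to see with a customer who has no orders. The sketch below assumes SQLite and invented rows. It shows INNER and LEFT joins only, because older SQLite builds do not accept RIGHT or FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO orders VALUES
        (101, 1, '2024-01-05'), (102, 1, '2024-01-09'), (103, 3, '2024-02-01');
""")

query = """
    SELECT customers.customer_id, orders.order_id
    FROM customers {join} orders ON customers.customer_id = orders.customer_id
    ORDER BY customers.customer_id, orders.order_id
"""

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

# LEFT JOIN: every customer; order_id is NULL (None) when there is no match.
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```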
TCL - Transaction Control Language
• TCL stands for Transaction Control Language. These commands are used for maintaining
consistency of the database and for the management of transactions made by the DML commands.
• A Transaction is a set of SQL statements that are executed on the data stored in DBMS.
• Whenever a transaction is made, its changes are only temporary in the database. So, to
make the changes permanent, we use TCL commands.
• The privileges (Right to access the data) are required for performing all the database operations
like creating tables, views, or sequences.
• A DCL (Data Control Language) command is a statement that is used to perform the work related to the rights, permissions,
and other controls of the database system.