Exam Advanced
The main difference between the two is how the data is structured:
Polymorphism.
Inheritance.
Encapsulation.
Abstraction.
Polymorphism
Polymorphism is the capability of an object to take multiple forms. This
ability allows the same program code to work with different data types.
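As a minimal sketch of this idea, the Python example below lets one function work with two unrelated types. The class names Circle and Square and the function total_area are hypothetical and chosen only for illustration.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    # The same code handles every type that provides an area() method.
    return sum(shape.area() for shape in shapes)


print(total_area([Circle(1), Square(2)]))
```

Here total_area never checks which type it receives; it relies only on each object answering area().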
Inheritance
Inheritance creates a hierarchical relationship between related classes
while making parts of the code reusable. A newly defined type inherits all the
fields and methods of an existing class and can extend them further. The existing
class is the parent class, and the child class extends the parent. For
example, a parent class called Vehicle can have child classes such
as Car and Bike. Both child classes inherit information from the parent
class. They also extend the parent class with new information, depending
on the methods defined for each vehicle type.
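The Vehicle example can be sketched in Python as follows. The attributes make, wheels, and doors and the methods describe and ring_bell are assumptions added for illustration; the text only names the three classes.

```python
class Vehicle:
    def __init__(self, make, wheels):
        self.make = make
        self.wheels = wheels

    def describe(self):
        return f"{self.make} with {self.wheels} wheels"


class Car(Vehicle):
    def __init__(self, make, doors):
        super().__init__(make, wheels=4)
        self.doors = doors  # new information added by the child class


class Bike(Vehicle):
    def __init__(self, make):
        super().__init__(make, wheels=2)

    def ring_bell(self):  # new behaviour added by the child class
        return "ring"


print(Car("Toyota", doors=4).describe())
print(Bike("Trek").describe())
```

Both child classes reuse describe() from Vehicle without redefining it.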
Encapsulation
Encapsulation allows grouping variables and methods into a single
object to create access protection. This process hides information and
details of how an object works from the rest of the code and results in
data and function security.
Classes interact with each other through interface methods without the
need to know how particular methods work.
For example, a Car class can have properties such as color, make,
and model and methods such as changeColor(). You can change the color of
a car through a method, yet the model and make are not
accessible. Encapsulation bundles all the car information into one entity,
where some elements are modifiable while some are not.
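A possible Python rendering of the Car example is sketched below. It assumes that "not accessible" means "cannot be modified from outside"; the method is spelled change_color to follow Python naming, and the leading underscores mark attributes as internal by convention only.

```python
class Car:
    def __init__(self, color, make, model):
        self._color = color
        self._make = make
        self._model = model

    @property
    def color(self):
        return self._color

    @property
    def make(self):
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._make

    @property
    def model(self):
        return self._model

    def change_color(self, new_color):
        # The only supported way to modify the car's state.
        self._color = new_color


car = Car("red", "Toyota", "Corolla")
car.change_color("blue")
print(car.color, car.make, car.model)
```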
Abstraction
Abstraction is the process of focusing on the essential characteristics to
provide functionality. The process selects vital information while
unnecessary details stay hidden. Abstraction helps reduce data
complexity and simplifies code reusability. For example, when a web
browser connects to the internet, it doesn't need to know the specific
connection details. Whether the connection is established through Wi-
Fi or Ethernet is irrelevant. The specific connection type is hidden from
the browser to create an abstraction, whereas the various types of
connections represent different implementations of the abstraction.
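The browser example might look like this in Python. The names Connection, WiFiConnection, EthernetConnection, Browser, and the send method are hypothetical; the point is only that Browser depends on the abstraction and not on a specific connection type.

```python
from abc import ABC, abstractmethod


class Connection(ABC):
    """The abstraction: something that can send data."""

    @abstractmethod
    def send(self, data):
        ...


class WiFiConnection(Connection):
    def send(self, data):
        return f"wifi:{data}"


class EthernetConnection(Connection):
    def send(self, data):
        return f"ethernet:{data}"


class Browser:
    def __init__(self, connection):
        self.connection = connection  # any Connection implementation

    def open(self, url):
        return self.connection.send(f"GET {url}")


print(Browser(WiFiConnection()).open("example.org"))
print(Browser(EthernetConnection()).open("example.org"))
```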
| Feature | Description |
|---------|-------------|
| Query Language | Finds objects and retrieves data from the database. |
| Transparent Persistence | Allows accessing and using data with an object-oriented programming language without special handling. |
| ACID Transactions | Ensures ACID transactions and guarantees that all transactions are completed without conflicting changes. |
| Database Caching | Creates a partial replica of the database in memory. Allows faster access to a database without reading from disk. |
Chapter-2
QUERY PROCESSING AND OPTIMIZATION
Query Processing
1. Parsing: The query is broken down into smaller components, such as tables, columns,
and conditions.
2. Optimization: The query is analyzed and optimized by considering various factors,
such as the available indexes, statistics, and system resources.
3. Query Rewriting: The query is rewritten to improve performance, such as by
reordering joins or eliminating unnecessary operations.
4. Execution: The optimized query is executed, and the results are returned to the user.
Query Optimization
1. Access Path Selection: Choosing the most efficient method to access the required
data, such as using an index or a full table scan.
2. Join Order Optimization: Determining the optimal order in which to join tables,
taking into account factors such as data distribution and indexing.
3. Query Transformation: Modifying the query to reduce overhead, improve
parallelism, or optimize for specific data distributions.
4. Index Selection: Choosing the most effective index for a query, considering factors
such as data distribution, indexing overhead, and query frequency.
The activities involved in parsing, validating, optimizing, and executing a query are called
Query Processing. It is the set of activities performed to extract data from the database. Query
processing takes several steps to fetch the data from the database. The steps involved
are:
1. Parsing and translation
2. Evaluation
3. Optimization
Parsing and Translation
Query processing includes several activities for data retrieval. User queries are initially
written in a high-level database language such as SQL. They are then translated into
expressions that can be used at the physical level of the file system. After this, the
actual evaluation of the queries and a variety of query-optimizing transformations take
place. SQL (Structured Query Language) is well suited for humans because it is readable
and understandable, but it is not well suited for the internal representation of a query
within the system. Relational algebra is better suited for that internal representation.
The translation step in query processing works much like a parser. When a user executes a
query, the parser generates the internal form of the query: it checks the syntax of the
query, then verifies the names of the relations in the database, the tuples, and finally
the required attribute values. The parser creates a tree of the query, known as a
'parse tree', and then translates it into relational algebra. In doing so, it also
replaces every view used in the query with its definition. The diagram below illustrates
how query processing works:
Suppose a user executes a query. As we have learned, there are various methods of
extracting data from the database. Suppose the user wants to fetch, in SQL, the records of the
employees whose salary is greater than 10000. The following query does this:
Select emp_name from Employee where salary>10000;
Example-1:
The query immediately above is called a nested expression. As usual, we evaluate the inner
expression first (which results in a relation, say Manager1), then we evaluate the outer expression
on Manager1 (the relation obtained from the inner expression). The result is again a
relation, just like the relations given as input.
Example-2:
Given a relation Student (Roll, Name, Department, Fees, Team) with the following tuples:
Roll Name Department Fees Team
1 Abebe CS 22000 A
2 Kebede CS 34000 A
3 Lwam IT 36000 C
4 Aster IT 56000 D
Select all the students of Team A:
σ Team = 'A' (Student)
This results as follows:
Roll Name Department Fees Team
1 Abebe CS 22000 A
2 Kebede CS 34000 A
Select all the students of department IT whose fees are greater than or equal to 10000 and who belong to
a team other than A.
σ Department = 'IT' (σ Fees >= 10000 (σ Team != 'A' (Student)))
This results as follows:
Roll Name Department Fees Team
3 Lwam IT 36000 C
4 Aster IT 56000 D
Important points about the Select operation: The Select operator is unary, meaning it is applied to a single
relation only. The Selection operation is commutative, that is,
σ c1 (σ c2 (R)) = σ c2 (σ c1 (R))
The degree (number of attributes) of the relation resulting from a Selection operation is the same as the
degree of the given relation. The cardinality (number of tuples) of the relation resulting from a
Selection operation satisfies
0 <= |σ c (R)| <= |R|
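The properties of selection can be checked on the Student relation with a short Python sketch. The helper names select, high_fees, and not_team_a are hypothetical, and representing a relation as a list of dictionaries is an assumption made for illustration.

```python
STUDENT = [
    {"Roll": 1, "Name": "Abebe", "Department": "CS", "Fees": 22000, "Team": "A"},
    {"Roll": 2, "Name": "Kebede", "Department": "CS", "Fees": 34000, "Team": "A"},
    {"Roll": 3, "Name": "Lwam", "Department": "IT", "Fees": 36000, "Team": "C"},
    {"Roll": 4, "Name": "Aster", "Department": "IT", "Fees": 56000, "Team": "D"},
]


def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples satisfying the predicate."""
    return [row for row in relation if predicate(row)]


def high_fees(row):
    return row["Fees"] >= 10000


def not_team_a(row):
    return row["Team"] != "A"


team_a = select(STUDENT, lambda row: row["Team"] == "A")

# Commutativity: the order of nested selections does not change the result.
fees_outer = select(select(STUDENT, not_team_a), high_fees)
team_outer = select(select(STUDENT, high_fees), not_team_a)

print([row["Name"] for row in team_a])
print(fees_outer == team_outer)
print(0 <= len(team_a) <= len(STUDENT))
```

Every selected row keeps all five attributes, which matches the statement that selection does not change the degree of the relation.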
Difference between Selection and Projection in DBMS
For most queries, we need a combination of projection and selection operations. There are two
ways to write these expressions:
i. Using a sequence of projection and selection operations: a single relational algebra expression
that nests the selection inside the projection.
π name, team (σ Fees>10000 (Student))
// select name, team from Student where Fees > 10000;
ii. Using rename operation to generate intermediate results: the same query
written as a sequence of steps, each stored in a named intermediate relation.
Result1:= σ Fees >10000 (Student)
Result2:= π name,team(Result1)
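Both ways of writing the expression can be sketched in Python and compared. The helpers select and project are hypothetical names; project removes duplicate tuples, as projection does in relational algebra.

```python
STUDENT = [
    {"Roll": 1, "Name": "Abebe", "Department": "CS", "Fees": 22000, "Team": "A"},
    {"Roll": 2, "Name": "Kebede", "Department": "CS", "Fees": 34000, "Team": "A"},
    {"Roll": 3, "Name": "Lwam", "Department": "IT", "Fees": 36000, "Team": "C"},
    {"Roll": 4, "Name": "Aster", "Department": "IT", "Fees": 56000, "Team": "D"},
]


def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """pi: keep the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Form (i): one nested expression.
nested = project(select(STUDENT, lambda row: row["Fees"] > 10000), ["Name", "Team"])

# Form (ii): named intermediate results.
result1 = select(STUDENT, lambda row: row["Fees"] > 10000)
result2 = project(result1, ["Name", "Team"])

print(nested == result2)
print(project(STUDENT, ["Department"]))
```

The last line shows the duplicate elimination: four students project onto only two distinct departments.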
2. Union Operation:
Suppose there are two relations R and S. The union operation contains all the tuples
that are either in R or in S or in both. It eliminates duplicate tuples. It is
denoted by ∪.
Notation: R ∪ S
Example:
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
3. Set Intersection:
Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R and S. It is denoted by ∩.
Notation: R ∩ S
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
4. Set Difference:
Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S. It is denoted by minus (-).
Notation: R - S
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
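The three set operations map directly onto Python sets. The BORROW and DEPOSITOR tables are not shown in these notes, so the two name sets below are an assumption: they were inferred from the three outputs listed above and are the projections on CUSTOMER_NAME that would produce them.

```python
# Assumed projections on CUSTOMER_NAME (inferred from the outputs, not given in the notes).
borrow = {"Smith", "Jones", "Jackson", "Hayes", "Williams", "Curry"}
depositor = {"Johnson", "Smith", "Turner", "Jones", "Lindsay", "Mayes"}

union = borrow | depositor          # R ∪ S, duplicates eliminated
intersection = borrow & depositor   # R ∩ S
difference = borrow - depositor     # R - S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Note that set difference is not symmetric: depositor - borrow would give a different result.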
5. Cartesian product
The Cartesian product is used to combine each row in one table with each row in
the other table. It is also known as a cross product. It is denoted by X.
Notation: E X D
Example: EMPLOYEE
DEPARTMENT
6. Rename operation:
The rename operation is used to rename the output relation. It is denoted by rho
(ρ).
ρ(STUDENT1, STUDENT)
7. Join operations:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Operation: (EMPLOYEE ⋈ SALARY)
Result:
EMP_CODE EMP_NAME SALARY
101 Stephan 50000
102 Jack 30000
103 Harry 25000
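The join above can be reproduced with a small Python sketch. The function natural_join is a hypothetical helper that matches tuples on every attribute name the two relations share (here only EMP_CODE); it is a naive illustration, not how a database engine implements joins.

```python
EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]


def natural_join(r, s):
    """Combine tuples of r and s that agree on all shared attribute names."""
    if not r or not s:
        return []
    common = [name for name in r[0] if name in s[0]]
    result = []
    for x in r:
        for y in s:
            if all(x[name] == y[name] for name in common):
                result.append({**x, **y})  # shared attributes appear once
    return result


for row in natural_join(EMPLOYEE, SALARY):
    print(row["EMP_CODE"], row["EMP_NAME"], row["SALARY"])
```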
8. Types of Join operations:
The three types of join operations are natural, outer, and equi joins.
A. Natural Join:
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
B. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with
missing information.
Example:
EMPLOYEE
| EMP_NAME | STREET      | CITY      |
|----------|-------------|-----------|
| Ram      | Civil line  | Mumbai    |
| Shyam    | Park street | Kolkata   |
| Ravi     | M.G. Street | Delhi     |
| Hari     | Nehru nagar | Hyderabad |
FACT_WORKERS
| EMP_NAME | BRANCH  | SALARY |
|----------|---------|--------|
| Ram      | Infosys | 10000  |
| Shyam    | Wipro   | 20000  |
| Kuber    | HCL     | 30000  |
| Hari     | TCS     | 50000  |
Input: (EMPLOYEE ⋈ FACT_WORKERS)
Left outer join contains all combinations of tuples in R and S that are
equal on their common attribute names, plus the tuples in R that have no matching
tuples in S. It is denoted by ⟕.
Right outer join contains all combinations of tuples in R and S that are
equal on their common attribute names, plus the tuples in S that have no matching
tuples in R. It is denoted by ⟖.
Full outer join is like a left or right join except that it contains all rows from both
tables. In a full outer join, tuples in R that have no matching tuples in S and tuples in
S that have no matching tuples in R on their common attribute names are also included. It is denoted
by ⟗.
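The three outer joins can be sketched in Python using the rows of the EMPLOYEE and FACT_WORKERS tables above. The function names are hypothetical, and padding unmatched tuples with None (playing the role of NULL) is an assumption of this sketch.

```python
EMPLOYEE = [
    {"EMP_NAME": "Ram", "STREET": "Civil line", "CITY": "Mumbai"},
    {"EMP_NAME": "Shyam", "STREET": "Park street", "CITY": "Kolkata"},
    {"EMP_NAME": "Ravi", "STREET": "M.G. Street", "CITY": "Delhi"},
    {"EMP_NAME": "Hari", "STREET": "Nehru nagar", "CITY": "Hyderabad"},
]
FACT_WORKERS = [
    {"EMP_NAME": "Ram", "BRANCH": "Infosys", "SALARY": 10000},
    {"EMP_NAME": "Shyam", "BRANCH": "Wipro", "SALARY": 20000},
    {"EMP_NAME": "Kuber", "BRANCH": "HCL", "SALARY": 30000},
    {"EMP_NAME": "Hari", "BRANCH": "TCS", "SALARY": 50000},
]
KEY = "EMP_NAME"


def left_outer_join(r, s, key=KEY):
    """Matched pairs from r and s, plus unmatched r tuples padded with None."""
    s_only = [name for name in s[0] if name != key]
    result = []
    for x in r:
        matches = [y for y in s if y[key] == x[key]]
        if matches:
            result.extend({**x, **y} for y in matches)
        else:
            result.append({**x, **dict.fromkeys(s_only)})
    return result


def right_outer_join(r, s, key=KEY):
    """A right outer join is a left outer join with the operands swapped."""
    return left_outer_join(s, r, key)


def full_outer_join(r, s, key=KEY):
    """Left outer join plus the s tuples that have no match in r."""
    r_only = [name for name in r[0] if name != key]
    r_keys = {x[key] for x in r}
    unmatched = [{**dict.fromkeys(r_only), **y} for y in s if y[key] not in r_keys]
    return left_outer_join(r, s, key) + unmatched


for row in full_outer_join(EMPLOYEE, FACT_WORKERS):
    print(row)
```

In this data, Ravi has no FACT_WORKERS row and Kuber has no EMPLOYEE row, so each appears only in the joins that preserve their side.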
C. Equi join:
Example:
CUSTOMER
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ PRODUCT
Translating SQL queries into relational algebra involves expressing the query
semantics using operations that align with the relational algebra formalism. Below,
I will go through a few SQL queries and translate them into relational algebra with
suitable examples.
Example Schema
Employees Table:
Departments Table:
| DeptID | DeptName |
|--------|---------------|
| 101 | Engineering |
| 102 | Sales |
| 103 | HR |
Example 1: Selection
SQL Query:
SELECT * FROM Employees WHERE DeptID = 101;
Relational Algebra:
σ(DeptID = 101)(Employees)
Explanation: The selection operator σ is used to filter rows based on the condition
DeptID = 101.
Example 2: Projection
SQL Query:
SELECT EmpName FROM Employees;
Relational Algebra:
π(EmpName)(Employees)
Explanation: The projection operator π selects only the EmpName column from the
Employees table.
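Example 3: Join
SQL Query:
SELECT EmpName, DeptName FROM Employees JOIN Departments ON Employees.DeptID = Departments.DeptID;
Relational Algebra:
π(EmpName, DeptName)(Employees ⨝ Departments)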
Explanation: The join operator ⨝ combines rows from both tables based on the
matching DeptID. The projection selects only EmpName and DeptName.
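Example 4: Aggregation
SQL Query:
SELECT DeptID, COUNT(*) FROM Employees GROUP BY DeptID;
Relational Algebra:
γ(DeptID, COUNT(*))(Employees)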
Explanation: The grouping operator γ is used here to group records by DeptID and
count the number of employees in each department.
Example 5: Union
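SQL Query:
SELECT EmpName FROM Employees WHERE DeptID = 101
UNION
SELECT EmpName FROM Employees WHERE DeptID = 102;
Relational Algebra:
π(EmpName)(σ(DeptID = 101)(Employees)) ∪ π(EmpName)(σ(DeptID = 102)(Employees))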
Explanation: The union operator ∪ combines the results from the two selection
operations where DeptID is either 101 or 102, projecting the EmpName.
Example 6: Intersection
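SQL Query:
SELECT EmpName FROM Employees WHERE DeptID = 101
INTERSECT
SELECT EmpName FROM Employees WHERE EmpName LIKE 'A%';
Relational Algebra:
π(EmpName)(σ(DeptID = 101)(Employees)) ∩ π(EmpName)(σ(EmpName LIKE 'A%')(Employees))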
Explanation: The intersection operator ∩ returns only the employee names that
appear in both sets (Department 101 and names starting with 'A').
1. Sequential Search
This is the simplest form of searching in a dataset, where each element is checked
until the desired value is found.
Example:
You have a table of users, and you want to find a user with the name "Alice".
Internally, a sequential search will iterate through each row in the users table until it
finds the row where the name is "Alice".
2. Binary Search
Binary search is a more efficient search method, but it requires that the data is
sorted. It works by repeatedly dividing the search interval in half.
Example:
Consider a sorted list of user IDs:
To find the user with a specific ID (say 7), the binary search would check the
middle of the list. If the middle ID is less than 7, it would search the upper half; if
greater, it would search the lower half.
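The two search strategies described so far can be compared in a short Python sketch. The user rows and the list of IDs are hypothetical sample data; the notes do not give the actual values.

```python
def sequential_search(rows, name):
    """Check each row in turn until the name is found; return its position or -1."""
    for position, row in enumerate(rows):
        if row["name"] == name:
            return position
    return -1


def binary_search(sorted_ids, target):
    """Repeatedly halve the search interval; requires sorted input."""
    low, high = 0, len(sorted_ids) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_ids[mid] == target:
            return mid
        if sorted_ids[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


users = [{"name": "Bob"}, {"name": "Carol"}, {"name": "Alice"}]  # hypothetical rows
user_ids = [1, 3, 5, 7, 9, 11]                                   # hypothetical sorted IDs

print(sequential_search(users, "Alice"))
print(binary_search(user_ids, 7))
```

Sequential search may inspect every row, while binary search needs at most about log2(n) comparisons on n sorted values.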
3. Indexing
Example:
If you have an index on the name column in the users table, the database engine can
directly locate "Alice" without scanning every row.
With indexing, the database could quickly navigate to the index and retrieve the
relevant rows.
4. Hashing
Hashing involves using a hash function to compute the address of a data element,
leading to fast access times.
Example:
A hash index on a table of products could allow a quick lookup for a product by its
SKU (Stock Keeping Unit).
Instead of searching linearly or using a binary search, the database uses the hash
function to find the correct location of 'SKU1234' in the index.
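A toy hash index, sketched in Python, shows the idea. The class HashIndex, the bucket count, and the product row are assumptions for illustration; real database hash indexes are considerably more elaborate.

```python
class HashIndex:
    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash function maps the key straight to one bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row):
        self._bucket(key).append((key, row))

    def lookup(self, key):
        # Only one bucket is examined, not the whole table.
        return [row for stored_key, row in self._bucket(key) if stored_key == key]


index = HashIndex()
index.insert("SKU1234", {"sku": "SKU1234", "name": "Keyboard"})  # hypothetical product
index.insert("SKU5678", {"sku": "SKU5678", "name": "Mouse"})

print(index.lookup("SKU1234"))
```

A hash index of this kind supports equality lookups well but does not help with range conditions, because the hash function does not preserve key order.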
5. Join Operations
Joining tables is a common operation in relational databases, and there are various
algorithms for this, including nested loop joins, hash joins, and merge joins.
Nested Loop Join: Go through each row of the first table and for each row,
search the second table.
Example:
SELECT *
FROM users u
JOIN orders o ON u.user_id = o.user_id;
In a nested loop join, for each user, the database will check all orders to find
matching user IDs.
Hash Join: Build a hash table from one of the tables and then probe it for
matches.
Merge Join: Requires both tables to be sorted; it merges them based on
matching keys.
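The first two join algorithms can be sketched in Python for the users and orders query above. The sample rows are hypothetical, and both functions are simplified illustrations that hold all data in memory.

```python
from collections import defaultdict

users = [
    {"user_id": 1, "name": "Alice"},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Carol"},
]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 3},
    {"order_id": 12, "user_id": 1},
]


def nested_loop_join(users, orders):
    """For each user, scan all orders: about len(users) * len(orders) comparisons."""
    result = []
    for user in users:
        for order in orders:
            if user["user_id"] == order["user_id"]:
                result.append({**user, **order})
    return result


def hash_join(users, orders):
    """Build a hash table on users, then probe it once per order."""
    buckets = defaultdict(list)
    for user in users:                      # build phase
        buckets[user["user_id"]].append(user)
    result = []
    for order in orders:                    # probe phase
        for user in buckets.get(order["user_id"], []):
            result.append({**user, **order})
    return result


by_order = lambda row: row["order_id"]
print(sorted(nested_loop_join(users, orders), key=by_order)
      == sorted(hash_join(users, orders), key=by_order))
```

Both functions return the same set of joined rows; they differ only in how much work is needed to find the matches.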
6. Projection
Example:
To get only the names of users:
SELECT name FROM users;
7. Aggregation
Aggregation functions like SUM, COUNT, AVG, MAX, and MIN are used to summarize
data.
Example:
To count how many users are in the database:
SELECT COUNT(*) FROM users;
A heuristic might first filter the Products table by category_id, reducing the number of
rows in memory for subsequent operations.
Join Order Optimization: Generally, smaller tables are joined first to reduce
the size of intermediate results. For example, if you have three tables A, B, and C,
and B is significantly smaller than A or C, a heuristic might suggest joining B first.
WITH Temp AS (
    SELECT customer_id FROM Orders WHERE order_date > '2023-01-01'
)
SELECT * FROM Customers WHERE id IN (SELECT customer_id FROM Temp);
A heuristic might determine that it’s more efficient to store Temp and reuse it.
1. Selectivity Calculation:
For a table with 10,000 rows, if a specific condition (like salary > 80,000)
returns around 500 rows, the selectivity for that predicate is 500 / 10,000 =
0.05 (or 5%).
2. Cost Estimates:
Cost estimates involve evaluating the resources that will be consumed to
execute a query, such as CPU time, I/O operations, and memory usage. Cost
models may consider:
- The number of rows to process based on selectivity.
- Data distribution (e.g., how indexed or clustered the data is).
- Join costs based on how tables are joined (nested loop, hash join, etc.).
Using selectivity, the database optimizer can estimate how many rows are likely to be
processed under different predicates and choose the most efficient plan. For
example, if choosing between:
SELECT * FROM Employees WHERE department_id = 2;
versus
SELECT * FROM Employees WHERE salary > 100000;
If it's known through statistics that department_id = 2 returns 20% of the data but salary
> 100000 returns only 5%, the optimizer would favor an index scan driven by the salary
predicate, since it is the more selective of the two and touches fewer rows.
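The selectivity arithmetic above can be written out in a few lines of Python. The function names are hypothetical, and choosing the predicate with the lowest selectivity is a simplified stand-in for a real cost model, which would also weigh I/O, CPU, and memory.

```python
def selectivity(matching_rows, total_rows):
    """Fraction of the table's rows that satisfy a predicate."""
    return matching_rows / total_rows


def estimated_rows(total_rows, predicate_selectivity):
    """Expected number of rows a predicate lets through."""
    return round(total_rows * predicate_selectivity)


TOTAL_ROWS = 10_000

# Example from the text: 500 of 10,000 rows match.
print(selectivity(500, TOTAL_ROWS))

# Selectivities quoted in the text for the two predicates.
candidates = {"department_id = 2": 0.20, "salary > 100000": 0.05}
for predicate, value in candidates.items():
    print(predicate, estimated_rows(TOTAL_ROWS, value))

# Simplified rule: prefer the predicate that keeps the fewest rows.
most_selective = min(candidates, key=candidates.get)
print(most_selective)
```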
SELECT * FROM Customers c JOIN Orders o ON c.id = o.customer_id WHERE o.total > 100;
If you know that every customer with an order over 100 must exist (due to business
logic or constraints), it might be possible to restructure this:
SELECT * FROM Customers WHERE id IN (SELECT customer_id FROM Orders WHERE total >
100);
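This can reduce the amount of data processed, provided the constraint is actually enforced.
Note that the rewritten query returns only the Customers columns, with one row per matching customer.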
Knowing the semantic rules could allow the optimizer to identify that if an
employee is in department 1 or 2, they can't be in 3. This means the second
condition is always true if the first condition is true, allowing that part to be
dropped or re-evaluated.