Exam Advanced
Objects. The basic building block and an instance of a class. The type is either built-
in or user-defined.
Classes. A schema or blueprint that defines object structure and behavior.
Methods. Functions that define the behavior of a class.
Pointers. An entity that helps access elements of an object database. They also help
establish relationships between objects.
Relational databases. Data resides in structured tables with rows and columns. It is
queried using SQL, focusing mainly on data consistency and normalization.
OODB. Data is stored as a complete and complex object. It supports complex data
types and inheritance. The main focus is to minimize the difference between a
database structure and a programming language.
Key-value databases. A simple structure with key-value pairs. Ideal for lookup tables
and data caching, but there is no support for complex relationships.
OODB. Supports complex relationships, inheritance, and encapsulation.
Graph databases. Structured for highly connected data. Uses edges and nodes to
represent relationships between entities.
OODB. Focuses on objects and their behavior while supporting relationships, though
not as interconnected as graph databases.
Polymorphism.
Inheritance.
Encapsulation.
Abstraction.
Polymorphism
Polymorphism is the capability of an object to take multiple forms. This ability allows the
same program code to work with different data types.
To illustrate, a Vehicle class can define a method called brake(). A Car and
a Bike can both inherit from the class and implement their own version of the brake() method.
The same method call then produces different behavior depending on the object, which is polymorphism.
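The Vehicle example can be sketched in Python as follows. This is a minimal illustration, not a prescribed design: the returned messages and the helper stop_all are assumptions made up for the example.

```python
class Vehicle:
    """Parent class: declares the brake() method every vehicle supports."""

    def brake(self) -> str:
        raise NotImplementedError("subclasses provide their own brake()")


class Car(Vehicle):
    def brake(self) -> str:
        # Hypothetical behavior, chosen only for illustration.
        return "Car slows down using disc brakes"


class Bike(Vehicle):
    def brake(self) -> str:
        return "Bike slows down using rim brakes"


def stop_all(vehicles: list[Vehicle]) -> list[str]:
    # The same call works for every subclass; each object runs its own version.
    return [vehicle.brake() for vehicle in vehicles]


messages = stop_all([Car(), Bike()])
```

The caller never checks which kind of vehicle it holds; the object itself decides which brake() runs.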
Inheritance
Inheritance creates a hierarchical relationship between related classes while making parts of
the code reusable. A newly defined type inherits all the fields and methods of an existing class and
can extend them further. The existing class is the parent class, while the child class extends the
parent. For example, a parent class called Vehicle can have child classes such
as Car and Bike. Both child classes inherit information from the parent class. They
also extend the parent class with new information depending on the methods defined for each
vehicle type.
Encapsulation
Encapsulation allows grouping variables and methods into a single object to create access
protection. This process hides information and details of how an object works from the rest
of the code and results in data and function security.
Classes interact with each other through interface methods without the need to know how
particular methods work.
For example, a Car class can have properties such as color, make, and model, and methods
such as changeColor(). You can change the color of a car through the method, yet
the model and make cannot be changed from outside the class. Encapsulation bundles all the car information into
one entity, where some elements are modifiable while others are not.
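A rough Python sketch of the Car example is shown below. Python does not enforce private fields, so the leading underscore and the read-only properties are a convention used here as an assumption; the sample values are invented for illustration.

```python
class Car:
    def __init__(self, color: str, make: str, model: str) -> None:
        # Leading underscores mark these fields as internal by convention.
        self._color = color
        self._make = make
        self._model = model

    @property
    def color(self) -> str:
        return self._color

    @property
    def make(self) -> str:
        # Read-only: no setter is defined for make.
        return self._make

    @property
    def model(self) -> str:
        # Read-only: no setter is defined for model.
        return self._model

    def change_color(self, new_color: str) -> None:
        # The only supported way to modify the car's state.
        self._color = new_color


car = Car("red", "Toyota", "Corolla")  # hypothetical sample values
car.change_color("blue")

try:
    car.make = "Honda"  # rejected: the property has no setter
    make_changed = True
except AttributeError:
    make_changed = False
```

Here the color changes only through change_color(), while an attempt to assign make fails.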
Abstraction
Abstraction is the process of focusing on the essential characteristics to
provide functionality. The process selects vital information while unnecessary details stay
hidden. Abstraction helps reduce data complexity and simplifies code reusability. For
example, when a web browser connects to the internet, it doesn't need to know the specific
connection details. Whether the connection is established through Wi-Fi or Ethernet is
irrelevant. The specific connection type is hidden from the browser to create an abstraction,
whereas the various types of connections represent different implementations of the
abstraction.
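The browser example might look like this in Python. The class names Connection, WiFiConnection, EthernetConnection, and Browser, and the strings they return, are hypothetical and exist only to show the idea.

```python
from abc import ABC, abstractmethod


class Connection(ABC):
    """The abstraction: what every connection can do, not how it does it."""

    @abstractmethod
    def send(self, request: str) -> str:
        ...


class WiFiConnection(Connection):
    def send(self, request: str) -> str:
        return f"[wifi] {request}"


class EthernetConnection(Connection):
    def send(self, request: str) -> str:
        return f"[ethernet] {request}"


class Browser:
    def __init__(self, connection: Connection) -> None:
        # The browser depends only on the abstract interface.
        self._connection = connection

    def open(self, url: str) -> str:
        return self._connection.send(f"GET {url}")


wifi_page = Browser(WiFiConnection()).open("example.org")
wired_page = Browser(EthernetConnection()).open("example.org")
```

The Browser code is identical for both connection types; only the implementation behind send() differs.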
| Feature | Description |
|---------|-------------|
| Query Language | Finds objects and retrieves data from the database. |
| Transparent Persistence | Allows accessing and using data with an object-oriented programming language without special handling. |
| ACID Transactions | Ensures the ACID properties and guarantees that all transactions complete without conflicting changes. |
| Database Caching | Creates a partial replica of the database in memory, allowing faster access without reading from disk. |
Chapter-2
QUERY PROCESSING AND OPTIMIZATION
QUERY PROCESSING
Query Optimization
1. Access Path Selection: Choosing the most efficient method to access the required
data, such as using an index or a full table scan.
2. Join Order Optimization: Determining the optimal order in which to join tables,
taking into account factors such as data distribution and indexing.
3. Query Transformation: Modifying the query to reduce overhead, improve
parallelism, or optimize for specific data distributions.
4. Index Selection: Choosing the most effective index for a query, considering factors
such as data distribution, indexing overhead, and query frequency.
The activities involved in parsing, validating, optimizing, and executing a query are called Query
Processing. It is the work performed to extract data from the database, and it takes
several steps to fetch that data. The steps involved
are:
1. Parsing and translation
2. Optimization
3. Evaluation
Parsing and Translation
Query processing begins by translating the user's query. Users write queries in a high-level
database language such as SQL, which is well suited to humans but not to the internal
representation of a query inside the system. The query is therefore translated into expressions
that can be used at the physical level of the file system; relational algebra is well suited for
this internal representation. After translation, the query optimizing transformations and the
actual evaluation of the query take place.
The translation step works like a parser. When a user executes a query, the parser in the system
checks the syntax of the query and verifies the names of the relations in the database, the
tuples, and the required attribute values. The parser creates a tree of the query, known as a
parse tree, and then translates it into relational algebra. In doing so, it also replaces every
view used in the query with its definition. The working of query processing is summarized in the
diagram below:
Suppose a user executes a query. As noted above, there are various methods of
extracting the data from the database. In SQL, suppose the user wants to fetch the names of the
employees whose salary is greater than 10000. The following query
does this:
SELECT emp_name FROM Employee WHERE salary > 10000;
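One way to picture the internal form of this query is a selection followed by a projection, π emp_name (σ salary > 10000 (Employee)). The sketch below evaluates that expression over a small in-memory relation. The Employee rows and the helper names select and project are assumptions for illustration only, since the table contents are not given.

```python
# Hypothetical Employee rows; the text does not list the table contents.
employee = [
    {"emp_name": "Abel", "salary": 12000},
    {"emp_name": "Sara", "salary": 10000},
    {"emp_name": "Musa", "salary": 8000},
]


def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep only the named attributes, dropping duplicates."""
    seen, result = set(), []
    for row in relation:
        projected = tuple(row[a] for a in attributes)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(zip(attributes, projected)))
    return result


# pi emp_name (sigma salary > 10000 (Employee)), matching the strict > in the query
result = project(select(employee, lambda r: r["salary"] > 10000), ["emp_name"])
```

With these sample rows only Abel qualifies, because the comparison in the query is strict.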
Example-1:
The query immediately above is called a nested expression. As usual, we evaluate the inner
expression first (which results in a relation, say Manager1), then we evaluate the outer expression
on Manager1 (the relation obtained from evaluating the inner expression). The result is
again a relation, just like the relation we input.
Example-2:
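Given a relation Student (Roll, Name, Department, Fees, Team) with the following tuples:
| Roll | Name | Department | Fees | Team |
|------|--------|------------|-------|------|
| 1 | Abebe | CS | 22000 | A |
| 2 | Kebede | CS | 34000 | A |
| 3 | Lwam | IT | 36000 | C |
| 4 | Aster | IT | 56000 | D |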
Select all the students of Team A:
σ Team = 'A' (Student)
This results in the following:
| Roll | Name | Department | Fees | Team |
|------|--------|------------|-------|------|
| 1 | Abebe | CS | 22000 | A |
| 2 | Kebede | CS | 34000 | A |
Select all the students of department IT whose fees are greater than or equal to 10000 and who belong to
a team other than A.
σ Department = 'IT' (σ Fees >= 10000 (σ Team != 'A' (Student)))
This results in the following:
| Roll | Name | Department | Fees | Team |
|------|-------|------------|-------|------|
| 3 | Lwam | IT | 36000 | C |
| 4 | Aster | IT | 56000 | D |
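The two selections above can be reproduced with a few lines of Python over the Student tuples. The helper names select and select_all are assumptions for this sketch, which also applies the conditions in two different orders to compare the results.

```python
# Student(Roll, Name, Department, Fees, Team), as listed in the text.
student = [
    (1, "Abebe", "CS", 22000, "A"),
    (2, "Kebede", "CS", 34000, "A"),
    (3, "Lwam", "IT", 36000, "C"),
    (4, "Aster", "IT", 56000, "D"),
]
ROLL, NAME, DEPARTMENT, FEES, TEAM = range(5)


def select(relation, predicate):
    """Selection: keep the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]


def select_all(relation, predicates):
    """Apply several selections one after another (nested sigma)."""
    for predicate in predicates:
        relation = select(relation, predicate)
    return relation


def in_it(t):
    return t[DEPARTMENT] == "IT"


def fees_at_least_10000(t):
    return t[FEES] >= 10000


def not_team_a(t):
    return t[TEAM] != "A"


team_a = select(student, lambda t: t[TEAM] == "A")
it_students = select_all(student, [not_team_a, fees_at_least_10000, in_it])
reordered = select_all(student, [in_it, not_team_a, fees_at_least_10000])
```

Both orderings return the same tuples, and every result keeps all five attributes of Student.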
Important points about the Select operation: the Select operator is unary, meaning it is applied to a single
relation only. The Selection operation is commutative, that is,
σ c1 (σ c2 (R)) = σ c2 (σ c1 (R))
The degree (number of attributes) of the relation resulting from a Selection operation is the same as the
degree of the given relation. The cardinality (number of tuples) of the relation resulting from a
Selection operation satisfies
0 <= |σ c (R)| <= |R|
2. Union Operation:
Suppose there are two relations R and S. The union operation contains all the tuples
that are in R, in S, or in both. It eliminates duplicate tuples. It is
denoted by ∪.
Notation: R ∪ S
Example:
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
3. Set Intersection:
Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R and S. It is denoted by ∩.
Notation: R ∩ S
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
4. Set Difference:
Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S. It is denoted by minus (-).
Notation: R - S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
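The three set operations map directly onto Python sets. The DEPOSITOR and BORROW tables are not reproduced in the text, so the two name sets below are an assumption, reconstructed so that they agree with the three outputs shown above.

```python
# Assumed inputs, reconstructed from the union, intersection, and difference
# outputs in the text (the original tables are not shown).
borrow_names = {"Smith", "Jones", "Jackson", "Hayes", "Williams", "Curry"}
depositor_names = {"Johnson", "Smith", "Turner", "Jones", "Lindsay", "Mayes"}

# Sets hold each name once, mirroring duplicate elimination in relational algebra.
union = borrow_names | depositor_names          # BORROW union DEPOSITOR
intersection = borrow_names & depositor_names   # BORROW intersect DEPOSITOR
difference = borrow_names - depositor_names     # BORROW minus DEPOSITOR
```

Note that difference is not symmetric: depositor_names - borrow_names would give a different set.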
5. Cartesian product
The Cartesian product combines each row in one table with each row in
the other table. It is also known as a cross product. It is denoted by ×.
Notation: E × D
Example: EMPLOYEE
DEPARTMENT
Output:
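| EMP_ID | EMP_NAME | EMP_DEPT | DEPT_NO | DEPT_NAME |
|--------|----------|----------|---------|-----------|
| 1 | Smith | A | A | Marketing |
| 1 | Smith | A | B | Sales |
| 1 | Smith | A | C | Legal |
| 2 | Harry | C | A | Marketing |
| 2 | Harry | C | B | Sales |
| 2 | Harry | C | C | Legal |
| 3 | John | B | A | Marketing |
| 3 | John | B | B | Sales |
| 3 | John | B | C | Legal |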
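The output above can be reproduced with itertools.product. The two input tables are not shown in the text, so the rows below are inferred from the output columns and should be read as an assumption.

```python
from itertools import product

# Inferred from the output table: EMPLOYEE(EMP_ID, EMP_NAME, EMP_DEPT)
employee = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
# Inferred from the output table: DEPARTMENT(DEPT_NO, DEPT_NAME)
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Every employee row is paired with every department row.
cross = [e + d for e, d in product(employee, department)]

# Keeping only the pairs where EMP_DEPT equals DEPT_NO turns the product into a join.
matching = [row for row in cross if row[2] == row[3]]
```

The product has 3 × 3 = 9 rows, which is why joins are normally preferred when only matching rows are needed.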
6. Rename operation:
The rename operation is used to rename the output relation. It is denoted by rho
(ρ).
ρ(STUDENT1, STUDENT)
7. Join operations:
A Join operation combines related tuples from different relations, if and only if a
given join condition is satisfied. It is denoted by ⋈.
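EMPLOYEE
| EMP_CODE | EMP_NAME |
|----------|----------|
| 101 | Stephan |
| 102 | Jack |
| 103 | Harry |
SALARY
| EMP_CODE | SALARY |
|----------|--------|
| 101 | 50000 |
| 102 | 30000 |
| 103 | 25000 |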
Input:
Operation: (EMPLOYEE ⋈ SALARY)
Result:
A. Natural Join:
A natural join is the set of tuples of all combinations in R and S that are equal on
their common attribute names. It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Output:
| EMP_NAME | SALARY |
|----------|--------|
| Stephan | 50000 |
| Jack | 30000 |
| Harry | 25000 |
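A natural join can be sketched in Python by matching rows on every attribute name the two relations share. The function natural_join is a hypothetical helper written for this example; the data is the EMPLOYEE and SALARY tables from the text.

```python
employee = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
salary = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]


def natural_join(r, s):
    """Join r and s on all attribute names they have in common."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    joined = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                # Common attributes appear once in the merged tuple.
                joined.append({**x, **y})
    return joined


joined = natural_join(employee, salary)
# Project EMP_NAME and SALARY to get the output shown above.
output = [(row["EMP_NAME"], row["SALARY"]) for row in joined]
```

Here the only common attribute is EMP_CODE, so each employee is paired with exactly one salary row.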
B. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with
missing information.
Example:
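EMPLOYEE
| EMP_NAME | STREET | CITY |
|----------|--------|------|
| Ram | Civil line | Mumbai |
| Shyam | Park street | Kolkata |
| Ravi | M.G. Street | Delhi |
| Hari | Nehru nagar | Hyderabad |
FACT_WORKERS
| EMP_NAME | BRANCH | SALARY |
|----------|--------|--------|
| Ram | Infosys | 10000 |
| Shyam | Wipro | 20000 |
| Kuber | HCL | 30000 |
| Hari | TCS | 50000 |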
Input: (EMPLOYEE ⋈ FACT_WORKERS)
Output:
Left outer join contains the set of tuples of all combinations in R and S that are
equal on their common attribute names, plus the tuples in R that have no
matching tuples in S, padded with NULL values. It is denoted by ⟕.
Right outer join contains the set of tuples of all combinations in R and S that are
equal on their common attribute names, plus the tuples in S that have no
matching tuples in R, padded with NULL values. It is denoted by ⟖.
Full outer join combines the left and right outer joins, so it contains all rows from both
tables. Besides the matching combinations, it includes the tuples in R that have no matching tuples in S and the tuples in
S that have no matching tuples in R on their common attribute names, padded with NULL values. It is denoted
by ⟗.
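The three outer joins can be sketched in Python using None in place of NULL. The helper left_outer_join is hypothetical, and which columns belong to EMPLOYEE versus FACT_WORKERS is an assumption based on the column names in the example.

```python
employee = [
    {"EMP_NAME": "Ram", "STREET": "Civil line", "CITY": "Mumbai"},
    {"EMP_NAME": "Shyam", "STREET": "Park street", "CITY": "Kolkata"},
    {"EMP_NAME": "Ravi", "STREET": "M.G. Street", "CITY": "Delhi"},
    {"EMP_NAME": "Hari", "STREET": "Nehru nagar", "CITY": "Hyderabad"},
]
fact_workers = [
    {"EMP_NAME": "Ram", "BRANCH": "Infosys", "SALARY": 10000},
    {"EMP_NAME": "Shyam", "BRANCH": "Wipro", "SALARY": 20000},
    {"EMP_NAME": "Kuber", "BRANCH": "HCL", "SALARY": 30000},
    {"EMP_NAME": "Hari", "BRANCH": "TCS", "SALARY": 50000},
]


def left_outer_join(r, s, key, s_attrs):
    """Keep every tuple of r; pad with None where s has no match."""
    result = []
    for x in r:
        matches = [y for y in s if y[key] == x[key]]
        if matches:
            result.extend({**x, **y} for y in matches)
        else:
            result.append({**x, **{a: None for a in s_attrs}})
    return result


left = left_outer_join(employee, fact_workers, "EMP_NAME", ["BRANCH", "SALARY"])
# A right outer join is a left outer join with the operands swapped.
right = left_outer_join(fact_workers, employee, "EMP_NAME", ["STREET", "CITY"])
# Full outer join: the left result plus the rows that exist only on the right.
full = left + [row for row in right if row["STREET"] is None]
```

With this data, Ravi appears only in the left result with an empty BRANCH and SALARY, Kuber appears only in the right result with an empty STREET and CITY, and the full outer join contains both.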
Output:
C. Equi join:
An equi join is a join whose condition uses only the equality operator (=).
Example:
CUSTOMER
| CLASS_ID | NAME |
|----------|------|
| 1 | John |
| 2 | Harry |
| 3 | Jackson |
PRODUCT
| PRODUCT_ID | CITY |
|------------|------|
| 1 | Delhi |
| 2 | Mumbai |
| 3 | Noida |
Input:
CUSTOMER ⋈ PRODUCT
Output:
Example Schema
Employees Table:
Departments Table:
| DeptID | DeptName |
|--------|---------------|
| 101 | Engineering |
| 102 | Sales |
| 103 | HR |
Example 1: Selection
SQL Query:
Relational Algebra:
σ(DeptID = 101)(Employees)
Explanation: The selection operator σ is used to filter rows based on the condition
DeptID = 101.
Example 2: Projection
SQL Query:
Relational Algebra:
π(EmpName)(Employees)
Explanation: The projection operator π selects only the EmpName column from the
Employees table.
Example 3: Join
SQL Query:
SELECT EmpName, DeptName
FROM Employees
JOIN Departments ON Employees.DeptID = Departments.DeptID;
Relational Algebra:
Explanation: The join operator ⨝ combines rows from both tables based on the
matching DeptID. The projection selects only EmpName and DeptName.
Example 4: Aggregation
SQL Query:
Relational Algebra:
Explanation: The grouping operator γ is used here to group records by DeptID and
count the number of employees in each department.
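The join and grouping examples can be tried with Python's built-in sqlite3 module. The Employees table is not shown in the text, so its columns and the three employee rows are hypothetical. The GROUP BY statement is likewise an assumed SQL form of the grouping described above, not the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, EmpName TEXT, DeptID INTEGER);
    INSERT INTO Departments VALUES (101, 'Engineering'), (102, 'Sales'), (103, 'HR');
    -- Hypothetical rows: the Employees table is not listed in the text.
    INSERT INTO Employees VALUES (1, 'Alice', 101), (2, 'Bob', 102), (3, 'Carol', 101);
""")

# Join followed by projection of EmpName and DeptName.
joined = conn.execute("""
    SELECT EmpName, DeptName
    FROM Employees
    JOIN Departments ON Employees.DeptID = Departments.DeptID
    ORDER BY EmpName
""").fetchall()

# Grouping by DeptID with a count per group (assumed SQL for the gamma operator).
counts = conn.execute("""
    SELECT DeptID, COUNT(*)
    FROM Employees
    GROUP BY DeptID
    ORDER BY DeptID
""").fetchall()

conn.close()
```

Department 103 does not appear in counts because grouping only sees departments that have at least one employee row.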
Example 5: Union
SQL Query:
Relational Algebra:
π(EmpName)(σ(DeptID = 101)(Employees)) ∪ π(EmpName)(σ(DeptID = 102)(Employees))
Explanation: The union operator ∪ combines the results from the two selection
operations where DeptID is either 101 or 102, projecting the EmpName.
Example 6: Intersection
SQL Query:
Relational Algebra:
Explanation: The intersection operator ∩ returns only the employee names that
appear in both sets (Department 101 and names starting with 'A').
1. Sequential Search
This is the simplest form of searching in a dataset, where each element is checked
until the desired value is found.
Example:
You have a table of users, and you want to find a user with the name "Alice".
SELECT * FROM users WHERE name = 'Alice';
Internally, a sequential search will iterate through each row in the users table until it
finds the row where the name is "Alice".
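A sequential search can be sketched as a plain loop. The users rows and the function name sequential_search are hypothetical, and this version stops at the first match, whereas a real table scan for a WHERE clause would continue in order to return every matching row.

```python
def sequential_search(rows, column, value):
    """Check rows one by one; return (position, row) of the first match."""
    for position, row in enumerate(rows):
        if row[column] == value:
            return position, row
    return None  # every row was inspected without a match


# Hypothetical contents of the users table.
users = [
    {"id": 1, "name": "Bob"},
    {"id": 2, "name": "Alice"},
    {"id": 3, "name": "Carol"},
]

found = sequential_search(users, "name", "Alice")
missing = sequential_search(users, "name", "Zed")
```

In the worst case the loop touches every row, so the cost grows linearly with the table size.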
2. Binary Search
Binary search is a more efficient search method, but it requires that the data is
sorted. It works by repeatedly dividing the search interval in half.
Example:
Consider a sorted list of user IDs:
To find the user with a specific ID (say 7), the binary search would check the
middle of the list. If the middle ID is less than 7, it would search the upper half; if
greater, it would search the lower half.
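The halving step described above can be written as a short loop. The list of user IDs is a made-up example, since the original list is not shown, and binary_search is a hypothetical helper name.

```python
def binary_search(sorted_ids, target):
    """Return the position of target in sorted_ids, or -1 if it is absent."""
    low, high = 0, len(sorted_ids) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_ids[mid] == target:
            return mid
        if sorted_ids[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


user_ids = [1, 3, 5, 7, 9, 11, 13]  # hypothetical sorted IDs
position = binary_search(user_ids, 7)
absent = binary_search(user_ids, 4)
```

Each iteration discards half of the remaining interval, so the number of comparisons grows logarithmically with the list length.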
3. Indexing
Example:
If you have an index on the name column in the users table, the database engine can
directly locate "Alice" without scanning every row.
With indexing, the database could quickly navigate to the index and retrieve the
relevant rows.
4. Hashing
Hashing involves using a hash function to compute the address of a data element,
leading to fast access times.
Example:
A hash index on a table of products could allow a quick lookup for a product by its
SKU (Stock Keeping Unit).
Instead of searching linearly or using a binary search, the database uses the hash
function to find the correct location of 'SKU1234' in the index.
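A Python dictionary is itself a hash table, so it gives a rough picture of how a hash index behaves. The products rows and the helper build_hash_index are assumptions for this sketch; a real database manages buckets on disk rather than in a dict.

```python
def build_hash_index(rows, key):
    """Map each key value to the list of rows that carry it."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


# Hypothetical contents of the products table.
products = [
    {"sku": "SKU1234", "name": "Keyboard"},
    {"sku": "SKU5678", "name": "Mouse"},
]

sku_index = build_hash_index(products, "sku")
# The lookup hashes the key and jumps to its bucket, with no scan of the table.
match = sku_index.get("SKU1234", [])
no_match = sku_index.get("SKU0000", [])
```

Hash lookups work well for equality conditions such as sku = 'SKU1234', but they do not help with range conditions, because hashing does not preserve ordering.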
5. Join Operations
Joining tables is a common operation in relational databases, and there are various
algorithms for this, including nested loop joins, hash joins, and merge joins.
Nested Loop Join: Go through each row of the first table and for each row,
search the second table.
Example:
SELECT *
FROM users u
JOIN orders o ON u.user_id = o.user_id;
In a nested loop join, for each user, the database will check all orders to find
matching user IDs.
Hash Join: Build a hash table from one of the tables and then probe it for
matches.
Merge Join: Requires both tables to be sorted; it merges them based on
matching keys.
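The three join algorithms can be compared on small in-memory tables. The users and orders rows below are hypothetical, and the functions are simplified sketches that ignore paging, memory limits, and other concerns of a real engine.

```python
users = [
    {"user_id": 1, "name": "Alice"},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Carol"},
]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 3},
    {"order_id": 12, "user_id": 1},
]


def nested_loop_join(left, right, key):
    # For each left row, scan the whole right table.
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    table = {}
    for l in left:  # build phase: hash one input on the join key
        table.setdefault(l[key], []).append(l)
    # Probe phase: look up each row of the other input in the hash table.
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def merge_join(left, right, key):
    # Both inputs must be sorted on the join key; sort here for the sketch.
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # Emit every right row in the run of equal keys for this left row.
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                result.append({**left[i], **right[k]})
                k += 1
            i += 1
    return result


def canonical(rows):
    """Order-independent view of a join result, for comparison."""
    return sorted((row["user_id"], row["order_id"]) for row in rows)


nested = canonical(nested_loop_join(users, orders, "user_id"))
hashed = canonical(hash_join(users, orders, "user_id"))
merged = canonical(merge_join(users, orders, "user_id"))
```

All three produce the same set of joined rows; they differ only in how much work they do to find the matches.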
7. Aggregation
Aggregation functions like SUM, COUNT, AVG, MAX, and MIN are used to summarize
data.
Example:
To count how many users are in the database:
A heuristic might first filter the Products table by category_id, reducing the number of
rows in memory for subsequent operations.
Join Order Optimization: Generally, smaller tables are joined first to reduce
the size of intermediate results. For example, if you have three tables A, B, and C,
and B is significantly smaller than A or C, a heuristic might suggest:
WITH Temp AS (
SELECT customer_id FROM Orders WHERE order_date > '2023-01-01'
)
SELECT * FROM Customers WHERE id IN (SELECT customer_id FROM Temp);
A heuristic might determine that it’s more efficient to store Temp and reuse it.
1. Selectivity Calculation:
For a table with 10,000 rows, if a specific condition (like salary > 80,000)
returns around 500 rows, the selectivity for that predicate is 500 / 10,000 =
0.05 (or 5%).
2. Cost Estimates:
Cost estimates involve evaluating the resources that will be consumed to
execute a query, such as CPU time, I/O operations, and memory usage. Cost
models may consider:
o The number of rows to process based on selectivity.
o Data distribution (e.g., how indexed or clustered the data is).
o Join costs based on how tables are joined (nested loop, hash join, etc.).
Using selectivity, the database optimizer can estimate how many rows are likely to be
processed under different predicates and choose the most efficient plan. For
example, if choosing between:
SELECT * FROM Employees WHERE department_id = 2;
versus
SELECT * FROM Employees WHERE salary > 100000;
If statistics show that department_id = 2 returns 20% of the data but salary
> 100000 returns only 5%, the salary predicate is the more selective of the two, so an
index scan on salary is expected to touch fewer rows than one on department_id.
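The selectivity arithmetic can be written out as a small calculation. The function names and the 10,000-row table size reused for the comparison are assumptions; real optimizers derive these numbers from stored statistics and use much richer cost models.

```python
def selectivity(matching_rows, total_rows):
    """Fraction of the table that satisfies a predicate."""
    return matching_rows / total_rows


def estimated_rows(total_rows, sel):
    """Expected number of rows a predicate lets through."""
    return round(total_rows * sel)


# The example from the text: 500 of 10,000 rows satisfy salary > 80,000.
salary_selectivity = selectivity(500, 10_000)

# Selectivities quoted in the text for the two predicates being compared.
predicates = {"department_id = 2": 0.20, "salary > 100000": 0.05}
total = 10_000  # assumed table size, reused from the example above

estimates = {p: estimated_rows(total, s) for p, s in predicates.items()}
# The predicate expected to return the fewest rows is the most selective one.
most_selective = min(estimates, key=estimates.get)
```

A lower selectivity value means fewer rows survive the predicate, which usually makes an index on that column more attractive.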
SELECT * FROM Customers c JOIN Orders o ON c.id = o.customer_id WHERE o.total > 100;
If you know that every order refers to an existing customer (due to business
logic or constraints), and only the customer columns are needed, it might be possible to restructure this:
SELECT * FROM Customers WHERE id IN (SELECT customer_id FROM Orders WHERE total > 100);
This can reduce the amount of data being processed, since each qualifying customer is
returned once instead of once per matching order.
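The difference between the join form and the IN form can be observed with sqlite3. The table definitions and rows below are hypothetical and chosen so that one customer has two qualifying orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customers(id),
        total REAL
    );
    -- Hypothetical data: customer 1 has two orders over 100, customer 2 has none.
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Orders VALUES (10, 1, 150), (11, 1, 300), (12, 2, 50);
""")

# Join form: one output row per matching order.
join_rows = conn.execute(
    "SELECT c.id FROM Customers c JOIN Orders o ON c.id = o.customer_id "
    "WHERE o.total > 100"
).fetchall()

# IN form: one output row per qualifying customer.
in_rows = conn.execute(
    "SELECT id FROM Customers "
    "WHERE id IN (SELECT customer_id FROM Orders WHERE total > 100)"
).fetchall()

conn.close()
```

The two forms agree on which customers qualify but not on the number of rows returned, so the rewrite is only valid when duplicates and order columns are not needed.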
Knowing the semantic rules could allow the optimizer to identify that if an
employee is in department 1 or 2, they can't be in 3. This means the second
condition is always true if the first condition is true, allowing that part to be
dropped or re-evaluated.