Database Systems and Security

Advanced Database Design and Query Optimization

Evrad KAMTCHOUM

CENTER FOR CYBERSECURITY AND MATHEMATICAL CRYPTOLOGY


THE UNIVERSITY OF BAMENDA

November 21, 2024

Evrad KAMTCHOUM (CCMC (UBa)) Database Systems November 21, 2024 1 / 34


Contents

1 Advanced Database Design Techniques

2 Query Optimization Techniques

3 Case Studies and Applications

4 Hands-On Exercise

Introduction to Advanced Database Design

Importance of robust database design in complex systems.
How design impacts query optimization.
Goals:
Enhance system performance.
Ensure scalability and maintainability.
Connection between database design and query optimization:
Well-designed databases lead to efficient query execution.
Query optimization enhances performance in large-scale systems.

Review of Fundamental Concepts

Entity-Relationship (ER) Modeling:
Entities, attributes, and relationships.
Normalization:
Process of reducing data redundancy.
Up to Boyce-Codd Normal Form (BCNF).
SQL Basics:
Designing tables.
Writing basic queries.

Review: Entity-Relationship (ER) Modeling

Definition:
A technique used to visually model and design databases.
Represents entities, attributes, and relationships.
Key Components:
Entities: Objects or concepts (e.g., Student, Course).
Attributes: Properties or characteristics of entities (e.g., Name, Age).
Relationships: Connections between entities (e.g., a Student enrolls in a Course).
Diagram Notations:
Rectangles: Entities.
Diamonds: Relationships.
Ovals: Attributes.

Review: Normalization

Definition:
Process of organizing data to reduce redundancy and improve data integrity.
Normalization Forms:
1NF: Eliminate repeating groups and ensure every column holds atomic values.
2NF: Remove partial dependencies (ensure full functional dependency).
3NF: Remove transitive dependencies.
BCNF: Ensure every determinant is a candidate key.
Advantages:
Reduces redundancy.
Prevents update anomalies.
Improves data consistency.
Example:
A non-normalized table can be split into two normalized tables with proper keys.
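
The split described above can be sketched with Python's standard sqlite3 module. This is a minimal illustration under stated assumptions: the tables students_flat, students, and departments and their columns are hypothetical and do not come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical non-normalized table: dept_name is repeated for every student.
cur.execute(
    "CREATE TABLE students_flat "
    "(student_id INTEGER, name TEXT, dept_id INTEGER, dept_name TEXT)"
)
cur.executemany(
    "INSERT INTO students_flat VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", 10, "Mathematics"),
        (2, "Bob", 10, "Mathematics"),
        (3, "Eve", 20, "Physics"),
    ],
)

# dept_name depends only on dept_id, so it moves to its own table.
cur.execute(
    "CREATE TABLE departments "
    "(dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)"
)
cur.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "dept_id INTEGER REFERENCES departments(dept_id))"
)
cur.execute(
    "INSERT INTO departments SELECT DISTINCT dept_id, dept_name FROM students_flat"
)
cur.execute(
    "INSERT INTO students SELECT student_id, name, dept_id FROM students_flat"
)

# Joining the two tables rebuilds the original rows, so nothing was lost.
rebuilt = cur.execute(
    "SELECT s.student_id, s.name, s.dept_id, d.dept_name "
    "FROM students s JOIN departments d ON s.dept_id = d.dept_id "
    "ORDER BY s.student_id"
).fetchall()
original = cur.execute(
    "SELECT * FROM students_flat ORDER BY student_id"
).fetchall()
dept_count = cur.execute("SELECT COUNT(*) FROM departments").fetchone()[0]
```

After the split, each department name is stored once (dept_count is 2 for three students), which is the redundancy reduction the slide refers to.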

Review: SQL Basics
Definition:
SQL (Structured Query Language) is used to manage and query relational databases.
Key Commands:
DDL (Data Definition Language):
CREATE TABLE: Defines new tables.
ALTER TABLE: Modifies existing tables.
DROP TABLE: Deletes tables.
DML (Data Manipulation Language):
INSERT INTO: Adds new data.
UPDATE: Modifies existing data.
DELETE FROM: Removes data.
DQL (Data Query Language):
SELECT: Retrieves data from tables.
Example:
Retrieve all records from a table: SELECT * FROM Students;
Filter with conditions: SELECT Name FROM Students WHERE Age > 18;
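
To try these commands end to end, the following sketch runs them against an in-memory SQLite database from Python. The Students table matches the slide's example; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table.
conn.execute("CREATE TABLE Students (Name TEXT, Age INTEGER)")

# DML: add, modify, and remove rows (sample data is hypothetical).
conn.executemany(
    "INSERT INTO Students (Name, Age) VALUES (?, ?)",
    [("Amina", 17), ("Brice", 19), ("Chantal", 22)],
)
conn.execute("UPDATE Students SET Age = 18 WHERE Name = 'Amina'")
conn.execute("DELETE FROM Students WHERE Name = 'Chantal'")

# DQL: the two SELECT statements from the slide.
all_rows = conn.execute("SELECT * FROM Students ORDER BY Name").fetchall()
adults = conn.execute(
    "SELECT Name FROM Students WHERE Age > 18 ORDER BY Name"
).fetchall()
```

Note that Age > 18 is a strict comparison, so a student who is exactly 18 is not returned by the second query.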

Advanced Database Design Techniques


Challenges in Database Design

Large-scale and distributed systems.
Handling semi-structured and unstructured data.
Achieving balance between normalization and performance.
Common advanced database design techniques:
Denormalization
Indexing
Partitioning and Sharding
Data Warehousing Design

Denormalization

Definition: Introducing redundancy to improve query performance.
Trade-offs:
Faster reads.
Increased storage and potential inconsistency.
Example: Merging tables to avoid joins.
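
A minimal sketch of this trade-off, assuming hypothetical customers and orders tables: the denormalized orders_wide table answers the same question without a join, but its redundant copies go stale when the source data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0), (102, 1, 15.0);

    -- Denormalized copy: the customer name is stored on every order row.
    CREATE TABLE orders_wide AS
    SELECT o.order_id, o.customer_id, c.name AS customer_name, o.total
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")

# Normalized read: needs a join.
with_join = conn.execute(
    "SELECT o.order_id, c.name FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id ORDER BY o.order_id"
).fetchall()

# Denormalized read: single table, no join.
without_join = conn.execute(
    "SELECT order_id, customer_name FROM orders_wide ORDER BY order_id"
).fetchall()

# The cost: renaming a customer leaves stale copies in orders_wide.
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE customer_id = 1")
stale = conn.execute(
    "SELECT COUNT(*) FROM orders_wide "
    "WHERE customer_id = 1 AND customer_name = 'Ada'"
).fetchone()[0]
```

Both reads return the same rows before the update. Afterwards, two rows in orders_wide still carry the old name, which is the potential inconsistency listed under the trade-offs.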

Indexing Strategies

Indexing speeds up query processing.
An index is auxiliary metadata built over user data so that the database can locate rows without scanning the entire table.

Types of Indexes:
B-trees: General-purpose indexing.
Hash indexes: For exact matches.
Full-text indexes: For searching large text fields.
Considerations:
Trade-offs between read/write performance.
Indexing frequently queried columns.

Indexing with B-trees

Key Features of B-trees:
Balanced tree data structure used for database indexing.
Nodes have multiple keys and pointers, facilitating efficient search, insert, and delete operations.
Keeps keys sorted and supports search, insert, and delete in logarithmic time.
Advantages:
Efficient for range queries.
Performs well with large datasets.
Handles dynamic updates to the database without reorganization.
Use Cases:
Primary and secondary indexes in relational databases.
File systems and general-purpose search systems.
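
SQLite implements its ordinary indexes as B-trees, so it can serve as a small test bed for the range-query advantage. The sketch below assumes a hypothetical employees table and index name, and compares the plan reported by EXPLAIN QUERY PLAN before and after the index is created. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [(f"emp{i}", 30000 + 10 * i) for i in range(5000)],
)

query = "SELECT name FROM employees WHERE salary BETWEEN 50000 AND 50100"


def plan(sql):
    """Return the human-readable plan text; the detail is the last column."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


plan_before = plan(query)  # no usable index yet: the table is scanned
conn.execute("CREATE INDEX idx_employees_salary ON employees (salary)")
plan_after = plan(query)   # the range can now be located through the index

matches = conn.execute(query).fetchall()
```

Printing plan_before and plan_after shows the switch from a table scan to an index search. Because the keys are kept sorted, the matching salaries sit next to each other in the index.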

Indexing with Hash Indexes

Key Features of Hash Indexes:
Use a hash function to map keys to index buckets.
Ideal for exact-match queries.
Do not preserve order, making them unsuitable for range queries.
Advantages:
Fast lookup for equality comparisons.
Compact data representation.
Low overhead for specific workloads.
Limitations:
Poor performance for range queries or sorted data requirements.
Rehashing required if the table size changes significantly.
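
The idea can be sketched in a few lines of Python, using a dictionary as a stand-in for the hash buckets. This is an illustration of the concept, not how any particular database engine stores its hash indexes; the rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical table rows.
rows = [
    {"id": 1, "email": "ada@example.org"},
    {"id": 2, "email": "bob@example.org"},
    {"id": 3, "email": "eve@example.org"},
]


def build_hash_index(rows, column):
    """Map each key value to the positions of the rows that contain it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def lookup(index, rows, key):
    """Exact-match lookup: one hash probe instead of a scan over all rows."""
    return [rows[position] for position in index.get(key, [])]


email_index = build_hash_index(rows, "email")
hit = lookup(email_index, rows, "bob@example.org")
miss = lookup(email_index, rows, "nobody@example.org")

# A range query (e.g. all emails between 'a' and 'c') gets no help from the
# hash: the keys are not stored in sorted order, so every key must be checked.
in_range = sorted(k for k in email_index if "a" <= k < "c")
```

The last line works only by visiting every key, which is why hash indexes are listed as unsuitable for range queries.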

Full-text Indexes

What are Full-text Indexes?
Specialized indexes for text data to optimize search and retrieval.
Designed to handle natural language queries.
Match against words, phrases, or patterns in large text fields.
Applications:
Search engines.
Content management systems.
Log analysis and information retrieval.
Features:
Supports complex queries like boolean searches, phrase matching, and proximity searches.
Can rank results based on relevance.
Efficient handling of synonyms, stemming, and stop words.
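
Full-text indexes are commonly built on an inverted index, which maps each word to the documents that contain it. The sketch below is a deliberately small version under stated assumptions: the stop-word list, the documents, and the function names are illustrative, and there is no stemming, ranking, or phrase matching.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "of", "and", "in", "is"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_all(index, query):
    """Boolean AND search: documents containing every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "Query optimization in relational databases",
    2: "The design of a database index",
    3: "Index selection and query optimization",
}
index = build_inverted_index(docs)
both_words = search_all(index, "query optimization")
one_word = search_all(index, "the index")
no_stemming = search_all(index, "database")
```

Because this sketch does not stem words, searching for "database" misses document 1, which contains "databases". Handling such variants is one of the features a real full-text index adds.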

Partitioning and Sharding

Partitioning:
Horizontal: Rows divided across partitions.
Vertical: Columns divided into subsets.
Sharding:
Splitting data across multiple servers.
Example: Splitting users by geographic region.

Horizontal Partitioning
Definition:
Horizontal partitioning splits a table into multiple smaller tables (partitions), each containing a subset of rows.
Each partition has the same columns but contains different rows based on a partitioning condition.
Key Features:
Useful for distributing rows across multiple storage locations.
Partitions are typically based on ranges, lists, or hash functions applied to a key.
Advantages:
Improved query performance for subsets of data.
Easier management of data retention policies.
Enables parallel processing by working on partitions independently.
Example:
A "Sales" table is partitioned by region, with each partition storing sales data for a specific region.
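
The Sales example can be sketched as list partitioning in plain Python. The rows and the partition_by_list helper are hypothetical; a real database would declare the partitions in its DDL instead.

```python
from collections import defaultdict


def partition_by_list(rows, key):
    """Route each row to the partition named after its key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return partitions


# Hypothetical Sales rows: every partition keeps the same columns.
sales = [
    {"sale_id": 1, "region": "North", "amount": 120},
    {"sale_id": 2, "region": "South", "amount": 80},
    {"sale_id": 3, "region": "North", "amount": 50},
]

partitions = partition_by_list(sales, "region")

# A query restricted to one region only reads that partition.
north_total = sum(row["amount"] for row in partitions["North"])
partition_names = sorted(partitions)
```

Computing north_total touches two rows instead of three. On a large table, skipping the partitions that cannot match is where the performance gain comes from.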

Vertical Partitioning
Definition:
Vertical partitioning splits a table into smaller tables by dividing its columns.
Each partition contains a subset of columns and shares a common primary key to allow reconstruction.

Key Features:
Ideal for scenarios where certain groups of columns are frequently accessed together.
Can optimize storage and query performance by reducing the amount of data read.

Advantages:
Reduces I/O overhead by accessing only relevant columns.
Improves cache utilization and data locality.
Allows separation of sensitive data into different partitions for security.

Example:
A "Customer" table is split into two partitions:
Partition 1: CustomerID, Name, ContactInfo.
Partition 2: CustomerID, Preferences, PurchaseHistory.
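
A minimal sketch of this split using sqlite3. The column names follow the slide; the table names customer_core and customer_activity and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Partition 1: frequently read contact columns.
    CREATE TABLE customer_core (
        CustomerID INTEGER PRIMARY KEY, Name TEXT, ContactInfo TEXT
    );
    -- Partition 2: wider, less frequently read columns.
    CREATE TABLE customer_activity (
        CustomerID INTEGER PRIMARY KEY, Preferences TEXT, PurchaseHistory TEXT
    );
    INSERT INTO customer_core VALUES (1, 'Ada', 'ada@example.org');
    INSERT INTO customer_activity VALUES (1, 'newsletter', 'order-100;order-102');
""")

# A contact lookup reads only the first partition.
contact = conn.execute(
    "SELECT Name, ContactInfo FROM customer_core WHERE CustomerID = 1"
).fetchone()

# The shared primary key lets a join rebuild the full row when needed.
full_row = conn.execute(
    "SELECT c.CustomerID, c.Name, c.ContactInfo, a.Preferences, a.PurchaseHistory "
    "FROM customer_core c "
    "JOIN customer_activity a ON a.CustomerID = c.CustomerID"
).fetchone()
```

The join is needed only when a query asks for columns from both partitions, which is the cost to weigh against the reduced I/O for the common case.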

Query Optimization Techniques

Importance and Benefits of Query Optimization
Importance of Query Optimization:
Ensures efficient execution of database queries by minimizing resource usage.
Enhances the performance of applications relying on the database.
Crucial for handling large-scale data in modern applications.

Key Benefits of Query Optimization:
Reduced Execution Time: Faster query response times improve user experience and application performance.
Lower Resource Consumption: Minimizes CPU, memory, and disk I/O usage, leading to cost savings.
Scalability: Enables the database to handle more queries and larger datasets effectively.
Improved Throughput: Increases the number of queries processed in a given time.
Energy Efficiency: Reduces energy consumption by optimizing hardware usage.
Consistent Performance: Provides predictable query execution times, essential for real-time applications.

Conclusion: Query optimization is a cornerstone of modern database systems, ensuring reliability, performance, and cost-efficiency in managing data-intensive applications.

Overview of Query Optimization

Goals:
Minimize query execution time.
Reduce resource utilization (CPU, memory, disk I/O).
Role of query execution plans.

Cost-Based Query Optimization

Estimation of query cost based on:
Disk I/O.
CPU usage.
Memory consumption.
Use of statistics about data distribution.

Example of Cost-Based Query Optimization

Scenario: Retrieve all employees earning more than 50,000 from a database with millions of records.
Optimization Steps:
The optimizer evaluates multiple query plans:
1. Full table scan.
2. Index scan using an index on the "salary" column.
Cost is calculated for each plan based on:
Disk I/O.
CPU usage.
Memory usage.
The plan with the lowest cost is selected (e.g., the index scan, when only a small fraction of rows match).

Result: Efficient execution with minimal resource usage.

Note: Cost estimation depends on factors such as table size, index availability, and system statistics.
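
The comparison of plans can be illustrated with a toy cost model. Everything here is an assumption made for the sketch: the formulas count only page reads, the index height of 3 and the 100 rows per page are invented, and real optimizers use much richer statistics.

```python
def full_scan_cost(table_pages):
    """A full scan reads every page of the table once."""
    return table_pages


def index_scan_cost(table_rows, selectivity, index_height=3):
    """Walk the index, then fetch one page per matching row (pessimistic)."""
    return index_height + selectivity * table_rows


def choose_plan(table_rows, rows_per_page, selectivity):
    """Return the cheapest plan name and the estimated cost of each plan."""
    table_pages = -(-table_rows // rows_per_page)  # ceiling division
    costs = {
        "full table scan": full_scan_cost(table_pages),
        "index scan": index_scan_cost(table_rows, selectivity),
    }
    return min(costs, key=costs.get), costs


# Few employees earn more than the threshold: the index scan is cheaper.
selective_plan, selective_costs = choose_plan(1_000_000, 100, selectivity=0.001)

# Half of the employees match: reading the whole table is cheaper.
broad_plan, broad_costs = choose_plan(1_000_000, 100, selectivity=0.5)
```

The second call shows why the optimizer needs statistics about data distribution. The same query text can call for a different plan depending on how many rows satisfy the condition.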

Rule-Based Query Optimization

Transformation Rules:
Join reordering.
Predicate pushdown. (A predicate is a statement about the data that, in computer science, takes one of two values: true or false.)
Simpler than cost-based but less effective for complex queries.

Example of Rule-Based Query Optimization
Scenario: Retrieve data from two tables using a join operation.
Optimization Rules:
Apply join order rules:
Smaller tables are joined first to reduce intermediate result size.
Filters (WHERE clauses) are applied early to reduce data volume.
Use indexed columns for join conditions wherever possible.

Example Query:
SELECT e.name, d.name
FROM employees e
JOIN departments d
ON e.dept_id = d.dept_id
WHERE d.name = 'Sales';

Optimization Outcome:
The optimizer applies rules to determine join order and execution.
Filters are applied before joining to minimize processing overhead.

Result: Optimized query execution based on predefined rules.
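
The effect of applying the filter before the join can be sketched in plain Python with a naive nested-loop join. The sample rows are hypothetical; the point is only that both plans return the same result while the second one builds a smaller intermediate result.

```python
# Hypothetical rows for the employees and departments tables.
employees = [
    {"name": "Ada", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
    {"name": "Eve", "dept_id": 1},
    {"name": "Dan", "dept_id": 3},
]
departments = [
    {"dept_id": 1, "name": "Sales"},
    {"dept_id": 2, "name": "HR"},
    {"dept_id": 3, "name": "IT"},
]


def join(left, right):
    """Naive nested-loop join on dept_id."""
    return [(e, d) for e in left for d in right if e["dept_id"] == d["dept_id"]]


# Plan A: join everything, then apply the WHERE filter.
joined = join(employees, departments)
plan_a = [(e["name"], d["name"]) for e, d in joined if d["name"] == "Sales"]

# Plan B: push the filter below the join, then join the reduced input.
sales_only = [d for d in departments if d["name"] == "Sales"]
pushed = join(employees, sales_only)
plan_b = [(e["name"], d["name"]) for e, d in pushed]
```

Plan A materializes four joined pairs before filtering, while plan B materializes two. A rule-based optimizer applies this transformation without estimating costs, simply because filtering early is assumed to help.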


Common Optimization Techniques

Query rewriting:
Replace subqueries with joins.
Materialized views:
Store precomputed results for reuse.
Avoid redundant calculations.

Query Optimization Technique: Query Rewriting
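Definition: Query rewriting involves transforming a query into a semantically equivalent query that is more efficient to execute.
Example:
Original Query (using a subquery):

SELECT name
FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE name = 'Sales');

Rewritten Query (using a join; equivalent when dept_id is unique in departments):

SELECT e.name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.name = 'Sales';

Advantages:
Gives the optimizer more freedom to choose the join order and to use indexes on the join columns.
Can simplify the execution plan, since the filter on departments is applied before the join.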

Outcome: Improved query performance with lower resource usage.
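
A rewrite is only valid if both forms return the same rows. As a minimal sketch, the snippet below checks one common rewrite, replacing a subquery with a join, on a small SQLite database. The schema and sample rows are hypothetical, and the equivalence relies on dept_id being unique in departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'HR');
    INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Bob', 2), (3, 'Eve', 1);
""")

# Original form: filter employees through a subquery.
subquery_form = """
    SELECT name FROM employees
    WHERE dept_id IN (SELECT dept_id FROM departments WHERE name = 'Sales')
    ORDER BY name
"""

# Rewritten form: the same condition expressed as a join.
join_form = """
    SELECT e.name FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    WHERE d.name = 'Sales'
    ORDER BY e.name
"""

subquery_rows = conn.execute(subquery_form).fetchall()
join_rows = conn.execute(join_form).fetchall()
```

Comparing the two result sets on representative data is a quick sanity check before adopting a rewrite. It does not prove equivalence in general.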


Query Optimization Technique: Materialized Views
Definition: A materialized view stores precomputed query results for quick retrieval. (The view is the result of a query.)
Example:
Original Query:

SELECT department, AVG(salary)
FROM employees
GROUP BY department;

Optimization with Materialized View:

CREATE MATERIALIZED VIEW avg_salary AS
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

SELECT avg_salary FROM avg_salary WHERE department = 'Sales';

Advantages:
Avoids recomputation of aggregate data.
Reduces query execution time significantly.

Outcome: Faster response time for repeated queries.

Query Optimization Technique: Avoid Redundant Calculations
Definition: Avoiding redundant calculations means eliminating unnecessary computations during query execution.
Example:
Original Query:

SELECT SUM(salary) / COUNT(*) AS avg_salary
FROM employees
WHERE department = 'Sales';

Optimized Query (with precomputed aggregate):

SELECT avg_salary
FROM precomputed_aggregates
WHERE department = 'Sales';

Advantages:
Reduces execution time by using precomputed results.
Minimizes resource usage during query execution.

Outcome: Increased efficiency for repetitive and resource-intensive queries.
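
The slide does not say how precomputed_aggregates is filled. One possible approach, sketched here with sqlite3 and made-up rows, is to rebuild it from employees whenever the source data changes; the refresh_aggregates helper is hypothetical. SQLite has no CREATE MATERIALIZED VIEW statement, so a plain summary table plays that role in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Ada', 'Sales', 60000), ('Eve', 'Sales', 40000), ('Bob', 'HR', 50000);
""")


def refresh_aggregates(conn):
    """Rebuild the summary table; call again after employees changes."""
    conn.executescript("""
        DROP TABLE IF EXISTS precomputed_aggregates;
        CREATE TABLE precomputed_aggregates AS
        SELECT department, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department;
    """)


refresh_aggregates(conn)

# Original query: recomputes the average on every call.
direct = conn.execute(
    "SELECT SUM(salary) / COUNT(*) FROM employees WHERE department = 'Sales'"
).fetchone()[0]

# Optimized query: reads the stored result.
cached = conn.execute(
    "SELECT avg_salary FROM precomputed_aggregates WHERE department = 'Sales'"
).fetchone()[0]
```

The stored value is only as fresh as the last refresh, so this technique fits data that is read far more often than it is written.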


Case Studies and Applications

Case Studies

E-commerce System:
Challenges: High query volume, real-time data analysis.
Solution: Denormalization and indexing.
Data Warehousing:
Challenges: Complex analytical queries.
Solution: Star schema with materialized views.

Hands-On Exercise

Exercise

Given a denormalized database table, identify possible indexes to improve query performance.
Write an optimized query for retrieving data from a large dataset.

Conclusion

Advanced database design improves performance and scalability.
Query optimization is critical for efficient data access.
Tools and techniques must be applied based on specific use cases.

Further Reading

"Database System Concepts" by Silberschatz et al.
Research papers on cost-based query optimization.
Tutorials for query tuning in MySQL and PostgreSQL.
