lecture-5
Evrad KAMTCHOUM
4 Hands-On Exercise
Definition:
Entity-Relationship (ER) modeling is a technique used to visually model and design databases.
It represents entities, attributes, and relationships.
Key Components:
Entities: Objects or concepts (e.g., Student, Course).
Attributes: Properties or characteristics of entities (e.g., Name, Age).
Relationships: Connections between entities (e.g., A Student *enrolls*
in a Course).
Diagram Notations:
Rectangles: Entities.
Diamonds: Relationships.
Ovals: Attributes.
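As a rough sketch of how the Student/Course example might be turned into tables, the snippet below uses Python's built-in sqlite3 module. The table and column names (student_id, course_id, title, enrollment) are illustrative assumptions, not part of the lecture. Each entity becomes a table, each attribute a column, and the many-to-many "enrolls" relationship becomes a junction table.

```python
import sqlite3

def build_er_schema():
    """Create tables for the Student/Course example (names are illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE student (              -- entity
            student_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,             -- attribute
            age INTEGER                     -- attribute
        );
        CREATE TABLE course (               -- entity
            course_id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE enrollment (           -- relationship: Student enrolls in Course
            student_id INTEGER NOT NULL REFERENCES student(student_id),
            course_id INTEGER NOT NULL REFERENCES course(course_id),
            PRIMARY KEY (student_id, course_id)
        );
    """)
    return conn

conn = build_er_schema()
conn.execute("INSERT INTO student VALUES (1, 'Ada', 20)")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (1, 10)")
```

The composite primary key on the junction table is one possible design choice: it prevents the same student from being enrolled twice in the same course.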
Definition:
Normalization is the process of organizing data to reduce redundancy and improve data
integrity.
Normalization Forms:
1NF: Eliminate repeating groups and ensure every column holds atomic values.
2NF: Remove partial dependencies (every non-key attribute depends on the whole primary key).
3NF: Remove transitive dependencies.
BCNF: Ensure every determinant is a candidate key.
Advantages:
Reduces redundancy.
Prevents update anomalies.
Improves data consistency.
Example:
A non-normalized table can be split into two normalized tables with
proper keys.
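To make the split concrete, here is a minimal sketch using Python's sqlite3 module. The flat order rows, the table names (customer, orders) and the choice of email as the customer's natural key are all assumptions made for illustration. Customer details that were repeated on every order row are stored once and referenced by key.

```python
import sqlite3

def normalize_orders(flat_rows):
    """Split flat (order_id, customer_name, customer_email, item) rows into two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            item TEXT NOT NULL
        );
    """)
    for order_id, name, email, item in flat_rows:
        # Store each customer once; a repeated email is skipped.
        conn.execute(
            "INSERT OR IGNORE INTO customer (name, email) VALUES (?, ?)",
            (name, email),
        )
        (customer_id,) = conn.execute(
            "SELECT customer_id FROM customer WHERE email = ?", (email,)
        ).fetchone()
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, item)
        )
    return conn

# Non-normalized input: Ada's name and email are repeated on every order.
flat = [
    (1, "Ada", "ada@example.com", "Keyboard"),
    (2, "Ada", "ada@example.com", "Mouse"),
    (3, "Bob", "bob@example.com", "Monitor"),
]
conn = normalize_orders(flat)
```

After the split, changing Ada's email means updating one row in customer rather than every order she placed, which is the update anomaly that normalization is meant to prevent.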
Types of Indexes:
B-trees: General-purpose indexing.
Hash indexes: For exact matches.
Full-text indexes: For searching large text fields.
Considerations:
Trade-offs between read and write performance: indexes speed up reads but add overhead to every insert, update, and delete.
Index the columns that are queried most frequently.
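One way to see the effect of an index is to compare the query plan before and after creating it. The sketch below assumes SQLite (through Python's sqlite3 module) and a hypothetical employees table; the exact wording of the plan text differs between SQLite versions, so the code only stores it for inspection.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the plan description SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("emp%d" % i, 30000 + i * 100) for i in range(1000)],
)

sql = "SELECT name FROM employees WHERE salary > 50000"
plan_before = query_plan(conn, sql)   # no index yet: the table is scanned

conn.execute("CREATE INDEX idx_employees_salary ON employees (salary)")
plan_after = query_plan(conn, sql)    # the planner can now use the index

print(plan_before)
print(plan_after)
```

The index name idx_employees_salary is an arbitrary choice. Every later INSERT or UPDATE on salary also has to maintain this index, which is the write-side cost of the trade-off.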
Partitioning:
Horizontal: Rows divided across partitions.
Vertical: Columns divided into subsets.
Sharding:
Splitting data across multiple servers.
Example: Splitting users by geographic region.
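The geographic example can be sketched as a small routing function. The region names and server names (db-eu, db-af, db-am) below are hypothetical; a real deployment would map shards to actual connection settings.

```python
# Hypothetical mapping from region to the server holding that region's users.
SHARDS = {
    "europe": "db-eu",
    "africa": "db-af",
    "americas": "db-am",
}

def shard_for(user):
    """Pick the server that stores a user, based on the user's region."""
    region = user["region"]
    if region not in SHARDS:
        raise ValueError("no shard configured for region %r" % region)
    return SHARDS[region]

def group_by_shard(users):
    """Show which user names end up on which server."""
    placement = {}
    for user in users:
        placement.setdefault(shard_for(user), []).append(user["name"])
    return placement

users = [
    {"name": "Ada", "region": "europe"},
    {"name": "Kofi", "region": "africa"},
    {"name": "Lea", "region": "europe"},
]
placement = group_by_shard(users)
```

Routing by region keeps each user's data on one server, but it can leave shards unevenly loaded if one region has far more users than the others.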
Key Features of Vertical Partitioning:
Ideal when certain groups of columns are frequently accessed together.
Can optimize storage and query performance by reducing the amount of data read.
Advantages:
Reduces I/O overhead by accessing only relevant columns.
Improves cache utilization and data locality.
Allows separation of sensitive data into different partitions for security.
Example:
A "Customer" table is split into two partitions:
Partition 1: CustomerID, Name, ContactInfo.
Partition 2: CustomerID, Preferences, PurchaseHistory.
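The Customer split above might look like this in SQLite (via Python's sqlite3 module). The table names customer_core and customer_activity and the sample row are assumptions for illustration; both partitions share CustomerID so the full row can be rebuilt with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Partition 1: columns needed for everyday lookups.
    CREATE TABLE customer_core (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        contact_info TEXT
    );
    -- Partition 2: wider, less frequently read columns.
    CREATE TABLE customer_activity (
        customer_id INTEGER PRIMARY KEY REFERENCES customer_core(customer_id),
        preferences TEXT,
        purchase_history TEXT
    );
    INSERT INTO customer_core VALUES (1, 'Ada', 'ada@example.com');
    INSERT INTO customer_activity VALUES (1, 'dark mode', 'order-17, order-42');
""")

def contact_card(customer_id):
    """Read only partition 1; the activity columns are never touched."""
    return conn.execute(
        "SELECT name, contact_info FROM customer_core WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()

def full_profile(customer_id):
    """Rebuild the original wide row by joining the partitions on the key."""
    return conn.execute(
        "SELECT c.customer_id, c.name, c.contact_info, "
        "       a.preferences, a.purchase_history "
        "FROM customer_core c "
        "JOIN customer_activity a ON a.customer_id = c.customer_id "
        "WHERE c.customer_id = ?",
        (customer_id,),
    ).fetchone()
```

Queries that need columns from both partitions pay for the join, so the split only helps when most queries touch a single partition.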
Goals:
Minimize query execution time.
Reduce resource utilization (CPU, memory, disk I/O).
Understand the role of query execution plans, which describe how the database will run a query.
Scenario: Retrieve all employees earning more than 50,000 from a
database with millions of records.
Optimization Steps:
The optimizer evaluates multiple query plans:
1. Full table scan.
2. Index scan using an index on the "salary" column.
Cost is calculated for each plan based on:
Disk I/O.
CPU usage.
Memory usage.
The plan with the lowest cost (e.g., the index scan) is selected.
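The selection step can be illustrated with a toy cost model. Everything numeric here is an assumption: the weights and the per-plan estimates are made-up values, and real optimizers derive their estimates from table statistics rather than fixed numbers.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    disk_io: float   # estimated page reads
    cpu: float       # estimated rows processed
    memory: float    # estimated buffer pages needed

# Hypothetical weights: disk I/O is treated as the most expensive resource.
WEIGHTS = {"disk_io": 1.0, "cpu": 0.2, "memory": 0.1}

def cost(plan):
    """Combine the three estimates into a single comparable number."""
    return (
        WEIGHTS["disk_io"] * plan.disk_io
        + WEIGHTS["cpu"] * plan.cpu
        + WEIGHTS["memory"] * plan.memory
    )

def choose_plan(plans):
    """Pick the candidate plan with the lowest estimated cost."""
    return min(plans, key=cost)

# Made-up estimates for the salary query in the scenario.
plans = [
    Plan("full table scan", disk_io=10000, cpu=5000, memory=100),
    Plan("index scan on salary", disk_io=800, cpu=600, memory=150),
]
best = choose_plan(plans)
print(best.name)
```

With these estimates the index scan wins. If the predicate matched most of the table, the index scan's estimates would rise and the full table scan could become the cheaper plan.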
Transformation Rules:
Join reordering.
Predicate pushdown. A predicate is a condition on the data (such as a WHERE filter); in computer science
it has two possible values, true or false.
Rule-based optimization is simpler than cost-based optimization but less effective for complex queries.
Example Query:
SELECT e.name, d.name
FROM employees e
JOIN departments d
ON e.dept_id = d.dept_id
WHERE d.name = 'Sales';
Optimization Outcome:
The optimizer applies rules to determine join order and execution.
Filters are applied before joining to minimize processing overhead.
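The effect of filtering before joining can be sketched in plain Python. The sample rows are invented, and the nested-loop join is a simplification of what a database engine does, but it shows that both orders give the same answer while the pushed-down version builds fewer intermediate rows.

```python
employees = [
    {"name": "Ada", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "name": "Sales"},
    {"dept_id": 2, "name": "HR"},
]

def join_then_filter(employees, departments, dept_name):
    """Join everything first, then apply the predicate to the joined rows."""
    joined = [
        (e, d) for e in employees for d in departments
        if e["dept_id"] == d["dept_id"]
    ]
    result = [(e["name"], d["name"]) for e, d in joined if d["name"] == dept_name]
    return result, len(joined)

def filter_then_join(employees, departments, dept_name):
    """Predicate pushdown: filter departments first, then join."""
    wanted = [d for d in departments if d["name"] == dept_name]
    joined = [
        (e, d) for e in employees for d in wanted
        if e["dept_id"] == d["dept_id"]
    ]
    return [(e["name"], d["name"]) for e, d in joined], len(joined)

late_result, late_rows = join_then_filter(employees, departments, "Sales")
early_result, early_rows = filter_then_join(employees, departments, "Sales")
```

The second value returned by each function is the number of intermediate joined rows, which is the work the pushdown saves.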
Query rewriting:
Replace subqueries with joins.
Materialized views:
Store precomputed results for reuse.
Avoid redundant calculations.
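As an illustration of replacing a subquery with a join, the sketch below runs both forms against a small SQLite database and compares the results. The tables mirror the employees/departments example, the sample data is invented, and the two forms are only equivalent here because dept_id is unique in departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES departments(dept_id)
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'HR');
    INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Bob', 2), (3, 'Cy', 1);
""")

# Original form: filter employees through a subquery on departments.
SUBQUERY_SQL = """
    SELECT e.name
    FROM employees e
    WHERE e.dept_id IN (
        SELECT d.dept_id FROM departments d WHERE d.name = 'Sales'
    )
    ORDER BY e.name
"""

# Rewritten form: the same question expressed as a join.
JOIN_SQL = """
    SELECT e.name
    FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    WHERE d.name = 'Sales'
    ORDER BY e.name
"""

subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
join_rows = conn.execute(JOIN_SQL).fetchall()
```

Many modern optimizers perform this rewrite on their own, so the manual version tends to matter most on engines or queries where the planner does not.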
SELECT name
FROM employees
WHERE salary > 50000;
SELECT name
FROM employees
WHERE salary > 50000 AND active = 1;
Advantages:
Reduces data scanned by leveraging indexes.
Simplifies execution plan by pre-filtering data.
Advantages:
Avoids recomputation of aggregate data.
Reduces query execution time significantly.
Advantages:
Reduces execution time by using precomputed results.
Minimizes resource usage during query execution.
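The idea of reusing precomputed results can be sketched with a summary table. SQLite has no built-in materialized views, so this example emulates one with CREATE TABLE ... AS SELECT and an explicit refresh function; the sales table, the sales_summary name and the data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 100), ('north', 50), ('south', 70);
""")

def refresh_sales_summary(conn):
    """Recompute the stored aggregate, as a materialized-view refresh would."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_summary;
        CREATE TABLE sales_summary AS
            SELECT region, SUM(amount) AS total
            FROM sales
            GROUP BY region;
    """)

def total_for(conn, region):
    """Read the precomputed total instead of re-aggregating the sales table."""
    row = conn.execute(
        "SELECT total FROM sales_summary WHERE region = ?", (region,)
    ).fetchone()
    return row[0] if row else None

refresh_sales_summary(conn)
north_total = total_for(conn, "north")
```

The stored totals go stale when sales changes, so something has to call the refresh. Databases with native materialized views expose this as a refresh command or an automatic policy.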
E-commerce System:
Challenges: High query volume, real-time data analysis.
Solution: Denormalization and indexing.
Data Warehousing:
Challenges: Complex analytical queries.
Solution: Star schema with materialized views.
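For the data warehousing case, a star schema places one central fact table among several dimension tables. The sketch below is a minimal SQLite version; the table names (fact_sales, dim_product, dim_date), the columns and the data are all illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used for filtering and grouping.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- Fact table: one row per sale, pointing at the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 20), (1, 11, 30), (2, 11, 15);
""")

def revenue_by_category(conn, year):
    """Typical analytical query: aggregate the facts, grouped by a dimension."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (year,),
    ).fetchall()

report_2024 = revenue_by_category(conn, 2024)
```

An aggregate like this one is a natural candidate for a materialized view, since the same grouped totals tend to be requested repeatedly.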