Data Redundancy
The file system's structure makes it difficult to combine data from multiple sources and
leaves the data vulnerable to security breaches. Its organization leads to storing the same
data in different locations, which are then updated inconsistently. Data redundancy occurs
when the same data is unnecessarily stored in various places, leading to different versions of
the same information. Uncontrolled data redundancy causes the following problems:
1. Poor data security: Having multiple copies of data increases the risk of unauthorized access.
2. Data inconsistency: Different and conflicting versions of the same data appear in different
places.
3. Data-entry errors: Complex entries are more prone to errors when made in several files or
recur frequently.
4. Data integrity problems: Inaccurate entries, like a non-existent sales agent's name, can lead
to customer dissatisfaction.
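The inconsistency problem can be illustrated with a minimal sketch. The two "files", the agent name, and the phone numbers below are hypothetical and chosen only for illustration; the point is that updating one copy of redundant data leaves the other copy stale.

```python
# Hypothetical data: the same agent phone number is stored in two separate files.
customer_file = [
    {"customer": "C-01", "agent": "A. Agent", "agent_phone": "555-0100"},
    {"customer": "C-02", "agent": "A. Agent", "agent_phone": "555-0100"},
]
sales_file = [
    {"invoice": 1001, "agent": "A. Agent", "agent_phone": "555-0100"},
]

# The agent's phone number changes, but only one file is updated.
for row in customer_file:
    if row["agent"] == "A. Agent":
        row["agent_phone"] = "555-0199"


def phone_versions(agent, *files):
    """Collect every distinct phone number stored for an agent across all files."""
    return {row["agent_phone"] for f in files for row in f if row["agent"] == agent}


versions = phone_versions("A. Agent", customer_file, sales_file)
print(versions)  # more than one value means the stored copies now disagree
```

Storing the phone number once and referring to it from both places would make this kind of disagreement impossible.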
Dependencies
Determination means that knowing the value of one attribute allows you to determine the value
of another. In a database, this is typically based on the relationships between attributes.
Functional dependence describes relationships where the value of one or more attributes
determines the value of other attributes. The determining attribute is called the determinant or
key, while the attribute being determined is the dependent. For example, in a Student table,
STU_NUM is the determinant and STU_LNAME is the dependent because STU_NUM determines
STU_LNAME. Functional dependence can also involve multiple determinants and dependents.
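A functional dependence can be checked against sample data with a short sketch like the one below. The helper name `holds`, the sample rows, and the `STU_GPA` column are assumptions made for illustration. Sample data can only refute a dependence; whether it truly holds is a matter of the business rules.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not contradicted by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical Student rows.
students = [
    {"STU_NUM": 1, "STU_LNAME": "Smith", "STU_GPA": 3.1},
    {"STU_NUM": 2, "STU_LNAME": "Smith", "STU_GPA": 2.7},
    {"STU_NUM": 3, "STU_LNAME": "Jones", "STU_GPA": 3.1},
]

print(holds(students, ["STU_NUM"], ["STU_LNAME"]))  # True
print(holds(students, ["STU_LNAME"], ["STU_NUM"]))  # False: two students are named Smith
```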
Types of keys
A composite key consists of more than one attribute. An attribute that is part of a key is known
as a key attribute.
A super-key is a key that can uniquely identify any row in a table, meaning it functionally
determines every attribute in the row. A composite containing a super-key is also a super-key.
However, not all keys are super-keys.
A candidate key is a minimal super-key, meaning it has no unnecessary attributes and is based
on full functional dependency. A table can have multiple candidate keys, and they are called
candidate keys because they are the eligible options from which the designer will choose the
primary key.
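The difference between a super-key and a candidate key can be sketched by brute force over a small sample. The function names and the `STU_EMAIL` column are hypothetical, and uniqueness in sample rows only suggests a key; the real decision rests on what the data means.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """A set of attributes is a super-key if no two rows share the same values for it."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows):
    """Return the minimal super-keys, trying the smallest attribute sets first."""
    attributes = sorted(rows[0])
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A combination that contains an already-found key is not minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_superkey(rows, combo):
                found.append(combo)
    return found


# Hypothetical Student rows.
students = [
    {"STU_NUM": 1, "STU_EMAIL": "a@example.edu", "STU_LNAME": "Smith"},
    {"STU_NUM": 2, "STU_EMAIL": "b@example.edu", "STU_LNAME": "Smith"},
    {"STU_NUM": 3, "STU_EMAIL": "c@example.edu", "STU_LNAME": "Jones"},
]

print(is_superkey(students, ["STU_NUM", "STU_LNAME"]))  # True, but not minimal
print(candidate_keys(students))  # [('STU_EMAIL',), ('STU_NUM',)]
```

Here (STU_NUM, STU_LNAME) is a super-key but not a candidate key, because STU_LNAME is unnecessary.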
The primary key is the candidate key selected to uniquely identify the rows of a table. Entity
integrity ensures that each row in the table has a unique identity. To maintain entity integrity, the
primary key must have unique values, and no key attribute within the primary key can be null.
A null represents the absence of any data value and is never permitted in any part of the primary
key. Nulls should generally be avoided because they can indicate poor design and their meaning
is often unclear. For instance, a null could signify an unknown attribute value, a known but
missing attribute value, or a "not applicable" condition.
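Entity integrity is usually enforced by the DBMS itself. The sketch below uses Python's built-in sqlite3 module with a hypothetical `student` table to show a duplicate key and a null key both being rejected, while a null in a non-key column is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly: SQLite otherwise tolerates NULLs in a non-integer primary key.
conn.execute("CREATE TABLE student (stu_num TEXT PRIMARY KEY NOT NULL, stu_lname TEXT)")
conn.execute("INSERT INTO student VALUES ('S001', 'Smith')")


def try_insert(stu_num, stu_lname):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (stu_num, stu_lname))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


duplicate_result = try_insert("S001", "Jones")  # same key as an existing row
null_result = try_insert(None, "Brown")         # null in the primary key
ok_result = try_insert("S002", None)            # null in a non-key attribute

print(duplicate_result)
print(null_result)
print(ok_result)
```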
Normalisation
In database management systems (DBMS), normal forms are a series of guidelines that help
to ensure that the design of a database is efficient, organized, and free from data anomalies.
There are several levels of normalization, each with its own set of guidelines, known as normal
forms.
First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell
should contain only a single value, each column should have a unique name, and each row should
be identified by a primary key. The first normal form eliminates repeating groups and
simplifies queries.
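As a rough sketch of the single-value rule, the function below splits a multi-valued cell into one row per value. The table, column names, and the helper `to_1nf` are hypothetical.

```python
# A table that violates 1NF: the "phones" cell holds several values at once.
unnormalized = [
    {"cus_id": 1, "name": "Ada", "phones": "555-0100, 555-0101"},
    {"cus_id": 2, "name": "Ben", "phones": "555-0102"},
]


def to_1nf(rows, multi_valued, single_name, sep=","):
    """Emit one row per value found in the multi-valued column."""
    out = []
    for row in rows:
        for value in row[multi_valued].split(sep):
            new_row = {k: v for k, v in row.items() if k != multi_valued}
            new_row[single_name] = value.strip()
            out.append(new_row)
    return out


customer_phone = to_1nf(unnormalized, "phones", "phone")
for row in customer_phone:
    print(row)
```

After the split, the primary key would be the combination of cus_id and phone, since cus_id alone no longer identifies a row.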
Second Normal Form (2NF): A table is in 2NF when it is in 1NF and every non-key attribute
depends on the whole primary key. This removes partial dependencies, where an attribute
depends on only part of a composite key. A 1NF table whose primary key is a single attribute
is automatically in 2NF.
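A 2NF decomposition can be sketched as two projections. In the hypothetical `enrollment` table below, the key is (stu_num, class_code), but stu_lname depends on stu_num alone, so it is moved into its own table.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples while preserving order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


# Hypothetical table with composite key (stu_num, class_code).
enrollment = [
    {"stu_num": 1, "class_code": "DB101", "stu_lname": "Smith", "grade": "A"},
    {"stu_num": 1, "class_code": "OS201", "stu_lname": "Smith", "grade": "B"},
    {"stu_num": 2, "class_code": "DB101", "stu_lname": "Jones", "grade": "B"},
]

# stu_lname depends only on stu_num, so it gets its own table.
student = project(enrollment, ["stu_num", "stu_lname"])
# grade depends on the whole key, so it stays with (stu_num, class_code).
enroll = project(enrollment, ["stu_num", "class_code", "grade"])

print(student)  # each last name is now stored once per student
print(enroll)
```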
Third Normal Form (3NF): 3NF builds on 2NF by removing transitive dependencies, where a
non-key attribute depends on another non-key attribute rather than on the key. Each non-key
column should be determined directly by the primary key, and not by any other non-key column
in the same table.
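The sketch below shows one possible 3NF design using sqlite3. The tables and values are hypothetical: if dept_name were stored in the student table it would depend on dept_code rather than on stu_num, so it is kept in a separate department table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_code TEXT PRIMARY KEY NOT NULL,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE student (
        stu_num   INTEGER PRIMARY KEY,
        stu_lname TEXT NOT NULL,
        dept_code TEXT NOT NULL REFERENCES department(dept_code)
    );
    INSERT INTO department VALUES ('CIS', 'Computer Info Systems');
    INSERT INTO student VALUES (1, 'Smith', 'CIS'), (2, 'Jones', 'CIS');
""")

# The department name is stored once, so renaming it is a single-row update.
conn.execute("UPDATE department SET dept_name = 'Computing' WHERE dept_code = 'CIS'")

names = conn.execute("""
    SELECT s.stu_lname, d.dept_name
    FROM student AS s JOIN department AS d USING (dept_code)
    ORDER BY s.stu_num
""").fetchall()
print(names)  # every student sees the new name
```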
Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that each
determinant in a table is a candidate key. In other words, BCNF ensures that each non-key
attribute is dependent only on the candidate key.
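One way to test the BCNF condition is to compute attribute closures from a list of functional dependencies. The sketch below is illustrative only; the relation (student, course, instructor) and its dependencies are a hypothetical example, and the function names are assumptions.

```python
def closure(attrs, fds):
    """Compute the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose determinant is not a super-key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


# Hypothetical rules: a student takes a course from one instructor,
# and each instructor teaches exactly one course.
attrs = {"student", "course", "instructor"}
fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]

violations = bcnf_violations(attrs, fds)
print(violations)  # [(('instructor',), ('course',))]
```

The determinant instructor does not determine every attribute, so the table is in 3NF but not in BCNF.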
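Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does
not contain independent multi-valued dependencies, that is, two or more unrelated multi-valued
facts about the same entity stored in one table.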
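A small sketch of why independent multi-valued facts cause trouble, and how a 4NF decomposition avoids it. The employee, skills, and languages are hypothetical.

```python
from itertools import product

# Two independent multi-valued facts about the same employee.
skills = {"Ann": ["SQL", "Python"]}
languages = {"Ann": ["English", "French"]}

# Stored in one table, every skill must be paired with every language;
# otherwise the table would suggest a link between a skill and a language.
combined = [
    (emp, skill, lang)
    for emp in skills
    for skill, lang in product(skills[emp], languages[emp])
]

# 4NF decomposition: one table per independent fact.
emp_skill = sorted({(emp, skill) for emp, skill, _ in combined})
emp_language = sorted({(emp, lang) for emp, _, lang in combined})

print(len(combined))  # 4 rows for 2 skills and 2 languages
print(emp_skill)
print(emp_language)
```

The combined table grows multiplicatively with each new skill or language, while the two decomposed tables grow by one row.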
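Fifth Normal Form (5NF): 5NF, also called project-join normal form, goes beyond 4NF by
decomposing a table into smaller tables until it cannot be split further without losing
information. It is the highest of the normal forms listed here and removes the remaining data
redundancy caused by join dependencies.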
DB Performance Tuning
Database performance tuning refers to a set of activities and procedures designed to reduce
the response time of the database system.
One of the main functions of a database system is to provide timely answers to end users. End
users interact with the DBMS through queries to generate information, using the following
sequence:
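1. The end-user (client-end) application generates a query.
2. The query is sent to the DBMS (server end).
3. The DBMS (server end) executes the query.
4. The DBMS sends the resulting data set to the end-user (client-end) application.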
On the client side, SQL performance tuning aims to create queries that return the correct
answers quickly and with minimal server resources.
On the server side, DBMS performance tuning involves configuring the DBMS environment to
respond to client requests as quickly as possible while optimizing resource usage.
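As one illustration of tuning, the sketch below adds an index and compares the access plan SQLite reports before and after. Indexing is only one of many tuning techniques and is used here as an assumption, not as the method these notes prescribe. The table and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_lname TEXT, cus_balance REAL)"
)
conn.executemany(
    "INSERT INTO customer (cus_lname, cus_balance) VALUES (?, ?)",
    [(f"Name{i}", float(i)) for i in range(1000)],
)


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT * FROM customer WHERE cus_lname = ?"

before = plan(query, ("Name500",))  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_customer_lname ON customer (cus_lname)")
after = plan(query, ("Name500",))   # the optimizer can now search the index

print("before:", before)
print("after: ", after)
```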
Query Optimization
Query optimization is the part of the DBMS that determines the most efficient way to execute
a given query.
Query optimization algorithms can be classified according to when the optimization is done.
Within this timing classification, query optimization algorithms can be static or dynamic.
Static query optimization occurs at compilation time, meaning the best optimization strategy
is chosen when the query is compiled by the DBMS.
Dynamic query optimization occurs at execution time. The database access strategy is
defined when the program runs, allowing the DBMS to use the most up-to-date information to
determine the access strategy dynamically.
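The difference in timing can be illustrated with a toy model. The cost rule, the 10% threshold, and the statistics below are assumptions invented for this sketch and do not describe how any particular DBMS chooses its plans.

```python
def choose_access_path(row_count, matching_rows):
    """Toy rule (an assumption): use the index only when few rows match."""
    selectivity = matching_rows / row_count
    return "index scan" if selectivity < 0.1 else "full table scan"


# Statistics known when the query is compiled.
compile_time_stats = {"row_count": 1000, "matching_rows": 10}
# Statistics at execution time, after the data has changed.
run_time_stats = {"row_count": 1000, "matching_rows": 600}

# Static optimization: the plan is fixed at compile time and reused later.
static_plan = choose_access_path(**compile_time_stats)

# Dynamic optimization: the plan is chosen again each time the query runs.
dynamic_plan = choose_access_path(**run_time_stats)

print(static_plan)   # index scan (based on old statistics)
print(dynamic_plan)  # full table scan (based on current statistics)
```

The static plan costs nothing to choose at run time but can become stale; the dynamic plan reflects current data at the price of optimizing on every execution.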
Entity Relationship Model and Business Rules
An Entity Relationship (ER) model is a visual representation of the data and its relationships
within a database. It uses entities, attributes, and relationships to organize data:
1. Entities: These are objects or things in the business that have data stored about them, like
"Customer" or "Product."
2. Attributes: These are details about the entities, like a customer's name or a product's price.
3. Relationships: These show how entities are related to each other, such as a customer
placing an order.
Business rules are specific conditions or guidelines that define how data can be created,
stored, and modified. These rules ensure the database accurately reflects the business's
processes and policies. For example, a business rule might state that each order must be linked
to a customer, ensuring that orders cannot exist without an associated customer.
In simplest terms, an ER model maps out what data is stored and how it's connected, while
business rules define the constraints and guidelines for handling that data.
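The example business rule, that each order must be linked to a customer, can be enforced with a mandatory foreign key. The sketch below uses sqlite3; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        cus_id   INTEGER PRIMARY KEY,
        cus_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cus_id   INTEGER NOT NULL REFERENCES customer(cus_id)
    );
    INSERT INTO customer VALUES (1, 'Ada');
""")


def place_order(order_id, cus_id):
    """Return True if the DBMS accepts the order, False if a rule rejects it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, cus_id))
        return True
    except sqlite3.IntegrityError:
        return False


valid = place_order(100, 1)       # customer 1 exists
orphan = place_order(101, 99)     # no such customer
missing = place_order(102, None)  # no customer given at all

print(valid, orphan, missing)  # True False False
```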
To derive an ER diagram from existing tables, work through the following steps:
1. Identify Entities:
   - Determine the main subjects or objects in the table. Each row usually represents one
     instance of an entity.
   - Example: In a "Customer" table, each row is a customer, so "Customer" is the entity.
2. Identify Attributes:
   - Identify the columns in the table; these are the attributes of the entities.
   - Example: In the "Customer" table, attributes might include "Customer_ID," "Name,"
     "Address," "Phone," etc.
3. Determine Primary Key:
   - Identify the primary key, a unique identifier for each entity.
   - Example: "Customer_ID" might be the primary key in the "Customer" table.
4. Identify Relationships:
   - Look for columns that reference other tables, indicating relationships between entities.
   - Example: If the "Orders" table has a "Customer_ID" column referencing the "Customer"
     table, there's a relationship between "Customer" and "Order."
5. Create ER Diagram:
   - Draw rectangles for each entity.
   - Inside each rectangle, list the attributes, with the primary key highlighted or
     underlined.
   - Draw lines to represent relationships between entities, adding labels to describe the
     nature of the relationships.
6. Define Relationship Types:
   - Determine whether relationships are one-to-one, one-to-many, or many-to-many.
   - Example: One customer can place many orders (one-to-many relationship).
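The first four steps can be partly automated when the tables already exist in a database. The sketch below reads a hypothetical two-table SQLite schema and collects entities, attributes, primary keys, and foreign-key references; the schema and the `describe` helper are assumptions for illustration, and drawing the diagram is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT, address TEXT, phone TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
""")


def describe(conn):
    """Collect entities, attributes, primary keys, and references from the schema."""
    model = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        model[table] = {
            "attributes": [c[1] for c in columns],
            "primary_key": [c[1] for c in columns if c[5]],
            # A foreign key typically marks the "many" side of a one-to-many relationship.
            "references": [(fk[3], fk[2]) for fk in fks],
        }
    return model


er_model = describe(conn)
for entity, info in er_model.items():
    print(entity, info)
```

In this schema the foreign key in orders points at customer, which matches the rule that one customer can place many orders.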