DB Normalization
Normalization Rules
• Each table must have a primary key (PK) that uniquely identifies each row. The PK
can be composite, that is, it can consist of several columns, for example, Order Table
(Order ID, Order Date, Customer ID, Product ID, Product Name, Price, Quantity). Here,
Order ID and Product ID together form the composite PK.
• For second normal form (2NF), the table must first be in 1NF.
• When each value in column 1 is associated with exactly one value in column 2, we say
that column 2 is dependent on column 1, for example, Customer (Customer ID, Customer
Name). Customer Name is dependent on Customer ID, noted as Customer ID ➤
Customer Name.
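The dependency notation above can be tested against sample rows. The following is a minimal sketch; the function name `holds`, the column names, and the sample data are all hypothetical. Note that sample data can only disprove a dependency; whether it truly holds is a business rule.

```python
def holds(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault stores the first value seen for a key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for Customer (Customer ID, Customer Name)
customers = [
    {"customer_id": 1, "customer_name": "Ann"},
    {"customer_id": 2, "customer_name": "Bob"},
    {"customer_id": 1, "customer_name": "Ann"},
]

# Customer ID -> Customer Name is consistent with these rows
fd_holds = holds(customers, ["customer_id"], ["customer_name"])

# A second name for customer 2 contradicts the dependency
conflicting = customers + [{"customer_id": 2, "customer_name": "Robert"}]
fd_broken = holds(conflicting, ["customer_id"], ["customer_name"])
```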
• In 2NF, all non-PK columns must be dependent on the entire PK, not just on part of
it, for example, Order Table (Order ID, Order Date, Product ID, Price, Quantity). Order
ID and Product ID form the composite PK. Order Date is dependent on Order ID but
not on Product ID. This violates 2NF.
• To make it 2NF, we need to break it into two tables: Order Header (Order ID, Order
Date) and Order Item (Order ID, Product ID, Price, Quantity). Now all non-PK columns
are dependent on the entire PK. In the Order Header table, Order Date is dependent on
Order ID. In the Order Item table, Price and Quantity are dependent on Order ID and
Product ID. Order ID in the Order Item table is a foreign key.
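As an illustration, the two tables above could be declared as follows. This is a sketch only, using Python's built-in sqlite3 module with an in-memory database; the snake_case names, the column types, and the sample rows are assumptions and are not part of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE order_header (
    order_id   INTEGER PRIMARY KEY,
    order_date TEXT NOT NULL
);
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES order_header(order_id),
    product_id INTEGER NOT NULL,
    price      REAL NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)  -- composite PK
);
""")

# Order Date is stored once per order, not once per order line
conn.execute("INSERT INTO order_header VALUES (1, '2007-06-01')")
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?, ?)",
    [(1, 101, 9.99, 2), (1, 102, 4.50, 1)],
)

# An item that refers to a missing order header violates the foreign key
orphan_rejected = False
try:
    conn.execute("INSERT INTO order_item VALUES (99, 101, 9.99, 1)")
except sqlite3.IntegrityError:
    orphan_rejected = True

# Joining the two tables reproduces the columns of the original Order Table
rows = conn.execute(
    "SELECT h.order_id, h.order_date, i.product_id, i.price, i.quantity "
    "FROM order_header AS h JOIN order_item AS i ON i.order_id = h.order_id "
    "ORDER BY i.product_id"
).fetchall()
```

Because Order Date lives only in the header table, changing the date of an order touches one row, regardless of how many items the order has.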
• For third normal form (3NF), the table must first be in 2NF.
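• A table such as Product (Product ID, Product Name, Category ID, Category Name)
violates 3NF because Category Name is dependent on Category ID, which in turn is
dependent on Product ID; that is, Category Name is transitively dependent on the PK.
To make it 3NF, we need to break it into two tables: Product (Product ID, Product Name,
Category ID) and Category (Category ID, Category Name). Now no column is transitively
dependent on the PK. Category ID in the Product table is a foreign key.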
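The split into Product and Category can be sketched in a few lines of Python. The sketch assumes the original table was Product (Product ID, Product Name, Category ID, Category Name), which the decomposition implies; the sample rows and variable names are hypothetical.

```python
# Hypothetical rows of the unnormalized Product table
flat = [
    {"product_id": 1, "product_name": "Album A", "category_id": 10, "category_name": "Music"},
    {"product_id": 2, "product_name": "Album B", "category_id": 10, "category_name": "Music"},
    {"product_id": 3, "product_name": "Film C", "category_id": 20, "category_name": "Film"},
]

# Product keeps only the columns that depend directly on Product ID
product = [
    {k: r[k] for k in ("product_id", "product_name", "category_id")} for r in flat
]

# Category stores each Category Name once, keyed by Category ID
category = {r["category_id"]: r["category_name"] for r in flat}

# Following the foreign key rebuilds the original rows, so no information is lost
rejoined = [dict(p, category_name=category[p["category_id"]]) for p in product]
```

After the split, renaming a category means changing one entry in `category` rather than every product row that carries the name.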
• For Boyce-Codd normal form (BCNF), the table must first be in 3NF.
• BCNF is applicable to situations where you have two or more candidate composite
PKs, such as with a cable TV service engineer visiting customers: Visit (Date, Route ID,
Shift ID, Customer ID, Engineer ID, Vehicle ID). A visit to a customer can be identified
using Date, Route ID, and Customer ID as the composite PK. Alternatively, the PK can
be Shift ID and Customer ID. Shift ID is the determinant of Date and Route ID.
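BCNF requires every determinant to be a candidate key (or to contain one). One way to check this is to compute attribute closures, as in the sketch below. Only the dependency Shift ID ➤ Date, Route ID comes from the example above; the other two dependencies are assumptions derived from the two candidate keys, and all names are hypothetical.

```python
def closure(attrs, fds):
    """Return every column determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ALL_COLUMNS = {"date", "route_id", "shift_id", "customer_id", "engineer_id", "vehicle_id"}

fds = [
    # Stated in the example: Shift ID determines Date and Route ID
    (frozenset({"shift_id"}), frozenset({"date", "route_id"})),
    # Assumed, because Shift ID + Customer ID is a candidate key
    (frozenset({"shift_id", "customer_id"}), frozenset({"engineer_id", "vehicle_id"})),
    # Assumed, because Date + Route ID + Customer ID is a candidate key
    (frozenset({"date", "route_id", "customer_id"}), frozenset({"shift_id"})),
]


def is_superkey(attrs):
    """A set of columns is a superkey if its closure covers the whole table."""
    return closure(attrs, fds) == ALL_COLUMNS


key_a = is_superkey({"date", "route_id", "customer_id"})
key_b = is_superkey({"shift_id", "customer_id"})

# BCNF is violated by any determinant that is not a superkey
bcnf_violations = [(lhs, rhs) for lhs, rhs in fds if not is_superkey(lhs)]
```

Under these assumptions, both composite keys identify a visit, but Shift ID on its own is a determinant without being a key, which is the kind of dependency BCNF rules out.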
• A table is in fourth normal form (4NF) when it is in BCNF and there are no multivalued
dependencies.
• A table is in fifth normal form (5NF) when it is in 4NF and there are no cyclic
dependencies.
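A multivalued dependency exists when one column determines two independent sets of values, so that the table has to store every combination of them. The sketch below tests for this in a hypothetical three-column table Engineer (Engineer ID, Skill, Vehicle); the table, the data, and the function name `mvd_holds` are assumptions made for illustration.

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Check X ->> Y in a table whose columns are exactly x, y and z."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # The dependency holds only if every (y, z) combination is present
        if pairs != set(product(ys, zs)):
            return False
    return True


# Engineer 1 has two skills and two vehicles, unrelated to each other
engineer = [
    {"engineer_id": 1, "skill": "cabling", "vehicle": "van"},
    {"engineer_id": 1, "skill": "cabling", "vehicle": "truck"},
    {"engineer_id": 1, "skill": "setup", "vehicle": "van"},
    {"engineer_id": 1, "skill": "setup", "vehicle": "truck"},
]
has_mvd = mvd_holds(engineer, "engineer_id", "skill", "vehicle")

# With one combination missing, skill and vehicle are no longer independent
no_mvd = mvd_holds(engineer[:3], "engineer_id", "skill", "vehicle")

# Splitting into two tables removes the redundancy, and the join restores the rows
skills = {(r["engineer_id"], r["skill"]) for r in engineer}
vehicles = {(r["engineer_id"], r["vehicle"]) for r in engineer}
rejoined = {(e, s, v) for (e, s) in skills for (e2, v) in vehicles if e == e2}
original = {(r["engineer_id"], r["skill"], r["vehicle"]) for r in engineer}
```

In this sketch the four-row table is replaced by two tables of two rows each, and adding a third skill adds one row instead of one row per vehicle.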
■Note A sixth normal form (6NF) has been suggested, but it’s not widely accepted or implemented yet.