SQL Antipatterns
Strike Back
Bill Karwin
1
1 Monday, April 20, 2009
Antipattern Categories
Database Design
Antipatterns
Database Creation
Antipatterns
Query
Antipatterns
Application
Antipatterns
CREATE TABLE BugsProducts (
bug_id INTEGER REFERENCES Bugs,
product VARCHAR(100) REFERENCES Products,
PRIMARY KEY (bug_id, product)
);
SELECT b.product, COUNT(*)
FROM BugsProducts AS b
GROUP BY b.product;
$dbHandle = new PDO(mysql:dbname=test);
$stmt = $dbHandle->prepare($sql);
$result = $stmt->fetchAll();
2
2 Monday, April 20, 2009
Antipattern Categories
Database Design
Antipatterns
Database Creation
Antipatterns
Query
Antipatterns
Application
Antipatterns
CREATE TABLE BugsProducts (
bug_id INTEGER REFERENCES Bugs,
product VARCHAR(100) REFERENCES Products,
PRIMARY KEY (bug_id, product)
);
SELECT b.product, COUNT(*)
FROM BugsProducts AS b
GROUP BY b.product;
$dbHandle = new PDO(mysql:dbname=test);
$stmt = $dbHandle->prepare($sql);
$result = $stmt->fetchAll();
3
3 Monday, April 20, 2009
Database Design Antipatterns
1. Metadata Tribbles
2. Entity-Attribute-Value
3. Polymorphic Associations
4. Naive Trees
4
4 Monday, April 20, 2009
Metadata Tribbles
5
I want these things off the ship. I dont care if it takes
every last man weve got, I want them off the ship.
James T. Kirk
5 Monday, April 20, 2009
Metadata Tribbles
Objective: improve performance
of a very large table.
6
6 Monday, April 20, 2009
Metadata Tribbles
Antipattern: separate into many tables
with similar structure
Separate tables per distinct value in attribute
e.g., per year, per month, per user,
per postal code, etc.
7
7 Monday, April 20, 2009
Metadata Tribbles
Must create a new table for each new value
CREATE TABLE Bugs_2005 ( . . . );
CREATE TABLE Bugs_2006 ( . . . );
CREATE TABLE Bugs_2007 ( . . . );
CREATE TABLE Bugs_2008 ( . . . );
. . .
8
mixing data
with metadata
8 Monday, April 20, 2009
Metadata Tribbles
Automatic primary keys cause conicts:
CREATE TABLE Bugs_2005 (bug_id SERIAL . . . );
CREATE TABLE Bugs_2006 (bug_id SERIAL . . . );
CREATE TABLE Bugs_2007 (bug_id SERIAL . . . );
CREATE TABLE Bugs_2008 (bug_id SERIAL . . . );
. . .
9
same values allocated
in multiple tables
9 Monday, April 20, 2009
Metadata Tribbles
Difcult to query across tables
SELECT b.status, COUNT(*) AS count_per_status
FROM (
SELECT * FROM Bugs_2009
UNION
SELECT * FROM Bugs_2008
UNION
SELECT * FROM Bugs_2007
UNION
SELECT * FROM Bugs_2006 ) AS b
GROUP BY b.status;
10
10 Monday, April 20, 2009
Metadata Tribbles
Table structures are not kept in sync
ALTER TABLE Bugs_2009
ADD COLUMN hours NUMERIC;
Prior tables dont contain new column
Dissimilar tables cant be combined with UNION
11
11 Monday, April 20, 2009
Metadata Tribbles
Solution #1: use horizontal partitioning
Physically split, while logically whole
MySQL 5.1 supports partitioning
BUGS
BUGS
(2007)
BUGS
(2008)
BUGS
(2009)
12
12 Monday, April 20, 2009
Solution #2: use vertical partitioning
Move bulky and seldom-used columns to a
second table in one-to-one relationship
Metadata Tribbles
INSTALLERS
( 1 .. 1 )
PRODUCTS
13
13 Monday, April 20, 2009
Metadata Tribbles
Columns can also be tribbles:
CREATE TABLE Bugs (
bug_id SERIAL PRIMARY KEY,
. . .
product_id1 BIGINT,
product_id2 BIGINT,
product_id3 BIGINT
);
14
14 Monday, April 20, 2009
Metadata Tribbles
Solution #3: add a dependent table
CREATE TABLE BugsProducts (
bug_id BIGINT REFERENCES bugs,
product_id BIGINT REFERENCES products,
PRIMARY KEY (bug_id, product_id)
);
15
BUGS
(1 .. *)
BUGS
PRODUCTS
15 Monday, April 20, 2009
Entity-Attribute-Value
16
If you try and take a cat apart to see how it works,
the rst thing you have on your hands is a non-working cat.
Richard Dawkins
16 Monday, April 20, 2009
Entity-Attribute-Value
Objective: make a table with a variable
set of attributes
bug_id bug_type priority description severity sponsor
1234 BUG high
crashes when
saving
loss of
functionality
3456 FEATURE low support XML Acme Corp.
17
17 Monday, April 20, 2009
Entity-Attribute-Value
Antipattern: store all attributes in a
second table, one attribute per row
CREATE TABLE eav (
bug_id BIGINT NOT NULL,
attr_name VARCHAR(20) NOT NULL,
attr_value VARCHAR(100),
PRIMARY KEY (bug_id, attr_name),
FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id)
);
mixing data
with metadata
18
18 Monday, April 20, 2009
Entity-Attribute-Value
bug_id attr_name attr_value
1234 priority high
1234 description crashes when saving
1234 severity loss of functionality
3456 priority low
3456 description support XML
3456 sponsor Acme Corp.
19
19 Monday, April 20, 2009
Entity-Attribute-Value
Difcult to rely on attribute names
bug_id attr_name attr_value
1234 created 2008-04-01
3456 created_date 2008-04-01
20
20 Monday, April 20, 2009
Entity-Attribute-Value
Difcult to enforce data type integrity
bug_id attr_name attr_value
1234 created_date 2008-02-31
3456 created_date banana
21
21 Monday, April 20, 2009
Entity-Attribute-Value
Difcult to enforce mandatory attributes
(i.e. NOT NULL)
SQL constraints apply to columns, not rows
No way to declare that a row must exist with a
certain attr_name value (created_date)
Maybe create a trigger on INSERT for bugs?
22
22 Monday, April 20, 2009
Entity-Attribute-Value
Difcult to enforce referential integrity for
attribute values
Constraints apply to all rows in the column, not
selected rows depending on value in attr_name
bug_id attr_name attr_value
1234 priority new
3456 priority xed
5678 priority banana
23
23 Monday, April 20, 2009
Entity-Attribute-Value
Difcult to reconstruct a row of attributes:
SELECT b.bug_id,
e1.attr_value AS created_date,
e2.attr_value AS priority,
e3.attr_value AS description,
e4.attr_value AS status,
e5.attr_value AS reported_by
FROM Bugs b
LEFT JOIN eav e1 ON (b.bug_id = e1.bug_id AND e1.attr_name = created_date)
LEFT JOIN eav e2 ON (b.bug_id = e2.bug_id AND e2.attr_name = priority)
LEFT JOIN eav e3 ON (b.bug_id = e3.bug_id AND e3.attr_name = description)
LEFT JOIN eav e4 ON (b.bug_id = e4.bug_id AND e4.attr_name = status)
LEFT JOIN eav e5 ON (b.bug_id = e5.bug_id AND e5.attr_name = reported_by);
bug_id created_date priority description status reported_by
1234 2008-04-01 high Crashes when I save. NEW Bill
24
need one JOIN
per attribute
24 Monday, April 20, 2009
Entity-Attribute-Value
Solution: use metadata for metadata
Dene attributes in columns
ALTER TABLE to add attribute columns
Dene related tables for related types
25
25 Monday, April 20, 2009
Entity-Attribute-Value
Solution #1: Single Table Inheritance
One table with many columns
Columns are NULL when inapplicable
CREATE TABLE Issues (
issue_id SERIAL PRIMARY KEY,
created_date DATE NOT NULL,
priority VARCHAR(20),
description TEXT,
issue_type CHAR(1) CHECK (issue_type IN (B, F)),
bug_severity VARCHAR(20),
feature_sponsor VARCHAR(100)
);
26
26 Monday, April 20, 2009
Entity-Attribute-Value
Solution #2: Concrete Table Inheritance
Dene similar tables for similar types
Duplicate common columns in each table
CREATE TABLE Features (
bug_id SERIAL PRIMARY KEY,
created_date DATE NOT NULL,
priority VARCHAR(20),
description TEXT,
sponsor VARCHAR(100)
);
CREATE TABLE Bugs (
bug_id SERIAL PRIMARY KEY,
created_date DATE NOT NULL,
priority VARCHAR(20),
description TEXT,
severity VARCHAR(20)
);
27
27 Monday, April 20, 2009
Entity-Attribute-Value
Solution #2: Concrete Table Inheritance
Use UNION to search both tables:
SELECT * FROM (
SELECT issue_id, description FROM Bugs
UNION ALL
SELECT issue_id, description FROM Features
) unied_table
WHERE description LIKE ...
28
28 Monday, April 20, 2009
Entity-Attribute-Value
Solution #3: Class Table Inheritance
Common columns in base table
Subtype-specic columns in subtype tables
CREATE TABLE Features (
issue_id BIGINT PRIMARY KEY,
sponsor VARCHAR(100),
FOREIGN KEY (issue_id)
REFERENCES Issues (issue_id)
);
CREATE TABLE Bugs (
issue_id BIGINT PRIMARY KEY,
severity VARCHAR(20),
FOREIGN KEY (issue_id)
REFERENCES Issues (issue_id)
);
CREATE TABLE Issues (
issue_id SERIAL PRIMARY KEY,
created_date DATE NOT NULL
priority VARCHAR(20),
description TEXT
);
29
29 Monday, April 20, 2009
Entity-Attribute-Value
Solution #3: Class Table Inheritance
Easy to query common columns:
SELECT * FROM Issues
WHERE description LIKE ...
Easy to query one subtype at a time:
SELECT * FROM Issues
JOIN Bugs USING (issue_id);
30
30 Monday, April 20, 2009
Entity-Attribute-Value
Appropriate usage of EAV:
If attributes must be fully exible and dynamic
You must enforce constraints in application code
Dont try to fetch one object in a single row
Consider non-relational solutions
for semi-structured data, e.g. RDF/XML
31
31 Monday, April 20, 2009
Polymorphic Associations
32
Of course, some people do go both ways.
The Scarecrow
32 Monday, April 20, 2009
Polymorphic Assocations
Objective: reference multiple parents
33
BUGS
COMMENTS
FEATURES
33 Monday, April 20, 2009
Polymorphic Assocations
Cant make a FOREIGN KEY constraint
reference two tables:
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
comment TEXT NOT NULL,
issue_type VARCHAR(15) CHECK
(issue_type IN (Bugs, Features)),
issue_id BIGINT NOT NULL,
FOREIGN KEY issue_id REFERENCES
);
34
you need this to be
Bugs or Features
34 Monday, April 20, 2009
Polymorphic Assocations
Instead, you have to dene table with no
FOREIGN KEY or referential integrity:
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
comment TEXT NOT NULL,
issue_type VARCHAR(15) CHECK
(issue_type IN (Bugs, Features)),
issue_id BIGINT NOT NULL
);
35
35 Monday, April 20, 2009
Polymorphic Assocations
36
comment_id comment issue_type
c.
issue_id
b.
issue_id
f.
issue_id
6789 It crashes Bug 1234 1234 NULL
9876
Great
idea!
Feature 2345 NULL 2345
comment
id
comment issue_type issue_id
6789 It crashes Bugs 1234
9876 Great idea! Features 2345
issue_id
. . .
2345
issue_id
. . .
1234
Bugs Features
Comments
Query result:
36 Monday, April 20, 2009
Polymorphic Assocations
You cant use a different table for each row.
You must name all tables explicitly.
SELECT * FROM Comments
JOIN USING (issue_id);
37
you need this to be
Bugs or Features
37 Monday, April 20, 2009
Polymorphic Assocations
Instead, join to each parent table:
SELECT *
FROM Comments c
LEFT JOIN Bugs b ON (c.issue_type = Bugs
AND c.issue_id = b.issue_id)
LEFT JOIN Features f ON (c.issue_type = Features
AND c.issue_id = f.issue_id);
38
you have to get
these strings right
38 Monday, April 20, 2009
Polymorphic Assocations
Solution #1: exclusive arcs
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
comment TEXT NOT NULL,
bug_id BIGINT,
feature_id BIGINT,
FOREIGN KEY bug_id
REFERENCES Bugs(bug_id)
FOREIGN KEY feature_id
REFERENCES Features(feature_id)
);
39
both columns are nullable;
exactly one must be non-null
39 Monday, April 20, 2009
Polymorphic Assocations
Solution #1: exclusive arcs
Referential integrity is enforced
But hard to make sure exactly one is non-null
Queries are easier:
SELECT * FROM Comments c
LEFT JOIN Bugs b USING (bug_id)
LEFT JOIN Features f USING (feature_id);
40
40 Monday, April 20, 2009
Solution #2: reverse the relationship
Polymorphic Assocations
41
BUGS
FEATURES
FEATURES
COMMENTS
BUGS
COMMENTS
COMMENTS
41 Monday, April 20, 2009
Polymorphic Assocations
Solution #2: reverse the relationship
CREATE TABLE BugsComments (
comment_id BIGINT NOT NULL,
bug_id BIGINT NOT NULL,
PRIMARY KEY (comment_id),
FOREIGN KEY (comment_id) REFERENCES Comments(comment_id),
FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id)
);
CREATE TABLE FeaturesComments (
comment_id BIGINT NOT NULL,
feature_id BIGINT NOT NULL,
PRIMARY KEY (comment_id),
FOREIGN KEY (comment_id) REFERENCES Comments(comment_id),
FOREIGN KEY (feature_id) REFERENCES Features(feature_id)
);
42
42 Monday, April 20, 2009
Polymorphic Assocations
Solution #2: reverse the relationship
Referential integrity is enforced
Query comments for a given bug:
SELECT * FROM BugsComments b
JOIN Comments c USING (comment_id)
WHERE b.bug_id = 1234;
Query bug/feature for a given comment:
SELECT * FROM Comments
LEFT JOIN (BugsComments JOIN Bugs USING (bug_id))
USING (comment_id)
LEFT JOIN (FeaturesComments JOIN Features USING (feature_id))
USING (comment_id)
WHERE comment_id = 9876;
43
43 Monday, April 20, 2009
Solution #3: use a base parent table
Polymorphic Assocations
44
FEATURES BUGS
COMMENTS
ISSUES
44 Monday, April 20, 2009
Polymorphic Assocations
Solution #3: use a base parent table
CREATE TABLE Issues (
issue_id SERIAL PRIMARY KEY
);
CREATE TABLE Bugs (
issue_id BIGINT PRIMARY KEY,
. . .
FOREIGN KEY (issue_id) REFERENCES Issues(issue_id)
);
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
comment TEXT NOT NULL,
issue_id BIGINT NOT NULL,
FOREIGN KEY (issue_id) REFRENCES Issues(issue_id)
);
45
45 Monday, April 20, 2009
Polymorphic Assocations
Solution #3: use a base parent table
Referential integrity is enforced
Queries are easier:
SELECT * FROM Comments
JOIN Issues USING (issue_id)
LEFT JOIN Bugs USING (issue_id)
LEFT JOIN Features USING (issue_id);
46
46 Monday, April 20, 2009
Polymorphic Assocations
Enforcing disjoint subtypes:
CREATE TABLE Issues (
issue_id SERIAL PRIMARY KEY,
issue_type CHAR(1) NOT NULL CHECK (issue_type IN (B, F)),
UNIQUE KEY (issue_id, issue_type)
);
CREATE TABLE Bugs (
issue_id BIGINT PRIMARY KEY,
issue_type CHAR(1) NOT NULL CHECK (issue_type = B),
. . .
FOREIGN KEY (issue_id, issue_type)
REFERENCES Issues(issue_id, issue_type)
);
47
referential integrity
47 Monday, April 20, 2009
Naive Trees
48
48 Monday, April 20, 2009
Naive Trees
Objective: store/query hierarchical data
Categories/subcategories
Bill of materials
Threaded discussions
49
49 Monday, April 20, 2009
Naive Trees
50
(1) Fran:
Whats the cause
of this bug?
(2) Ollie:
I think its a null
pointer.
(3) Fran:
No, I checked for
that.
(4) Kukla:
We need to
check valid input.
(5) Ollie:
Yes, thats a bug.
(6) Fran:
Yes, please add a
check.
(7) Kukla:
That xed it.
50 Monday, April 20, 2009
Naive Trees
Adjacency List
Naive solution nearly everyone uses
Each entry in the tree knows immediate parent
51
comment_id parent_id author comment
1 NULL Fran Whats the cause of this bug?
2 1 Ollie I think its a null pointer.
3 2 Fran No, I checked for that.
4 1 Kukla We need to check valid input.
5 4 Ollie Yes, thats a bug.
6 4 Fran Yes, please add a check
7 6 Kukla That xed it.
51 Monday, April 20, 2009
Naive Trees
Adjacency List
Easy to inserting a new comment:
INSERT INTO Comments (parent_id, author, comment)
VALUES (7, Kukla, Thanks!);
Easy to move a subtree to a new position:
UPDATE Comments SET parent_id = 3
WHERE comment_id = 6;
52
52 Monday, April 20, 2009
Naive Trees
Adjacency List
Querying a nodes immediate children is easy:
SELECT * FROM Comments c1
LEFT JOIN Comments c2
ON (c2.parent_id = c1.comment_id);
Querying a nodes immediate parent is easy:
SELECT * FROM Comments c1
JOIN Comments c2
ON (c1.parent_id = c2.comment_id);
53
53 Monday, April 20, 2009
Naive Trees
Adjacency List
Hard to query all descendants in a deep tree:
SELECT * FROM Comments c1
LEFT JOIN Comments c2 ON (c2.parent_id = c1.comment_id)
LEFT JOIN Comments c3 ON (c3.parent_id = c2.comment_id)
LEFT JOIN Comments c4 ON (c4.parent_id = c3.comment_id)
LEFT JOIN Comments c5 ON (c5.parent_id = c4.comment_id)
LEFT JOIN Comments c6 ON (c6.parent_id = c5.comment_id)
LEFT JOIN Comments c7 ON (c7.parent_id = c6.comment_id)
LEFT JOIN Comments c8 ON (c8.parent_id = c7.comment_id)
LEFT JOIN Comments c9 ON (c9.parent_id = c8.comment_id)
LEFT JOIN Comments c10 ON (c10.parent_id = c9.comment_id)
. . .
54
it still doesnt support
unlimited depth!
54 Monday, April 20, 2009
Naive Trees
Solution #1: Path Enumeration
Store chain of ancestors as a string in each node
Good for breadcrumbs, or sorting by hierarchy
55
comment_id path author comment
1 1/ Fran Whats the cause of this bug?
2 1/2/ Ollie I think its a null pointer.
3 1/2/3/ Fran No, I checked for that.
4 1/4/ Kukla We need to check valid input.
5 1/4/5/ Ollie Yes, thats a bug.
6 1/4/6/ Fran Yes, please add a check
7 1/4/6/7/ Kukla That xed it.
55 Monday, April 20, 2009
Naive Trees
Solution #1: Path Enumeration
Easy to query all ancestors of comment #7:
SELECT * FROM Comments
WHERE 1/4/6/7/ LIKE path || %;
Easy to query all descendants of comment #4:
SELECT * FROM Comments
WHERE path LIKE 1/4/%;
56
56 Monday, April 20, 2009
Naive Trees
Solution #1: Path Enumeration
Easy to add child of comment 7:
INSERT INTO Comments (author, comment)
VALUES (Ollie, Good job!);
SELECT path FROM Comments
WHERE comment_id = 7;
UPDATE Comments
SET path = $parent_path || LAST_INSERT_ID() || /
WHERE comment_id = LAST_INSERT_ID();
57
57 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
Each comment encodes its descendants
using two numbers:
A comments right number is less than all the
numbers used by the comments descendants.
A comments left number is greater than all the
numbers used by the comments descendants.
58
58 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
59
(1) Fran:
Whats the cause
of this bug?
(2) Ollie:
I think its a null
pointer.
(3) Fran:
No, I checked for
that.
(4) Kukla:
We need to check
valid input.
(5) Ollie:
Yes, thats a bug.
(6) Fran:
Yes, please add a
check.
(7) Kukla:
That xed it.
1
2
14
5
3 4
6 13
7 8 9 12
10 11
59 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
60
comment_id nsleft nsright author comment
1 1 14 Fran Whats the cause of this bug?
2 2 5 Ollie I think its a null pointer.
3 3 4 Fran No, I checked for that.
4 6 13 Kukla We need to check valid input.
5 7 8 Ollie Yes, thats a bug.
6 9 12 Fran Yes, please add a check
7 10 11 Kukla That xed it.
these are not
foreign keys
60 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
Easy to query all ancestors of comment #7:
SELECT * FROM Comments child
JOIN Comments ancestor
ON (child.left BETWEEN ancestor.nsleft
AND ancestor.nsright)
WHERE child.comment_id = 7;
61
61 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
62
(1) Fran:
Whats the cause
of this bug?
(2) Ollie:
I think its a null
pointer.
(3) Fran:
No, I checked for
that.
(4) Kukla:
We need to check
valid input.
(5) Ollie:
Yes, thats a bug.
(6) Fran:
Yes, please add a
check.
(7) Kukla:
That xed it.
1
2
14
5
3 4
6 13
7 8 9 12
10 11
ancestors
child
62 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
Easy to query all descendants of comment #4:
SELECT * FROM Comments parent
JOIN Comments descendant
ON (descendant.left BETWEEN parent.nsleft
AND parent.nsright)
WHERE parent.comment_id = 4;
63
63 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
64
(1) Fran:
Whats the cause
of this bug?
(2) Ollie:
I think its a null
pointer.
(3) Fran:
No, I checked for
that.
(4) Kukla:
We need to check
valid input.
(5) Ollie:
Yes, thats a bug.
(6) Fran:
Yes, please add a
check.
(7) Kukla:
That xed it.
1
2
14
5
3 4
6 13
7 8 9 12
10 11
parent
descendants
64 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
Hard to insert a new child of comment #5:
UPDATE Comment
SET nsleft = CASE WHEN nsleft >= 8 THEN nsleft+2 ELSE nsleft END,
nsright = nsright+2
WHERE nsright >= 7;
INSERT INTO Comment (nsleft, nsright, author, comment)
VALUES (8, 9, 'Fran', 'I agree!');
Recalculate left values for all nodes to the right of
the new child. Recalculate right values for all
nodes above and to the right.
65
65 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
66
(1) Fran:
Whats the cause
of this bug?
(2) Ollie:
I think its a null
pointer.
(3) Fran:
No, I checked for
that.
(4) Kukla:
We need to check
valid input.
(5) Ollie:
Yes, thats a bug.
(6) Fran:
Yes, please add a
check.
(7) Kukla:
That xed it.
1
2
14
5
3 4
6 13
7 8 9 12
10 11
(8) Fran:
I agree!
8 9
10 11
12 13
14
15
16
66 Monday, April 20, 2009
Naive Trees
Solution #2: Nested Sets
Hard to query the parent of comment #6:
SELECT parent.* FROM Comments AS c
JOIN Comments AS parent
ON (c.nsleft BETWEEN parent.nsleft AND parent.nsright)
LEFT OUTER JOIN Comments AS in_between
ON (c.nsleft BETWEEN in_between.nsleft AND in_between.nsright
AND in_between.nsleft BETWEEN parent.nsleft AND parent.nsright)
WHERE c.comment_id = 6 AND in_between.comment_id IS NULL;
Parent of #6 is an ancestor who has no
descendant who is also an ancestor of #6.
Querying a child is a similar problem.
67
67 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
Store every path from ancestors to descendants
Requires an additional table:
CREATE TABLE TreePaths (
ancestor BIGINT NOT NULL,
descendant BIGINT NOT NULL,
PRIMARY KEY (ancestor, descendant),
FOREIGN KEY(ancestor) REFERENCES Comments(comment_id),
FOREIGN KEY(descendant) REFERENCES Comments(comment_id),
);
68
68 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
69
(1) Fran:
Whats the cause
of this bug?
(2) Ollie:
I think its a null
pointer.
(3) Fran:
No, I checked for
that.
(4) Kukla:
We need to check
valid input.
(5) Ollie:
Yes, thats a bug.
(6) Fran:
Yes, please add a
check.
(7) Kukla:
That xed it.
69 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
70
comment_id author comment
1 Fran Whats the cause of this bug?
2 Ollie I think its a null pointer.
3 Fran No, I checked for that.
4 Kukla We need to check valid input.
5 Ollie Yes, thats a bug.
6 Fran Yes, please add a check
7 Kukla That xed it.
ancestor descendant
1 1
1 2
1 3
1 4
1 5
1 6
1 7
2 2
2 3
3 3
4 4
4 5
4 6
4 7
5 5
6 6
6 7
7 7
requires O(n!)
rows at most
but far fewer
in practice
70 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
Easy to query descendants of comment #4:
SELECT c.* FROM Comments c
JOIN TreePaths t
ON (c.comment_id = t.descendant)
WHERE t.ancestor = 4;
71
71 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
Easy to query ancestors of comment #6:
SELECT c.* FROM Comments c
JOIN TreePaths t
ON (c.comment_id = t.ancestor)
WHERE t.descendant = 6;
72
72 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
Easy to insert a new child of comment #5:
INSERT INTO Comments ...
INSERT INTO TreePaths (ancestor, descendant)
VALUES (8, 8);
INSERT INTO TreePaths (ancestor, descendant)
SELECT ancestor, 8 FROM TreePaths
WHERE descendant = 5;
73
generates comment #8
73 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
Easy to delete a child comment #7:
DELETE FROM TreePaths
WHERE descendant = 7;
Even easier with ON DELETE CASCADE
74
74 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
Easy to delete the subtree under comment #4:
DELETE FROM TreePaths WHERE descendant IN
(SELECT descendant FROM TreePaths
WHERE ancestor = 4);
For MySQL, use multi-table DELETE:
DELETE p FROM TreePaths p
JOIN TreePaths a USING (descendant)
WHERE a.ancestor = 4;
75
75 Monday, April 20, 2009
Naive Trees
Solution #3: Closure Table
Add a depth column to make it
easier to query immediate parent
or child:
SELECT c.* FROM Comments c
JOIN TreePaths t
ON (c.comment_id = t.descendant)
WHERE t.ancestor = 4
AND t.depth = 1;
76
ancestor descendant depth
1 1 0
1 2 1
1 3 2
1 4 1
1 5 2
1 6 2
1 7 3
2 2 0
2 3 1
3 3 0
4 4 0
4 5 1
4 6 1
4 7 2
5 5 0
6 6 0
6 7 1
7 7 0
76 Monday, April 20, 2009
Naive Trees
Summary of Designs:
77
Design
Number of
Tables
Query
Child
Query
Subtree
Modify
Tree
Referential
Integrity
Adjacency List 1 Easy Hard Easy Yes
Path
Enumeration
1 Easy Easy Hard No
Nested Sets 1 Hard Easy Hard No
Closure Table 2 Easy Easy Easy Yes
77 Monday, April 20, 2009
Antipattern Categories
Database Design
Antipatterns
Database Creation
Antipatterns
Query
Antipatterns
Application
Antipatterns
CREATE TABLE BugsProducts (
bug_id INTEGER REFERENCES Bugs,
product VARCHAR(100) REFERENCES Products,
PRIMARY KEY (bug_id, product)
);
SELECT b.product, COUNT(*)
FROM BugsProducts AS b
GROUP BY b.product;
$dbHandle = new PDO(mysql:dbname=test);
$stmt = $dbHandle->prepare($sql);
$result = $stmt->fetchAll();
78
78 Monday, April 20, 2009
Database Creation Antipatterns
5. ENUM Antipattern
6. Rounding Errors
7. Indexes Are Magical
79
79 Monday, April 20, 2009
ENUM Antipattern
80
80 Monday, April 20, 2009
ENUM Antipattern
Objective: restrict a column
to a xed set of values
INSERT INTO bugs (status)
VALUES (new)
INSERT INTO bugs (status)
VALUES (banana)
81
OK
FAIL
81 Monday, April 20, 2009
ENUM Antipattern
Antipattern: use ENUM data type,
when the set of values may change
CREATE TABLE Bugs (
. . .
status ENUM(new, open, xed)
);
82
82 Monday, April 20, 2009
ENUM Antipattern
Changing the set of values is a metadata
alteration
You must know the current set of values
ALTER TABLE Bugs MODIFY COLUMN
status ENUM(new, open, xed, duplicate);
83
83 Monday, April 20, 2009
ENUM Antipattern
Difcult to get a list of possible values
SELECT column_type
FROM information_schema.columns
WHERE table_schema = bugtracker_schema
AND table_name = Bugs
AND column_name = status;
Returns a LONGTEXT you must parse:
ENUM(new, open, xed)
84
84 Monday, April 20, 2009
ENUM Antipattern
Solution: use ENUM only if values are
set in stone
CREATE TABLE Bugs (
. . .
bug_type ENUM(defect, feature)
);
85
85 Monday, April 20, 2009
ENUM Antipattern
Use a lookup table if values may change
CREATE TABLE BugStatus (
status VARCHAR(10) PRIMARY KEY
);
INSERT INTO BugStatus (status)
VALUES (NEW), (OPEN), (FIXED);
86
BUGSTATUS BUGS
86 Monday, April 20, 2009
ENUM Antipattern
Adding/removing a value is a data
operation, not a metadata operation
You dont need to know the current values
INSERT INTO BugStatus (status)
VALUES (DUPLICATE);
87
87 Monday, April 20, 2009
ENUM Antipattern
Use an attribute to retire values, not
DELETE
CREATE TABLE BugStatus (
status VARCHAR(10) PRIMARY KEY,
active TINYINT NOT NULL DEFAULT 1
);
UPDATE BugStatus
SET active = 0
WHERE status = DUPLICATE;
88
88 Monday, April 20, 2009
Rounding Errors
89
10.0 times 0.1 is hardly ever 1.0.
Brian Kernighan
89 Monday, April 20, 2009
Rounding Errors
Objective: store real numbers exactly
Especially money
Work estimate hours
90
90 Monday, April 20, 2009
Rounding Errors
Antipattern: use FLOAT data type
ALTER TABLE Bugs
ADD COLUMN hours FLOAT;
INSERT INTO Bugs (bug_id, hours)
VALUES (1234, 3.3);
91
91 Monday, April 20, 2009
Rounding Errors
FLOAT is inexact
SELECT hours FROM Bugs
WHERE bug_id = 1234;
!
3.3
SELECT hours * 1000000000 FROM Bugs
WHERE bug_id = 1234;
!
3299999952.3163
92
92 Monday, April 20, 2009
Rounding Errors
Inexact decimals
1/3 + 1/3 + 1/3 = 1.0
0.333 + 0.333 + 0.333 = 0.999
93
assuming innite precision
nite precision
93 Monday, April 20, 2009
Rounding Errors
IEEE 754 standard for representing
oating-point numbers in base-2
Some numbers round off, arent stored exactly
Comparisons to original value fail
SELECT * FROM Bugs
WHERE hours = 3.3;
94
comparison
fails
94 Monday, April 20, 2009
Rounding Errors
Solution: use NUMERIC data type
ALTER TABLE Bugs
ADD COLUMN hours NUMERIC(9,2)
INSERT INTO bugs (bug_id, hours)
VALUES (1234, 3.3);
SELECT * FROM Bugs
WHERE hours = 3.3;
95
comparison
succeeds
95 Monday, April 20, 2009
Indexes are Magical
96
Whenever any result is sought, the question will then
arise by what course of calculation can these results
be arrived at by the machine in the shortest time?
Charles Babbage
96 Monday, April 20, 2009
Indexes are Magical
Objective: execute queries with optimal
performance
97
97 Monday, April 20, 2009
Indexes are Magical
Antipatterns:
Creating indexes blindly
Executing non-indexable queries
Rejecting indexes because of their overhead
98
98 Monday, April 20, 2009
Indexes are Magical
Creating indexes blindly:
CREATE TABLE Bugs (
bug_id SERIAL PRIMARY KEY,
date_reported DATE NOT NULL,
summary VARCHAR(80) NOT NULL,
status VARCHAR(10) NOT NULL,
hours NUMERIC(9,2),
INDEX (bug_id),
INDEX (summary),
INDEX (hours),
INDEX (bug_id, date_reported, status)
);
99
redundant index
bulky index
unnecessary index
unnecessary
covering index
99 Monday, April 20, 2009
Indexes are Magical
Executing non-indexable queries:
SELECT * FROM Bugs
WHERE description LIKE %crash%;
SELECT * FROM Bugs
WHERE MONTH(date_reported) = 4;
SELECT * FROM Bugs
WHERE last_name = ... OR rst_name = ...;
SELECT * FROM Accounts
ORDER BY rst_name, last_name;
100
non-leftmost
string match
function applied
to column
non-leftmost
composite key match
no index spans
all rows
100 Monday, April 20, 2009
Indexes are Magical
Telephone book analogy
Easy to search for Dean Thomas:
SELECT * FROM TelephoneBook
WHERE full_name LIKE Thomas, %;
Hard to search for Thomas Riddle:
SELECT * FROM TelephoneBook
WHERE full_name LIKE %, Thomas;
101
uses index
to match
requires full
table scan
101 Monday, April 20, 2009
Rejecting indexes because of their overhead:
Indexes are Magical
102
0
20
40
60
80
0 100 200 300 400 500 600 700 800 9001000
Query w/ Index: O(log n)
Update Index: O(log n)
Query w/o Index: O(n)
the benet quickly
justies the overhead
102 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Measure
Explain
Nominate
Test
Optimize
Repair
103
103 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Measure
Explain
Nominate
Test
Optimize
Repair
104
Prole your application.
Focus on the most costly
SQL queries:
Longest-running.
Most frequently run.
Blockers, lockers, and
deadlocks.
104 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Measure
Explain
Nominate
Test
Optimize
Repair
105
Analyze the optimization
plan of costly queries,
e.g. MySQLs EXPLAIN
Identify tables that arent
using indexes:
Temporary table
Filesort
105 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Measure
Explain
Nominate
Test
Optimize
Repair
106
Could an index improve
access to these tables?
ORDER BY criteria
MIN() / MAX()
WHERE conditions
Which column(s) need
indexes?
106 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Measure
Explain
Nominate
Test
Optimize
Repair
107
After creating index,
measure again.
Conrm the new index
made a difference.
Impress your boss!
The new index gave a 27%
performance improvement!
107 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Measure
Explain
Nominate
Test
Optimize
Repair
108
Indexes are compact,
frequently-used data.
Try to cache indexes in
memory:
MyISAM: key_buffer_size,
LOAD INDEX INTO CACHE
InnoDB: innodb_buffer_pool_size
108 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Measure
Explain
Nominate
Test
Optimize
Repair
109
Indexes require periodic
maintenance.
Like a lesystem requires
periodic defragmentation.
Analyze or rebuild
indexes, e.g. in MySQL:
ANALYZE TABLE
OPTIMIZE TABLE
109 Monday, April 20, 2009
Indexes are Magical
Solution: MENTOR your indexes
Sounds like the name
of a self-help book!
110
MENTOR
YOUR
INDEXES
How to Break the
Bad Performance Habits
That Make
You Miserabl!
BYTHE AUTHOROF THE BESTSELLER
WHY DOESNT MY CACHE GET ANY HITS?
William K. Karwin
just kidding!
please dont ask
when its coming out!
110 Monday, April 20, 2009
Antipattern Categories
Database Design
Antipatterns
Database Creation
Antipatterns
Query
Antipatterns
Application
Antipatterns
CREATE TABLE BugsProducts (
bug_id INTEGER REFERENCES Bugs,
product VARCHAR(100) REFERENCES Products,
PRIMARY KEY (bug_id, product)
);
SELECT b.product, COUNT(*)
FROM BugsProducts AS b
GROUP BY b.product;
$dbHandle = new PDO(mysql:dbname=test);
$stmt = $dbHandle->prepare($sql);
$result = $stmt->fetchAll();
111
111 Monday, April 20, 2009
Query Antipatterns
8. NULL antipatterns
9. Ambiguous Groups
10. Random Order
11. JOIN antipattern
12. Goldberg Machine
112
112 Monday, April 20, 2009
NULL Antipatterns
113
As we know, there are known knowns; there are things we know
we know. We also know there are known unknowns; that is to say
we know there are some things we do not know. But there are also
unknown unknowns the ones we don't know we don't know.
Donald Rumsfeld
113 Monday, April 20, 2009
NULL Antipatterns
Objective: handle missing values,
store them as missing, and support them in
queries.
114
114 Monday, April 20, 2009
NULL Antipatterns
Antipatterns:
Use NULL as an ordinary value
Use an ordinary value as NULL
115
115 Monday, April 20, 2009
NULL Antipatterns
Using NULL in most expressions results in
an unknown value.
SELECT NULL + 10;
SELECT Bill || NULL;
SELECT FALSE OR NULL;
116
NULL is not zero
NULL is not an empty string
NULL is not FALSE
116 Monday, April 20, 2009
NULL Antipatterns
The opposite of unknown is still unknown.
SELECT * FROM Bugs
WHERE assigned_to = 123;
SELECT * FROM Bugs
WHERE NOT (assigned_to = 123);
117
which query returns bugs
that are not yet assigned?
neither query!
117 Monday, April 20, 2009
NULL Antipatterns
Choosing an ordinary value in lieu of NULL:
UPDATE Bugs SET assigned_to = -1
WHERE assigned_to IS NULL;
118
assigned_to is a foreign key
so this value doesnt work
118 Monday, April 20, 2009
NULL Antipatterns
Choosing an ordinary value in lieu of NULL:
UPDATE Bugs SET hours = -1
WHERE hours IS NULL;
SELECT SUM(hours)
FROM Bugs
WHERE status = OPEN
119
this makes SUM()
inaccurate
AND hours <> -1;
special-case code
you were trying to avoid
by prohibiting NULL
119 Monday, April 20, 2009
NULL Antipatterns
Choosing an ordinary value in lieu of NULL:
Any given value may be signicant in a column
Every column needs a different value
You need to remember or document the value
used for missing on a case-by-case basis
120
120 Monday, April 20, 2009
NULL Antipatterns
Solution: use NULL appropriately
NULL signies missing or inapplicable
Works for every data type
Already standard and well-understood
121
121 Monday, April 20, 2009
NULL Antipatterns
Understanding NULL in expressions
122
Expression Expected Actual
NULL = 0 TRUE Unknown
NULL = 12345 FALSE Unknown
NULL <> 12345 TRUE Unknown
NULL + 12345 12345 Unknown
NULL || string string Unknown
NULL = NULL TRUE Unknown
NULL <> NULL FALSE Unknown
122 Monday, April 20, 2009
NULL Antipatterns
Understanding NULL in boolean expressions
123
Expression Expected Actual
NULL AND TRUE FALSE Unknown
NULL AND FALSE FALSE FALSE
NULL OR FALSE FALSE Unknown
NULL OR TRUE TRUE TRUE
NOT (NULL) TRUE Unknown
123 Monday, April 20, 2009
NULL Antipatterns
SQL supports IS NULL predicate that
returns true or false, never unknown:
SELECT * FROM Bugs
WHERE assigned_to IS NULL;
SELECT * FROM Bugs
WHERE assigned_to IS NOT NULL;
124
124 Monday, April 20, 2009
NULL Antipatterns
SQL-99 supports IS DISTINCT FROM
predicate that returns true or false:
SELECT * FROM Bugs
WHERE assigned_to IS DISTINCT FROM 123;
SELECT * FROM Bugs
WHERE assigned_to IS NOT DISTINCT FROM 123;
125
MySQL operator works like
IS NOT DISTINCT FROM
SELECT * FROM Bugs
WHERE assigned_to <=> 123;
125 Monday, April 20, 2009
NULL Antipatterns
Change NULL to ordinary value on demand
with COALESCE():
SELECT COALESCE(
rst_name || || middle_initial || || last_name,
rst_name || || last_name) AS full_name
FROM Accounts;
Also called NVL() or ISNULL()
in some database brands.
126
126 Monday, April 20, 2009
Ambiguous Groups
127
Please accept my resignation. I dont want to belong
to any club that will accept me as a member.
Groucho Marx
127 Monday, April 20, 2009
Ambiguous Groups
Objective: perform grouping queries, and
include some attributes in the result
SELECT product_name, bug_id,
MAX(date_reported) AS latest
FROM Bugs
GROUP BY product_name;
128
128 Monday, April 20, 2009
Antipattern: bug_id isnt that of the
latest per product
Ambiguous Groups
product_name bug_id latest
Open RoundFile 1234 2008-04-01
Visual TurboBuilder 3456 2008-02-16
ReConsider 5678 2008-01-01
product_name bug_id date_reported
Open RoundFile 1234 2007-12-19
Open RoundFile 2248 2008-04-01
Visual TurboBuilder 3456 2008-02-16
Visual TurboBuilder 4077 2008-02-10
ReConsider 5678 2008-01-01
ReConsider 8063 2007-11-09
129
129 Monday, April 20, 2009
Ambiguous Groups
SELECT product_name, bug_id,
MAX(date_reported) AS latest
FROM Bugs
GROUP BY product_name;
130
assume bug_id from
the same row with
MAX(date_reported)
130 Monday, April 20, 2009
Ambiguous Groups
SELECT product_name, bug_id,
MAX(date_reported) AS latest
FROM Bugs
GROUP BY product_name;
131
what if two bug_id
both match the
latest date?
131 Monday, April 20, 2009
Ambiguous Groups
SELECT product_name, bug_id,
MIN(date_reported) AS earliest,
MAX(date_reported) AS latest
FROM Bugs
GROUP BY product_name;
132
what bug_id
has both the earliest
and the latest date?
132 Monday, April 20, 2009
Ambiguous Groups
SELECT product_name, bug_id,
AVG(date_reported) AS mean
FROM Bugs
GROUP BY product_name;
133
what if no bug_id
matches this date?
133 Monday, April 20, 2009
Ambiguous Groups
The Single-Value Rule: every column
in the select-list must be either:
Part of an aggregate expression.
In the GROUP BY clause.
A functional dependency of a column
named in the GROUP BY clause.
134
134 Monday, April 20, 2009
Ambiguous Groups
For a given product_name, there is a single
value in each functionally dependent attribute.
135
product_name bug_id date_reported
Open RoundFile 1234 2007-12-19
Open RoundFile 2248 2008-04-01
Visual TurboBuilder 3456 2008-02-16
Visual TurboBuilder 4077 2008-02-10
ReConsider 5678 2008-01-01
ReConsider 8063 2007-11-09
multiple values per
product name
bug_id is not
functionally dependent
135 Monday, April 20, 2009
Ambiguous Groups
Solution #1: use only functionally
dependent attributes:
SELECT product_name, bug_id,
MAX(date_reported) AS latest
FROM Bugs
GROUP BY product_name;
136
product_name latest
Open RoundFile 2008-04-01
Visual TurboBuilder 2008-02-16
ReConsider 2008-01-01
!
136 Monday, April 20, 2009
Ambiguous Groups
Solution #2: use a derived table:
SELECT b.product_name, b.bug_id, m.latest
FROM Bugs b
JOIN (SELECT product_name, MAX(date_reported) AS latest
FROM Bugs GROUP BY product_name) m
ON (b.product_name = m.product_name
AND b.date_reported = m.latest);
137
product_name bug_id latest
Open RoundFile 2248 2008-04-01
Visual TurboBuilder 3456 2008-02-16
ReConsider 5678 2008-01-01
137 Monday, April 20, 2009
Ambiguous Groups
Solution #3: use an outer JOIN:
SELECT b1.product_name, b1.bug_id,
b1.date_reported AS latest
FROM Bugsb1 LEFT OUTER JOIN Bugs b2
ON (b1.product_name = b2.product_name
AND b1.date_reported < b2.date_reported)
WHERE b2.bug_id IS NULL;
138
product_name bug_id latest
Open RoundFile 2248 2008-04-01
Visual TurboBuilder 3456 2008-02-16
ReConsider 5678 2008-01-01
138 Monday, April 20, 2009
Ambiguous Groups
Solution #4: use another aggregate:
SELECT product_name, MAX(date_reported) AS latest,
MAX(bug_id) AS latest_bug_id
FROM Bugs
GROUP BY product_name;
139
product_name bug_id latest
Open RoundFile 2248 2008-04-01
Visual TurboBuilder 3456 2008-02-16
ReConsider 5678 2008-01-01
if bug_id increases
in chronological order
139 Monday, April 20, 2009
Ambiguous Groups
Solution #5: use GROUP_CONCAT():
SELECT product_name,
GROUP_CONCAT(bug_id) AS bug_id_list,
MAX(date_reported) AS latest
FROM Bugs
GROUP BY product_name;
140
product_name bug_id_list latest
Open RoundFile 1234, 2248 2008-04-01
Visual TurboBuilder 3456, 4077 2008-02-16
ReConsider 5678, 8063 2008-01-01
140 Monday, April 20, 2009
Random Order
141
I must complain the cards are ill shufed till I have a good hand.
Jonathan Swift
141 Monday, April 20, 2009
Random Order
Objective: select a random row
142
142 Monday, April 20, 2009
Random Order
Antipattern: sort by random expression,
then return top row(s)
SELECT * FROM Bugs
ORDER BY RAND()
LIMIT 1;
143
non-indexed sort
in a temporary table
sort entire table
just to discard it?
143 Monday, April 20, 2009
Random Order
Solution #1: pick random primary key
from list of all values:
$bug_id_list = $pdo->query(
SELECT bug_id FROM Bugs )->fetchAll();
$rand = random(count($bug_id_list));
$stmt = $pdo->prepare(
SELECT * FROM Bugs WHERE bug_id = ?);
$stmt->execute( $bug_id_list[$rand][0] );
$rand_bug = $stmt->fetch();
144
144 Monday, April 20, 2009
Random Order
Solution #1: pick random primary key
from list of all values:
$bug_id_list = $pdo->query(
SELECT bug_id FROM Bugs )->fetchAll();
$bug_id_list may grow to an impractical size:
Fatal error: Allowed memory size of 16777216 bytes exhausted
145
145 Monday, April 20, 2009
Random Order
Solution #2: pick random value between
1...MAX(bug_id); use that bug_id:
SELECT b1.* FROM Bugs b1
JOIN (SELECT CEIL(RAND() *
(SELECT MAX(bug_id) FROM Bugs)) rand_id) b2
ON (b1.bug_id = b2.rand_id);
146
146 Monday, April 20, 2009
Random Order
Solution #2: pick random value between
1...MAX(bug_id); use that bug_id:
Assumes bug_id starts at 1
and values are contiguous.
If there are gaps, a random
bug_id may not match an
existing bug.
147
147 Monday, April 20, 2009
Random Order
Solution #3: pick random value between
1...MAX(bug_id); use next higher bug_id:
SELECT b1.* FROM Bugs b1
JOIN (SELECT CEIL(RAND() *
(SELECT MAX(bug_id) FROM Bugs)) AS bug_id) b2
WHERE b1.bug_id >= b2.bug_id
ORDER BY b1.bug_id
LIMIT 1;
148
148 Monday, April 20, 2009
Random Order
Solution #3: pick random value between
1...MAX(bug_id); use next higher bug_id:
bug_id values after gaps
are chosen more often.
Random values are evenly
distributed, but bug_id values arent.
149
149 Monday, April 20, 2009
Random Order
Solution #4: pick random row from
0...COUNT, regardless of bug_id values:
$offset = $pdo->query(
SELECT ROUND(RAND() *
(SELECT COUNT(*) FROM Bugs)) )->fetch();
$sql = SELECT * FROM Bugs LIMIT 1 OFFSET ?;
$stmt = $pdo->prepare( $sql );
$stmt->execute( $offset );
150
150 Monday, April 20, 2009
JOIN Antipattern
151
151 Monday, April 20, 2009
JOIN Antipattern
Objective: Design optimal queries.
152
152 Monday, April 20, 2009
JOIN Antipattern
Antipatterns:
Senseless avoidance of JOIN.
Overzealous JOIN decomposition.
Joins are slow!
153
compared
to what?
153 Monday, April 20, 2009
JOIN Antipattern
Reasons for JOIN decomposition:
Cache and reuse earlier results
Reduce locking across multiple tables
Distribute tables across servers
Leverage IN() optimization
Reduce redundant rows
(result sets are denormalized)
Notice these are exception cases!
154
borrowed
from this book
154 Monday, April 20, 2009
JOIN Antipattern
Example from the web (2009-4-18):
SELECT *,
(SELECT name FROM stores WHERE id = p.store_id) AS store_name,
(SELECT username FROM stores WHERE id = p.store_id) AS store_username,
(SELECT region_id FROM stores WHERE id = p.store_id) AS region_id,
(SELECT city_id FROM stores WHERE id = p.store_id) AS city_id,
(SELECT name FROM categories_sub WHERE id=p.subcategory_id) subcat_name,
(SELECT name FROM categories WHERE id = p.category_id) AS category_name
FROM products p
WHERE p.date_start <= DATE(NOW()) AND p.date_end >= DATE(NOW());
155
six correlated
subqueries!
how to apply
conditions to stores?
optimizer cant
reorder JOINs
155 Monday, April 20, 2009
JOIN Antipattern
Example revised with JOINs:
SELECT p.*, s.name AS store_name, s.username AS store_username,
s.region_id, s.city_id, cs.name AS subcategory_name,
c.name AS category_name
FROM products p
JOIN stores s ON (s.id = p.store_id)
JOIN categories c ON (c.id = p.category_id)
JOIN categories_sub cs ON (cs.id = p.subcategory_id)
WHERE p.date_start <= DATE(NOW()) AND p.date_end >= DATE(NOW())
156
AND s.store_category = preferred;
easier to
apply conditions
easier to
optimize
156 Monday, April 20, 2009
JOIN Antipattern
Example: nd an entry with three tags:
HAVING COUNT solution:
SELECT b.*
FROM Bugs b
JOIN BugsProducts p ON (b.bug_id = p.bug_id)
WHERE p.product_id IN (1, 2, 3)
GROUP BY b.bug_id
HAVING COUNT(*) = 3;
157
must match all
three products
157 Monday, April 20, 2009
JOIN Antipattern
Example: nd an entry with three tags::
multiple-JOIN solution:
SELECT DISTINCT b.*
FROM Bugs b
JOIN BugsProducts p1 ON ((p1.bug_id, p1.product_id) = (b.bug_id, 1))
JOIN BugsProducts p2 ON ((p2.bug_id, p2.product_id) = (b.bug_id, 2))
JOIN BugsProducts p3 ON ((p3.bug_id, p3.product_id) = (b.bug_id, 3));
158
three joins is slower
than one, right?
not if indexes
are used well
158 Monday, April 20, 2009
JOIN Antipattern
Solution:
JOIN is to SQL as while() is to other languages.
One-size-ts-all rules (e.g. joins are slow)
dont work.
Measure twice, query once.
Let the SQL optimizer work.
Employ alternatives (e.g. JOIN decomposition)
as exception cases.
159
159 Monday, April 20, 2009
Goldberg Machine
160
Enita non sunt multiplicanda praeter necessitatem
(Entities are not to be multiplied beyond necessity).
William of Okham
160 Monday, April 20, 2009
Goldberg Machine
Objective: Generate a complex report as
efciently as possible.
161
161 Monday, April 20, 2009
Goldberg Machine
Example: Calculate for each account:
Count of bugs reported by user.
Count of products the user has been assigned to.
Count of comments left by user.
162
162 Monday, April 20, 2009
Goldberg Machine
Antipattern: Try to generate all the
information for the report in a single query:
SELECT a.account_name,
COUNT(br.bug_id) AS bugs_reported,
COUNT(bp.product_id) AS products_assigned,
COUNT(c.comment_id) AS comments
FROM Accounts a
LEFT JOIN Bugs br ON (a.account_id = br.reported_by)
LEFT JOIN (Bugs ba JOIN BugsProducts bp ON (ba.bug_id = bp.bug_id))
ON (a.account_id = ba.assigned_to)
LEFT JOIN Comments c ON (a.account_id = c.author)
GROUP BY a.account_id;
163
expected: 2
expected: 4
expected: 3
163 Monday, April 20, 2009
Goldberg Machine
Expected result versus actual result:
164
account name bugs reported
products
assigned
comments
Bill 3 2 4 48 48 48
FAIL FAIL FAIL
164 Monday, April 20, 2009
Goldberg Machine
Run query without GROUP BY:
SELECT a.account_name,
br.bug_id AS bug_reported,
ba.bug_id AS bug_assigned,
bp.product_id AS product_assigned
c.comment_id
FROM Accounts a
LEFT JOIN Bugs br ON (a.account_id = br.reported_by)
LEFT JOIN (Bugs ba JOIN BugsProducts bp ON (ba.bug_id = bp.bug_id))
ON (a.account_id = ba.assigned_to)
LEFT JOIN Comments c ON (a.account_id = c.author);
165
165 Monday, April 20, 2009
Goldberg Machine
Query result reveals a Cartesian Product:
166
account
name
bug
reported
bug
assigned
product
assigned
comment
Bill 1234 1234 1 6789
Bill 1234 1234 1 9876
Bill 1234 1234 1 4365
Bill 1234 1234 1 7698
Bill 1234 1234 3 6789
Bill 1234 1234 3 9876
Bill 1234 1234 3 4365
Bill 1234 1234 3 7698
166 Monday, April 20, 2009
Goldberg Machine
Query result reveals a Cartesian Product:
167
account
name
bug
reported
bug
assigned
product
assigned
comment
Bill 1234 5678 1 6789
Bill 1234 5678 1 9876
Bill 1234 5678 1 4365
Bill 1234 5678 1 7698
Bill 1234 5678 3 6789
Bill 1234 5678 3 9876
Bill 1234 5678 3 4365
Bill 1234 5678 3 7698
167 Monday, April 20, 2009
Goldberg Machine
Query result reveals a Cartesian Product:
168
account
name
bug
reported
bug
assigned
product
assigned
comment
Bill 2345 1234 1 6789
Bill 2345 1234 1 9876
Bill 2345 1234 1 4365
Bill 2345 1234 1 7698
Bill 2345 1234 3 6789
Bill 2345 1234 3 9876
Bill 2345 1234 3 4365
Bill 2345 1234 3 7698
168 Monday, April 20, 2009
Goldberg Machine
Query result reveals a Cartesian Product:
169
account
name
bug
reported
bug
assigned
product
assigned
comment
Bill 2345 5678 1 6789
Bill 2345 5678 1 9876
Bill 2345 5678 1 4365
Bill 2345 5678 1 7698
Bill 2345 5678 3 6789
Bill 2345 5678 3 9876
Bill 2345 5678 3 4365
Bill 2345 5678 3 7698
169 Monday, April 20, 2009
Goldberg Machine
Query result reveals a Cartesian Product:
170
account
name
bug
reported
bug
assigned
product
assigned
comment
Bill 3456 1234 1 6789
Bill 3456 1234 1 9876
Bill 3456 1234 1 4365
Bill 3456 1234 1 7698
Bill 3456 1234 3 6789
Bill 3456 1234 3 9876
Bill 3456 1234 3 4365
Bill 3456 1234 3 7698
170 Monday, April 20, 2009
Goldberg Machine
Query result reveals a Cartesian Product:
171
account
name
bug
reported
bug
assigned
product
assigned
comment
Bill 3456 5678 1 6789
Bill 3456 5678 1 9876
Bill 3456 5678 1 4365
Bill 3456 5678 1 7698
Bill 3456 5678 3 6789
Bill 3456 5678 3 9876
Bill 3456 5678 3 4365
Bill 3456 5678 3 7698
171 Monday, April 20, 2009
Goldberg Machine
Visualizing a Cartesian Product:
172
bugs
reported
bugs
assigned
products
assigned
3 # 4 # 4 = 48
each assigned bug
applies to 2 products
comments
Bugs BugsProducts
Comments
Accounts
Bugs
172 Monday, April 20, 2009
Goldberg Machine
Solution: Write separate queries.
SELECT a.account_name, COUNT(br.bug_id) AS bugs_reported
FROM Accounts a LEFT JOIN Bugs br ON (a.account_id = br.reported_by)
GROUP BY a.account_id;
SELECT a.account_name,
COUNT(DISTINCT bp.product_id) AS products_assigned,
FROM Accounts a LEFT JOIN
(Bugs ba JOIN BugsProducts bp ON (ba.bug_id = bp.bug_id))
ON (a.account_id = ba.assigned_to)
GROUP BY a.account_id;
SELECT a.account_name, COUNT(c.comment_id) AS comments
FROM Accounts a LEFT JOIN Comments c ON (a.account_id = c.author)
GROUP BY a.account_id;
173
result: 3
result: 2
result: 4
173 Monday, April 20, 2009
Antipattern Categories
Database Design
Antipatterns
Database Creation
Antipatterns
Query
Antipatterns
Application
Antipatterns
CREATE TABLE BugsProducts (
bug_id INTEGER REFERENCES Bugs,
product VARCHAR(100) REFERENCES Products,
PRIMARY KEY (bug_id, product)
);
SELECT b.product, COUNT(*)
FROM BugsProducts AS b
GROUP BY b.product;
$dbHandle = new PDO(mysql:dbname=test);
$stmt = $dbHandle->prepare($sql);
$result = $stmt->fetchAll();
174
174 Monday, April 20, 2009
Application Antipatterns
13. Parameter Facade
14. Phantom Side Effects
15. See No Evil
16. Diplomatic Immunity
17. Magic Beans
175
175 Monday, April 20, 2009
Parameter Facade
176
176 Monday, April 20, 2009
Parameter Facade
Objective: include application variables in
SQL statements
SELECT * FROM Bugs
WHERE bug_id IN ( $id_list );
177
177 Monday, April 20, 2009
Parameter Facade
Antipattern: Trying to use parameters
for complex syntax
178
178 Monday, April 20, 2009
Parameter Facade
Interpolation can modify syntax
$list = 1234, 3456, 5678
SELECT * FROM Bugs
WHERE bug_id IN ( $list );
SELECT * FROM Bugs
WHERE bug_id IN ( 1234, 3456, 5678 );
179
three values
separated by commas
179 Monday, April 20, 2009
Parameter Facade
A parameter is always a single value
$list = 1234, 3456, 5678
SELECT * FROM Bugs
WHERE bug_id IN ( ? );
EXECUTE USING $list;
SELECT * FROM Bugs
WHERE bug_id IN ( 1234, 3456, 5678 );
180
one string value
180 Monday, April 20, 2009
Parameter Facade
Interpolation can specify identiers
$column = bug_id
SELECT * FROM Bugs
WHERE $column = 1234;
SELECT * FROM Bugs
WHERE bug_id = 1234;
181
column identier
181 Monday, April 20, 2009
Parameter Facade
A parameter is always a single value
$column = bug_id;
SELECT * FROM Bugs
WHERE ? = 1234;
EXECUTE USING $column;
SELECT * FROM Bugs
WHERE bug_id = 1234;
182
one string value
182 Monday, April 20, 2009
Parameter Facade
Interpolation risks SQL injection
$id = 1234 or 1=1;
SELECT * FROM Bugs
WHERE bug_id = $id;
SELECT * FROM Bugs
WHERE bug_id = 1234 or 1=1;
183
logical
expression
183 Monday, April 20, 2009
Parameter Facade
A parameter is always a single value
$id = 1234 or 1=1;
SELECT * FROM Bugs
WHERE bug_id = ?;
EXECUTE USING $id;
SELECT * FROM Bugs
WHERE bug_id = 1234 or 1=1;
184
one string value
184 Monday, April 20, 2009
Parameter Facade
Preparing a SQL statement:
Parses SQL syntax
Optimizes execution plan
Retains parameter placeholders
185
185 Monday, April 20, 2009
Parameter Facade
query
SELECT
FROM
WHERE
expr-list
*
simple-table
expr
bugs
parameter
placeholder
?
bug_id
= equality
186
186 Monday, April 20, 2009
Parameter Facade
Executing a prepared statement
Combines a supplied value for each parameter
Doesnt modify syntax, tables, or columns
Runs query
187
could invalidate
optimization plan
187 Monday, April 20, 2009
Parameter Facade
query
SELECT
FROM
WHERE
expr-list
*
simple-table
expr
bugs
1234
bug_id
= equality
supplied
value
188
188 Monday, April 20, 2009
Parameter Facade
query
SELECT
FROM
WHERE
expr-list
*
simple-table
expr
bugs
1234
or 1=1
bug_id
= equality
supplied
value
189
189 Monday, April 20, 2009
Parameter Facade
Interpolating into a query string
Occurs in the application, before SQL is parsed
Database server cant tell what part is dynamic
190
190 Monday, April 20, 2009
Parameter Facade
query
SELECT
FROM
WHERE
expr-list
*
simple-table
expr
bugs
1234
bug_id
= equality
intended
value
191
191 Monday, April 20, 2009
Parameter Facade
query
SELECT
FROM
WHERE
expr-list
*
simple-table
expr
1234
bugs
bug_id
= equality
1
1
= equality
OR
192
SQL injection
192 Monday, April 20, 2009
Parameter Facade
The Bottom Line:
Interpolation may change the shape of the tree
Parameters cannot change the tree
Parameter nodes may only be values
193
193 Monday, April 20, 2009
Parameter Facade
Example: IN predicate
SELECT * FROM bugs
WHERE bug_id IN ( ? );
SELECT * FROM bugs
WHERE bug_id IN ( ?, ?, ?, ? );
194
must supply
exactly four values
may supply
only one value
194 Monday, April 20, 2009
Parameter Facade
195
Scenario Value Interpolation Parameter
single value 1234 SELECT * FROM bugs
WHERE bug_id = $id;
SELECT * FROM bugs
WHERE bug_id = ?;
multiple values
1234, 3456, 5678
SELECT * FROM bugs
WHERE bug_id IN ($list);
SELECT * FROM bugs
WHERE bug_id IN ( ?, ?, ? );
column name bug_id SELECT * FROM bugs
WHERE $column = 1234; NO
table name bugs SELECT * FROM $table
WHERE bug_id = 1234; NO
other syntax bug_id = 1234 SELECT * FROM bugs
WHERE $expr; NO
195 Monday, April 20, 2009
Parameter Facade
Solution:
Use parameters only for individual values
Use interpolation for dynamic SQL syntax
Be careful to prevent SQL injection
196
196 Monday, April 20, 2009
Phantom Side Effects
197
Every program attempts to expand until it can read mail.
Jamie Zawinsky
197 Monday, April 20, 2009
Phantom Side Effects
Objective: execute application tasks
with database operations
INSERT INTO Bugs . . .
...and send email to notify me
198
198 Monday, April 20, 2009
Phantom Side Effects
Antipattern: execute external effects in
database triggers, stored procedures, and
functions
199
199 Monday, April 20, 2009
Phantom Side Effects
External effects dont obey ROLLBACK
1. Start transaction and INSERT
200
bug_id description
insert row
bug_id = 1234
notify of
bug_id 1234
200 Monday, April 20, 2009
Phantom Side Effects
External effects dont obey ROLLBACK
2. ROLLBACK
201
bug_id description
discard
row
I got email,
but no row
1234?
201 Monday, April 20, 2009
Phantom Side Effects
External effects dont obey transaction
isolation
1. Start transaction and INSERT
202
bug_id description
insert row
bug_id = 1234
notify of
bug_id 1234
202 Monday, April 20, 2009
Phantom Side Effects
External effects dont obey transaction
isolation
2. Email is received before row is visible
203
bug_id description
row
pending
commit
I got email,
but no row
1234?
203 Monday, April 20, 2009
Phantom Side Effects
External effects run as database server user
Possible security risk
SELECT * FROM bugs
WHERE bug_id = 1234
OR send_email(Buy cheap Rolex watch!);
Auditing/logging confusion
204
SQL injection
204 Monday, April 20, 2009
Phantom Side Effects
Functions may crash
SELECT pk_encrypt(description,
/nonexistant/private.ppk)
FROM Bugs
WHERE bug_id = 1234;
205
missing le
causes fatal error
205 Monday, April 20, 2009
Phantom Side Effects
Long-running functions delay query
Accessing remote resources
Unbounded execution time
SELECT libcurl_post(description,
http://myblog.org/ . . .)
FROM Bugs
WHERE bug_id = 1234;
206
unresponsive
website
206 Monday, April 20, 2009
Phantom Side Effects
Solution:
Operate only on database in triggers, stored
procedures, database functions
Wait for transaction to commit
Perform external actions in application code
207
207 Monday, April 20, 2009
See No Evil
208
Everyone knows that debugging is twice as hard as writing
a program in the rst place. So if youre as clever as you
can be when you write it, how will you ever debug it?
Brian Kernighan
208 Monday, April 20, 2009
See No Evil
Objective: Debug errors in queries.
209
209 Monday, April 20, 2009
See No Evil
Antipatterns:
Ignore errors in return status or exceptions.
Troubleshoot code that builds queries.
210
210 Monday, April 20, 2009
See No Evil
Ignoring errors in return status:
$sql = SELECT * FROM Bugs;
$result = $mysqli->query( $sql );
$rows = $result->fetch_all();
211
OK
211 Monday, April 20, 2009
See No Evil
Ignoring errors in return status:
$sql = SELECT * FROM Bugz;
$result = $mysqli->query( $sql );
$rows = $result->fetch_all();
212
FAIL
returns FALSE
212 Monday, April 20, 2009
See No Evil
Ignoring exceptions:
$sql = SELECT * FROM Bugz;
$stmt = $pdo->query( $sql );
$rows = $stmt->fetchAll();
213
NOT
REACHED
throws PDOException
213 Monday, April 20, 2009
See No Evil
Solution: check for error status.
$sql = SELECT * FROM Bugz;
$result = $mysqli->query( $sql );
if ($result === FALSE ) {
log($mysqli->error());
return FALSE;
}
$rows = $result->fetchAll();
214
dont let it go this far!
214 Monday, April 20, 2009
See No Evil
Solution: handle exceptions.
$sql = SELECT * FROM Bugz;
try {
$stmt = $pdo->query( $sql );
} catch (PDOException $e) {
log($stmt->errorInfo());
return FALSE;
}
$rows = $stmt->fetchAll();
215
dont let it go this far!
215 Monday, April 20, 2009
See No Evil
Troubleshooting code:
$sql = SELECT * FROM Bugs
WHERE summary LIKE \%
. $mysqli->quote( $feature )
. doesn\t work 50\% of the time!%\;
$result = $mysqli->query( $sql );
$rows = $result->fetchAll();
216
who wants to
read this!?
216 Monday, April 20, 2009
See No Evil
Solution: Look at the SQL, not the code!
$sql = SELECT * FROM Bugs
WHERE summary LIKE \%
. $mysqli->quote( $feature )
. doesn\t work 50\% of the time!%\;
$rephp = FirePHP::getInstance(true);
$rephp->log( $sql, SQL );
$result = $mysqli->query( $sql );
$rows = $result->fetchAll();
217
the error
is now clear!
217 Monday, April 20, 2009
Diplomatic Immunity
218
Humans are allergic to change. They love to say,
Weve always done it this way. I try to ght that.
Adm. Grace Murray Hopper
218 Monday, April 20, 2009
Diplomatic Immunity
Objective: Employ software development
best practices.
219
219 Monday, April 20, 2009
Diplomatic Immunity
Antipattern: Belief that database
development is different software
development best practices dont apply.
220
220 Monday, April 20, 2009
Diplomatic Immunity
Solution: Employ best practices, just like
in conventional application coding.
Functional testing
Documentation
Source code control
221
221 Monday, April 20, 2009
Diplomatic Immunity
Functional testing
Tables, Views,
Columns
Constraints
Triggers
Stored Procedures
Bootstrap Data
Queries
ORM Classes
222
Verify presence of
tables and views.
Verify they contain
columns you expect.
Verify absence of
tables, views, or
columns that you
dropped.
222 Monday, April 20, 2009
Diplomatic Immunity
Functional testing
Tables, Views,
Columns
Constraints
Triggers
Stored Procedures
Bootstrap Data
Queries
ORM Classes
223
Try to execute updates
that ought to be denied
by constraints.
You can catch bugs
earlier by identifying
constraints that are
failing.
223 Monday, April 20, 2009
Diplomatic Immunity
Functional testing
Tables, Views,
Columns
Constraints
Triggers
Stored Procedures
Bootstrap Data
Queries
ORM Classes
224
Triggers can enforce
constraints too.
Triggers can perform
cascading effects,
transform values, log
changes, etc.
You should test these
scenarios.
224 Monday, April 20, 2009
Diplomatic Immunity
Functional testing
Tables, Views,
Columns
Constraints
Triggers
Stored Procedures
Bootstrap Data
Queries
ORM Classes
225
Code is more easily
developed, debugged,
and maintained in the
application layer.
Nevertheless, stored
procedures are useful,
e.g. solving tough
bottlenecks.
You should test stored
procedure code.
225 Monday, April 20, 2009
Diplomatic Immunity
Functional testing
Tables, Views,
Columns
Constraints
Triggers
Stored Procedures
Bootstrap Data
Queries
ORM Classes
226
Lookup tables need to
be lled, even in an
empty database.
Test that the required
data are present.
Other cases exist for
initial data.
226 Monday, April 20, 2009
Diplomatic Immunity
Functional testing
Tables, Views,
Columns
Constraints
Triggers
Stored Procedures
Bootstrap Data
Queries
ORM Classes
227
Application code is
laced with SQL queries.
Test queries for result
set metadata, e.g.
columns, data types.
Test performance; good
queries can become
bottlenecks, as data and
indexes grow.
227 Monday, April 20, 2009
Diplomatic Immunity
Functional testing
Tables, Views,
Columns
Constraints
Triggers
Stored Procedures
Bootstrap Data
Queries
ORM Classes
228
Like Triggers, ORM
classes contain logic:
Validation.
Transformation.
Monitoring.
You should test these
classes as you would
any other code.
228 Monday, April 20, 2009
Diplomatic Immunity
Documentation
Entity Relation-
ship Diagram
Tables, Columns
Relationships
Views, Triggers
Stored Procedures
SQL Privileges
Application Code
229
229 Monday, April 20, 2009
Diplomatic Immunity
Documentation
Entity-Relationship
Diagram
Tables, Columns
Relationships
Views, Triggers
Stored Procedures
SQL Privileges
Application Code
230
Purpose of each table,
each column.
Constraints, rules that
apply to each.
Sample data.
List the Views, Triggers,
Procs, Applications, and
Users that use each.
230 Monday, April 20, 2009
Diplomatic Immunity
Documentation
Entity-Relationship
Diagram
Tables, Columns
Relationships
Views, Triggers
Stored Procedures
SQL Privileges
Application Code
231
Describe in text the
dependencies between
tables.
Business rules arent
represented fully by
declarative constraints.
231 Monday, April 20, 2009
Diplomatic Immunity
Documentation
Entity-Relationship
Diagram
Tables, Columns
Relationships
Views, Triggers
Stored Procedures
SQL Privileges
Application Code
232
Purpose of Views;
who uses them.
Usage of updatable
Views.
Business rules enforced
by Triggers:
Validation
Transformation
Logging
232 Monday, April 20, 2009
Diplomatic Immunity
Documentation
Entity-Relationship
Diagram
Tables, Columns
Relationships
Views, Triggers
Stored Procedures
SQL Privileges
Application Code
233
Document the Stored
Procedures as an API.
Especially side-effects.
What problem is the
procedure solving?
Encapsulation
Performance
Privileged access
233 Monday, April 20, 2009
Diplomatic Immunity
Documentation
Entity-Relationship
Diagram
Tables, Columns
Relationships
Views, Triggers
Stored Procedures
SQL Privileges
Application Code
234
Logins with specic
access purposes
(e.g. backup, reports).
Sets of privileges
(roles) used for
different scenarios.
Security measures.
234 Monday, April 20, 2009
Diplomatic Immunity
Documentation
Entity-Relationship
Diagram
Tables, Columns
Relationships
Views, Triggers
Stored Procedures
SQL Privileges
Application Code
235
Data Access Layer:
Connection params.
Client options.
Driver versions.
Object-Relational
Mapping (ORM):
Validations, Logging,
Transformations.
Special nd() methods.
235 Monday, April 20, 2009
Diplomatic Immunity
Source code control
Keep database in synch with application code.
Commit portable .SQL les, not binaries.
Create a separate database instance for each
working set (each branch or revision you test).
Also commit bootstrap data and test data
to source control.
236
236 Monday, April 20, 2009
Diplomatic Immunity
Source code control: Migrations.
Migrations are like version-control for the
database instance.
Incremental scripts for each milestone.
Upgrade script to apply new changes
(e.g. CREATE new tables).
Downgrade script to revert changes
(e.g. DROP new tables).
Database instance includes a revision table.
237
237 Monday, April 20, 2009
Magic Beans
238
Essentially, all models are wrong, but some are useful.
George E. P. Box
238 Monday, April 20, 2009
Magic Beans
Objective: simplify application
development using Object-Relational
Mapping (ORM) technology.
239
239 Monday, April 20, 2009
Magic Beans
Antipattern: equating Model in MVC
architecture with the Active Record pattern.
The Golden Hammer of data access.
Model used inaccurately in MVC frameworks:
240
240 Monday, April 20, 2009
Magic Beans
Antipattern: Model is-a Active Record.
241
Active
Record
Active
Record
Bugs Products Comments
inheritance (IS-A)
aggregation (HAS-A)
Controller View
241 Monday, April 20, 2009
Magic Beans
Bad object-oriented design:
Model Active Record
Models tied to database structure.
Can a Product associate to a Bug,
or does a Bug associate to a Product?
Models expose general-purpose
Active Record interface,
not model-specic interface.
242
inheritance (IS-A)
unclear assignment
of responsibilities
poor encapsulation
inappropriate
coupling
242 Monday, April 20, 2009
Magic Beans
Bad Model-View-Controller design
Controller needs to know database structure.
Database changes cause code changes.
Anemic Domain Model, contrary to OO design.
http://www.martinfowler.com/bliki/AnemicDomainModel.html
Pushing Domain-layer code into Application-layer,
contrary to Domain-Driven Design.
http://domaindrivendesign.org/
243
not DRY
T.M.I. !!
243 Monday, April 20, 2009
Magic Beans
Bad testability design
Model coupled to Active Record;
harder to test Model without database.
Database xtures become necessary.
Business logic pushed to Controller;
harder to test Controller code.
244
tests are
slow
mocking HTTP Request,
scraping HTML output
tests are
even slower
244 Monday, April 20, 2009
Magic Beans
Solution: Model has-a Active Record(s).
245
BugReport
(Model)
View Controller
Products Comments
Active
Record
Bugs
inheritance (IS-A)
aggregation (HAS-A)
composition (HAS-A)
245 Monday, April 20, 2009
Magic Beans
Solution: Model has-a Active Record(s).
Models expose only domain-specic interface.
Models encapsulate complex business logic.
Models abstract the persistence implementation.
Controllers and Views are unaware of database.
246
246 Monday, April 20, 2009
Magic Beans
Solution: Model has-a Active Record(s).
Models are decoupled from Active Record.
Supports mock objects.
Supports dependency injection.
Unit-testing Models in isolation is easier & faster.
Unit-testing thinner Controllers is easier.
247
247 Monday, April 20, 2009
Magic Beans
Solution: Model has-a Active Record(s).
Its possible to follow this design,
even in MVC frameworks that assume
that Model is-a Active Record.
248
248 Monday, April 20, 2009
Antipattern Categories
Database Design
Antipatterns
Database Creation
Antipatterns
Query
Antipatterns
Application
Antipatterns
CREATE TABLE BugsProducts (
bug_id INTEGER REFERENCES Bugs,
product VARCHAR(100) REFERENCES Products,
PRIMARY KEY (bug_id, product)
);
SELECT b.product, COUNT(*)
FROM BugsProducts AS b
GROUP BY b.product;
$dbHandle = new PDO(mysql:dbname=test);
$stmt = $dbHandle->prepare($sql);
$result = $stmt->fetchAll();
249
249 Monday, April 20, 2009
Thank You
Copyright 2008-2009 Bill Karwin
www.karwin.com
Released under a Creative Commons 3.0 License:
http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share - to copy, distribute and
transmit this work, under the following conditions:
Attribution.
You must attribute this
work to Bill Karwin.
Noncommercial.
You may not use this work
for commercial purposes.
No Derivative Works.
You may not alter,
transform, or build
upon this work.
250 Monday, April 20, 2009