Questions from your boss
- How many products
- How many developers fixed bugs
- Average number of bugs fixed per developer
- How many of our fixed bugs were reported by customers
Your hope
hoping to do the least amount of duplicate work and therefore produce the results faster
So
you make one complex query
SELECT COUNT(bp.product_id) AS how_many_products,
COUNT(dev.account_id) AS how_many_developers,
COUNT(b.bug_id)/COUNT(dev.account_id) AS avg_bugs_per_developer,
COUNT(cust.account_id) AS how_many_customers
FROM Bugs b JOIN BugsProducts bp ON (b.bug_id = bp.bug_id) JOIN Accounts dev ON (b.assigned_to = dev.account_id)
JOIN Accounts cust ON (b.reported_by = cust.account_id) WHERE cust.email NOT LIKE '%@example.com'
GROUP BY bp.product_id;
The result seems wrong
17.1 Objective: Decrease SQL Queries
- How can I do this with a single query?
- one SQL query is difficult, complex, and expensive
- two SQL queries must be twice as bad
- Some programming framework(MyBatis)
- I can not solve these data without sql
- I can't write shell script! perl? what is it?
- Excel is good but i need a good format
- what? Database is on remote server???
17.2 Antipattern: Solve a Complex Problem in One Step
17.2.1 Out of your expectation
Let's see an example
count the number of bugs fixed, and the number of bugs open, for a given product
SELECT p.product_id,
COUNT(f.bug_id) AS count_fixed,
COUNT(o.bug_id) AS count_open
FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id;
I don't think anyone will write such a query if he really know about SQL
Tell me why? Give me a reason!!
You happen to know that in reality there are 12 fixed bugs and 7 open bugs for the given product
product_id | count_fixed | count_open |
---|---|---|
1 | 84 | 84 |
We got a Cartesian product
17.2.2 More serious
- simply hard to write
- hard to modify
- hard to debug
- big runtime costs
17.3 How to Recognize the Antipattern
- Why are my sums and counts impossibly large?
- I’ve been working on this monster SQL query all day!
- take too long to figure out how to recode the SQL query
- Try putting another DISTINCT into the query
17.4 Legitimate Uses of the Antipattern
- business intelligence
- reporting tools
Pay attention to report’s complexity and the hours it takes to produce it
But can i tell my boss that you need to organize the report by yourself.
17.5 Solution: Divide and Conquer
The Law of Parsimony
When you have two competing theories that make exactly the same predictions, the simpler one is the better.
17.5.1 One Step at a Time
split up a Spaghetti Query into several simpler queries
SELECT p.product_id, COUNT(f.bug_id) AS count_fixed
FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') WHERE p.product_id = 1
GROUP BY p.product_id;
SELECT p.product_id, COUNT(o.bug_id) AS count_open
FROM BugsProducts p
LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id;
Merits
- Avoid unwanted Cartesian product
- Easier to maintain when add another requirement
- Easier for SQL engine to do optimization
- Easier to share with others(Code review, explain .. etc.)
17.5.2 Look for the UNION Label
(SELECT p.product_id, f.status, COUNT(f.bug_id) AS bug_count
FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') WHERE p.product_id = 1
GROUP BY p.product_id, f.status)
UNION ALL
(SELECT p.product_id, o.status, COUNT(o.bug_id) AS bug_count
FROM BugsProducts p
LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id, o.status)
ORDER BY bug_count;
17.5.2 Solving Your Boss’s Problem
- How many products
SELECT COUNT(*) AS how_many_products
FROM Products;
- How many developers fixed bugs
SELECT COUNT(DISTINCT assigned_to) AS how_many_developers FROM Bugs
WHERE status = 'FIXED';
- Average number of bugs fixed per developer
SELECT AVG(bugs_per_developer) AS average_bugs_per_developer
FROM (SELECT dev.account_id, COUNT(*) AS bugs_per_developer
FROM Bugs b JOIN Accounts dev
ON (b.assigned_to = dev.account_id)
WHERE b.status = 'FIXED'
GROUP BY dev.account_id) t;
- How many of our fixed bugs were reported by customers
SELECT COUNT(*) AS how_many_customer_bugs
FROM Bugs b JOIN Accounts cust ON (b.reported_by = cust.account_id) WHERE b.status = 'FIXED' AND cust.email NOT LIKE '%@example.com';
17.5.6 Writing SQL Automatically—with SQL
Generate SQL by code
SELECT CONCAT('UPDATE Inventory '
' SET last_used = ''', MAX(u.usage_date), '''',
' WHERE inventory_id = ', u.inventory_id, ';
') AS update_statement
FROM ComputerUsage u
GROUP BY u.inventory_id;
But I have a confuse
How to balance the query times and query time?(May be I am too young too simple)
An Example of myself
SELECT t1.reg_date,t1.COUNT, t2.COUNT,t3.COUNT,t4.COUNT,t5.COUNT,t6.COUNT,t7.COUNT,t8.COUNT,t9.COUNT,t10.COUNT, t11.COUNT,t12.COUNT,t13.COUNT,t14.COUNT
FROM
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 1 DAY) group by game_reg_date) AS t1,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 2 DAY) group by game_reg_date) AS t2,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 3 DAY) group by game_reg_date) AS t3,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 4 DAY) group by game_reg_date) AS t4,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 5 DAY) group by game_reg_date) AS t5,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 6 DAY) group by game_reg_date) AS t6,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 7 DAY) group by game_reg_date) AS t7,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 8 DAY) group by game_reg_date) AS t8,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 9 DAY) group by game_reg_date) AS t9,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 2 DAY) group by game_reg_date) AS t10,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 3 DAY) group by game_reg_date) AS t11,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 4 DAY) group by game_reg_date) AS t12,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 5 DAY) group by game_reg_date) AS t13,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 6 DAY) group by game_reg_date) AS t14
where t1.reg_date = t2.reg_date AND t1.reg_date=t3.reg_date AND t4.reg_date=t1.reg_date AND t1.reg_date=t5.reg_date AND t1.reg_date = t6.reg_date AND t1.reg_date = t7.reg_date AND t1.reg_date = t8.reg_date
AND t1.reg_date = t9.reg_date AND t1.reg_date=t10.reg_date AND t11.reg_date=t1.reg_date AND t1.reg_date=t12.reg_date AND t1.reg_date = t13.reg_date AND t1.reg_date = t14.reg_date;