Databricks SQL 2024

The document contains information about Tom Coffing and David Cook, who are authors and developers for Coffing Data Warehousing. It discusses their backgrounds, experiences, and roles within the company. Tom Coffing founded Coffing Data Warehousing 20 years ago and has written over 85 books on database technologies. David Cook has been a lead developer on the Nexus Query Chameleon software for 10 years and helped design data analysis and migration tools. The document also provides contact information for Tom Coffing.

Licensed to , ajayraoda@gmail.com
The Tera-Tom Video Series

Lessons with Tera-Tom

Teradata Architecture and SQL Video Series


These exciting videos make learning and certification much easier.

YouTube Channel: CoffingDW

The Tera-Tom and David Cook Cloud Series

Each Cloud Series book targets a cloud database. The books take a building-block approach, always starting
simple, with each page building upon the previous point.

Tera-Tom: Author of over 90 Books

Tera-Tom books have been the primary source of Teradata learning for over 20 years. They have helped to teach
millions of people all aspects of Teradata. What people love the most about the Tera-Tom books is how easy they
are to understand. They are so easy that a seven-year-old boy (raised by wolves) can understand them!

The Query Tool of the Future is Nexus

The Nexus is the greatest tool for data the world has ever known. Download a free trial at www.CoffingDW.com.
Check out the Nexus in action on YouTube here: https://www.youtube.com/watch?v=drNlY1cyZrw

Trademarks and Copyrights
Databricks is a registered trademark of Databricks. Snowflake is a registered trademark of Snowflake. Microsoft
Windows, Windows 2003 Server, SQL Server 2012, SQL Server Compact Edition, .NET, PDW, SQL Server, T-SQL,
Azure SQL Data Warehouse, and Azure Cloud are trademarks of Microsoft. Teradata, NCR, BYNET, and SQL
Assistant are registered trademarks of Teradata Corporation, Dayton, Ohio, U.S.A. IBM, DB2, and Netezza are
registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards
Institute. Ethernet is a trademark of Xerox. UNIX is a trademark of The Open Group. Linux is a trademark of
Linus Torvalds. Java and Oracle are trademarks of Oracle. ParAccel is a trademark of ParAccel. Kognitio is a
trademark of Kognitio. Greenplum is a trademark of EMC Corporation. Nexus Query Chameleon is a trademark
of Coffing Data Warehousing.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity concerning any loss
or damages arising from the information contained in this book or from the use of programs or program segments
that are included. The manual is not a publication of Microsoft Corporation, nor was it produced in conjunction
with Microsoft Corporation.

Copyright © September 2021 by Coffing Publishing

ISBN 978-1-940540-61-0 Databricks SQL

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any
means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the
publisher. No patent liability is assumed with respect to the use of information contained herein. Although we
took every precaution in preparing this book, the publisher and author assume no responsibility for errors or
omissions, nor is any liability assumed for damages resulting from the use of the information contained herein.

About Tom Coffing

Tom Coffing, better known as Tera-Tom, has been the CEO and founder of Coffing Data Warehousing for the past 20
years. Tom has written over 85 books on all aspects of Teradata, Netezza, Snowflake, Redshift, Yellowbrick, Vertica,
SQL Server, Azure Synapse, MySQL, Postgres, Greenplum, Oracle, Databricks and more. In addition, Tom has taught
over 1,000 Teradata classes in India, Africa, Europe, China, Malaysia, and North America.

Tom also owns and designs the Nexus Desktop and Nexus Server software. The Nexus Desktop software allows users
to query all database platforms, migrate data automatically between platforms, and join data across all platforms.
As a result, the Nexus product line is one of the most sophisticated enterprise tools in the industry.

In high school, Tom was the first athlete from his school ever to place at state in any sport and was selected by his
school to represent it at Buckeye Boys State. Tom is also proud of his induction into the first class of the Lakota High
School Hall of Fame.

At the University of Arizona and the University of Nevada, Las Vegas, Tom was a two-time All-American wrestler,
Sophomore Athlete of the Year, and a two-time winner of the 1980 Olympic wrestling trials. Tom graduated with a
bachelor’s degree in Speech Communications.

After college, Tom became a state and national speech champion for Toastmasters and won two orchid awards as
an actor. Tom is the proud father of three beautiful children and grandfather of seven, and he has been married for
the past 32 years. You can contact Tom at 513 300-0341 or [email protected].

About David Cook

For nearly a decade, David Cook has been one of the lead developers on the Nexus Query Chameleon software
at Coffing Data Warehousing. In this position, David has designed and created several data analysis and
migration tools, including the Garden of Analysis, which lets users query answer sets without leaving the PC. He
is also the creator of the database-to-database move and compare module, which allows users to move and
compare databases across different platforms.

David brings to Coffing Data Warehousing a strong background of experience with information technology and
management, including time spent managing logistics information for national building products manufacturers.
David's ability to communicate well, combined with his programming talent, has made him an excellent asset for
Coffing Data Warehousing.

David graduated cum laude from The Ohio State University with a BA in Communication Technology.
David furthered his education at Miami University, where he served as a senator in the student government
while maintaining a 4.0 GPA in his Computer Science and Programming studies.

Table of Contents

Contents
Chapter 1 – Introduction to SQL
Introduction
SELECT * (All Columns) in a Table
SELECT Specific Columns in a Table
Commas in the Front or Back?
Place your Commas in front for better Debugging Capabilities
Sort the Data with the ORDER BY Keyword
Use a Column name or Number in an ORDER BY Statement
Two Examples of ORDER BY using Different Techniques
Changing the ORDER BY to Descending Order
Null Values Sort First in Ascending Mode (Default)
Order By with Nulls Last
Order By with Nulls First
Major Sort vs. Minor Sort
Multiple Sort Keys using Names vs. Numbers
An Order By That Uses an Expression
Sorts are Alphabetical, NOT Logical
Using A Valued CASE Statement to Sort Logically
Using A Searched CASE Statement to Sort Logically
Quiz – Can you Add a Minor Sort?
Answer – Can you Add a Minor Sort?
Order By Decode
Quiz – Can you Add Two Minor Sorts Using Decode?
Answer – Can you Add Two Minor Sorts Using Decode?
How to ALIAS a Column name
Using an Alias in the ORDER BY Clause
A Missing Comma can by Mistake become an Alias
Comments using Double Dashes are Single Line Comments
Comments for Multi-Lines
Comments for Multi-Lines As Double Dashes Per Line
Chapter 2 – The WHERE Clause
The WHERE Clause limits Returning Rows
Numbers Don't Need Single Quotes
Not Equal
Searching for null Values Using Equality Returns Nothing
Is NULL
IS Not Null
Using Greater Than Or Equal To (>=)
AND in the WHERE Clause
Troubleshooting AND
OR in the WHERE Clause
Troubleshooting OR
WHY OR Must Utilize the Column Name Each Time
Troubleshooting Character Data
Troubleshooting Character Data Continued
Quiz – How many rows will return?
Answer to Quiz – How many rows will return?
What is the Order of Precedence?
Using Parentheses to change the Order of Precedence
Using an IN List in Place of OR
The IN List is an Excellent Technique
IN List vs. OR Brings the Same Results
The IN List Can Use Character Data
Using a NOT IN List
Null Values in a NOT IN List Return No Rows
A Technique for Handling Nulls with a NOT IN List
Technique 2 for Handling Nulls with a NOT IN List
The BETWEEN Statement is Inclusive
The NOT BETWEEN Statement is also Inclusive
The BETWEEN Statement Works for Character Data
LIKE uses Wildcards Percent ‘%’ and Underscore ‘_’
Another Example of UPPER and LOWER
Using LIKE for all Cases with Lower and Upper
Using ILIKE to Handle Case Issues
LIKE command Underscore is Wildcard for one Character
Finding Anyone Whose Name Ends in 'Y'
Escape Character in the LIKE Command changes Wildcards
Escape Characters Turn off Wildcards in the LIKE Command
The REPLACE Function
Chapter 3 – Distinct, Group By and Top
The Distinct Command
Distinct vs. GROUP BY
Quiz – How many rows come back from the Distinct?
Answer – How many rows come back from the Distinct?
Top Command
Top Command and Order By
Chapter 4 – Aggregation
Quiz – You calculate the Answer Set in your Mind
Quiz 2 – Calculate the Answer Set in your Mind
Answer - Quiz 2 – Calculate the Answer Set in your Mind
There are Five Aggregates
Quiz – How many rows come back?
Answer – How many rows come back?
Casting a Data Type
Troubleshooting Aggregates
GROUP BY Delivers One Row Per Group
GROUP BY dept_no or GROUP BY Column Number
Limiting Rows and Improving Performance with WHERE
WHERE Clause in Aggregation limits unneeded Calculations
Keyword HAVING tests Aggregates after they are Totaled
Keyword HAVING is like an Extra WHERE Clause for Totals
ANY_VALUE
GROUP BY GROUPING SETS
GROUP BY ROLLUP
GROUP BY ROLLUP Answer Set
GROUP BY CUBE
GROUP BY CUBE Answer Set
Chapter 5 – Joining Tables
Nexus Builds Your Join SQL Automatically
A Two-Table Join Using Traditional Syntax
Two-Table join using Traditional Syntax with Table Alias
You Can Fully Qualify All Columns
A Two-Table Join Using ANSI Syntax
Both Queries have the same Results and Performance
Quiz – Can You Finish the Join Syntax?
Answer to Quiz – Can You Finish the Join Syntax?
Quiz – Can You Find the Error?
Answer to Quiz – Can You Find the Error?
Super Quiz – Can You Find the Difficult Error?
Answer to Quiz – Can You Find the Error?
Super Quiz – Can You Find the Difficult Error?
Answer to Super Quiz – Can You Find the Difficult Error?
Quiz – Which Rows from Both Tables Won’t Return?
Answer to Quiz – Which rows from both tables Won’t Return?
Left Outer Join
Left Outer Join Results
Right Outer Join
Right Outer Join Example and Results
Full Outer Join
Full Outer Join Results
Which Tables are Left Tables and Which are Right?
Answer - Which Tables are Left Tables and Which are Right?
INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional WHERE Clause
OUTER JOIN with Additional WHERE Clause
OUTER JOIN with Additional AND Clause
The DREADED Product Join
The DREADED Product Join Results
Cartesian Product Join with Traditional Syntax
Cartesian Product Join with ANSI Syntax
The CROSS JOIN
The CROSS JOIN Answer Set
The Self Join
The Self Join with ANSI Syntax
An Associative Table is a Bridge that Joins Two Tables
Quiz – Can you Write the 3-Table Join?
Answer to Quiz – Can you Write the 3-Table Join?
Quiz – Can you Write the 3-Table Join Using ANSI Syntax?
Answer – Can you Write the 3-Table Join Using ANSI Syntax?
Quiz – Can you Place the ON Clauses at the End?
Answer – Can you Place the ON Clauses at the End? ........................................................................................... 146
The 5-Table Join – Logical Insurance Model ........................................................................................................ 147
Quiz - Write a Five Table Join Using ANSI Syntax .............................................................................................. 148
Answer - Write a Five Table Join Using ANSI Syntax ......................................................................................... 149
Quiz - Write a Five Table Join Using Traditional Syntax ..................................................................................... 150
Answer - Write a Five Table Join Using Non-ANSI Syntax ................................................................................. 151
Quiz –Re-Write this putting the ON clauses at the END ...................................................................................... 152
Answer – Re-Write this putting the ON clauses at the END ................................................................................. 153
Chapter 6 – Date Functions....................................................................................................................................... 155
Migrate Any Database to Databricks and Vice Versa ........................................................................................... 156
Current_Date .......................................................................................................................................................... 157
Current_Date, Current_Timestamp, and Current_Timezone ................................................................................ 158
Now() Function ...................................................................................................................................................... 159
Add or Subtract From a Date ................................................................................................................................. 160
Date Function ......................................................................................................................................................... 161
To_Date Function................................................................................................................................................... 162
To_Timestamp Function ........................................................................................................................................ 163
Add or Subtract Days From a Date ........................................................................................................................ 164
Subtract Two Dates for a Difference in Days ........................................................................................................ 165
Subtract Two Dates for a Difference in Days ........................................................................................................ 166
MONTHS_BETWEEN .......................................................................................................................................... 167
The ADD_MONTHS Command ........................................................................................................................... 168
Using the ADD_MONTHS Command to Add 1 Year .......................................................................................... 169
Using the ADD_MONTHS Command to Add 5 Years ........................................................................................ 170
The EXTRACT Command .................................................................................................................................... 171
The EXTRACT Command .................................................................................................................................... 172
EXTRACT from DATES and TIME ..................................................................................................................... 173
Day, Month, Year, DayofMonth, DayofWeek, and DayofYear ............................................................................ 174
Using CASE and Extract to Reformat Dates ......................................................................................................... 175
Using CAST and SUBSTRING to Reformat Dates .............................................................................................. 176
The Date_Part Function ......................................................................................................................................... 177
Date_Format Function ........................................................................................................................................... 178
More Date_Format Examples ................................................................................................................................ 179
Datediff Example ................................................................................................................................................... 180
Dateadd................................................................................................................................................................... 181
Incrementing Time Values Using the Dateadd Function....................................................................................... 182
Date_Sub Function ................................................................................................................................................. 183
The Date_Trunc Function ...................................................................................................................................... 184
Date_Trunc Command With Time ........................................................................................................................ 185
Date_Trunc Command With Dates ........................................................................................................................ 186
Last_Day ................................................................................................................................................................ 187
Advanced Tricks for Month ................................................................................................................................... 188
Clever Tricks for Month......................................................................................................................................... 189
Make_Date ............................................................................................................................................................. 190
Make_Timestamp ................................................................................................................................................... 191
Using Day, Month, and Year intervals .................................................................................................................. 192
The Basics of a Simple Interval ............................................................................................................................. 193
Determining if the Current_Date is a Leap Year ................................................................................................... 194
Determining if the Current_Timestamp is a Leap Year ........................................................................................ 195
Make_Interval ........................................................................................................................................................ 196
Try_Divide Function .............................................................................................................................................. 197
Chapter 7 – Analytic and Window Functions ........................................................................................................... 199
Nexus Gives You Databricks Analytics for Free ................................................................................................... 200
ROW_NUMBER ................................................................................................................................................... 201
Quiz – How did the Row_Number Reset? ............................................................................................................. 202
Answer – How did the Row_Number Reset? ........................................................................................................ 203
QUALIFY .............................................................................................................................................................. 204
Top Two Students Per class_code Using a Derived Table .................................................................................... 205
RANK..................................................................................................................................................................... 206
Dense_Rank ........................................................................................................................................................... 207
Getting RANK to Sort in DESC Order .................................................................................................................. 208
RANK() OVER and PARTITION BY .................................................................................................................. 209
RANK() OVER, PARTITION BY, and QUALIFY .............................................................................................. 210
RANK() OVER and a Derived Table .................................................................................................................... 211
RANK() OVER and a WITH Derived Table ......................................................................................................... 212
RANK vs. DENSE_RANK.................................................................................................................................... 213
DENSE_RANK() OVER and PARTITION BY ................................................................................................... 214
PERCENT_RANK() OVER with 14 rows in Calculation .................................................................................... 215
PERCENT_RANK() OVER with 21 rows in Calculation .................................................................................... 216
PERCENT_RANK() OVER and PARTITION BY .............................................................................................. 217
Cumulative Sum ..................................................................................................................................................... 218
Cumulative Sum with CAST ................................................................................................................................. 219
Cumulative Sum – The Sort Explained ................................................................................................................. 220
Cumulative Sum – Rows Unbounded Preceding Explained ................................................................................. 221
Cumulative Sum – Making Sense of the Data ....................................................................................................... 222
Cumulative Sum – Major and Minor Sort Keys .................................................................................................... 223
Reset with a PARTITION BY Statement .............................................................................................................. 224
Totals and Subtotals through Partition By ............................................................................................................. 225
Moving Sum ........................................................................................................................................................... 226
Moving SUM every 3-rows Vs. a Continuous Average ........................................................................................ 227
Partition By Resets the Calculations ...................................................................................................................... 228
Moving Average..................................................................................................................................................... 229
The Moving Window is Current Row and Preceding............................................................................................ 230
How Moving Average Handles the Order By........................................................................................................ 231
Quiz – How is that Total Calculated? .................................................................................................................... 232
Answer to Quiz – How is that Total Calculated? .................................................................................................. 233
Quiz – How is that 4th Row Calculated? ............................................................................................................... 234
Answer to Quiz – How is that 4th Row Calculated? ............................................................................................. 235
Moving Average every 3-rows Vs. a Continuous Average ................................................................................... 236
The Partition By Statement .................................................................................................................................... 237
Partition By Resets an ANSI OLAP ...................................................................................................................... 238
Moving Difference ................................................................................................................................................. 239
Moving Difference with Partition By .................................................................................................................... 240
Moving Difference with Partition By .................................................................................................................... 241
Finding a Value of a Column in the Next Row with MIN .................................................................................... 242
Finding a Next Row Value with MIN and PARTITION BY ................................................................................ 243
Finding The Next Date using MAX ....................................................................................................................... 244
Finding Multiple Values of a Column in Upcoming Rows ................................................................................... 245
COUNT OVER for a Sequential Number ............................................................................................................. 246
COUNT OVER using ROWS UNBOUNDED PRECEDING .............................................................................. 247
The MAX OVER Command.................................................................................................................................. 248
MAX OVER with PARTITION BY Reset ........................................................................................................... 249
The MIN OVER Command ................................................................................................................................... 250
The MIN OVER Command with PARTITION BY .............................................................................................. 251
Different Windowing Options ............................................................................................................................... 252
How Ntile Works ................................................................................................................................................... 253
Ntile in DESC Mode .............................................................................................................................................. 254
Ntile ........................................................................................................................................................................ 255
Ntile Continued ...................................................................................................................................................... 256
Ntile Percentile ....................................................................................................................................................... 257
Another Ntile Example .......................................................................................................................................... 258
Using Quantiles (Partitions of Four) ...................................................................................................................... 259
NTILE With a Partition.......................................................................................................................................... 260
NTILE With a Qualify Statement .......................................................................................................................... 261
Using FIRST_VALUE ........................................................................................................................................... 262
FIRST_VALUE ..................................................................................................................................................... 263
FIRST_VALUE With Partitioning ........................................................................................................................ 264
Daily_Sales Minus FIRST_VALUE With Partitioning......................................................................................... 265
FIRST_VALUE With Partitioning ........................................................................................................................ 266
FIRST_VALUE After Sorting by the Highest Value ............................................................................................ 267
FIRST_VALUE with Partitioning ......................................................................................................................... 268
Using LAST_VALUE ............................................................................................................................................ 269
LAST_VALUE – Current Row ............................................................................................................................. 270
First_Value Review ................................................................................................................................................ 271
Last_Value Can Be Confusing ............................................................................................................................... 272
Last_Value Now Makes Sense .............................................................................................................................. 273
Last_Value With Partitioning ................................................................................................................................ 274
Last_Value And First_Value with Partitioning ..................................................................................................... 275
First and Last Value Difference Between Today's Daily_Sales ............................................................................ 276
Using LEAD........................................................................................................................................................... 277
Using LEAD with a PARTITION Statement ........................................................................................................ 278
Using LEAD With an Offset of 2 .......................................................................................................................... 279
Using LEAD With an Offset of 2 and a PARTITION .......................................................................................... 280
Using LAG ............................................................................................................................................................. 281
Using LAG with a PARTITION Statement ........................................................................................................... 282
Using Two LAG Statements .................................................................................................................................. 283
Using LAG With an Offset of 2 ............................................................................................................................. 284
Using LAG With an Offset of 2 and a PARTITION ............................................................................................. 285
CUME_DIST ......................................................................................................................................................... 286
CUME_DIST ......................................................................................................................................................... 287
CUME_DIST and Qualify ..................................................................................................................................... 288
CUME_DIST With Ties ........................................................................................................................................ 289
CUME_DIST and Partition By .............................................................................................................................. 290
CUME_DIST With a Partition on the Sales_Table ............................................................................................... 291
CURRENT ROW AND UNBOUNDED FOLLOWING ...................................................................................... 292
Different Windowing Options ............................................................................................................................... 293
MEDIAN Example................................................................................................................................................. 294
MEDIAN with Partitioning and a WHERE Clause ............................................................................................... 295
MEDIAN with Partitioning .................................................................................................................................... 296
PERCENTILE_CONT Function Description and Syntax ..................................................................................... 297
Final Result Information About PERCENTILE_CONT ....................................................................................... 298
PERCENTILE_DISC Function Arguments .......................................................................................................... 299
PERCENTILE_CONT Example............................................................................................................................ 300
PERCENTILE_CONT Example with Percentage Change ................................................................................... 301
PERCENTILE_CONT With PARTITION Example ............................................................................................ 302
PERCENTILE_CONT With PARTITION and (0.4) ............................................................................................ 303
PERCENTILE_DISC Function Description and Syntax....................................................................................... 304
PERCENTILE_DISC Example ............................................................................................................................. 305
PERCENTILE_DISC Example with Percentage Change ..................................................................................... 306
PERCENTILE_DISC With PARTITION Example .............................................................................................. 307
PERCENTILE_DISC With PARTITION and (0.4) .............................................................................................. 308
Chapter 8 – Temporary Tables ................................................................................................................................. 310
CREATING A Derived Table................................................................................................................................ 311
Naming the Derived Table ..................................................................................................................................... 312
Aliasing the Column names in the Derived Table ................................................................................................. 313
CREATING A Derived Table using the WITH Command ................................................................................... 314
Derived Query Examples with Three Different Techniques ................................................................................. 315
Most Derived Tables Are Used To Join To Other Tables ..................................................................................... 316
The Three Components of a Derived Table ........................................................................................................... 317
Visualize This Derived Table ................................................................................................................................ 318
Our Join Example Using The WITH Syntax ......................................................................................................... 319
An Example of Two Derived Tables in a Single Query ........................................................................................ 320
Chapter 9 – Subqueries ............................................................................................................................................. 322
An IN List is much like a Subquery ....................................................................................................................... 323
An IN List Never has Duplicates – Just like a Subquery....................................................................................... 324
An IN List Ignores Duplicates ............................................................................................................................... 325
The Subquery ......................................................................................................................................................... 326
The Three Steps of How a Basic Subquery Works................................................................................................ 327
These are Equivalent Queries ................................................................................................................................ 328
The Final Answer Set from the Subquery.............................................................................................................. 329
Quiz - Answer the Difficult Question .................................................................................... 330
Answer to Quiz - Answer the Difficult Question .................................................................................................. 331
Should you use a Subquery or a Join? ................................................................................................................... 332
Quiz - Write the Subquery ..................................................................................................................................... 333
Answer to Quiz - Write the Subquery ................................................................................... 334
Quiz - Write the More Difficult Subquery ............................................................................................................. 335
Answer to Quiz - Write the More Difficult Subquery ........................................................................................... 336
Quiz – Write the Extreme Subquery ...................................................................................................................... 337
Answer To Quiz – Write the Extreme Subquery ................................................................................................... 338
Quiz - Write the Subquery with an Aggregate....................................................................................................... 339
Answer to Quiz - Write the Subquery with an Aggregate ..................................................................... 340
Quiz - Write the Correlated Subquery ................................................................................... 341
Answer to Quiz - Write the Correlated Subquery ................................................................................. 342
The Basics of a Correlated Subquery ..................................................................................................................... 343
The Top Query always runs first in a Correlated Subquery .................................................................................. 344
Correlated Subquery Example vs. a Join with a Derived Table ............................................................................ 345
Quiz - A Second Chance To Write a Correlated Subquery ................................................................................... 346
Answer - A Second Chance to Write a Correlated Subquery ................................................................................ 347
Quiz - A Third Chance To Write a Correlated Subquery ...................................................................................... 348
Answer - A Third Chance to Write a Correlated Subquery ................................................................................... 349
Quiz - Last Chance To Write a Correlated Subquery ............................................................................................ 350
Answer – Last Chance to Write a Correlated Subquery ........................................................................................ 351
Quiz – Write the Extreme Correlated Subquery .................................................................................................... 352
Answer To Quiz – Write the Extreme Correlated Subquery ................................................................................. 353
NOT IN Subquery Returns Nothing when Nulls are Present ................................................................................. 354
Fixing a NOT IN Subquery with Null Values ....................................................................................................... 355
Quiz - Write the NOT Subquery ............................................................................................ 356
Answer to Quiz - Write the NOT Subquery .......................................................................................... 357
Quiz - Write the Subquery using a WHERE Clause.............................................................................................. 358
Answer - Write the Subquery using a WHERE Clause ......................................................................................... 359
Quiz - Write the Subquery with Two Parameters .................................................................................. 360
Answer to Quiz - Write the Subquery with Two Parameters ................................................................................ 361
How the Double Parameter Subquery Works ........................................................................................................ 362
More on how the Double Parameter Subquery Works .......................................................................................... 363
Another Example of a Double Parameter Subquery .............................................................................................. 364
Quiz – Write the Triple Subquery .......................................................................................................................... 365
Answer to Quiz – Write the Triple Subquery ........................................................................................................ 366
Using a Correlated Exists ....................................................................................................................................... 367
How a Correlated Exists Matches Up .................................................................................................................... 368
The Correlated NOT Exists .................................................................................................................................... 369
Chapter 10 – Strings.................................................................................................................................................. 371
UPPER and lower Functions................................................................................................................................. 372
The Length Command Counts Characters ............................................................................................................. 373
LENGTH and TRIM Work on Fixed Length Columns ......................................................................................... 374
The Char_Length Command Counts Characters ................................................................................................... 375
CHAR_LENGTH and OCTET_LENGTH ............................................................................................................ 376
The TRIM Command trims both Leading and Trailing Spaces ............................................................................ 377
The RTRIM and LTRIM Command Trims Spaces ............................................................................................... 378
TRIM can also TRIM Characters........................................................................................................................... 379
Concatenation ......................................................................................................................................................... 380
Concat and Concat_WS for Concatenation ........................................................................................................... 381
The SUBSTR and SUBSTRING Commands ........................................................................................................ 382

How SUBSTR Works with NO ENDING POSITION
Using SUBSTR and CHAR_LENGTH Together
The POSITION Command Finds a Letter's Position
The POSITION Command Is Brilliant with SUBSTR
CHARINDEX Finds a Letter's Position in a String
The CHARINDEX Command Is Brilliant with SUBSTRING
The CHARINDEX Command Using a Literal
LPAD and RPAD
The REPLACE Function
The ASCII Function
The REVERSE String Function
The RIGHT Function
The LEFT and RIGHT Functions
REGEXP Example for a Whitespace Character
REGEXP Example for a Non-Whitespace Character
REGEXP Example for [xyz]
REGEXP Example for the Start of a String
REGEXP Example for the End of a String
REGEXP Example Matching Within a Range
REGEXP_REPLACE
REGEXP_REPLACE Example
Another REGEXP_REPLACE Example
REGEXP_LIKE
RLIKE
SOUNDEX Function to Find a Sound
Chapter 11 – Interrogating the Data
Quiz – Fill in the Answers for the NULLIF Command
Answer – Fill in the Answers for the NULLIF Command
COALESCE in a Real-World Example
The COALESCE Command
COALESCE Is Equivalent to this CASE Statement
Some Great CAST (Convert And Store) Examples
A Rounding Example Using CAST
CAST Will Round Values Up or Down
Valued CASE vs. Searched CASE
Combining Searched CASE and Valued CASE
DECODE
A Trick for Getting a Horizontal CASE
Put a Valued CASE in the ORDER BY
Put a Searched CASE in the ORDER BY
Put a DECODE in the ORDER BY
Extreme CASE Challenge
Answer - Extreme CASE Challenge
Answer - CASE Challenge
Chapter 12 – Views
The Fundamentals of Views
Creating a Simple View to Restrict Sensitive Columns
Creating a Simple View to Restrict Rows
Creating a View to Join Tables Together
Basic Rules for Views
How to Modify a View
The Exception to the ORDER BY Rule inside a View
Derived Columns in a View Should Contain a Column Alias
The Standard Way Most Aliasing Is Done
Another Way to Alias Columns in a View CREATE
What Happens When a View Column Gets Aliased Twice?
Chapter 13 – Set Operators
Rules of Set Operators
Quiz - Intersect Explained Logically
Answer - Intersect Explained Logically
Quiz - Union Explained Logically
Answer - Union Explained Logically
Quiz - Union ALL Explained Logically
Answer - Union ALL Explained Logically
Quiz - Except Explained Logically
Answer - Except Explained Logically
Quiz - Testing Your Knowledge
Answer - Testing Your Knowledge
An Equal Number of Columns in Both SELECT Lists
The Top Query Handles All Aliases
The Bottom Query Does the ORDER BY
Intersect Challenge
Answer - Intersect Challenge
UNION vs. UNION ALL
Using UNION ALL and Literals
Using UNION ALL for Speed in Merging Data Sets
Great Trick: Place Your Set Operator in a Derived Table
A Great Example of How EXCEPT Works
Using Multiple Set Operators in a Single Request
Changing the Order of Precedence with Parentheses
Chapter 14 – Creating Tables
Create Table Syntax
Data Types
Create Table Examples
Best Practices for Partitioned Tables
Describe Detail Tablename
NOT NULL Constraint
Create a Table IF NOT EXISTS
Create Table AS (CTAS) Populates the Table with Data
Create Table AS (CTAS) Can Choose Certain Columns
Chapter 15 – Data Manipulation Language (DML)
INSERT Syntax #1
INSERT Syntax #2
INSERT Example with Multiple Rows
Above we have inserted multiple rows and placed null values in some of them.
INSERT/SELECT Command
INSERT/SELECT to Build a Data Mart
UPDATE Examples
Deleting Rows in a Table
Chapter 16 – Statistical Aggregate Functions
The Stats Table
The KURTOSIS Function
A KURTOSIS Example
The STDDEV_POP Function
A STDDEV_POP Example
The STDDEV_SAMP Function
A STDDEV_SAMP Example
The VAR_POP Function
A VAR_POP Example
The VAR_SAMP Function
A VAR_SAMP Example
The CORR Function
A CORR Example
Another CORR Example so you can Compare
The VARIANCE Function
A VARIANCE Example
The COVAR_POP Function
A COVAR_POP Example
Another COVAR_POP Example so you can Compare
The COVAR_SAMP Function
A COVAR_SAMP Example
Another COVAR_SAMP Example so you can Compare
The REGR_INTERCEPT Function
A REGR_INTERCEPT Example
Another REGR_INTERCEPT Example so you can Compare
The REGR_SLOPE Function
A REGR_SLOPE Example
NOT IN Subquery Returns Nothing when NULLs are Present
The REGR_AVGX Function
A REGR_AVGX Example
Another REGR_AVGX Example so you can Compare
The REGR_AVGY Function
A REGR_AVGY Example
Quiz – Write the Subquery with Two Parameters
The REGR_COUNT Function
A REGR_COUNT Example
The REGR_R2 Function
A REGR_R2 Example
The REGR_SXX Function
Answer to Quiz – Write the Triple Subquery
The REGR_SXY Function
A REGR_SXY Example
The REGR_SYY Function
A REGR_SYY Example
Using GROUP BY
APPROX_COUNT_DISTINCT
Chapter 17 – Mathematical Functions
Numeric Manipulation Functions
ABS
ACOS
ACOSH
ASIN
ASINH
ATAN
ATAN2
ATANH
CBRT
CEIL
COS
COSH
COT
DEGREES
DIV
EXP
FACTORIAL
FLOOR
LN
LOG
MOD
PI
POW or POWER
RADIANS
ROUND
SIGN
SIN
SINH
SQRT
TAN
TANH


Chapter 1 – Introduction to SQL

"A bird does not sing because it has the answers, it sings because it has a song."
- Anonymous


Introduction

student_table

student_id  last_name  first_name  class_code  grade_pt
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

We are using the student_table above in many of our early SQL examples.

The table above shows the student_table, which we will use to present some basic SQL examples and to get hands-on experience querying a table. Throughout this book, we show you the table, the query, and the result set.
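To try the chapter's queries outside Databricks, here is a minimal sketch that loads the ten rows above into an in-memory SQLite database using Python's standard sqlite3 module. SQLite is only a stand-in for Databricks SQL (an assumption for illustration), the column types are simplified, and each ? in the table is read as a missing (NULL) value, which Python represents as None. The names ROWS and build_student_table are hypothetical.

```python
import sqlite3

# The ten rows of student_table shown above; each "?" is loaded as None (SQL NULL).
ROWS = [
    (423400, "Larkins", "Michael", "FR", 0.00),
    (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88),
    (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35),
    (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00),
    (123250, "Phillips", "Martin", "SR", 3.00),
]


def build_student_table() -> sqlite3.Connection:
    """Hypothetical helper: create and load student_table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Simplified SQLite types; the original table's exact Databricks types are not shown here.
    conn.execute(
        "CREATE TABLE student_table ("
        "student_id INTEGER, last_name TEXT, first_name TEXT, "
        "class_code TEXT, grade_pt REAL)"
    )
    # These ? marks are DB-API parameter placeholders, not NULLs.
    conn.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


conn = build_student_table()
row_count = conn.execute("SELECT COUNT(*) FROM student_table").fetchone()[0]
null_students = conn.execute(
    "SELECT first_name, last_name FROM student_table WHERE grade_pt IS NULL"
).fetchall()

print(row_count)      # 10
print(null_students)  # [('Stanley', 'Johnson')]
```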


SELECT * (All Columns) in a Table

SELECT *
FROM student_table ;

An asterisk (*) means you want to see ALL columns in the table on your report.

student_id  last_name  first_name  class_code  grade_pt
423400      Larkins    Michael     FR          0.00
125634      Hanson     Henry       FR          2.88
280023      McRoberts  Richard     JR          1.90
260000      Johnson    Stanley     ?           ?
231222      Wilson     Susie       SO          3.80
234121      Thomas     Wendy       FR          4.00
324652      Delaney    Danny       SR          3.35
123250      Phillips   Martin      SR          3.00
322133      Bond       Jimmy       JR          3.95
333450      Smith      Andy        SO          2.00

Almost every SQL statement you write will contain a SELECT clause and a FROM clause. You SELECT the columns you want to see on your report, and an asterisk (*) means you want to see all columns in the table in the returned answer set.
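The sketch below runs SELECT * against the same rows in SQLite (again a stand-in for Databricks SQL, as an assumption) and reads the column names from the cursor's description attribute, showing that all five columns come back.

```python
import sqlite3

# Rows from student_table; None stands in for the "?" (NULL) entries.
ROWS = [
    (423400, "Larkins", "Michael", "FR", 0.00),
    (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88),
    (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35),
    (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00),
    (123250, "Phillips", "Martin", "SR", 3.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_table ("
    "student_id INTEGER, last_name TEXT, first_name TEXT, "
    "class_code TEXT, grade_pt REAL)"
)
conn.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", ROWS)

# SELECT * returns every column, in the order the table defines them.
cur = conn.execute("SELECT * FROM student_table")
columns = [desc[0] for desc in cur.description]  # column names of the answer set
rows = cur.fetchall()

print(columns)    # ['student_id', 'last_name', 'first_name', 'class_code', 'grade_pt']
print(len(rows))  # 10
```

Only the column list and row count are printed, because without an ORDER BY the order of the rows is not something to rely on.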


SELECT Specific Columns in a Table

SELECT first_name
      ,last_name
      ,class_code
      ,grade_pt
FROM student_table ;

Commas separate column names.

first_name last_name class_code grade_pt


Michael Larkins FR 0.00
Henry Hanson FR 2.88
Richard McRoberts JR 1.90
Stanley Johnson ? ?
Susie Wilson SO 3.80
Wendy Thomas FR 4.00
Danny Delaney SR 3.35
Martin Phillips SR 3.00
Jimmy Bond JR 3.95
Andy Smith SO 2.00

Commas must separate column names. Notice that only the requested columns come back on the report, not all columns. Also notice that the columns appear on the report in the same order in which they are listed in the SQL.


Commas in the Front or Back?

Left example (1), commas in front:
SELECT first_name
,last_name
,class_code
,grade_pt
FROM student_table ;

Right example (2), commas in back:
SELECT first_name,
last_name,
class_code,
grade_pt
FROM student_table ;

first_name last_name class_code grade_pt


Michael Larkins FR 0.00
Henry Hanson FR 2.88
Richard McRoberts JR 1.90
Stanley Johnson ? ?
Susie Wilson SO 3.80
Wendy Thomas FR 4.00
Danny Delaney SR 3.35
Martin Phillips SR 3.00
Jimmy Bond JR 3.95
Andy Smith SO 2.00

Both examples work and return the same answer set, so why is the example on the left (commas in front) better? Errors are easier to spot, and commenting out a column line won't cause a syntax error.
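Here is a small sketch of that commenting benefit. A WITH clause holding two rows of student_table stands in for the real table so the query runs on its own; that inline copy is only an illustration. With commas in front, you can comment out the last column and the query still runs. With commas in back, the same comment would leave "class_code," dangling in front of FROM.

```sql
-- Two rows of student_table, defined inline so the query is self-contained.
WITH student_table AS (
  SELECT * FROM VALUES
    (423400, 'Larkins', 'Michael', 'FR', 0.00),
    (231222, 'Wilson',  'Susie',   'SO', 3.80)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT first_name
,last_name
,class_code
-- ,grade_pt     (commented out: no comma is left dangling before FROM)
FROM student_table ;
```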


Place your Commas in front for better Debugging Capabilities

Left example (Error!):
SELECT first_name,
last_name,
class_code,
grade_pt,
FROM student_table ;

Right example (Successful):
SELECT first_name
,last_name
,class_code
,grade_pt
FROM student_table ;

Sometimes, when you add or remove a column, you can overlook an ending comma!

Having commas in front to separate column names makes it easier to debug.


Sort the Data with the ORDER BY Keyword

SELECT *
FROM student_table
ORDER BY last_name ;

Sorts the answer set in ascending order by default.

student_id last_name first_name class_code grade_pt


322133 Bond Jimmy JR 3.95
324652 Delaney Danny SR 3.35
125634 Hanson Henry FR 2.88
260000 Johnson Stanley ? ?
423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
123250 Phillips Martin SR 3.00
333450 Smith Andy SO 2.00
234121 Thomas Wendy FR 4.00
231222 Wilson Susie SO 3.80

Without an ORDER BY, rows come back to the report in no guaranteed order. To order the result set, you must use an ORDER BY clause. When you order by a column, it sorts in ASCENDING order by default. The first column listed in an ORDER BY clause is called the Major Sort! You will see upcoming examples with multiple columns.


Use a Column name or Number in an ORDER BY Statement

SELECT *
FROM student_table
ORDER BY 2 ;

Sorts the answer set by the 2nd column in the answer set, which is last_name.

student_id last_name first_name class_code grade_pt


322133 Bond Jimmy JR 3.95
324652 Delaney Danny SR 3.35
125634 Hanson Henry FR 2.88
260000 Johnson Stanley ? ?
423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
123250 Phillips Martin SR 3.00
333450 Smith Andy SO 2.00
234121 Thomas Wendy FR 4.00
231222 Wilson Susie SO 3.80

The ORDER BY can use a number to represent the sort column. The number 2 represents the second column in the returning answer set. The example above also sorts in ascending order by default.


Two Examples of ORDER BY using Different Techniques

SELECT *                  SELECT *
FROM student_table        FROM student_table
ORDER BY 5 ;              ORDER BY grade_pt ;

(Same query)

student_id last_name first_name class_code grade_pt


260000 Johnson Stanley ? ?
423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
333450 Smith Andy SO 2.00
125634 Hanson Henry FR 2.88
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
231222 Wilson Susie SO 3.80
322133 Bond Jimmy JR 3.95
234121 Thomas Wendy FR 4.00

You can use a number instead of the column name. The number refers to the column's position in the SELECT list, not in the table. If you use an * in your SELECT, the SELECT list is every column in table order, so the number matches the column's position in the table. The two queries above are equivalent.
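Here is a short sketch of the SELECT-list rule. A WITH clause holding three rows of student_table stands in for the real table. Because grade_pt is the second column selected, ORDER BY 2 sorts by grade_pt. With SELECT *, the same ORDER BY 2 would sort by last_name.

```sql
-- Three rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (423400, 'Larkins', 'Michael', 'FR', 0.00),
    (322133, 'Bond',    'Jimmy',   'JR', 3.95),
    (125634, 'Hanson',  'Henry',   'FR', 2.88)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT last_name
,grade_pt            -- 2nd column in the SELECT list
FROM student_table
ORDER BY 2 ;         -- sorts by grade_pt: Larkins 0.00, Hanson 2.88, Bond 3.95
```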


Changing the ORDER BY to Descending Order

SELECT *
FROM student_table
ORDER BY last_name DESC;

Sorts the answer set in DESC order by last_name.

student_id last_name first_name class_code grade_pt


231222 Wilson Susie SO 3.80
234121 Thomas Wendy FR 4.00
333450 Smith Andy SO 2.00
123250 Phillips Martin SR 3.00
280023 McRoberts Richard JR 1.90
423400 Larkins Michael FR 0.00
260000 Johnson Stanley ? ?
125634 Hanson Henry FR 2.88
324652 Delaney Danny SR 3.35
322133 Bond Jimmy JR 3.95

Notice that the answer set sorts in descending order based on the column last_name. Also notice that last_name is the second column coming back on the report, so we could have written ORDER BY 2 DESC instead. If you spell out the word DESCENDING, the query fails, so remember to use the abbreviation DESC.
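Here is the ORDER BY 2 DESC version as a sketch. A WITH clause holding three rows of student_table stands in for the real table. Because last_name is the second column of SELECT *, this returns the same order as ORDER BY last_name DESC.

```sql
-- Three rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (322133, 'Bond',   'Jimmy', 'JR', 3.95),
    (231222, 'Wilson', 'Susie', 'SO', 3.80),
    (234121, 'Thomas', 'Wendy', 'FR', 4.00)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT *
FROM student_table
ORDER BY 2 DESC ;    -- same as ORDER BY last_name DESC: Wilson, Thomas, Bond
```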


Null Values Sort First in Ascending Mode (Default)

SELECT * SELECT *
FROM student_table FROM student_table
ORDER BY 5 ; ORDER BY grade_pt ;

student_id last_name first_name class_code grade_pt


260000 Johnson Stanley ? ?      <- Nulls sort first in ASC mode
423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
333450 Smith Andy SO 2.00
125634 Hanson Henry FR 2.88
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
231222 Wilson Susie SO 3.80
322133 Bond Jimmy JR 3.95
234121 Thomas Wendy FR 4.00

ORDER BY sorts in ascending mode (ASC) by default. Notice that ascending mode places the null values at the beginning of the answer set.


Order By with Nulls Last

SELECT * SELECT *
FROM student_table FROM student_table
ORDER BY 5 NULLS LAST ORDER BY grade_pt NULLS LAST

student_id last_name first_name class_code grade_pt


423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
333450 Smith Andy SO 2.00
125634 Hanson Henry FR 2.88
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
231222 Wilson Susie SO 3.80
322133 Bond Jimmy JR 3.95
234121 Thomas Wendy FR 4.00
260000 Johnson Stanley ? ?      <- NULLS LAST sorts nulls last

Null values by default sort first in ASC order, but you can use NULLS LAST to place the null values at the end.


Order By with Nulls First

SELECT * SELECT *
FROM student_table FROM student_table
ORDER BY 5 DESC NULLS FIRST ORDER BY grade_pt DESC NULLS FIRST

student_id last_name first_name class_code grade_pt


260000 Johnson Stanley ? ?      <- NULLS FIRST sorts nulls first
234121 Thomas Wendy FR 4.00
322133 Bond Jimmy JR 3.95
231222 Wilson Susie SO 3.80
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
280023 McRoberts Richard JR 1.90
423400 Larkins Michael FR 0.00

Null values by default sort last in DESC mode, but you can use NULLS FIRST to place the null values at the
beginning.
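This sketch shows the DESC default mentioned above. A WITH clause holding three rows of student_table stands in for the real table. With DESC and no NULLS keyword, Johnson's null grade_pt sorts last. Adding NULLS LAST would not change the order.

```sql
-- Three rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (260000, 'Johnson', 'Stanley', NULL, NULL),
    (234121, 'Thomas',  'Wendy',   'FR', 4.00),
    (423400, 'Larkins', 'Michael', 'FR', 0.00)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT last_name, grade_pt
FROM student_table
ORDER BY grade_pt DESC ;   -- Thomas 4.00, Larkins 0.00, then Johnson (null sorts last)
```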


Major Sort vs. Minor Sort

SELECT * FROM student_table
ORDER BY class_code DESC,
grade_pt ASC;

Major sort on class_code descending; minor sort on grade_pt ascending.

student_id last_name first_name class_code grade_pt
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
234121 Thomas Wendy FR 4.00
260000 Johnson Stanley ? ?

The major sort orders first; the minor sort orders further on ties.

The first column (or number) in an ORDER BY clause is called the major sort, and it determines the primary order of the answer set. When you add a second column (or number), rows that tie on the major sort are sorted further by this minor sort. Notice above that the first two rows tie because both have a class_code of 'SR', so they are ordered by grade_pt.


Multiple Sort Keys using Names vs. Numbers

SELECT * SELECT *
FROM employee_table FROM employee_table
ORDER BY dept_no DESC ORDER BY 2 DESC,
,salary ASC 5,
,last_name ASC; 3 ASC ;

These queries sort identically


employee_no dept_no last_name first_name salary
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00
1121334 400 Strickling Cletus 54500.00
2312225 300 Larkins Loraine 40200.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
1232578 100 Chambers Mandee 48850.00
1000234 10 Smythe Richard 64300.00
2000000 ? Jones Squiggy 32800.50

Queries can have multiple columns in the ORDER BY statement. You can even mix and match the column names
with numbers. Both queries in the example above are equivalent.


An Order By That Uses an Expression

SELECT RTRIM(last_name) || ', ' || first_name AS fullname
FROM employee_table
ORDER BY RTRIM(last_name) || ', ' || first_name;

SELECT RTRIM(last_name) || ', ' || first_name AS fullname
FROM employee_table
ORDER BY fullname;

The || symbols perform concatenation.

fullname
Chambers, Mandee
Coffing, Billy
Harrison, Herbert
Jones, Squiggy
Larkins, Loraine
Reilly, William
Smith, John
Smythe, Richard
Strickling, Cletus

The two examples above are equivalent. The first example repeats the full concatenation expression in the ORDER BY, while the second example uses the alias fullname.


Sorts are Alphabetical, NOT Logical

SELECT * FROM student_table
ORDER BY class_code ;
student_id last_name first_name class_code grade_pt
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
322133 Bond Jimmy JR 3.95
280023 McRoberts Richard JR 1.90
231222 Wilson Susie SO 3.80
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
This sorts alphabetically. Can you change the sort
so the freshmen come first, followed by the
sophomores, juniors, seniors, and then the null?

The first year of high school is generally called the freshman year. Change the query to ORDER BY class_code logically, so the order is freshman, sophomore, junior, senior, and then null.


Using A Valued CASE Statement to Sort Logically

SELECT * FROM student_table
ORDER BY CASE class_code
WHEN 'FR' THEN 1
WHEN 'SO' THEN 2
WHEN 'JR' THEN 3
WHEN 'SR' THEN 4
ELSE 5
END;

A column value of class_code follows CASE (Valued CASE). The CASE is in the ORDER BY clause.

student_id last_name first_name class_code grade_pt


234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

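We are using a valued CASE expression to ORDER BY class_code logically (FR, SO, JR, SR, null). The null class_code matches none of the WHEN values, so it falls through to ELSE 5 and sorts last.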


Using A Searched CASE Statement to Sort Logically

SELECT * FROM student_table
ORDER BY CASE
WHEN class_code = 'FR' THEN 1
WHEN class_code = 'SO' THEN 2
WHEN class_code = 'JR' THEN 3
WHEN class_code = 'SR' THEN 4
WHEN class_code IS NULL THEN 5
ELSE 6 END;

No column value follows CASE (Searched CASE). The searched CASE is in the ORDER BY clause.

student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

We are using a searched CASE expression to ORDER BY class_code logically (FR, SO, JR, SR, null).
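To see the sort key that the CASE produces, you can also put the searched CASE in the SELECT list and order by its alias. In this sketch, class_rank is a hypothetical alias, and a WITH clause holding four rows of student_table stands in for the real table. The null class_code matches no WHEN, so it falls through to ELSE.

```sql
-- Four rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (324652, 'Delaney', 'Danny',   'SR', 3.35),
    (260000, 'Johnson', 'Stanley', NULL, NULL),
    (423400, 'Larkins', 'Michael', 'FR', 0.00),
    (231222, 'Wilson',  'Susie',   'SO', 3.80)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT last_name
,class_code
,CASE WHEN class_code = 'FR' THEN 1
      WHEN class_code = 'SO' THEN 2
      WHEN class_code = 'JR' THEN 3
      WHEN class_code = 'SR' THEN 4
      ELSE 5                    -- a null class_code matches no WHEN, so it lands here
 END AS class_rank              -- hypothetical alias for the computed sort key
FROM student_table
ORDER BY class_rank ;           -- Larkins 1, Wilson 2, Delaney 4, Johnson 5
```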

Quiz – Can you Add a Minor Sort?

SELECT * FROM student_table
ORDER BY CASE WHEN class_code = 'FR' THEN 1
WHEN class_code = 'SO' THEN 2
WHEN class_code = 'JR' THEN 3
WHEN class_code = 'SR' THEN 4
ELSE 5
END;
student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

Can you Add a Minor Sort of grade_pt ASC to the example above?


Answer – Can you Add a Minor Sort?

SELECT * FROM student_table
ORDER BY CASE WHEN class_code = 'FR' THEN 1
WHEN class_code = 'SO' THEN 2
WHEN class_code = 'JR' THEN 3
WHEN class_code = 'SR' THEN 4
ELSE 5
END, grade_pt ASC;

Put in a comma and add the minor sort.
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
234121 Thomas Wendy FR 4.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

All you have to do is place a comma after the END keyword and then add the column. The ASC is not needed as it
is the default.

Order By Decode

SELECT * FROM student_table
ORDER BY decode(class_code, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5)

class_code is the column to CASE: WHEN 'FR' then 1, WHEN 'SO' then 2, WHEN 'JR' then 3, WHEN 'SR' then 4, else 5.

student_id last_name first_name class_code grade_pt


423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
234121 Thomas Wendy FR 4.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

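The DECODE function works like the valued CASE seen previously: the first argument is the column to test, each search value is followed by its result, and the final unpaired argument (5 here) is the default.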
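To compare the two side by side, this sketch computes the DECODE from the example and the equivalent valued CASE in the SELECT list. The aliases decode_rank and case_rank are hypothetical, and a WITH clause holding four rows of student_table stands in for the real table. Both columns should agree on every row, including the null.

```sql
-- Four rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (324652, 'Delaney', 'Danny',   'SR', 3.35),
    (260000, 'Johnson', 'Stanley', NULL, NULL),
    (333450, 'Smith',   'Andy',    'SO', 2.00),
    (423400, 'Larkins', 'Michael', 'FR', 0.00)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT last_name
,class_code
-- decode(column, search1, result1, ..., default)
,decode(class_code, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5) AS decode_rank
-- the equivalent valued CASE
,CASE class_code WHEN 'FR' THEN 1
                 WHEN 'SO' THEN 2
                 WHEN 'JR' THEN 3
                 WHEN 'SR' THEN 4
                 ELSE 5
 END AS case_rank
FROM student_table
ORDER BY decode_rank ;   -- Larkins, Smith, Delaney, Johnson
```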



Quiz – Can you Add Two Minor Sorts Using Decode?

SELECT * FROM student_table
ORDER BY decode(class_code, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5)

class_code is the column to CASE: WHEN 'FR' then 1, WHEN 'SO' then 2, WHEN 'JR' then 3, WHEN 'SR' then 4, else 5.

student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
234121 Thomas Wendy FR 4.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

Your quiz assignment is to add two minor sorts to the DECODE statement above. Make the first minor sort grade_pt DESC and the second minor sort first_name ASC.


Answer – Can you Add Two Minor Sorts Using Decode?

SELECT * FROM student_table
ORDER BY decode(class_code, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5)
,grade_pt DESC
,first_name ;

The first minor sort is grade_pt DESC, and the second minor sort is first_name ASC.
student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
333450 Smith Andy SO 2.00
322133 Bond Jimmy JR 3.95
280023 McRoberts Richard JR 1.90
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
260000 Johnson Stanley ? ?

To answer the quiz, place a comma after the DECODE and add grade_pt DESC, then add another comma and first_name. Because ASC is the default, first_name needs no keyword.


How to ALIAS a Column name

SELECT * FROM STUDENT_TABLE
ORDER BY DECODE(CLASS_CODE, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5)

CLASS_CODE is the column to CASE: WHEN 'FR' then 1, WHEN 'SO' then 2, WHEN 'JR' then 3, WHEN 'SR' then 4, else 5.

STUDENT_ID LAST_NAME FIRST_NAME CLASS_CODE GRADE_PT


423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
234121 Thomas Wendy FR 4.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

When you ALIAS a column, you give it a new name for the report header.
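Here is a minimal alias sketch. The aliases fname, lname, and gpa are hypothetical, and a WITH clause holding two rows of student_table stands in for the real table.

```sql
-- Two rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (423400, 'Larkins', 'Michael', 'FR', 0.00),
    (125634, 'Hanson',  'Henry',   'FR', 2.88)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT first_name AS fname    -- report header becomes fname
,last_name  AS lname          -- report header becomes lname
,grade_pt   AS gpa            -- report header becomes gpa
FROM student_table ;
```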


Using an Alias in the ORDER BY Clause

SELECT first_name AS Fname
,last_name Lname
,class_code "Class Code"
,grade_pt AS "AVG"
,student_id AS STU_ID
FROM student_table
WHERE class_code = 'FR'
ORDER BY "AVG"

You cannot use an alias in the WHERE clause.

Fname Lname Class Code AVG STU_ID


Michael Larkins FR 0.00 423400
Henry Hanson FR 2.88 125634
Wendy Thomas FR 4.00 234121

When you ALIAS a column, you give it a new name for the report header, but you can also use the alias in the
ORDER BY clause. If you use double quotes in the alias, you need to use double quotes in the ORDER BY
clause.


A Missing Comma can by Mistake become an Alias

SELECT first_name, last_name, class_code grade_pt
FROM student_table ;

Missing a comma!

first_name last_name grade_pt
Michael Larkins FR
Susie Wilson SO
Richard McRoberts JR
Jimmy Bond JR
Henry Hanson FR
Andy Smith SO
Danny Delaney SR
Stanley Johnson ?
Wendy Thomas FR
Martin Phillips SR

class_code is aliased as grade_pt.

Commas must separate column names. Notice in this example that a comma is missing between class_code and grade_pt. The query runs, but the system thinks you want class_code to have the alias grade_pt. This is why it is good practice to use the keyword AS whenever you alias a column.
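Here is a corrected version as a sketch. The comma after class_code is restored, and the one intentional alias (gpa, a hypothetical name) is written with AS so it cannot be mistaken for a missing comma. A WITH clause holding two rows of student_table stands in for the real table.

```sql
-- Two rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (423400, 'Larkins', 'Michael', 'FR', 0.00),
    (231222, 'Wilson',  'Susie',   'SO', 3.80)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT first_name
,last_name
,class_code             -- the missing comma is restored, so this stays its own column
,grade_pt AS gpa        -- an intentional alias, made obvious by AS
FROM student_table ;
```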

Comments using Double Dashes are Single Line Comments

-- Double Dashes provide a single line comment
SELECT *
FROM student_table -- This table tracks students
ORDER BY grade_pt ;

student_id last_name first_name class_code grade_pt


260000 Johnson Stanley ? ?
423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
333450 Smith Andy SO 2.00
125634 Hanson Henry FR 2.88
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
231222 Wilson Susie SO 3.80
322133 Bond Jimmy JR 3.95
234121 Thomas Wendy FR 4.00

Double dashes start a single-line comment, and the system ignores everything from the dashes to the end of the line. Notice that we also have double dashes after the FROM clause; the system ignores that comment too.


Comments for Multi-Lines

/* This is how you can make
multi-line comments to express
what is going on in the code. */
SELECT *
FROM student_table
ORDER BY grade_pt ;

student_id last_name first_name class_code grade_pt


260000 Johnson Stanley ? ?
423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
333450 Smith Andy SO 2.00
125634 Hanson Henry FR 2.88
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
231222 Wilson Susie SO 3.80
322133 Bond Jimmy JR 3.95
234121 Thomas Wendy FR 4.00

A slash-asterisk (/*) starts a multi-line comment, and an asterisk-slash (*/) ends it.


Comments for Multi-Lines As Double Dashes Per Line

-- This is how you can make multi-line comments
-- also to express what is going on in the code.
SELECT *
FROM student_table
ORDER BY grade_pt ;

student_id last_name first_name class_code grade_pt


260000 Johnson Stanley ? ?
423400 Larkins Michael FR 0.00
280023 McRoberts Richard JR 1.90
333450 Smith Andy SO 2.00
125634 Hanson Henry FR 2.88
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
231222 Wilson Susie SO 3.80
322133 Bond Jimmy JR 3.95
234121 Thomas Wendy FR 4.00

You can make multi-line comments with double dashes on each line.

Chapter 2 The WHERE Clause

Chapter 2 – The WHERE Clause

“I saw the angel in the marble and carved until I set him free.”

- Michelangelo


The WHERE Clause limits Returning Rows

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT first_name, last_name, class_code, grade_pt
FROM student_table
WHERE first_name = 'Henry' ;

Single or double quotes are used for character data.
first_name last_name class_code grade_pt
Henry Hanson FR 2.88

The WHERE clause filters the rows coming back on the report. Not all rows return, only the rows that qualify. In this example, I am asking the report to bring back only the rows WHERE the first name is Henry.


Numbers Don't Need Single Quotes

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * SELECT *
FROM student_table FROM student_table
WHERE first_name = "Henry" ; WHERE grade_pt = 0.00 ;
Character data needs Numbers never need
single or double-quotes single or double-quotes

Character data (letters) needs single or double quotes, but integers and other columns with a numeric data type need no quotes.

Not Equal

1) SELECT first_name
,last_name
,class_code
,grade_pt
FROM student_table
WHERE class_code != 'FR' ;

2) SELECT first_name
,last_name
,class_code
,grade_pt
FROM student_table
WHERE NOT class_code = 'FR' ;

3) SELECT * FROM student_table
WHERE class_code <> 'FR' ;

All three mean "not equal."

first_name last_name class_code grade_pt


Richard McRoberts JR 1.90
Danny Delaney SR 3.35
Martin Phillips SR 3.00
Susie Wilson SO 3.80
Jimmy Bond JR 3.95
Andy Smith SO 2.00

The opposite of equal is NOT equal, and here are three ways you can write the NOT equal syntax.
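Notice that Stanley Johnson, whose class_code is null, is missing from the result. A comparison with a null is unknown rather than true, so none of the three not-equal forms returns that row. This sketch adds IS NULL to keep it. A WITH clause holding three rows of student_table stands in for the real table.

```sql
-- Three rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (423400, 'Larkins', 'Michael', 'FR', 0.00),
    (231222, 'Wilson',  'Susie',   'SO', 3.80),
    (260000, 'Johnson', 'Stanley', NULL, NULL)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT *
FROM student_table
WHERE class_code <> 'FR'
OR class_code IS NULL ;   -- Wilson and Johnson return; without the OR only Wilson returns
```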

Searching for null Values Using Equality Returns Nothing

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT *
FROM student_table
WHERE class_code = null ;

This query returns no data because of the equal (=) sign. Null is equal to nothing.

student_id last_name first_name class_code grade_pt

The first thing you need to know about null is that it represents unknown data. Null is not zero or spaces; it is missing data. Because the value is unknown, comparing it with an equal sign never evaluates to true, so no rows return.
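You can count the difference between = null and IS NULL directly. In this sketch, a WITH clause holding two rows of student_table stands in for the real table, and count_if counts only the rows where its condition is true.

```sql
-- Two rows of student_table, defined inline so the query runs on its own.
WITH student_table AS (
  SELECT * FROM VALUES
    (423400, 'Larkins', 'Michael', 'FR', 0.00),
    (260000, 'Johnson', 'Stanley', NULL, NULL)
  AS t(student_id, last_name, first_name, class_code, grade_pt)
)
SELECT count_if(class_code = NULL)  AS equals_null_count   -- 0: the comparison is never true
,count_if(class_code IS NULL)       AS is_null_count       -- 1: Johnson's row
FROM student_table ;
```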


Is NULL

SELECT *
FROM student_table
WHERE class_code IS null ;

student_id last_name first_name class_code grade_pt


260000 Johnson Stanley ? ?

Below are the keywords that can be used to interrogate a null value:
1. IS null
2. IS not null

If you are looking for a row that holds a null value, you need to use ‘IS null.’ Using IS null will only bring back the
rows with a null value in the column.


IS Not Null

SELECT *
FROM student_table
WHERE class_code IS not null ;

student_id last_name first_name class_code grade_pt


423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
280023 McRoberts Richard JR 1.90
231222 Wilson Susie SO 3.80
234121 Thomas Wendy FR 4.00
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
322133 Bond Jimmy JR 3.95
333450 Smith Andy SO 2.00

If you are looking for a row that does not hold a null value, you need to use ‘IS not null.’ Using IS not null will
only bring back the rows where the column value does not contain a null.


Using Greater Than Or Equal To (>=)

SELECT *
FROM student_table
WHERE grade_pt >= 3.0 ;

Greater than
or Equal to

student_id last_name first_name class_code grade_pt


231222 Wilson Susie SO 3.80
234121 Thomas Wendy FR 4.00
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
322133 Bond Jimmy JR 3.95

All rows returned have a grade_pt >= 3.0

The WHERE clause isn't limited to equality. You can also use greater than (>), less than (<), greater than or equal to (>=), and less than or equal to (<=).


AND in the WHERE Clause

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * FROM student_table
WHERE class_code = "FR"
AND first_name = 'Henry' ;

student_id last_name first_name class_code grade_pt


125634 Hanson Henry FR 2.88

Notice the WHERE statement and the word AND. In this example, qualifying rows must have a class_code equal
to ‘FR’ and must also have a first_name of Henry.


Troubleshooting AND

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT *
FROM student_table
WHERE grade_pt = 3.0 AND grade_pt = 4.0;

No rows qualify. How can a student have two grade points?

student_id last_name first_name class_code grade_pt

What is going wrong here? You are using AND to check the same column twice. This syntax asks for the rows that have BOTH a grade_pt of 3.0 and a grade_pt of 4.0. That is impossible, so no rows return.

OR in the WHERE Clause

SELECT *
FROM student_table
WHERE grade_pt = 3.0
OR grade_pt = 4.0;

student_id last_name first_name class_code grade_pt


234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

The above query brings back rows if the grade_pt is equal to 3.0 or 4.0.


Troubleshooting OR

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT *
FROM student_table
WHERE grade_pt = 3.0 OR 4.0;

This is an error.

SELECT *
FROM student_table
WHERE grade_pt = 3.0
OR grade_pt = 4.0;

Perfect: OR must always use the column name again.

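The first example is invalid. OR does not reuse the column name for you, so 4.0 is treated as a condition on its own; depending on the database and its settings, this either raises an error or returns every row in the table. The second example is the way to do it.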


WHY OR Must Utilize the Column Name Each Time

SELECT *
FROM student_table
WHERE grade_pt = 3.0
OR class_code = 'JR' ;

student_id last_name first_name class_code grade_pt


123250 Phillips Martin SR 3.00
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95

The column name must be used on both sides of the OR because each condition can test the same column or a different column. The system doesn't know what you want to do unless you tell it.


Troubleshooting Character Data

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT *
FROM student_table
WHERE grade_pt = 3.0
AND class_code = SR ;

Error!!! Why?

This query errors, but what is WRONG with this syntax? There are no single or double quotes around SR, so the system treats SR as a column name, and no such column exists.


Troubleshooting Character Data Continued

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * FROM student_table
WHERE grade_pt = 3.0
AND class_code = 'SR' ;
student_id last_name first_name class_code grade_pt
123250 Phillips Martin SR 3.00

Notice that AND combines conditions on two different columns, and a row comes back only if both conditions are TRUE.


Quiz – How many rows will return?

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * FROM student_table
WHERE grade_pt = 4.0 OR grade_pt = 3.0
AND class_code = 'SR' ;
Which seniors have a 3.0 or a 4.0 grade_pt average? How many rows will return?
A) 2 C) Error
B) 1 D) 3

How many rows will return from the query above?


Answer to Quiz – How many rows will return?

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * FROM student_table
WHERE grade_pt = 4.0 OR grade_pt = 3.0
AND class_code = 'SR' ;
student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

Two rows return because of the "order of precedence" in SQL, which the next page explains.


What is the Order of Precedence?

1 ()
2 NOT
3 AND
4 OR
SELECT * FROM student_table
WHERE grade_pt = 4.0 OR grade_pt = 3.0
AND class_code = 'SR' ;

SQL evaluates conditions in a fixed ORDER OF PRECEDENCE: anything inside parentheses first, then NOT, then AND,
and finally OR. Because AND is evaluated before OR, the last query was read as grade_pt = 4.0 OR (grade_pt = 3.0
AND class_code = 'SR'). That is why Wendy Thomas, a freshman with a 4.00, came back along with Martin Phillips.
Let’s fix it and bring back only one row.
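To watch the order of precedence at work, here is a minimal sketch using Python's built-in sqlite3 module, with an in-memory SQLite database standing in for Databricks. That substitution is an assumption for illustration only: SQLite is a different engine, but it follows the same standard rule that AND is evaluated before OR. STUDENTS and last_names are hypothetical names introduced for the sketch.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

# In-memory SQLite is a stand-in engine for this sketch, not Databricks itself.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


# As written: AND binds tighter than OR.
as_written = last_names("grade_pt = 4.0 OR grade_pt = 3.0 AND class_code = 'SR'")
# The grouping SQL actually applies, made explicit with parentheses.
implied = last_names("grade_pt = 4.0 OR (grade_pt = 3.0 AND class_code = 'SR')")
# Parentheses around the OR force it to be evaluated first.
fixed = last_names("(grade_pt = 3.0 OR grade_pt = 4.0) AND class_code = 'SR'")

print(as_written)  # ['Phillips', 'Thomas']
print(fixed)       # ['Phillips']
```

The unparenthesized query and the explicitly grouped one return the same two rows, which is exactly the surprise in the quiz.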


Using Parentheses to change the Order of Precedence

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * FROM student_table
WHERE (grade_pt = 3.0 OR grade_pt = 4.0)
AND class_code = 'SR' ;
(Parentheses Evaluated First!)

student_id last_name first_name class_code grade_pt
123250 Phillips Martin SR 3.00

Using parentheses is the proper coding technique. Only ONE row comes back because the condition inside the
parentheses is evaluated first.


Using an IN List in Place of OR

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT * FROM student_table
WHERE grade_pt IN (3.0, 4.0)
AND class_code = 'SR' ;
(The IN List)
student_id last_name first_name class_code grade_pt
123250 Phillips Martin SR 3.00

An IN list also works for finding rows with a grade_pt of 3.0 or 4.0 AND a class_code of 'SR'. Only ONE row
comes back here as well.


The IN List is an Excellent Technique

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * FROM student_table
WHERE grade_pt IN (2.0, 3.0, 4.0) ;

student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
333450 Smith Andy SO 2.00
123250 Phillips Martin SR 3.00

Using an IN list is an excellent way to look for multiple values for a column.


IN List vs. OR Brings the Same Results

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT * FROM student_table
WHERE grade_pt IN (2.0, 3.0, 4.0) ;

SELECT *
FROM student_table
WHERE grade_pt = 2.0
OR grade_pt = 3.0
OR grade_pt = 4.0 ;

Both examples produce the same results. An IN list is a better technique.

The IN Statement avoids retyping the same column name separated by an OR. The IN allows you to search a
column for a list of values. Both queries above are equal, but the IN list is an excellent way to keep things
organized and straightforward.
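A minimal sketch of that equivalence, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; the STUDENTS list and last_names helper are hypothetical names for this illustration.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


in_list = last_names("grade_pt IN (2.0, 3.0, 4.0)")
or_chain = last_names("grade_pt = 2.0 OR grade_pt = 3.0 OR grade_pt = 4.0")

print(in_list)              # ['Phillips', 'Smith', 'Thomas']
print(in_list == or_chain)  # True
```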

The IN List Can Use Character Data

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT * FROM student_table
WHERE TRIM(last_name) IN ('Larkins', 'Bond') ;
(TRIM removes leading and trailing spaces)
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
322133 Bond Jimmy JR 3.95

An IN list works with character data as long as you enclose each value in single quotes, as we did with 'Larkins'
and 'Bond'. TRIM removes leading and trailing spaces from last_name before the comparison.

Using a NOT IN List

SELECT *
FROM student_table
WHERE grade_pt NOT IN (2.0, 3.0, 4.0) ;

SELECT *
FROM student_table
WHERE NOT grade_pt IN (2.0, 3.0, 4.0) ;

student_id last_name first_name class_code grade_pt


423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
324652 Delaney Danny SR 3.35

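You can also return the rows whose values are NOT IN your list. Both forms above, grade_pt NOT IN (...) and
NOT grade_pt IN (...), return the same rows. Neither IN nor NOT IN can search for nulls: Stanley Johnson, whose
grade_pt is null, does not come back. Use IS NULL to find nulls.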


Null Values in a NOT IN List Return No Rows

SELECT *
FROM student_table
WHERE grade_pt NOT IN (2.0, 3.0, 4.0, null) ;

Notice the NOT IN and notice the null in the list.

student_id last_name first_name class_code grade_pt

Now notice that no data returns!

Few people know that when a NOT IN list contains a null value, no data returns. A NOT IN is evaluated as a series
of not-equal tests joined by AND, and any comparison with a null is UNKNOWN rather than TRUE, so no row can
ever pass every test. The same is true of NOT IN subqueries: if the subquery returns a null value, an IN still works,
but a NOT IN returns no data. The next page will teach you a trick to get around this problem.
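Here is a minimal sketch of the NOT IN null trap, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in engine; SQLite applies the same three-valued logic to IN and NOT IN. STUDENTS and the count_rows helper are hypothetical names for this sketch.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def count_rows(where):
    """Hypothetical helper: number of student_table rows matching a WHERE clause."""
    return con.execute(f"SELECT COUNT(*) FROM student_table WHERE {where}").fetchone()[0]


# x NOT IN (a, b, c, NULL) means x <> a AND x <> b AND x <> c AND x <> NULL.
# The last test is never TRUE (it is UNKNOWN), so no row can pass.
print(count_rows("grade_pt NOT IN (2.0, 3.0, 4.0)"))        # 6
print(count_rows("grade_pt NOT IN (2.0, 3.0, 4.0, NULL)"))  # 0
# IN is a chain of ORs, so one TRUE match is enough and the NULL does no harm.
print(count_rows("grade_pt IN (2.0, 3.0, 4.0, NULL)"))      # 3
# A NOT IN subquery that returns a null (Johnson's grade_pt) behaves the same way.
print(count_rows("grade_pt NOT IN (SELECT grade_pt FROM student_table "
                 "WHERE class_code IS NULL OR class_code = 'SR')"))  # 0
```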


A Technique for Handling Nulls with a NOT IN List

SELECT *
FROM STUDENT_TABLE
WHERE GRADE_PT NOT IN (2.0, 3.0, 4.0, NULL) ;

Notice the NOT IN and notice the null in the list.

STUDENT_ID LAST_NAME FIRST_NAME CLASS_CODE GRADE_PT

Now notice that no data returns!

Leaving the null out of the list and adding OR grade_pt IS NULL is a great technique for bringing back the rows
with a null value when using a NOT IN list.
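A minimal sketch of the OR technique, assuming it means leaving the null out of the list and adding OR grade_pt IS NULL (the example query above still has the null inside the list). An in-memory SQLite database, via Python's sqlite3, stands in for Databricks, and STUDENTS and last_names are hypothetical names.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


# Keep NULL out of the list, then ask for the null rows separately with OR.
with_nulls = last_names("grade_pt NOT IN (2.0, 3.0, 4.0) OR grade_pt IS NULL")
print(with_nulls)
# ['Bond', 'Delaney', 'Hanson', 'Johnson', 'Larkins', 'McRoberts', 'Wilson']
```

Johnson, whose grade_pt is null, now comes back alongside the six rows that are not 2.0, 3.0, or 4.0.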


Technique 2 for Handling Nulls with a NOT IN List

SELECT *
FROM student_table
WHERE grade_pt NOT IN (2.0, 3.0, 4.0)
AND grade_pt IS not null ;

student_id last_name first_name class_code grade_pt


423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
324652 Delaney Danny SR 3.35
The null row does NOT come back.

Adding an AND clause with IS NOT NULL makes it explicit that the rows with a null value are excluded when using a
NOT IN list.
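A minimal sketch comparing NOT IN with and without the extra AND, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; STUDENTS and last_names are hypothetical names. It also shows why the null row stays out either way: for a null grade_pt, NOT IN evaluates to UNKNOWN rather than TRUE, so the AND states the intent explicitly without changing the result.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


plain = last_names("grade_pt NOT IN (2.0, 3.0, 4.0)")
explicit = last_names("grade_pt NOT IN (2.0, 3.0, 4.0) AND grade_pt IS NOT NULL")

print(explicit)           # ['Bond', 'Delaney', 'Hanson', 'Larkins', 'McRoberts', 'Wilson']
print(plain == explicit)  # True: NOT IN already skips the UNKNOWN null row
```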


The BETWEEN Statement is Inclusive

SELECT *
FROM student_table
WHERE grade_pt BETWEEN 2.0 AND 4.0 ;

student_id last_name first_name class_code grade_pt


125634 Hanson Henry FR 2.88
231222 Wilson Susie SO 3.80
324652 Delaney Danny SR 3.35
322133 Bond Jimmy JR 3.95
234121 Thomas Wendy FR 4.00
333450 Smith Andy SO 2.00
123250 Phillips Martin SR 3.00
2.0 and 4.0 come back in the
answer set. The BETWEEN
statement is therefore inclusive.

The example above uses a BETWEEN statement, which tests whether a column falls within a range. BETWEEN is
inclusive, so in our example the rows with a grade_pt of exactly 2.0 or 4.0 also come back. It is the same as
grade_pt >= 2.0 AND grade_pt <= 4.0.


The NOT BETWEEN Statement is also Inclusive

SELECT *
FROM student_table
WHERE grade_pt NOT BETWEEN 2.0 AND 4.0 ;

student_id last_name first_name class_code grade_pt


280023 McRoberts Richard JR 1.90
423400 Larkins Michael FR 0.00
2.0 and 4.0 do not come back in the answer set because the range in a NOT BETWEEN statement is still inclusive.

The example above uses a NOT BETWEEN statement, which tests whether a column falls outside a range. The range
itself is inclusive, so in our example the rows with a grade_pt of exactly 2.0 or 4.0 do not come back. The row with
a null grade_pt does not come back either, because a null is neither inside nor outside the range.
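A minimal sketch contrasting BETWEEN and NOT BETWEEN with their comparison-operator equivalents, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; STUDENTS and last_names are hypothetical names.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


inside = last_names("grade_pt BETWEEN 2.0 AND 4.0")
same_inside = last_names("grade_pt >= 2.0 AND grade_pt <= 4.0")  # inclusive ends
outside = last_names("grade_pt NOT BETWEEN 2.0 AND 4.0")
same_outside = last_names("grade_pt < 2.0 OR grade_pt > 4.0")

print(inside)   # ['Bond', 'Delaney', 'Hanson', 'Phillips', 'Smith', 'Thomas', 'Wilson']
print(outside)  # ['Larkins', 'McRoberts']
# 7 + 2 = 9, not 10: Johnson's null grade_pt is in neither answer set.
print(len(inside) + len(outside))  # 9
```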


The BETWEEN Statement Works for Character Data

SELECT *
FROM student_table
WHERE last_name BETWEEN 'L' AND 'Lzzz' ;
(Proper case values work)

student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00

The BETWEEN statement works with character data as long as you enclose the values in single quotes. The
comparison is case-sensitive, so if the case is not perfect (for example, 'l' AND 'lzzz'), no rows will return.
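A minimal sketch of the case-sensitive range, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; STUDENTS and last_names are hypothetical names. SQLite's default BINARY collation compares strings byte by byte, so every uppercase letter sorts before every lowercase one, which mimics the case-sensitive comparison described above.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


proper_case = last_names("last_name BETWEEN 'L' AND 'Lzzz'")
wrong_case = last_names("last_name BETWEEN 'l' AND 'lzzz'")  # every name sorts below 'l'

print(proper_case)  # ['Larkins']
print(wrong_case)   # []
```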


LIKE uses Wildcards Percent ‘%’ and Underscore ‘_’

SELECT * FROM student_table
WHERE last_name LIKE 'SM%' ;
No rows returned

student_id last_name first_name class_code grade_pt

SELECT * FROM student_table
WHERE last_name LIKE 'sm%' ;
No rows returned

student_id last_name first_name class_code grade_pt

SELECT * FROM student_table
WHERE last_name LIKE 'Sm%' ;
ROWS Returned

student_id last_name first_name class_code grade_pt
333450 Smith Andy SO 2.00

The percent sign (%) is a wildcard for any number of characters, including none. We are looking for anyone whose
last name starts with Sm. LIKE in Databricks is case-sensitive, so you must match the case. The first two examples
return nothing because the case is wrong. The final example returns a row because the name 'Smith' starts with a
capital 'S' followed by a lowercase 'm'.


Another Example of UPPER and LOWER

SELECT UPPER(first_name)
,LOWER(last_name)
FROM student_table
ORDER BY first_name
Up low
ANDY smith
DANNY delaney
HENRY hanson
JIMMY bond
MARTIN phillips
MICHAEL larkins
RICHARD mcroberts
STANLEY johnson
SUSIE wilson
WENDY thomas

The UPPER function returns the column value in all uppercase, and the LOWER function returns it in all lowercase.
This makes UPPER and LOWER useful both for case-insensitive WHERE clause comparisons and for displaying
results in a consistent case.


Using LIKE for all Cases with Lower and Upper

SELECT * FROM student_table
WHERE UPPER(last_name) LIKE 'SM%' ;
student_id last_name first_name class_code grade_pt
333450 Smith Andy SO 2.00

SELECT * FROM student_table
WHERE LOWER(last_name) LIKE 'sm%' ;
student_id last_name first_name class_code grade_pt
333450 Smith Andy SO 2.00

Wrapping the column in UPPER converts it to uppercase only for the comparison, so the pattern must also be
uppercase ('SM%'). LOWER works the same way with a lowercase pattern ('sm%'). Notice that in both examples
'Smith' comes back in its original mixed case, because UPPER and LOWER appear only in the WHERE clause, not
in the SELECT list.
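A minimal sketch of the UPPER/LOWER technique, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; STUDENTS and last_names are hypothetical names. SQLite's LIKE ignores ASCII case by default, so the sketch turns on SQLite's own case_sensitive_like pragma to mimic the case-sensitive LIKE described above; that pragma is SQLite-specific.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)
# SQLite-only switch: make LIKE case-sensitive, like the LIKE described above.
con.execute("PRAGMA case_sensitive_like = ON")


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


wrong_case = last_names("last_name LIKE 'SM%'")
exact_case = last_names("last_name LIKE 'Sm%'")
# UPPER/LOWER change the value only for the comparison; 'Smith' keeps its case.
via_upper = last_names("UPPER(last_name) LIKE 'SM%'")
via_lower = last_names("LOWER(last_name) LIKE 'sm%'")

print(wrong_case)            # []
print(via_upper, via_lower)  # ['Smith'] ['Smith']
```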


Using ILIKE to Handle Case Issues

SELECT * FROM Student_Table
WHERE last_name LIKE 'SM%' ;
No rows returned. CASE Issues.

student_id last_name first_name class_code grade_pt

SELECT * FROM Student_Table
WHERE last_name ILIKE 'SM%' ;
ILIKE removes the case restrictions

student_id last_name first_name class_code grade_pt
333450 Smith Andy SO 2.00

ILIKE works like LIKE but ignores case, so you can use it instead of LIKE to get around case issues.


LIKE command Underscore is Wildcard for one Character

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT * FROM student_table
WHERE last_name LIKE '_a%' ;
(Show me anyone with an 'a' as the 2nd letter in their last_name)

student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88

The underscore wildcard represents exactly one character. Our search '_a%' finds anyone who has an 'a' as the
second letter of their last name.
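A minimal sketch of the single-character wildcard, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; STUDENTS and last_names are hypothetical names. SQLite's LIKE ignores ASCII case by default, which makes no difference for these patterns and this data.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)


def last_names(where):
    """Hypothetical helper: sorted last names of the rows matching a WHERE clause."""
    sql = f"SELECT last_name FROM student_table WHERE {where} ORDER BY last_name"
    return [row[0] for row in con.execute(sql)]


second_letter_a = last_names("last_name LIKE '_a%'")
four_letters = last_names("last_name LIKE 'Bon_'")  # _ matches exactly one 'd'
too_short = last_names("last_name LIKE 'Bo_'")      # _ cannot match two characters

print(second_letter_a)         # ['Hanson', 'Larkins']
print(four_letters, too_short) # ['Bond'] []
```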

Finding Anyone Whose First Name Ends in 'y'

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT * FROM student_table WHERE first_name LIKE '%y' ;
student_id last_name first_name class_code grade_pt
125634 Hanson Henry FR 2.88
322133 Bond Jimmy JR 3.95
324652 Delaney Danny SR 3.35
333450 Smith Andy SO 2.00
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00

Above, our example finds anyone whose first name ends in a lowercase 'y'. The data type of first_name is
VARCHAR(12). The same kind of search also works on last_name, which has a data type of CHAR(20).

Escape Character in the LIKE Command changes Wildcards

Pretend student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
280023 McRoberts Richard JR 1.90
260000 Johnson Stanley ? ?
231222 Wilson Susie SO 3.80
234121 Thomas Wendy FR 4.00
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
322133 Bond Jimmy JR 3.95
333450 Smith Andy SO 2.00
999999 T_ S% FR 1.90

/* We just pretended to add a new row to the student_table */


/* Can you use the LIKE command to find S% above? */

Here you will have to utilize a Wildcard Escape Character. Turn the page for more.


Escape Characters Turn off Wildcards in the LIKE Command

Pretend student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
280023 McRoberts Richard JR 1.90
260000 Johnson Stanley ? ?
231222 Wilson Susie SO 3.80
234121 Thomas Wendy FR 4.00
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
322133 Bond Jimmy JR 3.95
333450 Smith Andy SO 2.00
999999 T_ S% FR 1.90
SELECT * FROM student_table WHERE first_name LIKE 'S@%' Escape '@';
student_id last_name first_name class_code grade_pt
999999 T_ S% FR 1.90

We can pick our escape character, and we have chosen the @ sign. The escape character turns off the wildcard
meaning of the one character that immediately follows it, so '@%' matches a literal percent sign. That is how we find
'S%' without bringing back Stanley or Susie.
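A minimal sketch of the escape character, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; SQLite also supports LIKE ... ESCAPE. STUDENTS and the first_names helper are hypothetical names, and the pretend row is inserted just as the example imagines.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)
# The pretend row from the example.
con.execute("INSERT INTO student_table VALUES (999999, 'T_', 'S%', 'FR', 1.90)")


def first_names(where):
    """Hypothetical helper: sorted first names of the rows matching a WHERE clause."""
    sql = f"SELECT first_name FROM student_table WHERE {where} ORDER BY first_name"
    return [row[0] for row in con.execute(sql)]


unescaped = first_names("first_name LIKE 'S%'")           # % is still a wildcard
escaped = first_names("first_name LIKE 'S@%' ESCAPE '@'")  # @% is a literal percent sign

print(unescaped)  # ['S%', 'Stanley', 'Susie']
print(escaped)    # ['S%']
```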

The REPLACE Function

Syntax: REPLACE(string, substring1, substring2)

SELECT customer_name
,REPLACE (customer_name, ' ', '_') AS Under_Score
,phone_number
,REPLACE (phone_number, '-', ' ') AS No_Dash
FROM customer_table ;

The REPLACE function replaces all occurrences of substring1 in the string with substring2.

customer_name under_score phone_number no_dash
Billy's Best Choice Billy's_Best_Choice 555-1234 555 1234
Acme Products Acme_Products 555-1111 555 1111
ACE Consulting ACE_Consulting 555-1212 555 1212
XYZ Plumbing XYZ_Plumbing 347-8954 347 8954
Databases N-U Databases_N-U 322-1012 322 1012

Replace spaces with underscores Replace dashes with spaces

The REPLACE function replaces every occurrence of one substring with another. Above, we replaced the spaces in
customer_name with underscores and the dashes (-) in phone_number with spaces.
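A minimal sketch of the same REPLACE calls, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; SQLite's replace(string, from, to) also replaces every occurrence. CUSTOMERS is a hypothetical name, and only the two columns the query uses are loaded.

```python
import sqlite3

# customer_table values from the example (only the two columns the query uses).
CUSTOMERS = [
    ("Billy's Best Choice", "555-1234"), ("Acme Products", "555-1111"),
    ("ACE Consulting", "555-1212"), ("XYZ Plumbing", "347-8954"),
    ("Databases N-U", "322-1012"),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE customer_table (customer_name TEXT, phone_number TEXT)")
con.executemany("INSERT INTO customer_table VALUES (?, ?)", CUSTOMERS)

rows = con.execute(
    "SELECT customer_name, REPLACE(customer_name, ' ', '_') AS under_score, "
    "phone_number, REPLACE(phone_number, '-', ' ') AS no_dash "
    "FROM customer_table"
).fetchall()

# Index the results by customer_name so the output does not depend on row order.
by_name = {row[0]: row for row in rows}
for row in rows:
    print(row)
```

Note that 'Databases N-U' keeps its dash: only the phone_number column had dashes replaced.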


Chapter 3 – Distinct, Group By and Top

"A bird does not sing because it has the answers, it sings because it has a song."

- Anonymous


The Distinct Command

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT Distinct class_code
FROM student_table
ORDER BY 1;

class_code
?
FR
JR
SO
SR

Distinct won't repeat duplicate values

The DISTINCT keyword in the example above means to eliminate duplicate values.


Distinct vs. GROUP BY

SELECT class_code
FROM student_table
GROUP BY class_code
ORDER BY 1;

SELECT Distinct class_code
FROM student_table
ORDER BY 1;

class_code
?
FR
JR
SO
SR

Both examples produce the exact same result

Rules for Distinct Vs. GROUP BY


(1) Many Duplicates – use GROUP BY
(2) Few Duplicates – use DISTINCT
(3) Space Exceeded - use GROUP BY

The Distinct and GROUP BY command examples above return the same answer set.


Quiz – How many rows come back from the Distinct?

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT Distinct class_code, grade_pt
FROM student_table
ORDER BY class_code, grade_pt;

How many rows will come back from the above SQL?


Answer – How many rows come back from the Distinct?

SELECT Distinct class_code, grade_pt
FROM student_table
ORDER BY class_code, grade_pt ;

class_code grade_pt
? ?
FR 0.00
FR 2.88
FR 4.00
JR 1.90
JR 3.95
SO 2.00
SO 3.80
SR 3.00
SR 3.35

No rows have the exact same values for both the class_code and grade_pt. Each row is Distinct!

How many rows will come back from the above SQL? 10. All rows came back because no two rows contain the same
combination of class_code and grade_pt, so each row in the SELECT list is distinct.
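A minimal sketch that checks both the one-column and the two-column DISTINCT results, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; STUDENTS is a hypothetical name. Like the Databricks output above, SQLite treats nulls as duplicates of each other for DISTINCT and sorts them first in ascending order.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)

codes_distinct = [row[0] for row in con.execute(
    "SELECT DISTINCT class_code FROM student_table ORDER BY 1")]
codes_grouped = [row[0] for row in con.execute(
    "SELECT class_code FROM student_table GROUP BY class_code ORDER BY 1")]
pairs = con.execute("SELECT DISTINCT class_code, grade_pt FROM student_table").fetchall()

print(codes_distinct)                   # [None, 'FR', 'JR', 'SO', 'SR']
print(codes_grouped == codes_distinct)  # True
print(len(pairs))                       # 10: every combination is unique
```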


Top Command

STUDENT_TABLE
STUDENT_ID LAST_NAME FIRST_NAME CLASS_CODE GRADE_PT
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT TOP 3 last_name
,class_code
,grade_pt
FROM student_table

last_name class_code grade_pt
Wilson SO 3.80
Bond JR 3.95
Smith SO 2.00

In the above example, we brought back only three rows because of the TOP 3 statement, which gets an answer set
and then brings back the first three rows of that answer set. Because this example has no ORDER BY statement,
which three rows come back is not guaranteed, so consider this example as merely getting three arbitrary rows.

Top Command and Order By

STUDENT_TABLE
STUDENT_ID LAST_NAME FIRST_NAME CLASS_CODE GRADE_PT
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT TOP 3 last_name,
class_code,
grade_pt
FROM student_table
ORDER BY grade_pt DESC

last_name class_code grade_pt
Thomas FR 4.00
Bond JR 3.95
Wilson SO 3.80

We are now returning the students with the top three grade point averages because we use the ORDER BY
statement. Databricks orders the data first and then uses the TOP command.
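A minimal sketch of order-then-limit, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in; STUDENTS is a hypothetical name. SQLite has no TOP keyword, so ORDER BY ... LIMIT 3 plays the role of TOP 3 here; the point it illustrates is that the sort happens before the first three rows are taken.

```python
import sqlite3

# student_table rows from the example; None stands in for the null ("?") values.
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
            "first_name TEXT, class_code TEXT, grade_pt REAL)")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)

# Sort first, then keep three rows. SQLite treats NULL as the smallest value,
# so Johnson's null grade_pt sorts last in DESC order and is not picked.
top3 = con.execute(
    "SELECT last_name, class_code, grade_pt FROM student_table "
    "ORDER BY grade_pt DESC LIMIT 3"
).fetchall()

print(top3)  # [('Thomas', 'FR', 4.0), ('Bond', 'JR', 3.95), ('Wilson', 'SO', 3.8)]
```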


Chapter 4 – Aggregation

"Databricks climbed Aggregate Mountain and delivered a better way to Sum It."

- Tera-Tom Coffing


Quiz – You calculate the Answer Set in your Mind

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

SELECT Avg(grade_pt) AS "AVG"
,Count(grade_pt) AS "Count"
,Count(*) AS "Count *"
FROM student_table
WHERE class_code IS null

AVG Count Count *

What would the result set be from the above query? The next slide shows answers!


Quiz 2 – Calculate the Answer Set in your Mind

Pretend Aggregation_Table
employee_no salary
423400 100000.00
423401 100000.00
423402 null

SELECT
AVG(salary) as "AVG"
,Count(salary) as SalCnt
,Count(*) as RowCnt
FROM Aggregation_Table ;

1) Aggregates Ignore Null Values.

2) Aggregates WANT to come back in one row.

3) You CAN’T mix Aggregates with normal columns without a GROUP BY.

Look at the pretend table and the query, and calculate the answer in your mind. Remember that aggregates ignore
null values.


Answer - Quiz 2 – Calculate the Answer Set in your Mind

Pretend Aggregation_Table
employee_no salary
423400 100000.00
423401 100000.00
423402 null

SELECT
AVG(salary) as "AVG"
,Count(salary) as SalCnt
,Count(*) as RowCnt
FROM Aggregation_Table ;

1) Aggregates Ignore Null Values.

2) Aggregates WANT to come back in one row.

3) You CAN’T mix Aggregates with normal columns without a GROUP BY.

AVG(salary) = $100000.00 Count(salary) = 2 Count(*) = 3

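Remember, aggregates ignore null values, so AVG(salary) divides the total of the two non-null salaries by 2, not 3,
and Count(salary) counts only 2 rows while Count(*) counts all 3. Aggregates usually deliver answer sets that are
only one row. You can only have a non-aggregate with aggregates if you use a GROUP BY statement. The answers
are above.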
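A minimal sketch that runs the quiz query against the pretend Aggregation_Table, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; SQLite's aggregates also skip nulls.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE Aggregation_Table (employee_no INTEGER, salary REAL)")
con.executemany("INSERT INTO Aggregation_Table VALUES (?, ?)",
                [(423400, 100000.00), (423401, 100000.00), (423402, None)])

avg_sal, sal_cnt, row_cnt = con.execute(
    'SELECT AVG(salary) AS "AVG", COUNT(salary) AS SalCnt, COUNT(*) AS RowCnt '
    "FROM Aggregation_Table"
).fetchone()

# AVG and COUNT(salary) skip the null salary; COUNT(*) counts every row.
print(avg_sal, sal_cnt, row_cnt)  # 100000.0 2 3
```

If the null were treated as zero, the average would drop to about 66666.67; skipping it keeps the average at 100000.00.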


There are Five Aggregates

There are FIVE AGGREGATES which are the following:


MIN – The Minimum Value.
MAX – The Maximum Value.
AVG – The Average of the Column Values.
SUM – The Sum Total of the Column Values.
COUNT – The Count of the Column Values.

SELECT MIN (salary) AS min
,MAX (salary) AS max
,SUM (salary) AS sum
,AVG (salary) AS avg
,Count(*) AS count_rows
FROM employee_table ;

min max sum avg count_rows
32800.50 64300.00 421039.38 46782.153333 9

The five aggregates are listed above.
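A minimal sketch of the five aggregates over the employee_table rows shown in this chapter, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; EMPLOYEES is a hypothetical name. Salary is stored as a floating-point REAL in the sketch rather than a DECIMAL, so SUM and AVG are rounded before they are compared.

```python
import sqlite3

# employee_table rows from the example; None is the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

# REAL arithmetic is binary floating point, so SUM and AVG are rounded here.
mn, mx, total, avg, cnt = con.execute(
    "SELECT MIN(salary), MAX(salary), ROUND(SUM(salary), 2), "
    "ROUND(AVG(salary), 6), COUNT(*) FROM employee_table"
).fetchone()

print(mn, mx, total, avg, cnt)
```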


Quiz – How many rows come back?

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT SUM (salary)
,AVG (salary)
,COUNT(*)
FROM employee_table ;
(How many rows come back?)

How many rows will the above query produce in the result set?


Answer – How many rows come back?

SELECT SUM (salary)
,AVG (salary)
,Count(*)
FROM employee_table ;

EXPR_1 EXPR_2 EXPR_3
421039.38 46782.153333 9
(Only one row comes back)

How many rows will the above query produce in the result set? The answer is one.


Casting a Data Type

SELECT
CAST(SUM (salary) as DECIMAL(8,1)) as Sum
,CAST(AVG(salary) as DECIMAL(8,2)) as Avg
,Count(*) as Count
,CAST(MIN(salary) as DECIMAL(9,3)) as Min
,CAST(MAX(salary) as DECIMAL(10,4)) as Max
FROM employee_table ;

Sum Avg Count Min Max


421039.4 46782.15 9 32800.500 64300.0000

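The CAST command is in the example above. CAST converts a value to a new data type for the life of the query; the
data stored in the table does not change. Notice that each value is rounded or padded to fit the scale of its DECIMAL,
so the SUM of 421039.38 becomes 421039.4 and the MAX becomes 64300.0000.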


Troubleshooting Aggregates

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT dept_no
,MIN (salary)
,MAX (salary)
,SUM (salary)
,AVG (salary)
,COUNT(*)
FROM employee_table

Error: dept_no is a NON-Aggregate. This needs a GROUP BY dept_no

If you have a regular column (nonaggregate) in your query, you must have a corresponding GROUP BY statement.

GROUP BY Delivers One Row Per Group

SELECT dept_no
,MIN (salary)
,MAX (salary)
,SUM (salary)
,AVG (salary)
,Count(*)
FROM employee_table
GROUP BY dept_no
ORDER BY dept_no nulls last ;
(dept_no is a NON-Aggregate, so the GROUP BY is needed)
dept_no min(salary) max(salary) sum(salary) avg(salary) count(1)
10 64300.00 64300.00 64300.00 64300.000000 1
100 48850.00 48850.00 48850.00 48850.000000 1
200 41888.88 48000.00 89888.88 44944.440000 2
300 40200.00 40200.00 40200.00 40200.000000 1
400 36000.00 54500.00 145000.00 48333.333333 3
? 32800.50 32800.50 32800.50 32800.500000 1

The GROUP BY dept_no clause calculates the aggregates once per dept_no. The data has also been sorted with the
ORDER BY statement. Notice we used NULLS LAST to put the null dept_no last in our sorting.
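A minimal sketch of the per-department totals, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; EMPLOYEES is a hypothetical name. Instead of NULLS LAST, the sketch sorts on dept_no IS NULL first, a portable way to push the null group to the end that also works on older SQLite versions.

```python
import sqlite3

# employee_table rows from the example; None is the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

# "dept_no IS NULL" is 0 for real departments and 1 for the null group,
# so sorting on it first puts the null dept_no last.
rows = con.execute(
    "SELECT dept_no, MIN(salary), MAX(salary), ROUND(SUM(salary), 2), "
    "ROUND(AVG(salary), 6), COUNT(*) FROM employee_table "
    "GROUP BY dept_no ORDER BY dept_no IS NULL, dept_no"
).fetchall()

for row in rows:
    print(row)  # one row per group: dept_no, min, max, sum, avg, count
```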


GROUP BY dept_no or GROUP BY Column Number

SELECT dept_no
,MIN (salary)
,MAX (salary)
,SUM (salary)
,AVG (salary)
,Count(*)
FROM employee_table
GROUP BY dept_no
ORDER BY dept_no nulls last;

SELECT dept_no
,MIN (salary)
,MAX (salary)
,SUM (salary)
,AVG (salary)
,Count(*)
FROM employee_table
GROUP BY 1
ORDER BY 1 nulls last;

Both Queries are exactly the same

dept_no min(salary) max(salary) sum(salary) avg(salary) count(1)
10 64300.00 64300.00 64300.00 64300.000000 1
100 48850.00 48850.00 48850.00 48850.000000 1
200 41888.88 48000.00 89888.88 44944.440000 2
300 40200.00 40200.00 40200.00 40200.000000 1
400 36000.00 54500.00 145000.00 48333.333333 3
? 32800.50 32800.50 32800.50 32800.500000 1

Both queries above produce the same result. The GROUP BY allows you to either name the column or use the
number in the SELECT list, just like the ORDER BY statement. The only two commands that can use the column
number are the ORDER BY and GROUP BY statements.


Limiting Rows and Improving Performance with WHERE

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT dept_no, MIN (salary), MAX (salary), SUM (salary)
,AVG (salary) , COUNT(*)
FROM employee_table
WHERE dept_no IN (200, 400)
GROUP BY dept_no
ORDER BY 1 ;
(The WHERE Clause acts as a filter before any Calculations are done)

Will dept_no 300 be part of the calculation? Of course, you know it will NOT!

WHERE Clause in Aggregation limits unneeded Calculations

SELECT dept_no, MIN (salary) min, MAX (salary) max,
SUM (salary) sum ,AVG (salary) avg , COUNT(*) count
FROM employee_table
WHERE dept_no IN (200, 400)
GROUP BY dept_no
ORDER BY 1 ;
(Each aggregate has an Alias. The WHERE Clause acts as a filter before any Calculations are done.)

dept_no min max sum avg count
200 41888.88 48000.00 89888.88 44944.440000 2
400 36000.00 54500.00 145000.00 48333.333333 3

The WHERE clause removes every row whose dept_no is not 200 or 400 before the aggregates are calculated, so only
the dept_no 200 and 400 rows take part in the calculation. Filtering first leaves less data to group and total.


Keyword HAVING tests Aggregates after they are Totaled

SELECT dept_no, MIN (salary) min, MAX (salary) max,
SUM (salary) sum, AVG (salary) avg , COUNT(*) count
FROM employee_table WHERE dept_no in (200, 400)
GROUP BY dept_no
HAVING Count(*) > 2 ;
(The HAVING Clause acts as a filter on all Aggregates after they are totaled.)

Previous Answer Set (Without the Having Statement)

dept_no min max sum avg count
200 41888.88 48000.00 89888.88 44944.440000 2
400 36000.00 54500.00 145000.00 48333.333333 3

NEW Answer Set
??????????????
Can you calculate what the new Answer Set will be after the HAVING Clause is implemented?

The HAVING Clause tests Aggregate Totals. The WHERE filters rows out before the calculation, but the HAVING
filters the Aggregate totals after the calculation, eliminating entire groups whose totals fail the test.


Keyword HAVING is like an Extra WHERE Clause for Totals

SELECT dept_no, MIN (salary) min, MAX (salary) max,
SUM (salary) sum, AVG (salary) avg , COUNT(*) count
FROM employee_table WHERE dept_no in (200, 400)
GROUP BY dept_no
HAVING Count(*) > 2 ;
(The HAVING Clause acts as a filter on all Aggregates after they are totaled.)

Previous Answer Set (Without the Having Statement)

dept_no min max sum avg count
200 41888.88 48000.00 89888.88 44944.440000 2
400 36000.00 54500.00 145000.00 48333.333333 3

Current Answer Set with HAVING calculating

dept_no min max sum avg count
400 36000.00 54500.00 145000.00 48333.333333 3

The HAVING Clause tests Aggregate Totals after they are totaled. It is a final check after aggregation is complete.
Now only the groups with a Count(*) greater than 2 can return, so dept_no 200, with a count of 2, is eliminated.
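A minimal sketch contrasting WHERE and HAVING on the same employee_table rows, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Databricks; EMPLOYEES and base are hypothetical names. It also shows which average belongs to which department: dept_no 200 averages 44944.44 and dept_no 400 averages 48333.33.

```python
import sqlite3

# employee_table rows from the example; None is the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]

con = sqlite3.connect(":memory:")  # stand-in engine, not Databricks
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

# WHERE drops rows before aggregation; only dept_no 200 and 400 are totaled.
base = ("SELECT dept_no, ROUND(AVG(salary), 2), COUNT(*) FROM employee_table "
        "WHERE dept_no IN (200, 400) GROUP BY dept_no")
without_having = con.execute(base + " ORDER BY 1").fetchall()
# HAVING drops whole groups after their totals are calculated.
with_having = con.execute(base + " HAVING COUNT(*) > 2 ORDER BY 1").fetchall()

print(without_having)  # dept 200 (avg 44944.44, count 2) and dept 400 (avg 48333.33, count 3)
print(with_having)     # only dept 400 remains
```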


ANY_VALUE

SELECT
c.customer_number
,ANY_VALUE(c.customer_name)
,SUM(o.order_total) AS sum_orders
FROM customer_table AS c
INNER JOIN
order_table AS o
ON c.customer_number = o.customer_number
GROUP BY c.customer_number

(We only group by c.customer_number. We don't have to include c.customer_name.)

customer_number any_value(customer_name) sum_orders
57896883 XYZ Plumbing 23454.84
31323134 ACE Consulting 5111.47
87323456 Databases N-U 15231.62
11111111 Billy's Best Choice 20353.44

ANY_VALUE returns some value of the expression from each group, and which value it returns is non-deterministic.
Normally, every non-aggregate column in the SELECT list must also be included in the GROUP BY statement.
ANY_VALUE removes that requirement, which simplifies the GROUP BY and can improve performance by grouping on
fewer columns. It is a natural fit for a column such as customer_name that has only one value per customer_number.


GROUP BY GROUPING SETS

SELECT product_id
,EXTRACT(MONTH FROM sale_date) AS mth
,EXTRACT(YEAR FROM sale_date) AS yr
,SUM(daily_sales) AS sum_daily_sales
FROM sales_table
GROUP BY GROUPING SETS (product_id, mth, yr)
ORDER BY product_id, mth, yr;

product_id mth yr sum_daily_sales
? ? 2000 862404.35
? 9 ? 418769.36
? 10 ? 443634.99
1000 ? ? 331204.72
2000 ? ? 306611.81
3000 ? ? 224587.82

Be prepared to be amazed. There are three advanced options for grouping data: GROUPING SETS, ROLLUP, and CUBE.
Each is more powerful than the one before, and the following pages give great examples. GROUP BY GROUPING SETS
above shows the DAILY_SALES totals for each PRODUCT_ID, each month, and each year. It is like three separate
reports in one. A question mark (?) represents a null, which means the column was not part of that row's grouping.
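The individual sales_table rows are not shown, so this sketch starts from the per-product, per-month totals in the ROLLUP answer set (all in year 2000). It emulates GROUP BY GROUPING SETS in plain Python by running one aggregation pass per grouping set. Columns outside the current set are reported as None, which the book shows as ?. The helper `grouping_sets` is hypothetical and is not a Databricks API.

```python
from collections import defaultdict

COLUMNS = ("product_id", "mth", "yr")

# (product_id, mth, yr, daily_sales total) taken from the ROLLUP answer set
MONTHLY = [
    (1000, 9, 2000, 139350.69),
    (1000, 10, 2000, 191854.03),
    (2000, 9, 2000, 139738.91),
    (2000, 10, 2000, 166872.90),
    (3000, 9, 2000, 139679.76),
    (3000, 10, 2000, 84908.06),
]

def grouping_sets(rows, sets):
    """Hypothetical helper: aggregate once per grouping set and stack the results.
    Columns not in the current set come back as None (shown as ? in the book)."""
    result = []
    for keep in sets:
        totals = defaultdict(float)
        for *keys, sales in rows:
            row_key = tuple(k if col in keep else None for col, k in zip(COLUMNS, keys))
            totals[row_key] += sales
        result.extend(key + (round(total, 2),) for key, total in totals.items())
    return result

def nulls_first(row):
    # Sort like ORDER BY product_id, mth, yr with nulls first
    return tuple((v is not None, v or 0) for v in row[:3])

report = sorted(grouping_sets(MONTHLY, [("product_id",), ("mth",), ("yr",)]), key=nulls_first)
for row in report:
    print(row)
```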


GROUP BY ROLLUP

SELECT product_id
,EXTRACT(MONTH FROM sale_date) AS mth
,EXTRACT(YEAR FROM sale_date) AS yr
,SUM(daily_sales) AS sum_daily_sales
FROM sales_table
GROUP BY ROLLUP (product_id, mth, yr)
ORDER BY product_id, mth, yr;

The answer set and explanation follow.


GROUP BY ROLLUP Answer Set

product_id mth yr sum_daily_sales
? ? ? 862404.35
1000 ? ? 331204.72
1000 9 ? 139350.69
1000 9 2000 139350.69
1000 10 ? 191854.03
1000 10 2000 191854.03
2000 ? ? 306611.81
2000 9 ? 139738.91
2000 9 2000 139738.91
2000 10 ? 166872.90
2000 10 2000 166872.90
3000 ? ? 224587.82
3000 9 ? 139679.76
3000 9 2000 139679.76
3000 10 ? 84908.06
3000 10 2000 84908.06

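GROUP BY ROLLUP (product_id, mth, yr) builds subtotals from right to left. It returns the DAILY_SALES for each
PRODUCT_ID, month, and year combination, then for each PRODUCT_ID and month, then for each PRODUCT_ID, plus a grand
total. Unlike CUBE, it does not total each month or each year on its own. Above, we have the answer set.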
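ROLLUP(a, b, c) is shorthand for GROUPING SETS ((a, b, c), (a, b), (a), ()). This sketch generates that list of prefixes and feeds it to the same hypothetical `grouping_sets` emulation used for GROUPING SETS. The input is again the monthly totals from the answer set above, and the result reproduces its 16 rows.

```python
from collections import defaultdict

COLUMNS = ("product_id", "mth", "yr")
MONTHLY = [
    (1000, 9, 2000, 139350.69),
    (1000, 10, 2000, 191854.03),
    (2000, 9, 2000, 139738.91),
    (2000, 10, 2000, 166872.90),
    (3000, 9, 2000, 139679.76),
    (3000, 10, 2000, 84908.06),
]

def rollup_sets(columns):
    """ROLLUP drops one column from the right at each level, ending with the grand total ()."""
    return [columns[:n] for n in range(len(columns), -1, -1)]

def grouping_sets(rows, sets):
    """Hypothetical helper: one aggregation per set; missing columns become None (?)."""
    result = []
    for keep in sets:
        totals = defaultdict(float)
        for *keys, sales in rows:
            row_key = tuple(k if col in keep else None for col, k in zip(COLUMNS, keys))
            totals[row_key] += sales
        result.extend(key + (round(total, 2),) for key, total in totals.items())
    return result

report = grouping_sets(MONTHLY, rollup_sets(COLUMNS))
print(rollup_sets(COLUMNS))
print(len(report), "rows")
```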


GROUP BY CUBE

SELECT product_id
,EXTRACT(MONTH FROM sale_date) AS mth
,EXTRACT(YEAR FROM sale_date) AS yr
,SUM(daily_sales) AS sum_daily_sales
FROM sales_table
GROUP BY CUBE (product_id, mth, yr)
ORDER BY product_id, mth, yr;

The answer set follows. GROUP BY CUBE displays the DAILY_SALES totals for every combination of PRODUCT_ID,
month, and year, plus a grand total.


GROUP BY CUBE Answer Set


product_id mth yr sum_daily_sales
? ? ? 862404.35
? ? 2000 862404.35
? 9 ? 418769.36
? 9 2000 418769.36
? 10 ? 443634.99
? 10 2000 443634.99
1000 ? ? 331204.72
1000 ? 2000 331204.72
1000 9 ? 139350.69
1000 9 2000 139350.69
1000 10 ? 191854.03
1000 10 2000 191854.03
2000 ? ? 306611.81
2000 ? 2000 306611.81
2000 9 ? 139738.91
2000 9 2000 139738.91
2000 10 ? 166872.90
2000 10 2000 166872.90
3000 ? ? 224587.82
3000 ? 2000 224587.82
3000 9 ? 139679.76
3000 9 2000 139679.76
3000 10 ? 84908.06
3000 10 2000 84908.06

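Above, we have the answer set. GROUP BY CUBE (product_id, mth, yr) produces a subtotal for every combination of
the three columns. It totals each PRODUCT_ID, month, and year on its own, and each pair (product and month, product
and year, month and year). It also totals all three together, plus a grand total. That is 2^3 = 8 grouping sets,
which here yield 24 rows.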
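CUBE(a, b, c) is shorthand for GROUPING SETS over every subset of the columns. This sketch builds those 8 subsets with `itertools.combinations` and reuses the hypothetical `grouping_sets` emulation on the same monthly totals. The result has the 24 rows shown in the answer set above.

```python
from collections import defaultdict
from itertools import combinations

COLUMNS = ("product_id", "mth", "yr")
MONTHLY = [
    (1000, 9, 2000, 139350.69),
    (1000, 10, 2000, 191854.03),
    (2000, 9, 2000, 139738.91),
    (2000, 10, 2000, 166872.90),
    (3000, 9, 2000, 139679.76),
    (3000, 10, 2000, 84908.06),
]

def cube_sets(columns):
    """Every subset of the columns, from all three down to the empty set (grand total)."""
    return [combo for n in range(len(columns), -1, -1) for combo in combinations(columns, n)]

def grouping_sets(rows, sets):
    """Hypothetical helper: one aggregation per set; missing columns become None (?)."""
    result = []
    for keep in sets:
        totals = defaultdict(float)
        for *keys, sales in rows:
            row_key = tuple(k if col in keep else None for col, k in zip(COLUMNS, keys))
            totals[row_key] += sales
        result.extend(key + (round(total, 2),) for key, total in totals.items())
    return result

report = grouping_sets(MONTHLY, cube_sets(COLUMNS))
print(len(cube_sets(COLUMNS)), "grouping sets,", len(report), "rows")
```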

Chapter 5 – Joining Tables

"The man who doesn't read Tera-Tom books has no advantage over the man
who can't read them."

- Mark Twain


Nexus Builds Your Join SQL Automatically

Why write SQL when Nexus automatically does it for you? Plus, you can edit the SQL if you desire. Watch the
YouTube video of the Nexus Super Join Builder with this link: https://youtu.be/ARwo9prucw0.


A Two-Table Join Using Traditional Syntax

customer_table
customer_number customer_name
11111111 Billy’s Best Choice
31313131 Acme Products
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U

order_table
order_number customer_number order_total
123456 11111111 12347.53
123512 11111111 8005.91
123552 31323134 5111.47
123585 87323456 15231.62
123777 57896883 23454.84

SELECT customer_table.customer_number
,customer_name
,order_number
,order_total
FROM customer_table,
order_table
WHERE customer_table.customer_number = order_table.customer_number ;

Since the column customer_number is in both tables, it must be fully qualified with the table name, or the query
errors.

customer_number is the column that has matching data in both tables. This is called the "Join Condition."

A Join combines columns on the report from more than one table. The example above joins the customer_table
and the order_table together. The most complicated part of any join is the JOIN CONDITION.
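The sketch below runs the traditional-syntax join against the sample customer_table and order_table in an in-memory SQLite database, which stands in for Databricks here. Acme Products has no orders, so nothing matches it and it does not return. Billy’s Best Choice appears twice because it has two orders.

```python
import sqlite3

CUSTOMERS = [
    (11111111, "Billy's Best Choice"),
    (31313131, "Acme Products"),
    (31323134, "ACE Consulting"),
    (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
]
ORDERS = [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT)")
con.execute("CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, order_total REAL)")
con.executemany("INSERT INTO customer_table VALUES (?, ?)", CUSTOMERS)
con.executemany("INSERT INTO order_table VALUES (?, ?, ?)", ORDERS)

# Traditional syntax: tables separated by a comma, join condition in the WHERE clause
rows = con.execute("""
    SELECT customer_table.customer_number, customer_name, order_number, order_total
    FROM customer_table, order_table
    WHERE customer_table.customer_number = order_table.customer_number
    ORDER BY order_number""").fetchall()

for row in rows:
    print(row)
```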


Two-Table Join Using Traditional Syntax with Table Alias


SELECT Cust.customer_number
,customer_name
,order_number
,order_total
FROM customer_table as Cust,
order_table as Ord
WHERE Cust.customer_number = Ord.customer_number ;

The column customer_number is in both tables. It must be fully qualified, or the query will error. We alias the
table names to shorten the typing when fully qualifying a column.

A Join combines columns on the report from more than one table. The example above joins the customer_table
and the order_table together. The most complicated part of any join is the JOIN CONDITION, which specifies which
column in each table must match. In this case, customer_number is the matching column that establishes the
relationship.

You Can Fully Qualify All Columns


SELECT Cust.customer_number
,Cust.customer_name
,Ord.order_number
,Ord.order_total
FROM customer_table as Cust,
order_table as Ord
WHERE Cust.customer_number = Ord.customer_number ;

The column customer_number is in both tables. It must be fully qualified, or the query will error. A good practice
is to fully qualify all columns in the SELECT list, for clarity to other users.

The customer_number is a column in both the Customer and Order tables. Cust.customer_number specifies that the
column comes from the customer_table. That is why we alias the table names: so we can fully qualify any column in
either table with minimal typing.


A Two-Table Join Using ANSI Syntax


SELECT Cust.customer_number,
customer_name,
order_number,
order_total
FROM customer_table as Cust
INNER JOIN
order_table as Ord
ON Cust.customer_number = Ord.customer_number ;

The INNER JOIN keyword replaces the comma, and the ON keyword is used instead of WHERE.

The example above is the same join as the previous example, except it uses ANSI syntax. Both traditional and
ANSI syntax return the same rows with the same performance. Rows join when the customer_number matches in
both tables; rows without a match are not returned.


Both Queries have the same Results and Performance


Traditional Syntax

SELECT Cust.customer_number,
customer_name,
order_number,
order_total
FROM customer_table as Cust,
order_table as Ord
WHERE Cust.customer_number
= Ord.customer_number ;

ANSI Syntax

SELECT Cust.customer_number,
customer_name,
order_number,
order_total
FROM customer_table as Cust
INNER JOIN order_table as Ord
ON Cust.customer_number
= Ord.customer_number ;

Both syntax techniques bring back the same result set and have the same performance. The INNER JOIN syntax is
considered ANSI. Which one can perform an Outer Join?
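To check that the two syntaxes are interchangeable, the sketch below runs both queries against the same sample data in SQLite, which stands in for Databricks here. It then compares the result sets row for row.

```python
import sqlite3

CUSTOMERS = [(11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
             (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
             (87323456, "Databases N-U")]
ORDERS = [(123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
          (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
          (123777, 57896883, 23454.84)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT)")
con.execute("CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, order_total REAL)")
con.executemany("INSERT INTO customer_table VALUES (?, ?)", CUSTOMERS)
con.executemany("INSERT INTO order_table VALUES (?, ?, ?)", ORDERS)

# Traditional syntax: comma between tables, join condition in WHERE
traditional = con.execute("""
    SELECT Cust.customer_number, customer_name, order_number, order_total
    FROM customer_table as Cust, order_table as Ord
    WHERE Cust.customer_number = Ord.customer_number
    ORDER BY order_number""").fetchall()

# ANSI syntax: INNER JOIN replaces the comma, ON replaces WHERE
ansi = con.execute("""
    SELECT Cust.customer_number, customer_name, order_number, order_total
    FROM customer_table as Cust
    INNER JOIN order_table as Ord
    ON Cust.customer_number = Ord.customer_number
    ORDER BY order_number""").fetchall()

print(traditional == ansi)
```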


Quiz – Can You Finish the Join Syntax?

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

department_table
dept_no department_name
100 Marketing
200 Research and Dev
300 Sales
400 Customer Support
500 Human Resources

SELECT first_name, last_name,
department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON ________ ; (Finish the Join)

Finish this join by placing the missing SQL in the proper place!


Answer to Quiz – Can You Finish the Join Syntax?


SELECT first_name, last_name,
department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

dept_no is the column that both tables have in common. It is the Primary Key of the department_table and a
Foreign Key in the employee_table. This is called a Primary Key/Foreign Key relationship.

This query is ready to run.


Quiz – Can You Find the Error?


SELECT first_name
,last_name
,dept_no
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

Can you find the error?



Answer to Quiz – Can You Find the Error?


SELECT first_name
,last_name
,E.dept_no
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

The column dept_no is in both tables. It needs to be fully qualified as E.dept_no or D.dept_no.

If a column in the SELECT list is in both tables, you must fully qualify it.
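The sketch below reproduces the error in SQLite, which stands in for Databricks here; the exact error message differs by engine. The unqualified dept_no is rejected as ambiguous, and qualifying it as E.dept_no fixes the query.

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# dept_no exists in both tables, so the unqualified reference cannot be resolved
ambiguous_sql = """SELECT first_name, last_name, dept_no, department_name
    FROM employee_table as E INNER JOIN department_table as D
    ON E.dept_no = D.dept_no"""
fixed_sql = """SELECT first_name, last_name, E.dept_no, department_name
    FROM employee_table as E INNER JOIN department_table as D
    ON E.dept_no = D.dept_no"""

try:
    con.execute(ambiguous_sql)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

fixed = con.execute(fixed_sql).fetchall()
print(error)
print(len(fixed), "rows")
```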


Super Quiz – Can You Find the Difficult Error?


SELECT first_name
,last_name
,E.dept_no
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON employee_table.dept_no = D.dept_no ;

This query has an error! Can you find it?


Answer to Super Quiz – Can You Find the Difficult Error?


SELECT first_name, last_name, E.dept_no, department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON employee_table.dept_no = D.dept_no ;

Once you alias a table (as E), you must fully qualify with E.dept_no, not employee_table.dept_no. This query thinks
there are three tables: E, D, and employee_table.

Once you alias a table, you must fully qualify columns with the table alias. The system thinks there are additional
tables, and the query will error.
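The sketch below shows the same mistake in SQLite, which stands in for Databricks here. Once employee_table is aliased as E, the name employee_table.dept_no no longer resolves and the query is rejected, although the wording of the error varies by engine. Switching to E.dept_no makes it run.

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# After "as E", the original table name can no longer qualify a column
bad_sql = """SELECT first_name, last_name, E.dept_no, department_name
    FROM employee_table as E INNER JOIN department_table as D
    ON employee_table.dept_no = D.dept_no"""
good_sql = bad_sql.replace("employee_table.dept_no", "E.dept_no")

try:
    con.execute(bad_sql)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

good = con.execute(good_sql).fetchall()
print(error)
print(len(good), "rows")
```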


Quiz – Which Rows from Both Tables Won’t Return?


SELECT E.first_name
,E.last_name
,D.department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

This inner join will return all rows that have a matching dept_no in both tables. Which rows won't return?

An Inner Join returns matching rows, but did you know an Outer Join returns both matching rows and non-
matching rows? You will understand soon! What rows above are not part of the answer set?


Answer to Quiz – Which Rows from Both Tables Won’t Return?


SELECT E.first_name
,E.last_name
,D.department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

1 Squiggy Jones has a null dept_no.
2 Richard Smythe has an invalid dept_no of 10.
3 No employees work in Department 500.

The bottom line is that the three rows excluded do not have a matching dept_no.
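The sketch below confirms the answer. It runs the inner join in SQLite, a stand-in for Databricks here, and then uses Python set differences to list the employees and departments that never appear in the joined result.

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

joined = con.execute("""
    SELECT E.first_name, D.department_name
    FROM employee_table as E
    INNER JOIN department_table as D
    ON E.dept_no = D.dept_no""").fetchall()

# Anything not present in the joined result had no matching dept_no
all_employees = {row[0] for row in con.execute("SELECT first_name FROM employee_table")}
all_departments = {row[0] for row in con.execute("SELECT department_name FROM department_table")}
missing_employees = all_employees - {first for first, _ in joined}
unused_departments = all_departments - {dept for _, dept in joined}
print(missing_employees, unused_departments)
```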


Left Outer Join


SELECT E.first_name
,D.department_name
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

The 1st table after FROM is always the LEFT table. Since we are doing a Left Outer Join, the employee_table is
referred to as the outer table.

The SQL above is a LEFT OUTER JOIN. That means that all rows from the LEFT table will appear in the report
regardless of whether they find a match in the right table.


Left Outer Join Results

SELECT E.first_name
,D.department_name
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

first_name department_name
Mandee Marketing
Herbert Customer Support
William Customer Support
Loraine Sales
Squiggy ?
Richard ?
Cletus Customer Support
Billy Research and Dev
John Research and Dev

The matching rows return just like an inner join, but orphaned rows from the Left table also return. Nulls show the
mismatches.

A LEFT OUTER JOIN returns all rows from the LEFT table, including all matches. If a LEFT row can’t find a
match, nulls are placed in the right table's columns!
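The sketch below runs the LEFT OUTER JOIN against the sample tables in SQLite, which stands in for Databricks here. All nine employees return, and the two without a matching department (Squiggy and Richard) get None (null) for department_name.

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# Every employee_table row returns; unmatched rows get None for department_name
left = dict(con.execute("""
    SELECT E.first_name, D.department_name
    FROM employee_table as E
    LEFT OUTER JOIN department_table as D
    ON E.dept_no = D.dept_no""").fetchall())

print(left)
```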

Right Outer Join


SELECT E.first_name
,D.department_name
FROM employee_table as E
RIGHT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

The 2nd table after FROM is always the RIGHT table. Since we are doing a Right Outer Join, the department_table is
referred to as the outer table.

The SQL above is a RIGHT OUTER JOIN. That means that all rows from the RIGHT table will appear in the
report regardless of whether they find a match in the LEFT table.


Right Outer Join Example and Results


SELECT E.first_name
,D.department_name
FROM employee_table as E
RIGHT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

first_name department_name
Mandee Marketing
Herbert Customer Support
William Customer Support
Loraine Sales
Cletus Customer Support
Billy Research and Dev
John Research and Dev
? Human Resources

The matching rows return just like an inner join, but orphaned rows from the Right table also return. Nulls show the
mismatches.

All rows from the Right table return: the rows with matches, plus dept_no 500 (Human Resources), which is in the
right table but didn’t have a match. The system puts a null value in the Left table's columns for that row.
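The sketch below checks this answer set in SQLite, which stands in for Databricks here. SQLite only added RIGHT OUTER JOIN in version 3.39, so to stay runnable on older Python builds the sketch uses the equivalent form: A RIGHT OUTER JOIN B returns the same rows as B LEFT OUTER JOIN A. The result has eight rows, including (None, 'Human Resources').

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# employee_table RIGHT OUTER JOIN department_table, written with the tables swapped
right_rows = con.execute("""
    SELECT E.first_name, D.department_name
    FROM department_table as D
    LEFT OUTER JOIN employee_table as E
    ON E.dept_no = D.dept_no""").fetchall()

for row in right_rows:
    print(row)
```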

Full Outer Join


SELECT E.first_name
,D.department_name
FROM employee_table as E
FULL OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

Since we are doing a Full Outer Join, both tables are referred to as outer tables.

The SQL above is a Full Outer Join. That means that all rows from both the RIGHT and LEFT tables will appear in
the report regardless of whether they find a match.


Full Outer Join Results

SELECT E.first_name
,D.department_name
FROM employee_table as E
FULL OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;

first_name department_name
Mandee Marketing
Herbert Customer Support
William Customer Support
Loraine Sales
Squiggy ?
Richard ?
Cletus Customer Support
Billy Research and Dev
John Research and Dev
? Human Resources

All rows return from both tables on a Full Outer Join.

The FULL OUTER JOIN returns all rows from both tables. The nulls show the flaws!
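The sketch below reproduces the 10-row answer set in SQLite, which stands in for Databricks here. SQLite versions before 3.39 lack FULL OUTER JOIN, so the sketch uses a common emulation: take the LEFT OUTER JOIN, then UNION ALL the right-table rows that found no match. Such rows are detected with E.employee_no IS NULL, which only happens when no employee matched, because every real employee row has an employee_no. In Databricks, you would simply write FULL OUTER JOIN as shown above.

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

full_rows = con.execute("""
    -- All employees, matched or not (the LEFT OUTER JOIN half)
    SELECT E.first_name, D.department_name
    FROM employee_table as E
    LEFT OUTER JOIN department_table as D ON E.dept_no = D.dept_no
    UNION ALL
    -- Plus departments that no employee matched (the orphaned right-table rows)
    SELECT E.first_name, D.department_name
    FROM department_table as D
    LEFT OUTER JOIN employee_table as E ON E.dept_no = D.dept_no
    WHERE E.employee_no IS NULL""").fetchall()

for row in full_rows:
    print(row)
```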

Which Tables are Left Tables and Which are Right?

SELECT Cla.claim_id,
Cla.claim_date,
SUB.last_name,
SUB.first_name,
`ADD`.phone,
SER.service_pay,
PRO.provider_code,
PRO.provider_name
FROM claims Cla
LEFT OUTER JOIN providers PRO
ON Cla.provider_no = PRO.provider_code
LEFT OUTER JOIN services SER
ON Cla.claim_service = SER.service_code
LEFT OUTER JOIN subscribers SUB
ON Cla.subscriber_no = SUB.subscriber_no
AND Cla.member_no = SUB.member_no
LEFT OUTER JOIN addresses `ADD`
ON SUB.subscriber_no = `ADD`.subscriber_no;

Fill in the blank. Is the table a Left Table or a Right Table?

claims ________
providers ________
services ________
subscribers ________
addresses ________

Your mission is to show which tables are left tables and which ones are right tables.


Answer - Which Tables are Left Tables and Which are Right?

SELECT Cla.claim_id,
Cla.claim_date,
SUB.last_name,
SUB.first_name,
`ADD`.phone,
SER.service_pay,
PRO.provider_code,
PRO.provider_name
FROM claims Cla
LEFT OUTER JOIN providers PRO
ON Cla.provider_no = PRO.provider_code
LEFT OUTER JOIN services SER
ON Cla.claim_service = SER.service_code
LEFT OUTER JOIN subscribers SUB
ON Cla.subscriber_no = SUB.subscriber_no
AND Cla.member_no = SUB.member_no
LEFT OUTER JOIN addresses `ADD`
ON SUB.subscriber_no = `ADD`.subscriber_no;

Fill in the blank. Is the table a Left Table or a Right Table?

claims Left
providers Right
services Right
subscribers Right
addresses Right

There is always only one Left table (the first table after the FROM clause). All tables after the first table are each
Right tables. Tables are joined two at a time, and the result from each join remains the Left table.

The first table is always the left table, and all remaining tables are the right tables. It is the intermediate results
from each join that remain the left table. In this case, all rows will return from the claims table.
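The claims tables are not populated in this book, so the sketch below uses tiny hypothetical tables and made-up rows (claims, providers, subscribers) in SQLite to illustrate the point. Each LEFT OUTER JOIN keeps every row of the intermediate result, so all three claims survive the chain. Claims without a provider or subscriber match simply get None (null). The right-table keys are unique, so the row count does not grow.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables and rows for illustration only (not from the book)
con.executescript("""
    CREATE TABLE claims (claim_id INTEGER, provider_no TEXT, subscriber_no INTEGER);
    CREATE TABLE providers (provider_code TEXT, provider_name TEXT);
    CREATE TABLE subscribers (subscriber_no INTEGER, last_name TEXT);
    INSERT INTO claims VALUES (1, 'P1', 10), (2, 'P2', 20), (3, 'P9', 30);
    INSERT INTO providers VALUES ('P1', 'Provider One'), ('P2', 'Provider Two');
    INSERT INTO subscribers VALUES (10, 'Doe');
""")

# claims is the only Left table; each join's result becomes the Left side of the next join
rows = con.execute("""
    SELECT Cla.claim_id, PRO.provider_name, SUB.last_name
    FROM claims Cla
    LEFT OUTER JOIN providers PRO ON Cla.provider_no = PRO.provider_code
    LEFT OUTER JOIN subscribers SUB ON Cla.subscriber_no = SUB.subscriber_no
    ORDER BY Cla.claim_id""").fetchall()

for row in rows:
    print(row)
```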


INNER JOIN with Additional AND Clause


SELECT first_name
,last_name
,department_name
FROM employee_table as E,
department_table as D
WHERE E.dept_no = D.dept_no
AND department_name like 'Marke%' ;

The additional AND condition is typically applied first to eliminate unwanted data, so the join is less intensive
than joining everything first and then removing rows that don't qualify. For an Inner Join, the result is the same
either way.


ANSI INNER JOIN with Additional AND Clause


SELECT first_name, last_name, department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no
AND department_name like 'Marke%' ;

The additional AND condition is typically applied first to eliminate unwanted data, so the join is less intensive
than joining everything first and then removing rows afterward.


ANSI INNER JOIN with Additional WHERE Clause


SELECT first_name
,last_name
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no
WHERE department_name like 'Marke%' ;

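The additional WHERE condition is typically applied first to eliminate unwanted data, so the join is less intensive
than joining everything first and then eliminating rows. For an Inner Join, placing the filter in the ON clause or in
the WHERE clause returns the same rows.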
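The sketch below checks the Inner Join claim in SQLite, which stands in for Databricks here. With an INNER JOIN, putting the department_name filter in the ON clause (as an AND) or in the WHERE clause returns the same single row. Note that SQLite's LIKE is case-insensitive for ASCII letters, while Databricks LIKE is case-sensitive; the pattern 'Marke%' matches Marketing in both engines.

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# Filter as part of the join condition
and_rows = con.execute("""
    SELECT first_name, last_name, department_name
    FROM employee_table as E INNER JOIN department_table as D
    ON E.dept_no = D.dept_no
    AND department_name like 'Marke%'""").fetchall()

# Filter in a WHERE clause after the join
where_rows = con.execute("""
    SELECT first_name, last_name, department_name
    FROM employee_table as E INNER JOIN department_table as D
    ON E.dept_no = D.dept_no
    WHERE department_name like 'Marke%'""").fetchall()

print(and_rows, where_rows)
```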

OUTER JOIN with Additional WHERE Clause


SELECT first_name,
department_name
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no
WHERE E.dept_no = 100 ;

first_name department_name
Mandee Marketing

Only Mandee Chambers is in dept_no 100.

On Outer Joins, the additional WHERE is logically performed last. All rows join first, and then the WHERE clause
filters the joined result.


OUTER JOIN with Additional AND Clause

SELECT first_name
,department_name AS dname
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no
AND E.dept_no = 100 ;

first_name dname
Mandee Marketing
Herbert ?
William ?
Loraine ?
Squiggy ?
Richard ?
Cletus ?
Billy ?
John ?

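On Outer Joins, the additional AND is evaluated together with the ON condition, so it only decides which rows match;
it does not remove rows from the LEFT table. All nine employees return, but only Mandee (dept_no 100) finds a match!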
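The sketch below contrasts the last two pages in SQLite, which stands in for Databricks here. On a LEFT OUTER JOIN, the WHERE version returns one row, because the filter runs on the joined result. The AND version returns all nine employees, and only Mandee has a department_name that is not null.

```python
import sqlite3

EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50), (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00), (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00), (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00), (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
            "last_name TEXT, first_name TEXT, salary REAL)")
con.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
con.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# WHERE filters the joined result: only rows with dept_no 100 survive
where_rows = con.execute("""
    SELECT first_name, department_name
    FROM employee_table as E LEFT OUTER JOIN department_table as D
    ON E.dept_no = D.dept_no
    WHERE E.dept_no = 100""").fetchall()

# AND is part of the ON condition: every left row stays, only the match changes
and_rows = con.execute("""
    SELECT first_name, department_name AS dname
    FROM employee_table as E LEFT OUTER JOIN department_table as D
    ON E.dept_no = D.dept_no
    AND E.dept_no = 100""").fetchall()

print(len(where_rows), "row vs", len(and_rows), "rows")
```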

The DREADED Product Join


SELECT first_name
,last_name
,department_name
FROM employee_table as E,
department_table as D
WHERE department_name ilike '%m%'
Order by 1, 2, 3;

No Join Condition linking the two tables!

A Product Join is often a mistake! Three Department rows have the letter ‘m’ (upper or lower case) in their
department_name. These join to every employee, and the information is worthless.


The DREADED Product Join Results

SELECT first_name
,last_name
,department_name
FROM employee_table as E,
department_table as D
WHERE department_name ilike '%m%'
Order by 1, 2, 3;

No Join Condition linking the two tables!

first_name last_name department_name
Billy Coffing Customer Support
Billy Coffing Human Resources
Billy Coffing Marketing
Cletus Strickling Customer Support
Cletus Strickling Human Resources
Cletus Strickling Marketing
Herbert Harrison Customer Support
Herbert Harrison Human Resources
Herbert Harrison Marketing
(Not all rows are displayed.)

27 rows came back: nine employees, each shown working in three different departments. This data is WRONG!

A Product Join is often a mistake! With no join condition, each employee appears once for every department that passes the WHERE clause, no matter which department that employee actually works in, so the information is worthless.


Cartesian Product Join with Traditional Syntax


SELECT first_name
,last_name
,department_name
FROM employee_table as E,
department_table as D ;

No WHERE clause in the join!

The SQL above joins every row from one table to every row of another table. Nine rows multiplied by five rows =
45 rows of complete nonsense!
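Here is a minimal sketch, again using Python's sqlite3 module as a stand-in for Databricks (an assumption), that counts the rows with and without a join condition. Adding WHERE E.dept_no = D.dept_no drops the result from 45 rows to 7: Squiggy (null dept_no) and Richard (dept_no 10) have no matching department.

```python
import sqlite3

# Sample tables from this chapter, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (employee_no INT, dept_no INT, "
             "last_name TEXT, first_name TEXT, salary REAL)")
conn.execute("CREATE TABLE department_table (dept_no INT, department_name TEXT)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
])
conn.executemany("INSERT INTO department_table VALUES (?, ?)", [
    (100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
    (400, "Customer Support"), (500, "Human Resources"),
])

# Traditional syntax with no WHERE clause: a Cartesian product.
product = conn.execute("""
    SELECT first_name, last_name, department_name
    FROM employee_table AS E, department_table AS D
""").fetchall()

# The same FROM list with a join condition linking the two tables.
joined = conn.execute("""
    SELECT first_name, last_name, department_name
    FROM employee_table AS E, department_table AS D
    WHERE E.dept_no = D.dept_no
    ORDER BY 1
""").fetchall()

print(len(product))  # 45 = 9 x 5
print(len(joined))   # 7
```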


Cartesian Product Join with ANSI Syntax


SELECT first_name
,last_name
,department_name
FROM employee_table as E
INNER JOIN
department_table as D ;

No ON clause in the join! This joins every row from one table to every row of another table: 9 rows multiplied by 5 rows = 45 rows of complete nonsense!

The syntax above produces a Cartesian product join, because the INNER JOIN has no ON clause to link the two tables.


The CROSS JOIN

customer_table
customer_number customer_name
11111111 Billy’s Best Choice
31313131 Acme Products
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U

order_table
order_number customer_number order_total
123456 11111111 12347.53
123512 11111111 8005.91
123552 31323134 5111.47
123585 87323456 15231.62
123777 57896883 23454.84

SELECT customer_name,
order_number
FROM customer_table
CROSS JOIN
order_table
WHERE order_number = 123456
ORDER BY 1 ;

A Cross Join is the ANSI equivalent to a Product Join. Only a WHERE clause will work. An ON clause will NOT!

A Cross Join is the ANSI form of a Product Join, so this query is a Product Join. The WHERE clause keeps only order_number 123456, so every row of the customer_table is paired with that one order, whether or not the customer placed it. The Answer Set follows in the next section.


The CROSS JOIN Answer Set


SELECT customer_name,
order_number
FROM customer_table
CROSS JOIN
order_table
WHERE order_number = 123456
ORDER BY 1 ;

Answer Set
customer_name order_number
ACE Consulting 123456
Acme Products 123456
Billy's Best Choice 123456
Databases N-U 123456
XYZ Plumbing 123456

This Cross Join produces information that quite often isn’t worth anything!
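The sketch below, using Python's sqlite3 module as a stand-in for Databricks (an assumption), reproduces the Answer Set and then adds the real customer_number relationship to show that only one of those five customers actually placed order 123456.

```python
import sqlite3

# Sample tables from the previous section, loaded into SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (customer_number INT, customer_name TEXT)")
conn.execute("CREATE TABLE order_table (order_number INT, customer_number INT, "
             "order_total REAL)")
conn.executemany("INSERT INTO customer_table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

# Every customer paired with every order: 5 x 5 rows.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM customer_table CROSS JOIN order_table").fetchone()[0]

# The query from the text: one order, paired with all five customers.
cross_rows = conn.execute("""
    SELECT customer_name, order_number
    FROM customer_table CROSS JOIN order_table
    WHERE order_number = 123456
    ORDER BY 1
""").fetchall()

# Joining on the real relationship shows who actually placed the order.
placed_by = conn.execute("""
    SELECT customer_name
    FROM customer_table AS C
    INNER JOIN order_table AS O ON C.customer_number = O.customer_number
    WHERE order_number = 123456
""").fetchall()

print(pair_count)       # 25
print(len(cross_rows))  # 5
print(placed_by)        # [("Billy's Best Choice",)]
```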


The Self Join

employee_table2
employee_no dept_no last_name first_name salary Mgr
1232578 100 Chambers Mandee 48850.00 Y
1256349 400 Harrison Herbert 54500.00 N
2341218 400 Reilly William 36000.00 Y
1121334 400 Strickling Cletus 54500.00 N
2312225 300 Larkins Loraine 40200.00 Y
2000000 ? Jones Squiggy 32800.50 N
1000234 10 Smythe Richard 32800.00 N
1324657 200 Coffing Billy 41888.88 N
1333454 200 Smith John 48000.00 Y

SELECT Mgrs.dept_no
, Mgrs.last_name as mgrname
, Mgrs.salary as mgrsal
, emps.last_name as empname
, emps.salary as empsal
FROM employee_table2 as emps,
employee_table2 as Mgrs
WHERE emps.dept_no = Mgrs.dept_no
AND Mgrs.mgr = 'Y' AND emps.salary > Mgrs.salary ;

Which workers make a bigger salary than their manager?

A Self Join gives one table two different aliases, so the query treats it as if it were two separate tables.


The Self Join with ANSI Syntax

SELECT Mgrs.dept_no
, Mgrs.last_name as mgrname
, Mgrs.salary as mgrsal
, emps.last_name as empname
, emps.salary as empsal
FROM employee_table2 as emps
INNER JOIN employee_table2 as Mgrs
ON emps.dept_no = Mgrs.dept_no
WHERE Mgrs.mgr = 'Y' AND emps.salary > Mgrs.salary ;

dept_no mgrname mgrsal empname empsal
400 Reilly 36000.00 Harrison 54500.00
400 Reilly 36000.00 Strickling 54500.00

Only these two employees are making more than their manager.

A Self Join gives itself two different aliases for its table name. The join performs as if there were two separate
tables. The query asks, “which workers make a bigger salary than their manager?”
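A minimal sketch of the same self join in Python's sqlite3 module (an assumption: SQLite stands in for Databricks). The single table employee_table2 is opened twice under the aliases emps and Mgrs, and the answer matches the two rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table2 (employee_no INT, dept_no INT, "
             "last_name TEXT, first_name TEXT, salary REAL, mgr TEXT)")
conn.executemany("INSERT INTO employee_table2 VALUES (?, ?, ?, ?, ?, ?)", [
    (1232578, 100, "Chambers", "Mandee", 48850.00, "Y"),
    (1256349, 400, "Harrison", "Herbert", 54500.00, "N"),
    (2341218, 400, "Reilly", "William", 36000.00, "Y"),
    (1121334, 400, "Strickling", "Cletus", 54500.00, "N"),
    (2312225, 300, "Larkins", "Loraine", 40200.00, "Y"),
    (2000000, None, "Jones", "Squiggy", 32800.50, "N"),
    (1000234, 10, "Smythe", "Richard", 32800.00, "N"),
    (1324657, 200, "Coffing", "Billy", 41888.88, "N"),
    (1333454, 200, "Smith", "John", 48000.00, "Y"),
])

# One physical table, two aliases: emps supplies workers, Mgrs supplies managers.
rows = conn.execute("""
    SELECT Mgrs.dept_no, Mgrs.last_name AS mgrname, Mgrs.salary AS mgrsal,
           emps.last_name AS empname, emps.salary AS empsal
    FROM employee_table2 AS emps
    INNER JOIN employee_table2 AS Mgrs
      ON emps.dept_no = Mgrs.dept_no
    WHERE Mgrs.mgr = 'Y' AND emps.salary > Mgrs.salary
    ORDER BY empname
""").fetchall()

for row in rows:
    print(row)
# (400, 'Reilly', 36000.0, 'Harrison', 54500.0)
# (400, 'Reilly', 36000.0, 'Strickling', 54500.0)
```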

An Associative Table is a Bridge that Joins Two Tables

course_table
course_id course_name credits seats
100 Database Concepts 3 50
200 Introduction to SQL 3 20
210 Advanced SQL 3 22
220 V2R3 SQL Features 2 25
300 Physical Database Design 4 20
400 Database Administration 4 16

student_course_table (Associative Table)
student_id course_id
280023 210
231222 210
125634 100
231222 220
125634 200
322133 220
125634 220
322133 300
324652 200
333450 500
260000 400
333450 400
234121 100
123250 100

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

The Associative Table is a bridge between the course_table and student_table.


Quiz – Can you Write the 3-Table Join?


Select all columns from the course_table and the student_table, using the associative table to join them.


Answer to Quiz – Can you Write the 3-Table Join?

student_table (student_id, last_name, first_name, class_code, grade_pt)
student_course_table (student_id, course_id)
course_table (course_id, course_name, credits, seats)

SELECT S.*, C.*
FROM student_table as S,
course_table as C,
student_course_table as SC
Where S.student_id = SC.student_id
AND C.course_id = SC.course_id ;

Notice the * technique of getting ALL columns from both tables!

The Associative Table is a bridge between the course_table and student_table, and its sole purpose is to join these
two tables together.
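The sketch below loads the three sample tables into Python's sqlite3 module (an assumption: SQLite stands in for Databricks) and runs the join in both traditional and ANSI form. The associative table has 14 rows, but only 13 come back: the enrollment in course 500 has no course_table row, and Michael Larkins (423400) has no enrollment at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_table (student_id INT, last_name TEXT, first_name TEXT,
                                class_code TEXT, grade_pt REAL);
    CREATE TABLE course_table (course_id INT, course_name TEXT, credits INT, seats INT);
    CREATE TABLE student_course_table (student_id INT, course_id INT);
""")
conn.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", [
    (423400, "Larkins", "Michael", "FR", 0.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90), (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88), (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35), (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00), (123250, "Phillips", "Martin", "SR", 3.00),
])
conn.executemany("INSERT INTO course_table VALUES (?, ?, ?, ?)", [
    (100, "Database Concepts", 3, 50), (200, "Introduction to SQL", 3, 20),
    (210, "Advanced SQL", 3, 22), (220, "V2R3 SQL Features", 2, 25),
    (300, "Physical Database Design", 4, 20), (400, "Database Administration", 4, 16),
])
conn.executemany("INSERT INTO student_course_table VALUES (?, ?)", [
    (280023, 210), (231222, 210), (125634, 100), (231222, 220), (125634, 200),
    (322133, 220), (125634, 220), (322133, 300), (324652, 200), (333450, 500),
    (260000, 400), (333450, 400), (234121, 100), (123250, 100),
])

# Traditional syntax: all tables in FROM, join conditions in WHERE.
traditional = conn.execute("""
    SELECT S.*, C.*
    FROM student_table AS S, course_table AS C, student_course_table AS SC
    WHERE S.student_id = SC.student_id
      AND C.course_id = SC.course_id
""").fetchall()

# ANSI syntax: the associative table bridges the two INNER JOINs.
ansi = conn.execute("""
    SELECT S.*, C.*
    FROM student_table AS S
    INNER JOIN student_course_table AS SC ON S.student_id = SC.student_id
    INNER JOIN course_table AS C ON C.course_id = SC.course_id
""").fetchall()

print(len(traditional), len(ansi))    # 13 13
print(len(ansi[0]))                   # 9 = 5 student columns + 4 course columns
print(set(traditional) == set(ansi))  # True
```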


Quiz – Can you Write the 3-Table Join Using ANSI Syntax?

SELECT S.*, C.*
FROM student_table as S,
course_table as C,
student_course_table as SC
Where S.student_id = SC.student_id
AND C.course_id = SC.course_id ;

Convert this query to ANSI syntax

Please re-write the above query using ANSI Syntax.


Answer – Can you Write the 3-Table Join Using ANSI Syntax?

Traditional Syntax
SELECT S.*, C.*
FROM student_table as S,
course_table as C,
student_course_table as SC
Where S.student_id = SC.student_id
AND C.course_id = SC.course_id ;

ANSI Syntax
Select S.*, C.*
FROM student_table as S
INNER JOIN
student_course_table as SC
ON S.student_id = SC.student_id
INNER JOIN
course_table as C
ON C.course_id = SC.course_id;

Here are two examples of performing the join using traditional vs. ANSI syntax.


Quiz – Can you Place the ON Clauses at the End?


ANSI Syntax
Select S.*, C.*
From student_table as S
INNER JOIN
student_course_table as SC
ON S.student_id = SC.student_id
INNER JOIN
course_table as C
ON C.course_id = SC.course_id;

Please re-write the above query and place both ON Clauses at the end.

Answer – Can you Place the ON Clauses at the End?


Select S.*, C.*
From student_table as S
INNER JOIN
student_course_table as SC
INNER JOIN
course_table as C
ON C.course_id = SC.course_id
ON SC.student_id = S.student_id;

The trick is to put the ON clause for the last join first and go backwards.

The example above is unusual, and most people have never seen this technique. It works only when the ON clauses are placed in reverse order: the first ON clause belongs to the last INNER JOIN, and each following ON clause works backward toward the first join.


The 5-Table Join – Logical Insurance Model

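addresses (subscriber_no)
subscribers (subscriber_no, member_no)
claims (subscriber_no, member_no, claim_service, provider_no)
services (service_code)
providers (provider_code)

addresses.subscriber_no = subscribers.subscriber_no
subscribers.subscriber_no = claims.subscriber_no
subscribers.member_no = claims.member_no
services.service_code = claims.claim_service
providers.provider_code = claims.provider_no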

Above is the logical model for the insurance tables showing the Primary Key and Foreign Key relationships
(PK/FK).


Quiz - Write a Five Table Join Using ANSI Syntax


Your mission is to write a five-table join selecting all columns using ANSI syntax.


Answer - Write a Five Table Join Using ANSI Syntax

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*


FROM claims AS cla1
INNER JOIN
subscribers AS sub1
ON cla1.subscriber_no = sub1.subscriber_no
AND cla1.member_no = sub1.member_no
INNER JOIN
addresses AS add1
ON sub1.subscriber_no = add1.subscriber_no
INNER JOIN
providers AS pro1
ON cla1.provider_no = pro1.provider_code
INNER JOIN
services AS ser1
ON cla1.claim_service = ser1.service_code ;

Above is one way to write this five-table join using ANSI syntax.

Quiz - Write a Five Table Join Using Traditional Syntax


Your mission is to write a five-table join selecting all columns using traditional join syntax.


Answer - Write a Five Table Join Using Non-ANSI Syntax

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*


FROM claims AS cla1,
subscribers AS sub1,
addresses AS add1,
providers AS pro1,
services AS ser1
WHERE cla1.subscriber_no = sub1.subscriber_no
AND cla1.member_no = sub1.member_no
AND sub1.subscriber_no = add1.subscriber_no
AND cla1.provider_no = pro1.provider_code
AND cla1.claim_service = ser1.service_code ;

Above is the same five-table join written with traditional join syntax.


Quiz – Re-Write this putting the ON clauses at the END

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*


FROM claims AS cla1
INNER JOIN
subscribers AS sub1
ON cla1.subscriber_no = sub1.subscriber_no
AND cla1.member_no = sub1.member_no
INNER JOIN
addresses AS add1
ON sub1.subscriber_no = add1.subscriber_no
INNER JOIN
providers AS pro1
ON cla1.provider_no = pro1.provider_code
INNER JOIN
services AS ser1
ON cla1.claim_service = ser1.service_code ;

Above is this five-table join written in ANSI syntax. Can you rewrite it so that all the ON clauses are at the end of the SQL?


Answer – Re-Write this putting the ON clauses at the END

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*


FROM providers AS pro1
INNER JOIN
addresses AS add1
INNER JOIN
subscribers AS sub1
INNER JOIN
services AS ser1
INNER JOIN
claims as cla1
ON cla1.claim_service = ser1.service_code
ON cla1.subscriber_no = sub1.subscriber_no
AND cla1.member_no = sub1.member_no
ON sub1.subscriber_no = add1.subscriber_no
ON cla1.provider_no = pro1.provider_code ;

Above is an example of writing this five-table join using ANSI syntax with the ON clauses at the end. Also, to
make this happen, we moved the tables around. Notice that the first ON clause represents the last two tables
joining, and then it works backward.


Chapter 6 – Date Functions

"An inch of time cannot be bought with an inch of gold."

- Chinese Proverb


Migrate Any Database to Databricks and Vice Versa

Can you imagine a query tool with over 200 ETL utilities built inside that allows anyone to migrate every database to Databricks and migrate Databricks to every database? Watch the Nexus migrate entire databases between systems with a click of the mouse in this YouTube video: https://www.youtube.com/watch?v=0JQn134tzio


Current_Date

SELECT current_date as ansi_date;

ansi_date
2023-07-11

The alias name ansi_date labels the answer set column, and the date is displayed as YYYY-MM-DD (Year, Month, Day).

The current_date function returns today's date.


Current_Date, Current_Timestamp, and Current_Timezone

SELECT
current_date as ansi_date
,curdate() as date_function
,current_timestamp as ansi_timestamp
,current_timezone() as my_timezone;

ansi_date date_function ansi_timestamp my_timezone


2023-07-11 2023-07-11 2023-07-11 6:39:41.097000 Etc/UTC

Use the keywords above to get the date, timestamp, and time zone. current_date, curdate(), current_timestamp, and current_timezone() are built into Databricks, so the date, the timestamp, and the session time zone are always at your fingertips when you need them.


Now() Function

SELECT
current_timestamp as timestamp1
,now() as now1

timestamp1 now1
2023-07-11 6:47:37.485000 2023-07-11 6:47:37.485000

now() is the traditional Databricks equivalent to current_timestamp.

This query retrieves the current timestamp twice and gives each copy an alias. current_timestamp as timestamp1 returns the date and time at which the query is executed. now() as now1 returns the same value, because now() serves the same purpose as current_timestamp, as it does in most databases. Both columns appear in the answer set under their aliases, so you can reference either one in later calculations or for display.


Add or Subtract From a Date

SELECT
current_date as today
,current_date + 1 as tomorrow
,current_date - 1 as yesterday
,current_date + 365 as next_year
,current_date - 365 as last_year

today tomorrow yesterday next_year last_year
2023-07-11 2023-07-12 2023-07-10 2024-07-10 2022-07-11

Add or subtract a number from a date, and you are adding or subtracting days.
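Notice that next_year is 2024-07-10, not 2024-07-11: adding 365 days is not exactly one year when the span includes February 29, 2024. As a sketch, assuming Python's date + timedelta(days=n) behaves like Databricks date + n (both count whole calendar days), the answer set above can be reproduced:

```python
from datetime import date, timedelta

today = date(2023, 7, 11)  # the "today" value in the answer set above

# Adding or subtracting an integer number of days.
tomorrow = today + timedelta(days=1)
yesterday = today - timedelta(days=1)
next_year = today + timedelta(days=365)  # crosses Feb 29, 2024
last_year = today - timedelta(days=365)

print(tomorrow, yesterday, next_year, last_year)
# 2023-07-12 2023-07-10 2024-07-10 2022-07-11
```

When you need the same calendar date a year later, ADD_MONTHS(date, 12), covered later in this chapter, is the safer choice.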


Date Function

SELECT
CURRENT_TIMESTAMP
,CAST (CURRENT_TIMESTAMP as date) as date_ts
,DATE(CURRENT_TIMESTAMP ) as date_fn
,DATE ('2023-12-31') as eoy

current_timestamp() date_ts date_fn eoy


2023-08-10 2:51:11.466000 2023-08-10 2023-08-10 2023-12-31

The date function casts the parameters as a date data type.


To_Date Function

Syntax: to_date(expr [, fmt] )

SELECT
CURRENT_TIMESTAMP
,CAST (CURRENT_TIMESTAMP as date) as date_ts
,TO_DATE(CURRENT_TIMESTAMP ) as date_fn
,TO_DATE ('2023-12-31') as eoy
,TO_DATE(CURRENT_TIMESTAMP, 'yyyy-MM-dd') as formatted;

current_timestamp() date_ts date_fn eoy formatted


2023-08-10 3:13:04.594000 2023-08-10 2023-08-10 2023-12-31 2023-08-10

The to_date function casts its argument to the DATE data type. If the format (fmt) is supplied, it must follow the datetime patterns; if fmt is omitted, to_date is a synonym for cast(expr AS DATE).


To_Timestamp Function

Syntax: to_timestamp(expr [, fmt] )

SELECT
CURRENT_TIMESTAMP
,CURRENT_DATE as today
,TO_TIMESTAMP(CURRENT_DATE ) as timestamp_fn
,TO_TIMESTAMP(CURRENT_DATE, 'yyyy-MM-dd') as formatted;

current_timestamp() today timestamp_fn formatted


2023-08-11 7:05:09.599000 2023-08-11 2023-08-11 12:00:00.000000 2023-08-11 12:00:00.000000

The to_timestamp function casts its argument to the TIMESTAMP data type. If the format (fmt) is supplied, it must conform to the datetime patterns; if fmt is omitted, the function is a synonym for cast(expr AS TIMESTAMP). Converting a DATE this way sets the time portion to midnight, which the answer set above displays as 12:00:00.


Add or Subtract Days From a Date

SELECT order_date
,order_date + 60 as "Due Date"
,order_total
,order_date + 50 as discount
,order_total *.98 as disc_price
FROM order_table
ORDER BY 1 ;

order_date Due Date order_total discount disc_price


2020-05-04 2020-07-03 12347.53 2020-06-23 12100.5794
2021-01-01 2021-03-02 8005.91 2021-02-20 7845.7918
2021-09-09 2021-11-08 23454.84 2021-10-29 22985.7432
2021-10-01 2021-11-30 5111.47 2021-11-20 5009.2406
2021-10-10 2021-12-09 15231.62 2021-11-29 14926.9876

When you add a number to a date or subtract a number from it, you add or subtract that many days. Above, order_date + 60 returns the date 60 days after each order, and order_date + 50 returns the date 50 days after it.


Subtract Two Dates for a Difference in Days

SELECT
order_date
,current_date as today
,current_date - order_date as day_diff
FROM order_table
ORDER BY order_date

order_date today day_diff


2020-05-04 2023-07-11 1163 00:00:00.000000000
2021-01-01 2023-07-11 921 00:00:00.000000000
2021-09-09 2023-07-11 670 00:00:00.000000000
2021-10-01 2023-07-11 648 00:00:00.000000000
2021-10-10 2023-07-11 639 00:00:00.000000000

When you subtract one date from another, you get the exact number of days between them, returned as a day-time interval (which is why each value is followed by a zero time portion).
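As a quick cross-check, assuming Python's date subtraction counts calendar days the same way Databricks does, subtracting each order_date from 2023-07-11 gives exactly the day counts in the day_diff column:

```python
from datetime import date

today = date(2023, 7, 11)  # current_date in the answer set above
order_dates = [date(2020, 5, 4), date(2021, 1, 1), date(2021, 9, 9),
               date(2021, 10, 1), date(2021, 10, 10)]

# date - date yields a timedelta; .days is the exact whole-day difference.
day_diffs = [(today - d).days for d in order_dates]
print(day_diffs)  # [1163, 921, 670, 648, 639]
```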


Subtract Two Dates for a Difference in Days

SELECT
order_date
,current_date as today
,current_date - order_date as day_diff
,now() - order_date as day_and_time_diff
FROM order_table
ORDER BY 1

order_date current_date day_diff day_and_time_diff


2020-05-04 2023-07-11 1163 00:00:00.000000000 1163 18:52:28.389000000
2021-01-01 2023-07-11 921 00:00:00.000000000 921 18:52:28.389000000
2021-09-09 2023-07-11 670 00:00:00.000000000 670 18:52:28.389000000
2021-10-01 2023-07-11 648 00:00:00.000000000 648 18:52:28.389000000
2021-10-10 2023-07-11 639 00:00:00.000000000 639 18:52:28.389000000

Subtracting one date from another returns the exact number of days between them. Subtracting a date from a timestamp, such as now(), also returns the hours, minutes, and seconds of the difference.


MONTHS_BETWEEN

Syntax: MONTHS_BETWEEN ( date1, date2 )

SELECT
order_date
,current_date
,MONTHS_BETWEEN(current_date, order_date) as month_diff
FROM order_table
ORDER BY 1

order_date current_date() month_diff


2020-05-04 2023-07-11 38.23
2021-01-01 2023-07-11 30.32
2021-09-09 2023-07-11 22.06
2021-10-01 2023-07-11 21.32
2021-10-10 2023-07-11 21.03

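The MONTHS_BETWEEN function returns the number of months between two dates; the result is positive when the first argument is the later date. The inputs may be DATE or TIMESTAMP values. When both dates fall on the same day of the month (or both on the last day of a month), the result is a whole number; otherwise, the fractional part is based on a 31-day month. If either input is null, the result is null.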
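To make the 31-day rule concrete, here is a date-only sketch in Python. The function months_between below is a hypothetical helper written for illustration, not a Databricks API; it ignores time of day and rounds to 8 digits. Rounded to two places, it reproduces the month_diff column above (for example, 38 months plus 7/31 of a month is about 38.23).

```python
import calendar
from datetime import date


def months_between(date1: date, date2: date) -> float:
    """Date-only sketch of the MONTHS_BETWEEN rule (hypothetical helper)."""
    month_diff = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    last1 = date1.day == calendar.monthrange(date1.year, date1.month)[1]
    last2 = date2.day == calendar.monthrange(date2.year, date2.month)[1]
    if date1.day == date2.day or (last1 and last2):
        return float(month_diff)  # whole months, no fraction
    # Otherwise the leftover days count as a fraction of a 31-day month.
    return round(month_diff + (date1.day - date2.day) / 31, 8)


today = date(2023, 7, 11)
order_dates = [date(2020, 5, 4), date(2021, 1, 1), date(2021, 9, 9),
               date(2021, 10, 1), date(2021, 10, 10)]
diffs = [months_between(today, d) for d in order_dates]
print([round(x, 2) for x in diffs])  # [38.23, 30.32, 22.06, 21.32, 21.03]
```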


The ADD_MONTHS Command

Order_Table
Order_Number Customer_Number Order_Date Order_Total
123456 11111111 05/04/1998 12347.53
123512 11111111 01/01/1999 8005.91
123552 31323134 10/01/1999 5111.47
123585 87323456 10/10/1999 15231.62
123777 57896883 09/09/1999 23454.84
SELECT Order_Date
,Add_Months (Order_Date,2) as due_date
,Order_Total
FROM Order_Table ORDER BY 1 ;
order_date due_date order_total
2020-05-04 2020-07-04 12347.53
2021-01-01 2021-03-01 8005.91
2021-09-09 2021-11-09 23454.84
2021-10-01 2021-12-01 5111.47
2021-10-10 2021-12-10 15231.62

The example above uses the ADD_MONTHS function to add one or many months to a date column. Can you change it to add one year? There is no ADD_YEAR command!
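The sketch below models ADD_MONTHS in Python under a stated assumption: the day of the month is kept, and when the target month is shorter, the result is clamped to that month's last day (for example, January 31 plus one month gives February 28). The add_months function here is a hypothetical helper for illustration; verify edge cases such as month ends against your own Databricks runtime.

```python
import calendar
from datetime import date


def add_months(start: date, num_months: int) -> date:
    """Hypothetical ADD_MONTHS sketch: shift the month, keep the day,
    and clamp to the last day of the target month when needed."""
    month_index = start.year * 12 + (start.month - 1) + num_months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


print(add_months(date(2020, 5, 4), 2))        # 2020-07-04 (matches due_date above)
print(add_months(date(2021, 1, 31), 1))       # 2021-02-28 (clamped)
print(add_months(date(2020, 2, 29), 12))      # 2021-02-28 (one year, clamped)
print(add_months(date(2021, 10, 10), 12 * 5)) # 2026-10-10 (five years)
```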


Using the ADD_MONTHS Command to Add 1 Year


SELECT order_date
,Add_Months (order_date,12) as Due_Date12
,order_total
FROM order_table ORDER BY 1 ;
order_date Due_Date12 order_total
1998-05-04 1999-05-04 12347.53
1999-01-01 2000-01-01 8005.91
1999-09-09 2000-09-09 23454.84
1999-10-01 2000-10-01 5111.47
1999-10-10 2000-10-10 15231.62

Add one year (12 months).

The ADD_MONTHS function adds months to any date, so adding 12 months, as above, gives one year. Can you add five years?

Using the ADD_MONTHS Command to Add 5 Years


SELECT order_date
,Add_Months (order_date,12 * 5) as Due_5_Years
,order_total
FROM order_table
ORDER BY 1 ;

Add five years: 12 * 5 works, and 60 works as well.
order_date Due_5_Years order_total
2020-05-04 2025-05-04 12347.53
2021-01-01 2026-01-01 8005.91
2021-09-09 2026-09-09 23454.84
2021-10-01 2026-10-01 5111.47
2021-10-10 2026-10-10 15231.62

Multiplying 12 by the number of years, as above, is an easy way to add several years to a date.

The EXTRACT Command

SELECT order_date
,order_total
,EXTRACT (Year from order_date) as yr
,EXTRACT (Day from order_date) as day
FROM order_table
WHERE EXTRACT(Month from order_date) = 9 ;

order_date order_total yr day


2021-09-09 23454.84 2021 9

The EXTRACT function pulls one field, such as the year, month, or day, out of a DATE, TIMESTAMP, or INTERVAL value.


The EXTRACT Command


SELECT order_date
,order_total
FROM order_table
WHERE EXTRACT(Month from order_date) = 9 ;

order_date order_total
2021-09-09 23454.84

EXTRACT also works in the WHERE clause; here it keeps only the orders placed in September.


EXTRACT from DATES and TIME

SELECT current_date
,EXTRACT(Year from current_date) as yr
,EXTRACT(Month from current_date) as mo
,EXTRACT(Day from current_date) as da

Answer Set
EXPR_1 yr mo da
2023-07-11 2023 7 11

The example above uses EXTRACT to split the current date into its year, month, and day.


Day, Month, Year, DayofMonth, DayofWeek, and DayofYear

SELECT
current_date as now
,DAY(current_date) as day
,DAYOFMONTH(current_date) as day2
,DAYOFWEEK(current_date) as dow -- 1 = Sunday
,DAYOFYEAR(current_date) as doy
,MONTH(current_date) as month
,YEAR(current_date) as year

Now day day2 dow doy month year


2023-08-11 11 11 6 223 8 2023

The DAY, DAYOFMONTH, MONTH and YEAR commands are implicit extract statements, but the
DAYOFWEEK and DAYOFYEAR perform a calculation. The DAYOFWEEK returns an integer where one
represents Sunday and two is for Monday.
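The DAYOFWEEK numbering (1 = Sunday) differs from the ISO convention many languages use (1 = Monday). As a sketch in Python, where dayofweek and dayofyear are hypothetical helpers written for illustration, a one-day rotation of isoweekday() reproduces the dow and doy values in the answer set above:

```python
from datetime import date


def dayofweek(d: date) -> int:
    """DAYOFWEEK numbering: 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday() uses 1 = Monday ... 7 = Sunday, so rotate by one day.
    return d.isoweekday() % 7 + 1


def dayofyear(d: date) -> int:
    return d.timetuple().tm_yday


now = date(2023, 8, 11)  # the date in the answer set above (a Friday)
print(now.day, dayofweek(now), dayofyear(now), now.month, now.year)
# 11 6 223 8 2023
```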


Using CASE and Extract to Reformat Dates

SELECT Order_Date AS "Order_Date",


CASE
WHEN EXTRACT(month from Order_Date) < 10 THEN '0' ||
CAST(EXTRACT(month FROM Order_Date) AS CHAR(1))
ELSE
CAST(EXTRACT(month FROM Order_Date) AS CHAR(2))
END
|| '/' ||
CAST(EXTRACT(YEAR FROM Order_Date) AS CHAR(4)) AS "mmyyyy"
FROM Order_Table
ORDER BY 1, 2
Order_Date mmyyyy
2020-05-04 05/2020
2021-01-01 01/2021
2021-09-09 09/2021
2021-10-01 10/2021
2021-10-10 10/2021

Use the EXTRACT function (combined with CASE, CAST, and concatenation) to retrieve the month and year and reformat them as mm/yyyy. The double pipe symbols (||) perform the concatenation.


Using CAST and SUBSTRING to Reformat Dates

SELECT order_date,
SUBSTRING (Cast(order_date as CHAR(10)) FROM 6 for 2)
|| '/' ||
SUBSTRING (CAST(order_date as CHAR(10)) FROM 1 for 4) AS mmyyyy
FROM Order_Table
ORDER BY 1, 2

order_date mmyyyy
2020-05-04 05/2020
2021-01-01 01/2021
2021-09-09 09/2021
2021-10-01 10/2021
2021-10-10 10/2021

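Use CAST, SUBSTRING, and concatenation to retrieve the month and year and reformat them as mm/yyyy. Casting the date to CHAR(10) produces the text YYYY-MM-DD, so characters 6 and 7 hold the month and characters 1 through 4 hold the year. The double pipe symbols (||) perform the concatenation.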
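Both reformatting techniques can be mirrored in a short Python sketch (the two helper names below are hypothetical, for illustration only). One pads single-digit months like the CASE version; the other slices the YYYY-MM-DD text like the SUBSTRING version, remembering that SQL positions start at 1 while Python slices start at 0.

```python
from datetime import date


def mmyyyy_case(d: date) -> str:
    # Mirrors the CASE version: prefix '0' when the month is below 10.
    month = ("0" + str(d.month)) if d.month < 10 else str(d.month)
    return month + "/" + str(d.year)


def mmyyyy_substring(d: date) -> str:
    # Mirrors the SUBSTRING version on the 'YYYY-MM-DD' text form:
    # FROM 6 FOR 2 -> text[5:7], FROM 1 FOR 4 -> text[0:4].
    text = d.isoformat()
    return text[5:7] + "/" + text[0:4]


for d in [date(2020, 5, 4), date(2021, 10, 10)]:
    print(d, mmyyyy_case(d), mmyyyy_substring(d))
# 2020-05-04 05/2020 05/2020
# 2021-10-10 10/2021 10/2021
```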


The Date_Part Function

Syntax: date_part(text, timestamp) or date_part(text, interval)

SELECT current_timestamp
,date_part('day', current_timestamp) as day
,date_part('minute', current_timestamp) as minute
,date_part('second', current_timestamp) as second
,date_part('quarter', current_timestamp) as quarter
,date_part('hour', interval '4 hours 1 minutes') as interval ;

The valid text names are century, day, decade, dow, doy, epoch, hour,
isodow, isoyear, microseconds, millennium, milliseconds, minute, month,
quarter, second, timezone, timezone_hour, timezone_minute, week, year.

current_timestamp() day minute second quarter interval


2023-07-11 7:16:23.678000 11 16 23.678000 3 4

The above examples use the date_part function to get the subfields. The text parameter needs to be a string value,
not a name, so don't forget your single quotes.


Date_Format Function

SELECT
current_date as today
,date_format(current_date, 'y') as yr -- year
,date_format(current_date, 'M') as mo -- month
,date_format(current_date, 'd') as day -- day of the month
,date_format(current_date, 'D') as day_of_yr -- day of the year
,date_format(current_date, 'E') as day_of_wk -- day of the week
,date_format(current_date, 'a') as am_pm -- am or pm
,date_format(current_date, 'q') as qtr -- quarter of the year
,date_format(current_date, 'G') as era -- era BC or AD

today yr mo day day_of_yr day_of_wk am_pm qtr era


2023-08-10 2023 8 10 222 Thu AM 3 AD

The date_format function converts a DATE, a TIMESTAMP, or a STRING in a valid datetime format into a string that follows the pattern you supply.


More Date_Format Examples

SELECT
current_date as today -- default format
,date_format(current_date, 'd-MM-yyyy') as dmy -- dmy format
,date_format(current_date, 'MMM') as mo_3 -- month abbrev
,date_format(current_date, 'MMMM') as mo_full -- month spelled
,date_format(current_date, 'MMMM d') as mo_day -- month and day
,date_format(current_date, 'E MMM d, yyyy') doy_mo_d_y -- full formatting

today dmy mo_3 mo_full mo_day doy_mo_d_y


2023-08-10 10-08-2023 Aug August August 10 Thu Aug 10, 2023

Pattern letters can be combined with literal characters such as dashes, spaces, and commas to build the exact layout you need.


Datediff Example

Syntax: datediff(unit, start, end) or date_diff(unit, start, end)

SELECT
DATEDIFF (month, '2014-01-01', '2023-01-01') as mo_bet
,DATEDIFF (year, '2014-01-01', current_date) as years
,DATEDIFF (quarter, '2014-01-01', current_date) as quarters
,DATEDIFF (hour, '2014-01-01', current_date) as hours
,DATEDIFF (minute, '2014-01-01', current_date) as minutes
,DATEDIFF (second, '2014-01-01', current_date) as seconds

unit: { MICROSECOND | MILLISECOND | SECOND | MINUTE | HOUR | DAY | WEEK | MONTH | QUARTER | YEAR }

mo_bet years quarters hours minutes seconds


108 9 38 84216 5052960 303177600

This function uses a datepart (day, week, month, etc.) and two target expressions. This function returns the
difference between the two expressions. The expressions must be a date or timestamp expressions, and they must
both contain the specified datepart. If the second date is later than the first date, the result is positive. If the second
date is earlier than the first date, the result is negative.
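The figures in the result can be reproduced with Python's datetime module. The sketch below is an illustration under stated assumptions, not Databricks' implementation. whole_months_between and whole_hours_between are hypothetical helpers, and the month rule ignores month-end and time-of-day edge cases. It also assumes current_date was 2023-08-11, the date implied by the hours figure (84216 / 24 = 3509 days after 2014-01-01).

```python
from datetime import date


def whole_months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end; negative when end is earlier.

    Simplified model (an assumption, not Databricks' exact rule): a month
    counts only once end's day-of-month reaches start's day-of-month.
    """
    if end < start:
        return -whole_months_between(end, start)
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:  # the last month is not complete yet
        months -= 1
    return months


def whole_hours_between(start: date, end: date) -> int:
    """For whole dates, each elapsed day contributes exactly 24 hours."""
    return (end - start).days * 24


start = date(2014, 1, 1)
today = date(2023, 8, 11)  # assumption: current_date when the book's query ran
months = whole_months_between(start, today)
print("mo_bet  ", whole_months_between(start, date(2023, 1, 1)))
print("years   ", months // 12)
print("quarters", months // 3)
print("hours   ", whole_hours_between(start, today))
print("seconds ", whole_hours_between(start, today) * 3600)
```

Counting only whole units is why quarters is 38 rather than 38.x: the seven months after 2023-01-01 contain two complete quarters.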


Dateadd

Syntax: date_add(unit, value, expr)

unit { MICROSECOND | MILLISECOND | SECOND | MINUTE | HOUR | DAY | DAYOFYEAR | WEEK | MONTH | QUARTER | YEAR }

SELECT
 order_date
,cast(dateadd(day, 3, order_date) as date) as three_days
,cast(dateadd(week, 1, order_date) as date) as one_week
,cast(dateadd(month, 2, order_date) as date) as two_months
,cast(dateadd(quarter, -2, order_date) as date) as minus_2q
FROM order_table
ORDER BY 1 ;

order_date three_days one_week two_months minus_2q


2020-05-04 2020-05-07 2020-05-11 2020-07-04 2019-11-04
2021-01-01 2021-01-04 2021-01-08 2021-03-01 2020-07-01
2021-09-09 2021-09-12 2021-09-16 2021-11-09 2021-03-09
2021-10-01 2021-10-04 2021-10-08 2021-12-01 2021-04-01
2021-10-10 2021-10-13 2021-10-17 2021-12-10 2021-04-10

The dateadd function adds a number of units to a date or timestamp value (a negative value, as in minus_2q, subtracts) and returns a TIMESTAMP. We cast each result to a date so the answer set shows dates rather than timestamps.
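Here is a Python sketch of the same day, week, month, and quarter arithmetic. dateadd and add_months are hypothetical helpers, not Databricks code. Days and weeks are plain timedelta additions. Months and quarters shift the calendar month and clamp the day to the target month's length. That clamping rule is an assumption modeled on the interval examples later in this chapter, where 2011-01-29 plus one month gives 2011-02-28.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, n: int) -> date:
    """Shift d by n calendar months, clamping the day to the target month's length."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + n, 12)
    month = month_index + 1
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, days_in_month))


def dateadd(unit: str, value: int, d: date) -> date:
    """Hypothetical Python stand-in for dateadd(unit, value, expr) on whole dates."""
    unit = unit.lower()
    if unit == "day":
        return d + timedelta(days=value)
    if unit == "week":
        return d + timedelta(weeks=value)
    if unit == "month":
        return add_months(d, value)
    if unit == "quarter":
        return add_months(d, 3 * value)
    if unit == "year":
        return add_months(d, 12 * value)
    raise ValueError(f"unit not covered by this sketch: {unit}")


order_date = date(2020, 5, 4)
for unit, value in [("day", 3), ("week", 1), ("month", 2), ("quarter", -2)]:
    print(unit, value, dateadd(unit, value, order_date))
```

Working in a single "months since year 0" number and splitting it with divmod keeps the year rollover correct for negative values such as -2 quarters.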


Incrementing Time Values Using the Dateadd Function

SELECT CURRENT_TIMESTAMP
,DATEADD(HOUR,2,CURRENT_TIMESTAMP()) twohours;
current_timestamp() twohours
2023-07-12 4:53:58.743000 2023-07-12 6:53:58.743000

SELECT CURRENT_TIMESTAMP
,DATEADD(MINUTE,2,CURRENT_TIMESTAMP()) twomin;

current_timestamp() twomin
2023-07-12 4:55:30.669000 2023-07-12 4:57:30.669000

SELECT CURRENT_TIMESTAMP
,DATEADD(SECOND,2,CURRENT_TIMESTAMP()) twosec;

current_timestamp() twosec
2023-07-12 4:56:23.067000 2023-07-12 4:56:25.067000

The same dateadd function handles time units as well. Each query above adds 2 hours, 2 minutes, or 2 seconds to the current timestamp.


Date_Sub Function

date_sub(startDate, numDays)

SELECT
current_date
,date_sub(current_date, 1) as yesterday
,date_sub(current_date, 30) as month_ago
,date_sub(current_date, -30) as month_forward

current_date() yesterday month_ago month_forward


2023-08-11 2023-08-10 2023-07-12 2023-09-10

The date_sub function returns the date numDays before startDate. If numDays is negative, abs(numDays) days are added to startDate instead. If the result date overflows the date range, the function raises an error. In our example above, month_forward passes -30 for numDays, so the result date is 30 days after the current date.
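Here is the same day arithmetic in Python, as a sketch outside Databricks. date_sub here is a hypothetical helper, not a Python library function. Subtracting a timedelta built from a negative count moves the date forward, and Python signals an out-of-range result with OverflowError, much as date_sub raises an error on overflow.

```python
from datetime import date, timedelta


def date_sub(start: date, num_days: int) -> date:
    """Hypothetical stand-in for date_sub: a negative num_days moves forward.

    Python raises OverflowError when the result leaves its own date range
    (years 1-9999); Databricks has its own range and error.
    """
    return start - timedelta(days=num_days)


today = date(2023, 8, 11)
print(date_sub(today, 1), date_sub(today, 30), date_sub(today, -30))
```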

The Date_Trunc Function

Syntax: date_trunc(unit, expr)

SELECT
 Date_Trunc('day', current_timestamp) as day
,Date_Trunc('hour', current_timestamp) as hour
,Date_Trunc('minute', current_timestamp) as minute

The valid unit names include 'year', 'quarter', 'month', 'week', 'day', 'hour', 'minute', 'second', 'millisecond', and 'microsecond'.

day hour minute


2023-07-11 00:00:00.000000 2023-07-11 7:00:00.000000 2023-07-11 7:25:00.000000

The examples above use date_trunc to cut a timestamp down to the precision you name. Every field smaller than that unit is reset to its starting value. The unit must be a string literal, not a name, so don't forget your single quotes. The return value is a TIMESTAMP.


Date_Trunc Command With Time

SELECT date_trunc('hour', timestamp '2023-02-16 20:38:40') AS hr

hr
2023-02-16 20:00:00.000000
This sets the minutes and seconds to zeros.

SELECT date_trunc('minute', timestamp '2023-02-16 20:38:40') min

min
2023-02-16 20:38:00.000000
This sets the seconds to zero.

SELECT date_trunc('second', timestamp '2023-02-16 20:38:40.123456') sec

sec
2023-02-16 20:38:40.000000
This sets the microseconds to zero.

The date_trunc function sets the time to the top of the unit you name. Truncating to 'hour' keeps the hour and zeroes the minutes and seconds, truncating to 'minute' zeroes the seconds, and truncating to 'second' zeroes the fractional seconds.


Date_Trunc Command With Dates

SELECT date_trunc('Year', timestamp '2023-02-16 20:38:40') as year

year
2023-01-01 00:00:00.000000
This sets the month and day back to the 1st and the time to midnight (00:00:00.000000).

SELECT date_trunc('Month', timestamp '2023-02-16 20:38:40') mth

mth
2023-02-01 00:00:00.000000
This sets the day back to the 1st and the time to midnight.

SELECT date_trunc('Day', timestamp '2023-02-16 20:38:40') as day

day
2023-02-16 00:00:00.000000
This sets the time back to midnight.

With the unit 'Year', date_trunc returns the first day of the year. With 'Month', it returns the first day of the month. With 'Day', it returns midnight of the same day.
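As a cross-check on these results, this Python sketch truncates a timestamp by resetting every field below the requested unit with datetime.replace. date_trunc here is a hypothetical helper that covers only six units. It is not Databricks code.

```python
from datetime import datetime


def date_trunc(unit: str, ts: datetime) -> datetime:
    """Hypothetical stand-in for date_trunc covering a subset of units."""
    unit = unit.lower()
    if unit == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    if unit == "second":
        return ts.replace(microsecond=0)
    raise ValueError(f"unit not covered by this sketch: {unit}")


ts = datetime(2023, 2, 16, 20, 38, 40, 123456)
for unit in ["Year", "Month", "Day", "hour", "minute", "second"]:
    print(f"{unit:>6}: {date_trunc(unit, ts)}")
```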


Last_Day

Syntax: LAST_DAY(expr)

SELECT Current_Date as today
,last_day(current_date) last_day

today last_day
2023-07-11 2023-07-31

The Last_Day function returns the last day of the month that contains the given date.
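Outside Databricks, Python's calendar.monthrange returns the number of days in a month, which is enough to sketch last_day. The helper name last_day is hypothetical.

```python
import calendar
from datetime import date


def last_day(d: date) -> date:
    """Hypothetical stand-in for LAST_DAY: same year and month, final day."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


print(last_day(date(2023, 7, 11)))
print(last_day(date(2024, 2, 10)))  # leap-year February
```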


Advanced Tricks for Month

The example below displays the dates of the


• current_date
• the first day of the month
• the last day of the month
• the last day of the previous month

SELECT
Current_Date as today
,(Current_Date - EXTRACT(DAY FROM Current_Date)) + 1 as first_day_of_mth
,cast((Current_Date + INTERVAL '1' MONTH) - EXTRACT
(DAY FROM ADD_MONTHS(Current_Date,1)) as date) as last_day_of_mth
,(Current_Date - EXTRACT(DAY FROM Current_Date)) as last_day_prev_mth

today first_day_of_mth last_day_of_mth last_day_prev_mth


2023-07-11 2023-07-01 2023-07-31 2023-06-30

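Subtracting a date's day number from the date lands on the last day of the previous month, and adding 1 to that gives the first day of the current month. For the last day of the current month, the query moves one month ahead and subtracts that later date's day number.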


Clever Tricks for Month

The example below displays the dates of the


• current_date
• the first day of the month
• the last day of the month
• the last day of the previous month
• the first day of the next month

SELECT
Current_Date as today
,(Current_Date - EXTRACT(DAY FROM Current_Date)) + 1 as first_day
, last_day(current_date) as last_day
,(Current_Date - EXTRACT(DAY FROM Current_Date)) as last_day_prev_mth
, last_day(current_date) +1 as first_day_next_mth

today first_day last_day last_day_prev_mth first_day_next_mth


2023-07-11 2023-07-01 2023-07-31 2023-06-30 2023-08-01

Using last_day makes this simpler. It returns the last day of the month directly, and adding 1 to its result gives the first day of the next month.
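The same "subtract the day number" arithmetic can be checked in Python. In this sketch, month_boundaries is a hypothetical helper. For the last day of the month it swaps in a different technique from the book's query: it hops from day 28 forward 4 days, which always lands on day 1 to 4 of the next month, and then subtracts that day number.

```python
from datetime import date, timedelta


def month_boundaries(d: date) -> dict:
    """Mirror the month tricks above with Python date arithmetic (hypothetical helper)."""
    last_day_prev_mth = d - timedelta(days=d.day)        # Current_Date - EXTRACT(DAY ...)
    first_day = last_day_prev_mth + timedelta(days=1)    # ... + 1
    # Day 28 plus 4 days always lands on day 1-4 of the next month.
    in_next_month = first_day.replace(day=28) + timedelta(days=4)
    last_day = in_next_month - timedelta(days=in_next_month.day)
    return {
        "today": d,
        "first_day": first_day,
        "last_day": last_day,
        "last_day_prev_mth": last_day_prev_mth,
        "first_day_next_mth": last_day + timedelta(days=1),  # last_day(...) + 1
    }


for key, value in month_boundaries(date(2023, 7, 11)).items():
    print(key, value)
```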


Make_Date

Syntax: MAKE_DATE(year, month, day)

SELECT make_date(1, 2, 3) as long_time_ago


,make_date(2023, 12, 22) as this_year
,make_date(2024, 10, 15) as next_year

long_time_ago this_year next_year


0001-02-03 2023-12-22 2024-10-15

The Make_Date command will make a DATE value from three integer values representing the year, month, and
day.


Make_Timestamp

Syntax: MAKE_TIMESTAMP(year, month, day, hour, minute, seconds)

SELECT make_timestamp(1, 2, 3, 4, 5, 6) as long_time_ago


,make_timestamp(2023, 12, 22, 10, 09, 00) as this_year

long_time_ago this_year
0001-02-03 4:05:06.000000 2023-12-22 10:09:00.000000

The Make_Timestamp function builds a complete TIMESTAMP value from its date and time parts. The year, month, day, hour, and minute values are integers, and the seconds value can include a fractional part.


Using Day, Month, and Year intervals

SELECT current_date as our_date


,current_date + interval '1' Day as plus_1_day
,current_date + interval '5' Year as plus_5_years

our_date plus_1_day plus_5_years


2023-07-12 2023-07-13 2028-07-12

The examples above add day and year intervals to the current date. A month interval works the same way, as the next example shows.


The Basics of a Simple Interval

SELECT Date '2012-01-29' as our_date


,Date '2012-01-29' + interval '1' Month as leap_year

our_date leap_year
2012-01-29 2012-02-29

SELECT Date '2011-01-29' as our_date


,Date '2011-01-29' + interval '1' Month as no_leap_year

our_date no_leap_year
2011-01-29 2011-02-28

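Adding an interval moves a date forward by that period, and subtracting one moves it back. When the starting day does not exist in the target month, the result is clamped to that month's last day. That is why January 29 plus one month is February 29 in the leap year 2012 but February 28 in 2011.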


Determining if the Current_Date is a Leap Year

A year is a leap year if it is evenly divisible by 400 (no remainder), or if it is divisible by 4 with no remainder and not divisible by 100.

SELECT Current_Date,
CASE
WHEN MOD(CAST(EXTRACT(Year from Current_Date) as integer), 400) = 0
THEN 'Leap Year'
WHEN MOD(CAST(EXTRACT(Year from Current_Date) as integer), 4) = 0
AND MOD(CAST(EXTRACT(Year from Current_Date) as integer), 100) <> 0
THEN 'Leap Year'
ELSE 'Not Leap Year'
END

EXPR_1 EXPR_2
2023-07-12 Not Leap Year

You can use the query above as a template to test whether any date falls in a leap year.
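The CASE expression translates directly into Python. This sketch uses is_leap_year, a hypothetical helper, and also checks the rule against the standard library's calendar.isleap for years 1 through 3000.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Same rule as the CASE expression: divisible by 400, or by 4 but not by 100."""
    if year % 400 == 0:
        return True
    return year % 4 == 0 and year % 100 != 0


# Sanity check against the standard library's own leap-year test.
assert all(is_leap_year(y) == calendar.isleap(y) for y in range(1, 3001))

for year in (1900, 2000, 2023, 2024):
    print(year, "Leap Year" if is_leap_year(year) else "Not Leap Year")
```

The 400 test has to come first: 2000 is divisible by 100, so the second rule alone would wrongly reject it.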


Determining if the Current_Timestamp is a Leap Year

If the year is evenly divisible by 400 (no remainder), it is a leap year. Otherwise, it is a leap year only if it is divisible by 4 with no remainder and not divisible by 100.

SELECT Current_Timestamp,
CASE
WHEN MOD(CAST(EXTRACT(Year from Current_Timestamp) as integer), 400) = 0
THEN 'Leap Year'
WHEN MOD(CAST(EXTRACT(Year from Current_Timestamp) as integer), 4) = 0
AND MOD(CAST(EXTRACT(Year from Current_Timestamp) as integer), 100) <> 0
THEN 'Leap Year'
ELSE 'Not Leap Year'
END

EXPR_1 EXPR_2
2023-07-12 8:43:37.301000 Not Leap Year

You can use the query above to test whether the current_timestamp falls in a leap year.


Make_Interval

Syntax: make_interval(years, months, weeks, days, hours, mins, secs)

SELECT MAKE_INTERVAL(1000, 1, 1, 1, 1, 1, 1) as make_interval

make_interval
1000 years 1 months 8 days 1 hours 1 minutes 1 seconds

The Make_Interval function builds an interval value from a series of values representing years, months, weeks, days, hours, minutes, and seconds. You can specify all or some of these values, and any you leave out default to 0. Weeks are folded into days, which is why 1 week plus 1 day appears as 8 days in the result. The year, month, week, day, hour, and minute values are integers, and the seconds value can include a fractional part.


Try_Divide Function

Syntax: try_divide(dividend, divisor)

SELECT try_divide(3, 2) as num_1


,try_divide(3, 0) as no_error
,try_divide(INTERVAL '3:20' HOUR TO MINUTE, 4) as int_example
,try_divide(2L, 2L) as only_one

num_1 no_error int_example only_one


1.5 ? 0 00:50:00.000000000 1

The try_divide function returns dividend divided by divisor, or NULL (shown as ? in the no_error column) if the divisor is 0, instead of raising a divide-by-zero error. If both dividend and divisor are DECIMAL, the result is DECIMAL. If the dividend is a year-month interval, the result is an INTERVAL YEAR TO MONTH. If the dividend is a day-time interval, the result is an INTERVAL DAY TO SECOND. In all other cases, the result is a DOUBLE.
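In Python, the same NULL-instead-of-error behavior can be sketched with None. The helper try_divide below is hypothetical and only mimics the results above. Python's timedelta stands in for a day-time interval, and Python returns the float 1.0 for 2 / 2, in line with the DOUBLE rule.

```python
from datetime import timedelta


def try_divide(dividend, divisor):
    """Return dividend / divisor, or None (Python's stand-in for NULL) when divisor is 0."""
    if divisor == 0:
        return None
    return dividend / divisor


print(try_divide(3, 2))                               # num_1
print(try_divide(3, 0))                               # no_error
print(try_divide(timedelta(hours=3, minutes=20), 4))  # int_example
print(try_divide(2, 2))                               # only_one
```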


Chapter 7 – Analytic and Window Functions

"Extinction is the rule. Survival is the exception."

- Carl Sagan


Nexus Gives You Databricks Analytics for Free

Nexus collects each answer set in memory and then allows the user to use analytic templates to create the same
analytic reports you can get from Databricks, but with Nexus the analytics are free. Watch the YouTube video:
https://www.youtube.com/watch?v=dMsAghFbYXk&t=16s.


ROW_NUMBER

SELECT product_id, sale_date, daily_sales,
ROW_NUMBER() OVER
(ORDER BY product_id, sale_date) AS seq_number
FROM sales_table WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales seq_number
1000 2000-09-28 48850.40 1
1000 2000-09-29 54500.22 2
1000 2000-09-30 36000.07 3
1000 2000-10-01 40200.43 4
1000 2000-10-02 32800.50 5
1000 2000-10-03 64300.00 6
1000 2000-10-04 54553.10 7
2000 2000-09-28 41888.88 8
2000 2000-09-29 48000.00 9
2000 2000-09-30 49850.03 10
2000 2000-10-01 54850.29 11
(Not all rows are displayed.)

ROW_NUMBER() numbers the rows sequentially in the order set by the ORDER BY inside OVER, which is why seq_number increases 1, 2, 3, and so on. Notice that this does NOT have a Rows Unbounded Preceding, and it still works: ROW_NUMBER() needs no frame clause.


Quiz – How did the Row_Number Reset?

SELECT product_id ,sale_date , daily_sales,


ROW_NUMBER() OVER (PARTITION BY product_id
ORDER BY product_id, sale_date ) AS startover
FROM sales_table WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales startover


1000 2000-09-28 48850.40 1
1000 2000-09-29 54500.22 2
1000 2000-09-30 36000.07 3
1000 2000-10-01 40200.43 4
1000 2000-10-02 32800.50 5
1000 2000-10-03 64300.00 6
1000 2000-10-04 54553.10 7
2000 2000-09-28 41888.88 1
2000 2000-09-29 48000.00 2
2000 2000-09-30 49850.03 3
2000 2000-10-01 54850.29 4
2000 2000-10-02 36021.93 5
2000 2000-10-03 43200.18 6
2000 2000-10-04 32800.50 7

What Keyword(s) caused startover to reset?



Answer – How did the Row_Number Reset?

SELECT product_id ,sale_date , daily_sales,


ROW_NUMBER() OVER (PARTITION BY product_id
ORDER BY product_id, sale_date ) AS startover
FROM sales_table WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales startover


1000 2000-09-28 48850.40 1
1000 2000-09-29 54500.22 2
1000 2000-09-30 36000.07 3
1000 2000-10-01 40200.43 4
1000 2000-10-02 32800.50 5
1000 2000-10-03 64300.00 6
1000 2000-10-04 54553.10 7
2000 2000-09-28 41888.88 1
2000 2000-09-29 48000.00 2
2000 2000-09-30 49850.03 3
2000 2000-10-01 54850.29 4
2000 2000-10-02 36021.93 5
2000 2000-10-03 43200.18 6
2000 2000-10-04 32800.50 7

What keyword(s) caused startover to reset? It is the PARTITION BY clause. ROW_NUMBER starts over at 1 each time product_id changes.



QUALIFY

SELECT product_id ,sale_date , daily_sales,


ROW_NUMBER() OVER (PARTITION BY product_id
ORDER BY product_id, sale_date DESC ) AS startover
FROM SALES_TABLE QUALIFY startover < 4

QUALIFY is to Ordered Analytics what HAVING is to Aggregates

product_id sale_date daily_sales startover


1000 2000-10-04 54553.10 1
1000 2000-10-03 64300.00 2
1000 2000-10-02 32800.50 3
2000 2000-10-04 32800.50 1
2000 2000-10-03 43200.18 2
2000 2000-10-02 36021.93 3
3000 2000-10-04 15675.33 1
3000 2000-10-03 21553.79 2
3000 2000-10-02 19678.94 3

QUALIFY is a filter that is applied only after the window functions have been calculated. QUALIFY is to ordered analytics and window functions what HAVING is to aggregates: both filter after the calculations finish, QUALIFY on window-function results and HAVING on aggregate results. A query that uses QUALIFY must contain at least one window function, either in the SELECT list or in the QUALIFY clause itself.
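To see the order of operations, here is a Python sketch, written as an illustration outside Databricks. It numbers rows within each product_id by sale_date descending, and only afterwards drops the rows numbered 4 or higher, which is the step QUALIFY performs. The sample rows are a subset of the October rows for products 1000 and 2000 shown earlier in this chapter, and row_number_qualify is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter

# Sample (product_id, sale_date, daily_sales) rows copied from earlier answer sets.
rows = [
    (1000, "2000-10-01", 40200.43), (1000, "2000-10-02", 32800.50),
    (1000, "2000-10-03", 64300.00), (1000, "2000-10-04", 54553.10),
    (2000, "2000-10-01", 54850.29), (2000, "2000-10-02", 36021.93),
    (2000, "2000-10-03", 43200.18), (2000, "2000-10-04", 32800.50),
]


def row_number_qualify(rows, limit):
    """Number rows per product_id by sale_date DESC, then keep startover < limit."""
    kept = []
    # groupby only groups adjacent rows, so sort by the partition key first.
    for _, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        newest_first = sorted(group, key=itemgetter(1), reverse=True)
        for startover, row in enumerate(newest_first, start=1):  # ROW_NUMBER()
            if startover < limit:  # applied after numbering, like QUALIFY
                kept.append(row + (startover,))
    return kept


for row in row_number_qualify(rows, 4):
    print(row)
```

ISO-formatted date strings sort in calendar order, so sorting the strings in reverse gives the same order as ORDER BY sale_date DESC.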


Top Two Students Per class_code Using a Derived Table

WITH TeraTom AS
(
SELECT class_code, first_name, last_name , grade_pt,
ROW_NUMBER() OVER (PARTITION BY class_code
ORDER BY class_code, grade_pt DESC ) AS toptwo
FROM student_table
WHERE class_code IS not null
) SELECT * FROM TeraTom WHERE toptwo < 3
class_code first_name last_name grade_pt toptwo
FR Wendy Thomas 4.00 1
FR Henry Hanson 2.88 2
JR Jimmy Bond 3.95 1
JR Richard McRoberts 1.90 2
SO Susie Wilson 3.80 1
SO Andy Smith 2.00 2
SR Danny Delaney 3.35 1
SR Martin Phillips 3.00 2

The example above finds the two students with the highest grade_pt in each class_code. The WITH clause defines the derived table TeraTom, which calculates toptwo. The outer SELECT then keeps only the rows where toptwo is less than 3.


RANK

SELECT product_id, sale_date, daily_sales,
RANK() OVER (ORDER BY daily_sales asc) AS rank1
FROM sales_table
WHERE product_id IN (1000, 2000) ;

What are we ranking? The column in the ORDER BY (daily_sales).

product_id sale_date daily_sales rank1
1000 2000-10-02 32800.50 1
2000 2000-10-04 32800.50 1
1000 2000-09-30 36000.07 3
2000 2000-10-02 36021.93 4
1000 2000-10-01 40200.43 5
2000 2000-09-28 41888.88 6
2000 2000-10-03 43200.18 7
2000 2000-09-29 48000.00 8
1000 2000-09-28 48850.40 9
2000 2000-09-30 49850.03 10
1000 2000-09-29 54500.22 11
1000 2000-10-04 54553.10 12
2000 2000-10-01 54850.29 13
(Not all rows are displayed. The first two rows have equal values.)

The example above uses the RANK function to rank daily_sales in ascending order. The ORDER BY inside OVER identifies the column we are ranking. Notice that the first two rows tie with a rank of 1, so the next row gets a 3.


Dense_Rank

SELECT product_id, sale_date, daily_sales,
DENSE_RANK() OVER (ORDER BY daily_sales) AS dense_rank1
FROM sales_table
WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales dense_rank1
1000 2000-10-02 32800.50 1
2000 2000-10-04 32800.50 1
1000 2000-09-30 36000.07 2
2000 2000-10-02 36021.93 3
1000 2000-10-01 40200.43 4
2000 2000-09-28 41888.88 5
2000 2000-10-03 43200.18 6
2000 2000-09-29 48000.00 7
1000 2000-09-28 48850.40 8
2000 2000-09-30 49850.03 9
1000 2000-09-29 54500.22 10
1000 2000-10-04 54553.10 11
(Not all rows are displayed.)

The example above uses the DENSE_RANK function, ranking daily_sales in ascending order (the default when the ORDER BY specifies neither ASC nor DESC). The difference between RANK and DENSE_RANK is how they handle ties. Notice the first two rows tie with a rank of 1, but the next row ranks as a 2. RANK would have made the third row a 3.

Getting RANK to Sort in DESC Order

SELECT product_id, sale_date, daily_sales,
RANK() OVER (ORDER BY daily_sales DESC) AS rank1
FROM sales_table
WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales rank1
1000 2000-10-03 64300.00 1
2000 2000-10-01 54850.29 2
1000 2000-10-04 54553.10 3
1000 2000-09-29 54500.22 4
2000 2000-09-30 49850.03 5
1000 2000-09-28 48850.40 6
2000 2000-09-29 48000.00 7
2000 2000-10-03 43200.18 8
2000 2000-09-28 41888.88 9
1000 2000-10-01 40200.43 10
2000 2000-10-02 36021.93 11
1000 2000-09-30 36000.07 12
2000 2000-10-04 32800.50 13
1000 2000-10-02 32800.50 13

Adding DESC to the ORDER BY inside OVER ranks in descending order, so the largest daily_sales value gets the ranking of one and the highest daily_sales values return first.


RANK() OVER and PARTITION BY

SELECT product_id, sale_date, daily_sales,
RANK() OVER (PARTITION BY product_id
ORDER BY daily_sales DESC) AS rank1
FROM sales_table;

product_id sale_date daily_sales rank1
1000 2000-10-03 64300.00 1
1000 2000-10-04 54553.10 2
1000 2000-09-29 54500.22 3
1000 2000-09-28 48850.40 4
1000 2000-10-01 40200.43 5
1000 2000-09-30 36000.07 6
1000 2000-10-02 32800.50 7
2000 2000-10-01 54850.29 1
2000 2000-09-30 49850.03 2
2000 2000-09-29 48000.00 3
2000 2000-10-03 43200.18 4
2000 2000-09-28 41888.88 5
2000 2000-10-02 36021.93 6
2000 2000-10-04 32800.50 7

The largest daily_sales value within each product_id gets the ranking of one.

What does the PARTITION BY clause in RANK() OVER do? It resets the rank to 1 each time the product_id changes.

RANK() OVER, PARTITION BY, and QUALIFY

SELECT product_id, sale_date, daily_sales,
RANK() OVER (PARTITION BY product_id
ORDER BY daily_sales DESC) AS rank1
FROM sales_table
WHERE product_id IN (1000, 2000)
QUALIFY rank1 < 4;

product_id sale_date daily_sales rank1
1000 2000-10-03 64300.00 1
1000 2000-10-04 54553.10 2
1000 2000-09-29 54500.22 3
2000 2000-10-01 54850.29 1
2000 2000-09-30 49850.03 2
2000 2000-09-29 48000.00 3

Qualify is a special filter that waits until all rows are calculated before acting like a final WHERE clause.

Above, we rank the daily_sales column and partition on the product_id column, so each rank is calculated within its own product_id. Because we ORDER BY daily_sales DESC, the largest daily_sales in each product_id gets rank 1. The QUALIFY clause at the end waits until the ranks are calculated and then acts like a final WHERE clause on the ordered analytics before the answer set is returned. So, above, we have the three highest daily_sales for each product_id.


RANK() OVER and a Derived Table

SELECT *
FROM
(SELECT product_id, sale_date, daily_sales,
RANK() OVER (PARTITION BY product_id
ORDER BY daily_sales DESC) AS rank1
FROM sales_table) as TeraTom
WHERE rank1 < 3

The derived table is named TeraTom. The outer SELECT * FROM ( ... ) as TeraTom WHERE rank1 < 3 was added around the original ranking query to build it.

product_id sale_date daily_sales rank1


1000 2000-10-03 64300.00 1
1000 2000-10-04 54553.10 2
2000 2000-10-01 54850.29 1
2000 2000-09-30 49850.03 2
3000 2000-09-28 61301.77 1
3000 2000-09-30 43868.86 2

A WHERE clause is applied before window functions are calculated, so you can't use it to filter on rank1 in the same query that calculates it. Once I ran my ranking query to satisfaction, I wrapped it in a derived table so the outer WHERE could keep only the top two ranking daily_sales for each product_id.


RANK() OVER and a WITH Derived Table

With TeraTom AS
(SELECT product_id, sale_date, daily_sales,
RANK() OVER (PARTITION BY product_id
ORDER BY daily_sales DESC) AS rank1
FROM sales_table WHERE product_id IN (1000, 2000))
SELECT *
FROM TeraTom
WHERE rank1 < 4

The derived table is named TeraTom. The WITH TeraTom AS ( ... ) wrapper and the final SELECT * FROM TeraTom WHERE rank1 < 4 were added around the ranking query.

product_id sale_date daily_sales rank1


1000 2000-10-03 64300.00 1
1000 2000-10-04 54553.10 2
1000 2000-09-29 54500.22 3
2000 2000-10-01 54850.29 1
2000 2000-09-30 49850.03 2
2000 2000-09-29 48000.00 3

The WITH clause does the same job as the inline derived table in the previous example. The inner query calculates rank1 first, and the outer WHERE then filters on it, this time keeping the top three ranking daily_sales for each product_id.


RANK vs. DENSE_RANK

SELECT product_id, sale_date, daily_sales,
RANK() OVER (ORDER BY daily_sales) AS Rank1,
DENSE_RANK() OVER (ORDER BY daily_sales) AS Dense
FROM sales_table WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales rank1 dense
1000 2000-10-02 32800.50 1 1
2000 2000-10-04 32800.50 1 1
1000 2000-09-30 36000.07 3 2
2000 2000-10-02 36021.93 4 3
1000 2000-10-01 40200.43 5 4
2000 2000-09-28 41888.88 6 5
2000 2000-10-03 43200.18 7 6
2000 2000-09-29 48000.00 8 7
1000 2000-09-28 48850.40 9 8
2000 2000-09-30 49850.03 10 9
(Not all rows are displayed.)

The only difference between a RANK and a DENSE_RANK is how they handle equal values (ties). The DENSE_RANK will not skip a number when the previous rows tie. Notice how the RANK skips to a 3 when the two previous rows tie with a rank of 1.
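The tie handling can be spelled out in a few lines of Python. The helper rank_and_dense_rank is hypothetical and is an illustration, not how Databricks computes ranks internally. It uses the first five daily_sales values from the table above.

```python
def rank_and_dense_rank(values):
    """Return (value, rank, dense_rank) tuples for values sorted ascending."""
    result = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(sorted(values), start=1):
        if value != previous:
            rank = position  # RANK jumps to the row's position after a tie
            dense += 1       # DENSE_RANK simply moves to the next number
            previous = value
        result.append((value, rank, dense))
    return result


sales = [36000.07, 32800.50, 36021.93, 32800.50, 40200.43]
for row in rank_and_dense_rank(sales):
    print(row)
```

Tied rows keep the current rank and dense rank. The difference appears on the next distinct value, where RANK uses the row's position (3) and DENSE_RANK uses the next number (2).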


DENSE_RANK() OVER and PARTITION BY

WITH TERATOM AS
(SELECT PRODUCT_ID, SALE_DATE, DAILY_SALES,
DENSE_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY DAILY_SALES DESC) AS DRANK1
FROM SALES_TABLE)
SELECT * FROM TERATOM
WHERE DRANK1 < 4 ORDER BY PRODUCT_ID, DRANK1 ;

PRODUCT_ID SALE_DATE DAILY_SALES DRANK1


1000 2023-05-06 12:00:00 64300.00 1
1000 2023-05-07 12:00:00 54553.10 2
1000 2023-05-02 12:00:00 54500.22 3
2000 2023-05-04 12:00:00 54850.29 1
2000 2023-05-03 12:00:00 49850.03 2
2000 2023-05-02 12:00:00 48000.00 3
3000 2023-05-01 12:00:00 61301.77 1
3000 2023-05-03 12:00:00 43868.86 2
3000 2023-05-02 12:00:00 34509.13 3

What does the PARTITION Statement in the DENSE_RANK() OVER do? It resets the Dense_Rank.


PERCENT_RANK() OVER with 14 rows in Calculation

SELECT product_id, sale_date, daily_sales,
PERCENT_RANK()
OVER (ORDER BY daily_sales DESC) AS percentrank1
FROM sales_table WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales percentrank1
1000 2000-10-03 64300.00 0
2000 2000-10-01 54850.29 0.08
1000 2000-10-04 54553.10 0.15
1000 2000-09-29 54500.22 0.23
2000 2000-09-30 49850.03 0.31
1000 2000-09-28 48850.40 0.38
2000 2000-09-29 48000.00 0.46
2000 2000-10-03 43200.18 0.54
2000 2000-09-28 41888.88 0.62
1000 2000-10-01 40200.43 0.69
2000 2000-10-02 36021.93 0.77
1000 2000-09-30 36000.07 0.85
2000 2000-10-04 32800.50 0.92
1000 2000-10-02 32800.50 0.92

There are 14 rows in the calculation, covering both the 1000 and 2000 product_ids.

Percent_Rank finds the relative rank of a row in a group. The formula to get Percent_Rank is (RANK - 1) / (total rows - 1), so here the denominator is 14 - 1 = 13.
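The formula can be checked directly. This Python sketch uses percent_rank_desc, a hypothetical helper. It sorts the 14 daily_sales values from the table above in descending order and gives tied values the rank of their first occurrence, as RANK does. It then applies (RANK - 1) / (rows - 1).

```python
def percent_rank_desc(values):
    """(RANK - 1) / (rows - 1) for values ordered from largest to smallest."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    results = []
    for value in ordered:
        rank = ordered.index(value) + 1  # tied values share the rank of the first one
        results.append((value, (rank - 1) / (n - 1) if n > 1 else 0.0))
    return results


sales_14 = [64300.00, 54850.29, 54553.10, 54500.22, 49850.03, 48850.40, 48000.00,
            43200.18, 41888.88, 40200.43, 36021.93, 36000.07, 32800.50, 32800.50]
for value, pct in percent_rank_desc(sales_14):
    print(f"{value:>10.2f} {pct:.2f}")
```

Passing only one product's seven values gives the partitioned results on a later page, because the denominator becomes 7 - 1 = 6.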


PERCENT_RANK() OVER with 21 rows in Calculation

SELECT product_id, sale_date, daily_sales,
PERCENT_RANK() OVER (ORDER BY daily_sales DESC) AS percent_r
FROM sales_table ;

product_id sale_date daily_sales percent_r
1000 2000-10-03 64300.00 0
3000 2000-09-28 61301.77 0.05
2000 2000-10-01 54850.29 0.1
1000 2000-10-04 54553.10 0.15
1000 2000-09-29 54500.22 0.2
2000 2000-09-30 49850.03 0.25
1000 2000-09-28 48850.40 0.3
2000 2000-09-29 48000.00 0.35
3000 2000-09-30 43868.86 0.4
2000 2000-10-03 43200.18 0.45
2000 2000-09-28 41888.88 0.5
1000 2000-10-01 40200.43 0.55
2000 2000-10-02 36021.93 0.6
1000 2000-09-30 36000.07 0.65
(Not all rows are displayed. There are 21 rows in the calculation, covering all product_ids.)

Percent_Rank is just like RANK, but it expresses the rank as a fraction from 0 to 1. Because the formula divides by the total rows minus 1, this 21-row query divides by 20, while the previous 14-row query divided by 13. That is why the same row gets different values in the two examples: 54850.29 is 0.08 in the previous example but 0.1 here.

PERCENT_RANK() OVER and PARTITION BY

SELECT product_id, sale_date, daily_sales,
PERCENT_RANK() OVER (PARTITION BY product_id
ORDER BY daily_sales DESC) AS percentrank1
FROM sales_table WHERE product_id in (1000, 2000) ;

product_id sale_date daily_sales percentrank1
1000 2000-10-03 64300.00 0
1000 2000-10-04 54553.10 0.17
1000 2000-09-29 54500.22 0.33
1000 2000-09-28 48850.40 0.5
1000 2000-10-01 40200.43 0.67
1000 2000-09-30 36000.07 0.83
1000 2000-10-02 32800.50 1
2000 2000-10-01 54850.29 0
2000 2000-09-30 49850.03 0.17
2000 2000-09-29 48000.00 0.33
2000 2000-10-03 43200.18 0.5
2000 2000-09-28 41888.88 0.67
2000 2000-10-02 36021.93 0.83
2000 2000-10-04 32800.50 1

There are 7 rows in the calculation for each product_id.

We have now added a PARTITION BY clause that resets on product_id, so each percent rank is calculated over the seven rows of its own product_id, and the denominator becomes 7 - 1 = 6.

Cumulative Sum

SELECT product_id, sale_date, daily_sales,
SUM(daily_sales) OVER (ORDER BY sale_date
ROWS UNBOUNDED PRECEDING) AS csumansi
FROM sales_table WHERE product_id IN (1000, 2000);

Start on the 1st row and continue until the end.

product_id sale_date daily_sales csumansi
1000 2000-09-28 48850.40 48850.40
2000 2000-09-28 41888.88 90739.28
1000 2000-09-29 54500.22 145239.50
2000 2000-09-29 48000.00 193239.50
1000 2000-09-30 36000.07 229239.57
2000 2000-09-30 49850.03 279089.60
1000 2000-10-01 40200.43 319290.03
2000 2000-10-01 54850.29 374140.32
1000 2000-10-02 32800.50 406940.82
2000 2000-10-02 36021.93 442962.75
(Not all rows are displayed. On the second row, 48850.40 + 41888.88 = 90739.28, and daily_sales continues to be added up.)

The example above is performing a cumulative sum (CSUM). The query is an ordered analytic: it sorts the data by sale_date, starts with the first row's daily_sales, and adds each following row's daily_sales to the running total until the end.
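A running total is an accumulate over sorted rows. This Python sketch is an illustration outside Databricks. cumulative_sum is a hypothetical helper, the rows are the first five from the table above, and Decimal is used so the totals match the two-decimal values exactly. The product_id tiebreaker in the sort is an assumption added to make the order repeatable, because the SQL above sorts by sale_date only.

```python
from decimal import Decimal
from itertools import accumulate

# (product_id, sale_date, daily_sales) rows from the answer set above.
rows = [
    (1000, "2000-09-28", Decimal("48850.40")),
    (2000, "2000-09-28", Decimal("41888.88")),
    (1000, "2000-09-29", Decimal("54500.22")),
    (2000, "2000-09-29", Decimal("48000.00")),
    (1000, "2000-09-30", Decimal("36000.07")),
]


def cumulative_sum(rows):
    """Like SUM(daily_sales) OVER (ORDER BY sale_date ROWS UNBOUNDED PRECEDING)."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # sale_date, then product_id
    totals = accumulate(r[2] for r in ordered)          # running total, row by row
    return [row + (total,) for row, total in zip(ordered, totals)]


for row in cumulative_sum(rows):
    print(row)
```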


Cumulative Sum with CAST


SELECT product_id, sale_date, daily_sales,
CAST(SUM(daily_sales) OVER (ORDER BY sale_date
ROWS UNBOUNDED PRECEDING) as Decimal(10,3)) AS csumansi
FROM sales_table WHERE product_id IN (1000, 2000) ;

The CAST( ... as Decimal(10,3)) around the SUM is the part added for the CAST command.

product_id sale_date daily_sales csumansi
1000 2000-09-28 48850.40 48850.400
2000 2000-09-28 41888.88 90739.280
1000 2000-09-29 54500.22 145239.500
2000 2000-09-29 48000.00 193239.500
1000 2000-09-30 36000.07 229239.570
2000 2000-09-30 49850.03 279089.600
1000 2000-10-01 40200.43 319290.030
2000 2000-10-01 54850.29 374140.320
1000 2000-10-02 32800.50 406940.820
2000 2000-10-02 36021.93 442962.750

On the second row, 48850.40 + 41888.88 = 90739.280, and daily_sales continues to be added up.

Adding the CAST command converted csumansi to DECIMAL(10,3), so every running total now shows exactly three decimal places. The CAST command converts data types.


Cumulative Sum – The Sort Explained

SELECT product_id, sale_date, daily_sales,
SUM(daily_sales) OVER (ORDER BY sale_date
ROWS UNBOUNDED PRECEDING) AS csumansi
FROM sales_table WHERE product_id IN (1000, 2000) ;

product_id sale_date daily_sales csumansi
1000 2000-09-28 48850.40 48850.40
2000 2000-09-28 41888.88 90739.28
1000 2000-09-29 54500.22 145239.50
2000 2000-09-29 48000.00 193239.50
1000 2000-09-30 36000.07 229239.57
2000 2000-09-30 49850.03 279089.60
1000 2000-10-01 40200.43 319290.03
2000 2000-10-01 54850.29 374140.32
1000 2000-10-02 32800.50 406940.82
2000 2000-10-02 36021.93 442962.75
(Not all rows are displayed.)

Before calculating, the query above first sorts all the rows by sale_date, the column listed right after ORDER BY inside the OVER clause. Because two rows share each sale_date, the order within a date (and therefore the intermediate running totals) is not guaranteed. Adding a tiebreaker, as in ORDER BY sale_date, product_id, makes the result repeatable.


Cumulative Sum – Rows Unbounded Preceding Explained

SELECT product_id, sale_date, daily_sales,
SUM(daily_sales) OVER (ORDER BY sale_date
ROWS UNBOUNDED PRECEDING) AS csumansi
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales csumansi
1000 2000-09-28 48850.40 48850.40
2000 2000-09-28 41888.88 90739.28
1000 2000-09-29 54500.22 145239.50
2000 2000-09-29 48000.00 193239.50
1000 2000-09-30 36000.07 229239.57
2000 2000-09-30 49850.03 279089.60
1000 2000-10-01 40200.43 319290.03
2000 2000-10-01 54850.29 374140.32
1000 2000-10-02 32800.50 406940.82
2000 2000-10-02 36021.93 442962.75
(Not all rows are displayed.)

The keywords ROWS UNBOUNDED PRECEDING are what make this a CSUM. They define the window frame: for each row, the frame starts at the first row and ends at the current row. As a result, each csumansi value is the sum of every daily_sales from the first row through the current one.


Cumulative Sum – Making Sense of the Data

SELECT product_id , sale_date, daily_sales,


SUM(daily_sales) OVER (ORDER BY sale_date
ROWS UNBOUNDED PRECEDING) AS sumover
FROM sales_table WHERE product_id BETWEEN 1000 and 2000 ;

product_id sale_date daily_sales csumansi


1000 2000-09-28 48850.40 48850.40
2000 2000-09-28 41888.88 90739.28
1000 2000-09-29 54500.22 145239.50
Not all
rows 2000 2000-09-29 48000.00 193239.50
are 1000 2000-09-30 36000.07 229239.57
displayed
2000 2000-09-30 49850.03 279089.60
1000 2000-10-01 40200.43 319290.03
2000 2000-10-01 54850.29 374140.32
1000 2000-10-02 32800.50 406940.82
2000 2000-10-02 36021.93 442962.75

The second sumover row is 90739.28: the first row's daily_sales (48850.40) plus the second row's daily_sales (41888.88). Each later row adds its own daily_sales to the running total, through the last row.


Cumulative Sum – Major and Minor Sort Keys

SELECT product_id, sale_date, daily_sales,
    SUM(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS UNBOUNDED PRECEDING) AS sumover
FROM sales_table;

product_id sale_date daily_sales sumover
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 103350.62
1000 2000-09-30 36000.07 139350.69
1000 2000-10-01 40200.43 179551.12
1000 2000-10-02 32800.50 212351.62
1000 2000-10-03 64300.00 276651.62
1000 2000-10-04 54553.10 331204.72
2000 2000-09-28 41888.88 373093.60
2000 2000-09-29 48000.00 421093.60
2000 2000-09-30 49850.03 470943.63
2000 2000-10-01 54850.29 525793.92

You can have more than one sort key. In the query above, product_id is the major sort key and sale_date is the minor sort key, so the rows sort first by product_id and then by sale_date within each product_id.


Reset with a PARTITION BY Statement

SELECT product_id, sale_date, daily_sales,
    SUM(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id, sale_date
        ROWS UNBOUNDED PRECEDING) AS sumansi
FROM sales_table;

(Reset on each product_id break.)

product_id sale_date daily_sales sumansi
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 103350.62
1000 2000-09-30 36000.07 139350.69
1000 2000-10-01 40200.43 179551.12
1000 2000-10-02 32800.50 212351.62
1000 2000-10-03 64300.00 276651.62
1000 2000-10-04 54553.10 331204.72
2000 2000-09-28 41888.88 41888.88
2000 2000-09-29 48000.00 89888.88
2000 2000-09-30 49850.03 139738.91

(When product_id 2000 enters, the CSUM calculation starts over.)

PARTITION BY is how you reset in ANSI SQL. It causes the sumansi column to start over (reset) for each new product_id. Because product_id never changes inside a partition, also listing it in the ORDER BY is harmless but not required.


Totals and Subtotals through Partition By

SELECT PRODUCT_ID AS PROD, SALE_DATE, DAILY_SALES,
    SUM(DAILY_SALES) OVER (ORDER BY PRODUCT_ID, SALE_DATE
        ROWS 2 PRECEDING) AS SUM3,
    SUM(DAILY_SALES) OVER (PARTITION BY PRODUCT_ID
        ORDER BY PRODUCT_ID, SALE_DATE
        ROWS UNBOUNDED PRECEDING) AS RESET_CALC
FROM SALES_TABLE;

(The PARTITION BY is the ANSI reset, much like a GROUP BY.)

PROD SALE_DATE DAILY_SALES SUM3 RESET_CALC
1000 2023-05-01 12:00:00 48850.40 48850.40 48850.40
1000 2023-05-02 12:00:00 54500.22 103350.62 103350.62
1000 2023-05-03 12:00:00 36000.07 139350.69 139350.69
1000 2023-05-04 12:00:00 40200.43 130700.72 179551.12
1000 2023-05-05 12:00:00 32800.50 109001.00 212351.62
1000 2023-05-06 12:00:00 64300.00 137300.93 276651.62
1000 2023-05-07 12:00:00 54553.10 151653.60 331204.72
2000 2023-05-01 12:00:00 41888.88 160741.98 41888.88
2000 2023-05-02 12:00:00 48000.00 144441.98 89888.88
(Not all rows are displayed)

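The query above contains two OLAP (window) functions. Only RESET_CALC has a PARTITION BY, so only it resets when product_id changes; SUM3 keeps its three-row moving window running across the product_id break.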
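
A minimal sketch of the two window functions side by side, assuming SQLite (through Python's sqlite3 module) as a stand-in for Databricks SQL and a hypothetical query() helper. The sample reuses the daily_sales values above but the 2000-09-28 style dates from the other examples in this chapter. Only reset_calc has PARTITION BY, and since product_id is constant inside each partition, ordering by sale_date alone is enough there.

```python
import sqlite3

# Product 1000 plus the first two product 2000 rows, copied from earlier listings.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


TWO_WINDOWS_SQL = """
SELECT product_id, sale_date,
       SUM(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS 2 PRECEDING) AS sum3,
       SUM(daily_sales) OVER (PARTITION BY product_id
                              ORDER BY sale_date
                              ROWS UNBOUNDED PRECEDING) AS reset_calc
FROM sales_table
ORDER BY product_id, sale_date
"""

# Round in Python because REAL values are binary floating point.
report = [(p, d, round(s3, 2), round(rc, 2))
          for p, d, s3, rc in query(TWO_WINDOWS_SQL)]
for row in report:
    print(row)
```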


Moving Sum

SELECT product_id, sale_date, daily_sales,
    SUM(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS sum3_ansi
FROM sales_table;

(Moving window of 3 rows: calculate the current row and the 2 rows preceding it.)

product_id sale_date daily_sales sum3_ansi
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 103350.62
1000 2000-09-30 36000.07 139350.69
1000 2000-10-01 40200.43 130700.72
1000 2000-10-02 32800.50 109001.00
1000 2000-10-03 64300.00 137300.93
1000 2000-10-04 54553.10 151653.60
2000 2000-09-28 41888.88 160741.98
2000 2000-09-29 48000.00 144441.98
(Not all rows are displayed)

SUM() OVER with a ROWS n PRECEDING frame gives you a moving sum of a column. In this shorthand form the window always ends at the current row: ROWS 2 PRECEDING means ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, so each result adds the current row's daily_sales to the two rows before it. Summing daily_sales over a sliding three-row window helps reveal trends.
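
The shorthand can be checked directly. This sketch, again assuming SQLite through Python's sqlite3 module and a hypothetical query() helper over a nine-row sample from the table above, computes the moving sum both with ROWS 2 PRECEDING and with the spelled-out ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.

```python
import sqlite3

# Product 1000 plus the first two product 2000 rows, copied from the table above.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


MOVING_SQL = """
SELECT SUM(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS 2 PRECEDING) AS sum3_short,
       SUM(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum3_long
FROM sales_table
ORDER BY product_id, sale_date
"""

results = query(MOVING_SQL)
sum3 = [round(short, 2) for short, _ in results]
# The two spellings describe the same frame, so the columns should agree.
same_frame = all(round(a, 2) == round(b, 2) for a, b in results)
print(sum3)
print(same_frame)
```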

Moving SUM every 3-rows Vs. a Continuous Sum

SELECT product_id as prod, sale_date, daily_sales,
    SUM(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS sum3,
    SUM(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS UNBOUNDED PRECEDING) AS continuous
FROM sales_table;

prod sale_date daily_sales sum3 continuous
1000 2000-09-28 48850.40 48850.40 48850.40
1000 2000-09-29 54500.22 103350.62 103350.62
1000 2000-09-30 36000.07 139350.69 139350.69
1000 2000-10-01 40200.43 130700.72 179551.12
1000 2000-10-02 32800.50 109001.00 212351.62
1000 2000-10-03 64300.00 137300.93 276651.62
1000 2000-10-04 54553.10 151653.60 331204.72
2000 2000-09-28 41888.88 160741.98 373093.60
2000 2000-09-29 48000.00 144441.98 421093.60
(Not all rows are displayed)

The first ordered analytic function gives a moving sum with a moving window of 3. The second performs a continuous sum, from the first row through the current row.


Partition By Resets the Calculations

SELECT product_id as prod, sale_date, daily_sales,
    SUM(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS sum3,
    SUM(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id, sale_date
        ROWS UNBOUNDED PRECEDING) AS continuous
FROM sales_table;

(The PARTITION BY is the ANSI reset, much like a GROUP BY.)

prod sale_date daily_sales sum3 continuous
1000 2000-09-28 48850.40 48850.40 48850.40
1000 2000-09-29 54500.22 103350.62 103350.62
1000 2000-09-30 36000.07 139350.69 139350.69
1000 2000-10-01 40200.43 130700.72 179551.12
1000 2000-10-02 32800.50 109001.00 212351.62
1000 2000-10-03 64300.00 137300.93 276651.62
1000 2000-10-04 54553.10 151653.60 331204.72
2000 2000-09-28 41888.88 160741.98 41888.88
2000 2000-09-29 48000.00 144441.98 89888.88
(Not all rows are displayed)

The PARTITION BY statement resets the continuous column at each product_id break. The sum3 column has no PARTITION BY, so its moving window continues across the break.


Moving Average

SELECT product_id, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg_3
FROM sales_table;

product_id sale_date daily_sales avg_3
1000 2000-09-28 48850.40 48850.400000
1000 2000-09-29 54500.22 51675.310000
1000 2000-09-30 36000.07 46450.230000
1000 2000-10-01 40200.43 43566.906667
1000 2000-10-02 32800.50 36333.666667
1000 2000-10-03 64300.00 45766.976667
1000 2000-10-04 54553.10 50551.200000
2000 2000-09-28 41888.88 53580.660000
2000 2000-09-29 48000.00 48147.326667
2000 2000-09-30 49850.03 46579.636667
(Not all rows are displayed)

AVG() OVER with a ROWS n PRECEDING frame gives you a moving average of a column. As with the moving sum, the shorthand window always ends at the current row: ROWS 2 PRECEDING averages the current row and the two rows before it. Averaging daily_sales over a sliding three-row window helps reveal trends.


The Moving Window is Current Row and Preceding

SELECT product_id, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg_3
FROM sales_table;

(Moving window of 3 rows: calculate the current row and the 2 rows preceding it.)

product_id sale_date daily_sales avg_3
1000 2000-09-28 48850.40 48850.400000
1000 2000-09-29 54500.22 51675.310000
1000 2000-09-30 36000.07 46450.230000
1000 2000-10-01 40200.43 43566.906667
1000 2000-10-02 32800.50 36333.666667
1000 2000-10-03 64300.00 45766.976667
1000 2000-10-04 54553.10 50551.200000
2000 2000-09-28 41888.88 53580.660000
2000 2000-09-29 48000.00 48147.326667
2000 2000-09-30 49850.03 46579.636667
(Not all rows are displayed)

The example above calculates a moving average over three rows at a time. The first row averages only itself because no rows precede it, and the second row averages two rows. From the third row to the end, each row averages the current row and the previous two rows. A moving average helps you spot trends: when the business did well and when it did not.
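
To see how many rows each average actually covers, this sketch (assuming SQLite via Python's sqlite3 module, a hypothetical query() helper, and the seven product 1000 rows above) places a COUNT(*) with the same ROWS 2 PRECEDING frame next to the AVG. The rows_in_window column is illustrative and not part of the original query.

```python
import sqlite3

# The seven product 1000 rows from the table above.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


AVG_SQL = """
SELECT sale_date,
       AVG(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS 2 PRECEDING) AS avg_3,
       COUNT(*)         OVER (ORDER BY product_id, sale_date
                              ROWS 2 PRECEDING) AS rows_in_window
FROM sales_table
ORDER BY product_id, sale_date
"""

results = query(AVG_SQL)
for sale_date, avg_3, n in results:
    # The first two windows are partial (1 and 2 rows); the rest hold 3 rows.
    print(sale_date, round(avg_3, 6), n)
```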


How Moving Average Handles the Order By

SELECT product_id, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg_3
FROM sales_table;

(product_id and sale_date are the major and minor sort keys.)

product_id sale_date daily_sales avg_3
1000 2000-09-28 48850.40 48850.400000
1000 2000-09-29 54500.22 51675.310000
1000 2000-09-30 36000.07 46450.230000
1000 2000-10-01 40200.43 43566.906667
1000 2000-10-02 32800.50 36333.666667
1000 2000-10-03 64300.00 45766.976667
1000 2000-10-04 54553.10 50551.200000
2000 2000-09-28 41888.88 53580.660000
2000 2000-09-29 48000.00 48147.326667
2000 2000-09-30 49850.03 46579.636667
2000 2000-10-01 54850.29 50900.106667
2000 2000-10-02 36021.93 46907.416667
(Not all rows are displayed)

The moving average in the example above sorts the data by product_id (the major sort key) and then by sale_date (the minor sort key). Only then are the averages calculated.


Quiz – How is that Total Calculated?

SELECT product_id, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg_3
FROM sales_table;

product_id sale_date daily_sales avg_3
1000 2000-09-28 48850.40 48850.400000
1000 2000-09-29 54500.22 51675.310000
1000 2000-09-30 36000.07 46450.230000
1000 2000-10-01 40200.43 43566.906667
1000 2000-10-02 32800.50 36333.666667
1000 2000-10-03 64300.00 45766.976667
1000 2000-10-04 54553.10 50551.200000
2000 2000-09-28 41888.88 53580.660000
2000 2000-09-29 48000.00 48147.326667
2000 2000-09-30 49850.03 46579.636667
2000 2000-10-01 54850.29 50900.106667
2000 2000-10-02 36021.93 46907.416667
(Not all rows are displayed)

With a moving window of 3, how is the 46450.23 amount in the avg_3 column of the third row derived?


Answer to Quiz – How is that Total Calculated?

SELECT product_id, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg_3
FROM sales_table;

product_id sale_date daily_sales avg_3
1000 2000-09-28 48850.40 48850.400000
1000 2000-09-29 54500.22 51675.310000
1000 2000-09-30 36000.07 46450.230000
1000 2000-10-01 40200.43 43566.906667
1000 2000-10-02 32800.50 36333.666667
1000 2000-10-03 64300.00 45766.976667
1000 2000-10-04 54553.10 50551.200000
2000 2000-09-28 41888.88 53580.660000
2000 2000-09-29 48000.00 48147.326667
2000 2000-09-30 49850.03 46579.636667
2000 2000-10-01 54850.29 50900.106667
2000 2000-10-02 36021.93 46907.416667
(Not all rows are displayed)

With a moving window of 3, the 46450.23 in the third row is the average of 48850.40, 54500.22, and 36000.07: 139350.69 / 3 = 46450.23.

Quiz – How is that 4th Row Calculated?

SELECT product_id, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg_3
FROM sales_table;

product_id sale_date daily_sales avg_3
1000 2000-09-28 48850.40 48850.400000
1000 2000-09-29 54500.22 51675.310000
1000 2000-09-30 36000.07 46450.230000
1000 2000-10-01 40200.43 43566.906667
1000 2000-10-02 32800.50 36333.666667
1000 2000-10-03 64300.00 45766.976667
1000 2000-10-04 54553.10 50551.200000
2000 2000-09-28 41888.88 53580.660000
2000 2000-09-29 48000.00 48147.326667
2000 2000-09-30 49850.03 46579.636667
2000 2000-10-01 54850.29 50900.106667
2000 2000-10-02 36021.93 46907.416667
(Not all rows are displayed)

With a Moving Window of 3, how is the 43566.91 amount derived in the avg_3 column in the fourth row?


Answer to Quiz – How is that 4th Row Calculated?

With a moving window of 3, the 43566.91 in the fourth row is the average of 54500.22, 36000.07, and 40200.43: 130700.72 / 3 = 43566.91 (rounded). The window holds the current row and the previous two rows, so the first row's 48850.40 has dropped out.
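
The arithmetic behind both quiz answers can be checked without a database. In this minimal plain-Python sketch, the hypothetical moving_avg() function mimics ROWS 2 PRECEDING by averaging each value with up to two values before it.

```python
# daily_sales for product 1000, in product_id, sale_date order (from the table above).
daily_sales = [48850.40, 54500.22, 36000.07, 40200.43]


def moving_avg(values, preceding=2):
    """Average each value with up to `preceding` earlier values (like ROWS n PRECEDING)."""
    averages = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]  # current row plus earlier rows
        averages.append(sum(window) / len(window))
    return averages


avgs = moving_avg(daily_sales)
print(round(avgs[2], 2))  # third row: 46450.23
print(round(avgs[3], 2))  # fourth row: 43566.91
```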


Moving Average every 3-rows Vs. a Continuous Average

SELECT product_id as prod, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg3,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS UNBOUNDED PRECEDING) AS continuous
FROM sales_table;

prod sale_date daily_sales avg3 continuous
1000 2000-09-28 48850.40 48850.400000 48850.400000
1000 2000-09-29 54500.22 51675.310000 51675.310000
1000 2000-09-30 36000.07 46450.230000 46450.230000
1000 2000-10-01 40200.43 43566.906667 44887.780000
1000 2000-10-02 32800.50 36333.666667 42470.324000
1000 2000-10-03 64300.00 45766.976667 46108.603333
1000 2000-10-04 54553.10 50551.200000 47314.960000
2000 2000-09-28 41888.88 53580.660000 46636.700000
2000 2000-09-29 48000.00 48147.326667 46788.177778
(Not all rows are displayed)

The first ordered analytic function gives a moving AVG with a moving window of 3. The second performs a continuous average, from the first row through the current row.
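
A continuous average is the running sum divided by the running row count. This sketch, assuming SQLite via Python's sqlite3 module, a hypothetical query() helper, and the nine rows shown above, computes it both ways so you can see that the two columns agree.

```python
import sqlite3

# Product 1000 plus the first two product 2000 rows, copied from the table above.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


CONT_SQL = """
SELECT AVG(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS UNBOUNDED PRECEDING) AS continuous,
       SUM(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS UNBOUNDED PRECEDING)
       / COUNT(*)       OVER (ORDER BY product_id, sale_date
                              ROWS UNBOUNDED PRECEDING) AS sum_div_count
FROM sales_table
ORDER BY product_id, sale_date
"""

results = query(CONT_SQL)
continuous = [round(avg, 6) for avg, _ in results]
# Running SUM / running COUNT should reproduce the running AVG.
matches = all(round(a, 6) == round(b, 6) for a, b in results)
print(continuous)
print(matches)
```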


The Partition By Statement

SELECT product_id, sale_date, daily_sales,
    AVG(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id, sale_date ROWS 2 PRECEDING) AS avg_3
FROM sales_table;

(Reset on product_id breaks. Moving window of 3 rows: calculate the current row and the 2 rows preceding it.)

product_id sale_date daily_sales avg_3
1000 2000-09-28 48850.40 48850.400000
1000 2000-09-29 54500.22 51675.310000
1000 2000-09-30 36000.07 46450.230000
1000 2000-10-01 40200.43 43566.906667
1000 2000-10-02 32800.50 36333.666667
1000 2000-10-03 64300.00 45766.976667
1000 2000-10-04 54553.10 50551.200000
2000 2000-09-28 41888.88 41888.880000
2000 2000-09-29 48000.00 44944.440000
2000 2000-09-30 49850.03 46579.636667
(Not all rows are displayed)

The example above calculates a moving average over three rows at a time, and the PARTITION BY statement resets the calculation at each product_id break. Within each product_id, the first row averages only itself, the second row averages two rows, and every later row averages the current row and the previous two rows. That is why the first product_id 2000 row shows 41888.88 and the second shows 44944.44.

Partition By Resets an ANSI OLAP

SELECT product_id as prod, sale_date, daily_sales,
    AVG(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS 2 PRECEDING) AS avg3,
    AVG(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id, sale_date
        ROWS UNBOUNDED PRECEDING) AS continuous
FROM sales_table;

(The PARTITION BY is the ANSI reset, much like a GROUP BY.)

prod sale_date daily_sales avg3 continuous
1000 2000-09-28 48850.40 48850.400000 48850.400000
1000 2000-09-29 54500.22 51675.310000 51675.310000
1000 2000-09-30 36000.07 46450.230000 46450.230000
1000 2000-10-01 40200.43 43566.906667 44887.780000
1000 2000-10-02 32800.50 36333.666667 42470.324000
1000 2000-10-03 64300.00 45766.976667 46108.603333
1000 2000-10-04 54553.10 50551.200000 47314.960000
2000 2000-09-28 41888.88 53580.660000 41888.880000
2000 2000-09-29 48000.00 48147.326667 44944.440000
(Not all rows are displayed)

Use a PARTITION BY statement to reset an ANSI OLAP function. PARTITION BY resets only the function in which it appears: notice that the continuous column resets when product_id 2000 begins, but avg3 does not.


Moving Difference

SELECT product_id, sale_date, daily_sales,
    daily_sales - SUM(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS BETWEEN 4 PRECEDING AND 4 PRECEDING) AS mdiff_ansi
FROM sales_table;

product_id sale_date daily_sales mdiff_ansi
1000 2000-09-28 48850.40 ?
1000 2000-09-29 54500.22 ?
1000 2000-09-30 36000.07 ?
1000 2000-10-01 40200.43 ?
1000 2000-10-02 32800.50 -16049.90
1000 2000-10-03 64300.00 9799.78
1000 2000-10-04 54553.10 18553.03
2000 2000-09-28 41888.88 1688.45
2000 2000-09-29 48000.00 15199.50
2000 2000-09-30 49850.03 -14449.97
2000 2000-10-01 54850.29 297.19

(Each difference compares two rows that are four rows apart. For example, the difference between the daily_sales values on row 1 and row 5 is -16049.90.)

The example above is a moving difference. Only two rows are compared at a time: the current row and the row four rows before it. The fifth row is compared with the first row (32800.50 - 48850.40 = -16049.90), the sixth row with the second, and so on. The first four rows have no row four positions earlier, so their result is null (shown as ?).
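
A sketch of the same moving difference, assuming SQLite via Python's sqlite3 module, a hypothetical query() helper, and the eleven rows shown above. For the first four rows the frame 4 PRECEDING AND 4 PRECEDING is empty, so SUM returns NULL and the subtraction yields NULL (None in Python), matching the ? in the listing.

```python
import sqlite3

# The eleven rows shown above (product 1000 and the first four product 2000 rows).
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


DIFF_SQL = """
SELECT daily_sales
       - SUM(daily_sales) OVER (ORDER BY product_id, sale_date
                                ROWS BETWEEN 4 PRECEDING AND 4 PRECEDING)
       AS mdiff_ansi
FROM sales_table
ORDER BY product_id, sale_date
"""

# An empty frame makes SUM return NULL, and daily_sales - NULL is NULL (None).
mdiff = [None if d is None else round(d, 2) for (d,) in query(DIFF_SQL)]
print(mdiff)
```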


Moving Difference with Partition By

SELECT product_id, sale_date, daily_sales,
    daily_sales - SUM(daily_sales) OVER (PARTITION BY product_id
        ORDER BY sale_date ASC
        ROWS BETWEEN 2 PRECEDING AND 2 PRECEDING) AS mdiff_ansi
FROM sales_table;

(PARTITION BY resets the calculation by product_id.)

product_id sale_date daily_sales mdiff_ansi
1000 2000-09-28 48850.40 ?
1000 2000-09-29 54500.22 ?
1000 2000-09-30 36000.07 -12850.33
1000 2000-10-01 40200.43 -14299.79
1000 2000-10-02 32800.50 -3199.57
1000 2000-10-03 64300.00 24099.57
1000 2000-10-04 54553.10 21752.60
2000 2000-09-28 41888.88 ?
2000 2000-09-29 48000.00 ?
2000 2000-09-30 49850.03 7961.15
2000 2000-10-01 54850.29 6850.29
2000 2000-10-02 36021.93 -13828.10

The moving difference query above compares each row with the row two rows before it, and its PARTITION BY statement resets the calculation at every product_id break. That is why the first two rows of each product_id return null (?).
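
The partitioned two-row version can be sketched the same way, again assuming SQLite via Python's sqlite3 module, a hypothetical query() helper, and the twelve rows shown above. Ordering by sale_date alone is enough inside each product_id partition.

```python
import sqlite3

# The twelve rows shown above.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
    (2000, "2000-10-02", 36021.93),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


PART_DIFF_SQL = """
SELECT product_id,
       daily_sales
       - SUM(daily_sales) OVER (PARTITION BY product_id
                                ORDER BY sale_date
                                ROWS BETWEEN 2 PRECEDING AND 2 PRECEDING)
       AS mdiff_ansi
FROM sales_table
ORDER BY product_id, sale_date
"""

# The first two rows of each partition have an empty frame, hence None.
mdiff = [(p, None if d is None else round(d, 2))
         for p, d in query(PART_DIFF_SQL)]
for row in mdiff:
    print(row)
```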

Moving Difference with Partition By

SELECT product_id, sale_date, daily_sales,
    daily_sales - SUM(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id ASC, sale_date ASC
        ROWS BETWEEN 4 PRECEDING AND 4 PRECEDING) AS mdiff_ansi
FROM sales_table;

product_id sale_date daily_sales mdiff_ansi
1000 2000-09-28 48850.40 ?
1000 2000-09-29 54500.22 ?
1000 2000-09-30 36000.07 ?
1000 2000-10-01 40200.43 ?
1000 2000-10-02 32800.50 -16049.90
1000 2000-10-03 64300.00 9799.78
1000 2000-10-04 54553.10 18553.03
2000 2000-09-28 41888.88 ?
2000 2000-09-29 48000.00 ?
2000 2000-09-30 49850.03 ?
2000 2000-10-01 54850.29 ?
2000 2000-10-02 36021.93 -5866.95
(Not all rows are displayed)

(48850.40 compares with 32800.50, four rows later, to show a change of -16049.90. Likewise, 41888.88 compares with 36021.93 to show a change of -5866.95.)

The moving difference query above compares each row with the row four rows before it, and its PARTITION BY statement resets the calculation at every product_id break, so the first four rows of each product_id return null (?).

Finding a Value of a Column in the Next Row with MIN

SELECT product_id, sale_date, daily_sales,
    MIN(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS nextsale
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales nextsale
1000 2000-09-28 48850.40 54500.22
1000 2000-09-29 54500.22 36000.07
1000 2000-09-30 36000.07 40200.43
1000 2000-10-01 40200.43 32800.50
1000 2000-10-02 32800.50 64300.00
1000 2000-10-03 64300.00 54553.10
1000 2000-10-04 54553.10 41888.88
2000 2000-09-28 41888.88 48000.00
2000 2000-09-29 48000.00 49850.03
2000 2000-09-30 49850.03 54850.29
2000 2000-10-01 54850.29 36021.93
(Not all rows are displayed)

The above example finds the value of daily_sales in the next row. The keywords ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING make the window contain exactly one row, the next one, so MIN and MAX return the same value and can be used interchangeably.

Finding a Next Row Value with MIN and PARTITION BY

SELECT product_id, sale_date, daily_sales,
    MIN(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id, sale_date
        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS nextsale
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales nextsale
1000 2000-09-28 48850.40 54500.22
1000 2000-09-29 54500.22 36000.07
1000 2000-09-30 36000.07 40200.43
1000 2000-10-01 40200.43 32800.50
1000 2000-10-02 32800.50 64300.00
1000 2000-10-03 64300.00 54553.10
1000 2000-10-04 54553.10 ?
2000 2000-09-28 41888.88 48000.00
2000 2000-09-29 48000.00 49850.03
2000 2000-09-30 49850.03 54850.29
2000 2000-10-01 54850.29 36021.93
(Not all rows are displayed)

The above example finds the value of daily_sales in the next row. You can use MIN or MAX interchangeably here, because the window ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING holds only the next row. Notice how PARTITION BY resets at each product_id break: the last row of product_id 1000 has no following row in its partition, so nextsale is null (?).
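
This sketch compares the unpartitioned and partitioned next-row lookups in one query and confirms that MIN and MAX agree on a one-row frame. It assumes SQLite via Python's sqlite3 module, a hypothetical query() helper, and eleven rows from the listings above; next_max and next_in_product are illustrative column names.

```python
import sqlite3

# Product 1000 plus the first four product 2000 rows, copied from the listings above.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


NEXT_SQL = """
SELECT product_id, sale_date,
       MIN(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS next_min,
       MAX(daily_sales) OVER (ORDER BY product_id, sale_date
                              ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS next_max,
       MIN(daily_sales) OVER (PARTITION BY product_id
                              ORDER BY sale_date
                              ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS next_in_product
FROM sales_table
ORDER BY product_id, sale_date
"""

results = query(NEXT_SQL)
for row in results:
    print(row)
```

Row 7 (the last product 1000 row) is the interesting one: without PARTITION BY it picks up 41888.88 from product 2000, while next_in_product is None.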


Finding The Next Date using MAX

SELECT product_id, sale_date, daily_sales,
    MAX(sale_date) OVER (ORDER BY product_id, sale_date
        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS nextdate
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales nextdate
1000 2000-09-28 48850.40 2000-09-29
1000 2000-09-29 54500.22 2000-09-30
1000 2000-09-30 36000.07 2000-10-01
1000 2000-10-01 40200.43 2000-10-02
1000 2000-10-02 32800.50 2000-10-03
1000 2000-10-03 64300.00 2000-10-04
1000 2000-10-04 54553.10 2000-09-28
2000 2000-09-28 41888.88 2000-09-29
2000 2000-09-29 48000.00 2000-09-30
2000 2000-09-30 49850.03 2000-10-01
2000 2000-10-01 54850.29 2000-10-02
(Not all rows are displayed)

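The above example finds the value of sale_date in the next row. Because the window holds only that one row, MIN and MAX can be used interchangeably for the next date as well. The keywords are ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING. With no PARTITION BY, the last product_id 1000 row picks up the first product_id 2000 date (2000-09-28).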


Finding Multiple Values of a Column in Upcoming Rows

SELECT product_id as Prod, sale_date, daily_sales,
    MIN(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS nextsale,
    SUM(daily_sales) OVER (ORDER BY product_id, sale_date
        ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING) AS rows_2_down
FROM sales_table WHERE product_id IN (1000);

prod sale_date daily_sales nextsale rows_2_down
1000 2000-09-28 48850.40 54500.22 36000.07
1000 2000-09-29 54500.22 36000.07 40200.43
1000 2000-09-30 36000.07 40200.43 32800.50
1000 2000-10-01 40200.43 32800.50 64300.00
1000 2000-10-02 32800.50 64300.00 54553.10
1000 2000-10-03 64300.00 54553.10 ?
1000 2000-10-04 54553.10 ? ?

The above example finds the values of daily_sales in the upcoming rows. The keywords ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING deliver the next row's daily_sales, and ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING deliver the daily_sales value two rows down. Because each window holds a single row, SUM returns that row's value just as MIN does. The final rows return null (?) because no row exists that far ahead.


COUNT OVER for a Sequential Number

SELECT product_id, sale_date, daily_sales,
    COUNT(*) OVER (ORDER BY product_id, sale_date)
        AS seq_number
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales seq_number
1000 2000-09-28 48850.40 1
1000 2000-09-29 54500.22 2
1000 2000-09-30 36000.07 3
1000 2000-10-01 40200.43 4
1000 2000-10-02 32800.50 5
1000 2000-10-03 64300.00 6
1000 2000-10-04 54553.10 7
2000 2000-09-28 41888.88 8
2000 2000-09-29 48000.00 9
2000 2000-09-30 49850.03 10
2000 2000-10-01 54850.29 11
(Not all rows are displayed)

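The example above uses COUNT OVER to provide a sequential number starting at 1; the count grows by one with each row until there are no more rows. You do not need ROWS UNBOUNDED PRECEDING: when an ORDER BY is present and no frame is given, the default frame runs from the first row through the current row. Strictly, that default is a RANGE frame, so rows that tie on every ORDER BY column would share the same number; product_id and sale_date are unique here, so no ties occur.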
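
The effect of the default frame can be sketched as follows, assuming SQLite via Python's sqlite3 module, a hypothetical query() helper, and eleven rows from the listing above. In SQLite, an ORDER BY with no frame clause means RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes rows that tie on the ORDER BY columns; as far as we know Databricks SQL uses the same default, but check its documentation. The tied_default column is an illustrative extra that orders by product_id alone, so every row of a product ties.

```python
import sqlite3

# Product 1000 plus the first four product 2000 rows, copied from the listing above.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


COUNT_SQL = """
SELECT COUNT(*) OVER (ORDER BY product_id, sale_date) AS seq_default,
       COUNT(*) OVER (ORDER BY product_id, sale_date
                      ROWS UNBOUNDED PRECEDING)      AS seq_rows,
       COUNT(*) OVER (ORDER BY product_id)           AS tied_default
FROM sales_table
ORDER BY product_id, sale_date
"""

results = query(COUNT_SQL)
seq_default = [r[0] for r in results]   # unique sort keys: 1, 2, 3, ...
seq_rows = [r[1] for r in results]      # explicit ROWS frame: same numbers
tied_default = [r[2] for r in results]  # RANGE frame counts all tied peers
print(seq_default)
print(tied_default)
```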


COUNT OVER using ROWS UNBOUNDED PRECEDING

SELECT product_id, sale_date, daily_sales,
    COUNT(*) OVER (ORDER BY product_id, sale_date
        ROWS UNBOUNDED PRECEDING) AS seq_number
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales seq_number
1000 2000-09-28 48850.40 1
1000 2000-09-29 54500.22 2
1000 2000-09-30 36000.07 3
1000 2000-10-01 40200.43 4
1000 2000-10-02 32800.50 5
1000 2000-10-03 64300.00 6
1000 2000-10-04 54553.10 7
2000 2000-09-28 41888.88 8
2000 2000-09-29 48000.00 9
2000 2000-09-30 49850.03 10
2000 2000-10-01 54850.29 11
(Not all rows are displayed)

The example above is the COUNT OVER with an explicit frame. It also provides a sequential number starting at 1, adding one for each row until there are no more rows. The keywords ROWS UNBOUNDED PRECEDING are not necessary, but they do not cause an error if present.


The MAX OVER Command

SELECT product_id, sale_date, daily_sales,
    MAX(daily_sales) OVER (ORDER BY product_id, sale_date) AS maxover
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales maxover
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 54500.22
1000 2000-09-30 36000.07 54500.22
1000 2000-10-01 40200.43 54500.22
1000 2000-10-02 32800.50 54500.22
1000 2000-10-03 64300.00 64300.00
1000 2000-10-04 54553.10 64300.00
2000 2000-09-28 41888.88 64300.00
2000 2000-09-29 48000.00 64300.00
2000 2000-09-30 49850.03 64300.00
2000 2000-10-01 54850.29 64300.00
2000 2000-10-02 36021.93 64300.00
2000 2000-10-03 43200.18 64300.00
2000 2000-10-04 32800.50 64300.00

After the sort, MAX() OVER shows the largest daily_sales value seen up to that row. Each time a row exceeds that maximum, its value becomes the new maxover.

MAX OVER with PARTITION BY Reset

SELECT product_id, sale_date, daily_sales,
    MAX(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id, sale_date) AS maxover
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales maxover
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 54500.22
1000 2000-09-30 36000.07 54500.22
1000 2000-10-01 40200.43 54500.22
1000 2000-10-02 32800.50 54500.22
1000 2000-10-03 64300.00 64300.00
1000 2000-10-04 54553.10 64300.00
2000 2000-09-28 41888.88 41888.88
2000 2000-09-29 48000.00 48000.00
2000 2000-09-30 49850.03 49850.03
2000 2000-10-01 54850.29 54850.29
(Not all rows are displayed)

The largest value in the maxover column is 64300.00. Once 64300.00 arrives, it remains the maximum until the product_id breaks, where the PARTITION BY statement resets the calculation.
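
Running maximums with and without a reset can be sketched like this, assuming SQLite via Python's sqlite3 module, a hypothetical query() helper, and the fourteen rows shown in the MAX and MIN listings. The same pattern works for MIN.

```python
import sqlite3

# All fourteen product 1000 and 2000 rows from the MAX/MIN listings.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
    (2000, "2000-10-02", 36021.93), (2000, "2000-10-03", 43200.18),
    (2000, "2000-10-04", 32800.50),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


MAX_SQL = """
SELECT MAX(daily_sales) OVER (ORDER BY product_id, sale_date) AS maxover,
       MAX(daily_sales) OVER (PARTITION BY product_id
                              ORDER BY sale_date) AS maxover_reset
FROM sales_table
ORDER BY product_id, sale_date
"""

results = query(MAX_SQL)
maxover = [m for m, _ in results]        # never resets
maxover_reset = [r for _, r in results]  # restarts at product_id 2000
print(maxover)
print(maxover_reset)
```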


The MIN OVER Command

SELECT product_id, sale_date, daily_sales
    ,MIN(daily_sales) OVER (ORDER BY product_id, sale_date) minover
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales minover
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 48850.40
1000 2000-09-30 36000.07 36000.07
1000 2000-10-01 40200.43 36000.07
1000 2000-10-02 32800.50 32800.50
1000 2000-10-03 64300.00 32800.50
1000 2000-10-04 54553.10 32800.50
2000 2000-09-28 41888.88 32800.50
2000 2000-09-29 48000.00 32800.50
2000 2000-09-30 49850.03 32800.50
2000 2000-10-01 54850.29 32800.50
2000 2000-10-02 36021.93 32800.50
2000 2000-10-03 43200.18 32800.50
2000 2000-10-04 32800.50 32800.50

After the sort, MIN() OVER shows the smallest daily_sales value seen up to that row. Each time a new minimum arrives, it appears in the minover column.


The MIN OVER Command with PARTITION BY

SELECT product_id, sale_date, daily_sales
    ,MIN(daily_sales) OVER (PARTITION BY product_id
        ORDER BY product_id, sale_date) AS minover
FROM sales_table WHERE product_id IN (1000, 2000);

product_id sale_date daily_sales minover
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 48850.40
1000 2000-09-30 36000.07 36000.07
1000 2000-10-01 40200.43 36000.07
1000 2000-10-02 32800.50 32800.50
1000 2000-10-03 64300.00 32800.50
1000 2000-10-04 54553.10 32800.50
2000 2000-09-28 41888.88 41888.88
2000 2000-09-29 48000.00 41888.88
2000 2000-09-30 49850.03 41888.88
2000 2000-10-01 54850.29 41888.88
2000 2000-10-02 36021.93 36021.93
2000 2000-10-03 43200.18 36021.93
2000 2000-10-04 32800.50 32800.50

The MIN calculation resets and starts over with each product_id break. Partition By causes analytics to reset.

Different Windowing Options

SELECT product_id, sale_date, daily_sales
    ,CAST(SUM(daily_sales)
        OVER (PARTITION BY product_id ORDER BY product_id, sale_date
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as Decimal(10,2))
        as row_preceding
    ,CAST(SUM(daily_sales)
        OVER (PARTITION BY product_id ORDER BY product_id, sale_date
        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) as Decimal(10,2))
        as row_following
FROM sales_table WHERE product_id = 1000;

product_id sale_date daily_sales row_preceding row_following
1000 2000-09-28 48850.40 48850.40 103350.62
1000 2000-09-29 54500.22 103350.62 90500.29
1000 2000-09-30 36000.07 90500.29 76200.50
1000 2000-10-01 40200.43 76200.50 73000.93
1000 2000-10-02 32800.50 73000.93 97100.50
1000 2000-10-03 64300.00 97100.50 118853.10
1000 2000-10-04 54553.10 118853.10 54553.10

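The example above uses ROWS BETWEEN 1 PRECEDING AND CURRENT ROW for row_preceding, and ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING for row_following. Notice how the report came out: each row_following value equals the next row's row_preceding value, because both frames add the same pair of adjacent rows. Only the edges differ: the first row_preceding and the last row_following each cover a single row.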
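
A sketch of the two frames, assuming SQLite via Python's sqlite3 module, a hypothetical query() helper, and the seven product 1000 rows. SQLite has no fixed-point DECIMAL type, so the sketch rounds in Python instead of using CAST(... as Decimal(10,2)).

```python
import sqlite3

# The seven product 1000 rows from the report above.
SALES = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
]


def query(sql):
    """Hypothetical helper: load SALES into an in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_table "
                "(product_id INTEGER, sale_date TEXT, daily_sales REAL)")
    con.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", SALES)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


WINDOWS_SQL = """
SELECT SUM(daily_sales) OVER (PARTITION BY product_id ORDER BY sale_date
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS row_preceding,
       SUM(daily_sales) OVER (PARTITION BY product_id ORDER BY sale_date
           ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS row_following
FROM sales_table
WHERE product_id = 1000
ORDER BY sale_date
"""

results = [(round(p, 2), round(f, 2)) for p, f in query(WINDOWS_SQL)]
preceding = [p for p, _ in results]
following = [f for _, f in results]
# Each row_following equals the next row's row_preceding (same pair of rows).
shifted = following[:-1] == preceding[1:]
print(preceding)
print(following)
print(shifted)
```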


How Ntile Works

SELECT product_id, sale_date, daily_sales
    ,NTILE(4) OVER (ORDER BY daily_sales ASC) AS quartiles
FROM sales_table WHERE product_id = 1000;

(The expression 4 provides four buckets.)

product_id sale_date daily_sales quartiles
1000 2020-10-02 32800.50 1
1000 2020-09-30 36000.07 1
1000 2020-10-01 40200.43 2
1000 2020-09-28 48850.40 2
1000 2020-09-29 54500.22 3
1000 2020-10-04 54553.10 3
1000 2020-10-03 64300.00 4
1000 2020-10-05 70000.00 4

The argument to NTILE sets the number of buckets (tiles), so changing it changes how many groups are created. Buckets are numbered from 1 up to the number specified; with NTILE(4), the buckets are 1 through 4. The rows are then distributed as evenly as possible in the order of the OVER clause's ORDER BY. Here daily_sales is ascending, so the lowest values land in bucket 1. When the rows do not divide evenly, the extra rows go to the lowest-numbered buckets, one each.
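
A minimal sketch of NTILE bucket assignment, assuming SQLite via Python's sqlite3 module and a hypothetical ntile() helper. It uses the eight daily_sales values above, plus ten synthetic values (1 through 10) to show what happens when the rows do not divide evenly.

```python
import sqlite3


def ntile(values, buckets):
    """Hypothetical helper: NTILE(buckets) for each value, ordered ascending."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (v REAL)")
    con.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    sql = f"SELECT NTILE({int(buckets)}) OVER (ORDER BY v ASC) FROM t ORDER BY v ASC"
    tiles = [tile for (tile,) in con.execute(sql)]
    con.close()
    return tiles


# The eight product 1000 daily_sales values from the NTILE(4) example above.
sales = [32800.50, 36000.07, 40200.43, 48850.40,
         54500.22, 54553.10, 64300.00, 70000.00]
print(ntile(sales, 4))          # 8 rows: two per bucket
print(ntile(range(1, 11), 4))   # 10 rows: buckets 1 and 2 get one extra row each
```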


Ntile in DESC Mode

SELECT last_name, grade_pt,
    NTILE(5) OVER (ORDER BY grade_pt DESC NULLS LAST) AS bucket
FROM student_table;

(The expression 5 provides five buckets. NULLS FIRST or NULLS LAST lets you specify where nulls sort.)

last_name grade_pt bucket
Thomas 4.00 1
Bond 3.95 1
Wilson 3.80 2
Delaney 3.35 2
Phillips 3.00 3
Hanson 2.88 3
Smith 2.00 4
McRoberts 1.90 4
Larkins 0.00 5
Johnson ? 5

The Ntile function organizes rows into n groups, called tiles, and returns each row's tile number in the answer set. The example above has ten rows, so NTILE(5) splits them into five equally sized tiles of two rows each, in the order of the OVER clause's ORDER BY. NULLS LAST places Johnson's null grade_pt in the last tile.


Ntile

SELECT last_name, grade_pt,
    NTILE(5) OVER (ORDER BY grade_pt) AS tile
FROM student_table
ORDER BY tile DESC;

last_name grade_pt tile
Bond 3.95 5
Thomas 4.00 5
Delaney 3.35 4
Wilson 3.80 4
Hanson 2.88 3
Phillips 3.00 3
McRoberts 1.90 2
Smith 2.00 2
Johnson ? 1
Larkins 0.00 1

Here NTILE(5) orders grade_pt ascending, so the lowest grades land in tile 1 and the highest in tile 5, two rows per tile. In Databricks SQL, NULLs sort first in ascending order by default, which is why Johnson's missing grade_pt shares tile 1 with Larkins. The final ORDER BY tile DESC only sorts the output. It does not change the tile numbers.

Ntile Continued

SELECT dept_no, employeecount,
NTILE(2) OVER (ORDER BY employeecount) as tile
FROM (SELECT dept_no, COUNT(*) as employeecount
FROM employee_table
GROUP BY dept_no
) AS Q
ORDER BY 3 DESC;
dept_no employeecount tile
100 1 2
200 2 2
400 3 2
? 1 1
10 1 1
300 1 1

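The Ntile function organizes rows into n groups, and each row's tile number is returned in the answer set. The example above has six rows, so NTILE(2) splits them into two equally sized tiles of three rows each, in the order of the OVER() clause's ORDER BY. Notice that dept_no 100 has the same employeecount (1) as the three departments in tile 1, yet it lands in tile 2. NTILE assigns tiles by row position, so tied values can be split across tiles. Which tied row crosses the boundary is not guaranteed unless the ORDER BY includes a tiebreaker.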
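One way to make the split repeatable is to add a tiebreaker to the window's ORDER BY. The sketch below runs NTILE(2) over the six department counts using Python's built-in sqlite3 module. SQLite is only a convenient local stand-in and needs version 3.25 or later for window functions. The table name q simply mirrors the derived table above. Like Databricks, SQLite sorts NULLs first in ascending order.

```python
import sqlite3

# Department head counts from the derived table Q above; None stands for the NULL dept_no.
dept_counts = [(None, 1), (10, 1), (100, 1), (300, 1), (200, 2), (400, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q (dept_no INTEGER, employeecount INTEGER)")
conn.executemany("INSERT INTO q VALUES (?, ?)", dept_counts)

# dept_no breaks the tie among the four departments with a count of 1,
# so the tile assignment no longer depends on how tied rows happen to be ordered.
rows = conn.execute(
    """
    SELECT dept_no, employeecount,
           NTILE(2) OVER (ORDER BY employeecount, dept_no) AS tile
    FROM q
    """
).fetchall()
conn.close()

tile_by_dept = {dept: tile for dept, _, tile in rows}
print(tile_by_dept)
```

With dept_no as the tiebreaker, the NULL department, 10 and 100 always fill tile 1. Department 300 then joins 200 and 400 in tile 2.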


Ntile Percentile

SELECT claim_id, claim_date, claimcount,
NTILE(100) OVER (ORDER BY claimcount) as percentile
FROM (SELECT claim_id, claim_date, COUNT(*) as claimcount
FROM claims
GROUP BY claim_id, claim_date
) AS Q
ORDER BY percentile DESC;
The subquery aliased Q is a derived table.

claim_id claim_date claimcount percentile
1302111 2003-03-01 4 26
4307444 2003-07-05 3 25
3306333 2003-06-28 3 24
3308333 2003-08-01 2 23
1302111 2003-03-02 2 22
4306444 2003-06-03 2 21
4306444 2003-06-02 2 20
3402222 2004-02-28 2 19
1302111 2003-02-28 2 18
Not all rows are displayed.

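The Ntile function organizes rows into n groups, so NTILE(100) is a way to assign percentiles. Notice, however, that the highest tile shown is 26, not 100. When there are fewer rows than buckets, each row gets its own bucket. The tile numbers therefore simply count the rows from 1 to 26, and buckets 27 through 100 stay empty. Only with at least 100 rows does every one of the 100 buckets receive a row.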
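The effect is easy to reproduce. The sketch below uses Python's sqlite3 module as a local stand-in for Databricks (SQLite 3.25 or later). It fills a hypothetical table q with 26 made-up rows, the same row count implied by the top tile above, and applies NTILE(100).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q (claim_id INTEGER, claimcount INTEGER)")
# Hypothetical data: 26 rows, the row count implied by the highest tile (26) above.
conn.executemany(
    "INSERT INTO q VALUES (?, ?)",
    [(claim_id, claim_id % 4 + 1) for claim_id in range(1, 27)],
)

tiles = [
    tile
    for (tile,) in conn.execute(
        """
        SELECT NTILE(100) OVER (ORDER BY claimcount, claim_id) AS percentile
        FROM q
        ORDER BY percentile
        """
    )
]
conn.close()

# Fewer rows than buckets: each row gets its own bucket, so tiles run 1..26, not 1..100.
print(tiles[0], tiles[-1])  # 1 26
```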


Another Ntile Example

This example assigns a percentile to every row in sales_table with a product_id below 2000,
based on the daily sales amount. It sorts the rows by the value being categorized,
which here is daily_sales.

SELECT product_id, sale_date, daily_sales
,NTILE(100) OVER (ORDER BY daily_sales) AS quantile
FROM sales_table
WHERE product_id < 2000 ;
product_id sale_date daily_sales quantile
1000 2000-10-02 32800.50 1
1000 2000-09-30 36000.07 2
1000 2000-10-01 40200.43 3
1000 2000-09-28 48850.40 4
1000 2000-09-29 54500.22 5
1000 2000-10-04 54553.10 6
1000 2000-10-03 64300.00 7

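Above is another Ntile example. Only the seven rows for product 1000 qualify. With fewer rows than buckets, each row again receives its own bucket, numbered 1 through 7 in ascending daily_sales order.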


Using Quantiles (Partitions of Four)

SELECT product_id, sale_date, daily_sales
,NTILE (4) OVER (Order by daily_sales , sale_date ) AS quartiles
FROM sales_table WHERE product_id in (1000, 2000) ;
product_id sale_date daily_sales quartiles
1000 2000-10-02 32800.50 1
2000 2000-10-04 32800.50 1
1000 2000-09-30 36000.07 1
2000 2000-10-02 36021.93 1
1000 2000-10-01 40200.43 2
2000 2000-09-28 41888.88 2
2000 2000-10-03 43200.18 2
2000 2000-09-29 48000.00 2
1000 2000-09-28 48850.40 3
2000 2000-09-30 49850.03 3
1000 2000-09-29 54500.22 3
1000 2000-10-04 54553.10 4
2000 2000-10-01 54850.29 4
1000 2000-10-03 64300.00 4

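Instead of 100 buckets, the example above uses NTILE(4) to produce quartiles. The 14 rows do not divide evenly by four, so buckets 1 and 2 receive four rows each and buckets 3 and 4 receive three. Adding sale_date to the ORDER BY breaks the tie between the two 32800.50 sales. As a result, their order, and therefore their bucket assignment, does not depend on how tied rows happen to be sorted.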


NTILE With a Partition


SELECT product_id ,sale_date , daily_sales,
NTILE(3) OVER (PARTITION BY product_id ORDER BY daily_sales) bucket
FROM sales_table WHERE product_id IN (1000, 2000) ;
PARTITION BY product_id calculates within each product_id.
product_id sale_date daily_sales bucket
1000 2000-10-02 32800.50 1
1000 2000-09-30 36000.07 1
1000 2000-10-01 40200.43 1
1000 2000-09-28 48850.40 2
1000 2000-09-29 54500.22 2
1000 2000-10-04 54553.10 3
1000 2000-10-03 64300.00 3
2000 2000-10-04 32800.50 1
2000 2000-10-02 36021.93 1
2000 2000-09-28 41888.88 1
2000 2000-10-03 43200.18 2
2000 2000-09-29 48000.00 2
2000 2000-09-30 49850.03 3
2000 2000-10-01 54850.29 3

The NTILE() function divides the rows into buckets as evenly as possible. Because PARTITION BY is listed, the rows are first grouped by product_id and then sorted by the ORDER BY clause within each product_id. Each group is then divided into the number of buckets specified, here 3. Each product has seven rows, which do not divide evenly by three, so bucket 1 gets three rows and buckets 2 and 3 get two each. Notice that the PARTITION BY clause makes the bucket numbering restart at 1 when the product_id changes from 1000 to 2000.
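PARTITION BY can be mimicked by grouping the rows, sorting each group, and splitting each group separately. The Python sketch below does that for the product 1000 and 2000 rows shown above. The functions ntile_sizes and ntile_by_partition are illustrative helpers of our own, not Databricks APIs.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# (product_id, daily_sales) pairs from the answer set above.
sales = [
    (1000, 48850.40), (1000, 54500.22), (1000, 36000.07), (1000, 40200.43),
    (1000, 32800.50), (1000, 64300.00), (1000, 54553.10),
    (2000, 41888.88), (2000, 48000.00), (2000, 49850.03), (2000, 54850.29),
    (2000, 36021.93), (2000, 43200.18), (2000, 32800.50),
]


def ntile_sizes(num_rows: int, num_buckets: int) -> List[int]:
    """Rows per bucket; lower-numbered buckets take one extra row each."""
    base, extra = divmod(num_rows, num_buckets)
    return [base + 1 if b <= extra else base for b in range(1, num_buckets + 1)]


def ntile_by_partition(
    rows: List[Tuple[int, float]], num_buckets: int
) -> Dict[int, List[Tuple[float, int]]]:
    """Mimic NTILE(n) OVER (PARTITION BY product_id ORDER BY daily_sales)."""
    result: Dict[int, List[Tuple[float, int]]] = {}
    # Sort by the partition key first, then by the window's ORDER BY column.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for product_id, group in groupby(ordered, key=lambda r: r[0]):
        values = [amount for _, amount in group]
        buckets: List[int] = []
        for bucket, size in enumerate(ntile_sizes(len(values), num_buckets), start=1):
            buckets.extend([bucket] * size)
        # Bucket numbering restarts at 1 for every product_id.
        result[product_id] = list(zip(values, buckets))
    return result


for product_id, pairs in ntile_by_partition(sales, 3).items():
    print(product_id, [bucket for _, bucket in pairs])  # [1, 1, 1, 2, 2, 3, 3]
```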

NTILE With a Qualify Statement

SELECT product_id ,sale_date , daily_sales,
NTILE(3) OVER (PARTITION BY product_id
ORDER BY daily_sales DESC) AS bucket_one
FROM sales_table
Qualify bucket_one = 1
ORDER BY product_id, daily_sales DESC
product_id sale_date daily_sales bucket_one
1000 2000-10-03 64300.00 1
1000 2000-10-04 54553.10 1
1000 2000-09-29 54500.22 1
2000 2000-10-01 54850.29 1
2000 2000-09-30 49850.03 1
2000 2000-09-29 48000.00 1
3000 2000-09-28 61301.77 1
3000 2000-09-30 43868.86 1
3000 2000-09-29 34509.13 1

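The QUALIFY clause filters on the results of window functions, much as WHERE filters individual rows and HAVING filters groups. It is applied after the window functions are calculated. The example above returns only the rows from each product_id that were placed in the first bucket. Because of ORDER BY daily_sales DESC, these are each product's three highest sales.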
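Engines without QUALIFY get the same result by filtering the window result in an outer query. The sketch below runs that form in SQLite through Python's sqlite3 module (SQLite 3.25 or later; SQLite has no QUALIFY clause). It loads only the product 1000 and 2000 rows shown in this chapter, so product 3000 is absent from its output.

```python
import sqlite3

# Rows for products 1000 and 2000 from the answer sets in this chapter.
sales = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
    (2000, "2000-10-02", 36021.93), (2000, "2000-10-03", 43200.18),
    (2000, "2000-10-04", 32800.50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", sales)

# The inner query computes the window function; the outer WHERE plays the role of QUALIFY.
top_bucket = conn.execute(
    """
    SELECT product_id, sale_date, daily_sales
    FROM (
        SELECT product_id, sale_date, daily_sales,
               NTILE(3) OVER (PARTITION BY product_id
                              ORDER BY daily_sales DESC) AS bucket_one
        FROM sales_table
    ) AS t
    WHERE bucket_one = 1
    ORDER BY product_id, daily_sales DESC
    """
).fetchall()
conn.close()

for row in top_bucket:
    print(row)
```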


Using FIRST_VALUE

SELECT last_name, first_name, dept_no
,FIRST_VALUE(first_name)
OVER (ORDER BY dept_no nulls last, last_name desc
rows unbounded preceding) AS first_all
,FIRST_VALUE(first_name) OVER (PARTITION BY dept_no
ORDER BY dept_no nulls last, last_name desc
rows unbounded preceding) AS first_partition
FROM employee_table;
last_name first_name dept_no first_all first_partition
Smythe Richard 10 Richard Richard
Chambers Mandee 100 Richard Mandee
Smith John 200 Richard John
Coffing Billy 200 Richard John
Larkins Loraine 300 Richard Loraine
Strickling Cletus 400 Richard Cletus
Reilly William 400 Richard Cletus
Harrison Herbert 400 Richard Cletus
Jones Squiggy ? Richard Squiggy

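The example above uses FIRST_VALUE to return the first first_name in the whole answer set, sorted by dept_no (nulls last) and then last_name descending. That value is Richard on every row. The second FIRST_VALUE adds PARTITION BY dept_no, so it returns the first first_name within each department instead (for example, John for dept_no 200 and Cletus for dept_no 400).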

FIRST_VALUE

SELECT product_id ,sale_date , daily_sales,
First_Value (daily_sales) OVER (ORDER BY sale_date) as first_val
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales first_val
1000 2000-09-28 48850.40 48850.40
2000 2000-09-28 41888.88 48850.40
1000 2000-09-29 54500.22 48850.40
2000 2000-09-29 48000.00 48850.40
1000 2000-09-30 36000.07 48850.40
2000 2000-09-30 49850.03 48850.40
1000 2000-10-01 40200.43 48850.40
2000 2000-10-01 54850.29 48850.40
1000 2000-10-02 32800.50 48850.40
2000 2000-10-02 36021.93 48850.40
1000 2000-10-03 64300.00 48850.40
2000 2000-10-03 43200.18 48850.40
1000 2000-10-04 54553.10 48850.40
2000 2000-10-04 32800.50 48850.40
After the data sorts by sale_date from the ORDER BY statement, the first row has a daily_sales value of 48850.40. The first_value repeats on every row.

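Above, after the data is sorted by sale_date, FIRST_VALUE takes the daily_sales of the first row and repeats it on every row. Both products have a sale on 2000-09-28, so which of those two tied rows counts as "first" is not guaranteed. Adding product_id to the ORDER BY would make the result deterministic. This seems simple enough, but watch us build on it to make first_value more relevant.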

FIRST_VALUE With Partitioning

SELECT product_id ,sale_date , daily_sales,
First_Value (daily_sales) OVER
(PARTITION BY product_id ORDER BY sale_date) as first_val_p
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales first_val_p
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 48850.40
1000 2000-09-30 36000.07 48850.40
1000 2000-10-01 40200.43 48850.40
1000 2000-10-02 32800.50 48850.40
1000 2000-10-03 64300.00 48850.40
1000 2000-10-04 54553.10 48850.40
2000 2000-09-28 41888.88 41888.88
2000 2000-09-29 48000.00 41888.88
2000 2000-09-30 49850.03 41888.88
2000 2000-10-01 54850.29 41888.88
2000 2000-10-02 36021.93 41888.88
2000 2000-10-03 43200.18 41888.88
2000 2000-10-04 32800.50 41888.88
We partition by product_id, and we sort by sale_date within product_id. This is because of the PARTITION BY and ORDER BY statements. The first row is the value for first_value, which resets with each product_id break.

Above, PARTITION BY product_id confines the calculation to each product_id, and ORDER BY sale_date sorts the rows within each product. FIRST_VALUE returns the daily_sales of the first row in each product_id: 48850.40 for product 1000 and 41888.88 for product 2000. Our next example will show a great way to use the first_value function.


Daily_Sales Minus FIRST_VALUE With Partitioning

SELECT product_id ,sale_date , daily_sales,
daily_sales - First_Value (daily_sales) OVER
(PARTITION BY product_id ORDER BY sale_date) as compare_first
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales compare_first
1000 2000-09-28 48850.40 0.00
1000 2000-09-29 54500.22 5649.82
1000 2000-09-30 36000.07 -12850.33
1000 2000-10-01 40200.43 -8649.97
1000 2000-10-02 32800.50 -16049.90
1000 2000-10-03 64300.00 15449.60
1000 2000-10-04 54553.10 5702.70
2000 2000-09-28 41888.88 0.00
2000 2000-09-29 48000.00 6111.12
2000 2000-09-30 49850.03 7961.15
2000 2000-10-01 54850.29 12961.41
2000 2000-10-02 36021.93 -5866.95
2000 2000-10-03 43200.18 1311.30
2000 2000-10-04 32800.50 -9088.38
We partition by product_id, and we sort by sale_date within product_id. We subtract the first_value from the current row's daily_sales. We can then see if the following rows did better or worse than the sale on the first day. This shows trends!

Above, within each product_id the rows are sorted by sale_date, and FIRST_VALUE returns the first day's daily_sales: 48850.40 for product 1000 and 41888.88 for product 2000. Subtracting that value from each row's daily_sales shows how every later day compares with the first day of its own product.
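The same calculation can be sketched in Python to show each step: group by product, sort by date, take the first value, subtract. This is an illustration only. compare_first is our own helper, and Decimal is used so the cents stay exact, much as a DECIMAL column would keep them.

```python
from decimal import Decimal
from itertools import groupby

# (product_id, sale_date, daily_sales) rows from the answer set above.
rows = [
    (1000, "2000-09-28", Decimal("48850.40")), (1000, "2000-09-29", Decimal("54500.22")),
    (1000, "2000-09-30", Decimal("36000.07")), (1000, "2000-10-01", Decimal("40200.43")),
    (1000, "2000-10-02", Decimal("32800.50")), (1000, "2000-10-03", Decimal("64300.00")),
    (1000, "2000-10-04", Decimal("54553.10")),
    (2000, "2000-09-28", Decimal("41888.88")), (2000, "2000-09-29", Decimal("48000.00")),
    (2000, "2000-09-30", Decimal("49850.03")), (2000, "2000-10-01", Decimal("54850.29")),
    (2000, "2000-10-02", Decimal("36021.93")), (2000, "2000-10-03", Decimal("43200.18")),
    (2000, "2000-10-04", Decimal("32800.50")),
]


def compare_first(data):
    """Mimic daily_sales - FIRST_VALUE(daily_sales)
    OVER (PARTITION BY product_id ORDER BY sale_date)."""
    out = []
    # ISO-format date strings sort chronologically, so a plain sort works.
    ordered = sorted(data, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        partition = list(group)
        first = partition[0][2]  # daily_sales on the partition's earliest sale_date
        out.extend((pid, day, amount, amount - first) for pid, day, amount in partition)
    return out


for product_id, sale_date, daily_sales, delta in compare_first(rows):
    print(product_id, sale_date, daily_sales, delta)
```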

FIRST_VALUE Without Partitioning

SELECT product_id ,sale_date , daily_sales,
daily_sales - First_Value (daily_sales)
OVER (ORDER BY sale_date) delta_first
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales delta_first
1000 2000-09-28 48850.40 0.00
2000 2000-09-28 41888.88 -6961.52
1000 2000-09-29 54500.22 5649.82
2000 2000-09-29 48000.00 -850.40
1000 2000-09-30 36000.07 -12850.33
2000 2000-09-30 49850.03 999.63
1000 2000-10-01 40200.43 -8649.97
2000 2000-10-01 54850.29 5999.89
1000 2000-10-02 32800.50 -16049.90
2000 2000-10-02 36021.93 -12828.47
Not all rows are displayed.

Above, there is no PARTITION BY, so both products share one window sorted by sale_date. Each row's delta_first is its daily_sales minus the daily_sales of the first row (48850.40). Every row is compared with that first row, which is why the function is named First_Value.


FIRST_VALUE After Sorting by the Highest Value

SELECT product_id ,sale_date , daily_sales,
CAST(daily_sales - First_Value (daily_sales)
OVER (ORDER BY daily_sales DESC) as Decimal(10,2)) delta_first
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales delta_first
1000 2000-10-03 64300.00 0.00
2000 2000-10-01 54850.29 -9449.71
1000 2000-10-04 54553.10 -9746.90
1000 2000-09-29 54500.22 -9799.78
2000 2000-09-30 49850.03 -14449.97
1000 2000-09-28 48850.40 -15449.60
2000 2000-09-29 48000.00 -16300.00
2000 2000-10-03 43200.18 -21099.82
2000 2000-09-28 41888.88 -22411.12
1000 2000-10-01 40200.43 -24099.57
Not all rows are displayed.

Above, after the data is sorted by daily_sales DESC, the first row holds the highest sale, 64,300.00. Subtracting it from each row's daily_sales compares every row with that first row, thus the name First_Value. This example shows how much lower each daily_sales value is than 64,300.00, our highest sale.


FIRST_VALUE with Partitioning

SELECT product_id ,sale_date , daily_sales,
daily_sales - First_Value (daily_sales)
OVER (PARTITION BY product_id ORDER BY sale_date) AS delta_first
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales delta_first
1000 2000-09-28 48850.40 0.00
1000 2000-09-29 54500.22 5649.82
1000 2000-09-30 36000.07 -12850.33
1000 2000-10-01 40200.43 -8649.97
1000 2000-10-02 32800.50 -16049.90
1000 2000-10-03 64300.00 15449.60
1000 2000-10-04 54553.10 5702.70
2000 2000-09-28 41888.88 0.00
2000 2000-09-29 48000.00 6111.12
2000 2000-09-30 49850.03 7961.15
Not all rows are displayed.

We are comparing the daily_sales value for the first sale_date with the daily_sales of all other rows within the
product_id partition. Each row compares only with the first row (First_Value) in its partition.


Using LAST_VALUE

SELECT last_name, first_name, dept_no
,LAST_VALUE(first_name) OVER (ORDER BY dept_no, last_name desc
rows unbounded preceding) AS last_all
,LAST_VALUE(first_name) OVER (PARTITION BY dept_no
ORDER BY dept_no, last_name desc rows unbounded preceding) last_part
FROM employee_table;
last_name first_name dept_no last_all last_part
Jones Squiggy ? Squiggy Squiggy
Smythe Richard 10 Richard Richard
Chambers Mandee 100 Mandee Mandee
Smith John 200 John John
Coffing Billy 200 Billy Billy
Larkins Loraine 300 Loraine Loraine
Strickling Cletus 400 Cletus Cletus
Reilly William 400 William William
Harrison Herbert 400 Herbert Herbert

FIRST_VALUE and LAST_VALUE are useful whenever you need to propagate a value from one row to all or several rows based on a sorted sequence. However, the output from LAST_VALUE above appears to be incorrect, and it is a little misleading until you understand the window frame. The SQL request specifies "rows unbounded preceding", which is shorthand for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the frame ends at the current row. LAST_VALUE returns the last row in that frame. That row is always the current row, so each row simply shows its own first_name.

LAST_VALUE – Current Row

SELECT product_id ,sale_date , daily_sales,
CAST(daily_sales - LAST_Value (daily_sales)
OVER (ORDER BY sale_date) as Decimal(10,2)) delta_last
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales delta_last
1000 2000-09-28 48850.40 6961.52
2000 2000-09-28 41888.88 0.00
1000 2000-09-29 54500.22 6500.22
2000 2000-09-29 48000.00 0.00
1000 2000-09-30 36000.07 -13849.96
2000 2000-09-30 49850.03 0.00
1000 2000-10-01 40200.43 -14649.86
2000 2000-10-01 54850.29 0.00
1000 2000-10-02 32800.50 -3221.43
2000 2000-10-02 36021.93 0.00
1000 2000-10-03 64300.00 21099.82
Not all rows are displayed.

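Above, the window has ORDER BY sale_date but no explicit frame, so the default frame applies: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. A RANGE frame ends at the last row that ties with the current row on sale_date (its peers). For each date, LAST_VALUE therefore returns the daily_sales of whichever of that date's two rows sorts last, here product 2000. Subtracting it gives 0.00 for that row and the difference between the two products for the other row. Because the two rows are tied on sale_date, which one counts as last is not guaranteed.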
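The peer behavior can be checked locally. The sketch below uses Python's sqlite3 module as a stand-in engine (SQLite 3.25 or later). Like the SQL standard, SQLite uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as the default frame when a window has an ORDER BY. The sketch confirms only that the two rows sharing a sale_date receive the same LAST_VALUE. It does not check which product ends up last, since that order is not guaranteed.

```python
import sqlite3

# Products 1000 and 2000 from the answer sets above: two rows per sale_date.
sales = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
    (2000, "2000-10-02", 36021.93), (2000, "2000-10-03", 43200.18),
    (2000, "2000-10-04", 32800.50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", sales)

# No explicit frame: the default RANGE frame ends at the last peer with the same sale_date.
rows = conn.execute(
    """
    SELECT product_id, sale_date, daily_sales,
           LAST_VALUE(daily_sales) OVER (ORDER BY sale_date) AS last_in_frame
    FROM sales_table
    """
).fetchall()
conn.close()

last_by_date = {}
for _, sale_date, _, last_in_frame in rows:
    last_by_date.setdefault(sale_date, set()).add(last_in_frame)

# Both rows for a date share one frame, so they get the same LAST_VALUE.
print(all(len(values) == 1 for values in last_by_date.values()))  # True
```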


First_Value Review

SELECT product_id ,sale_date , daily_sales,
FIRST_VALUE(daily_sales) OVER (ORDER BY sale_date) as first_row_val
FROM sales_table WHERE product_id = 1000;
product_id sale_date daily_sales first_row_val
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 48850.40
1000 2000-09-30 36000.07 48850.40
1000 2000-10-01 40200.43 48850.40
1000 2000-10-02 32800.50 48850.40
1000 2000-10-03 64300.00 48850.40
1000 2000-10-04 54553.10 48850.40
The ORDER BY sorts the data by sale_date ASC. The first row after the sort is the First_Value.

First_Value works slightly differently than Last_Value. Above, we have a First_Value example, and it makes sense: after ORDER BY sale_date, the data sorts by sale_date ASC, and we take the daily_sales value of the first row.


Last_Value Can Be Confusing

SELECT product_id ,sale_date , daily_sales,
LAST_VALUE(daily_sales) OVER (ORDER BY sale_date) as confusing
FROM sales_table WHERE product_id = 1000;

product_id sale_date daily_sales confusing
1000 2000-09-28 48850.40 48850.40
1000 2000-09-29 54500.22 54500.22
1000 2000-09-30 36000.07 36000.07
1000 2000-10-01 40200.43 40200.43
1000 2000-10-02 32800.50 32800.50
1000 2000-10-03 64300.00 64300.00
1000 2000-10-04 54553.10 54553.10
Because we ORDER BY sale_date, this looks confusing: the value changes whenever the sale_date changes. The next example will show you how to fix this issue.

First_Value works slightly differently than Last_Value. Above, we have a Last_Value example, and it behaves differently from our previous First_Value example. After ORDER BY sale_date sorts the data ascending, last_value simply echoes each row's own daily_sales. The default frame runs from the first row of the window to the current row (plus any rows tied with it on sale_date). Every sale_date is unique here, so the last row in that frame is always the current row. We will fix this in our next example.


Last_Value Now Makes Sense

SELECT product_id ,sale_date , daily_sales,
Last_Value (daily_sales) OVER (ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
as last_value
FROM sales_table
WHERE product_id = 1000;
product_id sale_date daily_sales last_value
1000 2000-10-04 54553.10 54553.10
1000 2000-10-03 64300.00 54553.10
1000 2000-10-02 32800.50 54553.10
1000 2000-10-01 40200.43 54553.10
1000 2000-09-30 36000.07 54553.10
1000 2000-09-29 54500.22 54553.10
1000 2000-09-28 48850.40 54553.10
The ROWS BETWEEN clause is how you get the last_value.

Last_Value orders the data by sale_date (ascending) and takes the value from the last row, which is the latest date. The rows above display in descending date order only because the query has no outer ORDER BY.

In this example we use the keywords ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, so the frame covers every row in the window, not just the rows up to the current one. ORDER BY sale_date sorts the window in ascending order. The last_value is therefore the daily_sales value of the most recent sale_date, 54553.10 from 2000-10-04, repeated on every row.
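The difference between the default frame and an explicit full frame can be reproduced with the product 1000 rows. The sketch below uses Python's sqlite3 module as a local stand-in (SQLite 3.25 or later). For a window with an ORDER BY, SQLite applies the same default frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The column names confusing and last_val are only labels.

```python
import sqlite3

# Product 1000's rows from the answer sets above.
product_1000 = [
    ("2000-09-28", 48850.40), ("2000-09-29", 54500.22), ("2000-09-30", 36000.07),
    ("2000-10-01", 40200.43), ("2000-10-02", 32800.50), ("2000-10-03", 64300.00),
    ("2000-10-04", 54553.10),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_table (sale_date TEXT, daily_sales REAL)")
conn.executemany("INSERT INTO sales_table VALUES (?, ?)", product_1000)

rows = conn.execute(
    """
    SELECT sale_date, daily_sales,
           -- default frame: ends at the current row (and any peers)
           LAST_VALUE(daily_sales) OVER (ORDER BY sale_date) AS confusing,
           -- explicit frame: the whole window
           LAST_VALUE(daily_sales) OVER (ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_val
    FROM sales_table
    ORDER BY sale_date
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Because every sale_date is unique, confusing matches daily_sales on each row. last_val is 54553.10 everywhere.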

Last_Value With Partitioning

SELECT product_id ,sale_date , daily_sales,
Last_Value (daily_sales) OVER (Partition by Product_id ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) as last_value
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales last_value
1000 2000-09-28 48850.40 54553.10
1000 2000-09-29 54500.22 54553.10
1000 2000-09-30 36000.07 54553.10
1000 2000-10-01 40200.43 54553.10
1000 2000-10-02 32800.50 54553.10
1000 2000-10-03 64300.00 54553.10
1000 2000-10-04 54553.10 54553.10
2000 2000-09-28 41888.88 32800.50
2000 2000-09-29 48000.00 32800.50
2000 2000-09-30 49850.03 32800.50
2000 2000-10-01 54850.29 32800.50
2000 2000-10-02 36021.93 32800.50
2000 2000-10-03 43200.18 32800.50
2000 2000-10-04 32800.50 32800.50
Last_Value orders the data by sale_date. It takes the value from the last row because it is the last_value by date.

Because we use PARTITION BY product_id together with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, the example above makes sense. The frame spans each whole product_id, and the calculation resets on each product_id break.


Last_Value And First_Value with Partitioning


SELECT product_id ,sale_date , daily_sales,
First_Value (daily_sales) OVER (Partition by Product_id Order by sale_date) as first_value,
Last_Value (daily_sales) OVER (Partition by Product_id ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_value
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales first_value last_value
1000 2000-09-28 48850.40 48850.40 54553.10
1000 2000-09-29 54500.22 48850.40 54553.10
1000 2000-09-30 36000.07 48850.40 54553.10
1000 2000-10-01 40200.43 48850.40 54553.10
1000 2000-10-02 32800.50 48850.40 54553.10
1000 2000-10-03 64300.00 48850.40 54553.10
1000 2000-10-04 54553.10 48850.40 54553.10
2000 2000-09-28 41888.88 41888.88 32800.50
2000 2000-09-29 48000.00 41888.88 32800.50
2000 2000-09-30 49850.03 41888.88 32800.50
2000 2000-10-01 54850.29 41888.88 32800.50
2000 2000-10-02 36021.93 41888.88 32800.50
2000 2000-10-03 43200.18 41888.88 32800.50
2000 2000-10-04 32800.50 41888.88 32800.50

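We are now getting both the first and the last value because we use two functions. FIRST_VALUE can keep its default frame, because a frame that starts at UNBOUNDED PRECEDING always contains the first row of the partition. Only LAST_VALUE needs ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The rows happen to display sorted by product_id and sale_date, but without an outer ORDER BY that order is not guaranteed.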

First and Last Value Difference Between Today's Daily_Sales


SELECT product_id as prod ,sale_date , daily_sales,
First_Value (daily_sales) OVER (Partition by Product_id Order by sale_date) as first_value,
Last_Value (daily_sales) OVER (Partition by Product_id ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_value,
Daily_Sales - First_Value (daily_sales) OVER (Partition by Product_id Order by sale_date) first_diff,
Daily_Sales - Last_Value (daily_sales) OVER (Partition by Product_id ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_diff
FROM sales_table WHERE product_id IN (1000, 2000) ;
prod sale_date daily_sales first_value last_value first_diff last_diff
1000 2000-09-28 48850.40 48850.40 54553.10 0.00 -5702.70
1000 2000-09-29 54500.22 48850.40 54553.10 5649.82 -52.88
1000 2000-09-30 36000.07 48850.40 54553.10 -12850.33 -18553.03
1000 2000-10-01 40200.43 48850.40 54553.10 -8649.97 -14352.67
1000 2000-10-02 32800.50 48850.40 54553.10 -16049.90 -21752.60
1000 2000-10-03 64300.00 48850.40 54553.10 15449.60 9746.90
1000 2000-10-04 54553.10 48850.40 54553.10 5702.70 0.00
2000 2000-09-28 41888.88 41888.88 32800.50 0.00 9088.38
2000 2000-09-29 48000.00 41888.88 32800.50 6111.12 15199.50
2000 2000-09-30 49850.03 41888.88 32800.50 7961.15 17049.53
2000 2000-10-01 54850.29 41888.88 32800.50 12961.41 22049.79
2000 2000-10-02 36021.93 41888.88 32800.50 -5866.95 3221.43
2000 2000-10-03 43200.18 41888.88 32800.50 1311.30 10399.68
2000 2000-10-04 32800.50 41888.88 32800.50 -9088.38 0.00


Using LEAD

SELECT product_id, sale_date, daily_sales,
LEAD (daily_sales) over(order by sale_date) AS next_value
FROM sales_table
WHERE product_id = 1000

product_id sale_date daily_sales next_value
1000 2000-09-28 48850.40 54500.22
1000 2000-09-29 54500.22 36000.07
1000 2000-09-30 36000.07 40200.43
1000 2000-10-01 40200.43 32800.50
1000 2000-10-02 32800.50 64300.00
1000 2000-10-03 64300.00 54553.10
1000 2000-10-04 54553.10 ?

As you can see, LEAD brings back the daily_sales value from the next row. The exception is the last row, which has no row following it and therefore returns NULL (shown as ?). We did not specify the offset value in this example, so it defaulted to 1 row.
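LEAD and LAG are simple offsets into the sorted rows. The Python sketch below mimics them on product 1000's daily_sales, already sorted by sale_date. lead and lag are our own helpers, and None stands in for the NULL (?) result. The default parameter mirrors the optional third argument that Databricks SQL's lead and lag also accept.

```python
from typing import List, Optional, Sequence


def lead(values: Sequence[float], offset: int = 1,
         default: Optional[float] = None) -> List[Optional[float]]:
    """Value offset rows after each row, or default past the end (like LEAD)."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


def lag(values: Sequence[float], offset: int = 1,
        default: Optional[float] = None) -> List[Optional[float]]:
    """Value offset rows before each row, or default before the start (like LAG)."""
    return [values[i - offset] if i - offset >= 0 else default for i in range(len(values))]


# Product 1000's daily_sales, already sorted by sale_date.
daily_sales = [48850.40, 54500.22, 36000.07, 40200.43, 32800.50, 64300.00, 54553.10]

print(lead(daily_sales))     # next_value: the last entry is None
print(lead(daily_sales, 2))  # two_down: the last two entries are None
print(lag(daily_sales))      # yesterday: the first entry is None
```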


Using LEAD with a PARTITION Statement

SELECT product_id, sale_date, daily_sales,
LEAD (daily_sales) over(partition by product_id order by sale_date) next_val
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales next_val
1000 2000-09-28 48850.40 54500.22
1000 2000-09-29 54500.22 36000.07
1000 2000-09-30 36000.07 40200.43
1000 2000-10-01 40200.43 32800.50
1000 2000-10-02 32800.50 64300.00
1000 2000-10-03 64300.00 54553.10
1000 2000-10-04 54553.10 ?
2000 2000-09-28 41888.88 48000.00
2000 2000-09-29 48000.00 49850.03
2000 2000-09-30 49850.03 54850.29
2000 2000-10-01 54850.29 36021.93
2000 2000-10-02 36021.93 43200.18
2000 2000-10-03 43200.18 32800.50
2000 2000-10-04 32800.50 ?
PARTITION BY will reset the calculation. So, PARTITION BY product_id means start over with each product_id break.

As you can see, LEAD brings back the value from the next row. The exception is the last row of each product_id, which has no following row within its partition. We did not specify the offset value in this example, so it defaulted to 1 row. Notice how things reset because of the PARTITION BY clause.


Using LEAD With an Offset of 2

SELECT product_id, sale_date, daily_sales,
LEAD (daily_sales, 2) over(order by sale_date) AS two_down
FROM sales_table
WHERE product_id = 1000

product_id sale_date daily_sales two_down
1000 2000-09-28 48850.40 36000.07
1000 2000-09-29 54500.22 40200.43
1000 2000-09-30 36000.07 32800.50
1000 2000-10-01 40200.43 64300.00
1000 2000-10-02 32800.50 54553.10
1000 2000-10-03 64300.00 ?
1000 2000-10-04 54553.10 ?

As you can see, LEAD brings back the Daily_Sales value from two rows down. The exception is the last two rows, which have no row two rows below them. The offset value is 2, so it shows the value from two rows down.


Using LEAD With an Offset of 2 and a PARTITION

SELECT product_id, sale_date, daily_sales,
LEAD (daily_sales, 2) over(partition by product_id order by sale_date) down_2
FROM sql_class.sales_table WHERE product_id in (1000, 2000) ;
product_id sale_date daily_sales down_2
1000 2000-09-28 48850.40 36000.07
1000 2000-09-29 54500.22 40200.43
1000 2000-09-30 36000.07 32800.50
1000 2000-10-01 40200.43 64300.00
1000 2000-10-02 32800.50 54553.10
1000 2000-10-03 64300.00 ?
1000 2000-10-04 54553.10 ?
2000 2000-09-28 41888.88 49850.03
2000 2000-09-29 48000.00 54850.29
2000 2000-09-30 49850.03 36021.93
2000 2000-10-01 54850.29 43200.18
2000 2000-10-02 36021.93 32800.50
2000 2000-10-03 43200.18 ?
2000 2000-10-04 32800.50 ?

As you can see, LEAD brings back the Daily_Sales value from two rows down. The exception is the last two rows of each product_id, which have no row two rows below them within their partition. Notice how things reset because of the PARTITION BY clause.

Using LAG

SELECT product_id, sale_date, daily_sales,
LAG (daily_sales) over(order by sale_date) AS yesterday
FROM sales_table
WHERE product_id = 1000

product_id sale_date daily_sales yesterday
1000 2000-09-28 48850.40 ?
1000 2000-09-29 54500.22 48850.40
1000 2000-09-30 36000.07 54500.22
1000 2000-10-01 40200.43 36000.07
1000 2000-10-02 32800.50 40200.43
1000 2000-10-03 64300.00 32800.50
1000 2000-10-04 54553.10 64300.00
Each row shows today's daily_sales and yesterday's daily_sales.

As you can see, LAG brings back the value from the previous row. The exception is the first row, which has no row before it and returns NULL (shown as ?). We did not specify the offset value in this example, so it defaulted to 1 row.


Using LAG with a PARTITION Statement

SELECT product_id, sale_date, daily_sales,
LAG (daily_sales) over(partition by product_id order by sale_date) next_val
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales next_val
1000 2000-09-28 48850.40 ?
1000 2000-09-29 54500.22 48850.40
1000 2000-09-30 36000.07 54500.22
1000 2000-10-01 40200.43 36000.07
1000 2000-10-02 32800.50 40200.43
1000 2000-10-03 64300.00 32800.50
1000 2000-10-04 54553.10 64300.00
2000 2000-09-28 41888.88 ?
2000 2000-09-29 48000.00 41888.88
2000 2000-09-30 49850.03 48000.00
2000 2000-10-01 54850.29 49850.03
2000 2000-10-02 36021.93 54850.29
2000 2000-10-03 43200.18 36021.93
2000 2000-10-04 32800.50 43200.18
PARTITION BY product_id is like a GROUP BY statement, except that the rows are not collapsed. In other words, PARTITION BY product_id means to reset the calculation on each product_id break and to keep the Lag calculation within each specific product_id only.

As you can see, LAG brings back the value from the previous row. The exception is the first row of each product_id, which has no row before it. We did not specify the offset value in this example, so it defaulted to 1 row. Notice how things reset because of the PARTITION BY clause.


Using Two LAG Statements

SELECT product_id, sale_date, daily_sales,
LAG(daily_sales, 1) over (order by sale_date) AS yesterday,
LAG(daily_sales, 2) over(order by sale_date) AS two_days_ago
FROM sales_table
WHERE product_id = 1000
product_id sale_date daily_sales yesterday two_days_ago
1000 2000-09-28 48850.40 ? ?
1000 2000-09-29 54500.22 48850.40 ?
1000 2000-09-30 36000.07 54500.22 48850.40
1000 2000-10-01 40200.43 36000.07 54500.22
1000 2000-10-02 32800.50 40200.43 36000.07
1000 2000-10-03 64300.00 32800.50 40200.43
1000 2000-10-04 54553.10 64300.00 32800.50

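As you can see, LAG brings back the Daily_Sales value from both yesterday and two days ago. The offset is 1 for yesterday and 2 for two_days_ago. LAG counts rows, not days. These columns mean "yesterday" and "two days ago" only because product 1000 has exactly one row for each consecutive sale_date.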


Using LAG With an Offset of 2

SELECT product_id, sale_date, daily_sales,
LAG (daily_sales, 2) over(order by sale_date) AS two_down
FROM sales_table
WHERE product_id = 1000
product_id sale_date daily_sales two_down
1000 2000-09-28 48850.40 ?
1000 2000-09-29 54500.22 ?
1000 2000-09-30 36000.07 48850.40
1000 2000-10-01 40200.43 54500.22
1000 2000-10-02 32800.50 36000.07
1000 2000-10-03 64300.00 40200.43
1000 2000-10-04 54553.10 32800.50

As you can see, LAG brings back the Daily_Sales value from two rows back. The exception is the first two rows, which have no row two rows before them. The offset value in this example is 2, so, despite the alias two_down, each row shows the value from two rows earlier.


Using LAG With an Offset of 2 and a PARTITION

SELECT product_id, sale_date, daily_sales,
LAG (daily_sales, 2) OVER (partition by product_id order by sale_date) down_2
FROM sales_table WHERE product_id in (1000, 2000) ;
product_id sale_date daily_sales down_2
1000 2000-09-28 48850.40 ?
1000 2000-09-29 54500.22 ?
1000 2000-09-30 36000.07 48850.40
1000 2000-10-01 40200.43 54500.22
1000 2000-10-02 32800.50 36000.07
1000 2000-10-03 64300.00 40200.43
1000 2000-10-04 54553.10 32800.50
2000 2000-09-28 41888.88 ?
2000 2000-09-29 48000.00 ?
2000 2000-09-30 49850.03 41888.88
2000 2000-10-01 54850.29 48000.00
2000 2000-10-02 36021.93 49850.03
2000 2000-10-03 43200.18 54850.29
2000 2000-10-04 32800.50 36021.93

As you can see, LAG brings back the value from the Daily_Sales column two rows before the current row. Notice that the first two rows of each product_id are NULL because there are no rows two rows before them in the partition. Notice how things reset because of the PARTITION BY clause.
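The reset at each partition boundary can be reproduced locally. The sketch below runs the same LAG(daily_sales, 2) with PARTITION BY product_id in SQLite through Python's sqlite3 module (SQLite 3.25 or later), used only as a convenient stand-in for Databricks. The alias back_2 is our own label, and None plays the role of the ? shown for NULL.

```python
import sqlite3

# Products 1000 and 2000 from the answer set above.
sales = [
    (1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
    (1000, "2000-09-30", 36000.07), (1000, "2000-10-01", 40200.43),
    (1000, "2000-10-02", 32800.50), (1000, "2000-10-03", 64300.00),
    (1000, "2000-10-04", 54553.10),
    (2000, "2000-09-28", 41888.88), (2000, "2000-09-29", 48000.00),
    (2000, "2000-09-30", 49850.03), (2000, "2000-10-01", 54850.29),
    (2000, "2000-10-02", 36021.93), (2000, "2000-10-03", 43200.18),
    (2000, "2000-10-04", 32800.50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", sales)

rows = conn.execute(
    """
    SELECT product_id, sale_date, daily_sales,
           LAG(daily_sales, 2) OVER (PARTITION BY product_id
                                     ORDER BY sale_date) AS back_2
    FROM sales_table
    ORDER BY product_id, sale_date
    """
).fetchall()
conn.close()

for row in rows:
    print(row)  # None marks rows with no row two positions earlier in the same product
```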


CUME_DIST

SELECT Product_ID ,Sale_Date , Daily_Sales,
CUME_DIST() OVER (ORDER BY Daily_Sales DESC) AS CDist
FROM Sales_Table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales cdist
1000 2000-10-03 64300.00 0.07
2000 2000-10-01 54850.29 0.14
1000 2000-10-04 54553.10 0.21
1000 2000-09-29 54500.22 0.29
2000 2000-09-30 49850.03 0.36
1000 2000-09-28 48850.40 0.43
2000 2000-09-29 48000.00 0.5
2000 2000-10-03 43200.18 0.57
2000 2000-09-28 41888.88 0.64
1000 2000-10-01 40200.43 0.71
2000 2000-10-02 36021.93 0.79
1000 2000-09-30 36000.07 0.86
1000 2000-10-02 32800.50 1
2000 2000-10-04 32800.50 1

CUME_DIST is a cumulative distribution function that assigns a relative rank to each row, based on a formula. That formula is (number of rows preceding or peer with the current row) / (total rows in the partition). We order by Daily_Sales DESC, so the highest sale gets 1/14, shown as 0.07. The two 32800.50 sales are peers (tied), so each counts the other, and both receive 14/14 = 1. The results are floating-point numbers greater than 0 and at most 1, shown here rounded to two decimal places. When there is only one row in a partition, it is assigned 1.
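The formula can be sketched directly in Python. This is a minimal, quadratic-time illustration of the definition, not how any engine computes it. cume_dist is our own helper. It returns full floating-point values, which we round to two places to match the display above.

```python
from typing import List, Sequence


def cume_dist(values: Sequence[float], descending: bool = False) -> List[float]:
    """CUME_DIST for each value, in input order: the number of rows sorted
    before the value or tied with it (its peers), divided by the total rows."""
    n = len(values)
    result: List[float] = []
    for v in values:
        if descending:
            at_or_before = sum(1 for other in values if other >= v)
        else:
            at_or_before = sum(1 for other in values if other <= v)
        result.append(at_or_before / n)
    return result


# daily_sales for products 1000 and 2000, in the DESC order used above.
daily_sales = [64300.00, 54850.29, 54553.10, 54500.22, 49850.03, 48850.40, 48000.00,
               43200.18, 41888.88, 40200.43, 36021.93, 36000.07, 32800.50, 32800.50]

# The two tied 32800.50 rows both get 14/14 = 1.0.
print([round(x, 2) for x in cume_dist(daily_sales, descending=True)])
```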

CUME_DIST

SELECT prod
,total_sales
,CUME_DIST() OVER (ORDER BY total_sales) as cdist
,cdist * 100 as percent
FROM sales_simple_example ;
prod total_sales cdist percent
1000 1.00 0.1 10
2000 2.00 0.2 20
3000 3.00 0.3 30
4000 4.00 0.4 40
5000 5.00 0.5 50
6000 6.00 0.6 60
7000 7.00 0.7 70
8000 8.00 0.8 80
9000 999.00 0.9 90
10000 9999.00 1 100

The rows are sorted first by total_sales before the calculation begins. After total_sales sorts the rows, the first row has a value of 1.00. There are 10 rows in the result set, so the calculation is 1/10 = 0.1. The ninth row has a value of 999.00, so its calculation is 9/10 = 0.9. Notice that the cdist calculation is based on the position of the row relative to the other rows in the data set, not on the size of the value, so the jump from 999.00 to 9999.00 still raises cdist by only 0.1.

CUME_DIST is a cumulative distribution function that assigns a relative rank to each row, based on a formula.
That formula is (number of rows preceding or peer with current row) / (total rows).


CUME_DIST and Qualify

SELECT prod
,total_sales
,CUME_DIST() OVER (ORDER BY total_sales) as cdist
,cdist * 100 as percent
FROM sales_simple_example
QUALIFY CUME_DIST() OVER (ORDER BY total_sales) >= 0.5 ;
prod total_sales cdist percent
5000 5.00 0.5 50
6000 6.00 0.6 60
7000 7.00 0.7 70
8000 8.00 0.8 80
9000 999.00 0.9 90
10000 9999.00 1 100

Based on a formula, CUME_DIST() is a cumulative distribution function that assigns a relative rank to each row. That formula is (number of rows preceding or peer with current row) / (total rows). Above, the rows are sorted first by the column total_sales. The QUALIFY clause then filters on the window result and keeps only rows whose cdist is 0.5 or higher. That is the median row plus every row above it, so six of the ten rows come back. To keep exactly the top half (five rows), use > 0.5 instead.
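The effect of >= versus > in the QUALIFY condition is easy to verify. This is a minimal Python sketch that applies the CUME_DIST formula to the ten total_sales values and then filters the way QUALIFY does. The helper name cume_dist is hypothetical.

```python
def cume_dist(values):
    """Ascending CUME_DIST: (rows with a value <= the current value) / (total rows)."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


prods = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
total_sales = [1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 999.00, 9999.00]
cdist = cume_dist(total_sales)

# QUALIFY ... >= 0.5 keeps the median row and everything above it.
at_least_half = [p for p, c in zip(prods, cdist) if c >= 0.5]
# QUALIFY ... > 0.5 keeps exactly the upper half of the ten rows.
above_half = [p for p, c in zip(prods, cdist) if c > 0.5]
print(at_least_half)
print(above_half)
```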


CUME_DIST With Ties

SELECT prod, total_sales
,CUME_DIST() OVER (ORDER BY total_sales) as cdist
,cdist * 100 as percent
FROM sales_simple_example_updated ;
prod total_sales cdist percent
1000 1.00 0.1 10
2000 2.00 0.2 20
3000 3.00 0.3 30
4000 4.00 0.4 40
5000 5.00 0.6 60
6000 5.00 0.6 60
7000 7.00 0.7 70
8000 8.00 0.8 80
9000 999.00 0.9 90
10000 9999.00 1 100

An update on the data now has two rows (prod 5000 and 6000) that have the same total_sales value.

Based on a formula, CUME_DIST is a cumulative distribution function that assigns a relative rank to each row. That formula is (number of rows preceding or peer with current row) / (total rows). Above, the rows are sorted first by the column total_sales. After total_sales sorts the rows, the 5th and 6th rows have a tie, so the calculation for both rows is 6 divided by 10 = 0.6, and no row receives 0.5.
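A short Python sketch of the tie rule, using the updated total_sales values. Each tied row counts its peer as "preceding or peer", so both rows share the higher value. The helper name cume_dist is hypothetical.

```python
def cume_dist(values):
    """Ascending CUME_DIST: (rows with a value <= the current value) / (total rows)."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


# sales_simple_example_updated: prod 6000 now also has 5.00.
total_sales = [1.00, 2.00, 3.00, 4.00, 5.00, 5.00, 7.00, 8.00, 999.00, 9999.00]
cdist = cume_dist(total_sales)
print(cdist[4], cdist[5])  # the two tied rows
```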


CUME_DIST and Partition By

SELECT region, prod
,total_sales
,CUME_DIST() OVER
(PARTITION BY region ORDER BY total_sales) as cdist
,cdist * 100 as percent
FROM sales_simple_example ;
region prod total_sales cdist percent
North 6000 6.00 0.2 20
North 7000 7.00 0.4 40
North 8000 8.00 0.6 60
North 9000 999.00 0.8 80
North 10000 9999.00 1 100
South 1000 1.00 0.2 20
South 2000 2.00 0.4 40
South 3000 3.00 0.6 60
South 4000 4.00 0.8 80
South 5000 5.00 1 100

PARTITION BY region calculates each region separately within its own grouping.

Based on a formula, CUME_DIST is a cumulative distribution function that assigns a relative rank to each row. That formula is (number of rows preceding or peer with current row) / (total rows). Above, the rows are sorted first by the column total_sales. Because of PARTITION BY, the calculation resets for each region: each region has five rows, so the values step by 1/5 = 0.2, from 0.2 to 1.
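A minimal Python sketch of the partitioned calculation. Rows are grouped by region first, and the denominator is the size of the partition rather than the whole table. The helper name cume_dist_by_partition is hypothetical.

```python
from collections import defaultdict


def cume_dist_by_partition(rows):
    """rows are (region, prod, total_sales); returns {prod: cdist} computed per region."""
    partitions = defaultdict(list)
    for region, prod, sales in rows:
        partitions[region].append((prod, sales))
    result = {}
    for members in partitions.values():
        n = len(members)  # denominator is the partition size, not the table size
        for prod, sales in members:
            result[prod] = sum(1 for _, s in members if s <= sales) / n
    return result


rows = [
    ("South", 1000, 1.00), ("South", 2000, 2.00), ("South", 3000, 3.00),
    ("South", 4000, 4.00), ("South", 5000, 5.00),
    ("North", 6000, 6.00), ("North", 7000, 7.00), ("North", 8000, 8.00),
    ("North", 9000, 999.00), ("North", 10000, 9999.00),
]
cdist = cume_dist_by_partition(rows)
print(cdist)
```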


CUME_DIST With a Partition on the Sales_Table

SELECT product_id ,sale_date , daily_sales,
CUME_DIST() OVER (PARTITION BY product_id
ORDER BY Daily_Sales DESC) AS cdist
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales cdist
1000 2000-10-03 64300.00 0.14
1000 2000-10-04 54553.10 0.29
1000 2000-09-29 54500.22 0.43
1000 2000-09-28 48850.40 0.57
1000 2000-10-01 40200.43 0.71
1000 2000-09-30 36000.07 0.86
1000 2000-10-02 32800.50 1
2000 2000-10-01 54850.29 0.14
2000 2000-09-30 49850.03 0.29
2000 2000-09-29 48000.00 0.43
2000 2000-10-03 43200.18 0.57
2000 2000-09-28 41888.88 0.71
2000 2000-10-02 36021.93 0.86
2000 2000-10-04 32800.50 1

Based on a formula, CUME_DIST is a cumulative distribution function that assigns a relative rank to each row. That formula is (number of rows preceding or peer with current row) / (total rows). We partition by product_id and then order by daily_sales DESC so that each row ranks by cumulative distribution within its partition. Each product has seven rows, so the values step by 1/7 (0.14, 0.29, and so on up to 1).


CURRENT ROW AND UNBOUNDED FOLLOWING

SELECT product_id, sale_date ,daily_sales
,SUM(daily_sales) OVER (ORDER BY product_id, sale_date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS cumulative_total
FROM sales_table ;
product_id sale_date daily_sales cumulative_total
1000 2000-09-28 48850.40 862404.35
1000 2000-09-29 54500.22 813553.95
1000 2000-09-30 36000.07 759053.73
1000 2000-10-01 40200.43 723053.66
1000 2000-10-02 32800.50 682853.23
1000 2000-10-03 64300.00 650052.73
1000 2000-10-04 54553.10 585752.73
2000 2000-09-28 41888.88 531199.63
2000 2000-09-29 48000.00 489310.75
2000 2000-09-30 49850.03 441310.75
2000 2000-10-01 54850.29 391460.72
Not all rows are displayed.

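Above, we used ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING to produce a cumulative sum (CSUM) that runs in reverse. The rows are still sorted in ascending order by product_id and sale_date, but each row adds its own daily_sales to every row that follows it. The first row therefore holds the grand total of the whole table (862404.35), and each later total is smaller by the previous row's daily_sales (862404.35 - 48850.40 = 813553.95).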
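The reverse running total can be reproduced with a simple backward loop. This is a minimal Python sketch that uses only product_id 1000's seven rows, so its totals are smaller than the report's, which sum the whole table. The shrinking pattern is the same. The helper name sum_current_to_end is hypothetical.

```python
from decimal import Decimal


def sum_current_to_end(values):
    """Running total from each row to the last row, like
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING."""
    totals = []
    running = Decimal("0")
    for v in reversed(values):  # walk from the last row back to the first
        running += v
        totals.append(running)
    totals.reverse()  # restore the original row order
    return totals


# daily_sales for product_id 1000 only, ordered by sale_date.
daily_sales = [Decimal(s) for s in (
    "48850.40", "54500.22", "36000.07", "40200.43",
    "32800.50", "64300.00", "54553.10",
)]
totals = sum_current_to_end(daily_sales)
for sales, total in zip(daily_sales, totals):
    print(sales, total)
```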

Different Windowing Options

SELECT product_id, sale_date, daily_sales,
SUM(daily_sales)
OVER( PARTITION BY product_id ORDER BY product_id, sale_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) row_preceding
,SUM(daily_sales)
OVER( PARTITION BY product_id ORDER BY product_id, sale_date
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) row_following
FROM sales_table ;
product_id sale_date daily_sales row_preceding row_following
1000 2000-09-28 48850.40 48850.40 103350.62
1000 2000-09-29 54500.22 103350.62 90500.29
1000 2000-09-30 36000.07 90500.29 76200.50
1000 2000-10-01 40200.43 76200.50 73000.93
1000 2000-10-02 32800.50 73000.93 97100.50
1000 2000-10-03 64300.00 97100.50 118853.10
1000 2000-10-04 54553.10 118853.10 54553.10
2000 2000-09-28 41888.88 41888.88 89888.88
2000 2000-09-29 48000.00 89888.88 97850.03
Not all rows are displayed.

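The example above uses ROWS BETWEEN 1 PRECEDING AND CURRENT ROW for row_preceding and ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING for row_following. Each is a two-row moving sum. For example, row_preceding on 2000-09-29 is 48850.40 + 54500.22 = 103350.62, and row_following on 2000-09-28 adds the same two rows. The first row of each product has no preceding row, so its row_preceding is just its own daily_sales. The last row of product 1000 (2000-10-04) has no following row, so its row_following is just 54553.10. Both windows restart at each product_id because of PARTITION BY.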
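Both frames can be sketched in a few lines of Python for one partition (product_id 1000). A missing neighbor simply contributes nothing, just as the frame shrinks at the partition edges. The helper name two_row_sums is hypothetical.

```python
from decimal import Decimal


def two_row_sums(values):
    """Return (row_preceding, row_following) lists for one partition."""
    zero = Decimal("0")
    # ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    preceding = [v + (values[i - 1] if i > 0 else zero) for i, v in enumerate(values)]
    # ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
    following = [v + (values[i + 1] if i + 1 < len(values) else zero)
                 for i, v in enumerate(values)]
    return preceding, following


product_1000 = [Decimal(s) for s in (
    "48850.40", "54500.22", "36000.07", "40200.43",
    "32800.50", "64300.00", "54553.10",
)]
row_preceding, row_following = two_row_sums(product_1000)
for sales, p, f in zip(product_1000, row_preceding, row_following):
    print(sales, p, f)
```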

MEDIAN Example

SELECT last_name, dept_no, salary,
MEDIAN(salary) OVER () AS median
FROM employee_table
ORDER BY salary, last_name desc ;

last_name dept_no salary median
Jones ? 32800.50 48000
Reilly 400 36000.00 48000
Larkins 300 40200.00 48000
Coffing 200 41888.88 48000
Smith 200 48000.00 48000
Chambers 100 48850.00 48000
Strickling 400 54500.00 48000
Harrison 400 54500.00 48000
Smythe 10 64300.00 48000

The Median is a numerical value of an expression in an answer set within a window that separates the higher half of a sample from the lower half. After sorting all values from the lowest to the highest, it then picks the middle one. If there is an even number of values, then there is no single middle value, so the median is the mean (average) of the two middle values. Above, OVER () with nothing inside makes the whole answer set one window. There are nine salaries, so the fifth one, 48000.00, is the median for every row.
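Python's standard statistics.median follows the same odd/even rule, which makes it a convenient way to check the numbers. This sketch uses Decimal so the money values stay exact.

```python
from decimal import Decimal
from statistics import median

salaries = [Decimal(s) for s in (
    "32800.50", "64300.00", "48850.00", "41888.88", "48000.00",
    "40200.00", "54500.00", "36000.00", "54500.00",
)]
# Nine values: median() sorts them and returns the 5th one.
print(median(salaries))
# An even count returns the mean of the two middle values.
print(median([Decimal("41888.88"), Decimal("48000.00")]))
```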


MEDIAN with Partitioning and a WHERE Clause

SELECT last_name, dept_no, salary,
MEDIAN(salary) OVER (PARTITION BY dept_no) AS median
FROM employee_table
WHERE dept_no in (200, 400) ;

last_name dept_no salary median
Smith 200 48000.00 44944.44
Coffing 200 41888.88 44944.44
Harrison 400 54500.00 54500
Strickling 400 54500.00 54500
Reilly 400 36000.00 54500

For dept_no 200, the median is the average between these two rows. For dept_no 400, the median is the middle row of the salaries if ordered (36000.00, 54500.00, 54500.00).

The WHERE clause runs before the window function, so only departments 200 and 400 are considered, and PARTITION BY dept_no computes a separate median for each. Department 200 has an even number of salaries (two), so its median is the mean of the two middle values: (41888.88 + 48000.00) / 2 = 44944.44. Department 400 has three salaries, so its median is the middle one, 54500.00.


MEDIAN with Partitioning

SELECT last_name, dept_no, salary,
MEDIAN(salary) OVER (PARTITION BY dept_no) AS median
FROM employee_table ORDER BY dept_no, salary ;

last_name dept_no salary median
Jones ? 32800.50 32800.5
Smythe 10 64300.00 64300
Chambers 100 48850.00 48850
Coffing 200 41888.88 44944.44
Smith 200 48000.00 44944.44
Larkins 300 40200.00 40200
Reilly 400 36000.00 54500
Harrison 400 54500.00 54500
Strickling 400 54500.00 54500

Without a WHERE clause, every dept_no gets its own median. Departments with a single employee (10, 100, and 300) simply return that employee's salary. Rows with a NULL dept_no are grouped into their own partition, so Jones's median is his own salary, 32800.50. Department 200 averages its two middle values (44944.44), and department 400 returns its middle value (54500.00).
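PARTITION BY dept_no can be sketched in Python as "group, then take the median of each group". This minimal sketch uses None for the NULL dept_no, which, like PARTITION BY, puts all NULLs into one group of their own.

```python
from collections import defaultdict
from decimal import Decimal
from statistics import median

employees = [  # (last_name, dept_no, salary); None stands for a NULL dept_no
    ("Jones", None, "32800.50"), ("Smythe", 10, "64300.00"),
    ("Chambers", 100, "48850.00"), ("Coffing", 200, "41888.88"),
    ("Smith", 200, "48000.00"), ("Larkins", 300, "40200.00"),
    ("Strickling", 400, "54500.00"), ("Reilly", 400, "36000.00"),
    ("Harrison", 400, "54500.00"),
]
by_dept = defaultdict(list)
for _, dept_no, salary in employees:
    by_dept[dept_no].append(Decimal(salary))  # NULLs group together

dept_median = {dept: median(vals) for dept, vals in by_dept.items()}
for last_name, dept_no, salary in employees:
    print(last_name, dept_no, salary, dept_median[dept_no])
```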


PERCENTILE_CONT Function Description and Syntax

The PERCENTILE_CONT window function is an inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification and returns an interpolated value that would fall into the given percentile value with respect to the sort specification.

PERCENTILE_CONT computes a linear interpolation between values after ordering them.

Using the percentile value (P) and the number of not null rows (N) in the aggregation group, the function computes the row number after ordering the rows according to the sort specification. This row number (RN) is computed according to the formula
RN = 1 + (P * (N - 1)).

The result of the aggregate function is computed by linear interpolation between the values from rows at row numbers

CRN = CEILING(RN) and FRN = FLOOR(RN).

Syntax for PERCENTILE_CONT:
PERCENTILE_CONT ( percentile )
WITHIN GROUP (ORDER BY expr)
OVER ( [ PARTITION BY expr_list ] )

Above are the description and syntax of the PERCENTILE_CONT function.


Final Result Information About PERCENTILE_CONT

PERCENTILE_CONT computes a linear interpolation between values after ordering them.

Using the percentile value (P) and the number of not null rows (N) in the aggregation group, the function computes the row number after ordering the rows according to the sort specification. This row number (RN) is computed according to the formula
RN = 1 + (P * (N - 1)).

The result of the aggregate function is computed by linear interpolation between the values from rows at row numbers

CRN = CEILING(RN) and FRN = FLOOR(RN).

The result will be as follows.

If (CRN = FRN = RN), then the result is the value of the expression from the row at RN.

Else, the result is:

(CRN - RN) * (value of expression for the row at FRN) + (RN - FRN) * (value of expression for the row at CRN).

Above are the PERCENTILE_CONT final result rules.
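The RN, FRN and CRN rules translate almost line for line into code. This is a minimal Python sketch of those rules, not the Databricks implementation. It assumes p is passed as a Decimal so the arithmetic stays exact, and percentile_cont here is a hypothetical helper. For product_id 1000 (N = 7) and P = 0.4, RN = 1 + 0.4 * 6 = 3.4, so FRN = 3 and CRN = 4, and the result is (4 - 3.4) * 40200.43 + (3.4 - 3) * 48850.40 = 43660.418. The examples that follow display this value as 43660.42.

```python
import math
from decimal import Decimal


def percentile_cont(values, p):
    """Follows the RN / FRN / CRN rules above; p should be a Decimal."""
    ordered = sorted(v for v in values if v is not None)  # NULLs are ignored
    n = len(ordered)
    rn = 1 + p * (n - 1)  # 1-based row number, possibly fractional
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:  # RN lands exactly on a row
        return ordered[frn - 1]
    return (crn - rn) * ordered[frn - 1] + (rn - frn) * ordered[crn - 1]


product_1000 = [Decimal(s) for s in (
    "48850.40", "54500.22", "36000.07", "40200.43",
    "32800.50", "64300.00", "54553.10",
)]
print(percentile_cont(product_1000, Decimal("0.5")))
print(percentile_cont(product_1000, Decimal("0.4")).quantize(Decimal("0.01")))
```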



PERCENTILE_CONT Function Arguments

Syntax for PERCENTILE_CONT:
PERCENTILE_CONT ( percentile )
WITHIN GROUP (ORDER BY expr)
OVER ( [ PARTITION BY expr_list ] )

Arguments
percentile - Numeric constant between 0 and 1. Nulls are ignored in the calculation.
WITHIN GROUP ( ORDER BY expr ) - Specifies numeric or date/time values to sort and compute the percentile over.
OVER - Specifies the window partitioning. The OVER clause cannot contain a window ordering or window frame specification.
PARTITION BY expr - Optional argument that sets the range of records for each group in the OVER clause.

Using the percentile value (P) and the number of not null rows (N) in the aggregation group, the function computes the row number after ordering the rows according to the sort specification. This row number (RN) is computed according to the formula
RN = 1 + (P * (N - 1)).

The result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).

Above are the function arguments with additional information to help you make sense of this challenging function.

PERCENTILE_CONT Example

SELECT product_id, sale_date, daily_sales, percentile_cont (0.5)
WITHIN GROUP(order by daily_sales)
OVER () as median
FROM sales_table
WHERE product_id = 1000 ;

product_id sale_date daily_sales median
1000 2000-09-28 48850.40 48850.4
1000 2000-09-29 54500.22 48850.4
1000 2000-09-30 36000.07 48850.4
1000 2000-10-01 40200.43 48850.4
1000 2000-10-02 32800.50 48850.4
1000 2000-10-03 64300.00 48850.4
1000 2000-10-04 54553.10 48850.4

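The above example shows percentile_cont (0.5), which is the median: with N = 7 rows, RN = 1 + 0.5 * 6 = 4, so every row gets the 4th-smallest daily_sales value, 48850.40. These values would be different if percentile_cont were (0.4).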


PERCENTILE_CONT Example with Percentage Change

SELECT product_id, sale_date, daily_sales, percentile_cont (0.4)
WITHIN GROUP(order by daily_sales) OVER () as median
FROM sales_table WHERE product_id = 1000 ;

product_id sale_date daily_sales median
1000 2000-09-28 48850.40 43660.42
1000 2000-09-29 54500.22 43660.42
1000 2000-09-30 36000.07 43660.42
1000 2000-10-01 40200.43 43660.42
1000 2000-10-02 32800.50 43660.42
1000 2000-10-03 64300.00 43660.42
1000 2000-10-04 54553.10 43660.42

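The above example shows percentile_cont (0.4). The column is still aliased median, but it now holds the 40th percentile, 43660.42, which PERCENTILE_CONT interpolates between the 3rd and 4th smallest values (40200.43 and 48850.40). That is why the values differ from the previous example that uses percentile_cont (0.5).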


PERCENTILE_CONT With PARTITION Example

SELECT product_id, sale_date, daily_sales, percentile_cont(0.5)
WITHIN GROUP(order by daily_sales)
OVER (PARTITION BY product_id) as median
FROM sales_table WHERE product_id in (1000, 2000) ;
product_id sale_date daily_sales median
1000 2000-09-28 48850.40 48850.4
1000 2000-09-29 54500.22 48850.4
1000 2000-09-30 36000.07 48850.4
1000 2000-10-01 40200.43 48850.4
1000 2000-10-02 32800.50 48850.4
1000 2000-10-03 64300.00 48850.4
1000 2000-10-04 54553.10 48850.4
2000 2000-09-28 41888.88 43200.18
2000 2000-09-29 48000.00 43200.18
2000 2000-09-30 49850.03 43200.18
2000 2000-10-01 54850.29 43200.18
2000 2000-10-02 36021.93 43200.18
2000 2000-10-03 43200.18 43200.18
2000 2000-10-04 32800.50 43200.18

The above example shows the percentile_cont (0.5) for each Product_ID partition break.

PERCENTILE_CONT With PARTITION and (0.4)

SELECT product_id, sale_date, daily_sales, percentile_cont(0.4)
WITHIN GROUP(order by daily_sales)
OVER (PARTITION BY product_id) as median
FROM sales_table WHERE product_id in (1000, 2000) ;
product_id sale_date daily_sales median
1000 2000-09-28 48850.40 43660.42
1000 2000-09-29 54500.22 43660.42
1000 2000-09-30 36000.07 43660.42
1000 2000-10-01 40200.43 43660.42
1000 2000-10-02 32800.50 43660.42
1000 2000-10-03 64300.00 43660.42
1000 2000-10-04 54553.10 43660.42
2000 2000-09-28 41888.88 42413.4
2000 2000-09-29 48000.00 42413.4
2000 2000-09-30 49850.03 42413.4
2000 2000-10-01 54850.29 42413.4
2000 2000-10-02 36021.93 42413.4
2000 2000-10-03 43200.18 42413.4
2000 2000-10-04 32800.50 42413.4

The above example shows the percentile_cont (0.4) for each Product_ID partition break.

PERCENTILE_DISC Function Description and Syntax

The PERCENTILE_DISC window function is an inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification and returns an element from the given set. For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest cumulative distribution value (with respect to the same sort specification) that is greater than or equal to P.

Syntax for PERCENTILE_DISC:
PERCENTILE_DISC ( percentile )
WITHIN GROUP (ORDER BY expr)
OVER ( [ PARTITION BY expr_list ] )

Above are the description and syntax of the PERCENTILE_DISC function. Unlike PERCENTILE_CONT, it never interpolates, so its result is always a value that actually appears in the data.
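The "smallest value whose cumulative distribution is at least P" rule can be sketched in Python by reusing the CUME_DIST formula from earlier in this chapter. This is a minimal sketch, not the Databricks implementation, and percentile_disc is a hypothetical helper.

```python
from decimal import Decimal


def percentile_disc(values, p):
    """Smallest value whose ascending CUME_DIST is >= p; never interpolates."""
    ordered = sorted(v for v in values if v is not None)  # NULLs are ignored
    n = len(ordered)
    for v in ordered:
        cume_dist = sum(1 for w in ordered if w <= v) / n  # peers share one value
        if cume_dist >= p:
            return v
    return None  # reached only when no non-NULL values exist (for 0 <= p <= 1)


product_1000 = [Decimal(s) for s in (
    "48850.40", "54500.22", "36000.07", "40200.43",
    "32800.50", "64300.00", "54553.10",
)]
print(percentile_disc(product_1000, 0.5))  # 4/7 = 0.57 is the first >= 0.5
print(percentile_disc(product_1000, 0.4))  # 3/7 = 0.43 is the first >= 0.4
```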


PERCENTILE_DISC Example

SELECT product_id, sale_date, daily_sales, percentile_disc (0.5)
WITHIN GROUP (order by daily_sales)
OVER () as p_disc
FROM sales_table
WHERE product_id = 1000 ;

product_id sale_date daily_sales p_disc
1000 2000-09-28 48850.40 48850.4
1000 2000-09-29 54500.22 48850.4
1000 2000-09-30 36000.07 48850.4
1000 2000-10-01 40200.43 48850.4
1000 2000-10-02 32800.50 48850.4
1000 2000-10-03 64300.00 48850.4
1000 2000-10-04 54553.10 48850.4

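The above example shows percentile_disc (0.5). The fourth of the seven sorted values, 48850.40, has a cumulative distribution of 4/7 = 0.57, the first one at or above 0.5, so it is returned for every row. These values would be different if percentile_disc were (0.4).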


PERCENTILE_DISC Example with Percentage Change

SELECT product_id, sale_date, daily_sales, percentile_disc (0.4)
WITHIN GROUP (order by daily_sales) OVER () as p_disc
FROM sales_table WHERE product_id = 1000 ;

product_id sale_date daily_sales p_disc
1000 2000-09-28 48850.40 40200.43
1000 2000-09-29 54500.22 40200.43
1000 2000-09-30 36000.07 40200.43
1000 2000-10-01 40200.43 40200.43
1000 2000-10-02 32800.50 40200.43
1000 2000-10-03 64300.00 40200.43
1000 2000-10-04 54553.10 40200.43

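The above example shows percentile_disc (0.4). The third sorted value, 40200.43, has a cumulative distribution of 3/7 = 0.43, the first one at or above 0.4, so the answer set is different from the previous example that uses a percentile_disc of (0.5).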


PERCENTILE_DISC With PARTITION Example

SELECT product_id, sale_date, daily_sales, percentile_disc (0.5)
WITHIN GROUP (order by daily_sales)
OVER (PARTITION BY product_id) as p_disc
FROM sales_table WHERE product_id in (1000, 2000) ;
product_id sale_date daily_sales p_disc
1000 2000-09-28 48850.40 48850.4
1000 2000-09-29 54500.22 48850.4
1000 2000-09-30 36000.07 48850.4
1000 2000-10-01 40200.43 48850.4
1000 2000-10-02 32800.50 48850.4
1000 2000-10-03 64300.00 48850.4
1000 2000-10-04 54553.10 48850.4
2000 2000-09-28 41888.88 43200.18
2000 2000-09-29 48000.00 43200.18
2000 2000-09-30 49850.03 43200.18
2000 2000-10-01 54850.29 43200.18
2000 2000-10-02 36021.93 43200.18
2000 2000-10-03 43200.18 43200.18
2000 2000-10-04 32800.50 43200.18

The above example shows the percentile_disc (0.5) for each Product_ID partition break.

PERCENTILE_DISC With PARTITION and (0.4)

SELECT product_id, sale_date, daily_sales, percentile_disc (0.4)
WITHIN GROUP(order by daily_sales)
OVER (PARTITION BY product_id) as p_disc
FROM sales_table WHERE product_id IN (1000, 2000) ;
product_id sale_date daily_sales p_disc
1000 2000-09-28 48850.40 40200.43
1000 2000-09-29 54500.22 40200.43
1000 2000-09-30 36000.07 40200.43
1000 2000-10-01 40200.43 40200.43
1000 2000-10-02 32800.50 40200.43
1000 2000-10-03 64300.00 40200.43
1000 2000-10-04 54553.10 40200.43
2000 2000-09-28 41888.88 41888.88
2000 2000-09-29 48000.00 41888.88
2000 2000-09-30 49850.03 41888.88
2000 2000-10-01 54850.29 41888.88
2000 2000-10-02 36021.93 41888.88
2000 2000-10-03 43200.18 41888.88
2000 2000-10-04 32800.50 41888.88

The above example shows the percentile_disc (0.4) for each Product_ID partition break.
Chapter 8 Temporary Tables

Chapter 8 – Temporary Tables

"Life is what happens when you're busy making other plans."

- John Lennon


CREATING A Derived Table

• Exists only within a query
• Materialized by a SELECT Statement inside a query
• Space comes from the User’s space
• Deleted when the query ends

SELECT *
FROM (SELECT AVG(salary) as avgsal
FROM employee_table) AS teratom ;

The parenthesized SELECT together with its name, teratom, is the derived table.

A derived table will have three things:

1) A SELECT statement within parentheses to populate the table with data (SELECT AVG(salary) as avgsal FROM employee_table).
2) A name of the derived table (teratom). The AS keyword is optional and not needed.
3) All columns that are aggregates or calculations must have a column alias (avgsal).

The SELECT statement that creates and populates the derived table is always inside parentheses.


Naming the Derived Table

SELECT *
FROM (SELECT AVG(salary) as avgsal
FROM employee_table) AS teratom ;

The name of the derived table is teratom.

avgsal
46782.153333

A derived table will have three things:

1) A SELECT statement within parentheses to populate the table with data (SELECT AVG(salary) as avgsal FROM employee_table).
2) A name of the derived table (teratom). The AS keyword is optional and not needed.
3) All columns that are aggregates or calculations must have a column alias (avgsal).

In the example above, teratom is the name given to the derived table. The SELECT inside the parentheses materializes the derived table, and avgsal is the alias given to the aggregate AVG(salary).


Aliasing the Column names in the Derived Table

1) The best way to alias a column is in the SELECT statement:

SELECT *
FROM (SELECT AVG(salary) AS avgsal
FROM employee_table) AS teratom ;

2) Aliasing the column(s) can be done after the derived table name, but it is not the best way:

SELECT *
FROM (SELECT AVG(salary)
FROM employee_table) AS teratom(avgsal) ;

In both examples, the derived table must always have a name.

In the example above, TeraTom is the name given to the derived table. All calculations must have an alias. The
first example is a better way to alias the column names.


CREATING A Derived Table using the WITH Command

WITH teratom AS
(SELECT AVG(salary) as avgsal
FROM employee_table)
SELECT * FROM teratom ;

avgsal
46782.153333

Using WITH creates the derived table before we run the query. Write AS after the name teratom, followed by the query in parentheses, and give every derived column an alias. After teratom is built, you must use it in a second SELECT, or the query errors.

A derived table will have three things:

1) A SELECT statement within parentheses to populate the table with data (SELECT AVG(salary) as avgsal FROM employee_table).
2) A name of the derived table (teratom).
3) All columns that are aggregates or calculations must have a column alias (avgsal).

In the example above, TeraTom is the name given to the derived table. The WITH command allows the creation of
the derived table before the actual query runs. You must have two SELECT statements. The first SELECT builds
the table and the second queries the table. All derived table examples shown have the same performance.


Derived Query Examples with Three Different Techniques

1) Aliasing of the column names after the derived table name:

SELECT *
FROM (SELECT dept_no
,AVG(salary)
FROM employee_table
GROUP BY dept_no) teratom (dept_no, avgsal) ;

2) No alias needed for dept_no, but an alias for the calculation:

SELECT *
FROM (SELECT dept_no
, AVG(salary) as avgsal
FROM employee_table
GROUP BY 1) teratom ;

3) An alias for the calculation, using WITH. You must SELECT again, or the query will error:

WITH teratom AS
(SELECT dept_no, AVG(salary) as avgsal
FROM employee_table
GROUP BY dept_no)
SELECT * FROM teratom ;

Notice that every query with a derived table has at least two SELECT statements: one SELECT to populate the derived table, and the other to run the query.
In the examples above, TeraTom is the name given to each derived table. All examples have the same
performance to retrieve the same answer set. The choice is yours.
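The claim that the inline form and the WITH form return the same answer set can be checked quickly. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, not Databricks, with a trimmed two-column copy of employee_table (an assumption for brevity). Form 1 is left out because it relies on a column list after the derived table name, which SQLite's FROM clause does not accept. Form 2 uses GROUP BY dept_no instead of GROUP BY 1, and an ORDER BY is added so the two results can be compared row for row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (dept_no INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?)",
    [(None, 32800.50), (10, 64300.00), (100, 48850.00), (200, 41888.88),
     (200, 48000.00), (300, 40200.00), (400, 54500.00), (400, 36000.00),
     (400, 54500.00)],
)

inline_form = """
    SELECT * FROM (SELECT dept_no, AVG(salary) AS avgsal
                   FROM employee_table
                   GROUP BY dept_no) teratom
    ORDER BY dept_no
"""
with_form = """
    WITH teratom AS (SELECT dept_no, AVG(salary) AS avgsal
                     FROM employee_table
                     GROUP BY dept_no)
    SELECT * FROM teratom
    ORDER BY dept_no
"""
inline_rows = conn.execute(inline_form).fetchall()
with_rows = conn.execute(with_form).fetchall()
print(inline_rows == with_rows)
```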

Most Derived Tables Are Used To Join To Other Tables

SELECT E.*, avgsal
FROM employee_table as E
INNER JOIN
(SELECT dept_no, AVG(salary) as avgsal
FROM employee_table
GROUP BY dept_no) AS teratom
ON E.dept_no = teratom.dept_no ORDER BY E.dept_no ;

E.* selects all columns from E (employee_table), and the SELECT inside the parentheses materializes the derived table.
employee_no dept_no last_name first_name salary avgsal
1000234 10 Smythe Richard 64300.00 64300.000000
1232578 100 Chambers Mandee 48850.00 48850.000000
1324657 200 Coffing Billy 41888.88 44944.440000
1333454 200 Smith John 48000.00 44944.440000
2312225 300 Larkins Loraine 40200.00 40200.000000
1121334 400 Strickling Cletus 54500.00 48333.333333
1256349 400 Harrison Herbert 54500.00 48333.333333
2341218 400 Reilly William 36000.00 48333.333333

The above example shows how derived tables are typically used. Derived tables are great for combining aggregates with detailed data. Above, our derived table teratom held the average salary for each dept_no, and we then joined it to our employee_table. Jones does not appear because his dept_no is NULL, and NULL never equals NULL in a join condition.
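The same join can be run end to end with Python's built-in sqlite3 module, which here stands in for Databricks purely for illustration. The in-memory table is a hypothetical copy loaded with the employee rows shown in this chapter. The SQL keeps the same shape: a derived table of averages joined back to the detail rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)",
    [
        (2000000, None, "Jones", "Squiggy", 32800.50),
        (1000234, 10, "Smythe", "Richard", 64300.00),
        (1232578, 100, "Chambers", "Mandee", 48850.00),
        (1324657, 200, "Coffing", "Billy", 41888.88),
        (1333454, 200, "Smith", "John", 48000.00),
        (2312225, 300, "Larkins", "Loraine", 40200.00),
        (1121334, 400, "Strickling", "Cletus", 54500.00),
        (2341218, 400, "Reilly", "William", 36000.00),
        (1256349, 400, "Harrison", "Herbert", 54500.00),
    ],
)

rows = conn.execute(
    """
    SELECT E.last_name, E.dept_no, teratom.avgsal
    FROM employee_table AS E
    INNER JOIN (SELECT dept_no, AVG(salary) AS avgsal  -- the derived table
                FROM employee_table
                GROUP BY dept_no) AS teratom
      ON E.dept_no = teratom.dept_no
    ORDER BY E.dept_no, E.last_name
    """
).fetchall()
for last_name, dept_no, avgsal in rows:
    print(last_name, dept_no, round(avgsal, 2))
```

Only eight rows come back, because the NULL dept_no row finds no match in the join.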

The Three Components of a Derived Table

SELECT E.*, salary - avgsal as plusminavg
FROM employee_table as E
INNER JOIN
(SELECT dept_no, AVG(salary) as avgsal
FROM employee_table
GROUP BY dept_no) AS teratom
ON E.dept_no = teratom.dept_no ;

The derived table teratom holds:
dept_no avgsal
? 32800.50
10 64300.00
100 48850.00
200 44944.44
300 40200.00
400 48333.33

Derived tables live in memory until the query ends.

1) A derived table will always have a SELECT query to materialize the derived table with data. The SELECT query always starts with an open parenthesis and ends with a close parenthesis.

2) The derived table must be given a name. Above we called our derived table teratom.

3) You will need to define (alias) the columns in the derived table. Above we allowed dept_no to default to dept_no, but we had to specifically alias AVG(salary) as avgsal.

The above example explains the three components of a derived table.


Visualize This Derived Table

SELECT E.*, salary - avgsal as plusminavg
FROM employee_table as E
INNER JOIN
(SELECT dept_no, AVG(salary) as avgsal
FROM employee_table
GROUP BY dept_no) AS teratom
ON E.dept_no = teratom.dept_no
ORDER BY E.dept_no ;

The derived table is built first:
dept_no avgsal
? 32800.50
10 64300.00
100 48850.00
200 44944.44
300 40200.00
400 48333.33

employee_no dept_no last_name first_name salary plusminavg
1000234 10 Smythe Richard 64300.00 0.000000
1232578 100 Chambers Mandee 48850.00 0.000000
1324657 200 Coffing Billy 41888.88 -3055.560000
1333454 200 Smith John 48000.00 3055.560000
2312225 300 Larkins Loraine 40200.00 0.000000
1121334 400 Strickling Cletus 54500.00 6166.666667
1256349 400 Harrison Herbert 54500.00 6166.666667
2341218 400 Reilly William 36000.00 -12333.333333

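The derived table is built first. The outer query then joins each employee to the average for that employee's department, and plusminavg (salary - avgsal) shows how far each salary is above (positive) or below (negative) that average. Coffing earns 3055.56 less than the dept_no 200 average, and Smith earns 3055.56 more.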
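The signs of plusminavg are easy to confirm by running the same derived-table query in Python's built-in sqlite3 module, used here only as a stand-in engine. The in-memory table is a hypothetical copy trimmed to last_name, dept_no and salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (last_name TEXT, dept_no INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?)",
    [("Jones", None, 32800.50), ("Smythe", 10, 64300.00),
     ("Chambers", 100, 48850.00), ("Coffing", 200, 41888.88),
     ("Smith", 200, 48000.00), ("Larkins", 300, 40200.00),
     ("Strickling", 400, 54500.00), ("Reilly", 400, 36000.00),
     ("Harrison", 400, 54500.00)],
)

rows = conn.execute(
    """
    SELECT E.last_name, E.salary - teratom.avgsal AS plusminavg
    FROM employee_table AS E
    INNER JOIN (SELECT dept_no, AVG(salary) AS avgsal
                FROM employee_table
                GROUP BY dept_no) AS teratom
      ON E.dept_no = teratom.dept_no
    ORDER BY E.dept_no
    """
).fetchall()
# Positive means above the department average, negative means below it.
diffs = {last_name: round(diff, 2) for last_name, diff in rows}
print(diffs)
```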

Our Join Example Using The WITH Syntax

WITH teratom AS
(SELECT dept_no
, AVG(salary) as avgsal
FROM employee_table
GROUP BY dept_no)
SELECT E.*, avgsal
FROM employee_table as E
INNER JOIN
teratom
ON E.dept_no = teratom.dept_no
ORDER BY E.dept_no ;

The derived table is built first:
dept_no avgsal
? 32800.50
10 64300.00
100 48850.00
200 44944.44
300 40200.00
400 48333.33

Now, the lower portion of the query refers to teratom almost like it is a permanent table, but it is not!

The WITH syntax is nice because you build the derived table right away.


An Example of Two Derived Tables in a Single Query

WITH t (dept_no, avgsal) AS
(SELECT dept_no, AVG(salary) FROM employee_table
GROUP BY dept_no)
SELECT t.dept_no, first_name, last_name,
avgsal, counter
FROM employee_table as E
INNER JOIN
t
ON E.dept_no = t.dept_no
INNER JOIN
(SELECT employee_no, SUM(1) OVER(PARTITION BY dept_no
ORDER BY dept_no, last_name Rows Unbounded Preceding)
FROM employee_table) as s (employee_no, counter)
ON E.employee_no = s.employee_no
ORDER BY t.dept_no;

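Here, t holds the average salary for each dept_no, and s uses a running SUM(1) to number the employees within each department in last_name order. You can have as many derived tables as you need.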


Chapter 9 Subqueries

Chapter 9 – Subqueries

“An invasion of Armies can be resisted, but not an idea whose time has come.”

- Victor Hugo


An IN List is much like a Subquery

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT * FROM employee_table
WHERE dept_no IN (100, 200) ;

employee_no dept_no last_name first_name salary
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00

This query is straightforward and easy to understand. It uses an IN-List to find all employees who are in dept_no
100 or dept_no 200.


An IN List Never has Duplicates – Just like a Subquery

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT *
FROM employee_table
WHERE dept_no IN (100, 100, 200, 200) ;

What is going on with this IN list? Why in the world are there duplicates in it? Will this query even work? What will the result set look like when it returns? Turn the page!


An IN List Ignores Duplicates

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT * FROM employee_table
WHERE dept_no IN (100, 100, 200, 200) ;

employee_no dept_no last_name first_name salary
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00

The answer set still produced only three rows.

The system ignores duplicate values in an IN list, so we get the same three rows back as before.
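A quick way to see that duplicates in an IN list change nothing is to run both lists side by side. This sketch uses Python's built-in sqlite3 module as a stand-in engine, with a hypothetical in-memory employee_table trimmed to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, last_name TEXT)")
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?)",
    [(2000000, None, "Jones"), (1000234, 10, "Smythe"), (1232578, 100, "Chambers"),
     (1324657, 200, "Coffing"), (1333454, 200, "Smith"), (2312225, 300, "Larkins"),
     (1121334, 400, "Strickling"), (2341218, 400, "Reilly"), (1256349, 400, "Harrison")],
)

plain = conn.execute(
    "SELECT last_name FROM employee_table WHERE dept_no IN (100, 200) ORDER BY last_name"
).fetchall()
with_duplicates = conn.execute(
    "SELECT last_name FROM employee_table "
    "WHERE dept_no IN (100, 100, 200, 200) ORDER BY last_name"
).fetchall()
print(plain)
print(with_duplicates)
```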


The Subquery

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT *
FROM employee_table
WHERE dept_no IN (
SELECT dept_no
FROM department_table) ;

There is a Top Query and a Bottom Query! Which query runs first: the top or the bottom?

The query above is a Subquery, meaning one query (the bottom query) is nested inside another (the top query) in the same SQL statement. The bottom query runs first, and its purpose in life is to build a distinct list of values that it passes to the top query. The top query then returns the result set. This query solves the problem: Show all employees in valid departments!

The Three Steps of How a Basic Subquery Works

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Step 1: The Bottom Query runs first!
SELECT *
FROM employee_table
WHERE dept_no IN (
SELECT dept_no
FROM department_table) ;

Step 2: The result (100, 200, 300, 400, 500) is passed to the top query!

Step 3: The top query runs using the bottom query answer set:
SELECT * FROM employee_table
WHERE dept_no IN (100, 200, 300, 400, 500) ;

The bottom query runs first and builds a distinct IN list. Then, the top query runs using the list.
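You can act out the three steps by hand. This sketch uses Python's sqlite3 module as a stand-in for Databricks. The step-by-step order is the book's mental model; a real optimizer may execute the subquery differently, for example as a semi-join. The sketch runs the bottom query on its own and pastes its distinct values into a literal IN list. It then checks that the top query returns exactly what the single subquery statement returns.

```python
import sqlite3

# Rows from employee_table and department_table above; None is the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [
    (100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
    (400, "Customer Support"), (500, "Human Resources"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
conn.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# Step 1: the bottom query runs on its own and yields a distinct list of values.
in_list = sorted({row[0] for row in conn.execute("SELECT dept_no FROM department_table")})

# Steps 2 and 3: the list is handed to the top query as a literal IN list.
placeholders = ", ".join("?" for _ in in_list)
by_hand = sorted(conn.execute(
    f"SELECT * FROM employee_table WHERE dept_no IN ({placeholders})", in_list
))

# The same request written as one subquery; the database does all three steps.
as_subquery = sorted(conn.execute(
    "SELECT * FROM employee_table "
    "WHERE dept_no IN (SELECT dept_no FROM department_table)"
))

print(in_list)                                   # [100, 200, 300, 400, 500]
print(len(as_subquery), by_hand == as_subquery)  # 7 True
```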

These are Equivalent Queries

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Query 1:
SELECT *
FROM employee_table
WHERE dept_no IN (
SELECT dept_no FROM department_table) ;

Query 2:
SELECT * FROM employee_table
WHERE dept_no IN (100, 200, 300, 400, 500) ;

Both queries above return the same answer set. Query two has the values hard-coded in an IN list. Query one runs a subquery to build the values in the IN list.

The Final Answer Set from the Subquery

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Notice that no employees are in dept 500.

SELECT * FROM employee_table WHERE dept_no
IN (SELECT dept_no FROM department_table)

employee_no dept_no last_name first_name salary
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1256349 400 Harrison Herbert 54500.00
2341218 400 Reilly William 36000.00
1121334 400 Strickling Cletus 54500.00

Remember that a subquery's columns never appear in the final answer set; only the top query's columns do.

Quiz - Answer the Difficult Question

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

How are Subqueries similar to Joins between two tables?

A great question is above. How are subqueries similar to joins? Do you know the answer? Turn the page!

Answer to Quiz - Answer the Difficult Question

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

(dept_no is the Primary Key of department_table and the matching Foreign Key in employee_table.)

How are Subqueries similar to Joins between two tables?
A Subquery between two tables or a Join between two tables will each need a common key that represents the relationship. This is called a Primary Key/Foreign Key relationship.

A Subquery will use a common key linking the two tables together, very similar to a join! When writing a subquery between two tables, look for the common link between them. The two tables commonly both have a column with the same name, but not always.

Should you use a Subquery or a Join?

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

When do I Subquery? When do I perform a Join?

Subquery:
SELECT *
FROM employee_table
WHERE dept_no IN (
SELECT dept_no
FROM department_table) ;

Join:
SELECT E.*, department_name
FROM employee_table as E
Inner Join
department_table as D
ON E.dept_no = D.dept_no;

Both queries above return the same employee rows; the Join also returns department_name from the department_table. If you only want a report where the final result set has columns from one table, try a subquery. If the final result set needs columns from both tables, you must do a Join.
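A quick way to see the trade-off is to run both statements side by side. This sketch uses Python's sqlite3 module as a stand-in database, loaded with the sample rows above. Both statements return the same seven employees, but only the Join's answer set carries a column from department_table.

```python
import sqlite3

# Sample rows from employee_table and department_table; None is the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [
    (100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
    (400, "Customer Support"), (500, "Human Resources"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
conn.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

subquery_rows = conn.execute(
    "SELECT * FROM employee_table "
    "WHERE dept_no IN (SELECT dept_no FROM department_table) "
    "ORDER BY employee_no"
).fetchall()

join_cursor = conn.execute(
    "SELECT E.*, department_name "
    "FROM employee_table AS E "
    "INNER JOIN department_table AS D ON E.dept_no = D.dept_no "
    "ORDER BY E.employee_no"
)
join_rows = join_cursor.fetchall()
join_columns = [col[0] for col in join_cursor.description]

# Same employees either way...
same_employees = [r[0] for r in subquery_rows] == [r[0] for r in join_rows]
# ...but the subquery returns only employee_table's 5 columns; the join adds a 6th.
print(same_employees, len(subquery_rows[0]), len(join_rows[0]))  # True 5 6
print(join_columns[-1])  # department_name
```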

Quiz - Write the Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery: Select all columns in the customer_table if the customer has placed an order.

Here is your opportunity to show how smart you are. Write a Subquery that will bring back everything from the customer_table if the customer has placed an order in the order_table. Good luck! Advice: Look for the key shared by both tables!

Answer to Quiz - Write the Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery: Select all columns in the customer_table if the customer has placed an order!

SELECT customer_number
,customer_name
FROM customer_table
WHERE customer_number IN
( SELECT customer_number
FROM order_table)
ORDER BY customer_number;

customer_number customer_name
11111111 Billy's Best Choice
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U

The shared key among both tables is customer_number. The bottom query runs first and delivers a distinct list of
customer numbers, which the top query uses in the IN-List!

Quiz - Write the More Difficult Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery: Select all columns in the customer_table if the customer has placed an order over $10,000.00!

Here is your opportunity to show how smart you are. Write a Subquery that will bring back everything from the
customer_table if the customer has placed an order in the order_table that is greater than $10,000.00.

Answer to Quiz - Write the More Difficult Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery: Select all columns in the customer_table if the customer has placed an order over $10,000.00!

SELECT *
FROM customer_table
WHERE customer_number IN (
SELECT customer_number
FROM order_table
WHERE order_total > 10000.00) ;

customer_number customer_name
11111111 Billy's Best Choice
57896883 XYZ Plumbing
87323456 Databases N-U

Here is your answer!
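You can check both customer answers with a small sketch. It loads only the columns shown for customer_table and order_table into an in-memory SQLite database through Python's sqlite3 module, which stands in for Databricks here. It then runs the subqueries from this quiz and the previous one. Customer 11111111 has two orders but still comes back once, because the IN list is matched, not joined.

```python
import sqlite3

# Sample rows from customer_table and order_table (only the columns shown above).
CUSTOMERS = [
    (11111111, "Billy's Best Choice"),
    (31313131, "Acme Products"),
    (31323134, "ACE Consulting"),
    (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
]
ORDERS = [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT)")
conn.execute(
    "CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, "
    "order_total REAL)"
)
conn.executemany("INSERT INTO customer_table VALUES (?, ?)", CUSTOMERS)
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", ORDERS)

# Customers who placed any order (previous quiz).
any_order = conn.execute(
    "SELECT customer_number, customer_name FROM customer_table "
    "WHERE customer_number IN (SELECT customer_number FROM order_table) "
    "ORDER BY customer_number"
).fetchall()

# Customers with at least one order over $10,000.00 (this quiz).
big_order = conn.execute(
    "SELECT * FROM customer_table "
    "WHERE customer_number IN ("
    "  SELECT customer_number FROM order_table WHERE order_total > 10000.00) "
    "ORDER BY customer_number"
).fetchall()

print([name for _, name in any_order])
print([name for _, name in big_order])
```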

Quiz – Write the Extreme Subquery


course_table
course_id course_name credits seats
100 Database Concepts 3 50
200 Introduction to SQL 3 20
210 Advanced SQL 3 22
220 V2R3 SQL Features 2 25
300 Physical Database Design 4 20
400 Database Administration 4 16

student_course_table
student_id course_id
280023 210
231222 210
125634 100
231222 220
125634 200
322133 220
125634 220
322133 300
324652 200
333450 500
260000 400
333450 400
234121 100
123250 100

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

Write SQL that will bring back an answer set that selects all columns from
the student_table if that student is taking a course that has four (4) credits.

Use a subquery to get the answer set requested above. The answer is on the next page.

Answer To Quiz – Write the Extreme Subquery

SELECT *
FROM student_table
WHERE student_id IN
(SELECT student_id
FROM student_course_table
WHERE course_id IN
(SELECT course_id
FROM course_table
WHERE credits=4))

student_id last_name first_name class_code grade_pt
260000 Johnson Stanley ? ?
322133 Bond Jimmy JR 3.95
333450 Smith Andy SO 2.00

Above is something to enjoy and learn from in your quest to master subqueries.
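You can check the nesting level by level. This sketch uses Python's sqlite3 module as a stand-in database, loaded with the three sample tables above; None stands for the null class_code and grade_pt. It runs the innermost query on its own and then the whole statement, with an ORDER BY added for a stable result.

```python
import sqlite3

COURSES = [
    (100, "Database Concepts", 3, 50),
    (200, "Introduction to SQL", 3, 20),
    (210, "Advanced SQL", 3, 22),
    (220, "V2R3 SQL Features", 2, 25),
    (300, "Physical Database Design", 4, 20),
    (400, "Database Administration", 4, 16),
]
STUDENT_COURSES = [
    (280023, 210), (231222, 210), (125634, 100), (231222, 220),
    (125634, 200), (322133, 220), (125634, 220), (322133, 300),
    (324652, 200), (333450, 500), (260000, 400), (333450, 400),
    (234121, 100), (123250, 100),
]
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00),
    (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88),
    (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35),
    (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00),
    (123250, "Phillips", "Martin", "SR", 3.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_table (course_id INTEGER, course_name TEXT, "
             "credits INTEGER, seats INTEGER)")
conn.execute("CREATE TABLE student_course_table (student_id INTEGER, course_id INTEGER)")
conn.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
             "first_name TEXT, class_code TEXT, grade_pt REAL)")
conn.executemany("INSERT INTO course_table VALUES (?, ?, ?, ?)", COURSES)
conn.executemany("INSERT INTO student_course_table VALUES (?, ?)", STUDENT_COURSES)
conn.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)

# Innermost query on its own: the four-credit courses.
four_credit = [r[0] for r in conn.execute(
    "SELECT course_id FROM course_table WHERE credits = 4 ORDER BY course_id")]

# The full statement from the answer above.
rows = conn.execute(
    "SELECT * FROM student_table WHERE student_id IN "
    " (SELECT student_id FROM student_course_table WHERE course_id IN "
    "   (SELECT course_id FROM course_table WHERE credits = 4)) "
    "ORDER BY student_id"
).fetchall()

print(four_credit)  # [300, 400]
for row in rows:
    print(row)
```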

Quiz - Write the Subquery with an Aggregate

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Write the Subquery: Select all columns in the employee_table if the employee makes a greater salary than the AVERAGE salary.

Another opportunity knocking! Would someone please answer the query door?

Answer to Quiz - Write the Subquery with an Aggregate

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Select all columns in the employee_table if the employee makes a greater salary than the AVERAGE salary.

SELECT *
FROM employee_table
WHERE salary > (
SELECT AVG(salary)
FROM employee_table) ;

Nailed it!
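Here the bottom query returns a single value, so the top query compares against it with > instead of IN. The sketch below uses Python's sqlite3 module as a stand-in database with the sample rows above. It computes that single value on its own and then runs the full statement. AVG counts all nine salaries, including the employee whose dept_no is null.

```python
import sqlite3

# employee_table rows from above; None stands in for the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

# The bottom query alone: one row, one column.
(avg_salary,) = conn.execute("SELECT AVG(salary) FROM employee_table").fetchone()

# The full statement from the answer above, ordered for a stable result.
above_avg = conn.execute(
    "SELECT * FROM employee_table "
    "WHERE salary > (SELECT AVG(salary) FROM employee_table) "
    "ORDER BY employee_no"
).fetchall()

print(round(avg_salary, 2))  # 46782.15
for row in above_avg:
    print(row)
```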

Quiz - Write the Correlated Subquery

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Write the Correlated Subquery: Select all columns in the employee_table if the employee makes a greater salary than the AVERAGE salary (within their own Department).

Another opportunity knocking! The query is complicated. Only the best get this written correctly.

Answer to Quiz - Write the Correlated Subquery

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Select all columns in the employee_table if the employee makes a greater salary than the AVERAGE salary (within their own Department).

SELECT * FROM employee_table as EE
WHERE salary > (
SELECT AVG(salary)
FROM employee_table as EEEE
WHERE EE.dept_no = EEEE.dept_no) ;

The above example is a correlated subquery. It works differently from normal subqueries.

The Basics of a Correlated Subquery

The Top Query is Co-Related (Correlated) with the Bottom Query.
The table name from the top query and the table name from the bottom query are given different aliases.
The bottom query WHERE clause co-relates dept_no from Top and Bottom.
The top query is run first.
The bottom query is run one time for each distinct value delivered from the top query.

SELECT * FROM employee_table as EE
WHERE salary > (
SELECT AVG(salary)
FROM employee_table as EEEE
WHERE EE.dept_no = EEEE.dept_no) ;

The example above is a correlated subquery. It works differently from normal subqueries: conceptually, it runs the top query first and then runs the bottom query for each distinct dept_no.
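To make the "once per distinct dept_no" idea concrete, this sketch uses Python's sqlite3 module as a stand-in database with the sample rows above. It first runs the correlated subquery as written. It then emulates the book's mental model in plain Python: one AVG query per distinct department, followed by a filter over the top query's rows. The emulation is illustrative only; an optimizer is free to evaluate a correlated subquery in other ways, such as rewriting it as a join.

```python
import sqlite3

# employee_table rows from above; None stands in for the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

# The correlated subquery as written above (aliases EE and EEEE).
correlated = conn.execute(
    "SELECT * FROM employee_table AS EE "
    "WHERE salary > ("
    "  SELECT AVG(salary) FROM employee_table AS EEEE "
    "  WHERE EE.dept_no = EEEE.dept_no) "
    "ORDER BY employee_no"
).fetchall()

# Emulating the mental model: run the bottom query once per distinct dept_no...
dept_avg = {}
distinct_depts = conn.execute(
    "SELECT DISTINCT dept_no FROM employee_table WHERE dept_no IS NOT NULL"
).fetchall()
for (dept,) in distinct_depts:
    (dept_avg[dept],) = conn.execute(
        "SELECT AVG(salary) FROM employee_table WHERE dept_no = ?", (dept,)
    ).fetchone()

# ...then keep each top-query row whose salary beats its own department's average.
# A null dept_no matches no department, so that row is skipped.
emulated = [
    row
    for row in conn.execute("SELECT * FROM employee_table ORDER BY employee_no")
    if row[1] in dept_avg and row[4] > dept_avg[row[1]]
]

print({d: round(a, 2) for d, a in sorted(dept_avg.items())})
print([row[2] for row in correlated])  # ['Strickling', 'Harrison', 'Smith']
```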

The Top Query always runs first in a Correlated Subquery

SELECT * FROM employee_table as EE
WHERE salary > (
SELECT AVG(salary)
FROM employee_table as EEEE
WHERE EE.dept_no = EEEE.dept_no)

The Top Query runs first: SELECT * FROM employee_table as EE
The bottom query then runs one time for each distinct dept_no, matching on EE.dept_no = EEEE.dept_no.

Top query rows:
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50 (Null is skipped)
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Bottom query result for each distinct dept_no:
dept_no avgsal
10 64300.00
100 48850.00
200 44944.44
300 40200.00
400 48333.33

Only these three employees make more than the AVG salary within their own department:
employee_no dept_no last_name first_name salary
1333454 200 Smith John 48000.00
1256349 400 Harrison Herbert 54500.00
1121334 400 Strickling Cletus 54500.00

Correlated Subquery Example vs. a Join with a Derived Table

Correlated Subquery:
SELECT last_name, dept_no, salary
FROM employee_table as EE
WHERE salary > (
SELECT AVG(salary)
FROM employee_table as EEEE
WHERE EE.dept_no = EEEE.dept_no)

last_name dept_no salary
Smith 200 48000.00
Harrison 400 54500.00
Strickling 400 54500.00

Join with a Derived Table:
SELECT last_name, dept_no,
salary, avgsal
FROM employee_table as E
INNER JOIN
(SELECT dept_no, AVG(salary)
FROM employee_table
GROUP BY dept_no)
teratom (depty, avgsal)
ON dept_no = depty
AND salary > avgsal ;

last_name dept_no salary avgsal
Smith 200 48000.00 44944.44
Harrison 400 54500.00 48333.33
Strickling 400 54500.00 48333.33

Both queries above will bring back all employees making a salary that is greater than the average salary in their
department. The most significant difference is that the Join with the Derived Table also shows the average salary
in the result set.
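The sketch below runs both forms against the sample rows, using Python's sqlite3 module as a stand-in database. SQLite does not accept the book's teratom (depty, avgsal) column-alias list on a derived table, so the sketch names the columns with AS inside the derived table instead. That substitution is a dialect assumption, not a change in logic. The two answer sets match on last_name, dept_no, and salary; only the join adds avgsal.

```python
import sqlite3

# employee_table rows from above; None stands in for the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

correlated = conn.execute(
    "SELECT last_name, dept_no, salary FROM employee_table AS EE "
    "WHERE salary > (SELECT AVG(salary) FROM employee_table AS EEEE "
    "                WHERE EE.dept_no = EEEE.dept_no) "
    "ORDER BY last_name"
).fetchall()

# Derived table: one (depty, avgsal) row per department, joined back to employees.
joined = conn.execute(
    "SELECT last_name, dept_no, salary, avgsal "
    "FROM employee_table AS E "
    "INNER JOIN (SELECT dept_no AS depty, AVG(salary) AS avgsal "
    "            FROM employee_table GROUP BY dept_no) AS teratom "
    "ON dept_no = depty AND salary > avgsal "
    "ORDER BY last_name"
).fetchall()

print(correlated)
for last_name, dept_no, salary, avgsal in joined:
    print(last_name, dept_no, salary, round(avgsal, 2))
```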

Quiz - A Second Chance To Write a Correlated Subquery

sales_table
product_id sale_date daily_sales
1000 10/02/2000 32800.50
1000 09/30/2000 36000.07
1000 10/01/2000 40200.43
2000 10/04/2000 32800.50
2000 10/02/2000 36021.93
2000 09/28/2000 41888.88
3000 10/04/2000 15675.33
3000 10/02/2000 19678.94
3000 10/03/2000 21553.79
(Not all rows are displayed.)

Write the Correlated Subquery: Select all columns in the sales_table if the daily_sales column is greater than the Average daily_sales within its own product_id.

Another opportunity knocking! You now have a second chance. I will even give you a third chance.

Answer - A Second Chance to Write a Correlated Subquery

SELECT * FROM sales_table as TopS
WHERE daily_sales > (
SELECT AVG(daily_sales)
FROM sales_table as BotS
WHERE TopS.product_id = BotS.product_id)
ORDER BY product_id, sale_date ;

Select all columns in the sales_table if the daily_sales column is greater than the Average daily_sales within its own product_id.

Answer Set:
product_id sale_date daily_sales
1000 2000-09-28 48850.40
1000 2000-09-29 54500.22
1000 2000-10-03 64300.00
1000 2000-10-04 54553.10
2000 2000-09-29 48000.00
2000 2000-09-30 49850.03
2000 2000-10-01 54850.29
3000 2000-09-28 61301.77
3000 2000-09-29 34509.13
3000 2000-09-30 43868.86

All you must do is alias both tables and then correlate in the WHERE clause.

Quiz - A Third Chance To Write a Correlated Subquery

sales_table
product_id sale_date daily_sales
1000 10/02/2000 32800.50
1000 09/30/2000 36000.07
1000 10/01/2000 40200.43
2000 10/04/2000 32800.50
2000 10/02/2000 36021.93
2000 09/28/2000 41888.88
3000 10/04/2000 15675.33
3000 10/02/2000 19678.94
3000 10/03/2000 21553.79
(Not all rows are displayed.)

Write the Correlated Subquery: Select all columns in the sales_table if the daily_sales column is greater than the Average daily_sales within its own sale_date.

Another opportunity knocking! Practicing complicated queries makes you stronger.

Answer - A Third Chance to Write a Correlated Subquery

SELECT * FROM sales_table as TopS
WHERE daily_sales > (
SELECT AVG(daily_sales)
FROM sales_table as BotS
WHERE TopS.sale_date = BotS.sale_date)
ORDER BY sale_date ;

Select all columns in the sales_table if the daily_sales column is greater than the Average daily_sales within its own sale_date.

product_id sale_date daily_sales
3000 2000-09-28 61301.77
2000 2000-09-29 48000.00
1000 2000-09-29 54500.22
3000 2000-09-30 43868.86
2000 2000-09-30 49850.03
2000 2000-10-01 54850.29
2000 2000-10-02 36021.93
1000 2000-10-02 32800.50
2000 2000-10-03 43200.18
1000 2000-10-03 64300.00
1000 2000-10-04 54553.10

All you must do is alias both tables and then correlate in the WHERE clause.

Quiz - Last Chance To Write a Correlated Subquery

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
Write the Correlated Subquery: Select all columns in the student_table if the grade_pt column is greater than the Average grade_pt within its own class_code.

Another opportunity knocking! Get this down, and your future will skyrocket.

Answer – Last Chance to Write a Correlated Subquery

SELECT * FROM student_table as TopS
WHERE grade_pt > (
SELECT AVG(grade_pt)
FROM student_table as BotS
WHERE TopS.class_code = BotS.class_code )
ORDER BY class_code ;

Select all columns in the student_table if the grade_pt column is greater than the Average grade_pt within its own class_code.

student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
322133 Bond Jimmy JR 3.95
231222 Wilson Susie SO 3.80
324652 Delaney Danny SR 3.35

All you have to do is alias both tables and then correlate in the WHERE clause.
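Here is a quick check of this answer set. The sketch uses Python's sqlite3 module as a stand-in database, loaded with the student_table rows above; None stands for Johnson's null class_code and grade_pt. The null class_code never equals anything, so Johnson has no class average to beat and drops out. A second ORDER BY key (student_id) is added only to make the row order predictable.

```python
import sqlite3

STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00),
    (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88),
    (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35),
    (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00),
    (123250, "Phillips", "Martin", "SR", 3.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
             "first_name TEXT, class_code TEXT, grade_pt REAL)")
conn.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)

# The correlated subquery from the answer above.
rows = conn.execute(
    "SELECT * FROM student_table AS TopS "
    "WHERE grade_pt > (SELECT AVG(grade_pt) FROM student_table AS BotS "
    "                  WHERE TopS.class_code = BotS.class_code) "
    "ORDER BY class_code, student_id"
).fetchall()

# The per-class averages the bottom query works out, for reference.
class_avg = dict(conn.execute(
    "SELECT class_code, AVG(grade_pt) FROM student_table "
    "WHERE class_code IS NOT NULL GROUP BY class_code"
).fetchall())

print({code: round(avg, 3) for code, avg in sorted(class_avg.items())})
for row in rows:
    print(row)
```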

Quiz – Write the Extreme Correlated Subquery


Write a correlated subquery that will bring back an answer set that returns all columns from the course_table if that course is being taken by a student who has a greater than average grade point within their own class code.

course_table
course_id course_name credits seats
100 Database Concepts 3 50
200 Introduction to SQL 3 20
210 Advanced SQL 3 22
220 V2R3 SQL Features 2 25
300 Physical Database Design 4 20
400 Database Administration 4 16

student_course_table
student_id course_id
280023 210
231222 210
125634 100
231222 220
125634 200
322133 220
125634 220
322133 300
324652 200
333450 500
260000 400
333450 400
234121 100
123250 100

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00

Use a subquery to get the answer set requested above. No joins allowed! The answer is on the next page.

Answer To Quiz – Write the Extreme Correlated Subquery

SELECT *
FROM course_table
WHERE course_id IN
(SELECT course_id
FROM student_course_table
WHERE student_id IN
(SELECT student_id
FROM student_table AS s1
WHERE grade_pt >
(SELECT AVG(grade_pt)
FROM student_table AS s2
WHERE s1.class_code=s2.class_code))) ;
course_id course_name credits seats
200 Introduction to SQL 3 20
100 Database Concepts 3 50
220 V2R3 SQL Features 2 25
300 Physical Database Design 4 20
210 Advanced SQL 3 22

Here is how you do it!
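The nested answer can be checked against the sample data. The sketch below uses Python's sqlite3 module as a stand-in database, loaded with the three tables from the quiz, and runs the statement above with an ORDER BY added for a stable result. Course 400 is missing from the result: its students (260000 and 333450) are not above their class averages. Student 333450 is also enrolled in course 500, which does not exist in course_table.

```python
import sqlite3

COURSES = [
    (100, "Database Concepts", 3, 50),
    (200, "Introduction to SQL", 3, 20),
    (210, "Advanced SQL", 3, 22),
    (220, "V2R3 SQL Features", 2, 25),
    (300, "Physical Database Design", 4, 20),
    (400, "Database Administration", 4, 16),
]
STUDENT_COURSES = [
    (280023, 210), (231222, 210), (125634, 100), (231222, 220),
    (125634, 200), (322133, 220), (125634, 220), (322133, 300),
    (324652, 200), (333450, 500), (260000, 400), (333450, 400),
    (234121, 100), (123250, 100),
]
STUDENTS = [
    (423400, "Larkins", "Michael", "FR", 0.00),
    (231222, "Wilson", "Susie", "SO", 3.80),
    (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95),
    (125634, "Hanson", "Henry", "FR", 2.88),
    (333450, "Smith", "Andy", "SO", 2.00),
    (324652, "Delaney", "Danny", "SR", 3.35),
    (260000, "Johnson", "Stanley", None, None),
    (234121, "Thomas", "Wendy", "FR", 4.00),
    (123250, "Phillips", "Martin", "SR", 3.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_table (course_id INTEGER, course_name TEXT, "
             "credits INTEGER, seats INTEGER)")
conn.execute("CREATE TABLE student_course_table (student_id INTEGER, course_id INTEGER)")
conn.execute("CREATE TABLE student_table (student_id INTEGER, last_name TEXT, "
             "first_name TEXT, class_code TEXT, grade_pt REAL)")
conn.executemany("INSERT INTO course_table VALUES (?, ?, ?, ?)", COURSES)
conn.executemany("INSERT INTO student_course_table VALUES (?, ?)", STUDENT_COURSES)
conn.executemany("INSERT INTO student_table VALUES (?, ?, ?, ?, ?)", STUDENTS)

# Course <- enrollment <- above-average student (correlated on class_code).
rows = conn.execute(
    "SELECT * FROM course_table WHERE course_id IN "
    " (SELECT course_id FROM student_course_table WHERE student_id IN "
    "   (SELECT student_id FROM student_table AS s1 WHERE grade_pt > "
    "     (SELECT AVG(grade_pt) FROM student_table AS s2 "
    "      WHERE s1.class_code = s2.class_code))) "
    "ORDER BY course_id"
).fetchall()

for row in rows:
    print(row)
```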

NOT IN Subquery Returns Nothing when nulls are Present

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

Select all columns in the department_table if the dept_no is not in the employee_table.

SELECT * FROM
department_table
WHERE dept_no NOT IN
(SELECT dept_no
FROM employee_table)

NO DATA RETURNS! This is because the list built by the subquery contains a null, and NOT IN cannot rule a value out against a null.

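When the list built by a NOT IN subquery contains a null value, the query returns nothing. Think of dept_no NOT IN (10, 100, 200, 300, 400, null) as dept_no <> 10 AND dept_no <> 100 AND dept_no <> 200 AND dept_no <> 300 AND dept_no <> 400 AND dept_no <> null. A comparison with a null is never TRUE (it is UNKNOWN), so no row can pass the test: the system can't eliminate a value when it doesn't know what is in the null. The next page shows a technique to get around this problem.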

Fixing a NOT IN Subquery with Null Values

employee_table department_table
employee_no dept_no last_name first_name salary dept_no department_name
2000000 ? Jones Squiggy 32800.50 100 Marketing
1000234 10 Smythe Richard 64300.00 200 Research and Dev
1232578 100 Chambers Mandee 48850.00 300 Sales
1324657 200 Coffing Billy 41888.88 400 Customer Support
1333454 200 Smith John 48000.00 500 Human Resources
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT * FROM department_table
WHERE dept_no NOT IN
(SELECT dept_no FROM employee_table
WHERE dept_no IS NOT NULL)

dept_no department_name mgr_no budget
500 Human Resources 1121334 450000.00

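When a NOT IN subquery encounters a null value in the list, it returns nothing, because the system can't eliminate a value when it doesn't know what is in the null. That is why you put WHERE dept_no IS NOT NULL in the bottom query. It removes the null from the list, so department 500, which has no employees, comes back.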
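You can reproduce both pages in a few lines. This sketch uses Python's sqlite3 module as a stand-in database; SQLite follows the same three-valued logic for NOT IN. department_table is loaded with only the two columns shown earlier, because mgr_no and budget are not listed for every department. The last query shows the root cause directly: the NOT IN test against a list containing NULL comes back as UNKNOWN, which Python receives as None.

```python
import sqlite3

# Rows from employee_table and department_table; None is the null dept_no.
EMPLOYEES = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]
DEPARTMENTS = [
    (100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
    (400, "Customer Support"), (500, "Human Resources"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
conn.executemany("INSERT INTO department_table VALUES (?, ?)", DEPARTMENTS)

# Without the filter: the subquery list contains a null, so nothing qualifies.
unfiltered = conn.execute(
    "SELECT * FROM department_table "
    "WHERE dept_no NOT IN (SELECT dept_no FROM employee_table)"
).fetchall()

# With the filter in the bottom query, the null never reaches the list.
filtered = conn.execute(
    "SELECT * FROM department_table "
    "WHERE dept_no NOT IN (SELECT dept_no FROM employee_table "
    "                      WHERE dept_no IS NOT NULL)"
).fetchall()

# Root cause: 500 is not in the list, but the NULL makes the answer UNKNOWN.
(probe,) = conn.execute("SELECT 500 NOT IN (10, 100, 200, 300, 400, NULL)").fetchone()

print(unfiltered)  # []
print(filtered)    # [(500, 'Human Resources')]
print(probe)       # None
```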

Quiz - Write the NOT Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery: Select all columns in the customer_table if the customer has NOT placed an order.

Write the above query!

Answer to Quiz - Write the NOT Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Select all columns in the customer_table if the customer has NOT placed an order.

SELECT *
FROM customer_table
WHERE customer_number
NOT IN
(SELECT customer_number
FROM order_table
WHERE customer_number IS NOT NULL) ;

Nulls are a NOT IN nightmare. Notice how I account for them!

Whenever you have a NOT IN query, make sure you eliminate null values from being in the list.
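With the sample orders, the answer is Acme Products, and the IS NOT NULL guard makes no difference yet. The sketch below uses Python's sqlite3 module as a stand-in database. It then adds one hypothetical order with an unknown (NULL) customer_number. That row is not part of the book's data; it is only there to show why the guard matters. The helper customers_without_orders is likewise hypothetical.

```python
import sqlite3

# Sample rows from customer_table and order_table (only the columns shown above).
CUSTOMERS = [
    (11111111, "Billy's Best Choice"),
    (31313131, "Acme Products"),
    (31323134, "ACE Consulting"),
    (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
]
ORDERS = [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT)")
conn.execute(
    "CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, "
    "order_total REAL)"
)
conn.executemany("INSERT INTO customer_table VALUES (?, ?)", CUSTOMERS)
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", ORDERS)


def customers_without_orders(guard: bool) -> list:
    """Run the NOT IN subquery, with or without the IS NOT NULL guard."""
    guard_sql = " WHERE customer_number IS NOT NULL" if guard else ""
    return conn.execute(
        "SELECT * FROM customer_table WHERE customer_number NOT IN "
        f"(SELECT customer_number FROM order_table{guard_sql})"
    ).fetchall()


before = customers_without_orders(guard=True)

# Hypothetical order with an unknown customer; NOT part of the book's sample data.
conn.execute("INSERT INTO order_table VALUES (999999, NULL, 100.00)")
unguarded = customers_without_orders(guard=False)
guarded = customers_without_orders(guard=True)

print(before)     # [(31313131, 'Acme Products')]
print(unguarded)  # []
print(guarded)    # [(31313131, 'Acme Products')]
```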

Quiz - Write the Subquery using a WHERE Clause

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery: Select all columns in the order_table that were placed by a customer with ‘Bill’ anywhere in their name.

Another opportunity to show your brilliance is ready for you to make it happen.

Answer - Write the Subquery using a WHERE Clause

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

SELECT *
FROM order_table
WHERE customer_number IN
(SELECT customer_number
FROM customer_table
WHERE customer_name like '%Bill%') ;

Select all columns in the order_table that were placed by a customer with ‘Bill’ anywhere in their name.

order_number customer_number order_date order_total
123456 11111111 2019-05-04 12347.53
123512 11111111 2020-01-01 8005.91

Great job on writing your query!
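A small check of this answer uses Python's sqlite3 module as a stand-in database. order_date is omitted because the order_table shown on these pages lists only three columns. One dialect difference to keep in mind: SQLite's LIKE ignores case for ASCII letters by default, while LIKE in Databricks SQL is case-sensitive. With this data the result is the same either way, because only "Billy's Best Choice" contains "bill" in any case.

```python
import sqlite3

# Sample rows from customer_table and order_table (only the columns shown above).
CUSTOMERS = [
    (11111111, "Billy's Best Choice"),
    (31313131, "Acme Products"),
    (31323134, "ACE Consulting"),
    (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
]
ORDERS = [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT)")
conn.execute(
    "CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, "
    "order_total REAL)"
)
conn.executemany("INSERT INTO customer_table VALUES (?, ?)", CUSTOMERS)
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", ORDERS)

# The WHERE clause lives in the bottom query; the top query never sees customer_name.
rows = conn.execute(
    "SELECT * FROM order_table WHERE customer_number IN "
    " (SELECT customer_number FROM customer_table "
    "  WHERE customer_name LIKE '%Bill%') "
    "ORDER BY order_number"
).fetchall()

for row in rows:
    print(row)
```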

Quiz - Write the Subquery with Two Parameters

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery: What is the highest dollar order for each Customer? This Subquery will involve two parameters!

Get ready to be amazed at either yourself or the answer on the next page!

Answer to Quiz - Write the Subquery with Two Parameters

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

SELECT customer_number, order_number, order_total
FROM order_table
WHERE (customer_number, order_total) IN
(SELECT customer_number, MAX(order_total)
FROM order_table GROUP BY 1) ;

Notice the two parameters in the top query and the two in the bottom.

customer_number order_number order_total
11111111 123456 12347.53
31323134 123552 5111.47
87323456 123585 15231.62
57896883 123777 23454.84

What is the highest dollar order for each Customer? This Subquery involves two parameters. The example above
is how you utilize multiple parameters in a subquery!
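The two-parameter match can be tried with Python's sqlite3 module as a stand-in database. This assumes SQLite 3.15 or newer, which added row values; Python's bundled SQLite is normally much newer. The sketch writes GROUP BY customer_number in place of the ordinal GROUP BY 1; the two are equivalent here. Because the match is on the (customer_number, order_total) pair, a customer with two orders tied at the maximum would get both rows back.

```python
import sqlite3

# Sample order_table rows (only the columns shown above).
ORDERS = [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, "
    "order_total REAL)"
)
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", ORDERS)

# Bottom query on its own: one (customer_number, max order_total) pair per customer.
bottom = conn.execute(
    "SELECT customer_number, MAX(order_total) FROM order_table "
    "GROUP BY customer_number ORDER BY customer_number"
).fetchall()

# Top query: keep orders whose (customer_number, order_total) pair is in that list.
top = conn.execute(
    "SELECT customer_number, order_number, order_total FROM order_table "
    "WHERE (customer_number, order_total) IN "
    "  (SELECT customer_number, MAX(order_total) FROM order_table "
    "   GROUP BY customer_number) "
    "ORDER BY order_number"
).fetchall()

print(bottom)
for row in top:
    print(row)
```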

How the Double Parameter Subquery Works

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

SELECT customer_number, order_number, order_total
FROM order_table
WHERE (customer_number, order_total) IN
(SELECT customer_number, MAX(order_total)
FROM order_table GROUP BY 1) ;

These 4 rows are sent to the top query:
customer_number Max(order_total)
11111111 12347.53
31323134 5111.47
87323456 15231.62
57896883 23454.84

The bottom query runs first, returning two columns. Turn to the next page for more info!

More on how the Double Parameter Subquery Works

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

SELECT customer_number, order_number, order_total


FROM order_table
WHERE (customer_number, order_total ) IN
( (11111111, 12347.53),
(31323134, 5111.47),
(87323456, 15231.62),
(57896883, 23454.84) );

The top query now uses the in-list. Once the bottom query has built it, the top query runs against it and produces the final answer set.
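You can reproduce the two steps by hand. This sketch again uses an in-memory SQLite copy of order_table as a stand-in for Databricks. It runs the bottom query by itself and keeps its rows in a Python set (the in-list), then filters the orders against that set. The names in_list and final_answer are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, order_total REAL);
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
  (123777, 57896883, 23454.84);
""")

# Step 1: the bottom query runs first; its (customer_number, max total) rows form the in-list.
in_list = set(conn.execute(
    "SELECT customer_number, MAX(order_total) FROM order_table GROUP BY 1"
).fetchall())

# Step 2: the top query keeps an order only if its (customer_number, order_total) pair is in the list.
final_answer = [
    (customer_number, order_number, order_total)
    for order_number, customer_number, order_total in conn.execute(
        "SELECT order_number, customer_number, order_total FROM order_table ORDER BY order_number"
    )
    if (customer_number, order_total) in in_list
]

print(sorted(in_list))
print(final_answer)
```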


Another Example of a Double Parameter Subquery

Above, we have a subquery that matches up subscriber_no and member_no because you need both columns to identify an individual policyholder filing a claim. Notice that the two columns are enclosed in parentheses in the top query but not in the bottom query; that is the key to success for double-parameter subqueries.

Quiz – Write the Triple Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Write the Subquery

Which customer_name has the highest dollar order among all customers? This query will have multiple subqueries!

Good luck writing this query. Remember that it will involve multiple subqueries.


Answer to Quiz – Write the Triple Subquery

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Which customer_name has the highest dollar order among all customers? This query will have multiple subqueries!

SELECT customer_name
FROM customer_table
WHERE customer_number IN
(SELECT customer_number FROM order_table
WHERE order_total IN
(SELECT Max(order_total) FROM order_table)) ;

This runs first: the innermost subquery returns the highest order_total, 23454.84.
This runs second: the middle subquery returns the customer_number that placed that order, 57896883.
This runs third: the top query returns the customer_name for that customer_number, XYZ Plumbing.

Of course, the answer is XYZ Plumbing.
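Here is the triple subquery run against in-memory SQLite copies of both sample tables. SQLite stands in for Databricks in this sketch, and Billy's Best Choice is written with a straight apostrophe. The comments describe the conceptual order given above; a real optimizer may evaluate the query differently.

```python
import sqlite3

# In-memory SQLite copies of the sample tables (SQLite stands in for Databricks).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT);
INSERT INTO customer_table VALUES
  (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
  (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
  (87323456, 'Databases N-U');
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, order_total REAL);
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
  (123777, 57896883, 23454.84);
""")

# Innermost query: the highest order_total.
# Middle query: the customer_number that placed it.
# Top query: that customer's name.
answer = conn.execute("""
SELECT customer_name
FROM customer_table
WHERE customer_number IN
  (SELECT customer_number FROM order_table
   WHERE order_total IN (SELECT MAX(order_total) FROM order_table))
""").fetchall()

print(answer)  # [('XYZ Plumbing',)]
```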



Using a Correlated Exists

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Use EXISTS to find which customers have placed an order.

SELECT customer_number, customer_name


FROM customer_table as Top1
WHERE EXISTS
(SELECT * FROM order_table as Bot1
Where Top1.customer_number = Bot1.customer_number ) ;

The EXISTS command returns a Boolean: True if the subquery finds at least one row, False if it finds none. Because the subquery is correlated (it references Top1.customer_number), it is checked for each customer, so only customers who have placed an order return in the answer set. EXISTS is different from IN in that it is less restrictive, as you will soon understand.
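This minimal sketch runs the correlated EXISTS against in-memory SQLite copies of the sample tables, used here as a stand-in for Databricks. The ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT);
INSERT INTO customer_table VALUES
  (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
  (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
  (87323456, 'Databases N-U');
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, order_total REAL);
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
  (123777, 57896883, 23454.84);
""")

# For each customer row (Top1), the correlated subquery looks for an order (Bot1)
# with the same customer_number; EXISTS is True as soon as one such row is found.
has_ordered = conn.execute("""
SELECT customer_number, customer_name
FROM customer_table AS Top1
WHERE EXISTS
  (SELECT * FROM order_table AS Bot1
   WHERE Top1.customer_number = Bot1.customer_number)
ORDER BY customer_number
""").fetchall()

for row in has_ordered:
    print(row)
```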


How a Correlated Exists Matches Up

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Customer 31313131 (Acme Products) does not exist in order_table.

SELECT customer_number, customer_name


FROM customer_table as Top1
WHERE EXISTS
(SELECT * FROM order_table as Bot1
Where Top1.customer_number = Bot1.customer_number ) ;
customer_number customer_name
11111111 Billy’s Best Choice
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U

Only customers who placed an order return with the above Correlated EXISTS.


The Correlated NOT Exists

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Use NOT EXISTS to find which customers have NOT placed an order.

SELECT customer_number, customer_name


FROM customer_table as Top1
WHERE NOT EXISTS
(SELECT * FROM order_table as Bot1
Where Top1.customer_number = Bot1.customer_number ) ;
customer_number customer_name
31313131 Acme Products

The EXISTS command returns a Boolean: True or False. With the correlated NOT EXISTS, the subquery is checked for each customer and is True only when no matching order is found, so only customers who have not placed an order return in the answer set. NOT EXISTS also differs from NOT IN in how it handles null values. If the subquery behind a NOT IN returns a NULL, NOT IN returns no rows at all, whereas NOT EXISTS is not affected.
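The null-handling difference is easiest to see side by side. This sketch uses in-memory SQLite copies of the sample tables as a stand-in for Databricks. It adds one hypothetical extra order, 123999, whose customer_number is NULL; that row is not in the book's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT);
INSERT INTO customer_table VALUES
  (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
  (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
  (87323456, 'Databases N-U');
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, order_total REAL);
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
  (123777, 57896883, 23454.84);
-- Hypothetical extra order whose customer_number is unknown (NULL).
INSERT INTO order_table VALUES (123999, NULL, 100.00);
""")

# NOT EXISTS: True when no order matches; the NULL row never matches, so it is harmless.
not_exists = conn.execute("""
SELECT customer_number, customer_name
FROM customer_table AS Top1
WHERE NOT EXISTS
  (SELECT * FROM order_table AS Bot1
   WHERE Top1.customer_number = Bot1.customer_number)
""").fetchall()

# NOT IN: comparing against the NULL yields unknown, so no customer qualifies.
not_in = conn.execute("""
SELECT customer_number, customer_name
FROM customer_table
WHERE customer_number NOT IN (SELECT customer_number FROM order_table)
""").fetchall()

print(not_exists)  # [(31313131, 'Acme Products')]
print(not_in)      # []
```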

Chapter 10 Strings

Chapter 10 – Strings

"The only way to do great work is to love what you do."

- Steve Jobs


UPPER and lower Functions

The UPPER and LOWER functions convert the input string to either all uppercase or lowercase characters.

Syntax: UPPER(string), LOWER(string)

SELECT first_name
,UPPER (first_name) as upper_case
,lower(first_name) as lower_case
FROM student_table
first_name upper_case lower_case
Martin MARTIN martin
Henry HENRY henry
Susie SUSIE susie
Wendy WENDY wendy
Stanley STANLEY stanley
Richard RICHARD richard
Jimmy JIMMY jimmy
Danny DANNY danny
Andy ANDY andy
Michael MICHAEL michael

The UPPER and LOWER functions convert the input string to either all uppercase or lowercase characters.


The Length Command Counts Characters

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT first_name
, LENGTH(first_name) AS lnth
FROM employee_table
WHERE LENGTH (first_name) < 7
ORDER BY 1;

Answer set (first_name is varchar):
first_name lnth
Billy 5
Cletus 6
John 4
Mandee 6

The LENGTH command counts the number of characters in a string.


LENGTH and TRIM Work on Fixed Length Columns

last_name is a CHAR(20) column.

SELECT last_name
,LENGTH(last_name) AS lnth_wrong
,LENGTH(TRIM(last_name)) as lnth_right
FROM employee_table
ORDER BY 1;
last_name lnth_wrong lnth_right
Chambers 20 8
Coffing 20 7
Harrison 20 8
Jones 20 5
Larkins 20 7
Reilly 20 6
Smith 20 5
Smythe 20 6
Strickling 20 10

Because last_name is a CHAR(20) column, LENGTH counts the padding spaces and brings back a length of 20 for every name. Wrapping the column in TRIM removes the leading and trailing spaces first, so LENGTH returns the true length of each name.
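This Python sketch shows why the padding matters. It mimics a CHAR(20) column by padding each name with trailing spaces. The helper as_char20 is hypothetical; len() plays the role of LENGTH and str.strip() the role of TRIM.

```python
def as_char20(value: str) -> str:
    """Hypothetical helper: store a value the way a CHAR(20) column does, padded with trailing spaces."""
    return value.ljust(20)


for name in ["Chambers", "Coffing", "Strickling"]:
    stored = as_char20(name)
    lnth_wrong = len(stored)          # like LENGTH(last_name): the padding is counted
    lnth_right = len(stored.strip())  # like LENGTH(TRIM(last_name))
    print(name, lnth_wrong, lnth_right)
```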


The Char_Length Command Counts Characters

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT first_name
,CHAR_LENGTH(first_name) AS lnth
FROM employee_table
WHERE CHAR_LENGTH (first_name) < 7
ORDER BY 1;

Answer set (first_name is varchar):
first_name lnth
Billy 5
Cletus 6
John 4
Mandee 6

The CHAR_LENGTH command counts the number of characters in a string.


CHAR_LENGTH and OCTET_LENGTH

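Query 1
SELECT first_name
,CHAR_LENGTH(first_name) c_length
FROM employee_table ;

Query 2
SELECT first_name
,Octet_Length (first_name) AS c_length
FROM employee_table ;

first_name c_length
Mandee 6
Herbert 7
William 7
Loraine 7
Squiggy 7
Richard 7
Cletus 6
Billy 5
John 4

For these names, you can use the CHAR_LENGTH and OCTET_LENGTH commands interchangeably; both queries get the same answer set. They are not always equivalent, though. CHAR_LENGTH counts characters, while OCTET_LENGTH counts bytes, so the two return different values for strings that contain multi-byte characters.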
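This Python sketch makes the character-versus-byte distinction visible. It assumes UTF-8 encoding, which is how Databricks stores strings. The name José is a hypothetical example that is not in employee_table.

```python
def char_length(s: str) -> int:
    return len(s)                    # counts characters


def octet_length(s: str) -> int:
    return len(s.encode("utf-8"))    # counts bytes in the UTF-8 encoding


for first_name in ["Mandee", "Herbert", "José"]:   # José is a hypothetical name
    print(first_name, char_length(first_name), octet_length(first_name))
```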


The TRIM Command trims both Leading and Trailing Spaces

Query 1

SELECT last_name
,Trim(last_name) AS no_spaces
FROM employee_table ;

Query 2
SELECT last_name
,Trim(Both from last_name) AS no_spaces
FROM employee_table ;

Both queries above do the exact same thing. They remove spaces from the beginning and the end of the column last_name.

Both queries trim both the leading and trailing spaces from the last_name column for the life of the query.


The RTRIM and LTRIM Commands Trim Spaces

RTRIM Query
SELECT last_name
,RTRIM(last_name) AS trim_trailing_spaces
FROM employee_table ;

LTRIM Query
SELECT last_name
,LTRIM(last_name) AS trim_leading_spaces
FROM employee_table ;

Trimming Both Leading and Trailing Spaces Query


SELECT last_name
,LTRIM(RTRIM(last_name)) AS trim_spaces_leading_trailing
FROM employee_table ;

The RTRIM command trims trailing spaces from a character string. The LTRIM trims leading spaces from a
character string. The LTRIM(RTRIM) combination trims both leading and trailing spaces from a character string.


TRIM can also TRIM Characters

SELECT first_name
,Trim(TRAILING 'y' FROM first_name) AS no_y
FROM employee_table
WHERE first_name LIKE '%y';

first_name no_y
Squiggy Squigg
Billy Bill

The TRIM function can also trim characters other than spaces. Here, TRAILING 'y' removes the y from the end of each first_name.


Concatenation

[Screenshot: Nexus Chameleon query window, System: Databricks, Database: SQL Class]

SELECT first_name
,last_name
,first_name
|| ' '
|| last_name as full_name
FROM employee_table
WHERE first_name = 'Squiggy'

Two pipe symbols mean concatenate; ' ' is a literal space in single quotes.

first_name last_name full_name
Squiggy Jones Squiggy Jones

Two pipe symbols (||) represent concatenation, which allows you to combine multiple columns into one column. On most keyboards, the pipe symbol (|) is on the key just above the ENTER key. Don’t put a space between the two; just type two pipe symbols together. In this example, we have combined the first name, then a single space, and then the last name to get a new column called full_name.


Concat and Concat_WS for Concatenation

Concatenation with CONCAT and CONCAT_WS

SELECT last_name
,first_name
,CONCAT(TRIM(last_name),' ', first_name) as namebackwards
,CONCAT_WS('---', TRIM(last_name), first_name) as concat_separator
FROM employee_table
FROM employee_table
last_name first_name namebackwards concat_separator
Chambers Mandee Chambers Mandee Chambers---Mandee
Harrison Herbert Harrison Herbert Harrison---Herbert
Reilly William Reilly William Reilly---William
Larkins Loraine Larkins Loraine Larkins---Loraine
Jones Squiggy Jones Squiggy Jones---Squiggy
Smythe Richard Smythe Richard Smythe---Richard
Strickling Cletus Strickling Cletus Strickling---Cletus
Coffing Billy Coffing Billy Coffing---Billy
Smith John Smith John Smith---John

CONCAT concatenates, and CONCAT_WS adds a separator between string arguments.


The SUBSTR and SUBSTRING Commands

SELECT first_name,
SUBSTR(first_name, 2 , 3) AS sub1,
SUBSTRING(first_name, 2, 3) AS sub2,
SUBSTRING(first_name from 2 for 3) as sub3
FROM employee_table
WHERE dept_no = 400;

SUBSTR(first_name, 2, 3) starts in position 2 and goes for 3 positions.

first_name sub1 sub2 sub3


Herbert erb erb erb
William ill ill ill
Cletus let let let

The above example shows the SUBSTR command, which can also be written with the keyword SUBSTRING. Besides the string itself, the function receives two parameters: the starting position and the number of characters to return (counting from the starting position). The FROM 2 FOR 3 form used for sub3 is equivalent. The above example starts in position two and goes for three positions!


How SUBSTR Works with NO ENDING POSITION

SELECT first_name,
SUBSTR(first_name, 2) AS gotoend
FROM employee_table ;

first_name gotoend
Squiggy quiggy
John ohn
Richard ichard
Herbert erbert
Mandee andee
Cletus letus
William illiam
Billy illy
Loraine oraine

The query starts in position 2. Since there is only one parameter (the starting position), the results bring all remaining characters back.

If you don’t give SUBSTR a length, it goes all the way to the end of the string.
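SQL positions start at 1, while Python string indexes start at 0. The hypothetical substr helper below sketches SUBSTR's behavior for positive starting positions, with and without a length. It is an illustration, not Databricks' implementation; for example, it does not handle negative positions.

```python
from typing import Optional


def substr(s: str, pos: int, length: Optional[int] = None) -> str:
    """Hypothetical emulation of SUBSTR(s, pos[, length]) for pos >= 1."""
    start = pos - 1                    # SQL counts from 1, Python slices from 0
    if length is None:
        return s[start:]               # no length: go all the way to the end
    return s[start:start + length]     # go for `length` positions


print(substr("Herbert", 2, 3))   # erb
print(substr("Squiggy", 2))      # quiggy
```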


Using SUBSTR and CHAR_LENGTH Together

SELECT last_name
,SUBSTR(last_name, CHAR_LENGTH(TRIM(last_name)) -1, 2) AS letters
FROM employee_table;

Get the last two letters of the last_name: find the length of each name and make the SUBSTR starting position Length – 1.

last_name letters
Jones es
Smith th
Smythe he
Harrison on
Chambers rs
Strickling ng
Reilly ly
Coffing ng
Larkins ns

The SQL above brings back the last two letters of each last_name even though the last names are of different lengths. The starting position of the SUBSTR is a nested expression, CHAR_LENGTH(TRIM(last_name)) – 1, and we then go for two positions.
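You can check the starting-position arithmetic with a small Python sketch. The helpers substr and last_two_letters are hypothetical. Each name is padded to 20 characters to mimic the CHAR(20) last_name column.

```python
from typing import Optional


def substr(s: str, pos: int, length: Optional[int] = None) -> str:
    start = pos - 1  # 1-based SQL position -> 0-based Python index
    return s[start:] if length is None else s[start:start + length]


def last_two_letters(last_name: str) -> str:
    trimmed_length = len(last_name.strip())           # CHAR_LENGTH(TRIM(last_name))
    return substr(last_name, trimmed_length - 1, 2)   # start at length - 1, go for 2


for name in ["Jones", "Strickling", "Reilly"]:
    print(name, last_two_letters(name.ljust(20)))
```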


The POSITION Command Finds a Letter’s Position

SELECT last_name
,Position('e' in last_name) AS find_the_e
,Position('f' in last_name) AS find_the_f
FROM employee_table ;

last_name find_the_e find_the_f
Jones 4 0
Smith 0 0
Smythe 6 0
Harrison 0 0
Chambers 6 0
Strickling 0 0
Reilly 2 0
Coffing 0 3
Larkins 0 0

In Jones, the e is in the 4th position; in Reilly, the e is in the 2nd position. Harrison has no f in the name. In Coffing, the 1st f in the name is in the 3rd position.

The example above uses the POSITION function, which tells you the position at which a letter appears in a string. Why did Jones have a 4 in the result set? The ‘e’ was in the 4th position. Why did Smith get a zero for both columns? There is no ‘e’ and no ‘f’ in Smith. If a letter appears more than once, as the two f’s in Coffing do, only the first occurrence is reported.
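POSITION returns a 1-based position and 0 for "not found". That maps neatly onto Python's str.find, which is 0-based and returns -1 when the letter is missing. The position helper below is a hypothetical sketch for illustration.

```python
def position(letter: str, s: str) -> int:
    """Hypothetical helper: 1-based position of the first occurrence of letter, or 0 if absent."""
    return s.find(letter) + 1  # find() is 0-based and returns -1 when missing


for last_name in ["Jones", "Smith", "Reilly", "Coffing"]:
    print(last_name, position("e", last_name), position("f", last_name))
```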


The POSITION Command is Brilliant with SUBSTR

SELECT dept_no ,department_name as depty
,SUBSTR(department_name ,1 , POSITION(' ' IN department_name) - 1) as word1
FROM department_table
ORDER BY 1;

dept_no depty word1
100 Marketing Marketing
200 Research and Develop Research
300 Sales Sales
400 Customer Support Customer
500 Human Resources Human

We are returning only the first word from each department_name that has multiple words. Our SUBSTR has a starting position of 1. We nest POSITION inside it to find the first space, then subtract 1 to get the length of the first word.

What was the starting position of the SUBSTR in the above query? It was one. The length (the FOR value) is calculated by finding the first space. So, for “Research and Develop,” the first space is in position 9, and the SUBSTR starts at position one and goes for eight characters.


CHARINDEX Finds a Letter(s) Position in a String

Syntax: CHARINDEX(substring, string[, start_pos])

SELECT last_name
,CHARINDEX ('e', last_name) AS find_e
,CHARINDEX ('f', last_name) AS find_f
,CHARINDEX ('th', last_name) AS find_th
,CHARINDEX ('in', last_name, 6) AS find_in_after_6
FROM employee_table
WHERE TRIM(last_name) IN ('Smith', 'Smythe', 'Strickling', 'Coffing')
ORDER BY 1 DESC;

last_name find_e find_f find_th find_in_after_6


Strickling 0 0 0 8
Smythe 6 0 4 0
Smith 0 0 4 0
Coffing 0 3 0 0

Tell this function what character(s) to look for in a string and, optionally, the position at which to start searching. If it does not find the character(s) in the string, it returns 0. It reports only the first occurrence.
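The same idea extends to CHARINDEX's optional starting position. This hypothetical Python helper reproduces the table above for start_pos values of 1 or more. It is a sketch, not Databricks' implementation.

```python
def charindex(substring: str, string: str, start_pos: int = 1) -> int:
    """Hypothetical helper: 1-based position of substring at or after start_pos (>= 1), else 0."""
    return string.find(substring, start_pos - 1) + 1


results = {
    last_name: (charindex("e", last_name),
                charindex("f", last_name),
                charindex("th", last_name),
                charindex("in", last_name, 6))
    for last_name in ["Strickling", "Smythe", "Smith", "Coffing"]
}

for last_name, row in results.items():
    print(last_name, *row)
```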


The CHARINDEX Command is brilliant with SUBSTRING


The starting position is a nested CHARINDEX expression: find the first space and subtract two.

SELECT last_name
,SUBSTRING (last_name, CHARINDEX(' ', last_name) -2 , 2) as last_two_letters
from employee_table;
last_name last_two_letters
Smythe he
Strickling ng
Chambers rs
Harrison on
Coffing ng
Smith th
Jones es
Larkins ns
Reilly ly

What was the starting position of the SUBSTRING in the above query? It is calculated by a nested CHARINDEX expression. CHARINDEX finds the first space, which marks the end of the name because last_name is a CHAR(20) column padded with spaces. Subtracting 2 then gives the starting position. Even though the names are of different lengths, this brings back only the last two letters.


The CHARINDEX Command Using a Literal


SELECT CHARINDEX('May flowers', 'April showers bring May flowers')
as may_flowers_position ;

may_flowers_position
21

The first argument is the phrase we are seeking to find, and CHARINDEX returns the position where the 1st character of that phrase starts. Here, the phrase ‘May flowers’ starts in position 21 of the string.


LPAD and RPAD

[Screenshot: Nexus Chameleon query window, System: Databricks, Database: Nexus, Schema: SQL Class]

SELECT first_name || ' ' || last_name as name
,Length(first_name) as len
,LPad(first_name, 10) as left_spaces
,Length(LPad(first_name, 10)) as l10
,RPad(last_name, 15) as right_spaces
,Length(RPAD(last_name, 15)) as l15
FROM employee_table
WHERE first_name LIKE 'M%' ;

name len left_spaces l10 right_spaces l15
Mandee Chambers 6 Mandee 10 Chambers 15

The LPAD() command pads a string with spaces on the left up to the requested length, and RPAD() pads it with spaces on the right. Notice the padding in the answer set and the resulting lengths of 10 and 15.
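Here is a hypothetical Python sketch of LPAD and RPAD with the default space padding. It assumes Spark's documented behavior, where a string longer than the requested length is shortened to that length; verify that detail on your own workspace.

```python
def lpad(s: str, length: int) -> str:
    # rjust adds spaces on the left; slicing cuts strings that are already longer than `length`.
    return s.rjust(length)[:length]


def rpad(s: str, length: int) -> str:
    # ljust adds spaces on the right; slicing applies the same truncation.
    return s.ljust(length)[:length]


left_spaces = lpad("Mandee", 10)
right_spaces = rpad("Chambers", 15)
print(repr(left_spaces), len(left_spaces))    # '    Mandee' 10
print(repr(right_spaces), len(right_spaces))  # 'Chambers       ' 15
```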


The REPLACE Function

The REPLACE function replaces all occurrences of substring1 in the string with substring2.

Syntax: REPLACE(string, substring1, substring2)

SELECT customer_name
,REPLACE (customer_name, ' ', '_') AS under_score
,phone_number
,REPLACE (phone_number, '-', ' ') AS no_dash
FROM customer_table
customer_name under_score phone_number no_dash
Billy's Best Choice Billy's_Best_Choice 555-1234 555 1234
Acme Products Acme_Products 555-1111 555 1111
ACE Consulting ACE_Consulting 555-1212 555 1212
XYZ Plumbing XYZ_Plumbing 347-8954 347 8954
Databases N-U Databases_N-U 322-1012 322 1012
Replace spaces with underscores Replace dashes with spaces

The REPLACE function replaces one value with another in a string. Above, we have replaced the spaces in customer_name with underscores, and in phone_number we have replaced the dashes (-) with spaces.

The ASCII Function

The example below shows you how to convert characters into the integer ASCII value.

Syntax: ASCII (string)

SELECT
ASCII('H') as asciih
,ASCII('o') as asciio
,ASCII('w') as asciiw
,ASCII('d') as asciid
,ASCII('y') as asciiy

asciih asciio asciiw asciid asciiy


72 111 119 100 121

The example above shows you how to convert characters into the integer ASCII value.


The Reverse String Function

The REVERSE function returns the string str with the order of the characters reversed.

Syntax: REVERSE(str)

SELECT first_name
,REVERSE(first_name) as backward
FROM employee_table
WHERE dept_no = 400;

first_name backward
Herbert trebreH
William mailliW
Cletus sutelC

The example above shows how the REVERSE function returns the string with the order of the characters reversed.


The RIGHT Function

The RIGHT function returns the rightmost len characters from the string str, or null if any argument is null.

Syntax: RIGHT(str, len)

SELECT department_name
,RIGHT(TRIM(department_name), 4) right_4_char
FROM department_table
department_name right_4_char
Research and Develop elop
Marketing ting
Customer Support port
Sales ales
Human Resources rces

The RIGHT function returns the rightmost n characters from the string, or null if any argument is null.


The LEFT and RIGHT Functions

The LEFT and RIGHT functions are shorthand for the SUBSTRING function. They
return a requested number of characters from the left or right end of the input string.

Syntax: LEFT(string, n), RIGHT(string, n)

SELECT first_name
,LEFT (first_name , 1) AS first_initial
,last_name
,Right (RTRIM(last_name), 2) AS last_two_letters
FROM employee_table
WHERE dept_no in (400) ;
first_name first_initial last_name last_two_letters
Cletus C Strickling ng
Herbert H Harrison on
William W Reilly ly

In our example above, our result set will have the first_name and last_name coming back, but we also use the
LEFT and RIGHT functions to produce the first letter of the first_name and the last two letters of the last_name.
We filtered the rows with a WHERE clause to bring back only three rows. Notice the RTRIM of last_name. The trim is necessary because the last_name column has a data type of CHAR(20), so the system pads it with trailing spaces; without RTRIM, RIGHT would return those spaces.


REGEXP Example for Whitespace Character

SELECT city
,state
,city REGEXP 'San\\s.*' as t_or_f
FROM addresses
ORDER BY 3 DESC
LIMIT 5

city state t_or_f
San Pablo CA True
San Pablo CA True
Deltona FL False
Torrance CA False
Orange NJ False

\s matches a whitespace character, so 'San\\s.*' looks for “San” followed by a whitespace character.

Regular expression reference:
^ The start of a string
$ The end of a string
. Wildcard for any character except a newline (\n)
| a|b matches a or b
\ Used to escape a special character
* Matches 0 or more of the previous
? Matches 0 or 1 of the previous
+ Matches 1 or more of the previous
a The character "a"
ab The string "ab"
\s Matches a whitespace character
\S Matches a non-whitespace character
\w Matches a word character
\W Matches a non-word character
\d Matches one digit
\D Matches one non-digit
[\b] A backspace character
\cX A control character
(xyz) Grouping of characters
(?:xyz) Non-capturing group of characters
[xyz] Matches an x or y or z
[^xyz] Matches a character other than x, y, or z
[a-q] Matches within a specified range
[0-7] Matches a digit within a range

REGEXP returns true if the pattern is found anywhere in the subject; the pattern does not have to match the whole string unless you anchor it with ^ and $. Both inputs must be text expressions. The doubled backslash in 'San\\s.*' is needed because a backslash inside a SQL string literal is itself an escape character.

REGEXP Example for Non-Whitespace

SELECT city
,state
,city REGEXP 'Harrison\\S.*' as t_or_f
FROM addresses
ORDER BY 3 DESC
LIMIT 5

city state t_or_f
Harrisonburg VA True
Deltona FL False
Torrance CA False
Orange NJ False
Westmont IL False

\S matches a non-whitespace character.

The REGEXP above returns True for a city that contains ‘Harrison’ immediately followed by a non-whitespace character, such as Harrisonburg. Because the pattern is not anchored with ^, the match can occur anywhere in the city name.
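To try these patterns outside Databricks, you can use Python's re.search as a stand-in for REGEXP, since both report a match found anywhere in the string. Python's regex engine differs from the Java-based one in Databricks, but for simple patterns like these the syntax is the same. In Python, a raw string such as r"\s" replaces the doubled backslash of the SQL literal. The helper name regexp and the extra city Harrison are hypothetical.

```python
import re


def regexp(subject: str, pattern: str) -> bool:
    """Hypothetical stand-in for `subject REGEXP pattern`: True if the pattern is found anywhere."""
    return re.search(pattern, subject) is not None


for city in ["San Pablo", "Deltona", "Harrisonburg", "Harrison"]:
    print(city, regexp(city, r"San\s.*"), regexp(city, r"Harrison\S.*"))
```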

REGEXP Example for [xyz]

SELECT street,
street REGEXP '[123]\\S.*' as t_or_f
FROM addresses
ORDER BY 2 DESC
LIMIT 3

street t_or_f
306 Lake Street True
911 Wilbur Road True
717 Maiden Lane True

The REGEXP above matches a 1, 2, or 3 ([123]) followed by a non-whitespace character (\S), anywhere in the string.


REGEXP Example Start of a String

String starts with 1, 2, or 3:

SELECT street,
street REGEXP '^[123].*' t_or_f
FROM addresses
ORDER BY 2 DESC LIMIT 3

street t_or_f
306 Lake Street True
217 Wood Street True
336 Lincoln Street True

String does not start with 1, 2, or 3:

SELECT street,
street REGEXP '^[^123].*' t_or_f
FROM addresses
ORDER BY 2 DESC LIMIT 3

street t_or_f
909 Willow Avenue True
717 Maiden Lane True
541 Oak Avenue True

The REGEXP examples above look for strings that start or don't start with one, two, or three. Outside square brackets, ^ anchors the pattern to the start of the string; as the first character inside square brackets, as in [^123], it negates the set.

REGEXP Example End of a String

String ends with oad:

SELECT STREET,
STREET REGEXP '.*(oad)$' t_or_f
FROM addresses
ORDER BY 2 DESC LIMIT 3

street t_or_f
911 Wilbur Road True
791 River Road True
720 Creek Road True

String ends with urt:

SELECT street,
street REGEXP '.*(urt)$' as t_or_f
FROM addresses
ORDER BY 2 DESC LIMIT 3

street t_or_f
785 Hanover Court True
418 Heather Court True
669 Fawn Court True

The REGEXP examples above look for a specific run of letters (oad or urt) at the end of the string; $ anchors the pattern to the end.


REGEXP Example Matching Within a Range

String has a 1 or 2:

SELECT street,
street REGEXP '[1-2].*' AS t_or_f
FROM addresses
ORDER BY 2 DESC LIMIT 3

street t_or_f
541 Oak Avenue True
911 Wilbur Road True
717 Maiden Lane True

City contains an A or B:

SELECT city,
city REGEXP '[A-B].*' AS t_or_f
FROM addresses
ORDER BY 2 DESC LIMIT 3

city t_or_f
Aliquippa True
Avon True
Banerster True

The REGEXP examples above use a range ([1-2] or [A-B]) to match any one character in that range, which works like an OR criteria. Because neither pattern is anchored, the match can occur anywhere in the string. To require that the city starts with A or B, write '^[A-B].*'.
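This Python sketch checks the anchor and range patterns against a few of the streets above. It assumes re.search can stand in for REGEXP. The city San Bernardino is hypothetical; it shows that an unanchored range such as [A-B] matches anywhere in the string.

```python
import re


def regexp(subject: str, pattern: str) -> bool:
    """Hypothetical stand-in for `subject REGEXP pattern`: True if the pattern is found anywhere."""
    return re.search(pattern, subject) is not None


for street in ["306 Lake Street", "909 Willow Avenue", "911 Wilbur Road", "785 Hanover Court"]:
    print(street,
          regexp(street, r"^[123].*"),    # starts with 1, 2, or 3
          regexp(street, r"^[^123].*"),   # does not start with 1, 2, or 3
          regexp(street, r".*(oad)$"),    # ends with oad
          regexp(street, r"[1-2].*"))     # contains a 1 or a 2 anywhere

# Unanchored: True because "Bernardino" contains a B. Anchored with ^: False.
print(regexp("San Bernardino", r"[A-B].*"), regexp("San Bernardino", r"^[A-B].*"))
```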


REGEXP_REPLACE

REGEXP_REPLACE searches for a specific Regex pattern in the provided string (value) and replaces it with whatever you specify.

Syntax: REGEXP_REPLACE(value, regexp, replacement)

SELECT dept_no
,REGEXP_REPLACE(dept_no, '0', '1') As zero_to_1
FROM employee_table
WHERE dept_no IN (100, 200) ;

Replace 0 with 1 for dept_no.

DEPT_NO zero_to_1
200 211
200 211
100 111

Regexp_Replace returns the subject with every occurrence of the pattern either removed or replaced by a replacement string. If it finds no matches in the subject, Regexp_Replace returns the original subject. For example, the query above uses Regexp_Replace to replace every zero in dept_no with a one.


REGEXP_REPLACE Example

REGEXP_REPLACE searches for a specific Regex pattern in the provided string (value) and replaces it with whatever you specify.

Syntax: REGEXP_REPLACE(value, regexp, replacement)

SELECT FIRST_NAME
,REGEXP_REPLACE(first_name, '^W', 'W starts W') as replaceW
,REGEXP_REPLACE(first_name, 'y$', 'y Ends with y') as replace_y
,REGEXP_REPLACE(first_name, 'Wendy', 'Wendi') as replace
,REGEXP_REPLACE(first_name, 'Wendy', '*****') as encrypt
FROM student_table
WHERE first_name = 'Wendy'

FIRST_NAME replaceW replace_y replace encrypt


Wendy W starts Wendy Wendy Ends with y Wendi *****

REGEXP_REPLACE searches for a specific Regex pattern in the provided string (value) and replaces it with whatever you specify. Above, we anchor one pattern to the start of the string (^W) and another to the end (y$), replace a whole name, and mask the name with asterisks.
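For these simple patterns, Python's re.sub behaves like REGEXP_REPLACE, so you can use it to preview results. In this sketch Python stands in for Databricks. One difference to keep in mind: Python writes group references in the replacement as \1, while Java-based engines such as Databricks use $1. No groups are used here, so the results line up.

```python
import re

name = "Wendy"
replace_w = re.sub(r"^W", "W starts W", name)       # ^ anchors the match to the start
replace_y = re.sub(r"y$", "y Ends with y", name)    # $ anchors the match to the end
replace_name = re.sub(r"Wendy", "Wendi", name)      # whole-name replacement
masked = re.sub(r"Wendy", "*****", name)            # mask the value with asterisks
zero_to_1 = re.sub(r"0", "1", str(200))             # every 0 becomes 1, as with dept_no

print(replace_w, replace_y, replace_name, masked, zero_to_1, sep=" | ")
```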


Another REGEXP_REPLACE Example

SELECT CUSTOMER_NAME
,REGEXP_REPLACE(customer_name, ' ', '_') AS underscore
,PHONE_NUMBER
,REGEXP_REPLACE(phone_number, '-', ' ') AS no_dash
FROM customer_table

customer_name underscore phone_number no_dash


Billy's Best Choice Billy's_Best_Choice 555-1234 555 1234
Acme Products Acme_Products 555-1111 555 1111
ACE Consulting ACE_Consulting 555-1212 555 1212
XYZ Plumbing XYZ_Plumbing 347-8954 347 8954
Databases N-U Databases_N-U 322-1012 322 1012

Replace spaces with underscores Replace dashes with spaces

REGEXP_REPLACE replaces every match of the pattern, not only the first one. Above, we are replacing spaces with underscores and then replacing dashes with spaces.
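Real phone data is rarely formatted consistently. As a minimal sketch on hypothetical literal values, a negated character class removes everything that is not a digit, and capture groups let the replacement reuse parts of the match. regexp_replace follows Java regular expression rules, where $1 and $2 in the replacement refer to the first and second groups.

```sql
-- '[^0-9]' matches any character that is NOT a digit; replacing
-- those matches with '' leaves only the digits.
SELECT regexp_replace('(512) 123-1445', '[^0-9]', '')  AS digits_only  -- '5121231445'
      -- ([0-9]{3}) and ([0-9]{4}) are capture groups; '$1.$2' rebuilds
      -- the number with a period where the dash was.
      ,regexp_replace('555-1234', '([0-9]{3})-([0-9]{4})', '$1.$2') AS dotted  -- '555.1234'
;
```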

REGEXP_LIKE

Syntax: REGEXP_LIKE( <subject> , <pattern> )

SELECT subscriber_no, city, state
FROM addresses
WHERE REGEXP_LIKE (city, 'Zee.*'); -- Case-sensitive

subscriber_no city state
1000247 Zeeland MI

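REGEXP_LIKE returns true if the subject matches the pattern. Both arguments must be string expressions. Our example above is case-sensitive, and the pattern is not anchored: REGEXP_LIKE returns true when the pattern matches any part of the subject, so use ^ and $ to tie the match to the start or end.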
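Because the pattern is not anchored, 'Zee.*' would also match a city that merely contains 'Zee'. The sketch below uses literal strings ('New Zeeland' is a made-up value) to compare an unanchored pattern, an anchored one, and the Java inline flag (?i), which makes the match case-insensitive.

```sql
SELECT regexp_like('Zeeland', 'Zee.*')      AS prefix_match    -- true
      ,regexp_like('New Zeeland', 'Zee.*')  AS anywhere_match  -- true: not anchored
      ,regexp_like('New Zeeland', '^Zee')   AS anchored        -- false: ^ means "starts with"
      ,regexp_like('zeeland', 'Zee')        AS case_sensitive  -- false: z is not Z
      ,regexp_like('zeeland', '(?i)zee')    AS ignore_case     -- true: (?i) ignores case
;
```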

RLIKE

Syntax: RLIKE( <subject> , <pattern>)

SELECT subscriber_no, city, state
FROM addresses
WHERE RLIKE (city, 'Zee.*'); -- Case-sensitive

subscriber_no city state
1000247 Zeeland MI

RLIKE returns true if the subject matches the specified pattern. Both inputs must be string expressions. RLIKE is a relative of LIKE, but it uses Java regular expressions instead of SQL LIKE pattern syntax (% and _). Thus, it supports more complex matching conditions than LIKE.
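RLIKE can also be written as an operator between the subject and the pattern, and NOT RLIKE negates it. In this sketch on a literal value, the anchored RLIKE and the LIKE perform the same test; the last two checks use regex features that LIKE does not have.

```sql
SELECT 'Zeeland' RLIKE '^Zee'        AS rlike_prefix  -- true
      ,'Zeeland' LIKE 'Zee%'         AS like_prefix   -- true: same test written with LIKE
      ,'Zeeland' RLIKE '^Z[a-z]+d$'  AS whole_value   -- true: Z, then letters, ending in d
      ,'Zeeland' NOT RLIKE '[0-9]'   AS no_digits     -- true: the value contains no digit
;
```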

SOUNDEX Function to Find a Sound

SOUNDEX returns a four-character phonetic code (a letter followed by three digits), so similar-sounding strings get the same code.
The example below finds any last_name that sounds like 'Smith'.

Syntax: SOUNDEX(String)

SELECT DISTINCT
SOUNDEX(last_name) as soundslike1
,SOUNDEX('Smith') as soundslike2
,last_name
FROM employee_table
WHERE SOUNDEX(last_name) = SOUNDEX('Smith');

soundslike1 soundslike2 last_name
S530 S530 Smith
S530 S530 Smythe

Call center employees often look up customers by last name while speaking with them on the phone. An employee can guess the spelling of the name to narrow the search results and then work with the customer to determine the correct spelling. The SOUNDEX function searches for similar sounds. Above, we look for anyone with a name that sounds like 'Smith', and we get two results back: 'Smith' and 'Smythe'.
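To see why Smith and Smythe match, you can compare the codes directly. In this sketch on literal names, Schmidt produces the same code as Smith, while Rubin gets a different code from Robert and Rupert because its consonants map to different digits.

```sql
SELECT soundex('Smith')    AS smith    -- S530
      ,soundex('Smythe')   AS smythe   -- S530
      ,soundex('Schmidt')  AS schmidt  -- S530: would also match 'Smith'
      ,soundex('Robert')   AS robert   -- R163
      ,soundex('Rupert')   AS rupert   -- R163
      ,soundex('Rubin')    AS rubin    -- R150: no match with Robert
;
```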


Chapter 11 – Interrogating the Data

"Imagination is more important than knowledge. For knowledge is limited,
whereas imagination embraces the entire world, stimulating progress, giving
birth to evolution."

- Albert Einstein

Quiz – Fill in the Answers for the NULLIF Command

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
123250 Phillips Martin SR 3.00
234121 Thomas Wendy FR 4.00

SELECT last_name
,NULLIF(grade_pt, 0) AS gp1
,NULLIF(grade_pt, 3.0) AS gp2
,NULLIF(grade_pt, 4.0) AS gp3
FROM student_table
WHERE student_id IN (423400, 123250, 234121)
ORDER BY last_name ;

last_name gp1 gp2 gp3
Larkins
Phillips
Thomas

Fill in the rest of the answer set after looking at the table and the query.

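NULLIF(expr1, expr2) returns NULL when the two arguments are equal; otherwise, it returns expr1. The first NULLIF above examines the column grade_pt: if grade_pt = 0, the result becomes NULL, and any other value is returned unchanged.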

Answer – Fill in the Answers for the NULLIF Command

student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
123250 Phillips Martin SR 3.00
234121 Thomas Wendy FR 4.00

SELECT last_name
,NULLIF(grade_pt, 0) AS gp1
,NULLIF(grade_pt, 3.0) AS gp2
,NULLIF(grade_pt, 4.0) AS gp3
FROM student_table
WHERE student_id IN (423400, 123250, 234121)
ORDER BY last_name ;

last_name gp1 gp2 gp3
Larkins ? 0.00 0.00
Phillips 3.00 ? 3.00
Thomas 4.00 4.00 ?

A question mark (?) represents NULL. Look at the answers above, and if they don't make sense, go over them again until they do.
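A common use of NULLIF is guarding against division by zero. Depending on configuration, dividing by zero in Databricks either returns NULL or raises an error (for example, when ANSI mode is enabled). The sketch below uses hypothetical inline rows: NULLIF turns a zero divisor into NULL, and any arithmetic with NULL returns NULL instead of failing.

```sql
-- Hypothetical inline data: total grade points and number of courses.
SELECT student
      ,points / NULLIF(courses, 0) AS avg_points  -- NULL when courses = 0
FROM VALUES ('A', 12, 4), ('B', 0, 0) AS t(student, points, courses)
ORDER BY student;
-- A  3.0
-- B  NULL
```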

COALESCE in a Real-World Example

SELECT FIRST_NAME, LAST_NAME
,COALESCE (work_phone, cell_phone, home_phone, 'No Phone') AS phone
FROM employee_phone
ORDER BY 1;

first_name last_name phone
Billy Coffing 512 123-1445
Cletus Strickling 656 188-9912
Herbert Harrison 212 134-6851
John Smith 783 122-6881
Loraine Larkins 713 133-8781
Mandee Chambers No Phone
Richard Smythe 792 123-8159
Squiggy Jones No Phone
William Reilly No Phone

COALESCE returns the first non-null value in a list; if all values are NULL, it returns NULL. The query above uses COALESCE to try work_phone first; if that is null, it tries cell_phone, and if that is also null, home_phone. If all three phone numbers are null, the literal 'No Phone' is the entry.

The COALESCE Command

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

SELECT last_name
,COALESCE (dept_no, employee_no) as coal
FROM employee_table
WHERE TRIM(last_name) IN ('Jones', 'Reilly')
ORDER BY 1 ;

last_name coal
Jones 2000000
Reilly 400

Coalesce returns the first non-null value in a list, and if all values are Null returns Null.

COALESCE is Equivalent to this CASE Statement

SELECT last_name
,COALESCE (dept_no, employee_no) as coal
FROM employee_table

SELECT last_name
, CASE
WHEN dept_no IS not null THEN dept_no
WHEN employee_no IS not null THEN employee_no
ELSE null
END as coal
FROM employee_table ;

Coalesce returns the first non-null value in a list, and if all values are Null, returns Null. Above are two queries
that return the same answer set. These examples should give you a better idea of how Coalesce works.
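Databricks also offers two-argument shortcuts: nvl and ifnull both return the second argument when the first is NULL, which is the same as a two-argument COALESCE. This sketch uses literal values rather than the employee_phone table.

```sql
SELECT coalesce(NULL, NULL, 'No Phone')     AS coalesce_3  -- 'No Phone'
      ,nvl(NULL, 'No Phone')                AS nvl_2       -- 'No Phone'
      ,ifnull('512 123-1445', 'No Phone')   AS ifnull_2    -- '512 123-1445': first value is not null
;
```

COALESCE remains the more general choice, because it accepts any number of arguments.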

Some Great CAST (Convert And Store) Examples

SELECT
CAST('ABCDE' AS CHAR(1) ) AS trunc
,CAST(128 AS CHAR(3) ) AS ok
,CAST('2023-05-30' as Date) as date1
,'2023-06-30'::DATE as implied_cast

trunc ok date1 implied_cast
ABCDE 128 2023-05-30 2023-06-30

The Databricks CAST function converts a value from one data type to another. The :: operator, used for implied_cast above, is shorthand for CAST. Notice in the result that casting 'ABCDE' to CHAR(1) did not truncate the value.
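If you actually want to truncate a string, or a value might not convert, the following minimal sketch on literal values may help. It uses LEFT and SUBSTRING to keep only the first character, and TRY_CAST, which returns NULL instead of raising an error when a value cannot be converted (there is no February 30).

```sql
SELECT left('ABCDE', 1)                AS first_char  -- 'A'
      ,substring('ABCDE', 1, 1)        AS also_first  -- 'A': start at 1, take 1 character
      ,try_cast('abc' AS INT)          AS bad_int     -- NULL: not a number
      ,try_cast('2023-02-30' AS DATE)  AS bad_date    -- NULL: not a real date
      ,CAST('128' AS INT) + 1          AS good_int    -- 129
;
```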

A Rounding Example Using CAST

SELECT
CAST(.014 AS Decimal(3,2)) as a014
,CAST(.016 AS Decimal(3,2)) as a016
,CAST(.015 AS Decimal(3,2)) as a015
,CAST(.0150 AS Decimal(3,2)) as a0150
,CAST(.0250 AS Decimal(3,2)) as a0250
,CAST(.0159 AS Decimal(3,2)) as a0159 ;

a014 a016 a015 a0150 a0250 a0159
0.01 0.02 0.02 0.02 0.03 0.02

CAST to DECIMAL rounds half up. Look at the digit to the right of the rounding digit: if it is less than 5, the rounding digit does not change (a014); if it is 5 or greater, the rounding digit increases by 1 (a016, a015, a0150, a0250, a0159). Any digits after that 5 make no difference, and an even rounding digit is rounded up as well, which is why .0250 becomes 0.03.
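If you need banker's rounding (round half to even) instead, Databricks provides BROUND alongside ROUND. In this sketch on literal values, CAST and ROUND round 0.025 half up to 0.03, while BROUND rounds to the even digit.

```sql
SELECT CAST(0.025 AS DECIMAL(3,2))  AS cast_half_up      -- 0.03
      ,round(0.025, 2)              AS round_half_up     -- 0.03
      ,bround(0.025, 2)             AS bround_half_even  -- 0.02: 2 is even, no change
      ,bround(0.015, 2)             AS bround_odd        -- 0.02: 1 is odd, round up
;
```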

CAST will Round Values up or Down

SELECT order_number as ordno
,customer_number as custno
,order_date
,order_total
,CAST(order_total as Decimal(10,1)) as round
,CAST(order_total as Decimal(5,0)) as rounded
FROM order_table
ORDER BY 1 ;

ordno custno order_date order_total round rounded
123456 11111111 2020-05-04 12347.53 12347.5 12348
123512 11111111 2021-01-01 8005.91 8005.9 8006
123552 31323134 2021-10-01 5111.47 5111.5 5111
123585 87323456 2021-10-10 15231.62 15231.6 15232
123777 57896883 2021-09-09 23454.84 23454.8 23455

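Casting to a DECIMAL with fewer decimal places rounds each value up or down (half up) to the new scale: DECIMAL(10,1) keeps one decimal place, and DECIMAL(5,0) rounds to a whole number.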
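The target type must also be wide enough. DECIMAL(5,0) holds at most five digits, so a larger order_total would not fit. In this sketch with a made-up amount, TRY_CAST returns NULL for the value that overflows instead of failing the query.

```sql
SELECT CAST(23454.84 AS DECIMAL(5,0))       AS fits     -- 23455
      ,try_cast(123456.78 AS DECIMAL(5,0))  AS too_big  -- NULL: 123457 needs 6 digits
;
```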

Valued Case vs. Searched Case

SELECT course_name
,CASE credits
WHEN 1 THEN 'One Credit'
WHEN 2 THEN 'Two credits'
WHEN 3 THEN 'Three credits'
Else 'credits not found'
END AS creditalias
FROM course_table ;

The column credits follows the word CASE, so this is a valued CASE statement.
Rules for a Valued CASE:
1. You can only check for equality.
2. You can only check the column credits.

SELECT course_name
,CASE
WHEN credits <= 1 THEN 'One'
WHEN credits = 2 THEN 'Two'
WHEN credits < 4 THEN 'Three'
WHEN course_name like 'Tera%' Then 'Four'
Else 'Don’t know'
END AS creditalias
FROM course_table ;

NO value follows the word CASE, so this is a Searched CASE!
Rules for a Searched CASE:
1. Check for equality, <, >, IS NULL, etc.
2. You can check other columns as well.

The searched CASE is more flexible and is usually the better choice; the valued CASE is a shorter way to write simple equality checks against one column, like the first example.
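One consequence of the equality rule is that a valued CASE cannot find NULL, because NULL = NULL is not true. This sketch casts a NULL literal to STRING so both forms can be compared; only the searched CASE with IS NULL matches.

```sql
SELECT CASE CAST(NULL AS STRING)
           WHEN NULL THEN 'matched NULL'   -- compares NULL = NULL: never true
           ELSE 'no match'
       END AS valued_case                  -- 'no match'
      ,CASE
           WHEN CAST(NULL AS STRING) IS NULL THEN 'matched NULL'
           ELSE 'no match'
       END AS searched_case                -- 'matched NULL'
;
```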

Combining Searched Case and Valued Case

SELECT last_name
,CASE
WHEN grade_pt IS null THEN 'Null'
WHEN grade_pt IN (1, 2, 3) THEN 'Integer GPA'
WHEN grade_pt Between 1 and 2 THEN 'Low GPA'
WHEN grade_pt < 4 THEN 'High GPA'
ELSE '4.0 GPA'
END AS gpa_info
,CASE class_code
WHEN 'FR' THEN 'Year-1'
WHEN 'SO' THEN 'Year-2'
WHEN 'JR' THEN 'Year-3'
WHEN 'SR' THEN 'Year-4'
ELSE 'Null'
END As year1
FROM student_table ;

Searched CASE: NO value follows CASE, so it's a SEARCHED CASE. Searched Case Rules: check any way (=, <, >, IN, etc.) and check other columns as well.
Valued CASE: checks class_code for equality only.

last_name gpa_info year1
Larkins High GPA Year-1
Wilson High GPA Year-2
McRoberts Low GPA Year-3
Bond High GPA Year-3
Hanson High GPA Year-1
Smith Integer GPA Year-2
Delaney High GPA Year-4
Johnson Null Null
Thomas 4.0 GPA Year-1
Phillips Integer GPA Year-4

The query above uses both a Valued Case and Searched Case.

Decode

SELECT course_name
,decode(credits,
1, 'One Credit',
2, 'Two Credits',
3, 'Three credits',
'Credits Not found')
AS creditalias
FROM course_table
ORDER BY 1;

SELECT course_name
,CASE credits
WHEN 1 THEN 'One Credit'
WHEN 2 THEN 'Two Credits'
WHEN 3 THEN 'Three credits'
Else 'Credits Not found'
END AS creditalias
FROM course_table
ORDER BY 1;

course_name creditalias
Advanced SQL Three credits
Database Administration Credits Not found
Databricks Concepts Three credits
Introduction to SQL Three credits
Physical Database Design Credits Not found
SQL Features Two Credits

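The DECODE function works like a valued CASE: it compares credits with each search value in turn and returns the matching result, or the last argument when nothing matches. The two queries above are equivalent.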

A Trick for getting a Horizontal Case

SELECT AVG(CASE class_code
WHEN 'FR' THEN grade_pt
ELSE null END) AS freshman_gpa
,AVG(CASE class_code
WHEN 'SO' THEN grade_pt
ELSE null END) AS sophomore_gpa
,AVG(CASE class_code
WHEN 'JR' THEN grade_pt
ELSE null END) AS junior_gpa
,AVG(CASE class_code
WHEN 'SR' THEN grade_pt
ELSE null END) AS senior_gpa
FROM student_table WHERE class_code IS not null ;

freshman_gpa sophomore_gpa junior_gpa senior_gpa
2.293333 2.900000 2.92500 3.175000

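Aggregates ignore NULLs. Each CASE returns grade_pt only for its own class_code and NULL for every other row, so each AVG averages a single class, and the four averages appear side by side in one row. Knowing this trick allows for horizontal reporting.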
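Databricks SQL also has a PIVOT clause that can produce the same horizontal row. The sketch below is an alternative to the CASE trick, not the book's method: it feeds PIVOT the class_code and grade_pt values of student_table through an inline CTE (the name grades is hypothetical), so it runs without the table. Rows whose class_code is not in the IN list, such as the NULL row, are ignored.

```sql
WITH grades AS (
  -- class_code / grade_pt pairs copied from student_table in this chapter
  SELECT * FROM VALUES
    ('FR', 4.00), ('FR', 2.88), ('FR', 0.00),
    ('SO', 2.00), ('SO', 3.80),
    ('JR', 1.90), ('JR', 3.95),
    ('SR', 3.00), ('SR', 3.35),
    (NULL, NULL)
  AS t(class_code, grade_pt)
)
SELECT *
FROM grades
PIVOT (
  AVG(grade_pt)                               -- one AVG per listed class_code
  FOR class_code IN ('FR' AS freshman_gpa, 'SO' AS sophomore_gpa,
                     'JR' AS junior_gpa,   'SR' AS senior_gpa)
);
```

The CASE version works in almost any SQL dialect; PIVOT is shorter when there are many values to spread across columns.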

Put a Valued CASE in the ORDER BY

SELECT * FROM student_table
ORDER BY CASE class_code
WHEN 'FR' THEN 1
WHEN 'SO' THEN 2
WHEN 'JR' THEN 3
WHEN 'SR' THEN 4
ELSE 5 END;

student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

I bet you didn't know you could put a CASE statement in the Order BY clause. You do now! Above, we are using
a valued CASE because there is a column value (class_code) immediately after the keyword CASE. A valued
CASE can only check for equality, and only for the column class_code.

Put a Searched CASE in the ORDER BY

SELECT * FROM student_table
ORDER BY CASE
WHEN class_code = 'FR' THEN 1
WHEN class_code = 'SO' THEN 2
WHEN class_code LIKE 'J%' THEN 3
WHEN class_code IN('SR') THEN 4
WHEN class_code IS NULL THEN 5
ELSE 6 END;

student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

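The example above is a searched CASE: no value follows the word CASE, so each WHEN can use a different test, such as =, LIKE, IN, or IS NULL.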

Put a Decode in the ORDER BY

SELECT * FROM student_table
ORDER BY Decode(class_code, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5) ;

student_id last_name first_name class_code grade_pt
234121 Thomas Wendy FR 4.00
125634 Hanson Henry FR 2.88
423400 Larkins Michael FR 0.00
333450 Smith Andy SO 2.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
123250 Phillips Martin SR 3.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?

I bet you didn't know you could put a DECODE in the ORDER BY clause. DECODE is much like a CASE statement but uses a different format. We are using DECODE on the column class_code: if class_code is 'FR', the sort value is 1; if it is 'SO', the sort value is 2, and so on. If class_code does not match 'FR', 'SO', 'JR', or 'SR', the sort value is 5.

Extreme CASE Challenge

product Mary_Jones Will_Davis Gary_Lewis Helen_Smith total_sales
Boots 1862250.00 2793375.00 941028.73 1862250.00 7458903.73
Jackets 3903900.00 5855850.00 975975.00 1974709.63 12710434.63
Jeans 736573.44 1717500.00 286250.00 1145000.00 3885323.44
T-Shirt 614280.00 438898.11 153570.00 614280.00 1821028.11

Your mission is to use the pivot_test_region_all table and write a CASE statement to produce the result set above.

Answer - Extreme CASE Challenge

SELECT PRODUCT,
SUM(CASE WHEN sales_person = 'Mary Jones' THEN daily_sales
ELSE NULL END) as Mary_Jones,
SUM(CASE WHEN sales_person = 'Will Davis' THEN daily_sales
ELSE NULL END) as Will_Davis,
SUM(CASE WHEN sales_person = 'Gary Lewis' THEN daily_sales
ELSE NULL END) as Gary_Lewis,
SUM(CASE WHEN sales_person = 'Helen Smith' THEN daily_sales
ELSE NULL END) as Helen_Smith,
Mary_Jones + Will_Davis + Gary_Lewis + Helen_Smith as total_sales
FROM pivot_test_region_all
GROUP BY product
ORDER BY product;

Here is how we answered the extreme CASE challenge. Each SUM(CASE ...) adds up daily_sales for only one sales person, because the CASE returns NULL for every other row and SUM ignores NULLs. The total_sales column then adds the four aliased columns together.
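Databricks SQL also supports the aggregate FILTER clause, which expresses the same idea without CASE. In this sketch, the three rows in sales are hypothetical, shaped like pivot_test_region_all. Computing total_sales with its own SUM also avoids a NULL total: if a sales person has no rows for a product, that person's SUM is NULL, and adding NULL to the other columns makes the whole total NULL.

```sql
WITH sales AS (
  -- Hypothetical rows with the same columns as pivot_test_region_all
  SELECT * FROM VALUES
    ('Boots', 'Mary Jones', 100.00),
    ('Boots', 'Will Davis',  50.00),
    ('Jeans', 'Mary Jones',  25.00)
  AS t(product, sales_person, daily_sales)
)
SELECT product
      ,SUM(daily_sales) FILTER (WHERE sales_person = 'Mary Jones') AS mary_jones
      ,SUM(daily_sales) FILTER (WHERE sales_person = 'Will Davis') AS will_davis
      ,SUM(daily_sales) AS total_sales  -- stays non-NULL even if a column above is NULL
FROM sales
GROUP BY product
ORDER BY product;
-- Boots  100.00  50.00  150.00
-- Jeans   25.00   NULL   25.00
```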

Answer - CASE Challenge

SELECT *
,CASE
WHEN dept_no = 200 THEN 'Winner'
WHEN salary BETWEEN 20000 and 40000 THEN 'Worker'
WHEN salary < 50000 THEN 'Manager'
WHEN salary < 60000 THEN 'VP'
WHEN salary < 900000 THEN 'CEO'
Else 'DON''T KNOW'
END as title
FROM employee_table ORDER BY dept_no NULLS LAST ;
employee_no dept_no last_name first_name salary title
1000234 10 Smythe Richard 64300.00 CEO
1232578 100 Chambers Mandee 48850.00 Manager
1333454 200 Smith John 48000.00 Winner
1324657 200 Coffing Billy 41888.88 Winner
2312225 300 Larkins Loraine 40200.00 Manager
1256349 400 Harrison Herbert 54500.00 VP
1121334 400 Strickling Cletus 54500.00 VP
2341218 400 Reilly William 36000.00 Worker
2000000 ? Jones Squiggy 32800.50 Worker

CASE returns the result of the first WHEN that is true, so the order of the WHEN statements matters. Ours are in the best logical order to produce only one CEO.


Chapter 12 – Views

"Be the change that you want to see in the world."

-Mahatma Gandhi

The Fundamentals of Views

View Fundamentals
A view is a virtual table.
A view may define a subset of columns.
A view can even define a subset of rows if it has a WHERE clause.
A view never duplicates data or stores the data separately.
Views provide security.

View Advantages
An additional level of security is provided.
Helps business users avoid missing join conditions.
Helps control read and update privileges.
Unaffected when new columns are added to a table.
Unaffected when a column is dropped unless it's referenced in the view.

The above information introduces view fundamentals and advantages. The most important things to understand about views are that they never duplicate data and that they can hide sensitive columns and rows from users.

Creating a Simple View to Restrict Sensitive Columns

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

CREATE View employee_v
AS
SELECT dept_no
,first_name
,last_name
FROM employee_table ;

SELECT *
FROM employee_v
WHERE dept_no = 100

dept_no first_name last_name
100 Mandee Chambers

Above, we create a view whose name is employee_v, and its creation does not include the employee_no or salary
columns. The users have access to the views, and the views have access to the actual tables.

Creating a Simple View to Restrict Rows

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

CREATE View emp_200_v
AS SELECT dept_no
,first_name
,last_name
FROM employee_table
WHERE dept_no = 200 ;

SELECT *
FROM emp_200_v

dept_no first_name last_name
200 Billy Coffing
200 John Smith

The view example above demonstrates how a view can restrict rows. In the view emp_200_v, the user can only see rows from dept_no 200.

Creating a View to Join Tables Together

CREATE VIEW join_ins_v
AS
SELECT
c.claim_date
,c.subscriber_no
,c.member_no
,p.provider_code
,p.provider_name
,p.p_error_rate
FROM claims as c
INNER JOIN
providers as p
ON c.provider_no = p.provider_code ;

The view example above joins two tables together. Users no longer need to write the join themselves; they can simply select the columns they want from the view. Views can hide the complexity of a query and allow users to access relevant information without being SQL gurus.

Basic Rules for Views

No ORDER BY inside the View CREATE (exceptions exist)
All Aggregation should have an ALIAS
Any Derived columns (such as Math) should have an ALIAS

CREATE View deptsal_v AS
SELECT dept_no
,SUM(salary) as sumsal
,SUM(salary) / 12 as monthsal
FROM employee_table
GROUP BY dept_no;

Why do these two columns need aliases? So we can bring them back in the SELECT query.
You don't put an ORDER BY in the view creation. Users put the ORDER BY when selecting from the view.

SELECT dept_no
,sumsal
FROM deptsal_v
Order By 1 ;

Above are the rules for views.


How to Modify a View

CREATE or REPLACE View employee_v
AS
SELECT dept_no
,first_name
,last_name
,employee_no as emp
FROM employee_table ;

SELECT * FROM employee_v
WHERE dept_no in (100, 300)
ORDER BY dept_no ;

dept_no first_name last_name emp
100 Mandee Chambers 1232578
300 Loraine Larkins 2312225

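The CREATE OR REPLACE keywords change the definition of a view if it already exists, or create the view if it does not. Above, the new definition adds employee_no with the alias emp.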
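Databricks also has ALTER VIEW ... AS, which changes the query of a view that already exists; unlike CREATE OR REPLACE, it does not create a missing view. The sketch below uses a hypothetical temporary view, demo_v, over inline rows so it does not depend on employee_table. DROP VIEW IF EXISTS then removes it without an error if it is already gone.

```sql
-- demo_v is a hypothetical, session-scoped view used only for this sketch.
CREATE OR REPLACE TEMPORARY VIEW demo_v AS
SELECT * FROM VALUES (100, 'Mandee'), (300, 'Loraine') AS t(dept_no, first_name);

-- Change the view's query; the view must already exist.
ALTER VIEW demo_v AS
SELECT dept_no FROM VALUES (100), (300) AS t(dept_no);

SELECT * FROM demo_v;   -- now returns only the dept_no column

-- Remove the view; IF EXISTS avoids an error if it was already dropped.
DROP VIEW IF EXISTS demo_v;
```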

The Exception to the ORDER BY Rule inside a View

Create view sales_olap_v AS
SELECT product_id, sale_date, daily_sales
,Sum(daily_sales)
OVER (ORDER BY daily_sales
Rows Unbounded Preceding) as csum1
FROM sales_table ;

Every ANSI ordered analytic has an ORDER BY in it naturally.

SELECT * FROM sales_olap_v

product_id sale_date daily_sales csum1
3000 2000-10-04 15675.33 15675.33
3000 2000-10-02 19678.94 35354.27
3000 2000-10-03 21553.79 56908.06
3000 2000-10-01 28000.00 84908.06
1000 2000-10-02 32800.50 117708.56
2000 2000-10-04 32800.50 150509.06
3000 2000-09-29 34509.13 185018.19
1000 2000-09-30 36000.07 221018.26
2000 2000-10-02 36021.93 257040.19
(Not all rows are displayed.)

There are exceptions to the no ORDER BY rule inside a view. The ANSI OLAP (window) functions always have an ORDER BY inside their OVER clause, but they still work inside a view, because that ORDER BY orders the rows for the calculation, not the final answer set.

Derived Columns in a View Should Contain a Column Alias

CREATE VIEW alias_v
AS SELECT employee_no
,last_name
,salary/12 AS mnth_sal
FROM employee_table
WHERE dept_no = 200 ;

All derived columns must have an alias in a view. The derived column with the alias mnth_sal can be used to query the view.

SELECT *
FROM alias_v
ORDER BY mnth_sal ;

employee_no last_name mnth_sal
1324657 Coffing 3490.740000
1333454 Smith 4000.000000

SELECT
AVG(mnth_sal)
FROM alias_v

avg
3745.3700000000

You should alias all derived columns in a view, so you can refer to them by name when querying the view, as the ORDER BY and AVG above do with mnth_sal.

The Standard Way Most Aliasing is Done

CREATE VIEW emp_v2
AS SELECT employee_no
,last_name
,salary/12 as sal_monthly
FROM employee_table
WHERE dept_no = 200 ;

SELECT *
FROM emp_v2
ORDER BY sal_monthly

employee_no last_name sal_monthly
1324657 Coffing 3490.740000
1333454 Smith 4000.000000

The ALIAS for salary / 12 in this example is sal_monthly, and this form of aliasing is the most popular. All
derived data must have an alias in a view. The keyword 'as' in the alias definition is optional.

Another Way to Alias Columns in a View CREATE

CREATE VIEW e_view2 (emp_nbr, last)
AS SELECT
employee_no
,last_name
FROM employee_table
WHERE dept_no = 200 ;

Aliases can be here, in parentheses right after the view name.

SELECT *
FROM e_view2

emp_nbr last
1324657 Coffing
1333454 Smith

You can create aliases for columns right after the view name or right after the column name. Above, the aliases are listed after the view name, so employee_no becomes emp_nbr and last_name becomes last. The list must name every column in the SELECT, in the same order.

What Happens When a View Column gets Aliased Twice?

CREATE VIEW e_view3 (emp_nbr, Last, mnth_sal)
AS SELECT
employee_no
,last_name
,salary/12 AS my_monthly_salary
FROM employee_table
WHERE dept_no = 200 ;

The salary/12 column has been aliased twice, using both techniques.

SELECT *
FROM e_view3

emp_nbr Last mnth_sal
1324657 Coffing 3490.740000
1333454 Smith 4000.000000

Notice that the query result uses the alias mnth_sal.

You can create aliases for columns right after the view name or right after the column name. Above, we have done it both ways to see which alias is accepted. The first alias definition, the one in the list after the view name, wins: the view exposes mnth_sal rather than my_monthly_salary.


Chapter 13 – Set Operators

"The man who doesn't read good books has no advantage over the man who
can't read them."

-Mark Twain

Rules of Set Operators

1. Each query will have at least two SELECT statements separated by a SET operator.
2. SET operators are UNION, INTERSECT, or EXCEPT.
3. Each SELECT must specify the same number of columns, with compatible data types.
4. If using aggregates, each SELECT must have its own GROUP BY.
5. Both SELECTs must have a FROM clause.
6. The first SELECT is used for all column aliases.
7. The last SELECT will have the ORDER BY statement.
8. With multiple operators, INTERSECT is evaluated first; UNION and EXCEPT have equal precedence and are evaluated from left to right.
9. Parentheses can change the order of precedence.
10. Duplicate rows are eliminated unless the ALL keyword is used.

Above are the rules for set operators.
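Rules 8 and 9 can be checked with small inline tables. In this sketch, red, blue, and green are hypothetical VALUES lists. Without parentheses, INTERSECT is evaluated before UNION; with parentheses around the UNION, the answer changes.

```sql
-- Without parentheses, INTERSECT runs first:
-- blue INTERSECT green = {3, 5}, then red UNION {3, 5} returns 1, 2, 3, 5
SELECT n FROM VALUES (1), (2), (3) AS red(n)
UNION
SELECT n FROM VALUES (3), (4), (5) AS blue(n)
INTERSECT
SELECT n FROM VALUES (3), (5) AS green(n);

-- Parentheses force the UNION first:
-- {1, 2, 3, 4, 5} INTERSECT {3, 5} returns 3, 5
(SELECT n FROM VALUES (1), (2), (3) AS red(n)
 UNION
 SELECT n FROM VALUES (3), (4), (5) AS blue(n))
INTERSECT
SELECT n FROM VALUES (3), (5) AS green(n);
```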

Quiz - Intersect Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
INTERSECT
SELECT * FROM table_blue ;

In this example, what numbers in the answer set would come from the query above?

Answer - Intersect Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
INTERSECT
SELECT * FROM table_blue ;

In this example, only the number 3 is in both tables, so the answer set contains only 3.

Quiz - Union Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
UNION
SELECT * FROM table_blue ;

In this example, what numbers in the answer set would come from the query above?

Answer - Union Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
UNION
SELECT * FROM table_blue ;

1 2 3 4 5

Conceptually, both the top and bottom queries run, and their results are combined. Duplicates are eliminated, and the remaining numbers go into the answer set.

Quiz - Union ALL Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
UNION ALL
SELECT * FROM table_blue ;

In this example, what numbers in the answer set would come from the query above?

Answer - Union ALL Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
UNION ALL
SELECT * FROM table_blue ;

1 2 3 3 4 5
Conceptually, both the top and bottom queries run, and their results are combined to build the answer set. The keyword ALL prevents the elimination of duplicates, so 3 appears twice.

Quiz - Except Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
EXCEPT
SELECT * FROM table_blue ;

EXCEPT delivers only the results of the top query, unless a value is
found in the bottom query, where it is removed. The bottom query
will never add results, but only take away from the top results.

In this example, what numbers in the answer set would come from the query above?

Answer - Except Explained Logically

table_red table_blue

1 3
2 4
3 5
SELECT * FROM table_red
EXCEPT
SELECT * FROM table_blue ;

1 2

The only possible results are from table_red (1, 2, 3). Notice that table_blue contains a 3, so this eliminates the 3 from the final answer.

The Top query SELECTED 1, 2, 3 from Table_Red. From that point on, only 1, 2, 3 can return in the answer set.
The bottom query runs on Table_Blue and eliminates matches from the top query.
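Rule 10 applies to EXCEPT as well. In this sketch on hypothetical inline rows, plain EXCEPT removes duplicates and every matching value, while EXCEPT ALL removes only one top-query row for each matching bottom-query row.

```sql
-- EXCEPT (distinct): every 1 and every 3 is removed; returns 2
SELECT n FROM VALUES (1), (1), (2), (3) AS top_rows(n)
EXCEPT
SELECT n FROM VALUES (1), (3) AS bottom_rows(n);

-- EXCEPT ALL: only one 1 and one 3 are removed; returns 1, 2
SELECT n FROM VALUES (1), (1), (2), (3) AS top_rows(n)
EXCEPT ALL
SELECT n FROM VALUES (1), (3) AS bottom_rows(n);
```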

Quiz - Testing Your Knowledge

table_red table_blue

1 3
2 4
3 5
SELECT * SELECT *
FROM table_blue FROM table_red
EXCEPT EXCEPT
SELECT * SELECT *
FROM table_red ; FROM table_blue ;

Will both queries bring back the same result set? The answer follows.

Answer - Testing Your Knowledge

table_red table_blue

1 3
2 4
3 5
SELECT * SELECT *
FROM table_blue FROM table_red
EXCEPT EXCEPT
SELECT * SELECT *
FROM table_red ; FROM table_blue ;

Will the result set be the same for both queries above?

NO

No! The query on the left returns 4, 5, and the query on the right returns 1, 2. The answer set can only contain values from the first table mentioned; values from the second query only eliminate matches from the first table.

An Equal Number of Columns in both SELECT List

SELECT dept_no
,employee_no
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table;

Both queries have the same number of columns in the SELECT list.

dept_no employee_no
100 1232578
400 1256349
400 2341218
300 2312225
? 2000000
10 1000234
400 1121334
200 1324657
200 1333454
100 1256349
200 1000234
300 1333454
500 1121334

You must have an equal number of columns in both SELECT lists. A set operator compares whole rows, column by column, to eliminate duplicate rows, so both queries must return rows of the same shape.
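When one query has a column the other lacks, a typed NULL can fill the gap so the column counts match. The sketch uses two rows from this chapter's tables as inline VALUES; the column name person_no is hypothetical.

```sql
SELECT dept_no, employee_no AS person_no, last_name
FROM VALUES (100, 1232578, 'Chambers') AS e(dept_no, employee_no, last_name)
UNION ALL
-- The department rows have no last_name, so a NULL of type STRING fills in.
SELECT dept_no, mgr_no, CAST(NULL AS STRING)
FROM VALUES (100, 1256349) AS d(dept_no, mgr_no);
```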

The Top Query handles all Aliases

SELECT dept_no as depty
,employee_no as the_mgr
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table;

The top query is responsible for the column ALIAS and formatting.

depty the_mgr
100 1232578
400 1256349
400 2341218
300 2312225
? 2000000
10 1000234
400 1121334
200 1324657
200 1333454
100 1256349
200 1000234
300 1333454
500 1121334

The column names in the answer set come from the top query, so the top query is responsible for any alias on a column.

The Bottom Query does the ORDER BY

SELECT dept_no
,employee_no
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table
ORDER BY 1 ;

The bottom query is responsible for the ORDER BY.

SELECT dept_no
,employee_no
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table
ORDER BY dept_no ;

The bottom query can use the column number or the column name in the ORDER BY.

dept_no employee_no
? 2000000
10 1000234
100 1256349
100 1232578
200 1324657
200 1333454
200 1000234
300 2312225
300 1333454
400 2341218
400 1121334
400 1256349
500 1121334

The bottom query is responsible for sorting the combined answer set and is the only place an ORDER BY statement works. You can use the column number or the column name in the ORDER BY statement. Because the column names come from the top query, you can even sort by employee_no, although the bottom query selects mgr_no.

Intersect Challenge

customer_table order_table
customer_number customer_name order_number customer_number order_total
11111111 Billy’s Best Choice 123456 11111111 12347.53
31313131 Acme Products 123512 11111111 8005.91
31323134 ACE Consulting 123552 31323134 5111.47
57896883 XYZ Plumbing 123585 87323456 15231.62
87323456 Databases N-U 123777 57896883 23454.84

Your mission is to write a query using the INTERSECT operator and a subquery to show all customers who have placed an order.

Answer - Intersect Challenge

SELECT *
FROM customer_table
WHERE customer_number IN
(SELECT customer_number
FROM customer_table
INTERSECT
SELECT customer_number
FROM order_table)

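This quiz's answer uses the set operator INTERSECT inside the subquery. The subquery returns only the customer numbers found in both tables, so Acme Products (31313131), which has no orders, is left out.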
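To check the answer, the sketch below loads the two challenge tables into an in-memory SQLite database (a stand-in for Databricks, assumed only for illustration) and runs the same query. Only Acme Products, which has no orders, should be missing from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT);
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER, order_total REAL);
INSERT INTO customer_table VALUES
  (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
  (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
  (87323456, 'Databases N-U');
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
  (123777, 57896883, 23454.84);
""")

rows = conn.execute("""
SELECT * FROM customer_table
WHERE customer_number IN
  (SELECT customer_number FROM customer_table
   INTERSECT
   SELECT customer_number FROM order_table)
ORDER BY customer_number
""").fetchall()

names = [name for _, name in rows]
print(names)
```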

UNION Vs. UNION ALL

SELECT department_name, dept_no
FROM department_table
UNION ALL
SELECT department_name, dept_no
FROM department_table
ORDER BY 1;

UNION Answer Set
department_name dept_no
Customer Support 400
Human Resources 500
Marketing 100
Research and Development 200
Sales 300

UNION ALL Answer Set
department_name dept_no
Customer Support 400
Customer Support 400
Human Resources 500
Human Resources 500
Marketing 100
Marketing 100
Research and Development 200
Research and Development 200
Sales 300
Sales 300

A UNION ALL performs better and uses fewer system resources than a UNION. Without the ALL option, the system must do extra work to find and eliminate duplicate rows in the combined answer set.
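A quick way to see the difference is to count the rows each operator returns. This sketch uses SQLite as a stand-in for Databricks (an assumption) and the five departments from the answer sets above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE department_table (dept_no INTEGER, department_name TEXT);
INSERT INTO department_table VALUES
  (100, 'Marketing'), (200, 'Research and Development'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

def count_rows(set_operator):
    """Combine department_table with itself and count the result rows."""
    sql = f"""
    SELECT department_name, dept_no FROM department_table
    {set_operator}
    SELECT department_name, dept_no FROM department_table
    """
    return len(conn.execute(sql).fetchall())

union_count = count_rows("UNION")          # duplicates removed
union_all_count = count_rows("UNION ALL")  # every row kept
print(union_count, union_all_count)  # 5 10
```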

Using UNION ALL and Literals

SELECT dept_no AS dept
,'employee  ' AS b
,Concat(first_name,' ', last_name) as name
FROM employee_table
UNION ALL
SELECT dept_no
,'Department'
,department_name
FROM department_table
ORDER BY 1 nulls last, 2 ;

dept b name
10 employee Richard Smythe
100 Department Marketing
100 employee Mandee Chambers
200 Department Research and Develop
200 employee Billy Coffing
200 employee John Smith
300 Department Sales
300 employee Loraine Larkins
400 Department Customer Support
400 employee Cletus Strickling
400 employee Herbert Harrison
400 employee William Reilly
500 Department Human Resources
? employee Squiggy Jones

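Notice that the second column of each SELECT is a literal: 'employee  ' (with two trailing spaces) in the top query and 'Department' in the bottom query. The two trailing spaces make both literals exactly ten characters long, so the values line up in the output. The UNION ALL brings back all employees and all departments, and the ORDER BY lists each department's row next to its employees. Within each dept, 'Department' sorts before 'employee  ' because an uppercase D sorts ahead of a lowercase e, and nulls last places the employee with a NULL dept_no at the end.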
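The literal-tagging technique can be sketched in SQLite (a stand-in for Databricks; the three rows are a hypothetical subset of the book's data). The sketch uses the standard || operator in place of Concat and drops nulls last, since no row here has a NULL dept_no.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE employee_table (dept_no INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE department_table (dept_no INTEGER, department_name TEXT);
-- Hypothetical subset of the book's rows.
INSERT INTO employee_table VALUES (10, 'Richard', 'Smythe'), (100, 'Mandee', 'Chambers');
INSERT INTO department_table VALUES (100, 'Marketing');
""")

rows = conn.execute("""
SELECT dept_no AS dept, 'employee  ' AS b, first_name || ' ' || last_name AS name
FROM employee_table
UNION ALL
SELECT dept_no, 'Department', department_name
FROM department_table
ORDER BY 1, 2
""").fetchall()

for row in rows:
    print(row)
```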

Using UNION ALL for speed in Merging Data Sets

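cust_table_east: 1,000,000 rows of East customers
cust_table_west: 1,000,000 rows of West customers
combined_custs: completely empty

INSERT INTO combined_custs
SELECT * FROM cust_table_east
UNION ALL
SELECT * FROM cust_table_west ;

combined_custs now holds 2,000,000 rows of East and West customers.

UNION ALL is a great technique to load data quickly, because it skips the duplicate-elimination step that UNION performs. SELECT * works here only because all three tables have the same columns in the same order.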
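A scaled-down version of the merge can be sketched in SQLite (a stand-in for Databricks). The cust_id and cust_name columns and the five rows are hypothetical; the point is that INSERT ... SELECT ... UNION ALL loads every row from both sources in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
# Three tables with identical column layouts (hypothetical, tiny stand-ins).
conn.executescript("""
CREATE TABLE cust_table_east (cust_id INTEGER, cust_name TEXT);
CREATE TABLE cust_table_west (cust_id INTEGER, cust_name TEXT);
CREATE TABLE combined_custs  (cust_id INTEGER, cust_name TEXT);
INSERT INTO cust_table_east VALUES (1, 'East A'), (2, 'East B');
INSERT INTO cust_table_west VALUES (3, 'West A'), (4, 'West B'), (5, 'West C');

INSERT INTO combined_custs
SELECT * FROM cust_table_east
UNION ALL
SELECT * FROM cust_table_west;
""")

total = conn.execute("SELECT COUNT(*) FROM combined_custs").fetchone()[0]
print(total)  # 5
```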


Great Trick: Place your Set Operator in a Derived Table

SELECT employee_no AS manager


,Trim(last_name) || ', ' || first_name as name
FROM employee_table
INNER JOIN
(SELECT employee_no
FROM employee_table
INTERSECT
SELECT mgr_no
FROM department_table)
AS teratom (empno)
ON employee_no = empno;

manager name
1256349 Harrison, Herbert
1333454 Smith, John
1000234 Smythe, Richard
1121334 Strickling, Cletus

The derived table returns the employee number of every manager (every employee_no that also appears as a mgr_no), and joining it to employee_table adds each manager's name.
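The same derived-table pattern can be sketched in SQLite (a stand-in for Databricks, with hypothetical rows). SQLite does not accept the column list in AS teratom (empno), so the sketch names the column inside the derived table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE employee_table (employee_no INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE department_table (dept_no INTEGER, mgr_no INTEGER);
-- Hypothetical rows: two of the three employees manage a department.
INSERT INTO employee_table VALUES
  (1256349, 'Herbert', 'Harrison'), (1333454, 'John', 'Smith'),
  (2341218, 'William', 'Reilly');
INSERT INTO department_table VALUES (100, 1256349), (300, 1333454);
""")

rows = conn.execute("""
SELECT employee_no AS manager, last_name || ', ' || first_name AS name
FROM employee_table
INNER JOIN
  (SELECT employee_no AS empno FROM employee_table
   INTERSECT
   SELECT mgr_no FROM department_table) AS teratom
ON employee_no = empno
ORDER BY manager
""").fetchall()

print(rows)
```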


A Great Example of how EXCEPT works

employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00

department_table
dept_no department_name
100 Marketing
200 Research and Dev
300 Sales
400 Customer Support
500 Human Resources

SELECT dept_no as department_number


FROM department_table
EXCEPT
SELECT dept_no
FROM employee_table;
department_number
500

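This query returns every department that has no employees. Order matters with EXCEPT: reversing the two queries would instead return the employee dept_no values that match no department (here, 10 and NULL).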
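The sketch below loads the two tables above into SQLite (a stand-in for Databricks) and runs the EXCEPT in both directions, showing that the order of the queries matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER);
CREATE TABLE department_table (dept_no INTEGER, department_name TEXT);
INSERT INTO employee_table VALUES
  (2000000, NULL), (1000234, 10), (1232578, 100), (1324657, 200),
  (1333454, 200), (2312225, 300), (1121334, 400), (2341218, 400),
  (1256349, 400);
INSERT INTO department_table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

# Departments without employees.
rows = conn.execute("""
SELECT dept_no AS department_number FROM department_table
EXCEPT
SELECT dept_no FROM employee_table
""").fetchall()

# Reversed: employee dept_no values that match no department.
reversed_rows = conn.execute("""
SELECT dept_no FROM employee_table
EXCEPT
SELECT dept_no FROM department_table
""").fetchall()

print(rows)                  # [(500,)]
print(sorted(reversed_rows, key=lambda r: (r[0] is not None, r[0] or 0)))
```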


USING Multiple SET Operators in a Single Request

SELECT dept_no , employee_no empno
FROM employee_table
UNION ALL
SELECT dept_no, employee_no
FROM employee_table
INTERSECT ALL
SELECT dept_no, mgr_no
FROM department_table
EXCEPT
SELECT dept_no, mgr_no
FROM department_table
WHERE department_name
LIKE '%Sales%';

dept_no empno
10 1000234
? 2000000
400 1121334
200 1333454
400 1256349
300 2312225
400 2341218
100 1232578
200 1324657

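Above, we use multiple set operators in a single request. Without parentheses, Databricks evaluates every INTERSECT first; UNION and EXCEPT have equal precedence and are evaluated from left to right. Here, the INTERSECT ALL runs first, the UNION ALL appends its result to the first query, and the EXCEPT (which also removes duplicate rows) then subtracts the Sales department's row.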
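SQLite evaluates all set operators strictly from left to right and has no INTERSECT ALL, so it cannot demonstrate this rule. Instead, here is a plain Python model that treats each query result as a multiset (collections.Counter). The inputs A through D are hypothetical, and except_distinct models EXCEPT without ALL. The model compares the INTERSECT-first rule that Spark SQL (and so Databricks) follows with a strict left-to-right reading.

```python
from collections import Counter

# Model each query result as a multiset (bag) of rows.
def union_all(a, b):
    return a + b          # keep every row from both inputs

def intersect_all(a, b):
    return a & b          # keep min(count in a, count in b)

def except_distinct(a, b):
    # EXCEPT without ALL: distinct rows of a that do not appear in b.
    return Counter({row: 1 for row in a if row not in b})

# Hypothetical query results (not the book's data).
A = Counter({("x", 1): 1, ("y", 2): 1})
B = Counter({("x", 1): 1, ("y", 2): 1})
C = Counter({("x", 1): 1})
D = Counter({("z", 9): 1})

# A UNION ALL B INTERSECT ALL C EXCEPT D with INTERSECT evaluated first,
# then UNION ALL and EXCEPT from left to right.
standard = except_distinct(union_all(A, intersect_all(B, C)), D)

# The same operators applied strictly from left to right.
left_to_right = except_distinct(intersect_all(union_all(A, B), C), D)

print(sorted(standard.elements()))       # [('x', 1), ('y', 2)]
print(sorted(left_to_right.elements()))  # [('x', 1)]
```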


Changing the Order of Precedence with Parentheses

SELECT dept_no, employee_no as empno
FROM employee_table
UNION ALL
(SELECT dept_no, employee_no
FROM employee_table
INTERSECT ALL
(SELECT dept_no, mgr_no
FROM department_table
EXCEPT
SELECT dept_no, mgr_no
FROM department_table
WHERE department_name
LIKE '%Sales%'));

dept_no empno
? 2000000
200 1333454
10 1000234
400 1256349
100 1232578
400 1121334
400 2341218
200 1324657
300 2312225
400 1256349

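Above, we use multiple set operators and parentheses to change the order of evaluation. The innermost EXCEPT runs first, then the INTERSECT ALL, and lastly the UNION ALL. Because no duplicate-removing EXCEPT runs last, the manager row returned by the INTERSECT ALL (400 1256349) appears a second time in the answer set. Without parentheses, every INTERSECT is evaluated first, and UNION and EXCEPT are then evaluated from left to right.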
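Using the same kind of multiset model (plain Python with hypothetical inputs A through D, not Databricks itself), this sketch shows how the parentheses change the answer. With the EXCEPT innermost, no duplicate-removing step runs last, so a duplicate row survives.

```python
from collections import Counter

def union_all(a, b):
    return a + b          # bag union: counts add up

def intersect_all(a, b):
    return a & b          # bag intersection: minimum counts

def except_distinct(a, b):
    # EXCEPT without ALL: distinct rows of a that do not appear in b.
    return Counter({row: 1 for row in a if row not in b})

# Hypothetical query results; ("s", 3) plays the role of the Sales row.
A = Counter({("x", 1): 1})
B = Counter({("x", 1): 1, ("s", 3): 1})
C = Counter({("x", 1): 1, ("s", 3): 1})
D = Counter({("s", 3): 1})

# A UNION ALL (B INTERSECT ALL (C EXCEPT D)): innermost parentheses first.
grouped = union_all(A, intersect_all(B, except_distinct(C, D)))

# No parentheses: INTERSECT first, then UNION ALL and EXCEPT left to right.
without_parens = except_distinct(union_all(A, intersect_all(B, C)), D)

print(sorted(grouped.elements()))         # [('x', 1), ('x', 1)]
print(sorted(without_parens.elements()))  # [('x', 1)]
```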

Chapter 14 Creating Tables

Chapter 14 – Creating Tables

"Strength does not come from physical capacity. It comes from an indomitable
will."

- Mahatma Gandhi


Create Table Syntax

{ { [CREATE OR] REPLACE TABLE | CREATE [EXTERNAL] TABLE [ IF NOT EXISTS ] }


table_name
[ table_specification ]
[ USING data_source ]
[ table_clauses ]
[ AS query ] }

table_specification
( { column_identifier column_type [ NOT NULL ]
[ GENERATED ALWAYS AS ( expr ) |
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ] |
DEFAULT default_expression ]
[ COMMENT column_comment ]
[ column_constraint ] } [, ...]
[ , table_constraint ] [...] )

table_clauses
{ OPTIONS clause |
PARTITIONED BY clause |
clustered_by_clause |
LOCATION path [ WITH ( CREDENTIAL credential_name ) ] |
COMMENT table_comment |
TBLPROPERTIES clause } [...]

clustered_by_clause
{ CLUSTERED BY ( cluster_column [, ...] )
[ SORTED BY ( { sort_column [ ASC | DESC ] } [, ...] ) ]
INTO num_buckets BUCKETS }


Data Types

Integral numeric types represent whole numbers:
TINYINT, SMALLINT, INT, BIGINT

Exact numeric types represent base-10 numbers:
Integral numeric, DECIMAL

Binary floating-point types use exponents and a binary representation to cover a range of numbers:
FLOAT, DOUBLE

Numeric types represent all numeric data types:
Exact numeric, Binary floating point

Date-time types represent date and time components:
DATE, TIMESTAMP, TIMESTAMP_NTZ

Simple types are types defined by holding singleton values:
Numeric, Date-time, BINARY, BOOLEAN, INTERVAL, STRING

Complex types are composed of multiple components of complex or simple types:
ARRAY, MAP, STRUCT

Here are the data types of Databricks.
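The integral types differ only in storage size (1, 2, 4, and 8 bytes). This plain Python sketch, assuming the standard two's-complement ranges for signed integers of those widths, computes the range each type can hold, which helps when choosing, for example, TINYINT for credits or member_no.

```python
# Byte widths of the integral types (1, 2, 4, and 8 bytes).
INTEGRAL_TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}

def signed_range(num_bytes):
    """Smallest and largest value a two's-complement integer of this width holds."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

ranges = {name: signed_range(size) for name, size in INTEGRAL_TYPE_BYTES.items()}

for name, (low, high) in ranges.items():
    print(f"{name:8} {low:>21} to {high}")
```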



Create Table Examples

CREATE TABLE course_table
(
course_id INT
,course_name STRING
,credits TINYINT
,seats SMALLINT
);

CREATE TABLE claims
(claim_id INT
,claim_date DATE
,claim_service SMALLINT
,subscriber_no BIGINT
,member_no TINYINT
,claim_amt DECIMAL(12,2)
,provider_no INT );

CREATE TABLE sales_table
(product_id INT,
sale_date TIMESTAMP,
daily_sales DECIMAL(9,2));

CREATE TABLE student_table
(student_id INT,
last_name STRING,
first_name STRING,
class_code STRING,
grade_pt DECIMAL(5,2))
PARTITIONED BY (class_code);

-- Creates a CSV table from an external directory
CREATE TABLE weather USING CSV LOCATION '/mnt/csv_files/weather';

Here are some examples of creating tables.


Best Practices for Partitioned Tables

CREATE TABLE claims_partitioned


(claim_id INT
,claim_date DATE
,claim_service SMALLINT
,subscriber_no BIGINT
,member_no TINYINT
,claim_amt DECIMAL(12,2)
,provider_no INT )
PARTITIONED BY (claim_date) ;
• Partition the table by a column that is often used in WHERE clauses or in the ON clause of frequently performed joins.
• The best partition column is most often a date column.
• Use columns with low cardinality, meaning there should be many duplicate values in the column.
• Consider partitioning by a column only if you expect each partition to hold at least 1 GB of data. Smaller tables usually do not need partitioning.
• The PARTITIONED BY clause accepts one or more columns; the example above partitions on a single column.

Here are some best practices for partitioned tables.



Describe Detail Tablename

When you run DESCRIBE DETAIL on a Delta table (for example, DESCRIBE DETAIL claims_partitioned;), the result includes columns such as format, id, name, description, location, createdAt, lastModified, partitionColumns, numFiles, sizeInBytes, properties, minReaderVersion, and minWriterVersion.

Not Null Constraint

CREATE TABLE employee_table (
employee_no DECIMAL(38,0) not null,
dept_no DECIMAL(38,0),
last_name VARCHAR(20),
first_name VARCHAR(12),
salary DECIMAL(8,2))
USING delta
TBLPROPERTIES (
'delta.minReaderVersion' = '1',
'delta.minWriterVersion' = '2');

The not null constraint ensures that a specific column cannot contain NULL values. Any INSERT or UPDATE that would put a NULL into employee_no fails.
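The constraint can be tried in SQLite (a stand-in for Databricks; SQLite accepts the DECIMAL and VARCHAR type names but stores values by its own rules). The sketch shows that a NULL is allowed in dept_no but rejected in employee_no.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.execute("""
CREATE TABLE employee_table (
    employee_no DECIMAL(38,0) NOT NULL,
    dept_no     DECIMAL(38,0),
    last_name   VARCHAR(20)
)
""")

# A NULL dept_no is fine: that column has no NOT NULL constraint.
conn.execute("INSERT INTO employee_table VALUES (1000234, NULL, 'Smythe')")

# A NULL employee_no violates the constraint and the row is rejected.
try:
    conn.execute("INSERT INTO employee_table VALUES (NULL, 10, 'Jones')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print(err)

row_count = conn.execute("SELECT COUNT(*) FROM employee_table").fetchone()[0]
```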


Create a Table IF NOT EXISTS

You can use the IF NOT EXISTS option when creating a table. If the table already exists, Databricks ignores the whole statement and does not create the new table.

create table IF NOT EXISTS department_exists
(dept_no integer
,dept_name CHAR(24)
)

The table will not be created if it already exists.

We recommend the IF NOT EXISTS option when creating a table, so the statement does not fail with an error if a table with the same name already exists in the schema. Keep in mind that the existing table is left exactly as it is, even if its definition differs from the one in your statement.
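The behavior can be sketched in SQLite (a stand-in for Databricks; assumed to match here). The second CREATE uses a deliberately different, hypothetical column list to show that the existing table is left untouched, and the last CREATE shows the error you get without IF NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.execute(
    "CREATE TABLE IF NOT EXISTS department_exists (dept_no INTEGER, dept_name CHAR(24))"
)

# Second attempt with a different definition is silently skipped.
conn.execute("CREATE TABLE IF NOT EXISTS department_exists (other_col TEXT)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(department_exists)")]
print(columns)  # ['dept_no', 'dept_name']

# Without IF NOT EXISTS, creating the same table again fails.
try:
    conn.execute("CREATE TABLE department_exists (dept_no INTEGER)")
    duplicate_failed = False
except sqlite3.OperationalError:
    duplicate_failed = True
```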


Create Table AS (CTAS) Populates the Table With Data

create table claims_ctas
AS
SELECT * FROM claims;

Here claims_ctas is the new table name, claims is the existing table, and the keyword AS is optional.

SELECT *
FROM claims_ctas;

You are set to query because the table is populated with data.

claim_id claim_date claim_service subscriber_no member_no claim_amt provider_no
1302111 2015-03-10 111 1111111 2 235.78 1
1302111 2016-01-19 555 1111111 2 735.78 1
1302111 2015-10-15 111 1111111 2 235.78 1
1302111 2014-08-04 122 1111111 1 250.99 1
1302111 2016-05-20 222 1111111 2 23553.22 1
1302111 2014-04-30 333 1111111 2 235.78 1

You can create one table from another, with the data loaded automatically, by adding a SELECT statement at the end of the CREATE TABLE statement; this is known as a CTAS (CREATE TABLE AS SELECT). The new table takes its column names and data types from the SELECT. Databricks tables do not have traditional indexes (Delta Lake relies on techniques such as data skipping instead), so there are no indexes for a CTAS to create or copy.
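A minimal CTAS can be sketched in SQLite (a stand-in for Databricks; unlike Databricks, SQLite requires the AS keyword). The three claims rows come from the answer set above, and the column list is trimmed for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE claims (claim_id INTEGER, claim_date TEXT, claim_amt REAL);
INSERT INTO claims VALUES
  (1302111, '2015-03-10', 235.78),
  (1302111, '2016-01-19', 735.78),
  (1302111, '2014-08-04', 250.99);

-- Structure and rows in one step.
CREATE TABLE claims_ctas AS SELECT * FROM claims;
""")

copied = conn.execute("SELECT COUNT(*) FROM claims_ctas").fetchone()[0]
print(copied)  # 3
```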

Create Table AS (CTAS) can Choose Certain Columns

create table claims_some_columns


AS
SELECT claim_id, claim_date, claim_service, claim_amt
FROM claims;

SELECT *
FROM claims_some_columns;

claim_id claim_date claim_service claim_amt


1302111 2015-03-10 111 235.78
1302111 2016-01-19 555 735.78
1302111 2015-10-15 111 235.78
1302111 2014-08-04 122 250.99
1302111 2016-05-20 222 23553.22
1302111 2014-04-30 333 235.78

The example above creates a new table from an existing one but keeps only some of its columns. The data for those columns is loaded automatically as well.
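Choosing columns in a CTAS can be sketched the same way in SQLite (a stand-in for Databricks; the two rows come from the answer set above). PRAGMA table_info lists the new table's columns, which shows that only the selected ones were created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE claims (claim_id INTEGER, claim_date TEXT, claim_service INTEGER,
                     subscriber_no INTEGER, claim_amt REAL);
INSERT INTO claims VALUES
  (1302111, '2015-03-10', 111, 1111111, 235.78),
  (1302111, '2016-01-19', 555, 1111111, 735.78);

CREATE TABLE claims_some_columns AS
SELECT claim_id, claim_date, claim_service, claim_amt
FROM claims;
""")

columns = [row[1] for row in conn.execute("PRAGMA table_info(claims_some_columns)")]
rows = conn.execute("SELECT * FROM claims_some_columns ORDER BY claim_date").fetchall()
print(columns)  # ['claim_id', 'claim_date', 'claim_service', 'claim_amt']
```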

Chapter 15 Data Manipulation Language (DML)

Chapter 15 – Data Manipulation Language (DML)

"Manipulation is a crafty shadow that dances on the edges of truth, weaving


webs of deceit in the pursuit of control."

- Anonymous


INSERT Syntax # 1

The following syntax of the INSERT does not use the column names as
part of the command. Therefore, it requires that the VALUES portion of
the INSERT match each column in the table with a data value or a null.

INSERT [ INTO ] <table-name> VALUES
( <literal-data-value> [ ...,<literal-data-value> ] ) ;

create table sales


(product_id integer not null
,sale_date Date
,daily_sales Decimal(10,2));

INSERT INTO sales Values


(1000, '2019-06-30', 500.30);

INSERT INTO sales Values


(2000, null, 300.40);

The INSERT statement puts a new row into a table. The database returns a status, but no rows are returned to the user. Because this form lists no column names, the VALUES clause must account for every column in the table, in the order the columns were defined, using either a data value or a null.
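The positional form can be sketched in SQLite (a stand-in for Databricks, using the sales table above). The last INSERT supplies only two values for three columns, so it is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.execute(
    "CREATE TABLE sales (product_id INTEGER NOT NULL, sale_date DATE, daily_sales DECIMAL(10,2))"
)

# One value (or NULL) per column, in table order.
conn.execute("INSERT INTO sales VALUES (1000, '2019-06-30', 500.30)")
conn.execute("INSERT INTO sales VALUES (2000, NULL, 300.40)")

# Leaving a column out of a positional INSERT is an error.
try:
    conn.execute("INSERT INTO sales VALUES (3000, 300.40)")
    short_insert_ok = True
except sqlite3.OperationalError as err:
    short_insert_ok = False
    print(err)

rows = conn.execute("SELECT * FROM sales ORDER BY product_id").fetchall()
print(rows)
```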

INSERT Syntax # 2

The syntax of the second type of INSERT follows:

INSERT [ INTO ] <table-name>
( <column-name> [...,<column-name> ] )
VALUES
( <literal-data-value> [...,<literal-data-value> ] ) ;

INSERT INTO sales (product_id, sale_date, daily_sales)
Values (300, now(), 12450.22);

sale_date is data type DATE.

INSERT INTO sales (product_id, daily_sales)
Values (300, 12450.22);

sale_date will be null.

Above is another form of the INSERT statement that you can use when some of the data is not available. It lets you leave the missing values (nulls) out of the VALUES clause, as long as you also leave their columns out of the column list. It is also the best format when the data arrives in a different sequence than the columns in the CREATE TABLE, or when there are more nulls (unknown values) than available data values. In the top example, the now() function returns the current timestamp; because sale_date is a DATE column, the current date is stored. In the second example, sale_date is missing from both the column list and the VALUES clause, so sale_date is null (or the column's DEFAULT value, if one was defined).
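The named-column form can be sketched in SQLite (a stand-in for Databricks). The column list is deliberately given in a different order than the table, and sale_date is omitted, so it comes back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.execute(
    "CREATE TABLE sales (product_id INTEGER NOT NULL, sale_date DATE, daily_sales DECIMAL(10,2))"
)

# Columns listed out of table order; sale_date left out entirely.
conn.execute("INSERT INTO sales (daily_sales, product_id) VALUES (12450.22, 300)")

row = conn.execute("SELECT product_id, sale_date, daily_sales FROM sales").fetchone()
print(row)  # (300, None, 12450.22)
```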


INSERT Example with Multiple Rows

INSERT INTO sales (product_id, sale_date, daily_sales)


Values (500, '2019-06-22', 12450.22),
(600, '2019-06-21', 3330.12),
(700, '2019-06-20', 3500.02);

SELECT * FROM sales


WHERE product_id in (500, 600, 700)

product_id sale_date daily_sales


500 2019-06-22 12450.22
600 2019-06-21 3330.12
700 2019-06-20 3500.02

You have the option of inserting multiple rows with a single insert statement. Above, we have added three rows.
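The multi-row form can be sketched in SQLite (a stand-in for Databricks, which also accepts a comma-separated list of row values). One statement adds all three rows from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.execute(
    "CREATE TABLE sales (product_id INTEGER NOT NULL, sale_date DATE, daily_sales DECIMAL(10,2))"
)

# One INSERT statement, three rows.
conn.execute("""
INSERT INTO sales (product_id, sale_date, daily_sales) VALUES
  (500, '2019-06-22', 12450.22),
  (600, '2019-06-21', 3330.12),
  (700, '2019-06-20', 3500.02)
""")

count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE product_id IN (500, 600, 700)"
).fetchone()[0]
print(count)  # 3
```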


INSERT Example with Multiple Rows and Nulls

INSERT INTO sales (product_id, sale_date, daily_sales)


Values
(7000, null, 1000.40),
(8000, '2019-06-30', 2200.75),
(9000, null, null);

SELECT * FROM sales


WHERE product_id IN (7000, 8000, 9000)
OR product_id is null

product_id sale_date daily_sales


7000 ? 1000.40
8000 2019-06-30 2200.75
9000 ? ?

Above we have inserted multiple rows and placed null values in some of them.


INSERT/SELECT Command

CREATE TABLE claims6 (
claim_id DECIMAL(38,0),
claim_date DATE,
claim_service INTEGER,
subscriber_no INTEGER,
member_no INTEGER,
claim_amt DECIMAL(12,2),
provider_no INTEGER);

CREATE TABLE claims7 (
claim_id INTEGER,
subscriber_no INTEGER,
claim_date DATE);

INSERT all columns and rows:

INSERT INTO claims6
SELECT * FROM claims;

INSERT some columns and all rows:

INSERT INTO claims7
SELECT claim_id, subscriber_no, claim_date
FROM claims;

INSERT some columns and some rows:

INSERT INTO claims7
SELECT claim_id, subscriber_no, claim_date
FROM claims
Where claim_amt > 500;

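The INSERT/SELECT command inserts rows selected from one table into another table. Both the source and the target tables must reside on the same system. The SELECT list must supply one value for each target column, in the target table's column order. The examples above show several options: all columns and all rows, some columns and all rows, and some columns and some rows.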
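The "some columns and some rows" case can be sketched in SQLite (a stand-in for Databricks). The three claims rows are taken from the CTAS answer set, with only the columns this example needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE claims (claim_id INTEGER, subscriber_no INTEGER, claim_date TEXT, claim_amt REAL);
INSERT INTO claims VALUES
  (1302111, 1111111, '2015-03-10', 235.78),
  (1302111, 1111111, '2016-01-19', 735.78),
  (1302111, 1111111, '2016-05-20', 23553.22);

CREATE TABLE claims7 (claim_id INTEGER, subscriber_no INTEGER, claim_date TEXT);

-- Some columns, some rows: only claims over 500.
INSERT INTO claims7
SELECT claim_id, subscriber_no, claim_date
FROM claims
WHERE claim_amt > 500;
""")

dates = [r[0] for r in conn.execute("SELECT claim_date FROM claims7 ORDER BY claim_date")]
print(dates)  # ['2016-01-19', '2016-05-20']
```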

INSERT/SELECT to Build a Data Mart

create table ins_data_mart


(subscriber_no integer
,member_no integer
,claim_amt Decimal(10,2)
,p_error_rate Decimal(10,2)
);

INSERT INTO ins_data_mart


SELECT C.subscriber_no, C.member_no,
Sum(claim_amt), AVG(p_error_rate)
FROM claims C
INNER JOIN
providers P
ON C.provider_no = P.provider_code
GROUP BY 1,2;

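You can use an INSERT/SELECT to build a data mart. Above, we populate the data mart with a join query as the SELECT: it joins claims to providers, then sums the claim amounts and averages the provider error rates for each combination of subscriber_no and member_no.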
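The data mart load can be sketched in SQLite (a stand-in for Databricks). The providers table layout (provider_code, p_error_rate) and every row here are hypothetical, chosen so the sums and averages are easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE claims (subscriber_no INTEGER, member_no INTEGER, claim_amt REAL, provider_no INTEGER);
CREATE TABLE providers (provider_code INTEGER, p_error_rate REAL);   -- hypothetical layout
CREATE TABLE ins_data_mart (subscriber_no INTEGER, member_no INTEGER,
                            claim_amt DECIMAL(10,2), p_error_rate DECIMAL(10,2));

-- Hypothetical rows for illustration.
INSERT INTO providers VALUES (1, 0.5), (2, 1.5);
INSERT INTO claims VALUES (1111111, 1, 100.0, 1), (1111111, 1, 50.0, 2), (1111111, 2, 25.0, 1);

INSERT INTO ins_data_mart
SELECT C.subscriber_no, C.member_no, SUM(C.claim_amt), AVG(P.p_error_rate)
FROM claims C
INNER JOIN providers P ON C.provider_no = P.provider_code
GROUP BY 1, 2;
""")

mart = conn.execute("SELECT * FROM ins_data_mart ORDER BY member_no").fetchall()
print(mart)
```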


UPDATE Examples

CREATE TABLE student_table2
AS SELECT * FROM student_table;

Update three columns for a single row:

UPDATE student_table2
SET last_name = 'McGregor'
,class_code = 'JR'
,grade_pt = 3.94
WHERE student_id = 1;

Update the grade_pt of a single row by adding .06 to it:

UPDATE student_table2
SET grade_pt = grade_pt + .06
WHERE first_name = 'Martin'
AND class_code = 'JR'
AND student_id = 1 ;

CREATE TABLE employee_table2
AS SELECT * FROM employee_table;

Update every row with a salary increase of 10%:

Update employee_table2
SET salary = salary * 1.1 ;

The above are examples of how to update a table. Be careful with the last one: an UPDATE without a WHERE clause changes every row in the table.
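Both kinds of UPDATE can be sketched in SQLite (a stand-in for Databricks). The student and salary rows are hypothetical. The first UPDATE is limited by its WHERE clause, while the second has none and changes every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE student_table2 (student_id INTEGER, last_name TEXT, grade_pt REAL);
INSERT INTO student_table2 VALUES (1, 'Smith', 3.0), (2, 'Jones', 2.5);
CREATE TABLE employee_table2 (employee_no INTEGER, salary REAL);
INSERT INTO employee_table2 VALUES (1, 40000.0), (2, 50000.0);
""")

# WHERE limits the change to a single row.
conn.execute(
    "UPDATE student_table2 SET last_name = 'McGregor', grade_pt = 3.94 WHERE student_id = 1"
)

# No WHERE clause: every row gets a 10% raise.
conn.execute("UPDATE employee_table2 SET salary = salary * 1.1")

students = conn.execute("SELECT * FROM student_table2 ORDER BY student_id").fetchall()
salaries = [s for (s,) in conn.execute("SELECT salary FROM employee_table2 ORDER BY employee_no")]
print(students, [round(s, 2) for s in salaries])
```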


Deleting Rows in a Table

Delete certain rows from the table

DELETE FROM student_table2


WHERE class_code IS null
OR grade_pt = 0 ;

Delete all rows from the table

DELETE from student_table2 ;

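Both examples delete rows from the table. The first deletes only the rows that match its WHERE clause; note that it tests class_code with IS null, because class_code = null is never true. The second has no WHERE clause, so it deletes every row in the table.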
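The two DELETE forms can be sketched in SQLite (a stand-in for Databricks, with hypothetical student rows). The sketch also shows that = NULL matches nothing, which is why the book's example uses IS null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Databricks
conn.executescript("""
CREATE TABLE student_table2 (student_id INTEGER, class_code TEXT, grade_pt REAL);
INSERT INTO student_table2 VALUES (1, 'JR', 3.9), (2, NULL, 2.0), (3, 'FR', 0), (4, 'SO', 3.1);
""")

# '= NULL' is never true, so this deletes nothing.
conn.execute("DELETE FROM student_table2 WHERE class_code = NULL")
after_equals_null = conn.execute("SELECT COUNT(*) FROM student_table2").fetchone()[0]

# 'IS NULL' finds the missing class_code; grade_pt = 0 finds the zero GPA.
conn.execute("DELETE FROM student_table2 WHERE class_code IS NULL OR grade_pt = 0")
remaining = [r[0] for r in conn.execute("SELECT student_id FROM student_table2 ORDER BY student_id")]

# No WHERE clause: every remaining row is deleted.
conn.execute("DELETE FROM student_table2")
after_delete_all = conn.execute("SELECT COUNT(*) FROM student_table2").fetchone()[0]
print(after_equals_null, remaining, after_delete_all)  # 4 [1, 4] 0
```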

Chapter 16 Statistical Aggregate Functions

Chapter 16 – Statistical Aggregate Functions

"The future belongs to those who believe in the beauty of their dreams."

- Eleanor Roosevelt


The Stats Table


Col1 Col2 Col3 Col4 Col5 Col6
1 1 1 30 1 0
2 1 1 29 2 5
3 3 10 28 3 10
4 3 10 27 4 15
5 3 10 26 5 20
6 4 10 25 6 30
7 5 10 24 7 30
8 5 10 23 8 30
9 5 10 22 9 35
10 5 20 21 10 35
11 7 20 20 22 40
12 7 20 19 12 40
13 9 20 18 13 45
14 9 20 17 14 45
15 9 20 16 15 50
16 9 20 15 14 55
17 10 20 14 13 55
18 10 20 13 12 60
19 10 20 12 11 60
20 10 20 11 9 65
21 10 20 10 8 65
22 10 20 9 7 65
23 13 20 8 6 70
24 13 30 7 5 70
25 13 30 6 4 80
26 14 40 5 3 85
27 15 40 4 2 90
28 15 50 3 1 90
29 16 50 2 1 95
30 16 60 1 1 100


The KURTOSIS Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
The following formula computes the population excess kurtosis, which matches the results shown in this chapter:

$$\mathrm{KURTOSIS}(x) = \frac{m_4}{m_2^{2}} - 3, \qquad m_k = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{k}$$

• n denotes the number of non-null records.
• m2 denotes the second central moment (the population variance).
• m4 denotes the fourth central moment.

SELECT KURTOSIS(col1) AS KofCol1
FROM stats_table;

KofCol1
-1.2

Kurtosis is a term used in statistics to describe the shape of a data set's distribution, in particular its "tailedness": how often extreme values occur compared with a normal (bell-shaped) distribution. It does not measure how spread out the data is; variance and standard deviation do that. Because this is excess kurtosis, a normal distribution scores 0. Imagine you have a bunch of numbers that represent the heights of people. If most heights sit close to the average but a few are extremely far from it, the data has high (positive) kurtosis, like a tall, sharply peaked hill with long tails. If the heights are spread fairly evenly with no extreme values, the data has low (negative) kurtosis, like a flat plateau. For example, col1 holds the evenly spaced numbers 1 through 30, so its kurtosis is negative (-1.2). So, kurtosis is a way to tell whether your data has more or fewer extreme values than a normal distribution.
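The formula can be checked against the results in this chapter with a plain Python sketch. The Databricks documentation does not spell out the formula, so the population form here is an assumption; it is consistent with Spark's implementation and reproduces the values shown for col1, col2, and col3.

```python
# Values of col1, col2, and col3 from stats_table.
col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]
col3 = [1, 1] + [10] * 7 + [20] * 14 + [30, 30, 40, 40, 50, 50, 60]

def population_excess_kurtosis(values):
    """m4 / m2**2 - 3, with central moments averaged over n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

for name, col in [("COL1", col1), ("COL2", col2), ("COL3", col3)]:
    print(name, round(population_excess_kurtosis(col), 2))
# COL1 -1.2
# COL2 -1.02
# COL3 0.79
```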


A KURTOSIS Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
Easy way to see the distribution of data

SELECT
KURTOSIS(col1) AS COL1
,KURTOSIS(col2) AS COL2
,KURTOSIS(col3) AS COL3
,KURTOSIS(col4) AS COL4
FROM stats_table;

A positive value indicates a sharp or peaked distribution, and a negative number represents a flat distribution. A peaked distribution means that some values occur much more often than the others. A flat distribution means each value occurs about the same number of times.

COL1 COL2 COL3 COL4
-1.2 -1.02 0.79 -1.2

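A high (positive) result is leptokurtic, a result near zero is mesokurtic, like a normal distribution, and a low (negative) result is platykurtic. Here col3, with its few large values (50, 50, 60), is leptokurtic, while col1 and col4 are platykurtic.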


The STDDEV_POP Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Syntax for using STDDEV_POP:

STDDEV_POP(<column-name>)

SELECT STDDEV_POP(col1) AS SDP_COL1
FROM stats_table;

SDP_COL1
8.66

Returns the population standard deviation (square root of variance) of non-NULL values. If all records inside a group are NULL, returns NULL.

The STDDEV_POP function is a statistical tool that tells you how spread out a group of numbers is from its average. Imagine you're looking at a bunch of test scores. If the scores are all very close, the group doesn't have much spread, and the STDDEV_POP value will be low. But if the scores are all over the place, the group has more spread, and the STDDEV_POP value will be higher. Technically, it is the square root of the average squared distance from the mean, so it is expressed in the same units as the data. It's a way to see how consistent or varied the data is.
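The definition can be checked with a short plain Python sketch that computes the population standard deviation of col1 by hand and with the standard library's statistics.pstdev. Both should match the 8.66 shown above after rounding.

```python
import math
import statistics

col1 = list(range(1, 31))  # the 30 values of col1 in stats_table

mean = sum(col1) / len(col1)
# Population standard deviation: divide the squared deviations by n.
sdp_manual = math.sqrt(sum((x - mean) ** 2 for x in col1) / len(col1))
sdp_library = statistics.pstdev(col1)

print(round(sdp_manual, 2), round(sdp_library, 2))  # 8.66 8.66
```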


STDDEV_POP Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT
STDDEV_POP(col1) AS COL1
,STDDEV_POP(col2) AS COL2
,STDDEV_POP(col3) AS COL3
,STDDEV_POP(col4) AS COL4
,STDDEV_POP(col5) AS COL5
,STDDEV_POP(col6) AS COL6
FROM stats_table;

COL1 COL2 COL3 COL4 COL5 COL6
8.66 4.39 13.82 8.66 4.4 26.89

Returns the population standard deviation (square root of variance) of non-NULL values. The STDDEV_POP function is one of two functions that calculate the standard deviation. It is used to find outliers in the data, and it can be a part of machine learning. For example, if every claim is for $100 and then someone submits a $500 bill, that outlier could be fraud.

The standard deviation function is a statistical measure of the spread or dispersion of values.
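The $100 versus $500 example can be made concrete with a plain Python sketch. The claim amounts and the two-standard-deviation cutoff are hypothetical choices for illustration, not a rule from Databricks.

```python
import math
import statistics

# Hypothetical amounts: nine $100 claims and one $500 claim.
amounts = [100] * 9 + [500]

mean = statistics.mean(amounts)
sd = statistics.pstdev(amounts)   # population standard deviation, like STDDEV_POP

# Flag values more than 2 population standard deviations from the mean
# (the cutoff of 2 is an illustrative assumption).
outliers = [a for a in amounts if abs(a - mean) > 2 * sd]
print(mean, sd, outliers)
```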

The STDDEV_SAMP Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Syntax for using STDDEV_SAMP:

STDDEV_SAMP(<column-name>)

SELECT STDDEV_SAMP(col1) AS SDS_COL1
FROM stats_table;

SDS_COL1
8.8

Returns the sample standard deviation (square root of sample variance) of non-NULL values. If all records inside a group are NULL, returns NULL.

The STDDEV_SAMP function is a statistical tool that helps you estimate how much the values in a group vary or spread out from their average. Imagine you're looking at a set of exam scores. If the scores are fairly close, the group has little spread, and the STDDEV_SAMP value will be low. But if the scores are all over the place, the group has more spread, and the STDDEV_SAMP value will be higher. The key difference between STDDEV_POP and STDDEV_SAMP is that STDDEV_POP assumes you have data for an entire population, while STDDEV_SAMP assumes you only have a sample from that population. To compensate, STDDEV_SAMP divides the sum of squared deviations by n - 1 instead of n, which makes it a better estimate of the population's spread when you only have part of the data. That is why its result for col1 (8.8) is slightly larger than the STDDEV_POP result (8.66). In simpler terms, the STDDEV_SAMP function helps you understand how much the numbers in a group differ from their average, especially when you are working with a smaller group from a bigger population.
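The only difference between the two functions is the divisor, which this plain Python sketch makes explicit for col1. The results should match the STDDEV_POP (8.66) and STDDEV_SAMP (8.8) values shown in this chapter.

```python
import math
import statistics

col1 = list(range(1, 31))
n = len(col1)
mean = sum(col1) / n
squared_deviations = sum((x - mean) ** 2 for x in col1)

sd_pop = math.sqrt(squared_deviations / n)         # STDDEV_POP: divide by n
sd_samp = math.sqrt(squared_deviations / (n - 1))  # STDDEV_SAMP: divide by n - 1

print(round(sd_pop, 2), round(sd_samp, 2))  # 8.66 8.8
```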


A STDDEV_SAMP Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT
STDDEV_SAMP(col1) AS COL1
,STDDEV_SAMP(col2) AS COL2
,STDDEV_SAMP(col3) AS COL3
,STDDEV_SAMP(col4) AS COL4
,STDDEV_SAMP(col5) AS COL5
,STDDEV_SAMP(col6) AS COL6
FROM stats_table;

COL1 COL2 COL3 COL4 COL5 COL6
8.8 4.47 14.06 8.8 4.5 27.34

Returns the sample standard deviation (square root of sample variance) of non-NULL values.

The standard deviation function is a statistical measure of the spread or dispersion of values. It is the square root of the variance, which is the average squared difference from the mean (average). It measures the amount by which a set of values differs from the arithmetic mean.

The VAR_POP Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Syntax: VAR_POP(<column-name>)

SELECT VAR_POP(col1) AS VP_COL1
FROM stats_table;

VP_COL1
74.92

Returns the population variance of non-NULL records in a group. If all records inside a group are NULL, a NULL is returned.

What is variance? The variance is a measure of variability. It is calculated by taking the average of the squared deviations from the mean. Variance tells you the degree of spread in your data set: the more spread out the data, the larger the variance. Because the deviations are squared, variance is expressed in squared units; its square root is the standard deviation.

The VAR_POP function in statistics helps you determine how spread out the values in a group are from their
average. Imagine you have a bunch of test scores. If the scores are close, then the group doesn't have much spread,
and the VAR_POP value will be low. But if the scores are all over the place, the group has more spread, and the
VAR_POP value will be higher. In simple terms, the VAR_POP function gives you a number that describes how
much the numbers in a group vary or spread out from their average. It's a way to understand the overall variability
or dispersion of the data.
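The variance definition can be checked with a plain Python sketch on col2. It should reproduce the VAR_POP result (19.29), and its square root should reproduce the STDDEV_POP result (4.39) shown earlier.

```python
import math
import statistics

col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]

mean = sum(col2) / len(col2)
# Population variance: the average squared deviation from the mean.
var_pop = sum((x - mean) ** 2 for x in col2) / len(col2)
sd_pop = math.sqrt(var_pop)  # its square root is STDDEV_POP

print(round(var_pop, 2), round(sd_pop, 2))  # 19.29 4.39
```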

A VAR_POP Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT
VAR_POP(col1) AS COL1
,VAR_POP(col2) AS COL2
,VAR_POP(col3) AS COL3
,VAR_POP(col4) AS COL4
,VAR_POP(col5) AS COL5
,VAR_POP(col6) AS COL6
FROM stats_table;

Returns the population variance of non-NULL records in a group. Running VAR_POP on every column is another way to see how much variance is in the data.

COL1 COL2 COL3 COL4 COL5 COL6
74.92 19.29 191.06 74.92 19.58 722.81

VAR_POP returns the population variance of non-NULL records in a group. Each value is the square of the corresponding STDDEV_POP result, before rounding.


The VAR_SAMP Function


COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Syntax: VAR_SAMP(<column-name>)

SELECT VAR_SAMP(col1) AS VS_COL1
FROM stats_table;

VS_COL1
77.5

Returns the sample variance of non-NULL records in a group.

The VAR_SAMP function in statistics helps you estimate how spread out the values in a sample group are from
their average. Imagine you have many exam scores from a smaller group of students. If the scores are fairly close,
the group doesn't have much spread, and the VAR_SAMP value will be low. But if the scores are all over the
place, the group has more spread, and the VAR_SAMP value will be higher. The key difference between
VAR_POP and VAR_SAMP is that VAR_POP assumes you have data for an entire population, while
VAR_SAMP assumes you only have a sample from that population. VAR_SAMP accounts for this partial picture by dividing the sum of squared deviations by n - 1 instead of n. In simpler terms, the VAR_SAMP function shows how much
the numbers in a sample group differ from their average. It's like measuring how variable or spread out the data is,
especially when working with a smaller portion of the entire population.
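The relationship between the two variance functions can be checked in plain Python with the standard library's statistics module on col1. The sample variance equals the population variance scaled by n / (n - 1), which reproduces the 74.92 and 77.5 results.

```python
import math
import statistics

col1 = list(range(1, 31))
n = len(col1)

var_pop = statistics.pvariance(col1)   # divides by n, like VAR_POP
var_samp = statistics.variance(col1)   # divides by n - 1, like VAR_SAMP

print(round(var_pop, 2), round(var_samp, 2))   # 74.92 77.5
print(math.isclose(var_samp, var_pop * n / (n - 1)))  # True
```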

A VAR_SAMP Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT VAR_SAMP(col1) AS COL1
,VAR_SAMP(col2) AS COL2
,VAR_SAMP(col3) AS COL3
,VAR_SAMP(col4) AS COL4
,VAR_SAMP(col5) AS COL5
,VAR_SAMP(col6) AS COL6
FROM stats_table ;

COL1 COL2 COL3 COL4 COL5 COL6
77.5 19.95 197.65 77.5 20.25 747.73

VAR_SAMP returns the sample variance of non-NULL records in a group.

Variance has two forms: VAR_POP treats the rows allowed by the WHERE clause as the entire population, while VAR_SAMP treats those same rows as a random sample drawn from a larger population.


The CORR Function


COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: CORR(<column-name>, <column-name>)


CORR is computed for non-null pairs using the following formula, where x is the independent variable and y is the
dependent variable:

COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))

SELECT CORR(col1, col2) AS CCol1and2 FROM stats_table;

CCOL1AND2
0.99
Variance tells us how much a quantity varies around its mean. It describes the spread of the data around the mean
value, so it gives you only the magnitude of that spread. Covariance tells us the direction in which two quantities
vary with each other. Correlation shows us both the direction and the strength of how two quantities vary with each
other.

The CORR function measures how strongly two sets of numbers are related to each other. Imagine you have two lists of
data, such as the amount of time people study and the grades they get. CORR tells you whether there is a linear
relationship between these two things. The correlation is positive if higher study times usually go with higher
grades, and negative if higher study times usually go with lower grades. In simpler words, CORR returns a number
between -1 and 1 that shows how closely the two sets of numbers move together. The result is positive when one tends
to go up as the other goes up, negative when one tends to go down as the other goes up, and near 0 when there is no
linear pattern.
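
The formula given earlier, COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y)), can be checked by hand. The following plain-Python sketch is an illustration, not Databricks code, and the helper name `corr` is hypothetical. It applies the formula to the col1 and col2 values listed above and keeps only the pairs in which neither value is `None`.

```python
import math

col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]


def corr(ys, xs):
    """COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y)) over non-null pairs."""
    pairs = [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    covar_pop = sum((y - mean_y) * (x - mean_x) for y, x in pairs) / n
    stddev_pop_y = math.sqrt(sum((y - mean_y) ** 2 for y, _ in pairs) / n)
    stddev_pop_x = math.sqrt(sum((x - mean_x) ** 2 for _, x in pairs) / n)
    return covar_pop / (stddev_pop_x * stddev_pop_y)


print(round(corr(col1, col2), 2))  # 0.99
```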


A CORR Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT CORR(col1, col2) AS C1_2
      ,CORR(col1, col3) AS C1_3
      ,CORR(col1, col4) AS C1_4
      ,CORR(col1, col5) AS C1_5
      ,CORR(col1, col6) AS C1_6
FROM stats_table ;

C1_2  C1_3  C1_4   C1_5   C1_6
0.99  0.89  -1.00  -0.15  0.99

Do the data points move in the same direction or in opposite directions?
  1 = perfect positive correlation
  0 = no correlation
 -1 = perfect negative correlation
For example, as the temperature goes up, crime goes up (more people are outside), which is a positive correlation.
A negative correlation would mean that less ice cream is sold at higher temperatures.
Variance tells us how much a quantity varies around its mean. It's the spread of data around the mean value, so you
know only the magnitude, that is, how much the data is spread. Covariance tells us the direction in which two
quantities vary with each other. Correlation shows us both the direction and the magnitude of how two quantities vary
with each other.


Another CORR Example so you can Compare

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT CORR(col4, col2) AS C4_2
      ,CORR(col4, col3) AS C4_3
      ,CORR(col4, col1) AS C4_1
      ,CORR(col4, col5) AS C4_5
      ,CORR(col4, col6) AS C4_6
FROM stats_table ;

C4_2   C4_3   C4_1  C4_5  C4_6
-0.99  -0.89  -1    0.15  -0.99

SELECT CORR(col1, col2) AS C1_2
      ,CORR(col1, col3) AS C1_3
      ,CORR(col1, col4) AS C1_4
      ,CORR(col1, col5) AS C1_5
      ,CORR(col1, col6) AS C1_6
FROM stats_table ;

C1_2  C1_3  C1_4  C1_5   C1_6
0.99  0.89  -1    -0.15  0.99


The VARIANCE Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Syntax: VARIANCE(<column-name>)

The VARIANCE function returns the sample variance of non-NULL records in a group, the same result VAR_SAMP returns.

SELECT VARIANCE(col1) AS VSCOL1
FROM stats_table;

VSCOL1
77.5
What is variance? Variance is a measure of variability. It is calculated by squaring each value's deviation from the
mean and averaging those squared deviations. The sample variance that VARIANCE returns divides their sum by n - 1
rather than n. Variance tells you the degree of spread in your data set: the more spread out the data, the larger the
variance.

The VARIANCE function in statistics helps you figure out how much the values in a group differ from their average.
Imagine you have a set of test scores. If the scores are close to each other, the group doesn't have much variation,
and the VARIANCE value will be low. If the scores are spread out, the group has more variation, and the VARIANCE value
will be higher. In simple terms, the VARIANCE function gives you a number that describes the typical squared
difference between the numbers in a group and their average. It's a way to measure the overall variability, or
dispersion, of the data.
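
Because VARIANCE is the sample variance, it agrees with the sample-variance function in Python's standard library. The following sketch is plain Python and serves only as a cross-check. It recomputes VARIANCE for four of the stats_table columns listed earlier using `statistics.variance`, which divides by n - 1. It also shows `statistics.pvariance`, which divides by n and corresponds to VAR_POP.

```python
import statistics

# Four of the six stats_table columns, copied from the data rows shown in this chapter
columns = {
    "col1": list(range(1, 31)),
    "col2": [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
             10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16],
    "col3": [1, 1] + [10] * 7 + [20] * 14 + [30, 30, 40, 40, 50, 50, 60],
    "col6": [0, 5, 10, 15, 20, 30, 30, 30, 35, 35, 40, 40, 45, 45, 50,
             55, 55, 60, 60, 65, 65, 65, 70, 70, 80, 85, 90, 90, 95, 100],
}

for name, values in columns.items():
    # statistics.variance divides by n - 1 (like VARIANCE / VAR_SAMP);
    # statistics.pvariance divides by n (like VAR_POP).
    print(name, round(statistics.variance(values), 2),
          round(statistics.pvariance(values), 2))
# col1 77.5 74.92
# col2 19.95 19.29
# col3 197.65 191.06
# col6 747.73 722.81
```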

A VARIANCE Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

The VARIANCE function returns the sample variance of non-NULL records in a group. Calculate the variance first, as a
precursor to the standard deviation, which is the square root of the variance.

SELECT VARIANCE(col1) AS COL1
      ,VARIANCE(col2) AS COL2
      ,VARIANCE(col3) AS COL3
      ,VARIANCE(col4) AS COL4
      ,VARIANCE(col5) AS COL5
      ,VARIANCE(col6) AS COL6
FROM stats_table ;
COL1 COL2 COL3 COL4 COL5 COL6
77.5 19.95 197.65 77.5 20.25 747.73

What is variance? Variance is a measure of variability, computed from the squared deviations of each value from the
mean (their sum divided by n - 1 for the sample variance). Variance tells you the degree of spread in your data set:
the more spread out the data, the larger the variance.


The COVAR_POP Function


COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: COVAR_POP(<column-name>, <column-name>)

COVAR_POP is computed for non-null pairs using the following formula, where COUNT(*) stands for the number of rows in
which both x and y are non-NULL:

(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / COUNT(*)

SELECT COVAR_POP(col1, col2) AS CCOL1_2 FROM stats_table;


CCOL1_2
37.5

The COVAR_POP function in statistics helps you understand how two sets of values change together, on average,
across an entire population. Imagine you have two sets of data, like the number of hours studied and the
corresponding test scores of different students. The COVAR_POP function helps you determine if there's a
consistent pattern between the two data sets when you're looking at the entire population. In simpler terms,
COVAR_POP gives you a number that shows how much the two sets of values move together or apart across the
entire group. If they usually go in the same direction, the covariance is positive. If they tend to go in opposite
directions, the covariance is negative. It's like a mathematical tool to help you understand how two things change
together across a population.
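
The COVAR_POP formula above translates directly into code. The following plain-Python sketch is illustrative only: `covar_pop` is a hypothetical helper, and `None` stands in for NULL. It keeps the non-null pairs, uses their count in place of COUNT(*), and reproduces the 37.5 returned for col1 and col2.

```python
def covar_pop(xs, ys):
    """(SUM(x*y) - SUM(x) * SUM(y) / n) / n, where n counts non-null pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    return (sum_xy - sum_x * sum_y / n) / n


col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]
print(covar_pop(col1, col2))  # 37.5
```

Counting only complete pairs matters. If the third row of a hypothetical input had a NULL, dividing by all three rows instead of the two usable pairs would understate the covariance.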


A COVAR_POP Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT COVAR_POP(col1, col2) AS C1_2
      ,COVAR_POP(col1, col3) AS C1_3
      ,COVAR_POP(col1, col4) AS C1_4
      ,COVAR_POP(col1, col5) AS C1_5
      ,COVAR_POP(col1, col6) AS C1_6
FROM stats_table ;

C1_2  C1_3   C1_4    C1_5   C1_6
37.5  105.9  -74.92  -5.82  230.75

COVAR_POP returns the population covariance for non-null pairs in a group. The covariance is a statistical measure of
the tendency of two variables to change in conjunction with each other. It is equal to the product of their correlation
coefficient and their two population standard deviations.

The COVAR_POP function in statistics helps you understand how two sets of values change together, on average,
across an entire population.

Another COVAR_POP Example so you can Compare

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT COVAR_POP(col4, col2) C4_2
      ,COVAR_POP(col4, col3) C4_3
      ,COVAR_POP(col4, col1) C4_1
      ,COVAR_POP(col4, col5) C4_5
      ,COVAR_POP(col4, col6) C4_6
FROM stats_table ;

C4_2   C4_3    C4_1    C4_5  C4_6
-37.5  -105.9  -74.92  5.82  -230.75

SELECT COVAR_POP(col1, col2) C1_2
      ,COVAR_POP(col1, col3) C1_3
      ,COVAR_POP(col1, col4) C1_4
      ,COVAR_POP(col1, col5) C1_5
      ,COVAR_POP(col1, col6) C1_6
FROM stats_table ;

C1_2  C1_3   C1_4    C1_5   C1_6
37.5  105.9  -74.92  -5.82  230.75

The COVAR_POP function in statistics helps you understand how two sets of values change together, on average,
across an entire population.

The COVAR_SAMP Function


COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: COVAR_SAMP(expression1, expression2)

COVAR_SAMP returns the sample covariance for non-null pairs in a group. It uses the following formula, where COUNT(*)
stands for the number of non-null pairs:

(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / (COUNT(*) - 1)

SELECT COVAR_SAMP(col1, col2) AS CCOL1_2 FROM stats_table;

CCOL1_2
38.79

The COVAR_SAMP function in statistics helps you estimate how two sets of values change together when your data is a
sample, giving you an idea of their relationship. Imagine you have two sets of data, such as the number of hours
studied and the corresponding test scores of a smaller group of students. The COVAR_SAMP function helps you determine
whether there's a consistent pattern between the two data sets within that sample. If they usually go in the same
direction, the covariance is positive. If they tend to go in opposite directions, the covariance is negative. The only
difference from COVAR_POP is the divisor: COUNT(*) - 1 instead of COUNT(*). This slightly enlarges the result to
compensate for working with a sample rather than the whole population.
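
Changing only the divisor of the population formula gives COVAR_SAMP. In the following plain-Python sketch, `covar_samp` is a hypothetical helper and `None` models NULL. The sample covariance of col1 and col2 comes out as 1125 / 29, the 38.79 shown above.

```python
def covar_samp(xs, ys):
    """(SUM(x*y) - SUM(x) * SUM(y) / n) / (n - 1), where n counts non-null pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]
print(round(covar_samp(col1, col2), 2))  # 38.79
```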

A COVAR_SAMP Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT COVAR_SAMP(col1, col2) AS C1_2
      ,COVAR_SAMP(col1, col3) AS C1_3
      ,COVAR_SAMP(col1, col4) AS C1_4
      ,COVAR_SAMP(col1, col5) AS C1_5
      ,COVAR_SAMP(col1, col6) AS C1_6
FROM stats_table ;

C1_2   C1_3    C1_4   C1_5   C1_6
38.79  109.55  -77.5  -6.02  238.71

COVAR_SAMP returns the sample covariance for non-null pairs in a group. COVAR_SAMP eliminates all expression pairs
where either expression in the pair is NULL.

The COVAR_SAMP function in statistics helps you estimate how two sets of values change together as a sample,
giving you an idea of their relationship.

Another COVAR_SAMP Example so you can Compare

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT COVAR_SAMP(col1, col2) C1_2
      ,COVAR_SAMP(col1, col3) C1_3
      ,COVAR_SAMP(col1, col4) C1_4
      ,COVAR_SAMP(col1, col5) C1_5
      ,COVAR_SAMP(col1, col6) C1_6
FROM stats_table ;

C1_2   C1_3    C1_4   C1_5   C1_6
38.79  109.55  -77.5  -6.02  238.71

SELECT COVAR_SAMP(col4, col2) C4_2
      ,COVAR_SAMP(col4, col3) C4_3
      ,COVAR_SAMP(col4, col1) C4_1
      ,COVAR_SAMP(col4, col5) C4_5
      ,COVAR_SAMP(col4, col6) C4_6
FROM stats_table ;

C4_2    C4_3     C4_1   C4_5  C4_6
-38.79  -109.55  -77.5  6.02  -238.71

The COVAR_SAMP function in statistics helps you estimate how two sets of values change together as a sample,
giving you an idea of their relationship.

The REGR_INTERCEPT Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_INTERCEPT(dependent-expression, independent-expression)


REGR_INTERCEPT returns the intercept of the univariate linear regression line for non-null pairs in a group. The
formula for REGR_INTERCEPT is:

AVG(y) - REGR_SLOPE(y, x) * AVG(x)

SELECT REGR_INTERCEPT(col1, col2) AS REGR_INTERCEPT FROM stats_table;

REGR_INTERCEPT
-1.35

The REGR_INTERCEPT function in statistics helps you find the point where a straight line (a linear regression line)
crosses the y-axis. Imagine you have a set of data points on a scatter plot that roughly follow a straight line. The
REGR_INTERCEPT function tells you where the best-fitting line crosses the vertical y-axis, that is, the value of y the
line predicts when x is 0. Together with the slope, this starting point defines the straight line that best fits your
data, which helps you make predictions based on the relationship between the two sets of data.
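
Both REGR_INTERCEPT and the REGR_SLOPE it depends on can be recomputed from their formulas. The following plain-Python sketch is illustrative only. `regr_slope` and `regr_intercept` are hypothetical helpers that take y first and x second, mirroring the SQL argument order. The sketch reproduces the -1.35 intercept for REGR_INTERCEPT(col1, col2).

```python
def _pairs(ys, xs):
    """Rows where both the dependent (y) and independent (x) values are non-null."""
    return [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]


def regr_slope(ys, xs):
    """COVAR_POP(x, y) / VAR_POP(x) over non-null pairs."""
    pairs = _pairs(ys, xs)
    n = len(pairs)
    avg_y = sum(y for y, _ in pairs) / n
    avg_x = sum(x for _, x in pairs) / n
    covar_pop = sum((x - avg_x) * (y - avg_y) for y, x in pairs) / n
    var_pop_x = sum((x - avg_x) ** 2 for _, x in pairs) / n
    return covar_pop / var_pop_x


def regr_intercept(ys, xs):
    """AVG(y) - REGR_SLOPE(y, x) * AVG(x) over non-null pairs."""
    pairs = _pairs(ys, xs)
    avg_y = sum(y for y, _ in pairs) / len(pairs)
    avg_x = sum(x for _, x in pairs) / len(pairs)
    return avg_y - regr_slope(ys, xs) * avg_x


col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]
print(round(regr_intercept(col1, col2), 2))  # -1.35
```

Rearranging the formula shows that the fitted line always passes through the point (AVG(x), AVG(y)).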


A REGR_INTERCEPT Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT REGR_INTERCEPT(col1, col2) C1_2
      ,REGR_INTERCEPT(col1, col3) C1_3
      ,REGR_INTERCEPT(col1, col4) C1_4
      ,REGR_INTERCEPT(col1, col5) C1_5
      ,REGR_INTERCEPT(col1, col6) C1_6
FROM stats_table ;

C1_2   C1_3  C1_4  C1_5   C1_6
-1.35  3.45  31    17.65  -0.83

REGR_INTERCEPT returns the intercept of the univariate linear regression line for non-null pairs in a group.
REGR_INTERCEPT formula: AVG(y) - REGR_SLOPE(y, x) * AVG(x)

A regression line is a line of best fit drawn through a set of points on a graph of X and Y coordinates. It uses the
Y coordinate as the dependent variable and the X value as the independent variable. The two regression lines (y
regressed on x, and x regressed on y) always meet at the mean point of the data, (AVG(x), AVG(y)), which is usually
not one of the original data points.


Another REGR_INTERCEPT Example so you can Compare

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT REGR_INTERCEPT(col1, col2) C1_2
      ,REGR_INTERCEPT(col1, col3) C1_3
      ,REGR_INTERCEPT(col1, col4) C1_4
      ,REGR_INTERCEPT(col1, col5) C1_5
      ,REGR_INTERCEPT(col1, col6) C1_6
FROM stats_table ;

C1_2   C1_3  C1_4  C1_5   C1_6
-1.35  3.45  31    17.65  -0.83

SELECT REGR_INTERCEPT(col4, col2) C4_2
      ,REGR_INTERCEPT(col4, col3) C4_3
      ,REGR_INTERCEPT(col4, col1) C4_1
      ,REGR_INTERCEPT(col4, col5) C4_5
      ,REGR_INTERCEPT(col4, col6) C4_6
FROM stats_table ;

C4_2   C4_3   C4_1  C4_5   C4_6
32.35  27.55  31    13.35  31.83

The two regression lines (y on x, and x on y) always meet at the mean point (AVG(x), AVG(y)), which is usually not one
of the original data points.


The REGR_SLOPE Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_SLOPE(dependent-expression, independent-expression)

REGR_SLOPE returns the slope of the linear regression line for non-null pairs in a group. The formula for REGR_SLOPE
is:

COVAR_POP(x, y) / VAR_POP(x)

SELECT REGR_SLOPE(col1, col2) AS REG_SLOPE FROM stats_table;

REG_SLOPE
1.94

The REGR_SLOPE function in statistics helps you find the steepness or slope of a straight line (a linear regression
line) that best fits your data points. Imagine you have many data points on a scatter plot, and they have a general
trend that could be described with a straight line. The REGR_SLOPE function helps you figure out how steep that
line should be. In simpler terms, the REGR_SLOPE function gives you a number that represents how much the y-
values (vertical values) change for each one-unit increase in the x-values (horizontal values) along the line. It's like
understanding how much the data rises or falls as you move along the line. This helps you see how the two sets of
data are related linearly.


A REGR_SLOPE Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT REGR_SLOPE(col1, col2) AS C1_2
      ,REGR_SLOPE(col1, col3) AS C1_3
      ,REGR_SLOPE(col1, col4) AS C1_4
      ,REGR_SLOPE(col1, col5) AS C1_5
      ,REGR_SLOPE(col1, col6) AS C1_6
FROM stats_table ;

C1_2  C1_3  C1_4  C1_5  C1_6
1.94  0.55  -1    -0.3  0.32

REGR_SLOPE returns the slope of the linear regression line for non-null pairs in a group.
Formula for REGR_SLOPE: COVAR_POP(x, y) / VAR_POP(x)

The REGR_SLOPE function in statistics helps you find the steepness or slope of a straight line (a linear regression
line) that best fits your data points.


Another REGR_SLOPE Example so you can Compare

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT REGR_SLOPE(col1, col2) C1_2
      ,REGR_SLOPE(col1, col3) C1_3
      ,REGR_SLOPE(col1, col4) C1_4
      ,REGR_SLOPE(col1, col5) C1_5
      ,REGR_SLOPE(col1, col6) C1_6
FROM stats_table ;

C1_2  C1_3  C1_4  C1_5  C1_6
1.94  0.55  -1    -0.3  0.32

SELECT REGR_SLOPE(col4, col2) C4_2
      ,REGR_SLOPE(col4, col3) C4_3
      ,REGR_SLOPE(col4, col1) C4_1
      ,REGR_SLOPE(col4, col5) C4_5
      ,REGR_SLOPE(col4, col6) C4_6
FROM stats_table ;

C4_2   C4_3   C4_1  C4_5  C4_6
-1.94  -0.55  -1    0.3   -0.32

The REGR_SLOPE function in statistics helps you find the steepness or slope of a straight line (a linear regression
line) that best fits your data points.


The REGR_AVGX Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_AVGX(dependent-expression, independent-expression)

The REGR_AVGX function returns the average of the independent variable for non-null pairs in a group, where x is the
independent variable and y is the dependent variable: REGR_AVGX(y, x).

SELECT REGR_AVGX(col1, col2) AS REG_AVGX FROM stats_table;

REG_AVGX
8.67

The REGR_AVGX function in statistics helps you find the average of the x-values (horizontal values) in your data
points. Imagine you have a set of data points on a scatter plot. The x-values are the numbers on the horizontal axis,
and REGR_AVGX calculates their arithmetic mean, using only the points that have both an x value and a y value. In
simpler terms, REGR_AVGX gives you a number representing the central position of the x-values. Note that this is the
mean, not the middle (median) value. It is useful for understanding the general location of your data on the x-axis.
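
REGR_AVGX averages x only over rows where y is also non-null, so it can differ from a plain AVG(x). The following plain-Python sketch is illustrative: `regr_avgx` is a hypothetical helper, and the three-row sample is made up. It shows both the stats_table result and this filtering effect.

```python
def regr_avgx(ys, xs):
    """Average of x over the rows where both y and x are non-null."""
    kept = [x for y, x in zip(ys, xs) if y is not None and x is not None]
    return sum(kept) / len(kept)


col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]
print(round(regr_avgx(col1, col2), 2))  # 8.67

# Hypothetical three-row sample: the middle row has a NULL y,
# so its x value (100) is left out of the average.
ys = [1, None, 3]
xs = [10, 100, 20]
print(regr_avgx(ys, xs))  # 15.0 (a plain average of xs would be 43.33)
```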

A REGR_AVGX Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT REGR_AVGX(col1, col2) AS C1_2
      ,REGR_AVGX(col1, col3) AS C1_3
      ,REGR_AVGX(col1, col4) AS C1_4
      ,REGR_AVGX(col1, col5) AS C1_5
      ,REGR_AVGX(col1, col6) AS C1_6
FROM stats_table ;

c1_2  c1_3   c1_4  c1_5  c1_6
8.67  21.73  15.5  7.23  51.17

The REGR_AVGX function returns the average of the independent variable for non-null pairs in a group, where x is the
independent variable and y is the dependent variable.

The REGR_AVGX function in statistics helps you find the average of the x-values (horizontal values) in your data
points.

Another REGR_AVGX Example so you can Compare

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT REGR_AVGX(col1, col2) C1_2
      ,REGR_AVGX(col1, col3) C1_3
      ,REGR_AVGX(col1, col4) C1_4
      ,REGR_AVGX(col1, col5) C1_5
      ,REGR_AVGX(col1, col6) C1_6
FROM stats_table ;

C1_2  C1_3   C1_4  C1_5  C1_6
8.67  21.73  15.5  7.23  51.17

SELECT REGR_AVGX(col4, col2) C4_2
      ,REGR_AVGX(col4, col3) C4_3
      ,REGR_AVGX(col4, col1) C4_1
      ,REGR_AVGX(col4, col5) C4_5
      ,REGR_AVGX(col4, col6) C4_6
FROM stats_table ;

C4_2  C4_3   C4_1  C4_5  C4_6
8.67  21.73  15.5  7.23  51.17

The REGR_AVGX function in statistics helps you find the average of the x-values (horizontal values) in your data
points.


The REGR_AVGY Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_AVGY(dependent-expression, independent-expression)


The REGR_AVGY function returns the average of the dependent variable for non-null pairs in a group, where x is the
independent variable and y is the dependent variable: REGR_AVGY(y, x).

SELECT REGR_AVGY(col1, col2) AS REGR_AVGY_COL1_COL2
FROM stats_table;

REGR_AVGY_COL1_COL2
15.5

The REGR_AVGY function in statistics helps you find the average of the y-values (vertical values) in your data
points. Imagine you have a set of data points on a scatter plot. The y-values are the numbers on the vertical axis,
and REGR_AVGY calculates their arithmetic mean, using only the points that have both an x value and a y value. In
simpler terms, REGR_AVGY gives you a number representing the central position of the y-values. Note that this is the
mean, not the middle (median) value. It is useful for understanding the general location of your data on the y-axis.
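
The result 15.5 is simply the average of col1 (1 through 30), because col2 has no NULLs and no row is dropped. In the following plain-Python sketch, `regr_avgy` is a hypothetical helper. Setting one x value to `None` removes that row's y value from the average.

```python
def regr_avgy(ys, xs):
    """Average of y over the rows where both y and x are non-null."""
    kept = [y for y, x in zip(ys, xs) if y is not None and x is not None]
    return sum(kept) / len(kept)


col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]
print(regr_avgy(col1, col2))  # 15.5

# Hypothetical change: pretend the last col2 value were NULL.
# The row with y = 30 drops out, and the average becomes 435 / 29.
col2_with_null = col2[:-1] + [None]
print(regr_avgy(col1, col2_with_null))  # 15.0
```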


A REGR_AVGY Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT REGR_AVGY(col1, col2) AS C1_2
      ,REGR_AVGY(col1, col3) AS C1_3
      ,REGR_AVGY(col1, col4) AS C1_4
      ,REGR_AVGY(col1, col5) AS C1_5
      ,REGR_AVGY(col1, col6) AS C1_6
FROM stats_table ;

C1_2  C1_3  C1_4  C1_5  C1_6
15.5  15.5  15.5  15.5  15.5

The REGR_AVGY function returns the average of the dependent variable for non-null pairs in a group, where x is the
independent variable and y is the dependent variable: REGR_AVGY(y, x)

The REGR_AVGY function in statistics helps you find the average of the y-values (vertical values) in your data
points.


Another REGR_AVGY Example so you can Compare

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT REGR_AVGY(col1, col2) C1_2
      ,REGR_AVGY(col1, col3) C1_3
      ,REGR_AVGY(col1, col4) C1_4
      ,REGR_AVGY(col1, col5) C1_5
      ,REGR_AVGY(col1, col6) C1_6
FROM stats_table ;

C1_2  C1_3  C1_4  C1_5  C1_6
15.5  15.5  15.5  15.5  15.5

SELECT REGR_AVGY(col4, col2) C4_2
      ,REGR_AVGY(col4, col3) C4_3
      ,REGR_AVGY(col4, col1) C4_1
      ,REGR_AVGY(col4, col5) C4_5
      ,REGR_AVGY(col4, col6) C4_6
FROM stats_table ;

C4_2  C4_3  C4_1  C4_5  C4_6
15.5  15.5  15.5  15.5  15.5

The REGR_AVGY function in statistics helps you find the average of the y-values (vertical values) in your data
points.

The REGR_COUNT Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_COUNT(dependent-expression, independent-expression)


REGR_COUNT returns the number of non-null number pairs in a group: REGR_COUNT(y, x).

SELECT REGR_COUNT(col1, col2) AS REGR_COUNT FROM stats_table;

REGR_COUNT
30

The REGR_COUNT function in statistics counts the data points available to a regression. Imagine you have a set of
data points on a scatter plot. REGR_COUNT tells you how many of them have both an x value and a y value, because a
row with a NULL in either column cannot be plotted and is skipped. In simpler terms, it's like counting the dots on
your scatter plot to know how much data you're working with. This count matters for other statistical calculations
and for judging how reliable your analysis is.
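
Because the count is taken only over rows where both arguments are non-null, it can be smaller than COUNT(*). The following is a minimal plain-Python sketch: `regr_count` is a hypothetical helper, and the five-row sample is made up.

```python
def regr_count(ys, xs):
    """Number of rows in which both y and x are non-null (None models NULL)."""
    return sum(1 for y, x in zip(ys, xs) if y is not None and x is not None)


col1 = list(range(1, 31))
col4 = [31 - v for v in col1]
print(regr_count(col1, col4))  # 30: these stats_table columns contain no NULLs

# Hypothetical five rows with NULLs scattered across both columns
ys = [1, 2, None, 4, 5]
xs = [1, None, 3, 4, None]
print(regr_count(ys, xs))  # 2: only the first and fourth rows are complete pairs
```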


A REGR_COUNT Example

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT REGR_COUNT(col1, col2) C1_2
      ,REGR_COUNT(col1, col3) C1_3
      ,REGR_COUNT(col1, col4) C1_4
      ,REGR_COUNT(col1, col5) C1_5
      ,REGR_COUNT(col1, col6) C1_6
FROM stats_table ;

C1_2  C1_3  C1_4  C1_5  C1_6
30    30    30    30    30

REGR_COUNT(y, x) returns the number of non-null number pairs in a group, that is, the number of input rows in which
both expressions are non-null.

The REGR_COUNT function in statistics counts the data points (non-null pairs) in your set.


The REGR_R2 Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_R2(Y, X)

SELECT REGR_R2(col1, col2) AS REGR_R2_COL1_2


FROM stats_table;
REGR_R2_COL1_2
0.97

The REGR_R2 function returns the coefficient of determination (R squared) of the least-squares regression line that predicts Y (the first argument) from X (the second argument). Imagine a scatter plot of your data points with the straight line that best represents their trend drawn through them. REGR_R2 measures how closely the points follow that line, as a number between 0 and 1. The closer the number is to 1, the better the line fits the points; the closer it is to 0, the less of the pattern the line explains. The 0.97 above means a straight line in col2 accounts for about 97% of the variation in col1.
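One common way to compute the same number is from three sums of squares over the non-null pairs: R squared = Sxy squared / (Sxx * Syy). The Python sketch below is a minimal illustration of that formula, not Databricks' code; the helper names are hypothetical and it assumes neither column is constant. Fed the col1 and col2 values listed above, it reproduces the 0.97 returned by the query.

```python
def sums_of_squares(y, x):
    """Return (Sxx, Syy, Sxy) over the pairs where neither value is NULL (None)."""
    pairs = [(yv, xv) for yv, xv in zip(y, x) if yv is not None and xv is not None]
    n = len(pairs)
    mean_y = sum(yv for yv, _ in pairs) / n
    mean_x = sum(xv for _, xv in pairs) / n
    sxx = sum((xv - mean_x) ** 2 for _, xv in pairs)
    syy = sum((yv - mean_y) ** 2 for yv, _ in pairs)
    sxy = sum((xv - mean_x) * (yv - mean_y) for yv, xv in pairs)
    return sxx, syy, sxy


def regr_r2(y, x):
    """Share of the variation in y explained by the least-squares line on x."""
    sxx, syy, sxy = sums_of_squares(y, x)
    return sxy ** 2 / (sxx * syy)   # assumes neither column is constant


col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
        10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]
print(round(regr_r2(col1, col2), 2))   # 0.97
```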


A REGR_R2 Example

stats_table (rows 1 to 30, left to right)
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
REGR_R2(col1, col2) AS C1_2
,REGR_R2(col1, col3) AS C1_3
,REGR_R2(col1, col4) AS C1_4
,REGR_R2(col1, col5) AS C1_5
,REGR_R2(col1, col6) AS C1_6
FROM stats_table ;

REGR_R2 is the square of the correlation coefficient.

C1_2 C1_3 C1_4 C1_5 C1_6
0.97 0.78 1 0.02 0.98

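The REGR_R2 function helps you understand how well a straight line (a linear regression line) fits your data points. Here col4 falls by exactly one each time col1 rises by one, so a straight line fits it perfectly and C1_4 is 1. col5 climbs and then falls back, so a single straight line explains almost none of its pattern (0.02).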


The REGR_SXX Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_SXX(Y, X)

REGR_SXX returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs.

SELECT REGR_SXX(col1, col2) AS REGR_SXX FROM stats_table;

REGR_SXX
578.67

The REGR_SXX function measures how spread out the x-values (the second argument, here col2) are in your data set. Imagine your data points on a scatter plot: the x-values are the positions along the horizontal axis. REGR_SXX adds up the squared difference between each x-value and the average x-value, using only rows in which both x and y are non-null. A larger result means the x-values are more widely dispersed along the x-axis.
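As a worked check of the formula, take REGR_SXX(col1, col2): x is the second argument, col2, and all n = 30 pairs are non-null. From the col2 values listed above, the sum of the x-values is 260 and the sum of their squares is 2832, so the sum-of-squares shortcut gives the same 578.67 as the query:

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
       = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
       = 2832 - \frac{260^2}{30}
       = 2832 - 2253.33
       \approx 578.67
```

Dividing by n gives VAR_POP(col2), about 19.29, which is why REGR_SXX equals REGR_COUNT(y, x) * VAR_POP(x).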


A REGR_SXX Example

stats_table (rows 1 to 30, left to right)
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

REGR_SXX returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs.

SELECT
REGR_SXX(col1, col2) C1_2
,REGR_SXX(col1, col3) C1_3
,REGR_SXX(col1, col4) C1_4
,REGR_SXX(col1, col5) C1_5
,REGR_SXX(col1, col6) C1_6
FROM stats_table ;

C1_2 C1_3 C1_4 C1_5 C1_6
578.67 5731.87 2247.5 587.37 21684.17

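The REGR_SXX function helps you understand how the x-values (horizontal values) are spread in your data set. col4 simply counts down from 30 to 1, so its values are spread exactly as widely as those of col1, giving 2247.5; col6, whose values run from 0 to 100, is by far the most spread out.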
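The definition REGR_COUNT(y, x) * VAR_POP(x) can be sketched in Python with the standard statistics module's population variance. This is a minimal illustration, not Databricks' implementation; regr_sxx is a hypothetical helper, and col5 is left out for brevity. It reproduces C1_2, C1_3, C1_4, and C1_6 above.

```python
from statistics import pvariance


def regr_sxx(y, x):
    """REGR_COUNT(y, x) * VAR_POP(x), using only pairs with no NULL (None)."""
    xs = [xv for yv, xv in zip(y, x) if yv is not None and xv is not None]
    return len(xs) * pvariance(xs)


col1 = list(range(1, 31))
columns = {
    "C1_2": [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
             10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16],
    "C1_3": [1, 1] + [10] * 7 + [20] * 14 + [30, 30, 40, 40, 50, 50, 60],
    "C1_4": list(range(30, 0, -1)),
    "C1_6": [0, 5, 10, 15, 20, 30, 30, 30, 35, 35, 40, 40, 45, 45, 50,
             55, 55, 60, 60, 65, 65, 65, 70, 70, 80, 85, 90, 90, 95, 100],
}

sxx = {name: round(regr_sxx(col1, x), 2) for name, x in columns.items()}
print(sxx)
```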

The REGR_SXY Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_SXY(Y, X)

REGR_SXY returns REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2) for non-null pairs.

SELECT REGR_SXY(col1, col2) AS REGR_SXY_COL1_2 FROM stats_table;

REGR_SXY_COL1_2
1125

The REGR_SXY function helps you understand how the x-values (horizontal values) and y-values (vertical values) change together in your data set. Imagine your data points on a scatter plot. For each point, REGR_SXY multiplies (x minus the average x) by (y minus the average y), and then it adds up those products. A positive total means x and y tend to rise together, a negative total means one tends to fall as the other rises, and a total near zero means there is no consistent straight-line relationship. It's like measuring the "togetherness" of the points' movement along the x and y directions.
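Worked by hand for REGR_SXY(col1, col2), with y = col1 and x = col2: the sum of the x-values is 260, the sum of the y-values is 465, and the sum of the row-by-row products x times y is 5155. The shortcut form of the sum of products gives the query's result:

```latex
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
       = \sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}
       = 5155 - \frac{260 \cdot 465}{30}
       = 5155 - 4030
       = 1125
```

Dividing by n = 30 gives COVAR_POP(col1, col2) = 37.5.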

A REGR_SXY Example

stats_table (rows 1 to 30, left to right)
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT
REGR_SXY(col1, col2) AS C1_2
,REGR_SXY(col1, col3) AS C1_3
,REGR_SXY(col1, col4) AS C1_4
,REGR_SXY(col1, col5) AS C1_5
,REGR_SXY(col1, col6) AS C1_6
FROM stats_table ;

REGR_SXY returns REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2) for non-null pairs.

C1_2 C1_3 C1_4 C1_5 C1_6
1125 3177 -2247.5 -174.5 6922.5

The REGR_SXY function helps you understand how the x-values (horizontal values) and y-values (vertical values) change together in your data set. col4 counts down while col1 counts up, so C1_4 is strongly negative (-2247.5).
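The sum-of-products definition can be sketched in a few lines of Python. This is an illustration, not Databricks' code; regr_sxy is a hypothetical helper and col5 is omitted for brevity. It reproduces C1_2, C1_3, C1_4, and C1_6 above, including the negative result for col4.

```python
def regr_sxy(y, x):
    """REGR_COUNT(y, x) * COVAR_POP(y, x): sum of products of deviations."""
    pairs = [(yv, xv) for yv, xv in zip(y, x) if yv is not None and xv is not None]
    mean_y = sum(yv for yv, _ in pairs) / len(pairs)
    mean_x = sum(xv for _, xv in pairs) / len(pairs)
    return sum((yv - mean_y) * (xv - mean_x) for yv, xv in pairs)


col1 = list(range(1, 31))
columns = {
    "C1_2": [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9,
             10, 10, 10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16],
    "C1_3": [1, 1] + [10] * 7 + [20] * 14 + [30, 30, 40, 40, 50, 50, 60],
    "C1_4": list(range(30, 0, -1)),
    "C1_6": [0, 5, 10, 15, 20, 30, 30, 30, 35, 35, 40, 40, 45, 45, 50,
             55, 55, 60, 60, 65, 65, 65, 70, 70, 80, 85, 90, 90, 95, 100],
}

sxy = {name: round(regr_sxy(col1, x), 2) for name, x in columns.items()}
print(sxy)
```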


The REGR_SYY Function

COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16

Syntax: REGR_SYY(Y, X)

REGR_SYY returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs.

SELECT REGR_SYY(col1, col2) AS REGR_SYY_COL1_2 FROM stats_table;

REGR_SYY_COL1_2
2247.5

The REGR_SYY function measures how spread out the y-values (the first argument, here col1) are in your data set. Imagine your data points on a scatter plot: the y-values are the positions along the vertical axis. REGR_SYY adds up the squared difference between each y-value and the average y-value, using only rows in which both x and y are non-null. A larger result means the y-values are more widely dispersed along the y-axis.
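For REGR_SYY(col1, col2), y is col1, the numbers 1 through 30. Their sum is 465 and the sum of their squares is 9455, so the sum-of-squares shortcut reproduces the query result:

```latex
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}
       = 9455 - \frac{465^2}{30}
       = 9455 - 7207.5
       = 2247.5
```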


A REGR_SYY Example

stats_table (rows 1 to 30, left to right)
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

SELECT
REGR_SYY(col1, col2) AS C1_2
,REGR_SYY(col1, col3) AS C1_3
,REGR_SYY(col1, col4) AS C1_4
,REGR_SYY(col1, col5) AS C1_5
,REGR_SYY(col1, col6) AS C1_6
FROM stats_table ;

REGR_SYY returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs.

C1_2 C1_3 C1_4 C1_5 C1_6
2247.5 2247.5 2247.5 2247.5 2247.5

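The REGR_SYY function helps you understand how the y-values (vertical values) are spread in your data set. Every result here is 2247.5 because the y-argument is col1 in every call and no value is null, so the same 30 y-values are measured each time.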
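REGR_SYY depends on the x-argument only through which rows have a null x. The minimal Python sketch below (an illustration; regr_syy and col4_with_null are hypothetical names, with None standing in for NULL) shows that blanking out one x-value drops the matching y-value from the calculation.

```python
def regr_syy(y, x):
    """REGR_COUNT(y, x) * VAR_POP(y): squared deviations of y over non-NULL pairs."""
    ys = [yv for yv, xv in zip(y, x) if yv is not None and xv is not None]
    mean_y = sum(ys) / len(ys)
    return sum((yv - mean_y) ** 2 for yv in ys)


col1 = list(range(1, 31))
col4 = list(range(30, 0, -1))
col4_with_null = [None] + col4[1:]     # hypothetical: the first x-value is NULL

print(regr_syy(col1, col4))            # 2247.5: all 30 y-values are used
print(regr_syy(col1, col4_with_null))  # 2030.0: y = 1 is dropped with its NULL x
```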

Using GROUP BY

COL3 NUMBERS
1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60

SELECT col3
,COUNT(*) AS CNT
,AVG(col1) AS AVG1
,STDDEV_POP(col1) AS SD1
,VAR_POP(col1) AS VP1
,AVG(col4) AS AVG4
,STDDEV_POP(col4) AS SD4
,VAR_POP(col4) AS VP4
,AVG(col6) AS AVG6
,STDDEV_POP(col6) AS SD6
FROM stats_table GROUP BY col3 ORDER BY 1;

COL3 CNT AVG1 SD1 VP1 AVG4 SD4 VP4 AVG6 SD6
1 2 1.5 0.5 0.25 29.5 0.5 0.25 2.5 2.5
10 7 6 2 4 25 2 4 24.29 8.63
20 14 16.5 4.03 16.25 14.5 4.03 16.25 53.57 10.76
30 2 24.5 0.5 0.25 6.5 0.5 0.25 75 5
40 2 26.5 0.5 0.25 4.5 0.5 0.25 87.5 2.5
50 2 28.5 0.5 0.25 2.5 0.5 0.25 92.5 2.5
60 1 30 0 0 1 0 0 100 0
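The GROUP BY query computes each aggregate separately for every distinct col3 value. As a cross-check, here is a minimal Python sketch (an illustration, not how Databricks executes the query) that groups the col1 values by col3 and computes CNT, AVG1, SD1, and VP1 with the statistics module's population functions.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance

col1 = list(range(1, 31))
col3 = [1, 1] + [10] * 7 + [20] * 14 + [30, 30, 40, 40, 50, 50, 60]

# GROUP BY col3: collect the col1 values that share each col3 value.
groups = defaultdict(list)
for c1, c3 in zip(col1, col3):
    groups[c3].append(c1)

# ORDER BY 1, then COUNT(*), AVG, STDDEV_POP, and VAR_POP for each group.
summary = {}
for key in sorted(groups):
    vals = groups[key]
    summary[key] = (len(vals), mean(vals), pstdev(vals), pvariance(vals))
    print(key, len(vals), round(mean(vals), 2),
          round(pstdev(vals), 2), round(pvariance(vals), 2))
```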

APPROX_COUNT_DISTINCT

Approximate aggregate functions scale well in both memory usage and time, but they produce approximate results rather than exact results. These functions are best when memory is a problem, because they require less memory than exact aggregation functions. However, they can introduce statistical uncertainty. Use approximate aggregation for large data streams for which linear memory usage is impractical, or when the data is already approximate.
SELECT
COUNT(DISTINCT dept_no) AS EXACT_DISTINCT
, APPROX_COUNT_DISTINCT(dept_no) AS APPROX_DISTINCT
FROM employee_table ;

EXACT_DISTINCT APPROX_DISTINCT
5 5

The APPROX_COUNT_DISTINCT function gives you an estimate of how many different values there are in a set without keeping track of every distinct value it has seen. Imagine you have a bag of different colored marbles. Instead of writing down each new color as you find it, APPROX_COUNT_DISTINCT gives you a quick, fairly accurate estimate of how many unique colors are in the bag. It's especially useful when there are so many distinct values that counting them exactly would take too much time or memory. On a small column such as dept_no, the estimate and the exact count can match, as they do above.
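Apache Spark, which Databricks SQL is built on, implements approx_count_distinct with the HyperLogLog++ algorithm. The Python sketch below swaps in a simpler relative, the K-minimum-values (KMV) estimator, to show the same idea: hash every value, keep only the k smallest hashes, and infer the distinct count from how tightly those hashes are packed, so memory stays bounded by k however many rows are scanned. It is an illustration only; the function names, k = 1024, and the sample dept_no values are hypothetical.

```python
import hashlib
import heapq


def _unit_hash(value):
    """Map a value to a pseudo-random float in [0, 1) via SHA-1 (53 bits)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2 ** 53


def approx_count_distinct(values, k=1024):
    """K-minimum-values estimate of the number of distinct non-NULL values."""
    heap = []    # negated hashes, so -heap[0] is the largest of the k kept
    kept = set() # hashes currently in the heap, so duplicates are skipped
    for v in values:
        if v is None:                    # NULLs are ignored, as in COUNT(DISTINCT)
            continue
        h = _unit_hash(v)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:               # smaller than the largest kept hash
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)                 # fewer than k distinct values: exact
    return round((k - 1) / -heap[0])     # classic KMV estimator


dept_no = [100, 200, 300, 400, 500, 100, 300, None]   # hypothetical column
print(approx_count_distinct(dept_no))                   # 5, exact for small inputs
print(approx_count_distinct(range(50_000)))             # close to 50000
```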
Chapter 17 Mathematical Functions

Chapter 17 – Mathematical Functions

"Mathematics is not about numbers, equations, computations, or algorithms: it is about understanding."

- William Paul Thurston


Numeric Manipulation Functions

SELECT
-10 as "neg10"
,Cos(90) as "cos" -- Trigonometric cosine of an angle
,Sin(90) as "sin" -- Trigonometric sine of an angle
,Tan(90) as "tan" -- Trigonometric tangent of an angle
,Exp(6) as "exp" -- Exponential value of a number
,Sqrt(16) as "sqrt" -- Square root of a number

neg10 cos sin tan exp sqrt
-10 -0.45 0.89 -2 403.43 4

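Use the functions above for algebraic, trigonometric, or geometric calculations. The trigonometric functions expect radians, so Cos(90) is the cosine of 90 radians (about -0.45), not the cosine of 90 degrees.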
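Python's math module uses the same radian convention, so the short sketch below (an illustration, not Databricks code) reproduces the cos, sin, tan, exp, and sqrt results above and shows what converting 90 degrees to radians first would give instead.

```python
import math

angle = 90
print(round(math.cos(angle), 2))                # -0.45: cosine of 90 radians
print(round(math.cos(math.radians(angle)), 2))  # 0.0: cosine of 90 degrees
print(round(math.sin(angle), 2), round(math.tan(angle), 2))  # 0.89 -2.0
print(round(math.exp(6), 2), math.sqrt(16))     # 403.43 4.0
```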


ABS

Syntax: ABS( <num_expr> )

SELECT ABS(-5)
,ABS(2.0)
,ABS(NULL)
,ABS(-4.5)

EXPR_1 EXPR_2 EXPR_3 EXPR_4
5 2.0 NULL 4.5

ABS returns the absolute value of a numeric expression.

The ABS mathematical function returns the absolute value of a number, and it returns NULL when its input is NULL. ABS is short for "absolute value," which is the distance of a number from zero on the number line. Imagine the number line you might see in a math class: the absolute value asks "how far away is this number from zero?" It doesn't matter whether the number is positive or negative; the absolute value is always positive (or zero). For example, the absolute value of -5 is 5, because -5 is 5 units away from 0. The absolute value of 3 is simply 3, because 3 is 3 units away from 0. So, the ABS function gives you the distance of a number from zero, ignoring whether the number is positive or negative.

ACOS

Syntax: ACOS(number)

select acos(0.10)
,acos(0)
,acos(0.5)
,acos(1);

EXPR_1 EXPR_2 EXPR_3 EXPR_4
1.47 1.57 1.05 0

The ACOS number should evaluate to a real number greater than or equal to -1.0 and less than or equal to +1.0. ACOS computes the inverse cosine (arc cosine) of its input; the result is a number in the interval [0, pi].

The ACOS mathematical function returns the angle, in radians, whose cosine is the input value. The data type of the return value is DOUBLE. ACOS works the cosine function backwards: if you know the number that the cosine function gave you (let's call it "x"), ACOS tells you the angle that produced it. Imagine you're playing with a flashlight and a wall. The wall represents the numbers that the cosine function can give you, and shining the flashlight on it makes a spot. ACOS tells you the angle at which to hold the flashlight to make that spot. It's like figuring out the "secret angle" behind a specific cosine value; for example, acos(0.5) returns 1.05 radians, which is 60 degrees.
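Python's math.acos follows the same definition, so a short sketch (illustration only) can reproduce the query results and confirm that taking the cosine of each result gives back the input. One difference worth knowing: Python raises ValueError for inputs outside [-1, 1].

```python
import math

for x in (0.10, 0, 0.5, 1):
    angle = math.acos(x)                     # radians, in [0, pi]
    assert abs(math.cos(angle) - x) < 1e-12  # cos undoes acos
    print(x, round(angle, 2), round(math.degrees(angle), 1))

try:
    math.acos(2)                             # outside [-1, 1]
except ValueError as err:
    print("math.acos(2) raised ValueError:", err)
```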

ACOSH

Syntax: ACOSH( <real_expr> )

select acosh(2.352409615)
,acosh(4.702409615)
,acosh(8.142409615)

ACOSH(2.352409615) ACOSH(4.702409615) ACOSH(8.142409615)
1.5 2.23 2.79

The real_expr (real expression) should evaluate to a real number greater than or equal to 1.0. The ACOSH mathematical function computes the inverse (arc) hyperbolic cosine of its input.

The ACOSH mathematical function computes the inverse (arc) hyperbolic cosine of its input, and the returned value has a data type of DOUBLE. ACOSH stands for "inverse hyperbolic cosine." The hyperbolic cosine, COSH, draws a U-shaped curve that looks a bit like a smile and never dips below a height of 1. ACOSH works that curve backwards: give it a height y on the curve (1 or more), and it tells you the non-negative x at which the curve reaches that height. For example, cosh(1.5) is about 2.3524, so acosh(2.352409615) returns 1.5, as in the first column above.
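The inverse hyperbolic cosine has a closed form, ln(y + sqrt(y*y - 1)) for y of at least 1. This Python sketch (acosh_formula is a hypothetical helper) checks that form against math.acosh, confirms that math.cosh undoes it, and reproduces the three query results.

```python
import math


def acosh_formula(y):
    """Inverse hyperbolic cosine from its log form; defined for y >= 1."""
    return math.log(y + math.sqrt(y * y - 1))


for y in (2.352409615, 4.702409615, 8.142409615):
    x = acosh_formula(y)
    assert abs(x - math.acosh(y)) < 1e-12   # agrees with the library
    assert abs(math.cosh(x) - y) < 1e-9     # cosh undoes acosh
    print(y, round(x, 2))
```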


ASIN

Syntax: ASIN(number)

select asin(1)
,asin(0)
,asin(0.5) ;

EXPR_1 EXPR_2 EXPR_3
1.57 0 0.52

ASIN (arc sine) computes the inverse sine of its argument. It returns the result in radians (not degrees), in the interval [-pi/2, pi/2].

The ASIN mathematical function returns the angle, in radians, whose sine is the input value. The data type of the return value is DOUBLE. The ASIN function finds the angle that produces a specific ratio of sides in a right triangle. Imagine you have a right triangle, which is a triangle with one 90-degree angle (a perfect corner). For the angle you're interested in, the side across from it is called the "opposite" side, the side next to it is the "adjacent" side, and the longest side is the "hypotenuse." Now, say you know the lengths of the opposite side and the hypotenuse. ASIN of their ratio (opposite divided by hypotenuse) gives you the angle. For example, if the opposite side is half as long as the hypotenuse, asin(0.5) returns 0.52 radians, which is 30 degrees. In simple terms, ASIN works the sine function backwards, from a ratio of the triangle's side lengths to the angle that creates it.
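A short Python check of the triangle idea (illustration only; the triangle's side lengths are hypothetical): when the opposite side is half the hypotenuse, asin of the ratio is 0.52 radians, or 30 degrees. The last line reproduces the three query results above.

```python
import math

opposite, hypotenuse = 1.0, 2.0              # hypothetical right triangle
angle = math.asin(opposite / hypotenuse)     # radians, in [-pi/2, pi/2]
print(round(angle, 2), round(math.degrees(angle)))   # 0.52 30

print([round(math.asin(v), 2) for v in (1, 0, 0.5)])  # [1.57, 0.0, 0.52]
```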

ASINH

Syntax: ASINH( <real_expr> )

select asinh(2.129279455)
,asinh(4.25855891)
,asinh(8.51711782) ;

ASINH(2.129279455) ASINH(4.25855891) ASINH(8.51711782)
1.5 2.16 2.84

The expression should evaluate to a real number. ASINH computes the inverse (arc) hyperbolic sine of its argument.

The ASINH mathematical function computes the inverse (arc) hyperbolic sine of its argument. ASINH stands for "inverse hyperbolic sine," and it is simpler than it sounds. The hyperbolic sine, SINH, draws a smooth S-shaped curve that rises steadily from left to right and passes through zero. ASINH works that curve backwards: give it a height y on the curve, and it tells you the x at which the curve reaches that height. Because the curve passes through every height exactly once, ASINH accepts any real number. For example, sinh(1.5) is about 2.1293, so asinh(2.129279455) returns 1.5.
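The inverse hyperbolic sine has the closed form ln(y + sqrt(y*y + 1)). This Python sketch (asinh_formula is a hypothetical helper) checks the form against math.asinh, confirms that math.sinh undoes it, and reproduces the query results above.

```python
import math


def asinh_formula(y):
    """Inverse hyperbolic sine from its log form; mathematically defined for every real y."""
    return math.log(y + math.sqrt(y * y + 1))


for y in (2.129279455, 4.25855891, 8.51711782):
    x = asinh_formula(y)
    assert abs(x - math.asinh(y)) < 1e-12   # agrees with the library
    assert abs(math.sinh(x) - y) < 1e-9     # sinh undoes asinh
    print(y, round(x, 2))
```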


ATAN

Syntax: ATAN( <real_expr> )

Select atan(1)
,atan(2)
,atan(3)
,atan(100);

EXPR_1 EXPR_2 EXPR_3 EXPR_4
0.79 1.11 1.25 1.56

The expression should evaluate to a real number. The ATAN mathematical function computes the inverse tangent (arc tangent) of its argument; the result is a number in the interval [-pi/2, pi/2].

The ATAN numeric function (Trigonometric) computes the inverse tangent (arctangent) of its argument. ATAN returns the arctangent in radians (not degrees), in the range [-pi/2, pi/2]. The ATAN function finds the angle that produces a specific ratio of sides in a right triangle. Imagine you have a right triangle, which is a triangle with one 90-degree angle (like the corner of a book). For one of the other two angles, the side across from it is called the "opposite" side, and the side next to it that is not the hypotenuse is the "adjacent" side. If you know the lengths of the opposite and adjacent sides, ATAN of their ratio (opposite divided by adjacent) gives you the angle. For example, atan(1) returns 0.79 radians, or 45 degrees, the angle in a right triangle whose two shorter sides are equal. No matter how large the ratio gets, ATAN never quite reaches 90 degrees (pi/2, about 1.57), which is why atan(100) is 1.56.
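A short Python sketch of the same idea (illustration only; the triangle is hypothetical): equal opposite and adjacent sides give 45 degrees, and even a huge ratio stays below pi/2.

```python
import math

opposite, adjacent = 1.0, 1.0                 # hypothetical right triangle
angle = math.atan(opposite / adjacent)
print(round(angle, 2), round(math.degrees(angle)))   # 0.79 45

# However large the ratio, the result stays below pi/2 (90 degrees).
for ratio in (1, 2, 3, 100, 1e12):
    print(ratio, round(math.atan(ratio), 2))
```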


ATAN2
Syntax: ATAN2( <y> , <x> )

Select atan2(1,2)
,atan2(2,5)
,atan2(5,5);

EXPR_1 EXPR_2 EXPR_3
0.46 0.38 0.79

The first parameter is the Y coordinate, not the X coordinate. The arc tangent is the angle between the X axis and the ray from the point (0,0) to the point (X, Y) (where X and Y are not both 0). The ATAN2 mathematical function computes the inverse tangent (arc tangent) of the ratio of its two arguments. Example: if x > 0, then the expression ATAN2(y, x) is equivalent to ATAN(y/x).

The ATAN2 numeric function (Trigonometric) computes the inverse tangent (arctangent) of the ratio of its two arguments. The arctangent is the angle between the X-axis and the ray from the point (0,0) to the point (X, Y) (where X and Y are not both 0). The data type of the returned value is DOUBLE. The returned value is in radians, not degrees, and is a number in the interval [-pi, pi]. The ATAN2 function might seem a bit complex, but it's quite practical. Imagine a map laid out on the "x" and "y" axes from math class, with your starting point at (0,0). ATAN2 tells you the direction, measured from the x-axis, in which to head to reach a destination point. Here is the twist that makes it more useful than plain ATAN: because it receives y and x separately, ATAN2 knows which quadrant the destination is in and returns the correct angle all the way around the circle, whereas ATAN(y/x) cannot tell (1, 1) from (-1, -1), since both give a ratio of 1. It's like getting the exact compass direction from one spot to another on a treasure map.
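The quadrant difference is easy to see in Python, whose math.atan2 also takes y first (illustration only; the points are hypothetical). For (-1, -1), ATAN of the ratio reports 45 degrees, while ATAN2 reports -135 degrees, the true direction of that point.

```python
import math

# Each tuple is (y, x): the same argument order as ATAN2(<y>, <x>).
for y, x in [(1, 1), (-1, -1), (1, -1), (-1, 1)]:
    naive = math.degrees(math.atan(y / x))   # only sees the ratio y/x
    full = math.degrees(math.atan2(y, x))    # sees both signs, so knows the quadrant
    print((y, x), round(naive), round(full))
```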


ATANH

Syntax: ATANH( <real_expr> )

select atanh(0.9051482536)
,atanh(0.5333333333)
,atanh(0.1212121212);

ATANH(0.9051482536) ATANH(0.5333333333) ATANH(0.1212121212)
1.5 0.59 0.12

ATANH computes the inverse (arc) hyperbolic tangent of its argument. The real_expr should evaluate to a real number between -1.0 and +1.0; at exactly -1.0 or +1.0 the inverse hyperbolic tangent is infinite.

The ATANH numeric function (Trigonometric) computes the inverse (arc) hyperbolic tangent of its argument, which should evaluate to a real number between -1.0 and +1.0. ATANH stands for "inverse hyperbolic tangent." The hyperbolic tangent, TANH, draws an S-shaped curve that flattens out just below +1 on the right and just above -1 on the left, a bit like a rubber band stretched between two rails. ATANH works that curve backwards: give it a height y between -1 and +1, and it tells you the x at which the curve reaches that height. The closer y gets to 1, the farther out x is, which is why atanh(0.9051482536) is already 1.5.
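The inverse hyperbolic tangent has the closed form 0.5 * ln((1 + y) / (1 - y)) for -1 < y < 1. This Python sketch (atanh_formula is a hypothetical helper) checks it against math.atanh, confirms that math.tanh undoes it, reproduces the query results, and shows how quickly the result grows as y approaches 1.

```python
import math


def atanh_formula(y):
    """Inverse hyperbolic tangent from its log form; needs -1 < y < 1."""
    return 0.5 * math.log((1 + y) / (1 - y))


for y in (0.9051482536, 0.5333333333, 0.1212121212):
    x = atanh_formula(y)
    assert abs(x - math.atanh(y)) < 1e-9    # agrees with the library
    assert abs(math.tanh(x) - y) < 1e-12    # tanh undoes atanh
    print(y, round(x, 2))

# The closer y gets to 1, the faster the result grows.
print([round(atanh_formula(y), 2) for y in (0.9, 0.99, 0.999)])
```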


CBRT

Syntax: CBRT(expr)

select cbrt(0)
,cbrt(2)
,cbrt(3)
,cbrt(27)

CBRT(0) CBRT(2) CBRT(3) CBRT(27)
0 1.26 1.44 3

The CBRT numeric function returns the cube root of a numeric expression. CBRT always returns a floating-point number, even if the input expression is of type integer.

The CBRT numeric function returns the cube root of a numeric expression. CBRT always returns a floating-point number, even if the input expression is of type integer. Imagine you have a big cube, like a block, and you know its volume but want to figure out the length of one side. The cube root is the number that, used as a factor three times, gives you back the original number. For example, if you use the CBRT function on 8, it gives you 2, because 2 * 2 * 2 equals 8; on 27, it gives you 3, because 3 * 3 * 3 equals 27. It's like finding the side length of a cube when you know its volume.
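The cube root can be sketched in Python as shown below. Python 3.11 and later provide math.cbrt; on older versions, raising a negative float to the power 1/3 returns a complex number, so this hypothetical helper takes the root of the absolute value and restores the sign. Cubing each result recovers the input.

```python
import math


def cbrt(v):
    """Real cube root, also for negative inputs (hypothetical helper)."""
    return math.copysign(abs(v) ** (1 / 3), v)


for v in (0, 2, 3, 27, 8, -27):
    r = cbrt(v)
    print(v, round(r, 2), round(r ** 3, 9))   # cubing undoes the cube root
```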


Ceil

SELECT ceil(-0.1) as ceil_1
,ceil(3.333) as ceil_2
,order_total as order_total
,ceil(order_total) as ceiling_total
FROM order_table
LIMIT 1;

ceil_1 ceil_2 order_total ceiling_total
0 4 12347.53 12348

ceil finds the smallest integer NOT smaller than X.

The "ceil" function, short for "ceiling," rounds a number up to the nearest whole number that's greater than or equal to it. Imagine you have a number that's not a whole number, like 3.7. When you use the "ceil" function on this number, it moves up to the next bigger whole number, which is 4. A number that is already whole, like 4, stays as it is. "Up" means toward positive infinity, so ceil(-0.1) is 0, not -1. Rounding up with "ceil" is like making sure you have enough room to cover a number, even if you have to use a larger unit.
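Python's math.ceil applies the same rule, the smallest integer not smaller than X, so this short sketch (illustration only) reproduces the ceil_1, ceil_2, and ceiling_total values above and shows that a whole number is left unchanged.

```python
import math

for v in (-0.1, 3.333, 3.7, 4, 12347.53):
    print(v, math.ceil(v))    # smallest integer not smaller than v
```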


COS

Syntax: COS( <real_expr> )

select cos(0)
,cos(pi()/3)
,cos(radians(90));

EXPR_1 EXPR_2 EXPR_3
1 0.5 0

COS computes the cosine of its argument. The real_expr should evaluate to a real number, and the value should be in radians, not degrees.

The COS mathematical function returns the cosine of an input value given in radians. The COS function, short for "cosine," gives you a special value for an angle in a right triangle: the length of the side next to the angle (the "adjacent" side) divided by the length of the longest side (the "hypotenuse"). Imagine a flashlight shining straight down on a stick you're holding at an angle. The shadow on the floor plays the part of the adjacent side and the stick plays the hypotenuse, so the cosine tells you how long the shadow is compared with the stick. Hold the stick flat (0 radians) and the shadow is as long as the stick, so cos(0) is 1; hold it straight up (90 degrees, or pi/2 radians) and there is no shadow, so cos(radians(90)) is 0.
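The ratio definition can be checked in Python with a 3-4-5 right triangle (a hypothetical example): the cosine of the angle between the side of length 4 and the hypotenuse of length 5 is 4/5 = 0.8. The last line reproduces the three query results above.

```python
import math

adjacent, opposite = 4.0, 3.0                 # hypothetical 3-4-5 right triangle
hypotenuse = math.hypot(adjacent, opposite)   # 5.0
angle = math.atan2(opposite, adjacent)        # angle between adjacent side and hypotenuse
print(round(math.cos(angle), 6), adjacent / hypotenuse)   # 0.8 0.8

print([round(math.cos(a), 2) for a in (0, math.pi / 3, math.radians(90))])
```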


COSH

Syntax: COSH( <real_expr> )

select cosh(1.5)
,cosh(2.5)
,cosh(5.8);

COSH(1.5) COSH(2.5) COSH(5.8)
2.35 6.13 165.15

COSH computes the hyperbolic cosine of its argument. The real_expr should evaluate to a real number.

The COSH mathematical function computes the hyperbolic cosine of its argument, which should evaluate to a real number. COSH stands for "hyperbolic cosine," and it is a mathematical tool for describing certain curves. Imagine a chain hanging loosely between two posts: the U-shaped curve it forms is a hyperbolic cosine curve. If you have a number, let's call it "x," COSH gives you the height of that curve at position x, with the lowest point of the chain (a height of 1) at x = 0. The curve climbs quickly as you move away from the middle, which is why cosh(5.8) is already over 165.
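The hyperbolic cosine has a simple closed form: the average of e to the x and e to the minus x. This Python sketch (cosh_formula is a hypothetical helper) uses that form to reproduce the COSH results above and shows that the lowest point of the curve, at x = 0, has height 1.

```python
import math


def cosh_formula(x):
    """Hyperbolic cosine: the average of e**x and e**-x."""
    return (math.exp(x) + math.exp(-x)) / 2


for x in (0, 1.5, 2.5, 5.8):
    print(x, round(cosh_formula(x), 2))
```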


COT

Syntax: COT( <real_expr> )

select cot(50)
,cot(pi()/3)
,cot(radians(90));

EXPR_1 EXPR_2 EXPR_3
-3.68 0.58 0

COT computes the cotangent of its argument; the real_expr should be expressed in radians.

The COT mathematical function computes the cotangent of its argument; the argument should be expressed in radians. The COT function, short for "cotangent," is the tangent turned upside down: COT(x) equals 1 / TAN(x). In a right triangle, it is the length of the side next to an angle (the "adjacent" side) divided by the length of the side across from it (the "opposite" side). Imagine a rope stretched from the top of a pole down to the ground at a certain angle. The cotangent of that angle tells you how far along the ground the rope reaches for every unit of the pole's height. The steeper the rope, the smaller the cotangent; at 90 degrees (pi/2 radians) the rope points straight up and the cotangent is 0.
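Since the cotangent is the reciprocal of the tangent, a one-line Python helper (hypothetical, for illustration) reproduces the three results above. cot(radians(90)) actually comes out as a tiny number such as 6e-17 rather than exactly 0, because pi/2 cannot be stored exactly as a floating-point value; rounding hides that. At an angle of 0 the cotangent is undefined, and this helper raises ZeroDivisionError.

```python
import math


def cot(x):
    """Cotangent of an angle in radians: the reciprocal of its tangent."""
    return 1 / math.tan(x)


print(round(cot(50), 2), round(cot(math.pi / 3), 2), round(cot(math.radians(90)), 2))
```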


DEGREES

Syntax: DEGREES( <real_expr> )

select degrees(pi()/3)
,degrees(pi())
,degrees(1);

EXPR_1 EXPR_2 EXPR_3
60 180 57.3

DEGREES converts radians to degrees. The real_expr represents the number of radians.

The DEGREES mathematical function converts a number from radians to degrees and returns a DOUBLE (double-precision floating-point) value. The DEGREES function helps you express angles in a way that's easier to work with in everyday situations. Imagine you're using a compass or reading a map. Math functions such as COS and ATAN work in a different measurement called radians, and DEGREES converts an angle in radians into degrees. Degrees are the kind of angles you're more familiar with, like 90 degrees making a right angle or 180 degrees making a straight line. One full turn is 2 * pi radians, or 360 degrees, so DEGREES multiplies its input by 180 / pi. It's like translating an angle from the special language used in math into the more common language of angles.
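The conversion is a single multiplication by 180 / pi. This Python sketch (to_degrees is a hypothetical helper) compares it with math.degrees and reproduces the three query results above.

```python
import math


def to_degrees(radians):
    """Convert radians to degrees: one full turn is 2*pi radians = 360 degrees."""
    return radians * 180 / math.pi


for r in (math.pi / 3, math.pi, 1):
    print(round(to_degrees(r), 1), round(math.degrees(r), 1))
```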


DIV

Syntax: div(expr1, expr2)

SELECT div(100, 10)
,div(100, 12.5)

(100 div 10) (100 div 12.5)
10 8

The DIV mathematical function returns the integral part of dividing expr1 by expr2; any fractional part of the quotient is discarded. The DIV function, short for "division," is like sharing things in equal groups. Imagine you have a bunch of candies, and you want to share them equally among your friends. DIV tells you how many whole candies each friend gets. If you have 10 candies and 2 friends, 10 DIV 2 is 5 candies each. If you have 10 candies and 3 friends, 10 DIV 3 is 3 candies each, and the one left-over candy is simply not counted. So, the DIV function tells you how many whole items each equal group gets. It's like a simple math tool for sharing things fairly.
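A minimal Python sketch of "keep only the integral part of the quotient," using the decimal module so that 12.5 is handled exactly. The helper name div is hypothetical, and it truncates toward zero (Python's int() behavior); check the Databricks documentation before relying on that for negative inputs.

```python
from decimal import Decimal


def div(a, b):
    """Integral part of a / b (sketch: truncates toward zero, as int() does)."""
    return int(Decimal(str(a)) / Decimal(str(b)))


print(div(100, 10), div(100, 12.5))    # 10 8

candies, friends = 10, 3
each = div(candies, friends)
print(each, candies - each * friends)  # 3 each, 1 left over
```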


EXP

Syntax: EXP( <real_expr> )

SELECT EXP(10)
,EXP(20) ;

EXPR_1 EXPR_2
22026.47 485165195.41

The EXP mathematical function computes Euler’s number e raised to a floating-point value.

The EXP mathematical function computes Euler’s number e (about 2.718) raised to a floating-point value. Euler's number, often written as "e," is a very special number in mathematics. Imagine you're saving money in a bank account that pays interest. The more often the bank adds interest, the faster your money grows, and "e" describes what happens when interest is added continuously, every instant. For instance, if you start with $1 and the bank pays 100% interest per year compounded continuously, after one year you will have about $2.718, which is the value of "e." That is why "e" shows up in so many places in math and science where things grow or change smoothly and continuously, and EXP(x) gives you e raised to the power x.
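The bank-interest story can be checked numerically. This Python sketch (illustration only) compounds $1 at 100% annual interest n times in one year; as n grows, the balance approaches e, and math.exp reproduces the EXP(10) and EXP(20) results above.

```python
import math

# $1 at 100% annual interest, compounded n times during one year.
for n in (1, 12, 365, 1_000_000):
    print(n, round((1 + 1 / n) ** n, 6))

print("e =", round(math.e, 6))
print("EXP(10) =", round(math.exp(10), 2), " EXP(20) =", round(math.exp(20), 2))
```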


FACTORIAL

Syntax: FACTORIAL( <integer_expr> )

select factorial(0)
,factorial(1)
,factorial(5)
,factorial(6);

factorial(0) factorial(1) factorial(5) factorial(6)
1 1 120 720

The FACTORIAL mathematical function computes the factorial of its input. The input argument must be an integer expression in the range 0 to 20; for any other value, the result is NULL.

In mathematics, the factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n. For example, 5! = 5 * 4 * 3 * 2 * 1 = 120. By convention, 0! = 1, which is why factorial(0) returns 1.

The FACTORIAL mathematical function computes the factorial of its input. It's a special way to multiply a bunch of
numbers together. Imagine you have a number, let's say 5. The factorial of 5, written as 5!, means you multiply 5 by
every whole number below it, down to 1: 5 x 4 x 3 x 2 x 1 = 120. In general, for a number "n," n! means you
multiply together all the whole numbers from 1 to n. Factorials count arrangements: n! is the number of different
orders in which you can line up n different things. For example, three books can be arranged on a shelf in 3! = 6
ways. So, the factorial function is a quick way to find out how many arrangements you can make.
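The "arrangements" idea, and the reason for the 0 to 20 limit, can both be checked in Python. This is a sketch, not Databricks code: `sql_factorial` is a hypothetical helper that mimics the NULL result (None in Python) outside 0 to 20, and the limit is explained under the assumption that the result is a 64-bit BIGINT, whose largest value is 2^63 - 1.

```python
import math
from itertools import permutations

BIGINT_MAX = 2**63 - 1  # largest signed 64-bit integer

def sql_factorial(n):
    """Multiply 1 * 2 * ... * n; return None (SQL NULL) outside 0..20."""
    if n < 0 or n > 20:
        return None
    result = 1                      # 0! and 1! are both 1
    for k in range(2, n + 1):
        result *= k
    return result

# n! counts the different orders of n distinct items: ABC, ACB, BAC, ...
print(len(list(permutations("ABC"))), sql_factorial(3))
# 20! still fits in a BIGINT, 21! does not
print(sql_factorial(20) <= BIGINT_MAX, math.factorial(21) <= BIGINT_MAX)
```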


FLOOR

FLOOR finds the largest integer NOT greater than X.

SELECT floor(-0.1) as floor_1
      ,floor(3.333) as floor_2
      ,order_total as order_total
      ,floor(order_total) as floor_total
FROM order_table
LIMIT 1;

floor_1  floor_2  order_total  floor_total
-1       3        12347.53     12347

The FLOOR function rounds a number down to the nearest whole number. Imagine you have a number that's not a
whole number, like 3.8. When you use FLOOR on this number, it pushes it down to the closest smaller whole number,
which is 3. A number that is already whole stays the same, so FLOOR(3) is 3. For negative numbers, "down" means
away from zero, which is why FLOOR(-0.1) returns -1 rather than 0. In simpler words, FLOOR always lands on a
whole number that is on or below the original number on the number line. It's like finding the closest lower step on
a staircase when you want to go down.
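A common surprise with FLOOR is negative numbers, where rounding down moves away from zero. The Python sketch below is an illustration only: it contrasts `math.floor` with `math.trunc`, which simply chops off the decimal part, using the same inputs as the query above.

```python
import math

for x in (3.8, 3.333, -0.1, 12347.53):
    # floor: largest integer <= x; trunc: drop the fractional part
    print(x, math.floor(x), math.trunc(x))
```

The two only disagree for the negative input, where FLOOR gives -1 and truncation gives 0.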

LN

Syntax: LN( <expr> )

The LN mathematical function returns the natural logarithm of a numeric expression.

SELECT TOTAL_SALES, LN(TOTAL_SALES)
FROM SALES_SIMPLE_EXAMPLE ;

TOTAL_SALES  ln(TOTAL_SALES)
1.00         0
2.00         0.69
3.00         1.1
4.00         1.39
5.00         1.61
6.00         1.79
7.00         1.95
8.00         2.08
999.00       6.91
9999.00      9.21

The LN function stands for "natural logarithm." It answers the question: to what power must you raise Euler's
number e (about 2.718) to get this number? Imagine you have a number, like 10. If you use LN on 10, you get about
2.3026. This means that if you raise e to the power of 2.3026, you get very close to 10. In simpler words, LN finds
the secret power you should use on e to make it equal your original number. That also makes LN the opposite of
EXP: LN(EXP(x)) is x. Because e raised to any power is always positive, the number you pass to LN must be greater
than 0.
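The "secret power" idea can be checked directly: raising e to LN(x) gives x back. This Python sketch is an illustration only; it uses `math.log`, which with one argument is the natural logarithm, to reproduce the rounded values in the table above.

```python
import math

x = 10
power = math.log(x)             # natural log: the power e must be raised to
print(round(power, 4))          # about 2.3026
print(math.exp(power))          # e ** power gives back (almost exactly) 10

for total_sales in (1, 2, 3, 8, 999, 9999):
    print(total_sales, round(math.log(total_sales), 2))
```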

LOG

Syntax: LOG(<base>, <expr>)

<base>  The "base" to use (e.g., 10 for base-10 logarithms). The base can be of any numeric data type (INTEGER,
        fixed-point, or floating point). The base should be greater than 0, and not be exactly 1.0.
<expr>  The value for which you want to know the log. This can be of any numeric data type (INTEGER, fixed-point,
        or floating point). The expr should be greater than 0.

SELECT log(2, 0.5)
      ,log(2, 1)
      ,log(2, 16) ;

LOG(2,0.5)  LOG(2,1)  LOG(2,16)
-1          0         4

The LOG mathematical function returns the logarithm of a numeric expression in a given base. It tells you the power
you need to raise the base to in order to get another number. Imagine you have a number, let's say 100. If you use
LOG with a base of 10 on this number (written LOG(10, 100), "log base 10 of 100"), it gives you 2. This means that
if you raise 10 to the power of 2 (which is 10 * 10), you get 100. So, the LOG function tells you what power you
need to use on a certain base to end up with your original number. When the answer is a whole number, it's like
solving a puzzle: how many copies of the base do you multiply together to reach the number you're given?
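Any logarithm can be computed from natural logarithms with the change-of-base rule, log_b(x) = ln(x) / ln(b), which also shows why the base may not be 1: ln(1) is 0, and you cannot divide by it. The Python sketch below is an illustration; `sql_log` is a hypothetical helper that returns None for inputs the restrictions above rule out, and it does not claim to reproduce exactly how Databricks reports them.

```python
import math

def sql_log(base, x):
    """Logarithm of x in the given base, via change of base: ln(x) / ln(base)."""
    if x <= 0 or base <= 0 or base == 1:
        return None                      # outside the allowed input range
    return math.log(x) / math.log(base)

for base, x in ((2, 0.5), (2, 1), (2, 16), (10, 100)):
    print(f"LOG({base}, {x}) = {sql_log(base, x):g}")
```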


MOD

Syntax: MOD( <expr1> , <expr2> )

MOD returns the remainder of input expr1 divided by input expr2.
Expr1 - A numeric expression.
Expr2 - A numeric expression.

SELECT mod(25, 100)
      ,mod(100, 25)
      ,mod(10, 9) ;

EXPR_1  EXPR_2  EXPR_3
25      0       1

The MOD function finds the remainder when you divide two numbers. Imagine you have a bunch of candies, and you
want to share them with your friends equally. With MOD, you're not asking how many candies each friend gets;
you're asking how many candies are left over after you've shared them as evenly as possible. For example, if you
have 10 candies and you're sharing them between 3 friends, each friend gets 3 candies, and MOD(10, 3) tells you that
1 candy is left over. So, the MOD function finds the "leftover" part when you divide numbers into equal groups. It's
a way of checking what's left after you've done your sharing.
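MOD and integer division fit together: for whole numbers, the quotient times the divisor plus the remainder gives back the original number. The Python sketch below checks that with the candy example. It is an illustration under an assumption worth verifying on your cluster: that, as in Java, the SQL remainder takes the sign of the dividend. Python's `%` operator takes the sign of the divisor instead, so the hypothetical helpers `trunc_div` and `sql_mod` are written out by hand.

```python
def trunc_div(a: int, b: int) -> int:
    """Whole-number quotient, truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

def sql_mod(a: int, b: int) -> int:
    """Remainder whose sign follows the dividend a (assumed SQL behavior)."""
    return a - b * trunc_div(a, b)

candies, friends = 10, 3
each, left = trunc_div(candies, friends), sql_mod(candies, friends)
print(each, left)                             # 3 candies each, 1 left over
print(each * friends + left == candies)       # the pieces add back up

print(sql_mod(25, 100), sql_mod(100, 25), sql_mod(10, 9))  # as in the query
print(sql_mod(-7, 2), -7 % 2)                 # sign conventions differ
```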


PI

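Syntax: PI()

The PI mathematical function returns the value of pi as a floating-point value.

select pi()
      ,pi()::decimal(20,19) ;

PI()  PI()
3.14  3.1415926535897930000

The first column is displayed rounded to two decimal places; casting to DECIMAL(20,19) shows more of the digits.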

PI is a special number in math that's used to understand circles. Imagine you have a round pizza. PI tells you how
the pizza's circumference (the distance around the edge) compares to its diameter (the width across the middle): the
circumference is always PI times the diameter. In other words, the diameter fits around the edge a little more than 3
times, whatever the size of the pizza. This number is about 3.14159, and we call it "PI" for short. PI also gives a
circle's area: PI times the radius squared. It's like a secret key for solving circle puzzles in math!
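Here is the pizza example in numbers, as a Python sketch (illustration only; the 12-inch diameter is a made-up value). It uses the two classic circle formulas: circumference = pi * diameter and area = pi * radius squared.

```python
import math

diameter = 12.0                         # hypothetical 12-inch pizza
radius = diameter / 2
circumference = math.pi * diameter      # distance around the edge
area = math.pi * radius ** 2            # space covered by the pizza

print(f"circumference = {circumference:.2f} inches")
print(f"area          = {area:.2f} square inches")
print(circumference / diameter)         # back to pi (up to rounding)
```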


POW or POWER

Syntax: POW(x, y) or POWER(x, y)

The POW or POWER mathematical function returns a number (x) raised to the specified power (y).

SELECT power(10.12345,10)
      ,power(10,10)
      ,pow(2,3) ;

POWER(10.12345,10)  POWER(10,10)  pow(2,3)
11305386703.89      10000000000   8

Always returns a floating-point number, even if the input expressions are of integer types.

The POW or POWER function is like a superpower for numbers. It raises a number to a certain power, which for
whole-number powers means multiplying the number by itself. Imagine you have a number, let's say 2, and you want
to give it a boost. If you use the POW function with 2 and a power of 3 (written as 2^3), you use 2 as a factor three
times: 2 * 2 * 2 = 8. So, 2^3 equals 8. In simpler terms, the power tells you how many copies of the number to
multiply together. It's like giving a number its very own superhero boost!
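The "copies multiplied together" description can be written out as a loop for whole-number powers. In this Python sketch (illustration only; `repeated_multiply` is a hypothetical helper), `math.pow` plays the role of POWER: like POWER, it always returns a floating-point number, even for integer inputs.

```python
import math

def repeated_multiply(x, y):
    """x raised to a whole-number power y: use x as a factor y times."""
    result = 1
    for _ in range(y):
        result *= x
    return result

print(repeated_multiply(2, 3))      # 2 * 2 * 2
print(math.pow(2, 3))               # a float, like pow(2,3) in SQL
print(math.pow(10, 10))
print(math.pow(2, 0.5))             # powers need not be whole numbers
```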


RADIANS

Syntax: RADIANS( <real_expr> )

The RADIANS mathematical function converts degrees to radians.

select radians(0)
      ,radians(60)
      ,radians(180)
      ,radians(360)
      ,radians(720) ;

EXPR_1  EXPR_2  EXPR_3  EXPR_4  EXPR_5
0       1.05    3.14    6.28    12.57

The RADIANS mathematical function converts a number from degrees to radians, a different way to measure angles
that's used in more advanced math. Imagine you have a pizza, and you want to describe how wide a slice is. Normally,
we measure that angle in degrees, where a whole pizza is 360 degrees. Radians measure the same angle with a
different unit: a whole pizza is 2 * PI radians (about 6.28), so half a pizza, 180 degrees, is PI radians. It's like using
a different measuring tape to see how big the angle is. Radians are the unit most math formulas use, because they
make those formulas simpler; that is why trigonometric functions such as SIN and TAN expect their input in radians.
They are especially handy when things change smoothly and continuously, like when objects rotate or waves oscillate.
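The conversion itself is a single multiplication: radians = degrees * pi / 180, because a full turn of 360 degrees equals 2 * pi radians. The Python sketch below (illustration only; `to_radians` is a hypothetical name) writes that out and compares it with `math.radians`, using the angles from the query above.

```python
import math

def to_radians(degrees):
    """360 degrees is one full turn, which is 2 * pi radians."""
    return degrees * math.pi / 180

for degrees in (0, 60, 180, 360, 720):
    print(degrees, round(to_radians(degrees), 2), round(math.radians(degrees), 2))
```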


ROUND

Syntax: ROUND( <input_expr> [, <scale_expr> ] )

ROUND returns rounded values for input_expr.

SELECT round(2.5)
      ,round(3.3)
      ,round(3.6) ;

round(2.5)  round(3.3)  round(3.6)
3           3           4

The ROUND function makes a number simpler and easier to work with. Imagine you have a number with lots of
decimal places, like 3.857. When you use the ROUND function on this number, you make it neater by moving to the
nearest whole number or, if you supply scale_expr, to the nearest value with that many decimal places. For instance,
ROUND(3.857, 2) is 3.86, because 3.86 is the closest number that has just two decimal places. When a number sits
exactly halfway, as 2.5 does, ROUND goes up (away from zero), which is why round(2.5) returns 3. So, the ROUND
function tidies up numbers, making them easier to handle and understand. It's like smoothing out the rough edges of
numbers.
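One detail matters when you check ROUND results in another language: Databricks documents ROUND as HALF_UP rounding (ties move away from zero, which is why round(2.5) is 3), while Python's built-in `round()` uses banker's rounding and returns 2. This Python sketch (illustration only; `sql_round` is a hypothetical helper) reproduces HALF_UP with the standard `decimal` module. It converts through `str()` so that 3.857 is treated as the decimal 3.857 rather than its binary approximation.

```python
from decimal import Decimal, ROUND_HALF_UP

def sql_round(value, scale=0):
    """Round half up (ties away from zero) to `scale` decimal places."""
    quantum = Decimal(1).scaleb(-scale)   # 1 for scale 0, 0.01 for scale 2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

for v in (2.5, 3.3, 3.6):
    print(v, sql_round(v), round(v))      # HALF_UP vs Python's built-in round
print(sql_round(3.857, 2))
```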


SIGN

Syntax: SIGN( <expr> )

Returns the sign of a number:
0 = the number is 0
1 = the number is a positive number
-1 = the number is a negative number

SELECT SIGN(-52.3)
      ,SIGN(55.5)
      ,SIGN(0) ;

EXPR_1  EXPR_2  EXPR_3
-1      1       0

The SIGN function tells you whether a number is positive, negative, or zero. Imagine you have a number, like -5.
SIGN(-5) returns -1, telling you the number is negative. If the number is positive, the SIGN function gives you 1.
If it's negative, you get -1. And if the number is exactly zero, the SIGN function gives you 0. In simpler words, the
SIGN function helps you quickly figure out the "direction" of a number, whether it's going up (positive), down
(negative), or not going anywhere (zero). It's like a math compass that shows you which way the number is pointing.
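Python has no built-in sign function, so a small sketch makes the three possible answers explicit. `sql_sign` is a hypothetical helper; it returns plain integers, while the SQL function may return them as floating-point values such as -1.0.

```python
def sql_sign(x):
    """Return 1 for positive, -1 for negative and 0 for zero."""
    return (x > 0) - (x < 0)   # booleans subtract like 1 and 0

for value in (-52.3, 55.5, 0, -5):
    print(value, sql_sign(value))
```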


SIN

Syntax: SIN( <real_expr> )

The SIN mathematical function computes the sine of its argument. The real_expr should be expressed in radians.

select sin(0)
      ,sin(pi()/3)
      ,sin(radians(90)) ;

EXPR_1  EXPR_2  EXPR_3
0       0.87    1

The SIN function is a tool that helps you understand heights and distances in a triangle, especially when you're
dealing with angles. In a right triangle, the sine of an angle is the length of the side opposite the angle divided by the
longest side (the hypotenuse). Imagine a ladder leaning against a wall. The ladder is the longest side, and the angle is
where the ladder meets the ground. Multiply the ladder's length by SIN of that angle and you get how high up the
wall the ladder reaches. So, if you have an angle and you use the SIN function on it, you get a number between -1
and 1 that tells you what fraction of the ladder's length turns into height. At 90 degrees the ladder stands straight up,
and SIN(RADIANS(90)) is 1: all of its length is height. It's like a magic math trick for finding how high something
reaches at a specific angle.
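A ladder leaning against a wall is a concrete right triangle, so it makes a handy check on SIN. In this Python sketch (illustration only; the 10-foot ladder is a made-up value), the height the top of the ladder reaches is its length times the sine of the angle it makes with the ground. The angle is converted to radians first, because sine functions, in Python as in SQL, expect radians.

```python
import math

ladder_length = 10.0                      # hypothetical 10-foot ladder
for angle_degrees in (30, 60, 90):
    angle = math.radians(angle_degrees)   # sine expects radians
    height = ladder_length * math.sin(angle)
    print(angle_degrees, round(height, 2))

print(round(math.sin(math.pi / 3), 2))    # same as sin(pi()/3) above
```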


SINH

Syntax: SINH( <real_expr> )

The SINH mathematical function computes the hyperbolic sine of its argument. The real_expr should evaluate to a
real number.

select sinh(1.5)
      ,sinh(5.0)
      ,sinh(7.33333) ;

SINH(1.5)  SINH(5.0)  SINH(7.33333)
2.13       74.2       765.23

SINH stands for "hyperbolic sine," and it's a mathematical tool built from Euler's number e:
SINH(x) = (EXP(x) - EXP(-x)) / 2. Its graph is a smooth curve that passes through 0 at x = 0 and keeps rising: it is
negative for negative x and positive for positive x. If you have a number, let's call it "x," and you use the SINH
function on it, you get a new number that grows very quickly as x gets larger. For large x, the EXP(-x) part becomes
tiny, so SINH(x) is almost exactly half of EXP(x); that is why SINH(7.33333) is already about 765. It's like a cousin
of EXP that is balanced around zero.
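SINH is built from EXP, so it can be computed directly from the definition sinh(x) = (e^x - e^-x) / 2. The Python sketch below (illustration only; `sinh_from_exp` is a hypothetical name) does that, compares the result with `math.sinh`, and prints e^x / 2 alongside to show how close the two get for larger x.

```python
import math

def sinh_from_exp(x):
    """Hyperbolic sine written with exponentials."""
    return (math.exp(x) - math.exp(-x)) / 2

for x in (1.5, 5.0, 7.33333):
    print(x, round(sinh_from_exp(x), 2), round(math.sinh(x), 2), round(math.exp(x) / 2, 2))
```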


SQRT

Syntax: SQRT(expr)

SQRT returns the square root of a non-negative numeric expression.

SELECT SQRT(10)
      ,SQRT(4)
      ,SQRT(16) ;

EXPR_1  EXPR_2  EXPR_3
3.16    2       4

The SQRT function is like a magical tool that helps you find the length of one side of a square. Imagine you have a
square, and you know how big its area is. If you use the SQRT function on that area, it gives you the length of one
side of that square. For instance, if you have a square with an area of 16, SQRT(16) gives you 4, which means each
side of the square is 4 units long (4 * 4 = 16). In simple words, SQRT undoes squaring: it finds the number that,
multiplied by itself, gives the number you started with. It's like using a magic spell to find out the missing piece of
information about the square.
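To see that SQRT really undoes squaring, here is one classic way to compute a square root by hand, Heron's (Newton's) method: guess a side length, then repeatedly average the guess with area / guess. This Python sketch only illustrates the math and is not how Databricks computes SQRT; `heron_sqrt` is a hypothetical name, and its result is checked against `math.sqrt`.

```python
import math

def heron_sqrt(area, rel_tol=1e-12):
    """Side length of a square with the given area, by repeated averaging."""
    if area < 0:
        raise ValueError("square root needs a non-negative number")
    if area == 0:
        return 0.0
    guess = max(area, 1.0)                   # any positive starting guess works
    while True:
        better = (guess + area / guess) / 2  # average guess and area / guess
        if abs(better - guess) <= rel_tol * better:
            return better
        guess = better

for area in (16, 4, 10):
    side = heron_sqrt(area)
    print(area, round(side, 2), round(math.sqrt(area), 2))
```

If the guess is too big, area / guess is too small, so their average is always a better guess.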


TAN

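Syntax: TAN( <real_expr> )

The TAN mathematical function computes the tangent of its argument. The real_expr should be expressed in radians.

select tan(0)
      ,tan(pi()/3)
      ,tan(radians(90)) ;

EXPR_1  EXPR_2  EXPR_3
0       1.73    16331239353195400

Mathematically, the tangent of 90 degrees is undefined. Because pi()/2 cannot be stored exactly as a floating-point
number, the function returns a very large number instead of an error.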

The TAN function relates how tall something is to how far away you're standing, based on an angle. Imagine you're
standing some distance from a tall tree. When you look up at the top of the tree, your line of sight makes an angle
with the ground. The TAN of that angle is the ratio of the tree's height to your distance from it. So, if you know the
angle and the distance, you can multiply the distance by TAN of the angle to get the height (ignoring the height of
your eyes). It's like using a math tool to find out how tall something is without needing to measure it directly.
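The tree example comes down to one multiplication: height = distance * tan(angle). The Python sketch below uses made-up numbers (standing 20 meters away and looking up at 35 degrees, ignoring eye height) and also shows the 90-degree case, where floating-point pi / 2 produces a huge finite tangent rather than an error.

```python
import math

distance = 20.0                             # hypothetical: meters from the tree
angle = math.radians(35)                    # hypothetical: angle up to the top
height = distance * math.tan(angle)         # opposite = adjacent * tangent
print(f"tree height ~ {height:.1f} m")

print(round(math.tan(math.pi / 3), 2))      # same as tan(pi()/3) above
print(math.tan(math.radians(90)))           # huge, not infinity
```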


TANH

Syntax: TANH( <real_expr> )

The TANH mathematical function computes the hyperbolic tangent of its argument. The real_expr should evaluate to a
real number.

select tanh(1.5)
      ,tanh(5.0)
      ,tanh(7.33333) ;

TANH(1.5)  TANH(5.0)  TANH(7.33333)
0.91       1          1

TANH stands for "hyperbolic tangent," and it's a mathematical tool used to work with certain curves. It is SINH
divided by COSH, or TANH(x) = (EXP(x) - EXP(-x)) / (EXP(x) + EXP(-x)). Its graph is an S-shaped curve that
passes through 0 at x = 0 and gets squished between -1 and 1: as x grows, TANH(x) climbs toward 1 but never quite
reaches it, and as x becomes more negative, it sinks toward -1. If you have a number, let's call it "x," and you use
the TANH function on it, you always get a number between -1 and 1. That is why TANH(5.0) and TANH(7.33333)
both display as 1: their true values are just below 1 and round to 1 at two decimal places. It's like a squishing
machine that takes any number and fits it between -1 and 1.
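TANH can be computed straight from its definition, tanh(x) = (e^x - e^-x) / (e^x + e^-x); mathematically that ratio never reaches 1 or -1. The Python sketch below (illustration only; `tanh_from_exp` is a hypothetical name) computes it from exponentials and prints six decimal places, more than the query above shows, so you can see that TANH(5.0) is just below 1 rather than exactly 1.

```python
import math

def tanh_from_exp(x):
    """Hyperbolic tangent written with exponentials."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for x in (-5.0, 0.0, 1.5, 5.0, 7.33333):
    print(x, f"{tanh_from_exp(x):.6f}", f"{math.tanh(x):.6f}")
```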

The End
