Databricks SQL 2024
The Tera-Tom Video Series
Licensed to , [email protected]
The Tera-Tom and David Cook Cloud Series
Each Cloud Series book targets a cloud database. The books take a building block approach, always starting
simple, and then each page builds upon the previous point.
Tera-Tom - Author of over 90 Books
Tera-Tom books have been the primary source of Teradata learning for over 20 years. They have helped to teach
millions of people all aspects of Teradata. What people love the most about the Tera-Tom books is how easy they
are to understand. They are so easy that a seven-year-old boy (raised by wolves) can understand them!
The Query Tool of the Future is Nexus
The Nexus is the greatest tool for data the world has ever known. Download a free trial at www.CoffingDW.com.
Check out the Nexus in action on YouTube here: https://www.youtube.com/watch?v=drNlY1cyZrw
Trademarks and Copyrights
Databricks is a registered trademark of Databricks. Snowflake is a registered trademark of Snowflake. Microsoft
Windows, Windows 2003 Server, SQL Server 2012, SQL Server Compact Edition, .NET, PDW, SQL Server,
T-SQL, Azure SQL Data Warehouse, and Azure Cloud are trademarks of Microsoft. Teradata, NCR, BYNET, and
SQL Assistant are registered trademarks of Teradata Corporation, Dayton, Ohio, U.S.A. IBM, DB2, and Netezza
are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards
Institute. Ethernet is a trademark of Xerox. UNIX is a trademark of The Open Group. Linux is a trademark of
Linus Torvalds. Java and Oracle are trademarks of Oracle. ParAccel is a trademark of ParAccel. Kognitio is a
trademark of Kognitio. Greenplum is a trademark of EMC Corporation. Nexus Query Chameleon is a trademark
of Coffing Data Warehousing.
Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity concerning any loss
or damages arising from the information contained in this book or from the use of programs or program segments
that are included. The manual is not a publication of Microsoft Corporation, nor was it produced in conjunction
with Microsoft Corporation.
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any
means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the
publisher. No patent liability is assumed with respect to the use of information contained herein. Although we
took every precaution in preparing this book, the publisher and author assume no responsibility for errors or
omissions, neither is any liability assumed for damages resulting from the use of the information contained herein.
About Tom Coffing
Tom Coffing, better known as Tera-Tom, has been the CEO and founder of Coffing Data Warehousing for the past 20
years. Tom has written over 85 books on all aspects of Teradata, Netezza, Snowflake, Redshift, Yellowbrick, Vertica,
SQL Server, Azure Synapse, MySQL, Postgres, Greenplum, Oracle, Databricks and more. In addition, Tom has taught
over 1,000 Teradata classes in India, Africa, Europe, China, Malaysia, and North America.
Tom also owns and designs the Nexus Desktop and Nexus Server software. The Nexus Desktop software allows users
to query all database platforms, migrate and move data automatically between database platforms, and join data across
all database platforms. As a result, the Nexus product line is one of the most sophisticated enterprise tools in the industry.
In high school, Tom was the first athlete from his school to ever place at state in any sport, was selected by his
school to represent them at Buckeye Boys State, and Tom is proud of his induction into the first class of the Lakota High
School Hall of Fame.
At the University of Arizona and University of Nevada Las Vegas, Tom was a two-time All-American wrestler,
Sophomore Athlete of the year, and a two-time winner of the 1980 Olympic wrestling trials. Tom graduated with a
bachelor’s degree in Speech Communications.
After college, Tom became a state and national champion speech winner for Toastmasters and won two orchid awards as
an actor. Tom is the proud father of three beautiful children and seven grandchildren and has been married for the past
32 years. You can contact Tom at 513 300-0341 or [email protected].
About David Cook
For nearly a decade, David Cook has been one of the lead developers on the Nexus Query Chameleon software
at Coffing Data Warehousing. While in this position, David has designed and created several data analysis and
migration tools, including the Garden of Analysis, which queries answer sets without leaving the PC. He is also
the creator of the database to database move and compare module that allows users to move and compare
databases from different platforms.
David brings to Coffing Data Warehousing a strong background of experience with information technology and
management, including time spent managing logistics information for national building products manufacturers.
David's ability to communicate well, combined with his programming talent, has made him an excellent asset for
Coffing Data Warehousing.
David graduated cum laude from The Ohio State University, receiving a BA in Communication Technology.
David furthered his education at Miami University, where he held a senator position in the student government
while maintaining a 4.0 GPA in his study area in Computer Science and Programming.
Table of Contents
Chapter 1 – Introduction to SQL
Introduction
SELECT * (All Columns) in a Table
SELECT Specific Columns in a Table
Commas in the Front or Back?
Place your Commas in front for better Debugging Capabilities
Sort the Data with the ORDER BY Keyword
Use a Column name or Number in an ORDER BY Statement
Two Examples of ORDER BY using Different Techniques
Changing the ORDER BY to Descending Order
Null Values Sort First in Ascending Mode (Default)
Order By with Nulls Last
Order By with Nulls First
Major Sort vs. Minor Sort
Multiple Sort Keys using Names vs. Numbers
An Order By That Uses an Expression
Sorts are Alphabetical, NOT Logical
Using A Valued CASE Statement to Sort Logically
Using A Searched CASE Statement to Sort Logically
Quiz – Can you Add a Minor Sort?
Answer – Can you Add a Minor Sort?
Order By Decode
Quiz – Can you Add Two Minor Sorts Using Decode?
Answer – Can you Add Two Minor Sorts Using Decode?
How to ALIAS a Column name
Quiz – Which Rows from Both Tables Won’t Return?
Answer to Quiz – Which rows from both tables Won’t Return?
Left Outer Join
Left Outer Join Results
Right Outer Join
Right Outer Join Example and Results
Full Outer Join
Full Outer Join Results
Which Tables are Left Tables and Which are Right?
Answer - Which Tables are Left Tables and Which are Right?
INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional WHERE Clause
OUTER JOIN with Additional WHERE Clause
OUTER JOIN with Additional AND Clause
The DREADED Product Join
The DREADED Product Join Results
Cartesian Product Join with Traditional Syntax
Cartesian Product Join with ANSI Syntax
The CROSS JOIN
The CROSS JOIN Answer Set
The Self Join
The Self Join with ANSI Syntax
An Associative Table is a Bridge that Joins Two Tables
Quiz – Can you Write the 3-Table Join?
Answer to Quiz – Can you Write the 3-Table Join?
Quiz – Can you Write the 3-Table Join Using ANSI Syntax?
Answer – Can you Write the 3-Table Join to ANSI Syntax?
Quiz – Can you Place the ON Clauses at the End?
Answer – Can you Place the ON Clauses at the End?
The 5-Table Join – Logical Insurance Model
Quiz - Write a Five Table Join Using ANSI Syntax
Answer - Write a Five Table Join Using ANSI Syntax
Quiz - Write a Five Table Join Using Traditional Syntax
Answer - Write a Five Table Join Using Non-ANSI Syntax
Quiz – Re-Write this putting the ON clauses at the END
Answer – Re-Write this putting the ON clauses at the END
Chapter 6 – Date Functions
Migrate Any Database to Databricks and Vice Versa
Current_Date
Current_Date, Current_Timestamp, and Current_Timezone
Now() Function
Add or Subtract From a Date
Date Function
To_Date Function
To_Timestamp Function
Add or Subtract Days From a Date
Subtract Two Dates for a Difference in Days
Subtract Two Dates for a Difference in Days
MONTHS_BETWEEN
The ADD_MONTHS Command
Using the ADD_MONTHS Command to Add 1 Year
Using the ADD_MONTHS Command to Add 5 Years
The EXTRACT Command
The EXTRACT Command
EXTRACT from DATES and TIME
Day, Month, Year, DayofMonth, DayofWeek, and DayofYear
Using CASE and Extract to Reformat Dates
RANK
Dense_Rank
Getting RANK to Sort in DESC Order
RANK() OVER and PARTITION BY
RANK() OVER, PARTITION BY, and QUALIFY
RANK() OVER and a Derived Table
RANK() OVER and a WITH Derived Table
RANK vs. DENSE_RANK
DENSE_RANK() OVER and PARTITION BY
PERCENT_RANK() OVER with 14 rows in Calculation
PERCENT_RANK() OVER with 21 rows in Calculation
PERCENT_RANK() OVER and PARTITION BY
Cumulative Sum
Cumulative Sum with CAST
Cumulative Sum – The Sort Explained
Cumulative Sum – Rows Unbounded Preceding Explained
Cumulative Sum – Making Sense of the Data
Cumulative Sum – Major and Minor Sort Keys
Reset with a PARTITION BY Statement
Totals and Subtotals through Partition By
Moving Sum
Moving SUM every 3-rows Vs. a Continuous Average
Partition By Resets the Calculations
Moving Average
The Moving Window is Current Row and Preceding
How Moving Average Handles the Order By
Quiz – How is that Total Calculated?
Answer to Quiz – How is that Total Calculated?
Quiz – How is that 4th Row Calculated?
Chapter 1 Introduction to SQL
"A bird does not sing because it has the answers, it sings because it has a song."
- Anonymous
Introduction
student_table
student_id  last_name  first_name  class_code  grade_pt
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00
We are using the student_table above in many of our early SQL examples.
The picture above shows the student_table, which we will use to present some basic SQL examples and to get
some hands-on experience with queries. This book attempts to show you the table, show you the query, and
show you the result set.
SELECT * (All Columns) in a Table
SELECT *
FROM student_table ;
An asterisk (*) means you want to see ALL columns in the table on your report.
Almost every SQL statement consists of a SELECT and a FROM clause. You SELECT the columns you want
to see on your report, and an asterisk (*) means you want to see all columns in the table in the returning answer
set.
SELECT Specific Columns in a Table
SELECT first_name
      ,last_name
      ,class_code
      ,grade_pt
FROM student_table ;
Commas separate column names.
Commas must separate column names. Notice that only the columns requested come back on the report, not all
columns. Also, notice that the order of the columns in the SQL is the same order on the report.
Commas in the Front or Back?
Why is the example on the left better even though they are functionally equivalent? Errors are easier to spot, and
comments won't cause errors. Both examples work and return the same answer set.
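A minimal sketch of the two comma styles, using the student_table columns from earlier. The exact layout of the original side-by-side examples is an assumption; the point is what happens when you comment out a column while debugging.

```sql
-- Commas in front: comment out any column after the first and the query still runs
SELECT first_name
      ,last_name
      ,class_code
--    ,grade_pt
FROM student_table ;

-- Commas in back: commenting out the last column leaves a dangling comma
-- after class_code, which is typically rejected as a syntax error
SELECT first_name,
       last_name,
       class_code,
--     grade_pt
FROM student_table ;
```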
Sort the Data with the ORDER BY Keyword
Rows typically come back to the report in random order. To order the result set, you must use an ORDER BY
statement. When you order by a column, it will order in ASCENDING order. The first column listed in an
ORDER BY statement is called the Major Sort! You will see upcoming examples with multiple columns.
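As a minimal sketch against the student_table shown earlier (the choice of last_name as the sort column is just for illustration):

```sql
SELECT *
FROM student_table
ORDER BY last_name ;   -- last_name is the major sort; ascending is the default
```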
Use a Column name or Number in an ORDER BY Statement
SELECT *
FROM student_table
ORDER BY 2 ;
Sorts the answer set by the second column in the answer set, which is last_name.
The ORDER BY can use a number to represent the sort column. The number 2 represents the second column in
the returning answer set. The example above is also going to default to sort in ascending order.
Two Examples of ORDER BY using Different Techniques
You have the option of using a number instead of the column name. The number represents the column's position
in the SELECT list, not in the table. If you use an * in your SELECT statement, the number represents the
column's position in the table. The two queries above are equivalent.
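A pair of equivalent queries might look like the sketch below. The column list is chosen only for illustration; it shows that the number counts positions in the SELECT list rather than in the table.

```sql
-- Sort by column name
SELECT last_name
      ,class_code
      ,grade_pt
FROM student_table
ORDER BY grade_pt ;

-- Sort by position: grade_pt is 3rd in this SELECT list (it is 5th in the table)
SELECT last_name
      ,class_code
      ,grade_pt
FROM student_table
ORDER BY 3 ;
```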
Changing the ORDER BY to Descending Order
Notice that the answer set sorts in descending order based on the column last_name. Also, notice that last_name is
the second column coming back on the report. We could have done an Order By 2 DESC. If you spell out the
word DESCENDING, the query will fail, so you must remember to use the abbreviation of DESC.
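A minimal sketch of the descending sort described here, written both ways against the student_table:

```sql
-- Sort by name, highest value first
SELECT *
FROM student_table
ORDER BY last_name DESC ;

-- Same sort by position: last_name is the 2nd column in the table
SELECT *
FROM student_table
ORDER BY 2 DESC ;
```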
Null Values Sort First in Ascending Mode (Default)
SELECT *
FROM student_table
ORDER BY 5 ;

SELECT *
FROM student_table
ORDER BY grade_pt ;
The default for an ORDER BY statement is in ascending mode (ASC). Notice that this places the null values at the
beginning of the answer set.
Order By with Nulls Last
SELECT *
FROM student_table
ORDER BY 5 NULLS LAST ;

SELECT *
FROM student_table
ORDER BY grade_pt NULLS LAST ;
Null values by default sort first in ASC order, but you can use NULLS LAST to place the null values at the end.
Order By with Nulls First
SELECT *
FROM student_table
ORDER BY 5 DESC NULLS FIRST ;

SELECT *
FROM student_table
ORDER BY grade_pt DESC NULLS FIRST ;
Null values by default sort last in DESC mode, but you can use NULLS FIRST to place the null values at the
beginning.
Major Sort vs. Minor Sort
The first column or number in an ORDER BY statement is called the major sort, and it determines the primary
order of the answer set. When a second column, or number, is added to the ORDER BY statement, all ties are
sorted further by the minor sort. Notice above that the first and second rows tie because we have two ‘SR’ values in class_code.
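A minimal sketch of a major and a minor sort. The original query is not reproduced in the text, so the choice of class_code DESC as the major sort and grade_pt as the minor sort is an assumption that matches the description of the two 'SR' rows coming back first.

```sql
SELECT *
FROM student_table
ORDER BY class_code DESC   -- major sort: the two SR rows come back first
        ,grade_pt ;        -- minor sort: breaks the tie between the SR rows
```

With the student_table data, the SR row with a grade_pt of 3.00 would come back ahead of the SR row with 3.35.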
Multiple Sort Keys using Names vs. Numbers
SELECT *
FROM employee_table
ORDER BY dept_no DESC
        ,salary ASC
        ,last_name ASC ;

SELECT *
FROM employee_table
ORDER BY 2 DESC,
         5,
         3 ASC ;
Queries can have multiple columns in the ORDER BY statement. You can even mix and match the column names
with numbers. Both queries in the example above are equivalent.
An Order By That Uses an Expression
fullname
Chambers, Mandee
Coffing, Billy
Harrison, Herbert
Jones, Squiggy
Larkins, Loraine
Reilly, William
Smith, John
Smythe, Richard
Strickling, Cletus
The above examples are equivalent. We use the FULLNAME expression in the ORDER BY of the first example.
The second example uses the alias FULLNAME.
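The two queries might look something like the sketch below. The concatenation expression and the column names last_name and first_name in employee_table are assumptions inferred from the "Last, First" format of the answer set; the original SQL may differ.

```sql
-- First example: repeat the expression in the ORDER BY
SELECT last_name || ', ' || first_name AS fullname
FROM employee_table
ORDER BY last_name || ', ' || first_name ;

-- Second example: sort by the alias instead
SELECT last_name || ', ' || first_name AS fullname
FROM employee_table
ORDER BY fullname ;
```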
Sorts are Alphabetical, NOT Logical
The first year of high school is generally the freshman year. Change the query to ORDER BY class_code
logically, so the order is freshman, sophomore, junior, senior, and then null.
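To see the problem, here is a plain sort on class_code, sketched against the student_table shown earlier:

```sql
SELECT *
FROM student_table
ORDER BY class_code ;
-- class_code comes back as: null, FR, JR, SO, SR
-- (alphabetical, with the null first, rather than school-year order)
```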
Using A Valued CASE Statement to Sort Logically
We are using a valued CASE Statement to Order BY class_code logically (FR, SO, JR, SR, null).
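A minimal sketch of a valued CASE in the ORDER BY. The numbers 1 through 5 are only sort keys; any increasing values would work.

```sql
SELECT *
FROM student_table
ORDER BY CASE class_code      -- valued CASE: the column is named once
           WHEN 'FR' THEN 1
           WHEN 'SO' THEN 2
           WHEN 'JR' THEN 3
           WHEN 'SR' THEN 4
           ELSE 5             -- the null class_code sorts last
         END ;
```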
Using A Searched CASE Statement to Sort Logically
We are using a Searched CASE Statement to Order BY class_code logically (FR, SO, JR, SR, null).
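The same sort written as a searched CASE might look like this sketch, where each WHEN carries its own full condition:

```sql
SELECT *
FROM student_table
ORDER BY CASE                 -- searched CASE: no column after the CASE keyword
           WHEN class_code = 'FR' THEN 1
           WHEN class_code = 'SO' THEN 2
           WHEN class_code = 'JR' THEN 3
           WHEN class_code = 'SR' THEN 4
           ELSE 5             -- the null class_code sorts last
         END ;
```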
Quiz – Can you Add a Minor Sort?
Can you Add a Minor Sort of grade_pt ASC to the example above?
Answer – Can you Add a Minor Sort?
All you have to do is place a comma after the END keyword and then add the column. The ASC is not needed as it
is the default.
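Assuming the example in question is the CASE sort on class_code, the answer might be sketched as:

```sql
SELECT *
FROM student_table
ORDER BY CASE class_code
           WHEN 'FR' THEN 1
           WHEN 'SO' THEN 2
           WHEN 'JR' THEN 3
           WHEN 'SR' THEN 4
           ELSE 5
         END                  -- major sort ends at the END keyword
        ,grade_pt ;           -- minor sort, ascending by default
```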
Order By Decode
Quiz – Can you Add Two Minor Sorts Using Decode?
Your quiz assignment is to add two minor sorts to the above DECODE statement. Make the first minor sort
Grade_Pt DESC, and the second minor sort first_name ASC.
Answer – Can you Add Two Minor Sorts Using Decode?
Your quiz assignment is to add two minor sorts to the above DECODE statement. Make the first minor sort
Grade_Pt DESC, and the second minor sort first_name ASC.
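One way the answer could look is sketched below. The DECODE arguments are an assumption based on the logical class_code order used earlier in the chapter (FR, SO, JR, SR, then everything else).

```sql
SELECT *
FROM student_table
ORDER BY DECODE(class_code, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5)  -- major sort
        ,grade_pt DESC                                              -- first minor sort
        ,first_name ASC ;                                           -- second minor sort
```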
Make the first minor sort GRADE_PT DESC and the second minor sort FIRST_NAME ASC.
Column to CASE: WHEN 'FR' then 1, WHEN 'SO' then 2, WHEN 'JR' then 3, WHEN 'SR' then 4, Else 5.
How to ALIAS a Column name
When you ALIAS a column, you give it a new name for the report header.
When you ALIAS a column, you give it a new name for the report header, but you can also use the alias in the
ORDER BY clause. If you use double quotes in the alias, you need to use double quotes in the ORDER BY
clause.
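A minimal sketch of aliasing and then sorting by the alias. The alias names fname and gpa are made up for illustration.

```sql
SELECT first_name AS fname    -- fname becomes the report header for first_name
      ,grade_pt   AS gpa      -- gpa becomes the report header for grade_pt
FROM student_table
ORDER BY gpa ;                -- the alias can be used in the ORDER BY
```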
Commas must separate column names. Notice in this example, there is a comma missing between class_code and
grade_pt. The query works, but it thinks you want class_code to have an alias of grade_pt. That is why it is good
practice to use the keyword AS when you alias a column.
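The situation can be sketched as follows. The column list is an assumption; only the missing comma between class_code and grade_pt comes from the text.

```sql
-- Missing comma: the query runs, but returns three columns,
-- with the class_code values under the header grade_pt
SELECT first_name
      ,last_name
      ,class_code grade_pt
FROM student_table ;

-- Comma restored: four columns come back
SELECT first_name
      ,last_name
      ,class_code
      ,grade_pt
FROM student_table ;
```

Writing AS for every intended alias makes an accidental one like this easier to spot.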
Double dashes make a single line comment that will be ignored by the system. Notice that we also have double
dashes after the FROM statement. The system also ignores this comment.
Slash Asterisk starts a multi-line comment, and Asterisk Slash ends the comment.
You can make multi-line comments with double dashes on each line.
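The comment styles described on these pages can be sketched together in one query. The query itself is only an illustration against the student_table.

```sql
-- A single-line comment: everything after the double dashes is ignored
-- A second double-dash line makes this a multi-line comment
SELECT last_name, class_code
FROM student_table   -- a comment can also trail a line of SQL
/* Slash asterisk starts a multi-line comment,
   and asterisk slash ends it */
WHERE class_code = 'FR' ;
```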
Chapter 2 The WHERE Clause
“I saw the angel in the marble and carved until I set him free.”
- Michelangelo
student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
The WHERE Clause filters the rows coming back on the report. So, not all rows will return, just the rows that
qualify. In this example, I am asking for the report to bring back only rows WHERE the first name is Henry.
SELECT *
FROM student_table
WHERE first_name = "Henry" ;
Character data needs single or double-quotes
SELECT *
FROM student_table
WHERE grade_pt = 0.00 ;
Numbers never need single or double-quotes
Character data (letters) needs quotes, and single quotes are the standard choice. Integers and any other column with
a numeric data type need no quotes.
Not Equal
The opposite of equal is NOT equal, and here are three ways you can write the NOT equal syntax.
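The three examples from the slide are not reproduced in the text. The sketch below shows three forms that Databricks SQL accepts; they may not be the exact three the slide used.

```sql
SELECT * FROM student_table WHERE grade_pt <> 4.0 ;      -- ANSI not-equal operator
SELECT * FROM student_table WHERE grade_pt != 4.0 ;      -- alternative operator
SELECT * FROM student_table WHERE NOT grade_pt = 4.0 ;   -- NOT in front of an equality
```

All three skip the row with a null grade_pt, because a comparison with null is never true.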
The first thing you need to know about null is it is unknown data. Null is not a zero or spaces. It is missing data.
Since we don’t know what is in null, if you use it with an equal sign, no data will return.
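A minimal sketch of the mistake described above. The query runs, but the comparison is never true, so nothing comes back.

```sql
-- Returns no rows: class_code = null evaluates to unknown for every row,
-- including the row where class_code really is null
SELECT *
FROM student_table
WHERE class_code = null ;
```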
Is NULL
SELECT *
FROM student_table
WHERE class_code IS null ;
If you are looking for a row that holds a null value, you need to use ‘IS null.’ Using IS null will only bring back the
rows with a null value in the column.
IS Not Null
SELECT *
FROM student_table
WHERE class_code IS not null ;
If you are looking for a row that does not hold a null value, you need to use ‘IS not null.’ Using IS not null will
only bring back the rows where the column value does not contain a null.
SELECT *
FROM student_table
WHERE grade_pt >= 3.0 ;
Greater than or Equal to
The WHERE Clause doesn’t just deal with ‘Equals’ but with other comparisons too: greater than (>), less than (<),
greater than or equal to (>=), and less than or equal to (<=).
Notice the WHERE statement and the word AND. In this example, qualifying rows must have a class_code equal
to ‘FR’ and must also have a first_name of Henry.
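The query described above is not reproduced in the text. A sketch that matches the description:

```sql
SELECT *
FROM student_table
WHERE class_code = 'FR'
  AND first_name = 'Henry' ;   -- both conditions must be true for a row to qualify
```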
Troubleshooting AND
SELECT *
FROM student_table
WHERE grade_pt = 3.0 AND grade_pt = 4.0;
No rows qualify. How can a student have two grade points?
What is going wrong here? You are using an AND that checks the same column twice. This syntax asks for the rows
that have BOTH a grade_pt of 3.0 and a grade_pt of 4.0. That is impossible, so no rows return.
SELECT *
FROM student_table
WHERE grade_pt = 3.0
OR grade_pt = 4.0;
The above query brings back rows if the grade_pt is equal to 3.0 or 4.0.
Troubleshooting OR
SELECT *
FROM student_table
WHERE grade_pt = 3.0 OR 4.0;
This is an error
SELECT *
FROM student_table
WHERE grade_pt = 3.0
OR grade_pt = 4.0;
Perfect – OR must always use the column name again.
The first example is invalid. Depending on the system, it either raises an error or treats the bare 4.0 as true and
returns all rows from the table in the answer set. Neither is what you want. The second example is the way to do it.
SELECT *
FROM student_table
WHERE grade_pt = 3.0
OR class_code = 'JR' ;
The column name must be used on both sides of the OR because you can use the same column or different
columns. The system doesn't know what you want to do unless you tell it.
SELECT *
FROM student_table
WHERE grade_pt = 3.0
AND class_code = SR ;
Error!!! Why?
This query errors, but what is WRONG with this syntax? There are no single or double quotes around SR, so the
system reads SR as a column name instead of a character value.
Notice that AND separates two different columns, and the data will come back if both are TRUE.
Two rows return because of the “order of precedence” in SQL. The next page will explain.
1 ()
2 NOT
3 AND
4 OR
SELECT * FROM student_table
WHERE grade_pt = 4.0 OR grade_pt = 3.0
AND class_code = 'SR' ;
SQL has an ORDER OF PRECEDENCE. The system first evaluates anything with parentheses around it. Then it
processes the NOT operators, followed by the AND operators. Finally, it processes the OR operators. Look at the
order of precedence, and you will see why the last query came out odd: the AND binds first, so the query asks for a
grade_pt of 4.0, OR a grade_pt of 3.0 with a class_code of 'SR'. Let’s fix it and bring back only one row.
Using parentheses is the proper coding technique. Only ONE row comes back because parentheses evaluate first.
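The corrected query is not reproduced in the text. A sketch of the fix, using the same conditions as the earlier precedence example:

```sql
SELECT * FROM student_table
WHERE (grade_pt = 4.0 OR grade_pt = 3.0)   -- evaluated first because of the parentheses
  AND class_code = 'SR' ;
```

Against the student_table shown in this chapter, only Martin Phillips (SR, 3.00) qualifies. Wendy Thomas has a 4.00 but is a freshman.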
SELECT * FROM student_table
WHERE grade_pt IN (3.0, 4.0)
AND class_code = 'SR' ;
The IN List: (3.0, 4.0)
student_id last_name first_name class_code grade_pt
123250 Phillips Martin SR 3.00
Using an IN List also works to query for a grade_pt of 3.0 or 4.0 AND also have a class_code of ‘SR.’ Only ONE
row comes back here, as well.
Using an IN list is an excellent way to look for multiple values for a column.
The IN Statement avoids retyping the same column name separated by an OR. The IN allows you to search a
column for a list of values. Both queries above are equal, but the IN list is an excellent way to keep things
organized and straightforward.
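The two queries being compared are not reproduced in the text. A sketch of an equivalent pair, with grade_pt values chosen only for illustration:

```sql
-- The long way: repeat the column name for every value
SELECT * FROM student_table
WHERE grade_pt = 2.0 OR grade_pt = 3.0 OR grade_pt = 4.0 ;

-- The IN list: one column name, one list of values
SELECT * FROM student_table
WHERE grade_pt IN (2.0, 3.0, 4.0) ;
```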
SELECT * FROM student_table
WHERE TRIM(last_name) IN ('Larkins', 'Bond') ;
Trim removes leading and trailing spaces
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
322133 Bond Jimmy JR 3.95
The IN Statement avoids retyping the same column name separated by an OR. The IN allows you to search the
corresponding column for a list of values. An IN list works with character data as long as you use single quotes.
Notice that we have single quotes for 'Larkins' and 'Bond.'
SELECT *
FROM student_table
WHERE grade_pt NOT IN (2.0, 3.0, 4.0) ;
SELECT *
FROM student_table
WHERE NOT grade_pt IN (2.0, 3.0, 4.0) ;
You can also ask to see the results that are NOT IN your parameter list. That requires the column name and a NOT
IN. Neither the IN nor NOT IN can search for nulls!
SELECT *
FROM student_table
WHERE grade_pt NOT IN (2.0, 3.0, 4.0, null) ;
Few people know that when a NOT IN list contains a null value, no data returns. A NOT IN returns no rows because
a comparison with null is never true, so the system cannot confirm that a value is absent from the list. The NOT IN
issue with null values is also true with NOT IN subqueries. If the bottom query returns a null value, an IN has no
problems, but a NOT IN returns no data. The next page will teach you a trick to get around this problem.
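To see why the null spoils the whole list, it helps to write the NOT IN out as the chain of comparisons it stands for. This is a sketch of the logic, not a query from the original slides.

```sql
-- grade_pt NOT IN (2.0, 3.0, 4.0, null) behaves like this chain of ANDs
SELECT *
FROM student_table
WHERE grade_pt <> 2.0
  AND grade_pt <> 3.0
  AND grade_pt <> 4.0
  AND grade_pt <> null ;   -- never true, so no row can pass the AND chain
```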
SELECT *
FROM STUDENT_TABLE
WHERE GRADE_PT NOT IN (2.0, 3.0, 4.0, NULL) ;
Adding an OR with an IS NULL check is a great technique for also bringing back the rows with a null value when
using a NOT IN List.
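The OR technique described above might be written like the sketch below. The original slide's query is not recoverable from the text, so treat this as one reasonable reading of it.

```sql
SELECT *
FROM student_table
WHERE grade_pt NOT IN (2.0, 3.0, 4.0)   -- no null inside the list
   OR grade_pt IS null ;                -- bring back the null rows explicitly
```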
SELECT *
FROM student_table
WHERE grade_pt NOT IN (2.0, 3.0, 4.0)
AND grade_pt IS not null ;
You should always use an AND clause to exclude the rows with the null value when using a NOT IN List.
SELECT *
FROM student_table
WHERE grade_pt BETWEEN 2.0 AND 4.0 ;
The example above is using a BETWEEN statement. What this allows you to do is see if a column falls in a range.
It is inclusive, meaning that in our example, we will be getting the rows that also have a 2.0 and 4.0 as the grade_pt
value.
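Because BETWEEN is inclusive, it can be read as a pair of comparisons joined by AND. A sketch of the equivalent form:

```sql
-- Same rows as WHERE grade_pt BETWEEN 2.0 AND 4.0
SELECT *
FROM student_table
WHERE grade_pt >= 2.0
  AND grade_pt <= 4.0 ;
```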
SELECT *
FROM student_table
WHERE grade_pt NOT BETWEEN 2.0 AND 4.0 ;
The example above is using a NOT BETWEEN statement. It allows you to see if a column falls outside of a range.
Because BETWEEN is inclusive, rows with a grade_pt of exactly 2.0 or 4.0 are inside the range, so NOT
BETWEEN does not return them.
SELECT *
FROM student_table
WHERE last_name BETWEEN 'L' AND 'Lzzz' ;
Proper case values work
The BETWEEN statement works with character data. You need to include single-quotes. If the case is not perfect,
no rows will return.
The percent sign (%) is a wildcard for any number of characters. We are looking for anyone whose last name starts
with Sm. Databricks insists you match the case. The first two examples return nothing because the case is wrong.
The final example returns rows because the name 'Smith' starts with a capital 'S,' and it is followed by a
lowercase 'm.'
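The three examples are not reproduced in the text. The sketch below assumes patterns along these lines; the exact patterns on the slide may differ.

```sql
SELECT * FROM student_table WHERE last_name LIKE 'sm%' ;   -- no match: lowercase s
SELECT * FROM student_table WHERE last_name LIKE 'SM%' ;   -- no match: uppercase M
SELECT * FROM student_table WHERE last_name LIKE 'Sm%' ;   -- matches Smith
```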
SELECT UPPER(first_name)
,LOWER(last_name)
FROM student_table
ORDER BY first_name
Up low
ANDY smith
DANNY delaney
HENRY hanson
JIMMY bond
MARTIN phillips
MICHAEL larkins
RICHARD mcroberts
STANLEY johnson
SUSIE wilson
WENDY thomas
When you use the UPPER command, the column value will be in all uppercase, and when you use the LOWER
command, the column value will be in all lowercase. Thus, the UPPER and LOWER functions are excellent for
WHERE clause comparisons or results with all UPPER or LOWER characters.
When you use the UPPER command, the column value will be in all uppercase for the comparison. When you use
the LOWER command, the column value will be in all lowercase for the comparison. Notice that in both
examples, 'Smith' was returned because it met the criteria, but it did not come back in the answer set as upper or
lowercase.
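The two examples are not reproduced in the text. A sketch of UPPER and LOWER used only for the comparison:

```sql
-- The function changes the value for the comparison only;
-- last_name still comes back as 'Smith' in the answer set
SELECT * FROM student_table WHERE UPPER(last_name) = 'SMITH' ;
SELECT * FROM student_table WHERE LOWER(last_name) = 'smith' ;
```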
You can use the ILIKE command instead of the LIKE command to get around the case issues.
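A one-line sketch of ILIKE, using a lowercase pattern chosen for illustration:

```sql
-- ILIKE ignores case, so 'sm%' finds Smith
SELECT * FROM student_table WHERE last_name ILIKE 'sm%' ;
```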
SELECT * FROM student_table
WHERE last_name LIKE '_a%' ;
Show me anyone with an 'a' as the 2nd letter in their last_name
The underscore wildcard represents one character. Our search finds anyone that has an ‘a’ as the second letter in
the last name.
SELECT * FROM student_table WHERE first_name LIKE '%y' ;
student_id last_name first_name class_code grade_pt
125634 Hanson Henry FR 2.88
322133 Bond Jimmy JR 3.95
324652 Delaney Danny SR 3.35
333450 Smith Andy SO 2.00
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
Above, our example finds anyone who has a first name that ends in a lowercase 'y.' The data type of the first name
is varchar(12). The search works on the last name as well, which has a data type of CHAR(20).
Pretend student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
280023 McRoberts Richard JR 1.90
260000 Johnson Stanley ? ?
231222 Wilson Susie SO 3.80
234121 Thomas Wendy FR 4.00
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
322133 Bond Jimmy JR 3.95
333450 Smith Andy SO 2.00
999999 T_ S% FR 1.90
Here you will have to utilize a Wildcard Escape Character. Turn the page for more.
SELECT * FROM student_table WHERE first_name LIKE 'S@%' Escape '@';
student_id last_name first_name class_code grade_pt
999999 T_ S% FR 1.90
We can pick our Escape character, and we have chosen the @ sign. The character immediately following the @
sign is treated as a literal instead of a wildcard, so we find ‘S%’ without bringing back Stanley or Susie.
The REPLACE function replaces one value with another in a string. Above, we have replaced the spaces in a
Customer Name with underscores. In the Phone Number, we have replaced the dashes (-) with spaces.
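The original example is not reproduced in the text. The sketch below uses literal values as stand-ins; the phone number is made up for illustration.

```sql
SELECT REPLACE('Acme Products', ' ', '_') AS customer_name   -- Acme_Products
      ,REPLACE('555-123-4567', '-', ' ')  AS phone_number ;  -- 555 123 4567
```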
Chapter 3 Distinct, Group By and Top
"A bird does not sing because it has the answers, it sings because it has a song."
- Anonymous
student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT Distinct class_code
FROM student_table
ORDER BY 1;
class_code
?
FR
JR
SO
SR
Distinct won't repeat duplicate values
The DISTINCT keyword in the example above means to eliminate duplicate values.
class_code
?
FR
JR
SO
SR
Both examples produce the exact same result
The Distinct and GROUP BY command examples above return the same answer set.
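The two queries are not reproduced in the text. A sketch of a DISTINCT and GROUP BY pair that returns the answer set shown:

```sql
SELECT DISTINCT class_code
FROM student_table
ORDER BY 1 ;

SELECT class_code
FROM student_table
GROUP BY class_code
ORDER BY 1 ;
```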
How many rows will come back from the above SQL?
class_code grade_pt
? ?
FR 0.00
FR 2.88
FR 4.00
JR 1.90
JR 3.95
SO 2.00
SO 3.80
SR 3.00
SR 3.35
No Rows have the exact same values for both the class_code and grade_pt. Each row is Distinct!
How many rows will come back from the above SQL? 10. All rows came back. Why? Because there are no exact
duplicates that contain a duplicate class_code and Duplicate grade_pt combined. Each row in the SELECT list is
distinct.
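The query behind this answer set is not reproduced in the text. Judging by the two columns in the result, it was probably close to this sketch:

```sql
-- DISTINCT applies to the combination of all columns in the SELECT list
SELECT DISTINCT class_code, grade_pt
FROM student_table
ORDER BY 1, 2 ;
```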
Top Command
STUDENT_TABLE
STUDENT_ID LAST_NAME FIRST_NAME CLASS_CODE GRADE_PT
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
SELECT TOP 3
last_name
,class_code
,grade_pt
FROM student_table
last_name class_code grade_pt
Wilson SO 3.80
Bond JR 3.95
Smith SO 2.00
In the above example, we brought back only three rows because of the TOP 3 statement, which means to get an
answer set and then bring back the first three rows in that answer set. Because this example does not have an
ORDER BY statement, you can consider this example as merely getting three arbitrary rows.
SELECT TOP 3 last_name,
class_code,
grade_pt
FROM student_table
ORDER BY grade_pt DESC
last_name class_code grade_pt
Thomas FR 4.00
Bond JR 3.95
Wilson SO 3.80
We are now returning the students with the top three grade point averages because we use the ORDER BY
statement. Databricks orders the data first and then uses the TOP command.
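Databricks SQL also documents a LIMIT clause for restricting the number of rows. If TOP is not accepted in your environment, the sketch below is an equivalent way to ask for the same three rows.

```sql
SELECT last_name, class_code, grade_pt
FROM student_table
ORDER BY grade_pt DESC   -- sort first
LIMIT 3 ;                -- then keep the first three rows
```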
Chapter 4 Aggregation
Chapter 4 – Aggregation
"Databricks climbed Aggregate Mountain and delivered a better way to Sum It."
- Tera-Tom Coffing
student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
What would the result set be from the above query? The next slide shows answers!
Pretend Aggregation_Table
employee_no salary
423400 100000.00
423401 100000.00
423402 null
SELECT
AVG(salary) as "AVG"
,Count(salary) as SalCnt
,Count(*) as RowCnt
FROM Aggregation_Table ;
3) You CAN’T mix Aggregates with normal columns without a GROUP BY.
Look at the pretend table and the query above and calculate the answer in your mind. Remember that
aggregates ignore null values.
Remember, aggregates ignore null values. Aggregates usually deliver answer sets that are only one row. You can
only have a non-aggregate with aggregates if you use a GROUP BY statement. The answers are an AVG of
100000.00 (the null is ignored, so 200000.00 divided by 2), a SalCnt of 2, and a RowCnt of 3.
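To try the pretend table yourself without creating it, the rows can be supplied inline. This is a sketch; the inline VALUES list and the avg_salary alias are assumptions added for illustration.

```sql
SELECT AVG(salary)   AS avg_salary   -- 100000: the null salary is ignored
      ,COUNT(salary) AS SalCnt       -- 2: counts only non-null salaries
      ,COUNT(*)      AS RowCnt       -- 3: counts every row
FROM VALUES (423400, 100000.00)
           ,(423401, 100000.00)
           ,(423402, null) AS Aggregation_Table (employee_no, salary) ;
```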
employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00
How many rows will the above query produce in the result set?
How many rows will the above query produce in the result set? The answer is one.
SELECT
CAST(SUM (salary) as DECIMAL(8,1)) as Sum
,CAST(AVG(salary) as DECIMAL(8,2)) as Avg
,Count(*) as Count
,CAST(MIN(salary) as DECIMAL(9,3)) as Min
,CAST(MAX(salary) as DECIMAL(10,4)) as Max
FROM employee_table ;
The CAST (Convert and Store) command is in the example above. CAST changes the data type for the life of the
query.
Troubleshooting Aggregates
If you have a regular column (nonaggregate) in your query, you must have a corresponding GROUP BY statement.
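The failing query is not reproduced in the text. A sketch of the problem and its fix against the employee_table:

```sql
-- Fails: dept_no is a regular column with no GROUP BY
-- SELECT dept_no, AVG(salary) FROM employee_table ;

-- Works: the regular column is named in the GROUP BY
SELECT dept_no, AVG(salary) AS avg_salary
FROM employee_table
GROUP BY dept_no ;
```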
The GROUP BY dept_no clause allows for the calculation of aggregates per dept_no. The data has also been sorted
with the ORDER BY statement. Notice we used NULLS LAST to put the null dept_no last in our sorting.
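The query is not reproduced in the text. A sketch with aggregates per dept_no and the null department sorted last; the aggregates chosen here are assumptions.

```sql
SELECT dept_no
      ,SUM(salary) AS sum_salary
      ,COUNT(*)    AS emp_count
FROM employee_table
GROUP BY dept_no
ORDER BY dept_no NULLS LAST ;   -- the null dept_no appears at the bottom
```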
Both queries above produce the same result. The GROUP BY allows you to either name the column or use the
number in the SELECT list, just like the ORDER BY statement. The only two commands that can use the column
number are the ORDER BY and GROUP BY statements.
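The two queries are not reproduced in the text. A sketch of the same grouping written by name and by position:

```sql
-- Naming the column
SELECT dept_no, AVG(salary) AS avg_salary
FROM employee_table
GROUP BY dept_no
ORDER BY dept_no ;

-- Using the position in the SELECT list (dept_no is column 1)
SELECT dept_no, AVG(salary) AS avg_salary
FROM employee_table
GROUP BY 1
ORDER BY 1 ;
```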
Will dept_no 300 be part of the calculation? Of course, you know it will NOT!
SELECT dept_no, MIN (salary) min, MAX (salary) max,
SUM (salary) sum ,AVG (salary) avg , COUNT(*) count
FROM employee_table
WHERE dept_no IN (200, 400)
GROUP BY dept_no
ORDER BY 1 ;
Alias: the names after each aggregate (min, max, sum, avg, count) are aliases.
WHERE Clause acts as a filter before any Calculations are done
The system eliminates every dept_no value other than 200 and 400 before aggregating. Only rows with a dept_no
of 200 or 400 take part in the calculation.
The HAVING Clause only works on Aggregate Totals. The WHERE filters rows to be excluded from the
calculation, but the HAVING filters the Aggregate totals after the calculations, thus eliminating individual
Aggregate totals.
The HAVING Clause only works on Aggregate Totals after they are totaled. It is a final check after aggregation is
complete. Now only the totals with COUNT(*) > 2 can return.
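The query is not reproduced in the text. A sketch of a HAVING check on the employee_table; the other aggregates are assumptions.

```sql
SELECT dept_no, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employee_table
GROUP BY dept_no
HAVING COUNT(*) > 2 ;   -- checked after the totals are calculated
```

With the employee_table shown in this chapter, only dept_no 400 has more than two employees.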
ANY_VALUE
Any_Value returns some value of the expression from the group. The result is non-deterministic. ANY_VALUE
simplifies and optimizes the performance of GROUP BY statements. The problem with aggregation is all non-
aggregate columns must be included in a GROUP BY statement. Any_Value eliminates the need to GROUP BY
all columns.
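A minimal sketch of ANY_VALUE against the employee_table. The alias sample_last_name is only an illustration.

```sql
SELECT dept_no
      ,ANY_VALUE(last_name) AS sample_last_name   -- some last_name from each group
      ,SUM(salary)          AS sum_salary
FROM employee_table
GROUP BY dept_no ;   -- last_name does not need to be in the GROUP BY
```

Which last_name comes back for a department with several employees is not guaranteed, so use it only when any representative value will do.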
SELECT product_id
,EXTRACT(MONTH FROM sale_date) AS mth
,EXTRACT(YEAR FROM sale_date) AS yr
,SUM(daily_sales) AS sum_daily_sales
FROM sales_table
GROUP BY GROUPING SETS (product_id, mth, yr)
ORDER BY product_id, mth, yr;
Be prepared to be amazed. There are three advanced options listed above for grouping data. Each is more powerful
than the one before. The following pages will give great examples. GROUP BY GROUPING Sets above will show
you the DAILY_SALES for each PRODUCT_ID, each month, and year. It is like three separate reports in one.
GROUP BY ROLLUP
SELECT product_id
,EXTRACT(MONTH FROM sale_date) AS mth
,EXTRACT(YEAR FROM sale_date) AS yr
,SUM(daily_sales) AS sum_daily_sales
FROM sales_table
GROUP BY ROLLUP (product_id, mth, yr)
ORDER BY product_id, mth, yr;
Check out the answer set and explanation on the next page.
GROUP BY ROLLUP displays the DAILY_SALES totals at each level of the hierarchy: per PRODUCT_ID, month,
and year; per PRODUCT_ID and month; per PRODUCT_ID; plus a grand total. Above, we have the answer set.
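One way to picture ROLLUP is as shorthand for a list of grouping sets that drops one column at a time from the right. The sketch below spells out the sets that ROLLUP (product_id, mth, yr) stands for.

```sql
SELECT product_id
      ,EXTRACT(MONTH FROM sale_date) AS mth
      ,EXTRACT(YEAR FROM sale_date)  AS yr
      ,SUM(daily_sales)              AS sum_daily_sales
FROM sales_table
GROUP BY GROUPING SETS ((product_id, mth, yr)   -- most detailed level
                       ,(product_id, mth)       -- subtotal per product and month
                       ,(product_id)            -- subtotal per product
                       ,())                     -- grand total
ORDER BY product_id, mth, yr ;
```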
GROUP BY CUBE
SELECT product_id
,EXTRACT(MONTH FROM sale_date) AS mth
,EXTRACT(YEAR FROM sale_date) AS yr
,SUM(daily_sales) AS sum_daily_sales
FROM sales_table
GROUP BY CUBE (product_id, mth, yr)
ORDER BY product_id, mth, yr;
GROUP BY CUBE displays the DAILY_SALES totals for every combination of PRODUCT_ID, month, and year
(each column alone, each pair, and all three together), plus a grand total.
Above, we have the answer set. GROUP BY CUBE displays the DAILY_SALES totals for every combination of
PRODUCT_ID, month, and year, plus a grand total.
Chapter 5 Joining Tables
"The man who doesn't read Tera-Tom books has no advantage over the man
who can't read them."
- Mark Twain
Why write SQL when Nexus automatically does it for you? Plus, you can edit the SQL if you desire. Watch the
YouTube video of the Nexus Super Join Builder with this link: https://youtu.be/ARwo9prucw0.
customer_table
customer_number customer_name
11111111 Billy’s Best Choice
31313131 Acme Products
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U
order_table
order_number customer_number order_total
123456 11111111 12347.53
123512 11111111 8005.91
123552 31323134 5111.47
123585 87323456 15231.62
123777 57896883 23454.84
A Join combines columns on the report from more than one table. The example above joins the customer_table
and the order_table together. The most complicated part of any join is the JOIN CONDITION.
SELECT Cust.customer_number
,customer_name
,order_number
,order_total
FROM customer_table as Cust,
order_table as Ord
WHERE Cust.customer_number = Ord.customer_number
The column customer_number is in both tables. It must be fully qualified, or the query will error.
We alias the table names to shorten the typing when fully qualifying a column.
The JOIN CONDITION states which column from each table must match. In this case, customer_number is the match that
establishes the relationship between the customer_table and the order_table.
The customer_number is a column in both the Customer and Order Tables. Cust.customer_number fully qualifies
the column as the one in the customer_table. That is why we alias the table names: so we can fully qualify any column
that is in both tables with minimal typing.
SELECT Cust.customer_number,
customer_name,
order_number,
order_total
FROM customer_table as Cust
INNER JOIN
order_table as Ord
ON Cust.customer_number = Ord.customer_number ;
The INNER JOIN keyword replaces the comma.
The ON keyword is used instead of WHERE.
The example above is the same join as the previous example, except it uses ANSI syntax. Both traditional and
ANSI syntax return the same rows with the same performance. Rows join when the customer_number matches in
both tables; non-matching rows won’t return.
Both syntax techniques bring back the same result set and have the same performance. The INNER JOIN is
considered ANSI. Which of the two syntaxes can perform an Outer Join?
employee_table
employee_no  dept_no  last_name   first_name  salary
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

department_table
dept_no  department_name
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources
Finish this join by placing the missing SQL in the proper place!
The dept_no column is the Primary Key of the department_table and a Foreign Key in the employee_table.
SELECT first_name
,last_name
,dept_no
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
Can you find the error?
SELECT first_name
,last_name
,E.dept_no
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
The column dept_no is in both tables. It needs to be fully qualified as E.dept_no or D.dept_no.
If a column in the SELECT list is in both tables, you must fully qualify it.
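An alternative worth knowing, offered here as a sketch rather than as the technique this chapter teaches, is the USING clause. It assumes the join column has the same name in both tables and that the platform accepts USING on an inner join; the shared column then appears once in the result and is written without a table qualifier.

```sql
-- USING names the shared join column once instead of writing
-- ON E.dept_no = D.dept_no, so dept_no is not ambiguous in the SELECT list.
SELECT first_name
      ,last_name
      ,dept_no
      ,department_name
FROM   employee_table AS E
INNER JOIN
       department_table AS D
USING  (dept_no);
```

The ON form remains the more general choice because it also works when the two columns have different names.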
SELECT first_name
,last_name
,E.dept_no
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON employee_table.dept_no = D.dept_no ;
Can you find the error?
Once you alias a table, you must fully qualify columns with the table alias, not the original table name. The query
above uses employee_table.dept_no instead of E.dept_no, so the system cannot resolve the column, and the query will error.
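For comparison, here is the same query with the join condition qualified by the alias. This is only a corrected sketch of the quiz query; nothing else about it changes.

```sql
-- The ON clause now uses the alias E, matching the FROM clause.
SELECT first_name
      ,last_name
      ,E.dept_no
      ,department_name
FROM   employee_table AS E
INNER JOIN
       department_table AS D
ON     E.dept_no = D.dept_no;
```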
SELECT E.first_name
,E.last_name
,D.department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
This inner join will return all rows that have a matching dept_no in both tables. Which rows won't return?
An Inner Join returns matching rows, but did you know an Outer Join returns both matching rows and non-
matching rows? You will understand soon! What rows above are not part of the answer set?
SELECT E.first_name
,E.last_name
,D.department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
1 Squiggy Jones has a null dept_no
2 Richard Smythe has an invalid dept_no 10
3 No employees work in Department 500
The bottom line is that the three rows excluded do not have a matching dept_no.
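The excluded rows can also be listed directly. The two queries below are a sketch that assumes the sample employee_table and department_table shown in this chapter; they use NOT EXISTS so that the employee with a null dept_no is still reported.

```sql
-- Employees whose dept_no has no match in department_table.
-- With the sample data: Squiggy Jones (null) and Richard Smythe (10).
SELECT E.first_name
      ,E.last_name
      ,E.dept_no
FROM   employee_table AS E
WHERE  NOT EXISTS (SELECT 1
                   FROM   department_table AS D
                   WHERE  D.dept_no = E.dept_no);

-- Departments that no employee points to.
-- With the sample data: 500 Human Resources.
SELECT D.dept_no
      ,D.department_name
FROM   department_table AS D
WHERE  NOT EXISTS (SELECT 1
                   FROM   employee_table AS E
                   WHERE  E.dept_no = D.dept_no);
```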
SELECT E.first_name
,D.department_name
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
The 1st table after FROM is always the LEFT table.
Since we are doing a Left Outer Join, the employee_table is referred to as the outer table.
The SQL above is a LEFT OUTER JOIN. That means that all rows from the LEFT table will appear in the report
whether or not they find a match in the right table.
SELECT E.first_name
,D.department_name
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
Answer Set
first_name  department_name
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Squiggy     ?
Richard     ?
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev
The matching rows return just like an inner join, but orphaned rows from the Left table also return. Nulls show mismatches.
A LEFT Outer Join returns all rows from the LEFT table, including all matches. If a LEFT row can’t find a
match, nulls are placed in the columns that come from the right table!
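If the nulls should read as something friendlier on the report, COALESCE can substitute a label. This is a small sketch on the same two tables; the text 'No Department' is just an example value.

```sql
-- Replace the null department_name of unmatched employees with a label.
SELECT E.first_name
      ,COALESCE(D.department_name, 'No Department') AS department_name
FROM   employee_table AS E
LEFT OUTER JOIN
       department_table AS D
ON     E.dept_no = D.dept_no;
```

With the sample data, Squiggy and Richard would show 'No Department' instead of a null.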
A RIGHT OUTER JOIN means that all rows from the RIGHT table will appear in the
report whether or not they find a match in the LEFT table.
SELECT E.first_name
,D.department_name
FROM employee_table as E
RIGHT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
Answer Set
first_name  department_name
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev
?           Human Resources
The matching rows return just like an inner join, but orphaned rows from the Right table also return. Nulls show mismatches.
All rows from the Right Table return: the rows with matches, plus dept_no 500, which is in the right table but
didn’t have a match. The system puts a null value in the Left Table columns for that row.
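A RIGHT OUTER JOIN can always be rewritten as a LEFT OUTER JOIN by swapping the order of the tables. The sketch below should return the same rows as the RIGHT OUTER JOIN on the same two sample tables.

```sql
-- department_table is now listed first, so it is the LEFT (outer) table.
SELECT E.first_name
      ,D.department_name
FROM   department_table AS D
LEFT OUTER JOIN
       employee_table AS E
ON     E.dept_no = D.dept_no;
```

Human Resources still returns with a null first_name because it is now in the outer table of a Left Outer Join.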
SELECT E.first_name
,D.department_name
FROM employee_table as E
FULL OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
Since we are doing a Full Outer Join, both tables are referred to as the outer table.
The SQL above is a Full Outer Join. That means that all rows from both the RIGHT and LEFT tables will appear in
the report whether or not they find a match.
SELECT E.first_name
,D.department_name
FROM employee_table as E
FULL OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no ;
Answer Set
first_name  department_name
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Squiggy     ?
Richard     ?
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev
?           Human Resources
All rows return from both tables on a Full Outer Join.
The FULL Outer Join Returns all rows from both Tables. The nulls show the flaws!
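To make the flaws easier to spot, each row can be labeled by which side failed to match. This sketch assumes that employee_no and department_table.dept_no are never null in their own tables, so a null there can only come from the outer join; the column name match_status and its labels are illustrative.

```sql
-- Label every row of the Full Outer Join by where the match failed.
SELECT E.first_name
      ,D.department_name
      ,CASE
         WHEN E.employee_no IS NULL THEN 'Department has no employees'
         WHEN D.dept_no     IS NULL THEN 'Employee has no valid department'
         ELSE 'Match'
       END AS match_status
FROM   employee_table AS E
FULL OUTER JOIN
       department_table AS D
ON     E.dept_no = D.dept_no;
```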
Your mission is to show which tables are left tables and which ones are right tables.
Answer - Which Tables are Left Tables and Which are Right?
The first table is always the left table, and each remaining table is a right table. The intermediate result
of each join becomes the left table for the next join. In this case, all rows will return from the claims table.
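The same rule can be seen on a smaller scale. The sketch below uses the student_table, student_course_table, and course_table that appear later in this chapter instead of the insurance tables, since their columns are known; the comments mark which side each table plays.

```sql
SELECT S.last_name
      ,SC.course_id
      ,C.course_name
FROM   student_table AS S                -- first table: the LEFT table
LEFT OUTER JOIN
       student_course_table AS SC        -- right table of the first join
ON     S.student_id = SC.student_id
LEFT OUTER JOIN
       course_table AS C                 -- right table of the second join;
ON     SC.course_id = C.course_id;       -- its left side is the result of S joined to SC
```

Every student returns, including any student with no courses and any enrollment whose course_id is not in the course_table.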
SELECT first_name
,last_name
,department_name
FROM employee_table as E,
department_table as D
WHERE E.dept_no = D.dept_no
AND department_name like 'Marke%' ;
The additional AND can be applied first to eliminate unwanted data, so the join is less intensive than joining everything
first and then removing rows that don't qualify.
SELECT first_name
,last_name
,department_name
FROM employee_table as E
INNER JOIN
department_table as D
ON E.dept_no = D.dept_no
WHERE department_name like 'Marke%' ;
The additional WHERE can be applied first to eliminate unwanted data, so the join is less intensive than joining
everything first and then eliminating rows.
SELECT first_name,
department_name
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no
WHERE E.dept_no = 100 ;
Answer Set
first_name  department_name
Mandee      Marketing
Only Mandee Chambers is in dept_no 100.
On Outer Joins, the additional WHERE is logically applied last. All rows join first, and then the WHERE clause
filters the joined result.
OUTER Join with additional AND Clause
SELECT first_name
,department_name AS dname
FROM employee_table as E
LEFT OUTER JOIN
department_table as D
ON E.dept_no = D.dept_no
AND E.dept_no = 100 ;
Answer Set
first_name  dname
Mandee      Marketing
Herbert     ?
William     ?
Loraine     ?
Squiggy     ?
Richard     ?
Cletus      ?
Billy       ?
John        ?
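On Outer Joins, the additional AND is evaluated together with the ON condition. Every employee still returns because
the employee_table is the outer table, but only Mandee, the one employee in dept_no 100, finds a matching department!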
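The difference matters most when the extra condition is on the right table. The two queries below are a sketch on the same sample tables: the only change is whether the Marketing test sits in the WHERE clause or in the ON clause.

```sql
-- Filter in WHERE: rows with a null department_name fail the test,
-- so with the sample data only Mandee returns.
SELECT E.first_name
      ,D.department_name
FROM   employee_table AS E
LEFT OUTER JOIN
       department_table AS D
ON     E.dept_no = D.dept_no
WHERE  D.department_name = 'Marketing';

-- Filter in ON: all nine employees return, and only Mandee
-- shows Marketing; everyone else shows a null department_name.
SELECT E.first_name
      ,D.department_name
FROM   employee_table AS E
LEFT OUTER JOIN
       department_table AS D
ON     E.dept_no = D.dept_no
AND    D.department_name = 'Marketing';
```

A WHERE condition on a right-table column effectively turns the Left Outer Join back into an inner join.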
SELECT first_name
,last_name
,department_name
FROM employee_table as E,
department_table as D
WHERE department_name ilike '%m%'
Order by 1, 2, 3;
No Join Condition Linking the Two Tables!
A Product Join is often a mistake! Three Department rows (Marketing, Customer Support, and Human Resources) have an ‘m’
in their department_name when the match ignores case, which is why ILIKE is used. These join to every employee, and the
information is worthless.
SELECT first_name
,last_name
,department_name
FROM employee_table as E,
department_table as D
WHERE department_name ilike '%m%'
Order by 1, 2, 3;
No Join Condition Linking the Two Tables!
A Product Join is often a mistake! Three Department rows had the letter ‘m’ in their department_name. These join
to every employee and the information is worthless.
The SQL above joins every row from one table to every row of another table. Nine rows multiplied by five rows =
45 rows of complete nonsense!
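The arithmetic is easy to check with a count. This sketch assumes the nine-row employee_table and five-row department_table shown in this chapter; the column names product_join_rows and inner_join_rows are illustrative.

```sql
-- No join condition: every employee pairs with every department.
-- With the sample data: 9 x 5 = 45.
SELECT COUNT(*) AS product_join_rows
FROM   employee_table AS E
CROSS JOIN
       department_table AS D;

-- With a join condition, only matching pairs are counted.
-- With the sample data: 7.
SELECT COUNT(*) AS inner_join_rows
FROM   employee_table AS E
INNER JOIN
       department_table AS D
ON     E.dept_no = D.dept_no;
```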
SELECT customer_name,
order_number
FROM customer_table
CROSS JOIN
order_table
WHERE order_number = 123456
ORDER BY 1 ;
A Cross Join is the ANSI equivalent to a Product Join.
Only a WHERE clause will work. An ON clause will NOT!
This query becomes a Product Join because a Cross Join is an ANSI Product Join. It will compare every row from
the customer_table to order_number 123456 in the order_table. Check out the Answer Set on the next page.
SELECT customer_name,
order_number
FROM customer_table
CROSS JOIN
order_table
WHERE order_number = 123456
ORDER BY 1 ;
Answer Set
customer_name        order_number
ACE Consulting       123456
Acme Products        123456
Billy's Best Choice  123456
Databases N-U        123456
XYZ Plumbing         123456
This Cross Join produces information that quite often isn’t worth anything!
employee_table2
employee_no dept_no last_name first_name salary Mgr
1232578 100 Chambers Mandee 48850.00 Y
1256349 400 Harrison Herbert 54500.00 N
2341218 400 Reilly William 36000.00 Y
1121334 400 Strickling Cletus 54500.00 N
2312225 300 Larkins Loraine 40200.00 Y
2000000 ? Jones Squiggy 32800.50 N
1000234 10 Smythe Richard 32800.00 N
1324657 200 Coffing Billy 41888.88 N
1333454 200 Smith John 48000.00 Y
SELECT Mgrs.dept_no
, Mgrs.last_name as mgrname
, Mgrs.salary as mgrsal
, emps.last_name as empname
, emps.salary as empsal
FROM employee_table2 as emps,
employee_table2 as Mgrs
WHERE emps.dept_no = Mgrs.dept_no
AND Mgrs.mgr = 'Y' AND emps.salary > Mgrs.salary ;
Which workers make a bigger salary than their manager?
A Self Join gives the same table two different aliases, so the query treats it as if it were two separate tables.
SELECT Mgrs.dept_no
, Mgrs.last_name as mgrname
, Mgrs.salary as mgrsal
, emps.last_name as empname
, emps.salary as empsal
FROM employee_table2 as emps
INNER JOIN employee_table2 as Mgrs
ON emps.dept_no = Mgrs.dept_no
WHERE Mgrs.mgr = 'Y' AND emps.salary > Mgrs.salary
Answer Set
dept_no  mgrname  mgrsal    empname     empsal
400      Reilly   36000.00  Harrison    54500.00
400      Reilly   36000.00  Strickling  54500.00
Only these two employees are making more than their manager.
A Self Join gives itself two different aliases for its table name. The join performs as if there were two separate
tables. The query asks, “which workers make a bigger salary than their manager?”
course_table
course_id  course_name               credits  seats
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

student_course_table (Associative Table)
student_id  course_id
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

student_table
student_id  last_name  first_name  class_code  grade_pt
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00
SELECT ALL Columns from the course_table and student_table and Join them.
student_table (student_id, last_name, first_name, class_code, grade_pt)
student_course_table (student_id, course_id)
course_table (course_id, course_name, credits, seats)
SELECT S.*, C.*
FROM student_table as S,
course_table as C,
student_course_table as SC
Where S.student_id = SC.student_id
AND C.course_id = SC.course_id ;
Notice the * technique of getting ALL columns from both tables!
The Associative Table is a bridge between the course_table and student_table, and its sole purpose is to join these
two tables together.
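When only a few columns are needed, the same bridge can be used with a named column list instead of the * technique. The sketch below keeps the traditional syntax of the example above; the choice of columns and the ORDER BY are illustrative.

```sql
-- List each student with the courses they take, going through the
-- associative table to connect the two.
SELECT S.last_name
      ,S.first_name
      ,C.course_name
      ,C.credits
FROM   student_table AS S,
       course_table AS C,
       student_course_table AS SC
WHERE  S.student_id = SC.student_id
AND    C.course_id  = SC.course_id
ORDER BY S.last_name, C.course_name;
```

No column from student_course_table appears in the SELECT list; the table is there only to supply the two join conditions.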
Quiz – Can you Write the 3-Table Join Using ANSI Syntax?
SELECT S.*, C.*
FROM student_table as S,
course_table as C,
student_course_table as SC
Where S.student_id = SC.student_id
AND C.course_id = SC.course_id ;
Traditional Syntax
SELECT S.*, C.*
FROM student_table as S,
course_table as C,
student_course_table as SC
Where S.student_id = SC.student_id
AND C.course_id = SC.course_id ;
ANSI Syntax
Select S.*, C.*
FROM student_table as S
INNER JOIN
student_course_table as SC
ON S.student_id = SC.student_id
INNER JOIN
course_table as C
ON C.course_id = SC.course_id;
Here are two examples of performing the join using traditional vs. ANSI syntax.
ANSI Syntax
Select S.*, C.*
From student_table as S
INNER JOIN
student_course_table as SC
ON S.student_id = SC.student_id
INNER JOIN
course_table as C
ON C.course_id = SC.course_id;
Please re-write the above query and place both ON Clauses at the end.
The example above is complicated. Most people have never seen this technique. The only way it works is to place
the ON clauses in reverse order: the first ON clause belongs to the last INNER JOIN, the next ON clause belongs to
the join before it, and so on.
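The query this caption refers to is not reproduced here, so the following is only a sketch of the idea. It adds parentheses to make the nesting explicit (an assumption, since some SQL engines may not accept the form without them), and it uses a few stand-in rows taken from the tables shown earlier so that it runs on its own.

```sql
-- Stand-in rows; the real tables hold more rows and columns.
WITH student_table AS (
  SELECT * FROM VALUES (260000, 'Johnson', 'Stanley'), (333450, 'Smith', 'Andy')
    AS t(student_id, last_name, first_name)
), course_table AS (
  SELECT * FROM VALUES (400, 'Database Administration', 4, 16)
    AS t(course_id, course_name, credits, seats)
), student_course_table AS (
  SELECT * FROM VALUES (260000, 400), (333450, 400)
    AS t(student_id, course_id)
)
SELECT S.*, C.*
FROM student_table AS S
INNER JOIN
    (student_course_table AS SC
     INNER JOIN
     course_table AS C
        ON C.course_id = SC.course_id)   -- first ON: the last INNER JOIN
   ON S.student_id = SC.student_id;      -- second ON: the first INNER JOIN
```

Reading from the bottom up, each ON clause pairs with the nearest unmatched INNER JOIN above it.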
services
service_code claim_service
providers
provider_code provider_no
Above is the logical model for the insurance tables showing the Primary Key and Foreign Key relationships
(PK/FK).
Your mission is to write a five-table join selecting all columns using ANSI syntax.
Above is the example of writing this five-table join using ANSI syntax.
Your mission is to write a five-table join selecting all columns using traditional join syntax.
Above is the example of writing this five-table join using traditional join syntax.
Above is an example of writing this five-table join using ANSI syntax. Can you place the ON clauses at the end
of the SQL?
Above is an example of writing this five-table join using ANSI syntax with the ON clauses at the end. Also, to
make this happen, we moved the tables around. Notice that the first ON clause represents the last two tables
joining, and then it works backward.
Chapter 6 Date Functions
- Chinese Proverb
Can you imagine a query tool with over 200 ETL utilities built inside that allows anyone to migrate every database
to Databricks and migrate Databricks to every database? Watch the Nexus migrate entire databases between
systems with the click of the mouse in this YouTube video: https://www.youtube.com/watch?v=0JQn134tzio
Current_Date
YYYY-MM-DD
Year Month Day
SELECT
current_date as ansi_date
,curdate() as date_function
,current_timestamp as ansi_timestamp
,current_timezone() as my_timezone;
The query above uses the Databricks keywords and functions that return the current date, timestamp, and session
time zone. You can write current_date and current_timestamp without parentheses, while curdate() and
current_timezone() are function calls.
Now() Function
This Databricks query retrieves two timestamps and gives each one an alias.
current_timestamp as timestamp1: This part of the query returns the current date and time as of the start of the
query and assigns it the alias "timestamp1."
now() as now1: This part of the query returns the same value, but it calls the now() function instead of
current_timestamp, and assigns it the alias "now1."
Both timestamps appear in the query result under their aliases, which makes them easy to reference in subsequent
calculations or for display purposes.
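The query this page describes is not reproduced here. A minimal version, rebuilt from the aliases named in the description, might look like this:

```sql
SELECT
   current_timestamp AS timestamp1   -- ANSI keyword form
  ,now()             AS now1;        -- function form, same value within one query
```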
Add or subtract a number from a date, and you are adding or subtracting days.
Date Function
SELECT
CURRENT_TIMESTAMP
,CAST (CURRENT_TIMESTAMP as date) as date_ts
,DATE(CURRENT_TIMESTAMP ) as date_fn
,DATE ('2023-12-31') as eoy
To_Date Function
SELECT
CURRENT_TIMESTAMP
,CAST (CURRENT_TIMESTAMP as date) as date_ts
,TO_DATE(CURRENT_TIMESTAMP ) as date_fn
,TO_DATE ('2023-12-31') as eoy
,TO_DATE(CURRENT_TIMESTAMP, 'yyyy-MM-dd') as formatted;
To_Timestamp Function
SELECT
CURRENT_TIMESTAMP
,CURRENT_DATE as today
,TO_TIMESTAMP(CURRENT_DATE ) as timestamp_fn
,TO_TIMESTAMP(CURRENT_DATE, 'yyyy-MM-dd') as formatted;
The to_timestamp function converts its first argument to a TIMESTAMP. If the format (fmt) is supplied, it must
conform with Datetime patterns, but if the format (fmt) is not supplied, the function is a synonym for cast(expr AS
TIMESTAMP).
SELECT order_date
,order_date + 60 as `Due Date`
,order_total
,order_date + 50 as discount
,order_total *.98 as disc_price
FROM order_table
ORDER BY 1 ;
When you add a number to a date or subtract a number from a date, you are adding or subtracting that many days.
An alias that contains a space is wrapped in backticks.
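As a small self-contained illustration of day arithmetic, the sketch below uses a literal date (an arbitrary choice, not data from order_table) and also shows date_add, which does the same thing as the plus sign.

```sql
SELECT
   DATE'2024-01-15'               AS start_date
  ,DATE'2024-01-15' + 60          AS plus_60_days    -- 2024-03-15
  ,DATE'2024-01-15' - 15          AS minus_15_days   -- 2023-12-31
  ,date_add(DATE'2024-01-15', 60) AS date_add_60;    -- 2024-03-15
```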
SELECT
order_date
,current_date as today
,current_date - order_date as day_diff
FROM order_table
ORDER BY order_date
When you subtract one date from another, you get the number of days between those dates.
SELECT
order_date
,current_date as today
,current_date - order_date as day_diff
,now() - order_date as day_and_time_diff
FROM order_table
ORDER BY 1
When you subtract one date from another, you get the number of days between those dates. When you subtract a
date from a timestamp such as now(), the difference also includes the time of day.
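The sketch below contrasts the minus sign with the two-argument datediff function, using literal dates chosen only for illustration. The minus sign returns an interval, while datediff(endDate, startDate) returns a plain integer.

```sql
SELECT
   DATE'2024-02-14' - DATE'2024-01-15'               AS day_diff           -- an interval of 30 days
  ,datediff(DATE'2024-02-14', DATE'2024-01-15')      AS day_diff_int       -- 30
  ,TIMESTAMP'2024-02-14 12:00:00' - DATE'2024-01-15' AS day_and_time_diff; -- days plus a time portion
```

The integer form is usually easier to use in further arithmetic or comparisons.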
MONTHS_BETWEEN
SELECT
order_date
,current_date
,MONTHS_BETWEEN(current_date, order_date) as month_diff
FROM order_table
ORDER BY 1
The MONTHS_BETWEEN function returns the number of months between two dates. The inputs may have
DATE or TIMESTAMP data types. If one of the inputs is null, the result is null.
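A short sketch with literal dates (chosen for illustration) shows how the sign and the whole-month cases behave. The result is a whole number when both dates fall on the same day of the month or both are the last day of their month; otherwise it is fractional.

```sql
SELECT
   months_between(DATE'2024-03-15', DATE'2024-01-15') AS same_day      -- 2.0
  ,months_between(DATE'2024-01-15', DATE'2024-03-15') AS reversed      -- -2.0
  ,months_between(DATE'2024-03-31', DATE'2024-02-29') AS both_last_day -- 1.0
  ,months_between(DATE'2024-03-20', DATE'2024-01-15') AS fractional;   -- a little over 2
```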
Order_Table
Order_Number Customer_Number Order_Date Order_Total
123456 11111111 05/04/1998 12347.53
123512 11111111 01/01/1999 8005.91
123552 31323134 10/01/1999 5111.47
123585 87323456 10/10/1999 15231.62
123777 57896883 09/09/1999 23454.84
SELECT Order_Date
,Add_Months (Order_Date,2) as due_date
,Order_Total
FROM Order_Table ORDER BY 1 ;
order_date due_date order_total
2020-05-04 2020-07-04 12347.53
2021-01-01 2021-03-01 8005.91
2021-09-09 2021-11-09 23454.84
2021-10-01 2021-12-01 5111.47
2021-10-10 2021-12-10 15231.62
The example above uses the Add_Months function, which adds one or more months to a date column. Can you
convert this to one year? There is no ADD_YEAR command!
Order_Table
Order_Number Customer_Number Order_Date Order_Total
123456 11111111 05/04/1998 12347.53
123512 11111111 01/01/1999 8005.91
123552 31323134 10/01/1999 5111.47
123585 87323456 10/10/1999 15231.62
123777 57896883 09/09/1999 23454.84
SELECT order_date
,Add_Months (order_date,12) as Due_Date12
,order_total
FROM order_table ORDER BY 1 ;
order_date Due_Date12 order_total
1998-05-04 1999-05-04 12347.53
1999-01-01 2000-01-01 8005.91
1999-09-09 2000-09-09 23454.84
1999-10-01 2000-10-01 5111.47
1999-10-10 2000-10-10 15231.62
Add one year (12 months)
The Add_Months command adds months to any date. Above, we used a great technique that would give us one
year. Can you give me five years?
Order_Table
Order_Number Customer_Number Order_Date Order_Total
123456 11111111 05/04/1998 12347.53
123512 11111111 01/01/1999 8005.91
123552 31323134 10/01/1999 5111.47
123585 87323456 10/10/1999 15231.62
123777 57896883 09/09/1999 23454.84
SELECT order_date
,Add_Months (order_date,12 * 5) as Due_5_Years
,order_total
FROM order_table
ORDER BY 1 ;
Add five years. 60 works, as well.
order_date Due_5_Years order_total
2020-05-04 2025-05-04 12347.53
2021-01-01 2026-01-01 8005.91
2021-09-09 2026-09-09 23454.84
2021-10-01 2026-10-01 5111.47
2021-10-10 2026-10-10 15231.62
Above, you see a great technique for adding multiple years to the date.
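For comparison, here is a sketch that applies the same idea to a single literal date taken from the answer set above. The third column uses a year interval, which is an alternative spelling rather than something this page shows.

```sql
SELECT
   add_months(DATE'2020-05-04', 12 * 5) AS due_5_years    -- 2025-05-04
  ,add_months(DATE'2020-05-04', 60)     AS due_60_months  -- 2025-05-04
  ,DATE'2020-05-04' + INTERVAL '5' YEAR AS due_interval;  -- 2025-05-04
```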
Order_Table
Order_Number Customer_Number Order_Date Order_Total
123456 11111111 05/04/1998 12347.53
123512 11111111 01/01/1999 8005.91
123552 31323134 10/01/1999 5111.47
123585 87323456 10/10/1999 15231.62
123777 57896883 09/09/1999 23454.84
order_date order_total
2021-09-09 23454.84
SELECT current_date
,EXTRACT(Year from current_date) as yr
,EXTRACT(Month from current_date) as mo
,EXTRACT(Day from current_date) as da
Answer Set
EXPR_1 yr mo da
2023-07-11 2023 7 11
The above examples show how robust the EXTRACT command is at extracting portions of a date, time, or
timestamp.
SELECT
current_date as now
,DAY(current_date) as day
,DAYOFMONTH(current_date) as day2
,DAYOFWEEK(current_date) as dow -- 1 = Sunday
,DAYOFYEAR(current_date) as doy
,MONTH(current_date) as month
,YEAR(current_date) as year
The DAY, DAYOFMONTH, MONTH and YEAR commands are implicit extract statements, but the
DAYOFWEEK and DAYOFYEAR perform a calculation. The DAYOFWEEK returns an integer where one
represents Sunday and two is for Monday.
Use the EXTRACT function (combined with CASE, CAST, and CONCATENATION) to retrieve the month and year
and reformat them in the date format of mm/yyyy. The double pipe symbols perform the concatenation.
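The query for this page is not reproduced here. One way to combine the pieces the caption names is sketched below; the two dates come from the answer sets in this chapter, and the CASE expression is only there to add the leading zero for single-digit months.

```sql
SELECT order_date
  ,CASE WHEN EXTRACT(MONTH FROM order_date) < 10
        THEN '0' || CAST(EXTRACT(MONTH FROM order_date) AS STRING)
        ELSE CAST(EXTRACT(MONTH FROM order_date) AS STRING)
   END
   || '/' ||
   CAST(EXTRACT(YEAR FROM order_date) AS STRING) AS mmyyyy   -- 05/2020 and 10/2021
FROM VALUES (DATE'2020-05-04'), (DATE'2021-10-10') AS t(order_date)
ORDER BY 1;
```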
SELECT order_date,
SUBSTRING (Cast(order_date as CHAR(10)) FROM 6 for 2)
|| '/' ||
SUBSTRING (CAST(order_date as CHAR(10)) FROM 1 for 4) AS mmyyyy
FROM Order_Table
ORDER BY 1, 2
order_date mmyyyy
2020-05-04 05/2020
2021-01-01 01/2021
2021-09-09 09/2021
2021-10-01 10/2021
2021-10-10 10/2021
Use CAST, SUBSTRING, and CONCATENATION to retrieve the month and year and reformat them in the date
format of mm/yyyy. The concatenation is performed by the double pipe symbols.
SELECT current_timestamp
,date_part('day', current_timestamp) as day
,date_part('minute', current_timestamp) as minute
,date_part('second', current_timestamp) as second
,date_part('quarter', current_timestamp) as quarter
,date_part('hour', interval '4 hours 1 minutes') as interval ;
The field names documented for Databricks include year, yearofweek, quarter, month, week, day, dayofweek
(dow), dayofweek_iso, doy, hour, minute, and second. Names that come from other databases, such as century,
decade, epoch, isodow, isoyear, millennium, or timezone_hour, may not be accepted.
The above examples use the date_part function to get the subfields. The text parameter needs to be a string value,
not a name, so don't forget your single quotes.
Date_Format Function
SELECT
current_date as today
,date_format(current_date, 'y') as yr -- year
,date_format(current_date, 'M') as mo -- month
,date_format(current_date, 'd') as day -- day of the month
,date_format(current_date, 'D') as day_of_yr -- day of the year
,date_format(current_date, 'E') as day_of_wk -- day of the week
,date_format(current_date, 'a') as am_pm -- am or pm
,date_format(current_date, 'q') as qtr -- quarter of the year
,date_format(current_date, 'G') as era -- era BC or AD
The date_format function converts a DATE, a TIMESTAMP, or a STRING in a valid datetime format into a string that
follows the pattern you supply.
SELECT
current_date as today -- default format
,date_format(current_date, 'd-MM-yyyy') as dmy -- dmy format
,date_format(current_date, 'MMM') as mo_3 -- month abbrev
,date_format(current_date, 'MMMM') as mo_full -- month spelled
,date_format(current_date, 'MMMM d') as mo_day -- month and day
,date_format(current_date, 'E MMM d, yyyy') doy_mo_d_y -- full formatting
The date_format function converts a DATE, a TIMESTAMP, or a STRING in a valid datetime format into a string that
follows the pattern you supply.
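Because current_date changes every day, the sketch below applies a few of the same patterns to a fixed literal date (the date used in other answer sets in this chapter) so the results are predictable. The output comments assume an English locale.

```sql
SELECT
   date_format(DATE'2023-07-11', 'd-MM-yyyy')     AS dmy         -- 11-07-2023
  ,date_format(DATE'2023-07-11', 'MMMM d')        AS mo_day      -- July 11
  ,date_format(DATE'2023-07-11', 'E MMM d, yyyy') AS dow_mo_d_y  -- Tue Jul 11, 2023
  ,date_format(DATE'2023-07-11', 'MM/yyyy')       AS mmyyyy;     -- 07/2023
```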
Datediff Example
SELECT
DATEDIFF (month, '2014-01-01', '2023-01-01') as mo_bet
,DATEDIFF (year, '2014-01-01', current_date) as years
,DATEDIFF (quarter, '2014-01-01', current_date) as quarters
,DATEDIFF (hour, '2014-01-01', current_date) as hours
,DATEDIFF (minute, '2014-01-01', current_date) as minutes
,DATEDIFF (second, '2014-01-01', current_date) as seconds
unit
{ MICROSECOND | MILLISECOND | SECOND | MINUTE | HOUR | DAY | WEEK | MONTH | QUARTER | YEAR }
This function takes a unit (day, week, month, etc.) and two target expressions, and it returns the difference between
the two expressions measured in that unit. The expressions must be date or timestamp expressions. If the second
date is later than the first date, the result is positive. If the second date is earlier than the first date, the result is
negative.
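To make the sign rule concrete, this sketch uses only literal dates so the results do not depend on the day the query runs.

```sql
SELECT
   datediff(MONTH, DATE'2014-01-01', DATE'2023-01-01') AS mo_bet       -- 108
  ,datediff(YEAR,  DATE'2014-01-01', DATE'2023-01-01') AS years        -- 9
  ,datediff(MONTH, DATE'2023-01-01', DATE'2014-01-01') AS mo_reversed; -- -108
```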
Dateadd
The Dateadd command adds a specified interval of time to a date or timestamp value. We cast the result to a date
so as not to get a timestamp in the result.
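The query for this page is not reproduced here. A sketch of the cast it describes, using a literal date picked for illustration, could look like this:

```sql
SELECT
   dateadd(DAY, 10, DATE'2023-07-12')               AS as_timestamp  -- 2023-07-22 00:00:00
  ,CAST(dateadd(DAY, 10, DATE'2023-07-12') AS DATE) AS as_date;      -- 2023-07-22
```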
SELECT CURRENT_TIMESTAMP
,DATEADD(HOUR,2,CURRENT_TIMESTAMP()) twohours;
current_timestamp() twohours
2023-07-12 4:53:58.743000 2023-07-12 6:53:58.743000
SELECT CURRENT_TIMESTAMP
,DATEADD(MINUTE,2,CURRENT_TIMESTAMP()) twomin;
current_timestamp() twomin
2023-07-12 4:55:30.669000 2023-07-12 4:57:30.669000
SELECT CURRENT_TIMESTAMP
,DATEADD(SECOND,2,CURRENT_TIMESTAMP()) twosec;
current_timestamp() twosec
2023-07-12 4:56:23.067000 2023-07-12 4:56:25.067000
The Dateadd command adds a specified time interval to a date or timestamp value.
Date_Sub Function
date_sub(startDate, numDays)
SELECT
current_date
,date_sub(current_date, 1) as yesterday
,date_sub(current_date, 30) as month_ago
,date_sub(current_date, -30) as month_forward
The date_sub function returns the date numDays before the startDate. If numDays is negative, abs(numDays) days
are added to startDate. If the result date overflows the date range, the function raises an error. In our example
above, notice that month_forward uses a negative number for numDays, so the result date is 30 days forward.
The above examples use the Date_Trunc function to truncate a value to a chosen precision. The unit parameter
needs to be a string value, not a name, so don't forget your single quotes. The unit selects the precision to which
the input value is truncated. In Databricks, the return value is of type timestamp.
The date_trunc command will set the hour to the top of the hour. It will set the minute to the top of the minute, and
it will set the seconds to the top of the seconds.
The date_trunc command will set the date to the 1st day of the year when using the interval ‘Year.’ Date_trunc
will set to the first day of the month for the interval ‘Month.’ It will set the time to midnight for the interval ‘Day.’
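The date_trunc queries for these pages are not reproduced here. The sketch below truncates one literal timestamp (borrowed from a later answer set in this chapter) to each precision the captions mention.

```sql
SELECT
   date_trunc('YEAR',   TIMESTAMP'2023-07-12 08:43:37.301') AS to_year    -- 2023-01-01 00:00:00
  ,date_trunc('MONTH',  TIMESTAMP'2023-07-12 08:43:37.301') AS to_month   -- 2023-07-01 00:00:00
  ,date_trunc('DAY',    TIMESTAMP'2023-07-12 08:43:37.301') AS to_day     -- 2023-07-12 00:00:00
  ,date_trunc('HOUR',   TIMESTAMP'2023-07-12 08:43:37.301') AS to_hour    -- 2023-07-12 08:00:00
  ,date_trunc('MINUTE', TIMESTAMP'2023-07-12 08:43:37.301') AS to_minute  -- 2023-07-12 08:43:00
  ,date_trunc('SECOND', TIMESTAMP'2023-07-12 08:43:37.301') AS to_second; -- 2023-07-12 08:43:37
```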
Last_Day
today last_day
2023-07-11 2023-07-31
The Last_Day command returns the last day of the month, based on a given date or timestamp.
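The answer set above can be reproduced with a query along these lines. A literal stands in for current_date so the result is fixed, and the February column is an extra illustration.

```sql
SELECT
   DATE'2023-07-11'           AS today
  ,last_day(DATE'2023-07-11') AS last_day       -- 2023-07-31
  ,last_day(DATE'2024-02-10') AS last_day_feb;  -- 2024-02-29 (leap year)
```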
SELECT
Current_Date as today
,(Current_Date - EXTRACT(DAY FROM Current_Date)) + 1 as first_day_of_mth
,cast((Current_Date + INTERVAL '1' MONTH) - EXTRACT
(DAY FROM ADD_MONTHS(Current_Date,1)) as date) as last_day_of_mth
,(Current_Date - EXTRACT(DAY FROM Current_Date)) as last_day_prev_mth
The example above displays the dates of the current_date, the first day of the month, the last day of the month, and
the last day of the previous month.
SELECT
Current_Date as today
,(Current_Date - EXTRACT(DAY FROM Current_Date)) + 1 as first_day
, last_day(current_date) as last_day
,(Current_Date - EXTRACT(DAY FROM Current_Date)) as last_day_prev_mth
, last_day(current_date) +1 as first_day_next_mth
The example above displays the dates of the current_date, the first day of the month, the last day of the month, the
last day of the previous month, and the first day of the next month.
Make_Date
The Make_Date command will make a DATE value from three integer values representing the year, month, and
day.
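The Make_Date query is not reproduced here. A minimal sketch, with year, month, and day values chosen for illustration:

```sql
SELECT make_date(2023, 12, 22) AS made_date;   -- 2023-12-22
```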
Make_Timestamp
long_time_ago this_year
0001-02-03 4:05:06.000000 2023-12-22 10:09:00.000000
The Make_Timestamp command makes a complete TIMESTAMP value from a series of datetime values. The
year, month, day, hour, and minute values are integers. The second value is a double-precision number.
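The query behind the answer set above is not reproduced here. The arguments below are inferred from the two values shown, so treat this as a likely reconstruction rather than the original query.

```sql
SELECT
   make_timestamp(1, 2, 3, 4, 5, 6.0)       AS long_time_ago  -- 0001-02-03 04:05:06
  ,make_timestamp(2023, 12, 22, 10, 9, 0.0) AS this_year;     -- 2023-12-22 10:09:00
```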
The examples above show you how to use day, month, and year intervals.
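The interval examples for this page are not reproduced here. A sketch of day, month, and year intervals applied to a literal date (chosen for illustration):

```sql
SELECT
   DATE'2023-07-12' + INTERVAL '1' DAY   AS plus_one_day      -- 2023-07-13
  ,DATE'2023-07-12' + INTERVAL '2' MONTH AS plus_two_months   -- 2023-09-12
  ,DATE'2023-07-12' - INTERVAL '3' YEAR  AS minus_three_years; -- 2020-07-12
```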
our_date leap_year
2012-01-29 2012-02-29
our_date no_leap_year
2011-01-29 2011-02-28
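The queries behind these two answer sets are not reproduced here. Adding one month with add_months gives the same values, so the sketch below assumes that is what was done; when the target month is shorter, the day is pulled back to the month's last day.

```sql
SELECT
   add_months(DATE'2012-01-29', 1) AS leap_year      -- 2012-02-29
  ,add_months(DATE'2011-01-29', 1) AS no_leap_year;  -- 2011-02-28
```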
SELECT Current_Date,
CASE
WHEN MOD(CAST(Extract(Year from Current_Date) as integer), 400) = 0
THEN 'Leap Year'
WHEN MOD(CAST(EXTRACT(Year from Current_Date) as integer), 4) = 0
AND NOT MOD(CAST(EXTRACT(Year from Current_Date) as integer), 100) = 0
THEN 'Leap Year'
else 'Not Leap Year'
END
EXPR_1 EXPR_2
2023-07-12 Not Leap Year
You can use the query above as an example to see if a date falls in a leap year.
SELECT Current_Timestamp,
CASE
WHEN MOD(CAST(Extract(Year from Current_timestamp) as integer), 400) = 0
THEN 'Leap Year'
WHEN MOD(CAST(EXTRACT(Year from Current_timestamp) as integer), 4) = 0
AND NOT MOD(CAST(EXTRACT(Year from Current_timestamp) as integer), 100) = 0
THEN 'Leap Year'
ELSE 'Not Leap Year'
END
EXPR_1 EXPR_2
2023-07-12 8:43:37.301000 Not Leap Year
You can use the query above as an example to see if the current_timestamp falls in a leap year.
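To see every branch of the CASE expression fire, the same test can be run against a handful of sample years. The years below are chosen for illustration: 1900 is divisible by 100 but not by 400, and 2000 is divisible by 400.

```sql
SELECT yr,
  CASE
    WHEN MOD(yr, 400) = 0 THEN 'Leap Year'
    WHEN MOD(yr, 4) = 0 AND NOT MOD(yr, 100) = 0 THEN 'Leap Year'
    ELSE 'Not Leap Year'
  END AS leap_check
FROM VALUES (1900), (2000), (2023), (2024) AS t(yr)
ORDER BY yr;
-- 1900 Not Leap Year, 2000 Leap Year, 2023 Not Leap Year, 2024 Leap Year
```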
Make_Interval
make_interval
1000 years 1 months 8 days 1 hours 1 minutes 1 seconds
The Make_Interval command allows you to make an interval value from a series of values that represent the
years, months, weeks, days, hours, minutes, and seconds. (You can specify all or some of these values.) The years,
months, weeks, days, hours, and minutes values are integers and default to 0. The seconds value is a
double-precision number that defaults to 0.0.
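The query behind the answer set above is not reproduced here. The arguments below are inferred from it: one week plus one day accounts for the eight days shown.

```sql
-- Arguments: years, months, weeks, days, hours, mins, secs
SELECT make_interval(1000, 1, 1, 1, 1, 1, 1) AS make_interval;
```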
Try_Divide Function
The try_divide function returns dividend divided by divisor, or NULL if divisor is 0. If both dividend and divisor
are DECIMAL, the result is DECIMAL. If dividend is a year-month interval, the result is an INTERVAL YEAR
TO MONTH. If dividend is a day-time interval, the result is an INTERVAL DAY TO SECOND. In all other cases,
the result is a DOUBLE.
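The try_divide query for this page is not reproduced here. A sketch covering the numeric, zero-divisor, and interval cases the caption describes, with values chosen for illustration:

```sql
SELECT
   try_divide(10, 2)                   AS normal_case     -- 5.0
  ,try_divide(10, 0)                   AS divide_by_zero  -- NULL
  ,try_divide(INTERVAL '12' MONTH, 4)  AS interval_case;  -- an interval of 3 months
```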
Chapter 7 Analytic and Window Functions
- Carl Sagan
Nexus collects each answer set in memory and then allows the user to use analytic templates to create the same
analytic reports you can get from Databricks, but with Nexus the analytics are free. Watch the YouTube video:
https://www.youtube.com/watch?v=dMsAghFbYXk&t=16s.
ROW_NUMBER
The ROW_NUMBER() function makes seq_number increase sequentially. Notice that it does NOT need a
ROWS UNBOUNDED PRECEDING clause, and it still works!
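The ROW_NUMBER query is not reproduced here. The sketch below shows the shape it likely had; the column names follow the sales_table used in this chapter, and the sale dates are hypothetical stand-ins.

```sql
SELECT product_id, sale_date, daily_sales,
       ROW_NUMBER() OVER (ORDER BY sale_date) AS seq_number   -- 1, 2, 3
FROM VALUES
  (1000, DATE'2000-09-28', 48850.40),
  (1000, DATE'2000-09-29', 54500.22),
  (1000, DATE'2000-09-30', 36000.07)
  AS sales_table(product_id, sale_date, daily_sales)
ORDER BY seq_number;
```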
QUALIFY
Qualify is a keyword extension that acts as a filter but only filters after the calculations finish. Qualify is to
ordered analytics and window functions what HAVING is to aggregates. So, qualify only works on analytics, and
HAVING only works on aggregates, but both are filters after processing the calculations.
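As an illustration, the sketch below uses QUALIFY to keep the top two students per class_code. The rows are copied from the student_table shown earlier (the student with no class_code is left out), and the query assumes a Databricks runtime that supports QUALIFY.

```sql
SELECT class_code, first_name, last_name, grade_pt
FROM VALUES
  ('FR', 'Michael', 'Larkins',   0.00),
  ('SO', 'Susie',   'Wilson',    3.80),
  ('JR', 'Richard', 'McRoberts', 1.90),
  ('JR', 'Jimmy',   'Bond',      3.95),
  ('FR', 'Henry',   'Hanson',    2.88),
  ('SO', 'Andy',    'Smith',     2.00),
  ('SR', 'Danny',   'Delaney',   3.35),
  ('FR', 'Wendy',   'Thomas',    4.00),
  ('SR', 'Martin',  'Phillips',  3.00)
  AS student_table(class_code, first_name, last_name, grade_pt)
QUALIFY ROW_NUMBER() OVER (PARTITION BY class_code
                           ORDER BY grade_pt DESC) < 3
ORDER BY class_code, grade_pt DESC;
```

The filter runs after the window function is computed, so no derived table or WITH clause is needed.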
WITH TeraTom AS
(
SELECT class_code, first_name, last_name , grade_pt,
ROW_NUMBER() OVER (PARTITION BY class_code
ORDER BY class_code, grade_pt DESC ) AS toptwo
FROM student_table
WHERE class_code IS not null
) SELECT * FROM TeraTom WHERE toptwo < 3
class_code first_name last_name grade_pt toptwo
FR Wendy Thomas 4.00 1
FR Henry Hanson 2.88 2
JR Jimmy Bond 3.95 1
JR Richard McRoberts 1.90 2
SO Susie Wilson 3.80 1
SO Andy Smith 2.00 2
SR Danny Delaney 3.35 1
SR Martin Phillips 3.00 2
The example above finds the two students with the highest grade_pt in each class_code. The WITH clause
defines the derived table TeraTom, which is needed so that the outer WHERE clause can filter on toptwo.
RANK
The example above uses the rank command. We are ranking on daily_sales ASC. The ORDER BY clause
identifies the column we are ranking. Notice that the first two rows tie and the next row gets a three.
Dense_Rank
The example above uses the dense_rank command. We are ranking on daily_sales ASC. The difference between a
Rank and a Dense_Rank command is how they handle ties. Notice the first two rows tie with a rank of one, but the
next row ranks as a two. A Rank would have made the third row a three.
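The RANK and DENSE_RANK queries are not reproduced here. A sketch that puts both side by side on four made-up daily_sales values, two of which tie:

```sql
SELECT daily_sales,
       RANK()       OVER (ORDER BY daily_sales) AS rank1,        -- 1, 1, 3, 4
       DENSE_RANK() OVER (ORDER BY daily_sales) AS dense_rank1   -- 1, 1, 2, 3
FROM VALUES (100.00), (100.00), (250.00), (300.00) AS t(daily_sales)
ORDER BY daily_sales;
```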
This RANK query is sorted in descending mode. The highest daily_sales return first.
What does the PARTITION Statement in the RANK () OVER do? It resets the rank.
Above, we rank the daily_sales column and partition on the product_id column. Therefore each rank is within
product_id, and because we ORDER BY daily_sales DESC, the largest daily_sales gets rank one. We have a
QUALIFY statement at the end, which acts as a special filter. QUALIFY waits until all calculations finish and then
acts like a final WHERE clause before returning the answer set, but only for the ordered analytics. So, above, we
have the three highest-ranking daily_sales rows for each product_id.
SELECT *
FROM
(SELECT product_id ,sale_date , daily_sales,
RANK() OVER (PARTITION BY product_id
ORDER BY daily_sales DESC) AS rank1
FROM sales_table) as TeraTom
WHERE rank1 < 3
The outer SELECT * FROM ( ... ) as TeraTom wrapper and the final WHERE clause were added as part of the
derived table. The derived table is named TeraTom.
You can't use a WHERE clause in the same query block to filter on calculations or analytics because they have not
been calculated yet when the WHERE clause runs. Once I ran my query to satisfaction, I wrapped it in a derived
table so I could keep only the rows ranked first or second in daily_sales for each product_id.
The difference between a RANK and a DENSE_RANK is how they handle ties. The DENSE_RANK will not skip
a number when the previous rows tie. Notice how the RANK skips to a 3 when the two previous rows tie with a
rank of 1.
WITH TERATOM AS
(SELECT PRODUCT_ID, SALE_DATE, DAILY_SALES,
RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY DAILY_SALES DESC) AS RANK1
FROM SALES_TABLE)
SELECT * FROM TERATOM
WHERE RANK1 < 4 ORDER BY PRODUCT_ID, RANK1 ;
What does the PARTITION Statement in the DENSE_RANK() OVER do? It resets the Dense_Rank.
Percent_Rank is just like RANK, but the rank is expressed as a fraction between 0 and 1 (0% to 100%). The
calculation is (rank - 1) divided by (number of rows in the partition - 1). If you compare the last two examples,
you will see different results.
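The Percent_Rank query is not reproduced here. A sketch on five made-up values shows the spread from 0 to 1:

```sql
SELECT daily_sales,
       RANK()         OVER (ORDER BY daily_sales) AS rank1,     -- 1, 2, 3, 4, 5
       PERCENT_RANK() OVER (ORDER BY daily_sales) AS pct_rank   -- 0.0, 0.25, 0.5, 0.75, 1.0
FROM VALUES (10.00), (20.00), (30.00), (40.00), (50.00) AS t(daily_sales)
ORDER BY daily_sales;
```

With five rows, each step is 1 / (5 - 1) = 0.25.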
We now have added a Partition statement that resets on product_id, so this produces seven rows for each of our
product_ids.
Cumulative Sum
The example above performs a cumulative sum (CSUM). The query is an ordered analytic because it orders the
data by sale_date and then, starting with the first row's daily_sales, adds each row's daily_sales to a running total
until the end.
Adding the CAST command allowed me to only see two decimal places for csumansi. The CAST command
converts data types.
The first thing the above query does before calculating is sort all the rows by sale_date. The sort column is the one
named right after the ORDER BY keywords.
The keywords ROWS UNBOUNDED PRECEDING determine that this is a CSUM. There are only a few
different statements, and Rows Unbounded Preceding is the main one. It means start calculating at the first row
and include every row up to the current row, so the total keeps growing until the last row.
The second sumover row is 90739.28. The calculation is the first row’s daily_sales (48850.40) added to the
SECOND row’s daily_sales (41888.88). It continues to add up the running total until the last row.
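The cumulative sum query is not reproduced here. The sketch below uses the two daily_sales values quoted above plus one more from this chapter; the sale dates are hypothetical stand-ins.

```sql
SELECT sale_date, daily_sales,
       SUM(daily_sales) OVER (ORDER BY sale_date
                              ROWS UNBOUNDED PRECEDING) AS sumover
FROM VALUES
  (DATE'2000-09-28', 48850.40),
  (DATE'2000-09-29', 41888.88),
  (DATE'2000-09-30', 54500.22)
  AS sales_table(sale_date, daily_sales)
ORDER BY sale_date;
-- sumover: 48850.40, 90739.28, 145239.50
```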
You can have more than one SORT KEY. In the top query, product_id is the MAJOR Sort, and sale_date is the
MINOR Sort.
The PARTITION BY clause is how you reset in ANSI. It causes the sumansi calculation to start over (reset) for
each new product_id.
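The partitioned query is not reproduced here. In this sketch the running total restarts when product_id changes; the dates and the last daily_sales value are made up for illustration.

```sql
SELECT product_id, sale_date, daily_sales,
       SUM(daily_sales) OVER (PARTITION BY product_id
                              ORDER BY sale_date
                              ROWS UNBOUNDED PRECEDING) AS sumansi
FROM VALUES
  (1000, DATE'2000-09-28', 48850.40),
  (1000, DATE'2000-09-29', 54500.22),
  (2000, DATE'2000-09-28', 41888.88),
  (2000, DATE'2000-09-29', 48000.00)
  AS sales_table(product_id, sale_date, daily_sales)
ORDER BY product_id, sale_date;
-- sumansi: 48850.40, 103350.62, then reset: 41888.88, 89888.88
```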
Above are two OLAP statements. Only the grandtotal column has PARTITION BY, so only it resets.
Moving Sum
The SUM () Over allows you to get the moving SUM of a specific column. The moving window in ANSI form
always includes the current row. When you see “ROWS 2 PRECEDING”, this means to calculate the current row
and two preceding rows. The query adds up the daily_sales over each three-row window, which helps you look for
trends.
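The moving sum query is not reproduced here. The sketch below uses four daily_sales values quoted later in this chapter, with hypothetical sale dates.

```sql
SELECT sale_date, daily_sales,
       SUM(daily_sales) OVER (ORDER BY sale_date
                              ROWS 2 PRECEDING) AS sum3
FROM VALUES
  (DATE'2000-09-28', 48850.40),
  (DATE'2000-09-29', 54500.22),
  (DATE'2000-09-30', 36000.07),
  (DATE'2000-10-01', 40200.43)
  AS sales_table(sale_date, daily_sales)
ORDER BY sale_date;
-- sum3: 48850.40, 103350.62, 139350.69, 130700.72
```

The first two rows have fewer than two preceding rows, so they sum only what is available.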
The first ordered analytic statement gives a moving sum with a moving window of 3. The second ordered analytic
statement is performing a continuous sum from the first row to the last.
The “Partition By” statement resets the calculations with each product_id break.
Moving Average
The AVG () Over allows you to get the moving AVG of a specific column. The moving window in ANSI form
always includes the current row. When you see “ROWS 2 PRECEDING”, this means to average the current row
and two preceding rows. The query averages the daily_sales over each three-row window, which helps you look
for trends.
The example above is doing a moving average over three rows at a time. The first row is a single calculation
because no rows precede it, and the second row averages two rows. From the third row until the end, each row
averages the current row and the previous two rows. A moving average helps you spot trends, such as when the
business did well or not so well.
The moving average in the example above is ordering the data by product_id and then by sale_date. Only then are
the averages calculated.
With a Moving Window of 3, how is the 46450.23 amount derived in the AVG_3 column in the third row?
With a Moving Window of 3, the 46450.23 amount derived in the third row is the average of 48850.40, 54500.22,
and 36000.07.
With a Moving Window of 3, how is the 43566.91 amount derived in the avg_3 column in the fourth row?
With a Moving Window of 3, the 43566.91 amount derived in the fourth row is the average of 54500.22,
36000.07, and 40200.43. It is the current row and previous two rows in the moving window of 3.
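The moving average query is not reproduced here. The sketch below uses the four daily_sales values quoted on these pages, with hypothetical sale dates, and a CAST to keep two decimal places.

```sql
SELECT sale_date, daily_sales,
       CAST(AVG(daily_sales) OVER (ORDER BY sale_date
                                   ROWS 2 PRECEDING) AS DECIMAL(10,2)) AS avg_3
FROM VALUES
  (DATE'2000-09-28', 48850.40),
  (DATE'2000-09-29', 54500.22),
  (DATE'2000-09-30', 36000.07),
  (DATE'2000-10-01', 40200.43)
  AS sales_table(sale_date, daily_sales)
ORDER BY sale_date;
-- avg_3: 48850.40, 51675.31, 46450.23, 43566.91
```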
The first ordered analytic statement gives a moving AVG with a moving window of 3. The second ordered
analytic statement is performing a continuous average from the first row to the last.
The example above is doing a moving average over three rows at a time. The first row is a single calculation
because no rows precede it, and the second row averages two rows. From the third row until the end, each row
averages the current row and the previous two rows. The Partition By statement means to reset the calculation on
product_id breaks.
Use a PARTITION BY statement to reset an ANSI OLAP calculation. PARTITION BY resets only the function whose
OVER clause contains it. Notice that only the column aliased continuous resets, but avg3 does not.
Moving Difference
The example above is a moving difference. Only two rows are compared at a time: the current row and the row four
rows before it. I have color-coded the answer set to show you the two rows being compared. The fifth row is
compared with the first row and has a -16049.90 difference. The sixth row is compared with the second row.
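A moving difference with a window of 4 can be sketched like this. The book's own query is not visible here, so LAG with an offset of 4 is used as an assumed equivalent. Python's sqlite3 module stands in for Databricks SQL, the table name and dates are made up, and the fifth daily_sales value (32800.50) is inferred from the -16049.90 difference quoted above.

```python
import sqlite3

# Assumption: hypothetical table name and dates. The first four amounts are
# quoted in this chapter; the fifth is inferred from the stated difference.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        (1000, "2000-09-28", 48850.40),
        (1000, "2000-09-29", 54500.22),
        (1000, "2000-09-30", 36000.07),
        (1000, "2000-10-01", 40200.43),
        (1000, "2000-10-02", 32800.50),
    ],
)

# Current row minus the row four rows earlier within the same product_id.
moving_diff_sql = """
SELECT sale_date,
       daily_sales - LAG(daily_sales, 4) OVER (PARTITION BY product_id
                                               ORDER BY sale_date) AS mdiff
FROM sales_table
ORDER BY product_id, sale_date
"""
mdiff = [
    None if row[1] is None else round(row[1], 2)
    for row in conn.execute(moving_diff_sql)
]
print(mdiff)
```

The first four rows have no row four positions back, so they come out as null (None in Python). Only the fifth row produces a difference.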
The moving difference query above has a moving window of 4 and a PARTITION BY statement. This statement
means to reset the calculations with every product_id break.
The above example finds the value of a column in the next row for daily_sales. Because the window frame holds
exactly one row, you can use MIN or MAX interchangeably when you want the next value. The keywords are ROWS
BETWEEN 1 FOLLOWING AND 1 FOLLOWING.
The above example finds the value of a column in the next row for daily_sales. You can use MIN or MAX
interchangeably when you want the next value. The keywords are ROWS BETWEEN 1 FOLLOWING AND 1
FOLLOWING. Notice how, with the PARTITION BY statement, the last row of product_id 1000 returns null because
no next row exists in its partition.
The above example finds the value of a column in the next row for sale_date. You can use MIN or MAX
interchangeably when you want the next date value. The keywords are ROWS BETWEEN 1 FOLLOWING AND 1
FOLLOWING.
The above example finds the value of a column in the next row for daily_sales. The keywords ROWS BETWEEN
1 FOLLOWING AND 1 FOLLOWING deliver the next row's daily_sales. The keywords ROWS BETWEEN 2 FOLLOWING
AND 2 FOLLOWING provide the daily_sales value two rows down.
The example above is the COUNT OVER. It provides a sequential number starting at 1. The COUNT OVER adds one
to the running count for each row until there are no more rows. You do not need ROWS UNBOUNDED PRECEDING.
The example above is the COUNT OVER. It provides a sequential number starting at 1. The keywords ROWS
UNBOUNDED PRECEDING are not necessary, but they will not cause an error if present. The COUNT OVER adds
one to the running count for each row until there are no more rows.
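A small sketch of the running count, with and without ROWS UNBOUNDED PRECEDING. The table name and dates are assumptions, and sqlite3 stands in for Databricks SQL. Both forms match here because every sale_date is distinct. With duplicate ORDER BY values, the default frame would be expected to give tied rows the same count, so treat the equivalence as holding for unique sort keys.

```python
import sqlite3

# Assumption: hypothetical table name and dates; amounts are from this chapter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        (1000, "2000-09-28", 48850.40),
        (1000, "2000-09-29", 54500.22),
        (1000, "2000-09-30", 36000.07),
        (1000, "2000-10-01", 40200.43),
    ],
)

count_sql = """
SELECT sale_date,
       COUNT(*) OVER (ORDER BY sale_date) AS seq_default,
       COUNT(*) OVER (ORDER BY sale_date ROWS UNBOUNDED PRECEDING) AS seq_rows
FROM sales_table
ORDER BY sale_date
"""
result = conn.execute(count_sql).fetchall()
seq_default = [row[1] for row in result]
seq_rows = [row[2] for row in result]
print(seq_default, seq_rows)
```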
After the sort, the MAX() OVER shows the max value up to that point. Each time a larger value arrives, it becomes
the new max.
The largest value is 64300.00 in the column maxover. Once 64300.00 arrives, it is the max until the product_id
breaks. The PARTITION BY statement resets the calculation.
After the sort, the MIN () Over shows the Min Value up to that point. With each new Min, that new Min appears.
The MIN calculation resets and starts over with each product_id break. Partition By causes analytics to reset.
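The running MAX and MIN with a partition reset can be sketched as below. The rows for product_id 2000, the table name, and the dates are hypothetical, and sqlite3 stands in for Databricks SQL. Only the four product_id 1000 amounts come from this chapter.

```python
import sqlite3

# Assumption: product_id 2000 rows (100, 300, 200) are invented so that the
# partition reset is visible; table name and dates are also made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        (1000, "2000-09-28", 48850.40),
        (1000, "2000-09-29", 54500.22),
        (1000, "2000-09-30", 36000.07),
        (1000, "2000-10-01", 40200.43),
        (2000, "2000-09-28", 100.0),
        (2000, "2000-09-29", 300.0),
        (2000, "2000-09-30", 200.0),
    ],
)

running_sql = """
SELECT product_id,
       daily_sales,
       MAX(daily_sales) OVER (PARTITION BY product_id ORDER BY sale_date
                              ROWS UNBOUNDED PRECEDING) AS maxover,
       MIN(daily_sales) OVER (PARTITION BY product_id ORDER BY sale_date
                              ROWS UNBOUNDED PRECEDING) AS minover
FROM sales_table
ORDER BY product_id, sale_date
"""
result = conn.execute(running_sql).fetchall()
maxover = [row[2] for row in result]
minover = [row[3] for row in result]
print(maxover)
print(minover)
```

Both columns start over at the first row of product_id 2000, which is the reset the PARTITION BY statement causes.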
The example above uses ROWS BETWEEN 1 PRECEDING AND CURRENT ROW, and then it uses a different
example with ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING. Notice how the report came out?
Changing the argument of the Ntile function changes the number of partitions established. Each Ntile partition is
assigned a number starting at one and increasing up to the partition number specified. So, with an Ntile of 4, the
partitions are 1 through 4. Then, all the rows are distributed as evenly as possible into the partitions, following the
order of the ORDER BY clause. When the rows do not divide evenly, the extra rows go into the lowest-numbered
partitions.
Ntile
The Ntile function organizes rows into n groups. These groups are called tiles, and the tile number is returned in
the answer set. For example, the example above has ten rows, so NTILE(5) splits the ten rows into five equally
sized tiles. There are two rows in each tile, assigned in the order of the OVER() clause's ORDER BY clause.
Ntile Continued
The Ntile function organizes rows into n groups, and the tile number is returned in the answer set. For example, the
example above has six rows, so NTILE(2) splits the six rows into two equally sized tiles. There are three rows in
each tile, assigned in the order of the OVER() clause's ORDER BY clause.
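How NTILE spreads rows across tiles can be sketched with ten hypothetical rows. The table sample_rows and its amount column are invented for illustration, and sqlite3 stands in for Databricks SQL. NTILE(5) divides evenly, while NTILE(4) shows where the extra rows land when the division is uneven.

```python
import sqlite3

# Assumption: ten made-up rows with amounts 1 through 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_rows (amount INTEGER)")
conn.executemany(
    "INSERT INTO sample_rows VALUES (?)", [(n,) for n in range(1, 11)]
)

ntile_sql = """
SELECT amount,
       NTILE(5) OVER (ORDER BY amount) AS tile_5,
       NTILE(4) OVER (ORDER BY amount) AS tile_4
FROM sample_rows
ORDER BY amount
"""
result = conn.execute(ntile_sql).fetchall()
tile_5 = [row[1] for row in result]  # ten rows into five tiles: two rows each
tile_4 = [row[2] for row in result]  # ten rows into four tiles: 3, 3, 2, 2
print(tile_5)
print(tile_4)
```

With four tiles, the two leftover rows go to tiles 1 and 2, the lowest-numbered ones.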
Ntile Percentile
4306444 2003-06-03 2 21
4306444 2003-06-02 2 20
3402222 2004-02-28 2 19
1302111 2003-02-28 2 18
The Ntile function organizes rows into n number of groups, so the above example is a way to get the percentile.
This example determines the percentile for every row in the Sales
table based on the daily sales amount and sorts it into sequence
by the value being categorized, which here is daily sales.
Instead of 100, the example above uses a quartile (QUANTILE based on four partitions).
The NTILE() function divides the rows into buckets as evenly as possible. In this example, because PARTITION
BY is listed, the data will first be sorted by product_id and then sorted using the ORDER BY clause (within
product_id), and then divided into the number of buckets specified. This example uses a value of 3 in the NTILE.
Notice that the PARTITION BY statement causes the answer set to reset when the product_id goes from 1000 to
2000.
The QUALIFY statement is like a WHERE filter but used after the calculations. The example above returns only
the rows from each product_id placed in the first bucket.
Using FIRST_VALUE
The above example uses FIRST_VALUE to show you the very first first_name returned. It also uses the keyword
Partition to show you the very first first_name returned in each department.
FIRST_VALUE
Above, after sorting the data by sale_date, we get the first value of daily_sales for only the first row. This seems
simple enough but watch us build to make first_value more relevant.
Above, after partition by product_id, we essentially group the calculation within each product_id. We then have a
minor sort by sale_date. We get the first value of daily_sales for the first row of each product_id. Our next
example will show a great way to use the first_value function.
Above, after sorting the data by sale_date, we get the first value of daily_sales. We then compute the difference
between the first_value of 48850.40 and every other daily_sales within product_id 1000.
Above, after sorting the data by sale_date, we compute the difference between the first row's daily_sales and the
daily_sales of each following row. Every row's daily_sales is compared with the first row's daily_sales, thus the
name First_Value.
Above, after sorting the data by daily_sales DESC, we compute the difference between the first row's daily_sales
and the daily_sales of each following row. Every row's daily_sales is compared with the first row's daily_sales,
thus the name First_Value. This example shows how much less each daily_sales is than 64,300.00, which is our
highest sale.
We are comparing the daily_sales value for the first sale_date with the daily_sales of all other rows within the
product_id partition. Each row compares only with the first row (First_Value) in its partition.
Using LAST_VALUE
The FIRST_VALUE and LAST_VALUE functions are good to use anytime you need to propagate a value from one
row to all or multiple rows based on a sorted sequence. However, the output from the LAST_VALUE function
appears to be incorrect and is a little misleading until you understand a few concepts. The SQL request specifies
"rows unbounded preceding," so the window frame ends at the current row, and LAST_VALUE looks at the last row
of that frame. The current row is always the last row in the frame, and therefore its own value appears in the output.
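The frame effect on LAST_VALUE can be sketched as follows. The table name and dates are hypothetical, and sqlite3 stands in for Databricks SQL. The first column uses a frame that stops at the current row. The second widens the frame to the whole window with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

```python
import sqlite3

# Assumption: hypothetical table name and dates; amounts are from this chapter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        (1000, "2000-09-28", 48850.40),
        (1000, "2000-09-29", 54500.22),
        (1000, "2000-09-30", 36000.07),
        (1000, "2000-10-01", 40200.43),
    ],
)

last_value_sql = """
SELECT daily_sales,
       LAST_VALUE(daily_sales) OVER (ORDER BY sale_date
                                     ROWS UNBOUNDED PRECEDING) AS last_running,
       LAST_VALUE(daily_sales) OVER (ORDER BY sale_date
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND UNBOUNDED FOLLOWING) AS last_overall
FROM sales_table
ORDER BY sale_date
"""
result = conn.execute(last_value_sql).fetchall()
daily = [row[0] for row in result]
last_running = [row[1] for row in result]  # frame ends at the current row
last_overall = [row[2] for row in result]  # frame spans every row
print(last_running)
print(last_overall)
```

With the running frame, each row simply echoes its own daily_sales. With the full frame, every row shows the value from the final sale_date.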
Above, after sorting the data by sale_date, we compute the difference between the last row's daily_sales and the
daily_sales of each following row (from the same sale_date). Since there are only two product totals for each day,
there is always a 0.00 for one of the rows.
First_Value Review
First_Value will work slightly differently than Last_Value. Above, we have a First_Value example, and it makes
sense. After the ORDER BY sale_date, the data sorts by sale_date ASC. We then take the daily_sales value of the
first row.
First_Value will work slightly differently than Last_Value. Above, we have a Last_Value example, and it is
different than our previous First_Value example. After the ORDER BY sale_date, the data sorts by sale_date
ASC, but the last_value changes to the current row's daily_sales value with each sale_date change. We will fix this
in our next example.
In this example we are using the keywords ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING. The ORDER BY sale_date statement sorts the data in descending order. The
last_value is the row's daily_sales value of the most current sale_date.
Because we use the PARTITION BY product_id statement and use the ROWS BETWEEN UNBOUNDED
PRECEDING AND UNBOUNDED FOLLOWING statement, the example above makes sense. We reset the
calculation on each product_id break.
We are now getting the first and last value because we use two functions. The data displays by the Last_Value
function because it is the last function in the SQL.
Using LEAD
As you can see, the first LEAD brings back the value from the next row except for the last, which has no row
following it. We did not specify the offset value in this example, so it defaulted to a value of 1 row. Both queries
trim both the leading and trailing spaces from the Last_Name column for the life of the query.
As you can see, the first LEAD brings back the value from the next row except for the last, which has no row
following it. We did not specify the offset value in this example, so it defaulted to a value of 1 row. Notice how
things reset because of the Partitioning statement.
As you can see, the LEAD brings back the value from Daily_Sales row two rows down, except for the last two
rows, which have no rows two rows down. The offset value is 2, so it shows the value of two rows down.
As you can see, the LEAD brings back the value from Daily_Sales row two rows down, except for the last two
rows, which have no rows two rows down (within a partition). Notice how things reset because of the Partitioning
statement.
Using LAG
As you can see, the LAG brings back the value from the previous row except for the first, which has no row before
it. We did not specify the offset value in this example, so it defaulted to a value of 1 row.
As you can see, the LAG brings back the value from the previous row except for the first, which has no row before
it. We did not specify the offset value in this example, so it defaulted to a value of 1 row. Notice how things reset
because of the partitioning statement.
As you can see, the LAG brings back the Daily_Sales value from both yesterday and two days ago. The offset
value in this example is 1 for yesterday and 2 for two days ago.
As you can see, the LAG brings back the Daily_Sales value from two rows up, except for the first two rows,
which have no row two rows before them. The offset value in this example is 2, so it shows the value from two
rows up.
As you can see, the LAG brings back the value from the Daily_Sales column that is two rows before the current
row. Notice that the first two rows are null because there are no rows two rows before them. Notice how things
reset because of the Partitioning statement.
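LEAD with the default offset and LAG with an offset of 2 can be sketched side by side. The product_id 2000 rows, the table name, and the dates are hypothetical, and sqlite3 stands in for Databricks SQL.

```python
import sqlite3

# Assumption: product_id 2000 rows are invented to show the partition reset;
# table name and dates are made up. Product 1000 amounts are from this chapter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (product_id INTEGER, sale_date TEXT, daily_sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        (1000, "2000-09-28", 48850.40),
        (1000, "2000-09-29", 54500.22),
        (1000, "2000-09-30", 36000.07),
        (1000, "2000-10-01", 40200.43),
        (2000, "2000-09-28", 100.0),
        (2000, "2000-09-29", 300.0),
        (2000, "2000-09-30", 200.0),
    ],
)

lead_lag_sql = """
SELECT product_id,
       daily_sales,
       LEAD(daily_sales) OVER (PARTITION BY product_id
                               ORDER BY sale_date) AS next_row,
       LAG(daily_sales, 2) OVER (PARTITION BY product_id
                                 ORDER BY sale_date) AS two_rows_back
FROM sales_table
ORDER BY product_id, sale_date
"""
result = conn.execute(lead_lag_sql).fetchall()
next_row = [row[2] for row in result]       # LEAD: looks forward one row
two_rows_back = [row[3] for row in result]  # LAG: looks back two rows
print(next_row)
print(two_rows_back)
```

LEAD leaves the last row of each partition null, while LAG with an offset of 2 leaves the first two rows of each partition null.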
CUME_DIST
CUME_DIST is a cumulative distribution function that assigns a relative rank to each row, based on a formula.
That formula is (number of rows preceding or peer with current row) / (total rows). We order by Daily_Sales
DESC so that each row ranks by the cumulative distribution. The distribution is represented by floating-point
numbers greater than 0 and up to 1. When there is only one row in a partition, it is assigned 1. When there is more
than one row, each row is assigned a cumulative distribution ranking, with the last row assigned 1.
CUME_DIST
SELECT prod
,total_sales
,CUME_DIST() OVER (ORDER BY total_sales) as cdist
,cdist * 100 as percent
FROM sales_simple_example
The rows are sorted first by total_sales before the calculation begins.
prod    total_sales  cdist  percent
1000    1.00         0.1    10
2000    2.00         0.2    20
3000    3.00         0.3    30
4000    4.00         0.4    40
5000    5.00         0.5    50
6000    6.00         0.6    60
7000    7.00         0.7    70
8000    8.00         0.8    80
9000    999.00       0.9    90
10000   9999.00      1      100
After total_sales sorts the rows, the first row has a value of 1.00. There are 10 rows in the result set.
The calculation is 1/10 = 0.1.
After total_sales sorts the rows, the ninth row has a value of 999.00. There are 10 rows in the result set.
The calculation is 9/10 = 0.9.
Notice that the cdist calculation is based on the order of the row relative to the other rows in the data set.
CUME_DIST is a cumulative distribution function that assigns a relative rank to each row, based on a formula.
That formula is (number of rows preceding or peer with current row) / (total rows).
SELECT prod
,total_sales
,CUME_DIST() OVER (ORDER BY total_sales) as cdist
,cdist * 100 as percent
FROM sales_simple_example
QUALIFY CUME_DIST() OVER (ORDER BY total_sales) >= 0.5
prod    total_sales  cdist  percent
5000    5.00         0.5    50
6000    6.00         0.6    60
7000    7.00         0.7    70
8000    8.00         0.8    80
9000    999.00       0.9    90
10000   9999.00      1      100
Based on a formula, CUME_DIST() is a cumulative distribution function that assigns a relative rank to each row.
That formula is (number of rows preceding or peer with current row) / (total rows). Above, the rows are sorted
first by the column total_sales. The QUALIFY statement provides only the top 50 percent of sales.
Based on a formula, CUME_DIST is a cumulative distribution function that assigns a relative rank to each row.
That formula is (number of rows preceding or peer with current row) / (total rows). Above, the rows are sorted
first by the column total_sales. After total_sales sorts the rows, the 5th and 6th rows have a tie, so the calculation
for both rows is 6 divided by 10 = 0.6.
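The tie case can be sketched with ten hypothetical rows in which the 5th and 6th share the same total_sales. The table contents here are invented for illustration and are not the book's data, and sqlite3 stands in for Databricks SQL.

```python
import sqlite3

# Assumption: invented total_sales values; rows 5 and 6 tie on 5.00.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_simple_example (prod INTEGER, total_sales REAL)")
amounts = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 7.0, 8.0, 9.0, 10.0]
conn.executemany(
    "INSERT INTO sales_simple_example VALUES (?, ?)",
    [((i + 1) * 1000, amount) for i, amount in enumerate(amounts)],
)

cume_dist_sql = """
SELECT prod,
       total_sales,
       CUME_DIST() OVER (ORDER BY total_sales) AS cdist
FROM sales_simple_example
ORDER BY total_sales, prod
"""
cdist = [round(row[2], 2) for row in conn.execute(cume_dist_sql)]
print(cdist)
```

Both tied rows count six rows at or before them (the four smaller rows plus the two peers), so each gets 6/10 = 0.6 and no row receives 0.5.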
Based on a formula, CUME_DIST is a cumulative distribution function that assigns a relative rank to each row.
That formula is (number of rows preceding or peer with current row) / (total rows). Above, the rows are sorted
first by the column total_sales. We reset the calculations for each region because we use the PARTITION BY
statement.
Based on a formula, CUME_DIST is a cumulative distribution function that assigns a relative rank to each row.
That formula is (number of rows preceding or peer with current row) / (total rows). We Partition by Product_ID
and then Order By Daily_Sales DESC so that each row ranks by cumulative distribution within its partition.
Above, we used the ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING to produce a
CSUM but notice that the Product_ID and the Sale_Date reverse. We see the Product_ID of 3000 first and the
latest date first.
MEDIAN Example
The Median is a numerical value of an expression in an answer set within a window that separates the higher half
of a sample from the lower half. After sorting all values from the lowest to the highest, it then picks the middle
one. If there is an even number of values, then there is no single middle value, so the median is the mean (average)
of the two middle values.
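The odd and even cases can be checked with a few lines of Python. This is only an illustration of the definition, using daily_sales amounts quoted earlier in this chapter. It is not the SQL MEDIAN function itself.

```python
from statistics import median

# Three values: an odd count, so the median is the middle value after sorting.
odd_sample = [48850.40, 54500.22, 36000.07]
# Four values: an even count, so the median is the mean of the two middle values.
even_sample = odd_sample + [40200.43]

median_odd = median(odd_sample)
median_even = median(even_sample)
print(median_odd, round(median_even, 3))
```

Sorted, the even sample is 36000.07, 40200.43, 48850.40, 54500.22, so its median is the average of 40200.43 and 48850.40.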
The result of the aggregate function is computed by linear interpolation between the values
from the rows at row numbers CRN and FRN:
(CRN – RN) * (value of expression for the row at FRN) + (RN – FRN) * (value
of expression for the row at CRN).
Syntax for PERCENTILE_CONT
PERCENTILE_CONT ( percentile )
WITHIN GROUP (ORDER BY expr)
OVER ( [ PARTITION BY expr_list ] )
Arguments
percentile - Numeric constant between 0 and 1. Nulls are ignored in the calculation.
WITHIN GROUP ( ORDER BY expr) - Specifies numeric or date/time values to sort and
compute the percentile over.
OVER - Specifies the window partitioning. The OVER clause cannot contain a window
ordering or window frame specification.
PARTITION BY expr - Optional argument that sets the range of records for each group
in the OVER clause.
Using the percentile value (P) and the number of non-null rows (N) in the aggregation
group, the function computes the row number after ordering the rows according to the
sort specification. This row number (RN) is computed according to the formula
RN = 1 + (P * (N - 1)).
The result of the aggregate function is computed by linear interpolation between the
values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).
Above are the function arguments with additional information to help you make sense of this challenging function.
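The row-number and interpolation formulas can be traced in a few lines of Python. This is a sketch of the arithmetic described above, not the database's implementation. The function name percentile_cont_sketch and the sample values are hypothetical. One assumption is labeled in the code: when RN is a whole number, CRN and FRN are the same row, and the sketch returns that row's value directly.

```python
import math

def percentile_cont_sketch(values, p):
    """Interpolated percentile following RN = 1 + P * (N - 1)."""
    ordered = sorted(v for v in values if v is not None)  # nulls are ignored
    n = len(ordered)
    if n == 0:
        return None
    rn = 1 + p * (n - 1)          # fractional row number, 1-based
    frn = math.floor(rn)          # FRN = FLOOR(RN)
    crn = math.ceil(rn)           # CRN = CEILING(RN)
    if frn == crn:
        # Assumption: RN lands exactly on a row, so take that row's value.
        return ordered[frn - 1]
    # Linear interpolation between the rows at FRN and CRN.
    return (crn - rn) * ordered[frn - 1] + (rn - frn) * ordered[crn - 1]

sample = [10.0, 20.0, 30.0, 40.0]        # hypothetical values
p50 = percentile_cont_sketch(sample, 0.5)  # RN = 2.5, halfway between 20 and 30
p40 = percentile_cont_sketch(sample, 0.4)  # RN = 2.2, 20% of the way from 20 to 30
print(p50, round(p40, 6))
```

For P = 0.5 the result sits halfway between rows 2 and 3. For P = 0.4 it is weighted 0.8 toward row 2 and 0.2 toward row 3, which is why the two percentiles give different answers on the same data.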
PERCENTILE_CONT Example
The above example shows percentile_cont (0.5). These values would be different if percentile_cont were (0.4).
The above example shows percentile_cont (0.4). Notice that the values are different than in the previous
example, which uses percentile_cont (0.5).
The above example shows the percentile_cont (0.5) for each Product_ID partition break.
The above example shows the percentile_cont (0.4) for each Product_ID partition break.
Syntax for PERCENTILE_DISC
PERCENTILE_DISC ( percentile )
WITHIN GROUP (ORDER BY expr)
OVER ( [ PARTITION BY expr_list ] )
PERCENTILE_DISC Example
The above example shows percentile_disc (0.5). These values would be different if percentile_disc were (0.4).
The above example shows percentile_disc (0.4). The answer set is different than the previous example that uses a
percentile_disc of (0.5).
The above example shows the percentile_disc (0.5) for each Product_ID partition break.
The above example shows the percentile_disc (0.4) for each Product_ID partition break.
Chapter 8 Temporary Tables
- John Lennon
SELECT *
FROM (SELECT AVG(salary) as avgsal
FROM employee_table) AS teratom ;
Everything in blue, red, and pink is the derived table.
The SELECT statement that creates and populates the derived table is always inside parentheses.
SELECT *
FROM (SELECT AVG(salary) as avgsal
FROM employee_table) AS teratom ;
Answer Set
avgsal
46782.153333
In the example above, TeraTom is the name given to the derived table. The materialization of the derived table
comes from the blue colors. The red color is the alias name given to the aggregate AVG(salary).
SELECT *
FROM (SELECT AVG(salary)
FROM employee_table) AS teratom(avgsal) ;
The derived table must always have a name.
Aliasing the column(s) can be done here, but it is not the best way.
In the example above, TeraTom is the name given to the derived table. All calculations must have an alias. The
first example is a better way to alias the column names.
Create the derived table before we run the query using WITH!
WITH teratom AS
(SELECT AVG(salary) as avgsal
FROM employee_table)
SELECT * FROM teratom ;
This AS is necessary. Derived columns must have an alias.
After teratom is built, you must use it in a second SELECT clause or the query errors.
Answer Set
avgsal
46782.153333
In the example above, TeraTom is the name given to the derived table. The WITH command allows the creation of
the derived table before the actual query runs. You must have two SELECT statements. The first SELECT builds
the table and the second queries the table. All derived table examples shown have the same performance.
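The two styles can be compared side by side in a small sketch. Python's sqlite3 module stands in for Databricks SQL here, which is an assumption about portability rather than a statement about the warehouse. The rows are the employee_table rows listed in Chapter 9.

```python
import sqlite3

# employee_table rows as listed in Chapter 9; None stands for the null dept_no.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)",
    [
        (2000000, None, "Jones", "Squiggy", 32800.50),
        (1000234, 10, "Smythe", "Richard", 64300.00),
        (1232578, 100, "Chambers", "Mandee", 48850.00),
        (1324657, 200, "Coffing", "Billy", 41888.88),
        (1333454, 200, "Smith", "John", 48000.00),
        (2312225, 300, "Larkins", "Loraine", 40200.00),
        (1121334, 400, "Strickling", "Cletus", 54500.00),
        (2341218, 400, "Reilly", "William", 36000.00),
        (1256349, 400, "Harrison", "Herbert", 54500.00),
    ],
)

# Derived table written inline in the FROM clause.
derived_sql = """
SELECT *
FROM (SELECT AVG(salary) AS avgsal
      FROM employee_table) AS teratom
"""
# The same derived table built up front with WITH.
with_sql = """
WITH teratom AS
  (SELECT AVG(salary) AS avgsal
   FROM employee_table)
SELECT * FROM teratom
"""
derived_avg = conn.execute(derived_sql).fetchone()[0]
with_avg = conn.execute(with_sql).fetchone()[0]
print(round(derived_avg, 6), round(with_avg, 6))
```

Both forms return the same single-row answer set, matching the avgsal value shown above.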
1
SELECT *
FROM (SELECT dept_no
,AVG(salary)
FROM employee_table
GROUP BY dept_no) teratom (dept_no, avgsal)
Aliasing of column names is done after the derived table name.

2
SELECT *
FROM (SELECT dept_no
, AVG(salary) as avgsal
FROM employee_table
GROUP BY 1) teratom ;
No alias is needed for dept_no. The calculation is aliased as avgsal.

3
WITH teratom AS
(SELECT dept_no, AVG(salary) as avgsal
FROM employee_table
GROUP BY dept_no)
SELECT * FROM teratom
The calculation is aliased as avgsal. You must SELECT again or the query will error.

Notice that every query with a derived table has at least two select statements. One SELECT to populate the
derived table, and the other to run the query.
In the examples above, TeraTom is the name given to each derived table. All examples have the same
performance to retrieve the same answer set. The choice is yours.
The above example shows how users use derived tables. Derived tables are great for combining aggregates with
detailed data. Above, our derived table TeraTom held the averages for each dept_no, and we then joined
TeraTom to our employee_table.
When using the WITH Command, we can CREATE our Derived table before running the main query. You must
SELECT from the derived table, or it will error.
The WITH syntax is nice because you build the derived table right away.
ON E.employee_no = s.employee_no
ORDER BY t.dept_no;
Chapter 9 Subqueries
Chapter 9 – Subqueries
“An invasion of Armies can be resisted, but not an idea whose time has come.”
- Victor Hugo
employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00
This query is straightforward and easy to understand. It uses an IN-List to find all employees who are in dept_no
100 or dept_no 200.
SELECT *
FROM employee_table
WHERE dept_no IN (100, 100,200, 200) ;
What is going on with this IN list? Why in the world are there duplicates in there? Will this query even work?
What will the result set look like when it returns? Turn the page!
The system ignores duplicate values in a list. We get the same rows back as before because the system ignores the
duplicate values in the IN list.
The Subquery
employee_table
employee_no  dept_no  last_name   first_name  salary
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00
department_table
dept_no  department_name
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources
SELECT *
FROM employee_table
WHERE dept_no IN (
SELECT dept_no
FROM department_table) ;
Which query runs first, top or bottom? There is a Top Query and a Bottom Query!
The query above is a Subquery, meaning multiple queries are in the same SQL. The bottom query runs first, and
its purpose in life is to build a distinct list of values that it passes to the top query. The top query then returns the
result set. This query solves the problem: Show all employees in valid departments!
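The subquery and its hand-written IN-list equivalent can be compared in a short sketch. Python's sqlite3 module stands in for Databricks SQL, which is an assumption. The rows are the employee_table and department_table rows shown in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (employee_no INTEGER, dept_no INTEGER, "
    "last_name TEXT, first_name TEXT, salary REAL)"
)
conn.execute("CREATE TABLE department_table (dept_no INTEGER, department_name TEXT)")
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?, ?, ?)",
    [
        (2000000, None, "Jones", "Squiggy", 32800.50),   # null dept_no
        (1000234, 10, "Smythe", "Richard", 64300.00),    # dept 10 is not valid
        (1232578, 100, "Chambers", "Mandee", 48850.00),
        (1324657, 200, "Coffing", "Billy", 41888.88),
        (1333454, 200, "Smith", "John", 48000.00),
        (2312225, 300, "Larkins", "Loraine", 40200.00),
        (1121334, 400, "Strickling", "Cletus", 54500.00),
        (2341218, 400, "Reilly", "William", 36000.00),
        (1256349, 400, "Harrison", "Herbert", 54500.00),
    ],
)
conn.executemany(
    "INSERT INTO department_table VALUES (?, ?)",
    [
        (100, "Marketing"),
        (200, "Research and Dev"),
        (300, "Sales"),
        (400, "Customer Support"),
        (500, "Human Resources"),
    ],
)

# The bottom query supplies the list of valid dept_no values to the top query.
subquery_sql = """
SELECT last_name
FROM employee_table
WHERE dept_no IN (SELECT dept_no FROM department_table)
ORDER BY employee_no
"""
# The same request with the list typed in by hand.
in_list_sql = """
SELECT last_name
FROM employee_table
WHERE dept_no IN (100, 200, 300, 400, 500)
ORDER BY employee_no
"""
from_subquery = [row[0] for row in conn.execute(subquery_sql)]
from_in_list = [row[0] for row in conn.execute(in_list_sql)]
print(from_subquery)
```

Seven employees come back. Jones (null dept_no) and Smythe (dept_no 10) are left out because neither value appears in department_table.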
SELECT *
FROM employee_table
WHERE dept_no IN (
SELECT dept_no
FROM department_table) ;
1 The bottom query runs first!
2 The result (100, 200, 300, 400, 500) is passed to the top query!
3 The top query runs using the bottom query answer set:
SELECT * FROM employee_table
WHERE dept_no IN (100, 200, 300, 400, 500) ;
The bottom query runs first and builds a distinct IN list. Then, the top query runs using the list.
Query 1
SELECT *
FROM employee_table
WHERE dept_no IN (
    SELECT dept_no FROM department_table) ;
Query 2
SELECT *
FROM employee_table
WHERE dept_no IN (100, 200, 300, 400, 500) ;
Both queries above return the same rows. Query two has its values typed into the IN list. Query one runs a subquery
to build the values in the IN list.
Notice that no employees are in dept 500.
Here is a great question: How are subqueries similar to joins? Do you know the answer? Turn the page!
How are Subqueries similar to Joins between two tables?
A Subquery between two tables or a Join between two tables will each need a common key that represents the
relationship. This is called a Primary Key/Foreign Key relationship. In these tables, dept_no is the Primary Key of
department_table and the Foreign Key in employee_table.
A Subquery will use a common key linking the two tables together, very similar to a join! When writing a subquery
between two tables, look for the common link between them. The two tables will commonly both have a column
with the same name, but not always.
Both queries above return much of the same data. If you want a report whose final result set has columns from
only one table, try a subquery. If you need the final result set to have columns from both tables, you must do a
Join.
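The trade-off between the two approaches can be sketched as follows. This is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, and only a hand-picked subset of the rows and columns is loaded. The subquery can return employee columns only, while the join can also return department_name.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; the rows are a small subset of the book's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_table (dept_no INT, last_name TEXT);
CREATE TABLE department_table (dept_no INT, department_name TEXT);
INSERT INTO employee_table VALUES
  (NULL, 'Jones'), (10, 'Smythe'), (100, 'Chambers'), (200, 'Coffing');
INSERT INTO department_table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (500, 'Human Resources');
""")

# Subquery: the final result set holds columns from employee_table only.
subquery_rows = conn.execute("""
    SELECT last_name
    FROM employee_table
    WHERE dept_no IN (SELECT dept_no FROM department_table)
    ORDER BY last_name
""").fetchall()

# Join: the final result set can hold columns from both tables.
join_rows = conn.execute("""
    SELECT e.last_name, d.department_name
    FROM employee_table AS e
    INNER JOIN department_table AS d
      ON e.dept_no = d.dept_no
    ORDER BY e.last_name
""").fetchall()

print(subquery_rows)
print(join_rows)
```

Both forms keep the same two employees; only the join can show the department name beside each one.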
customer_table
customer_number customer_name
11111111 Billy’s Best Choice
31313131 Acme Products
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U
order_table
order_number customer_number order_total
123456 11111111 12347.53
123512 11111111 8005.91
123552 31323134 5111.47
123585 87323456 15231.62
123777 57896883 23454.84
Write the Subquery
Here is your opportunity to show how smart you are. Write a Subquery that will bring back everything from the
customer_table if the customer has placed an order in the order_table. Good luck! Advice: Look for the key
shared by both tables!
The shared key among both tables is customer_number. The bottom query runs first and delivers a distinct list of
customer numbers, which the top query uses in the IN-List!
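A minimal sketch of that answer follows. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, and the query text is one plausible way to write the subquery the paragraph describes rather than a copy of the book's slide.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL so the example is runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (customer_number INT, customer_name TEXT)")
conn.execute("CREATE TABLE order_table (order_number INT, customer_number INT, order_total REAL)")
conn.executemany("INSERT INTO customer_table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

# The bottom query lists the customer numbers that appear in order_table;
# the top query returns the matching customer rows.
customers_with_orders = conn.execute("""
    SELECT customer_number, customer_name
    FROM customer_table
    WHERE customer_number IN (SELECT customer_number FROM order_table)
    ORDER BY customer_number
""").fetchall()
print(customers_with_orders)
```

Customer 11111111 placed two orders but comes back once, and Acme Products (31313131) is absent because it has no orders.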
Here is your opportunity to show how smart you are. Write a Subquery that will bring back everything from the
customer_table if the customer has placed an order in the order_table that is greater than $10,000.00.
Write SQL that will bring back an answer set that selects all columns from
the student_table if that student is taking a course that has four (4) credits.
Use a subquery to get the answer set requested above. The answer is on the next page.
SELECT *
FROM student_table
WHERE student_id IN
(SELECT student_id
FROM student_course_table
WHERE course_id IN
(SELECT course_id
FROM course_table
WHERE credits=4))
Above is something to enjoy and learn from in your quest to master subqueries.
Another opportunity knocking! Would someone please answer the query door?
SELECT *
FROM employee_table
WHERE salary > (
SELECT AVG(salary)
FROM employee_table) ;
Nailed it!
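The query above can be checked with a small sketch. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, and only last_name and salary are loaded. Here the bottom query returns a single value (the average salary) rather than a list, so the top query compares against it with > instead of IN.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; only two columns are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (last_name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?)", [
    ("Jones", 32800.50), ("Smythe", 64300.00), ("Chambers", 48850.00),
    ("Coffing", 41888.88), ("Smith", 48000.00), ("Larkins", 40200.00),
    ("Strickling", 54500.00), ("Reilly", 36000.00), ("Harrison", 54500.00),
])

# The bottom query produces one number; the top query keeps salaries above it.
average_salary = conn.execute("SELECT AVG(salary) FROM employee_table").fetchone()[0]
above_average = [row[0] for row in conn.execute("""
    SELECT last_name
    FROM employee_table
    WHERE salary > (SELECT AVG(salary) FROM employee_table)
    ORDER BY last_name
""")]

print(round(average_salary, 2))
print(above_average)
```

With the nine salaries in the table the average is about 46782.15, so five employees come back.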
Another opportunity knocking! The query is complicated. Only the best get this written correctly.
The example above is a correlated subquery. It works differently from a normal subquery: the top query runs first,
and then the bottom query runs once for each distinct dept_no that the top query supplies.
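A minimal sketch of a correlated subquery follows. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, the aliases e1 and e2 are chosen for this example, and the query is one plausible way to find employees who earn more than the average of their own department.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; aliases e1 and e2 are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (dept_no INT, last_name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?)", [
    (None, "Jones", 32800.50), (10, "Smythe", 64300.00), (100, "Chambers", 48850.00),
    (200, "Coffing", 41888.88), (200, "Smith", 48000.00), (300, "Larkins", 40200.00),
    (400, "Strickling", 54500.00), (400, "Reilly", 36000.00), (400, "Harrison", 54500.00),
])

# The bottom query references e1.dept_no from the top query, so it is
# evaluated per department rather than once for the whole table.
above_dept_average = [row[0] for row in conn.execute("""
    SELECT e1.last_name
    FROM employee_table AS e1
    WHERE e1.salary > (SELECT AVG(e2.salary)
                       FROM employee_table AS e2
                       WHERE e2.dept_no = e1.dept_no)
    ORDER BY e1.last_name
""")]
print(above_dept_average)
```

Employees who are alone in their department never beat their own average, and Jones drops out because a null dept_no matches nothing.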
Both queries above will bring back all employees making a salary that is greater than the average salary in their
department. The most significant difference is that the Join with the Derived Table also shows the average salary
in the result set.
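The derived-table form mentioned above can be sketched like this. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, and the alias d and the column name avg_salary are made up for this example.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; alias d and avg_salary are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (dept_no INT, last_name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?, ?)", [
    (None, "Jones", 32800.50), (10, "Smythe", 64300.00), (100, "Chambers", 48850.00),
    (200, "Coffing", 41888.88), (200, "Smith", 48000.00), (300, "Larkins", 40200.00),
    (400, "Strickling", 54500.00), (400, "Reilly", 36000.00), (400, "Harrison", 54500.00),
])

# The derived table computes one average per department; joining to it
# lets the result set show that average next to each employee.
rows = conn.execute("""
    SELECT e.last_name, e.salary, ROUND(d.avg_salary, 2) AS avg_salary
    FROM employee_table AS e
    INNER JOIN (SELECT dept_no, AVG(salary) AS avg_salary
                FROM employee_table
                GROUP BY dept_no) AS d
      ON e.dept_no = d.dept_no
    WHERE e.salary > d.avg_salary
    ORDER BY e.last_name
""").fetchall()

for last_name, salary, avg_salary in rows:
    print(last_name, salary, avg_salary)
```

The same three employees return as with the correlated subquery, but each row now carries its department average as an extra column.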
sales_table
product_id sale_date daily_sales
1000 10/02/2000 32800.50
1000 09/30/2000 36000.07
1000 10/01/2000 40200.43
2000 10/04/2000 32800.50
2000 10/02/2000 36021.93
2000 09/28/2000 41888.88
3000 10/04/2000 15675.33
3000 10/02/2000 19678.94
3000 10/03/2000 21553.79
Write the Correlated Subquery
Another opportunity knocking! You now have a second chance. I will even give you a third chance.
All you must do is alias both tables and then correlate in the WHERE clause.
sales_table (not all rows are displayed)
product_id sale_date daily_sales
1000 10/02/2000 32800.50
1000 09/30/2000 36000.07
1000 10/01/2000 40200.43
2000 10/04/2000 32800.50
2000 10/02/2000 36021.93
2000 09/28/2000 41888.88
3000 10/04/2000 15675.33
3000 10/02/2000 19678.94
3000 10/03/2000 21553.79
Write the Correlated Subquery
All you must do is alias both tables and then correlate in the WHERE clause.
student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
231222 Wilson Susie SO 3.80
280023 McRoberts Richard JR 1.90
322133 Bond Jimmy JR 3.95
125634 Hanson Henry FR 2.88
333450 Smith Andy SO 2.00
324652 Delaney Danny SR 3.35
260000 Johnson Stanley ? ?
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
Write the Correlated Subquery
Another opportunity knocking! Get this down, and your future will skyrocket.
All you have to do is alias both tables and then correlate in the WHERE clause.
Use a subquery to get the answer set requested above. No joins allowed! The answer is on the next page.
SELECT *
FROM course_table
WHERE course_id IN
(SELECT course_id
FROM student_course_table
WHERE student_id IN
(SELECT student_id
FROM student_table AS s1
WHERE grade_pt >
(SELECT AVG(grade_pt)
FROM student_table AS s2
WHERE s1.class_code=s2.class_code))) ;
course_id course_name credits seats
200 Introduction to SQL 3 20
100 Databricks Concepts 3 50
220 V2R3 SQL Features 2 25
300 Physical Database Design 4 20
210 Advanced SQL 3 22
Select all columns in the department_table if the dept_no is not in the employee_table.
SELECT *
FROM department_table
WHERE dept_no NOT IN
    (SELECT dept_no
     FROM employee_table)
NO DATA RETURNS
This is because when a NOT IN encounters a null, it freaks out.
When a NOT IN subquery encounters a null value in the list, it always returns nothing. Comparing any value with a
null yields unknown rather than true or false, so the system can never confirm that a dept_no is absent from the
list. The next page shows a technique to get around this problem.
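The null problem is easy to reproduce in a small sketch. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL (both follow the standard three-valued logic for NOT IN), and employee_table is reduced to its dept_no column. The second query filters the nulls out of the bottom query.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; employee_table is reduced to dept_no.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_table (dept_no INT);
CREATE TABLE department_table (dept_no INT, department_name TEXT);
INSERT INTO employee_table VALUES
  (NULL), (10), (100), (200), (200), (300), (400), (400), (400);
INSERT INTO department_table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

# The list contains a null, so no dept_no can be proven absent from it.
with_null = conn.execute("""
    SELECT * FROM department_table
    WHERE dept_no NOT IN (SELECT dept_no FROM employee_table)
""").fetchall()

# Removing the nulls from the bottom query lets NOT IN behave as expected.
without_null = conn.execute("""
    SELECT * FROM department_table
    WHERE dept_no NOT IN (SELECT dept_no
                          FROM employee_table
                          WHERE dept_no IS NOT NULL)
""").fetchall()

print(with_null)
print(without_null)
```

The first query returns no rows at all; the second returns department 500, the only department with no employees.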
When a NOT IN subquery encounters a null value in the list, it always returns nothing. The system can't eliminate a
value if it doesn't know what is in the null. That is why you put a WHERE clause in the bottom query to filter out
the nulls.
SELECT *
FROM customer_table
WHERE customer_number NOT IN
    (SELECT customer_number
     FROM order_table
     WHERE customer_number IS NOT NULL) ;
Nulls are a NOT IN nightmare. Notice how I account for them!
Whenever you have a NOT IN query, make sure you eliminate null values from being in the list.
Another opportunity to show your brilliance is ready for you to make it happen.
Get ready to be amazed at either yourself or the answer on the next page!
What is the highest dollar order for each Customer? This Subquery involves two parameters. The example above
is how you utilize multiple parameters in a subquery!
customer_number Max(order_total)
11111111 12347.53
31323134 5111.47
87323456 15231.62
57896883 23454.84
These 4 rows are sent to the top query.
The bottom query runs first, returning two columns. Turn to the next page for more info!
Once the in-list builds, we can process the top query and get the final answer set.
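A two-column subquery of this kind can be sketched as follows. It is an illustration under assumptions: the book's exact query is not reproduced here, so the sketch assumes the top query reads order_table, and Python's built-in sqlite3 module stands in for Databricks SQL. Both engines accept a parenthesized column pair on the left of IN.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL, and the top query is assumed to read order_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_table (order_number INT, customer_number INT, order_total REAL)")
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

# The bottom query returns one (customer_number, max order_total) pair per customer;
# the top query keeps the orders whose pair of values appears in that list.
highest_orders = conn.execute("""
    SELECT order_number, customer_number, order_total
    FROM order_table
    WHERE (customer_number, order_total) IN
          (SELECT customer_number, MAX(order_total)
           FROM order_table
           GROUP BY customer_number)
    ORDER BY order_number
""").fetchall()

for row in highest_orders:
    print(row)
```

Order 123512 is the only one left out, because customer 11111111 has a larger order (123456).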
Above, we have a subquery that matches up subscriber_no and member_no because you need both columns to
distinguish an individual policyholder filing a claim. Notice that the two columns are wrapped in parentheses in the
top query but not in the bottom query, which is the key to success for double-parameter subqueries.
Good luck in writing this query. Remember that this will involve multiple subqueries.
The EXISTS command determines, via a Boolean, whether something is True or False. If a customer has placed an
order, it EXISTS, so with the Correlated EXISTS statement only customers who have placed an order return in the
answer set. EXISTS is different from IN in that it is less restrictive, as you will soon understand.
Only customers who placed an order return with the above Correlated EXISTS.
Use NOT EXISTS to find which Customers have NOT placed an Order.
The EXISTS command determines, via a Boolean, whether something is True or False. If a customer has placed an
order, it EXISTS, so with the Correlated NOT EXISTS statement only customers who have not placed an order
return in the answer set. EXISTS is different from IN in that NOT EXISTS can handle null values.
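A minimal sketch of the correlated EXISTS and NOT EXISTS forms follows. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, the aliases c and o are chosen for this example, and one hypothetical order row with a null customer_number (not in the book's table) is added to show the difference from NOT IN.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; aliases c and o are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (customer_number INT, customer_name TEXT)")
conn.execute("CREATE TABLE order_table (order_number INT, customer_number INT, order_total REAL)")
conn.executemany("INSERT INTO customer_table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO order_table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
    (999999, None, 1.00),  # hypothetical row with a null customer_number
])

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

placed_order = names("""
    SELECT c.customer_name FROM customer_table AS c
    WHERE EXISTS (SELECT 1 FROM order_table AS o
                  WHERE o.customer_number = c.customer_number)
    ORDER BY c.customer_number
""")
no_order_not_exists = names("""
    SELECT c.customer_name FROM customer_table AS c
    WHERE NOT EXISTS (SELECT 1 FROM order_table AS o
                      WHERE o.customer_number = c.customer_number)
""")
no_order_not_in = names("""
    SELECT customer_name FROM customer_table
    WHERE customer_number NOT IN (SELECT customer_number FROM order_table)
""")

print(placed_order)
print(no_order_not_exists)
print(no_order_not_in)
```

NOT EXISTS still finds Acme Products despite the null row, while the unfiltered NOT IN returns nothing.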
Chapter 10 – Strings
SELECT first_name
,UPPER (first_name) as upper_case
,lower(first_name) as lower_case
FROM student_table
first_name upper_case lower_case
Martin MARTIN martin
Henry HENRY henry
Susie SUSIE susie
Wendy WENDY wendy
Stanley STANLEY stanley
Richard RICHARD richard
Jimmy JIMMY jimmy
Danny DANNY danny
Andy ANDY andy
Michael MICHAEL michael
The UPPER and LOWER functions convert the input string to either all uppercase or lowercase characters.
The last_name column is CHAR (20).
SELECT last_name
,LENGTH(last_name) AS lnth_wrong
,LENGTH(TRIM(last_name)) as lnth_right
FROM employee_table
ORDER BY 1;
last_name lnth_wrong lnth_right
Chambers 20 8
Coffing 20 7
Harrison 20 8
Jones 20 5
Larkins 20 7
Reilly 20 6
Smith 20 5
Smythe 20 6
Strickling 20 10
Because last_name is a char(20) column, the LENGTH command counts the padding and brings back a length of 20,
as it does on many systems. Databricks can still deliver the true length of a char(20) string by using the TRIM
command to remove the leading and trailing spaces first.
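You can use the CHAR_LENGTH and OCTET_LENGTH commands equivalently here. CHAR_LENGTH counts characters
and OCTET_LENGTH counts bytes, so for single-byte data such as these names, the queries get the same answer sets!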
Query 1
SELECT last_name
,Trim(last_name) AS no_spaces
FROM employee_table ;
Query 2
SELECT last_name
,Trim(Both from last_name) AS no_spaces
FROM employee_table ;
Both queries trim both the leading and trailing spaces from the last_name column for the life of the query.
RTRIM Query
SELECT last_name
,RTRIM(last_name) AS trim_trailing_spaces
FROM employee_table ;
LTRIM Query
SELECT last_name
,LTRIM(last_name) AS trim_leading_spaces
FROM employee_table ;
The RTRIM command trims trailing spaces from a character string. The LTRIM trims leading spaces from a
character string. The LTRIM(RTRIM) combination trims both leading and trailing spaces from a character string.
SELECT first_name
,Trim(TRAILING 'y' FROM first_name) AS no_y
FROM employee_table
WHERE first_name LIKE '%y';
first_name no_y
Squiggy Squigg
Billy Bill
Concatenation
SELECT first_name
,last_name
,first_name || ' ' || last_name as full_name
FROM employee_table
WHERE first_name = 'Squiggy'
Two pipe symbols mean concatenate. The ' ' is a literal space in single quotes.
first_name last_name full_name
Squiggy Jones Squiggy Jones
Two pipe symbols represent concatenation. That allows you to combine multiple columns into one column. The ||
(Pipe Symbol) on your keyboard is just above the ENTER key. Don’t put a space in between; just put two Pipe
Symbols together. In this example, we have combined the first name, then a single space, and then the last name to
get a new column called full_name.
SELECT first_name,
SUBSTR(first_name, 2 , 3) AS sub1,
SUBSTRING(first_name, 2, 3) AS sub2,
SUBSTRING(first_name from 2 for 3) as sub3
FROM employee_table
WHERE dept_no = 400;
Start in position 2 and go for 3 positions.
The above example shows the SUBSTR command, which can also use the keyword SUBSTRING. After the string itself,
the Substring function receives two parameters: the starting position in the string and the number of characters to
return (from the starting position). The above example will start in position two and go for three positions!
SELECT first_name,
SUBSTR(first_name, 2) AS gotoend
FROM employee_table ;
Start in position 2.
first_name gotoend
Squiggy quiggy
John ohn
Richard ichard
Herbert erbert
Mandee andee
Cletus letus
William illiam
Billy illy
Loraine oraine
Since there is only one parameter (starting position), the results bring all remaining characters back.
If you don’t tell the substring how many positions to go for, it will go all the way to the end.
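The two forms of SUBSTR can be compared side by side in a small sketch. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL (both accept SUBSTR with a start position and an optional length), and only four first names are loaded.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; only a few first names are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (first_name TEXT)")
conn.executemany("INSERT INTO employee_table VALUES (?)", [
    ("Squiggy",), ("Cletus",), ("William",), ("Herbert",),
])

# sub_3 starts in position 2 and goes for 3 positions;
# gotoend starts in position 2 and runs to the end of the string.
rows = conn.execute("""
    SELECT first_name
          ,SUBSTR(first_name, 2, 3) AS sub_3
          ,SUBSTR(first_name, 2)    AS gotoend
    FROM employee_table
    ORDER BY first_name
""").fetchall()

for row in rows:
    print(row)
```

For Squiggy, the first form returns qui and the second returns quiggy.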
The SQL above brings back the last two letters of each last_name even though the last names are of different
lengths. We SUBSTR last_name and nest a Char_Length call (a nested function rather than a true subquery) to get
our starting position. Notice that we want the starting position to be Char_Length - 1. We can then go for two
positions.
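The starting-position arithmetic can be sketched as follows. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, SQLite's LENGTH plays the role of Char_Length, and the names are stored without trailing padding (a padded CHAR(20) column would need a TRIM first).

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; LENGTH is used in place of Char_Length,
# and the names carry no trailing spaces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (last_name TEXT)")
conn.executemany("INSERT INTO employee_table VALUES (?)", [
    ("Jones",), ("Smith",), ("Strickling",),
])

# Start one position before the last character, then go for two positions.
rows = conn.execute("""
    SELECT last_name
          ,SUBSTR(last_name, LENGTH(last_name) - 1, 2) AS last_two_letters
    FROM employee_table
    ORDER BY last_name
""").fetchall()

for row in rows:
    print(row)
```

Jones has five characters, so the substring starts in position 4 and returns es.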
SELECT last_name
,Position('e' in last_name) AS find_the_e
,Position('f' in last_name) AS find_the_f
FROM employee_table ;
The example above uses the POSITION function. It tells you the position at which a letter is located in a string.
Why did Jones have a 4 in the result set? The ‘e’ was in the 4th position. Why did Smith get a zero for both
columns? There is no ‘e’ in Smith and no ‘f’ in Smith. If there are two ‘f’s, only the first occurrence reports.
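The same lookups can be reproduced in a small sketch. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for Databricks SQL, and because SQLite has no POSITION(... IN ...) syntax the sketch uses INSTR(string, substring), which returns the same 1-based position of the first occurrence, or 0 when nothing is found.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; INSTR replaces POSITION(... IN ...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (last_name TEXT)")
conn.executemany("INSERT INTO employee_table VALUES (?)", [
    ("Jones",), ("Smith",), ("Coffing",),
])

# Each call reports the position of the first occurrence, or 0 if there is none.
rows = conn.execute("""
    SELECT last_name
          ,INSTR(last_name, 'e') AS find_the_e
          ,INSTR(last_name, 'f') AS find_the_f
    FROM employee_table
    ORDER BY last_name
""").fetchall()

for row in rows:
    print(row)
```

Coffing contains two f's, and only the first one (position 3) is reported.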
What was the starting position of the Substr in the above query? It was one. The length (FOR) is calculated by
looking for the first space. So, for “Research and Development,” the substring starts in position one and goes for
nine.
SELECT last_name
,CHARINDEX ('e', last_name) AS find_e
,CHARINDEX ('f', last_name) AS find_f
,CHARINDEX ('th', last_name) AS find_th
,CHARINDEX ('in', last_name, 6) AS find_in_after_6
FROM employee_table
WHERE TRIM(last_name) IN ('Smith', 'Smythe', 'Strickling', 'Coffing')
ORDER BY 1 DESC;
Tell this function what character(s) to look for in a string and, optionally, the position at which to start
searching. If it does not find the character(s) in the string, it returns a 0. It also only reports the first occurrence.
SELECT last_name
,SUBSTRING (last_name, CHARINDEX(' ', last_name) -2 , 2) as last_two_letters
from employee_table;
last_name last_two_letters
Smythe he
Strickling ng
Chambers rs
Harrison on
Coffing ng
Smith th
Jones es
Larkins ns
Reilly ly
What was the starting position of the Substring in the above query? It uses a nested CHARINDEX call to determine
the starting position. The CHARINDEX finds the first space (end of the name) and then subtracts 2 to get the starting
position. Even though the names were of different lengths, the query brings back only the last two letters.
may_flowers_position
21
We are looking for the phrase 'May flowers'. The phrase starts in position 21 of the string.
SELECT first_name || ' ' || last_name as name
,Length(first_name) as len
,LPad(first_name, 10) as left_spaces
,Length(LPad(first_name, 10)) as l10
,RPad(last_name, 15) as right_spaces
,Length(RPAD(last_name, 15)) as l15
FROM employee_table
WHERE first_name LIKE 'M%' ;
name             len  left_spaces  l10  right_spaces     l15
Mandee Chambers  6        Mandee   10   Chambers         15
The LPAD () command pads spaces to the left of a string, and the RPAD () pads spaces to the right of a string.
Notice the spaces in the answer set and the lengths.
SELECT customer_name
,REPLACE (customer_name, ' ', '_') AS under_score
,phone_number
,REPLACE (phone_number, '-', ' ') AS no_dash
FROM customer_table
customer_name under_score phone_number no_dash
Billy's Best Choice Billy's_Best_Choice 555-1234 555 1234
Acme Products Acme_Products 555-1111 555 1111
ACE Consulting ACE_Consulting 555-1212 555 1212
XYZ Plumbing XYZ_Plumbing 347-8954 347 8954
Databases N-U Databases_N-U 322-1012 322 1012
Replace spaces with underscores Replace dashes with spaces
The REPLACE function replaces one value with another in a string. Above, we have replaced the spaces in a customer
name with underscores. In the phone number, we have replaced the dashes (-) with spaces.
SELECT
ASCII('H') as asciih
,ASCII('o') as asciio
,ASCII('w') as asciiw
,ASCII('d') as asciid
,ASCII('y') as asciiy
The example above shows you how to convert characters into the integer ASCII value.
Syntax: REVERSE(str)
SELECT first_name
,REVERSE(first_name) as backward
FROM employee_table
WHERE dept_no = 400;
first_name backward
Herbert trebreH
William mailliW
Cletus sutelC
The example above shows how the REVERSE function returns the string with the order of the characters reversed.
SELECT department_name
,RIGHT(TRIM(department_name), 4) right_4_char
FROM department_table
department_name right_4_char
Research and Develop elop
Marketing ting
Customer Support port
Sales ales
Human Resources rces
The RIGHT function returns the rightmost n characters from the string, or null if any argument is null.
The LEFT and RIGHT functions are abbreviations of the SUBSTRING function. They
return a requested number of characters from the left or right end of the input string.
SELECT first_name
,LEFT (first_name , 1) AS first_initial
,last_name
,Right (RTRIM(last_name), 2) AS last_two_letters
FROM employee_table
WHERE dept_no in (400) ;
first_name first_initial last_name last_two_letters
Cletus C Strickling ng
Herbert H Harrison on
William W Reilly ly
In our example above, the result set returns first_name and last_name, and we also use the LEFT and RIGHT functions to produce the first letter of first_name and the last two letters of last_name. We filtered the rows with a WHERE clause to bring back only three rows. Notice the RTRIM of last_name. The trim is necessary because the last_name column has a data type of CHAR(20), so the system pads it with trailing spaces.
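To make the relationship to SUBSTRING concrete, the sketch below uses string literals instead of the employee_table so it can run on its own. The padded literal is an assumption that stands in for a CHAR(20) value.

```sql
-- LEFT and RIGHT expressed next to their SUBSTRING equivalents.
SELECT LEFT('Strickling', 1)             AS left_1          -- S
      ,SUBSTRING('Strickling', 1, 1)     AS substr_left_1   -- S
      ,RIGHT('Strickling', 2)            AS right_2         -- ng
      ,SUBSTRING('Strickling', -2)       AS substr_right_2  -- ng (negative start counts from the end)
      ,RIGHT('Strickling   ', 2)         AS untrimmed       -- two spaces, because of the padding
      ,RIGHT(RTRIM('Strickling   '), 2)  AS trimmed         -- ng
;
```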
REGEXP returns true if the subject matches the specified pattern. Both inputs must be text expressions.
The REGEXP above looks for a city that starts with ‘Harrison’ but has additional characters (no whitespace).
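The original query is not reproduced in this text, so the following is only a sketch of what such a pattern could look like. The inline city values and the exact pattern are assumptions, not the author's example.

```sql
-- '^Harrison' anchors the match at the start of the string;
-- '[^ ]+' then requires at least one more character that is not a space.
SELECT city
      ,city REGEXP '^Harrison[^ ]+' AS matches
FROM VALUES ('Harrison'), ('Harrisonburg'), ('Harrison City'), ('New Harrisonville') AS t(city) ;
-- Harrison          -> false (nothing follows)
-- Harrisonburg      -> true
-- Harrison City     -> false (a space follows immediately)
-- New Harrisonville -> false (does not start with Harrison)
```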
The REGEXP examples above look for strings that start or don't start with one, two, or three.
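Since the original examples are not preserved here, this is a hedged sketch of the same idea. The sample strings are assumptions; the alternation pattern is one of several ways to write it.

```sql
-- '^(one|two|three)' matches strings that begin with one, two, or three.
-- NOT REGEXP inverts the test.
SELECT s
      ,s REGEXP '^(one|two|three)'     AS starts_with
      ,s NOT REGEXP '^(one|two|three)' AS does_not_start_with
FROM VALUES ('one fish'), ('twofold'), ('the three'), ('four') AS t(s) ;
```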
The REGEXP examples above look for consecutive letters at the end of the string.
REGEXP_REPLACE
SELECT dept_no
,REGEXP_REPLACE(dept_no, '0', '1') AS zero_to_1
FROM employee_table
WHERE dept_no IN (100, 200) ;
Replace 0 with 1 for dept_no
DEPT_NO zero_to_1
200 211
200 211
100 111
Regexp_Replace returns the subject with the specified pattern (or all pattern occurrences) either removed or
replaced by a replacement string. If it finds no matches in the subject, Regexp_Replace returns the original subject.
For example, the query above uses Regexp_Replace to replace any zero with a one for Dept_No.
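The three behaviors described above (replace, no match, and removal) can be sketched with literals. The values are illustrative and do not come from the employee_table.

```sql
SELECT REGEXP_REPLACE('100', '0', '1')     AS zero_to_1     -- 111, every match is replaced
      ,REGEXP_REPLACE('100', '9', '1')     AS no_match      -- 100, original string returned
      ,REGEXP_REPLACE('555-1234', '-', '') AS dash_removed  -- 5551234, empty replacement removes the match
;
```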
REGEXP_REPLACE Example
SELECT FIRST_NAME
,REGEXP_REPLACE(first_name, '^W', 'W starts W') as replaceW
,REGEXP_REPLACE(first_name, 'y$', 'y Ends with y') as replace_y
,REGEXP_REPLACE(first_name, 'Wendy', 'Wendi') as replace
,REGEXP_REPLACE(first_name, 'Wendy', '*****') as encrypt
FROM student_table
WHERE first_name = 'Wendy'
REGEXP_REPLACE searches for a specific Regex pattern from the provided string(value) and replaces it with
whatever you specify. Above, we are using many different techniques.
SELECT CUSTOMER_NAME
,REGEXP_REPLACE(customer_name, ' ', '_') AS underscore
,PHONE_NUMBER
,REGEXP_REPLACE(phone_number, '-', ' ') AS no_dash
FROM customer_table
REGEXP_REPLACE searches for a specific Regex pattern from the provided string(value) and replaces it with
whatever you specify. Above, we are replacing spaces with underscores and then replacing dashes with spaces.
REGEXP_LIKE
REGEXP_LIKE returns true if the subject matches the pattern. Both expressions must be text expressions. Our
example above is case-sensitive.
RLIKE
RLIKE returns true if the subject matches the specified pattern. Both inputs must be text expressions. RLIKE is a relative of the LIKE function, but it uses Java regular expressions instead of SQL LIKE pattern syntax. Thus, it supports more complex matching conditions than LIKE.
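A small side-by-side sketch shows where RLIKE goes beyond LIKE. The inline names are assumptions chosen for illustration.

```sql
-- LIKE can only say "starts with Sm"; RLIKE can say "Sm, then i or y, then th".
SELECT last_name
      ,last_name LIKE 'Sm%'        AS like_sm
      ,last_name RLIKE '^Sm[iy]th' AS rlike_smith_or_smyth
FROM VALUES ('Smith'), ('Smythe'), ('Small'), ('Jones') AS t(last_name) ;
-- Small passes the LIKE test but fails the RLIKE test.
```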
SOUNDEX returns a phonetic code for a string, so names that sound alike receive the same code.
The example below will find any last_name that sounds like 'Smith'.
Syntax: SOUNDEX(String)
SELECT DISTINCT
SOUNDEX(last_name) as soundslike1
,SOUNDEX('Smith') as soundslike2
,last_name
FROM employee_table
WHERE SOUNDEX(last_name) = SOUNDEX('Smith');
Call center employees often look up customers by their last name while speaking with the customer on the phone.
The employees would like to guess the spelling of the name to narrow the search results and then work with the
customer to determine the appropriate spelling. The SOUNDEX function searches for similar sounds. Above, we
are looking at anyone with a name that sounds like 'Smith.' We got two results back in 'Smith' and 'Smythe.'
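To see why 'Smith' and 'Smythe' match, it helps to look at the codes themselves. This sketch uses literals only.

```sql
-- Names that sound alike produce the same four-character code.
SELECT SOUNDEX('Smith')  AS smith   -- S530
      ,SOUNDEX('Smythe') AS smythe  -- S530
      ,SOUNDEX('Jones')  AS jones   -- J520
;
```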
Chapter 11 Interrogating the Data
- Albert Einstein
student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
123250 Phillips Martin SR 3.00
234121 Thomas Wendy FR 4.00
SELECT last_name
,NULLIF(grade_pt, 0) AS gp1
,NULLIF(grade_pt, 3.0) AS gp2
,NULLIF(grade_pt, 4.0) AS gp3
FROM student_table
WHERE student_id IN (423400, 123250, 234121)
ORDER BY last_name ;
The NULLIF function above examines the column grade_pt. In the first NULLIF, if grade_pt = 0, the result becomes null; otherwise, grade_pt is returned unchanged.
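One way to read NULLIF is as shorthand for a CASE expression. The sketch below shows both forms side by side on inline values that mirror the three grade points; the alias gp1_as_case is hypothetical.

```sql
SELECT grade_pt
      ,NULLIF(grade_pt, 0)                                AS gp1
      ,CASE WHEN grade_pt = 0 THEN NULL ELSE grade_pt END AS gp1_as_case
FROM VALUES (0.00), (3.00), (4.00) AS t(grade_pt) ;
-- Only the 0.00 row returns null in both columns.
```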
student_table
student_id last_name first_name class_code grade_pt
423400 Larkins Michael FR 0.00
123250 Phillips Martin SR 3.00
234121 Thomas Wendy FR 4.00
SELECT last_name
,NULLIF(grade_pt, 0) AS gp1
,NULLIF(grade_pt, 3.0) AS gp2
,NULLIF(grade_pt, 4.0) AS gp3
FROM student_table
WHERE student_id IN (423400, 123250, 234121)
ORDER BY last_name ;
Larkins has a null in gp1, Phillips has a null in gp2, and Thomas has a null in gp3; every other value comes back unchanged. If it doesn't make sense, go over it again until it does.
Coalesce returns the first non-null value in a list; if all values are Null, it returns Null. Notice the query on the left
shows the employee's name and what numbers they have available. The query on the right uses the coalesce
command to attempt first to call their work_phone, but if that is null, they call the cell_phone, and if that is null,
they call the home_phone. If all the number values are null, then a literal 'No Phone' is the entry.
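The original queries are not reproduced in this text. A minimal sketch of the phone lookup, assuming inline rows with hypothetical phone numbers, could look like this:

```sql
-- COALESCE walks the list left to right and returns the first non-null value.
SELECT last_name
      ,COALESCE(work_phone, cell_phone, home_phone, 'No Phone') AS phone_to_call
FROM VALUES
   ('Jones',  NULL,       '555-0101', '555-0102')
  ,('Smith',  '555-0201', NULL,       NULL)
  ,('Reilly', NULL,       NULL,       NULL)
  AS t(last_name, work_phone, cell_phone, home_phone) ;
-- Jones -> 555-0101, Smith -> 555-0201, Reilly -> No Phone
```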
employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00
COALESCE returns the first non-null value in a list
SELECT last_name
,COALESCE (dept_no, employee_no) as coal
FROM employee_table
WHERE TRIM(last_name) IN ('Jones', 'Reilly')
ORDER BY 1
last_name coal
Jones 2000000
Reilly 400
Coalesce returns the first non-null value in a list, and if all values are Null returns Null.
SELECT last_name
,COALESCE (dept_no, employee_no) as coal
FROM employee_table
SELECT last_name
, CASE
WHEN dept_no IS not null THEN dept_no
WHEN employee_no IS not null THEN employee_no
ELSE null
END as coal
FROM employee_table ;
Coalesce returns the first non-null value in a list, and if all values are Null, returns Null. Above are two queries
that return the same answer set. These examples should give you a better idea of how Coalesce works.
SELECT
CAST('ABCDE' AS CHAR(1) ) AS trunc
,CAST(128 AS CHAR(3) ) AS ok
,CAST('2023-05-30' as Date) as date1
,'2023-06-30'::DATE as implied_cast
The Databricks CAST function converts a value from one data type to another data type. Above are some
examples to get a better understanding of the CAST command.
SELECT
CAST(.014 AS Decimal(3,2)) as a014
,CAST(.016 AS Decimal(3,2)) as a016
,CAST(.015 AS Decimal(3,2)) as a015
,CAST(.0150 AS Decimal(3,2)) as a0150
,CAST(.0250 AS Decimal(3,2)) as a0250
,CAST(.0159 AS Decimal(3,2)) as a0159
Digit to right of rounding digit < 5 (no change)
Digit to right of rounding digit > 5 (increase 1)
Digit to right of rounding digit = 5 AND there are trailing non-zero digits: rounding behaves as if the value to the right of the rounding digit > 5
The examples above will help you understand how complicated and tricky rounding can be.
The second example is better unless you have a simple query like the first example.
The query above uses both a Valued Case and Searched Case.
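The original query is not preserved in this text. As a sketch of the two forms, assuming the student_table shown earlier in this chapter (the labels such as 'Freshman' and 'Honors' are hypothetical):

```sql
SELECT last_name
      ,class_code
      ,grade_pt
      ,CASE class_code               -- valued CASE: one expression, equality tests only
          WHEN 'FR' THEN 'Freshman'
          WHEN 'SR' THEN 'Senior'
          ELSE 'Other'
       END AS class_name
      ,CASE                          -- searched CASE: each WHEN is a full condition
          WHEN grade_pt >= 3.0 THEN 'Honors'
          WHEN grade_pt >  0   THEN 'Passing'
          ELSE 'No grade'
       END AS standing
FROM student_table ;
```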
Decode
course_name creditalias
Advanced SQL Three credits
Database Administration Credits Not found
Introduction to SQL Three credits
Physical Database Design Credits Not found
SQL Features Two Credits
Databricks Concepts Three credits
The decode command works like the CASE command. The two queries above are equivalent.
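The two queries themselves are not reproduced in this text. The sketch below shows what an equivalent DECODE and CASE pair could look like; the table name course_table and the column credits are assumptions inferred from the result set.

```sql
-- DECODE(expr, key1, value1, key2, value2, default)
SELECT course_name
      ,DECODE(credits, 3, 'Three credits', 2, 'Two Credits', 'Credits Not found') AS creditalias
FROM course_table ;

-- The same logic as a valued CASE
SELECT course_name
      ,CASE credits
          WHEN 3 THEN 'Three credits'
          WHEN 2 THEN 'Two Credits'
          ELSE 'Credits Not found'
       END AS creditalias
FROM course_table ;
```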
Aggregates ignore Nulls, so knowing this trick allows for Horizontal Reporting.
I bet you didn't know you could put a CASE statement in the ORDER BY clause. You do now! Above, we are using a valued CASE because there is a column (class_code) immediately after the keyword CASE. A valued CASE can only check for equality, and only against the one column class_code.
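The original query is not preserved here. A minimal sketch of a valued CASE in the ORDER BY, assuming the student_table from this chapter and class codes 'FR', 'SO', 'JR', and 'SR', might be:

```sql
-- Sort by academic progression rather than alphabetically by class_code.
SELECT last_name
      ,class_code
FROM student_table
ORDER BY CASE class_code
            WHEN 'FR' THEN 1
            WHEN 'SO' THEN 2
            WHEN 'JR' THEN 3
            WHEN 'SR' THEN 4
            ELSE 5
         END ;
```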
I bet you didn't know you could put a DECODE statement in the ORDER BY clause. DECODE is much like a CASE statement but uses a different format. We are using DECODE on the column class_code. If the value of class_code is 'FR', then put in a 1, but if the value is 'SO', then put in a 2, and so on. If the value does not match 'FR', 'SO', 'JR', or 'SR', then put in a 5.
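The description above translates almost word for word into SQL. This is a sketch assuming the student_table from this chapter, not the original slide.

```sql
-- Key/value pairs, followed by 5 as the default for any other class_code.
SELECT last_name
      ,class_code
FROM student_table
ORDER BY DECODE(class_code, 'FR', 1, 'SO', 2, 'JR', 3, 'SR', 4, 5) ;
```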
Your mission is to use the PIVOT_TEST_REGION_ALL table and write a CASE statement to produce the result set above.
SELECT PRODUCT,
SUM(CASE WHEN sales_person = 'Mary Jones' THEN daily_sales
ELSE NULL END) as Mary_Jones,
SUM(CASE WHEN sales_person = 'Will Davis' THEN daily_sales
ELSE NULL END) as Will_Davis,
SUM(CASE WHEN sales_person = 'Gary Lewis' THEN daily_sales
ELSE NULL END) as Gary_Lewis,
SUM(CASE WHEN sales_person = 'Helen Smith' THEN daily_sales
ELSE NULL END) as Helen_Smith,
Mary_Jones + Will_Davis + Gary_Lewis + Helen_Smith as total_sales
FROM pivot_test_region_all
GROUP BY product
ORDER BY product;
Here is how we answered the extreme CASE challenge.
SELECT *
,CASE
WHEN dept_no = 200 THEN 'Winner'
WHEN salary BETWEEN 20000 and 40000 THEN 'Worker'
WHEN salary < 50000 THEN 'Manager'
WHEN salary < 60000 THEN 'VP'
WHEN salary < 900000 THEN 'CEO'
Else 'DON''T KNOW'
END as title
FROM employee_table ORDER BY dept_no NULLS LAST ;
employee_no dept_no last_name first_name salary title
1000234 10 Smythe Richard 64300.00 CEO
1232578 100 Chambers Mandee 48850.00 Manager
1333454 200 Smith John 48000.00 Winner
1324657 200 Coffing Billy 41888.88 Winner
2312225 300 Larkins Loraine 40200.00 Manager
1256349 400 Harrison Herbert 54500.00 VP
1121334 400 Strickling Cletus 54500.00 VP
2341218 400 Reilly William 36000.00 Worker
2000000 ? Jones Squiggy 32800.50 Worker
Our WHEN statements are in the best logical order to produce only one CEO.
Chapter 12 – Views
-Mahatma Gandhi
View Fundamentals
A view is a virtual table.
A view may define a subset of columns.
A view can even define a subset of rows if it has a WHERE clause.
A view never duplicates data or stores the data separately.
Views provide security.
View Advantages
An additional level of security is provided.
Helps the business user not miss join conditions.
Helps control read and update privileges.
Unaffected when new columns are added to a table.
Unaffected when a column is dropped unless it is referenced in the view.
The above information introduces view fundamentals and advantages. The most important things to understand about views are that they never duplicate data and that they can hide sensitive data from users.
employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00
Above, we create a view whose name is employee_v, and its creation does not include the employee_no or salary
columns. The users have access to the views, and the views have access to the actual tables.
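The CREATE VIEW statement itself is not reproduced in this text. Based on the description, a sketch of it could look like the following, assuming the employee_table shown above:

```sql
-- The view exposes every column except employee_no and salary.
CREATE VIEW employee_v AS
SELECT dept_no
      ,last_name
      ,first_name
FROM employee_table ;

-- Users query the view, not the underlying table.
SELECT * FROM employee_v ;
```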
employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00
The view example above demonstrates how a view can restrict rows. In the view, emp_200_V, the user can only
see rows from dept_no 200.
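The view definition is not preserved here. A hedged sketch of a row-restricting view with that name, assuming the employee_table shown above (the column list is an assumption):

```sql
-- The WHERE clause limits the view to department 200.
CREATE VIEW emp_200_v AS
SELECT employee_no
      ,dept_no
      ,last_name
      ,first_name
      ,salary
FROM employee_table
WHERE dept_no = 200 ;
```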
The view example above joins two tables together. By creating a view, we have now made it easier for the user
community to join these tables by merely selecting the columns you want from the view. Views can hide the
complexity of a query and allow users to access relevant information without being an SQL guru.
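The joined view is not reproduced in this text. As an illustration only, a view that hides a join between the employee_table and department_table used elsewhere in this book might be written as follows; the view name emp_dept_v is hypothetical.

```sql
CREATE OR REPLACE VIEW emp_dept_v AS
SELECT e.last_name
      ,e.first_name
      ,d.department_name
FROM employee_table AS e
INNER JOIN department_table AS d
   ON e.dept_no = d.dept_no ;

-- Users get joined data without writing the join condition.
SELECT last_name, department_name FROM emp_dept_v ;
```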
The CREATE OR REPLACE keywords change the definition of a view if it already exists; if the view does not exist, it is created.
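A short sketch of the statement, assuming the employee_table used in this chapter (the added WHERE clause is only there to show a changed definition):

```sql
-- Redefines employee_v in place; no DROP VIEW is needed first.
CREATE OR REPLACE VIEW employee_v AS
SELECT dept_no
      ,last_name
      ,first_name
FROM employee_table
WHERE dept_no IS NOT NULL ;
```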
There are EXCEPTIONS to the no ORDER BY rule inside a view. The ANSI OLAP statements always have an
ORDER BY statement in them, but these still work inside a View.
You should alias all derived columns in a query. You can refer to them when querying the view.
The ALIAS for salary / 12 in this example is sal_monthly, and this form of aliasing is the most popular. All
derived data must have an alias in a view. The keyword 'as' in the alias definition is optional.
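The view is not reproduced in this text. A sketch of the aliasing described, assuming the employee_table from this chapter and a hypothetical view name emp_monthly_v:

```sql
CREATE OR REPLACE VIEW emp_monthly_v AS
SELECT employee_no
      ,salary / 12 AS sal_monthly   -- the alias gives the derived column a name
FROM employee_table ;

-- The alias can then be referenced like any other column.
SELECT employee_no
      ,sal_monthly
FROM emp_monthly_v
WHERE sal_monthly > 4000 ;
```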
SELECT *
FROM e_view2
emp_nbr last
1324657 Coffing
1333454 Smith
You can create aliases for columns right after the view name or right after the column name. Above, we have done
it both ways just to see which alias will be accepted by default. Databricks takes the first alias.
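The CREATE VIEW statement is not preserved in this text. The sketch below is one definition that would be consistent with the result shown; the inner aliases emp and lname and the dept_no = 200 filter are assumptions.

```sql
-- Column names listed after the view name (emp_nbr, last) are the ones
-- the view exposes; the aliases inside the SELECT (emp, lname) are not visible.
CREATE OR REPLACE VIEW e_view2 (emp_nbr, last) AS
SELECT employee_no AS emp
      ,last_name   AS lname
FROM employee_table
WHERE dept_no = 200 ;

SELECT * FROM e_view2 ;
```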
You can create aliases for columns right after the view name or right after the column name. Above, we have done
it both ways to see which alias will be accepted by default. The first alias definition wins.
Chapter 13 Set Operators
"The man who doesn't read good books has no advantage over the man who
can't read them."
-Mark Twain
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
INTERSECT
SELECT * FROM table_blue ;
In this example, what numbers in the answer set would come from the query above?
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
INTERSECT
SELECT * FROM table_blue ;
In this example, only the number 3 was in both tables, so they intersect.
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
UNION
SELECT * FROM table_blue ;
In this example, what numbers in the answer set would come from the query above?
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
UNION
SELECT * FROM table_blue ;
1 2 3 4 5
Both top and bottom queries run simultaneously; then, the two different temporary files merge to eliminate
duplicates and place the remaining numbers in the answer set.
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
UNION ALL
SELECT * FROM table_blue ;
In this example, what numbers in the answer set would come from the query above?
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
UNION ALL
SELECT * FROM table_blue ;
1 2 3 3 4 5
Both top and bottom queries run simultaneously; then, the two different temp files merge to build the answer set.
The keyword ALL prevents eliminating duplicates.
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
EXCEPT
SELECT * FROM table_blue ;
EXCEPT delivers only the results of the top query, unless a value is
found in the bottom query, where it is removed. The bottom query
will never add results, but only take away from the top results.
In this example, what numbers in the answer set would come from the query above?
table_red table_blue
1 3
2 4
3 5
SELECT * FROM table_red
EXCEPT
SELECT * FROM table_blue ;
The only possible results are from Table_Red (1, 2, 3). Notice that Table_Blue contains a 3, so this eliminates the 3 from the final answer.
1 2
The Top query SELECTED 1, 2, 3 from Table_Red. From that point on, only 1, 2, 3 can return in the answer set.
The bottom query runs on Table_Blue and eliminates matches from the top query.
table_red table_blue
1 3
2 4
3 5
SELECT *
FROM table_blue
EXCEPT
SELECT *
FROM table_red ;
SELECT *
FROM table_red
EXCEPT
SELECT *
FROM table_blue ;
Will both queries bring back the same result set? Check out the next page to find out.
table_red table_blue
1 3
2 4
3 5
SELECT *
FROM table_blue
EXCEPT
SELECT *
FROM table_red ;
SELECT *
FROM table_red
EXCEPT
SELECT *
FROM table_blue ;
Will the result set be the same for both queries above?
No! The first query returns 4, 5, and the second query returns 1, 2. The answer set can only contain values from the first table mentioned. Values from the second query eliminate matches from the first table.
SELECT dept_no
,employee_no
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table;
Both queries have the same number of columns in the SELECT list.
dept_no employee_no
100 1232578
400 1256349
400 2341218
300 2312225
? 2000000
10 1000234
400 1121334
200 1324657
200 1333454
100 1256349
200 1000234
300 1333454
500 1121334
You must have an equal number of columns in both SELECT lists. The set operator compares whole rows to eliminate duplicates, so for comparison purposes, both queries must return the same number of columns.
SELECT dept_no as depty
,employee_no as the_mgr
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table;
Top query is responsible for the column ALIAS and formatting.
depty the_mgr
100 1232578
400 1256349
400 2341218
300 2312225
? 2000000
10 1000234
400 1121334
200 1324657
200 1333454
100 1256349
200 1000234
300 1333454
500 1121334
SELECT dept_no
,employee_no
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table
ORDER BY 1 ;
Bottom query is responsible for the ORDER BY
SELECT dept_no
,employee_no
FROM employee_table
UNION
SELECT dept_no
,mgr_no
FROM department_table
ORDER BY dept_no ;
Bottom query can use the column number or column name in the ORDER BY
dept_no employee_no
? 2000000
10 1000234
100 1256349
100 1232578
200 1324657
200 1333454
200 1000234
300 2312225
300 1333454
400 2341218
400 1121334
400 1256349
500 1121334
The Bottom Query is responsible for sorting and is the only place an ORDER BY statement works. You can use
the column number or the column name in the order by statement. You can even use the column name of
'employee_no' in the order by statement, even though it is from the top query.
Intersect Challenge
customer_table
customer_number customer_name
11111111 Billy’s Best Choice
31313131 Acme Products
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U
order_table
order_number customer_number order_total
123456 11111111 12347.53
123512 11111111 8005.91
123552 31323134 5111.47
123585 87323456 15231.62
123777 57896883 23454.84
Your mission is to write a query using the intersect operator to show all customers who have placed an order.
customer_table
customer_number customer_name
11111111 Billy’s Best Choice
31313131 Acme Products
31323134 ACE Consulting
57896883 XYZ Plumbing
87323456 Databases N-U
order_table
order_number customer_number order_total
123456 11111111 12347.53
123512 11111111 8005.91
123552 31323134 5111.47
123585 87323456 15231.62
123777 57896883 23454.84
This quiz's answer uses the set operator INTERSECT in the subquery.
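The answer query is not reproduced in this text. One way it could be written, assuming the customer_table and order_table shown above, is sketched here:

```sql
-- The subquery keeps only customer numbers that appear in both tables;
-- the outer query then returns the matching customer rows.
SELECT customer_number
      ,customer_name
FROM customer_table
WHERE customer_number IN
      (SELECT customer_number FROM customer_table
       INTERSECT
       SELECT customer_number FROM order_table)
ORDER BY customer_number ;
```

With the sample data, every customer except Acme Products (31313131) has placed an order.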
A UNION ALL gets better performance and uses fewer system resources than a UNION. Unless the user adds the ALL option, there is overhead to eliminate duplicate rows from each result set and from the final result.
Notice that the 2nd SELECT column is the literal 'employee  ' (with two trailing spaces), and the other literal is 'Department'. These literals match up because both are now precisely ten characters long. The UNION ALL brings back all employees and all departments and shows the employees in each valid department.
Combined_Custs: 2,000,000 rows of East and West customers
manager name
1256349 Harrison, Herbert
1333454 Smith, John
1000234 Smythe, Richard
1121334 Strickling, Cletus
The Derived Table gave us the employee number for all managers, and we were able to join it.
employee_table
employee_no dept_no last_name first_name salary
2000000 ? Jones Squiggy 32800.50
1000234 10 Smythe Richard 64300.00
1232578 100 Chambers Mandee 48850.00
1324657 200 Coffing Billy 41888.88
1333454 200 Smith John 48000.00
2312225 300 Larkins Loraine 40200.00
1121334 400 Strickling Cletus 54500.00
2341218 400 Reilly William 36000.00
1256349 400 Harrison Herbert 54500.00
department_table
dept_no department_name
100 Marketing
200 Research and Dev
300 Sales
400 Customer Support
500 Human Resources
Above, we use multiple set operators. They follow the natural order of precedence: in Databricks SQL, INTERSECT is evaluated first, and then UNION and EXCEPT are evaluated from left to right.
Above, we use multiple set operators and parentheses to change the order of precedence. Above, the EXCEPT runs first, then the INTERSECT, and lastly, the UNION. Without parentheses, Databricks SQL evaluates INTERSECT first, and then UNION and EXCEPT from left to right.
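Because default precedence is easy to misremember, parentheses are a safe way to state the order you intend. The sketch below uses inline values that mirror table_red and table_blue; the third set, named extra, is hypothetical.

```sql
-- UNION grouped first: (1,2,3,4,5) intersected with (4,9) leaves only 4.
(SELECT n FROM VALUES (1), (2), (3) AS table_red(n)
 UNION
 SELECT n FROM VALUES (3), (4), (5) AS table_blue(n))
INTERSECT
SELECT n FROM VALUES (4), (9) AS extra(n) ;

-- INTERSECT grouped first: (3,4,5) intersected with (4,9) is 4,
-- and the UNION with (1,2,3) then returns 1, 2, 3, 4.
SELECT n FROM VALUES (1), (2), (3) AS table_red(n)
UNION
(SELECT n FROM VALUES (3), (4), (5) AS table_blue(n)
 INTERSECT
 SELECT n FROM VALUES (4), (9) AS extra(n)) ;
```

The same three inputs give different answers depending only on the grouping, which is why explicit parentheses are worth the extra typing.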
Chapter 14 Creating Tables
"Strength does not come from physical capacity. It comes from an indomitable
will."
- Mahatma Gandhi
table_specification
( { column_identifier column_type [ NOT NULL ]
[ GENERATED ALWAYS AS ( expr ) |
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ] |
DEFAULT default_expression ]
[ COMMENT column_comment ]
[ column_constraint ] } [, ...]
[ , table_constraint ] [...] )
table_clauses
{ OPTIONS clause |
PARTITIONED BY clause |
clustered_by_clause |
LOCATION path [ WITH ( CREDENTIAL credential_name ) ] |
COMMENT table_comment |
TBLPROPERTIES clause } [...]
clustered_by_clause
{ CLUSTERED BY ( cluster_column [, ...] )
[ SORTED BY ( { sort_column [ ASC | DESC ] } [, ...] ) ]
INTO num_buckets BUCKETS }
Data Types
Integral numeric data types represent whole numbers:
TINYINT, SMALLINT, INT, BIGINT
Exact numeric data types represent base-10 numbers:
Integral numeric, DECIMAL
Binary floating-point data types use exponents and a binary representation to cover a range of numbers:
FLOAT, DOUBLE
Numeric types represent all numeric data types:
Exact numeric, Binary floating point
Date-time types represent date and time components:
DATE, TIMESTAMP, TIMESTAMP_NTZ
Simple types are types defined by holding singleton values:
Numeric, Date-time, BINARY, BOOLEAN, INTERVAL, STRING
Complex types are composed of multiple components of complex or simple types:
ARRAY, MAP, STRUCT
When you run DESCRIBE DETAIL on a table, you will see the columns format, id, name, description, location, createdAt, lastModified, partitionColumns, numFiles, sizeInBytes, properties, minReaderVersion, and minWriterVersion.
The not null constraint ensures that a specific column cannot contain NULL values.
When creating a table, we recommend the IF NOT EXISTS option so that the statement does nothing, rather than failing, when a table with the same name already exists in the database.
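The original CREATE TABLE statement is not reproduced in this text. A minimal sketch combining IF NOT EXISTS with a NOT NULL constraint might look like this; the table name employee_demo and the column types are assumptions.

```sql
-- Safe to run repeatedly: the table is created only if it is not already there.
CREATE TABLE IF NOT EXISTS employee_demo
( employee_no INT NOT NULL      -- must always have a value
 ,dept_no     INT
 ,last_name   STRING
 ,first_name  STRING
 ,salary      DECIMAL(10,2)
) ;
```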
You can create one table from another with the data automatically loaded by adding a SELECT statement at the end of the CREATE TABLE statement, which is referred to as a CTAS (CREATE TABLE AS SELECT). The CTAS does not automatically create any indexes for you, which is intentional to make the statement flexible and versatile. If you want table options such as partitioning in the new table, you should specify these before the SELECT statement.
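As an illustration only, a CTAS that copies a few columns from the employee_table used in this book could be sketched as follows. The target table name employee_names_demo is hypothetical.

```sql
-- The new table takes its column names and types from the SELECT
-- and is loaded with the selected rows in the same statement.
CREATE TABLE employee_names_demo AS
SELECT employee_no
      ,last_name
      ,first_name
FROM employee_table ;

SELECT * FROM employee_names_demo ;
```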
SELECT *
FROM claims_some_columns;
The example above creates one table from another, but only uses some of the columns in the new table. The data is
automatically loaded as well.
Chapter 15 Data Manipulation Language (DML)
- Anonymous
INSERT Syntax # 1
The following syntax of the INSERT does not use the column names as
part of the command. Therefore, it requires that the VALUES portion of
the INSERT match each column in the table with a data value or a null.
The INSERT statement puts a new row into a table. The database returns a status, but no rows return to the user. The statement must account for all the columns in a table using either a data value or a null. When executed, the INSERT places a single new row into a table.
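The INSERT example is not reproduced in this text. The sketch below shows the positional form against a hypothetical scratch table, insert_demo, whose columns and values are assumptions.

```sql
CREATE TABLE IF NOT EXISTS insert_demo
( employee_no INT
 ,dept_no     INT
 ,last_name   STRING
 ,salary      DECIMAL(10,2)
) ;

-- No column list: one value (or NULL) per column, in table order.
INSERT INTO insert_demo VALUES (3000001, 100, 'Example', 45000.00) ;
INSERT INTO insert_demo VALUES (3000002, NULL, 'Sample', NULL) ;
```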
INSERT Syntax # 2
Above is another form of the INSERT statement that you can use when some of the data is not available. It allows for the missing values (null) to be eliminated from the list in the VALUES clause. It is also the best format when the data is arranged in a different sequence than the columns in the CREATE TABLE, or when there are more nulls (unknown values) than available data values. Notice in our top INSERT example that the now() function inserts the current timestamp. Also, notice in the second example that sale_date is missing in both the column list and the values. Therefore, sale_date is null.
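The two INSERT statements are not preserved in this text. A sketch of the column-list form, using a hypothetical table sales_demo with assumed columns, could be:

```sql
CREATE TABLE IF NOT EXISTS sales_demo
( product_id  INT
 ,sale_date   TIMESTAMP
 ,daily_sales DECIMAL(9,2)
) ;

-- All columns named; now() supplies the current timestamp.
INSERT INTO sales_demo (product_id, sale_date, daily_sales)
VALUES (1000, now(), 150.00) ;

-- Columns listed in a different order, and sale_date left out, so it is null.
INSERT INTO sales_demo (daily_sales, product_id)
VALUES (75.50, 2000) ;
```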
You have the option of inserting multiple rows with a single insert statement. Above, we have added three rows.
Above we have inserted multiple rows and placed null values in some of them.
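The statement itself is not reproduced in this text. A minimal sketch of a multi-row INSERT with nulls, against a hypothetical table multi_row_demo, might look like this:

```sql
CREATE TABLE IF NOT EXISTS multi_row_demo
( product_id  INT
 ,sale_date   TIMESTAMP
 ,daily_sales DECIMAL(9,2)
) ;

-- Three rows in one statement; NULL marks the unknown values.
INSERT INTO multi_row_demo VALUES
   (1000, now(), 10.00)
  ,(2000, NULL,  25.50)
  ,(3000, NULL,  NULL) ;
```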
INSERT/SELECT Command
The INSERT/SELECT command inserts data into a table from another table. Both the source and the target tables
must reside on the same system. The examples above show a lot of options.
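The examples are not reproduced in this text. One hedged sketch of an INSERT/SELECT, copying department 400 from the employee_table into a hypothetical target table, is shown here; the target name and column types are assumptions.

```sql
CREATE TABLE IF NOT EXISTS employee_400_demo
( employee_no INT
 ,last_name   STRING
 ,salary      DECIMAL(10,2)
) ;

-- The SELECT supplies the rows; its columns line up with the target by position.
INSERT INTO employee_400_demo
SELECT employee_no
      ,last_name
      ,salary
FROM employee_table
WHERE dept_no = 400 ;
```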
You can use an INSERT/SELECT to build a data mart. We populate the data mart above with a join query as the
SELECT.
UPDATE Examples
Both examples will delete rows in the table. Sometimes you want to delete them all, and sometimes you need to
delete specific rows.
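The UPDATE and DELETE examples are not preserved in this text. The sketch below shows the general shapes against a hypothetical scratch table, dml_demo, so that no real data is touched.

```sql
CREATE TABLE IF NOT EXISTS dml_demo (product_id INT, daily_sales DECIMAL(9,2)) ;
INSERT INTO dml_demo VALUES (1000, 100.00), (2000, 200.00), (3000, 300.00) ;

-- UPDATE changes only the rows that satisfy the WHERE clause.
UPDATE dml_demo
SET daily_sales = daily_sales * 1.10
WHERE product_id = 1000 ;

-- DELETE with a WHERE clause removes specific rows.
DELETE FROM dml_demo
WHERE product_id = 2000 ;

-- DELETE without a WHERE clause removes every row.
DELETE FROM dml_demo ;
```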
Chapter 16 Statistical Aggregate Functions
"The future belongs to those who believe in the beauty of their dreams."
- Eleanor Roosevelt
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
The following formula is used to compute the population excess kurtosis:
$$\frac{n(n+1)}{(n-1)(n-2)(n-3)} \cdot \frac{n\,m_4}{(k_2)^2} - \frac{3(n-1)^2}{(n-2)(n-3)}$$
Kurtosis is a term used in statistics to describe the shape of a data set's distribution. It's a bit like describing the
"peakedness" or "tailedness" of the data. Imagine you have a bunch of numbers that represent the heights of people.
If most of the heights are close to the average height and the distribution of heights isn't too spread out, then the
data has low kurtosis. It's like a gentle, rounded hill. On the other hand, if the heights have some extreme values
and the distribution is more spread out, the data has high kurtosis. This is like a taller, more peaked hill. So,
kurtosis is a way to tell if your data has more or fewer extreme values compared to a normal distribution. It helps
you understand how the data's values are behaving, whether they're more clustered around the average or spread
out with some unusual values.
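To make the flat versus peaked idea concrete, here is a small sketch with two made-up columns: flat_val spreads evenly from 1 to 10, while peaked_val is mostly 5 with two extreme values. The data and column names are assumptions for illustration only. The flat column should come out with a negative kurtosis and the peaked column with a positive one.

```sql
-- Hypothetical inline data: an evenly spread column and a peaked column.
SELECT
  kurtosis(flat_val)   AS flat_kurtosis,    -- expected sign: negative
  kurtosis(peaked_val) AS peaked_kurtosis   -- expected sign: positive
FROM VALUES
  (1, 5), (2, 5), (3, 5), (4, 5), (5, 5),
  (6, 5), (7, 5), (8, 5), (9, 1), (10, 9)
  AS t(flat_val, peaked_val);
```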
A KURTOSIS Example
Col  Values (the 30 rows of stats_table, listed per column)
1    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2    1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3    1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4    30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5    1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6    0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
Easy way to see the distribution of data
SELECT
 KURTOSIS(col1) AS COL1
,KURTOSIS(col2) AS COL2
,KURTOSIS(col3) AS COL3
,KURTOSIS(col4) AS COL4
FROM stats_table;
A positive value indicates a sharp or peaked distribution, and a negative value represents a flat distribution. A peaked
distribution means that one value occurs more often than the other values. A flat distribution means that each value
occurs about the same number of times.
A high result is leptokurtic, a medium result is mesokurtic, and a low result is platykurtic.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Syntax: STDDEV_POP(<column-name>)
The STDDEV_POP function is a statistical tool that helps you determine how spread out or "spread apart" a group
of numbers is from their average. Imagine you're looking at a bunch of test scores. If the scores are all very close,
then the group doesn't have much spread, and the STDDEV_POP value will be low. But if the scores are all over
the place, the group has more spread, and the STDDEV_POP value will be higher. So, in simpler terms, the
STDDEV_POP function tells you how much the numbers in a group vary or spread out from their average. It's a
way to see how consistent or varied the data is.
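The test score picture can be tried directly. In this sketch the two columns are invented data: close_score holds scores that sit near each other, and spread_score holds scores that are all over the place. The value for sd_close should be much smaller than the value for sd_spread.

```sql
-- Hypothetical test scores: one tight group, one widely spread group.
SELECT
  stddev_pop(close_score)  AS sd_close,   -- small: scores cluster near 80
  stddev_pop(spread_score) AS sd_spread   -- large: scores range from 40 to 100
FROM VALUES
  (78, 40), (80, 95), (79, 60), (81, 100), (82, 75)
  AS t(close_score, spread_score);
```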
STDDEV_POP Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
The standard deviation function is a statistical measure of the spread or dispersion of values.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
The STDDEV_SAMP function is a statistical tool that helps you estimate how much the values in a group vary or
spread out from their average. Imagine you're looking at a set of exam scores. If the scores are fairly close, the
group has little spread, and the STDDEV_SAMP value will be low. But if the scores are all over the place, the
group has more spread, and the STDDEV_SAMP value will be higher. The key difference between
STDDEV_POP and STDDEV_SAMP is that STDDEV_POP assumes you have data for an entire population,
while STDDEV_SAMP assumes you only have a sample from that population. This makes it more accurate when
working with a smaller portion of the data. In simpler terms, the STDDEV_SAMP function helps you understand
how much the numbers in a group differ from their average. It's a way to measure the data's consistency or
variability, especially when dealing with a smaller group from a bigger population.
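The difference between the two functions comes down to dividing by n (population) or by n - 1 (sample). The sketch below uses the numbers 1 through 30, generated with range(1, 31) as a stand-in for COL1, and rebuilds the sample figure from the population figure. The results should be about 8.66 for the population value and about 8.8 for both sample values.

```sql
-- range(1, 31) yields id = 1..30, used here as a stand-in for COL1.
SELECT
  round(stddev_pop(id), 2)  AS sd_pop,
  round(stddev_samp(id), 2) AS sd_samp,
  -- Rescale the population figure by sqrt(n / (n - 1)) to get the sample figure.
  round(stddev_pop(id) * sqrt(count(id) / (count(id) - 1)), 2) AS sd_samp_from_pop
FROM range(1, 31);
```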
A STDDEV_SAMP Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 STDDEV_SAMP(col1) AS COL1
,STDDEV_SAMP(col2) AS COL2
,STDDEV_SAMP(col3) AS COL3
,STDDEV_SAMP(col4) AS COL4
,STDDEV_SAMP(col5) AS COL5
,STDDEV_SAMP(col6) AS COL6
FROM stats_table;
Returns the sample standard deviation (square root of sample variance) of non-NULL values.
COL1  COL2  COL3   COL4  COL5  COL6
8.8   4.47  14.06  8.8   4.5   27.34
The standard deviation function is a statistical measure of the spread or dispersion of values. It is the square root
of the variance, that is, of the average squared difference from the mean (average). It measures the amount by which
a set of values differs from the arithmetic mean.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Syntax: VAR_POP(<column-name>)
Result for COL1: 74.92
What is variance? The variance is a measure of variability. It is calculated by taking the average of
squared deviations from the mean. Variance tells you the degree of spread in your data set. The more
spread the data, the larger the variance is in relation to the mean.
The VAR_POP function in statistics helps you determine how spread out the values in a group are from their
average. Imagine you have a bunch of test scores. If the scores are close, then the group doesn't have much spread,
and the VAR_POP value will be low. But if the scores are all over the place, the group has more spread, and the
VAR_POP value will be higher. In simple terms, the VAR_POP function gives you a number that describes how
much the numbers in a group vary or spread out from their average. It's a way to understand the overall variability
or dispersion of the data.
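As a quick check on the definition, the population variance can also be computed by hand as the average of the squares minus the square of the average. The sketch below uses range(1, 31) as a stand-in for COL1 (the numbers 1 through 30); both columns should come out at about 74.92.

```sql
-- range(1, 31) yields id = 1..30, used here as a stand-in for COL1.
SELECT
  round(var_pop(id), 2)                      AS var_pop_builtin,
  -- Shortcut form of the population variance: E[x^2] - (E[x])^2
  round(avg(id * id) - avg(id) * avg(id), 2) AS var_pop_by_hand
FROM range(1, 31);
```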
A VAR_POP Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 VAR_POP(col1) AS COL1
,VAR_POP(col2) AS COL2
,VAR_POP(col3) AS COL3
,VAR_POP(col4) AS COL4
,VAR_POP(col5) AS COL5
,VAR_POP(col6) AS COL6
FROM stats_table;
Returns the population variance of non-NULL records in a group.
Another flavor is seeing how much variance is in the data.
Syntax: VAR_SAMP(<column-name>)
The VAR_SAMP function in statistics helps you estimate how spread out the values in a sample group are from
their average. Imagine you have many exam scores from a smaller group of students. If the scores are fairly close,
the group doesn't have much spread, and the VAR_SAMP value will be low. But if the scores are all over the
place, the group has more spread, and the VAR_SAMP value will be higher. The key difference between
VAR_POP and VAR_SAMP is that VAR_POP assumes you have data for an entire population, while
VAR_SAMP assumes you only have a sample from that population. VAR_SAMP considers that you might have a
partial picture when you're working with a sample. In simpler terms, the VAR_SAMP function shows how much
the numbers in a sample group differ from their average. It's like measuring how variable or spread out the data is,
especially when working with a smaller portion of the entire population.
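One way to see the "partial picture" adjustment is to compute the sample variance by hand: sum the squared differences from the mean and divide by n - 1 instead of n. This sketch again uses range(1, 31) as a stand-in for COL1; the CTE names nums and stats are made up for the example. The population value should be about 74.92 and both sample values about 77.5.

```sql
-- nums: the values 1..30; stats: their mean and count (hypothetical CTE names).
WITH nums AS (
  SELECT id AS x FROM range(1, 31)
),
stats AS (
  SELECT avg(x) AS mean_x, count(x) AS n FROM nums
)
SELECT
  round(var_pop(x), 2)  AS var_pop_builtin,
  round(var_samp(x), 2) AS var_samp_builtin,
  -- Sum of squared deviations divided by (n - 1).
  round(sum(power(x - mean_x, 2)) / (max(n) - 1), 2) AS var_samp_by_hand
FROM nums CROSS JOIN stats;
```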
A VAR_SAMP Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
Variance has two forms; VAR_POP is for the entire population of data rows allowed by the WHERE clause.
VAR_SAMP is for a random sampling of the data rows allowed by the WHERE clause.
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The CORR function is a way to figure out how strongly two sets of numbers are related or connected to each other.
Imagine you have two lists of data, like the amount of time people study and the grades they get. The CORR
function helps you find out if there's a relationship between these two things. The correlation would be positive if
higher study times usually lead to higher grades. The correlation would be negative if higher study times lead to
lower grades. In simpler words, the CORR function gives you a number that shows how much the two sets of
numbers move together. The correlation is positive if one goes up when the other goes up. If one goes down when
the other goes up, it's negative. It's like a math tool to help you understand if things are connected in a certain way.
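Here is a small sketch of the study time and grades idea. The five (hours, grade) pairs are invented for illustration. The second column rebuilds the correlation from the covariance and the two standard deviations, so both columns should agree and both should be positive.

```sql
-- Hypothetical study hours and grades; more hours generally means a higher grade.
SELECT
  corr(grade, hours) AS corr_builtin,
  -- Correlation is covariance scaled by both standard deviations.
  covar_pop(grade, hours) / (stddev_pop(grade) * stddev_pop(hours)) AS corr_by_hand
FROM VALUES
  (1, 62), (2, 70), (3, 68), (4, 81), (5, 90)
  AS t(hours, grade);
```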
A CORR Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 CORR(col1, col2) AS C1_2
,CORR(col1, col3) AS C1_3
,CORR(col1, col4) AS C1_4
,CORR(col1, col5) AS C1_5
,CORR(col1, col6) AS C1_6
FROM stats_table ;
C1_2  C1_3  C1_4   C1_5   C1_6
0.99  0.89  -1.00  -0.15  0.99
Where:
 1 = perfect positive correlation
 0 = no correlation
-1 = perfect negative correlation
Do data points move in the same direction or opposite directions?
As temp goes up, then crime goes up (people are outside). Negative corr is less ice cream with higher temps.
Variance tells us how much a quantity varies around its mean. It's the spread of data around the mean value. You only know
the magnitude here, as in how much the data is spread. Covariance tells us the direction in which two quantities vary with
each other. Correlation shows us both the direction and the magnitude of how two quantities vary with each other.
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
C4_2 C4_3 C4_1 C4_5 C4_6 C1_2 C1_3 C1_4 C1_5 C1_6
-0.99 -0.89 -1 0.15 -0.99 0.99 0.89 -1 -0.15 0.99
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Syntax: VARIANCE(<column-name>)
The VARIANCE function in statistics helps you figure out how much the values in a group differ from their
average, on average. Imagine you have a set of test scores. If the scores are close to each other, then the group
doesn't have much variation, and the VARIANCE value will be low. However, if the scores are spread out, the
group has more variation, and the VARIANCE value will be higher. In simple terms, the VARIANCE function
gives you a number that describes the average amount of spread or difference between the numbers in a group and
their average. It's like a way to measure the overall variability or dispersion of the data.
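In Databricks SQL, VARIANCE returns the sample variance, so it should match VAR_SAMP, and its square root should match STDDEV_SAMP. The sketch below checks this on the numbers 1 through 30, with range(1, 31) as a stand-in for COL1. The two variance columns should be about 77.5 and the two standard deviation columns about 8.8.

```sql
-- range(1, 31) yields id = 1..30, used here as a stand-in for COL1.
SELECT
  round(variance(id), 2)       AS variance_builtin,
  round(var_samp(id), 2)       AS var_samp_builtin,
  round(sqrt(variance(id)), 2) AS sqrt_of_variance,
  round(stddev_samp(id), 2)    AS stddev_samp_builtin
FROM range(1, 31);
```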
A VARIANCE Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 VARIANCE(col1) AS COL1
,VARIANCE(col2) AS COL2
,VARIANCE(col3) AS COL3
,VARIANCE(col4) AS COL4
,VARIANCE(col5) AS COL5
,VARIANCE(col6) AS COL6
FROM stats_table ;
First calculate the variance as a precursor to standard deviation.
The VARIANCE function returns the sample variance of non-NULL records in a group.
COL1  COL2   COL3    COL4  COL5   COL6
77.5  19.95  197.65  77.5  20.25  747.73
What is variance? The variance is a measure of variability. It is calculated by taking the average of squared deviations from
the mean. Variance tells you the degree of spread in your data set. The more spread the data, the larger the variance is in
relation to the mean.
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The COVAR_POP function in statistics helps you understand how two sets of values change together, on average,
across an entire population. Imagine you have two sets of data, like the number of hours studied and the
corresponding test scores of different students. The COVAR_POP function helps you determine if there's a
consistent pattern between the two data sets when you're looking at the entire population. In simpler terms,
COVAR_POP gives you a number that shows how much the two sets of values move together or apart across the
entire group. If they usually go in the same direction, the covariance is positive. If they tend to go in opposite
directions, the covariance is negative. It's like a mathematical tool to help you understand how two things change
together across a population.
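A small sketch with invented study hours and grades shows the sign behavior. The second column computes the population covariance by hand as the average of the products minus the product of the averages, so both columns should agree. Because grades rise with hours in this made-up data, the result should be positive.

```sql
-- Hypothetical study hours and grades that tend to rise together.
SELECT
  covar_pop(grade, hours) AS covar_builtin,
  -- Shortcut form: E[xy] - E[x] * E[y]
  avg(grade * hours) - avg(grade) * avg(hours) AS covar_by_hand
FROM VALUES
  (1, 62), (2, 70), (3, 68), (4, 81), (5, 90)
  AS t(hours, grade);
```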
A COVAR_POP Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
The COVAR_POP function in statistics helps you understand how two sets of values change together, on average,
across an entire population.
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
C4_2 C4_3 C4_1 C4_5 C4_6 C1_2 C1_3 C1_4 C1_5 C1_6
-37.5 -105.9 -74.92 5.82 -230.75 37.5 105.9 -74.92 -5.82 230.75
The COVAR_POP function in statistics helps you understand how two sets of values change together, on average,
across an entire population.
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The COVAR_SAMP function in statistics helps you estimate how two sets of values change together as a sample,
giving you an idea of their relationship. Imagine you have two sets of data, like the number of hours studied and
the corresponding test scores of a smaller group of students. The COVAR_SAMP function helps you determine if
there's a consistent pattern between the two data sets within that sample. In simpler terms, COVAR_SAMP gives
you a number that shows how much the two sets of values tend to move together or apart in that smaller group. If
they usually go in the same direction, the covariance is positive. If they tend to go in opposite directions, the
covariance is negative. It's like a math tool that helps you estimate how two things change together in a smaller
data portion.
A COVAR_SAMP Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 COVAR_SAMP(col1, col2) AS C1_2
,COVAR_SAMP(col1, col3) AS C1_3
,COVAR_SAMP(col1, col4) AS C1_4
,COVAR_SAMP(col1, col5) AS C1_5
,COVAR_SAMP(col1, col6) AS C1_6
FROM stats_table ;
COVAR_SAMP returns the sample covariance for non-null pairs in a group.
COVAR_SAMP eliminates all expression pairs where either expression in the pair is NULL.
The COVAR_SAMP function in statistics helps you estimate how two sets of values change together as a sample,
giving you an idea of their relationship.
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 COVAR_SAMP (col1, col2) C1_2
,COVAR_SAMP (col1, col3) C1_3
,COVAR_SAMP (col1, col4) C1_4
,COVAR_SAMP (col1, col5) C1_5
,COVAR_SAMP (col1, col6) C1_6
FROM stats_table ;
C1_2   C1_3    C1_4   C1_5   C1_6
38.79  109.55  -77.5  -6.02  238.71
SELECT
 COVAR_SAMP (col4, col2) C4_2
,COVAR_SAMP (col4, col3) C4_3
,COVAR_SAMP (col4, col1) C4_1
,COVAR_SAMP (col4, col5) C4_5
,COVAR_SAMP (col4, col6) C4_6
FROM stats_table ;
C4_2    C4_3     C4_1   C4_5  C4_6
-38.79  -109.55  -77.5  6.02  -238.71
The COVAR_SAMP function in statistics helps you estimate how two sets of values change together as a sample,
giving you an idea of their relationship.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The REGR_INTERCEPT function in statistics helps you find the point where a straight line (a linear regression
line) crosses the y-axis. Imagine you have a set of data points on a scatter plot that could fit a straight line. The
REGR_INTERCEPT function helps determine where that line starts on the vertical y-axis. In simpler terms, the
REGR_INTERCEPT function gives you a number representing the y-coordinate where the line crosses the y-axis.
It's like finding the starting point for a straight line that best fits your data. This helps you make predictions based
on the relationship between the two sets of data.
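To see the intercept at work, the sketch below uses four invented points that lie exactly on the line y = 2x + 1. The first argument is the dependent variable (y) and the second is the independent variable (x). The intercept should come out as about 1, the by-hand version (average y minus slope times average x) should match it, and the prediction for x = 10 should be about 21.

```sql
-- Hypothetical points that lie exactly on the line y = 2x + 1.
SELECT
  regr_intercept(y, x)                         AS intercept_builtin,
  -- The fitted line passes through (avg x, avg y), so b = avg(y) - slope * avg(x).
  avg(y) - regr_slope(y, x) * avg(x)           AS intercept_by_hand,
  -- Use the fitted line to predict y at x = 10.
  regr_intercept(y, x) + regr_slope(y, x) * 10 AS predicted_y_at_x_10
FROM VALUES
  (1, 3), (2, 5), (3, 7), (4, 9)
  AS t(x, y);
```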
A REGR_INTERCEPT Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
A regression line is a line of best fit drawn through a set of points on a graph for X and Y coordinates. It uses the
Y coordinate as the Dependent Variable and the X value as the Independent Variable. Two regression lines always
meet or intercept at the mean of the data points (x, y), where x=AVG(x) and y=AVG(y). That meeting point is not
usually one of the original data points.
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_INTERCEPT(col1, col2) C1_2
,REGR_INTERCEPT(col1, col3) C1_3
,REGR_INTERCEPT(col1, col4) C1_4
,REGR_INTERCEPT(col1, col5) C1_5
,REGR_INTERCEPT(col1, col6) C1_6
FROM stats_table ;
C1_2   C1_3  C1_4  C1_5   C1_6
-1.35  3.45  31    17.65  -0.83
SELECT
 REGR_INTERCEPT(col4, col2) C4_2
,REGR_INTERCEPT(col4, col3) C4_3
,REGR_INTERCEPT(col4, col1) C4_1
,REGR_INTERCEPT(col4, col5) C4_5
,REGR_INTERCEPT(col4, col6) C4_6
FROM stats_table ;
C4_2   C4_3   C4_1  C4_5   C4_6
32.35  27.55  31    13.35  31.83
Two regression lines always meet or intercept at the mean of the data points (x, y), where x=AVG(x) and y=AVG(y).
That meeting point is not usually one of the original data points.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The REGR_SLOPE function in statistics helps you find the steepness or slope of a straight line (a linear regression
line) that best fits your data points. Imagine you have many data points on a scatter plot, and they have a general
trend that could be described with a straight line. The REGR_SLOPE function helps you figure out how steep that
line should be. In simpler terms, the REGR_SLOPE function gives you a number that represents how much the y-
values (vertical values) change for each one-unit increase in the x-values (horizontal values) along the line. It's like
understanding how much the data rises or falls as you move along the line. This helps you see how the two sets of
data are related linearly.
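The slope can be cross-checked against covariance and variance. In this sketch the (hours, grade) pairs are invented; grade is the dependent variable (first argument) and hours is the independent variable (second argument). Both columns should come out at about 6.7, meaning roughly 6.7 extra grade points for each additional hour in this made-up data.

```sql
-- Hypothetical study hours (x) and grades (y).
SELECT
  regr_slope(grade, hours) AS slope_builtin,
  -- Slope = covariance of x and y divided by the variance of x.
  covar_pop(grade, hours) / var_pop(hours) AS slope_by_hand
FROM VALUES
  (1, 62), (2, 70), (3, 68), (4, 81), (5, 90)
  AS t(hours, grade);
```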
A REGR_SLOPE Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_SLOPE(col1, col2) AS C1_2
,REGR_SLOPE(col1, col3) AS C1_3
,REGR_SLOPE(col1, col4) AS C1_4
,REGR_SLOPE(col1, col5) AS C1_5
,REGR_SLOPE(col1, col6) AS C1_6
FROM stats_table ;
REGR_SLOPE returns the slope of the linear regression line for non-null pairs in a group.
Formula for REGR_SLOPE: COVAR_POP(x,y) / VAR_POP(x)
The REGR_SLOPE function in statistics helps you find the steepness or slope of a straight line (a linear regression
line) that best fits your data points.
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_SLOPE(col1, col2) C1_2
,REGR_SLOPE(col1, col3) C1_3
,REGR_SLOPE(col1, col4) C1_4
,REGR_SLOPE(col1, col5) C1_5
,REGR_SLOPE(col1, col6) C1_6
FROM stats_table ;
C1_2  C1_3  C1_4  C1_5  C1_6
1.94  0.55  -1    -0.3  0.32
SELECT
 REGR_SLOPE(col4, col2) C4_2
,REGR_SLOPE(col4, col3) C4_3
,REGR_SLOPE(col4, col1) C4_1
,REGR_SLOPE(col4, col5) C4_5
,REGR_SLOPE(col4, col6) C4_6
FROM stats_table ;
C4_2   C4_3   C4_1  C4_5  C4_6
-1.94  -0.55  -1    0.3   -0.32
The REGR_SLOPE function in statistics helps you find the steepness or slope of a straight line (a linear regression
line) that best fits your data points.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The REGR_AVGX function in statistics helps you find the average of the x-values (horizontal values) in your data
points. Imagine you have a bunch of data points on a scatter plot. The x-values are the numbers on the horizontal
axis. The REGR_AVGX function helps you calculate the average or typical value of these x-values. In simpler
terms, the REGR_AVGX function gives you a number representing the central position of the x-values. It's like
finding the "middle" value of all the numbers that make up the x-coordinates of your data points. This can be useful
for understanding the general location of your data on the x-axis.
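One detail worth seeing in action is that REGR_AVGX and REGR_AVGY only look at pairs where both values are present. The sketch below uses four invented (y, x) pairs, two of which have a NULL on one side. A plain AVG(x) uses three x-values and should return 2, while REGR_AVGX uses only the two complete pairs and should return 1.5. Likewise, REGR_AVGY should return 15 while AVG(y) is higher.

```sql
-- Hypothetical pairs; two rows are incomplete and are skipped by the REGR_ functions.
SELECT
  avg(x)          AS plain_avg_x,  -- uses x = 1, 2, 3
  regr_avgx(y, x) AS regr_avg_x,   -- uses x = 1, 2 (complete pairs only)
  avg(y)          AS plain_avg_y,  -- uses y = 10, 20, 40
  regr_avgy(y, x) AS regr_avg_y    -- uses y = 10, 20 (complete pairs only)
FROM VALUES
  (10, 1), (20, 2), (NULL, 3), (40, NULL)
  AS t(y, x);
```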
A REGR_AVGX Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_AVGX(col1, col2) AS C1_2
,REGR_AVGX(col1, col3) AS C1_3
,REGR_AVGX(col1, col4) AS C1_4
,REGR_AVGX(col1, col5) AS C1_5
,REGR_AVGX(col1, col6) AS C1_6
FROM stats_table ;
The REGR_AVGX function returns the average of the independent variable for non-null pairs in a group, where x is the
independent variable and y is the dependent variable.
The REGR_AVGX function in statistics helps you find the average of the x-values (horizontal values) in your data
points.
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
C1_2 C1_3 C1_4 C1_5 C1_6 C4_2 C4_3 C4_1 C4_5 C4_6
8.67 21.73 15.5 7.23 51.17 8.67 21.73 15.5 7.23 51.17
The REGR_AVGX function in statistics helps you find the average of the x-values (horizontal values) in your data
points.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The REGR_AVGY function in statistics helps you find the average of the y-values (vertical values) in your data
points. Imagine you have a bunch of data points on a scatter plot. The y-values are the numbers on the vertical axis.
The REGR_AVGY function helps you calculate the average or typical value of these y-values. In simpler terms,
the REGR_AVGY function gives you a number representing the central position of the y-values. It's like finding
the "middle" value of all the numbers that make up the y-coordinates of your data points. This can be useful for
understanding the general location of your data on the y-axis.
A REGR_AVGY Example
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
3 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
4 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
5 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
6 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_AVGY(col1, col2) AS C1_2
,REGR_AVGY(col1, col3) AS C1_3
,REGR_AVGY(col1, col4) AS C1_4
,REGR_AVGY(col1, col5) AS C1_5
,REGR_AVGY(col1, col6) AS C1_6
FROM stats_table ;
The REGR_AVGY function returns the average of the dependent variable for non-null pairs in a group, where x is the
independent variable and y is the dependent variable: REGR_AVGY(y, x)
C1_2  C1_3  C1_4  C1_5  C1_6
15.5  15.5  15.5  15.5  15.5
The REGR_AVGY function in statistics helps you find the average of the y-values (vertical values) in your data
points.
stats_table (each line is one column; its 30 values are rows 1 to 30):
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_AVGY(col1, col2) C1_2
,REGR_AVGY(col1, col3) C1_3
,REGR_AVGY(col1, col4) C1_4
,REGR_AVGY(col1, col5) C1_5
,REGR_AVGY(col1, col6) C1_6
FROM stats_table ;
C1_2 C1_3 C1_4 C1_5 C1_6
15.5 15.5 15.5 15.5 15.5
SELECT
 REGR_AVGY(col4, col2) C4_2
,REGR_AVGY(col4, col3) C4_3
,REGR_AVGY(col4, col1) C4_1
,REGR_AVGY(col4, col5) C4_5
,REGR_AVGY(col4, col6) C4_6
FROM stats_table ;
C4_2 C4_3 C4_1 C4_5 C4_6
15.5 15.5 15.5 15.5 15.5
The REGR_AVGY function in statistics helps you find the average of the y-values (vertical values) in your data
points.
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
The REGR_COUNT function in statistics helps you count the data points in your set. Imagine you have a bunch of
data points on a scatter plot. The REGR_COUNT function helps you determine how many of these points you
have, counting only the points where both the x-value and the y-value are present (non-null). In simpler terms, the
REGR_COUNT function gives you a number that tells you how many complete data points
you've got. It's like counting the dots on your scatter plot to know how much data you're working with. This count
can be important for various statistical calculations and understanding the reliability of your analysis.
A REGR_COUNT Example
stats_table (each line is one column; its 30 values are rows 1 to 30):
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_COUNT(col1, col2) C1_2
,REGR_COUNT(col1, col3) C1_3
,REGR_COUNT(col1, col4) C1_4
,REGR_COUNT(col1, col5) C1_5
,REGR_COUNT(col1, col6) C1_6
FROM stats_table ;
REGR_COUNT returns the number of non-null number pairs in a group. REGR_COUNT(y, x)
The REGR_COUNT function is the number of input rows in which both expressions are non-null.
The REGR_COUNT function in statistics helps you count the data points in your set.
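The counting rule can be sketched in a few lines of Python. This is an illustration rather than Databricks code: regr_count is a hypothetical helper, and None stands in for SQL NULL.

```python
def regr_count(y_values, x_values):
    """Count the pairs in which both y and x are non-null (None plays the role of NULL)."""
    return sum(1 for y, x in zip(y_values, x_values) if y is not None and x is not None)


col1 = list(range(1, 31))
col2 = [1, 1, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 9, 9, 9, 9, 10, 10,
        10, 10, 10, 10, 13, 13, 13, 14, 15, 15, 16, 16]

print(regr_count(col1, col2))                           # 30: no value is missing
print(regr_count([1, None, 3, 4], [10, 20, None, 40]))  # 2: only the first and last pairs are complete
```

A row is skipped as soon as either side is missing, so the count can be smaller than COUNT(*) on the same table.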
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
Syntax: REGR_R2(Y, X)
The REGR_R2 function in statistics helps you understand how well a straight line (a linear regression line) fits
your data points. Imagine you have a scatter plot with data points, and you draw a straight line that you think best
represents the trend of the data. The REGR_R2 function helps you determine how closely the points match that
line. In simpler terms, the REGR_R2 function gives you a number between 0 and 1. If the number is closer to 1,
the line you drew fits the data points well. If it's closer to 0, the line doesn't match the points well. Think of it as a
measure of how well your line explains the data pattern. The closer to 1, the better the line fits the points; the closer
to 0, the worse the fit.
A REGR_R2 Example
stats_table (each line is one column; its 30 values are rows 1 to 30):
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_R2(col1, col2) AS C1_2
,REGR_R2(col1, col3) AS C1_3
,REGR_R2(col1, col4) AS C1_4
,REGR_R2(col1, col5) AS C1_5
,REGR_R2(col1, col6) AS C1_6
FROM stats_table ;
The REGR_R2 is the square of the correlation coefficient.
The REGR_R2 function in statistics helps you understand how well a straight line (a linear regression line) fits
your data points.
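As a rough illustration of "the square of the correlation coefficient", the Python sketch below computes the value by hand. The function name regr_r2 is hypothetical, None stands in for NULL, and returning None when a column is constant is an assumption of this sketch, not a statement about Databricks behavior.

```python
def regr_r2(y_values, x_values):
    """Square of the correlation coefficient over the non-null (y, x) pairs."""
    pairs = [(y, x) for y, x in zip(y_values, x_values)
             if y is not None and x is not None]
    n = len(pairs)
    if n == 0:
        return None
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    syy = sum((y - mean_y) ** 2 for y, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in pairs)
    if sxx == 0 or syy == 0:
        return None  # assumption: a constant column gives no meaningful fit
    return (sxy * sxy) / (sxx * syy)


col1 = list(range(1, 31))      # 1..30
col4 = list(range(30, 0, -1))  # 30..1, a perfect straight line against col1

print(regr_r2(col1, col4))                    # 1.0
print(regr_r2([1, -1, -1, 1], [1, 2, 3, 4]))  # 0.0: no linear trend at all
```

The first result is 1 even though col4 falls as col1 rises: squaring removes the sign, so REGR_R2 measures how well a line fits, not which way it slopes.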
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
Syntax: REGR_SXX(Y, X)
REGR_SXX returns
REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs.
The REGR_SXX function in statistics helps you understand how the x-values (horizontal values) are spread in
your data set. Imagine you have a bunch of data points on a scatter plot. The x-values are the numbers on the
horizontal axis. The REGR_SXX function helps you calculate how much these x-values vary from their average. In
simpler terms, the REGR_SXX function gives you a number representing the sum of the squared differences
between each x-value and the average x-value. It's like measuring how much the x-values spread out from their
central position. This can help you understand the dispersion of your data along the x-axis.
stats_table (each line is one column; its 30 values are rows 1 to 30):
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
The REGR_SXX function in statistics helps you understand how the x-values (horizontal values) are spread in
your data set.
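The identity REGR_SXX = REGR_COUNT(y, x) * VAR_POP(x) can be checked with a short Python sketch. The helper regr_sxx is hypothetical and None stands in for NULL; statistics.pvariance is Python's population variance, used here as a stand-in for VAR_POP.

```python
from statistics import pvariance


def regr_sxx(y_values, x_values):
    """Sum of squared deviations of x from its mean, over non-null (y, x) pairs."""
    xs = [x for y, x in zip(y_values, x_values) if y is not None and x is not None]
    if not xs:
        return None
    mean_x = sum(xs) / len(xs)
    return sum((x - mean_x) ** 2 for x in xs)


col1 = list(range(1, 31))      # used as y
col4 = list(range(30, 0, -1))  # used as x

sxx = regr_sxx(col1, col4)
print(sxx)                          # 2247.5
print(len(col4) * pvariance(col4))  # the same value, up to floating-point rounding
```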
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
Syntax: REGR_SXY(Y, X)
REGR_SXY returns:
REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2) for non-null pairs.
The REGR_SXY function in statistics helps you understand how the x-values (horizontal values) and y-values
(vertical values) change together in your data set. Imagine you have a bunch of data points on a scatter plot. The
REGR_SXY function helps you determine how the x-values and y-values move together or apart. In simpler terms,
the REGR_SXY function takes each data point, multiplies the difference between its x-value and the average
x-value by the difference between its y-value and the average y-value, and adds up these products. A positive
result means the x-values and y-values tend to rise together; a negative result means one tends to fall as the
other rises. It's like measuring the "togetherness" of the data
points' movement along the x and y directions.
A REGR_SXY Example
stats_table (each line is one column; its 30 values are rows 1 to 30):
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_SXY(col1, col2) AS C1_2
,REGR_SXY(col1, col3) AS C1_3
,REGR_SXY(col1, col4) AS C1_4
,REGR_SXY(col1, col5) AS C1_5
,REGR_SXY(col1, col6) AS C1_6
FROM stats_table ;
REGR_SXY returns:
REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2) for non-null pairs.
The REGR_SXY function in statistics helps you understand how the x-values (horizontal values) and y-values
(vertical values) change together in your data set.
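The sign of REGR_SXY is easiest to see on data that moves in lockstep. The Python sketch below is an illustration only: regr_sxy is a hypothetical helper and None stands in for NULL.

```python
def regr_sxy(y_values, x_values):
    """Sum of (x - mean_x) * (y - mean_y) over the non-null (y, x) pairs."""
    pairs = [(y, x) for y, x in zip(y_values, x_values)
             if y is not None and x is not None]
    if not pairs:
        return None
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for y, x in pairs)


col1 = list(range(1, 31))      # 1..30
col4 = list(range(30, 0, -1))  # 30..1

print(regr_sxy(col1, col4))  # -2247.5: col4 falls as col1 rises
print(regr_sxy(col1, col1))  # 2247.5: a column always moves with itself
```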
COL1 NUMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
COL2 NUMBERS
1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
Syntax: REGR_SYY(Y, X)
REGR_SYY returns:
REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs.
The REGR_SYY function in statistics helps you understand how the y-values (vertical values) are spread in your
data set. Imagine you have a bunch of data points on a scatter plot. The y-values are the numbers on the vertical
axis. The REGR_SYY function helps you calculate how much these y-values vary from their average. In simpler
terms, the REGR_SYY function gives you a number representing the sum of the squared differences between each
y-value and the average y-value. It's like measuring how much the y-values spread out from their central position.
This can help you understand the dispersion of your data along the y-axis.
A REGR_SYY Example
stats_table (each line is one column; its 30 values are rows 1 to 30):
col1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
col2: 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16
col3: 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
col4: 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
col5: 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1
col6: 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100
SELECT
 REGR_SYY(col1, col2) AS C1_2
,REGR_SYY(col1, col3) AS C1_3
,REGR_SYY(col1, col4) AS C1_4
,REGR_SYY(col1, col5) AS C1_5
,REGR_SYY(col1, col6) AS C1_6
FROM stats_table ;
REGR_SYY returns:
REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs.
The REGR_SYY function in statistics helps you understand how the y-values (vertical values) are spread in your
data set.
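REGR_SYY only looks at y, yet the x argument still matters: it decides which rows count. The Python sketch below illustrates this with a hypothetical helper regr_syy, using None for NULL.

```python
def regr_syy(y_values, x_values):
    """Sum of squared deviations of y from its mean, over non-null (y, x) pairs."""
    ys = [y for y, x in zip(y_values, x_values) if y is not None and x is not None]
    if not ys:
        return None
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


col1 = list(range(1, 31))      # 1..30
col4 = list(range(30, 0, -1))  # 30..1

print(regr_syy(col1, col4))                # 2247.5
print(regr_syy(col1, col1))                # 2247.5: same y, no NULLs, same answer
# One NULL in x drops the pair (30, NULL), so only y = 1..29 is measured.
print(regr_syy(col1, col4[:-1] + [None]))  # 2030.0
```

This is why a query that keeps col1 as y and varies only the x column returns the same value in every output column when the table has no NULLs.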
Using GROUP BY
COL3 NUMBERS
1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60
SELECT col3
,COUNT(*) AS CNT
,AVG(col1) AS AVG1
,STDDEV_POP(col1) AS SD1
,VAR_POP(col1) AS VP1
,AVG(col4) AS AVG4
,STDDEV_POP(col4) AS SD4
,VAR_POP(col4) AS VP4
,AVG(col6) AS AVG6
,STDDEV_POP(col6) AS SD6
FROM stats_table GROUP BY col3 ORDER BY 1;
COL3 CNT AVG1 SD1 VP1 AVG4 SD4 VP4 AVG6 SD6
1 2 1.5 0.5 0.25 29.5 0.5 0.25 2.5 2.5
10 7 6 2 4 25 2 4 24.29 8.63
20 14 16.5 4.03 16.25 14.5 4.03 16.25 53.57 10.76
30 2 24.5 0.5 0.25 6.5 0.5 0.25 75 5
40 2 26.5 0.5 0.25 4.5 0.5 0.25 87.5 2.5
50 2 28.5 0.5 0.25 2.5 0.5 0.25 92.5 2.5
60 1 30 0 0 1 0 0 100 0
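The grouping step can be mirrored outside SQL to check the table above. This Python sketch rebuilds col1 and col3 from the sample data and computes three of the output columns (CNT, AVG1, SD1); statistics.pstdev is used as a stand-in for STDDEV_POP, and the variable names are chosen for this illustration.

```python
from collections import defaultdict
from statistics import fmean, pstdev

col1 = list(range(1, 31))
col3 = [1] * 2 + [10] * 7 + [20] * 14 + [30] * 2 + [40] * 2 + [50] * 2 + [60]

# GROUP BY col3: collect the col1 values that share each col3 value.
groups = defaultdict(list)
for key, value in zip(col3, col1):
    groups[key].append(value)

# One output row per group, ordered by the grouping key (ORDER BY 1).
summary = {key: (len(vals), fmean(vals), pstdev(vals))
           for key, vals in sorted(groups.items())}

for key, (cnt, avg1, sd1) in summary.items():
    print(key, cnt, round(avg1, 2), round(sd1, 2))
```

The group for col3 = 60 holds a single row, which is why its population standard deviation is 0.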
APPROX_COUNT_DISTINCT
EXACT_DISTINCT APPROX._DISTINCT
5 5
The APPROX_COUNT_DISTINCT function in statistics helps you get an estimate of how many different things
there are in a set, without having to count them all individually. Imagine you have a bag of different colored
marbles. Instead of taking out each marble and counting them one by one, the APPROX_COUNT_DISTINCT
function gives you a quick estimation of how many unique colors there are in the bag. In simpler terms, it's like a
shortcut to get a fairly accurate idea of how many different items are in a group, without having to count every
single item. It's especially useful when counting everything would take too much time.
Chapter 17 Mathematical Functions
SELECT
 -10 as "neg10"
,Cos(90) as "cos" -- Trigonometric cosine of an angle (in radians)
,Sin(90) as "sin" -- Trigonometric sine of an angle (in radians)
,Tan(90) as "tan" -- Trigonometric tangent of an angle (in radians)
,Exp(6) as "exp" -- Exponential value of a number
,Sqrt(16) as "sqrt" -- Square root of a number
neg10 cos sin tan exp sqrt
-10 -0.45 0.89 -2 403.43 4
ABS
The ABS mathematical function returns the absolute value of a number. ABS is one of the Databricks mathematical
functions. The ABS function, short for "absolute value," is a way to find
the distance of a number from zero on the number line. Imagine you have a number line, like the one you might see
in a math class. The absolute value of a number is like asking "how far away is this number from zero?" It doesn't
matter if the number is positive or negative, the absolute value is always positive (or zero). For example, let's say
you have the number -5. If you calculate the absolute value of -5, it's 5 because that's how far -5 is from 0 on the
number line. If you have a positive number, like 3, the absolute value of 3 is simply 3 because 3 is 3 units away
from 0 on the number line. So, the ABS function just gives you the distance of a number from zero, ignoring
whether the number is positive or negative.
ACOS
The ACOS mathematical function returns the inverse cosine (arccosine) of its input. The input is a cosine value
between -1 and 1, and the result is an angle in radians in the range [0, pi]. The data type of the
return value is DOUBLE. The ACOS function is a way to find out the angle you would need to take the cosine of to
get a specific number. Imagine you're playing with a flashlight and a wall. The wall represents the numbers that the
cosine function can give you. If you shine the flashlight on the wall, it makes a spot. Now, ACOS helps you figure
out what angle you need to hold the flashlight at to make that spot on the wall. In other words, if you know a
certain number that the cosine function gives you (let's call it "x"), the ACOS function will tell you the angle that
you'd need to shine the flashlight at to get that number on the wall. It's like figuring out the "secret angle" that gives
you that specific cosine value.
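The "secret angle" idea can be tried out with Python's math module, which follows the same radian convention. This is an illustration only; how out-of-range inputs are handled differs between systems, so that case is left out of the sketch.

```python
import math

angle = math.acos(0.5)                # the angle whose cosine is 0.5
print(round(angle, 4))                # 1.0472 radians, which is pi / 3
print(round(math.degrees(angle), 1))  # 60.0 degrees
print(round(math.cos(angle), 4))      # 0.5: taking the cosine undoes ACOS

# The two ends of the valid input range.
print(math.acos(1.0))                 # 0.0
print(math.acos(-1.0) == math.pi)     # True
```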
ACOSH
The ACOSH mathematical function computes the inverse (arc) hyperbolic cosine of its input. The returned
value has a data type of DOUBLE. ACOSH stands for "inverse hyperbolic cosine." Imagine you're dealing with a
special kind of curve that looks a bit like a stretched-out smile. This curve is called a hyperbolic cosine curve.
Now, the ACOSH function helps you figure out a special number associated with this curve. Let's say you have a
number, let's call it "y," which is a height on this hyperbolic cosine curve. If you use the ACOSH function on that
number, it will tell you how far to the right of the lowest point of the smile you need to go for the curve to reach
the height "y." The lowest point of the curve has a height of 1, so the input should be 1 or greater. So, in simple
terms, ACOSH helps you find the position on the special curve that matches a particular height.
ASIN
The ASIN mathematical function returns the inverse sine (arcsine) of its input. The input is a sine value between
-1 and 1. The data type of the return
value is DOUBLE. ASIN returns the arcsine in radians (not degrees) in the range [-pi/2, pi/2]. The ASIN function is
a way to find out the angle you would need to point at to get a specific ratio involving a right triangle. Imagine you
have a right triangle, which is a triangle with one 90-degree angle (a perfect corner). One side of this triangle is
called the "opposite" side, another side is the "adjacent" side, and the longest side is the "hypotenuse." Now, let's
say you know the lengths of two sides of this triangle: the "opposite" side and the "hypotenuse." The ASIN
function helps you figure out the angle you would need to point at, so that when you take the ratio of the "opposite"
side to the "hypotenuse," you get the number you have. In simple terms, ASIN helps you find the angle that gives
you a specific ratio of sides in a right triangle. It's like finding the correct angle to make a certain fraction using the
lengths of the triangle's sides.
ASINH
The ASINH mathematical function computes the inverse (arc) hyperbolic sine of its argument. The ASINH function
might sound a bit complicated, but it's actually quite simple to understand. ASINH stands for "inverse hyperbolic
sine." Imagine you have a special curve that climbs steadily from left to right and passes through zero in the
middle. This curve is called a hyperbolic sine curve.
Now, the ASINH function helps you figure out a special number related to this curve. Let's say you have a number,
let's call it "y," which is a height on this hyperbolic sine curve. If you use the ASINH function on that number, it will
tell you the position along the horizontal axis where the curve reaches the height "y." So, in simple terms, ASINH
helps you find the position on the special curve that matches a particular height. Because the curve covers every
height, ASINH accepts any real number.
ATAN
The ATAN numeric function (Trigonometric) computes the inverse tangent (arctangent) of its argument. ATAN
returns the arctangent in radians (not degrees) in the range [-pi/2,
pi/2]. The ATAN function is a way to find out the angle you would need to turn to in order to get a specific ratio
involving a right triangle. Imagine you have a right triangle, which is a triangle with one 90-degree angle (like a
corner of a book). One side of this triangle is called the "opposite" side, another side is the "adjacent" side, and the
angle between the "adjacent" side and the longest side is what we're curious about. Now, let's say you know the lengths of the "opposite"
and "adjacent" sides of this triangle. The ATAN function helps you figure out the angle you would need to turn to,
so that when you take the ratio of the "opposite" side to the "adjacent" side, you get the specific number you have.
In simpler words, ATAN helps you find the angle that gives you a certain ratio of sides in a right triangle. It's like
turning to a particular angle to create a specific fraction using the lengths of the triangle's sides.
ATAN2
Syntax: ATAN2( <y> , <x> )
The first parameter is the Y coordinate, not the X coordinate.
The arc tangent is the angle between:
The X axis.
The ray from the point (0,0) to the point (X, Y) (where X and Y are not both 0).
Select atan2(1,2)
,atan2(2,5)
,atan2(5,5);
EXPR_1 EXPR_2 EXPR_3
0.46 0.38 0.79
The ATAN2 mathematical function computes the inverse tangent (arc tangent) of the ratio of its two arguments.
Example: if x > 0, then the expression ATAN2(y, x) is equivalent to ATAN(y/x).
The ATAN2 numeric function (Trigonometric) computes the inverse tangent (arctangent) of the ratio of its two
arguments. The arctangent is the angle between the X-axis and the ray from the point (0,0) to the point (X, Y)
(where X and Y are not both 0). The data type of the returned value is DOUBLE. The returned value is in radians,
not degrees, and a number in the interval [-pi, pi]. The ATAN2 function might seem a bit complex, but it's actually
quite practical. Imagine you're trying to find out the angle between two points on a flat surface, like a map. ATAN2
helps you figure out that angle easily. Picture a coordinate system like the "x" and "y" axes in math class. Now,
let's say you have two points: one is your starting point, and the other is where you want to go. ATAN2 helps you
find the angle you should move in, from the starting point, to reach the destination point. But there's a twist.
ATAN2 is really good at dealing with all the directions around the coordinate system. It considers which quadrant
your destination point is in, so you get the right angle no matter where the point is located. In simpler terms,
ATAN2 helps you find the angle you need to move to reach a destination point from a starting point, while making
sure you're facing the right way on your map. It's like getting the perfect compass direction to get from one spot to
another on a treasure map.
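The quadrant handling is the part that plain ATAN cannot do, and a short Python sketch shows the difference. Python's math.atan2 takes its arguments in the same (y, x) order; the sample points are the ones from the query above plus one point in the lower-left quadrant chosen for illustration.

```python
import math

# The three calls from the example query.
for y, x in [(1, 2), (2, 5), (5, 5)]:
    print(round(math.atan2(y, x), 2))  # 0.46, then 0.38, then 0.79

# The point (-1, -1) has the same ratio y / x = 1 as the point (1, 1),
# so ATAN alone cannot tell the two directions apart.
print(round(math.atan(-1 / -1), 4))    # 0.7854: points up and to the right
print(round(math.atan2(-1, -1), 4))    # -2.3562: points down and to the left
```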
ATANH
The ATANH numeric function (Trigonometric) computes the inverse (arc) hyperbolic tangent of its argument. The
real_expr should evaluate to a real number between -1.0 and +1.0. ATANH stands for "inverse hyperbolic
tangent." Imagine you have a special curve that's kind of like a squished rubber band: it rises from left to right but
always stays between -1 and +1. This curve is called a
hyperbolic tangent curve. Now, the ATANH function helps you figure out a special number related to this curve.
Let's say you have a number, let's call it "y," which is a height on this squished rubber band curve. If you use the
ATANH function on that number, it will tell you the position along the horizontal axis where the curve reaches the
height "y." The closer "y" gets to -1 or +1, the farther out you have to go, so the result grows without bound near
those limits. In simple terms, ATANH helps you find the position on the special curve that matches a
certain height.
CBRT
The CBRT numeric function returns the cube root of a numeric expression. CBRT always returns a floating-point
number, even if the input expression is an integer. Imagine you have a big cube, like a block, and you want to
figure out the length of one side of the cube. The CBRT function helps you with that. It tells you the number that,
when used three times in a multiplication (the number times itself, times itself again), gives the
original number you started with. For example, if you use the CBRT function on 8, it will give you 2, because 2 * 2
* 2 equals 8. So, the CBRT function helps you find the special number that, when multiplied together three times,
gives you the original number. It's like finding the side length of a cube when you know its volume.
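A cube root can be sketched in Python as raising a number to the power 1/3. The helper cbrt below is written for this illustration; it handles the sign separately because negative numbers also have a real cube root (-3 * -3 * -3 is -27). Results are rounded because floating-point powers are only approximate.

```python
import math


def cbrt(value):
    """Real cube root; copysign puts the sign back so negative inputs work."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


print(round(cbrt(8), 6))        # 2.0
print(round(cbrt(-27), 6))      # -3.0
print(round(cbrt(2) ** 3, 6))   # 2.0: cubing the cube root returns the input
```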
Ceil
SELECT ceil(-0.1) as ceil_1
,ceil(3.333) as ceil_2
,order_total as order_total
,ceil(order_total) as ceiling_total
FROM order_table
LIMIT 1;
ceil finds the smallest integer NOT smaller than X
ceil_1 ceil_2 order_total ceiling_total
0 4 12347.53 12348
The "ceil" function, short for "ceiling," helps you round up a number to the nearest whole number that's greater
than or equal to it. Imagine you have a number that's not a whole number, like 3.7. When you use the "ceil"
function on this number, it pushes it up to the next bigger whole number, which is 4. A number that is already
whole stays the same. So, the "ceil" function always
moves a number up to the closest whole number that is not smaller than it. Using the "ceil" function to round up a number is
like making sure you have enough space to cover that number, even if you have to use a larger unit.
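The "enough space" idea shows up whenever items must fit into fixed-size units. The Python sketch below repeats the values from the ceil query and then works out how many pages are needed for a list; the page-size numbers are made up for illustration.

```python
import math

for value in (-0.1, 3.333, 12347.53, 4.0):
    print(value, "->", math.ceil(value))
# -0.1 -> 0, 3.333 -> 4, 12347.53 -> 12348, 4.0 -> 4

# 101 rows shown 25 per page need 5 pages: the last page holds the single leftover row.
rows, rows_per_page = 101, 25
print(math.ceil(rows / rows_per_page))  # 5
```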
COS
The COS mathematical function returns the cosine of an angle given in radians. The COS function, short for
"cosine," is a way to figure out a special value for an angle in a triangle. Imagine you're playing with a flashlight
and a wall. When you shine the flashlight straight at the wall, a stick you're holding at a certain
angle to the wall casts a shadow. The COS function helps you find out how long that shadow is compared to the
length of the stick. In other words, if you have an angle and you use the COS function on it, you'll get a number. This number
tells you the ratio between the length of the shadow and the length of the stick. It's like a
special math tool for understanding angles and lengths in triangles.
COSH
The COSH mathematical function computes the hyperbolic cosine of its argument. The real_expr should
evaluate to a real number. COSH stands for "hyperbolic cosine," and it's a mathematical tool to deal with certain
curves. Imagine you have a special curve that looks like a chain hanging between two points. This curve is called a
hyperbolic cosine curve. The COSH function helps you figure out a special number related to this curve. If you
have a number, let's call it "x," and you use the COSH function on it, you'll get a new number. This new number
tells you how high the chain-like curve is at the point "x." The lowest point of the chain is at x = 0, where the
height is 1. In simpler terms, COSH helps you
understand how high this specific curve is at a particular point.
COT
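The COT mathematical function computes the cotangent of its argument; the argument should be expressed in
radians. The cotangent is the reciprocal of the tangent: in a right triangle, it is the "adjacent" side divided by the
"opposite" side. The COT function, short for "cotangent," is like a special tool to figure out another angle-related value.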
Imagine you have an angle in a triangle, and you want to know how much you need to stretch a rope horizontally
from that angle to a certain point on the ground. The COT function helps you calculate that stretching factor. In
simple words, if you have an angle and you use the COT function on it, you get a number. This number helps you
understand the horizontal stretching of a rope from the angle to the ground. It's like using the COT function to find
out how far the rope reaches when you pull it from a specific angle.
DEGREES
The DEGREES mathematical function converts a number from radians to degrees. DEGREES returns a
DOUBLE (double-precision floating-point) value. The DEGREES function is a tool that helps you understand
angles in a way that's easier to work with in everyday situations. Imagine you're playing with a compass or looking
at a map. The DEGREES function helps you take an angle that's given in a different measurement called radians
and convert it into degrees. Degrees are the kind of angles you're more familiar with – like 90 degrees making a
right angle or 180 degrees making a straight line. So, when you use the DEGREES function on an angle given in
radians, it helps you translate that angle into degrees, which are more intuitive for most people to understand. It's
like converting the angle from a special language that's used in math to a more common language of angles.
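The conversion matters in practice because the trigonometric functions expect radians. The Python sketch below uses math.degrees and its counterpart math.radians to show the conversion and what happens when a value meant as degrees is passed straight to a cosine function.

```python
import math

print(math.degrees(math.pi))      # 180.0: pi radians is a straight line
print(math.degrees(math.pi / 2))  # 90.0: a right angle

# Passing 90 directly treats it as 90 radians, not 90 degrees.
print(round(math.cos(90), 2))                # -0.45
# Converting 90 degrees to radians first gives the expected cosine of a right angle.
print(round(math.cos(math.radians(90)), 2))  # 0.0
```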
DIV
The DIV mathematical function returns the integer quotient from the division of two numeric values. The DIV
function, short for "division," is like a way of sharing things into equal groups. Imagine you have a bunch of
candies, and you want to share them equally among your friends. If you use the DIV function, you're basically
figuring out how many candies each friend will get if you split them up evenly. For example, if you have 10
candies and you use the DIV function by 2 (which you might write as 10 DIV 2), it's like saying, "How many
candies can I give to each friend if I divide them equally between 2 friends?" The answer is 5 candies each. If the
candies don't divide evenly, DIV keeps only the whole-number part: 10 DIV 3 is 3, and the 1 candy left over is
ignored. So, the
DIV function helps you divide things into equal groups and tells you how many whole items each group should get. It's like a
simple math tool for sharing things fairly.
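Integer division is easy to sketch in Python, with one caveat. The helper sql_div below is hypothetical and assumes the quotient is truncated toward zero (the fractional part is simply dropped); Python's own // operator rounds down instead, which only differs for negative results.

```python
def sql_div(dividend, divisor):
    """Integer quotient with the fractional part dropped (assumed: truncation toward zero)."""
    quotient = abs(dividend) // abs(divisor)  # a zero divisor raises ZeroDivisionError here
    same_sign = (dividend >= 0) == (divisor > 0)
    return quotient if same_sign else -quotient


print(sql_div(10, 2))               # 5 candies each, nothing left over
print(sql_div(10, 3))               # 3 candies each
print(10 - 3 * sql_div(10, 3))      # 1 candy left over
print(sql_div(-7, 2), -7 // 2)      # -3 -4: truncation versus rounding down
```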
EXP
The EXP mathematical function computes Euler’s number e raised to a floating-point value. For example, EXP(1)
returns e itself (about 2.718), and EXP(6) returns about 403.43. Euler's number, often
represented as "e," is a very special number in mathematics. Imagine you're saving money in a bank account, and
the bank is offering to give you interest on your money. Euler's number "e" is like a super magical way of
calculating that interest when it keeps getting added frequently, like every instant. It's like the bank giving you
interest not just once in a while, but all the time, faster and faster. For instance, let's say you start with $1 and your
bank pays 100% interest per year, compounded continuously. After one year, your money will have grown to around $2.718,
which is the value of "e." So, "e" is a special number that shows up in all sorts of places in math and science where
things grow or change really smoothly and continuously, like how your money might grow with super quick
interest.
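The bank story can be played out numerically. The Python sketch below assumes $1 at 100% yearly interest and splits the year into more and more compounding periods; the period counts are arbitrary choices for illustration.

```python
import math

# $1 at 100% yearly interest, paid out in ever smaller and more frequent slices.
for periods in (1, 12, 365, 1_000_000):
    balance = (1 + 1 / periods) ** periods
    print(periods, round(balance, 5))

print(round(math.e, 5))        # 2.71828: the limit the balances approach
print(round(math.exp(6), 2))   # 403.43: e raised to the power 6
```

With one period the balance simply doubles to 2.0; as the periods multiply, the balance creeps up toward e but never passes it.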
FACTORIAL
The FACTORIAL mathematical function computes the factorial of its input. It's like a special way to multiply a
bunch of numbers together. Imagine you have a number, let's say 5. The factorial of 5, written as 5!, means you
multiply 5 by all the whole numbers that come before it: 5 x 4 x 3 x 2 x 1. So, 5! is equal to 120 because 5 x 4 x 3 x
2 x 1 equals 120. In general, if you have a number "n," then n! means you multiply together all the whole numbers
from 1 to n. It's like a math trick to calculate how many different ways you can arrange things: 3 different marbles
can be lined up in 3! = 6 different orders. So, the factorial
function is just a fancy way of multiplying numbers in a specific sequence to find out how many arrangements
you can make with that many items.
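The link between factorials and arrangements can be checked directly in Python. The marble colors are made up for illustration. The last line is a plain arithmetic fact about 64-bit integers, included because factorials grow so quickly; how a particular SQL engine treats larger inputs should be checked in its documentation.

```python
import math
from itertools import permutations

print(math.factorial(5))  # 120, which is 5 x 4 x 3 x 2 x 1

# Every possible ordering of three marbles: there are 3! = 6 of them.
orderings = list(permutations(["red", "green", "blue"]))
print(len(orderings))     # 6

# 20! still fits in a signed 64-bit integer, 21! does not.
print(math.factorial(20) < 2**63 <= math.factorial(21))  # True
```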
Floor
SELECT floor(-0.1) as floor_1
,floor(3.333) as floor_2
,order_total as order_total
,floor(order_total) as floor_total
FROM order_table
LIMIT 1;
Floor finds the largest integer NOT greater than X
floor_1 floor_2 order_total floor_total
-1 3 12347.53 12347
The "floor" function is like a way of rounding down a number to the nearest whole number that's less than or equal
to it. Imagine you have a
number that's not a whole number, like 3.8. When you use the "floor" function on this number, it pushes it down to
the closest smaller whole number, which is 3. Negative numbers move down too, away from zero, so the floor of
-0.1 is -1. In simpler words, using the "floor" function on a number makes sure
you're on or below that number on the number line. It's like finding the closest lower step on a staircase when you
want to go down.
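Rounding down is not the same as chopping off the decimals, and negative numbers show the difference. The Python sketch below compares math.floor with math.trunc (which just drops the fractional part) on the values from the floor query.

```python
import math

for value in (-0.1, 3.333, 12347.53):
    print(value, math.floor(value), math.trunc(value))
# -0.1      floor -1      trunc 0
# 3.333     floor 3       trunc 3
# 12347.53  floor 12347   trunc 12347
```

For positive numbers the two agree; for -0.1 the floor steps down to -1 while truncation stops at 0.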
LN
The LN function stands for "natural logarithm." It answers one question: to what power do you have to raise the
special number "e" (Euler's number, about 2.71828) to get your number? Imagine you have a number, like 10. If you
use the LN function on 10, it will give you a number around 2.3026. This means that if you raise "e" to the power
of 2.3026, you'll get very close to 10. It's like finding the secret power you should use on "e" to make it equal
your original number.
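A minimal sketch of the idea, using the EXP function (e raised to a power) to undo LN. The column aliases are
invented for illustration, and the results in the comments are rounded.

```sql
-- ln(x) answers: e raised to what power gives x?
SELECT ln(10)       AS ln_10        -- about 2.3026
      ,exp(ln(10))  AS back_to_10   -- about 10, because exp undoes ln
      ,ln(exp(1))   AS ln_of_e;     -- about 1, because e to the power of 1 is e
```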
LOG
Syntax: LOG(<base>, <expr>)

<base>  The "base" to use (e.g., 10 for base 10 arithmetic). The base can be of any numeric data type (INTEGER,
        fixed-point, or floating point). The base should be greater than 0, and not be exactly 1.0.

<expr>  The value for which you want to know the log. This can be of any numeric data type (INTEGER, fixed-point,
        or floating point). The expr should be greater than 0.

SELECT log(2, 0.5)
      ,log(2,1)
      ,log(2,16)

LOG(2,0.5)   LOG(2,1)   LOG(2,16)
-1           0          4
The LOG mathematical function returns the logarithm of a numeric expression. In other words, it tells you what
power you need to raise a base to in order to get your number. Imagine you have a number, let's say 100. If you use
the LOG function on this number with a base of 10 (written as LOG(10, 100), or "LOG base 10 of 100"), it gives you
2. This means that if you raise 10 to the power of 2 (which is 10 * 10), you'll get 100. So, the LOG function helps
you understand what power you need to use on a certain base number to end up with your original number. It's like
solving a puzzle to find out how many times the base has to be multiplied together to get the result you're given.
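The 100 example above can be sketched like this. The aliases are made up for illustration, and the results in the
comments are rounded. Notice the base comes first. If you leave the base out and write LOG(<expr>) with one
argument, Databricks treats it as the natural logarithm, the same as LN.

```sql
-- log(base, expr): what power of the base gives expr?
SELECT log(10, 100)  AS log_base_10   -- 2, because 10 * 10 = 100
      ,log10(100)    AS log10_same    -- 2, log10 is a shortcut for base 10
      ,log(2, 16)    AS log_base_2;   -- 4, because 2 * 2 * 2 * 2 = 16
```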
MOD
The MOD function is like a way to find the remainder when you divide two numbers. Imagine you have a bunch of
candies, and you want to share them with your friends equally. If you use the MOD function, you're not interested
in how many candies each friend gets, but instead, you want to know how many candies are left over after you've
shared them as evenly as possible. For example, if you have 10 candies and you're sharing them between 3 friends,
you can use the MOD function to find out that you'll have 1 candy left over. This means you can give each friend 3
candies, and you'll still have 1 candy left. So, the MOD function helps you find the "leftover" part when you're
dividing numbers into groups as fairly as possible. It's like a way of checking what's left after you've done your
sharing.
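The candy example above could be written like this. The aliases are invented for illustration. The % operator is
another way to ask for the same remainder, and DIV gives the whole-number share each friend gets.

```sql
-- 10 candies shared between 3 friends
SELECT mod(10, 3)  AS leftover_candies   -- 1
      ,10 % 3      AS same_leftover      -- 1
      ,10 div 3    AS candies_each;      -- 3
```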
PI
PI()    PI()
3.14    3.1415926535897930000
PI is a special number in math that's used to understand circles. Imagine you have a round pizza. The number PI
helps you figure out how big the pizza's circumference (the distance around the edge) is compared to its diameter
(the width across the middle). When you use PI, you're finding out how many times the diameter of the circle fits
around its edge. This number is around 3.14159, but we usually just call it "PI" for short. So, PI is a magical
number that helps us understand how circles work and how their size is related to the distance around them. It's like
a secret key for solving circle puzzles in math!
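As a small sketch of the pizza idea, the circumference is PI times the diameter. The 12-unit diameter and the
aliases are assumptions made up for this example.

```sql
-- a hypothetical pizza that is 12 units across
SELECT pi()            AS pi_value        -- 3.141592653589793
      ,round(pi(), 2)  AS pi_rounded      -- 3.14
      ,pi() * 12       AS circumference;  -- about 37.7
```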
POW or POWER
The POW or POWER function is like a superpower for numbers. It helps you raise a number to a certain power,
which means using it as a factor a specific number of times. Imagine you have a number, let's say 2, and you
want to make it stronger by using its superpower. If you use the POW function with 2 and a power of 3 (written in
math as 2^3), it means you're multiplying three 2s together: 2 * 2 * 2 = 8. So, 2^3 equals 8. In simpler
terms, the POW or POWER function lets you make a number super strong by raising it to a certain power, which
tells you how many copies of the number to multiply together. It's like giving a number its very own superhero boost!
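Here is a minimal sketch of the 2 to the power of 3 example. The aliases are made up for illustration. One thing to
watch for: the 2^3 notation is math shorthand only. In Databricks SQL the ^ symbol is the bitwise XOR operator, so
use POW or POWER when you want an exponent.

```sql
-- both spellings do the same job
SELECT pow(2, 3)    AS pow_result     -- 8.0
      ,power(2, 3)  AS power_result   -- 8.0
      ,2 * 2 * 2    AS by_hand;       -- 8
```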
RADIANS
The RADIANS mathematical function converts a number from degrees to radians. The RADIANS function helps
you use a different way to measure angles that's often used in more advanced math. Imagine you have a pizza, and
you want to figure out how much of the pizza slice you're looking at. Normally, we use degrees to measure that.
But the RADIANS function helps you measure the same thing using a different unit called radians. A full circle is
360 degrees, which is 2 * PI radians, so 180 degrees equals PI radians. In simple terms, if you have an angle and
you use the RADIANS function, you're changing how you measure that angle from degrees to radians. It's like using
a different measuring tape to see how big the angle is. Radians are not more precise than degrees, but they are the
unit that functions like SIN and TAN expect, and they keep the formulas simple when things change smoothly and
continuously, like when objects move or waves oscillate.
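A small sketch of the conversion, which is the same as multiplying the degrees by PI and dividing by 180. The
angles and aliases are chosen just for illustration, and the results in the comments are rounded.

```sql
-- degrees in, radians out
SELECT radians(180)     AS half_turn      -- about 3.14159, which is PI
      ,radians(360)     AS full_turn      -- about 6.28319, which is 2 * PI
      ,90 * pi() / 180  AS quarter_turn   -- about 1.5708, the same math done by hand
      ,degrees(pi())    AS back_again;    -- about 180, degrees goes the other way
```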
ROUND
The ROUND function is like a way to make a number simpler and easier to work with. Imagine you have a number
that's a bit messy with lots of decimal places, like 3.857. When you use the ROUND function on this number,
you're making it neater by choosing the nearest whole number or a specific number of decimal places. For instance,
if you use the ROUND function on 3.857 and you want 2 decimal places, it becomes 3.86 because that's the closest
number when you look just at two decimal places. So, the ROUND function helps you tidy up numbers by picking
the nearest whole number or a certain number of decimal places, making them easier to handle and understand. It's
like smoothing out the rough edges of numbers.
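The 3.857 example above can be sketched like this. The aliases are made up for illustration. The second argument
is the number of decimal places to keep, and leaving it out rounds to a whole number.

```sql
-- tidy up 3.857 three different ways
SELECT round(3.857, 2)  AS two_places   -- 3.86
      ,round(3.857, 1)  AS one_place    -- 3.9
      ,round(3.857)     AS whole;       -- 4
```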
SIGN
SELECT SIGN(-52.3)
,SIGN(55.5)
,SIGN(0) ;
The SIGN function is like a way to understand whether a number is positive, negative, or zero. Imagine you have a
number, like -5. When you use the SIGN function on this number, it tells you if it's positive, negative, or zero. If
the number is positive, the SIGN function gives you 1. If it's negative, you get -1. And if the number is exactly
zero, the SIGN function gives you 0. In simpler words, the SIGN function helps you quickly figure out the
"direction" of a number, whether it's going up (positive), down (negative), or not going anywhere (zero). It's like a
math compass that shows you which way the number is pointing.
SIN
The SIN function is a tool that helps you understand heights and distances in a triangle, especially when you're
dealing with angles. Imagine you're shining a flashlight at a wall. When you tilt the flashlight's beam up or
down, it creates a spot that goes higher or lower on the wall. The SIN function helps you figure out how high that
spot is on the wall based on the angle at which you're holding the flashlight. So, if you have an angle (measured
in radians), and you use the SIN function on it, you get a number between -1 and 1. This number is the height of
the spot divided by the length of the beam, so multiplying it by the beam's length gives you the height. It's like
a magic math trick to understand how high the light will land when you shine it at a specific angle.
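Here is a rough sketch of the flashlight idea. The 30-degree angle, the 10-unit beam, and the aliases are all
assumptions made up for this example, and the results in the comments are rounded. SIN expects radians, so the
degrees are converted with RADIANS first.

```sql
-- a hypothetical 10-unit beam tilted up 30 degrees
SELECT sin(radians(30))       AS sin_30        -- about 0.5
      ,sin(radians(90))       AS sin_90        -- about 1, pointing straight up
      ,10 * sin(radians(30))  AS spot_height;  -- about 5
```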
SINH
SINH stands for "hyperbolic sine," and it's a mathematical tool to work with certain curves. Imagine you have a
special curve that passes through zero and then climbs faster and faster the further right you go, with a mirror
image dropping away on the left. This curve is called a hyperbolic sine curve. If you have a number, let's call it
"x," and you use the SINH function on it, you'll get the height of that curve at the point "x." Behind the scenes,
the answer is "e" to the power of x, minus "e" to the power of negative x, all divided by 2. In simpler terms, the
SINH function tells you how high this specific curve has grown at a certain point.
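A minimal sketch comparing SINH with the exponential formula it is built from. The aliases are invented for
illustration, and the results in the comments are rounded.

```sql
-- sinh(x) is (e^x - e^-x) / 2
SELECT sinh(0)                 AS sinh_0       -- 0
      ,sinh(1)                 AS sinh_1       -- about 1.1752
      ,(exp(1) - exp(-1)) / 2  AS by_formula;  -- about 1.1752
```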
SQRT
The SQRT function is like a magical tool that helps you find the length of one side of a special square. Imagine
you have a square, and you know how big the area of that square is. If you use the SQRT function on that area, it
helps you figure out the length of one of the sides of that square. For instance, if you have a square with an area of
16, when you use the SQRT function on 16, it gives you 4. This means that each side of the square is 4 units long.
In simple words, the SQRT function helps you find the "secret" length of one side of a square when you know how
big its area is. It's like using a magic spell to find out the missing piece of information about the square.
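The square example above could be written like this. The aliases are made up for illustration, and the second
result is rounded.

```sql
-- area in, side length out
SELECT sqrt(16)  AS side_for_area_16   -- 4.0
      ,sqrt(2)   AS side_for_area_2;   -- about 1.4142
```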
TAN
The TAN function is a way to understand how tall something is compared to how far away you're standing, based
on an angle. Imagine you're standing a bit away from a tall tree. If you look up at the top of the tree, you're forming
an angle. The TAN function helps you figure out how tall the tree is compared to the distance you're standing from
it. So, if you have an angle (measured in radians) and you use the TAN function on it, you get a number. This number
is the ratio between the tree's height above your eyes and the distance you're standing away from it, so multiplying
it by that distance gives you the height. It's like using a math tool to find out how tall something is without
needing to measure it directly.
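Here is a rough sketch of the tree idea. The 45-degree angle, the 20-unit distance, and the aliases are assumptions
made up for this example, and the results in the comments are rounded. TAN expects radians, so the degrees are
converted with RADIANS first.

```sql
-- standing a hypothetical 20 units from the tree, looking up at 45 degrees
SELECT tan(radians(45))       AS height_to_distance   -- about 1
      ,20 * tan(radians(45))  AS tree_height;         -- about 20
```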
TANH
TANH stands for "hyperbolic tangent," and it's a mathematical tool used to work with certain curves. Imagine you
have a special curve shaped like a stretched-out letter S. It passes through zero in the middle and then flattens
out, never going above 1 or below -1. This curve is called a hyperbolic tangent curve. If you have a number, let's
call it "x," and you use the TANH function on it, you'll get the height of that curve at the point "x." Behind the
scenes, the answer is SINH of x divided by COSH of x. In simpler terms, the TANH function squeezes any number, no
matter how big or small, into the range between -1 and 1.
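A minimal sketch showing TANH next to the SINH and COSH ratio it comes from. The aliases are invented for
illustration, and the results in the comments are rounded.

```sql
-- tanh(x) is sinh(x) / cosh(x), and it stays between -1 and 1
SELECT tanh(0)            AS tanh_0      -- 0
      ,tanh(1)            AS tanh_1      -- about 0.7616
      ,sinh(1) / cosh(1)  AS by_ratio    -- about 0.7616
      ,tanh(10)           AS tanh_10;    -- very close to 1
```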
The End