Summarizing Data
INTERMEDIATE SQL
Jasmin Ludolf
Data Science Content Developer, DataCamp
Summarizing data
Aggregate functions return a single value
Aggregate functions
AVG() , SUM() , MIN() , MAX() , COUNT()
|avg             |
|----------------|
|39902826.2684...|
|sum         |
|------------|
|181079025606|
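The queries behind the avg and sum results above are not present in the extracted text. As a minimal sketch of how such aggregates are written, assuming a hypothetical `films_sample` table with made-up rows (not the course's `films` data):

```sql
-- Hypothetical sample data, for illustration only.
DROP TABLE IF EXISTS films_sample;
CREATE TEMP TABLE films_sample (title TEXT, budget BIGINT);
INSERT INTO films_sample (title, budget) VALUES
    ('Film A', 100),
    ('Film B', 200),
    ('Film C', 600);

-- Each aggregate function collapses the whole column into a single value.
SELECT AVG(budget)   AS avg_budget,    -- mean of 100, 200, 600
       SUM(budget)   AS sum_budget,    -- 900
       COUNT(budget) AS count_budget   -- 3
FROM films_sample;
```

The result is one row, however many rows the table holds.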
Aggregate functions
SELECT MIN(budget)
FROM films;
|min|
|---|
|218|
SELECT MAX(budget)
FROM films;
|max        |
|-----------|
|12215500000|
Non-numerical data
Numerical fields only: AVG(), SUM()
Various data types: COUNT(), MIN(), MAX()
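To illustrate the split above, here is a small sketch that applies COUNT(), MIN(), and MAX() to text and date fields. The table `films_sample`, its `release_date` column, and all rows are hypothetical:

```sql
-- Hypothetical sample data, for illustration only.
DROP TABLE IF EXISTS films_sample;
CREATE TEMP TABLE films_sample (title TEXT, country TEXT, release_date DATE);
INSERT INTO films_sample (title, country, release_date) VALUES
    ('Film A', 'Germany', DATE '2010-05-01'),
    ('Film B', 'Brazil',  DATE '1999-12-31'),
    ('Film C', 'Japan',   DATE '2015-07-20');

-- COUNT(), MIN(), and MAX() accept non-numerical fields.
SELECT COUNT(country)    AS count_country,   -- 3
       MIN(country)      AS first_country,   -- Brazil (alphabetically first)
       MAX(country)      AS last_country,    -- Japan (alphabetically last)
       MIN(release_date) AS earliest,        -- 1999-12-31
       MAX(release_date) AS latest           -- 2015-07-20
FROM films_sample;

-- AVG(country) or SUM(country) would raise an error in PostgreSQL,
-- because those functions are not defined for text.
```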
Non-numerical data
MIN() <-> MAX()
A <-> Z
0 <-> 100
Non-numerical data
SELECT MIN(country)
FROM films;
|min        |
|-----------|
|Afghanistan|
SELECT MAX(country)
FROM films;
|max         |
|------------|
|West Germany|
Aliasing when summarizing
SELECT MIN(country)
FROM films;
|min        |
|-----------|
|Afghanistan|
SELECT MIN(country) AS min_country
FROM films;
|min_country|
|-----------|
|Afghanistan|
Let's practice!
Summarizing subsets
Using WHERE with aggregate functions
SELECT AVG(budget) AS avg_budget
FROM films
WHERE release_year >= 2010;
|avg_budget |
|--------------------|
|41072235.18324607...|
Using WHERE with aggregate functions
SELECT SUM(budget) AS sum_budget
FROM films
WHERE release_year = 2010;
|sum_budget|
|----------|
|8942365000|
SELECT MIN(budget) AS min_budget
FROM films
WHERE release_year = 2010;
|min_budget|
|----------|
|65000     |
Using WHERE with aggregate functions
SELECT MAX(budget) AS max_budget
FROM films
WHERE release_year = 2010;
|max_budget|
|----------|
|600000000 |
SELECT COUNT(budget) AS count_budget
FROM films
WHERE release_year = 2010;
|count_budget|
|------------|
|194         |
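The four queries above each filter on the same year, so they can also be combined into one statement. A minimal sketch, assuming a hypothetical `films_sample` table with made-up rows:

```sql
-- Hypothetical sample data, for illustration only.
DROP TABLE IF EXISTS films_sample;
CREATE TEMP TABLE films_sample (title TEXT, budget BIGINT, release_year INT);
INSERT INTO films_sample (title, budget, release_year) VALUES
    ('Film A', 100, 2009),
    ('Film B', 200, 2010),
    ('Film C', 600, 2010);

-- WHERE filters the rows first; the aggregates then summarize
-- only the rows that remain (here, the two films from 2010).
SELECT SUM(budget)   AS sum_budget,    -- 800
       MIN(budget)   AS min_budget,    -- 200
       MAX(budget)   AS max_budget,    -- 600
       COUNT(budget) AS count_budget   -- 2
FROM films_sample
WHERE release_year = 2010;
```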
ROUND()
Round a number to a specified decimal
ROUND(number_to_round, decimal_places)
Unrounded:
|avg_budget          |
|--------------------|
|41072235.18324607...|
Rounded to two decimal places:
|avg_budget |
|-----------|
|41072235.18|
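The query that produced the two-decimal result is missing from the extracted text; presumably it passes 2 as the second argument to ROUND(). A sketch of that pattern, assuming a hypothetical `films_sample` table with made-up rows:

```sql
-- Hypothetical sample data, for illustration only.
DROP TABLE IF EXISTS films_sample;
CREATE TEMP TABLE films_sample (title TEXT, budget BIGINT, release_year INT);
INSERT INTO films_sample (title, budget, release_year) VALUES
    ('Film A', 100, 2010),
    ('Film B', 200, 2011),
    ('Film C', 200, 2012);

-- The average is 500 / 3 = 166.666...; ROUND(..., 2) keeps two decimals.
SELECT ROUND(AVG(budget), 2) AS avg_budget   -- 166.67
FROM films_sample
WHERE release_year >= 2010;
```

In PostgreSQL the two-argument form of ROUND() expects a numeric value. AVG() over an integer column returns numeric, which is why this works without a cast.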
ROUND() to a whole number
SELECT ROUND(AVG(budget)) AS avg_budget
FROM films
WHERE release_year >= 2010;
|avg_budget|
|----------|
|41072235  |
SELECT ROUND(AVG(budget), 0) AS avg_budget
FROM films
WHERE release_year >= 2010;
|avg_budget|
|----------|
|41072235  |
ROUND() using a negative parameter
SELECT ROUND(AVG(budget), -5) AS avg_budget
FROM films
WHERE release_year >= 2010;
|avg_budget|
|----------|
|41100000 |
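A negative second argument rounds to the left of the decimal point. The sketch below applies a few negative values to the literal 41072235.18 (the two-decimal average shown earlier) to show how the result changes:

```sql
-- -1 rounds to the nearest ten, -2 to the nearest hundred,
-- -5 to the nearest hundred thousand.
SELECT ROUND(41072235.18, -1) AS nearest_ten,                -- 41072240
       ROUND(41072235.18, -2) AS nearest_hundred,            -- 41072200
       ROUND(41072235.18, -5) AS nearest_hundred_thousand;   -- 41100000
```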
Let's practice!
Aliasing and arithmetic
Arithmetic
+, -, *, and /
|7| |12|
|1| |1|
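The expressions behind the four results above did not survive extraction. Assuming the operands were 4 and 3 (the values used in the division example in this section), the following sketch reproduces 7, 12, 1, and 1:

```sql
-- Assumed operands: 4 and 3. The original slide's queries are not in the text.
SELECT (4 + 3) AS addition,         -- 7
       (4 * 3) AS multiplication,   -- 12
       (4 - 3) AS subtraction,      -- 1
       (4 / 3) AS division;         -- 1, because both operands are integers
```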
Arithmetic
SELECT (4 / 3);
|1|
SELECT (4.0 / 3.0);
|1.333...|
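When the operands are not literals, one way to get a fractional result is to make at least one side non-integer. A short PostgreSQL sketch; the use of CAST here is an illustration, not something the slides show:

```sql
SELECT 4 / 3                  AS integer_division,   -- 1
       4 / 3.0                AS mixed_division,     -- 1.333...
       CAST(4 AS numeric) / 3 AS cast_division;      -- 1.333...
```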
Aggregate functions vs. arithmetic
Aggregate functions
Arithmetic
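The visuals for this comparison are not in the extracted text. One common way to describe the difference is that arithmetic works across the fields of each row, while an aggregate function works down a field over many rows. A sketch under that reading, using a hypothetical `films_sample` table with made-up rows:

```sql
-- Hypothetical sample data, for illustration only.
DROP TABLE IF EXISTS films_sample;
CREATE TEMP TABLE films_sample (title TEXT, gross BIGINT, budget BIGINT);
INSERT INTO films_sample (title, gross, budget) VALUES
    ('Film A', 500, 100),
    ('Film B', 300, 200);

-- Arithmetic: one result per row (400 for Film A, 100 for Film B).
SELECT title, (gross - budget) AS profit
FROM films_sample;

-- Aggregate function: one result for the whole column (300).
SELECT SUM(budget) AS total_budget
FROM films_sample;
```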
Aliasing with arithmetic
SELECT (gross - budget)
FROM films;
|?column?|
|--------|
|null    |
|2900000 |
|null    |
...
SELECT (gross - budget) AS profit
FROM films;
|profit  |
|--------|
|null    |
|2900000 |
|null    |
...
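The null rows in the results above are consistent with how SQL arithmetic treats missing values: if either operand is NULL, the result is NULL. A minimal sketch with literal values (the numbers are arbitrary):

```sql
SELECT (5 - 3)                  AS both_known,       -- 2
       (CAST(NULL AS int) - 3)  AS gross_missing,    -- null
       (5 - CAST(NULL AS int))  AS budget_missing;   -- null
```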
Aliasing with functions
SELECT MAX(budget), MAX(duration)
FROM films;
|max        |max|
|-----------|---|
|12215500000|334|
SELECT MAX(budget) AS max_budget,
       MAX(duration) AS max_duration
FROM films;
|max_budget |max_duration|
|-----------|------------|
|12215500000|334         |
Order of execution
Step 1: FROM
Step 2: WHERE
Step 3: SELECT (aliases are defined here)
Step 4: LIMIT
SELECT budget AS max_budget
FROM films
WHERE max_budget IS NOT NULL;
column "max_budget" does not exist
LINE 5: WHERE max_budget IS NOT NULL;
^
Aliases defined in the SELECT clause cannot be used in the WHERE clause due to order of execution
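Because WHERE is processed before SELECT, one way to avoid the error is to filter on the original column name and keep the alias for the output only. A sketch, assuming a hypothetical `films_sample` table with made-up rows:

```sql
-- Hypothetical sample data, for illustration only.
DROP TABLE IF EXISTS films_sample;
CREATE TEMP TABLE films_sample (title TEXT, budget BIGINT);
INSERT INTO films_sample (title, budget) VALUES
    ('Film A', 100),
    ('Film B', NULL);

-- WHERE refers to the real column; the alias only renames the output.
SELECT budget AS max_budget
FROM films_sample
WHERE budget IS NOT NULL;   -- returns one row: 100
```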
Let's practice!