Chapter2 1

The document discusses numeric data types and summary functions in SQL. It describes integer, decimal, and floating point numeric types and their storage sizes and ranges. It also explains common summary functions like minimum, maximum, average, variance, standard deviation, and grouping and aggregation using GROUP BY. The document provides examples of calculating summary statistics and exploring distributions of numeric data using techniques like binning and generating series.

Uploaded by

Jim Bo


Numeric Data Types and Summary Functions
EXPLORATORY DATA ANALYSIS IN SQL

Christina Maimone
Data Scientist
Numeric types: integer
Name                   | Storage Size | Description          | Range
-----------------------+--------------+----------------------+---------------------------------------------
integer or int or int4 | 4 bytes      | typical choice       | -2147483648 to +2147483647
smallint or int2       | 2 bytes      | small-range          | -32768 to +32767
bigint or int8         | 8 bytes      | large-range          | -9223372036854775808 to +9223372036854775807
serial                 | 4 bytes      | auto-increment       | 1 to 2147483647
smallserial            | 2 bytes      | small auto-increment | 1 to 32767
bigserial              | 8 bytes      | large auto-increment | 1 to 9223372036854775807
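
As a rough illustration of how these ranges matter in practice, the sketch below asks PostgreSQL which type it assigns to a few literals. It assumes a PostgreSQL session (the slides' syntax suggests PostgreSQL); the column aliases are made up for illustration.

```sql
-- Assumes PostgreSQL; aliases are illustrative only.
SELECT pg_typeof(1)          AS fits_in_integer,    -- integer
       pg_typeof(2147483648) AS too_big_for_int,    -- bigint
       pg_typeof(1.5)        AS has_decimal_point;  -- numeric

-- Casting one operand to bigint avoids an "integer out of range" error
-- when a result would exceed the 4-byte integer range.
SELECT 2147483647::bigint + 1 AS one_past_int_max;
```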

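Numeric types: decimal
Name               | Storage Size | Description                     | Range
--------------------+--------------+---------------------------------+------------------------------------------------------------------------------------------
decimal or numeric | variable     | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real               | 4 bytes      | variable-precision, inexact     | 6 decimal digits precision
double precision   | 8 bytes      | variable-precision, inexact     | 15 decimal digits precision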
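
One way to see the difference between "exact" and "inexact" types is to compare the same arithmetic in numeric and in double precision. This is a minimal sketch assuming PostgreSQL; the aliases are hypothetical.

```sql
-- Literals with a decimal point are numeric (exact) by default.
SELECT 0.1 + 0.2 = 0.3 AS numeric_matches,          -- true
       0.1::double precision
         + 0.2::double precision
         = 0.3::double precision AS double_matches; -- false (binary floating point)
```

This is why comparisons for equality on real or double precision columns are usually best avoided.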

Division
-- integer division
SELECT 10/4;

2

-- numeric division
SELECT 10/4.0;

2.500000000
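
When both operands are integer columns rather than literals, a cast is a common way to get a non-integer result. The sketch below assumes PostgreSQL and uses made-up aliases; it shows a few equivalent spellings.

```sql
SELECT 10 / 4                  AS int_division,        -- 2: both operands are integers
       10::numeric / 4         AS cast_with_colons,    -- PostgreSQL-style cast
       CAST(10 AS numeric) / 4 AS cast_with_cast,      -- standard SQL cast
       10 * 1.0 / 4            AS multiply_by_decimal; -- forces numeric arithmetic
```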

Range: min and max
SELECT min(question_pct)
FROM stackoverflow;

 min
-----
 0
(1 row)

SELECT max(question_pct)
FROM stackoverflow;

 max
-------------
 0.071957428
(1 row)
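
Both aggregates can also be computed in a single query, along with the width of the range. The sketch below is self-contained: the `sample_data` values are invented for illustration and are not from the stackoverflow table.

```sql
-- sample_data is a hypothetical stand-in for a real table.
WITH sample_data(question_pct) AS (
    VALUES (0.0), (0.002), (0.0719)
)
SELECT min(question_pct)                     AS min_pct,
       max(question_pct)                     AS max_pct,
       max(question_pct) - min(question_pct) AS value_range
  FROM sample_data;
```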


Average or mean
SELECT avg(question_pct)
FROM stackoverflow;

avg
---------------------
0.00379494620059319
(1 row)

Variance
Population Variance

SELECT var_pop(question_pct)
FROM stackoverflow;

 var_pop
----------------------
 0.000140268640974167
(1 row)

Sample Variance

SELECT var_samp(question_pct)
FROM stackoverflow;

 var_samp
----------------------
 0.000140271571051059
(1 row)

-- variance() is an alias for var_samp()
SELECT variance(question_pct)
FROM stackoverflow;

 variance
----------------------
 0.000140271571051059
(1 row)

Standard deviation
Sample Standard Deviation

SELECT stddev_samp(question_pct)
FROM stackoverflow;

 stddev_samp
--------------------
 0.0118436299778007
(1 row)

Population Standard Deviation

SELECT stddev_pop(question_pct)
FROM stackoverflow;

 stddev_pop
--------------------
 0.0118435062787237
(1 row)

-- stddev() is an alias for stddev_samp()
SELECT stddev(question_pct)
FROM stackoverflow;

 stddev
--------------------
 0.0118436299778007
(1 row)
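
To make the relationship between the population and sample versions concrete, here is a small self-contained sketch. It assumes PostgreSQL; the eight values in `sample_data` are invented (mean 5, sum of squared deviations 32) so the arithmetic is easy to check by hand.

```sql
-- sample_data is hypothetical: mean = 5, sum of squared deviations = 32.
WITH sample_data(x) AS (
    VALUES (2.0), (4.0), (4.0), (4.0), (5.0), (5.0), (7.0), (9.0)
)
SELECT var_pop(x)                             AS var_pop,             -- 32 / 8 = 4
       var_samp(x)                            AS var_samp,            -- 32 / 7
       var_pop(x) * count(*) / (count(*) - 1) AS var_samp_by_hand,    -- approximately var_samp
       stddev_pop(x)                          AS stddev_pop,          -- sqrt(4) = 2
       sqrt(var_samp(x))                      AS stddev_samp_by_hand  -- approximately stddev_samp(x)
  FROM sample_data;
```

The sample versions divide by n - 1 instead of n, so they are slightly larger; with many rows, as in the stackoverflow results above, the difference is tiny.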


Round
SELECT round(42.1256, 2);

42.13
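
A few more variations on round(), as a sketch assuming PostgreSQL. Note that in PostgreSQL the two-argument form is defined for numeric, so a double precision value is cast first; the aliases are made up.

```sql
SELECT round(42.1256, 2) AS two_places,     -- 42.13
       round(42.1256)    AS nearest_whole,  -- 42
       round(1234.5, -2) AS nearest_100,    -- 1200 (negative places round left of the decimal point)
       -- two-argument round() needs numeric, so cast a double precision value first
       round(CAST(0.0118436299778007 AS double precision)::numeric, 4) AS from_double;  -- 0.0118
```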


Summarize by group
-- Summarize by group with GROUP BY
SELECT tag,
min(question_pct),
avg(question_pct),
max(question_pct)
FROM stackoverflow
GROUP BY tag;

tag | min | avg | max
--------------------------+-------------+----------------------+-------------
amazon-sqs | 6.91e-05 | 8.08328877005347e-05 | 9.6e-05
amazon-kinesis | 2.1e-05 | 3.3924064171123e-05 | 4.64e-05
android-pay | 2.97e-05 | 3.16712477396022e-05 | 3.29e-05
amazon-cloudformation | 4.8e-05 | 9.34518997326204e-05 | 0.00015246
citrix | 3.6e-05 | 3.95804407713499e-05 | 4.39e-05
amazon-ec2 | 0.001058039 | 0.00122817236730946 | 0.001378872
actionscript | 0.000551486 | 0.00067589990909091 | 0.000856132
amazon-ecs | 1.17e-05 | 3.40544117647059e-05 | 6.51e-05
mongodb | 0.0049625 | 0.00577465885069125 | 0.00631164
amazon-redshift | 0.000117294 | 0.000160832181818182 | 0.000212208
...
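
Grouped summaries are often easier to read with a row count, rounding, and an explicit sort. The following is a self-contained sketch assuming PostgreSQL; the tags and values in `sample_data` are invented, not taken from the stackoverflow table.

```sql
-- sample_data is a hypothetical stand-in for the stackoverflow table.
WITH sample_data(tag, question_pct) AS (
    VALUES ('tag-a', 0.001), ('tag-a', 0.003),
           ('tag-b', 0.010), ('tag-b', 0.020), ('tag-b', 0.030)
)
SELECT tag,
       count(*)                    AS n,
       min(question_pct)           AS min_pct,
       round(avg(question_pct), 4) AS avg_pct,
       max(question_pct)           AS max_pct
  FROM sample_data
 GROUP BY tag
HAVING count(*) >= 2   -- keep only groups with at least two rows
 ORDER BY tag;
```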

Let's work with numbers!

Exploring distributions

Count values
SELECT unanswered_count, count(*)
FROM stackoverflow
WHERE tag='amazon-ebs'
GROUP BY unanswered_count
ORDER BY unanswered_count;

unanswered_count | count
------------------+-------
37 | 12
38 | 40
...
43 | 10
44 | 8
45 | 17
46 | 4
47 | 1
...
54 | 131
55 | 34
56 | 1
(20 rows)


Truncate
SELECT trunc(42.1256, 2);

42.12

SELECT trunc(12345, -3);

12000
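
The difference between truncating and rounding, and between trunc() and floor() for negative numbers, can be checked side by side. A minimal sketch assuming PostgreSQL, with made-up aliases:

```sql
SELECT trunc(42.1256, 2) AS truncated,  -- 42.12 (digits are dropped)
       round(42.1256, 2) AS rounded,    -- 42.13
       trunc(-1.7)       AS trunc_neg,  -- -1 (toward zero)
       floor(-1.7)       AS floor_neg;  -- -2 (toward negative infinity)
```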


Truncating and grouping
SELECT trunc(unanswered_count, -1) AS trunc_ua,
count(*)
FROM stackoverflow
WHERE tag='amazon-ebs'
GROUP BY trunc_ua -- column alias
ORDER BY trunc_ua; -- column alias

trunc_ua | count
----------+-------
30 | 74
40 | 194
50 | 480
(3 rows)


Generate series
SELECT generate_series(start, end, step);

SELECT generate_series(1, 10, 2);

 generate_series
-----------------
 1
 3
 5
 7
 9
(5 rows)

SELECT generate_series(0, 1, .1);

 generate_series
-----------------
 0
 0.1
 0.2
 0.3
 0.4
 0.5
 0.6
 0.7
 0.8
 0.9
 1.0
(11 rows)
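
generate_series() can also be used in the FROM clause, which makes the generated values available as an ordinary column. This is a small sketch assuming PostgreSQL; the alias `n` is arbitrary.

```sql
-- With no step argument, integer series advance by 1.
SELECT n,
       n * n AS n_squared
  FROM generate_series(1, 5) AS n
 ORDER BY n;
```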

Create bins: query
-- Create bins
WITH bins AS (
     SELECT generate_series(30,60,5) AS lower,
            generate_series(35,65,5) AS upper),
     -- Subset data to tag of interest
     ebs AS (
     SELECT unanswered_count
       FROM stackoverflow
      WHERE tag='amazon-ebs')
-- Count values in each bin
SELECT lower, upper, count(unanswered_count)
  -- left join keeps all bins
  FROM bins
       LEFT JOIN ebs
              ON unanswered_count >= lower
             AND unanswered_count < upper
 -- Group by bin bounds to create the groups
 GROUP BY lower, upper
 ORDER BY lower;
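
The same binning pattern can be tried without access to the stackoverflow table. The sketch below assumes PostgreSQL and uses a handful of invented values in `sample_data`; it also derives the upper bound from the lower one so the two series cannot drift apart.

```sql
-- sample_data is hypothetical; bins are [lower, upper) with width 5.
WITH bins AS (
    SELECT lower, lower + 5 AS upper
      FROM generate_series(30, 60, 5) AS lower
),
sample_data(unanswered_count) AS (
    VALUES (37), (38), (43), (44), (54), (54), (56)
)
SELECT lower, upper, count(unanswered_count)  -- count(column) gives 0 for empty bins
  FROM bins
       LEFT JOIN sample_data
              ON unanswered_count >= lower
             AND unanswered_count < upper
 GROUP BY lower, upper
 ORDER BY lower;
```

Counting the joined column rather than using count(*) matters here: an empty bin still produces one row from the left join, and count(*) would report it as 1.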


Create bins: output
lower | upper | count
-------+-------+-------
30 | 35 | 0
35 | 40 | 74
40 | 45 | 155
45 | 50 | 39
50 | 55 | 445
55 | 60 | 35
60 | 65 | 0
(7 rows)

Time to explore some distributions!

More Summary Functions

Correlation function
SELECT corr(assets, equity)
FROM fortune500;

corr
-------------------
0.637710143588615
(1 row)
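
To get a feel for the scale of corr(), it can help to run it on data with a known relationship. A self-contained sketch assuming PostgreSQL, with invented values where y is exactly twice x:

```sql
-- sample_data is hypothetical: y = 2 * x.
WITH sample_data(x, y) AS (
    VALUES (1, 2), (2, 4), (3, 6), (4, 8)
)
SELECT corr(x, y)  AS perfectly_linear,   -- close to 1
       corr(x, -y) AS perfectly_inverse   -- close to -1
  FROM sample_data;
```

Values near 0 indicate little linear relationship; the 0.64 above for assets and equity is a moderately strong positive correlation.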

Median
1 1 4 4 4 5 6 7 13 19 20 20 21 21 22

median (50th percentile): 7, the middle value
0th percentile: 1, the minimum
100th percentile: 22, the maximum

Percentile functions
-- percentile between 0 and 1
SELECT percentile_disc(percentile) WITHIN GROUP (ORDER BY column_name)
FROM table;

Returns a value from column

SELECT percentile_cont(percentile) WITHIN GROUP (ORDER BY column_name)
FROM table;

Interpolates between values


Percentile examples
SELECT val
FROM nums;

val
-----
1
3
4
5
(4 rows)

SELECT percentile_disc(.5) WITHIN GROUP (ORDER BY val),
       percentile_cont(.5) WITHIN GROUP (ORDER BY val)
FROM nums;

percentile_disc | percentile_cont
-----------------+-----------------
3 | 3.5
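
The same four values can be used to compute several percentiles at once, since PostgreSQL's percentile_cont() also accepts an array of fractions. A sketch assuming PostgreSQL; the `nums` values are inlined here so the query runs on its own.

```sql
WITH nums(val) AS (
    VALUES (1), (3), (4), (5)
)
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY val) AS median_disc,  -- 3
       percentile_cont(0.5) WITHIN GROUP (ORDER BY val) AS median_cont,  -- 3.5
       percentile_cont(ARRAY[0.25, 0.5, 0.75])
           WITHIN GROUP (ORDER BY val)                  AS quartiles     -- {2.5,3.5,4.25}
  FROM nums;
```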

Common issues
Error codes
    Examples: 9, 99, -99
Missing value codes
    NA, NaN, N/A, #N/A
    0 = missing or 0?
Outlier (extreme) values
    Really high or low?
    Negative values?
Not really a number
    Examples: zip codes, survey response categories
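
One way to screen a numeric column for these issues is to count the suspicious values in a single pass. The sketch below assumes PostgreSQL 9.4 or later (for the FILTER clause); the `responses` data and the specific codes checked are invented for illustration.

```sql
-- responses is hypothetical survey data containing an error code, a zero, and a NULL.
WITH responses(val) AS (
    VALUES (3), (5), (99), (0), (-99), (NULL), (4)
)
SELECT count(*)                                      AS total_rows,            -- 7
       count(val)                                    AS non_null,              -- 6
       count(*) FILTER (WHERE val IN (9, 99, -99))   AS possible_error_codes,  -- 2
       count(*) FILTER (WHERE val = 0)               AS zeros_to_check,        -- 1
       count(*) FILTER (WHERE val < 0)               AS negatives              -- 1
  FROM responses;
```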

Let's practice!

Creating Temporary Tables

Syntax
Create Temp Table Syntax

-- Create table as
CREATE TEMP TABLE new_tablename AS
-- Query results to store in the table
SELECT column1, column2
FROM table;

Select Into Syntax

-- Select existing columns
SELECT column1, column2
-- Clause to direct results to a new temp table
INTO TEMP TABLE new_tablename
-- Existing table with existing columns
FROM table;
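
Putting the syntax together, a typical workflow might look like the following sketch. It assumes PostgreSQL; the company names, the `revenues` column, and the threshold are invented, and the inline VALUES list stands in for a real source table.

```sql
-- Make the script re-runnable within one session.
DROP TABLE IF EXISTS top_companies;

-- Store a filtered result in a temporary table (dropped automatically at session end).
CREATE TEMP TABLE top_companies AS
SELECT name, revenues
  FROM (VALUES ('Company A', 500),
               ('Company B', 300),
               ('Company C', 100)) AS companies(name, revenues)
 WHERE revenues >= 300;

-- Rows can be added to the temp table afterwards.
INSERT INTO top_companies VALUES ('Company D', 400);

SELECT *
  FROM top_companies
 ORDER BY revenues DESC;
```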

Delete (drop) table
DROP TABLE top_companies;

DROP TABLE IF EXISTS top_companies;

Time to create some tables!
