Chapter2 1
Numeric Types and Summary Functions
EXPLORATORY DATA ANALYSIS IN SQL
Christina Maimone
Data Scientist
Numeric types: integer
Name             | Storage Size | Description | Range
-----------------+--------------+-------------+----------------------------------------------
smallint or int2 | 2 bytes      | small-range | -32768 to +32767
bigint or int8   | 8 bytes      | large-range | -9223372036854775808 to +9223372036854775807
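Name        | Storage Size | Description          | Range
------------+--------------+----------------------+--------------------------
smallserial | 2 bytes      | small auto-increment | 1 to 32767
bigserial   | 8 bytes      | large auto-increment | 1 to 9223372036854775807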
Name               | Storage Size | Description                     | Range
-------------------+--------------+---------------------------------+--------------------------------------------------------------------------------------
decimal or numeric | variable     | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real               | 4 bytes      | variable-precision, inexact     | 6 decimal digits precision
double precision   | 8 bytes      | variable-precision, inexact     | 15 decimal digits precision
-- numeric division
SELECT 10/4.0;
2.500000000
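For contrast with the numeric division above, here is a minimal sketch of how PostgreSQL treats the same operands when both are integers. The literal values are illustrative, and the cast is just one common way to force numeric division.

```sql
-- integer division: both operands are integers, so the fractional part is dropped
SELECT 10/4;            -- returns 2

-- numeric division: at least one operand is numeric
SELECT 10/4.0;          -- returns 2.5, padded with trailing zeros

-- casting one operand to numeric has the same effect
SELECT 10::numeric/4;   -- same value as 10/4.0
```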
min
-----
0
(1 row)

max
-------------
0.071957428
(1 row)
avg
---------------------
0.00379494620059319
(1 row)
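The min, max, and avg outputs above are the kind of result the basic summary functions return. A sketch, assuming they were computed on the question_pct column of the stackoverflow table that this chapter queries elsewhere:

```sql
-- Smallest and largest value in the column
SELECT min(question_pct), max(question_pct)
  FROM stackoverflow;

-- Mean of the column; NULL values are ignored
SELECT avg(question_pct)
  FROM stackoverflow;
```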
var_pop
----------------------
0.000140268640974167
(1 row)

var_samp
----------------------
0.000140271571051059
(1 row)
SELECT variance(question_pct)
FROM stackoverflow;
variance
----------------------
0.000140271571051059
(1 row)
stddev_samp
--------------------
0.0118436299778007
(1 row)

stddev_pop
--------------------
0.0118435062787237
(1 row)
SELECT stddev(question_pct)
FROM stackoverflow;
stddev
--------------------
0.0118436299778007
(1 row)
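The outputs above show variance() matching var_samp() and stddev() matching stddev_samp(). A small self-contained sketch of all six functions side by side; the demo values and the name demo are made up for illustration:

```sql
-- Hypothetical values, used only to make the query self-contained
WITH demo(x) AS (
  VALUES (2.0), (4.0), (4.0), (6.0)
)
SELECT var_pop(x)     AS var_pop,      -- population variance: divides by n
       var_samp(x)    AS var_samp,     -- sample variance: divides by n - 1
       variance(x)    AS variance,     -- alias for var_samp
       stddev_pop(x)  AS stddev_pop,   -- square root of var_pop
       stddev_samp(x) AS stddev_samp,  -- square root of var_samp
       stddev(x)      AS stddev        -- alias for stddev_samp
  FROM demo;
```

With these four values the mean is 4 and the squared deviations sum to 8, so var_pop is 2 and var_samp is 8/3.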
42.13
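A value such as 42.13 is what rounding to two decimal places produces. A sketch, assuming an input of 42.1256 (the input is illustrative):

```sql
-- round(value, places): round to the given number of decimal places
SELECT round(42.1256, 2);   -- returns 42.13
```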
Count values
SELECT unanswered_count, count(*)
FROM stackoverflow
WHERE tag='amazon-ebs'
GROUP BY unanswered_count
ORDER BY unanswered_count;
unanswered_count | count
------------------+-------
37 | 12
38 | 40
...
43 | 10
44 | 8
45 | 17
46 | 4
47 | 1
...
54 | 131
55 | 34
56 | 1
(20 rows)
42.12
12000
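Values such as 42.12 and 12000 are what trunc() returns when digits are dropped rather than rounded. A sketch, assuming inputs of 42.1256 and 12345 (both illustrative):

```sql
-- trunc(value, places): drop digits past the given decimal place, without rounding
SELECT trunc(42.1256, 2);   -- returns 42.12

-- a negative second argument truncates to the left of the decimal point
SELECT trunc(12345, -3);    -- returns 12000
```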
trunc_ua | count
----------+-------
30 | 74
40 | 194
50 | 480
(3 rows)
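Binned counts like these can be produced by truncating unanswered_count to the tens place before grouping. A sketch that reuses the table, column, and tag filter from the Count values query; that it matches the query behind this output is an assumption:

```sql
-- Truncate to the tens place, then count the rows in each bin
SELECT trunc(unanswered_count, -1) AS trunc_ua,
       count(*)
  FROM stackoverflow
 WHERE tag='amazon-ebs'
 GROUP BY trunc_ua
 ORDER BY trunc_ua;
```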
generate_series
-----------------
1
3
5
7
9
(5 rows)

generate_series
-----------------
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
(11 rows)
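Series like the two above come from generate_series(from, to, step). A sketch, assuming these arguments; other arguments could give the same rows:

```sql
-- Odd numbers: start at 1, stop at or before 10, step by 2
SELECT generate_series(1, 10, 2);

-- Tenths from 0 through 1: a numeric step gives numeric results
SELECT generate_series(0, 1, .1);
```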
Correlation
corr
-------------------
0.637710143588615
(1 row)
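corr() takes two numeric columns and returns a value between -1 and 1. The columns behind the output above are not shown, so this self-contained sketch uses made-up pairs; the name pairs and its values are hypothetical:

```sql
-- Hypothetical paired values, used only to make the query self-contained
WITH pairs(x, y) AS (
  VALUES (1, 2), (2, 4), (3, 6), (4, 8)
)
SELECT corr(x, y)   -- y is an exact multiple of x: perfect positive correlation
  FROM pairs;
```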
[Figure: range of values, marked from the 0th percentile to the 100th percentile]
val
-----
1
3
4
5
(4 rows)
percentile_disc | percentile_cont
-----------------+-----------------
3 | 3.5
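The two results above differ because percentile_disc returns a value that is actually in the data, while percentile_cont interpolates between neighboring values. A sketch over the four values listed under val; the name nums is an assumption:

```sql
-- The four values from the val column above
WITH nums(val) AS (
  VALUES (1), (3), (4), (5)
)
SELECT percentile_disc(.5) WITHIN GROUP (ORDER BY val),  -- 3: an actual value in the data
       percentile_cont(.5) WITHIN GROUP (ORDER BY val)   -- 3.5: halfway between 3 and 4
  FROM nums;
```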
0 = missing or 0?
Negative values?
Syntax
Create Temp Table Syntax
-- Create table as
CREATE TEMP TABLE new_tablename AS
-- Query results to store in the table
SELECT column1, column2
FROM table;
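A concrete instance of the syntax above, as a sketch. It reuses the stackoverflow table and columns from earlier in the chapter; the temporary table name amazon_ebs is hypothetical:

```sql
-- Store one tag's rows in a temporary table (dropped when the session ends)
CREATE TEMP TABLE amazon_ebs AS
SELECT unanswered_count, question_pct
  FROM stackoverflow
 WHERE tag='amazon-ebs';

-- Query the temporary table like any other table
SELECT count(*)
  FROM amazon_ebs;
```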