Module 1 - SQL For Analytics Introduction
Learn SQL by application! Realistic end-to-end case studies,
examples, and challenges teach you SQL the way it is
meant to be used.
Preface
SQL was initially created to be the language for generating,
manipulating, and retrieving data from relational databases, which
have been around for more than 40 years. Over the past decade or
so, however, other data platforms such as Hadoop, Spark, and
NoSQL have gained a great deal of traction, eating away at the
relational database market. As will be discussed in the last few
chapters of this book, however, the SQL language has been evolving
to facilitate the retrieval of data from various platforms, regardless
of whether the data is stored in tables, documents, or flat files.
A Little Background
• Introduction to Database
• Relational Database, Primary Key & Foreign Key
• SQL as Part of the Data Analysis Workflow
• Database Data Types
Contents
Introduction to Database
  1.1. Data Infrastructure
  1.2. Relational Database Systems
  1.3. SQL Constraints
    PRIMARY KEY Constraint
    FOREIGN KEY Constraint
    Referencing Columns in Another Table
  1.4. Database Structure
  1.5. Four Sublanguages of SQL
SQL for Analytics
  2.1. What Is Data Analysis?
  2.2. SQL as Part of the Data Analysis Workflow
Database Data Types
  3.1. Types of Data
    1. Structured Versus Unstructured
    2. Quantitative Versus Qualitative Data
    3. Sparse Data
  3.2. Database Data Types
Introduction to Database | Module 1
SECTION 1
Introduction to Database
A database is nothing more than a set of related information. A
telephone book, for example, is a database of the names, phone
numbers, and addresses of all people living in a particular
region. While a telephone book is certainly a universal and
frequently used database, it suffers from the following:
A database contains one or more tables.
A table contains a number of records.
A record contains one or more fields.
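The database, table, record, and field hierarchy can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the course's own setup: it uses an in-memory SQLite database rather than Postgres, and the table name phone_book, its fields, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database; table and field names are illustrative only.
conn = sqlite3.connect(":memory:")

# One table with three fields (columns).
conn.execute("CREATE TABLE phone_book (name TEXT, phone TEXT, address TEXT)")

# Each inserted row is one record.
conn.executemany(
    "INSERT INTO phone_book (name, phone, address) VALUES (?, ?, ?)",
    [
        ("Ada Park", "555-0100", "1 Elm St"),
        ("Ben Ruiz", "555-0101", "2 Oak Ave"),
    ],
)

# Tables contained in the database.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Number of records in the table.
record_count = conn.execute("SELECT COUNT(*) FROM phone_book").fetchone()[0]

# Field names, read from the cursor metadata of a SELECT.
fields = [col[0] for col in conn.execute("SELECT * FROM phone_book").description]

print(tables, record_count, fields)
```

Here tables lists what the database contains, record_count counts the rows in one table, and fields names the columns that every record shares.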
Constraint      Description
NOT NULL        Values cannot be null.
UNIQUE          Values cannot duplicate any existing value in the column.
PRIMARY KEY     Uniquely identifies each row.
FOREIGN KEY     References a row in another table.
CHECK           Validates a condition for each new value.
DEFAULT         Sets a default value if none is supplied.
CREATE INDEX    Speeds up reads (strictly an index, not a constraint).
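The constraints listed above can be tried out in a small sketch. It assumes an in-memory SQLite database accessed through Python's sqlite3 module rather than Postgres, and the customer and orders tables, their columns, and the helper is_rejected are hypothetical names chosen for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, so that line is specific to this setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are enforced only when this is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,
        country     TEXT DEFAULT 'US'
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        amount      REAL CHECK (amount > 0)
    )
""")
# An index to speed up lookups of orders by customer.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Valid rows: country is omitted, so DEFAULT fills it in.
conn.execute("INSERT INTO customer (customer_id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (order_id, customer_id, amount) VALUES (10, 1, 25.0)")
default_country = conn.execute(
    "SELECT country FROM customer WHERE customer_id = 1"
).fetchone()[0]


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# One violating statement per constraint type.
rejected = {
    "NOT NULL": is_rejected(
        "INSERT INTO customer (customer_id, email) VALUES (2, NULL)"),
    "UNIQUE": is_rejected(
        "INSERT INTO customer (customer_id, email) VALUES (3, 'a@example.com')"),
    "PRIMARY KEY": is_rejected(
        "INSERT INTO customer (customer_id, email) VALUES (1, 'b@example.com')"),
    "FOREIGN KEY": is_rejected(
        "INSERT INTO orders (order_id, customer_id, amount) VALUES (11, 99, 5.0)"),
    "CHECK": is_rejected(
        "INSERT INTO orders (order_id, customer_id, amount) VALUES (12, 1, -5.0)"),
}

print(default_country, rejected)
```

Each entry in rejected records whether the database refused the offending row, which is the practical effect of declaring the constraint.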
SQL for Analytics | Module 1
SECTION 2
Module 1 | Database Data Types
SECTION 3
3. Sparse Data
section will cover the basics. These are based on Postgres but
are similar across most major database types.
String data types are the most versatile. They can hold letters,
numbers, and special characters, including unprintable
characters such as tabs and newlines. String fields can be defined
to hold a fixed or a variable number of characters. A CHAR field
could be defined to allow exactly two characters to hold, for
example, US state abbreviations, whereas a field storing the full
names of states would need to be a VARCHAR to allow a variable
number of characters.
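The fixed-length versus variable-length distinction can be sketched as follows. One loud caveat: this example runs on SQLite through Python's sqlite3 module, and SQLite accepts CHAR(2) and VARCHAR(50) declarations without enforcing the lengths, unlike Postgres. The CHECK clause is therefore an assumption added here to imitate a two-character field, and the us_state table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite does not enforce the declared length of CHAR(2), so a CHECK
# constraint stands in for the fixed width here.
conn.execute("""
    CREATE TABLE us_state (
        abbreviation CHAR(2) CHECK (length(abbreviation) = 2),
        full_name    VARCHAR(50)
    )
""")
conn.executemany(
    "INSERT INTO us_state (abbreviation, full_name) VALUES (?, ?)",
    [("NY", "New York"), ("CA", "California")],
)

# Abbreviations are always two characters; full names vary in length.
rows = conn.execute(
    "SELECT abbreviation, length(full_name) FROM us_state ORDER BY abbreviation"
).fetchall()

# A three-character abbreviation violates the fixed-width rule.
try:
    conn.execute("INSERT INTO us_state VALUES ('NYC', 'New York City')")
    too_long_rejected = False
except sqlite3.IntegrityError:
    too_long_rejected = True

print(rows, too_long_rejected)
```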
Numeric data types store numbers, both positive and negative.
Mathematical functions and operators can be applied to numeric
fields. Numeric data types include the INT types as well as the
FLOAT, DOUBLE (DOUBLE PRECISION in Postgres), and DECIMAL types,
which allow decimal places. Integer data types are often chosen
because they use less storage than their decimal counterparts.
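A short sketch of operators and functions applied to numeric fields, again assuming SQLite via Python's sqlite3 module rather than Postgres. The line_item table and its columns are hypothetical. Note that dividing one integer by another yields an integer, while involving a decimal value keeps the fractional part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_item (quantity INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO line_item (quantity, unit_price) VALUES (?, ?)",
    [(3, 2.5), (-1, 4.0)],  # a negative quantity could represent a return
)

# Operators and aggregate functions work directly on numeric fields.
total = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM line_item"
).fetchone()[0]

# Integer division truncates; division involving a decimal does not.
int_div, dec_div = conn.execute("SELECT 7 / 2, 7 / 2.0").fetchone()

print(total, int_div, dec_div)
```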