SQL 101: A Beginner's Guide To Database From 1 To N Dev
INTRODUCTION
SELECT
WHERE – filtering data
Comparison operators
IS NULL
IS NOT NULL
AND operator
OR operator
ORDER BY – ordering data
JOIN – joining tables
INNER JOIN
LEFT JOIN
INSERT
UPDATE
DELETE
CREATE TABLE
CONSTRAINTS
PK - Primary Key
FK - Foreign Key
UNIQUE
NOT NULL
CHECK
CONCLUSION
APPENDIX
About the author
INTRODUCTION
It is hard today to think of software, a website or a mobile app that does not
save some sort of data, whether about its registered users, access statistics,
news, and so on.
A large part of these systems use the relational database model to store
their data on disk. It is fair to say that when we store something, we want to
access it later: we want to read, process, sort and filter the data to help us in
our daily tasks.
Knowing the syntax of SQL, the language used to work with relational
databases, even in its basic form, is almost mandatory in any developer's
curriculum, because at some point he or she will need to deal with a database
of some sort.
That is why in this book you will learn the basics of SQL to start dealing
with this new world. With many examples applied to real problems, you will
notice how easy it is to learn this new language, which can and will open many
doors in your career.
SELECT knowledge
FROM book
SELECT
What is it?
SELECT is the command used to select data from one or more tables in a
database. It is likely the command you will use most when working with a
database. If we have thousands of records stored on disk, they are useless
unless we are able to retrieve them. Beyond retrieving the data, we want to
manipulate it: filtering, sorting, grouping, relating tables and many more
operations are possible thanks to the SELECT command.
Syntax
The syntax of the SELECT command is pretty simple at first sight… and at
second sight too! All we have to think about is which data we want, and in
which tables that data is stored. That is all we need to start. The most basic
syntax of a SELECT is:
SELECT <columns>
FROM <table>
See how easy it is? The columns are the data we want to return. The table
is where that data is stored. Now, let us imagine that we work for a company
in our town. This company has employees, so it is plausible to imagine a
table called “Employees” in our system, holding information such as
registration id, name, date of birth, sex and department.
registration_id name date_birth sex department
101 Jessica Wilson 03/01/1992 Female HR
If we want to return all data from the table “Employees”, we can list every
column by name or simply use the following command:
SELECT *
FROM Employees
What does the ‘*’ mean? It asks the database for all the columns of the
table. But be careful: even though it seems easier, it also means more data
returned by the database, more data traveling over your network and reaching
your application. If most of this data is unnecessary, your system loses
performance!
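When only some columns are needed, naming them explicitly keeps the result small. The sketch below is meant for an empty scratch database. The Employees table in it is a hypothetical, cut-down copy with rows taken from the appendix, and the script should run unchanged on SQL Server and SQLite:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    sex nvarchar(10),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, sex, department) VALUES
    (1, 'James Smith', 'Male', 'Technology'),
    (2, 'Robert Brown', 'Male', 'HR'),
    (50, 'Patricia Miller', 'Female', 'Administration'),
    (101, 'Jessica Wilson', 'Female', 'HR');

-- All four rows are returned, but only the two columns we asked for
SELECT name, department
FROM Employees;
```

The table still sends every row. Only the columns that are not needed stay in the database.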
We will not always want to read every row of a table; in fact, almost
never!
That is why the SELECT command has many options for selecting data. We
will see one of the most important ones next.
WHERE – filtering data
What is it?
WHERE is an optional clause of the SELECT command, which means the
command runs successfully without it. This clause filters the rows that will
be returned from the table. The filter is applied to every row: if the row
passes the filter, it is returned; if not, it is left out.
Syntax
SELECT <columns>
FROM <table>
WHERE <condition>
Important: the WHERE clause always comes after the FROM clause.
Let us say that our SELECT command needs to return only the employees
working in the HR department. We could write the following command:
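This copy of the book does not include the command itself. Here is a minimal sketch of it. It uses a hypothetical cut-down Employees table (rows from the appendix), so the script runs on its own in an empty SQL Server or SQLite database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, department) VALUES
    (1, 'James Smith', 'Technology'),
    (2, 'Robert Brown', 'HR'),
    (50, 'Patricia Miller', 'Administration'),
    (101, 'Jessica Wilson', 'HR');

-- Only rows whose department equals 'HR' pass the filter:
-- Robert Brown (2) and Jessica Wilson (101)
SELECT registration_id, name, department
FROM Employees
WHERE department = 'HR';
```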
Comparison operators
These operators compare two values, and when the comparison is true, the
row is returned. They are: = (equal), < (less than), > (greater than), <= (less
than or equal), >= (greater than or equal), <> (different), IN (in a list of
values) and NOT IN (not in a list of values).
Examples
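The sketch below shows two of these operators at work. It assumes a hypothetical cut-down Employees table with rows from the appendix, created in an empty scratch database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, department) VALUES
    (1, 'James Smith', 'Technology'),
    (2, 'Robert Brown', 'HR'),
    (50, 'Patricia Miller', 'Administration'),
    (101, 'Jessica Wilson', 'HR');

-- >= : registration id 50 or higher -> Patricia Miller, Jessica Wilson
SELECT name FROM Employees WHERE registration_id >= 50;

-- <> : every department except HR -> James Smith, Patricia Miller
SELECT name FROM Employees WHERE department <> 'HR';
```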
Now, suppose we want to return all employees from more than one
department. We can apply the IN operator, which compares the column value
of the row with a list of values: if at least one value matches, the condition
is true.
Examples
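Here is a sketch of IN with two departments. It assumes a hypothetical cut-down Employees table with rows from the appendix, in an empty scratch database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, department) VALUES
    (1, 'James Smith', 'Technology'),
    (2, 'Robert Brown', 'HR'),
    (50, 'Patricia Miller', 'Administration'),
    (101, 'Jessica Wilson', 'HR');

-- IN: the row is returned if department matches any value in the list
-- -> James Smith, Robert Brown, Jessica Wilson
SELECT name, department
FROM Employees
WHERE department IN ('HR', 'Technology');
```

This is equivalent to writing `department = 'HR' OR department = 'Technology'`, but it is shorter when the list grows.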
The IN operator also has its “negative” version, NOT IN, used when we want
to return all rows that do not match any of the specified values.
Examples
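The sketch below uses NOT IN on the sex column. The data is a hypothetical cut-down Employees table with rows from the appendix, in an empty scratch database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    sex nvarchar(10)
);

INSERT INTO Employees (registration_id, name, sex) VALUES
    (1, 'James Smith', 'Male'),
    (2, 'Robert Brown', 'Male'),
    (50, 'Patricia Miller', 'Female'),
    (101, 'Jessica Wilson', 'Female');

-- NOT IN: the row is returned only if sex matches none of the listed values
-- -> Patricia Miller, Jessica Wilson
SELECT name, sex
FROM Employees
WHERE sex NOT IN ('Male');
```

Beware of one subtlety: a row whose sex is NULL would not be returned by NOT IN either, for the reasons discussed in the IS NULL section.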
Instead of returning the rows where the sex is ‘Female’, we can choose to
return all rows where the sex is not ‘Male’, which achieves the same
objective.
Suppose our Employees table has one more column, holding the employee's
termination date.
We can notice that the employees James Smith and Jessica Wilson are no
longer part of the company. However, Robert Brown and Patricia Miller still
work at the company, so their termination dates are unknown. This is exactly
what NULL means to the database: an unknown value.
These are the cases where we have to pay attention when filtering data: if
we use the comparison operators to decide whether a row must be returned,
how can we compare a value with something unknown?
For example, how can we answer this question: is Robert Brown's
termination date equal to today? We do not know his termination date, so its
value is unknown. This is the same as comparing today's date with something
unknown: we cannot say that it is true. For these cases, we have two special
operators: IS NULL and IS NOT NULL.
IS NULL
This operator checks whether the value IS unknown. It is the same as asking:
is Robert Brown's termination date unknown? The answer is true.
IS NOT NULL
This operator checks whether the value is NOT unknown. It is the same as
asking: is Robert Brown's termination date not unknown? The answer is false,
because we do not know that date. Now, if we ask: is James Smith's termination
date not unknown? The answer is true, because we do know that date
(01/25/2020).
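A comparison between a value and NULL (using =, <>, < and so on) never
results in true. Its result is itself unknown, because we do not know the value
and cannot affirm anything about the comparison. The WHERE clause treats
unknown like false, so the row is not returned.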
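The sketch below contrasts IS NULL, IS NOT NULL and a plain `=` comparison with NULL. It assumes a hypothetical cut-down Employees table whose dates come from the appendix. Note that in the appendix data it is Jessica Wilson, not Patricia Miller, whose termination date is null:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    termination_date date
);

INSERT INTO Employees (registration_id, name, termination_date) VALUES
    (1, 'James Smith', '2018-12-20'),
    (2, 'Robert Brown', NULL),
    (50, 'Patricia Miller', '2020-01-25'),
    (101, 'Jessica Wilson', NULL);

-- Termination date unknown (still employed) -> Robert Brown, Jessica Wilson
SELECT name FROM Employees WHERE termination_date IS NULL;

-- Termination date known -> James Smith, Patricia Miller
SELECT name FROM Employees WHERE termination_date IS NOT NULL;

-- Returns no rows: comparing with NULL yields unknown, never true
-- (in SQL Server this holds under the default ANSI_NULLS ON setting)
SELECT name FROM Employees WHERE termination_date = NULL;
```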
AND operator
AND is used to combine two or more expressions in SQL. For a row to be
returned, all conditions must evaluate to true.
Suppose we want to return all male employees who still work at the
company; the command would be:
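Here is a sketch of that command. It runs against a hypothetical cut-down Employees table with rows and dates from the appendix, in an empty scratch database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    sex nvarchar(10),
    termination_date date
);

INSERT INTO Employees (registration_id, name, sex, termination_date) VALUES
    (1, 'James Smith', 'Male', '2018-12-20'),
    (2, 'Robert Brown', 'Male', NULL),
    (50, 'Patricia Miller', 'Female', '2020-01-25'),
    (101, 'Jessica Wilson', 'Female', NULL);

-- Both conditions must be true: male AND no termination date -> Robert Brown
SELECT name
FROM Employees
WHERE sex = 'Male'
  AND termination_date IS NULL;
```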
OR operator
OR is also used to combine two or more expressions; however, for a row to
be returned, only one of the conditions needs to be true. Say you want to
return all female employees or the employees who work in the technology
department; the command would be:
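A sketch of that command follows, assuming a hypothetical cut-down Employees table with rows from the appendix, in an empty scratch database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    sex nvarchar(10),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, sex, department) VALUES
    (1, 'James Smith', 'Male', 'Technology'),
    (2, 'Robert Brown', 'Male', 'HR'),
    (50, 'Patricia Miller', 'Female', 'Administration'),
    (101, 'Jessica Wilson', 'Female', 'HR');

-- At least one condition must be true
-- -> James Smith, Patricia Miller, Jessica Wilson (Robert Brown matches neither)
SELECT name, sex, department
FROM Employees
WHERE sex = 'Female'
   OR department = 'Technology';
```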
ORDER BY – ordering data
Syntax
SELECT <columns>
FROM <table>
ORDER BY <column> <ASC | DESC>
After the column we want to order by, we specify the direction (ASC for
ascending or DESC for descending). Each column has its own ordering
direction, and the default is ASC when none is specified. Here are some
examples.
Examples
When ordering by two or more columns, the rows are sorted following the
order in which the columns appear in the ORDER BY clause.
Return all employees ordered by sex, then by department
(descending):
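Here is a sketch of that query. It runs against a hypothetical cut-down Employees table holding the four employees discussed in the text, with rows from the appendix, in an empty scratch database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    sex nvarchar(10),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, sex, department) VALUES
    (1, 'James Smith', 'Male', 'Technology'),
    (2, 'Robert Brown', 'Male', 'HR'),
    (50, 'Patricia Miller', 'Female', 'Administration'),
    (101, 'Jessica Wilson', 'Female', 'HR');

-- sex uses the default direction (ASC); department is explicitly DESC.
-- Expected order:
--   Jessica Wilson   Female  HR
--   Patricia Miller  Female  Administration
--   James Smith      Male    Technology
--   Robert Brown     Male    HR
SELECT name, sex, department
FROM Employees
ORDER BY sex, department DESC;
```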
Notice that we have two values for ‘Female’ and two values for ‘Male’, so
when we order the rows by sex there is a tie. If no other column were
specified, we could not guarantee that running this command again would
return the rows in the same order. But because the rows are also ordered by
department, there is no tie anymore, so the result will always be presented in
this order (as long as the data does not change). First the rows are ordered by
sex, and then, within each sex, by department.
JOIN – joining tables
What is it?
JOIN is the command used to relate tables to each other, based on a
common column. In the relational model we have what are called “normal
forms”, basically a series of rules to follow in order to design the database
correctly. It is such a rich topic that it deserves a book of its own. It is because
of these rules that we do not have just one table with all the data in it, but
many tables, each containing a portion of a bigger data set.
Many times we need to return data about the same subject that is stored in
different tables. That is when we use JOIN. It links two tables, creating a
relationship between them, so that we can access data from both.
Syntax
SELECT <columns>
FROM <table_A>
INNER JOIN <table_B>
ON <common_column_table_B> = <common_column_table_A>
It is a simple syntax. On the left side we have table A and on the right side
table B. We specify the common column between them, a column capable of
identifying the information in both table A and table B. And that is it! We
can access data from both tables.
INNER JOIN
Suppose we now have another table, the departments table:
department manager
Administration Steven Smith
HR Tyler Wilson
Technology John Wilson
name manager
James Smith John Wilson
Robert Brown Tyler Wilson
Patricia Miller Steven Smith
Jessica Wilson Tyler Wilson
Tip: whenever more than one table is used, use aliases to identify them.
Aliases make it easier to write the command and to identify columns and
tables, and they can also change the name of a column header in the result.
Just put the alias (without spaces) after the name of the column or table, or
use ‘AS’ followed by the alias. In the above command, the aliases are ‘e’ for
the Employees table and ‘d’ for the Department table. The following
command has the same effect:
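This copy preserves neither the command the tip refers to nor its AS variant. Below is a sketch of both. It uses the Department table from the appendix plus a hypothetical cut-down Employees table, in an empty scratch database, and both queries should return the four name/manager pairs listed earlier in this chapter:

```sql
-- Setup: Department as in the appendix, plus a cut-down Employees table
CREATE TABLE Department (
    department nvarchar(50) PRIMARY KEY,
    manager nvarchar(50)
);

INSERT INTO Department (department, manager) VALUES
    ('Administration', 'Steven Smith'),
    ('HR', 'Tyler Wilson'),
    ('Technology', 'John Wilson');

CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, department) VALUES
    (1, 'James Smith', 'Technology'),
    (2, 'Robert Brown', 'HR'),
    (50, 'Patricia Miller', 'Administration'),
    (101, 'Jessica Wilson', 'HR');

-- Short aliases: e = Employees, d = Department
SELECT e.name, d.manager
FROM Employees e
INNER JOIN Department d
ON d.department = e.department;

-- Same rows using AS; the second header contains a space, so it is quoted
-- (double quotes work in SQL Server with the default QUOTED_IDENTIFIER ON)
SELECT e.name AS employee_name, d.manager AS "Manager Name"
FROM Employees AS e
INNER JOIN Department AS d
ON d.department = e.department;
```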
Notice that when the alias does not contain spaces, single or double quotes
are not necessary (employee_name). However, a header name with spaces
(Manager Name) must be enclosed in quotes, single or double depending on
the engine being used.
LEFT JOIN
As said before, we do not always have all the data about something. When
using INNER JOIN, only rows that have matching values in both linked tables
are returned. Let's add one more employee, Mike Halpert, whose department
we do not know yet. If we run the same INNER JOIN again, he does not appear
in the result:
name manager
James Smith John Wilson
Robert Brown Tyler Wilson
Patricia Miller Steven Smith
Jessica Wilson Tyler Wilson
Recalling the set theory we studied at school, an INNER JOIN is the
intersection of the two tables: only the rows in the overlapping area are
returned.
To return Mike Halpert we must use LEFT JOIN. With it, we ask the
database for all the rows from the left table, even when there is no match in
the right table. It is pretty simple, just change the word INNER to LEFT:
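A sketch of the LEFT JOIN follows. It assumes the Department table from the appendix and a hypothetical cut-down Employees table that includes Mike Halpert with no department, in an empty scratch database:

```sql
-- Setup: Department as in the appendix, plus a cut-down Employees table
CREATE TABLE Department (
    department nvarchar(50) PRIMARY KEY,
    manager nvarchar(50)
);

INSERT INTO Department (department, manager) VALUES
    ('Administration', 'Steven Smith'),
    ('HR', 'Tyler Wilson'),
    ('Technology', 'John Wilson');

CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    department nvarchar(50)
);

INSERT INTO Employees (registration_id, name, department) VALUES
    (1, 'James Smith', 'Technology'),
    (2, 'Robert Brown', 'HR'),
    (50, 'Patricia Miller', 'Administration'),
    (101, 'Jessica Wilson', 'HR'),
    (200, 'Mike Halpert', NULL);

-- LEFT JOIN keeps every row from Employees (the left table);
-- Mike Halpert has no matching department, so his manager comes back as NULL
SELECT e.name, d.manager
FROM Employees e
LEFT JOIN Department d
ON d.department = e.department;
```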
name manager
James Smith John Wilson
Robert Brown Tyler Wilson
Patricia Miller Steven Smith
Jessica Wilson Tyler Wilson
Mike Halpert (null)
This is the end of the chapter. Congratulations! You already know about
50% of what you need to become a beginner in SQL. This does not make the
following chapters less important; it only means that SELECT and JOIN are
the commands you will use the most.
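INSERT
Syntax
When we want to insert a new row in the database, we only need to specify
where the data will be stored (the table) and the desired values:
INSERT INTO <table> (<columns>)
VALUES (<values>)
Where: <table> is the table that receives the new row, <columns> is the list of
columns being filled, and <values> is the list of values for those columns, in
the same order.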
Attention: every database engine treats dates in its own way, so check the
documentation for the correct format to inform a date.
Notice that we did not inform the termination date, so its value is null.
When inserting data, we do not need to specify all the columns of the table:
the columns that are not informed will have a null value (unless a default
value is defined for them). We can also inform the null value explicitly.
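Both ways of leaving a value empty are sketched below. The values are taken from the appendix, and the table definition is a hypothetical reconstruction. Dates use the ISO 'YYYY-MM-DD' format, which SQL Server reads unambiguously for the date type:

```sql
-- Setup: hypothetical Employees table matching the appendix INSERTs
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    date_birth date,
    sex nvarchar(10),
    termination_date date,
    department nvarchar(50)
);

-- termination_date is left out of the column list, so it is stored as NULL
INSERT INTO Employees (registration_id, name, date_birth, sex, department)
VALUES (250, 'Wilson Lewis', '1970-12-15', 'Male', 'Technology');

-- NULL can also be informed explicitly, as the appendix does
INSERT INTO Employees (registration_id, name, date_birth, sex, termination_date, department)
VALUES (200, 'Mike Halpert', '1972-10-20', 'Male', NULL, NULL);
```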
Attention: we do not always inform the column and value for the primary
key of the table (we will see primary keys later in the book). If the column is
auto-increment, the engine itself takes care of filling in the value. Each engine
does this differently, so check the documentation.
These are the basics we need to know about INSERT. In the next chapters,
we will learn how to update and delete data from the tables, completing our
CRUD (Create, Read, Update, Delete) cycle.
UPDATE
What is it?
UPDATE is the command used when we want to update a value of a single
or multiple rows in a table.
Syntax
The syntax is simple: we just need to specify the table, the column we want
to update and its new value. But keep in mind that it is essential to update
only the rows we want, to keep the integrity of the database; for this, it is
recommended to always use the WHERE clause to identify those rows.
UPDATE <table>
SET <column> = <new_value>
WHERE <condition>
Attention: the WHERE clause is optional. If it is not specified, all rows of
the table will be updated, which is rarely the goal.
Let us suppose that the employee Robert Brown is leaving the company.
When this happens, we need to update his termination date.
Supposing it happened on 28th January 2020, the command would be:
UPDATE Employees
SET termination_date = '2020-01-28'
WHERE registration_id = 2
Notice that we identified the row to be updated with the WHERE clause,
based on the registration id. No two employees have the same id, which
probably makes this column the primary key of the table (keep calm, we will
learn about this). For now, just keep in mind that the primary key of a table
does not allow two rows with the same value. So by identifying the
registration id in the WHERE, we are sure that no other employee will be
affected by our UPDATE.
Now suppose there is a change in the system: sex will now be stored as
‘M’ or ‘F’ instead of ‘Male’ or ‘Female’. Here we can update more than one
row at a time.
SELECT registration_id
FROM Employees
WHERE sex = 'Male'
registration_id
1
2
200
250
UPDATE Employees
SET sex = 'M'
WHERE registration_id IN (1, 2, 200, 250)
SELECT registration_id
FROM Employees
WHERE sex = 'Female'
registration_id
50
101
UPDATE Employees
SET sex = 'F'
WHERE registration_id IN (50, 101)
Done!
Of course there are many ways of performing this task, but this one will do
just fine.
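One of those other ways is sketched below as an option, not as the book's method. The WHERE filters on the old value itself, so no preliminary SELECT is needed and no id can be forgotten when the list is typed by hand. The setup is a hypothetical cut-down Employees table with rows from the appendix:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    sex nvarchar(10)
);

INSERT INTO Employees (registration_id, name, sex) VALUES
    (1, 'James Smith', 'Male'),
    (2, 'Robert Brown', 'Male'),
    (50, 'Patricia Miller', 'Female'),
    (101, 'Jessica Wilson', 'Female');

-- Each UPDATE touches exactly the rows that still hold the old value
UPDATE Employees SET sex = 'M' WHERE sex = 'Male';
UPDATE Employees SET sex = 'F' WHERE sex = 'Female';
```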
That is the end of the chapter. Let us move on, because sometimes we want
to delete rows, not just update them.
DELETE
What is it?
DELETE is the command used to remove rows from the table.
Syntax
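DELETE FROM <table>
WHERE <condition>
Notice that we do not choose columns to delete, only the rows that will be
removed. In SQL Server the word FROM is optional, but it is part of the SQL
standard and some other engines require it.
DELETE FROM Employees
WHERE registration_id = 50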
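A habit worth considering before any DELETE is to run a SELECT with the same WHERE first, to see exactly which rows will go. The sketch below shows this pattern on a hypothetical cut-down Employees table with rows from the appendix, in an empty scratch database:

```sql
-- Setup: hypothetical cut-down Employees table (rows from the appendix)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50)
);

INSERT INTO Employees (registration_id, name) VALUES
    (1, 'James Smith'),
    (2, 'Robert Brown'),
    (50, 'Patricia Miller'),
    (101, 'Jessica Wilson');

-- 1. Preview: the same WHERE in a SELECT shows which rows would be removed
SELECT registration_id, name
FROM Employees
WHERE registration_id = 50;

-- 2. Remove only those rows
DELETE FROM Employees
WHERE registration_id = 50;

-- 3. Confirm: Patricia Miller is gone, the other three rows remain
SELECT registration_id, name
FROM Employees;
```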
That is the end of the chapter. The UPDATE and DELETE commands are
pretty simple.
We will now enter an important topic of the relational database model:
understanding what makes it relational.
CREATE TABLE
What is it?
CREATE TABLE is the command used when we want to create new tables
in the database. When creating a table, we need to inform the columns that
will compose it. Each column is made of a name, a data type (numeric, text,
date…) and, optionally, constraints (we will see them later in the book). For
now, let us focus on the name and data type.
Syntax
The basic form of the command is the same in all database engines;
however, each engine gives its data types different names. For example, on
the Oracle platform we have these data types for dates: date, timestamp and
timestamp with time zone. On Microsoft SQL Server we have date,
smalldatetime and datetime. The names are different, but all of them store
dates. From now on, we will follow the Microsoft pattern.
Tip: a table in SQL Server can have at most 1,024 columns. In my own
experience, if a table has more than 20 columns, its design should be
reviewed.
Now, let's create a new table for our database, the Dependents table. In this
table we will store an id, the dependent's name, date of birth, sex and the
employee they relate to. This would be our command:
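The command itself is missing from this copy, so here is a sketch of it. The column names id, sex char(1) and registration_id appear later in the chapter, and date_birth follows the appendix naming. The remaining name and all types are assumptions:

```sql
-- Constraints (PK, FK, NOT NULL, CHECK...) are intentionally left out for now
CREATE TABLE Dependents (
    id int,
    name nvarchar(50),
    date_birth date,
    sex char(1),
    registration_id int    -- the employee this dependent relates to
);
```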
Our table is not ready yet: we have not defined a primary key, created the
foreign key, marked the mandatory columns, added checks and so on. This
comes next.
CONSTRAINTS
What is it?
CONSTRAINTS are, in my opinion, the basis of the relational database
model. They aim to ensure data integrity within and between tables. There
are different types of constraints, each with a well-defined purpose, and when
used together, combined with good practices and the normal forms, they
produce a database that is consistent, simple and effective.
Each of these constraints has its own syntax, and in the following chapters
we will see them one by one. We will also improve our Dependents table: we
will apply all of these constraints to it, and by the end of the chapter a new
table will be created.
Again, each database engine has its own syntax to implement constraints,
and in this book we will use the Microsoft SQL Server pattern.
PK - Primary Key
What is it?
In my opinion, this is the main constraint of a database. Its purpose is to
uniquely identify each row in the table, whether the table has 10 or 10
million rows. Each value is unique, end of story. We can compare it to a
dictionary: the table is the dictionary itself and each word is a primary key
value; there are no repeated words in a dictionary.
Syntax
There are two ways to define a PK: right after the column name (called
inline declaration) or below all the columns in the command (called
out-of-line declaration). The advantage of declaring it out-of-line is that you
can give the constraint a friendly name, so when an error on that constraint
occurs, the investigation is easier. So, in this book we will declare all
constraints out-of-line.
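Here is a sketch of the out-of-line declaration applied to the Dependents table. The constraint name PK_Dependents is a hypothetical choice that follows the naming style of the book's later UQ_ and CK_ examples. The column types are assumptions:

```sql
CREATE TABLE Dependents (
    id int,
    name nvarchar(50),
    date_birth date,
    sex char(1),
    registration_id int,
    -- Out-of-line declaration with a friendly name.
    -- Inline alternative: id int PRIMARY KEY
    CONSTRAINT PK_Dependents PRIMARY KEY (id)
);
```

In SQL Server, a column that is part of a PRIMARY KEY declared in CREATE TABLE also becomes NOT NULL automatically.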
Very simple!
Tip: whenever possible, use numeric columns as primary keys. This makes it
easier for the database to structure the index at the disk level.
FK - Foreign Key
What is it?
In my opinion, the foreign key is the second most important constraint in a
relational model. The FK is responsible for creating the links (relationships)
between tables, and it is this key that guarantees the integrity between them.
When a FK is applied to a column, that column can only hold null or a
value that exists in the table it refers to. Taking the Employees table as an
example, in the department column I can only insert values that are present
in the Department table, or null. In our example it is not possible to insert an
employee in the Sales department, because there is no Sales department
registered in the Department table.
Syntax
The foreign key syntax is more verbose, because we need to identify the
column of the table receiving the constraint and the table/column it refers
to:
Tip: the FK's name is usually formed by combining the name of the table
itself and the name of the referenced table.
Tip: when we create a column that will be a FK, we usually give it the same
name as the column in the table it refers to. In this case, the column in the
Employees table is registration_id, so when creating the Dependents table we
also create a column called registration_id, responsible for linking both
tables.
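Below is a sketch of the FK declaration following both tips. The name FK_Dependents_Employees combines the two table names. The cut-down Employees table and the dependent names (Anna Brown, Tom Brown) are hypothetical:

```sql
-- Setup: hypothetical cut-down Employees table (the parent)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50)
);

CREATE TABLE Dependents (
    id int,
    name nvarchar(50),
    registration_id int,   -- same name as the column it refers to
    CONSTRAINT PK_Dependents PRIMARY KEY (id),
    CONSTRAINT FK_Dependents_Employees FOREIGN KEY (registration_id)
        REFERENCES Employees (registration_id)
);

INSERT INTO Employees (registration_id, name) VALUES (2, 'Robert Brown');

-- Accepted: employee 2 exists in the referenced table
INSERT INTO Dependents (id, name, registration_id) VALUES (1, 'Anna Brown', 2);

-- Would be rejected with a foreign key error: there is no employee 999
-- (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON)
-- INSERT INTO Dependents (id, name, registration_id) VALUES (2, 'Tom Brown', 999);
```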
department manager
HR Tim Smith
HR Alicia Miller
Assuming the company allows only one manager per department, who is the
manager of the HR department? We cannot answer this question: a table like
this does not enforce that business rule.
That is why a FK cannot refer to a column that accepts duplicate values; it
must refer to a primary key or a unique column.
Because this constraint is in the Dependents table, this table is the child,
and the Employees table is the parent.
This is why, when there is a foreign key, it is not possible (by default) to
delete a row from the referenced table if other tables still depend on it; in
other words, we cannot delete a parent row while it has child rows in another
table. This is prohibited by the relational model, and any command that
would break the rule, whether an INSERT, an UPDATE or a DELETE, fails
with an error.
UNIQUE
What is it?
UNIQUE is a constraint whose purpose is to prevent duplicate values in a
column. The column still accepts null, but engines treat this differently: some
of them accept more than one null, others just one, so check the
documentation.
Syntax
CREATE TABLE Dependents
(
…
associate_number int,
…
CONSTRAINT UQ_Associate_Number UNIQUE (associate_number)
)
This column is very similar to the id column, which is the primary key of the
table. The associate_number column is called a candidate key, because it can
also identify a row uniquely, even though it is not the table's primary key.
NOT NULL
What is it?
NOT NULL, as the name suggests, is the constraint that prevents null values
in a column, making that column mandatory. If the column's value is not
specified in an INSERT (and the column has no default value), the INSERT will
fail. If we try to update the value to null, the UPDATE will fail. The database
is relentless when enforcing constraints.
Syntax
The simplest of all, just put NOT NULL after the data type:
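The sketch below applies NOT NULL to the Dependents table. Which columns are mandatory is an assumption made for this example, and the dependent's name is hypothetical:

```sql
CREATE TABLE Dependents (
    id int,
    name nvarchar(50) NOT NULL,   -- mandatory
    date_birth date NOT NULL,     -- mandatory
    sex char(1),                  -- no NOT NULL: this column is optional
    registration_id int
);

-- Accepted: both mandatory columns have values; sex is simply left NULL
INSERT INTO Dependents (id, name, date_birth, registration_id)
VALUES (1, 'Anna Brown', '2015-05-10', 2);

-- Would fail: name is NOT NULL and is missing here
-- INSERT INTO Dependents (id, date_birth, registration_id) VALUES (2, '2016-03-02', 2);
```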
CHECK
What is it?
The CHECK constraint validates the values entered in a column and decides
whether they may be stored. When we use it, we give a condition, such as a
list of the values accepted by that column, and when inserting or updating
data in that column, if the value does not satisfy the condition, the command
will not succeed.
Syntax
CREATE TABLE Dependents
(
…
sex char(1),
…
CONSTRAINT CK_Sex CHECK (sex IN ('F', 'M'))
)
In this way, we can only enter ‘F’ or ‘M’ in the column sex.
This is the final CREATE TABLE with all the constraints applied to our
new table:
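The final command is missing from this copy. The sketch below combines the constraints from this chapter: UQ_Associate_Number and CK_Sex come from the text, FK_Dependents_Employees follows the naming tip, and PK_Dependents is a hypothetical name. Which columns are NOT NULL, the column types, and the sample dependents (Anna, Tom and Lia Brown) are all assumptions:

```sql
-- Setup: hypothetical cut-down Employees table (the parent)
CREATE TABLE Employees (
    registration_id int PRIMARY KEY,
    name nvarchar(50)
);

CREATE TABLE Dependents (
    id int NOT NULL,
    name nvarchar(50) NOT NULL,
    date_birth date NOT NULL,
    sex char(1),
    associate_number int,
    registration_id int NOT NULL,
    CONSTRAINT PK_Dependents PRIMARY KEY (id),
    CONSTRAINT FK_Dependents_Employees FOREIGN KEY (registration_id)
        REFERENCES Employees (registration_id),
    CONSTRAINT UQ_Associate_Number UNIQUE (associate_number),
    CONSTRAINT CK_Sex CHECK (sex IN ('F', 'M'))
);

-- A row that satisfies every constraint
INSERT INTO Employees (registration_id, name) VALUES (2, 'Robert Brown');
INSERT INTO Dependents (id, name, date_birth, sex, associate_number, registration_id)
VALUES (1, 'Anna Brown', '2015-05-10', 'F', 1001, 2);

-- Each of these would be rejected (uncomment to try):
-- INSERT INTO Dependents (id, name, date_birth, sex, registration_id)
-- VALUES (2, 'Tom Brown', '2017-02-01', 'X', 2);   -- CK_Sex: 'X' is not 'F' or 'M'
-- INSERT INTO Dependents (id, name, date_birth, sex, associate_number, registration_id)
-- VALUES (3, 'Lia Brown', '2018-07-09', 'F', 1001, 2);   -- UQ_Associate_Number: 1001 is taken
```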
CONCLUSION
There are many other things to learn, but in the end it all comes down to
writing good SELECT commands, knowing how to use the operators properly,
paying a lot of attention when working with nulls, and never executing a
DELETE without a WHERE! Unless you are really sure about it.
See you…!
APPENDIX
Following are the commands used to create the tables and insert the data
used in this book, following the Microsoft SQL Server pattern.
Tables:
CREATE TABLE Department
(
department nvarchar(50) PRIMARY KEY,
manager nvarchar(50)
)
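The appendix as preserved here has no definition for the Employees table that the data below inserts into. Here is a hypothetical sketch consistent with those INSERT statements: the column list matches them, while the types and the foreign key to Department are assumptions. Run it after the Department table above:

```sql
-- Hypothetical definition: types and the FK are inferred from the INSERTs below
CREATE TABLE Employees
(
    registration_id int PRIMARY KEY,
    name nvarchar(50),
    date_birth date,
    sex nvarchar(10),
    termination_date date,
    department nvarchar(50) REFERENCES Department (department)
);
```

Because department accepts null, the row for Mike Halpert (no department) is still accepted by the foreign key.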
Data:
INSERT INTO Department (department, manager)
VALUES('Administration', 'Steven Smith')
INSERT INTO Department (department, manager)
VALUES('HR', 'Tyler Wilson')
INSERT INTO Department (department, manager)
VALUES('Technology', 'John Wilson')
INSERT INTO Employees (registration_id, name, date_birth, sex, termination_date, department)
VALUES(1, 'James Smith', '1990-10-05', 'Male', '2018-12-20', 'Technology')
INSERT INTO Employees (registration_id, name, date_birth, sex, termination_date, department)
VALUES(2, 'Robert Brown', '1987-08-25', 'Male', null, 'HR')
INSERT INTO Employees (registration_id, name, date_birth, sex, termination_date, department)
VALUES(50, 'Patricia Miller', '1983-01-17', 'Female', '2020-01-25', 'Administration')
INSERT INTO Employees (registration_id, name, date_birth, sex, termination_date, department)
VALUES(101, 'Jessica Wilson', '1992-01-03', 'Female', null, 'HR')
INSERT INTO Employees (registration_id, name, date_birth, sex, termination_date, department)
VALUES(200, 'Mike Halpert', '1972-10-20', 'Male', null, null)
INSERT INTO Employees (registration_id, name, date_birth, sex, termination_date, department)
VALUES(250, 'Wilson Lewis', '1970-12-15', 'Male', null, 'Technology')
About the author
Lucas Felix Carvalho graduated in Systems Analysis and Development at
FATEC Franca, in the city of Franca, São Paulo state. I have always liked
databases, ever since my third semester at university. For this reason, I studied
and earned three database certifications (Oracle and Microsoft), and there are
more to come.
Name type
Pedro Figueredo Technical
Roberta Carvalho General
Samara Borges User