Introduction to SQL
1. Databases
1.1. Databases
We have two main goals in this course. In Chapter One, we will get to know databases, which
store and organize data electronically. We'll discuss how databases and the data they store are
structured. This context will prepare us for our second goal: to extract data from databases using
SQL code in Chapter Two. Let's dive in!
A database stores data. Let's imagine that we are in charge of storing and organizing data for a library.
We might set up a database that holds information such as the data pictured here on patrons, books,
and checkouts. This information is housed in objects called tables, with data organized into rows and
columns. This database contains a patrons table, a books table, and a checkouts table.
A closer look at the patrons table shows that it stores various data about our library's patrons, like
library card number, name, the year the patron became a library member, and the total overdue fines
the patron owes our library.
1.3. Relational databases
A relational database defines relationships between tables of data inside the database. For example,
each of our library patrons might be associated with several checkouts. Through these
relationships, we can draw conclusions about data housed in separate tables in the same database, and
answer questions such as "Which books did James check out during 2022?" or "Which books are
checked out most often?"
1.4. Database advantages
These tables might look similar to the way data is organized in spreadsheet applications such as Excel
or Google Sheets, but databases are far more powerful than spreadsheets. Databases can store much
more data, and storage is more secure due to encryption.
Possibly the biggest advantage of a database is that many users can write queries to gather insights
from the data at the same time. When a database is queried, the data stored inside the database does
not change: rather, the database information is accessed and presented according to instructions in
the query. Which leads us to the star of this show:
1.5. SQL
• Short for Structured Query Language
• The most widely used programming language for databases
SQL! SQL, or S-Q-L, is short for Structured Query Language. It is the most widely used programming
language for creating, querying, and updating relational databases. Once we are familiar with the data
we have and which table it is stored in, we can use SQL to begin writing queries to answer questions
about our library -- more on that in Chapter Two.
2. Tables
2.1. Tables
Now that we know the basic organization of a database, let's take a closer look at the main building
block of databases: tables!
We saw in the previous lesson that databases are organized into tables, which hold related data about
a particular subject. As we've seen, tables are organized into rows and columns; in the world of
databases, rows are often referred to as records and columns as fields. A table's fields are limited to
those set when the table was created, but the number of rows is unlimited.
2.3. Good table manners
• Be lowercase
• Have no spaces – use underscores instead
• Refer to a collective group or be plural
Let's talk a little bit about table naming. Table names should be lowercase and should not include
spaces - we use underscores in place of spaces. And ideally, a table name would refer to a
collective group (like "inventory") but it's also okay for the table to have a plural name (such as
"products").
2.4. Laying the table: records
A record is a row in a table. It holds data on an individual observation. Taking a look at the patrons table,
we see that the table has four records: one for each of the patrons. The record for Jasmin indicates that
she became a member in 2022 and owes two dollars and five cents in fines.
A field is a column in a table. It holds one piece of information about all observations in the table. The
"name" field in the patrons table lists all of the names of our library patrons.
2.6. More table manners
• Be lowercase
• Have no spaces
• Be singular
• Be different from other field names
• Be different from the table name
Because field names must be typed out when querying a database with SQL, field naming is
important. Generally, field names should be lowercase and should not involve spaces. A field name
should be singular rather than plural because it refers to the information contained in that field for a
single record. This is why our table has "card_num" and "name" fields rather than "card_nums" and
"names". Similarly, two fields in a table cannot have the same name. Finally, field names should
never share a name with the table they are housed in so that it's clear in all cases whether a field or table
is being referred to.
2.7. Assigned seats
A unique identifier, sometimes called a "key," is just what it sounds like: a unique value which
identifies a record so that it can be distinguished from other records in the same table. This value is
very often a number. In the patrons table, it makes sense to use the card_num field as the unique
identifier for each patron, not the name field, because it's possible that as our little library grows, two
patrons might have the same name.
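One common way to mark a field as a unique identifier is a primary key constraint. The course has not introduced it, so treat the sketch below as an illustration only. It assumes PostgreSQL syntax and a fresh session in a scratch database. The card numbers, the member_year field name, and the sample rows are invented for the example.

```sql
-- Hypothetical patrons table; PRIMARY KEY marks card_num as the unique identifier
CREATE TEMP TABLE patrons (
    card_num    INTEGER PRIMARY KEY,
    name        VARCHAR(50),
    member_year INTEGER,          -- assumed field name
    total_fine  NUMERIC(6, 2)
);

-- Two patrons may share a name as long as their card numbers differ
INSERT INTO patrons (card_num, name, member_year, total_fine) VALUES
    (1001, 'James', 2020, 0.00),
    (1002, 'James', 2022, 1.50);

-- Reusing an existing card_num would be rejected by the database:
-- INSERT INTO patrons (card_num, name, member_year, total_fine)
-- VALUES (1001, 'Maham', 2021, 0.00);

-- Both records come back; only card_num tells the two patrons apart
SELECT card_num, name
FROM patrons;
```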
Having more tables, each with a clearly marked subject, is generally better than having fewer tables
where information about multiple subjects is combined. Take a look at the patrons and checkouts
tables. Now, here's what our patrons and checkouts tables would look like if we tried to combine them.
It's the same data, but much less clear because it now contains duplicate information. While we can
see that Izzy has two checkouts and Maham has none, the card_num column is no longer unique
because of Izzy's multiple checkouts. We can always use SQL to gather information from multiple
related tables and connect them if a question requires it, but table topics should remain separate.
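As a preview of connecting separate tables, here is a minimal sketch using a join, which goes beyond what this course covers. It assumes PostgreSQL syntax and a fresh session in a scratch database. The table layouts, card numbers, and book ids are invented for illustration.

```sql
-- Each table keeps to one subject: patrons describes people, checkouts describes loans
CREATE TEMP TABLE patrons (
    card_num INTEGER PRIMARY KEY,
    name     VARCHAR(50)
);

CREATE TEMP TABLE checkouts (
    id       INTEGER PRIMARY KEY,
    card_num INTEGER REFERENCES patrons (card_num),  -- links a checkout to a patron
    book_id  INTEGER                                 -- hypothetical book identifier
);

INSERT INTO patrons (card_num, name) VALUES
    (1001, 'Izzy'),
    (1002, 'Maham');

-- Izzy has two checkouts; Maham has none
INSERT INTO checkouts (id, card_num, book_id) VALUES
    (1, 1001, 17),
    (2, 1001, 42);

-- Connect the two tables only when a question requires it
SELECT patrons.name, checkouts.book_id
FROM patrons
JOIN checkouts
    ON checkouts.card_num = patrons.card_num;
```

In the joined result Izzy appears once per checkout and Maham does not appear at all, while card_num stays unique in the patrons table itself.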
Exercise
Our very own table
A database has been set up for this course and the books table is available here.
Instructions:
• Hit "Run Code" to see the books table.
SELECT *
FROM books;
3. Data
3.1. Data
Welcome to the final part of the databases chapter! This lesson will focus on the data inside a
database as well as its storage.
When a table is created, a data type must be indicated for each field. The data type is chosen based on
the type of data that the field will hold: a number, text, or a date, for example. We use data types for
several reasons. First, different types of data are stored differently and take up different amounts of
storage space. Second, some operations only apply to certain data types. For example, it makes sense to
multiply a number by another number, but it does not make sense to multiply text by other text.
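To make this concrete, the sketch below declares a data type for each field and then applies a numeric operation. It assumes PostgreSQL syntax and a fresh session in a scratch database. The specific types, the member_year field name, and the sample row values other than Jasmin's are assumptions rather than the course's actual schema.

```sql
-- Every field gets a data type when the table is created
CREATE TEMP TABLE patrons (
    card_num    INTEGER,        -- whole number
    name        VARCHAR(50),    -- text (a string)
    member_year INTEGER,        -- assumed field name
    total_fine  NUMERIC(6, 2)   -- number with decimal places
);

INSERT INTO patrons (card_num, name, member_year, total_fine) VALUES
    (1002, 'Jasmin', 2022, 2.05),
    (1004, 'Izzy',   2020, 5.10);

-- Multiplying a number by a number is meaningful
SELECT name, total_fine * 2 AS doubled_fine
FROM patrons;

-- Multiplying text by text is not; PostgreSQL is expected to reject this:
-- SELECT name * name FROM patrons;
```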
3.2.1. Strings
3.2.3. Floats
Now that we're familiar with data types, we can look at a database schema. Schemas are often
referred to as "blueprints" of databases. A schema shows a database's design, such as what tables are
included in the database and any relationships between its tables. A schema also lets the reader know
what data type each field can hold. The schema for our library database shows the VARCHAR data
type is used for strings like book title, author, and genre. We can also see that the patrons table is related
to the checkouts table, but not the books table.
3.4. Database storage
Finally, let's discuss storage. The information we find in a database table is physically stored on the
hard disk of a server. Servers are centralized computers that perform services via requests made
over a network. In our case, the service performed is data access, but servers are also used to
access websites or files stored on the server. Any computer can be a server if it is set up to provide
a service, even a laptop! In practice, however, servers are generally large, powerful machines, because
such machines are best equipped to handle a high volume of requests and data.
4. Introducing queries
Welcome back. Now that we understand how data is organized in databases, we can begin drawing
insights using SQL queries!
4.2. What is SQL useful for?
Recall from the last chapter that SQL is used to answer questions both within and across relational
database tables. In the library database, we might use SQL to find which books James checked out
from the library in 2022. In an HR database, we could query salaries for employees in Marketing
and Accounting to determine whether pay across departments is comparable.
Let's write our first SQL code! To do that, we will need to learn a few keywords. Keywords are
reserved words used to indicate what operation we'd like our code to perform. The two most common
keywords are SELECT and FROM. Perhaps we'd like a list of every patron our library has. The
SELECT keyword indicates which fields should be selected - in this case, the name field. The
FROM keyword indicates the table in which these fields are located - in this case, the patrons table.
Let's put these parts together. Here's how the query should be written. The SELECT statement
appears first, followed on the next line by the FROM statement. It's best practice to end the
query with a semicolon to indicate that the query is complete. We also capitalize keywords while
keeping table and field names all lowercase. Now let's take a look at the results of our query, often
called a result set. The result set lists all patron names, just as we had hoped. Note that we have not
changed our database by writing this query. The tables, including the patrons table, are exactly the
same as before we wrote the query. In order to share our results, we can save the SQL code we have
written so that our collaborators can use it to query the database themselves. We'll cover saving
queries in a later lesson.
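The query described above can be sketched as follows. The setup lines only create throwaway sample data so the example runs on its own. They assume PostgreSQL syntax and a fresh session in a scratch database. Card numbers, the member_year field name, and all values except Jasmin's year and fine are invented.

```sql
-- Throwaway sample data standing in for the library's patrons table
CREATE TEMP TABLE patrons (
    card_num    INTEGER,
    name        VARCHAR(50),
    member_year INTEGER,        -- assumed field name
    total_fine  NUMERIC(6, 2)
);

INSERT INTO patrons (card_num, name, member_year, total_fine) VALUES
    (1001, 'Maham',  2019, 0.00),
    (1002, 'Jasmin', 2022, 2.05),
    (1003, 'James',  2021, 0.00),
    (1004, 'Izzy',   2020, 5.10);

-- The query itself: keywords capitalized, names lowercase, semicolon at the end
SELECT name
FROM patrons;
```

The result set has a single name column with one row per patron, and the patrons table is unchanged afterwards.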
4.6. Selecting multiple fields
To select multiple fields, we can list multiple field names after the SELECT keyword, separated by
commas. For example, to select card number and name, we'd list both field names in the order we'd
like them to appear in our result set. Notice that this does not have to match the order the fields are
presented in the table: listing name before card_num means that name appears first in the results.
As you might expect, we can select three fields such as name, card_num, and total_fine by listing
all three field names after the SELECT keyword and separating them with commas.
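A short sketch of selecting several fields, assuming PostgreSQL syntax and a fresh session in a scratch database. The sample rows and the member_year field name are invented, apart from Jasmin's year and fine.

```sql
-- Throwaway sample data standing in for the library's patrons table
CREATE TEMP TABLE patrons (
    card_num    INTEGER,
    name        VARCHAR(50),
    member_year INTEGER,        -- assumed field name
    total_fine  NUMERIC(6, 2)
);

INSERT INTO patrons (card_num, name, member_year, total_fine) VALUES
    (1002, 'Jasmin', 2022, 2.05),
    (1004, 'Izzy',   2020, 5.10);

-- name is listed first, so it appears first in the result set,
-- even though card_num comes first in the table
SELECT name, card_num
FROM patrons;

-- Three fields, separated by commas
SELECT name, card_num, total_fine
FROM patrons;
```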
4.7. Selecting all fields
What if we'd like to select all four fields in the patrons table? We could list out the four field names
after the SELECT statement, but there's an even easier way: we can tell SQL to select all fields using an
asterisk in place of the four field names.
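The two approaches can be compared side by side in the sketch below, which assumes PostgreSQL syntax and a fresh session in a scratch database. The sample rows and the member_year field name are invented.

```sql
-- Throwaway sample data standing in for the library's patrons table
CREATE TEMP TABLE patrons (
    card_num    INTEGER,
    name        VARCHAR(50),
    member_year INTEGER,        -- assumed field name
    total_fine  NUMERIC(6, 2)
);

INSERT INTO patrons (card_num, name, member_year, total_fine) VALUES
    (1002, 'Jasmin', 2022, 2.05),
    (1004, 'Izzy',   2020, 5.10);

-- Listing all four fields by name...
SELECT card_num, name, member_year, total_fine
FROM patrons;

-- ...returns the same columns as the asterisk shorthand
SELECT *
FROM patrons;
```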
Exercise
Querying the books table
You're ready to practice writing your first SQL queries using the SELECT and FROM keywords.
Recall from the video that SELECT is used to choose the fields that will be included in the result
set, while FROM is used to pick the table in which the fields are listed.
Instructions 1/3:
• Use SQL to return a result set of all book titles included in the books table.
Instructions 2/3:
• Select both the title and author fields from books.
It's time to level up on our SQL queries by learning a few more commonly used keywords. Let's
dive in.
5.1. Aliasing
Sometimes it can be helpful to rename columns in our result set, whether for clarity or brevity. We
can do this using aliasing. Perhaps we'd like to select the name and hire year for each record in the
employees table. We could alias the name column as first_name in the query by adding the AS
keyword to indicate an alias of first_name after selecting the name field. The result set now has
first_name rather than name as the column header. The alias only applies to the result of this
particular query; in other words, the field name in the employees table itself is still name rather than
first_name.
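Aliasing as described above might look like the sketch below. It assumes PostgreSQL syntax and a fresh session in a scratch database. The employees rows, including all names, are invented for illustration.

```sql
-- Throwaway sample data standing in for the employees table
CREATE TEMP TABLE employees (
    id         INTEGER,
    name       VARCHAR(50),
    dept_id    INTEGER,
    year_hired INTEGER
);

INSERT INTO employees (id, name, dept_id, year_hired) VALUES
    (1, 'Ada', 1, 2018),
    (2, 'Ben', 2, 2019);

-- The column header in the result set is first_name...
SELECT name AS first_name, year_hired
FROM employees;

-- ...but the field in the table itself is still called name
SELECT name
FROM employees;
```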
5.2. Selecting distinct records
Some SQL questions require a way to return a list of unique values. Let's imagine that we are
interested in getting a list of years in which we hired our current employees. If we select the
year_hired field from the employees table, the result set shows several years listed twice, which
isn't what we are looking for. To get a list of years with no repeat values, we can add the DISTINCT
keyword before the year_hired field name in the SELECT statement. Now, we can see that all of our
employees were hired in just four different years.
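The effect of DISTINCT on a single field can be seen in the sketch below, assuming PostgreSQL syntax and a fresh session in a scratch database. The six employees are invented, chosen so that the hire years span four different values.

```sql
-- Throwaway sample data standing in for the employees table
CREATE TEMP TABLE employees (
    id         INTEGER,
    name       VARCHAR(50),
    dept_id    INTEGER,
    year_hired INTEGER
);

INSERT INTO employees (id, name, dept_id, year_hired) VALUES
    (1, 'Ada',  1, 2018),
    (2, 'Ben',  2, 2019),
    (3, 'Cara', 3, 2021),
    (4, 'Dev',  3, 2021),
    (5, 'Eli',  1, 2020),
    (6, 'Fay',  2, 2019);

-- Without DISTINCT: one row per employee, so 2019 and 2021 repeat
SELECT year_hired
FROM employees;

-- With DISTINCT: each year is listed only once
SELECT DISTINCT year_hired
FROM employees;
```

With this sample data the first query returns six rows and the second returns four.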
It's possible to return the unique combinations of multiple field values by listing multiple fields after the
DISTINCT keyword. Take a look at the employees table. Perhaps we'd like to know the years that
different departments hired employees. We could use this SQL query to look at this information,
selecting the dept_id and year_hired from the employees table. Looking at the results, we see that
department three hired two employees in 2021.
To avoid repeating this information, we could add the DISTINCT keyword before the fields to select.
Notice that the department id and year_hired fields still have repeat values individually, but none of the
records are the same: they are all unique combinations of the two fields.
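A sketch of DISTINCT across two fields, assuming PostgreSQL syntax and a fresh session in a scratch database. The employees rows are invented, with department three hiring two employees in 2021 to mirror the example in the text.

```sql
-- Throwaway sample data standing in for the employees table
CREATE TEMP TABLE employees (
    id         INTEGER,
    name       VARCHAR(50),
    dept_id    INTEGER,
    year_hired INTEGER
);

INSERT INTO employees (id, name, dept_id, year_hired) VALUES
    (1, 'Ada',  1, 2018),
    (2, 'Ben',  2, 2019),
    (3, 'Cara', 3, 2021),
    (4, 'Dev',  3, 2021),
    (5, 'Eli',  1, 2020),
    (6, 'Fay',  2, 2019);

-- Without DISTINCT: the pair (3, 2021) appears twice
SELECT dept_id, year_hired
FROM employees;

-- With DISTINCT: each combination of dept_id and year_hired appears once
SELECT DISTINCT dept_id, year_hired
FROM employees;
```

Individual values such as dept_id 1 still repeat in the second result, but no two rows hold the same combination.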
5.4. Views
• A view is a virtual table that is the result of a saved SQL SELECT statement
• When accessed, views automatically update in response to updates in the underlying data
Finally, let's discuss saving SQL result sets. In SQL, a view refers to a table that is the result of a saved
SQL SELECT statement. Views are considered virtual tables, which means that the data a view
contains is not generally stored in the database. Rather, it is the query code that is stored for future use. A
benefit of this is that whenever the view is accessed, it automatically updates the query results to
account for any updates to the underlying database. To create a view, we'll add a line of code before the
SELECT statement: CREATE VIEW, then the name we'd like for the new view, then the AS keyword
to assign the results of the query to the new view name. Here, we create a view called
employee_hire_years by assigning the results of a query selecting three fields from the employees table
to a new view. There is no result set when creating a view.
5.5. Using views
Once a view is created, however, we can query it just as we would a normal table by selecting FROM
the view.
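Creating and then querying a view could look like the sketch below. It assumes PostgreSQL syntax and a fresh session in a scratch database. The text does not say which three fields the view selects, so id, name, and year_hired are an assumption, and the employees rows are invented.

```sql
-- Throwaway sample data standing in for the employees table
CREATE TEMP TABLE employees (
    id         INTEGER,
    name       VARCHAR(50),
    dept_id    INTEGER,
    year_hired INTEGER
);

INSERT INTO employees (id, name, dept_id, year_hired) VALUES
    (1, 'Ada', 1, 2018),
    (2, 'Ben', 2, 2019);

-- Save the query (not its results) under a name; this returns no result set.
-- Because employees is temporary here, PostgreSQL makes the view temporary too.
CREATE VIEW employee_hire_years AS
SELECT id, name, year_hired
FROM employees;

-- Query the view just like a normal table
SELECT id, name
FROM employee_hire_years;

-- A change to the underlying table...
INSERT INTO employees (id, name, dept_id, year_hired) VALUES
    (3, 'Cara', 3, 2021);

-- ...shows up the next time the view is accessed
SELECT *
FROM employee_hire_years;
```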
Exercise
Making queries DISTINCT
You've learned that the DISTINCT keyword can be used to return unique values in a field. In
this exercise, you'll use this understanding to find out more about the books table!
There are 350 books in the books table, representing all of the books that our local library has
available for checkout. But how many different authors are represented in these 350 books? The
answer is almost certainly fewer than 350. For example, J.K. Rowling wrote all seven Harry Potter books, so
if our library has all of the Harry Potter books, seven books will be written by J.K. Rowling. There are
likely many more repeat authors!
Instructions 1/2:
• Write SQL code that returns a result set with just one column listing the unique authors in
the books table.
Exercise
Aliasing
While the default column names in a SQL result set come from the fields they are created from,
you've learned that aliasing can be used to rename these result set columns. This can be helpful
for clarifying the intent or contents of the column.
Your task in this exercise is to incorporate an alias into one of the SQL queries that you worked
with in the previous exercise!
Instructions:
• Add an alias to the SQL query to rename the author column to unique_author in the result
set.
What if you'd like to be able to refer to a result set later, or allow others to access and use the results? The
best way to do this is by creating a view. Recall that a view is a virtual table: it's very similar to a
real table, but rather than the data itself being stored, the query code is stored for later use.
Instructions 1/2:
• Add a single line of code that saves the results of the written query as a view called
library_authors.
-- Save the results of this query as a view called library_authors
CREATE VIEW library_authors AS
SELECT DISTINCT author AS unique_author
FROM books;
Instructions 2/2:
• Check that the view was created by selecting all columns from library_authors.
-- Select all columns from library_authors
SELECT *
FROM library_authors;
6. SQL flavors
SQL has a few different versions, or flavors. Some are free, while others have customer support and
are made to complement major databases such as Microsoft's SQL Server or Oracle Database,
which are used by many companies. All SQL flavors are used with table-based relational
databases like the ones we've seen, and the vast majority of keywords are shared between them! In
fact, all SQL flavors are expected to follow universal standards set by the International Organization for
Standardization and the American National Standards Institute. Only additional features on top of these
standards result in different SQL flavors.
Let's take a look at two of the most popular SQL flavors. PostgreSQL is a free and open-source
relational database system which was originally created at the University of California, Berkeley, and
was sponsored by America's famous Defense Advanced Research Projects Agency, or DARPA.
DARPA also sponsored research that led to the creation of the internet, the computer mouse, and GPS! The
name "PostgreSQL" is used to refer to both the database system itself as well as the SQL flavor used
with it. SQL Server is also a relational database system which comes in both free and enterprise
versions. It was created by Microsoft, so it pairs well with other Microsoft products. T-SQL is
Microsoft's proprietary flavor of SQL, used with SQL Server databases.
Think of SQL flavors as dialects of the same language. If Claudia speaks American English, she will
have no trouble understanding people on a trip to London, even though most people in London speak
British English and there are some small differences. Here's an example of a small difference between
SQL Server and PostgreSQL: when we want to limit the number of records returned, we use the
LIMIT keyword in PostgreSQL. Here, we limit the number of employee names and ids selected to
only the first two records. The exact same results are achieved in SQL Server using the TOP keyword
instead of LIMIT. Notice that this keyword, and where it sits in the query, is the only difference between the two queries! Limiting
results is useful when testing code, since many result sets can have thousands of results! It's best to
write and test code using just a few results before removing the LIMIT for the final query.
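The two flavors can be compared in the sketch below. The runnable part assumes PostgreSQL and a fresh session in a scratch database. The SQL Server version is shown only as a comment, since TOP is not PostgreSQL syntax. The employees rows are invented.

```sql
-- Throwaway sample data standing in for the employees table
CREATE TEMP TABLE employees (
    id         INTEGER,
    name       VARCHAR(50),
    dept_id    INTEGER,
    year_hired INTEGER
);

INSERT INTO employees (id, name, dept_id, year_hired) VALUES
    (1, 'Ada',  1, 2018),
    (2, 'Ben',  2, 2019),
    (3, 'Cara', 3, 2021);

-- PostgreSQL: LIMIT goes at the end of the query
SELECT id, name
FROM employees
LIMIT 2;

-- SQL Server (T-SQL) equivalent: TOP goes right after SELECT
-- SELECT TOP (2) id, name
-- FROM employees;
```

Either form returns at most two records. Without an explicit ordering, which two records come back is up to the database.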
New SQL learners may wonder which flavor they should learn. This may be an easy decision if a
learner knows that her employer uses Microsoft's SQL Server, for example. Or it might be a hard one
for a job seeker or student who doesn't know what database management system a future employer
might use. Don't worry too much about what flavor to learn. As we've seen, the differences are small. A
PostgreSQL wizard can become a SQL Server wizard by learning a handful of different keywords!
Exercise
Limiting results
Let's take a look at a few of the genres represented in our library's books.
Recall that limiting results is useful when testing code since result sets can have thousands of
results! Queries are often written with a LIMIT of just a few records to test out code before
selecting thousands of results from the database.
Instructions:
• Using PostgreSQL, select the genre field from the books table; limit the number of results
to 10.
-- Select the first 10 genres from books using PostgreSQL
SELECT genre
FROM books
LIMIT 10;