
Unit 3 Tutorials: Administration

INSIDE UNIT 3

Reliability

Transactions
ACID Properties
Atomicity
Consistency
Isolation
Durability
COMMIT and ROLLBACK to Manage Changes

Security

CREATE USER/ROLE to Add Users


CREATE ROLE to Create Groups
GRANT to Assign Users
GRANT to Assign Privileges
Application Security
Superusers

Enhancement

Index Overview
B-Tree Index
Hash Index
DROP INDEX to Remove Indexes

Management

Create a Backup
Restore from Backup
Backups: Command Line vs. GUI
Backup Methods

Transactions

by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the concept of transactions in a database to ensure multiple statements execute
in an all-or-nothing scenario, in two parts:
1. Introduction
2. Examples

1. Introduction
Transactions are a core feature of every database system. The purpose of a transaction is to combine multiple
SQL statements into a single unit of work that either executes all of the statements or none of them. The
changes made by the individual SQL statements within a transaction are not visible to other concurrent
transactions in the database, because they are not saved to the database until all of the statements in the
transaction have executed successfully. If a failure occurs in any of the statements within a transaction, none
of the statements affect the database at all.

2. Examples
Let us take a look at a scenario in which a transaction would be necessary. James has gone to an online
computer store and purchased a new computer for $500. In this transaction, there are two things to track. The
first is the $500 being transferred from James to the store, and the second is the computer being deducted
from the store inventory and transferred to James. The basic SQL statements would look something like the
following:

UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;

UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';

INSERT INTO customer_order(account_id, product_name, store_id, quantity, cost)
VALUES (1000, 'Computer', 5, 1, 500);

UPDATE store_account
SET balance = balance + 500
WHERE store_id = 5;
As you can see, there are multiple SQL statements that are needed to accomplish this operation. Both James
and the store would want to be assured that either all of these statements occur or none of them do. It would
not be acceptable if James's account was deducted by $500, the store's inventory had the computer removed,
and then there was a system failure. That would mean James would not get the order and the store would not
get the $500 that James had paid. We need to ensure that if anything goes wrong at any point in the entire
operation, none of the statements executed so far take effect. This is where a transaction is valuable.

To set up a transaction, we need to start with the BEGIN command, have our list of commands, and then end
with COMMIT. Similar to our prior example:

BEGIN;

UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;

UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';

INSERT INTO customer_order(account_id, product_name, store_id, quantity, cost)
VALUES (1000, 'Computer', 5, 1, 500);

UPDATE store_account
SET balance = balance + 500
WHERE store_id = 5;

COMMIT;
A transaction can contain a single SQL statement or dozens of them. Note that each SQL statement needs to
end with a semicolon so that the statements are separated from one another.

PostgreSQL and other databases treat each SQL statement that is executed as if it were an individual
transaction. If we do not include a BEGIN command, then each of the SQL statements (INSERT, UPDATE,
DELETE, etc.) has an implicit BEGIN command and a COMMIT command if the statement is successful.
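
To make this concrete, here is a small sketch using the customer_account table from the example above. Run on its own, the single UPDATE behaves as if it were wrapped in its own transaction:

UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;

-- Implicitly, the database treats the statement above as:
BEGIN;
UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;
COMMIT;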

Video Transcription
[MUSIC PLAYING] Transactions allow you to execute multiple statements while ensuring that every
single statement must be successful for the entire data set to be completed. By default, if you just run
individual statements, like for example, if you just run this particular statement on its own, that piece
would actually be implicitly having a begin statement and a commit statement at the end, and it would
just execute that particular statement.

However when you have a transaction block, within this case here, you'll start with a begin or begin
transaction, and at the end, there will be a commit. This whole entire statement will only execute if all the
different statements are successful. So for example, in this case here, what I'm going to do is I'm going
to try to update the first name to Bob, and then I'm going to update the support rep to 20. However, the
support rep references the employee table, and in the employee table there is no
employee ID that's equal to 20. So in this case here, the entire transaction should fail based on this
particular update statement.

So if I try to run this statement, you'll notice that it identifies that there's an error. We can go ahead and
take a look at the table again, and query it, and we'll see that the first name was not updated being that
all those items were rolled back. Now, if we change this to the support rep id equals to 5, and we take a
look at the data, we see that this is originally a 3, so it should be updated to 5 if it's possible. So we'll go
ahead and execute this entire transaction. And you'll notice that it was completed successfully. Now if
we tried to query the data, we should see that the changes for Bob, which is the customer ID equals 1,
has the first name Bob, and the support ID is equal to 5 now.

[MUSIC PLAYING]

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

Transactions in a database are used to ensure multiple statements execute in an all-or-nothing scenario.

Source: Authored by Vincent Tran

ACID Properties
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the ACID properties and how they affect database transactions in two parts:
1. Consistent State
2. ACID properties

1. Consistent State
As we covered in the prior tutorial, a transaction is a single unit of work that has to be fully executed or
completely aborted if there are any issues within the transaction. There are no states in between the
beginning and end that are acceptable for a database to be in. Recall our prior example, where James made a
purchase for $500 for a computer from a store. All of the SQL statements that we defined in that transaction
must be executed entirely:

BEGIN;

UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;

UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';

INSERT INTO customer_order(account_id, product_name, store_id, quantity, cost)
VALUES (1000, 'Computer', 5, 1, 500);

UPDATE store_account
SET balance = balance + 500
WHERE store_id = 5;

COMMIT;
It is not acceptable to only deduct James’s account balance or remove inventory from the store. If any of the
SQL statements in the transaction fails, the entire transaction is rolled back (i.e., not committed) to the original
state. If the transaction is successful, the changes to the database bring it from one consistent state to
another. A consistent state is a state in which all of the data integrity constraints on the database are satisfied.

2. ACID properties
To ensure that we have consistency in the database, every transaction has to begin with the database in a
known consistent state. If the database is not in a consistent state, transactions can result in a database that is
inconsistent due to violations of integrity or business rules. As such, all of the transactions that occur in the
database are controlled and executed to ensure database integrity according to the ACID properties:
atomicity, consistency, isolation, and durability.

Atomicity requires that all SQL statements of a transaction be completed. If any of the SQL statements are not
completed, the entire transaction should be aborted. For example, in our transaction above, imagine that the
first two statements executed:

UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;

UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';
Then imagine that we ran into an error with the data that stopped the transaction. The entire transaction
should be reverted to its original state. However, if all four statements in the transaction executed
successfully, the entire transaction would be committed to the database.

Consistency ensures that the database is in a consistent state. This means that a transaction takes a database
from one consistent state to another. When a transaction starts, the database must be in a consistent state
and when the transaction ends, the database must be in a consistent state. If any part of the transaction
violate one of the integrity constraints, the entire transaction is aborted.
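
As a concrete illustration, a hypothetical integrity constraint on the customer_account table (assuming it has a numeric balance column) shows the kind of rule that consistency protects:

-- Hypothetical constraint: account balances may never drop below zero.
ALTER TABLE customer_account
ADD CONSTRAINT balance_not_negative CHECK (balance >= 0);

With this constraint in place, any statement in the transaction that tried to make a balance negative would fail, and the entire transaction would be aborted.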

Isolation means that the data that is used during the first transaction cannot be used in another transaction
until the first transaction has finished executing. Looking at the example above, imagine that James and
another customer purchased a computer at similar times. If James's transaction has started, the second
customer cannot attempt to purchase the same computer until James’s transaction is completed. Otherwise,
they may have both tried to purchase the single available computer. This is especially important for multiuser
databases where you will have many users accessing and updating the database at the same time.

Durability is the last ACID property. It ensures that when the transaction changes are finished and committed,
they cannot be undone or removed even if there is a system failure.

We will get into each of these properties in further detail in the upcoming tutorials.

 SUMMARY

The ACID properties for a transaction are atomicity, consistency, isolation and durability.

Source: Authored by Vincent Tran

Atomicity
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the atomicity property in a transaction and how it affects the database in two
parts:
1. Atomicity in Transactions
2. Transaction Example

1. Atomicity in Transactions
Transactions in a database consist of multiple SQL statements that are executed together. Atomicity is
important as it ensures that each transaction is treated as a single statement. Atomicity ensures that if any of
the SQL statements in a transaction fails, the entire transaction fails and the attempted changes within the
transaction are reverted. If all of the statements in a transaction are executed successfully, then the
transaction is successful and committed.

This approach prevents the database from making updates that may only be partially completed. The
database will do one of two operations to ensure atomicity. It will either:

1. Commit – If the transaction is successful, the changes are applied and saved to the database.
2. Abort – If a transaction has any issues, the transaction is aborted, and the changes are rolled back so that
they are not reflected in the database.

This includes all insert, update and delete statements in a transaction.

2. Transaction Example
Jennifer would like to make a payment to Randall for $100 through an account transfer. This transaction is a
balance transfer between two accounts at two different branches of the same bank. Let us take a look at what
the transaction would look like:

1. Jennifer’s (10) account would be deducted by $100.


2. The banking location where Jennifer has her account would have their location’s account deducted by
$100.
3. The banking location where Randall (50) has his account would be increased by $100.
4. Randall’s account would be increased by $100.

The transaction would look something like this in PostgreSQL:

BEGIN;

UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;

UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);

UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);

UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;

COMMIT;
With the atomicity property, if there was an error at any point in the four statements, then the entire
transaction would be rolled back. For example, imagine that Randall’s account had a freeze on it that
prevented any changes. The first three statements would execute, but on the fourth UPDATE statement, an
error would be returned. Regardless of what the error was, the first three SQL statements would revert back to
what they were before the transaction started. Otherwise, Jennifer’s account would be deducted by $100, the
bank branch that holds Jennifer’s account would have their balance deducted by $100, Randall’s bank branch
would have $100 added, but Randall’s account would have its original balance. That certainly would not be
acceptable to Randall.

 SUMMARY

The atomicity property ensures that either all SQL statements are executed or none of them are
executed in a transaction.

Source: Authored by Vincent Tran

Consistency
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the consistency property in a transaction and how it affects the database in
three parts:
1. Introduction
2. Criteria for Consistency
3. Consistency Example

1. Introduction
Consistency within the ACID properties focuses on ensuring that the data in the database moves from one
valid state to another valid state. This ensures that any data that has been modified in the database is
uncorrupted and correct at the end of the transaction. With the consistency property, the database should not
be in a partially completed state.

2. Criteria for Consistency


The consistency property follows the following criteria:

1. If the transaction has completed successfully, the changes will be applied to the database.
2. If there was an error in the transaction, all of the changes should be reverted/rolled back automatically.
This means that the database should restore the pre-transaction state.
3. If there was a system failure or external issue while the transaction was executing, all of the changes that
were made in the transaction up to that point should automatically be reverted/rolled back.

3. Consistency Example
Let's look at our banking example again. Jennifer would like to make a $100 payment to Randall through an
account transfer. This transaction is a balance transfer between two accounts at two different branches of the
same bank. Let us review what the transaction would look like:

1. Jennifer’s (10) account would be deducted by $100.


2. The banking location where Jennifer has her account would have their location’s account deducted by
$100.
3. The banking location where Randall(50) has his account would be increased by $100.
4. Randall’s account would be increased by $100.

The consistency property ensures that the total value of each type of account is the same at the start and the
end of the transaction. This means that the customer_accounts and branch_accounts would have a consistent
total to account for each statement. Let us look back at the transaction in SQL:

BEGIN;

UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;

UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);

UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);

UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;

COMMIT;
Imagine that during the second UPDATE statement, the system had a failure and when it recovered, the
transaction had only partially executed. There would be an inconsistent state because the total balances
would not match up. In this situation, the system would roll back those UPDATE statements to the consistent
state prior to the transaction starting. Unlike with atomicity, the issue was not caused by an error in the
database statement.

If both Jennifer’s and Randall's account balances started at $1000, the end result should have the appropriate
expected balances. Jennifer’s account balance should be set at $900, and Randall’s balance should be at
$1100. If for any reason, the end values were not what was expected, the transaction would also be rolled
back.
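
Once the transaction commits, a quick check of the expected end state could look like the following sketch, which assumes the account_id values from the example:

SELECT account_id, balance
FROM customer_account
WHERE account_id IN (10, 50);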

 SUMMARY

The consistency property ensures that each transaction starts in a consistent state and ends in a
consistent state.

Source: Authored by Vincent Tran

Isolation
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the isolation property in a transaction and how it affects the database in two
parts:
1. Introduction
2. Isolation Example

1. Introduction
The isolation property in a transaction ensures that if there are multiple transactions run at the same time as
one another, they do not leave the database in an inconsistent state. The transactions themselves should not
interfere with one another, and each of the transactions should be run independently. Any changes that are
being modified in a transaction will only be visible to its own transaction, and any other concurrent transaction
will not see the results of the changes until the transaction is complete and the data has been committed. This
also ensures that concurrent transactions produce the same results as if they had been run sequentially.

In addition, the isolation property ensures that the data that is used in a transaction cannot be used in another
transaction until the original transaction has finished with it. Most databases, including PostgreSQL, use
locking to maintain transactional isolation.
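
In PostgreSQL, the isolation behavior can also be requested explicitly when a transaction begins. The sketch below uses optional syntax that the earlier examples do not require; SERIALIZABLE is the strictest standard isolation level:

BEGIN ISOLATION LEVEL SERIALIZABLE;

-- statements that must behave as if no other transaction runs at the same time

COMMIT;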

2. Isolation Example
Let us consider our banking example yet again. Jennifer would like to make a $100 payment to Randall
through an account transfer. This transaction is a balance transfer between two accounts at two different
branches of the same bank. Here's what the transaction looks like:

1. Jennifer’s (10) account would be deducted by $100.


2. The banking location where Jennifer has her account would have their location’s account deducted by
$100.
3. The banking location where Randall(50) has his account would be increased by $100.
4. Randall’s account would be increased by $100.

Let us look at the transaction in SQL:

BEGIN;

UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;

UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);

UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);

UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;

COMMIT;
Let us say that both Jennifer’s and Randall's account balance started at $1000. If Jennifer was attempting to
start this transaction and Randall concurrently was trying to check his account balance, Randall should not see
any updates to his account until the changes are made and the transaction has committed. If Randall queries
for his customer_account balance, it would be at $1000 until the entire transaction from Jennifer executes
successfully and commits the data to the database. In certain databases, if Randall tried to query his account
balance, no result would be provided until Jennifer’s transaction was completed.

Isolation is increasingly important as you have more concurrent transactions that access the same data. For
example, imagine a situation where Randall is receiving two different account transfers from two different
individuals at the same time.

Imagine Randall is receiving $100 from Jennifer and $50 from Paul. If there was not isolation in place,
Jennifer’s transaction could check Randall’s balance at $1000. At the same time, Paul’s transaction could
check Randall’s balance at $1000. Jennifer’s transaction would add $100 to the $1000 and save the results.
Paul’s transaction would add $50 to the $1000 and save the results. The final result is that Randall’s account
balance is $1050, instead of $1150. This could be the case if Jennifer's and Paul's transactions are both
reading Randall’s balance at the same time. Jennifer’s transaction started and updated the balance, but Paul’s
transaction also completed and saved over Jennifer’s transaction. This is why isolation is important, as
Jennifer’s transaction would prevent Paul's transaction from reading the value. Even if the value is read, Paul’s
transaction would not be able to complete due to the inconsistency in the database state.
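
One common way to guard against this kind of lost update is row-level locking. The sketch below shows one possible pattern, not the only one: SELECT ... FOR UPDATE locks Randall's row so that a concurrent transfer must wait until this transaction commits or rolls back:

BEGIN;

SELECT balance
FROM customer_account
WHERE account_id = 50
FOR UPDATE;

UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;

COMMIT;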

 SUMMARY

The isolation property ensures that multiple transactions can run concurrently without leaving the database in
an inconsistent state.

Source: Authored by Vincent Tran

Durability
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the durability property in a transaction and how it affects the database in two
parts:
1. Introduction
2. Durability Example

1. Introduction
Durability is the last ACID property, and one of the easiest to understand. This property focuses on ensuring
that once the data from a transaction has been saved/committed to the database, it will stay in place and will
not be affected by a system failure. This means that any completed transactions should be recorded and
saved to permanent, non-volatile storage rather than held only in memory.

2. Durability Example
Once again, consider our banking example. Jennifer would like to make a $100 payment to Randall through
an account transfer. This transaction is a balance transfer between two accounts at two different branches of
the same bank. Here's what the transaction looks like:

1. Jennifer’s (10) account would be deducted by $100.


2. The banking location where Jennifer has her account would have their location’s account deducted by
$100.
3. The banking location where Randall(50) has his account would be increased by $100.
4. Randall’s account would be increased by $100.

Let us look at the transaction in SQL:

BEGIN;

UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;

UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);

UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);

UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;

COMMIT;
Once the transaction has successfully completed execution, the updates and changes are written permanently
to the database. These changes will still persist even if the system fails, as those updates
are now permanent. The effects of the account transfer will never be lost. Note that durability is only applied
after the transaction has occurred and the COMMIT has been successfully executed. Anything that occurs
prior to that is part of another ACID property.

 SUMMARY

The durability property ensures that once a transaction is completed and committed, the data is
permanent and never lost, even with a system failure.

Source: Authored by Vincent Tran

COMMIT and ROLLBACK to Manage Changes
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores using COMMIT and ROLLBACK in two parts:


1. Introduction
2. Example Transaction

1. Introduction
Up to this point, we have looked at transactions as single units that start with a BEGIN and end with the
COMMIT statement, with multiple SQL commands in between that are executed at once. Remember as well
that without the BEGIN command, each individual SQL statement is viewed as a transaction with an implicit
BEGIN and COMMIT command executed. However, you can also run the commands one at a time inside an
explicit transaction and decide yourself whether to commit or roll back the results, while still keeping the ACID
properties.

2. Example Transaction
To start a transaction in the PostgreSQL command line, you can start with either:

BEGIN;
or

BEGIN TRANSACTION;
This will start the transaction until the next COMMIT or ROLLBACK command is encountered. However, if
there is an error in any of the statements after the BEGIN statement, the changes will automatically be rolled
back.

The COMMIT command is used to save changes from a transaction to the database. This COMMIT command
will save the changes made by all of the SQL statements that followed the BEGIN command. The syntax for the
COMMIT command looks like this:

COMMIT;
or

END TRANSACTION;
The ROLLBACK command is used to undo or revert SQL statements that have not already been saved to the
database. The ROLLBACK command can only be used to undo SQL statements that follow the BEGIN
command. The syntax looks like this:

ROLLBACK;
Let us take a look at an example of a series of statements:

SELECT *
FROM customer
ORDER BY customer_id;

Let us start our transaction:

BEGIN;

Let us go ahead and update the CUSTOMER table, intending to set the first name to Bob for the customer with
customer_id equal to 1:

UPDATE customer
SET first_name = 'Bob';
Oops, we accidentally updated all of the names to Bob, as we forgot the WHERE clause.

SELECT *
FROM customer;

We can use the ROLLBACK statement to undo the changes:

ROLLBACK;

So although the database did not throw any errors, we can revert the changes made since the last BEGIN statement
with the use of the ROLLBACK statement.
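
After rolling back, we can retry the update with the intended WHERE clause and commit it. This sketch reuses the customer_id value from the example in the video:

BEGIN;

UPDATE customer
SET first_name = 'Bob'
WHERE customer_id = 1;

COMMIT;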

 SUMMARY

The COMMIT statement will save results to the database, while the ROLLBACK statement will undo
results from the start of a transaction.

Source: Authored by Vincent Tran

CREATE USER/ROLE to Add Users
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores creating users in a database in two parts:


1. CREATE ROLE
2. Other Attributes

1. CREATE ROLE
For many databases, there is a difference between user accounts and groups. User accounts are created and
added to groups. As a best practice, permissions are set to a group and are applied to the user accounts that
are added to the group. In PostgreSQL, roles are used for both users and groups. Logically, roles that can log
into the database are the same as users, which are called login roles.

When we have roles that contain other roles, they are called group roles, which are the same as groups in
other databases.

The basic way to create a role is by using the CREATE ROLE statement:

CREATE ROLE <rolename>;


For example, if we wanted to create a role named “newaccount”, we can run the following:

CREATE ROLE newaccount;


We can see which roles exist in our database by running the following command:

SELECT rolname
FROM pg_roles;

Note that the column is rolname and not rolename. We can see that newaccount is the last role in the list. All of
the role names that start with the prefix pg_ are system roles within the database.

This newaccount cannot log in, as we have not defined the LOGIN attribute to that role. To create a login role,
we must use the LOGIN attribute and the initial password. Let us go ahead and create one:

CREATE ROLE myaccount
LOGIN
PASSWORD 'mypassword';
This will create a login role named myaccount with the password mypassword. Note that we have to use
single quotes around the password. Also note that even with this account created, we will not be able to
switch to it in our web interface. However, if we wanted to log in through the psql client tool, the command
would look like this:

psql -U myaccount -W postgres


The command would then prompt you to enter in the password.

2. Other Attributes
With the creation of the login role, there are additional attributes that can be used. The SUPERUSER attribute,
for example, allows a role to override all access restrictions in the database. It should be granted only when
truly needed, as such a role can do basically anything in the database. It would look like:

CREATE ROLE adminaccount
SUPERUSER
LOGIN
PASSWORD 'secretpassword';
Another type of role is one that can create other databases in PostgreSQL. For this, the CREATEDB attribute is
needed:

CREATE ROLE dbaccount
CREATEDB
LOGIN
PASSWORD 'securePass1';
The VALID UNTIL attribute can also be used. This allows you to enter in a date and time after which the role’s
password is no longer valid and the user can no longer log in. This can be useful for individuals that may only
work at the company for a short period of time, such as a contractor.

CREATE ROLE contractaccount
LOGIN
PASSWORD 'securePass1'
VALID UNTIL '2025-01-01';
After January 1st, 2025, the password for the role contractaccount would no longer be valid.
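
If the contract is later extended, the expiration can be changed with ALTER ROLE; the dates below are only examples:

ALTER ROLE contractaccount VALID UNTIL '2026-01-01';

-- Or remove the expiration entirely:
ALTER ROLE contractaccount VALID UNTIL 'infinity';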

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

The CREATE ROLE statement will allow us to create users with a variety of attributes.

Source: Authored by Vincent Tran

CREATE ROLE to Create Groups
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores using the CREATE ROLE to create groups in a database in two parts:
1. Introduction
2. Exploring CREATE ROLE Options

1. Introduction
The CREATE ROLE statement can be used to create a user or a group, depending on how it is used. This is
something unique to PostgreSQL, as other databases typically have a different concept to separate the user
and group. We have already looked at the creation of a role for a user in the prior tutorial. This role needs the
LOGIN attribute with a password that is set. Group roles, on the other hand, do not have a LOGIN attribute or
password, as users are meant to inherit the permissions from the group role as a best practice.

2. Exploring CREATE ROLE Options


The structure of the CREATE ROLE command looks like this:

CREATE ROLE <rolename>
[WITH]
[SUPERUSER]
[CREATEDB]
[CREATEROLE]
[INHERIT]
[LOGIN]
[CONNECTION LIMIT]
[IN ROLE];
The keyword WITH is optional in the CREATE ROLE command. Notice that there are many different options
that can be used. Each option, from SUPERUSER through BYPASSRLS, also has a negative form prefixed with
NO, such as NOSUPERUSER or NOBYPASSRLS, and the negative form is the default when the option is not
specified.

We looked at the SUPERUSER, CREATEDB, and LOGIN options in the prior tutorial. As a reminder, the
SUPERUSER is one that has the ability to override all access restrictions. It can be a dangerous status as the
role can drop any objects or access any data. To create a role using the SUPERUSER attribute, the account
must be a SUPERUSER.

The CREATEDB attribute defines whether the role is able to create databases. Typically, a database administrator
role would benefit from this attribute.

The CREATEROLE attribute allows you to create new roles, ALTER roles and DROP other roles.

The INHERIT option is the default; the alternative is NOINHERIT. This attribute determines whether the role
inherits the permissions and privileges of the roles that it is a member of. A role that has the INHERIT attribute
can use the database permissions that have been granted to all of the roles that it is directly or indirectly a
member of.

The CONNECTION LIMIT determines how many concurrent connections the role can make. The default is set
to -1, meaning that there is no limit.

The IN ROLE option lists one or more existing roles to which the new role is immediately added as a
member. This could be a login role or another group role. For example, you could have an executive role that
is set up to be part of a management role so that the executive role will get all of the permissions that the
management role has.
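
A minimal sketch of that executive/management example, using hypothetical role names, would look like this:

CREATE ROLE management;

-- executive is immediately a member of management and can use its privileges:
CREATE ROLE executive
IN ROLE management;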

There are other attributes that can be used with the CREATE ROLE that are very specific to a given scenario
that we will not cover.

We can use the different attributes together in a single CREATE ROLE statement. For example, you could
create an admin role that has the ability to create databases and roles:

CREATE ROLE adminrole
CREATEDB
CREATEROLE;
This would create a group role called adminrole.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

The CREATE ROLE statement can be used to create group roles. Login roles can be a part of these.

Source: Authored by Vincent Tran

GRANT to Assign Users
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the GRANT and REVOKE commands to assign user roles to a group role in two
parts:
1. Introduction
2. Examples

1. Introduction
It can be useful to create group roles, as we did in the prior tutorial, so that it is easier to manage privileges
and permissions. This is especially important as an organization has more users. This way, privileges and
permissions can be granted to or revoked from a group as a whole rather than from the individual users.

As we have covered, a group role typically does not have the LOGIN attribute, although technically it can; it
simply does not make sense to define one that way. Remember as well that PostgreSQL does not formally
distinguish between group roles and non-group roles. As such, you can grant membership to other group
roles, not just to user roles.

2. Examples
Let us first create a user role for an admin account:

CREATE ROLE myadmin
LOGIN
PASSWORD 'mypassword';
We can then create a group role called adminrole that has the various admin privileges:

CREATE ROLE adminrole
CREATEDB
CREATEROLE;
If we wanted to grant the adminrole role to the myadmin user role, we would do so with the GRANT command
like:

GRANT adminrole TO myadmin;


If we wanted to grant to more than one user at a time, we could include the list of users separated by commas
like:

GRANT adminrole TO myadmin1, myadmin2, myadmin3;
Say we wanted to separate out the admin role into one that could create roles and a separate one that could
create databases:

CREATE ROLE adminrole_cr
CREATEROLE;

CREATE ROLE adminrole_db
CREATEDB;
We can grant permissions separately to myadmin by doing:

GRANT adminrole_cr TO myadmin;


GRANT adminrole_db TO myadmin;
If we wanted to take away the permission to create databases, we can use the REVOKE command like:

REVOKE adminrole_db FROM myadmin;


Since both of those roles are group roles, you could grant one to the other, but only in one direction, because
the database will not allow you to set up circular membership loops:

GRANT adminrole_cr TO adminrole_db;

-- This second grant would now be rejected, because it would create a circular membership loop:
GRANT adminrole_db TO adminrole_cr;
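
If you want to see which roles have been granted to which members, one way is to query the system catalogs. This is a sketch; the exact rows returned will depend on your database:

SELECT r.rolname AS group_role,
       m.rolname AS member_role
FROM pg_auth_members am
JOIN pg_roles r ON r.oid = am.roleid
JOIN pg_roles m ON m.oid = am.member;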

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

We can GRANT and REVOKE group roles to and from user roles and other group roles.

Source: Authored by Vincent Tran

GRANT to Assign Privileges
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the GRANT and REVOKE commands to assign privileges on objects to roles in
three parts:
1. Introduction
2. Possible Privileges
3. Examples

1. Introduction
When we create an object such as a table, view, or index, the object is assigned an owner. Typically, the
owner of the object is the role that executed the CREATE statement associated with the object. For most
types of objects, only the owner or a superuser has the ability to do anything with the object. In order to allow
other user or group roles to use and interact with the object, privileges on the object must be granted. There
are many different types of privileges that are available to grant, depending on the type of object.

2. Possible Privileges
These privileges include:

SELECT – Allows the role to select from any column or from specific columns listed within a table, view, or
sequence. This privilege would also be required if there is a need to reference existing column values in
an UPDATE or DELETE statement.
INSERT – Allows the role to INSERT a new row within a table. If specific columns are listed, only those
columns may be inserted into; the other columns are automatically set to their default values.
UPDATE – Allows the role to UPDATE a column or a list of columns in a table. If the UPDATE privilege is
granted, the SELECT privilege should also be granted since it has to reference the table columns to
determine which rows of data should be updated.
DELETE – Allows the role to DELETE a row from a table. If the DELETE privilege is granted, the SELECT
privilege should also be granted since it has to reference the table columns to determine which rows of
data should be removed.
ALL – This grants access to all available privileges to that object.

There are other privileges as well, including TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT,
TEMPORARY, EXECUTE, and USAGE that allow interaction with objects.

3. Examples
Let us say we have a group role called employee_role. We may want to allow the querying of the employee
table, but not allow any changes to the data. As such, we would run the following:

GRANT SELECT ON employee TO employee_role;


We could also grant the employee_role edit access and query access on the customer table:

GRANT SELECT, INSERT, UPDATE, DELETE ON customer TO employee_role;


As another example, for an admin_role, you could grant all privileges to the customer table:

GRANT ALL ON customer TO admin_role;


If we wanted to grant access to specific columns, we would add those columns in round brackets after the
privilege type. For example, if we wanted to grant UPDATE privileges to the employee_role on the track table,
but only on the unit_price, we would do the following:

GRANT UPDATE(unit_price) ON track TO employee_role;


This grants only the ability to update the price, but none of the other columns.
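
The REVOKE command mirrors GRANT when a privilege needs to be removed again. For example, to take away only the DELETE privilege granted earlier, or to remove everything granted on the table:

REVOKE DELETE ON customer FROM employee_role;

REVOKE ALL ON customer FROM employee_role;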

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

We can GRANT and REVOKE to assign and remove privileges to roles.

Source: Authored by Vincent Tran

Application Security
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores security concerns with applications that connect to databases in two parts:
1. Confidentiality, Integrity, and Availability
2. SQL Injections

1. Confidentiality, Integrity, and Availability


Security is always a potential concern when it comes to information systems. Security concerns include data
confidentiality, integrity, and availability.

Confidentiality focuses on ensuring that the data is protected against unauthorized access or disclosure of
private information. Many organizations have to follow various laws and guidelines around data confidentiality,
like the Health Insurance Portability and Accountability Act (HIPAA) in medicine, or the Sarbanes-Oxley Act
(SOX) in the business world, as examples. As such, the data stored within databases needs to be classified as
highly restricted, where very few individuals would have access (credit card information as an example);
confidential, where certain groups would have information (pay information for employees as an example); or
unrestricted, where anyone can have access.

Availability focuses on the accessibility of the data when authorized users want access to it.

Finally, integrity focuses on ensuring that the data is consistent and free from errors. The database itself plays
a key role in data integrity, as we have seen in prior tutorials. Protecting sensitive data often involves using
encryption. Through encryption, the underlying data is scrambled with a key into a format that cannot be read
as-is. There are many forms of encryption with various algorithms. With weaker encryption, you may see
attackers trying to decrypt the data by brute force, which basically means trying to guess what the decryption
key is through iterative trial and error.

With applications, security vulnerabilities can occur due to many different factors. Individuals can sometimes
take advantage of bugs within the application that connects to the database. When code is poorly developed
or focused purely on functionality and not security, issues can occur. One example is session hijacking, when
an individual takes over a web session from another individual. By doing so, the individual can get access to
certain personal information from another user that they may not have been able to access otherwise.

2. SQL Injections
SQL injection is one of the most common web hacking techniques used against applications. It can damage
your database or provide complete unauthorized access to it if the database is not well protected. SQL
injection typically works by attempting to add malicious code into SQL statements through the use of web
page input.

This can occur if the page asks for input like a userid and concatenates the input to the SQL string. For
example, from a program, we may have something that looks like:

myinput = readFromInput("userid");

mysql = "SELECT * FROM users WHERE user_id = " + myinput;


If the user enters in 5 for the userid, this would create a SQL statement like:

SELECT *
FROM users
WHERE user_id = 5;
That would be the expected result. However, if the "hacker" entered "5 or 1=1", the resulting SQL statement
would look like this:

SELECT *
FROM users
WHERE user_id = 5 or 1=1;
If we look at the statement, 1=1 always evaluates to true, which means that the query would return every
single row within the table. Imagine if the table had usernames, passwords, and other user information. All of
that would now be compromised in this successful SQL injection.

Or perhaps the input could look like this: "5; DROP TABLE customer;". The resulting query would look like this:

SELECT *
FROM users
WHERE user_id = 5; DROP TABLE customer;
If this SQL injection is successful, it could potentially drop the table customer, which is also quite problematic.

Different databases handle SQL injection issues slightly differently. To avoid these types of scenarios, the
application must first validate the input data before sending the query to the database. We also want to filter
the input data so that users cannot bypass our validation; by filtering user input, we can check for any special
characters that should not be included. In many applications, the use of SQL parameters (parameterized
queries) can help.
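
At the database level, the same idea can be sketched with a prepared statement, which treats the supplied value purely as data rather than as SQL text. Application code normally achieves this through the driver's parameter placeholders; get_user below is just a hypothetical name:

PREPARE get_user(integer) AS
SELECT *
FROM users
WHERE user_id = $1;

EXECUTE get_user(5);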

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

There are many different security concerns surrounding databases, including SQL injections.

Source: Authored by Vincent Tran

Superusers
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the importance of separating out superuser and regular accounts in two parts:
1. Reasons For Superusers
2. Superuser Administration

1. Reasons For Superusers


The principle of least privileges is an important element to consider when it comes to setting up security in
any information system. Whether it is through Windows, Linux, or a database, a “superuser” account is viewed
as a role that comes with unrestricted access to all commands, functions, and resources within a system. A
superuser can bypass all permission checks, and access all types of powerful operations, including everything
in the database, things that touch the underlying system, and enabling extensions. Superuser roles in a
database have the ability to create databases and roles, or completely remove them. If the superuser role is
misused on purpose or even by accident, it can create significant damage.

Most of the security protection measures are handled around the perimeter of an information system.
However, superuser roles and accounts are already on the inside. For example, if an individual had temporary
access to the superuser role in a database, they could create additional roles to which they could connect,
causing further damage. Even if the superuser role that an individual had been using was removed, they
could have backdoors through these other roles. They could even remove evidence of their activity within the
system. For example, an intruder with a superuser role could create orders within a system and have items
that they did not purchase sent to them. Once the order had been sent out, they could delete the order and
any related data so that the system had no indication of that order.

2. Superuser Administration
In the last tutorial, we explored SQL injections, wherein individuals can potentially drop a table or even a
database. However, they can only do so if the role that the application was logging into had that permission.
As such, any roles that a user of an application will interface with should only be given the necessary
privileges to complete their tasks. In some organizations, superuser roles and accounts are shared among
various users, such as database administrators. By doing this, though, the audit trail becomes a lot more
difficult to follow.

Policies need to be put in place to provision and segregate privileged access and to monitor the associated risks. For example, the
superuser role should be strictly limited to a number of individuals. You can temporarily increase the
privileges of individuals when needed, without granting them full superuser privileges around the clock. You
can also work on separating the privileges of users and force them to use certain accounts depending on
when they need certain privileges. You may have separate accounts/roles that have the ability to update and
query data, while other accounts/roles can only query data if that role does not need to make changes. All
permissions should be explicitly granted to the user or role; they should not be given automatically.
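
As a sketch of that separation, using hypothetical role names, a query-only role and a separate role that can also make changes could be defined like this, with neither one being a superuser:

CREATE ROLE reporting_read
LOGIN
PASSWORD 'readonlypass';

GRANT SELECT ON customer TO reporting_read;

CREATE ROLE reporting_write
LOGIN
PASSWORD 'readwritepass';

GRANT SELECT, INSERT, UPDATE ON customer TO reporting_write;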

It can also be important to change out the superuser password on a regular basis so that even if the account
is compromised, the exposure is limited. Some organizations may even change the password after each use of the role to
keep it safe.

 SUMMARY

Separating out superuser roles and regular accounts is important, as superuser roles should only be
used when absolutely needed.

Source: Authored by Vincent Tran

Index Overview
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the different types of indexes in databases in two parts:
1. Indexes in General
2. Data Structures

1. Indexes in General
A database index is a structure that allows a query to efficiently retrieve data from a database. Let us look at a
query run on our track table:

SELECT *
FROM track
WHERE album_id = 5;
If there are no indexes added to the table on this column, the database would have to scan row by row
through the entire table to find all of the matching rows. If there were many rows in this table and we only
expect to have a few rows returned, this can be quite an inefficient search. Imagine if there were one million
rows in the table, and this album_id did not even exist. This means that the database would have to search
through all one million rows to identify that there were no matching values at all. If the database has an index
on the album_id column, however, the database would be able to much more efficiently find the matching
data.

The concept of database indexes is quite similar to the alphabetical index at the end of a book. With a book,
an individual can scan through the index fairly quickly to find the topics they're interested in and what page to
turn to. This is a much faster approach than having to read through the entire book to find the content you
want. The usefulness of a book index depends on the person who creates the topic list. Similarly, the
database developer must determine what indexes could be useful.

Once an index is created in a database, nothing else needs to occur. The database will automatically update
the index when the table data is modified. The database will also make use of the index if it will be more useful
than searching row by row. Indexes can be useful not only in SELECT statements with joins on the indexed
columns, but also in UPDATE and DELETE commands that have filtering criteria.

It is important to note that we do not want to index every single column in the database, as creating an index
can take a long time. There is also a cost to maintaining the index. This depends on the database, but it could
occur with every insert, update or delete SQL statement that affects the index. The index has to be
synchronized with the table, so if there are indexes that are not frequently used, they should ideally be
removed. By default, primary keys and columns with the UNIQUE constraint have indexes created for them.
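
As a sketch, an index on the album_id column used in the query at the start of this tutorial could be created as follows (the index name is hypothetical), and EXPLAIN can then show whether the planner chooses it; the exact plan output will vary:

CREATE INDEX idx_track_album_id ON track (album_id);

EXPLAIN
SELECT *
FROM track
WHERE album_id = 5;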

2. Data Structures
Most databases will implement indexes using one of the following data structures:

1. Hash index – this type of index is based on an ordered list of hash values. Typically, a hash algorithm is
used to create a hash value from a key column. This value then points to an entry in a hash table, which in
turn points to the actual location of the data row. Hash indexes can only handle simple equality
comparisons and are a natural first choice when a query filters with the equality operator = .
2. B-tree index – this type of index has the data organized in an upside-down tree. The index tree is stored
separately from the data itself. The B-tree index self-balances, meaning that it will take about the same
amount of time to access any row in the index regardless of the size of the index. For example, with one
million rows of data, it will take less than 20 searches to find data using a B-tree index compared to one
million searches if it were done sequentially. This is the most common type of index in databases and
generally is the most useful when we have limited repeating values. B-trees can handle equality and
range queries quite well. If the query uses a <, <=, =, >=, or > operator, a B-tree index, if available, could be
used. It is generally also usable for pattern matching, as long as the pattern is a constant and is anchored
to the beginning of the string.
3. Bitmap index – this type of index uses a bit array of zeros and ones. These are useful to represent a value
or condition that may be true or false. For example, this could be useful to check if an individual was
signed up for a newsletter. Typically, since the values in the columns that have the bitmap index are one
of two values, the equality operator = is most commonly used.
4. Generalized Search Tree (GiST) – This is a more complex type of index unique to PostgreSQL that is not
used often but supports specialized functionality, like searching for the value closest to another value or
certain kinds of pattern matching. As such, there are special operators that could be useful for this
purpose, such as <<, &<, &>, >>, <<|, &<|, |&>, |>>, <@, @>, ~=, and &&. We have not explored any of these
operators, because they are not used that frequently.
5. Generalized Inverted Index (GIN) – This is a unique type of index for PostgreSQL that focuses on indexing
array values and testing if a value exists. Operators you would see with this include <@, @>, = and &&.

 SUMMARY

Indexes are used to speed up queries and avoid having to search through data one row at a time.

Source: Authored by Vincent Tran

B-Tree Index
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the use of B-tree indexes in databases in three parts:
1. Introduction
2. Examples
3. When To Use

1. Introduction
As we discussed in the prior tutorial, the B-tree index is formatted like an upside-down tree. B-tree indexes
generally handle equality and range values that could be sorted in a specific order. The most common
operators that are used with a B-tree index include <, <=, =, >=, and >. Note that constructs such as BETWEEN
and IN are equivalent to combinations of these operators, so they can be handled in a similar
manner. It is also common to see IS NULL or IS NOT NULL using the B-tree index as a means to check the
data.

2. Examples
The database may also use a B-tree index with certain pattern matching operators such as LIKE, if the pattern
is a constant and is anchored at the start of the string. For example, we could look for track names that start
with 'Wal':

SELECT *
FROM track
WHERE name LIKE 'Wal%';
Or for customers that have an email address starting with 'ro':

SELECT *
FROM customer
WHERE email LIKE 'ro%';
However, the B-tree index would not be useful if we tried to find information in the middle or at the end of the
string, like tracks that have 'at' in the middle of the name:

SELECT *
FROM track
WHERE name LIKE '%at%';
Or for customers whose email address ends with the 'gmail.com' domain:

SELECT *
FROM customer
WHERE email LIKE '%@gmail.com';
Other queries on data are based on ranges. For example, you could have open-ended ranges:

SELECT *
FROM track
WHERE album_id >=5;
Or those that have specific ranges that contain values:

SELECT *
FROM track
WHERE album_id >= 5 AND album_id <=10;
This is the same as if we had:

SELECT *
FROM track
WHERE album_id BETWEEN 5 AND 10;
Note that this is different from a condition that combines two ranges that do not overlap, like:

SELECT *
FROM track
WHERE album_id <= 5 AND album_id >=10;
Here we are looking for rows whose album_id is less than or equal to 5 while at the same time being greater
than or equal to 10. Since no album_id can satisfy both conditions at once, no rows would be returned. More
importantly, this would not be a good fit for a B-tree index. Even if we use the OR operator to make the
condition meaningful, scanning two separate ranges is not as efficient as scanning a single contiguous range:

SELECT *
FROM track
WHERE album_id <= 5 OR album_id >=10;

3. When To Use
The B-tree index is great when a column's values repeat only a few times, or are completely
unique. Take the track name, for example. There may be a few repeated track names, but for the most part
the names are distinct compared to the total number of rows in the table. As such, a B-tree
index would be the best choice. Note that when you add an index to a table, you generally do not have
to worry about choosing the best type of index, because if you do not specify one, the database uses the
B-tree index by default.
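Creating indexes is covered in a later tutorial, but as a minimal sketch, a B-tree index on the track name
might be created like this (B-tree is the default method, so USING btree could be omitted):

CREATE INDEX idx_track_name
ON track USING btree (name);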

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

The B-Tree index is the most common type of index, which is used when considering a range of
sortable values.

Source: Authored by Vincent Tran

Hash Index
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the use of hash indexes in databases in two parts:
1. Hash Index Explanation
2. Examples

1. Hash Index Explanation


Hash indexes are unique in that they are really only used for the equality operator = and single-value lookup
scenarios. They are not used for comparison operators that find a range of values. Basically, a hash index has
an array of N buckets, where each bucket holds pointers to rows in the table. The hash index uses a
function that takes a key and the number of buckets N and maps the key to the corresponding bucket of the
hash index. The hash function's algorithm maps data of variable length to data of a fixed length
in a deterministic but seemingly random way.

One simple example of a hash algorithm is one that takes a string and returns its length. Let
us say that we have 10 buckets, because the column holds at most 10 characters, and we pass "Bob" into the
hash function. Since the length of "Bob" is 3, it would be placed in the third bucket. If we passed "Jennifer"
into the hash function, the length would be 8, so it would go in the 8th bucket. So far, this seems fairly
easy as a hash function.

However, what happens in the case that you have a value hashed to the same place? For example, if we have
“Ron”, the length is 3 and it would point to the third bucket. In this case, we have two different keys (“Bob”
and “Ron”), but they have the same hash value based on our function. In this case, we have a collision, which
is very common in hash functions. The more collisions a hash function has, the worse it can be, as it can have
a performance impact when reading values. This can also occur if the number of buckets is too small
compared to the number of distinct keys.
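We can mimic this toy length-based hash directly in SQL. This is purely an illustration of the idea of
buckets and collisions, not how PostgreSQL's real hash functions work:

SELECT name, length(name) AS bucket
FROM (VALUES ('Bob'), ('Jennifer'), ('Ron')) AS t(name);

Here 'Bob' and 'Ron' both map to bucket 3, which is exactly the collision described above.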

There are many ways to solve this collision problem. One approach is chaining: the first item in a bucket
points to the next item, and so on, so that all of the items that hash to the same bucket are linked to one
another. Remember that the hash algorithm we used is just a simple approach; real hash algorithms are more
complex and distribute the data evenly across the buckets. For example, if we hashed first names by their
length and had 100 buckets, most of the data would end up in the first 10 buckets or so, and the remaining
buckets would be mostly empty. An algorithm that distributes the names across all 100 buckets would make the
index much more even and quicker to search.

2. Examples
Let us take a look at instances where a hash index would be ideal:

SELECT *
FROM track
WHERE name = 'Walk On';

SELECT *
FROM track
WHERE album_id = 5;

SELECT *
FROM customer
WHERE country = 'USA';
What do these queries have in common? They all use the equality operator. The following queries, on the
other hand, would not use the hash index, as they do not use the = operator:

SELECT *
FROM customer
WHERE email LIKE 'ro%';

SELECT *
FROM track
WHERE album_id >=5;

SELECT *
FROM track
WHERE album_id >= 5 AND album_id <=10;
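For columns that are queried by equality, a hash index could be created explicitly. This is a minimal sketch,
assuming the customer table from the examples above; note that explicit hash indexes are generally only
recommended on PostgreSQL 10 and later:

CREATE INDEX idx_customer_country_hash
ON customer USING hash (country);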

 SUMMARY

The hash index uses a hashing function to speed up queries that use the equality operator.

Source: Authored by Vincent Tran

DROP INDEX to Remove Indexes
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores the use of the DROP INDEX command to remove indexes in two parts:
1. Index Creation
2. Dropping an Index

1. Index Creation
We previously looked at the creation of indexes through the use of constraints such as primary keys, which
automatically create indexes behind the scenes. We can also create an index directly; the CREATE INDEX
statement looks like this:

CREATE [UNIQUE] INDEX <indexname>
ON <tablename> [USING method] (<columnname>);
For example, if we wanted to create a unique constraint and index on the email address in the customer table,
we could do:

CREATE UNIQUE INDEX idx_customer_email
ON customer(email);
However, we could not create a UNIQUE constraint and index on the country column, because the same country
repeats across customers:

CREATE UNIQUE INDEX idx_customer_country
ON customer(country);

You could, however, create an index on the country in general:

CREATE INDEX idx_customer_country
ON customer(country);
We can choose the type of index we want by adding a USING clause. By default, the B-tree method is used;
however, you can specify hash, gist, or gin instead. Note, though, that not all versions of PostgreSQL
support every method.

CREATE INDEX idx_customer_country
ON customer USING hash (country);

2. Dropping an Index
Once we have created an index, removing the index is quite simple. We simply do the following:

DROP INDEX [CONCURRENTLY] [IF EXISTS] <indexname> [CASCADE or RESTRICT];


You have options when removing an index, similar to other DROP statements for other object types.
Typically, we would want to remove any unused indexes to help maintain database performance.

IF EXISTS removes the index only if it exists, and prevents an error from being raised when it does not.

DROP INDEX IF EXISTS idx_customer_country;


CASCADE will automatically drop any objects that depend on the index that we are dropping.

DROP INDEX idx_customer_country CASCADE;


RESTRICT is set by default, and will not drop the index if we have any objects that depend on it.

DROP INDEX idx_customer_country RESTRICT;


When we run the DROP INDEX statement, PostgreSQL will get a lock on the table and block any other access
to the table until the dropping of the index is complete. If there is a conflicting transaction that is running on
the table, we can add CONCURRENTLY to the command to wait until any conflicting transactions are done
before we remove the index. Note that if we do use the DROP INDEX CONCURRENTLY option, we cannot use
the CASCADE option.
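As a minimal sketch combining these options (assuming the idx_customer_country index from above exists):

DROP INDEX CONCURRENTLY IF EXISTS idx_customer_country;

Because CONCURRENTLY is used here, CASCADE cannot be added to this statement.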

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of
the examples above and see how it works. Next, try your own choices for which columns you want the
query to provide.

 SUMMARY

Dropping indexes that are no longer used is important for preserving database performance.

Source: Authored by Vincent Tran

Create a Backup
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores using the command line to back up a PostgreSQL database in two parts:
1. Introduction
2. PostgreSQL Backup Tools

1. Introduction
Depending on the choice of database, backing up a database can take various forms. In most cases, there is a
command-line option and a graphical user interface option that works by using the command line behind the
scenes. Backups are important for saving the data and state of the database so that it can be recovered
if there are any issues. There are different types of backups, such as backing up all or only part of the
data, or backing up just the structure of the database.

2. PostgreSQL Backup Tools


In PostgreSQL databases, we have the pg_dump and pg_dumpall tools. These will not work within our web
interface tool, as the tool is logged into PostgreSQL already. However, if you have PostgreSQL installed
locally, you can test the commands.

The pg_dump tool writes the contents of a database's objects into a single file. A plain-text script dump
contains the SQL commands needed to reconstruct the database to the state it was in when the backup
was taken.

There are a lot of different parameters that can be used, so we will explore some of the more common
options.

Let us first look at a complete command that backs up the mydb database to a mydb.sql file in the
c:\backup\ folder on a Windows system, using the user adminrole:

pg_dump -U adminrole -W -F p mydb > c:\backup\mydb.sql


Let's break this down:

The pg_dump is the command line tool.


The -U adminrole specifies the user role that will be used to connect to the database. In this case, we are
using adminrole to log in to perform the backup.
The -W makes pg_dump prompt for adminrole's password before it can continue.
The -F specifies the output file format. In this case, the p stands for a plain-text SQL script file.
Then, we indicate the database that we want to back up, which is mydb.
The > c:\backup\mydb.sql is the path and file name of the backup file we are writing to. If you don't
pass in a path, you can include just the file name, and the backup file will be written to the current
directory that you are running the command in.

Let us explore some other options:

The -a option will dump only the data, not the schema with the data definitions such as the table
structure. This is mainly useful with plain-text dumps, when we want to export just the data.
The -s will dump only the object definitions, such as the tables, without the data.
The -c, with a lowercase c, will add DROP statements to clean (drop) the database objects before the
commands that recreate them.
The -t will only dump out specific tables that match the table name that is passed. For example, -t
employee would only dump out the employee table.
The -T with a capital T will dump out all tables other than the ones that are listed.
The -C with a capital letter will start the output with a command to create the database and reconnect to
the created database. This option allows you to avoid having to create the database first.

With the -F option that indicates the format, we looked at p for a plain-text SQL script. There are other
formats as well, such as “d” for directory, which creates a directory with one file for each table, and “t”
for tar, which creates an archive file similar to a .zip file.

The pg_dumpall command will dump all of the databases within a server. This is not a commonly used
option, as you would typically only want to back up specific databases at a time. The pg_dumpall process
exports all of the databases into a single file, so restoring from it can be unreliable. The options for
the pg_dumpall command are largely the same as for pg_dump. One difference is that the -W option is
generally not used, since pg_dumpall connects to each database in turn and you would have to type the
password for each one.
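As a sketch combining some of these options (the paths and file names are illustrative, and as before this
assumes pg_dump is run locally as adminrole):

pg_dump -U adminrole -W -F t -f c:\backup\mydb.tar mydb

pg_dump -U adminrole -W -s -t customer mydb > c:\backup\customer_schema.sql

pg_dumpall -U adminrole > c:\backup\all_databases.sql

The first command writes a tar-format archive (here -f names the output file directly) that can later be
restored with pg_restore. The second dumps only the schema of the customer table as a plain-text script.
The third dumps every database on the server into a single plain-text file.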

 SUMMARY

The pg_dump and pg_dumpall commands allow you to back up your database(s) using the
command line.

Source: Authored by Vincent Tran

Restore from Backup
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores using psql or pg_restore to restore data from a backup on the command line in
two parts:
1. Getting Started
2. Using pg_restore

1. Getting Started
There are two commands that can be used to restore from database backups. The psql command will run
plain SQL script files that have been created by the pg_dump and pg_dumpall tools. The pg_restore command
is a utility that restores a PostgreSQL database from an archive that was created by pg_dump in one of the
non-plain-text formats, such as tar. It executes the commands needed to reconstruct the database to the
point in time at which the backup was created. pg_restore can also be selective about which items within
the archive file are restored.

First, we will look at the psql tool. By using the psql tool, you can execute the entire SQL script at once. The
command will look like the following:

psql -U adminrole -f backupfile.sql


This logs in using the adminrole role (you will be prompted for the password if one is required) and runs
backupfile.sql to restore the data. Similar to the pg_dump tool, there are other options that can be passed in:

The -a option will output all of the input lines to the standard output so you will be able to visually see the
progress of the restore.
The -d option will allow you to specify the database name to connect to, like -d mydb.
The -W will force psql to prompt for a password before connecting, even if one would not otherwise be required.

2. Using pg_restore
The pg_restore command focuses on restoring backups that are in a non-plain-text format created by pg_dump.
Using this command, you can restore an entire database or select individual database objects from the
archive file. This tool can also take a backup made with an older version of PostgreSQL and restore it
into a newer version.

For example, say we had a backup.tar file that had been created in the same folder. We can restore the
database by doing:
pg_restore -d mydb backup.tar
We also have options with the pg_restore:

The -a option will restore only the data, not the schema. This assumes that the schema has already
been created.
The -c option will clean (drop) the database objects before they are recreated.
The -C, with an uppercase C, will create the database itself before restoring into it; when combined with
-c, the target database is dropped and recreated before the restore is done.
The -f option names an output file for a generated script; the archive being restored is passed as the
last argument rather than with -f.
The -s will restore only the schema, not the data.
With the -t option, we can specify the name of a table to restore.
A short sketch combining psql and pg_restore appears after this list.
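A minimal sketch, assuming the mydb database already exists and adminrole has the necessary privileges:

psql -U adminrole -d mydb -f backupfile.sql

pg_restore -U adminrole -d mydb backup.tar

The first command replays a plain-text dump created by pg_dump -F p; the second restores a tar archive
created by pg_dump -F t, with the archive passed as the final argument.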

 SUMMARY

The psql and pg_restore commands are used to restore the database.

Source: Authored by Vincent Tran

Backups: Command Line vs. GUI
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores using the differences between backing up and restoring from the command line
and graphical user interfaces (GUI) in three parts:
1. Introduction
2. GUI Ease of Use
3. Command Line Flexibility

1. Introduction
When it comes to backing up and restoring data from a database, we have the option to use graphical user
interfaces (GUI) or the command line. There are advantages and disadvantages to both options. Most of the
time, with a database, you will automatically have access to the command line. This command-line interface
will be consistent regardless of the platform that you use. With a GUI, there is typically a separate
installation that connects to the database, and there are often third-party tools available beyond what the
database vendor offers. The GUI tool interfaces with the database through application programming interfaces
or by running commands through the command line behind the scenes.

2. GUI Ease of Use


Graphical user interfaces (GUI) tend to be visually intuitive to use, especially for a beginner. The command line
commands and options, on the other hand, can be quite difficult for beginners to handle and require some
practice and expertise. For example, think back to the last tutorial: the -a option in pg_restore restores
only the data, but the same option in psql echoes the input lines of the script. The same feature in a GUI
would just be an input field, such as a checkbox, without your having to remember what the command looks
like. In the GUI, you can execute commands without knowing the exact syntax or risking mistakes in the
underlying code.

The graphical user interface will use more memory, as it has to display the graphical elements of the program,
whereas the command line uses far less memory because there is no such overhead. On the other hand, many GUI
tools let you customize the look and feel and save your settings, and you can choose among different tools
depending on the database you are using. With the command line, you do not have the same customization,
although you can create scripts to run your commands.

3. Command Line Flexibility


Since you can create scripts on the command line, you have the ability to combine statements and
perform tasks that cannot be done in a GUI unless the tool has specifically implemented them. For example,
from the command line, you can back up one database and immediately send the result to restore another
database. You also have the option on the command line to connect to a remote database, which is not always
possible in a GUI.
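As a sketch of that first idea, a plain-text dump of one database can be piped straight into another
(db_source and db_target are hypothetical names, and both databases must already exist):

pg_dump -U adminrole db_source | psql -U adminrole -d db_target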

Regardless of whether you use a GUI or the command line, you can choose either to continue restoring a
database when there are errors in the files or to stop on the first error. With either option, you can also
choose to back up and restore just the data, just the schema, or both, and you can back up and restore to
and from plain-text or non-plain-text files.

 SUMMARY

There are some differences and similarities between backing up and restoring databases through the
command line and GUI interface.

Source: Authored by Vincent Tran

Backup Methods
by Sophia Tutorial

 WHAT'S COVERED

This tutorial explores various database backup strategies in two parts:


1. Introduction
2. Three Options

1. Introduction
There are many different approaches for database backups. In part, the best approach is dependent on the
organization, the size of the database, and how often the database is used. The most common options within
most databases are a full database backup, a differential backup, and an incremental backup.

2. Three Options
The simplest type of backup is a full database backup. This provides a copy of the entire database and allows
us to restore the data to the point in time when the database backup was made. Note that you can have
database transactions running even when the database is still in the process of backing up the data. However,
database backup input and output operations can slow down some of those transactions.

Running a full database backup is an option that a lot of organizations use on a nightly basis. It is a good plan
if the database size is relatively small, because each backup file is small and can easily be restored
independently if there are any issues. However, if the database is large, it can take a lot of time and space to
create full backups on a nightly basis. Besides being the simplest, this type of backup is also the fastest to
restore, as it only takes one step.

Differential backups are another approach. They contain only the data that has changed since the last full
backup. Differential backups are cumulative, not incremental. This means that the differential backup will save
all of the changes since the last full backup, even if there are other differential backups run since the last full
backup.

For example, imagine that we ran the full backup only on Sunday and ran a differential backup every night
after that. Monday's differential backup would contain only Monday's changes. Wednesday's differential
backup would contain Monday's, Tuesday's, and Wednesday's changes, even though differential backups also
ran on Monday and Tuesday. Friday's differential backup would contain all of the changes from Monday
through Friday. As such, with a full plus differential backup strategy, we would need at most two restores
to recover the data: the full backup and the most recent differential backup. Since differential backups
contain only the data changed since the last full backup, they can be created much faster than a full backup.

The incremental backup is similar to a differential backup, but the key difference is that each incremental
backup contains only the data changed since the most recent backup of any kind, whether that was a full,
differential, or incremental backup. It is not cumulative.

For example, imagine that we ran the full backup on Sunday and ran incremental backups every day. The
incremental backup on Monday would have Monday’s data. The incremental backup on Tuesday would only
have Tuesday’s data. Saturday’s incremental backup would only have Saturday’s data. These incremental
backups run in the least amount of time and take the least amount of space. However, they can take the
longest to restore as each incremental backup has to be applied one after another. Moreover, if the data from
one incremental backup is missing, the incremental backup files after that are not useful because we would
be missing the changes.
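To make the difference concrete, here is a sketch, with hypothetical file names, of which backup files would
be needed to restore the database to Friday night under each strategy, assuming the full backup ran on Sunday:

Full backup every night: restore full_friday (1 file)
Full + differential: restore full_sunday, then diff_friday (2 files)
Full + incremental: restore full_sunday, then inc_monday through inc_friday, in order (6 files)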

 SUMMARY

There are three primary database backup strategies that can be implemented depending on the
organization and the use of the database.

Source: Authored by Vincent Tran
