0% found this document useful (0 votes)

78 views22 pages

Lab_ Storing and Analyzing Data by Using Amazon Redshift

Uploaded by

Bernadi Beltran Canovas

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

78 views22 pages

Lab_ Storing and Analyzing Data by Using Amazon Redshift

Uploaded by

Bernadi Beltran Canovas

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 22

1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

[Version 1.0.23]

Storing and Analyzing Data by Using

Amazon Redshift
Lab overview and objectives
Of the members of the data science team, Mary most often works with large datasets and
columnar data stores. You have decided to explore using Amazon Redshift for your next proof of
concept (POC) to address these use cases for her.
You ask Mary if she has a sample dataset that you can use for your POC. She suggests that you
use a large music ticket sales dataset that she has worked with before. The dataset is available
from a public Amazon Simple Storage Service (Amazon S3) bucket.
Data analysts often work with large datasets. For example, some data warehouses can contain
multiple petabytes of data, and data lakes can be even larger. Amazon Redshift is designed to
handle large datasets. Unlike traditional relational database management systems, Amazon
Redshift uses columnar data storage. Columnar storage improves query performance and saves
on storage costs.
In this lab, you will learn how to implement a Redshift cluster as part of your analytics pipeline.
You will retrieve data from a dataset that is stored in Amazon S3, extract the data to a Redshift
database, and then query the data for analysis.
After completing this lab, you should be able to do the following:
Review an AWS Identity and Access Management (IAM) role's permissions to access and
configure Amazon Redshift.
Create and configure a Redshift cluster.
Create a security group for a Redshift cluster.
Create a database, schemas, and tables with the Redshift cluster.
Load data into tables in a Redshift cluster.
Query data within a Redshift cluster by using the Amazon Redshift console.
Query data within a Redshift cluster by using the API and AWS Command Line Interface (AWS
CLI) in AWS Cloud9.
Review an IAM policy with permissions to run queries on a Redshift cluster.
Confirm that a data science team member can run queries on a Redshift cluster.

Duration
This lab requires 90 minutes to complete. The lab will remain active for 180 minutes so that you
can complete all steps.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 1/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

AWS service restrictions

In this lab environment, access to AWS services and service actions might be restricted to the
ones that are needed to complete the lab instructions. You might encounter errors if you attempt
to access other services or perform actions beyond the ones that are described in this lab.

Scenario
In this lab, you will build a POC to use Amazon Redshift to address the need to query large
datasets. You will use SQL queries for a music ticket sales dataset. These data files are already
stored in Amazon S3 in another account, and you will load them into a Redshift database.
The following diagram illustrates the environment that was created for you in AWS at the
beginning of the lab.

Tip: To review the CloudFormation template that built this environment, after logging in to the
AWS Management Console, navigate to the CloudFormation console. In the navigation pane,
choose Stacks.
By the end of the lab, you will have created the additional architecture that is shown in the
following diagram. The table after the diagram provides a detailed explanation of the architecture.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 2/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Numbered
Detail
Task
You will review the MyRedshiftRole IAM role, which has the RedshiftIAMLabPolic
1
policy attached.
You will use the Redshift console to create a Redshift cluster, called redshift-clus
2
the dev database.
3 You will create tables in the Redshift dev database.
You will load the music ticket sales dataset data into the dev database from an S
4
in another account.
5 You will query the dev database by using the Amazon Redshift console.
You will also query the dev database by using the Redshift Data API and the AW
6
the AWS Cloud9 terminal.
You will review the Policy-For-Data-Scientists IAM policy, which is assigned to
7
the Mary IAM user.
You will test the Mary user's ability to access Amazon Redshift with the Policy-F
8 Scientists IAM policy and query the dev database from the AWS CLI in the Clou
terminal.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 3/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Accessing the AWS Management Console

1. At the top of these instructions, choose Start Lab.
The lab session starts.
A timer displays at the top of the page and shows the time remaining in the session.
Tip: To refresh the session length at any time, choose Start Lab again before the timer
reaches 00:00.
Before you continue, wait until the circle icon to the right of the AWS link in the upper-left
corner turns green.
2. To connect to the AWS Management Console, choose the AWS link in the upper-left corner.
A new browser tab opens and connects you to the console.
Tip: If a new browser tab does not open, a banner or icon is usually at the top of your
browser with the message that your browser is preventing the site from opening pop-up
windows. Choose the banner or icon, and then choose Allow pop-ups.

Task 1: Reviewing the IAM role to access and

configure Amazon Redshift
To access the Amazon Redshift console, you must have the appropriate permissions. This access
is controlled through an IAM policy that you are assigned when the lab environment launches.
In addition to having permissions to use Amazon Redshift, you will need to configure Amazon
Redshift to access Amazon S3 on your behalf. This is accomplished with an IAM role, and the role
MyRedshiftRole has been provided in the lab environment.
In this task, you will do the following:
Access IAM and get the ARN for the MyRedshiftRole IAM role.
Review the permissions in the AmazonS3ReadOnlyAccess managed IAM policy.
Review the permissions in the RedshiftIAMLabPolicy IAM policy.
3. Access the IAM role and get the Amazon Resource Name (ARN).
In the AWS Management Console, in the search box to the right of Services, search for
and choose IAM to open the IAM console.
In the navigation pane, choose Roles.
In the roles list, search for MyRedshiftRole and then choose the link for MyRedshiftRole
when it displays.
The Summary section at the top of the page displays the ARN for the role.
Copy the role's ARN to a text editor to use later.
On the lower part of the page, you can see which permissions policies are attached to the
role. Notice that two IAM policies are attached to this role: AmazonS3ReadOnlyAccess,
which is an AWS managed policy, and RedshiftIAMLabPolicy.
4. Review the policies that are associated with the role.
Expand and review the AmazonS3ReadOnlyAccess policy.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 4/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

This policy includes permissions that allow Amazon Redshift to access and read S3
buckets and the objects contained within them.
Now expand and review the RedshiftIAMLabPolicy policy.
This policy was included so that Amazon Redshift can describe Amazon Elastic Compute
Cloud (Amazon EC2) and Amazon Virtual Private Cloud (Amazon VPC) resources. For
example, the permissions in this policy allow Amazon Redshift to get details of the VPC
configuration and the subnets associated with the VPC, and create and delete a network
interface in the VPC.

Task 1 summary
In this task, you reviewed a role that allows Amazon S3 to have read-only access to Amazon
Redshift. You also reviewed a role that allows Amazon Redshift to access Amazon S3, Amazon
EC2, and Amazon VPC resources. Keep these permissions in mind as you complete this lab.

Task 2: Creating and configuring a Redshift cluster

Clusters are the main infrastructure component of an Amazon Redshift data warehouse. A cluster
is made up of one or more compute nodes. If a cluster has more than one node, then one of the
nodes is the leader node, and the other nodes are compute nodes. Client applications interact
with the leader node.
The following diagram illustrates the overall Amazon Redshift infrastructure:

In this task, you will do the following:

Access the Amazon Redshift console.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 5/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Create and configure a new Redshift cluster.

5. Access Amazon Redshift and initiate the Redshift cluster creation process.
In the search box to the right of Services, search for and choose Amazon Redshift to
open the Amazon Redshift console.
Note: If you see an error that states that you are not authorized, ignore the error. This is
because of security restrictions in the lab environment.
In the navigation pane, choose Clusters.
Note: If the navigation pane is collapsed, choose the menu icon in the upper-left corner
of the page to open the pane.
Choose Create cluster.
6. Configure the following settings for the cluster.
Note: Some settings are prefilled with the correct value.
In the Cluster configuration section, configure the following:
Cluster identifier: Enter redshift-cluster-1
Choose Production.
Node type: Choose dc2.large.
Number of nodes: Enter 2
Note: The Configuration summary section indicates the cost of the configuration
per month and the total compressed storage that is available to the nodes.
In the Database configurations section, configure the following:
Admin user name: Enter awsuser
Admin user password: Enter Passw0rd1
Note: The password includes a zero, not a capital letter O.
To ensure that the password is correct, select Show password.
In the Cluster permissions section, configure the following:
Choose Manage IAM roles > Associate IAM roles.
In the pop-up window, select MyRedshiftRole.
Choose Associate IAM roles.
On the cluster creation page, in the Associated IAM roles section, select
MyRedshiftRole and verify that (1/1) displays to the right of the Associated IAM
roles heading.
At the bottom of the page, choose Create cluster.
Wait a few minutes for the Amazon Redshift to create the cluster.
At the bottom of the page, in the Clusters section, choose the link for the redshift-
cluster-1 cluster.
Note: Ignore any messages about changing the size of the instance to improve
performance or using query editor v2.
Now you will configure the VPC that hosts your Redshift cluster so that it allows traffic through
port 5439 (the default port for Amazon Redshift). To do this, you will configure a security group for
your VPC and add the Redshift cluster to this security group.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 6/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Note: Amazon Redshift is deployed in a three-tier architecture. The cluster is placed within a
private subnet of your VPC with web application servers in the public subnet. Users access the
web application through a NAT gateway. The Redshift cluster is not available to the public internet,
but port 5439 is configured within the security group to allow the application servers to access the
Redshift database. You could add a bastion host to the public subnet to manage the cluster
through SSH.
7. Create the security group and configure access to the Redshift database.
In the search box to the right of Services, search for and choose EC2 to open the
Amazon EC2 console.
In the navigation pane, under Network & Security, choose Security Groups.
Choose Create security group, and configure the following:
Basic details section:
Security group name: Enter Redshift security group
Description: Enter Security group for my Redshift cluster
Inbound rules section:
Choose Add rule.
Type: Choose Redshift. Note that the protocol and port range automatically
update to TCP and 5439.
Source: Choose Anywhere-IPv4. Note that 0.0.0.0/0 is automatically added.
Description: Enter Redshift inbound rule
At the bottom of the page, choose Create security group.
You now need to add the cluster to your security group.
8. Configure your Redshift cluster.
Navigate to the Amazon Redshift console.
In the navigation pane, choose Clusters.
Important: Before proceeding to the next step, check that the Status is Available for the
redshift-cluster-1 cluster. It might take up to 5 minutes for the cluster to be available after
creation. To refresh the status, choose the refresh icon in the upper-right corner of this
section.
When the Status is Available for the cluster that you created, choose the link for the
redshift-cluster-1 cluster.
Choose the Properties tab.
In the Network and security settings section, choose Edit.
For VPC security groups, select Redshift security group AND clear default.
Choose Save changes.
Notice that the Status of the cluster changes to Modifying.
Important: Wait about 5 minutes for the Status to change to Available before
proceeding to the next step. To refresh the status, refresh the webpage using your
browser's refresh function.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 7/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Task 2 summary
In this task, you created a Redshift cluster and associated the MyRedshiftRole IAM role to it. This
role allows Amazon Redshift to access Amazon S3 resources to load a dataset. You also created a
security group for the cluster so that other resources, such as Amazon EC2 instances, can access
it through port 5439. You then added the cluster to the VPC by associating it with the security
group that you created.

Task 3: Creating tables in a database

Now that you have created a Redshift cluster for your POC, you can load data into it directly from
Amazon S3.
Note: Amazon Redshift can store data from a variety of formats, including the following:
Text
Apache Avro
JSON
Apache Parquet
Optimized Row Columnar (ORC)
Mary advises you that the music dataset is composed of simple text files. Some of the files use
the pipe ( | ) character as a delimiter, and other files use tab ( \t ) as a delimiter. To load this data,
you will first need to create a few tables (users, date, and sales) with the CREATE TABLE
command. Then, you will copy data into each table with the COPY command.
In this task, you will do the following:
Connect to the Redshift cluster.
Create the users, date, and sales tables in the dev database.
Load data into these tables from text files that are stored in Amazon S3.
9. Connect to the Redshift cluster and access the query editor.
In the navigation pane, choose Clusters.
Select the redshift-cluster-1 cluster, and choose Query data > Query in query editor.
Note: You will see two options here, Query in query editor and Query in query editor v2.
In this lab you will use Query in query editor, which uses version 1 of the Redshift query
editor.
Choose Connect to database and configure the following:
Connection: Choose Create a new connection.
Authentication: Choose Temporary credentials.
Database name: Enter dev
Database user: Enter awsuser
Note: If you receive an error message that says the query editor is not available for
the cluster that you selected, refresh the page and try to connect to the database
again. The connection will eventually be established.
In the Resources pane of the query editor, for Select schema, choose public.
https://awsacademy.instructure.com/courses/96839/modules/items/8946932 8/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

10. To create the users table, copy and paste the following text into a query tab, and choose Run:

create table users(

userid integer not null distkey sortkey,
username char(8),
city varchar(30),
state char(2),
likesports boolean,
liketheatre boolean,
likeconcerts boolean,
likejazz boolean,
likeclassical boolean,
likeopera boolean,
likerock boolean,
likevegas boolean,
likebroadway boolean,
likemusicals boolean);

After the query completes, the Query results tab on the lower part of the page displays
Completed.
11. To create the date table, copy and paste the following text into a new query tab, and choose
Run:

create table date(

dateid smallint not null distkey sortkey,
caldate date not null,
day character(3) not null,
week smallint not null,
month character(5) not null,
qtr character(5) not null,
year smallint not null,
holiday boolean default('N'));

12. To create the sales table, copy and paste the following text into a new query tab, and choose
Run:

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 9/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

create table sales(

salesid integer not null,
listid integer not null distkey,
sellerid integer not null,
buyerid integer not null,
eventid integer not null,
dateid smallint not null sortkey,
qtysold smallint not null,
pricepaid decimal(8,2),
commission decimal(8,2),
saletime timestamp);

Task 3 summary
In this task, you accessed the Amazon Redshift console and created the users, date, and sales
tables in the dev database.
Now it is time to add data to these tables.

Task 4: Loading data from Amazon S3

To load the data from the S3 bucket, you will use the COPY command three times, once for each
table in the dev database.
13. To load data to the users table, run the following query. Replace <REDSHIFT-ROLE-ARN>
with the ARN that you copied for the MyRedshiftRole IAM role earlier in the lab:

copy users from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/allusers_tab.txt'

credentials 'aws_iam_role=<REDSHIFT-ROLE-ARN>'
delimiter '\t' region 'us-west-2';

After you replace the ARN, the query will look similar to the following:

copy users from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/allusers_tab.txt'

credentials 'aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftRole'
delimiter '\t' region 'us-west-2';

After the query completes, the Query results tab on the lower part of the page displays
Completed.
This query copied text from a tab-delimited file called allusers_tab.txt, which is located in an
S3 bucket, into the users table. To accomplish this, Amazon Redshift assumed the
MyRedshiftRole so that it had the appropriate IAM permissions to complete the action.
Tip: To see the data that was loaded into the table, in the Resources pane, to the right of
users, choose the ellipsis (three dot) icon, and choose Preview data. The data displays on
the Table details tab on the lower part of the page.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 10/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

14. To load data to the date table, run the following query. Replace <REDSHIFT-ROLE-ARN> with
the ARN that you copied for the MyRedshiftRole IAM role earlier in the lab:

copy date from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/date2008_pipe.txt'

credentials 'aws_iam_role=<REDSHIFT-ROLE-ARN>'
delimiter '|' region 'us-west-2';

After you replace the ARN, the query will look similar to the following:

copy date from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/date2008_pipe.txt'

credentials 'aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftRole'
delimiter '|' region 'us-west-2';

This query copied text from a pipe-delimited file called date2008_pipe.txt, which is located in
an S3 bucket, into the date table.
15. To load data to the sales table, run the following query. Replace <REDSHIFT-ROLE-ARN> with
the ARN that you copied for the MyRedshiftRole IAM role earlier in the lab:

copy sales from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/sales_tab.txt'

credentials 'aws_iam_role=<REDSHIFT-ROLE-ARN>'
delimiter '\t' timeformat 'MM/DD/YYYY HH:MI:SS' region 'us-west-2';

After you replace the ARN, the query will look similar to the following:

copy sales from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/sales_tab.txt'

credentials 'aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftRole'
delimiter '\t' timeformat 'MM/DD/YYYY HH:MI:SS' region 'us-west-2';

This query copied text from a tab-delimited file called sales_tab.txt, which is located in an S3
bucket, into the sales table. The text dataset includes time data, so the query specifies the
timeformat parameter. This ensures that each timestamp in the records is formatted
accordingly.

Task 4 summary
In this task, you loaded a dataset into your Redshift cluster. To accomplish this, you copied data
from text files in Amazon S3 into the tables by using the Amazon Redshift query editor and the
SQL COPY command. You learned how to configure a delimiter in the COPY command (as an
example, you used \t as the delimiter for the first query) to ensure that data is loaded properly and
each record matches the schema for its table.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 11/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Task 5: Querying the data

Now that the dataset is imported successfully into the Redshift cluster, you can write queries to
generate the reports that Mary needs. You ask Mary what typical queries on this data look like so
that you can use them in your POC.
Mary provides a query that provides total number of items that were sold on a particular date.
Before you try the full query, check that a simple scan works on the data.
16. Preview the sales table.
In the Resources pane, to the right of sales, choose the ellipsis (three dot) icon, and choose
Preview data.
The data displays on the Table details tab on the lower part of the page and looks similar to
the following:

Now that everything is configured correctly, you can query the cluster by using Marys first query.
17. Run the following query in the query editor:

SELECT sum(qtysold)
FROM sales, date
WHERE sales.dateid = date.dateid
AND caldate = '2008-01-05';

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 12/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

The query returns one row with the value 210. This is the total number of items that were sold
on a particular date.
Mary also sent you a query to find the top 10 buyers by quantity. Try running that query.
18. Run the following query in the query editor:

SELECT username, total_quantity

FROM
(SELECT buyerid, sum(qtysold) total_quantity
FROM sales
GROUP BY buyerid
ORDER BY total_quantity desc limit 10) Q, users
WHERE Q.buyerid = userid
ORDER BY Q.total_quantity desc;

The query returns customer usernames and the total quantity that was sold to each customer,
ordered by the total_quantity field and limited to 10 results, similar to the following
screenshot.

You share the query results with Mary, and she is impressed with the speed. She tries the queries
in AWS herself and is happy with the results.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 13/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Task 5 summary
In this task, you used SQL to perform queries on the Redshift database. Querying a Redshift
database is a broad and deep topic, so you only got an introduction here by using some example
queries. Consider using Amazon Redshift for your data analysis use cases because it uses SQL
queries. This might help a team to adopt the service and optimize their workflow quickly.

Task 6: Running queries from the AWS CLI

In addition to using the console to run queries on data in Amazon Redshift, you can also use the
Amazon Redshift API, AWS SDK libraries, and AWS CLI to perform actions.
In this task, you will use the AWS Cloud9 terminal to perform actions against your Redshift cluster.
19. Navigate to the AWS Cloud9 integrated development environment (IDE).
In the AWS Management Console, in the search box next to Services, search for and
choose Cloud9 to open the AWS Cloud9 console.
AWS Cloud9 environments are listed.
For the environment named Cloud9 Instance, choose Open IDE.
A new browser tab opens and displays the AWS Cloud9 IDE.
First, you need a database user. You will use the awsuser user, which you used previously. For the
first query, you will select one record from the users table in the dev database.
20. Query records from the dev database.
Run the following command in the AWS Cloud9 terminal.

aws redshift-data execute-statement --region us-east-1 --db-user awsuser --cluster-identifier

redshift-cluster-1 --database dev --sql "select * from users limit 1"

The output is similar to the following:

{
"ClusterIdentifier": "redshift-cluster-1",
"CreatedAt": 1649864498.714,
"Database": "dev",
"DbUser": "awsuser",
"Id": "ce0ec95c-596e-4cd9-86d4-5d9847dacd94"
}

Notice the information that is provided in the output:

ClusterIdentifier: Identifies the Redshift cluster that you ran the query on
CreatedAt: Gives the time that the query was created
Database: Identifies the database that you ran the query on
DbUser: Identifies the user who ran the query

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 14/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Id: Gives the ID for the query

Note: The DbUser will never be an IAM user. Amazon Redshift requires a database
admin user, which is what you used to run the query.
Note: To get more information about the query, including the result, you can use the
query ID to retrieve information by using the AWS CLI or by using the AWS SDK
within your application.
Copy the Id value to a text editor to use in the next command.
21. To retrieve the results of the query, run the following get-statement-result command. Replace
<QUERY-ID> with the query ID that you copied.

aws redshift-data get-statement-result --id <QUERY-ID> --region us-east-1

The output is similar to the following:

{
"Records": [
[
{
"longValue": 3
},
{
"stringValue": "IFT66TXU"
},
{
"stringValue": "Lars"
},
{
"stringValue": "Ratliff"
},
{
"stringValue": "High Point"
},
{
"stringValue": "ME"
},
{
"stringValue": "[email protected]"
},
{
"stringValue": "(624) 767-2465"
},
{
"stringValue": "true"
},
{
"stringValue": "false"
},
{
"isNull": true
},
{
"stringValue": "false"
},
{
"isNull": true
},
{
"stringValue": "false"
https://awsacademy.instructure.com/courses/96839/modules/items/8946932 15/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift
},
{
"stringValue": "true"
},
{
"isNull": true
},
{
"isNull": true
},
{
"stringValue": "true"
}
]
],
"ColumnMetadata": [
{
"isCaseSensitive": false,
"isCurrency": false,
"isSigned": true,
"label": "userid",
"length": 0,
"name": "userid",
"nullable": 0,
"precision": 10,
"scale": 0,
"schemaName": "public",
"tableName": "users",
"typeName": "int4"
},
... # Note: You will see more information in this response.
],
"TotalNumRows": 1
}

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 16/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

This is the JSON equivalent to issuing the query directly in the query editor.

Task 6 summary
In this task, you learned how to use the AWS CLI to query a Redshift database.
Using the console to submit queries is helpful to discover data and test. However, your
applications or background processes will likely need programmatic access to perform actions
with AWS services. Therefore, it is important to understand how queries are processed in the AWS
backend, such as receiving a JSON response for a query. From there, you can do further
processing; for example, displaying the result to a user in an app. You could also store the result
for another workload to process later.

Task 7: Reviewing the IAM policy for Amazon

Redshift access
Now that you have tested your ability to query the Redshift database, you will review an IAM
policy that allows other users to perform Amazon Redshift actions in production.
22. Review the Policy-For-Data-Scientists IAM policy, which was created for you.
Navigate to the IAM console.
In the navigation pane, choose Users.
Notice the Mary user that is listed. This user is part of the DataScienceGroup IAM group.
Choose the link for the DataScienceGroup IAM group.
Choose the Permissions tab.
The policy that is attached to this group is displayed.
Choose the link for the Policy-For-Data-Scientists policy.
The permissions for this policy display. Review the permissions.
Note: The permissions provide limited access to the Redshift Data API.
Tip: To look more closely at the policy details, choose { } JSON. The JSON version of the
policy shows the details for the allowed and denied actions.

Task 7 summary
In this task, you reviewed the IAM policy that is attached to the DataScienceGroup IAM group. The
policy contains permissions for limited access to the Redshift Data API. The policy can be
assigned to users who intend to use the API to query a Redshift database. As with all services in
AWS, a user must have the appropriate permissions to perform any action.
Now that you have reviewed an example policy, you will be able to apply permissions accordingly
for the Redshift Data API and other use cases.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 17/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Task 8: Confirming that users can run queries on the

Redshift database
Now you will confirm that Mary can run queries on the Redshift database. As an administrator
user, occasionally you might need to confirm that your team members have the appropriate
access to AWS resources and can perform certain tasks.
In the following steps, you will determine if Mary can access the data in Redshift. To test her
access, you will issue AWS CLI commands to query the database and retrieve the results. When
you issue the commands, you will pass the Mary user credentials as bash variables with the
commands. The API will then try to perform the commands as the specified user.
23. Retrieve the credentials for the Mary IAM user, and store these as bash variables.
In the search box next to Services, search for and choose CloudFormation.
In the navigation pane, choose Stacks.
Choose the link for the stack that created the lab environment. The stack name includes a
random string of letters and numbers, and the stack should have the oldest creation time.
On the stack details page, choose the Outputs tab.
Note: When you create a CloudFormation template, you can choose to output
information about the resources that the template will create. The CloudFormation
template that created the resources in your lab environment output the access key and
secret access key for the mary user.
Copy the value of MarysAccessKey to your clipboard.
Return to the AWS Cloud9 terminal.
To create a variable for the access key, run the following command. Replace <ACCESS-
KEY> with the value from your clipboard.

AK=<ACCESS-KEY>

Return to the CloudFormation console, and copy the value of MarysSecretAccessKey to

your clipboard.
Return to the AWS Cloud9 terminal.
To create a variable for the secret access key, run the following command. Replace
<SECRET-ACCESS-KEY> with the value from your clipboard.

SAK=<SECRET-ACCESS-KEY>

24. Use the execute-statement command in the Redshift Data API to query the database.
Run the following command in the AWS Cloud9 terminal.

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws redshift-data execute-

statement --region us-east-1 --db-user awsuser --cluster-identifier redshift-cluster-1 --database
dev --sql "select * from users limit 1"

The output is similar to the following and looks like the output that was displayed after
you ran the command earlier:

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 18/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

{
"ClusterIdentifier": "redshift-cluster-1",
"CreatedAt": 1649864498.714,
"Database": "dev",
"DbUser": "awsuser",
"Id": "ce0ec95c-596e-4cd9-86d4-5d9847dacd94"
}

Now, test the Mary user ability to get the results of the query by using the Redshift Data API get-
statement-result command.
25. To retrieve the query results, copy the following command, replace the placeholder with the
Id in the response above, then run the command in the terminal:

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws redshift-data get-statement-

result --id <QUERY-ID> --region us-east-1

The output is similar to the following and looks like the output that was displayed after you ran
the command earlier:

{
"Records": [
[
{
"longValue": 3
},
{
"stringValue": "IFT66TXU"
},
{
"stringValue": "Lars"
},
{
"stringValue": "Ratliff"
},
{
"stringValue": "High Point"
},
{
"stringValue": "ME"
},
{
"stringValue": "[email protected]"
},
{
"stringValue": "(624) 767-2465"
},
{
"stringValue": "true"
},
{
"stringValue": "false"
},
{
"isNull": true
},
https://awsacademy.instructure.com/courses/96839/modules/items/8946932 19/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift
{
"stringValue": "false"
},
{
"isNull": true
},
{
"stringValue": "false"
},
{
"stringValue": "true"
},
{
"isNull": true
},
{
"isNull": true
},
{
"stringValue": "true"
}
]
],
"ColumnMetadata": [
{
"isCaseSensitive": false,
"isCurrency": false,
"isSigned": true,
"label": "userid",
"length": 0,
"name": "userid",
"nullable": 0,
"precision": 10,
"scale": 0,
"schemaName": "public",
"tableName": "users",
"typeName": "int4"
},
... # Note: You will see more information in this response.
],
"TotalNumRows": 1
}

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 20/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Task 8 summary
The results from this task confirm that Mary has the ability to run SQL statements on a Redshift
database by using the AWS CLI. This will simplify her work so that she can address other
challenges, such as automating tasks in Amazon Redshift and building applications that use
Redshift queries by using the Amazon Redshift SDK.

Update from the team

Congratulations! You have learned how to create a Redshift database and load data into it from a
dataset that is stored in Amazon S3.
Mary and the data science team are happy with the workflow that you created by using Amazon
Redshift. By using this workflow, the team can use AWS services in a way that follows best
practices and simplifies their workloads.

Submitting your work

26. To record your progress, choose Submit at the top of these instructions.
27. When prompted, choose Yes.
After a couple of minutes, the grades panel appears and shows you how many points you
earned for each task. If the results do not display after a couple of minutes, choose Grades at
the top of these instructions.
Important: Some of the checks made by the submission process in this lab will only give
you credit if it has been at least 5 minutes since you completed the action. If you do not
receive credit the first time you submit, you may need to wait a couple minutes and the
submit again to receive credit for these items.
Tip: You can submit your work multiple times. After you change your work, choose Submit
again. Your last submission is recorded for this lab.
28. To find detailed feedback about your work, choose Submission Report.

Lab complete
Congratulations! You have completed the lab.
29. At the top of this page, choose End Lab, and then choose Yes to confirm that you want to
end the lab.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 21/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

A message panel indicates that the lab is terminating.

30. To close the panel, choose Close in the upper-right corner.

Additional resources
For more information about the services and concepts that were covered in this lab, see the
following resources:
Columnar Storage
Authorizing Amazon Redshift to Access Other AWS Services on Your Behalf
Amazon Redshift System and Architecture Overview
Amazon Redshift Clusters
Security Group Rules for Different Use Cases
Modular Architecture for Amazon Redshift
Tools to Build on AWS
Cloud9 User Guide
Actions, Resources, and Condition Keys for Amazon Redshift Data API
Using the Amazon Redshift Data API
Amazon Redshift Data API Reference

© 2022, Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be
reproduced or redistributed, in whole or in part, without prior written permission from Amazon
Web Services, Inc. Commercial copying, lending, or selling is prohibited.

https://awsacademy.instructure.com/courses/96839/modules/items/8946932 22/22

CadScriptingLanguages Skill
50% (2)
CadScriptingLanguages Skill
10 pages
Amazon Web Services (AWS) Interview Questions and Answers
From Everand
Amazon Web Services (AWS) Interview Questions and Answers
Tech Interviews
4.5/5 (3)
Gangboard Admin: Amazon Redshift Interview Questions and Answers
No ratings yet
Gangboard Admin: Amazon Redshift Interview Questions and Answers
112 pages
AWS Certified Solutions Architect - Professional
From Everand
AWS Certified Solutions Architect - Professional
VB Dev
No ratings yet
AWS Certified Cloud Practitioner - Practice Paper 4: AWS Certified Cloud Practitioner, #4
From Everand
AWS Certified Cloud Practitioner - Practice Paper 4: AWS Certified Cloud Practitioner, #4
Tech Interviews
No ratings yet
AWS Solution Architect Certification Exam Practice Paper 2019
From Everand
AWS Solution Architect Certification Exam Practice Paper 2019
Tech Interviews
3.5/5 (3)
Installation Readme
No ratings yet
Installation Readme
5 pages
Flowgorithm PDF
100% (1)
Flowgorithm PDF
56 pages
Training MeasurLink 8
No ratings yet
Training MeasurLink 8
17 pages
OpenText Imaging Enterprise Scan 10.5.0 - Installation Guide English (CLES100500-IGD-En)
100% (1)
OpenText Imaging Enterprise Scan 10.5.0 - Installation Guide English (CLES100500-IGD-En)
46 pages
Operating System Laboratory - Lab Manual
No ratings yet
Operating System Laboratory - Lab Manual
22 pages
Amazon Redhsift
No ratings yet
Amazon Redhsift
25 pages
Amazon Redshift: Getting Started Guide
No ratings yet
Amazon Redshift: Getting Started Guide
34 pages
Amazon Red Shift
No ratings yet
Amazon Red Shift
54 pages
Getting Started With Amazon Redshift
No ratings yet
Getting Started With Amazon Redshift
51 pages
Amazon Redshift-Lab
100% (1)
Amazon Redshift-Lab
14 pages
Amazon AWS Redshift Overview
No ratings yet
Amazon AWS Redshift Overview
3 pages
Amazon Redshift论文
No ratings yet
Amazon Redshift论文
13 pages
Amazon Redshift Interview Questions
100% (1)
Amazon Redshift Interview Questions
4 pages
Redshift-DA Handout
No ratings yet
Redshift-DA Handout
121 pages
AWS Certified Cloud Practitioner - Practice Paper 2: AWS Certified Cloud Practitioner, #2
From Everand
AWS Certified Cloud Practitioner - Practice Paper 2: AWS Certified Cloud Practitioner, #2
Tech Interviews
5/5 (2)
Step by Step: Fault-tolerant, Scalable, Secure AWS Web Stack
From Everand
Step by Step: Fault-tolerant, Scalable, Secure AWS Web Stack
Savitra Sirohi
No ratings yet
AWS Solutions Architect Certification Case Based Practice Questions Latest Edition 2023
From Everand
AWS Solutions Architect Certification Case Based Practice Questions Latest Edition 2023
Exam OG
No ratings yet
introductiontoamazonredshiftwebinar-130322140336-phpapp01
No ratings yet
introductiontoamazonredshiftwebinar-130322140336-phpapp01
32 pages
AWS Cloud Practitioner Exam Success Kit
From Everand
AWS Cloud Practitioner Exam Success Kit
SUJAN
No ratings yet
Deep Dive and Best Practices For Amazon Redshift ANT418
100% (1)
Deep Dive and Best Practices For Amazon Redshift ANT418
85 pages
AWS Certified Solutions Architect - Associate Exam Prep kit
From Everand
AWS Certified Solutions Architect - Associate Exam Prep kit
SUJAN
No ratings yet
Amazon Redshift
No ratings yet
Amazon Redshift
5 pages
Mastering Amazon Web Services: Essential AWS Techniques
From Everand
Mastering Amazon Web Services: Essential AWS Techniques
Ed A Norex
No ratings yet
Mastering Amazon Web Services: Comprehensive Techniques for AWS Success
From Everand
Mastering Amazon Web Services: Comprehensive Techniques for AWS Success
Adam Jones
No ratings yet
Amazon Redshift - Analyze Data Across Your Lake House with Amazon Redshift
No ratings yet
Amazon Redshift - Analyze Data Across Your Lake House with Amazon Redshift
48 pages
Aws (S3, Iam, Ec2, Emr and Redshift)
100% (1)
Aws (S3, Iam, Ec2, Emr and Redshift)
16 pages
Securing Amazon Web Services
From Everand
Securing Amazon Web Services
Jay Schulman
3.5/5 (2)
AWS - Interview Questions and Answers
50% (4)
AWS - Interview Questions and Answers
112 pages
T15-AWSAnalyticsAndAI-ProblemStatement-Mocktest
No ratings yet
T15-AWSAnalyticsAndAI-ProblemStatement-Mocktest
14 pages
Introduction to Using Redshift
No ratings yet
Introduction to Using Redshift
1 page
Salesforce Developer Interview Questions: 1.0, #1
From Everand
Salesforce Developer Interview Questions: 1.0, #1
SFDC TELUGU
No ratings yet
The Analyst's Guide To Amazon Redshift: Periscope Data Presents
No ratings yet
The Analyst's Guide To Amazon Redshift: Periscope Data Presents
17 pages
Data Warehouse
No ratings yet
Data Warehouse
42 pages
Amazon Web Service: From Basics to Expert Proficiency
From Everand
Amazon Web Service: From Basics to Expert Proficiency
William Smith
No ratings yet
Redshift DG
No ratings yet
Redshift DG
735 pages
Redshift Dg (4)
No ratings yet
Redshift Dg (4)
733 pages
AWS SysOps Administrator Associate: From basic to advanced
From Everand
AWS SysOps Administrator Associate: From basic to advanced
Alex Carvalho
No ratings yet
Migrate Your On-Premise Data Warehouse To Amazon Redshift: Noman Jaffery
100% (1)
Migrate Your On-Premise Data Warehouse To Amazon Redshift: Noman Jaffery
18 pages
AWS Cloud Practitioner Study Guide & Practice Tests
From Everand
AWS Cloud Practitioner Study Guide & Practice Tests
SUJAN
No ratings yet
Mastering Amazon Redshift: Scalable Cloud Data Warehousing
From Everand
Mastering Amazon Redshift: Scalable Cloud Data Warehousing
Robert Johnson
No ratings yet
Redshift DG PDF
100% (1)
Redshift DG PDF
1,161 pages
Amazon Redshift: Database - PRN NO-2017BTECS00041
No ratings yet
Amazon Redshift: Database - PRN NO-2017BTECS00041
9 pages
Amazon Redshift
No ratings yet
Amazon Redshift
5 pages
Amazon Redshift is a fully managed data warehouse service in the cloud that allows you to analyze large amounts of data quickly
No ratings yet
Amazon Redshift is a fully managed data warehouse service in the cloud that allows you to analyze large amounts of data quickly
2 pages
Redshift DG
No ratings yet
Redshift DG
871 pages
AWS Certified Solutions Architect Associate Exam Insights : Q&A with Explanations
From Everand
AWS Certified Solutions Architect Associate Exam Insights : Q&A with Explanations
SUJAN
No ratings yet
AWS Fully Loaded: Mastering Amazon Web Services for Complete Cloud Solutions
From Everand
AWS Fully Loaded: Mastering Amazon Web Services for Complete Cloud Solutions
Kameron Hussain
No ratings yet
AWS Cloud Practitioner: From Basic to Advanced
From Everand
AWS Cloud Practitioner: From Basic to Advanced
Alex Carvalho
No ratings yet
Amazon Red Shift
No ratings yet
Amazon Red Shift
17 pages
Elements of Android Room
From Everand
Elements of Android Room
Mark Murphy
No ratings yet
AWS Administration ??? The Definitive Guide: Learn to design, build, and manage your infrastructure on the most popular of all the Cloud platforms - Amazon Web Services
From Everand
AWS Administration ??? The Definitive Guide: Learn to design, build, and manage your infrastructure on the most popular of all the Cloud platforms - Amazon Web Services
Yohan Wadia
4.5/5 (3)
Amazon Redshift Serverless - Amazon Redshift
No ratings yet
Amazon Redshift Serverless - Amazon Redshift
13 pages
Mastering the Art of Cloud Computing with AWS: Unraveling the Secrets of Expert-Level Programming
From Everand
Mastering the Art of Cloud Computing with AWS: Unraveling the Secrets of Expert-Level Programming
Steve Jones
No ratings yet
Amazon Redshift Database Developer Guide
No ratings yet
Amazon Redshift Database Developer Guide
783 pages
AWS Cloud Automation: Harnessing Terraform For AWS Infrastructure As Code
From Everand
AWS Cloud Automation: Harnessing Terraform For AWS Infrastructure As Code
Rob Botwright
No ratings yet
Handout Accelerate Your Analytics and AI With Amazon SageMaker Lakehouse
No ratings yet
Handout Accelerate Your Analytics and AI With Amazon SageMaker Lakehouse
45 pages
AWS Certified Cloud Practitioner - Practice Paper 3: AWS Certified Cloud Practitioner, #3
From Everand
AWS Certified Cloud Practitioner - Practice Paper 3: AWS Certified Cloud Practitioner, #3
Tech Interviews
5/5 (1)
Amazon Lead2pass AWS-Solution-Architect-Associate Actual Test V2019-Jun-03 by Devin 328q Vce
100% (2)
Amazon Lead2pass AWS-Solution-Architect-Associate Actual Test V2019-Jun-03 by Devin 328q Vce
43 pages
Build DB Server
No ratings yet
Build DB Server
5 pages
AWS Data Engineering Cheatsheet2
No ratings yet
AWS Data Engineering Cheatsheet2
27 pages
Hansshow Model 3Y Center Console Dashboard Touch Screen User Manual (10.25 Inch 4G)
No ratings yet
Hansshow Model 3Y Center Console Dashboard Touch Screen User Manual (10.25 Inch 4G)
23 pages
Computer Fundamentals Assignments
No ratings yet
Computer Fundamentals Assignments
4 pages
FSC3000 Configuration Guide
No ratings yet
FSC3000 Configuration Guide
173 pages
VisualTCAD Ug en
No ratings yet
VisualTCAD Ug en
134 pages
A Guide To The MARIE Machine Simulator Environment
No ratings yet
A Guide To The MARIE Machine Simulator Environment
20 pages
Cisco Switch Best Practices Guide: Table of Contents (After Clicking Link Hit HOME To Return To TOC)
No ratings yet
Cisco Switch Best Practices Guide: Table of Contents (After Clicking Link Hit HOME To Return To TOC)
8 pages
ODI Migration Process
100% (2)
ODI Migration Process
10 pages
360WebPlatform_v2_InstallationGuide_en-US
No ratings yet
360WebPlatform_v2_InstallationGuide_en-US
28 pages
STEP 1: Install JDK/JRE: Environment Variable - PATH
No ratings yet
STEP 1: Install JDK/JRE: Environment Variable - PATH
2 pages
MicroDegree AWS Notes
No ratings yet
MicroDegree AWS Notes
85 pages
Disk Operating System (DOS) 5.1
No ratings yet
Disk Operating System (DOS) 5.1
12 pages
CPTC Aman Sinha
No ratings yet
CPTC Aman Sinha
57 pages
Tutorial Eagle
No ratings yet
Tutorial Eagle
20 pages
Mcaschsyll 2013
No ratings yet
Mcaschsyll 2013
134 pages
User Manual - Tersus GNSS Center - EN - 20200909
No ratings yet
User Manual - Tersus GNSS Center - EN - 20200909
45 pages
Forescout Installation Guide v8.1.x 10-14-2021
No ratings yet
Forescout Installation Guide v8.1.x 10-14-2021
88 pages
Create WIM Image of Windows XP For System Deployment - VMware Wiki PDF
No ratings yet
Create WIM Image of Windows XP For System Deployment - VMware Wiki PDF
41 pages
Windows Server 2016 DFS Namespaces Management Pack For Operations Manager 2016 Guide
No ratings yet
Windows Server 2016 DFS Namespaces Management Pack For Operations Manager 2016 Guide
26 pages
RHCE Exam Syllabus
No ratings yet
RHCE Exam Syllabus
6 pages
Oracle Tuxedo Command Reference
No ratings yet
Oracle Tuxedo Command Reference
226 pages
How Do I Manually Uninstall My Windows ESET Product - ESET Knowledgebase
No ratings yet
How Do I Manually Uninstall My Windows ESET Product - ESET Knowledgebase
16 pages
4.4.7 Lab - Configure Secure Administrative Access - ILM
No ratings yet
4.4.7 Lab - Configure Secure Administrative Access - ILM
22 pages
BMC Remedy Error Message
100% (1)
BMC Remedy Error Message
270 pages
HONEYWELL Experion Software Installation Users Guide
100% (2)
HONEYWELL Experion Software Installation Users Guide
120 pages

Lab_ Storing and Analyzing Data by Using Amazon Redshift

Uploaded by

Lab_ Storing and Analyzing Data by Using Amazon Redshift

Uploaded by

1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift

Storing and Analyzing Data by Using

AWS service restrictions

Accessing the AWS Management Console

Task 1: Reviewing the IAM role to access and

Task 2: Creating and configuring a Redshift cluster

In this task, you will do the following:

Create and configure a new Redshift cluster.

Task 3: Creating tables in a database

create table users(

create table date(

create table sales(

Task 4: Loading data from Amazon S3

copy users from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/allusers_tab.txt'

copy users from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/allusers_tab.txt'

copy date from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/date2008_pipe.txt'

copy date from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/date2008_pipe.txt'

copy sales from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/sales_tab.txt'

copy sales from 's3://aws-tc-largeobjects/CUR-TF-200-ACDSCI-1/Lab4/sales_tab.txt'

Task 5: Querying the data

SELECT username, total_quantity

Task 6: Running queries from the AWS CLI

aws redshift-data execute-statement --region us-east-1 --db-user awsuser --cluster-identifier

The output is similar to the following:

Notice the information that is provided in the output:

Id: Gives the ID for the query

aws redshift-data get-statement-result --id <QUERY-ID> --region us-east-1

The output is similar to the following:

Task 7: Reviewing the IAM policy for Amazon

Task 8: Confirming that users can run queries on the

Return to the CloudFormation console, and copy the value of MarysSecretAccessKey to

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws redshift-data execute-

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws redshift-data get-statement-

Update from the team

Submitting your work

A message panel indicates that the lab is terminating.

You might also like