Lab: Storing and Analyzing Data by Using Amazon Redshift
[Version 1.0.23]
Duration
This lab requires 90 minutes to complete. The lab will remain active for 180 minutes so that you
can complete all steps.
https://awsacademy.instructure.com/courses/96839/modules/items/8946932 1/22
1/2/25, 17:19 Lab: Storing and Analyzing Data by Using Amazon Redshift
Scenario
In this lab, you will build a proof of concept (POC) that uses Amazon Redshift to address the need
to query large datasets. You will run SQL queries against a music ticket sales dataset. The data
files are already stored in Amazon S3 in another account, and you will load them into a Redshift
database.
The following diagram illustrates the environment that was created for you in AWS at the
beginning of the lab.
Tip: To review the CloudFormation template that built this environment, after logging in to the
AWS Management Console, navigate to the CloudFormation console. In the navigation pane,
choose Stacks.
By the end of the lab, you will have created the additional architecture that is shown in the
following diagram. The table after the diagram provides a detailed explanation of the architecture.
Numbered task   Detail
1   You will review the MyRedshiftRole IAM role, which has the RedshiftIAMLabPolicy policy attached.
2   You will use the Redshift console to create a Redshift cluster, called redshift-cluster-1 [...] the dev database.
3   You will create tables in the Redshift dev database.
4   You will load the music ticket sales dataset data into the dev database from an S3 [...] in another account.
5   You will query the dev database by using the Amazon Redshift console.
6   You will also query the dev database by using the Redshift Data API and the AWS [...] the AWS Cloud9 terminal.
7   You will review the Policy-For-Data-Scientists IAM policy, which is assigned to [...] the Mary IAM user.
8   You will test the Mary user's ability to access Amazon Redshift with the Policy-For-Data-Scientists IAM policy and query the dev database from the AWS CLI in the Cloud9 terminal.
This policy includes permissions that allow Amazon Redshift to access and read S3
buckets and the objects contained within them.
Now expand and review the RedshiftIAMLabPolicy policy.
This policy was included so that Amazon Redshift can describe Amazon Elastic Compute
Cloud (Amazon EC2) and Amazon Virtual Private Cloud (Amazon VPC) resources. For
example, the permissions in this policy allow Amazon Redshift to get details of the VPC
configuration and the subnets associated with the VPC, and create and delete a network
interface in the VPC.
Task 1 summary
In this task, you reviewed the policies that are attached to the MyRedshiftRole IAM role. One
policy gives Amazon Redshift read-only access to Amazon S3. The other, RedshiftIAMLabPolicy, lets
Amazon Redshift describe Amazon EC2 and Amazon VPC resources and create and delete network
interfaces in the VPC. Keep these permissions in mind as you complete this lab.
Note: Amazon Redshift is deployed in a three-tier architecture. The cluster is placed within a
private subnet of your VPC with web application servers in the public subnet. Users access the
web application through a NAT gateway. The Redshift cluster is not available to the public internet,
but port 5439 is configured within the security group to allow the application servers to access the
Redshift database. You could add a bastion host to the public subnet to manage the cluster
through SSH.
7. Create the security group and configure access to the Redshift database.
In the search box to the right of Services, search for and choose EC2 to open the
Amazon EC2 console.
In the navigation pane, under Network & Security, choose Security Groups.
Choose Create security group, and configure the following:
Basic details section:
Security group name: Enter Redshift security group
Description: Enter Security group for my Redshift cluster
Inbound rules section:
Choose Add rule.
Type: Choose Redshift. Note that the protocol and port range automatically
update to TCP and 5439.
Source: Choose Anywhere-IPv4. Note that 0.0.0.0/0 is automatically added.
Description: Enter Redshift inbound rule
At the bottom of the page, choose Create security group.
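The inbound rule from this step can also be expressed as data. The following is a minimal Python sketch, assuming the rule shape that the EC2 AuthorizeSecurityGroupIngress API accepts in its IpPermissions list. The helper names redshift_ingress_rule and is_open_to_world are hypothetical, and the sketch only builds and inspects the rule; it does not call AWS.

```python
import ipaddress

REDSHIFT_PORT = 5439  # port that the console fills in for the Redshift rule type


def redshift_ingress_rule(cidr: str, description: str) -> dict:
    """Build one inbound rule that allows TCP 5439 from the given IPv4 CIDR."""
    # ip_network raises ValueError if the CIDR is malformed.
    network = ipaddress.ip_network(cidr)
    return {
        "IpProtocol": "tcp",
        "FromPort": REDSHIFT_PORT,
        "ToPort": REDSHIFT_PORT,
        "IpRanges": [{"CidrIp": str(network), "Description": description}],
    }


def is_open_to_world(rule: dict) -> bool:
    """Return True if any range in the rule is 0.0.0.0/0 (Anywhere-IPv4)."""
    return any(r["CidrIp"] == "0.0.0.0/0" for r in rule["IpRanges"])


rule = redshift_ingress_rule("0.0.0.0/0", "Redshift inbound rule")
print(rule)
print("open to any IPv4 address:", is_open_to_world(rule))
```

The check at the end highlights that the Anywhere-IPv4 source used in this lab accepts connections on port 5439 from any IPv4 address; outside a lab, a narrower CIDR range is usually preferable.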
You now need to add the cluster to your security group.
8. Configure your Redshift cluster.
Navigate to the Amazon Redshift console.
In the navigation pane, choose Clusters.
Important: Before proceeding to the next step, check that the Status is Available for the
redshift-cluster-1 cluster. It might take up to 5 minutes for the cluster to be available after
creation. To refresh the status, choose the refresh icon in the upper-right corner of this
section.
When the Status is Available for the cluster that you created, choose the link for the
redshift-cluster-1 cluster.
Choose the Properties tab.
In the Network and security settings section, choose Edit.
For VPC security groups, select Redshift security group, and clear default.
Choose Save changes.
Notice that the Status of the cluster changes to Modifying.
Important: Wait about 5 minutes for the Status to change to Available before
proceeding to the next step. To refresh the status, refresh the webpage using your
browser's refresh function.
Task 2 summary
In this task, you created a Redshift cluster and associated the MyRedshiftRole IAM role with it.
This role allows Amazon Redshift to access Amazon S3 resources to load a dataset. You also created
a security group so that other resources, such as Amazon EC2 instances, can access the cluster
through port 5439. You then applied that security group to the cluster in its network and security
settings.
10. To create the users table, copy and paste the following text into a query tab, and choose Run:
After the query completes, the Query results tab on the lower part of the page displays
Completed.
11. To create the date table, copy and paste the following text into a new query tab, and choose
Run:
12. To create the sales table, copy and paste the following text into a new query tab, and choose
Run:
Task 3 summary
In this task, you accessed the Amazon Redshift console and created the users, date, and sales
tables in the dev database.
Now it is time to add data to these tables.
After you replace the ARN, the query will look similar to the following:
After the query completes, the Query results tab on the lower part of the page displays
Completed.
This query copied text from a tab-delimited file called allusers_tab.txt, which is located in an
S3 bucket, into the users table. To accomplish this, Amazon Redshift assumed the
MyRedshiftRole so that it had the appropriate IAM permissions to complete the action.
Tip: To see the data that was loaded into the table, in the Resources pane, to the right of
users, choose the ellipsis (three dot) icon, and choose Preview data. The data displays on
the Table details tab on the lower part of the page.
14. To load data to the date table, run the following query. Replace <REDSHIFT-ROLE-ARN> with
the ARN that you copied for the MyRedshiftRole IAM role earlier in the lab:
After you replace the ARN, the query will look similar to the following:
This query copied text from a pipe-delimited file called date2008_pipe.txt, which is located in
an S3 bucket, into the date table.
15. To load data to the sales table, run the following query. Replace <REDSHIFT-ROLE-ARN> with
the ARN that you copied for the MyRedshiftRole IAM role earlier in the lab:
After you replace the ARN, the query will look similar to the following:
This query copied text from a tab-delimited file called sales_tab.txt, which is located in an S3
bucket, into the sales table. The text dataset includes time data, so the query specifies the
timeformat parameter, which tells Amazon Redshift how to parse each timestamp in the records.
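The three load steps differ only in the table, the file, the delimiter, and the optional timeformat. The following Python sketch composes COPY statements of that general shape. It is an illustration under assumptions, not the lab's exact text: the function name build_copy_statement is hypothetical, <BUCKET> and <REDSHIFT-ROLE-ARN> are placeholders, the timeformat value 'auto' is assumed, and any other options that the lab's statements include (such as a region) are omitted.

```python
def build_copy_statement(table, s3_uri, role_arn, delimiter, timeformat=None):
    """Compose a Redshift COPY statement for a delimited text file."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{role_arn}'",
        f"DELIMITER '{delimiter}'",
    ]
    # TIMEFORMAT is only needed when the file contains timestamp columns.
    if timeformat is not None:
        parts.append(f"TIMEFORMAT '{timeformat}'")
    return "\n".join(parts) + ";"


ROLE = "<REDSHIFT-ROLE-ARN>"  # placeholder, replace with the MyRedshiftRole ARN

# r"\t" keeps the backslash so that Redshift receives the two characters \t.
users_copy = build_copy_statement("users", "s3://<BUCKET>/allusers_tab.txt", ROLE, r"\t")
date_copy = build_copy_statement("date", "s3://<BUCKET>/date2008_pipe.txt", ROLE, "|")
sales_copy = build_copy_statement(
    "sales", "s3://<BUCKET>/sales_tab.txt", ROLE, r"\t", timeformat="auto"
)

for statement in (users_copy, date_copy, sales_copy):
    print(statement, end="\n\n")
```

The sketch does no input validation, so it is only suitable for trusted, hand-written arguments.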
Task 4 summary
In this task, you loaded a dataset into your Redshift cluster. To accomplish this, you copied data
from text files in Amazon S3 into the tables by using the Amazon Redshift query editor and the
SQL COPY command. You learned how to configure a delimiter in the COPY command (as an
example, you used \t as the delimiter for the first query) to ensure that data is loaded properly and
each record matches the schema for its table.
Now that everything is configured correctly, you can query the cluster by using Mary's first query.
17. Run the following query in the query editor:
SELECT sum(qtysold)
FROM sales, date
WHERE sales.dateid = date.dateid
AND caldate = '2008-01-05';
The query returns one row with the value 210. This is the total quantity that was sold on
January 5, 2008, which is the caldate value in the query.
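To see how the join and the sum interact, the same query can be tried on a few rows locally. The following Python sketch uses an in-memory SQLite database as a stand-in for Redshift. The table layouts are reduced to the columns that the query touches, the salesid column is an assumption, and the rows are made up, so the total differs from the 210 that the lab dataset returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced tables: only the columns the query needs.
    CREATE TABLE date (dateid INTEGER PRIMARY KEY, caldate TEXT);
    CREATE TABLE sales (salesid INTEGER PRIMARY KEY, dateid INTEGER, qtysold INTEGER);
    INSERT INTO date VALUES (1, '2008-01-04'), (2, '2008-01-05');
    INSERT INTO sales VALUES (1, 1, 4), (2, 2, 2), (3, 2, 3);
""")

# Same query as in the lab step: match each sale to its calendar date,
# keep only sales on 2008-01-05, and add up the quantities.
total = conn.execute("""
    SELECT sum(qtysold)
    FROM sales, date
    WHERE sales.dateid = date.dateid
    AND caldate = '2008-01-05'
""").fetchone()[0]

print(total)  # 5 for the made-up rows above (2 + 3)
```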
Mary also sent you a query to find the top 10 buyers by quantity. Try running that query.
18. Run the following query in the query editor:
The query returns customer usernames and the total quantity that was sold to each customer,
ordered by the total_quantity field and limited to 10 results, similar to the following
screenshot.
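A query of this kind groups sales by buyer, sums the quantities, and keeps the largest totals. The following Python sketch shows one possible way to write it, again with an in-memory SQLite database as a stand-in for Redshift. It is not Mary's query: the buyerid column, the reduced table layouts, and the sample rows are assumptions, and the limit is passed as a parameter so that the small sample can use 2 instead of 10.

```python
import sqlite3

TOP_BUYERS_SQL = """
    SELECT username, sum(qtysold) AS total_quantity
    FROM sales JOIN users ON sales.buyerid = users.userid
    GROUP BY users.userid, username
    ORDER BY total_quantity DESC
    LIMIT ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced tables and made-up rows.
    CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE sales (salesid INTEGER PRIMARY KEY, buyerid INTEGER, qtysold INTEGER);
    INSERT INTO users VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO sales VALUES (1, 1, 2), (2, 1, 1), (3, 2, 5), (4, 3, 1);
""")

rows = conn.execute(TOP_BUYERS_SQL, (2,)).fetchall()
for username, total_quantity in rows:
    print(username, total_quantity)
```

Grouping by userid as well as username keeps two customers apart if they happen to share a username.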
You share the query results with Mary, and she is impressed with the speed. She tries the queries
in AWS herself and is happy with the results.
Task 5 summary
In this task, you used SQL to query the Redshift database. Querying a Redshift database is a broad
and deep topic, so the example queries here are only an introduction. Because Amazon Redshift is
queried with SQL, a team that already knows SQL might be able to adopt the service and optimize
its workflow quickly, which makes Amazon Redshift worth considering for data analysis use cases.
{
"ClusterIdentifier": "redshift-cluster-1",
"CreatedAt": 1649864498.714,
"Database": "dev",
"DbUser": "awsuser",
"Id": "ce0ec95c-596e-4cd9-86d4-5d9847dacd94"
}
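Note that this response contains no query results. The Redshift Data API runs statements asynchronously, so the response only identifies the statement, and the Id is what a later get-statement-result call needs. The following Python sketch, which assumes the response was saved as text, extracts the Id and converts the CreatedAt epoch value to a readable UTC time.

```python
import json
from datetime import datetime, timezone

# The response shown above, pasted as text.
response_text = """
{
    "ClusterIdentifier": "redshift-cluster-1",
    "CreatedAt": 1649864498.714,
    "Database": "dev",
    "DbUser": "awsuser",
    "Id": "ce0ec95c-596e-4cd9-86d4-5d9847dacd94"
}
"""

response = json.loads(response_text)

# The Id identifies the submitted statement in later Data API calls.
statement_id = response["Id"]

# CreatedAt is in seconds since the Unix epoch.
created_at = datetime.fromtimestamp(response["CreatedAt"], tz=timezone.utc)

print(statement_id)
print(created_at.isoformat())
```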
{
"Records": [
[
{
"longValue": 3
},
{
"stringValue": "IFT66TXU"
},
{
"stringValue": "Lars"
},
{
"stringValue": "Ratliff"
},
{
"stringValue": "High Point"
},
{
"stringValue": "ME"
},
{
"stringValue": "[email protected]"
},
{
"stringValue": "(624) 767-2465"
},
{
"stringValue": "true"
},
{
"stringValue": "false"
},
{
"isNull": true
},
{
"stringValue": "false"
},
{
"isNull": true
},
{
"stringValue": "false"
},
{
"stringValue": "true"
},
{
"isNull": true
},
{
"isNull": true
},
{
"stringValue": "true"
}
]
],
"ColumnMetadata": [
{
"isCaseSensitive": false,
"isCurrency": false,
"isSigned": true,
"label": "userid",
"length": 0,
"name": "userid",
"nullable": 0,
"precision": 10,
"scale": 0,
"schemaName": "public",
"tableName": "users",
"typeName": "int4"
},
... # Note: You will see more information in this response.
],
"TotalNumRows": 1
}
This is the JSON equivalent to issuing the query directly in the query editor.
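Because each value arrives wrapped in a typed field, such as longValue, stringValue, or isNull, an application usually flattens the response before using it. The following Python sketch pairs each record with the names in ColumnMetadata. The helper names are hypothetical, and the sample is a shortened stand-in for the response above: only userid appears in the metadata that the lab shows, so the other column names are assumptions.

```python
def field_value(field: dict):
    """Unwrap one Data API field into a plain Python value."""
    if field.get("isNull"):
        return None
    for key in ("longValue", "stringValue", "booleanValue", "doubleValue"):
        if key in field:
            return field[key]
    raise ValueError(f"unsupported field: {field!r}")


def rows_as_dicts(result: dict) -> list:
    """Turn Records plus ColumnMetadata into a list of {column: value} dicts."""
    names = [column["name"] for column in result["ColumnMetadata"]]
    return [
        dict(zip(names, (field_value(field) for field in record)))
        for record in result["Records"]
    ]


# Shortened sample; column names other than userid are assumed for illustration.
sample = {
    "Records": [[
        {"longValue": 3},
        {"stringValue": "IFT66TXU"},
        {"stringValue": "Lars"},
        {"isNull": True},
    ]],
    "ColumnMetadata": [
        {"name": "userid"},
        {"name": "username"},
        {"name": "firstname"},
        {"name": "likesports"},
    ],
    "TotalNumRows": 1,
}

rows = rows_as_dicts(sample)
print(rows)
```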
Task 6 summary
In this task, you learned how to use the AWS CLI to query a Redshift database.
Submitting queries through the console is helpful for exploring data and testing queries. However,
your applications or background processes will likely need programmatic access to AWS services.
Therefore, it is important to understand what a programmatic query returns, such as the JSON
response that you received in this task. From there, you can do further processing; for example,
displaying the result to a user in an app. You could also store the result for another workload to
process later.
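For an application, the two CLI calls from this task become a submit, wait, and fetch sequence. The following Python sketch shows that flow under stated assumptions: it expects a client object with the execute_statement, describe_statement, and get_statement_result methods of the boto3 redshift-data client, the function name run_query is hypothetical, and FakeClient is a stand-in that exists only so the sketch runs without AWS access. Result pagination is omitted.

```python
import time


def run_query(client, sql, *, cluster="redshift-cluster-1", database="dev",
              db_user="awsuser", poll_seconds=1.0, max_polls=60):
    """Submit a statement, poll until it finishes, and return its result."""
    submitted = client.execute_statement(
        ClusterIdentifier=cluster, Database=database, DbUser=db_user, Sql=sql
    )
    statement_id = submitted["Id"]
    for _ in range(max_polls):
        status = client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            return client.get_statement_result(Id=statement_id)
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {statement_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"statement {statement_id} did not finish in time")


class FakeClient:
    """Stand-in used only to exercise run_query without calling AWS."""

    def __init__(self):
        self._statuses = iter(["SUBMITTED", "STARTED", "FINISHED"])

    def execute_statement(self, **kwargs):
        return {"Id": "example-id"}

    def describe_statement(self, Id):
        return {"Status": next(self._statuses)}

    def get_statement_result(self, Id):
        return {"Records": [[{"longValue": 1}]], "TotalNumRows": 1}


result = run_query(FakeClient(), "SELECT 1", poll_seconds=0)
print(result)
```

With real credentials, a client created by boto3.client("redshift-data") could be passed in place of FakeClient.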
Task 7 summary
In this task, you reviewed the IAM policy that is attached to the DataScienceGroup IAM group. The
policy contains permissions for limited access to the Redshift Data API. The policy can be
assigned to users who intend to use the API to query a Redshift database. As with all services in
AWS, a user must have the appropriate permissions to perform any action.
Now that you have reviewed an example policy, you will be able to apply permissions accordingly
for the Redshift Data API and other use cases.
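As an illustration of what limited access to the Redshift Data API can look like, the following Python sketch defines a small policy document and a simplified check against it. The policy is hypothetical and is not the content of the policy reviewed in this task. The check only matches action names in Allow statements; it ignores Deny statements, resources, and conditions, so it does not reproduce how IAM evaluates policies.

```python
from fnmatch import fnmatchcase

# Hypothetical example policy, not the lab's actual policy.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift-data:ExecuteStatement",
                "redshift-data:DescribeStatement",
                "redshift-data:GetStatementResult",
                "redshift-data:ListStatements",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": "*",
        },
    ],
}


def allows(policy: dict, action: str) -> bool:
    """Simplified check: True if an Allow statement names or wildcards the action."""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        # Action names are compared without regard to case.
        if any(fnmatchcase(action.lower(), pattern.lower()) for pattern in actions):
            return True
    return False


print(allows(example_policy, "redshift-data:ExecuteStatement"))
print(allows(example_policy, "redshift:DeleteCluster"))
```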
AK=<ACCESS-KEY>
SAK=<SECRET-ACCESS-KEY>
24. Use the execute-statement command in the Redshift Data API to query the database.
Run the following command in the AWS Cloud9 terminal.
The output is similar to the following and looks like the output that was displayed after
you ran the command earlier:
{
"ClusterIdentifier": "redshift-cluster-1",
"CreatedAt": 1649864498.714,
"Database": "dev",
"DbUser": "awsuser",
"Id": "ce0ec95c-596e-4cd9-86d4-5d9847dacd94"
}
Now, test the Mary user's ability to get the results of the query by using the Redshift Data API
get-statement-result command.
25. To retrieve the query results, copy the following command, replace the placeholder with the
Id in the response above, then run the command in the terminal:
The output is similar to the following and looks like the output that was displayed after you ran
the command earlier:
{
"Records": [
[
{
"longValue": 3
},
{
"stringValue": "IFT66TXU"
},
{
"stringValue": "Lars"
},
{
"stringValue": "Ratliff"
},
{
"stringValue": "High Point"
},
{
"stringValue": "ME"
},
{
"stringValue": "[email protected]"
},
{
"stringValue": "(624) 767-2465"
},
{
"stringValue": "true"
},
{
"stringValue": "false"
},
{
"isNull": true
},
{
"stringValue": "false"
},
{
"isNull": true
},
{
"stringValue": "false"
},
{
"stringValue": "true"
},
{
"isNull": true
},
{
"isNull": true
},
{
"stringValue": "true"
}
]
],
"ColumnMetadata": [
{
"isCaseSensitive": false,
"isCurrency": false,
"isSigned": true,
"label": "userid",
"length": 0,
"name": "userid",
"nullable": 0,
"precision": 10,
"scale": 0,
"schemaName": "public",
"tableName": "users",
"typeName": "int4"
},
... # Note: You will see more information in this response.
],
"TotalNumRows": 1
}
Task 8 summary
The results from this task confirm that Mary has the ability to run SQL statements on a Redshift
database by using the AWS CLI. This will simplify her work so that she can address other
challenges, such as automating tasks in Amazon Redshift and building applications that use
Redshift queries by using the Amazon Redshift SDK.
Lab complete
Congratulations! You have completed the lab.
29. At the top of this page, choose End Lab, and then choose Yes to confirm that you want to
end the lab.
Additional resources
For more information about the services and concepts that were covered in this lab, see the
following resources:
Columnar Storage
Authorizing Amazon Redshift to Access Other AWS Services on Your Behalf
Amazon Redshift System and Architecture Overview
Amazon Redshift Clusters
Security Group Rules for Different Use Cases
Modular Architecture for Amazon Redshift
Tools to Build on AWS
Cloud9 User Guide
Actions, Resources, and Condition Keys for Amazon Redshift Data API
Using the Amazon Redshift Data API
Amazon Redshift Data API Reference
© 2022, Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be
reproduced or redistributed, in whole or in part, without prior written permission from Amazon
Web Services, Inc. Commercial copying, lending, or selling is prohibited.