Data Warehousing Hands-On
Redshift
Scenario
Create a dataset in an Amazon Redshift cluster using a star schema design. Data warehouse databases
commonly use a star schema design, in which a central fact table contains the core data for the
database and several dimension tables provide descriptive attribute information for the fact table. The
fact table joins each dimension table on a foreign key that matches the dimension's primary key.
For this tutorial, you will use a set of five tables based on the Star Schema Benchmark (SSB)
schema. The following diagram shows the SSB data model.
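To make the fact-to-dimension join described above concrete, here is a minimal sketch that runs locally with Python's built-in sqlite3 module rather than on Redshift. It assumes a heavily cut-down version of two SSB tables (a lineorder fact table and a customer dimension table, each with only a few columns) and uses made-up sample rows; the function names are illustrative only.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, one dimension table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension table: descriptive attributes, keyed by a primary key.
        create table customer (
            c_custkey integer primary key,
            c_region  text not null
        );
        -- Fact table: core measures plus a foreign key into the dimension.
        create table lineorder (
            lo_orderkey integer not null,
            lo_custkey  integer not null references customer (c_custkey),
            lo_revenue  integer not null
        );
        -- Made-up sample rows, for illustration only.
        insert into customer values (1, 'ASIA'), (2, 'EUROPE');
        insert into lineorder values (100, 1, 500), (101, 1, 250), (102, 2, 900);
        """
    )
    return conn


def revenue_by_region(conn):
    """Join the fact table to the dimension on the foreign key and aggregate."""
    cursor = conn.execute(
        """
        select c.c_region, sum(lo.lo_revenue)
        from lineorder lo
        join customer c on lo.lo_custkey = c.c_custkey
        group by c.c_region
        order by c.c_region
        """
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_star_schema()))
    # prints [('ASIA', 750), ('EUROPE', 900)]
```

The same join shape applies to the full SSB tables in Redshift: the fact table carries the measures, and each dimension is reached through its key.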
Solution
© LEARNSECTOR
Learnsector
If you already have a cluster that you want to use, you can skip this step. Your cluster should
have at least two nodes. For the exercises in this tutorial, you will use a four-node cluster.
To launch a dc1.large cluster with four nodes, follow the steps shown in the demo, but select
Multi Node for Cluster Type and set Number of Compute Nodes to 4.
Follow the steps to connect to your cluster from a SQL client and test a connection. You do not
need to complete the remaining steps to create tables, upload data, and try example queries.
For the purposes of this tutorial, the first time you create the tables, they will not have sort
keys, distribution styles, or compression encodings.
The sample data for this tutorial is provided in Amazon S3 buckets that give read
access to all authenticated AWS users, so any valid AWS credentials that permit access
to Amazon S3 will work.
a. Create a new text file named loadssb.sql containing the following SQL.
c. Execute the COPY commands either by running the SQL script or by copying and
pasting the commands into your SQL client.
Note
The load operation will take about 10 to 15 minutes for all five tables.
0 row(s) affected.
copy executed successfully
3. Sum the execution time for all five tables, or else note the total script execution time.
You’ll record that number as the load time in the benchmarks table in Step 2, following.
4. To verify that each table loaded correctly, execute the following commands.
The following results table shows the number of rows for each SSB table.
Table name    Rows
LINEORDER     600,037,902
PART          1,400,000
CUSTOMER      3,000,000
SUPPLIER      1,000,000
DWDATE        2,556
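The verification step can be sketched as a small Python helper that generates one count query per table and compares the counts you observe against the expected values listed above. This is only an illustration: the function names are hypothetical, and you would collect the observed counts yourself by running the generated queries in your SQL client.

```python
# Expected row counts, copied from the results table above.
EXPECTED_ROWS = {
    "LINEORDER": 600_037_902,
    "PART": 1_400_000,
    "CUSTOMER": 3_000_000,
    "SUPPLIER": 1_000_000,
    "DWDATE": 2_556,
}


def count_queries():
    """Return one COUNT(*) statement per SSB table, in the order listed above."""
    return [f"select count(*) from {name};" for name in EXPECTED_ROWS]


def find_mismatches(observed):
    """Return {table: (expected, observed)} for each table whose count differs.

    A table missing from `observed` is reported with an observed value of None.
    """
    return {
        name: (expected, observed.get(name))
        for name, expected in EXPECTED_ROWS.items()
        if observed.get(name) != expected
    }


if __name__ == "__main__":
    for query in count_queries():
        print(query)
```

An empty result from find_mismatches means all five tables loaded with the expected number of rows; any entry in the result names a table worth reloading or investigating.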