Unit 5 Notes

The document provides an overview of Elastic Block Storage (EBS) and Amazon S3, detailing their types, features, and usage. EBS is a durable storage option for EC2 instances, with features like scalability, backup, and encryption, while S3 is a scalable storage service for various data types, offering multiple storage classes for different access needs. Additionally, it covers lifecycle management in S3, Amazon RDS for relational databases, and instructions for creating and managing buckets and volumes.


1) Explain types of Elastic block storage with its features?

Elastic Block Storage (EBS): EBS is durable, persistent block-level storage that can be attached to EC2 instances for additional storage. Unlike EC2 instance store volumes, which are suitable only for holding temporary data, EBS volumes are well suited for essential, long-term data. EBS volumes are specific to an Availability Zone and can only be attached to instances within the same Availability Zone.

Features of EBS:

 Scalability: EBS volume sizes and features can be scaled as per the needs of the system. This can be done in two ways:

o Take a snapshot of the volume and create a new volume from that snapshot with the updated size or type.

o Modify the existing EBS volume directly from the console.

 Backup: Users can create snapshots of EBS volumes that act as backups.

o Snapshots can be created manually at any point in time, or on a schedule.

o Snapshots are stored in Amazon S3 and are billed at S3 storage rates.

o Snapshots are incremental in nature.

o Snapshots can be copied across regions, and new volumes can be created from them (see the snapshot sketch after this feature list).

 Encryption: Encryption can be a basic requirement when it comes to storage, often due to government or regulatory compliance. EBS offers an AWS-managed encryption feature.

o Users can enable encryption when creating EBS volumes by clicking on a checkbox.

o Encryption Keys are managed by the Key Management Service (KMS) provided by
AWS.

o Encrypted volumes can only be attached to selected instance types.

o Encryption uses the AES-256 algorithm.

o Snapshots of encrypted volumes are encrypted and, similarly, volumes created from encrypted snapshots are encrypted.

 Charges: Unlike Amazon S3, where you are charged for the storage you consume, EBS charges you for the storage you provision. For example, if you use only 1 GB of a 5 GB volume, you are still charged for the full 5 GB EBS volume.

o EBS charges vary from region to region.
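For illustration, the snapshot workflow described above can be driven through the AWS SDK. A minimal boto3 sketch, assuming hypothetical regions and a hypothetical volume ID (none of these values come from these notes):

import boto3

SOURCE_REGION = "us-east-1"                      # assumed source region
DEST_REGION = "us-west-2"                        # assumed destination region
VOLUME_ID = "vol-0123456789abcdef0"              # hypothetical EBS volume ID

ec2_src = boto3.client("ec2", region_name=SOURCE_REGION)

# Manual snapshot of an EBS volume (snapshots are incremental and stored in S3).
snap = ec2_src.create_snapshot(
    VolumeId=VOLUME_ID,
    Description="Manual backup before maintenance",
)
ec2_src.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# Copy the snapshot to another region; a new volume can then be created from
# the copy in that region (for example, for disaster recovery).
ec2_dst = boto3.client("ec2", region_name=DEST_REGION)
copy = ec2_dst.copy_snapshot(
    SourceRegion=SOURCE_REGION,
    SourceSnapshotId=snap["SnapshotId"],
    Description="Cross-region copy of the manual backup",
)
print("Snapshot", snap["SnapshotId"], "copied as", copy["SnapshotId"])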

Types of EBS Volumes:

SSD: This storage type is suitable for small, random I/O that requires high IOPS. SSD volumes can be used as root volumes for EC2 instances.

 General Purpose SSD (GP2)

o Offers single-digit millisecond latency.

o Can burst up to 3,000 IOPS.

o Baseline IOPS ranges from 3 IOPS per GB up to a maximum of 10,000 IOPS.

o Throughput is 128 MB/s for volumes up to 170 GB; beyond that, throughput increases by 768 KB/s per GB and peaks at 160 MB/s.

 Provisioned IOPS SSD (IO1)

o These SSDs are meant for I/O-intensive workloads.

o Users can specify the IOPS requirement at creation time.

o Volume size ranges from 4 GB to 16 TB.

o According to AWS, these volumes, if attached to EBS-optimized instances, deliver within 10% of the provisioned IOPS performance 99.9% of the time over a year.

o Maximum IOPS is 20,000.

HDD: This storage type is suitable for large, sequential workloads (for example, big data) where throughput matters more than IOPS. These volumes cannot be used as root volumes for EC2. AWS claims that these volumes deliver their expected throughput 99.9% of the time over a year.

 Cold HDD (SC1)

o SC1 is the cheapest of all EBS volume types. It is suitable for large, infrequently accessed data.

o Maximum burst throughput is 250 MB/s.

 Throughput Optimized HDD (ST1)

o Suitable for large, frequently accessed data.

o Burst throughput ranges from 250 MB/s to 500 MB/s.

2) Explain how to create and attach a volume in EBS?

AIM : Create and manage Elastic Block Store.


Go to Services → Compute → EC2 → Elastic Block Store → Volumes.
Click on Create volume.
Keep the default settings: volume type: General Purpose SSD (gp3), size: 20 GB, IOPS (Input/Output Operations Per Second): 3000, throughput (MB/s): 125.

The Availability Zone must be the same one in which the instance was created, so choose that Availability Zone only.

Click on Create volume.

The volume is created successfully.

Launch the Windows instance and connect to it.

Open File Explorer; you will find only one drive, C:, with 29.9 GB.

Attach the EBS volume to the Windows instance.

Go to EC2 → Volumes → select the volume → Actions → Attach volume.

Select an instance from the same Availability Zone, note the device name, and click Attach volume.
The volume state now changes from available to in-use. (The same steps can be scripted; see the sketch below.)
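A minimal boto3 sketch of this walkthrough, assuming a hypothetical region, Availability Zone, instance ID, and device name:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")     # assumed region

# Create a 20 GB gp3 volume with the default 3000 IOPS / 125 MB/s throughput.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",      # must match the instance's Availability Zone
    Size=20,
    VolumeType="gp3",
    Iops=3000,
    Throughput=125,
    Encrypted=True,                     # optional KMS-managed encryption
)
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# Attach the volume to an instance in the same Availability Zone.
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",   # hypothetical instance ID
    Device="/dev/sdf",
)
print("Volume", vol["VolumeId"], "is attached and in-use")

After attaching, the disk still has to be initialized and formatted inside the Windows instance (Disk Management) before it appears as a new drive.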

3) Describe Amazon S3. Why is it used? Explain the types of storage classes?

What is Amazon S3?

Amazon S3 (Simple Storage Service) is a storage service in AWS that stores files of different types, such as photos, audio, and videos, as objects, providing scalability and security. It allows users to store and retrieve any amount of data at any point in time from anywhere on the web. It offers features such as extremely high availability, security, and simple integration with other AWS services.

What is Amazon S3 Used for?

Amazon S3 is used for many purposes in the cloud because of its robust features for scaling and securing data. It supports use cases from fields such as mobile/web applications, big data, machine learning, and many more. The following are a few of the common uses of the Amazon S3 service.
 Data Storage: Amazon S3 is a strong option for both small and large storage applications. It helps data-intensive applications store and retrieve data as needed, in good time.

 Backup and Recovery: Many organizations use Amazon S3 to back up their critical data and maintain data durability and availability for recovery needs.

 Hosting Static Websites: Amazon S3 can store HTML, CSS, and other web content from users/developers, allowing them to host static websites with low-latency access and cost-effectiveness.

 Data Archiving: Amazon S3 Glacier service integration helps as a cost-effective solution for
long-term data storing which are less frequently accessed applications.

 Big Data Analytics: Amazon S3 is often used as a data lake because of its capacity to store large amounts of both structured and unstructured data, offering seamless integration with other AWS analytics and machine learning services.

What are the types of S3 Storage Classes?

AWS S3 provides multiple storage types that offer different performance and features and different
cost structures.

 Standard: Suitable for frequently accessed data, that needs to be highly available and
durable.

 Standard Infrequent Access (Standard IA): This is a cheaper data-storage class and as the
name suggests, this class is best suited for storing infrequently accessed data like log files or
data archives. Note that there may be a per GB data retrieval fee associated with the
Standard IA class.

 Intelligent Tiering: This service class classifies your files automatically into frequently
accessed and infrequently accessed and stores the infrequently accessed data in infrequent
access storage to save costs. This is useful for unpredictable data access to an S3 bucket.

 One Zone Infrequent Access (One Zone IA): All the files on your S3 have their copies stored
in a minimum of 3 Availability Zones. One Zone IA stores this data in a single availability zone.
It is only recommended to use this storage class for infrequently accessed, non-essential
data. There may be a per GB cost for data retrieval.

 Reduced Redundancy Storage (RRS): All the other S3 classes ensure durability of 99.999999999%; RRS only ensures 99.99% durability. AWS no longer recommends RRS due to its lower durability, although it can still be used for non-essential data. (A short upload sketch using these storage classes follows.)
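For illustration, the storage class is chosen per object at upload time. A small boto3 sketch, assuming a hypothetical bucket name and keys:

import boto3

s3 = boto3.client("s3")

# Upload sample objects into different storage classes of the same bucket.
for storage_class in ("STANDARD", "STANDARD_IA", "INTELLIGENT_TIERING", "ONEZONE_IA"):
    s3.put_object(
        Bucket="my-example-bucket",                        # hypothetical bucket
        Key=f"logs/app-{storage_class.lower()}.log",
        Body=b"example log line\n",
        StorageClass=storage_class,
    )

Frequently accessed data would normally stay in STANDARD, while log files or archives can be written directly to STANDARD_IA or ONEZONE_IA to save cost.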

How Does Amazon S3 Work?

Amazon S3 organizes data into uniquely named S3 buckets, each customized with access controls. It allows users to store objects inside S3 buckets while providing features such as versioning and lifecycle management of the stored data, with scaling.

4) Explain Amazon S3 buckets. How do you create a bucket, and how do you upload, download, copy, move, and delete a file in it?
What is an Amazon S3 bucket?

An Amazon S3 bucket is the fundamental storage container in the AWS S3 service. It provides a secure and scalable repository for storing objects such as text data, images, audio, and video files in the AWS cloud. Each S3 bucket name must be globally unique, and a bucket can be configured with an ACL (Access Control List).

Amazon S3 Bucket: Data in S3 is stored in containers called buckets. Each bucket has its own set of policies and configurations, which gives users more control over their data. Bucket names must be unique, and a bucket can be thought of as a parent folder for data. There is a default limit of 100 buckets per AWS account, which can be increased by requesting it from AWS Support.

Amazon S3 Objects: The fundamental entity type stored in AWS S3. You can store as many objects as you want; the maximum size of a single object is 5 TB. An object consists of the following:

 Key

 Version ID

 Value

 Metadata

 Subresources

 Access control information

 Tags

1. Create a bucket.
1. Sign in to the AWS Management Console. Under Services, navigate to the S3 console and choose Create bucket.
2. Give the bucket a unique name and verify that the Region matches your product Region. Leave the default settings as they are and select Create bucket.
2. Add an object to a bucket.
1. In the Buckets list, choose the name of the bucket that you want to upload your object to.
2. On the Objects tab for your bucket, choose Upload.
3. Under Files and folders, choose Add files. Choose a file to upload, choose Open, and then choose Upload.
3. Open and download an object.
1. In the Buckets list, choose the name of the bucket that you want to download an object from.
2. You can download an object from an S3 bucket in either of the following ways:
select the check box next to the object and choose Download, or, to download the object to a specific folder, choose Download as on the Actions menu.

4. How to move and copy S3 objects.
1. Navigate to the Amazon S3 bucket or folder that contains the objects that you want to copy.
2. Select the check box to the left of the names of the objects that you want to copy. Choose Actions and choose Copy / Move from the list of options that appears.
3. Choose the destination folder:
a. Choose Browse S3.
b. Choose the option button to the left of the folder name. To navigate into a folder and choose a subfolder as your destination, choose the folder name.
In the bottom right, choose Copy.

5. How to delete an object.
1. In the Buckets list, choose the name of the bucket that you want to delete an object from. Select the object that you want to delete.
2. Choose Delete from the options in the upper right.
3. On the Delete objects page, type delete to confirm deletion of your objects. Choose Delete objects.

6. How to empty a bucket.
1. In the Buckets list, select the bucket that you want to empty, and then choose Empty.
2. To confirm that you want to empty the bucket and delete all the objects in it, in Empty bucket, type permanently delete.

(A boto3 sketch of all of these bucket and object operations follows.)
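The console steps above correspond to simple API calls. A boto3 sketch of the same operations, assuming hypothetical bucket names, keys, file paths, and region:

import boto3

s3 = boto3.client("s3")
bucket = "my-unique-demo-bucket-12345"       # bucket names must be globally unique

# 1. Create a bucket (LocationConstraint is required outside us-east-1).
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
)

# 2. Upload an object.
s3.upload_file("report.docx", bucket, "docs/report.docx")

# 3. Download an object.
s3.download_file(bucket, "docs/report.docx", "report-copy.docx")

# 4. Copy an object; a "move" is a copy followed by deleting the original.
s3.copy_object(
    Bucket=bucket,
    Key="archive/report.docx",
    CopySource={"Bucket": bucket, "Key": "docs/report.docx"},
)
s3.delete_object(Bucket=bucket, Key="docs/report.docx")

# 5. Empty the bucket by deleting every remaining object.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        s3.delete_object(Bucket=bucket, Key=obj["Key"])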

5) Explain S3 lifecycle management? Explain it with an experiment?

In simple terms, S3 lifecycle management addresses the situation where data in an S3 bucket sits in Standard storage for a long time even when it is no longer needed. The need to shift this old data to cheaper storage, or to delete it after a span of time, gives rise to lifecycle management.
Why is it needed?

Assume a lot of data is added to an S3 bucket regularly. If all of it is kept in Standard storage, it will cost you more, even when the older data is of no use after some time. So, to avoid extra expense and to keep data only as long as it is required, lifecycle management is needed.

There are 2 types of actions:

1. Transition actions: Moving objects from one storage class to another storage class. Each storage class has a different cost associated with it.
2. Expiration actions: When objects expire after a span of time (say 30 days, 60 days, etc.), Amazon S3 deletes the expired objects on your behalf.
Step 1: Configure Glacier with the following task: transfer files from S3 to Glacier. Note: Amazon does not allow files to be loaded directly into Glacier; use S3 lifecycle rules or third-party tools to archive and restore.

1. Using an S3 bucket and an S3 lifecycle rule to archive to Glacier:

Select an S3 bucket [refer to the S3 topics for how to create a bucket and upload files].

Select the bucket.

Go to Management → click on Lifecycle rules.

Click on Create lifecycle rule.

Give the rule a name and choose to apply it to all objects in the bucket.

Under Lifecycle rule actions, select Move current versions of objects between storage classes.

Verify the target storage class is Standard-IA.

Select days after object creation: 30 days (minimum).

Click on Create rule.

The Glacier vault / lifecycle rule is created successfully.

Verify the lifecycle rule from the S3 bucket:

Go to Buckets → choose the bucket name → Management.

The Glacier vault is created via the lifecycle rule. (The same rule can be expressed through the API; see the sketch below.)
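A boto3 sketch of such a lifecycle configuration, assuming a hypothetical bucket name; the transition and expiration day values are illustrative, not prescribed by these notes:

import boto3

s3 = boto3.client("s3")

# Lifecycle rule: move to Standard-IA after 30 days, to Glacier after 90 days,
# and expire (delete) objects after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",                  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},        # empty prefix = apply to all objects
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)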


6) Explain Amazon Relational Database Service in detail?

Amazon RDS: DB Instances

 Amazon RDS (Relational Database Service) is a managed service by AWS that makes it easy to set up, operate, and scale relational databases.

 DB Instance: An Amazon RDS DB instance is an isolated database environment in the cloud, consisting of compute, storage, and memory resources.

 Supported database engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora.

 Instances can be single-instance or deployed as Multi-AZ for high availability.

RDS Limits

 Default Limits: AWS sets some default limits on RDS resources:

o Max DB Instances per Account: Typically 40, but can vary based on region and DB engine.

o Storage: Maximum storage varies by engine (up to 64 TB for Aurora).

o Connections: Each instance has a limit on the number of concurrent connections, which varies based on the instance type.

 Requesting Limit Increases: AWS allows requests for limit increases through the Service Quotas console.

 DB Engine-Specific Limits: Each database engine might have specific restrictions, like maximum database size, row limits, or supported features.

4. Automatic Backups

 Purpose: Automatic backups allow RDS to create daily backups of your database and transaction logs, enabling point-in-time recovery.

 Enabling Backups: Automatic backups are enabled by default when creating a DB instance.

 Backup Retention Period: You can configure retention from 1 to 35 days. Backup storage is allocated automatically by RDS.

 Automated Snapshot Creation: RDS takes snapshots and retains them based on the retention period.

 Restoration: You can restore a database to any point in time within the retention period using these backups.

5. Snapshots

 Manual Snapshots: A snapshot is a manual backup of the database that you can create at any time.

 Use Case: Snapshots are useful before performing major updates or migrating data.

 Retention: Unlike automatic backups, manual snapshots are retained until you delete them.

 Usage: Snapshots can be used to create a new DB instance or restore an existing one. They are also useful for migrating or cloning databases.

6. Restores

 Restoring from Automatic Backups:

o You can restore your database to any point within the backup retention period.

o AWS creates a new DB instance when you perform a restore from backup.

 Restoring from Snapshots:

o Snapshots can be used to restore to the exact state when the snapshot was taken.

o A new DB instance is created from the snapshot, and it can be in the same AWS region or in another region if cross-region snapshots are enabled.

 Point-in-Time Recovery: Using automatic backups and transaction logs, RDS can restore a DB instance to any second within the retention period. (A short API sketch of these operations follows this section.)
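As a sketch of how these backup and restore operations look through the API, assuming hypothetical instance identifiers, region, and restore time:

import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-east-1")       # assumed region

# Manual snapshot: retained until you delete it.
rds.create_db_snapshot(
    DBInstanceIdentifier="prod-db",                       # hypothetical instance
    DBSnapshotIdentifier="prod-db-before-upgrade",
)

# Point-in-time restore from automatic backups plus transaction logs.
# AWS creates a NEW DB instance; the source instance is left untouched.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="prod-db",
    TargetDBInstanceIdentifier="prod-db-restored",
    RestoreTime=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
    # or: UseLatestRestorableTime=True,
)

# Restore a new instance from the manual snapshot (e.g., for cloning or migration).
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="prod-db-clone",
    DBSnapshotIdentifier="prod-db-before-upgrade",
)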

7. Best Practices for Backups and Recovery

 Set an Appropriate Retention Period based on business requirements and RPO (Recovery Point Objective).

 Automate Manual Snapshots using AWS Lambda or scheduled tasks for regular backups if additional control is needed.

 Cross-Region Snapshots: Enable cross-region replication for disaster recovery.

 Regularly Test Restores to ensure data integrity and the effectiveness of the backup plan.
7) How to configure Amazon RDS?

What is RDS?
RDS stands for Amazon Relational Database Service. It is a distributed relational database service offered by Amazon Web Services.

Features of Cloud Database

1. A database service built and accessed through a cloud platform
2. Enables enterprise users to host databases without buying dedicated hardware
3. Can be managed by the user or offered as a service and managed by a provider
4. Can support SQL (including MySQL) or NoSQL databases
5. Accessed through a web interface or vendor-provided API

Steps for creating the database on the cloud platform

Now, let's see how to create the database on AWS RDS.

Step 1:
First, open your account on AWS (Amazon Web Services). From the main screen, go to Services; near the bottom there is a Database section, under which you will find RDS (Amazon Relational Database Service). Click on RDS.

Step 2:
After clicking on RDS, click on Launch instance; this launches your database instance in the cloud.
Step 3:
On the page that opens, select your database engine from the options shown.

Step 4:
Don't forget to select the free-tier option, as these services are otherwise paid; if you skip this and select a paid database, you will be billed and the amount will be debited from your account. After that, click Next.
Step 5:
After that, choose your use case: either the default or the recommended one, which is Production – Amazon Aurora, and click Next.
Now the question arises: what is Amazon Aurora?

Amazon Aurora

Amazon Aurora is a hosted relational database service developed and offered by Amazon since October 2014. Aurora is available as part of the Amazon Relational Database Service (RDS).

Step 6:
In the next step, specify the database details: fill in the database name and set a password for the database. We do this to improve security so that no unauthorized person can interfere with or corrupt our database.

After scrolling down, fill in the database name and the database password, and then click Next.

Step 7:
Next comes the Configure advanced settings page; here we don't have to change anything. Leave it as it is, scroll down to Launch DB instance, and click it.
With this, we have created a database on the AWS (Amazon Web Services) cloud, and we can access it using the endpoint of this DB instance. (A minimal API sketch of the same configuration follows.)
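A minimal boto3 sketch of the same flow, assuming a hypothetical region, instance identifier, and password (in practice the password would come from a secrets store, not the source code):

import boto3

rds = boto3.client("rds", region_name="us-east-1")        # assumed region

# Create a small MySQL DB instance with a master username and password.
rds.create_db_instance(
    DBInstanceIdentifier="mydb-instance",                  # hypothetical name
    Engine="mysql",
    DBInstanceClass="db.t3.micro",                         # free-tier-eligible class
    AllocatedStorage=20,                                   # GB
    MasterUsername="admin",
    MasterUserPassword="ChangeMe123!",                     # placeholder password
    BackupRetentionPeriod=7,                               # automatic backups, 1-35 days
    PubliclyAccessible=True,
)

# Wait until the instance is available, then read its connection endpoint.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="mydb-instance")
instance = rds.describe_db_instances(DBInstanceIdentifier="mydb-instance")["DBInstances"][0]
print("Connect to:", instance["Endpoint"]["Address"], "port", instance["Endpoint"]["Port"])

Applications then connect to that endpoint with the master username and password, rather than to a raw public IP.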

8) Explain Amazon DynamoDB? Describe the steps to create a table?
Amazon DynamoDB Overview
 Amazon DynamoDB is a fully managed NoSQL database
service provided by AWS. It is known for low-latency, single-
digit millisecond response times and is designed for high
availability and fault tolerance.
 Data Models: DynamoDB supports a key-value and
document store model, where each item is a collection of
attributes. Each item can have different attributes, making
it a flexible option for applications with diverse data types.
 Use Cases: Common use cases for DynamoDB include
mobile apps, gaming, IoT applications, e-commerce
platforms, and other real-time applications requiring
scalable, high-performance databases.
Creating Tables in Amazon DynamoDB
Understanding DynamoDB Tables
 In DynamoDB, data is organized into tables, which are
similar to tables in relational databases but are more
flexible and schema-less.
 A table in DynamoDB is a collection of items, where each
item is a set of attributes (key-value pairs). Each table
requires a primary key to uniquely identify items.
Primary Keys
 Partition Key (Simple Primary Key): A single attribute
that acts as a unique identifier for each item. DynamoDB
uses the partition key to distribute items across partitions
for scalability.
 Partition and Sort Key (Composite Primary Key): A
composite key that includes a partition key and a sort key.
This allows multiple items with the same partition key to be
grouped together, and the sort key is used to distinguish
these items within the group.
Secondary Indexes
 Global Secondary Index (GSI): Allows querying on non-
primary key attributes, even across different partition keys.
GSIs can have unique or duplicated partition and sort keys.
 Local Secondary Index (LSI): Also allows querying on
non-primary attributes but only with the same partition key
as the table. LSIs are useful for querying by additional
attributes but have limitations compared to GSIs.
Steps to Create a DynamoDB Table
1. Access the DynamoDB Console: In the AWS
Management Console, navigate to DynamoDB and select
“Create Table.”
2. Define Table Settings:
o Table Name: Provide a name for the table.
o Primary Key: Choose either a simple (partition key
only) or composite (partition and sort key) primary
key.
o Secondary Indexes (optional): Add global or local
secondary indexes based on expected query patterns.
3. Configure Capacity Mode:
o Choose between On-demand or Provisioned
capacity (details below).
4. Configure Additional Settings (optional):
o Auto Scaling: Set up automatic scaling for
provisioned capacity mode.
o Encryption: Enable server-side encryption to protect
data at rest.
o TTL (Time to Live): Specify a time-to-live attribute to
automatically delete expired items.
5. Review and Create: Review settings and click "Create
Table." The table will be available for use once created.
9) Explain DynamoDB with its key features?
Amazon DynamoDB is a fully managed, serverless, NoSQL
database service provided by Amazon Web Services (AWS). It is
designed for high availability, scalability, and low-latency
performance, making it suitable for applications requiring
consistent, single-digit millisecond response times at any scale.
Here’s a detailed overview of DynamoDB, covering its key
features, architecture, use cases, and best practices.

1. Key Features of Amazon DynamoDB


a. Fully Managed Service
 No Server Management: DynamoDB abstracts away the
complexities of hardware provisioning, setup, configuration,
replication, software patching, and cluster scaling. AWS
manages all operational aspects, allowing developers to
focus on application development.
b. Serverless Architecture
 On-Demand Scalability: DynamoDB automatically scales
up or down to handle varying workloads without any
manual intervention. This elasticity ensures consistent
performance during peak and low usage periods.
c. NoSQL Data Model
 Key-Value and Document Store: DynamoDB supports
both key-value and document data models. This flexibility
allows for the storage of unstructured, semi-structured, or
structured data. Items can be stored as JSON documents,
making it easy to represent complex data structures.
d. Single-Digit Millisecond Response Times
 Low Latency: Designed for performance, DynamoDB
delivers consistent, low-latency responses, making it
suitable for high-traffic applications like gaming, IoT, and
mobile backends.
e. Global Tables
 Multi-Region Replication: DynamoDB Global Tables
enable multi-region, fully replicated tables for low-latency
access to data across multiple geographical locations. This
feature enhances data availability and disaster recovery.
f. Built-in Security
 Encryption: DynamoDB supports encryption at rest and in
transit, ensuring data security and compliance with
regulations.
 Access Control: Integration with AWS Identity and Access
Management (IAM) provides fine-grained access control to
resources.
g. Flexible Pricing
 On-Demand and Provisioned Pricing Models: Users can
choose between on-demand capacity (pay-per-request) and
provisioned capacity (specifying read and write capacity
units), allowing for cost-effective resource management
based on workload patterns.

2. Data Model and Structure


a. Tables
 Primary Structure: DynamoDB data is organized into
tables. Each table is identified by a unique name and can
store multiple items.
b. Items
 Records in a Table: Items are individual records within a
table and can have different attributes. Each item is
uniquely identified by a primary key.
c. Attributes
 Data Fields: Attributes are the individual pieces of data
associated with an item. They can be of various data types,
including string, number, binary, Boolean, list, and map.
d. Primary Keys
 Unique Identifiers: DynamoDB requires each item to have
a primary key, which can be:
o Partition Key: A single attribute used to distribute
data across partitions.
o Composite Key: A combination of a partition key and
a sort key, allowing for more complex querying and
organization of items.
e. Secondary Indexes
 Enhanced Query Capabilities: DynamoDB supports two
types of secondary indexes:
o Global Secondary Indexes (GSI): Allow queries on
non-primary key attributes with different partition and
sort keys.
o Local Secondary Indexes (LSI): Allow queries on
non-primary key attributes but with the same partition
key and a different sort key.

3. Querying and Data Access


a. Query Operation
 Retrieves Items: The query operation allows for retrieving
multiple items from a table based on the primary key. It
supports filtering and sorting by using sort keys and
secondary indexes.
b. Scan Operation
 Full Table Scans: The scan operation reads every item in
the table and returns all data attributes. It can be filtered,
but it is less efficient than a query operation due to the
need to read through all items.
c. Conditional Writes
 Atomic Operations: DynamoDB supports conditional
writes, allowing you to update or delete items only if certain
conditions are met. This feature is useful for implementing
optimistic concurrency control.
d. Transactions
 ACID Transactions: DynamoDB provides support for ACID
transactions, allowing multiple operations across one or
more tables to be performed atomically. This ensures data
integrity even in complex scenarios.
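To make the query and conditional-write operations concrete, here is a boto3 sketch against a hypothetical Orders table with a CustomerId partition key and an OrderDate sort key (the names and values are placeholders):

import boto3
from boto3.dynamodb.conditions import Key
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb", region_name="us-east-1").Table("Orders")

# Query: fetch all 2024 orders for one customer (partition key + sort key condition).
resp = table.query(
    KeyConditionExpression=Key("CustomerId").eq("C-100") & Key("OrderDate").begins_with("2024")
)
print(resp["Items"])

# Conditional write: create the item only if it does not already exist
# (a simple form of optimistic concurrency control).
try:
    table.put_item(
        Item={"CustomerId": "C-100", "OrderDate": "2024-05-01", "Status": "NEW"},
        ConditionExpression="attribute_not_exists(CustomerId)",
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("Item already exists; write rejected")
    else:
        raise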

4. Performance and Scalability


a. Provisioned Capacity Mode
 Set Capacity Limits: Users can specify the number of
read and write capacity units for tables, allowing for
predictable performance based on application
requirements.
b. On-Demand Capacity Mode
 Dynamic Scaling: Automatically scales to accommodate
workload fluctuations. Users pay per read and write
request, making it cost-effective for variable workloads.
c. Adaptive Capacity
 Automatic Load Balancing: DynamoDB can automatically
redistribute data to maintain consistent performance, even
when workloads are unevenly distributed.

5. Security and Compliance


a. IAM Integration
 Fine-Grained Access Control: Users can define IAM
policies that restrict access to specific tables or actions
within DynamoDB.
b. Encryption
 Data Protection: All data in DynamoDB is automatically
encrypted at rest and can also be encrypted in transit using
SSL.
c. Audit Logging
 Monitoring: DynamoDB integrates with AWS CloudTrail to
provide logging and monitoring capabilities, which help in
tracking API calls and ensuring compliance.

6. Use Cases for Amazon DynamoDB


 Web and Mobile Applications: Ideal for applications
requiring fast and scalable data access, such as social
media platforms, online gaming, and mobile apps.
 IoT Applications: Supports the ingestion and querying of
time-series data from IoT devices, making it suitable for
monitoring and analytics.
 Real-Time Analytics: Provides low-latency access to data,
enabling real-time analytics and reporting for business
intelligence.
 Session Management: Suitable for storing user sessions
and preferences due to its low-latency performance and
ability to handle high request volumes.
 Content Management Systems: Works well for
managing user-generated content, allowing for flexible
schema and rapid changes to data structure.
10) Explain the role of Workload Management (WLM) in
Amazon Redshift. How can you configure WLM queues to
handle a mixed workload environment with ETL tasks?
Amazon Redshift is a fully managed, petabyte-scale data
warehouse service provided by Amazon Web Services (AWS). It
enables organizations to store, analyze, and query large volumes
of structured data quickly and efficiently, using SQL and other
analytic tools. Here’s a breakdown of its key features,
architecture, and use cases.

1. Key Features of Amazon Redshift


 Massive Parallel Processing (MPP): Redshift distributes
data and query processing across multiple nodes, enabling
high-speed analysis and the handling of large data volumes.
This architecture allows it to perform complex queries in
parallel, significantly reducing query times.
 Columnar Storage: Redshift stores data in a columnar
format rather than the traditional row-based format. This
means that data is stored by columns, which is more
efficient for analytical queries that often retrieve specific
columns rather than entire rows. Columnar storage also
improves compression rates, reducing storage costs.
 Compression: Redshift uses advanced compression
techniques, which reduce the storage requirements of the
data. Since similar data is stored together in columns, it
compresses better than row-based storage, saving storage
space and reducing I/O.
 Scalability: Redshift supports scaling both compute and
storage independently. With Redshift RA3 node types, you
can scale storage capacity without needing to add more
compute resources, which can help manage costs
effectively.
 Redshift Spectrum: This feature allows you to query data
stored in Amazon S3 directly, without needing to load it into
Redshift. It extends Redshift’s querying capability to data
lakes, enabling analyses on both warehouse and S3 data in
a unified manner.
 Data Sharing: Redshift Data Sharing allows clusters to
share data with other Redshift clusters without having to
copy or move the data. This enables seamless access
across different clusters and use cases, such as sharing
datasets with different departments or teams.
 Machine Learning Integration: Amazon Redshift
integrates with Amazon SageMaker for machine learning
tasks. Using SQL commands, you can build, train, and
deploy machine learning models directly from Redshift
without needing extensive ML expertise.
Amazon Redshift Parameter Groups
What are Parameter Groups?
 Parameter groups in Amazon Redshift are configurations
that define runtime settings for a Redshift cluster.
 They allow you to customize various database and system
parameters, such as query timeout limits, memory
allocation, and other settings, to optimize the cluster’s
performance for specific workloads.
 Default Parameter Group: Each Redshift cluster is
associated with a default parameter group that includes
preset parameter values, optimized for general-purpose
use.
Creating and Modifying Parameter Groups
 You can create custom parameter groups to apply specific
settings for specialized workloads or performance
requirements.
 Creating a Parameter Group:
1. Go to the Redshift Console.
2. Under Parameter Groups, select Create Parameter
Group.
3. Assign a unique name and description to the
parameter group.
4. After creating it, you can modify parameters in the
new group as needed.
 Modifying Parameters: Commonly modified parameters
include:
o Statement Timeout: Specifies the maximum
duration a query can run before it is automatically
terminated. Useful for preventing long-running queries
from consuming resources.
o Datestyle: Sets the date format for queries and
results.
o WLM Parameters: Controls various settings related
to workload management (discussed below).
Amazon Redshift Workload Management (WLM)
What is Workload Management (WLM)?
 Workload Management (WLM) in Amazon Redshift
allows you to allocate resources and prioritize workloads by
defining multiple queues with different query processing
rules and concurrency limits.
 WLM is essential for optimizing Redshift’s performance,
especially for clusters that handle a mix of short and long-
running queries, as it controls how many queries can run
simultaneously and how memory resources are distributed.
Types of WLM Configurations
 Automatic WLM: Amazon Redshift automatically manages
resources, memory allocation, and concurrency based on
query patterns and cluster configuration.
 Manual WLM: Offers detailed customization options,
allowing you to create multiple queues, assign resources,
set concurrency levels, and prioritize workloads according
to your specific needs.
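As an illustrative sketch of a manual WLM setup for a mixed workload (one queue for long-running ETL jobs, one for short BI/dashboard queries, plus the default queue), the wlm_json_configuration parameter of a custom parameter group can be set through boto3. The group names, concurrency values, and memory percentages below are assumptions, not recommendations from these notes:

import json
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")    # assumed region

# Manual WLM: route queries into dedicated queues by database user group.
wlm_config = [
    # Queue 1: long-running ETL jobs - few slots, large memory share.
    {"user_group": ["etl"], "query_concurrency": 3, "memory_percent_to_use": 50},
    # Queue 2: short BI/dashboard queries - more slots, less memory each.
    {"user_group": ["bi"], "query_concurrency": 10, "memory_percent_to_use": 30},
    # Default queue for everything else.
    {"query_concurrency": 5, "memory_percent_to_use": 20},
    # Enable short query acceleration for very small queries.
    {"short_query_queue": True},
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="custom-wlm-group",                       # hypothetical parameter group
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
        }
    ],
)

With a setup along these lines, ETL jobs run through the "etl" user group and get fewer, memory-heavy slots so large loads and transformations are not starved, while dashboard queries run concurrently in their own queue; with automatic WLM you would instead assign queue priorities and let Redshift manage concurrency and memory.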
