Data Storage and AWS

This document is part of a module on building an ETL pipeline using AWS Lambda and Glue. It provides an overview of AWS storage services like S3 and Glacier, describes how to create S3 buckets and configure their properties, explains how to work with RDS databases, and gives an overview of NoSQL databases.

Uploaded by

Kumar Ashwini
Copyright
© All Rights Reserved
Module Name - Building ETL pipeline using AWS Lambda and Glue

Topic - Data Storage and AWS

Instructor:
● Overview of AWS Storage Service
● Overview of S3
● Glacier
● Creating S3 Bucket
● Properties of S3 bucket
● Working with RDS databases
● Overview of NoSQL databases
Revision

● Amazon Web Services and Free Tier Account
○ AWS is a broad set of global cloud computing services.
○ The AWS Free Tier offers limited free usage of many services, so you can try them without charge.
○ AWS offers scalability, reliability, security, global reach, and cost-effectiveness.
● Creating an AWS Account
○ Create an AWS account to access the Web Console, CLI, and SDKs.
● Exploring Web Console
● AWS CLI tool
● SDKs and APIs
● EC2 instance
○ EC2 instances are virtual machines that you can run in the AWS cloud.
Overview of AWS Storage Service

● AWS Storage Services are a broad set of cloud-based storage services that cover object, block, file, and archival storage.

● AWS Storage Services are highly scalable, reliable, and secure, and they can be used to store any type of data, from simple files to complex databases.
Overview of AWS Storage Service

● Some of the most popular AWS Storage Services include:

● Amazon S3 (Simple Storage Service): A highly scalable object storage service that is ideal for storing large amounts of data, such as website assets, images, and videos.

● Amazon S3 Glacier (formerly Amazon Glacier): A low-cost storage service for archiving data that is infrequently accessed.

● Amazon EBS (Elastic Block Store): A high-performance block storage service whose volumes attach to EC2 instances. It is ideal for data that is accessed frequently, such as database files and application logs.
Overview of AWS Storage Service

● Amazon EFS (Elastic File System): A scalable, shared file system that is ideal for use with Amazon EC2 instances.

● Amazon FSx: A fully managed file system service. One of its options, FSx for Windows File Server, provides a scalable Windows file system.
Overview of S3

● Amazon S3 (Simple Storage Service) is a highly scalable, reliable object storage service that is ideal for storing large amounts of data, such as website assets, images, and videos.
Overview of S3

● S3 offers a wide range of features, including:

● Versioning: Once versioning is enabled on a bucket, S3 keeps every version of each object, so you can recover previous versions of your data if needed.

● Life cycle management: S3 allows you to set lifecycle policies for your objects, so that they can be automatically transitioned to different storage classes or deleted after a certain period of time.

● Security: S3 offers a wide range of security features to help you protect your data, including server-side encryption, bucket policies, and access control lists.
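
A lifecycle policy of the kind described above can be written down as a plain configuration document. The following is a minimal sketch, not a definitive setup: the rule ID, the `logs/` prefix, and the day counts are assumptions chosen for illustration. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the upload itself is wrapped in a separate function because it needs boto3 and AWS credentials.

```python
import json


def build_lifecycle_config(prefix, archive_after_days, delete_after_days):
    """Build a lifecycle configuration with one rule: archive, then expire."""
    if archive_after_days >= delete_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "Rules": [
            {
                "ID": "archive-then-delete",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


def apply_lifecycle_config(bucket_name, config):
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above runs without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=config
    )


# Illustrative values: archive logs after 90 days, delete them after a year.
config = build_lifecycle_config("logs/", archive_after_days=90, delete_after_days=365)
print(json.dumps(config, indent=2))
```

Keeping the builder separate from the AWS call makes it possible to review or test the policy before it is applied to a real bucket.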
Glacier

● Amazon Glacier is a low-cost storage service for archiving data that is infrequently accessed.

● Glacier is ideal for storing data that needs to be retained for long periods of time, such as backups and historical data.
Glacier

● Glacier offers a variety of features to help you manage your archived data, including:

● Flexible retrieval options: You can trade retrieval speed against cost. Expedited retrievals typically complete in 1-5 minutes, Standard retrievals in 3-5 hours, and Bulk retrievals in 5-12 hours.

● Low cost: Glacier is one of the most cost-effective storage options available, making it ideal for archiving large amounts of data.
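
Retrieval speed is chosen per restore request by naming a tier; S3 accepts the tier names Expedited, Standard, and Bulk. The sketch below builds such a request in the shape boto3's `restore_object` expects. The function names, bucket, key, and the seven-day availability window are assumptions for illustration only.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_restore_request(days_available, tier="Standard"):
    """Build the RestoreRequest document for an archived S3 object."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}")
    if days_available < 1:
        raise ValueError("the restored copy must be kept for at least one day")
    return {
        "Days": days_available,  # how long the restored copy stays readable
        "GlacierJobParameters": {"Tier": tier},
    }


def request_restore(bucket_name, key, restore_request):
    """Start the restore (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above runs without boto3

    s3 = boto3.client("s3")
    s3.restore_object(Bucket=bucket_name, Key=key, RestoreRequest=restore_request)


# Illustrative values: a cheap Bulk restore, kept available for 7 days.
request = build_restore_request(days_available=7, tier="Bulk")
print(request)
```

A restore is asynchronous: the call only starts the job, and the object becomes readable once the chosen tier's retrieval has finished.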
Creating S3 Bucket

● To create an S3 bucket, you can use the AWS Web Console, the AWS CLI, or an AWS SDK.

● To create an S3 bucket using the AWS Web Console:

○ Go to the Amazon S3 console.
○ Click the "Create Bucket" button.
○ Enter a name for your bucket and select a region.
○ Click the "Create Bucket" button on the form to confirm.
Creating S3 Bucket

● To create an S3 bucket using the AWS CLI:

aws s3 mb s3://<bucket-name> --region <region-name>

● Where:

○ <bucket-name> is the name of your bucket, which must be globally unique.
○ <region-name> is the region where you want to create your bucket.

● To create an S3 bucket using an AWS SDK, please consult the documentation for the SDK that you are using.
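
As one example of the SDK route, here is a minimal sketch using boto3, the AWS SDK for Python. The helper names and the example bucket name are assumptions, and the name check covers only a subset of the S3 naming rules (length, allowed characters, no IP-address form). Note that `create_bucket` takes a `LocationConstraint` for every region except `us-east-1`, where it must be omitted.

```python
import re

_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_valid_bucket_name(name):
    """Check a subset of the S3 bucket naming rules."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # wrong length, uppercase, underscore, bad first/last char
    if ".." in name or _IP_PATTERN.fullmatch(name):
        return False  # adjacent dots or formatted like an IP address
    return True


def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for the create_bucket call."""
    if not is_valid_bucket_name(bucket_name):
        raise ValueError(f"invalid bucket name: {bucket_name!r}")
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":  # us-east-1 is the default and must be omitted
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region):
    """Create the bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above run without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))


# Hypothetical bucket name, used only to show the resulting arguments.
print(create_bucket_kwargs("my-etl-bucket-2024", "ap-south-1"))
```

Validating the name locally gives a clearer error than waiting for the service to reject the request, although the service remains the final authority on naming and uniqueness.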
Properties of S3 Bucket

● S3 buckets have a number of properties that you can configure or view, including:

● Bucket name: The globally unique name of your bucket. It cannot be changed after creation.

● Region: The region where your bucket is located. It is also fixed at creation.

● Versioning: Whether or not versioning is enabled for your bucket.

● Life cycle management: The lifecycle rules, if any, that are configured for your bucket.

● Access control: The bucket policy and access control list (ACL) that govern access to your bucket.
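
Several of these properties can be read or changed through the SDK. The sketch below, again using boto3, shows versioning being switched on or off and two properties being read back. The function names and the bucket name are assumptions; the AWS calls need boto3 and credentials, so only the request builder runs here. Once enabled, versioning can be suspended but not fully turned off, which is why the two states are `Enabled` and `Suspended`.

```python
def versioning_request(bucket_name, enabled):
    """Build the arguments for put_bucket_versioning."""
    status = "Enabled" if enabled else "Suspended"
    return {
        "Bucket": bucket_name,
        "VersioningConfiguration": {"Status": status},
    }


def set_versioning(bucket_name, enabled):
    """Apply the versioning state (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above runs without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(**versioning_request(bucket_name, enabled))


def describe_bucket(bucket_name):
    """Read back region and versioning state (requires boto3 and credentials)."""
    import boto3

    s3 = boto3.client("s3")
    location = s3.get_bucket_location(Bucket=bucket_name)["LocationConstraint"]
    versioning = s3.get_bucket_versioning(Bucket=bucket_name)
    return {
        # S3 reports no location constraint for buckets in us-east-1.
        "region": location or "us-east-1",
        # The Status key is absent if versioning was never enabled.
        "versioning": versioning.get("Status", "Disabled"),
    }


# Hypothetical bucket name, used only to show the request shape.
print(versioning_request("my-etl-bucket-2024", enabled=True))
```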
Working with RDS Databases

● Amazon Relational Database Service (RDS) is a service that makes it easy to set up, operate, and scale a relational database in the cloud.

● RDS supports a variety of database engines, including MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
Working with RDS Databases

● To get started with RDS:

○ Go to the Amazon RDS console.
○ Click the "Create Database" button.
○ Select a database engine and version.
○ Configure your database instance.
○ Click the "Create Database" button on the form to confirm.

● Once your database instance is created, you can connect to it using a database client such as MySQL Workbench.
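
To connect, a client needs the instance's endpoint (host name and port). The sketch below shows one way to look it up with boto3's `describe_db_instances` and pull the endpoint out of the response. The function names, the instance identifier, and the sample response are made up for illustration; the sample only mirrors the part of the response shape that the code reads.

```python
def endpoint_from_description(response):
    """Extract (host, port) from a describe_db_instances response."""
    instances = response.get("DBInstances", [])
    if not instances:
        raise LookupError("no DB instance in response")
    endpoint = instances[0].get("Endpoint")
    if endpoint is None:
        # The endpoint only appears once the instance has finished creating.
        raise LookupError("instance has no endpoint yet")
    return endpoint["Address"], endpoint["Port"]


def lookup_endpoint(instance_id):
    """Query RDS for the endpoint (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the parser above runs without boto3

    rds = boto3.client("rds")
    response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    return endpoint_from_description(response)


# Made-up response fragment standing in for a real API reply.
sample_response = {
    "DBInstances": [
        {
            "DBInstanceIdentifier": "mydb",
            "Endpoint": {
                "Address": "mydb.example123.us-east-1.rds.amazonaws.com",
                "Port": 3306,
            },
        }
    ]
}

host, port = endpoint_from_description(sample_response)
print(f"Connect your database client to {host} on port {port}")
```

The host and port returned here are the values to enter in a client such as MySQL Workbench, together with the user name and password chosen when the instance was configured.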
Overview of NoSQL Databases

● NoSQL databases are non-relational databases: instead of the fixed tables and schema of a traditional relational database, they use data models such as key-value, document, graph, or wide-column.

● NoSQL databases are often used for applications that require high
scalability and performance, such as web applications and social media
applications.
Overview of NoSQL Databases

● Some of the most popular NoSQL databases on AWS include:

● Amazon DynamoDB: A fully managed key-value and document database that offers multi-region, multi-active replication, built-in security, backup and restore, and in-memory caching for internet-scale applications.

● Amazon DocumentDB: A fully managed, high-performance document database service with MongoDB compatibility.

● Amazon Neptune: A fully managed graph database service that makes it easy to store and query graph relationships.

● Amazon Keyspaces: A fully managed, scalable, and highly available serverless database service that is compatible with Apache Cassandra.
Overview of NoSQL Databases

● Amazon DynamoDB is a fully managed, multi-region, multi-active, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications.

● DynamoDB's primary key and secondary index design delivers single-digit millisecond performance at any scale.

● It is a flexible and scalable database: tables can grow to virtually any size, although each individual item is limited to 400 KB.
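
The primary key and secondary index design can be illustrated with a table definition. The following is a minimal sketch in the shape accepted by boto3's `create_table`; the table name, attribute names, and index name are hypothetical, and on-demand billing (`PAY_PER_REQUEST`) is an assumption made so that no throughput figures are needed.

```python
def build_table_definition(table_name):
    """Describe a table with a composite primary key and one secondary index."""
    return {
        "TableName": table_name,
        # Only attributes used in a key need to be declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "order_status", "AttributeType": "S"},
        ],
        # Primary key: partition key plus sort key.
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        # Secondary index: lets queries look up orders by status instead.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "status-index",  # hypothetical index name
                "KeySchema": [
                    {"AttributeName": "order_status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_table(definition):
    """Create the table (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above runs without boto3

    boto3.client("dynamodb").create_table(**definition)


definition = build_table_definition("orders")
key_names = [key["AttributeName"] for key in definition["KeySchema"]]
print("Primary key attributes:", key_names)
```

Apart from the key attributes, items in the table are free to carry different attributes, which is where the schema flexibility of this data model comes from.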
Key Takeaways
● AWS Storage Services: Scalable, reliable, and secure storage for a
wide range of use cases.
● S3: Highly scalable, object storage service for large amounts of data.
● Glacier: Low-cost storage service for archiving infrequently accessed
data.
● Creating S3 Bucket: Use the AWS Web Console, CLI, or SDK.
● S3 Bucket Properties: Configure bucket name, region, versioning, life
cycle management, and access control.
● Working with RDS databases: Set up, operate, and scale relational
databases in the cloud.
● NoSQL database: Database type that does not use the relational
database schema. Often used for applications that require high
scalability and performance.
