Slides
https://www.youtube.com/c/JohnnyChivers
What is Data Engineering?
Data Engineering is the process of collecting, analysing, and
transforming data from numerous sources. Data can be
transient or persisted to a repository.
AWS Data Streaming
Amazon Kinesis enables you to ingest, buffer, and process streaming data in real time, so you can derive insights in seconds or minutes instead of hours or days.

Amazon Kinesis is fully managed and runs your streaming applications without requiring you to manage any infrastructure.

Amazon Kinesis can handle any amount of streaming data and process data from hundreds of thousands of sources with very low latencies.
Kinesis Video Streams
Amazon Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing.

Kinesis Data Firehose
Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools.
https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html
Producer
Producers put records into Amazon Kinesis Data Streams. For example, a web server sending log data to a stream is a producer.

Shard
A shard is a uniquely identified sequence of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity. Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards. If your data rate increases, you can increase or decrease the number of shards allocated to your stream.

Retention Period
The length of time that data records are accessible after they are added to the stream. A stream's retention period is set to a default of 24 hours after creation. You can increase the retention period up to 8760 hours (365 days).

Partition Key
A partition key is used to group data by shard within a stream.

Consumer
Consumers get records from Amazon Kinesis Data Streams and process them. These consumers are known as Amazon Kinesis Data Streams applications.

Sequence Number
Each data record has a sequence number that is unique per partition key within its shard.
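A minimal boto3 sketch tying these concepts together: a producer puts a record keyed by a partition key and gets back the shard ID and sequence number, and a simple consumer reads records from that shard. The stream name, region, and payload are placeholders, not part of the original slides.

```python
import json
import boto3

# Hypothetical stream name and region; the stream must already exist.
kinesis = boto3.client("kinesis", region_name="eu-west-1")

# Producer: the partition key determines which shard the record lands on.
response = kinesis.put_record(
    StreamName="example-stream",
    Data=json.dumps({"path": "/home", "status": 200}).encode("utf-8"),
    PartitionKey="web-server-01",
)
# The sequence number is unique per partition key within the shard.
print(response["ShardId"], response["SequenceNumber"])

# Consumer: read from the start of the shard's retention period.
iterator = kinesis.get_shard_iterator(
    StreamName="example-stream",
    ShardId=response["ShardId"],
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]
for record in kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]:
    print(record["SequenceNumber"], record["Data"])
```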
Kinesis Data Firehose High-Level Architecture
https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
Record
The data of interest that your data producer sends to a Kinesis Data Firehose delivery stream. A record can be as large as 1,000 KB.

Data Producer
Producers send records to Kinesis Data Firehose delivery streams. For example, a web server that sends log data to a delivery stream is a data producer. You can also configure your Kinesis Data Firehose delivery stream to automatically read data from an existing Kinesis data stream, and load it into destinations. For more information, see Sending Data to an Amazon Kinesis Data Firehose Delivery Stream.

Buffer Size and Buffer Interval
Kinesis Data Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. Buffer Size is in MBs and Buffer Interval is in seconds.

Destinations
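As a rough illustration of a data producer, the snippet below pushes one record into a Kinesis Data Firehose delivery stream with boto3 and lets the configured buffer size and buffer interval decide when it is flushed to the destination. The delivery stream name and payload are made up for the example.

```python
import json
import boto3

firehose = boto3.client("firehose", region_name="eu-west-1")

# Hypothetical delivery stream; Firehose buffers records until the configured
# buffer size (MB) or buffer interval (seconds) is reached, then delivers them.
log_line = {"timestamp": "2023-01-01T00:00:00Z", "level": "INFO", "message": "page served"}
firehose.put_record(
    DeliveryStreamName="example-delivery-stream",
    Record={"Data": (json.dumps(log_line) + "\n").encode("utf-8")},
)
```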
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works.html
AWS Glue
AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams.

AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries.

AWS Glue is serverless, so there's no infrastructure to set up or manage.
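For context, a Glue ETL script in Python typically looks something like the sketch below: read a table from the Data Catalog, apply a mapping, and write the result to Amazon S3. The database, table, column, and bucket names are invented for illustration.

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a source table that a crawler registered in the Data Catalog (names are hypothetical).
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_events"
)

# Rename and retype columns on the way through.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("event_id", "string", "event_id", "string"),
        ("ts", "string", "event_time", "timestamp"),
    ],
)

# Write the transformed data back out to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/events/"},
    format="parquet",
)
job.commit()
```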
AWS Glue Key Concepts
https://docs.aws.amazon.com/glue/latest/dg/components-key-concepts.html
AWS Glue Data Catalog
The persistent metadata store in AWS Glue. It contains table definitions, job definitions, and other control information to manage your AWS Glue environment. Each AWS account has one AWS Glue Data Catalog per region.

Crawler
A program that connects to a data store (source or target), progresses through a prioritized list of classifiers to determine the schema for your data, and then creates metadata tables in the AWS Glue Data Catalog.
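A crawler can also be defined programmatically. The boto3 sketch below creates one over an S3 prefix and starts it, so the tables it discovers land in the named Data Catalog database; the crawler name, IAM role ARN, database, and path are all placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="eu-west-1")

# All identifiers below are hypothetical.
glue.create_crawler(
    Name="example-raw-crawler",
    Role="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    DatabaseName="example_db",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/raw/"}]},
)
glue.start_crawler(Name="example-raw-crawler")
```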
https://docs.aws.amazon.com/glue/latest/dg/components-overview.html
AWS Glue Data Catalog
The AWS Glue Data Catalog is your persistent metadata store. It is a managed service that lets you store, annotate, and share metadata in the AWS Cloud in the same way you would in an Apache Hive metastore.

The Data Catalog also provides comprehensive audit and governance capabilities, with schema change tracking and data access controls. You can audit changes to data schemas. This helps ensure that data is not inappropriately modified or inadvertently shared.

Each AWS account has one AWS Glue Data Catalog per AWS region.

The AWS Glue Data Catalog consists of a hierarchy of databases and tables. Tables are the metadata definitions that represent your data, and databases are logical groupings of tables.
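To see that hierarchy from code, a short boto3 loop over the catalog might look like this (the printed columns are just for illustration):

```python
import boto3

glue = boto3.client("glue", region_name="eu-west-1")

# Walk the catalog: databases are logical groupings, tables hold the metadata
# (schema, storage location) that describes the underlying data.
for database in glue.get_databases()["DatabaseList"]:
    for table in glue.get_tables(DatabaseName=database["Name"])["TableList"]:
        location = table.get("StorageDescriptor", {}).get("Location", "")
        print(database["Name"], table["Name"], location)
```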
AWS Database Migration Service (DMS)
https://aws.amazon.com/dms/
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Components.html
Replication instance
At a high level, an AWS DMS replication instance is simply a managed Amazon Elastic Compute Cloud (Amazon EC2) instance that hosts one or more replication tasks.

Replication tasks
You use an AWS DMS replication task to move a set of data from the source endpoint to the target endpoint. Creating a replication task is the last step you need to take before you start a migration.
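Assuming a replication instance and source/target endpoints already exist, creating and starting a replication task with boto3 looks roughly like the sketch below; the ARNs, task identifier, and table mappings are placeholders.

```python
import json
import boto3

dms = boto3.client("dms", region_name="eu-west-1")

# Hypothetical mapping rule: include every table in the "public" schema.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-public",
        "object-locator": {"schema-name": "public", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="example-full-load-task",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",  # or "cdc" / "full-load-and-cdc"
    TableMappings=json.dumps(table_mappings),
)["ReplicationTask"]

# In practice you wait until the task status is "ready" before starting it.
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```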
Sources for AWS DMS
Oracle
Microsoft SQL Server
MySQL
MariaDB
PostgreSQL
MongoDB
SAP Adaptive Server Enterprise (ASE)
Microsoft Azure SQL Database
Amazon RDS instance databases
Amazon Simple Storage Service (Amazon S3)
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html
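Each of these engines is registered with DMS as a source endpoint. As one hedged example, a PostgreSQL source endpoint could be defined with boto3 like this (hostname, credentials, and database name are placeholders):

```python
import boto3

dms = boto3.client("dms", region_name="eu-west-1")

# Hypothetical connection details for a PostgreSQL source database.
dms.create_endpoint(
    EndpointIdentifier="example-postgres-source",
    EndpointType="source",
    EngineName="postgres",
    ServerName="db.example.internal",
    Port=5432,
    DatabaseName="appdb",
    Username="dms_user",
    Password="example-password",
)
```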