AWS Certified Data Engineer Associate Cheat Sheet

The AWS Certified Data Engineer Associate Cheat Sheet provides essential information and insights needed to prepare for the DEA-C01 exam, covering key AWS services and data engineering principles. It includes detailed sections on compute services like EC2, EKS, and ECS, as well as storage solutions such as S3 and EBS, emphasizing their functionalities and best practices. Mastery of these services is crucial for designing effective data processing and storage solutions in AWS.

Uploaded by Lahbib Fedi

AWS Certified Data Engineer Associate Cheat Sheet


Preparing for the AWS Certified Data Engineer Associate (DEA-C01) exam requires a deep understanding of AWS services and data engineering principles. The coverage of AWS services is both broad and, in some cases, deep. You'll need solid data engineering experience to pass this challenging exam.
This AWS cheat sheet for the AWS Certified Data Engineer Associate exam consolidates the core facts you need to know to pass the exam for each AWS service. Coupled with our practice tests, this knowledge will give you the edge on exam day.
Compute
In the Compute category of our AWS Certified Data Engineer Associate (DEA-C01) exam cheat sheet, we delve into the essential AWS compute services that are integral to the exam. This section provides key insights and facts about services including Amazon EC2 and Amazon ECS/EKS, which are fundamental in data engineering on AWS. Understanding these compute services is vital for tackling the DEA-C01 exam, as they form the backbone of many data processing and analytics solutions in the AWS ecosystem.
Amazon EC2 (Elastic Compute Cloud):
EC2 Instances: Amazon EC2 provides resizable compute
capacity in the cloud, allowing you to launch virtual servers
(instances) as needed.
Instance Types: EC2 offers a variety of instance types optimized for different use cases, such as compute-optimized, memory-optimized, and storage-optimized instances.
Elastic Block Store (EBS): EC2 instances use EBS volumes
for persistent storage. EBS volumes offer different types like
General Purpose (SSD), Provisioned IOPS (SSD), and
Magnetic.
Security Groups: These act as virtual firewalls for EC2 instances, controlling inbound and outbound traffic at the instance level.
Elastic IP Addresses: These are static IP addresses designed for dynamic cloud computing, allowing you to allocate and assign a fixed IP address to an EC2 instance.
Key Pairs: EC2 uses public-key cryptography to encrypt and decrypt login information. To log into your instances, you must create a key pair.
Amazon Machine Images (AMIs): AMIs are templates that
contain the software configuration (operating system,
application server, applications) required to launch an EC2
instance.
Instance Store Volumes: These provide temporary block-
level storage for some EC2 instances. Data on instance
store volumes is lost if the instance is stopped or
terminated.
Auto Scaling: This feature allows you to automatically scale
your EC2 capacity up or down according to conditions you
define.
Elastic Load Balancing (ELB): ELB automatically distributes incoming application traffic across multiple EC2 instances, improving application scalability and reliability.
Pricing Models: EC2 offers several pricing options, including On-Demand, Reserved Instances, and Spot Instances, each catering to different business needs and cost optimization strategies.
VPC Integration: EC2 instances are launched in a Virtual Private Cloud (VPC) to provide network isolation and connection to your own network.
Monitoring and Logging: Integration with Amazon CloudWatch allows for monitoring the performance of EC2 instances, providing metrics like CPU utilization, disk I/O, and network usage.
EC2 Instance Lifecycle: Understanding the lifecycle phases of an EC2 instance, including launching, starting, stopping, rebooting, and terminating.


AMI Customization: Ability to create custom AMIs from existing instances, which can be used to launch new instances with pre-configured settings.
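As a sketch of how these pieces fit together, the snippet below assembles the parameters for an EC2 launch request. The AMI ID, key pair name, and security group ID are hypothetical placeholders, and the actual boto3 call is only shown in a comment since it needs AWS credentials.

```python
# Sketch of an EC2 launch request; the AMI, key pair, and security
# group values used below are hypothetical placeholders.

def build_run_instances_params(ami_id, instance_type="t3.micro",
                               key_name=None, security_group_ids=None):
    """Assemble the keyword arguments for boto3's ec2.run_instances."""
    params = {
        "ImageId": ami_id,            # AMI: template for the instance
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name:
        params["KeyName"] = key_name  # key pair used for SSH login
    if security_group_ids:
        # security groups: the instance-level virtual firewall
        params["SecurityGroupIds"] = security_group_ids
    return params

# With credentials configured, you would launch with:
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.run_instances(**build_run_instances_params(
#       "ami-0123456789abcdef0", key_name="my-key"))
```

Keeping the parameter assembly separate from the API call makes the launch configuration easy to inspect and test without touching a live account.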
Amazon Elastic Kubernetes Service (Amazon EKS):

EKS Overview: Amazon EKS is a managed service that makes it easy to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes.
Kubernetes Clusters: EKS runs Kubernetes control plane
instances across multiple Availability Zones to ensure high
availability. It automatically detects and replaces unhealthy
control plane instances, and it provides automated version
upgrades.
Integration with AWS Services: EKS integrates with AWS
services like Elastic Load Balancing for distributing traffic,
IAM for authentication, and Amazon VPC for networking.
Worker Nodes Management: While EKS manages the
Kubernetes control plane, the responsibility of managing
worker nodes that run the applications lies with the user.
These can be EC2 instances or AWS Fargate.
EKS with Fargate: AWS Fargate is a serverless compute engine for containers that works with EKS. Fargate removes the need to provision and manage servers, and you only pay for the resources required to run your containers.
Networking in EKS: EKS can be integrated with Amazon
VPC, allowing you to isolate your cluster within your own
network and connect to your existing services or resources.
Load Balancing: EKS supports Elastic Load Balancing (ELB), which automatically distributes incoming application traffic across multiple targets, such as EC2 instances.


Security: EKS integrates with AWS IAM, providing granular
control over AWS resources. Security groups can be used to
control the traffic allowed to and from worker node
instances.
Logging and Monitoring: EKS integrates with Amazon CloudWatch and AWS CloudTrail for logging and monitoring. CloudWatch collects and tracks metrics, collects and monitors log files, and sets alarms.
Persistent Storage: EKS supports Amazon EBS and Amazon EFS for persistent storage of Kubernetes pods.
Scalability: EKS supports horizontal pod autoscaling and cluster autoscaling. Horizontal Pod Autoscaler automatically scales the number of pods, and Cluster Autoscaler adjusts the number of nodes.
Kubernetes API Compatibility: EKS provides a fully
managed Kubernetes API server that you can interact with
using your existing tools and workflows.
EKS Console: AWS provides a management console for
EKS, simplifying the process of creating, updating, and
deleting clusters.
EKS Pricing: EKS pricing is based on the number of hours
that your Kubernetes control plane runs, with no minimum
fees or upfront commitments.
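Because EKS exposes a standard Kubernetes API server, connecting existing tools usually means pulling the cluster endpoint and certificate authority data from `eks.describe_cluster`. The sketch below extracts those fields from a response dict shaped like boto3's; the cluster name, endpoint, and CA data are made-up sample values.

```python
# Extract the fields needed for a kubeconfig entry from an
# eks.describe_cluster response. The sample response is hypothetical.

def kubeconfig_fields(describe_cluster_response):
    """Pull name, API endpoint, and CA data from a describe_cluster response."""
    cluster = describe_cluster_response["cluster"]
    return {
        "name": cluster["name"],
        "endpoint": cluster["endpoint"],  # managed API server URL
        "ca_data": cluster["certificateAuthority"]["data"],  # base64 CA cert
    }

sample = {
    "cluster": {
        "name": "demo-cluster",
        "endpoint": "https://ABC123.gr7.us-east-1.eks.amazonaws.com",
        "certificateAuthority": {"data": "LS0tLS1CRUdJTi..."},
    }
}
fields = kubeconfig_fields(sample)
```

These are the same three values `aws eks update-kubeconfig` writes into your local kubeconfig so that kubectl can reach the managed control plane.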
Amazon Elastic Container Service (Amazon ECS):
ECS Overview: Amazon ECS is a fully managed container orchestration service that makes it easy to deploy, manage, and scale containerized applications using Docker.
Container Definition and Task Definitions: In ECS, a container definition is part of a task definition. It specifies how to run a Docker container, including CPU and memory allocations, environment variables, and networking settings.
ECS Tasks and Services: A task is the instantiation of a task definition within a cluster. An ECS service allows you to run and maintain a specified number of instances of a task definition simultaneously.
Cluster Management: ECS clusters are logical groupings of
tasks or services. You can run ECS on a serverless
infrastructure that’s managed by AWS Fargate or on a
cluster of EC2 instances that you manage.
Integration with AWS Fargate: AWS Fargate is a serverless compute engine for containers that removes the need to provision and manage servers. With Fargate, you specify and pay for resources per application.
Networking and Load Balancing: ECS can be integrated with Amazon VPC to provide isolation for the containerized applications. It also supports Elastic Load Balancing (ELB) for distributing incoming traffic.


Storage with EBS and EFS: ECS tasks can use EBS volumes
for persistent storage, which persists beyond the life of a
single task. ECS can also integrate with EFS for shared
storage between multiple tasks.
IAM Roles for Tasks: ECS tasks can have IAM roles
associated with them, allowing each task to have specific
permissions.
Logging and Monitoring: ECS integrates with Amazon
CloudWatch for logging and monitoring. CloudWatch Logs
can collect and store container logs, and CloudWatch
metrics can monitor resource utilization.
ECS Scheduling and Orchestration: ECS includes built-in schedulers for running containers based on resource needs and other requirements. It also supports integration with third-party schedulers.
Service Discovery: ECS supports service discovery, which makes it easy for your containerized services to discover and connect with each other.
ECS Security: Security in ECS involves securing the container instances, managing access to the ECS resources through IAM, and network traffic control with security groups and network ACLs.


ECS Pricing: Pricing for ECS is based on the resources you use, such as EC2 instances or Fargate compute resources. There is no additional charge for ECS itself.
Container Agent: ECS uses a container agent running on each container instance in an ECS cluster. The agent sends information about the resource's current running tasks and resource utilization to ECS.
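To make the task-definition concepts concrete, the sketch below assembles a minimal `register_task_definition` payload with one container definition, targeting Fargate. The family name, image, and sizes are hypothetical placeholders, and the live boto3 call is only shown as a comment.

```python
# Sketch of an ECS task definition for Fargate; the family name,
# image, and CPU/memory sizes are hypothetical placeholders.

def build_task_definition(family, image, cpu=256, memory=512):
    """Assemble register_task_definition parameters with one container."""
    return {
        "family": family,                     # groups revisions of this task
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",              # required network mode on Fargate
        "cpu": str(cpu),                      # task-level CPU units
        "memory": str(memory),                # task-level memory in MiB
        "containerDefinitions": [{
            "name": family,
            "image": image,                   # Docker image to run
            "essential": True,                # task stops if this container stops
        }],
    }

# With credentials configured:
#   import boto3
#   ecs = boto3.client("ecs")
#   ecs.register_task_definition(
#       **build_task_definition("etl-job", "my-repo/etl:latest"))
```

An ECS service would then reference this task definition's family and revision to keep the desired number of copies running.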
Storage
In the Storage section of our AWS Certified Data Engineer Associate (DEA-C01) exam cheat sheet, we focus on Amazon S3, S3 Select, Glacier, and EBS – key AWS storage services essential for data engineering.
This section provides detailed insights into Amazon S3 for
object storage, S3 Select for efficient data querying, Glacier
for long-term archival, and EBS for block-level storage.
Understanding the functionalities, use cases, and best
practices of these services is crucial for the DEA-C01 exam, as
they are fundamental in designing and implementing effective,
scalable, and cost-efficient storage solutions in AWS.
Amazon S3 (Simple Storage Service):
S3 Overview: Amazon S3 (Simple Storage Service) is an object storage service offering scalability, data availability, security, and performance.
Buckets and Objects: S3 stores data as objects within buckets. A bucket is a container for objects stored in Amazon S3.
S3 Data Consistency Model: Amazon S3 delivers strong read-after-write consistency for all PUT and DELETE operations on objects, including overwrite PUTS and DELETES.
Storage Classes: S3 offers a range of storage classes designed for different use cases, including S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA (Infrequent Access), S3 One Zone-IA, and S3 Glacier.
S3 Glacier: S3 Glacier is a secure, durable, and low-cost
storage class for data archiving. Retrieval times can range
from minutes to hours.
S3 Select: This feature allows retrieval of only a subset of data from an object, using simple SQL expressions. S3 Select improves the performance of applications by retrieving only the needed data from an S3 object.


Versioning: S3 supports versioning, enabling multiple versions of an object to be stored in the same bucket.
Lifecycle Policies: Lifecycle policies automate moving your objects between different storage tiers and can be used to expire objects at the end of their lifecycles.


Security and Encryption: S3 offers various encryption
options for data at rest and in transit. It also integrates with
AWS Identity and Access Management (IAM) for secure
access control.
Performance Optimization: Techniques like multipart
uploads, S3 Transfer Acceleration, and using byte-range
fetches can optimize the performance of S3.
Data Replication: S3 offers cross-region replication (CRR)
and same-region replication (SRR) for replicating objects
across buckets.
Event Notifications: S3 can send notifications when
specified events happen in a bucket, which can trigger
workflows, alerts, or other processing.
Access Management: S3 provides various mechanisms for managing access, including bucket policies, ACLs (Access Control Lists), and Query String Authentication.
S3 Analytics and Monitoring: Integration with Amazon CloudWatch and S3 Storage Class Analysis tools helps monitor and analyze storage usage.
S3 Pricing: Costs are based on storage used, number of requests, data transfer, and additional features like S3 Select and Glacier retrieval.
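An S3 Select request pairs a simple SQL expression with input and output serialization settings. The sketch below assembles the parameters for `s3.select_object_content` over a CSV object; the bucket, key, and column names are hypothetical, and the live call is shown only as a comment.

```python
# Sketch of an S3 Select request; the bucket, key, and column names
# are hypothetical. S3 Select returns only the rows matching the SQL.

def build_select_params(bucket, key, sql):
    """Assemble parameters for boto3's s3.select_object_content."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": sql,            # simple SQL run against the object
        "ExpressionType": "SQL",
        "InputSerialization": {       # how the stored object is parsed
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"JSON": {}},  # how results come back
    }

params = build_select_params(
    "my-data-bucket", "events/2024/01.csv",
    "SELECT s.user_id FROM s3object s WHERE s.country = 'DE'")
# With credentials: boto3.client("s3").select_object_content(**params)
```

Because only the matching rows cross the network, this pattern can cut both latency and data transfer compared to downloading the whole object.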
Amazon EBS (Elastic Block Store):
EBS Overview: Amazon EBS provides block-level storage volumes for use with Amazon EC2 instances. EBS volumes are highly available and reliable storage volumes that can be attached to any running instance in the same Availability Zone.
Volume Types: EBS offers different types of volumes for different needs, such as General Purpose (SSD), Provisioned IOPS (SSD), and Magnetic. Each type has distinct performance characteristics and cost implications.


Data Durability and Availability: EBS volumes are designed for high durability, protecting against failures by replicating within the same Availability Zone.
Snapshots: EBS allows you to create snapshots (backups)
of volumes, which are stored in Amazon S3. Snapshots can
be used for data recovery and creating new volumes.
Encryption: EBS provides the ability to encrypt volumes and
snapshots with AWS Key Management Service (KMS),
ensuring data security.
Performance Metrics: Understanding EBS performance
metrics like IOPS (Input/Output Operations Per Second) and
throughput is crucial for optimizing storage performance.
Scalability and Flexibility: EBS volumes can be easily
resized, and their performance can be changed depending
on the workload requirements.
EBS-Optimized Instances: Certain EC2 instances are EBS-optimized, offering dedicated bandwidth for EBS volumes, which is essential for high-performance workloads.
Lifecycle Management: Knowledge of EBS volume lifecycle,
from creation to deletion, and how it impacts EC2 instances
is important.
Cost Management: Understanding the pricing model of EBS, including volume types and snapshot storage costs, is crucial for cost-effective solutions.
Integration with EC2: EBS is tightly integrated with EC2, and knowledge of how they work together is essential for effective data engineering on AWS.
Use Cases: EBS is commonly used for databases, file
systems, and any applications that require a file system or
direct block-level access to storage.
Data Transfer: Data transfer between EBS and EC2 is a key concept, especially regarding performance and costs.

EBS and High Availability: Strategies for using EBS in high availability configurations, such as with EC2 Auto Scaling and across multiple Availability Zones.

Disaster Recovery: Using EBS snapshots for disaster recovery and understanding the process of restoring data from snapshots.
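The volume-type trade-off above can be captured in a toy decision rule. The 16,000 IOPS cutoff reflects the General Purpose SSD ceiling, but treat both thresholds and the rule itself as illustrative rather than an official sizing method.

```python
# Toy decision rule for EBS volume types; thresholds are illustrative.
# General Purpose SSD tops out around 16,000 IOPS, beyond which
# Provisioned IOPS SSD is the usual choice; Magnetic suits cold,
# performance-insensitive data.

def pick_volume_type(iops_needed, infrequent_access=False):
    """Suggest an EBS volume type from a required IOPS figure."""
    if infrequent_access:
        return "Magnetic"
    if iops_needed <= 16000:
        return "General Purpose (SSD)"
    return "Provisioned IOPS (SSD)"

print(pick_volume_type(3000))    # General Purpose (SSD)
print(pick_volume_type(50000))   # Provisioned IOPS (SSD)
```

In practice you would also weigh throughput, latency, and cost per GB, not IOPS alone.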
Networking
In the Networking section of our AWS Certified Data Engineer
Associate (DEA-C01) exam cheat sheet, we delve into the
intricacies of Amazon Virtual Private Cloud (VPC), AWS Direct
Connect, and AWS Transit Gateway.
This segment is tailored to enhance your understanding of AWS's networking services, which are pivotal in establishing secure, scalable, and efficient network architectures. Mastery of VPC for isolated cloud resources, Direct Connect for dedicated network connections, and Transit Gateway for network scaling and connectivity is essential for the DEA-C01 exam.
These services form the backbone of network management and optimization in AWS, crucial for any data engineering solution.
Amazon VPC (Virtual Private Cloud):
VPC Overview: Amazon VPC allows you to provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define.
Subnets: A VPC can be segmented into subnets, which are
subsets of the VPC’s IP address range. Subnets can be
public (with direct access to the internet) or private (without
direct access).
Internet Gateways (IGW): To enable access to or from the internet in a VPC, you must attach an Internet Gateway.
Route Tables: These define rules, known as routes, which determine where network traffic from your subnet or gateway is directed.
Network Access Control Lists (NACLs): Stateless firewalls for controlling traffic at the subnet level, allowing or denying traffic based on IP protocol, port number, and source/destination IP.
Security Groups: Act as virtual firewalls for EC2 instances
to control inbound and outbound traffic at the instance
level.
VPC Peering: Allows you to connect one VPC with another
via a direct network route using private IP addresses.
NAT Devices: Network Address Translation (NAT) devices
enable instances in a private subnet to connect to the
internet or other AWS services but prevent the internet from
initiating connections with the instances.
Elastic IP Addresses: These are static IPv4 addresses
designed for dynamic cloud computing, which can be
associated with instances or network interfaces in a VPC.
Virtual Private Gateway (VPG) and VPN Connection: A VPG is the VPN concentrator on the Amazon side of a VPN connection, and the VPN connection links your VPC to your own network.
Direct Connect: AWS Direct Connect bypasses the internet and provides a direct connection from your network to AWS, which can be used to create a more consistent network experience.
Endpoint Services: VPC Endpoint Services allow you to expose your own services within your VPC to other AWS accounts.
Flow Logs: Capture information about the IP traffic going to and from network interfaces in your VPC, which can be used for network monitoring, forensics, and security.
VPC Pricing: There is no additional charge for creating and using the VPC itself. Charges are for the AWS resources you create in your VPC and for data transfer.
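Subnet planning for a VPC can be sanity-checked locally with Python's ipaddress module. The sketch below verifies that candidate subnets fall inside the VPC CIDR and do not overlap one another; the CIDR blocks are made-up examples.

```python
import ipaddress

# Verify that candidate subnets fit inside a VPC CIDR and do not
# overlap each other. The CIDR blocks used below are hypothetical.

def validate_subnets(vpc_cidr, subnet_cidrs):
    """Return (ok, message) for a proposed VPC subnet layout."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    for s in subnets:
        if not s.subnet_of(vpc):          # subnet must be inside the VPC range
            return False, f"{s} is outside {vpc}"
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):             # AWS rejects overlapping subnets
                return False, f"{a} overlaps {b}"
    return True, "ok"

ok, msg = validate_subnets("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"])
print(ok, msg)  # True ok
```

Running this before calling `create_subnet` catches addressing mistakes without any API round trips.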


AWS Direct Connect:
Direct Connect Overview: AWS Direct Connect is a cloud service solution that makes it easy to establish a dedicated network connection from your premises to AWS.

Private Connectivity: It provides a private, dedicated


network connection between your data center and AWS,
bypassing the public internet.
Reduced Bandwidth Costs: By using AWS Direct Connect,
you can reduce network costs, increase bandwidth
throughput, and provide a more consistent network
experience than internet-based connections.
Connection Options: AWS Direct Connect offers different
connection speeds, starting from 50 Mbps up to 100 Gbps.
Virtual Interfaces (VIFs): You can create virtual interfaces
directly to public AWS services (Public VIF) or to resources
in your VPC (Private VIF).
Data Transfer: AWS Direct Connect charges reduced data transfer rates for AWS services, often lowering the cost of data transfer compared to internet rates.
Direct Connect Gateway: Allows you to connect to multiple VPCs in different AWS regions with the same AWS Direct Connect connection.
Partner Network: AWS Direct Connect can be set up through AWS partners who can help in establishing the physical connection between your network and AWS.
Hybrid Environments: Ideal for hybrid cloud architectures, providing a secure and reliable connection to AWS for workloads that require higher bandwidth or lower latency.


Consistent Performance: Offers more consistent network performance and lower latency compared to the internet.


Data Privacy: Since traffic is not traversing the public
internet, it provides a higher level of security and privacy for
your data.
Resilience and Redundancy: For enhanced resilience, you can set up multiple Direct Connect connections for redundancy.

Use Cases: Commonly used for high-volume data transfers, such as large-scale migrations, real-time data feeds, and hybrid cloud architectures.

Pricing Model: Pricing is based on the port speed and data transfer rates. There are no minimum commitments or long-term contracts required.


AWS Transit Gateway:
Transit Gateway Overview: AWS Transit Gateway acts as a
network transit hub, enabling you to connect your VPCs and
on-premises networks through a central point of
management.
Simplified Network Architecture: It simplifies the network
architecture by reducing the number of required peering
connections and managing them centrally.
Inter-Region Peering: Transit Gateway supports peering
connections across different AWS Regions, facilitating a
global network architecture.
Integration with Direct Connect: It can be integrated with AWS Direct Connect to create a unified network interface for both cloud and on-premises environments.
Routing and Segmentation: Offers advanced routing features for network segmentation and traffic management, including support for both static and dynamic routing.
Centralized Management: Provides a single gateway for all network traffic, simplifying management and monitoring of inter-VPC and VPC-to-on-premises connectivity.


Scalability: AWS Transit Gateway is designed to scale horizontally, allowing it to handle a growing amount of network traffic as your AWS environment expands.
Security and Compliance: Enhances network security and
compliance by providing a single point to enforce security
policies and network segmentation.
Cost-Effective: Reduces the overall operational cost by minimizing the complexity of network topology and reducing the number of peering connections.


VPN Support: Supports VPN connections, enabling secure connectivity between your on-premises networks and the AWS cloud.
Multicast Support: Transit Gateway supports multicast routing, which is useful for applications that need to send the same content to multiple destinations simultaneously.
High Availability and Resilience: Designed for high
availability and resilience, Transit Gateway automatically
scales with the increase in the volume of network traffic.
Flow Logs: Supports VPC Flow Logs for Transit Gateway,
allowing you to capture information about the IP traffic
going to and from network interfaces in your Transit
Gateway.
Pricing Model: Pricing is based on the amount of data
processed through the Transit Gateway and the number of
connections.
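The "fewer connections" claim is simple arithmetic: a full mesh of VPC peering connections needs n(n-1)/2 links, while a Transit Gateway hub needs only one attachment per VPC. A quick sketch:

```python
# Compare connection counts: full-mesh VPC peering vs. a single
# Transit Gateway hub (one attachment per VPC).

def full_mesh_peerings(n_vpcs):
    """Number of peering connections to fully mesh n VPCs."""
    return n_vpcs * (n_vpcs - 1) // 2

def transit_gateway_attachments(n_vpcs):
    """One attachment per VPC when routing through a Transit Gateway."""
    return n_vpcs

for n in (5, 10, 50):
    print(n, full_mesh_peerings(n), transit_gateway_attachments(n))
# 10 VPCs: 45 peering connections vs. 10 attachments
```

The gap grows quadratically, which is why hub-and-spoke topologies become attractive well before you reach dozens of VPCs.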
Serverless
In the Serverless section of our AWS Certified Data Engineer Associate (DEA-C01) exam cheat sheet, we delve into AWS Lambda, AWS Step Functions, and Amazon Managed Streaming for Apache Kafka (Amazon MSK).
This segment is tailored to enhance your understanding of AWS's serverless services, which are pivotal in building event-driven, scalable, and cost-efficient data pipelines. Mastery of Lambda for serverless compute, Step Functions for workflow orchestration, and MSK for managed streaming is essential for the DEA-C01 exam.
AWS Lambda:
Lambda Overview: AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes.
Event-Driven Architecture: Lambda functions are designed
to be triggered by AWS services like S3, DynamoDB,
Kinesis, SNS, and SQS, making it a key component in event-
driven architectures.
Scaling: AWS Lambda automatically scales your application
by running code in response to each trigger. Your code runs
in parallel and processes each trigger individually, scaling
precisely with the size of the workload.
Stateless: Lambda functions are stateless, meaning they do
not retain any state between invocations. For state
management, you need to use external services like S3 or
DynamoDB.
Supported Languages: Lambda supports multiple programming languages, including Node.js, Python, Ruby, Java, Go, .NET Core, and custom runtimes.
Time Limits: Lambda functions have a maximum execution time limit, which as of the latest update is 15 minutes.
Resource Allocation: You allocate memory to Lambda functions from 128 MB up to 10 GB in 1 MB increments; CPU power scales proportionally with the memory allocated.
Pricing: Lambda charges are based on the number of requests for your functions and the time your code executes.
Integration with AWS Services: Lambda can be integrated
with various AWS services for logging (CloudWatch),
monitoring (X-Ray), and security (IAM, VPC).
Deployment Packages: Lambda code can be deployed as a ZIP file or a container image.
Versioning and Aliases: Lambda supports versioning of functions. You can use aliases to route traffic between different versions.

Environment Variables: Lambda allows you to set environment variables for your functions, which can be used to store configuration settings and secrets.

Cold Starts: Understanding the concept of cold starts –


when a new instance of a function is created in response to
an event – and strategies to mitigate them.
Concurrency and Throttling: AWS Lambda has a
concurrency limit, which can be managed and configured.
Throttling may occur when these limits are reached.
Security: Lambda functions run in an AWS-managed VPC by default, but you can configure them to access resources within your own VPC.
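An event-driven, stateless handler looks like the sketch below. The event shape mirrors an S3 PUT notification; the bucket and object key in the sample event are hypothetical.

```python
import json

# Minimal, stateless Lambda handler for an S3 event notification.
# The bucket and key in the sample event are hypothetical.

def lambda_handler(event, context):
    """Extract bucket/key from each S3 record and return a summary."""
    processed = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        processed.append({
            "bucket": s3["bucket"]["name"],
            "key": s3["object"]["key"],
        })
    # No state survives between invocations; persist results to S3 or
    # DynamoDB if a later step needs them.
    return {"statusCode": 200, "body": json.dumps(processed)}

sample_event = {"Records": [
    {"s3": {"bucket": {"name": "raw-data"}, "object": {"key": "in/a.csv"}}}
]}
print(lambda_handler(sample_event, None)["statusCode"])  # 200
```

Because the handler is a plain function of its event, it can be unit-tested locally with a dict before it is ever deployed.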
AWS Step Functions:
Step Functions Overview: AWS Step Functions is a
serverless orchestration service that lets you combine AWS
Lambda functions and other AWS services to build
business-critical applications through visual workflows.
State Machine: Step Functions are based on the concept of a state machine, where each state represents a step in the workflow and can perform different functions like calculations, data retrieval, or decision-making.
Types of States:
Task State: Represents a single unit of work performed by a workflow. It can invoke Lambda functions, run ECS tasks, or interact with other supported AWS services.
Choice State: Adds branching logic to the workflow, allowing for decisions to be made based on the input.
Wait State: Delays the state machine from transitioning to
the next state for a specified time.
Succeed and Fail States: Indicate the successful or
unsuccessful termination of the state machine.
Parallel State: Allows for the concurrent execution of multiple branches of a workflow.
Map State: Processes multiple input elements dynamically, iterating through a set of steps for each element of an array.


Integration with AWS Services: Step Functions can integrate with various AWS services, enabling complex workflows that include functions like data transformation, batch processing, and report generation.

Error Handling: Provides robust error handling mechanisms,


allowing you to catch errors and implement retry logic or
fallback states.
Execution History: Keeps a detailed history of each
execution, which is useful for debugging and auditing
purposes.
Visual Interface: Offers a graphical console to visualize the
components of your workflow and their real-time status.
Scalability and Reliability: Automatically scales the
execution of workflows and ensures the reliable execution of
each step.
Pricing: Charges are based on the number of state
transitions in your workflows, making it cost-effective for a
wide range of use cases.
Use Cases: Commonly used for data processing, task coordination, microservices orchestration, and automated IT and business processes.
IAM Integration: Uses AWS Identity and Access Management (IAM) to control access to resources and services used in workflows.
API Support: Provides APIs for managing and executing
workflows programmatically.
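The state types above are just Amazon States Language JSON. The sketch below builds a minimal definition as a Python dict with a Task state, a Choice state, and terminal Succeed/Fail states; the Lambda ARN is a hypothetical placeholder.

```python
import json

# Minimal Amazon States Language definition built as a Python dict.
# The Lambda function ARN below is a hypothetical placeholder.

definition = {
    "StartAt": "Transform",
    "States": {
        "Transform": {            # Task state: one unit of work
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "Next": "CheckResult",
        },
        "CheckResult": {          # Choice state: branch on the task output
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.ok", "BooleanEquals": True, "Next": "Done"},
            ],
            "Default": "Failed",
        },
        "Done": {"Type": "Succeed"},   # successful termination
        "Failed": {"Type": "Fail"},    # unsuccessful termination
    },
}

# Pass json.dumps(definition) to create_state_machine with boto3's
# "stepfunctions" client to register this workflow.
```

Serializing the dict with `json.dumps` gives exactly the ASL document that the Step Functions console visualizes as a flow diagram.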
Amazon Managed Streaming for Apache Kafka (Amazon MSK):
MSK Overview: Amazon MSK is a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data.
Apache Kafka Integration: MSK is fully compatible with Apache Kafka, allowing you to use Kafka APIs for creating, configuring, and managing your Kafka clusters.
Cluster Management: MSK handles the provisioning,
configuration, and maintenance of Kafka clusters, including
tasks like patching and updates.
Scalability: MSK can scale out to handle high throughput
and large numbers of topics and partitions, making it suitable for big data streaming applications.
suitable for big data streaming applications.
High Availability: MSK is designed for high availability with
replication across multiple AWS Availability Zones.
Security: Supports encryption at rest and in transit, VPC
integration, IAM for access control, and private connectivity
to ensure secure data handling.
Monitoring and Logging: Integrates with Amazon
CloudWatch for metrics and logging, allowing you to monitor
the health and performance of your Kafka clusters.
Data Retention: MSK allows you to configure the data
retention period, enabling you to store data for a specified
duration.
Consumer Lag Metrics: Provides consumer lag metrics,
which are critical for monitoring the health of streaming
applications.
Automatic Scaling: Supports automatic scaling of the
storage associated with your MSK clusters.
Kafka Connect and Kafka Streams: Compatible with Kafka
Connect for data integration and Kafka Streams for stream
processing.
Pricing: Pricing is based on the resources consumed,
including the number of broker nodes, storage, and data
transfer.
Use Cases: Commonly used for real-time analytics, log
aggregation, message brokering, and event-driven architectures.
Broker Node Configuration: Allows you to select the type
and number of broker nodes, providing flexibility based on
your workload requirements.
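Kafka's per-key ordering, which MSK inherits, rests on deterministic key-to-partition routing. A toy sketch of that idea follows; Kafka's default partitioner actually uses murmur2, so MD5 here is only a stand-in for a deterministic hash:

```python
import hashlib

def choose_partition(key: str, num_partitions: int) -> int:
    """Sketch of key-based partitioning: hash the message key, then take
    it modulo the partition count. Deterministic, so equal keys always
    map to the same partition."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages with the same key land on the same partition, preserving
# per-key ordering within a topic.
p1 = choose_partition("sensor-42", 6)
p2 = choose_partition("sensor-42", 6)
```

Because routing depends only on the key and partition count, adding partitions changes where keys map, which is why topics are usually sized up front.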
Database
In the Database section of our AWS Certified Data Engineer Associate (DEA-C01) exam cheat sheet, we focus on a range of pivotal AWS database services including Amazon RDS, Aurora, DynamoDB, Redshift, and AWS Data Pipeline.
This segment is crafted to equip you with a thorough
understanding of these database technologies, each playing a
significant role in AWS data engineering. From the managed
relational database capabilities of RDS and Aurora to the
NoSQL solutions offered by DynamoDB, the powerful data
warehousing features of Redshift, and the data orchestration
provided by AWS Data Pipeline, mastering these services is
essential for the DEA-C01 exam.
This section aims to provide the knowledge needed to design,
implement, and manage robust, scalable, and efficient
database solutions in the AWS ecosystem.
Amazon RDS (Relational Database Service):
Database Engines Supported: Amazon RDS supports several database engines, including MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server.
Automated Backups: RDS automatically performs a daily
backup of your database (during a specified backup
window) and captures the entire DB instance and its data.
DB Snapshots: RDS allows you to create manual backups of
your database, known as DB Snapshots, which are user-initiated and retained until explicitly deleted.
Multi-AZ Deployments: RDS offers Multi-AZ deployments
for high availability. In a Multi-AZ deployment, RDS
automatically provisions and maintains a synchronous
standby replica in a different Availability Zone.
Read Replicas: RDS supports read replicas to increase read
scaling. Changes to the primary DB instance are
asynchronously copied to the read replica.
Storage Types: RDS offers three types of storage: General
Purpose SSD (gp2), Provisioned IOPS SSD (io1), and
Magnetic. The choice depends on the type of workload.
Scaling: RDS allows vertical scaling (changing the instance
type) and storage scaling. Storage scaling is online and
does not require downtime.
Security: RDS integrates with AWS Identity and Access
Management (IAM) and offers encryption at rest using AWS
Key Management Service (KMS). It also supports encryption
in transit using SSL.
Monitoring and Metrics: RDS integrates with Amazon
CloudWatch for monitoring the performance and health of
databases. Key metrics include CPU utilization, read/write
throughput, and database connections.
Parameter Groups: RDS uses DB Parameter Groups to
manage the configuration and tuning of the database
engine.
Subnet Groups: DB Subnet Groups in RDS define which
subnets and IP ranges the database can use in a VPC,
allowing for network isolation.
Maintenance and Updates: RDS provides a maintenance
window for updates to the database engine, which can be
specified by the user.
Endpoint Types: RDS instances have endpoints, and each
type (primary, read replica, custom) serves different
purposes in database connectivity.
Pricing Model: RDS pricing is based on the resources
consumed, such as DB instance hours, provisioned storage,
provisioned IOPS, and data transfer.
Failover Process: In Multi-AZ deployments, RDS
automatically performs failover to the standby in case of an
issue with the primary instance.
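The read replica and endpoint concepts above come together in application code as read/write routing. The sketch below is an illustrative pattern, not an AWS API, and the endpoint hostnames are hypothetical:

```python
import itertools

class EndpointRouter:
    """Sketch of the routing pattern an application uses with RDS read
    replicas: writes go to the primary endpoint, reads rotate across
    replica endpoints round-robin."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def endpoint_for(self, operation: str) -> str:
        if operation.upper() in ("INSERT", "UPDATE", "DELETE"):
            return self.primary           # writes must hit the primary
        return next(self._replica_cycle)  # reads spread across replicas

# Hypothetical endpoint names, for illustration only.
router = EndpointRouter(
    "mydb.primary.rds.amazonaws.com",
    ["mydb.replica-1.rds.amazonaws.com", "mydb.replica-2.rds.amazonaws.com"],
)
write_target = router.endpoint_for("UPDATE")
read_target_a = router.endpoint_for("SELECT")
read_target_b = router.endpoint_for("SELECT")
```

Note that replica replication is asynchronous, so a read routed to a replica immediately after a write may observe slightly stale data.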
Amazon Aurora:
Aurora Overview: Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud, providing the performance and availability of high-end commercial databases at a fraction of the cost.
High Performance and Scalability: Aurora provides up to
five times the throughput of standard MySQL and three
times the throughput of standard PostgreSQL. It’s designed
to scale storage automatically, growing in 10GB increments
up to 64TB.
Aurora Replicas: Supports up to 15 low latency read
replicas across three Availability Zones to increase read
scalability and fault tolerance.
Aurora Serverless: Aurora Serverless is an on-demand,
auto-scaling configuration for Aurora where the database
will automatically start-up, shut down, and scale capacity up
or down based on your application’s needs.
Storage and Replication: Aurora replicates data across
multiple Availability Zones for improved availability and
reliability. It uses a distributed, fault-tolerant, self-healing
storage system.
Backup and Recovery: Continuous backup to Amazon S3
and point-in-time recovery are supported. Snapshots can be
shared with other AWS accounts.
Security: Offers encryption at rest using AWS Key
Management Service (KMS) and encryption in transit with
SSL. Also integrates with AWS Identity and Access
Management (IAM).
Aurora Global Database: Designed for globally distributed
applications, allowing a single Aurora database to span
multiple AWS regions with fast replication.
Database Cloning: Supports fast database cloning, which is
useful for development and testing.
Compatibility: Offers full compatibility with existing MySQL
and PostgreSQL open-source databases.
Monitoring and Maintenance: Integrates with Amazon
CloudWatch for monitoring. Aurora automates time-
consuming tasks like patching and backups.
Custom Endpoints: Aurora allows you to create custom
endpoints that can direct read/write operations to specific
instances.
Pricing: Aurora pricing is based on instance hours, storage
consumed, and I/O operations. Aurora Serverless charges
for actual consumption by the second.


Failover: Automatic failover to a replica in the case of a
failure, improving the database’s availability.
Aurora Parallel Query: Enhances performance by pushing
query processing down to the Aurora storage layer,
speeding up analytical queries.
Amazon DynamoDB:
DynamoDB Overview: Amazon DynamoDB is a fully
managed NoSQL database service that provides fast and
predictable performance with seamless scalability.
Data Model: DynamoDB is a key-value and document
database. It supports JSON-like documents and simple key-
value pairs.
Primary Key Types: DynamoDB supports two types of
primary keys:
Partition Key: A simple primary key, composed of one attribute.
Composite Key: Consists of a partition key and a sort
key.
Read/Write Capacity Modes: Offers two read/write capacity
modes:
Provisioned Throughput Mode: Pre-allocate capacity
units.
On-Demand Mode: Automatically scales to
accommodate workload demands.
Secondary Indexes: Supports two types of secondary
indexes for more complex queries:
Global Secondary Indexes (GSI): Index with a partition
key and sort key that can be different from those on the
table.
Local Secondary Indexes (LSI): Index with the same
partition key as the table but a different sort key.
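The composite-key model above can be illustrated with a small in-memory sketch (this is a toy, not the boto3 API): items live under a (partition key, sort key) pair, and a query returns all items sharing a partition key, ordered by sort key.

```python
# Toy model of DynamoDB's composite primary key: (partition key, sort key).
table: dict[tuple[str, str], dict] = {}

def put_item(pk: str, sk: str, attrs: dict) -> None:
    table[(pk, sk)] = {"pk": pk, "sk": sk, **attrs}

def query(pk: str) -> list[dict]:
    """Return every item in one partition, sorted by sort key."""
    items = [v for (p, _), v in table.items() if p == pk]
    return sorted(items, key=lambda item: item["sk"])

put_item("user#1", "order#2024-01-15", {"total": 40})
put_item("user#1", "order#2024-03-02", {"total": 25})
put_item("user#2", "order#2024-02-10", {"total": 90})

user1_orders = query("user#1")
```

A GSI is conceptually the same structure maintained with a different partition/sort key pair over the same items, which is why it supports query patterns the base table cannot.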
Consistency Models: Offers both strongly consistent and eventually consistent read options.
DynamoDB Streams: Captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours.
Auto Scaling: Automatically adjusts read and write
throughput capacity, in response to dynamically changing
request volumes.
DynamoDB Accelerator (DAX): In-memory caching service
for DynamoDB, delivering fast read performance.
Data Backup and Restore: Supports on-demand and
continuous backups, point-in-time recovery, and restoration
of table data.
Security: Integrates with AWS Identity and Access
Management (IAM) for authentication and authorization.
Supports encryption at rest.
Integration with AWS Lambda: Enables the triggering of
AWS Lambda functions directly from DynamoDB Streams.
Global Tables: Provides fully replicated, multi-region, multi-master tables for high availability and global data access.
Pricing: Based on provisioned throughput and stored data.
Additional charges for optional features like DAX, Streams,
and backups.
Use Cases: Ideal for web-scale applications, gaming, mobile
apps, IoT, and many other applications requiring low-latency
data access.
Amazon Redshift:
Redshift Overview: Amazon Redshift is a fully managed,
petabyte-scale data warehouse service in the cloud,
allowing users to analyze data using standard SQL and
existing Business Intelligence (BI) tools.
Columnar Storage: Redshift uses columnar storage, which
is optimized for data warehousing and analytics, leading to
faster query performance and efficient storage.
Node Types: Redshift offers two types of nodes – dense
compute (DC) and dense storage (DS) – which are chosen
based on the amount of data and the computational power
required.
Data Distribution Styles:
Even Distribution: Distributes table rows evenly across
all slices and nodes.
Key Distribution: Distributes rows based on the values of
the specified column.
All Distribution: Copies the entire table to every node,
beneficial for smaller dimension tables.
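The three distribution styles can be modeled with a toy slice assigner (illustrative only, not Redshift internals): EVEN round-robins rows, KEY co-locates rows sharing a distribution-key value, and ALL copies the whole table to every slice.

```python
import hashlib

def assign_slices(rows, style, dist_key=None, num_slices=4):
    """Toy model of Redshift distribution styles across node slices."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        if style == "EVEN":
            slices[i % num_slices].append(row)        # round-robin
        elif style == "KEY":
            h = int(hashlib.md5(str(row[dist_key]).encode()).hexdigest(), 16)
            slices[h % num_slices].append(row)        # hash of dist key
        elif style == "ALL":
            for s in slices:                          # full copy per slice
                s.append(row)
    return slices

rows = [{"customer_id": cid, "amount": 10 * cid} for cid in (1, 2, 1, 3)]
even = assign_slices(rows, "EVEN")
by_key = assign_slices(rows, "KEY", dist_key="customer_id")
replicated = assign_slices(rows, "ALL")
```

Co-locating rows by join key (KEY style) lets joins on that key run without redistributing data between nodes, which is the main reason to choose it for large fact tables.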
Sort Keys: Sort keys determine the order of data within
each block and can significantly impact query performance.
Redshift supports both compound and interleaved sort
keys.
Redshift Spectrum: Allows querying data directly in
Amazon S3, enabling a data lake architecture. It’s used for
running queries on large datasets in S3 without loading
them into Redshift.
Concurrency Scaling: Automatically adds additional cluster
capacity to handle an increase in concurrent read queries.
Workload Management (WLM): Redshift WLM allows users
to define multiple queues and assign memory and
concurrency limits to manage query performance.
Elastic Resize: Quickly adds or removes nodes to match
workload demands, enabling fast scaling of the cluster’s
compute resources.
VACUUM Command: Used to reclaim space and resort rows
in tables where data has been updated or deleted,
optimizing storage efficiency and query performance.
Redshift Data API: Enables running SQL queries on data in
Redshift asynchronously and retrieving the results through a
simple API call, useful for integrating with web services and
AWS Lambda.
Encryption and Security: Supports encryption at rest and in
transit, along with VPC integration and IAM for access
control.
Backup and Restore: Automated and manual snapshots for
data backup and point-in-time recovery.
Query Optimization: Redshift’s query optimizer uses cost-based algorithms and machine learning to deliver fast query
performance.
Pricing Model: Based on the type and number of nodes in
the cluster, with additional costs for features like Redshift
Spectrum and data transfer.
Use Cases: Ideal for complex querying and analysis of large
datasets, business intelligence applications, and data
warehousing.
AWS Data Pipeline:
Data Pipeline Overview: AWS Data Pipeline is a web service
for processing and moving data between different AWS
compute and storage services, as well as on-premises data
sources, at specified intervals.
Data Movement and Transformation: It can be used to
regularly transfer and transform data between AWS services
like Amazon S3, RDS, DynamoDB, and EMR.
Workflow Management: Data Pipeline allows you to create
complex data processing workloads that are fault-tolerant,
repeatable, and highly available.
Scheduling: You can schedule regular data movement and
data processing activities. The service ensures that these
tasks are carried out at defined intervals.
Prebuilt Templates: AWS Data Pipeline provides prebuilt
templates for common scenarios like copying data between
Amazon S3 and RDS or running queries on a schedule.
Custom Scripts: Supports custom scripts written in SQL,
Python, and other scripting languages for data
transformation tasks.
Error Handling: Provides options to retry failed tasks and to
notify users of success or failure through Amazon SNS.
Resource Management: Manages the underlying resources
needed to perform data movements and transformations,
automatically spinning up EC2 instances or EMR clusters as
needed.
Integration with AWS IAM: Uses AWS Identity and Access
Management (IAM) for security and access control to
resources and pipeline activities.
Visual Interface: Offers a drag-and-drop web interface to
create and manage data processing workflows.
Pipeline Definition: Pipelines are defined in JSON format,
specifying the data sources, destinations, activities, and
scheduling information.
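The JSON pipeline definition described above can be sketched as a Python dictionary. This is a hedged, incomplete illustration of the object style (ids, types, and refs); the names, schedule, and S3 path are hypothetical, and a real definition would include more objects and fields.

```python
import json

# Sketch of a Data Pipeline definition: a list of objects with ids,
# types, refs, and scheduling fields. Incomplete and illustrative only.
pipeline_definition = {
    "objects": [
        {
            "id": "DefaultSchedule",
            "type": "Schedule",
            "period": "1 day",                      # run the activity daily
            "startDateTime": "2024-01-01T00:00:00",
        },
        {
            "id": "CopyLogsToS3",
            "type": "CopyActivity",
            "schedule": {"ref": "DefaultSchedule"}, # objects link via refs
            "output": {"ref": "S3Destination"},
        },
        {
            "id": "S3Destination",
            "type": "S3DataNode",
            "directoryPath": "s3://my-example-bucket/logs/",
        },
    ]
}

serialized = json.dumps(pipeline_definition)
object_ids = [obj["id"] for obj in pipeline_definition["objects"]]
```

The ref-based linking is what lets one schedule object drive many activities, and what the visual interface renders as a graph.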
Logging and Monitoring: Integrates with Amazon
CloudWatch for monitoring pipeline performance and logs
activities for auditing and troubleshooting.
Pricing: Charges are based on the number of preconditions
and activities used in your pipeline and the compute
resources consumed.
Use Cases: Commonly used for regular data extraction,
transformation, and loading (ETL) tasks, data backup, and
log processing.
Analytics
In the Analytics section of our AWS Certified Data Engineer
Associate (DEA-C01) exam cheat sheet, we delve into the core
AWS analytics services, including AWS Glue, Amazon Athena,
Amazon EMR, Amazon Kinesis, AWS Lake Formation, and
Amazon QuickSight.
This part of the guide is essential for understanding how to
leverage these services to analyze and process large datasets
effectively.
AWS Glue:
AWS Glue Overview: AWS Glue is a fully managed extract,
transform, and load (ETL) service that makes it easy to
prepare and load data for analytics.
Glue Data Catalog: Acts as a centralized metadata
repository for all your data assets, regardless of where they
are stored. It integrates with Amazon Athena, Amazon
Redshift Spectrum, and AWS Lake Formation.
AWS Glue Crawlers: Automatically discover and profile your
data. Crawlers scan various data stores to infer schemas
and populate the Glue Data Catalog with table definitions
and other metadata.
ETL Jobs in Glue: Allows you to author and orchestrate ETL
jobs. These jobs can be triggered on a schedule or in
response to an event.
AWS Glue Studio: A visual interface to create, run, and
monitor ETL jobs. It simplifies the process of writing ETL
scripts with a drag-and-drop editor.
AWS Glue DataBrew: A visual data preparation tool that
enables data analysts and data scientists to clean and
normalize data without writing code.
Glue Schema Registry: Manages schema versioning and
validation. It’s used to track different versions of data
schemas and validate data formats to ensure data quality.
Script Generation: Glue automatically generates ETL scripts
in PySpark or Scala that can be customized as needed.
Serverless Architecture: AWS Glue is serverless, so there is
no infrastructure to manage. It automatically provisions the
resources required to run your ETL jobs.
Data Sources and Targets: Supports various data sources
and targets, including Amazon S3, RDS, Redshift, and third-party databases.
Built-in Transforms: Provides a library of predefined
transforms to perform operations like joining, filtering, and
sorting data.
Security: Integrates with AWS IAM for access control and
supports encryption of data in transit and at rest.
Monitoring and Logging: Integrates with Amazon
CloudWatch for monitoring ETL job execution and logs.
Pricing: Based on the resources consumed by the ETL jobs
and the number of DataBrew interactive sessions.
Use Cases: Commonly used for data integration, data cleansing, data normalization, and building data lakes.
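The crawler behavior described above, scanning records and inferring a schema for the Data Catalog, can be sketched with a toy inferencer. Real crawlers use classifiers and handle many formats; this is only a conceptual illustration with made-up column names.

```python
def infer_schema(records: list[dict]) -> dict[str, str]:
    """Toy sketch of crawler-style schema inference: map each column in
    sample records to a catalog-style type name."""
    schema: dict[str, str] = {}
    type_names = {bool: "boolean", int: "bigint", float: "double", str: "string"}
    for record in records:
        for column, value in record.items():
            inferred = type_names.get(type(value), "string")
            # Widen bigint to double if both appear in the same column.
            if schema.get(column) == "bigint" and inferred == "double":
                schema[column] = "double"
            else:
                schema.setdefault(column, inferred)
    return schema

sample = [
    {"user_id": 1, "score": 9.5, "country": "DE"},
    {"user_id": 2, "score": 7.0, "country": "FR", "premium": True},
]
inferred = infer_schema(sample)
```

Note how the second record contributes a column (`premium`) the first lacked; crawlers similarly merge schemas across files and partitions.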
Amazon Athena:
Athena Overview: Amazon Athena is an interactive query
service that makes it easy to analyze data in Amazon S3
using standard SQL.
Serverless: Athena is serverless, so there is no
infrastructure to manage. You pay only for the queries you
run.
S3 Integration: Directly works with data stored in S3. It’s
commonly used for querying log files, clickstream data, and
other unstructured/semi-structured data.
SQL Compatibility: Supports most of the standard SQL
functions, including joins, window functions, and arrays.
Data Formats: Works with multiple data formats such as
JSON, CSV, Parquet, ORC, and Avro.
Schema Definition: Uses the AWS Glue Data Catalog for
schema management, which stores metadata and table
definitions.
Partitioning: Supports partitioning of data, which improves
query performance and reduces costs by scanning only
relevant data.
Query Results: Athena stores query results in S3, and you
can specify the output location.
Security: Integrates with AWS IAM for access control.
Supports encryption at rest for query results in S3.
Performance Optimization: Query performance can be
optimized by using columnar formats like Parquet or ORC,
compressing data, and partitioning datasets.
Cost Optimization: Athena charges are based on the
amount of data scanned per query. Costs can be optimized
by compressing data, partitioning, and using columnar data
formats.
Use Cases: Ideal for ad-hoc querying, data analysis, and
business intelligence applications.
Integration with Other AWS Services: Integrates with AWS Glue for ETL, AWS QuickSight for visualization, and AWS
Lambda for advanced processing.


Federated Query: Athena supports federated queries,
allowing you to run SQL queries across data stored in
relational, non-relational, object, and custom data sources.
Saved Queries and History: Athena allows saving queries
and maintains a history of executed queries for auditing and
review purposes.
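Since Athena bills by data scanned, the payoff of partitioning can be shown with a toy estimate. The numbers below are hypothetical and the model assumes evenly sized partitions, which real tables rarely have exactly:

```python
def scanned_bytes(total_bytes: int, partitions: int, partitions_hit: int) -> int:
    """Toy model of partition pruning: a query restricted to a subset of
    partitions scans only those partitions (evenly sized assumed)."""
    return total_bytes * partitions_hit // partitions

TB = 1024 ** 4
# Hypothetical table: 10 TB of logs split into 365 daily partitions.
full_scan = scanned_bytes(10 * TB, 365, 365)
one_week = scanned_bytes(10 * TB, 365, 7)
savings_ratio = one_week / full_scan
```

Filtering a query to one week of daily partitions scans roughly 7/365 of the data, and columnar formats plus compression shrink the scanned bytes further still.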
Amazon EMR (Elastic MapReduce):
EMR Overview: Amazon EMR is a cloud-native big data
platform, allowing processing of vast amounts of data
quickly and cost-effectively across resizable clusters of
Amazon EC2 instances.
Hadoop Ecosystem: EMR supports a broad array of big
data frameworks, including Apache Hadoop, Spark, HBase,
Presto, and Flink, making it suitable for a variety of
processing tasks like batch processing, streaming, machine
learning, and interactive analytics.
Cluster Management: EMR simplifies the setup,
management, and scaling of big data processing clusters. It
offers options for auto-scaling the cluster size based on
workload.
Data Storage: EMR can process data from Amazon S3,
DynamoDB, Amazon RDS, and Amazon Redshift. It also
supports HDFS (Hadoop Distributed File System) and EMR
File System (EMRFS).
EMRFS (EMR File System): An implementation of HDFS that
allows EMR clusters to store data directly in Amazon S3,
providing durability and cost savings on storage.
Pricing Model: Offers a pay-as-you-go pricing model. You
pay for the EC2 instances and other AWS resources (like
Amazon S3) used while your cluster is running.
Security: Integrates with AWS IAM for authentication and
authorization. Supports encryption in transit and at rest, and
can be configured to launch in a VPC.
Spot Instances: Supports the use of EC2 Spot Instances to
optimize the cost of processing large datasets.
Customization and Flexibility: Allows customization of
clusters with bootstrap actions and supports multiple
instance types and configurations.
Data Processing Optimization: Offers optimizations for
processing with Spark and Hadoop, including optimized
versions of these frameworks.
Monitoring and Logging: Integrates with Amazon
CloudWatch for monitoring the performance of the cluster.
Also supports logging to Amazon S3 for audit and
troubleshooting purposes.
Notebook Integration: Supports Jupyter and Zeppelin
notebooks for interactive data exploration and visualization.
EMR Studio: An integrated development environment (IDE)
for developing, visualizing, and debugging data engineering
and data science applications written in R, Python, Scala,
and PySpark.
Use Cases: Commonly used for log analysis, real-time
analytics, web indexing, data transformations (ETL),
machine learning, scientific simulation, and bioinformatics.
EMR Managed Scaling: Automatically resizes clusters for
optimal performance and cost with EMR managed scaling.
Amazon Kinesis (including Kinesis Data Streams,
Data Firehose, and Data Analytics):
Amazon Kinesis Overview: Amazon Kinesis is a platform for
streaming data on AWS, offering powerful services to load
and analyze streaming data, and also providing the ability to
build custom streaming data applications.
Kinesis Data Streams (KDS):
Purpose: Enables real-time processing of streaming data
at a massive scale.
Key Features: Allows you to continuously collect and
store terabytes of data per hour from hundreds of
thousands of sources.
Consumers: Data can be processed with custom
applications using Kinesis Client Library (KCL) or other
AWS services like Kinesis Data Analytics, Kinesis Data
Firehose, and AWS Lambda.
Kinesis Data Firehose:
Purpose: Automatically loads streaming data into AWS
data stores and analytics tools.
Key Features: Supports near-real-time loading of data
into Amazon S3, Redshift, Elasticsearch Service, and
Splunk.
Transformation and Conversion: Offers capabilities to
transform and convert incoming streaming data before
loading it to destinations.
Kinesis Data Analytics:
Purpose: Enables you to analyze streaming data with
SQL or Apache Flink without having to learn new
programming languages or processing frameworks.
Key Features: Provides built-in functions to filter,
aggregate, and transform streaming data for advanced
analytics.
Integration: Seamlessly integrates with Kinesis Data
Streams and Kinesis Data Firehose for sourcing data.
Shards in Kinesis Data Streams:
Functionality: A stream is composed of one or more
shards, each of which provides a fixed unit of capacity.
Scaling: The total capacity of a stream is the sum of the
capacities of its shards.
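The shard model above can be sketched in code: Kinesis MD5-hashes a record's partition key into a 128-bit number, and each shard owns a contiguous slice of that hash range. The even split below is a simplifying assumption (shards can also be split or merged unevenly):

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Sketch of Kinesis record routing: MD5 the partition key into a
    128-bit value, then find which shard's hash-key range contains it
    (here the range is split evenly among num_shards)."""
    hash_value = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    range_size = 2 ** 128 // num_shards
    return min(hash_value // range_size, num_shards - 1)

# Records sharing a partition key always land on the same shard, so each
# shard preserves ordering for its keys; throughput scales with shard count.
shard_a = shard_for_key("device-17", 4)
shard_b = shard_for_key("device-17", 4)
```

This is also why a skewed partition key (many records with one key) creates a "hot shard": the hash sends all of that key's traffic to a single shard's fixed capacity.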
Data Retention: Kinesis Data Streams stores data for 24
hours by default, which can be extended up to 7 days.
Real-Time Processing: Kinesis is designed for real-time
processing of data as it arrives, unlike batch processing.
Security: Supports encryption at rest and in transit, IAM for
access control, and VPC endpoints for private network
access.
Monitoring and Logging: Integrates with Amazon
CloudWatch for monitoring the performance of streams and
firehoses.
Use Cases: Ideal for real-time analytics, log and event data
collection, real-time metrics, and reporting, and IoT data
processing.
Pricing: Based on the volume of data processed and the
number of shards used in Kinesis Data Streams, and the
amount of data ingested and transformed in Kinesis Data
Firehose.
AWS Lake Formation:
Lake Formation Overview: AWS Lake Formation simplifies
the process of setting up a secure and well-architected data
lake. It automates the provisioning and configuration of the
underlying resources needed for a data lake on AWS.
Data Lake Creation and Management: Lake Formation
assists in collecting, cleaning, and cataloging data from
various sources. It organizes data into a central repository in
Amazon S3.
Integration with AWS Services: Works seamlessly with
other AWS services like Amazon Redshift, Amazon Athena,
and AWS Glue. It uses the AWS Glue Data Catalog as a
central metadata repository.
Security and Access Control: Provides granular access
control to data stored in the data lake. It integrates with
AWS Identity and Access Management (IAM) to manage
permissions and access.
Data Cataloging: Automatically crawls data sources to
identify and catalog data, making it searchable and
queryable.
Data Cleaning and Transformation: Offers tools to clean
and transform data using AWS Glue, making it ready for
analysis.
Blueprints: Lake Formation provides blueprints for common
data ingestion patterns, such as database replication or log
processing, simplifying the process of data loading.
Machine Learning Integration: Facilitates the use of
machine learning with data in the data lake using services
like Amazon SageMaker.
Audit and Monitoring: Integrates with AWS CloudTrail and
Amazon CloudWatch for auditing and monitoring data lake
activities.
Self-service Data Access: Enables end-users to access and
analyze data with their choice of analytics and machine
learning services.
Cross-Account Data Sharing: Supports sharing data across
different AWS accounts, enhancing collaboration while
maintaining security and governance.
Data Lake Optimization: Provides recommendations for
optimizing data storage and access, improving
performance, and reducing costs.
Use Cases: Ideal for organizations looking to set up a
secure data lake quickly, enabling various analytics and
machine learning applications.
Amazon QuickSight:
QuickSight Overview: Amazon QuickSight is a scalable,
serverless, embeddable, machine learning-powered
business intelligence (BI) service built for the cloud.
Data Sources: QuickSight can connect to a wide array of
data sources within AWS, such as Amazon RDS, Redshift,
S3, Athena, and more, as well as external databases and flat
files.
SPICE Engine: QuickSight uses the Super-fast, Parallel, In-
memory Calculation Engine (SPICE) to perform advanced
calculations and render visualizations quickly.
Visualizations: Offers a variety of visualization types,
including graphs, charts, tables, and more, which can be
used to create interactive dashboards.
Dashboards and Stories: Users can create and publish
interactive dashboards, and share insights with others
through stories within QuickSight.
Machine Learning Insights: Integrates machine learning
capabilities to provide insights, forecast trends, and
highlight patterns in data.
Security and Access Control: Integrates with AWS IAM for
managing access and uses row-level security to control
access to data based on user roles.
Embedding and API: Supports embedding analytics into
applications and provides an API for interaction with other
services and applications.
Mobile Access: Offers mobile applications for iOS and
Android, allowing access to dashboards and insights on the
go.
Scalability: As a serverless service, QuickSight scales
automatically to accommodate the number of users and the
volume of data.
Pay-per-Session Pricing: Offers a unique pay-per-session
pricing model, making it cost-effective for wide deployment
across many users.
Themes and Customization: Supports custom themes and
layouts for dashboards, enabling alignment with company
branding.
Collaboration and Sharing: Facilitates sharing of
dashboards and analyses within and outside the
organization, with fine-grained control over permissions.
Data Preparation: Includes data preparation tools for
cleaning and transforming data before analysis.
Use Cases: Ideal for building interactive BI dashboards,
performing ad-hoc analysis, and embedding analytics in
applications.
Deployment and Management
In the Deployment and Management section of our AWS
Certified Data Engineer Associate (DEA-C01) exam cheat
sheet, we concentrate on pivotal AWS services like AWS
CloudFormation, Amazon CloudWatch, Amazon AppFlow, and
Amazon Managed Workflows for Apache Airflow (MWAA).
This segment is tailored to provide a deep dive into the tools
and services essential for efficiently deploying, monitoring, and
managing data engineering workflows and resources in AWS.
AWS CloudFormation:
CloudFormation Overview: AWS CloudFormation is a
service that helps you model and set up your Amazon Web
Services resources so that you can spend less time
managing those resources and more time focusing on your
applications.
Infrastructure as Code: CloudFormation allows you to use
programming languages or a simple text file to model and
provision, in an automated and secure manner, all the
resources needed for your applications across all regions
and accounts.
Templates: Resources are defined in CloudFormation
templates, which are JSON or YAML files describing the
AWS resources and their dependencies so you can launch
and configure them together as a stack.
Stacks: A stack is a collection of AWS resources that you
can manage as a single unit. You can create, update, or
delete a collection of resources by managing the stack.
Change Sets: Before making changes to your stack, Change
Sets allow you to see how those changes might impact your
existing resources.
Resource Management: CloudFormation manages the
complete lifecycle of resources: creation, updating, and
deletion.
Custom Resources: Enables the creation of custom
resources when existing resources do not meet all your
needs.
Nested Stacks: Allows organizing stacks in a hierarchical
manner by creating a parent stack and including other
stacks as child stacks.
Rollback Capabilities: In case of errors during deployment,
CloudFormation automatically rolls back to the previous
state, ensuring resource integrity.
Integration with AWS Services: Works with a wide range of
AWS services, enabling comprehensive management of an
application’s resources.
Declarative Programming: You declare the desired state of
your AWS resources, and CloudFormation takes care of the
provisioning and configuration.
Drift Detection: CloudFormation can detect if the
configuration of a resource has drifted from its expected
state.
Security: Integrates with AWS Identity and Access
Management (IAM) for secure access to resources and
supports encryption for sensitive data.
Cross-Account and Cross-Region Management: Supports
managing resources across different AWS accounts and
regions.
Use Cases: Commonly used for repeatable and consistent
deployment of applications, infrastructure automation, and
managing multi-tier complex architectures.
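The template concept above can be sketched in a few lines: a minimal CloudFormation template built as a Python dict and serialized to JSON, the form a stack's template body takes. The logical ID `DataLakeBucket` and the bucket configuration are hypothetical examples, not something prescribed by the article.

```python
import json

# A minimal CloudFormation template as a Python dict: one versioned
# S3 bucket plus an output exposing its generated name via Ref.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal stack: one S3 bucket with versioning enabled.",
    "Resources": {
        "DataLakeBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    },
    "Outputs": {
        "BucketName": {
            "Description": "Name of the created bucket",
            "Value": {"Ref": "DataLakeBucket"},
        }
    },
}

# Serialize to the JSON template body a stack would be created from.
template_body = json.dumps(template, indent=2)
print(template_body)
```

The same structure could equally be written in YAML; JSON is shown here because it round-trips directly from Python dicts.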
Amazon CloudWatch:
CloudWatch Overview: Amazon CloudWatch is a monitoring
and observability service built for DevOps engineers,
developers, site reliability engineers (SREs), and IT
managers.
Metrics: CloudWatch provides data and actionable insights
to monitor applications, understand and respond to system-
wide performance changes, optimize resource utilization,
and get a unified view of operational health.
Custom Metrics: You can publish your own metrics to
CloudWatch using the AWS CLI or API.
Alarms: CloudWatch Alarms allow you to watch a single
CloudWatch metric or the result of a math expression based
on CloudWatch metrics. You can set alarms to notify you
when a threshold is breached.
Logs: CloudWatch Logs can be used to monitor, store, and
access log files from AWS EC2 instances, AWS CloudTrail,
Route 53, and other sources.
Events/EventBridge: CloudWatch Events/EventBridge
delivers a near real-time stream of system events that
describe changes in AWS resources. It can trigger AWS
Lambda functions, create SNS topics, or perform other
actions.
Dashboards: CloudWatch Dashboards are customizable
home pages in the CloudWatch console that you can use to
monitor your resources in a single view, even those spread
across different regions.
High-Resolution Metrics: Supports high-resolution metrics
(down to one-second granularity).
Integration with AWS Services: Integrates with various
AWS services for monitoring, logging, and events, providing
a comprehensive view of AWS resources and applications.
Real-time Monitoring: Offers real-time monitoring of AWS
resources and applications, with metrics updated
continuously.
Automated Actions: Can automatically respond to changes
in your AWS resources.
CloudWatch Logs Insights: Provides an interactive
interface to search and analyze your log data in CloudWatch
Logs.
CloudWatch Synthetics: Allows you to create canaries to
monitor your endpoints and APIs from the outside-in.
Pricing: Offers a basic level of monitoring and logging at no
cost, with additional charges for extended metric retention,
additional dashboards, and logs data ingestion and storage.
Use Cases: Commonly used for performance monitoring,
operational troubleshooting, application monitoring, and
ensuring the security and compliance of AWS environments.
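As a rough illustration of the alarm semantics described above (a threshold watched across evaluation periods), here is a toy evaluator in plain Python. The metric values and settings are invented; real CloudWatch evaluates alarms server-side with richer options (missing-data handling, "M out of N" datapoints, and so on).

```python
# Toy sketch of CloudWatch alarm behavior: the alarm enters the ALARM
# state when the required number of consecutive evaluation periods all
# breach the threshold.
def evaluate_alarm(datapoints, threshold, periods_to_alarm):
    """Return 'ALARM' if the last `periods_to_alarm` datapoints all
    exceed `threshold`, otherwise 'OK'."""
    recent = datapoints[-periods_to_alarm:]
    if len(recent) == periods_to_alarm and all(v > threshold for v in recent):
        return "ALARM"
    return "OK"

# Hypothetical CPU utilization samples, one per evaluation period.
cpu_utilization = [42.0, 55.3, 81.7, 92.4, 95.1]
print(evaluate_alarm(cpu_utilization, threshold=80.0, periods_to_alarm=3))  # ALARM
```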
Amazon AppFlow:
AppFlow Overview: Amazon AppFlow is a fully managed
integration service that enables you to securely transfer
data between AWS services and SaaS applications like
Salesforce, ServiceNow, Slack, and Google Analytics.
Data Transfer and Integration: AppFlow allows you to
automate the flow of data between AWS services and SaaS
applications without writing custom integration code.
Secure Data Movement: Ensures secure and private data
transfer with encryption at rest and in transit.
Data Transformation Capabilities: Offers data
transformation features such as mapping, merging,
masking, filtering, and validation to prepare data for
analysis.
No Code Required: Provides a simple, no-code interface to
create and execute data flows.
Event-Driven Flows: Supports triggering data flows based
on events in SaaS applications, enabling real-time data
integration.
Batch and Scheduled Data Transfers: Allows for both
batch and scheduled data transfers, giving flexibility in how
and when data is moved.
Error Handling: Includes robust error handling capabilities,
ensuring reliable data transfer even in case of intermittent
connectivity issues.
Integration with AWS Analytics Services: Seamlessly
integrates with AWS analytics services like Amazon
Redshift, Amazon S3, and AWS Lambda for advanced data
processing and analytics.
Use Cases: Commonly used for CRM data integration,
marketing analytics, operational reporting, and data backup
and archival.
Scalability: Scales automatically to meet the data transfer
demands of the applications.
Monitoring and Logging: Integrates with Amazon
CloudWatch for monitoring the performance and logging the
activities of data flows.
Pricing: Pay-as-you-go pricing model based on the number
of flows run and the volume of data processed.
Connectors: Provides a range of pre-built connectors for
popular SaaS applications, making it easy to set up data
flows.
Data Governance and Compliance: Adheres to AWS’s high
standards for data governance and compliance, ensuring
data is handled securely.
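The masking and filtering transformations mentioned above can be sketched in plain Python. The record fields and the masking rule below are hypothetical stand-ins for what AppFlow configures declaratively through its no-code interface.

```python
# Sketch of AppFlow-style field transformations: filter out incomplete
# records and mask a sensitive field before delivery to a destination.
def mask(value, visible=4):
    """Mask all but the last `visible` characters of a string."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]

def transform(records):
    out = []
    for r in records:
        if not r.get("email"):          # filter: drop records missing an email
            continue
        r = dict(r)                     # copy so the input is left untouched
        r["phone"] = mask(r["phone"])   # mask: hide most of the phone number
        out.append(r)
    return out

# Fabricated CRM records for illustration.
crm_records = [
    {"email": "ana@example.com", "phone": "5551234567"},
    {"email": "", "phone": "5559876543"},
]
print(transform(crm_records))  # [{'email': 'ana@example.com', 'phone': '******4567'}]
```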
Amazon Managed Workflows for Apache Airflow
(Amazon MWAA):
Amazon MWAA Overview: Amazon Managed Workflows for
Apache Airflow (MWAA) is a managed service that makes it
easier to set up and operate end-to-end data pipelines in
the cloud with Apache Airflow.
Apache Airflow Integration: Amazon MWAA is built on
Apache Airflow, an open-source platform used for
orchestrating complex computational workflows and data
processing pipelines.
Managed Service: AWS manages the underlying
infrastructure for Apache Airflow, including the setup,
maintenance, scaling, and patching, reducing the
operational overhead for users.
Workflow Automation: Enables the creation of workflows
using directed acyclic graphs (DAGs) in Python, which
specify the tasks to be executed, their dependencies, and
the order in which they should run.
Scalability: Automatically scales workflow execution
capacity to match the workload.
Integration with AWS Services: Seamlessly integrates with
various AWS services like Amazon S3, Amazon Redshift,
AWS Lambda, and AWS Step Functions, facilitating the
creation of diverse data pipelines.
Monitoring and Logging: Integrates with Amazon
CloudWatch for monitoring and logging, providing insights
into workflow performance and execution.
Security: Offers built-in security features, including
encryption in transit and at rest, IAM roles for execution, and
VPC support for network isolation.
Customization: Supports custom plugins and
configurations, allowing users to tailor the environment to
their specific workflow requirements.
High Availability: Designed for high availability, with
workflows running in a highly available manner across
multiple Availability Zones.
Cost-Effective: Offers a pay-as-you-go pricing model,
charging based on the number of vCPU and GB of memory
used per hour.
DAG Scheduling and Triggering: Supports complex
scheduling and triggering mechanisms for DAGs, enabling
sophisticated workflow orchestration.
User Interface: Provides a web interface for managing and
monitoring Airflow DAGs, making it easy to visualize
pipelines and their execution status.
Use Cases: Ideal for data engineering tasks, ETL
processing, machine learning model training pipelines, and
any scenario requiring complex data workflow orchestration.
Version Support: Regularly updated to support the latest
versions of Apache Airflow, ensuring access to new features
and improvements.
Security, Identity, and Compliance
In the Security, Identity, and Compliance section of our AWS
Certified Data Engineer Associate (DEA-C01) exam cheat
sheet, we delve into critical AWS services such as AWS
Identity and Access Management (IAM), AWS Secrets
Manager, Amazon EventBridge, and AWS CloudTrail. This part
of the guide is designed to enhance your understanding of the
security and compliance aspects within AWS, which are
fundamental to any data engineering role.
You’ll gain insights into IAM for managing access to AWS
resources, Secrets Manager for securing sensitive information,
EventBridge for event-driven security monitoring, and
CloudTrail for logging and tracking user activity. Mastery of
these services is essential for the DEA-C01 exam, as they play
a crucial role in ensuring the security and compliance of data
engineering solutions in the AWS cloud.
AWS Identity and Access Management (IAM):
IAM Overview: AWS Identity and Access Management (IAM)
is a web service that helps securely control access to AWS
resources. It allows you to manage users, security
credentials (like access keys), and permissions that control
which AWS resources users and applications can access.
Users, Groups, and Roles:
Users: IAM identities that represent a person or service.
Groups: Collections of IAM users, managed as a unit with
shared permissions.
Roles: IAM identities with specific permissions that can
be assumed by users, applications, or AWS services.
Policies and Permissions:
Policies are objects in IAM that define permissions and
can be attached to users, groups, and roles.
Supports JSON policy language to specify permissions
and resources.
Access Management:
Provides tools to set up authentication and authorization
for AWS resources.
Supports Multi-Factor Authentication (MFA) for enhanced
security.
IAM Best Practices:
Principle of least privilege: Granting only the permissions
required to perform a task.
Regularly rotate security credentials.
Use IAM roles for applications running on EC2 instances.
Integration with AWS Services:
Integrates with almost all AWS services, enabling fine-
grained access control to AWS resources.
Identity Federation:
Supports identity federation to allow users to
authenticate with external identity providers and then
access AWS resources without needing to create an IAM
user.
IAM Access Analyzer:
Helps identify the resources in your organization and
accounts that are shared with an external entity.
Security Auditing:
Integrates with AWS CloudTrail for auditing IAM activity.
Enables tracking of changes in permissions and resource
policies.
Cross-Account Access:
Allows users from one AWS account to access resources
in another AWS account.
Conditional Access Control:
Supports the use of conditions in IAM policies for finer
control, such as allowing access only from specific IP
ranges or at certain times.
IAM Roles for EC2:
Allows EC2 instances to securely make API requests
using temporary credentials.
Service-Linked Roles:
Predefined roles that provide permissions for AWS
services to access other AWS services on your behalf.
Tagging IAM Entities:
Supports tagging of IAM users and roles for easier
management and cost allocation.
Use Cases:
Essential for managing security and access in AWS
environments, including scenarios like multi-user AWS
accounts, cross-account access, and automated access
by AWS services.
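The JSON policy language and conditional access controls described above can be illustrated by building a least-privilege policy document in Python. The bucket ARN and CIDR range are hypothetical; the Version/Statement/Condition shape is the standard policy grammar.

```python
import json

# A least-privilege IAM policy: read-only S3 access to one bucket,
# allowed only from a specific (made-up) office IP range.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadOnlyFromOffice",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-data-bucket",
                "arn:aws:s3:::example-data-bucket/*",
            ],
            # Condition element: the request's source IP must fall in
            # this CIDR range for the Allow to apply.
            "Condition": {
                "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
            },
        }
    ],
}
print(json.dumps(policy, indent=2))
```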
AWS Secrets Manager:
Secrets Manager Overview: AWS Secrets Manager is a
service for managing, retrieving, and rotating database
credentials, API keys, and other secrets throughout their
lifecycle.
Secret Rotation: Secrets Manager can automatically rotate
secrets on a scheduled basis without user intervention. It
supports AWS databases like RDS, DocumentDB, and
Redshift, as well as third-party services.
Secure Storage of Secrets: Secrets are encrypted using
encryption keys that you create using AWS Key
Management Service (KMS). This ensures that the secrets
are stored securely.
Integration with AWS Services: Seamlessly integrates with
other AWS services, allowing you to retrieve secrets from
within AWS Lambda functions, EC2 instances, RDS
databases, and more.
Centralized Management: Provides a centralized interface
to manage secrets across various AWS services and
applications.
Versioning of Secrets: Supports versioning of secrets,
allowing you to retrieve previous versions of a secret if
needed.
Fine-Grained Access Control: Integrates with AWS IAM,
allowing you to control which users or services have access
to specific secrets.
Audit and Monitoring: Integrates with AWS CloudTrail for
auditing secret access and changes, providing a record of
who accessed what secret and when.
Cross-Account Access: Allows sharing of secrets across
different AWS accounts, facilitating secure access in multi-
account environments.
API and CLI Access: Secrets can be managed and retrieved
using the AWS Management Console, AWS CLI, or Secrets
Manager APIs.
Secrets Retrieval: Applications can retrieve secrets with a
simple API call, which makes it easier to manage credentials
for databases and other services.
Disaster Recovery: Secrets Manager is designed for high
availability and durability, storing secrets across multiple
Availability Zones.
Custom Rotation Logic: Supports the use of custom AWS
Lambda functions for defining custom secret rotation logic
for non-AWS databases and other types of secrets.
Pricing: Charged based on the number of secrets managed
and the number of secret versions accessed each month.
Use Cases: Commonly used for managing database
credentials, API keys, and other sensitive information,
especially in automated and scalable environments.
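To illustrate how an application consumes a retrieved secret, here is a sketch that parses the `SecretString` field of a `GetSecretValue`-style response. The response dict below is fabricated stand-in data, not real API output, though the field names shown (`Name`, `SecretString`, `VersionStages`) do appear in the real response.

```python
import json

# A fabricated stand-in for a GetSecretValue response. Secrets Manager
# commonly stores key/value secrets as a JSON string in SecretString.
fake_response = {
    "Name": "prod/analytics/db",
    "SecretString": json.dumps({"username": "etl_user", "password": "s3cr3t"}),
    "VersionStages": ["AWSCURRENT"],
}

def extract_credentials(response):
    """Parse the JSON SecretString into a (username, password) pair."""
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]

print(extract_credentials(fake_response))  # ('etl_user', 's3cr3t')
```

With boto3, the response would come from a `secretsmanager` client's `get_secret_value` call rather than a hard-coded dict.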
Amazon EventBridge:
EventBridge Overview: Amazon EventBridge is a serverless
event bus service that enables you to connect applications
using events. It facilitates building event-driven
architectures by routing events between AWS services,
integrated SaaS applications, and custom applications.
Event Sources: EventBridge can receive events from AWS
services, SaaS applications, and custom applications. It
supports a wide range of event sources, making it versatile
for various use cases.
Event Rules: You can create rules that define how to
process and route events. These rules can filter events or
transform their content before routing.
Targets: Events can be routed to multiple AWS service
targets for processing. Common targets include AWS
Lambda functions, Amazon SNS topics, Amazon SQS
queues, and more.
Schema Registry: EventBridge includes a schema registry
that defines the structure of event data. It helps in
understanding the format of incoming events and simplifies
the process of writing code to handle those events.
Integration with SaaS Applications: EventBridge has built-
in integrations with various SaaS applications, enabling you
to easily route events from these applications to AWS
services.
Custom Event Buses: Supports creating custom event
buses in addition to the default event bus. Custom event
buses can be used for routing events from your own
applications or third-party SaaS applications.
Scalability and Reliability: As a serverless service,
EventBridge scales automatically to handle a high number of
events and offers high availability and reliability.
Event Pattern Matching: Event rules use event patterns for
filtering events. These patterns can match event attributes,
enabling precise control over which events trigger actions.
Security and Access Control: Integrates with AWS IAM for
access control, ensuring secure handling of events.
Cross-Account Event Delivery: EventBridge supports
sending events to different AWS accounts, facilitating
cross-account communication, and decoupling of services.
Real-Time Data Flow: Enables real-time data flow between
services, making it suitable for applications that require
immediate response to changes.
Monitoring and Logging: Integrates with Amazon
CloudWatch for monitoring and logging, providing insights
into event patterns and rule invocations.
API Destinations: Allows you to route events to HTTP APIs,
expanding the range of possible integrations and actions.
Use Cases: Commonly used for building loosely coupled,
scalable, and reliable event-driven architectures in the
cloud.
AWS CloudTrail:
CloudTrail Overview: AWS CloudTrail is a service that
provides a record of actions taken by a user, role, or an AWS
service in AWS, enabling governance, compliance,
operational auditing, and risk auditing of your AWS account.
Activity Logging: CloudTrail tracks user activity and API
usage, recording AWS Management Console actions and
API calls, including who made the call, from what IP
address, and when.
Event History: CloudTrail Event history allows you to view,
search, and download the recent AWS account activity.
Management and Data Events: CloudTrail provides two
types of events:
Management events: Operations that are performed on
resources in your AWS account.
Data events: Operations performed on or within a
resource.
Multiple Trails: You can create multiple trails, each of which
can be configured to capture different types of events or to
log events in different S3 buckets.
Integration with Amazon S3: CloudTrail logs can be
delivered to an Amazon S3 bucket for storage and analysis.
You can set up S3 lifecycle policies to archive or delete logs
after a specified period.
Log File Integrity Validation: CloudTrail provides log file
integrity validation, ensuring that your log files have not
been tampered with after CloudTrail has delivered them to
your S3 bucket.
Encryption: Log files are encrypted using Amazon S3
server-side encryption (SSE).
Real-Time Monitoring: Integrates with Amazon CloudWatch
Logs and Amazon CloudWatch Events for real-time
monitoring and alerting of specific API activity or error rates.
Global Service Events: CloudTrail can be configured to log
API calls and activities from AWS global services such as
IAM and AWS STS.
Lookup API: Provides the Lookup API to programmatically
access and search CloudTrail event history for specific
activities.
Cross-Account Access: Supports logging of events in
multi-account AWS environments, allowing centralized
logging and analysis.
Compliance and Auditing: CloudTrail logs are crucial for
compliance and auditing processes, providing evidence of
who did what in the AWS environment.
AWS Organizations Integration: CloudTrail supports AWS
Organizations, enabling you to set up a single trail to log
events for all AWS accounts in an organization.
Use Cases: Commonly used for security analysis, resource
change tracking, troubleshooting, and ensuring compliance
with internal policies and regulatory standards.
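A CloudTrail log file delivered to S3 is a JSON document containing a `Records` array; each record notes who made which API call, from where, and when. The sketch below extracts those fields from a fabricated record — real records carry many more fields than shown here.

```python
import json

# A fabricated CloudTrail log file with a single record, showing the
# "who / what / where / when" fields the article describes.
log_file = json.dumps({
    "Records": [
        {
            "eventTime": "2024-05-01T12:00:00Z",
            "eventName": "CreateBucket",
            "eventSource": "s3.amazonaws.com",
            "sourceIPAddress": "203.0.113.10",
            "userIdentity": {"type": "IAMUser", "userName": "data-engineer"},
        }
    ]
})

def summarize(raw):
    """Render each record as a one-line audit summary."""
    return [
        f'{r["userIdentity"].get("userName", "unknown")} called '
        f'{r["eventName"]} from {r["sourceIPAddress"]} at {r["eventTime"]}'
        for r in json.loads(raw)["Records"]
    ]

print(summarize(log_file)[0])
```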
Migration
In the Migration section of our AWS Certified Data Engineer
Associate (DEA-C01) exam cheat sheet, we focus on AWS
DataSync and AWS Database Migration Service (DMS), two
key services for data migration in the AWS ecosystem.
This segment is specifically designed to provide you with
essential knowledge and practical insights into these services,
crucial for any data engineering professional. Understanding
the functionalities and best practices of DataSync for efficient
data transfer and synchronization, along with DMS for
seamless database migration, is vital for excelling in the DEA-
C01 exam.
These services are instrumental in facilitating smooth and
secure migration of data to AWS, making them indispensable
tools in your data engineering toolkit.
AWS DataSync:
1. DataSync Overview: AWS DataSync is a data transfer
service that simplifies, automates, and accelerates moving
data between on-premises storage systems and AWS
storage services, as well as between AWS storage services.
2. High-Speed Data Transfer: DataSync uses a purpose-built
network protocol and parallel transfer to achieve high-speed
data transfer, significantly faster than traditional transfer
protocols like FTP and HTTP.
3. Automated Data Synchronization: It automates the
replication of data between NFS or SMB file systems,
Amazon S3 buckets, and Amazon EFS file systems.
4. Data Transfer Management: DataSync handles tasks like
scheduling, monitoring, and validating data transfers,
reducing the need for manual intervention and scripting.
5. Integration with AWS Storage Services: Works seamlessly
with AWS storage services like Amazon S3, Amazon EFS,
and Amazon FSx for Windows File Server.
6. Data Encryption and Integrity Checks: Encrypts data in
transit and performs data integrity checks both during and
after the transfer to ensure data is securely and accurately
transferred.
7. On-Premises to AWS Transfer: Ideal for moving large
volumes of data from on-premises storage into AWS for
processing, backup, or archiving.
8. AWS to AWS Transfer: Supports transferring data between
AWS storage services across different regions, useful for
data migration, replication for disaster recovery, and data
distribution.
9. Bandwidth Throttling: Offers bandwidth throttling to
manage network bandwidth usage during data transfers.
10. Agent Deployment: Requires the deployment of a DataSync
agent in the on-premises environment for communication
with AWS storage services.
11. Scheduled Transfers: Allows scheduling of data transfers,
enabling regular, automated synchronization of data.
12. Monitoring and Logging: Integrates with Amazon
CloudWatch for monitoring and AWS CloudTrail for logging,
providing visibility into data transfer operations.
13. Pricing: Charges based on the amount of data transferred,
with no minimum fees or setup costs.
14. Use Cases: Commonly used for data migration, online data
transfer for analytics and processing, and disaster recovery.
15. Simple Setup and Configuration: Offers a simple interface
for setting up and configuring data transfer tasks, reducing
the complexity of data migration and synchronization.
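The data integrity checks mentioned in point 6 boil down to comparing checksums of source and destination data. This sketch shows the idea with SHA-256 over in-memory bytes; DataSync performs its verification internally, and the sample payload is invented.

```python
import hashlib

# Integrity check by checksum comparison: if the digests match, the
# copied bytes are identical to the source bytes.
def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

source = b"2024-05-01,orders,1042\n"   # hypothetical source record
copied = b"2024-05-01,orders,1042\n"   # what arrived at the destination
print(checksum(source) == checksum(copied))  # True
```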
AWS Database Migration Service (DMS):
DMS Overview: AWS Database Migration Service (DMS) is a
service that enables easy and secure migration of
databases to AWS, between on-premises instances, or
between different AWS cloud services.
Support for Various Database Types: DMS supports a wide
range of database platforms, including relational databases,
NoSQL databases, and data warehouses.
Minimal Downtime: DMS is designed to ensure minimal
downtime during database migration, making it suitable for
migrating production databases with minimal impact on
operations.
Data Replication: Apart from migration, DMS can also be
used for continuous data replication with high availability.
Schema Conversion: Works in conjunction with the AWS
Schema Conversion Tool (SCT) to convert the source
database schema and code to a format compatible with the
target database.
Homogeneous and Heterogeneous Migrations: Supports
both homogeneous migrations (like Oracle to Oracle) and
heterogeneous migrations (like Oracle to Amazon Aurora).
Incremental Data Sync: Capable of syncing only the data
that has changed, which is useful for keeping the source
and target databases in sync during the migration process.
Secure Data Transfer: Ensures data security during
migration by encrypting data in transit.
Monitoring and Logging: Integrates with Amazon
CloudWatch and AWS CloudTrail for monitoring the
performance and auditing the migration process.
Easy to Set Up and Use: Provides a simple-to-use interface
for setting up and managing database migrations.
Resilience and Scalability: Automatically manages the
replication and network resources required for migration,
scaling resources as needed to match the volume of data.
Change Data Capture (CDC): Supports CDC, capturing and
replicating ongoing changes to the database.
Pricing: Charges based on the compute resources used
during the migration process and the amount of data
transferred.
Use Cases: Commonly used for database migration
projects, including migrating from on-premises databases to
AWS, consolidating databases onto AWS, and migrating
between different AWS database services.
Endpoint Compatibility: Supports various source and target
endpoints, including Amazon RDS, Amazon Redshift,
Amazon DynamoDB, and other non-AWS database services.
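Change Data Capture, mentioned above, streams row-level changes that DMS replays against the target to keep it in sync. This sketch applies fabricated insert/update/delete change records to a dict keyed by primary key — an analogy for the mechanism, not DMS's actual record format.

```python
# Toy CDC apply loop: replay ordered change records against a target
# "table" (a dict keyed by primary key), as a replication engine would.
def apply_changes(table, changes):
    for c in changes:
        if c["op"] in ("insert", "update"):
            table[c["pk"]] = c["row"]      # upsert the new row image
        elif c["op"] == "delete":
            table.pop(c["pk"], None)       # remove the row if present
    return table

# Hypothetical target state and captured changes.
target = {1: {"name": "alice"}}
changes = [
    {"op": "insert", "pk": 2, "row": {"name": "bob"}},
    {"op": "update", "pk": 1, "row": {"name": "alicia"}},
    {"op": "delete", "pk": 2},
]
print(apply_changes(target, changes))  # {1: {'name': 'alicia'}}
```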
Frequently Asked Questions
What are some core AWS services that aspiring
Data Engineers should focus on?
Aspiring Data Engineers should become comfortable with
services such as Amazon S3, AWS Glue, Amazon Kinesis,
Amazon EMR, and Amazon Redshift. These services form the
backbone of typical data pipelines and are central to the AWS
Certified Data Engineer Associate exam.
How can I use the cheat sheet to prepare for the
exam?
The cheat sheet is designed as a quick reference guide to
reinforce your understanding of core concepts and services.
Use it alongside hands-on practice and full-length practice
exams. For additional study resources, visit the AWS
Certification Training page.
Does the cheat sheet include tips for managing
data pipelines on AWS?
Yes, the cheat sheet highlights best practices for designing
and managing data pipelines, including the use of AWS Glue
for ETL (Extract, Transform, Load) processes and Amazon
Kinesis for real-time data streaming.
Are there any cost optimization strategies
mentioned in the cheat sheet?
The cheat sheet briefly touches on cost optimization
techniques, such as selecting the right storage class in
Amazon S3 and using Amazon Redshift’s concurrency scaling
feature. For a comprehensive guide on cost management, visit
the AWS Cost Management Blog.
Can I find practice questions or mock exams
related to the cheat sheet content?
While the cheat sheet itself is a summary tool, Digital Cloud
Training offers full-length practice exams and quizzes tailored
to the AWS Certified Data Engineer Associate exam. Check out
the AWS Practice Exams section to test your knowledge.
Test your Knowledge with Free Practice
Questions
Get access to FREE practice questions and check out the
difficulty of AWS Certified Data Engineer Associate (DEA-C01)
exam questions for yourself:
https://learn.digitalcloud.training/course/free-aws-certified-
data-engineer-associate-practice-exam
Categories: AWS Cheat Sheets, AWS Data Engineer Associate
© 2025 Digital Cloud Training