100% found this document useful (2 votes)

1K views23 pages

AWS Associate Data Engineer

The document contains 12 multiple choice questions about AWS data engineering concepts and services. The questions cover topics like AWS Glue, Amazon S3, AWS Lambda, Amazon Athena, data pipelines, data formats and compression.

Uploaded by

Harsh Mendapara

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

100% found this document useful (2 votes)

1K views23 pages

AWS Associate Data Engineer

Uploaded by

Harsh Mendapara

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 23

AWS Certified Data Engineer Associate Exam (DEA-C01) Dumps

1) A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data
engineer has set up the necessary AWS Glue connection details and an associated IAM role.
However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an
error message that indicates that there are problems with the Amazon S3 VPC gateway endpoint.

The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket. Which
solution will meet this requirement?

A. Update the AWS Glue security group to allow inbound traffic from the Amazon S3 VPC
gateway endpoint.

B. Configure an S3 bucket policy to explicitly grant the AWS Glue job permissions to access
the S3 bucket.

C. Review the AWS Glue job code to ensure that the AWS Glue connection details include a
fully qualified domain name.

D. Verify that the VPC's route table includes inbound and outbound routes for the Amazon
S3 VPC gateway endpoint.

2) A retail company has a customer data hub in an Amazon S3 bucket. Employees from many
countries use the data hub to support company-wide analytics A governance team must ensure that
the company's data analysts can access data only for customers who are within the same country as
the analysts. Which solution will meet these requirements with the LEAST operational effort?

A. Create a separate table for each country's customer data. Provide access to each analyst
based on the country that the analyst serves.

B. Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake
Formation row-level security features to enforce the company's access policies.

C. Move the data to AWS Regions that are close to the countries where the customers are.
Provide access to each analyst based on the country that the analyst serves.

D. Load the data into Amazon Redshift. Create a view for each country. Create separate IAM
roles for each country to provide access to data from each country. Assign the appropriate
roles to the analysts.
3) A media company wants to improve a system that recommends media content to customer based
on user behaviour and preferences. To improve the recommendation system, the company needs to
incorporate insights from third-party datasets into the company's existing analytics platform. The
company wants to minimize the effort and time required to incorporate third-party datasets. Which
solution will meet these requirements with the LEAST operational overhead?

A. Use API calls to access and integrate third-party datasets from AWS Data Exchange.

B. Use API calls to access and integrate third-party datasets from AWS DataSync.

C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS
CodeCommit repositories.

D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from
Amazon Elastic Container Registry (Amazon ECR).

4) A financial company wants to implement a data mesh. The data mesh must support centralized
data governance, data analysis, and data access control. The company has decided to use AWS Glue
for data catalogs and extract, transform, and load (ETL) operations.

Which combination of AWS services will implement a data mesh? (Choose two.)

A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned cluster for data
analysis.

B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.

C. Use AWS Glue Data Brew for centralized data governance and access control. D. Use
Amazon RDS for data storage. Use Amazon EMR for data analysis.

E. Use AWS Lake Formation for centralized data governance and access control.

5) A data engineer maintains custom Python scripts that perform a data formatting process that
many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the
data engineer must manually update all the Lambda functions. The data engineer requires a less
manual way to update the Lambda functions. Which solution will meet this requirement?

A. Store a pointer to the custom Python scripts in the execution context object in a shared
Amazon S3 bucket.

B. Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the
Lambda functions.

C. Store a pointer to the custom Python scripts in environment variables in a shared Amazon
S3 bucket.

D. Assign the same alias to each Lambda function. Call reach Lambda function by specifying
the function's alias.
6) A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data
engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract,
transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must
orchestrate the data pipeline.

Which AWS service or feature will meet these requirements MOST cost-effectively?

A. AWS Step Functions

B. AWS Glue workflows

C. AWS Glue Studio

D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

7) A financial services company stores financial data in Amazon Redshift. A data engineer wants to
run real-time queries on the financial data to support a web-based trading application. The data
engineer wants to run the queries from within the trading application.

Which solution will meet these requirements with the LEAST operational overhead?

A. Establish WebSocket connections to Amazon Redshift.

B. Use the Amazon Redshift Data API.

C. Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.

D. Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.

8) A company uses Amazon Athena for one-time queries against data that is in Amazon S3. The
company has several use cases. The company must implement permission controls to separate
query processes and access to query history among users, teams, and applications that are in the
same AWS account. Which solution will meet these requirements?

A. Create an S3 bucket for each use case. Create an S3 bucket policy that grants permissions
to appropriate individual IAM users. Apply the S3 bucket policy to the S3 bucket.

B. Create an Athena workgroup for each use case. Apply tags to the workgroup. Create an
IAM policy that uses the tags to apply appropriate permissions to the workgroup.

C. Create an IAM role for each use case. Assign appropriate permissions to the role for each
use case. Associate the role with Athena.

D. Create an AWS Glue Data Catalog resource policy that grants permissions to appropriate
individual IAM users for each use case. Apply the resource policy to the specific tables that
Athena uses.
9) A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The
data engineer does not require the Glue jobs to run or finish at a specific time.

Which solution will run the Glue jobs in the MOST cost-effective way?

A. Choose the FLEX execution class in the Glue job properties

B. Use the Spot Instance type in Glue job properties.

C. Choose the STANDARD execution class in the Glue job properties.

D. Choose the latest version in the GlueVersion field in the Glue job properties.

10) data engineer needs to create an AWS Lambda function that converts the format of data
from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an
Amazon S3 bucket. Which solution will meet these requirements with the LEAST operational
overhead?

A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter
rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource
Name (ARN) of the Lambda function as the destination for the event notification.

B. Create an $3 event notification that has an event type of s3:ObjectTagging:* for objects
that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as
the destination for the event notification.

C. Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate
notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the
Lambda function as the destination for the event notification.

D. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter
rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple
Notification Service (Amazon SNS) topic as the destination for the event notification.
Subscribe the Lambda function to the SNS topic.

11) A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all
the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer
also notices that users perform most queries by selecting a specific column. Which solution will
MOST speed up the Athena query performance?

A. Change the data format from .csv to JSON format. Apply Snappy compression.

B. Compress the .csv files by using Snappy compression.

C. Change the data format from .csv to Apache Parquet. Apply Snappy compression.

D. Compress the .csv files by using gzip compression.

12) A manufacturing company collects sensor data from its factory floor to monitor and enhance
operational efficiency. The company uses Amazon Kinesis Data Streams to publish the data that the
sensors collect to a data stream. Then Amazon Kinesis Data Firehose writes the data to an Amazon
S3 bucket.

The company needs to display a real-time view of operational efficiency on a large screen in the
manufacturing facility.

Which solution will meet these requirements with the LOWEST latency?

A. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis
Data Analytics) to process the sensor data. Use a connector for Apache Flink to write data
to an Amazon Timestream database. Use the Timestream database as a source to create a
Grafana dashboard.

B. Configure the S3 bucket to send a notification to an AWS Lambda function when any new
object is created. Use the Lambda function to publish the data to Amazon Aurora. Use
Aurora as a source to create an Amazon QuickSight dashboard.

C. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data
Analytics) to process the sensor data. Create a new Data Firehose delivery stream to publish
data directly to an Amazon Timestream database. Use the Timestream database as a source
to create an Amazon QuickSight dashboard.

D. Use AWS Glue bookmarks to read sensor data from the S3 bucket in real time. Publish the
data to an Amazon Timestream database. Use the Timestream database as a source to
create a Grafana dashboard.

13) A company stores daily records of the financial performance of investment portfolios in .csv
format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data.

The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog. Which
solution will meet these requirements?

A. Create an IAM role that includes the Amazon S3 Full Access policy. Associate the role with
the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create
a daily schedule to run the crawler. Configure the output destination to a new path in the
existing S3 bucket.

B. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role
with the crawler. Specify the S3 bucket path of the source data as the crawler's data store.
Create a daily schedule to run the crawler. Specify a database name for the output.

C. Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with
the crawler. Specify the S3 bucket path of the source data as the crawler's data store.
Allocate data processing units (DPUs) to run the crawler every day. Specify a database name
for the output.

D. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with
the crawler. Specify the 53 bucket path of the source data as the crawler's data store.
Allocate data processing units (DPUs) to run the crawler every day. Configure the output
destination to a new path in the existing S3 bucket.
14) A company loads transaction data for each day into Amazon Redshift tables at the end of each
day. The company wants to have the ability to track which tables have been loaded and which tables
still need to be loaded.

A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table.
The data engineer creates an AWS Lambda function to publish the details of the load statuses to
DynamoDB.

How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB
table?

A. Use a second Lambda function to invoke the first Lambda function based on Amazon
CloudWatch events.

B. Use the Amazon Redshift Data API to publish an event to Amazon EventBridge.
Configure an EventBridge rule to invoke the Lambda function.

C. Use the Amazon Redshift Data API to publish a message to an Amazon Simple Queue
Service (Amazon SQS) queue. Configure the SQS queue to invoke the Lambda function.

D. Use a second Lambda function to invoke the first Lambda function based on AWS
CloudTrail events.

15) A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an
Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be
regularly proliferated to the S3 bucket. The data includes files that are in multiple formats. The data
engineer needs to automate the transfer process and must schedule the process to run periodically.
Which AWS service should the data engineer use to transfer the data in the MOST operationally
efficient way?

A. AWS DataSync

B. AWS Glue

C. AWS Direct Connect

D. Amazon S3 Transfer Acceleration

16) A company uses an on-premises Microsoft SQL Server database to store financial transaction
data. The company migrates the transaction data from the on-premises database to AWS at the end
of each month. The company has noticed that the cost to migrate data from the on-premises
database to an Amazon RDS for SQL Server database has increased recently. The company requires a
cost-effective solution to migrate the data to AWS. The solution must cause minimal downtown for
the applications that access the database. Which AWS service should the company use to meet
these requirements?

A. AWS Lambda

B. AWS Database Migration Service (AWS DMS)

C. AWS Direct Connect

D. AWS DataSync
17) A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and
load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform
transformations, and load the transformed data into Amazon Redshift for analytics. The data
updates must occur every hour.

Which combination of tasks will meet these requirements with the LEAST operational overhead?
(Choose two.)

A. Configure AWS Glue triggers to run the ETL jobs every hour.

B. Use AWS Glue DataBrew to clean and prepare the data for analytics.

C. Use AWS Lambda functions to schedule and run the ETL jobs every hour.

D. Use AWS Glue connections to establish connectivity between the data sources and
Amazon Redshift.

E. Use the Redshift Data API to load transformed data into Amazon Redshift.

18) A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to
scale read and write capacity to meet demand. A data engineer needs to identify a solution that will
turn on concurrency scaling. Which solution will meet this requirement?

A. Turn on concurrency scaling in workload management (WLM) for Redshift Serverless

workgroups.

B. Turn on concurrency scaling at the workload management (WLM) queue level in the
Redshift cluster.

C. Turn on concurrency scaling in the settings during the creation of any new Redshift
cluster.

D. Turn on concurrency scaling for the daily usage quota for the Redshift cluster.

19) A data engineer must orchestrate a series of Amazon Athena queries that will run every day.
Each query can run for more than 15 minutes.

Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)

A. Use an AWS Lambda function and the Athena Boto3 client start_query_execution API
call to invoke the Athena queries programmatically.

B. Create an AWS Step Functions workflow and add two states. Add the first state before
the Lambda function. Configure the second state as a Wait state to periodically check
whether the Athena query has finished using the Athena Boto3 get_query_execution API
call. Configure the workflow to invoke the next query when the current query has finished
running.

C. Use an AWS Glue Python shell job and the Athena Boto3 client start_query_execution API
call to invoke the Athena queries programmatically.
D. Use an AWS Glue Python shell script to run a sleep timer that checks every 5 minutes to
determine whether the current Athena query has finished running successfully. Configure
the Python shell script to invoke the next query when the current query has finished
running.

E. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the
Athena queries in AWS Batch.

20) A company is migrating on-premises workloads to AWS. The company wants to reduce overall
operational overhead. The company also wants to explore serverless options.

The company's current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache Hbase, and
Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must
maintain similar or better performance after the migration to AWS.

Which extract, transform, and load (ETL) service will meet these requirements?

A. AWS Glue

B. Amazon EMR

C. AWS Lambda

D. Amazon Redshift

21) A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data
engineer profiles the dataset and discovers that the dataset contains personally identifiable

information (PII). The data engineer must implement a solution to profile the dataset and obfuscate
the PII.

Which solution will meet this requirement with the LEAST operational effort?

A. Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an
AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set
the S3 data lake as the target for the delivery stream.

B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use
an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into
the S3 data lake.

C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS
Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to
orchestrate a data pipeline to ingest the data into the S3 data lake.

D. Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify
and obfuscate the PII in the DynamoDB table and to transform the data. Use the same
Lambda function to ingest the data into the S3 data lake.
22) A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from
the company's operational databases into an Amazon S3 based data lake. The ETL workflows use
AWS Glue and Amazon EMR to process data.

The company wants to improve the existing architecture to provide automated orchestration and to
require minimal manual effort.

Which solution will meet these requirements with the LEAST operational overhead?

A. AWS Glue workflows

B. AWS Step Functions tasks

C. AWS Lambda functions

D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows

23) A company currently stores all of its data in Amazon S3 by using the S3 Standard storage class. A
data engineer examined data access patterns to identify trends. During the first 6 months, most data
files are accessed several times each day. Between 6 months and 2 years, most data files are
accessed once or twice each month. After 2 years, data files are accessed only once or twice each
year.

The data engineer needs to use an S3 Lifecycle policy to develop new data storage rules. The new
storage solution must continue to provide high availability.

Which solution will meet these requirements in the MOST cost-effective way?

A. Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months.
Transfer objects to S3 Glacier Flexible Retrieval after 2 years.

B. Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months.

Transfer objects to S3 Glacier Flexible Retrieval after 2 years.

C. Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months.

Transfer objects to S3 Glacier Deep Archive after 2 years.

D. Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months.
Transfer objects to S3 Glacier Deep Archive after 2 years.

24) A company maintains an Amazon Redshift provisioned cluster that the company uses for extract,
transform, and load (ETL) operations to support critical analysis tasks. A sales team within the
company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks.
The sales team recently requested access to the data that is in the ETL Redshift cluster so the team
can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster
with data that is in the sales team's Bl cluster.

The company needs a solution that will share the ETL cluster data with the sales team without
interrupting the critical analysis tasks. The solution must minimize usage of the computing resources
of the ETL cluster.
Which solution will meet these requirements?

A. Set up the sales team BI cluster as a consumer of the ETL cluster by using Redshift data
sharing.

B. Create materialized views based on the sales team's requirements. Grant the sales team
direct access to the ETL cluster.

C. Create database views based on the sales team's requirements. Grant the sales team
direct access to the ETL cluster.

D. Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week.
Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.

25) A data engineer needs to join data from multiple sources to perform a one-time analysis job. The
data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.

Which solution will meet this requirement MOST cost-effectively?

A. Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join
the data and perform the analysis.

B. Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run
Amazon Athena queries directly on the S3 files.

C. Use Amazon Athena Federated Query to join the data from all data sources.

D. Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3
directly from Redshift.

26) A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to
perform big data analysis. The company requires high reliability. A big data team must follow best
practices for running cost-optimized and long-running workloads on Amazon EMR. The team must
find a solution that will maintain the company's current level of performance.

Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)

A. Use Hadoop Distributed File System (HDFS) as a persistent data store.

B. Use Amazon S3 as a persistent data store.

C. Use x86-based instances for core nodes and task nodes.

D. Use Graviton instances for core nodes and task nodes.

E. Use Spot Instances for all primary nodes.

27) A company wants to implement real-time analytics capabilities. The company wants to use
Amazon Kinesis Data Streams and Amazon Redshift to ingest and process streaming data at the rate
of several gigabytes per second. The company wants to derive near real-time insights by using
existing business intelligence (BI) and analytics tools.

Which solution will meet these requirements with the LEAST operational overhead?

A. Use Kinesis Data Streams to stage data in Amazon S3. Use the COPY command to load
data from Amazon S3 directly into Amazon Redshift to make the data immediately available
for real-time analysis.

B. Access the data from Kinesis Data Streams by using SQL queries. Create materialized
views directly on top of the stream. Refresh the materialized views regularly to query the
most recent stream data.

C. Create an external schema in Amazon Redshift to map the data from Kinesis Data
Streams to an Amazon Redshift object. Create a materialized view to read data from the
stream. Set the materialized view to auto refresh.

D. Connect Kinesis Data Streams to Amazon Kinesis Data Firehose. Use Kinesis Data Firehose
to stage the data in Amazon S3. Use the COPY command to load the data from Amazon S3 to
a table in Amazon Redshift.

28) A company uses an Amazon QuickSight dashboard to monitor usage of one of the company's
applications. The company uses AWS Glue jobs to process data for the dashboard. The company
stores the data in a single Amazon S3 bucket. The company adds new data every day.

A data engineer discovers that dashboard queries are becoming slower over time. The data engineer
determines that the root cause of the slowing queries is long-running AWS Glue jobs.

Which actions should the data engineer take to improve the performance of the AWS Glue jobs?
(Choose two.)

A. Partition the data that is in the S3 bucket. Organize the data by year, month, and day.

B. Increase the AWS Glue instance size by scaling up the worker type.

C. Convert the AWS Glue schema to the DynamicFrame schema class.

D. Adjust AWS Glue job scheduling frequency so the jobs run half as many times each day.

E. Modify the IAM role that grants access to AWS glue to grant access to all S3 features.

29) A data engineer needs to use AWS Step Functions to design an orchestration workflow. The
workflow must parallel process a large collection of data files and apply a specific transformation to
each file. Which Step Functions state should the data engineer use to meet these requirements?

A. Parallel state

B. Choice state

C. Map state

D. Wait state
30) A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer
reviewed data that is associated with the legacy application. The data engineer found that the legacy
data contained some duplicate information.

The data engineer must identify and remove duplicate information from the legacy application data.
Which solution will meet these requirements with the LEAST operational overhead?

A. Write a custom extract, transform, and load (ETL) job in Python. Use the
DataFrame.drop_duplicates() function by importing the Pandas library to perform data
deduplication.

B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine
learning (ML) transform to transform the data to perform data deduplication.

C. Write a custom extract, transform, and load (ETL) job in Python. Import the Python
dedupe library. Use the dedupe library to perform data deduplication.

D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe
library. Use the dedupe library to perform data deduplication.

31) A company is building an analytics solution. The solution uses Amazon S3 for data lake storage
and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum
to query the data that is in Amazon S3.

Which actions will provide the FASTEST queries? (Choose two.)

A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5
GB.

B. Use a columnar storage file format.

C. Partition the data based on the most common query predicates.

D. Split the data into files that are less than 10 KB.

E. Use file formats that are not splittable.

32) A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance
in a private subnet. A developer wrote an AWS Lambda function with default settings to insert,
update, or delete data in the DB instance.

The developer needs to give the Lambda function the ability to connect to the DB instance privately
without using the public internet.

Which combination of steps will meet this requirement with the LEAST operational overhead?
(Choose two.)

A. Turn on the public access setting for the DB instance.

B. Update the security group of the DB instance to allow only Lambda function invocations on the
database port.

C. Configure the Lambda function to run in the same subnet that the DB instance uses.
D. Attach the same security group to the Lambda function and the DB instance. Include a self-
referencing rule that allows access through the database port.

E. Update the network ACL of the private subnet to include a self-referencing rule that allows access
through the database port.

33) A company has a frontend ReactJS website that uses Amazon API Gateway to invoke REST APIs.
The APIs perform the functionality of the website. A data engineer needs to write a Python script
that can be occasionally invoked through API Gateway. The code must return results to API Gateway.
Which solution will meet these requirements with the LEAST operational overhead?

A. Deploy a custom Python script on an Amazon Elastic Container Service (Amazon ECS)
cluster.

B. Create an AWS Lambda Python function with provisioned concurrency.

C. Deploy a custom Python script that can integrate with API Gateway on Amazon Elastic
Kubernetes Service (Amazon EKS).

D. Create an AWS Lambda function. Ensure that the function is warm by scheduling an
Amazon EventBridge rule to invoke the Lambda function every 5 minutes by using mock
events.

34) A company has a production AWS account that runs company workloads. The company's security
team created a security AWS account to store and analyze security logs from the production AWS
account. The security logs in the production AWS account are stored in Amazon CloudWatch Logs.

The company needs to use Amazon Kinesis Data Streams to deliver the security logs to the security
AWS account.

Which solution will meet these requirements?

A. Create a destination data stream in the production AWS account. In the security AWS
account, create an IAM role that has cross-account permissions to Kinesis Data Streams in
the production AWS account.

B. Create a destination data stream in the security AWS account. Create an IAM role and a
trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a
subscription filter in the security AWS account.

C. Create a destination data stream in the production AWS account. In the production AWS
account, create an IAM role that has cross-account permissions to Kinesis Data Streams in
the security AWS account.

D. Create a destination data stream in the security AWS account. Create an IAM role and a
trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a
subscription filter in the production AWS account.
35) A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of
the data files are small, but other data files are tens of terabytes.

A data engineer must perform a change data capture (CDC) operation to identify changed data from
the data source. The data source sends a full snapshot as a JSON file every day and ingests the
changed data into the data lake.

Which solution will capture the changed data MOST cost-effectively?

A. Create an AWS Lambda function to identify the changes between the previous data and
the current data. Configure the Lambda function to ingest the changes into the data lake.

B. Ingest the data into Amazon RDS for MySQL. Use AWS Database Migration Service (AWS
DMS) to write the changed data to the data lake.

C. Use an open source data lake format to merge the data source with the S3 data lake to
insert the new data and update the existing data.

D. Ingest the data into an Amazon Aurora MySQL DB instance that runs Aurora Serverless.
Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.

36) A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The
Athena queries use AWS Glue Data Catalog as a metadata table.

The data engineer notices that the Athena query plans are experiencing a performance bottleneck.
The data engineer determines that the cause of the performance bottleneck is the large number of
partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and
reduce Athena query planning time.

Which solutions will meet these requirements? (Choose two.)

A. Create an AWS Glue partition index. Enable partition filtering.

B. Bucket the data based on a column that the data have in common in a WHERE clause of
the user query.

C. Use Athena partition projection based on the S3 bucket prefix.

D. Transform the data that is in the S3 bucket to Apache Parquet format.

E. Use the Amazon EMR S3DistCP utility to combine smaller objects in the 53 bucket into
larger objects.
37) A data engineer must manage the ingestion of real-time streaming data into AWS. The data
engineer wants to perform real-time analytics on the incoming streaming data by using time-based
aggregations over a window of up to 30 minutes. The data engineer needs a solution that is highly
fault tolerant. Which solution will meet these requirements with the LEAST operational overhead?

A. Use an AWS Lambda function that includes both the business and the analytics logic to
perform time-based aggregations over a window of up to 30 minutes for the data in Amazon
Kinesis Data Streams.

B. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data
Analytics) to analyze the data that might occasionally contain duplicates by using multiple
types of aggregations.

C. Use an AWS Lambda function that includes both the business and the analytics logic to
perform aggregations for a tumbling window of up to 30 minutes, based on the event
timestamp.

D. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis
Data Analytics) to analyze the data by using multiple types of aggregations to perform
time-based analytics over a window of up to 30 minutes.

38) A company is planning to upgrade its Amazon Elastic Block Store (Amazon EBS) General Purpose
SSD storage from gp2 to gp3. The company wants to prevent any interruptions in its Amazon EC2
instances that will cause data loss during the migration to the upgraded storage.

Which solution will meet these requirements with the LEAST operational overhead?

A. Create snapshots of the gp2 volumes. Create new gp3 volumes from the snapshots.
Attach the new gp3 volumes to the EC2 instances.

B. Create new gp3 volumes. Gradually transfer the data to the new gp3 volumes. When the
transfer is complete, mount the new gp3 volumes to the EC2 instances to replace the gp2
volumes.

C. Change the volume type of the existing gp2 volumes to gp3. Enter new values for
volume size, IOPS, and throughput.

D. Use AWS DataSync to create new gp3 volumes. Transfer the data from the original gp2
volumes to the new gp3 volumes.
39) A company is migrating its database servers from Amazon EC2 instances that run Microsoft SQL
Server to Amazon RDS for Microsoft SQL Server DB instances. The company's analytics team must
export large data elements every day until the migration is complete. The data elements are the
result of SQL joins across multiple tables. The data must be in Apache Parquet format. The analytics
team must store the data in Amazon S3.

Which solution will meet these requirements in the MOST operationally efficient way?

A. Create a view in the EC2 instance-based SQL Server databases that contains the required
data elements. Create an AWS Glue job that selects the data directly from the view and
transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run
every day.

B. Schedule SQL Server Agent to run a daily SQL query that selects the desired data elements
from the EC2 instance-based SQL Server databases. Configure the query to direct the
output .csv objects to an S3 bucket. Create an S3 event that invokes an AWS Lambda
function to transform the output format from .csv to Parquet.

C. Use a SQL query to create a view in the EC2 instance-based SQL Server databases that
contains the required data elements. Create and run an AWS Glue crawler to read the
view. Create an AWS Glue job that retrieves the data and transfers the data in Parquet
format to an S3 bucket. Schedule the AWS Glue job to run every day.

D. Create an AWS Lambda function that queries the EC2 instance-based databases by using
Java Database Connectivity (JDBC). Configure the Lambda function to retrieve the required
data, transform the data into Parquet format, and transfer the data into an S3 bucket. Use
Amazon EventBridge to schedule the Lambda function to run every day.

40) A data engineering team is using an Amazon Redshift data warehouse for operational reporting.
The team wants to prevent performance issues that might result from long-running queries. A data

engineer must choose a system table in Amazon Redshift to record anomalies when a query
optimizer identifies conditions that might indicate performance issues.

Which table views should the data engineer use to meet this requirement?

A. STL_USAGE_CONTROL

B. STL_ALERT_EVENT_LOG

C. STL_QUERY_METRICS

D. STL_PLAN_INFO
41) A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3
data lake. The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on
one or two columns of the dataset. The data analysts rarely query the entire file.

Which solution will meet these requirements MOST cost-effectively?

A. Use an AWS Glue PySpark job to ingest the source data into the data lake in .csv format.

B. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured
data source. Configure the job to ingest the data into the data lake in JSON format.

C. Use an AWS Glue PySpark job to ingest the source data into the data lake in Apache Avro
format.

D. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv
structured data source. Configure the job to write the data into the data lake in Apache
Parquet format.

42) A company has five offices in different AWS Regions. Each office has its own human resources
(HR) department that uses a unique IAM role. The company stores employee records in a data lake
that is based on Amazon S3 storage.

A data engineering team needs to limit access to the records. Each HR department should be able to
access records for only employees who are within the HR department's Region.

Which combination of steps should the data engineering team take to meet this requirement with
the LEAST operational overhead? (Choose two.)

A. Use data filters for each Region to register the S3 paths as data locations.

B. Register the S3 path as an AWS Lake Formation location.

C. Modify the IAM roles of the HR departments to add a data filter for each department's
Region.

D. Enable fine-grained access control in AWS Lake Formation. Add a data filter for each
Region.

E. Create a separate S3 bucket for each Region. Configure an IAM policy to allow $3 access.
Restrict access based on Region.

43) A company uses AWS Step Functions to orchestrate a data pipeline. The pipeline consists of
Amazon EMR jobs that ingest data from data sources and store the data in an Amazon S3 bucket.
The pipeline also includes EMR jobs that load the data to Amazon Redshift.

The company's cloud infrastructure team manually built a Step Functions state machine. The cloud
infrastructure team launched an EMR cluster into a VPC to support the EMR jobs. However, the
deployed Step Functions state machine is not able to run the EMR jobs.
Which combination of steps should the company take to identify the reason the Step Functions state
machine is not able to run the EMR jobs? (Choose two.)

A. Use AWS CloudFormation to automate the Step Functions state machine deployment.
Create a step to pause the state machine during the EMR jobs that fail. Configure the step to
wait for a human user to send approval through an email message. Include details of the
EMR task in the email message for further analysis.

B. Verify that the Step Functions state machine code has all IAM permissions that are
necessary to create and run the EMR jobs. Verify that the Step Functions state machine
code also includes IAM permissions to access the Amazon S3 buckets that the EMR jobs
use. Use Access Analyzer for S3 to check the S3 access properties.

C. Check for entries in Amazon CloudWatch for the newly created EMR cluster. Change the
AWS Step Functions state machine code to use Amazon EMR on EKS. Change the IAM access
policies and the security group configuration for the Step Functions state machine code to
reflect inclusion of Amazon Elastic Kubernetes Service (Amazon ĔKS).

D. Query the flow logs for the VPC. Determine whether the traffic that originates from the
EMR cluster can successfully reach the data providers. Determine whether any security
group that might be attached to the Amazon EMR cluster allows connections to the data
source servers on the informed ports.

E. Check the retry scenarios that the company configured for the EMR jobs. Increase the
number of seconds in the interval between each EMR task. Validate that each fallback state
has the appropriate catch for each decision state. Configure an Amazon Simple Notification
Service (Amazon SNS) topic to store the error messages.

44) A company is developing an application that runs on Amazon EC2 instances. Currently, the data
that the application generates is temporary. However, the company needs to persist the data, even
if the EC2 instances are terminated.

A data engineer must launch new EC2 instances from an Amazon Machine Image (AMI) and
configure the instances to preserve the data.

Which solution will meet this requirement?

A. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume
that contains the application data. Apply the default settings to the EC2 instances.

B. Launch new EC2 instances by using an AMI that is backed by a root Amazon Elastic Block
Store (Amazon EBS) volume that contains the application data. Apply the default settings
to the EC2 instances.

C. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store
volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the
application data. Apply the default settings to the EC2 instances.

D. Launch new EC2 instances by using an AMI that is backed by an Amazon Elastic Block
Store (Amazon EBS) volume. Attach an additional EC2 instance store volume to contain the
application data. Apply the default settings to the EC2 instances.
45) A company uses Amazon Athena to run SQL queries for extract, transform, and load (ETL) tasks
by using Create Table As Select (CTAS). The company must use Apache Spark instead of SQL to
generate analytics.

Which solution will give the company the ability to use Spark to access Athena?

A. Athena query settings

B. Athena workgroup

C. Athena data source

D. Athena query editor

46) A company needs to partition the Amazon S3 storage that the company uses for a data lake. The
partitioning will use a path of the S3 object keys in the following format:

s3://bucket/prefix/year=2023/month=01/day=01.

A data engineer must ensure that the AWS Glue Data Catalog synchronizes with the S3 storage when
the company adds new partitions to the bucket.

Which solution will meet these requirements with the LEAST latency?

A. Schedule an AWS Glue crawler to run every morning.

B. Manually run the AWS Glue CreatePartition API twice each day.

C. Use code that writes data to Amazon S3 to invoke the Boto3 AWS Glue create_partition
API call.

D. Run the MSCK REPAIR TABLE command from the AWS Glue console.

47) A media company uses software as a service (SaaS) applications to gather data by using third-
party tools. The company needs to store the data in an Amazon S3 bucket. The company will use
Amazon Redshift to perform analytics based on the data.

Which AWS service or feature will meet these requirements with the LEAST operational overhead?

A. Amazon Managed Streaming for Apache Kafka (Amazon MSK)

B. Amazon AppFlow

C. AWS Glue Data Catalog

D. Amazon Kinesis
48) A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data
engineer writes a query to retrieve sales amounts for 2023 for several products from a table named
sales_data. However, the query does not return results for all of the products that are in the
sales_data table. The data engineer needs to troubleshoot the query to resolve the issue.

The data engineer's original query is as follows:

SELECT product_name, sum(sales_amount)

FROM sales_data-

WHERE year = 2023 -

GROUP BY product_name -

How should the data engineer modify the Athena query to meet these requirements?

A. Replace sum(sales_amount) with count(*) for the aggregation.

B. Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023.

C. Add HAVING sum(sales_amount) > 0 after the GROUP BY clause.

D. Remove the GROUP BY clause.

49) A data engineer has a one-time task to read data from objects that are in Apache Parquet format
in an Amazon S3 bucket. The data engineer needs to query only one column of the data.

Which solution will meet these requirements with the LEAST operational overhead?

A. Configure an AWS Lambda function to load data from the S3 bucket into a pandas
dataframe. Write a SQL SELECT statement on the dataframe to query the required column.

B. Use S3 Select to write a SQL SELECT statement to retrieve the required column from the
S3 objects.

C. Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the
required column.

D. Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon
Athena to query the required column.

50) A company uses Amazon Redshift for its data warehouse. The company must automate refresh
schedules for Amazon Redshift materialized views.

Which solution will meet this requirement with the LEAST effort?

A. Use Apache Airflow to refresh the materialized views.

B. Use an AWS Lambda user-defined function (UDF) within Amazon Redshift to refresh the
materialized views.

C. Use the query editor v2 in Amazon Redshift to refresh the materialized views.

D. Use an AWS Glue workflow to refresh the materialized views.

51) A data engineer must orchestrate a data pipeline that consists of one AWS Lambda function and
one AWS Glue job. The solution must integrate with AWS services.

Which solution will meet these requirements with the LEAST management overhead?

A. Use an AWS Step Functions workflow that includes a state machine. Configure the state
machine to run the Lambda function and then the AWS Glue job.

B. Use an Apache Airflow workflow that is deployed on an Amazon EC2 instance. Define a
directed acyclic graph (DAG) in which the first task is to call the Lambda function and the
second task is to call the AWS Glue job.

C. Use an AWS Glue workflow to run the Lambda function and then the AWS Glue job.

D. Use an Apache Airflow workflow that is deployed on Amazon Elastic Kubernetes Service
(Amazon EKS). Define a directed acyclic graph (DAG) in which the first task is to call the
Lambda function and the second task is to call the AWS Glue job.

52) A company needs to set up a data catalog and metadata management for data sources that run
in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects
that are in a set of data stores. The data stores include structured sources such as Amazon RDS and
Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml
files that are stored in Amazon S3.

The company needs a solution that will update the data catalog on a regular basis. The solution also
must detect changes to the source metadata.

Which solution will meet these requirements with the LEAST operational overhead?

A. Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect
to the data catalog. Configure the Lambda functions to gather the metadata information
from multiple sources and to update the Aurora data catalog. Schedule the Lambda
functions to run periodically.

B. Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue
crawlers to connect to multiple data stores and to update the Data Catalog with metadata
changes. Schedule the crawlers to run periodically to update the metadata catalog.

C. Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will
connect to the data catalog. Configure the Lambda functions to gather the metadata
information from multiple sources and to update the DynamoDB data catalog. Schedule the
Lambda functions to run periodically.

D. Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema
for Amazon RDS and Amazon Redshift sources, and build the Data Catalog. Use AWS Glue
crawlers for data that is in Amazon S3 to infer the schema and to automatically update the
Data Catalog
53) A company stores data from an application in an Amazon DynamoDB table that operates in
provisioned capacity mode. The workloads of the application have predictable throughput load on a
regular schedule. Every Monday, there is an immediate increase in activity early in the morning. The
application has very low usage during weekends.

The company must ensure that the application performs consistently during peak usage times.
Which solution will meet these requirements in the MOST cost-effective way?

A. Increase the provisioned capacity to the maximum capacity that is currently present
during peak load times.

B. Divide the table into two tables. Provision each table with half of the provisioned capacity
of the original table. Spread queries evenly across both tables.

C. Use AWS Application Auto Scaling to schedule higher provisioned capacity for peak
usage times. Schedule lower capacity during off-peak times.

D. Change the capacity mode from provisioned to on-demand. Configure the table to scale
up and scale down based on the load on the table.

Dva-C02 1
No ratings yet
Dva-C02 1
27 pages
80 Mock Questions - Aws Certified Data Engineer Associate
100% (1)
80 Mock Questions - Aws Certified Data Engineer Associate
33 pages
Cheat Sheet AWS Data Engineer Associate
No ratings yet
Cheat Sheet AWS Data Engineer Associate
117 pages
Certified Developer Slides 1462373421
50% (2)
Certified Developer Slides 1462373421
170 pages
E-Book AWS+DevOps
No ratings yet
E-Book AWS+DevOps
17 pages
Interview Question
No ratings yet
Interview Question
24 pages
Its2 Arcsight Poc 1.0
No ratings yet
Its2 Arcsight Poc 1.0
14 pages
AWS DE Certification Guide 1728124415
No ratings yet
AWS DE Certification Guide 1728124415
112 pages
AWS Certified Solutions Architect Associate Dump Questions
No ratings yet
AWS Certified Solutions Architect Associate Dump Questions
304 pages
Airflow - Notes
No ratings yet
Airflow - Notes
82 pages
AWS Certified Solutions Architect Practice Tests B096T3NJTT
No ratings yet
AWS Certified Solutions Architect Practice Tests B096T3NJTT
324 pages
Aws Cli Book
No ratings yet
Aws Cli Book
24 pages
TM-2268 AVEVA (12 Series) .Net Customisation Training Rev 4.0
No ratings yet
TM-2268 AVEVA (12 Series) .Net Customisation Training Rev 4.0
74 pages
AWS Certified Solutions Architect - Associate SAA-C03 Exam Questions and Answers
No ratings yet
AWS Certified Solutions Architect - Associate SAA-C03 Exam Questions and Answers
43 pages
AWS SAA-C03 Dumps Part 3
No ratings yet
AWS SAA-C03 Dumps Part 3
532 pages
Dva-C02 3
No ratings yet
Dva-C02 3
23 pages
Practice Test 1 Neal Davis Course
No ratings yet
Practice Test 1 Neal Davis Course
100 pages
Amazon: Questions & Answers
No ratings yet
Amazon: Questions & Answers
279 pages
Aws Saa c03 Part 5
No ratings yet
Aws Saa c03 Part 5
100 pages
Databricks Quiz Questions
No ratings yet
Databricks Quiz Questions
35 pages
AWS Certified Solutions Architect - Associate SAA-C03 Exam - Free Exam Q&As, Page 1 - ExamTopics
No ratings yet
AWS Certified Solutions Architect - Associate SAA-C03 Exam - Free Exam Q&As, Page 1 - ExamTopics
97 pages
Python Programming (21EC643) (Module-5) by Prof. Sujay Gejji
No ratings yet
Python Programming (21EC643) (Module-5) by Prof. Sujay Gejji
34 pages
Advance AWS Interviewquestions
No ratings yet
Advance AWS Interviewquestions
39 pages
ExamTopicsSAA C03 2 6
No ratings yet
ExamTopicsSAA C03 2 6
427 pages
Aws Certified Cloud Practitioner - 9
100% (1)
Aws Certified Cloud Practitioner - 9
28 pages
AWS Dumps Part-1
No ratings yet
AWS Dumps Part-1
78 pages
AWS Certified Data Engineer Associate - Exam Guide
No ratings yet
AWS Certified Data Engineer Associate - Exam Guide
19 pages
Spark Interview Questions
No ratings yet
Spark Interview Questions
19 pages
AWS Certified Solutions Architect Associate (SAA-C03) Exam Notes - Coding N Concepts2
100% (1)
AWS Certified Solutions Architect Associate (SAA-C03) Exam Notes - Coding N Concepts2
58 pages
AWS Solutions Architect Associate Certification Notes
No ratings yet
AWS Solutions Architect Associate Certification Notes
20 pages
Terraform Certification Exam Questions FREE Guide Ed2
No ratings yet
Terraform Certification Exam Questions FREE Guide Ed2
30 pages
Dva-C02 8
No ratings yet
Dva-C02 8
24 pages
Answer: C
No ratings yet
Answer: C
24 pages
Amazon SAA-C03 Exam
No ratings yet
Amazon SAA-C03 Exam
9 pages
Online Movie Ticket System
No ratings yet
Online Movie Ticket System
70 pages
Documentation
No ratings yet
Documentation
48 pages
Shopping Cart Items Recommendation PDF
No ratings yet
Shopping Cart Items Recommendation PDF
8 pages
AWS Solutions Architect Associate Sample Questions SAA-C03
No ratings yet
AWS Solutions Architect Associate Sample Questions SAA-C03
5 pages
Sample Que Ans1
No ratings yet
Sample Que Ans1
194 pages
Google Passguide Associate-Cloud-Engineer Exam Dumps 2023-Jan-30 by Christopher 122q Vce
No ratings yet
Google Passguide Associate-Cloud-Engineer Exam Dumps 2023-Jan-30 by Christopher 122q Vce
11 pages
AWS Certified Developer Associate Study Plan PDF
No ratings yet
AWS Certified Developer Associate Study Plan PDF
6 pages
DBT Cloud Advanced Architecture Guide
0% (1)
DBT Cloud Advanced Architecture Guide
4 pages
AWS Test 3
No ratings yet
AWS Test 3
167 pages
Saa-C03 9
No ratings yet
Saa-C03 9
25 pages
Amazon: Exam Questions AWS-Certified-Solutions-Architect-Professional
No ratings yet
Amazon: Exam Questions AWS-Certified-Solutions-Architect-Professional
27 pages
AWS Part1
No ratings yet
AWS Part1
78 pages
Dea C01
No ratings yet
Dea C01
7 pages
AWS Certified Cloud Practitioner - Practice Paper 4: AWS Certified Cloud Practitioner, #4
From Everand
AWS Certified Cloud Practitioner - Practice Paper 4: AWS Certified Cloud Practitioner, #4
Tech Interviews
No ratings yet
AWS Basic Interview Questions
No ratings yet
AWS Basic Interview Questions
12 pages
Queue WORKSHEETS
No ratings yet
Queue WORKSHEETS
9 pages
AWS CloudFormation Basics
No ratings yet
AWS CloudFormation Basics
13 pages
Practice Test 6
No ratings yet
Practice Test 6
50 pages
Cloud CDL
No ratings yet
Cloud CDL
30 pages
What Is Microsoft Access
No ratings yet
What Is Microsoft Access
5 pages
Exam: SAA-C02 Name: AWS Certified Solutions Architect - Associate V9.01 118 Q&As Official Website
67% (3)
Exam: SAA-C02 Name: AWS Certified Solutions Architect - Associate V9.01 118 Q&As Official Website
6 pages
AWS EC2 Interview Questions - MindMajix
No ratings yet
AWS EC2 Interview Questions - MindMajix
27 pages
Authentic AWS-Solution-Architect-Associate Exam - Questions - Answers
0% (1)
Authentic AWS-Solution-Architect-Associate Exam - Questions - Answers
3 pages
AWS SA-A Summary - Notes
No ratings yet
AWS SA-A Summary - Notes
39 pages
AWS Certified Solutions Architect - Associate (SAA-C02) Exam Guide
100% (1)
AWS Certified Solutions Architect - Associate (SAA-C02) Exam Guide
8 pages
HTML
No ratings yet
HTML
86 pages
1 AWS Analytics and Data Lakes
No ratings yet
1 AWS Analytics and Data Lakes
15 pages
WPstatsReport 052013 PDF
No ratings yet
WPstatsReport 052013 PDF
53 pages
Beginning Java EE 5 From Novice To Professional Beginning From Novice To Professional 1st Ed. 2006. Corr. 2nd Printing Edition Kevin Mukhar
100% (3)
Beginning Java EE 5 From Novice To Professional Beginning From Novice To Professional 1st Ed. 2006. Corr. 2nd Printing Edition Kevin Mukhar
55 pages
AWS Solution Architect Associate Dump4
No ratings yet
AWS Solution Architect Associate Dump4
13 pages
CSIT114-Week 6 - Domain Modelling
No ratings yet
CSIT114-Week 6 - Domain Modelling
49 pages
Amazon: Exam Questions AWS-Certified-Cloud-Practitioner
No ratings yet
Amazon: Exam Questions AWS-Certified-Cloud-Practitioner
9 pages
Data Analyst Roadmap
No ratings yet
Data Analyst Roadmap
3 pages
List of Content: AWS Notes Start Here
No ratings yet
List of Content: AWS Notes Start Here
8 pages
Microsoft AZURE® AZ-104 Administrator Practice Tests
From Everand
Microsoft AZURE® AZ-104 Administrator Practice Tests
iCertify Training
No ratings yet
DBMS Unit1
No ratings yet
DBMS Unit1
45 pages
AWS Solution Architect Associate Dump5
No ratings yet
AWS Solution Architect Associate Dump5
13 pages
Mobile Security PPT 1
No ratings yet
Mobile Security PPT 1
27 pages
Index: Mysql Apache Xampp
No ratings yet
Index: Mysql Apache Xampp
29 pages
Chapter 4-Python Database Interaction Using MongoDB
No ratings yet
Chapter 4-Python Database Interaction Using MongoDB
25 pages
Saga Pattern and Event Sourcing With Kafka: @rafabene
No ratings yet
Saga Pattern and Event Sourcing With Kafka: @rafabene
50 pages
Python Advance Workbook OWTP
No ratings yet
Python Advance Workbook OWTP
27 pages
Modèle Excel VSM-NG
No ratings yet
Modèle Excel VSM-NG
16 pages
Real SAA-C02 Exam Questions 2021
No ratings yet
Real SAA-C02 Exam Questions 2021
11 pages
Big Buckks: These Are The Who Change The World and Innovate The People!!!
No ratings yet
Big Buckks: These Are The Who Change The World and Innovate The People!!!
16 pages
Amazon Storage S3
No ratings yet
Amazon Storage S3
4 pages
Restdb Cheat Sheet
No ratings yet
Restdb Cheat Sheet
9 pages
Hadoop Backup and Recovery
No ratings yet
Hadoop Backup and Recovery
14 pages
SAA-C03 AWS Certified Solutions Architect - Associate Updated Dumps
No ratings yet
SAA-C03 AWS Certified Solutions Architect - Associate Updated Dumps
89 pages
Python +
No ratings yet
Python +
3 pages
AI Based Healthcare Chatbot System Using Natural Language Processing
No ratings yet
AI Based Healthcare Chatbot System Using Natural Language Processing
5 pages
Python Basic Workbook OWTP
No ratings yet
Python Basic Workbook OWTP
9 pages
Xii Preboard1 QP Set A
No ratings yet
Xii Preboard1 QP Set A
10 pages
Sentiment Analysis of COVID-19 Tweets - Visualization Dashboard
No ratings yet
Sentiment Analysis of COVID-19 Tweets - Visualization Dashboard
8 pages
HANA CalculationEngine CalculationScenarios 2.00.030+
No ratings yet
HANA CalculationEngine CalculationScenarios 2.00.030+
9 pages
Pooja R. Nayaka: ADDRESS: Vardhman Darshan A-102, Vishnu Nagar, at Post/Tal/Dist-Palghar (W) - 401404
No ratings yet
Pooja R. Nayaka: ADDRESS: Vardhman Darshan A-102, Vishnu Nagar, at Post/Tal/Dist-Palghar (W) - 401404
3 pages
AI Based Healthcare Chatbot System by Using Natural Language Processing
No ratings yet
AI Based Healthcare Chatbot System by Using Natural Language Processing
8 pages
CS405 Assignment 2 Solution Spring 2024
No ratings yet
CS405 Assignment 2 Solution Spring 2024
6 pages
Functionality Flow - Full Stack App
No ratings yet
Functionality Flow - Full Stack App
3 pages
EM MCQ Set1
No ratings yet
EM MCQ Set1
5 pages
Brief About DOS and Viruses
No ratings yet
Brief About DOS and Viruses
6 pages
Ejercicios Capitulo 2
No ratings yet
Ejercicios Capitulo 2
4 pages
Controlling Database Management Systems: Access Controls
No ratings yet
Controlling Database Management Systems: Access Controls
6 pages
Databases Homework 2 Normalisation
No ratings yet
Databases Homework 2 Normalisation
3 pages
Monster Resume
No ratings yet
Monster Resume
4 pages
03 02 Working With Various Data Sources - Handout - Understanding Normalization
No ratings yet
03 02 Working With Various Data Sources - Handout - Understanding Normalization
5 pages
Assignment 1 Dbms
No ratings yet
Assignment 1 Dbms
5 pages
PHP Star Rating System With JavaScript - Phppot
No ratings yet
PHP Star Rating System With JavaScript - Phppot
5 pages
Sanjay - Maity CV
No ratings yet
Sanjay - Maity CV
2 pages
Harsh Mendapara: Profile
No ratings yet
Harsh Mendapara: Profile
2 pages
Attendance Tracker: Mini-Project Report On
No ratings yet
Attendance Tracker: Mini-Project Report On
2 pages
Harsh Mendapara: Profile
No ratings yet
Harsh Mendapara: Profile
2 pages
Aws Index 1
No ratings yet
Aws Index 1
2 pages
Functionality Flow
No ratings yet
Functionality Flow
1 page
Imp Data
No ratings yet
Imp Data
1 page
Imp Data
No ratings yet
Imp Data
1 page
Emotxt: A Toolkit For Emotion Recognition From Text: Fabio Calefato, Filippo Lanubile, Nicole Novielli
No ratings yet
Emotxt: A Toolkit For Emotion Recognition From Text: Fabio Calefato, Filippo Lanubile, Nicole Novielli
2 pages
Page 2
No ratings yet
Page 2
1 page

AWS Associate Data Engineer

Uploaded by

AWS Associate Data Engineer

Uploaded by

AWS Certified Data Engineer Associate Exam (DEA-C01) Dumps

A. AWS Step Functions

B. AWS Glue workflows

C. AWS Glue Studio

D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

A. Establish WebSocket connections to Amazon Redshift.

B. Use the Amazon Redshift Data API.

C. Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.

A. Choose the FLEX execution class in the Glue job properties

B. Use the Spot Instance type in Glue job properties.

C. Choose the STANDARD execution class in the Glue job properties.

B. Compress the .csv files by using Snappy compression.

D. Compress the .csv files by using gzip compression.

C. AWS Direct Connect

D. Amazon S3 Transfer Acceleration

B. AWS Database Migration Service (AWS DMS)

C. AWS Direct Connect

A. Turn on concurrency scaling in workload management (WLM) for Redshift Serverless

A. AWS Glue workflows

B. AWS Step Functions tasks

C. AWS Lambda functions

D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows

B. Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months.

C. Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months.

Which solution will meet this requirement MOST cost-effectively?

A. Use Hadoop Distributed File System (HDFS) as a persistent data store.

B. Use Amazon S3 as a persistent data store.

C. Use x86-based instances for core nodes and task nodes.

D. Use Graviton instances for core nodes and task nodes.

E. Use Spot Instances for all primary nodes.

C. Convert the AWS Glue schema to the DynamicFrame schema class.

Which actions will provide the FASTEST queries? (Choose two.)

B. Use a columnar storage file format.

C. Partition the data based on the most common query predicates.

E. Use file formats that are not splittable.

A. Turn on the public access setting for the DB instance.

B. Create an AWS Lambda Python function with provisioned concurrency.

Which solution will meet these requirements?

Which solution will capture the changed data MOST cost-effectively?

Which solutions will meet these requirements? (Choose two.)

A. Create an AWS Glue partition index. Enable partition filtering.

C. Use Athena partition projection based on the S3 bucket prefix.

D. Transform the data that is in the S3 bucket to Apache Parquet format.

Which solution will meet these requirements MOST cost-effectively?

B. Register the S3 path as an AWS Lake Formation location.

Which solution will meet this requirement?

A. Athena query settings

C. Athena data source

D. Athena query editor

A. Schedule an AWS Glue crawler to run every morning.

A. Amazon Managed Streaming for Apache Kafka (Amazon MSK)

C. AWS Glue Data Catalog

The data engineer's original query is as follows:

SELECT product_name, sum(sales_amount)

WHERE year = 2023 -

A. Replace sum(sales_amount) with count(*) for the aggregation.

B. Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023.

C. Add HAVING sum(sales_amount) > 0 after the GROUP BY clause.

D. Remove the GROUP BY clause.

A. Use Apache Airflow to refresh the materialized views.

D. Use an AWS Glue workflow to refresh the materialized views.

You might also like