AWS Data Engineering Exam MCQ
Question 1:
-- a) Create an EMR cluster with Kafka and Zookeeper configured. Use Spark
Streaming for real-time data processing of the Kafka streaming data and write the
output to the data lake.
Question 2:
A company using AWS wants to visualize data from the AWS Cost and Usage Report
to better understand their daily cloud spending. Which of the following solutions
meets the requirement most efficiently?
-- a) From the Billing and Cost Management console, create a Cost and Usage
Report. Select daily for time granularity and enable report data integration for
QuickSight. From the QuickSight console, connect the dataset and choose the
visualize option to display the report's fields.
-- b) From the CloudWatch console, select the Billing, Total Estimated Charge metric
to display a graph of the billing metric data. Adjust the graph to display data
aggregated by the day.
-- c) From the Billing and Cost Management console, create a Cost and Usage
Report. Select daily for time granularity and enable report data integration for
CloudWatch. From the CloudWatch console, select the Billing, Total Estimated
Charge metric to display a graph of the billing metric data. Adjust the graph to
display data aggregated by the day.
-- d) From the Billing and Cost Management console, create a Cost and Usage
Report. Select daily for time granularity. Import the data into Elasticsearch for
processing and output results to an S3 bucket. From the QuickSight console,
connect the dataset and choose the visualize option to display daily billing costs.
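Study note: option (a) can also be expressed with the boto3 CUR API. A minimal sketch, assuming a hypothetical report name and S3 bucket; the CUR API is only served from the us-east-1 endpoint.

```python
import boto3

# CUR report definitions must be created through the us-east-1 endpoint.
cur = boto3.client("cur", region_name="us-east-1")

# Report, bucket, and prefix names below are hypothetical, for illustration only.
cur.put_report_definition(
    ReportDefinition={
        "ReportName": "daily-cost-usage",
        "TimeUnit": "DAILY",
        "Format": "textORcsv",
        "Compression": "GZIP",
        "AdditionalSchemaElements": ["RESOURCES"],
        "S3Bucket": "example-billing-reports",
        "S3Prefix": "cur/",
        "S3Region": "us-east-1",
        "AdditionalArtifacts": ["QUICKSIGHT"],  # enables QuickSight integration
        "RefreshClosedReports": True,
        "ReportVersioning": "CREATE_NEW_REPORT",
    }
)
```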
Question 3:
You want your team to receive email notifications when a row is deleted from a
DynamoDB table. To achieve this, you enabled streams on the DynamoDB table and
created a Lambda function to send out email notifications when a row is deleted.
The Lambda function has all the permissions required. What action is needed to
complete this application?
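Study note: the missing piece in this scenario is typically an event source mapping that connects the DynamoDB stream to the Lambda function. A minimal sketch of a handler that reacts only to REMOVE stream records, assuming notifications are published to a hypothetical SNS topic that fans out to email subscribers.

```python
import json
import os

import boto3

sns = boto3.client("sns")
# Hypothetical topic ARN supplied through an environment variable.
TOPIC_ARN = os.environ["DELETE_ALERT_TOPIC_ARN"]


def handler(event, context):
    """Invoked by the DynamoDB stream event source mapping."""
    for record in event.get("Records", []):
        # A deleted row arrives as a REMOVE stream record.
        if record.get("eventName") == "REMOVE":
            deleted_keys = record["dynamodb"]["Keys"]
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="Row deleted from DynamoDB table",
                Message=json.dumps(deleted_keys),
            )
```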
Question 4:
A mobile delivery service is using Amazon S3 to store its reporting data. Records
that are less than one year old are queried often using SQL, while older records are
rarely queried.
All records must be stored for 5 years. Which of the following describes an
approach to store and quickly query the data at the lowest cost?
-- a) Store all of the records in Amazon S3. Apply a lifecycle policy on the bucket to
change the storage class from S3 Standard to S3 Standard-Infrequent Access after
one year. Query using Redshift Spectrum.
-- b) Store all of the records in S3. Apply a lifecycle policy on the bucket to change the
storage class from S3 Standard to S3 Glacier after one year. Query using Redshift
Spectrum.
-- c) Store all of the records in S3. Apply a lifecycle policy on the bucket to change the
storage class from S3 Standard to S3 Glacier after one year. Restore archived
objects before querying using the standard retrieval option. Query using S3 Select.
-- d) Store all of the records in S3. Apply a lifecycle policy on the bucket to change the
storage class from S3 Standard to S3 Glacier after one year. Query using S3 Select
and Glacier Select.
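Study note: the lifecycle-transition step shared by options (b) through (d) can be set up with boto3 roughly as follows; the bucket name is hypothetical, and the 5-year retention requirement is expressed as an expiration rule.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; transition objects to Glacier one year after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-delivery-reports",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-after-one-year",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
                # Records only need to be retained for 5 years.
                "Expiration": {"Days": 1825},
            }
        ]
    },
)
```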
Question 5:
-- a) Ingest sensor data using Kinesis Data Streams. Create an Amazon Managed
Service for Apache Flink application to read from the stream destination and perform
continuous queries. Create alarms based on the desired moisture threshold using
CloudWatch.
-- b) Ingest sensor data using Kinesis Data Streams. Create an Amazon Managed
Service for Apache Flink application to read from the stream destination, perform
continuous queries, and publish to an SNS topic if the desired moisture threshold is
breached.
-- c) Ingest sensor data using Kinesis Data Streams. Create an Amazon Managed
Service for Apache Flink application to read from the stream destination and perform
continuous queries. Output the real-time query results to AWS Lambda and publish
them to an SNS topic if the desired moisture threshold is breached.
-- d) Ingest sensor data using Kinesis Data Streams. Create an Amazon Managed
Service for Apache Flink application to read from the stream destination and perform
continuous queries. Output the real-time query results to Kinesis Firehose and
publish them to an SNS topic if the desired moisture threshold is breached.
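Study note: a rough sketch of the Lambda side of option (c). It assumes the application delivers query results to the function in the legacy Kinesis Data Analytics output format (base64-encoded payloads under a records key), and the moisture field, threshold, topic, and acknowledgement shape are all assumptions rather than a confirmed contract.

```python
import base64
import json
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["MOISTURE_ALERT_TOPIC_ARN"]  # hypothetical topic
MOISTURE_THRESHOLD = float(os.environ.get("MOISTURE_THRESHOLD", "20.0"))


def handler(event, context):
    output = []
    for record in event.get("records", []):
        payload = json.loads(base64.b64decode(record["data"]))
        # "moisture" is an assumed field name in the continuous query result.
        if payload.get("moisture", 100.0) < MOISTURE_THRESHOLD:
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="Soil moisture below threshold",
                Message=json.dumps(payload),
            )
        output.append({"recordId": record["recordId"], "result": "Ok"})
    # Acknowledge each record so the application does not retry delivery.
    return {"records": output}
```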
Question 6:
A taxi company uses Amazon Kinesis Data Streams to determine trip pricing based
on real-time data.
During peak travel periods, the 1 MB record size limit results in many small
files, leading to downstream processing inefficiencies. Which solution addresses
this issue?
-- a) Configure Kinesis to compress the data before writing it to the data stream. Use
an AWS Lambda function to decompress the records and output them to Amazon
S3.
-- b) Configure an AWS Lambda function as the Kinesis data stream destination. Use
the Lambda function to aggregate the records and output them to Amazon S3.
-- c) Configure an AWS Lambda function as the Kinesis data stream destination. Use
the Lambda function to send data to an Amazon Data Firehose (formerly Kinesis
Data Firehose) delivery stream. Leverage compression in Kinesis Firehose to send
more data and deliver the records to Amazon S3.
-- d) Configure an AWS Lambda function as the Kinesis data stream destination. Use
the Lambda function to compress data and send it to an Amazon Data Firehose
(formerly Kinesis Data Firehose) delivery stream. Leverage buffering in Kinesis
Firehose to generate larger files and deliver the records to Amazon S3.
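Study note: a minimal sketch of the forwarding step in option (d), omitting the optional compression. The delivery stream name is hypothetical, and the buffering size/interval that produces larger S3 objects is configured on the Firehose delivery stream itself.

```python
import base64

import boto3

firehose = boto3.client("firehose")
# Hypothetical delivery stream name.
DELIVERY_STREAM = "trip-pricing-aggregated"


def handler(event, context):
    """Kinesis Data Streams trigger: forward records into Firehose in batches."""
    records = [
        {"Data": base64.b64decode(r["kinesis"]["data"])}
        for r in event["Records"]
    ]
    # PutRecordBatch accepts up to 500 records per call.
    for i in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=DELIVERY_STREAM,
            Records=records[i : i + 500],
        )
```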
Question 7:
You have a set of unstructured data in an S3 bucket that you would like to analyze
with custom queries.
The overall project has the following requirements: Minimal administrative overhead,
and once the analysis is complete, no AWS resources such as tables or indexes
should be retained.
You are very familiar with SQL query syntax, and being able to apply it to this project
would be extremely efficient. Which AWS service should you use?
-- a) Amazon Athena
-- b) Amazon RDS
-- d) Amazon DocumentDB
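Study note: Athena fits here because it is serverless and queries S3 data in place with standard SQL, leaving nothing to tear down afterwards beyond the query results. A minimal sketch of running an ad hoc query with boto3, assuming a hypothetical database, table, and results bucket.

```python
import time

import boto3

athena = boto3.client("athena")

# Hypothetical database/table and results bucket.
query = athena.start_query_execution(
    QueryString="SELECT * FROM analysis_db.raw_events LIMIT 10",
    QueryExecutionContext={"Database": "analysis_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then fetch the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```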
Question 8:
You host a consumer Amazon Kinesis Client Library (KCL) application on EC2 instances that
reads the latest data from a Kinesis Data Stream and streams it to your customers.
A few customers have complained that the data seems inaccurate, so you visit
CloudWatch and notice that the data stream's GetRecords.IteratorAgeMilliseconds
and MillisBehindLatest metrics have been gradually increasing over the last month.
What information can you deduce from this trend, and how would you attempt to
remediate the issue?
-- a) This trend indicates a transient problem, such as an API operation failure. Check
your application to see if enough GetRecords calls are being made to process the
volume of incoming data. If the value for GetRecords.Latency is also trending
upward, reduce the number of shards in your stream.
-- b) This trend indicates a consumer is not keeping up with the stream because it is
not processing records fast enough. Verify the
RecordProcessor.processRecords.Time metric correlates with increased application
throughput, then ensure your application has adequate memory and CPU utilization
on processing nodes during peak demand. If physical resources are sufficient,
increase the number of shards.
-- c) This trend indicates a transient problem, such as the record processor blocking
an operation. To verify whether the record processor is blocked, enable Advanced
KCL Warning Logs and set the logWarningForTaskAfterMillis value in the KCL
configuration to a threshold in milliseconds. You can then view logs to see what is blocked using a
stack trace and remediate the issue.
-- d) This trend indicates a consumer is not keeping up with the stream because it is
not processing records fast enough. Check the values in CloudWatch for PutRecord
or PutRecords. If you see an AmazonKinesisException 500 or
AmazonKinesisException 503 error with a rate above 1% for several minutes,
calculate your internal error rate and implement a retry mechanism using the Kinesis
Producer Library.
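Study note: once memory and CPU on the processing nodes have been ruled out, option (b)'s remediation is a shard increase. A minimal sketch using boto3, assuming a hypothetical stream name and a simple doubling strategy.

```python
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream name; double the shard count so consumers can catch up.
stream = kinesis.describe_stream_summary(StreamName="customer-feed")
current = stream["StreamDescriptionSummary"]["OpenShardCount"]

kinesis.update_shard_count(
    StreamName="customer-feed",
    TargetShardCount=current * 2,
    ScalingType="UNIFORM_SCALING",
)
```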
Question 9:
-- b) Store all of the data in an Amazon Redshift cluster with views for the aggregated
data. Create an Amazon QuickSight dashboard using the Redshift cluster as the data
source. Embed the dashboard into the web application.
-- c) Store all of the data in an Amazon RDS for Postgres DB instance cluster with
views for the aggregated data. Use the RDS instance as a datastore for the web
application.
-- d) Store all of the data in a DynamoDB table. Create global secondary indexes as a
data access pattern for aggregation queries.
Question 10:
You built an application to aggregate and analyze data from streaming channels. You
leveraged Amazon Kinesis Data Streams with default settings and an AWS Lambda
function that processes the data every 36 hours.
A quick verification showed that data is missing over some periods of time. What is
the most likely cause of the missing data?
-- b) Overprovisioning of shards
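Study note: with default settings, a Kinesis data stream retains records for 24 hours, so a consumer that runs every 36 hours will miss records that have already expired. A minimal sketch of extending retention beyond the processing interval, assuming a hypothetical stream name.

```python
import boto3

kinesis = boto3.client("kinesis")

# Default retention is 24 hours, shorter than the 36-hour processing interval.
# Extending retention past that interval avoids the gaps.
kinesis.increase_stream_retention_period(
    StreamName="channel-events",   # hypothetical stream name
    RetentionPeriodHours=48,
)
```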
Question 11:
Due to the sensitive nature of patient health information, a hospital wants to
anonymize data before performing an analysis with it. Which solution meets this
requirement?
-- a) Create Lake Formation personas for data lake administrators, analysts, and their
IAM permissions. Label sensitive data in Lake Formation, anonymize raw data using
Athena, and restrict access to any non-anonymized data for analysts.
-- b) Create IAM groups for data lake administrators and analysts. Label sensitive
data in Lake Formation, anonymize raw data using Athena, and restrict analyst
access to non-anonymized data.
-- c) Create IAM groups for data lake administrators and analysts. Label sensitive
data in Lake Formation and encrypt the raw data using an AWS KMS customer
master key. Restrict access to the raw dataset for the data analyst persona.
-- d) Create Lake Formation personas for data lake administrators and analysts.
Label sensitive data using AWS resource tags for S3 and anonymize raw data using
Amazon Managed Service for Apache Flink. Restrict data analyst access to any
non-anonymized data.
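Study note: the access-restriction half of option (a) is typically enforced with Lake Formation grants. A minimal sketch that gives an analyst principal SELECT only on an anonymized table; the role ARN, database, and table names are hypothetical.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Hypothetical role ARN, database, and table names: analysts are only granted
# SELECT on the anonymized table, never on the raw patient data.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalystPersona"
    },
    Resource={
        "Table": {
            "DatabaseName": "hospital_lake",
            "Name": "patients_anonymized",
        }
    },
    Permissions=["SELECT"],
)
```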
Question 12:
A bank's web app is hosted on AWS Lambda. A new bank account is created after
validating a customer’s name and address. The workflow begins with two Lambda
functions, CheckName and CheckAddress, executing in parallel.
The workflow should then execute the ApproveApplication Lambda function after
both CheckName and CheckAddress are complete. Which solution meets these
requirements?
-- a) Use AWS Step Functions to change the state of ApproveApplication from Wait
once CheckName and CheckAddress succeed
-- b) Create a CloudWatch Event for CheckName and CheckAddress. Once this Event
occurs, create a Lambda function to trigger ApproveApplication
-- c) Migrate the Lambda functions to Apache Airflow and use Amazon Managed
Workflows to scale and trigger functions as needed
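Study note: one common way to express this workflow is a Step Functions Parallel state whose branches run CheckName and CheckAddress, followed by ApproveApplication. A minimal sketch, assuming hypothetical Lambda and IAM role ARNs.

```python
import json

import boto3

# Hypothetical ARNs; the Parallel state runs both checks, and the
# ApproveApplication task only starts after both branches succeed.
definition = {
    "StartAt": "ValidateCustomer",
    "States": {
        "ValidateCustomer": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": "CheckName",
                    "States": {
                        "CheckName": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:CheckName",
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "CheckAddress",
                    "States": {
                        "CheckAddress": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:CheckAddress",
                            "End": True,
                        }
                    },
                },
            ],
            "Next": "ApproveApplication",
        },
        "ApproveApplication": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:ApproveApplication",
            "End": True,
        },
    },
}

boto3.client("stepfunctions").create_state_machine(
    name="NewAccountWorkflow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsExecutionRole",
)
```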
Question 13:
A data engineer is using HDFS with an Amazon EMR cluster. Per company
compliance requirements, the data engineer needs to encrypt sensitive data in
transit and at rest using HDFS encryption zones.
Which of the following can the data engineer implement to meet requirements?
-- a) Server-side Encryption
-- b) Client-side Encryption
-- c) Transparent Encryption
-- d) SSL/TLS Encryption
Question 14:
A company using AWS wants to better understand how devices on its network are
interacting ahead of an audit. The data team has been tasked with analyzing VPC
traffic in real time.
The data analytics solution should avoid creating additional cloud infrastructure.
Which of the following solutions meets the requirements?
-- a) Create a CloudWatch Logs subscription to send VPC flow logs to a Kinesis Data
Firehose delivery stream. Use a Lambda function to decompress the data and write
to S3. Use Athena for SQL querying and Amazon Managed Service for Apache Flink
for near real-time analysis.
-- b) Create a CloudWatch Logs subscription to send VPC flow logs to a Kinesis Data
Firehose delivery stream. Launch an EMR cluster and use Hive to decompress the
data, convert it to a columnar storage format, and write to S3. Use Hive for SQL
querying and Flink for near real-time analysis of streaming data.
-- c) Create a CloudWatch Logs subscription to send VPC flow logs to a Kinesis Data
Firehose delivery stream. Use Glue to convert data to a columnar storage format and
write to S3. Use the CloudWatch console for querying and near real-time analysis.
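Study note: CloudWatch Logs subscription data arrives in Firehose base64-encoded and gzip-compressed, which is why option (a) needs a decompression Lambda. A minimal sketch of a Firehose data-transformation handler for that step.

```python
import base64
import gzip
import json


def handler(event, context):
    """Firehose data-transformation Lambda: decompress CloudWatch Logs records."""
    output = []
    for record in event["records"]:
        # VPC flow log data arrives base64-encoded and gzip-compressed.
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        # Keep only actual log events; drop CONTROL_MESSAGE records.
        if payload.get("messageType") != "DATA_MESSAGE":
            output.append({"recordId": record["recordId"], "result": "Dropped", "data": record["data"]})
            continue
        lines = "\n".join(e["message"] for e in payload["logEvents"]) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```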
Question 15:
A social media company wants to use Apache Spark for batch data transformation
jobs. The transformation job will use the social network data to create friend clusters
and identify additional features that can be leveraged in implementing the KNN
algorithm. Which solution meets these requirements?
-- a) Run transformation jobs using EMR and write the data to HBase.
-- b) Use Kinesis Data Analytics for data transformation and write the data to S3.
-- c) Run transformation jobs using Glue and write the data to S3.
-- d) Run transformation jobs using Glue and write the data to OpenSearch.
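Study note: a minimal skeleton of the Glue job in option (c); the catalog database, table, and output path are hypothetical, and the actual clustering/feature logic is left as a placeholder.

```python
# Minimal AWS Glue (PySpark) job skeleton; names and paths are hypothetical.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the social-network data registered in the Glue Data Catalog.
friends = glue_context.create_dynamic_frame.from_catalog(
    database="social_db",
    table_name="friend_edges",
)

# Feature engineering for the friend clusters / KNN features would go here.

# Write the transformed output to S3 in a columnar format.
glue_context.write_dynamic_frame.from_options(
    frame=friends,
    connection_type="s3",
    connection_options={"path": "s3://example-social-features/friend-clusters/"},
    format="parquet",
)
```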
Question 16:
The reporting team also does not want to manage additional infrastructure. Which of
the following meets these requirements?
-- a) Load data records into a Redshift cluster. Use a Tableau Server standalone
deployment connected to the Redshift cluster for visualization.
-- b) Load data records into SPICE. Use Amazon QuickSight Standard Edition for
visualization.
-- c) Load data records into SPICE. Use Amazon QuickSight Enterprise Edition for
visualization.
Question 17:
The security department has mandated the use of SSO with two-factor
authentication for Redshift cluster access.
-- a) Configure users on the Redshift cluster according to the data analysts’ SSO
credentials. Configure the EIC Endpoint to require multi-factor authentication (MFA).
-- b) Configure the SAML identity provider in the IAM console. Create an identity pool
in Amazon Cognito and configure it to work with SAML 2.0. Ensure user roles have
access to the Redshift cluster.
Question 18:
The security department has mandated the use of SSO with two-factor
authentication for Redshift cluster access. Which solution meets these
requirements?
-- a) Configure users on the Redshift cluster according to the data analysts’ SSO
credentials. Configure the EIC Endpoint to require multi-factor authentication (MFA).
-- b) Configure the SAML identity provider in the IAM console. Create an identity pool
in Amazon Cognito and configure it to work with SAML 2.0. Ensure user roles have
access to the Redshift cluster.
Question 19:
A team of data analysts is consuming data from a CSV file stored on Amazon S3.
One of the fields is a JSON string, and the team needs to convert each item in the
JSON into its own column. How can the team achieve this?
-- a) Write a custom Python script to read in the JSON and output it into separate
fields.
-- b) Use the ResolveChoice transformation in Glue to parse the JSON into individual
fields.
-- c) Launch an EMR cluster with Spark installed. Parse through the field containing
the JSON string and convert the values to columns using the Spark from_json()
function.
-- d) Use the Unbox transformation in Glue to parse the JSON into individual fields.
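Study note: a minimal sketch of option (d) inside a Glue job. Unbox parses a JSON string column into a struct whose members can then be referenced as individual fields; the column name and catalog names are hypothetical.

```python
# Sketch of the Unbox transform in a Glue job; names are hypothetical.
from awsglue.context import GlueContext
from awsglue.transforms import Unbox
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the CSV data from the Glue Data Catalog.
records = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_db",
    table_name="csv_with_json",
)

# Unbox parses the JSON string column ("payload" here) into a nested struct,
# exposing each JSON item as its own field.
unboxed = Unbox.apply(frame=records, path="payload", format="json")
unboxed.printSchema()
```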
Answers
1 B
2 A
3 A
4 D
5 C
6 D
7 A
8 B
9 A
10 A
11 A
12 A
13 C
14 A
15 C
16 C
17 D
18 A
19 D