
Vendor: Amazon

Exam Code: AWS-Certified-Big-Data-Specialty

Exam Name: AWS Certified Big Data - Specialty

Version: 19.121
Important Notice
Product
Our Product Manager keeps an eye on exam updates from vendors. A free update is available for one year after your purchase.

You can log in to the member center and download the latest product at any time. (The product downloaded from the member center is always the latest version.)

PS: To ensure you can pass the exam, please check for the latest product again 2-3 days before the exam.

Feedback
We are committed to improving product quality and service to protect our customers' interests.

If you have any questions about our product, please contact us at [email protected] with the exam number, version, page number, question number, and your login account; our technical experts will provide support within 24 hours.

Copyright
The product in each order has its own encryption code, so you should use it independently.

If anyone shares the file, we will disable free updates and account access.

Any unauthorized changes will be subject to legal action. We reserve the right of final interpretation of this statement.

Order ID: ****************

PayPal Name: ****************

PayPal ID: ****************


QUESTION 1
Which statements are true of sequence numbers in Amazon Kinesis? (choose three)

A. Sequence numbers are assigned by Amazon Kinesis when a data producer calls PutRecords
operation to add data to an Amazon Kinesis stream
B. A data pipeline is a group of data records in a stream.
C. The longer the time period between PutRecord or PutRecords requests, the larger the sequence
number becomes.
D. Sequence numbers are assigned by Amazon Kinesis when a data producer calls PutRecord
operation to add data to an Amazon Kinesis stream

Answer: ACD
Explanation:
Sequence numbers in Amazon Kinesis are assigned by Amazon Kinesis when a data producer calls the PutRecord operation to add data to an Amazon Kinesis stream, and likewise when a data producer calls the PutRecords operation. Sequence numbers for the same partition key generally increase over time. The longer the time period between PutRecord or PutRecords requests, the larger the sequence numbers become.
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html
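
A hedged illustration of how these sequence numbers come back to the producer: a minimal Python (boto3) sketch, where the stream name, payload, and partition key are assumptions made up for the example.

import boto3

kinesis = boto3.client("kinesis")

# Each successful PutRecord call returns the shard ID and the sequence
# number that Kinesis assigned to the record.
response = kinesis.put_record(
    StreamName="sensor-stream",        # hypothetical stream name
    Data=b'{"temperature": 21.4}',
    PartitionKey="sensor-42",          # records with the same key map to the same shard
)
print(response["ShardId"], response["SequenceNumber"])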

QUESTION 2
How are Snowball logs stored?

A. in a JSON file
B. in a SQLite table
C. in a plaintext file
D. in an XML file

Answer: C
Explanation:
When you transfer data between your data center and a Snowball, the Snowball client generates
a plaintext log and saves it to your workstation.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-client.html

QUESTION 3
How do you put your data into a Snowball?

A. Mount your data source onto a workstation in your datacenter and then use this workstation to
transfer data to the Snowball.
B. Connect your data source to the Snowball and then press the "import" button.
C. Mount your data source onto the Snowball and ship it back together with the appliance.
D. Connect the Snowball to your datacenter and then copy the data from your data sources to the
appliance via FTP.

Answer: A
Explanation:
To put your data into a Snowball, you mount your data source onto a workstation in your
datacenter and then use this workstation to transfer data to the Snowball.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/receive-appliance.html

QUESTION 4
Kinesis partition keys are Unicode strings with a maximum length of (choose one)

A. 256 bytes
B. 128 bytes
C. 512 bytes
D. 1024 bytes

Answer: A
Explanation:
Kinesis partition keys are Unicode strings with a maximum length of 256 bytes.
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html

QUESTION 5
Identify a factor that affects the speed of data transfer in AWS Snowball.

A. Transcoder speed
B. The speed of the AGP card
C. Local network speed
D. The speed of the L3 cache

Answer: C
Explanation:
The Snowball client can be used to estimate the time taken to transfer data. Data transfer speed
is affected by a number of factors including local network speed, file size, and the speed at which
data can be read from local servers.
Reference: https://aws.amazon.com/importexport/faqs/

QUESTION 6
How can AWS Snowball handle petabyte-scale data migration?

A. Data is sent via a shipping container, pulled by a semi-trailer truck.
B. Data is sent compressed via a high speed network connection.
C. Data is sent via a physical appliance sent to you by AWS.
D. Data is sent encoded (forward error correction) via a high speed network connection.

Answer: C
Explanation:
Snowball uses secure appliances to transfer large amounts of data into and out of the AWS cloud; this is faster and cheaper than high-speed Internet.
Reference: https://aws.amazon.com/snowball/

QUESTION 7
What is the maximum size of a Kinesis data blob (the data payload before Base64 encoding)? (choose one)

A. Five megabytes
B. Two megabytes
C. One kilobyte
D. One megabyte

Answer: D
Explanation:
The maximum size of a Kinesis data blob (the data payload before Base64 encoding) is one megabyte.
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html

QUESTION 8
The Snowball client uses a(n) ____ to define what kind of data is transferred between the client's
data center and a Snowball.

A. schema
B. JSON configuration file
C. interface
D. XML configuration file

Answer: A
Explanation:
The Snowball client uses schemas to define what kind of data is transferred between the client's
data center and a Snowball. The schemas are declared when a command is issued.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-client.html

QUESTION 9
An AWS Snowball appliance includes a(n) ____ network connection to minimize data transfer
times.

A. 10GBaseT
B. 1000BaseT
C. 40GBaseT
D. Infiniband

Answer: A
Explanation:
An AWS Snowball appliance has a 10GBaseT network connection (both RJ45 as well as SFP+
with either a fiber or copper interface) to minimize data transfer times. This allows the Snowball
appliance to transfer up to 80 terabytes of data from a data source to the appliance in about a
day, plus shipping time.
Reference: https://aws.amazon.com/snowball/details/

QUESTION 10
The job management API for AWS Snowball is a network protocol based on HTTP that uses a(n)
____ model.

A. RPC
B. MPI
C. publish/subscribe
D. RMI

Answer: A
Explanation:
The job management API for AWS Snowball is a network protocol based on HTTP. It uses JSON
(RFC 4627) documents for HTTP request/response bodies and is an RPC model, in which there
is a fixed set of operations, and the syntax for each operation is known to clients without any prior
interaction.
Reference: http://docs.aws.amazon.com/snowball/latest/api-reference/api-reference.html

QUESTION 11
Which statements are true about re-sharding in Amazon Kinesis?

A. The shard or pair of shards that result from the re-sharding operation are referred to as child
shards.
B. When you re-shard, data records that were flowing to the parent shards are rerouted to flow to the
child shards based on the hash key values that the data record partition keys map to.
C. The shard or pair of shards that the re-sharding operation acts on are referred to as parent
shards.
D. After you call a re-sharding operation, you do not need to wait for the stream to become active
again.

Answer: ABC
Explanation:
Kinesis Streams supports re-sharding which enables you to adjust the number of shards in your
stream in order to adapt to changes in the rate of data flow through the stream. The shard or pair
of shards that the re-sharding operation acts on are referred to as parent shards.
The shard or pair of shards that result from the re-sharding operation are referred to as child
shards.
After you call a re-sharding operation, you need to wait for the stream to become active again.
When you re-shard, data records that were flowing to the parent shards are rerouted to flow to
the child shards based on the hash key values that the data record partition keys map to.
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html
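
A hedged sketch of a resharding call in Python (boto3); the stream name, shard ID, and hash key are placeholders, and the wait loop illustrates why you must wait for the stream to become ACTIVE again before issuing another resharding operation.

import time
import boto3

kinesis = boto3.client("kinesis")

# Split a parent shard into two child shards (all identifiers are illustrative).
kinesis.split_shard(
    StreamName="example-stream",
    ShardToSplit="shardId-000000000000",
    NewStartingHashKey="170141183460469231731687303715884105728",  # midpoint of the hash key range
)

# The resharding operation leaves the stream in UPDATING state; wait until it
# is ACTIVE before calling another resharding operation.
while kinesis.describe_stream(StreamName="example-stream")["StreamDescription"]["StreamStatus"] != "ACTIVE":
    time.sleep(5)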

QUESTION 12
In AWS Data Pipeline, an activity is (choose one)

A. A pipeline component that defines the work to perform
B. The database schema of the pipeline data
C. A set of scripts loaded at run time
D. A read/write event from the primary database

Answer: A
Explanation:
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-managing-pipeline.html

QUESTION 13
In AWS Data Pipeline, an activity is a pipeline component that defines the work to perform. All AWS Data Pipeline schedules must have: (choose two)

A. an execution time
B. a start date
C. a frequency
D. an end date

Answer: BC

QUESTION 14
Which activities could be run by the data pipeline? (choose two)

A. Moving data from one location to another
B. Running Hive queries
C. Backing up a primary database to a replica
D. Creating Hive queries

Answer: AB
Explanation:
An AWS data pipeline activity can be used to move data from one location to another or to run
Hive queries.
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-managing-pipeline.html

QUESTION 15
With reference to Hadoop MapReduce in Amazon EMR, which of the following best describes "a
user-defined unit of processing, mapping roughly to one algorithm that manipulates the data"?

A. A cluster map
B. A multi-cluster
C. A cluster store
D. A cluster step

Answer: D
Explanation:
A cluster step is a user-defined unit of processing, mapping roughly to one algorithm that
manipulates the data.
A step is a Hadoop MapReduce application implemented as a Java jar or a streaming program
written in Java, Ruby, Perl, Python, PHP, R, or C++. For example, to count the frequency with
which words appear in a document, and output them sorted by the count, the first step would be a
MapReduce application which counts the occurrences of each word, and the second step would
be a MapReduce application which sorts the output from the first step based on the counts.
Reference: https://aws.amazon.com/elasticmapreduce/faqs/

QUESTION 16
Which of the statements below are true of AWS Data Pipeline activities? (choose two)

A. Data pipeline activities are fixed to ensure no change or script error impacts processing speed
B. Data pipeline activities are extensible but only up to 256 characters
C. When you define your pipeline, you can choose to execute it on Activation or create a schedule to
execute it on a regular basis.
D. Data pipeline activities are extensible - so you can run your own custom scripts to support
endless combinations.

Answer: CD
Explanation:
Data pipeline activities are extensible - so you can run your own custom scripts to support
endless combinations. When you define your pipeline, you can choose to execute it on Activation
or create a schedule to execute it on a regular basis.
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-managing-pipeline.html

QUESTION 17
Which of the following data sources are supported by AWS Data Pipeline? (Choose 3)

A. Access via JDBC
B. Amazon Redshift
C. Amazon RDS databases
D. Elasticache

Answer: ABC

QUESTION 18
AWS Data Pipeline provides (choose one)

A. Several pre-packaged activities that accommodate common scenarios, such as moving data from
one location to another or running Hive queries.
B. A series of sequential tasks that are run by the data import engine
C. A number of pre-determined activities executed in sequence with no intervention required from
the user
D. An engine for transforming a primary database to a usable data set

Answer: A

QUESTION 19
AWS Data Pipeline works with which of the following services to access and store data? (choose
three)

A. Amazon DynamoDB
B. Amazon Elasticache
C. Amazon Redshift
D. Amazon S3

Answer: ACD
Explanation:
AWS Data Pipeline works with the following services to access and store data:
Amazon DynamoDB
Amazon RDS
Amazon Redshift
Amazon S3
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-managing-pipeline.html

QUESTION 20
An AWS Data Pipeline Task Runner is a task agent application that (choose two)

A. Polls AWS Data Pipeline for scheduled tasks and executes them on EC2 instances or Amazon
EMR clusters or other computational resources.
B. Polls AWS Data Pipeline for scheduled tasks. When a task is assigned to Task Runner, it performs the task and reports its status back to AWS Data Pipeline.
C. Checks when data has completed loading in to Data Pipeline
D. Polls AWS Data Pipeline for errors and writes them to CloudWatch logs

Answer: AB

QUESTION 21
Which statements are true of System-managed preconditions in AWS Data Pipeline? (choose
two)

A. System-managed preconditions do not require a computational resource.
B. System-managed preconditions can be run on an EC2 instance
C. System-managed preconditions are run by the AWS Data Pipeline web service on your behalf.
D. System-managed preconditions are only run on the computational resource that you specify.

Answer: AC

QUESTION 22
Does the EMR Hadoop input connector for Kinesis enable continuous stream processing?

A. Only in some regions
B. Yes
C. No
D. Only if the iteration process succeeds

Answer: C
Explanation:
The Hadoop MapReduce framework is a batch processing system. As such, it does not support
continuous queries. However, there is an emerging set of Hadoop ecosystem frameworks like
Twitter Storm and Spark Streaming that enable developers to build applications for continuous
stream processing. A Storm connector for Kinesis is available on GitHub here and you can find a
tutorial explaining how to setup Spark Streaming on EMR and run continuous queries here.
Additionally, developers can utilize the Kinesis client library to develop real-time stream
processing applications.
Reference: https://aws.amazon.com/elasticmapreduce/faqs/

QUESTION 23
In AWS Data Pipeline, a precondition is a pipeline component containing conditional statements that must be true before an activity can run. Which of the following are examples of preconditions? (choose three)

A. Check whether an Amazon S3 key is present
B. Check whether source data is present before a pipeline activity attempts to copy it
C. Check if the Hive script has compile errors in it
D. Check whether a database table exists

Answer: ABD
Explanation:
The following conditional statements must be true before an AWS Data Pipeline activity will run.
Check whether source data is present before a pipeline activity attempts to copy it.
Check whether a database table exists.
Check whether an Amazon S3 key is present.
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-managing-pipeline.html

QUESTION 24
What are supported ways you can use Task Runner to process your AWS Data pipeline? (choose
three)

A. Install Task Runner on a long-running EC2 instance.
B. Install Task Runner on a computational resource that you manage.
C. Install Task Runner on a Database Migration Service instance
D. Enable AWS Data Pipeline to install Task Runner for you on resources that are launched and
managed by the AWS Data Pipeline web service.

Answer: ABD
Explanation:
Task Runner supports two use cases: enable AWS Data Pipeline to install Task Runner for you on resources that are launched and managed by the AWS Data Pipeline web service, or install Task Runner on a computational resource that you manage, such as a long-running EC2 instance or an on-premises server.
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-managing-pipeline.html

QUESTION 25
In AWS Data Pipeline data nodes are used for (choose two)

A. Loading data to the target
B. Accessing data from the source
C. Processing data transformations
D. Storing Logs

Answer: AB
Explanation:
In AWS Data Pipeline, data nodes are used for accessing data from the source and loading data to the target.
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-datanodes.html

QUESTION 26
Which statements are true for user-managed preconditions in AWS Data Pipeline? (choose three)

A. User-managed preconditions only run on the computational resource that you have specified.
B. You can create, access and manage user-managed preconditions using the AWS Management
Console.
C. You can create, access and manage user-managed preconditions using the AWS Command Line
Interface.
D. You need to install the AWS SDK to manage user-managed preconditions.

Answer: ABC
Explanation:
AWS Data Pipeline supports two types of preconditions: system-managed preconditions and user-managed preconditions.
System-managed preconditions are run by the AWS Data Pipeline web service on your behalf and do not require a computational resource.
User-managed preconditions only run on the computational resource that you have specified. You can create, access, and manage your pipelines using the AWS Management Console or the AWS Command Line Interface.
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-managing-pipeline.html

QUESTION 27
In Amazon S3, you can protect data in transit (as it travels to and from Amazon S3) by using
either client-side encryption or by using _________.

A. MFA
B. SSL
C. ICMP
D. ARP

Answer: B
Explanation:
Data protection refers to protecting data while in-transit (as it travels to and from Amazon S3) and
at rest (while it is stored on disks in Amazon S3 data centers). You can protect data in transit by
using client-side encryption or by using SSL.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingEncryption.html

QUESTION 28
How long are the temporary security credentials valid for, if you obtain temporary security
credentials using your AWS account credentials in Amazon S3?

A. 30 Minutes
B. 1 Hour
C. 24 Hours
D. 2 Hours

Answer: B
Explanation:
An AWS account or an IAM user can request the temporary security credentials and use those
credentials to make authenticated requests to Amazon S3. By default, the temporary security
credentials are valid for only one hour. The user can specify the session duration only if the user
uses the IAM user credentials to request a session.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/AuthUsingTempSessionTokenJava.html

QUESTION 29
A user has enabled server side encryption with S3. The user downloads the encrypted object
from S3. How can the user decrypt it?

A. The user needs to decrypt the object using their own private key
B. S3 does not support server side encryption
C. S3 manages encryption and decryption automatically
D. S3 provides a server side key to decrypt the object

Answer: C
Explanation:
If the user is using the server-side encryption feature, Amazon S3 encrypts the object data before
saving it on disks in its data centres and decrypts it when the user downloads the objects. Thus,
the user is free from the tasks of managing encryption, encryption keys, and related tools.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingEncryption.html

QUESTION 30
What type of S3 Access Control supports AWS Account-Level Control as well as User-Level
control?

A. Bucket Policies
B. IAM Policies
C. ACLs
D. All of the three answers above

Answer: A
Explanation:
Bucket Policies allow you to create conditional rules for managing access to your buckets and
files. With bucket policies, you can also define security rules that apply to more than one file,
including all files or a subset of files within a bucket.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/using-iam-policies.html

QUESTION 31
In Amazon S3, which of the following security tokens is required to be passed in the header when
a user is signing a request using temporary security credentials?

A. x-amz-temporary-token
B. x-amz-temporary-security-token
C. x-amz-temp-secure-token
D. x-amz-security-token

Answer: D
Explanation:
If you are signing your request using temporary security credentials, you must include the
corresponding security token in your request by adding the x-amz-security-token header. When
you obtain temporary security credentials using the AWS Security Token Service API, the
response includes temporary security credentials and a session token. You provide the session
token value in the x-amz-security-token header when you send requests to Amazon S3.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html

QUESTION 32
Is it required to send both the Access Key and the Secret Access key in the REST request to
Amazon S3?

A. Yes
B. Yes, it is required only for the IAM users.
C. No
D. Yes, it is required only for the root accounts.

Answer: C
Explanation:
No, it is not required to send both the Access key and the Secret Access key. When a user is
making a REST URL, the user is required to send only the Access key and a signature. The
signature is created with the Secret Access key and request contents. The user does not need to
explicitly send the Secret Access key. Amazon S3 uses the access key ID to look up your secret
access key.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/S3_Authentication2.html

QUESTION 33
Can temporary security credential validity be specified by a user when it has been created using
S3 SDK?

A. Yes
B. Yes, this is possible but only when generating credentials for an IAM user.
C. Yes, this is possible but only when generating credentials for a root account user.
D. No

Answer: B
Explanation:
An IAM user or an AWS Account can request temporary security credentials using AWS SDK for
Java and use them to access Amazon S3. These credentials expire after the session duration. By
default, the session duration is one hour. If you use IAM user credentials, you can specify
duration, between 1 and 36 hours, when requesting the temporary security credentials.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/AuthUsingTempSessionTokenJava.html
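
A hedged Python (boto3) sketch of requesting temporary credentials as an IAM user with an explicit session duration; the 12-hour duration and the use of the credentials against S3 are illustrative assumptions.

import boto3

sts = boto3.client("sts")

# An IAM user can request a session duration between 1 and 36 hours;
# without DurationSeconds the credentials default to a one-hour lifetime.
response = sts.get_session_token(DurationSeconds=12 * 3600)
creds = response["Credentials"]

# Use the temporary credentials to call Amazon S3; the session token is sent
# with each request (as the x-amz-security-token header).
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_buckets()["Buckets"])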

QUESTION 34
Which of the following must be supplied when working with the S3 REST APIs?

A. HTTP request
B. Signature and Time stamp
C. Time stamp only
D. WSDL File and Time stamp

Answer: B
Explanation:
When a user accesses an S3 bucket using REST, the request must include the following items:
1. AWS Access Key ID - Each request must contain the Access Key ID of the identity that the
user is using to send a request.
2. Signature - Each request must contain a valid request signature or the request will be rejected.
A request signature is calculated using the Secret Access key, which is a shared secret known
only to the user and AWS.
3. Time stamp - Each request must contain the date and time when the request was created,
represented as a string in UTC.
4. Date - Each request must contain the time stamp of the request.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/S3_Authentication2.html

QUESTION 35
Which of the following are a part of the temporary security credentials that are programmatically
supported by the S3 SDK for federated users?

A. A user name with an Amazon ID only
B. Users with an Apple ID
C. A user name and an IAM policy describing the resource permissions to be granted
D. An IAM policy only

Answer: C

QUESTION 36
Authenticating a request in Amazon S3 includes the following three steps.

1. AWS creates an HMAC-SHA1 signature.
2. AWS retrieves a secret access key.
3. AWS compares signatures.

Which of the following lists those steps in the correct order?

A. 1, 2 and then 3
B. 1, 3 and then 2
C. 3, 1 and then 2
D. 2, 1 and then 3

Answer: D
Explanation:
When making a REST call, Amazon S3 will receive the request. The correct sequence is given
below. AWS retrieves a secret access key. AWS creates an HMAC-SHA1 signature. AWS
compares signatures.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/S3_Authentication2.html
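
A hedged Python sketch of the legacy HMAC-SHA1 (Signature Version 2) computation those steps describe; the secret key and string to sign are placeholders, and current SDKs use Signature Version 4 instead.

import base64
import hashlib
import hmac

secret_access_key = "EXAMPLE-SECRET-KEY"  # placeholder; never hard-code a real key
string_to_sign = "GET\n\n\nTue, 27 Mar 2007 19:36:42 +0000\n/examplebucket/photo.jpg"

# AWS recomputes this HMAC-SHA1 signature from the stored secret key and
# compares it with the signature the client sent in the request.
signature = base64.b64encode(
    hmac.new(secret_access_key.encode("utf-8"),
             string_to_sign.encode("utf-8"),
             hashlib.sha1).digest()
).decode("utf-8")
print(signature)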

QUESTION 37
What does the Server-side encryption provide in Amazon S3?

A. Server-side encryption protects data at rest using Amazon S3-managed encryption keys (SSE-S3).
B. Server-side encryption doesn't exist for Amazon S3, but only for Amazon EC2.
C. Server-side encryption allows to upload files using an SSL endpoint for a secure transfer.
D. Server-side encryption provides an encrypted virtual disk in the cloud.

Answer: A
Explanation:
Server-side encryption is about protecting data at rest. Server-side encryption with Amazon S3-
managed encryption keys (SSE-S3) employs strong multi-factor encryption. Amazon S3 encrypts
each object with a unique key. As an additional safeguard, it encrypts the key itself with a master
key that it regularly rotates.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
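
A hedged Python (boto3) sketch of uploading an object with SSE-S3; the bucket, key, and body are placeholders.

import boto3

s3 = boto3.client("s3")

# ServerSideEncryption="AES256" requests SSE-S3: S3 encrypts the object at rest
# with S3-managed keys and decrypts it transparently when it is downloaded.
s3.put_object(
    Bucket="example-bucket",
    Key="reports/q1.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="AES256",
)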

QUESTION 38
A user is creating an S3 bucket policy. Which of the below mentioned elements the user will not
include as part of it?

A. Actions
B. Buckets
C. Principal
D. Resource

Answer: B
Explanation:
When creating an S3 bucket policy, the user defines the resource (the bucket or the object), actions, effect, and principal.
They are explained below:
Resource - Buckets and objects are the Amazon S3 resources for which the user can allow or deny permissions.
Actions - For each resource, Amazon S3 supports a set of operations. The user identifies the operations to allow (or deny) using action keywords.
Effect - What the effect will be when the user requests the specific action; this can be either allow or deny.
Principal - The account or user who is allowed access to the actions and resources in the statement. You specify a principal only in a bucket policy; it is the user, account, service, or other entity that is the recipient of the permission. In a user policy, the user to which the policy is attached is the implicit principal.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/access-policy-language-overview.html
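
A hedged Python sketch of a bucket policy assembled from those elements and attached with boto3; the bucket name and account ID are placeholders.

import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",                                         # Effect
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},    # Principal (placeholder account)
            "Action": ["s3:GetObject"],                                # Actions
            "Resource": "arn:aws:s3:::example-bucket/*",               # Resource
        }
    ],
}

boto3.client("s3").put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))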

QUESTION 39
An IAM user is performing an operation on another account's S3 bucket. What will S3 first check
in this context?

A. Verifies that the bucket has the required policy defined for access by the IAM user
B. Verifies if the parent account of the IAM user has granted sufficient permission
C. Reject the request since the IAM user does not belong to the root account
D. Verifies if the IAM policy is available for the root account to provide permission to the other IAM
users

Answer: B

QUESTION 40
You can use _______ in an Amazon S3 bucket policy for cross-account access, which means an
AWS account can access resources in another AWS account.

A. access key IDs
B. secret access keys
C. account IDs
D. canonical user IDs

Answer: D
Explanation:
You can use canonical user IDs in an Amazon S3 bucket policy for cross-account access, which
means an AWS account can access resources in another AWS account. For example, to grant
another AWS account access to your bucket, you specify the account's canonical user ID in the
bucket's policy.
Reference: http://docs.aws.amazon.com/general/latest/gr/acct-identifiers.html

QUESTION 41
A root account owner is trying to understand the S3 bucket ACL. Which choice below is not a predefined group which can be granted object access via an ACL?

A. Canonical user group
B. Log Delivery Group
C. All users group
D. Authenticated user group

Answer: A
Explanation:
An S3 bucket ACL grantee can be an AWS account or one of the predefined Amazon S3 groups.
Amazon S3 has a set of predefined groups. When granting account access to a group, the user
can specify one of the URLs of that group instead of a canonical user ID. Amazon S3 has the
following predefined groups:
- Authenticated Users group: It represents all AWS accounts.
- All Users group: Access permission to this group allows anyone to access the resource.
- Log Delivery group: WRITE permission on a bucket enables this group to write server access logs to the bucket.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
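
A hedged Python (boto3) sketch of granting one of these predefined groups access by its group URI instead of a canonical user ID; the bucket name and owner ID are placeholders, and the grants mirror what server access logging requires from the Log Delivery group.

import boto3

s3 = boto3.client("s3")

# Predefined groups are identified by URI rather than by canonical user ID.
# The full-control grant keeps the bucket owner's permissions in the new ACL.
s3.put_bucket_acl(
    Bucket="example-bucket",
    GrantFullControl='id="OWNER_CANONICAL_USER_ID_PLACEHOLDER"',
    GrantWrite='uri="http://acs.amazonaws.com/groups/s3/LogDelivery"',
    GrantReadACP='uri="http://acs.amazonaws.com/groups/s3/LogDelivery"',
)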

QUESTION 42
Your EMR cluster uses cc2.8xlarge instance types, which Ganglia reports are barely using 25% of CPU resources. You have to process large amounts of data that reside on S3. Which is the most cost-efficient way to reduce the runtime of the job? Choose the correct answer:

A. Split the files into smaller sizes on S3
B. Add additional cc2.8xlarge instances in a new task group
C. Reduce the numbers of files on S3 by merging some of them
D. Spread the files over multiple S3 buckets

Answer: C
Explanation:
https://aws.amazon.com/elasticmapreduce/faqs/

QUESTION 43
Your IoT application has smoke sensors in various hotels. You need to collect this data in real time, log it all to S3, and send out an alert in the event a sensor detects smoke. What steps do you need to take to accomplish this?
Choose the 2 correct answers:

A. Create a rule to filter the smoke sensors that detect smoke
B. Create a rule to send a push notification to all users using Amazon SNS
C. Create an action to filter the smoke sensors that detect smoke
D. Create an action to send a push notification to all users using Amazon SNS

Answer: AD
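
A hedged Python (boto3) sketch of the rule-plus-action pattern the answer describes: an AWS IoT topic rule that filters readings where smoke is detected and publishes an alert through Amazon SNS. The topic filter, field name, and ARNs are assumptions made up for the example.

import boto3

iot = boto3.client("iot")

# The rule's SQL filters messages on the sensor topic where the smoke flag is
# set; the SNS action then pushes a notification for every matching message.
iot.create_topic_rule(
    ruleName="smoke_alert",
    topicRulePayload={
        "sql": "SELECT * FROM 'hotels/+/smoke' WHERE smoke = true",
        "ruleDisabled": False,
        "actions": [
            {
                "sns": {
                    "targetArn": "arn:aws:sns:us-east-1:111122223333:smoke-alerts",  # placeholder
                    "roleArn": "arn:aws:iam::111122223333:role/iot-sns-publish",     # placeholder
                    "messageFormat": "RAW",
                }
            }
        ],
    },
)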

QUESTION 44
You have been tasked to create an enterprise data warehouse. The data warehouse needs to
collect data from each of the three channels' various systems and from public records for weather
and economic data. Each data source sends data daily for consumption by the data warehouse.
Because each data source may be structured differently, an extract, transform, and load (ETL)
process is performed to reformat the data into a common structure. Then, analytics can be
performed across data from all sources simultaneously. Which tools shall you implement?
Choose the correct answer:

A. DynamoDB, Data Pipeline, SQS
B. S3, EMR, Data Pipeline, Lambda
C. RDS, EMR, Data Pipeline, Quicksight
D. S3, EMR, Redshift, Quicksight

Answer: D
Explanation:
The first step in this process is getting the data from the many different sources onto Amazon S3.
Amazon EMR is used to transform and cleanse the data from the source format into the
destination and a format. Each transformation job then puts the formatted and cleaned data onto
Amazon S3. Amazon Redshift loads, sorts, distributes, and compresses the data into its tables so
that analytical queries can execute efficiently and in parallel. For visualizing the analytics,
Amazon QuickSight can be used, or one of the many partner visualization platforms via the
ODBC/JDBC connection to Amazon Redshift.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 45
You need to implement a solution for customer engagement: you need to write queries that join clickstream data with advertising campaign information stored in a DynamoDB table to identify the most effective categories of ads that are displayed on particular websites. Which services should you employ? Choose two.
Choose the 2 correct answers:

A. Kinesis
B. Data Pipeline
C. SQS
D. EMR

Answer: AD
Explanation:
Amazon EMR clusters can read and process Amazon Kinesis streams directly, using familiar
tools in the Hadoop ecosystem such as Hive, Pig, MapReduce, the Hadoop Streaming API, and
Cascading. You can also join real-time data from Amazon Kinesis with existing data on Amazon
S3, Amazon DynamoDB, and HDFS in a running cluster. You can directly load the data from
Amazon EMR to Amazon S3 or DynamoDB for post-processing activities.
Reference:
http://docs.aws.amazon.com/emr/latest/DeveloperGuide/emr-kinesis.html

QUESTION 46
Your company has two batch processing applications that consume financial data about the day's
stock transactions. Each transaction needs to be stored durably and guarantee that a record of
each application is delivered so the audit and billing batch processing applications can process
the data. However, the two applications run separately and several hours apart and need access
to the same transaction information. After reviewing the transaction information for the day, the
information no longer needs to be stored. What is the best way to architect this application?
Choose the correct answer:

A. Use SQS for storing the transaction messages. When the billing batch process consumes each
message, have the application create an identical message and place it in a different SQS for the
audit application to use several hours later
B. Use SQS for storing the transaction messages; when the billing batch process performs first and
consumes the message, write the code in a way that does not remove the message after
consumed, so it is available for the audit application several hours later. The audit application can
consume the SQS message and remove it from the queue when completed
C. Store the transaction information in a DynamoDB table. The billing application can read the rows
while the audit application will read the rows then remove the data
D. Use Kinesis to store the transaction information. The billing application will consume data from
the stream, the audit application can consume the same data several hours later

Answer: D

QUESTION 47
You need to store data quickly in a cost-effective manner. Also, you do not know how much data
you will be handling in 6 months, and you have spikey processing needs. Which Big Data tool
should you use?
Choose the correct answer:

A. EMR
B. Machine Learning
C. Redshift
D. Kinesis

Answer: A
Explanation:
With Amazon EMR, it is easy to resize a running cluster. You can add core nodes, which hold the
Hadoop Distributed File System (HDFS), at any time to increase your processing power and
increase the HDFS storage capacity (and throughput). Additionally, you can use Amazon S3
natively or using EMRFS along with or instead of local HDFS, which allows you to decouple your memory and compute from your storage, providing greater flexibility and cost efficiency. You can
also add and remove task nodes at any time which can process Hadoop jobs but do not maintain
HDFS. Some customers add hundreds of instances to their clusters when their batch processing
occurs and remove the extra instances when processing completes. For example, you may not
know how much data your clusters will be handling in 6 months, or you may have spikey
processing needs. With Amazon EMR, you don't need to guess your future requirements or
provision for peak demand because you can easily add or remove capacity at any time.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
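
A hedged Python (boto3) sketch of the resize operation described above, changing the instance count of a task instance group on a running cluster; the cluster and instance group IDs are placeholders.

import boto3

emr = boto3.client("emr")

# Grow or shrink a running cluster by changing the target instance count of
# one of its instance groups; task nodes can be added and removed freely.
emr.modify_instance_groups(
    ClusterId="j-EXAMPLE12345",                     # placeholder cluster ID
    InstanceGroups=[
        {
            "InstanceGroupId": "ig-EXAMPLETASK",    # placeholder task instance group ID
            "InstanceCount": 20,
        }
    ],
)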

QUESTION 48
There is a 14-day backpacking tour across Europe. The tour coordinators are using a Kinesis
Stream and IOT sensors to monitor the movement of the group. You have changed the default
settings on the stream to the max settings. Each backpack has a sensor and data is getting back
to the stream with the default stream settings. On the last day of the tour, data is sent to S3.
When you go to interpret the data in S3 there is only data for 7 days. Which of the following is the
most probable cause of this?
Choose the correct answer:

A. One of the sensors failed, so there was no data to record
B. You did not have versioning enabled and would need to create individual buckets to prevent the
data from being overwritten.
C. Data records are only accessible up to 7 days from the time they are added to a stream.
D. You needed to use EMR to send the data to S3; Kinesis Streams are only compatible with
DynamoDB.

Answer: C
Explanation:
Streams supports changes to the data record retention period of your stream. An Amazon Kinesis
stream is an ordered sequence of data records meant to be written to and read from in real-time.
Data records are therefore stored in shards in your stream temporarily. The time period from
when a record is added to when it is no longer accessible is called the retention period. An
Amazon Kinesis stream stores records for 24 hours by default, up to 168 hours.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/kinesis-extended-retention.html
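
A hedged Python (boto3) sketch of raising a stream's retention period to the 168-hour maximum mentioned above; the stream name is a placeholder.

import boto3

kinesis = boto3.client("kinesis")

# Extend retention from the 24-hour default up to the 168-hour (7-day) maximum
# that applied to Kinesis Streams when this question was written.
kinesis.increase_stream_retention_period(
    StreamName="tour-tracker-stream",   # placeholder
    RetentionPeriodHours=168,
)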

QUESTION 49
You work for a social media startup and need to analyze the effectiveness of your new marketing
campaign from your previous one. Which process should you use to record the social media
replies in a durable data store that can be accessed at any time for analytics of historical data?
Choose the correct answer:

A. Read the data from the social media sites, store it in Amazon Glacier, and use AWS Data
Pipeline to publish it to Amazon RedShift for analytics.
B. Read the data from the social media sites, store it with Amazon Elastic Block Store, and use AWS
Data Pipeline with Amazon Kinesis for analytics
C. Read the data from the social media site, store it with Amazon Elastic Block store, and use
Amazon Kinesis for analytics
D. Read the data from the social media sites, store it in DynamoDB, and use Apache Hive with
Amazon Elastic MapReduce for analytics.

Answer: D
Explanation:
https://aws.amazon.com/blogs/aws/aws-howto-using-amazon-elastic-mapreduce-with-dyna
modb/

QUESTION 50
You have a Kinesis stream with four shards getting data from various IoT devices. There is a Lambda transformation function attached to the stream that fans out the data to eight destinations. How many Lambda functions get invoked concurrently per record? Choose the correct answer:

A. 1
B. 8
C. 4
D. 32

Answer: B
Explanation:
One lambda per shard

QUESTION 51
You need to create an Amazon Machine Learning model to predict how many inches of rain will fall in an area based on historical rainfall data. What type of modeling will you use? Choose the correct answer:

A. Categorical
B. Regression
C. Unsupervised
D. Binary

Answer: B
Explanation:
Regression will give you a range of values.

QUESTION 52
You need to perform ad-hoc SQL queries on structured data. Data comes in constantly at a high
velocity. What services should you use?
Choose the correct answer:

A. Kinesis Firehose and Redshift
B. Kinesis Streams and RDS
C. EMR + Redshift
D. EMR using Hive

Answer: A
Explanation:
Kinesis Firehose provides a managed service for aggregating streaming data and inserting it into
Redshift. Redshift also supports ad-hoc queries over well-structured data using a SQL-compliant
wire protocol, so the business team should be able to adopt this system easily.
Reference:
http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-sql.html

QUESTION 53
All the data in one of your Redshift tables needs to be exported in a CSV format into S3. What is
the most efficient command to do that?
Choose the correct answer:

A. DistCp
B. UNLOAD
C. COPY
D. EXPORT

Answer: B
Explanation:
Unloads the result of a query to one or more files on Amazon Simple Storage Service (Amazon
S3) using Amazon S3 server-side encryption (SSE-S3).
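
A hedged sketch of an UNLOAD issued from Python via the boto3 Redshift Data API (one of several ways to run SQL against a cluster); the cluster, database, table, bucket, and IAM role are placeholders.

import boto3

redshift_data = boto3.client("redshift-data")

# UNLOAD runs inside Redshift and writes the query result to S3 in parallel
# as delimited text files.
sql = """
UNLOAD ('SELECT * FROM sales')
TO 's3://example-bucket/exports/sales_'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftUnloadRole'
DELIMITER ','
ALLOWOVERWRITE;
"""

redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",   # placeholder
    Database="analytics",                  # placeholder
    DbUser="analyst",                      # placeholder
    Sql=sql,
)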

QUESTION 54
You need to visualize data from Spark and Hive running on an EMR cluster. Which of the options
is best for an interactive and collaborative notebook for data exploration? Choose the correct
answer:

A. Hive
B. Kinesis Analytics
C. D3.js
D. Zeppelin

Answer: D
Explanation:
Apache Zeppelin is an open source GUI which creates interactive and collaborative notebooks for
data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to
manipulate data and quickly visualize results.

QUESTION 55
You are provisioning an application using EMR. You have requested 100 instances. You are charged an incremental $0.015 per instance hour. In the first 10 minutes after your launch request, Amazon EMR starts your cluster. 90 of your instances are available. It takes your cluster one hour to complete. How much will you be charged?
Choose the correct answer:

A. $0.015
B. $0
C. $1.35 per hour
D. $1.50 per hour

Answer: C
Explanation:
Billing commences when Amazon EMR starts running your cluster. You are only charged for the
resources actually consumed. For example, let's say you launched 100 Amazon EC2 Standard
Small instances for an Amazon EMR cluster, where the Amazon EMR cost is an incremental
$0.015 per hour. The Amazon EC2 instances will begin booting immediately, but they won't
necessarily all start at the same moment. Amazon EMR will track when each instance starts and
will check it into the cluster so that it can accept processing tasks. In the first 10 minutes after
your launch request, Amazon EMR either starts your cluster (if all your instances are available) or
checks in as many instances as possible. Once the 10-minute mark has passed, Amazon EMR
will start processing (and charging for) your cluster as soon as 90% of your requested instances
are available. As the remaining 10% of your requested instances check in, Amazon EMR starts
charging for those instances as well. So, in the above example, if all 100 of your requested
instances are available 10 minutes after you kick off a launch request, you'll be charged $1.50 per
hour (100 * $0.015) for as long as the cluster takes to complete. If only 90 of your requested
instances were available at the 10-minute mark, you'd be charged $1.35 per hour (90 * $0.015)
for as long as this was the number of instances running your cluster. When the remaining 10
instances checked in, you'd be charged $1.50 per hour (100 * $0.015) for as long as the balance
of the cluster takes to complete.
Reference:
https://aws.amazon.com/emr/faqs/

QUESTION 56
You have been running a Redshift cluster in your personal AWS account in us-east-1 and now
need to move the cluster to the company account in us-west-1. What is a primary step that you
must take to get this accomplished?
Choose the correct answer:

A. Enable cross-region snapshot copy
B. Create a manual snapshot of you Redshift cluster and restore it in the Company account from the
Amazon Redshift Console.
C. Manage snapshot access to authorize another AWS account to view and restore the snapshot
D. Configure a user in IAM for the Company to access snapshots

Answer: B

QUESTION 57
You have decided to migrate data from an on-premises database in Building A to a secondary on-premises database in Building B. You need to do this as quickly as possible. How do you achieve this with AWS?
Choose the correct answer:

A. Use AWS Data Migration Services and create a target database, migrate the database schema,
set up the data replication process, initiate the full load and a subsequent change data capture
and apply, and conclude with a switchover of your production environment to the new database
once the target database is caught up with the source database.
B. Use AWS Data Pipeline and create a target database, migrate the database schema, set up the
data replication process, initiate the full load and a subsequent change data capture and apply,
and conclude with a switchover of your production environment to the new database once the
target database is caught up with the source database
C. Replication between on-premises to on-premises databases is not supported by AWS
D. Use AWS Direct Connect with AWS Data Migration Service and immediately migrate the
database schema, no provisioning of a target database is needed.

Answer: C
Explanation:
AWS Database Migration Service supports both homogenous and heterogeneous data
replication. Supported database sources include: (1) Oracle, (2) SQL Server, (3) MySQL, (4)
Amazon Aurora (5) PostgreSQL and (6) SAP ASE. All sources are supported on-premises, in
EC2, and RDS except Amazon Aurora which is available only in RDS. RDS SQL Server is
supported in bulk extract mode only; the change data capture mode (CDC) is not yet supported.
CDC is supported for on-premises and EC2 SQL Server. Amazon Aurora is only available in
RDS. Supported database targets include: (1) Amazon Aurora, (2) Oracle, (3) SQL Server, (4)
MySQL, (5) PostgreSQL and (6) SAP ASE. All Oracle, SQL Server, MySQL and Postgres targets
are supported on-premises, in EC2 and RDS while SAP ASE is supported only in EC2. Either the
source or the target database (or both) need to reside in RDS or on EC2. Replication between
on-premises to on-premises databases is not supported.
Reference:
https://aws.amazon.com/dms/faqs/

QUESTION 58
Your data warehouse is running on Redshift. On average, you have 5 users that are logged in at
any given time and running queries. You have a query that runs for 50 minutes. While you run
your query, any new queries that are initiated by anyone else must wait for at least 50 minutes
before returning data; normally, their queries come back within a minute. Which of the following
can get the queries running faster for everyone else? Choose the 3 correct answers:

A. Put your query into a separate WLM queue with concurrency set to 1
B. Add more nodes to the Redshift cluster
C. Create a separate WLM queue for everyone else's queries with concurrency set to 3
D. Temporarily up the concurrency by one

Answer: ACD
Explanation:
All queries are going into 1 queue and concurrency is set to 1, which is why everyone is queued
behind you.
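
A hedged sketch of what such a WLM setup might look like, expressed as a Python structure for the cluster parameter group's wlm_json_configuration parameter; the queue names and concurrency levels are illustrative only.

import json

# Illustrative WLM configuration: a dedicated queue (concurrency 1) for the
# long-running query's query group, a queue for everyone else's queries, and
# the default queue.
wlm_config = [
    {"query_group": ["long_running"], "query_concurrency": 1},
    {"user_group": ["analysts"], "query_concurrency": 3},
    {"query_concurrency": 5},   # default queue
]
print(json.dumps(wlm_config))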

QUESTION 59
You need to migrate data to AWS; it is estimated that the data transfer will take over a week.
Which AWS tool should you use?
Choose the correct answer:

A. Direct Connect
B. Kinesis
C. Snowball
D. Data Pipeline

Answer: C
Explanation:
As a rule of thumb, if it takes more than one week to upload your data to AWS using the spare
capacity of your existing Internet connection, then you should consider using Snowball. For
example, if you have a 100 Mb connection that you can solely dedicate to transferring your data
and need to transfer 100 TB of data, it takes more than 100 days to complete data transfer over
that connection. You can make the same transfer by using multiple Snowballs in about a week.
Reference:
https://aws.amazon.com/snowball/faqs/
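
A hedged back-of-the-envelope check of the 100 TB over 100 Mb/s figure quoted above, written as a small Python calculation.

# Rough transfer-time estimate: 100 TB over a dedicated 100 Mb/s link.
data_bits = 100e12 * 8        # 100 TB expressed in bits
link_bps = 100e6              # 100 Mb/s
days = data_bits / link_bps / 86400
print(round(days))            # roughly 93 days at ideal throughput; real-world
                              # overhead pushes this past the 100 days quoted above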

QUESTION 60
You work for a tech startup that has developed a bracelet to track information for children. Each bracelet sends data in JSON format every 6 seconds to be analyzed and sent to a web portal for parents. You need to provide a solution for real-time data analytics that is durable, elastic, and parallel. The results should be stored. Which solution should you select? Choose the correct answer:

A. EMR to collect the inbound sensor data, analyze the data from EMR with Amazon Kinesis, and
save the results to DynamoDB.
B. SQS to collect the inbound sensor data, analyze the data from SQS with a daily scheduled Data Pipeline, and save the results to a Redshift cluster.
C. Amazon Kinesis to collect the inbound sensor data, analyze the data with Kinesis clients, and
save the results to a Redshift cluster using EMR.
D. S3 to collect the inbound sensor data, analyze the data from S3 with Amazon Kinesis, and save the results to a Microsoft SQL Server RDS instance.

Answer: C
Explanation:
Amazon EMR clusters can read and process Amazon Kinesis streams directly, using familiar
tools in the Hadoop ecosystem such as Hive, Pig, MapReduce, the Hadoop Streaming API, and
Cascading. You can also join real-time data from Amazon Kinesis with existing data on Amazon
S3, Amazon DynamoDB, and HDFS in a running cluster. You can directly load the data from
Amazon EMR to Amazon S3 or DynamoDB for post-processing activities.
Reference:
http://docs.aws.amazon.com/emr/latest/DeveloperGuide/emr-kinesis.html

QUESTION 61
Your data warehouse is running on Redshift. You need to ensure that your cluster can be
restored in another region in case of region failure. What actions can you take to ensure that?
Choose the correct answer:

A. Use lambda to create EBS snapshots
B. Enable snapshot replication to another region
C. Enable cross-region replication in Redshift
D. Create a manual snapshot

Answer: B
Explanation:
A and C are not possible; D keeps the snapshot in the same region.
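
A hedged Python (boto3) sketch of enabling cross-region snapshot copy so snapshots are replicated to another region and the cluster can be restored there after a region failure; the cluster identifier and regions are placeholders.

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Snapshots taken after this call are automatically copied to the destination
# region, where they can be used to restore the cluster.
redshift.enable_snapshot_copy(
    ClusterIdentifier="example-dw-cluster",   # placeholder
    DestinationRegion="us-west-2",
    RetentionPeriod=7,                        # days to keep the copied snapshots
)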

QUESTION 62
You have EC2 instances which need to be connected to your on-premises data center. You need
to be able to support a connection speed of 200Mbps. How should you configure this? Choose
the correct answer:

A. Use Direct Connect to provision a 1 Gbps cross connect between your data center and VPC, then
increase the number or size of your Direct Connect connections as needed.
B. Create an internal ELB for your application, submit a Direct Connect request to provision a 1
Gbps cross connect between your data center and VPC, then increase the number or size of your
Direct Connect connections as needed.
C. Provision a VPN connection between a VPC and data center, submit a Direct Connect partner
request to provision cross connects between your data center and the Direct Connect location,
then cut over from the VPN connection to one or more Direct Connect connections as needed.
D. Allocate EIPs and an Internet Gateway for your VPC instances, then provision a VPN connection
between a VPC and your data center.

Answer: C
Explanation:
You can use AWS Direct Connect to establish a private logical connection from your on-premises
network directly to your Amazon VPC. AWS Direct Connect provides a private, high-bandwidth
network connection between your network and your VPC. You can use multiple logical
connections to establish private connectivity to multiple VPCs while maintaining network isolation.
With AWS Direct Connect, you can establish 1 Gbps or 10 Gbps dedicated network connections
between AWS and any of the AWS Direct Connect locations. A dedicated connection can be
partitioned into multiple logical connections by using industry standard 802.1Q VLANs. In this
way, you can use the same connection to access public resources, such as objects stored in
Amazon Simple Storage Service (Amazon S3) that use public IP address space, and private
resources, such as Amazon EC2 instances that are running within a VPC using private IP
space, all while maintaining network separation between the public and private environments.
You can choose a partner from the AWS Partner Network (APN) to integrate the AWS Direct
Connect endpoint in an AWS Direct Connect location with your remote networks. Finally, you may
combine all these different options in any combination that make the most sense for your
business and security policies. For example, you could attach a VPC to your existing data center
with a virtual private gateway and set up an additional public subnet to connect to other AWS
services that do not run within the VPC, such as Amazon S3, Amazon Simple Queue Service
(Amazon SQS), or Amazon Simple Notification Service (Amazon SNS). In this situation, you
could also leverage IAM Roles for Amazon EC2 for accessing these services and configure IAM
policies to only allow access from the elastic IP address of the NAT server.
Reference:
https://d0.awsstatic.com/whitepapers/extend-your-it-infrastructure-with-amazon-vpc.pdf

QUESTION 63
A global shoemaker has over a thousand retail stores which they manage themselves, sell
through brokers in other discounted stores, and they also sell online. These channels are
completely independent. There is no system that merges these data sets together to allow the
COO to have comprehensive insight. You need to provide a company-wide picture of its channels
and be enabled to do ad-hoc analytics when required. How can you configure this? Choose two.
Choose the 2 correct answers:

A. Use Amazon EMR to load, sort, distribute, and compress the data into its tables so that analytical
queries can execute efficiently and in parallel. Lastly, for visualizing the analytics, Amazon
QuickSight can be used, or one of the many partner visualization platforms via the ODBC/JDBC
connection to Amazon EMR.
B. First, get all the data from the many different sources onto Amazon S3. Then, Amazon EMR is
used to transform and cleanse the data from the source format into the destination and desired
format.
C. Use Amazon Redshift to load, sort, distribute, and compress the data into its tables so that
analytical queries can execute efficiently and in parallel. Lastly, for visualizing the analytics,
Amazon QuickSight can be used, or one of the many partner visualization platforms via the
ODBC/JDBC connection to Amazon Redshift.
D. First, get all the data from the many different sources onto Amazon DynamoDB. Then, Amazon
Redshift is used to transform and cleanse the data from the source format into the destination and
desired format.

Answer: BC
Explanation:
An enterprise data warehouse is a terrific way to solve this problem. The data warehouse needs
to collect data from each of the three channels' various systems and from public records for
weather and economic data. Each data source sends data daily for consumption by the data
warehouse. Because each data source may be structured differently, an extract, transform, and
load (ETL) process is performed to reformat the data into a common structure. Then, analytics
can be performed across data from all sources simultaneously.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 64
Your streaming application requires exactly-once delivery, and out-of-order data is acceptable as long
as the data is processed within 5 seconds. Which solution can be used? Choose the correct
answer:

A. Spark Streaming
B. Kinesis Streams
C. Kinesis Firehose
D. None of the above

Answer: A
Explanation:
Spark Streaming uses micro-batching and can guarantee exactly-once delivery if configured appropriately.

QUESTION 65
Your supervisor has asked you to provision Amazon Athena for data analysis. How should you
provision this?
Choose the correct answer:

A. Athena is serverless, so there is no infrastructure to set up or manage, you can start analyzing
data immediately. You don't even need to load your data into Athena, it works directly with data
stored in S3
B. Create an EC2 instance for Athena and you can start analyzing data immediately. You don't even
need to load your data into Athena, it works directly with data stored in S3
C. Load data into Athena. Athena is serverless, so there is no infrastructure to set up or manage,
you can start analyzing data immediately.
D. Create an RDS instance for Athena and you can start analyzing data immediately. You don't even
need to load your data into Athena, it works directly with data stored in S3

Answer: A
Explanation:
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3
using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and
you can start analyzing data immediately. You don't even need to load your data into Athena, it
works directly with data stored in S3.
Reference:
https://aws.amazon.com/athena/faqs/
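
As a minimal illustration (not part of the original question), the following Python/boto3 sketch shows how an ad-hoc Athena query could be submitted against data already sitting in S3; the database, table, and results bucket names are hypothetical placeholders.

import boto3

athena = boto3.client("athena")

# Submit a SQL query against data that already lives in S3; no servers to manage.
# "logs_db", "web_logs", and the results bucket are invented names for this sketch.
response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "logs_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution() until the query completes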

QUESTION 66
You need to move 64 TB of log data from your on-premises servers to Amazon to be loaded into
Redshift for analysis. What is the fastest method to accomplish this task? Choose the correct
answer:

A. Direct Connect
B. Upload over a VPN connection
C. Snowball
D. Multi-part upload

Answer: C
Explanation:
If moving data to S3 is going to take more than a week, use Snowball

QUESTION 67
What combination of services do you need for the following requirement: accelerate petabyte-
scale data transfers, load streaming data, and create scalable private connections.
Select the correct answer order.
Choose the correct answer:

A. Snowball, Direct Connection, Kinesis Firehose


B. Snowball, Kinesis Firehose, Direct Connect
C. Data Migration Services, Kinesis Firehose, Direct Connect
D. Snowball, Data Migration Services, Direct Connect

Answer: B
Explanation:
In addition, AWS has many options to help get data into the cloud, including secure devices like
AWS Import/Export Snowball5 to accelerate petabyte-scale data transfers, Amazon Kinesis
Firehose6 to load streaming data, and scalable private connections through AWS Direct Connect.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 68
You need to store large digital video files and need a fast, fully-managed, data warehouse service
that is simple and cost-effective to analyze all your data efficiently using your existing business
intelligence tools. How should you configure this? Choose the correct answer:

A. Store the data in Amazon S3 and reference its location in Amazon EMR. Amazon EMR will keep
track of metadata about your video files, but the video files themselves would be stored in
Amazon S3.
B. Store the data in Amazon S3 and reference its location in Amazon Redshift. Amazon Redshift will
keep track of metadata about your binary objects, but the large objects themselves would be
stored in Amazon S3.
C. Store the data in Amazon Redshift and Redshift will keep track of metadata about your binary
objects.
D. Store the data in Amazon DynamoDB and reference its location in Amazon Redshift. Amazon
Redshift will keep track of metadata about your video files, but the video files themselves would
be stored in Amazon S3.

Answer: B
Explanation:
BLOB data: If you plan on storing large binary files (such as digital video, images, or music), you
may want to consider storing the data in Amazon S3 and referencing its location in Amazon
Redshift. In this scenario, Amazon Redshift keeps track of metadata (such as item name, size,
date created, owner, location, and so on) about your binary objects, but the large objects
themselves would be stored in Amazon S3.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
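
A minimal sketch of this pattern, assuming a psycopg2 connection with placeholder credentials and invented table and bucket names: the video file itself stays in S3 and Redshift keeps only the metadata and the S3 location.

import psycopg2

# Connection details are placeholders for this sketch.
conn = psycopg2.connect(host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics", user="awsuser", password="example")

ddl = """
CREATE TABLE video_catalog (
    video_id    BIGINT       NOT NULL,
    title       VARCHAR(256),
    size_bytes  BIGINT,
    created_at  TIMESTAMP,
    s3_location VARCHAR(1024)  -- e.g. s3://example-media-bucket/videos/123.mp4
);
"""
with conn.cursor() as cur:
    cur.execute(ddl)
conn.commit()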

QUESTION 69
What is true about a Global Secondary Index on DynamoDB? Choose the correct answer:

A. The partition key and sort key can be different from the table
B. Either the partition key or the sort key can be different from the table, but not both
C. Only the sort key can be different from the table
D. Only the partition key can be different from the table

Answer: A

QUESTION 70
Your application generates logs that need to be stored in a DynamoDB table. The log contains
user, event_id, timestamp and status code. You expect to get hundreds of events per user, each
with a unique event_id. The number of users is expected to grow to 300,000 in two months. You
will mainly query the table for the event_ids for a user during a time frame. What would be the
best pick for the hash key?
Choose the correct answer:

A. 'event_id' as the partition key


B. 'user' as the partition key and 'timestamp' as the sort key
C. 'user' as the partition key and 'event_id' as the sort key
D. 'event_id' as the partition key and 'user' as the sort key

Answer: B
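
A minimal boto3 sketch of the table design in answer B, with hypothetical table and attribute names: 'user' as the partition key keeps each user's events together, and 'timestamp' as the sort key makes a time-frame query a simple key condition.

import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical log table: partition on user, sort on timestamp.
dynamodb.create_table(
    TableName="app_event_logs",
    AttributeDefinitions=[
        {"AttributeName": "user", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user", "KeyType": "HASH"},        # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)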

QUESTION 71
You have to identify potential fraudulent credit card transactions using Amazon Machine
Learning. You have been given historical labeled data that you can use to create your model. You
will also need the ability to tune the model you pick. Which model type should you use?
Choose the correct answer:

A. Cannot be done using Amazon Machine Learning


B. Regression
C. Binary
D. Categorical

Answer: C

QUESTION 72
There is a five-day car rally race across Europe. The race coordinators are using a Kinesis
Stream and IoT sensors to monitor the movement of the cars. Each car has a sensor and data is
getting back to the stream with the default stream settings. On the last day of the rally, data is
sent to S3. When you go to interpret the data in S3 there is only data for the last day and nothing
for the first 4 days. Which of the following is the most probable cause of this? Choose the correct
answer:

A. You needed to use EMR to send the data to S3, Kinesis Streams are only compatible with
DynamoDB.
B. Data records are only accessible for a default of 24 hours from the time they are added to a
stream.
C. One of the sensors failed, so there was no data to record
D. You did not have versioning enabled and would need to create individual buckets to prevent the
data from being overwritten.

Answer: B
Explanation:
Streams supports changes to the data record retention period of your stream. An Amazon Kinesis
stream is an ordered sequence of data records meant to be written to and read from in real-time.
Data records are therefore stored in shards in your stream temporarily. The time period from
when a record is added to when it is no longer accessible is called the retention period. An
Amazon Kinesis stream stores records for 24 hours by default, up to 168 hours.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/kinesis-extended-retention.html
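
As a small sketch (the stream name is assumed), the retention period can be raised from the 24-hour default up to 168 hours with boto3, which would have preserved the first four days of rally data:

import boto3

kinesis = boto3.client("kinesis")

# Extend retention from the default 24 hours to the 168-hour maximum.
kinesis.increase_stream_retention_period(
    StreamName="rally-telemetry",   # hypothetical stream name
    RetentionPeriodHours=168,
)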

QUESTION 73
Your company stores very sensitive data on Redshift, which needs to be encrypted with keys that
are fully controlled by your company. Which option should you use? Choose the correct answer:

A. AWS CloudHSM
B. AWS KMS
C. On-premise HSM
D. S3-KMS

Answer: A
Explanation:
CloudHSM is a physical hardware security module appliance that AWS attaches to your VPC, and only you have access to and control of the keys.

QUESTION 74
You need to filter and transform incoming messages coming from a smart sensor you have
connected with AWS. Once messages are received, you need to store them as time series data
in DynamoDB. Which AWS service can you use?
Choose the correct answer:

A. Kinesis
B. IoT Device Shadow
C. IoT Rules Engine
D. Redshift

Answer: C
Explanation:
The AWS IoT Rules Engine enables continuous processing of inbound data from devices
connected to the AWS IoT service. You can configure rules in the Rules Engine in an intuitive,
SQL-like syntax to automatically filter and transform inbound data, and you can further configure
rules to route data from the AWS IoT service to several other AWS services as well as your own
or third-party services. Example use cases of rules include: filtering and transforming incoming
messages and storing them as time series data in DynamoDB; sending a push notification via
SNS when the data from a sensor crosses a certain threshold; saving a firmware file to S3;
processing messages simultaneously from a multitude of devices using Kinesis; invoking Lambda
to do custom processing on incoming data; and sending a command to a group of devices with
an automated republish.
Reference:
https://aws.amazon.com/iot-platform/faqs/
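
To make this concrete, here is a hedged boto3 sketch of such a rule; the topic, table name, and role ARN are hypothetical, and the table's partition key is assumed to match the selected sensor_id field.

import boto3

iot = boto3.client("iot")

# Hypothetical rule: filter/transform sensor messages and store them in DynamoDB.
iot.create_topic_rule(
    ruleName="store_sensor_readings",
    topicRulePayload={
        "sql": "SELECT topic(2) AS sensor_id, temperature, humidity, timestamp() AS ts "
               "FROM 'sensors/+/telemetry' WHERE temperature > 20",
        "actions": [{
            "dynamoDBv2": {
                "roleArn": "arn:aws:iam::123456789012:role/iot-dynamodb-role",
                "putItem": {"tableName": "sensor_timeseries"},
            }
        }],
    },
)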

QUESTION 75
You have large volumes of structured data in DynamoDB that you want to persist and query using
standard SQL and your existing BI tools. What solution should you use? Choose the correct
answer:

A. Use the COPY command to load data in parallel directly to EMR from DynamoDB
B. Use the INSERT command to load data in parallel directly to Redshift from DynamoDB
C. Use the INSERT command to load data in parallel directly to EMR from DynamoDB
D. Use the COPY command to load data in parallel directly to Redshift from DynamoDB

Answer: D
Explanation:
Amazon Redshift is ideal for large volumes of structured data that you want to persist and query
using standard SQL and your existing BI tools. You can use our COPY command to load data in
parallel directly to Amazon Redshift from Amazon EMR, Amazon DynamoDB, or any SSH-
enabled host.
Reference:
https://aws.amazon.com/redshift/faqs/
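
A minimal sketch of the COPY path described above, assuming an open psycopg2 connection and made-up table and role names; READRATIO caps how much of the DynamoDB table's provisioned read throughput the load may consume.

# conn: an open psycopg2 connection to the Redshift cluster (see the earlier sketch).
copy_sql = """
COPY product_catalog
FROM 'dynamodb://ProductCatalog'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
READRATIO 50;
"""
with conn.cursor() as cur:
    cur.execute(copy_sql)
conn.commit()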

QUESTION 76
Your mobile application uses a DynamoDB backend to log data. The table has 3GB of data
already in it. The primary key/index is on the device ID of the mobile phone. The application also
logs the location of the mobile phone. A new marketing campaign requires a quick lookup for all
the phones in a particular area. Also, you have checked CloudWatch and you are using 90% of the
provisioned RCUs and WCUs. How do you make sure you can support the new campaign without
any downtime?
Choose the 3 correct answers:

A. Create a LSI on location


B. Increase the RCUs
C. Increase the WCUs
D. Create a GSI on location

Answer: BCD
Explanation:
Creating a GSI on location enables the quick lookup by area, and because the table is already at 90% of its provisioned throughput, the RCUs and WCUs must also be increased to absorb the additional load.

QUESTION 77
Your company releases new features with high frequency while demanding high application
availability. As part of the application's A/B testing, logs from each updated Amazon EC2 instance
of the application need to be analyzed in near real-time, to ensure that the application is working
flawlessly after each deployment. If the logs show any anomalous behavior, then the application
version of the instance is changed to a more stable one. Which of the following methods should
you use for shipping and analyzing the logs in a highly available manner? Choose the correct
answer:

A. Ship the logs to Amazon S3 for durability and use Amazon EMR to analyze the logs in a batch
manner each hour
B. Ship the logs to Amazon CloudWatch Logs and use Amazon EMR to analyze the logs in a batch
manner each hour
C. Ship the logs to an Amazon Kinesis stream and have the consumers analyze the logs in a live
manner
D. Ship the logs to a large Amazon EC2 instance and analyze the logs in a live manner

Answer: C

QUESTION 78
Which tool provides the easiest way to run ad-hoc queries for data in S3 without the need to set
up or manage any servers?
Choose the correct answer:

A. Redshift
B. Athena
C. EMR
D. SQS

Answer: B
Explanation:
Query services like Amazon Athena, data warehouses like Amazon Redshift, and sophisticated
data processing frameworks like Amazon EMR, all address different needs and use cases. You
just need to choose the right tool for the job. Amazon Redshift provides the fastest query
performance for enterprise reporting and business intelligence workloads, particularly those
involving extremely complex SQL with multiple joins and sub-queries. Amazon EMR makes it
simple and cost-effective to run highly distributed processing frameworks such as Hadoop, Spark,
and Presto when compared to on-premises deployments. Amazon EMR is flexible: you can run
custom applications and code, and define specific compute, memory, storage, and application
parameters to optimize your analytic requirements. Amazon Athena provides the easiest way to
run ad-hoc queries for data in S3 without the need to set up or manage any servers.
Reference:
https://aws.amazon.com/athena/faqs/

QUESTION 79
You have a customer-facing application running on multiple M3 instances in two AZs. These
instances are in an Auto Scaling group configured to scale up when load increases. After taking a
look at your CloudWatch metrics, you realize that during specific times every single day, the Auto
Scaling group has a lot more instances than it normally does. Despite this, one of your customers
is complaining that the application is very slow to respond during those time periods, every day.
The application is reading and writing to a DynamoDB table which has 400 Write Capacity Units
and 400 Read Capacity Units. The primary key is the company ID, and the table is storing roughly
20 TB of data. Which solution would solve the issue in a scalable and cost-effective manner?
Choose the correct answer:

A. Add a caching layer in front of the web application with ElastiCache, Memcached, or Redis
B. Double the number of read and write capacity units, because the DynamoDB table is being
throttled when customers from the same company all use the table at the same time
C. Use data pipelines to migrate your DynamoDB table to a new DynamoDB table with a different
primary key that evenly distributes the dataset across the table
D. DynamoDB is not a good solution for this use case. Instead, create a data pipeline to move data
from DynamoDB to Amazon RDS, which is more suitable for this

Answer: C
Explanation:
The company ID is a poor partition key because all requests from the same large customer hit the same partition; migrating to a key that distributes the data evenly lets DynamoDB use its partitions, and the provisioned throughput, efficiently.

QUESTION 80
You need to create a recommendations engine for your e-commerce website that sells over 300
items. The items never change and the new users need to be presented with the list of all 300
items in order of their interest. Which option do you use to accomplish this? Choose the 2 correct
answers:

A. RDS MySQL
B. Mahout
C. Spark/SparkMLlib
D. Amazon Machine Learning

Answer: BC
Explanation:
Amazon Machine Learning's multiclass (categorical) models are limited to 100 classes, so it cannot rank all 300 items; Mahout or Spark MLlib on EMR can build such a recommendation model.

QUESTION 81
A large stuffed animal maker is growing rapidly and constantly adding new animals to their
product line. The COO wants to know customers' reaction to each new animal and wants to
ensure that their customers are enjoying the products and use this information for future product
ideas. The social media manager is tasked with capturing the feedback. After the data is
ingested, the company wants to be able to analyze and classify the data in a cost-effective and
practical way. How do you do this? Choose two.
Choose the 2 correct answers:

A. Use Redshift for processing and normalizing the data and requesting predictions from Amazon
ML. Data is sent to Amazon SNS and delivered via email for further investigation.
B. Use EMR and copy the raw data to Amazon S3. For long term analysis and historical reference,
raw data is stored into Amazon S3.
C. Create Amazon Kinesis Streams and use one Kinesis stream to copy the raw data to Amazon S3.
For long term analysis and historical reference, raw data is stored into Amazon S3.
D. Use Lambda for processing and normalizing the data and requesting predictions from Amazon
ML. Data is sent to Amazon SNS using Lambda, and delivered via email for further investigation.

Answer: CD
Explanation:
By using a combination of Amazon Kinesis Streams, Lambda, Amazon ML, and Amazon SES,
we have created a scalable and easily customizable social listening platform. It is important to
note that this architecture does not include the step of creating the ML model itself. That step is
done at least once, and is usually repeated on a regular basis to keep the model up to date. The
frequency of creating a new model depends on the workload and is really only done to make the model
more accurate when things change.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 82
Your application needs to support terabyte-scale ingestion of data. Which big data tool can you
use?
Choose the correct answer:

A. Amazon Redshift
B. Amazon EMR with Spark
C. Amazon Data Pipeline
D. Amazon Machine Learning

Answer: B
Explanation:
While Amazon ML can support up to 100 GB of data, terabyte-scale ingestion of data is not
currently supported. Using Amazon EMR to run Spark's Machine Learning Library (MLlib) is a
common tool for such a use case.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 83
You have a Redshift cluster with 10 ds2.8xlarge. What is the most efficient way to load a file that
has 5GB of data?
Choose the correct answer:

A. Split the file into 80 equal sized files and use batch SQL insert statements
B. Split the file into 64 equal sized files and use COPY
C. Split the file into 320 equal sized files and use S3DistCp
D. Split the file into 320 equal sized files and use COPY

Answer: D
Explanation:
Ten ds2.8xlarge nodes provide 320 slices in total, and splitting the file into 320 equal parts lets every slice load data in parallel with the COPY command.

QUESTION 84
You have 30 GB of data that needs to be loaded into Redshift. Which of the following will speed
up the data ingestion?
Choose the 2 correct answers:

A. Copy the data to S3 and use COPY to move the data into Redshift
B. Sort the files on the sort key prior to loading
C. Use S3DistCp
D. Use multiple COPY commands to load the data in parallel

Answer: AB
Explanation:
S3 COPY is the fastest loading mechanism; sorted files do not require Redshift to sort data based
on the sort key at load time.
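
A short sketch of the S3-based load, again assuming an open psycopg2 connection and invented names: because the key prefix matches every split file, Redshift loads them all in parallel across the slices, and files pre-sorted on the sort key avoid a sort at load time.

# conn: an open psycopg2 connection to the Redshift cluster (see the earlier sketch).
# All objects under the prefix (e.g. part_000 ... part_031) are loaded in parallel.
copy_sql = """
COPY sales_facts
FROM 's3://example-load-bucket/sales/part_'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role';
"""
with conn.cursor() as cur:
    cur.execute(copy_sql)
conn.commit()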

QUESTION 85
You work for a startup that tracks commercial delivery airplanes via GPS. Coordinates are
transmitted from each airplane once every 6 seconds. You need to process these
coordinates in real time from multiple sources. Which tool should you use to ingest the data?
Choose the correct answer:

A. Amazon Kinesis
B. Amazon EMR
C. Amazon SQS
D. AWS Data Pipeline

Answer: A
Explanation:
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture,
transform, and load streaming data into Amazon S3, Amazon Redshift, and Amazon
Elasticsearch Service, enabling near real-time analytics with existing business intelligence tools
and dashboards
Reference:
https://aws.amazon.com/kinesis/firehose/faqs/

QUESTION 86
Does AWS Direct Connect allow you access to all Availability Zones within a region? Choose
the correct answer:

A. Only two Availability Zones per region


B. Yes
C. No
D. Sometimes, depending on the region

Answer: B
Explanation:
Each AWS Direct Connect location enables connectivity to all Availability Zones within the
geographically nearest AWS region.
Reference:
https://aws.amazon.com/directconnect/faqs/

QUESTION 87
You need to access one of your Amazon Redshift compute nodes directly. How can you do this?
Choose the correct answer:

A. Change the security settings in the VPC for Redshift


B. It is not possible to access a compute node directly
C. SSH into the EC2 instance hosting Redshift
D. Use the Amazon Redshift APIs

Answer: B
Explanation:
Amazon Redshift compute nodes are in a private network space and can only be accessed from
your data warehouse cluster's leader node. This provides an additional layer of security for your
data.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 88
Your EMR cluster uses twelve m4.large instances and runs 24 hours per day, but it is only used
for processing and reporting during business hours. Which options can you use to reduce costs?
Choose the 2 correct answers:

A. Run twelve d2.8xlarge instances instead, without turning them off


B. Use a MapR distribution of EMR
C. Migrate the data from HDFS to S3 using S3DistCp and turn off the cluster when not in use
D. Use Spot instances for tasks nodes when needed

Answer: CD
Explanation:
Storing the data in S3 with EMRFS allows for transient clusters that can be shut down outside business hours, and Spot instances for task nodes reduce costs further.

QUESTION 89
Which DynamoDB index can be modified after the table is created? Choose the correct answer:

A. Primary hash key


B. GSI
C. LSI
D. None of the above

Answer: B

QUESTION 90
Your company recently purchased five different companies that run different backend databases,
including Redshift, MySQL, Hive on EMR, and PostgreSQL. You need a single tool that can run
queries on all the different platforms for your daily ad-hoc analysis. Which tool enables you to do
that?
Choose the correct answer:

A. Ganglia
B. QuickSight
C. Presto
D. YARN

Answer: C
Explanation:
A single Presto query can process data from multiple sources.

QUESTION 91
Your application requires real-time streaming of data. Each record is 500 KB. It is of utmost
importance that the data is delivered and processed as it comes in record-by-record with minimal
delay. Which solution allows you to do that? Choose the correct answer:

A. SQS
B. Spark Steaming
C. RDS
D. Kinesis Stream

Answer: D
Explanation:
SQS cannot handle the 500 KB payload size, Spark Streaming processes records in micro-batches rather than record-by-record, and RDS is not a streaming solution.

QUESTION 92
Your company deployed 100 sensors to measure traffic speeds on various highways that
generated about 4 GB of data per month. The initial architecture used 400 GB RDS with EC2
instances. Over the next 3 months, there will be an additional 100,000 sensors added. You need
to retain the data for at least 2 years for trends reporting. Which is the best solution to accomplish
this?
Choose the correct answer:

A. Replace the RDS instance with a 6 node Redshift cluster with 96 TB of storage
B. Write data from the sensors into a SQS queue and then write into RDS
C. Write data from the sensors into a DynamoDB table and move old data to a Redshift cluster
D. Keep the current architecture but upgrade RDS storage to 3 TB and 10 K provisioned IOPS

Answer: A
Explanation:
Redshift is the only option that has the storage space and performance.

QUESTION 93
You need a fast, fully-managed, petabyte-scale data warehouse that makes it simple and cost-
effective to analyze all your data using your existing business intelligence tools. Which Big Data
tool should you use?
Choose the correct answer:

A. Kinesis
B. Data Pipeline
C. Redshift
D. EMR

Answer: C
Explanation:
Amazon Redshift is ideal for large volumes of structured data that you want to persist and query
using standard SQL and your existing BI tools. Amazon EMR is ideal for processing and
transforming unstructured or semi-structured data to bring in to Amazon Redshift and is also a
much better option for data sets that are relatively transitory, not stored for long-term use.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 94
You are designing the system backend for a big box store. Your data is going to have logs from
every part of the business. The starting size of the data is 20 PB and spans about 8 years, and
new data is going to be added as you go along. The logs do not conform to any specific schema.
Which setup is most desirable for this scenario?
Choose the correct answer:

A. Move the data into S3 and load in RDS using COPY


B. Use Kinesis Firehose to move the data into S3 and load in Redshift using COPY
C. Move the data into S3 and load in Redshift using COPY
D. Use Flume to migrate the data into HDFS and Hive/HiveQL on top of Hadoop to query the data

Answer: D
Explanation:
Hive uses "Schema on Read" and thus, unlike Redshift, the schema does not have to be
predefined.

QUESTION 95
You have advertising campaign information stored in a DynamoDB table. You need to write
queries that join clickstream data to identify the most effective categories of ads that are
displayed on websites. Which Big Data tools should you use? Choose two.
Choose the 2 correct answers:

A. Quicksight
B. Data Pipeline
C. EMR
D. Kinesis

Answer: CD
Explanation:
Amazon EMR clusters can read and process Amazon Kinesis streams directly, using familiar
tools in the Hadoop ecosystem such as Hive, Pig, MapReduce, the Hadoop Streaming API, and
Cascading. You can also join real-time data from Amazon Kinesis with existing data on Amazon
S3, Amazon DynamoDB, and HDFS in a running cluster. You can directly load the data from
Amazon EMR to Amazon S3 or DynamoDB for post-processing activities.
Reference:
http://docs.aws.amazon.com/emr/latest/DeveloperGuide/emr-kinesis.html

QUESTION 96
Which tool allows you to search through CloudWatch logs? Choose the correct answer:

A. Elastic Map Reduce


B. Elastic Cache
C. Elastic Cloud Computing
D. ElasticSearch

Answer: D
Explanation:
CloudWatch Logs integrates natively with Amazon Elasticsearch Service: log data can be streamed to an Elasticsearch domain for indexing and searching.

QUESTION 97
Your application generates a 1 KB JSON payload that needs to be queued and delivered to EC2
instances for applications. At the end of the day, the application needs to replay the data for the
past 24 hours. Which is the best solution for this? Choose the correct answer:

A. Kinesis Firehose
B. SQS
C. SNS
D. Kinesis

Answer: D
Explanation:
Kinesis Streams retains records (24 hours by default), so the day's data can be replayed, and the KCL makes it easy to deliver the records to consumers on the EC2 instances.

QUESTION 98
You have 30 customers, each of whom have a dedicated Kinesis Stream for streaming events.
What action can you take so that the Kinesis charges are separated out on the Amazon invoice at
the end of the month?
Choose the correct answer:

A. Move each customer into a separate AWS account and use consolidated billing
B. Enable CloudWatch to monitor the streams
C. Call Amazon to do that for you
D. Tag the streams with the name of the customer

Answer: D
Explanation:
Consolidated billing is limited to 20 linked accounts

QUESTION 99
Your EMR job processes a single 2 TB data file stored on Amazon Simple Storage Service (S3).
The EMR job runs on two On-Demand core nodes and three On-Demand task nodes. What can
you do to reduce the time it takes for the job to complete? Choose the correct answer:

A. Use versioning on the S3 bucket.


B. Use a VPC to launch the core and task nodes
C. Use spot instances instead of on-demand for the core nodes
D. Adjust the number of simultaneous mapper tasks.

Answer: D
Explanation:
When your cluster runs, Hadoop creates a number of map and reduce tasks. These determine
the number of tasks that can run simultaneously during your cluster. Run too few tasks and you
have nodes sitting idle; run too many and there is significant framework overhead.
Reference:
http://docs.aws.amazon.com/emr/latest/DeveloperGuide/TaskConfiguration_H1.0.3.html

QUESTION 100
Your items are 1.5 KB in size and you want to write 20 items per second. How many WCUs do
you need?
Choose the correct answer:

A. 40
B. 20
C. 10
D. 80

Answer: A
Explanation:
Each write capacity unit covers one write per second for an item up to 1 KB. A 1.5 KB item is rounded up to 2 KB, so each write consumes 2 WCUs; 2 WCUs x 20 items per second = 40 WCUs.
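
The same arithmetic as a tiny Python check (1 KB per write capacity unit is the standard DynamoDB figure):

import math

item_size_kb = 1.5
writes_per_second = 20

# Each WCU covers one write per second of an item up to 1 KB, so round the size up.
wcu_per_write = math.ceil(item_size_kb / 1.0)
total_wcu = wcu_per_write * writes_per_second
print(total_wcu)  # -> 40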

QUESTION 101
You need to perform ad-hoc SQL queries on well-structured data. Data comes in constantly at a
high velocity. Which solution should you use?
Choose the correct answer:

A. Kinesis Firehose and RDS
B. Kinesis Firehose and RedShift
C. EMR using Hive
D. EMR running Apache Spark

Answer: B
Explanation:
Amazon Kinesis Firehose is the easiest way to ingest streaming data into Amazon Redshift.
Amazon Kinesis Firehose automatically batches and compresses data before loading it into
Amazon Redshift and makes the data available for processing and analytics.
Reference:
https://aws.amazon.com/kinesis/firehose/details/

QUESTION 102
You are tasked to create a stream processing application. There will be multiple streams created
every hour sending data. The data will be fed into EMR for nightly processing and reporting. On
average, the record size will exceed 2.1 MB per record. Which of the following stream processing
frameworks can best process the streams in real-time? Choose the correct answer:

A. Kafka
B. Kinesis Streams
C. Kinesis Firehose
D. SQS

Answer: A
Explanation:
Kinesis gets ruled out since it can only do 1MB per record. Kafka can support bigger records.

QUESTION 103
You work for an online education company. You need to analyze click data from test takers. You
need to know which questions were skipped and which questions customers chose to save for
later. The data will be used in real time to modify the question sequence as students click through
the test. Which option meets the requirements for capturing and analyzing this data? Choose the
correct answer:

A. Push web clicks by session to Amazon Kinesis and analyze behavior with the Kinesis Client
Library to instantiate Kinesis workers
B. Utilize Amazon Redshift and then analyze with Data Pipeline
C. Push to Amazon SQS queue and send events to Amazon RDS and analyze withSQL
D. Utilize Amazon S3 to log clicks and then analyze with EMR

Answer: A
Explanation:
The Kinesis Client Library acts as an intermediary between your record processing logic and
Streams. When you start a KCL application, it calls the KCL to instantiate a worker. This call
provides the KCL with configuration information for the application, such as the stream name and
AWS credentials. The KCL performs the following tasks: connects to the stream; enumerates the
shards; coordinates shard associations with other workers (if any); instantiates a record processor
for every shard it manages; pulls data records from the stream; pushes the records to the
corresponding record processors; checkpoints processed records; balances shard-worker
associations when the worker instance count changes; and balances shard-worker associations
when shards are split or merged.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-kcl.html
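
For illustration only, here is a much-simplified boto3 polling loop for a single shard; it is not the KCL itself (the KCL adds workers, shard load balancing, and checkpointing on top of this), and the stream name is hypothetical.

import time
import boto3

kinesis = boto3.client("kinesis")
stream = "exam-clickstream"  # hypothetical stream name

shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in out["Records"]:
        print(record["Data"])          # hand the record to your processing logic
    iterator = out["NextShardIterator"]
    time.sleep(1)                      # simple pacing; the KCL manages this for you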

QUESTION 104
You have a system that predicts mean-time-before-failure. You have sensors that monitor a fleet
of cranes on a ship. The sensors send back various attributes and a user labels the data if there
is a failure. What is the simplest solution to accomplish this? Choose the correct answer:

A. Use KCL to write the sensor data into DynamoDB


B. Stream data in Firehose and S3 and process over night using EMR
C. Connect the sensors to the AWS IoT platform and use predict() to find out the cranes that are
going to fail soon
D. Write a custom script that is embedded into sensors, stream the data using a KCL into Firehose
and write into S3. Then use Amazon Machine Learning to model and predict the failures

Answer: C

QUESTION 105
You have created a DynamoDB table with CustomerID as the primary partition key for the table.
You need to find all customers that live in a particular ZipCode. How should you configure this?
Choose the correct answer:

A. Change the primary partition key to ZipCode and use CustomerID as the global secondary index.
B. Use ZipCode as the partition key for a local secondary index, since there are a lot of zip codes
and since you will probably have a lot of customers and change CustomerID to the global
secondary index.
C. Use ZipCode as the partition key for a global secondary index, since there are a lot of zip codes
and since you will probably have a lot of customers.
D. Use ZipCode as the partition key for a local secondary index, since there are a lot of zip codes
and since you will probably have a lot of customers.

Answer: C
Explanation:
Global secondary indexes are particularly useful for tracking relationships between attributes that
have a lot of different values. For example, you could create a DynamoDB table with CustomerID
as the primary partition key for the table and ZipCode as the partition key for a global secondary
index, since there are a lot of zip codes and since you will probably have a lot of customers.
Using the primary key, you could quickly get the record for any customer. Using the global
secondary index, you could efficiently query for all customers that live in a given zip code.
Reference:
https://aws.amazon.com/dynamodb/faqs/
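
A hedged boto3 sketch of adding such a global secondary index to an existing table; the table, index, and attribute names are hypothetical.

import boto3

dynamodb = boto3.client("dynamodb")

# Add a GSI keyed on ZipCode so all customers in a given zip code can be queried efficiently.
dynamodb.update_table(
    TableName="Customers",
    AttributeDefinitions=[{"AttributeName": "ZipCode", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "ZipCode-index",
            "KeySchema": [{"AttributeName": "ZipCode", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)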

QUESTION 106
You are using a MapReduce job to analyze activations of an item you sell. The job only needs to
run for 2 hours or less per day. The number of activations is usually steady throughout the
year, except the week before Christmas there is a 20X increase. What is the most cost-effective
and performance optimized solution? Choose the correct answer:

A. Amazon RDS and Amazon Elastic MapReduce with Spot instances.
B. Amazon DynamoDB and Amazon Elastic MapReduce with Reserved instances
C. Amazon DynamoDB and Amazon Elastic MapReduce with Spot instances.
D. Amazon RDS and Amazon Elastic MapReduce with Reserved instances.

Answer: C
Explanation:
DynamoDB is a fully managed NoSQL database service that provides fast and predictable
performance with seamless scalability. Spot instances allow you to bid on spare Amazon EC2
computing capacity. Since Spot instances are often available at a discount compared to On-
Demand pricing, you can significantly reduce the cost of running your applications, grow your
application's compute capacity and throughput for the same budget, and enable new types of
cloud computing applications.

QUESTION 107
A utility company is building an application that stores data coming from more than 10,000
sensors. Each sensor has a unique ID and will send a datapoint (approximately 1KB) every 10
minutes throughout the day. Each datapoint contains the information coming from the sensor as
well as a timestamp. This company would like to query information coming from a particular
sensor for the past week very rapidly and wants to delete all the data that is older than 4 weeks.
Using Amazon DynamoDB for its scalability and rapidity, how do you implement this in the most
cost-effective way?
Choose the correct answer:

A. One table for each week with a primary key that is the concatenation of the sensor ID and
timestamp
B. One table, with a primary key that is the sensor ID and a hash key that is the timestamp
C. One table with a primary key that is the concatenation of the sensor ID and timestamp
D. One table for each week with a primary key that is the sensor ID and a hash key that is the
timestamp

Answer: A
Explanation:
A composite key made from the sensor ID and timestamp supports fast queries for a single sensor over a time range, and keeping one table per week makes it cheap to delete data older than four weeks by simply dropping the oldest table.

QUESTION 108
You have a Redshift table called 'item_description' that contains 3MB of data and is frequently
used in joins. What changes to the DISTSTYLE for the table will speed up the queries? Choose
the correct answer:

A. Change the DISTSTYLE to EVEN


B. Change the DISTSTYLE to PARTITION
C. Change the DISTSTYLE to ALL
D. Change the DISTSTYLE to KEY

Answer: C
Explanation:
DISTSTYLE ALL places a full copy of the table data onto the first slice of every node in the cluster, which avoids redistributing rows during joins; a 3 MB table is small enough to do that.
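
As a small sketch (the column list is invented and an open psycopg2 connection is assumed), the dimension table could simply be created with DISTSTYLE ALL so every node holds a local copy for joins:

# conn: an open psycopg2 connection to the Redshift cluster (see the earlier sketch).
ddl = """
CREATE TABLE item_description (
    item_id     INTEGER NOT NULL,
    description VARCHAR(512)
)
DISTSTYLE ALL;
"""
with conn.cursor() as cur:
    cur.execute(ddl)
conn.commit()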

QUESTION 109
What are some of the benefits of running Spark vs MapReduce? Choose the 2 correct answers:

A. No interactive query support


B. Machine learning and streaming libraries included
C. Faster than MapReduce using in-memory processing
D. Slower for iterative processing

Answer: BC

QUESTION 110
You need to be able to access resources in S3 and then write data to tables in S3. You also need
to be able to load table partitions automatically from Amazon S3. Which Big Data tool enables
you to do so?
Choose the correct answer:

A. Redshift and Athena


B. EMR and Pig
C. EMR and Hive
D. Redshift and SQL

Answer: C
Explanation:
Hive allows user extensions via user-defined functions written in Java. Amazon EMR has made
numerous improvements to Hive, including direct integration with DynamoDB and Amazon S3.
For example, with Amazon EMR you can load table partitions automatically from Amazon S3, you
can write data to tables in Amazon S3 without using temporary files, and you can access
resources in Amazon S3, such as scripts for custom map and/or reduce operations and additional
libraries
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 111
You have to do a security audit on an EMR cluster. Which of the following do you have to make
sure are enabled?
Choose the 3 correct answers:

A. Hive encryption
B. EC2 instances using encrypted EBS volumes
C. In-transit data encryption between S3 and EMRFS
D. Server side encryption on S3 using S3-SSE/KMS/Custom

Answer: BCD

QUESTION 112
Your application has large volumes of structured data that you want to persist and query using
standard SQL and your existing BI tools. Which Big Data tool should you use? Choose the
correct answer:

A. Kinesis
B. Redshift
C. EMR
D. Data Pipeline

Answer: B
Explanation:
Amazon Redshift is ideal for large volumes of structured data that you want to persist and query
using standard SQL and your existing BI tools. Amazon EMR is ideal for processing and
transforming unstructured or semi-structured data to bring in to Amazon Redshift and is also a
much better option for data sets that are relatively transitory, not stored for long-term use.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 113
What are the options to authenticate an IoT thing? Choose the 3 correct answers:

A. IAM users, groups, and roles


B. Amazon Cognito identities
C. X.509 certificates
D. KMS

Answer: ABC

QUESTION 114
You need real-time reporting on logs that are being generated from your applications. In addition,
you need anomaly detection. The processing latency needs to be one second or less. Which
option would you choose?
Choose the correct answer:

A. Kafka
B. Spark Streaming with SparkSQL and MLlib
C. Kinesis Streams with Kinesis Analytics
D. Firehose to S3 and Athena

Answer: C
Explanation:
Kinesis Streams with Kinesis Analytics provides the lowest processing latency of these options and supports anomaly detection directly on the stream.

QUESTION 115
What options can you enable on a KPL (producer) application to reduce your Kinesis costs? Choose the 3
correct answers:

A. Compression
B. Aggregation
C. Batching
D. Collection

Answer: BCD

QUESTION 116
You are using QuickSight to identify demand vs supply trends over multiple months. Which type
of visualization do you choose?
Choose the correct answer:

A. Line Chart
B. Bar Charts
C. Pivot Table
D. Scatter Plot

Answer: A
Explanation:
Line charts show the individual values of a set of measures or dimensions against the range
displayed by the Y-axis. Area line charts differ from regular line charts in that each value is
represented by a colored area of the chart instead of just a line to make it easier to evaluate item
values relative to each other.

QUESTION 117
You have a lot of data in your on-premises data warehouse that you need to load into Amazon
Redshift. How do you load this data as quickly as possible? Choose the 2 correct
answers:

A. Snowball
B. Data Pipeline
C. Import/Export
D. Direct Connect

Answer: CD
Explanation:
You can use AWS Import/Export to transfer the data to Amazon S3 using portable storage
devices. In addition, you can use AWS Direct Connect to establish a private network connection
between your network or datacenter and AWS. You can choose 1Gbit/sec or 10Gbit/sec
connection ports to transfer your data.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 118
You have 3 Kinesis KCL applications that are reading a Kinesis stream and are falling behind.
CloudWatch is emitting 'ProvisionedThroughputExceededException' errors on your stream. What
corrective action do you need to take to make sure you can use at least 3 KCL applications?
Choose the correct answer:

A. Add more shards to your Kinesis Stream


B. Convert the Kinesis stream to Kinesis Firehose
C. Add more KCL applications
D. Pick a different partition key for your stream

Answer: A
Explanation:
You are running out of throughput. Adding shards will increase your stream's throughput.
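
A minimal boto3 sketch of resharding, with a hypothetical stream name and target count; more shards mean more throughput (each shard supports up to 5 read transactions and 2 MB per second of reads), so the three KCL applications stop hitting the limit.

import boto3

kinesis = boto3.client("kinesis")

# Raise the shard count to increase the stream's read (and write) throughput.
kinesis.update_shard_count(
    StreamName="app-log-stream",     # hypothetical stream name
    TargetShardCount=8,
    ScalingType="UNIFORM_SCALING",
)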

QUESTION 119
You need to analyze a large set of data updates from Kinesis and DynamoDB. Which Big Data
tool should you use?
Choose the correct answer:

A. Elasticsearch
B. Quicksight
C. EMR
D. Redshift

Answer: A
Explanation:
Amazon ES is ideal for querying and searching large amounts of data. Organizations can use
Amazon ES to do the following: Analyze activity logs, such as logs for customer-facing
applications or websites; analyze CloudWatch logs with Elasticsearch; analyze product usage
data coming from various services and systems; analyze social media sentiments and CRM data,
and find trends for brands and products; analyze data stream updates from other AWS services,
such as Amazon Kinesis Streams and DynamoDB; provide customers with a rich search and
navigation experience; monitor usage for mobile applications.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 120
You have an EMR cluster running that uses EMRFS on S3 for the data. The raw data on S3 is
constantly changing. How can you ensure the updated data is reflected in EMRFS? Choose the
correct answer:

A. Run an UPDATE query using HIVE


B. Enable consistent view
C. Move the data to EBS
D. Send a RELOAD to the EMR cluster

Answer: B

QUESTION 121
You work for a photo processing startup and need the ability to change an image from color to
grayscale after it has been uploaded to Amazon S3. How can you configure this in AWS? Choose
the correct answer:

A. Real-time file processing: You can trigger Lambda to invoke a process where a file has been
uploaded to Amazon S3 or modified.
B. Real-time file processing: You can trigger EMR to invoke a process where a file has been
uploaded to Amazon S3 or modified.
C. Log and data feed intake and processing: With Amazon Kinesis Streams, you can have
producers push changes directly into an Amazon Kinesis stream
D. Forecast product demand: Use Amazon Machine Learning to track color information to predict
future changes.

Answer: A
Explanation:
Lambda enables you to execute code in response to triggers such as changes in data, shifts in
system state, or actions by users. Lambda can be directly triggered by AWS services such as
Amazon S3, DynamoDB, Amazon Kinesis Streams, Amazon Simple Notification Service
(Amazon SNS), and Amazon CloudWatch, allowing you to build a variety of real-time data
processing systems. Real-time file processing: You can trigger Lambda to invoke a process
where a file has been uploaded to Amazon S3 or modified. For example, to change an image
from color to grayscale after it has been uploaded to Amazon S3.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
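
A hedged sketch of such a Lambda function, assuming the Pillow imaging library is packaged with the deployment and the function is subscribed to the bucket's ObjectCreated events; the handler name and bucket names are illustrative.

import io
import boto3
from PIL import Image

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 ObjectCreated event; convert the new image to grayscale.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        grayscale = Image.open(io.BytesIO(original)).convert("L")

        buffer = io.BytesIO()
        grayscale.save(buffer, format="PNG")
        s3.put_object(Bucket="example-processed-images", Key=key, Body=buffer.getvalue())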

QUESTION 122
You need to leverage Amazon Simple Storage Service (S3) and Amazon Glacier for backups
using third-party software. How can you ensure the credentials provisioned for the third-party
software only limit access to the "backups-17" folder? Choose the correct answer:

A. A custom IAM user policy limited to the Amazon S3 API for the Amazon Glacier archive "backups-
17"
B. A custom bucket policy limited to the Amazon S3 API in "backups-17"
C. A custom IAM user policy limited to the Amazon S3 API in "backups-17"
D. A custom bucket policy limited to the Amazon S3 API in three Amazon Glacier archive "backups-
17"

Answer: D

QUESTION 123
Which tool provides the fastest query performance for enterprise reporting and business
intelligence workloads, particularly those involving extremely complex SQL with multiple joins and
sub-queries?
Choose the correct answer:

A. Athena
B. EMR
C. SQS
D. Redshift

Answer: D
Explanation:
Query services like Amazon Athena, data warehouses like Amazon Redshift, and sophisticated
data processing frameworks like Amazon EMR, all address different needs and use cases. You
just need to choose the right tool for the job. Amazon Redshift provides the fastest query
performance for enterprise reporting and business intelligence workloads, particularly those
involving extremely complex SQL with multiple joins and sub-queries. Amazon EMR makes it
simple and cost-effective to run highly distributed processing frameworks such as Hadoop, Spark,
and Presto when compared to on-premises deployments. Amazon EMR is flexible - you can run
custom applications and code and define specific compute, memory, storage, and application
parameters to optimize your analytic requirements. Amazon Athena provides the easiest way to
run ad-hoc queries for data in S3 without the need to setup or manage any servers
Reference:
https://aws.amazon.com/athena/faqs/

QUESTION 124
You need to provide customers with a rich search and navigation experience and monitor usage for mobile
applications. You also need to analyze data from Kinesis Streams. Which tool should
you use?
Choose the correct answer:

A. Elasticsearch
B. Redshift
C. Quicksight
D. EMR

Answer: A
Explanation:
Amazon ES is ideal for querying and searching large amounts of data. Organizations can use
Amazon ES to do the following: Analyze activity logs, such as logs for customer-facing
applications or websites; analyze CloudWatch logs with Elasticsearch; analyze product usage
data coming from various services and systems; analyze social media sentiments and CRM data,
and find trends for brands and products; analyze data stream updates from other AWS services,
such as Amazon Kinesis Streams and DynamoDB; provide customers with a rich search and
navigation experience; monitor usage for mobile applications
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf

QUESTION 125
Which single action can speed up this query: " SELECT count(*) FROM transactions WHERE
date_of_purchase BETWEEN '2017-04-01' AND '2017-05-01' " that runs on a table with 10 million
rows.
Choose the correct answer:

A. Create a sort key on the column date_of_purchase


B. Use date_of_purchase as the DISTKEY
C. Use LZO compression on the date_of_purchase column
D. Use date_of_purchase as the PARTITION KEY

Answer: A
Explanation:
A sort key is the most effective way to retrieve data when a range of values is specified.
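
A minimal DDL sketch (the column list is invented and an open psycopg2 connection is assumed) showing the sort key that lets Redshift skip blocks outside the requested date range:

# conn: an open psycopg2 connection to the Redshift cluster (see the earlier sketch).
ddl = """
CREATE TABLE transactions (
    transaction_id   BIGINT NOT NULL,
    customer_id      BIGINT,
    amount           DECIMAL(12,2),
    date_of_purchase DATE
)
SORTKEY (date_of_purchase);
"""
with conn.cursor() as cur:
    cur.execute(ddl)
conn.commit()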

QUESTION 126
You need a secure, dedicated connection from your data center to AWS so you can use
additional compute resources (EC2) without using the public internet. Which is your best option?
Choose the correct answer:

A. An Amazon Dedicated Connection.


B. An encrypted tunnel to VPC
C. Direct Connect
D. None of the above; AWS requires you to connect over the public internet

Answer: C

QUESTION 127
You get daily dumps of transaction data into S3 which is batch processed into EMR on a nightly
basis. The size of the data spikes up and down regularly. What can be done to reduce the
processing time?
Choose the correct answer:

A. Add task nodes based on CPU metrics from Ganglia


B. Use Spot Instances for the core nodes
C. Add more core nodes to the cluster
D. Process the data in Redshift instead

Answer: A
Explanation:
Adding task nodes based on CPU metrics from Ganglia lets the cluster scale with the data spikes and improves processing speed.

QUESTION 128
You have to design an EMR system where you will be processing highly confidential data. What
can you do to ensure encryption of data at rest?
Choose the 2 correct answers:

A. TLS
B. VPN
C. LUKS
D. S3-SSE/KMS

Answer: CD
Explanation:
S3 server-side encryption with KMS (SSE-KMS) encrypts the S3 data at rest; Linux Unified Key Setup or LUKS is a disk encryption
specification

QUESTION 129
You currently have an on-premises Oracle database and have decided to leverage AWS and use
Aurora. You need to do this as quickly as possible. How do you achieve this? Choose the correct
answer:

A. Use AWS Direct Connect with AWS Data Migration Service and immediately migrate the
database schema, no provisioning of a target database is needed.
B. Use AWS Data Migration Services and create a target database, migrate the database schema,
set up the data replication process, initiate the full load and a subsequent change data capture
and apply, and conclude with a switchover of your production environment to the new database
once the target database is caught up with the source database.
C. Use AWS Data Pipeline and create a target database, migrate the database schema, set up the
data replication process, initiate the full load and a subsequent change data capture and apply,
and conclude with a switchover of your production environment to the new database once the
target database is caught up with the source database.
D. It is not possible to migrate an on premises database to AWS at this time.

Answer: B
Explanation:
AWS Database Migration Service supports both homogenous and heterogeneous data
replication. Supported database sources include: (1) Oracle, (2) SQL Server, (3) MySQL, (4)
Amazon Aurora (5) PostgreSQL and (6) SAP ASE. All sources are supported on-premises, in
EC2, and RDS except Amazon Aurora which is available only in RDS. RDS SQL Server is
supported in bulk extract mode only; the change data capture mode (CDC) is not yet supported.
CDC is supported for on-premises and EC2 SQL Server. Amazon Aurora is only available in
RDS. Supported database targets include: (1) Amazon Aurora, (2) Oracle, (3) SQL Server, (4)
MySQL, (5) PostgreSQL and (6) SAP ASE. All Oracle, SQL Server, MySQL and Postgres targets
are supported on-premises, in EC2 and RDS while SAP ASE is supported only in EC2. Either the
source or the target database (or both) need to reside in RDS or on EC2. Replication between
on-premises to on-premises databases is not supported. During a typical simple database
migration, you will create a target database, migrate the database schema, set up the data
replication process, initiate the full load and a subsequent change data capture and apply, and
conclude with a switchover of your production environment to the new database once the target
database is caught up with the source database.
Reference:
https://aws.amazon.com/dms/faqs/

QUESTION 130
Your company creates mobile games that use DynamoDB as the backend data store to save the
high scores. There is a hash+range key on the main table, where the game is the partition key
and the username is the sort key. Your highest selling game customers complain that your game
slows down to a halt when trying to send the high scores to your backend. CloudWatch metrics
suggest you are not exceeding your provisioned WCUs. Which options can improve the user
experience without increasing costs?
Choose the 2 correct answers:

A. Change the partition key to use just the game


B. Change the partition key to use the username as the partition key and the game as the sort key
C. Change the partition key to use the device ID as the partition key and the game name as the sort
key
D. Provision more WCUs

Answer: BC
Explanation:
You are getting hot partition keys: with the game as the partition key, all writes for the best-selling game land on the same partition. You need partition keys that spread the writes out, and both usernames and device IDs can do that without increasing the provisioned WCUs.

QUESTION 131
You have configured an application that batches up data on the servers before submitting it for
intake. Your front-end or application server failed and now you have lost log data. How can you
prevent this from occurring in the future?
Choose the correct answer:

A. Submit system and application logs to Amazon Kinesis Streams and access the stream for
processing within seconds
B. Submit system and application logs to Amazon EMR and access the data for processing within
seconds
C. Input historical log information using Amazon Machine Learning and use Redshift to analyze and
store the logs
D. Trigger Lambda to invoke a process where a log file has been uploaded to Amazon S3 or
modified

Answer: A
Explanation:
With Amazon Kinesis Streams, you can have producers push data directly into an Amazon
Kinesis stream. For example, you can submit system and application logs to Amazon Kinesis
Streams and access the stream for processing within seconds. This prevents the log data from
being lost if the front-end or application server fails, and reduces local log storage on the source.
Amazon Kinesis Streams provides accelerated data intake because you are not batching up the
data on the servers before you submit it for intake.
Reference:
https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
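
As a hedged illustration of this pattern, the following minimal Python (boto3) sketch shows a
producer pushing a log event straight into a stream rather than batching it on the server; the
stream name and log fields are hypothetical.

    # Minimal sketch: push one log event directly into a Kinesis stream.
    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    log_event = {"host": "web-01", "level": "ERROR", "message": "disk full"}

    response = kinesis.put_record(
        StreamName="application-logs",                 # hypothetical stream name
        Data=json.dumps(log_event).encode("utf-8"),
        PartitionKey=log_event["host"],                # spreads records across shards
    )
    print(response["ShardId"], response["SequenceNumber"])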

QUESTION 132
A partition key is used to group data by shard within a stream.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
A partition key is used to group data by shard within a stream. The Streams service segregates
the data records belonging to a stream into multiple shards, using the partition key associated
with each data record to determine which shard a given data record belongs to.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/key-concepts.html

QUESTION 133
A Kinesis stream's retention period is set to a default of ___ hours after creation.
Choose the correct answer:

A. 24
B. 2
C. 12
D. 72

Answer: A
Explanation:
An Amazon Kinesis stream stores records for 24 hours by default, and for up to 168 hours if extended retention is enabled.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/kinesis-extended-retention.html

QUESTION 134
Kinesis Streams can be transformed and processed using Lambda.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Records in a Kinesis stream can be processed and transformed with Lambda, either directly
through an event source mapping on the stream or via Kinesis Firehose data transformation.
When you enable Firehose data transformation, Firehose buffers incoming data up to 3 MB or the
buffering size you specified for the delivery stream, whichever is smaller. Firehose then invokes
the specified Lambda function with each buffered batch asynchronously. The transformed data is
sent from Lambda back to Firehose for buffering, and is delivered to the destination
when the specified buffering size or buffering interval is reached, whichever happens first.
Reference:
http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html

QUESTION 135
What is used to group data by shard within a stream? Choose the correct answer:

A. Hash Key
B. RecordId
C. Sequence Number
D. Partition Key

Answer: D
Explanation:
A partition key is used to group data by shard within a stream. The Streams service segregates
the data records belonging to a stream into multiple shards, using the partition key associated
with each data record to determine to which shard a given data record belongs. Partition keys are
Unicode strings with a maximum length limit of 256 bytes. An MD5 hash function is used to map
partition keys to 128-bit integer values and to map associated data records to shards. A partition
key is specified by the applications putting the data into a stream.
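
The mapping described above can be illustrated with a short Python sketch: the partition key is
hashed with MD5 to a 128-bit integer, and the shard whose hash key range contains that integer
receives the record. The shard IDs and ranges below are made up for illustration only.

    # Illustrative sketch of mapping a partition key to a shard via its MD5 hash.
    import hashlib

    def partition_key_to_int(partition_key: str) -> int:
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest, byteorder="big")   # 128-bit integer

    # Two hypothetical shards covering the full 0 .. 2**128 - 1 hash key space
    shards = [
        ("shardId-000000000000", 0, 2**127 - 1),
        ("shardId-000000000001", 2**127, 2**128 - 1),
    ]

    value = partition_key_to_int("user-42")
    for shard_id, start, end in shards:
        if start <= value <= end:
            print("record with key 'user-42' maps to", shard_id)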

QUESTION 136
What is the maximum data retention period for a Kinesis Stream? Choose the correct answer:

A. 1 hour
B. 1 day
C. 7 days
D. 14 days

Answer: C
Explanation:
Data records are accessible for a default of 24 hours from the time they are added to a stream.
This time frame is called the retention period and is configurable in hourly increments from 24 to
168 hours (1 to 7 days).
Reference:
http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
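
A minimal boto3 sketch of extending retention from the 24-hour default to the 168-hour maximum
discussed above; the stream name is a hypothetical placeholder.

    # Sketch: raise a stream's retention period to 7 days and confirm it.
    import boto3

    kinesis = boto3.client("kinesis")

    kinesis.increase_stream_retention_period(
        StreamName="application-logs",
        RetentionPeriodHours=168,   # 7 days, the maximum discussed here
    )
    description = kinesis.describe_stream(StreamName="application-logs")
    print(description["StreamDescription"]["RetentionPeriodHours"])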

QUESTION 137
In Kinesis Firehose, the buffer size hints range from 1 MB to 128 MB for Amazon S3 delivery.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
The buffer size hints range from 1 MB to 128 MB for Amazon S3 delivery and 1 MB to 100 MB
for Amazon Elasticsearch Service delivery. The size threshold is applied to the buffer before
compression.
Reference:
http://docs.aws.amazon.com/firehose/latest/dev/limits.html

QUESTION 138
The KCL (Kinesis Client Library) creates a(n) ___ with the same name as your application to
manage state.
Choose the correct answer:

A. DynamoDB Table
B. S3 Bucket
C. RDS Table
D. EC2 Instance

Answer: A
Explanation:
For each Amazon Kinesis Streams application, the KCL uses a unique Amazon DynamoDB table
to keep track of the application's state. Because the KCL uses the name of the Amazon Kinesis
Streams application to create the name of the table, each application name must be unique.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html

QUESTION 139
By default, each account can have up to _ Firehose delivery streams per region.
Choose the correct answer:

A. 5
B. 15
C. 10
D. 20

Answer: D
Explanation:
By default, each account can have up to 20 Firehose delivery streams per region. This limit can
be increased using the Amazon Kinesis Firehose Limits form.
Reference:
http://docs.aws.amazon.com/firehose/latest/dev/limits.html

QUESTION 140
You can configure the agent to monitor multiple file directories and send data to multiple streams.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
By specifying multiple flow configuration settings, you can configure the agent to monitor multiple
file directories and send data to multiple streams. In the following configuration example, the
agent monitors two file directories and sends data to an Amazon Kinesis stream and a Firehose
delivery stream respectively. Note that you can specify different endpoints for Streams and
Firehose so that your Amazon Kinesis stream and Firehose delivery stream don't need to be in
the same region.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/writing-with-agents.html

QUESTION 141
Splitting a shard reduces the performance of a Kinesis stream.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Splitting increases the number of shards in your stream and, therefore, increases the data
capacity of the stream. Because you are charged on a per-shard basis, splitting increases the
cost of your stream. Similarly, merging reduces the number of shards in your stream and
therefore decreases the data capacity--and cost--of the stream.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html
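
As a hedged sketch of a resharding operation, the Python snippet below splits the first shard of a
stream roughly in half at the midpoint of its hash key range; the stream name is hypothetical and,
in practice, you would pick the shard to split based on its traffic.

    # Sketch: split one shard in half to increase stream capacity (and cost).
    import boto3

    kinesis = boto3.client("kinesis")

    stream = kinesis.describe_stream(StreamName="application-logs")
    parent = stream["StreamDescription"]["Shards"][0]
    start = int(parent["HashKeyRange"]["StartingHashKey"])
    end = int(parent["HashKeyRange"]["EndingHashKey"])

    kinesis.split_shard(
        StreamName="application-logs",
        ShardToSplit=parent["ShardId"],
        NewStartingHashKey=str((start + end) // 2),   # midpoint of the parent range
    )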

QUESTION 142
Firehose provides the following Lambda blueprints that you can use to create a Lambda function
for data transformation. Select all that apply:
Choose the 3 correct answers:

A. Syslog to JSON
B. Apache Log to JSON
C. Apache Log to CSV
D. Syslog to YAML

Answer: ABC
Explanation:
Firehose provides the following Lambda blueprints that you can use to create a Lambda function
for data transformation:
General Firehose Processing -- contains the data transformation and status model described in the
previous section; use this blueprint for any custom transformation logic.
Apache Log to JSON -- parses and converts Apache log lines to JSON objects, using predefined
JSON field names.
Apache Log to CSV -- parses and converts Apache log lines to CSV format.
Syslog to JSON -- parses and converts Syslog lines to JSON objects, using predefined JSON field names.
Syslog to CSV -- parses and converts Syslog lines to CSV format.
Reference:
http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html
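
A minimal sketch of the kind of handler these blueprints generate, assuming the standard Firehose
transformation contract (base64-encoded input records; output records carrying recordId, result,
and data). The upper-casing step is only a stand-in for real parsing logic such as Apache log to JSON.

    # Sketch of a Firehose data-transformation Lambda handler.
    import base64

    def lambda_handler(event, context):
        output = []
        for record in event["records"]:
            payload = base64.b64decode(record["data"]).decode("utf-8")
            transformed = payload.upper()   # replace with real parsing logic
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",             # or "Dropped" / "ProcessingFailed"
                "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
            })
        return {"records": output}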

QUESTION 143
What is the maximum Direct Connect speed available? Choose the correct answer:

A. 10M
B. 1G
C. 1M
D. 10G

Answer: D
Explanation:
AWS Direct Connect links your internal network to an AWS Direct Connect location over a
standard 1 gigabit or 10 gigabit Ethernet fiber-optic cable.
Reference:
http://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html

QUESTION 144
You regularly transfer large amounts of data to AWS. What kind of connection would be highly
desirable?
Choose the correct answer:

A. VPN
B. Direct Connect
C. Public Internet
D. VLAN

Answer: B
Explanation:
AWS Direct Connect makes it easy to establish a dedicated network connection from your
premises to AWS. Using AWS Direct Connect, you can establish private connectivity between
AWS and your datacenter, office, or colocation environment, which in many cases can reduce
your network costs, increase bandwidth throughput, and provide a more consistent network
experience than internet-based connections.
Reference:
https://aws.amazon.com/directconnect/

QUESTION 145
You need a direct connection with speeds less than 1G, and you do not have a point of presence
within the direct connect location. How should you provision this? Choose the correct answer:

A. You need to use an APN Direct Connect Partner


B. Create a cross-account IAM role for your Direct Connect partner
C. You can connect directly with AWS
D. Request a Direct Connect using the AWS CLI

Answer: A
Explanation:
If you don't have equipment hosted in the same facility as AWS Direct Connect, you can use a
network provider to connect to AWS Direct Connect. The provider does not have to be a member
of the AWS Partner Network (APN) to connect you. You can get started using a
network provider to connect to AWS Direct Connect by completing the steps shown in the
following table.
Reference:
http://docs.aws.amazon.com/directconnect/latest/UserGuide/getstarted_network_provider.html

QUESTION 146
Multiple AWS accounts can share a Direct Connect connection.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
To use your AWS Direct Connect connection with another AWS account, you can create a hosted
virtual interface for that account. These hosted virtual interfaces work the same as standard
virtual interfaces and can connect to public resources or a VPC.
Reference:
http://docs.aws.amazon.com/directconnect/latest/UserGuide/WorkingWithVirtualInterfaces.html

QUESTION 147
What is the maximum number of virtual interfaces per AWS Direct Connect connection before submitting a
request to Amazon?
Choose the correct answer:

A. 100
B. 50
C. 20
D. 10

Answer: B
Explanation:
Virtual interfaces per AWS Direct Connect connection: 50
Reference:
http://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html

QUESTION 148
What steps need to be completed to provision Direct Connect? Select all that apply.
Choose the 2 correct answers:

A. Request a Direct Connect from the AWS Console


B. Create a cross-account IAM role for your Direct Connect partner
C. Request a Direct Connect using the AWS CLI
D. Download and send the Letter of Authority and Customer Facility Assignment to your colocation
partner or work with an APN partner

Answer: AD
Explanation:
Step 1: Sign up for Amazon Web Services
Step 2: Submit AWS Direct Connect Connection request
Step 3: Download the LOA-CFA and complete the cross-connect (Optional)
Step 4: Configure redundant connections with AWS Direct Connect
Step 5: Create a virtual interface
Step 6: Download router configuration
Step 7: Verify your virtual interface
Reference:
http://docs.aws.amazon.com/directconnect/latest/UserGuide/getstarted.html

QUESTION 149
Each Direct Connect is limited to one region only.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Each AWS Direct Connect location enables connectivity to the geographically nearest AWS
region. You can access all AWS services available in that region.
Reference:
https://aws.amazon.com/directconnect/faqs/

QUESTION 150
Snowball can encrypt your data at load.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
With AWS Snowball, encryption is enforced, protecting your data at rest and in physical transit.
Reference:
http://docs.aws.amazon.com/snowball/latest/ug/whatissnowball.html

QUESTION 151
AWS Snowball is a virtual appliance.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Snowball is a physical appliance.

QUESTION 152
What is the exabyte-scale data transfer service used to move extremely large amounts of data to
AWS? You can transfer up to 100PB.
Choose the correct answer:

A. Glacierbus
B. Snowman
C. Snowball
D. Snowmobile

Answer: D
Explanation:
AWS Snowmobile is an exabyte-scale data transfer service used to move extremely large
amounts of data to AWS. You can transfer up to 100PB per Snowmobile, a 45-foot long
ruggedized shipping container, pulled by a semi-trailer truck. Snowmobile makes it easy to move
massive volumes of data to the cloud, including video libraries, image repositories, or even a
complete data center migration. Transferring data with Snowmobile is secure, fast, and cost-
effective.
Reference:
https://aws.amazon.com/snowmobile/

QUESTION 153
AWS Snowmobile is equivalent to how many AWS Snowball appliances? Choose the correct
answer:

A. 12
B. 1500
C. 250
D. 1250

Answer: D
Explanation:
One Snowmobile can transport up to one hundred petabytes of data in a single trip, the
equivalent of using about 1,250 AWS Snowball devices.
Reference:
https://aws.amazon.com/snowmobile/

QUESTION 154
Which of the below is true regarding pricing for AWS Snowball? Choose the 2 correct
answers:

A. First 10 days of on-site usage is free


B. First 7 days of on-site usage is free
C. Data egress from S3 is free
D. Data ingress to S3 is free

Answer: AD
Explanation:
First 10 days of onsite usage are free* and each extra onsite day is $15. Data transfer IN to
Amazon S3 is $0.00/GB (free). Data transfer OUT of Amazon S3 is priced by region.
Reference:
https://aws.amazon.com/snowball/pricing/

QUESTION 155
Select all that apply for supported credential mechanisms for AWS IoT.
Choose the 3 correct answers:

A. X.509 certificates
B. IAM users, groups, and roles
C. Amazon Cognito
D. Web identity federation

Answer: ABC
Explanation:
AWS IoT supports three types of identity principals for authentication: X.509 certificates; IAM
users, groups, and roles; and Amazon Cognito identities. Each identity type supports different use
cases for accessing the AWS IoT message broker and Thing Shadows service. The identity type
you use depends on your choice of application protocol. If you use HTTP, use IAM (users,
groups, roles) or Amazon Cognito identities. If you use MQTT, use X.509 certificates.
Reference:
http://docs.aws.amazon.com/iot/latest/developerguide/identity-in-iot.html

QUESTION 156
Device Gateway supports the pub/sub messaging platform.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
AWS IoT enables both companion apps and server applications to access connected devices via
uniform, RESTful APIs. Applications also have the option to use pub/sub to communicate directly
with the connected devices.
Reference:
https://aws.amazon.com/iot-platform/faqs/

QUESTION 157
Device Gateway has the following supported protocols. Select two.
Choose the 2 correct answers:

A. HTTPS
B. PPP
C. MQTT
D. IKE

Answer: AC
Explanation:
The AWS IoT message broker and thing shadows service encrypt all communication with TLS.
TLS is used to ensure the confidentiality of the application protocols (MQTT, HTTP) supported by
AWS IoT. TLS is available in a number of programming languages and operating systems.

QUESTION 158
Which component of AWS IoT allows you to create a persistent, virtual version of each device
that includes the device's latest state so that applications or other devices can read messages
and interact with the device?
Choose the correct answer:

A. Device Mirror
B. Rules Engine
C. Thing Shadow
D. Message Broker

Answer: C
Explanation:
A thing shadow (sometimes referred to as a device shadow) is a JSON document that is used to
store and retrieve current state information for a thing (device, app, and so on). The Thing
Shadows service maintains a thing shadow for each thing you connect to AWS IoT. You can use
thing shadows to get and set the state of a thing over MQTT or HTTP, regardless of whether the
thing is connected to the Internet. Each thing shadow is uniquely identified by its name.
Reference:
http://docs.aws.amazon.com/iot/latest/developerguide/iot-thing-shadows.html
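
A hedged Python sketch of the shadow document shape described above and of updating it over
HTTPS with the boto3 "iot-data" client; the thing name and the state fields are hypothetical.

    # Sketch: update a thing shadow's desired/reported state.
    import json
    import boto3

    iot_data = boto3.client("iot-data")

    shadow_update = {
        "state": {
            "desired":  {"led": "on"},    # what applications want the device to do
            "reported": {"led": "off"},   # what the device last reported
        }
    }

    response = iot_data.update_thing_shadow(
        thingName="livingroom-lamp",                       # hypothetical thing name
        payload=json.dumps(shadow_update).encode("utf-8"),
    )
    print(json.loads(response["payload"].read()))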

QUESTION 159
What is the preferred credential if connecting things to AWS IoT using MQTT? Choose the
correct answer:

A. IAM Cross-Account Role


B. Multi-factor Authentication
C. Amazon Cognito
D. X.509 certificates

Answer: D
Explanation:
HTTPS and WebSockets requests sent to AWS IoT are authenticated using AWS IAM or AWS
Cognito, both of which support the AWS SigV4 authentication. If you are using the AWS SDKs or
the AWS CLI, the SigV4 authentication is taken care of for you under the hood. HTTPS requests
can also be authenticated using X.509 certificates. MQTT messages to AWS IoT are
authenticated using X.509 certificates.
Reference:
https://aws.amazon.com/iot-platform/faqs/

QUESTION 160
AWS recommends always updating the thing shadow rather than updating the thing directly.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Reference:
http://docs.aws.amazon.com/iot/latest/developerguide/using-thing-shadows.html

QUESTION 161
What is a thing shadow?
Choose the correct answer:

A. An appliance that is used to store and retrieve current state information for a thing (device, app,
and so on).
B. An S3 Bucket that is used to store and retrieve current state information for a thing (device, app,
and so on).
C. A publish/subscribe broker service that enables the sending and receiving of messages to and
from AWS IoT.
D. A JSON document that is used to store and retrieve current state information for a thing (device,
app, and so on).

Answer: D
Explanation:
A thing shadow (sometimes referred to as a device shadow) is a JSON document that is used to
store and retrieve current state information for a thing (device, app, and so on). The thing
shadow's service maintains a thing shadow for each thing you connect to AWS IoT. You can use
thing shadows to get and set the state of a thing over MQTT or HTTP, regardless of whether the
thing is connected to the internet. Each thing shadow is uniquely identified by its name.
Reference:
http://docs.aws.amazon.com/iot/latest/developerguide/iot-thing-shadows.html

QUESTION 162
How many attributes can a thing have?
Choose the correct answer:

A. 20
B. 100
C. 50
D. 10

Answer: C
Explanation:
Although thing types are optional, their use provides better discovery of things. Things can have
up to 50 attributes. Things without a thing type can have up to three attributes. A thing can only
be associated with one thing type. There is no limit on the number of thing types you can create
in your account.
Reference:
http://docs.aws.amazon.com/iot/latest/developerguide/thing-types.html

QUESTION 163
The Rules Engine has a _____ interface to create a rule.
Choose the correct answer:

A. Oracle
B. SQL
C. PHP
D. Java

Answer: B
Explanation:
The AWS IoT Rules Engine enables continuous processing of inbound data from devices
connected to the AWS IoT service. You can configure rules in the Rules Engine in an intuitive,
SQL-like syntax to automatically filter and transform inbound data.
Reference:
https://aws.amazon.com/iot-platform/faqs/
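
As a hedged example of that SQL-like syntax, the boto3 sketch below creates a rule that selects
high-temperature readings from an MQTT topic and writes them to S3; the rule name, topic filter,
bucket, and role ARN are hypothetical placeholders.

    # Sketch: create an AWS IoT rule using the Rules Engine's SQL-like syntax.
    import boto3

    iot = boto3.client("iot")

    iot.create_topic_rule(
        ruleName="HighTemperatureToS3",
        topicRulePayload={
            "sql": "SELECT temperature, deviceId FROM 'sensors/+/telemetry' "
                   "WHERE temperature > 60",
            "awsIotSqlVersion": "2016-03-23",
            "ruleDisabled": False,
            "actions": [{
                "s3": {
                    "roleArn": "arn:aws:iam::123456789012:role/iot-to-s3",  # hypothetical
                    "bucketName": "sensor-alerts",
                    "key": "${topic()}/${timestamp()}",
                }
            }],
        },
    )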

QUESTION 164
In AWS IoT, what establishes an identity for devices and tracks metadata, such as the devices'
attributes and capabilities?
Choose the correct answer:

A. Registry
B. Device Gateway
C. Device Shadow
D. Rules Engine

Answer: A
Explanation:
Devices connected to AWS IoT are represented by things in the thing registry. The thing registry
allows you to keep a record of all of the devices that are connected to your AWS IoT account.
Reference:
http://docs.aws.amazon.com/iot/latest/developerguide/what-is-aws-iot.html

QUESTION 165
How can you automate replication between an Oracle database and an RDS database? Choose the correct
answer:

A. SQS
B. Data Migration Service
C. MySQL
D. Kinesis

Answer: B
Explanation:
AWS Database Migration Service helps you migrate databases to AWS easily and securely. The
source database remains fully operational during the migration, minimizing downtime to
applications that rely on the database. The AWS Database Migration Service can migrate your
data to and from most widely used commercial and open-source databases. The service supports
homogenous migrations such as Oracle to Oracle, as well as heterogeneous migrations between
different database platforms, such as Oracle to Amazon Aurora or Microsoft SQL Server to
MySQL. It also allows you to stream data to Amazon Redshift from any of the supported sources
including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, SAP ASE, and SQL Server,
enabling consolidation and easy analysis of data in the petabyte-scale data warehouse. AWS
Database Migration Service can also be used for continuous data replication with high availability.
Reference:
https://aws.amazon.com/dms/

QUESTION 166
True or False: AWS DMS allows for heterogeneous migration (e.g., Oracle to Aurora). Choose the
correct answer:

A. True
B. False

Answer: A
Explanation:
Reference:
https://aws.amazon.com/dms/

QUESTION 167
You currently have databases running on-site and at your data center. Which service allows you to
consolidate them into one database on AWS?
Choose the correct answer:

A. AWS Data Migration Service


B. AWS RDS Aurora
C. AWS Kinesis
D. AWS Data Pipeline

Answer: A
Explanation:
AWS Database Migration Service can migrate your data to and from most of the widely used
commercial and open source databases. It supports homogeneous migrations such as Oracle to
Oracle, as well as heterogeneous migrations between different database platforms, such as
Oracle to Amazon Aurora. Migrations can be from on-premises databases to Amazon RDS or
Amazon EC2, databases running on EC2 to RDS, or vice versa, as well as from one RDS
database to another RDS database.
Reference:
https://aws.amazon.com/dms/

QUESTION 168
S3 can trigger other services using notifications.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
The Amazon S3 notification feature enables you to receive notifications when certain events
happen in your bucket. To enable notifications, you must first add a notification configuration
identifying the events you want Amazon S3 to publish and the destinations where you want
Amazon S3 to send the event notifications. You store this configuration in the notification
subresource (see Bucket Configuration Options) associated with a bucket. Amazon S3 provides
an API for you to manage this subresource.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
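
A minimal boto3 sketch of such a notification configuration, assuming a hypothetical bucket and
Lambda function (which must already grant S3 permission to invoke it): every object created under
logs/ triggers the function.

    # Sketch: configure S3 event notifications to invoke a Lambda function.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_notification_configuration(
        Bucket="my-ingest-bucket",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-log",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "logs/"}]}},
            }]
        },
    )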

QUESTION 169
Select all methods used to control security with S3.
Choose the 3 correct answers:

A. Bucket Policy
B. IAM Policy
C. Access Control Lists (ACL)
D. AWS CloudWatch

Answer: ABC
Explanation:
Amazon S3 Access Control Lists (ACLs) enable you to manage access to buckets and objects.
Each bucket and object has an ACL attached to it as a subresource. It defines which AWS
accounts or groups are granted access and the type of access. When a request is received
against a resource, Amazon S3 checks the corresponding ACL to verify the requester has the
necessary access permissions. Bucket policy and user policy are two of the access policy options
available for you to grant permission to your Amazon S3 resources. Both use JSON-based
access policy language. The topics in this section describe the key policy language elements,
with emphasis on Amazon S3-specific details, and provide example bucket and user policies.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/using-iam-policies.html

QUESTION 170
You need to provision a DynamoDB table that will hold 45 GB of data. How many partitions do you need? Choose the correct
answer:

A. 20
B. 10
C. 15
D. 5

Answer: D
Explanation:
45 GB / 10 GB per partition = 4.5, which rounds up to 5 partitions.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.
html#GuidelinesForTables.Partitions

QUESTION 171
For each partition key value, the total size of an LSI on a DynamoDB table is limited to 10GB
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
For each partition key value, the total size of all indexed items must be 10 GB or less.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

QUESTION 172
An LSI on a DynamoDB table can be added at any time.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Local secondary indexes are created at the same time that you create a table. You cannot add a
local secondary index to an existing table, nor can you delete any local secondary indexes that
currently exist.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

QUESTION 173
What is the maximum number of local secondary indexes and global secondary indexes per
table?
Choose the correct answer:

A. 2, 5
B. 5, 2
C. 5, 5
D. 10, 10

Answer: C
Explanation:
For maximum query flexibility, you can create up to 5 global secondary indexes and up to 5 local
secondary indexes per table.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

QUESTION 174
What is the maximum number of Read Capacity Units that a partition can support in DynamoDB?
Choose the correct answer:

A. 1,000
B. 4,000
C. 2,000
D. 3,000

Answer: D
Explanation:
A single partition can support a maximum of 3,000 read capacity units or 1,000 write capacity
units.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.
html#GuidelinesForTables.Partitions

QUESTION 175
Select all CloudWatch metrics that can monitor your DynamoDB table.
Choose the 4 correct answers:

A. ConsumedReadCapacityUnits
B. ReadThrottleEvents
C. ThrottledRequests
D. ConsumedWriteCapacityUnits

Answer: ABCD
Explanation:
Amazon CloudWatch aggregates the following DynamoDB metrics at one-minute intervals:
ConditionalCheckFailedRequests, ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits,
ReadThrottleEvents, ReturnedBytes, ReturnedItemCount, ReturnedRecordsCount,
SuccessfulRequestLatency, SystemErrors, ThrottledRequests, UserErrors, and WriteThrottleEvents.
Reference:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/dynamo-metricscollected.html

QUESTION 176
You have a requirement to provision 5500 RCUs and 1500 WCUs. How many partitions do you
need?

Choose the correct answer:

A. 8
B. 2
C. 1
D. 4

Answer: D
Explanation:
(5500 / 3000) + (1500 / 1000) = 1.83 + 1.50 = 3.33, which rounds up to 4 partitions.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.
html#GuidelinesForTables.Partitions
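
The two sizing rules used in this question and in QUESTION 170 can be expressed as a small
worked example, using the guideline figures of 10 GB, 3,000 RCUs, and 1,000 WCUs per partition:

    # Worked example of the DynamoDB partition-count guidelines.
    import math

    def partitions_for_size(size_gb: float) -> int:
        return math.ceil(size_gb / 10)

    def partitions_for_throughput(rcu: float, wcu: float) -> int:
        return math.ceil(rcu / 3000 + wcu / 1000)

    print(partitions_for_size(45))                # 45 GB -> 5 partitions
    print(partitions_for_throughput(5500, 1500))  # 3.33  -> 4 partitions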

QUESTION 177
What in DynamoDB identifies each item in a table uniquely? Choose the correct answer:

A. Primary Key
B. Attribute
C. Sort Key
D. Identifier

Answer: A
Explanation:
When you create a table, in addition to the table name, you must specify the primary key of the
table. The primary key uniquely identifies each item, so that no two items in the table can have
the same primary key.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html

QUESTION 178
GSIs guarantee strongly consistent reads.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Queries on global secondary indexes support eventual consistency only.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

QUESTION 179
Select all the true statements regarding DynamoDB structure:
Choose the 2 correct answers:

A. Each item is required to have a primary key


B. All items are not required to have the same number of attributes
C. All items must have the same number of attributes
D. Each item is required to have a sort key

Answer: AB
Explanation:
When you create a table, in addition to the table name, you must specify the primary key of the
table. The primary key uniquely identifies each item, so that no two items in the table can have
the same primary key. Individual items in a DynamoDB table can have any number of attributes,
although there is a limit of 400 KB on the item size.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html
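
A minimal boto3 sketch illustrating both points: the table is created with a required primary key
(here game + username, a hypothetical schema), and items may then carry different numbers of
attributes.

    # Sketch: hash+range primary key is mandatory; attribute sets per item are not.
    import boto3

    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

    table = dynamodb.create_table(
        TableName="HighScores",
        KeySchema=[
            {"AttributeName": "game", "KeyType": "HASH"},       # partition key
            {"AttributeName": "username", "KeyType": "RANGE"},  # sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "game", "AttributeType": "S"},
            {"AttributeName": "username", "AttributeType": "S"},
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
    table.wait_until_exists()

    table.put_item(Item={"game": "asteroids", "username": "alice", "score": 9001})
    table.put_item(Item={"game": "asteroids", "username": "bob"})  # fewer attributes is fine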

QUESTION 180
GSIs have their own provisioned throughput independent of the main table.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Every global secondary index has its own provisioned throughput settings for read and write
activity. Queries or scans on a global secondary index consume capacity units from the index not
from the base table. The same holds true for global secondary index updates due to table writes.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

QUESTION 181
Which is true of a local secondary index?
Choose the correct answer:

A. It's an index that has the same sort key as the base table, but a different partition key
B. It's an index that has the same partition key as the base table but a different sort key
C. It's an index that has the same partition key and sort key as the base table
D. It's an index with a partition key and a sort key that can be different from those on the base table

Answer: B
Explanation:
An index that has the same partition key as the base table, but a different sort key. A local
secondary index is "local" in the sense that every partition of a local secondary index is scoped to
a base table partition that has the same partition key value.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

QUESTION 182
DynamoDB read and write speeds are determined by the number of available partitions.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
When it stores data, DynamoDB divides a table's items into multiple partitions and distributes the
data primarily based upon the partition key value. Consequently, to achieve the full amount of
request throughput you have provisioned for a table, keep your workload spread evenly across
the partition key values. Distributing requests across partition key values distributes the requests
across partitions.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.
html

QUESTION 183
Which file systems can be used with Amazon EMR? Select all that apply.
Choose the 3 correct answers:

A. S3 on EMRFS
B. Local attached storage
C. EBS
D. Glacier on EMRFS

Answer: ABC
Explanation:
Amazon EMR and Hadoop provide a variety of file systems that you can use when processing
cluster steps. You specify which file system to use by the prefix of the URI used to access the
data. For example, s3://myawsbucket/path references an Amazon S3 bucket using EMRFS. The
following table lists the available file systems, with recommendations about when it's best to use
each one.
Reference:
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html

QUESTION 184
Which best describes EMRFS?
Choose the correct answer:

A. An implementation of HDFS used for reading and writing regular files from Amazon EMR directly
to EBS
B. An implementation of HDFS used for reading and writing regular files from Amazon EMR directly
to Amazon S3
C. A distributed, scalable, and portable file system for Hadoop
D. A distributed, scalable, and portable file system for EMR

Answer: B
Explanation:
EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon
EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in
Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side
encryption, read-after-write consistency, and list consistency
Reference:
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html

QUESTION 185
For additional processing capacity, task nodes can be added on demand to an EMR cluster.

Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
The number of instances to use in your cluster is application dependent and should be based on
both the number of resources required to store and process your data and the acceptable amount
of time for your job to complete. As a general guideline, we recommend that you limit 60% of your
disk space to storing the data you will be processing, leaving the rest for intermediate output.
Hence, given 3x replication on HDFS, if you were looking to process 5 TB on m1.xlarge
instances, which have 1,690 GB of disk space, we recommend your cluster contains at least (5
TB * 3) / (1,690 GB * .6) = 15 m1.xlarge core nodes. You may want to increase this number if
your job generates a high amount of intermediate data or has significant I/O requirements. You
may also want to include additional task nodes to improve processing performance
Reference:
https://aws.amazon.com/emr/faqs/
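
A hedged boto3 sketch of adding task capacity to a running cluster; the cluster ID, group name,
and instance sizing are placeholders.

    # Sketch: add an on-demand task instance group to a running EMR cluster.
    import boto3

    emr = boto3.client("emr")

    emr.add_instance_groups(
        JobFlowId="j-XXXXXXXXXXXXX",      # placeholder cluster ID
        InstanceGroups=[{
            "Name": "extra-task-nodes",
            "InstanceRole": "TASK",        # task nodes add processing, not HDFS storage
            "Market": "ON_DEMAND",
            "InstanceType": "m4.xlarge",
            "InstanceCount": 4,
        }],
    )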

QUESTION 186
____ is an open source data warehouse and analytic package that runs on top of a Hadoop
cluster
Choose the correct answer:

A. Spark
B. Oozie
C. Hive
D. Ganglia

Answer: C
Explanation:
Hive is an open source data warehouse and analytic package that runs on top of a Hadoop
cluster. Hive scripts use a SQL-like language called Hive QL (query language) that abstracts
programming models and supports typical data warehouse interactions. Hive enables you to
avoid the complexities of writing Tez jobs based on directed acyclic graphs (DAGs) or
MapReduce programs in a lower level computer language, such as Java.
Reference:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive.html

QUESTION 187
____ is a tool for transferring data between Amazon S3, Hadoop, HDFS, and RDBMS databases.
Choose the correct answer:

A. Hadoop
B. Ganglia
C. Sqoop
D. Hive

Answer: C
Explanation:
Sqoop is a tool for transferring data between Amazon S3, Hadoop, HDFS, and RDBMS
databases.
Reference:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-sqoop.html

QUESTION 188
EMR can scale to your processing requirements using EC2 Auto Scaling. Choose the correct
answer:

A. True
B. False

Answer: A
Explanation:
When a scale-out rule triggers a scaling activity for an instance group, Amazon EC2 instances
are added to the instance group according to your rules. New nodes can be used by applications
such as Apache Spark and Apache Hive as soon as the Amazon EC2 instance enters the
InService state. You can also set up a scale-in rule that terminates instances and removes nodes.
For more information about the lifecycle of Amazon EC2 instances that scale automatically, see
Auto Scaling Lifecycle in the Auto Scaling User Guide.
Reference:
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-automatic-scaling.html

QUESTION 189
Select all that apply for resizing an EMR cluster.
Choose the 3 correct answers:

A. Core nodes can only be added


B. Master and task nodes can be resized up or down
C. Only task nodes can be resized up or down
D. Only one master is allowed

Answer: ACD
Explanation:
The master node manages the cluster and typically runs master components of distributed
applications. For example, the master node runs the YARN ResourceManager service to manage
resources for applications, as well as the HDFS NameNode service. It also tracks the status of
jobs submitted to the cluster and monitors the health of the instance groups. To monitor the
progress of a cluster, you can SSH into the master node as the Hadoop user and either look at
the Hadoop log files directly or access the user interface that Hadoop or Spark publishes to a web
server running on the master node.
Reference:
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-resize.html

QUESTION 190
You are provisioning an EMR cluster that will be dedicated to Spark machine learning. What is
the best choice of EC2 instance type?
Choose the correct answer:

A. t2
B. m1
C. r3
D. c1

Answer: C
Explanation:
EMR allows you the flexibility of choosing optimal instance types to fit different applications. For
example, Spark caches data in memory for faster processing, so it is best to use instances with
more memory (such as the R3 instance family). Also, EMR's ability to use Amazon EC2 Spot
capacity can dramatically reduce the cost of training and retraining ML models. Most of the time,
the Spot market price for larger instances such as the r3.4xl is around 10%-20% of the on-
demand price.
Reference:
https://aws.amazon.com/blogs/big-data/building-a-recommendation-engine-with-spark-ml-on-
amazon-emr-using-zeppelin/

QUESTION 191
To ensure data persistence for transient clusters, the best data store is instance storage.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
There are two types of storage volumes available for EC2 instances: Amazon EBS volumes and
the instance store. With Amazon EMR, both types of storage are ephemeral, meaning the data on
the volumes does not persist through instance termination. This ephemeral storage is ideal for
temporary storage of information that changes frequently, such as HDFS data, as well as buffers,
caches, scratch data, and other temporary content that some applications may "spill" to the local
file system. Although this ephemeral storage is used for HDFS, EMRFS can help ensure that
there is a persistent "source of truth" for data stored in Amazon S3.
Reference:
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html

QUESTION 192
____ is an open source project and is a scalable, distributed system designed to monitor clusters
and grids while minimizing the impact on their performance.
Choose the correct answer:

A. Pig
B. Spark
C. Ganglia
D. Hive

Answer: C
Explanation:
The Ganglia open source project is a scalable, distributed system designed to monitor clusters
and grids while minimizing the impact on their performance. When you enable Ganglia on your
cluster, you can generate reports and view the performance of the cluster as well as inspect the
performance of individual node instances. Ganglia is also configured to ingest and visualize
Hadoop and Spark metrics. For more information about the Ganglia open-source project, go to
http://ganglia.info/.
Reference:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-ganglia.html

QUESTION 193
EMR is great for storing BLOB data like images and videos.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
If you plan on storing binary large object (BLOB) files such as digital video, images, or music, you
might want to consider storing the data in Amazon S3 and referencing its location in Amazon
Redshift. The same guidance applies to EMR: keep BLOB objects in Amazon S3 and reference them
from the cluster rather than storing them in HDFS.
Reference:
https://d0.awsstatic.com/whitepapers/enterprise-data-warehousing-on-aws.pdf

QUESTION 194
What are the supported data types for a target in an Amazon ML model? Choose the 3 correct answers:

A. Text
B. Binary
C. Numeric
D. Categorical

Answer: BCD
Explanation:
Amazon ML computes the following descriptive statistics for different attribute types:
Numeric: distribution histograms; number of invalid values; minimum, median, mean, and maximum values.
Binary and categorical: count of distinct values per category; value distribution histogram; most
frequent values; unique value counts; percentage of true values (binary only); most prominent
words; most frequent words.
Text: name of the attribute; correlation to target (if a target is set); total words; unique words;
range of the number of words in a row; range of word lengths; most prominent words.
Reference:
http://docs.aws.amazon.com/machine-learning/latest/dg/data-insights.html

QUESTION 195
With Amazon Kinesis Analytics, you just use standard JSON to process your data streams, so
you don't have to learn any new programming languages.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
With Amazon Kinesis Analytics, you just use standard SQL to process your data streams, so you
don't have to learn any new programming languages.
Reference:
https://aws.amazon.com/kinesis/analytics/faqs/

QUESTION 196
What is the maximum number of Kinesis Processing Units (KPUs) for a Kinesis Analytics application? Choose the correct answer:

A. 8
B. 2
C. 4
D. 6

Answer: A
Explanation:
The maximum number of Kinesis Processing Units (KPUs) is eight.
Reference:
https://aws.amazon.com/kinesis/analytics/faqs/

QUESTION 197
What destinations are supported with Kinesis Analytics? Select all that apply.
Choose the 3 correct answers:

A. Data Pipeline
B. Elasticsearch
C. Redshift
D. S3

Answer: BCD
Explanation:
Kinesis Analytics supports up to four destinations per application. You can persist SQL results to
Amazon S3, Amazon Redshift, Amazon Elasticsearch Service (through an Amazon Kinesis
Firehose), and Amazon Kinesis Streams. You can write to a destination not directly supported by
Kinesis Analytics by sending SQL results to Amazon Kinesis Streams, and leveraging its
integration with AWS Lambda to send to a destination of your choice.
Reference:
https://aws.amazon.com/kinesis/analytics/faqs/

QUESTION 198
The size of a row in an in-application stream is limited to ____ KB.
Choose the correct answer:

A. 100
B. 10
C. 50
D. 250

Answer: C
Explanation:
The size of a row in an in-application stream is limited to 50 KB.
Reference:
http://docs.aws.amazon.com/kinesisanalytics/latest/dev/limits.html

QUESTION 199
Kinesis Analytics elastically scales your application to accommodate the data throughput of your
source stream and your query complexity for most scenarios.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Kinesis Analytics elastically scales your application to accommodate the data throughput of your
source stream and your query complexity for most scenarios.
Reference:
https://aws.amazon.com/kinesis/analytics/faqs/

QUESTION 200
Kinesis Analytics' expected processing latency is on the order of minutes.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Kinesis Analytics delivers sub-second processing latencies, not minutes.

QUESTION 201
Select all common usage patterns for Kinesis Analytics from below:
Choose the 4 correct answers:

A. Create real-time alarms and notifications


B. Feed real-time dashboard
C. Message pipeline for EMR
D. Generate time-series analytics

Answer: ABCD

QUESTION 202
What is the correct definition of a tumbling window? Choose the correct answer:

A. When a windowed query processes each window in a non-overlapping manner, each record on
an in-application stream belongs to a specific window, and it's processed only once (when the
query processes the window to which the record belongs).
B. When a windowed query processes each window in an overlapping manner, each record on an
in-application stream belongs to a specific window, and it's processed only once (when the query
processes the window to which the record belongs).
C. When a windowed query processes each window in a non-overlapping manner, each record on
an in-application stream belongs to any window, and it's processed only once (when the query
processes the window to which the record belongs).
D. None of the above

Answer: A
Explanation:
When a windowed query processes each window in a non-overlapping manner, the window is
referred to as a tumbling window. In this case, each record on an in-application stream belongs to
a specific window, and it's processed only once (when the query processes the window to which
the record belongs).
Reference:
http://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html
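
The non-overlapping behaviour can be illustrated outside of Kinesis Analytics with a short Python
sketch that assigns each record to exactly one fixed 60-second window by truncating its
timestamp; the sample records are made up.

    # Illustrative sketch of tumbling (fixed, non-overlapping) windows.
    from collections import defaultdict

    WINDOW_SECONDS = 60

    records = [                      # (epoch_seconds, value) sample data
        (1001, 3), (1030, 5), (1061, 2), (1119, 4), (1125, 1),
    ]

    windows = defaultdict(int)
    for timestamp, value in records:
        window_start = timestamp - (timestamp % WINDOW_SECONDS)   # truncate to window
        windows[window_start] += value          # each record counted exactly once

    for start in sorted(windows):
        print(f"window [{start}, {start + WINDOW_SECONDS}) -> sum = {windows[start]}")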

QUESTION 203
Which CloudWatch metric for Kinesis Analytics indicates how far behind from the current time an
application is reading from the streaming source? Choose the correct answer:

A. Records
B. Bytes
C. MillisBehindLatest
D. Flow

Answer: C
Explanation:
MillisBehindLatest indicates how far behind from the current time an application is reading from
the streaming source.
Reference:
http://docs.aws.amazon.com/kinesisanalytics/latest/dev/metrics-dimensions.html

QUESTION 204
Uniqueness, primary key, and foreign key constraints are required and enforced by Amazon
Redshift.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Uniqueness, primary key, and foreign key constraints are informational only; they are not
enforced by Amazon Redshift.
Reference:
http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html

QUESTION 205
A small 2 MB fact table can utilize which DISTSTYLE? Choose the 3 correct answers:

A. EVEN
B. ALL
C. AUTO
D. KEY

Answer: ABD
Explanation:
DISTSTYLE EVEN example: if you create a new table with the same data as the USERS table
but set the DISTSTYLE to EVEN, rows are always evenly distributed across slices.
DISTSTYLE ALL example: if you create a new table with the same data as the USERS table but
set the DISTSTYLE to ALL, all the rows are distributed to the first slice of each node.
DISTKEY example: look at the schema of the USERS table in the TICKIT database, where
USERID is defined as both the SORTKEY column and the DISTKEY column.
Reference:
http://docs.aws.amazon.com/redshift/latest/dg/c_Distribution_examples.html

QUESTION 206
How do I load data into my Amazon Redshift data warehouse? Select two.
Choose the 2 correct answers:

A. SQL insert
B. SQS
C. S3
D. DynamoDB

Answer: CD
Explanation:
You can load data into Amazon Redshift from a range of data sources including Amazon S3,
Amazon DynamoDB, Amazon EMR, AWS Data Pipeline or any SSH-enabled host on Amazon
EC2 or on-premises. Amazon Redshift attempts to load your data in parallel into each compute
node to maximize the rate at which you can ingest data into your data warehouse cluster.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 207
You can use our COPY command to load data in parallel directly to Amazon Redshift from
Amazon EMR, Amazon DynamoDB, or any SSH-enabled host.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
You can use our COPY command to load data in parallel directly to Amazon Redshift from
Amazon EMR, Amazon DynamoDB, or any SSH-enabled host.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 208
In which distribution style are the rows distributed according to the values in one column? The
leader node will attempt to place matching values on the same node slice.
Choose the correct answer:

A. Odd
B. Key
C. ALL
D. Even

Answer: B
Explanation:
In Key distribution style, the rows are distributed according to the values in one column. The
leader node will attempt to place matching values on the same node slice. If you distribute a pair
of tables on the joining keys, the leader node collocates the rows on the slices according to the
values in the joining columns so that matching values from the common columns are physically
stored together.
Reference:
http://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html

QUESTION 209
By default, Amazon Redshift retains backups for how many days? Choose the correct answer:

A. 30
B. 14
C. 7
D. 1

Answer: D
Explanation:
By default, Amazon Redshift retains backups for 1 day.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 210
Redshift clusters spread their data across which of the following?
Choose the correct answer:

A. Slices
B. Leader nodes
C. Shards
D. Partitions

Answer: A
Explanation:
Slices per node is the number of slices into which a compute node is partitioned.
Reference:
http://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html

QUESTION 211
Redshift leader nodes are charged as EC2 instances.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Compute node hours are the total number of hours you run across all your compute nodes for the
billing period. You are billed for 1 unit per node per hour, so a 3-node data warehouse cluster
running persistently for an entire month would incur 2,160 instance hours. You will not be charged
for leader node hours; only compute nodes will incur charges.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 212
What is the maximum number of nodes an Amazon Redshift data warehouse cluster can contain? Choose
the correct answer:

A. 128
B. 256
C. 100
D. unlimited

Answer: A
Explanation:
An Amazon Redshift data warehouse cluster can contain from 1-128 compute nodes, depending
on the node type.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 213
A workload management (WLM) configuration can define a total concurrency level of _ for all
user-defined queues
Choose the correct answer:

A. 75
B. 25
C. 100
D. 50

Answer: D
Explanation:
A workload management (WLM) configuration can define a total concurrency level of 50 for all
user-defined queues
Reference:
http://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-limits.html

QUESTION 214
How can you monitor the performance of an Amazon Redshift data warehouse cluster? Choose
the 2 correct answers:

A. SNS
B. CloudWatch
C. CloudTrail
D. AWS Management Console

Answer: BD
Explanation:
Metrics for compute utilization, storage utilization, and read/write traffic to your Amazon Redshift
data warehouse cluster are available free of charge via the AWS Management Console or
Amazon CloudWatch APIs. You can also add additional, user-defined metrics via Amazon
CloudWatch's custom metric functionality. In addition to CloudWatch metrics, Amazon Redshift
also provides information on query and cluster performance via the AWS Management Console.
This information enables you to see which users and queries are consuming the most system
resources and diagnose performance issues. In addition, you can see the resource utilization on
each of your compute nodes to ensure that you have data and queries that are well balanced
across all nodes.
Reference:
https://aws.amazon.com/redshift/faqs/
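
A minimal boto3 sketch of pulling one of those CloudWatch metrics (CPUUtilization) for a
hypothetical cluster named analytics-cluster over the last hour:

    # Sketch: read a Redshift cluster metric from CloudWatch.
    from datetime import datetime, timedelta
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Redshift",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": "analytics-cluster"}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,                 # 5-minute datapoints
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], round(point["Average"], 1))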

QUESTION 215
Amazon Redshift replicates all your data within your data warehouse cluster when it is loaded
and also continuously backs up your data to ____?
Choose the correct answer:

A. S3
B. Kinesis
C. DynamoDB
D. EMR

Answer: A
Explanation:
Amazon Redshift replicates all your data within your data warehouse cluster when it is loaded
and also continuously backs up your data to S3.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 216
Select the true statement from below.
Choose the correct answer:

A. The single-node configuration requires a leader node that manages client connections and
receives queries and two compute nodes that store data and perform queries and computations.
B. The multi-node configuration requires a leader node that manages client connections and
receives queries and two compute nodes that store data and perform queries and computations.
C. A single-node configuration does require a leader node
D. A single-node configuration does not require a leader node

Answer: B
Explanation:
The single node configuration enables you to get started with Amazon Redshift quickly and cost-
effectively and scale up to a multi-node configuration as your needs grow. The multi-node
configuration requires a leader node that manages client connections and receives queries, and
two compute nodes that store data and perform queries and computations. The leader node is
provisioned for you automatically and you are not charged for it.
Reference:
https://aws.amazon.com/redshift/faqs/

QUESTION 217
Which key is made up of all of the columns listed in the sort key definition, in the order they are
listed, and is most useful when a query's filter applies conditions, such as filters and joins, that
use a prefix of the sort keys?

Choose the correct answer:

A. Interleaved
B. Primary
C. Sort
D. Compound

Answer: D
Explanation:
Reference:
http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html

QUESTION 218
What is the easiest way to put data into Elasticsearch? Choose the correct answer:

A. Kinesis
B. Data Pipeline
C. S3 Import
D. Redshift

Answer: A
Explanation:
Amazon Kinesis Firehose, the easiest way to load streaming data into AWS, now supports
Amazon Elasticsearch Service as a data delivery destination.
Reference:
https://aws.amazon.com/about-aws/whats-new/2016/04/amazon-kinesis-firehose-adds-amazon-
elasticsearch-data-ingestion-and-enhanced-monitoring-features/
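As a sketch (the delivery stream name is hypothetical and assumes a Firehose stream already configured with an Amazon ES destination), producers simply put records and Firehose handles delivery to Elasticsearch:

import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

event = {"user": "anna", "action": "search", "query": "red shoes"}

# Firehose buffers the record and delivers it to the configured Amazon ES index.
firehose.put_record(
    DeliveryStreamName="clickstream-to-es",  # hypothetical delivery stream
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)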

QUESTION 219
Indexing is adding shards to ElasticSearch
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Indexing is adding records (documents) to an Elasticsearch index, not adding shards.

QUESTION 220
Amazon Elasticsearch Service supports integration with Kibana Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Amazon Elasticsearch Service supports integration with Kibana
Reference:
https://aws.amazon.com/elasticsearch-service/faqs/

QUESTION 221
An Elasticsearch cluster with six master nodes is optimal.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Three dedicated master nodes is the recommended (optimal) number.

QUESTION 222
The Amazon Elasticsearch Service is integrated with Amazon CloudWatch to produce metrics
that provide information about the state of the domains.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
The Amazon Elasticsearch Service is integrated with Amazon CloudWatch to produce metrics
that provide information about the state of the domains.
Reference:
https://aws.amazon.com/elasticsearch-service/faqs/

QUESTION 223
Amazon ES offers the following benefits of a managed service. Select all that apply.
Choose the 4 correct answers:

A. Data durability
B. Cluster scaling options
C. Self-healing clusters
D. Replication for high availability

Answer: ABCD
Explanation:
Amazon ES offers the following benefits of a managed service: cluster scaling options, self-healing clusters, replication for high availability, data durability, enhanced security, and node monitoring.
Reference:
http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/what-is-amazon-
elasticsearch-service.html

QUESTION 224
Elasticsearch has cluster node allocation across two Availability Zones in the same region, known
as zone awareness
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Reference:
http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/what-is-amazon-
elasticsearch-service.html

QUESTION 225
Select the available CloudWatch metrics for Elasticsearch:
Choose the 3 correct answers:

A. ClusterStatus
B. NodeStatus
C. Nodes
D. SearchableDocuments

Answer: ACD
Explanation:
Reference:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/es-metricscollected.html
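A quick way to confirm which metrics are being collected for a domain is to list the AWS/ES namespace in CloudWatch; a minimal boto3 sketch follows (the domain name is hypothetical):

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# ClusterStatus.*, Nodes, SearchableDocuments, FreeStorageSpace, etc. show up here.
paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(
    Namespace="AWS/ES",
    Dimensions=[{"Name": "DomainName", "Value": "my-search-domain"}],  # hypothetical domain
):
    for metric in page["Metrics"]:
        print(metric["MetricName"])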

QUESTION 226
Elasticsearch automated snapshots are kept for 14 days.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Amazon Elasticsearch Service will retain the last 14 days worth of automated daily snapshots.
Reference:
https://aws.amazon.com/elasticsearch-service/faqs/

QUESTION 227
Select all the true statements regarding S3 snapshots with Elasticsearch:
Choose the 2 correct answers:

A. Manual snapshots are stored in your S3 bucket and will incur relevant Amazon S3 usage
charges.
B. There is no charge for automated or manual S3 snapshots.
C. There is an additional charge for automated S3 snapshots.
D. There is no additional charge for the automated daily S3 snapshots.

Answer: AD
Explanation:
There is no additional charge for the automated daily snapshots. The snapshots are stored for
free in an Amazon Elasticsearch Service S3 bucket and will be made available for node recovery
purposes. You can use the Elasticsearch snapshot API to create additional manual snapshots in
addition to the daily-automated snapshots created by Amazon Elasticsearch Service. The manual
snapshots are stored in your S3 bucket and will incur relevant Amazon S3 usage charges.
Reference:
https://aws.amazon.com/elasticsearch-service/faqs/

QUESTION 228
Amazon Athena provides the easiest way to run ad-hoc queries for data in S3 without the need to
set up or manage any servers.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
Amazon Athena provides the easiest way to run ad-hoc queries for data in S3 without the need to
set up or manage any servers.
Reference:
https://aws.amazon.com/athena/faqs/

QUESTION 229
What is SerDe?
Choose the correct answer:

A. Serializer/Deserializer, which are libraries that tell EMR how to interpret data formats
B. Serializer/Deserializer, which are libraries that tell Athena how to interpret data formats
C. Serializer/Deserializer, which are libraries that tell Hive how to interpret data formats
D. None of the above

Answer: C
Explanation:
SerDe stands for Serializer/Deserializer, which are libraries that tell Hive how to interpret data
formats. Hive DDL statements require you to specify a SerDe, so that the system knows how to
interpret the data that you're pointing to. Amazon Athena uses SerDes to interpret the data read
from Amazon S3.
Reference:
https://aws.amazon.com/athena/faqs/

QUESTION 230
Select the true statements regarding Athena.
Choose the 2 correct answers:

A. Can write complex queries with JOINS


B. Offers full control over the configuration of your clusters and the software installed on them.
C. Allows for data partitioning
D. Athena is ideal for using with custom code to process and analyze extremely large datasets with
the latest big data processing

Answer: AC
Explanation:

Amazon Athena allows you to partition your data on any column. Partitions allow you to limit the
amount of data each query scans, leading to cost savings and faster performance. You can
specify your partitioning scheme using the PARTITIONED BY clause in the CREATE TABLE
statement. Amazon Athena is ideal for quick, ad-hoc querying and integrates with Amazon QuickSight for easy visualization, but it can also handle complex analysis, including large joins, window functions, and arrays.
Reference:
https://aws.amazon.com/athena/faqs/
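A minimal sketch of both ideas together (bucket names and columns are hypothetical): the CREATE TABLE names a JSON SerDe so Athena knows how to interpret the files, and PARTITIONED BY lets queries prune by date:

import boto3

athena = boto3.client("athena", region_name="us-east-1")

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  request_time string,
  user_id      string,
  url          string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-log-bucket/web-logs/'
"""

# Athena runs the DDL like any other query; results land in the output location.
athena.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)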

QUESTION 231
You can submit many queries at a time with Athena.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
Currently, you can only submit one query at a time and you can only have 5 (five) concurrent
queries at one time per account.
Reference:
http://docs.aws.amazon.com/athena/latest/ug/service-limits.html

QUESTION 232
What is the default query timeout with Athena?
Choose the correct answer:

A. 30 minutes
B. 5 minutes
C. 60 minutes
D. There is none

Answer: A
Explanation:
The Athena service limits specify a default DML query timeout of 30 minutes.
Reference:
http://docs.aws.amazon.com/athena/latest/ug/service-limits.html

QUESTION 233
Select all data sources that Amazon QuickSight supports.
Choose the 3 correct answers:

A. Aurora
B. RDS
C. Athena
D. DynamoDB

Answer: ABC
Explanation:
You can connect to AWS data sources , including Amazon RDS, Amazon Aurora, Amazon
Redshift, Amazon Athena and Amazon S3. You can also upload Excel spreadsheets or flat files
(CSV, TSV, CLF, and ELF), connect to on-premises databases like SQL Server, MySQL and
PostgreSQL and import data from SaaS applications like Salesforce.
Reference:
https://quicksight.aws/faq/

QUESTION 234
Which QuickSight visual should you use to visualize one or two measures for a dimension?
Choose the correct answer:

A. Tree Maps
B. Heat Map
C. Autograph
D. Pie Graphs

Answer: A
Explanation:
Use tree maps to visualize one or two measures for a dimension.
Reference:
http://docs.aws.amazon.com/quicksight/latest/user/tree-map.html

QUESTION 235
The max data file size in QuickSight is?
Choose the correct answer:

A. 5 GB
B. 1 GB
C. 10 GB
D. 2 GB

Answer: B
Explanation:
Any file you import into SPICE must be 1 GB or smaller, whether that file is local or coming from
Amazon S3. If you are retrieving multiple files from Amazon S3, the total size of the files specified
in the manifest file cannot exceed 10 GB, and the total number of files specified in the manifest
file cannot exceed 1000. Files can have up to 200 columns, with up to 25400 characters per row.
It does not matter how characters are distributed across the fields (with the exception of the field
length limitation of 511 characters) as long as they do not exceed this total.
Reference:
http://docs.aws.amazon.com/quicksight/latest/user/data-source-limits.html

QUESTION 236
What is Amazon QuickSight's super-fast, parallel, in-memory calculation engine? Choose the
correct answer:

A. Athena
B. SPICE
C. HIVE
D. EMR

Answer: B
Explanation:
SPICE is Amazon QuickSight's Super-fast, Parallel, In-memory Calculation Engine. SPICE is
engineered from the ground up to rapidly perform advanced calculations and serve data.
Importing your data into SPICE speeds your analytical queries by allowing you to take advantage
of its storage and processing capacity. It also allows you to avoid taking time to retrieve data from
a data source whenever you make a change to an analysis or update a visual.
Reference:
http://docs.aws.amazon.com/quicksight/latest/user/welcome.html

QUESTION 237
You can share data sets with other QuickSight users.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
You can give other Amazon QuickSight users access to a data set by sharing it with them. Any
user you share the data set with can create analyses from it. If you make a user an owner when
you share the data set with them, then that user can also refresh, edit, delete, or re-share the
data set.
Reference:
http://docs.aws.amazon.com/quicksight/latest/user/sharing-data-sets.html

QUESTION 238
You should always use root credentials.
Choose the correct answer:

A. True
B. False

Answer: B
Explanation:
You use an access key (an access key ID and secret access key) to make programmatic
requests to AWS. However, do not use your AWS account (root) access key. The access key for
your AWS account gives full access to all your resources for all AWS services, including your
billing information. You cannot restrict the permissions associated with your AWS account access
key.
Reference:
http://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html

QUESTION 239
In transit encryption to EMR from S3 is done with TLS.
Choose the correct answer:

A. True
B. False

Answer: A

Explanation:
Amazon S3 encryption works with EMR File System (EMRFS) objects read from and written to
Amazon S3. You specify Amazon S3 server-side encryption (SSE) or client-side encryption
(CSE) when you enable at-rest encryption. Amazon S3 SSE and CSE encryption with EMRFS
are mutuallyexclusive; you can choose either but not both. Regardless of whether Amazon S3
encryption is enabled, transport layer security (TLS) encrypts the EMRFS objects in-transit
between Amazon EMR cluster nodes and Amazon S3. For in-depth information about Amazon
S3 encryption, see Protecting Data Using Encryption in theAmazon Simple Storage Service
Developer Guide.
Reference:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-data-encryption-options.html

QUESTION 240
Data already in S3 can be encrypted at rest with which of the following? Select two.
Choose the 2 correct answers:

A. CSE-KMS
B. CSE-C
C. SSE-S3
D. SSE-KMS

Answer: CD
Explanation:
Server-side encryption is about protecting data at rest. Server-side encryption with Amazon S3-
managed encryption keys (SSE-S3) employs strong multi-factor encryption. Amazon S3 encrypts
each object with a unique key. Server-side encryption is about protecting data at rest. AWS Key
Management Service (AWS KMS) is a service that combines secure, highly available hardware
and software to provide a key management system scaled for the cloud. AWS KMS uses
customer master keys (CMKs) to encrypt your Amazon S3 objects.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html
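Because objects already in S3 are immutable, the usual way to apply SSE-S3 or SSE-KMS to existing data is to copy each object onto itself with the encryption header set; a sketch with hypothetical bucket and key names:

import boto3

s3 = boto3.client("s3")

# Re-write the object in place, this time encrypted with SSE-KMS.
s3.copy_object(
    Bucket="my-data-bucket",
    Key="reports/2017/q1.csv",
    CopySource={"Bucket": "my-data-bucket", "Key": "reports/2017/q1.csv"},
    ServerSideEncryption="aws:kms",  # use "AES256" instead for SSE-S3
)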

QUESTION 241
Select all that apply for IAM best practice:
Choose the 2 correct answers:

A. Grant least privilege
B. Manage permissions individually
C. Manage permissions with groups
D. Grant most privilege

Answer: AC
Explanation:
When you create IAM policies, follow the standard security advice of granting least privilege--that
is, granting only the permissions required to perform a task. Determine what users need to do
and then craft policies for them that let the users perform only those tasks. Instead of defining
permissions for individual IAM users, it's usually more convenient to create groups that relate to
job functions (administrators, developers, accounting, etc.). Next, define the relevant permissions
for each group. Finally, assign IAM users to those groups. All the users in an IAM group inherit
the permissions assigned to the group. That way, you can make changes for everyone in a group
in just one place. As people move around in your company, you can simply change what IAM
group their IAM user belongs to.

Reference:
http://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege
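A minimal boto3 sketch of both practices (group, policy, and user names are hypothetical): a group for the analytics team with a least-privilege, read-only policy on one bucket prefix, and users attached to the group rather than given individual permissions:

import json
import boto3

iam = boto3.client("iam")

iam.create_group(GroupName="analytics")

# Least privilege: read-only, and only on the analytics prefix of one bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::company-data-lake",
            "arn:aws:s3:::company-data-lake/analytics/*",
        ],
    }],
}
iam.put_group_policy(
    GroupName="analytics",
    PolicyName="analytics-read-only",
    PolicyDocument=json.dumps(policy),
)

iam.add_user_to_group(GroupName="analytics", UserName="alice")  # hypothetical user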

QUESTION 242
Select all the best practices for CloudTrail:
Choose the 2 correct answers:

A. Enable in only one region


B. Integrate with Data Pipeline
C. Integrate with CloudWatch Logs
D. Enable in all regions

Answer: CD
Explanation:
One of the ways that you can work with CloudTrail logs is to monitor them by sending them to
CloudWatch Logs. For a trail that is enabled in all regions in your account, CloudTrail sends log
files from all those regions to a CloudWatch Logs log group. You can configure CloudTrail to
deliver log files from multiple regions to a single S3 bucket for a single account.
Reference:
http://docs.aws.amazon.com/awscloudtrail/latest/userguide/monitor-cloudtrail-log-files-with-
cloudwatch-logs.html
http://docs.aws.amazon.com/awscloudtrail/latest/userguide/receive-cloudtrail-log-files-from-
multiple-regions.html
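A sketch of both best practices in one call (bucket, log group, and role ARNs are hypothetical): a single trail that records all regions and forwards events to CloudWatch Logs:

import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

cloudtrail.create_trail(
    Name="org-wide-trail",
    S3BucketName="my-cloudtrail-bucket",
    IsMultiRegionTrail=True,
    CloudWatchLogsLogGroupArn="arn:aws:logs:us-east-1:123456789012:log-group:CloudTrail/logs:*",
    CloudWatchLogsRoleArn="arn:aws:iam::123456789012:role/CloudTrail_CloudWatchLogs_Role",
)

# The trail does not record events until logging is started.
cloudtrail.start_logging(Name="org-wide-trail")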

QUESTION 243
A company needs to deploy a data lake solution for their data scientists in which all company
data is accessible and stored in a central S3 bucket. The company segregates the data by
business unit, using specific prefixes. Scientists can only access the data from their own business
unit. The company needs a single sign-on identity and management solution based on Microsoft
Active Directory (AD) to manage access to the data in Amazon S3.

Which method meets these requirements?

A. Use AWS IAM Federation functions and specify the associated role based on the users' groups in
AD.
B. Create bucket policies that only allow access to the authorized prefixes based on the users' group
name in Active Directory.
C. Deploy the AD Synchronization service to create AWS IAM users and groups based on AD
information.
D. Use Amazon S3 API integration with AD to impersonate the users on access in a transparent
manner.

Answer: A
Explanation:
Identity Federation allows organizations to associate temporary credentials to users authenticated
through an external identity provider such as Microsoft Active Directory (AD). These temporary
credentials are linked to AWS IAM roles that grant access to the S3 bucket. Option B does not
work because bucket policies are linked to IAM principles and cannot recognize AD attributes.
Option C does not work because AD Synchronization will not sync directly with AWS IAM, and
custom synchronization would not result in Amazon S3 being able to see group information. D
isn't possible because there is no feature to integrate Amazon S3 directly with external identity
providers.

QUESTION 244
A customer needs to load a 550-GB data file into an Amazon Redshift cluster from Amazon S3,
using the COPY command. The input file has both known and unknown issues that will probably
cause the load process to fail. The customer needs the most efficient way to detect load errors
without performing any cleanup if the load process fails.

Which technique should the customer use?

A. Split the input file into 50-GB blocks and load them separately.
B. Use COPY with NOLOAD parameter.
C. Write a script to delete the data from the tables in case of errors.
D. Compress the input file before running COPY.

Answer: B
Explanation:
From the AWS Documentation for NOLOAD: NOLOAD checks the integrity of all of the data
without loading it into the database. The NOLOAD option displays any errors that would occur if
you had attempted to load the data. All other options will require subsequent processing on the
cluster which will consume resources.
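A sketch of the validation run (endpoint, credentials, table, bucket, and role are all hypothetical; any SQL client works, psycopg2 is shown): NOLOAD parses and checks every record but writes nothing, so there is no cleanup if errors are found:

import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="loader", password="...",
)
with conn, conn.cursor() as cur:
    # Validate the 550 GB file without loading any rows.
    cur.execute("""
        COPY sales
        FROM 's3://my-ingest-bucket/sales/large_file.gz'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        GZIP NOLOAD;
    """)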

QUESTION 245
An organization needs a data store to handle the following data types and access patterns:

Key-value access pattern


Complex SQL queries and transactions
Consistent reads
Fixed schema

Which data store should the organization choose?

A. Amazon S3
B. Amazon Kinesis
C. Amazon DynamoDB
D. Amazon RDS

Answer: D
Explanation:
Amazon RDS handles all these requirements, and although Amazon RDS is not typically thought
of as optimized for key-value based access, a schema with a good primary key selection can
provide this functionality. Amazon S3 provides no fixed schema and does not have consistent
read after PUT support. Amazon Kinesis supports streaming data that is consistent as of a given
sequence number but doesn't provide key/value access. Finally, although Amazon DynamoDB
provides key/value access and consistent reads, it does not support SQL- based queries.
?The core of this question is how to send event messages to Kinesis synchronously vs.
asynchronously.

QUESTION 246
A web application emits multiple types of events to Amazon Kinesis Streams for operational
reporting. Critical events must be captured immediately before processing can continue, but
informational events do not need to delay processing.

What is the most appropriate solution to record these different types of events?

A. Log all events using the Kinesis Producer Library.


B. Log critical events using the Kinesis Producer Library, and log informational events using the
PutRecords API method.
C. Log critical events using the PutRecords API method, and log informational events using the
Kinesis Producer Library.
D. Log all events using the PutRecords API method.

Answer: C
Explanation:
The core of this question is how to send event messages to Kinesis synchronously vs. asynchronously. The critical events must be sent synchronously, and the informational events can be sent
asynchronously. The Kinesis Producer Library (KPL) implements an asynchronous send function,
so it can be used for the informational messages. PutRecords is a synchronous send function, so
it must be used for the critical events.
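A sketch of the synchronous path for critical events (stream and field names are hypothetical): the application blocks on the PutRecords response and checks FailedRecordCount before moving on, while informational events would go through the KPL's buffered, asynchronous send:

import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

critical_events = [
    {"type": "payment_failed", "order_id": "1001"},
    {"type": "fraud_alert", "order_id": "1002"},
]

# Synchronous: the call returns only after Kinesis has accepted (or rejected) each record.
resp = kinesis.put_records(
    StreamName="operational-events",  # hypothetical stream
    Records=[
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": e["order_id"]}
        for e in critical_events
    ],
)
if resp["FailedRecordCount"] > 0:
    raise RuntimeError("some critical events were not recorded")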

QUESTION 247
A company logs data from its application in large files and runs regular analytics of these logs to
support internal reporting for three months after the logs are generated. After three months, the
logs are infrequently accessed for up to a year. The company also has a regulatory control
requirement to store application logs for seven years.

Which course of action should the company take to achieve these requirements in the most cost-
efficient way?

A. Store the files in S3 Glacier with a Deny Delete vault lock policy for archives less than seven
years old and a vault access policy that restricts read access to the analytics IAM group and write
access to the log writer service role.
B. Store the files in S3 Standard with a lifecycle policy to transition the storage class to Standard - IA
after three months. After a year, transition the files to Glacier and add a Deny Delete vault lock
policy for archives less than seven years old.
C. Store the files in S3 Standard with lifecycle policies to transition the storage class to Standard-IA
after three months and delete them after a year. Simultaneously store the files in Amazon Glacier
with a Deny Delete vault lock policy for archives less than seven years old.
D. Store the files in S3 Standard with a lifecycle policy to remove them after a year. Simultaneously
store the files in Amazon S3 Glacier with a Deny Delete vault lock policy for archives less than
seven years old.

Answer: C
Explanation:
The goal is to use the most cost-effective storage while ensuring that the regulatory control is met. The lifecycle policy will store the objects on S3 Standard during the three months of active use, and then move the objects to S3 Standard-IA when access will be infrequent. That narrows the possible answer set to B and C. The Deny Delete vault lock policy will ensure that the regulatory policy is met, but that policy must be applied over the entire lifecycle of the object, not just after it is moved to Glacier after the first year. Option C has the Deny Delete vault lock applied over the entire lifecycle of the object and is the right answer.
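A sketch of the lifecycle part of option C (bucket and prefix are hypothetical): S3 Standard for the first three months, Standard-IA afterwards, and removal from S3 after a year, while the Glacier copy with the vault lock covers the seven-year requirement:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="application-logs",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "log-retention",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            # Infrequently accessed after three months, gone from S3 after a year.
            "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
            "Expiration": {"Days": 365},
        }]
    },
)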

QUESTION 248
A data engineer needs to architect a data warehouse for an online retail company to store historic
purchases. The data engineer needs to use Amazon Redshift. To comply with PCI:DSS and meet
corporate data protection standards, the data engineer must ensure that data is encrypted at rest
and that the keys are managed by a corporate on-premises HSM.

Which approach meets these requirements in the most cost-effective manner?

A. Create a VPC, and then establish a VPN connection between the VPC and the on-premises
network. Launch the Amazon Redshift cluster in the VPC, and configure it to use your corporate
HSM.
B. Use the AWS CloudHSM service to establish a trust relationship between the CloudHSM and the
corporate HSM over a Direct Connect connection. Configure Amazon Redshift to use the
CloudHSM device.
C. Configure the AWS Key Management Service to point to the corporate HSM device, and then
launch the Amazon Redshift cluster with the KMS managing the encryption keys.
D. Use AWS Import/Export to import the corporate HSM device into the AWS Region where the
Amazon Redshift cluster will launch, and configure Redshift to use the imported HSM.

Answer: A
Explanation:
Amazon Redshift can use an on-premises HSM for key management over the VPN, which ensures that the encryption keys are locally managed. Option B is possible: CloudHSM can cluster to an on-premises HSM. But then key management could be performed on either the on-premises HSM or CloudHSM, and that doesn't meet the design goal. Option C does not describe a valid feature of KMS and violates the requirement for the corporate HSM to manage the keys, even if it were possible. Option D is not possible because you cannot put hardware into an AWS Region.

QUESTION 249
In regard to DynamoDB, when you create a table with a hash-and-range key
_________________

A. You can optionally define one or more secondary indexes on that table
B. You must define one or more secondary indexes on that table
C. You must define one or more Global secondary indexes on that table
D. You must define one or more Local secondary indexes on that table

Answer: A
Explanation:
When you create a table with a hash-and-range key, you can optionally define one or more
secondary indexes on that table. A secondary index lets you query the data in the table using an
alternate key, in addition to queries against the primary key.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html

QUESTION 250
Amazon DynamoDB supports these scalar data types: ______________.

A. Number and String


B. Number and Binary
C. Number, String, and Binary
D. Number, String, Binary and Datetime

Answer: C
Explanation:
Amazon DynamoDB supports three scalar data types: Number, String, and Binary. Additionally,
Amazon DynamoDB supports multi-valued types: Number Set, String Set, and Binary Set.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

QUESTION 251
True or false: In DynamoDB, it is up to you to manage the partitioning and re-partitioning of your
data over additional DynamoDB tables if you need additional scale.

A. True, It is optional to re-partition by yourself or automatically.


B. False, DynamoDB automatically partitions your data and workload.
C. False, the table size is fixed and you cannot re-partition it.
D. True, AWS DynamoDB does automatic partitioning and SSD technologies to meet your
throughput requirements and deliver low latencies at any scale.

Answer: B
Explanation:
Amazon DynamoDB automatically partitions your data and workload over a sufficient number of
servers to meet the scale requirements you provide.
Reference:
https://aws.amazon.com/dynamodb/faqs/

QUESTION 252
Complete this statement: "When you load your table directly from an Amazon ___________
table, you have the option to control the amount of provisioned throughput you consume."

A. DataPipeline
B. S3
C. DynamoDB
D. RDS

Answer: C
Explanation:
When you load your table directly from an Amazon DynamoDB table, you have the option to
control the amount of Amazon DynamoDB provisioned throughput you consume.
Reference:
http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.
html

QUESTION 253
In DynamoDB, which of the following operations is not possible by the console?

A. Copying an item
B. Updating an item
C. Deleting an item
D. Blocking an item

Answer: D
Explanation:
By using the console to manage DynamoDB, you can perform the following: adding an item,
deleting an item, updating an item, and copying an item.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AddUpdateDeleteItems.ht
ml

QUESTION 254
DynamoDB uses _____ only as a transport protocol, not as a storage format.

A. JSON
B. XML
C. SGML
D. WDDX

Answer: A
Explanation:
DynamoDB uses JSON only as a transport protocol, not as a storage format. The AWS SDKs use
JSON to send data to DynamoDB, and DynamoDB responds with JSON, but DynamoDB does
not store data persistently in JSON format.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.LowLevelAP
I.html

QUESTION 255
When you create a table in DynamoDB, which one of the following pieces of information is not required?

A. Units of Capacity required for reads


B. Range key
C. Hash key
D. Units of Capacity required for writes

Answer: B
Explanation:
To create a table in DynamoDB, you should provide the table name, the attribute name for the
primary Hash key, as well as throughput units, read and write units capacity.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStartedCreateTable
s.html
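A minimal boto3 sketch (table and attribute names are hypothetical) showing what is required: the table name, the hash key with its attribute definition, and the provisioned read and write capacity units; a range key is optional:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="GameScores",
    AttributeDefinitions=[{"AttributeName": "UserId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "UserId", "KeyType": "HASH"}],  # no range key needed
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)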

QUESTION 256
A data engineer needs to collect data from multiple Amazon Redshift clusters within a business
and consolidate the data into a single central data warehouse. Data must be encrypted at all
times while at rest or in flight.

What is the most scalable way to build this data collection process?

A. Run an ETL process that connects to the source clusters using SSL to issue a SELECT query for
new data, and then write to the target data warehouse using an INSERT command over another
SSL secured connection.
B. Use AWS KMS data key to run an UNLOAD ENCRYPTED command that stores the data in an
unencrypted S3 bucket; run a COPY command to move the data into the target cluster.
C. Run an UNLOAD command that stores the data in an S3 bucket encrypted with an AWS KMS
data key; run a COPY command to move the data into the target cluster.
D. Connect to the source cluster over an SSL client connection, and write data records to Amazon
Kinesis Firehose to load into your target data warehouse.

Answer: B
Explanation:
The most scalable solutions are the UNLOAD/COPY solutions because they will work in parallel,
which eliminates A and D as answers. Option C is incorrect because the data would not be
encrypted in flight, and you cannot encrypt an entire bucket with a KMS key. Option B meets the encryption requirements: the UNLOAD ENCRYPTED command automatically stores the data encrypted using client-side encryption and uses HTTPS to encrypt the data during the transfer to S3.
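A sketch of one way to wire this up (key alias, roles, bucket, and table names are hypothetical): generate a data key with KMS, pass it as the client-side master symmetric key to UNLOAD ... ENCRYPTED on the source cluster, then reuse the same key for the COPY on the target cluster:

import base64
import boto3

kms = boto3.client("kms", region_name="us-east-1")

# A one-time data key; Redshift expects it base64-encoded.
key = kms.generate_data_key(KeyId="alias/redshift-transfer", KeySpec="AES_256")
master_key = base64.b64encode(key["Plaintext"]).decode("ascii")

unload_sql = (
    "UNLOAD ('SELECT * FROM sales') TO 's3://staging-bucket/sales/' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/UnloadRole' "
    f"MASTER_SYMMETRIC_KEY '{master_key}' ENCRYPTED;"
)
copy_sql = (
    "COPY sales FROM 's3://staging-bucket/sales/' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/CopyRole' "
    f"MASTER_SYMMETRIC_KEY '{master_key}' ENCRYPTED;"
)
# unload_sql runs on the source cluster and copy_sql on the target,
# each over an SSL connection from any SQL client.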

QUESTION 257
In the DynamoDB console, you can choose the ___ tab to view some key CloudWatch metrics for
a selected table.

A. Browse Items
B. Details
C. Alarms
D. Metrics

Answer: D
Explanation:
In the DynamoDB console, you can choose the Metrics tab to view some key CloudWatch metrics
for a selected table.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/MonitoringConsole_Dyna
moDB.html

QUESTION 258
Which of the following statements is correct of Amazon DynamoDB?

A. Data in DynamoDB cannot be shifted to Amazon Redshift. Instead, data can be shifted to
Amazon CloudWatch.
B. Every item in a DynamoDB table is identified by a foreign key, which allows you to quickly access
data items.
C. DynamoDB does not support multiple native data types (numbers, strings, binaries, and multi-
valued attributes).
D. DynamoDB does not have a fixed schema. Instead, each data item may have a different number
of attributes.

Answer: D
Explanation:
DynamoDB does not have a fixed schema. Instead, each data item may have a different number
of attributes. Multiple data types (strings, numbers, binary, and sets) add richness to the data
model.

Reference: http://awsdocs.s3.amazonaws.com/dynamodb/latest/dynamodb-dg.pdf

QUESTION 259
What kind of service is provided by AWS DynamoDB?

A. Relational Database
B. Document Database
C. Dynamic Database
D. NoSQL Database

Answer: D
Explanation:
DynamoDB is a fast, fully managed NoSQL database service.
Reference: http://aws.amazon.com/dynamodb/

QUESTION 260
True or false: For an online game, it is better to use DynamoDB than a relational database
engine.

A. This is true because DynamoDB fully manages your database and can grow with your application
requirements.
B. This is false.
C. This is true because DynamoDB has all the functionalities of a relational database.
D. This is false because a relational database engine on Redshift has more functionalities than
DynamoDB.

Answer: A
Explanation:
This is true. An online game might start out with only a few thousand users and a light database
workload consisting of 10 writes per second and 50 reads per second. However, if the game
becomes successful, it may rapidly grow to millions of users and generate tens (or even
hundreds) of thousands of writes and reads per second. It may also create terabytes or more of
data per day. Developing your applications against Amazon DynamoDB enables you to start
small and simply dial-up your request capacity for a table as your requirements scale, without
incurring downtime. Amazon DynamoDB gives you the peace of mind that your database is fully
managed and can grow with your application requirements.
Reference: http://aws.amazon.com/dynamodb/faqs/

QUESTION 261
In regard to DynamoDB, for which one of the following parameters does Amazon not charge you?

A. I/O usage within the same Region


B. Cost per provisioned write units
C. Storage cost
D. Cost per provisioned read units

Answer: A
Explanation:
In DynamoDB, you will be charged for the storage and the throughput you use rather than for the
I/O which has been used.

Reference: http://aws.amazon.com/dynamodb/pricing/

QUESTION 262
Which of the following does Amazon DynamoDB perform?

A. Neither increment nor decrement operations


B. Only increment on vector values
C. Atomic increment or decrement on scalar values
D. Only atomic decrement operations

Answer: C
Explanation:
Amazon DynamoDB allows atomic increment and decrement operations on scalar values.
Reference: http://aws.amazon.com/dynamodb/faqs/
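A minimal boto3 sketch (table, key, and attribute names are hypothetical): the ADD action in an update expression increments a Number attribute atomically, and a negative value decrements it:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Atomically add 10 to the user's score; use "-10" to decrement.
dynamodb.update_item(
    TableName="GameScores",
    Key={"UserId": {"S": "anna"}},
    UpdateExpression="ADD score :delta",
    ExpressionAttributeValues={":delta": {"N": "10"}},
)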

QUESTION 263
A gaming company comes to you and asks you to build them infrastructure for their site. They are
not sure how big they will be as with all start ups they have limited money and big ideas. What
they do tell you is that if the game becomes successful, like one of their previous games, it may
rapidly grow to millions of users and generate tens (or even hundreds) of thousands of writes and
reads per second. After considering all of this, you decide that they need a fully managed NoSQL
database service that provides fast and predictable performance with seamless scalability. Which
of the following databases do you think would best fit their needs?

A. Amazon DynamoDB
B. Amazon SimpleDB
C. Amazon Redshift
D. Any non-relational database.

Answer: A
Explanation:
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and
predictable performance with seamless scalability. Amazon DynamoDB enables customers to
offload the administrative burdens of operating and scaling distributed databases to AWS, so they
don't have to worry about hardware provisioning, setup and configuration, replication, software
patching, or cluster scaling.
Today's web-based applications generate and consume massive amounts of data. For example,
an online game might start out with only a few thousand users and a light database workload
consisting of 10 writes per second and 50 reads per second. However, if the game becomes
successful, it may rapidly grow to millions of users and generate tens (or even hundreds) of
thousands of writes and reads per second. It may also create terabytes or more of data per day.
Developing your applications against Amazon DynamoDB enables you to start small and simply
dial-up your request capacity for a table as your requirements scale, without incurring downtime.
You pay highly cost-efficient rates for the request capacity you provision, and let Amazon
DynamoDB do the work of partitioning your data and traffic over sufficient server capacity to
meet your needs. Amazon DynamoDB does the database management and administration, and
you simply store and request your data. Automatic replication and failover provides built-in fault
tolerance, high availability, and data durability. Amazon DynamoDB gives you the peace of mind
that your database is fully managed and can grow with your application requirements.
Reference: http://aws.amazon.com/dynamodb/faqs/

QUESTION 264
Could you use IAM to grant access to Amazon DynamoDB resources and API actions?

A. Yes
B. Depended to the type of access
C. No
D. In DynamoDB there is no need to grant access

Answer: A
Explanation:
Amazon DynamoDB integrates with AWS Identity and Access Management (IAM). You can use
AWS IAM to grant access to Amazon DynamoDB resources and API actions. To do this, you first
write an AWS IAM policy, which is a document that explicitly lists the permissions you want to
grant. You then attach that policy to an AWS IAM user or role.
Reference:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/UsingIAMWithDDB.html

QUESTION 265
Which of the statements below best define Amazon Kinesis Streams? - (choose three)

A. Each record in the stream has a sequence number that is assigned by Kinesis Streams.
B. An ordered sequence of data records
C. A record is the unit of data stored in the Amazon Kinesis Stream
D. A random sequence of data records

Answer: ABC
Explanation:
An Amazon Kinesis Stream is an ordered sequence of data records. Each record in the stream
has a sequence number that is assigned by Kinesis Streams.
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html

QUESTION 266
An administrator decides to use the Amazon Machine Learning service to classify social media
posts that mention your company into two categories: posts that require a response and posts
that do not. The training dataset of 10,000 posts contains the details of each post, including the
timestamp, author, and full text of the post. You are missing the target labels that are required for
training.

Which two options will create valid target label data?

A. Ask the social media handling team to review each post and provide the label.
B. Use the sentiment analysis NLP library to determine whether a post requires a response.
C. Use the Amazon Mechanical Turk web service to publish Human Intelligence Tasks that ask Turk
workers to label the posts.
D. Using the a priori probability distribution of the two classes, use Monte-Carlo simulation to
generate the labels.

Answer: AC
Explanation:
You need accurate data to train the service and get accurate results from future data. The options
described in B and D would end up training an ML model using the output from a different
machine learning model and therefore would significantly increase the possible error rate. It is
extremely important to have a very low error rate (if any!) in your training set, and therefore
human-validated or assured labels are essential.

QUESTION 267
A mobile application collects data that must be stored in multiple Availability Zones within five
minutes of being captured in the app.

What architecture securely meets these requirements?

A. The mobile app should write to an S3 bucket that allows anonymous PutObject calls.
B. The mobile app should authenticate with an Amazon Cognito identity that is authorized to write to
an Amazon Kinesis Firehose with an Amazon S3 destination.
C. The mobile app should authenticate with an embedded IAM access key that is authorized to write
to an Amazon Kinesis Firehose with an Amazon S3 destination.
D. The mobile app should call a REST-based service that stores data on Amazon EBS. Deploy the
service on multiple EC2 instances across two Availability Zones.

Answer: B
Explanation:
It is essential when writing mobile applications that you consider the security of both how the application authenticates and how it stores credentials. Option A uses an anonymous Put, which may allow other apps to write counterfeit data. Option B is the right answer, because using Amazon Cognito
gives you the ability to securely authenticate pools of users on any type of device at scale. Option
C would put credentials directly into the application, which is strongly discouraged because
applications can be decompiled which can compromise the keys. Option D does not meet our
availability requirements: although the EC2 instances are running in different Availability Zones,
the EBS volumes attached to each instance only store data in a single Availability Zone.

QUESTION 268
Which of the following statements about Amazon Kinesis streams are correct? (choose three)

A. Each shard can support up to 1,000 records per second for writes and up to a maximum total data write rate of one megabyte per second, including partition keys.
B. You cannot increase or decrease the number of shards in a stream.
C. Each shard can support up to five transactions per second for reads
D. Each shard can support up to a maximum total data read rate of two megabytes per second.

Answer: ACD

QUESTION 269
Which network connections are used by AWS Snowball to minimize data transfer times?

A. both thinnet and thicknet copper cables


B. both RJ45 and SFP+ with either a fiber or copper interface
C. both USB and Ethernet cables with special adapters
D. both UTP and STP copper cables

Answer: B

Explanation:
An AWS Snowball appliance has a 10GBaseT network connection (both RJ45 as well as SFP+
with either a fiber or copper interface) to minimize data transfer times.
Reference: https://aws.amazon.com/snowball/details/

QUESTION 270
You want to export objects in an S3 bucket using the AWS Snowball Management Console.
Assume you have a bucket containing the following objects, sorted in UTF-8 binary order:

01
Aardvark
Aardwolf
Aasvogel/apple
Aasvogel/banana
Aasvogel/cherry
Banana
Car

What happens if you specify no beginning range and set Aasvogel as the ending range?

A. No objects are exported; you must specify a beginning range.


B. Objects Aasvogel/apple, Aasvogel/banana and Aasvogel/cherry are exported.
C. Objects 01, Aardvark, and Aardwolf are exported.
D. Objects 01, Aardvark, Aardwolf, Aasvogel/apple, Aasvogel/banana and Aasvogel/cherry are
exported.

Answer: D
Explanation:
When you create an export job in the AWS Snowball Management Console, you can choose to
export an entire Amazon S3 bucket or a specific range of objects keys. To export a range, you
define the length of the range by providing either an inclusive range beginning, an inclusive range
ending, or both. Ranges are UTF-8 binary sorted.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-console.html

QUESTION 271
Sources of data that can be imported with the Snowball client are: (1) Files or objects hosted in
locally mounted file systems, and (2) files or objects from a(n) ____.

A. HDFS cluster
B. NFS server
C. AFS server
D. GPFS cluster

Answer: A
Explanation:
Sources of data that can be imported with the Snowball client are: (1) Files or objects hosted in
locally mounted file systems, and (2) files or objects from a Hadoop Distributed File System
(HDFS) cluster. Currently, only HDFS 2.x clusters are supported.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-client.html

QUESTION 272
Why does AWS recommend that you delete Snowball logs once the job that the logs are
associated with enters the completed status?

A. To avoid running out of local disk space


B. Because they contain potentially sensitive information
C. Because you will be charged for additional s3 storage
D. To avoid thrashing that would slow down the migration process

Answer: B
Explanation:
Snowball logs are saved in plaintext format and contain the names of the files and paths, for the
files that are transferred. This is potentially sensitive information, so AWS strongly suggests that
you delete these logs once the job that the logs are associated with enters the completed status.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-client.html

QUESTION 273
When running a Snowball client from the command prompt window, the start command is used
initially to ____.

A. authenticate your access to a Snowball appliance


B. configure your permissions to a Snowball appliance
C. unlock the Snowball appliance
D. power on the Snowball appliance

Answer: A
Explanation:
To authenticate your access to a Snowball appliance, open a terminal (or command prompt
window) on your workstation and run the command with the following syntax:
snowball start -i [Snowball IP Address] -m [Path/to/manifest/file] -u [29 character unlock code]
Reference: http://docs.aws.amazon.com/snowball/latest/ug/transfer-data.html

QUESTION 274
You are using the job management API for AWS Snowball to create and manage jobs in the US
West (Oregon) region. Which RFC documents are used by the API for HTTP request/response
bodies?

A. JSON
B. Protobuf
C. XML
D. Thrift

Answer: A
Explanation:
The job management API for AWS Snowball is a network protocol based on HTTP. It uses JSON
(RFC 4627) documents for HTTP request/response bodies and is an RPC model, in which there
is a fixed set of operations and the syntax for each operation is known to clients without any prior
interaction.
Reference: http://docs.aws.amazon.com/snowball/latest/api-reference/api-reference.html

QUESTION 275
How do you indicate where to send your Snowball once your data has been loaded to it?

A. Print the return label via the AWS Console and stick it on the Snowball.
B. Turn it off. The E Ink will display changes to the return label. Then ship it to Amazon.
C. Use the E Ink display interface and select the "Return Snowball" option for the label. Then ship it
to Amazon.
D. Pack it in a box with the prepaid shipping label that came with it.

Answer: B
Explanation:
Snowballs don't need to be packed in a box or container because it already has its own physically
rugged shipping container. The E Ink display on the front of the Snowball changes to your return
shipping label when the Snowball is turned off.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-appliance.html

QUESTION 276
Which file formats are supported in Athena by default? Choose the 3 correct answers:

A. Apache Parquet
B. CSV
C. Adobe Acrobat
D. JSON

Answer: ABD

QUESTION 277
Your enterprise application requires key-value storage as the database. The data is expected to
be about 10 GB the first month and grow to 2PB over the next two years. There are no other
query requirements at this time. That solution would you recommend? Choose the correct
answer:

A. Hive on HDFS
B. HBase on HDFS
C. Hadoop with Spark
D. DynamoDB

Answer: B
Explanation:
HBase on HDFS provides massively parallel key-value storage and can scale into the petabyte range.

QUESTION 278
You have been hired as a consultant to provide a solution for a client to integrate their on-
premises data center to AWS. The customer requires a 300Mbps dedicated and private
connection to their VPC. Which AWS tool do you need? Choose the correct answer:

A. VPC peering
B. Data Pipeline
C. EMR
D. Direct Connect

Answer: D

Explanation:
AWS Direct Connect is a network service that provides an alternative to using the internet to
utilize AWS cloud services. 1Gbps and 10Gbps ports are available. Speeds of 50Mbps,
100Mbps, 200Mbps, 300Mbps, 400Mbps, and 500Mbps can be ordered from any APN partners
supporting AWS Direct Connect.
Reference:
https://aws.amazon.com/directconnect/faqs/

QUESTION 279
You need to analyze clickstream data on your website. You want to analyze the pattern of pages
a consumer clicks on and in what order. You need to be able to use the data in real time. Which
option would meet this requirement?
Choose the correct answer:

A. Publish web clicks by session to an Amazon SQS


B. Use ElasticMap Reduce to ingest the data and analyze
C. Send click events directly to Amazon Redshift and then analyze with SQL
D. Use Amazon Kinesis with a worker to process the data received from the Kinesis stream

Answer: D
Explanation:
At run time, a KCL application instantiates a worker with configuration information and then uses
a record processor to process the data received from an Amazon Kinesis stream.
Reference:
http://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-kcl.html
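The KCL is a Java library (with a MultiLangDaemon for other languages) that adds checkpointing and load balancing across shards; for the flavour of it, a simplified single-shard consumer using the low-level API (the stream name is hypothetical):

import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

stream = "clickstream"
shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in out["Records"]:
        print(record["PartitionKey"], record["Data"])  # analyze the click event here
    iterator = out["NextShardIterator"]
    time.sleep(1)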

QUESTION 280
Your company recently purchased five different companies that run different backend databases that include Redshift, MySQL, Hive on EMR, and PostgreSQL. You need a single tool that can run queries on all the different platforms for your daily ad-hoc analysis. Which tool enables you to do that?
Choose the correct answer:

A. Write the data into SQS and dump the data into S3
B. Stream data from each device into an S3 bucket and migrate the data nightly to RDS for reporting
C. Create files on the device and copy them to S3 and run EMR Hive to query the data
D. Use Amazon Kinesis to collect the data, use Kinesis Analytics for real-time analytics, and save
the data in Redshift for trend analysis

Answer: D
Explanation:
The rest of the options do not do stream processing and analytics.

QUESTION 281
You work for a retail chain that collects point-of-sale data from all stores four times a day to get
sales statistics and to plan personnel schedules based on how busy the store is. Each store runs
independently and thus might have different data in terms of structure and format which comes in
at different frequency during the day. The expected size of the data is generally small but might
be high velocity depending on how busy the store is. The ETL and processing need to be done
on an event-driven basis at the time the data is received. The data needs to be processed with
several transformations and finally stored in a database where complex queries can be run
against it. Which option would be the best solution, especially knowing that you do not have a
team of people to manage the infrastructure behind it all and a limited budget? Choose the
correct answer:

A. Transfer the data from the stores to S3, use lambda to validate and load the data in batches into
RDS for analysis, and stop the RDS clusters after business hours
B. Transfer the data from the stores to S3, use lambda to validate and load the data in batches into
a EMR/Spark cluster, and load the output into Redshift for analysis and turn off the EMR cluster
after business hours
C. Use Kinesis streams to get the data from the stores, and write into elastic search
D. Log the data into DynamoDB and run scans for the needed data, then dump the results into S3
and analyze in QuickSight

Answer: B
Explanation:
Event-driven processing with Lambda, a transient EMR cluster to keep costs down, and Redshift for the complex analytical queries.

QUESTION 282
What is the single point of failure on a Redshift cluster? Choose the correct answer:

A. Leader Node
B. Shards
C. Data Node
D. Parallel COPY commands

Answer: A

QUESTION 283
What is one function of Snowball?

A. To migrate thousands of on-premises workloads to AWS


B. To migrate databases into and out of AWS
C. To transfer exabyte-scale data into and out of the AWS cloud
D. To transfer petabyte-scale data into and out of the AWS cloud

Answer: D
Explanation:
AWS Snowball is a petabyte-scale data transport solution that can be used to securely transfer
large amounts of data into and out of the AWS cloud.
Reference: https://aws.amazon.com/snowball/

QUESTION 284
In AWS Kinesis a Kinesis partition key is used to? (choose one)

A. Strip the blob data from the record and store it in the appropriate data store
B. Segregate and route records to different shards of a stream
C. provide a secondary index for the kinesis stream ID
D. Sequence the records for when they are returned from the ingestion engine

Answer: B

Explanation:
A Kinesis partition key is used to segregate and route records to different shards of a stream
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html

QUESTION 285
What does it mean if an AWS Snowball action returns a "ServiceUnavailable" error?

A. The request has failed due to a temporary failure of the server.


B. The action or operation requested is invalid.
C. The request processing has failed because of an unknown error, exception or failure.
D. The request signature does not conform to AWS standards.

Answer: A
Explanation:
An Incomplete Signature error is returned when the request signature does not conform to AWS
standards. An Internal Failure error is returned when the request processing has failed because
of an unknown error, exception or failure. An Invalid Action error is returned when the action or
operation requested is invalid. A Service Unavailable error is returned when the request has
failed due to a temporary failure of the server.
Reference: http://docs.aws.amazon.com/snowball/latest/api-reference/CommonErrors.html

QUESTION 286
In AWS Snowball, what happens if you try to transfer an object with a trailing slash in its name?

A. If the name update causes a conflict (e.g., if an object with the target new name already exists),
then the job fails.
B. It is not transferred during the import/export process.
C. Its name is auto-updated before transfer: any trailing slash is replaced by a dash.
D. It is treated as a directory and the job fails if it turns out to be a regular file.

Answer: B
Explanation:
In AWS Snowball, objects with trailing slashes in their names (/ or \) are not transferred. Before
exporting any objects with trailing slashes, you should update their names and remove the slash.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/create-export-job.html

QUESTION 287
What does the following command accomplish?
snowball cp -n hdfs://localhost:9000/ImportantPhotos/Cats s3://MyBucket/Photos/Cats

A. It imports data from a HDFS cluster to a Snowball.


B. It exports data from S3 to a HDFS cluster.
C. It imports data from S3 to a Snowball.
D. It exports data from a HDFS cluster to S3.

Answer: A
Explanation:
To transfer file data from a Hadoop Distributed File System (HDFS) to a Snowball, you specify
the Name node URI as the source schema, which has the hdfs://hostname:port format. For
example:
snowball cp -n hdfs://localhost:9000/ImportantPhotos/Cats s3://MyBucket/Photos/Cats
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-client.html

QUESTION 288
Select all that apply for AWS DMS.
Choose the 2 correct answers:

A. Can only move data from the cloud


B. Can only move data to the cloud
C. Can move data to and from the cloud
D. Zero downtime

Answer: CD
Explanation:
AWS Database Migration Service helps you migrate your databases to AWS with virtually no
downtime. All data changes to the source database that occur during the migration are
continuously replicated to the target, allowing the source database to be fully operational during
the migration process. After the database migration is complete, the target database will remain
synchronized with the source for as long as you choose, allowing you to switchover the database
at a convenient time.

QUESTION 289
How do you provision AWS DMS?
Choose the correct answer:

A. Using your on-premises database


B. AWS SDK
C. AWS CLI
D. AWS Management Console

Answer: D
Explanation:
You can set up a migration task within minutes in the AWS Management Console. A migration
task is where you define the parameters the AWS Database Migration Service uses to execute
the migration. This includes setting up connections to the source and target databases, as well as
choosing the replication instance used to run the migration process. Once setup, the same task
can be used for test runs before performing the actual migration.
Reference:
https://aws.amazon.com/dms/

QUESTION 290
What is a best practice for S3 to achieve better throughput? Choose the correct answer:

A. Integration with CloudTrail


B. Bucket Policies
C. File encryption
D. Randomize naming structure

Answer: D
Explanation:

If you anticipate that your workload will consistently exceed 100 requests per second, you should
avoid sequential key names. If you must use sequential numbers or date and time patterns in key
names, add a random prefix to the key name. The randomness of the prefix more evenly
distributes key names across multiple index partitions. Examples of introducing randomness are
provided later in this topic.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

QUESTION 291
How can you encrypt S3 at rest?
Choose the 2 correct answers:

A. HTTPS
B. SSL
C. SSE-KMS
D. SSE-S3

Answer: CD
Explanation:
Server-side encryption is about protecting data at rest. Server-side encryption with Amazon S3-managed encryption keys (SSE-S3) employs strong multi-factor encryption. AWS Key Management Service (AWS KMS) is a service that combines secure, highly available hardware and software to provide a key management system scaled for the cloud. AWS KMS uses customer master keys (CMKs) to encrypt your Amazon S3 objects.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html

QUESTION 292
What is the max file size limit in S3?
Choose the correct answer:

A. 1 TB
B. 5 TB
C. 1 GB
D. No limit

Answer: B
Explanation:
The total volume of data and number of objects you can store are unlimited. Individual Amazon
S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 terabytes. The largest
object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100
megabytes, customers should consider using the multipart upload capability.
Reference:
https://aws.amazon.com/s3/faqs/
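As a hedged illustration of the multipart recommendation above, this boto3 sketch (file and bucket names are hypothetical) lowers the multipart threshold so uploads over 100 MB are split into parts automatically:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multipart upload for anything larger than 100 MB, per the FAQ guidance.
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=64 * 1024 * 1024)

# upload_file transparently performs a multipart upload when the file exceeds the threshold.
s3.upload_file("large-backup.tar", "my-bucket", "backups/large-backup.tar", Config=config)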

QUESTION 293
What are two services you can integrate with S3 for monitoring/auditing? Choose the 2 correct
answers:

A. Kinesis
B. CloudTrail
C. CloudWatch
D. EMR

Answer: BC
Explanation:
Amazon CloudWatch metrics for Amazon S3 can help you understand and improve the
performance of applications that use Amazon S3. There are two ways that you can use
CloudWatch with Amazon S3. Amazon S3 is integrated with CloudTrail, a service that captures
specific API calls made to Amazon S3 from your AWS account and delivers the log files to an
Amazon S3 bucket that you specify. CloudTrail captures API calls made from the Amazon S3
console or from the Amazon S3 API.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/cloudtrail-logging.html

QUESTION 294
Random prefixes on folders in S3 ensure higher throughput on read and write in S3.
Choose the correct answer:

A. True
B. False

Answer: A
Explanation:
If you introduce some randomness in your key name prefixes, the key names, and therefore the
I/O load, will be distributed across more than one partition. If you anticipate that your workload will
consistently exceed 100 requests per second, you should avoid sequential key names. If you
must use sequential numbers or date and time patterns in key names, add a random prefix to the
key name. The randomness of the prefix more evenly distributes key names across multiple index
partitions. Examples of introducing randomness are provided later in this topic.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

QUESTION 295
Select the name of the partition key from this S3 bucket: s3://mybucket/Monday/103094.csv
Choose the correct answer:

A. 103094
B. Monday
C. Mybucket
D. s3://mybucket/Monday/103094.csv

Answer: B
Explanation:
When uploading a large number of objects, customers sometimes use sequential numbers or
date and time values as part of their key names.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

QUESTION 296
You are extra paranoid about your S3 bucket. What steps can you make sure to ensure your data
is protected against accidental loss?
Choose the 3 correct answers:

A. HTTPS
B. MFA Delete
C. Versioning
D. Cross-regional replication

Answer: BCD
Explanation:
Cross-region replication is a bucket-level feature that enables automatic, asynchronous copying
of objects across buckets in different AWS regions. If a bucket's versioning configuration is MFA
Delete-enabled, the bucket owner must include the x-amz-mfa request header in requests to
permanently delete an object version or change the versioning state of the bucket. Versioning is a
means of keeping multiple variants of an object in the same bucket. You can use versioning to
preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket.
With versioning, you can easily recover from both unintended user actions and application
failures.
Reference:
http://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMFADelete.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html

QUESTION 297
For a detailed look at the status of your ____ in AWS Snowball, you can look at the two
associated logs: a success log and a failure log.

A. accumulated costs
B. physical devices
C. transferred objects
D. available storage

Answer: C
Explanation:
To see the status of your objects that have been transferred with Snowball, you can look at the
success and failure logs. The following list describes the possible values for the report.
Completed - The transfer was completed successfully. You can find more detailed information in
the success log.
Completed with errors - Some or all of your data was not transferred. You can find more detailed
information in the failure log.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-console.html#report

QUESTION 298
In AWS Snowball, an "InvalidClientTokenId" error is returned when the ____ or AWS access key
ID provided does not exist in the AWS records.

A. X.509 certificate
B. RSA token
C. Kerberos ticket

D. PKI private key

Answer: A
Explanation:
In AWS Snowball, an InvalidClientTokenId error is returned when the X.509 certificate or AWS
access key ID provided does not exist in the AWS records. This error returns a HTTP Status
Code 403.
Reference: http://docs.aws.amazon.com/snowball/latest/api-reference/CommonErrors.html

QUESTION 299
Which of the statements below are correct for Amazon Kinesis streams? (choose three)

A. A record is composed of a sequence number and data blob


B. A record is the unit of data stored in the Amazon Kinesis Stream
C. A record is composed of a sequence number, partition key, and data blob.
D. Each record in the stream has a sequenced number that is assigned by Kinesis Streams.

Answer: BCD
Explanation:
With Amazon Kinesis streams, each record in the stream has a sequence number that is assigned by Kinesis Streams. A record is the unit of data stored in an Amazon Kinesis stream, and is composed of a sequence number, partition key, and data blob.
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html
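For illustration (the stream name and payload are hypothetical), a boto3 put_record call shows where the partition key goes in and where the Kinesis-assigned sequence number comes back:

import boto3

kinesis = boto3.client("kinesis")

# The partition key routes the record to a shard; Kinesis assigns the sequence number.
response = kinesis.put_record(
    StreamName="sensor-stream",
    Data=b'{"sensor_id": 42, "value": 17.3}',
    PartitionKey="sensor-42",
)

print(response["ShardId"], response["SequenceNumber"])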

QUESTION 300
Kinesis Streams supports re-sharding which enables you to adjust the number of shards in your
stream in order to adapt to changes in the rate of data flow through the stream. Which statements
are true about re-sharding?

A. In a shard split, you divide a single shard into two shards.


B. In a shard merge, you combine the two shards into a single shard.
C. You can merge more than two shards in a single operation.
D. You cannot split a shard in to more than two shards in a single operation.

Answer: ABD
Explanation:
Kinesis Streams supports re-sharding, which enables you to adjust the number of shards in your stream in order to adapt to changes in the rate of data flow through the stream. In a shard split, you divide a single shard into two shards. In a shard merge, you combine two shards into a single shard. You cannot split a shard into more than two shards in a single operation, and you cannot merge more than two shards in a single operation.
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html
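As a hedged sketch of the two re-sharding operations (the stream name, shard IDs, and hash key are hypothetical), each call operates on exactly two shards:

import boto3

kinesis = boto3.client("kinesis")

# A split divides one shard into exactly two, at a new starting hash key inside the parent's range.
kinesis.split_shard(
    StreamName="sensor-stream",
    ShardToSplit="shardId-000000000000",
    NewStartingHashKey="170141183460469231731687303715884105728",
)

# A merge combines exactly two adjacent shards into one.
kinesis.merge_shards(
    StreamName="sensor-stream",
    ShardToMerge="shardId-000000000001",
    AdjacentShardToMerge="shardId-000000000002",
)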

QUESTION 301
As the default setting, how long will Amazon Kinesis Streams store records for? (choose one)

A. 24 hours
B. 72 hours
C. 48 Hours

D. 12 hours

Answer: A
Explanation:
By default Amazon Kinesis Streams store records for 24 hours
Reference: http://docs.aws.amazon.com/streams/latest/dev/working-with-kinesis.html
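For illustration (the stream name is hypothetical), the 24-hour default retention can be raised with a single boto3 call:

import boto3

kinesis = boto3.client("kinesis")

# Extend retention beyond the 24-hour default.
kinesis.increase_stream_retention_period(
    StreamName="sensor-stream",
    RetentionPeriodHours=72,
)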

QUESTION 302
A Snowball client transfers data to a Snowball appliance using the ____ command, with the ____
root directory identifier in the destination path.

A. cp; s3://
B. scp; /usr/sbin/

Answer: A
Explanation:
To transfer data to a Snowball appliance using the Snowball client from a command prompt, you
run the copy command with two paths specified, the source and the destination:
snowball cp [options] [path/to/data/source] s3://[path/to/data/destination]
Reference: http://docs.aws.amazon.com/snowball/latest/ug/transfer-data.html

QUESTION 303
When using Snowball to import data to AWS, the data is stored in a(n) ____.

A. DynamoDB instance
B. EBS volume
C. S3 bucket
D. EC2 instance

Answer: C
Explanation:
When using Snowball to import data to AWS, the data is stored in an S3 bucket. Data can also be
copied from an S3 bucket to your datacenter.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/jobs.html

QUESTION 304
When importing data from your facility to AWS, what are the schemas supported by Snowball?

A. Locally mounted storage (e.g., C:\\) for the data source, and s3 (s3://) or HDFS (hdfs://) for the
destination
B. Locally mounted storage (e.g., C:\\) for the data source and s3 (s3://) for the destination
C. HDFS (hdfs://) for the data source and s3 (s3://) for the destination
D. Locally mounted storage (e.g., C:\\) or HDFS (hdfs://) for the data source, and s3 (s3://) for the
destination

Answer: D
Explanation:
The Snowball client uses schemas to define what kind of data is transferred between the client's
data center and a Snowball. The schemas are declared when a command is issued. Currently,
Snowball supports the following schemas: Locally mounted storage (e.g., C:\\) or hdfs (hdfs://) for

the data source, and s3 (s3://) for the destination.
Reference: http://docs.aws.amazon.com/snowball/latest/ug/using-client.html

QUESTION 305
Which of the following is NOT a standard activity in AWS Data Pipeline?

A. SnsAlarm Activity
B. ShellCommand Activity
C. Hive Activity
D. EMR Activity

Answer: A
Explanation:
In AWS Data Pipeline, an activity is a pipeline component that defines the work to perform. AWS
Data Pipeline provides several pre-packaged activities that accommodate common scenarios,
such as moving data from one location to another, running Hive queries, and so on. Activities are
extensible, so you can run your own custom scripts to support endless combinations. AWS Data
Pipeline supports the following types of activities:
. CopyActivity: Copies data from one location to another.
. EmrActivity: Runs an Amazon EMR cluster.
. HiveActivity: Runs a Hive query on an Amazon EMR cluster.
. HiveCopyActivity: Runs a Hive query on an Amazon EMR cluster with support for advanced data filtering and support for S3DataNode and DynamoDBDataNode.
. PigActivity: Runs a Pig script on an Amazon EMR cluster.
. RedshiftCopyActivity: Copies data to and from Amazon Redshift tables.
. ShellCommandActivity: Runs a custom UNIX/Linux shell command as an activity.
. SqlActivity: Runs a SQL query on a database.
Reference:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-activities.html

QUESTION 306
In AWS Snowball, the ____ value is a 29-character code used to decrypt the manifest file.

A. UnlockCode
B. IAM code
C. IAM private key
D. KeyCode

Answer: A
Explanation:
In AWS Snowball, the UnlockCode value is a 29-character code with 25 alphanumeric characters
and 4 hyphens used to decrypt the manifest file. As a best practice, AWS recommends that you
don't save a copy of the UnlockCode in the same location as the manifest file for that job.
Reference:
http://docs.aws.amazon.com/snowball/latest/api-reference/API_GetJobUnlockCode.html

QUESTION 307
An administrator has a 500-GB file in Amazon S3. The administrator runs a nightly COPY
command into a 10-node Amazon Redshift cluster. The administrator wants to prepare the data
to optimize performance of the COPY command.
How should the administrator prepare the data?

A. Compress the file using gz compression.
B. Split the file into 500 smaller files.
C. Convert the file format to AVRO.
D. Split the file into 10 files of equal size.

Answer: B
Explanation:
The critical aspect of this question is running the COPY command with the maximum amount of parallelism. The two options that will increase parallelism are B and D. Option D will load one file per node in parallel, which will increase performance, but option B will have a greater effect because it will allow Amazon Redshift to load multiple files per instance in parallel (COPY can process one file per slice on each node). Compressing the files (option A) is a recommended practice and will also increase performance, but not to the same extent as loading multiple files in parallel.
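As an illustrative sketch only (file names, bucket, table, and IAM role are hypothetical), the 500-GB file could be split into many gzip parts before upload, and a single COPY then loads all parts under the common prefix in parallel through any SQL client connected to the cluster:

import gzip

LINES_PER_PART = 1_000_000
part, lines = 0, []

# Write the source out as many smaller gzip files so COPY can load one file per slice in parallel.
with open("orders.csv") as src:
    for line in src:
        lines.append(line)
        if len(lines) >= LINES_PER_PART:
            with gzip.open(f"orders.part{part:04d}.csv.gz", "wt") as out:
                out.writelines(lines)
            part, lines = part + 1, []
    if lines:
        with gzip.open(f"orders.part{part:04d}.csv.gz", "wt") as out:
            out.writelines(lines)

# After uploading the parts to S3 under one prefix, one COPY statement loads them in parallel:
COPY_SQL = """
COPY orders
FROM 's3://my-bucket/orders.part'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
GZIP CSV;
"""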

QUESTION 308
A root account owner is trying to understand the S3 bucket ACL. Which choice below is not a predefined group that can be granted object access via an ACL?

A. Canonical user group


B. Log Delivery Group
C. All users group
D. Authenticated user group

Answer: A
Explanation:
An S3 bucket ACL grantee can be an AWS account or one of the predefined Amazon S3 groups.
Amazon S3 has a set of predefined groups.
When granting account access to a group, the user can specify one of the URLs of that group
instead of a canonical user ID. Amazon S3 has the following predefined groups:
. Authenticated Users group: It represents all AWS accounts.
. All Users group: Access permission to this group allows anyone to access the resource.
. Log Delivery group: WRITE permission on a bucket enables this group to write server access
logs to the bucket.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
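For illustration (the bucket and key are hypothetical), a predefined group is granted access by its group URI rather than by a canonical user ID:

import boto3

s3 = boto3.client("s3")

# Grant READ on one object to the predefined Authenticated Users group via its group URI.
s3.put_object_acl(
    Bucket="my-bucket",
    Key="reports/data.csv",
    GrantRead='uri="http://acs.amazonaws.com/groups/global/AuthenticatedUsers"',
)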

QUESTION 309
A company that provides economics data dashboards needs to be able to develop software to
display rich, interactive, data-driven graphics that run in web browsers and leverage the full stack of web standards (HTML, SVG, and CSS).
Which technology is the most appropriate for this requirement?

A. D3.js
B. Python/Jupyter
C. R Studio
D. Hue

Answer: C

QUESTION 310

A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its
Redshift schema. The ORDERS table has foreign key relationships with multiple dimension
tables in this schema.
How should the company determine the most appropriate distribution key for the ORDERS table?

A. Identity the largest and most frequently joined dimension table and ensure that it and the
ORDERS table both have EVEN distribution
B. Identify the target dimension table and designate the key of this dimension table as the
distribution key of the ORDERS table
C. Identity the smallest dimension table and designate the key of this dimension table as the
distribution key of ORDERS table
D. Identify the largest and most frequently joined dimension table and designate the key of this
dimension table as the distribution key for the orders table

Answer: D

QUESTION 311
A company has several teams of analysts. Each team of analysts has its own cluster. The
teams need to run SQL queries using Hive, Spark-SQL and Presto with Amazon EMR. The
company needs to enable a centralized metadata layer to expose the Amazon S3 objects as
tables to the analysts. Which approach meets the requirement for a centralized metadata layer?

A. EMRFS consistent view with a common Amazon DynamoDB table


B. Bootstrap action to change the Hive Metastore to an Amazon RDS database
C. s3distcp with the outputManifest option to generate RDS DDL
D. naming scheme support with automatic partition discovery from Amazon S3

Answer: C

QUESTION 312
The department of transportation for a major metropolitan area has placed sensors on roads at
key locations around the city. The goal is to analyze the flow of traffic and notifications from
emergency services to identify potential issues and to help planners correct trouble spots. A data
engineer needs a scalable and fault-tolerant solution that allows planners to respond to issues
within 30 seconds of their occurrence.
Which solution should the data engineer choose?

A. Collect the sensor data with Amazon Kinesis Firehose and store it in Amazon Redshift for
analysis.
Collect emergency services events with Amazon SQS and store in Amazon DynamoDB for
analysis
B. Collect the sensor data with Amazon SQS and store in Amazon DynamoDB for analysis.
Collect emergency services events with Amazon Kinesis Firehose and store in Amazon Redshift
for analysis
C. Collect both sensor data and emergency services events with Amazon Kinesis Streams and use
Amazon DynamoDB for analysis
D. Collect both sensor data and emergency services events with Amazon Kinesis Firehose and use
Amazon Redshift for Analysis

Answer: C

QUESTION 313
An online photo album app has a key design feature to support multiple screens (e.g. desktop,
mobile phone, and tablet) with high quality displays. Multiple versions of the image must be saved
in different resolutions and layouts.
The image processing Java program takes an average of five seconds per upload, depending on
the image size and format. Each image upload captures the following image metadata: user,
album, photo label, upload timestamp
The app should support the following requirements:

- Hundreds of user image uploads per second


- Maximum image size of 10 MB
- Maximum image metadata size of 1 KB
- Image displayed in optimized resolution in all supported screens no later than one minute after
image upload

Which strategy should be used to meet these requirements?

A. Write images and metadata to Amazon Kinesis, Use a Kinesis Client Library (KCL) application to
run the image processing and save the image output to Amazon S3 and metadata to the app
repository DB
B. Write image and metadata RDS with BLOB data type. Use AWS Data Pipeline to run the image
processing and save the image output to Amazon S3 and metadata to the app repository DB
C. Upload image with metadata to Amazon S3 use Lambda function to run the image processing
and save the image output to Amazon S3 and metadata to the app repository DB
D. Write image and metadata to Amazon kinesis. Use Amazon Elastic MapReduce (EMR) with
Spark Streaming to run image processing and save image output to Amazon

Answer: C

QUESTION 314
A data engineer is running a DWH on a 25-node Redshift cluster of a SaaS service. The data
engineer needs to build a dashboard that will be used by customers. Five big customers
represent 80% of usage, and there is a long tail of dozens of smaller customers. The data
engineer has selected the dashboarding tool.
How should the data engineer make sure that the larger customer workloads do NOT interfere
with the smaller customer workloads?

A. Apply query filters based on customer-id that can NOT be changed by the user and apply
distribution keys on customer id
B. Place the largest customers into a single user group with a dedicated query queue and place the
rest of the customer into a different query queue
C. Push aggregations into an RDS for Aurora instance. Connect the dashboard application to Aurora
rather than Redshift for faster queries
D. Route the largest customers to a dedicated Redshift cluster, Raise the concurrency of the multi-
tenant Redshift cluster to accommodate the remaining customers

Answer: D

QUESTION 315
You have written a server-side Node.js application and a web application with an
HTML/JavaScript front end that uses the Angular.js Framework. The server-side application

connects to an Amazon Redshift cluster, issue queries, and then returns the results to the front
end for display. Your user base is very large and distributed, but it is important to keep the cost of
running this application low. Which deployment strategy is both technically valid and the most
cost-effective?

A. Deploy an AWS Elastic Beanstalk application with two environments: one for the Node.js
application and another for the web front end. Launch an Amazon Redshift cluster, and point your
application to its Java Database connectivity (JDBC) endpoint
B. Deploy an AWS OpsWorks stack with three layers: a static web server layer for your front end, a
Node.js app server layer for your server-side application, and a Redshift DB layer Amazon
Redshift cluster
C. Upload the HTML, CSS, images, and JavaScript for the front end to an Amazon Simple Storage
Service (S3) bucket. Create an Amazon CloudFront distribution with this bucket as its origin. Use
AWS Elastic Beanstalk to deploy the Node.js application. Launch an Amazon Redshift cluster,
and point your application to its JDBC endpoint
D. Upload the HTML, CSS, images, and JavaScript for the front end, plus the Node.js code for the
server-side application, to an Amazon S3 bucket. Create a CloudFront distribution with this
bucket as its origin. Launch an Amazon Redshift cluster, and point your application to its JDBC
endpoint
E. Upload the HTML, CSS, images, and JavaScript for the front end to an Amazon S3 bucket. Use
AWS Elastic Beanstalk to deploy the Node.js application. Launch an Amazon Redshift cluster,
and point your application to its JDBC endpoint

Answer: C

QUESTION 316
Using only AWS services, you intend to automatically scale a fleet of stateless web
servers based on CPU and network utilization metrics. Which of the following services are
needed? Choose 2 answers

A. Auto Scaling
B. Amazon Simple Notification Service
C. AWS Cloud Formation
D. CloudWatch
E. Amazon Simple Workflow Service

Answer: AD

QUESTION 317
When an EC2 instance that is backed by an s3-based AMI is terminated. What happens to the
data on the root volume?

A. Data is unavailable until the instance is restarted


B. Data is automatically deleted
C. Data is automatically saved as an EBS snapshot
D. Data is automatically saved as an EBS volume

Answer: B

QUESTION 318

A user has created a launch configuration for Auto Scaling where CloudWatch detailed
monitoring is disabled. The user wants to now enable detailed monitoring. How can the user
achieve this?

A. Update the Launch config with CLI to set InstanceMonitoringDisabled = false


B. The user should change the Auto Scaling group from the AWS console to enable detailed
monitoring
C. Update the Launch config with CLI to set InstanceMonitoring.Enabled = true
D. Create a new Launch Config with detail monitoring enabled and update the Auto Scaling group

Answer: D

QUESTION 319
A web startup runs its very successful social news application on Amazon EC2 with an Elastic Load Balancer, an Auto Scaling group of Java/Tomcat application servers, and DynamoDB as the data store. The main web application runs best on m2.xlarge instances since it is highly memory-bound. Each new deployment requires semi-automated creation and testing of a new AMI for the application servers, which takes quite a while and is therefore only done once per week. Recently, a new chat feature has been implemented in Node.js and needs to be integrated into the architecture. First tests show that the new component is CPU bound. Because the company has some experience with Chef, they decided to streamline the deployment process and use AWS OpsWorks as an application lifecycle tool to simplify management of the application and reduce the deployment cycles.
What configuration in AWS OpsWorks is necessary to integrate the new chat module in the most cost-efficient and flexible way?

A. Create one AWS OpsWorks stack, create one AWS OpsWorks layer, create one custom recipe
B. Create one AWS OpsWorks stack, create two AWS OpsWorks layers, create one custom recipe
C. Create two AWS OpsWorks stacks, create two AWS OpsWorks layers, create one custom recipe
D. Create two AWS OpsWorks stacks, create two AWS OpsWorks layers, create two custom recipes

Answer: C

QUESTION 320
Your firm has uploaded a large amount of aerial image data to S3. In the past, in your on-premises environment, you used a dedicated group of servers to batch process this data and used RabbitMQ, an open source messaging system, to get job information to the servers. Once processed, the data would go to tape and be shipped offsite. Your manager told you to stay with the current design, and leverage AWS archival storage and messaging services to minimize cost.
Which is correct?

A. Use SQS for passing job messages use Cloud Watch alarms to terminate EC2 worker instances
when they become idle. Once data is processed, change the storage class of the S3 objects to
Reduced Redundancy Storage.
B. Setup Auto-Scaled workers triggered by queue depth that use spot instances to process messages in SQS. Once data is processed, change the storage class of the S3 objects to Reduced Redundancy Storage.
C. Setup Auto-Scaled workers triggered by queue depth that use spot instances to process messages in SQS. Once data is processed, change the storage class of the S3 objects to Glacier.
D. Use SNS to pass job messages use Cloud Watch alarms to terminate spot worker instances
when they become idle. Once data is processed, change the storage class of the S3 object to

Glacier.

Answer: D

QUESTION 321
What does Amazon S3 stand for?

A. Simple Storage Solution.


B. Storage Storage Storage (triple redundancy Storage).
C. Storage Server Solution.
D. Simple Storage Service.

Answer: D

QUESTION 322
You must assign each server to at least _____ security group

A. 3
B. 2
C. 4
D. 1

Answer: A

QUESTION 323
Before I delete an EBS volume, what can I do if I want to recreate the volume later?

A. Create a copy of the EBS volume (not a snapshot)


B. Store a snapshot of the volume
C. Download the content to an EC2 instance
D. Back up the data in to a physical disk

Answer: B

QUESTION 324
Select the most correct answer: The device name /dev/sda1 (within Amazon EC2) is _____

A. Possible for EBS volumes


B. Reserved for the root device
C. Recommended for EBS volumes
D. Recommended for instance store volumes

Answer: B

QUESTION 325
If I want an instance to have a public IP address, which IP address should I use?

A. Elastic IP Address
B. Class B IP Address
C. Class A IP Address
D. Dynamic IP Address

Answer: A

QUESTION 326
What does RRS stand for when talking about S3?

A. Redundancy Removal System


B. Relational Rights Storage
C. Regional Rights Standard
D. Reduced Redundancy Storage

Answer: D

QUESTION 327
All Amazon EC2 instances are assigned two IP addresses at launch. Which of these can only be reached from within the Amazon EC2 network?

A. Multiple IP address
B. Public IP address
C. Private IP address
D. Elastic IP Address

Answer: C

QUESTION 328
What does Amazon SWF stand for?

A. Simple Web Flow


B. Simple Work Flow
C. Simple Wireless Forms
D. Simple Web Form

Answer: B

QUESTION 329
What is the Reduced Redundancy option in Amazon S3?

A. Less redundancy for a lower cost.


B. It doesn't exist in Amazon S3, but in Amazon EBS.
C. It allows you to destroy any copy of your files outside a specific jurisdiction.
D. It doesn't exist at all

Answer: A

QUESTION 330
Resources that are created in AWS are identified by a unique identifier called an __________

A. Amazon Resource Number


B. Amazon Resource Nametag
C. Amazon Resource Name
D. Amazon Resource Namespace

Answer: C

QUESTION 331
If I write the below command, what does it do?
ec2-run ami-e3a5408a -n 20 -g appserver

A. Start twenty instances as members of appserver group.


B. Creates 20 rules in the security group named appserver
C. Terminate twenty instances as members of appserver group.
D. Start 20 security groups

Answer: A

QUESTION 332
While creating an Amazon RDS DB, your first task is to set up a DB ______ that controls what IP
addresses or EC2 instances have access to your DB Instance.

A. Security Pool
B. Secure Zone
C. Security Token Pool
D. Security Group

Answer: D

QUESTION 333
When you run a DB Instance as a Multi-AZ deployment, the "_____" serves database writes and
reads

A. secondary
B. backup
C. stand by
D. primary

Answer: D

QUESTION 334
Every user you create in the IAM system starts with _________.

A. Partial permissions
B. Full permissions
C. No permissions

Answer: C

QUESTION 335
Can you create IAM security credentials for existing users?

A. Yes, existing users can have security credentials associated with their account.
B. No, IAM requires that all users who have credentials set up are not existing users
C. No, security credentials are created within GROUPS, and then users are associated to GROUPS
at a later time.
D. Yes, but only IAM credentials, not ordinary security credentials.

Answer: A

QUESTION 336
What does Amazon EC2 provide?

A. Virtual servers in the Cloud.


B. A platform to run code (Java, PHP, Python), paying on an hourly basis.
C. Computer Clusters in the Cloud.
D. Physical servers, remotely managed by the customer.

Answer: A

QUESTION 337
Amazon SWF is designed to help users...

A. Design graphical user interface interactions


B. Manage user identification and authorization
C. Store Web content
D. Coordinate synchronous and asynchronous tasks which are distributed and fault tolerant.

Answer: D

QUESTION 338
Can I control if and when MySQL based RDS Instance is upgraded to new supported versions?

A. No
B. Only in VPC
C. Yes

Answer: C

QUESTION 339

If I modify a DB Instance or the DB parameter group associated with the instance, should I reboot
the instance for the changes to take effect?

A. No
B. Yes

Answer: B

QUESTION 340
When you view the block device mapping for your instance, you can see only the EBS volumes,
not the instance store volumes.

A. Depends on the instance type


B. FALSE
C. Depends on whether you use API call
D. TRUE

Answer: D

QUESTION 341
By default, EBS volumes that are created and attached to an instance at launch are deleted when
that instance is terminated. You can modify this behavior by changing the value of the flag _____ to false when you launch the instance.

A. DeleteOnTermination
B. RemoveOnDeletion
C. RemoveOnTermination
D. TerminateOnDeletion

Answer: A

QUESTION 342
What are the initial settings of an user created security group?

A. Allow all inbound traffic and Allow no outbound traffic


B. Allow no inbound traffic and Allow no outbound traffic
C. Allow no inbound traffic and Allow all outbound traffic
D. Allow all inbound traffic and Allow all outbound traffic

Answer: C

QUESTION 343
Will my standby RDS instance be in the same Region as my primary?

A. Only for Oracle RDS types


B. Yes
C. Only if configured at launch
D. No

Answer: B

QUESTION 344
What does Amazon Elastic Beanstalk provide?

A. A scalable storage appliance on top of Amazon Web Services.


B. An application container on top of Amazon Web Services.
C. A service by this name doesn't exist.
D. A scalable cluster of EC2 instances.

Answer: B

QUESTION 345
True or False: When using IAM to control access to your RDS resources, the key names that can
be used are case sensitive. For example,
aws:CurrentTime is NOT equivalent to AWS:currenttime.

A. TRUE
B. FALSE

Answer: A

QUESTION 346
What will be the status of the snapshot until the snapshot is complete?

A. running
B. working
C. progressing
D. pending

Answer: D

QUESTION 347
Can we attach an EBS volume to more than one EC2 instance at the same time?

A. No
B. Yes.
C. Only EC2-optimized EBS volumes.
D. Only in read mode.

Answer: A

QUESTION 348
True or False: Automated backups are enabled by default for a new DB Instance.

A. TRUE

B. FALSE

Answer: A

QUESTION 349
What does the AWS Storage Gateway provide?

A. It allows to integrate on-premises IT environments with Cloud Storage.


B. A direct encrypted connection to Amazon S3.
C. It's a backup solution that provides an on-premises Cloud storage.
D. It provides an encrypted SSL endpoint for backups in the Cloud.

Answer: A

QUESTION 350
Amazon RDS automated backups and DB Snapshots are currently supported for only the
__________ storage engine

A. InnoDB
B. MyISAM

Answer: A

QUESTION 351
How many relational database engines does RDS currently support?

A. Three: MySQL, Oracle and Microsoft SQL Server.


B. Just two: MySQL and Oracle.
C. Five: MySQL, PostgreSQL, MongoDB, Cassandra and SQLite.
D. Just one: MySQL.

Answer: A

QUESTION 352
The base URI for all requests for instance metadata is ___________

A. http://254.169.169.254/latest/
B. http://169.169.254.254/latest/
C. http://127.0.0.1/latest/
D. http://169.254.169.254/latest/

Answer: D
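For illustration, a minimal Python sketch reads the instance ID from the metadata service; it only works when run on the instance itself:

import urllib.request

BASE = "http://169.254.169.254/latest/"

# Fetch one metadata item; other paths live under the same base URI.
with urllib.request.urlopen(BASE + "meta-data/instance-id", timeout=2) as resp:
    print(resp.read().decode())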

QUESTION 353
While creating the snapshots using the command line tools, which command should I be using?

A. ec2-deploy-snapshot

B. ec2-fresh-snapshot
C. ec2-create-snapshot
D. ec2-new-snapshot

Answer: C

QUESTION 354
Typically, you want your application to check whether a request generated an error before you
spend any time processing results. The easiest way to find out if an error occurred is to look for
an __________ node in the response from the Amazon RDS API.

A. Incorrect
B. Error
C. FALSE

Answer: B

QUESTION 355
What are the two permission types used by AWS?

A. Resource-based and Product-based


B. Product-based and Service-based
C. Service-based
D. User-based and Resource-based

Answer: D

QUESTION 356
In Amazon CloudWatch, which metric should I be checking to ensure that your DB Instance
has enough free storage space?

A. FreeStorage
B. FreeStorageSpace
C. FreeStorageVolume
D. FreeDBStorageSpace

Answer: B
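For illustration (the DB instance identifier is hypothetical), FreeStorageSpace can be read back from CloudWatch with boto3:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Average free storage, in bytes, for one DB instance over the last hour.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="FreeStorageSpace",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydbinstance"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in stats["Datapoints"]:
    print(point["Timestamp"], point["Average"])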

QUESTION 357
Amazon RDS DB snapshots and automated backups are stored in

A. Amazon S3
B. Amazon ECS Volume
C. Amazon RDS
D. Amazon EMR

Answer: A

QUESTION 358
What is the maximum key length of a tag?

A. 512 Unicode characters


B. 64 Unicode characters
C. 256 Unicode characters
D. 128 Unicode characters

Answer: D

QUESTION 359
Groups can't _____.

A. be nested more than 3 levels


B. be nested at all
C. be nested more than 4 levels
D. be nested more than 2 levels

Answer: B

QUESTION 360
You must increase storage size in increments of at least _____ %

A. 40
B. 20
C. 50
D. 10

Answer: D

QUESTION 361
Changes to the backup window take effect ______.

A. from the next billing cycle


B. after 30 minutes
C. immediately
D. after 24 hours

Answer: C

QUESTION 362
Using Amazon CloudWatch's Free Tier, what is the frequency of metric updates which you
receive?

A. 5 minutes
B. 500 milliseconds.
C. 30 seconds

D. 1 minute

Answer: A

QUESTION 363
Which is the default region in AWS?

A. eu-west-1
B. us-east-1
C. us-east-2
D. ap-southeast-1

Answer: B

QUESTION 364
What are the Amazon EC2 API tools?

A. They don't exist. The Amazon EC2 AMI tools, instead, are used to manage permissions.
B. Command-line tools to the Amazon EC2 web service.
C. They are a set of graphical tools to manage EC2 instances.
D. They don't exist. The Amazon API tools are a client interface to Amazon Web Services.

Answer: B

QUESTION 365
What are the two types of licensing options available for using Amazon RDS for Oracle?

A. BYOL and Enterprise License


B. BYOL and License Included
C. Enterprise License and License Included
D. Role based License and License Included

Answer: B

QUESTION 366
What does a "Domain" refer to in Amazon SWF?

A. A security group in which only tasks inside can communicate with each other
B. A special type of worker
C. A collection of related Workflows
D. The DNS record for the Amazon SWF service

Answer: C

QUESTION 367
EBS Snapshots occur _____

A. Asynchronously
B. Synchronously
C. Weekly

Answer: A

QUESTION 368
Disabling automated backups ______ disable the point-in-time recovery.

A. if configured to can
B. will never
C. will

Answer: C

QUESTION 369
Out of the striping options available for EBS volumes, which one has the following disadvantage: 'Doubles the amount of I/O required from the instance to EBS compared to RAID 0, because you're mirroring all writes to a pair of volumes, limiting how much you can stripe.'?

A. Raid 0
B. RAID 1+0 (RAID 10)
C. Raid 1
D. Raid

Answer: B

QUESTION 370
Is creating a Read Replica of another Read Replica supported?

A. Only in certain regions


B. Only with MSSQL based RDS
C. Only for Oracle RDS types
D. No

Answer: D

QUESTION 371
Can Amazon S3 uploads resume on failure or do they need to restart?

A. Restart from beginning


B. You can resume them, if you flag the "resume on failure" option before uploading.
C. Resume on failure
D. Depends on the file size

Answer: C

QUESTION 372
Which of the following cannot be used in Amazon EC2 to control who has access to specific
Amazon EC2 instances?

A. Security Groups
B. IAM System
C. SSH keys
D. Windows passwords

Answer: B

QUESTION 373
_________ let you categorize your EC2 resources in different ways, for example, by purpose,
owner, or environment.

A. wildcards
B. pointers
C. Tags
D. special filters

Answer: C

QUESTION 374
How can I change the security group membership for interfaces owned by other AWS services, such as
Elastic Load Balancing?

A. By using the service specific console or API\CLI commands


B. None of these
C. Using Amazon EC2 API/CLI
D. using all these methods

Answer: A

QUESTION 375
What is the maximum write throughput I can provision for a single DynamoDB table?

A. 1,000 write capacity units


B. 100,000 write capacity units
C. DynamoDB is designed to scale without limits, but if you go beyond 10,000 you have to contact
AWS first.
D. 10,000 write capacity units

Answer: C

QUESTION 376
What does the following command do with respect to the Amazon EC2 security groups? ec2-
revoke RevokeSecurityGroupIngress

A. Removes one or more security groups from a rule.
B. Removes one or more security groups from an Amazon EC2 instance.
C. Removes one or more rules from a security group.
D. Removes a security group from our account.

Answer: C

QUESTION 377
Can a 'user' be associated with multiple AWS accounts?

A. No
B. Yes

Answer: A

QUESTION 378
True or False: Manually created DB Snapshots are deleted after the DB Instance is deleted.

A. TRUE
B. FALSE

Answer: A

QUESTION 379
Can I move a Reserved Instance from one Region to another?

A. No
B. Only if they are moving into GovCloud
C. Yes
D. Only if they are moving to US East from another region

Answer: A

QUESTION 380
What is Amazon Glacier?

A. You mean Amazon "Iceberg": it's a low-cost storage service.


B. A security tool that allows to "freeze" an EBS volume and perform computer forensics on it.
C. A low-cost storage service that provides secure and durable storage for data archiving and
backup.
D. It's a security tool that allows to "freeze" an EC2 instance and perform computer forensics on it.

Answer: C

QUESTION 381
What is the durability of S3 RRS?

A. 99.99%
B. 99.95%
C. 99.995%
D. 99.999999999%

Answer: A

QUESTION 382
What does specifying the mapping /dev/sdc=none when launching an instance do?

A. Prevents /dev/sdc from creating the instance.


B. Prevents /dev/sdc from deleting the instance.
C. Set the value of /dev/sdc to 'zero'.
D. Prevents /dev/sdc from attaching to the instance.

Answer: D

QUESTION 383
Is Federated Storage Engine currently supported by Amazon RDS for MySQL?

A. Only for Oracle RDS instances


B. No
C. Yes
D. Only in VPC

Answer: B

QUESTION 384
Is there a limit to how many groups a user can be in?

A. Yes for all users


B. Yes for all users except root
C. No
D. Yes unless special permission granted

Answer: A

QUESTION 385
True or False: When you perform a restore operation to a point in time or from a DB Snapshot, a
new DB Instance is created with a new endpoint.

A. FALSE
B. TRUE

Answer: B

QUESTION 386

A/An _____ acts as a firewall that controls the traffic allowed to reach one or more instances.

A. security group
B. ACL
C. IAM
D. Private IP Addresses

Answer: A

QUESTION 387
Will my standby RDS instance be in the same Availability Zone as my primary?

A. Only for Oracle RDS types


B. Yes
C. Only if configured at launch
D. No

Answer: D

QUESTION 388
While launching an RDS DB instance, on which page can I select the Availability Zone?

A. REVIEW
B. DB INSTANCE DETAILS
C. MANAGEMENT OPTIONS
D. ADDITIONAL CONFIGURATION

Answer: D

QUESTION 389
What does the following command do with respect to the Amazon EC2 security groups? ec2-
create-group CreateSecurityGroup

A. Groups the user created security groups in to a new group for easy access.
B. Creates a new security group for use with your account.
C. Creates a new group inside the security group.
D. Creates a new rule inside the security group.

Answer: B

QUESTION 390
In the Launch Db Instance Wizard, where can I select the backup and maintenance options?

A. Under DB INSTANCE DETAILS


B. Under REVIEW
C. Under MANAGEMENT OPTIONS
D. Under ENGINE SELECTION

Answer: C

QUESTION 391
What happens to the data on an instance if the instance reboots (intentionally or unintentionally)?

A. Data will be lost


B. Data persists
C. Data may persist however cannot be sure

Answer: B

QUESTION 392
How many types of block devices does Amazon EC2 support?

A. 2
B. 3
C. 4
D. 1

Answer: A

QUESTION 393
Provisioned IOPS Costs: you are charged for the IOPS and storage whether or not you use them
in a given month.

A. FALSE
B. TRUE

Answer: B

QUESTION 394
IAM provides several policy templates you can use to automatically assign permissions to the
groups you create. The _____ policy template gives the Admins group permission to access all
account resources, except your AWS account information

A. Read Only Access


B. Power User Access
C. AWS Cloud Formation Read Only Access
D. Administrator Access

Answer: D

QUESTION 395
While performing the volume status checks, if the status is insufficient-data, what does it mean?

A. the checks may still be in progress on the volume


B. the check has passed

C. the check has failed

Answer: A

QUESTION 396
IAM's Policy Evaluation Logic always starts with a default ____________ for every request,
except for those that use the AWS account's root security credentials.

A. Permit
B. Deny
C. Cancel

Answer: B

QUESTION 397
By default, when an EBS volume is attached to a Windows instance, it may show up as any drive
letter on the instance. You can change the settings of the _____ Service to set the drive letters of
the EBS volumes per your specifications.

A. EBSConfig Service
B. AMIConfig Service
C. Ec2Config Service
D. Ec2-AMIConfig Service

Answer: C

QUESTION 398
For each DB Instance class, what is the maximum size of associated storage capacity?

A. 5GB
B. 1TB
C. 2TB
D. 500GB

Answer: B

QUESTION 399
SQL Server __________ store logins and passwords in the master database.

A. can be configured to but by default does not


B. doesn't
C. does

Answer: C

QUESTION 400
What is Oracle SQL Developer?

A. An AWS developer who is an expert in Amazon RDS using both the Oracle and SQL Server DB
engines
B. A graphical Java tool distributed without cost by Oracle.
C. It is a variant of the SQL Server Management Studio designed by Microsoft to support Oracle
DBMS functionalities
D. A different DBMS released by Microsoft free of cost

Answer: B

QUESTION 401
Does Amazon RDS allow direct host access via Telnet, Secure Shell (SSH), or Windows Remote
Desktop Connection?

A. Yes
B. No
C. Depends on if it is in VPC or not

Answer: B

QUESTION 402
To view information about an Amazon EBS volume, open the Amazon EC2 console at
https://console.aws.amazon.com/ec2/, click __________ in the Navigation pane.

A. EBS
B. Describe
C. Details
D. Volumes

Answer: D

QUESTION 403
Using Amazon IAM, can I give permission based on organizational groups?

A. Yes but only in certain cases


B. No
C. Yes always

Answer: C

QUESTION 404
While creating the snapshots using the API, which Action should I be using?

A. MakeSnapShot
B. FreshSnapshot
C. DeploySnapshot
D. CreateSnapshot

Answer: D

QUESTION 405
What is an isolated database environment running in the cloud (Amazon RDS) called?

A. DB Instance
B. DB Server
C. DB Unit
D. DB Volume

Answer: A

QUESTION 406
While signing REST/Query requests, for additional security, you should transmit your requests
using Secure Sockets Layer (SSL) by using _________

A. HTTP
B. Internet Protocol Security(IPsec)
C. TLS (Transport Layer Security)
D. HTTPS

Answer: D

QUESTION 407
What happens to the I/O operations while you take a database snapshot?

A. I/O operations to the database are suspended for a few minutes while the backup is in progress.
B. I/O operations to the database are sent to a Replica (if available) for a few minutes while the
backup is in progress.
C. I/O operations will be functioning normally
D. I/O operations to the database are suspended for an hour while the backup is in progress

Answer: A

QUESTION 408
Read Replicas require a transactional storage engine and are only supported for the _________
storage engine

A. OracleISAM
B. MSSQLDB
C. InnoDB
D. MyISAM

Answer: C

QUESTION 409
When running my DB Instance as a Multi-AZ deployment, can I use the standby for read or write
operations?

A. Yes
B. Only with MSSQL based RDS
C. Only for Oracle RDS instances
D. No

Answer: D

QUESTION 410
When should I choose Provisioned IOPS over Standard RDS storage?

A. If you have batch-oriented workloads


B. If you use production online transaction processing (OLTP) workloads.
C. If you have workloads that are not sensitive to consistent performance

Answer: B

QUESTION 411
In the 'Detailed' monitoring data available for your Amazon EBS volumes, Provisioned IOPS
volumes automatically send _____ minute metrics to Amazon CloudWatch.

A. 3
B. 1
C. 5
D. 2

Answer: B

QUESTION 412
What is the minimum charge for the data transferred between Amazon RDS and Amazon EC2
Instances in the same Availability Zone?

A. USD 0.10 per GB


B. No charge. It is free.
C. USD 0.02 per GB
D. USD 0.01 per GB

Answer: B

QUESTION 413
Are Reserved Instances available for Multi-AZ Deployments?

A. Only for Cluster Compute instances


B. Yes for all instance types
C. Only for M3 instance types
D. No

Answer: B

QUESTION 414
Which service enables AWS customers to manage users and permissions in AWS?

A. AWS Access Control Service (ACS)


B. AWS Identity and Access Management (IAM)
C. AWS Identity Manager (AIM)

Answer: B

QUESTION 415
Which Amazon Storage behaves like raw, unformatted, external block devices that you can
attach to your instances?

A. None of these.
B. Amazon Instance Storage
C. Amazon EBS
D. All of these

Answer: C

QUESTION 416
Which Amazon service can I use to define a virtual network that closely resembles a traditional
data center?

A. Amazon VPC
B. Amazon ServiceBus
C. Amazon EMR
D. Amazon RDS

Answer: A

QUESTION 417
What is the command line instruction for running the remote desktop client in Windows?

A. desk.cpl
B. mstsc

Answer: B

QUESTION 418
Amazon RDS automated backups and DB Snapshots are currently supported for only the ______
storage engine

A. MyISAM
B. InnoDB

Answer: B

QUESTION 419
MySQL installations default to port _____.

A. 3306
B. 443
C. 80
D. 1158

Answer: A

QUESTION 420
If you have chosen Multi-AZ deployment, in the event of a planned or unplanned outage of your primary DB Instance, Amazon RDS automatically switches to the standby replica. The automatic failover mechanism simply changes the ______ record of the main DB Instance to point to the standby DB Instance.

A. DNAME
B. CNAME
C. TXT
D. MX

Answer: B

QUESTION 421
If I modify a DB Instance or the DB parameter group associated with the instance, should I reboot
the instance for the changes to take effect?

A. No
B. Yes

Answer: B

QUESTION 422
If I want to run a database in an Amazon instance, which is the most recommended Amazon
storage option?

A. Amazon Instance Storage


B. Amazon EBS
C. You can't run a database inside an Amazon instance.
D. Amazon S3

Answer: B

QUESTION 423
In regards to IAM you can edit user properties later, but you cannot use the console to change
the ___________.

A. user name

B. password
C. default group

Answer: A

QUESTION 424
Can I test my DB Instance against a new version before upgrading?

A. No
B. Yes
C. Only in VPC

Answer: B

QUESTION 425
True or False: If you add a tag that has the same key as an existing tag on a DB Instance, the
new value overwrites the old value.

A. FALSE
B. TRUE

Answer: B

QUESTION 426
Can I use Provisioned IOPS with VPC?

A. Only Oracle based RDS


B. No
C. Only with MSSQL based RDS
D. Yes for all RDS instances

Answer: D

QUESTION 427
Making your snapshot public shares all snapshot data with everyone. Can the snapshots with
AWS Marketplace product codes be made public?

A. No
B. Yes

Answer: B

QUESTION 428
"To ensure failover capabilities, consider using a _____ for incoming traffic on a network
interface".

A. primary public IP
B. secondary private IP

C. secondary public IP
D. add on secondary IP

Answer: B

QUESTION 429
If I have multiple Read Replicas for my master DB Instance and I promote one of them, what
happens to the rest of the Read Replicas?

A. The remaining Read Replicas will still replicate from the older master DB Instance
B. The remaining Read Replicas will be deleted
C. The remaining Read Replicas will be combined to one read replica

Answer: A

QUESTION 430
What does Amazon CloudFormation provide?

A. The ability to setup Autoscaling for Amazon EC2 instances.


B. None of these.
C. A templated resource creation for Amazon Web Services.
D. A template to map network resources for Amazon Web Services.

Answer: C

QUESTION 431
Can I encrypt connections between my application and my DB Instance using SSL?

A. No
B. Yes
C. Only in VPC
D. Only in certain regions

Answer: B

QUESTION 432
What are the four levels of AWS Premium Support?

A. Basic, Developer, Business, Enterprise


B. Basic, Startup, Business, Enterprise
C. Free, Bronze, Silver, Gold
D. All support is free

Answer: A

QUESTION 433
What can I access by visiting the URL: http://status.aws.amazon.com/?

A. Amazon Cloud Watch
B. Status of the Amazon RDS DB
C. AWS Service Health Dashboard
D. AWS Cloud Monitor

Answer: C

QUESTION 434
Please select the Amazon EC2 resource which cannot be tagged.

A. images (AMIs, kernels, RAM disks)


B. Amazon EBS volumes
C. Elastic IP addresses
D. VPCs

Answer: C

QUESTION 435
Can the string value of 'Key' be prefixed with 'aws:'?

A. Only in GovCloud
B. Only for S3 not EC2
C. Yes
D. No

Answer: D

QUESTION 436
Because of the extensibility limitations of striped storage attached to Windows Server, Amazon
RDS does not currently support increasing storage on a _____ DB Instance.

A. SQL Server
B. MySQL
C. Oracle

Answer: A

QUESTION 437
Through which of the following interfaces is AWS Identity and Access Management available?

A) AWS Management Console


B) Command line interface (CLI)
C) IAM Query API
D) Existing libraries

A. Only through Command line interface (CLI)


B. A, B and C

C. A and C
D. All of the above

Answer: D

QUESTION 438
Select the incorrect statement

A. In Amazon EC2, the private IP address is only returned to Amazon EC2 when the instance is
stopped or terminated
B. In Amazon VPC, an instance retains its private IP addresses when the instance is stopped.
C. In Amazon VPC, an instance does NOT retain its private IP addresses when the instance is
stopped.
D. In Amazon EC2, the private IP address is associated exclusively with the instance for its lifetime

Answer: C

QUESTION 439
How are the EBS snapshots saved on Amazon S3?

A. Exponentially
B. Incrementally
C. EBS snapshots are not stored in the Amazon S3
D. Decrementally

Answer: B

QUESTION 440
What is the type of monitoring data (for Amazon EBS volumes) which is available automatically in
5- minute periods at no charge called?

A. Basic
B. Primary
C. Detailed
D. Local

Answer: A

QUESTION 441
The new DB Instance that is created when you promote a Read Replica retains the backup
window period.

A. TRUE
B. FALSE

Answer: A

QUESTION 442

What happens when you create a topic on Amazon SNS?

A. The topic is created, and it has the name you specified for it.
B. An ARN (Amazon Resource Name) is created.
C. You can create a topic on Amazon SQS, not on Amazon SNS.
D. This question doesn't make sense.

Answer: B

QUESTION 443
Can I delete a snapshot of the root device of an EBS volume used by a registered AMI?

A. Only via API


B. Only via Console
C. Yes
D. No

Answer: D

QUESTION 444
Can I test my DB Instance against a new version before upgrading?

A. Only in VPC
B. No
C. Yes

Answer: C

QUESTION 445
What is the maximum response time for a Business level Premium Support case?

A. 120 seconds
B. 1 hour
C. 10 minutes
D. 12 hours

Answer: B

QUESTION 446
The _____ service is targeted at organizations with multiple users or systems that use AWS
products such as Amazon EC2, Amazon SimpleDB, and the AWS Management Console.

A. Amazon RDS
B. AWS Integrity Management
C. AWS Identity and Access Management
D. Amazon EMR

Answer: C

QUESTION 447
True or False: Without IAM, you cannot control the tasks a particular user or system can do and
what AWS resources they might use.

A. FALSE
B. TRUE

Answer: A

QUESTION 448
When you use the AWS Management Console to delete an IAM user, IAM also deletes any
signing certificates and any access keys belonging to the user.

A. FALSE
B. This is configurable
C. TRUE

Answer: C

QUESTION 449
When automatic failover occurs, Amazon RDS will emit a DB Instance event to inform you that
automatic failover occurred. You can use the _____ to return information about events related to
your DB Instance

A. FetchFailure
B. DescribeFailure
C. DescribeEvents
D. FetchEvents

Answer: C

QUESTION 450
What is the default maximum number of MFA devices in use per AWS account (at the root
account level)?

A. 1
B. 5
C. 15
D. 10

Answer: A

QUESTION 451
Do the Amazon EBS volumes persist independently from the running life of an Amazon EC2
instance?

A. Only if instructed to when created

B. Yes
C. No

Answer: B

QUESTION 452
Can we attach an EBS volume to more than one EC2 instance at the same time?

A. Yes.
B. No
C. Only EC2-optimized EBS volumes.
D. Only in read mode.

Answer: B

QUESTION 453
Select the correct set of options. These are the initial settings for the default security group:

A. Allow no inbound traffic, Allow all outbound traffic and Allow instances associated with this
security group to talk to each other
B. Allow all inbound traffic, Allow no outbound traffic and Allow instances associated with this
security group to talk to each other
C. Allow no inbound traffic, Allow all outbound traffic and Does NOT allow instances associated with
this security group to talk to each other
D. Allow all inbound traffic, Allow all outbound traffic and Does NOT allow instances associated with
this security group to talk to each other

Answer: A

QUESTION 454
What does Amazon Route53 provide?

A. A global Content Delivery Network.


B. None of these.
C. A scalable Domain Name System.
D. An SSH endpoint for Amazon EC2.

Answer: C

QUESTION 455
What does Amazon ElastiCache provide?

A. A service by this name doesn't exist. Perhaps you mean Amazon CloudCache.
B. A virtual server with a huge amount of memory.
C. A managed In-memory cache service.
D. An Amazon EC2 instance with the Memcached software already pre-installed.

Answer: C

QUESTION 456
How many Elastic IP addresses are allocated by default in an Amazon account?

A. 1 Elastic IP
B. 3 Elastic IP
C. 5 Elastic IP
D. 0 Elastic IP

Answer: D

QUESTION 457
What is a Security Group?

A. None of these.
B. A list of users that can access Amazon EC2 instances.
C. An Access Control List (ACL) for AWS resources.
D. A firewall for inbound traffic, built-in around every Amazon EC2 instance.

Answer: D

QUESTION 458
The one-time payment for Reserved Instances is __________ refundable if the reservation is
cancelled.

A. always
B. in some circumstances
C. never

Answer: C

QUESTION 459
Please select the Amazon EC2 resource which can be tagged.

A. key pairs
B. Elastic IP addresses
C. placement groups
D. Amazon EBS snapshots

Answer: D

QUESTION 460
If an Amazon EBS volume is the root device of an instance, can I detach it without stopping the
instance?

A. Yes but only if Windows instance


B. No

C. Yes
D. Yes but only if a Linux instance

Answer: B

QUESTION 461
If you are using Amazon RDS Provisioned IOPS storage with MySQL and Oracle database
engines, you can scale the throughput of your database Instance by specifying the IOPS rate
from __________ .

A. 1,000 to 100,000
B. 100 to 1,000
C. 10,000 to 100,000
D. 1,000 to 10,000

Answer: D

QUESTION 462
Every user you create in the IAM system starts with ___________.

A. full permissions
B. no permissions
C. partial permissions

Answer: B

QUESTION 463
After an Amazon VPC instance is launched, can I change the VPC security groups it belongs to?

A. Only if the tag "VPC_Change_Group" is true


B. Yes. You can.
C. No. You cannot.
D. Only if the tag "VPC Change Group" is true

Answer: B

QUESTION 464
A ____________ is an individual, system, or application that interacts with AWS programmatically.

A. user
B. AWS Account
C. Group
D. Role

Answer: A

QUESTION 465

Select the correct statement:

A. You don't need to specify the resource identifier while stopping a resource
B. You can terminate, stop, or delete a resource based solely on its tags
C. You can't terminate, stop, or delete a resource based solely on its tags
D. You don't need to specify the resource identifier while terminating a resource

Answer: C

QUESTION 466
Amazon EC2 has no Amazon Resource Names (ARNs) because you can't specify a particular
Amazon EC2 resource in an IAM policy.

A. TRUE
B. FALSE

Answer: A

QUESTION 467
Can I initiate a "forced failover" for my MySQL Multi-AZ DB Instance deployment?

A. Only in certain regions


B. Only in VPC
C. Yes
D. No

Answer: C

QUESTION 468
A group can contain many users. Can a user belong to multiple groups?

A. Yes always
B. No
C. Yes but only if they are using two factor authentication
D. Yes but only in VPC

Answer: A

QUESTION 469
Is the encryption of connections between my application and my DB Instance using SSL for the
MySQL server engines available?

A. Yes
B. Only in VPC
C. Only in certain regions
D. No

Answer: A

QUESTION 470
Which AWS instance address has the following characteristic? "If you stop an instance, its Elastic
IP address is unmapped, and you must remap it when you restart the instance."

A. Both A and B
B. None of these
C. VPC Addresses
D. EC2 Addresses

Answer: D

QUESTION 471
True or False: Common points of failures like generators and cooling equipment are shared
across Availability Zones.

A. TRUE
B. FALSE

Answer: B

QUESTION 472
Please select the most correct answer regarding the persistence of the Amazon Instance Store

A. The data on an instance store volume persists only during the life of the associated Amazon EC2
instance
B. The data on an instance store volume is lost when the security group rule of the associated
instance is changed.
C. The data on an instance store volume persists even after associated Amazon EC2 instance is
deleted

Answer: A

QUESTION 473
Multi-AZ deployment ___________ supported for Microsoft SQL Server DB Instances.

A. is not currently
B. is as of 2013
C. is planned to be in 2014
D. will never be

Answer: A

QUESTION 474
Security groups act like a firewall at the instance level, whereas ____________ are an additional
layer of security that act at the subnet level.

A. DB Security Groups

B. VPC Security Groups
C. network ACLs

Answer: C

QUESTION 475
What does Amazon Elastic Beanstalk provide?

A. An application container on top of Amazon Web Services.


B. A scalable storage appliance on top of Amazon Web Services.
C. A scalable cluster of EC2 instances.
D. A service by this name doesn't exist.

Answer: A

QUESTION 476
Is the SQL Server Audit feature supported in the Amazon RDS SQL Server engine?

A. No
B. Yes

Answer: A

QUESTION 477
Are you able to integrate a multi-factor token service with the AWS Platform?

A. Yes, using the AWS multi-factor token devices to authenticate users on the AWS platform.
B. No, you cannot integrate multi-factor token devices with the AWS platform.
C. Yes, you can integrate private multi-factor token devices to authenticate users to the AWS
platform.

Answer: A

QUESTION 478
My Read Replica appears "stuck" after a Multi-AZ failover and is unable to obtain or apply
updates from the source DB Instance. What do I do?

A. You will need to delete the Read Replica and create a new one to replace it.
B. You will need to disassociate the DB Engine and re associate it.
C. The instance should be deployed to Single AZ and then moved to Multi- AZ once again
D. You will need to delete the DB Instance and create a new one to replace it.

Answer: A

QUESTION 479
Which DNS name can only be resolved within Amazon EC2?

A. Internal DNS name
B. External DNS name
C. Global DNS name
D. Private DNS name

Answer: A

QUESTION 480
If your DB instance runs out of storage space or file system resources, its status will change
to _____ and your DB Instance will no longer be available.

A. storage-overflow
B. storage-full
C. storage-exceed
D. storage-overage

Answer: B

QUESTION 481
Is it possible to access your EBS snapshots?

A. Yes, through the Amazon S3 APIs.


B. Yes, through the Amazon EC2 APIs.
C. No, EBS snapshots cannot be accessed; they can only be used to create a new EBS volume.
D. EBS doesn't provide snapshots.

Answer: B

QUESTION 482
Does Amazon RDS for SQL Server currently support importing data into the msdb database?

A. No
B. Yes

Answer: A

QUESTION 483
Does Route 53 support MX Records?

A. Yes.
B. It supports CNAME records, but not MX records.
C. No
D. Only Primary MX records. Secondary MX records are not supported.

Answer: A

QUESTION 484

Because of the extensibility limitations of striped storage attached to Windows Server, Amazon
RDS does not currently support increasing storage on a _____ DB Instance.

A. SQL Server
B. MySQL
C. Oracle

Answer: A

QUESTION 485
Which Amazon storage do you think is the best for my database-style applications that frequently
encounter many random reads and writes across the dataset?

A. None of these.
B. Amazon Instance Storage
C. Any of these
D. Amazon EBS

Answer: D

QUESTION 486
Select the correct set of steps for exposing the snapshot only to specific AWS accounts

A. Select Public for all the accounts, check mark those accounts with whom you want to expose
the snapshots, and click Save.
B. Select Private, enter the IDs of those AWS accounts, and click Save.
C. Select Public, enter the IDs of those AWS accounts, and click Save.
D. Select Public, mark the IDs of those AWS accounts as private, and click Save.

Answer: B

QUESTION 487
Is decreasing the storage size of a DB Instance permitted?

A. Depends on the RDMS used


B. Yes
C. No

Answer: C

QUESTION 488
When should I choose Provisioned IOPS over Standard RDS storage?

A. If you use production online transaction processing (OLTP) workloads.


B. If you have batch-oriented workloads
C. If you have workloads that are not sensitive to consistent performance

Answer: A

QUESTION 489
In the context of MySQL, version numbers are organized as MySQL version = X.Y.Z. What does
X denote here?

A. release level
B. minor version
C. version number
D. major version

Answer: D

QUESTION 490
In the 'Detailed' monitoring data available for your Amazon EBS volumes, Provisioned IOPS
volumes automatically send _____ minute metrics to Amazon CloudWatch.

A. 5
B. 2
C. 1
D. 3

Answer: C

QUESTION 491
It is advised that you watch the Amazon CloudWatch "_____" metric (available via the AWS
Management Console or Amazon Cloud Watch APIs) carefully and recreate the Read Replica
should it fall behind due to replication errors.

A. Write Lag
B. Read Replica
C. Replica Lag
D. Single Replica

Answer: C

QUESTION 492
Can the string value of 'Key' be prefixed with 'aws:'?

A. No
B. Only for EC2 not S3
C. Yes
D. Only for S3 not EC2

Answer: A

QUESTION 493
By default what are ENIs that are automatically created and attached to instances using the EC2
console set to do when the attached instance terminates?

A. Remain as is
B. Terminate
C. Hibernate
D. Pause

Answer: B

QUESTION 494
Are you able to integrate a multi-factor token service with the AWS Platform?

A. Yes, you can integrate private multi-factor token devices to authenticate users to the AWS
platform.
B. No, you cannot integrate multi-factor token devices with the AWS platform.
C. Yes, using the AWS multi-factor token devices to authenticate users on the AWS platform.

Answer: C

QUESTION 495
You can use _____ and _____ to help secure the instances in your VPC.

A. security groups and multi-factor authentication


B. security groups and 2-Factor authentication
C. security groups and biometric authentication
D. security groups and network ACLs

Answer: D

QUESTION 496
_____ is a durable, block-level storage volume that you can attach to a single, running Amazon
EC2 instance.

A. Amazon S3
B. Amazon EBS
C. None of these
D. All of these

Answer: B

QUESTION 497
Do the Amazon EBS volumes persist independently from the running life of an Amazon EC2
instance?

A. No
B. Only if instructed to when created
C. Yes

Answer: C

QUESTION 498
If I want my instance to run on a single-tenant hardware, which value do I have to set the
instance's tenancy attribute to?

A. dedicated
B. isolated
C. one
D. reserved

Answer: A

QUESTION 499
What does Amazon RDS stand for?

A. Regional Data Server.


B. Relational Database Service.
C. Nothing.
D. Regional Database Service.

Answer: B

QUESTION 500
What is the maximum response time for a Business level Premium Support case?

A. 30 minutes
B. You always get instant responses (within a few seconds).
C. 10 minutes
D. 1 hour

Answer: D

QUESTION 501
What does Amazon ELB stand for?

A. Elastic Linux Box.


B. Encrypted Linux Box.
C. Encrypted Load Balancing.
D. Elastic Load Balancing.

Answer: D

QUESTION 502
What does Amazon CloudFormation provide?

A. None of these.
B. The ability to setup Autoscaling for Amazon EC2 instances.

C. A template to map network resources for Amazon Web Services.
D. A templated resource creation for Amazon Web Services.

Answer: D

QUESTION 503
Is there a limit to the number of groups you can have?

A. Yes for all users except root


B. No
C. Yes unless special permission granted
D. Yes for all users

Answer: D

QUESTION 504
The location of instances is ____________

A. Regional
B. based on Availability Zone
C. Global

Answer: B

QUESTION 505
Is there any way to own a direct connection to Amazon Web Services?

A. You can create an encrypted tunnel to VPC, but you don't own the connection.
B. Yes, it's called Amazon Dedicated Connection.
C. No, AWS only allows access from the public Internet.
D. Yes, it's called Direct Connect.

Answer: D

QUESTION 506
What is the maximum response time for a Business level Premium Support case?

A. 30 minutes
B. 1 hour
C. 12 hours
D. 10 minutes

Answer: B

QUESTION 507
Does DynamoDB support in-place atomic updates?

A. It is not defined
B. No
C. Yes
D. It does support in-place non-atomic updates

Answer: C

QUESTION 508
Is there a method in the IAM system to allow or deny access to a specific instance?

A. Only for VPC based instances


B. Yes
C. No

Answer: C

QUESTION 509
What is an isolated database environment running in the cloud (Amazon RDS) called?

A. DB Instance
B. DB Unit
C. DB Server
D. DB Volume

Answer: A

QUESTION 510
What does Amazon SES stand for?

A. Simple Elastic Server


B. Simple Email Service
C. Software Email Solution
D. Software Enabled Server

Answer: B

QUESTION 511
Amazon S3 doesn't automatically give a user who creates _____ permission to perform other
actions on that bucket or object.

A. a file
B. a bucket or object
C. a bucket or file
D. a object or file

Answer: B

QUESTION 512
Can I attach more than one policy to a particular entity?

A. Yes always
B. Only if within GovCloud
C. No
D. Only if within VPC

Answer: A

QUESTION 513
A _____ is a storage device that moves data in sequences of bytes or bits (blocks). Hint: These
devices support random access and generally use buffered I/O.

A. block map
B. storage block
C. mapping device
D. block device

Answer: D

QUESTION 514
Can I detach the primary (eth0) network interface when the instance is running or stopped?

A. Yes, You can.


B. No. You cannot
C. Depends on the state of the interface at the time

Answer: B

QUESTION 515
What's an ECU?

A. Extended Cluster User.


B. None of these.
C. Elastic Computer Usage.
D. Elastic Compute Unit.

Answer: D

QUESTION 516
REST or Query requests are HTTP or HTTPS requests that use an HTTP verb (such as GET or
POST) and a parameter named Action or Operation that specifies the API you are calling.

A. FALSE
B. TRUE

Answer: B

QUESTION 517
What is the charge for the data transfer incurred in replicating data between your primary and
standby?

A. No charge. It is free.
B. Double the standard data transfer charge
C. Same as the standard data transfer charge
D. Half of the standard data transfer charge

Answer: A

QUESTION 518
Does AWS Direct Connect allow you access to all Availability Zones within a Region?

A. Depends on the type of connection


B. No
C. Yes
D. Only when there's just one availability zone in a region. If there are more than one, only one
availability zone can be accessed directly.

Answer: C

QUESTION 519
What does the "Server Side Encryption" option on Amazon S3 provide?

A. It provides an encrypted virtual disk in the Cloud.


B. It doesn't exist for Amazon S3, but only for Amazon EC2.
C. It encrypts the files that you send to Amazon S3, on the server side.
D. It allows to upload files using an SSL endpoint, for a secure transfer.

Answer: C

QUESTION 520
What does Amazon EBS stand for?

A. Elastic Block Storage


B. Elastic Business Server
C. Elastic Blade Server
D. Elastic Block Store

Answer: D

QUESTION 521
Within the IAM service a GROUP is regarded as a:

A. A collection of AWS accounts

B. It's the group of EC2 machines that gain the permissions specified in the GROUP.
C. There's no GROUP in IAM, but only USERS and RESOURCES.
D. A collection of users.

Answer: D

QUESTION 522
A __________ is the concept of allowing (or disallowing) an entity such as a user, group, or role
some type of access to one or more resources.

A. user
B. AWS Account
C. resource
D. permission

Answer: D

QUESTION 523
After an Amazon VPC instance is launched, can I change the VPC security groups it belongs to?

A. No. You cannot.


B. Yes. You can.
C. Only if you are the root user
D. Only if the tag "VPC_Change_Group" is true

Answer: B

QUESTION 524
Do the system resources on the Micro instance meet the recommended configuration for Oracle?

A. Yes completely
B. Yes but only for certain situations
C. Not in any circumstance

Answer: C

QUESTION 525
Will I be charged if the DB instance is idle?

A. No
B. Yes
C. Only is running in GovCloud
D. Only if running in VPC

Answer: B

QUESTION 526

To help you manage your Amazon EC2 instances, images, and other Amazon EC2 resources,
you can assign your own metadata to each resource in the form of____________

A. special filters
B. functions
C. tags
D. wildcards

Answer: C

QUESTION 527
Are you able to integrate a multi-factor token service with the AWS Platform?

A. No, you cannot integrate multi-factor token devices with the AWS platform.
B. Yes, you can integrate private multi-factor token devices to authenticate users to the AWS
platform.
C. Yes, using the AWS multi-factor token devices to authenticate users on the AWS platform.

Answer: C

QUESTION 528
True or False: When you add a rule to a DB security group, you do not need to specify port
number or protocol.

A. Depends on the RDMS used


B. TRUE
C. FALSE

Answer: B

QUESTION 529
Is there a limit to the number of groups you can have?

A. Yes for all users


B. Yes for all users except root
C. No
D. Yes unless special permission granted

Answer: A

QUESTION 530
Can I initiate a "forced failover" for my Oracle Multi-AZ DB Instance deployment?

A. Yes
B. Only in certain regions
C. Only in VPC
D. No

Answer: A

QUESTION 531
Amazon EC2 provides a repository of public data sets that can be seamlessly integrated into
AWS cloud-based applications. What is the monthly charge for using the public data sets?

A. A 1 time charge of 10$ for all the datasets.


B. 1$ per dataset per month
C. 10$ per month for all the datasets
D. There is no charge for using the public data sets

Answer: D

QUESTION 532
In the Amazon RDS Oracle DB engine, the Database Diagnostic Pack and the Database Tuning
Pack are only available with ______________

A. Oracle Standard Edition


B. Oracle Express Edition
C. Oracle Enterprise Edition
D. None of these

Answer: C

QUESTION 533
Without _____, you must either create multiple AWS accounts-each with its own billing and
subscriptions to AWS products-or your employees must share the security credentials of a single
AWS account.

A. Amazon RDS
B. Amazon Glacier
C. Amazon EMR
D. Amazon IAM

Answer: D

QUESTION 534
Amazon RDS supports SOAP only through __________.

A. HTTP or HTTPS
B. TCP/IP
C. HTTP
D. HTTPS

Answer: D

QUESTION 535
The Amazon EC2 web service can be accessed using the _____ web services messaging

protocol. This interface is described by a Web Services Description Language (WSDL) document.

A. SOAP
B. DCOM
C. CORBA
D. XML-RPC

Answer: A

QUESTION 536
Is creating a Read Replica of another Read Replica supported?

A. Only in VPC
B. Yes
C. Only in certain regions
D. No

Answer: D

QUESTION 537
What is the charge for the data transfer incurred in replicating data between your primary and
standby?

A. Same as the standard data transfer charge


B. Double the standard data transfer charge
C. No charge. It is free
D. Half of the standard data transfer charge

Answer: C

QUESTION 538
HTTP Query-based requests are HTTP requests that use the HTTP verb GET or POST and a
Query parameter named_____________.

A. Action
B. Value
C. Reset
D. Retrieve

Answer: A

QUESTION 539
What happens to the I/O operations while you take a database snapshot?

A. I/O operations to the database are suspended for an hour while the backup is in progress.
B. I/O operations to the database are sent to a Replica (if available) for a few minutes while the
backup is in progress.
C. I/O operations will be functioning normally

D. I/O operations to the database are suspended for a few minutes while the backup is in progress.

Answer: D

QUESTION 540
Amazon RDS creates an SSL certificate and installs the certificate on the DB Instance when
Amazon RDS provisions the instance. These certificates are signed by a certificate authority. The
_____ is stored athttps://rds.amazonaws.com/doc/rds-ssl-ca-cert.pem.

A. private key
B. foreign key
C. public key
D. protected key

Answer: C

QUESTION 541
An AWS customer is deploying a web application that is composed of a front-end running on
Amazon EC2 and of confidential data that is stored on Amazon S3. The customer security policy
requires that all access operations to this sensitive data be authenticated and authorized by a
centralized access management system that is operated by a separate security team. In addition,
the web application team that owns and administers the EC2 web front-end instances is
prohibited from having any ability to access the data that circumvents this centralized access
management system. Which of the following configurations will support these requirements?

A. Encrypt the data on Amazon S3 using a CloudHSM that is operated by the separate security
team.
Configure the web application to integrate with the CloudHSM for decrypting approved data
access operations for trusted end-users.
B. Configure the web application to authenticate end-users against the centralized access
management system. Have the web application provision trusted users STS tokens entitling the
download of approved data directly from Amazon S3
C. Have the separate security team create and IAM role that is entitled to access the data on
Amazon S3. Have the web application team provision their instances with this role while denying
their IAM users access to the data on Amazon S3
D. Configure the web application to authenticate end-users against the centralized access
management system using SAML. Have the end-users authenticate to IAM using their SAML
token and download the approved data directly from S3.

Answer: B
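
As a rough, hypothetical sketch of option B (the user, bucket, and key names below are not from
the question), a central access-management service could hand a vetted end-user short-lived STS
federation credentials scoped to a single approved object:

import json
import boto3

sts = boto3.client("sts")

def issue_download_credentials(user_name, bucket, key):
    # Inline policy limits the temporary credentials to one approved object.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{key}"],
        }],
    }
    token = sts.get_federation_token(
        Name=user_name[:32],       # federated user name (32-character limit)
        Policy=json.dumps(policy),
        DurationSeconds=900,       # short-lived credentials
    )
    return token["Credentials"]    # AccessKeyId, SecretAccessKey, SessionToken

# creds = issue_download_credentials("analyst-01", "confidential-data", "reports/q1.csv")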

QUESTION 542
What is web identity federation?

A. Use of an identity provider like Google or Facebook to become an AWS IAM User.
B. Use of an identity provider like Google or Facebook to exchange for temporary AWS security
credentials.
C. Use of AWS IAM User tokens to log in as a Google or Facebook user.
D. Use of AWS STS Tokens to log in as a Google or Facebook user.

Answer: B

QUESTION 543
You are building a mobile app for consumers to post cat pictures online. You will be storing the
images in AWS S3. You want to run the system very cheaply and simply. Which one of these
options allows you to build a photo sharing application without needing to worry about scaling
expensive uploads processes, authentication/authorization and so forth?

A. Build the application out using AWS Cognito and web identity federation to allow users to log in
using Facebook or Google Accounts. Once they are logged in, the secret token passed to that
user is used to directly access resources on AWS, like AWS S3. (Amazon Cognito is a superset
of the functionality provided by web identity federation.)
B. Use JWT or SAML compliant systems to build authorization policies. Users log in with a
username and password, and are given a token they can use indefinitely to make calls against
the photo infrastructure.
C. Use AWS API Gateway with a constantly rotating API Key to allow access from the client-side.
Construct a custom build of the SDK and include S3 access in it.
D. Create an AWS oAuth Service Domain ad grant public signup and access to the domain. During
setup, add at least one major social media site as a trusted Identity Provider for users

Answer: A
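
As an illustrative sketch of option A from the client side (the identity pool ID, provider name, and
token below are hypothetical), the app exchanges the social login token for temporary AWS
credentials through an Amazon Cognito identity pool and then calls Amazon S3 directly:

import boto3

cognito = boto3.client("cognito-identity", region_name="us-east-1")

# Token returned by the social identity provider after the user logs in (placeholder value).
logins = {"accounts.google.com": "GOOGLE_OR_FACEBOOK_ID_TOKEN"}

identity = cognito.get_id(
    IdentityPoolId="us-east-1:00000000-0000-0000-0000-000000000000",  # hypothetical pool
    Logins=logins,
)
creds = cognito.get_credentials_for_identity(
    IdentityId=identity["IdentityId"],
    Logins=logins,
)["Credentials"]

# The temporary credentials are then used to call S3 directly from the client.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretKey"],
    aws_session_token=creds["SessionToken"],
)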

QUESTION 544
The Marketing Director in your company asked you to create a mobile app that lets users post
sightings of good deeds known as random acts of kindness in 80-character summaries. You
decided to write the application in JavaScript so that it would run on the broadest range of
phones, browsers, and tablets. Your application should provide access to Amazon DynamoDB to
store the good deed summaries. Initial testing of a prototype shows that there aren't large spikes
in usage. Which option provides the most cost-effective and scalable architecture for this
application?

A. Provide the JavaScript client with temporary credentials from the Security Token Service using a
Token Vending Machine
B. Register the application with a Web Identity Provider like Amazon, Google, or Facebook, create
an IAM role for that provider, and set up permissions for the IAM role to allow S3 gets and
DynamoDB puts. You serve your mobile application out of an S3 bucket enabled as a web site.
Your client updates DynamoDB.
C. Provide the JavaScript client with temporary credentials from the Security Token Service using a
Token Vending Machine (TVM) to provide signed credentials mapped to an IAM user allowing
DynamoDB puts. You serve your mobile application out of Apache EC2 instances that are load-
balanced and autoscaled. Your EC2 instances are configured with an IAM role that allows
DynamoDB puts. Your server updates DynamoDB.
D. Register the JavaScript application with a Web Identity Provider like Amazon, Google, or
Facebook, create an IAM role for that provider, and set up permissions for the IAM role to allow
DynamoDB puts. You serve your mobile application out of Apache EC2 instances that are load-
balanced and autoscaled. Your EC2 instances are configured with an IAM role that allows
DynamoDB puts. Your server updates DynamoDB.

Answer: B

QUESTION 545

You run a web application with the following components Elastic Load Balancer (ELB), 3
Web/Application servers, 1 MySQL RDS database with read replicas, and Amazon Simple
Storage Service (Amazon S3) for static content. Average response time for users is increasing
slowly. What three CloudWatch RDS metrics will allow you to identify if the database is the
bottleneck? Choose 3 answers

A. The number of outstanding IOs waiting to access the disk


B. The amount of write latency
C. The amount of disk space occupied by binary logs on the master.
D. The amount of time a Read Replica DB Instance lags behind the source DB Instance
E. The average number of disk I/O operations per second.

Answer: ABD
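
For reference, a minimal sketch (the DB instance identifier is hypothetical) of pulling the three
corresponding CloudWatch RDS metrics, DiskQueueDepth, WriteLatency, and ReplicaLag:

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

def rds_metric(metric_name, db_instance_id):
    # Average over the last hour in 5-minute periods.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )
    return stats["Datapoints"]

for name in ("DiskQueueDepth", "WriteLatency", "ReplicaLag"):
    print(name, rds_metric(name, "mydb-instance"))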

QUESTION 546
Typically, you want your application to check whether a request generated an error before you
spend any time processing results. The easiest way to find out if an error occurred is to look for
an __________ node in the response from the Amazon RDS API.

A. Incorrect
B. Error
C. FALSE

Answer: B

QUESTION 547
In Amazon CloudWatch, which metric should you be checking to ensure that your DB Instance
has enough free storage space?

A. FreeStorage
B. FreeStorageSpace
C. FreeStorageVolume
D. FreeDBStorageSpace

Answer: B

QUESTION 548
A user is receiving a notification from the RDS DB whenever there is a change in the DB security
group. The user does not want to receive these notifications for a month, but does not want to
delete the notification either. How can the user configure this?

A. Change the Disable button for notification to "Yes" in the RDS console
B. Set the send mail flag to false in the DB event notification console
C. The only option is to delete the notification from the console
D. Change the Enable button for notification to "No" in the RDS console

Answer: D
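
Outside the console, the same effect can be achieved programmatically; a minimal sketch (the
subscription name is hypothetical) disables the existing event subscription without deleting it and
re-enables it a month later:

import boto3

rds = boto3.client("rds")

# Turn the notifications off for now; the subscription itself is kept.
rds.modify_event_subscription(SubscriptionName="db-secgroup-changes", Enabled=False)

# A month later, turn them back on.
# rds.modify_event_subscription(SubscriptionName="db-secgroup-changes", Enabled=True)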

QUESTION 549

A sys admin is planning to subscribe to the RDS event notifications. For which of the below
mentioned source categories can the subscription not be configured?

A. DB security group
B. DB snapshot
C. DB options group
D. DB parameter group

Answer: C

QUESTION 550
A user is planning to setup notifications on the RDS DB for a snapshot. Which of the below
mentioned event categories is not supported by RDS for this snapshot source type?

A. Backup
B. Creation
C. Deletion
D. Restoration

Answer: A

QUESTION 551
A system admin is planning to setup event notifications on RDS. Which of the below mentioned
services will help the admin setup notifications?

A. AWS SES
B. AWS Cloudtrail
C. AWS CloudWatch
D. AWS SNS

Answer: D

QUESTION 552
A user has setup an RDS DB with Oracle. The user wants to get notifications when someone
modifies the security group of that DB. How can the user configure that?

A. It is not possible to get the notifications on a change in the security group


B. Configure SNS to monitor security group changes
C. Configure event notification on the DB security group
D. Configure the CloudWatch alarm on the DB for a change in the security group

Answer: C

QUESTION 553
It is advised that you watch the Amazon CloudWatch "_____" metric (available via the AWS
Management Console or Amazon Cloud Watch APIs) carefully and recreate the Read Replica
should it fall behind due to replication errors.

A. Write Lag
B. Read Replica
C. Replica Lag
D. Single Replica

Answer: C

QUESTION 554
Can I encrypt connections between my application and my DB Instance using SSL?

A. No
B. Yes
C. Only in VPC
D. Only in certain regions

Answer: B

QUESTION 555
Which of these configuration or deployment practices is a security risk for RDS?

A. Storing SQL function code in plaintext


B. Non-Multi-AZ RDS instance
C. Having RDS and EC2 instances exist in the same subnet
D. RDS in a public subnet

Answer: D

QUESTION 556
What does Amazon RDS stand for?

A. Regional Data Server.


B. Relational Database Service
C. Regional Database Service.

Answer: B

QUESTION 557
How many relational database engines does RDS currently support?

A. MySQL, Postgres, MariaDB, Oracle and Microsoft SQL Server


B. Just two: MySQL and Oracle.
C. Five: MySQL, PostgreSQL, MongoDB, Cassandra and SQLite.
D. Just one: MySQL.

Answer: A

QUESTION 558

If I modify a DB Instance or the DB parameter group associated with the instance, should I reboot
the instance for the changes to take effect?

A. No
B. Yes

Answer: B

QUESTION 559
What is the name of licensing model in which I can use your existing Oracle Database licenses to
run Oracle deployments on Amazon RDS?

A. Bring Your Own License


B. Role Bases License
C. Enterprise License
D. License Included

Answer: A

QUESTION 560
Will I be charged if the DB instance is idle?

A. No
B. Yes
C. Only is running in GovCloud
D. Only if running in VPC

Answer: B

QUESTION 561
What is the minimum charge for the data transferred between Amazon RDS and Amazon EC2
Instances in the same Availability Zone?

A. USD 0.10 per GB


B. No charge. It is free.
C. USD 0.02 per GB
D. USD 0.01 per GB

Answer: B

QUESTION 562
Does Amazon RDS allow direct host access via Telnet, Secure Shell (SSH), or Windows Remote
Desktop Connection?

A. Yes
B. No
C. Depends on if it is in VPC or not

Answer: B

QUESTION 563
What are the two types of licensing options available for using Amazon RDS for Oracle?

A. BYOL and Enterprise License


B. BYOL and License Included
C. Enterprise License and License Included
D. Role based License and License Included

Answer: B

QUESTION 564
A user plans to use RDS as a managed DB platform. Which of the below mentioned features is
not supported by RDS?

A. Automated backup
B. Automated scaling to manage a higher load
C. Automated failure detection and recovery
D. Automated software patching

Answer: B

QUESTION 565
A user is launching an AWS RDS with MySQL. Which of the below mentioned options allows the
user to configure the InnoDB engine parameters?

A. Options group
B. Engine parameters
C. Parameter groups
D. DB parameters

Answer: C

QUESTION 566
A user is planning to use AWS RDS with MySQL. For which of the below mentioned services will
the user not have to pay?

A. Data transfer
B. RDS CloudWatch metrics
C. Data storage
D. I/O requests per month

Answer: B

QUESTION 567
Which of the following notification endpoints or clients does Amazon Simple Notification Service
support? Choose 2 answers

A. Email
B. CloudFront distribution
C. File Transfer Protocol
D. Short Message Service
E. Simple Network Management Protocol

Answer: AD

QUESTION 568
What happens when you create a topic on Amazon SNS?

A. The topic is created, and it has the name you specified for it.
B. An ARN (Amazon Resource Name) is created
C. You can create a topic on Amazon SQS, not on Amazon SNS.
D. This question doesn't make sense.

Answer: B

QUESTION 569
A user has deployed an application on his private cloud. The user is using his own monitoring
tool. He wants to configure that whenever there is an error, the monitoring tool should notify him
via SMS. Which of the below mentioned AWS services will help in this scenario?

A. None, because the user infrastructure is in the private cloud


B. AWS SNS
C. AWS SES
D. AWS SMS

Answer: B

QUESTION 570
A user wants to make it so that whenever the CPU utilization of the AWS EC2 instance is above
90%, the red light in his bedroom turns on.
Which of the below mentioned AWS services is helpful for this purpose?

A. AWS CloudWatch + AWS SES


B. AWS CloudWatch + AWS SNS
C. It is not possible to configure the light with the AWS infrastructure services
D. AWS CloudWatch and a dedicated software turning on the light

Answer: B
Explanation:
Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, and fully managed push
messaging service. Amazon SNS can deliver notifications by SMS text message or email, to
Amazon Simple Queue Service (SQS) queues, or to any HTTP endpoint. The user can configure a
sensor device at his home that receives data on an HTTP endpoint (REST calls) and turns on the
red light. The user can then configure a CloudWatch alarm to send a notification to the AWS SNS
HTTP endpoint (the sensor device), which turns the light red when there is an alarm condition.
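
A minimal sketch of that wiring (the topic name, endpoint URL, and instance ID are hypothetical,
and the sensor device itself is out of scope here):

import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")

# SNS topic with an HTTP subscription pointing at the home sensor device.
topic_arn = sns.create_topic(Name="bedroom-red-light")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="http",
              Endpoint="http://home-sensor.example.com/alarm")

# CloudWatch alarm: average CPU above 90% notifies the topic.
cloudwatch.put_metric_alarm(
    AlarmName="cpu-above-90",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[topic_arn],
)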

QUESTION 571
A user is trying to understand AWS SNS.
To which of the below mentioned end points is SNS unable to send a notification?

A. Email JSON
B. HTTP
C. AWS SQS
D. AWS SES

Answer: D
Explanation:
Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, and fully managed push
messaging service.
Amazon SNS can deliver notifications by SMS text message or email to the Amazon Simple
Queue Service (SQS) queues or to any HTTP endpoint. The user can select one of the following
transports as part of the subscription request: “HTTP”, “HTTPS”, “Email”, “Email-JSON”, “SQS”,
and “SMS”.
http://aws.amazon.com/sns/faqs/

QUESTION 572
A user is running a webserver on EC2. The user wants to receive the SMS when the EC2
instance utilization is above the threshold limit.
Which AWS services should the user configure in this case?

A. AWS CloudWatch + AWS SES


B. AWS CloudWatch + AWS SNS
C. AWS CloudWatch + AWS SQS
D. AWS EC2 + AWS CloudWatch

Answer: B
Explanation:
Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, and fully managed push
messaging service. Amazon SNS can deliver notifications by SMS text message or email, to
Amazon Simple Queue Service (SQS) queues, or to any HTTP endpoint. The user can create a
CloudWatch alarm on the EC2 instance utilization metric and configure it to publish to an SNS
topic that has an SMS subscription, so an SMS is sent whenever utilization goes above the
threshold.

QUESTION 573
A user is planning to host a mobile game on EC2 which sends notifications to active users on
either high score or the addition of new features. The user should get this notification when he is
online on his mobile device.
Which of the below mentioned AWS services can help achieve this functionality?

A. AWS Simple Notification Service


B. AWS Simple Queue Service

C. AWS Mobile Communication Service
D. AWS Simple Email Service

Answer: A
Explanation:
Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, and fully managed push
messaging service.
Amazon SNS makes it simple and cost-effective to push to mobile devices, such as iPhone, iPad,
Android, Kindle Fire, and internet connected smart devices, as well as pushing to other
distributed services.
http://aws.amazon.com/sns
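
As an illustrative sketch only (the platform application ARN and device token are hypothetical), a
game backend could register a player's device with SNS mobile push and send the notification
like this:

import boto3

sns = boto3.client("sns")

# Register the device token against a platform application (e.g. GCM or APNS)
# that was created beforehand in SNS.
endpoint_arn = sns.create_platform_endpoint(
    PlatformApplicationArn="arn:aws:sns:us-east-1:123456789012:app/GCM/my-game",
    Token="device-registration-token-from-the-phone",
)["EndpointArn"]

# Push a notification when the player beats the high score.
sns.publish(TargetArn=endpoint_arn, Message="New high score! Open the game to see it.")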

QUESTION 574
A company is running a batch analysis every hour on their main transactional DB running on an
RDS MySQL instance to populate their central Data Warehouse running on Redshift. During the
execution of the batch their transactional applications are very slow. When the batch completes
they need to update the top management dashboard with the new data. The dashboard is
produced by another system running on-premises that is currently started when a manually-sent
email notifies that an update is required. The on-premises system cannot be modified because it
is managed by another team. How would you optimize this scenario to solve performance issues
and automate the process as much as possible?

A. Replace RDS with Redshift for the batch analysis and SNS to notify the on-premises system to
update the dashboard
B. Replace RDS with Redshift for the batch analysis and SQS to send a message to the on-
premises system to update the dashboard
C. Create an RDS Read Replica for the batch analysis and SNS to notify me on-premises system to
update the dashboard
D. Create an RDS Read Replica for the batch analysis and SQS to send a message to the on-
premises system to update the dashboard.

Answer: C
Explanation:
Between SNS and SQS, SNS is needed here because a notification must be pushed from the
cloud (publisher) to the subscribing on-premises system; SQS is a distributed message queue, so
options B and D are out. The scenario states that Redshift is already implemented and is being
populated from RDS, so nothing needs to be replaced with Redshift. What remains is to create an
RDS Read Replica for the batch analysis, which takes the hourly batch load off the transactional
database and addresses the performance issue.

QUESTION 575
A data engineer in a manufacturing company is designing a data processing platform that
receives a large volume of unstructured data. The data engineer must populate a well-structured
star schema in Amazon Redshift.
What is the most efficient architecture strategy for this purpose?

A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV
data into the analysis schema within Redshift.
B. Load the unstructured data into Redshift, and use string parsing functions to extract structured
data for inserting into the analysis schema.
C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform
the file contents. Insert the data into the analysis schema on Redshift.
D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and

use AWS Lambda to INSERT the data into Redshift.

Answer: A
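
A rough sketch of the loading half of option A (the cluster, database, IAM role, and S3 prefix are
hypothetical): once EMR has written CSV output to Amazon S3, a COPY statement bulk-loads it
into the star schema, issued here through the Redshift Data API.

import boto3

redshift_data = boto3.client("redshift-data")

copy_sql = """
COPY analysis.fact_orders
FROM 's3://example-bucket/emr-output/fact_orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV;
"""

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dw",
    DbUser="loader",
    Sql=copy_sql,
)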

QUESTION 576
A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the
free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm
must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage.
Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and
export the results of the model into Amazon Machine Learning.
B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming
program step.
C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client
to run analysis against the text index.
D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Answer: B
Explanation:
Analyzing free text across a 5 PB dataset is a job that parallelizes well. Running the existing
Python algorithm as a streaming program step on an Amazon EMR cluster spreads the text
analysis tasks across the cluster while reading the input directly from Amazon S3, which makes
this the most suitable option for this scenario.
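
A hedged sketch of how such a step could be submitted (the cluster ID, bucket, and script names
are hypothetical), running the existing Python code as a Hadoop streaming mapper over the
e-mails stored in S3:

import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLECLUSTER",
    Steps=[{
        "Name": "spam-scoring",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", "s3://example-bucket/code/spam_mapper.py",
                "-mapper", "spam_mapper.py",
                "-reducer", "aggregate",   # built-in aggregate reducer
                "-input", "s3://example-bucket/emails/",
                "-output", "s3://example-bucket/spam-scores/",
            ],
        },
    }],
)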

QUESTION 577
A data engineer chooses Amazon DynamoDB as a data store for a regulated application. This
application must be submitted to regulators for review. The data engineer needs to provide a
control framework that lists the security controls from the process to follow to add new users
down to the physical controls of the data center, including items like security guards and
cameras.
How should this control mapping be achieved using AWS?

A. Request AWS third-party audit reports and/or the AWS quality addendum and map the AWS
responsibilities to the controls that must be provided.
B. Request data center Temporary Auditor access to an AWS data center to verify the control
mapping.
C. Request relevant SLAs and security guidelines for Amazon DynamoDB and define these
guidelines within the application's architecture to map to the control framework.
D. Request Amazon DynamoDB system architecture designs to determine how to map the AWS
responsibilities to the control that must be provided.

Answer: A

QUESTION 578
An administrator needs to design a distribution strategy for a star schema in a Redshift cluster.
The administrator needs to determine the optimal distribution style for the tables in the Redshift
schema.
In which three circumstances would choosing Key-based distribution be most appropriate?
(Select three.)

A. When the administrator needs to optimize a large, slowly changing dimension table.

B. When the administrator needs to reduce cross-node traffic.
C. When the administrator needs to optimize the fact table for parity with the number of slices.
D. When the administrator needs to balance data distribution and collocation data.
E. When the administrator needs to take advantage of data locality on a local node for joins and
aggregates.

Answer: BDE

QUESTION 579
Company A operates in Country X. Company A maintains a large dataset of historical purchase
orders that contains personal data of their customers in the form of full names and telephone
numbers. The dataset consists of 5 text files, 1TB each. Currently the dataset resides on-
premises due to legal requirements of storing personal data in-country. The research and
development department needs to run a clustering algorithm on the dataset and wants to use
Elastic Map Reduce service in the closest AWS region. Due to geographic distance, the minimum
latency between the on-premises system and the closet AWS region is 200 ms.

Which option allows Company A to do clustering in the AWS Cloud and meet the legal
requirement of maintaining personal data in-country?

A. Anonymize the personal data portions of the dataset and transfer the data files into Amazon S3 in
the AWS region. Have the EMR cluster read the dataset using EMRFS.
B. Establish a Direct Connect link between the on-premises system and the AWS region to reduce
latency. Have the EMR cluster read the data directly from the on-premises storage system over
Direct Connect.
C. Encrypt the data files according to encryption standards of Country X and store them on AWS
region in Amazon S3. Have the EMR cluster read the dataset using EMRFS.
D. Use AWS Import/Export Snowball device to securely transfer the data to the AWS region and
copy the files onto an EBS volume. Have the EMR cluster read the dataset using EMRFS.

Answer: B
Explanation:
How to Get Data Into Amazon EMR Amazon EMR provides several ways to get data onto a
cluster. The most common way is to upload the data to Amazon S3 and use the built-in features
of Amazon EMR to load the data onto your cluster. You can also use the Distributed Cache
feature of Hadoop to transfer files from a distributed file system to the local file system. The
implementation of Hive provided by Amazon EMR (Hive version 0.7.1.1 and later) includes
functionality that you can use to import and export data between DynamoDB and an Amazon
EMR cluster. If you have large amounts of on-premises data to process, you may find the AWS
Direct Connect service useful.

QUESTION 580
An administrator needs to design a strategy for the schema in a Redshift cluster. The
administrator needs to determine the optimal distribution style for the tables in the Redshift
schema.
In which two circumstances would choosing EVEN distribution be most appropriate? (Choose
two.)

A. When the tables are highly denormalized and do NOT participate in frequent joins.
B. When data must be grouped based on a specific key on a defined slice.
C. When data transfer between nodes must be eliminated.

D. When a new table has been loaded and it is unclear how it will be joined to dimension.

Answer: AD
Explanation:
EVEN distribution is appropriate when a table does not participate in joins or when there is not a
clear choice between KEY distribution and ALL distribution.
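
For illustration only (table and column names are hypothetical), the distribution style is simply part
of the table DDL; the two statements below contrast a KEY-distributed fact table with an
EVEN-distributed standalone table, and could be run with any SQL client:

# KEY distribution: rows with the same customer_id land on the same slice,
# collocating them with a dimension table distributed on the same key.
create_fact_sql = """
CREATE TABLE fact_sales (
    sale_id     BIGINT,
    customer_id BIGINT DISTKEY,
    amount      DECIMAL(12, 2)
);
"""

# EVEN distribution: rows are spread round-robin across slices; a reasonable
# default for a denormalized table that does not participate in frequent joins.
create_staging_sql = """
CREATE TABLE staging_events (
    event_id BIGINT,
    payload  VARCHAR(65535)
) DISTSTYLE EVEN;
"""

print(create_fact_sql, create_staging_sql)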

QUESTION 581
A large grocery distributor receives daily depletion reports from the field in the form of gzip
archives of CSV files uploaded to Amazon S3. The files range from 500MB to 5GB. These files
are processed daily by an EMR job.

Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The
distributor needs to tune and optimize the data processing workflow with this limited information
to improve the performance of the EMR job.

Which recommendation should an administrator provide?

A. Reduce the HDFS block size to increase the number of task processors.
B. Use bzip2 or Snappy rather than gzip for the archives.
C. Decompress the gzip archives and store the data as CSV files.
D. Use Avro rather than gzip for the archives.

Answer: B
Explanation:
Because bzip2 or Snappy files are splittable and therefore multiple mappers can be used in
parallel.

QUESTION 582
A web-hosting company is building a web analytics tool to capture clickstream data from all of the
websites hosted within its platform and to provide near-real-time business intelligence. This entire
system is built on AWS services. The web-hosting company is interested in using Amazon
Kinesis to collect this data and perform sliding window analytics.

What is the most reliable and fault-tolerant technique to get each website to send data to Amazon
Kinesis with every click?

A. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis
PutRecord API. Use the sessionID as a partition key and set up a loop to retry until a success
response is received.
B. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis
Producer Library .addRecords method.
C. Each web server buffers the requests until the count reaches 500 and sends them to Amazon
Kinesis using the Amazon Kinesis PutRecord API.
D. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis
PutRecord API. Use the exponential back-off algorithm for retries until a successful response is
received.

Answer: A
Explanation:
There is a concept of a back-off algorithm in Kinesis:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-retries-rate-limiting.html
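
To make the retry idea concrete, here is a small hypothetical sketch (the stream name and payload
are made up) of a per-click put with retries; an exponential back-off, as in option D, simply doubles
the sleep between attempts:

import time
import boto3

kinesis = boto3.client("kinesis")

def send_click(session_id, payload, max_attempts=5):
    delay = 0.1
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName="clickstream",  # hypothetical stream
                Data=payload,              # bytes of the click event
                PartitionKey=session_id,   # spreads sessions across shards
            )
        except kinesis.exceptions.ProvisionedThroughputExceededException:
            time.sleep(delay)
            delay *= 2                     # exponential back-off between retries
    raise RuntimeError("record could not be delivered")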

QUESTION 583
A customer has an Amazon S3 bucket. Objects are uploaded simultaneously by a cluster of
servers from multiple streams of data. The customer maintains a catalog of objects uploaded in
Amazon S3 using an Amazon DynamoDB table. This catalog has the following fields:
StreamName, TimeStamp, and ServerName, from which ObjectName can be obtained.

The customer needs to define the catalog to support querying for a given stream or server within
a defined time range.

Which DynamoDB table scheme is most efficient to support these queries?

A. Define a Primary Key with ServerName as Partition Key and TimeStamp as Sort Key. Do NOT
define a Local Secondary Index or Global Secondary Index.
B. Define a Primary Key with StreamName as Partition Key and TimeStamp followed by
ServerName as Sort Key. Define a Global Secondary Index with ServerName as partition key and
TimeStamp followed by StreamName.
C. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with
StreamName as Partition Key. Define a Global Secondary Index with TimeStamp as Partition
Key.
D. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with
TimeStamp as Partition Key. Define a Global Secondary Index with StreamName as Partition Key
and TimeStamp as Sort Key.

Answer: B
Explanation:
B is correct because the composite primary key (StreamName as partition key, TimeStamp followed
by ServerName as sort key) supports time-range queries for a given stream, and the global
secondary index (ServerName as partition key, TimeStamp followed by StreamName as sort key)
supports the same queries for a given server.
A is incorrect because it provides no way to query by StreamName.
C and D are incorrect because, once the table has been created, a local secondary index can no
longer be added.
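
A minimal boto3 sketch of the schema in option B (table name, index name, and the concatenated
sort-key attributes are assumptions for illustration):

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="ObjectCatalog",
    AttributeDefinitions=[
        {"AttributeName": "StreamName", "AttributeType": "S"},
        {"AttributeName": "TimeStampServer", "AttributeType": "S"},  # "TimeStamp#ServerName"
        {"AttributeName": "ServerName", "AttributeType": "S"},
        {"AttributeName": "TimeStampStream", "AttributeType": "S"},  # "TimeStamp#StreamName"
    ],
    KeySchema=[
        {"AttributeName": "StreamName", "KeyType": "HASH"},
        {"AttributeName": "TimeStampServer", "KeyType": "RANGE"},
    ],
    GlobalSecondaryIndexes=[{
        "IndexName": "ByServer",
        "KeySchema": [
            {"AttributeName": "ServerName", "KeyType": "HASH"},
            {"AttributeName": "TimeStampStream", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)

Because the timestamp is the prefix of both sort keys, a Query with a BETWEEN condition on the
sort key returns the objects for a given stream or server within a defined time range.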

QUESTION 584
A company has several teams of analysts. Each team of analysts has their own cluster. The
teams need to run SQL queries using Hive, Spark-SQL, and Presto with Amazon EMR. The
company needs to enable a centralized metadata layer to expose the Amazon S3 objects as
tables to the analysts.

Which approach meets the requirement for a centralized metadata layer?

A. EMRFS consistent view with a common Amazon DynamoDB table


B. Bootstrap action to change the Hive Metastore to an Amazon RDS database
C. s3distcp with the outputManifest option to generate RDS DDL
D. Naming scheme support with automatic partition discovery from Amazon S3

Answer: B
Explanation:
EMRFS consistent view (option A) only addresses the Amazon S3 data consistency model; for
example, if you add objects to Amazon S3 in one operation and then immediately list objects in a
subsequent operation, the listing may be incomplete. It does not expose S3 objects as tables.
Moving the Hive Metastore to an external Amazon RDS database gives every cluster a single,
centralized metadata layer that Hive, Spark-SQL, and Presto can all use.
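
On recent EMR releases this is typically done with a hive-site configuration classification
rather than a bootstrap script; a hedged boto3 sketch follows, in which the RDS endpoint,
database name, credentials, and cluster settings are all placeholders:

import boto3

emr = boto3.client("emr")

hive_metastore_config = [{
    "Classification": "hive-site",
    "Properties": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:mysql://hive-metastore.example.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": "hive",
        "javax.jdo.option.ConnectionPassword": "EXAMPLE-PASSWORD",
    },
}]

emr.run_job_flow(
    Name="analyst-cluster",
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}, {"Name": "Presto"}],
    Configurations=hive_metastore_config,   # every cluster launched this way shares the metastore
    Instances={"MasterInstanceType": "m5.xlarge",
               "SlaveInstanceType": "m5.xlarge",
               "InstanceCount": 3,
               "KeepJobFlowAliveWhenNoSteps": True},
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)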

QUESTION 585
An administrator needs to manage a large catalog of items from various external sellers. The
administrator needs to determine if the items should be identified as minimally dangerous,
dangerous, or highly dangerous based on their textual descriptions. The administrator already
has some items with the danger attribute, but receives hundreds of new item descriptions every
day without such classification.

The administrator has a system that captures dangerous goods reports from the customer support
team or from user feedback.

What is a cost-effective architecture to solve this issue?

A. Build a set of regular expression rules that are based on the existing examples, and run them on
the DynamoDB Streams as every new item description is added to the system.
B. Build a Kinesis Streams process that captures and marks the relevant items in the dangerous
goods reports using a Lambda function once more than two reports have been filed.
C. Build a machine learning model to properly classify dangerous goods and run it on the
DynamoDB Streams as every new item description is added to the system.
D. Build a machine learning model with binary classification for dangerous goods and run it on the
DynamoDB Streams as every new item description is added to the system.

Answer: C
Explanation:
https://aws.amazon.com/cn/blogs/machine-learning/anomaly-detection-on-amazon-dynamodb-
streams-using-the-amazon-sagemaker-random-cut-forest-algorithm/

QUESTION 586
A company receives data sets coming from external providers on Amazon S3. Data sets from
different providers are dependent on one another. Data sets will arrive at different times and in no
particular order.

A data architect needs to design a solution that enables the company to do the following:

Rapidly perform cross data set analysis as soon as the data becomes
available
Manage dependencies between data sets that arrive at different times

Which architecture strategy offers a scalable and cost-effective solution that meets these
requirements?

A. Maintain data dependency information in Amazon RDS for MySQL. Use an AWS Data Pipeline
job to load an Amazon EMR Hive table based on task dependencies and event notification
triggers in Amazon S3.
B. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon SNS and
event notifications to publish data to fleet of Amazon EC2 workers. Once the task dependencies
have been resolved, process the data with Amazon EMR.
C. Maintain data dependency information in an Amazon ElastiCache Redis cluster. Use Amazon S3
event notifications to trigger an AWS Lambda function that maps the S3 object to Redis. Once the
task dependencies have been resolved, process the data with Amazon EMR.
D. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon S3 event
notifications to trigger an AWS Lambda function that maps the S3 object to the task associated
with it in DynamoDB. Once all task dependencies have been resolved, process the data with
Amazon EMR.

Answer: D
Explanation:
DynamoDB stores the dependency state cheaply, Amazon S3 event notifications and AWS Lambda are
serverless and scale with the arrival rate, and Amazon EMR processing starts as soon as all
dependencies for a data set are resolved, which makes option D the scalable and cost-effective
choice.
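
A hypothetical Lambda handler for option D; the table name, key layout, object-key convention,
and the EMR step submission are illustrative assumptions, not part of the question:

import boto3

dynamodb = boto3.resource("dynamodb")
emr = boto3.client("emr")
TABLE = dynamodb.Table("DatasetDependencies")
CLUSTER_ID = "j-EXAMPLE"

def handler(event, context):
    for record in event["Records"]:                   # S3 event notification payload
        key = record["s3"]["object"]["key"]
        dataset = key.split("/")[0]                   # assumed layout: <dataset>/<file>
        # Mark this data set as arrived and read back its dependency list.
        item = TABLE.update_item(
            Key={"dataset": dataset},
            UpdateExpression="SET arrived = :t",
            ExpressionAttributeValues={":t": True},
            ReturnValues="ALL_NEW",
        )["Attributes"]
        deps = item.get("depends_on", [])
        arrived = [TABLE.get_item(Key={"dataset": d}).get("Item", {}).get("arrived")
                   for d in deps]
        # Once every dependency has arrived, submit the EMR processing step.
        if all(arrived):
            emr.add_job_flow_steps(
                JobFlowId=CLUSTER_ID,
                Steps=[{"Name": "analyze-" + dataset,
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {"Jar": "command-runner.jar",
                                          "Args": ["spark-submit",
                                                   "s3://example/jobs/analyze.py", dataset]}}],
            )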

QUESTION 587
A media advertising company handles a large number of real-time messages sourced from over
200 websites in real time. Processing latency must be kept low. Based on calculations, a 60-
shard Amazon Kinesis stream is more than sufficient to handle the maximum data throughput,
even with traffic spikes. The company also uses an Amazon Kinesis Client Library (KCL)
application running on Amazon Elastic Compute Cloud (EC2) managed by an Auto Scaling
group. Amazon CloudWatch indicates an average of 25% CPU and a modest level of network
traffic across all running servers.

The company reports a 150% to 200% increase in latency of processing messages from Amazon
Kinesis during peak times. There are NO reports of delay from the sites publishing to Amazon
Kinesis.

What is the appropriate solution to address the latency?

A. Increase the number of shards in the Amazon Kinesis stream to 80 for greater concurrency.
B. Increase the size of the Amazon EC2 instances to increase network throughput.
C. Increase the minimum number of instances in the Auto Scaling group.
D. Increase Amazon DynamoDB throughput on the checkpoint table.

Answer: D
Explanation:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html

QUESTION 588
A Redshift data warehouse has different user teams that need to query the same table with very
different query types. These user teams are experiencing poor performance.

Which action improves performance for the user teams in this situation?

A. Create custom table views.


B. Add interleaved sort keys per team.
C. Maintain team-specific copies of the table.
D. Add support for workload management queue hopping.

Answer: B
Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html#t_Sorting_data-interleaved

QUESTION 589
A company operates an international business served from a single AWS region. The company
wants to expand into a new country. The regulator for that country requires the Data Architect to
maintain a log of financial transactions in the country within 24 hours of the product transaction.
The production application is latency insensitive. The new country contains another AWS region.

What is the most cost-effective way to meet this requirement?

A. Use CloudFormation to replicate the production application to the new region.


B. Use Amazon CloudFront to serve application content locally in the country; Amazon CloudFront
logs will satisfy the requirement.
C. Continue to serve customers from the existing region while using Amazon Kinesis to stream
transaction data to the regulator.
D. Use Amazon S3 cross-region replication to copy and persist production transaction logs to a
bucket in the new country's region.

Answer: D
Explanation:
CRR is a bucket-level configuration, and it can help you meet compliance requirements and
minimize latency by keeping copies of your data in different Regions.
https://aws.amazon.com/blogs/big-data/trigger-cross-region-replication-of-pre-existing-objects-
using-amazon-s3-inventory-amazon-emr-and-amazon-athena/

QUESTION 590
An administrator needs to design the event log storage architecture for events from mobile
devices. The event data will be processed by an Amazon EMR cluster daily for aggregated
reporting and analytics before being archived.

How should the administrator recommend storing the log data?

A. Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on
the device folders.
B. Create an Amazon DynamoDB table partitioned on the device and sorted on date, write log data
to table. Execute the EMR job on the Amazon DynamoDB table.
C. Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the
daily folder.
D. Create an Amazon DynamoDB table partitioned on EventID, write log data to table. Execute the
EMR job on the table.

Answer: C
Explanation:
The EMR job runs daily, so organizing the Amazon S3 objects into folders by day (time-based, not
device-based) lets each run process only that day's folder.

QUESTION 591
A data engineer wants to use an Amazon Elastic Map Reduce for an application. The data
engineer needs to make sure it complies with regulatory requirements. The auditor must be able
to confirm at any point which servers are running and which network access controls are
deployed.

Which action should the data engineer take to meet this requirement?

A. Provide the auditor IAM accounts with the SecurityAudit policy attached to their group.
B. Provide the auditor with SSH keys for access to the Amazon EMR cluster.
C. Provide the auditor with CloudFormation templates.
D. Provide the auditor with access to AWS DirectConnect to use their existing tools.

Answer: A
Explanation:
https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-
functions.html#jf_security-auditor
For option C, there is no indication that CloudFormation templates exist or reflect what is
actually deployed; attaching the SecurityAudit policy lets the auditor inspect the running
servers and network access controls directly.

QUESTION 592
A social media customer has data from different data sources including RDS running MySQL,
Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to analyze
data from different data sources and to combine the results.

What is the most cost-effective solution to meet these requirements?

A. Load all data from a different database/warehouse to S3. Use Redshift COPY command to copy
data to Redshift for analysis.
B. Install Presto on the EMR cluster where Hive sits. Configure MySQL and PostgreSQL connector
to select from different data sources in a single query.
C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to
analyze.
D. Write a program running on a separate EC2 instance to run queries to three different systems.
Aggregate the results after getting the responses from all three systems.

Answer: B
Explanation:
Presto is a fast SQL query engine designed for interactive analytic queries over large datasets
from multiple sources (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto.html).
Because the EMR cluster is already provisioned, installing Presto with the MySQL and PostgreSQL
connectors is the most cost-effective way to query all three data sources in a single query.

QUESTION 593
An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3,
originating from multiple unique data sources. The customer needs to query common fields
across some of the data sets to be able to perform interactive joins and then display results
quickly.

Which technology is most appropriate to enable this capability?

A. Presto
B. MicroStrategy
C. Pig
D. R Studio

Answer: A
Explanation:
Presto handles petabytes of data and interactive queries well, while Pig is mostly used for ETL
processing.

QUESTION 594
A game company needs to properly scale its game application, which is backed by DynamoDB.
Amazon Redshift has the past two years of historical data. Game traffic varies throughout the
year based on various factors such as season, movie release, and holiday season. An
administrator needs to calculate how much read and write throughput should be provisioned for
DynamoDB table for each week in advance.

How should the administrator accomplish this task?

A. Feed the data into Amazon Machine Learning and build a regression model.
B. Feed the data into Spark Mlib and build a random forest modest.
C. Feed the data into Apache Mahout and build a multi-classification model.
D. Feed the data into Amazon Machine Learning and build a binary classification model.

Answer: A
Explanation:
A regression model predicts a numeric value, and the RCU/WCU throughput to be provisioned is
numeric. Option B could also work, but it is not cost-effective because it requires running an
EMR cluster.

QUESTION 595
A data engineer is about to perform a major upgrade to the DDL contained within an Amazon
Redshift cluster to support a new data warehouse application. The upgrade scripts will include
user permission updates, view and table structure changes as well as additional loading and data
manipulation tasks.

The data engineer must be able to restore the database to its existing state in the event of issues.

Which action should be taken prior to performing this upgrade task?

A. Run an UNLOAD command for all data in the warehouse and save it to S3.
B. Create a manual snapshot of the Amazon Redshift cluster.
C. Make a copy of the automated snapshot on the Amazon Redshift cluster.
D. Call the waitForSnapshotAvailable command from either the AWS CLI or an AWS SDK.

Answer: B
Explanation:
https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-snapshots.html#working-with-
snapshot-restore-table-from-snapshot

QUESTION 596
A large oil and gas company needs to provide near real-time alerts when peak thresholds are
exceeded in its pipeline system. The company has developed a system to capture pipeline
metrics such as flow rate, pressure, and temperature using millions of sensors. The sensors
deliver to AWS IoT.

What is a cost-effective way to provide near real-time alerts on the pipeline metrics?

A. Create an AWS IoT rule to generate an Amazon SNS notification.


B. Store the data points in an Amazon DynamoDB table and poll if for peak metrics data from an
Amazon EC2 application.
C. Create an Amazon Machine Learning model and invoke it with AWS Lambda.
D. Use Amazon Kinesis Streams and a KCL-based application deployed on AWS Elastic Beanstalk.

Answer: A
Explanation:
An AWS IoT rule that publishes an Amazon SNS notification is near real-time and cost-effective
because it requires no additional infrastructure to run or poll.

QUESTION 597
A company is using Amazon Machine Learning as part of a medical software application. The
application will predict the most likely blood type for a patient based on a variety of other clinical
tests that are available when blood type knowledge is unavailable.

What is the appropriate model choice and target attribute combination for this problem?

A. Multi-class classification model with a categorical target attribute.


B. Regression model with a numeric target attribute.
C. Binary Classification with a categorical target attribute.
D. K-Nearest Neighbors model with a multi-class target attribute.

Answer: A
Explanation:
Blood type has more than two possible categorical values, so a multi-class classification model
with a categorical target attribute is the right fit; Amazon Machine Learning does not provide a
K-Nearest Neighbors model.

QUESTION 598
A data engineer is running a DWH on a 25-node Redshift cluster of a SaaS service. The data
engineer needs to build a dashboard that will be used by customers. Five big customers
represent 80% of usage, and there is a long tail of dozens of smaller customers. The data
engineer has selected the dashboarding tool.

How should the data engineer make sure that the larger customer workloads do NOT interfere
with the smaller customer workloads?

A. Apply query filters based on customer-id that can NOT be changed by the user and apply
distribution keys on customer-id.
B. Place the largest customers into a single user group with a dedicated query queue and place the
rest of the customers into a different query queue.
C. Push aggregations into an RDS for Aurora instance.
Connect the dashboard application to Aurora rather than Redshift for faster queries.
D. Route the largest customers to a dedicated Redshift cluster.
Raise the concurrency of the multi-tenant Redshift cluster to accommodate the remaining
customers.

Answer: B
Explanation:
The question asks for a solution in which the large customers' workloads do NOT interfere with
the smaller customers' workloads; it does not require separate clusters or tenants. Placing the
five large customers in their own user group with a dedicated workload management (WLM) query
queue isolates their queries while still serving everyone from the same cluster.
https://docs.aws.amazon.com/redshift/latest/dg/c_workload_mngmt_classification.html

QUESTION 599
An Amazon Kinesis stream needs to be encrypted.

Which approach should be used to accomplish this task?

A. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the
producer.

B. Use a partition key to segment the data by MD5 hash function, which makes it undecipherable
while in transit.
C. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the
consumer.
D. Use a shard to segment the data, which has built-in functionality to make it indecipherable while
in transit.

Answer: A
Explanation:
https://docs.aws.amazon.com/firehose/latest/dev/encryption.html
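
A minimal sketch of option A, encrypting on the producer before the record enters the stream;
the KMS key alias and stream name are assumptions for illustration:

import boto3

kms = boto3.client("kms")
kinesis = boto3.client("kinesis")

def put_encrypted(stream_name, partition_key, plaintext):
    # kms.encrypt accepts up to 4 KB of plaintext; larger payloads would use envelope
    # encryption (generate_data_key) instead.
    ciphertext = kms.encrypt(
        KeyId="alias/clickstream-producer",   # hypothetical CMK alias
        Plaintext=plaintext,
    )["CiphertextBlob"]
    return kinesis.put_record(
        StreamName=stream_name,
        Data=ciphertext,                      # only ciphertext ever leaves the producer
        PartitionKey=partition_key,
    )

# Consumers would call kms.decrypt(CiphertextBlob=record["Data"]) to recover the payload.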

QUESTION 600
An online photo album app has a key design feature to support multiple screens (e.g., desktop,
mobile phone, and tablet) with high-quality displays. Multiple versions of the image must be saved
in different resolutions and layouts.

The image-processing Java program takes an average of five seconds per upload, depending on
the image size and format. Each image upload captures the following image metadata: user,
album, photo label, upload timestamp.

The app should support the following requirements:

Hundreds of user image uploads per second
Maximum image upload size of 10 MB
Maximum image metadata size of 1 KB
Image displayed in optimized resolution in all supported screens no later than one minute after
image upload

Which strategy should be used to meet these requirements?

A. Write images and metadata to Amazon Kinesis. Use a Kinesis Client Library (KCL) application to
run the image processing and save the image output to Amazon S3 and metadata to the app
repository DB.
B. Write image and metadata RDS with BLOB data type. Use AWS Data Pipeline to run the image
processing and save the image output to Amazon S3 and metadata to the app repository DB.
C. Upload image with metadata to Amazon S3, use Lambda function to run the image processing
and save the images output to Amazon S3 and metadata to the app repository DB.
D. Write image and metadata to Amazon Kinesis. Use Amazon Elastic MapReduce (EMR) with
Spark Streaming to run image processing and save the images output to Amazon S3 and
metadata to app repository DB.

Answer: C
Explanation:
https://aws.amazon.com/blogs/big-data/building-and-maintaining-an-amazon-s3-metadata-index-
without-servers/

QUESTION 601
A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its
Redshift schema. The ORDERS table has foreign key relationships with multiple dimension
tables in this schema.

How should the company determine the most appropriate distribution key for the ORDERS table?

A. Identify the largest and most frequently joined dimension table and ensure that it and the
ORDERS table both have EVEN distribution.
B. Identify the largest dimension table and designate the key of this dimension table as the
distribution key of the ORDERS table.
C. Identify the smallest dimension table and designate the key of this dimension table as the
distribution key of the ORDERS table.
D. Identify the largest and the most frequently joined dimension table and designate the key of this
dimension table as the distribution key of the ORDERS table.

Answer: D
Explanation:
https://aws.amazon.com/blogs/big-data/optimizing-for-star-schemas-and-interleaved-sorting-on-
amazon-redshift/

QUESTION 602
A customer is collecting clickstream data using Amazon Kinesis and is grouping the events by IP
address into 5-minute chunks stored in Amazon S3.
Many analysts in the company use Hive on Amazon EMR to analyze this data. Their queries
always reference a single IP address. Data must be optimized for querying based on IP address
using Hive running on Amazon EMR.

What is the most efficient method to query the data with Hive?

A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.
B. Store the Amazon S3 objects with the following naming scheme:
bucket_name/source=ip_address/ year=yy/month=mm/day=dd/hour=hh/filename.
C. Store the data in an HBase table with the IP address as the row key.
D. Store the events for an IP address as a single file in Amazon S3 and add metadata with keys:
Hive_Partitioned_IPAddress.

Answer: B
Explanation:
The analysts already use Hive, so storing the objects with Hive-style partition keys
(source=ip_address/year=/month=/day=/hour=) is the most efficient option: Hive prunes
partitions, and a query for a single IP address reads only that source's files.
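
An illustrative helper that writes each 5-minute chunk under that partition-style key scheme;
the bucket prefix, file suffix, and function name are assumptions:

import boto3

s3 = boto3.client("s3")

def store_chunk(bucket, ip_address, ts, body):
    """ts is a datetime marking the start of the 5-minute chunk; body is the gzipped bytes."""
    key = ("clickstream/source={ip}/year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}/"
           "{t:%M}.json.gz").format(ip=ip_address, t=ts)
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return key

A Hive external table partitioned on (source, year, month, day, hour) over this prefix can then
restrict each query to a single IP address's partitions.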

QUESTION 603
An online retailer is using Amazon DynamoDB to store data related to customer transactions. The
items in the table contain several string attributes describing the transaction as well as a JSON
attribute containing the shopping cart and other details corresponding to the transaction. Average
item size is 250KB, most of which is associated with the JSON attribute. The average customer
generates 3GB of data per month.

Customers access the table to display their transaction history and review transaction details as
needed. Ninety percent of the queries against the table are executed when building the
transaction history view, with the other 10% retrieving transaction details. The table is partitioned
on CustomerID and sorted on transaction date.

The client has very high read capacity provisioned for the table and experiences very even
utilization, but complains about the cost of Amazon DynamoDB compared to other NoSQL
solutions.

Which strategy will reduce the cost associated with the client's read queries while not degrading
quality?

A. Modify all database calls to use eventually consistent reads and advise customers that
transaction history may be one second out-of-date.
B. Change the primary table to partition on TransactionID, create a GSI partitioned on customer and
sorted on date, project small attributes into GSI, and then query GSI for summary data and the
primary table for JSON details.
C. Vertically partition the table, store base attributes on the primary table, and create a foreign key
reference to a secondary table containing the JSON data. Query the primary table for summary
data and the secondary table for JSON details.
D. Create an LSI sorted on date, project the JSON attribute into the index, and then query the
primary table for summary data and the LSI for JSON details.

Answer: B
Explanation:
Tables with a local secondary index are limited to 10 GB per partition key value (item
collection). The average customer generates about 3 GB of data per month, so a single customer
would exceed that limit within a few months, which rules out option D. Option B projects only the
small attributes into the GSI, so the 90% of history queries stay cheap and the large JSON
attribute is read only for detail queries.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html#LSI.ItemCollecti
ons.SizeLimit

QUESTION 604
A company that manufactures and sells smart air conditioning units also offers add-on services
so that customers can see real-time dashboards in a mobile application or a web browser. Each
unit sends its sensor information in JSON format every two seconds for processing and analysis.
The company also needs to consume this data to predict possible equipment problems before
they occur. A few thousand pre-purchased units will be delivered in the next couple of months.
The company expects high market growth in the next year and needs to handle a massive
amount of data and scale without interruption.
Which ingestion solution should the company use?

A. Write sensor data records to Amazon Kinesis Streams. Process the data using KCL applications
for the end-consumer dashboard and anomaly detection workflows.
B. Batch sensor data to Amazon Simple Storage Service (S3) every 15 minutes. Flow the data
downstream to the end-consumer dashboard and to the anomaly detection application.
C. Write sensor data records to Amazon Kinesis Firehose with Amazon Simple Storage Service (S3)
as the destination. Consume the data with a KCL application for the end-consumer dashboard
and anomaly detection.
D. Write sensor data records to Amazon Relational Database Service (RDS). Build both the end-
consumer dashboard and anomaly detection application on top of Amazon RDS.

Answer: A
Explanation:
https://docs.aws.amazon.com/streams/latest/dev/introduction.html

QUESTION 605
An organization needs a data store to handle the following data types and access patterns:

Faceting
Search
Flexible schema (JSON) and fixed schema
Noise word elimination

Which data store should the organization choose?

A. Amazon Relational Database Service (RDS)


B. Amazon Redshift
C. Amazon DynamoDB
D. Amazon Elasticsearch Service

Answer: D

QUESTION 606
A travel website needs to present a graphical quantitative summary of its daily bookings to
website visitors for marketing purposes. The website has millions of visitors per day, but wants to
control costs by implementing the least-expensive solution for this visualization.

What is the most cost-effective solution?

A. Generate a static graph with a transient EMR cluster daily, and store it an Amazon S3.
B. Generate a graph using MicroStrategy backed by a transient EMR cluster.
C. Implement a Jupyter front-end provided by a continuously running EMR cluster leveraging spot
instances for task nodes.
D. Implement a Zeppelin application that runs on a long-running EMR cluster.

Answer: A

QUESTION 607
A system engineer for a company proposes digitalization and backup of large archives for
customers. The systems engineer needs to provide users with a secure storage that makes sure
that data will never be tampered with once it has been uploaded.

How should this be accomplished?

A. Create an Amazon Glacier Vault. Specify a "Deny" Vault Lock policy on this Vault to block
"glacier:DeleteArchive".
B. Create an Amazon S3 bucket. Specify a "Deny" bucket policy on this bucket to block
"s3:DeleteObject".
C. Create an Amazon Glacier Vault. Specify a "Deny" vault access policy on this Vault to block
"glacier:DeleteArchive".
D. Create secondary AWS Account containing an Amazon S3 bucket. Grant "s3:PutObject" to the
primary account.

Answer: A
Explanation:
https://docs.aws.amazon.com/amazonglacier/latest/dev/vault-lock-policy.html
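
A hedged boto3 sketch of option A; the vault name, region, and account number in the ARN are
placeholders, and once complete_vault_lock is called the policy can no longer be changed:

import json

import boto3

glacier = boto3.client("glacier")
VAULT = "patient-archives"

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "deny-archive-deletion",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "glacier:DeleteArchive",
        "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/" + VAULT,
    }],
}

# Initiate the lock, then make it permanent after validating the in-progress policy.
lock = glacier.initiate_vault_lock(accountId="-", vaultName=VAULT,
                                   policy={"Policy": json.dumps(policy)})
glacier.complete_vault_lock(accountId="-", vaultName=VAULT, lockId=lock["lockId"])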

QUESTION 608
An organization needs to design and deploy a large-scale data storage solution that will be highly
durable and highly flexible with respect to the type and structure of data being stored. The data to
be stored will be sent or generated from a variety of sources and must be persistently available
for access and processing by multiple applications.

What is the most cost-effective technique to meet these requirements?

A. Use Amazon Simple Storage Service (S3) as the actual data storage system, coupled with
appropriate tools for ingestion/acquisition of data and for subsequent processing and querying.
B. Deploy a long-running Amazon Elastic MapReduce (EMR) cluster with Amazon Elastic Block
Store (EBS) volumes for persistent HDFS storage and appropriate Hadoop ecosystem tools for
processing and querying.
C. Use Amazon Redshift with data replication to Amazon Simple Storage Service (S3) for
comprehensive durable data storage, processing, and querying.
D. Launch an Amazon Relational Database Service (RDS), and use the enterprise grade and
capacity of the Amazon Aurora engine for storage, processing, and querying.

Answer: A

QUESTION 609
A customer has a machine learning workflow that consists of multiple quick cycles of reads-
writes-reads on Amazon S3. The customer needs to run the workflow on EMR but is concerned
that the reads in subsequent cycles will miss new data critical to the machine learning from the
prior cycles.

How should the customer accomplish this?

A. Turn on EMRFS consistent view when configuring the EMR cluster.


B. Use AWS Data Pipeline to orchestrate the data processing cycles.
C. Set hadoop.data.consistency = true in the core-site.xml file.
D. Set hadoop.s3.consistency = true in the core-site.xml file.

Answer: A

QUESTION 610
An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS
CLI to create a KMS encrypted snapshot of the database in another AWS region.

Which three steps should the data engineer take to accomplish this task? (Choose three.)

A. Create a new KMS key in the destination region.


B. Copy the existing KMS key to the destination region.
C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source
region.
D. In the source region, enable cross-region replication and specify the name of the copy grant
created.
E. In the destination region, enable cross-region replication and specify the name of the copy grant
created.

Answer: ACD
Explanation:
https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html#working-with-
aws-kms
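
A sketch of steps A, C, and D via boto3 (the AWS CLI equivalents follow the same calls); the key
description, grant name, cluster identifier, and regions are placeholders:

import boto3

# Step A: a KMS key must exist in the destination region.
kms_dest = boto3.client("kms", region_name="eu-west-1")
dest_key_id = kms_dest.create_key(Description="redshift-snapshot-copies")["KeyMetadata"]["KeyId"]

# Step C: the snapshot copy grant is created in the destination region for that key.
redshift_dest = boto3.client("redshift", region_name="eu-west-1")
redshift_dest.create_snapshot_copy_grant(
    SnapshotCopyGrantName="xregion-copy-grant",
    KmsKeyId=dest_key_id,
)

# Step D: cross-region snapshot copy is enabled in the source region, naming the grant.
redshift_src = boto3.client("redshift", region_name="us-east-1")
redshift_src.enable_snapshot_copy(
    ClusterIdentifier="analytics-cluster",
    DestinationRegion="eu-west-1",
    SnapshotCopyGrantName="xregion-copy-grant",
    RetentionPeriod=7,
)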

QUESTION 611
Managers in a company need access to the human resources database that runs on Amazon
Redshift, to run reports about their employees. Managers must only see information about their
direct reports.

Which technique should be used to address this requirement with Amazon Redshift?

A. Define an IAM group for each manager with each employee as an IAM user in that group, and
use that to limit the access.
B. Use Amazon Redshift snapshot to create one cluster per manager. Allow the manager to access
only their designated clusters.
C. Define a key for each manager in AWS KMS and encrypt the data for their employees with their
private keys.
D. Define a view that uses the employee's manager name to filter the records based on current user
names.

Answer: D
Explanation:
IAM can allow or deny access at the cluster or database-object level, but not at the row level. A
view that filters rows by comparing the employee's manager name to the current user name gives
each manager access to their direct reports only.
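
A sketch of option D using the Redshift Data API; the cluster, database, schema, table, and
column names are assumptions:

import boto3

rsd = boto3.client("redshift-data")

create_view_sql = """
CREATE OR REPLACE VIEW hr.my_direct_reports AS
SELECT e.*
FROM hr.employees e
WHERE e.manager_username = current_user;
"""

rsd.execute_statement(
    ClusterIdentifier="hr-cluster",
    Database="hr",
    DbUser="admin",
    Sql=create_view_sql,
)
# Managers are then granted SELECT on the view, not on the underlying table.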

QUESTION 612
A company is building a new application in AWS. The architect needs to design a system to
collect application log events. The design should be a repeatable pattern that minimizes data loss
if an application instance fails, and keeps a durable copy of the log data for at least 30 days.

What is the simplest architecture that will allow the architect to analyze the logs?

A. Write them directly to a Kinesis Firehose. Configure Kinesis Firehose to load the events into an
Amazon Redshift cluster for analysis.
B. Write them to a file on Amazon Simple Storage Service (S3). Write an AWS Lambda function that
runs in response to the S3 event to load the events into Amazon Elasticsearch Service for
analysis.
C. Write them to the local disk and configure the Amazon CloudWatch Logs agent to load the data
into CloudWatch Logs and subsequently into Amazon Elasticsearch Service.
D. Write them to CloudWatch Logs and use an AWS Lambda function to load them into HDFS on an
Amazon Elastic MapReduce (EMR) cluster for analysis.

Answer: B
Explanation:
Writing to Amazon S3 gives a simple, durable copy of the logs (easily retained for 30 days or
more), and Amazon Elasticsearch Service is well suited to analyzing them.

QUESTION 613
An organization uses a custom map reduce application to build monthly reports based on many
small data files in an Amazon S3 bucket. The data is submitted from various business units on a
frequent but unpredictable schedule. As the dataset continues to grow, it becomes increasingly
difficult to process all of the data in one day. The organization has scaled up its Amazon EMR
cluster, but other optimizations could improve performance.

The organization needs to improve performance with minimal changes to existing processes and
applications.

What action should the organization take?

A. Use Amazon S3 Event Notifications and AWS Lambda to create a quick search file index in
DynamoDB.
B. Add Spark to the Amazon EMR cluster and utilize Resilient Distributed Datasets in-memory.
C. Use Amazon S3 Event Notifications and AWS Lambda to index each file into an Amazon
Elasticsearch Service cluster.
D. Schedule a daily AWS Data Pipeline process that aggregates content into larger files using
S3DistCp.
E. Have business units submit data via Amazon Kinesis Firehose to aggregate data hourly into
Amazon S3.

Answer: D
Explanation:
https://aws.amazon.com/blogs/big-data/seven-tips-for-using-s3distcp-on-amazon-emr-to-move-
data-efficiently-between-hdfs-and-amazon-s3/
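
A sketch of the aggregation step that a daily AWS Data Pipeline EmrActivity could run; the
bucket names, cluster ID, and groupBy pattern are assumptions:

import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLE",
    Steps=[{
        "Name": "aggregate-small-files",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "s3-dist-cp",
                "--src", "s3://raw-submissions/2019/06/",
                "--dest", "s3://aggregated-submissions/2019/06/",
                "--groupBy", ".*(\\d{4}/\\d{2}).*",   # merge files sharing a month prefix
                "--targetSize", "128",                 # target output object size in MiB
            ],
        },
    }],
)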

QUESTION 614
An administrator is processing events in near real-time using Kinesis streams and Lambda.
Lambda intermittently fails to process batches from one of the shards due to a 5-minute time limit.

What is a possible solution for this problem?

A. Add more Lambda functions to improve concurrent batch processing.


B. Reduce the batch size that Lambda is reading from the stream.
C. Ignore and skip events that are older than 5 minutes and put them to Dead Letter Queue (DLQ).
D. Configure Lambda to read from fewer shards in parallel.

Answer: B
Explanation:
https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
Test with different batch and record sizes so that the polling frequency of each event source is
tuned to how quickly your function is able to complete its task. BatchSize controls the maximum
number of records that can be sent to your function with each invoke. A larger batch size can
often more efficiently absorb the invoke overhead across a larger set of records, increasing your
throughput. By default, Lambda invokes your function as soon as records are available in the
stream. If the batch it reads from the stream only has one record in it, Lambda only sends one
record to the function. To avoid invoking the function with a small number of records, you can tell
the event source to buffer records for up to 5 minutes by configuring a batch window. Before
invoking the function, Lambda continues to read records from the stream until it has gathered a
full batch, or until the batch window expires.
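
A sketch of option B; the function name is a placeholder and the mapping UUID is looked up from
the existing Kinesis event source mapping:

import boto3

lam = boto3.client("lambda")

mappings = lam.list_event_source_mappings(FunctionName="process-events")
uuid = mappings["EventSourceMappings"][0]["UUID"]

lam.update_event_source_mapping(
    UUID=uuid,
    BatchSize=100,   # smaller batches mean each invocation finishes well inside the limit
)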

QUESTION 615
An organization uses Amazon Elastic MapReduce (EMR) to process a series of extract-transform-
load (ETL) steps that run in sequence. The output of each step must be fully processed in
subsequent steps but will not be retained.

Which of the following techniques will meet this requirement most efficiently?

A. Use the EMR File System (EMRFS) to store the outputs from each step as objects in Amazon
Simple Storage Service (S3).

B. Use the s3n URI to store the data to be processed as objects in Amazon S3.
C. Define the ETL steps as separate AWS Data Pipeline activities.
D. Load the data to be processed into HDFS, and then write the final output to Amazon S3.

Answer: C
Explanation:
The question does not require keeping the final output in Amazon S3, and option D only loads the
input into HDFS rather than handing each step's output to the next step. Defining the ETL steps
as separate AWS Data Pipeline activities chains them so the output of each step is fully consumed
by the next without being retained.

QUESTION 616
The department of transportation for a major metropolitan area has placed sensors on roads at
key locations around the city. The goal is to analyze the flow of traffic and notifications from
emergency services to identify potential issues and to help planners correct trouble spots.

A data engineer needs a scalable and fault-tolerant solution that allows planners to respond to
issues within 30 seconds of their occurrence.

Which solution should the data engineer choose?

A. Collect the sensor data with Amazon Kinesis Firehose and store it in Amazon Redshift for
analysis.
Collect emergency services events with Amazon SQS and store in Amazon DynampDB for
analysis.
B. Collect the sensor data with Amazon SQS and store in Amazon DynamoDB for analysis. Collect
emergency services events with Amazon Kinesis Firehose and store in Amazon Redshift for
analysis.
C. Collect both sensor data and emergency services events with Amazon Kinesis Streams and use
DynamoDB for analysis.
D. Collect both sensor data and emergency services events with Amazon Kinesis Firehose and use
Amazon Redshift for analysis.

Answer: C
Explanation:
https://sookocheff.com/post/aws/comparing-kinesis-and-sqs/

QUESTION 617
A telecommunications company needs to predict customer churn (i.e., customers who decide to
switch to a competitor). The company has historic records of each customer, including monthly
consumption patterns, calls to customer service, and whether the customer ultimately quit the
service. All of this data is stored in Amazon S3. The company needs to know which customers
are likely going to churn soon so that they can win back their loyalty.

What is the optimal approach to meet these requirements?

A. Use the Amazon Machine Learning service to build the binary classification model based on the
dataset stored in Amazon S3.
The model will be used regularly to predict churn attribute for existing customers.
B. Use AWS QuickSight to connect it to data stored in Amazon S3 to obtain the necessary business
insight.
Plot the churn trend graph to extrapolate churn likelihood for existing customers.
C. Use EMR to run the Hive queries to build a profile of a churning customer.
Apply a profile to existing customers to determine the likelihood of churn.
D. Use a Redshift cluster to COPY the data from Amazon S3.
Create a User Defined Function in Redshift that computes the likelihood of churn.

Answer: A
Explanation:
https://aws.amazon.com/blogs/machine-learning/predicting-customer-churn-with-amazon-
machine-learning/

QUESTION 618
A system needs to collect on-premises application spool files into a persistent storage layer in
AWS. Each spool file is 2 KB. The application generates 1 M files per hour. Each source file is
automatically deleted from the local server after an hour.
What is the most cost-efficient option to meet these requirements?

A. Write file contents to an Amazon DynamoDB table.


B. Copy files to Amazon S3 Standard Storage.
C. Write file contents to Amazon ElastiCache.
D. Copy files to Amazon S3 infrequent Access Storage.

Answer: A
Explanation:
With smaller object size and total storage size per month, DynamoDB is a more cost effective
solution. S3 would be a better choice if the object size and storage size per month was much
larger.
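
A sketch of option A; the table name and key layout are assumptions. Each 2 KB item costs
roughly 2 WCU, and the batch writer keeps the request count low:

import boto3

table = boto3.resource("dynamodb").Table("SpoolFiles")

def persist_spool_files(files):
    """files: iterable of (file_name, contents) tuples read from the local spool directory."""
    with table.batch_writer() as batch:
        for file_name, contents in files:
            batch.put_item(Item={
                "file_name": file_name,   # partition key
                "contents": contents,     # ~2 KB payload stored directly in the item
            })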

QUESTION 619
An administrator receives about 100 files per hour into Amazon S3 and will be loading the files
into Amazon Redshift. Customers who analyze the data within Redshift gain significant value
when they receive data as quickly as possible. The customers have agreed to a maximum
loading interval of 5 minutes.

Which loading approach should the administrator use to meet this objective?

A. Load each file as it arrives because getting data into the cluster as quickly as possibly is the
priority.
B. Load the cluster as soon as the administrator has the same number of files as nodes in the
cluster.
C. Load the cluster when the administrator has an even multiple of files relative to Cluster Slice
Count, or 5 minutes, whichever comes first.
D. Load the cluster when the number of files is less than the Cluster Slice Count.

Answer: C

QUESTION 620
An enterprise customer is migrating to Redshift and is considering using dense storage nodes in
its Redshift cluster. The customer wants to migrate 50 TB of data. The customer's query patterns
involve performing many joins with thousands of rows.

The customer needs to know how many nodes are needed in its target Redshift cluster. The
customer has a limited budget and needs to avoid performing tests unless absolutely needed.

Which approach should this customer use?

A. Start with many small nodes.


B. Start with fewer large nodes.
C. Have two separate clusters with a mix of a small and large nodes.
D. Insist on performing multiple tests to determine the optimal configuration.

Answer: A
Explanation:
https://d1.awsstatic.com/whitepapers/Size-Cloud-Data-Warehouse-on-AWS.pdf
Using the compression ratio of 3 from the whitepaper: 50 TB / 3 ≈ 16.7 TB, and with 25% working
headroom 16.7 TB x 1.25 ≈ 21 TB of usable capacity is required.
That is roughly 11 ds2.xlarge nodes (2 TB each) or 2 ds2.8xlarge nodes (16 TB each); starting with
many small nodes spreads the join-heavy workload across more slices without over-buying capacity.

QUESTION 621
A company is centralizing a large number of unencrypted small files from multiple Amazon S3
buckets. The company needs to verify that the files contain the same data after centralization.

Which method meets the requirements?

A. Compare the S3 Etags from the source and destination objects.


B. Call the S3 CompareObjects API for the source and destination objects.
C. Place a HEAD request against the source and destination objects comparing SIG v4.
D. Compare the size of the source and destination objects.

Answer: A
Explanation:
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
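
A sketch of option A; bucket and key names are placeholders. For objects uploaded in a single
PUT the ETag is the MD5 of the body, so equal ETags imply identical data (multipart uploads
produce composite ETags, but these small files are single-part):

import boto3

s3 = boto3.client("s3")

def same_content(src_bucket, src_key, dst_bucket, dst_key):
    src_etag = s3.head_object(Bucket=src_bucket, Key=src_key)["ETag"]
    dst_etag = s3.head_object(Bucket=dst_bucket, Key=dst_key)["ETag"]
    return src_etag == dst_etag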

QUESTION 622
An online gaming company uses DynamoDB to store user activity logs and is experiencing
throttled writes on the company's DynamoDB table. The company is NOT consuming close to the
provisioned capacity. The table contains a large number of items and is partitioned on user and
sorted by date. The table is 200GB and is currently provisioned at 10K WCU and 20K RCU.

Which two additional pieces of information are required to determine the cause of the throttling?
(Choose two.)

A. The structure of any GSIs that have been defined on the table
B. CloudWatch data showing consumed and provisioned write capacity when writes are being
throttled
C. Application-level metrics showing the average item size and peak update rates for each attribute
D. The structure of any LSIs that have been defined on the table
E. The maximum historical WCU and RCU for the table

Answer: AD

QUESTION 623
A city has been collecting data on its public bicycle share program for the past three years. The
5PB dataset currently resides on Amazon S3. The data contains the following datapoints:

Bicycle origination points
Bicycle destination points
Mileage between the points
Number of bicycle slots available at the station (which is variable based on the station location)
Number of slots available and taken at a given time

The program has received additional funds to increase the number of bicycle stations available.
All data is regularly archived to Amazon Glacier.
The new bicycle stations must be located to provide the most riders access to bicycles.

How should this task be performed?

A. Move the data from Amazon S3 into Amazon EBS-backed volumes and use an EC-2 based
Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent
optimization.
B. Use the Amazon Redshift COPY command to move the data from Amazon S3 into Redshift and
perform a SQL query that outputs the most popular bicycle stations.
C. Persist the data on Amazon S3 and use a transient EMR cluster with spot instances to run a
Spark streaming job that will move the data into Amazon Kinesis.
D. Keep the data on Amazon S3 and use an Amazon EMR-based Hadoop cluster with spot
instances to run a Spark job that performs a stochastic gradient descent optimization over
EMRFS.

Answer: D
Explanation:
For B, the query result is the existed bicycle stations who are most popular. But this question is to
add more bicycle stations which is no way to build the same place. So it is a prediction problem,
D is the right answer,

QUESTION 624
An administrator tries to use the Amazon Machine Learning service to classify social media posts
that mention the administrator's company into posts that require a response and posts that do
not. The training dataset of 10,000 posts contains the details of each post including the
timestamp, author, and full text of the post. The administrator is missing the target labels that are
required for training.

Which Amazon Machine Learning model is the most appropriate for the task?

A. Binary classification model, where the target class is the require-response post
B. Binary classification model, where the two classes are the require-response post and does-not-
require-response
C. Multi-class prediction model, with two classes: require-response post and does-not-require-
response
D. Regression model where the predicted value is the probability that the post requires a response

Answer: B
Explanation:
https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html
Binary Classification Model

ML models for binary classification problems predict a binary outcome (one of two possible
classes). To train binary classification models, Amazon ML uses the industry-standard learning
algorithm known as logistic regression

QUESTION 625
A medical record filing system for a government medical fund is using an Amazon S3 bucket to
archive documents related to patients. Every patient visit to a physician creates a new file, which
can add up to millions of files each month. Collection of these files from each physician is
handled via a batch process that runs every night using AWS Data Pipeline. This is sensitive data, so the
data and any associated metadata must be encrypted at rest.

Auditors review some files on a quarterly basis to see whether the records are maintained
according to regulations. Auditors must be able to locate any physical file in the S3 bucket for a
given date, patient, or physician. Auditors currently spend a significant amount of time locating such files.

What is the most cost- and time-efficient collection methodology in this situation?

A. Use Amazon Kinesis to get the data feeds directly from physicians, batch them using a Spark
application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with
folders separated per physician.
B. Use Amazon API Gateway to get the data feeds directly from physicians, batch them using a
Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with
folders separated per physician.
C. Use Amazon S3 event notification to populate an Amazon DynamoDB table with metadata about
every file loaded to Amazon S3, and partition them based on the month and year of the file.
D. Use Amazon S3 event notification to populate an Amazon Redshift table with metadata about
every file loaded to Amazon S3, and partition them based on the month and year of the file.

Answer: C
Explanation:
Using an S3 event notification to populate a DynamoDB table with each file's metadata (date,
patient, physician) gives auditors a fast, inexpensive lookup for any physical file, and is more
cost- and time-efficient than loading the metadata into Amazon Redshift.

QUESTION 626
A clinical trial will rely on medical sensors to remotely assess patient health. Each physician who
participates in the trial requires visual reports each morning. The reports are built from
aggregations of all the sensor data taken each minute.

What is the most cost-effective solution for creating this visualization each day?

A. Use Kinesis Aggregators Library to generate reports for reviewing the patient sensor data and
generate a QuickSight visualization on the new data each morning for the physician to review.
B. Use a transient EMR cluster that shuts down after use to aggregate the patient sensor data each
night and generate a QuickSight visualization on the new data each morning for the physician to
review.
C. Use Spark streaming on EMR to aggregate the patient sensor data in every 15 minutes and
generate a QuickSight visualization on the new data each morning for the physician to review.
D. Use an EMR cluster to aggregate the patient sensor data each night and provide Zeppelin
notebooks that look at the new data residing on the cluster each morning for the physician to
review.

Answer: B

QUESTION 627
A company uses Amazon Redshift for its enterprise data warehouse. A new on-premises
PostgreSQL OLTP DB must be integrated into the data warehouse. Each table in the
PostgreSQL DB has an indexed last_modified timestamp column. The data warehouse has a
staging layer to load source data into the data warehouse environment for further processing.

The data lag between the source PostgreSQL DB and the Amazon Redshift staging layer should
NOT exceed four hours.

What is the most efficient technique to meet these requirements?

A. Create a DBLINK on the source DB to connect to Amazon Redshift. Use a PostgreSQL trigger on
the source table to capture the new insert/update/delete event and execute the event on the
Amazon Redshift staging table.
B. Use a PostgreSQL trigger on the source table to capture the new insert/update/delete event and
write it to Amazon Kinesis Streams. Use a KCL application to execute the event on the Amazon
Redshift staging table.
C. Extract the incremental changes periodically using a SQL query. Upload the changes to multiple
Amazon Simple Storage Service (S3) objects, and run the COPY command to load to the
Amazon Redshift staging layer.
D. Extract the incremental changes periodically using a SQL query. Upload the changes to a single
Amazon Simple Storage Service (S3) object, and run the COPY command to load to the Amazon
Redshift staging layer.

Answer: C
Explanation:
https://forums.aws.amazon.com/message.jspa?messageID=807709
A DBLINK- or trigger-based approach is not time-effective here, and dblink to Amazon Redshift
only works from an RDS PostgreSQL instance, not from an on-premises database. Extracting
incremental changes with the indexed last_modified column and uploading them as multiple S3
objects lets the COPY command load the files in parallel across slices, keeping the lag under
four hours.

QUESTION 628
An administrator is deploying Spark on Amazon EMR for two distinct use cases: machine learning
algorithms and ad-hoc querying. All data will be stored in Amazon S3. Two separate clusters for
each use case will be deployed. The data volumes on Amazon S3 are less than 10 GB.

How should the administrator align instance types with the cluster's purpose?

A. Machine Learning on C instance types and ad-hoc queries on R instance types


B. Machine Learning on R instance types and ad-hoc queries on G2 instance types
C. Machine Learning on T instance types and ad-hoc queries on M instance types
D. Machine Learning on D instance types and ad-hoc queries on I instance types

Answer: A

QUESTION 629
An organization is designing an application architecture. The application will have over 100 TB of
data and will support transactions that arrive at rates from hundreds per second to tens of
thousands per second, depending on the day of the week and time of the day. All transaction
data must be durably and reliably stored. Certain read operations must be performed with strong
consistency.

Which solution meets these requirements?

A. Use Amazon DynamoDB as the data store and use strongly consistent reads when necessary.
B. Use an Amazon Relational Database Service (RDS) instance sized to meet the maximum
anticipated transaction rate and with the High Availability option enabled.
C. Deploy a NoSQL data store on top of an Amazon Elastic MapReduce (EMR) cluster, and select
the HDFS High Durability option.
D. Use Amazon Redshift with synchronous replication to Amazon Simple Storage Service (S3) and
row- level locking for strong consistency.

Answer: A

QUESTION 630
A company generates a large number of files each month and needs to use AWS import/export to
move these files into Amazon S3 storage. To satisfy the auditors, the company needs to keep a
record of which files were imported into Amazon S3.

What is a low-cost way to create a unique log for each import job?

A. Use the same log file prefix in the import/export manifest files to create a versioned log file in
Amazon S3 for all imports.
B. Use the log file prefix in the import/export manifest files to create a unique log file in Amazon S3
for each import.
C. Use the log file checksum in the import/export manifest files to create a unique log file in Amazon
S3 for each import.
D. Use a script to iterate over files in Amazon S3 to generate a log after each import/export job.

Answer: B
Explanation:
https://aws.amazon.com/cn/blogs/aws/send-us-that-data/
A log file prefix can be specified when the import/export manifest file is created, which
produces a unique log in Amazon S3 for each import job.

QUESTION 631
A company needs a churn prevention model to predict which customers will NOT renew their
yearly subscription to the company's service. The company plans to provide these customers with
a promotional offer. A binary classification model that uses Amazon Machine Learning is
required.

On which basis should this binary classification model be built?

A. User profiles (age, gender, income, occupation)


B. Last user session
C. Each user time series events in the past 3 months
D. Quarterly results

Answer: A
Explanation:
https://aws.amazon.com/blogs/machine-learning/predicting-customer-churn-with-amazon-
machine-learning/

Mobile operators have historical records on which customers ultimately ended up churning and
which continued using the service.
We can use this historical information to construct an ML model of one mobile operator’s churn
using a process called training.
After training the model, we can pass the profile information of an arbitrary customer (the same
profile information that we used to train the model) to the model, and have the model predict
whether this customer is going to churn.

QUESTION 632
A company with a support organization needs support engineers to be able to search historic
cases to provide fast responses on new issues raised. The company has forwarded all support
messages into an Amazon Kinesis Stream. This meets a company objective of using only
managed services to reduce operational overhead.

The company needs an appropriate architecture that allows support engineers to search on
historic cases and find similar issues and their associated responses.

Which AWS Lambda action is most appropriate?

A. Ingest and index the content into an Amazon Elasticsearch domain.


B. Stem and tokenize the input and store the results into Amazon ElastiCache.
C. Write data as JSON into Amazon DynamoDB with primary and secondary indexes.
D. Aggregate feedback in Amazon S3 using a columnar format with partitioning.

Answer: A
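Note: a minimal sketch of the Lambda pattern in option A, assuming a hypothetical Elasticsearch domain endpoint and index name; request signing and index mappings are omitted for brevity:

import base64
import json
import urllib.request

ES_ENDPOINT = "https://search-support-cases.example.com"  # hypothetical domain endpoint
INDEX = "support-cases"                                    # hypothetical index name

def handler(event, context):
    # Kinesis delivers records base64-encoded; decode each one and build a bulk payload
    bulk_lines = []
    for record in event["Records"]:
        doc = json.loads(base64.b64decode(record["kinesis"]["data"]))
        bulk_lines.append(json.dumps({"index": {"_index": INDEX}}))
        bulk_lines.append(json.dumps(doc))
    body = ("\n".join(bulk_lines) + "\n").encode("utf-8")

    # Index the batch with the Elasticsearch bulk API
    req = urllib.request.Request(
        ES_ENDPOINT + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    urllib.request.urlopen(req)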

QUESTION 633
A solutions architect works for a company that has a data lake based on a central Amazon S3
bucket. The data contains sensitive information. The architect must be able to specify exactly
which files each user can access. Users access the platform through a SAML federation Single
Sign On platform.

The architect needs to build a solution that allows fine grained access control, traceability of
access to the objects, and usage of the standard tools (AWS Console, AWS CLI) to access the
data.

Which solution should the architect build?

A. Use Amazon S3 Server-Side Encryption with AWS KMS-Managed Keys for storing data. Use
AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for
auditing.
B. Use Amazon S3 Server-Side Encryption with Amazon S3-Managed Keys. Set Amazon S3 ACLs
to allow access to specific elements of the platform. Use Amazon S3 to access logs for auditing.
C. Use Amazon S3 Client-Side Encryption with Client-Side Master Key. Set Amazon S3 ACLs to
allow access to specific elements of the platform. Use Amazon S3 to access logs for auditing.
D. Use Amazon S3 Client-Side Encryption with AWS KMS-Managed Keys for storing data. Use
AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for
auditing.

Answer: B
Explanation:

The correct answer is B: S3 server-side encryption with S3-managed keys provides encryption at rest, S3 ACLs allow fine-grained access control, and S3 server access logs provide traceability across the standard tools.
Use Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) – Each object is
encrypted with a unique key. As an additional safeguard, it encrypts the key itself with a master
key that it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block
ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.
Option C is wrong because with client-side encryption the users must hold the keys to decrypt the data. When downloading an object, the client downloads the encrypted object from Amazon S3, uses the material description from the object's metadata to determine which master key to use, decrypts the data key with that master key, and then decrypts the object with the data key.
Options A and D are wrong because KMS grants mainly provide access to the KMS keys; there is no mention of fine-grained control over the S3 objects.
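Note: a minimal boto3 sketch (bucket names are hypothetical) of the two pieces option B relies on: writing objects with SSE-S3 and enabling server access logging for traceability:

import boto3

s3 = boto3.client("s3")

# Store an object encrypted with Amazon S3-managed keys (SSE-S3)
s3.put_object(
    Bucket="datalake-bucket",            # hypothetical
    Key="sensitive/file.csv",
    Body=b"...",
    ServerSideEncryption="AES256",
)

# Enable server access logging on the data lake bucket for auditing
s3.put_bucket_logging(
    Bucket="datalake-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "datalake-access-logs",  # hypothetical log bucket
            "TargetPrefix": "datalake/",
        }
    },
)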

QUESTION 634
A company that provides economics data dashboards needs to be able to develop software to
display rich, interactive, data-driven graphics that run in web browsers and leverages the full
stack of web standards (HTML, SVG, and CSS).

Which technology provides the most appropriate support for this requirements?

A. D3.js
B. IPython/Jupyter
C. R Studio
D. Hue

Answer: A
Explanation:
D3.js is a JavaScript library for producing rich, interactive, data-driven graphics in the browser using the full web standards stack (HTML, SVG, and CSS).
https://sa.udacity.com/course/data-visualization-and-d3js--ud507

QUESTION 635
A company hosts a portfolio of e-commerce websites across the Oregon, N. Virginia, Ireland, and
Sydney AWS regions. Each site keeps log files that capture user behavior. The company has
built an application that generates batches of product recommendations with collaborative filtering
in Oregon. Oregon was selected because the flagship site is hosted there and provides the
largest collection of data to train machine learning models against. The other regions do NOT
have enough historic data to train accurate machine learning models.

Which set of data processing steps improves recommendations for each region?

A. Use the e-commerce application in Oregon to write replica log files in each other region.
B. Use Amazon S3 bucket replication to consolidate log entries and build a single model in Oregon.
C. Use Kinesis as a buffer for web logs and replicate logs to the Kinesis stream of a neighboring
region.
D. Use the CloudWatch Logs agent to consolidate logs into a single CloudWatch Logs group.

Answer: D
Explanation:
https://aws.amazon.com/solutions/centralized-logging/

QUESTION 636
There are thousands of text files on Amazon S3. The total size of the files is 1 PB. The files
contain retail order information for the past 2 years. A data engineer needs to run multiple
interactive queries to manipulate the data. The Data Engineer has AWS access to spin up an
Amazon EMR cluster. The data engineer needs to use an application on the cluster to process
this data and return the results in an interactive time frame.

Which application on the cluster should the data engineer use?

A. Oozie
B. Apache Pig with Tachyon
C. Apache Hive
D. Presto

Answer: D
Explanation:
Presto is an open source distributed SQL query engine for running interactive analytic queries
against data sources of all sizes ranging from gigabytes to petabytes.

QUESTION 637
A media advertising company handles a large number of real-time messages sourced from over
200 websites. The company's data engineer needs to collect and process records in real time for
analysis using Spark Streaming on Amazon Elastic MapReduce (EMR). The data engineer needs
to fulfill a corporate mandate to keep ALL raw messages as they are received as a top priority.

Which Amazon Kinesis configuration meets these requirements?

A. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3).
Pull messages off Firehose with Spark Streaming in parallel to persistence to Amazon S3.
B. Publish messages to Amazon Kinesis Streams. Pull messages off Streams with Spark Streaming
in parallel to AWS Lambda pushing messages from Streams to Firehose backed by Amazon
Simple Storage Service (S3).
C. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3).
Use AWS Lambda to pull messages from Firehose to Streams for processing with Spark
Streaming.
D. Publish messages to Amazon Kinesis Streams, pull messages off with Spark Streaming, and
write row data to Amazon Simple Storage Service (S3) before and after processing.

Answer: B
Explanation:
B is the most likely answer: Kinesis Data Streams delivers records in real time for Spark Streaming, while Kinesis Data Firehose buffers records for at least 60 seconds, so Firehose is used only to persist the raw messages to Amazon S3. Option C is incorrect because it publishes to Firehose first and then tries to pull the data back into a stream, which does not meet the real-time processing requirement.

QUESTION 638
A solutions architect for a logistics organization ships packages from thousands of suppliers to
end customers. The architect is building a platform where suppliers can view the status of one or
more of their shipments. Each supplier can have multiple roles that will only allow access to
specific fields in the resulting information.

Which strategy allows the appropriate level of access control and requires the LEAST amount of
management work?

A. Send the tracking data to Amazon Kinesis Streams.
Use AWS Lambda to store the data in an Amazon DynamoDB Table.
Generate temporary AWS credentials for the suppliers' users with AWS STS, specifying fine-grained security policies to limit access only to their applicable data.
B. Send the tracking data to Amazon Kinesis Firehose.
Use Amazon S3 notifications and AWS Lambda to prepare files in Amazon S3 with appropriate
data for each supplier's roles.
Generate temporary AWS credentials for the suppliers' users with AWS STS.
Limit access to the appropriate files through security policies.
C. Send the tracking data to Amazon Kinesis Streams.
Use Amazon EMR with Spark Streaming to store the data in HBase.
Create one table per supplier. Use HBase Kerberos integration with the suppliers' users.
Use HBase ACL-based security to limit access for the roles to their specific table and columns.
D. Send the tracking data to Amazon Kinesis Firehose.
Store the data in an Amazon Redshift cluster.
Create views for the suppliers' users and roles.
Allow suppliers access to the Amazon Redshift cluster using a user limited to the applicable view.

Answer: D
Explanation:
You can assign a different set of permissions to the view. A user might be able to query the view,
but not the underlying table. Creating the view excluding the sensitive columns (or rows) should
be useful in this scenario.
http://www.silota.com/blog/rethink-database-views-redshift/

QUESTION 639
A company's social media manager requests more staff on the weekends to handle an increase
in customer contacts from a particular region.
The company needs a report to visualize the trends on weekends over the past 6 months using
QuickSight.

How should the data be represented?

A. A line graph plotting customer contacts vs. time, with a line for each region
B. A pie chart per region plotting customer contacts per day of week
C. A map of regions with a heatmap overlay to show the volume of customer contacts
D. A bar graph plotting region vs. volume of social media contacts

Answer: A
Explanation:
Use line charts to compare changes in measure values over period of time
https://docs.aws.amazon.com/quicksight/latest/user/line-charts.html

QUESTION 640
How should an Administrator BEST architect a large multi-layer Long Short-Term Memory
(LSTM) recurrent neural network (RNN) running with MXNET on Amazon EC2? (Choose two.)

A. Use data parallelism to partition the workload over multiple devices and balance the workload
within the GPUs.

B. Use compute-optimized EC2 instances with an attached elastic GPU.
C. Use general purpose GPU computing instances such as G3 and P3.
D. Use processing parallelism to partition the workload over multiple storage devices and balance
the workload within the GPUs.

Answer: AC
Explanation:
https://aws.amazon.com/blogs/machine-learning/reducing-deep-learning-inference-cost-with-mxnet-and-amazon-elastic-inference/
The post above describes improved performance when Elastic Inference accelerators are attached to compute-optimized EC2 instances, but option B does not refer to Amazon Elastic Inference, so general purpose GPU instances (G3/P3) combined with data parallelism across the GPUs remain the correct choices.

QUESTION 641
An organization is soliciting public feedback through a web portal that has been deployed to track
the number of requests and other important data. As part of reporting and visualization,
Amazon QuickSight connects to an Amazon RDS database to visualize data. Management wants
to understand some important metrics about feedback and how the feedback has changed over
the last four weeks in a visual representation.

What would be the MOST effective way to represent multiple iterations of an analysis in Amazon
QuickSight that would show how the data has changed over the last four weeks?

A. Use the analysis option for data captured in each week and view the data by a date range.
B. Use a pivot table as a visual option to display measured values and weekly aggregate data as a
row dimension.
C. Use a dashboard option to create an analysis of the data for each week and apply filters to
visualize the data change.
D. Use a story option to preserve multiple iterations of an analysis and play the iterations
sequentially.

Answer: D
Explanation:
https://docs.aws.amazon.com/quicksight/latest/user/working-with-stories.html

QUESTION 642
An organization is setting up a data catalog and metadata management environment for their
numerous data stores currently running on AWS. The data catalog will be used to determine the
structure and other attributes of data in the data stores. The data stores are composed of
Amazon RDS databases, Amazon Redshift, and CSV files residing on Amazon S3. The catalog
should be populated on a scheduled basis, and minimal administration is required to manage the
catalog.

How can this be accomplished?

A. Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that
connects to data sources to populate the database.
B. Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that
connects to data sources to populate the database.
C. Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data
sources to populate the database.
D. Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that
connects to data sources to populate the metastore.

Answer: C
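Note: a minimal boto3 sketch (crawler name, role, connection, and paths are hypothetical) of a scheduled AWS Glue crawler that populates the Data Catalog from an S3 location and a JDBC data store:

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="monthly-catalog-crawler",                        # hypothetical
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole", # hypothetical role
    DatabaseName="data_catalog",
    Targets={
        "S3Targets": [{"Path": "s3://company-data/csv/"}],
        "JdbcTargets": [
            {"ConnectionName": "rds-connection", "Path": "salesdb/%"}
        ],
    },
    Schedule="cron(0 2 * * ? *)",  # run daily at 02:00 UTC
)
glue.start_crawler(Name="monthly-catalog-crawler")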

QUESTION 643
An organization is currently using an Amazon EMR long-running cluster with the latest Amazon
EMR release for analytic jobs and is storing data as external tables on Amazon S3.

The company needs to launch multiple transient EMR clusters to access the same tables
concurrently, but the metadata about the Amazon S3 external tables are defined and stored on
the long-running cluster. Which solution will expose the Hive metastore with the LEAST
operational effort?

A. Export Hive metastore information to Amazon DynamoDB and configure the Amazon EMR hive-site classification to point to the Amazon DynamoDB table.
B. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon
EMR hive-site classification to point to the Amazon RDS database.
C. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive
metastore information to derby.
D. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.

Answer: B
Explanation:
The long-running cluster already keeps its Hive metastore in a MySQL database on the master node, so exporting that database to Amazon RDS and pointing the hive-site classification at it is less effort than rebuilding the catalog with an AWS Glue crawler.

QUESTION 644
An organization is using Amazon Kinesis Data Streams to collect data generated from thousands
of temperature devices and is using AWS Lambda to process the data. Devices generate 10 to
12 million records every day, but Lambda is processing only around 450 thousand records.
Amazon CloudWatch indicates that throttling on Lambda is not occurring.

What should be done to ensure that all data is processed? (Choose two.)

A. Increase the BatchSize value on the EventSource, and increase the memory allocated to the
Lambda function.
B. Decrease the BatchSize value on the EventSource, and increase the memory allocated to the
Lambda function.
C. Create multiple Lambda functions that will consume the same Amazon Kinesis stream.
D. Increase the number of vCores allocated for the Lambda function.
E. Increase the number of shards on the Amazon Kinesis stream.

Answer: AE
Explanation:
https://tech.trivago.com/2018/07/13/aws-kinesis-with-lambdas-lessons-learned/
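Note: a minimal boto3 sketch (stream name, mapping UUID, function name, and values are hypothetical) of the two changes in options A and E: raising the shard count and increasing the batch size and memory for the Lambda event source:

import boto3

kinesis = boto3.client("kinesis")
lambda_client = boto3.client("lambda")

# E: add shards so the stream can deliver more records per second
kinesis.update_shard_count(
    StreamName="temperature-events",       # hypothetical
    TargetShardCount=20,
    ScalingType="UNIFORM_SCALING",
)

# A: read larger batches per invocation ...
lambda_client.update_event_source_mapping(
    UUID="event-source-mapping-uuid",      # hypothetical
    BatchSize=500,
)
# ... and give the function more memory (which also increases CPU)
lambda_client.update_function_configuration(
    FunctionName="process-temperature",    # hypothetical
    MemorySize=1024,
)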

QUESTION 645
An Operations team continuously monitors the number of visitors to a website to identify any
potential system problems. The number of website visitors varies throughout the day. The site is
more popular in the middle of the day and less popular at night.

Which type of dashboard display would be the MOST useful to allow staff to quickly and correctly
identify system problems?

A. A vertical stacked bar chart showing today's website visitors and the historical average number of
website visitors.
B. An overlay line chart showing today's website visitors at one-minute intervals and also the
historical average number of website visitors.
C. A single KPI metric showing the statistical variance between the current number of website
visitors and the historical number of website visitors for the current time of day.
D. A scatter plot showing today's website visitors on the X-axis and the historical average number of
website visitors on the Y-axis.

Answer: C
Explanation:
Use a KPI to visualize a comparison between a key value and its target value. A KPI displays a value comparison, the two values being compared, and a progress bar.
https://docs.aws.amazon.com/quicksight/latest/user/kpi.html

QUESTION 646
An organization would like to run analytics on their Elastic Load Balancing logs stored in Amazon
S3 and join this data with other tables in Amazon S3. The users are currently using a BI tool
connecting with JDBC and would like to keep using this BI tool.

Which solution would result in the LEAST operational overhead?

A. Trigger a Lambda function when a new log file is added to the bucket to transform and load it into
Amazon Redshift.
Run the VACUUM command on the Amazon Redshift cluster every night.
B. Launch a long-running Amazon EMR cluster that continuously downloads and transforms new
files from Amazon S3 into its HDFS storage.
Use Presto to expose the data through JDBC.
C. Trigger a Lambda function when a new log file is added to the bucket to transform and move it to
another bucket with an optimized data structure.
Use Amazon Athena to query the optimized bucket.
D. Launch a transient Amazon EMR cluster every night that transforms new log files and loads them
into Amazon Redshift.

Answer: C
Explanation:
"Q: Does Athena support other BI Tools and SQL Clients?
Yes. Amazon Athena comes with an ODBC and JDBC driver that you can use with other business intelligence tools and SQL clients. Learn more about using an ODBC or JDBC driver with Athena."
This satisfies the requirement.
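Note: a minimal boto3 sketch (database, table, and bucket names are hypothetical) of querying the optimized bucket with Athena; the BI tool itself would connect through the Athena JDBC or ODBC driver instead:

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="""
        SELECT request_ip, COUNT(*) AS requests
        FROM elb_logs_parquet          -- hypothetical table over the optimized bucket
        GROUP BY request_ip
        ORDER BY requests DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "weblogs"},                  # hypothetical
    ResultConfiguration={"OutputLocation": "s3://athena-results/"}  # hypothetical
)
print(response["QueryExecutionId"])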

QUESTION 647
An organization has added a clickstream to their website to analyze traffic. The website is
sending each page request with the PutRecord API call to an Amazon Kinesis stream by using
the page name as the partition key. During peak spikes in website traffic, a support engineer
notices many ProvisionedThroughputExceededException events in the application logs.

What should be done to resolve the issue in the MOST cost-effective way?

A. Create multiple Amazon Kinesis streams for page requests to increase the concurrency of the
clickstream.
B. Increase the number of shards on the Kinesis stream to allow for more throughput to meet the
peak spikes in traffic.
C. Modify the application to use on the Kinesis Producer Library to aggregate requests before
sending them to the Kinesis stream.
D. Attach more consumers to the Kinesis stream to process records in parallel, improving the
performance on the stream.

Answer: C
Explanation:
https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html
"The KPL is an easy-to-use, highly configurable library that helps you write to a Kinesis data
stream. It acts as an intermediary between your producer application code and the Kinesis Data
Streams API actions. The KPL performs the following primary tasks:
Writes to one or more Kinesis data streams with an automatic and configurable retry mechanism
Collects records and uses PutRecords to write multiple records to multiple shards per request
Aggregates user records to increase payload size and improve throughput".
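Note: the KPL itself is a Java library, so purely as an illustration, here is a boto3 sketch of the same idea the KPL automates: batching many small page-view records into a single PutRecords call instead of one PutRecord per page request (stream, field, and function names are hypothetical):

import boto3
import json

kinesis = boto3.client("kinesis")
buffer = []

def enqueue(page_name, payload):
    # Accumulate records locally, the way the KPL aggregates user records
    buffer.append({
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": page_name,
    })
    if len(buffer) >= 500:  # PutRecords accepts up to 500 records per call
        flush()

def flush():
    if buffer:
        kinesis.put_records(StreamName="clickstream", Records=list(buffer))  # hypothetical stream
        buffer.clear()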

QUESTION 648
An organization currently runs a large Hadoop environment in their data center and is in the
process of creating an alternative Hadoop environment on AWS, using Amazon EMR.

They generate around 20 TB of data on a monthly basis. Also on a monthly basis, files need to be
grouped and copied to Amazon S3 to be used for the Amazon EMR environment. They have
multiple S3 buckets across AWS accounts to which data needs to be copied. There is a 10G
AWS Direct Connect setup between their data center and AWS, and the network team has
agreed to allocate

A. Use an offline copy method, such as an AWS Snowball device, to copy and transfer data to
Amazon S3.
B. Configure a multipart upload for Amazon S3 on AWS Java SDK to transfer data over AWS Direct
Connect.
C. Use Amazon S3 transfer acceleration capability to transfer data to Amazon S3 over AWS Direct
Connect.
D. Set up the S3DistCp tool on the on-premises Hadoop environment to transfer data to Amazon S3
over AWS Direct Connect.

Answer: D
Explanation:
S3DistCp is the right answer; it is used for moving large datasets from HDFS to Amazon S3, from Amazon S3 to HDFS, or between S3 locations, and it runs as a MapReduce job behind the scenes.
https://d0.awsstatic.com/whitepapers/aws-amazon-emr-best-practices.pdf

QUESTION 649
An organization is developing a mobile social application and needs to collect logs from all
devices on which it is installed. The organization is evaluating the Amazon Kinesis Data Streams
to push logs and Amazon EMR to process data. They want to store data on HDFS using the
default replication factor to replicate data among the cluster, but they are concerned about the
durability of the data. Currently, they are producing 300 GB of raw data daily, with additional
spikes during special events. They will need to scale out the Amazon EMR cluster to match the
increase in streamed data.

Which solution prevents data loss and matches compute demand?

A. Use multiple Amazon EBS volumes on Amazon EMR to store processed data and scale out the
Amazon EMR cluster as needed.
B. Use the EMR File System and Amazon S3 to store processed data and scale out the Amazon
EMR cluster as needed.
C. Use Amazon DynamoDB to store processed data and scale out the Amazon EMR cluster as
needed.
D. use Amazon Kinesis Data Firehose and, instead of using Amazon EMR, stream logs directly into
Amazon Elasticsearch Service.

Answer: B
Explanation:
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html
The sizing guidance above refers to far smaller data volumes than the question describes, and Amazon Elasticsearch Service is not a compute option for scaling out this processing, so option D is ruled out. Storing processed data on Amazon S3 through the EMR File System (EMRFS) provides the required durability while the EMR cluster scales out as needed.

QUESTION 650
An advertising organization uses an application to process a stream of events that are received
from clients in multiple unstructured formats.

The application does the following:

- Transforms the events into a single structured format and streams them to Amazon Kinesis for
real-time analysis.
- Stores the unstructured raw events from the log files on local hard drives that are rotated and
uploaded to Amazon S3.

The organization wants to extract campaign performance reporting using an existing Amazon
Redshift cluster.
Which solution will provide the performance data with the LEAST number of operations?

A. Install the Amazon Kinesis Data Firehose agent on the application servers and use it to stream
the log files directly to Amazon Redshift.
B. Create an external table in Amazon Redshift and point it to the S3 bucket where the unstructured
raw events are stored.
C. Write an AWS Lambda function that triggers every hour to load the new log files already in S3 to
Amazon redshift.
D. Connect Amazon Kinesis Data Firehose to the existing Amazon Kinesis stream and use it to
stream the event directly to Amazon Redshift.

Answer: D
Explanation:
Kinesis Data Firehose can read the structured events from the existing Kinesis stream and deliver them directly into Amazon Redshift.

QUESTION 651
An organization is designing an Amazon DynamoDB table for an application that must meet the
following requirements:

Item size is 40 KB
Sustained read/write rates of 2,000 and 500 requests per second, respectively
Heavily read-oriented and requires low latencies in the order of milliseconds
The application runs on an Amazon EC2 instance
Access to the DynamoDB table must be secure within the VPC
Minimal changes to application code to improve performance using write-through cache

Which design options will BEST meet these requirements?

A. Size the DynamoDB table with 10000 RCUs/20000 WCUs, implement the DynamoDB
Accelerator (DAX) for read performance, use VPC endpoints for DynamoDB, and implement an
IAM role on the EC2 instance to secure DynamoDB access.
B. Size the DynamoDB table with 20000 RCUs/20000 WCUs, implement the DynamoDB
Accelerator (DAX) for read performance, leverage VPC endpoints for DynamoDB, and implement
an IAM user on the EC2 instance to secure DynamoDB access.
C. Size the DynamoDB table with 10000 RCUs/20000 WCUs, implement Amazon ElastiCache for
read performance, set up a NAT gateway on VPC for the EC2 instance to access DynamoDB,
and implement an IAM role on the EC2 instance to secure DynamoDB access.
D. Size the DynamoDB table with 20000 RCUs/20000 WCUs, implement Amazon ElastiCache for
read performance, leverage VPC endpoints for DynamoDB, and implement an IAM user on the
EC2 instance to secure DynamoDB access.

Answer: A
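Note: a quick worked calculation (a sketch, assuming eventually consistent reads are acceptable where strong consistency is not required) showing where the 10000 RCU / 20000 WCU figures in option A come from:

import math

item_size_kb = 40
reads_per_sec = 2000
writes_per_sec = 500

# One read capacity unit = one 4 KB strongly consistent read per second;
# an eventually consistent read costs half as much
rcu_per_read = math.ceil(item_size_kb / 4) / 2   # 40 KB -> 10 units / 2 = 5
# One write capacity unit = one 1 KB write per second
wcu_per_write = math.ceil(item_size_kb / 1)      # 40 KB -> 40 units

print(int(reads_per_sec * rcu_per_read))    # 10000 RCU
print(int(writes_per_sec * wcu_per_write))  # 20000 WCU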

QUESTION 652
An organization needs to store sensitive information on Amazon S3 and process it through
Amazon EMR. Data must be encrypted on Amazon S3 and Amazon EMR at rest and in transit.
Using Thrift Server, the Data Analysis team uses Hive to interact with this data. The
organization would like to grant access to only specific databases and tables, giving permission
only to the SELECT statement.

Which solution will protect the data and limit user access to the SELECT statement on a specific
portion of data?

A. Configure Transparent Data Encryption on Amazon EMR.
Create an Amazon EC2 instance and install Apache Ranger.
Configure the authorization on the cluster to use Apache Ranger.
B. Configure data encryption at rest for EMR File System (EMRFS) on Amazon S3.
Configure data encryption in transit for traffic between Amazon S3 and EMRFS.
Configure storage and SQL base authorization on HiveServer2.
C. Use AWS KMS for encryption of data.
Configure and attach multiple roles with different permissions based on the different user needs.
D. Configure Security Group on Amazon EMR.
Create an Amazon VPC endpoint for Amazon S3.
Configure HiveServer2 to use Kerberos authentication on the cluster.

Answer: B
Explanation:
Transparent Data Encryption is for HDFS:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-data-encryption-options.html
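Note: a minimal boto3 sketch (configuration name, KMS key ARN, and S3 paths are hypothetical) of an EMR security configuration enabling at-rest encryption for EMRFS data on Amazon S3 and in-transit encryption, as in option B:

import boto3
import json

emr = boto3.client("emr")

security_configuration = {
    "EncryptionConfiguration": {
        "EnableAtRestEncryption": True,
        "EnableInTransitEncryption": True,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/example",  # hypothetical
            }
        },
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "s3://security-config/certs.zip",  # hypothetical certificate bundle
            }
        },
    }
}

emr.create_security_configuration(
    Name="emrfs-encryption",
    SecurityConfiguration=json.dumps(security_configuration),
)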

QUESTION 653
Multiple rows in an Amazon Redshift table were accidentally deleted. A System Administrator is
restoring the table from the most recent snapshot. The snapshot contains all rows that were in the
table before the deletion.

What is the SIMPLEST solution to restore the table without impacting users?

A. Restore the snapshot to a new Amazon Redshift cluster, then UNLOAD the table to Amazon S3. In
the original cluster, TRUNCATE the table, then load the data from Amazon S3 by using a COPY
command.
B. Use the Restore Table from a Snapshot command and specify a new table name. DROP the
original table, then RENAME the new table to the original table name.
C. Restore the snapshot to a new Amazon Redshift cluster. Create a DBLINK between the two
clusters in the original cluster, TRUNCATE the destination table, then use an INSERT command to
copy the data from the new cluster.
D. Use the ALTER TABLE REVERT command and specify a time stamp of immediately before the
data deletion. Specify the Amazon Resource Name of the snapshot as the SOURCE and use the
OVERWRITE REPLACE option.

Answer: B
Explanation:
Unloading, truncating, and reloading the table (option A) takes time and would impact users; restoring the table from the snapshot under a new name and then swapping names is the simplest approach.
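Note: a minimal boto3 sketch (cluster, snapshot, database, schema, and table names are hypothetical) of option B: restoring the table from the snapshot under a new name, after which the original table can be dropped and the new one renamed:

import boto3

redshift = boto3.client("redshift")

redshift.restore_table_from_cluster_snapshot(
    ClusterIdentifier="analytics-cluster",        # hypothetical
    SnapshotIdentifier="rs:analytics-2019-01-01", # hypothetical
    SourceDatabaseName="sales",
    SourceSchemaName="public",
    SourceTableName="orders",
    TargetDatabaseName="sales",
    TargetSchemaName="public",
    NewTableName="orders_restored",
)
# Then, in SQL: DROP TABLE orders; ALTER TABLE orders_restored RENAME TO orders;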

QUESTION 654
An organization's data warehouse contains sales data for reporting purposes. data governance
policies prohibit staff from accessing the customers' credit card numbers.

How can these policies be adhered to and still allow a Data Scientist to group transactions that
use the same credit card number?

A. Store a cryptographic hash of the credit card number.


B. Encrypt the credit card number with a symmetric encryption key, and give the key only to the
authorized Data Scientist.
C. Mask the credit card numbers to only show the last four digits of the credit card number.
D. Encrypt the credit card number with an asymmetric encryption key and give the decryption key
only to the authorized Data Scientist.

Answer: A
Explanation:
A credit card number has 16 digits; keeping only the last four (option C) would likely group together transactions made with different cards. A cryptographic hash lets transactions with the same card number be grouped without exposing the number itself.
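Note: a minimal sketch (the secret key is hypothetical and would be held outside the warehouse, for example in AWS KMS or Secrets Manager) of producing a keyed cryptographic hash so transactions can still be grouped by card without storing the card number:

import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-from-a-secrets-store"  # hypothetical

def card_token(card_number: str) -> str:
    # Same card number -> same token, so transactions can still be grouped,
    # but the original number cannot be read back from the token.
    return hmac.new(SECRET_KEY, card_number.encode("utf-8"), hashlib.sha256).hexdigest()

print(card_token("4111111111111111"))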

QUESTION 655
A real-time bidding company is rebuilding their monolithic application and is focusing on serving
real-time data. A large number of reads and writes are generated from thousands of concurrent
users who follow items and bid on the company's sale offers.

The company is experiencing high latency during special event spikes, with millions of concurrent
users.

The company needs to analyze and aggregate a part of the data in near real time to feed an
internal dashboard.

What is the BEST approach for serving and analyzing data, considering the constraint of low latency on the highly demanded data?

A. Use Amazon Aurora with Multi Availability Zone and read replicas. Use Amazon ElastiCache in
front of the read replicas to serve read-only content quickly. Use the same database as
datasource for the dashboard.
B. Use Amazon DynamoDB to store real-time data with Amazon DynamoDB Accelerator (DAX) to serve content quickly. Use Amazon DynamoDB Streams to replay all changes to the table, and process and stream them to Amazon Elasticsearch Service with AWS Lambda.
C. Use Amazon RDS with Multi Availability Zone. Provisioned IOPS EBS volume for storage. Enable
up to five read replicas to serve read-only content quickly. Use Amazon EMR with Sqoop to
import Amazon RDS data into HDFS for analysis.
D. Use Amazon Redshift with a DC2 node type and a multi-node cluster. Create an Amazon EC2 instance with pgpool installed. Create an Amazon ElastiCache cluster and route read requests through pgpool, and use Amazon Redshift for analysis.

Answer: B
Explanation:
The answer is B, not D. Option D is incorrect because the referenced AWS Big Data blog post describes that pattern for only 6 to 10 concurrent users. Option B fits for the following reasons: DynamoDB scales to millions of users, DynamoDB Streams can replay all changes for aggregation, a Lambda function can push the results to Amazon Elasticsearch Service, and Elasticsearch provides near-real-time analysis of product usage data with Kibana visualizations for the internal dashboard.
1. https://rockset.com/blog/live-dashboards-dynamodb-streams-lambda-elasticache/
2. https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/
3. https://aws.amazon.com/blogs/startups/combining-dynamodb-and-amazon-elasticsearch-with-lambda/
4. https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf?did=wp_card&trk=wp_card

QUESTION 656
A gas company needs to monitor gas pressure in their pipelines. Pressure data is streamed from
sensors placed throughout the pipelines to monitor the data in real time. When an anomaly is
detected, the system must send a notification to open a valve. An Amazon Kinesis stream collects
the data from the sensors and an anomaly Kinesis stream triggers an AWS Lambda function to
open the appropriate valve.

Which solution is the MOST cost-effective for responding to anomalies in real time?

A. Attach a Kinesis Firehose to the stream and persist the sensor data in an Amazon S3 bucket.
Schedule an AWS Lambda function to run a query in Amazon Athena against the data in Amazon
S3 to identify anomalies. When a change is detected, the Lambda function sends a message to
the anomaly stream to open the valve.
B. Launch an Amazon EMR cluster that uses Spark Streaming to connect to the Kinesis stream and
Spark machine learning to detect anomalies. When a change is detected, the Spark application
sends a message to the anomaly stream to open the valve.
C. Launch a fleet of Amazon EC2 instances with a Kinesis Client Library application that consumes
the stream and aggregates sensor data over time to identify anomalies. When an anomaly is
detected, the application sends a message to the anomaly stream to open the valve.

D. Create a Kinesis Analytics application by using the RANDOM_CUT_FOREST function to detect
an anomaly. When the anomaly score that is returned from the function is outside of an
acceptable range, a message is sent to the anomaly stream to open the valve.

Answer: D
Explanation:
Amazon Kinesis Data Analytics provides the RANDOM_CUT_FOREST SQL function for anomaly detection on streaming data. The same algorithm is described in the SageMaker documentation:
https://sagemaker-workshop.com/builtin/rcf.html
https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html

QUESTION 657
A gaming organization is developing a new game and would like to offer real-time competition to
their users. The data architecture has the following characteristics:

The game application is writing events directly to Amazon DynamoDB from the user’s mobile
device.
Users from the website can access their statistics directly from DynamoDB.
The game servers are accessing DynamoDB to update the user’s information.
The data science team extracts data from DynamoDB for various applications.

The engineering team has already agreed to the IAM roles and policies to use for the data
science team and the application.
Which actions will provide the MOST security, while maintaining the necessary access to the
website and game application? (Choose two.)

A. Use Amazon Cognito user pool to authenticate to both the website and the game application.
B. Use IAM identity federation to authenticate to both the website and the game application.
C. Create an IAM policy with PUT permission for both the website and the game application.
D. Create an IAM policy with fine-grained permission for both the website and the game application.
E. Create an IAM policy with PUT permission for the game application and an IAM policy with GET
permission for the website.

Answer: AE
Explanation:
https://docs.amazonaws.cn/en_us/IAM/latest/UserGuide/id_credentials_temp.html
https://aws.amazon.com/iam/
https://aws.amazon.com/blogs/security/create-fine-grained-session-permissions-using-iam-managed-policies/
D is incorrect because different policies are needed for the game application role and for the user role assumed from mobile devices: the game application needs PUT permission, while the website only needs GET permission.

QUESTION 658
An organization has 10,000 devices that generate 10 GB of telemetry data per day, with each
record size around 10 KB. Each record has 100 fields, and one field consists of unstructured log
data with a "String" data type in the English language. Some fields are required for the real-time
dashboard, but all fields must be available for long-term trend generation.

The organization also has 10 PB of previously cleaned and structured data, partitioned by Date,
in a SAN that must be migrated to AWS within one month. Currently, the organization does not
have any real-time capabilities in their solution. Because of storage limitations in the on-premises
data warehouse, selective data is loaded while generating the long-term trend with ANSI SQL
queries through JDBC for visualization. In addition to the one-time data loading, the organization
needs a cost-effective and real-time solution.

How can these requirements be met? (Choose two.)

A. use AWS IoT to send data from devices to an Amazon SQS queue, create a set of workers in an
Auto Scaling group and read records in batch from the queue to process and save the data. Fan
out to an Amazon SNS queue attached with an AWS Lambda function to filter the request dataset
and save it to Amazon Elasticsearch Service for real-time analytics.
B. Create a Direct Connect connection between AWS and the on-premises data center and copy the
data to Amazon S3 using S3 Acceleration. Use Amazon Athena to query the data.
C. Use AWS IoT to send the data from devices to Amazon Kinesis Data Streams with the IoT rules
engine. Use one Kinesis Data Firehose stream attached to a Kinesis stream to batch and stream
the data partitioned by date. Use another Kinesis Firehose stream attached to the same Kinesis
stream to filter out the required fields to ingest into Elasticsearch for real-time analytics.
D. Use AWS IoT to send the data from devices to Amazon Kinesis Data Streams with the IoT rules
engine. Use one Kinesis Data Firehose stream attached to a Kinesis stream to stream the data
into an Amazon S3 bucket partitioned by date. Attach an AWS Lambda function with the same
Kinesis stream to filter out the required fields for ingestion into Amazon DynamoDB for real-time
analytics.
E. use multiple AWS Snowball Edge devices to transfer data to Amazon S3, and use Amazon
Athena to query the data.

Answer: CE
Explanation:
C is correct because Kinesis Data Firehose supports batching and transformation: one Firehose stream attached to the Kinesis stream writes the data partitioned by date, while a second Firehose stream attached to the same Kinesis stream filters out the fields needed for the near-real-time dashboard and ingests them into Elasticsearch. E is correct because Snowball Edge is a petabyte-scale transport option, so the 10 PB on the SAN can be moved into Amazon S3 within the one-month window and then queried with Amazon Athena.

QUESTION 659
An organization is designing a public web application and has a requirement that states all
application users must be centrally authenticated before any operations are permitted. The
organization will need to create a user table with fast data lookup for the application in which a
user can read only his or her own data. All users already have an account with amazon.com.

How can these requirements be met?

A. Create an Amazon RDS Aurora table, with Amazon_ID as the primary key. The application uses
amazon.com web identity federation to get a token that is used to assume an IAM role from AWS
STS.
Use IAM database authentication by using the rds:db-tag IAM authentication policy and GRANT
Amazon RDS row-level read permission per user.
B. Create an Amazon RDS Aurora table, with Amazon_ID as the primary key for each user. The
application uses amazon.com web identity federation to get a token that is used to assume an
IAM role. Use IAM database authentication by using rds:db-tag IAM authentication policy and
GRANT Amazon RDS row-level read permission per user.
C. Create an Amazon DynamoDB table, with Amazon_ID as the partition key. The application uses amazon.com web identity federation to get a token that is used to assume an IAM role from AWS STS. In the role, use the IAM condition context key dynamodb:LeadingKeys with the IAM substitution variable ${www.amazon.com:user_id} and allow the required DynamoDB API operations in the IAM JSON policy Action element for reading the records.
D. Create an Amazon DynamoDB table, with Amazon_ID as the partition key. The application uses amazon.com web identity federation to assume an IAM role from AWS STS. In the role, use the IAM condition context key dynamodb:LeadingKeys with the IAM substitution variable ${www.amazon.com:user_id} and allow the required DynamoDB API operations in the IAM JSON policy Action element for reading the records.

Answer: C
Explanation:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WIF.html
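Note: a minimal sketch (policy name and table ARN are hypothetical) of the fine-grained policy described in option C, attached to the role assumed through web identity federation:

import boto3
import json

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Users",  # hypothetical
            "Condition": {
                # Restrict each user to items whose partition key equals their amazon.com user id
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                }
            },
        }
    ],
}

iam.create_policy(
    PolicyName="users-own-rows-only",  # hypothetical
    PolicyDocument=json.dumps(policy_document),
)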
