
Exam - ACA Big Data Certification

Function Studio is a web development tool for functions independently developed by Alibaba. It supports several programming languages for function development except for Scala. A business flow in DataWorks integrates different task types by business type to improve code development. The PyODPS node in DataWorks can integrate with the MaxCompute Python SDK to edit Python code for operating MaxCompute data. Users can deploy APIs created in DataService Studio to API Gateway for management.

Uploaded by

rangel24

Single answer

1 .Function Studio is a web project coding and development tool independently
developed by the Alibaba Group for function development scenarios. It is an
important component of DataWorks. Function Studio supports several programming
languages and platform-based function development scenarios except for ______ .
Score 2
A. Real-time computing
B. Python
C. Java
D. Scala

My Answer: D

Single answer
2 .A business flow in DataWorks integrates different node task types by business
type; this structure makes business code development easier. Which of the
following descriptions of node types is INCORRECT?
Score 2
A. A zero-load node is a control node that does not generate any data. The virtual
node is generally used as the root node for planning the overall node workflow.
B. An ODPS SQL task allows you to edit and maintain the SQL code on the Web, and
easily run, debug, and collaborate on the code.
C. The PyODPS node in DataWorks can be integrated with MaxCompute Python SDK. You
can edit the Python code to operate MaxCompute on a PyODPS node in DataWorks.
D. The SHELL node supports standard SHELL syntax and the interactive syntax. The
SHELL task can run on the default resource group.

My Answer: B

Single answer
3 .Apache Spark, included in Alibaba E-MapReduce (EMR), is a fast and general-purpose
cluster computing system. It provides high-level APIs in Java, Scala, Python and R,
and an optimized engine that supports general execution graphs. It also supports a
rich set of higher-level tools. Which of the following tools is not included
in Spark?
Score 2
A. Spark SQL for SQL and structured data processing
B. MLlib for machine learning
C. GraphX for graph processing
D. TensorFlow for AI

My Answer: A

Single answer
4 .DataWorks provides two billing methods: Pay-As-You-Go (post-payment) and
subscription (pre-payment). When DataWorks is activated in pay-as-you-go mode,
which of the following billing items will not apply?
Score 2
A. Shared resource groups for scheduling and Data Integration instances
B. Baseline instances generated by Intelligent Monitor and Data Quality checks
C. Task nodes created by developers
D. Calls and execution time of APIs compiled in DataService Studio

My Answer: C

Single answer
5 .Users can use major BI tools, such as Tableau and FineReport, to easily connect
to MaxCompute projects and perform BI analysis or ad hoc queries. The quick query
feature in MaxCompute, called _________, allows you to provide services by
encapsulating project table data in APIs, supporting diverse application scenarios
without data migration.
Score 2
A. Lightning
B. MaxCompute Manager
C. Tunnel
D. LabelSecurity

My Answer: B

Single answer
6 .If a MySQL database contains 100 tables, and Jack wants to migrate all those
tables to MaxCompute using DataWorks Data Integration, the conventional method
would require him to configure 100 data synchronization tasks. With _______ feature
in DataWorks, he can upload all tables at the same time.
Score 2
A. Full-Database Migration feature
B. Configure a MySQL Reader plug-in
C. Configure a MySQL Writer plug-in
D. Add data sources in Bulk Mode

My Answer: B

Single answer
7 .Machine Learning Platform for Artificial Intelligence (PAI) node is one of the
node types in DataWorks business flow. It is used to call tasks created on PAI and
schedule production activities based on the node configuration. PAI nodes can be
added to DataWorks only _________ .
Score 2
A. after PAI experiments are created on PAI
B. after PAI service is activated
C. after MaxCompute service is activated
D. Spark on MaxCompute Machine Learning project is created

My Answer: B

Single answer
8 .In a scenario where a large enterprise plans to use MaxCompute to process and
analyze its data, tens of thousands of tables and thousands of tasks are expected
for this project, and a project team of 40 members is responsible for the project
construction and O&M. From the perspective of engineering, which of the following
can considerably reduce the cost of project construction and management?
Score 2
A. Develop directly on MaxCompute and use script-timed scheduling tasks
B. Use DataWorks
C. Use Eclipse
D. Use a private platform specially developed for this project

My Answer: B

Single answer
9 .AliOrg Company plans to migrate their data with virtually no downtime. They want
all the data changes to the source database that occur during the migration to be
continuously replicated to the target, allowing the source database to be fully
operational during the migration process. After the database migration is
completed, the target database will remain synchronized with the source for as long
as you choose, allowing you to switch over the database at a convenient time. Which
of the following Alibaba products is the right choice for this?
Score 2
A. Log Service
B. DTS (Data Transmission Service)
C. Message Service
D. CloudMonitor

My Answer: B

Single answer
10 .There are three types of node instances in an E-MapReduce cluster: master, core,
and _____ .
Score 2
A. task
B. zero-load
C. gateway
D. agent

My Answer: D

Single answer
11 .A dataset includes the following items (time, region, sales amount). If you
want to present the information above in a chart, ______ is applicable.
Score 2
A. Bubble Chart
B. Tree Chart
C. Pie Chart
D. Radar Chart

My Answer: C

Single answer
12 .Alibaba Cloud Quick BI reporting tools support a variety of data sources,
helping users analyze and present data from different data sources.
______ is not supported as a data source yet.
Score 2
A. Results returned from the API
B. MaxCompute
C. Local Excel files
D. MySQL RDS

My Answer: C

Single answer
13 .DataV is a powerful yet accessible data visualization tool, which features
geographic information systems allowing for rapid interpretation of data to
understand relationships, patterns, and trends. When a DataV screen is ready, it
can be embedded into the existing portal of the enterprise through ______.
Score 2
A. URL after the release
B. URL in the preview
C. MD5 code obtained after the release
D. Jar package imported after the release

My Answer: C

Single answer
14 .Where is the metadata (e.g., table schemas) stored in Hive?
Score 2
A. Stored as metadata on the NameNode
B. Stored along with the data in HDFS
C. Stored in the RDBMS like MySQL
D. Stored in ZooKeeper

My Answer: C

Single answer
15 ._______ instances in E-MapReduce are responsible for computing and can quickly
add computing power to a cluster. They can also scale up and down at any time
without impacting the operations of the cluster.
Score 2
A. Task
B. Gateway
C. Master
D. Core

My Answer: A

Single answer
16 .Your company stores user profile records in an OLTP database. You want to join
these records with web server logs you have already ingested into the Hadoop file
system. What is the best way to obtain and ingest these user records?
Score 2
A. Ingest with Hadoop streaming
B. Ingest using Hive
C. Ingest with Sqoop import
D. Ingest with Pig's LOAD command

My Answer: B

Single answer
17 .You are working on a project where you need to chain together MapReduce and Hive
jobs. You also need the ability to use forks, decision points, and path joins.
Which ecosystem project should you use to perform these actions?
Score 2
A. Spark
B. HUE
C. ZooKeeper
D. Oozie

My Answer: C
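
The workflow shape that Question 17 describes (chained jobs with a fork, a path join, and a decision point) can be pictured as a directed acyclic graph. The sketch below is plain Python built on the standard-library `graphlib` module (Python 3.9+). It does not use the API of any Hadoop workflow tool, and the job names, the `decide` helper and its threshold are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job names. Each key maps to the set of jobs it depends on.
workflow = {
    "mr_clean_logs": set(),
    "hive_sessions": {"mr_clean_logs"},   # fork: both Hive jobs follow the MapReduce job
    "hive_errors": {"mr_clean_logs"},
    "join_reports": {"hive_sessions", "hive_errors"},  # join: waits for both branches
}

def run_order(graph):
    """Return one execution order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())

def decide(error_count, threshold=100):
    """Decision point: choose the next job from the result of an earlier one."""
    return "alert_ops" if error_count > threshold else "publish_report"

order = run_order(workflow)
print(order)
print(decide(error_count=250))
```

The two Hive jobs may appear in either order, because nothing forces one branch of a fork to run before the other.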

Single answer
18 .Which node type in DataWorks can edit the Python code to operate data in
MaxCompute?
Score 2
A. PyODPS
B. ODPS MR Node
C. ODPS Script Node
D. SHELL node

My Answer: A

Single answer
19 .DataService Studio in DataWorks aims to build a data service bus to help
enterprises centrally manage private and public APIs. DataService Studio allows you
to quickly create APIs based on data tables and register existing APIs with the
DataService Studio platform for centralized management and release. Which of the
following descriptions about DataService Studio in DataWorks is INCORRECT?
Score 2
A. DataService Studio is connected to API Gateway. Users can deploy APIs to API
Gateway with one click.
B. DataService Studio adopts the serverless architecture. All you need to care about is
the query logic of APIs, instead of the infrastructure such as the running
environment.
C. To meet the personalized query requirements of advanced users, DataService
Studio provides the custom Python script mode to allow you to compile the API query
yourself. It also supports multi-table association, complex query conditions, and
aggregate functions.
D. Users can deploy any APIs created and registered in DataService Studio to API
Gateway for management, such as API authorization and authentication, traffic
control, and metering.

My Answer: C

Single answer
20 .MaxCompute Tunnel provides high-concurrency data upload and download services.
Users can use the Tunnel service to upload data to or download data from MaxCompute.
Which of the following descriptions about Tunnel is NOT correct?
Score 2
A. MaxCompute Tunnel provides the Java programming interface for users
B. MaxCompute provides two data import and export methods: using Tunnel
Operation on the console directly or using TUNNEL written in Java
C. If data fails to be uploaded, use the restore command to restore the upload from
where it was interrupted
D. Tunnel commands are mainly used to upload or download data. They provide the
following functions: upload, download, resume, show, purge, etc.

My Answer: B

Single answer
21 .Which of the following is not a proper step for granting permission on an L4
MaxCompute table to a user? (L4 is a level in MaxCompute label-based security
(LabelSecurity), a mandatory access control (MAC) policy at the
project space level. It allows project administrators to control user access to
column-level sensitive data with improved flexibility.)
Score 2
A. If no permissions have been granted to the user and the user does not belong to
the project, add the user to the project. The user does not have any permissions
before they are added to the project.
B. Grant a specific operation permission to the user.
C. If the user manages resources that have labels, such as datasheets and packages
with datasheets, grant label permissions to the user. 
D. The user needs to create a project in simple mode

My Answer: A

Single answer
22 .MaxCompute supports two kinds of charging methods: Pay-As-You-Go and
Subscription (CU cost). Pay-As-You-Go means each task is measured according to the
input size by job cost. In this charging method the billing items do not include
charges due to ______.
Score 2
A. Data upload
B. Data download
C. Computing
D. Storage

My Answer: B

Single answer
23 .MaxCompute is a general purpose, fully managed, multi-tenancy data processing
platform for large-scale data warehousing, and it is mainly used for storage and
computing of batch structured data. Which of the following is not a use case for
MaxCompute?
Score 2
A. Order management
B. Data Warehouse
C. Social networking analysis
D. User profile

My Answer: B

Single answer
24 .Tom is the administrator of a project prj1 in MaxCompute. The project involves
a large volume of sensitive data such as user IDs and shopping records, and many
data mining algorithms with proprietary intellectual property rights. Tom wants to
properly protect this sensitive data and these algorithms. To be specific, project users
can only access the data within the project, and all data flows only within the
project. What operation should he perform?
Score 2
A. Use ACL authorization to set the status to read-only for all users
B. Use Policy authorization to set the status to read-only for all users
C. Allow the object creator to access the object
D. Enable the data protection mechanism in the project, using set
ProjectProtection=true;

My Answer: B

Single answer
25 .There are multiple connection clients for MaxCompute. Which of the following is
the easiest way to configure workflow and scheduling for MaxCompute tasks?
Score 2
A. Use DataWorks
B. Use IntelliJ IDEA
C. Use MaxCompute Console
D. No supported tool yet

My Answer: B

Single answer
26 .In MaxCompute, you can use the Tunnel command line for data upload and download.
Which of the following descriptions of Tunnel commands is NOT correct?
Score 2
A. Upload: Supports file or directory (level-one) uploading. Data can only be
uploaded to a single table or table partition each time.
B. Download: You can only download data to a single file. Only data in one table or
partition can be downloaded to one file each time. For partitioned tables, the
source partition must be specified.
C. Resume: If an error occurs due to the network or the Tunnel service, you can
resume transmission of the file or directory after interruption.
D. Purge: Clears the table directory. By default, use this command to clear
information of the last three days.

My Answer: B

Single answer
27 .Scenario: Jack is the administrator of project prj1. A new team member, Alice
(already has an Alibaba Cloud account [email protected]), applies for joining this
project with the following permissions: view table lists, submit jobs, and create
tables. Which of the following SQL statements is useless:
Score 2
A. use prj1;
B. add user [email protected];
C. grant List, CreateTable, CreateInstance on project prj1 to user
aliyun$alice@aliyun;
D. flush privileges;

My Answer: B

Single answer
28 .Apache Spark is an open-source framework that functions on the service level to
support data processing and analysis operations. Equipped with unified computing
resources and data set permissions, Spark on MaxCompute allows you to submit and
run jobs while using your preferred development methods. Which of the following
descriptions about Spark on MaxCompute is NOT correct:
Score 2
A. Spark on MaxCompute provides you with native Spark Web UIs.
B. Different versions of Spark can run in MaxCompute at the same time.
C. Similar to MaxCompute SQL and MaxCompute MapReduce, Spark on MaxCompute runs in
the unified computing resources activated for MaxCompute projects.
D. Spark on MaxCompute has a separate permission system which will not allow users
to query data without any additional permission modifications required.

My Answer: B

Single answer
29 .In the MaxCompute command line, if you want to view all tables in a project, you
can execute the command: ______.
Score 2
A. show tables;
B. use tables;
C. desc tables;
D. select tables;

My Answer: B

Single answer
30 .When odpscmd is used to connect to a project in MaxCompute, the command ______
can be executed to view the size of the space occupied by table table_a.
Score 2
A. select size from table_a;
B. size table_a;
C. desc table_a;
D. show table table_a;

My Answer: B

True/False
31 .Data Migration Unit (DMU) is used to measure the amount of resources consumed
by data integration, including CPU, memory, and network. One DMU represents the
minimum amount of resources used for a data synchronization task.
Score 1
True
False

My Answer: A

True/False
32 .DataWorks can be used to create all types of tasks and configure scheduling
cycles as needed. The supported granularity levels of scheduling cycles include
days, weeks, months, hours, minutes and seconds.
Score 1
True
False

My Answer: A

True/False
33 .If a task node of DataWorks is deleted from the recycle bin, it can still be
restored.
Score 1
True
False

My Answer: A

True/False
34 .If the DataWorks (MaxCompute) tables in your request belong to two owners,
Data Guard (a DataWorks component) automatically splits your request into
two by table owner.
Score 1
True
False

My Answer: A

True/False
35 .The FTP data source in DataWorks allows you to read/write data to FTP, and
supports configuring synchronization tasks in wizard and script mode.
Score 1
True
False

My Answer: A

True/False
36 .In each release of E-MapReduce, the software and software version are flexible.
You can select multiple software versions.
Score 1
True
False

My Answer: A

True/False
37 .Alibaba Cloud Elastic MapReduce (E-MapReduce) is a big data processing solution
to quickly process huge amounts of data. Based on open source Apache Hadoop and
Apache Spark, E-MapReduce flexibly manages your big data use cases such as trend
analysis, data warehousing, and analysis of continuously streaming data.
Score 1
True
False

My Answer: A

True/False
38 .An enterprise uses Alibaba Cloud MaxCompute for storage of service orders,
system logs and management data. Because the security levels for the data are
different, the enterprise needs to register multiple Alibaba Cloud accounts for data
management.
Score 1
True
False

My Answer: B

True/False
39 .JindoFS in E-MapReduce provided by SmartData uses OSS as the storage back end. 
Score 1
True
False

My Answer: A

True/False
40 .In DataWorks table permission system, you can revoke permissions only on the
fields whose security level is higher than the security level of your account.
Score 1
True
False

My Answer: A

True/False
41 .Project is an important concept in MaxCompute. A user can create multiple
projects, and each object belongs to a certain project.
Score 1
True
False

My Answer: A

True/False
42 .Assume that Task 1 is configured to run at 02:00 each day. In this case, the
scheduling system automatically generates a snapshot at the time predefined by the
periodic node task at 23:30 each day. That is, the instance of Task 1 will run at
02:00 the next day. If the system detects the upstream task is complete, the system
automatically runs the Task 1 instance at 02:00 the next day.
Score 1
True
False

My Answer: A
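
The timing described in Question 42 can be traced with a small model. This is only a sketch of the statement as worded, not the DataWorks scheduler itself; the function names and the calendar date are hypothetical, and only the 23:30 and 02:00 times come from the question.

```python
from datetime import date, datetime, time, timedelta

SNAPSHOT_TIME = time(23, 30)  # instance generation time taken from the question

def generate_instance(snapshot_day, run_at):
    """Create the instance record for the day after the snapshot."""
    return {
        "generated_at": datetime.combine(snapshot_day, SNAPSHOT_TIME),
        "scheduled_for": datetime.combine(snapshot_day + timedelta(days=1), run_at),
    }

def can_run(instance, now, upstream_done):
    """An instance starts only once its time has come and upstream tasks are complete."""
    return upstream_done and now >= instance["scheduled_for"]

# Hypothetical date: snapshot taken on 2024-01-01 for a task configured at 02:00.
inst = generate_instance(date(2024, 1, 1), time(2, 0))
print(inst["generated_at"], "->", inst["scheduled_for"])
```

In this model the instance generated at 23:30 on one day is scheduled for 02:00 on the following day, and it is held back if the upstream task has not finished.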

True/False
43 .In MaxCompute, if an error occurs in Tunnel transmission due to the network or the Tunnel
service, the user can resume the last upload operation through the command
tunnel resume;.
Score 1
True
False

My Answer: A

True/False
44 .A company originally handled its local data services through Java programs.
Now that the local data has been migrated to MaxCompute on the cloud, the data can be
accessed by modifying the Java code and using the Java APIs provided by
MaxCompute.
Score 1
True
False

My Answer: A

True/False
45 .MaxCompute takes Project as a charged unit. The bill is charged according to
three aspects: the usage of storage, computing resource, and data download
respectively. You pay for compute and storage resources by the day with no
long-term commitments.
Score 1
True
False

My Answer: A

True/False
46 .There are various methods for accessing MaxCompute, for example, through
management console, client command line, and Java API. Command line tool odpscmd
can be used to create, operate, or delete a table in a project.
Score 1
True
False

My Answer: A

True/False
47 .A start-up company wants to use Alibaba Cloud MaxCompute to provide product
recommendation services for its users. However, the company does not have many
users at the initial stage, while the charge for MaxCompute is higher than that of
ApsaraDB RDS, so the company should be recommended to use MaxCompute service until
the number of its users increases to a certain size.
Score 1
True
False

My Answer: A

True/False
48 .Synchronous development in DataWorks provides both wizard and script modes.
Score 1
True
False

My Answer: A

True/False
49 .MaxCompute SQL is suitable for processing massive data with low real-time
requirements, and employs a syntax similar to that of SQL. The efficiency of data
query can be improved through creating proper indexes in the table.
Score 1
True
False

My Answer: A

True/False
50 .Table is a data storage unit in MaxCompute. It is a two-dimensional logical
structure composed of rows and columns. All data is stored in the tables. Operating
objects of computing tasks are all tables. A user can perform create table, drop
table, and tunnel upload as well as update the qualified data in the table.
Score 1
True
False

My Answer: A

Multiple answers
51 .Which of the following Hadoop ecosystem components can you choose to set up a
streaming log analysis system? (Number of correct answers: 3)
Score 2
A. Apache Flume
B. Apache Kafka
C. Apache Spark
D. Apache Lucene

My Answer: A,C,D

Multiple answers
52 .Distributed file systems like GFS and Hadoop are designed to have a much larger
block (or chunk) size, such as 64 MB or 128 MB. Which of the following descriptions are
correct? (Number of correct answers: 4)
Score 2
A. It reduces clients' need to interact with the master because reads and writes on
the same block (or chunk) require only one initial request to the master for block
location information
B. Since on a large block (or chunk), a client is more likely to perform many
operations on a given block, it can reduce network overhead by keeping a persistent
TCP connection to the metadata server over an extended period of time
C. It reduces the size of the metadata stored on the master
D. The servers storing those blocks may become hot spots if many clients are
accessing the same small files
E. If necessary to support even larger file systems, the cost of adding extra
memory to the metadata server is a big price

My Answer: A,B,C,D,E
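
The effect of block size on master metadata, mentioned in Question 52, can be checked with quick arithmetic. The sketch below assumes a hypothetical 1 TiB file and a hypothetical flat cost of 64 bytes of metadata per block; neither figure comes from the question, so treat the totals as an illustration of scale only.

```python
KIB = 2**10
MIB = 2**20
TIB = 2**40

# Assumption for illustration: a fixed metadata cost per block on the master.
METADATA_BYTES_PER_BLOCK = 64

def block_count(file_size, block_size):
    """Number of blocks needed to hold a file (ceiling division)."""
    return -(-file_size // block_size)

def metadata_bytes(file_size, block_size):
    """Total master-side metadata for the file under the per-block assumption."""
    return block_count(file_size, block_size) * METADATA_BYTES_PER_BLOCK

large = metadata_bytes(1 * TIB, 64 * MIB)   # 64 MiB blocks
small = metadata_bytes(1 * TIB, 4 * KIB)    # 4 KiB blocks, typical of a local file system
print(block_count(1 * TIB, 64 * MIB), large)
print(block_count(1 * TIB, 4 * KIB), small)
```

Under these assumptions the 64 MiB layout needs 16,384 block entries (1 MiB of metadata), while 4 KiB blocks would need 268,435,456 entries (16 GiB), which is why a larger block size keeps the master's metadata small.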

Multiple answers
53 .MaxCompute can coordinate multiple users to operate one project through ACL
authorization. The objects that can be authorized by ACL include ______. (Number of
correct answers: 3)
Score 2
A. Project
B. Table
C. Resource
D. Procedure
E. Job

My Answer: A,B,C

Multiple answers
54 .DataWorks can be used to develop and configure data sync tasks. Which of the
following statements are correct? (Number of correct answers: 3)
Score 2
A. The data source configuration in the project management is required to add data
source
B. Some of the columns in source tables can be extracted to create a mapping
relationship between fields, and constants or variables can't be added
C. For the extraction of source data, "where" filtering clause can be referenced as
the criteria of incremental synchronization
D. Clean-up rules can be set to clear or preserve existing data before data write

My Answer: A,B,C,D

Multiple answers
55 .The data development mode in DataWorks has been upgraded to the three-level
structure comprising _____, _____, and ______. (Number of correct answers: 3)
Score 2
A. Project
B. Solution
C. Business flow
D. Directory

My Answer: A,B,C

Multiple answers
56 .In DataWorks, we can configure alert policies to monitor periodically scheduled
tasks, so that an alert will be issued in a timely manner. Currently DataWorks supports
________ alerts. (Number of correct answers: 2)
Score 2
A. Email
B. Text message
C. Telephone
D. Aliwangwang

My Answer: A,B

Multiple answers
57 .DataWorks provides powerful scheduling capabilities including time-based or
dependency-based task trigger mechanisms to perform tens of millions of tasks
accurately and punctually each day based on DAG relationships. It supports multiple
scheduling frequency configurations like: (Number of correct answers: 4)
Score 2
A. By Minute
B. By Hour
C. By Day
D. By Week
E. By Second

My Answer: A,B,C,D

Multiple answers
58 .MaxCompute is a fast and fully-managed TB/PB-level data warehousing solution
provided by Alibaba Cloud. Which of the following product features are correct?
______ (Number of correct answers: 3)
Score 2
A. Distributed architecture
B. High security and reliability
C. Multi-level management and authorization
D. Efficient transaction processing
E. Fast real-time response

My Answer: A,B,E

Multiple answers
59 .Resource is a particular concept of MaxCompute. If you want to use user-defined
function UDF or MapReduce, resource is needed. For example: After you have prepared
UDF, you must upload the compiled jar package to MaxCompute as resource. Which of
the following objects are MaxCompute resources? (Number of correct answers: 4)
Score 2
A. Files
B. Tables: Tables in MaxCompute
C. Jar: Compiled Java jar package
D. Archive: Recognize the compression type according to the postfix in the resource
name
E. ACL Policy

My Answer: A,B,C,D,E

Multiple answers
60 .In order to ensure smooth processing of tasks in the DataWorks data development
kit, you must create an AccessKey. An AccessKey is primarily used for access
permission verification between various Alibaba Cloud products. The AccessKey has
two parts. They are ____. (Number of correct answers: 2)
Score 2
A. Access Username
B. Access Key ID 
C. Access Key Secret
D. Access Password

My Answer: B,C
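
As a rough illustration of how a two-part key pair like the one in Question 60 can support access verification, the sketch below signs a request string with a keyed hash (HMAC-SHA1) from the Python standard library. It shows the general idea only and is not the exact signature procedure of any Alibaba Cloud API; the key values and the string being signed are made up.

```python
import base64
import hashlib
import hmac

def sign(access_key_secret, string_to_sign):
    """Return a Base64-encoded HMAC-SHA1 signature of the request string."""
    digest = hmac.new(
        access_key_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")

def verify(access_key_secret, string_to_sign, signature):
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(access_key_secret, string_to_sign), signature)

# Hypothetical values. The ID travels with the request; the secret never does.
access_key_id = "example-id"
access_key_secret = "example-secret"
request = "GET\n/projects/prj1/tables\naccess_key_id=" + access_key_id

signature = sign(access_key_secret, request)
print(verify(access_key_secret, request, signature))
```

The ID identifies the caller and the secret proves possession without being sent, which is one reason the two parts are kept separate.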
