ACA Big Data Dumps Full

ACA Sample exam Questions

Single selection
1. Scenario: Jack is the administrator of project prj1. The project involves a large volume of
sensitive data such as bank accounts and medical records. Jack wants to properly protect
the data. Which of the following statements is necessary?
a) set ProjectACL=true;
b) add accountprovider ram;
c) set ProjectProtection=true;
d) use prj1;

2. Where is the metadata (e.g., table schemas) stored in Hive?


a) Stored as metadata on the NameNode
b) Stored along with the data in HDFS
c) Stored in the RDBMS like MySQL
d) Stored in ZooKeeper

3. MaxCompute tasks contain computational tasks and non-computational tasks. The
computational tasks require actual operations on data stored in the table: MaxCompute
parses the task to obtain its execution plan, and submits the task for execution. The
non-computational tasks only require reading of and modification to metadata
information, so the task is not parsed, no execution plan is provided, and the task is
directly submitted for execution. The latter therefore responds faster than the former.
Which of the following operations on the table t_test is a computational task?
a) desc t_test
b) alter table t_test add columns (comments string);
c) select count(*) from t_test;
d) truncate table t_test;
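
The distinction above can be sketched as a toy classifier (a simplification assumed purely for illustration, not MaxCompute internals: statements that only touch metadata skip plan generation, while statements that read table data get an execution plan):

```python
# Toy illustration (not MaxCompute internals): metadata-only statements
# skip execution-plan generation; data-reading statements need a plan.
METADATA_ONLY = {"desc", "alter", "truncate"}

def is_computational(statement: str) -> bool:
    """True if the statement must actually read table data."""
    keyword = statement.split()[0].lower()
    return keyword not in METADATA_ONLY

statements = [
    "desc t_test",
    "alter table t_test add columns (comments string)",
    "select count(*) from t_test",
    "truncate table t_test",
]
for s in statements:
    kind = "computational" if is_computational(s) else "non-computational"
    print(f"{s} -> {kind}")
```

Under this toy rule, only the SELECT is classified as computational, since it is the only statement that must read the table's data.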

4. When we use the MaxCompute tunnel command to upload the log.txt file to the t_log
table, the t_log is a partition table and the partitioning column is (p1 string, p2 string).
Which of the following commands is correct?
a) tunnel upload log.txt t_log/p1="b1", p2="b2"
b) tunnel upload log.txt t_log/(p1="b1", p2="b2")
c) tunnel upload log.txt t_log/p1="b1"/p2="b2"
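
The shape of a multi-level partition spec can be sketched with a small helper (`tunnel_upload_cmd` is a hypothetical name of my own, and the comma-separated `table/p1="v1",p2="v2"` form is an assumption about the tunnel CLI syntax):

```python
def tunnel_upload_cmd(path: str, table: str, **partitions: str) -> str:
    """Build a tunnel upload command string for a partitioned table.

    Multi-level partitions are joined with commas into one partition
    spec, appended to the table name after a single slash.
    """
    spec = ",".join(f'{col}="{val}"' for col, val in partitions.items())
    return f"tunnel upload {path} {table}/{spec}"

print(tunnel_upload_cmd("log.txt", "t_log", p1="b1", p2="b2"))
# tunnel upload log.txt t_log/p1="b1",p2="b2"
```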

5. A Log table named log in MaxCompute is a partition table, and the partition key is dt. A
new partition is created daily to store the new data of that day. Now we have one
month's data, starting from dt='20180101' to dt='20180131', and we may use ________
to delete the data on 20180101.
a) delete from log where dt='20180101'
b) truncate table where dt='20180101'
c) drop partition log (dt='20180101')
d) alter table log drop partition(dt='20180101')

6. DataV is a powerful yet accessible data visualization tool, which features geographic
information systems allowing for rapid interpretation of data to understand
relationships, patterns, and trends. When a DataV screen is ready, the work can be embedded
into the existing portal of the enterprise through ______.
a) URL after the release
b) URL in the preview
c) MD5 code obtained after the release
d) Jar package imported after the release

7. By integrating live dashboards, DataV can present and monitor business data
simultaneously. This data-driven approach enables well-organized data mining and
analysis, allowing the user to seize new opportunities that otherwise might remain
hidden. It supports a wide range of databases and data formats. Which of the following
options does DataV not support?
a) Alibaba Cloud's AnalyticDB, ApsaraDB
b) Static data in CSV and JSON formats
c) Oracle Database
d) MaxCompute Project

8. You want to understand more about how users browse your public website. For example,
you want to know which pages they visit prior to placing an order. You have a server farm
of 100 web servers hosting your website. Which is the most efficient way to gather the
logs from these web servers into the traditional Hadoop ecosystem?
a) Just copy them into HDFS using curl
b) Ingest the server web logs into HDFS using Apache Flume
c) Channel these clickstreams into Hadoop using Hadoop Streaming
d) Import all user clicks from your OLTP databases into Hadoop using Sqoop

9. Your company stores user profile records in an OLTP database. You want to join these
records with web server logs you have already ingested into the Hadoop file system.
What is the best way to obtain and ingest these user records?
a) Ingest with Hadoop streaming
b) Ingest using Hive
c) Ingest with sqoop import
d) Ingest with Pig's LOAD command
My Answer: disputed; two other files say B. Comment [1]: Answer is C.

10. You are working on a project where you need to chain together MapReduce, Hive jobs.
You also need the ability to use forks, decision points, and path joins. Which ecosystem
project should you use to perform these actions?
a) Apache HUE
b) Apache Zookeeper
c) Apache Oozie
d) Apache Spark

Multiple selections

1. In DataWorks, we can configure alert policies to monitor periodically scheduled tasks, so that alerts
are issued in a timely manner. Currently DataWorks supports ________ alerts.
(Number of correct answers: 2)
a) Email
b) Text message
c) Telephone
d) Aliwangwang

2. Which of the following task types does DataWorks support?
(Number of correct answers: 4)
a) Data Synchronization
b) SHELL
c) MaxCompute SQL
d) MaxCompute MR
e) Scala

3. In order to improve processing efficiency when using MaxCompute, you can specify the
partition when creating a table. That is, several fields in the table are specified as
partition columns. Which of the following descriptions about the MaxCompute partition
table are correct? (Number of correct answers: 4)
a) In most cases, the user can consider a partition to be a directory under the file
system
b) The user can specify multiple partitions, that is, multiple fields of the table are
considered as the partitions of the table, and the relationship among partitions is
similar to that of multiple directories
c) If the partition columns to be accessed are specified when querying data, then only
the corresponding partitions are read and a full table scan is avoided, which improves
processing efficiency and saves costs
d) MaxCompute partitions only support the string type, and conversion to any other type
is not allowed
e) The partition value cannot contain double-byte characters (such as Chinese)
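
The directory analogy in options a) through c) can be sketched with a toy model (pure illustration, not MaxCompute internals): each partition is a keyed "directory", and a filtered scan opens only the matching ones:

```python
# Toy table: each partition value maps to the rows "stored" under it,
# like files under a directory named after the partition spec.
table = {
    "dt=20180101": [("order_1",), ("order_2",)],
    "dt=20180102": [("order_3",)],
    "dt=20180103": [("order_4",), ("order_5",)],
}

def scan(table, partition=None):
    """Return (rows, partitions_read); partition=None means a full scan."""
    rows, read = [], 0
    for spec, data in table.items():
        if partition is not None and spec != partition:
            continue  # pruned: this partition "directory" is never opened
        read += 1
        rows.extend(data)
    return rows, read

rows, touched = scan(table, partition="dt=20180102")
print(len(rows), touched)  # one partition read instead of all three
```
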
4. In DataWorks, a task is instantiated before each scheduled run, that is, a
corresponding instance is generated and executed to run the scheduled task. The status
differs in each phase of the scheduling process, including ________. (Number of correct
answers: 3)
a) Not running
b) Running
c) Running Successfully
5. Alibaba Cloud E-MapReduce can be easily plugged into other Alibaba Cloud services,
such as Log Service, ONS, and MNS, which act as data ingestion channels for real-time
data streams. Which of the following descriptions about real-time processing are
correct? (Number of correct answers: 3)
a) This data is streamed and processed using Apache Flume or Kafka in integration with
Apache Storm using complex algorithms
b) Kafka is usually preferred with Apache Storm to provide a data pipeline
c) The final processed data can be stored in HDFS, HBase, or any other big data store
service in real time
d) Apache Sqoop is used for real-time transmission of structured data
True-or-false questions
1. One Alibaba Cloud account is entitled to join only one organization that uses DataWorks.
True
False
2. DataWorks can be used to create all types of tasks and configure scheduling cycles as
needed. The supported granularity levels of scheduling cycles include days, weeks,
months, hours, minutes and seconds.
True
False
Another file says True. Comment [2]: Answer is False.

3. MaxCompute SQL is suitable for processing less real-time massive data, and employs a
syntax similar to that of SQL. The efficiency of data query can be improved through
creating proper indexes in the table.
True
False

Another file says True. Comment [3]: I think the answer is False.


Corrected Q&As

1 .Function Studio is a web project coding and development tool independently developed by the
Alibaba Group for function development scenarios. It is an important component of DataWorks.
Function Studio supports several programming languages and platform-based function development
scenarios except for ______ .

A. Real-time computing

B. Python

C. Java

D. Scala

My Answer: D

2. A business flow in DataWorks integrates different node task types by business type, a structure
that facilitates business code development. Which of the following descriptions about the node
types is INCORRECT?

A. A zero-load node is a control node that does not generate any data. The virtual node is generally used
as the root node for planning the overall node workflow.

B. An ODPS SQL task allows you to edit and maintain the SQL code on the Web, and easily implement
code runs, debug, and collaboration.

C. The PyODPS node in DataWorks can be integrated with MaxCompute Python SDK. You can edit the
Python code to operate MaxCompute on a PyODPS node in DataWorks.

D. The SHELL node supports standard SHELL syntax and the interactive syntax. The SHELL task can run on
the default resource group.

My Answer: A. Two other files say B; Siddesh's corrected file says C. Comment [4]: Answer is A.

3. Apache Spark, included in Alibaba E-MapReduce (EMR), is a fast and general-purpose cluster computing
system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports
general execution graphs. It also supports a rich set of higher-level tools. Which of the following tools
is not included in Spark?

A. Spark SQL for SQL and structured data processing

B. MLlib for machine learning


C. GraphX for graph processing

D. TensorFlow for AI
Reason: TensorFlow is a separate framework, not part of Spark.

My Answer: D. Another file and Siddesh's corrected file say A. Comment [5]: Not sure about this.

4. DataWorks provides two billing methods: Pay-As-You-Go (post-payment) and subscription
(pre-payment). When DataWorks is activated in pay-as-you-go mode, which of the following billing
items will not apply?

A. Shared resource groups for scheduling and Data Integration instances

B. Baseline instances generated by Intelligent Monitor and Data Quality checks

C. Task nodes created by developer

D. Calls and execution time of APIs compiled in DataService Studio

My Answer: C

5. Users can use major BI tools, such as Tableau and FineReport, to easily connect to MaxCompute
projects, and perform BI analysis or ad hoc queries. The quick query feature in MaxCompute, called
_________, allows you to provide services by encapsulating project table data in APIs, supporting diverse
application scenarios without data migration.

A. Lightning

B. MaxCompute Manager

C. Tunnel

D. Labelsecurity

My Answer: A. Another file says B. Comment [6]: Answer is A.

6. If a MySQL database contains 100 tables, and Jack wants to migrate all those tables to MaxCompute
using DataWorks Data Integration, the conventional method would require him to configure 100 data
synchronization tasks. With the _______ feature in DataWorks, he can upload all tables at the same time.

A. Full-Database Migration feature


B. Configure a MySQL Reader plug-in

C. Configure a MySQL Writer plug-in

D. Add data sources in Bulk Mode

My Answer: D. Another file says B; Siddesh's corrected file says A. Comment [7]: Answer is D.

7 .Machine Learning Platform for Artificial Intelligence (PAI) node is one of the node types in DataWorks
business flow. It is used to call tasks created on PAI and schedule production activities based on the
node configuration. PAI nodes can be added to DataWorks only _________ .

A. after PAI experiments are created on PAI

B. after PAI service is activated

C. after MaxCompute service is activated

D. Spark on MaxCompute Machine Learning project is created

My Answer: A. Another file says B. Comment [8]: Not sure about this.

8 .In a scenario where a large enterprise plans to use MaxCompute to process and analyze its data, tens
of thousands of tables and thousands of tasks are expected for this project, and a project team of 40
members is responsible for the project construction and O&M. From the perspective of engineering,
which of the following can considerably reduce the cost of project construction and management?

A. Develop directly on MaxCompute and use script-timed scheduling tasks

B. Use DataWorks

C. Use Eclipse

D. Use a private platform specially developed for this project

My Answer: B

9 .AliOrg Company plans to migrate their data with virtually no downtime. They want all the data
changes to the source database that occur during the migration are continuously replicated to the
target, allowing the source database to be fully operational during the migration process. After the
database migration is completed, the target database will remain synchronized with the source for as
long as you choose, allowing you to switch over the database at a convenient time. Which of the
following Alibaba products is the right choice for you to do it:
A. Log Service

B. DTS(Data Transmission Service)

C. Message Service

D. CloudMonitor

My Answer: B

10. There are three types of node instances in an E-MapReduce cluster: master, core, and _____ .

A. task

B. zero-load

C. gateway

D. agent

My Answer: A. Another file says D. Comment [9]: Answer is A.

11 .A dataset includes the following items (time, region, sales amount). If you want to present the
information above in a chart, ______ is applicable.

A. Bubble Chart

B. Tree Chart

C. Pie Chart

D. Radar Chart

My Answer: A. Another file says C. Comment [10]: Answer is A.

12 .Alibaba Cloud Quick BI reporting tools support a variety of data sources, facilitating users to analyze
and present their data from different data sources. ______ is not supported as a data source yet.

A. Results returned from the API


B. MaxCompute

C. Local Excel files

D. MySQL RDS

Reason: Big data means volume + variety + velocity. API data has velocity and variety,
and an API can also handle a large amount of data, so it has volume.

My Answer: A. Another file and Siddesh's corrected file say C. Comment [11]: I think the answer is B.

13. DataV is a powerful yet accessible data visualization tool, which features geographic information
systems allowing for rapid interpretation of data to understand relationships, patterns, and trends.
When a DataV screen is ready, the work can be embedded into the existing portal of the enterprise
through ______.

A. URL after the release

B. URL in the preview

C. MD5 code obtained after the release

D. Jar package imported after the release

My Answer: A. Another file says C. Comment [12]: Answer is A.

14. Where is the metadata (e.g., table schemas) stored in Hive?

A. Stored as metadata on the NameNode

B. Stored along with the data in HDFS

C. Stored in the RDBMS like MySQL

D. Stored in ZooKeeper

My Answer: C. Siddesh's corrected file says A. Comment [13]: Answer is C.

15 ._______ instances in E-MapReduce are responsible for computing and can quickly add computing
power to a cluster. They can also scale up and down at any time without impacting the operations of the
cluster.

A. Task
B. Gateway

C. Master

D. Core

My Answer: A

16. Your company stores user profile records in an OLTP database. You want to join these records with
web server logs you have already ingested into the Hadoop file system. What is the best way to obtain
and ingest these user records?

A. Ingest with Hadoop streaming

B. Ingest using Hive

C. Ingest with sqoop import

D. Ingest with Pig's LOAD command

My Answer: C. Another file and Siddesh's corrected file say B. Comment [14]: Answer is C.

17 .You are working on a project where you need to chain together MapReduce, Hive jobs. You also
need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to
perform these actions?

A. Spark

B. HUE

C. Zookeeper

D. Oozie

My Answer: two other files say C. Comment [15]: Answer is D.

18 .Which node type in DataWorks can edit the Python code to operate data in MaxCompute?

A. PyODPS

B. ODPS MR Node

C. ODPS Script Node


D. SHELL node

My Answer: A

19 .DataService Studio in DataWorks aims to build a data service bus to help enterprises centrally
manage private and public APIs. DataService Studio allows you to quickly create APIs based on data
tables and register existing APIs with the DataService Studio platform for centralized management and
release. Which of the following descriptions about DataService Studio in DataWorks is INCORRECT?

A. DataService Studio is connected to API Gateway. Users can deploy APIs to API Gateway with one-
click.

B. DataService Studio adopts the serverless architecture. All you need to care is the query logic of APIs,
instead of the infrastructure such as the running environment.

C. To meet the personalized query requirements of advanced users, DataService Studio provides the
custom Python script mode to allow you to compile the API query by yourself. It also supports multi-table
association, complex query conditions, and aggregate functions.

D. Users can deploy any APIs created and registered in DataService Studio to API Gateway for
management, such as API authorization and authentication, traffic control, and metering.

My Answer: C

20. MaxCompute Tunnel provides highly concurrent data upload and download services. Users can use
the Tunnel service to upload data to or download data from MaxCompute. Which of the following
descriptions about Tunnel is NOT correct:

A. MaxCompute Tunnel provides the Java programming interface for users

B. MaxCompute provides two data import and export methods: using Tunnel Operation on the console
directly or using TUNNEL written with java

C. If data fails to be uploaded, use the restore command to restore the upload from where it was
interrupted

D. Tunnel commands are mainly used to upload or download data. They provide the following
functions: upload, download, resume, show, purge, etc.

My Answer: B. Comment [16]: Answer is C; it's not the restore command, the correct command is resume.

21. Which of the following is not proper for granting permission on an L4 MaxCompute table to a
user? (L4 is a level in MaxCompute Label-based security (LabelSecurity), a required MaxCompute
Access Control (MAC) policy at the project space level. It allows project administrators to control
user access to column-level sensitive data with improved flexibility.)

A. If no permissions have been granted to the user and the user does not belong to the project, add the
user to the project. The user does not have any permissions before they are added to the project.

B. Grant a specific operation permission to the user.

C. If the user manages resources that have labels, such as datasheets and packages with datasheets,
grant label permissions to the user.

D. The user needs to create a project in simple mode

My Answer: D. Three other files say A. Comment [17]: Answer is D.

22. MaxCompute supports two kinds of charging methods: Pay-As-You-Go and Subscription (CU cost).
Pay-As-You-Go means each task is measured according to the input size by job cost. In this charging
method the billing items do not include charges for ______.

A. Data upload

B. Data download

C. Computing

D. Storage

Reason: MaxCompute counts and charges for storage, computation, and download operations. This
topic describes how to select the billing method and preliminarily estimate calculation and storage
costs for MaxCompute. The billing methods include pay-as-you-go and subscription.

My Answer: B. Comment [18]: Answer is A; Alibaba does not charge for uploading.

23. MaxCompute is a general-purpose, fully managed, multi-tenancy data processing platform for large-
scale data warehousing, and it is mainly used for storage and computing of batch structured data. Which
of the following is not a use case for MaxCompute?

A. Order management

B. Data Warehouse

C. Social networking analysis

D. User profile

My Answer: A. Two other files and Siddesh's corrected file say B. Comment [19]: Answer is B.

24 .Tom is the administrator of a project prj1 in MaxCompute. The project involves a large volume of
sensitive data such as user IDs and shopping records, and many data mining algorithms with proprietary
intellectual property rights. Tom wants to properly protect these sensitive data and algorithms. To be
specific, project users can only access the data within the project, all data flows only within the project.
What operation should he perform?

A. Use ACL authorization to set the status to read-only for all users

B. Use Policy authorization to set the status to read-only for all users

C. Allow the object creator to access the object

D. Enable the data protection mechanism in the project, using set ProjectProtection=true;

My Answer: D. Another file, Siddesh's corrected file, and Prem say B. Comment [20]: Answer is D.

25 .There are multiple connection clients for MaxCompute, which of the following is the easiest way to
configure workflow and scheduling for MaxCompute tasks?

A. Use DataWorks

B. Use Intelij IDEA

C. Use MaxCompute Console

D. No supported tool yet

My Answer: A. Two other files say B; Siddesh's corrected file says C. Comment [21]: Answer is B.

26. In MaxCompute, you can use the Tunnel command line for data upload and download. Which of the
following descriptions of Tunnel commands is NOT correct:

A. Upload: Supports file or directory (level-one) uploading. Data can only be uploaded to a single table
or table partition each time.

B. Download: You can only download data to a single file. Only data in one table or partition can be
downloaded to one file each time. For partitioned tables, the source partition must be specified.

C. Resume: If an error occurs due to the network or the Tunnel service, you can resume transmission of
the file or directory after interruption.
D. Purge: Clears the table directory. By default, use this command to clear information of the last three
days.

My Answer: B. Siddesh's corrected file says D. Comment [22]: Answer is B.

27. Scenario: Jack is the administrator of project prj1. A new team member, Alice (who already has an
Alibaba Cloud account alice@aliyun.com), applies to join this project with the following permissions:
view table lists, submit jobs, and create tables. Which of the following SQL statements is useless:

A. use prj1;

B. add user aliyun$alice@aliyun.com;

C. grant List, CreateTable, CreateInstance on project prj1 to user aliyun$alice@aliyun.com;

D. flush privileges;

My Answer: D. Two other files say B. Comment [23]: Answer is D.

28. Spark on MaxCompute is an open-source framework that functions on the service level to support
data processing and analysis operations. Equipped with unified computing resources and data set
permissions, Spark on MaxCompute allows you to submit and run jobs while using your preferred
development methods. Which of the following descriptions about Spark on MaxCompute is NOT correct:

A. Spark on MaxCompute provides you with native Spark Web UIs.

B. Different versions of Spark can run in MaxCompute at the same time.

C. Similar to MaxCompute SQL and MaxCompute MapReduce, Spark on MaxCompute runs in the unified
computing resources activated for MaxCompute projects.

D. Spark on MaxCompute has a separate permission system which will not allow users to query data
without any additional permission modifications required.

My Answer: B

29 .In MaxCompute command line, if you want to view all tables in a project, you can execute command:
______.

A. show tables;

B. use tables;

C. desc tables;
D. select tables;

My Answer: two other files say B. Comment [24]: Answer is A.

30 .When odpscmd is used to connect to a project in MaxCompute, the command ______ can be
executed to view the size of the space occupied by table table_a.

A. select size from table_a;

B. size table_a;

C. desc table_a;

D. show table table_a;

My Answer: two other files and Prem say B; Siddesh's corrected file says C. Comment [25]: Answer is C.

True/False
31 .Data Migration Unit (DMU) is used to measure the amount of resources consumed by data
integration, including CPU, memory, and network. One DMU represents the minimum amount of
resources used for a data synchronization task.

True

False

My Answer: True

32 .DataWorks can be used to create all types of tasks and configure scheduling cycles as needed. The
supported granularity levels of scheduling cycles include days, weeks, months, hours, minutes and
seconds.

True

False

My Answer: False

33 .If a task node of DataWorks is deleted from the recycle bin, it can still be restored.
True

False

My Answer: True. Siddesh's corrected file says False. Comment [26]: Answer is False.

34. If the DataWorks (MaxCompute) tables in your request belong to two owners, Data Guard
(a DataWorks component) automatically splits your request into two by table owner.

True

False

My Answer: B (False). Three other files say True. Comment [27]: Answer is True.

35 .The FTP data source in DataWorks allows you to read/write data to FTP, and supports configuring
synchronization tasks in wizard and script mode.

True

False

My Answer: True

36 .In each release of E-MapReduce, the software and software version are flexible. You can select
multiple software versions.

True

False

My Answer: False. Two other files say True; Siddesh's corrected file says False. Comment [28]: Not sure, but I think the answer is True.

37 .Alibaba Cloud Elastic MapReduce (E-MapReduce) is a big data processing solution to quickly process
huge amounts of data. Based on open source Apache Hadoop and Apache Spark, E-MapReduce flexibly
manages your big data use cases such as trend analysis, data warehousing, and analysis of continuously
streaming data.

True

False
My Answer: True

38 .An enterprise uses Alibaba Cloud MaxCompute for storage of service orders, system logs and
management data. Because the security levels for the data are different, it is needed to register multiple
Alibaba Cloud accounts for data management.

True

False

My Answer: False

39 .JindoFS in E-MapReduce provided by SmartData uses OSS as the storage back end.

True

False

My Answer: True

40 .In DataWorks table permission system, you can revoke permissions only on the fields whose security
level is higher than the security level of your account.

True

False

My Answer: True

41 .Project is an important concept in MaxCompute. A user can create multiple projects, and each object
belongs to a certain project.

True

False

My Answer: True

42 .Assume that Task 1 is configured to run at 02:00 each day. In this case, the scheduling system
automatically generates a snapshot at the time predefined by the periodic node task at 23:30 each day.
That is, the instance of Task 1 will run at 02:00 the next day. If the system detects the upstream task is
complete, the system automatically runs the Task 1 instance at 02:00 the next day.

True
False

My Answer: True

43 .In MaxCompute, if error occurs in Tunnel transmission due to network or Tunnel service, the user
can resume the last update operation through the command tunnel resume;.

True

False

My Answer: True

44 .A company originally handled the local data services through the Java programs. The local data have
been migrated to MaxCompute on the cloud, now the data can be accessed through modifying the Java
code and using the Java APIs provided by MaxCompute.

True

False

My Answer: True

45 .MaxCompute takes Project as a charged unit. The bill is charged according to three aspects: the
usage of storage, computing resource, and data download respectively. You pay for compute and
storage resources by the day with no long-term commitments.

True

False

My Answer: True

46 .There are various methods for accessing to MaxCompute, for example, through management
console, client command line, and Java API. Command line tool odpscmd can be used to create, operate,
or delete a table in a project.

True

False

My Answer: True

47 .A start-up company wants to use Alibaba Cloud MaxCompute to provide product recommendation
services for its users. However, the company does not have many users at the initial stage, while the
charge for MaxCompute is higher than that of ApsaraDB RDS, so the company should be recommended
to use MaxCompute service until the number of its users increases to a certain size.

True

False

My Answer: Other file says True. Another file says True. Siddesh corrected file says False. Prem-True

Correct Answer: True or False? Comment [29]: Answer: False

48 .Synchronous development in DataWorks provides both wizard and script modes.

True

False

My Answer: True

49 .MaxCompute SQL is suitable for processing less real-time massive data, and employs a syntax similar
to that of SQL. The efficiency of data query can be improved through creating proper indexes in the
table.

True

False

My Answer: B (False). Siddesh's corrected file says True. Comment [30]: Answer is True.

50. Table is a data storage unit in MaxCompute. It is a two-dimensional logical structure composed of
rows and columns. All data is stored in tables, and the operating objects of computing tasks are all
tables. A user can perform create table, drop table, and tunnel upload, as well as update the qualified
data in the table.

True

False

My Answer: True

51. Which of the following Hadoop ecosystem components can you choose to set up a streaming log
analysis system? (Number of correct answers: 3)

A. Apache Flume
B. Apache Kafka

C. Apache Spark

D. Apache Lucene

My Answer: ABC. Two other files say ACD. Comment [31]: I think the answer is ABC.

52. Distributed file systems like GFS and HDFS are designed with a much larger block (or chunk) size,
such as 64MB or 128MB. Which of the following descriptions are correct? (Number of correct answers: 4)

A. It reduces clients' need to interact with the master, because reads and writes on the same block
(or chunk) require only one initial request to the master for block location information

B. Since a client is more likely to perform many operations on a given large block (or chunk), it
can reduce network overhead by keeping a persistent TCP connection to the metadata server over an
extended period of time

C. It reduces the size of the metadata stored on the master

D. The servers storing those blocks may become hot spots if many clients are accessing the same small
files

E. If it is necessary to support even larger file systems, the cost of adding extra memory to the
metadata server is a big price

My Answer: another file says ABCDE. Comment [32]: Not sure about this (ABCD or ABCDE?).

53 .MaxCompute can coordinate multiple users to operate one project through ACL authorization. The
objects that can be authorized by ACL include ______. (Number of correct answers: 3)

A. Project

B. Table

C. Resource

D. Procedure

E. Job

My Answer: ACD. Another file says ABC. Comment [33]: Answer is ABC.


54 .DataWorks can be used to develop and configure data sync tasks. Which of the following statements
are correct? (Number of correct answers: 3)

A. The data source configuration in the project management is required to add data source

B. Some of the columns in source tables can be extracted to create a mapping relationship between
fields, and constants or variables can't be added

C. For the extraction of source data, "where" filtering clause can be referenced as the criteria of
incremental synchronization

D. Clean-up rules can be set to clear or preserve existing data before data write

My Answer: A,B,D. Another file says ABCD.

Correct Answer: ABD or ABCD? Comment [34]: I think answer is ABD

55 .The data development mode in DataWorks has been upgraded to the three-level structure
comprising of _____, _____, and ______. (Number of correct answers: 3)

A. Project

B. Solution

C. Business flow

D. Directory

My Answer: A,B,C

56 .In DataWorks, we can configure alert policies to monitor periodically scheduled tasks, so that an
alert will be issued timely. Currently DataWorks supports ________ alerts. (Number of correct answers:
2)

A. Email

B. Text message

C. Telephone

D. Aliwangwang

My Answer: A,B

57. DataWorks provides powerful scheduling capabilities, including time-based or dependency-based task trigger mechanisms, to perform tens of millions of tasks accurately and punctually each day based on DAG relationships. It supports multiple scheduling frequency configurations like: (Number of correct answers: 4)

A. By Minute

B. By Hour

C. By Day

D. By Week

E. By Second

My Answer: A,B,C,D

58 .MaxCompute is a fast and fully-managed TB/PB-level data warehousing solution provided by Alibaba
Cloud. Which of the following product features are correct? ______ (Number of correct answers: 3)

A. Distributed architecture

B. High security and reliability

C. Multi-level management and authorization

D. Efficient transaction processing

E. Fast real-time response

My Answer: A,B,E

59. Resource is a particular concept in MaxCompute. If you want to use a user-defined function (UDF) or MapReduce, resources are needed. For example, after you have prepared a UDF, you must upload the compiled jar package to MaxCompute as a resource. Which of the following objects are MaxCompute resources? (Number of correct answers: 4)

A. Files

B. Tables: Tables in MaxCompute

C. Jar: Compiled Java jar package

D. Archive: Recognize the compression type according to the postfix in the resource name

E. ACL Policy

My Answer: Other file says ABCD. Other file says ABCDE.

Correct Answer: ABCD or ABCDE? Comment [35]: Answer: ABCD


60 .In order to ensure smooth processing of tasks in the Dataworks data development kit, you must
create an AccessKey. An AccessKey is primarily used for access permission verification between various
Alibaba Cloud products. The AccessKey has two parts, they are ____. (Number of correct answers: 2)

A. Access Username

B. Access Key ID

C. Access Key Secret

D. Access Password

My Answer: B,C

61. DataWorks uses MaxCompute as the core computing and storage engine to provide massive offline
data processing, analysis, and mining capabilities. It introduces both simple and standard modes
workspace. Which of the following descriptions about DataWorks Workspace and MaxCompute Project
is INCORRECT?

A. A simple mode refers to a DataWorks Workspace that corresponds to a MaxCompute Project and
cannot set up a development and Production Environment

B. The advantage of the simple mode is fast iteration: code takes effect as soon as it is submitted, without a separate publishing step. The risk of the simple mode is that the development role is over-privileged and can delete the tables under the project, so there is a table-permission risk.

C. Standard mode refers to a DataWorks project corresponding to two MaxCompute projects. It can be set up with dual development and production environments, improves code development standards, and can strictly control table permissions: direct operations on tables in the production environment are prohibited, guaranteeing the data security of production tables.

D. All Task edits can be performed in the Development Environment, and the Production Environment
Code can also be directly modified

My Answer: B

Correct Answer: B? Comment [36]: Not sure, I think it's B or D

62. MaxCompute provides SQL and MapReduce for calculation and analysis service. Which of the
following descriptions about MaxCompute and SQL is NOT correct:

A. In MaxCompute, data is stored in forms of tables, MaxCompute provides a SQL query function for the
external interface

B. You can operate MaxCompute much like traditional database software, but it is worth mentioning that MaxCompute SQL does not support transactions, indexes, or Update/Delete operations
C. MaxCompute SQL syntax differs from Oracle and MySQL, so the user cannot migrate SQL statements
of other databases into MaxCompute seamlessly

D. MaxCompute SQL can complete a query in minutes or even seconds, and it can return results in milliseconds without using any other processing engine.

My Answer: D

Correct Answer: D? Comment [37]: Answer: D
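Because MaxCompute SQL has no Update/Delete, row changes are typically applied by rewriting data with INSERT OVERWRITE. A minimal sketch, assuming a hypothetical t_user table with user_id and status columns:

```sql
-- "Update" one row by overwriting the table with a transformed copy,
-- since MaxCompute SQL has no UPDATE statement.
insert overwrite table t_user
select user_id,
       case when user_id = '1001' then 'inactive' else status end as status
from t_user;
```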

63. DataWorks provides scheduling capabilities including time-based or dependency-based task trigger
functions to perform tens of millions of tasks accurately and timely each day, based on DAG
relationships. Which of the following descriptions about scheduling and dependency in DataWorks is
INCORRECT?

A. Users can configure an upstream dependency for a task. In this way, even if the current task instance reaches the scheduled time, the task only runs after the upstream task's instance is completed.

B. If no upstream task is configured, then by default the current task is triggered by the project. As a result, the default upstream task of the current task is project_start in the scheduling system. By default, a project_start task is created as the root task for each project.

C. If the task is submitted after 23:30, the scheduling system automatically generates periodic instances starting from the second day and runs them on time.

D. The system automatically generates an instance for the task at each time point according to the
scheduling attribute configuration and periodically runs the task from the second day only after a task is
submitted to the scheduling system.

My Answer: D

Correct Answer: D? Comment [38]: Answer: D

64. E-MapReduce simplifies big data processing, making it easy, fast, scalable and cost-effective for you
to provision distributed Hadoop clusters and process your data. This helps you to streamline your
business through better decisions based on massive data analysis completed in real time. Which of the
following descriptions about E-MR is NOT true?

A. E-MapReduce allows you to simply select the required ECS model (CPU or memory) and disks, and the required software, for automatic deployment

B. Saves extra overheads involved in managing the underlying instances

C. Seamless integration with other Alibaba Cloud products to be used as the input source or output
destination of Hadoop/Spark calculation engine

D. It supports the Pay-As-You-Go payment method, which means that the cost of each task is measured
according to the input size
My Answer: B

Correct Answer: B? Comment [39]: I think answer is A

65. When a local file is updated to Quick BI for presentation, the data is stored in ______.

A. Exploration space built in Quick BI

B. MaxCompute built in Quick BI

C. AnalyticDB

D. Client local cache

My Answer: B. Prem-D

Correct Answer: B? Comment [40]: I think answer is A

66. Which HDFS daemon or service manage all the meta data stored in HDFS?

A. secondary namenode

B. namenode

C. datanode

D. node manager

My Answer: B. Prem-C

Correct Answer: B? Comment [41]: Not sure about this

67. Which of the following descriptions about MaxCompute security is NOT correct:

A. MaxCompute supports two account systems: the Alibaba Cloud account system and RAM user system

B. MaxCompute recognizes RAM users but cannot recognize RAM permissions. That is, you can add RAM
users under your Alibaba Cloud account to a MaxCompute project. However, MaxCompute does not
consider the RAM permission definitions when it verifies the permissions of RAM users.

C. LabelSecurity is a workspace-level mandatory access control (MAC) policy that enables workspace
administrators to control user access to row-level sensitive data more flexibly.

D. MaxCompute users can share data and resources, such as tables and functions, among workspaces by
using packages.

My Answer: B

Correct Answer: B? Comment [42]: Not sure about this


68. MaxCompute SQL is suitable for the scenarios: there is massive data (TB level) to be processed and
real-time requirement is not high. It takes seconds or even minutes to prepare each job and submit each
job, so MaxCompute SQL is not acceptable for the services which need to process thousands to tens of
thousands of transactions per second. Which of the following descriptions about MaxCompute SQL is
NOT correct:

A. The syntax of ODPS SQL is similar to SQL. It can be considered a subset of standard SQL

B. MaxCompute SQL is not equivalent to a database; it lacks many database characteristics, such as transactions, primary key constraints, and indexes

C. At present, the maximum length of SQL in MaxCompute is 2MB

D. MaxCompute SQL is 100% equivalent to Hive SQL

My Answer: D. Prem-B

Correct Answer: D? Comment [43]: Not sure but I think it could be D

69. By default, the resource group in DataWorks provides you 50 slots and each DMU occupies 2 slots.
This means the default resource group supports 25 DMUs at the same time.

True

False

My Answer:

Correct Answer: True or False? Comment [44]: I think its True

70. JindoFS is a cloud-native file system that combines the advantages of OSS and local storage. JindoFS
is also the next-generation storage system that provides efficient and reliable storage services for cloud
computing. To use JindoFS, select the related services when creating an E-MapReduce cluster.

True

False

My Answer: A

71. A partition table can be created through the following statement in MaxCompute SQL:

create table if not exists t_student(
    name string,
    number string)
partitioned by (department string);

True

False

My Answer: A. Prem-False

Correct Answer: True or False? Comment [45]: Answer: True
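Building on the statement above, writes name the target partition explicitly, and filtering on the partition column lets queries prune partitions. A sketch using the same t_student table (t_student_stg is a hypothetical staging table):

```sql
-- Write into one partition of t_student.
insert overwrite table t_student partition (department='math')
select name, number from t_student_stg;

-- A filter on the partition column reads only that partition.
select name from t_student where department = 'math';
```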

72. The E-MapReduce (EMR) Auto Scaling feature is designed to reduce costs and improve execution efficiency. Which of the following descriptions about EMR Auto Scaling are correct? (Number of correct answers: 3)

A. Auto Scaling only supports scaling in and scaling out a cluster by adding or removing task nodes.

B. Scale by Time is recommended as the rule type if you can specify the time to scale a cluster.

C. Scale by Rule is recommended as the rule type if you cannot specify the time to scale a cluster and
need to add and remove computing resources based on the specified YARN metrics.

D. Auto Scaling only supports Pay-As-You-Go Hadoop clusters.

My Answer: A,B,D

Correct Answer: ABD? Comment [46]: Not sure about this

73. DataWorks App Studio is a tool designed to facilitate your data product development. It comes with
a rich set of frontend components that you can drag and drop to easily and quickly build frontend apps.
With App Studio, you do not need to download and install a local integrated development environment
(IDE) or configure and maintain environment variables. Instead, you can use a browser to write, run, and
debug apps and enjoy the same coding experience as that in a local IDE. App Studio also allows you to
publish apps online. Which of the following descriptions about APP Studio in DataWorks is CORRECT?
(Number of correct answers: 3)

A. App Studio comes with all breakpoint types and operations of a local IDE. It supports thread switching
and filtering, variable viewing and watching, remote debugging, and hot code replacement.

B. You can directly access the runtime environment, which is currently built based on MacOS as the base
image.

C. You and your team members can use App Studio to share the development environment for
collaborative coding.

D. App Studio supports real-time collaborative coding. Multiple collaborators of a team can develop and
write code at the same time in the same project, and view changes made by other collaborators in real
time. This feature helps avoid the hassle of synchronizing code and merging branches and significantly
improve the development efficiency.
E. APP Studio is included in Basic edition of DataWorks

My Answer: A,B,C

Correct Answer: ABC? Comment [47]: Not sure about this

74. MaxCompute Graph is a processing framework designed for iterative graph computing.
MaxCompute Graph jobs use graphs to build models. Graphs are composed of vertices and edges.
Which of the following operations can MaxCompute support? (Number of correct answers: 3)

A. Modify the value of a vertex or edge.

B. Add/delete a vertex.

C. Add/delete an edge.

D. When editing a vertex and an edge, you don't have to maintain their relationship.

My Answer: A,B,C. Prem-BCD

Correct Answer: ABC? Comment [48]: Answer: ABC

75. There are various methods for connecting to and using MaxCompute. Which of the following options have lower thresholds for the size of uploaded files? ______. (Number of correct answers: 2)

A. DataWorks

B. IntelliJ IDEA

C. MaxCompute Tunnel

D. Alibaba DTS

My Answer: A,D

Correct Answer: AD or BD? Comment [49]: Not sure about this
MaxCompute SQL Quiz

LATEST SUBMISSION GRADE

100%

1.Question 1

MaxCompute SQL uses a syntax similar to SQL. Which statement is correct?

MaxCompute SQL extended standard SQL

MaxCompute can be equivalent to a database.

The maximum SQL length allowed in MaxCompute varies with the environment resources that are
applied.

MaxCompute SQL is suitable for massive data (GB, TB, EB level), off-line batch calculation scenarios.

2.Question 2

Currently, MaxCompute supports specifying up to 6 small tables in a mapjoin, otherwise syntax errors
are reported, and records for a single small table are limited to no more than 10,000.
True

False

3.Question 3

MaxCompute SQL syntax does not support between conditional queries.

True

False

4.Question 4

When you have multiple tables join, it only allows one leftmost table to be a mapjoin table.

True

False

5.Question 5

Data type inconsistency is often encountered in business data processing. In order to keep data types consistent, the data processing system involves data type conversion. If MaxCompute SQL is used for data processing, which of the following conversions cannot be achieved?

Bigint to String

String to Boolean
String to Bigint

Datetime to String

6.Question 6

Which logical operation is incorrect in MaxCompute SQL?

NULL and FALSE=FALSE

NULL and TRUE=NULL

FALSE or TRUE=TRUE

TRUE or NULL=TRUE

7.Question 7

Suppose table t_dml only has one field named as id (type is string), which MaxCompute SQL query
cannot be executed correctly?

create table t_dml_bak like t_dml;

insert into table t_dml select '1900-01-01 00:00:00' from dual;

insert overwrite table t_dml select '' from dual;

update table t_dml set id='a' ;


8.Question 8

Which UNION ALL statements in MaxCompute SQL are correct? (Number of Correct Answers: 3)

Combines two or multiple data sets returned by a SELECT operation into one data set. If the result
contains duplicated rows, all rows that meet the conditions are returned, and deduplication of
duplicated rows is not applied.

MaxCompute does not support unioning two top-level query results, but you can do it on two subquery results.

The columns of each subquery corresponding to the union all operation must be listed explicitly; * is not supported.

The number, names, and types of queried columns corresponding to the UNION ALL/UNION operation
must be consistent.
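The UNION ALL rules above can be sketched as follows (sales_a and sales_b are hypothetical tables with identical column lists):

```sql
-- Each branch lists its columns explicitly (no *), the column counts and
-- types match, and duplicated rows are kept without deduplication.
select id, amount
from (
    select id, amount from sales_a
    union all
    select id, amount from sales_b
) t;
```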

9.Question 9

Which of the following queries can be executed? (Number of Correct Answers: 2)


select sum(total_price) from sale_detail group by region;

select region as r from sale_detail group by r;

select region, total_price from sale_detail group by region, total_price;

select region as r from sale_detail order by region limit 100;

select region as r from sale_detail distribute by region;

10.Question 10

Which of the following statements of mapjoin Hint application of MaxCompute SQL are correct ?
(Number of Correct Answers: 3)

When a large table joins one or multiple small tables, you can use MapJoin, which performs much faster
than regular Joins

When MapJoin references a small table or subquery, an alias should be used.

MaxCompute SQL does not support complex Join conditions, such as unequal expressions or OR logic, in a normal Join's ON condition, but MapJoin does support them

When multiple tables are joined, the two leftmost tables can be MapJoin tables at the same time.
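A hedged sketch of the MAPJOIN hint (big_orders and small_rates are hypothetical tables): the small table is named in the hint, and an unequal condition is allowed in the ON clause:

```sql
select /*+ mapjoin(s) */ b.order_id, s.rate
from big_orders b
join small_rates s            -- small table loaded into memory on the mappers
on b.amount > s.threshold;    -- unequal join condition, allowed with mapjoin
```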

MaxCompute Quiz: UDF


TOTAL POINTS 10

1.Question 1

Array type in MaxCompute maps to Java Array.

True

False

2.Question 2

Java UDF supports Bigint, String, Double, Boolean, ARRAY, MAP, and STRUCT

True

False
3.Question 3

UDF outputs one return value at a time. UDTF can output more than two records at one time.

True

False

4.Question 4

For UDAF (User Defined Aggregation Function), input and output have a many-to-one relationship.

True

False

5.Question 5

The NULL value in SQL is represented by a NULL reference in Java; therefore, ‘Java primitive type’ is not
allowed because it cannot represent a NULL value in SQL.

True

False

6.Question 6

Which of the following is not included in MaxCompute user-defined functions?

UDF

UDAF

UDGF

UDTF

7.Question 7

Which correspondence is incorrect between MaxCompute data type and Java data type?

Tinyint maps to java.lang.Byte

Smallint maps to java.lang.Short

Decimal maps to java.lang.BigDecimal

Timestamp maps to java.lang.Timestamp


8.Question 8

Which statement is incorrect for UDF debug?

It can be tested in two ways: unit test and local execution.

You need to specify the running data source when running a UDF locally.

UDF/UDAF/UDTF typically works on some columns of the table in the SELECT clause, and you need to configure the MaxCompute project, table, and columns when running a local test.

Warehouse is built locally to store tables (including meta and data) or resources for executing UDFs locally. The project name, table names, and sample data are organized in order under the warehouse directory.

9.Question 9

Which of the following ways of UDTF usage in SQL are correct? (Number of Correct Answers: 3)

select user_udtf(col0, col1, col2) as (c0, c1) from my_table;

select user_udtf(col0, col1, col2) as (c0, c1),col3 from my_table;

select user_udtf(col0, col1, col2) as (c0, c1) from (select * from my_table distribute by key sort by key) t;

select reduce_udtf(col0, col1, col2) as (c0, c1) from (select col0, col1, col2 from (select map_udtf(a0, a1,
a2, a3) as (col0, col1, col2) from my_table) t1 distribute by col0 sort by col0, col1) t2;

10.Question 10

Which of the following UDTF statements are correct? (Number of Correct Answers: 2)

It does not support the use with group by together in the same SELECT clause.

It supports the use with distribute by together in the same SELECT clause.

Supports other expressions in the same SELECT clause.

It does not support the use with sort by together in the same SELECT clause.
Data Visualization Quiz
TOTAL POINTS 10
1.Question 1

As an important Platform as a Service (PaaS) product in Alibaba Cloud's product portfolio, Alibaba Cloud DataWorks offers its users a one-stop solution. Which of the following capabilities is not included in the solution?

Data Integration

Data Management

Data Governance

Data Big Screen Dashboard

2.Question 2

If today's date is 2019-03-11, what is the result if the Partition Expression is set to dt=$[yyyymmdd-1]
when creating rule configuration in Data Quality?

dt=20190310

dt=20190311

dt=20190312

dt=20190311-1

3.Question 3

Function Studio allows you to edit MaxCompute Java user-defined functions (UDFs) and to compile and
publish them to DataWorks with one click.

True

False

4.Question 4

Which of the following notification method is not supported by the Data Quality?

DingTalk

Email
WeChat
SMS

5.Question 5

Which of the following tasks is supported for being associated with Data Quality?

Scheduling Tasks

ODPS SQL tasks

Data Sync Tasks

ODPS Script tasks

6.Question 6

What alarm levels are supported by DataWorks Data Quality Control component? (Number of correct
answers: 2)

Black alarm level

Blue alarm level

Orange alarm level

Red alarm level

7.Question 7

Function Studio in DataWorks supports UDF (user-defined function), ______ and ______ templates.

UDAF(user defined aggregate function)

UDTF(user defined table function)

MapReduce function

Flink function

8.Question 8

The process of using data quality is to configure monitoring rules for existing tables. After you configure
a rule, what can be done to verify the rule?

Run a Trial

Test
Run A/B Test

Run Stress Testing

9.Question 9

DataService Studio works together with _________ to provide a secure, stable, low-cost and easy-to-use
data sharing service.

Alibaba Cloud API Gateway

Alibaba Cloud PAI platform

Alibaba Cloud CDN service

Alibaba Cloud OSS service

10.Question 10

Which of the following products is one of the underlying computing engine layer of DataWorks?

ApsaraDB for RDS

MaxCompute

OSS

Polar DB

Quick BI Quiz
TOTAL POINTS 10

1.Question 1

Which operation is not supported in Quick BI?

Delete

Edit

Copy

Move
2.Question 2

Which local file type is not supported when using local files as the data source of QuickBI ?

CSV

XLS

XLSX

TXT

3.Question 3

The exploration space is a dedicated storage area of Quick BI. It supports txt, CSV, Excel, and DataWorks.

True

False

4.Question 4

Organizational unit management supports collaborative data development.

True

False

5.Question 5

In Quick BI, when using a local Excel file that contains multiple sheets as the data source, all sheets can be uploaded at once.

True

False

6.Question 6

Which chart is suitable for comparing the sales situation of a commodity in various regions?

Gauge

Bar Chart

Card Chart
Scatter Chart

7.Question 7

Which one is not contained in the data elements of scatter chart?

Color Legend

Color block size

X axis

Y axis

8.Question 8

Which of the following statements are correct for dashboard in QuickBI? (Number of Correct Answers:
3)

The dashboard supports two modes: Standard mode & Full-Screen mode.

Before creating a dashboard, you must prepare a dataset.

The shared dashboard can be modified.

You can simply edit the dataset accordingly to meet the actual dashboard demands.

9.Question 9

What are the Key components of QuickBI? (Number of Correct Answers: 3)

Data connection module

Data preparation module

Data presentation module

Data storage module

10.Question 10

Which of the following scenarios apply to QuickBI? (Number of Correct Answers: 3)

Making reports

Business exploration
Self-help data acquisition

Data synchronization

Machine Learning Quiz


TOTAL POINTS 10

1.Question 1

Machine Learning Platform For AI provides end-to-end machine learning services, including data
processing, feature engineering, model training, model prediction, and model evaluation. Machine Learning Platform For AI combines all of these services to make AI more accessible than ever.

True

False

2.Question 2

The Write MaxCompute Table component supports partitioned tables in MaxCompute.

True

False

3.Question 3

The Read MaxCompute Table component is unaware of any modifications (such as add or remove a
column) made to a table that is already loaded to the component.

True

False

4.Question 4

Feature engineering includes feature derivation and scale change. The heart disease prediction project
uses the feature selection and data normalization components for feature engineering.

True

False

5.Question 5
The Read MaxCompute Table component enables you to read data from MaxCompute tables. To read a
table from another project that you are authorized to access, you can use the format of _________.

Project Name.Table Name

The component only reads tables in the current project.

Only the Table Name

6.Question 6

The total service fee is equal to the billing fee of the component you use multiplied by the number of
computing hours. The computing hours are measured by using the formula ______

Max(vCPU cores, memory size/4) x running time (minutes).

Max(vCPU cores, memory size/4) x running time (hours).

Max(vCPU cores, memory size) x running time (hours).

Max(vCPU cores, memory size) x running time (minutes).

7.Question 7

A SQL Script Component supports a maximum of _____ input port(s) and one output port.

8.Question 8

How do you set the algorithm parameters in PAI ?

XML

JSON

CSV

XLS

9.Question 9
Sampling data in PAI is generated in the weighted mode. The weight column must be of ____ or ____
type.

Double

Boolean

Int

String

10.Question 10

The processor layer of PAI is the infrastructure layer that consists of ______ and _____ clusters.

CPU

MPI

GPU

MapReduce
Start Self-Test - Stage I: Big Data Fundamentals
A. Stage I: Big Data Fundamentals - DataWorks
1. Which of these DataWorks roles has permission to deploy workflows but not to edit them?
a. Developer
b. Deployer
c. Visitor
d. Project Administrator

2. When using MaxCompute with DataWorks, it is possible to create and run multiple types of
jobs on top of MaxCompute, including MaxCompute SQL jobs, and Spark Jobs
a. True
b. False

3. Which of these permissions is NOT granted to the OAM role in DataWorks?


a. Maintenance users are granted permissions by the project administrator
b. Maintenance users have release permissions
c. maintenance users can create workflows
d. maintenance users have online maintenance permissions

4. Which of these statements about the project administrator role in DataWorks is correct?
a. When adding users to a DataWorks workspace, the project administrator can only add
RAM users under the current account
b. When adding users to a DataWorks workspace, the project administrator can add
other Alibaba Cloud accounts as users, but not RAM users under the current account
c. The project administrator does not have permission to edit workflows
d. The project administrator can add users to a workspace, but not remove them

5. Which of these is NOT a time interval supported by DataWorks?


a. Daily
b. Hourly
c. Quarterly
d. monthly

6. One of the key features of DataWorks is its ability to create and run scheduled tasks that can
perform data import, processing, and analysis at regular intervals, without human intervention
a. True
b. False

7. You want to create a scheduled job in DataWorks to import data from a MySQL database into
MaxCompute once a day. Is this possible?
a. No, DataWorks doesn't support this
b. Yes, but you have to create a data synchronization job in DTS (Data Transmission
Service) first
c. Yes, you can do this directly using the DataWorks Data Integration feature
d. Yes, but this feature is not supported in the Basic Edition of DataWorks
8. Which of the user roles in DataWorks does not have permissions to alter anything within the
DataWorks workspace
a. OAM
b. Developer
c. Visitor
d. Project Administrator

9. Which of these is NOT a permission that DataWorks grants to the developer role?


a. Developers can create workflows
b. Developers can manage data sources within the DataWorks workspace (i.e., can
add/remove data sources)
c. Developers can create new script files and new UDF functions
d. Developers can publish packages

10. Assuming you have chosen MaxCompute as the compute engine to use within your
DataWorks workspace, what types of tasks will you be able to run? (number of correct
answers: 3)
a. MaxCompute SQL (ODPS SQL)
b. Resource
c. MaxCompute MR (MapReduce)
d. Python (PyODPS)

11. Which of these are valid DataWorks user roles? (number of correct answers: 2)
a. Project administrator
b. Deployer
c. SecOps
d. Guest

12. Which of these are "compute engines" that DataWorks can work with? (number of correct
answers: 3)
a. MaxCompute
b. HBase
c. AnalyticDB
d. E-MapReduce (EMR)

13. Which of these are good reasons to choose standard mode over basic mode, when setting up a
new DataWorks Workspace? (number of correct answers: 2)
a. DataWorks workspace in standard mode can provide more fine-grained user
permissions because it is possible to separate developers from production
b. DataWorks workspaces in standard mode cost less
c. Workspaces in standard mode allow developers to test code first before deploying it
into production
d. Code runs faster in standard mode

14. Which statements about DataWorks workflows are correct? (number of correct answers: 3)
a. Workflows can be scheduled to run at regular intervals (weekly, daily, hourly)
b. Workflows can be triggered manually or via an API call
c. Workflows can be edited or updated and then redeployed, if a change needs to be
made
d. A workflow can only be edited by the DataWorks user that created the workflow
15. You have created a MaxCompute project in Alibaba Cloud's Singapore region, but now need
to move it to Indonesia. Which are viable methods for migrating the project? (number of correct
answers: 2)
a. Use the DataWorks cross-project cloning feature to migrate your DataWorks
workspace content, and use data integration to move MaxCompute tables
b. Use DTS (Data Transmission Service)
c. Recreate your workflows and users in a new DataWorks workspace, then use data
integration to migrate your MaxCompute tables
d. Use the export project feature to move both DataWorks and MaxCompute content all at
once

16. Workflows in DataWorks have a type of node called a “Zero-load node” which does no work
but can be used to indicate the start of a workflow or to connect other nodes together.
a. True
b. False

17. You have created a DataWorks workspace and have added several users to the project (each of
which is associated with a RAM user). Which of these statements about the billing for you
DataWorks and MaxCompute usage is correct?
a. Each RAM user will be billed separately for the workflow they create and run
b. All bills will be charged to the Alibaba cloud account that created the DataWorks
workspace
c. You can turn on "split billing" to generate separate bills for each RAM user
d. You cannot add RAM users to a DataWorks workspace

18. In DataWorks, what is the difference between standard mode and basic mode?
a. Basic mode workspaces contain only one MaxCompute project, while standard mode
workspaces contain two ("development" and "production")
b. Basic mode has fewer features than standard mode
c. Jobs in basic mode do not run as fast as jobs in standard mode
d. Only one user can join a basic mode DataWorks workspace

19. Which functionality does DataWorks provide? (number of correct answers: 3)


a. The ability to create and edit scheduled data processing workflows
b. Monitoring and alarm features for scheduled tasks
c. The ability to import and export data to or from a variety of different data storage
systems like MySQL databases and OSS
d. The ability to "undo" a failed workflow by running it in reverse

20. Which of these are best practices when creating a new workflow in DataWorks? (number of
correct answers: 3)
a. If you are in standard mode, run the workflow in the development environment to
test that it works, before deploying to production
b. Run your workflow using a subset of your data, to improve the speed at which it runs
and reduce costs, while still helping you catch errors
c. Avoid making workflows that contain more than 3-4 nodes
d. Implement an approvals process whereby one user develops workflows and another
is responsible for signing off on the workflow and deploying it into production

21. Which of these are techniques you can use to debug DataWorks workflows, if they fail to run?
(number of correct answers: 2)
a. Look at the logs for the failed node(s) in the workflow; they usually contain helpful
information
b. Step through your workflow one node at a time, and use an ad-hoc MaxCompute
SQL node to check the table(s) output by each step to ensure there are no data
quality issues
c. Buy a third-party debugging tool
d. Use the MaxCompute command-line tool to run your node instead of DataWorks

22. Which of these statements about MaxCompute is true? (number of correct answers: 2)
a. MaxCompute charges an up-front monthly fee for data storage
b. MaxCompute tables are designed to be appended to, but existing records cannot be
updated (SQL UPDATE and DELETE are not supported)
c. MaxCompute has its own SQL language dialect, called “ODPS SQL” or sometimes
“MaxCompute SQL”
d. MaxCompute is the same thing as Hadoop Hive
Start Self-Test - Stage II: Data Warehousing and Data
Processing
A. Stage II: Data Warehousing and Data Processing - SQL for
Beginners
1. Data manipulation language (DML) is a subset of SQL. Which of the following descriptions
match DML the best?
a. A programming language that is typically used in relational database or data stream
management system
b. A family of computer languages including commands permitting users to manipulate
data in database
c. A computer language used to create and modify the structure of database objects in a
database
d. A set of special commands that deal with the transactions within the database

2. Jason has two tables with data that are related to each other. He wants to combine these tables
to obtain more insight on the data. Which MySQL keyword does he need to use?
a. Drop
b. Group By
c. Insert
d. Distinct
e. Join
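The JOIN behavior this question tests can be sketched with Python's built-in sqlite3 module (an illustrative stand-in for MySQL; the table names and data below are invented for the example):

```python
import sqlite3

# Hypothetical tables, invented only to illustrate JOIN on a shared key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Jason'), (2, 'Mia');
    INSERT INTO orders VALUES (1, 100), (1, 50), (2, 70);
""")
# JOIN combines the two related tables on the matching key column.
rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    JOIN orders o ON c.id = o.customer_id
    ORDER BY o.amount
""").fetchall()
print(rows)  # [('Jason', 50), ('Mia', 70), ('Jason', 100)]
```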

3. Data definition language (DDL) is a subset of SQL. Which of the following descriptions
match DDL the best?
a. A programming language that is typically used in relational database or data stream
management system
b. A family of computer languages including commands permitting users to manipulate data
in database
c. A computer language used to create and modify the structure of database objects in
a database
d. A set of special commands that deal with the transactions within the database

4. What does SQL stand for?


a. Standard Quality Language
b. Structured Query Language
c. Standard Query Language
d. Structured, Quantifiable, Labeled

5. Which of the following statements would you use to obtain ALL records from a table named
“Employees” where the value of “FirstName” is “Peter”?
a. Select FirstName from employees where firstname=”Peter”
b. Select * from employees where firstname="Peter”
c. Select [ALL] employees where firstname=”Peter”
d. Select * from employees where firstname=”Peter”
6. Transaction control language (TCL) is a subset of SQL. Which of the following descriptions
match TCL the best?
a. A programming language that is typically used in relational database or data stream
management system
b. A family of computer languages including commands permitting users to manipulate data
in database
c. A computer language used to create and modify the structure of database objects in a
database
d. A set of special commands that deal with the transactions within the database

7. What is MySQL?
a. An open-source relational database management system
b. A type of database
c. A type of language that you can use to create and manage databases
d. A program developed by Microsoft

8. The Distinct clause can be used to remove duplicate values when querying a table
a. True
b. False
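A quick sqlite3 sketch of the Distinct clause (illustrative; the behavior is the same in MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (region TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("east",), ("west",), ("east",)])
# Without DISTINCT, duplicate values are returned as-is.
plain = conn.execute("SELECT region FROM t").fetchall()
# With DISTINCT, each value appears only once.
deduped = conn.execute("SELECT DISTINCT region FROM t ORDER BY region").fetchall()
print(len(plain), deduped)  # 3 [('east',), ('west',)]
```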

9. The JOIN statement lets us combine tables in MySQL. Which of the following JOINs are valid
in MySQL? (correct answers: 4)
a. INNER JOIN
b. RIGHT JOIN
c. OUTER JOIN
d. LEFT JOIN

10. Which of the following is true regarding the statement: SELECT * FROM Employees WHERE
Salary > 5000 OR Department=”Engineering”? (correct answers: 2)
a. All rows in which the value of the “Department” column is “Engineering” are
returned
b. All rows in which the value of the “Department” column is not “Engineering” are returned
c. All rows in which the value of the “Salary” column is above 5,000 are returned
d. All rows in which the value of the “Salary” column is below 5,000 are returned
e. All rows in which the value of the "Salary" column is above 5,000 AND the value of
“Department” is “Engineering” are returned
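The OR condition in question 10 can be checked with sqlite3 (illustrative data; a row is returned if either condition holds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, Salary INTEGER, Department TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    ("Ann", 6000, "Sales"),        # matches: Salary > 5000
    ("Bob", 4000, "Engineering"),  # matches: Department = 'Engineering'
    ("Eve", 3000, "Sales"),        # matches neither condition
])
rows = conn.execute(
    "SELECT name FROM Employees WHERE Salary > 5000 OR Department = 'Engineering'"
).fetchall()
print(rows)  # [('Ann',), ('Bob',)]
```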

B. Stage II: Data Warehousing and Data Processing -


MaxCompute Basic
1. Project space is the boundary of multi-tenant and access control.
a. True
b. False

2. What is the most suitable scenario for MaxCompute application?


a. Commodity trading system
b. Online query system
c. Data warehouse system
d. Multimedia system

3. The computing layer is the core part of MaxCompute?


a. True
b. False

4. MaxCompute uses a sandbox mechanism to achieve automatic storage fault tolerance


a. True
b. False

5. Which one is not the MaxCompute feature?


a. automatic storage fault tolerance
b. computation runs in sandbox
c. high concurrent and high throughput of data uploading and downloading
d. large transaction processing

6. MaxCompute is suitable for real-time systems with high requirements


a. True
b. False

7. Which one is not the MaxCompute function?


a. Data compression
b. Data encryption
c. Data uploading
d. Data downloading

8. What is the functional layer of MaxCompute structures?


a. Basic layer
b. Logic layer
c. Storage layer
d. authentication layer

9. The data lifecycle of MaxCompute can be set at the field level


a. True
b. False

10. Project space is the basic computing unit of MaxCompute?


a. True
b. False
11. The main functions of the MaxCompute logical layer are:
a. Project space management
b. Metadata management
c. Command management
d. Data retrieval
e. Authorization management

12. About the history of the development of MaxCompute, which of the following options are
correct?
a. MaxCompute was formally put into production in 2010
b. MaxCompute 2.0 was published in 2016
c. MaxCompute was formally put into production in 2009
d. MaxCompute 2.0 was published in 2016

13. What functions do the MaxCompute computing layer include?


a. Security management
b. Task scheduling
c. Data storage
d. Metadata Management
e. Resource management

14. What are the layers of MaxCompute functional structures?


a. Basic layer
b. Logical layer
c. Computing layer
d. Metadata layer

15. Features of MaxCompute’s product are:


a. Distributed
b. Security
c. Transactional management
d. Automatic storage fault tolerance mechanism
e. Computing runs in sandbox

16. Which one is the incorrect description of table partition?


a. Several fields in the table are used as partition columns
b. Partitions of tables can be classified
c. If the partitions to be accessed are specified when you use data, then only
corresponding partitions are read and a full table scan is avoided
d. Several fields are used as partition keys and also as column fields for tables

17. Which one is not a MaxCompute resource type?


a. File type
b. Log type
c. Table type
d. Archive type

18. MaxCompute is suitable for dealing with massive data, and the amount of data reaches
TB, PB, and EB
a. True
b. False

19. MaxCompute sets up a unified data platform. In which year did data storage, data security,
and data standard unification start?
a. 2009
b. 2010
c. 2011
d. 2012

20. MaxCompute SQL is built with Standard SQL


a. True
b. False

21. MaxCompute’s products have advantages:


a. Large scale computation storage
b. Support SQL, MR, MPI, and algorithms
c. Data security
d. Flexibility in accessing data

22. Which statements are correct for the MaxCompute table?


a. Row and column composition
b. Tables are divided into internal tables and external tables
c. Tables have no index
d. In MaxCompute, the object of different types of computation tasks cannot be a table

23. What are the components of the MaxCompute logical layer?


a. Runner
b. Worker
c. Scheduler
d. Executor

24. Users can log in to ODPS systems in different ways, such as the RESTful API, Java
development tools, or the ODPS CLT. Regardless of the way adopted, which will the login
eventually be converted to?
a. ODPS SDK
b. ODPS CLT
c. RESTful API
d. Java CLT

25. Which one is the incorrect statement for task?


a. Single SQL query becomes a task
b. A MapReduce program is a task
c. Task is the basic computing unit in MaxCompute
d. All requests in MaxCompute are transformed into tasks

26. Which one is the incorrect description of table partition?


a. Several fields in the table are used as partition columns
b. Partitions of tables can be classified
c. If the partitions to be accessed are specified when you use data, then only
corresponding partitions are read and a full table scan is avoided
d. Several fields are used as partition keys and also as column fields for tables

27. Which one is not the instance status of MaxCompute?


a. Cancelled
b. Running
c. Terminated
d. Success

28. MaxCompute uses sandbox mechanism to achieve automatic storage fault tolerance
a. True
b. False

29. Which scenarios are correct for the application of MaxCompute?


a. Building data warehouse system
b. Distributed transaction system
c. Machine learning
d. Data mining

30. A user can only have one project space permission?


a. true
b. false

31. MaxCompute adopts a distributed architecture, and clusters can be flexibly expanded?
a. True
b. False
Start Self-Test - Stage III: Advanced Data Processing
Tools and Techniques
A. Stage III: Advanced Data Processing Tools and Techniques -
MaxCompute SQL Development
1. MaxCompute SQL syntax does not support between conditional queries.
a. True
b. False

2. Which statement is incorrect for view in MaxCompute SQL?


a. To create a view, you must have ‘read’ privilege on the table referenced by view
b. Other views can be referenced by a view, circular reference is supported
c. Views can only contain one valid ‘select’ statement
d. Writing the data into a view is not allowed, such as, using ‘insert into’ or ‘insert
overwrite’ to operate view

3. Which calculation is incorrect In MaxCompute SQL?


a. If A or B is null, A+B returns NULL, otherwise returns A+B
b. If A or B is null, A*B returns NULL, otherwise returns A*B
c. ‘axb’ like ‘a\%b’=TRUE
d. A % B: if A or B is NULL, return NULL, otherwise return the result of A mod B

4. Suppose table t_dml only has one field named id (type is string), which MaxCompute SQL
query cannot be executed correctly?
a. Create table t_dml_bak like t_dml
b. Insert into table t_dml select ’1900-01-01 00:00:00’ from dual
c. Insert overwrite table t_dml select * from dual
d. Update table t_dml set id=’a’

5. Currently, MaxCompute supports specifying up to 6 small tables in a mapjoin, otherwise


syntax errors are reported, and records for a single small table are limited to no more than
10,000
a. True
b. False

6. Which of the following statements about order by and distribute by / sort by is incorrect in the
MaxCompute SQL syntax
a. The keys of order by/sort by/distribute by must be output columns (namely, column aliases)
of select statement
b. Order by or group by cannot be used together with distribute by
c. When order by is used for sorting, NULL is considered to be zero
d. Distribute by performs hash-based sharding on data by values of certain columns. Aliases
of select output column must be used

7. MaxCompute SQL uses a syntax similar to SQL. Which statement is correct?


a. MaxCompute SQL extended standard SQL
b. MaxCompute can be equivalent to a database
c. The maximum SQL length allowed in MaxCompute varies with environment resources
that are applied
d. MaxCompute SQL is suitable for massive data (GB, TB, EB level), off-line batch
calculations scenarios

8. Which of the following MaxCompute SQL syntax is incorrect?


a. Select a.shop_name as ashop, b.shop_name as bshop from shop a right outer join
sale_detail b on a.shop_name=b.shop_name
b. Select a.shop_name as ashop, b.shop_name as bshop from shop a full outer join
sale_detail b on a.shop_name=b.shop_name
c. Select * from table1, table2 where table1.id=table2.id
d. Select a.shop_name as ashop, b.shop_name as bshop from shop a inner join
sale_detail b

9. Which logical operation is incorrect in MaxCompute SQL?


a. NULL and FALSE=FALSE
b. NULL and TRUE=NULL
c. FALSE or TRUE=TRUE
d. TRUE or NULL=TRUE
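The three-valued logic behind question 9 can be demonstrated with sqlite3, which follows the same SQL rules (1/0 stand in for TRUE/FALSE, None for NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AND/OR with NULL can still be decided when one operand already fixes the result.
row = conn.execute("""
    SELECT NULL AND 0,   -- FALSE: anything AND FALSE is FALSE
           NULL AND 1,   -- NULL: cannot be decided
           NULL OR 1,    -- TRUE: anything OR TRUE is TRUE
           NULL OR 0     -- NULL: cannot be decided
""").fetchone()
print(row)  # (0, None, 1, None)
```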

10. Which statement is incorrect for the dynamics partition in MaxCompute SQL?
a. If the destination table has multi-level partitions, it is allowed to specify parts of partitions
to be static partitions through the insert statement, but the static partitions must be higher-level
partitions.
b. The value of dynamic partition can be special characters
c. In the select statement field, the following field provides a dynamic partition value for the
target table. If the target table has only one-level dynamic partition, the last field value of
select statement is the dynamic partition value of the target table
d. The value of dynamic partition cannot be NULL

11. Which statement of the table life cycle is incorrect?


a. The unit of the life cycle time of a table is day
b. The data of the non partition table will be automatically recycled after setting the
day number of life cycle
c. We can set the life cycle of tables or partitions
d. The partition table determines whether the partition should be recycled according to the
last modification time of each partition

12. When multiple tables are joined, only the leftmost table is allowed to be a mapjoin table.
a. True
b. False

13. Which of the following statements of mapjoin hint application of MaxCompute SQL are
correct? (Number of Correct Answers: 3)
a. When a large table joins one or multiple small tables, you can use MapJoin, which
performs much faster than regular joins
b. When MapJoin references to a small table or sub query, alias should be referenced
c. MaxCompute SQL does not support the use of complex join conditions such as
unequal expressions, or logic in normal join’s on conditions, but in MapJoin it can
d. When doing multiple table joins, the two left tables can be MapJoin tables at the same time
14. Which of the following statements are correct?
a. Order by must be used together with limit
b. When sorting with order by, NULL is considered smaller than any value
c. Distribute by is to make hash slices of data according to the values of a certain
column. It is similar to group by
d. Sort by and order by are all used for sorting in essence, the difference is that the
scope is not same
e. The key of order by or sort by must be the output column of the select statement, that is,
the alias of the column
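Statement b (NULL is considered smaller than any value when sorting) can be observed in sqlite3, which happens to share this default ascending NULL ordering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (None,), (1,)])
# Ascending ORDER BY places NULL before every other value; LIMIT caps the rows.
rows = conn.execute("SELECT score FROM t ORDER BY score LIMIT 2").fetchall()
print(rows)  # [(None,), (1,)]
```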

15. Which of the following MaxCompute SQL syntax statements are correct?
a. JOIN of MaxCompute supports n-way join, but it must be a non-Cartesian product
b. The conditional expression of MaxCompute’s JOIN must be an equality expression
c. When MapJoin references to a small table or sub query, alias should be referenced,
otherwise it will report syntax errors
d. Right outer join returns all records in the right table, even if there is no matched
record in the left table

16. Data type inconsistency is often encountered in business data processing. In order to keep data
type consistent, data processing system involves data type conversion. If MaxCompute SQL
is used for data processing, which of the following conversion can not be achieved?
a. Bigint to string
b. String to Boolean
c. String to bigint
d. Datetime to string

17. Which description of select in MaxCompute SQL is incorrect?


a. When using SELECT to read data from the table, specify the names of the columns to be
read, or use an asterisk (*) to represent all columns
b. When MaxCompute SQL does parsing, order by/sort by/distribute by are in front of
SELECT
c. The where clause of MaxCompute SQL supports between… and conditional query
d. If duplicated data rows exist, you can use the distinct option before the field to
remove duplicates; in this case, only one value is returned

18. Which statement is incorrect for partition in MaxCompute SQL?


a. To modify values in one or more partitions among multi-level partitions, users must write
values for partitions at each level
b. We cannot specify order for a new column, by default, a new column is placed in the last
column
c. The name of partition column can be modified
d. For tables that have multi-level partitions, to add a new partition, all partition values must
be specified

19. Which of the following limitations in MaxCompute SQL are correct?


a. Table name length
b. Table column definition
c. Table partition level and single table partition number
d. Window function number
e. Table name and field definition case
20. Which join operations in MaxCompute SQL are correct?
a. MaxCompute’s JOIN supports multi-way joins, and it also supports Cartesian product
b. Left join returns all records from the left table
c. Right outer join returns all records from the right table
d. Full outer join indicates the full join and returns all records from the both left and
right table
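An outer join's "keep all records from one side" behavior, sketched with a LEFT JOIN in sqlite3 (table names borrowed from the question; the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (shop_name TEXT);
    CREATE TABLE sale_detail (shop_name TEXT, total_price INTEGER);
    INSERT INTO shop VALUES ('a'), ('b');
    INSERT INTO sale_detail VALUES ('a', 10);
""")
# LEFT JOIN keeps every row of the left table; unmatched right columns are NULL.
rows = conn.execute("""
    SELECT s.shop_name, d.total_price
    FROM shop s LEFT JOIN sale_detail d ON s.shop_name = d.shop_name
    ORDER BY s.shop_name
""").fetchall()
print(rows)  # [('a', 10), ('b', None)]
```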

21. Which of the following descriptions about the MaxCompute SQL constraint conditions are
correct?
a. MaxCompute SQL does not support transactions
b. MaxCompute SQL does not support indexes
c. MaxCompute SQL does not support delete operations
d. MaxCompute SQL does not support update operations

22. Group by is for group query in MaxCompute SQL. Which of the following query of group by
are correct?
a. Generally group by and aggregate function are used together
b. The key of group by can be the column name of the input table
c. When SELECT contains aggregate function, group by can be an expression
consisting of columns of input tables
d. When SELECT contains aggregate function, group by can be the alias of the output
column of the SELECT statement
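Statement a (group by is normally paired with an aggregate function) in a minimal sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detail (region TEXT, total_price INTEGER)")
conn.executemany("INSERT INTO sale_detail VALUES (?, ?)",
                 [("east", 10), ("east", 20), ("west", 5)])
# GROUP BY collapses rows per key; SUM aggregates within each group.
rows = conn.execute("""
    SELECT region, SUM(total_price)
    FROM sale_detail
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('east', 30), ('west', 5)]
```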

23. MaxCompute SQL provides EXPLAIN operation. What does the execution result include?
a. All resource structures corresponding to the DML statement
b. The dependency structure of all tasks corresponding to the DML statement
c. The dependency structure of all operators in a task
d. The dependency structure of all tasks in a task

24. When you use the MapJoin, which one is incorrect?


a. The left table of ‘left outer join’ must be a big table
b. The right table of ‘right outer join’ must be a big table
c. For INNER JOIN, both the left and right table can be large tables
d. For FULL OUTER JOIN, MapJoin can be used

25. The source and pattern parameters of like and rlike must be string types or integer
a. True
b. False

26. Which statement is incorrect when updating data by MaxCompute SQL?


a. When performing insert operations, the correspondence between the source table and the
target table depends on the column order in the select clause, not on the correspondence
between the column names of the tables
b. The value of dynamic partition cannot be NULL, but it supports special or Chinese
characters
c. Partitioned columns are not allowed to appear in the select column list when data is
inserted into a partition
d. In the select statement field, the following field provides a dynamic partition value for the
target table. If the target table has only one-level dynamic partition, the last field value of
select statement is the dynamic partition value of the target table
27. Which UNION ALL statements in MaxCompute SQL are correct? (Number of Correct
Answers: 3)
a. Combines two or multiple data sets returned by a select operation into one data set.
If the result contains duplicated rows, all rows that meet the conditions are returned,
and deduplication of duplicated rows is not applied
b. MaxCompute does not support union of two main query results, but you can do it on
two subquery results
c. The columns of each sub query corresponding to the union all operation must be listed,
not supporting *
d. The number, names, and types of queried columns corresponding to the UNION
ALL/UNION operation must be consistent
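The UNION ALL vs UNION distinction from statement a, shown in sqlite3 (illustrative tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (v INTEGER);
    CREATE TABLE b (v INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")
# UNION ALL keeps duplicated rows; plain UNION deduplicates them.
all_rows = conn.execute(
    "SELECT v FROM a UNION ALL SELECT v FROM b ORDER BY v").fetchall()
dedup = conn.execute(
    "SELECT v FROM a UNION SELECT v FROM b ORDER BY v").fetchall()
print(all_rows, dedup)  # [(1,), (2,), (2,), (3,)] [(1,), (2,), (3,)]
```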

28. Which of the following queries can be executed?


a. Select sum(total_price) from sale_detail group by region;
b. Select region as r from sale_detail group by r;
c. Select region, total_price from sale_detail group by region, total_price
d. Select region as r from sale_detail order by region limit 100;
e. Select region as r from sale_detail distribute by region;

29. During MaxCompute SQL parsing, order by/sort by/distribute by comes after the select
operation
a. True
b. False

B. Stage III: Advanced Data Processing Tools and


Techniques - MaxCompute User Defined Function
1. When developing a UDF in MaxCompute, the corresponding parameter data types and return
data types are Java objects, and the initial letter must be capitalized
a. True
b. False

2. Which limitation is not related to UDTF?


a. No other expressions are allowed in the same SELECT clause
b. No other expressions are allowed in the same SELECT clause
c. It can be used in where filtering conditions
d. No support for use together with distribute by in the same SELECT clause

3. When defining a Java UDF, MaxCompute supports using writable types as parameters and return
values, e.g. string maps to Text, map maps to struct
a. True
b. False

4. Which statement is incorrect for UDF in MaxCompute?


a. UDAF (User Defined Aggregation Function) input and output are many-for-one
b. User Defined Scalar Function input and output are one-to-one or many-for-one
c. UDTF (User Defined Table Valued Function) input and output are many-for-one
d. User Defined Scalar Function input and output are one-to-one

5. Which correspondence is incorrect between MaxCompute data type and java data type?
a. Tinyint maps to java.lang.Byte
b. Smallint maps to java.lang.Short
c. Decimal maps to java.lang.BigDecimal
d. TimeStamp maps to java.lang.TimeStamp

6. Which one is not included in MaxCompute user-defined functions?


a. UDF
b. UDAF
c. UDGF
d. UDTF

7. The way in which a UDF is used in MaxCompute is different from that of the common built-in
functions in MaxCompute SQL
a. True
b. False

8. For a User Defined Scalar Function, input and output is a one-to-one relationship, that is, it reads a
row of data and writes one output value
a. True
b. False

9. Which statement is incorrect for user-defined function in MaxCompute?


a. Like resource files, the same name function can only be registered once
b. The owner of the project space has the right to overwrite the system built-in
function
c. User-built functions of ordinary users can not overwrite the system built-in function
d. Once the function is unregistered, the resources are deleted too

10. Which UDTF implementation logic statements are correct?


a. To implement UDTF, we need to inherit the com.aliyun.odps.udf.UDTF class
b. To implement UDTF, we need to inherit the com.aliyun.odps.udf.UDTF class
c. @Resolve() defines the function input/output parameters data type
d. When invoking UDTF, the input parameter can be not consistent with the @Resolve
definition

11. Which the following UDAF statements in MaxCompute are correct?


a. A plurality of input records can be aggregated into one output value, and then output it
b. It does not support the use with group by together in the same select clause
c. Can’t use with group by together in SQL
d. The syntax of UDAF in SQL is the same as that of common built-in aggregate
function

12. Which UDAF implementation logic statements are correct?


a. The main logic of UDAF relies on these three interfaces: ‘iterate’,’merge’,and
‘terminate’
b. To implement UDAF we need to inherit the com.aliyun.odps.udf.Aggregator class
c. Need to implement interfaces, e.g. setup, newBuffer, iterate, terminate, merge, close and so
on
d. Don’t support user-defined writable buffer
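How iterate/merge/terminate fit together can be sketched in plain Python (an illustrative analogue, not the MaxCompute Java Aggregator API; the class and method names are invented for this sketch):

```python
class AvgAggregator:
    """Average via partial buffers, mirroring UDAF's iterate/merge/terminate."""

    def new_buffer(self):
        return [0, 0]  # partial state: [sum, count]

    def iterate(self, buffer, value):
        # fold one input row into a worker's partial buffer
        buffer[0] += value
        buffer[1] += 1

    def merge(self, buffer, partial):
        # combine partial buffers produced by different workers
        buffer[0] += partial[0]
        buffer[1] += partial[1]

    def terminate(self, buffer):
        # produce the final aggregate value from the merged buffer
        return buffer[0] / buffer[1]

# Two workers each aggregate a shard, then the partial buffers are merged.
agg = AvgAggregator()
b1, b2 = agg.new_buffer(), agg.new_buffer()
for v in (1, 2, 3):
    agg.iterate(b1, v)
for v in (4, 5):
    agg.iterate(b2, v)
agg.merge(b1, b2)
result = agg.terminate(b1)
print(result)  # 3.0  ->  (1+2+3+4+5) / 5
```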

13. Which of the following ways of UDTF usage in SQL are correct? (Answers 3)
a. Select user_udtf(col0,col1,col2) as (c0,c1) from my_table;
b. Select user_udtf(col0,col1,col2) as (c0,c1),col3 from my_table
c. Select user_udtf(col0,col1,col2) as (c0,c1) from (select * from my_table distribute by
key sort by key) t;
d. Select reduce_udtf(col0,col1) as (c0,c1) from (select col1,col2 from (select
map_udtf(a0,a1,a2,a3) as (col0, col1, col2) from my_table) t1 distribute by col0 sort
by col0, col1) t2

14. ‘myudf_vertical’ is a UDTF. Which of the following usages are incorrect?


a. SELECT myudf_vertical(name,score) as (name,score) from t_udtf group by
name,score
b. SELECT myudf_vertical(name,score) as (name,score) from t_udtf;
c. SELECT myudf_vertical(myudf_vertical(name,score) as (name,score)) from t_udtf;
d. SELECT 1, myudf_vertical(name,score) as (name,score) from t_udtf

15. Which UDF implementation logic statement are correct?


a. To implement UDF, the class ‘com.aliyun.odps.udf.UDF’ must be inherited and the
‘evaluate’ method must be applied
b. The parameter type and return value type of evaluate method is considered as UDF
signature in SQL
c. To call UDF, the framework must match the correct evaluate method according to
the parameter type called by UDF
d. The ‘evaluate’ method must be a static public method
e. User can implement multiple evaluate methods in UDF
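The idea in statements c and e, that the framework matches the correct evaluate method by parameter type, can be mimicked in Python with functools.singledispatch (an illustrative analogue, not the MaxCompute Java framework):

```python
from functools import singledispatch

@singledispatch
def evaluate(x):
    # no overload registered for this parameter type
    raise TypeError("no matching evaluate signature")

@evaluate.register(int)
def _(x):            # analogue of an "evaluate(bigint)" signature
    return x + 1

@evaluate.register(str)
def _(x):            # analogue of an "evaluate(string)" signature
    return x.lower()

# The implementation is selected by the argument's type, like UDF signature matching.
print(evaluate(41), evaluate("ODPS"))  # 42 odps
```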

16. The NULL value in SQL is represented by a NULL reference in Java; therefore, ‘java
primitive type’ is not allowed because it cannot represent a NULL value in SQL
a. True
b. False

17. UDTF (User Defined Table Valued Function) is used to solve scenarios which output multi-
line data by one function call; it is also the only UDF which can return multiple fields
a. True
b. False
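A UDTF's one-call, many-rows/many-fields behavior, sketched as a Python generator (illustrative only; the function name and data are invented):

```python
def vertical_udtf(name, scores):
    # one input row ("jack", "80,90") yields multiple output rows,
    # each with multiple fields (name, score)
    for s in scores.split(","):
        yield (name, int(s))

rows = list(vertical_udtf("jack", "80,90"))
print(rows)  # [('jack', 80), ('jack', 90)]
```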

18. Array type in MaxCompute maps to java array


a. True
b. False

19. Which one is an incorrect method when Java UDF uses complex types?
a. UDTF through @Resolve annotation to specify the signature
b. UDAF through evaluate signature to map UDF input/output type
c. UDF through evaluate signature to map UDF input/output type
d. UDAF through @Resolve annotation to get the signature

20. Which the following select statements does not work properly in MaxCompute?
a. Select myudf_lower(name) from t_test;
b. Select 2, myudf_vertical(name,score) as (name,score) from t_udtf;
c. Select * from t_test where myudf_lower(myudf_lower(name)) = ‘udfff’;
d. Select AggrAvg(score) from t_udaf;
21. UDF outputs a return value at a time. UDTF can output more than two records at one time
a. True
b. False

22. Which statement is incorrect for UDF debug?


a. It can be tested in two ways: unit test and local execution
b. Need to specify the running data source when running UDF locally
c. UDF/UDAF/UDTF typically works on some columns of the table in the select clause, and
needs to configure MaxCompute project, table, and column when run local test
d. A warehouse is built locally to store tables (including meta and data) or resources for
executing UDF locally; the project name, table name, and sample data are under the
warehouse directory

23. For UDAF (User Defined Aggregation Function), input and output is many-for-one
relationships
a. True
b. False

24. Which of the following UDTF (User Defined Table Valued Function) statements in MaxCompute are
correct?
a. Other expressions are allowed in the same select clause
b. Solving the problem of exporting multiple rows and multiple columns data scenarios
by one function call
c. The only user-defined function that can return multiple fields
d. Cannot be nested

25. Which of the following UDF (User Defined Function) statements in MaxCompute are correct?
a. Function input and output are one to one
b. Return a scalar value of a specified type
c. Can not be used with other functions
d. It can be used in WHERE filtering conditions

26. ‘myudf_lower()’ is a UDF (User Defined Function), table is ‘t_test (name string)’. Which of
the following statements are correct?
a. Select myudf_lower(name) from t_test;
b. Select * from t_test where myudf_lower(‘Aaa’)=name;
c. Select * from t_test where myudf_lower(name) =’odps’
d. Select * from t_test where myudf_lower(myudf_lower(name)) = ‘zzzz’

27. Java UDF supports Bigint,String,Double,Boolean, ARRAY,MAP,STRUCT and so on


a. True
b. False

28. Which statement is incorrect when you use Java UDF?


a. The code is added to the MaxCompute through the form of resources
b. Java UDF must be packaged in Jar format
c. UDF framework can load jar packets automatically
d. After adding the jar package, this UDF is ready for use

29. Which UDAF implementation logic statement are correct?


a. The main logic of UDAF relies on these three interfaces: ‘iterate’,’merge’, and
‘terminate’
b. To implement UDAF we need to inherit the com.aliyun.odps.udf.Aggregator class
c. Need to implement interfaces, e.g. setup, newBuffer, iterate, terminate, merge, close
and so on
d. Don’t support user-defined writable buffer

30. Which of the following UDF (User Defined Function) statements in MaxCompute are correct?
a. Function input and output are one to one
b. Return a scalar value of a specified type
c. Can not be used with other functions
d. It can be used in where filtering conditions
Start Self-Test - Stage IV: Visualization, Machine
Learning, and AI
A. Stage IV: Visualization, Machine Learning, and AI – QuickBI
1. In one dashboard you can add more than one chart and the charts can be different types
a. incorrect
b. correct

2. Only one chart type can be used in one GUI report. This statement is:
a. Incorrect
b. Correct

3. Which chart can be selected for distribution analysis on datasets, based on the number of
variables, specific demand, and so on?
a. Conversion chart
b. Scatter plot
c. Line chart
d. Bubble chart

4. In Quick BI, a portal is also called a data product, which is a set of dashboards that can contain:
a. Menus
b. Template
c. External link
d. iFrame

5. MaxCompute can be used as a Quick BI data source


a. Correct
b. Incorrect

6. Which of the following functions does Quick BI not provide so far?


a. Report display
b. Ad-hoc query
c. Data acquisition
d. Portal integration

7. A dataset contains the following items: time, region, sales volume. Which of the following is
the best choice to visualize this information in one chart?
a. Bubble chart
b. Tree chart
c. Pie chart
d. Radar chart

8. A column chart only shows data of two dimensions, such as: time, transaction value?
a. Incorrect
b. Correct

9. Which is the better choice for displaying the progress of current sales amount against the
annual KPI target?
a. Dashboard
b. Radar chart
c. Pie chart
d. Polar chart

10. Which is a correct description of the differences between Quick BI’s tree map and tree chart?
a. Tree maps can show the ratios of members at the same level by area, but tree charts
cannot
b. Different from a tree map, a tree chart requires that each of its branches has the same
depth
c. Tree maps are not as widely used as tree charts

11. A dataset (father height, son height) describes the father height and son height. Which charts
are suitable for displaying whether the two are correlated?
a. Scatter plot
b. Line chart
c. Tree chart
d. Column chart

12. Which charts does Quick BI currently not support? (number of correct answers: 3)
a. Bubble chart
b. pyramid chart
c. tornado chart
d. bar chart
e. donut chart

13. Which of the following geo charts does Quick BI provide? (number of correct answers: 2)
a. geo bubble chart
b. color geo chart
c. point plotting geo chart
d. navigation geo chart
e. vector geo chart

14. Which of the following charts are related to business procedures? (number of correct answers:
2)
a. conversion chart
b. funnel chart
c. bar chart
d. scatter chart
e. tree chart

15. Which of the following are variants of the pie chart? (number of correct answers: 2)
a. 3D pie chart
b. Donut chart
c. Dashboard
d. Funnel chart
e. Radar chart

16. Which of the following is wrong about line charts?


a. a line chart displays multiple dimensions but only one measure
b. a line chart is also known as a broken line graph
c. a line chart can display big datasets
d. when using a line chart to show trends, an ordinal dependent variable must be
included

17. No specific standards exist to judge whether a graphical report is good or bad. However, good
reports have common characteristics, which of following options is not a good one?
a. Intuitive, easy to understand
b. Highlighted key information
c. Complex, nice-looking
d. Proper display manner

18. When you update your data source, the new file can use a different schema compared with the
original one
a. Incorrect
b. Correct

19. E-commerce company ABC cares about the conversion rate, defined as the ratio of the number of
users placing and paying for orders to the number of users visiting their website. They want to
analyze the difference in conversion rate by gender. Which chart can properly help them with this
analysis?
a. Tornado chart
b. Funnel chart
c. Pie chart
d.
e. Tree chart
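For context on the metric in question 19: the conversion rate is simply paying users divided by visiting users, computed per gender. A minimal sketch with made-up counts — all figures are hypothetical:

```python
# Hypothetical visit and paid-order counts by gender -- figures are invented for illustration.
visits = {"female": 5200, "male": 4800}
orders_paid = {"female": 416, "male": 288}  # users who placed AND paid for orders

# conversion rate = paying users / visiting users, per gender
conversion = {g: orders_paid[g] / visits[g] for g in visits}
for gender, rate in conversion.items():
    print(f"{gender}: {rate:.1%}")  # → female: 8.0%, male: 6.0%
```

Once the two rates are computed, the remaining task is purely visual: picking a chart that makes a side-by-side comparison of two percentages easy to read.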

20. The dashboard you created will be publicly readable by default


a. Correct
b. Incorrect

21. A column chart only shows data of two dimensions, such as: time, transaction volume?
a. Incorrect
b. Correct

22. Which of the following charts are suitable for displaying hierarchies?
a. Tree map
b. Tree chart
c. Conversion chart
d. Geo bubble chart
e. Dashboard

23. Which of the following charts use areas to display the magnitude of metric values? (number
of correct answers: 2)
a. Pie chart
b. Polar chart
c. Tree chart
d. Funnel chart
e. Conversion chart
f. Color geo chart

24. Which of the following charts are suitable for displaying large datasets? (number of correct
answers: 2)
a. Scatter plot
b. Line chart
c. Column chart
d. Radar chart
e. Funnel chart

25. Quick BI supports multiple data sources, which of the following is not included?
a. Local CSV files
b. OSS
c. MaxCompute
d. Local Excel files

26. The Quick BI basic version supports the customized portal generation function


a. Incorrect
b. Correct

27. A column chart only shows data of two dimensions, such as: time, transaction volume?
a. Incorrect
b. Correct

28. A dataset includes two attributes (province, number of customers). Use a chart to clearly
demonstrate the number of customers by province. Which of the following charts are not a
good choice for this scenario? (number of correct answers: 2)
a. Conversion chart
b. Card
c. Word cloud
d. Polar chart
e. Column chart

B. Stage IV: Visualization, Machine Learning, and AI - Machine Learning Platform for AI
1. Which of the following descriptions of PAI is correct?
a. machine learning platform for AI provides end-to-end machine learning services,
including data processing, feature engineering, model training, model prediction, and
model evaluation
b. machine learning platform for AI provides a visualized web interface allowing you to create
experiments by dragging and dropping different components onto the canvas
c. using machine learning servitization, machine learning platform for AI allows you to
create a complete workflow for enterprise level machine learning data modelling and
application
d. the infrastructure of machine learning platform for AI relies on Alibaba Cloud distributed
computing clusters. This allows machine learning platform for AI to handle a large
number of concurrent algorithm computing tasks
e. all above

2. In which format are algorithm parameters set in PAI?


a. XML
b. JSON
c. CSV
d. XLS

3. DSW 2.0, the new version, directly uses Alibaba Cloud instances


a. Numeric
b. Boolean
c. 1 or 0
d. Just keep the original format

4. Machine Learning Platform for AI is billed on a ____ basis


a. Pay-As-You-Go
b. Subscription

5. The architecture of Machine Learning Platform for AI is divided into five layers, which of the
following statements is correct?
a. Infrastructure layer: includes CPU, GPU, field programmable gate array (FPGA), and
neural network processing unit (NPU) resources.
b. Computing framework layer: includes Alink, TensorFlow, PyTorch, Caffe, MapReduce,
SQL, and Message Passing Interface (MPI). You can run distributed computing tasks in
these frameworks
c. Business layer: Machine Learning Platform for AI is widely used in the finance, medical care,
education, transportation, and security sectors. The search systems, recommendation systems,
and financial service systems of Alibaba Cloud all use Machine Learning Platform for AI
to explore data values for making informed business decisions
d. Machine Learning Platform for AI streamlines the workflows of machine learning, including data
preparation, model creation and training, and model deployment
e. All above

6. PAI can automate and optimize AI algorithms, so algorithm experts can focus on the process of
modelling rather than the engineering part
a. True
b. False

7. Machine learning platform for AI provides end-to-end machine learning services, including
data processing, feature engineering, model training, model prediction, and model evaluation.
Machine Learning Platform for AI combines all of these services to make AI more accessible
than ever
a. True
b. False

8. The computing result of the entire machine learning process on PAI can NOT be visually
displayed
a. True
b. False

9. In the compilation and optimization framework, PAI has added support for PyTorch, and support
has also been expanded to more hardware, such as GPUs, CPUs, and other ASICs
a. True
b. False

10. On the DLC platform, we can run our optimized workloads on the back-end engine in the
Kubernetes environment
a. True
b. False

11. Which of the following benefits of deep learning container are correct? (number of correct
answers: 3)
a. Cloud native
b. Elastic
c. Cheap
d. Easy to use

12. Which of the following development modes are supported by DSW 2.0? (number of correct
answers: 3)
a. Jupyterlab interactive programming
b. WebIDE mode
c. Terminal command line
d. Notebook

13. Machine learning platform for AI (PAI) provides text processing components for NLP,
including _______ (number of correct answers: 5)
a. Word splitting
b. Deprecated word filtering
c. LDA
d. TF-IDF
e. Text summarization
f. OpenCV

14. The PAI-Easy series deep learning toolkit includes? (number of correct answers: 3)
a. EasyTransfer (NLP)
b. EasyVision (CV)
c. EasyRL (reinforcement learning)
d. Shogun

15. Three deep learning frameworks will be supported by PAI; they are ____. (number of correct
answers: 3)
a. TensorFlow
b. Caffe
c. MXNet
d. Spark MLlib
