QCM Big Data 1 Exam

The document contains questions about tools and concepts related to big data, Hadoop, and related technologies. Key points addressed include Apache Ranger for monitoring data security across a Hadoop platform, user-defined functions in Hive to promote code reuse, the external keyword for creating non-managed tables in Big SQL, impersonation in Big SQL for secure access on behalf of other users, and the permalink always pointing to the most recent version when sharing a notebook.

Uploaded by

younes khouna

You need to monitor and manage data security across a Hadoop platform. Which tool would you use?
a. HDFS
b. Hive
c. SSL
d. Apache Ranger
Which type of function promotes code re-use and reduces query complexity?

a. Scalar
b. OLAP
c. User-defined
d. Built-in
You need to create a table that is not managed by the Big SQL database manager. Which keyword
would you use to create the table?
a. boolean
b. string
c. external
d. smallint
Which feature allows the Big SQL user to securely access data in Hadoop on behalf of another
user?
a. Schema
b. impersonation
c. rights
d. privilege
When sharing a notebook, what will always point to the most recent version of the notebook?
a. Watson Studio homepage
b. The permalink (URL)
c. The Spark service
d. PixieDust visualization
Which statement describes "Big Data" as it is used in the modern business world?

a. The summarization of large, indexed data stores to provide information about potential
problems or opportunities.
b. Indexed databases containing very large volumes of historical data used for compliance
reporting purposes
c. Non-conventional methods used by businesses and organizations to capture, manage,
process, and make sense of a large volume of data.
d. Structured data stores containing very large data sets such as video and audio streams.
Which two descriptions are advantages of Hadoop?
a. intensive calculations on small amounts of data
b. processing random access transactions
c. processing a large number of small files
d. able to use inexpensive commodity hardware
e. processing large volumes of data with high throughput
Which statement is true about the Hadoop Distributed File System (HDFS)?
a. HDFS is a software framework to support computing on large clusters of computers.
b. HDFS is the framework for job scheduling and cluster resource management.
c. HDFS provides a web-based tool for managing Hadoop clusters.
d. HDFS links the disks on multiple nodes into one large file system.

You need to define a server to act as the medium between an application and a data source
in a Big SQL federation. Which command would you use?
a. SET AUTHORIZATION
b. CREATE WRAPPER
c. CREATE NICKNAME
d. CREATE SERVER
What are three examples of Big Data?
a. messages tweeted on Twitter
b. bank records
c. photos posted on Instagram
d. web server logs
Which command would you run to make a remote table accessible using an alias?

a. CREATE NICKNAME
b. CREATE SERVER
c. SET AUTHORIZATION
d. CREATE WRAPPER
In Big SQL, what is used for table definitions, location, and storage format of input files?
a. Ambari
b. Scheduler
c. Hadoop Cluster
d. The Hive Metastore
What are three examples of "Data Exhaust"?
a. browser cache
b. video streams
c. banner ads
d. log files
e. cookies
f. JavaScript
Which Hortonworks Data Platform (HDP) component provides a common web user interface for
applications running on a Hadoop cluster?
a. Ambari
b. HDFS
c. YARN
d. MapReduce
What Python statement is used to add a library to the current code cell?
a. using
b. pull
c. import
d. load
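As a minimal illustration of the answer above, `import` is the Python statement that makes a library available inside a notebook code cell:

```python
# In a notebook code cell, `import` makes a library available.
import math            # import the whole module
from math import sqrt  # or import a single name from it

print(math.pi)   # 3.141592653589793
print(sqrt(16))  # 4.0
```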
Which two are examples of personally identifiable information (PII)?
a. Email address
b. Medical record number
c. IP address
d. Time of interaction
Which component of the HDFS architecture manages storage attached to the nodes?
a. NameNode
b. MasterNode
c. DataNode
d. StorageNode
Which component of the HDFS architecture manages the file system namespace and metadata
a. NameNode
b. SlaveNode
c. WorkerNode
d. DataNode
Which type of foundation does Big SQL build on?
a. RStudio
b. Jupyter
c. Apache HIVE
d. MapReduce
How does MapReduce use ZooKeeper?
a. Coordination between servers.
b. Aid in the high availability of Resource Manager.
c. Server lease management of nodes.
d. Master server election and discovery
What is the default data format Sqoop parses to export data to a database?

a. JSON
b. CSV
c. XML
d. SQL
Under the HDFS storage model, what is the default method of replication?
a. 3 replicas, 2 on the same rack, 1 on a different rack
b. 4 replicas, 2 on the same rack, 2 on a separate rack
c. 3 replicas, each on a different rack
d. 2 replicas, each on a different rack
e. 4 replicas, each on a different rack

What is meant by data at rest?
a. A file that has been processed by Hadoop.
b. A file that has not been encrypted.
c. Data in a file that has expired.
d. A data file that is not changing.
What is the term for the process of converting data from one "raw" format to another format, making it more appropriate and valuable for a variety of downstream purposes such as analytics, and that allows for efficient consumption of the data?
a. MapReduce
b. Data mining
c. Data munging
d. YARN
The Big SQL head node has a set of processes running. What is the name of the service ID running these processes?
a. user1
b. bigsql
c. hdfs
d. Db2
Which is the primary advantage of using column-based data formats over record-based formats?
a. facilitates SQL-based queries
b. faster query execution
c. supports in-memory processing
d. better compression using GZip
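A rough sketch of why columnar layouts tend to compress better (this mimics the idea behind Parquet/ORC in plain Python, not the actual formats): values of one column are similar, and storing them adjacently gives the compressor longer runs to exploit.

```python
# Toy comparison: the same records serialized row-wise vs. column-wise,
# then compressed. The data and layout here are illustrative only.
import zlib

rows = [("clickstream", i, "US") for i in range(1000)]  # toy records

# Record-based layout: one line per record, columns interleaved.
row_major = "\n".join(f"{a},{b},{c}" for a, b, c in rows).encode()

# Column-based layout: all values of each column stored together.
col_major = "\n".join(
    ",".join(str(v) for v in col) for col in zip(*rows)
).encode()

# The column-wise layout usually compresses to fewer bytes.
print(len(zlib.compress(row_major)), len(zlib.compress(col_major)))
```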

Which hortonworks data platform (HDP) component provides a common web user interface for
applications running on a hadoop cluster?
a. Ambari
b. HDFS
c. YARN
d. MapReduce
Which file format contains human-readable data where the column values are separated by a
comma?
a. Parquet
b. ORC
c. Sequence
d. Delimited
Which file format has the highest performance?

a. ORC
b. Sequence
c. Delimited
d. Parquet
Which two of the following are column-based data encoding formats?

a. ORC
b. JSON
c. Parquet
d. Flat
e. Avro
What is the default number of rows Sqoop will export per transaction?

a. 100,000
b. 1,000
c. 100

Which statement describes the action performed by HDFS when data is written to the Hadoop
cluster?
a. The data is spread out and replicated across the cluster.
b. The MasterNodes write the data to disk.
c. The data is replicated to at least 5 different computers.
d. The FsImage is updated with the new data map.
Which two are use cases for deploying ZooKeeper?
a. Managing the hardware of cluster nodes.
b. Storing local temporary data files.
c. Simple data registry between nodes.
d. Configuration bootstrapping for new nodes.
What is one disadvantage to using CSV formatted data in a Hadoop data store?
a. Data must be extracted, cleansed, and loaded into the data warehouse.
b. It is difficult to represent complex data structures such as maps.
c. Fields must be positioned at a fixed offset from the beginning of the record.
d. Columns of data must be separated by a delimiter.
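The difficulty with complex structures in CSV can be shown in a few lines of stdlib Python (the record below is a made-up example): a nested map survives a JSON round trip, but CSV has no map type, so the structure degrades to a flat string.

```python
# Sketch: why nested structures (maps) are awkward in CSV.
import csv
import io
import json

record = {"user": "u1", "prefs": {"lang": "fr", "tz": "UTC"}}

# JSON keeps the nested map intact and round-trips cleanly.
assert json.loads(json.dumps(record)) == record

# CSV has no map type: the nested dict is merely stringified,
# and reading it back yields a flat string, not a map.
buf = io.StringIO()
csv.writer(buf).writerow([record["user"], record["prefs"]])
row = next(csv.reader(io.StringIO(buf.getvalue())))
print(type(row[1]))  # <class 'str'> -- the structure is lost
```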
Which environmental variable needs to be set to properly start ZooKeeper?

a. ZOOKEEPER_HOME
b. ZOOKEEPER_DATA
c. ZOOKEEPER_APP
d. ZOOKEEPER
What is the primary purpose of Apache NiFi?
a. Identifying non-compliant data access.
b. Finding data across the cluster.
c. Connect remote data sources via WiFi.
d. Collect and send data into a stream
Under the MapReduce v1 architecture, which element of the system manages the map and reduce
functions?
a. TaskTracker
b. JobTracker
c. StorageNode
d. SlaveNode
e. MasterNode

Under the MapReduce v1 (or classic) architecture, the JobTracker is responsible for managing and coordinating the map and reduce functions. It tracks the progress of all the submitted MapReduce jobs, schedules tasks on TaskTrackers, and ensures that tasks are executed successfully. The TaskTrackers are responsible for running individual map and reduce tasks on cluster nodes.

Which command is used to populate a Big SQL table?

Load

Which statement describes the purpose of Ambari?

It is used for provisioning, managing, and monitoring Hadoop clusters.


What command is used to list all the ZNodes at the top level of the hierarchy, in the ZooKeeper command-line interface?

ls /
What must be done before using Sqoop to import from a relational database?

The database's JDBC driver JAR must be placed in $SQOOP_HOME/lib.

What is the default number of rows Sqoop will export per transaction?

1000

Which of the "Five V's" of Big Data describes the real purpose of deriving business insight from Big Data?

Value

Which Spark RDD operation returns values after performing the evaluations?

Actions
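Spark transformations (map, filter) are lazy and only describe a pipeline; actions (collect, count) are what actually return values. A rough pure-Python analogy using generators (not Spark itself, which would need a running SparkContext):

```python
# Rough analogy only -- real Spark RDDs require a SparkContext.
# "Transformations" build a lazy pipeline; nothing executes yet.
data = range(1, 6)
doubled = (x * 2 for x in data)        # like rdd.map(...): lazy
evens = (x for x in doubled if x > 4)  # like rdd.filter(...): still lazy

# "Action": forces evaluation and returns values, like rdd.collect().
result = list(evens)
print(result)  # [6, 8, 10]
```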

Which two of the following are row-based data encoding formats?

a. Avro
b. CSV
Which element of Hadoop is responsible for spreading data across the cluster?

HDFS

Under the MapReduce v1 programming model, what happens in the "Map" step?

Input is processed as individual splits.
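The Map step above can be sketched with the classic word-count flow (a toy, single-process sketch; the split contents are made up): each input split is processed individually into key-value pairs, which are then shuffled by key and reduced.

```python
# Sketch of the MapReduce flow on two toy input splits.
from collections import defaultdict

splits = ["big data big", "data lake"]  # two input splits

# Map step: each split is processed individually into (key, 1) pairs.
mapped = [(word, 1) for split in splits for word in split.split()]

# Shuffle: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce step: aggregate each key's values.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'big': 2, 'data': 2, 'lake': 1}
```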

What OS command starts the ZooKeeper command-line interface?

zkCli.sh
