
BDA Objective Questions

1. Which of the following command sets the value of a particular configuration variable (key)?
a) set -v
b) set <key>=<value>
c) set
d) reset
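For reference, the SET command family behaves as follows in the Hive CLI (the configuration variable shown is only an illustration):

```sql
-- Assign a value to a configuration variable (option b):
SET hive.exec.dynamic.partition = true;

-- Print the current value of a single variable:
SET hive.exec.dynamic.partition;

-- SET alone lists the variables overridden by the user or Hive;
-- SET -v also includes the Hadoop defaults:
SET -v;
```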
2. Point out the correct statement.
a) Hive commands are non-SQL statements such as setting a property or adding a resource
b) Set -v prints a list of configuration variables that are overridden by the user or Hive
c) Set sets a list of variables that are overridden by the user or Hive
d) None of the mentioned
3. Which of the following operator executes a shell command from the Hive shell?
a) |
b) !
c) ^
d) +
4. Which of the following will remove the resource(s) from the distributed cache?
a) delete FILE[S] <filepath>*
b) delete JAR[S] <filepath>*
c) delete ARCHIVE[S] <filepath>*
d) all of the mentioned
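As a sketch of how the distributed-cache resource commands fit together (the script path is hypothetical):

```sql
ADD FILE /tmp/my_udf.py;     -- add a resource to the distributed cache
LIST FILES;                  -- show the resources currently added
DELETE FILE /tmp/my_udf.py;  -- remove it; delete JAR[S] and delete ARCHIVE[S] work the same way
```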
5. Point out the wrong statement.
a) source FILE <filepath> executes a script file inside the CLI
b) dfs <dfs command> executes a dfs command from the Hive shell
c) Hive is a query language similar to SQL
d) none of the mentioned
6. ________ is a shell utility which can be used to run Hive queries in either interactive or batch
mode.
a) $HIVE/bin/hive
b) $HIVE_HOME/hive
c) $HIVE_HOME/bin/hive
d) All of the mentioned
7. Which of the following is a command line option?
a) -d,--define <key=value>
b) -e,--define <key=value>
c) -f,--define <key=value>
d) None of the mentioned
8. Which additional command line option is available in Hive 0.10.0?
a) --database <dbname>
b) --db <dbname>
c) --dbase <dbname>
d) All of the mentioned
9. The CLI when invoked without the -i option will attempt to load $HIVE_HOME/bin/.hiverc
and $HOME/.hiverc as _______ files.
a) processing
b) termination
c) initialization
d) none of the mentioned
10. When $HIVE_HOME/bin/hive is run without either the -e or -f option, it enters _______
mode.
a) Batch
b) Interactive shell
c) Multiple
d) None of the mentioned
11. The results of a hive query can be stored as
a) Local File
b) HDFS File
c) Both the above
d) Cannot be stored

12. If a database contains tables, it can still be forcibly dropped (along with its tables) by using
the keyword
a) RESTRICT
b) OVERWRITE
c) F DROP
d) CASCADE
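A minimal sketch of the CASCADE behaviour, using a hypothetical database name:

```sql
DROP DATABASE sales_db;           -- fails if sales_db still contains tables
DROP DATABASE sales_db CASCADE;   -- drops the tables first, then the database
DROP DATABASE sales_db RESTRICT;  -- explicit form of the default (fail-if-not-empty) behaviour
```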
13. Users can pass configuration information to the SerDe using
a) SET SERDEPROPERTIES
b) WITH SERDEPROPERTIES
c) BY SERDEPROPERTIES
d) CONFIG SERDEPROPERTIES
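A hedged example of passing configuration to a SerDe; the table name and property values are illustrative, and OpenCSVSerde is the CSV SerDe shipped with recent Hive versions:

```sql
CREATE TABLE csv_events (id INT, payload STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"");
```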
14. The property that is set to true to run hive in local mode, so that it runs without creating a
mapreduce job, is
a) hive.exec.mode.local.auto
b) hive.exec.mode.local.override
c) hive.exec.mode.local.settings
d) hive.exec.mode.local.config
15. Which kind of keys (constraints) can Hive have?
a) Primary Keys
b) Foreign Keys
c) Unique Keys
d) None of the above
16. What is the disadvantage of using too many partitions in Hive tables?
a) It slows down the namenode
b) Storage space is wasted
c) Join queries become slow
d) All of the above
17. The default delimiter in hive to separate the element in STRUCT is
a) '\001'
b) '\002'
c) '\003'
d) '\004'
18. By default when a database is dropped in Hive
a) The tables are also deleted
b) The directory is deleted if there are no tables
c) The HDFS blocks are formatted
d) None of the above
19. The main advantage of creating table partition is
a) Effective storage memory utilization
b) Faster query performance
c) Less RAM required by namenode
d) Simpler query syntax
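The performance advantage comes from partition pruning: each partition is stored as its own directory, so a query that filters on the partition key reads only that directory. A sketch with hypothetical names:

```sql
CREATE TABLE logs (msg STRING)
PARTITIONED BY (dt STRING);

-- Only the files under the dt=2024-01-01 partition directory are scanned:
SELECT msg FROM logs WHERE dt = '2024-01-01';
```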
20. If the schema of the table does not match the data types present in the file backing the
table, then Hive
a) Automatically drops the file
b) Automatically corrects the data
c) Reports Null values for mismatched data
d) Does not allow any query to run on the table
21. A view in Hive can be seen by using
a) SHOW TABLES
b) SHOW VIEWS
c) DESCRIBE VIEWS
d) VIEW VIEWS
22. If an Index is dropped then
a) The underlying table is also dropped
b) The directory containing the index is deleted
c) The underlying table is not dropped
d) Error is thrown by hive
23. Which file controls the logging of Mapreduce Tasks?
a) hive-log4j.properties
b) hive-exec-log4j.properties
c) hive-cli-log4j.properties
d) hive-create-log4j.properties
24. What can Hive not offer?
a) Storing data in tables and columns
b) Online transaction processing
c) Handling date time data
d) Partitioning stored data
25. To see the partition keys present in a Hive table, the command used is
a) Describe
b) Describe extended
c) Show
d) Show Extended
26. For optimizing a join of three tables, the largest table should be placed as
a) The first table in the join clause
b) Second table in the join clause
c) Third table in the join clause
d) Does not matter
27. Which of the following hints is used to optimize join queries?
a) /* joinlast(table_name) */
b) /* joinfirst(table_name) */
c) /* streamtable(table_name) */
d) /* cacheable(table_name) */
28. Calling a unix bash script inside a Hive Query is an example of
a) Hive Pipeline
b) Hive Caching
c) Hive Forking
d) Hive Streaming
29. Hive uses _________ for logging.
a) logj4
b) log4l
c) log4i
d) log4j
30. HiveServer2 introduced in Hive 0.11 has a new CLI called
a) BeeLine
b) SqlLine
c) HiveLine
d) CLilLine
31. In which mode does HiveServer2 only accept valid Thrift calls?
a) Remote
b) HTTP
c) Embedded
d) Interactive
32. Which of the following data type is supported by Hive?
a) map
b) record
c) string
d) enum
33. Which of the following is not a complex data type in Hive?
a) Matrix
b) Array
c) Map
d) Struct
34. Each database created in hive is stored as
a) A file
b) A directory
c) A HDFS block
d) A jar file
35. When a partition is archived in Hive it
a) Reduces space through compression
b) Reduces the length of records
c) Reduces the number of files stored
d) Reduces the block size
36. When a Hive query joins 3 tables, how many mapreduce jobs will be started?
a) 0
b) 1
c) 2
d) 3
37. The reverse() function reverses a string passed to it in a Hive query. This is an example of
a) Standard UDF
b) Aggregate UDF
c) Table Generating UDF
d) None of the above
38. Hive can be accessed remotely using programs written in C++, Ruby, etc., over a single port.
This is achieved by using
a) HiveServer
b) HiveMetaStore
c) HiveWeb
d) Hive Streaming
39. The thrift service component in hive is used for
a) Moving hive data files between different servers
b) Use multiple hive versions
c) Submit hive queries from a remote client
d) Installing hive
40. The query "SHOW DATABASES LIKE 'h.*';" gives as output the database names
a) Containing h in their name
b) Starting with h
c) Ending with h
d) Containing 'h.'
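The pattern in SHOW DATABASES LIKE is a regular-expression-style pattern, so 'h.*' matches any name beginning with h:

```sql
SHOW DATABASES LIKE 'h.*';   -- would list e.g. hr_db, history (hypothetical names)
```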
41. The tables created in hive are stored as
a) A file under the database directory
b) A subdirectory under the database directory
c) A .java file present in the database directory
d) A HDFS block containing the database directory
42. Besides the JDBC driver, sqoop also needs which of the following to connect to remote
databases?
a) Putty
b) SSH
c) Connector
d) Sqoop client
43. What option can be used to import the entire database from a relational system using sqoop?
a) --import-all-db
b) --import-all-tables
c) --import-all
d) --import
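A sketch of an import-all-tables invocation; the JDBC URL, credentials, and target directory are placeholders:

```
sqoop import-all-tables \
  --connect jdbc:mysql://dbhost/salesdb \
  --username hive_user -P \
  --warehouse-dir /user/hive/imports
```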
44. The parameter to give a custom name to the mapreduce job running a sqoop import command
is
a) --sqoop-job-name
b) --map-job-name
c) --mapreduce-job-name
d) --rename-job
45. The export and import of data between sqoop and a relational system happen through which of
the following programs?
a) Sqoop client program
b) Mapreduce job submitted by the sqoop command
c) Database stored procedure
d) Hdfs file management program
46. When using the --staging-table parameter while loading data into relational tables, the staging
table is created
a) Automatically by sqoop
b) Automatically by the database
c) The user has to ensure it is created
d) Automatically by a Hadoop process beyond sqoop
47. When using the --update-mode allowinsert parameter with an Oracle database, the Oracle
feature used by sqoop is

a) UPSERT statement
b) MERGE statement
c) MULTITABLE INSERT statement
d) BULK LOAD statement

48. What is the disadvantage of using the --columns parameter to insert a subset of columns into
the relational table?
a) The relational table may have NOT NULL columns not covered in the --columns
parameter.
b) The relational table may store the data from HDFS in wrong columns.
c) It may not load all the required data
d) It will not be able to populate primary key values
49. The temporary location to which sqoop moves the data before loading into hive is specified by
the parameter
a) --target-dir
b) --source-dir
c) --hive-dir
d) --sqoop-dir

50. The parameter that can create a hbase table using sqoop when importing data to hbase is
a) --hbase-create-table
b) --create-hbase-table
c) --create-hbase-table-columnlist
d) --create-hbase-table-rowkey
51. The comparison of row counts between the source system and the target database while loading
the data using sqoop is done using the parameter

a) --validate
b) --rowcount
c) --row(count)
d) --allrows
52. _________ tool can list all the available database schemas.
a) sqoop-list-tables
b) sqoop-list-databases
c) sqoop-list-schema
d) sqoop-list-columns
53. Point out the correct statement.
a) The sqoop command-line program is a wrapper which runs the bin/hadoop script
shipped with Hadoop
b) If $HADOOP_HOME is set, Sqoop will use the default installation location for Cloudera’s
Distribution for Hadoop
c) The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the
$HADOOP_CONF_DIR environment variable is unset
d) None of the mentioned
54. Data can be imported in maximum ______ file formats.
a) 1
b) 2
c) 3
d) All of the mentioned
55. ________ text is appropriate for most non-binary data types.
a) Character
b) Binary
c) Delimited
d) None of the mentioned
56. Point out the wrong statement.
a) Avro data files are a compact, efficient binary format that provides interoperability with
applications written in other programming languages
b) By default, data is compressed while importing
c) Delimited text also readily supports further manipulation by other tools, such as Hive
d) None of the mentioned
57. If you set the inline LOB limit to ________ all large objects will be placed in external storage.
a) 0
b) 1
c) 2
d) 3
58. ________ does not support the notion of enclosing characters that may include field delimiters in
the enclosed string.
a) Impala
b) Oozie
c) Sqoop
d) Hive
59. Sqoop can also import the data into Hive by generating and executing a ____________ statement
to define the data’s layout in Hive.
a) SET TABLE
b) CREATE TABLE
c) INSERT TABLE
d) All of the mentioned
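A sketch of the Hive import path: sqoop generates a CREATE TABLE statement matching the source table's schema in Hive and then loads the imported data into it (the connection details and table name are placeholders):

```
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --table orders \
  --hive-import
```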
60. The __________ tool imports a set of tables from an RDBMS to HDFS.
a) export-all-tables
b) import-all-tables
c) import-tables
d) none of the mentioned
61. Which of the following arguments is not supported by the import-all-tables tool?
a) --class-name
b) --package-name
c) --database-name
d) --table-name
62. Which of the following is a platform for analyzing large data sets that consists of a high-level
language for expressing data analysis programs?
a) Pig Latin
b) Oozie
c) Pig
d) Hive
63. Pig Latin scripting language is not only a higher-level data flow language but also has operators
similar to
a) SQL
b) JSON
c) XML
64. Which of the following is a data flow scripting language for analyzing unstructured data?
a) Mahout
b) Hive
c) Pig
65. Which of the following command is used to show values to keys used in Pig?
a) Set
b) Declare
c) Display
66. Use the __________ command to run a Pig script that can interact with the Grunt shell
(interactive mode).
a) Fetch
b) Declare
c) Run
67. Which of the following command can be used for debugging?
a) Exec
b) Execute
c) Error
d) Throw
68. ____________ method will be called by Pig both in the front end and back end to pass a unique
signature to the Loader.
a) relativeToAbsolutePath()
b) setUdfContextSignature()
c) getCacheFiles()
d) getShipFiles()
69. Which of the following is a framework for collecting and storing script-level statistics for Pig
Latin?
a) Pig Stats
b) PStatistics
c) Pig Statistics
70. Which among the following is a simple xUnit framework that enables you to easily test your Pig
scripts?
a) PigUnit
b) PigXUnit
c) PigUnitX
71. Which of the following will compile the Pigunit?
a) $pig_trunk ant pigunit-jar
b) $pig_tr ant pigunit-jar
c) $pig_ ant pigunit-jar
72. PigUnit runs in Pig’s _______ mode by default.
a) Local
b) Tez
c) MapReduce
73. Pig mainly operates in how many modes?
a) 2
b) 3
c) 4
d) 5
74. There are 2 programs which confirm a write into HBase. One is the write-ahead log (WAL) and
the other one is
a) Mem confirm log
b) Write complete log
c) log store
d) Memstore

75. When a compaction operates over all HFiles in a column family in a given region, it is called
a) Major compaction
b) Family compaction
c) Final compaction
d) Full compaction

76. Retrieving a batch of rows in every RPC call made by an API to an HBase database is called a
a) Batch
b) Scan
c) Bulkrow
d) Grouprow

77. The size of an individual region is governed by the parameter
a) Hbase.region.size
b) Hbase.region.filesize
c) Hbase.region.max.filesize
d) Hbase.max.region.size

78. In a map-side join, we take rows from one table and map them with rows from the other table.
The size of one of the tables should be
a) Enough to fit into memory
b) Half the size of the other table
c) Double the size of the other table
d) Small enough to be located on one physical machine
79. The tuple which specifies a cell in HBase is
a) {row, column, version}
b) {row, column family, version}
c) {row, table, column}
d) {column-family, column-attribute, version}
80. What does the following command do?
hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2
a) All the columns in the column family f1 of table t1 can have minimum 2 versions.
b) All the columns in the column family f1 of table t1 can have maximum 2 versions.
c) Creates 2 versions of the column family named f1 in table t1.
d) Creates 2 versions of the table t1.
