NATIONAL ECONOMIC UNIVERSITY
Midterm Exam Multiple Choice
Program: DSEB, Intake: 63
A) cache()
B) persist()
C) saveAsTextFile()
D) store()
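For reference, a minimal PySpark sketch of the persistence calls named in these options; the DataFrame, its data, and the output path are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("persistence-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    df.cache()     # shorthand for persist() at the default storage level
    df.persist()   # same effect; persist() also accepts an explicit StorageLevel
    df.count()     # an action must run before the cached data is materialized

    # saveAsTextFile() is an RDD method that writes to storage rather than caching
    df.rdd.map(lambda row: ",".join(map(str, row))).saveAsTextFile("/tmp/demo_output")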
A) df.createOrReplaceTempView("view_name")
B) df.registerTempView("view_name")
C) df.createGlobalTempView("view_name")
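For reference, a sketch of registering a DataFrame as a SQL view in PySpark; createOrReplaceTempView() is the current API, registerTempView() is the deprecated pre-2.0 name, and createGlobalTempView() creates a view shared across sessions. The view names and data below are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("view-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    df.createOrReplaceTempView("view_name")        # session-scoped view
    spark.sql("SELECT id FROM view_name").show()

    df.createGlobalTempView("global_view_name")    # lives in the global_temp database
    spark.sql("SELECT id FROM global_temp.global_view_name").show()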
6. When using Spark SQL, what is the purpose of the explain() method?
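As a reference for question 6, a minimal sketch of explain(), which prints the query plans for a DataFrame rather than returning data; the DataFrame here is made up:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("explain-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # Prints the parsed, analyzed, and optimized logical plans plus the physical plan
    df.filter(col("id") > 1).explain(extended=True)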
A) df.groupBy("column").agg(sum("value"))
B) df.aggregate("column", sum("value"))
C) df.group("column").sum("value")
D) df.groupBy("column").aggregate(sum("value"))
A) fillna(value)
B) dropna()
C) replaceNulls(value)
D) ignoreNulls()
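For reference, fillna() and dropna() are the DataFrame null-handling methods among these options; a minimal sketch with made-up data:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("null-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "value"])

    df.fillna("missing").show()   # replace nulls with a value
    df.dropna().show()            # drop rows that contain nulls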
12. What type of join does Spark perform by default when joining two DataFrames?
A) Inner join
B) Left join
C) Right join
C) To sort records
D) To group records
B) df1.innerJoin(df2, "key")
C) df1.join(df2, "key")
D) df1.joinInner(df2, "key")
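For reference on question 12 and the join-syntax options above, a sketch of DataFrame joins: calling join() with only a key defaults to an inner join, and other types are selected with the how argument. The tables and keys below are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("join-demo").getOrCreate()
    left = spark.createDataFrame([(1, "a"), (2, "b")], ["key", "left_val"])
    right = spark.createDataFrame([(1, "x"), (3, "y")], ["key", "right_val"])

    left.join(right, "key").show()            # defaults to an inner join
    left.join(right, "key", "left").show()    # explicit join type via the how argument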
A) renameColumn()
B) withColumnRenamed("oldName", "newName")
C) changeColumnName()
D) setColumnName()
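For reference, a sketch of withColumnRenamed(), the DataFrame method for renaming a column; the column names and data are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rename-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a")], ["oldName", "value"])

    # Returns a new DataFrame with the column renamed
    df.withColumnRenamed("oldName", "newName").printSchema()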
18. How can you optimize query performance in Spark SQL? (Select all that apply)
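As general reference for question 18, a sketch of two commonly cited Spark SQL tuning techniques, caching a reused DataFrame and broadcasting the small side of a join; the data and names are made up, and these are illustrative techniques rather than the exam's listed options:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("sql-tuning-demo").getOrCreate()
    facts = spark.createDataFrame([(1, 100), (2, 200)], ["dim_id", "amount"])
    dims = spark.createDataFrame([(1, "a"), (2, "b")], ["dim_id", "label"])

    facts.cache()                                  # reuse an expensive intermediate result
    facts.join(broadcast(dims), "dim_id").show()   # avoid a shuffle by broadcasting the small table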
20. Which of the following is the best practice for handling large datasets in Spark?
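As general reference for question 20, a sketch of two practices often recommended for large datasets, controlling the partition count and writing to a partitioned columnar format; the path, columns, and partition count are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("large-data-demo").getOrCreate()
    df = spark.createDataFrame([(1, "2024-01-01"), (2, "2024-01-02")], ["id", "event_date"])

    df = df.repartition(200, "event_date")   # control partitioning ahead of wide operations
    df.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/events_parquet")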
22. Which method allows you to change the data type of a column in a DataFrame?
A) cast("newType")
B) changeType("newType")
C) convertType("newType")
D) modifyType("newType")
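For reference on question 22, a sketch of cast() applied to a column through withColumn(); the column name and target type are made up:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("cast-demo").getOrCreate()
    df = spark.createDataFrame([("1",), ("2",)], ["id"])

    # Replace the string column with an integer version of itself
    df.withColumn("id", col("id").cast("int")).printSchema()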
D) It aggregates data.
25. How can you apply a user-defined function (UDF) to a column in a DataFrame?
A) df.apply(udf, "column")
B) df.withColumn("new_column", udf(df["column"]))
C) df.transform(udf, "column")
D) df.udf("column")
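For reference on question 25, a sketch of defining a UDF and applying it with withColumn(), the pattern shown in option B; the function, column names, and data are made up:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-demo").getOrCreate()
    df = spark.createDataFrame([("alice",), ("bob",)], ["column"])

    # Wrap a Python function as a Spark UDF with an explicit return type
    upper_udf = udf(lambda s: s.upper() if s is not None else None, StringType())
    df.withColumn("new_column", upper_udf(df["column"])).show()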
26. How can you optimize performance in a Spark application? (Select all that apply)
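As general reference for question 26, a sketch of two application-level techniques, persisting with an explicit storage level and reducing shuffle volume with reduceByKey() rather than groupByKey(); the data is made up, and these are illustrative techniques rather than the exam's listed options:

    from pyspark.sql import SparkSession
    from pyspark import StorageLevel

    spark = SparkSession.builder.appName("app-tuning-demo").getOrCreate()
    sc = spark.sparkContext

    pairs = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])
    pairs.persist(StorageLevel.MEMORY_AND_DISK)      # explicit storage level for a reused RDD
    totals = pairs.reduceByKey(lambda x, y: x + y)   # combines map-side, shuffles less than groupByKey
    print(totals.collect())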
D) To filter datasets.
28. Which of the following methods can be used to create a DataFrame from an existing RDD?
A) createDataFrame()
B) toDF()
C) fromRDD()
D) loadDataFrame()
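For reference on question 28, a sketch showing both spark.createDataFrame() and rdd.toDF(); the schema and data are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-to-df-demo").getOrCreate()
    rdd = spark.sparkContext.parallelize([(1, "a"), (2, "b")])

    df1 = spark.createDataFrame(rdd, ["id", "value"])   # builder method on the session
    df2 = rdd.toDF(["id", "value"])                     # convenience method on the RDD
    df1.show()
    df2.show()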
29. Which of the following is NOT a feature of Apache Spark?
A) In-memory processing
B) Lazy evaluation
D) Strict consistency
30. What should you do to avoid memory issues when processing large datasets?
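As general reference for question 30, a sketch of avoiding a full collect() on the driver by streaming rows with toLocalIterator() or by writing results out while they stay distributed; the path and data are hypothetical, and these are illustrative practices rather than the exam's listed options:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("memory-demo").getOrCreate()
    df = spark.createDataFrame([(i, i * 2) for i in range(1000)], ["id", "value"])

    # Iterate partition by partition instead of pulling everything onto the driver at once
    for row in df.toLocalIterator():
        pass  # process one row at a time

    # Or keep the results distributed and write them to storage
    df.write.mode("overwrite").parquet("/tmp/results_parquet")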