
MINISTRY OF EDUCATION AND TRAINING
NATIONAL ECONOMIC UNIVERSITY
Faculty of Economic Mathematics
DSEB Program

MIDTERM EXAM
Program: DSEB        Intake: 63
Date: 09/11/2024     Session: 1
Time limit: 30 minutes

1. What is the default behavior of the dropDuplicates() method in a DataFrame?

A) It drops all rows.

B) It keeps the first occurrence of each duplicate row.

C) It drops all duplicates without keeping any.

D) It only drops duplicates based on specified columns.
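
For illustration, a minimal PySpark sketch of dropDuplicates(), assuming an active SparkSession named spark:

    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "val"])
    df.dropDuplicates().show()          # removes duplicate rows across all columns
    df.dropDuplicates(["id"]).show()    # deduplicates on the "id" column only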

2. Which of the following methods can be used to persist data in Spark? (Select all that apply)

A) cache()

B) persist()

C) saveAsTextFile()

D) store()
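
A hedged sketch of the persistence APIs, assuming DataFrames df and df2 and an RDD rdd already exist; the output path is only an example:

    from pyspark import StorageLevel

    df.cache()                                     # equivalent to persist() with the default storage level
    df2.persist(StorageLevel.MEMORY_AND_DISK)      # explicit storage level
    rdd.saveAsTextFile("/tmp/example_output")      # writes an RDD out to external storage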

3. How can you register a DataFrame as a temporary view in Spark SQL?

A) df.createOrReplaceTempView("view_name")

B) df.registerTempView("view_name")

C) df.createGlobalTempView("view_name")

D) df.createOrReplaceTemporaryView("view_name")
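
A minimal sketch of registering and querying a temporary view, assuming a DataFrame df with hypothetical name and age columns:

    df.createOrReplaceTempView("people")
    result = spark.sql("SELECT name, age FROM people WHERE age > 30")
    result.show()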

4. What does the map() transformation do in Spark?

A) It filters elements from an RDD.

B) It applies a function to each element and returns a new RDD.

C) It reduces the number of partitions.

D) It combines two RDDs.
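
A short map() sketch on an RDD, assuming the SparkContext is reached through spark.sparkContext:

    rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
    doubled = rdd.map(lambda x: x * 2)   # applies the function to every element
    print(doubled.collect())             # [2, 4, 6, 8]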

5. Which of the following describes "lazy evaluation" in Spark?

A) Operations are executed immediately upon being called.

B) Transformations are not computed until an action is called.

C) Data is stored on disk by default.

D) All computations happen in parallel.
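
A lazy-evaluation sketch: the map() below only records a plan; nothing executes until the action collect() is called.

    rdd = spark.sparkContext.parallelize(range(5))
    mapped = rdd.map(lambda x: x + 1)   # transformation: recorded, not executed
    result = mapped.collect()           # action: triggers the actual computation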

6. When using Spark SQL, what is the purpose of the explain() method?

A) To execute the query

B) To display the physical plan for the query execution

C) To optimize the query

D) To show the schema of the DataFrame
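
A brief explain() sketch, assuming a DataFrame df with a hypothetical age column:

    df.filter(df["age"] > 30).explain()      # prints the physical plan
    df.filter(df["age"] > 30).explain(True)  # extended: parsed, analyzed, optimized and physical plans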

7. What is a common use case for window functions in Spark SQL?

A) To group data by categories

B) To perform calculations across a set of rows related to the current row

C) To filter data based on conditions

D) To create temporary views
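
A window-function sketch, assuming a DataFrame df with hypothetical dept and salary columns:

    from pyspark.sql import Window
    from pyspark.sql import functions as F

    w = Window.partitionBy("dept").orderBy(F.col("salary").desc())
    df.withColumn("rank", F.row_number().over(w)).show()   # ranks rows within each department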


8. In Spark, what does the cache() method do?

A) It permanently stores the DataFrame.

B) It optimizes the query plan.

C) It stores the DataFrame in memory for faster access.

D) It drops the DataFrame from memory.

9. How can you group data in a DataFrame and perform an aggregation?

A) df.groupBy("column").agg(sum("value"))

B) df.aggregate("column", sum("value"))

C) df.group("column").sum("value")

D) df.groupBy("column").aggregate(sum("value"))
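
A grouped-aggregation sketch; the column names are placeholders taken from the options above:

    from pyspark.sql import functions as F

    df.groupBy("column").agg(F.sum("value").alias("total")).show()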

10. Which of the following can be used to handle missing values in a DataFrame? (Select all that apply)

A) fillna(value)

B) dropna()

C) replaceNulls(value)

D) ignoreNulls()
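
A sketch of the missing-value helpers, assuming a DataFrame df with a hypothetical city column:

    df.fillna(0)                  # replace nulls in numeric columns with 0
    df.fillna({"city": "N/A"})    # per-column replacement
    df.dropna()                   # drop rows containing any null
    df.dropna(subset=["city"])    # drop rows where a specific column is null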

11. What does the coalesce() method do when applied to a DataFrame?

A) It increases the number of partitions.

B) It reduces the number of partitions.

C) It merges multiple DataFrames.

D) It filters out null values.
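
A coalesce() sketch; the partition counts shown in comments are illustrative only:

    print(df.rdd.getNumPartitions())    # e.g. 8
    df_small = df.coalesce(2)           # reduces to 2 partitions without a full shuffle
    print(df_small.rdd.getNumPartitions())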

12. What type of join does Spark perform by default when joining two
DataFrames?
A) Inner join

B) Left join

C) Right join

D) Full outer join

13. What does the reduceByKey() operation do?

A) It combines values with the same key using a specified function.

B) It filters out keys based on a condition.

C) It sorts keys in ascending order.

D) It groups keys together without aggregation.
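
A reduceByKey() sketch on a small pair RDD:

    pairs = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)])
    totals = pairs.reduceByKey(lambda x, y: x + y)   # combines values per key
    print(totals.collect())                          # [('a', 4), ('b', 2)] (order may vary)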

14. Which of the following statements about Spark DataFrames is true? (Select all that apply)

A) They are immutable.

B) They can contain mixed data types.

C) They can only contain numeric data types.

D) They are optimized for query execution.

15. What is the purpose of the HAVING clause in Spark SQL?

A) To filter records before aggregation

B) To filter records after aggregation

C) To sort records

D) To group records

16. How do you perform an inner join between two DataFrames?

A) df1.join(df2, "key", "inner")

B) df1.innerJoin(df2, "key")
C) df1.join(df2, "key")

D) df1.joinInner(df2, "key")
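
A join sketch, assuming DataFrames df1 and df2 that share a key column:

    joined = df1.join(df2, "key")               # inner join is the default
    joined_explicit = df1.join(df2, "key", "inner")
    left = df1.join(df2, "key", "left")          # other join types are passed the same way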

17. Which method is used to rename a column in a DataFrame?

A) renameColumn()

B) withColumnRenamed("oldName", "newName")

C) changeColumnName()

D) setColumnName()
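
A renaming sketch, assuming a DataFrame df with a hypothetical oldName column:

    df = df.withColumnRenamed("oldName", "newName")   # returns a new DataFrame; DataFrames are immutable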

18. How can you optimize query performance in Spark SQL? (Select all that
apply)

A) Use partitioning on large tables

B) Avoid using too many joins

C) Always use non-optimized formats like CSV

D) Cache frequently accessed DataFrames

19. How do you perform an aggregation with grouping in Spark SQL?

A) SELECT column, SUM(value_column) FROM table GROUP BY column

B) SELECT SUM(value_column), GROUP BY column FROM table

C) SELECT column, COUNT(value_column) FROM table GROUP BY column

D) SELECT GROUP(column), SUM(value_column) FROM table
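
The same style of aggregation run against a temporary view; the view and column names (sales, region, amount) are assumed for illustration:

    df.createOrReplaceTempView("sales")
    spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()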

20. Which of the following is the best practice for handling large datasets in
Spark?

A) Load all data into memory at once

B) Use partitioning to distribute data efficiently

C) Avoid using caching or persistence

D) Read data from disk only once

21. How should you handle skewed data in Spark?

A) Ignore the skew and proceed with processing

B) Use salting techniques to distribute data evenly

C) Increase the number of partitions

D) Use only one partition for processing
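
A simplified salting sketch; the salt range of 10 and the column names are assumptions, and the other side of a salted join would need matching replicated salts:

    from pyspark.sql import functions as F

    # add a random salt to the skewed key so hot keys spread across more partitions
    salted = df.withColumn(
        "salted_key",
        F.concat(F.col("key"), F.lit("_"), (F.rand() * 10).cast("int").cast("string")),
    )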

22. Which method allows you to change the data type of a column in a
DataFrame?

A) cast("newType")

B) changeType("newType")

C) convertType("newType")

D) modifyType("newType")
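
A cast() sketch, assuming df has a hypothetical age column currently stored as a string:

    from pyspark.sql import functions as F

    df = df.withColumn("age", F.col("age").cast("int"))
    df.printSchema()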

23. What is the best approach to monitor Spark applications?

A) Monitor logs only after job completion

B) Use the Spark UI and external monitoring tools

C) Ignore monitoring unless there are errors

D) Rely solely on system resource metrics

24. What does the explode() function do in Spark DataFrames?

A) It flattens nested structures into separate rows.

B) It combines multiple columns into one.

C) It filters out null values.

D) It aggregates data.
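
An explode() sketch on a hypothetical array column:

    from pyspark.sql import functions as F

    df = spark.createDataFrame([(1, ["a", "b"]), (2, ["c"])], ["id", "items"])
    df.withColumn("item", F.explode("items")).show()   # one output row per array element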
25. How can you apply a user-defined function (UDF) to a column in a
DataFrame?

A) df.apply(udf, "column")

B) df.withColumn("new_column", udf(df["column"]))

C) df.transform(udf, "column")

D) df.udf("column")
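
A UDF sketch; the uppercase function and column names are illustrative only:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    to_upper = udf(lambda s: s.upper() if s is not None else None, StringType())
    df = df.withColumn("new_column", to_upper(df["column"]))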

26. How can you optimize performance in a Spark application? (Select all
that apply)

A) Using partitioning effectively

B) Reducing the number of transformations

C) Increasing the number of partitions unnecessarily

D) Caching intermediate results

27. In which scenario would you use broadcast variables in Spark?

A) To send large amounts of data to all nodes efficiently

B) To store intermediate results.

C) To partition data across nodes.

D) To filter datasets.
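
A broadcast-variable sketch with a hypothetical lookup table:

    lookup = {"VN": "Vietnam", "US": "United States"}
    bc = spark.sparkContext.broadcast(lookup)           # shipped to every executor once

    rdd = spark.sparkContext.parallelize(["VN", "US"])
    print(rdd.map(lambda code: bc.value.get(code)).collect())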

28. Which of the following methods can be used to create a DataFrame from
an existing RDD?

A) createDataFrame()

B) toDF()

C) fromRDD()

D) loadDataFrame()
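
Two equivalent sketches for building a DataFrame from an existing RDD of tuples:

    rdd = spark.sparkContext.parallelize([("Alice", 30), ("Bob", 25)])
    df1 = spark.createDataFrame(rdd, ["name", "age"])
    df2 = rdd.toDF(["name", "age"])   # requires an active SparkSession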
29. Which of the following is NOT a feature of Apache Spark?

A) In-memory processing

B) Lazy evaluation

C) Real-time data processing

D) Strict consistency

30. What should you do to avoid memory issues when processing large
datasets?

A) Increase the driver memory limit

B) Use more shuffle partitions

C) Load all data into memory

D) Reduce the number of executors
