eBAY QA 1

The document provides a series of questions and answers related to SQL commands for data engineering tasks, focusing on database creation, table management, and data extraction techniques. It emphasizes the importance of using correct SQL syntax and understanding the implications of different commands, such as creating managed tables and handling duplicates in Delta tables. Key takeaways include the necessity of specifying locations when creating databases, the advantages of Delta format for data updates, and the use of SELECT DISTINCT for removing duplicates.


Q12

A data engineer needs to create a database called customer360 at the location /customer/customer360. The data engineer is unsure if one of their colleagues has already created the database.

Which of the following commands should the data engineer run to complete this task?

A. CREATE DATABASE customer360 LOCATION '/customer/customer360';

B. CREATE DATABASE IF NOT EXISTS customer360;

C. CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360';

D. CREATE DATABASE IF NOT EXISTS customer360 DELTA LOCATION '/customer/customer360';

E. CREATE DATABASE customer360 DELTA LOCATION '/customer/customer360';

The correct answer is:

C. CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360';

Explanation:

• CREATE DATABASE IF NOT EXISTS ensures that the command won't fail if the database has already been created by a colleague.

• The LOCATION '/customer/customer360' clause ensures that the database is physically stored at the specified path in the file system.

• Option C is the only one that:

o Checks for an existing database (to avoid an error),

o Specifies the correct custom location.

Why the other options are incorrect:

• A: Will fail if the database already exists.

• B: Doesn’t specify the location.

• D and E: The DELTA keyword is not valid syntax in standard CREATE DATABASE
statements (you specify Delta format at the table level, not database level).
So, the best and correct answer is:
C. CREATE DATABASE IF NOT EXISTS customer360 LOCATION
'/customer/customer360';
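
A quick way to sanity-check the result is to run the command and then inspect where the database actually lives; the sketch below assumes a Databricks/Spark SQL session and uses only standard commands.

CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360';

-- Confirm the location; this succeeds whether or not the database already existed:
DESCRIBE DATABASE EXTENDED customer360;

One caveat worth knowing: if a colleague already created customer360 at a different path, IF NOT EXISTS leaves the existing database untouched, because the LOCATION clause only takes effect when the database is actually created.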

Q13

A junior data engineer needs to create a Spark SQL table my_table for which Spark
manages both the data and the metadata. The metadata and data should also be stored
in the Databricks Filesystem (DBFS).

Which of the following commands should a senior data engineer share with the junior
data engineer to complete this task?

A. CREATE TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");

B. CREATE MANAGED TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");

C. CREATE MANAGED TABLE my_table (id STRING, value STRING);

D. CREATE TABLE my_table (id STRING, value STRING) USING DBFS;

E. CREATE TABLE my_table (id STRING, value STRING);

The correct answer is:

E. CREATE TABLE my_table (id STRING, value STRING);

Explanation:

This command creates a managed table by default in Spark SQL and Databricks.

In a managed table, Spark manages both the data and metadata, which is exactly
what's required.

Data and metadata are stored in DBFS by default when no external LOCATION or
OPTIONS(PATH) is provided.

The USING clause and OPTIONS(PATH) are typically used for external tables, where you
manage the storage location manually.

So, simply using CREATE TABLE without a LOCATION clause or OPTIONS(PATH) makes it
a managed table stored in DBFS by default on Databricks.
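
As a small illustration of the managed-table behavior, the sketch below creates the table and inspects its metadata; DESCRIBE TABLE EXTENDED is standard Spark SQL, and the default warehouse path shown in the comment is the typical Databricks location for managed tables.

CREATE TABLE my_table (id STRING, value STRING);

-- The output includes a Type row reporting MANAGED and a Location row pointing at the
-- metastore's default DBFS path (e.g. dbfs:/user/hive/warehouse/my_table):
DESCRIBE TABLE EXTENDED my_table;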

Q15

A data engineering team has created a series of tables using Parquet data stored in an external system. The team is noticing that after appending new rows to the data in the external system, their queries within Databricks are not returning the new rows. They identify the caching of the previous data as the cause of this issue.

Which of the following approaches will ensure that the data returned by queries is always up-to-date?

A. The tables should be converted to the Delta format

B. The tables should be stored in a cloud-based external system

C. The tables should be refreshed in the writing cluster before the next query is run

D. The tables should be altered to include metadata to not cache

E. The tables should be updated before the next query is run

The correct answer is:

A. The tables should be converted to the Delta format

Explanation:

• The issue described — queries not returning updated data after new rows are appended in an external system — is typically due to caching or the non-transactional nature of Parquet files.

• Parquet tables are not automatically updated in Databricks when the underlying files change, especially if the table was registered as a cached or managed table.

Why Delta format solves this:

• Delta Lake provides ACID transactions, schema enforcement, and automatic metadata management.

• When using Delta tables in Databricks, changes to the underlying data are automatically reflected in queries, especially if you avoid explicitly caching the tables.

• Delta format supports features like time travel, MERGE, and optimized upserts,
making it much more robust for frequent data updates.

Why the other options are incorrect:


• B: Storing data in the cloud (e.g., S3 or ADLS) doesn't inherently solve the caching/refresh problem.

• C: Manually refreshing is not scalable or reliable.

• D: There is no metadata flag in table schema to "not cache"; caching behavior must be managed explicitly using CACHE or UNCACHE.

• E: Updating the table manually still doesn't ensure up-to-date results unless caching and metadata refresh are handled correctly.

Final Answer:

A. The tables should be converted to the Delta format
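
A minimal sketch of the conversion, assuming the Parquet data sits at an illustrative path and may also be registered as a metastore table; CONVERT TO DELTA is Databricks SQL.

-- Convert Parquet files in place to Delta (the path is illustrative):
CONVERT TO DELTA parquet.`/mnt/external/sales`;

-- Or, if the Parquet table is already registered in the metastore:
CONVERT TO DELTA sales;

Once the tables are Delta, appends made through Delta are reflected in subsequent queries without manual refreshes, since every query reads the latest committed snapshot from the transaction log.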

Q16

A table customerLocations exists with the following schema:

id STRING,
date STRING,
city STRING,
country STRING

A senior data engineer wants to create a new table from this table using the following command:

CREATE TABLE customersPerCountry AS
SELECT country, COUNT(*) AS customers
FROM customerLocations
GROUP BY country;

A junior data engineer asks why the schema is not being declared for the new table. Which of the following responses explains why declaring the schema is not necessary?

A. CREATE TABLE AS SELECT statements adopt schema details from the source table and query.

B. CREATE TABLE AS SELECT statements infer the schema by scanning the data.

C. CREATE TABLE AS SELECT statements result in tables where schemas are optional.

D. CREATE TABLE AS SELECT statements assign all columns the type STRING.

E. CREATE TABLE AS SELECT statements result in tables that do not support schemas.

The correct answer is:

A. CREATE TABLE AS SELECT statements adopt schema details from the source table and query.

Explanation:

A CREATE TABLE AS SELECT (CTAS) statement creates a new table based on the result
of a query.

The schema of the new table is automatically derived from the columns and their types
in the SELECT clause.

Therefore, manually declaring the schema is not necessary — it's inferred directly from
the source table's schema and the transformations/aggregations applied in the query.

Why not the others?

B is misleading — CTAS doesn't "scan the data" to infer schema like schema inference
on raw files; it uses the schema from the query result.

C is vague — schemas aren't optional; they're just inferred.

D is incorrect — CTAS does not assign all columns the STRING type.

E is false — CTAS-created tables absolutely do support schemas.
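
A short sketch of the point: run the CTAS from the question and inspect the schema that was derived for it; DESCRIBE is standard Spark SQL, and the BIGINT noted in the comment is the result type Spark assigns to COUNT(*).

CREATE TABLE customersPerCountry AS
SELECT country, COUNT(*) AS customers
FROM customerLocations
GROUP BY country;

-- The schema comes from the query result, not from a declaration:
--   country    STRING  (carried over from customerLocations)
--   customers  BIGINT  (the result type of COUNT(*))
DESCRIBE customersPerCountry;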

Q17

A data engineer is overwriting data in a table by deleting the table and recreating the table. Another data engineer suggests that this is inefficient and the table should simply be overwritten instead.

Which of the following reasons to overwrite the table instead of deleting and recreating the table is incorrect?

A. Overwriting a table is efficient because no files need to be deleted.

B. Overwriting a table results in a clean table history for logging and audit purposes.

C. Overwriting a table maintains the old version of the table for Time Travel.

D. Overwriting a table is an atomic operation and will not leave the table in an unfinished state.

E. Overwriting a table allows for concurrent queries to be completed while in progress.

The correct answer is:

B. Overwriting a table results in a clean table history for logging and audit purposes.

Explanation:

This statement is incorrect because overwriting a table does not "clean" the table
history — in fact, Delta Lake preserves the table history, including overwrite
operations, for logging, audit, and Time Travel purposes. That means:

• The overwrite is recorded in the transaction log, and

• You can still access previous versions of the table (unless they've been
vacuumed or the retention period has passed).

Why the other options are correct:

• A. True: Overwriting a table writes new files and marks old ones as deleted — it's more efficient than deleting and recreating the entire table structure.

• C. True: Delta Lake maintains the old version of the table, enabling Time Travel.

• D. True: Overwriting is an atomic operation in Delta Lake; either the new data fully replaces the old data, or the operation fails cleanly.

• E. True: Delta Lake supports concurrent reads; queries in progress during an overwrite read from the old snapshot.

Final Answer:

B. Overwriting a table results in a clean table history for logging and audit
purposes.
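
A minimal sketch of overwriting a Delta table in place and confirming that the history survives; the table names are illustrative, and CREATE OR REPLACE TABLE, DESCRIBE HISTORY, and VERSION AS OF are Databricks Delta SQL.

-- Atomically replace the table's contents instead of dropping and recreating it
-- (sales and staging_sales are illustrative names):
CREATE OR REPLACE TABLE sales AS
SELECT * FROM staging_sales;

-- The overwrite is recorded in the transaction log, so the history is preserved:
DESCRIBE HISTORY sales;

-- Time Travel back to a pre-overwrite version (the version number is illustrative):
SELECT * FROM sales VERSION AS OF 1;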

Q18

Which of the following commands will return records from an existing Delta table my_table where duplicates have been removed?

A. DROP DUPLICATES FROM my_table;

B. SELECT * FROM my_table WHERE duplicate = False;

C. SELECT DISTINCT * FROM my_table;

D. MERGE INTO my_table a USING new_records b ON a.id = b.id WHEN NOT MATCHED THEN INSERT *;

E. MERGE INTO my_table a USING new_records b;

The correct answer is:

C. SELECT DISTINCT * FROM my_table;

Explanation:

• If you want to form a set of records from an existing Delta table with
duplicates removed, the simplest and correct way is to use:


SELECT DISTINCT * FROM my_table;

• SELECT DISTINCT removes duplicate rows based on all columns, which aligns
with the question.

Why the other options are incorrect:

• A. DROP DUPLICATES FROM my_table;


Invalid SQL syntax — there's no such SQL command as DROP DUPLICATES.
• B. SELECT * FROM my_table WHERE duplicate = False;
Assumes a duplicate column exists and tracks duplicate status, which is not
standard or guaranteed.

• D. MERGE INTO my_table a USING new_records b ON a.id = b.id WHEN NOT MATCHED THEN INSERT *;
This is for merging new data into an existing table, not for removing duplicates from existing data.

• E. MERGE INTO my_table a USING new_records b;


Incomplete syntax and still not related to de-duplication of existing table
records.

Final Answer:

C. SELECT DISTINCT * FROM my_table;
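
If the goal is to persist the de-duplicated rows rather than just query them, the SELECT DISTINCT can be wrapped in a write; this sketch assumes Delta tables and uses an illustrative target name my_table_deduped.

-- Write the distinct rows to a new table (my_table_deduped is an illustrative name):
CREATE OR REPLACE TABLE my_table_deduped AS
SELECT DISTINCT * FROM my_table;

-- Or overwrite my_table itself; Delta's snapshot isolation lets the SELECT read the
-- pre-overwrite version of the same table it is overwriting:
INSERT OVERWRITE TABLE my_table
SELECT DISTINCT * FROM my_table;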

Q21

A data engineer has ingested a JSON file into a table named "raw_table" with a schema
containing transaction_id (STRING) and payload (ARRAY). The data engineer wants to
efficiently extract the date of each transaction into a new table with the schema
transaction_id (STRING) and date (TIMESTAMP). Which of the following commands
should the data engineer run to complete this task?

1. SELECT transaction_id, explode(payload) FROM raw_table;
2. SELECT transaction_id, payload.date FROM raw_table;
3. SELECT transaction_id, date FROM raw_table;
4. SELECT transaction_id, payload[date] FROM raw_table;
5. SELECT transaction_id, date from payload FROM raw_table;

Given:

• Table: raw_table

• Schema:

o transaction_id (STRING)

o payload (ARRAY of STRUCTs), where each struct includes:

▪ customer_id (STRING)

▪ date (TIMESTAMP)

▪ store_id (STRING)
Goal:

Create a table with this schema:

• transaction_id (STRING)

• date (TIMESTAMP)

That means we want:

• To extract the date from each struct inside the payload array.

• To flatten the array (payload) so each struct becomes its own row.

Correct SQL Concept:

You need to explode or unnest the payload array, then select transaction_id and the
date field from each struct.

In Spark SQL or Hive:


SELECT transaction_id, p.date

FROM raw_table

LATERAL VIEW explode(payload) exploded AS p;

In BigQuery:


SELECT transaction_id, p.date

FROM raw_table, UNNEST(payload) AS p;

Now evaluate the options:

1. SELECT transaction_id, explode(payload) FROM raw_table;

o Explodes the array, returning each struct.

o But doesn't select the date field specifically.


o Close, but incomplete — you'd still need to access payload.date.

o Closest correct among the options given.

2. SELECT transaction_id, payload.date FROM raw_table;

o Doesn't produce the target schema. In Spark SQL, referencing .date on an array of structs returns an ARRAY<TIMESTAMP> per row rather than one TIMESTAMP per record, and the array is never flattened.

3. SELECT transaction_id, date FROM raw_table;

o Invalid. date is not a top-level column.

4. SELECT transaction_id, payload[date] FROM raw_table;

o Invalid syntax. Array elements can’t be accessed like this.

5. SELECT transaction_id, date from payload FROM raw_table;

o Invalid SQL syntax.

Correct Answer:

SELECT transaction_id, explode(payload) FROM raw_table;

Even though it's not fully extracting date, it's the only valid syntax among the options
that starts the correct process.
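
Putting the pieces together, a full extraction into the target schema might look like the sketch below; transaction_dates and the exploded alias are illustrative names, and the LATERAL VIEW form is standard Spark SQL.

-- One row per struct in payload, keeping only the fields the target schema needs:
CREATE TABLE transaction_dates AS
SELECT transaction_id, p.date
FROM raw_table
LATERAL VIEW explode(payload) exploded AS p;

-- Resulting schema: transaction_id STRING, date TIMESTAMP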
