eBAY QA 1
A data engineer needs to create a database named customer360 at the location
'/customer/customer360'. The data engineer is unsure whether a colleague has already
created the database.
Which of the following commands should the data engineer run to complete this task?
Explanation:
• CREATE DATABASE IF NOT EXISTS ensures that the command won't fail if the
database has already been created by a colleague.
• D and E: The DELTA keyword is not valid syntax in standard CREATE DATABASE
statements (you specify Delta format at the table level, not database level).
So, the best and correct answer is:
C. CREATE DATABASE IF NOT EXISTS customer360 LOCATION
'/customer/customer360';
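For reference, a minimal sketch of the full statement plus an optional verification step (the DESCRIBE check is illustrative, not part of the exam answer):

-- Create the database only if it does not already exist,
-- pinning its storage location explicitly:
CREATE DATABASE IF NOT EXISTS customer360
LOCATION '/customer/customer360';

-- Optional: confirm the registered location
DESCRIBE DATABASE customer360;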
Q13
A junior data engineer needs to create a Spark SQL table my_table for which Spark
manages both the data and the metadata. The metadata and data should also be stored
in the Databricks Filesystem (DBFS).
Which of the following commands should a senior data engineer share with the junior
data engineer to complete this task?
Explanation:
The correct command, a plain CREATE TABLE with no storage options, creates a managed
table by default in Spark SQL and Databricks.
In a managed table, Spark manages both the data and metadata, which is exactly
what's required.
Data and metadata are stored in DBFS by default when no external LOCATION or
OPTIONS(PATH) is provided.
The OPTIONS (PATH ...) clause (typically paired with USING) points a table at external
storage, making it an external table whose location you manage manually.
So, simply using CREATE TABLE without a LOCATION clause or OPTIONS (PATH) creates
a managed table, stored in DBFS by default on Databricks.
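A minimal sketch of such a command (the column list is illustrative, not the exam option verbatim):

-- Managed table: no LOCATION or OPTIONS (PATH), so Spark stores
-- both the data and the metadata under DBFS:
CREATE TABLE my_table (id STRING, value STRING);

-- For contrast, pointing at a path would make the table external:
-- CREATE TABLE my_table (id STRING, value STRING)
-- USING PARQUET OPTIONS (PATH '/mnt/some/external/path');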
Q15
A data engineering team has created a series of tables using Parquet data stored
in an external system. The team is noticing that after appending new rows to the
data in the external system, their queries within Databricks are not returning the
new rows. They identify the caching of the previous data as the cause of this issue.
Which of the following approaches will ensure that the data returned by queries is
always up-to-date?
C. The tables should be refreshed in the writing cluster before the next query is run
Explanation:
• The issue described — queries not returning updated data after new rows are
appended in an external system — is typically due to caching or the non-
transactional nature of Parquet files.
• When using Delta tables in Databricks, changes to the underlying data are
automatically reflected in queries, especially if you avoid explicitly caching the
tables.
• Delta format supports features like time travel, MERGE, and optimized upserts,
making it much more robust for frequent data updates.
• E: Updating the table manually still doesn't ensure up-to-date results unless
caching and metadata refresh are handled correctly.
Final Answer:
The tables should be converted to the Delta format.
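A minimal sketch of the conversion (the path and table name are hypothetical; CONVERT TO DELTA and REFRESH TABLE are standard Databricks/Spark SQL):

-- Convert the external Parquet data in place to Delta:
CONVERT TO DELTA parquet.`/mnt/external/sales`;

-- Short-term workaround while still on Parquet: invalidate cached
-- data and metadata so the next query rereads the files:
REFRESH TABLE sales;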
Q16
A table exists with columns including id STRING. A senior data engineer wants to create
a new table from this table using a CREATE TABLE AS SELECT command that ends with:
GROUP BY country;
A junior data engineer asks why the schema is not being declared for the new
table. Which of the following responses explains why declaring the schema is not
necessary?
A. CREATE TABLE AS SELECT statements adopt schema details from the source table
and query.
D. CREATE TABLE AS SELECT statements assign all columns the type STRING.
Explanation:
A CREATE TABLE AS SELECT (CTAS) statement creates a new table based on the result
of a query.
The schema of the new table is automatically derived from the columns and their types
in the SELECT clause.
Therefore, manually declaring the schema is not necessary — it's inferred directly from
the source table's schema and the transformations/aggregations applied in the query.
B is misleading — CTAS doesn't "scan the data" to infer schema like schema inference
on raw files; it uses the schema from the query result.
D is incorrect — CTAS does not assign all columns the STRING type.
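Since the original command is truncated above, here is a minimal CTAS sketch of the same shape (table and column names are illustrative):

-- The new table's schema (country STRING, order_count BIGINT)
-- is derived entirely from the query result; nothing is declared:
CREATE TABLE orders_by_country AS
SELECT country, COUNT(*) AS order_count
FROM orders
GROUP BY country;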
Q17
A data engineer is overwriting data in a table by deleting the table and recreating
the table. Another data engineer suggests that this is inefficient and the table
should simply be overwritten instead.
Which of the following reasons to overwrite the table instead of deleting and
recreating it is incorrect?
A. Overwriting a table is efficient because no files need to be deleted.
B. Overwriting a table results in a clean table history for logging and audit purposes.
C. Overwriting a table maintains the old version of the table for Time Travel.
D. Overwriting a table is an atomic operation and will not leave the table in an
unfinished state.
E. Overwriting a table allows concurrent queries to be completed while the write is in
progress.
B. Overwriting a table results in a clean table history for logging and audit
purposes.
Explanation:
This statement is incorrect because overwriting a table does not "clean" the table
history — in fact, Delta Lake preserves the table history, including overwrite
operations, for logging, audit, and Time Travel purposes. That means:
• You can still access previous versions of the table (unless they've been
vacuumed or the retention period has passed).
• A. True: Overwriting a table writes new files and marks old ones as deleted —
it’s more efficient than deleting and recreating the entire table structure.
• C. True: Delta Lake maintains the old version of the table, enabling Time
Travel.
Final Answer:
B. Overwriting a table results in a clean table history for logging and audit
purposes.
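A minimal sketch contrasting the two approaches (table names are hypothetical):

-- Inefficient and non-atomic: metadata, history, and permissions are lost
DROP TABLE IF EXISTS events;
CREATE TABLE events AS SELECT * FROM staged_events;

-- Atomic overwrite: old versions remain reachable via Time Travel
INSERT OVERWRITE events SELECT * FROM staged_events;
-- (CREATE OR REPLACE TABLE events AS SELECT ... behaves similarly)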
Q18
A data engineer has realized that the data files associated with a Delta table contain
duplicate records. They want to remove the duplicate records from the table.
Which of the following commands could the data engineer run to complete this task?
Explanation:
• If you want to form a set of records from an existing Delta table with
duplicates removed, the simplest and correct way is to use:
INSERT OVERWRITE my_table SELECT DISTINCT * FROM my_table;
• SELECT DISTINCT removes duplicate rows based on all columns, which aligns
with the question.
• D. MERGE INTO my_table a USING new_records b ON a.id = b.id WHEN NOT
MATCHED THEN INSERT *;
This is for merging new data into an existing table, not for removing
duplicates from existing data.
Final Answer:
INSERT OVERWRITE my_table SELECT DISTINCT * FROM my_table;
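Because Delta versions every overwrite, the dedup can be sanity-checked afterwards; a sketch (DESCRIBE HISTORY and VERSION AS OF are standard Delta SQL, and v stands in for the pre-overwrite version number):

-- Row count after deduplication:
SELECT COUNT(*) AS current_rows FROM my_table;

-- Find the version number just before the overwrite:
DESCRIBE HISTORY my_table;

-- Then compare against the pre-dedup count:
-- SELECT COUNT(*) FROM my_table VERSION AS OF v;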
QUESTION 21
A data engineer has ingested a JSON file into a table named "raw_table" with a schema
containing transaction_id (STRING) and payload (ARRAY). The data engineer wants to
efficiently extract the date of each transaction into a new table with the schema
transaction_id (STRING) and date (TIMESTAMP). Which of the following commands
should the data engineer run to complete this task?
Given:
• Table: raw_table
• Schema:
o transaction_id (STRING)
o payload (ARRAY of STRUCT), where each struct contains:
▪ customer_id (STRING)
▪ date (TIMESTAMP)
▪ store_id (STRING)
Goal: a new table with the schema
• transaction_id (STRING)
• date (TIMESTAMP)
What this requires:
• To extract the date from each struct inside the payload array.
• To flatten the array (payload) so each struct becomes its own row.
You need to explode or unnest the payload array, then select transaction_id and the
date field from each struct.
In Spark SQL (Databricks):
SELECT transaction_id, item.date AS date
FROM raw_table
LATERAL VIEW explode(payload) exploded AS item;
In BigQuery:
SELECT transaction_id, item.date AS date
FROM raw_table, UNNEST(payload) AS item;
Correct Answer:
SELECT transaction_id, explode(payload) FROM raw_table;
Even though it's not fully extracting date, it's the only valid syntax among the options
that starts the correct process.
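To actually reach the target schema, the exam answer can be wrapped in a subquery; a sketch (the aliases flat and txn are assumptions):

-- Inner query (the exam answer): one struct per array element
-- Outer query: pull the date field out of each struct
SELECT transaction_id, txn.date AS date
FROM (
  SELECT transaction_id, explode(payload) AS txn
  FROM raw_table
) AS flat;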