DP-600 Implementing Analytics Solutions Using Microsoft Fabric Updated Dumps
2.HOTSPOT
You have a Fabric tenant.
You need to configure OneLake security for users shown in the following table.
Answer:
Explanation:
Box 1: ReadAll
User1, Read all the Spark data
If the “Read all Apache Spark” box is checked, users will be given ReadAll. This permission allows
users to access data in OneLake. This could be through direct OneLake access, Apache Spark
queries, or the lakehouse UX.
Box 2: ReadData
User2, Read all the SQL endpoint data
If the “Read all SQL endpoint data” is checked, users will be given the ReadData permission.
ReadData gives access to all Tables in the item when accessing through the SQL Endpoint. Users
will not be able to access OneLake directly.
Reference: https://support.fabric.microsoft.com/en-us/blog/building-common-data-architectures-with-onelake-in-microsoft-fabric
Etc.
Reference: https://learn.microsoft.com/en-us/fabric/get-started/delta-lake-interoperability
5.HOTSPOT
You have a Microsoft Power BI report and a semantic model that uses Direct Lake mode.
From Power BI Desktop, you open Performance analyzer as shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the
information presented in the graphic. NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Automatic
The Direct Lake fallback behavior is set to Automatic.
Power BI datasets in Direct Lake mode read delta tables directly from OneLake, unless they have to
fall back to DirectQuery mode. Typical fallback reasons include memory pressures that can prevent
loading of columns required to process a DAX query, and certain features at the data source might
not support Direct Lake mode, like SQL views in a Warehouse. In general, Direct Lake mode provides
the best DAX query performance unless a fallback to DirectQuery mode is necessary. Because
fallback to DirectQuery mode can impact DAX query performance, it's important to analyze query
processing for a Direct Lake dataset to identify if and how often fallbacks occur.
Note: Fallback behavior
Direct Lake models include the DirectLakeBehavior property, which has three options:
Automatic - (Default) Specifies queries fall back to DirectQuery mode if data can't be efficiently loaded
into memory.
DirectLakeOnly - Specifies all queries use Direct Lake mode only. Fallback to DirectQuery mode is
disabled. If data can't be loaded into memory, an error is returned. Use this setting to determine if
DAX queries fail to load data into memory, forcing an error to be returned.
DirectQueryOnly - Specifies all queries use DirectQuery mode only. Use this setting to test fallback
performance.
Box 2: Direct Query
In the Performance analyzer pane, select Refresh visuals, and then expand the Card visual. The card
visual doesn't cause any DirectQuery processing, which indicates the dataset was able to process the
visual’s DAX queries in Direct Lake mode.
If the dataset falls back to DirectQuery mode to process the visual’s DAX query, you see a Direct
query performance metric in the Performance analyzer pane.
Reference:
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-analyze-qp
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-overview
6. Sign into Fabric and navigate to the workspace you want to connect with.
7.You have a Fabric tenant that uses a Microsoft Power BI Premium capacity.
You need to enable scale-out for a semantic model.
What should you do first?
A. At the semantic model level, set Large Semantic model storage format to Off.
B. At the tenant level, set Create and use Metrics to Enabled.
C. At the semantic model level, set Large Semantic model storage format to On.
D. At the tenant level, set Data Activator to Enabled.
Answer: C
Explanation:
Power BI semantic model scale-out
Prerequisites
By default, scale-out is enabled for your tenant, but it's not enabled for semantic models in your
tenant. To enable scale-out for a semantic model, you must use the Power BI REST APIs. Before
enabling, the
following prerequisites must be met:
* The Scale-out queries for large semantic models setting for your tenant is enabled (default).
* Your workspace resides on a Power BI Premium capacity
*-> The Large semantic model storage format setting is enabled.
* Etc.
Note: Power BI semantic models can store data in a highly compressed in-memory cache for
optimized query performance, enabling fast user interactivity. With Premium capacities, large
semantic models beyond the default limit can be enabled with the Large semantic model storage
format setting. When enabled, semantic model size is limited by the Premium capacity size or the
maximum size set by the administrator.
Enable large semantic models
Steps here describe enabling large semantic models for a new model published to the service. For
existing semantic models, only step 3 is necessary.
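For illustration, the following is a minimal Python sketch of enabling scale-out through the Power BI REST API (Update Dataset In Group), assuming an Azure AD access token is already available. The workspace ID, semantic model ID, and token are placeholders, and the request shape should be verified against the current REST API documentation. The Large semantic model storage format setting (answer C) must already be enabled on the model, per the prerequisites above.

import requests

# Placeholders; a sketch only, not the full enablement procedure.
ACCESS_TOKEN = "<azure-ad-access-token>"
WORKSPACE_ID = "<workspace-guid>"
DATASET_ID = "<semantic-model-guid>"

url = f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}/datasets/{DATASET_ID}"
payload = {
    "queryScaleOutSettings": {
        # -1 lets the service manage the number of read-only replicas.
        "maxReadOnlyReplicas": -1,
        "autoSyncReadOnlyReplicas": True,
    }
}
response = requests.patch(url, json=payload,
                          headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()  # success means the scale-out settings were accepted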
8.HOTSPOT
You have a Fabric tenant that contains a lakehouse named Lakehouse1.
Lakehouse1 contains a table named Nyctaxi_raw. Nyctaxi_raw contains the data shown in the following table:
Answer:
Explanation:
Box 1: df.withColumnRenamed
Add a column named pickupDate that will contain only the date portion of pickupDateTime.
withColumnRenamed(existing, new)[source]
Returns a new DataFrame by renaming an existing column. This is a no-op if the schema doesn’t contain
the given column name.
Parameters:
existing: string, name of the existing column to rename.
new: string, new name of the column.
>>> df.withColumnRenamed('age', 'age2').collect()
[Row(age2=2, name='Alice'), Row(age2=5, name='Bob')]
Incorrect:
* df.withColumn
withColumn(colName, col)[source]
Returns a new DataFrame by adding a column or replacing the existing column that has the same
name.
The column expression must be an expression over this DataFrame; attempting to add a column from
some other DataFrame will raise an error.
Parameters:
colName: string, name of the new column.
col: a Column expression for the new column.
>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name='Alice', age2=4), Row(age=5, name='Bob', age2=7)]
Box 2: cast('date')
cast(dataType)[source]
Convert the column into type dataType.
>>> df.select(df.age.cast("string").alias('ages')).collect()
[Row(ages='2'), Row(ages='5')]
>>> df.select(df.age.cast(StringType()).alias('ages')).collect()
[Row(ages='2'), Row(ages='5')]
Box 3: .filter("fareAmount > 0 AND fareAmount < 100"
Filter the DataFrame to include only rows where fareAmount is a positive number that is less than
100.
filter(condition)[source]
Filters rows using the given condition.
where() is an alias for filter().
>>> df.filter(df.age > 3).collect()
[Row(age=5, name='Bob')]
>>> df.where(df.age == 2).collect()
[Row(age=2, name='Alice')]
>>> df.filter("age > 3").collect()
[Row(age=5, name='Bob')]
>>> df.where("age = 2").collect()
[Row(age=2, name='Alice')]
Incorrect:
* .where
isin will not give the desired result.
Note: In Apache Spark, the where() function can be used to filter rows in a DataFrame based on a
given condition. The condition is specified as a string that is evaluated for each row in the DataFrame.
Rows for which the condition evaluates to True are retained, while those for which it evaluates to
False are removed.
isin(*cols)[source]
A boolean expression that is evaluated to true if the value of this expression is contained by the
evaluated values of the arguments.
>>> df[df.name.isin("Bob", "Mike")].collect() [Row(age=5, name='Bob')]
>>> df[df.age.isin([1, 2, 3])].collect() [Row(age=2, name='Alice')]
Reference: https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html
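For illustration, the following is a minimal PySpark sketch that combines the calls quoted above (withColumn, cast('date'), and filter) on a toy DataFrame. The column names pickupDateTime and fareAmount are assumptions taken from the explanation text, not a reproduction of the missing exhibit.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy data; column names are assumed from the explanation above.
df = spark.createDataFrame(
    [("2024-01-01 08:15:00", 12.5), ("2024-01-01 09:30:00", -3.0)],
    ["pickupDateTime", "fareAmount"],
)

result = (
    df.withColumn("pickupDate", col("pickupDateTime").cast("date"))  # keep only the date portion
      .filter("fareAmount > 0 AND fareAmount < 100")                 # positive fares below 100
)
result.show()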
9.Which syntax should you use in a notebook to access the Research division data for Productline1?
A. spark.read.format(“delta”).load(“Tables/ResearchProduct”)
B. spark.read.format(“delta”).load(“Files/ResearchProduct”)
C. external_table(‘Tables/ResearchProduct’)
D. external_table(ResearchProduct)
Answer: A
Explanation:
Correct:
* spark.read.format(“delta”).load(“Tables/ResearchProduct”)
* spark.sql(“SELECT * FROM Lakehouse1.ResearchProduct ”)
Incorrect:
* external_table(‘Tables/ResearchProduct’)
* external_table(ResearchProduct)
* spark.read.format(“delta”).load(“Files/ResearchProduct”)
* spark.read.format(“delta”).load(“Tables/productline1/ResearchProduct”)
* spark.sql(“SELECT * FROM Lakehouse1.Tables.ResearchProduct ”)
Note: Apache Spark
Apache Spark notebooks and Apache Spark jobs can use shortcuts that you create in OneLake.
Relative file paths can be used to directly read data from shortcuts. Additionally, if you create a
shortcut in the Tables section of the lakehouse and it is in the Delta format, you can read it as a
managed table using Apache Spark SQL syntax.
Can use either:
df = spark.read.format("delta").load("Tables/MyShortcut")
display(df)
OR
df = spark.sql("SELECT * FROM MyLakehouse.MyShortcut LIMIT 1000")
display(df)
Reference: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts
10.HOTSPOT
You have a Fabric tenant that contains a lakehouse named LH1.
You need to deploy a new semantic model.
The solution must meet the following requirements:
- Support complex calculated columns that include aggregate functions, calculated tables, and
Multidimensional Expressions (MDX) user hierarchies.
- Minimize page rendering times.
How should you configure the model? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Mode: Import
The Import mode allows for complex calculated columns, calculated tables, and MDX user
hierarchies. This mode loads the data into memory, enabling fast query performance and minimizing
page rendering times.
Query Caching: On
Enabling query caching improves performance by caching the results of queries, reducing the time it
takes to render pages.
11.You have a Fabric workspace named Workspace1 and an Azure SQL database.
You plan to create a dataflow that will read data from the database, and then transform the data by
performing an inner join.
You need to ignore spaces in the values when performing the inner join. The solution must minimize
development effort.
What should you do?
A. Append the queries by using fuzzy matching.
B. Merge the queries by using fuzzy matching.
C. Append the queries by using a lookup table.
D. Merge the queries by using a lookup table.
Answer: B
Explanation:
Joins are merge operations.
Join transformation in mapping data flow
Use the join transformation to combine data from two sources or streams in a mapping data flow. The
output stream will include all columns from both sources matched based on a join condition.
Inner join only outputs rows that have matching values in both tables.
Fuzzy join
You can choose to join based on fuzzy join logic instead of exact column value matching by turning
on the "Use fuzzy matching" checkbox option.
*-> Combine text parts: Use this option to find matches by removing spaces between words. For
example, Data Factory is matched with DataFactory if this option is enabled.
Similarity score column: You can optionally choose to store the matching score for each row in a
column by entering a new column name here to store that value.
Similarity threshold: Choose a value between 60 and 100 as a percentage match between values in
the columns you've selected.
Reference: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-join
12. Set the permissions for the table or column to None or Read.
13. Manually enable the Sync the default Power BI semantic model setting for each Warehouse or
SQL analytics endpoint in the workspace. This will restart the background sync that will incur some
consumption costs.
14. Select the role you want to enable an OLS definition for, and expand the Table Permissions.
Step 3: Set Object Level Security to None for SalesRegionManager
You have a Fabric tenant that contains a workspace named Workspace1. Workspace1 is assigned to
a Fabric capacity.
You need to recommend a solution to provide users with the ability to create and publish custom
Direct Lake semantic models by using external tools. The solution must follow the principle of least
privilege.
Which three actions in the Fabric Admin portal should you include in the recommendation? Each
correct answer presents part of the solution. NOTE: Each correct answer is worth one point.
A. From the Tenant settings, set Allow XMLA Endpoints and Analyze in Excel with on-premises
datasets to Enabled.
B. From the Tenant settings, set Allow Azure Active Directory guest users to access Microsoft Fabric
to Enabled.
C. From the Tenant settings, select Users can edit data model in the Power BI service.
D. From the Capacity settings, set XMLA Endpoint to Read Write.
E. From the Tenant settings, set Users can create Fabric items to Enabled.
F. From the Tenant settings, enable Publish to Web.
Answer: ADE
18.HOTSPOT
You have a Fabric workspace that uses the default Spark starter pool and runtime version 1.2.
You plan to read a CSV file named Sales_raw.csv in a lakehouse, select columns, and save the data
as a Delta table to the managed area of the lakehouse. Sales_raw.csv contains 12 columns.
You have the following code.
For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE:
Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Yes
Yes - The Spark engine will read only the
'SalesOrderNumber','OrderDate','CustomerName','UnitPrice'
columns from Sales_raw.csv.
Note:
DataFrame.select(*cols: ColumnOrName) → DataFrame[source]
Projects a set of expressions and returns a new DataFrame.
Parameters
cols: str, Column, or list
column names (string) or expressions (Column). If one of the column names is ‘*’, that column is
expanded to include all columns in the current DataFrame.
Box 2: No
No - The Year column replaces the OrderDate column in the table.
withColumn adds one extra column
Note: DataFrame.withColumns(colsMap) → DataFrame[source]
Returns a new DataFrame by adding multiple columns or replacing the existing columns that have the
same names.
The colsMap is a map of column name and column, the column must only refer to attributes supplied
by this Dataset. It is an error to add columns that refer to some other Dataset.
Box 3: Yes
Yes - Adding inferSchema='true' to the options will increase the execution time of the query.
When you set inferSchema to True, PySpark will make an additional pass over the data to determine
the data types of each column. This can be useful when you don't have a predefined schema for your
data and want Spark to automatically deduce the types based on the actual data values.
Reference:
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.select.html
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.withColumns.html
https://medium.com/@sujathamudadla1213/what-are-the-considerations-and-implications-of-setting-inferschema-to-true-and-false-in-pyspark-9fc77fa2ad9a
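For illustration, the following is a minimal PySpark sketch contrasting an explicit schema with inferSchema='true'. The file path and column names are hypothetical, and in practice the schema should describe every column in the CSV file.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema for a hypothetical CSV file.
schema = StructType([
    StructField("SalesOrderNumber", StringType()),
    StructField("OrderDate", TimestampType()),
    StructField("CustomerName", StringType()),
    StructField("UnitPrice", DoubleType()),
])

# Explicit schema: a single pass over the file, Spark trusts the supplied types.
df_explicit = (
    spark.read.format("csv")
    .option("header", "true")
    .schema(schema)
    .load("Files/sales_sample.csv")
)

# inferSchema: Spark scans the data first to deduce column types, which adds execution time.
df_inferred = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("Files/sales_sample.csv")
)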
19.DRAG DROP
You have a Fabric workspace that contains a Dataflow Gen2 query. The query returns the following
data.
You need to filter the results to ensure that only the latest version of each customer’s record is
retained.
The solution must ensure that no new columns are loaded to the semantic model.
Which four actions should you perform in sequence in Power Query Editor? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Answer:
Explanation:
20. Publish the model as a semantic model to the service.
21.You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a
Delta table that has one million Parquet files.
You need to remove files that were NOT referenced by the table during the past 30 days. The solution
must ensure that the transaction log remains consistent, and the ACID properties of the table are
maintained.
What should you do?
A. From OneLake file explorer, delete the files.
B. Run the OPTIMIZE command and specify the Z-order parameter.
C. Run the OPTIMIZE command and specify the V-order parameter.
D. Run the VACUUM command.
Answer: D
Explanation:
VACUUM
Applies to: Databricks SQL, Databricks Runtime.
Removes unused files from a table directory.
VACUUM removes all files from the table directory that are not managed by Delta, as well as data
files that are no longer in the latest state of the transaction log for the table and are older than a
retention threshold.
Incorrect:
Not B: What is Z order optimization?
Z-ordering is a technique to colocate related information in the same set of files. This co-locality is
automatically used by Delta Lake on Azure Databricks data-skipping algorithms. This behavior
dramatically
reduces the amount of data that Delta Lake on Azure Databricks needs to read.
Not C: Delta Lake table optimization and V-Order
V-Order is a write time optimization to the parquet file format that enables lightning-fast reads under
the
Microsoft Fabric compute engines, such as Power BI, SQL, Spark, and others.
Power BI and SQL engines make use of Microsoft Verti-Scan technology and V-Ordered parquet files
to achieve in-memory like data access times. Spark and other non-Verti-Scan compute engines also
benefit from the V-Ordered files with an average of 10% faster read times, with some scenarios up to
50%.
V-Order works by applying special sorting, row group distribution, dictionary encoding and
compression on parquet files, thus requiring less network, disk, and CPU resources in compute
engines to read it, providing cost efficiency and performance. V-Order sorting has a 15% impact on
average write times but provides up to 50% more compression.
Reference:
https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?
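For illustration, the following is a minimal PySpark sketch of running VACUUM with a 30-day retention window (720 hours); the table name is hypothetical.

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Remove files no longer referenced by the Delta transaction log, keeping 30 days of history.
DeltaTable.forName(spark, "Lakehouse1.Sales").vacuum(720)

# Equivalent SQL form:
spark.sql("VACUUM Lakehouse1.Sales RETAIN 720 HOURS")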
22.Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a Fabric tenant that contains a new semantic model in OneLake.
You use a Fabric notebook to read the data into a Spark DataFrame.
You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all
the string and numeric columns.
Solution: You use the following PySpark expression:
df.explain()
Does this meet the goal?
A. Yes
B. No
Answer: B
Explanation:
Correct: You use the following PySpark expression:
df.summary()
Incorrect:
* df.describe().show()
* df.show()
* df.explain().show()
* df.explain()
explain(extended=False)[source]
Prints the (logical and physical) plans to the console for debugging purposes.
Parameters: extended: boolean, default False. If False, prints only the physical plan.
>>> df.explain()
== Physical Plan ==
Scan ExistingRDD[age#0,name#1]
>>> df.explain(True)
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
Note:
summary(*statistics)[source]
Computes specified statistics for numeric and string columns. Available statistics are: count, mean,
stddev, min, max, and arbitrary approximate percentiles specified as a percentage (e.g., 75%).
If no statistics are given, this function computes count, mean, stddev, min, approximate quartiles
(percentiles at 25%, 50%, and 75%), and max.
Note: This function is meant for exploratory data analysis, as we make no guarantee about the
backward compatibility of the schema of the resulting DataFrame.
>>> df.summary().show()
+-------+------------------+-----+
...
| stddev|2.1213203435596424| null|
...
Reference: https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html
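For illustration, a minimal PySpark sketch of the expression that does meet the goal (df.summary()), using the same toy age/name DataFrame as the documentation examples above.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

df.summary().show()                                # count, mean, stddev, min, quartiles, max
df.summary("min", "max", "mean", "stddev").show()  # only the statistics the question asks for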
23. Expand Workloads. In the XMLA Endpoint setting, select Read Write. The XMLA Endpoint setting
applies to all workspaces and semantic models assigned to the capacity.
Reference: https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools
24.HOTSPOT
You have a data warehouse that contains a table named Stage.Customers. Stage.Customers
contains all the customer record updates from a customer relationship management (CRM) system.
There can be multiple updates per customer.
You need to write a T-SQL query that will return the customer ID, name, postal code, and the last
updated time of the most recent row for each customer ID.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: ROW_NUMBER()
(The other Transact-SQL options, LAST_VALUE and NTILE, are discussed below.)
* ROW_NUMBER()
Numbers the output of a result set. More specifically, returns the sequential number of a row within a
partition of a result set, starting at 1 for the first row in each partition.
Syntax: ROW_NUMBER ()
OVER ([ PARTITION BY value_expression , ... [ n ] ] order_by_clause )
Incorrect:
* LAST_VALUE() Incorrect syntax.
Note: LAST_VALUE (Transact-SQL)
Returns the last value in an ordered set of values.
Syntax
LAST_VALUE ([ scalar_expression ])[ IGNORE NULLS | RESPECT NULLS ] OVER ([
partition_by_clause ] order_by_clause [ rows_range_clause ] )
* NTILE()
Incorrect syntax used.
Note: NTILE (Transact-SQL)
Distributes the rows in an ordered partition into a specified number of groups. The groups are
numbered, starting at one. For each row, NTILE returns the number of the group to which the row
belongs.
NTILE (integer_expression) OVER ([ <partition_by_clause> ] < order_by_clause > )
Box 2: WHERE X = 1
Reference:
https://learn.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql
https://learn.microsoft.com/en-us/sql/t-sql/functions/ntile-transact-sql
https://learn.microsoft.com/en-us/sql/t-sql/functions/last-value-transact-sql
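For illustration, the ROW_NUMBER pattern from the answer can be sketched end to end in a notebook. The staging data below is hypothetical, the column names are assumptions taken from the question text, and the query is written in Spark SQL, which shares the ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) syntax with T-SQL.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for Stage.Customers with multiple updates per customer.
spark.createDataFrame(
    [(1, "Ana", "98052", "2024-05-01"),
     (1, "Ana", "98101", "2024-06-01"),
     (2, "Ben", "10001", "2024-04-15")],
    ["CustomerID", "CustomerName", "PostalCode", "LastUpdated"],
).createOrReplaceTempView("StageCustomers")

# Number the rows per customer, newest first, then keep row 1 of each partition.
latest = spark.sql("""
    SELECT CustomerID, CustomerName, PostalCode, LastUpdated
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY LastUpdated DESC) AS X
        FROM StageCustomers
    )
    WHERE X = 1
""")
latest.show()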
26.Note: This section contains one or more sets of questions with the same scenario and problem.
Each question presents a unique solution to the problem. You must determine whether the solution
meets the stated goals. More than one solution in the set might solve the problem. It is also possible
that none of the solutions in the set solve the problem.
After you answer a question in this section, you will NOT be able to return. As a result, these
questions do not appear on the Review Screen.
Your network contains an on-premises Active Directory Domain Services (AD DS) domain named
contoso.com that syncs with a Microsoft Entra tenant by using Microsoft Entra Connect.
You have a Fabric tenant that contains a semantic model.
You enable dynamic row-level security (RLS) for the model and deploy the model to the Fabric
service.
You query a measure that includes the USERNAME() function, and the query returns a blank result.
You need to ensure that the measure returns the user principal name (UPN) of a user.
Solution: You update the measure to use the USERPRINCIPALNAME() function.
Does this meet the goal?
A. Yes
B. No
Answer: A
Explanation:
The USERPRINCIPALNAME() function directly retrieves the UPN of the user querying the measure.
This is the most appropriate function to use if your goal is to obtain the UPN, which is the format
typically used in environments that integrate with Microsoft Entra.
27.DRAG DROP
You are creating a data flow in Fabric to ingest data from an Azure SQL database by using a T-SQL
statement.
You need to ensure that any foldable Power Query transformation steps are processed by the
Microsoft SQL Server engine.
How should you complete the code? To answer, drag the appropriate values to the correct targets.
Each value may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content. NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Value
Query folding on native queries
Use Value.NativeQuery function
The goal of this process is to execute the following SQL code, and to apply more transformations with
Power Query that can be folded back to the source.
SELECT DepartmentID, Name FROM HumanResources.Department WHERE GroupName =
'Research and Development'
The first step was to define the correct target, which in this case is the database where the SQL code
will be run. Once a step has the correct target, you can select that step (in this case, Source in
Applied Steps) and then select the fx button in the formula bar to add a custom step. In this example,
replace the Source formula with the following formula:
Value.NativeQuery(Source, "SELECT DepartmentID, Name FROM HumanResources.Department
WHERE GroupName = 'Research and Development'", null, [EnableFolding=true])
Box 2: NativeQuery
Box 3: EnableFolding
The most important component of this formula is the use of the optional record for the fourth
parameter of the function, which has the EnableFolding record field set to true.
Reference: https://learn.microsoft.com/en-us/power-query/native-query-folding
Testlet 1
Case study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on
this exam. You must manage your time to ensure that you are able to complete all questions included
on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is independent
of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin a
new section, you cannot return to this section.
Overview
Contoso, Ltd. is a US-based health supplements company. Contoso has two divisions named Sales
and Research. The Sales division contains two departments named Online Sales and Retail Sales.
The Research division assigns internally developed product lines to individual teams of researchers
and analysts.
Existing Environment
Identity Environment
Contoso has a Microsoft Entra tenant named contoso.com. The tenant contains two groups named
ResearchReviewersGroup1 and ResearchReviewersGroup2.
Data Environment
Contoso has the following data environment:
- The Sales division uses a Microsoft Power BI Premium capacity.
- The semantic model of the Online Sales department includes a fact table named Orders that uses
Import mode. In the system of origin, the OrderID value represents the sequence in which orders are
created.
- The Research department uses an on-premises, third-party data warehousing product.
- Fabric is enabled for contoso.com.
- An Azure Data Lake Storage Gen2 storage account named storage1 contains Research division
data for a product line named Productline1. The data is in the delta format.
- A Data Lake Storage Gen2 storage account named storage2 contains Research division data for a
product line named Productline2. The data is in the CSV format.
Requirements
Planned Changes
Contoso plans to make the following changes:
- Enable support for Fabric in the Power BI Premium capacity used by the Sales division.
- Make all the data for the Sales division and the Research division available in Fabric.
- For the Research division, create two Fabric workspaces named Productline1ws and
Productline2ws.
- In Productline1ws, create a lakehouse named Lakehouse1.
- In Lakehouse1, create a shortcut to storage1 named ResearchProduct.
General Requirements
Contoso identifies the following high-level requirements that must be considered for all solutions:
- Follow the principle of least privilege when applicable.
- Minimize implementation and maintenance effort when possible.
Which syntax should you use in a notebook to access the Research division data for Productline1?
A. spark.read.format(“delta”).load(“Tables/ResearchProduct”)
B. spark.read.format(“delta”).load(“Files/ResearchProduct”)
C. spark.sql(“SELECT * FROM Lakehouse1.productline1.ResearchProduct”)
D. spark.read.format(“delta”).load(“Tables/productline1/ResearchProduct”)
Answer: A
Explanation:
Correct:
* spark.read.format("delta").load("Tables/ResearchProduct")
* spark.sql(“SELECT * FROM Lakehouse1.ResearchProduct”)
Incorrect:
* external_table(ResearchProduct)
* external_table(Tables/ResearchProduct)
* spark.read.format(“delta”).load(“Files/ResearchProduct”)
* spark.read.format("delta").load("Tables/productline1/ResearchProduct")
* spark.sql(“SELECT * FROM Lakehouse1.productline1.ResearchProduct”)
* spark.sql("SELECT * FROM Lakehouse1.Tables.ResearchProduct")
Note: Apache Spark
Apache Spark notebooks and Apache Spark jobs can use shortcuts that you create in OneLake.
Relative file paths can be used to directly read data from shortcuts. Additionally, if you create a
shortcut in the Tables section of the lakehouse and it is in the Delta format, you can read it as a
managed table using Apache Spark SQL syntax.
Can use either:
df = spark.read.format("delta").load("Tables/MyShortcut")
display(df)
OR
df = spark.sql("SELECT * FROM MyLakehouse.MyShortcut LIMIT 1000")
display(df)
--
The spark.read.format("delta").load(...) method is specifically designed for reading data stored in
Delta format, which is what the Research division data for Productline1 is based on.
The path "Tables/ResearchProduct" correctly refers to the shortcut created in Lakehouse1, allowing
you to access the data efficiently.
Scenario:
An Azure Data Lake Storage Gen2 storage account named storage1 contains Research division data
for a product line named Productline1. The data is in the delta format.
The Research division data for Productline1 must be retrieved from Lakehouse1 by using Fabric
notebooks.
Planned changes include:
In Lakehouse1, create a shortcut to storage1 named ResearchProduct.
Reference: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts
29.You have a Fabric tenant that contains a complex semantic model. The model is based on a star
schema and contains many tables, including a fact table named Sales.
You need to visualize a diagram of the model. The diagram must contain only the Sales table and
related tables.
What should you use from Microsoft Power BI Desktop?
A. data categories
B. Data view
C. Model view
D. DAX query view
Answer: C
Explanation:
The Model view in Microsoft Power BI Desktop provides a visual representation of the relationships
between tables in your semantic model. It allows you to see the structure of your star schema,
including the Sales fact table and its related dimension tables. You can filter or focus on specific
tables (like the Sales table and its related tables) to create a simplified view.
31.You have a Fabric tenant that contains a warehouse. The warehouse uses row-level security
(RLS).
You create a Direct Lake semantic model that uses the Delta tables and RLS of the warehouse.
When users interact with a report built from the model, which mode will be used by the DAX queries?
A. DirectQuery
B. Dual
C. Direct Lake
D. Import
Answer: C
32.HOTSPOT
You need to migrate the Research division data for Productline2. The solution must meet the data
preparation requirements.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: delta
Delta Lake uses versioned Parquet files to store your data in your cloud storage.
Box 2: Tables/productline2
Note: Apache Spark
Apache Spark notebooks and Apache Spark jobs can use shortcuts that you create in OneLake.
Relative file paths can be used to directly read data from shortcuts. Additionally, if you create a
shortcut in the Tables section of the lakehouse and it is in the Delta format, you can read it as a
managed table using Apache Spark SQL syntax.
df = spark.read.format("delta").load("Tables/MyShortcut")
display(df)
df = spark.sql("SELECT * FROM MyLakehouse.MyShortcut LIMIT 1000")
display(df)
Scenario:
A Data Lake Storage Gen2 storage account named storage2 contains Research division data for a
product line named Productline2. The data is in the CSV format.
Requirements, Planned Changes
For the Research division, create two Fabric workspaces named Productline1ws and Productline2ws.
Data Preparation Requirements
Contoso identifies the following data preparation requirements:
* The Research division data for Productline1 must be retrieved from Lakehouse1 by using Fabric
notebooks.
*-> All the Research division data in the lakehouses must be presented as managed tables in
Lakehouse explorer.
Note:
Spark provides two types of tables that Azure Synapse exposes in SQL automatically:
* Managed tables
Spark provides many options for how to store data in managed tables, such as TEXT, CSV, JSON,
JDBC, PARQUET, ORC, HIVE, DELTA, and LIBSVM. These files are normally stored in the
warehouse directory where managed table data is stored.
* External tables
Reference:
https://learn.microsoft.com/en-us/azure/synapse-analytics/metadata/table
https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts
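For illustration, the following is a minimal PySpark sketch of the migration pattern the boxes describe: read the Productline2 CSV data and write it in Delta format under Tables so it appears as a managed table in Lakehouse explorer. The source path is a placeholder for wherever the storage2 CSV data is exposed (for example, through a Files shortcut).

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder source location for the Productline2 CSV data.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("Files/Productline2")
)

# Writing in Delta format under Tables/ makes the data a managed table in the lakehouse.
df.write.format("delta").mode("overwrite").save("Tables/productline2")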
33. Filter the query where the version date value equals the max version date value.
Use the result from the grouping operation to filter the original dataset, keeping only the rows where
VersionDate matches the calculated maximum version date for each CustomerID.
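The steps above are performed in Power Query Editor; purely for illustration, the same group-then-filter idea is sketched below in PySpark, with hypothetical CustomerID and VersionDate columns.

from pyspark.sql import SparkSession
from pyspark.sql.functions import max as max_

spark = SparkSession.builder.getOrCreate()

# Hypothetical data with multiple versions per customer.
df = spark.createDataFrame(
    [(1, "2024-05-01"), (1, "2024-06-01"), (2, "2024-04-15")],
    ["CustomerID", "VersionDate"],
)

# Maximum version date per customer.
latest = df.groupBy("CustomerID").agg(max_("VersionDate").alias("VersionDate"))

# Inner join keeps only the rows matching the per-customer maximum and adds no new columns.
result = df.join(latest, on=["CustomerID", "VersionDate"], how="inner")
result.show()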
34.HOTSPOT
You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta
table with eight columns.
You receive new data that contains the same eight columns and two additional columns.
You create a Spark DataFrame and assign the DataFrame to a variable named df. The DataFrame
contains the new data.
You need to add the new data to the Delta table to meet the following requirements:
- Keep all the existing rows.
- Ensure that all the new data is added to the table.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: append
Mode "append" atomically adds new data to an existing Delta table and "overwrite" atomically
replaces all of the data in a table.
Box 2: overwriteSchema false
Explicitly update schema to change column type or name
You can change a column’s type or name or drop a column by rewriting the table. To do this, use the
overwriteSchema option.
The following example shows changing a column type:
(spark.read.table(...)
.withColumn("birthDate", col("birthDate").cast("date"))
.write
.mode("overwrite")
.option("overwriteSchema", "true")
.saveAsTable(...)
)
When performing an overwrite, the data will be deleted before writing out the new data.
Reference: https://learn.microsoft.com/en-us/azure/databricks/delta/update-schema
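For illustration, a minimal PySpark sketch of an append to an existing Delta table; the DataFrame contents and the table name are hypothetical. By default Delta rejects incoming columns that are missing from the table schema; the standard mergeSchema write option is the Delta mechanism that adds such columns while keeping all existing rows.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the df described in the question (existing columns plus new ones).
df = spark.createDataFrame([(1, "A", 9.5)], ["OrderID", "Region", "Amount"])

# mode("append") keeps every existing row and adds the new rows. mergeSchema allows the
# extra incoming columns to be added to the table schema instead of failing the write.
(
    df.write.format("delta")
      .mode("append")
      .option("mergeSchema", "true")
      .saveAsTable("sales_delta_demo")  # hypothetical table name
)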
35. On the External Tools ribbon, select Tabular Editor. If you don’t see the Tabular Editor button,
install the program. When open, Tabular Editor will automatically connect to your model.
Step 2: Select the Address column in SalesAddress
Get DP-600 exam dumps full version.