DP-600 Implementing Analytics Solutions Using Microsoft Fabric Updated Dumps

Itfreedumps provides the latest online questions for all IT certifications,

such as IBM, Microsoft, CompTIA, Huawei, and so on.

Hot exams are available below.

AZ-204 Developing Solutions for Microsoft Azure

820-605 Cisco Customer Success Manager

MS-203 Microsoft 365 Messaging

HPE2-T37 Using HPE OneView

300-415 Implementing Cisco SD-WAN Solutions (ENSDWI)

DP-203 Data Engineering on Microsoft Azure

500-220 Engineering Cisco Meraki Solutions v1.0

NACE-CIP1-001 Coating Inspector Level 1

NACE-CIP2-001 Coating Inspector Level 2

200-301 Implementing and Administering Cisco Solutions

Some DP-600 exam questions are shared below.


1.You have a Fabric tenant that contains a lakehouse named Lakehouse1.
You need to prevent new tables added to Lakehouse1 from being added automatically to the default
semantic model of the lakehouse.
What should you configure?
A. the SQL analytics endpoint settings
B. the semantic model settings
C. the workspace settings
D. the Lakehouse1 settings
Answer: A
Explanation:
Default Power BI semantic models in Microsoft Fabric: Sync the default Power BI semantic model
Previously, all tables and views in the Warehouse were added automatically to the default Power BI
semantic model. Based on feedback, the default behavior was changed so that tables and views are not
added automatically to the default Power BI semantic model. This change ensures the background sync
will not get triggered. It also disables some actions such as "New Measure", "Create Report", and
"Analyze in Excel".
If you want to change this default behavior, you can manually enable the Sync the default Power BI
semantic model setting for each Warehouse or SQL analytics endpoint in the workspace.

2.HOTSPOT
You have a Fabric tenant.
You need to configure OneLake security for users shown in the following table.

The solution must follow the principle of least privilege.


Which permission should you assign to each user? To answer, select the appropriate options in the
answer area. NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: ReadAll
User1, Read all the Spark data
If the “Read all Apache Spark” box is checked, users will be given ReadAll. This permission allows
users to access data in OneLake. This could be through direct OneLake access, Apache Spark
queries, or the lakehouse UX.
Box 2: ReadData
User2, Read all the SQL endpoint data
If the "Read all SQL endpoint data" box is checked, users will be given the ReadData permission.
ReadData gives access to all Tables in the item when accessing through the SQL endpoint. Users
will not be able to access OneLake directly.
Reference: https://support.fabric.microsoft.com/en-us/blog/building-common-data-architectures-with-onelake-in-microsoft-fabric

3.You have source data in a folder on a local computer.


You need to create a solution that will use Fabric to populate a data store.
The solution must meet the following requirements:
- Support the use of dataflows to load and append data to the data store.
- Ensure that Delta tables are V-Order optimized and compacted automatically.
Which two types of data stores should you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. a lakehouse
B. an Azure SQL database
C. a warehouse
D. a KQL database
Answer: AC
Explanation:
Delta Lake table format interoperability
In Microsoft Fabric, the Delta Lake table format is the standard for analytics. Delta Lake is an open-
source storage layer that brings ACID (Atomicity, Consistency, Isolation, Durability) transactions to
big data and analytics workloads.
All Fabric experiences generate and consume Delta Lake tables, driving interoperability and a unified
product experience. Delta Lake tables produced by one compute engine, such as Synapse Data
Warehouse or Synapse Spark, can be consumed by any other engine, such as Power BI. When you
ingest data into Fabric, Fabric stores it as Delta tables by default. You can easily integrate external
data containing Delta Lake tables by using OneLake shortcuts.
The following matrix shows key Delta Lake features and their support on each Fabric capability.

Etc.
Reference: https://learn.microsoft.com/en-us/fabric/get-started/delta-lake-interoperability
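
To illustrate this interoperability, the following minimal PySpark sketch writes a Delta table to a
lakehouse so any other Fabric engine can consume it. The table name and sample data are hypothetical,
and the code assumes a Fabric notebook attached to a lakehouse where the spark session is predefined.

# Minimal sketch (hypothetical table name): write a Delta table to the managed area
# of a lakehouse; Fabric stores ingested data as Delta tables by default.
df = spark.createDataFrame(
    [(1, "Widget", 9.99), (2, "Gadget", 19.99)],
    ["ProductID", "ProductName", "UnitPrice"],
)
# saveAsTable registers a managed Delta table that the SQL endpoint, Power BI,
# and other engines can read.
df.write.format("delta").mode("overwrite").saveAsTable("Products")
# Reading the same table back from Spark:
spark.read.table("Products").show()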

4.You have a Fabric tenant that contains a lakehouse.


You plan to query sales data files by using the SQL endpoint. The files will be in an Amazon Simple
Storage Service (Amazon S3) storage bucket.
You need to recommend which file format to use and where to create a shortcut.
Which two actions should you include in the recommendation? Each correct answer presents part of
the solution. NOTE: Each correct answer is worth one point.
A. Create a shortcut in the Files section.
B. Use the Parquet format
C. Use the CSV format.
D. Create a shortcut in the Tables section.
E. Use the delta format.
Answer: BD

5.HOTSPOT
You have a Microsoft Power BI report and a semantic model that uses Direct Lake mode.
From Power BI Desktop, you open Performance analyzer as shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the
information presented in the graphic. NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: Automatic
The Direct Lake fallback behavior is set to Automatic.
Power BI datasets in Direct Lake mode read delta tables directly from OneLake unless they have to
fall back to DirectQuery mode. Typical fallback reasons include memory pressures that can prevent
loading of columns required to process a DAX query, and certain features at the data source might
not support Direct Lake mode, like SQL views in a Warehouse. In general, Direct Lake mode provides
the best DAX query performance unless a fallback to DirectQuery mode is necessary. Because
fallback to DirectQuery mode can impact DAX query performance, it's important to analyze query
processing for a Direct Lake dataset to identify if and how often fallbacks occur.
Note: Fallback behavior
Direct Lake models include the DirectLakeBehavior property, which has three options:
Automatic - (Default) Specifies queries fall back to DirectQuery mode if data can't be efficiently loaded
into memory.
DirectLakeOnly - Specifies all queries use Direct Lake mode only. Fallback to DirectQuery mode is
disabled. If data can't be loaded into memory, an error is returned. Use this setting to determine if
DAX queries fail to load data into memory, forcing an error to be returned.
DirectQueryOnly - Specifies all queries use DirectQuery mode only. Use this setting to test fallback
performance.
Box 2: Direct Query
In the Performance analyzer pane, select Refresh visuals, and then expand the Card visual. The card
visual doesn't cause any DirectQuery processing, which indicates the dataset was able to process the
visual’s DAX queries in Direct Lake mode.
If the dataset falls back to DirectQuery mode to process the visual’s DAX query, you see a Direct
query performance metric, as shown in the following image:
Reference:
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-analyze-qp
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-overview

6. Sign into Fabric and navigate to the workspace you want to connect with.

7.You have a Fabric tenant that uses a Microsoft Power BI Premium capacity.
You need to enable scale-out for a semantic model.
What should you do first?
A. At the semantic model level, set Large Semantic model storage format to Off.
B. At the tenant level, set Create and use Metrics to Enabled.
C. At the semantic model level, set Large Semantic model storage format to On.
D. At the tenant level, set Data Activator to Enabled.
Answer: C
Explanation:
Power BI semantic model scale-out
Prerequisites
By default, scale-out is enabled for your tenant, but it's not enabled for semantic models in your
tenant. To enable scale-out for a semantic model, you must use the Power BI REST APIs. Before
enabling, the
following prerequisites must be met:
* The Scale-out queries for large semantic models setting for your tenant is enabled (default).
* Your workspace resides on a Power BI Premium capacity
* The Large semantic model storage format setting is enabled. (This is the prerequisite addressed by this question.)
* Etc.
Note: Power BI semantic models can store data in a highly compressed in-memory cache for
optimized query performance, enabling fast user interactivity. With Premium capacities, large
semantic models beyond the default limit can be enabled with the Large semantic model storage
format setting. When enabled, semantic model size is limited by the Premium capacity size or the
maximum size set by the administrator.
Enable large semantic models
Steps here describe enabling large semantic models for a new model published to the service. For
existing semantic models, only step 3 is necessary.
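
The explanation notes that scale-out is enabled through the Power BI REST APIs but does not show the
call. The sketch below is an assumption based on the documented queryScaleOutSettings property; the
workspace ID, semantic model ID, and access token are placeholders, and the exact payload field names
should be verified against the current documentation.

# Hedged sketch only: enable scale-out for a semantic model via the Power BI REST API.
# Treat the payload field names as an assumption; IDs and the token are placeholders.
import requests

GROUP_ID = "<workspace-id>"          # placeholder
DATASET_ID = "<semantic-model-id>"   # placeholder
ACCESS_TOKEN = "<azure-ad-token>"    # acquire separately, for example with MSAL

url = f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}/datasets/{DATASET_ID}"
payload = {"queryScaleOutSettings": {"maxReadOnlyReplicas": -1}}  # -1 = service-managed replicas

response = requests.patch(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=payload,
)
response.raise_for_status()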

8.HOTSPOT
You have a Fabric tenant that contains a lakehouse named Lakehouse1.
Lakehouse1 contains a table named Nyctaxi_raw. Nyctaxi_raw contains the following table:

You create a Fabric notebook and attach it to Lakehouse1.


You need to use PySpark code to transform the data.
The solution must meet the following requirements:
- Add a column named pickupDate that will contain only the date portion of pickupDateTime.
- Filter the DataFrame to include only rows where fareAmount is a positive number that is less than
100.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: df.withColumn
Add a column named pickupDate that will contain only the date portion of pickupDateTime.
withColumn(colName, col)[source]
Returns a new DataFrame by adding a column or replacing the existing column that has the same
name. The column expression must be an expression over this DataFrame; attempting to add a column
from some other DataFrame will raise an error.
Parameters:
colName – string, name of the new column.
col – a Column expression for the new column.
>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name='Alice', age2=4), Row(age=5, name='Bob', age2=7)]
Incorrect:
* df.withColumnRenamed
withColumnRenamed(existing, new)[source]
Returns a new DataFrame by renaming an existing column; it does not add a new column. This is a
no-op if the schema doesn't contain the given column name.
Parameters:
existing – string, name of the existing column to rename.
new – string, new name of the column.
>>> df.withColumnRenamed('age', 'age2').collect()
[Row(age2=2, name='Alice'), Row(age2=5, name='Bob')]
Box 2: cast('date')
cast(dataType)[source]
Convert the column into type dataType.
>>> df.select(df.age.cast("string").alias('ages')).collect() [Row(ages='2'), Row(ages='5')]
>>> df.select(df.age.cast(StringType()).alias('ages')).collect() [Row(ages='2'), Row(ages='5')]
Box 3: .filter("fareAmount > 0 AND fareAmount < 100")
Filter the DataFrame to include only rows where fareAmount is a positive number that is less than
100.
filter(condition)[source]
Filters rows using the given condition.
where() is an alias for filter().
>>> df.filter(df.age > 3).collect() [Row(age=5, name='Bob')]
>>> df.where(df.age == 2).collect() [Row(age=2, name='Alice')]
>>> df.filter("age > 3").collect() [Row(age=5, name='Bob')]
>>> df.where("age = 2").collect() [Row(age=2, name='Alice')]
Incorrect:
* isin
isin will not give the desired result; where() itself is simply an alias for filter().
Note: In Apache Spark, the where() function can be used to filter rows in a DataFrame based on a
given condition. The condition is specified as a string that is evaluated for each row in the DataFrame.
Rows for which the condition evaluates to True are retained, while those for which it evaluates to
False are removed.
isin(*cols)[source]
A boolean expression that is evaluated to true if the value of this expression is contained by the
evaluated values of the arguments.
>>> df[df.name.isin("Bob", "Mike")].collect() [Row(age=5, name='Bob')]
>>> df[df.age.isin([1, 2, 3])].collect() [Row(age=2, name='Alice')]
Reference: https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html
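
Putting the selected options together, a minimal sketch of the whole transformation might look like the
following. It assumes a Fabric notebook attached to Lakehouse1, where the spark session is predefined,
and uses the pickupDateTime and fareAmount column names from the question.

# Sketch of the transformation described above (column names taken from the question).
from pyspark.sql.functions import col

df = spark.read.table("Nyctaxi_raw")
df = (
    df.withColumn("pickupDate", col("pickupDateTime").cast("date"))  # date portion only
      .filter("fareAmount > 0 AND fareAmount < 100")                 # positive fares under 100
)
df.show()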

9.Which syntax should you use in a notebook to access the Research division data for Productline1?
A. spark.read.format("delta").load("Tables/ResearchProduct")
B. spark.read.format("delta").load("Files/ResearchProduct")
C. external_table('Tables/ResearchProduct')
D. external_table(ResearchProduct)
Answer: A
Explanation:
Correct:
* spark.read.format("delta").load("Tables/ResearchProduct")
* spark.sql("SELECT * FROM Lakehouse1.ResearchProduct")
Incorrect:
* external_table('Tables/ResearchProduct')
* external_table(ResearchProduct)
* spark.read.format("delta").load("Files/ResearchProduct")
* spark.read.format("delta").load("Tables/productline1/ResearchProduct")
* spark.sql("SELECT * FROM Lakehouse1.Tables.ResearchProduct")
Note: Apache Spark
Apache Spark notebooks and Apache Spark jobs can use shortcuts that you create in OneLake.
Relative file paths can be used to directly read data from shortcuts. Additionally, if you create a
shortcut in the Tables section of the lakehouse and it is in the Delta format, you can read it as a
managed table using Apache Spark SQL syntax.
Can use either:
df = spark.read.format("delta").load("Tables/MyShortcut")
display(df)
OR
df = spark.sql("SELECT * FROM MyLakehouse.MyShortcut LIMIT 1000")
display(df)
Reference: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts

10.HOTSPOT
You have a Fabric tenant that contains a lakehouse named LH1.
You need to deploy a new semantic model.
The solution must meet the following requirements:
- Support complex calculated columns that include aggregate functions, calculated tables, and
Multidimensional Expressions (MDX) user hierarchies.
- Minimize page rendering times.
How should you configure the model? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:

Explanation:
Mode – Import
The Import mode allows for complex calculated columns, calculated tables, and MDX user
hierarchies. This mode loads the data into memory, enabling fast query performance and minimizing
page rendering times.
Query Caching – On
Enabling query caching improves performance by caching the results of queries, reducing the time it
takes to render pages.

11.You have a Fabric workspace named Workspace1 and an Azure SQL database.
You plan to create a dataflow that will read data from the database, and then transform the data by
performing an inner join.
You need to ignore spaces in the values when performing the inner join. The solution must minimize
development effort.
What should you do?
A. Append the queries by using fuzzy matching.
B. Merge the queries by using fuzzy matching.
C. Append the queries by using a lookup table.
D. Merge the queries by using a lookup table.
Answer: B
Explanation:
Joins are merge operations.
Join transformation in mapping data flow
Use the join transformation to combine data from two sources or streams in a mapping data flow. The
output stream will include all columns from both sources matched based on a join condition.
Inner join only outputs rows that have matching values in both tables.
Fuzzy join
You can choose to join based on fuzzy join logic instead of exact column value matching by turning
on the "Use fuzzy matching" checkbox option.
* Combine text parts: Use this option to find matches by removing the space between words. For
example, Data Factory is matched with DataFactory if this option is enabled. (This is the option that
makes the merge ignore spaces in the values.)
Similarity score column: You can optionally choose to store the matching score for each row in a
column by entering a new column name here to store that value.
Similarity threshold: Choose a value between 60 and 100 as a percentage match between values in
the columns you've selected.
Reference: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-join

12. Set the permissions for the table or column to None or Read.

13. Manually enable the Sync the default Power BI semantic model setting for each Warehouse or
SQL analytics endpoint in the workspace. This will restart the background sync that will incur some
consumption costs.

14. Select the role you want to enable an OLS definition for, and expand the Table Permissions.
Step 3: Set Object Level Security to None for SalesRegionManager

15. Question Set 3

You have a Fabric tenant that contains a workspace named Workspace1. Workspace1 is assigned to
a Fabric capacity.
You need to recommend a solution to provide users with the ability to create and publish custom
Direct Lake semantic models by using external tools. The solution must follow the principle of least
privilege.
Which three actions in the Fabric Admin portal should you include in the recommendation? Each
correct answer presents part of the solution. NOTE: Each correct answer is worth one point.
A. From the Tenant settings, set Allow XMLA Endpoints and Analyze in Excel with on-premises
datasets to Enabled.
B. From the Tenant settings, set Allow Azure Active Directory guest users to access Microsoft Fabric
to Enabled.
C. From the Tenant settings, select Users can edit data model in the Power BI service.
D. From the Capacity settings, set XMLA Endpoint to Read Write.
E. From the Tenant settings, set Users can create Fabric items to Enabled.
F. From the Tenant settings, enable Publish to Web.
Answer: ADE

16.You have a Fabric tenant that contains a semantic model.


You need to prevent report creators from populating visuals by using implicit measures.
What are two tools that you can use to achieve the goal? Each correct answer presents a complete
solution. NOTE: Each correct answer is worth one point.
A. Microsoft Power BI Desktop
B. Tabular Editor
C. Microsoft SQL Server Management Studio (SSMS)
D. DAX Studio
Answer: AB
17.You need to recommend a solution to prepare the tenant for the PoC.
Which two actions should you recommend performing from the Fabric Admin portal? Each correct
answer presents part of the solution. NOTE: Each correct answer is worth one point.
A. Enable the Users can try Microsoft Fabric paid features option for the entire organization.
B. Enable the Users can try Microsoft Fabric paid features option for specific security groups.
C. Enable the Allow Azure Active Directory guest users to access Microsoft Fabric option for specific
security groups.
D. Enable the Users can create Fabric items option and exclude specific security groups.
E. Enable the Users can create Fabric items option for specific security groups.
Answer: BE
Explanation:
B: Fabric trial capacity for the analytics team.
Scenario: Planned Changes
Litware plans to enable Fabric features in the existing tenant. The analytics team will create a new
data store as a proof of concept (PoC). The remaining Litware users will only get access to the Fabric
features once the PoC is complete. The PoC will be completed by using a Fabric trial capacity.
E: Enable the Users can create Fabric items option for the data engineers.
Scenario: The data engineers will create data pipelines to load data to OneLake either hourly or daily
depending on the data source. The analytics engineers will create processes to ingest, transform, and
load the data to the data store in the AnalyticsPOC workspace daily. Whenever possible, the data
engineers will use low-code tools for data ingestion. The choice of which data cleansing and
transformation tools to use will be at the data engineers’ discretion.

18.HOTSPOT
You have a Fabric workspace that uses the default Spark starter pool and runtime version 1.2.
You plan to read a CSV file named Sales_raw.csv in a lakehouse, select columns, and save the data
as a Delta table to the managed area of the lakehouse. Sales_raw.csv contains 12 columns.
You have the following code.

For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE:
Each correct selection is worth one point.
Answer:

Explanation:
Box 1: Yes
Yes - The Spark engine will read only the
'SalesOrderNumber','OrderDate','CustomerName','UnitPrice'
columns from Sales_raw.csv.
Note:
DataFrame.select(*cols: ColumnOrName) → DataFrame[source]
Projects a set of expressions and returns a new DataFrame.
Parameters
cols – str, Column, or list
column names (string) or expressions (Column). If one of the column names is ‘*’, that column is
expanded to include all columns in the current DataFrame.
Box 2: No
No - The Year column does not replace the OrderDate column in the table; withColumn adds one
extra column alongside it.
Note: DataFrame.withColumns(*colsMap) → DataFrame[source]
Returns a new DataFrame by adding multiple columns or replacing the existing columns that have the
same names.
colsMap is a map of column name to Column expression; each column must only refer to attributes
supplied by this Dataset. It is an error to add columns that refer to some other Dataset.
Box 3: Yes
Yes - Adding inferSchema='true' to the options will increase the execution time of the query.
When you set inferSchema to True, PySpark will make an additional pass over the data to determine
the data types of each column. This can be useful when you don't have a predefined schema for your
data and want Spark to automatically deduce the types based on the actual data values.
Reference:
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.select.html
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.withColumns.html
https://medium.com/@sujathamudadla1213/what-are-the-considerations-and-implications-of-setting-inferschema-to-true-and-false-in-pyspark-9fc77fa2ad9a
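
The statements above can be tied together with a short sketch. The file path, the explicit options, and
the target table name are assumptions for illustration, while the four column names come from the
explanation.

# Sketch: read Sales_raw.csv, project four of the twelve columns, add a Year column,
# and save the result as a managed Delta table. Path and table name are assumptions.
from pyspark.sql.functions import year, col

df = (
    spark.read
         .option("header", "true")
         .option("inferSchema", "true")   # triggers an extra pass over the file, adding execution time
         .csv("Files/Sales_raw.csv")
         .select("SalesOrderNumber", "OrderDate", "CustomerName", "UnitPrice")
)

# withColumn adds a Year column alongside OrderDate rather than replacing it.
df = df.withColumn("Year", year(col("OrderDate")))

df.write.format("delta").saveAsTable("sales")   # managed table in the lakehouse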

19.DRAG DROP
You have a Fabric workspace that contains a Dataflow Gen2 query. The query returns the following
data.

You need to filter the results to ensure that only the latest version of each customer’s record is
retained.
The solution must ensure that no new columns are loaded to the semantic model.
Which four actions should you perform in sequence in Power Query Editor? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Answer:

Explanation:
20. Publish the model as a semantic model to the service.

21.You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a
Delta table that has one million Parquet files.
You need to remove files that were NOT referenced by the table during the past 30 days. The solution
must ensure that the transaction log remains consistent, and the ACID properties of the table are
maintained.
What should you do?
A. From OneLake file explorer, delete the files.
B. Run the OPTIMIZE command and specify the Z-order parameter.
C. Run the OPTIMIZE command and specify the V-order parameter.
D. Run the VACUUM command.
Answer: D
Explanation:
VACUUM
Applies to: Databricks SQL, Databricks Runtime
Removes unused files from a table directory.
VACUUM removes all files from the table directory that are not managed by Delta, as well as data
files that are no longer in the latest state of the transaction log for the table and are older than a
retention threshold.
Incorrect:
Not B: What is Z order optimization?
Z-ordering is a technique to colocate related information in the same set of files. This co-locality is
automatically used by Delta Lake on Azure Databricks data-skipping algorithms. This behavior
dramatically
reduces the amount of data that Delta Lake on Azure Databricks needs to read.
Not C: Delta Lake table optimization and V-Order
V-Order is a write time optimization to the parquet file format that enables lightning-fast reads under
the
Microsoft Fabric compute engines, such as Power BI, SQL, Spark, and others.
Power BI and SQL engines make use of Microsoft Verti-Scan technology and V-Ordered parquet files
to achieve in-memory like data access times. Spark and other non-Verti-Scan compute engines also
benefit from the V-Ordered files with an average of 10% faster read times, with some scenarios up to
50%.
V-Order works by applying special sorting, row group distribution, dictionary encoding and
compression on parquet files, thus requiring less network, disk, and CPU resources in compute
engines to read it, providing cost efficiency and performance. V-Order sorting has a 15% impact on
average write times but provides up to 50% more compression.
Reference:
https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order
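
As a minimal sketch of the VACUUM approach (the table name is hypothetical; 30 days corresponds to a
720-hour retention threshold):

# Sketch: remove files that are no longer referenced by the Delta table and are
# older than 30 days (720 hours). The table name is hypothetical.
spark.sql("VACUUM SalesOrders RETAIN 720 HOURS")

# Equivalent call through the Delta Lake Python API:
from delta.tables import DeltaTable
DeltaTable.forName(spark, "SalesOrders").vacuum(720)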

22.Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a Fabric tenant that contains a new semantic model in OneLake.
You use a Fabric notebook to read the data into a Spark DataFrame.
You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all
the string and numeric columns.
Solution: You use the following PySpark expression:
df.explain()
Does this meet the goal?
A. Yes
B. No
Answer: B
Explanation:
Correct: You use the following PySpark expression:
df.summary()
Incorrect:
* df.describe().show()
* df.show()
* df.explain().show()
* df.explain()
explain(extended=False)[source]
Prints the (logical and physical) plans to the console for debugging purpose.
Parameters: extended – boolean, default False. If False, prints only the physical plan.
>>> df.explain()
== Physical Plan ==
Scan ExistingRDD[age#0,name#1]
>>> df.explain(True)
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
Note:
summary(*statistics)[source]
Computes specified statistics for numeric and string columns. Available statistics are: - count - mean -
stddev - min - max - arbitrary approximate percentiles specified as a percentage (eg, 75%)
If no statistics are given, this function computes count, mean, stddev, min, approximate quartiles
(percentiles at 25%, 50%, and 75%), and max.
Note: This function is meant for exploratory data analysis, as we make no guarantee about the
backward compatibility of the schema of the resulting DataFrame.
>>> df.summary().show()
The output is a table with one row per statistic (count, mean, stddev, min, 25%, 50%, 75%, max) and
one column per DataFrame column.
Reference: https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html
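
A small self-contained example of summary() (the sample data is illustrative) shows both the default
call and a call restricted to the statistics named in the question:

# Illustrative data only; summary() works on numeric and string columns.
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

df.summary().show()                                # count, mean, stddev, min, quartiles, max
df.summary("min", "max", "mean", "stddev").show()  # only the statistics from the question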

23. Expand Workloads. In the XMLA Endpoint setting, select Read Write. The XMLA Endpoint setting
applies to all workspaces and semantic models assigned to the capacity.
Reference: https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools

24.HOTSPOT
You have a data warehouse that contains a table named Stage.Customers. Stage.Customers
contains all the customer record updates from a customer relationship management (CRM) system.
There can be multiple updates per customer.
You need to write a T-SQL query that will return the customer ID, name, postal code, and the last
updated time of the most recent row for each customer ID.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:

Explanation:
Box 1: ROW_NUMBER()
* ROW_NUMBER()
Numbers the output of a result set. More specifically, returns the sequential number of a row within a
partition of a result set, starting at 1 for the first row in each partition.
Syntax: ROW_NUMBER ()
OVER ([ PARTITION BY value_expression , ... [ n ] ] order_by_clause )
Incorrect:
* LAST_VALUE() Incorrect syntax.
Note: LAST_VALUE (Transact-SQL)
Returns the last value in an ordered set of values.
Syntax
LAST_VALUE ([ scalar_expression ])[ IGNORE NULLS | RESPECT NULLS ] OVER ([
partition_by_clause ] order_by_clause [ rows_range_clause ] )
* LAST_Value()
* NTILE()
Incorrect syntax used.
Note: NTILE (Transact-SQL)
Distributes the rows in an ordered partition into a specified number of groups. The groups are
numbered, starting at one. For each row, NTILE returns the number of the group to which the row
belongs.
NTILE (integer_expression) OVER ([ <partition_by_clause> ] < order_by_clause > )
Box 2: WHERE X = 1
Reference:
https://learn.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql
https://learn.microsoft.com/en-us/sql/t-sql/functions/ntile-transact-sql
https://learn.microsoft.com/en-us/sql/t-sql/functions/last-value-transact-sql
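
For comparison only (the exam item itself is T-SQL), the same keep-the-latest-row pattern can be
sketched in PySpark; the table and column names below are assumptions based on the question text.

# PySpark analogue of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) followed by
# a filter on row number 1. Table and column names are assumptions.
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, col

stage_customers = spark.read.table("Stage_Customers")

w = Window.partitionBy("CustomerID").orderBy(col("LastUpdated").desc())

latest = (
    stage_customers
    .withColumn("X", row_number().over(w))
    .filter("X = 1")                                   # keep the most recent row per customer
    .select("CustomerID", "CustomerName", "PostalCode", "LastUpdated")
)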

25. Create a new domain:


- This requires the ability to manage domains, which is specifically granted by the domain admin role.

26.Note: This section contains one or more sets of questions with the same scenario and problem.
Each question presents a unique solution to the problem. You must determine whether the solution
meets the stated goals. More than one solution in the set might solve the problem. It is also possible
that none of the solutions in the set solve the problem.
After you answer a question in this section, you will NOT be able to return. As a result, these
questions do not appear on the Review Screen.
Your network contains an on-premises Active Directory Domain Services (AD DS) domain named
contoso.com that syncs with a Microsoft Entra tenant by using Microsoft Entra Connect.
You have a Fabric tenant that contains a semantic model.
You enable dynamic row-level security (RLS) for the model and deploy the model to the Fabric
service.
You query a measure that includes the USERNAME() function, and the query returns a blank result.
You need to ensure that the measure returns the user principal name (UPN) of a user.
Solution: You update the measure to use the USERPRINCIPALNAME() function.
Does this meet the goal?
A. Yes
B. No
Answer: A
Explanation:
The USERPRINCIPALNAME () function directly retrieves the UPN of the user querying the measure.
This is the most appropriate function to use if your goal is to obtain the UPN, which is the format
typically used in environments that integrate with Microsoft Entra.

27.DRAG DROP
You are creating a data flow in Fabric to ingest data from an Azure SQL database by using a T-SQL
statement.
You need to ensure that any foldable Power Query transformation steps are processed by the
Microsoft SQL Server engine.
How should you complete the code? To answer, drag the appropriate values to the correct targets.
Each value may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content. NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Value
Query folding on native queries
Use Value.NativeQuery function
The goal of this process is to execute the following SQL code, and to apply more transformations with
Power Query that can be folded back to the source.
SELECT DepartmentID, Name FROM HumanResources.Department WHERE GroupName =
'Research and Development'
The first step was to define the correct target, which in this case is the database where the SQL code
will be run. Once a step has the correct target, you can select that step (in this case, Source in
Applied Steps) and then select the fx button in the formula bar to add a custom step. In this example,
replace the Source formula with the following formula:
Value.NativeQuery(Source, "SELECT DepartmentID, Name FROM HumanResources.Department
WHERE GroupName = 'Research and Development'", null, [EnableFolding=true])
Box 2: NativeQuery
Box 3: EnableFolding
The most important component of this formula is the use of the optional record for the fourth
parameter of the function, which has the EnableFolding record field set to true.

Reference: https://learn.microsoft.com/en-us/power-query/native-query-folding

28. Prepare data

Testlet 1
Case study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on
this exam. You must manage your time to ensure that you are able to complete all questions included
on this exam in the time provided.

To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is independent
of the other questions in this case study.

At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin a
new section, you cannot return to this section.

To start the case study


To display the first question in this case study, click the Next button. Use the buttons in the left pane
to explore the content of the case study before you answer the questions. Clicking these buttons
displays information such as business requirements, existing environment, and problem statements. If
the case study has an All Information tab, note that the information displayed is identical to the
information displayed on the subsequent tabs. When you are ready to answer a question, click the
Question button to return to the question.

Overview
Contoso, Ltd. is a US-based health supplements company. Contoso has two divisions named Sales
and Research. The Sales division contains two departments named Online Sales and Retail Sales.
The Research division assigns internally developed product lines to individual teams of researchers
and analysts.

Existing Environment
Identity Environment
Contoso has a Microsoft Entra tenant named contoso.com. The tenant contains two groups named
ResearchReviewersGroup1 and ResearchReviewersGroup2.

Data Environment
Contoso has the following data environment:
- The Sales division uses a Microsoft Power BI Premium capacity.
- The semantic model of the Online Sales department includes a fact table named Orders that uses
Import mode. In the system of origin, the OrderID value represents the sequence in which orders are
created.
- The Research department uses an on-premises, third-party data warehousing product.
- Fabric is enabled for contoso.com.
- An Azure Data Lake Storage Gen2 storage account named storage1 contains Research division
data for a product line named Productline1. The data is in the delta format.
- A Data Lake Storage Gen2 storage account named storage2 contains Research division data for a
product line named Productline2. The data is in the CSV format.

Requirements
Planned Changes
Contoso plans to make the following changes:
- Enable support for Fabric in the Power BI Premium capacity used by the Sales division.
- Make all the data for the Sales division and the Research division available in Fabric.
- For the Research division, create two Fabric workspaces named Productline1ws and
Productline2ws.
- In Productline1ws, create a lakehouse named Lakehouse1.
- In Lakehouse1, create a shortcut to storage1 named ResearchProduct.

Data Analytics Requirements


Contoso identifies the following data analytics requirements:
- All the workspaces for the Sales division and the Research division must support all Fabric
experiences.
- The Research division workspaces must use a dedicated, on-demand capacity that has per-minute
billing.
- The Research division workspaces must be grouped together logically to support OneLake data hub
filtering based on the department name.
- For the Research division workspaces, the members of ResearchReviewersGroup1 must be able to
read lakehouse and warehouse data and shortcuts by using SQL endpoints.
- For the Research division workspaces, the members of ResearchReviewersGroup2 must be able to
read lakehouse data by using Lakehouse explorer.
- All the semantic models and reports for the Research division must use version control that supports
branching.

Data Preparation Requirements


Contoso identifies the following data preparation requirements:
- The Research division data for Productline1 must be retrieved from Lakehouse1 by using Fabric
notebooks.
- All the Research division data in the lakehouses must be presented as managed tables in
Lakehouse explorer.

Semantic Model Requirements


Contoso identifies the following requirements for implementing and managing semantic models:
- The number of rows added to the Orders table during refreshes must be minimized.
- The semantic models in the Research division workspaces must use Direct Lake mode.

General Requirements
Contoso identifies the following high-level requirements that must be considered for all solutions:
- Follow the principle of least privilege when applicable.
- Minimize implementation and maintenance effort when possible.

Which syntax should you use in a notebook to access the Research division data for Productline1?
A. spark.read.format("delta").load("Tables/ResearchProduct")
B. spark.read.format("delta").load("Files/ResearchProduct")
C. spark.sql("SELECT * FROM Lakehouse1.productline1.ResearchProduct")
D. spark.read.format("delta").load("Tables/productline1/ResearchProduct")
Answer: A
Explanation:
Correct:
* spark.read.format("delta").load("Tables/ResearchProduct")
* spark.sql("SELECT * FROM Lakehouse1.ResearchProduct")
Incorrect:
* external_table(ResearchProduct)
* external_table(Tables/ResearchProduct)
* spark.read.format("delta").load("Files/ResearchProduct")
* spark.read.format("delta").load("Tables/productline1/ResearchProduct")
* spark.sql("SELECT * FROM Lakehouse1.productline1.ResearchProduct")
* spark.sql("SELECT * FROM Lakehouse1.Tables.ResearchProduct")
Note: Apache Spark
Apache Spark notebooks and Apache Spark jobs can use shortcuts that you create in OneLake.
Relative file paths can be used to directly read data from shortcuts. Additionally, if you create a
shortcut in the Tables section of the lakehouse and it is in the Delta format, you can read it as a
managed table using Apache Spark SQL syntax.
Can use either:
df = spark.read.format("delta").load("Tables/MyShortcut")
display(df)
OR
df = spark.sql("SELECT * FROM MyLakehouse.MyShortcut LIMIT 1000")
display(df)
--
The spark.read.format("delta").load(...) method is specifically designed for reading data stored in
Delta format, which is what the Research division data for Productline1 is based on.
The path "Tables/ResearchProduct" correctly refers to the shortcut created in Lakehouse1, allowing
you to access the data efficiently.
Scenario:
An Azure Data Lake Storage Gen2 storage account named storage1 contains Research division data
for a product line named Productline1. The data is in the delta format.
The Research division data for Productline1 must be retrieved from Lakehouse1 by using Fabric
notebooks.
Planned changes include:
In Lakehouse1, create a shortcut to storage1 named ResearchProduct.
Reference: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts

29.You have a Fabric tenant that contains a complex semantic model. The model is based on a star
schema and contains many tables, including a fact table named Sales.
You need to visualize a diagram of the model. The diagram must contain only the Sales table and
related tables.
What should you use from Microsoft Power BI Desktop?
A. data categories
B. Data view
C. Model view
D. DAX query view
Answer: C
Explanation:
The Model view in Microsoft Power BI Desktop provides a visual representation of the relationships
between tables in your semantic model. It allows you to see the structure of your star schema,
including the Sales fact table and its related dimension tables. You can filter or focus on specific
tables (like the Sales table and its related tables) to create a simplified view.

30.You have a Fabric tenant.


You are creating a Fabric Data Factory pipeline.
You have a stored procedure that returns the number of active customers and their average sales for
the current month.
You need to add an activity that will execute the stored procedure in a warehouse. The returned
values must be available to the downstream activities of the pipeline.
Which type of activity should you add?
A. Switch
B. KQL
C. Append variable
D. Lookup
Answer: D
Explanation:
Lookup Activity
Lookup Activity can be used to read or look up a record/ table name/ value from any external source.
This output can further be referenced by succeeding activities.
Note: Lookup activity can retrieve a dataset from any of the data sources supported by data factory
and Synapse pipelines. You can use it to dynamically determine which objects to operate on in a
subsequent activity, instead of hard coding the object name. Some object examples are files and
tables.
Lookup activity reads and returns the content of a configuration file or table. It also returns the result
of executing a query or stored procedure. The output can be a singleton value or an array of
attributes, which can be consumed in a subsequent copy, transformation, or control flow activities like
ForEach activity.
Incorrect:
* Append variable
Append Variable activity in Azure Data Factory and Synapse Analytics
Use the Append Variable activity to add a value to an existing array variable defined in a Data Factory
or Synapse Analytics pipeline
* Copy data
In Data Pipeline, you can use the Copy activity to copy data among data stores located in the cloud.
After you copy the data, you can use other activities to further transform and analyze it. You can also
use the Copy activity to publish transformation and analysis results for business intelligence (BI) and
application consumption.
* KQL
The KQL activity in Data Factory for Microsoft Fabric allows you to run a query in Kusto Query
Language (KQL) against an Azure Data Explorer instance.
* Switch
The Switch activity in Microsoft Fabric provides the same functionality that a switch statement
provides in programming languages. It evaluates a set of activities corresponding to a case that
matches the condition evaluation.
Reference:
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-append-variable-activity

31.You have a Fabric tenant that contains a warehouse. The warehouse uses row-level security
(RLS).
You create a Direct Lake semantic model that uses the Delta tables and RLS of the warehouse.
When users interact with a report built from the model, which mode will be used by the DAX queries?
A. DirectQuery
B. Dual
C. Direct Lake
D. Import
Answer: A
Explanation:
When row-level security is defined in the warehouse (enforced by its SQL analytics endpoint), a
Direct Lake semantic model cannot read the Delta files directly, so the DAX queries fall back to
DirectQuery mode to ensure the warehouse RLS rules are applied.

32.HOTSPOT
You need to migrate the Research division data for Productline2. The solution must meet the data
preparation requirements.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: delta
Delta Lake uses versioned Parquet files to store your data in your cloud storage.
Box 2: Tables/productline2
Note: Apache Spark
Apache Spark notebooks and Apache Spark jobs can use shortcuts that you create in OneLake.
Relative file paths can be used to directly read data from shortcuts. Additionally, if you create a
shortcut in the Tables section of the lakehouse and it is in the Delta format, you can read it as a
managed table using Apache Spark SQL syntax.
df = spark.read.format("delta").load("Tables/MyShortcut")
display(df)
df = spark.sql("SELECT * FROM MyLakehouse.MyShortcut LIMIT 1000")
display(df)
Scenario:
A Data Lake Storage Gen2 storage account named storage2 contains Research division data for a
product line named Productline2. The data is in the CSV format.
Requirements, Planned Changes
For the Research division, create two Fabric workspaces named Productline1ws and Productline2ws.
Data Preparation Requirements
Contoso identifies the following data preparation requirements:
* The Research division data for Productline1 must be retrieved from Lakehouse1 by using Fabric
notebooks.
* All the Research division data in the lakehouses must be presented as managed tables in
Lakehouse explorer.
Note:
Spark provides two types of tables that Azure Synapse exposes in SQL automatically:
* Managed tables
Spark provides many options for how to store data in managed tables, such as TEXT, CSV, JSON,
JDBC, PARQUET, ORC, HIVE, DELTA, and LIBSVM. These files are normally stored in the
warehouse directory where managed table data is stored.
* External tables
Reference:
https://learn.microsoft.com/en-us/azure/synapse-analytics/metadata/table
https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts
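
A minimal sketch of the migration the two boxes describe is shown below. The CSV source path under
the Files section is an assumption, while the delta format and the Tables/productline2 destination
come from the answer.

# Sketch: load the Productline2 CSV data and write it in delta format to the Tables
# area so it appears as a managed table in Lakehouse explorer. Source path is assumed.
df = (
    spark.read
         .option("header", "true")
         .csv("Files/productline2")          # assumed shortcut/path to the storage2 CSV data
)

df.write.format("delta").mode("overwrite").save("Tables/productline2")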

33. Filter the query where the version date value equals the max version date value.
Use the result from the grouping operation to filter the original dataset, keeping only the rows where
VersionDate matches the calculated maximum version date for each CustomerID.

34.HOTSPOT
You have a Fabric tenant that contains lakehouse named Lakehouse1. Lakehouse1 contains a Delta
table with eight columns.
You receive new data that contains the same eight columns and two additional columns.
You create a Spark DataFrame and assign the DataFrame to a variable named df. The DataFrame
contains the new data.
You need to add the new data to the Delta table to meet the following requirements:
- Keep all the existing rows.
- Ensure that all the new data is added to the table.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: append
Mode "append" atomically adds new data to an existing Delta table and "overwrite" atomically
replaces all of the data in a table.
Box 2: mergeSchema set to true
Because the new data contains two additional columns, the write must use schema evolution. Setting
the mergeSchema option to true lets Delta Lake add the new columns to the table schema, while the
append keeps all the existing rows and adds all the new rows.
Incorrect:
* overwriteSchema
overwriteSchema is used to explicitly rewrite the schema (for example, to change a column's type or
name, or to drop a column) and applies when rewriting the table with mode("overwrite"). When
performing an overwrite, the existing data is deleted before the new data is written out, which would
not keep the existing rows.
The following example shows changing a column type:
(spark.read.table(...)
.withColumn("birthDate", col("birthDate").cast("date"))
.write
.mode("overwrite")
.option("overwriteSchema", "true")
.saveAsTable(...)
)
Reference: https://learn.microsoft.com/en-us/azure/databricks/delta/update-schema
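
A minimal sketch of appending the data with schema evolution (the target table name is hypothetical):

# Sketch: append the new rows, which include two additional columns, and let Delta
# evolve the table schema. The table name is hypothetical.
(
    df.write
      .format("delta")
      .mode("append")                     # keeps all the existing rows
      .option("mergeSchema", "true")      # allows the two new columns to be added
      .saveAsTable("Sales")
)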

35. On the External Tools ribbon, select Tabular Editor. If you don’t see the Tabular Editor button,
install the program. When open, Tabular Editor will automatically connect to your model.
Step 2: Select the Address column in SalesAddress
Get DP-600 exam dumps full version.
