DP-700 Exam Prep Study Notes

The document contains study notes on using Microsoft Fabric for data ingestion, real-time intelligence, and implementing lakehouses. It includes training modules, code snippets for data processing with PySpark, and best practices for using Kusto Query Language. Additionally, it covers Delta Lake table management and optimization techniques within Microsoft Fabric.

Notes from Study

Wednesday, December 18, 2024 12:44 PM

Each entry below records: Learning Path, Module, Time, Notes, and Links to check.


Learning Path: Ingest data with Microsoft Fabric - Training | Microsoft Learn
Module: Ingest Data with Dataflows in Microsoft Fabric - Training | Microsoft Learn (8 minutes)
Links to check: Power Query documentation - Power Query | Microsoft Learn

Learning Path: Ingest data with Microsoft Fabric - Training | Microsoft Learn
Module: Orchestrate processes and data movement with Microsoft Fabric - Training | Microsoft Learn (72 minutes)
Notes: Practice PySpark; look for PySpark syntax.

df.write.format("delta").mode("append").saveAsTable("sales")

from pyspark.sql.functions import *

# Read the new sales data
df = spark.read.format("csv").option("header","true").load("Files/RawData/Sales")

# Derive FirstName and LastName columns
df = df.withColumn("FirstName", split(col("CustomerName"), " ").getItem(0)).withColumn("LastName", split(col("CustomerName"), " ").getItem(1))

# Add Year and Month columns
df = df.withColumn("Year", year(col("OrderDate"))).withColumn("Month", month(col("OrderDate")))
display(df)

Review how to use parameters with notebooks
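A minimal sketch of notebook parameterization in Fabric, assuming the first cell below has been toggled as a parameter cell and that the built-in mssparkutils/notebookutils utilities are available in the notebook session; the notebook name "LoadSales" and the parameter names are hypothetical:

# Parameter cell (defaults, overridden by the caller)
load_date = "2024-12-01"
source_folder = "Files/RawData/Sales"

# From another notebook (or a pipeline Notebook activity's base parameters),
# the defaults above can be overridden when the notebook is invoked:
from notebookutils import mssparkutils

result = mssparkutils.notebook.run(
    "LoadSales",                                   # hypothetical notebook name
    600,                                           # timeout in seconds
    {"load_date": "2024-12-02", "source_folder": "Files/RawData/Sales"}
)
print(result)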


Learning Path: Ingest data with Microsoft Fabric - Training | Microsoft Learn and Implement Real-Time Intelligence with Microsoft Fabric - Training | Microsoft Learn
Module: Get started with Real-Time Intelligence in Microsoft Fabric - Training | Microsoft Learn (48 minutes)
Notes: Review links for join types.

Weather
| summarize EventCount = count() by State
| sort by EventCount

Links to check:
Tutorial: Learn common Kusto Query Language operators - Kusto | Microsoft Learn
Tutorial: Use aggregation functions in Kusto Query Language - Kusto | Microsoft Learn
Tutorial: Join data from multiple tables - Kusto | Microsoft Learn
join operator - Kusto | Microsoft Learn
Supported sources in Real-Time hub - Microsoft Fabric | Microsoft Learn
Process event data with the event processor editor - Microsoft Fabric | Microsoft Learn
Add and manage eventstream destinations - Microsoft Fabric | Microsoft Learn

Notes: Review Round.

Weather
| extend damage = DamageProperty + DamageCrops
| summarize sum(damage) by bin(StartTime, 7d)
| render columnchart

Weather
| extend damage = DamageProperty + DamageCrops
| summarize sum(damage) by EventType
| render piechart

Links to check:
Write your first query with Kusto Query Language - Training | Microsoft Learn
Explore the fundamentals of data analysis using Kusto Query Language (KQL) - Training | Microsoft Learn
Gain insights from your data by using Kusto Query Language - Training | Microsoft Learn
Write multi-table queries by using Kusto Query Language - Training | Microsoft Learn

Learning Path: Ingest data with Microsoft Fabric - Training | Microsoft Learn and Implement Real-Time Intelligence with Microsoft Fabric - Training | Microsoft Learn
Module: Use real-time eventstreams in Microsoft Fabric - Training | Microsoft Learn (32 minutes)
Notes: Review window functions.
Links to check: https://learn.microsoft.com/en-us/training/modules/explore-event-streams-microsoft-fabric/4-route-event-data-to-destinations

Learning Path: Ingest data with Microsoft Fabric - Training | Microsoft Learn and Implement Real-Time Intelligence with Microsoft Fabric - Training | Microsoft Learn
Module: Work with real-time data in a Microsoft Fabric eventhouse - Training | Microsoft Learn (17 minutes)
Notes: Review KQL best practices.

Functions: getmonth(), getyear(), hourofday(), now(), ago(30min), ago(1d), ingestion_time()
summarize SummaryColumnName = avg(ValueColumnToSumUp) by ColumnToGroupByWith

Review materialized view syntax:
.create materialized-view NameOfView on table NameOfTable
.create async materialized-view with (backfill=true) --> to ingest existing data

Review function syntax:
.create-or-alter function trips_by_min_passenger_count(num_passengers:long)
case(isempty(pickup_boroname) or isnull(pickup_boroname), "Unidentified", pickup_boroname)


Learning Path: Implement Real-Time Intelligence with Microsoft Fabric - Training | Microsoft Learn
Module: Create a Real-Time Dashboard - Microsoft Fabric | Microsoft Learn (51 minutes)
Notes:
arg_max(): finds a row in the table that maximizes the specified expression. It returns all columns of the input table or specified columns.

bikes
| where ingestion_time() between (ago(30min) .. now())
| summarize latest_observation = arg_max(ingestion_time(), *) by Neighbourhood
| project Neighbourhood, latest_observation, No_Bikes, No_Empty_Docks
| order by Neighbourhood asc

bikes
| where ingestion_time() between (ago(30min)..now())
    and (isempty(['selected_neighbourhoods']) or Neighbourhood in (['selected_neighbourhoods']))
| summarize latest_observation = arg_max(ingestion_time(), *) by Neighbourhood

Links to check:
Use parameters in Real-Time Dashboards - Microsoft Fabric | Microsoft Learn
Create real-time dashboards with Microsoft Fabric - Training | Microsoft Learn
arg_max() (aggregation function) - Kusto | Microsoft Learn
Best practices for Kusto Query Language queries - Kusto | Microsoft Learn
Named expressions - Kusto | Microsoft Learn

For best performance, if one table is always smaller than the other, use it as the left side of the join operator.

From <https://learn.microsoft.com/en-us/training/modules/multi-table-queries-with-kusto-query-language/2-multi-table-queries>
The materialize() function caches results within a query execution for subsequent reuse in the query. It's like taking a snapshot of the results of a subquery and
using it multiple times within the query. This function is useful in optimizing queries for scenarios where the results:
Are expensive to compute
Are nondeterministic

From <https://learn.microsoft.com/en-us/training/modules/multi-table-queries-with-kusto-query-language/2-multi-table-queries>

Learning Path: Implement a Lakehouse with Microsoft Fabric DP-601T00 - Training | Microsoft Learn
Module: Introduction to end-to-end analytics using Microsoft Fabric - Training | Microsoft Learn (18 minutes)
Notes: Review admin roles and workspace settings.
Links to check: Workspaces in Microsoft Fabric and Power BI - Microsoft Fabric | Microsoft Learn

Learning Path: Implement a Lakehouse with Microsoft Fabric DP-601T00 - Training | Microsoft Learn
Module: Get started with lakehouses in Microsoft Fabric - Training | Microsoft Learn (60 minutes)
Notes: Maybe review Spark job definitions.
Links to check:
Security in Microsoft Fabric - Microsoft Fabric | Microsoft Learn
Create an Apache Spark job definition - Microsoft Fabric | Microsoft Learn
Unify data sources with OneLake shortcuts - Microsoft Fabric | Microsoft Learn
Workspaces in Microsoft Fabric and Power BI - Microsoft Fabric | Microsoft Learn
Roles in workspaces in Microsoft Fabric - Microsoft Fabric | Microsoft Learn


Learning Path: Implement a Lakehouse with Microsoft Fabric DP-601T00 - Training | Microsoft Learn
Module: Use Apache Spark in Microsoft Fabric - Training | Microsoft Learn (120 minutes)
Notes: Review code in notebook 1. Master link: search for the Data Engineering documentation.

df = spark.read.format("CSV").option("header","true").load("Files/orders/2019.csv")
orders_df.write.partitionBy("Year","Month").mode("overwrite").parquet("Files/partitioned_data")
df.write.format("delta").saveAsTable("salesorders")

Links to check:
Data Engineering in Microsoft Fabric documentation - Microsoft Fabric | Microsoft Learn
Data engineering and science capacity admin settings - Microsoft Fabric | Microsoft Learn
Manage settings for data engineering and science capacity - Microsoft Fabric | Microsoft Learn
Configure and manage starter pools in Fabric Spark - Microsoft Fabric | Microsoft Learn
Create custom Apache Spark pools in Fabric - Microsoft Fabric | Microsoft Learn
Apache Spark runtime in Fabric - Microsoft Fabric | Microsoft Learn
Create, configure, and use an environment in Fabric - Microsoft Fabric | Microsoft Learn
High concurrency mode in Apache Spark compute for Fabric - Microsoft Fabric | Microsoft Learn
from pyspark.sql.functions import *
# Create Year and Month columns
transformed_df = df.withColumn("Year", year(col("OrderDate"))).withColumn("Month", month(col("OrderDate")))
# Create the new FirstName and LastName fields
transformed_df = transformed_df.withColumn("FirstName", split(col("CustomerName"), " ").getItem(0)).withColumn("LastName",
split(col("CustomerName"), " ").getItem(1))
# Filter and reorder columns
transformed_df = transformed_df["SalesOrderNumber", "SalesOrderLineNumber", "OrderDate", "Year", "Month", "FirstName",
"LastName", "Email", "Item", "Quantity", "UnitPrice", "Tax"]
# Display the first five orders
display(transformed_df.limit(5))

sqlQuery = "SELECT CAST(YEAR(OrderDate) AS CHAR(4)) AS OrderYear, \


SUM((UnitPrice * Quantity) + Tax) AS GrossRevenue \
FROM salesorders \
GROUP BY CAST(YEAR(OrderDate) AS CHAR(4)) \
ORDER BY OrderYear"
df_spark = spark.sql(sqlQuery)
df_spark.show()


Learning Path: Implement a Lakehouse with Microsoft Fabric DP-601T00 - Training | Microsoft Learn
Module: Work with Delta Lake tables in Microsoft Fabric - Training | Microsoft Learn (1 hour)
Links to check:
Use delta tables with streaming data - Training | Microsoft Learn --> Read again
Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn --> Read again
Using optimize write on Apache Spark to produce more efficient tables - Azure Synapse Analytics | Microsoft Learn
Low Shuffle Merge optimization on Delta tables - Azure Synapse Analytics | Microsoft Learn
Delta table maintenance in Microsoft Fabric - Microsoft Fabric | Microsoft Learn
Compute management in Fabric environments - Microsoft Fabric | Microsoft Learn
Apache Spark compute for Data Engineering and Data Science - Microsoft Fabric | Microsoft Learn
Interesting but preview, won't be in exam: Native execution engine for Fabric Spark - Microsoft Fabric | Microsoft Learn

Notes:

%%sql
SET spark.sql.parquet.vorder.enabled=TRUE

%%sql
CREATE TABLE person (id INT, name STRING, age INT) USING parquet TBLPROPERTIES("delta.parquet.vorder.enabled" = "true");

%%sql
ALTER TABLE person SET TBLPROPERTIES("delta.parquet.vorder.enabled" = "true");

ALTER TABLE person SET TBLPROPERTIES("delta.parquet.vorder.enabled" = "false");

ALTER TABLE person UNSET TBLPROPERTIES("delta.parquet.vorder.enabled");

-- When session-level V-Order is not enabled or is unset, individual write operations need this:
.option("parquet.vorder.enabled","true")\

Merge optimization: for handling unmodified rows


spark.microsoft.delta.merge.lowShuffle.enabled
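A quick sketch of enabling the low shuffle merge optimization for the current Spark session, using the setting named above (whether it is already on by default depends on the Fabric runtime):

# Enable low shuffle merge for this session and confirm the setting
spark.conf.set("spark.microsoft.delta.merge.lowShuffle.enabled", "true")
print(spark.conf.get("spark.microsoft.delta.merge.lowShuffle.enabled"))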

Bin-compaction is achieved by the OPTIMIZE command; it merges all changes into bigger, consolidated parquet files.
Dereferenced storage clean-up is achieved by the VACUUM command.

Control V-Order when optimizing a table

The following command structures bin-compact and rewrite all affected files using V-Order,
independent of the TBLPROPERTIES setting or session configuration setting
%%sql
OPTIMIZE <table|fileOrFolderPath> VORDER;

OPTIMIZE <table|fileOrFolderPath> WHERE <predicate> VORDER;

OPTIMIZE <table|fileOrFolderPath> WHERE <predicate> [ZORDER BY (col_name1, col_name2, ...)] VORDER;

Apache Spark performs bin-compaction, ZORDER, VORDER sequentially.

The following commands bin-compact and rewrite all affected files using the TBLPROPERTIES setting:

%%sql
OPTIMIZE <table|fileOrFolderPath>;

OPTIMIZE <table|fileOrFolderPath> WHERE predicate;

OPTIMIZE <table|fileOrFolderPath> WHERE predicate [ZORDER BY (col_name1, col_name2, ...)];

Optimized Write: It dynamically optimizes partitions while generating files with a default 128-MB size.
Benefits of Optimized Write:
OPTIMIZE operations will be faster as it will operate on fewer files.
VACUUM command for deletion of old unreferenced files will also operate faster.
Queries will scan fewer files with more optimal file sizes, improving either read performance or resource usage.

When to avoid it:
Non-partitioned tables.
Use cases where extra write latency isn't acceptable.
Large tables with well-defined optimization schedules and read patterns.

spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

SET `spark.microsoft.delta.optimizeWrite.enabled` = true


spark.conf.get("spark.microsoft.delta.optimizeWrite.enabled")
Using table properties vs. session level: SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)

Get bin size: spark.conf.get("spark.microsoft.delta.optimizeWrite.binSize")


SET `spark.microsoft.delta.optimizeWrite.binSize`

OPTIMIZE file size guidance: at least 128 MB, and optimally close to 1 GB.
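As a sketch, the Optimized Write bin size can be read and adjusted toward the guidance above; the assumption here is that the value is expressed in bytes (1073741824 bytes = 1 GB):

# Read the current Optimized Write bin size
print(spark.conf.get("spark.microsoft.delta.optimizeWrite.binSize"))

# Assumed to be in bytes: set the target bin size to 1 GB
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")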

Create a table with the DeltaTableBuilder API:

%%pyspark
from delta.tables import *
DeltaTable.create(spark) \
    .tableName("products") \
    .addColumn("Productid", "INT") \
    .addColumn("ProductName", "STRING") \
    .addColumn("Category", "STRING") \
    .addColumn("Price", "FLOAT") \
    .execute()

Create a table with SQL:

%%sql
CREATE TABLE salesorders
(
    Orderid INT NOT NULL,
    OrderDate TIMESTAMP NOT NULL,
    CustomerName STRING,
    SalesTotal FLOAT NOT NULL
)
USING DELTA

Create an external table:

%%sql
CREATE TABLE MyExternalTable
USING DELTA
LOCATION 'Files/mydata'

Save a DataFrame as Delta files:

delta_path = "Files/mydatatable"
df.write.format("delta").save(delta_path)
new_df.write.format("delta").mode("overwrite").save(delta_path)
new_rows_df.write.format("delta").mode("append").save(delta_path)

In Microsoft Fabric, OptimizeWrite is enabled by default.

# Disable Optimize Write at the Spark session level


spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", False)

# Enable Optimize Write at the Spark session level


spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", True)

print(spark.conf.get("spark.microsoft.delta.optimizeWrite.enabled"))

In Microsoft Fabric, the Power BI and SQL engines use Microsoft Verti-Scan technology.
V-Order might not be beneficial for write-intensive scenarios such as staging data stores where data is only read once or twice. In these situations, disabling V-Order might reduce the overall processing time for data ingestion.

VACUUM WITH SQL


%%sql
VACUUM lakehouse2.products RETAIN 168 HOURS;

%%sql
DESCRIBE HISTORY lakehouse2.products;

df.write.format("delta").partitionBy("Category").saveAsTable("partitioned_products", path="abfs_path/partitioned_products")
%%sql
CREATE TABLE partitioned_products (
ProductID INTEGER,
ProductName STRING,
Category STRING,
ListPrice DOUBLE
)
PARTITIONED BY (Category);

spark.sql("INSERT INTO products VALUES (1, 'Widget', 'Accessories', 2.99)")


or
%%sql
UPDATE products
SET Price = 2.49 WHERE ProductId = 1;

Use the Delta API:


from delta.tables import *
from pyspark.sql.functions import *
# Create a DeltaTable object
delta_path = "Files/mytable"
deltaTable = DeltaTable.forPath(spark, delta_path)
# Update the table (reduce price of accessories by 10%)
deltaTable.update( condition = "Category == 'Accessories'", set = { "Price": "Price * 0.9" })

Use time travel to work with table versioning


%%sql
DESCRIBE HISTORY products (Table name or external path)

df = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
df = spark.read.format("delta").option("timestampAsOf", '2022-01-01').load(delta_path)

Learning Path: Implement a Lakehouse with Microsoft Fabric DP-601T00 - Training | Microsoft Learn
Module: Ingest Data with Dataflows in Microsoft Fabric - Training | Microsoft Learn
Notes: Repeat from Learning Path 1.

Learning Path: Implement a Lakehouse with Microsoft Fabric DP-601T00 - Training | Microsoft Learn
Module: Orchestrate processes and data movement with Microsoft Fabric - Training | Microsoft Learn (redid lab part: 20 minutes, with some error handling)
Notes: Repeat from Learning Path 1. Note: Notebook parameterization is in this module!
Learning Path: Implement a Lakehouse with Microsoft Fabric DP-601T00 - Training | Microsoft Learn
Module: Organize a Fabric lakehouse using medallion architecture design - Training | Microsoft Learn (1 hour, because of the dimensional model load in the lab)
Notes: Review PySpark syntax. Review code in DP700Study_TransformDataForSilver for the UPSERT statement.
Review lab: https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/03b-medallion-lakehouse.html
Links to check: Overview of Fabric Git integration - Microsoft Fabric | Microsoft Learn

from pyspark.sql.functions import when, lit, col, current_timestamp, input_file_name

# Add columns IsFlagged, CreatedTS and ModifiedTS
df = df.withColumn("FileName", input_file_name()) \
    .withColumn("IsFlagged", when(col("OrderDate") < '2019-08-01', True).otherwise(False)) \
    .withColumn("CreatedTS", current_timestamp()).withColumn("ModifiedTS", current_timestamp())

# Update CustomerName to "Unknown" if CustomerName null or empty

df = df.withColumn("CustomerName", when((col("CustomerName").isNull() | (col("CustomerName")=="")),lit("Unknown")).otherwise(col("CustomerName")))

dfdimDate_gold = df.dropDuplicates(["OrderDate"]).select(col("OrderDate"), \
dayofmonth("OrderDate").alias("Day"), \
month("OrderDate").alias("Month"), \
year("OrderDate").alias("Year"), \

date_format(col("OrderDate"), "MMM-yyyy").alias("mmmyyyy"), \
date_format(col("OrderDate"), "yyyyMM").alias("yyyymm"), \
).orderBy("OrderDate")

monotonically_increasing_id()
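monotonically_increasing_id() is commonly used to generate surrogate keys when building gold-layer dimensions. A minimal sketch, assuming the dfdimDate_gold DataFrame from the snippet above; the column name DateID is a hypothetical example:

from pyspark.sql.functions import monotonically_increasing_id

# Add a surrogate key column (values are unique and increasing, but not consecutive)
dfdimDate_gold = dfdimDate_gold.withColumn("DateID", monotonically_increasing_id())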
Learning Path: Implement a data warehouse with Microsoft Fabric DP-602T00 - Training | Microsoft Learn
Module: Get started with data warehouses in Microsoft Fabric - Training | Microsoft Learn (15 minutes)
Notes: Unit 6: Secure and monitor your data warehouse - Training | Microsoft Learn:
Read: Allows the user to CONNECT using the SQL connection string.
ReadData: Allows the user to read data from any table or view within the warehouse.
ReadAll: Allows the user to read the raw parquet files in OneLake that can be consumed by Spark.

sys.dm_exec_connections: Returns information about each connection established between the warehouse and the engine.
sys.dm_exec_sessions: Returns information about each session authenticated between the item and engine.
sys.dm_exec_requests: Returns information about each active request in a session.

Links to check:
Workspaces in Power BI - Power BI | Microsoft Learn
Roles in workspaces in Power BI - Power BI | Microsoft Learn

KILL 'SESSION_ID WITH LONG-RUNNING QUERY';

Member, Contributor, and Viewer roles can see their own results within the warehouse, but cannot see other users' results.
Learning Path: Implement a data warehouse with Microsoft Fabric DP-602T00 - Training | Microsoft Learn
Module: Load data into a Microsoft Fabric data warehouse - Training | Microsoft Learn (1 hour)
Notes: Unit 2: Explore data load strategies - Training | Microsoft Learn
Type 0 SCD: The dimension attributes never change.
Type 1 SCD: Overwrites existing data, doesn't keep history.
Type 2 SCD: Adds new records for changes, keeps full history for a given natural key (see the PySpark sketch after this list).
Type 3 SCD: History is added as a new column.
Type 4 SCD: A new dimension is added.
Type 5 SCD: When certain attributes of a large dimension change over time, but using type 2 isn't feasible due to the dimension’s large size.
Type 6 SCD: Combination of type 2 and type 3.
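A condensed PySpark sketch of a Type 2 SCD load against a Delta dimension table. Everything here is hypothetical and illustrative, not the module's exact lab code: it assumes a table dim_customer with columns CustomerID (natural key), Name, IsCurrent, ValidFrom, ValidTo, and an updates DataFrame updates_df with CustomerID and Name.

from delta.tables import DeltaTable
from pyspark.sql.functions import current_timestamp, lit

dim = DeltaTable.forName(spark, "dim_customer")   # hypothetical dimension table

# Step 1: expire the current rows whose tracked attributes changed
dim.alias("t").merge(
    updates_df.alias("s"),
    "t.CustomerID = s.CustomerID AND t.IsCurrent = true"
).whenMatchedUpdate(
    condition="t.Name <> s.Name",
    set={"IsCurrent": lit(False), "ValidTo": current_timestamp()}
).execute()

# Step 2: append the new versions as current rows
# (a real load would first filter updates_df down to genuinely new or changed members)
new_rows = (updates_df
            .withColumn("IsCurrent", lit(True))
            .withColumn("ValidFrom", current_timestamp())
            .withColumn("ValidTo", lit(None).cast("timestamp")))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")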

Unit 4: https://learn.microsoft.com/en-us/training/modules/load-data-into-microsoft-fabric-data-warehouse/4-load-data-using-tsql

COPY INTO my_table
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.csv,
https://myaccount.blob.core.windows.net/myblobcontainer/folder1/'
WITH (
FILE_TYPE = 'CSV',
CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>'),
FIELDTERMINATOR = '|'
)

COPY INTO test_parquet


FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.parquet'
WITH (
CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
)

CREATE TABLE AS SELECT:


Allows you to create a new table based on the output of a SELECT statement.
This operation is often used for creating a copy of a table or for transforming and loading the results of complex queries.

INSERT...SELECT
Allows you to insert data from one table into another.
It’s useful when you want to copy data from one table to another without creating a new table.

When working with external data on files, we recommend that files are at least 4 MB in size.

Learning Path: Implement a data warehouse with Microsoft Fabric DP-602T00 - Training | Microsoft Learn
Module: Query a data warehouse in Microsoft Fabric - Training | Microsoft Learn (5 minutes)
Notes:

SELECT ProductCategory,
ProductName,
ListPrice,
ROW_NUMBER() OVER
(PARTITION BY ProductCategory ORDER BY ListPrice DESC) AS RowNumber,
RANK() OVER
(PARTITION BY ProductCategory ORDER BY ListPrice DESC) AS Rank,
DENSE_RANK() OVER
(PARTITION BY ProductCategory ORDER BY ListPrice DESC) AS DenseRank,
NTILE(4) OVER
(PARTITION BY ProductCategory ORDER BY ListPrice DESC) AS Quartile
FROM dbo.DimProduct
ORDER BY ProductCategory;
Learning Path: Implement a data warehouse with Microsoft Fabric DP-602T00 - Training | Microsoft Learn
Module: Monitor a Microsoft Fabric data warehouse - Training | Microsoft Learn (1 hour 15 minutes)
Links to check (search term "Fabric Capacity Metrics" in Learn to get to the first page; also search term "fabric operations"):
Understand the metrics app compute page - Microsoft Fabric | Microsoft Learn
Plan your capacity size - Microsoft Fabric | Microsoft Learn
Metrics app calculations - Microsoft Fabric | Microsoft Learn
Evaluate and optimize your Microsoft Fabric capacity - Microsoft Fabric | Microsoft Learn
Understand your Fabric capacity throttling - Microsoft Fabric | Microsoft Learn
Data warehouse billing and utilization reporting - Microsoft Fabric | Microsoft Learn
Monitor connections, sessions, and requests using DMVs - Microsoft Fabric | Microsoft Learn
Query insights - Microsoft Fabric | Microsoft Learn

Notes:
In Spark, one CU translates to two Spark vCores of compute. For example, when a customer purchases an F64 SKU, 128 Spark vCores are available for Spark experiences. All Spark operations are background operations, and they're smoothed over a 24-hour period.

You can view the number of executors allocated to a notebook in the Fabric monitoring hub.

KQL database CU consumption is calculated based on the number of seconds the database is active and the number of vCores used. For example, when your database uses four vCores and is active for 10 minutes, you'll consume 2,400 (4 x 10 x 60) seconds of CU.

All KQL database operations are interactive operations.

All Data Factory operations are considered background operations, and they're smoothed over a 24-hour period.

"The first phase of throttling begins when a capacity has consumed all its available CU resources for the next 10 minutes. For example, if you purchased 10 units of capacity and then consumed 50 units per minute, you would create a carryforward of 40 units per minute. After two and a half minutes, you would have accumulated a carryforward of 100 units, borrowed from future windows. At this point, where all capacity is already exhausted for the next 10 minutes, Fabric initiates its first level of throttling, and all new interactive operations are delayed by 20 seconds upon submission. If the carryforward reaches a full hour, interactive requests are rejected, but scheduled background operations continue to run. If the capacity accumulates a full 24 hours of carryforward, the entire capacity is frozen until the carryforward is paid off."

In simple terms, 1 Fabric capacity unit = 0.5 Warehouse vCores. For example, a Fabric capacity SKU F64 has 64 capacity units, which is equivalent to 32 Warehouse vCores.

From <https://learn.microsoft.com/en-us/fabric/data-warehouse/usage-reporting>
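A back-of-the-envelope check of the conversion rules above (2 Spark vCores per CU, 0.5 Warehouse vCores per CU) and the KQL database CU-seconds example, written as a small Python sketch:

# F64 SKU: 64 capacity units (CUs)
capacity_units = 64
spark_vcores = capacity_units * 2          # 128 Spark vCores
warehouse_vcores = capacity_units * 0.5    # 32 Warehouse vCores

# KQL database: 4 vCores active for 10 minutes
kql_cu_seconds = 4 * 10 * 60               # 2,400 seconds of CU
print(spark_vcores, warehouse_vcores, kql_cu_seconds)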

Learning Path: Implement a data warehouse with Microsoft Fabric DP-602T00 - Training | Microsoft Learn
Module: Secure a Microsoft Fabric data warehouse - Training | Microsoft Learn (25 minutes)
Notes: No outside links.

DDM
-- For Email
ALTER TABLE Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- For PhoneNumber
ALTER TABLE Customers
ALTER COLUMN PhoneNumber ADD MASKED WITH (FUNCTION = 'partial(3,"XXX-XXX-",4)');

-- For CreditCardNumber
ALTER TABLE Customers
ALTER COLUMN CreditCardNumber ADD MASKED WITH (FUNCTION = 'partial(4,"XXXX-XXXX-XXXX-",4)');

RLS

--Create a schema
CREATE SCHEMA [Sec];
GO

--Create the filter predicate


CREATE FUNCTION sec.tvf_SecurityPredicatebyTenant(@TenantName AS NVARCHAR(10))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS result
WHERE @TenantName = USER_NAME() OR USER_NAME() = '[email protected]';
GO

--Create security policy and add the filter predicate


CREATE SECURITY POLICY sec.SalesPolicy
ADD FILTER PREDICATE sec.tvf_SecurityPredicatebyTenant(TenantName) ON [dbo].[Sales]
WITH (STATE = ON);
GO

CLS
-- Create roles
CREATE ROLE Doctor AUTHORIZATION dbo;
CREATE ROLE Nurse AUTHORIZATION dbo;
CREATE ROLE Receptionist AUTHORIZATION dbo;
CREATE ROLE Patient AUTHORIZATION dbo;
GO

-- Grant SELECT on all columns to all roles


GRANT SELECT ON dbo.Patients TO Doctor;
GRANT SELECT ON dbo.Patients TO Nurse;
GRANT SELECT ON dbo.Patients TO Receptionist;
GRANT SELECT ON dbo.Patients TO Patient;
GO

-- Deny SELECT on the MedicalHistory column to the Receptionist and Patient roles
DENY SELECT ON dbo.Patients (MedicalHistory) TO Receptionist;
DENY SELECT ON dbo.Patients (MedicalHistory) TO Patient;
GO

Always use parameterization methods like sp_executesql or QUOTENAME to sanitize inputs.
From <https://learn.microsoft.com/en-us/training/modules/secure-data-warehouse-in-microsoft-fabric/5-configure-sql-granular-permissions>

CREATE PROCEDURE sp_TopTenRows @tableName NVARCHAR(128)
AS
BEGIN
DECLARE @query NVARCHAR(MAX);
SET @query = N'SELECT TOP 10 * FROM ' + QUOTENAME(@tableName);
EXEC sp_executesql @query;
END;

GRANT UNMASK ON dbo.Customers TO [<username>@<your_domain>.com];

From <https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/06d-secure-data-warehouse.html>

Learning Path: Manage a Microsoft Fabric environment - Training | Microsoft Learn
Module: Implement continuous integration and continuous delivery (CI/CD) in Microsoft Fabric - Training | Microsoft Learn (+24 minutes, +18 minutes, +20 minutes)



Learning Path: Manage a Microsoft Fabric environment - Training | Microsoft Learn
Module: Monitor activities in Microsoft Fabric - Training | Microsoft Learn (+20 minutes, +30 minutes)
Notes: Column options in Monitor Hub:
• Activity name
• Status
• Item type
• Start time
• Submitted by
• Location
• End time
• Duration
• Refresh type

From <https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/18-monitor-hub.html>

Links to check:
Activator tutorial using sample data - Microsoft Fabric | Microsoft Learn
Apache Spark monitoring overview - Microsoft Fabric | Microsoft Learn

Learning Path: Manage a Microsoft Fabric environment - Training | Microsoft Learn
Module: https://learn.microsoft.com/en-us/training/modules/secure-data-access-in-fabric/ (30 minutes)
Notes: Within each data item, granular engine permissions such as Read, ReadData, or ReadAll can be applied. Workspace roles can be assigned to individuals, security groups, Microsoft 365 groups, and distribution lists. Search "Roles in workspaces".
Links to check: Roles in workspaces in Microsoft Fabric - Microsoft Fabric | Microsoft Learn
Admin - Can view, modify, share, and manage all content and data in the workspace, and manage permissions.
Member - Can view, modify, and share all content and data in the workspace.
Contributor - Can view and modify all content and data in the workspace.
Viewer - Can view all content and data in the workspace, but can't modify it.

Learning Path: Manage a Microsoft Fabric environment - Training | Microsoft Learn
Module: Administer a Microsoft Fabric environment - Training | Microsoft Learn (+15 minutes, +15 minutes)
Notes:
Tenant is a dedicated space for organizations to create, store, and manage Fabric items.
Capacity is a dedicated set of resources that is available at a given time to be used.
Domain is a logical grouping of workspaces.
Workspace is a collection of items that brings together different functionality in a single tenant.

The rest of the items:

Configure domain workspace settings:
https://learn.microsoft.com/en-us/fabric/governance/domains#configure-domain-settings

Configure data workflow workspace settings:
Workspaces in Microsoft Fabric and Power BI - Microsoft Fabric | Microsoft Learn
Configuring dataflow storage to use Azure Data Lake Gen 2 - Power BI | Microsoft Learn

Implement database projects:
https://learn.microsoft.com/en-us/fabric/data-warehouse/source-control#database-projects-for-a-warehouse-in-git
Fabricators guide to database projects for Microsoft Fabric Data Warehouses - Kevin Chant
Three ways to create a Microsoft Fabric Data Warehouse Database Project - Kevin Chant

Apply sensitivity labels to items:
Apply sensitivity labels to Fabric items - Microsoft Fabric | Microsoft Learn
How to apply sensitivity labels in Power BI - Power BI | Microsoft Learn
Enable sensitivity labels in Fabric - Power BI | Microsoft Learn

Implement orchestration patterns with notebooks and pipelines, including parameters and dynamic expressions:
You can use parameters to pass external values into pipelines. Once the parameter is passed into the resource, it can't be changed.
@ is only removed if it is the first character. "@@" returns "@", " @" returns " @".
Search for "dynamic expressions fabric pipelines".
String interpolation: the result is always a string. @{X} returns the value of X in string format.
@{pipeline().parameters.firstName}
For Notebook parameters: Develop, execute, and manage notebooks - Microsoft Fabric | Microsoft Learn
"@pipeline().parameters.myNumber" returns 42 as a number.
"@{pipeline().parameters.myNumber}" returns 42 as a string.
Links to check:
Parameters - Microsoft Fabric | Microsoft Learn
Expressions and functions - Microsoft Fabric | Microsoft Learn

Design and implement full and incremental data loads:

select * from data_source_table where LastModifytime > '@{activity('LookupOldWaterMarkActivity').output.firstRow.WatermarkValue}' and LastModifytime <= '@{activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue}'

Links to check:
Incrementally load data from Data Warehouse to Lakehouse - Microsoft Fabric | Microsoft Learn
Incrementally copy new and changed files based on the last modified date - Microsoft Fabric | Microsoft Learn

To incrementally copy files based on timestamp:


In the Copy activity under Advanced choose Filter by Last modified:
For every 5 minutes: @formatDateTime(addMinutes(pipeline().TriggerTime, -5), 'yyyy-MM-dd HH:mm:ss')
For every x minutes: @formatDateTime(addMinutes(pipeline().TriggerTime, -<your set repeat minute>), 'yyyy-MM-dd HH:mm:ss')

AddHours(…,-x)
AddDays(…,-1)
AddDays(…,-7)
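The cutoff that the pipeline expression computes with addMinutes(pipeline().TriggerTime, -5) can be reproduced in a notebook for testing. A minimal Python sketch, assuming UTC trigger times (trigger_time here is a stand-in for pipeline().TriggerTime):

from datetime import datetime, timedelta, timezone

# Stand-in for pipeline().TriggerTime
trigger_time = datetime.now(timezone.utc)

# Equivalent of @formatDateTime(addMinutes(pipeline().TriggerTime, -5), 'yyyy-MM-dd HH:mm:ss')
cutoff = (trigger_time - timedelta(minutes=5)).strftime("%Y-%m-%d %H:%M:%S")
print(cutoff)  # only files modified after this timestamp would be copied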

Implement mirroring:

To successfully configure Mirroring for Azure SQL Database, the principal used to connect to the source Azure SQL Database must be granted the permission ALTER ANY EXTERNAL MIRROR, which is included in higher-level permissions like the CONTROL permission or the db_owner role.

When mirroring data from Azure SQL Database or Azure SQL Managed Instance, its System Assigned Managed Identity needs to have "Read and write" permission to the mirrored database. If you create the mirrored database from the Fabric portal, the permission is granted automatically.

Links to check:
Mirroring - Microsoft Fabric | Microsoft Learn
Microsoft Fabric Mirrored Databases From Azure SQL Database - Microsoft Fabric | Microsoft Learn
Tutorial: Configure Microsoft Fabric Mirrored Databases From Azure SQL Database - Microsoft Fabric | Microsoft Learn
Limitations and Behaviors for Fabric Mirrored Databases From Azure SQL Database - Microsoft Fabric | Microsoft Learn
Share Your Mirrored Database and Manage Permissions - Microsoft Fabric | Microsoft Learn

By default, sharing a mirrored database grants users Read permission to the mirrored database, the associated SQL analytics endpoint, and the default semantic model. In addition to these default permissions, you can grant: Read all SQL analytics endpoint data, Read all OneLake data, Build reports on the default semantic model, Read and write.

Currently, you must update your Azure SQL logical server firewall rules to Allow public network access.
You must enable the Allow Azure services option to connect to your Azure SQL Database logical server.

The SPN for Azure SQL DB must have the Contributor role in the workspace that has the mirrored database.

Denormalize data:

All that's known about the dimension member is its natural key. The fact load process needs to create a new dimension member by using Unknown attribute values. Importantly, it must set the IsInferredMember audit attribute to TRUE. That way, when the late-arriving details are sourced, the dimension load process can make the necessary updates to the dimension row. For more information, see Manage historical change in this article.

Links to check:
Modeling dimension tables in Warehouse - Microsoft Fabric | Microsoft Learn
Modeling fact tables in Warehouse - Microsoft Fabric | Microsoft Learn
Load tables in a dimensional model - Microsoft Fabric | Microsoft Learn

Handle duplicate, missing, and late-arriving data:
Load tables in a dimensional model - Microsoft Fabric | Microsoft Learn
Optimize a pipeline:

Fabric workspace admins can enable the high concurrency mode for pipelines using the workspace settings.

Intelligent throughput optimization and Parallel copy.

Links to check:
Copy activity performance with SQL databases - Microsoft Fabric | Microsoft Learn
Configure high concurrency mode for notebooks in pipelines - Microsoft Fabric | Microsoft Learn
Copy activity performance and scalability guide - Microsoft Fabric | Microsoft Learn
Staging is required when the Copy activity sink is Fabric Warehouse. Options such as Degree of copy parallelism and Intelligent throughput optimization only
apply in that case from Source to Staging. Test cases to Lakehouse did not have staging enabled.

Dynamic range with a Degree of parallel copies can significantly improve performance.

Within the For-Each activity, all of the copy activities run in parallel (up to the batch count maximum of 50) and have degree of copy parallelism set to Auto.

From <https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases>

Partition option: Specify the data partitioning options used to load data from Azure SQL Database. Allowed values are: None (default), Physical partitions of
table, and Dynamic range. When a partition option is enabled (that is, not None), the degree of parallelism to concurrently load data from an Azure SQL
Database is controlled by the parallel copy setting on the copy activity.

It's recommended to leave Isolation level as None if you want to leave Degree of copy parallelism set to Auto.

If the table has a physical partition, then using the Partition option: Physical partitions of table would be the most balanced approach for transfer
duration, capacity units, and compute overhead on the source. This setting is especially ideal if you have more sessions running against the database during the
time of data movement.

From <https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases>

Consider maintainability and developer effort. While leaving the default options takes the longest time to move data, running with the defaults might be the best option, especially if the source table's DDL is unknown. This also provides reasonable Capacity Units consumption.
Optimize a data warehouse:

Consider updating column-level statistics regularly after data changes that significantly change the rowcount or distribution of the data.

Disabling V-Order can be useful for write-intensive warehouses, such as for warehouses that are dedicated to staging data as part of a data ingestion process.

Group INSERT statements into batches (avoid trickle inserts).

Consider using CTAS (Transact-SQL) to write the data you want to keep in a table rather than using DELETE. If a CTAS takes the same amount of time, it's safer to run since it has minimal transaction logging and can be canceled quickly if needed.

Use integer-based data types if possible. SORT, JOIN, and GROUP BY operations complete faster on integers than on character data.

There is no manual way to trigger data compaction.

The Warehouse and SQL analytics endpoint have a user session limit of 724 per workspace.

The Microsoft Fabric workspace provides a natural isolation boundary of the distributed compute system.

OneLake shortcuts can be used to create read-only replicas of tables in other workspaces to distribute load across multiple SQL engines, creating an isolation boundary. This can effectively increase the maximum number of sessions performing read-only queries.

Links to check:
Statistics - Microsoft Fabric | Microsoft Learn
Understand V-Order - Microsoft Fabric | Microsoft Learn
Caching in Fabric data warehousing - Microsoft Fabric | Microsoft Learn
Warehouse performance guidelines - Microsoft Fabric | Microsoft Learn
https://blog.fabric.microsoft.com/en-US/blog/announcing-automatic-data-compaction-for-fabric-warehouse//
https://learn.microsoft.com/en-us/fabric/data-warehouse/disable-v-order
https://learn.microsoft.com/en-us/fabric/data-warehouse/v-order
Workload management - Microsoft Fabric | Microsoft Learn

Optimize query performance (SQL): SQL Query Optimization: 12 Useful Performance Tuning Tips and Techniques

Use EXISTS() instead of COUNT().
Keep wildcards at the end of phrases -- keeps the predicate SARGable.
Avoid using multiple ORs in the filter predicate.
Use BETWEEN instead of > and <.

SELECT *
FROM TestTable
WHERE DATEPART(YEAR, SomeMyDate) = '2021'; --> BAD: the function on the column prevents index use

Choose between a pipeline and a notebook: Fabric decision guide - copy activity, dataflow, or Spark - Microsoft Fabric | Microsoft Learn

Choose an appropriate data store: Fabric decision guide - choose a data store - Microsoft Fabric | Microsoft Learn

Denormalize data: Modeling dimension tables in Warehouse - Microsoft Fabric | Microsoft Learn

Optimize a data warehouse: Warehouse performance guidelines - Microsoft Fabric | Microsoft Learn

Optimize query performance: SQL Query Optimization: 12 Useful Performance Tuning Tips and Techniques

Pipeline error handling:
Pipeline failure and error message - Azure Data Factory | Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft Learn
Monitor Azure Data Factory - Azure Data Factory | Microsoft Learn
Add parameters to data factory components - Training | Microsoft Learn
Dedupe rows and find nulls by using data flow snippets - Azure Data Factory | Microsoft Learn
Orchestrating data movement and transformation in Azure Data Factory - Training | Microsoft Learn
Mapping data flow script - Azure Data Factory | Microsoft Learn
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-script#distinct-row-using-all-columns



Generic error handling

We determine pipeline success and failures as follows:

Evaluate the outcome of all leaf activities. If a leaf activity was skipped, we evaluate its parent activity instead.
The pipeline result is a success if and only if all evaluated nodes succeed.

After an activity has run and completed, you may reference its status with @activity('ActivityName').Status. It's either "Succeeded" or "Failed". We use this property to build conditional logic.

@or(or(equals(activity('ActivityFailed').Status, 'Failed'),
equals(activity('ActivitySucceeded1').Status,
'Failed')),equals(activity('ActivitySucceeded1').Status, 'Failed'))
@or(equals(activity('ActivityFailed').Status, 'Succeeded'),
equals(activity('ActivitySucceeded').Status, 'Succeeded'))
@and(equals(activity('ActivityFailed').Status, 'Succeeded'),
equals(activity('ActivitySucceeded').Status, 'Succeeded'))

The pattern is equivalent to a try-catch block in coding. For instance, I attempt to run a copy job, moving files into storage. However, it might fail halfway through. In that case, I want to delete the partially copied, unreliable files from the storage account (my error handling step), but I'm OK to proceed with other activities afterwards.

SQL Reference: Learn how to navigate to this so that you can use it during the exam if needed. Transact-SQL Reference (Database Engine) - SQL Server | Microsoft Learn
Other DP700 exam videos
Will MJ be the first CERTIFIED Fabric Data Engineer? DP700

DP-700 Beta Exam Review: Tips to Pass and Become a Microsoft Certified Fabric Engineer
