Synergy in Analytics: Unifying Azure Databricks and Microsoft Fabric

Contents

• Empower modern data analytics with Azure Databricks and Microsoft Fabric
• Simplify analytics workloads with Azure Databricks and Microsoft Fabric
• Medallion architecture in Azure Databricks and Microsoft Fabric
• Use lakehouse data with Azure Databricks and Microsoft Fabric
• Better together: Azure Databricks, Unity Catalogue and Microsoft Fabric Purview
• Data Factory and Azure Databricks activity in Microsoft Fabric
• Enhance organisational capabilities with generative AI
• Explore real-world use cases with hands-on examples
• Achieve excellence with Azure Databricks and Microsoft Fabric
• Next steps
With the unique combination of the vast storage capacity of a data lake and the structured,
query-optimised environment of a data warehouse, the modern lakehouse emerges as the ideal
platform for developing and deploying AI algorithms. This dual capability ensures that AI projects
can utilise the necessary computational power and data accessibility, speeding up innovation and
reducing overhead costs associated with managing separate data systems.
By simplifying data architecture and reducing infrastructure complexity, businesses can focus on creating value through AI rather than grappling with data management challenges in the following ways:

• The lakehouse architecture stores Delta Lake files in an ADLS account. This cloud storage service is extremely cost effective, and the Delta Lake format allows the storage of both structured and unstructured data.

Enterprises can use the advanced machine learning and AI capabilities of Azure Databricks and Fabric on their full data estate stored in a data lakehouse. These tools include end-to-end experiment management and automated machine learning toolkits that can super-charge AI projects.
The three layers of the medallion architecture are:

1. Bronze (raw): In this layer, raw data is initially ingested, retaining its original form. It acts as a staging area and is crucial for capturing the full granularity of data without any loss of fidelity.

2. Silver (validated): Data from the bronze layer is cleansed, de-duplicated and conformed, providing a validated view that is ready for broader analysis.

3. Gold (enriched): Data is aggregated and modelled for business consumption, powering reporting, analytics and machine learning.

Azure Databricks and Fabric utilise this architecture to enhance their data management and analytical offerings. Together, they create a robust environment in which data not only flows seamlessly through each stage of the medallion architecture, but is also enriched and made more accessible.
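To make the bronze layer concrete, the following is a minimal PySpark sketch of a raw ingest into a bronze Delta table. The landing path and table name (raw_events.json and bronze_events) are illustrative assumptions, not names used elsewhere in this guide.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("BronzeIngest").getOrCreate()

# Read the landing file as-is; the bronze layer keeps the original granularity
raw_df = spark.read.json("Files/landing/raw_events.json")

# Add ingestion metadata, then append to the bronze Delta table without reshaping the data
(raw_df
    .withColumn("_ingested_at", F.current_timestamp())
    .write.format("delta")
    .mode("append")
    .saveAsTable("bronze_events"))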
Figure: The lakehouse supports data warehousing and AI workloads, with Unity Catalogue (UC) providing governance, sharing and integration.
The default dataset includes all tables from the lakehouse, allowing users to establish
relationships and apply various modelling changes. These datasets from Unity Catalogue can
be directly published to Power BI. Users can access and edit a published semantic model with
the web modelling editor that’s accessible through Power BI.
In the model view within the web modelling editor, you can see whether there’s a Direct Lake
connection by hovering the cursor over the table headers. Direct Lake also allows for the
creation of new Power BI datasets directly through the web. This process ensures the use of
Direct Lake for the connection. To learn more about using the web editor for semantic models,
the following document will help you get started: Edit data models in the Power BI service
(preview) – Power BI | Microsoft Learn.
These scenarios can be handled by a new feature in Fabric called Data Activator. This no-code tool monitors data in a Power BI report and automatically takes action if the data matches certain patterns or hits specified thresholds. When these events occur, Data Activator can take an action such as alerting a user or launching a Power Automate workflow.

In order to enable Data Activator, please follow the official documentation here: https://learn.microsoft.com/fabric/data-activator/

To create an alert with Data Activator when a freezer’s temperature falls below 30° F in a Power BI report, follow these steps for monitoring freezer temperatures within a Fabric workspace:

3. Click the ellipsis (…) in the top-right corner of the temperature visual and select Set Alert, or use the Set Alert button found in the Power BI toolbar.

4. In the Set Alert pane, specify how you wish to receive alerts (email or Teams). If your visual includes multiple freezers (dimensions), use the For each dropdown to select the specific dimension (freezer) to monitor.

5. Define the alert condition, such as when the temperature drops below 30° F. Data Activator will monitor the temperature and notify you when this condition is met.

6. Decide where to save your Data Activator trigger in Power BI. You can add it to an existing reflex item or create a new one.
7. Click Create alert to finalise your Data Activator trigger. You can optionally deselect Start my alert if you prefer to edit the trigger in Data Activator before activating it.

By following these steps, you’ve successfully set up an alert in Data Activator to notify you when a monitored freezer’s temperature falls below 30° F, allowing you to take immediate action if necessary. Once these data updates are complete, you should receive the alert from Data Activator that was configured.

Use Lakehouse Monitoring with Alerts to alert on changes in Azure Databricks

Enterprises often require alerts when data quality metrics exceed certain thresholds. For example, they may want to know if there’s a sudden, unexpected spike in the number of missing values within a particular field, indicating a possible problem in the transaction pipeline, or if the quality of predictions from a machine learning model has declined, indicating a need to retrain the model on newer data.

These scenarios can be handled with an Azure Databricks feature called Lakehouse Monitoring with Alerts. This no-code tool monitors data quality in Unity Catalogue and automatically takes action if the data matches certain conditions or exceeds thresholds. When these events occur, Alerts will take a specified action, such as sending a notification via email, Slack or Teams. The alert can also call a webhook action, allowing users to build extensible, custom workflows based on changes in the data.

A monitor is a process that runs on a specified schedule to check the data quality of a particular table. When a user creates a monitor, it computes the data quality metrics for the table and stores the current values in a separate system table. Each time the monitor runs, it recomputes the quality metrics and compares them to the original values. If the quality has deteriorated, then an alert will be raised. For details on how a monitor can be created, consult the following document: https://docs.databricks.com/lakehouse-monitoring/create-monitor-ui.html

If a monitor detects that the quality of the data in the table has declined, it will raise the specified alert. This can be used to send a notification to the data engineering teams so they can investigate further. For details on how these alerts can be configured, check the following document: https://docs.databricks.com/lakehouse-monitoring/monitor-alerts.html
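As a rough illustration of the kind of metric a monitor tracks, the PySpark sketch below computes the share of missing values in one column of a hypothetical transactions table, assuming the notebook’s built-in spark session. This is not the Lakehouse Monitoring API itself; monitors and alerts are configured as described in the documentation linked above.

from pyspark.sql import functions as F

# Hypothetical table and column names, used only to illustrate the metric
df = spark.table("transactions")

# Fraction of rows where customer_id is missing; a sudden jump in this value
# is the kind of change a monitor would surface through an alert
missing_rate = df.select(
    (F.count(F.when(F.col("customer_id").isNull(), True)) / F.count(F.lit(1)))
    .alias("missing_customer_id_rate")
)
missing_rate.show()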
Figure 6: Seamless integration between Azure Databricks and Data Factory with Microsoft Fabric, with one activity that encompasses all three job types (Notebook, JAR, Python) plus Unity Catalogue support and Policy ID integration
Moreover, the enhanced Azure Databricks integration introduces several new features. They are:

• Configure Unity Catalogue access mode: Users can configure the access mode of Unity Catalogue, which enhances governance and security by managing permissions more meticulously.

• Run multiple tasks in a single Databricks activity: Users can run various tasks, such as notebooks, JARs and Python scripts, within a single Databricks activity, streamlining the process and reducing the complexity previously associated with managing multiple types of data jobs.

• Monitor notebook runs in real time: During operation, users can initiate a data pipeline and immediately monitor its execution, with the system providing direct links to the outputs in the Azure Databricks instance. This real-time monitoring capability allows users to track the details of the notebook runs, including cluster performance and computational efficiency, directly from Fabric.

This integration not only brings existing Azure Databricks capabilities from Azure Data Factory into Fabric but also introduces new functionalities such as cluster policy and Unity Catalogue support, enhancing the overall data management and analytics experience.
Enhance organisational
capabilities with generative AI
Advanced AI models have changed the technology landscape. Enterprises are trying to unlock
the potential of their data and use AI to expand their business capabilities. This includes the use
of generative AI to build their data lakehouse on Azure.
The next section will detail how to describe your desired architecture to ChatGPT and have it
generate code that can be implemented in Azure.
Using prompt engineering, you can generate architecture plans and code using the same process
that would be used to summarise a technical article.
Prompt input

Architecture Overview

• Data storage layer: Azure Data Lake Storage Gen2: Acts as the central repository for storing raw data, processed data and machine learning artefacts. ADLS Gen2 is optimised for large-scale analytics scenarios and supports hierarchical namespace, which simplifies data management.

Detailed Workflow

1. Data ingestion: Data is ingested into ADLS Gen2 from various sources, including structured databases, IoT devices, log files, etc. This data is stored in raw format within a hierarchical file system structure.
Note
The rest of the content generated by Azure OpenAI can be found in the Appendix.
Prompts and answers shown here may not reflect your exact experience.
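If you prefer to script this step instead of using the ChatGPT interface, a prompt like the one above can be sent to an Azure OpenAI deployment with the openai Python package. The sketch below is an illustration only; the endpoint, API key, API version and deployment name are placeholders to replace with your own values.

from openai import AzureOpenAI

# Placeholder connection details for your Azure OpenAI resource
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

# Ask the model to propose a lakehouse architecture and example code
response = client.chat.completions.create(
    model="<your-gpt-deployment-name>",  # deployment name, not the model family
    messages=[
        {
            "role": "user",
            "content": (
                "Describe a data lakehouse architecture on Azure using ADLS Gen2, "
                "Azure Databricks and Microsoft Fabric, and generate example code "
                "for each layer."
            ),
        }
    ],
)
print(response.choices[0].message.content)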
Diagnosis and fix: Databricks Assistant identifies a missing comma in a DataFrame operation and suggests the corrected code with the change highlighted.

Utility: The user gets immediate feedback and correction suggestions, speeding up troubleshooting and reducing frustration.

3. Code documentation

User input:

These examples illustrate the practical benefits of Databricks Assistant in real-world development environments, streamlining the coding process, simplifying error resolution and ensuring thorough documentation.

Note
Prompts and answers shown here may not reflect your exact experience.
6. Query using English: Finally, query the DataFrame by asking questions in plain English. The SDK interprets these questions and executes the corresponding SQL queries, returning the results directly to your notebook.

An example query using English with the English SDK for Apache Spark could be something such as:

What was the average trip distance for each day during the month of January 2016? Print the averages to the nearest tenth.

This query demonstrates how plain English can be utilised to conduct data analysis activities, such as calculating averages from a dataset, with the English SDK, allowing Apache Spark to interpret and execute English-language instructions.

Another example query using English for the English SDK for Apache Spark could be one that asks for results by quarter, using natural language. This approach simplifies complex data analysis tasks into straightforward English questions.

Creating a notebook in Microsoft Fabric

Fabric notebooks are a key tool for crafting Apache Spark jobs and conducting machine learning experiments. With support for advanced visualisations and Markdown text integration, they offer a web-based interactive platform that’s popular among data scientists and engineers for coding. Data scientists rely on these notebooks to develop and deploy machine learning models, including experimentation, model tracking and deployment phases. Fabric notebooks offer:

• Immediate usability with no set-up required

• An intuitive, low-code interface for data exploration and processing
Loading data into OneLake via a Microsoft Fabric data engineering notebook
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ParkDataImport").getOrCreate()

# Spark cannot read an https URL directly, so fetch the CSV with pandas and
# convert it into a Spark DataFrame
data_url = "https://www.dropbox.com/s/268uogek0mcypn9/park-data.csv?raw=1"
df = spark.createDataFrame(pd.read_csv(data_url))

csv_table_name = "park_data_csv"
parquet_table_name = "park_data_parquet"
delta_table_name = "park_data_delta"
df.write.mode("overwrite").format("csv").save("Files/" + csv_table_name)
df.write.mode("overwrite").format("parquet").save("Files/" + parquet_table_name)
df.write.mode("overwrite").format("delta").saveAsTable(delta_table_name)

# Make sure the table exists and the schema matches to avoid errors
df.write.mode("append").format("delta").saveAsTable(delta_table_name)
Once the data has been successfully uploaded, try reading and analysing the data:
# Create a temporary view over the loaded DataFrame so it can be queried with SQL
df.createOrReplaceTempView("park_data_view")

# Note: the SELECT clauses below assume column names such as Animal_Type,
# Temperature and Weather in the park data set
animal_sightings = spark.sql("""
    SELECT Animal_Type, COUNT(*) AS sightings
    FROM park_data_view
    GROUP BY Animal_Type
""")
animal_sightings.show()

avg_temp = spark.sql("""
    SELECT AVG(Temperature) AS avg_temperature
    FROM park_data_view
""")
avg_temp.show()

common_weather = spark.sql("""
    SELECT Weather, COUNT(*) AS occurrences
    FROM park_data_view
    GROUP BY Weather
    ORDER BY occurrences DESC
    LIMIT 5
""")
common_weather.show()

squirrel_sightings = spark.sql("""
    SELECT COUNT(*) AS squirrel_count
    FROM park_data_view
    WHERE Animal_Type = 'Squirrel'
""")
squirrel_sightings.show()
1. Open your Azure Databricks workspace in a browser of your choice and launch a new Azure
Databricks notebook.
2. Copy and paste the following Python script into your new notebook and execute it to create a Delta table within your ADLS Gen2 account. The script reads some sample Parquet data and then writes it as a Delta table into your ADLS account:
# Adjust the file path to point to your sample Parquet data using the following format.
# The line below reads Parquet files from your ADLS account
df = spark.read.format("parquet").load("abfss://[email protected]/demo/full/dimension_city/")

# This line writes the read data as Delta tables back into your ADLS account
df.write.mode("overwrite").format("delta").save("abfss://[email protected]/demo/adb_dim_city_delta/")
And, of course, Azure Databricks can also read the data in the ADLS account.
3. Azure Databricks can also modify the same data sets that were originally created with Fabric. To see this in action, append some new rows to the Delta Lake tables you created in OneLake:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("AppendToDeltaTable").getOrCreate()

delta_table_path = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<your-delta-table-path>"

# Replace the column names and values with those relevant to your table
new_rows = [
    ("NewValue1", 10),
    ("NewValue2", 20)
]

# Build a DataFrame from the new rows (the column names below are placeholders)
new_data_df = spark.createDataFrame(new_rows, ["name", "value"])
new_data_df.write.format("delta").mode("append").save(delta_table_path)

# Verify by reading back the data from the Delta Lake table
df = spark.read.format("delta").load(delta_table_path)
df.show()
As the examples illustrate, a data lakehouse built on an open platform enables enterprises to use a variety of engines to work on the same copy of the data at the same time.

Azure Databricks connector within Power BI

The Power BI connector for Azure Databricks provides seamless integration between Power BI and Azure Databricks, enabling organisations to connect, analyse and visualise data stored in Azure Databricks with ease. This integration supports Microsoft Entra ID authentication, removing the need for administrators to generate personal access tokens for connection. It is designed to enhance data connectivity and analysis experiences, allowing for efficient and secure data visualisation directly from the data lake.

1. Obtain your Azure Databricks server hostname and HTTP path for setting up the connection in Power BI.

2. Launch Power BI Desktop.

3. Choose Get Data from the home screen or navigate through File > Get Data.

4. Search for Azure Databricks.

5. Select the Azure Databricks connector and then click Connect.

6. Input the server hostname and HTTP path you obtained earlier.

7. Decide between the Import and DirectQuery modes for your data connectivity. For more insights into these options, consider reading about the use of DirectQuery in Power BI Desktop.

8. Select your preferred authentication method:

a. Personal Access Token: Enter your Azure Databricks personal access token.

b. Microsoft Entra ID: Choose Sign in and follow the prompts.

c. Username/Password: This option is typically not applicable.

9. After authentication, Power BI will present you with the Navigator window. Here, you can select the Azure Databricks data you wish to query. If your workspace has Unity Catalogue enabled, you’ll first select a catalogue, followed by a schema and a table.

For workloads that need the processing power and flexibility offered by Azure Databricks, enterprises can use the advanced visualisation capabilities of Power BI along with Azure Databricks.
Next steps
• Talk to a sales specialist.
• Get more value from your data with Microsoft Fabric.
• Unlock insights from your data and build AI solutions with
Azure Databricks.
• Watch the webinar to learn more about how Azure Databricks and Microsoft Fabric work together.
Appendix
Prompt input

Architecture Overview

• Data storage layer: Azure Data Lake Storage Gen2: Acts as the central repository for storing raw data, processed data and machine learning artefacts. ADLS Gen2 is optimised for large-scale analytics scenarios and supports hierarchical namespace, which simplifies data management.

Detailed Workflow

1. Data ingestion: Data is ingested into ADLS Gen2 from various sources, including structured databases, IoT devices, log files, etc. This data is stored in raw format within a hierarchical file system structure.
Note
Prompts and answers shown here may not reflect your exact experience.
Prompt input

Azure Databricks
Python
Microsoft Fabric

df = spark.read.csv("/mnt/<mount-name>/path/to/data.csv")

2. Data Transformation:

df.write.format("delta").save("/mnt/<mount-name>/path/to/processed/data/")

1. Model Training:

2. Model Deployment:
Note
Prompts and answers shown here may not reflect your exact experience.
©2024 Microsoft Corporation. All rights reserved. This document is provided ‘as-is’. Information and views expressed in this document, including URLs and other
internet website references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual
property in any Microsoft product. You may copy and use this document for your internal reference purposes.