Modern Analytics, AI, and Governance at Scale
Learn how a strategic framework for data is
the foundation for AI innovation
Bring on the era of AI with Microsoft 2
Executive summary
When your data is siloed, your organization is siloed
What will it take to make your business AI-ready?
Microsoft Fabric powers MA2G
  Enterprise Data Governance
  Microsoft Purview provides a unified data governance solution
  Data Management Foundation
  Domains and Data Products
Copilots reduce the heavy lifting
LLM capabilities power your generative AI applications
Get ready for the era of AI
This is because many organizations approach a data challenge with a technology solution—but
that only solves a fraction of the problem. Organizations need a wider and deeper focus on
solutions that drive cultural changes and align people and processes with technology.
Across hundreds of engagements helping organizations worldwide become data driven, Microsoft
has seen the same challenges again and again. Here are three common problems associated with
culture, people, and processes that impede a unified analytics and AI ecosystem.
For example, imagine a data warehouse migration that lands data in proprietary data formats.
At the same time, Internet of Things (IoT) data is streamed into a data lake store. Data from these
two separate projects lands in separate data stores, causing siloed data. To become data driven,
the entire organization must be able to build meaningful insights from both sources, regardless of
the boundaries between business units, so that teams can access data of all attributes and types.
1. Building a high-performance data and AI organization, MIT Technology Review, April 2021
All these problems can be avoided if organizations have enterprise data governance that
provides the inventory and context of all data, automated processes to streamline workflows, and
policies that automatically manage data access. The goal is to implement robust data governance
and data management that enables different analytics projects for different business units.
However, this creates fragmented data engineering solutions that become harder to maintain as
they grow to include thousands upon thousands of pipelines. Worse, most data engineering
tasks are manual. Many organizations assign their best people to these manual tasks when the
real fix is a process change. By implementing proper data management processes with
automation, organizations can reassign data engineers to more meaningful work, such as
business data modeling, data aggregations, and calculations.
However, a paradox still exists when it comes to AI. While 78% of executives agree that AI is a
top business priority,3 they also recognize that data problems will likely stand in the way of
achieving their AI goals. This is because many organizations see AI and analytics as initiatives that
can be adopted by investing in technology solutions—but that’s just a small piece of the puzzle.
Organizations that successfully deploy AI and analytics also change processes, adapt their culture,
and support their people to use these new capabilities in effective ways.
2. 2024 AI Business Predictions, PwC
3. CIO perspectives on generative AI, MIT Technology Review, July 2023
Enterprise Data Governance includes the set of policies and practices used to discover,
describe, and manage data to accelerate responsible data democratization. Data
governance ties together the data and analytics stack and automates data operations,
such as cataloging, classification, creating lineage, and applying security through policies.
Without it, organizations limit their ability to innovate and unlock new insights.
• Data Management Services
• Governance
• Quality
• Policy
• Lineage
• Classification
• Catalogue
• Data Order Service
• Rapid Access to Data
• On Premises or Azure
Data Management Foundation involves the practices and processes that help you
create efficiencies with ingesting, storing, protecting, and ultimately serving data to
different domains in the organization.
• Self Service - Automation
• Domain Provisioning
• Workspace Provisioning
• Data Onboarding
• Automation
• Lakehouse and open data formats
• Automated Data Operations
• Data Virtualization: Shortcuts and Mirroring
• Ingestion at Scale Solution
• Data Engineering Acceleration
Domains and Data Products describes the environments and services that enable
your business units to fully use their data. Allowing departments to self-serve data and
analytics enables non-technical users to access, analyze, and build data insights or data
products on their own.
• Federation
• Autonomous Business Units
• Domains, Workspaces, Capacities
• Data Sharing / Collaboration
• AI Guided Insights
• Data Integration
• AI Copilots
• Empowering Data Practitioners
• Accelerating Report Creation
• Enable Data Exploration
Together, these three solution pillars combine to help organizations achieve MA2G. By first
setting business priorities as the guiding North Star, then implementing aspects from each
solution pillar, your organization can shift into a whole new paradigm of doing business that
empowers everyone to work toward a common goal fueled by data.
[Figure: Microsoft Fabric. Enterprise governance, security, and compliance span six Fabric domains (IT, Finance, Marketing, Innovation, HR, Operations). The domains sit on OneLake and OneLake security. OneLake brings data together through Shortcuts (existing data lakes), Fabric storage (Delta/Parquet), and Mirroring (for proprietary data stores), across Azure, AWS, and Google.]
• Support for multi-cloud data estates: Automatically scan and catalog all data assets—
including machine learning models and Power BI reports—across the organization, whether
they’re on premises, in Azure, or running on other public clouds.
• Governance experience: Develop clear role definitions for administrators, domain creators,
data health owners, and data health readers.
• Business-friendly terminology: Assign language that follows the data governance experience
through data products, domains, quality assessments, and reports.
• Data scan and search: Find the data you need across your entire estate and profile data at the
source to indicate attributes like min, max, average, and thresholds.
• Data quality scores: Generate data quality scores once rules and policies are applied, giving
you insights into your data quality relative to your business rules.
• Metadata analysis: Capture metadata and data lineage to help personas decide whether data is
usable, then use profiling or data quality scans for recommendations.
• Data health controls: Ensure your rules and indicators reflect the unique standards of your
organization with a set of cloud data management controls.
• Summarized insights: Showcase the overall health of your governed data estate with built-in
data governance reports.
• Pre-built integrations: Extend the value of Purview with integrations for solutions related to
master data management and data lineage.
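To make the profiling and scoring ideas in the list above more concrete, here is a minimal sketch in plain Python. It does not call Purview or any of its APIs. The sample rows, the rule names, and the scoring formula (the share of rows that pass every rule) are assumptions made only for illustration.

```python
from statistics import mean


def profile(values):
    """Return basic profile statistics for a numeric column."""
    return {"min": min(values), "max": max(values), "average": mean(values)}


def quality_score(rows, rules):
    """Share of rows (0.0 to 1.0) that satisfy every rule."""
    if not rows:
        return 0.0
    passed = sum(1 for row in rows if all(rule(row) for rule in rules.values()))
    return passed / len(rows)


# Hypothetical sample data and business rules (not from the source document).
orders = [
    {"order_id": 1, "amount": 120.0, "country": "US"},
    {"order_id": 2, "amount": -5.0, "country": "DE"},  # fails the amount rule
    {"order_id": 3, "amount": 80.0, "country": ""},    # fails the country rule
    {"order_id": 4, "amount": 200.0, "country": "JP"},
]
rules = {
    "amount_is_positive": lambda r: r["amount"] > 0,
    "country_is_present": lambda r: bool(r["country"]),
}

amount_profile = profile([r["amount"] for r in orders])
score = quality_score(orders, rules)
print(amount_profile)
print(f"quality score: {score:.2f}")
```

Expressing rules as named functions keeps the score tied to business rules rather than to a fixed technical check, which is the relationship the list above describes.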
Implementing automation, frameworks, and services can help organizations bolster their data
management practices. With a data management foundation that uses Microsoft Fabric and
OneLake, you gain both an open and governed data lakehouse for storing data, as well as
automated data virtualization that efficiently sends data to domains without overburdening
IT teams.
[Figure: Data onboarding with Microsoft Fabric. OneLake brings data together through Shortcuts (existing data lakes), Fabric storage (Delta and Parquet), and Mirroring (for proprietary data stores), across Azure, AWS, and Google.]
Shortcuts allow instant linking of data already in Azure and other clouds, without any data
duplication or movement.
Mirroring is a feature that offers continuous and seamless access to and replication of data from
a database or data warehouse, with no ETL required.
Shortcuts: OneLake shortcuts let you easily onboard data by instantly linking data that already
exists in Azure or other clouds through a unified namespace. This eliminates data duplication or
movement, reducing latency associated with data copies and staging.
• A shortcut is a symbolic link which points from one data location to another
• You can consolidate data across items or workspaces without changing the data ownership
• Existing ADLS Gen2 storage accounts and Amazon S3 buckets can be managed externally to
Fabric and Microsoft while still being virtualized into OneLake with shortcuts
• All data is mapped to a unified namespace and can be accessed using the same APIs, including
the ADLS Gen2 DFS APIs
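The symbolic-link idea in the list above can be illustrated with a small toy model. This is a conceptual sketch only, not the OneLake or Fabric API: the `Namespace` class, its methods, and every path shown are hypothetical names invented for the example.

```python
class Namespace:
    """Toy unified namespace: a path holds data or falls under a shortcut."""

    def __init__(self):
        self._data = {}       # resolved path -> stored bytes
        self._shortcuts = {}  # shortcut path prefix -> target path prefix

    def add_shortcut(self, link, target):
        """Point one location at another without copying any data."""
        self._shortcuts[link.rstrip("/")] = target.rstrip("/")

    def resolve(self, path, max_hops=8):
        """Follow shortcut prefixes until the path points at real storage."""
        for _ in range(max_hops):
            for link, target in self._shortcuts.items():
                if path == link or path.startswith(link + "/"):
                    path = target + path[len(link):]
                    break
            else:
                return path  # no shortcut matched, so this is the real location
        raise ValueError("too many shortcut hops (possible cycle)")

    def write(self, path, content):
        self._data[self.resolve(path)] = content

    def read(self, path):
        return self._data[self.resolve(path)]


ns = Namespace()
# Data stays in its original (hypothetical) bucket and keeps its owner.
ns.write("s3://sales-bucket/2024/orders.parquet", b"order rows")
# A shortcut makes the same data visible under a lakehouse path.
ns.add_shortcut("onelake/finance/lakehouse/Files/sales", "s3://sales-bucket")

data = ns.read("onelake/finance/lakehouse/Files/sales/2024/orders.parquet")
stored_copies = len(ns._data)
print(data, stored_copies)
```

The point of the sketch is that reading through the shortcut returns the original object while the number of stored copies stays at one.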
Industry solutions: Fabric includes pre-built, industry-specific solutions that help organizations
integrate data from different sources and use rich analytics. Data solutions combine data
integration services and, in some cases, machine learning support, so organizations can face
industry-specific data challenges. These solutions include retail, healthcare, sustainability,
and more.
Mirroring: Fabric offers a mirroring feature that provides continuous and seamless access to—
and replication of—data from databases or data warehouses, without ETL. Any database can be
accessed and managed via Fabric without having to switch database clients. Just by providing
connection details, you make your database instantly available in Fabric as a Mirrored database.
• A full editing experience of the source database is available for the Mirrored database
• Data is replicated into OneLake in Delta format and kept up to date in near real time
• All the Fabric experiences instantly work with the OneLake replica
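One way to picture how a replica stays current without a batch ETL job is an ordered feed of changes applied to a copy of the table. The sketch below is a conceptual illustration under that assumption. It does not describe how Fabric implements mirroring, and the change-record shape (`op`, `key`, `row`) is hypothetical.

```python
def apply_changes(replica, changes):
    """Apply an ordered change feed to a replica keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            replica[key] = change["row"]
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


# Hypothetical changes captured from a source database, in commit order.
change_feed = [
    {"op": "insert", "key": 1, "row": {"name": "Contoso", "tier": "gold"}},
    {"op": "insert", "key": 2, "row": {"name": "Fabrikam", "tier": "silver"}},
    {"op": "update", "key": 2, "row": {"name": "Fabrikam", "tier": "gold"}},
    {"op": "delete", "key": 1},
]

replica = apply_changes({}, change_feed)
print(replica)
```

Because each change is applied as it arrives, the replica follows the source closely instead of waiting for a scheduled full reload.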
Data-agnostic ingestion
Automatically ingest data regardless of its attribute, format, and the domain it belongs to.
Organizations can push or pull data from different sources then process it. Metadata-ingestion
frameworks or Kafka-based solutions are sample solutions that can be implemented to automate
this process.
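A metadata-driven ingestion framework can be sketched in a few lines: each source is described by a metadata entry, and one generic loop picks the parser that the entry names. The source names, payloads, and the `PARSERS` registry below are hypothetical, and inline strings stand in for real connections.

```python
import csv
import io
import json

# Hypothetical metadata: one entry per source, no source-specific pipeline code.
SOURCES = [
    {"name": "customers", "format": "csv", "payload": "id,name\n1,Ada\n2,Lin\n"},
    {"name": "devices", "format": "json", "payload": '[{"id": "d1", "temp": 21.5}]'},
]

# Map each declared format to a parser that returns a list of dict rows.
PARSERS = {
    "csv": lambda text: list(csv.DictReader(io.StringIO(text))),
    "json": json.loads,
}


def ingest(sources):
    """Run every source through the parser its metadata names."""
    landed = {}
    for source in sources:
        parser = PARSERS.get(source["format"])
        if parser is None:
            raise ValueError(f"no parser registered for {source['format']}")
        landed[source["name"]] = parser(source["payload"])
    return landed


landed = ingest(SOURCES)
print(landed)
```

Onboarding a new source then means adding a metadata entry (and, for a new format, one parser) rather than writing another pipeline.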
Data standardization
As your data gets ingested, you can standardize it through processes such as format conversions,
versioning, merging, PII handling, and master data management. Use Apache Spark notebooks
within Fabric to quickly implement data standardization practices. Additional services related
to data quality management address issues such as deduplication, threshold identification, and
alignment with master data. Without proper checks on data quality, you run the risk of slowing
down time to insights.
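The sketch below shows three of the standardization steps named above: format conversion, PII handling, and deduplication. It uses plain Python so that it is self-contained. In Fabric, the same logic would more likely live in a Spark notebook. The field names, the input date format, and the hashing approach are assumptions for illustration.

```python
import hashlib
from datetime import datetime


def mask_pii(value):
    """Replace a PII value with a stable hash token.

    Illustration only: a real system would add a secret salt or use a
    dedicated tokenization service.
    """
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()[:12]


def standardize(records):
    """Convert dates to ISO 8601, mask email addresses, drop duplicate ids."""
    seen, clean = set(), []
    for rec in records:
        if rec["id"] in seen:
            continue  # keep only the first record for each id
        seen.add(rec["id"])
        signup = datetime.strptime(rec["signup_date"], "%m/%d/%Y").date()
        clean.append({
            "id": rec["id"],
            "signup_date": signup.isoformat(),
            "email_token": mask_pii(rec["email"]),
        })
    return clean


# Hypothetical raw records with a duplicate id and US-style dates.
raw = [
    {"id": 1, "signup_date": "03/09/2024", "email": "Ada@example.com"},
    {"id": 1, "signup_date": "03/09/2024", "email": "ada@example.com"},
    {"id": 2, "signup_date": "12/31/2023", "email": "lin@example.com"},
]
clean = standardize(raw)
print(clean)
```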
Workspace provisioning: You can create a workspace to collaborate with teammates in your
domain and create collections of items such as lakehouses, warehouses, and reports.
Each experience is tailored to a specific persona and a specific task, allowing different domains to
find the tools they need to create their own data products.
[Figure: Fabric experiences built on OneLake: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehouse, Synapse Real-Time Analytics, Power BI, and Data Activator.]
Data Factory offers a modern data integration experience to ingest, prepare, and
transform data from a rich set of data sources. Data Factory brings Fast Copy capabilities
to both dataflows and data pipelines so you can move data between your lakehouse and
data warehouse in Fabric at blazing speed.
Synapse Data Engineering provides a world-class Spark platform with great authoring
experiences, enabling data engineers to perform large-scale data transformation and
democratize data through the lakehouse. The Spark integration with Data Factory also
enables notebooks and Spark jobs to be scheduled and orchestrated.
Synapse Data Science allows you to build, deploy, and operationalize machine learning
models within your Fabric experience. It integrates with Azure Machine Learning to
provide built-in experiment tracking and model registry.
Synapse Data Warehouse provides industry-leading SQL performance and scale. It fully
separates compute from storage, allowing independent scaling of both components.
Additionally, it natively stores data in the open Delta Lake format.
Synapse Real-Time Analytics gives you a way to focus and scale up your analytics
solution while democratizing data for both citizen data scientists and advanced data
engineers. As a fully managed big data analytics platform, Real-Time Analytics utilizes
a query language and engine so you can search structured, semi-structured, and
unstructured data.
Power BI provides business owners the ability to access all their data in Fabric quickly
and intuitively to make better decisions with data. This experience allows organizations
to turn unrelated data sources into coherent, visually immersive, and interactive insights.
Data Activator monitors data in Power BI reports and automatically takes actions when
certain patterns or conditions are detected. This allows you to build a digital nervous
system that acts across all your data, at scale and in a timely manner.
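The "detect a condition, then act" pattern can be sketched as a small rule evaluator. This is a generic illustration, not Data Activator's rule syntax or API. The `monitor` function, the reading shape, and the "N consecutive readings above a threshold" condition are hypothetical.

```python
def monitor(readings, threshold, consecutive, action):
    """Call action once each time `consecutive` readings in a row exceed threshold."""
    streak = 0
    for reading in readings:
        if reading["value"] > threshold:
            streak += 1
            if streak == consecutive:
                action(reading)  # fire once per streak, not on every reading
        else:
            streak = 0


# Hypothetical sensor readings. The action here just records the alert.
readings = [
    {"ts": 1, "value": 4.0},
    {"ts": 2, "value": 9.1},
    {"ts": 3, "value": 9.4},
    {"ts": 4, "value": 9.8},
    {"ts": 5, "value": 3.2},
]
alerts = []
monitor(readings, threshold=9.0, consecutive=2, action=alerts.append)
print(alerts)
```

Requiring several consecutive readings before acting is one simple way to avoid reacting to a single noisy value.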
Copilot for Data Science and Data Engineering provides intelligent code completion,
automates routine tasks, and supplies industry-standard code templates to facilitate
tasks like data enrichment and the creation of analytical models. Copilot offers
contextual code suggestions and prompts that adapt to specific tasks, helping you code
more effectively and with greater ease.
Copilot for Data Factory supports both citizen and professional data wranglers in
streamlining their workflow. It provides intelligent code to transform data, as well as
code explanations to help you understand complex tasks.
Copilot for Power BI allows you to create Power BI reports automatically. You can
generate summaries of existing reports or ask for suggestions on which reports to create
based on your data. Prompts like “Create a page to examine next month’s forecast” yield
visualizations that help you spot trends and patterns quickly.
Extract insights from unstructured data: Use Fabric to tap into information stored in
unstructured documents like PDFs. You can load PDF documents into a Spark DataFrame, read the
documents using Azure AI Document Intelligence in Azure AI services, and use SynapseML to
split the documents into chunks.
Integrate Azure OpenAI: Apply LLMs at scale by integrating Azure OpenAI Service
and SynapseML. Azure OpenAI can be used to solve natural language tasks by prompting
the completion API. Through SynapseML, you can use the Apache Spark distributed computing
framework to easily process millions of prompts.
Generate embeddings: Connect to Azure OpenAI Service and use SynapseML to generate
embeddings in a distributed manner that allows you to efficiently process large volumes of data.
You can also store the embeddings in a vector store using Azure AI Search and search the vector
store to answer users’ questions.
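The flow described above (split documents into chunks, embed each chunk, then search the stored vectors to answer a question) can be sketched end to end in plain Python. Nothing here calls Azure OpenAI, SynapseML, or Azure AI Search. The `embed` function is a stand-in bag-of-words counter rather than a real embedding model, and an in-memory list plays the role of the vector store. All function names and the sample text are assumptions for illustration.

```python
import math
from collections import Counter


def chunk(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT a real model)."""
    return Counter(word.strip(".,?").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, question, top_k=1):
    """Return the top_k chunk texts most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Hypothetical document text; each sentence happens to be nine words long.
document = (
    "Shortcuts link data that already exists in other clouds. "
    "Mirroring replicates a database into OneLake in Delta format. "
    "Purview scans and catalogs data assets across the estate."
)
chunks = chunk(document, size=9)
index = [(c, embed(c)) for c in chunks]  # in-memory stand-in for a vector store
answer = search(index, "Which feature replicates a database?")
print(answer)
```

In a real deployment the embedding step is the expensive part, which is why the text above emphasizes generating embeddings in a distributed manner.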
Get ready for the era of AI
critical. Frameworks like MA2G make sure
your systems support governance, data
management, and domains, allowing
organizations to create customized AI and
analytics experiences. Through Microsoft
Fabric, all the data and analytics tools you
need are available in one end-to-end
platform.