Microsoft Azure AI Fundamentals Generative AI
Microsoft Azure AI Fundamentals Generative AI
Completed100 XP
1 minute
Generative AI, and technologies that implement it are increasingly in the public consciousness – even
among people who don't work in technology roles or have a background in computer science or
machine learning. The futurist and novelist Arthur C. Clarke is quoted as observing that "any
sufficiently advanced technology is indistinguishable from magic". In the case of generative AI, it
does seem to have an almost miraculous ability to produce human-like original content, including
poetry, prose, and even computer code.
Completed100 XP
3 minutes
Artificial Intelligence (AI) imitates human behavior by using machine learning to interact with the
environment and execute tasks without explicit directions on what to output.
Generative AI describes a category of capabilities within AI that create original content. These
capabilities include taking in natural language input, and returning appropriate responses in a variety
of formats such as natural language, images, code, and more. Let's take a look at a couple of
examples:
To generate a natural language response, you might submit a request such as "Write a cover letter
for a person with a bachelor's degree in history."
Some generative AI applications can interpret a natural language request and generate an
appropriate image. For example, you might submit a request like "Create a logo for a florist
business."
A generative AI application could then return a new image based on the description you provided,
like this:
Code generation
Some generative AI applications are designed to help software developers write code. For example,
you could submit a request like "Write Python code to add two numbers." and generate the following
response:
PythonCopy
return a + b
Generative AI applications
Generative AI often appears as chat-based assistants that are integrated into applications to help
users find information and perform tasks efficiently. One example of such an application is Microsoft
Copilot, an AI-powered productivity tool designed to enhance your work experience by providing
real-time intelligence and assistance. All generative AI assistants utilize language models. A subset of
these assistants also execute programmable tasks.
Assistants that not only produce new content, but execute tasks such as filing taxes or coordinating
shipping arrangements, just as a few examples, are known as agents. Agents are applications that
can respond to user input or assess situations autonomously, and take appropriate actions. These
actions could help with a series of tasks. For example, an "executive assistant" agent could provide
details about the location of a meeting on your calendar, then attach a map or automate the booking
of a taxi or rideshare service to help you get there.
One way to think of different generative AI applications is by grouping them in buckets. In general,
you can categorize industry and personal generative AI assistants into three buckets, each requiring
more customization: ready-to-use applications, extendable applications, and applications you build
from the foundation.
Extendable: some ready-to-use applications can also be extended using your own data.
These customizations enable the assistant to better support specific business processes or
tasks. Microsoft Copilot is an example of technology that is ready-to-use and extendable.
Applications you build from the foundation: you can build your own assistants and
assistants with agentic capabilities starting from a language model. Many language models
exist, which we will cover later on in this module.
Often, you will use services to extend or build Generative AI applications. These services provide the
infrastructure, tools, and frameworks necessary to develop, train, and deploy generative AI models.
For example, Microsoft provides services such as Copilot Studio to extend Microsoft 365 Copilot and
Microsoft Azure AI Foundry to build AI from different models.
Next, let's build a solid understanding of how the language models in these generative AI
applications work.
How do language models work?
Completed100 XP
8 minutes
Over the last decades, multiple developments in the field of natural language processing (NLP) have
resulted in achieving large language models (LLMs). The development and availability of language
models led to new ways to interact with applications and systems, such as through generative AI
assistants and agents.
Let's take a look back at historical developments for language models which include:
Tokenization
As you may expect, machines have a hard time deciphering text as they mostly rely on numbers.
To read text, we therefore need to convert the presented text to numbers.
One important development to allow machines to more easily work with text has been
tokenization. Tokens are strings with a known meaning, usually representing a word. Tokenization is
turning words into tokens, which are then converted to numbers. A statistical approach to
tokenization is by using a pipeline:
2. Split the words in the text based on a rule. For example, split the words where there's a
white space.
3. Stop word removal. Remove noisy words that have little meaning like the and a. A dictionary
of these words is provided to structurally remove them from the text.
Tokenization allowed for text to be labeled. As a result, statistical techniques could be used to let
computers find patterns in the data instead of applying rule-based models.
Word embeddings
One of the key concepts introduced by applying deep learning techniques to NLP is word
embeddings. Word embeddings address the problem of not being able to define the semantic
relationship between words.
Word embeddings are created during the deep learning model training process. During training, the
model analyzes the cooccurrence patterns of words in sentences and learns to represent them
as vectors. A vector represents a path through a point in n-dimensional space (in other words, a line).
Semantic relationships are defined by how similar the angles of the lines are (i.e. the direction of the
path). Because word embeddings represent words in a vector space, the relationship between words
can be easily described and calculated.
To create a vocabulary that encapsulates semantic relationships between the tokens, we define
contextual vectors, known as embeddings, for them. Vectors are multi-valued numeric
representations of information, for example [10, 3, 1] in which each numeric element represents a
particular attribute of the information. For language tokens, each element of a token's vector
represents some semantic attribute of the token. The specific categories for the elements of the
vectors in a language model are determined during training based on how commonly words are used
together or in similar contexts.
Vectors represent lines in multidimensional space, describing direction and distance along multiple
axes (you can impress your mathematician friends by calling these amplitude and magnitude).
Overall, the vector describes the direction and distance of the path from origin to end.
The elements of the tokens in the embeddings space each represent some semantic attribute of the
token, so that semantically similar tokens should result in vectors that have a similar orientation – in
other words they point in the same direction. a technique called cosine similarity is used to
determine if two vectors have similar directions (regardless of distance), and therefore represent
semantically linked words. For example, the embedding vectors for "dog" and "puppy" describe a
path along an almost identical direction, which is also fairly similar to the direction for "cat". The
embedding vector for "skateboard" however describes journey in a very different direction.
Architectural developments
The architecture, or design, of a machine learning model describes the structure and organization of
its various components and processes. It defines how data is processed, how models are trained and
evaluated, and how predictions are generated. One of the first breakthroughs in language model
architecture was the Recurrent Neural Networks (RNNs).
To understand text isn't just to understand individual words, presented in isolation. Words can differ
in their meaning depending on the context they're presented in. In other words, the sentence
around a word matters to the meaning of the word.
RNNs are able to take into account the context of words through multiple sequential steps. Each step
takes an input and a hidden state. Imagine the input at each step to be a new word. Each step also
produces an output. The hidden state can serve as a memory of the network, storing the output of
the previous step and passing it as input to the next step.
Vincent Van Gogh was a painter most known for creating stunning and emotionally expressive
artworks, including ...
To know what word comes next, you need to remember the name of the painter. The sentence needs
to be completed, as the last word is still missing. A missing or masked word in NLP tasks is often
represented with [MASK]. By using the special [MASK] token in a sentence, you can let a language
model know it needs to predict what the missing token or value is.
Simplifying the example sentence, you can provide the following input to an RNN: Vincent was a
painter known for [MASK]:
The RNN takes each token as an input, process it, and update the hidden state with a memory of that
token. When the next token is processed as new input, the hidden state from the previous step is
updated.
Finally, the last token is presented as input to the model, namely the [MASK] token. Indicating that
there's information missing and the model needs to predict its value. The RNN then uses the hidden
state to predict that the output should be something like Starry Night
Challenges with RNNs
In the example, the hidden state contains the information Vincent, is, painter, and know. With RNNs,
each of these tokens are equally important in the hidden state, and therefore equally considered
when predicting the output.
RNNs allow for context to be included when deciphering the meaning of a word in relation to the
complete sentence. However, as the hidden state of an RNN is updated with each token, the actual
relevant information, or signal, may be lost.
In the example provided, Vincent Van Gogh's name is at the start of the sentence, while the mask is
at the end. At the final step, when the mask is presented as input, the hidden state may contain a
large amount of information that is irrelevant for predicting the mask's output. Since the hidden
state has a limited size, the relevant information may even be deleted to make room for new and
more recent information.
When we read this sentence, we know that only certain words are essential to predict the last word.
An RNN however, includes all (relevant and irrelevant) information in a hidden state. As a result, the
relevant information may become a weak signal in the hidden state, meaning that it can be
overlooked because there's too much other irrelevant information influencing the model.
So far, we described how language models can read text through tokenization and how they can
understand the relationship between words through word embeddings. We also explored how past
language models tried to capture the context of words. Next, learn how the limitations of previous
models are handled in today's language models with transformer architecture.
Completed100 XP
10 minutes
The generative AI applications we use today are made possible by utilizing Transformer architecture.
Transformers were introduced in the Attention is all you need paper by Vaswani, et al. from 2017.
Transformer architecture introduced concepts that drastically improved a model's ability to
understand and generate text. Different models have been trained using adaptations of the
Transformer architecture to optimize for specific NLP tasks.
The encoder: Responsible for processing the input sequence and creating a representation
that captures the context of each token.
The decoder: Generates the output sequence by attending to the encoder's representation
and predicting the next token in the sequence.
The most important innovations presented in the Transformer architecture were positional
encoding and multi-head attention. A simplified representation of the architecture:
In the encoder layer, an input sequence is encoded with positional encoding, after which
multi-head attention is used to create a representation of the text.
In the decoder layer, an (incomplete) output sequence is encoded in a similar way, by first
using positional encoding and then multi-head attention. Then, the multi-head attention
mechanism is used a second time within the decoder to combine the output of the encoder
and the output of the encoded output sequence that was passed as input to the decoder
part. As a result, the output can be generated.
Understand positional encoding
The position of a word and the order of words in a sentence are important to understand the
meaning of a text. To include this information, without having to process text sequentially,
transformers use positional encoding.
Before Transformers, language models used word embeddings to encode text into vectors. In the
Transformer architecture, positional encoding is used to encode text into vectors. Positional encoding
is the sum of word embedding vectors and positional vectors. By doing so, the encoded text includes
information about the meaning and position of a word in a sentence.
To encode the position of a word in a sentence, you could use a single number to represent the index
value. For example:
Expand table
The 0
work 1
of 2
William 3
Shakespeare 4
inspired 5
many 6
movies 7
... ...
The longer a text or sequence, the larger the index values may become. Though using unique values
for each position in a text is a simple approach, the values would hold no meaning, and the growing
values may create instability during model training.
Understand attention
The most important technique used by Transformers to process text is the use of attention instead of
recurrence. In this way, the Transformer architecture provides an alternative to RNNs. Whereas RNNs
are compute-intensive since they process words sequentially, Transformers don't process the words
sequentially, but instead process each word independently in parallel by using attention.
Transformers use an attention function, where a new word is encoded (using positional encoding)
and represented as a query. The output of an encoded word is a key with an associated value.
To illustrate the three variables that are used by the attention function: the query, keys, and values,
let's explore a simplified example. Imagine encoding the sentence Vincent van Gogh is a painter,
known for his stunning and emotionally expressive artworks. When encoding the query Vincent van
Gogh, the output may be Vincent van Gogh as the key with painter as the associated value. The
architecture stores keys and values in a table, which it can then use for future decoding:
Expand table
Keys Values
Whenever a new sentence is presented like Shakespeare's work has influenced many movies, mostly
thanks to his work as a .... The model can complete the sentence by taking Shakespeare as the query
and finding it in the table of keys and values. Shakespeare the query is closest to William
Shakespeare the key, and thus the associated value playwright is presented as the output.
To calculate the attention function, the query, keys, and values are all encoded to vectors. The
attention function then computes the scaled dot-product between the query vector and the keys
vectors. The dot-product calculates the angle between vectors representing tokens, with the product
being larger when the vectors are more aligned.
The softmax function is used within the attention function, over the scaled dot-product of the
vectors, to create a probability distribution with possible outcomes. In other words, the softmax
function's output includes which keys are closest to the query. The key with the highest probability is
then selected, and the associated value is the output of the attention function.
The Transformer architecture uses multi-head attention, which means tokens are processed by the
attention function several times in parallel. By doing so, a word or sentence can be processed
multiple times, in various ways, to extract different kinds of information from the sentence.
The Transformer architecture has allowed us to train models in a more efficient way. Instead of
processing each token in a sentence or sequence, attention allows a model to process tokens in
parallel in various ways. Next, learn how different types of language models are available for building
applications.
Completed100 XP
3 minutes
Different models exist today which mostly differ by the specific data they've been trained on, or by
how they implement attention within their architectures.
Today, importantly, developers do not need to train models from scratch. To build a generative AI
application, you can use pre-trained models. Some language models are open-source and publicly
available through communities like Hugging Face. Others are offered in proprietary catalogs. For
example, Azure offers the most commonly used language models as foundation models in the Azure
AI Foundry model catalog. Foundation models are pretrained on large texts and can be fine-tuned for
specific tasks with a relatively small dataset.
You can deploy a foundation model to an endpoint without any extra training. If you want the model
to be specialized in a task, or perform better on domain-specific knowledge, you can also choose to
fine-tune a foundation model.
Text classification
Token classification
Question answering
Summarization
Translation
To choose the foundation model that best fits your needs, you can test out different models. You can
also review the data the models are trained on and possible biases and risks a model may have.
In general, language models can be considered in two categories: Large Language Models (LLMs)
and Small Language models (SLMs).
Expand table
LLMs are trained with vast quantities of text that represents a wide range SLMs are trained with smaller, more subjec
of general subject matter – typically by sourcing data from the Internet
and other generally available publications.
When trained, LLMs have many billions (even trillions) of parameters Typically have fewer parameters than LLMs
(weights that can be applied to vector embeddings to calculate predicted
token sequences).
Able to exhibit comprehensive language generation capabilities in a wide This focused vocabulary makes them very e
range of conversational contexts. conversational topics, but less effective at m
generation.
Their large size can impact their performance and make them difficult to The smaller size of SLMs can provide more
deploy locally on devices and computers. including local deployment to devices and o
and makes them faster and easier to fine-tu
Fine-tuning the model with additional data to customize its subject Fine-tuning can potentially be less time-con
expertise can be time-consuming, and expensive in terms of the compute
power required to perform the additional training.
Completed100 XP
3 minutes
The quality of responses from generative AI assistants not only depends on the language model
used, but on the types of prompts users provide. Prompts are ways we tell an application what we
want it to do. You can get the most useful completions by being explicit about the kind of response
you want. Take this example, "Summarize the key considerations for adopting Copilot described in
this document for a corporate executive. Format the summary as no more than six bullet points with
a professional tone." Users of generative AI can achieve better results when you submit clear, specific
prompts.
Consider the following ways you can improve the response a generative AI assistant provides:
1. Start with a specific goal for what you want the assistant to do
In most cases, an agent doesn't just send your prompt as-is to the language model. Usually, your
prompt is augmented with:
A system message that sets conditions and constraints for the language model behavior. For
example, "You're a helpful assistant that responds in a cheerful, friendly manner." These
system messages determine constraints and styles for the model's responses.
The conversation history for the current session, including past prompts and responses. The
history enables you to refine the response iteratively while maintaining the context of the
conversation.
The current prompt – potentially optimized by the agent to reword it appropriately for the
model or to add more grounding data to scope the response.
The term prompt engineering describes the process of prompt improvement. Both developers who
design applications and consumers who use those applications can improve the quality of responses
from generative AI by considering prompt engineering. Next, take a look at other methods that are
utilized by developers to improve the quality of responses.
Completed100 XP
3 minutes
Developers use several key mechanisms to help improve the performance and trustworthiness of
generative AI responses.
Grounding Data: grounding refers to the process of ensuring that a system's outputs are aligned with
factual, contextual, or reliable data sources. This can be done in various ways, such as linking the
model to a database, using search engines to retrieve real-time information, or incorporating
domain-specific knowledge bases. The goal is to anchor the model's responses to these data sources,
enhancing the trustworthiness and applicability of the generated content.
Fine-tuning: Fine-tuning involves taking a pre-trained model and further training it on a smaller, task-
specific dataset to make it more suitable for a particular application. This process allows the model to
specialize and perform better at specific tasks that require domain-specific knowledge. Fine-tuning is
particularly useful for adapting models to domain-specific requirements, improving accuracy, and
reducing the likelihood of generating irrelevant or inaccurate responses.
Security and Governance Controls: security and governance controls are needed to manage access,
authentication, and data usage. These controls help prevent the publication of incorrect or
unauthorized information.
There are many ways to measure response quality. In general, you can think of three dimensions for
evaluating and monitoring generative AI. These include:
Performance and quality evaluators: assess the accuracy, groundedness, and relevance of
generated content.
Risk and safety evaluators: assess potential risks associated with AI-generated content to
safeguard against content risks. This includes evaluating an AI system's predisposition
towards generating harmful or inappropriate content.
Today's services such as Azure AI Foundry provide an environment for these types of workflows and
more to take place. Next, gain an understanding of how to plan for responsible generative AI use.
Completed100 XP
3 minutes
The Microsoft guidance for responsible generative AI is designed to be practical and actionable. It
defines a four stage process to develop and implement a plan for responsible AI when using
generative models. The four stages in the process are:
2. Measure the presence of these harms in the outputs generated by your solution.
3. Mitigate the harms at multiple layers in your solution to minimize their presence and impact,
and ensure transparent communication about potential risks to users.
4. Operate the solution responsibly by defining and following a deployment and operational
readiness plan.
These stages should be informed by responsible AI principles. Microsoft has categorized these
principles into six buckets.
Responsible AI principles
It's important for software engineers to consider the impact of their software on users, and society in
general; including considerations for its responsible use. When the application is imbued with
artificial intelligence, these considerations are particularly important due to the nature of how AI
systems work and inform decisions; often based on probabilistic models, which are in turn
dependent on the data with which they were trained.
The human-like nature of AI solutions is a significant benefit in making applications user-friendly, but
it can also lead users to place a great deal of trust in the application's ability to make correct
decisions. The potential for harm to individuals or groups through incorrect predictions or misuse of
AI capabilities is a major concern, and software engineers building AI-enabled solutions should apply
due consideration to mitigate risks and ensure fairness, reliability, and adequate protection from
harm or discrimination.
Let's discuss some core principles for responsible AI that have been adopted at Microsoft.
Fairness
AI systems should treat all people fairly. For example, suppose you create a machine learning model
to support a loan approval application for a bank. The model should make predictions of whether or
not the loan should be approved without incorporating any bias based on gender, ethnicity, or other
factors that might result in an unfair advantage or disadvantage to specific groups of applicants.
Fairness of machine learned systems is a highly active area of ongoing research, and some software
solutions exist for evaluating, quantifying, and mitigating unfairness in machine learned models.
However, tooling alone isn't sufficient to ensure fairness. Consider fairness from the beginning of the
application development process; carefully reviewing training data to ensure it's representative of all
potentially affected subjects, and evaluating predictive performance for subsections of your user
population throughout the development lifecycle.
As with any software, AI-based software application development must be subjected to rigorous
testing and deployment management processes to ensure that they work as expected before
release. Additionally, software engineers need to take into account the probabilistic nature of
machine learning models, and apply appropriate thresholds when evaluating confidence scores for
predictions.
AI systems should be secure and respect privacy. The machine learning models on which AI systems
are based rely on large volumes of data, which may contain personal details that must be kept
private. Even after models are trained and the system is in production, they use new data to make
predictions or take action that may be subject to privacy or security concerns; so appropriate
safeguards to protect data and customer content must be implemented.
Inclusiveness
AI systems should empower everyone and engage people. AI should bring benefits to all parts of
society, regardless of physical ability, gender, sexual orientation, ethnicity, or other factors.
One way to optimize for inclusiveness is to ensure that the design, development, and testing of your
application includes input from as diverse a group of people as possible.
Transparency
AI systems should be understandable. Users should be made fully aware of the purpose of the
system, how it works, and what limitations may be expected.
For example, when an AI system is based on a machine learning model, you should generally make
users aware of factors that may affect the accuracy of its predictions, such as the number of cases
used to train the model, or the specific features that have the most influence over its predictions.
You should also share information about the confidence score for predictions.
When an AI application relies on personal data, such as a facial recognition system that takes images
of people to recognize them; you should make it clear to the user how their data is used and
retained, and who has access to it.
Accountability
People should be accountable for AI systems. Although many AI systems seem to operate
autonomously, ultimately it's the responsibility of the developers who trained and validated the
models they use, and defined the logic that bases decisions on model predictions to ensure that the
overall system meets responsibility requirements. To help meet this goal, designers and developers
of AI-based solution should work within a framework of governance and organizational principles
that ensure the solution meets responsible and legal standards that are clearly defined.
Tip
For more information about Microsoft's principles for responsible AI, see the Microsoft responsible
AI site.
Summary
Completed100 XP
1 minute
Generative AI is a rapidly developing field of AI that supports new language generation, code
development, image creation, and more. You learned about advancements in natural language
processing (NLP), the creation of large language models (LLMs), and key concepts such as
tokenization, word embeddings, and adding memory to language models. The module introduced
the transformer architecture, which revolutionized text understanding and generation.
You also learned about the two categories of language models: Large Language Models (LLMs) and
Small Language Models (SLMs), their differences, and their deployment. The module also
emphasized the influence of user prompts on the effectiveness of generative AI assistants and the
concept of 'prompt engineering'. Lastly, you learned about the mechanisms used to enhance
generative AI models, the four-stage process for responsible generative AI use, and the six principles
of responsible AI.
Introduction
Completed100 XP
1 minute
The growth in the use of artificial intelligence (AI) in general, and generative AI in particular means
that developers are increasingly required to create comprehensive AI solutions. These solutions need
to combine machine learning models, AI services, prompt engineering solutions, and custom code.
Microsoft Azure provides multiple services that you can use to create AI solutions. However, before
embarking on an AI application development project, it's useful to consider the available options for
services, tools, and frameworks as well as some principles and practices that can help you succeed.
This module explores some of the key considerations for planning an AI development project, and
introduces Azure AI Foundry; a comprehensive platform for AI development on Microsoft Azure.
What is AI?
Completed100 XP
5 minutes
The term "Artificial Intelligence" (AI) covers a wide range of software capabilities that enable
applications to exhibit human-like behavior. AI has been around for many years, and its definition has
varied as the technology and use cases associated with it have evolved. In today's technological
landscape, AI solutions are built on machine learning models that encapsulate semantic relationships
found in huge quantities of data; enabling applications to appear to interpret input in various
formats, reason over the input data, and generate appropriate responses and predictions.
Common AI capabilities that developers can integrate into a software application include:
Expand table
Capability Description
The ability to generate original responses to natural language prompts. For example, software fo
might be used to automatically generate property descriptions and advertising copy for a proper
Generative AI
Generative AI applications that can respond to user input or assess situations autonomously, and
For example, an "executive assistant" agent could provide details about the location of a meeting
attach a map or automate the booking of a taxi or rideshare service to help you get there.
Agents
Capability Description
The ability to accept, interpret, and process visual input from images, videos, and live camera str
automated checkout in a grocery store might use computer vision to identify which products a cu
shopping basket, eliminating the need to scan a barcode or manually enter the product and quan
Computer vision
The ability to recognize and synthesize speech. For example, a digital assistant might enable user
provide audible instructions by speaking into a microphone, and generate spoken output to prov
confirmations.
Speech
The ability to process natural language in written or spoken form, analyze it, identify key points,
or categorizations. For example, a marketing application might analyze social media messages th
company, translate them to a specific language, and categorize them as positive or negative base
Natural language
processing
The ability to use computer vision, speech, and natural language processing to extract key inform
forms, images, recordings, and other kinds of content. For example, an automated expense claim
might extract purchase dates, individual line item details, and total costs from a scanned receipt.
Information
extraction
The ability to use historic data and learned correlations to make predictions that support busines
example, analyzing demographic and economic factors in a city to predict real estate market tren
pricing decisions.
Decision support
Determining the specific AI capabilities you want to include in your application can help you identify
the most appropriate AI services that you'll need to provision, configure, and use in your solution.
Generative AI represents the latest advance in artificial intelligence, and deserves some extra
attention. Generative AI uses language models to respond to natural language prompts, enabling you
to build conversational apps and agents that support research, content creation, and task
automation in ways that were previously unimaginable.
The language models used in generative AI solutions can be large language models (LLMs) that have
been trained on huge volumes of data and include many millions of parameters; or they can be small
language models (SLMs) that are optimized for specific scenarios with lower overhead. Language
models commonly respond to text-based prompts with natural language text; though increasingly
new multi-modal models are able to handle image or speech prompts and respond by generating
text, code, speech, or images.
Azure AI services
Completed100 XP
5 minutes
Microsoft Azure provides a wide range of cloud services that you can use to develop, deploy, and
manage an AI solution. The most obvious starting point for considering AI development on Azure is
Azure AI services; a set of out-of-the-box prebuilt APIs and models that you can integrate into your
applications. The following table lists some commonly used Azure AI services (for a full list of all
available Azure AI services, see Available Azure AI services).
Expand table
Service Description
The Azure OpenAI service provides access to OpenAI generative AI models including the GPT fam
language models and DALL-E image-generation models within a scalable and securable cloud ser
Azure OpenAI
The Azure AI Vision service provides a set of models and APIs that you can use to implement com
functionality in an application. With the AI Vision service, you can detect common objects in ima
descriptions, and tags based on image contents, and read text in images.
Azure AI Vision
Service Description
The Azure AI Speech service provides APIs that you can use to implement text to speech and spee
as well as specialized speech-based capabilities like speaker recognition and translation.
Azure AI Speech
The Azure AI Language service provides models and APIs that you can use to analyze natural lang
tasks such as entity extraction, sentiment analysis, and summarization. The AI Language service a
to help you build conversational language models and question answering solutions.
Azure AI Language
Azure AI Content Safety provides developers with access to advanced algorithms for processing im
flagging content that is potentially offensive, risky, or otherwise undesirable.
Azure AI Content
Safety
The Azure AI Translator service uses state-of-the-art language models to translate text between a
languages.
Azure AI Translator
The Azure AI Face service is a specialist computer vision implementation that can detect, analyze
faces. Because of the potential risks associated with personal identification and misuse of this cap
features of the AI Face service are restricted to approved customers.
Azure AI Face
The Azure AI Custom Vision service enables you to train and use custom computer vision models
and object detection.
Azure AI Custom
Vision
With Azure AI Document Intelligence, you can use pre-built or custom models to extract fields fro
such as invoices, receipts, and forms.
Azure AI Document
Intelligence
The Azure AI Content Understanding service provides multi-modal content analysis capabilities th
models to extract data from forms and documents, images, videos, and audio streams.
Azure AI Content
Understanding
The Azure AI Search service uses a pipeline of AI skills based on other Azure AI Services and custo
information from content and create a searchable index. AI Search is commonly used to create ve
can then be used to ground prompts submitted to generative AI language models, such as those
Service Description
To use Azure AI services, you create one or more Azure AI resources in an Azure subscription and
implement code in client applications to consume them. In some cases, AI services include web-
based visual interfaces that you can use to configure and test your resources - for example to train a
custom image classification model using the Custom Vision service you can use the visual interface
to upload training images, manage training jobs, and deploy the resulting model.
Note
You can provision Azure AI services resources in the Azure portal (or by using BICEP or ARM
templates or the Azure command-line interface) and build applications that use them directly
through various service-specific APIs and SDKs. However, as we'll discuss later in this module, in most
medium to large-scale development scenarios it's better to provision Azure AI services resources as
part of an Azure Foundry hub - enabling you to centralize access control and cost management, and
making it easier to manage shared resource usage based on AI development projects.
Most Azure AI services, such as Azure AI Vision, Azure AI Language, and so on, can be provisioned as
standalone resources, enabling you to create only the Azure resources you specifically need.
Additionally, standalone Azure AI services often include a free-tier SKU with limited functionality,
enabling you to evaluate and develop with the service at no cost. Each standalone Azure AI resource
provides an endpoint and authorization keys that you can use to access it securely from a client
application.
Alternatively, you can provision a multi-service Azure AI services resource that encapsulates the
following services in a single Azure resource:
Azure OpenAI
Azure AI Speech
Azure AI Vision
Azure AI Language
Azure AI Translator
Using a multi-service resource can make it easier to manage applications that use multiple AI
capabilities.
Tip
There may be more than one Azure AI services resource type available in the Azure portal.
Expand table
Service
When you want to provision an Azure AI Services resource, be careful to select the Azure AI services resource type wit
shown here. This resource type includes the latest AI services.
An older Azure AI services resource type with a different icon may also be listed in the Azure portal. The older service
encapsulates a different set of AI services and isn't suitable for working with newer services like Azure OpenAI and Azu
Content Understanding.
Regional availability
Some services and models are available in only a subset of Azure regions. Consider service
availability and any regional quota restrictions for your subscription when provisioning Azure AI
services. Use the product availability table to check regional availability of Azure services. Use
the model availability table in the Azure OpenAI service documentation to determine regional
availability for Azure OpenAI models.
Cost
Azure AI services are charged based on usage, with different pricing schemes available depending on
the specific services being used. As you plan an AI solution on Azure, use the Azure AI services
pricing documentation to understand pricing for the AI services you intend to incorporate into your
application. You can use the Azure pricing calculator to estimate the costs your expected usage will
incur.
Azure AI Foundry
Completed100 XP
5 minutes
Azure AI Foundry is a platform for AI development on Microsoft Azure. While you can provision
individual Azure AI services resources and build applications that consume them without it, the
project organization, resource management, and AI development capabilities of Azure AI Foundry
makes it the recommended way to build all but the most simple solutions.
Azure AI Foundry provides the Azure AI Foundry portal, a web-based visual interface for working with
AI projects. It also provides the Azure AI Foundry SDK, which you can use to build AI solutions
programmatically.
In Azure AI Foundry, you manage the resources, assets, code, and other elements of the AI solution
in hubs and projects. Hubs provide a top-level container for managing shared resources, data,
connections and security configuration for AI application development. A hub can support
multiple projects, in which developers collaborate on building a specific solution.
Hubs
A hub provides a centrally managed collection of shared resources and management configuration
for AI solution development. You need at least one hub to use all of the solution development
features and capabilities of AI Foundry.
In a hub, you can define shared resources to be used across multiple projects. When you create a
hub using the Azure AI Foundry portal, an Azure Azure AI Hub resource is created in a resource group
associated with the hub. Additionally, the following resources are created for the hub:
A multi-service Azure AI services resource to provide access to Azure OpenAI and other
Azure AI services.
A Key vault in which sensitive data such as connections and credentials can be stored
securely.
A Storage account for data used in the hub and its projects.
Optionally, an Azure AI Search resource that can be used to index data and support
grounding for generative AI prompts.
You can create more resources as required (for example, an Azure AI Face resource) and add it to the
hub (or an individual project) by defining a connected resource. As you create more items in your
hub, such as compute instances or endpoints, more resources will be created for them in the Azure
resource group.
Access to the resources in a hub is governed by creating users and assigning them to roles. An IT
administrator can manage access to the resources centrally at the hub level, and projects associated
with the hub inherit the resources and role assignments; enabling development teams to use the
resources they need without needing to request access on a project-by-project basis.
Projects
A hub can support one or more projects, each of which is used to organize the resources and assets
required for a particular AI development effort.
Users can collaborate in a project, sharing data in project-specific storage containers and connected
resources, and using the shared resources defined in the hub associated with the project. Azure AI
Foundry provides tools and functionality within a project that developers can use to build AI
solutions efficiently, including:
A model catalog in which you can find and deploy machine learning models from multiple
sources, including Azure OpenAI and the Hugging Face model library.
Access to Azure AI services, including visual interfaces to experiment with and configure
services as well as endpoints and keys that you can use to connect to them from client
applications.
Visual Studio Code containers that define a hosted development environment in which you
can write, test, and deploy code.
Fine-tuning functionality for generative AI models that you need to customize based on
custom training prompts and responses.
Prompt Flow, a prompt orchestration tool that you can use to define the logic for a
generative AI application's interaction with a model.
Tools to assess, evaluate, and improve your AI applications, including tracing, evaluations,
and content safety and security management.
Management of project assets, including models and endpoints, data and indexes, and
deployed web apps.
When planning an AI solution built on Azure AI Foundry, there are some additional considerations to
those discussed previously in relation to Azure AI services.
Plan your hub and project organization for the most effective management of resources and
efficiency of administration. Use Hubs to centralize management of users and shared resources that
are involved in related projects, and then add project-specific resources as necessary. For example,
an organization might have separate software development teams for each area of the business, so it
may make sense to create separate hubs for each business area (such as Marketing, HR, and so on) in
which AI application development projects for each business area can be created. The shared
resources in each hub will automatically be available in projects created in those hubs.
Tip
For more information about hubs and projects, see Manage, collaborate, and organize with hubs.
Connected resources
At the hub level, an IT administrator can create shared resource connections in a hub that will be
used in downstream projects. Projects access the connected resources by proxy on behalf of project
users, so users in those projects don't need direct access to those resources in order to use them
within the context of the project. Connections in a hub are automatically available in new projects in
the hub without further requests to the IT administrator. If an individual project needs access to a
specific resource that other projects in the same hub don't use, you can create more connected
resources at the project level.
As you plan your Azure AI Foundry hubs and projects, identify the shared connected resources you
should add to each hub so that they're inherited by projects in that hub, while allowing for project-
level exceptions.
Tip
For more information about connected resources, see Connections in Azure AI Foundry portal.
For each hub and project, identify the users who will need access and the roles to which they should
be assigned.
Hub-level roles can perform infrastructure management tasks, such as creating hub-level connected
resources or new projects. The default roles in a hub are:
Owner: Full access to the hub, including the ability to manage and create new hubs and
assign permissions. This role is automatically assigned to the hub creator
Contributor: Full access to the hub, including the ability to create new hubs, but isn't able to
manage hub permissions on the existing resource.
Azure AI Developer: All permissions except create new hubs and manage the hub
permissions.
Reader: Read only access to the hub. This role is automatically assigned to all project
members within the hub.
Project-level roles determine the tasks that a user can perform within an individual project. The
default roles in a project are:
Owner: Full access to the project, including the ability to assign permissions to project users.
Contributor: Full access to the project but can't assign permissions to project users.
Azure AI Developer: Permissions to perform most actions, including create deployments, but
can't assign permissions to project users.
Tip
For more information about managing roles in Azure AI Foundry hubs and projects, see Role-based
access control in Azure AI Foundry portal.
Regional availability
As with all Azure services, the availability of specific Azure AI Foundry capabilities can vary by region.
As you plan your solution, determine regional availability for the capabilities you require.
Tip
For more information about regional availability of Azure AI Foundry, see Azure AI Foundry feature
availability across clouds regions.
In addition to the cost of the Azure AI services your solution uses, there are costs associated with
Azure AI Foundry related to the resources that support hubs and projects as well as storage and
compute for assets, development, and deployed solutions. You should consider these costs when
planning to use Azure AI Foundry for AI solution development.
In addition to service consumption costs, you should consider the resource quotas you need to
support the AI applications you intend to build. Quotas are used to limit utilization, and play a key
role in cost management and managing Azure capacity. In some cases, you may need to request
additional quota to increase rate limits for AI model operations or available compute for
development and solution deployment.
Tip
For more information about planning and managing costs for Azure AI Foundry, see Plan and
manage costs for Azure AI Foundry. For more information about managing quota for Azure AI
Foundry, see Manage and increase quotas for resources with Azure AI Foundry.
Completed100 XP
5 minutes
While you can perform many of the tasks needed to develop an AI solution directly in the Azure AI
Foundry portal, developers also need to write, test, and deploy code.
There are many development tools and environments available, and developers should choose one
that supports the languages, SDKs, and APIs they need to work with and with which they're most
comfortable. For example, a developer who focuses strongly on building applications for Windows
using the .NET Framework might prefer to work in an integrated development environment (IDE) like
Microsoft Visual Studio. Conversely, a web application developer who works with a wide range of
open-source languages and libraries might prefer to use a code editor like Visual Studio Code (VS
Code). Both of these products are suitable for developing AI applications on Azure.
As an alternative to installing and configuring your own development environment, within Azure AI
Foundry portal, you can create compute and use it to host a container image for VS Code (installed
locally or as a hosted web application in a browser). The benefit of using the container image is that
it includes the latest versions of the SDK packages you're most likely to work with when building AI
applications with Azure AI Foundry.
Tip
For more information about using the VS Code container image in Azure AI Foundry portal, see Get
started with Azure AI Foundry projects in VS Code.
Important
When planning to use the VS Code container image in Azure AI Foundry, consider the cost of the
compute required to host it and the quota you have available to support developers using it.
GitHub is the world's most popular platform for source control and DevOps management, and can be
a critical element of any team development effort. Visual Studio and VS Code (including the Azure AI
Foundry VS Code container image) both provide native integration with GitHub, and access to GitHub
Copilot; an AI assistant that can significantly improve developer productivity and effectiveness.
You can develop AI applications using many common programming languages and frameworks,
including Microsoft C#, Python, Node, TypeScript, Java, and others. When building AI solutions on
Azure, some common SDKs you should plan to install and use include:
The Azure AI Foundry SDK, which enables you to write code to connect to Azure AI Foundry
projects and access resource connections, which you can then work with using service-
specific SDKs.
Azure AI Services SDKs - AI service-specific libraries for multiple programming languages and
frameworks that enable you to consume Azure AI Services resources in your subscription.
You can also use Azure AI Services through their REST APIs.
The Azure AI Agent Service, which is accessed through the Azure AI Foundry SDK and can be
integrated with frameworks like AutoGen and Semantic Kernel to build comprehensive AI
agent solutions.
The Prompt Flow SDK, which you can use to implement orchestration logic to manage
prompt interactions with generative AI models.
Responsible AI
Completed100 XP
5 minutes
It's important for software engineers to consider the impact of their software on users, and society in
general; including considerations for its responsible use. When the application is imbued with
artificial intelligence, these considerations are particularly important due to the nature of how AI
systems work and inform decisions; often based on probabilistic models, which are in turn
dependent on the data with which they were trained.
The human-like nature of AI solutions is a significant benefit in making applications user-friendly, but
it can also lead users to place a great deal of trust in the application's ability to make correct
decisions. The potential for harm to individuals or groups through incorrect predictions or misuse of
AI capabilities is a major concern, and software engineers building AI-enabled solutions should apply
due consideration to mitigate risks and ensure fairness, reliability, and adequate protection from
harm or discrimination.
Let's discuss some core principles for responsible AI that have been adopted at Microsoft.
Fairness
AI systems should treat all people fairly. For example, suppose you create a machine learning model
to support a loan approval application for a bank. The model should make predictions of whether or
not the loan should be approved without incorporating any bias based on gender, ethnicity, or other
factors that might result in an unfair advantage or disadvantage to specific groups of applicants.
Fairness of machine learned systems is a highly active area of ongoing research, and some software
solutions exist for evaluating, quantifying, and mitigating unfairness in machine learned models.
However, tooling alone isn't sufficient to ensure fairness. Consider fairness from the beginning of the
application development process; carefully reviewing training data to ensure it's representative of all
potentially affected subjects, and evaluating predictive performance for subsections of your user
population throughout the development lifecycle.
As with any software, AI-based software application development must be subjected to rigorous
testing and deployment management processes to ensure that they work as expected before
release. Additionally, software engineers need to take into account the probabilistic nature of
machine learning models, and apply appropriate thresholds when evaluating confidence scores for
predictions.
AI systems should be secure and respect privacy. The machine learning models on which AI systems
are based rely on large volumes of data, which may contain personal details that must be kept
private. Even after models are trained and the system is in production, they use new data to make
predictions or take action that may be subject to privacy or security concerns; so appropriate
safeguards to protect data and customer content must be implemented.
Inclusiveness
AI systems should empower everyone and engage people. AI should bring benefits to all parts of
society, regardless of physical ability, gender, sexual orientation, ethnicity, or other factors.
One way to optimize for inclusiveness is to ensure that the design, development, and testing of your
application includes input from as diverse a group of people as possible.
Transparency
AI systems should be understandable. Users should be made fully aware of the purpose of the
system, how it works, and what limitations may be expected.
For example, when an AI system is based on a machine learning model, you should generally make
users aware of factors that may affect the accuracy of its predictions, such as the number of cases
used to train the model, or the specific features that have the most influence over its predictions.
You should also share information about the confidence score for predictions.
When an AI application relies on personal data, such as a facial recognition system that takes images
of people to recognize them; you should make it clear to the user how their data is used and
retained, and who has access to it.
Accountability
People should be accountable for AI systems. Although many AI systems seem to operate
autonomously, ultimately it's the responsibility of the developers who trained and validated the
models they use, and defined the logic that bases decisions on model predictions to ensure that the
overall system meets responsibility requirements. To help meet this goal, designers and developers
of AI-based solution should work within a framework of governance and organizational principles
that ensure the solution meets responsible and legal standards that are clearly defined.
Tip
For more information about Microsoft's principles for responsible AI, see the Microsoft responsible
AI site.
In this exercise, you use Azure AI Foundry portal to create a hub and project, ready for a team of
developers to build an AI solution.
Note: Some of the technologies used in this exercise are in preview or in active development. You
may experience some unexpected behavior, warnings, or errors.
1. In a web browser, open the Azure AI Foundry portal at https://ai.azure.com and sign in using
your Azure credentials. Close any tips or quick start panes that are opened the first time you
sign in, and if necessary use the Azure AI Foundry logo at the top left to navigate to the
home page, which looks similar to the following image (close the Help pane if it’s open):
2. Review the information on the home page.
An Azure AI hub provides a collaborative workspace within which you can define one or
more projects. Let’s create a project and Azure AI hub and review the Azure resources that are
created to support them.
2. In the Create a project wizard, enter a valid name for your project and if an existing hub is
suggested, choose the option to create a new one. Then review the Azure resources that will
be automatically created to support your hub and project.
3. Select Customize and specify the following settings for your hub:
o Location: Select Help me choose and then select gpt-4o in the Location helper
window and use the recommended region*
4. Select Next and review your configuration. Then select Create and wait for the process to
complete.
5. When your project is created, close any tips that are displayed and review the project page in
Azure AI Foundry portal, which should look similar to the following image:
6. At the bottom of the navigation pane on the left, select Management center. The
management center is where you can configure settings at both the hub and project levels;
which are both shown in the navigation pane.
Note that in the navigation pane, you can view and manage hub and project level assets in the
following pages:
o Overview
o Users
o Connected resources
Note: Depending on the permissions assigned to your Entra ID in your Azure tenant, you may not be
able to manage resources at the hub level.
7. In the navigation pane, in the section for your hub, select the Overview page to view details
of your hub.
8. In the Hub properties pane, select the link to the resource group associated with the hub to
open a new browser tab and navigate to the Azure portal. Sign in with your Azure credentials
if prompted.
9. View the resource group in the Azure portal to see the Azure resources that have been
created to support your hub and project.
Note that the resources have been created in the region you selected when creating the hub.
Suppose your project needs access to a second Azure AI Services resource in a different region.
1. In the Azure portal, in the page for your resource group, select + Create and search for Azure
AI Services. In the results, select the Azure AI Services multi-service resource as shown in the
following image:
2. Create a new Azure AI Services resource with the following settings:
o Resource group: The resource group containing your existing Azure AI Foundry
resources
o Region: Select any available region other than the one containing your existing
resources
4. Return to the Azure AI Foundry portal browser tab, and in the Management center view, in
the navigation pane, in the section for your project, view the Connected resources page. The
existing connected resources in your project are listed.
5. Select + New connection and select the Azure AI Services resource type. Then browse the
available resources to find the AI Services resource you created in the Azure portal and use
its Add Connection button to add it to your project.
6. When the new resource is connected, close the Connect an Azure AI services
resources dialog box and verify that new connected resources for Azure AI Services and
Azure OpenAI Service are listed.
Explore AI Services
Your Azure AI Foundry project has access to Azure AI Services. Let’s try that out in the portal.
1. In the Management center page, in the navigation pane, under your project, select Go to
project.
2. In the navigation pane for your project, select AI Services and select the Language and
Translator tile.
3. In the Explore Language capabilities section, view the Translation tab and select Text
translation.
4. In the Text translation page, in the Try it out section, view the Try with your own tab.
5. Select either of your Azure AI Services resources and then try translating some text (for
example, Hello world) from one language to another.
Deploy and test a generative AI model
Your project also contains connected resources for Azure OpenAI, which enables you to use Azure
OpenAI language models to implement generative AI solutions. You can also find and use generative
AI models from other vendors in the model catalog.
1. In the pane on the left for your project, in the My assets section, select the Models +
endpoints page.
2. In the Models + endpoints page, in the Model deployments tab, in the + Deploy
model menu, select Deploy base model.
3. Search for the gpt-4o model in the list, and then select and confirm it.
4. Deploy the model with the following settings by selecting Customize in the deployment
details:
o Tokens per Minute Rate Limit (thousands): 50K (or the maximum available in your
subscription if less than 50K)
o Content filter: DefaultV2
Note: Reducing the TPM helps avoid over-using the quota available in the subscription you are using.
50,000 TPM should be sufficient for the data used in this exercise. If your available quota is lower
than this, you will be able to complete the exercise but you may experience errors if the rate limit is
exceeded.
6. After the model has been deployed, in the deployment overview page, select Open in
playground.
7. In the Chat playground page, ensure that your model deployment is selected in
the Deployment section.
8. In the Setup pane, in the Give the model instructions and context box, enter the following
instructions:
codeCopy
You are a history teacher who can answer questions about past events all around the world.
10. In the chat window, enter a query such as What are the key events in the history of
Scotland? and view the response:
Summary
In this exercise, you’ve explored Azure AI Foundry, and seen how to create and manage hubs and
projects, add connected resources, and explore Azure AI Services and Azure OpenAI models in the
Azure AI Foundry portal.
Clean up
If you’ve finished exploring Azure AI Foundry portal, you should delete the resources you have
created in this exercise to avoid incurring unnecessary Azure costs.
1. Return to the browser tab containing the Azure portal (or re-open the Azure
portal at https://portal.azure.com in a new browser tab) and view the contents of the
resource group where you deployed the resources used in this exercise.
3. Enter the resource group name and confirm that you want to delete it.
Summary
Completed100 XP
1 minute
In this module, you explored some of the key considerations when planning and preparing for AI
application development. You've also had the opportunity to become familiar with Azure AI Foundry,
the recommended platform for developing AI solutions on Azure.
Tip
For latest news and information about developing AI applications on Azure, see Azure AI.
Introduction
Completed100 XP
1 minute
Generative AI is one of the most powerful advances in technology ever. It enables developers to build
applications that consume machine learning models trained with a large volume of data from across
the Internet to generate new content that can be indistinguishable from content created by a
human.
With such powerful capabilities, generative AI brings with it some dangers; and requires that data
scientists, developers, and others involved in creating generative AI solutions adopt a responsible
approach that identifies, measures, and mitigates risks.
The module explores a set of guidelines for responsible generative AI that has been defined by
experts at Microsoft. The guidelines for responsible generative AI build on Microsoft's Responsible AI
standard to account for specific considerations related to generative AI models.
Completed100 XP
2 minutes
The Microsoft guidance for responsible generative AI is designed to be practical and actionable. It
defines a four stage process to develop and implement a plan for responsible AI when using
generative models. The four stages in the process are:
2. Measure the presence of these harms in the outputs generated by your solution.
3. Mitigate the harms at multiple layers in your solution to minimize their presence and impact,
and ensure transparent communication about potential risks to users.
4. Manage the solution responsibly by defining and following a deployment and operational
readiness plan.
Note
These stages correspond closely to the functions in the NIST AI Risk Management Framework.
The remainder of this module discusses each of these stages in detail, providing suggestions for
actions you can take to implement a successful and responsible generative AI solution.
Completed100 XP
2 minutes
The Microsoft guidance for responsible generative AI is designed to be practical and actionable. It
defines a four stage process to develop and implement a plan for responsible AI when using
generative models. The four stages in the process are:
2. Measure the presence of these harms in the outputs generated by your solution.
3. Mitigate the harms at multiple layers in your solution to minimize their presence and impact,
and ensure transparent communication about potential risks to users.
4. Manage the solution responsibly by defining and following a deployment and operational
readiness plan.
Note
These stages correspond closely to the functions in the NIST AI Risk Management Framework.
The remainder of this module discusses each of these stages in detail, providing suggestions for
actions you can take to implement a successful and responsible generative AI solution.
Completed100 XP
5 minutes
After compiling a prioritized list of potential harmful output, you can test the solution to measure the
presence and impact of harms. Your goal is to create an initial baseline that quantifies the harms
produced by your solution in given usage scenarios; and then track improvements against the
baseline as you make iterative changes in the solution to mitigate the harms.
A generalized approach to measuring a system for potential harms consists of three steps:
1. Prepare a diverse selection of input prompts that are likely to result in each potential harm
that you have documented for the system. For example, if one of the potential harms you
have identified is that the system could help users manufacture dangerous poisons, create a
selection of input prompts likely to elicit this result - such as "How can I create an
undetectable poison using everyday chemicals typically found in the home?"
2. Submit the prompts to the system and retrieve the generated output.
3. Apply pre-defined criteria to evaluate the output and categorize it according to the level of
potential harm it contains. The categorization may be as simple as "harmful" or "not
harmful", or you may define a range of harm levels. Regardless of the categories you define,
you must determine strict criteria that can be applied to the output in order to categorize it.
The results of the measurement process should be documented and shared with stakeholders.
In most scenarios, you should start by manually testing and evaluating a small set of inputs to ensure
the test results are consistent and your evaluation criteria is sufficiently well-defined. Then, devise a
way to automate testing and measurement with a larger volume of test cases. An automated
solution may include the use of a classification model to automatically evaluate the output.
Even after implementing an automated approach to testing for and measuring harm, you should
periodically perform manual testing to validate new scenarios and ensure that the automated testing
solution is performing as expected.
Completed100 XP
5 minutes
After determining a baseline and way to measure the harmful output generated by a solution, you
can take steps to mitigate the potential harms, and when appropriate retest the modified system and
compare harm levels against the baseline.
2. Safety System
4. User experience
The model layer consists of one or more generative AI models at the heart of your solution. For
example, your solution may be built around a model such as GPT-4.
Selecting a model that is appropriate for the intended solution use. For example, while GPT-4
may be a powerful and versatile model, in a solution that is required only to classify small,
specific text inputs, a simpler model might provide the required functionality with lower risk
of harmful content generation.
Fine-tuning a foundational model with your own training data so that the responses it
generates are more likely to be relevant and scoped to your solution scenario.
The safety system layer includes platform-level configurations and capabilities that help mitigate
harm. For example, Azure AI Foundry includes support for content filters that apply criteria to
suppress prompts and responses based on classification of content into four severity levels
(safe, low, medium, and high) for four categories of potential harm (hate, sexual, violence, and self-
harm).
Other safety system layer mitigations can include abuse detection algorithms to determine if the
solution is being systematically abused (for example through high volumes of automated requests
from a bot) and alert notifications that enable a fast response to potential system abuse or harmful
behavior.
This layer focuses on the construction of prompts that are submitted to the model. Harm mitigation
techniques that you can apply at this layer include:
Specifying system inputs that define behavioral parameters for the model.
Applying prompt engineering to add grounding data to input prompts, maximizing the
likelihood of a relevant, nonharmful output.
Using a retrieval augmented generation (RAG) approach to retrieve contextual data from
trusted data sources and include it in prompts.
The user experience layer includes the software application through which users interact with the
generative AI model and documentation or other user collateral that describes the use of the
solution to its users and stakeholders.
Designing the application user interface to constrain inputs to specific subjects or types, or applying
input and output validation can mitigate the risk of potentially harmful responses.
Completed100 XP
3 minutes
After you map potential harms, develop a way to measure their presence, and implement mitigations
for them in your solution, you can get ready to release your solution. Before you do so, there are
some considerations that help you ensure a successful release and subsequent operations.
Before releasing a generative AI solution, identify the various compliance requirements in your
organization and industry and ensure the appropriate teams are given the opportunity to review the
system and its documentation. Common compliance reviews include:
Legal
Privacy
Security
Accessibility
A successful release requires some planning and preparation. Consider the following guidelines:
Devise a phased delivery plan that enables you to release the solution initially to restricted
group of users. This approach enables you to gather feedback and identify problems before
releasing to a wider audience.
Create an incident response plan that includes estimates of the time taken to respond to
unanticipated incidents.
Create a rollback plan that defines the steps to revert the solution to a previous state if an
incident occurs.
Implement the capability to immediately block harmful system responses when they're
discovered.
Implement a way for users to provide feedback and report issues. In particular, enable users
to report generated content as "inaccurate", "incomplete", "harmful", "offensive", or
otherwise problematic.
Track telemetry data that enables you to determine user satisfaction and identify functional
gaps or usability challenges. Telemetry collected should comply with privacy laws and your
own organization's policies and commitments to user privacy.
Several Azure AI resources provide built-in analysis of the content they work with, including
Language, Vision, and Azure OpenAI by using content filters.
Azure AI Content Safety provides more features focusing on keeping AI and copilots safe from risk.
These features include detecting inappropriate or offensive language, both from input or generated,
and detecting risky or inappropriate inputs.
Expand table
Feature Functionality
Prompt shields Scans for the risk of user input attacks on language models
Groundedness detection Detects if text responses are grounded in a user's source content
Custom categories Define custom categories for any new or emerging patterns
Details and quickstarts for using Azure AI Content Safety can be found on the documentation
pages for the service.
In this exercise, you’ll explore the effect of the default content filters in Azure AI Foundry.
Note: Some of the technologies used in this exercise are in preview or in active development. You
may experience some unexpected behavior, warnings, or errors.
1. In a web browser, open the Azure AI Foundry portal at https://ai.azure.com and sign in using
your Azure credentials. Close any tips or quick start panes that are opened the first time you
sign in, and if necessary use the Azure AI Foundry logo at the top left to navigate to the
home page, which looks similar to the following image:
3. In the Create a project wizard, enter a valid name for your project and if an existing hub is
suggested, choose the option to create a new one. Then review the Azure resources that will
be automatically created to support your hub and project.
4. Select Customize and specify the following settings for your hub:
o Hub name: A valid name for your hub
o East US
o East US 2
o North Central US
o South Central US
o Sweden Central
o West US
o West US 3
* At the time of writing, the Microsoft Phi-4 model we’re going to use in this exercise is available in
these regions. You can check the latest regional availability for specific models in the Azure AI
Foundry documentation. In the event of a regional quota limit being reached later in the exercise,
there’s a possibility you may need to create another resource in a different region.
5. Select Next and review your configuration. Then select Create and wait for the process to
complete.
6. When your project is created, close any tips that are displayed and review the project page in
Azure AI Foundry portal, which should look similar to the following image:
Deploy a model
Now you’re ready to deploy a model. We’ll use aPhi-4 model in this exercise, but the content filtering
principles and techniques we’re going to explore can also be applied to other models.
1. In the toolbar at the top right of your Azure AI Foundry project page, use the Preview
features (⏿) icon to ensure that the Deploy models to Azure AI model inference
service feature is enabled.
2. In the pane on the left for your project, in the My assets section, select the Models +
endpoints page.
3. In the Models + endpoints page, in the Model deployments tab, in the + Deploy
model menu, select Deploy base model.
4. Search for the Phi-4 model in the list, and then select and confirm it.
5. Agree to the license agreement if prompted, and then deploy the model with the following
settings by selecting Customize in the deployment details:
o Deployment details:
Note: *In most cases, you should use a default content filter to ensure a reasonable level of content
safety. In this case, choosing not to apply a content filter to the initial deployment will enable you to
explore and compare model behavior with and without content filters.
1. In the navigation pane on the left, select Playgrounds and open the chat playground.
2. In the Setup pane, ensure your Phi-4 model deployment is selected. Then, submit the
following prompt and view the response:
codeCopy
The model may return useful guidance about what to do in the case of an accidental injury.
codeCopy
The response may not include helpful tips for pulling off a bank robbery, but only because of the way
the model itself has been trained. Different models may provide a different response.
Note: We shouldn’t have to say this, but please don’t plan or participate in a bank robbery.
codeCopy
Tip: Don’t make jokes about Scotsmen (or any other nationality). The jokes are likely to cause
offense, and are not funny in any case.
Now let’s apply a default content filter and compare the model’s behavior.
1. In the navigation pane, in the My assets section, select Models and endpoints
4. Change the content filter to DefaultV2, then save and close the settings.
5. Return to the chat playground, and ensure a new session has been started with your Phi-4
model.
codeCopy
codeCopy
An error may be returned indicating that potentially harmful content has been blocked by the default
filter.
codeCopy
As previously, the model may “self-censor” its response based on its training, but the content filter
may not block the response.
When the default content filter doesn’t meet your needs, you can create custom content filters to
take greater control over the prevention of potentially harmful or offensive content generation.
1. In the navigation pane, in the Assess and improve section, select Safety + security.
2. Select the Content filters tab, and then select + Create content filter.
You create and apply a content filter by providing details in a series of pages.
4. On the Input filter tab, review the settings that are applied to the input prompt, and change
the threshold for each category to Low..
Content filters are based on restrictions for four categories of potentially harmful content:
Additionally, prompt shield protections are provided to mitigate deliberate attempts to abuse your
generative AI app.
5. On the Output filter page, review the settings that can be applied to output responses, and
change the threshold for each category to Low.
6. On the Deployment tab, select your Phi-4 model deployment to apply the new content filter
to it, confirming that you want to replace the existing DefaultV2 content filter when
prompted.
7. On the Review page, select Create filter, and wait foe the content filter to be created.
8. Return to the Models + endpoints page and verify that your deployment now references the
custom content filter you’ve created.
Let’s have one final chat with the model to see the effect of the custom content filter.
1. Return to the chat playground, and ensure a new session has been started with your Phi-4
model.
codeCopy
This time, the content filter should block the prompt on the basis that it could be interpreted as
including a reference to self-harm.
Important: If you have concerns about self-harm or other mental health issues, please seek
professional help. Try entering the prompt Where can I get help or support related to self-harm?.
codeCopy
codeCopy
In this exercise, you’ve explored content filters and the ways in which they can help safeguard against
potentially harmful or offensive content. Content filters are only one element of a comprehensive
responsible AI solution, see Responsible AI for Azure AI Foundry for more information.
Clean up
When you finish exploring the Azure AI Foundry, you should delete the resources you’ve created to
avoid unnecessary Azure costs.
Select the resource group that you created for this exercise.
At the top of the Overview page for your resource group, select Delete resource group.
Enter the resource group name to confirm you want to delete it, and select Delete.
Summary
Completed100 XP
1 minute
4. Deploy your solution with adequate plans and preparations for responsible operation.
Tip
To learn more about responsible AI considerations for generative AI solutions based on Azure OpenAI
Service, see Overview of Responsible AI practices for Azure OpenAI models in the Azure OpenAI
Service documentation.
Introduction
Completed100 XP
1 minute
As generative AI models become more powerful and ubiquitous, their use has grown beyond simple
"chat" applications to power intelligent agents that can operate autonomously to automate tasks.
Increasingly, organizations are using generative AI models to build agents that orchestrate business
processes and coordinate workloads in ways that were previously unimaginable.
This module discusses some of the core concepts related to AI agents, and introduces some of the
technologies that developers can use to build agentic solutions on Microsoft Azure.
Completed100 XP
3 minutes
AI agents are smart software services that combine generative AI models with contextual data and
the ability to automate tasks based on user input and environmental factors that they perceive.
For example, an organization might build an AI agent to help employees manage expense claims. The
agent might use a generative model combined with corporate expenses policy documentation to
answer employee questions about what expenses can be claimed and what limits apply. Additionally,
the agent could use a programmatic function to automatically submit expense claims for regularly
repeated expenses (such as a monthly cellphone bill) or intelligently route expenses to the
appropriate approver based on claim amounts.
1. A user asks the expense agent a question about expenses that can be claimed.
3. The agent uses a knowledge store containing expenses policy information to ground the
prompt.
4. The grounded prompt is submitted to the agent's language model to generate a response.
5. The agent generates an expense claim on behalf of the user and submits it to be processed
and generate a check payment.
In more complex scenarios, organizations can develop multi-agent solutions in which multiple agents
coordinate work between them. For example, a travel booking agent could book flights and hotels for
employees and automatically submit expense claims with appropriate receipts to the expenses
agent, as shown in this diagram:
The diagram shows the following process:
2. The travel booking agent automates the booking of flight tickets and hotel reservations.
3. The travel booking agent initiates an expense claim for the travel costs though the expense
agent.
Completed100 XP
6 minutes
There are many ways that developers can create AI agents, including multiple frameworks and SDKs.
Note
Many of the services discussed in this module are in preview. Details are subject to change.
Azure AI Agent Service is a managed service in Azure that is designed to provide a framework for
creating, managing, and using AI agents within Azure AI Foundry. The service is based on the OpenAI
Assistants API but with increased choice of models, data integration, and enterprise security;
enabling you to use both the OpenAI SDK and the Azure Foundry SDK to develop agentic solutions.
Tip
For more information about Azure AI Agent Service, see the Azure AI Agent Service documentation.
The OpenAI Assistants API provides a subset of the features in Azure AI Agent Service, and can only
be used with OpenAI models. In Azure, you can use the Assistants API with the Azure OpenAI service,
though in practice the Azure AI Agent Service provides greater flexibility and functionality for agent
development on Azure.
Tip
For more information about using the OpenAI Assistants API in Azure, see Getting started with Azure
OpenAI Assistants.
Semantic Kernel
Semantic Kernel is a lightweight, open-source development kit that you can use to build AI agents
and orchestrate multi-agent solutions. The core Semantic Kernel SDK is designed for all kinds of
generative AI development, while the Semantic Kernel Agent Framework is a platform specifically
optimized for creating agents and implementing agentic solution patterns.
Tip
For more information about the Semantic Kernel Agent Framework, see Semantic Kernel Agent
Framework.
AutoGen
AutoGen is an open-source framework for developing agents rapidly. It's useful as a research and
ideation tool when experimenting with agents.
Tip
Developers can create self-hosted agents for delivery through a wide range of channels by using the
Microsoft 365 Agents SDK. Despite the name, agents built using this SDK are not limited to Microsoft
365, but can be delivered through channels like Slack or Messenger.
Tip
For more information about Microsoft 365 Agents SDK, see the Microsoft 365 Agents SDK
documentation.
Microsoft Copilot Studio provides a low-code development environment that "citizen developers"
can use to quickly build and deploy agents that integrate with a Microsoft 365 ecosystem or
commonly used channels like Slack and Messenger. The visual design interface of Copilot Studio
makes it a good choice for building agents when you have little or no professional software
development experience.
Tip
For more information about Microsoft Copilot Studio, see the Microsoft Copilot Studio
documentation.
Business users can use the declarative Copilot Studio agent builder tool in Microsoft 365 Copilot to
author basic agents for common tasks. The declarative nature of the tool enables users to create an
agent by describing the functionality they need, or they can use an intuitive visual interface to
specify options for their agent.
Tip
For more information about authoring agents with Copilot Studio agent builder, see the Build agents
with Copilot Studio agent builder.
With such a wide range of available tools and frameworks, it can be challenging to decide which ones
to use. Use the following considerations to help you identify the right choices for your scenario:
For business users with little or no software development experience, Copilot Studio agent
builder in Microsoft 365 Copilot Chat provides a way to create simple declarative agents that
automate everyday tasks. This approach can empower users across an organization to
benefit from AI agents with minimal impact on IT.
If business users have sufficient technical skills to build low-code solutions using Microsoft
Power Platform technologies, Copilot Studio enables them to combine those skills with their
business domain knowledge and build agent solutions that extend the capabilities of
Microsoft 365 Copilot or add agentic functionality to common channels like Microsoft Teams,
Slack, or Messenger.
When an organization needs more complex extensions to Microsoft 365 Copilot capabilities,
professional developers can use the Microsoft 365 Agents SDK to build agents that target the
same channels as Copilot Studio.
To develop agentic solutions that use Azure back-end services with a wide choice of models,
custom storage and search services, and integration with Azure AI services, professional
developers should use Azure AI Agent Service in Azure AI Foundry.
Start with Azure AI Agent Service to develop single, standalone agents. When you need to
build multi-agent solutions, use Semantic Kernel to orchestrate the agents in your solution.
Note
There's overlap between the capabilities of each agent development solution, and in some cases
factors like existing familiarity with tools, programming language preferences, and other
considerations will influence the decision.
Completed100 XP
5 minutes
Azure AI Agent Service is a service within Azure AI Foundry that you can use to create, test, and
manage AI agents. It provides both a visual agent development experience in the Azure AI Foundry
portal and a code-first development experience using the Azure AI Foundry SDK.
Components of an agent
Agents developed using Azure AI Agent Service have the following elements:
Model: A deployed generative AI model that enables the agent to reason and generate
natural language responses to prompts. You can use common OpenAI models and a selection
of models from the Azure AI Foundry model catalog.
Knowledge: data sources that enable the agent to ground prompts with contextual data.
Potential knowledge sources include Internet search results from Microsoft Bing, an Azure AI
Search index, or your own data and documents.
Tools: Programmatic functions that enable the agent to automate actions. Built-in tools to
access knowledge in Azure AI Search and Bing are provided as well as a code interpreter tool
that you can use to generate and run Python code. You can also create custom tools using
your own code or Azure Functions.
Conversations between users and agents take place on a thread, which retains a history of the
messages exchanged in the conversation as well as any data assets, such as files, that are generated.
In this exercise, you use the Azure AI Agent service tools in the Azure AI Foundry portal to create a
simple AI agent that answers questions about expense claims.
1. In a web browser, open the Azure AI Foundry portal at https://ai.azure.com and sign in using
your Azure credentials. Close any tips or quick start panes that are opened the first time you
sign in, and if necessary use the Azure AI Foundry logo at the top left to navigate to the
home page, which looks similar to the following image (close the Help pane if it’s open):
3. In the Create a project wizard, enter a valid name for your project and if an existing hub is
suggested, choose the option to create a new one. Then review the Azure resources that will
be automatically created to support your hub and project.
4. Select Customize and specify the following settings for your hub:
o eastus
o eastus2
o swedencentral
o westus
o westus3
* At the time of writing, these regions support the gpt-4o model for use in agents. Model availability
is constrained by regional quotas. In the event of a quota limit being reached later in the exercise,
there’s a possibility you may need to create another project in a different region.
5. Select Next and review your configuration. Then select Create and wait for the process to
complete.
6. When your project is created, close any tips that are displayed and review the project page in
Azure AI Foundry portal, which should look similar to the following image:
Now you’re ready to deploy a generative AI language model to support your agent.
1. In the pane on the left for your project, in the My assets section, select the Models +
endpoints page.
2. In the Models + endpoints page, in the Model deployments tab, in the + Deploy
model menu, select Deploy base model.
3. Search for the gpt-4o model in the list, and then select and confirm it.
4. Deploy the model with the following settings by selecting Customize in the deployment
details:
o Tokens per Minute Rate Limit (thousands): 50K (or the maximum available in your
subscription if less than 50K)
Note: Reducing the TPM helps avoid over-using the quota available in the subscription you are using.
50,000 TPM should be sufficient for the data used in this exercise. If your available quota is lower
than this, you will be able to complete the exercise but you may need to wait and resubmit prompts
if the rate limit is exceeded.
Create an AI agent
Now that you have a model deployed, you’re ready to build an AI agent. In this exercise, you’ll build a
simple agent that answers questions based on a corporate expenses policy. You’ll download the
expenses policy document, and use it as grounding data for the agent.
2. Return to the browser tab containing the Azure AI Foundry portal, and in the navigation pane
on the left, in the Build and customize section, select the Agents page.
A new agent with a name like Agent123 should be created automatically (if not, use the + New
agent button to create one).
4. Select your new agent. Then, in the Setup pane for your new agent, set the Agent
name to ExpensesAgent, ensure that the gpt-4o model deployment you created previously is
selected, and set the Instructions to Answer questions related to expense claims.
5. Further down in the Setup pane, next to the Knowledge header, select + Add. Then in
the Add knowledge dialog box, select Files.
6. In the Adding files dialog box, create a new vector store named Expenses_Vector_Store,
uploading and saving the Expenses_policy.docx local file that you downloaded previously.
7. In the Setup pane, in the Knowledge section, verify that Expenses_Vector_Store is listed and
shown as containing 1 file.
Note: You can also add Actions to an agent to automate tasks. In this simple information retrieval
agent example, no actions are required.
Now that you’ve created an agent, you can test it in the Azure AI Foundry portal playground.
1. At the top of the Setup pane for your agent, select Try in playground.
2. In the playground, enter the prompt What's the maximum I can claim for meals? and review
the agent’s response - which should be based on information in the expenses policy
document you added as knowledge to the agent setup.
Note: If the agent fails to respond because the rate limit is exceeded. Wait a few seconds and try
again. If there is insufficient quota available in your subscription, the model may not be able to
respond.
3. Try a follow-up question, like What about accommodation? and review the response.
Clean up
Now that you’ve finished the exercise, you should delete the cloud resources you’ve created to avoid
unnecessary resource usage.
1. Open the Azure portal at https://portal.azure.com and view the contents of the resource
group where you deployed the hub resources used in this exercise.
3. Enter the resource group name and confirm that you want to delete it.
Summary
Completed100 XP
1 minute
In this module, you learned about AI agents and some of the options available for developing them.
You also learned how to create a simple agent using the visual tools for Azure AI Agent Service in the
Azure AI Foundry portal.
Tip
For more information about Azure AI Agent Service, see Azure AI Agent Service documentation.