DP 900T00A ENU TrainerHandbook.
Official
Course
DP-900T00
Microsoft Azure Data
Fundamentals
Disclaimer
Information in this document, including URL and other Internet Web site references, is subject to change
without notice. Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with
any real company, organization, product, domain name, e-mail address, logo, person, place or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the
user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in
or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The names of manufacturers, products, or URLs are provided for informational purposes only and
Microsoft makes no representations or warranties, either expressed, implied, or statutory, regarding
these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a
manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links
may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is
not responsible for the contents of any linked site or any link contained in a linked site, or any changes or
updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any linked site. Microsoft is providing these links to you only as a convenience, and the
inclusion of any link does not imply endorsement by Microsoft of the site or the products contained
therein.
EULA
13. “Personal Device” means one (1) personal computer, device, workstation or other digital electronic
device that you personally own or control that meets or exceeds the hardware level specified for
the particular Microsoft Instructor-Led Courseware.
14. “Private Training Session” means the instructor-led training classes provided by MPN Members for
corporate customers to teach a predefined learning objective using Microsoft Instructor-Led
Courseware. These classes are not advertised or promoted to the general public and class attendance
is restricted to individuals employed by or contracted by the corporate customer.
15. “Trainer” means (i) an academically accredited educator engaged by a Microsoft Imagine Academy
Program Member to teach an Authorized Training Session, (ii) an academically accredited educator
validated as a Microsoft Learn for Educators – Validated Educator, and/or (iii) an MCT.
16. “Trainer Content” means the trainer version of the Microsoft Instructor-Led Courseware and
additional supplemental content designated solely for Trainers’ use to teach a training session
using the Microsoft Instructor-Led Courseware. Trainer Content may include Microsoft PowerPoint
presentations, trainer preparation guide, train the trainer materials, Microsoft OneNote packs,
classroom setup guide and Pre-release course feedback form. To clarify, Trainer Content does not
include any software, virtual hard disks or virtual machines.
2. USE RIGHTS. The Licensed Content is licensed, not sold. The Licensed Content is licensed on a one
copy per user basis, such that you must acquire a license for each individual that accesses or uses the
Licensed Content.
●● 2.1 Below are five separate sets of use rights. Only one set of rights applies to you.
1. If you are a Microsoft Imagine Academy (MSIA) Program Member:
1. Each license acquired on behalf of yourself may only be used to review one (1) copy of the
Microsoft Instructor-Led Courseware in the form provided to you. If the Microsoft Instructor-Led
Courseware is in digital format, you may install one (1) copy on up to three (3)
Personal Devices. You may not install the Microsoft Instructor-Led Courseware on a device
you do not own or control.
2. For each license you acquire on behalf of an End User or Trainer, you may either:
1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User who is enrolled in the Authorized Training Session, and only immediately
prior to the commencement of the Authorized Training Session that is the subject matter
of the Microsoft Instructor-Led Courseware being provided, or
2. provide one (1) End User with the unique redemption code and instructions on how they
can access one (1) digital version of the Microsoft Instructor-Led Courseware, or
3. provide one (1) Trainer with the unique redemption code and instructions on how they
can access one (1) Trainer Content.
3. For each license you acquire, you must comply with the following:
1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure each End User attending an Authorized Training Session has their own
valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of the
Authorized Training Session,
3. you will ensure that each End User provided with the hard-copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agreement
in a manner that is enforceable under local law prior to their accessing the
Microsoft Instructor-Led Courseware,
4. you will ensure that each Trainer teaching an Authorized Training Session has their own
valid licensed copy of the Trainer Content that is the subject of the Authorized Training
Session,
5. you will only use qualified Trainers who have in-depth knowledge of and experience with
the Microsoft technology that is the subject of the Microsoft Instructor-Led Courseware
being taught for all your Authorized Training Sessions,
6. you will only deliver a maximum of 15 hours of training per week for each Authorized
Training Session that uses a MOC title, and
7. you acknowledge that Trainers that are not MCTs will not have access to all of the trainer
resources for the Microsoft Instructor-Led Courseware.
2. If you are a Microsoft Learning Competency Member:
1. Each license acquired may only be used to review one (1) copy of the Microsoft
Instructor-Led Courseware in the form provided to you. If the Microsoft Instructor-Led
Courseware is in digital format, you may install one (1) copy on up to three (3) Personal Devices.
You may not install the Microsoft Instructor-Led Courseware on a device you do not own or
control.
2. For each license you acquire on behalf of an End User or MCT, you may either:
1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User attending the Authorized Training Session and only immediately prior to
the commencement of the Authorized Training Session that is the subject matter of the
Microsoft Instructor-Led Courseware provided, or
2. provide one (1) End User attending the Authorized Training Session with the unique
redemption code and instructions on how they can access one (1) digital version of the
Microsoft Instructor-Led Courseware, or
3. provide one (1) MCT with the unique redemption code and instructions on how
they can access one (1) Trainer Content.
3. For each license you acquire, you must comply with the following:
1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure that each End User attending an Authorized Training Session has their
own valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of
the Authorized Training Session,
3. you will ensure that each End User provided with a hard-copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agreement
in a manner that is enforceable under local law prior to their accessing the
Microsoft Instructor-Led Courseware,
4. you will ensure that each MCT teaching an Authorized Training Session has their own
valid licensed copy of the Trainer Content that is the subject of the Authorized Training
Session,
5. you will only use qualified MCTs who also hold the applicable Microsoft Certification
credential that is the subject of the MOC title being taught for all your Authorized
Training Sessions using MOC,
6. you will only provide access to the Microsoft Instructor-Led Courseware to End Users,
and
7. you will only provide access to the Trainer Content to MCTs.
3. If you are a MPN Member:
1. Each license acquired on behalf of yourself may only be used to review one (1) copy of the
Microsoft Instructor-Led Courseware in the form provided to you. If the Microsoft
Instructor-Led Courseware is in digital format, you may install one (1) copy on up to three (3)
Personal Devices. You may not install the Microsoft Instructor-Led Courseware on a device
you do not own or control.
2. For each license you acquire on behalf of an End User or Trainer, you may either:
1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User attending the Private Training Session, and only immediately prior to the
commencement of the Private Training Session that is the subject matter of the
Microsoft Instructor-Led Courseware being provided, or
2. provide one (1) End User who is attending the Private Training Session with the unique
redemption code and instructions on how they can access one (1) digital version of the
Microsoft Instructor-Led Courseware, or
3. provide one (1) Trainer who is teaching the Private Training Session with the
unique redemption code and instructions on how they can access one (1) Trainer
Content.
3. For each license you acquire, you must comply with the following:
1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure that each End User attending a Private Training Session has their own
valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of the
Private Training Session,
3. you will ensure that each End User provided with a hard copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agreement
in a manner that is enforceable under local law prior to their accessing the
Microsoft Instructor-Led Courseware,
4. you will ensure that each Trainer teaching a Private Training Session has their own valid
licensed copy of the Trainer Content that is the subject of the Private Training Session,
5. you will only use qualified Trainers who hold the applicable Microsoft Certification
credential that is the subject of the Microsoft Instructor-Led Courseware being taught
for all your Private Training Sessions,
6. you will only use qualified MCTs who hold the applicable Microsoft Certification
credential that is the subject of the MOC title being taught for all your Private Training Sessions
using MOC,
7. you will only provide access to the Microsoft Instructor-Led Courseware to End Users,
and
8. you will only provide access to the Trainer Content to Trainers.
4. If you are an End User:
For each license you acquire, you may use the Microsoft Instructor-Led Courseware solely for
your personal training use. If the Microsoft Instructor-Led Courseware is in digital format, you
may access the Microsoft Instructor-Led Courseware online using the unique redemption code
provided to you by the training provider and install and use one (1) copy of the Microsoft
Instructor-Led Courseware on up to three (3) Personal Devices. You may also print one (1) copy
of the Microsoft Instructor-Led Courseware. You may not install the Microsoft Instructor-Led
Courseware on a device you do not own or control.
5. If you are a Trainer:
1. For each license you acquire, you may install and use one (1) copy of the Trainer Content in
the form provided to you on one (1) Personal Device solely to prepare and deliver an
Authorized Training Session or Private Training Session, and install one (1) additional copy
on another Personal Device as a backup copy, which may be used only to reinstall the
Trainer Content. You may not install or use a copy of the Trainer Content on a device you do
not own or control. You may also print one (1) copy of the Trainer Content solely to prepare
for and deliver an Authorized Training Session or Private Training Session.
2. If you are an MCT, you may customize the written portions of the Trainer Content that are
logically associated with instruction of a training session in accordance with the most recent
version of the MCT agreement.
3. If you elect to exercise the foregoing rights, you agree to comply with the following: (i)
customizations may only be used for teaching Authorized Training Sessions and Private
Training Sessions, and (ii) all customizations will comply with this agreement. For clarity, any
use of “customize” refers only to changing the order of slides and content, and/or not using
all the slides or content; it does not mean changing or modifying any slide or content.
●● 2.2 Separation of Components. The Licensed Content is licensed as a single unit and you
may not separate its components and install them on different devices.
●● 2.3 Redistribution of Licensed Content. Except as expressly provided in the use rights
above, you may not distribute any Licensed Content or any portion thereof (including any permitted
modifications) to any third parties without the express written permission of Microsoft.
●● 2.4 Third Party Notices. The Licensed Content may include third party code that
Microsoft, not the third party, licenses to you under this agreement. Notices, if any, for the third party
code are included for your information only.
●● 2.5 Additional Terms. Some Licensed Content may contain components with additional
terms, conditions, and licenses regarding their use. Any non-conflicting terms in those conditions
and licenses also apply to your use of that respective component and supplement the terms
described in this agreement.
5. RESERVATION OF RIGHTS AND OWNERSHIP. Microsoft reserves all rights not expressly granted to
you in this agreement. The Licensed Content is protected by copyright and other intellectual property
laws and treaties. Microsoft or its suppliers own the title, copyright, and other intellectual property
rights in the Licensed Content.
6. EXPORT RESTRICTIONS. The Licensed Content is subject to United States export laws and regulations.
You must comply with all domestic and international export laws and regulations that apply to
the Licensed Content. These laws include restrictions on destinations, end users and end use. For
additional information, see www.microsoft.com/exporting.
7. SUPPORT SERVICES. Because the Licensed Content is provided “as is”, we are not obligated to
provide support services for it.
8. TERMINATION. Without prejudice to any other rights, Microsoft may terminate this agreement if you
fail to comply with the terms and conditions of this agreement. Upon termination of this agreement
for any reason, you will immediately stop all use of and delete and destroy all copies of the Licensed
Content in your possession or under your control.
9. LINKS TO THIRD PARTY SITES. You may link to third party sites through the use of the Licensed
Content. The third party sites are not under the control of Microsoft, and Microsoft is not responsible
for the contents of any third party sites, any links contained in third party sites, or any changes or
updates to third party sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any third party sites. Microsoft is providing these links to third party sites to
you only as a convenience, and the inclusion of any link does not imply an endorsement by Microsoft
of the third party site.
10. ENTIRE AGREEMENT. This agreement, and any additional terms for the Trainer Content, updates and
supplements are the entire agreement for the Licensed Content, updates and supplements.
11. APPLICABLE LAW.
1. United States. If you acquired the Licensed Content in the United States, Washington state law
governs the interpretation of this agreement and applies to claims for breach of it, regardless of
conflict of laws principles. The laws of the state where you live govern all other claims, including
claims under state consumer protection laws, unfair competition laws, and in tort.
2. Outside the United States. If you acquired the Licensed Content in any other country, the laws of
that country apply.
12. LEGAL EFFECT. This agreement describes certain legal rights. You may have other rights under the
laws of your country. You may also have rights with respect to the party from whom you acquired the
Licensed Content. This agreement does not change your rights under the laws of your country if the
laws of your country do not permit it to do so.
13. DISCLAIMER OF WARRANTY. THE LICENSED CONTENT IS LICENSED "AS-IS" AND "AS
AVAILABLE." YOU BEAR THE RISK OF USING IT. MICROSOFT AND ITS RESPECTIVE AFFILIATES GIVE NO
EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS. YOU MAY HAVE ADDITIONAL
CONSUMER RIGHTS UNDER YOUR LOCAL LAWS WHICH THIS AGREEMENT CANNOT CHANGE. TO
THE EXTENT PERMITTED UNDER YOUR LOCAL LAWS, MICROSOFT AND ITS RESPECTIVE
AFFILIATES EXCLUDE ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NON-INFRINGEMENT.
14. LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. YOU CAN RECOVER FROM
MICROSOFT, ITS RESPECTIVE AFFILIATES AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO
US$5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST
PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.
■■ Module 0 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Welcome to the course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
■■ Module 1 Explore core data concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Explore core data concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Explore roles and responsibilities in the world of data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Describe concepts of relational data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Explore concepts of non-relational data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Explore concepts of data analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
■■ Module 2 Explore relational data in Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Explore relational data offerings in Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Explore provisioning and deploying relational database offerings in Azure . . . . . . . . . . . . . . . . . . . . . . 79
Query relational data in Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
■■ Module 3 Explore non-relational data offerings on Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Explore non-relational data offerings in Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Explore provisioning and deploying non-relational data services in Azure . . . . . . . . . . . . . . . . . . . . . . . 156
Manage non-relational data stores in Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
■■ Module 4 Explore modern data warehouse analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Examine components of a modern data warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Explore data ingestion in Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Explore data storage and processing in Azure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Get started building with Power BI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
Module 0 Introduction
Learning objectives
After completing this course, you will be able to:
●● Describe core data concepts in Azure.
●● Explain concepts of relational data in Azure.
●● Explain concepts of non-relational data in Azure.
●● Identify components of a modern data warehouse in Azure.
Course Agenda
This course includes the following modules:
Module 1 Explore core data concepts
Learning objectives
In this lesson you will:
●● Identify how data is defined and stored
●● Identify characteristics of relational and non-relational data
●● Describe and differentiate data workloads
●● Describe and differentiate batch and streaming data
What is data?
Data is a collection of facts such as numbers, descriptions, and observations used in decision making. You
can classify data as structured, semi-structured, or unstructured. Structured data is typically tabular data
that is represented by rows and columns in a database. Databases that hold tables in this form are called
relational databases (the mathematical term relation refers to an organized set of data held as a table).
Each row in a table has the same set of columns. The image below illustrates an example showing two
tables in an ecommerce database. The first table contains the details of customers for an organization,
and the second holds information about products that the organization sells.
Semi-structured data is information that doesn't reside in a relational database but still has some
structure to it. Examples include documents held in JavaScript Object Notation (JSON) format. The example
below shows a pair of documents representing customer information. In both cases, each customer
document includes child documents containing the name and address, but the fields in these child
documents vary between customers.
## Document 1 ##
{
  "customerID": "103248",
  "name":
  {
    "first": "AAA",
    "last": "BBB"
  },
  "address":
  {
    "street": "Main Street",
    "number": "101",
    "city": "Acity",
    "state": "NY"
  },
  "ccOnFile": "yes",
  "firstOrder": "02/28/2003"
}

## Document 2 ##
{
  "customerID": "103249",
  "name":
  {
    "title": "Mr",
    "forename": "AAA",
    "lastname": "BBB"
  },
  "address":
  {
    "street": "Another Street",
    "number": "202",
    "city": "Bcity",
    "county": "Gloucestershire",
    "country-region": "UK"
  },
  "ccOnFile": "yes"
}
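Because the fields in the "name" child documents differ between the two customers, code that reads semi-structured data can't assume a fixed schema. A minimal Python sketch of this idea (the documents are abbreviated versions of the examples above, and the helper function is illustrative):

```python
import json

# Two customer documents whose "name" sub-documents use different field
# names, as in the examples above (values abbreviated).
doc1 = '{"customerID": "103248", "name": {"first": "AAA", "last": "BBB"}}'
doc2 = ('{"customerID": "103249", '
        '"name": {"title": "Mr", "forename": "AAA", "lastname": "BBB"}}')

def last_name(customer: dict) -> str:
    """Cope with schema variation: the surname may be 'last' or 'lastname'."""
    name = customer.get("name", {})
    return name.get("last") or name.get("lastname") or "<unknown>"

for raw in (doc1, doc2):
    customer = json.loads(raw)
    print(customer["customerID"], last_name(customer))
```

The point is that the reader, not the store, carries the knowledge of which field variants may occur; a relational table would instead enforce one set of columns for every row.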
There are other types of semi-structured data as well. Examples include key-value stores and graph
databases.
A key-value store is similar to a relational table, except that each row can have any number of columns.
You can use a graph database to store and query information about complex relationships. A graph
contains nodes (information about objects), and edges (information about the relationships between
objects). The image below shows an example of how you might structure the data in a graph database.
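The node-and-edge structure just described can be sketched in code as two collections, one for objects and one for relationships. This is a toy illustration with invented org-chart data, not any particular graph database's API:

```python
# A tiny in-memory sketch of a graph: nodes hold object attributes,
# edges hold relationships between objects. (Hypothetical org-chart data.)
nodes = {
    "sarah": {"type": "Employee", "name": "Sarah"},
    "ben":   {"type": "Employee", "name": "Ben"},
    "ops":   {"type": "Department", "name": "Operations"},
}
edges = [
    ("sarah", "manages", "ben"),
    ("sarah", "works_in", "ops"),
    ("ben",   "works_in", "ops"),
]

def who_works_in(department: str) -> list:
    """Answer a relationship query by following 'works_in' edges."""
    return [nodes[src]["name"] for (src, rel, dst) in edges
            if rel == "works_in" and dst == department]

print(who_works_in("ops"))  # -> ['Sarah', 'Ben']
```

A real graph database adds indexing and a query language over exactly this kind of structure, so traversals don't require scanning every edge.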
Not all data is structured or even semi-structured. For example, audio and video files, and binary data
files might not have a specific structure. They're referred to as unstructured data.
Transactional systems are often high-volume, sometimes handling many millions of transactions in a
single day. The data being processed has to be accessible very quickly. The work performed by
transactional systems is often referred to as Online Transactional Processing (OLTP).
To support fast processing, the data in a transactional system is often divided into small pieces. For
example, if you're using a relational system each table involved in a transaction only contains the columns
necessary to perform the transactional task. In the bank transfer example, a table holding information
about the funds in the account might only contain the account number and the current balance. Other
tables not involved in the transfer operation would hold information such as the name and address of the
customer, and the account history. Splitting tables out into separate groups of columns like this is called
normalization. The next unit discusses this process in more detail. Normalization can enable a transactional
system to cache much of the information required to perform transactions in memory, and speed
throughput.
While normalization enables fast throughput for transactions, it can make querying more complex.
Queries involving normalized tables will frequently need to join the data held across several tables back
together again. This can make it difficult for business users who might need to examine the data.
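The bank transfer example above can be sketched with an in-memory SQLite database: the transfer transaction touches only a narrow Account table, while a query that also needs the customer's name must join the tables back together. Table and column names here are invented for illustration, not taken from the course labs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: the table used by the transfer holds only the account
# number and balance; customer details live in a separate table.
cur.execute("CREATE TABLE Account (AccountNo INTEGER PRIMARY KEY, Balance REAL)")
cur.execute("CREATE TABLE Customer (AccountNo INTEGER, Name TEXT, Address TEXT)")
cur.execute("INSERT INTO Account VALUES (101, 500.0), (102, 250.0)")
cur.execute("INSERT INTO Customer VALUES (101, 'Jay Adams', '12 Park Street')")

# The OLTP transaction only touches the small Account table...
cur.execute("UPDATE Account SET Balance = Balance - 100 WHERE AccountNo = 101")
cur.execute("UPDATE Account SET Balance = Balance + 100 WHERE AccountNo = 102")
conn.commit()

# ...but a query that needs customer details must join the tables again.
cur.execute("""SELECT c.Name, a.Balance
               FROM Account a JOIN Customer c ON a.AccountNo = c.AccountNo""")
print(cur.fetchall())  # -> [('Jay Adams', 400.0)]
```

The narrow Account table keeps the transaction fast; the JOIN is the extra cost that querying pays for that normalization.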
●● Data Ingestion: Data ingestion is the process of capturing the raw data. This data could be taken
from control devices measuring environmental information such as temperature and pressure,
point-of-sale devices recording the items purchased by a customer in a supermarket, financial data
recording the movement of money between bank accounts, and weather data from weather stations.
Some of this data might come from a separate OLTP system. To process and analyze this data, you
must first store the data in a repository of some sort. The repository could be a file store, a document
database, or even a relational database.
●● Data Transformation/Data Processing: The raw data might not be in a format that is suitable for
querying. The data might contain anomalies that should be filtered out, or it may require transforming
in some way. For example, dates or addresses might need to be converted into a standard format.
After data is ingested into a data repository, you may want to do some cleaning operations and
remove any questionable or invalid data, or perform some aggregations such as calculating profit,
margin, and other key performance indicators (KPIs). KPIs are measures of business growth
and performance.
●● Data Querying: After data is ingested and transformed, you can query the data to analyze it. You may
be looking for trends, or attempting to determine the cause of problems in your systems. Many
database management systems provide tools to enable you to perform ad-hoc queries against your
data and generate regular reports.
●● Data Visualization: Data represented as rows and columns in tables, or as documents, isn't
always intuitive. Visualizing the data can often be useful as a tool for examining it. You can
generate charts such as bar charts, line charts, and pie charts, plot results on geographical maps, or
illustrate how data changes over time. Microsoft offers visualization tools like Power BI to provide rich
graphical representations of your data.
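The ingestion, transformation, and querying stages above can be sketched end-to-end in a few lines. This is a toy illustration: the sales records, the "n/a" anomaly, and the profit-margin KPI are all invented for the example:

```python
# Toy analytics flow: ingest raw records, clean/transform them,
# then query an aggregate KPI. (Data and field names are invented.)

# 1. Ingestion: raw records captured into a repository (here, a list).
raw = [
    {"product": "widget", "revenue": "120.0", "cost": "80.0"},
    {"product": "gadget", "revenue": "n/a",   "cost": "50.0"},  # invalid
    {"product": "widget", "revenue": "60.0",  "cost": "40.0"},
]

# 2. Transformation: filter out anomalies and convert to numeric types.
def clean(record):
    try:
        return {"product": record["product"],
                "revenue": float(record["revenue"]),
                "cost": float(record["cost"])}
    except ValueError:
        return None  # drop questionable data

cleaned = [r for r in (clean(rec) for rec in raw) if r is not None]

# 3. Querying: aggregate a KPI (overall profit margin).
revenue = sum(r["revenue"] for r in cleaned)
cost = sum(r["cost"] for r in cleaned)
margin = (revenue - cost) / revenue
print(f"profit margin: {margin:.0%}")  # -> profit margin: 33%
```

In a real Azure pipeline each stage maps to a service (for example an ingestion pipeline, a data store, and a query engine), but the shape of the flow is the same.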
Non-relational databases enable you to store data in a format that more closely matches the original
structure. For example, in a document database, you could store the details of each customer in a single
document, as shown by the example in the previous unit. Retrieving the details of a customer, including
the address, is a matter of reading a single document. There are some disadvantages to using a
document database though. If two customers cohabit and have the same address, in a relational database you
would only need to store the address information once. In the diagram below, Jay and Frances Adams
both share the same address.
In a document database, the address would be duplicated in the documents for Jay and Frances Adams.
This duplication not only increases the storage required, but can also make maintenance more complex
(if the address changes, you must modify it in two documents).
## Document for Jay Adams ##
{
  "customerID": "1",
  "name":
  {
    "firstname": "Jay",
    "lastname": "Adams"
  },
  "address":
  {
    "number": "12",
    "street": "Park Street",
    "city": "Some City"
  }
}

## Document for Frances Adams ##
{
  "customerID": "2",
  "name":
  {
    "firstname": "Frances",
    "lastname": "Adams"
  },
  "address":
  {
    "number": "12",
    "street": "Park Street",
    "city": "Some City"
  }
}
Distributed databases are widely used in many organizations. A distributed database is a database in
which data is stored across different physical locations. It may be held in multiple computers located in
the same physical location (for example, a datacenter), or may be dispersed over a network of
interconnected computers. When compared to non-distributed database systems, any data update to a
distributed database will take time to apply across multiple locations. If you require transactional consistency in
this scenario, locks may be retained for a very long time, especially if there's a network failure between
databases at a critical point in time. To counter this problem, many distributed database management
systems relax the strict isolation requirements of transactions and implement "eventual consistency." In
this form of consistency, as an application writes data, each change is recorded by one server and then
propagated to the other servers in the distributed database system asynchronously. While this strategy
helps to minimize latency, it can lead to temporary inconsistencies in the data. Eventual consistency is
ideal where the application doesn't require any ordering guarantees. Examples include counts of shares,
likes, or non-threaded comments in a social media system.
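The write-then-propagate behavior described above can be sketched with two in-memory "replicas." This is purely illustrative (no real database works this simply): one server acknowledges the write immediately, and the change reaches the other server later, so a read from the lagging replica briefly sees stale data.

```python
# Minimal eventual-consistency sketch: one replica accepts the write,
# propagation to the other replica is deferred. (Purely illustrative.)
replicas = [{"likes": 0}, {"likes": 0}]
pending = []  # changes recorded but not yet propagated

def write(key, value):
    replicas[0][key] = value          # acknowledged immediately by one server
    pending.append((1, key, value))   # propagate to replica 1 later

def propagate():
    while pending:
        idx, key, value = pending.pop(0)
        replicas[idx][key] = value

write("likes", 42)
print(replicas[0]["likes"], replicas[1]["likes"])  # -> 42 0  (temporarily inconsistent)
propagate()
print(replicas[0]["likes"], replicas[1]["likes"])  # -> 42 42 (eventually consistent)
```

For a like counter the brief window where the replicas disagree is harmless, which is why such workloads suit eventual consistency.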
●● Analysis: You typically use batch processing for performing complex analytics. Stream processing is
used for simple response functions, aggregates, or calculations such as rolling averages.
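A rolling average of the kind mentioned above can be computed incrementally as each streamed value arrives, using a fixed-size window. A minimal sketch (the temperature readings are invented):

```python
from collections import deque

def rolling_average(stream, window=3):
    """Yield the mean of the last `window` values as each reading arrives."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)

# Simulated sensor stream (invented temperature readings).
readings = [20.0, 22.0, 21.0, 25.0]
print([round(a, 2) for a in rolling_average(readings)])
# -> [20.0, 21.0, 21.0, 22.67]
```

Because each output depends only on the last few readings, the calculation never needs the full dataset, which is what distinguishes this kind of stream processing from batch analysis.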
Knowledge check
Question 1
How is data in a relational table organized?
Rows and Columns
Header and Footer
Pages and Paragraphs
Question 2
Which of the following is an example of unstructured data?
An Employee table with columns Employee ID, Employee Name, and Employee Designation
Audio and Video files
A table within SQL Server database
Question 3
Which of the following is an example of a streaming dataset?
Data from sensors and devices
Sales data for the past month
List of employees working for a company
Summary
Microsoft Azure provides a range of technologies for storing relational and non-relational data. Each
technology has its own strengths, and is suited to specific scenarios.
In this lesson you have learned how to:
●● Identify how data is defined and stored
●● Identify characteristics of relational and non-relational data
●● Describe and differentiate data workloads
●● Describe and differentiate batch and streaming data
Learn more
●● Introduction to Azure SQL Database2
●● Introduction to Azure Blob storage3
2 https://docs.microsoft.com/azure/sql-database/sql-database-technical-overview
3 https://docs.microsoft.com/azure/storage/blobs/storage-blobs-introduction
Module 1 Explore core data concepts
Explore roles and responsibilities in the world of data
Learning objectives
In this lesson you will:
●● Explore data job roles
●● Explore common tasks and tools for data job roles
An Azure database administrator is responsible for the design, implementation, maintenance, and
operational aspects of on-premises and cloud-based database solutions built on Azure data services and
SQL Server. They are responsible for the overall availability, consistent performance, and optimization of the database solutions. They work with stakeholders to implement policies, tools, and processes for backup and recovery plans to recover following a natural disaster or human error.
The database administrator is also responsible for managing the security of the data in the database,
granting privileges over the data, granting or denying access to users as appropriate.
A data engineer collaborates with stakeholders to design and implement data-related assets that include
data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads.
They use a wide range of data platform technologies, including relational and nonrelational databases,
file stores, and data streams.
They are also responsible for ensuring that data privacy is maintained within the cloud and across data stores spanning from on-premises to the cloud. They also own the management and monitoring of data stores and data pipelines to ensure that data loads perform as expected.
A data analyst enables businesses to maximize the value of their data assets. They are responsible for designing and building scalable models, cleaning and transforming data, and enabling advanced analytics capabilities through reports and visualizations.
A data analyst processes raw data into relevant insights based on identified business requirements.
A useful feature of SQL Server Management Studio is the ability to generate Transact-SQL scripts for
almost all of the functionality that SSMS provides. This gives the DBA the ability to schedule and automate many common tasks.
NOTE: Transact-SQL is a set of programming extensions from Microsoft that adds several features to the
Structured Query Language (SQL), including transaction control, exception and error handling, row
processing, and declared variables.
You can use the Azure portal to dynamically manage and adjust resources such as the data storage size
and the number of cores available for the database processing. These tasks would require the support of
a system administrator if you were running the database on-premises.
●● Creating charts and graphs, histograms, geographical maps, and other visual models that help to
explain the meaning of large volumes of data, and isolate areas of interest.
●● Transforming, improving, and integrating data from many sources, depending on the business
requirements.
●● Combining the data result sets across multiple sources. For example, combining sales data and
weather data provides a useful insight into how weather influenced sales of certain products such as
ice creams.
●● Finding hidden patterns using data.
●● Delivering information in a useful and appealing way to users by creating rich graphical dashboards
and reports.
Knowledge check
Question 1
Which one of the following tasks is a role of a database administrator?
Backing up and restoring databases
Creating dashboards and reports
Identifying data quality issues
Question 2
Which of the following tools is a visualization and reporting tool?
SQL Server Management Studio
Power BI
SQL
Question 3
Which one of the following roles is not a data job role?
Systems Administrator
Data Analyst
Database Administrator
Summary
Managing and working with data is a specialist skill. Most organizations define job roles for the various
tasks responsible for managing data.
In this lesson you have learned:
●● Some of the common job roles for handling data
●● The tasks typically performed by these job roles, and the types of tools that they use
Learn more
●● Overview of Azure Databricks10
●● Overview of Azure HDInsight11
●● Introduction to Azure Cosmos DB12
●● Overview of Power BI13
●● SQL Server Technical Documentation14
●● Introduction to Azure Data Factory15
10 https://docs.microsoft.com/azure/azure-databricks/what-is-azure-databricks
11 https://docs.microsoft.com/azure/hdinsight/hdinsight-overview
12 https://docs.microsoft.com/azure/cosmos-db/introduction
13 https://docs.microsoft.com/power-bi/fundamentals/power-bi-overview
14 https://docs.microsoft.com/sql/sql-server/?view=sql-server-ver15
15 https://docs.microsoft.com/azure/data-factory/introduction
Learning objectives
In this lesson you will:
●● Explore the characteristics of relational data
●● Define tables, indexes, and views
●● Explore relational data workload offerings in Azure
used to maintain relationships between tables. This is where the relational model gets its name from. In
the image below, the Orders table contains both a Customer ID and a Product ID. The Customer ID
relates to the Customers table to identify the customer that placed the order, and the Product ID relates
to the Products table to indicate what product was purchased.
You design a relational database by creating a data model. The model below shows the structure of the
entities from the previous example. In this diagram, the columns marked PK are the Primary Key for the
table. The primary key indicates the column (or combination of columns) that uniquely identifies each row.
Every table should have a primary key.
The diagram also shows the relationships between the tables. The lines connecting the tables indicate the
type of relationship. In this case, the relationship from customers to orders is 1-to-many (one customer
can place many orders, but each order is for a single customer). Similarly, the relationship between orders
and products is many-to-1 (several orders might be for the same product).
The columns marked FK are Foreign Key columns. They reference, or link to, the primary key of another
table, and are used to maintain the relationships between tables. A foreign key also helps to identify and
prevent anomalies, such as orders for customers that don't exist in the Customers table. In the model
below, the Customer ID and Product ID columns in the Orders table link to the customer that placed the
order and the product that was ordered:
Describe concepts of relational data
table. The example query below finds the details of every customer from the sample database shown
above.
SELECT CustomerID, CustomerName, CustomerAddress
FROM Customers
Rather than retrieve every row, you can filter data by using a WHERE clause. The next query fetches the
order ID and product ID for all orders placed by customer 1.
SELECT OrderID, ProductID
FROM Orders
WHERE CustomerID = 'C1'
You can combine the data from multiple tables in a query using a join operation. A join operation spans
the relationships between tables, enabling you to retrieve the data from more than one table at a time.
The following query retrieves the name of every customer, together with the product name and quantity
for every order they've placed. Notice that each column is qualified with the table it belongs to:
SELECT Customers.CustomerName, Orders.QuantityOrdered, Products.ProductName
FROM Customers JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
JOIN Products
ON Orders.ProductID = Products.ProductID
You can find full details about SQL on the Microsoft website, on the Structured Query Language (SQL)17
page.
17 https://docs.microsoft.com/sql/odbc/reference/structured-query-language-sql
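If you want to experiment with the queries above, they run largely unchanged against SQLite, which ships with Python (SQLite here is just a convenient stand-in for a relational engine, and the sample rows are invented):

```python
import sqlite3

# Build the sample schema in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Products  (ProductID  TEXT PRIMARY KEY, ProductName TEXT);
CREATE TABLE Orders    (OrderID    TEXT PRIMARY KEY, CustomerID TEXT,
                        ProductID  TEXT, QuantityOrdered INTEGER);
INSERT INTO Customers VALUES ('C1', 'Mark Hanson');
INSERT INTO Products  VALUES ('P1', 'Ice Cream');
INSERT INTO Orders    VALUES ('O1', 'C1', 'P1', 3);
""")

# The join query from the text, spanning all three tables.
rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.QuantityOrdered, Products.ProductName
    FROM Customers JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    JOIN Products ON Orders.ProductID = Products.ProductID
""").fetchall()
print(rows)   # [('Mark Hanson', 3, 'Ice Cream')]
```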
What is an index?
An index helps you search for data in a table. Think of an index over a table like an index at the back of a
book. A book index contains a sorted set of references, with the pages on which each reference occurs.
When you want to find a reference to an item in the book, you look it up through the index. You can use
the page numbers in the index to go directly to the correct pages in the book. Without an index, you
might have to read through the entire book to find the references you're looking for.
When you create an index in a database, you specify a column from the table, and the index contains a
copy of this data in a sorted order, with pointers to the corresponding rows in the table. When the user
runs a query that specifies this column in the WHERE clause, the database management system can use
this index to fetch the data more quickly than if it had to scan through the entire table row by row. In the
example below, the query retrieves all orders for customer C1. The Orders table has an index on the
Customer ID column. The database management system can consult the index to quickly find all matching rows in the Orders table.
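A quick way to see an index being used is SQLite's EXPLAIN QUERY PLAN (SQLite is used here only as a readily available stand-in; the table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT, CustomerID TEXT, ProductID TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [("O1", "C1", "P1"), ("O2", "C2", "P1"), ("O3", "C1", "P2")])

# Without an index, the WHERE clause below forces a scan of the whole table;
# with one, the engine can seek directly to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON Orders (CustomerID)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Orders WHERE CustomerID = 'C1'"
).fetchall()
print(plan)   # the plan typically mentions idx_orders_customer
```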
You can create many indexes on a table. So, if you also wanted to find all orders for a specific product,
then creating another index on the Product ID column in the Orders table, would be useful. However,
indexes aren't free. An index might consume additional storage space, and each time you insert, update,
or delete data in a table, the indexes for that table must be maintained. This additional work can slow
down insert, update, and delete operations, and incur additional processing charges. Therefore, when
deciding which indexes to create, you must strike a balance between having indexes that speed up your
queries versus the cost of performing other operations. In a table that is read only, or that contains data
that is modified infrequently, more indexes will improve query performance. If a table is queried infrequently, but subject to a large number of inserts, updates, and deletes (such as a table involved in OLTP),
then creating indexes on that table can slow your system down.
Some relational database management systems also support clustered indexes. A clustered index physically reorganizes a table by the index key. This arrangement can improve the performance of queries still
further, because the relational database management system doesn't have to follow references from the
index to find the corresponding data in the underlying table. The image below shows the Orders table
with a clustered index on the Customer ID column.
In database management systems that support them, a table can only have a single clustered index.
What is a view?
A view is a virtual table based on the result set of a query. In the simplest case, you can think of a view as
a window on specified rows in an underlying table. For example, you could create a view on the Orders
table that lists the orders for a specific product (in this case, product P1) like this:
CREATE VIEW P1Orders AS
SELECT CustomerID, OrderID, Quantity
FROM Orders
WHERE ProductID = 'P1'
You can query the view and filter the data in much the same way as a table. The following query finds the
orders for customer C1 using the view. This query will only return orders for product P1 made by the
customer:
SELECT CustomerID, OrderID, Quantity
FROM P1Orders
WHERE CustomerID = 'C1'
A view can also join tables together. If you regularly needed to find the details of customers and the
products that they've ordered, you could create a view based on the join query shown in the previous
unit:
CREATE VIEW CustomersProducts AS
SELECT Customers.CustomerName, Orders.QuantityOrdered, Products.ProductName
FROM Customers JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
JOIN Products
ON Orders.ProductID = Products.ProductID
The following query finds the customer name and product names of all orders placed by customer C2,
using this view:
SELECT CustomerName, ProductName
FROM CustomersProducts
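The view examples above can be tried end to end with SQLite (again only a stand-in engine, with invented sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (CustomerID TEXT, OrderID TEXT, ProductID TEXT,
                     Quantity INTEGER);
INSERT INTO Orders VALUES ('C1', 'O1', 'P1', 2);
INSERT INTO Orders VALUES ('C1', 'O2', 'P2', 1);
INSERT INTO Orders VALUES ('C2', 'O3', 'P1', 5);

-- A view is just a stored query; no extra data is materialized here.
CREATE VIEW P1Orders AS
    SELECT CustomerID, OrderID, Quantity
    FROM Orders
    WHERE ProductID = 'P1';
""")

# Query the view exactly as if it were a table.
rows = conn.execute(
    "SELECT CustomerID, OrderID, Quantity FROM P1Orders WHERE CustomerID = 'C1'"
).fetchall()
print(rows)   # [('C1', 'O1', 2)]
```

Note that order O2 (product P2) is filtered out by the view itself, and order O3 by the WHERE clause on the view.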
IaaS is an acronym for Infrastructure-as-a-Service. Azure enables you to create a virtual infrastructure in
the cloud that mirrors the way an on-premises data center might work. You can create a set of virtual
machines, connect them together using a virtual network, and add a range of virtual devices. In many
ways, this approach is similar to the way in which you run your systems inside an organization, except
that you don't have to concern yourself with buying or maintaining the hardware. However, you're still
responsible for many of the day-to-day operations, such as installing and configuring the software,
patching, taking backups, and restoring data when needed. You can think of IaaS as a half-way-house to
fully managed operations in the cloud; you don't have to worry about the hardware, but running and
managing the software is still very much your responsibility.
You can run any software for which you have the appropriate licenses using this approach. You're not
restricted to any specific database management system.
The IaaS approach is best for migrations and applications requiring operating system-level access. SQL
virtual machines are lift-and-shift. That is, you can copy your on-premises solution directly to a virtual
machine in the cloud. The system should work more or less exactly as before in its new location, except
for some small configuration changes (changes in network addresses, for example) to take account of the
change in environment.
PaaS stands for Platform-as-a-service. Rather than creating a virtual infrastructure, and installing and
managing the database software yourself, a PaaS solution does this for you. You specify the resources
that you require (based on how large you think your databases will be, the number of users, and the
performance you require), and Azure automatically creates the necessary virtual machines, networks, and
other devices for you. You can usually scale up or down (increase or decrease the size and number of
resources) quickly, as the volume of data and the amount of work being done varies; Azure handles this
scaling for you, and you don't have to manually add or remove virtual machines, or perform any other
form of configuration.
Azure offers several PaaS solutions for relational databases, including Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, and Azure Database for MariaDB. These services run managed versions of the database management systems on your behalf. You just connect to them, create
your databases, and upload your data. However, you may find that there are some functional restrictions
in place, and not every feature of your selected database management system may be available. These restrictions are often in place for security reasons; for example, features that might expose the underlying operating system and hardware to your applications are typically unavailable. In these cases, you may need to rework your applications to remove any dependencies on these features.
The image below illustrates the benefits and tradeoffs when running a database management system (in
this case, SQL Server) on-premises, using virtual machines in Azure (IaaS), or using Azure SQL Database
(PaaS). The same generalized considerations are true for other database management systems.
Knowledge check
Question 1
Which one of the following statements is a characteristic of a relational database?
All data must be stored as character strings
A row in a table represents a single entity
Different rows in the same table can contain different columns
Question 2
What is an index?
A structure that enables you to locate rows in a table quickly, using an indexed value
A virtual table based on the result set of a query
A structure comprising rows and columns that you use for storing data
Summary
Relational databases are widely used for building real world applications. Understanding the characteristics of relational data is important. A relational database is based on tables. You can run many database
management systems on-premises and in the cloud.
In this lesson you have learned:
●● The characteristics of relational data
●● What tables, indexes, and views are
●● The various relational data workload offerings available in Azure.
Learn more
●● Description of the database normalization basics18
●● Structured Query Language (SQL)19
●● Technical overview of SQL Database20
●● SQL Server Technical Documentation21
●● SQL Database PaaS vs IaaS Offerings22
18 https://docs.microsoft.com/office/troubleshoot/access/database-normalization-description
19 https://docs.microsoft.com/sql/odbc/reference/structured-query-language-sql
20 https://docs.microsoft.com/azure/sql-database/sql-database-technical-overview
21 https://docs.microsoft.com/sql/sql-server/?view=sql-server-ver15
22 https://docs.microsoft.com/azure/sql-database/sql-database-paas-vs-sql-server-iaas
Learning objectives
In this lesson, you will:
●● Explore the characteristics of non-relational data
●● Define types of non-relational data
●● Describe NoSQL, and the types of non-relational databases
database might not be appropriate at this point; you can perform these tasks at a later date. At the time of ingestion, you simply need to store the data in its original state and format.
A key aspect of non-relational databases is that they enable you to store data in a very flexible manner.
Non-relational databases don't impose a schema on data. Instead, they focus on the data itself rather than how to structure it. This approach means that you can store information in a natural format that mirrors the way in which you would consume, query, and use it.
In a non-relational system, you store the information for entities in collections or containers rather than
relational tables. Two entities in the same collection can have a different set of fields rather than a regular
set of columns found in a relational table. The lack of a fixed schema means that each entity must be
self-describing. Often this is achieved by labeling each field with the name of the data that it represents.
For example, a non-relational collection of customer entities might look like this:
## Customer 1
ID: 1
Name: Mark Hanson
Telephone: [ Home: 1-999-9999999, Business: 1-888-8888888, Cell: 1-777-
7777777 ]
Address: [ Home: 121 Main Street, Some City, NY, 10110,
Business: 87 Big Building, Some City, NY, 10111 ]
## Customer 2
ID: 2
Title: Mr
Name: Jeff Hay
Telephone: [ Home: 0044-1999-333333, Mobile: 0044-17545-444444 ]
Address: [ UK: 86 High Street, Some Town, A County, GL8888, UK,
US: 777 7th Street, Another City, CA, 90111 ]
In this example, fields are prefixed with a name. Fields might also have multiple subfields, also with
names. In the example, multiple subfields are denoted by enclosing them between square brackets.
Adding a new customer is a matter of inserting an entity with its fields labeled in a meaningful way. An
application that queries this data must be prepared to parse the information in the entity that it retrieves.
The data retrieval capabilities of a non-relational database can vary. Each entity should have a unique key
value. The entities in a collection are usually stored in key-value order. In the example above, the unique
key is the ID field. The simplest type of non-relational database enables an application to either specify
the unique key, or a range of keys as query criteria. In the customers example, the database would enable
an application to query customers by ID only. Filtering data on other fields would require scanning the
entire collection of entities, parsing each entity in turn, and then applying any query criteria to each entity
to find any matches. In the example below, a query that fetches the details of a customer by ID can
quickly identify which entity to retrieve. A query that attempts to find all customers with a UK address
would have to iterate through every entity, and for each entity examine each field in turn. If the database
contains many millions of entities, this query could take a considerable time to run.
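The difference between a key lookup and a full scan can be sketched with a plain Python dictionary standing in for the collection (a conceptual model only, not a real data store):

```python
# Hypothetical in-memory collection: entities are self-describing mappings
# keyed by ID, and different entities can carry different fields.
customers = {
    "1": {"Name": "Mark Hanson",
          "Address": {"Home": "121 Main Street, Some City, NY"}},
    "2": {"Name": "Jeff Hay", "Title": "Mr",
          "Address": {"UK": "86 High Street, Some Town, GL8888, UK"}},
}

# Lookup by unique key: direct and fast.
by_id = customers["2"]["Name"]

# Filtering on a non-key field: every entity must be scanned and parsed,
# which is expensive when the collection holds millions of entities.
uk_customers = [c["Name"] for c in customers.values()
                if "UK" in c.get("Address", {})]
```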
More advanced non-relational systems support indexing, in a similar manner to an index in a relational
database. Queries can then use the index to identify and fetch data based on non-key fields. Non-relational systems such as Azure Cosmos DB (a non-relational database management system available in Azure) support indexing even when the structure of the indexed data can vary from record to record. For more information, read Indexing in Azure Cosmos DB - Overview23.
When you design a non-relational database, it's important to understand the capabilities of the database
management system and the types of query it will have to support.
NOTE: Non-relational databases often provide their own proprietary language for managing and querying data. This language may be procedural, or it may be similar to SQL; it depends on how the database
is implemented by the database management system.
23 https://docs.microsoft.com/azure/cosmos-db/index-overview
Explore concepts of non-relational data
(software development kits) can be used to build rich iOS and Android applications using the popular
Xamarin framework.
{
"ID": "2",
"Title": "Mr",
You're free to define whatever fields you like. The important point is that the data follows the JSON
grammar. When an application reads a document, it can use a JSON parser to break up the document
into its component fields and extract the individual pieces of data.
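For example, Python's standard json module can parse such a document (field values borrowed from the earlier customer example):

```python
import json

# A JSON document as it might be stored in a document database.
document = '{"ID": "2", "Title": "Mr", "Name": "Jeff Hay"}'

# The parser breaks the document into its component fields,
# from which the application extracts individual pieces of data.
fields = json.loads(document)
print(fields["Name"])   # Jeff Hay
```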
Other formats you might see include Avro, ORC, and Parquet:
●● Avro is a row-based format. It was created by Apache. Each record contains a header that describes
the structure of the data in the record. This header is stored as JSON. The data is stored as binary
information. An application uses the information in the header to parse the binary data and extract
the fields it contains. Avro is a very good format for compressing data and minimizing storage and
network bandwidth requirements.
●● ORC (Optimized Row Columnar format) organizes data into columns rather than rows. It was developed by HortonWorks for optimizing read and write operations in Apache Hive. Hive is a data warehouse system that supports fast data summarization and querying over very large datasets. Hive
supports SQL-like queries over unstructured data. An ORC file contains stripes of data. Each stripe
holds the data for a column or set of columns. A stripe contains an index into the rows in the stripe,
the data for each row, and a footer that holds statistical information (count, sum, max, min, and so on)
for each column.
●● Parquet is another columnar data format. It was created by Cloudera and Twitter. A Parquet file
contains row groups. Data for each column is stored together in the same row group. Each row group
contains one or more chunks of data. A Parquet file includes metadata that describes the set of rows
found in each chunk. An application can use this metadata to quickly locate the correct chunk for a
given set of rows, and retrieve the data in the specified columns for these rows. Parquet specializes in
storing and processing nested data types efficiently. It supports very efficient compression and
encoding schemes.
What is NoSQL?
You might see the term NoSQL when reading about non-relational databases. NoSQL is a rather loose
term that simply means non-relational. There's some debate about whether it's intended to imply Not
SQL, or Not Only SQL; some non-relational databases support a version of SQL adapted for documents
rather than tables (examples include Azure Cosmos DB).
NoSQL (non-relational) databases generally fall into four categories: key-value stores, document databases, column family databases, and graph databases. The following sections discuss these types of NoSQL
databases.
A query specifies the keys to identify the items to be retrieved. You can't search on values. An application
that retrieves data from a key-value store is responsible for parsing the contents of the values returned.
Write operations are restricted to inserts and deletes. If you need to update an item, you must retrieve
the item, modify it in memory (in the application), and then write it back to the database, overwriting the
original (effectively a delete and an insert).
The focus of a key-value store is the ability to read and write data very quickly. Search capabilities are
secondary. A key-value store is an excellent choice for data ingestion, when a large volume of data arrives
as a continual stream and must be stored immediately.
Azure Table storage is an example of a key-value store. Cosmos DB also implements a key-value store
using the Table API24.
24 https://docs.microsoft.com/azure/cosmos-db/table-introduction
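The read/modify/write-back update pattern described above can be sketched with a dictionary standing in for the store (purely conceptual; a real key-value store adds durability, partitioning, and concurrency control):

```python
# In-memory stand-in for a key-value store such as Azure Table storage.
store = {}

store["device42"] = '{"temp": 21.5}'   # insert: the value is opaque
value = store["device42"]              # read by key; the application is
                                       # responsible for parsing the value

# "Update" = read, modify in the application, write the whole value back
# (effectively a delete and an insert).
store["device42"] = value.replace("21.5", "22.0")
```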
An application can retrieve documents by using the document key. The key is a unique identifier for the
document. Some document databases create the document key automatically. Others enable you to
specify an attribute of the document to use as the key. The application can also query documents based
on the value of one or more fields. Some document databases support indexing to facilitate fast lookup
of documents based on one or more indexed fields.
Some document database management systems support in-place updates, enabling an application to
modify the values of specific fields in a document without rewriting the entire document. Other document database management systems (such as Cosmos DB) can only read and write entire documents. In
these cases, an update replaces the entire document with a new version. This approach helps to reduce
fragmentation in the database, which can, in turn, improve performance.
Most document databases will ingest large volumes of data more rapidly than a relational database, but
aren't as optimal as a key-value store for this type of processing. The focus of a document database is its
query capabilities.
Azure Cosmos DB implements a document database approach in its Core (SQL) API.
The relational model supports a very generalized approach to implementing this type of relationship, but
to find the address of any given customer an application needs to run a query that joins two tables. If this
is the most common query performed by the application, then the overhead associated with performing
this join operation can quickly become significant if there are a large number of requests and the tables
themselves are large.
The purpose of a column family database is to efficiently handle situations such as this. You can think of a
column family database as holding tabular data comprising rows and columns, but you can divide the
columns into groups known as column-families. Each column family holds a set of columns that are
logically related together. The image below shows one way of structuring the same information as the
previous image, by using a column family database to group the data into two column-families holding
the customer name and address information. Other ways of organizing the columns are possible, but you
should implement your column-families to optimize the most common queries that your application
performs. In this case, queries that retrieve the addresses of customers can fetch the data with fewer
reads than would be required in the corresponding relational database; these queries can fetch the data
directly from the AddressInfo column family.
The illustration above is conceptual rather than physical, and is intended to show the logical structure of
the data rather than how it might be physically organized. Each row in a column family database contains
a key, and you can fetch the data for a row by using this key.
In most column family databases, the column-families are stored separately. In the previous example, the
CustomerInfo column family might be held in one area of physical storage and the AddressInfo column
family in another, in a simple form of vertical partitioning. You should really think of the structure in terms
of column-families rather than rows. The data for a single entity that spans multiple column-families will
have the same row key in each column family. As an alternative to the conceptual layout shown previously, you can visualize the data shown as the following pair of physical structures.
The most widely used column family database management system is Apache Cassandra. Azure Cosmos
DB supports the column-family approach through the Cassandra API.
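The idea of separately stored column-families sharing a row key can be sketched conceptually (hypothetical data; this is not Cassandra's actual storage format):

```python
# Two column-families held as separate structures; the shared row key
# links an entity's data across both.
customer_info = {"row1": {"Name": "Mark Hanson"},
                 "row2": {"Name": "Jeff Hay"}}
address_info  = {"row1": {"City": "Some City", "State": "NY"},
                 "row2": {"City": "Some Town", "Country": "UK"}}

# A query that only needs addresses touches just one column family,
# requiring fewer reads than joining two relational tables.
cities = [cols["City"] for cols in address_info.values()]

# Fetching a whole entity combines the families via the shared row key.
entity = {**customer_info["row1"], **address_info["row1"]}
```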
A structure such as this makes it straightforward to conduct inquiries such as "Find all employees who directly or indirectly work for Sarah" or "Who works in the same department as John?" For large graphs
with lots of entities and relationships, you can perform very complex analyses very quickly, and many
graph databases provide a query language that you can use to traverse a network of relationships
efficiently. You can often store the same information in a relational database, but the SQL required to
query this information might require many expensive recursive join operations and nested subqueries.
Azure Cosmos DB supports graph databases using the Gremlin API25. The Gremlin API is a standard
language for creating and querying graphs.
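A traversal such as "find all employees who directly or indirectly work for Sarah" can be sketched over a plain adjacency mapping (hypothetical names; a real graph database would express this in Gremlin or a similar query language):

```python
# Hypothetical org graph: nodes are employees, edges are "reports to".
reports_to = {"Alice": "Sarah", "Bob": "Alice", "Carol": "Sarah"}

def works_for(employee, manager):
    """True if `employee` directly or indirectly reports to `manager`."""
    boss = reports_to.get(employee)
    while boss is not None:        # walk the chain of managers upward
        if boss == manager:
            return True
        boss = reports_to.get(boss)
    return False

team = sorted(e for e in reports_to if works_for(e, "Sarah"))
print(team)   # ['Alice', 'Bob', 'Carol']
```

In SQL, the same question would need a recursive join over an Employees table; here the relationships are first-class and the traversal simply follows edges.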
Knowledge check
Question 1
Which of the following services should you use to implement a non-relational database?
Azure Cosmos DB
Azure SQL Database
The Gremlin API
Question 2
Which of the following is a characteristic of non-relational databases?
Non-relational databases contain tables with flat fixed-column records
Non-relational databases require you to use data normalization techniques to reduce data duplication
Non-relational databases are either schema free or have relaxed schemas
25 https://docs.microsoft.com/azure/cosmos-db/graph-introduction
Question 3
You are building a system that monitors the temperature throughout a set of office blocks, and sets the air
conditioning in each room in each block to maintain a pleasant ambient temperature. Your system has to
manage the air conditioning in several thousand buildings spread across the country or region, and each
building typically contains at least 100 air-conditioned rooms. What type of NoSQL data store is most
appropriate for capturing the temperature data to enable it to be processed quickly?
A key-value store
A column family database
Write the temperatures to a blob in Azure Blob storage
Summary
Microsoft Azure provides a variety of technologies for storing non-relational data. Each technology has
its own strengths, and is suited to specific scenarios.
You have explored:
●● The characteristics of non-relational data
●● Different types of non-relational data
●● NoSQL, and the types of non-relational databases
Learn more
●● Choose the right data store26
●● Welcome to Azure Cosmos DB27
●● Indexing in Azure Cosmos DB - Overview28
●● Introduction to Azure Cosmos DB: Table API29
●● Introduction to Azure Cosmos DB: Gremlin API30
●● Introduction to Azure Blob storage31
26 https://docs.microsoft.com/azure/architecture/guide/technology-choices/data-store-overview
27 https://docs.microsoft.com/azure/cosmos-db/introduction
28 https://docs.microsoft.com/azure/cosmos-db/index-overview
29 https://docs.microsoft.com/azure/cosmos-db/table-introduction
30 https://docs.microsoft.com/azure/cosmos-db/graph-introduction
31 https://docs.microsoft.com/azure/storage/blobs/storage-blobs-introduction
Learning objectives
In this lesson you will:
●● Learn about data ingestion and processing
●● Explore data visualization
●● Explore data analytics
An alternative approach is ELT. ELT is an abbreviation of Extract, Load, and Transform. The process differs
from ETL in that the data is stored before being transformed. The data processing engine can take an
iterative approach, retrieving and processing the data from storage, before writing the transformed data
and models back to storage. ELT is more suitable for constructing complex models that depend on
multiple items in the database, often using periodic batch processing.
ELT is a scalable approach that is suitable for the cloud because it can make use of the extensive
processing power available. The more stream-oriented approach of ETL places more emphasis on throughput.
However, ETL can filter data before it's stored. In this way, ETL can help with data privacy and compliance,
removing sensitive data before it arrives in your analytical data models.
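The difference between the two approaches is simply the order of the steps. The following schematic Python sketch contrasts them; the records, field names, and in-memory lists standing in for data stores are all invented for the illustration:

```python
# Raw extracted records (illustrative); the SSN is sensitive data.
raw_records = [
    {"name": "Ann", "ssn": "123-45-6789", "amount": "250"},
    {"name": "Bob", "ssn": "987-65-4321", "amount": "100"},
]

def transform(record):
    """Strip sensitive fields and convert types."""
    return {"name": record["name"], "amount": int(record["amount"])}

def etl(records, store):
    # Extract -> Transform -> Load: sensitive data never reaches storage.
    store.extend(transform(r) for r in records)

def elt(records, staging, store):
    # Extract -> Load -> Transform: raw data is stored first, then the
    # processing engine transforms it, possibly in iterative batches.
    staging.extend(records)
    store.extend(transform(r) for r in staging)

warehouse, staging, lake = [], [], []
etl(raw_records, warehouse)
elt(raw_records, staging, lake)
print(warehouse[0])  # {'name': 'Ann', 'amount': 250}
```

Note that after ELT the staging area still holds the raw SSNs, which is why ETL is the safer choice when privacy or compliance requires filtering before storage.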
Azure provides several options that you can use to implement the ELT and ETL approaches. For example,
if you are storing data in Azure SQL Database, you can use SQL Server Integration Services. Integration
Services can extract and transform data from a wide variety of sources such as XML data files, flat files,
and relational data sources, and then load the data into one or more destinations.
Another more generalized approach is to use Azure Data Factory. Azure Data Factory is a cloud-based
data integration service that allows you to create data-driven workflows for orchestrating data movement
and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven
workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL
processes that transform data visually with data flows, or by using compute services such as Azure
HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
What is reporting?
Reporting is the process of organizing data into informational summaries to monitor how different areas
of an organization are performing. Reporting helps companies monitor their online business, and know
when data falls outside of expected ranges. Good reporting should raise questions about the business
from its end users. Reporting shows you what has happened, while analysis focuses on explaining why it
happened and what you can do about it.
●● Line charts: Line charts emphasize the overall shape of an entire series of values, usually over time.
●● Matrix: A matrix visual is a tabular structure that summarizes data. Often, report designers include
matrixes in reports and dashboards to allow users to select one or more elements (rows, columns, cells)
in the matrix to cross-highlight other visuals on a report page.
●● Key influencers: A key influencer chart displays the major contributors to a selected result or value. Key
influencers are a great choice to help you understand the factors that influence a key metric. For
example, what influences customers to place a second order or why sales were so high last June.
●● Treemap: Treemaps are charts of colored rectangles, with size representing the relative value of each
item. They can be hierarchical, with rectangles nested within the main rectangles.
●● Scatter: A scatter chart shows the relationship between two numerical values. A bubble chart is a
scatter chart that replaces data points with bubbles, with the bubble size representing an additional
third data dimension.
A dot plot chart is similar to a bubble chart and scatter chart, but can plot categorical data along the
X-Axis.
●● Filled map: If you have geographical data, you can use a filled map to display how a value differs in
proportion across a geography or region. You can see relative differences with shading that ranges
from light (less frequent/lower) to dark (more frequent/higher).
Descriptive analytics
Descriptive analytics helps answer questions about what has happened, based on historical data.
Descriptive analytics techniques summarize large datasets to describe outcomes to stakeholders.
By developing KPIs (Key Performance Indicators), these strategies can help track the success or failure of
key objectives. Metrics such as return on investment (ROI) are used in many industries. Specialized
metrics are developed to track performance in specific industries.
Examples of descriptive analytics include generating reports to provide a view of an organization's sales
and financial data.
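A KPI such as ROI is just a simple ratio over historical figures. A minimal Python illustration (the campaign figures are invented):

```python
def roi(gain, cost):
    """Return on investment as a fraction: (gain - cost) / cost."""
    return (gain - cost) / cost

# A campaign that cost 10,000 and returned 12,500 has a 25% ROI.
print(f"{roi(12_500, 10_000):.0%}")  # 25%
```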
Diagnostic analytics
Diagnostic analytics helps answer questions about why things happened. Diagnostic analytics techniques
supplement more basic descriptive analytics. They take the findings from descriptive analytics and dig
deeper to find the cause. The performance indicators are further investigated to discover why they got
better or worse. This generally occurs in three steps:
1. Identify anomalies in the data. These may be unexpected changes in a metric or a particular market.
2. Collect data that's related to these anomalies.
3. Use statistical techniques to discover relationships and trends that explain these anomalies.
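Step 1, identifying anomalies, is often done with simple statistics. The following Python sketch flags outliers with a z-score test; the sales figures and the two-standard-deviation threshold are illustrative choices, not a prescribed method:

```python
from statistics import mean, stdev

# Monthly sales for one region; values are invented for the example.
monthly_sales = [100, 102, 98, 101, 99, 100, 97, 150, 101, 100]

def anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(anomalies(monthly_sales))  # the 150 spike stands out
```

Steps 2 and 3 would then gather related data (promotions, pricing, seasonality) and test which factors explain the flagged values.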
Predictive analytics
Predictive analytics helps answer questions about what will happen in the future. Predictive analytics
techniques use historical data to identify trends and determine if they're likely to recur. Predictive
analytical tools provide valuable insight into what may happen in the future. Techniques include a variety
of statistical and machine learning techniques such as neural networks, decision trees, and regression.
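As a minimal illustration of one such technique, the following Python sketch fits an ordinary least-squares line to invented quarterly sales and uses it to forecast the next quarter:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return slope, mean_y - slope * mean_x

# Quarterly sales (illustrative) trending upward; forecast quarter 5.
quarters, sales = [1, 2, 3, 4], [10.0, 12.0, 14.0, 16.0]
slope, intercept = linear_fit(quarters, sales)
print(slope * 5 + intercept)  # 18.0
```

Real predictive models add many more variables and validation steps, but the principle is the same: learn a relationship from historical data, then extrapolate it.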
Prescriptive analytics
Prescriptive analytics helps answer questions about what actions should be taken to achieve a goal or
target. By using insights from predictive analytics, data-driven decisions can be made. This technique
allows businesses to make informed decisions in the face of uncertainty. Prescriptive analytics techniques
rely on machine learning strategies to find patterns in large datasets. By analyzing past decisions and
events, the likelihood of different outcomes can be estimated.
Cognitive analytics
Cognitive analytics attempts to draw inferences from existing data and patterns, derive conclusions based
on existing knowledge bases, and then add these findings back into the knowledge base for future
inferences–a self-learning feedback loop. Cognitive analytics helps you to learn what might happen if
circumstances change, and how you might handle these situations.
Inferences aren't structured queries based on a rules database; rather, they're unstructured hypotheses
gathered from a number of sources, and expressed with varying degrees of confidence. Effective
cognitive analytics depends on machine learning algorithms. It uses several NLP (Natural Language Processing)
concepts to make sense of previously untapped data sources, such as call center conversation logs and
product reviews.
Theoretically, by tapping the benefits of massive parallel/distributed computing and the falling costs of
data storage and computing power, there's no limit to the cognitive development that these systems can
achieve.
Knowledge check
Question 1
What is data ingestion?
The process of transforming raw data into models containing meaningful information
Analyzing data for anomalies
Capturing raw data streaming from various sources and storing it
Question 2
Which one of the following visuals displays the major contributors to a selected result or value?
Key influencers
Column and bar chart
Matrix chart
Question 3
Which type of analytics helps answer questions about what has happened in the past?
Descriptive analytics
Prescriptive analytics
Predictive analytics
Summary
Organizations have enormous amounts of data. The purpose of data analysis is to discover important
insights that can help you drive your business forward.
You have explored:
●● Data ingestion and processing
●● Data visualization
●● Data analytics
Learn more
●● Create reports and dashboards in Power BI - documentation32
●● Azure Databricks33
●● Azure Cognitive Services34
●● Extract, transform, and load (ETL)35
32 https://docs.microsoft.com/power-bi/create-reports/
33 https://azure.microsoft.com/services/databricks/
34 https://azure.microsoft.com/services/cognitive-services/
35 https://docs.microsoft.com/azure/architecture/data-guide/relational-data/etl
Answers
Question 1
How is data in a relational table organized?
■■ Rows and Columns
Header and Footer
Pages and Paragraphs
Explanation
That's correct. Structured data is typically tabular data that is represented by rows and columns in a
database table.
Question 2
Which of the following is an example of unstructured data?
An Employee table with columns Employee ID, Employee Name, and Employee Designation
■■ Audio and Video files
A table within SQL Server database
Explanation
That's correct. Audio and video files are unstructured data.
Question 3
Which of the following is an example of a streaming dataset?
■■ Data from sensors and devices
Sales data for the past month
List of employees working for a company
Explanation
That's correct. Sensor and device feeds are examples of streaming datasets as they are published
continuously.
Question 1
Which one of the following tasks is a role of a database administrator?
■■ Backing up and restoring databases
Creating dashboards and reports
Identifying data quality issues
Explanation
That's correct. Database administrators back up the database and restore it when data is
lost or corrupted.
Question 2
Which of the following tools is a visualization and reporting tool?
SQL Server Management Studio
■■ Power BI
SQL
Explanation
That's correct. Power BI is a standard tool for creating rich graphical dashboards and reports.
Question 3
Which one of the following roles is not a data job role?
■■ Systems Administrator
Data Analyst
Database Administrator
Explanation
That's correct. Systems administrators deal with infrastructure components such as networks, virtual
machines, and other physical devices in a data center.
Question 1
Which one of the following statements is a characteristic of a relational database?
All data must be stored as character strings
■■ A row in a table represents a single entity
Different rows in the same table can contain different columns
Explanation
That's correct. Each row in a table contains the data for a single entity in that table
Question 2
What is an index?
■■ A structure that enables you to locate rows in a table quickly, using an indexed value
A virtual table based on the result set of a query
A structure comprising rows and columns that you use for storing data
Explanation
That's correct. You create indexes to help speed up queries.
Question 1
Which of the following services should you use to implement a non-relational database?
■■ Azure Cosmos DB
Azure SQL Database
The Gremlin API
Explanation
That's correct. Cosmos DB supports several common models of non-relational databases, including key-value
stores, graph databases, document databases, and column-family stores.
Question 2
Which of the following is a characteristic of non-relational databases?
Non-relational databases contain tables with flat fixed-column records
Non-relational databases require you to use data normalization techniques to reduce data duplication
■■ Non-relational databases are either schema free or have relaxed schemas
Explanation
That's correct. Each entity in a non-relational database only has the fields it needs, and these fields can vary
between different entities.
Question 3
You are building a system that monitors the temperature throughout a set of office blocks, and sets the
air conditioning in each room in each block to maintain a pleasant ambient temperature. Your system has
to manage the air conditioning in several thousand buildings spread across the country or region, and
each building typically contains at least 100 air-conditioned rooms. What type of NoSQL data store is
most appropriate for capturing the temperature data to enable it to be processed quickly?
■■ A key-value store
A column family database
Write the temperatures to a blob in Azure Blob storage
Explanation
That's correct. A key-value store can ingest large volumes of data rapidly. A thermometer in each room can
send the data to the database.
Question 1
What is data ingestion?
The process of transforming raw data into models containing meaningful information
Analyzing data for anomalies
■■ Capturing raw data streaming from various sources and storing it
Explanation
That's correct. The purpose of data ingestion is to receive raw data and save it as quickly as possible. The
data can then be processed and analyzed.
Question 2
Which one of the following visuals displays the major contributors to a selected result or value?
■■ Key influencers
Column and bar chart
Matrix chart
Explanation
That's correct. A key influencer chart displays the major contributors to a selected result or value. Key
influencers are a great choice to help you understand the factors that influence a key metric.
Question 3
Which type of analytics helps answer questions about what has happened in the past?
■■ Descriptive analytics
Prescriptive analytics
Predictive analytics
Explanation
That's correct. Descriptive analytics helps answer questions about what happened.
Module 2 Explore relational data in Azure
Learning objectives
In this lesson, you will:
●● Identify relational Azure data services
●● Explore considerations in choosing a relational data service
updates, and security of the databases that it hosts. All you do is create your databases under the control
of the data service.
Azure Data Services are available for several common relational database management systems. The
most well-known service is Azure SQL Database. The others currently available are Azure Database for
MySQL servers, Azure Database for MariaDB servers, and Azure Database for PostgreSQL servers. The
remaining units in this module describe the features provided by these services.
NOTE: Microsoft also provides data services for non-relational database management systems, such as
Cosmos DB.
Using Azure Data Services reduces the amount of time that you need to invest to administer a DBMS.
However, these services can also limit the range of custom administration tasks that you can perform,
because manually performing some tasks might risk compromising the way in which the service runs. For
example, some DBMSs enable you to install custom software into a database, or run scripts as part of a
database operation. This software might not be supported by the data service, and allowing an
application to run a script from a database could affect the security of the service. You must be prepared to work
with these restrictions in mind.
Apart from reducing the administrative workload, Azure Data Services ensure that your databases are
available for at least 99.99% of the time.
There are costs associated with running a database in Azure Data Services. The base price of each service
covers underlying infrastructure and licensing, together with the administration charges. Additionally,
these services are designed to be always on. This means that you can't shut down a database and restart
it later.
Not all features of a database management system are available in Azure Data Services. This is because
Azure Data Services takes on the task of managing the system and keeping it running using hardware
situated in an Azure datacenter. Exposing some administrative functions might make the underlying
platform vulnerable to misuse, and even open up some security concerns. Therefore, you have no direct
control over the platform on which the services run. If you need more control than Azure Data Services
allow, you can install your database management system on a virtual machine that runs in Azure. The
next unit examines this approach in more detail for SQL Server, although the same issues apply for the
other database management systems supported by Azure Data Services.
The image below highlights the different ways in which you could run a DBMS such as SQL Server,
starting with an on-premises system in the bottom left-hand corner, to PaaS in the upper right. The
diagram illustrates the benefits of moving to the PaaS approach.
NOTE: The term lift-and-shift refers to the way in which you can move a database directly from an
on-premises server to an Azure virtual machine without requiring that you make any changes to it.
Applications that previously connected to the on-premises database can be quickly reconfigured to
connect to the database running on the virtual machine, but should otherwise remain unchanged.
Use cases
This approach is optimized for migrating existing applications to Azure, or extending existing
on-premises applications to the cloud in hybrid deployments.
NOTE: A hybrid deployment is a system where part of the operation runs on-premises, and part in the
cloud. Your database might be part of a larger system that runs on-premises, although the database
elements might be hosted in the cloud.
You can use SQL Server in a virtual machine to develop and test traditional SQL Server applications. With
a virtual machine, you have the full administrative rights over the DBMS and operating system. It's a
perfect choice when an organization already has IT resources available to maintain the virtual machines.
These capabilities enable you to:
●● Create rapid development and test scenarios when you do not want to buy on-premises
non-production SQL Server hardware.
●● Become lift-and-shift ready for existing applications that require fast migration to the cloud with
minimal changes or no changes.
●● Scale up the platform on which SQL Server is running, by allocating more memory, CPU power, and
disk space to the virtual machine. You can quickly resize an Azure virtual machine without having to
reinstall the software that is running on it.
Business benefits
Running SQL Server on virtual machines allows you to meet unique and diverse business needs through a
combination of on-premises and cloud-hosted deployments, while using the same set of server products,
development tools, and expertise across these environments.
It's not always easy for businesses to switch their DBMS to a fully managed service. Migrating to a
managed service may require changes to the database and to the applications that use it. For this
reason, using virtual machines can offer a solution, but doing so does not eliminate the need to
administer your DBMS as carefully as you would on-premises.
NOTE: A SQL Database server is a logical construct that acts as a central administrative point for multiple
single or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover
groups.
Azure SQL Database is available with several options: Single Database, Elastic Pool, and Managed Instance.
The following sections describe Single Database and Elastic Pool. Managed Instance is the subject of the
next unit.
Single Database
This option enables you to quickly set up and run a single SQL Server database. You create and run a
database server in the cloud, and you access your database through this server. Microsoft manages the
server, so all you have to do is configure the database, create your tables, and populate them with your
data. You can scale the database if you need additional storage space, memory, or processing power. By
default, resources are pre-allocated, and you're charged per hour for the resources you've requested. You
can also specify a serverless configuration. In this configuration, Microsoft creates its own server, which
might be shared by a number of databases belonging to other Azure subscribers. Microsoft ensures the
privacy of your database. Your database automatically scales and resources are allocated or deallocated
as required. For more information, read What is a single database in Azure SQL Database1.
1 https://docs.microsoft.com/azure/sql-database/sql-database-single-database
Elastic Pool
This option is similar to Single Database, except that by default multiple databases can share the same
resources, such as memory, data storage space, and processing power. The resources are referred to as a
pool. You create the pool, and only your databases can use the pool. This model is useful if you have
databases with resource requirements that vary over time, and can help you to reduce costs. For example,
your payroll database might require plenty of CPU power at the end of each month as you handle payroll
processing, but at other times the database might become much less active. You might have another
database that is used for running reports. This database might become active for several days in the
middle of the month as management reports are generated, but with a lighter load at other times. Elastic
Pool enables you to use the resources available in the pool, and then release the resources once
processing has completed.
Use cases
Azure SQL Database gives you the best option for low cost with minimal administration. It is not fully
compatible with on-premises SQL Server installations. It is often used in new cloud projects where the
application design can accommodate any required changes to your applications.
NOTE: You can use the Data Migration Assistant to detect compatibility issues with your databases that
can impact database functionality in Azure SQL Database. For more information, see Overview of Data
Migration Assistant2.
Azure SQL Database is often used for:
●● Modern cloud applications that need to use the latest stable SQL Server features.
●● Applications that require high availability.
2 https://docs.microsoft.com/sql/dma/dma-overview
●● Systems with a variable load, that need the database server to scale up and down quickly.
Business benefits
Azure SQL Database automatically updates and patches the SQL Server software to ensure that you are
always running the latest and most secure version of the service.
The scalability features of Azure SQL Database ensure that you can increase the resources available to
store and process data without having to perform a costly manual upgrade.
The service provides high availability guarantees, to ensure that your databases are available at least
99.99% of the time. Azure SQL Database supports point-in-time restore, enabling you to recover a
database to the state it was in at any point in the past. Databases can be replicated to different regions to
provide additional assurance and disaster recovery.
Advanced threat protection provides advanced security capabilities, such as vulnerability assessments, to
help detect and remediate potential security problems with your databases. Threat protection also
detects anomalous activities that indicate unusual and potentially harmful attempts to access or exploit
your database. It continuously monitors your database for suspicious activities, and provides immediate
security alerts on potential vulnerabilities, SQL injection attacks, and anomalous database access patterns.
Threat detection alerts provide details of the suspicious activity, and recommend action on how to
investigate and mitigate the threat.
Auditing tracks database events and writes them to an audit log in your Azure storage account. Auditing
can help you maintain regulatory compliance, understand database activity, and gain insight into
discrepancies and anomalies that might indicate business concerns or suspected security violations.
SQL Database helps secure your data by providing encryption. For data in motion, it uses Transport Layer
Security (TLS). For data at rest, it uses Transparent Data Encryption (TDE). For data in use, it uses Always Encrypted.
In the Wide World Importers scenario, linked servers are used to perform distributed queries. However,
neither Single Database nor Elastic Pool support linked servers. If you want to use Single Database or
Elastic Pool, you may need to modify the queries that use linked servers and rework the operations that
depend on these features.
3 https://docs.microsoft.com/azure/sql-database/sql-database-managed-instance
Managed instances depend on other Azure services such as Azure Storage for backups, Azure Event Hubs
for telemetry, Azure Active Directory for authentication, Azure Key Vault for Transparent Data Encryption
(TDE) and a couple of Azure platform services that provide security and supportability features. The
managed instances make connections to these services.
All communications are encrypted and signed using certificates. To check the trustworthiness of commu-
nicating parties, managed instances constantly verify these certificates through certificate revocation lists.
If the certificates are revoked, the managed instance closes the connections to protect the data.
The following image summarizes the differences between SQL Database managed instance, Single
Database, and Elastic Pool.
Use cases
Consider Azure SQL Database managed instance if you want to lift-and-shift an on-premises SQL Server
instance and all its databases to the cloud, without incurring the management overhead of running SQL
Server on a virtual machine.
SQL Database managed instance provides features not available with the Single Database or Elastic Pool
options. If your system uses features such as linked servers, Service Broker (a message processing system
that can be used to distribute work across servers), or Database Mail (which enables your database to
send email messages to users), then you should use managed instance. To check compatibility with an
existing on-premises system, you can install Data Migration Assistant (DMA)4. This tool analyzes your
databases on SQL Server and reports any issues that could block migration to a managed instance.
Business benefits
SQL Database managed instance provides all the management and security benefits available when using
Single Database and Elastic Pool. The managed instance deployment option enables a system administrator to spend
less time on administrative tasks because the SQL Database service either performs them for you or
greatly simplifies those tasks. Automated tasks include operating system and database management
4 https://www.microsoft.com/download/details.aspx?id=53595
system software installation and patching, dynamic instance resizing and configuration, backups,
database replication (including system databases), high availability configuration, and configuration of health
and performance monitoring data streams.
Managed instance has near 100% compatibility with SQL Server Enterprise Edition, running on-premises.
The SQL Database managed instance deployment option supports traditional SQL Server Database
engine logins and logins integrated with Azure Active Directory (AD). Traditional SQL Server Database
engine logins include a username and a password. You must enter your credentials each time you
connect to the server. Azure AD logins use the credentials associated with your current computer sign-in,
and you don't need to provide them each time you connect to the server.
In the Wide World Importers scenario, SQL Database managed instance may be a more suitable choice
than Single Database or Elastic Pool. SQL Database managed instance supports linked servers, although
some of the other advanced features required by the database might not be available. If you want a
complete match, then running SQL Server on a virtual machine may be your only option, but you need to
balance the benefits of complete functionality against the administrative and maintenance overhead
required.
PostgreSQL has its own procedural language, pl/pgsql. This language is a variant of the standard relational
query language, SQL, with features that enable you to write stored procedures that run inside the
database.
Knowledge check
Question 1
Which deployment requires the fewest changes when migrating an existing SQL Server on-premises
solution?
Azure SQL Database Managed Instance
SQL Server running on a virtual machine
Azure SQL Database Single Database
Question 2
Which of the following statements is true about SQL Server running on a virtual machine?
You must install and maintain the software for the database management system yourself, but
backups are automated
Software installation and maintenance are automated, but you must do your own backups
You're responsible for all software installation and maintenance, and performing backups
Question 3
Which of the following statements is true about Azure SQL Database?
Scaling up doesn't take effect until you restart the database
Scaling out doesn't take effect until you restart the database
Scaling up or out will take effect without restarting the SQL database
Question 4
When using an Azure SQL Database managed instance, what is the simplest way to implement backups?
Manual Configuration of the SQL server
Create a scheduled task to back up
Backups are automatically handled
5 https://docs.microsoft.com/azure/dms/tutorial-postgresql-azure-postgresql-online
Question 5
What is the best way to transfer the data in a PostgreSQL database running on-premises into a database
running Azure Database for PostgreSQL service?
Export the data from the on-premises database and import it manually into the database running in
Azure
Upload a PostgreSQL database backup file to the database running in Azure
Use the Azure Database Migration Services
Summary
In this lesson, you've learned about the PaaS and IaaS deployment options for running databases in the
cloud. You've seen how Azure Data Services provides a range of PaaS services for running relational
databases in Azure. You've learned how the PaaS options provide support for automated management
and administration, compared to an IaaS approach.
Additional resources
●● Choose the right deployment option6
●● What is a single database in Azure SQL Database7
●● What is Azure SQL Database managed instance?8
●● Data Migration Assistant (DMA)9
●● Azure Database Migration Service (DMS)10
●● Choose the right data store11
6 https://docs.microsoft.com/azure/sql-database/sql-database-paas-vs-sql-server-iaas
7 https://docs.microsoft.com/azure/sql-database/sql-database-single-database
8 https://docs.microsoft.com/azure/sql-database/sql-database-managed-instance
9 https://www.microsoft.com/download/details.aspx?id=53595
10 https://docs.microsoft.com/azure/dms/tutorial-postgresql-azure-postgresql-online
11 https://docs.microsoft.com/azure/architecture/guide/technology-choices/data-store-overview
Explore provisioning and deploying relational database offerings in Azure
Learning objectives
In this lesson, you will:
●● Provision relational data services
●● Configure relational data services
●● Explore basic connectivity issues
●● Explore data security
What is provisioning?
Provisioning is the act of running a series of tasks that a service provider, such as Azure SQL Database,
performs to create and configure a service. Behind the scenes, the service provider will set up the various
resources (disks, memory, CPUs, networks, and so on) required to run the service. You'll be assigned these
resources, and they remain allocated to you (and charged to you), until you delete the service.
How the service provider provisions resources is opaque, and you don't need to be concerned with how
this process works. All you do is specify parameters that determine the size of the resources required
(how much disk space, memory, computing power, and network bandwidth). These parameters are
determined by estimating the size of the workload that you intend to run using the service. In many
cases, you can modify these parameters after the service has been created, perhaps increasing the
amount of storage space or memory if the workload is greater than you initially anticipated. The act of
increasing (or decreasing) the resources used by a service is called scaling.
The following video summarizes the process that Azure performs when you provision a service.
https://www.microsoft.com/videoplayer/embed/RE4zTud
You send the template to Azure using the az deployment group create command in the Azure CLI,
or New-AzResourceGroupDeployment command in Azure PowerShell. For more information about
12 https://docs.microsoft.com/cli/azure/what-is-azure-cli
13 https://docs.microsoft.com/powershell/azure
creating and using Azure Resource Manager templates to provision Azure resources, see What are Azure
Resource Manager templates?14
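As a sketch, deploying a template with the Azure CLI looks like the following (the resource group and template file names here are hypothetical placeholders):
az deployment group create \
  --resource-group my-resource-group \
  --template-file template.json
The equivalent Azure PowerShell command takes the same inputs: New-AzResourceGroupDeployment -ResourceGroupName my-resource-group -TemplateFile template.json.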
https://www.microsoft.com/videoplayer/embed/RE4AkhG
14 https://docs.microsoft.com/azure/azure-resource-manager/templates/overview
The processes for provisioning Azure Database for PostgreSQL and Azure Database for MySQL are very
similar.
NOTE: PostgreSQL also gives you the hyperscale option, which supports ultra-high performance workloads.
NOTE: The term compute refers to the amount of processor power available, in terms of the size and
number of CPUs allocated to the service.
You can select one of three pricing tiers, each of which is designed to support different workloads:
●● Basic. This tier is suitable for workloads that require light compute and I/O performance. Examples
include servers used for development or testing or small-scale, infrequently used applications.
●● General Purpose. Use this pricing tier for business workloads that require balanced compute and
memory with scalable I/O throughput. Examples include servers for hosting web and mobile apps and
other enterprise applications.
●● Memory Optimized. This tier supports high-performance database workloads that require in-memory
performance for faster transaction processing and higher concurrency. Examples include servers for
processing real-time data and high-performance transactional or analytical apps.
You can fine-tune the resources available for the selected tier. You can scale these resources up later, if
necessary.
NOTE: The Configure page displays the performance that General Purpose and Memory Optimized
configurations provide in terms of IOPS. IOPS is an acronym for Input/Output Operations Per Second, and
is a measure of the read and write capacity available using the configured resources.
●● Admin username. A sign-in account to use when you're connecting to the server. The admin sign-in
name can't be azure_superuser, admin, administrator, root, guest, or public.
●● Password. Provide a new password for the server admin account. It must contain from 8 to 128
characters. Your password must contain characters from three of the following categories: English
uppercase letters, English lowercase letters, numbers (0-9), and non-alphanumeric characters (!, $, #,
%, and so on).
After you've specified the appropriate settings, select Review + create to provision the server.
NOTE: Azure SQL Database communicates over port 1433. If you're trying to connect from within a
corporate network, outbound traffic over port 1433 might not be allowed by your network's firewall. If so,
you can't connect to your Azure SQL Database server unless your IT department opens port 1433.
IMPORTANT: A firewall rule with a start and end address of 0.0.0.0 enables all Azure services to pass through the
server-level firewall rule and attempt to connect to a single or pooled database through the server.
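Server-level firewall rules can also be created from the command line. The following sketch uses the Azure CLI; the resource group, server name, and client IP address are hypothetical placeholders:
az sql server firewall-rule create \
  --resource-group my-resource-group \
  --server my-sql-server \
  --name AllowMyClient \
  --start-ip-address 203.0.113.5 \
  --end-ip-address 203.0.113.5
Specifying the same start and end address allows a single client IP; a range allows every address between them.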
15 https://docs.microsoft.com/azure/private-link/private-endpoint-overview
You can use settings on the Firewalls and virtual networks page to completely lock down users and
applications, preventing them from using public endpoints to connect to your Azure SQL Database account.
Configure authentication
With Azure Active Directory (AD) authentication, you can centrally manage the identities of database
users and other Microsoft services in one central location. Central ID management provides a single place
to manage database users and simplifies permission management.
You can use these identities and configure access to your relational data services.
For detailed information on using Azure AD with Azure SQL database, visit the page What is Azure
Active Directory authentication for SQL database16 on the Microsoft website. You can also authenticate
users connecting to Azure Database for PostgreSQL17 and Azure Database for MySQL18 with AD.
16 https://docs.microsoft.com/azure/sql-database/sql-database-aad-authentication
17 https://docs.microsoft.com/azure/postgresql/concepts-aad-authentication
18 https://docs.microsoft.com/azure/mysql/concepts-azure-ad-authentication
You can also create your own custom roles. For detailed information, see Create or update Azure
custom roles using the Azure portal19 on the Microsoft website.
●● A scope lists the set of resources that the access applies to. When you assign a role, you can further
limit the actions allowed by defining a scope. This is helpful if, for example, you want to make someone
a Website Contributor, but only for one resource group.
You add role assignments to a resource in the Azure portal using the Access control (IAM) page. The
Role assignments tab enables you to associate a role with a security principal, defining the level of
access the role has to the resource. For further information, read Add or remove Azure role assignments
using the Azure portal20.
19 https://docs.microsoft.com/azure/role-based-access-control/custom-roles-portal
20 https://docs.microsoft.com/azure/role-based-access-control/role-assignments-portal
The image below shows the Advanced data security page for SQL database. The corresponding pages
for MySQL and PostgreSQL are similar.
Configure DoSGuard
Denial of service (DoS) attacks are mitigated by a SQL Database gateway service called DoSGuard.
DoSGuard actively tracks failed logins from IP addresses. If there are multiple failed logins from a specific IP
address within a period of time, the IP address is blocked from accessing any resources in the service for
a short while.
In addition, the Azure SQL Database gateway performs the following tasks:
●● It validates all connections to the database servers, to ensure that they are from genuine clients.
●● It encrypts all communications between a client and the database servers.
●● It inspects each network packet sent over a client connection. The gateway validates the connection
information in the packet, and forwards it to the appropriate physical server based on the database
name that's specified in the connection string.
NOTE: Connections to your Azure Database for PostgreSQL server communicate over port 5432. When
you try to connect from within a corporate network, outbound traffic over port 5432 might not be
allowed by your network's firewall. If so, you can't connect to your server unless your IT department
opens port 5432.
If you're familiar with PostgreSQL, you'll find that not all parameters are supported in Azure. The Server
parameters21 page on the Microsoft website describes the PostgreSQL parameters that are available.
PostgreSQL also provides the ability to extend the functionality of your database using extensions.
Extensions bundle multiple related SQL objects together in a single package that can be loaded or
removed from your database with a single command. After being loaded in the database, extensions
function like built-in features. You install an extension in your database before you can use it. To install a
particular extension, run the CREATE EXTENSION command from the psql tool to load the packaged objects
into your database. Not all PostgreSQL extensions are supported in Azure. For a full list, read PostgreSQL
extensions in Azure Database for PostgreSQL - Single Server22.
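For example, to load the PostGIS extension, which appears in the supported list, you connect to your database with psql and run a single command:
CREATE EXTENSION postgis;
After the command completes, the extension's functions and types are available like built-in features.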
21 https://docs.microsoft.com/azure/postgresql/concepts-servers#server-parameters
22 https://docs.microsoft.com/azure/postgresql/concepts-extensions
Replicas are new servers that you manage similarly to regular Azure Database for PostgreSQL servers. For
each read replica, you're billed for the provisioned compute in vCores and storage in GB/month.
Use the Replication page for a PostgreSQL server in the Azure portal to add read replicas to your
database:
NOTE: Connections to your Azure Database for MySQL server communicate over port 3306. When you
try to connect from within a corporate network, outbound traffic over port 3306 might not be allowed by
your network's firewall. If so, you can't connect to your server unless your IT department opens port 3306.
IMPORTANT: By default, SSL connection security is required and enforced on your Azure Database for
MySQL server.
You can find more information about the parameters available for MySQL in Azure on the How to
configure server parameters in Azure Database for MySQL by using the Azure portal23 page on the
Microsoft website.
23 https://docs.microsoft.com/azure/mysql/howto-server-parameters
Go to the Exercise: Provision non-relational Azure data services24 module on Microsoft Learn, and
follow the instructions in the module to create the following data stores:
●● A Cosmos DB database for holding information about the volume of items in stock. You need to store current
and historic information about volume levels, so you can track how levels vary over time. The data is
recorded daily.
●● A Data Lake store for holding production and quality data.
●● A blob container for holding images of the products the company manufactures.
●● File storage for sharing reports.
Summary
In this lesson, you've learned how to provision and deploy relational databases using different types of
data stores. You've seen how you can deploy Azure data services through the Azure portal, the Azure CLI,
and Azure PowerShell. You've also learned how to configure connectivity to these databases to allow
access from on-premises or within an Azure virtual network. You've also seen how to protect your
database using tools such as the firewall, and by configuring authentication.
Additional resources
●● Create an Azure Database for PostgreSQL25
●● Create an Azure Database for MySQL26
●● Create an Azure single Database27
●● Azure SQL Database documentation28
●● PostgreSQL Server parameters29
●● PostgreSQL extensions in Azure Database for PostgreSQL - Single Server30
●● How to configure server parameters in Azure Database for MySQL by using the Azure portal31
24 https://docs.microsoft.com/learn/modules/explore-provision-deploy-relational-database-offerings-azure/7-exercise-provision-relational-azure-data-services?pivots=azuresqls
25 https://docs.microsoft.com/azure/postgresql/quickstart-create-server-database-portal
26 https://docs.microsoft.com/azure/mysql/quickstart-create-mysql-server-database-using-azure-portal
27 https://docs.microsoft.com/azure/sql-database/sql-database-single-database-quickstart-guide
28 https://docs.microsoft.com/azure/sql-database
29 https://docs.microsoft.com/azure/postgresql/concepts-servers#server-parameters
30 https://docs.microsoft.com/azure/postgresql/concepts-extensions
31 https://docs.microsoft.com/azure/mysql/howto-server-parameters
Query relational data in Azure
Learning objectives
In this lesson, you will:
●● Describe query techniques for data using the SQL language
●● Query relational data
Introduction to SQL
SQL stands for Structured Query Language. SQL is used to communicate with a relational database. It's
the standard language for relational database management systems. SQL statements are used to perform
tasks such as update data in a database, or retrieve data from a database. Some common relational
database management systems that use SQL include Microsoft SQL Server, MySQL, PostgreSQL, MariaDB,
and Oracle.
NOTE: SQL was originally standardized by the American National Standards Institute (ANSI) in 1986, and
by the International Organization for Standardization (ISO) in 1987. Since then, the standard has been
extended several times as relational database vendors have added new features to their systems. Additionally,
most database vendors include their own proprietary extensions that are not part of the standard,
which has resulted in a variety of dialects of SQL.
In this unit, you'll learn about SQL. You'll see how it's used to query and maintain data in a database, and
the different dialects that are available.
the database), and managing user accounts. PostgreSQL and MySQL also have their own versions of
these features.
Some popular dialects of SQL include:
●● Transact-SQL (T-SQL). This version of SQL is used by Microsoft SQL Server and Azure SQL Database.
●● PL/pgSQL. This is the dialect used by PostgreSQL, with procedural extensions.
●● PL/SQL. This is the dialect used by Oracle. PL/SQL stands for Procedural Language/SQL.
Users who plan to work specifically with a single database system should learn the intricacies of their
preferred SQL dialect and platform.
Statement Description
SELECT Select/Read rows from a table
INSERT Insert new rows into a table
UPDATE Edit/Update existing rows
DELETE Delete existing rows in a table
The basic form of an INSERT statement will insert one row at a time. By default, the SELECT, UPDATE,
and DELETE statements are applied to every row in a table. You usually apply a WHERE clause with these
statements to specify criteria; only rows that match these criteria will be selected, updated, or deleted.
WARNING: SQL doesn't provide "are you sure?" prompts, so be careful when using DELETE or UPDATE
without a WHERE clause, because you can lose or modify a lot of data.
The following code is an example of a SQL statement that selects all rows that match a single filter from a
table. The FROM clause specifies the table to use:
SELECT *
FROM MyTable
WHERE MyColumn2 = 'contoso'
If a query returns many rows, they don't necessarily appear in any specific sequence. If you want to sort
the data, you can add an ORDER BY clause. The data will be sorted by the specified column:
SELECT *
FROM MyTable
ORDER BY MyColumn1
You can also run SELECT statements that retrieve data from multiple tables using a JOIN clause. Joins
indicate how the rows in one table are connected with rows in the other to determine what data to
return. A join condition defines the way two tables are related in a query by:
●● Specifying the column from each table to be used for the join. A typical join condition specifies a
foreign key from one table and its associated primary key in the other table.
●● Specifying a logical operator (for example, = or <>) to be used in comparing values from the columns.
The following query shows an example that joins two tables, named Inventory and CustomerOrder. It
retrieves all rows where the value in the ID column in the Inventory table matches the value in the
InventoryID column in the CustomerOrder table.
SELECT *
FROM Inventory
JOIN CustomerOrder
ON Inventory.ID = CustomerOrder.InventoryID
SQL provides aggregate functions. An aggregate function calculates a single result across a set of rows or
an entire table. The example below finds the minimum value in the MyColumn1 column across all rows in
the MyTable table:
SELECT MIN(MyColumn1)
FROM MyTable
A number of other aggregate functions are available, including MAX (which returns the largest value in a
column), AVG (which returns the average value, but only if the column contains numeric data), and SUM
(which returns the sum of all the values in the column, but again, only if the column is numeric).
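These aggregate functions follow the same pattern as MIN. For example, assuming MyColumn1 holds numeric data, you can compute several aggregates in one statement:
SELECT MAX(MyColumn1), AVG(MyColumn1), SUM(MyColumn1)
FROM MyTable
Each function collapses all the rows in the table into a single result value.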
The next example shows how to update an existing row using SQL. It modifies the value of the second
column but only for rows that have the value 3 in MyColumn3. All other rows are left unchanged:
UPDATE MyTable
SET MyColumn2 = 'contoso'
WHERE MyColumn1 = 3
WARNING: If you omit the WHERE clause, an UPDATE statement will modify every row in the table.
Use the DELETE statement to remove rows. You specify the table to delete from, and a WHERE clause
that identifies the rows to be deleted:
DELETE FROM MyTable
WHERE MyColumn2 = 'contoso'
WARNING: If you omit the WHERE clause, a DELETE statement will remove every row from the table.
The INSERT statement takes a slightly different form. You specify a table and columns in an INTO clause,
and a list of values to be stored in these columns. Standard SQL only supports inserting one row at a
time, as shown in the following example. Some dialects allow you to specify multiple VALUES clauses to
add several rows at a time:
INSERT INTO MyTable(MyColumn1, MyColumn2, MyColumn3)
VALUES (99, 'contoso', 'hello')
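In dialects that support it, such as Transact-SQL, adding several rows in a single statement looks like the following (the second row here is invented for illustration):
INSERT INTO MyTable(MyColumn1, MyColumn2, MyColumn3)
VALUES
    (99, 'contoso', 'hello'),
    (100, 'fabrikam', 'world')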
Statement Description
CREATE Create a new object in the database, such as a
table or a view.
ALTER Modify the structure of an object. For instance,
altering a table to add a new column.
DROP Remove an object from the database.
RENAME Rename an existing object.
WARNING: The DROP statement is very powerful. When you drop a table, all the rows in that table are
lost. Unless you have a backup, you won't be able to retrieve this data.
The following example creates a new database table. The items between the parentheses specify the
details of each column, including the name, the data type, whether the column must always contain a
value (NOT NULL), and whether the data in the column is used to uniquely identify a row (PRIMARY KEY).
Each table should have a primary key, although SQL doesn't enforce this rule.
NOTE: Columns marked as NOT NULL are referred to as mandatory columns. If you omit the NOT
NULL clause, you can create rows that don't contain a value in the column. An empty column in a row is
said to have a NULL value.
CREATE TABLE MyTable
(
MyColumn1 INT NOT NULL PRIMARY KEY,
MyColumn2 VARCHAR(50) NOT NULL,
MyColumn3 VARCHAR(10) NULL
);
The datatypes available for columns in a table will vary between database management systems. However,
most database management systems support numeric types such as INT, and string types such as
VARCHAR (VARCHAR stands for variable length character data). For more information, see the documentation
for your selected database management system.
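The other statements in the table above follow a similar pattern. For example, to add a column to the table created earlier, and then to remove the table entirely (the exact ALTER syntax varies slightly between dialects; this form works in Transact-SQL):
ALTER TABLE MyTable
ADD MyColumn4 INT NULL;

DROP TABLE MyTable;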
Some tools and applications require a connection string that identifies the server, database, account
name, and password. You can find this information from the Overview page for a database in the Azure
portal: select Show database connection strings.
NOTE: The database connection string shown in the Azure portal does not include the password for the
account. You must contact your database administrator for this information.
You enter your SQL query in the query pane and then click Run to execute it. Any rows that are returned
appear in the Results pane. The Messages pane displays information such as the number of rows
returned, or any errors that occurred:
You can also enter INSERT, UPDATE, DELETE, CREATE, and DROP statements in the query pane.
If the sign-in command succeeds, you'll see a 1> prompt. You can enter SQL commands, then type GO on
a line by itself to run them.
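As a sketch, a sqlcmd session against an Azure SQL database looks like the following; the server, database, and user names are hypothetical placeholders:
sqlcmd -S my-server.database.windows.net -d MyDatabase -U myadmin -P <password>
SELECT * FROM MyTable
GO
The -S option specifies the server, -d the database, and -U and -P the sign-in credentials.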
32 https://docs.microsoft.com/sql/tools/sqlcmd-utility
Setting Description
Server name The fully qualified server name. You can find the
server name in the Azure portal, as described
earlier.
Authentication SQL Login or Windows Authentication. Unless
you're using Azure Active Directory, select SQL
Login.
User name The server admin account user name. Specify the
user name from the account used to create the
server.
Password The password you specified when you provisioned
the server.
Database name The name of the database to which you wish to
connect.
Server Group If you have many servers, you can create groups to
help categorize them. These groups are for
convenience in Azure Data Studio, and don't affect
the database or server in Azure.
33 https://docs.microsoft.com/sql/azure-data-studio/download-azure-data-studio
2. Select Connect.
If your server doesn't have a firewall rule allowing Azure Data Studio to connect, the Create new
firewall rule form opens. Complete the form to create a new firewall rule. For details, see Create a
server-level firewall rule using the Azure portal34.
3. After successfully connecting, your server is available in the SERVERS sidebar on the Connections
page. You can now use the New Query command to create and run scripts of SQL commands.
34 https://docs.microsoft.com/azure/azure-sql/database/firewall-create-server-level-portal-quickstart
The example below uses Transact-SQL commands to create a new database (CREATE DATABASE and
ALTER DATABASE commands are part of the Transact-SQL dialect, and aren't part of standard SQL). The
script then creates a new table named Customers, and inserts four rows into this table. Again, the version
of the INSERT statement, with four VALUES clauses, is part of the Transact-SQL dialect. The -- characters
start a comment in Transact-SQL. The [ and ] characters surround identifiers, such as the name of a
table, database, column, or data type. The N character in front of a string indicates that the string uses the
Unicode character set.
NOTE: You can't create new SQL databases from a connection in Azure Data Studio if you're running SQL
Database single database or elastic pools. You can only create new databases in this way if you're using
SQL Database managed instance.
IF NOT EXISTS (
SELECT name
FROM sys.databases
WHERE name = N'TutorialDB'
)
CREATE DATABASE [TutorialDB];
GO
To execute the script, select Run on the toolbar. Notifications appear in the MESSAGES pane showing
query progress.
Setting Value
Server type Database engine
Server name The fully qualified server name, from the Overview page in the Azure portal
Authentication SQL Server Authentication
Login The user ID of the server admin account used to
create the server.
Password Server admin account password
35 https://docs.microsoft.com/sql/ssms/download-sql-server-management-studio-ssms
Setting Value
Server name The fully qualified server name, from the Overview page in the Azure portal
Authentication SQL Server Authentication
Login The user ID of the server admin account used to
create the server
Password Server admin account password
Database Name Your database name
36 https://visualstudio.microsoft.com/downloads/
3. In the Query window, enter your SQL query, and then select the Execute button in the toolbar. The
results appear in the Results pane.
As with Azure SQL Database, you must open the PostgreSQL firewall to enable client applications to
connect to the service. For detailed information, see Firewall rules in Azure Database for PostgreSQL
- Single Server37.
37 https://docs.microsoft.com/azure/postgresql/concepts-firewall-rules
38 http://postgresql.org
NOTE: postgres is the default management database created with Azure Database for PostgreSQL. You
can create additional databases using the CREATE DATABASE command from psql.
2. If your connection is successful, you'll see the prompt postgres=>.
3. You can create a new database with the following SQL command:
CREATE DATABASE "Adventureworks";
NOTE: You can enter commands across several lines. The semi-colon character acts as the command
terminator.
4. Inside psql, you can run the command \c Adventureworks to connect to the database.
5. You can create tables and insert data using CREATE and INSERT commands, as shown in the following
examples:
CREATE TABLE PEOPLE(NAME TEXT NOT NULL, AGE INT NOT NULL);
INSERT INTO PEOPLE(NAME, AGE) VALUES ('Bob', 35);
INSERT INTO PEOPLE(NAME, AGE) VALUES ('Sarah', 28);
CREATE TABLE LOCATIONS(CITY TEXT NOT NULL, STATE TEXT NOT NULL);
INSERT INTO LOCATIONS(CITY, STATE) VALUES ('New York', 'NY');
INSERT INTO LOCATIONS(CITY, STATE) VALUES ('Flint', 'MI');
6. You can retrieve the data you just added using the following SQL commands:
SELECT * FROM PEOPLE;
SELECT * FROM LOCATIONS;
2. In the Connection dialog box, in the Connection type drop-down list box, select PostgreSQL.
3. Fill in the remaining fields using the server name, user name, and password for your PostgreSQL
server.
Setting Description
Server Name The fully qualified server name from the Azure
portal.
User name The user name you want to sign in with. This must
be in the format shown in the Azure portal,
<username>@<hostname>.
Password The password for the account you're logging in
with.
Database name Fill this in if you want the connection to specify a database.
Server Group This option lets you assign this connection to a
specific server group you create.
Name (optional) This option lets you specify a friendly name for
your server.
4. Select Connect to establish the connection. After successfully connecting, your server opens in the
SERVERS sidebar. You can expand the Databases node to connect to databases on the server and
view their contents. Use the New Query command in the toolbar to create and run queries.
The following example adds a new table to the database and inserts four rows.
-- Create a new table called 'customers'
CREATE TABLE customers(
customer_id SERIAL PRIMARY KEY,
name VARCHAR (50) NOT NULL,
location VARCHAR (50) NOT NULL,
email VARCHAR (50) NOT NULL
);
5. From the toolbar, select Run to execute the query. As with Azure SQL, notifications appear in the
MESSAGES pane to show query progress.
6. To query the data, enter a SELECT statement, and then click Run:
-- Select rows from table 'customers'
SELECT * FROM customers;
You must also open the MySQL firewall to enable client applications to connect to the service. For
detailed information, see Azure Database for MySQL server firewall rules39.
39 https://docs.microsoft.com/azure/mysql/concepts-firewall-rules
40 https://dev.mysql.com/downloads/workbench
3. In the Connect to Database dialog box, enter the following information on the Parameters tab:
Setting Description
Stored connection Leave blank
Connection Method Standard (TCP/IP)
Hostname Specify the fully qualified server name from the
Azure portal
Port 3306
Username Enter the server admin login username from the Azure portal, in the format <username>@<servername>
Password Select Store in Vault, and enter the administrator
password specified when the server was created
4. Select OK to create the connection. If the connection is successful, the query editor will open.
5. You can use this editor to create and run scripts of SQL commands. The following example creates a
database named quickstartdb, and then adds a table named inventory. It inserts some rows, then reads
the rows. It changes the data with an update statement, and reads the rows again. Finally it deletes a
row, and then reads the rows again.
-- Create a database
-- DROP DATABASE IF EXISTS quickstartdb;
CREATE DATABASE quickstartdb;
USE quickstartdb;
-- Create a table and add rows (the column names match the UPDATE and
-- DELETE statements below)
CREATE TABLE inventory (id SERIAL PRIMARY KEY, name VARCHAR(50), quantity INTEGER);
INSERT INTO inventory (name, quantity) VALUES ('banana', 150);
INSERT INTO inventory (name, quantity) VALUES ('orange', 154);
-- Read
SELECT * FROM inventory;
-- Update
UPDATE inventory SET quantity = 200 WHERE id = 1;
SELECT * FROM inventory;
-- Delete
DELETE FROM inventory WHERE id = 2;
SELECT * FROM inventory;
6. To run the sample SQL code, select the lightning bolt icon in the toolbar.
The query results appear in the Result Grid section in the middle of the page. The Output list at the
bottom of the page shows the status of each command as it is run.
Summary
In this lesson, you've learned how to use SQL to store and retrieve data in Azure SQL Database, Azure
Database for PostgreSQL, and Azure Database for MySQL. You've seen how to connect to these database
management systems using some of the common tools currently available.
Learn more
●● sqlcmd Utility42
●● Download and install Azure Data Studio43
●● Download SQL Server Management Studio (SSMS)44
41 https://docs.microsoft.com/learn/modules/query-relational-data/6-perform-query
42 https://docs.microsoft.com/sql/tools/sqlcmd-utility
43 https://docs.microsoft.com/sql/azure-data-studio/download-azure-data-studio
44 https://docs.microsoft.com/sql/ssms/download-sql-server-management-studio-ssms
●● Tutorial: Design a relational database in a single database within Azure SQL using SSMS45
●● MySQL Community Downloads46
●● Azure Database for MySQL: Use MySQL Workbench to connect and query data47
●● Quickstart: Use the Azure portal's query editor to query a database48
●● DML Queries with SQL49
●● Joins (SQL Server)50
45 https://docs.microsoft.com/azure/sql-database/sql-database-design-first-database
46 https://dev.mysql.com/downloads/workbench
47 https://docs.microsoft.com/azure/mysql/connect-workbench
48 https://docs.microsoft.com/azure/mysql/connect-workbench
49 https://docs.microsoft.com/sql/t-sql/queries/queries
50 https://docs.microsoft.com/sql/relational-databases/performance/joins
Answers
Question 1
Which deployment requires the fewest changes when migrating an existing SQL Server on-premises
solution?
Azure SQL Database Managed Instance
■■ SQL Server running on a virtual machine
Azure SQL Database Single Database
Explanation
That's correct. SQL Server running on a virtual machine supports all the capabilities of an on-premises solution.
Question 2
Which of the following statements is true about SQL Server running on a virtual machine?
You must install and maintain the software for the database management system yourself, but
backups are automated
Software installation and maintenance are automated, but you must do your own backups
■■ You're responsible for all software installation and maintenance, and for performing backups
Explanation
That's correct. With SQL Server running on a virtual machine, you're responsible for patching and backing
up.
Question 3
Which of the following statements is true about Azure SQL Database?
Scaling up doesn't take effect until you restart the database
Scaling out doesn't take effect until you restart the database
■■ Scaling up or out will take effect without restarting the SQL database
Explanation
That's correct. You can scale up or out without interrupting use of the database.
Question 4
When using an Azure SQL Database managed instance, what is the simplest way to implement backups?
Manual Configuration of the SQL server
Create a scheduled task to back up
■■ Backups are automatically handled
Explanation
That's correct. A managed instance comes with the benefit of automatic backups and the ability to restore
to a point in time.
Question 5
What is the best way to transfer the data in a PostgreSQL database running on-premises into a database
running Azure Database for PostgreSQL service?
Export the data from the on-premises database and import it manually into the database running in
Azure
Upload a PostgreSQL database backup file to the database running in Azure
■■ Use the Azure Database Migration Services
Explanation
That's correct. The Database Migration Service offers the safest way to push your on-premises PostgreSQL
database into Azure.
Module 3 Explore non-relational data offerings on Azure
Learning objectives
In this lesson, you will:
●● Explore use-cases and management benefits of using Azure Table storage
●● Explore use-cases and management benefits of using Azure Blob storage
●● Explore use-cases and management benefits of using Azure File storage
●● Explore use-cases and management benefits of using Azure Cosmos DB
To help ensure fast access, Azure Table Storage splits a table into partitions. Partitioning is a mechanism
for grouping related rows, based on a common property or partition key. Rows that share the same
partition key will be stored together. Partitioning not only helps to organize data, it can also improve
scalability and performance:
●● Partitions are independent from each other, and can grow or shrink as rows are added to, or removed
from, a partition. A table can contain any number of partitions.
●● When you search for data, you can include the partition key in the search criteria. This helps to narrow
down the volume of data to be examined, and improves performance by reducing the amount of I/O
(reads and writes) needed to locate the data.
The key in an Azure Table Storage table comprises two elements; the partition key that identifies the
partition containing the row (as described above), and a row key that is unique to each row in the same
partition. Items in the same partition are stored in row key order. If an application adds a new row to a
table, Azure ensures that the row is placed in the correct position in the table. In the example below,
taken from an IoT scenario, the row key is a date and time value.
This scheme enables an application to quickly perform Point queries that identify a single row, and Range
queries that fetch a contiguous block of rows in a partition.
In a point query, when an application retrieves a single row, the partition key enables Azure to quickly
hone in on the correct partition, and the row key lets Azure identify the row in that partition. You might
have hundreds of millions of rows, but if you've defined the partition and row keys carefully when you
designed your application, data retrieval can be very quick. The partition key and row key effectively
define a clustered index over the data.
Explore non-relational data offerings in Azure 129
In a range query, the application searches for a set of rows in a partition, specifying the start and end
point of the set as row keys. This type of query is also very quick, as long as you have designed your row
keys according to the requirements of the queries performed by your application.
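The two query patterns described above can be sketched in a few lines of plain Python. This models the idea only, not the Azure SDK; the function names, device data, and key values are all invented:

```python
from bisect import bisect_left, bisect_right

# Sketch: a table keyed by (partition key, row key), with rows kept in
# row-key order inside each partition, as Azure Table Storage does.
table = {}  # partition key -> list of (row_key, entity)

def insert(partition_key, row_key, entity):
    rows = table.setdefault(partition_key, [])
    keys = [k for k, _ in rows]
    rows.insert(bisect_left(keys, row_key), (row_key, entity))  # keep row-key order

def point_query(partition_key, row_key):
    # The partition key narrows the search to one partition;
    # the row key identifies a single row within it.
    rows = table.get(partition_key, [])
    keys = [k for k, _ in rows]
    i = bisect_left(keys, row_key)
    return rows[i][1] if i < len(keys) and keys[i] == row_key else None

def range_query(partition_key, start_key, end_key):
    # A contiguous block of rows in one partition, bounded by row keys.
    rows = table.get(partition_key, [])
    keys = [k for k, _ in rows]
    return [e for _, e in rows[bisect_left(keys, start_key):bisect_right(keys, end_key)]]
```

Because the rows are sorted by row key, both queries are binary searches rather than scans, which is why careful key design matters.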
The columns in a table can hold numeric, string, or binary data up to 64 KB in size. A table can have up
to 252 columns, apart from the partition and row keys. The maximum row size is 1 MB. For more
information, read Understanding the Table service data model1.
1 https://docs.microsoft.com/rest/api/storageservices/Understanding-the-Table-Service-Data-Model
Azure Table Storage is intended to support very large volumes of data, up to several hundred TBs in size.
As you add rows to a table, Azure Table Storage automatically manages the partitions in a table and
allocates storage as necessary. You don't need to take any additional steps yourself.
Azure Table Storage provides high-availability guarantees in a single region. The data for each table is
replicated three times within an Azure region. For increased availability, but at additional cost, you can
create tables in geo-redundant storage. In this case, the data for each table is replicated a further three
times in another region several hundred miles away. If a replica in the local region becomes unavailable,
Azure will transparently switch to a working replica while the failed replica is recovered. If an entire region
is hit by an outage, your tables are safe in a remote region, and you can quickly switch your application
to connect to that remote region.
Azure Table Storage helps to protect your data. You can configure security and role-based access control
to ensure that only the people or applications that need to see your data can actually retrieve it.
3. On the New page, select Storage account - blob, file, table, queue
4. On the Create storage account page, enter the following details, and then select Review + create.
Field Value
Subscription Select your Azure subscription
Field Value
Resource group Select Create new, and specify the name of a new
Azure resource group. Use a name of your choice,
such as mystoragegroup
Storage account name Enter a name of your choice for the storage
account. The name must be globally unique
Location Select your nearest location
Performance Standard
Account kind StorageV2 (general purpose v2)
Replication Read-access geo-redundant storage (RA-GRS)
Access tier Hot
5. On the validation page, select Create, and wait while the new storage account is configured.
6. When the Your deployment is complete page appears, select Go to resource.
7. On the Overview page for the new storage account, select Tables.
8. On the Tables page, select + Table.
9. In the Add table dialog box, enter testtable for the name of the table, and then select OK.
10. When the new table has been created, select Storage Explorer.
11. On the Storage Explorer page, expand Tables, and then select testtable. Select Add to insert a new
entity into the table.
NOTE: In Storage Explorer, rows are also called entities.
12. In the Add Entity dialog box, enter your own values for the PartitionKey and RowKey properties,
and then select Add Property. Add a String property called Name and set the value to your name.
Select Add Property again, and add a Double property (this is numeric) named Age, and set the
value to your age. Select Insert to save the entity.
13. Verify that the new entity has been created. The entity should contain the values you specified,
together with a timestamp that contains the date and time that the entity was created.
14. If time allows, experiment with creating additional entities. Not all entities must have the same
properties. You can use the Edit function to modify the values in an entity, and add or remove properties.
The Query function enables you to find entities that have properties with a specified set of values.
Blob storage provides three access tiers, which help to balance access latency and storage cost:
●● The Hot tier is the default. You use this tier for blobs that are accessed frequently. The blob data is
stored on high-performance media.
●● The Cool tier. This tier has lower performance and incurs reduced storage charges compared to the
Hot tier. Use the Cool tier for data that is accessed infrequently. It's common for newly created blobs
to be accessed frequently initially, but less so as time passes. In these situations, you can create the
blob in the Hot tier, but migrate it to the Cool tier later. You can migrate a blob from the Cool tier
back to the Hot tier.
●● The Archive tier. This tier provides the lowest storage cost, but with increased latency. The Archive tier
is intended for historical data that mustn't be lost, but is required only rarely. Blobs in the Archive tier
are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is a few
milliseconds, but for the Archive tier, it can take hours for the data to become available. To retrieve a
blob from the Archive tier, you must change the access tier to Hot or Cool. The blob will then be
rehydrated. You can read the blob only when the rehydration process is complete.
You can create lifecycle management policies for blobs in a storage account. A lifecycle management
policy can automatically move a blob from Hot to Cool, and then to the Archive tier, as it ages and is
used less frequently (policy is based on the number of days since modification). A lifecycle management
policy can also arrange to delete outdated blobs.
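As a rough illustration, a lifecycle management policy is defined as a JSON document. The sketch below builds one in Python, following the general shape of the lifecycle policy schema; the rule name, prefix, and day thresholds are made up:

```python
import json

# Sketch of a lifecycle management policy: move blobs under "logs/" to Cool
# after 30 days without modification, to Archive after 90, delete after 365.
policy = {
    "rules": [{
        "name": "age-out-logs",          # hypothetical rule name
        "enabled": True,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/"]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": 30},
                    "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                    "delete": {"daysAfterModificationGreaterThan": 365}
                }
            }
        }
    }]
}
print(json.dumps(policy, indent=2))
```

Each action triggers on the number of days since the blob was last modified, matching the aging behavior described in the text.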
2 https://docs.microsoft.com/azure/storage/blobs/storage-blob-static-website
3 https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
3. On the Storage accounts page, select the storage account you created in the previous unit.
4. On the Overview page for your storage account, select Storage Explorer.
5. On the Storage Explorer page, right-click BLOB CONTAINERS, and then select Create blob container.
6. In the New Container dialog box, give your container a name, accept the default public access level,
and then select Create.
7. In the Storage Explorer window, expand BLOB CONTAINERS, and then select your new blob container.
8. In the blob container window, select Upload.
9. In the Upload blob dialog box, use the files button to pick a file of your choice on your computer,
and then select Upload.
10. When the upload has completed, close the Upload blob dialog box. Verify that the block blob
appears in your container.
11. If you have time, you can experiment with uploading other files as block blobs. You can also download
blobs back to your computer using the Download button.
You create Azure File storage in a storage account. Azure File Storage enables you to share up to 100 TB
of data in a single storage account. This data can be distributed across any number of file shares in the
account. The maximum size of a single file is 1 TiB, but you can set quotas to limit the size of each share
below this figure. Currently, Azure File Storage supports up to 2000 concurrent connections per shared
file.
Once you've created a storage account, you can upload files to Azure File Storage using the Azure portal,
or tools such as the AzCopy utility. You can also use the Azure File Sync service to synchronize locally
cached copies of shared files with the data in Azure File Storage.
Azure File Storage offers two performance tiers. The Standard tier uses hard disk-based hardware in a
datacenter, and the Premium tier uses solid-state disks. The Premium tier offers greater throughput, but
is charged at a higher rate.
where. Applications running in the cloud can share data with on-premises applications using the same
consistency guarantees implemented by on-premises SMB servers.
●● Integrate modern applications with Azure File Storage.
By leveraging the modern REST API that Azure File Storage implements in addition to SMB 3.0, you
can integrate legacy applications with modern cloud applications, or develop new file or file share-
based applications.
●● Simplify hosting High Availability (HA) workload data.
Azure File Storage delivers continuous availability so it simplifies the effort to host HA workload data
in the cloud. The persistent handles enabled in SMB 3.0 increase availability of the file share, which
makes it possible to host applications such as SQL Server and IIS in Azure with data stored in shared
file storage.
NOTE: Don't use Azure File Storage for files that can be written by multiple concurrent processes
simultaneously. Multiple writers require careful synchronization, otherwise the changes made by one process can
be overwritten by another. The alternative solution is to lock the file as it is written, and then release the
lock when the write operation is complete. However, this approach can severely impact concurrency and
limit performance.
Azure File Storage is a fully managed service. Your shared data is replicated locally within a region, but
can also be geo-replicated to a second region.
Azure aims to provide up to 300 MB/second of throughput for a single Standard file share, but you can
increase throughput capacity by creating a Premium file share, for additional cost.
All data is encrypted at rest, and you can enable encryption for data in-transit between Azure File Storage
and your applications.
For additional information on managing and planning to use Azure File Storage, read Planning for an
Azure Files deployment4.
4 https://docs.microsoft.com/azure/storage/files/storage-files-planning
6. In the New file share dialog box, enter a name for your file share, leave Quota empty, and then select
Create.
7. In the Storage Explorer window, expand FILE SHARES, and select your new file share, and then select
Upload.
TIP: If your new file share doesn't appear, right-click FILE SHARES, and then select Refresh.
8. In the Upload files dialog box, use the files button to pick a file of your choice on your computer, and
then select Upload.
9. When the upload has completed, close the Upload files dialog box. Verify that the file appears in the
file share.
TIP: If the file doesn't appear, right-click FILE SHARES, and then select Refresh.
Relational databases store data in relational tables, but sometimes the structure imposed by this model
can be too rigid, and it often leads to poor performance unless you spend time implementing detailed
tuning. Other models, collectively known as NoSQL databases, exist. These models store data in other
structures, such as documents, graphs, key-value stores, and column family stores.
## Document 2 ##
{
"customerID": "103249",
"name":
{
"title": "Mr",
"forename": "AAA",
"lastname": "BBB"
},
"address":
{
"street": "Another Street",
"number": "202",
"city": "Bcity",
"county": "Gloucestershire",
"country-region": "UK"
},
"ccOnFile": "yes"
}
A document can hold up to 2 MB of data, including small binary objects. If you need to store larger blobs
as part of a document, use Azure Blob storage, and add a reference to the blob in the document.
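The reference pattern described above can be sketched as follows. Everything here (the in-memory stores, the blob naming scheme, the field names) is hypothetical, standing in for a blob container and a document database:

```python
# Sketch: the large binary object goes to blob storage; the document stores
# only a small reference to it, keeping the document well under the 2 MB limit.
blob_store = {}   # stands in for an Azure Blob container
documents = {}    # stands in for a document database

def save_order(order_id, details, scanned_invoice):
    blob_name = f"invoices/{order_id}.pdf"     # invented naming scheme
    blob_store[blob_name] = scanned_invoice    # bytes live outside the document
    documents[order_id] = {
        "id": order_id,
        "details": details,
        "invoiceBlob": blob_name,              # reference, not the binary data
    }

save_order("ord-1", {"item": "bicycle"}, b"%PDF-...")
```

An application reading the document follows the `invoiceBlob` reference to fetch the binary content only when it is actually needed.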
Cosmos DB provides APIs that enable you to access these documents using a set of well-known interfaces.
NOTE: An API is an Application Programming Interface. Database management systems (and other
software frameworks) provide a set of APIs that developers can use to write programs that need to access
data. The APIs will often be different for different database management systems.
The APIs that Cosmos DB currently supports include:
●● SQL API. This interface provides a SQL-like query language over documents, enabling you to identify
and retrieve documents using SELECT statements. The example below finds the address for customer
103248 in the documents shown above:
SELECT a.address
FROM customers a
WHERE a.customerID = "103248"
●● Table API. This interface enables you to use the Azure Table Storage API to store and retrieve
documents. The purpose of this interface is to enable you to switch from Table Storage to Cosmos DB
without requiring that you modify your existing applications.
●● MongoDB API. MongoDB is another well-known document database, with its own programmatic
interface. Many organizations run MongoDB on-premises. You can use the MongoDB API for Cosmos DB
to enable a MongoDB application to run unchanged against a Cosmos DB database. You can migrate
the data in the MongoDB database to Cosmos DB running in the cloud, but continue to run your
existing applications to access this data.
●● Cassandra API. Cassandra is a column family database management system. This is another database
management system that many organizations run on-premises. The Cassandra API for Cosmos DB
provides a Cassandra-like programmatic interface for Cosmos DB. Cassandra API requests are mapped
to Cosmos DB document requests. As with the MongoDB API, the primary purpose of the Cassandra
API is to enable you to quickly migrate Cassandra databases and applications to Cosmos DB.
●● Gremlin API. The Gremlin API implements a graph database interface to Cosmos DB. A graph is a
collection of data objects and directed relationships. Data is still held as a set of documents in Cosmos
DB, but the Gremlin API enables you to perform graph queries over data. Using the Gremlin API you
can walk through the objects and relationships in the graph to discover all manner of complex
relationships, such as “What is the name of the pet of Sam's landlord?” in the graph shown below.
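As a rough illustration of the idea behind a graph query (in plain Python, not the Gremlin language), the question about Sam's landlord's pet amounts to following two directed relationships. The names and relationship labels below are invented:

```python
# Sketch: a graph as a set of directed, labeled edges.
# "What is the name of the pet of Sam's landlord?" is a two-hop walk.
edges = {
    ("Sam", "rentsFrom"): "Alice",   # Sam's landlord is Alice
    ("Alice", "owns"): "Rex",        # Alice's pet is Rex
}

def traverse(start, *relations):
    # Follow each labeled relationship in turn from the starting object.
    node = start
    for rel in relations:
        node = edges[(node, rel)]
    return node
```

A Gremlin query expresses the same walk declaratively; the point is that the query follows relationships rather than joining tables.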
NOTE: The primary purpose of the Table, MongoDB, Cassandra, and Gremlin APIs is to support existing
applications. If you are building a new application and database, you should use the SQL API.
Documents in a Cosmos DB database are organized into containers. The documents in a container are
grouped together into partitions. A partition holds a set of documents that share a common partition
key. You designate one of the fields in your documents as the partition key. You should select a partition
key that collects all related documents together. This approach helps to reduce the amount of I/O (disk
reads) that queries might need to perform when retrieving a set of documents for a given entity. For
example, in a document database for an ecommerce system recording the details of customers and the
orders they've placed, you could partition the data by customer ID, and store the customer and order
details for each customer in the same partition. To find all the information and orders for a customer, you
simply need to query that single partition:
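A minimal sketch of this partitioning idea (invented field names and sample data, not the Cosmos DB SDK):

```python
from collections import defaultdict

# Sketch: documents grouped into partitions by a designated partition key field,
# so all of one customer's documents live in a single partition.
partitions = defaultdict(list)

def add_document(doc, partition_key_field="customerID"):
    partitions[doc[partition_key_field]].append(doc)

add_document({"customerID": "103248", "type": "customer", "name": "AAA"})
add_document({"customerID": "103248", "type": "order", "total": 19.99})
add_document({"customerID": "103249", "type": "customer", "name": "BBB"})

# Everything for customer 103248 comes from one partition, in one read:
customer_103248 = partitions["103248"]
```

A query scoped to one partition key value touches one partition only, which is the I/O saving the text describes.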
There's a superficial similarity between a Cosmos DB container and a table in Azure Table storage: in both
cases, data is partitioned and documents (rows in a table) are identified by a unique ID within a partition.
However, the similarity ends there. Unlike Azure Table storage, documents in a Cosmos DB partition
aren't sorted by ID. Instead, Cosmos DB maintains a separate index. This index contains not only the
document IDs, but also tracks the value of every other field in each document. This index is created and
maintained automatically. This index enables you to perform queries that specify criteria referencing any
fields in a container, without incurring the need to scan the entire partition to find that data. For a
detailed description of how Cosmos DB indexing works, read Indexing in Azure Cosmos DB -
Overview5.
5 https://docs.microsoft.com/azure/cosmos-db/index-overview
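The idea of an index covering every field can be pictured with a small inverted index. This mimics the concept only, not Cosmos DB's actual data structures; the documents and fields are invented:

```python
from collections import defaultdict

# Sketch: every (field, value) pair maps to the IDs of documents containing it,
# so a query on any field avoids scanning the whole partition.
index = defaultdict(set)
documents = {}

def upsert(doc):
    documents[doc["id"]] = doc
    for field, value in doc.items():
        index[(field, value)].add(doc["id"])   # index every field automatically

def find(field, value):
    return [documents[i] for i in index[(field, value)]]

upsert({"id": "1", "city": "Bcity", "country": "UK"})
upsert({"id": "2", "city": "Acity", "country": "UK"})
```

The index is maintained on every write, so queries can filter on `city`, `country`, or any other field without a scan.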
6 https://docs.microsoft.com/azure/cosmos-db/consistency-levels
7 https://docs.microsoft.com/azure/cosmos-db/use-cases
Knowledge check
Question 1
What are the elements of an Azure Table storage key?
Table name and column name
Partition key and row key
Row number
Question 2
When should you use a block blob, and when should you use a page blob?
Use a block blob for unstructured data that requires random access to perform reads and writes. Use
a page blob for discrete objects that rarely change.
Use a block blob for active data stored using the Hot data access tier, and a page blob for data stored
using the Cool or Archive data access tiers.
Use a page block for blobs that require random read and write access. Use a block blob for discrete
objects that change infrequently.
Question 3
Why might you use Azure File storage?
To share files that are stored on-premises with users located at other sites.
To enable users at different sites to share files.
To store large binary data files containing images or other unstructured data.
Question 4
You are building a system that monitors the temperature throughout a set of office blocks, and sets the air
conditioning in each room in each block to maintain a pleasant ambient temperature. Your system has to
manage the air conditioning in several thousand buildings spread across the country/region, and each
building typically contains at least 100 air-conditioned rooms. What type of NoSQL data store is most
appropriate for capturing the temperature data to enable it to be processed quickly?
Send the data to an Azure Cosmos DB database and use Azure Functions to process the data.
Store the data in a file stored in a share created using Azure File Storage.
Write the temperatures to a blob in Azure Blob storage.
Summary
Microsoft Azure provides a range of technologies for storing non-relational data. Each technology has its
own strengths, and is suited to specific scenarios.
In this lesson, you've learned about the following technologies, and how you can use them to meet the
requirements of various scenarios:
●● Azure Table storage
Learn more
●● Understanding the Table service data model8
●● Azure Table storage table design guide: Scalable and performant tables9
●● Introduction to Azure Blob storage10
●● Introduction to Azure Data Lake Storage Gen211
●● Static website hosting in Azure Storage12
●● What is Azure Files?13
●● Planning for an Azure Files deployment14
●● Welcome to Azure Cosmos DB15
●● Indexing in Azure Cosmos DB - Overview16
●● Consistency levels in Azure Cosmos DB17
8 https://docs.microsoft.com/rest/api/storageservices/Understanding-the-Table-Service-Data-Model
9 https://docs.microsoft.com/azure/cosmos-db/table-storage-design-guide
10 https://docs.microsoft.com/azure/storage/blobs/storage-blobs-introduction
11 https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
12 https://docs.microsoft.com/azure/storage/blobs/storage-blob-static-website
13 https://docs.microsoft.com/azure/storage/files/storage-files-introduction
14 https://docs.microsoft.com/azure/storage/files/storage-files-planning
15 https://docs.microsoft.com/azure/cosmos-db/introduction
16 https://docs.microsoft.com/azure/cosmos-db/index-overview
17 https://docs.microsoft.com/azure/cosmos-db/consistency-levels
Learning objectives
In this lesson, you will:
●● Provision non-relational data services
●● Configure non-relational data services
●● Explore basic connectivity issues
●● Explore data security components
What is provisioning?
Provisioning is the act of running a series of tasks that a service provider, such as Azure Cosmos DB,
performs to create and configure a service. Behind the scenes, the service provider will set up the various
resources (disks, memory, CPUs, networks, and so on) required to run the service. You'll be assigned these
resources, and they remain allocated to you (and charged to you), until you delete the service.
How the service provider provisions resources is opaque, and you don't need to be concerned with how
this process works. All you do is specify parameters that determine the size of the resources required
Explore provisioning and deploying non-relational data services in Azure 157
(how much disk space, memory, computing power, and network bandwidth). These parameters are
determined by estimating the size of the workload that you intend to run using the service. In many
cases, you can modify these parameters after the service has been created, perhaps increasing the
amount of storage space or memory if the workload is greater than you initially anticipated. The act of
increasing (or decreasing) the resources used by a service is called scaling.
The following video summarizes the process that Azure performs when you provision a service.
https://www.microsoft.com/videoplayer/embed/RE4zTud
18 https://docs.microsoft.com/cli/azure/what-is-azure-cli
19 https://docs.microsoft.com/powershell/azure
You send the template to Azure using the az deployment group create command in the Azure CLI,
or New-AzResourceGroupDeployment command in Azure PowerShell. For more information about
creating and using Azure Resource Manager templates to provision Azure resources, see What are Azure
Resource Manager templates?20
https://www.microsoft.com/videoplayer/embed/RE4AwNK
If you prefer to use the Azure CLI or Azure PowerShell, you can run the following commands to create a
Cosmos DB account. The parameters to these commands correspond to many of the options you can
select using the Azure portal. The examples shown below create an account for the Core(SQL) API, with
geo-redundancy between the EastUS and WestUS regions, and support for multi-region writes. For more
information about these commands, see the az cosmosdb create21 page for the Azure CLI, or the
New-AzCosmosDBAccount22 page for PowerShell.
## Azure CLI
az cosmosdb create \
--subscription <your-subscription> \
--resource-group <resource-group-name> \
--name <cosmosdb-account-name> \
--locations regionName=eastus failoverPriority=0 \
--locations regionName=westus failoverPriority=1 \
--enable-multiple-write-locations
20 https://docs.microsoft.com/azure/azure-resource-manager/templates/overview
21 https://docs.microsoft.com/cli/azure/cosmosdb?view=azure-cli-latest#az-cosmosdb-create
22 https://docs.microsoft.com/powershell/module/az.cosmosdb/new-azcosmosdbaccount
## Azure PowerShell
New-AzCosmosDBAccount `
-ResourceGroupName "<resource-group-name>" `
-Name "<cosmosdb-account-name>" `
-Location @("West US", "East US") `
-EnableMultipleWriteLocations
NOTE: To use Azure PowerShell to provision a Cosmos DB account, you must first install the Az.CosmosDB
PowerShell module:
Install-Module -Name Az.CosmosDB
The other deployment option is to use an Azure Resource Manager template. The template for Cosmos
DB can be rather lengthy, because of the number of parameters. To make life easier, Microsoft has
published a number of example templates for handling different configurations. You can download these
templates from the Microsoft web site, at Manage Azure Cosmos DB Core (SQL) API resources with
Azure Resource Manager templates23.
23 https://docs.microsoft.com/azure/cosmos-db/manage-sql-with-resource-manager
https://www.microsoft.com/videoplayer/embed/RE4AkhH
If you prefer to work from the command line, you can run commands to create databases and contain-
ers. The Azure PowerShell code below shows some examples:
## Azure PowerShell - create a database
Set-AzCosmosDBSqlDatabase `
-ResourceGroupName "<resource-group-name>" `
-AccountName "<cosmos-db-account-name>" `
-Name "<database-name>" `
-Throughput <number-of-RU/s>
## Azure PowerShell - create a container
Set-AzCosmosDBSqlContainer `
-ResourceGroupName "<resource-group-name>" `
-AccountName "<cosmos-db-account-name>" `
-DatabaseName "<database-name>" `
-Name "<container-name>" `
-PartitionKeyKind Hash `
-PartitionKeyPath "<key-field-in-documents>"
such as databases. You can also use premium storage to hold Azure virtual machine disks. A
premium storage account is more expensive than a standard account.
NOTE: Data Lake storage is only available with a standard storage account, not premium.
●● Account kind. Azure storage supports several different types of account:
●● General-purpose v2. You can use this type of storage account for blobs, files, queues, and tables,
and is recommended for most scenarios that require Azure Storage. If you want to provision Azure
Data Lake Storage, you should specify this account type.
●● General-purpose v1. This is a legacy account type for blobs, files, queues, and tables. Use
general-purpose v2 accounts when possible.
●● BlockBlobStorage. This type of storage account is only available for premium accounts. You use
this account type for block blobs and append blobs. It's recommended for scenarios with high
transaction rates, that use smaller objects, or that require consistently low storage latency.
●● FileStorage. This type is also only available for premium accounts. You use it to create files-only
storage accounts with premium performance characteristics. It's recommended for enterprise or
high-performance scale applications. Use this type if you're creating an account to support File
Storage.
●● BlobStorage. This is another legacy account type that can only hold blobs. Use general-purpose
v2 accounts instead, when possible. You can use this account type for Azure Data Lake storage, but
the General-purpose v2 account type is preferable.
●● Replication. Data in an Azure Storage account is always replicated three times in the region you
specify as the primary location for the account. Azure Storage offers several options for how your
data is replicated:
●● Locally redundant storage (LRS) copies your data synchronously three times within a single
physical location in the region. LRS is the least expensive replication option, but isn't recommend-
ed for applications requiring high availability.
●● Geo-redundant storage (GRS) copies your data synchronously three times within a single
physical location in the primary region using LRS. It then copies your data asynchronously to a
single physical location in the secondary region. This form of replication protects you against
regional outages.
●● Read-access geo-redundant storage (RA-GRS) replication is an extension of GRS that provides
direct read-only access to the data in the secondary location. In contrast, the GRS option doesn't
expose the data in the secondary location, and it's only used to recover from a failure in the
primary location. RA-GRS replication enables you to store a read-only copy of the data close to
users that are located in a geographically distant location, helping to reduce read latency times.
NOTE: To maintain performance, premium storage accounts only support LRS replication. This is
because replication is performed synchronously to maintain data integrity. Replicating data to a
distant region can increase latency to the point at which any advantages of using premium storage
are lost.
●● Access tier. This option is only available for standard storage accounts. You can select between Hot
and Cool.
The hot access tier has higher storage costs than cool and archive tiers, but the lowest access costs.
Example usage scenarios for the hot access tier include:
●● Data that's in active use or expected to be accessed (read from and written to) frequently.
●● Data that's staged for processing and eventual migration to the cool access tier.
The cool access tier has lower storage costs and higher access costs compared to hot storage. This tier
is intended for data that will remain in the cool tier for at least 30 days. Example usage scenarios for
the cool access tier include:
●● Short-term backup and disaster recovery datasets.
●● Older media content not viewed frequently anymore but is expected to be available immediately
when accessed.
●● Large data sets that need to be stored cost effectively while more data is being gathered for future
processing. For example, long-term storage of scientific data, or raw telemetry data from a
manufacturing facility.
The sku is a combination of the performance tier and replication option. It can be one of Premium_LRS,
Premium_ZRS, Standard_GRS, Standard_GZRS, Standard_LRS, Standard_RAGRS, Standard_RAGZRS, or
Standard_ZRS.
NOTE: ZRS in some of these skus stands for Zone redundant storage. Zone-redundant storage replicates
your Azure Storage data synchronously across three Azure availability zones in the primary region. Each
availability zone is a separate physical location with independent power, cooling, and networking. This is
useful for applications requiring high availability.
The kind parameter should be one of BlobStorage, BlockBlobStorage, FileStorage, Storage, or StorageV2.
The access-tier parameter can either be Cool or Hot.
The values for SkuName, Kind, and AccessTier are the same as those in the Azure CLI command.
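Putting the sku, kind, and access-tier parameters together, an Azure CLI command to create a storage account might look like the following sketch (the names and location are placeholders, not values from this course):

```bash
az storage account create \
    --name "<storage-account-name>" \
    --resource-group "<resource-group-name>" \
    --location "<region>" \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --access-tier Hot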
After the storage account has been created, you can add one or more Data Lake Storage containers to
the account. Each container supports a directory structure for storing Data Lake files.
The Containers page enables you to create and manage containers. Each container must have a unique
name within the storage account. You can also specify the access level. By default, data held in a contain-
er is only accessible by the container owner. You can set the access level to Blob to enable public read
access to any blobs created in the container, or Container to allow read access to the entire contents of
the container, including the ability to list all blobs. You can also configure role-based access control for a
blob if you need a more granular level of security.
Once you've provisioned a container, your applications can upload blobs into the container.
The public-access parameter can be blob, container, or off (for private access only).
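The public-access parameter belongs to the az storage container create command. A minimal sketch, with placeholder names, that creates a container allowing anonymous read access to individual blobs might be:

```bash
az storage container create \
    --account-name "<storage-account-name>" \
    --name "<container-name>" \
    --public-access blob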
Using the File shares page, create a new file share. Give the file share a name, and optionally set a quota
to limit the size of files on the share. By default, the total size of the files on a single file share can't
exceed 5120 GB.
After you've created the file share, applications can read and write shared files using the file share.
24 https://docs.microsoft.com/azure/storage/common/storage-network-security
Cosmos DB, SQL, or your own Private Link Service. For detailed information, read What is Azure Private
Endpoint?25.
The Private endpoint connections page for a service allows you to specify which private endpoints, if
any, are permitted access to your service. You can use the settings on this page, together with the
Firewalls and virtual networks page, to completely lock down users and applications from accessing
public endpoints to connect to your Cosmos DB account.
Configure authentication
Many services include an access key that you can specify when you attempt to connect to the service. If
you provide an incorrect key, you'll be denied access. The image below shows how to find the access key
for an Azure Storage account; you select Access Keys under Settings on the main page for the account.
Many other services allow you to view the access key in the same way from the Azure portal. If your key is
compromised, you can generate a new access key.
NOTE: Azure services actually provide two keys, labeled key1 and key2. An application can use either key
to connect to the service.
Any user or application that knows the access key for a resource can connect to that resource. However,
access keys provide a rather coarse-grained level of authentication. Additionally, if you need to regener-
ate an access key (after accidental disclosure, for example), you may need to update all applications that
connect using that key.
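Because two keys are available, you can rotate them without downtime: switch applications to key2, then regenerate key1. From the command line, a hedged Azure CLI sketch (placeholder names) is:

```bash
## Regenerate the primary access key for a storage account
az storage account keys renew \
    --resource-group "<resource-group-name>" \
    --account-name "<storage-account-name>" \
    --key primary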
Azure Active Directory (Azure AD) provides superior security and ease of use over access key authoriza-
tion. Microsoft recommends using Azure AD authorization when possible to minimize potential security
vulnerabilities inherent in using access keys.
Azure AD is a separate Azure service. You add users and other security principals (such as an application)
to a security domain managed by Azure AD. The following video describes how authentication works with
Azure.
25 https://docs.microsoft.com/azure/private-link/private-endpoint-overview
https://www.microsoft.com/videoplayer/embed/RE4A94T
For detailed information on using Azure AD, visit the page What is Azure Active Directory?26 on the
Microsoft website.
26 https://docs.microsoft.com/azure/active-directory/fundamentals/active-directory-whatis
27 https://docs.microsoft.com/azure/role-based-access-control/custom-roles-portal
You add role assignments to a resource in the Azure portal using the Access control (IAM) page. The
Role assignments tab enables you to associate a role with a security principal, defining the level of
access the role has to the resource. For further information, read Add or remove Azure role assign-
ments using the Azure portal28.
28 https://docs.microsoft.com/azure/role-based-access-control/role-assignments-portal
Configure Cosmos DB
Configure replication
Azure Cosmos DB enables you to replicate the databases and containers in your account across multiple
regions. When you initially provision an account, you can specify that you want to copy data to another
region. You don't have control over which region is used; the next nearest region is automatically
selected. The Replicate data globally page enables you to configure replication in more detail. You can
replicate to multiple regions, and you select the regions to use. In this way, you can pick the regions that
are closest to your consumers, to help minimize the latency of requests made by those consumers.
You can also use this page to configure automatic failover to help ensure high availability. If the databas-
es in the primary region (the region in which you created the account) become unavailable, one of the
replicated regions will take over processing and become the new primary region.
By default, only the region in which you created the account supports write operations; the replicas are all
read-only. However, you can enable multi-region writes. Multi-region writes can cause conflicts though, if
applications running in different regions modify the same data. In this case, the most recent write will
overwrite changes made earlier when data is replicated, although you can write your own code to apply a
different strategy.
Replication is asynchronous, so there's likely to be a lag between a change made in one region, and that
change becoming visible in other regions.
NOTE: Each replica increases the cost of the Cosmos DB service. For example, if you replicate your
account to two regions, your costs will be three times that of a non-replicated account.
Configure consistency
Within a single region, Cosmos DB uses a cluster of servers. This approach helps to improve scalability
and availability. A copy of all data is held in each server in the cluster. The following video explains how
this works, and the effects it can have on consistency.
https://www.microsoft.com/videoplayer/embed/RE4AbG9
Cosmos DB enables you to specify how such inconsistencies should be handled. It provides the following
options:
●● Eventual. This option is the least consistent. It's based on the situation just described. Changes won't
be lost; they'll appear eventually, but they might not appear immediately. Additionally, if an applica-
tion makes several changes, some of those changes might be immediately visible, but others might be
delayed; changes could appear out of order.
●● Consistent Prefix. This option ensures that changes will appear in order, although there may be a
delay before they become visible. In this period, applications may see old data.
●● Session. If an application makes a number of changes, they'll all be visible to that application, and in
order. Other applications may see old data, although any changes will appear in order, as they did for
the Consistent Prefix option. This form of consistency is sometimes known as read your own writes.
●● Bounded Staleness. There's a lag between writing and then reading the updated data. You specify
this staleness either as a period of time, or as a number of previous versions that the data can be
inconsistent for.
●● Strong. In this case, all writes are only visible to clients after the changes are confirmed as written
successfully to all replicas. This option is unavailable if you need to distribute your data across multi-
ple global regions.
Eventual consistency provides the lowest latency and least consistency. Strong consistency results in the
highest latency but also the greatest consistency. You should select a default consistency level that
balances the performance and requirements of your applications.
You can change the default consistency for a Cosmos DB account using the Default consistency page in
the Azure portal. Applications can override the default consistency level for individual read operations.
However, they can't increase the consistency above that specified on this page; they can only decrease it.
General configuration
The Configuration page for a storage account enables you to modify some general settings of the
account. You can:
●● Enable or disable secure communications with the service. By default, all requests and responses are
encrypted by using the HTTPS protocol as they traverse the Internet. You can disable this requirement,
although this isn't recommended.
●● Switch the default access tier between Cool and Hot.
●● Change the way in which the account is replicated.
●● Enable or disable integration with Azure AD for requests that access file shares.
Other options, such as the account kind and performance tier, are displayed on this page for information
only; you can't change them.
Configure encryption
All data held in an Azure Storage account is automatically encrypted. By default, encryption is performed
using keys managed and owned by Microsoft. If you prefer, you can provide your own encryption keys.
To use your own keys, add them to Azure Key Vault. You then provide the details of the vault and key, or
the URI of the key in the vault. All new data will be encrypted as it's written. Existing data will be encrypt-
ed using a process running in the background; this process may take a little time.
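To point a storage account at a customer-managed key from the command line, a hedged Azure CLI sketch might look like the following (the vault URI, key name, and key version are placeholders you obtain from your own Key Vault):

```bash
az storage account update \
    --name "<storage-account-name>" \
    --resource-group "<resource-group-name>" \
    --encryption-key-source Microsoft.Keyvault \
    --encryption-key-vault "https://<vault-name>.vault.azure.net" \
    --encryption-key-name "<key-name>" \
    --encryption-key-version "<key-version>"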
Knowledge check
Question 1
What is provisioning?
The act of running a series of tasks that a service provider performs to create and configure a service.
Providing other users access to an existing service.
Tuning a service to improve performance.
Question 2
What is a security principal?
A named collection of permissions that can be granted to a service, such as the ability to use the
service to read, write, and delete data. In Azure, examples include Owner and Contributor.
A set of resources managed by a service to which you can grant access.
An object that represents a user, group, service, or managed identity that is requesting access to
Azure resources.
Question 3
Which of the following is an advantage of using multi-region replication with Cosmos DB?
Data will always be consistent in every region.
Availability is increased.
Increased security for your data.
Summary
Provisioning is the act of creating an instance of a service. Azure takes care of allocating the resources
required to run a service as part of the provisioning process. After you've provisioned a service, you can
then configure it to enable your applications and users to access the service.
In this lesson, you've learned how to:
●● Provision non-relational data services
●● Configure non-relational data services
●● Explore basic connectivity issues
●● Explore data security components
Learn more
●● What is Azure CLI30
●● Azure PowerShell documentation31
30 https://docs.microsoft.com/cli/azure/what-is-azure-cli
31 https://docs.microsoft.com/powershell/azure
Manage non-relational data stores in Azure 183
Learning objectives
In this lesson, you will:
●● Upload data to a Cosmos DB database, and learn how to query this data.
●● Upload and download data in an Azure Storage account.
},
"address":
{
"street": "Main Street",
"number": "101",
"city": "Acity",
"state": "NY"
},
"ccOnFile": "yes",
"firstOrder": "02/28/2003"
}
## Document 2 ##
{
"customerID": "103249",
"name":
{
"title": "Mr",
"forename": "AAA",
"lastname": "BBB"
},
"address":
{
"street": "Another Street",
"number": "202",
"city": "Bcity",
"county": "Gloucestershire",
"country-region": "UK"
},
"ccOnFile": "yes"
}
Documents in a Cosmos DB database are organized into containers. The documents in a container are
grouped together into partitions. A partition holds a set of documents that share a common partition
key. You designate one of the fields in your documents as the partition key. Select a partition key that
collects all related documents together. This approach helps to reduce the number of disk read opera-
tions that queries perform when retrieving a set of documents for a given entity. For example, in a document
database for an ecommerce system recording the details of customers and the orders they've placed, you
could partition the data by customer ID, and store the customer and order details for each customer in
the same partition. To find all the information and orders for a customer, you simply need to query that
single partition:
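Conceptually, each partition behaves like a bucket keyed by the partition key value. The Python sketch below is purely illustrative (the documents and field names are hypothetical, and real partition placement is managed by Cosmos DB itself); it shows why storing customer and order documents under the same customerID lets a query for one customer stay within a single partition:

```python
from collections import defaultdict

# Hypothetical documents; customerID acts as the partition key
documents = [
    {"customerID": "101", "type": "customer", "name": "AAA"},
    {"customerID": "101", "type": "order", "total": 99.0},
    {"customerID": "101", "type": "order", "total": 45.5},
    {"customerID": "103", "type": "customer", "name": "BBB"},
]

# Group documents by partition key, mimicking how a container
# collects documents that share a key into the same partition
partitions = defaultdict(list)
for doc in documents:
    partitions[doc["customerID"]].append(doc)

# All information and orders for customer 101 come from one partition
print(len(partitions["101"]))  # 3
```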
Cosmos DB is a foundational service in Azure. Cosmos DB is used by many of Microsoft's products for
mission critical applications running at global scale, including Skype, Xbox, Office 365, and Azure. Cosmos
DB is highly suitable for IoT and telematics, Retail and marketing, Gaming, and Web and mobile applica-
tions. For additional information about uses for Cosmos DB, read Common Azure Cosmos DB use
cases40.
40 https://docs.microsoft.com/azure/cosmos-db/use-cases
Cosmos DB also provides other APIs that enable you to access these documents using the command sets
of other NoSQL database management systems. These APIs are:
●● Table API. This interface enables you to use the Azure Table Storage API to store and retrieve docu-
ments. The purpose of this interface is to enable you to switch from Table Storage to Cosmos DB
without requiring that you modify your existing applications.
●● MongoDB API. MongoDB is another well-known document database, with its own programmatic inter-
face. Many organizations run it on-premises. You can use the MongoDB API for Cosmos DB to enable a
MongoDB application to run unchanged against a Cosmos DB database. You can migrate the data in
the MongoDB database to Cosmos DB running in the cloud, but continue to run your existing applica-
tions to access this data.
●● Cassandra API. Cassandra is a column family database management system. This is another database
management system that many organizations run on-premises. The Cassandra API for Cosmos DB
provides a Cassandra-like programmatic interface for Cosmos DB. Cassandra API requests are mapped
to Cosmos DB document requests. As with the MongoDB API, the primary purpose of the Cassandra
API is to enable you to quickly migrate Cassandra databases and applications to Cosmos DB.
●● Gremlin API. The Gremlin API implements a graph database interface to Cosmos DB. A graph is a
collection of data objects and directed relationships. Data is still held as a set of documents in Cosmos
DB, but the Gremlin API enables you to perform graph queries over the data. Using the Gremlin API
you can walk through the objects and relationships in the graph to discover all manner of complex
relationships, such as “What is the name of the pet of Sam's landlord?” in the graph shown below.
The principal use of the Table, MongoDB, and Cassandra APIs is to support existing applications written
using these data stores. If you're building a new application and database, you should use the SQL API or
Gremlin API.
●● Use the Cosmos DB Data Migration tool41 to perform a bulk-load or transfer of data from another
data source.
●● Use Azure Data Factory42 to import data from another source.
●● Write a custom application that imports data using the Cosmos DB BulkExecutor43 library. This
strategy is beyond the scope of this module.
●● Create your own application that uses the functions available through the Cosmos DB SQL API client
library44 to store data. This approach is also beyond the scope of this module.
41 https://docs.microsoft.com/azure/cosmos-db/import-data
42 https://docs.microsoft.com/azure/data-factory/connector-azure-cosmos-db
43 https://docs.microsoft.com/azure/cosmos-db/tutorial-sql-api-dotnet-bulk-import
44 https://docs.microsoft.com/azure/cosmos-db/create-sql-api-dotnet-v4
45 https://aka.ms/csdmtool
If you've already created the container, use the Scale settings on the Data Explorer page for your
database in the Azure portal to specify the maximum throughput, or set the throughput to
Autoscale.
Once the data has been loaded, you may be able to reduce the throughput resources to lower the costs
of the database.
Type                   Operators
Unary                  +, -, ~, NOT
Arithmetic             +, -, *, /, %
Bitwise                |, &, ^, <<, >>, >>>
Logical                AND, OR
Comparison             =, !=, <, >, <=, >=, <>
String (concatenate)   ||
Ternary (if)           ?
The SQL API also supports:
●● The DISTINCT operator that you use as part of the SELECT clause to eliminate duplicates in the result
data.
●● The TOP operator that you can use to retrieve only the first few rows returned by a query that might
otherwise generate a large result set.
●● The BETWEEN operator that you use as part of the WHERE clause to define an inclusive range of
values. The condition field BETWEEN a AND b is equivalent to the condition field >= a AND field
<= b.
●● The IS_DEFINED operator that you can use for detecting whether a specified field exists in a docu-
ment.
The queries below show some examples using these operators.
// List all customer cities (removing duplicates) for customers living in
// states with codes between AK (Alaska) and MD (Maryland)
SELECT DISTINCT c.Address.City
FROM c
WHERE c.Address.State BETWEEN "AK" AND "MD"

// Retrieve the first three customers, ordered by name
SELECT TOP 3 *
FROM c
ORDER BY c.Name

// Display the details of every customer for which the date of birth is
// recorded
SELECT * FROM p
WHERE IS_DEFINED(p.DateOfBirth)
The SQL API also supports a large number of mathematical, trigonometric, string, array, and spatial
functions. For detailed information on the syntax of queries, and the functions and operators supported
by the Cosmos DB SQL API, visit the page Getting started with SQL queries in Azure Cosmos DB46 on
the Microsoft website.
46 https://docs.microsoft.com/azure/cosmos-db/sql-api-sql-query
In the query pane that appears, you can enter a SQL query. Select Execute Query to run it. The results
will be displayed as a list of JSON documents.
You can save the query text if you need to repeat it in the future. The query is saved in a separate con-
tainer. You can retrieve it later using the Open Query command in the toolbar.
NOTE: The Items page also lets you modify and delete documents. Select a document from the list to
display it in the main pane. You can modify any of the fields, and select Update to save the changes.
Select Delete to remove the document from the collection. The New Item command enables you to
manually add a new document to the collection. You can use the Upload Item to create new documents
from a file containing JSON data.
On the Containers page, select + Container, and provide a name for the new container. You can also
specify the public access level. For a container that will be used to hold blobs, the most appropriate
access level is Blob. This setting supports anonymous read-only access for blobs. However, unauthenti-
cated clients can't list the blobs in the container. This means they can only download a blob if they know
its name and location within the container.
47 https://docs.microsoft.com/cli/azure/storage/container?view=azure-cli-latest#az-storage-container-create
48 https://docs.microsoft.com/powershell/module/az.storage/new-azstoragecontainer
On the page for the container, in the toolbar, select Upload. In the Upload blob dialog box, browse to
the file containing the data to upload. The Advanced drop-down section provides options you can use to
modify the default settings. For example, you can specify the name of a folder in the container (the folder
will be created if it doesn't exist), the type of blob, and the access tier. The blob that is created is named
after the file you uploaded.
NOTE: You can select multiple files. They will each be uploaded into separate blobs.
49 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-upload
--file "\data\racer_green_large.gif" \
--name "bikes\racer_green"
If you need to upload several files, use the az storage blob upload-batch command. This com-
mand takes the name of a local folder rather than a file name, and uploads the files in that folder to
separate blobs. The example below uploads all gif files in the data folder to the bikes folder in the images
container.
az storage blob upload-batch \
--account-name contosodata \
--source "\data" \
--pattern "*.gif" \
--destination "images\bikes"
Azure PowerShell doesn't currently include a batch blob upload command. If you need to upload multiple
files, you can write your own PowerShell script (use the Get-ChildItem cmdlet) to iterate through the
files and upload each one individually.
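A minimal sketch of such a script, assuming the contosodata account and images container used elsewhere in this unit, might look like this:

```powershell
# Sketch: upload each .gif file in \data as an individual blob
# in the bikes folder of the images container
$account = Get-AzStorageAccount -ResourceGroupName "contoso-group" -Name "contosodata"
Get-ChildItem -Path "\data" -Filter "*.gif" | ForEach-Object {
    Set-AzStorageBlobContent -Container "images" `
        -File $_.FullName `
        -Blob "bikes\$($_.Name)" `
        -Context $account.Context
}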
50 https://docs.microsoft.com/powershell/module/azure.storage/set-azurestorageblobcontent
51 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-list
52 https://docs.microsoft.com/powershell/module/az.storage/Get-AzStorageBlob
53 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-download
54 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-download-batch
55 https://docs.microsoft.com/powershell/module/az.storage/get-azstorageblobcontent
Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata" | Get-AzStorageBlobContent `
-Container "images" `
-Blob "bikes\racer_green_large.gif" `
-Destination "racer_green_large.gif"
WARNING: If you created the storage account with support for hierarchical namespaces (for Data Lake
Storage), the soft delete option isn't available. All blob delete operations will be final.
If you've enabled soft delete for the storage account, the blobs page listing the blobs in a container
includes the option Show deleted blobs. If you select this option, you can view and undelete a deleted
blob.
56 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-delete
57 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-delete-batch
58 https://docs.microsoft.com/powershell/module/az.storage/remove-azstorageblob
-Confirm
59 https://docs.microsoft.com/cli/azure/storage/container?view=azure-cli-latest#az-storage-container-delete
60 https://docs.microsoft.com/powershell/module/az.storage/remove-azstoragecontainer
On the File shares page, select + File share. Give the file share a name, and optionally specify a quota.
Azure allows you to store up to 5 PiB of files across all file shares in a storage account. A quota enables
you to limit the amount of space an individual file share consumes, to prevent it from starving other file
shares of storage. If you have only one file share, you can leave the quota empty.
61 https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-v10
After you've created a share, you can use the Azure portal to add directories to the share, upload files to
the share, and delete the share. The Connect command generates a PowerShell script that you can run to
attach to the share from your local computer. You can then use the share as though it were a local disk
drive.
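Creating a file share can also be scripted. A hedged Azure CLI sketch (placeholder names; the quota is specified in GiB) might be:

```bash
az storage share create \
    --account-name "<storage-account-name>" \
    --name "<file-share-name>" \
    --quota 100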
A version of this utility is also available in the Azure portal, on the Overview page for an Azure Storage
account.
62 https://azure.microsoft.com/features/storage-explorer/
To create a new file share, right-click File Shares, and then select Create file share. In the Azure portal,
Storage Explorer displays the same dialog box that you saw earlier. In the desktop version, you simply
enter a name for the new file share; you don't get the option to set a quota at this point.
As with the Azure portal, once you have created a new share, you can use Storage Explorer to create
folders, and upload and download files.
However, if you need to transfer a significant number of files in and out of Azure File storage, you should
use the AzCopy utility. AzCopy is a command-line utility optimized for transferring large files (and blobs)
between your local computer and Azure File storage. It can detect transfer failures, and restart a failed
transfer at the point an error occurred - you don't have to repeat the entire operation.
Upload files
To transfer a single file into File Storage using AzCopy, use the form of the command shown in the
following example. Run this command from the command line. In this example, replace
<storage-account-name> with the name of the storage account, replace <file-share-name> with the
name of a file share in this account, and replace <SAS-token> with the token you created using the
Azure portal. You must include the quotes where shown.
NOTE: Don't forget to include the copy keyword after the azcopy command. AzCopy supports other
operations, such as deleting files and blobs, listing files and blobs, and creating new file shares. Each of
these operations has its own keyword.
azcopy copy "myfile.txt" "https://<storage-account-name>.file.core.windows.net/<file-share-name>/myfile.txt<SAS-token>"
You can transfer the entire contents of a local folder to Azure File storage using a similar command. You
replace the file name (“myfile.txt”) with the name of the folder. If the folder contains subfolders that you
want to copy, add the --recursive flag.
azcopy copy "myfolder" "https://<storage-account-name>.file.core.windows.net/<file-share-name>/myfolder<SAS-token>" --recursive
INFO: Scanning...
INFO: Any empty folders will be processed, because source and destination
both support folders
When the transfer is complete, you'll see a summary of the work performed.
Job b86eeb8b-1f24-614e-6302-de066908d4a2 summary
Elapsed Time (Minutes): 0.6002
Number of File Transfers: 161
Number of Folder Property Transfers: 13
Total Number of Transfers: 174
Number of Transfers Completed: 174
Number of Transfers Failed: 0
Number of Transfers Skipped: 0
TotalBytesTransferred: 43686370
Final Job Status: Completed
The AzCopy copy command has other options as well. For more information, see the page Upload files63
on the Microsoft website.
Download files
You can also use the AzCopy copy command to transfer files and folders from Azure File Storage to your
local computer. The command is similar to that for uploading files, except that you switch the order of
the arguments; specify the files and folders in the file share first, and the local files and folders second.
For example, to download the files from a folder named myfolder in a file share named myshare to a local
folder called localfolder, use the following command:
azcopy copy "https://<storage-account-name>.file.core.windows.net/myshare/myfolder<SAS-token>" "localfolder" --recursive
For full details on downloading files using AzCopy, see Download files64.
63 https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-files?toc=/azure/storage/files/toc.json#upload-files
64 https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-files?toc=/azure/storage/files/toc.json#download-files
●● A file share, in the same Azure Storage account, for holding product documentation.
In this lab, you'll upload data to these data stores. You'll run queries against the data in the Cosmos DB
database. Finally, you'll download and view the images and documents held in Azure Storage.
Go to the Exercise: Upload, download, and query data in a non-relational data store65 module on
Microsoft Learn, and follow the instructions in the module.
You'll perform this exercise using the Azure portal and the command line.
Summary
In this lesson, you've seen how to use Azure Cosmos DB and Azure Storage accounts to store and retrieve
non-relational data. You've learned how to:
●● Upload data to a Cosmos DB database, and query this data.
●● Upload and download data in an Azure Storage account.
Learn more
●● Common Azure Cosmos DB use cases66
●● Migrate normalized database schema from Azure SQL Database to Azure CosmosDB denormalized container67
●● Tutorial: Use Data migration tool to migrate your data to Azure Cosmos DB68
●● Copy and transform data in Azure Cosmos DB (SQL API) by using Azure Data Factory69
●● Quickstart: Build a console app using the .NET V4 SDK to manage Azure Cosmos DB SQL API
account resources70
●● Getting started with SQL queries71
●● az storage container create72
●● New-AzStorageContainer73
●● az storage blob upload74
●● Set-AzStorageBlobContent75
●● az storage blob list76
●● Get-AzStorageBlob77
●● az storage blob download78
65 https://docs.microsoft.com/en-us/learn/modules/explore-non-relational-data-stores-azure/6-exercise
66 https://docs.microsoft.com/azure/cosmos-db/use-cases
67 https://docs.microsoft.com/azure/data-factory/how-to-sqldb-to-cosmosdb
68 https://docs.microsoft.com/azure/cosmos-db/import-data
69 https://docs.microsoft.com/azure/data-factory/connector-azure-cosmos-db
70 https://docs.microsoft.com/azure/cosmos-db/create-sql-api-dotnet-v4
71 https://docs.microsoft.com/azure/cosmos-db/sql-api-sql-query
72 https://docs.microsoft.com/cli/azure/storage/container?view=azure-cli-latest#az-storage-container-create
73 https://docs.microsoft.com/powershell/module/az.storage/new-azstoragecontainer
74 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-upload
75 https://docs.microsoft.com/powershell/module/azure.storage/set-azurestorageblobcontent
76 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-list
77 https://docs.microsoft.com/powershell/module/az.storage/Get-AzStorageBlob
78 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-download
79 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-download-batch
80 https://docs.microsoft.com/powershell/module/az.storage/get-azstorageblobcontent
81 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-delete
82 https://docs.microsoft.com/powershell/module/az.storage/remove-azstorageblob
83 https://docs.microsoft.com/cli/azure/storage/blob?view=azure-cli-latest#az-storage-blob-delete-batch
84 https://docs.microsoft.com/cli/azure/storage/container?view=azure-cli-latest#az-storage-container-delete
85 https://docs.microsoft.com/powershell/module/az.storage/remove-azstoragecontainer
86 https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-v10
87 https://azure.microsoft.com/features/storage-explorer/
88 https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-files
Answers
Question 1
What are the elements of an Azure Table storage key?
Table name and column name
■■ Partition key and row key
Row number
Explanation
That's correct. The partition key identifies the partition in which a row is located, and the rows in each
partition are stored in row key order.
Question 2
When should you use a block blob, and when should you use a page blob?
Use a block blob for unstructured data that requires random access to perform reads and writes. Use
a page blob for discrete objects that rarely change.
Use a block blob for active data stored using the Hot data access tier, and a page blob for data stored
using the Cool or Archive data access tiers.
■■ Use a page blob for blobs that require random read and write access. Use a block blob for discrete
objects that change infrequently.
Explanation
That's correct. Use a page blob for blobs that require random read and write access. Use a block blob for
discrete objects that change infrequently.
Question 3
Why might you use Azure File storage?
To share files that are stored on-premises with users located at other sites.
■■ To enable users at different sites to share files.
To store large binary data files containing images or other unstructured data.
Explanation
That's correct. You can create a file share in Azure File storage, upload files to this file share, and grant
access to the file share to remote users.
Question 4
You are building a system that monitors the temperature throughout a set of office blocks, and sets the
air conditioning in each room in each block to maintain a pleasant ambient temperature. Your system has
to manage the air conditioning in several thousand buildings spread across the country/region, and each
building typically contains at least 100 air-conditioned rooms. What type of NoSQL data store is most
appropriate for capturing the temperature data to enable it to be processed quickly?
■■ Send the data to an Azure Cosmos DB database and use Azure Functions to process the data.
Store the data in a file stored in a share created using Azure File Storage.
Write the temperatures to a blob in Azure Blob storage.
Explanation
That's correct. Cosmos DB can ingest large volumes of data rapidly. A thermometer in each room can send
the data to a Cosmos DB database. You can arrange for an Azure Function to run as each item is stored.
The function can examine the temperature, and kick off a remote process to configure the air conditioning
in the room.
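The per-item processing pattern described above can be sketched as follows. This is a toy illustration, not Azure Functions code: the handler name, the target temperature, and the reading format are all hypothetical, standing in for a function that runs as each temperature reading is stored.

```python
# Toy sketch of the pattern: a handler runs once per stored temperature
# reading and decides whether to adjust the air conditioning in that room.
TARGET_CELSIUS = 21.0  # hypothetical comfortable ambient temperature

actions = []  # stands in for "kick off a remote process"

def on_reading_stored(reading):
    """Hypothetical handler invoked for each ingested reading."""
    if reading["celsius"] > TARGET_CELSIUS + 2:
        actions.append(("cool", reading["room"]))
    elif reading["celsius"] < TARGET_CELSIUS - 2:
        actions.append(("heat", reading["room"]))

on_reading_stored({"room": 14, "celsius": 26.5})
on_reading_stored({"room": 15, "celsius": 21.3})
print(actions)  # [('cool', 14)]
```

In a real deployment, the handler would be an Azure Function bound to the Cosmos DB change feed, and the action would call out to the building's air-conditioning controller rather than append to a list.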
Question 1
What is provisioning?
■■ The act of running a series of tasks that a service provider performs to create and configure a service.
Providing other users access to an existing service.
Tuning a service to improve performance.
Explanation
That's correct. In Azure, you must provision a service before you can use it.
Question 2
What is a security principal?
A named collection of permissions that can be granted to a service, such as the ability to use the
service to read, write, and delete data. In Azure, examples include Owner and Contributor.
A set of resources managed by a service to which you can grant access.
■■ An object that represents a user, group, service, or managed identity that is requesting access to
Azure resources.
Explanation
That's correct. Azure authentication uses security principals to help determine whether a request to access a
service should be granted.
Question 3
Which of the following is an advantage of using multi-region replication with Cosmos DB?
Data will always be consistent in every region.
■■ Availability is increased.
Increased security for your data.
Explanation
That's correct. Replication improves availability. If one region becomes inaccessible, the data is still available
in other regions.
Module 4 Explore modern data warehouse analytics
Learning objectives
In this lesson, you will:
●● Explore data warehousing concepts
●● Explore Azure data services for modern data warehousing
●● Explore modern data warehousing architecture and workload
●● Explore Azure data services in the Azure portal
220 Module 4 Explore modern data warehouse analytics
https://www.microsoft.com/videoplayer/embed/RE4A3RR
The next unit describes each of these services in a little more detail.
quantities of data. All Apache Hadoop environments can access data in Azure Data Lake Storage
Gen2.
In an Azure Data Services data warehouse solution, data is typically loaded into Azure Data Lake Storage
before being processed into a structure that enables efficient analysis in Azure Synapse Analytics. You can
use a service such as Azure Data Factory (described above) to ingest and load the data from a variety of
sources into Azure Data Lake Storage.
Azure Databricks also supports structured stream processing. In this model, Databricks performs your
computations incrementally, and continuously updates the result as streaming data arrives.
overhead of fetching and converting it each time. You can also use this data as input to further analytical
processing, using Azure Analysis Services.
Azure Synapse Analytics leverages a massively parallel processing (MPP) architecture. This architecture
includes a control node and a pool of compute nodes.
The Control node is the brain of the architecture. It's the front end that interacts with all applications. The
MPP engine runs on the Control node to optimize and coordinate parallel queries. When you submit a
processing request, the Control node transforms it into smaller requests that run against distinct subsets
of the data in parallel.
The Compute nodes provide the computational power. The data to be processed is distributed evenly
across the nodes. Users and applications send processing requests to the control node. The control node
sends the queries to compute nodes, which run the queries over the portion of the data that they each
hold. When each node has finished its processing, the results are sent back to the control node where
they're combined into an overall result.
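The control-node/compute-node interaction described above follows a classic scatter-gather pattern, which can be sketched in a few lines of plain Python. This is a toy illustration only — the data, partitioning, and aggregation are invented for the example and bear no relation to how Synapse distributes real tables.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy scatter-gather: a "control node" splits a query across "compute
# nodes", each of which processes only its own partition of the data,
# and the partial results are combined into the overall result.
partitions = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # data spread across 3 nodes

def compute_node(partition):
    # Each node aggregates just its share of the data.
    return sum(partition)

with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(compute_node, partitions))

# The control node combines the per-node results.
total = sum(partial_results)
print(partial_results, total)  # [8, 15, 13] 36
```

The key property the sketch shows is that each node only ever touches its own partition, so adding nodes lets the same query run over more data in the same time.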
Azure Synapse Analytics supports two computational models: SQL pools and Spark pools.
In a SQL pool, each compute node uses an Azure SQL Database and Azure Storage to handle a portion of
the data.
You submit queries in the form of Transact-SQL statements, and Azure Synapse Analytics runs them.
However, unlike an ordinary SQL Server database engine, Azure Synapse Analytics can receive data from
a wide variety of sources. To do this, Azure Synapse Analytics uses a technology named PolyBase1.
PolyBase enables you to retrieve data from relational and non-relational sources, such as delimited text
files, Azure Blob Storage, and Azure Data Lake Storage. You can save the data read in as SQL tables within
the Synapse Analytics service.
1 https://docs.microsoft.com/sql/relational-databases/polybase/polybase-guide
Examine components of a modern data warehouse 227
You specify the number of nodes when you create a SQL pool. You can scale the SQL pool manually to
add or remove compute nodes as necessary.
NOTE: You can only scale a SQL pool when it's not running a Transact-SQL query.
In a Spark pool, the nodes are replaced with a Spark cluster. You run Spark jobs comprising code written
in notebooks, in the same way as in Azure Databricks. You can write the code for a notebook in C#, Python,
Scala, or Spark SQL (a different dialect of SQL from Transact-SQL). As with a SQL pool, the Spark cluster
splits the work out into a series of parallel tasks that can be performed concurrently. You can save data
generated by your notebooks in Azure Storage or Data Lake Storage.
NOTE: Spark is optimized for in-memory processing. A Spark job can load and cache data into memory
and query it repeatedly. In-memory computing is much faster than disk-based applications, but requires
additional memory resources.
You specify the number of nodes when you create the Spark cluster. Spark pools can have autoscaling
enabled, so that pools scale by adding or removing nodes as needed. Autoscaling can occur while
processing is active.
NOTE: Azure Synapse Analytics can consume a lot of resources. If you aren't planning on performing any
processing for a while, you can pause the service. This action releases the resources in the pool to other
users, and reduces your costs.
Knowledge check
Question 1
When should you use Azure Synapse Analytics?
To perform very complex queries and aggregations
To create dashboards from tabular data
To enable a large number of users to query analytics data
Question 2
What is the purpose of data ingestion?
To perform complex data transformations over data received from external sources
To capture data flowing into a data warehouse system as quickly as possible
To visualize the results of data analysis
Question 3
What is the primary difference between a data lake and a data warehouse?
A data lake contains structured information, but a data warehouse holds raw business data
A data lake holds raw data, but a data warehouse holds structured information
Data stored in a data lake is dynamic, but information stored in a data warehouse is static
Summary
This lesson has described how a data warehouse solution works, and given you an overview of the
services you can use to construct a modern data warehouse in Azure.
In this lesson, you've seen how to:
●● Explore data warehousing concepts
●● Explore Azure data services for modern data warehousing
●● Explore modern data warehousing architecture and workload
●● Explore Azure data services in the Azure portal
Learn more
●● Data Factory2
●● Azure Data Lake Storage3
●● Azure Databricks4
●● Azure Synapse Analytics5
2 https://azure.microsoft.com/services/data-factory/
3 https://azure.microsoft.com/services/storage/data-lake-storage/
4 https://azure.microsoft.com/services/databricks/
5 https://azure.microsoft.com/services/synapse-analytics/
6 https://docs.microsoft.com/azure/analysis-services/analysis-services-overview
7 https://docs.microsoft.com/power-bi/fundamentals/power-bi-overview
8 https://azure.microsoft.com/services/hdinsight/
9 https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
10 https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading
11 https://docs.microsoft.com/sql/relational-databases/polybase/polybase-guide
Explore data ingestion in Azure 231
Learning objectives
In this lesson, you will:
●● Describe data ingestion in Azure
●● Describe components of Azure Data Factory
●● See how to use Azure Data Factory to load data into a data warehouse
Azure Data Factory uses a number of different resources: linked services, datasets, and pipelines. The
following sections describe how Data Factory uses these resources.
Understand datasets
A dataset in Azure Data Factory represents the data that you want to ingest (input) or store (output). If
your data has a structure, a dataset specifies how the data is structured. Not all datasets are structured.
Blobs held in Azure Blob storage are an example of unstructured data.
A dataset connects to an input or an output using a linked service. For example, if you're reading and
processing data from Azure Blob storage, you'd create an input dataset that uses a Blob Storage linked
service to specify the details of the storage account. The dataset would specify which blob to ingest, and
the format of the information in the blob (binary data, JSON, delimited text, and so on). If you're using
Azure Data Factory to store data in a table in a SQL database, you would define an output dataset that
uses a SQL Database linked service to connect to the database, and specifies which table to use in that
database.
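The relationship between a dataset and its linked service can be pictured with a simplified sketch. The structure below loosely mirrors the shape of Data Factory JSON definitions, but the names are hypothetical and several required properties are omitted — treat it as an illustration of the "linked service holds the connection, dataset describes the data" split, not as a deployable template.

```python
# Simplified, illustrative sketch of a Data Factory dataset referencing a
# linked service. All names ("MyBlobStorage", "InputSalesData", the
# container and file) are hypothetical.
linked_service = {
    "name": "MyBlobStorage",                     # where the data lives
    "properties": {"type": "AzureBlobStorage"},  # connection details go here
}

dataset = {
    "name": "InputSalesData",
    "properties": {
        "type": "DelimitedText",                 # format of the blob
        "linkedServiceName": {
            "referenceName": "MyBlobStorage",    # points at the linked service
            "type": "LinkedServiceReference",
        },
        "typeProperties": {                      # which blob to ingest
            "location": {"container": "sales", "fileName": "sales.csv"},
        },
    },
}

# The dataset never stores connection details itself; it refers to them.
print(dataset["properties"]["linkedServiceName"]["referenceName"])  # MyBlobStorage
```

Separating the two means several datasets (input files, output tables) can reuse one connection definition, and credentials are managed in one place.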
Understand pipelines
A pipeline is a logical grouping of activities that together perform a task. The activities in a pipeline define
actions to perform on your data. For example, you might use a copy activity to transform data from a
source dataset to a destination dataset. You could include activities that transform the data as it is
transferred, or you might combine data from multiple sources together. Other activities enable you to
incorporate processing elements from other services. For example, you might use an Azure Function
activity to run an Azure Function to modify and filter data, or an Azure Databricks Notebook activity to run
a notebook that performs more advanced processing.
Pipelines don't have to be linear. You can include logic activities that iterate over a set of items, running a
series of tasks for each one, using a ForEach activity, or follow different processing paths depending on
the outcome of previous processing using an If Condition activity.
Sometimes when ingesting data, the data you're bringing in can have different column names and data
types to those required by the output. In these cases, you can use a mapping to transform your data from
the input format to the output format. The screenshot below shows the mapping canvas for the Copy
Data activity. It illustrates how the columns from the input data can be mapped to the data format
required by the output.
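The effect of such a mapping can be sketched in plain Python. This toy example is not Data Factory code — the column names are invented — but it shows the transformation the mapping canvas configures: selecting input columns and renaming them to match the output schema.

```python
# Toy illustration of column mapping: select and rename input columns
# to match the schema required by the output. Names are hypothetical.
column_mapping = {"cust_id": "CustomerID", "fname": "FirstName"}

input_rows = [{"cust_id": 1, "fname": "Ana", "unused": "x"}]

output_rows = [
    {out_col: row[in_col] for in_col, out_col in column_mapping.items()}
    for row in input_rows
]
print(output_rows)  # [{'CustomerID': 1, 'FirstName': 'Ana'}]
```

Note that columns absent from the mapping (here, `unused`) simply don't reach the output, which is also how unmapped columns behave on the mapping canvas.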
You can run a pipeline manually, or you can arrange for it to be run later using a trigger. A trigger enables
you to schedule a pipeline to occur according to a planned schedule (every Saturday evening, for example),
or at repeated intervals (every few minutes or hours), or when an event occurs such as the arrival of
a file in Azure Data Lake Storage, or the deletion of a blob in Azure Blob Storage.
Azure Data Factory provides PolyBase support for loading data. For instance, Data Factory can directly
invoke PolyBase on your behalf if your data is in a PolyBase-compatible data store.
You can use the graphical SSIS tools to create solutions without writing a single line of code. You can also
program the extensive Integration Services object model to create packages programmatically and code
custom tasks and other package objects.
SSIS is an on-premises utility. However, Azure Data Factory allows you to run your existing SSIS packages
as part of a pipeline in the cloud. This allows you to get started quickly without having to rewrite your
existing transformation logic.
The SSIS Feature Pack for Azure is an extension that provides components that connect to Azure services,
transfer data between Azure and on-premises data sources, and process data stored in Azure. The
components in the feature pack support transfer to or from Azure storage, Azure Data Lake, and Azure
HDInsight. Using these components, you can perform large-scale processing of ingested data.
https://www.microsoft.com/videoplayer/embed/RE4Asf7
Knowledge check
Question 1
Which component of an Azure Data Factory can be triggered to run data ingestion tasks?
CSV File
Pipeline
Linked service
Question 2
When might you use PolyBase?
To query data from external data sources from Azure Synapse Analytics
To ingest streaming data using Azure Databricks
To orchestrate activities in Azure Data Factory
Question 3
Which of these services can be used to ingest data into Azure Synapse Analytics?
Azure Data Factory
Power BI
Azure Active Directory
Summary
In this lesson, you've learned about tools for ingesting data into an Azure database. You've seen how to
use Azure Data Factory to read, process, and store data in a data warehouse.
Learn more
●● Pipelines and activities in Azure Data Factory12
●● Quickstart: Create a data factory by using the Azure Data Factory UI13
●● Azure SQL Data Warehouse is now Azure Synapse Analytics14
●● Automated enterprise BI with Azure Synapse Analytics and Azure Data Factory15
12 https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities
13 https://docs.microsoft.com/azure/data-factory/quickstart-create-data-factory-portal
14 https://azure.microsoft.com/blog/azure-sql-data-warehouse-is-now-azure-synapse-analytics/
15 https://docs.microsoft.com/azure/architecture/reference-architectures/data/enterprise-bi-adf
Learning objectives
In this lesson, you'll:
●● Describe data processing options for performing analytics in Azure
●● Explore Azure Synapse Analytics
Spark libraries provided with Azure Synapse Analytics enable you to read data from external sources,
and also write out data in a variety of different formats if you need to save your results for further
analysis.
Azure Synapse Analytics uses a clustered architecture. Each cluster has a control node that is used as the
entry point to the system. When you run Transact-SQL statements or start Spark jobs from a notebook,
the request is sent to the control node. The control node runs a parallel processing engine that splits the
operation into a set of tasks that can be run concurrently. Each task performs part of the workload over a
subset of the source data. Each task is sent to a compute node to actually do the processing. The control
node gathers the results from the compute nodes and combines them into an overall result.
The next unit describes the components of Azure Synapse Analytics in more detail.
For further information, read What is Azure Synapse Analytics?16
16 https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is
Explore data storage and processing in Azure 245
17 https://docs.microsoft.com/azure/azure-databricks/what-is-azure-databricks
Like Map/Reduce jobs, Spark jobs are parallelized into a series of subtasks that run on the cluster.
You can write Spark jobs as part of an application, or you can use interactive notebooks. These notebooks
are the same as those that you can run from Azure Databricks. Spark includes libraries that you can use to
read and write data in a wide variety of data stores (not just HDFS). For example, you can connect to
relational databases such as Azure SQL Database, and other services such as Azure Cosmos DB.
Apache Hive provides interactive SQL-like facilities for querying, aggregating, and summarizing data. The
data can come from many different sources. Queries are converted into tasks, and parallelized. Each task
can run on a separate node in the HDInsight cluster, and the results are combined before being returned
to the user.
Apache Kafka is a clustered streaming service that can ingest data in real time. It's a highly scalable
solution that offers publish and subscribe features.
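The publish-and-subscribe model Kafka offers can be sketched with a few lines of Python. This is a toy illustration, not Kafka client code: real Kafka adds partitioned, durable topic logs, consumer groups, and offset tracking on top of this basic idea.

```python
from collections import defaultdict

# Toy publish/subscribe: subscribers register a handler for a topic, and
# every message published to that topic is delivered to each handler.
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    for handler in subscribers[topic]:
        handler(message)

received = []
subscribe("temperature", received.append)   # a consumer of one topic
publish("temperature", {"room": 101, "celsius": 21.5})
publish("humidity", {"room": 101, "percent": 40})  # no subscriber; dropped here
print(received)  # [{'room': 101, 'celsius': 21.5}]
```

The decoupling shown here — producers never know who consumes a topic — is what lets a streaming service like Kafka fan one ingested event stream out to many independent processing pipelines.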
Apache Storm is a scalable, fault tolerant platform for running real-time data processing applications.
Storm can process high volumes of streaming data using comparatively modest computational requirements.
Storm is designed for reliability, so that events shouldn't be lost. Storm solutions can also provide
guaranteed processing of data, with the ability to replay data that wasn't successfully processed the first
time. Storm can interoperate with a variety of event sources, including Azure Event Hubs, Azure IoT Hub,
Apache Kafka, and RabbitMQ (a message queuing service). Storm can also write to data stores such as
HDFS, Hive, HBase, Redis, and SQL databases. You write a Storm application using the APIs provided by
Apache.
18 https://docs.microsoft.com/azure/hdinsight/hdinsight-overview
To extract insights, the company wants to process the joined data by using a Spark cluster in the cloud
(using Azure HDInsight), and publish the transformed data into a cloud data warehouse such as Azure
Synapse Analytics. The company can use the information in the data warehouse to generate and publish
reports. They want to automate this workflow, and monitor and manage it on a daily schedule. They also
want to execute it when files land in a blob store container.
Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can
ingest data from the disparate data stores used by the gaming company. You can build complex ETL
processes that transform data visually with data flows or by using compute services such as Azure
HDInsight, Azure Databricks, and Azure SQL Database. You can then publish the transformed data to
Azure Synapse Analytics for business intelligence applications to consume.
A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a
pipeline perform a task. For example, a pipeline might contain a series of activities that ingests raw data
from Azure Blob storage, and then runs a Hive query on an HDInsight cluster to partition the data and
store the results in a Cosmos DB database.
Linux, you can use POSIX-style permissions to grant read, write, and search access based on file ownership
and group membership of users.
Services such as Azure Data Factory, Azure Databricks, Azure HDInsight, Azure Data Lake Analytics, and
Azure Stream Analytics can read and write Data Lake Store directly.
@maxPrices =
SELECT Ticker, MAX(Price) AS MaxPrice
FROM @priceData
GROUP BY Ticker;
OUTPUT @maxPrices
TO "/output/MaxPrices.csv"
USING Outputters.Csv(outputHeader: true);
It's important to understand that the U-SQL code only provides a description of the work to be performed.
Azure Data Lake Analytics determines how best to actually carry out this work. Data Lake Analytics
takes the U-SQL description of a job, parses it to make sure it is syntactically correct, and then compiles
it into an internal representation. Data Lake Analytics then breaks down this internal representation
into stages of execution. Each stage performs a task, such as extracting the data from a specified source,
dividing the data into partitions, processing the data in each partition, aggregating the results in a
partition, and then combining the results from across all partitions. Partitioning is used to improve
parallelization, and the processing for different partitions is performed concurrently on different processing
nodes. The data for each partition is determined by the U-SQL compiler, according to the way in
which the job retrieves and processes the data.
A U-SQL job can output results to a single CSV file, partition the results across multiple files, or can write
to other destinations. For example, Data Lake Analytics enables you to create custom outputters if you
want to save data in a particular format (such as XML or HTML). You can also write data to the Data Lake
Catalog. The catalog provides a SQL-like interface to Data Lake Storage, enabling you to create tables,
and views, and run INSERT, UPDATE, and DELETE statements against these tables and views.
NOTE: Any data stored in Azure Synapse Analytics can be used to build and train models with Azure
Machine Learning.
The following sections describe each of these elements in more detail.
the pool. However, you can't run any queries until the pool is resumed. Resuming a pool can take several
minutes.
19 https://azure.microsoft.com/services/machine-learning/
For more information, read Pipelines and activities in Azure Data Factory20.
20 https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities
Business analysts, data engineers, and data scientists can now use Synapse Spark pools or Synapse SQL
pools to run near real-time business intelligence, analytics, and machine learning pipelines. You can
achieve this without impacting the performance of your transactional workloads on Azure Cosmos DB.
Synapse Link has a wide range of uses, including:
●● Supply chain analytics and forecasting. You can query operational data directly and use it to build
machine learning models. You can feed the results generated by these models back into Cosmos DB
for near-real-time scoring. You can use these assessments to successively refine the models and
generate more accurate forecasts.
●● Operational reporting. You can use Synapse Analytics to query operational data using Transact-SQL
running in a SQL pool. You can publish the results to dashboards using the support provided for
familiar tools such as Microsoft Power BI.
●● Batch data integration and orchestration. With supply chains getting more complex, supply chain data
platforms need to integrate with a variety of data sources and formats. The Azure Synapse data
integration engine allows data engineers to create rich data pipelines without requiring a separate
orchestration engine.
●● Real-time personalization. You can build engaging ecommerce solutions that allow retailers to generate
personalized recommendations and special offers for customers in real time.
●● IoT maintenance. Industrial IoT innovations have drastically reduced downtimes of machinery and
increased overall efficiency across all fields of industry. One such innovation is predictive maintenance
analytics for machinery at the edge of the cloud. The historical operational data from IoT device
sensors could be used to train predictive models such as anomaly detectors. These anomaly detectors
are then deployed back to the edge for real-time monitoring. Looping back allows for continuous
retraining of the predictive models.
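To make the predictive-maintenance idea concrete, here is a minimal sketch of the kind of anomaly detector described above, using a simple z-score threshold over sensor readings. This is an illustration only, not the Synapse or Azure Machine Learning API; the function name and sample data are made up for the example.

```python
from statistics import mean, stdev

def find_anomalies(readings, threshold=2.0):
    """Return readings whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # all readings identical; nothing is anomalous
    return [r for r in readings if abs(r - mu) / sigma > threshold]

# Mostly steady temperature readings with one spike.
sensor_temps = [70.1, 70.3, 69.8, 70.0, 70.2, 95.0, 70.1, 69.9]
print(find_anomalies(sensor_temps))  # → [95.0]
```

In the pattern the bullet describes, a model trained on historical readings like these would be deployed back to the edge, and new readings would feed continuous retraining.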
You can access Synapse Studio directly from the Azure portal.
Knowledge check
Question 1
You have a large amount of data held in files in Azure Data Lake storage. You want to retrieve the data in
these files and use it to populate tables held in Azure Synapse Analytics. Which processing option is most
appropriate?
Use Azure Synapse Link to connect to Azure Data Lake storage and download the data
Synapse SQL pool
Synapse Spark pool
Question 2
Which of the components of Azure Synapse Analytics allows you to train AI models using AzureML?
Synapse Studio
Synapse Pipelines
Synapse Spark
Question 3
In Azure Databricks how do you change the language a cell uses?
The first line in the cell is %language. For example, %scala.
Change the notebook language before writing the commands
Wrap the command in the cell with ##language##.
Summary
In this lesson, you learned about:
●● Data processing options for performing analytics in Azure
●● Azure Synapse Analytics
Learn more
●● What is Azure Databricks?21
●● What is Azure Synapse Analytics?22
●● What is Azure HDInsight?23
●● Azure Machine Learning (AzureML)24
●● Pipelines and activities in Azure Data Factory25
●● Tutorial: Extract, transform, and load data by using Azure Databricks26
●● Azure Synapse Link for Azure Cosmos DB: Near real-time analytics use cases27
21 https://docs.microsoft.com/azure/azure-databricks/what-is-azure-databricks
22 https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is
23 https://docs.microsoft.com/azure/hdinsight/hdinsight-overview
24 https://azure.microsoft.com/services/machine-learning/
25 https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities
26 https://docs.microsoft.com/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
27 https://docs.microsoft.com/azure/cosmos-db/synapse-link-use-cases
Power BI can be simple and fast, capable of creating quick insights from an Excel workbook or a local
database. But Power BI is also robust and enterprise-grade, ready not only for extensive modeling and
real-time analytics, but also for custom development. Therefore, it can be your personal report and
visualization tool, but can also serve as the analytics and decision engine behind group projects, divisions,
or entire corporations.
If you're a beginner with Power BI, this lesson will get you going. If you're a Power BI veteran, this lesson
will tie concepts together and fill in the gaps.
These three elements—Desktop, the service, and Mobile apps—are designed to let people create, share,
and consume business insights in the way that serves them, or their role, most effectively.
Use Power BI
Now that we've introduced the basics of Microsoft Power BI, let's jump into some hands-on experiences
and a guided tour.
The activities and analyses that you'll learn with Power BI generally follow a common flow. The common
flow of activity looks like this:
1. Bring data into Power BI Desktop, and create a report.
2. Publish to the Power BI service, where you can create new visualizations or build dashboards.
3. Share dashboards with others, especially people who are on the go.
4. View and interact with shared dashboards and reports in Power BI Mobile apps.
28 https://go.microsoft.com/fwlink/?linkid=2101313
29 https://docs.microsoft.com/power-bi/consumer/end-user-sign-in
Get started building with Power BI 261
As mentioned earlier, you might spend all your time in the Power BI service, viewing visuals and reports
that have been created by others. And that's fine. Someone else on your team might spend their time in
Power BI Desktop, which is fine too. To help you understand the full continuum of Power BI and what it
can do, we'll show you all of it. Then you can decide how to use it to your best advantage.
So, let's jump in and step through the experience. Your first order of business is to learn the basic
building blocks of Power BI, which will provide a solid basis for turning data into cool reports and visuals.
Visualizations
A visualization (sometimes also referred to as a visual) is a visual representation of data, like a chart, a
color-coded map, or other interesting things you can create to represent your data visually. Power BI has
all sorts of visualization types, and more are coming all the time. The following image shows a collection
of different visualizations that were created in the Power BI service.
Visualizations can be simple, like a single number that represents something significant, or they can be
visually complex, like a gradient-colored map that shows voter sentiment about a certain social issue or
concern. The goal of a visual is to present data in a way that provides context and insights, both of which
would probably be difficult to discern from a raw table of numbers or text.
Datasets
A dataset is a collection of data that Power BI uses to create its visualizations.
You can have a simple dataset that's based on a single table from a Microsoft Excel workbook, similar to
what's shown in the following image.
Datasets can also be a combination of many different sources, which you can filter and combine to
provide a unique collection of data (a dataset) for use in Power BI.
For example, you can create a dataset from three database fields, one website table, an Excel table, and
online results of an email marketing campaign. That unique combination is still considered a single
dataset, even though it was pulled together from many different sources.
Filtering data before bringing it into Power BI lets you focus on the data that matters to you. For example,
you can filter your contact database so that only customers who received emails from the marketing
campaign are included in the dataset. You can then create visuals based on that subset (the filtered
collection) of customers who were included in the campaign. Filtering helps you focus your data—and
your efforts.
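The filter-then-combine idea can be illustrated with plain Python. The contact records and campaign list below are made up for the example; in Power BI the same filtering happens through the query editor rather than code.

```python
# Hypothetical contact records from a contact database.
contacts = [
    {"email": "ana@example.com",   "region": "West"},
    {"email": "ben@example.com",   "region": "East"},
    {"email": "carla@example.com", "region": "West"},
]

# Hypothetical set of addresses that received the marketing campaign.
campaign_recipients = {"ana@example.com", "carla@example.com"}

# Filter: keep only contacts who were part of the campaign.
dataset = [c for c in contacts if c["email"] in campaign_recipients]

print([c["email"] for c in dataset])  # → ['ana@example.com', 'carla@example.com']
```

Visuals built on `dataset` would then reflect only the customers included in the campaign, which is exactly the focusing effect described above.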
An important and enabling part of Power BI is the multitude of data connectors that are included.
Whether the data you want is in Excel or a Microsoft SQL Server database, in Azure or Oracle, or in a
service like Facebook, Salesforce, or MailChimp, Power BI has built-in data connectors that let you easily
connect to that data, filter it if necessary, and bring it into your dataset.
After you have a dataset, you can begin creating visualizations that show different portions of it in
different ways, and gain insights based on what you see. That's where reports come in.
Reports
In Power BI, a report is a collection of visualizations that appear together on one or more pages. Just like
any other report you might create for a sales presentation or write for a school assignment, a report in
Power BI is a collection of items that are related to each other. The following image shows a report in
Power BI Desktop—in this case, it's the second page in a five-page report. You can also create reports in
the Power BI service.
Reports let you create many visualizations, on multiple pages if necessary, and let you arrange those
visualizations in whatever way best tells your story.
You might have a report about quarterly sales, product growth in a particular segment, or migration
patterns of polar bears. Whatever your subject, reports let you gather and organize your visualizations
onto one page (or more).
Dashboards
When you're ready to share a single page from a report, or a collection of visualizations, you create a
dashboard. Much like the dashboard in a car, a Power BI dashboard is a collection of visuals from a
single page that you can share with others. Often, it's a selected group of visuals that provide quick
insight into the data or story you're trying to present.
A dashboard must fit on a single page, often called a canvas (the canvas is the blank backdrop in Power
BI Desktop or the service, where you put visualizations). Think of it like the canvas that an artist or painter
uses—a workspace where you create, combine, and rework interesting and compelling visuals.
You can share dashboards with other users or groups, who can then interact with your dashboards when
they're in the Power BI service or on their mobile device.
Tiles
In Power BI, a tile is a single visualization on a report or a dashboard. It's the rectangular box that holds
an individual visual. In the following image, you see one tile, which is also surrounded by other tiles.
When you're creating a report or a dashboard in Power BI, you can move or arrange tiles however you
want. You can make them bigger, change their height or width, and snuggle them up to other tiles.
When you're viewing, or consuming, a dashboard or report—which means you're not the creator or
owner, but the report or dashboard has been shared with you—you can interact with it, but you can't
change the size of the tiles or their arrangement.
Whether your data insights require straightforward or complex datasets, Power BI helps you get started
quickly and can expand with your needs to be as complex as your world of data requires. And because
Power BI is a Microsoft product, you can count on it being robust, extensible, Microsoft Office–friendly,
and enterprise-ready.
Now let's see how this works. We'll start by taking a quick look at the Power BI service.
The canvas (the area in the center of the Power BI service) shows you the available sources of data in the
Power BI service. In addition to common data sources like Microsoft Excel files, databases, or Microsoft
Azure data, Power BI can just as easily connect to a whole assortment of software services (also called
SaaS providers or cloud services): Salesforce, Facebook, Google Analytics, and more.
For these software services, the Power BI service provides a collection of ready-made visuals that are
pre-arranged on dashboards and reports for your organization. This collection of visuals is called an app.
Apps get you up and running quickly, with data and dashboards that your organization has created for
you. For example, when you use the GitHub app, Power BI connects to your GitHub account (after you
provide your credentials) and then populates a predefined collection of visuals and dashboards in Power
BI.
There are apps for all sorts of online services. The following image shows a page of apps that are
available for different online services, in alphabetical order. This page is shown when you select the Get button
in the Services box (shown in the previous image). As you can see from the following image, there are
many apps to choose from.
For our purposes, we'll choose GitHub. GitHub is an application for online source control. When you
select the Get it now button in the box for the GitHub app, the Connect to GitHub dialog box appears.
Note that GitHub does not support Internet Explorer, so make sure you are working in another browser.
After you enter the information and credentials for the GitHub app, installation of the app begins.
After the data is loaded, the predefined GitHub app dashboard appears.
In addition to the app dashboard, the report that was generated (as part of the GitHub app) and used to
create the dashboard is available, as is the dataset (the collection of data pulled from GitHub) that was
created during data import and used to create the GitHub report.
On the dashboard, you can select any of the visuals and interact with them. As you do so, all the other
visuals on the page will respond. For example, when the May 2018 bar is selected in the Pull Requests
(by month) visual, the other visuals on the page adjust to reflect that selection.
The Datasets tab is selected on the Settings page that appears. In the right pane, select the arrow next
to Scheduled refresh to expand that section. The Settings dialog box appears on the canvas, letting you
set the update settings that meet your needs.
That's enough for our quick look at the Power BI service. There are many more things you can do with the
service, and there are many types of data you can connect to, and all sorts of apps, with more of both
coming all the time.
Knowledge check
Question 1
What is the common flow of activity in Power BI?
Create a report in Power BI mobile, share it to the Power BI Desktop, view and interact in the Power BI
service.
Create a report in the Power BI service, share it to Power BI mobile, interact with it in Power BI Desktop.
Bring data into Power BI Desktop and create a report, share it to the Power BI service, view and
interact with reports and dashboards in the service and Power BI mobile.
Bring data into Power BI mobile, create a report, then share it to Power BI Desktop.
Question 2
Which of the following are building blocks of Power BI?
Tiles, dashboards, databases, mobile devices.
Visualizations, datasets, reports, dashboards, tiles.
Visual Studio, C#, and JSON files.
Question 3
A collection of ready-made visuals, pre-arranged in dashboards and reports is called what in Power BI?
The canvas.
Scheduled refresh.
An app.
Summary
Let's do a quick review of what we covered in this lesson.
Microsoft Power BI is a collection of software services, apps, and connectors that work together to turn
your data into interactive insights. You can use data from single basic sources, like a Microsoft Excel work-
book, or pull in data from multiple databases and cloud sources to create complex datasets and reports.
Power BI can be as straightforward as you want or as enterprise-ready as your complex global business
requires.
Power BI consists of three main elements—Power BI Desktop, the Power BI service, and Power BI
Mobile—which work together to let you create, interact with, share, and consume your data the way you
want.
Answers
Question 1
When should you use Azure Synapse Analytics?
■■ To perform very complex queries and aggregations
To create dashboards from tabular data
To enable a large number of users to query analytics data
Explanation
That's correct. Azure Synapse Analytics is suitable for performing compute-intensive tasks such as these.
Question 2
What is the purpose of data ingestion?
To perform complex data transformations over data received from external sources
■■ To capture data flowing into a data warehouse system as quickly as possible
To visualize the results of data analysis
Explanation
That's correct. Data ingestion can receive data from multiple sources, including streams, and must run
quickly enough so that it doesn't lose any incoming data.
Question 3
What is the primary difference between a data lake and a data warehouse?
A data lake contains structured information, but a data warehouse holds raw business data
■■ A data lake holds raw data, but a data warehouse holds structured information
Data stored in a data lake is dynamic, but information stored in a data warehouse is static
Explanation
That's correct. A data warehousing solution converts the raw data in a data lake into meaningful business
information in a data warehouse.
Question 1
Which component of an Azure Data Factory can be triggered to run data ingestion tasks?
CSV File
■■ Pipeline
Linked service
Explanation
That's correct. Pipelines can be triggered to run activities for ingesting data.
Question 2
When might you use PolyBase?
■■ To query data from external data sources from Azure Synapse Analytics
To ingest streaming data using Azure Databricks
To orchestrate activities in Azure Data Factory
Explanation
That's correct. This is the purpose of PolyBase.
Question 3
Which of these services can be used to ingest data into Azure Synapse Analytics?
■■ Azure Data Factory
Power BI
Azure Active Directory
Explanation
That's correct. Azure Data Factory can be used to ingest data into Azure Synapse Analytics from almost any
source.
Question 1
You have a large amount of data held in files in Azure Data Lake storage. You want to retrieve the data in
these files and use it to populate tables held in Azure Synapse Analytics. Which processing option is most
appropriate?
Use Azure Synapse Link to connect to Azure Data Lake storage and download the data
■■ Synapse SQL pool
Synapse Spark pool
Explanation
That's correct. You can use PolyBase from a SQL pool to connect to the files in Azure Data Lake as external
tables, and then ingest the data.
Question 2
Which of the components of Azure Synapse Analytics allows you to train AI models using AzureML?
Synapse Studio
Synapse Pipelines
■■ Synapse Spark
Explanation
That's correct. You would use a notebook to ingest and shape data, and then use SparkML and AzureML to
train models with it.
Question 3
In Azure Databricks how do you change the language a cell uses?
■■ The first line in the cell is %language. For example, %scala.
Change the notebook language before writing the commands
Wrap the command in the cell with ##language##.
Explanation
That's correct. Each cell can start with a language definition.
Question 1
What is the common flow of activity in Power BI?
Create a report in Power BI mobile, share it to the Power BI Desktop, view and interact in the Power BI
service.
Create a report in the Power BI service, share it to Power BI mobile, interact with it in Power BI Desktop.
■■ Bring data into Power BI Desktop and create a report, share it to the Power BI service, view and
interact with reports and dashboards in the service and Power BI mobile.
Bring data into Power BI mobile, create a report, then share it to Power BI Desktop.
Explanation
That's correct. The Power BI service lets you view and interact with reports and dashboards, but doesn't let
you shape data.
Question 2
Which of the following are building blocks of Power BI?
Tiles, dashboards, databases, mobile devices.
■■ Visualizations, datasets, reports, dashboards, tiles.
Visual Studio, C#, and JSON files.
Explanation
That's correct. Building blocks for Power BI are visualizations, datasets, reports, dashboards, tiles.
Question 3
A collection of ready-made visuals, pre-arranged in dashboards and reports is called what in Power BI?
The canvas.
Scheduled refresh.
■■ An app.
Explanation
That's correct. An app is a collection of ready-made visuals, pre-arranged in dashboards and reports. You
can get apps that connect to many online services from the AppSource.