Course Presentation: DP-900 Azure Data Fundamentals

This document provides an overview of the Microsoft Azure Data Fundamentals (DP-900) certification exam. It explains that the exam expects candidates to understand 40+ Azure services and how to choose the appropriate data format and storage for different situations. The goal of this training course is to help candidates pass the certification exam and start their cloud journey with Azure by teaching them to make the right data decisions. It recommends an active learning approach of taking notes and reviewing materials to help remember the various Azure services over time.


DP-900

Azure Data Fundamentals


https://links.in28minutes.com/dp-900

1
Getting Started

Azure has 200+ services. Exam expects you to understand 40+ services.
Exam tests your decision making abilities:
Which data format will you use in which situation?
Which Azure data store will you use in which situation?
This course is designed to help you make these choices
Our Goal : Help you get certified and start your cloud journey with Azure

2
How do you put your best foot forward?
Challenging certification - Expects
you to understand and REMEMBER a
number of services
As time passes, humans forget things.
How do you improve your chances of
remembering things?
Active learning - think and take notes
Review the presentation every once in a
while

3
Our Approach
Three-pronged approach to reinforce
concepts:
Presentations (Video)
Demos (Video)
Two kinds of quizzes:
Text quizzes
Video quizzes
(Recommended) Take your time. Do
not hesitate to replay videos!
(Recommended) Have Fun!

4
Getting Started - Azure

5
Before the Cloud - Example 1 - Online Shopping App

Challenge:
Peak usage during holidays and weekends
Less load during rest of the time
Solution (before the Cloud):
Procure (Buy) infrastructure for peak load
What would the infrastructure be doing during periods of low loads?

6
Before the Cloud - Example 2 - Startup

Challenge:
It suddenly becomes popular.
How to handle the sudden increase in load?
Solution (before the Cloud):
Procure (Buy) infrastructure assuming they would be successful
What if they are not successful?

7
Before the Cloud - Challenges

High cost of procuring infrastructure


Needs ahead of time planning (Can you guess the future?)
Low infrastructure utilization (PEAK LOAD provisioning)
Dedicated infrastructure maintenance team (Can a startup afford it?)

8
Silver Lining in the Cloud
How about provisioning (renting) resources when you want
them and releasing them back when you do not need them?
On-demand resource provisioning
Also called Elasticity

9
Cloud - Advantages
Trade "capital expense" for "variable expense"
Benefit from massive economies of scale
Stop guessing capacity
Stop spending money running and maintaining data centers
"Go global" in minutes

10
Microsoft Azure
One of the leading cloud service providers
Provides 200+ services
Reliable, secure and cost-effective
The entire course is all about Azure. You will learn it as we go
further.

11
Best path to learn Azure!

Cloud applications make use of multiple Azure services.


There is no single path to learn these services independently.
HOWEVER, we've worked out a simple path!

12
Setting up Azure Account
Create Azure Account

13
Regions and Zones

14
Regions and Zones

Imagine that your application is deployed in a data center in London


What would be the challenges?
Challenge 1 : Slow access for users from other parts of the world (high latency)
Challenge 2 : What if the data center crashes?
Your application goes down (low availability)

15
Multiple data centers

Let's add in one more data center in London


What would be the challenges?
Challenge 1 : Slow access for users from other parts of the world
Challenge 2 (SOLVED) : What if one data center crashes?
Your application is still available from the other data center
Challenge 3 : What if entire region of London is unavailable?
Your application goes down

16
Multiple regions

Let's add a new region : Mumbai


What would be the challenges?
Challenge 1 (PARTLY SOLVED) : Slow access for users from other parts of the world
You can solve this by adding deployments for your applications in other regions
Challenge 2 (SOLVED) : What if one data center crashes?
Your application is still live from the other data centers
Challenge 3 (SOLVED) : What if entire region of London is unavailable?
Your application is served from Mumbai

17
Regions
Imagine setting up data centers in
different regions around the world
Would that be easy?
(Solution) Azure provides 60+ regions
around the world
Expanding every year
Region : Specific geographical
location to host your resources
Advantages:
High Availability
Low Latency
Global Footprint
Adhere to government regulations

18
Availability Zones
How to achieve high availability in the same
region (or geographic location)?
Enter Availability Zones
Multiple AZs (3) in a region
One or more discrete data centers
Each AZ has independent & redundant power, networking &
connectivity
AZs in a region are connected through low-latency links
(Advantage) Increased availability and fault
tolerance within same region
Survive the failure of a complete data center
(Remember) NOT all Azure regions have
Availability Zones

19
Regions and Availability Zones examples
New Regions and AZs are constantly added

Region Availability Zones


(US) East US 3
(Europe) West Europe 3
(Asia Pacific) Southeast Asia 3
(South America) Brazil South 3
(US) West Central US 0

20
Data
Data is the "oil of the 21st Century Digital Economy"
Amount of data generated increasing exponentially
Mobile devices, IOT devices, application metrics etc
Variety of
Data formats: Structured, Semi Structured and Unstructured
Data store options: Relational databases, NoSQL databases, Analytical
databases, Object/Block/File storage ...
Store data efficiently and gain intelligence
Goal of the course: Help you choose specific data
format and Azure data store for your use case
We will start with 10,000 feet overview of cloud:
Regions, Zones and IaaS/PaaS/SaaS
After that, play with different data formats and data storage
options in Azure

21
IaaS vs PaaS vs SaaS

22
Azure Virtual Machines
In corporate data centers, data stores are deployed on
physical servers
Where do you deploy data stores in the cloud?
Rent virtual servers
Virtual Machines - Virtual servers in Azure
Azure Virtual Machines - Provision & Manage Virtual Machines

23
Problem with using VMs for Databases
You need to take care of:
OS installation & upgrades
Database installation & upgrades
Availability (create a standby database)
Durability (take regular backups)
Scaling compute & storage

24
Managed Services
Do you want to continue running databases in the cloud, the
same way you run them in your data center?
OR are there OTHER approaches?
Let's understand some terminology used with cloud services:
IaaS (Infrastructure as a Service)
PaaS (Platform as a Service)
SaaS (Software as a Service)
Let's get on a quick journey to understand these!

25
IaaS (Infrastructure as a Service)
Use only infrastructure from cloud provider
Example: Running SQL Server on a VM
Cloud Provider is responsible for:
Virtualization, Hardware and Networking
You are responsible for:
OS upgrades and patches
Database software and upgrades
Database Configuration (Tables, Indexes, Views etc)
Data
Scaling of compute & storage, Availability and Durability

26
PaaS (Platform as a Service)
Use a platform provided by cloud
Cloud provider is responsible for:
Virtualization, Hardware and Networking
OS upgrades and patches
Database software and upgrades
Scaling, Availability, Durability etc..
You are responsible for:
Database Configuration (Tables, Views, Indexes, ...)
Data
Examples: Azure SQL Database, Azure Cosmos DB and
a lot more ...
You will NOT have access to OS and Database software
(most of the times!)

27
SaaS (Software as a Service)
Centrally hosted software (mostly on the cloud)
Offered on a subscription basis (pay-as-you-go)
Examples:
Email, calendaring & office tools (such as Outlook 365, Microsoft Office 365, Gmail, Google Docs)
Customer relationship management (CRM), enterprise resource planning (ERP) and document
management tools
Cloud provider is responsible for:
OS (incl. upgrades and patches)
Application Runtime
Auto scaling, Availability & Load balancing etc..
Application code and/or
Application Configuration (How much memory? How many instances? ..)
Customer is responsible for:
Configuring the software!

28
Azure Cloud Service Categories - Scenarios
Scenario Solution
IaaS or PaaS or SaaS: Deploy a Database in Virtual Machines IaaS
IaaS or PaaS or SaaS: Using Gmail SaaS
IaaS or PaaS or SaaS: Using Azure SQL Database to create a database PaaS
True or False: Customer is responsible for OS updates when using PaaS False
True or False: Customer is responsible for Availability when using PaaS False
True or False: In PaaS, customer has access to VM instances False
True or False: In PaaS, customer can customize OS and install custom software False
True or False: In PaaS, customer can configure hardware needs (memory, cpu etc) True

29
Data Formats & Data Stores
10,000 Feet Overview

30
Data Formats & Data Stores
Data is the "oil of the 21st Century Digital Economy"
Amount of data generated increasing exponentially
Data formats:
Structured: Tables, Rows and Columns (Relational)
Semi Structured: Key-Value, Document (JSON), Graph, etc
Unstructured: Video, Audio, Image, Text files, Binary files ...
Data stores:
Relational databases
NoSQL databases
Analytical databases
Object/Block/File storage

31
Structured Data - Relational Databases
Data stored in Tables - Rows &
Columns
Predefined schema - Tables,
Relationships and Constraints

Define indexes - Query efficiently on all columns
Used for
OLTP (Online Transaction Processing) use
cases and
OLAP (Online Analytics Processing) use
cases

32
Relational Database - OLTP (Online Transaction Processing)
Applications where large number of users make large
number (millions) of transactions
Transaction - small, discrete, unit of work
Example: Transfer money from your account to your friend's account
Heavy writes and moderate reads
Quick processing expected
Use cases: Most traditional applications - banking, e-
commerce, ..
Popular databases: MySQL, Oracle, SQL Server etc
Some Azure Managed Services:
Azure SQL Database: Managed Microsoft SQL Server
Azure Database for MySQL: Managed MySQL
Azure Database for PostgreSQL: Managed PostgreSQL

33
Relational Database - OLAP (Online Analytics Processing)
Applications allowing users to analyze petabytes of data
Examples: Reporting applications, Data warehouses, Business intelligence
applications, Analytics systems
Data is consolidated from multiple (typically transactional) databases
Sample application : Decide insurance premiums analyzing data from last
hundred years
Azure Managed Service: Azure Synapse Analytics
Petabyte-scale distributed data warehouse
Unified experience for developing end-to-end analytics solutions
Data integration + Data warehouse + Data analytics
Run complex queries across petabytes of data
Earlier called Azure SQL Data Warehouse

34
Relational Databases - OLAP vs OLTP
OLAP and OLTP use similar data structures
BUT very different approach in how data is
stored
OLTP databases use row storage
Each table row is stored together
Efficient for processing small transactions
OLAP databases use columnar storage
Each table column is stored together
High compression - store petabytes of data efficiently
Distribute data - one table in multiple cluster nodes
Execute single query across multiple nodes -
Complex queries can be executed efficiently
35
Semi Structured Data
Data has some structure BUT not very strict
Semi Structured Data is stored in NoSQL databases
NoSQL = not only SQL
Flexible schema
Structure data the way your application needs it
Let the structure evolve with time
Horizontally scale to petabytes of data with millions of TPS
Managed Service: Azure Cosmos DB
Types of Semi Structured Data:
Document
Key Value
Graph
Column Family

36
Semi Structured Data - 1 - Document
Data stored as collection of documents
Typically JSON (Javascript Object Notation)
Be careful with formatting (name/value pairs, commas etc)
address - Child Object - {}
socialProfiles - Array - []
Documents are retrieved by unique id (called the key)
Typically, you can define additional indexes
Documents don't need to have the same structure
No strict schema defined on database
Apps should handle variations (application defined schema)
Typically, information in one document would be stored in
multiple tables, if you were using a relational database
Use cases: Product Catalog, Profile, Shopping Cart etc
Managed Service: Azure Cosmos DB SQL API &
MongoDB API
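
As an illustration (the field names below are invented for this sketch, not taken from the course), a profile document might look like the following: address is a child object ({}), socialProfiles is an array ([]), and id acts as the key used to retrieve the document.

{
  "id": "1001",
  "name": "Ravi",
  "address": { "city": "Hyderabad", "country": "India" },
  "socialProfiles": [
    { "network": "twitter", "handle": "ravi_example" },
    { "network": "github", "handle": "ravi-example" }
  ]
}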

37
Semi Structured Data - 2 - Key-Value

Similar to a HashMap
Key - Unique identifier to retrieve a specific value
Value - Number or a String or a complex object, like a JSON file
Supports simple lookups - query by keys
NOT optimized for query by values
Typically, no other indexes allowed
Use cases: Session Store, Caching Data
Managed Services: Azure Cosmos DB Table API, Azure Table Storage

38
Semi Structured Data - 3 - Graph

Social media applications have data with complex relationships


How do you store such data?
As a graph in Graph Databases
Used to store data with complex relationships
Contains nodes and edges (relationships)
Use cases: People and relationships, Organizational charts, Fraud Detection
Managed Service: Azure Cosmos DB Gremlin API

39
Semi Structured Data - 4 - Column Family

Data organized into rows and columns


Can appear similar to a relational database
IMPORTANT FEATURE: Columns are divided into groups called column-family
Rows can be sparse (does NOT need to have value for every column)
Use cases: IOT streams and real time analytics, financial data - transaction
histories, stock prices etc
Managed Service: Azure Cosmos DB Cassandra API

40
Unstructured Data

Data which does not have any structure (Audio files, Video files, Binary files)
What is the type of storage of your hard disk?
Block Storage (Azure Managed Service: Azure Disks)
You've created a file share to share a set of files with your colleagues in an enterprise. What
type of storage are you using?
File Storage (Azure Managed Service: Azure Files)
You want to be able to upload/download objects using a REST API without mounting them
onto your VM. What type of storage are you using?
Object Storage (Azure Managed Service: Azure Blob Storage)
41
Relational vs Non Relational Data - Quick Overview
Relational Data (Structured Data)
OLTP: SQL Server on Azure VMs, Azure SQL Database (or Azure SQL Managed
Instance), Azure Database for PostgreSQL, MariaDB, MySQL
OLAP: Azure Synapse Analytics
Non Relational Data (Semi Structured/Unstructured Data)
Semi Structured - Document (JSON)
Azure Cosmos DB SQL API and Cosmos DB MongoDB API
Semi Structured - Key-Value
Azure Cosmos DB Table API, Azure Table Storage
Semi Structured - Column-Family
Azure Cosmos DB Cassandra API
Semi Structured - Graph
Azure Cosmos DB Gremlin API
Unstructured Data
Block Storage (Azure Disks), File Storage (Azure Files), Object Storage (Azure Blob Storage)

42
Databases - Scenarios
Scenario => Solution
A start-up with a quickly evolving schema for storing documents => Azure Cosmos DB SQL API and Cosmos DB MongoDB API
Transactional local database processing thousands of transactions per second => Azure SQL Database and other relational databases
Store complex relationships between transactions to identify fraud => Azure Cosmos DB Gremlin API
Database for analytics processing of petabytes of structured data => Azure Synapse Analytics
File share between multiple VMs => Azure Files
Storing profile images uploaded by your users => Azure Blob Storage

43
Relational Databases

44
Relational Databases
Structured Data - Tables, Rows and Columns
Structured Query Language (SQL) for retrieving and managing data
Recommended when strong transactional consistency guarantees
are needed
Database schema is mandatory
Azure Managed Services:
Azure SQL Database
Azure SQL Managed Instance
Azure Database for PostgreSQL
Azure Database for MySQL
Azure Database for MariaDB

45
Azure SQL Database
Fully Managed Service for Microsoft SQL Server
99.99% availability
Built-in high availability, automatic updates and backups
Flexible and responsive serverless compute
Hyperscale (up to 100 TB) storage
Transparent data encryption(TDE) - Data is automatically
encrypted at rest
Authentication: SQL Server authentication or Active Directory
(and MFA)

46
Relational Databases - Tables and Relationships
Relational Databases are modeled using
Tables and Relationships
A Course has an Instructor
A Course belongs to a Department
Table: Table contains columns and rows
All rows in a table have same set of columns
Relationship between tables is established using
Primary Key and Foreign Key
Primary Key: Uniquely identifies a row in a table
Foreign Key: Provides a link between data in two tables
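
A minimal T-SQL sketch of the model above (table and column names are assumptions for illustration): Course references Instructor and Department through foreign keys.

CREATE TABLE Instructor (
  id INT PRIMARY KEY,                               -- primary key: uniquely identifies an instructor
  name VARCHAR(100)
);
CREATE TABLE Department (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);
CREATE TABLE Course (
  id INT PRIMARY KEY,
  title VARCHAR(200),
  instructor_id INT REFERENCES Instructor(id),      -- foreign key: a Course has an Instructor
  department_id INT REFERENCES Department(id)       -- foreign key: a Course belongs to a Department
);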

47
Structured Query Language
SQL: Language used to perform operations on relational databases
Data Definition Language (DDL): Create and modify structure of database objects
Create: Create a database or its constituent objects (Table, View, Index etc)
Drop: Delete objects (Table, View, Index) from database
Alter: Alter structure of the database
Data Query Language (DQL): Perform queries on the data
Example: SELECT * from Course, SELECT Count(*) from Course
Data Manipulation Language (DML): Insert, update or delete data
Example: insert into Course values (1, 'AZ-900', 1);
Example: Update Course Set title='AZ-900 Azure Fundamentals' where id=1
Example: Delete from Course where id=1
Data Control Language (DCL): Manage permissions and other controls
Example: Grant and revoke user access - GRANT SELECT ON course TO user1
Transaction Control Language(TCL): Control transactions within a database
Commit - commits a transaction
Rollback - rollbacks a transaction (used in case of an error)

48
Index
CREATE CLUSTERED INDEX INDEX_NAME on TABLE (COLUMN_NAME);

CREATE NONCLUSTERED INDEX INDEX_NAME on TABLE (COLUMN_NAME);

Allows efficient data retrieval from a database


Combination of one or more columns
Remember: An index is automatically created with the primary key
Remember: A table can have more than one index
Two Types of Indexes:
Clustered: Data in table is stored in the order of the index key values
Remember: Only one clustered index per table ( Why? - data rows can only be sorted in one way)
Non-clustered indexes: Index stored separately with pointers to the data rows

49
View
create view all_courses_with_students
as
select course_id, student_id, first_name, last_name, title
from Course_Student, Student, Course
where Course_Student.student_id = Student.id and Course_Student.course_id = Course.id;

View: Virtual table mapped to a query


Can be used just like a table in SQL queries
Use cases: Add calculated columns, join multiple tables, filter unnecessary
columns

50
Normalization
Goals in designing relational databases:
High Data Integrity
Minimum Data Redundancy (or Duplication)
How do we achieve these goals?
Database Normalization: "Process of restructuring a relational database to reduce
data redundancy and improve data integrity"
First Normal Form (1NF): Single(atomic) valued columns
Violation Example: A column named address
Second Normal Form (2NF): Eliminate redundant data
Third Normal Form (3NF): Move columns not directly dependent on primary key
(REMEMBER) There are other normal forms (4NF, 5NF, ...) but 3NF is considered good enough for most relational data

Advantages of Normalization
Avoid same data being duplicated in multiple tables
Reduce disk space wastage
Avoid data inconsistencies

51
Normalization example
Unnormalized - Enrollment Details

Normalized - Student

Normalized - Instructor

52
Normalization example - 2
Normalized - Course

Normalized - Course_Student
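
The example tables on these two slides are images in the original deck. As a rough, assumed sketch of the idea in T-SQL: the repeating enrollment details are split out so that each student, instructor and course is stored once, and a separate Course_Student table records which student is enrolled in which course.

-- assumes Student(id, first_name, last_name) and Course(id, title, instructor_id) tables exist
CREATE TABLE Course_Student (
  course_id  INT REFERENCES Course(id),
  student_id INT REFERENCES Student(id),
  PRIMARY KEY (course_id, student_id)               -- a student appears at most once per course
);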

53
Transactions
Transaction: Sequence of operations that need to be atomic
All operations are successful (commit) OR NONE are successful (rollback)
Example: Transfer $10 from Account A to B
Operation 1: Reduce $10 from Account A
Operation 2: Add $10 to Account B
If Operation 1 is successful and Operation 2 fails - Inconsistent state
You don't want that!

Properties: ACID (Atomicity, Consistency, Isolation, Durability)


Atomicity: Each transaction is atomic (either succeeds completely, or fails
completely)
Consistency: Database must be consistent before and after the transaction
Isolation: Multiple Transactions occur independently
Durability: Once a transaction is committed, it remains committed even if there are
system failures (a power outage, for example)
Remember: Supported in all Relational Databases
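
A minimal T-SQL sketch of the money-transfer example above (the Account table and its columns are assumptions for illustration):

BEGIN TRY
  BEGIN TRANSACTION;
  UPDATE Account SET balance = balance - 10 WHERE id = 'A';   -- Operation 1: reduce $10 from Account A
  UPDATE Account SET balance = balance + 10 WHERE id = 'B';   -- Operation 2: add $10 to Account B
  COMMIT TRANSACTION;                                          -- both operations succeed together
END TRY
BEGIN CATCH
  IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION;                                      -- on any failure, neither change is applied
END CATCH;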

54
Azure SQL Database - Purchase Models
vCore-based: Choose between provisioned or serverless compute
OPTIONAL: Hyperscale (Autoscale storage)
Higher compute, memory, I/O, and storage limits
Supports BYOL
Serverless Compute: Database is paused during inactive periods
You are only billed for storage during inactive periods
If there is any activity, database is automatically resumed
DTU-based: Bundled compute and storage packages
Balanced allocation of CPU, memory and IO
Assign DTUs (relative - Double DTU => Double resources)
Recommended when you want to keep things simple
You CANNOT scale CPU, memory and IO independently
Use DTUs for small and medium databases (< few hundred DTUs)

55
Azure SQL Database - Important Features
Feature: Description
Single database: Great fit for modern, cloud-born applications. Fully managed database with predictable performance. Hyperscale storage (up to 100 TB). Serverless compute.
Elastic pool: Cost-effective solution for multiple databases with variable usage patterns. Manage multiple databases within a fixed budget.
Database server: Database servers are used to manage groups of single databases and elastic pools. Things configured at the database server level: access management, backup management.

56
Azure SQL Database - Remember
Prerequisites to connect and query from Azure SQL
database:
1: Connection Security: Database should allow connection from
your IP address
2: User should be created in the database
3: User should have grants (permissions) to perform queries - Select, Insert etc. (see the sketch below this list)
Use BYOL to reduce license costs
Use read-only replicas (Read scale-out) for offloading
read-only query workloads
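
A hedged T-SQL sketch of prerequisites 2 and 3 (the user name, password and table are illustrative; Azure SQL Database supports contained database users with a password):

-- 2: create a user in the database
CREATE USER app_user WITH PASSWORD = 'ChangeThisStr0ngPassword!';
-- 3: grant the permissions needed to perform queries
GRANT SELECT, INSERT ON dbo.Course TO app_user;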

57
Azure SQL managed instance
Another Fully Managed Service for Microsoft SQL Server
What's New: Near 100% SQL Server feature compatibility
Recommended when migrating on-premises SQL Servers to Azure
Azure SQL managed instance features NOT in Azure SQL Database
Cross-database queries (and transactions) within a single SQL Server instance
Database Mail
Built in SQL Server Agent
Service to execute scheduled administrative tasks - jobs in SQL Server
Native virtual network support
Supports only vCore-based purchasing model
(Remember) SQL Server Analysis Services (SSAS), SQL Server
Reporting Services (SSRS), Polybase: NOT supported by both Azure
SQL Database and SQL Managed Instance

58
SQL Server in Azure - Summary
Service: Description
SQL Server on Azure Virtual Machines: Provides full administrative control over the SQL Server instance and underlying OS for migration to Azure
Azure SQL Database: Fully managed service for Microsoft SQL Server. Recommended for cloud-born applications
Azure SQL Managed Instance: Full (near 100%) SQL Server access and feature compatibility. Recommended for migrating on-premises SQL Server databases. Managed-instance-ONLY features: Cross-database queries, Database Mail support, SQL Server Agent etc.

59
Azure database for MySQL
Fully managed, scalable MySQL database
Supports 5.6, 5.7 and 8.0 community editions of MySQL
99.99% availability
Choose single zone or zone redundant high availability
Automatic updates and backups
Alternative: Azure Database for MariaDB
MariaDB: community-developed, commercially supported fork of MySQL

60
Azure Database for PostgreSQL
Fully managed, intelligent and scalable PostgreSQL
99.99% availability
Choose single zone or zone redundant high availability
Automatic updates and backups
Single Server and Hyperscale Options
Hyperscale: Scale to hundreds of nodes and execute queries across
multiple nodes

61
Relational Data - Scenarios
Scenario => Solution
You are migrating a Microsoft SQL Server database to the cloud. You want full access to the OS and the Microsoft SQL Server installation => SQL Server on VM
You are migrating a Microsoft SQL Server database to the cloud. You do NOT need full access to the OS and Microsoft SQL Server installation, but you need access to Database Mail and SQL Server Agent => Azure SQL Managed Instance
You want to create a new managed Microsoft SQL Server database in the cloud => Azure SQL Database, Azure SQL Managed Instance
Which category of SQL is this? GRANT SELECT ON course TO user1 => Data Control Language (DCL)

62
Relational Data - Scenarios - 2
Scenario => Solution
Which category of SQL is this? create table course (...) => Data Definition Language (DDL)
Your queries on a relational database are slow. What is the first thing you would consider doing? => Check if there is an index
Your colleague asked you to normalize your tables. What should your goals be? => High data integrity & minimum data redundancy (or duplication)
How can you offload read-only workloads from an Azure SQL database? => Read-only replicas (read scale-out)

63
Azure Cosmos DB

64
Relational vs Non Relational Data - Quick Overview
Relational Data (Structured Data)
OLTP: Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VMs,
Azure Database for PostgreSQL, MariaDB, MySQL
OLAP: Azure Synapse Analytics
Non Relational Data (Semi Structured/Unstructured Data)
Semi Structured - Document (JSON)
Azure Cosmos DB SQL API and Cosmos DB MongoDB API
Semi Structured - Key-Value
Azure Cosmos DB Table API, Azure Table Storage
Semi Structured - Column-Family
Azure Cosmos DB Cassandra API
Semi Structured - Graph
Azure Cosmos DB Gremlin API
Unstructured Data
Block Storage (Azure Disks), File Storage (Azure Files), Object Storage (Azure Blob Storage)

65
Azure Cosmos DB
Fully managed NoSQL database service
Global database: Automatically replicates data across multiple Azure
regions
Single-digit millisecond response times
99.999% availability
Automatic scaling (serverless) - Storage and Compute
Multi-region writes
Data distribution to any Azure region with the click of a button
Your app doesn't need to be paused or redeployed to add or remove a region
Structure: Azure Cosmos account(s) > database(s) > container(s) >
item(s)

66
Azure Cosmos DB APIs
Core(SQL): SQL based API for working with documents
MongoDB: Document with MongoDB API
Move existing MongoDB workloads
Table: Key Value
Ideal for moving existing Azure Table storage workloads
Gremlin: Graph
Store complex relationships between data
Cassandra: Column Family
REMEMBER: You need a separate Cosmos DB account for each type of
API

67
Azure Cosmos DB - What is Different?
Single-digit millisecond response times even if you
scale to petabytes of data with millions of TPS
Horizontal scalability
One thing I love about Azure Cosmos DB: Flexibility
Structure data the way your application needs it
Let the structure evolve with time
Provides a variety of consistency levels
Strong, Bounded staleness, Session, Consistent prefix, Eventual
If you are familiar with SQL but still want to use a document database, use the SQL API (sample query below)
Options for key-value, column-family and graph databases
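
For example, a Core (SQL) API query over JSON documents reads like familiar SQL (the container alias c and the properties are illustrative assumptions):

SELECT c.id, c.name
FROM c
WHERE c.address.country = 'India'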

68
Cosmos DB - Structure
Entity SQL Cassandra MongoDB Gremlin Table
Database Database Keyspace Database Database NA
Container Container Table Collection Graph Table
Item Item Row Document Node or edge Item

69
Cosmos DB - Logical and Physical Partitions
Each container is horizontally
partitioned in an Azure region
ALSO distributed to all Azure
regions associated with the
Cosmos DB account
Items in a container divided
into logical partitions based
on the partition key
Cosmos DB takes care of categorizing logical partitions into physical partitions (https://docs.microsoft.com)
Ensures high availability and
durability

70
Cosmos DB - Provisioned throughput vs Serverless
Provisioned throughput:
Description: Provision throughput in Request Units (RUs) per second
What are you billed for? RUs provisioned per hour (usage does NOT matter) + storage
When to use? Continuous, predictable traffic
Multi-region: Yes
Max storage per container: No limit
Performance: < 10 ms latency for point-reads and writes
Serverless:
Description: No need to provision capacity; auto scales to meet request load
What are you billed for? Per-hour RUs consumed + storage
When to use? Intermittent, unpredictable traffic
Multi-region: No - only in 1 Azure region
Max storage per container: 50 GB
Performance: < 10 ms latency for point-reads and < 30 ms for writes

71
Azure Cosmos DB - Scenarios
Scenario => Solution
How can you increase storage associated with Azure Cosmos DB? => Automatic scaling (serverless)
What is the high-level structure of storing data in Azure Cosmos DB? => Azure Cosmos account(s) > database(s) > container(s) > item(s)
How are items in a container divided into logical partitions? => Using the partition key
You want to store data for a social networking app with complex relationships => Gremlin API
You want a SQL-based API for working with documents => Core (SQL) API
You want to move existing MongoDB workloads to Azure => MongoDB API

72
Azure Storage

73
Relational vs Non Relational Data - Quick Overview
Relational Data (Structured Data)
OLTP: Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VMs,
Azure Database for PostgreSQL, MariaDB, MySQL
OLAP: Azure Synapse Analytics
Non Relational Data (Semi Structured/Unstructured Data)
Semi Structured - Document (JSON)
Azure Cosmos DB SQL API and Cosmos DB MongoDB API
Semi Structured - Key-Value
Azure Cosmos DB Table API, Azure Table Storage
Semi Structured - Column-Family
Azure Cosmos DB Cassandra API
Semi Structured - Graph
Azure Cosmos DB Gremlin API
Unstructured Data
Block Storage (Azure Disks), File Storage (Azure Files), Object Storage (Azure Blob Storage)

74
Azure Storage
Managed Cloud Storage Solution
Highly available, durable and massively scalable (up to a few petabytes)
Core Storage Services:
Azure Disks: Block storage (hard disks) for Azure VMs
Azure Files: File shares for cloud and on-premises
Azure Blobs: Object store for text and binary data
Azure Queues: Decouple applications using messaging
Azure Tables: NoSQL store (Very Basic)
Prefer Azure Cosmos DB for NoSQL
(PRE-REQUISITE) Storage Account is needed for Azure Files,
Azure Blobs, Azure Queues and Azure Tables

75
Azure Storage - Data Redundancy
Option: Redundancy / Discussion
Locally redundant storage (LRS): Three synchronous copies in the same data center. Least expensive and least availability.
Zone-redundant storage (ZRS): Three synchronous copies in three AZs in the primary region.
Geo-redundant storage (GRS): LRS + asynchronous copy to the secondary region (three more copies using LRS).
Geo-zone-redundant storage (GZRS): ZRS + asynchronous copy to the secondary region (three more copies using LRS). Most expensive and highest availability.

76
Block Storage
Use case: Hard-disks attached to your
computers
Typically, ONE Block Storage device can be
connected to ONE virtual server
HOWEVER, you can connect multiple different
block storage devices to one virtual server

77
Azure Disks Storage
Disk storage: Disks for Azure VMs
Types:
Standard HDD: Recommended for Backup, non-critical, infrequent access
Standard SSD: Recommended for Web servers, lightly used enterprise applications and
dev/test environments
Premium SSD disks: Recommended for production and performance sensitive workloads
Ultra disks (SSD): Recommended for IO-intensive workloads such as SAP HANA, top tier
databases (for example, SQL, Oracle), and other transaction-heavy workloads
Premium and Ultra provide very high availability
Managed vs Unmanaged Disks:
Managed Disks are easy to use:
Azure handles storage
High fault tolerance and availability
Unmanaged Disks are old and tricky (Avoid them if you can)
You need to manage storage and storage account
Disks stored in Containers (NOT Docker containers. Completely unrelated.)

78
Azure Files
Media workflows need huge shared
storage for things like video editing
Enterprise users need a quick way to
share files in a secure & organized way
Azure Files:
Managed File Shares
Connect from multiple devices concurrently:
From cloud or on-premises
From different OS: Windows, Linux, and macOS
Supports Server Message Block (SMB) and
Network File System (NFS) protocols
Usecase: Shared files between multiple VMs
(example: configuration files)

79
Azure Blob Storage
Azure Blob Storage: Object storage in Azure
Structure: Storage Account > Container(s) > Blob(s)
Store massive volumes of unstructured data
Store all file types - text, binary, backup & archives:
Media files and archives, Application packages and logs
Backups of your databases or storage devices
Three Types of Blobs
Block Blobs: Store text or binary files (videos, archives etc)
Append Blobs: Store log files (Ideal for append operations)
Page Blobs: Foundation for Azure IaaS Disks (512-byte pages up to 8 TB)
Azure Data Lake Storage Gen2: Azure Blob Storage Enhanced
Designed for enterprise big data analytics (exabytes, hierarchical)
Low-cost, tiered storage, with high availability/disaster recovery

80
Azure Blob Storage - Access Tiers
Different kinds of data can be stored in Blob Storage
Media files, website static content
Backups of your databases or storage devices
Long term archives
Huge variations in access patterns
Can I pay a cheaper price for objects I access less frequently?
Access tiers
Hot: Store frequently accessed data
Cool: Infrequently accessed data stored for min. 30 days
Archive: Rarely accessed data stored for min. 180 days
Lowest storage cost BUT Highest access cost
Access latency: In hours
To access: Rehydrate (Change access tier to hot or cool) OR
Copy to another blob with access tier hot or cool
You can change access tiers of an object at any point in time

81
Azure Storage - Remember
Azure Queues: Decouple applications using messaging
Azure Tables: NoSQL store (Very Basic)
A key/value store
Store and retrieve values by key
Supports simple query, insert, and delete operations
Cosmos DB Table API is recommended as key/value store for newer
usecases (supports multi-master in multiple regions)
Azure Tables only supports read replicas in other regions
GRS or GZRS: Data in secondary region is generally NOT available for read or write access
Available for read or write only in case of failover to the secondary region
To enable round the clock read access:
Use read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS)

82
Azure Storage - Scenarios
Scenario => Solution
What is needed before storing data to Azure Files, Azure Blobs, Azure Queues and Azure Tables? => Storage Account
You have a Storage Account and you are making use of Azure Blob Storage. You want to create a new file share. Is it mandatory to create a new Storage Account? => No
You want the highest availability for data in your Storage Account => Geo-zone-redundant storage (GZRS)
Which service supports Server Message Block (SMB) and Network File System (NFS) protocols? => Azure Files
You are not planning to access your data in Azure Blob storage for a few years. You can wait for a few hours when you need to access the data. How can you reduce your costs? => Move data to the Archive tier

83
Data Analytics

84
Data Analytics
Goal: Convert raw data to intelligence
Uncover trends and discover meaningful information
Find new opportunities and identify weaknesses
Increase efficiency and improve customer satisfaction
Make appropriate business decisions
Raw data can be from different sources:
Customer purchases, bank transactions, stock prices, weather
data, monitoring devices etc
Approach: Ingest => Process => Store (data warehouse
or a data lake) => Analyze
Ex: Decide future sales using past customer behavior
Ex: Faster diagnosis & treatment using patient history

85
Data Analytics Work Flow
Data Ingestion: Capture raw data
From various sources (stream or batch)
Example: Weather data, sales records, user actions - websites ..
Data Processing: Process data
Raw data is not suitable for querying
Clean (remove duplicates), filter (remove anomalies) and/or aggregate data
Transform data to required format (Transformation)
Data Storage: Store to data warehouse or data lake
Data Querying: Run queries to analyze data
Data Visualization: Create visualizations to make it
easier to understand data and make better decisions
Create dashboards, charts and reports (capture trends)
Help business spot trends, outliers, and hidden patterns in data

86
Data Analysis Categories
Descriptive analytics: What’s happening?
Based on historical/current data
Monitor status (of KPIs) and generate alerts
Example: Generating reports (current vs planned)
Diagnostic analytics: Why is something happening?
Take findings from descriptive analytics and dig deeper
Example: Why did sales increase last month?
Example: Why are sales low in Netherlands?
Predictive analytics: What will happen?
Predict probability based on historical data
Mitigate risk and identify opportunities
Example: What will be the future demand?
Example: Calculate probability of something happening in future

87
Data Analysis Categories - 2
Prescriptive analytics: What actions should we take?
Use insights from predictive analytics and make data-driven
informed decisions
Still in early stages
Example: What can I do to increase probability of this course
being successful in future?
Cognitive analytics: Make analytic tools to think like
humans
Combine traditional analytics techniques with AI and ML
features
Examples: Speech to text (transcription or subtitles), text to
speech, Video Analysis, Image Analysis, Semantic Analysis of
Text (Analyze reviews)

88
Big Data - Terminology and Evolution
3Vs of Big Data
Volume: Terabytes to Petabytes to Exabytes
Variety: Structured, Semi structured, Unstructured
Velocity: Batch, Streaming ..
Terminology: Data warehouse vs Data lake
Data warehouse: PBs of Storage + Compute (Typically)
Data stored in a format ready for specific analysis! (processed data)
Examples: Teradata, BigQuery(GCP), Redshift(AWS), Azure Synapse Analytics
Typically uses specialized hardware
Data lake: Typically retains all raw data (compressed)
Typically object storage is used as data lake
Amazon S3, Google Cloud Storage, Azure Data Lake Storage Gen2 etc..
Flexibility while saving cost
Perform ad-hoc analysis on demand
Analytics & intelligence services (even data warehouses) can directly read from data lake
Azure Synapse Analytics, BigQuery(GCP) etc..

89
Data warehouse Best Practice - De-normalized Star Schema

How do you structure data for quick analysis in a data warehouse?


Option: Star Schema
Modeling approach most widely used by relational data warehouses
Each Table classified as "Dimension" or "Fact":
Fact tables: Quantitative data - Data generated in a transactional system (typically)
Contains observations or events (sales orders, stock balances, temperatures ...)
Dimension tables: Contain descriptive attributes related to fact data
Example: Product, Customer, Store, Calendar
Advantage: Star schemas are de-normalized and are easier to query
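
A hedged T-SQL sketch of a star schema (all table and column names are assumptions, not from the course): a single fact table referencing a few de-normalized dimension tables.

CREATE TABLE DimProduct  (product_key  INT PRIMARY KEY, name VARCHAR(100), category VARCHAR(50));
CREATE TABLE DimCustomer (customer_key INT PRIMARY KEY, name VARCHAR(100), city VARCHAR(50));
CREATE TABLE DimDate     (date_key     INT PRIMARY KEY, [date] DATE, [month] INT, [year] INT);
-- Fact table: quantitative data, one row per observation or event (for example, a sales order line)
CREATE TABLE FactSales (
  product_key  INT REFERENCES DimProduct(product_key),
  customer_key INT REFERENCES DimCustomer(customer_key),
  date_key     INT REFERENCES DimDate(date_key),
  quantity     INT,
  sales_amount DECIMAL(18,2)
);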

90
Data Analytics: 3 Azure Specific Services
Azure Synapse Analytics: End-to-end analytics solutions
Data integration + Enterprise data warehouse + Data analytics
Create SQL and Spark pools to analyze data
Azure Data Factory: Fully managed serverless service to build complex data
pipelines
Extract-transform-load (ETL), extract-load-transform (ELT) and data integration
Power BI: Create visualization around data
Unify data and create BI reports & dashboards

91
Big Data - Hadoop, Spark and Databricks
Hadoop based approaches:
Apache Hadoop: Create datasets with variety of data. Get intelligence.
Runs on commodity servers with attached storage (Large clusters - thousands of nodes)
Hadoop Distributed File System (HDFS): Primary data storage
MapReduce: Write Java, Python, .. apps to process data
Enables massive parallelization
HIVE: Query using SQL
Apache Spark: How about processing in-memory?
Really fast: Can be up to 100 times faster than MapReduce (if you make sufficient memory available)
Supports Java, Python, R, SQL and Scala programming languages
Run data analytics, data processing and machine learning workloads
Has become very popular and is offered as a separate service in most cloud platforms!
Databricks: Web-based platform for working with Spark
Centralized platform for machine learning, streaming analytics and business intelligence workloads
Founded by the creators of Apache Spark
Automated cluster management

92
Hadoop and Spark in Azure
Azure HDInsight: Managed Apache Hadoop Azure service
Process big data with Hadoop, Spark
Azure Databricks: Managed Apache Spark service
Premium Spark offering
Focused only on running Apache Spark workloads
Can consume data from Azure SQL Database, Event hubs, Cosmos DB
Other Azure Spark Integrations:
Azure Synapse Analytics: Can run Spark jobs using "Apache Spark for Azure
Synapse"
Azure Data Factory: Run pipelines involving Azure services like Azure HDInsight,
Azure Databricks

93
Massive Parallel Processing (MPP)
Split processing across multiple compute nodes
Typically separate storage and compute
Use Data lake as storage (for example)
Scale compute on demand
Examples: Spark, Azure Synapse Analytics
Some services run Spark in serverless mode!

94
Batch Pipelines

(https://docs.microsoft.com)
Batch Processing: Buffering and processing data in groups
Define condition - how often to run? (every 6 hours or after 10K records)
Advantages: Process huge volumes of data during off-peak hours (overnight, for example)
Typically takes longer to run (minutes to hours to days)
Example: Read from storage (Azure Data Lake Store), process, and write to Relational
Database or NoSQL Database or Data warehouse

95
Streaming Pipelines

(https://docs.microsoft.com)
Streaming Processing: Real-time data processing
Processing data as it arrives (in seconds or milliseconds)
Examples: Stock Market Data, Telemetry from IOT Devices, User action metrics from websites

96
Stream vs Batch Processing
Batch:
Time period: Process data in batches - all data from a few hours to a few days to a few months
Data size: Process large datasets efficiently
Latency: High - typically a few hours
Use case: Performing complex storage or analysis
Streaming:
Time period: Process the most recent data (last 30 seconds, for example)
Data size: Process individual records or micro-batches containing a few records
Latency: Low - typically a few seconds or milliseconds
Use case: Storing individual records, simple aggregation or rolling average calculations

97
Apache Parquet
Open source columnar storage format
High compression because of columnar
storage
Efficient storage for big data workloads
Introduced by the Apache Hadoop ecosystem
Supported by most big data platforms:
Azure Data Factory supports Parquet for both read and
write (Source and Sink)
Azure Data Lake Storage / Azure Blob Storage - Store
data in Parquet format
Azure Synapse Analytics can be used to store tabular
representation of data in Parquet format
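
For instance, a Synapse serverless SQL pool can query Parquet files in a data lake directly with OPENROWSET; a minimal sketch, assuming a hypothetical storage account and path:

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/data/sales/*.parquet',   -- hypothetical location
    FORMAT = 'PARQUET'
) AS sales;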

98
ETL

ETL (Extract, Transform, and Load): Retrieve data, process and store it
Data can be from multiple sources
Recommended for simple processing:
Basic data cleaning tasks, de-duplicating data, formatting data
Example: Ensure data privacy and compliance
Removing sensitive data before it reaches analytical data models
Can run each of the phases in parallel
While extract is going on, you can transform data which is already loaded

99
ELT

ELT (Extract, Load, and Transform): Data is stored before it is transformed


Uses an iterative approach (multiple steps) to process data in target system
Needs a powerful target datastore:
Target datastore should be able to perform transformations
Advantage: Does NOT use a separate transformation engine
Typical target data stores: Hadoop cluster (using Hive or Spark), Azure
Synapse Analytics
Enables use of massively parallel processing (MPP) capabilities of data stores

100
Azure Synapse Analytics

(https://docs.microsoft.com)
Develop end-to-end analytics solutions
Data integration + Enterprise data warehouse + Data analytics
SQL technologies + Spark technologies + Pipelines
Full integration with Power BI, Cosmos DB, and Azure ML

101
Azure Synapse Analytics - Workflow
In a workspace, create pipelines for:
Data Ingestion:
Ingest data from 90+ data sources (Cosmos DB,AWS, GCP..)
Stream data into SQL tables
Data Storage: Datasets - Azure Storage, Azure Data
Lake Storage (https://docs.microsoft.com)
Formats: Parquet, CSV, JSON ..
Data Processing: Mix & match SQL and Spark
SQL pool: SQL Database supporting distributed T-SQL queries
Two consumption models: dedicated and serverless
Recommended for complex reporting & data ingestion using Polybase
SQL Pool can be paused to reduce compute costs
Apache Spark pools: Run Spark based workloads
1: Create Spark data analysis notebooks OR
2: Run batch Spark jobs (jar files)
Recommended for data preparation and ML

102
Azure Data Factory
Fully managed serverless service to build complex data pipelines:
Extract-transform-load (ETL), extract-load-transform (ELT) and data integration
90 built-in connectors
Ingest data from:
Big Data sources like Amazon Redshift, Google BigQuery
Enterprise data warehouses like Oracle Exadata, Teradata
All Azure data services
Build data flows to transform data
Integrate with services like Azure HDInsight, Azure Databricks, Azure Synapse Analytics for data
processing
Move SQL Server Integration Services (SSIS) packages to cloud
CI/CD support with Azure Devops

103
Demo - Azure Data Factory and Synapse Analytics
Create a Data Lake Storage Account Gen2
Create a SQL Server Database
Task: Extract data from SQL Server to CSV file

104
Azure Data Lake Storage (Gen2)
Blob storage + Hierarchical directory structure
Configure permissions(RBAC) at file and directory level
Fully compatible with Hadoop Distributed File System (HDFS)
Apache Hadoop workloads can directly access data in Azure Data Lake
Storage
Three main elements:
Data Lake Store: Azure Data Factory, Azure Databricks, Azure HDInsight,
Azure Data Lake Analytics, and Azure Stream Analytics can read directly
Data Lake Analytics: Run analytics jobs using U-SQL
HDInsight: Run Hadoop jobs

105
Azure Data Factory - Components
Pipeline: Logical group of activities that can be scheduled
You can chain activities in a pipeline
You can run activities sequentially or in parallel
A pipeline can execute other pipelines
Activity: Represents a step in a pipeline (an action to be performed)
Copy Activity: Copy data from one store to another store
Example: Copy CSV from Blob Storage to a Table in SQL Database
Three types of activities: Data movement, Data transformation, Control activities
Data Flow: Create and manage data transformation logic
Build reusable library of data transformation routines
Executes logic on a Spark cluster:
You don't need to manage the cluster (it is spun up and down automatically as needed)
Control flow: Orchestrate pipeline activity based on output of
another pipeline activity

106
Azure Data Factory - Components - 2
Linked Service: Used to connect to an external source
Connect to different sources like Azure Storage Blob, SQL Databases etc
Dataset: Representation of data structures within data stores
Integration Runtime: Compute infrastructure used by Azure Data Factory to run its activities (for example, data movement and data flow execution)
Triggers: Trigger a pipeline at specific times

107
Power BI
Power BI: Unify data and create BI reports & dashboards
Integrates with all Azure analytics services
Azure Synapse Analytics to Azure Data Lake Storage
Power BI Components
Power BI Service: Online SaaS (Software as a Service) service
Power BI online - app.powerbi.com
Create/share reports and dashboards
Power BI Desktop: Windows desktop application to create and share reports
More data sources, Complex modeling and transformations
Power BI Report Builder: Standalone tool to author paginated reports
Power BI Mobile Apps: Apps for Windows, iOS, and Android devices
Typical Power BI Workflow:
1: Create a report with Power BI Service/Desktop (or paginated report with Power BI Report Builder)
2: Share it to the Power BI service
3: View and interact with report (and create dashboards) using Power BI service
Reports can also be accessed from Power BI mobile

108
Power BI Dashboard
Workspace: Container for dashboards, reports, workbooks & datasets
Dataset: Collection of data
Can be a file(Excel, CSV etc) or a database
Azure SQL Database, Azure Synapse Analytics, Azure HDInsight, ..
Each dataset can be used in multiple reports
Report: One or more pages of visualizations
Highly interactive and highly customizable
All data for a report comes from a single dataset
A report can be used in multiple dashboards
Paginated Reports: Create pixel perfect multi page reports for printing & archiving
(PDF/Word)
Create in "Power BI Report Builder" and publish to use in Power BI service
Dashboard: Single page - visualizations from one or more reports
Technically a canvas with multiple tiles
Monitor the most important information at one glance and dig deeper, if needed
You can select a tile and go to a report page to dig deeper

109
Visualization Options
Bar and column charts: Most basic of charts
Line Charts: Emphasize shape of a series of values over time
Pie Charts: Displays division of total into different categories
Matrix: Summarize data in a tabular structure
Treemap: Charts of colored rectangles
Scatter: Shows relationship between two numerical values
Bubble chart: Replace data points with bubbles
Bubble size represents a 3rd dimension
Filled map: Show on a Map
Cards: Single number
Link: Reference for all visualization options in Power BI

110
Data Analytics Work Flow - Data Ingestion
Data Ingestion: Capture raw data
Azure Data Factory: data ingestion and transformation service
Ingest streaming and batch data
Data from on-premises and cloud
PolyBase: Run T-SQL queries on external data sources
PolyBase makes external data sources appear like tables
SQL Server Integration Services (SSIS): on-premises data integration and data transformation solution that is part of Microsoft SQL Server
Run existing SSIS packages as part of Azure Data Factory pipeline
Spark: Ingest streaming data
IOT Hub: Managed message hub for IoT devices
Event Hub: Big data streaming platform and event ingestion
service

111
Data Analytics Work Flow - Data Processing and Storage
Data Processing and Storage:
Azure Data Lake Storage Gen2: Data lake storage
Azure Synapse Analytics: Data processing can be done using:
1: T-SQL - Query using SQL from databases, files, and Azure Data Lake
storage
2: Spark - Write and run Spark jobs using C#, Scala, Python, SQL etc
Azure Databricks: Process data from Azure Blob storage, Azure
Data Lake Store, Hadoop storage, flat files, databases, and data
warehouses
Handle streaming data
Azure HDInsight: Storage - Azure Data Lake storage
Analyze data using Hadoop Map/Reduce, Apache Spark, Apache Hive (SQL)
Azure Data Factory: Build pipelines and data-driven workflows
Ingest data from relational and non-relational systems

112
Data Analytics Work Flow - Querying and Visualization
Data Querying: Run queries to analyze data
Recommended Services: Azure Synapse Analytics, Hive (SQL)
Data Visualization: Create dashboards, charts and
reports
Recommended Services: Power BI

113
Data Analytics - Scenarios
Scenario => Solution
Decide Data Analysis Category: You have generated a report showing current status vs planned => Descriptive analytics
Decide Data Analysis Category: Why did sales increase last month? => Diagnostic analytics
Decide Data Analysis Category: Semantic analysis of text (analyze reviews) => Cognitive analytics
You want to move your Hadoop workloads to the cloud => Azure HDInsight
Stream or Batch: Processing after you received 10K records => Batch
You want to move SQL Server Integration Services (SSIS) packages to the cloud => Use Azure Data Factory
Categorize activity: Copy data from one store to another store => Data movement

114
Data Analytics - Scenarios - 2
Scenario => Solution
Categorize activity: Orchestrate pipeline activity based on output of another pipeline activity => Control flow
You want to connect to an external source from Data Factory => Linked Service
Compute infrastructure used by Azure Data Factory => Integration Runtime
Standalone tool to author paginated reports => Power BI Report Builder
What is the online SaaS Power BI called? => Power BI Service
Which of these represents "One or more pages of visualizations in Power BI"? => Report
Which of these represents "Single page visualizations from one or more reports"? => Dashboard

115
Other Important Azure
Concepts

116
Database Tools
Tool: Description
Azure Data Studio: Cross-platform (Windows, Mac, Linux) database tool with IntelliSense, code snippets and source control. Run SQL queries and save results in different formats (text, JSON, Excel). Supports SQL Server, Azure SQL Database, Azure Synapse Analytics. Notebooks: create and share documents with text, images and SQL query results. Supports creating and restoring backups of a SQL database.
SQL Server Management Studio (SSMS): Graphical tool for managing SQL Server and Azure databases. Query, design, and manage your databases and data warehouses. Supports configuration, management and administration tasks. Suitable for SQL Server, SQL Database, Azure Synapse Analytics.
SQL Server Data Tools (SSDT): Build SQL Server and Azure SQL relational databases, Analysis Services (AS) data models, Integration Services (IS) packages, and Reporting Services (RS) reports.
sqlcmd: Run SQL scripts and procedures from the command line. Supports SQL Server, Azure SQL Database, Azure SQL MI, Azure Synapse Analytics.

117
Roles
Role: Description
Database Administrator: Install, upgrade and control data servers (authorization, availability, durability, performance optimization, backups, disaster recovery, compliance with licensing). Tools: Azure Data Studio, SQL Server Management Studio (SSMS).
Data Engineer: Responsible for data architecture, data acquisition, data ingestion, data processing (transformation, cleansing and pipelines) and data storage (design, build and test) for analytical workloads. Responsible for building, testing, monitoring and performance optimization of data pipelines. Responsible for improving data reliability, efficiency and quality. Tools: Azure Data Studio, Azure HDInsight, Azure Databricks, Azure Data Factory, Azure Cosmos DB, Azure Storage ... Programming languages: HiveQL, R, or Python.
Data Analyst: Responsible for getting intelligence from data through integration of data (from multiple sources), dashboards, reports, visualizations (charts, graphs, ..) and pattern identification (from huge volumes of data). Tools: Microsoft Excel, Power BI...

118
Pricing calculator
Estimate the costs for Azure services
Example Services that you can estimate costs for:
Virtual Machines
Storage Accounts
Azure SQL Database
Azure Cosmos DB
...
Ideal place to explore and learn important factors about
different Azure services

119
Azure Resource Manager


(https://docs.microsoft.com/)
Deployment and management service for Azure
All actions to any resource in Azure go through ARM
Irrespective of where you are performing it from
Azure portal OR Powershell OR CLI or ARM template or ...

120
Azure Resource Manager (ARM) templates
Let's consider an example:
I would want to create an Azure SQL Database
I would want to create an Azure Data Lake Storage Gen2 account
I would want to create an Azure Data Factory
AND I would want to create 4 environments:
Dev, QA, Stage and Production!
Azure Resource Manager (ARM) templates can help you do all of
this with a simple (actually NOT so simple) script!
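As a minimal sketch of how that could look in practice (the resource-group names, the template file name azuredeploy.json and the environment parameter are illustrative placeholders, not part of this course), the same ARM template could be deployed once per environment with the Azure CLI:

# Dev environment (placeholder names; assumes the template defines an "environment" parameter)
az group create --name rg-dataplatform-dev --location eastus
az deployment group create --resource-group rg-dataplatform-dev --template-file azuredeploy.json --parameters environment=dev

# QA environment - same template, different resource group and parameter value
az group create --name rg-dataplatform-qa --location eastus
az deployment group create --resource-group rg-dataplatform-qa --template-file azuredeploy.json --parameters environment=qa

Repeating the same template for Stage and Production keeps all four environments consistent, which is the main reason to prefer templates over clicking through the portal.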

121
Azure Portal, PowerShell, CLI, Cloud Shell
Tool Details
Azure Portal Web-based user interface. Great to get started BUT NO automation possible.

Runs in all modern desktop and tablet browsers


Azure PowerShell Execute cmdlets (commands) and combine them into scripts (PowerShell scripts)

Recommended for teams familiar with Windows administration

Cross-platform (Windows, Linux, and macOS)


Azure CLI Similar to Azure PowerShell BUT uses a different syntax (Bash Scripts)
Recommended for teams familiar with Linux administration (and Bash Scripts)

Cross-platform (Windows, Linux, and macOS)


Azure Cloud Shell Free browser-based interactive shell (accessible from the Azure Portal)
Common Azure tools pre-installed and configured to use with your account

Supports both PowerShell and CLI (bash)

Runs in all modern desktop and tablet browsers
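To make the CLI row above concrete, here is a small sketch (the resource-group name and region are placeholders) of the kind of Bash commands you would run in Azure CLI or Cloud Shell:

# Create a resource group (placeholder name and region)
az group create --name rg-demo --location eastus

# List the resource groups in the current subscription as a table
az group list --output table

The Azure PowerShell equivalent of the first command is the New-AzResourceGroup cmdlet; both tools end up calling the same Azure Resource Manager APIs.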

122
Azure Resource Hierarchy
Hierarchy: Management Group(s) > Subscription(s) > Resource Group(s) > Resources
Resources: VMs, Storage, Databases
Resource groups: Organize resources by grouping them into
Resource groups
Subscriptions: Manage costs for resources provisioned for
different teams or different projects or different business units
Management groups: Centralized management for access,
policy, and compliance across multiple subscriptions (https://docs.microsoft.com/)
Remember:
No hierarchy in resource groups BUT management groups can
have a hierarchy

123
Resource Groups
Resource Group: Logical container for resources
Associated with a single subscription
Can have multiple resources
(REMEMBER) A resource can be associated with one and only one resource group
Can have resources from multiple regions
Deleting it deletes all resources under it
Tags assigned to a resource group are not automatically applied to its resources
HOWEVER, permissions/roles assigned to a user at the resource group level
are inherited by all resources in the group
Resource Groups (like Management Groups) are free
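A short Azure CLI sketch of the points above (all names and tag values are placeholders): creating a tagged resource group and then deleting it, which removes every resource inside it:

# Create a resource group with tags (tags are NOT inherited by the resources placed in it)
az group create --name rg-data-dev --location eastus --tags env=dev team=data

# Deleting the resource group deletes all resources it contains
az group delete --name rg-data-dev --yes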

124
Subscriptions
You need a Subscription to create resources in Azure
Subscription links Azure Account to its resources
An Azure Account can have multiple subscriptions and multiple
account administrators
When do you create a new subscription?
I want to manage different access-management policies for different environments:
Create different subscriptions for different environments
Manage distinct Azure subscription policies for each environment
I want to manage costs across different departments of an organization:
Create different subscriptions for different departments
Create separate billing reports and invoices for each subscription (or department) and manage costs
I'm exceeding the limits available per subscription
Example: VMs per subscription - 25,000 per region
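If you manage multiple subscriptions, the Azure CLI lets you inspect and switch between them; a small sketch (the subscription name is a placeholder):

# List all subscriptions the signed-in account can access
az account list --output table

# Make a specific subscription the default for subsequent commands
az account set --subscription "My Department Subscription"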

125
Management Groups
Allows you to manage access, policies, and
compliance across multiple subscriptions
Group subscriptions into Management Groups
All subscriptions & resources under a Management Group inherit
all constraints applied to it
(REMEMBER) You can create a hierarchy of
management groups
(REMEMBER) All subscriptions in a management group
should be associated with the same Azure AD tenant (https://docs.microsoft.com/)
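A sketch of how a management group might be created and a subscription moved into it with the Azure CLI (the group name, display name and subscription id are placeholders):

# Create a management group
az account management-group create --name mg-data-platform --display-name "Data Platform"

# Add an existing subscription to the management group
az account management-group subscription add --name mg-data-platform --subscription "00000000-0000-0000-0000-000000000000"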

126
Quick Review

127
Azure Storage - Quick Review
Service Description
Azure Disk storage Store disks attached to VMs.
Azure Blob storage Store unstructured data - video files, database archives etc.
Azure File storage Create file shares or file servers in the cloud
Azure Queue storage Decouple applications using a queue (asynchronous communication)
Azure Table storage Store structured data using a NoSQL approach (NON-relational). Schemaless. Key/attribute store.
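As a quick illustration of the storage services above (the account, resource-group and container names are placeholders), an Azure Blob storage account and a container could be created like this with the Azure CLI:

# Create a general-purpose v2 storage account (Blob, File, Queue and Table services)
az storage account create --name mydatastorage001 --resource-group rg-demo --location eastus --sku Standard_LRS --kind StorageV2

# Create a blob container inside it
az storage container create --account-name mydatastorage001 --name raw-data --auth-mode login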

128
Azure Databases - Quick Review
Service Description
SQL Server on Azure Virtual Machines Provides full administrative control over the SQL Server instance and underlying OS for migration to Azure
Azure SQL Database Fully managed service for Microsoft SQL Server.
  Recommended for cloud-born applications
Azure SQL Managed Instance Full (near 100%) SQL Server access and feature compatibility
  Recommended for migrating on-premises SQL Server databases
  Azure SQL Managed Instance ONLY features: cross-database queries, Database Mail support, SQL Server Agent, etc.
Azure Database for MySQL Fully managed MySQL database
Azure Database for PostgreSQL Fully managed PostgreSQL database
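A short Azure CLI sketch of provisioning an Azure SQL Database (the server name, admin credentials and database name are placeholders):

# Create a logical SQL server
az sql server create --name my-sql-server-001 --resource-group rg-demo --location eastus --admin-user sqladminuser --admin-password "ChangeMe-Str0ng!"

# Create a database on that server
az sql db create --resource-group rg-demo --server my-sql-server-001 --name salesdb --service-objective S0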

129
Azure Databases - Quick Review - 2
Service Description
Azure Cosmos DB NoSQL database. Globally distributed.

Core (SQL), MongoDB, Table, Gremlin and Cassandra APIs


Azure Cache for Redis Managed service for Redis (high-throughput, low-latency data caching)
Azure Database Migration Service Migrate databases to the cloud

130
Azure Analytics Services - Quick Review
Service Description
Azure Data Lake Storage Data lake built on Azure Blob Storage
Azure Databricks Managed Apache Spark
Azure HDInsight Managed Apache Hadoop
Azure Synapse Analytics End-to-end analytics solutions

Data integration + Enterprise data warehouse + Data analytics


Azure Data Factory Data Integration. Fully managed serverless service to build complex data pipelines.
Power BI Create visualization around data - reports & dashboards
Event Hubs Receive telemetry from millions of devices

131
Get Ready

132
Certification Exam
Certification Home Page
https://docs.microsoft.com/en-gb/learn/certifications/exams/dp-900
Different Types of Multiple Choice Questions
Type 1 : Single Answer - 2/3/4 options and 1 right answer
Type 2 : Multiple Answer - 5 options and 2 right answers
No penalty for wrong answers
Feel free to guess if you do not know the answer
40-60 questions and 65 minutes
Result immediately shown after exam completion
Email with detailed scores (a couple of days later)

133
Certification Exam - My Recommendations
Read the entire question
Identify the key parts of the question
Read all answers at least once
If you do NOT know the answer, eliminate wrong answers first
Mark questions for future consideration and review them before
final submission

134
You are all set!

135
Let's clap for you!
You have a lot of patience! Congratulations
You have put your best foot forward to get Microsoft Certification -
DP-900: Microsoft Azure Data Fundamentals
Make sure you prepare well and Good Luck!

136
Do Not Forget!
Recommend the course to your friends!
Do not forget to review!
Your Success = My Success
Share your success story with me on LinkedIn (Ranga Karanam)
Share your success story and lessons learnt in Q&A with other learners!

137
What next?
https://github.com/in28minutes/roadmaps

Simplified Roadmaps to Learn:


AWS
Azure
Google Cloud
DevOps
Full Stack Development

138
