Module 3

Uploaded by

Neha Gupta
Copyright
© All Rights Reserved

Module 3: Microsoft Azure Data Fundamentals: Explore non-relational data in Azure

Explore Azure Blob Storage
Azure Blob Storage is a service that enables you to store massive amounts of unstructured data as binary large
objects, or blobs, in the cloud. Blobs are an efficient way to store data files in a format that is optimized for
cloud-based storage, and applications can read and write them by using the Azure blob storage API.

Azure Blob Storage supports three different types of blob:

• Block blobs. A block blob is handled as a set of blocks. Each block can vary in size, up to 4000
MiB. A block blob can contain up to 50,000 blocks, giving a maximum size of about 190.7 TiB
(4000 MiB × 50,000 blocks). The block is the smallest amount of data that can be read or written as an
individual unit. Block blobs are best used to store discrete, large, binary objects that change
infrequently.
• Page blobs. A page blob is organized as a collection of fixed-size 512-byte pages. A page blob
is optimized to support random read and write operations; you can fetch and store data for a
single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to
implement virtual disk storage for virtual machines.
• Append blobs. An append blob is a block blob optimized to support append operations. You
can only add blocks to the end of an append blob; updating or deleting existing blocks isn't
supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just
over 195 GB.
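
The size limits quoted for block blobs and append blobs follow from simple arithmetic: the maximum block size multiplied by the maximum number of blocks. The sketch below is only an illustration, assuming that append blobs share the 50,000-block limit (the notes state this figure only for block blobs) and that sizes are expressed in binary units (MiB, GiB, TiB).

```python
# Maximum blob size = maximum block size x maximum number of blocks.
# Assumption: append blobs use the same 50,000-block limit as block blobs.
MAX_BLOCKS = 50_000


def max_blob_size_mib(max_block_mib: int, max_blocks: int = MAX_BLOCKS) -> int:
    """Return the largest possible blob size in MiB for a given block size limit."""
    return max_block_mib * max_blocks


block_blob_tib = max_blob_size_mib(4000) / 1024 / 1024  # MiB -> GiB -> TiB
append_blob_gib = max_blob_size_mib(4) / 1024           # MiB -> GiB

print(f"Block blob:  {block_blob_tib:.1f} TiB")
print(f"Append blob: {append_blob_gib:.1f} GiB")
```

Under these assumptions the block blob limit works out to roughly 190.7 TiB and the append blob limit to roughly 195.3 GiB, which is where "just over 195 GB" comes from.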

Blob storage provides three access tiers, which help to balance access latency and storage cost:

• The Hot tier is the default. You use this tier for blobs that are accessed frequently. The blob data
is stored on high-performance media.
• The Cool tier has lower performance and incurs reduced storage charges compared to the Hot
tier. Use the Cool tier for data that is accessed infrequently. It's common for newly created blobs
to be accessed frequently initially, but less so as time passes. In these situations, you can create
the blob in the Hot tier, but migrate it to the Cool tier later. You can migrate a blob from the
Cool tier back to the Hot tier.
• The Archive tier provides the lowest storage cost, but with increased latency. The Archive tier is
intended for historical data that mustn't be lost, but is required only rarely. Blobs in the Archive
tier are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is
a few milliseconds, but for the Archive tier, it can take hours for the data to become available. To
retrieve a blob from the Archive tier, you must change the access tier to Hot or Cool. The blob
will then be rehydrated. You can read the blob only when the rehydration process is complete.
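
The tier rules can be pictured as a small state machine. The following is a toy in-memory model, not the Azure SDK: the class `ToyBlob` and the method `finish_rehydration` are hypothetical names used only to show that an archived blob can't be read until its tier has been changed to Hot or Cool and rehydration has completed.

```python
from dataclasses import dataclass

ONLINE_TIERS = {"Hot", "Cool"}
ALL_TIERS = ONLINE_TIERS | {"Archive"}


@dataclass
class ToyBlob:
    """In-memory stand-in for a blob (hypothetical; not an Azure SDK type)."""
    name: str
    data: bytes
    tier: str = "Hot"          # Hot is the default tier
    rehydrating: bool = False

    def set_tier(self, new_tier: str) -> None:
        if new_tier not in ALL_TIERS:
            raise ValueError(f"unknown tier: {new_tier}")
        # Moving out of Archive starts rehydration; the data is not readable yet.
        self.rehydrating = self.tier == "Archive" and new_tier in ONLINE_TIERS
        self.tier = new_tier

    def finish_rehydration(self) -> None:
        """Hypothetical hook; in the real service this step can take hours."""
        self.rehydrating = False

    def read(self) -> bytes:
        if self.tier == "Archive" or self.rehydrating:
            raise RuntimeError("blob is offline; rehydrate to Hot or Cool first")
        return self.data


blob = ToyBlob("report-2019.csv", b"id,total\n1,42\n")
blob.set_tier("Cool")      # accessed less often as time passes
blob.set_tier("Archive")   # rarely needed; effectively offline
blob.set_tier("Hot")       # request rehydration
try:
    blob.read()
except RuntimeError as err:
    print("not yet:", err)
blob.finish_rehydration()
print(blob.read())
```

The model deliberately leaves out cost and latency; it only captures which transitions make the data readable.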

Explore Azure Data Lake Storage Gen2
Azure Data Lake Store (Gen1) is a separate service for hierarchical data storage for analytical data lakes, often
used by so-called big data analytical solutions that work with structured, semi-structured, and unstructured
data stored in files. Azure Data Lake Storage Gen2 is a newer version of this service that is integrated into
Azure Storage.

Explore Azure Files

Azure Files supports two common network file sharing protocols:

• Server Message Block (SMB) file sharing is commonly used across multiple operating systems
(Windows, Linux, macOS).
• Network File System (NFS) shares are used by some Linux and macOS versions. To create an NFS
share, you must use a premium tier storage account and create and configure a virtual network
through which access to the share can be controlled.

Explore Azure Tables


Azure Table Storage is a NoSQL storage solution that makes use of tables containing key/value data items.
Each item is represented by a row that contains columns for the data fields that need to be stored.
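
As the knowledge check in these notes states, a row is identified by its partition key and row key, and rows within a partition are kept in row key order. The sketch below models that with plain Python dictionaries. It is an illustration only: the helper names (`upsert`, `get`, `scan_partition`) and the sample data are hypothetical and are not part of any Azure library.

```python
from collections import defaultdict

# partition key -> {row key -> columns for that row}
table: dict[str, dict[str, dict]] = defaultdict(dict)


def upsert(partition_key: str, row_key: str, **columns) -> None:
    """Insert or replace the row identified by (partition key, row key)."""
    table[partition_key][row_key] = columns


def get(partition_key: str, row_key: str) -> dict:
    """Point lookup using both parts of the key."""
    return table[partition_key][row_key]


def scan_partition(partition_key: str) -> list[tuple[str, dict]]:
    """Return the rows of one partition in row key order."""
    return sorted(table[partition_key].items(), key=lambda item: item[0])


# Rows in the same table do not need to have the same columns.
upsert("customers-uk", "0002", name="Bala", email="bala@example.com")
upsert("customers-uk", "0001", name="Asha")
upsert("customers-us", "0001", name="Chen", phone="555-0100")

print(get("customers-us", "0001"))
print([row_key for row_key, _ in scan_partition("customers-uk")])
```

The same row key ("0001") appears in two partitions without conflict, because only the combination of partition key and row key has to be unique.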

Check your knowledge


1. What are the elements of an Azure Table storage key?

Table name and column name

Partition key and row key

That's correct. The partition key identifies the partition in which a row is located, and the rows in each
partition are stored in row key order.

Row number

2. What should you do to an existing Azure Storage account in order to support a data lake for Azure
Synapse Analytics?

Add an Azure Files share

Create Azure Storage tables for the data you want to analyze

Upgrade the account to enable hierarchical namespace and create a blob container

That's correct. Enabling a hierarchical namespace adds support for Azure Data Lake Storage Gen2,
which can be used by Synapse Analytics.

3. Why might you use Azure File storage?

To share files that are stored on-premises with users located at other sites.

To enable users at different sites to share files.

That's correct. You can create a file share in Azure File storage, upload files to this file share, and grant
access to the file share to remote users.

To store large binary data files containing images or other unstructured data.

Explore fundamentals of Azure Cosmos DB

Describe Azure Cosmos DB


Azure Cosmos DB supports multiple application programming interfaces (APIs) that enable developers to use
the programming semantics of many common kinds of data store to work with data in a Cosmos DB database.
The internal data structure is abstracted, enabling developers to use Cosmos DB to store and query data using
APIs with which they're already familiar.

When to use Cosmos DB

Cosmos DB is a highly scalable database management system. Cosmos DB automatically allocates space in a
container for your partitions, and each partition can grow up to 10 GB in size. Indexes are created and
maintained automatically.

Cosmos DB is well suited to scenarios such as:

• IoT and telematics
• Retail and marketing
• Gaming
• Web and mobile applications

Identify Azure Cosmos DB APIs


Azure Cosmos DB is Microsoft's fully managed and serverless distributed database for applications of any size or
scale, with support for both relational and non-relational workloads. Developers can build and migrate
applications fast using their preferred open source database engines, including PostgreSQL, MongoDB, and
Apache Cassandra.

Azure Cosmos DB for NoSQL

Azure Cosmos DB for NoSQL is Microsoft’s native non-relational service for working with the
document data model. It manages data in JSON document format, and despite being a NoSQL data
storage solution, uses SQL syntax to work with the data.
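
To make the idea of "SQL syntax over JSON documents" concrete, the sketch below pairs a SQL-style query string of the kind the API for NoSQL accepts with an in-memory Python equivalent. The sample documents, the `QUERY` text, and the function `run_locally` are illustrative assumptions. Nothing here connects to Cosmos DB; the local filter is only a stand-in for what the service would evaluate.

```python
import json

# Hypothetical product documents, as they might be stored in a container.
documents = [
    {"id": "1", "category": "bikes", "name": "Road-200", "price": 1200},
    {"id": "2", "category": "helmets", "name": "Sport-100", "price": 35},
    {"id": "3", "category": "bikes", "name": "Mountain-300", "price": 950},
]

# SQL-style query text; "c" is an alias for the items in the container.
QUERY = "SELECT c.id, c.name FROM c WHERE c.category = 'bikes'"


def run_locally(docs: list[dict]) -> list[dict]:
    """In-memory equivalent of QUERY, for illustration only."""
    return [
        {"id": doc["id"], "name": doc["name"]}
        for doc in docs
        if doc["category"] == "bikes"
    ]


print(QUERY)
print(json.dumps(run_locally(documents), indent=2))
```

The result is itself a list of JSON objects, each containing only the projected `id` and `name` fields.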

Azure Cosmos DB for MongoDB

MongoDB is a popular open source database in which data is stored in Binary JSON (BSON) format.
Azure Cosmos DB for MongoDB enables developers to use MongoDB client libraries and code to
work with data in Azure Cosmos DB.

Azure Cosmos DB for PostgreSQL

Azure Cosmos DB for PostgreSQL is a native PostgreSQL, globally distributed relational database that
automatically shards data to help you build highly scalable apps. You can start building apps on a
single node server group, the same way you would with PostgreSQL anywhere else.

Azure Cosmos DB for Table

Azure Cosmos DB for Table is used to work with data in key-value tables, similar to Azure Table
Storage. It offers greater scalability and performance than Azure Table Storage.

Azure Cosmos DB for Apache Cassandra

Azure Cosmos DB for Apache Cassandra is compatible with Apache Cassandra, which is a popular
open source database that uses a column-family storage structure. Column families are tables, similar
to those in a relational database, with the exception that it's not mandatory for every row to have the
same columns.

Azure Cosmos DB for Apache Gremlin

Azure Cosmos DB for Apache Gremlin is used with data in a graph structure, in which entities are
defined as vertices that form nodes in a connected graph. Nodes are connected by edges that
represent relationships.
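
A graph of vertices and edges can be sketched with ordinary Python collections. The example below is a minimal in-memory illustration, not the Gremlin API: the sample vertices, edge labels, and the helpers `out_neighbors` and `in_neighbors` are all hypothetical.

```python
# Vertices: id -> properties. Edges: (source id, label, target id).
vertices = {
    "alice": {"label": "person", "name": "Alice"},
    "bob": {"label": "person", "name": "Bob"},
    "sales": {"label": "department", "name": "Sales"},
}
edges = [
    ("alice", "manages", "bob"),
    ("alice", "works_in", "sales"),
    ("bob", "works_in", "sales"),
]


def out_neighbors(vertex_id: str, edge_label: str) -> list[str]:
    """Follow outgoing edges with the given label from one vertex."""
    return [dst for src, label, dst in edges
            if src == vertex_id and label == edge_label]


def in_neighbors(vertex_id: str, edge_label: str) -> list[str]:
    """Follow incoming edges with the given label back to their sources."""
    return [src for src, label, dst in edges
            if dst == vertex_id and label == edge_label]


# Roughly what a Gremlin traversal such as g.V('alice').out('manages') expresses.
print(out_neighbors("alice", "manages"))   # ['bob']
print(in_neighbors("sales", "works_in"))   # ['alice', 'bob']
```

Queries in a graph model are expressed as traversals like these: start at a vertex, then follow edges of a given kind.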

Check your knowledge


1. Which API should you use to store and query JSON documents in Azure Cosmos DB?

Azure Cosmos DB for NoSQL


That's correct. The API for NoSQL is designed to store and query JSON documents.

Azure Cosmos DB for Apache Cassandra

Azure Cosmos DB for Table

2. Which Azure Cosmos DB API should you use to work with data in which entities and their
relationships to one another are represented in a graph using vertices and edges?

Azure Cosmos DB for MongoDB

Azure Cosmos DB for NoSQL

Azure Cosmos DB for Apache Gremlin

That's correct. The API for Gremlin is used to manage a network of nodes (vertices) and the
relationships between them (edges).

3. How can you enable globally distributed users to work with their own local replica of a Cosmos DB
database?

Create an Azure Cosmos DB account in each region where you have users.

Use the API for Table to copy data to Azure Table Storage in each region where you have users.

Enable multi-region writes and add the regions where you have users.

That's correct. You can enable multi-region writes in the regions where you want users to work with
the data.
