Module 3
Block blobs. A block blob is handled as a set of blocks. Each block can vary in size, up to 4000
MiB. A block blob can contain up to 50,000 blocks, giving a maximum size of approximately
190.7 TiB (4000 MiB × 50,000 blocks). The block is the smallest amount of data that can be read
or written as an individual unit. Block blobs are best used to store discrete, large, binary objects
that change infrequently.
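The maximum block blob size follows from simple arithmetic on the two documented limits (block size and block count); the sketch below only illustrates that calculation:

```python
# Maximum block blob size: up to 50,000 blocks of up to 4000 MiB each.
MAX_BLOCK_MIB = 4000
MAX_BLOCKS = 50_000

total_mib = MAX_BLOCK_MIB * MAX_BLOCKS    # 200,000,000 MiB
total_tib = total_mib / (1024 * 1024)     # convert MiB -> TiB

print(f"{total_tib:.1f} TiB")             # approximately 190.7 TiB
```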
Page blobs. A page blob is organized as a collection of fixed-size 512-byte pages. A page blob
is optimized to support random read and write operations; you can fetch and store data for a
single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to
implement virtual disk storage for virtual machines.
Append blobs. An append blob is a block blob optimized to support append operations. You
can only add blocks to the end of an append blob; updating or deleting existing blocks isn't
supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just
over 195 GB.
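The append blob limit follows the same kind of arithmetic on the documented 4 MiB block size and 50,000-block cap; again, this is only an illustration of the quoted figure:

```python
# Maximum append blob size: up to 50,000 blocks of up to 4 MiB each.
MAX_BLOCK_MIB = 4
MAX_BLOCKS = 50_000

total_gib = MAX_BLOCK_MIB * MAX_BLOCKS / 1024   # convert MiB -> GiB

print(f"{total_gib:.1f} GiB")                   # just over 195 GiB
```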
Blob storage provides three access tiers, which help to balance access latency and storage cost:
The Hot tier is the default. You use this tier for blobs that are accessed frequently. The blob data
is stored on high-performance media.
The Cool tier has lower performance and incurs reduced storage charges compared to the Hot
tier. Use the Cool tier for data that is accessed infrequently. It's common for newly created blobs
to be accessed frequently initially, but less so as time passes. In these situations, you can create
the blob in the Hot tier, but migrate it to the Cool tier later. You can migrate a blob from the
Cool tier back to the Hot tier.
The Archive tier provides the lowest storage cost, but with increased latency. The Archive tier is
intended for historical data that mustn't be lost, but is required only rarely. Blobs in the Archive
tier are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is
a few milliseconds, but for the Archive tier, it can take hours for the data to become available. To
retrieve a blob from the Archive tier, you must change the access tier to Hot or Cool. The blob
will then be rehydrated. You can read the blob only when the rehydration process is complete.
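To sketch how an application might pick a tier from access patterns, the hypothetical helper below maps days-since-last-access to a tier name. The thresholds are illustrative assumptions, not service defaults:

```python
def suggest_access_tier(days_since_last_access: int) -> str:
    """Suggest a blob access tier from access recency.

    Thresholds here are illustrative only: frequently accessed data
    stays Hot, infrequently accessed data moves to Cool, and rarely
    needed historical data goes to Archive (offline until rehydrated).
    """
    if days_since_last_access <= 30:
        return "Hot"
    if days_since_last_access <= 180:
        return "Cool"
    return "Archive"

print(suggest_access_tier(7))     # Hot
print(suggest_access_tier(90))    # Cool
print(suggest_access_tier(365))   # Archive
```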
Explore Azure Data Lake Storage Gen2
Azure Data Lake Store (Gen1) is a separate service for hierarchical data storage for analytical data lakes, often
used by so-called big data analytical solutions that work with structured, semi-structured, and unstructured
data stored in files. Azure Data Lake Storage Gen2 is a newer version of this service that is integrated into
Azure Storage; you enable it on a storage account by turning on the hierarchical namespace.
Server Message Block (SMB) file sharing is commonly used across multiple operating systems
(Windows, Linux, macOS).
Network File System (NFS) shares are used by some Linux and macOS versions. To create an NFS
share, you must use a premium tier storage account and create and configure a virtual network
through which access to the share can be controlled.
That's correct. The partition key identifies the partition in which a row is located, and the rows in each
partition are stored in row key order.
2. What should you do to an existing Azure Storage account in order to support a data lake for Azure
Synapse Analytics?
Create Azure Storage tables for the data you want to analyze
Upgrade the account to enable hierarchical namespace and create a blob container
That's correct. Enabling a hierarchical namespace adds support for Azure Data Lake Storage Gen2,
which can be used by Synapse Analytics.
3. Why might you use Azure File storage?
To share files that are stored on-premises with users located at other sites.
That's correct. You can create a file share in Azure File storage, upload files to this file share, and grant
access to the file share to remote users.
To store large binary data files containing images or other unstructured data.
Azure Cosmos DB for NoSQL is Microsoft’s native non-relational service for working with the
document data model. It manages data in JSON document format, and despite being a NoSQL data
storage solution, uses SQL syntax to work with the data.
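A minimal sketch of that document model: a JSON item as it might be stored in an API for NoSQL container, and the kind of SQL-style query text the API accepts. The item structure and property names are hypothetical:

```python
import json

# A hypothetical JSON document as stored in an API for NoSQL container.
customer = {
    "id": "c-1001",
    "name": "Contoso Ltd",
    "orders": [
        {"orderId": 1, "total": 19.99},
        {"orderId": 2, "total": 42.50},
    ],
}

# SQL-style syntax used to query JSON items. This is query text only;
# executing it requires a Cosmos DB account and a client library.
query = "SELECT c.name, c.orders FROM c WHERE c.id = 'c-1001'"

print(json.dumps(customer, indent=2))
print(query)
```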
MongoDB is a popular open source database in which data is stored in Binary JSON (BSON) format.
Azure Cosmos DB for MongoDB enables developers to use MongoDB client libraries and code to
work with data in Azure Cosmos DB.
Azure Cosmos DB for PostgreSQL is a native PostgreSQL, globally distributed relational database that
automatically shards data to help you build highly scalable apps. You can start building apps on a
single node server group, the same way you would with PostgreSQL anywhere else.
Azure Cosmos DB for Table is used to work with data in key-value tables, similar to Azure Table
Storage. It offers greater scalability and performance than Azure Table Storage.
Azure Cosmos DB for Apache Cassandra is compatible with Apache Cassandra, which is a popular
open source database that uses a column-family storage structure. Column families are tables, similar
to those in a relational database, with the exception that it's not mandatory for every row to have the
same columns.
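The column-family flexibility described above can be illustrated with plain dictionaries standing in for rows (an analogy only, not Cassandra client code):

```python
# Two rows in the same column family need not share the same columns,
# unlike rows in a strict relational table.
rows = {
    "user:1": {"name": "Ana", "email": "ana@example.com"},
    "user:2": {"name": "Ben", "phone": "555-0100", "city": "Oslo"},
}

# Each row carries its own set of columns.
for row_key, columns in rows.items():
    print(row_key, sorted(columns))
```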
Azure Cosmos DB for Apache Gremlin is used with data in a graph structure, in which entities are
defined as vertices that form nodes in a connected graph. Nodes are connected by edges that
represent relationships.
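A vertex-and-edge structure of this kind can be sketched as a small in-memory graph (the vertex names and relationship labels are hypothetical):

```python
# Vertices (nodes) and labeled, directed edges (relationships).
vertices = {"alice", "bob", "product-1"}
edges = [
    ("alice", "knows", "bob"),
    ("alice", "purchased", "product-1"),
]

# Traverse outward from 'alice', printing each relationship.
for source, label, target in edges:
    if source == "alice":
        print(f"alice -{label}-> {target}")
```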
2. Which Azure Cosmos DB API should you use to work with data in which entities and their
relationships to one another are represented in a graph using vertices and edges?
Gremlin.
That's correct. The API for Gremlin is used to manage a network of nodes (vertices) and the
relationships between them (edges).
3. How can you enable globally distributed users to work with their own local replica of a Cosmos DB
database?
Create an Azure Cosmos DB account in each region where you have users.
Use the API for Table to copy data to Azure Table Storage in each region where you have users.
Enable multi-region writes and add the regions where you have users.
That's correct. You can enable multi-region writes in the regions where you want users to work with
the data.