DP-900 Certification Topics / Notes

 Database ACID semantics


o Atomicity: Each transaction is all-or-nothing; either every operation in it completes, or none of them do.
o Consistency: A transaction moves the database from one valid state to another.
o Isolation: Each transaction is totally separated from the others while it runs.
o Durability: Once committed, a transaction's changes survive failures.
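These properties can be illustrated with a short sketch using Python's built-in sqlite3 module (the accounts table and transfer helper are made-up examples):

```python
import sqlite3

# In-memory bank: a transfer must be all-or-nothing (atomicity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst in one transaction; roll back on any error."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # Simulate a mid-transaction failure when funds are insufficient.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()          # durability: committed changes persist
        return True
    except Exception:
        conn.rollback()        # atomicity: the partial debit is undone
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # fails and rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The failed transfer leaves the balances exactly as the last commit did, which is the whole point of atomicity plus durability.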
 Non-Relational Database
o Stores data without a fixed schema
 Relational Database
o Requires a fixed schema to be defined before data entry
o Transactional, write-intensive
o In a relational database, each row in a table has THE SAME SET OF COLUMNS
 Transactional Database
o Atomicity, Consistency, Isolation, Durability (ACID)
 Characteristic of a Relational database
o data is queried and manipulated by using a variant of the SQL language
 Characteristic of a non-relational database
o Self-describing entities
 PolyBase
o Runs T-SQL on a SQL Server instance to read data from external sources.
 Hybrid relational-object database
o PostgreSQL stores relational and non-relational data.
 EDGE devices
o Optimized for quick, managed input/output.
 Data Query Language (DQL)
o Select
 Data Definition Language (DDL)
o CREATE: creates an object in the database
o ALTER: alters the structure of the database
o DROP: drops objects from the database
o TRUNCATE: removes all records from a table, including all space allocated for the records
o RENAME: renames an object
o COMMENT
 Data Manipulation Language (DML)
o INSERT: inserts data into a table
o UPDATE: updates existing data within a table
o DELETE: deletes records from a table; the space allocated for the records remains
o MERGE
o CALL
o EXPLAIN PLAN
o LOCK TABLE
 Data Control Language (DCL)
o GRANT: allow specified users to perform specified tasks.
o REVOKE: cancel previously granted or denied permissions.
 Transaction Control Language (TCL)
o COMMIT: permanently saves a transaction
o ROLLBACK: restores the database to the last committed state
o SAVEPOINT: temporarily marks a point within a transaction so that you can roll back to it whenever necessary
o SET TRANSACTION
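A minimal sketch of DDL, DML, and TCL together, using Python's built-in sqlite3 module (the table and data are illustrative; SQLite spells rolling back to a savepoint as ROLLBACK TO):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None   # autocommit mode: we issue transaction SQL ourselves
cur = conn.cursor()

cur.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")  # DDL
cur.execute("BEGIN")                                   # TCL: start a transaction
cur.executemany("INSERT INTO staff (name) VALUES (?)",                 # DML
                [("Ada",), ("Lin",)])
cur.execute("UPDATE staff SET name = 'Linus' WHERE name = 'Lin'")      # DML
cur.execute("SAVEPOINT before_delete")                 # TCL: mark a savepoint
cur.execute("DELETE FROM staff")                       # DML: removes all rows
cur.execute("ROLLBACK TO before_delete")               # TCL: undo back to the savepoint
cur.execute("COMMIT")                                  # TCL: make the rest permanent

names = [row[0] for row in cur.execute("SELECT name FROM staff ORDER BY id")]
```

The DELETE is undone by the savepoint rollback, so the committed table still holds both rows.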
 Graph Database
o An edge specifies the relationship between vertices / nodes / entities.
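The vertex/edge model can be sketched in plain Python (the people and relationship labels below are made up for illustration):

```python
# Vertices are entities; edges are labeled relationships between them,
# similar in spirit to what a graph database stores.
vertices = {"alice", "bob", "contoso"}
edges = [
    ("alice", "worksFor", "contoso"),   # the edge label names the relationship
    ("bob", "worksFor", "contoso"),
    ("alice", "knows", "bob"),
]

def neighbors(vertex, label):
    """Vertices reachable from `vertex` over edges with the given label."""
    return sorted(dst for src, lbl, dst in edges if src == vertex and lbl == label)

employer = neighbors("alice", "worksFor")
friends = neighbors("alice", "knows")
```

A real graph database (e.g., via the Gremlin API) adds indexing and traversal operators on top of exactly this vertices-plus-labeled-edges model.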
 Azure Storage > Azure Storage Account
o Hot is the default access tier for an Azure storage account.
o Archive tier retrieval can take hours; archived data cannot be read or modified until rehydrated.
o With read-access geo-redundancy, data can be read from multiple regions but written in only one.
o Blob versioning automatically creates a previous version of a blob anytime it is modified or deleted.
o Hierarchical namespace
 Scalability
 Security > folder-level ACLs
 Manipulation > atomic directory operations
o Blob Storage
 Lifecycle management policies (e.g., delete blobs automatically).
 The Archive access tier has the highest latency of the three available tiers.
 The Cool tier is cheaper than the Hot tier.
 Data stored in the Hot tier is kept on high-performance media.
 Append blobs are not optimized for random reads and writes.
 Blob rehydration occurs in Azure Blob Storage when a blob moves out of Archive (e.g., Archive to Cool)
o File Storage
 Fully managed cloud file share service that allows you to store and share
files in the cloud. It is designed for general-purpose file sharing scenarios
and is not optimized for storing event log data.
 Standard tier uses HDD
 Premium tier uses SSD
 Files are shared via NFS / SMB
 Shareable on Windows Server
o Table Storage
 NoSQL database service that allows you to store large amounts of
structured and semi-structured data in a highly scalable and cost-effective
manner. It is ideal for storing event log data that is semi-structured and
received as the logs occur, as it can handle large volumes of data and
support flexible data models.
 Partition key and row key: the two components by which a row in a table is uniquely identified
 Items in the same partition are stored in row-key order
 Row keys are unique within a partition
 Two-dimensional tables
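A hypothetical sketch of this addressing model in plain Python (not the Azure SDK; the partition names and entities are invented):

```python
from collections import defaultdict

# Each entity is uniquely identified by (PartitionKey, RowKey);
# entities within a partition are kept in RowKey order.
table = defaultdict(dict)  # PartitionKey -> {RowKey: entity}

def upsert(partition_key, row_key, entity):
    table[partition_key][row_key] = entity

def point_query(partition_key, row_key):
    """A point lookup supplies both key components - the cheapest query."""
    return table[partition_key].get(row_key)

upsert("2024-06", "0002", {"event": "login"})
upsert("2024-06", "0001", {"event": "boot"})
rows_in_order = sorted(table["2024-06"])  # RowKey order within the partition
first_entity = point_query("2024-06", "0001")
```

Choosing a partition key that groups rows queried together (here, a month) is what makes point lookups and partition scans cheap.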
o Data lake Gen 2 > Store data that is in parquet format
1. Azure resource group
a. Azure Storage Account
i. File share
1. Folders
a. Files
 It supports role-based access control (RBAC)
 provides native support for POSIX-compliant access control lists (ACLs)
at the file and folder level.
 Stores data to be processed; not disks for virtual machines
 Data lake >> Delta Lake
 Transactional consistency.
 Schema enforcement.
o Azure Data Lake Storage
 Can be used as a staging area for ingested data before the data is converted
into a format suitable for performing analytics. It allows you to store and
analyze large amounts of data, and it is optimized for big data analytics
workloads.
 Stream processing
 Stream Analytics supports inputs from Kafka and outputs to Data Lake
 Stream Analytics supports the SQL language.
o Page Blobs
 Used for VHDs (virtual hard disks)
 Power BI
o A collection of visualizations that appear together is a report.
o A dashboard is a combination of related reports in a single location for visibility into data.
o To share data models / reports created in Power BI Desktop, they first need to be published to the Power BI service.
o Power Bi Desktop
 Full range of data modeling and report editing features.
 The workflow of creating an analytics solution starts here
 What can be added to a dashboard?
 Text
 Report page
 Visual from a report
o Copy a dashboard = Yes
o Single workspace = Yes
o Visuals from Excel = Yes
o Paginated Reports
 Can be created with Power BI Report Builder
 It is highly formatted
 Fixed layout
 Printable
o Power BI Service
 We can publish Power BI reports and models here after creating them.
 Report and dashboard creation / report sharing and distribution can be done independently in the Power BI service
 Power BI charts
o Pie chart = compare different values as parts of a whole
o Line chart = examine trends
o Bar / column chart = compare different values across discrete categories
o Scatter chart = examine the relationship between two numeric values
o Treemap = a chart of colored rectangles, with size representing value
o Key influencers = displays the major contributors to a selected result or value
 Power BI data model elements
o Dimensions = attributes used to slice and aggregate data
o Fact table = contains the measures that aggregate data
o Key = establishes the relationship between dimension and fact tables
 Power Bi report search
o Hierarchy
 Allows drilling up and down in the report.
 Semi-Structure Files
o CSV, XML and JSON documents are semi-structured documents
 CSV (used to store delimited data; separates data fields by using commas and terminates rows by using a carriage return)
 JSON (uses a hierarchical document schema to define entities that have multiple attributes)
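Both formats can be parsed with Python's standard library; a small sketch (the sample records are invented):

```python
import csv
import io
import json

# CSV: comma-delimited fields, one flat record per line.
csv_text = "id,name\r\n1,Ada\r\n2,Lin\r\n"
rows = [dict(r) for r in csv.DictReader(io.StringIO(csv_text))]

# JSON: a hierarchical document whose entity has multiple (nested) attributes.
doc = json.loads('{"id": 1, "name": "Ada", "skills": ["SQL", "Python"]}')
first_skill = doc["skills"][0]
```

Note the structural difference: every CSV record has the same flat fields, while a JSON document can nest lists and objects per entity, which is what makes it semi-structured.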
 Azure Data Factory
o cloud-based data integration service that allows you to create, schedule, and
manage data pipelines. It does not use a massively parallel processing architecture
by default but allows you to perform complex data integration and transformation
tasks by integrating with other Azure services.
 Azure Synapse Analytics
o SQL is used to query.
o Python (data preprocessing, machine learning, and other data science tasks).
 Azure Stream Analytics
o SQL (It is used to analyze and process high-velocity, real-time streaming data
from various sources, such as IoT devices, social media platforms, and other real-
time data streams.)
o Supports ingesting from Kafka and writing to a data lake
 Azure Data Explorer
o Uses KQL (to analyze and query large volumes of structured and unstructured data in real time)
 Azure HDInsight
o Designed for big data analytics workloads and does not enable you to build
tabular models from multiple data sources to support OLAP queries. Azure
Analysis Services is a service that you can use to create and manage tabular
models for OLAP.
 Microsoft 365
o Assesses a company's risk against industry and international regulations & standards
o It is a component of the Service Trust Portal
o Customers cannot use it to grant Microsoft engineers access to perform a specific task.
 Apache Parquet
o Defines names and data types for each column and uses compressed columnar storage.
 Data Warehouse
o It is optimized for Read operations.
o Data can be aggregated and loaded into an OLAP model, also known as a CUBE

 Azure SQL Managed Instance


o Offers near-100 percent compatibility with on-premises Microsoft SQL Server instances, while providing automated updates, backups, and maintenance tasks.
 SQL Server on Azure Virtual Machines
o are lift-and-shift ready for existing applications that require fast migration to the
cloud with minimal changes or no changes. SQL virtual machines offer full
administrative control over the SQL Server instance and underlying OS for
migration to Azure.
 Azure SQL Single Database
o Supports the serverless configuration of an Azure SQL database
o Auto-scaling
o Billed per second
o Azure Database for MySQL does not always require SSL connections; SSL is good for security but not compulsory.
 ELT Processes
o A target store that is powerful enough to transform the data.
 Star Schema
o Dimensions are used to aggregate or slice measures.
 PostgreSQL
o pgAdmin is a popular open-source administration and management tool for
PostgreSQL databases. While it is not directly provided or managed by Azure,
you can use pgAdmin to connect to and manage Azure Database for PostgreSQL
instances.

 MariaDB
o does not offer point-in-time restore for up to 365 days. Azure Database for
MariaDB provides automated backups with a retention period of up to 35 days.
 Cosmos DB
o Allow aggregation for analytics
o It guarantees read and write latency under 10 milliseconds
o Cosmos DB Container level configuration
 Throughput
 Partition key > Optimise Queries
o Cosmos DB set throughput on which level
 Container
 Database
o It has native SQL API support
o It has configurable indexes.
 Document DB
o Flexible JSON documents; related data is kept in the same document.
 Cosmos DB APIs
o Gremlin API > graph / key-value pairs
o Core (SQL) API > JSON format > lets us use SELECT statements
o MongoDB API > BSON format
o Table API > key-value
 Read/write to multiple regions.
o Cassandra API > column-family storage structure
 Supports Apache Spark and data analytics
 Cosmos DB APIs: supported query languages
o Cassandra API > CQL
o MongoDB API > MQL
o Gremlin API > Gremlin
o Table API > OData / LINQ queries
 Azure Databricks
o It is built on Apache Spark.
o It processes large amounts of data from multiple providers
 We can provision Apache Spark by using
 Databricks
 Synapse Analytics
 HDInsight
o When using Databricks to pre-process data, Scala can be used.
o Used to visualize data in a web-based interface.
o It connects to
 Azure SQL Server
 Event Hubs
 Cosmos DB
 Azure Data Factory
o Triggers are used to start the activities in a pipeline.
o Processes large amounts of data using ETL pipelines.
o Control flow arranges the activities in a pipeline.
o The integration runtime is the compute infrastructure for Data Factory; it uses massively parallel processing (MPP).
o SSIS is also an integration service but is not part of Data Factory.
 Data Factory has 4 components
 Dataset = data structure in a data store
 Activity = action on a dataset
o Data movement
o Data transformation
 Linked services
 Pipeline = logical grouping of activities
o Orchestrates data flows without code
 Azure Synapse Analytics > tabular representation of data in Parquet.
o Native support for / built on Apache Spark.
o The massively parallel processing engine distributes processing across compute nodes.
o Independent scaling of storage / compute = Yes.
o We can pause compute to reduce cost = Yes.
o Connector activities
 Mapping activities
 Lookup activities
 Metadata activities
 Source / sink matrix table
o We can use Synapse Analytics to pre-process data by using Scala.
 OLTP (Online Transaction Processing)
o The database is optimized for both read and write operations.
o It is highly normalized.
o Mostly write-heavy usage.
o It is used for transactional workloads.
o Schema on write
o Transactional system apps
 Live / LOB apps
 OLAP (Online Analytical Processing)
o The database is optimized for read operations rather than writes.
o Suitable for analytical workloads because data is pre-aggregated.
o Handles complex analytical queries and provides fast query response times.
o (Transactional workloads, by contrast, are commonly used for recording small units of work / events in real time.)
o Denormalized
o Mostly used for analytical processing / purposes
o More read-heavy, for reporting
o Can be used for paginated reports with a dimensional model in a warehouse.
o OLAP database
 Star schema with hierarchical data.
 Large amounts of data
o HDInsight is based on Hadoop
 Parquet Data
o Azure SQL Database can output data to Parquet format
o Synapse Analytics gives a tabular representation of data in Parquet
o Data Lake Gen2 can store data that is in Parquet format

 Managed SQL Instance


o Fully managed relational database service providing near-100% of SQL Server features.
 Different types of analytics
o Descriptive analytics tells you what happened in the past.
o Cognitive analytics can, for example, transcribe audio files to text.
o Diagnostic analytics answers: Why did it happen?
o Predictive analytics answers: What will happen?
o Prescriptive analytics answers: What actions should we take?
 ETL (Extract, Transform, Load)
o CRM tool > integration services > data warehouse
o Data is fully processed before being loaded into the target store
 ELT (Extract, Load, Transform)
o CRM tool > data warehouse > integration services
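The contrast can be sketched in a few lines of Python (the CRM records, transform, and warehouse lists are hypothetical stand-ins): in ETL the transform runs before the load; in ELT raw data is loaded first and transformed inside the target store.

```python
# Raw records as a source system (a "CRM") might emit them.
crm_records = [{"name": " Ada ", "spend": "100"},
               {"name": "Lin", "spend": "50"}]

def transform(record):
    """Clean and type one record (the integration-service step)."""
    return {"name": record["name"].strip(), "spend": int(record["spend"])}

# ETL: transform happens *before* the load into the warehouse.
etl_warehouse = [transform(r) for r in crm_records]

# ELT: raw records are loaded into the target first...
elt_staging = list(crm_records)
# ...and the (powerful) target store transforms them afterwards.
elt_warehouse = [transform(r) for r in elt_staging]
```

Both roads end at the same cleaned data; the difference is where the compute for the transform lives, which is why ELT requires a target store powerful enough to do it.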
 Batch Processing
o Processing large amounts of data at intervals of time is called batch processing; in this case we can expect latency between data arrival and processing.
o Output data to file storage = Yes
o Output data to a relational database = Yes
o Output data to a NoSQL database = Yes
 star schema
o In computing, the star schema is the simplest style of data mart schema and is the
approach most widely used to develop data warehouses and dimensional data
marts. The star schema consists of one or more fact tables referencing any number
of dimension tables. The star schema is an important special case of the snowflake
schema and is more effective for handling simpler queries.
o Dimension: dimension tables hold the descriptive attributes that the fact tables reference.
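The fact/dimension relationship can be sketched in plain Python (the table contents are invented for illustration):

```python
# Dimension table: descriptive attributes, keyed by product_id.
dim_product = {1: {"name": "widget", "category": "tools"},
               2: {"name": "gadget", "category": "toys"}}

# Fact table: each row references the dimension and carries a measure.
fact_sales = [{"product_id": 1, "amount": 10},
              {"product_id": 1, "amount": 5},
              {"product_id": 2, "amount": 7}]

def sales_by_category():
    """Slice the measure by a dimension attribute, as a star-schema query does."""
    totals = {}
    for row in fact_sales:
        category = dim_product[row["product_id"]]["category"]  # join on the key
        totals[category] = totals.get(category, 0) + row["amount"]  # aggregate
    return totals

totals = sales_by_category()
```

The single join from fact to dimension is exactly the "simple query" shape the star schema is optimized for; a snowflake schema would add further joins between dimension tables.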

 Azure Data Studio


o Cross-platform database tool that allows you to embed documents and query
results into a SQL notebook. It supports Azure SQL databases, SQL Server, and
other databases. It also has built-in support for SQL Notebooks, which allows you
to mix markdown, code, and results in a single notebook. This makes it a great
tool for creating documentation and troubleshooting guides for administrators to
use when working with Azure SQL databases.
o Lightweight editor that can run on-demand SQL queries and view and save results as text, JSON, or Microsoft Excel.
 Azure Resource Manager template
o You can automate deployments and use the practice of infrastructure as code. In
code, you define the infrastructure that needs to be deployed
o To implement infrastructure as code for your Azure solutions, use Azure Resource Manager templates (ARM templates). The template is a JavaScript Object Notation (JSON) file that defines the infrastructure and configuration for your project. The template uses declarative syntax, which lets you state what you intend to deploy without having to write the sequence of programming commands to create it. In the template, you specify the resources to deploy and the properties for those resources.
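As a minimal illustrative sketch, an ARM template that declares a single storage account might look like this (the account name and apiVersion are example values, not recommendations):

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2022-09-01",
      "name": "examplestorageacct",
      "location": "[resourceGroup().location]",
      "sku": { "name": "Standard_LRS" },
      "kind": "StorageV2"
    }
  ]
}
```

Note the declarative style: the template states the desired resource and its properties, and Resource Manager works out the deployment steps.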
 Microsoft SQL Server Data Tools (SSDT)
o SQL Server Data Tools (SSDT) is a modern development tool for building SQL
Server relational databases, databases in Azure SQL, Analysis Services (AS) data
models, Integration Services (IS) packages, and Reporting Services (RS) reports.
With SSDT, you can design and deploy any SQL Server content type with the
same ease as you would develop an application in Visual Studio.
