Notes
MariaDB
o Does not offer point-in-time restore for up to 365 days: Azure Database for
MariaDB provides automated backups with a retention period of up to 35 days.
Cosmos DB
o Allows aggregation for analytics.
o Guarantees read/write latency under 10 milliseconds.
o Cosmos DB container-level configuration (see the sketch after these bullets):
Throughput
Partition key > optimizes queries
o Cosmos DB throughput can be set at either level:
Container
Database
o It has native SQL API support
o It has configurable indexes.
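A minimal sketch of the container-level settings above, assuming the azure-cosmos Python SDK; the account endpoint, key, and all names are placeholder assumptions:

    from azure.cosmos import CosmosClient, PartitionKey

    # Placeholder endpoint/key; real values come from the Cosmos DB account.
    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    db = client.create_database_if_not_exists(id="appdb")
    container = db.create_container_if_not_exists(
        id="orders",
        partition_key=PartitionKey(path="/customerId"),  # partition key: drives query optimization
        offer_throughput=400,                            # container-level throughput (RU/s)
    )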
Document DB
o Flexible schema; JSON documents; related data kept in the same document.
Cosmos DB APIs
o Gremlin API > graph (vertices and edges with key/value properties)
o Core (SQL) API > JSON format > lets us use SELECT statements (see the query sketch after the next list)
o MongoDB API > BSON format
o Table API > key/value
Read/write to multiple regions
o Cassandra API > column-family storage structure
Supports Apache Spark and data analytics
Cosmos DB APIs > supported query languages
o Cassandra API > CQL (SQL-like)
o MongoDB API > MQL
o Graph API > Gremlin
o Table API > OData / LINQ queries
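A minimal Core (SQL) API query sketch, reusing the container object from the earlier Cosmos DB sketch; the query shape and values are illustrative:

    # Parameterized SELECT against the Core (SQL) API (JSON documents).
    items = container.query_items(
        query="SELECT c.id, c.total FROM c WHERE c.customerId = @cid",
        parameters=[{"name": "@cid", "value": "42"}],
        enable_cross_partition_query=True,
    )
    for item in items:
        print(item)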
Azure Databricks
o It is built on Apache Spark.
o It processes large amounts of data from multiple providers.
We can provision Apache Spark through:
Databricks
Synapse Analytics
HDInsight
o Databricks can be used to pre-process data with Scala (a PySpark sketch of the same idea follows the list below).
o Used to visualize data through a web-based interface.
o It connects to
Azure SQL Server
Event Hubs
Cosmos DB
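A minimal pre-processing sketch in PySpark (the notes mention Scala; Databricks notebooks support both). The paths and column names are illustrative assumptions:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("preprocess").getOrCreate()
    raw = spark.read.json("/mnt/raw/events/")  # assumed landing path in the lake
    clean = (
        raw.dropDuplicates(["event_id"])                    # assumed columns
           .withColumn("event_date", F.to_date("timestamp"))
           .filter(F.col("event_type").isNotNull())
    )
    clean.write.mode("overwrite").parquet("/mnt/curated/events/")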
Azure Data Factory
o Triggers are used to start the activities in a pipeline.
o Processes large amounts of data using ETL pipelines.
o Control flow arranges the activities in a pipeline.
o The Integration Runtime uses MPP (massively parallel processing); it is the compute
infrastructure for Data Factory.
o SSIS is also an integration service, but it is not part of Data Factory.
Data Factory has 4 components (tied together in the sketch after this list):
Pipeline = logical grouping of activities
Dataset = data structure in a data store
Activity = action on a dataset
o Data movement
o Data transformation
Linked services = connection information for data stores and compute
o Orchestrates data flow without code
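A hedged sketch tying the four components together with the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and dataset names are placeholders, and the referenced datasets/linked services are assumed to exist already:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    copy = CopyActivity(  # activity = action (data movement) on datasets
        name="CopyRawToStaging",
        inputs=[DatasetReference(type="DatasetReference", reference_name="RawDataset")],
        outputs=[DatasetReference(type="DatasetReference", reference_name="StagingDataset")],
        source=BlobSource(),
        sink=BlobSink(),
    )
    client.pipelines.create_or_update(  # pipeline = logical grouping of activities
        "my-rg", "my-factory", "IngestPipeline",
        PipelineResource(activities=[copy]),
    )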
Azure Synapse Analytics > Tabular representation of data in Parquet
o Native support for Apache Spark (built in).
o The massively parallel processing (MPP) engine distributes processing across compute
nodes.
o Independent scaling of storage / compute = Yes.
o We can pause compute to reduce cost = Yes (see the pause sketch after this list).
o Connector activities:
Mapping activities
Lookup activities
Metadata activities
Source / sink matrix table
o We can use Synapse Analytics to pre-process data by using Scala.
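A minimal sketch of pausing a dedicated SQL pool to reduce cost, assuming the azure-mgmt-synapse Python SDK; all resource names are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.synapse import SynapseManagementClient

    client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")
    # Pausing stops compute billing; storage is still billed.
    client.sql_pools.begin_pause("my-rg", "my-workspace", "mypool").result()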
OLTP (Online Transaction Processing)
o Data is optimized for both read and write operations.
o It is highly normalized.
o Mostly write-heavy.
o It is used for transactional workloads.
o Schema on write (illustrated in the sketch after this list).
o Transactional system apps:
Live / LOB (line-of-business) apps
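An illustrative OLTP pattern using Python's built-in sqlite3: small normalized tables, schema enforced on write, and the unit of work wrapped in a transaction (all names are made up):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),  -- normalized
            total REAL NOT NULL
        );
    """)
    with con:  # one small transactional unit of work (schema on write)
        con.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
        con.execute("INSERT INTO orders (customer_id, total) VALUES (1, 9.99)")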
OLAP (Online Analytical Processing)
o Data is not optimized for both read and write operations, only for reads.
o Suitable for analytical workloads because data is pre-aggregated.
o Handles complex analytical queries and provides fast query response times.
o (By contrast, transactional workloads are commonly used for recording small units of
work / events in real time.)
o Denormalized.
o Mostly used for analytical processing / purposes.
o Read-heavy, mostly for reporting.
o It can be used for paginated reports with a dimensional model in a warehouse.
o OLAP database (see the star-schema sketch after this list):
Star schema with hierarchical data
Large amounts of data
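An illustrative star schema in sqlite3: a fact table surrounded by dimension tables, queried with a read-heavy aggregate (all names are made up):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INT, month INT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales  (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
    """)
    # Typical OLAP read: aggregate the fact table, slice by dimensions.
    rows = con.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY d.year, p.category
    """).fetchall()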
o HDInsight is built on Hadoop.
Parquet Data
o Azure SQL Database can output data to Parquet format.
o Synapse Analytics gives a tabular representation of data in Parquet.
o Data Lake Storage Gen2 can store data that is in Parquet format (round trip sketched below).
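A minimal Parquet round trip in Python, assuming pandas with the pyarrow engine installed; the file name and columns are arbitrary:

    import pandas as pd

    df = pd.DataFrame({"id": [1, 2], "total": [9.99, 4.50]})
    df.to_parquet("sales.parquet")           # columnar, compressed on disk
    back = pd.read_parquet("sales.parquet")  # tabular representation restored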