Azure Data Engineering - Pragathi


Microsoft Certified: Azure Data Engineer Associate

Module 1 ( Non-Relational Data Stores and Azure Data Lake )


• Storage
• Document data stores
• Columnar data stores
• Key/value data stores
• Graph data stores
• Time series data stores
• Object data stores
• External index
• Why NoSQL or Non-Relational DB?
• When to Choose NoSQL or Non-Relational DB?

Module 2 ( Data Lake and Azure Cosmos DB )


• Data Lake Key Concepts
• Azure Cosmos DB
• Why Azure Cosmos DB?
• Azure Blob Storage
• Why Azure Blob Storage?
• Data Partitioning
• Why Partitioning Data?
• Consistency Levels in Azure Cosmos DB

Module 3 ( Relational Data Stores )


• Introduction to Relational Data Stores
• Azure SQL Database
• Why SQL Database Elastic Pool?

Module 4 ( Why Azure SQL? )


• Azure SQL Security Capabilities
• High-Availability and Azure SQL Database
• Azure Database for MySQL
• Azure Database for PostgreSQL
• Azure Database for MariaDB
• What is PolyBase?
• What is Azure Synapse Analytics (formerly SQL DW)?

Module 5 ( Azure Batch )


• What is Azure Batch?
• Intrinsically Parallel Workloads
• Tightly Coupled Workloads
• Additional Batch Capabilities
• How Azure Batch Works

Module 6 ( Azure Data Factory )


• Flow Process of Data Factory
• Why Azure Data Factory?
• Integration Runtime in Azure Data Factory
• Mapping Data Flows

Module 7 ( Azure Databricks )


• What is Azure Databricks?
• Azure Spark-based Analytics Platform
• Apache Spark in Azure Databricks

Module 8 ( Data Ingestion from Azure Blob )


• Copy Activity Overview
• Environment Preparation
• Naming Standards
• Linked Services & Data Sets
• Creating ADF Pipeline
• Control Flow Activities (1) – Validation Activity
• Control Flow Activities (2) – Get Metadata, If Condition, Web Activities
• Control Flow Activities (3) – Delete Activity
• ADF Triggers Overview
• Creating Event Trigger

Module 9 ( Introduction to Spark & Spark Architecture )


• Overview of Data Processing
• Overview of Data Processing Libraries
• Setting up an Environment to Explore Pandas, Dask, and PySpark
• Code Examples of Pandas, Dask, and PySpark
• Difference between Pandas, Dask, and PySpark
• Overview of Distributed Computing
• Overview of the Official Apache Spark Documentation
• Spark Key Features & Platforms
• Spark Infrastructure
• Spark Cluster using Databricks
• Executors in Spark Cluster
• Spark Glossary
• Understanding Spark Key Terms

Module 10 ( Azure Stream Analytics )


• How Stream Analytics Works
• Key Capabilities and Benefits
• Stream Analytics Windowing Functions

Module 11 ( Monitoring & Security )


• What is Azure Monitor?
• What data does Azure Monitor collect?
• What can you monitor?
• Alerts in Azure
• Azure Security Logging & Auditing
