Module 4

Uploaded by

Neha Gupta

Module 4: Microsoft Azure Data Fundamentals: Explore data analytics in Azure

Describe data warehousing architecture
Large-scale data analytics architecture can vary, as can the specific technologies used
to implement it, but in general it includes the following elements:
Data ingestion and processing – data from one or more transactional data stores, files, real-time streams, or
other sources is loaded into a data lake or a relational data warehouse. The load operation usually involves
an extract, transform, and load (ETL) or extract, load, and transform (ELT) process in which the data is cleaned,
filtered, and restructured for analysis. In ETL processes, the data is transformed before being loaded into an
analytical store, while in an ELT process the data is copied to the store and then transformed. Either way, the
resulting data structure is optimized for analytical queries.
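
The difference between ETL and ELT can be sketched in a few lines of Python. This is a minimal illustration only, assuming an in-memory SQLite database as a stand-in for the analytical store; the sample rows, table names, and the `clean` helper are hypothetical and not part of any Azure service.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a transactional source.
RAW_ORDERS = [
    ("A-1", " widget ", "19.99"),
    ("A-2", "gadget", "5.00"),
    ("A-3", "widget", ""),  # missing amount: dropped during cleaning
]


def clean(row):
    """Trim text, cast the amount to float, and return None for unusable rows."""
    order_id, product, amount = row
    if not amount.strip():
        return None
    return (order_id, product.strip(), float(amount))


def run_etl(conn):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    cleaned = [r for r in map(clean, RAW_ORDERS) if r is not None]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform inside the store with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, product TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id, TRIM(product) AS product, CAST(amount AS REAL) AS amount
        FROM orders_raw
        WHERE TRIM(amount) <> ''
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
print(etl_rows == elt_rows)
```

Both paths end with the same cleaned table; what differs is where the transformation runs. In the ELT path the raw copy (`orders_raw`) also remains in the store, which is one reason ELT is often paired with data lakes.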

Analytical data store – data stores for large-scale analytics include relational data warehouses,
file-system-based data lakes, and hybrid architectures that combine features of data warehouses and data
lakes (sometimes called data lakehouses or lake databases). We'll discuss these in more depth later.

Explore data ingestion pipelines

On Azure, large-scale data ingestion is best implemented by creating pipelines that orchestrate ETL processes.
You can create and run pipelines using Azure Data Factory, or you can use a similar pipeline engine in Azure
Synapse Analytics or Microsoft Fabric if you want to manage all of the components of your data analytics
solution in a unified workspace.
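
Conceptually, a pipeline is an ordered set of activities in which each activity hands its output to the next. The sketch below illustrates only that idea in plain Python; it is not the Azure Data Factory, Azure Synapse Analytics, or Microsoft Fabric API, and the `Activity` class, `run_pipeline` function, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Activity:
    """One named step in a pipeline (hypothetical, for illustration only)."""
    name: str
    run: Callable[[list], list]


def run_pipeline(activities: List[Activity], data: list):
    """Run activities in order, passing each step's output to the next."""
    log = []
    for activity in activities:
        data = activity.run(data)
        log.append(f"{activity.name}: {len(data)} rows")
    return data, log


pipeline = [
    # Extract: ignore the input and return hypothetical raw text values.
    Activity("extract", lambda _: [" 10", "x", "32 "]),
    # Transform: keep only numeric values and convert them to integers.
    Activity("transform", lambda rows: [int(r) for r in rows if r.strip().isdigit()]),
    # Load: a real pipeline would write to the analytical store here.
    Activity("load", lambda rows: list(rows)),
]

result, log = run_pipeline(pipeline, [])
print(result)
print(log)
```

A managed pipeline engine adds what this sketch leaves out, such as scheduling, retries, monitoring, and connections to external data stores.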

Explore analytical data stores

There are two common types of analytical data store.

Data warehouses
A data warehouse is a relational database in which the data is stored in a schema that is optimized for data
analytics rather than transactional workloads. Commonly, the data from a transactional store is transformed
into a schema in which numeric values are stored in central fact tables, which are related to one or
more dimension tables that represent entities by which the data can be aggregated (for example, product or customer).
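
The fact and dimension layout described above can be sketched with a small example. This is an illustration only, assuming an in-memory SQLite database in place of a real data warehouse; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: one row per entity used to group the numbers.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT
    );
    -- Fact table: numeric measures plus keys that point at dimensions.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Road bike', 'Bikes'),
        (2, 'Helmet', 'Accessories'),
        (3, 'Gloves', 'Accessories');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 900.0),
        (2, 2, 2, 80.0),
        (3, 3, 1, 25.0),
        (4, 1, 1, 950.0);
    """
)

# Aggregate the fact measures by an attribute of the dimension.
totals = conn.execute(
    """
    SELECT d.category, SUM(f.quantity) AS units, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
print(totals)
```

The query joins the fact table to the dimension and groups by `category`, which is the typical shape of an analytical query against this kind of schema.
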
Data lakehouses
A data lake is a file store, usually on a distributed file system for high performance data access. Technologies
like Spark or Hadoop are often used to process queries on the stored files and return data for reporting and
analytics. These systems often apply a schema-on-read approach to define tabular schemas on semi-structured
data files at the point where the data is read for analysis, without applying constraints when it's stored.
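
Schema-on-read can be illustrated without Spark. The sketch below is a simplified stand-in, assuming a small delimited file held in a string; the file contents, the `SCHEMA` mapping, and the `read_with_schema` helper are hypothetical. Nothing is validated when the text is stored, and types are applied only when the data is read.

```python
import csv
import io

# Hypothetical contents of a delimited file in a data lake, stored as untyped text.
RAW_FILE = (
    "device,reading,taken_at\n"
    "sensor-1,21.5,2024-01-01\n"
    "sensor-2,n/a,2024-01-01\n"
    "sensor-1,22.0,2024-01-02\n"
)

# The schema exists only on the reading side: column name -> type to apply.
SCHEMA = {"device": str, "reading": float, "taken_at": str}


def read_with_schema(text, schema):
    """Yield typed rows; values that do not fit the schema become None."""
    for record in csv.DictReader(io.StringIO(text)):
        typed = {}
        for column, cast in schema.items():
            try:
                typed[column] = cast(record[column])
            except (KeyError, ValueError):
                typed[column] = None
        yield typed


rows = list(read_with_schema(RAW_FILE, SCHEMA))
valid = [r["reading"] for r in rows if r["reading"] is not None]
average = sum(valid) / len(valid)
print(average)
```

The malformed `n/a` value was accepted at write time and only surfaces as a missing value when the schema is applied, which is the trade-off of not enforcing constraints on storage.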

Explore platform-as-a-service (PaaS) solutions

Azure Synapse Analytics is a unified, end-to-end solution for large-scale data analytics. It brings together
multiple technologies and capabilities, enabling you to combine the data integrity and reliability of a scalable,
high-performance SQL Server-based relational data warehouse with the flexibility of a data lake and
open-source Apache Spark. It also includes native support for log and telemetry analytics with Azure Synapse Data
Explorer pools, as well as built-in data pipelines for data ingestion and transformation.

Azure Databricks is an Azure implementation of the popular Databricks platform. Databricks is a
comprehensive data analytics solution built on Apache Spark, and offers native SQL capabilities as well as
workload-optimized Spark clusters for data analytics and data science.

Azure HDInsight is an Azure service that supports multiple open-source data analytics cluster types. Although
not as user-friendly as Azure Synapse Analytics and Azure Databricks, it can be a suitable option if your
analytics solution relies on multiple open-source frameworks or if you need to migrate an existing on-premises
Hadoop-based solution to the cloud.

Explore Microsoft Fabric

Scalable analytics with PaaS services can be complex, fragmented, and expensive. With Microsoft Fabric, you
don't have to spend all of your time combining various services and implementing interfaces through which
business users can access them. Instead, you can use a single product that is easy to understand, set up, create,
and manage. Fabric is a unified software-as-a-service (SaaS) offering, with all your data stored in a single open
format in OneLake.

Check your knowledge

1. Which Azure PaaS services can you use to create a pipeline for data ingestion and processing?

Azure SQL Database and Azure Cosmos DB

Azure Synapse Analytics and Azure Data Factory
That's correct. Both Azure Synapse Analytics and Azure Data Factory include the capability to create pipelines.

Azure HDInsight and Azure Databricks

2. What must you define to implement a pipeline that reads data from Azure Blob Storage?

A linked service for your Azure Blob Storage account
That's correct. You need to create linked services for external services you want to use in the pipeline.

A dedicated SQL pool in your Azure Synapse Analytics workspace

An Azure HDInsight cluster in your subscription

3. Which open-source distributed processing engine does Azure Synapse Analytics include?

Apache Hadoop

Apache Spark
That's correct. Azure Synapse Analytics includes an Apache Spark runtime.

Apache Storm

Understand batch and stream processing
