Data Engineering 101 - Azure Synapse Analytics
Shwetank Singh
GritSetGrow - GSGLearn.com
Spark Pool
A serverless Apache Spark pool integrated with Azure Synapse
for large-scale data processing and machine learning.
Spark instances are created on demand when a job or
notebook session starts. Supports Spark jobs written in
Python (PySpark), Scala, Spark SQL, and R.
Synapse Pipelines
An orchestration tool that allows for the creation,
scheduling, and monitoring of data workflows. Enables
integration of data sources and automation of data
movement and transformation tasks.
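Conceptually, a pipeline is a set of activities plus the dependencies between them. The service starts each activity once its upstream activities have succeeded. The sketch below imitates only that ordering, using Python's `graphlib`. The activity names are hypothetical, and the sketch ignores real pipeline features such as retries, failure paths, and branches running concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each activity maps to the set of activities it depends on.
PIPELINE = {
    "CopyRawSales": set(),
    "CopyRawCustomers": set(),
    "CleanSales": {"CopyRawSales"},
    "JoinSalesCustomers": {"CleanSales", "CopyRawCustomers"},
    "LoadWarehouse": {"JoinSalesCustomers"},
}

def run_pipeline(dependencies, run_activity):
    """Run activities in dependency order and return the order they ran in."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # Every activity returned here has all of its dependencies completed,
        # so a real orchestrator could start this whole batch in parallel.
        for activity in sorter.get_ready():
            run_activity(activity)
            order.append(activity)
            sorter.done(activity)
    return order

order = run_pipeline(PIPELINE, lambda name: print("running", name))
```

Activities with no path between them, such as the two copy steps, may run in either order. Only the dependency edges are guaranteed.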
Synapse Studio
A unified interface that provides tools for data
exploration, transformation, integration, and
visualization. It is used to manage Synapse resources,
run SQL queries and Spark jobs, and build
pipelines.
Running SQL and Spark Jobs:
Use Synapse Studio to open a SQL script and execute
a query like SELECT * FROM SalesData on a
dedicated SQL pool.
Data Integration
Integration of various services such as Azure Data
Factory, Power BI, and Azure Machine Learning with
Synapse Analytics for comprehensive data processing
and analysis.
Synapse Notebooks
Interactive notebooks that support multi-language code
execution, enabling data scientists and engineers to
explore data, build models, and collaborate on data-
driven projects.
Synapse Link
A feature that enables near real-time analytics over
operational data by connecting Azure Cosmos DB to
Azure Synapse Analytics without ETL processes.
Data Flow
A visual, no-code interface within Synapse Pipelines
that allows for complex data transformations, such as
joins, aggregations, and data cleansing, directly within
the pipeline.
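Data flows are designed visually and executed on Spark, but each transformation corresponds to an ordinary data operation. The sketch below is a rough illustration in plain Python over in-memory rows. It cleanses, joins, and aggregates two small hypothetical datasets, and it does not show how data flows actually execute.

```python
from collections import defaultdict

# Hypothetical input rows.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": "120.50"},
    {"order_id": 2, "customer_id": 11, "amount": " 80.00 "},
    {"order_id": 2, "customer_id": 11, "amount": " 80.00 "},  # duplicate row
    {"order_id": 3, "customer_id": 10, "amount": None},       # missing amount
]
customers = [
    {"customer_id": 10, "region": "West"},
    {"customer_id": 11, "region": "East"},
]

def cleanse(rows):
    """Drop rows with a missing amount, de-duplicate on order_id, trim and cast amounts."""
    seen, out = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        out.append({**row, "amount": float(row["amount"].strip())})
    return out

def join(left, right, key):
    """Inner join: keep left rows that have a matching right row on key."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]

def aggregate(rows, group_key, value_key):
    """Sum value_key per distinct group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

result = aggregate(join(cleanse(orders), customers, "customer_id"), "region", "amount")
print(result)  # {'West': 120.5, 'East': 80.0}
```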
Monitoring a Pipeline:
Workload Management
Techniques for managing and optimizing resource
allocation, query performance, and concurrency in
dedicated SQL pools. It includes features like workload
groups and resource classes.
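A workload group, created with `CREATE WORKLOAD GROUP`, reserves and caps a share of the dedicated SQL pool's resources. The per-request grant then decides how many queries fit in that share. The workload isolation documentation gives this arithmetic:

- Guaranteed concurrency is `MIN_PERCENTAGE_RESOURCE / REQUEST_MIN_RESOURCE_GRANT_PERCENT`, so 30% / 2% gives 15.
- Maximum concurrency uses `CAP_PERCENTAGE_RESOURCE` in place of the minimum percentage.

The sketch applies only that arithmetic. It ignores the per-service-level minimum grants and rounding that the service applies, so treat its numbers as approximations.

```python
from fractions import Fraction

def concurrency(min_pct, cap_pct, request_min_grant_pct):
    """Return (guaranteed, maximum) concurrent requests for a workload group.

    Simplified model: ignores service-level minimum grants and rounding.
    Fraction(str(...)) keeps decimal percentages such as 3.25 exact.
    """
    grant = Fraction(str(request_min_grant_pct))
    if grant <= 0:
        raise ValueError("request grant percentage must be positive")
    guaranteed = int(Fraction(str(min_pct)) // grant)
    maximum = int(Fraction(str(cap_pct)) // grant)
    return guaranteed, maximum

print(concurrency(30, 60, 2))  # (15, 30)
```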
Data Partitioning
Dividing large tables into smaller, more manageable
partitions to improve query performance and
scalability. Typically used in dedicated SQL pools to
handle large datasets efficiently.
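In a dedicated SQL pool, partitions are declared with boundary values in the table options. An example is `PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20230101, 20240101))`. The sketch shows how a row is routed to a partition:

- With RANGE RIGHT, each boundary value belongs to the partition on its right.
- With RANGE LEFT, each boundary value belongs to the partition on its left.

Partition numbers are 1-based. The column name and boundaries are illustrative only.

```python
from bisect import bisect_left, bisect_right

def partition_number(value, boundaries, range_type="RIGHT"):
    """Return the 1-based partition a value falls into.

    boundaries must be sorted ascending; N boundaries define N + 1 partitions.
    RANGE RIGHT: a value equal to a boundary goes to the partition on its right.
    RANGE LEFT:  a value equal to a boundary goes to the partition on its left.
    """
    if range_type == "RIGHT":
        return bisect_right(boundaries, value) + 1
    if range_type == "LEFT":
        return bisect_left(boundaries, value) + 1
    raise ValueError("range_type must be 'LEFT' or 'RIGHT'")

BOUNDARIES = [20230101, 20240101]  # yyyymmdd date keys

print(partition_number(20221231, BOUNDARIES))          # 1
print(partition_number(20230101, BOUNDARIES))          # 2 (boundary goes right)
print(partition_number(20230101, BOUNDARIES, "LEFT"))  # 1 (boundary goes left)
print(partition_number(20240615, BOUNDARIES))          # 3
```

Queries that filter on the partitioning column only need to read the matching partitions (partition elimination). Loads can also switch whole partitions in or out instead of deleting row by row.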
Implementing Partitioning:
PolyBase
A technology that enables querying external data
stored in sources like Hadoop or Azure Blob Storage as
if it were a table in the database.
Materialized Views
Views whose query results are precomputed and stored
physically. Queries that use them read the stored results
instead of recomputing the underlying joins and
aggregations, which reduces query execution time.
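The trade-off is paying a little on every write to save work on every read. The sketch below uses a hypothetical class, not Synapse's implementation. It keeps a SUM and COUNT per group up to date as rows arrive, and can be checked against a full recomputation.

```python
from collections import defaultdict

class SalesByRegionView:
    """Hypothetical materialized aggregate: SUM(amount) and COUNT(*) per region."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def on_insert(self, row):
        # Incremental maintenance: only the affected group is updated.
        self.total[row["region"]] += row["amount"]
        self.count[row["region"]] += 1

    def query(self, region):
        # Reads are lookups into the stored result; no base-table scan.
        return self.total[region], self.count[region]

def recompute(rows, region):
    """What a plain (non-materialized) view does on every query: scan and aggregate."""
    matching = [r["amount"] for r in rows if r["region"] == region]
    return sum(matching), len(matching)

base_table = []
view = SalesByRegionView()
for row in [{"region": "West", "amount": 10.0},
            {"region": "East", "amount": 5.0},
            {"region": "West", "amount": 2.5}]:
    base_table.append(row)
    view.on_insert(row)

print(view.query("West"))  # (12.5, 2)
```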
Columnstore Indexes
A type of index optimized for read-heavy workloads in
large datasets, providing significant compression and
performance improvements for analytical queries in
dedicated SQL pools.
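Much of the compression comes from storing each column separately. Values within one column tend to be similar and often repeated, so they encode compactly. The sketch applies two general techniques used by columnar engines, dictionary encoding and run-length encoding, to a single hypothetical column. It is illustrative only: the real columnstore format (rowgroups of up to 1,048,576 rows, compressed per-column segments) is more involved.

```python
def dictionary_encode(values):
    """Replace each value with a small integer id; return (dictionary, ids)."""
    dictionary, ids = {}, []
    for v in values:
        # setdefault assigns the next id only the first time a value is seen.
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), ids

def run_length_encode(ids):
    """Collapse consecutive repeats into [id, run_length] pairs."""
    runs = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return runs

def decode(dictionary, runs):
    """Expand runs back into the original column values."""
    return [dictionary[i] for i, n in runs for _ in range(n)]

# A hypothetical "region" column whose values are already grouped,
# which is when run-length encoding pays off most.
column = ["East"] * 4 + ["North"] * 3 + ["West"] * 5
dictionary, ids = dictionary_encode(column)
runs = run_length_encode(ids)
print(dictionary)  # ['East', 'North', 'West']
print(runs)        # [[0, 4], [1, 3], [2, 5]]
```

Twelve strings shrink to a three-entry dictionary plus three runs. Analytical queries also read only the columns they reference, which is the other half of the benefit.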
Data Encryption
Techniques to protect sensitive data within Synapse
Analytics, both at rest and in transit, using methods
such as Transparent Data Encryption (TDE) for data at
rest and TLS for data in transit.
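TDE is enabled on the service side. For data in transit, the client must request an encrypted connection and validate the server certificate. The sketch below assumes the Microsoft ODBC Driver 18 for SQL Server, and the workspace and database names are placeholders. It builds a connection string that requires encryption, but it only assembles the string and does not connect. Authentication settings are deliberately omitted.

```python
def synapse_odbc_connection_string(workspace, database, serverless=False):
    """Build an ODBC connection string that requires TLS and certificate validation.

    workspace and database are caller-supplied placeholders; authentication
    keywords are intentionally left out of this sketch.
    """
    # Dedicated SQL pools use <workspace>.sql.azuresynapse.net;
    # the serverless endpoint adds the "-ondemand" suffix.
    host = f"{workspace}-ondemand" if serverless else workspace
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": f"tcp:{host}.sql.azuresynapse.net,1433",
        "Database": database,
        "Encrypt": "yes",                # encrypt traffic in transit with TLS
        "TrustServerCertificate": "no",  # reject servers whose certificate can't be validated
        "Connection Timeout": "30",
    }
    return ";".join(f"{k}={v}" for k, v in parts.items()) + ";"

print(synapse_odbc_connection_string("myworkspace", "SalesDW"))
```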
Data Governance
Features and integrations that enable data cataloging,
classification, and governance within Synapse Analytics,
ensuring data is managed according to organizational
policies and regulatory requirements.
Streaming Data
Capabilities within Synapse Analytics to ingest, process,
and analyze real-time streaming data from sources like
IoT devices or Azure Event Hubs, allowing for timely
decision-making and analytics.
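A common real-time pattern is a windowed aggregate, such as event counts per device per minute. Below is a minimal tumbling-window sketch in plain Python. The event fields and the 60-second window are assumptions. A production job would use a streaming engine such as Spark Structured Streaming on a Spark pool, which also handles late and out-of-order events.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed tumbling-window size

def tumbling_window_counts(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window_start, device_id).

    Each event is a dict with an epoch-seconds 'ts' and a 'device_id'.
    Tumbling windows are fixed-size and non-overlapping, so every event
    falls into exactly one window.
    """
    counts = Counter()
    for e in events:
        window_start = e["ts"] - e["ts"] % window_seconds
        counts[(window_start, e["device_id"])] += 1
    return dict(counts)

events = [
    {"ts": 1000, "device_id": "sensor-1"},
    {"ts": 1019, "device_id": "sensor-1"},
    {"ts": 1021, "device_id": "sensor-2"},
    {"ts": 1085, "device_id": "sensor-1"},
]
print(tumbling_window_counts(events))
```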
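Cross-DW Query
The ability to query across multiple databases with T-SQL,
allowing for comprehensive analysis without needing to
move data between environments. In a serverless SQL pool,
databases in the same workspace can be referenced with
three-part names (database.schema.table); dedicated SQL
pools do not support cross-database queries.
-- Run in a serverless SQL pool; both databases belong to
-- the same workspace.
SELECT *
FROM [database1].[dbo].[table1]
UNION ALL
SELECT *
FROM [database2].[dbo].[table2];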
Workspace Management
Centralized management of Synapse Analytics
resources, including SQL pools, Spark pools, pipelines,
and linked services, all within a single workspace.
Linked Services
A configuration used within Synapse Analytics to define
the connection information for external data sources,
such as databases, data lakes, and other cloud services.
Synapse Roles
Role-based access control (RBAC) in Synapse Analytics
to manage permissions and access to resources,
ensuring only authorized users can access or modify
certain resources.
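At its core, an RBAC decision asks whether any role assigned to a principal, at a scope that covers the resource, grants the requested action. The sketch models that check with hypothetical role names, actions, and scopes. It does not reproduce the built-in Synapse roles or their exact permissions.

```python
# Hypothetical roles mapped to the actions they allow.
ROLE_ACTIONS = {
    "Reader": {"read"},
    "Developer": {"read", "write"},
    "Admin": {"read", "write", "grant"},
}

# Hypothetical assignments: (principal, role, scope). A scope covers itself
# and everything beneath it, e.g. "workspace" covers "workspace/sparkpools/pool1".
ASSIGNMENTS = [
    ("alice", "Admin", "workspace"),
    ("bob", "Developer", "workspace/sparkpools/pool1"),
]

def covers(scope, resource):
    """True if resource is the scope itself or nested beneath it."""
    return resource == scope or resource.startswith(scope + "/")

def is_allowed(principal, action, resource, assignments=ASSIGNMENTS):
    """True if some role assigned to the principal at a covering scope allows the action."""
    return any(
        p == principal and covers(scope, resource) and action in ROLE_ACTIONS[role]
        for p, role, scope in assignments
    )

print(is_allowed("bob", "write", "workspace/sparkpools/pool1"))  # True
print(is_allowed("bob", "write", "workspace/sqlpools/pool2"))    # False
```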
Assigning Roles:
Data Masking
A security feature that hides sensitive data in query
results, displaying masked values to users who do not
have the necessary permissions to view the original
data.
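In dedicated SQL pools this is dynamic data masking. It is declared per column with masking functions such as `default()`, `email()`, and `partial(prefix, padding, suffix)`. The stored data is unchanged: only query results for users without the UNMASK permission are masked. The sketch imitates two of those functions in Python to show what such a user would see. It is an approximation that does not model edge cases such as very short strings or non-string types.

```python
def mask_email(value):
    """Approximates email(): expose the first letter, then a constant masked form."""
    return value[:1] + "XX@XXXX.com"

def mask_partial(value, prefix, padding, suffix):
    """Approximates partial(prefix, padding, suffix): keep the first `prefix`
    and last `suffix` characters and replace the middle with `padding`."""
    tail = value[-suffix:] if suffix > 0 else ""
    return value[:prefix] + padding + tail

def query_result(value, can_unmask, masker):
    """Users with UNMASK see the real value; everyone else sees the mask."""
    return value if can_unmask else masker(value)

ssn_mask = lambda v: mask_partial(v, 0, "XXX-XX-", 4)
print(query_result("123-45-6789", False, ssn_mask))          # XXX-XX-6789
print(query_result("alice@contoso.com", False, mask_email))  # aXX@XXXX.com
print(query_result("123-45-6789", True, ssn_mask))           # 123-45-6789
```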
Elastic Query
Allows querying across multiple databases and Synapse
instances, providing the ability to execute distributed
queries that span different data sources within Synapse
Analytics.
Data Distribution
Strategies for distributing data across nodes in a
Synapse dedicated SQL pool to optimize performance
and resource utilization, including hash, round-robin,
and replicated distributions.
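A dedicated SQL pool spreads every table across 60 distributions. The sketch shows how each strategy places rows:

- Hash sends equal key values to the same distribution, which helps joins and aggregations on that key.
- Round-robin deals rows out evenly regardless of their content.
- Replicated keeps a full copy of the table on every compute node.

`zlib.crc32` is only a stand-in for Synapse's internal hash function, which this sketch does not reproduce.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # fixed number of distributions in a dedicated SQL pool

def hash_distribution(key):
    # Stand-in hash: equal keys always map to the same distribution.
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS

def round_robin_distribution(row_index):
    # Rows are dealt out in turn, ignoring their content.
    return row_index % DISTRIBUTIONS

def replicated_placement(row, compute_nodes):
    # Every compute node holds a full copy of the table.
    return {node: row for node in compute_nodes}

customer_ids = [101, 202, 101, 303, 101]
print([hash_distribution(c) for c in customer_ids])  # 101 always lands in the same distribution

rr = Counter(round_robin_distribution(i) for i in range(600))
print(min(rr.values()), max(rr.values()))  # 10 10: perfectly even
```

Hash distribution can skew if a few key values dominate. Round-robin avoids skew but usually forces data movement at join time. Replication suits small dimension tables.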
Classifying Data:
-- Attach a sensitivity label and information type to the SSN column.
-- The catalog view sys.sensitivity_classifications is read-only;
-- query it to review the result.
ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.SSN
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Sensitive');
Pipeline Parameters
Variables that are passed into Synapse Pipelines to
dynamically control the behavior of activities, allowing
for flexible, reusable pipeline designs.
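Inside a pipeline, parameters are referenced through the expression language. A parameter can be used on its own, as in `@pipeline().parameters.SourceFolder`, or embedded in a string with `@{...}` interpolation. The sketch resolves only the interpolated `@{pipeline().parameters.<name>}` form, using a regular expression, to show how one pipeline definition can serve many runs. The parameter names and path are hypothetical. The real expression language supports far more, including functions, variables, and activity outputs.

```python
import re

PARAM_REF = re.compile(r"@\{pipeline\(\)\.parameters\.(\w+)\}")

def resolve(template, parameters):
    """Replace @{pipeline().parameters.<name>} references with run-time values.

    Raises KeyError if the template references a parameter that was not supplied.
    """
    return PARAM_REF.sub(lambda m: str(parameters[m.group(1)]), template)

# One hypothetical source path, reused across runs with different parameters.
source_path = "raw/@{pipeline().parameters.SourceFolder}/@{pipeline().parameters.RunDate}/sales.csv"

print(resolve(source_path, {"SourceFolder": "emea", "RunDate": "2024-05-01"}))
# raw/emea/2024-05-01/sales.csv
print(resolve(source_path, {"SourceFolder": "apac", "RunDate": "2024-05-02"}))
# raw/apac/2024-05-02/sales.csv
```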
Isolating Workloads:
Data Auditing
The process of tracking and recording access to and
modifications of data within Synapse Analytics to
ensure compliance with security policies and regulatory
requirements.
Enabling Auditing: