Data Engineering 101: Azure Synapse Analytics

DATA ENGINEERING 101

Azure Synapse
Analytics

Shwetank Singh
GritSetGrow - GSGLearn.com

Azure Synapse Analytics

An integrated analytics service combining big data and
data warehousing, providing the ability to analyze data
across data lakes, data warehouses, and big data
systems.

Use Synapse Studio to create an end-to-end analytics
solution by integrating data from Azure Data Lake,
running transformations using Spark, and loading the
data into a dedicated SQL pool for reporting and analysis.

Dedicated SQL Pool

A provisioned data warehousing service offering
predictable performance with dedicated resources.
Provides massively parallel processing (MPP) for
large-scale analytics.

Creating a Dedicated SQL Pool:

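CREATE DATABASE mydedicatedsqlpool
(EDITION = 'DataWarehouse', SERVICE_OBJECTIVE =
'DW1000c');

Run against the master database of a logical server, this
script creates a dedicated SQL pool with a service
objective of DW1000c, which sets the pool's compute
level. In a Synapse workspace, dedicated SQL pools are
typically created from the portal, the CLI, or Synapse
Studio instead.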

Serverless SQL Pool

An on-demand, distributed query engine that allows
querying data stored in Azure Data Lake without
needing to provision any infrastructure.

Querying a CSV file in Azure Data Lake:

SELECT * FROM OPENROWSET( BULK
'https://mydatalake.blob.core.windows.net/data/sample.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0') AS [result];

This query reads data directly from a CSV file stored in
Azure Data Lake without provisioning a dedicated SQL
pool.

Spark Pool
An Apache Spark cluster integrated with Azure Synapse
for large-scale data processing and machine learning.
Allows running Spark jobs using Scala, Python, SQL,
and R.

Submitting a Spark Job in Synapse:

spark.sql(
    "SELECT col1, SUM(col2) AS total FROM my_table GROUP BY col1"
).show()

This PySpark code runs a simple aggregation query on a
table registered in the Spark catalog. The results are
displayed within the Synapse notebook environment.

Synapse Pipelines
An orchestration tool that allows for the creation,
scheduling, and monitoring of data workflows. Enables
integration of data sources and automation of data
movement and transformation tasks.

Building an ETL Pipeline:

1. Create a pipeline that extracts data from an
on-premises SQL Server using the "Copy Data" activity
(on-premises sources are reached through a self-hosted
integration runtime).
2. Apply transformations using the "Data Flow"
activity.
3. Load the transformed data into a dedicated SQL
pool or a data lake.
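
The ordering of the three steps above can be sketched in plain Python. In a pipeline definition, each activity lists the activities it waits for under "dependsOn"; the sketch below derives a run order from those entries. The activity names are hypothetical, and the sorter is only a stand-in for the pipeline engine, not its actual scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical activity names. "Copy" and "ExecuteDataFlow" are the activity
# types behind the "Copy Data" and "Data Flow" activities.
activities = [
    {"name": "CopyFromSqlServer", "type": "Copy", "dependsOn": []},
    {
        "name": "TransformSales",
        "type": "ExecuteDataFlow",
        "dependsOn": [
            {"activity": "CopyFromSqlServer", "dependencyConditions": ["Succeeded"]}
        ],
    },
    {
        "name": "LoadToSqlPool",
        "type": "Copy",
        "dependsOn": [
            {"activity": "TransformSales", "dependencyConditions": ["Succeeded"]}
        ],
    },
]


def execution_order(acts):
    """Return activity names in an order that respects every dependsOn edge."""
    # Map each activity to the set of activities it must wait for.
    graph = {a["name"]: {d["activity"] for d in a["dependsOn"]} for a in acts}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(activities))
```

Because each activity depends on the one before it, the only valid order is extract, transform, load; activities with no dependency between them could run in parallel.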

Synapse Studio
A unified interface that provides tools for data
exploration, transformation, integration, and
visualization. It lets you manage Synapse resources,
run SQL queries and Spark jobs, and build pipelines.

Running SQL and Spark Jobs:

Use Synapse Studio to open a SQL script and execute
a query like SELECT * FROM SalesData on a
dedicated SQL pool.

At the same time, you can open a notebook and run
PySpark code on a Spark pool to transform large
datasets.

Data Integration
Integration of various services such as Azure Data
Factory, Power BI, and Azure Machine Learning with
Synapse Analytics for comprehensive data processing
and analysis.

Integration with Azure Data Factory:

Create a pipeline in Azure Data Factory that ingests
data from multiple sources, processes it in Synapse
(e.g., using a Spark pool for transformation), and
then writes the results to a store that Power BI reads
for visualization and reporting.

Synapse Notebooks
Interactive notebooks that support multi-language code
execution, enabling data scientists and engineers to
explore data, build models, and collaborate on data-
driven projects.

Using Synapse Notebooks with PySpark:

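from pyspark.sql import SparkSession

# Synapse notebooks already provide `spark`; getOrCreate() reuses that session.
spark = SparkSession.builder.appName("MyApp").getOrCreate()

# Load the CSV file into a DataFrame and print the first rows.
df = spark.read.csv('/path/to/data.csv')
df.show()

This example shows how to load and display a CSV file
in Spark.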

Synapse Link
A feature that allows for real-time, operational
analytics by enabling seamless connectivity between
Azure Cosmos DB and Azure Synapse Analytics, without
ETL processes.

Real-time Analytics with Synapse Link:

Enable Synapse Link in Azure Cosmos DB to sync
operational data to the Cosmos DB analytical store,
which Azure Synapse Analytics can query in near real
time.

Use Synapse Studio to run analytics on this data as
soon as it arrives, without waiting for scheduled ETL
processes.

Data Flow
A visual, no-code interface within Synapse Pipelines
that allows for complex data transformations, such as
joins, aggregations, and data cleansing, directly within
the pipeline.

Creating a Data Flow:

1. Use the Data Flow designer to define
transformations: drag and drop steps like
"Source," "Aggregate," "Filter," and "Sink."

2. Set up a source to load data from an Azure Data
Lake, perform transformations, and then write the
output to a dedicated SQL pool.

Monitoring & Management

Provides monitoring tools within Synapse Studio for
tracking the performance and health of SQL queries,
pipelines, and Spark jobs. Includes features like alerts,
activity logs, and resource usage metrics.

Monitoring a Pipeline:

In Synapse Studio, navigate to the "Monitor" hub to
track the status and performance of running
pipelines and see detailed logs for each activity. Set
up alerts in Azure Monitor to notify you of pipeline
failures or performance issues.

Security & Compliance

Ensures data protection with built-in features like
Transparent Data Encryption (TDE), Virtual Network
Service Endpoints, and role-based access control
(RBAC). Compliance tools help meet regulatory
requirements.

Implementing Data Security:

Use TDE to encrypt all data in your dedicated SQL
pool:

ALTER DATABASE mydatabase SET ENCRYPTION ON;

Apply RBAC to restrict access to sensitive tables,
allowing only authorized users to query or modify
the data.

Workload Management
Techniques for managing and optimizing resource
allocation, query performance, and concurrency in
dedicated SQL pools. It includes features like workload
groups and resource classes.

Configuring Workload Management:

CREATE WORKLOAD GROUP high_priority WITH
(MIN_PERCENTAGE_RESOURCE = 25,
CAP_PERCENTAGE_RESOURCE = 100,
REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5,
IMPORTANCE = HIGH);

Use a workload classifier to assign a critical workload
to the "high_priority" group so that it gets the
necessary resources and runs efficiently, even during
peak usage.
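
What the reserved percentage buys can be worked out with simple arithmetic. The sketch below assumes the group also sets REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5 and CAP_PERCENTAGE_RESOURCE = 100 (illustrative values); the function names are hypothetical helpers, not part of any Synapse API.

```python
def guaranteed_concurrency(min_percentage_resource, request_min_grant_percent):
    """Requests the group can always run at once from its reserved share."""
    return int(min_percentage_resource // request_min_grant_percent)


def max_concurrency(cap_percentage_resource, request_min_grant_percent):
    """Upper bound on concurrent requests when the group grows to its cap."""
    return int(cap_percentage_resource // request_min_grant_percent)


# Illustrative settings: 25% reserved, 100% cap, 5% granted per request.
print(guaranteed_concurrency(25, 5))  # 5
print(max_concurrency(100, 5))        # 20
```

With these values the group is guaranteed room for 5 concurrent requests and can reach 20 when the rest of the pool is idle.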

Data Lake Integration

Seamlessly integrates with Azure Data Lake Storage
(ADLS) to provide a scalable, secure, and cost-effective
solution for storing and analyzing large datasets in a
distributed environment.

Querying Data in ADLS:

SELECT * FROM OPENROWSET( BULK
'https://mydatalake.blob.core.windows.net/data/transactions.parquet',
FORMAT='PARQUET') AS [result];

Query Parquet files stored in ADLS using serverless
SQL without moving data into a dedicated SQL pool.

Data Partitioning
Dividing large tables into smaller, more manageable
partitions to improve query performance and
scalability. Typically used in dedicated SQL pools to
handle large datasets efficiently.

Implementing Partitioning:

CREATE TABLE Sales ([Date] DATE, Amount FLOAT) WITH
(DISTRIBUTION = ROUND_ROBIN,
PARTITION ([Date] RANGE RIGHT FOR VALUES
('2021-01-01', '2022-01-01')));

This script creates a table partitioned by date. With
RANGE RIGHT, each boundary date belongs to the
partition on its right, giving three partitions: before
2021, 2021, and 2022 onward.
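
To see which partition a row lands in, the boundary lookup can be sketched in Python. This assumes the two boundary dates from the script and RANGE RIGHT semantics, where a boundary value falls in the partition to its right (the same split as "less than" each boundary). The function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Boundary values taken from the script above.
boundaries = [date(2021, 1, 1), date(2022, 1, 1)]


def partition_number(value, bounds=boundaries):
    """1-based partition number; a boundary value goes to the right-hand partition."""
    return bisect_right(bounds, value) + 1


print(partition_number(date(2020, 12, 31)))  # 1
print(partition_number(date(2021, 1, 1)))    # 2
print(partition_number(date(2022, 6, 1)))    # 3
```

Two boundaries always produce three partitions, so rows dated 2022 or later still have a home.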

PolyBase
A technology that enables the querying of external data
stored in sources like Hadoop or Azure Blob Storage as
if it were within a relational database.

Querying External Data with PolyBase:

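CREATE EXTERNAL TABLE myexternaldata (id INT, name
VARCHAR(50)) WITH ( LOCATION = '/data/', DATA_SOURCE =
my_blob_storage, FILE_FORMAT = myfileformat);

This script creates an external table pointing to data
in Blob Storage. LOCATION is a path relative to the
external data source; the storage URL itself belongs in
the definition of my_blob_storage, which, like
myfileformat, must already exist.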

Materialized Views
Precomputed views that store the results of a query
physically, allowing for faster retrieval times and
reduced query execution time by eliminating the need
to recompute results.

Creating a Materialized View:

CREATE MATERIALIZED VIEW mv_sales
WITH (DISTRIBUTION = HASH(product_id))
AS
SELECT product_id, SUM(sales_amount) AS total_sales
FROM dbo.Sales GROUP BY product_id;

This script creates a materialized view that
precomputes total sales by product, speeding up
future queries on this data. In a dedicated SQL pool
the WITH clause must name a distribution option.
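
The idea of precomputing can be sketched in a few lines of Python: aggregate once, store the result, and answer later questions by lookup. The sample rows and function names are made up for illustration, and the sketch does not model how the service keeps a view in sync with its base table.

```python
from collections import defaultdict

# Hypothetical (product_id, sales_amount) rows standing in for the Sales table.
sales = [("p1", 10.0), ("p2", 5.0), ("p1", 2.5)]


def build_view(rows):
    """Aggregate once and keep the result, as a materialized view does."""
    totals = defaultdict(float)
    for product_id, amount in rows:
        totals[product_id] += amount
    return dict(totals)


mv_sales = build_view(sales)  # computed once and stored


def total_sales(product_id):
    """Answer from the stored result instead of re-scanning every row."""
    return mv_sales[product_id]


print(total_sales("p1"))  # 12.5
```

The trade-off is storage and upkeep in exchange for cheaper reads, which pays off when the same aggregate is queried often.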

Columnstore Indexes
A type of index optimized for read-heavy workloads in
large datasets, providing significant compression and
performance improvements for analytical queries in
dedicated SQL pools.

Creating a Columnstore Index:

CREATE CLUSTERED COLUMNSTORE INDEX idx_sales ON Sales;

This script creates a columnstore index on the Sales
table, improving query performance for large-scale
analytical queries by compressing data and
reducing I/O.
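
Why storing data by column compresses well can be shown with run-length encoding, one simple technique that suits columns with repeated values. This is a simplified illustration, not a description of the encodings Synapse actually uses; the sample rows are made up.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


# Hypothetical (region, amount) rows; a columnstore keeps each column together.
rows = [("EU", 10), ("EU", 12), ("EU", 9), ("US", 20), ("US", 21)]
region_column = [region for region, _ in rows]

encoded = run_length_encode(region_column)
print(encoded)  # [('EU', 3), ('US', 2)]
```

Five stored values shrink to two pairs, and a query that only needs the region column never has to read the amounts.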

Data Encryption
Techniques to protect sensitive data within Synapse
Analytics, both at rest and in transit, using encryption
methods like TDE and SSL/TLS.

Encrypting Data at Rest:

ALTER DATABASE mydatabase SET ENCRYPTION ON;

Enable TDE to ensure that all data in your dedicated
SQL pool is encrypted at rest, providing an
additional layer of security for sensitive information.

Scaling and Performance

Methods and tools to scale Synapse resources up or
down based on demand, optimize query performance
through indexing, partitioning, and statistics
management, and ensure efficient use of resources.

Scaling a Dedicated SQL Pool:

ALTER DATABASE mydedicatedsqlpool MODIFY
(SERVICE_OBJECTIVE = 'DW2000c');

This script scales up a dedicated SQL pool to
DW2000c, providing more resources for better
performance during high-demand periods.
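
One way to picture what scaling changes: a dedicated SQL pool always splits a table into 60 distributions, and a higher service level spreads them over more compute nodes. The sketch below assumes 2 compute nodes at DW1000c and 4 at DW2000c, figures recalled from the service documentation that should be checked against the current capacity limits.

```python
TOTAL_DISTRIBUTIONS = 60

# Assumed compute node counts per service level; verify against current docs.
COMPUTE_NODES = {"DW1000c": 2, "DW2000c": 4}


def distributions_per_node(service_objective):
    """How many of the 60 distributions each compute node handles."""
    return TOTAL_DISTRIBUTIONS // COMPUTE_NODES[service_objective]


print(distributions_per_node("DW1000c"))  # 30
print(distributions_per_node("DW2000c"))  # 15
```

Under these assumptions, doubling the service level halves the data each node has to process for a given query.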

Data Governance
Features and integrations that enable data cataloging,
classification, and governance within Synapse Analytics,
ensuring data is managed according to organizational
policies and regulatory requirements.

Implementing Data Governance with Azure Purview
(now Microsoft Purview):

Integrate Purview with Synapse Analytics to
automatically catalog and classify data assets.
Apply data masking policies to protect sensitive
information based on classification.

Integration with Power BI

Allows seamless connectivity between Synapse
Analytics and Power BI for data visualization and
reporting, enabling real-time analytics and sharing
insights across the organization.

Connecting Power BI to Synapse:

Use the "DirectQuery" mode in Power BI to connect
to a Synapse dedicated SQL pool.

Because DirectQuery sends queries to Synapse when a
report is viewed, dashboards reflect current data
without a scheduled import, enabling up-to-the-minute
reporting and analytics for business users.

Streaming Data
Capabilities within Synapse Analytics to ingest, process,
and analyze real-time streaming data from sources like
IoT devices or event hubs, allowing for timely decision-
making and analytics.

Processing Streaming Data:

Use Azure Stream Analytics to ingest data from an
IoT hub, process the data in real time, and output
the results to an Azure Synapse dedicated SQL pool
for further analysis and reporting.

Use Synapse notebooks for advanced analytics on
the processed data.

Cross-DW Query
The ability to query more than one database with a
single T-SQL statement, allowing for comprehensive
analysis without moving data between databases.
Serverless SQL pools support this within a workspace
through three-part names. Dedicated SQL pools do not
support cross-database queries, and four-part names
that reference another workspace are not supported.

Executing a Cross-DW Query:

SELECT *
FROM [database1].[dbo].[table1]
UNION ALL
SELECT *
FROM [database2].[dbo].[table2];

Run on the serverless SQL pool, this query pulls data
from two databases in the same workspace and
combines it into a single result set for analysis.

Workspace Management
Centralized management of Synapse Analytics
resources, including SQL pools, Spark pools, pipelines,
and linked services, all within a single workspace.

Creating a New Workspace:

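Use the Azure portal or CLI to create a new Synapse
workspace. The CLI also requires the Data Lake Storage
account and file system that back the workspace
(placeholder names here):

az synapse workspace create --name myWorkspace \
  --resource-group myResourceGroup --location eastus \
  --storage-account myStorageAccount --file-system myFileSystem \
  --sql-admin-login-user myAdmin \
  --sql-admin-login-password myPassword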

Linked Services
A configuration used within Synapse Analytics to define
the connection information for external data sources,
such as databases, data lakes, and other cloud services.

Creating a Linked Service:

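Use Synapse Studio (Manage hub, Linked services) to
create a linked service to Azure Data Lake Storage.
For T-SQL access to the same storage, the counterpart
is a database scoped credential:

CREATE DATABASE SCOPED CREDENTIAL myCredential
WITH IDENTITY = 'myIdentity', SECRET = 'mySecret';

Link this credential to an external data source.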
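
For comparison, a linked service itself is stored as a small JSON definition. The sketch below builds one for Azure Data Lake Storage Gen2 in Python; the service name and account URL are hypothetical, and the assumption is that no credential is listed because the workspace managed identity is used for access.

```python
import json

# Hypothetical linked service definition. "AzureBlobFS" is the type used for
# Azure Data Lake Storage Gen2; the name and URL are placeholders.
linked_service = {
    "name": "MyDataLake",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://mydatalake.dfs.core.windows.net"
        },
    },
}

rendered = json.dumps(linked_service, indent=2)
print(rendered)
```

Pipelines and notebooks refer to the linked service by name, so the connection details live in one place.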

Synapse Roles
Role-based access control (RBAC) in Synapse Analytics
to manage permissions and access to resources,
ensuring only authorized users can access or modify
certain resources.

Assigning Roles:

GRANT CONTROL ON DATABASE::[myDatabase]
TO [myUser];

This script grants the CONTROL permission to a specific
user, giving them full access to the database within
the Synapse workspace.

Data Masking
A security feature that hides sensitive data in query
results, displaying masked values to users who do not
have the necessary permissions to view the original
data.

Applying Data Masking:

ALTER TABLE Employees
ALTER COLUMN SSN ADD MASKED
WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');

This script masks the Social Security Number (SSN)
column in the Employees table, showing only the last
4 digits to unauthorized users.
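
The three arguments of partial() are the number of leading characters to expose, the padding string, and the number of trailing characters to expose. A small Python emulation (a hypothetical helper, not the database engine's implementation) shows why a prefix of 0 is what leaves only the last four digits visible, while a prefix of 1 would also expose the first digit.

```python
def partial_mask(value, prefix, padding, suffix):
    """Emulate partial(prefix, padding, suffix) on a string value."""
    head = value[:prefix]
    tail = value[-suffix:] if suffix > 0 else ""
    return head + padding + tail


ssn = "123-45-6789"  # made-up sample value
print(partial_mask(ssn, 0, "XXX-XX-", 4))  # XXX-XX-6789
print(partial_mask(ssn, 1, "XXX-XX-", 4))  # 1XXX-XX-6789
```

The sketch ignores edge cases such as values shorter than the mask.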

Azure Synapse RBAC

Role-Based Access Control (RBAC) specific to Azure
Synapse, which helps in managing permissions and
access at different levels of the Synapse environment.

Assigning a Role to a User:

az synapse role assignment create --workspace-name myWorkspace \
  --role "Synapse Contributor" --assignee "[email protected]"

This command assigns the Synapse Contributor role
to a specific user in the specified Synapse
workspace.

Elastic Query
An Azure SQL Database feature that sends T-SQL
queries to remote databases, including a Synapse
dedicated SQL pool, through an external data source,
so that one query can span different data sources.

Executing an Elastic Query:

EXEC sp_execute_remote N'RemoteDbServer',
N'SELECT * FROM dbo.[Table]';

Here RemoteDbServer is the name of an external data
source defined in the calling database. The statement
runs on the remote database that data source points
to and returns its rows to the caller.

SQL On-Demand Queries

Queries executed on demand using serverless SQL
pools (formerly called SQL on-demand) in Synapse
Analytics, allowing for flexible, scalable querying
without the need for dedicated infrastructure.

Running an On-Demand Query:

SELECT * FROM OPENROWSET(BULK
'https://mystorageaccount.blob.core.windows.net/data/file.csv',
FORMAT='CSV', PARSER_VERSION = '2.0') AS data;

This query reads data from a CSV file in Azure Data
Lake using the serverless SQL pool, without requiring
any dedicated resources.

Data Distribution
Strategies for distributing data across nodes in a
Synapse dedicated SQL pool to optimize performance
and resource utilization, including hash, round-robin,
and replicated distributions.

Creating a Table with Hash Distribution:

CREATE TABLE Sales (ProductID INT, SalesAmount
DECIMAL(10,2)) WITH (DISTRIBUTION =
HASH(ProductID));

This script creates a table with hash distribution
based on the ProductID column. Rows with the same
ProductID land on the same distribution, which reduces
data movement for joins and aggregations on that
column in large datasets.
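
The difference between hash and round-robin placement can be sketched in Python. A dedicated SQL pool spreads each table over 60 distributions; the CRC32 hash below is only a stand-in for the service's internal hash function, and the product IDs are made up.

```python
import zlib

DISTRIBUTIONS = 60


def hash_distribution(key):
    """Pick a distribution from the key, so equal keys always land together."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS


def round_robin_distribution(row_index):
    """Pick a distribution from arrival order, ignoring the row's content."""
    return row_index % DISTRIBUTIONS


product_ids = [101, 202, 101, 303, 101, 202]  # hypothetical ProductID values
hash_placement = [hash_distribution(p) for p in product_ids]
rr_placement = [round_robin_distribution(i) for i in range(len(product_ids))]

# All rows for product 101 share one distribution under hash placement.
print(hash_placement[0] == hash_placement[2] == hash_placement[4])  # True
print(rr_placement)  # [0, 1, 2, 3, 4, 5]
```

Round-robin spreads rows evenly but scatters each product across distributions, so a GROUP BY on ProductID has to move data first; hash placement keeps each product local at the risk of skew when a few keys dominate.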

Azure Key Vault Integration

Integration with Azure Key Vault to securely manage
and store secrets, keys, and certificates used by
Synapse Analytics for encryption, authentication, and
secure data access.

Accessing Secrets from Key Vault:

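CREATE DATABASE SCOPED CREDENTIAL myCredential
WITH IDENTITY = 'Managed Identity';

This credential authenticates with the workspace
managed identity, so no secret is stored in the
database. T-SQL cannot read a Key Vault secret through
a subquery; when a secret is unavoidable, keep it in
Key Vault and reference it through a Key Vault linked
service from pipelines and notebooks.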

SQL Data Discovery & Classification

A feature that identifies, classifies, and labels sensitive
data within a Synapse dedicated SQL pool, helping to
comply with data protection regulations and policies.

Classifying Data:

ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.SSN
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Sensitive');

This script labels the SSN column as confidential.
Existing classifications can be read from the
sys.sensitivity_classifications catalog view, which
cannot be updated directly.

Pipeline Parameters
Variables that are passed into Synapse Pipelines to
dynamically control the behavior of activities, allowing
for flexible, reusable pipeline designs.

Using Pipeline Parameters:

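Define a parameter named inputFileName in the pipeline
and reference its value as:

@pipeline().parameters.inputFileName

Use it in a Copy Data activity to set the source file
name dynamically: pass the value to a dataset parameter
(assumed here to be called fileName) and build the path
inside the dataset with concat(), since the expression
language has no + operator for strings:

@concat(dataset().path, '/', dataset().fileName)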
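
How the source path resolves at run time can be mimicked in Python. The concat() function below is a stand-in for the expression language's string function, and the parameter values are hypothetical run-time inputs.

```python
def concat(*parts):
    """Stand-in for the expression language's concat() string function."""
    return "".join(str(p) for p in parts)


# Hypothetical values supplied when the pipeline run is triggered.
pipeline_parameters = {"inputFileName": "sales_2024.csv"}
dataset_parameters = {"path": "raw/sales"}

source_file = concat(
    dataset_parameters["path"], "/", pipeline_parameters["inputFileName"]
)
print(source_file)  # raw/sales/sales_2024.csv
```

Triggering the same pipeline with a different inputFileName value changes the source file without editing the pipeline.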

Synapse SQL Pool Resource Classes

Predefined resource allocations for queries in a
Synapse dedicated SQL pool, helping to control the
amount of memory and CPU used by different queries
based on their importance and resource needs.

Assigning a Resource Class:

EXEC sp_addrolemember 'xlargerc', 'myUser';

This command assigns the "xlargerc" resource class
to a user, giving them access to more memory and
CPU resources for running large queries in the
dedicated SQL pool, at the cost of fewer queries
running concurrently.

Synapse SQL Pool Workload Isolation

Techniques for isolating workloads in Synapse SQL
pools to ensure that high-priority queries are not
impacted by lower-priority workloads, using workload
groups and classification rules.

Isolating Workloads:

CREATE WORKLOAD GROUP high_priority
WITH (MIN_PERCENTAGE_RESOURCE = 50,
CAP_PERCENTAGE_RESOURCE = 100,
REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);

CREATE WORKLOAD CLASSIFIER high_priority_classifier WITH
(WORKLOAD_GROUP = 'high_priority', MEMBERNAME = 'myUser');

This script creates a workload group that reserves 50%
of the pool's resources and routes requests from a
specific user to it.

Data Auditing
The process of tracking and recording access to and
modifications of data within Synapse Analytics to
ensure compliance with security policies and regulatory
requirements.

Enabling Auditing:

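Auditing for a dedicated SQL pool is configured on the
pool (portal, PowerShell, CLI, or REST API) rather than
through a T-SQL ALTER DATABASE statement. For example,
with the Azure CLI (resource names are placeholders):

az synapse sql pool audit-policy update --name mySqlPool \
  --workspace-name myWorkspace --resource-group myResourceGroup \
  --state Enabled --blob-storage-target-state Enabled \
  --storage-account myStorageAccount \
  --actions SCHEMA_OBJECT_ACCESS_GROUP

This command enables auditing of schema object
access actions within a Synapse dedicated SQL pool,
recording access events for security and compliance
purposes.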

Azure Synapse Hybrid Tables

A table design that pairs a clustered columnstore
index with nonclustered rowstore indexes on the same
table, so that analytical scans and selective lookups
both perform well in Synapse dedicated SQL pools.

Creating a Hybrid Table:

CREATE TABLE Orders (OrderID INT, OrderDate
DATETIME) WITH (DISTRIBUTION = HASH(OrderID),
CLUSTERED COLUMNSTORE INDEX);

CREATE INDEX ix_orders_orderdate ON Orders (OrderDate);

This script creates a columnstore table and adds a
rowstore index on OrderDate. A table can have only one
clustered index, so the rowstore index is nonclustered.

Query Performance Tuning

Techniques and tools for optimizing the performance of
queries in Synapse Analytics, including index
optimization, query rewriting, and statistics
management.

Tuning a Query:

UPDATE STATISTICS Sales;

Run this command to update statistics on the Sales
table, improving query performance by ensuring the
query optimizer has accurate information about the
data distribution.

Use Query Store to monitor and analyze query
performance.

PolyBase External Tables

A feature that allows creating external tables in
Synapse Analytics that reference data stored in external
sources, enabling seamless integration and querying of
external data alongside local data.

Creating an External Table with PolyBase:

CREATE EXTERNAL TABLE myExternalTable
(ID INT, Name VARCHAR(50))
WITH (LOCATION = '/data/',
DATA_SOURCE = myDataSource,
FILE_FORMAT = myFileFormat);

This script creates an external table over the /data/
folder of myDataSource. The external data source, not
the table, holds the storage address (for example an
hdfs:// or abfss:// URL).

Synapse Query Store

A feature that captures query performance data within
a Synapse dedicated SQL pool, allowing for monitoring
and tuning of query performance over time.

Using Query Store:

Enable Query Store:

ALTER DATABASE myDatabase SET QUERY_STORE = ON;

Query the Query Store catalog views (for example
sys.query_store_runtime_stats) to review query
performance data, identify slow-running queries, and
make adjustments to improve performance based on
historical data captured by Query Store.

Synapse Auto Scaling

A feature that automatically adjusts compute
resources to the workload, balancing performance and
cost. Serverless SQL pools scale on their own with
nothing to configure; Apache Spark pools offer an
autoscale setting with node limits.

Enabling Auto Scaling:

Use the Azure portal or Synapse Studio to enable
autoscale on an Apache Spark pool:

Set the minimum and maximum number of nodes based
on expected workloads, and let the system
automatically add or remove nodes as needed
to handle changes in demand.
