What Is Apache Spark - Azure Synapse Analytics - Microsoft Docs
In this article
What is Apache Spark
Spark pool architecture
Apache Spark in Azure Synapse Analytics use cases
Where do I start
Next steps
https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-overview
4/26/22, 2:29 PM
What is Apache Spark

Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data in memory and query it repeatedly, which is much faster than disk-based processing. Spark also integrates with multiple programming languages so you can manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations.
Spark pools in Azure Synapse offer a fully managed Spark service. The benefits of
creating a Spark pool in Azure Synapse Analytics are listed here.
Speed and efficiency: Spark instances start in approximately 2 minutes for fewer than 60 nodes and in approximately 5 minutes for more than 60 nodes. By default, the instance shuts down 5 minutes after the last job executes unless it is kept alive by a notebook connection.

Ease of creation: You can create a new Spark pool in Azure Synapse in minutes using the Azure portal, Azure PowerShell, or the Synapse Analytics .NET SDK. See Get started with Spark pools in Azure Synapse Analytics.

Ease of use: Synapse Analytics includes a custom notebook derived from Nteract. You can use these notebooks for interactive data processing and visualization.

REST APIs: Spark in Azure Synapse Analytics includes Apache Livy, a REST-API-based Spark job server for remotely submitting and monitoring jobs.

Support for Azure Data Lake Storage Generation 2: Spark pools in Azure Synapse can use Azure Data Lake Storage Generation 2 as well as Blob storage. For more information on Data Lake Storage, see Overview of Azure Data Lake Storage.

Integration with third-party IDEs: Azure Synapse provides an IDE plugin for JetBrains' IntelliJ IDEA that is useful for creating and submitting applications to a Spark pool.

Pre-loaded Anaconda libraries: Spark pools in Azure Synapse come with Anaconda libraries pre-installed. Anaconda provides close to 200 libraries for machine learning, data analysis, visualization, and more.

Scalability: Apache Spark in Azure Synapse pools can have autoscale enabled, so that pools scale by adding or removing nodes as needed. Also, Spark pools can be shut down with no loss of data since all the data is stored in Azure Storage or Data Lake Storage.
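To make the Livy-based job submission concrete, here is a hedged sketch of building a batch submission body. The endpoint URL, storage path, and workspace and pool names are hypothetical placeholders; the `file`, `className`, and `args` payload fields follow the Apache Livy batches API. A real submission would also need an Azure AD bearer token.

```python
import json

# Hypothetical Synapse Livy endpoint; substitute your own workspace and pool.
LIVY_BATCHES_URL = (
    "https://myworkspace.dev.azuresynapse.net/livyApi/versions/"
    "2019-11-01-preview/sparkPools/mypool/batches"
)

def build_batch_payload(file, class_name=None, args=None):
    """Build the JSON body for a Livy POST /batches job submission."""
    payload = {"file": file}  # main definition file (JAR or .py) in storage
    if class_name is not None:
        payload["className"] = class_name  # entry class, for JAR submissions
    if args is not None:
        payload["args"] = args  # command-line arguments passed to the job
    return payload

payload = build_batch_payload(
    "abfss://jobs@myaccount.dfs.core.windows.net/wordcount.py",
    args=["--input", "data/"],
)
body = json.dumps(payload)  # POST this, with an Authorization header, to submit
```

After submitting, a client would poll the returned batch id for its state to monitor the job remotely.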
Spark pools in Azure Synapse include the following components that are available on the pools by default.

Spark Core. Includes Spark SQL, GraphX, and MLlib.
Anaconda
Apache Livy
Nteract notebook
Spark pool architecture

The SparkContext can connect to the cluster manager, which allocates resources across applications. The cluster manager is Apache Hadoop YARN. Once connected, Spark acquires executors on nodes in the pool, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
The SparkContext runs the user's main function and executes the various parallel
operations on the nodes. Then, the SparkContext collects the results of the operations.
The nodes read and write data from and to the file system. The nodes also cache transformed data in memory as Resilient Distributed Datasets (RDDs).
The SparkContext connects to the Spark pool and is responsible for converting an
application to a directed acyclic graph (DAG). The graph consists of individual tasks that
get executed within an executor process on the nodes. Each application gets its own
executor processes, which stay up for the duration of the whole application and run
tasks in multiple threads.
Apache Spark in Azure Synapse Analytics use cases

Machine Learning
Apache Spark comes with MLlib, a machine learning library built on top of Spark that you can use from a Spark pool in Azure Synapse Analytics. Spark pools in Azure Synapse Analytics also include Anaconda, a Python distribution with a variety of packages for data science, including machine learning. When combined with built-in support for notebooks, you have an environment for creating machine learning applications.
Where do I start
Use the following articles to learn more about Apache Spark in Azure Synapse Analytics:
Quickstart: Create a Spark pool in Azure Synapse
Quickstart: Create an Apache Spark notebook
Tutorial: Machine learning using Apache Spark
Apache Spark official documentation
Note

Some of the official Apache Spark documentation relies on the Spark console, which is not available in Azure Synapse Spark. Use the notebook or IntelliJ experiences instead.
Next steps
In this overview, you got a basic understanding of Apache Spark in Azure Synapse Analytics. Advance to the next article to learn how to create a Spark pool in Azure Synapse Analytics.
Recommended content
Quickstart: Create a serverless Apache Spark pool using web tools - Azure
Synapse Analytics
This quickstart shows how to use the web tools to create a serverless Apache Spark pool in
Azure Synapse Analytics and how to run a Spark SQL query.
Quickstart: Create a serverless Apache Spark pool using the Azure portal -
Azure Synapse Analytics
Create a serverless Apache Spark pool using the Azure portal by following the steps in this
guide.
Overview of how to use Linux Foundation Delta Lake in Apache Spark for
Azure Synapse Analytics - Azure Synapse Analytics
Learn how to use Delta Lake in Apache Spark for Azure Synapse Analytics to create and use tables with ACID properties.
Quickstart: Transform data using Apache Spark job definition - Azure Synapse
Analytics
This quickstart provides step-by-step instructions for using Azure Synapse Analytics to transform data with an Apache Spark job definition.