Lecture #12.1 - Apache Spark - Production Scenarios

Databricks is a commercial platform built on Apache Spark that provides automated cluster management and multi-language notebooks. It has become a popular way to use Spark in the cloud, growing in adoption in recent years, and offers a web-based interface for working with Spark clusters on the major cloud platforms (AWS, Azure, and GCP). While learning how to set up Databricks is not the focus of this lecture, videos are provided to demonstrate developing Spark applications, streaming use cases, and machine learning on Databricks, along with customer stories.


MODERN DATA ARCHITECTURES

FOR BIG DATA II


APACHE SPARK
PRODUCTION
SCENARIOS
Agenda

● Developing a Spark Application

● Databricks

● References
▹ Databricks Usage
▹ Customer Stories
▹ Optional Videos

2
1.
SPARK
APPLICATIONS
1.1
DEVELOPING
A SPARK
APPLICATION
Developing a Spark Application

So far we have played around with Spark by running Jupyter notebooks, which is perfectly valid for interactive analytics.

Spark applications are more than that, and oftentimes they are needed for critical and production jobs (ETL, streaming, advanced analytics, …).

5
Developing a Spark Application

We can create a PySpark application just by porting all the code we have written in the cells of a notebook into a regular Python application:

if __name__ == '__main__':
    from pyspark.sql import SparkSession

    # builder.getOrCreate() already returns a SparkSession
    spark = SparkSession.builder \
        .master("local") \
        .appName("Bikes") \
        .getOrCreate()

    # the underlying SparkContext is available if ever needed
    sc = spark.sparkContext

6
Developing a Spark Application

Example of a PySpark application in a regular Python file (.py):
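Since this slide originally showed the file as a screenshot, below is a minimal sketch of what such a .py file could look like. The file name (bikes_app.py), the input file (bikes.csv) and the column used in the aggregation (station_id) are illustrative assumptions, not part of the original example:

# bikes_app.py - minimal, self-contained PySpark application (illustrative sketch)
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

if __name__ == '__main__':
    # Create (or reuse) the SparkSession for this application
    spark = SparkSession.builder \
        .master("local") \
        .appName("Bikes") \
        .getOrCreate()

    # Hypothetical input file; replace with your own dataset
    bikes = spark.read.csv("bikes.csv", header=True, inferSchema=True)

    # A simple aggregation so the job has something to compute
    bikes.groupBy("station_id") \
         .agg(F.count("*").alias("trips")) \
         .show()

    spark.stop()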

7
Developing a Spark Application

The key thing to remember is that Spark applications are meant to run in a distributed way, that is, on a cluster of computers.

Spark provides a command line interface to launch and execute Spark applications on clusters.

This command is spark-submit

8
Developing a Spark Application

Once the Spark code is written, it's time to submit it for execution by using spark-submit:

osbdet@osbdet:~$ export PYSPARK_PYTHON=/usr/bin/python3
osbdet@osbdet:~$ $SPARK_HOME/bin/spark-submit --master local \
    --packages "graphframes:graphframes:0.8.0-spark3.0-s_2.12" \
    Bikes.py

9
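The example above runs the application with a local master. As a sketch only, assuming the application were submitted to a YARN cluster instead (the cluster manager and the resource sizes below are illustrative assumptions, not part of the original slides), the command could look like this:

osbdet@osbdet:~$ $SPARK_HOME/bin/spark-submit --master yarn \
    --deploy-mode cluster \
    --num-executors 4 \
    --executor-memory 4g \
    --executor-cores 2 \
    --packages "graphframes:graphframes:0.8.0-spark3.0-s_2.12" \
    Bikes.py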
Developing a Spark Application

10
2.
DATABRICKS
Databricks

Databricks is an American enterprise software company founded by the creators of Apache Spark.

Databricks develops a web-based platform for working with Spark that provides automated cluster management and multi-language notebooks.

It is a commercial product available on all the major cloud platforms (AWS, Azure and GCP).

12
Databricks

Its adoption has been growing in recent years, and right now it is the preferred way of using Apache Spark in the cloud.

13
Databricks

Since it is a cloud technology, we would need to register with a cloud vendor and provide a credit card.

To avoid unwanted charges in case you forget to stop a cloud service, and because the goal of this class is NOT learning how to set up and configure a Databricks environment in the cloud but how to use it, we will provide you with some videos.

14
Databricks

https://www.youtube.com/watch?v=5MC-RVfqnuY 15
Databricks

https://www.youtube.com/watch?v=js3MFxkDcL8 16
Databricks

https://www.youtube.com/watch?v=67DeQOWIA7c 17
Databricks

https://www.youtube.com/watch?v=xtHcZVroK8Y 18
3.
REFERENCES
3.1
DATABRICKS
USAGE
Databricks - the company behind Spark

21
Streaming use cases

22
Machine Learning and Big Data

23
3.2
CUSTOMER
STORIES
Recommendation Engine for Rue Gilt Groupe

25
Shell

26
Clearsense

27
3.3
OPTIONAL
VIDEOS
Databricks

If you are still interested in provisioning your own Databricks environment, you can follow these steps:

29
