
Spark and Scala - Module 1

The document outlines a course on Apache Spark and Scala, detailing various modules covering topics from introduction to Scala, big data concepts, and Spark basics. It emphasizes the importance of Scala in big data technologies and provides insights into its features, frameworks, and community. The document also includes practical guidance on software installation and hands-on exercises with Scala.

Apache Spark and Scala

Module 1: Getting Started / Introduction to Scala

© 2015 BlueCamphor Technologies (P) Ltd.


Course Topics

ᗍ Module 1: Getting Started / Introduction to Scala
ᗍ Module 2: Scala – Essentials and Deep Dive
ᗍ Module 3: Introducing Traits and OOPS in Scala
ᗍ Module 4: Functional Programming in Scala
ᗍ Module 5: Spark and Big Data Concepts
ᗍ Module 6: Advanced Spark
ᗍ Module 7: Understanding RDDs
ᗍ Module 8: Shark, SparkSQL and Project Discussion

© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Slide 2


Session Objectives
This session will help you to understand:

ᗍ Big Data
ᗍ IBM’s Big Data Definition
ᗍ Some Big Data Examples
ᗍ Spark Basics
ᗍ Why Spark ?
ᗍ Spark Components
ᗍ Scala Basics
ᗍ Why Scala ?
ᗍ Scala Job Trends
ᗍ Users of Scala
ᗍ Scala Frameworks
ᗍ Scala Usage
ᗍ Software Installation
ᗍ Scala Hands-on
ᗍ Scala community



Introduction to Big Data

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate

The challenges of big data include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy



IBM’s Big Data Definition

ᗍ IBM’s Definition – Big Data Characteristics
ᗍ http://www.ibmbigdatahub.com/infographic/four-vs-big-data/

The four Vs of Big Data:

ᗍ Volume: petabytes per day/week
ᗍ Variety: unstructured data, web logs, audio, video, image
ᗍ Velocity: real-time capture and real-time analytics
ᗍ Veracity: uncertainty of data



Some Big Data Examples
ᗍ NYSE broadcasts several levels of data, including trade prices and sizes
ᗍ NYSE Technologies receives four to five terabytes of data a day, which is used for complex analytics, market surveillance, capacity planning and monitoring
ᗍ NYSE generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine trends for optimal trades



Check your Understanding – 1
Which of the following are candidates for a Big Data solution?

a) Processing 1.5 TB of data every day
b) Processing 30 minutes of flight sensor data
c) Interconnecting 50K data points (approx. 1 MB input file)
d) Processing user clicks on a website



Introduction to Spark



Check your Understanding – Solution
Which of the following are candidates for a Big Data solution?

a) Processing 1.5 TB of data every day
b) Processing 30 minutes of flight sensor data
c) Interconnecting 50K data points (approx. 1 MB input file)
d) Processing user clicks on a website

All of the options are candidates for a Big Data solution. Even if the input size of the problem is small, the processing involved can turn the scenario into a Big Data problem



Spark Basics

ᗍ Apache Spark is a general-purpose, in-memory cluster computing system used for data analytics

ᗍ It provides high-level APIs in Java, Scala and Python and an optimized engine that supports general execution graphs

ᗍ Apache Spark provides various high-level tools, such as Spark SQL for structured data processing, R language support for analyzing large datasets, and MLlib for machine learning
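To give a feel for the functional style that these high-level APIs encourage, here is a minimal word-count sketch. It is an illustration only: it uses plain Scala collections rather than Spark itself, and the sample lines are made-up data. Spark's Scala API offers operators with similar names (such as flatMap and map) that apply the same idea to data spread across a cluster.

```scala
// Plain Scala collections only; no Spark dependency is needed to run this.
// The sample lines are made-up data for illustration.
val lines = List(
  "spark runs in memory",
  "scala runs on the jvm",
  "spark is written in scala"
)

// Split each line into words, group identical words, and count each group.
val wordCounts: Map[String, Int] =
  lines
    .flatMap(_.split(" "))
    .groupBy(identity)
    .map { case (word, occurrences) => (word, occurrences.size) }

println(wordCounts("spark")) // how many times "spark" appears in the sample
```

The whole computation is a chain of transformations with no loops or mutable counters, which is the style the later modules build on.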



Spark Basics (Cont’d)

The Spark framework is polyglot: it can be programmed in several programming languages (Java, Scala, Python and R 3.2.2 are supported)



Why Spark?

ᗍ Speed: Run programs up to 100x faster than Hadoop MapReduce in memory
ᗍ Generality: Combine SQL, streaming and complex analytics into one platform
ᗍ Runs Everywhere: Spark runs on Hadoop, on Mesos, standalone or in the cloud



Spark Components

Libraries built on Spark Core:

ᗍ Spark SQL: structured data
ᗍ Spark Streaming: real-time
ᗍ MLlib: machine learning
ᗍ GraphX: graph processing

Spark Core

Cluster managers: Standalone Scheduler, YARN, Mesos



Spark Components (Cont’d)
ᗍ Spark SQL: Used for structured data. Can run unmodified Hive queries on an existing Hadoop deployment
ᗍ Spark Streaming: Enables analytical and interactive apps for live streaming data
ᗍ MLlib (Machine Learning): Machine learning library being built on top of Spark. Provides support for many machine learning algorithms, with speeds up to 100 times faster than MapReduce
ᗍ GraphX (graph): Graph computation engine (Similar to Graph)

Spark Core

ᗍ Cluster managers: Standalone Scheduler, YARN, Mesos
ᗍ The Spark YARN client controls how many executors it will allocate on the cluster





Introduction to Scala

Martin Odersky and his team started developing Scala in 2001



Introduction to Scala (Cont’d)
ᗍ Scala is a general-purpose, multi-paradigm programming language: object-oriented, functional and scalable

ᗍ Aimed to implement common programming patterns in a concise, elegant, and type-safe way

ᗍ Supports both object-oriented and functional programming styles, thus helping programmers to be more
productive

ᗍ Publicly released in January 2004 on the JVM platform and a few months later on the .NET platform



Introduction to Scala (Cont’d)
Scala is Statically Typed
ᗍ A statically typed language binds the type to a variable for its entire scope

ᗍ Dynamically typed languages bind the type to the actual value referenced by a variable (for example, Python)

ᗍ Fully supports Object Oriented Programming
ᗍ Everything is an object in Scala
ᗍ Unlike Java, Scala does not have primitives
ᗍ Supports “static” class members through the Singleton Object concept
ᗍ Improved support for OOP through Traits
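The points above can be tried out with a short sketch that runs in the REPL or as a script. The names Counter, Greeter and Student, and the sample values, are hypothetical and chosen only for illustration.

```scala
// Static typing: `count` is inferred as Int and keeps that type for its whole scope.
var count = 10
count = 20            // fine: still an Int
// count = "twenty"   // would not compile: type mismatch

// Everything is an object: operators are method calls on values.
val sum = (1).+(2)          // same as 1 + 2
val text = 42.toString      // calling a method on an Int value

// Singleton object: plays the role of Java's static members.
object Counter {
  private var current = 0
  def next(): Int = { current += 1; current }
}

// Trait: reusable behaviour that a class can mix in.
trait Greeter {
  def name: String
  def greet(): String = "Hello, " + name
}

class Student(val name: String) extends Greeter

val first = Counter.next()
val second = Counter.next()
val greeting = new Student("Asha").greet()
```

Counter is never instantiated with new: there is exactly one instance, which is why it can stand in for static members.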



Why Scala?
ᗍ Scala is a pure object-oriented language. Conceptually, every value is an object and every operation is a method call

ᗍ Scala is also a functional language and supports immutable data structures

ᗍ Many big data technologies, such as Spark, Kafka, Storm, Akka and Scalding, and web frameworks such as Play, use Scala
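As a small illustration of immutable data structures, the sketch below uses Scala's standard List. The values are arbitrary examples.

```scala
// An immutable list: "adding" an element returns a new list.
val original = List(1, 2, 3)
val extended = 0 :: original      // prepend; `original` is unchanged

// Higher-order functions transform data without mutating it.
val doubled = original.map(_ * 2)
val evens = extended.filter(_ % 2 == 0)

println(original)  // still the three original elements
```

Because no operation changes original in place, the same list can be shared safely between different parts of a program.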



Why Scala? (Cont’d)
Scala code compared to Java code

Java code:

import java.util.ArrayList;
import java.util.List;

List<String> list = new ArrayList<String>();
list.add("1");
list.add("2");
list.add("3");

Scala code:

val list = List("1", "2", "3")
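Continuing the Scala side of the comparison, a short sketch shows how type inference keeps follow-up operations just as brief. The names numbers, total and joined are illustrative. Note also that the Scala List here is immutable, whereas the Java ArrayList is mutable.

```scala
val list = List("1", "2", "3")       // inferred as List[String]
val numbers = list.map(_.toInt)      // inferred as List[Int]
val total = numbers.sum              // adds up the converted numbers
val joined = list.mkString(", ")     // joins the strings with a separator
```

No type annotations are written, yet every value has a fixed type that the compiler checks.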



Scala – Job Trends



Scala Users



Scala Frameworks

Play – For Web Development

Play is a high-productivity Java and Scala web application framework that integrates the components and APIs you need for modern web application development

Scalding – For Map/Reduce

Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details

Akka – Actors Based Framework

Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant applications on the JVM. Akka is written in Scala



Scala Frameworks (Cont’d)

Spark – In-memory Processing

Apache Spark is a general-purpose, in-memory cluster computing system. It is used for fast data analytics, abstracts APIs in Java, Scala and Python, and provides an optimized engine that supports general execution graphs

Apache Kafka

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log



Scala Usage

ᗍ Scripting
ᗍ Web Applications
ᗍ Messaging
ᗍ Mobile Android Apps
ᗍ Domain-Specific Languages (DSL)
ᗍ GUI (Graphical User Interface)



Check your Understanding – 2
Which features are supported by Scala?

a) Less error-prone functional style
b) High maintainability and productivity
c) High scalability
d) High testability
e) Provides features of concurrent programming



Check your Understanding – Solution
Which features are supported by Scala?

a) Less error-prone functional style
b) High maintainability and productivity
c) High scalability
d) High testability
e) Provides features of concurrent programming

All of these are features of Scala



Software Installation
ᗍ The latest version can be downloaded from: http://www.scala-lang.org/download/
ᗍ Install Scala and set the Scala path on your machine

Note: An extensive installation guide is available in the LMS



Scala Hands-on (Cont’d)
ᗍ Start → type Run → type cmd

ᗍ Type scala



Scala Hands-on (Cont’d)

ᗍ Download VMware Player

ᗍ Double-click on VMware Player Workstation → open the Virtual Machine (it will open the Ubuntu desktop)

ᗍ Select New Terminal to open the terminal

ᗍ Install Scala, then type scala

Note: An installation guide for Linux is available in the LMS



Scala Hands-on (Cont’d)
ᗍ A Scala IDE provides enhanced editing and debugging support for developing pure Scala (and mixed Scala-Java) applications

ᗍ The best choices for a Scala IDE are IntelliJ IDEA and Eclipse, because they perform well in terms of stability, memory consumption, and features such as type inference and code inspection



Scala Hands-on (Cont’d)
ᗍ REPL: Read - Evaluate - Print - Loop
ᗍ The easiest way to get started with Scala; it acts as an interactive shell interpreter
ᗍ Even though it appears to be an interpreter, all typed code is converted to bytecode and executed
ᗍ Invoked by typing scala at the command prompt



Scala Hands-on (Cont’d)

After you type an expression, such as 10 + 30, and hit enter:


scala> 10 + 30

The interpreter will print:


res0: Int = 40

This line includes:


ᗍ An automatically generated or user-defined name to refer to the computed value (res0, which means result 0),
ᗍ A colon (:), followed by the type of the expression (Int),
ᗍ An equals sign (=),
ᗍ The value resulting from evaluating the expression (40)
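The list above mentions user-defined names as well as automatically generated ones. A minimal sketch of the user-defined case follows; total and doubled are illustrative names. After each line the REPL reports the name, the type and the value, although the exact formatting of that report varies between Scala versions.

```scala
// Type these lines at the scala> prompt, one at a time.
val total = 10 + 30       // user-defined name: the REPL reports total, its type Int, and the value
val doubled = total * 2   // a named value can be reused in later expressions
```

Automatically generated names such as res0 can be reused in later expressions in the same way.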



Scala Hands-on (Cont’d)
In the beginning, you started with the REPL

ᗍ Scala scripts can also be written in text files and saved with a .scala extension
ᗍ The extension indicates to the operating system and the programmer that the file is a Scala program
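As a minimal sketch, a script file such as Hello.scala might contain the lines below. The function greeting and its message are hypothetical and serve only as an example. In a script, top-level statements run from top to bottom.

```scala
// Hello.scala: a hypothetical script body; top-level statements run in order.
def greeting(name: String): String = "Hello, " + name + "!"

val message = greeting("Scala")
println(message)
```

The same lines can also be typed directly into the REPL, which makes it easy to move code from an interactive session into a script.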



Scala Hands-on (Cont’d)
The scripts can be read into the interpreter in several ways:

ᗍ scala Hello.scala (here Hello is the script name; the script is executed and the REPL is immediately closed)

ᗍ scala -i Hello.scala (the output is printed and the Scala REPL opens)

ᗍ scala -nc Hello.scala (the script is run without using the compilation daemon)



Scala Community
Developers in countries all over the world are using Scala for a large variety of applications across a broad range of industries

Popular ways to connect with the Scala community are via mailing lists or IRC channels

There are also plenty of opportunities to connect face-to-face with others in the community, for example via local Scala Meetups or local Scala user groups



Check your Understanding – 3
The Scala REPL acts as a Scala interpreter

a) True
b) False



Check your Understanding – Solution
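The Scala REPL acts as a Scala interpreter

a) True
b) False

False. Although the REPL appears to be an interpreter, all typed code is converted to bytecode and then executed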



Check your Understanding – 4
Scala supports primitive and wrapper classes?

a) True
b) False



Check your Understanding – Solution
Scala supports primitive and wrapper classes?

a) True
b) False

False. Unlike Java, Scala does not have primitives; everything is an object



Questions
