
Introduction to Scala in Spark

Dan Lo
Department of Computer Science
Kennesaw State University
What is Scala?
• Scala stands for "scalable language."
• Both object-oriented and functional
• Aim to address criticisms of Java
• To be concise!
• Designed by Martin Odersky (2004), a German computer scientist and
professor of programming methods at École Polytechnique Fédérale
de Lausanne in Switzerland.
• Martin’s Ph.D. advisor, Niklaus Wirth, developed Pascal.
Scala Overview
• It’s a high-level language.
• It’s statically typed.
• Its syntax is concise but still readable — we call it expressive.
• It supports the object-oriented programming (OOP) paradigm.
• It supports the functional programming (FP) paradigm.
• It has a sophisticated type inference system.
• Scala code results in .class files that run on the Java Virtual Machine
(JVM).
• It’s easy to use Java libraries in Scala.
Two types of variables
• val is an immutable variable — like final in Java — and should be
preferred
• var creates a mutable variable, and should only be used when there is
a specific reason to use it
• Examples:
• val x = 1 //immutable
• var y = 0 //mutable
Common Data Types
Byte     8-bit signed value. Range: -128 to 127
Short    16-bit signed value. Range: -32768 to 32767
Int      32-bit signed value. Range: -2147483648 to 2147483647
Long     64-bit signed value. Range: -9223372036854775808 to 9223372036854775807
Float    32-bit IEEE 754 single-precision float
Double   64-bit IEEE 754 double-precision float
Char     16-bit unsigned Unicode character. Range: U+0000 to U+FFFF
String   A sequence of Chars
Boolean  Either the literal true or the literal false
Unit     Corresponds to no value
Type Inference
• Scala will infer a variable type.
• val x = 1
• val s = "a string“
• val f = 3.14f
• val df = 3.14
• val p = new Person("Regina")
Declaring variable types
• val x: Int = 1
• val s: String = "a string"
• val f: Float = 3.14f
• val df: Double = 3.14
• val p: Person = new Person("Regina")
if/else Decision Structure
if (test1) {
  doA()
} else if (test2) {
  doB()
} else if (test3) {
  doC()
} else {
  doD()
}

// if/else is an expression, so it returns a value:
val x = if (a < b) a else b
Switch (Match)
val result = i match {
case 1 => "one"
case 2 => "two"
case _ => "not 1 or 2"
}
For loop and expression
// iterate over a collection
for (arg <- args) println(arg)

// "x to y" syntax
for (i <- 0 to 5) println(i)

// "x to y by" syntax
for (i <- 0 to 10 by 2) println(i)

// yield expression; creates a Vector(2, 4, 6, 8, 10)
val x = for (i <- 1 to 5) yield i * 2

// yield with a guard; creates List(5, 6, 6)
val fruits = List("apple", "banana", "lime", "orange")
val fruitLengths = for {
  f <- fruits
  if f.length > 4
} yield f.length
While and do-while
// while loop
while (condition) {
  statement(a)
  statement(b)
}

// do-while
do {
  statement(a)
  statement(b)
} while (condition)
Classes
• No need to create “get” and “set” methods to access the fields in the class.
class Person(var firstName: String, var lastName: String) {
def printFullName() = println(s"$firstName $lastName")
}
// This is how you use that class:
val p = new Person("Julia", "Kern")
println(p.firstName)
p.lastName = "Manes"
p.printFullName()
Scala Methods
• With return type:
def sum(a: Int, b: Int): Int = a + b
def concatenate(s1: String, s2: String): String = s1 + s2
• Without return type:
def sum(a: Int, b: Int) = a + b
def concatenate(s1: String, s2: String) = s1 + s2
• How you call the methods
val x = sum(1, 2)
val y = concatenate("foo", "bar")
Polymorphic Methods
• Methods in Scala can be parameterized by type as well as value. The
syntax is similar to that of generic classes.
• Type parameters are enclosed in square brackets, while value
parameters are enclosed in parentheses.
Example
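• A minimal sketch (the name listOfDuplicates is illustrative): the method takes a type parameter A in square brackets and value parameters in parentheses.
def listOfDuplicates[A](x: A, length: Int): List[A] = {
  if (length < 1)
    Nil
  else
    x :: listOfDuplicates(x, length - 1)   // prepend x and recurse
}
println(listOfDuplicates[Int](3, 4))   // List(3, 3, 3, 3), type given explicitly
println(listOfDuplicates("La", 8))     // type String inferred from the argument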
Higher Order Function
• Higher order functions take other functions as parameters or return a
function as a result.
• val salaries = Seq(20000, 70000, 40000)
• def doubleSalary(x: Int): Int = x * 2
• val newSalaries = salaries.map(doubleSalary) // List(40000, 140000, 80000)
• doubleSalary is a function which takes a single Int, x, and returns x *
2.
Function that Returns a Function
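• A minimal sketch (the name greeting is illustrative): greeting returns a function of type String => String.
def greeting(prefix: String): String => String =
  (name: String) => s"$prefix, $name!"
val hello = greeting("Hello")   // hello is a function value
println(hello("Scala"))         // Hello, Scala!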
Anonymous Function

• In general, the tuple on the left of the arrow => is a parameter list, and the value of the expression on the right is what gets returned.
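• For instance, a minimal sketch (variable names are illustrative):
// the parameter list on the left of =>, the returned expression on the right
val add = (x: Int, y: Int) => x + y
println(add(2, 3))   // 5
// the same idea passed directly to a higher-order function
val newSalaries = Seq(20000, 70000, 40000).map((x: Int) => x * 2)   // List(40000, 140000, 80000)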
Anonymous Function (cont.)

• Each underscore stands for a distinct parameter, so a given parameter can only be used once in the function body.
• For example, to compute a square, _ * _ won't work: it denotes a two-argument function (x, y) => x * y. Instead, use Math.pow(_, 2) or an explicit parameter such as x => x * x.
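• A short sketch contrasting the two cases (nums is illustrative):
val nums = List(1, 2, 3)
// each underscore is a distinct parameter
println(nums.map(_ * 2))      // List(2, 4, 6)  -- one parameter, used once
println(nums.reduce(_ + _))   // 6              -- two parameters, one underscore each
// for squaring, _ * _ would mean (x, y) => x * y, so use one of these instead
println(nums.map(x => x * x))        // List(1, 4, 9)
println(nums.map(Math.pow(_, 2)))    // List(1.0, 4.0, 9.0)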
Use Functions as Variables
• Functions in Scala are first-class values.
• So you may treat them as if they were variables, as sketched below.
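• A minimal sketch (the names double and ops are illustrative):
// a function literal stored in a val, just like any other value
val double = (i: Int) => i * 2
println(double(21))   // 42
// pass it to a method, or store it in a collection
println(List(1, 2, 3).map(double))   // List(2, 4, 6)
val ops = Map("double" -> double, "square" -> ((i: Int) => i * i))
println(ops("square")(5))   // 25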
Common Mistakes in Functions
• Forgetting the = between the method signature and the body:
def myFunc(x: Int): Int {
  return (x*x)
}
// Missing =
// Correct way:
def myFunc(x: Int): Int = x * x

• Functions (procedures) that return no values can omit the = and the return type (they return Unit):
def myProc(x: Int) {
  println(x*x)
}
Traits (Interfaces)
Traits Example
trait Speaker {
  def speak(): String  // has no body, so it's abstract
}

trait TailWagger {
  def startTail(): Unit = println("tail is wagging")
  def stopTail(): Unit = println("tail is stopped")
}

trait Runner {
  def startRunning(): Unit = println("I'm running")
  def stopRunning(): Unit = println("Stopped running")
}

// Dog class extends all the traits
class Dog(name: String) extends Speaker with TailWagger with Runner {
  def speak(): String = "Woof!"
}

// Cat class with overriding
class Cat extends Speaker with TailWagger with Runner {
  def speak(): String = "Meow"
  override def startRunning(): Unit = println("Yeah ... I don't run")
  override def stopRunning(): Unit = println("No need to stop")
}
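• A brief usage sketch for the classes above:
val dog = new Dog("Rover")
println(dog.speak())   // Woof!
dog.startTail()        // tail is wagging
dog.startRunning()     // I'm running
val cat = new Cat
println(cat.speak())   // Meow
cat.startRunning()     // Yeah ... I don't run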
Collection classes
• Basic Scala collection classes: List, ListBuffer, Vector, ArrayBuffer, Map, and
Set
• Populating lists
• val nums = List.range(0, 10)
• val nums = (1 to 10 by 2).toList
• val letters = ('a' to 'f').toList
• val letters = ('a' to 'f' by 2).toList
• Sequence methods
• val nums = (1 to 10).toList // create a list
• nums.foreach(println) // foreach()
• nums.filter(_ < 4).foreach(println) // filter()
• val doubles = nums.map(_ * 2) // map()
• nums.foldLeft(0)(_ + _) // foldLeft(0) sum them together
• nums.foldLeft(1)(_ * _) // foldLeft(1) multiply them together
Tuples
• Tuples let you put a heterogeneous collection of elements in a little
container. Tuples can contain between two and 22 values, and they
can all be different types.
• class Person(var name: String)
• val t = (11, "Eleven", new Person("Eleven"))
• Reference tuple field by t._1, t._2, and t._3, respectively.
• Assign tuple fields to variables
val (num, string, person) = (11, "Eleven", new Person("Eleven"))
Case Class
• A case class is just like a regular class, but with extra features for modeling immutable data: the compiler generates equals(), hashCode(), toString(), copy(), and an apply() factory method.
• A case object likewise has more features than a regular object, such as being serializable and getting productArity and hashCode().
case class employee(name: String, age: Int)

object Main {
  // Main method
  def main(args: Array[String]) {
    val c = employee("Nidhi", 23)

    // Display both parameters
    println("Name of the employee is " + c.name)
    println("Age of the employee is " + c.age)
  }
}
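• A small sketch of the generated members (the capitalized name Employee is illustrative):
case class Employee(name: String, age: Int)
val e1 = Employee("Nidhi", 23)          // no 'new' needed: apply() is generated
val e2 = e1.copy(age = 24)              // copy with one field changed
println(e1 == Employee("Nidhi", 23))    // true: structural equality
println(e2)                             // Employee(Nidhi,24): readable toString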
Implicit Class
• An implicit class is a class marked with the implicit keyword. This
keyword makes the class’s primary constructor available for implicit
conversions when the class is in scope.
• They must be defined inside of another trait/class/object.
• They may only take one non-implicit argument in their constructor.
• There may not be any method, member or object in scope with the
same name as the implicit class.
Implicit Class Example
object Helpers {
  implicit class IntWithTimes(x: Int) {
    def times[A](f: => A): Unit = {
      def loop(current: Int): Unit =
        if (current > 0) {
          f
          loop(current - 1)
        }
      loop(x)
    }
  }
}
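• With IntWithTimes in scope, any Int gains the times method. Usage (from the standard implicit-class example in the Scala documentation):
import Helpers._
5 times println("HI")   // prints HI five times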
Generic Class
• Generic classes in Scala take a type as a parameter within square brackets []
• It gives a flexible way to create a class that deals with multiple data types.
class Stack[A] {
  private var elements: List[A] = Nil
  def push(x: A): Unit = { elements = x :: elements }
  def peek: A = elements.head
  def pop(): A = {
    val currentTop = peek
    elements = elements.tail
    currentTop
  }
}
Use a Generic Class
val stack = new Stack[Int]
stack.push(1)
stack.push(2)
println(stack.pop) // prints 2
println(stack.pop) // prints 1
Spark Shell
• spark-shell --master local[4] will start a Spark shell running on the local
machine with 4 worker threads.
• The --master option specifies the master URL for a distributed cluster,
or local to run locally with one thread, or local[N] to run locally with N
threads.
• You should start by using local for testing. For a full list of options, run
Spark shell with the --help option.
Power User Mode
• The Spark shell truncates long output from your program. To work around that, you
can raise the maximum length of the strings printed by the REPL in power
user mode, for example as follows:
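• One commonly cited approach in the Scala 2 REPL (treat the exact setting name as an assumption; it varies across Scala versions):
scala> :power
scala> vals.isettings.maxPrintString = Int.MaxValue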
RDD Operations
• Spark programs run on a cluster of multiple nodes for parallel
computing.
• RDDs are lazy by design; transformations execute only when an actual action is called (see the sketch after this list).
• Two types of operations
• transformations, which create a new dataset from an existing one, and
• actions, which return a value to the driver program after running a
computation on the dataset.
• Two types of shared variable in Spark
• Broadcast variables, used to cache a value in memory on all nodes
• Accumulator, used for counters and sums, globally across all nodes
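• A minimal sketch of laziness (assuming a local file data.txt exists):
val lines = sc.textFile("data.txt")            // nothing is read yet
val lineLengths = lines.map(_.length)          // transformation: still lazy
val totalLength = lineLengths.reduce(_ + _)    // action: runs the job and returns a value to the driver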
RDD Transformations (subset)
• map(func) Return a new distributed dataset formed by passing each element of the source through a function
func.
• filter(func) Return a new dataset formed by selecting those elements of the source on which func returns true.
• flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return
a Seq rather than a single item).
• groupByKey([numPartitions]) When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>)
pairs. Note: Aggregation (such as a sum or average) over each key, using reduceByKey or aggregateByKey will yield
much better performance. Note: By default, the level of parallelism in the output depends on the number of partitions
of the parent RDD. You can pass an optional numPartitions argument to set a different number of tasks.
• reduceByKey(func, [numPartitions]) When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where
the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V.
• aggregateByKey(zeroValue)(seqOp, combOp, [numPartitions]) When called on a dataset of (K, V) pairs, returns a
dataset of (K, U) pairs where the values for each key are aggregated using the given combine functions and a neutral
"zero" value. Allows an aggregated value type that is different than the input value type, while avoiding unnecessary
allocations.
• sortByKey([ascending], [numPartitions]) When called on a dataset of (K, V) pairs where K implements Ordered,
returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean ascending
argument.
• join(otherDataset, [numPartitions]) When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W))
pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and
fullOuterJoin.
RDD Actions (subset)
• reduce(func) Aggregate the elements of the dataset using a commutative and associative function func (which takes two arguments and
returns one).
• collect() Return all the elements of the dataset as an array at the driver program. This is usually useful after a filter for a small subset of
the data.
• count() Return the number of elements in the dataset.
• first() Return the first element of the dataset (similar to take(1)).
• take(n) Return an array with the first n elements of the dataset.
• takeOrdered(n, [ordering]) Return the first n elements of the RDD using either their natural order or a custom comparator.
• saveAsTextFile(path) Write the elements of the dataset as a text file (or set of text files) in a given directory in the local filesystem, HDFS
or any other Hadoop-supported file system. Spark will call toString on each element to convert it to a line of text in the file.
• saveAsSequenceFile(path) (Java and Scala) Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local
filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that implement Hadoop's
Writable interface. In Scala, it is also available on types that are implicitly convertible to Writable (Spark includes conversions for basic
types like Int, Double, String, etc).
• saveAsObjectFile(path) (Java and Scala) Write the elements of the dataset in a simple format using Java serialization, which can then be
loaded using SparkContext.objectFile().
• countByKey() Only available on RDDs of type (K, V). Returns a hashmap of (K, Int) pairs with the count of each key.
• foreach(func) Run a function func on each element of the dataset. This is usually done for side effects such as updating an
Accumulator or interacting with external storage systems. Note: modifying variables other than Accumulators outside of the foreach()
may result in undefined behavior. See Understanding closures for more details.
Example
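• One possible example combining the transformations and actions above is a word count (the file paths and variable names are illustrative):
val textFile = sc.textFile("input.txt")         // illustrative input path
val counts = textFile.flatMap(_.split(" "))     // transformation: words
                     .map(word => (word, 1))    // transformation: (K, V) pairs
                     .reduceByKey(_ + _)        // transformation: sum counts per word
counts.take(10).foreach(println)                // action: bring a few results to the driver
counts.saveAsTextFile("output")                 // action: write results out (illustrative path)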
Running RDD Operations on Spark
• Spark breaks up the processing of RDD operations into tasks, each of
which is executed by an executor.
• Prior to execution, Spark computes the task’s closure.
• The closure is those variables and methods which must be visible for
the executor to perform its computations on the RDD (in the
following example, the foreach()).
• This closure is serialized and sent to each executor as a copy.
Example: Architecture Pitfalls

var counter = 0  // this counter used in foreach below will be sent to each executor (multiple copies)
var rdd = sc.parallelize(data)  // distribute user-created data over the cluster

// Wrong: Don't do this!!
rdd.foreach(x => counter += x)

println("Counter value: " + counter)
Program Behaves Differently
• The variables within the closure sent to each executor are now copies
and thus, when counter is referenced within the foreach function, it’s
no longer the counter on the driver node.
• There is still a counter in the memory of the driver node but this is no
longer visible to the executors!
• The final value of counter will still be zero since all operations
on counter were referencing the value within the serialized closure.
• Use Accumulator instead for counters or sums.
Accumulators Example
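• A short sketch using Spark's built-in long accumulator (Spark 2.x API):
val accum = sc.longAccumulator("My Accumulator")
sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
println(accum.value)   // 10: the driver reads the aggregated value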
For more information
• Scala Documentation, https://docs.scala-lang.org/
• Scala Book, Meriam Lachkar and Alvin Alexander, https://docs.scala-lang.org/overviews/scala-book/introduction.html
• Programming in Scala, First Edition, by Martin Odersky, Lex Spoon, and Bill Venners, https://www.artima.com/pins1ed/
• RDD Programming Guide, https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-operations
