Introduction To Scala in Spark
Dan Lo
Department of Computer Science
Kennesaw State University
What is Scala?
• Scala stands for "scalable language."
• Both object-oriented and functional
• Aims to address criticisms of Java
• To be concise!
• Designed by Martin Odersky (2004), a German computer scientist and
professor of programming methods at École Polytechnique Fédérale
de Lausanne in Switzerland.
• Martin’s Ph.D. advisor, Niklaus Wirth, developed Pascal.
Scala Overview
• It’s a high-level language.
• It’s statically typed.
• Its syntax is concise but still readable — we call it expressive.
• It supports the object-oriented programming (OOP) paradigm.
• It supports the functional programming (FP) paradigm.
• It has a sophisticated type inference system.
• Scala code results in .class files that run on the Java Virtual Machine
(JVM).
• It’s easy to use Java libraries in Scala.
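A minimal sketch of two points from this overview, type inference and Java interoperability. The value names (greeting, answer, names) are illustrative only, and the snippet assumes it is pasted into the Scala REPL or spark-shell.

```scala
import java.util.ArrayList   // a Java collection class, used directly from Scala

// The compiler infers the types; no annotations are needed here.
val greeting = "Hello, Scala"   // inferred as String
val answer = 42                 // inferred as Int

// Java libraries are called with ordinary Scala syntax.
val names = new ArrayList[String]()
names.add("Spark")
names.add("Scala")

println(s"$greeting: ${names.size()} names, answer = $answer")
```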
Two types of variables
• val is an immutable variable — like final in Java — and should be
preferred
• var creates a mutable variable, and should only be used when there is
a specific reason to use it
• Examples:
• val x = 1 //immutable
• var y = 0 //mutable
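The difference between val and var shows up on reassignment. A short sketch, assuming a Scala REPL or spark-shell session:

```scala
val x = 1      // immutable: x always refers to 1
var y = 0      // mutable: y may be reassigned

y = y + 10     // allowed, y is now 10
// x = 2       // would not compile: reassignment to val

println(s"x = $x, y = $y")   // prints "x = 1, y = 10"
```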
Common Data Types
Byte   8-bit signed value. Range: -128 to 127
Short  16-bit signed value. Range: -32768 to 32767
Int    32-bit signed value. Range: -2147483648 to 2147483647
Long   64-bit signed value. Range: -9223372036854775808 to 9223372036854775807
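The ranges above can be checked from the MinValue and MaxValue constants on each type. The sketch below also shows what happens when a result leaves the Int range; the names wrapped and big are illustrative only.

```scala
// Each integer type exposes its limits as MinValue and MaxValue.
println(s"Byte:  ${Byte.MinValue} to ${Byte.MaxValue}")
println(s"Short: ${Short.MinValue} to ${Short.MaxValue}")
println(s"Int:   ${Int.MinValue} to ${Int.MaxValue}")
println(s"Long:  ${Long.MinValue} to ${Long.MaxValue}")

// Int arithmetic that exceeds the range wraps around silently.
val wrapped = Int.MaxValue + 1       // equals Int.MinValue

// Converting to Long first keeps the mathematically expected result.
val big = Int.MaxValue.toLong + 1    // 2147483648
```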
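Each underscore in this context stands for a different parameter, so a parameter can be used only once in the function body.
For example, _ * _ won’t compute a square: it means (a, b) => a * b. Instead, use Math.pow(_, 2), which returns a Double, or name the parameter: x => x * x.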
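The placeholder rule can be sketched on a small list. The list nums and the result names are illustrative only.

```scala
val nums = List(1, 2, 3)

// One underscore, one parameter: equivalent to x => x + 1
val plusOne = nums.map(_ + 1)             // List(2, 3, 4)

// Two underscores are two different parameters: (a, b) => a * b
val product = nums.reduce(_ * _)          // 6

// Squaring uses the same parameter twice, so name it explicitly ...
val squares = nums.map(x => x * x)        // List(1, 4, 9)

// ... or hand the single placeholder to a function; the result is Double
val squaresD = nums.map(Math.pow(_, 2))   // List(1.0, 4.0, 9.0)
```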
Use Functions as Variables
• A function in Scala is a first-class value.
• So you may treat functions as if they were variables: assign them to names, pass them as arguments, and return them from other functions.
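A minimal sketch of functions used as values. The names twice, applyTwice, and steps are hypothetical and chosen only for this example.

```scala
// A function literal stored in a val; its type is Int => Int
val twice: Int => Int = x => x * 2

// A function that takes another function as an argument
def applyTwice(f: Int => Int, n: Int): Int = f(f(n))

// Functions can be stored in collections like any other value
val steps: List[Int => Int] = List(twice, (x: Int) => x + 1)

val a = applyTwice(twice, 3)                    // twice(twice(3)) = 12
val b = steps.foldLeft(5)((acc, f) => f(acc))   // twice(5) = 10, then 10 + 1 = 11
```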
Common Mistakes in Functions
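• Forgetting = before the function body. Without =, Scala 2 treats the definition as a procedure that returns no value (Unit); with a declared result type, as below, the code does not compile.
def myFunc(x: Int): Int {   // wrong: missing = before {
return (x*x)
}
Missing =
Correct way: def myFunc(x: Int) = x*x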
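For comparison, a sketch of a few correct forms of function definition. The function names (square, cube, report) are hypothetical.

```scala
// Expression body: the result type (Int) is inferred
def square(x: Int) = x * x

// Explicit result type with a block body; note the = before the brace
def cube(x: Int): Int = {
  val sq = x * x
  sq * x   // the last expression is the result; no return needed
}

// A function that returns no useful value declares Unit explicitly
def report(x: Int): Unit =
  println(s"square = ${square(x)}, cube = ${cube(x)}")

report(3)   // prints "square = 9, cube = 27"
```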
// This counter, used in the foreach below, will be sent to each executor (multiple copies)
var counter = 0
// Distribute user-created data over the cluster
var rdd = sc.parallelize(data)