
Scala SGD

The document outlines a class session focused on Scala and Spark, detailing the schedule, topics to be covered, and resources for learning Scala. Key concepts include control structures, functions, collections, and case classes, along with practical examples and exercises. Additionally, it introduces accumulators in Spark and references for further study in Scala programming.

Scala, Accumulators,

DataFrames and
Datasets
Announcements and plan for
today
• Class and lab session of Feb 24 moved to, respectively,
March 6, 9am, and March 7, 2pm.

• The day is dedicated to Spark and Scala

• We will see basic notions and code several algorithms in Scala, even in the morning

• Solutions of the mid-term either in the morning or the afternoon, depending on available time
Plan for Scala
• The basics

• Control structures and functions

• Collections

• Case classes

• Structure of a Scala program


Interactive mode -
the Scala interpreter

• Instructions for installing Scala can be found here

http://horstmann.com/scala/install/

• This link is related to the book I refer to in these slides, considered by Martin Odersky the best book for quickly learning the essentials of Scala
Interactive mode -
the Scala interpreter
scala> 8 * 5 + 2
res0: Int = 42

scala> 0.5 * res0


res57: Double = 21.0

scala> "Hello, " + res0


res58: String = Hello, 42

scala> 1 to 10
res0: scala.collection.immutable.Range.Inclusive = Range 1 to 10

scala>
Scala types

https://docs.scala-lang.org/tour/unified-types.html
Focus on range collections

scala> 1 to 10
res1: scala.collection.immutable.Range.Inclusive = Range 1 to 10

scala> 1 until 10
res2: scala.collection.immutable.Range = Range 1 until 10

scala> 1 to 10 by 2
res3: scala.collection.immutable.Range = inexact Range 1 to 10 by 2

scala>
Range for populating
sequences

scala> val x = (1 to 10).toList


x: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> val x = (1 to 10).toArray


x: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> val x = (1 to 10).toSet


x: scala.collection.immutable.Set[Int] = Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)

scala>
Values and Variables
scala> val answer = 8 * 5 + 2
answer: Int = 42

scala> 0.5 * answer
res59: Double = 21.0

scala> answer = 0
<console>:13: error: reassignment to val
answer = 0
^

scala> var counter = 0
counter: Int = 0

scala> counter = 1
counter: Int = 1

scala>

• Names can be either of the val or variable kind

• Val is used for names associated to constants

• While variable is for names whose value may change

• Suggestion: use as much as possible val names in programs.
Control structures and
functions
• Differently from Java and C++, in Scala almost all constructs have a value

• For instance

• An if-expression has a value

• A { …; …; … } block has a value, the value of the last expression in the block
Conditional expressions
scala> val x=1
x: Int = 1

scala> if (x > 0) 1 else -1
res1: Int = 1

scala> val s = if (x > 0) 1 else -1
s: Int = 1

scala>

• The value of the if-expression is the value of the expression following if or else

• It follows that the if-expression has a type, the type of its value

• What happens if the if-expression may yield values of different types?
Conditional expressions
scala> if (x > 0) "positive" else -1
res2: Any = positive

scala>

• The type of the if-expression is Any: the two branches have types String and Int, and their only common super-type is Any, the super-type of all types
Conditional expressions
scala> var y=0
y: Int = 0

scala> if (y > 0) 1
res4: AnyVal = ()

scala> y=1
y: Int = 1

scala> if (y > 0) 1
res5: AnyVal = 1

scala>

• An if-expression without the else-branch can yield no value

• In this case Scala assumes that a value is returned, which is ()

• () denotes ‘no value’ and its type is Unit.
Complex branches
scala> if (n > 0) { r = r * n; n -= 1 }

scala> if (n > 0) {
| r = r * n
| n -= 1
| }

scala>

• Many Scala programmers prefer the second style
Block expressions and assignments

scala> { r = r * n; n -= 1 }

scala> var c = { r = r * n; n -= 1 }
c: Unit = ()

scala>

• The value/type of a block expression is the value/type of its last expression

• An assignment has the value () of type Unit, so a block ending with an assignment has type Unit

scala> var x0,y0=0.1


x0: Double = 0.1
y0: Double = 0.1

scala> { val dx = x - x0; val dy = y - y0; sqrt(dx * dx + dy * dy) }


res40: Double = 1.2727922061357855

scala>
Input and output
scala> val name = scala.io.StdIn.readLine("Your name: ")
Your name: name: String = toto

scala> print("Your age: ") ; val age = io.StdIn.readInt()


Your age: age: Int = 9

scala> printf("Hello, %s! Next year, you will be %d.\n", name, age + 1)

To read a numeric, Boolean, or character value, use readInt, readDouble, readByte, readShort, readLong, readFloat, readBoolean, or readChar.

Only readLine takes a prompt string (e.g., "Your name: ")


Loops
……… n and r are initialized …
scala> while (n > 0) {
| r = r * n
| n -= 1
| }

scala> for (i <- 1 to n)
| r = r * i

• With for, the loop variable does not need to be initialized beforehand, differently from the while case

• For other advanced uses see the documentation
https://docs.scala-lang.org/tour/for-comprehensions.html

scala> val s = "Hello"


s: String = Hello

scala> var sum = 0


sum: Int = 0

scala> for (i <- 0 until s.length) // Last value for i is s.length - 1


| sum += s(i)

scala>
Functions
scala> def abs(x: Double) = if (x >= 0) x else -x
abs: (x: Double)Double

scala> def fac(n : Int) = {
| var r = 1
| for (i <- 1 to n) r = r * i
| r
| }
fac: (n: Int)Int

scala>

scala> def fac(n: Int):Int =
| if (n <= 0) 1 else n * fac(n - 1)
fac: (n: Int)Int

scala>

• Scala has functions in addition to methods.

• A method operates on an object, but a function doesn’t

• The types of input parameters must be declared, while the return type can be omitted

• The exception is recursive functions, whose return type must be declared
Arrays
scala> val nums = new Array[Int](10)
nums: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

scala> val a = new Array[String](10)
a: Array[String] = Array(null, null, null, null, null, null, null, null, null, null)

scala> val s = Array("Hello", "World")
s: Array[String] = Array(Hello, World)

scala> s(0) = "Goodbye"

scala> s
res15: Array[String] = Array(Goodbye, World)

scala>

• Arrays can be either of fixed or variable length

• These examples relate to fixed-length arrays

• Arrays are mutable: the value at position i can be changed
Variable-Length Arrays
scala> import scala.collection.mutable.ArrayBuffer

scala> val b = ArrayBuffer[Int]()


b: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer()

scala> b += 1
res17: b.type = ArrayBuffer(1)

scala> b += (1, 2, 3, 5)
res18: b.type = ArrayBuffer(1, 1, 2, 3, 5)

scala> b ++= Array(8, 13, 21)


res19: b.type = ArrayBuffer(1, 1, 2, 3, 5, 8, 13, 21)

scala> b.trimEnd(5)

scala> b
res27: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 1, 2)

scala>
Traversing Arrays
scala> b ++= Array(8, 13, 21)
res19: b.type = ArrayBuffer(1, 1, 2, 3, 5, 8, 13, 21)

scala> b.trimEnd(5)

scala> for (i <- 0 until b.length)


| println(i + ": " + b(i))
0: 1
1: 1
2: 2

scala> for (elem <- b)


| println(elem)
1
1
2

scala>
Transforming Arrays

scala> b ++= Array(8, 13, 21)


res19: b.type = ArrayBuffer(1, 1, 2, 3, 5, 8, 13, 21)

scala> b.trimEnd(5)

scala> for (elem <- b if elem % 2 == 0) yield 2 * elem


res25: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(4)

scala> b.filter(x => x % 2 == 0 ).map(x => 2 * x)


res26: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(4)

scala> b.filter(_ % 2 == 0).map(2 * _)


res27: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(4)

scala>
Tuples - immutable
scala> val t = (1, 3.14, "Fred")
t: (Int, Double, String) = (1,3.14,Fred)

scala> t._1
res49: Int = 1

scala> val (first, second, third) = t // pattern matching on tuples


first: Int = 1
second: Double = 3.14
third: String = Fred

scala> val (first, second, _) = t


first: Int = 1
second: Double = 3.14
Zipping
scala> val symbols = Array("<", "-", ">")
symbols: Array[String] = Array(<, -, >)

scala> val counts = Array(2, 10, 2)


counts: Array[Int] = Array(2, 10, 2)

scala> val pairs = symbols.zip(counts)


pairs: Array[(String, Int)] = Array((<,2), (-,10), (>,2))

scala> for ((s, n) <- pairs) print(s * n)


<<---------->>
scala>
Case classes and pattern
matching

scala> case class Message(sender: String, recipient: String, body: String)

defined class Message

scala> val message1 = Message("[email protected]", "[email protected]", "Ça va ?")

message1: Message = Message([email protected],[email protected],Ça va ?)

scala> println(message1.sender)

[email protected]

scala>
Pattern matching on case classes

abstract class Notification

case class Email(sender: String, title: String, body: String) extends Notification

case class SMS(caller: String, message: String) extends Notification

case class VoiceRecording(contactName: String, link: String) extends Notification

def showNotification(notification: Notification): String = {
  notification match {
    case Email(sender, title, _) =>
      s"You got an email from $sender with title: $title"
    case SMS(number, message) =>
      s"You got an SMS from $number! Message: $message"
    case VoiceRecording(name, link) =>
      s"You received a Voice Recording from $name! Click the link to hear it: $link"
  }
}
val someSms = SMS("12345", "Are you there?")
val someVoiceRecording = VoiceRecording("Tom", "voicerecording.org/id/123")

println(showNotification(someSms)) // prints You got an SMS from 12345! Message: Are you there?

println(showNotification(someVoiceRecording)) // prints You received a Voice Recording from Tom! Click the link to hear it: voicerecording.org/id/123

https://docs.scala-lang.org/tour/pattern-matching.html
What to use in place of
numpy?

• You can rely on ND4J

• Used in this nice post presenting a Scala implementation of linear regression

https://www.cpuheater.com/scala/machine-learning-scala-linear-regression/
Much more in the following
references

• Scala for the Impatient. Cay Horstmann

• Programming in Scala, 3rd Edition. Martin Odersky et al.

• Scala Cookbook: Recipes for Object-Oriented and Functional Programming. Alvin Alexander
Spark
Accumulators

needed in the GCC project


Accumulators
Aggregating values from worker nodes back to the driver program.
Demo on DataBricks
notebook
Some more practice
with Scala and RDD
Let’s focus on Scala first,
and then switch to RDD

• Implement functions for vector manipulations

• Use these functions for Batch Gradient Descent (BGD) for simple Linear Regression

• Before going back to the Databricks notebook, let’s quickly recall some basics about BGD
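A minimal sketch of the vector helpers mentioned above, in plain Scala with vectors represented as Array[Double]. The object name VectorOps, the alias Vec and the function names dot, add and scale are choices made for this illustration, not names fixed by the slides.

```scala
object VectorOps {
  // Assumption for this sketch: a vector is a plain Array[Double].
  type Vec = Array[Double]

  // Dot product: sum of the element-wise products.
  def dot(a: Vec, b: Vec): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Element-wise sum of two vectors.
  def add(a: Vec, b: Vec): Vec = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x + y }
  }

  // Multiply every component by the scalar c.
  def scale(c: Double, a: Vec): Vec = a.map(c * _)
}
```

For instance, VectorOps.dot(Array(1.0, 2.0), Array(3.0, 4.0)) evaluates to 11.0. These three operations are enough to write the update w - eta * gradient as add(w, scale(-eta, gradient)).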
Large scale linear
regression and gradient
descent optimization

Based on material of Stanford CS221 course Artificial Intelligence: Principles and Techniques
Linear regression
$f_w(x) = w \cdot \phi(x)$

[Figure: a training point $(\phi(x), y)$, the prediction line $w \cdot \phi(x)$, and the residual $w \cdot \phi(x) - y$ between them]

Assume $\phi(x)$ denotes $[x, 1]$
Definition: residual

The residual is $(w \cdot \phi(x)) - y$, the amount by which the prediction $f_w(x) = w \cdot \phi(x)$ overshoots the target $y$.

CS221 / Spring 2019 / Charikar & Sadigh [linear regression] 39
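As an illustration of the definition, here is a small sketch that computes the prediction and the residual in Scala, assuming the feature map phi(x) = [x, 1] used on the slide. The object and function names are hypothetical.

```scala
object ResidualExample {
  // Feature map assumed on the slide: phi(x) = [x, 1].
  def phi(x: Double): Array[Double] = Array(x, 1.0)

  // Prediction f_w(x) = w . phi(x).
  def predict(w: Array[Double], x: Double): Double =
    w.zip(phi(x)).map { case (wi, fi) => wi * fi }.sum

  // Residual: prediction minus target.
  def residual(w: Array[Double], x: Double, y: Double): Double =
    predict(w, x) - y
}
```

With the made-up values w = [1, 2], x = 3 and y = 4, the prediction is 1 * 3 + 2 * 1 = 5, so the residual is 1: the prediction overshoots the target by 1.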


Linear regression
$f_w(x) = w \cdot \phi(x)$

Definition: squared loss

$\text{Loss}_{\text{squared}}(x, y, w) = (\underbrace{f_w(x) - y}_{\text{residual}})^2$

Example:

$w = [2, -1]$, $\phi(x) = [2, 0]$, $y = -1$

$\text{Loss}_{\text{squared}}(x, y, w) = 25$
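The example can be checked with a few lines of Scala. This is only a sketch: the names are chosen here, and it takes w = [2, -1], phi(x) = [2, 0] and y = -1, so that the prediction is 4, the residual is 4 - (-1) = 5, and the loss is 25.

```scala
object SquaredLossExample {
  // Dot product of two vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Squared loss: the square of the residual (prediction minus target).
  def squaredLoss(w: Array[Double], phiX: Array[Double], y: Double): Double = {
    val residual = dot(w, phiX) - y
    residual * residual
  }
}
```

SquaredLossExample.squaredLoss(Array(2.0, -1.0), Array(2.0, 0.0), -1.0) evaluates to 25.0.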
Loss minimization framework
So far: one example, Loss(x, y, w) is easy to minimize.

Key idea: minimize training loss

$$\text{TrainLoss}(w) = \frac{1}{|D_{\text{train}}|} \sum_{(x,y) \in D_{\text{train}}} \text{Loss}(x, y, w)$$

$$\min_{w \in \mathbb{R}^d} \text{TrainLoss}(w)$$

Key: need to set w to make global tradeoffs; not every example can be happy.
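A possible Scala rendering of the training loss, assuming the squared loss and a training set stored as a sequence of (phi(x), y) pairs. The case class Example and the function names are hypothetical.

```scala
object TrainLossExample {
  // One training example: the feature vector phi(x) and the target y.
  final case class Example(phiX: Array[Double], y: Double)

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Squared loss on a single example.
  def squaredLoss(w: Array[Double], ex: Example): Double = {
    val residual = dot(w, ex.phiX) - ex.y
    residual * residual
  }

  // Average of the per-example losses over the training set.
  def trainLoss(w: Array[Double], data: Seq[Example]): Double = {
    require(data.nonEmpty, "training set must not be empty")
    data.map(ex => squaredLoss(w, ex)).sum / data.size
  }
}
```

On the two made-up examples (phi = [1, 1], y = 3) and (phi = [2, 1], y = 4), w = [1, 2] fits both exactly and gives a training loss of 0, while w = [0, 0] gives (9 + 16) / 2 = 12.5.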


How to optimize?

Definition: gradient

The gradient $\nabla_w \text{TrainLoss}(w)$ is the direction that increases the loss the most.

Algorithm: gradient descent

Initialize $w = [0, \dots, 0]$
For $t = 1, \dots, T$:
$w \leftarrow w - \underbrace{\eta}_{\text{step size}} \underbrace{\nabla_w \text{TrainLoss}(w)}_{\text{gradient}}$
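The algorithm translates almost line by line into Scala. The sketch below takes the gradient as a function argument, so it is independent of the loss being minimized; the parameter names grad, dim, eta and steps are assumptions of this illustration.

```scala
object GradientDescentSketch {
  type Vec = Array[Double]

  // Start from w = [0, ..., 0] and repeat w <- w - eta * grad(w) for `steps` iterations.
  def gradientDescent(grad: Vec => Vec, dim: Int, eta: Double, steps: Int): Vec = {
    var w: Vec = Array.fill(dim)(0.0)
    for (_ <- 1 to steps) {
      val g = grad(w)
      w = w.zip(g).map { case (wi, gi) => wi - eta * gi }
    }
    w
  }
}
```

As a toy check, minimizing F(w) = (w - 3)^2 in one dimension, whose gradient is 2(w - 3), with eta = 0.1 and 100 steps returns a value very close to 3.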


Gradient descent is slow
$$\text{TrainLoss}(w) = \frac{1}{|D_{\text{train}}|} \sum_{(x,y) \in D_{\text{train}}} \text{Loss}(x, y, w)$$

Gradient descent:

$$w \leftarrow w - \eta \nabla_w \text{TrainLoss}(w)$$

Problem: each iteration requires going over all training examples, which is expensive when there is a lot of data!


First solution

Least squares regression
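Objective function:

$$\text{TrainLoss}(w) = \frac{1}{|D_{\text{train}}|} \sum_{(x,y) \in D_{\text{train}}} (w \cdot \phi(x) - y)^2$$

Gradient (use chain rule):

$$\nabla_w \text{TrainLoss}(w) = \frac{1}{|D_{\text{train}}|} \sum_{(x,y) \in D_{\text{train}}} 2 (\underbrace{w \cdot \phi(x) - y}_{\text{prediction} - \text{target}}) \phi(x)$$

Annotations: the per-example term $2 (w \cdot \phi(x) - y) \phi(x)$ is a map; the sum over $D_{\text{train}}$ is a reduce —> vector

[semi-live solution]
Array[Double,
Array[Double]]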
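The map/reduce reading of the gradient formula can be sketched on ordinary Scala collections; the same two steps carry over to an RDD. Here the training set is assumed to be a sequence of (phi(x), y) pairs, a representation chosen for this illustration, and the names are hypothetical.

```scala
object LeastSquaresGradient {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Gradient of the least squares training loss at w.
  def gradient(w: Vec, data: Seq[(Vec, Double)]): Vec = {
    require(data.nonEmpty, "training set must not be empty")
    // map: one gradient vector 2 * (w . phi(x) - y) * phi(x) per example
    val perExample: Seq[Vec] = data.map { case (phiX, y) =>
      val residual = dot(w, phiX) - y
      phiX.map(2 * residual * _)
    }
    // reduce: element-wise sum of the per-example vectors into a single vector
    val total: Vec = perExample.reduce((g1, g2) => g1.zip(g2).map { case (a, b) => a + b })
    // divide by the number of training examples
    total.map(_ / data.size)
  }
}
```

With w = [0, 0] and the two made-up examples (phi = [1, 1], y = 3) and (phi = [2, 1], y = 4), the per-example vectors are [-6, -6] and [-16, -8], so the gradient is [-11, -7].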


Lab session this afternoon:

Implement LR with BGD in Spark, by generating a simple train set by means of y = x + 2

Now, let’s deal with the needed auxiliary functions
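Before moving to Spark, the whole pipeline can be tried on local collections. The following is only a sketch in plain Scala, not the Spark solution of the lab: the train set (x = 0, ..., 9 with y = x + 2), the step size and the number of iterations are arbitrary choices made here, and phi(x) = [x, 1] is assumed as in the earlier slides.

```scala
object BgdLinearRegression {
  type Vec = Array[Double]

  // Feature map phi(x) = [x, 1]: the second component plays the role of the intercept.
  def phi(x: Double): Vec = Array(x, 1.0)

  def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Gradient of the least squares training loss: map over the examples, then reduce.
  def gradient(w: Vec, data: Seq[(Vec, Double)]): Vec = {
    val total = data
      .map { case (phiX, y) => phiX.map(2 * (dot(w, phiX) - y) * _) }
      .reduce((g1, g2) => g1.zip(g2).map { case (a, b) => a + b })
    total.map(_ / data.size)
  }

  // Batch gradient descent: every step uses the whole training set.
  def train(points: Seq[(Double, Double)], eta: Double, steps: Int): Vec = {
    val data = points.map { case (x, y) => (phi(x), y) }
    var w: Vec = Array(0.0, 0.0)
    for (_ <- 1 to steps) {
      val g = gradient(w, data)
      w = w.zip(g).map { case (wi, gi) => wi - eta * gi }
    }
    w
  }

  // Hypothetical train set generated from y = x + 2.
  val trainSet: Seq[(Double, Double)] = (0 until 10).map(i => (i.toDouble, i + 2.0))
}
```

Calling BgdLinearRegression.train(BgdLinearRegression.trainSet, 0.01, 5000) should return weights close to [1, 2], that is slope 1 and intercept 2. With this train set a noticeably larger step size makes the iteration diverge, so eta has to be tuned to the scale of the data.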
