Spark and Scala - Module 1
ᗍ Big Data
ᗍ IBM’s Big Data Definition
ᗍ Some Big Data Examples
ᗍ Spark Basics
ᗍ Why Spark?
ᗍ Spark Components
ᗍ Scala Basics
ᗍ Why Scala?
ᗍ Scala Job Trends
ᗍ Users of Scala
ᗍ Scala Frameworks
ᗍ Scala Usage
ᗍ Software Installation
ᗍ Scala Hands-on
ᗍ Scala community
Big Data (the four Vs)
ᗍ Volume: petabytes per day/week
ᗍ Velocity: real-time capture and real-time analytics
ᗍ Variety: unstructured data, web logs, audio, video, image
ᗍ Veracity: uncertainty of data
The NYSE generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine
trends for optimal trades
All of the options are Big Data scenarios. Even if the
input size of a problem is small, the processing it requires
can make the scenario a Big Data problem
ᗍ Apache Spark is a general-purpose, in-memory cluster computing system used for data analytics
ᗍ It provides high-level APIs in Java, Scala and Python and an optimized engine that supports general execution
graphs
ᗍ Apache Spark provides various high-level tools, such as Spark SQL for structured data processing, support for the
R programming language for analyzing large datasets, and MLlib for machine learning
The Spark framework is polyglot: it can be programmed in several programming languages (Java, Scala, R 3.2.2 and
Python are supported)
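As a rough illustration of the high-level, functional style described above, the following sketch counts words with a flatMap, map and group/sum chain. It deliberately uses plain Scala collections so that it runs without a Spark installation; Spark's Scala RDD API offers operations of the same shape (flatMap, map, reduceByKey) on distributed data. The object name WordCountSketch and the sample lines are invented for this example.

```scala
object WordCountSketch {
  // Count word occurrences with a flatMap -> map -> group/sum chain.
  // Plain Scala collections stand in for a distributed dataset here.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens
      .map(word => (word, 1))     // pair each word with a count of 1
      .groupBy(_._1)              // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input
    val lines = Seq("spark and scala", "scala on the jvm")
    wordCount(lines).toSeq.sortBy(_._1).foreach {
      case (word, count) => println(s"$word: $count")
    }
  }
}
```

On a real cluster the input would come from a distributed source rather than an in-memory Seq, and the grouping step would be performed by the engine across machines.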
Spark components:
ᗍ Spark SQL: structured data
ᗍ Spark Streaming: real-time
ᗍ MLlib: machine learning
ᗍ GraphX: graph processing
ᗍ R 3.2.2
ᗍ Spark Core
ᗍ Scala aims to express common programming patterns in a concise, elegant, and type-safe way
ᗍ Supports both object-oriented and functional programming styles, thus helping programmers to be more
productive
ᗍ Publicly released in January 2004 on the JVM platform and a few months later on the .NET platform
ᗍ Dynamically typed languages bind the type to the actual value referenced by a variable. Example: Python
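The points above can be made concrete with a minimal sketch that mixes the object-oriented and functional styles and relies on compile-time type checking. In contrast to a dynamically typed language, Scala rejects a type mismatch before the program runs. The names ScalaStylesSketch, Trade and totalValue, and the sample values, are invented for illustration.

```scala
object ScalaStylesSketch {
  // Object-oriented side: an immutable class with a method.
  final case class Trade(symbol: String, quantity: Int, price: Double) {
    def value: Double = quantity * price
  }

  // Functional side: a higher-order function that takes a predicate.
  def totalValue(trades: List[Trade], keep: Trade => Boolean): Double =
    trades.filter(keep).map(_.value).sum

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data
    val trades = List(Trade("IBM", 10, 2.5), Trade("ORCL", 4, 5.0))

    // The type of `large` (Double) is inferred and checked at compile time.
    val large = totalValue(trades, _.quantity > 5)
    println(large)

    // A line such as the following would be rejected by the compiler:
    // val wrong: Int = "text"
  }
}
```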
ᗍ Many big data technologies use Scala, such as Spark, Kafka, Storm, Akka and Scalding, as do web frameworks like Play
Play is a high-productivity Java and Scala web application framework that integrates the components and APIs you need for
modern web application development
Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading,
a Java library that abstracts away low-level Hadoop details
ᗍ Mobile Android apps
ᗍ Domain-specific languages (DSLs)
ᗍ GUI (graphical user interface)
ᗍ Double-click on VMware Player/Workstation and open the virtual machine (it will open the Ubuntu desktop)
ᗍ Then select New Terminal to open the terminal
ᗍ Install Scala
ᗍ Type scala
ᗍ The best choices for Scala IDEs are IntelliJ IDEA and Eclipse because they do well on stability and memory
consumption and offer features such as type inference and code inspection
ᗍ Scala scripts can be written in text files and saved with a .scala extension
ᗍ The extension indicates to the operating system and the programmer that the file is a Scala program
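As a small sketch of such a file, the program below could be saved as HelloScala.scala (a file name chosen only for this example). The object name HelloScala and the greeting helper are likewise illustrative.

```scala
// HelloScala.scala (hypothetical file name)
object HelloScala {
  // Build the greeting separately so it can be reused and checked.
  def greeting(name: String): String = s"Hello, $name!"

  def main(args: Array[String]): Unit = {
    // Use the first command-line argument if one is given.
    val name = if (args.nonEmpty) args(0) else "Scala"
    println(greeting(name))
  }
}
```

Depending on the installed Scala version, a file like this is typically compiled with scalac or passed directly to the scala command.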
Popular ways to connect with the Scala community are mailing lists and IRC channels
There are also plenty of opportunities to connect face-to-face with others in the community, for example via
local Scala Meetups or local Scala user groups