8 Apache Spark
Three main steps:
• Create an RDD
• Apply transformations
• Run actions
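The three steps can be chained in a few lines. The sketch below assumes spark-core (Spark 2.x or 3.x) is on the classpath and uses a local master. The object name `ThreeSteps` and the method `sumOfDoubled` are hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; assumes spark-core is on the classpath.
object ThreeSteps {
  def sumOfDoubled(sc: SparkContext, values: Seq[Int]): Int = {
    val rdd = sc.parallelize(values) // 1. Create an RDD from a local collection
    val doubled = rdd.map(_ * 2)     // 2. Transformation: only records the step, nothing runs yet
    doubled.reduce(_ + _)            // 3. Action: launches a job and returns the result to the driver
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark inside this JVM using all available cores
    val conf = new SparkConf().setAppName("ThreeSteps").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(sumOfDoubled(sc, Seq(1, 2, 3, 4))) // prints 20
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, a `SparkContext` named `sc` already exists, so only the three lines inside `sumOfDoubled` would need to be typed.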
Creating RDDs
Two ways of creating an RDD:
• Parallelize an existing collection of values
val rdd = sc.parallelize(Seq(1,2,3,4))
• Load data file(s) from a local file system, HDFS, S3, etc.
val rdd = sc.textFile("file:///anyText.txt")
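Both ways can be tried in one small program. This is a minimal sketch that assumes spark-core is on the classpath. To keep it runnable anywhere, it writes its own temporary file in place of `file:///anyText.txt`. The names `CreateRdds`, `fromCollection`, and `fromTextFile` are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; assumes spark-core is on the classpath.
object CreateRdds {
  // Way 1: distribute an in-memory collection of values
  def fromCollection(sc: SparkContext, values: Seq[Int]): Long =
    sc.parallelize(values).count()

  // Way 2: load a text file; each line of the file becomes one RDD element.
  // A temporary file stands in for file:///anyText.txt so the sketch is self-contained.
  def fromTextFile(sc: SparkContext, lines: Seq[String]): Seq[String] = {
    val path = Files.createTempFile("anyText", ".txt")
    try {
      Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
      // toUri yields a "file:///..." URI; hdfs:// or s3a:// paths are passed the same way
      // (s3a additionally needs the hadoop-aws connector on the classpath)
      sc.textFile(path.toUri.toString).collect().toSeq
    } finally {
      Files.deleteIfExists(path)
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CreateRdds").setMaster("local[*]"))
    try {
      println(fromCollection(sc, Seq(1, 2, 3, 4))) // prints 4
      println(fromTextFile(sc, Seq("hello spark", "second line")).mkString(" | "))
    } finally {
      sc.stop()
    }
  }
}
```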
RDD Operations
Two types of operations:
• Transformations: define a new RDD based on the current RDD(s)
• Actions: return a result to the driver program or write data to storage
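Transformations are lazy: Spark records them but runs nothing until an action is called. The sketch below makes this visible by counting calls of the `map` function with a `LongAccumulator`. It assumes spark-core 2.x or 3.x on the classpath, and the names `LazyTransformations` and `demo` are hypothetical. Counting inside a transformation is only for demonstration, because Spark may apply such updates more than once if a task is re-executed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; assumes spark-core 2.x/3.x is on the classpath.
object LazyTransformations {
  // Returns (map calls before the action, map calls after it, action result)
  def demo(sc: SparkContext): (Long, Long, Seq[Int]) = {
    val calls = sc.longAccumulator("mapCalls")
    val squares = sc.parallelize(Seq(1, 2, 3, 4)).map { x =>
      calls.add(1) // incremented only when the function actually runs
      x * x
    }
    val evens = squares.filter(_ % 2 == 0) // another transformation, still nothing executed
    val before = calls.sum                 // 0: no job has been launched yet
    val result = evens.collect().toSeq     // action: launches the job
    val after = calls.sum                  // 4: map ran once per element
    (before, after, result)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LazyTransformations").setMaster("local[*]"))
    try {
      val (before, after, result) = demo(sc)
      println(s"before=$before after=$after result=${result.mkString(",")}") // before=0 after=4 result=4,16
    } finally {
      sc.stop()
    }
  }
}
```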
Two ways of working with Spark:
• Interactively (spark-shell for Scala, pyspark for Python)
• for learning or data exploration
• https://spark.apache.org/docs/latest/
• https://sparkbyexamples.com/