Software Engineering
Hadoop and Spark
David A. Wheeler
SWE 622
George Mason University
Outline
Apache Hadoop
Hadoop Distributed File System (HDFS)
MapReduce
Apache Spark
Source: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
Source: http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0004.html
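The MapReduce model is easiest to see in the classic word-count job. The sketch below uses Hadoop Streaming with two small Python scripts (the names mapper.py and reducer.py are assumptions, not taken from the slides): the mapper emits a (word, 1) pair per word, and the reducer sums the counts for each word, relying on Hadoop sorting the mapper output by key before the reduce phase.

#!/usr/bin/env python3
# mapper.py: emit one "word<TAB>1" line for every word on standard input.
import sys
for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)

#!/usr/bin/env python3
# reducer.py: sum the counts for each word.  Hadoop Streaming sorts the
# mapper output by key, so all lines for a given word arrive consecutively.
import sys
current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

The same pipeline can be tested locally without a cluster: cat input.txt | ./mapper.py | sort | ./reducer.py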
Source: https://spark.apache.org/docs/latest/quick-start.html
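In the spirit of the Spark quick start, the sketch below is a minimal self-contained PySpark application that counts how many lines of a text file contain the letters "a" and "b"; the README.md file name is an assumption, and any text file will do.

#!/usr/bin/env python3
# Minimal PySpark application: count lines containing "a" and lines containing "b".
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("QuickStartSketch").getOrCreate()
log_data = spark.read.text("README.md").cache()  # cache: the data is scanned twice
num_as = log_data.filter(log_data.value.contains("a")).count()
num_bs = log_data.filter(log_data.value.contains("b")).count()
print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))
spark.stop()

A script like this is normally run with spark-submit, either locally or against a cluster.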
Source: https://spark.apache.org/docs/latest/programming-guide.html
# Logistic regression with gradient descent in PySpark
# (from the Spark programming guide).
from math import exp
import numpy

# D (feature count), ITERATIONS, and parsePoint are assumed to be defined elsewhere.
points = spark.textFile(...).map(parsePoint).cache()  # cache: the RDD is reused every iteration
w = numpy.random.ranf(size=D)  # current separating plane
for i in range(ITERATIONS):
    # Each point contributes one gradient term; reduce() sums them across the cluster.
    gradient = points.map(
        lambda p: (1 / (1 + exp(-p.y * w.dot(p.x))) - 1) * p.y * p.x
    ).reduce(lambda a, b: a + b)
    w -= gradient
print("Final separating plane: %s" % w)