
Conference Home Page @GOTOCHGO

Welcome to
GOTO 2014 Night Chicago # 1
Speakers
Steve Vinoski, Chief Architect Basho
Dave Thomas, Bedarra Research Labs

2003 Bedarra Research Labs. All rights reserved.

Friday, November 29, 13


Confirmed speakers, more to come



GOTO Chicago Conference May 20-21, 2014

3 Keynotes - 30 Invited Talks, 8 Full Day Workshops


JAOO Aarhus, Denmark, 16 years ago. GOTO Aarhus, Amsterdam, Berlin, Chicago,
Zurich; QCON London; FlowCon SFO; CodeMesh London; YOW! Melbourne, Brisbane,
Sydney; Lambda Jam Brisbane, Chicago; Erlang Factory SFO, London

Mission: Complement the local tech community by:

1. Bringing world-leading software experts to meet local developers
2. Focusing on the latest and emerging technologies and practices
3. Helping expand the local network of people who really get software and will
be the next generation of enlightened software leaders/executives
4. Inviting all speakers via an independent program committee, based on their
competence and reputation, independent of sponsor, platform, or
organizer bias
Conference Home Page



Copious Data: The Killer App
for Functional Programming

Nov. 21, 2013


GOTO Chicago 2014 Night #2
[email protected]
polyglotprogramming.com/talks
Copyright 2011-2013, Dean Wampler, All Rights Reserved

Copyright Dean Wampler, 2011-2013, All Rights Reserved. Photos can only be used with
permission. Otherwise, the content is free to use.
Photo: Cloud Gate (a.k.a. The Bean) in Millennium Park, Chicago, Illinois, USA
Consultant at
Typesafe

Dean Wampler...



Typesafe builds tools for creating Reactive Applications, http://typesafe.com/platform. See


also the Reactive Manifesto, http://www.reactivemanifesto.org/

Photo: The Chicago River


Founder,
Chicago-Area Scala
Enthusiasts
and co-organizer,
Chicago Hadoop User Group

Dean Wampler...



I've been doing Scala for 6 years and Big Data for 3.5 years.
Programming Hive
Edward Capriolo, Dean Wampler & Jason Rutherglen

Functional Programming for Java Developers
Dean Wampler

Dean Wampler...



My books
What Is Big, err,
Copious Data?

Copious
Data
Data so big that
traditional solutions are
too slow, too small, or
too expensive to use.
Hat tip: Bob Korbus


Big Data is a buzzword, but generally associated with the problem of data sets too big to
manage with traditional SQL databases. A parallel development has been the NoSQL
movement, which is good at handling semistructured data, scaling, etc.
3 Trends

Three prevailing trends driving data-centric computing.
Photo: Pritzker Pavilion, Millennium Park, Chicago (designed by Frank Gehry)
Data Size

Data volumes are obviously growing rapidly.
Facebook now has over 600PB (Petabytes) of data in Hadoop clusters!
Formal Schemas

There is less emphasis on formal schemas and domain models, i.e., both relational models of data and OO models, because data schemas and
sources change rapidly, and we need to integrate so many disparate sources of data. So, using relatively-agnostic software, e.g., collections of
things where the software is more agnostic about the structure of the data and the domain, tends to be faster to develop, test, and deploy. Put
another way, we find it more useful to build somewhat agnostic applications and drive their behavior through data...
Data-Driven Programs

This is the 2nd generation Stanley, the most successful self-driving car ever built (by a Google-Stanford team). Machine learning is growing in
importance. Here, generic algorithms and data structures are trained to represent the world using data, rather than encoding a model of the
world in the software itself. It's another example of generic algorithms that produce the desired behavior by being application agnostic and data
driven, rather than hard-coding a model of the world. (In practice, however, a balance is struck between completely agnostic apps and some
engineering for the specific problem, as you might expect...)
Probabilistic
Models vs.
Formal
Grammars

An interesting manifestation of this trend is the public argument between Noam Chomsky and Peter Norvig on the nature of language. Chomsky
long ago proposed a hierarchical model of formal language grammars. Peter Norvig is a proponent of probabilistic models of language. Indeed all
successful automated language processing systems are probabilistic.
http://www.tor.com/blogs/2011/06/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai
What Is
MapReduce?

Cloud Gate - The Bean - in Millennium Park, Chicago, on a sunny day - with some of my relatives ;)
Hadoop is the dominant
copious data platform
today.

A Hadoop Cluster
Hadoop v1.X Cluster (diagram): a master node running the JobTracker and NameNode (writing to an NFS disk), a backup master running the Secondary NameNode, and worker nodes each running a TaskTracker and a DataNode over a bunch of local disks.
A Hadoop v1.X cluster. (V2.X introduces changes in the master processes, including support for high-availability and federation). In brief:
JobTracker (JT): Master of submitted MapReduce jobs. Decomposes job into tasks (each a JVM process), often run where the blocks of input files
are located, to minimize net IO.
NameNode (NN): HDFS (Hadoop Distributed File System) master. Knows all the metadata, like block locations. Writes updates to a shared NFS disk
(in V1) for use by the Secondary NameNode.
Secondary NameNode (SNN): periodically merges in-memory HDFS metadata with update log on NFS disk to form new metadata image used when
booting the NN and SNN.
TaskTracker: manages each task given to it by the JT.
DataNode: manages the actual blocks it has on the node.
Disks: By default, Hadoop just works with a bunch of disks - cheaper and sometimes faster than RAID. Blocks are replicated 3x (default) so most
HW failures don't result in data loss.
MapReduce in Hadoop
Let's look at a
MapReduce algorithm:
Inverted Index.
Used for text/web search.
Let's walk through a simple version of computing an inverted index. Imagine a web crawler has found all docs on the web and stored their URLs
and contents in HDFS. Now we'll index it: build a map from each word to all the docs where it's found, ordered by term frequency within the docs.
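The steps just described can be sketched outside Hadoop; here is a toy, single-process version in Python (the helper names map_task, reduce_task, and inverted_index are illustrative, not the Hadoop API):

```python
from collections import defaultdict

def map_task(doc_id, contents):
    # Emit (word, (doc_id, count)) pairs for one document,
    # mirroring what each Hadoop map task does per input block.
    counts = defaultdict(int)
    for word in contents.lower().split():
        counts[word] += 1
    return [(word, (doc_id, n)) for word, n in counts.items()]

def reduce_task(word, doc_counts):
    # Collect all (doc_id, count) pairs for one word, most frequent first.
    return (word, sorted(doc_counts, key=lambda dc: -dc[1]))

def inverted_index(docs):
    # The "shuffle": group map outputs by key, then reduce each group.
    grouped = defaultdict(list)
    for doc_id, contents in docs.items():
        for word, dc in map_task(doc_id, contents):
            grouped[word].append(dc)
    return dict(reduce_task(w, dcs) for w, dcs in grouped.items())

index = inverted_index({
    "wikipedia.org/hadoop": "hadoop provides mapreduce and hdfs",
    "wikipedia.org/hbase":  "hbase stores data in hdfs",
})
```

In the real cluster, the map tasks, the shuffle, and the reduce tasks each run on many machines; the logic is the same.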
Crawl teh Interwebs
Web Crawl -> Map Phase (diagram): the crawl stores each page in HDFS as (URL, contents) pairs, e.g., (wikipedia.org/hadoop, "Hadoop provides MapReduce and HDFS..."), (wikipedia.org/hbase, "HBase stores data in HDFS..."), (wikipedia.org/hive, "Hive queries HDFS files and HBase tables with SQL..."); a Map Task processes each block.
Crawl pages, including Wikipedia. Use the URL as the document id in our first index, and the contents of each document (web page) as the second
column in our data set.
Compute Inverted Index
Web Crawl -> Map Phase -> Sort, Shuffle -> Reduce Phase (diagram): Map Tasks read the crawled blocks; the sort/shuffle stage routes each key to a reducer; Reduce Tasks write index blocks for key ranges such as hadoop, hbase, hdfs, hive, ... and, ...
Now run a MapReduce job, where a separate Map task for each input block will be started. Each map tokenizes the content into words, counts the
words, and outputs key-value pairs...
Compute Inverted Index
(diagram, continued): key-value pairs output by the first map task for the (fake) Wikipedia Hadoop page include (hadoop,(wikipedia.org/hadoop,1)), (provides,(wikipedia.org/hadoop,1)), (mapreduce,(wikipedia.org/hadoop,1)), (and,(wikipedia.org/hadoop,1)), (hdfs,(wikipedia.org/hadoop,1)), ...
Each key is a word that was found and the corresponding value is a tuple of the URL (or other document id) and the count of the words (or
alternatively, the frequency within the document). Shown are what the first map task would output (plus other k-v pairs) for the (fake) Wikipedia
Hadoop page. (Note that we convert to lower case)
Compute Inverted Index
Map Phase -> Reduce Phase (diagram, final): after the shuffle, each Reduce task writes inverted index blocks, e.g., hadoop -> (.../hadoop,1); hbase -> (.../hbase,1),(.../hive,1); hdfs -> (.../hadoop,1),(.../hbase,1),(.../hive,1); hive -> (.../hive,1); and -> (.../hadoop,1),(.../hive,1).
Finally, each reducer will get some range of the keys. There are ways to control this, but we'll just assume that the first reducer got all keys starting
with 'h' and the last reducer got all the 'and' keys. The reducer outputs each word as a key and a list of tuples consisting of the URLs (or doc ids)
and the frequency/count of the word in that document, sorted by most frequent first. (All our docs have only one occurrence of any word, so the
sort is moot.)
Anatomy: MapReduce Job
Map Phase -> Sort, Shuffle -> Reduce Phase (diagram).
Map (or Flatmap): transform one input to 0-N outputs.
Reduce: collect multiple inputs into one output.
To recap, a true functional/mathematical map transforms one input to one output, but this is generalized in MapReduce to be one to 0-N. In
other words, it should be FlatmapReduce!! The output key-value pairs are distributed to reducers. The reduce collects together multiple inputs
with the same key into one output.
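That one-to-0-N generalization is easy to state in code; a minimal Python sketch (flat_map here is a hypothetical helper, not part of any MapReduce API):

```python
def flat_map(f, records):
    # Apply f to each record; f returns a list of zero or more outputs,
    # which are concatenated into one flat result stream.
    return [out for rec in records for out in f(rec)]

# A "mapper" that tokenizes one line into many (word, 1) pairs.
# Note the empty line contributes zero outputs: one input, 0-N outputs.
pairs = flat_map(lambda line: [(w, 1) for w in line.split()],
                 ["hadoop provides hdfs", "", "hbase stores data"])
```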

Quiz. Do you understand this tweet?


So, MapReduce is
a mashup of our friends
flatmap and reduce.


Even in this somewhat primitive and coarse-grain framework, our functional data concepts are evident!
Today,
Hadoop is our best
general-purpose tool
for horizontal scaling
of Copious Data.
MapReduce and Its
Discontents


Is MapReduce the end of the story? Does it meet all our needs? Let's look at a few problems.
Photo: Gratuitous Romantic beach scene, Ohio St. Beach, Chicago, Feb. 2011.
MapReduce doesn't fit
all computation needs.
HDFS doesn't fit all
storage needs.

It's hard to implement
many algorithms
in MapReduce.


Even word count is not obvious. When you get to fancier stuff like joins, group-bys, etc., the
mapping from the algorithm to the implementation is not trivial at all. In fact, implementing
algorithms in MR is now a specialized body of knowledge.
MapReduce is very
coarse-grained.

1-Map, 1-Reduce
phase...

Multiple MR jobs
required for some
algorithms.
Each one flushes its
results to disk!

If you have to sequence MR jobs to implement an algorithm, ALL the data is flushed to disk
between jobs. There's no in-memory caching of data, leading to huge IO overhead.
MapReduce is designed
for offline, batch-mode
analytics.
High latency; not
suitable for event
processing.

Alternatives are emerging to provide event-stream (real-time) processing.


The Hadoop Java API
is hard to use.


The Hadoop Java API is even more verbose and tedious to use than it should be.
Let's look at code for a
simpler algorithm,
Word Count.
(Tokenize as before, but
ignore original
document locations.)

In Word Count, the mapper just outputs the word-count pairs. We forget about the document
URL/id. The reducer gets all word-count pairs for a word from all mappers and outputs each
word with its final, global count.
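Before looking at the Java, the same algorithm fits in a few lines of Python; this single-process sketch (wc_map and wc_reduce are illustrative names, not any framework's API) makes the two roles concrete:

```python
from collections import Counter

def wc_map(line):
    # Mapper: emit (word, 1) for every token, ignoring the document id.
    return [(w.lower(), 1) for w in line.split()]

def wc_reduce(pairs):
    # Reducer: sum the counts for each word across all mapper outputs.
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)

lines = ["Hadoop provides HDFS", "HBase stores data in HDFS"]
word_counts = wc_reduce(p for line in lines for p in wc_map(line))
```

Keep this in mind as a baseline for how little essential logic there is, compared to the framework code that follows.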
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

class WCMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  static final IntWritable one = new IntWritable(1);
  static final Text word = new Text();  // Value will be set in a non-thread-safe way!

  @Override
  public void map(LongWritable key, Text valueDocContents,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws java.io.IOException {
    String[] tokens = valueDocContents.toString().split("\\s+");
    for (String wordString: tokens) {
      if (wordString.length() > 0) {
        word.set(wordString.toLowerCase());
        output.collect(word, one);
      }
    }
  }
}

class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text keyWord, java.util.Iterator<IntWritable> valuesCounts,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws java.io.IOException {
    int totalCount = 0;
    while (valuesCounts.hasNext()) {
      totalCount += valuesCounts.next().get();
    }
    output.collect(keyWord, new IntWritable(totalCount));
  }
}
This is intentionally too small to read and we're not showing the main routine, which doubles the code size. The algorithm is simple, but the framework is in your
face. In the next several slides, notice which colors dominate. In this slide, it's dominated by green for types (classes), with relatively few yellow functions that
implement actual operations (i.e., do actual work).
The main routine I've omitted contains boilerplate details for configuring and running the job. This is just the core MapReduce code. In fact, Word Count is not
too bad, but when you get to more complex algorithms, even conceptually simple ideas like relational-style joins and group-bys, the corresponding MapReduce
code in this API gets complex and tedious very fast!
(Slide 36 repeats the same code, highlighting the interesting bits: the handful of lines that actually tokenize, count, and collect.)
(Slide 37 repeats the same code once more.) The '90s called. They want their EJBs back!
Use Cascading (Java)

Cascading is a Java library that provides higher-level abstractions for building data processing pipelines, with concepts familiar from SQL such as
joins, group-bys, etc. It works on top of Hadoop's MapReduce and hides most of the boilerplate from you.
See http://cascading.org.
Photo: Fermi Lab Office Building, Batavia, IL.
Cascading Concepts
Data flows consist of
source and sink Taps
connected by Pipes.

Word Count
Flow (diagram): a source Tap reads lines from HDFS into a Pipe assembly ("word count assembly"): Each(Regex) splits each line into words, GroupBy groups by word, and Every(Count) counts each group; a sink Tap writes the (word, count) results back to HDFS.

Schematically, here is what Word Count looks like in Cascading. See
http://docs.cascading.org/cascading/1.2/userguide/html/ch02.html for details.
import cascading.*;
...
public class WordCount {
public static void main(String[] args) {
String inputPath = args[0];
String outputPath = args[1];
Properties properties = new Properties();
FlowConnector.setApplicationJarClass( properties, WordCount.class );

Scheme sourceScheme = new TextLine( new Fields( "line" ) );


Scheme sinkScheme = new TextLine( new Fields( "word", "count" ) );
Tap source = new Hfs( sourceScheme, inputPath );
Tap sink = new Hfs( sinkScheme, outputPath, SinkMode.REPLACE );

Pipe assembly = new Pipe( "wordcount" );

String regex = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";


Function function = new RegexGenerator( new Fields( "word" ), regex );
assembly = new Each( assembly, new Fields( "line" ), function );
assembly = new GroupBy( assembly, new Fields( "word" ) );
Aggregator count = new Count( new Fields( "count" ) );
assembly = new Every( assembly, count );

FlowConnector flowConnector = new FlowConnector( properties );


Flow flow = flowConnector.connect( "word-count", source, sink, assembly);
flow.complete();
}
}
Here is the Cascading Java code. It's cleaner than the MapReduce API, because the code is more focused on the algorithm with less boilerplate,
although it looks like it's not that much shorter. HOWEVER, this is all the code, whereas previously I omitted the setup (main) code. See
http://docs.cascading.org/cascading/1.2/userguide/html/ch02.html for details of the API features used here; we won't discuss them here, but just
mention some highlights.
Note that there is still a lot of green for types, but at least the API emphasizes composing behaviors together.
Use Scalding (Scala)

Scalding is a Scala DSL (domain-specific language) that wraps Cascading, providing an even more intuitive and more boilerplate-free API for
writing MapReduce jobs. https://github.com/twitter/scalding
Scala is a new JVM language that modernizes Java's object-oriented (OO) features and adds support for functional programming, as we discussed
previously and we'll revisit shortly.
import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  TextLine( args("input") )
    .read
    .flatMap('line -> 'word) {
      line: String => line.trim.toLowerCase.split("\\W+")
    }
    .groupBy('word) {
      group => group.size('count)
    }
    .write(Tsv(args("output")))   // That's it!!
}
This Scala code is almost pure domain logic with very little boilerplate. There are a few minor differences in the implementation. You don't explicitly specify the
Hfs (Hadoop Distributed File System) taps. That's handled by Scalding implicitly when you run in non-local mode. Also, I'm using a simpler tokenization
approach here, where I split on anything that isn't a word character [0-9a-zA-Z_].
There is little green, in part because Scala infers types in many cases. There is a lot more yellow for the functions that do real work!
What if MapReduce, and hence Cascading and Scalding, went obsolete tomorrow? This code is so short, I wouldn't care about throwing it away! I invested little
time writing it, testing it, etc.
Use Cascalog (Clojure)

http://nathanmarz.com/blog/introducing-cascalog-a-clojure-based-query-language-for-hado.html
Clojure is a new JVM, Lisp-based language with lots of important concepts, such as persistent data structures.
(defn lowercase [w] (.toLowerCase w))

(?<- (stdout) [?word ?count]
     (sentence ?s)
     (split ?s :> ?word1)
     (lowercase ?word1 :> ?word)
     (c/count ?count))

Datalog-style queries
Cascalog embeds Datalog-style logic queries. The variables to match are named ?foo.
Other Improved APIs:
Crunch (Java) &
Scrunch (Scala)
Scoobi (Scala)
...

See https://github.com/cloudera/crunch.
Others include Scoobi (http://nicta.github.com/scoobi/) and Spark, which we'll discuss next.
Use Spark
(Not MapReduce)

http://www.spark-project.org/
Spark started as a Berkeley project. Recently, the developers launched Databricks to commercialize it, given the growing interest in Spark as a
MapReduce replacement. It can run under YARN, the newer Hadoop resource manager (it's not clear that's the best strategy, though, vs. using
Mesos, another Berkeley project being commercialized by Mesosphere), and Spark can talk to HDFS, the Hadoop Distributed File System.
import org.apache.spark.SparkContext

object WordCountSpark {
def main(args: Array[String]) {
val ctx = new SparkContext(...)
val file = ctx.textFile(args(0))
val counts = file.flatMap(
line => line.split("\\W+"))
.map(word => (word, 1))
.reduceByKey(_ + _)
    counts.saveAsTextFile(args(1))
  }
}

Also small and concise!

This Spark example is actually closer in a few details, i.e., function names used, to the original Hadoop Java API example, but it cuts down boilerplate to the bare
minimum.
Spark is a Hadoop
MapReduce alternative:
Distributed computing with
in-memory caching.
~30x faster than MapReduce
(in part due to caching of
intermediate data).

Spark also addresses the lack of flexibility for the MapReduce model.
Spark is a Hadoop
MapReduce alternative:

Originally designed for


machine learning applications.
Developed by Berkeley AMP.
Use SQL!
Hive, Shark, Impala,
Presto, or Lingual

Using SQL when you can! Here are 5 (and growing!) options.
Use SQL when you can!
Hive: SQL on top of MapReduce.
Shark: Hive ported to Spark.
Impala & Presto: HiveQL with
new, faster back ends.
Lingual: ANSI SQL on Cascading.

See http://hive.apache.org/ or my book for Hive, http://shark.cs.berkeley.edu/ for Shark,
http://www.cloudera.com/content/cloudera/en/products/cloudera-enterprise-core/cloudera-enterprise-RTQ.html for Impala, and
http://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
for Presto. Impala & Presto are relatively new.
Word Count in Hive SQL!
CREATE TABLE docs (line STRING);
LOAD DATA INPATH '/path/to/docs'
INTO TABLE docs;

CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
(SELECT explode(split(line, '\W+'))
AS word FROM docs) w
GROUP BY word
ORDER BY word;

... and similarly for the other SQL tools.


This is how you could implement word count in Hive. We're using some Hive built-in functions for tokenizing words in each line, the one column in the docs
table, etc.
Lingual is similar, but because it's more ANSI-compliant, the example would be much different.
We're in the era I call
The SQL Strikes Back!

(with apologies to
George Lucas...)

IT shops realize that NoSQL is useful and all, but people really, Really, REALLY love SQL. So,
it's making a big comeback. You can see it in Hadoop, in SQL-like APIs for some NoSQL
DBs, e.g., Cassandra and MongoDB's JavaScript-based query language, as well as NewSQL
DBs.
Combinators

Photo: The defunct Esquire movie theater on Oak St., off the Magnificent Mile, in Chicago. Now completely gone!
Why were the
Scala, Clojure, and SQL
solutions so concise
and appealing??

Data problems
are fundamentally
Mathematics!

evanmiller.org/mathematical-hacker.html


A blog post about how developers ignore mathematics at their peril!


Category Theory

Monads - Structure.
Abstracting over collections.
Control flow and mutability
containment.

Monads generalize the properties of containers, like lists and maps, such as applying a
function to each element and returning a new instance of the same container type. This also
applies to encapsulations of state transformations and principled mutability, as used in
Haskell.
Category Theory

Monoids, Groups, Rings, etc.


Abstracting over addition,
subtraction, multiplication, and
division.

Monoid: Addition

(a + b) + (c + d) for some a, b, c, d.
Add All the Things, Avi Bryant,
StrangeLoop 2013.

infoq.com/presentations/abstract-algebra-analytics

For an explanation of this slide, see this great presentation by Avi Bryant at StrangeLoop
2013 on generalizing addition (monoids).
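The point of the monoid laws for copious data: because the operation is associative (and has an identity), (a + b) + (c + d) can be regrouped freely, so partial results can be computed on separate machines and combined later. A small Python sketch, assuming only that op is associative with the given identity (monoid_fold is an illustrative name):

```python
from functools import reduce

def monoid_fold(op, identity, chunks):
    # Fold each chunk independently (the "map side" partial results),
    # then combine the partials -- legal only because op is associative.
    partials = [reduce(op, chunk, identity) for chunk in chunks]
    return reduce(op, partials, identity)

data = [[1, 2], [3, 4], [5]]                        # three "machines" worth of data
total = monoid_fold(lambda a, b: a + b, 0, data)    # integer addition monoid
biggest = monoid_fold(max, float("-inf"), data)     # max is also a monoid
```

Any monoid (sum, max, set union, approximate sketches like HyperLogLog) plugs into the same combining machinery unchanged; that is Bryant's point.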
Linear Algebra
Eigenvector and Singular Value
Decomposition.
Essential tools in machine
learning.
Av = λv
Example: Eigenfaces
Represent images
as vectors.
Solve for
modes.
Top N modes
approx. faces!
http://en.wikipedia.org/wiki/File:Eigenfaces.png
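The eigenvector equation Av = λv can be attacked numerically with power iteration: repeatedly apply A and renormalize, and the vector converges toward the dominant mode. This is only a sketch of the idea behind finding "top modes" like eigenfaces; real systems use SVD routines from numerical libraries. A dependency-free Python sketch (power_iteration is an illustrative name; it assumes a dominant eigenvalue exists and the first component stays nonzero):

```python
def power_iteration(A, steps=100):
    # Repeatedly apply A and renormalize; converges to the eigenvector
    # with the largest-magnitude eigenvalue (the dominant "mode").
    n = len(A)
    v = [1.0] * n
    for _ in range(steps):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
    # Rayleigh-style estimate of lambda from the first row: (Av)[0] / v[0].
    lam = sum(A[0][j] * v[j] for j in range(n)) / v[0]
    return lam, v

A = [[2.0, 0.0],
     [0.0, 1.0]]
lam, v = power_iteration(A)
```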
Set Theory and
First-Order Logic
Relational Model.
Data organized
into tuples,
grouped by
relations.
http://dl.acm.org/citation.cfm?doid=362384.362685

Formulated by Codd in '69. Most systems don't follow it exactly, like allowing identical
records, where set elements are unique. Codd's original model didn't support NULLs either
(unknown), but he later proposed a revision to allow them.
Set Theory and
First-Order Logic
Relational Model.
Most RDBMSs deviate from RM.


What are Combinators?
Functions that are side-effect
free.
They get all their information
from their inputs and write all
their work to their outputs.
Let's look at
4 relational operators
and the corresponding
functional combinators.


See, for example, the discussions in Database in Depth and SQL and Relational Theory,
Second Edition, both by C.J. Date (O'Reilly).
Recall our Word Counts:
CREATE TABLE word_counts (
word CHARACTER(64),
count INTEGER);
(ANSI SQL syntax)

val word_counts: Stream[(String,Int)]

(Scala)

Our word_counts table from before, using ANSI SQL syntax this time.
The corresponding Scala might be any kind of collection, e.g., a List. Here, I'll use a Stream, which is a lazy collection useful for large data structures like I/O...
Note that it's a stream of a two-element tuple, a String (for the word) and an Int (for the count).
Restrict

SELECT * FROM word_counts


WHERE word = 'Chicago';

vs.

word_counts.filter {
case (word, count) =>
word == "Chicago"
}

For the Scala example, assume word_counts is a collection (List, Vector, etc.) of 2-element tuples. The case match in the anonymous function passed to filter is a
way of conveniently assigning variables to each element of the tuple, here word and count. Then I filter on only certain word values.
Project

SELECT word FROM word_counts;

vs.

word_counts.map {
case (word, count) =>
word
}

Here, I just return the words in each record or Scala tuple.
Join
CREATE TABLE dictionary (
word CHARACTER(64),
definition CHARACTER(256));

Table for join examples.

First, we need something to join with; lets use a dictionary of word definitions.
Join - SQL
SELECT w.word, d.definition
FROM word_counts AS w,
dictionary AS d
WHERE w.word = d.word;

Here is the SQL join that gives us the words and their definitions. (Side note: Hive doesn't support this inferred join syntax; you have to use a more explicit JOIN
ON syntax.)
Join - Scalding
val word_counts =
Csv("/path/", ('wword, 'count)).read
val dictionary =
Csv("/path/", ('dword, 'definition)).read

word_counts
.joinWithLarger('wword -> 'dword,
dictionary)
.project('wword, 'definition)

The Scala collections library doesn't have a join combinator. We would have to build up something that understands the data, such as exploiting sort order. This is
a case where a large-scale data system will implement expensive operations, where a general-purpose programming library might not. So, I'm using a Scalding
example.
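To illustrate what building it by hand looks like (my own sketch, not from the deck), here is a naive in-memory equi-join over plain Scala collections. Fine for small data; a big-data system must do much better (sort-merge joins, broadcast joins, and so on):

```scala
// Naive hash-join sketch: build a map on the (presumably smaller)
// dictionary side, then probe it for each word-count record.
val wordCounts = List(("chicago", 3), ("goto", 2), ("data", 5))
val dictionary = List(("chicago", "a city"), ("data", "facts and figures"))

val dictMap = dictionary.toMap
val joined = for {
  (word, _)  <- wordCounts
  definition <- dictMap.get(word).toList // inner join: unmatched words drop out
} yield (word, definition)
```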
Join
SELECT w.word, d.definition
FROM word_counts AS w,
dictionary AS d
WHERE w.word = d.word;

vs.

word_counts
.joinWithLarger('wword -> 'dword,
dictionary)
.project('wword, 'definition)

Now shown together, with some of the Scalding setup code removed.
Joins are expensive.
Your data system needs
to exploit
optimizations.

Group By
SELECT count, COUNT(*) AS size
FROM word_counts
GROUP BY count
ORDER BY size DESC;

vs.
word_counts.groupBy {
case (word, count) => count
}.toList.map {
case (count, words) => (count, words.size)
}.sortBy {
case (count, size) => -size
}
How many words appeared once, twice, 3 times, ..., N-times? Order this list descending.
I'm back to the Scala library (as opposed to Scalding). The code inputs a collection of tuples, (word,count) and groups by count. This creates a map with the
count as the key and a list of the words as the value.
Next we convert this to a list of tuples (count,List(words)) and map it to a list of tuples with the (count, size of List(words)), then finally sort descending by the list
sizes.
Example
scala> val word_counts = List(
("a", 1), ("b", 2), ("c", 3),
("d", 2), ("e", 2), ("f", 3))

scala> val out = word_counts.groupBy {


case (word, count) => count
}.toList.map {
case (count, words) => (count, words.size)
}.sortBy {
case (count, size) => -size
}

out: List[(Int,Int)] =
List((2,3), (3,2), (1,1))
Here's a simple example you can run in the Scala REPL (prompts are scala>).
We could go on, but
you get the point.
Declarative, functional
combinators are a
natural tool for data.
SQL vs. FP
SQL
Lots of optimizations for data
manipulation.
FP
More combinators.
First class functions!

A drawback of SQL is that it doesn't provide first-class functions, so (depending on the
system) you're limited to those that are built-in or UDFs (user-defined functions) that you can
write and add. FP languages make this easy!
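A quick sketch of the contrast (my own example): in Scala, any function, named or anonymous, can be passed straight into a combinator, with no UDF registration step as in most SQL engines:

```scala
// First-class functions: arbitrary logic flows into combinators directly.
val words = List("Chicago", "goto", "FP", "data")

// A named predicate and an inline transformation, composed on the fly:
val isShort: String => Boolean = _.length <= 4
val shouted = words.filter(isShort).map(_.toUpperCase)
// shouted == List("GOTO", "FP", "DATA")
```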
FP to the
Rescue!

Outside my condo window one Sunday morning...
Popular Claim:

Multicore concurrency
is driving FP adoption.


We've all heard this. In fact, this is how I got interested in FP.
My Claim:
Data will drive the next
wave of widespread
FP adoption.


Even today, most developers get by without understanding concurrency. Many will just use an
Actor or Reactive model to solve their problems. I think more devs will have to learn how to
work with data at scale and that fact will drive them to FP. This will be the next wave.
Data Architectures

What should software architectures look like for these kinds of systems?
Photo: Two famous 19th Century Buildings in Chicago.
[Diagram: traditional object-oriented architecture. Queries flow from object-oriented domain logic through an in-memory object model (ParentB1 with children ChildB1 and ChildB2, each with toJSON methods), via an Object-Relational Mapping over the SQL result set, down to the database.]
Traditionally, weve kept a rich, in-memory domain model requiring an ORM to convert persistent data into the model. This is resource overhead and complexity we cant afford in big data
systems. Rather, we should treat the result set as it is, a particular kind of collection, do the minimal transformation required to exploit our collections libraries and classes representing some
domain concepts (e.g., Address, StockOption, etc.), then write functional code to implement business logic (or drive emergent behavior with machine learning algorithms)

The toJSON methods are there because we often convert these object graphs back into fundamental structures, such as the maps and arrays of JSON so we can send them to the browser!
[Diagram: the relational/functional alternative shown side-by-side with the object-oriented stack above. Domain logic is built on functional abstractions and a thin functional wrapper for relational data, directly over the SQL result set and database, with no object model or ORM layer.]
But the traditional systems are a poor fit for this new world: 1) they add too much overhead in computation (the ORM layer, etc.) and memory (to store the objects). Most of what we do with
data is mathematical transformation, so were far more productive (and runtime efficient) if we embrace fundamental data structures used throughout (lists, sets, maps, trees) and build rich
transformations into those libraries, transformations that are composable to implement business logic.
[Diagram: the same relational/functional stack, annotated "Focus on: Lists, Maps, Sets, Trees, ..." -- fundamental data structures instead of an object model.]
But the traditional systems are a poor fit for this new world: 1) they add too much overhead in computation (the ORM layer, etc.) and memory (to store the objects). Most of what we do with
data is mathematical transformation, so were far more productive (and runtime efficient) if we embrace fundamental data structures used throughout (lists, sets, maps, trees) and build rich
transformations into those libraries, transformations that are composable to implement business logic.
[Diagram: three web clients (Web Client 1, 2, 3) all talking to a single monolithic object model (ParentB1, ChildB1, ChildB2 with toJSON), backed by a database and files.]
In a broader view, object models tend to push us towards centralized, complex systems that don't decompose well and stifle reuse and optimal deployment scenarios. FP code makes it
easier to write smaller, focused services that we compose and deploy as appropriate.
[Diagram: the same three web clients now talking to smaller, focused services (Process 1, Process 2, Process 3) over the database and files, replacing the monolithic object model.]
In a broader view, object models tend to push us towards centralized, complex systems that don't decompose well and stifle reuse and optimal deployment scenarios. FP code makes it
easier to write smaller, focused services that we compose and deploy as appropriate. Each ProcessN could be a parallel copy of another process, for horizontal, shared-nothing
scalability, or some of these processes could be other services.
Smaller, focused services scale better, especially horizontally. They also don't encapsulate more business logic than is required, and this (informal) architecture is also suitable for scaling
ML and related algorithms.
[Diagram: the services architecture again, annotated with the trends from the start of the talk: growing Data Size, less Formal Schema, more Data-Driven Programs.]
And this structure better fits the trends I outlined at the beginning of the talk.
Hadoop is the
Enterprise Java Beans
of our time.

I worked with EJBs a decade ago. The framework was completely invasive into your business logic. There were too many configuration options in
XML files. The framework paradigm was a poor fit for most problems (like soft real-time systems and most algorithms beyond Word Count).
Internally, EJB implementations were inefficient and hard to optimize, because they relied on poorly considered object boundaries that muddled
more natural boundaries. (I've argued in other presentations and my FP for Java Devs book that OOP is a poor modularity tool.)
The fact is, Hadoop reminds me of EJBs in almost every way. It's a 1st-generation solution that mostly works okay and people do get work done
with it, but just as the Spring Framework brought an essential rethinking to Enterprise Java, I think there is an essential rethink that needs to
happen in Big Data, specifically around Hadoop. The functional programming community is well positioned to create it...
MapReduce
is waning


We've seen a lot of issues with MapReduce. Already, alternatives are being developed, either general options, like
Spark and Storm, or special-purpose built replacements, like Impala. Let's consider other options...
Emerging replacements
are based on
Functional Languages...
import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .read
    .flatMap('line -> 'word) { line: String =>
      line.trim.toLowerCase.split("\\W+")
    }
    .groupBy('word) { group => group.size('count) }
    .write(Tsv(args("output")))
}


FP is such a natural fit for the problem that any attempts to build big data systems without it will be handicapped
and probably fail.
Let's consider other MapReduce options...
... and SQL

CREATE TABLE docs (line STRING);


LOAD DATA INPATH '/path/to/docs'
INTO TABLE docs;

CREATE TABLE word_counts AS


SELECT word, count(1) AS count FROM
(SELECT explode(split(line, '\\W+'))
AS word FROM docs) w
GROUP BY word
ORDER BY word;


Questions?

Nov. 21, 2013


GOTO Chicago 2014 Night #2
[email protected]
@deanwampler
polyglotprogramming.com/talks

All pictures Copyright Dean Wampler, 2011-2013, All Rights Reserved. All other content is free to use, but
attribution is requested.
Photo: Building in fog on Michigan Avenue
