
CS 5412: Topics in Cloud Computing: Ken Birman Spring, 2018

CS5412 is a course on cloud computing that focuses on fog and edge computing rather than hands-on cloud projects. It aims to understand key components, how they work, and their limitations through concrete use cases. Prior to 2005, Amazon had large data centers for web requests that were unreliable due to scaling issues. A Yahoo experiment found that page load times over 100ms negatively impacted purchase rates. Around 2006, Amazon reinvented their data center design using inexpensive racks of machines, a message bus, and parallelism to improve performance and scalability. Today's cloud uses lightweight tier-one servers and specialized tier-two microservices. Even simple cloud designs pose choices around threads, processes, virtual machines, and containers to optimize resource usage

Uploaded by

Kevin Gao

CS 5412: TOPICS IN CLOUD COMPUTING
Ken Birman
Spring, 2018
HTTP://WWW.CS.CORNELL.EDU/COURSES/CS5412/2018SP
CLOUD COMPUTING
CS5412 is…
• A deep study of a big topic, but NOT a hands-on class (we encourage
cloud experience, but don’t require that project demos run on the cloud)
• Instead, our goal is to focus on some concrete use cases and really
understand what role each element plays, how that class of components
works, and what their limitations are.
This year’s focus will be on “Fog and Edge Computing”
• When a cloud touches the edge, you get fog.
WHERE DID THE CLOUD COME
FROM?
Prior to ~2005, we had “data centers designed for high availability”.
Amazon had especially large ones, to serve its web requests
• This is all before the AWS cloud model
• The real goal was just to support online shopping

Their system wasn’t very reliable and the core problem was scaling
• Like a theoretical complexity growth issue.
• Amazon’s computers were overloaded and often crashed
YAHOO EXPERIMENT
A sprint to render your web page!

At Yahoo, they tried an “alpha/beta” (A/B) experiment.

For web purchases, they had one group of customers who saw fast web
page rendering (below 100ms), and a second group who saw small delays
before pages rendered.
For every additional 100ms of delay, purchase rates noticeably dropped. So 100ms
became a magic threshold!

STARTING AROUND 2006, AMAZON LED IN
REINVENTING DATA CENTER
COMPUTING
This was pretty much when Werner Vogels joined, but the redesign wasn’t
actually his idea (it was already happening).

Amazon reorganized their design so that when you access a web page,
tens or hundreds of “services” could collaborate to fetch the content.

They also began to guess at your next action and precompute what they
would probably need to answer your next query or link click.
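The fan-out idea above can be sketched roughly as follows. This is only an illustration, assuming Python's asyncio: the three service names, their simulated delays, and the `budget_s` parameter are hypothetical stand-ins, not Amazon's actual design.

```python
import asyncio

# Hypothetical back-end services with simulated latencies (seconds).
SERVICE_DELAYS = {"product_list": 0.02, "images": 0.03, "billing": 0.01}


async def call_service(name: str, delay: float) -> str:
    """Stand-in for a network call to one service."""
    await asyncio.sleep(delay)
    return f"<div>{name}</div>"


async def build_page(budget_s: float = 0.1) -> str:
    """Query every service concurrently; leave out any that miss the budget."""
    tasks = {
        name: asyncio.create_task(call_service(name, delay))
        for name, delay in SERVICE_DELAYS.items()
    }
    done, pending = await asyncio.wait(list(tasks.values()), timeout=budget_s)
    for task in pending:
        task.cancel()  # a late fragment is dropped rather than delaying the page
    fragments = [
        tasks[name].result() for name in SERVICE_DELAYS if tasks[name] in done
    ]
    return "\n".join(fragments)


page = asyncio.run(build_page())
```

The page costs roughly as much time as the slowest service that makes the budget, not the sum of all of them.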
OLD APPROACH (2005)
[Diagram labels]
• Computers were mostly desktops
• Internet routing was pretty static, except for load balancing
• Web Server built the page
• Databases held the real product inventory
• Database boxes: Product List, Image Database, Billing and Account Info

NEW APPROACH (2008)
[Diagram labels: Message Bus; Routed to nearest datacenter, one of many; Devices became more mobile]
• Massive racks of inexpensive web server machines dispatch parallel tasks, then render the
web page in parallel for speed
• Racks of highly parallel workers do much of the data fetching and processing,
ideally ahead of need
• The old databases (Product List, Image Database, Billing and Account Info) might still be
present in this layer, or may have been split into lots of services.

TIER ONE / TIER TWO
We often talk about the cloud as a “multi-tier” environment.

Tier one: programs that generate the web page you see.

Tier two: services that support tier one. We will see one later
(DHT/KVS storage used to create a massive cache)

TODAY’S CLOUD
Tier one runs on very lightweight servers:
• They use very small amounts of computer memory
• They don’t need a lot of compute power either
• They have limited needs for storage, or network I/O

Tier two μ-Services specialize in various aspects of the content delivered
to the end-user. They may run on somewhat “beefier” computers.

SLIGHT DIGRESSION

Today’s cloud sometimes doesn’t act the way you would expect from
what you learned in your O/S course.

To see this, we’ll spend a minute on just one example.



TIER-ONE FOCUSES ON EASY
STORIES
Which is better:
Multithreaded servers?
Or multiple single-threaded
servers?

WHAT YOU LEARNED IN O/S
COURSE
Probably, you just took a class where the big focus was concurrency and
threaded programs.

The story you heard was something like this:

• Because of Moore’s law, modern computers are NUMA multiprocessors.
• To leverage that power, create lots of threads, link with a library like
“pthreads”, and request that your program be allocated multiple cores.
• Use thread synchronization/critical sections to ensure correctness.
WHAT YOU LEARNED IN O/S
COURSE
But you also learned about virtual machines.
Basically, we take one computer, the “bare metal” machine, and run a very
small microkernel (the hypervisor) on it.
• The microkernel in turn runs a special version of the operating system
called a “virtualized” O/S. It lets you launch “virtual machines”.
• Programs inside a virtual machine think they are running on private,
dedicated computers in a network.
Perhaps they even mentioned “containers”, which are a newer and more optimized
way to get more or less the same functionality, perhaps with a bit less isolation.
… EVEN OUR “EASY” CLOUD
POSES CHOICES!
Are those threads?
… Linux processes?
… virtual machines?
… Linux containers?

THIS MAY SEEM LIKE IT IS TOO
SIMPLE!
Actually, even very simple questions sometimes lead to a lot of
complexity
Here we have four options:
1. Keep my server busy by running one multithreaded application on it
2. Keep it busy by running N unthreaded versions of my application as
virtual machines, sharing the hardware
3. Keep it busy by running N side by side processes, but don’t virtualize
4. Keep it busy by running N side by side processes using containers
TO UNDERSTAND WHICH
“WINS”…
One way to try and answer such a question focuses on resources

We want the edge of the cloud to be as cost-effective as possible.

Which of these options consumes the least memory and bandwidth?
• This probably would be a single program with multiple threads
• They share memory, which must lead to at least some efficiencies
BUT WOULD A MULTI-
THREADED SOLUTION
PERFORM WELL?
We need to understand more about modern server hardware
Early days of the web were before we fell off Moore’s curve
Modern servers are NUMA machines with many cores. We need to pick
the solution that is most closely matched to this modern hardware
What do we know about NUMA hardware?

[Image caption] 32-core Intel Aubrey chip. Some servers have as many as 128
cores today!
NUMA ARCHITECTURE
A NUMA computer is really a small rack of computers on a chip.
Each core has its own L2 cache, and small groups of cores share DRAM.
• Accessing your nearby DRAM is rapid
• Accessing “remote” (but still on-chip) DRAM is much slower, 15x or
more

NUMA hardware provides cache consistency and locking, but costs can be
quite high if these features have much work to do.
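To get a feel for what the 15x figure implies, here is a back-of-the-envelope sketch. It assumes a deliberately crude model (every access is either local or remote, and caches are ignored); the function name and the sample fractions are illustrative only.

```python
def average_access_cost(remote_fraction: float, remote_penalty: float = 15.0) -> float:
    """Average memory access cost, in units of one local DRAM access.

    remote_penalty defaults to the "15x or more" figure quoted above.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * 1.0 + remote_fraction * remote_penalty


for fraction in (0.0, 0.1, 0.5):
    cost = average_access_cost(fraction)
    print(f"{fraction:.0%} remote accesses -> {cost:.1f}x the local cost")
```

Under this model, sending just 10% of accesses to remote DRAM more than doubles the average cost, which is why memory placement matters so much on these machines.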
DEEP DIVE ON THAT QUESTION
Why can’t a NUMA machine just throw parallelism at
a multi-threaded program and get a speedup linear
in the number of cores?
• Actual experience: Things slow down with more cores!
• Issues: locking, memory “layout”, context switching, L2 cache hit
rate

Virtualization can make things even worse. And Linux processes sharing
one machine can interfere with each other.
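One way to see why adding cores can hurt is to compare two textbook scaling models. Neither comes from these slides: Amdahl's law only predicts a plateau, while Gunther's Universal Scalability Law adds a coherency term that makes throughput fall again. The coefficients below are made-up values chosen only to show the shape of the curves.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup is capped by the serial (e.g. locked) fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def usl_speedup(cores: int, contention: float, coherency: float) -> float:
    """Universal Scalability Law: N / (1 + a*(N-1) + b*N*(N-1)).

    contention (a) models time spent waiting on locks; coherency (b) models
    the pairwise cost of keeping caches and shared memory consistent.
    """
    return cores / (
        1.0 + contention * (cores - 1) + coherency * cores * (cores - 1)
    )


# Hypothetical coefficients, for illustration only.
for n in (1, 16, 32, 128):
    print(n, round(amdahl_speedup(n, 0.05), 2), round(usl_speedup(n, 0.05, 0.002), 2))
```

With these coefficients the Amdahl curve keeps creeping upward, while the USL curve peaks at around twenty cores and then declines, matching the "things slow down with more cores" experience.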
CONTAINERS TO THE RESCUE!
A container is a normal Linux process with a library that mimics a full VM.
• The system looks “private” but without full virtualization.
• Eliminates the 10% or so performance overheads seen with true VMs.
• Also, containers launch and shut down much faster than a full VM,
because we don’t need to load the whole OS.
• We won’t see NUMA memory contention problems.
• Security and “isolation” are nearly as good as for VMs.
In CS5412 we treat both options as forms of virtualization.

MESOS VS.
VMWARE
A version of Linux that has become increasingly optimized to include these kinds of
ideas
Adopted by Microsoft as the basis for the Azure virtualization product
• Azure cloud offers Azure HPC (small supercomputing configurations),
• … Azure versions of Linux via Mesos (for example, Ubuntu)
• … Windows Enterprise Server (mostly for back-compatibility)
VMWare is still more widely used, and has true virtualization, but is trying to adopt
container-like ideas too, since it would otherwise lag in performance
• AWS has dozens of VMWare configuration options (and other options, too)
AHA!

Build a highly optimized single-threaded web page server, run one per
client.
Run lots of copies on each NUMA server computer (perhaps hundreds).
Use containers for isolation, with a container O/S that is smart about DRAM memory
issues.
• Share read-only pages only between cores that share the same
DRAM
• Make one copy per DRAM for read-only shared data, like code
pages!
… CLOUD COMPUTING ISN’T AN O/S
COURSE
It isn’t that the O/S course wasn’t correct…

… but few O/S courses have caught up with the complexities of modern
hardware, the diversity of choices, and the objectives arising at cloud
scale

Our challenge in CS5412 is to move from an O/S centric, single-machine
perspective to a more global perspective spanning the full datacenter!
THIS SHIFT OF PERSPECTIVE
MOTIVATES CHANGES
For example, in your O/S course you probably learned a LOT about
concurrency and virtualization.

But as a result, there wasn’t enough time to have learned a lot about
container models, or architectures for massive scalability.

Those are the kinds of topics we focus on in CS5412.

HOW CAN ANYONE PICK?
Any “public” cloud platform today offers dozens or hundreds of O/S and
configuration choices, optimized for different use cases.
Realistically, you need to read the documentation or watch training videos
and adopt what they recommend. Not everything works well in the cloud.
Many people just adapt demos and code samples to solve new problems.
• In college, this would sound like theft (or at least, slacking off).
• But in cloud environments, this is pretty much the only option!

EVEN WITH CONTAINERS, PAGE
RETRIEVALS WERE STILL WAY
TOO SLOW!
Time to get back to our core topic!
Recall that right before the digression, we were focused on
Yahoo!’s experiment showing that we really need to
generate web pages in 100ms or less.
Key issue: database accesses were causing
delay on the “critical path”,
meaning the delay-limiting steps of page generation.
Key ideas Amazon deployed:
• Precompute as much as possible, and cache the results
• Don’t check for “staleness” of cache entries. Use what you find.
• It is ok for web pages to have some inaccuracies, like the inventory.
• People don’t trust an inventory in any case!
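A minimal sketch of "precompute, cache, and use what you find". The `PageCache` class, the `render` function, and the toy inventory are all hypothetical; the point is only that `get` never validates an entry against the back end.

```python
class PageCache:
    """Cache of rendered pages that never checks entries for staleness."""

    def __init__(self, render):
        self._render = render   # expensive function: key -> rendered page
        self._entries = {}

    def precompute(self, keys):
        """Render pages ahead of need, off the critical path."""
        for key in keys:
            self._entries[key] = self._render(key)

    def get(self, key):
        # Use what you find: no staleness check on a cached entry.
        if key in self._entries:
            return self._entries[key]
        page = self._render(key)   # critical-path work, only on a miss
        self._entries[key] = page
        return page


inventory = {"lego-castle": 3}   # toy back-end "database"


def render(key):
    return f"{key}: {inventory[key]} in stock"


cache = PageCache(render)
cache.precompute(["lego-castle"])
inventory["lego-castle"] = 2              # a purchase happens afterwards
stale_page = cache.get("lego-castle")     # served as cached, slightly wrong
```

The customer sees a count that is off by one, which the slide argues is an acceptable price for staying off the database's critical path.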

SUMMARY OF THE IMPACT?
Prior to 2005, one server
computed a typical page
By 2010, > 100 servers
contribute content and the
one server my browser is
connected to just assembles
the pieces (resizing if needed)
and passes them
back to the (mobile) user

WHY WASN’T MERE
PARALLELISM ENOUGH?
The main delay factor involved interactions with back-end databases.

All of these front-end (tier one and tier two) systems and μ-Services kept
stalling to fetch data from databases, or to run transactions on databases.

Main cause of cache misses? Transactions would cause updates that
invalidated the cached items (they became stale). In a database model,
stale cached data must be removed from the cache.
ERIC BREWER’S CAP THEOREM
At Berkeley, Eric Brewer captured this insight with a “theorem”
CAP stands for “Consistency, Availability and Partition Tolerance”

Basically, Eric argues that:
• There is a tradeoff between these properties
• You can’t get all of them at once (simple example: conflicting database
updates are issued when the network is temporarily partitioned)
• He concluded that one has to be weakened, and suggested that using
stale cached data isn’t such a terrible thing. Many web pages won’t care
ERIC BREWER’S CAP THEOREM

Relax consistency (C),
• Gain faster response (A).
• Generate responses even when unable to talk to
back-end servers (P).

BASE METHODOLOGY
Invented at eBay, adopted by Amazon and others
• Basically Available, Soft State and Eventual
Consistency

“Use CAP. Eventually, clean up any mess this creates.”
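As a rough illustration of "clean up the mess later", the sketch below lets two replicas accept conflicting writes during a partition and then reconciles them. It assumes a last-writer-wins policy with caller-supplied timestamps, which is just one common choice and not something the slides prescribe; the replica names and keys are hypothetical.

```python
def write(replica: dict, key: str, value: str, timestamp: int) -> None:
    """Accept a write locally, without coordinating with other replicas."""
    replica[key] = (timestamp, value)


def reconcile(a: dict, b: dict) -> None:
    """Merge two replicas in place so both hold the newest write per key."""
    for key in set(a) | set(b):
        newest = max(r[key] for r in (a, b) if key in r)
        a[key] = newest
        b[key] = newest


east, west = {}, {}
# During a partition, each side stays available and accepts its own update.
write(east, "stock/lego-castle", "3 left", timestamp=1)
write(west, "stock/lego-castle", "2 left", timestamp=2)
diverged = east != west

# Once the partition heals, the replicas converge (eventual consistency).
reconcile(east, west)
```

Note that last-writer-wins silently discards the losing update, so it only suits data where that loss is tolerable.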

MASSIVE CACHES
With CAP and BASE, we saw huge growth in the amount of caching.
How to store such large caches?
• At first, using in-memory storage via hashed lookup.
• But the next idea was to spread the data over racks of computers
Led to Distributed Hash Tables:
• Just hash twice: Once to find the computer, and a second time to find
the data in memory!
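The "hash twice" idea can be sketched in a few lines. This toy version simulates each server with an in-memory dict; the class name, the 16-server default, and the sample key are assumptions made for illustration.

```python
import hashlib


class TinyDHT:
    """Toy distributed hash table: each 'server' is simulated by a dict."""

    def __init__(self, num_servers: int = 16):
        self.servers = [dict() for _ in range(num_servers)]

    def server_for(self, key: str) -> int:
        """First hash: map the key to a server index."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.servers)

    def put(self, key: str, value: str) -> None:
        # Second hash: the dict hashes the key again to place it in memory.
        self.servers[self.server_for(key)][key] = value

    def get(self, key: str):
        return self.servers[self.server_for(key)].get(key)


dht = TinyDHT()
dht.put("inventory/toys/example", "<html>…</html>")
found = dht.get("inventory/toys/example")
```

Plain modulo placement remaps most keys whenever the number of servers changes; real systems often use consistent hashing to limit that churn.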

DISTRIBUTED HASH TABLE
CONCEPT
[Diagram: DHT servers numbered 1 to 16]
Look up: “Inventory/Toys/Lego/Castle/…”
• Hash key to an integer, maybe 13
• DHT server 13 does a fast in-memory lookup to find the object, if it has a copy
• Key: “Inventory/Toys/Lego/Castle/…”  Value: (html web page)

If the object isn’t in the cache, we go to the
back-end server and ask for it from there. But
this still shields the server from load if we do
find it.
DHT / KVS TERMINOLOGY
Sometimes called a distributed hash table (DHT)
Others prefer “key-value store” (KVS)

A hit occurs if the thing we are looking for is found in the cache
A miss occurs if the thing is missing from the cache and must be fetched
from some kind of back-end server.
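To make the hit/miss vocabulary concrete, here is a small look-aside cache that counts both. The class and the fake back end are hypothetical; the counters show how the cache shields the back-end server from repeated requests.

```python
class LookAsideCache:
    """Cache in front of a back-end fetch function, with hit/miss counters."""

    def __init__(self, backend_fetch):
        self._fetch = backend_fetch
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1             # miss: go to the back-end server
        value = self._fetch(key)
        self._store[key] = value
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


backend_calls = []


def slow_backend(key):
    backend_calls.append(key)        # record the load the back end sees
    return f"page for {key}"


cache = LookAsideCache(slow_backend)
for _ in range(10):
    cache.get("inventory/toys/example")
```

Ten lookups of the same key produce one miss and nine hits, so the back end handles a single request.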

CLOUDS ALSO HOST BIG DATA,
MACHINE LEARNING TOOLS
Early cloud just served web pages and embedded ads.

But individualized advertising gives far better results…
• Better selection of ads increases revenue, and gave rise to an AI revolution
• Social networking “graphs” bring further value to the table
These are graphs to track who likes what, who is similar to whom, etc…

Today, best to think of the whole cloud as a massive scalable system for machine learning and
associated actions.
The cloud uses web pages and web standards, but AI is its real role.
WHERE DOES THE AI LIVE?
Reminder: All of those “tiers”
[Diagram, front to back]
• Mobile client
• First tier: Builds web pages
• Second tier: Caches and similar μ-services
• Third tier: Stateful services like databases, plus other “workers”
• Back-end: Big-data analytics and machine learning tools
Stuff “happens” here: …100ms, ….seconds, ….minutes/hours
WHAT ABOUT SPECIAL
HARDWARE? YOU FIND IT
EVERYWHERE YOU LOOK!
Cloud systems are massive consumers of cutting edge hardware!
Computers arrive by the 12-wheeler truckload
Storage: the saying goes that “terabytes pull petabytes” (and exabytes)
RDMA communication hardware is rapidly gaining adoption
GPU clusters and FPGA/ASIC specialized hardware are popular
The NICs and routers are programmable (P4)
Coming soon: TPU clusters, qubit computing…
SOME HARD QUESTIONS
What jobs should cloud platforms own?
What does it mean to “guarantee” something in the cloud?
• Such as fault-tolerance? Or speed?
• … security/privacy?
• … “data consistency”?
[Image caption] Borat Cloud: The trusted choice for secrets and data
that must stay private!

We’ll only look at a few of these hard questions. Some deserve entire courses!
(Security, cloud databases, networks…)
WHICH SUBTOPICS WILL WE
FOCUS ON?
Scalability: If we build something and test with 10 machines, will it still be
stable and working with 10,000 machines and a million users?

Fault-tolerance: Can we build systems that self-repair?

Consistency: If we query data replicas, can we be sure the data is valid?

Speed: How fast can all of this run, using modern hardware accelerators?
TAKE AWAY FROM THIS
LECTURE
We want massive scaling, and efficient sharing. This leads to data centers
packed with NUMA computers and speciality hardware
Understanding the hardware and the runtime system makes a huge difference
Once you understand the issue, it also helps “explain” common trends, like
the current focus on container platforms like Mesos (but VMWare will keep
up).

OTHER QUESTIONS YOU MIGHT
HAVE
I used terms like NUMA, threads, VMs, containers, L2 cache hit rate.
Are all of these really required background?
• How much background is assumed?
• What work is required from attendees?
• MEng project credit for students in the MEng (or MS) programs
• Topics for spring projects

BACKGROUND ASSUMED?
Solid understanding of computer architectures, operating systems, good
programming skills (including “threads”) in Java, C++ or C#

Some basic appreciation of how networks work, how operating systems
work, virtualization. Many people will have taken CS4410 or similar.

Prior exposure to “distributed computing” not required or expected

HOW MUCH WORK IS
REQUIRED?
Attend classes, and recitations (some are reviews of class material, but
others are to help with the projects)

Project will typically take about 3-6 hours per week through the whole
semester. Some groups end up spending way more time in the last
week. Starting work early can avoid this last-minute crunch issue.

Working in teams and flexible goals (if you fall behind… do less!) helps
PRELIM AND PROJECT
We have one open-book take-home prelim.
You’ll have 24 hours to do it.

Handed out on March 22. Due on March 23. Focus is on DHT material,
but cumulative over the semester up to that point.

There is no final, but the project is due at the end of study week (or
sooner). Each student will do a semester-long project, often in a team.
WHAT IF I NEED TO TRAVEL FOR
INTERVIEWS?
Please be in town March 22/23. Put it on your calendar now.

We do not have a makeup for the prelim. If you are out of town, you
will just have to do it while travelling.

You do not need permission to miss a class, but you will still be expected
to review the slides and be sure to learn the material.

FOG PROJECTS
[Image caption] VSense: A Vampire Early Warning System
We prefer fog computing ideas for spring 2018.
Goal: show that you can master and demonstrate an edge application that uses
cloud infrastructure tools on “some” popular platform:
• Some source of input: “edge devices”. Potentially, lots of them.
• Infrastructure to capture, think about, and perhaps to hold data.
… It must be scalable, and fault-tolerant
• Reacts immediately: this is what makes it a fog application.
You get to invent a “use case”. This is like coming up with a concept for a
company, but then using standard technologies to build the “app”…
PROJECT RESOURCES
We won’t provide cloud accounts, but you can request a free one. Web
page for the course explains where to apply (it lists several options)

You can also use OpenStack (a cloud infrastructure, very standard) on the
machines in the MEng lab, or even on your own laptop.

Key thing: we are trying to avoid having you spend $$$.

GROUPS
[Image captions]
• Type “A” group: Plans to base a startup on project
• Type “Z” group starts 10 days before deadline, never sleeps
• Average, terrified group (hopes that if we allow groups of 3, we can be talked into 11)

We allow you to work alone, but most students work in groups of 2 or 3.

Ideally, each group has people with a range of talents, and people work
equally hard but have different styles and roles. We encourage this
because real jobs are like that: One person is great at GUIs, another is a
C++ superstar. Teamwork is the way to succeed in modern computing!

FORMING PROJECT GROUPS
We’ll use our first recitation to talk about projects.

This will also be a chance to try and team up with some other students

Piazza discussion area can be a useful resource too

PROJECT DEMO DAY

The project winds up with a demo (during study period)

You give us a very short writeup (to remember what you did), and stand in
front of a poster (you’ll make one up and print it on large paper, at the
Olin library). You’ll use the poster to explain how your solution works.

Then you will run the demo to show us that it really does work. We’ll ask
a lot of questions about how you used ideas covered in class.
MENG/MS PROJECT CREDIT
If you are in need of an MEng project, you can use your CS5412 project
for this purpose. Sign up for the MEng project credits “class”.

This extra credit is only available for people doing an MEng project.

You’ll get the same project grade as your teammates, but it will also
become your CS5999 grade.

GRADING
Curved: Based half on prelim, half on project.

Usually about half of the class ends up with A+/A/A-. The rest get
B+/B/B-. A+ is pretty rare.

C grades are rare… usually people who skipped class, started the project
late, and figured that an open-book prelim must be easy.

BOOK, CLOUD TRAINING
RESOURCES, CODE SAMPLES,
DEMO MATERIALS
Web page points to two books

Neither is required, and Cornell library has both on reserve

Cloud vendors have huge amounts of online material and videos showing how to
work with their tools.

It is absolutely fine to start with a vendor-supplied demo or sample and to extend it
into part of your project. This is how real cloud systems are built.
SUMMARY: IN CS5412 WE WILL
EXPLORE
Cloud computing. But this is a broad topic, too big to fully cover.
We’ll touch on many aspects, but will drill down on fog computing
A smart highway is a great example of what this entails
• Split into layers: sensors, data collectors, smart memory, deeper “learning” tools.
• Sensors capture diverse forms of data, upload to the cloud (voice, images,
video…)
• Instantly processed using machine learning tools (the AI aspects are not our topic)
• Consistency, scale, performance, real-time, fault-tolerance all matter.

EXTRA SLIDES: ABOUT KEN
We probably won’t have time to actually cover these

ABOUT ME
Worked in distributed computing since the 1980s
I build software for highly available, high performance, scalable systems.
• My “Isis Toolkit” still operates the French Air Traffic Control System, the US Navy AEGIS.
• Oracle has used it for so long that they forgot (but you still see it launch during startup)
• It ran the New York Stock Exchange for ten years (after which, terrorists stole the name! ☹)

I helped Amazon, Microsoft, IBM and Facebook build and improve their clouds!
• Amazon CTO (Werner Vogels) was a member of my group, and brought our ideas to AWS
• As a PhD student Qi Huang doubled the speed of Facebook image and video playback
• IBM’s WebSphere used a “message bus” based on work done here at Cornell

STAYING UP TO DATE
Last year, I had a sabbatical and used it to learn all about the newest cloud
computing trends, to really catch up with developments
• I spent three months full time at Microsoft, learning all about Azure.
• … visited tons of companies and asked lots of questions.
• … met the developers of RDMA hardware (Mellanox), new
kinds of solid state storage (Western Digital), and spent time at
Facebook, Google, VMWare, Mellanox, Verizon/Yahoo…

SMART POWER GRID EXAMPLE

Machine Learning
for a “smart” power grid
High-rate data flows
Real-time reactions,
decision-making

SMART ENVIRONMENT?
Autonomous drones “decide” to do
detailed imaging of interesting sites
Subsurface “CAT scans” during fast-evolving
seismic events
Monitoring large regions during major
weather events
Large-scale subsurface sensor deployments to
track pollutant flow
KEN’S DREAM: A SMART
VINEYARD
Question: What’s going on in the world’s best vineyards?
• Suppose we could see everything, from the surface to
deep underground (the soil “biome”), year round…
• Like a subterranean microscope!
• We would know which spots are most
promising for great wine, and why

Best of all: invent this, and you
would be welcome to visit and see
the places using it…
DERECHO: SMART MEMORY
FOR CLOUDS
[Image caption] A Derecho in Indiana

Derecho is the name of the system my group is working on right now.
• The newest cloud hardware can move data faster than software can
keep up… Derecho is a software library to help overcome this.
How does Derecho get its extreme speed?
• Remote DMA hardware (RDMA networking)
• Solid State Storage (SSD devices, Optane, etc).
[Image label] 3-D XPoint Technology


DERECHO: SMART MEMORY
FOR CLOUDS
[Image caption] A Derecho in Indiana

What makes a cloud computing memory layer smart?
• It looks like any other storage layer: file system API
• But unlike a standard file system it is massively parallel
[Diagram label] Linux File System API: open/read/write…

The developer of a smart memory can integrate machine learning code right
into the storage infrastructure.
• Like a file system that is smart about what to store, and how to store it.
• Avoids storing uninteresting data, and reduces delays if something exciting
happens and an urgent action is needed
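The "smart about what to store" idea might look roughly like the sketch below. It is purely illustrative and is not Derecho's API: the `SmartStore` class, its callbacks, and the highway speed thresholds are all hypothetical, and simple predicates stand in for the machine learning code.

```python
from typing import Callable, List

Record = dict


class SmartStore:
    """Append-only store that filters records and reacts before storing."""

    def __init__(
        self,
        is_interesting: Callable[[Record], bool],
        is_urgent: Callable[[Record], bool],
        on_urgent: Callable[[Record], None],
    ) -> None:
        self._is_interesting = is_interesting
        self._is_urgent = is_urgent
        self._on_urgent = on_urgent
        self.records: List[Record] = []

    def append(self, record: Record) -> bool:
        """Return True if the record was stored, False if it was dropped."""
        if self._is_urgent(record):
            self._on_urgent(record)      # react immediately, on the write path
        if not self._is_interesting(record):
            return False                 # avoid storing uninteresting data
        self.records.append(record)
        return True


alerts: List[Record] = []
store = SmartStore(
    is_interesting=lambda r: r["speed_kmh"] < 40,   # congestion only
    is_urgent=lambda r: r["speed_kmh"] == 0,        # stopped vehicle
    on_urgent=alerts.append,
)
for speed in (95, 100, 30, 0):
    store.append({"speed_kmh": speed})
```

Of the four readings, only the two slow ones are kept, and the stopped vehicle triggers an alert at write time rather than in a later batch job.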
SMART MEMORIES FOR SMART
HIGHWAYS
Derecho helps you build this kind
of structured service, in C++
