Lecture 01 - Intro


Cloud Computing
Lecture 1

Intro – By Shmulik Goldstein

Dan Amiga
[email protected]

Dan Amiga – IDC Cloud Computing 2015


Logistics

• All logistics covered by Dan next week



Shmulik Goldstein - About

• Platform Strategy Advisor – Microsoft

[email protected]


Course Agenda
• Intro
• From 0/1 to Data Centers
• Architecture + AWS
• Architecture + Azure
• Storage
• Cache
• RDBMS
• NoSQL
• Big Data
• Security



Today's Agenda - Intro

• Problems
• History
• What is cloud computing



Problems
• Time
• Cost
• Process
• Performance
• Scale/Parallel
• Storage Size
• Availability
• Connectivity
• Security
• How-to-do-it-right



Users & Performance
• Google style applications
– Massive amount of data
– Massive number of short-session users
– Close-to-zero latency
– As available as the internet
• Facebook
– Massive amount of data
– Complex data relations
– Massive number of long-session users
– Be agile: “code wins arguments”, timeline, etc.


Time

• Even if you are a small player, time is important
– TTM, latency, time to scale, availability, etc.

        Development   Deployment
Cost    30%           70%
Time    10%           90%


Storage problem



Teaser 1

• What is the fastest way to search an array?



Teaser 2

• Can you break a crypto hash function?
– Assuming no vulnerabilities


Teaser 3

• I have data from multiple buildings around the world and would like to ask questions about that data.


Evolution


1950s-1960s… Mainframe
• Big, extremely reliable, secure, transaction-oriented business applications
• Not really measured by MIPS or FLOPS
• CPU + I/O
• Still exist today!
– No more punch cards; they now have a (web) user interface
– NASA powered down its last mainframe in 2012 (IBM Z9)
• Highly centralized; filled the room


Mainframe for developers

• No developer productivity + time sharing
• Assembly/COBOL/Fortran


Supercomputers vs. Mainframes
• Supercomputer
– At the front line of current processing capacity (speed)
– Mostly HPC usage for scientific and engineering problems (data/number crunching)
– Measured in FLOPS (petaFLOPS) or TEPS (amazon)
– *As of 2011, only 9 computers in the world were measured in gigaTEPS (but you can do better)
• Mainframes
– Measured in MIPS (instructions, usually integer)
– Built for (business) transaction reliability (goods, money, services, airlines)
– A transaction includes DB I/O


Supercomputers vs Distributed computing
• Supercomputer
  – Exotic architectures
  – 100,000s of processors (how do they communicate?)
• Distributed architectures of computing
  – Usage of computer clusters
    • Tightly coupled systems
    • Single system image
    • Centralized job management & scheduling system
  – Usage of computer grids
    • Loosely coupled (decentralization)
    • Diversity and dynamism
    • Distributed job management & scheduling


1970s… Minis
– Less than $25K vs. $1M for mainframes
– Made for smaller organizations and departments; relatively affordable
– UNIX was originally a minicomputer OS, while Windows NT (the foundation for all current versions of Microsoft Windows) borrowed design ideas liberally from VMS and UNIX
– Pascal / C


1980s – mid-2000s… PCs


PC
• 1K USD (?)
• Made for a single person
• Microprocessor
• Programmer Productivity
– Smalltalk, C++
• At the end of the 20th century, we thought this was where computation would happen
– See Microsoft software and UI directions
– See SETI
• Famous Steve vs. Bill
– Unique/generic HW/SW
– “People who are really serious about software should make their own hardware” (Alan Kay)
• Finally, socket-based programming


1990s… Networking + Servers
• Ethernet, Patch Panels
• Lower cost of LANs
• Software
– Beginning of Java/CORBA/RMI
– Microsoft component-oriented programming (COM, DCOM)
• Architectures
– Client/Server
– 3-tier, RPC-based
• Administration explosion


2000s - Internet

• Infrastructure stability and cost go down
• Popularity and reach go up
• Reliable protocol stack
• Software
– RoR/PHP/Python (scripting + interpreters)
– .NET, Java (component-oriented)
– Service-oriented architectures
• Lack of customization and multi-tenancy


Typical web/large scale enterprise app



Tiers vs Layers



Today



Cloud Principles

Pay only for what you use

Ability to scale up and scale down


Cloud Computing

• Store Data
• Run Applications
– Combined with:
• Utility model
• Elastic Nature



What Is Cloud Computing?
Cloud Computing: App and Infrastructure over Internet

Compute as a Service: Applications over the Internet

Utility Computing: “Pay-as-You-Go” Datacenter Hardware and Software

Three New Aspects to Cloud Computing

• The Illusion of Infinite Computing Resources Available on Demand
• The Elimination of an Upfront Commitment by Cloud Users
• The Ability to Pay for Use of Computing Resources on a Short-Term Basis as Needed


Why Now?

• It’s possible
• It’s mandatory
• “Web Space Race”
– Build extremely large datacenters (10,000s/100,000s)
– Driven by growth (more users, more data)
• Operations & infrastructure expertise
• Broadband got better


Platform Evolution

[Figure: Client, Server, Mobile, Cloud]

• Hosted software platform
• Shared infrastructure
• Virtualized and dynamic
• Increasingly higher-level services
• Pay-as-you-go pricing model


Platform Continuum

On-Premises Servers
• Bring your own machines, connectivity, software, etc.
• Complete control
• Complete responsibility
• Static capabilities
• Upfront capital costs for the infrastructure

Hosted Servers
• Renting machines, connectivity, software
• Less control
• Fewer responsibilities
• Lower capital costs
• More flexible
• Pay for fixed capacity, even if idle

Cloud Platform
• Shared, multi-tenant infrastructure
• Virtualized & dynamic
• Scalable & available
• Abstracted from the infrastructure
• Higher-level services
• Pay as you go


% of Capital Equipment Budget spent on IT in 2000?

45%
(Commerce Department Statistics)


% of Utilized Server Capacity on Average?

6%
(Economist Survey on IT, 2008)


Elasticity – Provisioning for Peak
Real-world server utilization is 5% to 20%
• For many services, peak exceeds average by a factor of 2 to 10
• Most provision for peak
• Painful to under-provision (lost customers)

[Figure: Provisioning for Peak. Without elasticity, we waste resources (shaded areas) during non-peak times.]


Elasticity: Risks of Under-Provisioning

[Figure: Under-Provisioning #1. Potential revenue (shaded area) is sacrificed.]

[Figure: Under-Provisioning #2. Some users respond to under-provisioning by permanently deserting the site... Bad for revenue!]
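
The trade-off on the two elasticity slides can be put into numbers with a small sketch. Everything below is an illustrative assumption rather than data from the lecture: the hourly demand curve is made up, and `evaluate` is a hypothetical helper that compares demand against a fixed or per-hour capacity.

```python
from dataclasses import dataclass


@dataclass
class ProvisioningResult:
    utilization: float  # fraction of provisioned capacity actually used
    wasted: float       # server-hours provisioned but idle
    unmet: float        # server-hours of demand that could not be served


def evaluate(demand, capacity):
    """Compare an hourly demand curve against provisioned capacity.

    `capacity` is either a single number (fixed provisioning) or a list
    with one entry per hour (elastic provisioning).
    """
    if isinstance(capacity, (int, float)):
        capacity = [capacity] * len(demand)
    served = sum(min(d, c) for d, c in zip(demand, capacity))
    wasted = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    unmet = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return ProvisioningResult(served / sum(capacity), wasted, unmet)


# Hypothetical demand in "servers needed" for each of 24 hours;
# the peak is 10x the trough.
demand = [10] * 8 + [40] * 8 + [100] * 2 + [40] * 6

peak = evaluate(demand, max(demand))                    # provision for peak
average = evaluate(demand, sum(demand) / len(demand))   # provision for average
elastic = evaluate(demand, demand)                      # capacity follows demand

print(f"peak:    utilization={peak.utilization:.0%}, wasted={peak.wasted:.0f} server-hours")
print(f"average: unmet demand={average.unmet:.0f} server-hours")
print(f"elastic: utilization={elastic.utilization:.0%}, wasted={elastic.wasted:.0f} server-hours")
```

With this made-up curve, provisioning for peak leaves most of the capacity idle, provisioning for the average turns users away during the busy hours, and only capacity that follows demand avoids both.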

Fallback: Hosted


Web Hosting Contract Boilerplate

Agreement Provisions Included
– Scope of Services
– Price and Payment
– Term and Termination
– Customer Service
– User's Warranties and Obligations
– Ownership of Intellectual Property
– Warranty and Disclaimer
– Limitation of Liability
– Indemnification of Host
– Relation of Parties
– Employee Solicitation/Hiring
– Non-assignment
– Arbitration
– Attorneys' Fees
– Severability
– Force Majeure
– No Waiver
– Entire Agreement

Exhibit A: Statement of Work
– Preamble
– Project Background
– Scope
– Key Tasks and Milestones
– Project Deliverables
– Time and Cost Estimates
– Price and Payment
– Invoices
– Payment
– Project Organization and Personnel Requirements
– Supporting Documentation
– Expenses
– Confidential Information

Exhibit B: Service Level Agreement
– Downtime
– Technical Support

Exhibit C: Web Hosting Acceptable Use Policy
– Acceptable Use
– Reporting of Violations of This Acceptable Use Policy
– Revisions to This Acceptable Use Policy


Solutions

• Time
• Cost
• Scale
• Storage Size
• Availability
• Connectivity
• Security
• How-to-do-it-right



Cloud Workload Patterns


Workload Patterns Optimal For Cloud

[Figures: four plots of compute vs. time, each marked with average usage; the “On and Off” plot also marks an inactivity period.]

“On and Off”
• On & off workloads (e.g. batch job)
• Over-provisioned capacity is wasted
• Time to market can be cumbersome

“Growing Fast”
• Successful services need to grow/scale
• Keeping up with growth is a big IT challenge
• Complex lead time for deployment

“Unpredictable Bursting”
• Unexpected/unplanned peak in demand
• Sudden spike impacts performance
• Can’t over-provision for extreme cases

“Predictable Bursting”
• Services with micro-seasonality trends
• Peaks due to periodic increased demand
• IT complexity and wasted capacity


Application Models
Web Hosting
• Massive scale infrastructure
• Burst & overflow capacity
• Temporary, ad-hoc sites

Application Hosting
• Hybrid applications
• Composite applications
• Automated agents / jobs

Media Hosting & Processing
• CGI rendering
• Content transcoding
• Media streaming

Distributed Storage
• External backup and storage

High Performance Computing
• Parallel & distributed processing
• Massive modeling & simulation
• Advanced analytics

Information Sharing
• Reference data
• Common data repositories
• Knowledge discovery & mgmt

Collaborative Processes
• Multi-enterprise integration
• B2B & e-commerce
• Supply chain management
• Health & life sciences
• Domain-specific services


Cloud Services

• “IaaS” Infrastructure-as-a-Service: host
• “PaaS” Platform-as-a-Service: build
• “SaaS” Software-as-a-Service: consume


Cloud Services

Layer            Packaged Software   Infrastructure (as a Service)   Platform (as a Service)   Software (as a Service)
Applications     You manage          You manage                      You manage                Managed by vendor
Data             You manage          You manage                      You manage                Managed by vendor
Runtime          You manage          You manage                      Managed by vendor         Managed by vendor
Middleware       You manage          You manage                      Managed by vendor         Managed by vendor
O/S              You manage          You manage                      Managed by vendor         Managed by vendor
Virtualization   You manage          Managed by vendor               Managed by vendor         Managed by vendor
Servers          You manage          Managed by vendor               Managed by vendor         Managed by vendor
Storage          You manage          Managed by vendor               Managed by vendor         Managed by vendor
Networking       You manage          Managed by vendor               Managed by vendor         Managed by vendor


State of Cloud Computing
• Perceptions
– “The end of software”
– On-demand infrastructure
– Cheaper and better
• Reality
– Hybrid world; not “all-or-nothing”
– Leverage existing IT skills and investments
– Seamless user experiences
– Evolutionary; not revolutionary
• Drivers
– Ease-of-use, convenience
– Product effectiveness
– Simplify IT, reduce costs
• Types
– Public
– Private
– Internal
– External
– Hybrid
• Categories
– SaaS
– PaaS
– IaaS


Public cloud concerns



Economy of scale

• Bandwidth (not so much) OpEx
• Network (not so much) CapEx
• Servers (Yes, but)
– Cirrascale, SGI, Rackable


Amazon Prices March 2012

Questions:
1. Is this cheap or expensive? Why?
2. Will Windows be cheaper? Why?
3. Why are the memory numbers floats?
4. How can I make this cheaper?
 
Spot Instances – bid!



And, what about these new scenarios?


Performance – FLOPS & Big Data Problems

• Climate Change Prediction Program
– Weather prediction for a week takes 24 hours
• 56 Gflops
– Climate prediction for 50 years takes 30 days
• 4.8 Tflops
• How much on my laptop?
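
One way to approach the laptop question is back-of-the-envelope arithmetic: total work is rate times time, and the time on another machine is that work divided by its rate. The sketch below uses the slide's figures; `LAPTOP_FLOPS` is a made-up sustained rate (an assumption, not a measurement), and the calculation ignores memory, I/O, and parallel-efficiency limits.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_flop(rate_flops, days):
    """Total floating-point operations performed at a sustained rate."""
    return rate_flops * days * SECONDS_PER_DAY


def days_needed(work_flop, rate_flops):
    """Days required to perform `work_flop` operations at `rate_flops`."""
    return work_flop / rate_flops / SECONDS_PER_DAY


# Figures from the slide.
weather_work = total_flop(56e9, days=1)     # 56 Gflops for 24 hours
climate_work = total_flop(4.8e12, days=30)  # 4.8 Tflops for 30 days

# Assumption for illustration only: a laptop sustaining 50 Gflops.
LAPTOP_FLOPS = 50e9

print(f"weather run: {weather_work:.3e} flop, "
      f"{days_needed(weather_work, LAPTOP_FLOPS):.2f} days on the laptop")
print(f"climate run: {climate_work:.3e} flop, "
      f"{days_needed(climate_work, LAPTOP_FLOPS) / 365:.1f} years on the laptop")
```

Under that assumed rate the weekly forecast stays within reach of a single machine, while the 50-year climate run stretches to years, which is the motivation for renting a large amount of compute for a short time.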



Internet-Scale Application

• 2009 stats:
– +200B pageviews/month
– >3.9T feed actions/day
– +300M active users
– >1B chat mesgs/day
– 100M search queries/day
– >6B minutes spent/day (ranked #2 on Internet)
– +20B photos, +2B/month growth
– 600,000 photos served / sec
– 25TB log data / day processed thru Scribe
– 120M queries / sec on memcache

• Scaling the “relational” data:
– Keeps data normalized, randomly distributed, accessed at high volumes
– Uses “shared nothing” architecture
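
The slide does not say how the data is distributed, so the following is only a generic illustration of the “shared nothing” idea: every key lives on exactly one shard, chosen by a stable hash, and a lookup touches only that shard. `ShardedStore` and its methods are hypothetical names, and the in-memory dicts stand in for independent database nodes.

```python
import hashlib


class ShardedStore:
    """Toy shared-nothing key-value store: each shard owns a disjoint key set."""

    def __init__(self, num_shards):
        # Each dict stands in for an independent node with its own storage.
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash spreads keys pseudo-randomly but deterministically.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        # Only one shard is consulted; no cross-node coordination is needed.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

print([len(shard) for shard in store.shards])  # sizes of the four shards
print(store.get("user:42"))
```

Plain modulo placement is the simplest choice but reshuffles most keys when the shard count changes; real systems usually layer a lookup table or consistent hashing on top.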



Internet-Scale Application

• 2007 stats:
– +20 petabytes of data processed / day by +100K MapReduce jobs
– 1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
– +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
• ~40 GB/sec aggregate read/write throughput across the cluster
• +500 servers for each search query < 500ms
• Scaling the process:
– MapReduce: parallel processing framework
– BigTable: structured hash database
– Google File System: massively scalable distributed storage
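
To make the MapReduce bullet concrete, here is a minimal single-process sketch of the programming model using the classic word-count example. It is not Google's implementation: the function names are illustrative, and a real framework would run the map and reduce calls in parallel across many machines and handle the shuffle over the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


docs = ["the cloud scales", "the cloud is elastic"]
counts = word_count(docs)
print(counts)
```

Because each `map_phase` call looks at one document and each `reduce_phase` call looks at one key, both phases can be spread over as many workers as there are documents or keys.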