
Hashing in Networked Systems: Mike Freedman

The document discusses various uses of hashing in networked systems, including equal-cost multipath routing, network load balancing, per-flow statistics, caching, and data partitioning. It describes how hashing is used to map data like IP addresses or URLs to buckets or nodes in a deterministic way to distribute load uniformly. The document also covers different hashing strategies like modulo hashing and consistent hashing, which allows nodes to be dynamically added or removed without remapping all data.

Hashing in Networked Systems

COS 461: Computer Networks
Spring 2011

Mike Freedman
http://www.cs.princeton.edu/courses/archive/spring11/cos461/

Hashing
Hash function
Function that maps a large, possibly variable-sized
datum into a small datum, often a single integer that
serves to index an associative array
In short: maps n-bit datum into k buckets (k << 2^n)
Provides time- & space-saving data structure for lookup

Main goals:
Low cost
Deterministic
Uniformity (load balanced)

Today's outline
Uses of hashing
Equal-cost multipath routing in switches
Network load balancing in server clusters
Per-flow statistics in switches (QoS, IDS)
Caching in cooperative CDNs and P2P file sharing
Data partitioning in distributed storage services

Various hashing strategies
Modulo hashing
Consistent hashing
Bloom Filters

Uses of Hashing

Equal-cost multipath routing (ECMP)

ECMP
Multipath routing strategy that splits traffic over
multiple paths for load balancing

Why not just round-robin packets?
Reordering (lead to triple duplicate ACK in TCP?)
Different RTT per path (for TCP RTO)
Different MTUs per path

Equal-cost multipath routing (ECMP)

Path selection via hashing
# buckets = # outgoing links
Hash network information (source/dest IP addrs) to
select outgoing link: preserves flow affinity
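
A minimal sketch of hash-based path selection, assuming SHA-256 as a stand-in for whatever inexpensive hash a real switch would use. The function name `pick_link`, the link names, and the IP addresses are hypothetical and only illustrate the idea that the number of buckets equals the number of outgoing links.

```python
import hashlib

def pick_link(src_ip: str, dst_ip: str, links: list[str]) -> str:
    """Choose an outgoing link by hashing the source/destination IP pair."""
    key = f"{src_ip}->{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()  # stand-in hash, an assumption
    bucket = int.from_bytes(digest[:8], "big") % len(links)  # one bucket per link
    return links[bucket]

# Hypothetical set of equal-cost outgoing links.
links = ["eth0", "eth1", "eth2", "eth3"]

# Every packet of the same source/destination pair maps to the same link,
# so packets of one flow are not reordered across paths.
first = pick_link("10.0.0.1", "10.0.1.9", links)
again = pick_link("10.0.0.1", "10.0.1.9", links)
print(first, first == again)
```

Because the hash is deterministic, flow affinity follows without the switch keeping any per-flow state.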

Now: ECMP in datacenters

Datacenter networks are multi-rooted tree
Goal: Support for 100,000s of servers
Recall Ethernet spanning tree problems: No loops
L3 routing and ECMP: Take advantage of multiple paths

Network load balancing
Goal: Split requests evenly over k servers
Map new flows to any server
Packets of existing flows continue to use same server

3 approaches
Load balancer terminates TCP, opens own connection to server
Virtual IP / Dedicated IP (VIP/DIP) approaches
One global-facing virtual IP represents all servers in cluster
Hash client's network information (source IP:port)
NAT approach: Replace virtual IP with server's actual IP
Direct Server Return (DSR)
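
The NAT flavor of the VIP/DIP approach could be sketched roughly as follows. The `Packet` type, the addresses, and the use of CRC32 are illustrative assumptions, not details taken from the slides.

```python
import zlib
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str

VIP = "203.0.113.10"                              # hypothetical virtual IP
DIPS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]    # hypothetical dedicated IPs

def nat_forward(pkt: Packet) -> Packet:
    """Rewrite a packet addressed to the VIP so it targets one server's DIP."""
    if pkt.dst_ip != VIP:
        return pkt  # not addressed to the cluster, leave untouched
    key = f"{pkt.src_ip}:{pkt.src_port}".encode()
    dip = DIPS[zlib.crc32(key) % len(DIPS)]  # hash the client's IP:port
    return replace(pkt, dst_ip=dip)

# Two packets of the same client connection land on the same server.
p1 = nat_forward(Packet("198.51.100.7", 40312, VIP))
p2 = nat_forward(Packet("198.51.100.7", 40312, VIP))
print(p1.dst_ip, p1.dst_ip == p2.dst_ip)
```

Hashing the client's source IP and port keeps packets of an existing flow on the same server, while new flows are free to land on any server.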

Load balancing with DSR
[Figure: load balancer (LB), switches, server cluster]

Servers bind to both virtual and dedicated IP
Load balancer just replaces dest MAC addr
Server sees client IP, responds directly
Packets in reverse direction do not pass through load balancer
Greater scalability, particularly for traffic with asymmetric
bandwidth (e.g., HTTP GETs)


Per-flow state in switches
Switches often need to maintain connection
records or per-flow state
Quality of service for flows
Flow-based measurement and monitoring
Payload analysis in Intrusion Detection Systems (IDSs)

On packet receipt:
Hash flow information (packet 5-tuple)
Perform lookup if packet belongs to known flow
Otherwise, possibly create new flow entry
Probabilistic match (false positives) may be okay
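
The on-packet-receipt steps above can be sketched with an ordinary hash table keyed by the 5-tuple. This sketch uses exact matching in a Python dict rather than a probabilistic match, and the names `FiveTuple` and `FlowTable` are hypothetical.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port proto")

class FlowTable:
    """Per-flow packet and byte counters, keyed by the packet 5-tuple."""

    def __init__(self):
        self.flows = {}  # dict hashes the 5-tuple to find the flow entry

    def on_packet(self, five_tuple: FiveTuple, size: int) -> dict:
        entry = self.flows.get(five_tuple)      # lookup: known flow?
        if entry is None:                       # otherwise create a new entry
            entry = {"packets": 0, "bytes": 0}
            self.flows[five_tuple] = entry
        entry["packets"] += 1
        entry["bytes"] += size
        return entry

table = FlowTable()
flow = FiveTuple("10.0.0.1", "10.0.1.9", 40312, 80, "tcp")
table.on_packet(flow, 1500)
stats = table.on_packet(flow, 400)
print(stats)
```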


Cooperative Web CDNs
Tree-like topology of cooperative web caches
Check local
If miss, check siblings / parent

One approach
Internet Cache Protocol (ICP)
UDP-based lookup, short timeout

[Figure: public Internet, parent web cache]

Alternative approach
A priori guess is siblings/children have content
Nodes share hash table of cached content with parent / siblings
Probabilistic check (false positives) okay, as actual ICP lookup to
neighbor could just return false


Hash tables in P2P file sharing

Two-layer network (e.g., Gnutella, Kazaa)
Ultrapeers are more stable, not NATted, higher bandwidth
Leaf nodes connect with 1 or more ultrapeers

Ultrapeers handle content searches
Leaf nodes send hash table of content to ultrapeers
Search requests flooded through ultrapeer network
When ultrapeer gets request, checks hash tables of its
children for match
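
As a rough illustration of the last two points, an ultrapeer could keep one table of content hashes per leaf and consult those tables when a search arrives. The class and method names are hypothetical, hashing whole lowercase filenames with SHA-256 is an assumption, and this is not the actual Gnutella or Kazaa wire format.

```python
import hashlib

def content_hash(name: str) -> str:
    """Hash a filename; real systems may hash keywords instead (assumption)."""
    return hashlib.sha256(name.lower().encode()).hexdigest()

class Ultrapeer:
    def __init__(self):
        self.leaf_tables = {}  # leaf id -> set of content hashes

    def register_leaf(self, leaf_id: str, filenames: list[str]) -> None:
        # The leaf sends only hashes of its content, not the content itself.
        self.leaf_tables[leaf_id] = {content_hash(f) for f in filenames}

    def search(self, filename: str) -> list[str]:
        # Check each child's hash table for a match.
        wanted = content_hash(filename)
        return sorted(leaf for leaf, table in self.leaf_tables.items()
                      if wanted in table)

up = Ultrapeer()
up.register_leaf("leaf-a", ["song.mp3", "notes.txt"])
up.register_leaf("leaf-b", ["video.avi", "song.mp3"])
print(up.search("song.mp3"), up.search("missing.iso"))
```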


Data partitioning
Network load balancing: All machines are equal
Data partitioning: Machines store different content
Non-hash-based solution
Directory server maintains mapping from O(entries) to
machines (e.g., Network file system, Google File System)
Named data can be placed on any machine

Hash-based solution
Nodes maintain mappings from O(buckets) to machines
Data placed on the machine that owns the name's bucket
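
A small sketch of the hash-based solution, assuming 16 buckets and four machines with made-up names. The point is that each node stores only the bucket-to-machine table, whose size is O(buckets), while names are mapped to buckets by hashing.

```python
import hashlib

NUM_BUCKETS = 16                       # illustrative value
MACHINES = ["m0", "m1", "m2", "m3"]    # hypothetical machine names

# The only state every node must hold: one owner per bucket.
bucket_owner = {b: MACHINES[b % len(MACHINES)] for b in range(NUM_BUCKETS)}

def bucket_of(name: str) -> int:
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def machine_for(name: str) -> str:
    """Data lives on the machine that owns the name's bucket."""
    return bucket_owner[bucket_of(name)]

print(machine_for("/home/alice/report.pdf"))
```

Moving a bucket to another machine changes one table entry and relocates only the names that hash to that bucket.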


Examples of data partitioning
Akamai
1000 clusters around Internet, each >= 1 servers
Hash (URL's domain) to map to one server
Akamai DNS aware of hash function, returns machine that
1. is in geographically-nearby cluster
2. manages particular customer domain

Memcached (Facebook, Twitter, ...)
Employ k machines for in-memory key-value caching
On read:
Check memcache
If miss, read data from DB, write to memcache
On write: invalidate cache, write data to DB
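
The read and write paths above might look like the following sketch. Plain dicts stand in for the k memcached machines and for the database, and picking a cache machine with CRC32 modulo k is an assumption for illustration; none of this is the real memcached client API.

```python
import zlib

K = 4
caches = [dict() for _ in range(K)]   # stand-ins for k memcached machines
db = {"user:1": "alice"}              # stand-in for the database

def cache_for(key: str) -> dict:
    """Pick the cache machine responsible for this key by hashing it."""
    return caches[zlib.crc32(key.encode()) % K]

def read(key: str) -> str:
    cache = cache_for(key)
    if key in cache:          # check memcache
        return cache[key]
    value = db[key]           # miss: read from DB (KeyError if absent)
    cache[key] = value        # then write to memcache
    return value

def write(key: str, value: str) -> None:
    cache_for(key).pop(key, None)   # invalidate cache
    db[key] = value                 # write data to DB

first_read = read("user:1")       # miss, filled from DB
write("user:1", "alice2")         # invalidates the cached copy
second_read = read("user:1")      # miss again, sees the new value
print(first_read, second_read)
```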


How Akamai Works: Already Cached
[Figure: end user, cnn.com (content provider), DNS root server, Akamai high-level DNS server, Akamai low-level DNS server, and a nearby hash-chosen Akamai server in a cluster. Labeled requests: GET index.html; GET /cnn.com/foo.jpg. Numbered steps shown: 1, 2, 7, 8, 9, 10]


Hashing Techniques


Basic Hash Techniques
Simple approach for uniform data
If data distributed uniformly over N, for N >> n
Hash fn = <data> mod n
Fails goal of uniformity if data not uniform

Non-uniform data, variable-length strings
Typically split strings into blocks
Perform rolling computation over blocks
CRC32 checksum
Cryptographic hash functions (SHA-1 has 64 byte blocks)
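
To make the block-wise rolling computation concrete, here is a sketch using Python's `zlib.crc32`, which accepts a running checksum as its second argument. The 64-byte block size and the helper names `crc32_blocks` and `bucket` are arbitrary illustrative choices.

```python
import zlib

def crc32_blocks(data: bytes, block_size: int = 64) -> int:
    """Compute CRC32 by feeding the data one block at a time."""
    crc = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        crc = zlib.crc32(block, crc)  # carry the running checksum forward
    return crc

def bucket(data: bytes, n: int) -> int:
    """Map a variable-length string to one of n buckets."""
    return crc32_blocks(data) % n

url = b"http://www.cs.princeton.edu/courses/archive/spring11/cos461/" * 3
# Processing block by block gives the same result as hashing in one call.
print(crc32_blocks(url) == zlib.crc32(url), bucket(url, 8))
```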


Applying Basic Hashing
Consider problem of data partition:
Given document X, choose one of k servers to use

Suppose we use modulo hashing
Number servers 1..k
Place X on server i = (X mod k)
Problem? Data may not be uniformly distributed
Place X on server i = hash(X) mod k
Problem?
What happens if a server fails or joins (k → k±1)?
What if different clients have different estimates of k?
Answer: All entries get remapped to new nodes!
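
A quick experiment illustrates the remapping problem. The sketch numbers servers from 0 rather than 1, uses SHA-256 as the hash, and invents the document names; for a uniform hash, a key keeps its server after k goes from 10 to 11 only about one time in 11, so most entries move.

```python
import hashlib

def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def server_for(key: str, k: int) -> int:
    """Modulo hashing: server i = hash(X) mod k (servers numbered 0..k-1)."""
    return h(key) % k

keys = [f"doc-{i}" for i in range(10_000)]   # hypothetical document names

# One server joins: k goes from 10 to 11.
moved = sum(server_for(x, 10) != server_for(x, 11) for x in keys)
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.1%} of documents changed servers")
```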


Consistent Hashing
[Figure: nodes holding key1, key2, key3; a blue node forwards insert(key1, value) and lookup(key1) to the red node, which stores key1 = value]

Consistent hashing partitions key-space among nodes
Contact appropriate node to lookup/store key
Blue node determines red node is responsible for key1
Blue node sends lookup or insert to red node


Consistent Hashing
[Figure: 4-bit ID space with IDs 0000, 0010, 0110, 1010, 1100, 1110, 1111 marked, and keys URL1 = 0001, URL2 = 0100, URL3 = 1011]

Partitioning key-space among nodes
Nodes choose random identifiers: e.g., hash(IP)
Keys randomly distributed in ID-space: e.g., hash(URL)
Keys assigned to node "nearest" in ID-space
Spreads ownership of keys evenly across nodes


Consistent Hashing
Construction
Assign C hash buckets to random points
on mod 2^n circle; hash key size = n
Map object to random position on circle
Hash of object = closest clockwise bucket

[Figure: circle with positions 0, 12, 14 marked and a bucket]

Desired features
Balanced: No bucket has disproportionate number of objects
Smoothness: Addition/removal of bucket does not cause
movement among existing buckets (only immediate buckets)
Spread and load: Small set of buckets that lie near object
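
The construction can be sketched with a sorted list of bucket positions and a binary search. The `Ring` class, the 32-bit circle, the use of SHA-256 to pick positions, and the bucket names are all assumptions made for illustration; the sketch places each bucket at a single point on the circle.

```python
import bisect
import hashlib

def ring_pos(name: str, bits: int = 32) -> int:
    """Position on the mod 2^bits circle (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class Ring:
    def __init__(self, buckets=()):
        self._points = []   # sorted bucket positions on the circle
        self._owner = {}    # position -> bucket name
        for b in buckets:
            self.add(b)

    def add(self, bucket: str) -> None:
        p = ring_pos(bucket)
        if p in self._owner:
            raise ValueError("position collision")
        bisect.insort(self._points, p)
        self._owner[p] = bucket

    def remove(self, bucket: str) -> None:
        p = ring_pos(bucket)
        self._points.remove(p)
        del self._owner[p]

    def lookup(self, key: str) -> str:
        """Hash of object = closest clockwise bucket."""
        if not self._points:
            raise LookupError("ring has no buckets")
        i = bisect.bisect_left(self._points, ring_pos(key))
        if i == len(self._points):
            i = 0           # wrap around past the top of the circle
        return self._owner[self._points[i]]

ring = Ring(["bucket-a", "bucket-b", "bucket-c", "bucket-d"])
keys = [f"object-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("bucket-e")        # one bucket joins
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "objects moved, all to bucket-e")
```

This demonstrates smoothness: the only objects that change owner are the ones claimed by the new bucket, in contrast to modulo hashing.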


Bloom Filters
Data structure for probabilistic membership testing
Small amount of space, constant time operations
False positives possible, no false negatives
Useful in per-flow network statistics, sharing information
between cooperative caches, etc.

Basic idea using hash fns and bit array
Use k independent hash functions to map item to array
If all array elements are 1, it's present. Otherwise, not


Bloom Filters
Start with an m-bit array, filled with 0s.
To insert, hash each item k times. If Hi(x) = a, set Array[a] = 1.
To check if y is in set, check array at Hi(y). All k values must be 1.
Possible to have a false positive: all k values are 1, but y is not in set.
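
The insert and check steps translate almost directly into code. In this sketch the k hash functions Hi are approximated by salting SHA-256 with the index i, which is an assumption rather than a requirement of the data structure; the values of m and k and the URLs are arbitrary.

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m = m                 # number of bits in the array
        self.k = k                 # number of hash functions
        self.bits = [0] * m        # start with all 0s

    def _indexes(self, item: str):
        # H_i(item) for i = 0..k-1, derived by salting one hash (assumption).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        for a in self._indexes(item):
            self.bits[a] = 1

    def __contains__(self, item: str) -> bool:
        # All k positions must be 1; may be a false positive.
        return all(self.bits[a] for a in self._indexes(item))

urls = [f"http://example.com/page{i}" for i in range(50)]
bf = BloomFilter(m=1024, k=3)
for u in urls:
    bf.insert(u)

print(all(u in bf for u in urls))   # inserted items are always found
```

An inserted item can never be reported absent, because insertion sets exactly the bits that the check later reads.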


Today's outline
Uses of hashing
Equal-cost multipath routing in switches
Network load balancing in server clusters
Per-flow statistics in switches (QoS, IDS)
Caching in cooperative CDNs and P2P file sharing
Data partitioning in distributed storage services

Various hashing strategies
Modulo hashing
Consistent hashing
Bloom Filters
