The Changing Memory Hierarchy
One of the main ways to increase system performance is minimising how far down the memory hierarchy one has to go to manipulate data. It's not just system-level programmers who need to be aware of these issues, as most systems have a time/cost requirement, be it how fast your web application responds, or how many racks you need in your data center.
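One common back-of-the-envelope way to put a number on "how far down the memory hierarchy one has to go" is average memory access time (AMAT). Each cache level costs its hit time, plus its miss rate multiplied by the cost of going one level further down. The sketch below assumes per-level (local) miss rates, and the latencies and miss rates in it are hypothetical round numbers, not measurements of any real CPU.

```python
def amat(levels: list[tuple[float, float]], memory_ns: float) -> float:
    """Average memory access time for a chain of caches in front of RAM.

    levels: (hit_time_ns, local_miss_rate) per cache level, nearest (L1) first.
    Each level costs its hit time plus its miss rate times the cost of going
    one level further down; the last level falls through to RAM.
    """
    cost = memory_ns
    for hit_time_ns, miss_rate in reversed(levels):
        cost = hit_time_ns + miss_rate * cost
    return cost


if __name__ == "__main__":
    # Hypothetical round-number latencies and miss rates, for illustration only.
    ram_ns = 100.0
    friendly = [(1.0, 0.05), (4.0, 0.25), (20.0, 0.5)]  # cache-friendly access pattern
    hostile = [(1.0, 0.5), (4.0, 0.75), (20.0, 0.75)]   # scattered accesses
    print(f"cache-friendly: {amat(friendly, ram_ns):.1f} ns")
    print(f"cache-hostile:  {amat(hostile, ram_ns):.1f} ns")
```

With these made-up numbers, the same three-level hierarchy averages roughly 2 ns per access for the friendly pattern and roughly 39 ns for the hostile one. The only difference is how often accesses fall through to the slower levels.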
In this era of multiple CPUs per system, this can be further complicated for programmers by memory contention between the CPUs. Virtualization also introduces further complications. Consider the following diagram, which shows the current memory hierarchy in a 4 socket by 4 core system. In his excellent paper on computer memory, Ulrich Drepper mentions that this is going to be a common configuration.
[Update Sep 2010: Note that the organisation of cache levels in multicore CPUs can vary quite a bit]
http://www.pixelbeat.org/docs/memory_hierarchy/ 1/3
07/03/2017 The changing memory hierarchy
[Update Oct 2010: hwloc is a handy tool for automatically generating diagrams like these]
Until recently, we've had only incremental improvements in the performance (not size) of RAM and mechanical hard disks, while CPU performance has diverged from them a lot. Changes to the memory hierarchy would therefore both speed systems up a lot and simplify the software running on the CPU. It's these exciting changes, happening now and over the next few years, that I'm focusing on here.
[Update Oct 2015: As stated above, the divergence in speed between main memory and CPUs means that efficient use of the CPU caches yields much more performance. This is demonstrated in profiling hardware events, where adjusting the memory size and access pattern reduces the access depth in the memory hierarchy, thus greatly increasing performance. Often, though, it's not possible or practical to adjust all memory accesses. So Intel, as of the Broadwell microarchitecture (Sep 2014), has made CAT (Cache Allocation Technology) available, in certain Xeon processors to start. CAT allows one to dynamically partition the shared cache, limiting which part of it a core can allocate into. Restricting VMs/containers/apps/... to a core therefore restricts them to evicting only part of the cache shared across cores, resulting in more efficient utilization of the system. This is well explained in Dan Luu's summary of CAT advantages. Partitioning functionality like this will also improve security isolation and help protect against side channel attacks. In future, dynamic cache allocation will probably become available on most CPUs and across more cache levels.]
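CAT expresses a partition as a capacity bitmask (CBM) per group, where each set bit stands for one way of the shared cache, and Intel expects each mask to be a single contiguous run of set bits. The following is a minimal sketch of carving up a cache that way. It assumes a hypothetical 20-way L3 and hypothetical group names, and the helper names `partition_ways` and `is_contiguous` are illustrative only. It ignores hardware limits such as a minimum mask width.

```python
# Hypothetical sketch: carve a shared cache into per-group CAT capacity
# bitmasks (CBMs). Each set bit stands for one cache way; Intel CAT expects
# each mask to be a single contiguous run of set bits.


def is_contiguous(mask: int) -> bool:
    """True if mask is non-zero and its set bits form one contiguous run."""
    if mask <= 0:
        return False
    lowest = mask & -mask            # isolate the lowest set bit
    # Adding the lowest bit carries through a contiguous run, clearing it.
    return (mask + lowest) & mask == 0


def partition_ways(total_ways: int, requests: dict[str, int]) -> dict[str, int]:
    """Give each group its own contiguous, non-overlapping run of ways.

    Ignores hardware limits such as a minimum mask width.
    """
    if any(ways < 1 for ways in requests.values()):
        raise ValueError("each group needs at least one way")
    if sum(requests.values()) > total_ways:
        raise ValueError("more ways requested than the cache has")
    masks = {}
    shift = 0
    for name, ways in requests.items():
        masks[name] = ((1 << ways) - 1) << shift   # `ways` set bits, shifted up
        shift += ways
    return masks


if __name__ == "__main__":
    # Hypothetical 20-way L3 shared by two VMs and a cache-hungry batch job.
    for name, mask in partition_ways(20, {"vm_a": 8, "vm_b": 8, "batch": 4}).items():
        print(f"{name}: {mask:05x}")
```

On Linux, masks like these are what the resctrl filesystem accepts, written in hex, in a resource group's `schemata` file. CAT also permits overlapping masks, so keeping groups strictly disjoint, as here, is a design choice for isolation rather than a requirement.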
[Update Sep 2015: Note that cache coherence is a big limitation on the number of cores possible. A new "Tardis" cache coherence model promises to remove the coherence bookkeeping memory that grows linearly with the number of cores. It works by tagging operations with a counter to order reads/writes, thus allowing cores to operate on older data if that suffices. Generation counters are useful for relative ordering, rather than trying to synchronize with the universe using timestamps or the like. I proposed on lkml (and still stand by) a similar mechanism for the relative ordering of files within a file system. Distributed cores/file systems can use higher level methods for coherence, but within the "system", counters have an advantage.]
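The counter idea in the update above can be illustrated with a toy store in which every write is tagged with the next value of a monotonically increasing generation counter instead of a wall-clock timestamp. A reader then reuses its older copy unless the caller needs at least a given generation. This is a minimal sketch of generation-counter ordering only, not the Tardis protocol itself, and the class names `GenerationStore` and `CachedReader` are hypothetical.

```python
# Hypothetical sketch of ordering by generation counters rather than wall-clock
# timestamps. This is NOT the Tardis protocol, only the underlying idea.
import itertools
import threading


class GenerationStore:
    """Key/value store where every write is tagged with the next generation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_gen = itertools.count(1)      # yields 1, 2, 3, ...
        self._data: dict[str, tuple[int, object]] = {}

    def write(self, key: str, value: object) -> int:
        with self._lock:
            gen = next(self._next_gen)
            self._data[key] = (gen, value)
            return gen

    def read(self, key: str) -> tuple[int, object]:
        with self._lock:
            return self._data[key]


class CachedReader:
    """Keeps a local copy; re-fetches only if it is older than required."""

    def __init__(self, store: GenerationStore) -> None:
        self._store = store
        self._cache: dict[str, tuple[int, object]] = {}

    def get(self, key: str, min_gen: int = 0) -> tuple[int, object]:
        cached = self._cache.get(key)
        if cached is not None and cached[0] >= min_gen:
            return cached                        # older data, but it suffices
        fresh = self._store.read(key)
        self._cache[key] = fresh
        return fresh


if __name__ == "__main__":
    store = GenerationStore()
    reader = CachedReader(store)
    store.write("x", "old")
    print(reader.get("x"))                       # (1, 'old'), fetched from the store
    g2 = store.write("x", "new")
    print(reader.get("x"))                       # (1, 'old'), stale copy is acceptable
    print(reader.get("x", min_gen=g2))           # (2, 'new'), must see generation g2
```

Ordering is purely relative. A reader only ever asks "is my copy at least this new?", so no component needs a synchronized clock.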
Solid State Disks
Consider, for example, how SSDs affect the processing of a large file on a multicore system. Because random seeks cost essentially nothing extra on SSDs, unlike on mechanical disks, it's sensible for multiple cores to process separate portions of a file directly. With mechanical disks, each core would just be fighting over the disk head, slowing things down a lot compared to a single core processing the file. In other words, partitioning data to take advantage of multiple cores is much more complicated for mechanical disks than for SSDs, requiring more complex logic and arrays of disks to achieve parallelization. Note that for certain operations like sorting, one has to take RAM size into account. The cores should therefore process chunks of the file in parallel, with each chunk a bit smaller than (RAM size / number of CPUs). For other operations, like searching, RAM size is not a factor, and one can just split the file into chunks of (file size / number of CPUs). [Update Dec 2012: Given the widening disparity between traditional disks and SSDs, they're separating out into distinct
layers in the memory hierarchy. To take advantage of this, hybrid drives are becoming available, as is software to transparently combine separate drives, like Intel's SRT or Linux solutions like bcache.] [Update Jan 2016: ACM Queue discussion on faster non-volatile storage: "it is rare that the performance assumptions that we make about an underlying hardware component change by 1,000x".]
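The search case described in this section (split the file into chunks of file size / number of CPUs and let each core read its own portion directly) might look like the following minimal sketch. It assumes a line-oriented text file, and the function names `chunk_offsets`, `count_in_range` and `parallel_count` are hypothetical. Each boundary is moved forward to the start of the next line so that no line is split between two workers.

```python
# Hypothetical sketch: search a large file on an SSD with one worker per CPU,
# each reading its own (file size / number of CPUs) byte range directly.
import os
from concurrent.futures import ProcessPoolExecutor


def chunk_offsets(path: str, num_chunks: int) -> list[tuple[int, int]]:
    """Split the file into ~equal byte ranges, moving each boundary to the
    start of the next line so that no line is split between two chunks."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, num_chunks):
            f.seek(size * i // num_chunks)
            f.readline()                      # skip to the start of the next line
            pos = f.tell()
            if bounds[-1] < pos < size:       # keep boundaries strictly increasing
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def count_in_range(path: str, start: int, end: int, needle: bytes) -> int:
    """Count lines starting in [start, end) that contain needle."""
    count = 0
    with open(path, "rb") as f:               # each worker opens its own handle
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            if needle in line:
                count += 1
    return count


def parallel_count(path: str, needle: bytes, workers: int = 0) -> int:
    """Count matching lines, using one process per chunk."""
    workers = workers or os.cpu_count() or 1
    ranges = chunk_offsets(path, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_in_range, path, s, e, needle) for s, e in ranges]
        return sum(f.result() for f in futures)
```

A call would look like `parallel_count("big.log", b"ERROR")`, made from a script guarded by `if __name__ == "__main__":`, which `ProcessPoolExecutor` needs on platforms that spawn worker processes. On a mechanical disk, the workers' interleaved reads would likely turn into head seeks, which is exactly the contention described above.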
2 Transistor DRAM
2T DRAM, currently being developed by Intel, has the potential to enhance CPU caches, at least. You can see in the diagram above that the level 2 cache can be used both to speed access to the relatively slow RAM and to speed up communication between cores in a single processor. Lowering this memory wall again gives the opportunity to use different algorithms, especially on multicore systems. Tian Tian of Intel has written a good article on how shared caches enhance a multicore system and how programmers can take further advantage of them. There is also another good ACM article on optimizing application performance in the presence of caches, and this excellent presentation on lock-free algorithms that takes the current memory hierarchy into consideration. [Update Dec 2008: I noticed an IEEE reference to a Sandia National Laboratories simulation, which showed that for many applications, the memory wall with current architectures causes performance to decline with more than 8 processors. So it looks like technology like 2T DRAM will be required in the near future.]
MRAM and Memristors
These technologies have the potential to be the biggest game changers. They're essentially very fast non-volatile memory, and so will affect both current RAM and flash technologies.
MRAM has been in development for a while, but although it is about twice as fast as current RAM technologies, it's much more expensive. However, researchers in Germany have recently figured out how to make it 10 times faster again!
Memristors have recently been created by HP Labs, and again they have the potential to be a fast, dense, cheap, non-volatile memory. The memristor was first theorized in 1971 by Leon Chua as a fourth fundamental circuit element, having properties that cannot be achieved by any combination of the other three elements (resistor, inductor, capacitor). [Update Sep 2010: Memristors will be available by 2014 apparently.] [Update Nov 2011: You can apparently make homemade memristors :)] [Update Jun 2014: Informative memristor details and roadmap from HP] Interesting times...
[Update Jul 2015: 3D XPoint was announced by Intel/Micron, to be available in 2016. It's mostly marketing for now, but as a transistorless non-volatile technology, it has the potential to become another level in the hierarchy, under DRAM at first, and eventually to replace DRAM altogether.]
Aug 19 2008