12/19/2014

GPU Programming in MATLAB


By Jill Reese, MathWorks, and Sarah Zaranek, MathWorks
Multicore machines and hyperthreading technology have enabled scientists, engineers, and financial analysts to speed up computationally intensive applications in a variety of disciplines. Today, another type of hardware promises even higher computational performance: the graphics processing unit (GPU).

Originally used to accelerate graphics rendering, GPUs are increasingly applied to scientific calculations. Unlike a traditional CPU, which includes no more than a handful of cores, a GPU has a massively parallel array of integer and floating-point processors, as well as dedicated, high-speed memory. A typical GPU comprises hundreds of these smaller processors (Figure 1).

Figure 1. Comparison of the number of cores on a CPU system and a GPU.

The greatly increased throughput made possible by a GPU, however, comes at a cost. First, memory access becomes a much more likely bottleneck for your calculations. Data must be sent from the CPU to the GPU before calculation and then retrieved from it afterwards. Because a GPU is attached to the host CPU via the PCI Express bus, the memory access is slower than with a traditional CPU.[1] This means that your overall computational speedup is limited by the amount of data transfer that occurs in your algorithm. Second, programming for GPUs in C or Fortran requires a different mental model and a skill set that can be difficult and time-consuming to acquire. Additionally, you must spend time fine-tuning your code for your specific GPU to optimize your applications for peak performance.

This article demonstrates features in Parallel Computing Toolbox that enable you to run your MATLAB code on a GPU by making a few simple changes to your code. We illustrate this approach by solving a second-order wave equation using spectral methods.

Why Parallelize a Wave Equation Solver?

Wave equations are used in a wide range of engineering disciplines, including seismology, fluid dynamics, acoustics, and electromagnetics, to describe sound, light, and fluid waves.

An algorithm that uses spectral methods to solve wave equations is a good candidate for parallelization because it meets both of the criteria for acceleration using the GPU (see "Will Execution on a GPU Accelerate My Application?"):

It is computationally intensive. The algorithm performs many fast Fourier transforms (FFTs) and inverse fast Fourier transforms (IFFTs). The exact number depends on the size of the grid (Figure 2) and the number of time steps included in the simulation. Each time step requires two FFTs and four IFFTs on different matrices, and a single computation can involve hundreds of thousands of time steps.

It is massively parallel. The parallel FFT algorithm is designed to "divide and conquer" so that a similar task is performed repeatedly on different data. Additionally, the algorithm requires substantial communication between processing threads and plenty of memory bandwidth. The IFFT can similarly be run in parallel.

Figure 2. A solution for a second-order wave equation on a 32 x 32 grid (see animation (http://www.mathworks.com/videos/solutionofsecondorderwaveequationanimation79288.html?type=shadow)).

Will Execution on a GPU Accelerate My Application?

A GPU can accelerate an application if it fits both of the following criteria:

Computationally intensive: The time spent on computation significantly exceeds the time spent on transferring data to and from GPU memory.

Massively parallel: The computations can be broken down into hundreds or thousands of independent units of work.

Applications that do not satisfy these criteria might actually run slower on a GPU than on a CPU.

GPU Computing in MATLAB

Before continuing with the wave equation example, let's quickly review how MATLAB works with the GPU.

FFT, IFFT, and linear algebraic operations are among more than 100 built-in MATLAB functions that can be executed directly on the GPU by providing an input argument of the type GPUArray, a special array type provided by Parallel Computing Toolbox. These GPU-enabled functions are overloaded; in other words, they operate differently depending on the data type of the arguments passed to them.

For example, the following code uses an FFT algorithm to find the discrete Fourier transform of a vector of pseudorandom numbers on the CPU:
A = rand(2^16,1);

B = fft(A);

To perform the same operation on the GPU, we first use the gpuArray command to transfer data from the MATLAB workspace to device memory. Then we can run fft, which is one of the overloaded functions, on that data:
A = gpuArray(rand(2^16,1));

B = fft(A);

The fft operation is executed on the GPU rather than the CPU since its input (a GPUArray) is held on the GPU.

The result, B, is stored on the GPU. However, it is still visible in the MATLAB workspace. By running class(B), we can see that it is a GPUArray:
class(B)

ans =

parallel.gpu.GPUArray

We can continue to manipulate B on the device using GPU-enabled functions. For example, to visualize our results, the plot command automatically works on GPUArrays:
plot(B);

To return the data back to the local MATLAB workspace, you can use the gather command; for example:
C = gather(B);
C is now a double in MATLAB and can be operated on by any of the MATLAB functions that work on doubles.

In this simple example, the time saved by executing a single FFT function is often less than the time spent transferring the vector from the MATLAB workspace to the device memory. This is generally true but is dependent on your hardware and the size of the array. Data transfer overhead can become so significant that it degrades the application's overall performance, especially if you repeatedly exchange data between the CPU and GPU to execute relatively few computationally intensive operations. It is more efficient to perform several operations on the data while it is on the GPU, bringing the data back to the CPU only when required.[2]
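As a minimal sketch of this keep-it-on-the-GPU pattern (the particular chain of operations here is illustrative, not taken from the article's code):

```matlab
% Perform several GPU-enabled operations in a row, paying the transfer
% cost only once in each direction. The operations chosen are illustrative.
A = gpuArray(rand(2^16, 1));   % one transfer: CPU -> GPU
B = fft(A);                    % computed on the GPU
C = abs(B).^2;                 % still on the GPU
D = ifft(C);                   % still on the GPU
result = gather(D);            % one transfer: GPU -> CPU
```

Only the final gather moves data back; every intermediate result stays in device memory.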
Note that GPUs, like CPUs, have finite memories. However, unlike CPUs, they do not have the ability to swap memory to and from disk. Thus, you must verify that the data you want to keep on the GPU does not exceed its memory limits, particularly when you are working with large matrices. By running gpuDevice, you can query your GPU card, obtaining information such as name, total memory, and available memory.
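For example, a quick query of the current device might look like this (a sketch; the exact set of properties available on the gpuDevice object can vary by release):

```matlab
% Query the currently selected GPU and report its memory situation.
gpu = gpuDevice;
fprintf('%s: %g bytes total, %g bytes free\n', ...
        gpu.Name, gpu.TotalMemory, gpu.FreeMemory);
```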

Implementing and Accelerating the Algorithm to Solve a Wave Equation in MATLAB

To put the above example into context, let's implement the GPU functionality on a real problem. Our computational goal is to solve the second-order wave equation

    ∂²u/∂t² = ∂²u/∂x² + ∂²u/∂y²

with the condition u = 0 on the boundaries. We use an algorithm based on spectral methods to solve the equation in space and a second-order central finite difference method to solve the equation in time.
Spectral methods are commonly used to solve partial differential equations. With spectral methods, the solution is approximated as a linear combination of continuous basis functions, such as sines and cosines. In this case, we apply the Chebyshev spectral method, which uses Chebyshev polynomials as the basis functions.
At every time step, we calculate the second derivative of the current solution in both the x and y dimensions using the Chebyshev spectral method. Using these derivatives together with the old solution and the current solution, we apply a second-order central difference method (also known as the leapfrog method) to calculate the new solution. We choose a time step that maintains the stability of this leapfrog method.
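The leapfrog update described above can be sketched as follows (a hedged illustration; the variable names vv, vvold, uxx, uyy, and the placeholder grid and time step are assumptions, not the article's actual code):

```matlab
N  = 32;  dt = 1e-3;                 % illustrative grid size and time step
vv = rand(N+1); vvold = vv;          % placeholder current and old solutions
uxx = zeros(N+1); uyy = zeros(N+1);  % placeholders for the Chebyshev second derivatives
% Second-order central difference (leapfrog) step:
% u_new = 2*u_current - u_old + dt^2 * (u_xx + u_yy)
vvnew = 2*vv - vvold + dt^2*(uxx + uyy);
vvold = vv;  vv = vvnew;             % shift time levels for the next step
```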
The MATLAB algorithm is computationally intensive, and as the number of elements in the grid over which we compute the solution grows, the time the algorithm takes to execute increases dramatically. When executed on a single CPU using a 2048 x 2048 grid, it takes more than a minute to complete just 50 time steps. Note that this time already includes the performance benefit of the inherent multithreading in MATLAB. Since R2007a, MATLAB has supported multithreaded computation for a number of functions. These functions automatically execute on multiple threads without the need to explicitly specify commands to create threads in your code.

When considering how to accelerate this computation using Parallel Computing Toolbox, we will focus on the code that performs computations for each time step. Figure 3 illustrates the changes required to get the algorithm running on the GPU. Note that the computations involve MATLAB operations for which GPU-enabled overloaded functions are available through Parallel Computing Toolbox. These operations include FFT and IFFT, matrix multiplication, and various element-wise operations. As a result, we do not need to change the algorithm in any way to execute it on a GPU. We simply transfer the data to the GPU using gpuArray before entering the loop that computes results at each time step.

Figure 3. Code Comparison Tool showing the differences in the CPU and GPU versions of the code. The GPU and CPU versions share over 84% of their code in common (94 lines out of 111).

After the computations are performed on the GPU, we transfer the results from the GPU to the CPU. Each variable referenced by the GPU-enabled functions must be created on the GPU or transferred to the GPU before it is used.

To convert one of the weights used for spectral differentiation to a GPUArray variable, we use:
W1T = gpuArray(W1T);

Certain types of arrays can be constructed directly on the GPU without our having to transfer them from the MATLAB workspace. For example, to create a matrix of zeros directly on the GPU, we use:
uxx = parallel.gpu.GPUArray.zeros(N+1,N+1);

We use the gather function to bring data back from the GPU; for example:
vvg = gather(vv);

Note that there is a single transfer of data to the GPU, followed by a single transfer of data from the GPU. All the computations for each time step are performed on the GPU.
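Putting the pieces together, the overall structure is one transfer in, a compute loop that stays on the device, and one transfer out (a sketch with assumed names and sizes; the real loop body performs the spectral derivative and leapfrog computations):

```matlab
% One-time transfer of the solution arrays to the GPU (placeholder data).
vv    = gpuArray(rand(33));
vvold = gpuArray(rand(33));
for n = 1:50                     % 50 illustrative time steps, all on the GPU
    % ... Chebyshev spectral derivatives and leapfrog update of vv, vvold ...
end
vvg = gather(vv);                % one-time transfer of the result back to the CPU
```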

Comparing CPU and GPU Execution Speeds

To evaluate the benefits of using the GPU to solve second-order wave equations, we ran a benchmark study in which we measured the amount of time the algorithm took to execute 50 time steps for grid sizes of 64, 128, 512, 1024, and 2048 on an Intel Xeon Processor X5650 and then using an NVIDIA Tesla C2050 GPU.

For a grid size of 2048, the algorithm shows a 7.5x decrease in compute time from more than a minute on the CPU to less than 10 seconds on the GPU (Figure 4). The log scale plot shows that the CPU is actually faster for small grid sizes. As the technology evolves and matures, however, GPU solutions are increasingly able to handle smaller problems, a trend that we expect to continue.

Figure 4. Plot of benchmark results showing the time required to complete 50 time steps for different grid sizes, using either a linear scale (left) or a log scale (right).

Advanced GPU Programming with MATLAB

Parallel Computing Toolbox provides a straightforward way to speed up MATLAB code by executing it on a GPU. You simply change the data type of a function's input to take advantage of the many MATLAB commands that have been overloaded for GPUArrays. (A complete list of built-in MATLAB functions that support GPUArray is available in the Parallel Computing Toolbox documentation (http://www.mathworks.com/help/toolbox/distcomp/bsic4fr1.html#bsloua31).)
To accelerate an algorithm with multiple simple operations on a GPU, you can use arrayfun, which applies a function to each element of an array. Because arrayfun is a GPU-enabled function, you incur the memory transfer overhead only on the single call to arrayfun, not on each individual operation.
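As a hedged illustration (the element-wise function here is made up for the example):

```matlab
% Fuse several element-wise operations into a single GPU execution via arrayfun.
f = @(x, y) 1 ./ (1 + x.^2 + y.^2);   % illustrative element-wise function
x = gpuArray(rand(2048));
y = gpuArray(rand(2048));
z = arrayfun(f, x, y);                % one call covers all the element-wise work
z = gather(z);
```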
Finally, experienced programmers who write their own CUDA code can use the CUDAKernel interface in Parallel Computing Toolbox to integrate this code with MATLAB. The CUDAKernel interface enables even more fine-grained control to speed up portions of code that were performance bottlenecks. It creates a MATLAB object that provides access to your existing kernel compiled into PTX code (PTX is a low-level parallel thread execution instruction set). You then invoke the feval command to evaluate the kernel on the GPU, using MATLAB arrays as input and output.
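A minimal sketch of that workflow, assuming a kernel already compiled to PTX with nvcc (the file names, kernel signature, and launch configuration below are assumptions, not from the article):

```matlab
% Wrap an existing compiled kernel and evaluate it from MATLAB.
in = gpuArray(rand(1e6, 1, 'single'));                     % data for the kernel
k  = parallel.gpu.CUDAKernel('addOne.ptx', 'addOne.cu');   % hypothetical files
k.ThreadBlockSize = [256 1 1];
k.GridSize        = [ceil(numel(in)/256) 1 1];
out = feval(k, in, numel(in));   % runs the kernel on the GPU; returns a GPUArray
result = gather(out);
```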

Summary

Engineers and scientists are successfully employing GPU technology, originally intended for accelerating graphics rendering, to accelerate their discipline-specific calculations. With minimal effort and without extensive knowledge of GPUs, you can now use the promising power of GPUs with MATLAB. GPUArrays and GPU-enabled MATLAB functions help you speed up MATLAB operations without low-level CUDA programming. If you are already familiar with programming for GPUs, MATLAB also lets you integrate your existing CUDA kernels into MATLAB applications without requiring any additional C programming.

To achieve speedups with GPUs, your application must satisfy some criteria, among them the fact that sending the data between the CPU and GPU must take less time than the performance gained by running on the GPU. If your application satisfies these criteria, it is a good candidate for the range of GPU functionality available with MATLAB.
GPU Glossary

CPU (central processing unit). The central unit in a computer responsible for calculations and for controlling or supervising other parts of the computer. The CPU performs logical and floating-point operations on data held in the computer memory.

GPU (graphics processing unit). Programmable chip originally intended for graphics rendering. The highly parallel structure of a GPU makes it more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel.

Core. A single independent computational unit within a CPU or GPU chip. CPU and GPU cores are not equivalent to each other: GPU cores perform specialized operations, whereas CPU cores are designed for general-purpose programs.

CUDA. A parallel computing technology from NVIDIA that consists of a parallel computing architecture and developer tools, libraries, and programming directives for GPU computing.

Device. A hardware card containing the GPU and its associated memory.

Host. The CPU and system memory.

Kernel. Code written for execution on the GPU. Kernels are functions that can run on a large number of threads. The parallelism arises from each thread independently running the same program on different data.

Published 2011 (91967v01)

References

1. See Chapter 6 (Memory Optimization) of the NVIDIA CUDA C Best Practices documentation for further information about potential GPU computing bottlenecks and optimization of GPU memory access.

2. See Chapter 6 (Memory Optimization) of the NVIDIA CUDA C Best Practices documentation for further information about improving performance by minimizing data transfers.

Products Used

MATLAB (http://www.mathworks.com/products/matlab)
Parallel Computing Toolbox (http://www.mathworks.com/products/parallelcomputing)

Learn More

Spectral Methods, Lloyd N. Trefethen (http://www.mathworks.com/support/books/book48110.html?category=6&language=1&view=category)
Introduction to MATLAB GPU Computing (http://www.mathworks.com/discovery/matlabgpu.html)
Accelerating Signal Processing Algorithms with GPUs and MATLAB (http://www.mathworks.com/discovery/gpusignalprocessing.html)

This page was printed from: http://www.mathworks.com/company/newsletters/articles/gpu-programming-in-matlab.html

© 1994-2014 The MathWorks, Inc.
