Running COMSOL in Parallel On Clusters - 1001 - Knowledge Base

10/9/2016


Solution Number: 1001
Title: Running COMSOL in parallel on clusters
Platform: Windows, Linux
Applies to: All Products
Versions: All versions
Categories: Solver
Keywords: solver memory parallel cluster

Problem Description
This solution describes how you enable distributed parallelization (cluster jobs) in COMSOL.

Solution
COMSOL supports two complementary modes of parallel operation: shared-memory parallel operations and distributed-memory parallel operations, including cluster support. This solution is dedicated to distributed-memory parallel operations. For shared-memory parallel operations, see Solution 1096.
COMSOL can distribute computations on compute clusters using the MPI model. One large problem can be distributed across many compute nodes. Parametric sweeps can also be distributed, with individual parameter cases sent to each cluster node.
Cluster computing is supported on Windows (Windows HPC Server 2008/R2) and Linux, including common schedulers like LSF, PBS, and Sun Grid Engine (SGE, also known as Oracle Grid Engine). As of version 4.3, COMSOL by default uses Hydra to initialize the MPI environment on Linux.
NOTE: To use COMSOL on a compute cluster, you need the Floating Network License (FNL) option.
At the bottom of this page are quick guides that explain how to get started with cluster computing and how to get more information.
Some useful tips and troubleshooting guides are provided below.

Fundamentals
The following terms occur frequently when describing the hardware for cluster computing and shared-memory parallel computing:
Compute node: The compute nodes are where the distributed computing occurs. The COMSOL server resides in a compute node and communicates with other compute nodes using MPI (Message Passing Interface).
Host: The host is a physical machine with a network adapter and a unique network address. The host is part of the cluster. It is sometimes referred to as a physical node.
Core: The core is a processor core used in shared-memory parallelism by a compute node with multiple processors.

The number of hosts used and the number of compute nodes are usually the same. For some special problem types, such as very small problems with many parameters, it might be beneficial to use more than one compute node on one host.
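
The host and core counts described above can be read off a scheduler node file. The following is only a sketch: it assumes a file with one host name per allocated core slot (the layout the PBS and LSF examples further down rely on), and the function names summarize_nodefile and count_hosts are hypothetical helpers, not part of COMSOL.

```shell
#!/bin/sh
# summarize_nodefile FILE
# Assumption: FILE holds one host name per allocated core slot.
# Prints "<host> <slots>" for every distinct host, sorted by host name.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# count_hosts FILE
# Prints the number of distinct hosts, which is usually also the
# number of compute nodes to request.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}
```

For a file that lists node1 and node2 twice each, summarize_nodefile reports two slots per host and count_hosts reports two hosts.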

Cluster distribution, Windows and Linux
Example models for cluster testing are included in the Model Library:
COMSOL_Multiphysics/Tutorial_Models/micromixer_cluster
COMSOL_Multiphysics/Tutorial_Models/thermal_actuator_jh_distributed

Troubleshooting
https://www.comsol.co.in/support/knowledgebase/1001/

Your first step is to make sure you have the latest release installed. The latest release can be downloaded from the COMSOL Product Download page. Also use Help > Check for Updates to install the latest software updates. The latest updates are also available from the COMSOL Product Updates page.
Error messages relating to GTK
GLib-GObject-WARNING **: invalid (NULL) pointer instance
GLib-GObject-CRITICAL **: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
Gtk-CRITICAL **: gtk_settings_get_for_screen: assertion `GDK_IS_SCREEN (screen)' failed
...

These errors typically occur when COMSOL's Java component is trying to display an error message in a graphical window, but there is no graphical display available. The recommended solution is to disable file locking. Add the row
-Dosgi.locking=none

to three of COMSOL's *.ini configuration files. Open the following files in a text editor:
/usr/local/comsol51/multiphysics/bin/glnxa64/comsolcluster.ini
/usr/local/comsol51/multiphysics/bin/glnxa64/comsolclustermphserver.ini
/usr/local/comsol51/multiphysics/bin/glnxa64/comsolclusterbatch.ini

In each of these files you will find several -Dosgi.* rows. Add the -Dosgi.locking=none row directly below these. Please note that the options are case sensitive.
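
The manual edit can also be scripted. The sketch below assumes that each -Dosgi.* row starts at the beginning of a line; add_osgi_locking is a hypothetical helper name, and it is worth backing up the *.ini files before trying it.

```shell
#!/bin/sh
# add_osgi_locking INI_FILE
# Inserts -Dosgi.locking=none directly below the last -Dosgi.* row.
# Does nothing if the row is already present. If no -Dosgi.* row
# exists, the option is appended at the end of the file.
add_osgi_locking() {
    ini_file=$1
    option='-Dosgi.locking=none'
    # -x matches whole lines, -F treats the option as a fixed string.
    if grep -qxF -- "$option" "$ini_file"; then
        return 0
    fi
    tmp_file=$(mktemp) || return 1
    if awk -v opt="$option" '
        { lines[NR] = $0 }
        /^-Dosgi\./ { last = NR }
        END {
            for (i = 1; i <= NR; i++) {
                print lines[i]
                if (i == last) print opt
            }
            if (!last) print opt
        }
    ' "$ini_file" > "$tmp_file"; then
        # Overwrite in place so the file keeps its permissions.
        cat "$tmp_file" > "$ini_file"
    fi
    rm -f "$tmp_file"
}
```

Calling the function once for each of the three files listed above applies the change; running it a second time leaves the files unchanged.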
Check that the nodes can access the license manager
Linux: Log in to each node and run the command
comsol batch -inputfile /usr/local/comsol50/multiphysics/models/COMSOL_Multiphysics/Equation-Based_Models/point_source.mph -outputfile out.mph

The command above should be issued on one line. /usr/local/comsol50 is assumed to be your COMSOL installation directory. The /usr/local/comsol50/multiphysics/bin directory, where the comsol script is located, is assumed to be included in the system PATH. Make sure you have write permissions for ./out.mph. No error messages should be produced; if any appear, you may have a license manager connectivity problem.
Windows HPCS: Log in to each node with Remote Desktop and start the COMSOL Desktop GUI. No error messages should be displayed.
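
On Linux, the per-node check can be looped over a list of hosts. This is a rough sketch under several assumptions: passwordless ssh to every node already works, comsol is in the PATH on each node, and the host file has one host name per line. The function name check_license_on_nodes and the REMOTE_SHELL variable are hypothetical.

```shell
#!/bin/sh
# check_license_on_nodes HOSTFILE MODEL
# Runs the COMSOL batch test command on every host listed in HOSTFILE.
# Prints "OK: <host>" or "FAILED: <host>"; returns non-zero on any failure.
# REMOTE_SHELL (hypothetical) overrides the remote command, default ssh.
check_license_on_nodes() {
    hostfile=$1
    model=$2
    failed=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # stdin comes from /dev/null so ssh does not swallow the host list;
        # remote output goes to stderr to keep the summary clean.
        if ${REMOTE_SHELL:-ssh} "$host" comsol batch -inputfile "$model" \
               -outputfile out.mph < /dev/null >&2; then
            echo "OK: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}
```

A FAILED line only shows that the batch command returned an error on that host; the COMSOL output on stderr is still needed to tell a license problem from other causes.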
Issues with Infiniband-based Linux clusters
Update the Infiniband drivers to the latest software version. If you cannot update at this time, add the command-line option -mpifabrics shm:tcp or -mpifabrics tcp. This will use TCP for communication between nodes.
For more advice on how to troubleshoot Infiniband issues, please refer to the section Troubleshooting Distributed COMSOL and MPI in the COMSOL Multiphysics Reference Manual.
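
As an illustration, the option can be combined with the batch command used in the job scripts on this page. The sketch below only prints the command line so it can be inspected first. Placing -mpifabrics before the batch target is an assumption, comsol_cmd_with_fabric is a hypothetical helper, and the sketch accepts only the two fabric values named above.

```shell
#!/bin/sh
# comsol_cmd_with_fabric FABRIC HOSTFILE INFILE OUTFILE
# Prints (does not run) a COMSOL cluster command that forces the given
# MPI fabric. Option placement is an assumption, not taken from the manual.
comsol_cmd_with_fabric() {
    fabric=$1
    hostfile=$2
    infile=$3
    outfile=$4
    case $fabric in
        shm:tcp|tcp) ;;
        *) echo "unsupported fabric: $fabric" >&2; return 1 ;;
    esac
    echo "comsol -clustersimple -f $hostfile -mpifabrics $fabric batch -inputfile $infile -outputfile $outfile"
}
```

For example, comsol_cmd_with_fabric tcp comsol_hostfile in.mph out.mph prints a command that can be pasted into a job script.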
Problems with the Cluster Computing feature in the model tree
If you get the error message "Process status indicates that process is running", it means that the *.status file in the batch directory indicates that the previous job is still running. In some cases this can happen even if the job is not actually running, for example if the job halted or was terminated in an uncontrolled way. To work around this problem, perform these steps:
Cancel any running jobs in the Windows HPCS Job Manager or other scheduler that you use.
In COMSOL, go to the External Process page at the bottom right corner of the COMSOL Desktop.
Click the Clear Status button. If the error still remains, manually delete all the files in the batch directory.
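
The last step, deleting the files in the batch directory, could be done on Linux with a small helper like the one below. It is a sketch only: clear_batch_dir is a hypothetical name, the directory path is whatever your Cluster Computing settings use, and all jobs should be cancelled in the scheduler before running it.

```shell
#!/bin/sh
# clear_batch_dir DIR
# Deletes every regular file directly inside DIR, including hidden files.
# Subdirectories and their contents are left untouched.
clear_batch_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/* "$dir"/.[!.]*; do
        if [ -f "$f" ]; then
            rm -f -- "$f"
        fi
    done
    return 0
}
```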

Error messages due to communication problems between Linux nodes
If you get error messages, make sure that the compute nodes can access each other over TCP/IP and that all nodes can access the license manager in order to check out licenses. If you run the SSH protocol between the hosts on a Linux cluster, you need to pre-generate the keys in order to prevent the nodes from asking each other for passwords as soon as communication is initiated:
# generate the keys
ssh-keygen -t dsa
ssh-keygen -t rsa
# copy the public key to the other machine
ssh-copy-id -i ~/.ssh/id_rsa.pub user@hostname
ssh-copy-id -i ~/.ssh/id_dsa.pub user@hostname
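
Once the keys are copied, it may help to confirm that no host still prompts for a password. The sketch below assumes a host file with one host name per line. check_passwordless_ssh and the SSH_CMD variable are hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# check_passwordless_ssh HOSTFILE
# Prints each host that cannot be reached without a password prompt.
# Returns non-zero if at least one host failed.
# SSH_CMD (hypothetical) overrides the ssh client, default ssh.
check_passwordless_ssh() {
    hostfile=$1
    bad=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if ! ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "$host" true < /dev/null; then
            echo "$host"
            bad=1
        fi
    done < "$hostfile"
    return "$bad"
}
```

An empty output means every listed host accepted the key from the machine where the check was run; the check has to be repeated from other nodes to cover node-to-node connections.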

Cloud computing
COMSOL 4.3a introduced support for cloud computing through Amazon Elastic Compute Cloud (Amazon EC2). See the PDF guide Running COMSOL on the Amazon Cloud for further information.
Hardware Recommendations
See the knowledge base solution on Selecting hardware for clusters.

See Also
See also COMSOL and Multithreading.
Example of LSF job submission script
#!/bin/sh
# Rerun process if node goes down, but not if job crashes
# Cannot be used with interactive jobs.
#BSUB -r
# Job name
#BSUB -J comsoltest
# Number of processes.
#BSUB -n 20
# Redirect screen output to output.txt
#BSUB -o output.txt
rm -rf output.txt
# Create hostfile for COMSOL
cat $LSB_DJOB_HOSTFILE | uniq > comsol_hostfile
# Launch the COMSOL batch job
comsol -clustersimple -f comsol_hostfile batch -inputfile in.mph -outputfile out.mph
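
The hostfile step in the LSF script relies on uniq, which only collapses adjacent duplicate lines. If the scheduler file ever lists hosts in interleaved order, duplicates would survive. A minimal sketch of an order-preserving alternative follows; make_hostfile is a hypothetical helper and the file names are placeholders.

```shell
#!/bin/sh
# make_hostfile SLOTFILE OUTFILE
# Writes each distinct host from SLOTFILE to OUTFILE exactly once,
# keeping the order of first appearance.
make_hostfile() {
    # seen[$0]++ is zero (false) only the first time a line appears.
    awk '!seen[$0]++' "$1" > "$2"
}
```

In the script above this would replace the cat and uniq line, for example make_hostfile "$LSB_DJOB_HOSTFILE" comsol_hostfile.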

Example of PBS job submission script
#!/bin/bash
###############################################################################
#
export nn=2
export np=8
export inputfile="simpleParametricModel.mph"
export outputfile="outfile.mph"
#
# The here-document is unquoted, so the submitting shell expands $HOME, $$,
# $inputfile and $outputfile. Expansions escaped with a backslash are left
# for the job itself to evaluate on the compute node.
qsub -V -l nodes=${nn}:ppn=${np} <<__EOF__
#
#PBS -N COMSOL
#PBS -q dp48
#PBS -o $HOME/cluster/job_COMSOL_$$.log
#PBS -e $HOME/cluster/job_COMSOL_$$.err
#PBS -r n
#[email protected]
#
echo "----------"
echo "Starting job at: \$(date)"
echo
#
cd \${PBS_O_WORKDIR}
echo "Current working directory is: \$(pwd)"
#
np=\$(wc -l < \$PBS_NODEFILE)
echo "Running on \${np} processes (cores) on the following nodes:"
cat \$PBS_NODEFILE
#
cat \$PBS_NODEFILE | uniq > comsol_nodefile
echo "parallel COMSOL RUN"
comsol -clustersimple -f comsol_nodefile batch -mpiarg -rmk -mpiarg pbs -inputfile $inputfile -outputfile $outputfile -batchlog batch_COMSOL__$$.log
echo
echo "Job finished at: \$(date)"
echo "----------"
#
__EOF__

Related Files
cluster_install_linux_50_169.pdf     1.3 MB
cluster_install_linux_50_169.pptx    1.2 MB
cluster_install_win_50.pdf           1,007 KB
cluster_install_win_50.pptx          1.3 MB


Disclaimer
COMSOL makes every reasonable effort to verify the information you view on this page. Resources and documents are provided for your information only, and COMSOL makes no explicit or implied claims to their validity. COMSOL does not assume any legal liability for the accuracy of the data disclosed. Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark details.
