This project is now also on the Open Build Service. There you also find rpms
for openSUSE (Leap, Tumbleweed), SLES and CentOS.
This repository contains a fork of the Son of Grid Engine project in conjunction with some documentation and fixes to get the gridengine working on more recent Linux systems.
First you need to build the source or use the binary packages from the Open Build Service.
I only tested openSUSE Leap 15.1 manually. If you manage to get the things running on any other platform, please let me know!
zypper addrepo https://download.opensuse.org/repositories/home:ph03nix:gridengine/openSUSE_Leap_15.1/home:ph03nix:gridengine.repo
zypper refresh
zypper install gridengine
zypper addrepo https://download.opensuse.org/repositories/home:ph03nix:gridengine/openSUSE_Leap_15.2/home:ph03nix:gridengine.repo
zypper refresh
zypper install gridengine
zypper addrepo https://download.opensuse.org/repositories/home:ph03nix:gridengine/SLE_15_SP1/home:ph03nix:gridengine.repo
zypper refresh
zypper install gridengine
zypper addrepo https://download.opensuse.org/repositories/home:ph03nix:gridengine/SLE_15/home:ph03nix:gridengine.repo
zypper refresh
zypper install gridengine
cd /etc/yum.repos.d/
wget https://download.opensuse.org/repositories/home:ph03nix:gridengine/CentOS_7/home:ph03nix:gridengine.repo
yum install gridengine
cd /etc/yum.repos.d/
wget https://download.opensuse.org/repositories/home:ph03nix:gridengine/ScientificLinux_7/home:ph03nix:gridengine.repo
yum install gridengine
Before building make sure you are relaxed and your cup of coffee (or filling of your choice) is full and well temperated.
Then take a deep breath and be prepared for turbulence.
In general the build process consists of the following steps
- Build the dependency tool and create dependencies with
aimk
- Compile with
aimk
The suggested (working) build options are: aimk -no-herd -nosecure -no-java
The included spec file gridengine.spec is meant for use with then SUSE open build service.
This setup works with openSUSE LEAP 15.0 and 15.1
Install the requirements
# zypper install gcc java-1_8_0-openjdk java-1_8_0-openjdk-devel javacc junit ant automake hwloc-devel libopenssl-devel libdb-4_8-devel pam-devel libXt-devel motif-devel xorg-x11-devel
## Notes: * for the openjdk you can also use a more recent version
## * The version libopenssl-1_0_0-devel is required and needs to uninstall the (by default) installed version 1.1
Prepare the environment by executing the bootstrap.sh
script
$ cd sge-8.1.9/source
$ ./scripts/bootstrap.sh -no-secure
Then build the SGE using
# ./aimk -no-herd -nosecure -no-java
The build process takes some time. The generated binaries are (in my case) in the LINUXAMD64
folder in sources
Now install the binaries to SGE_ROOT
:
# export SGE_ROOT="/opt/sge/" ## Or whereever you want to install the grid engine to
# mkdir /opt/sge/ ## create target directory
# scripts/distinst -local -allall -noexit ## asks for confirmation
# cd $SGE_ROOT
# ./inst_sge -m -x -csp
As of now, the gui_installer
does not work, as we do not have the izPack
Package included.
Done
Instructions updated on 18.01.2019
IMPORTANT: Please build the SGE not under root! I encountered some cryptic linker errors as root, that disappeared when building as unprivileged user. Also ... (shame on me!) you should never build as root anyways ...
Install Requirements with
# yum install csh java-1.8.0-openjdk java-1.8.0-openjdk-devel gcc ant automake hwloc-devel openssl-devel libdb-devel pam-devel libXt-devel motif-devel ncurses-libs ncurses-devel
Then, as unprivileged user, go into a tmux
or screen
session and start the building process with
$ cd sge-8.1.9/source
$ ./scripts/bootstrap.sh
$ ./aimk -no-herd -no-java
## No HADOOP support and no Java support
## Note Java is not needed for qmon!
If you encounter some cryptic linker errors (undefined reference to tputs, tgoto, ecc.) make sure you build as unprivileged user!
The build process takes some time. The generated binaries are (in my case) in the LINUXAMD64
folder in sources
Now install the binaries to SGE_ROOT
:
# export SGE_ROOT="/opt/sge/" ## Or whereever you want to install the grid engine to
# mkdir /opt/sge/ ## create target directory
# scripts/distinst -local -allall -noexit ## asks for confirmation
# cd $SGE_ROOT
# ./inst_sge -m -x -csp ## or run '# ./start_gui_installer'
Done.
For the graphical installer, you need to run aimk
with java support. For that you will need the following additional dependencies
# yum install ant-junit junit javacc
Then building should work with
$ ./scripts/bootstrap.sh
$ ./aimk -no-herd
If you get Java version errors, please adjust build.properties
for your needs.
In order to make SGE run, you will need to open the following ports
firewall-cmd --add-port=992/udp --permanent
firewall-cmd --add-port=6444/tcp --permanent
firewall-cmd --add-port=6445/tcp --permanent
firewall-cmd --reload
Important: Make sure, your local hostname is present int /etc/hosts
, otherwise you run into problems during the installation.
In case you want to use OpenMPI, make sure to compile OpenMPI with --with-sge
support.
In case you are using Spack, compile OpenMPI with schedulers="sge" on
spack install openmpi%[email protected] schedulers="sge"
You will also need to set control_slaves
and job_is_first_task
to true
$ qconf -sp openmpi
pe_name openmpi
slots 1024
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE
qsort_args NONE
../sh.proc.c:153:16: error: storage size of ‘w’ isn’t known
union wait w;
This error was the whole reason for forking the repository. Comment out line 51 in ``sge-8.1.9/source/3rdparty/qtcsh/sh.proc.c` as follows:
50: #if defined(_BSD) || (defined(IRIS4D) && __STDC__) || defined(__lucid) || defined(linux) || defined(__GNU__) || defined(__GLIBC__)
51: //# define BSDWAIT
52: #endif /* _BSD || (IRIS4D && __STDC__) || __lucid || glibc */
I encountered this error when building as root. Try building as unprivileged user (which you should do anyways!)
Some weird java version not supported errors occurred to me, when building on OpenSuSE 15 LEAP. Edit the file build.properties
and put there a more recent Java version like
# sge-8.1.9/source/build.properties
javac.debug=true
javac.deprecated=true
default.sge.javac.source=1.6
default.sge.javac.target=1.6
jgdi.javac.source=1.6
jgdi.javac.target=1.6
jjsv.javac.source=1.6
jjsv.javac.target=1.6
hadoop.javac.source=1.6
hadoop.javac.target=1.6
That should fix the issue.
Symptoms for this issue are or that the qmaster script doesn't start in the installation routine, or you get errors like
error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)
Another symptom are errors related to act_qmaster
.
Make sure, your hostname resolved to your local IP and vice-versa by editing your /etc/hosts
accordingly
Example (Assuming your hostname is masternode.gridengine.whatever
)
## /etc/hosts
[...]
# IP-Address Full-Qualified-Hostname Short-Hostname
#
127.0.0.1 localhost
192.168.0.100 masternode.gridengine.whatever