
cloudvps: shinken project trusty deprecation
Closed, Resolved, Public

Description

Ubuntu Trusty has not been available for new Cloud VPS instances since Nov 2017. Trusty reaches EOL in 2019, so we need to move existing instances to Debian Stretch before that date.

All instances in the shinken project need to be upgraded as soon as possible.

The list of affected VMs is:

  • shinken-01.shinken.eqiad.wmflabs

Listed administrators are:

More info in openstack browser: https://tools.wmflabs.org/openstack-browser/project/shinken

Event Timeline

Krenair triaged this task as Medium priority. Sep 17 2018, 4:52 PM
Krenair created this task.

Discussed with Brooke and decided to create a jessie instance - there is no shinken package in stretch.

Change 461957 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] Labs monitoring: Authorise new shinken host

https://gerrit.wikimedia.org/r/461957

Change 461982 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] shinken: Only overwrite init.d script on trusty

https://gerrit.wikimedia.org/r/461982

Change 461982 merged by Dzahn:
[operations/puppet@production] shinken: Only overwrite init.d script on trusty

https://gerrit.wikimedia.org/r/461982

I've got it set up at https://shinken-jessie.wmflabs.org but I want https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/461957/ done before continuing.
So far I've had to hack /usr/lib/python2.7/dist-packages/shinken/objects/servicedependency.py to get shinken to run (it doesn't like our servicedependency blocks with dependent_hostgroup_name) and also change our config around a bit.

(The next stage, FYI, is to go through puppet.git and stick conditionals everywhere so that the shinken config is generated correctly on the new machine - then we can enable puppet on the new machine).

So I thought this was working, but the poller has stopped doing its thing and other services have been unable to communicate with it (they'll send HTTP requests to e.g. /ping but never get a response).

Change 461957 merged by Andrew Bogott:
[operations/puppet@production] Labs monitoring: Authorise new shinken host

https://gerrit.wikimedia.org/r/461957

I've tried the following without success (the timeout on http://localhost:7771 persists):

  • Upgraded to pycurl 7.43.0.2 via pip in shinken-02 as a test (apt-get install gcc libcurl4-gnutls-dev gnutls-dev && pip install --upgrade pycurl). Reverted to system package afterwards.
  • Added daemon_thread_pool_size 64 to poller's configuration. Removed.
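
For anyone wanting to reproduce the symptom from a script rather than by hand, here is a minimal sketch, assuming Python 3 and only the standard library. The function name `probe_daemon` and the 3-second timeout are made up for illustration; port 7771 and the `/ping` path are the ones mentioned in the comments above.

```python
import http.client
import socket
import urllib.error
import urllib.request


def probe_daemon(base_url, path="/ping", timeout=3.0):
    """Send one HTTP GET to a daemon endpoint and return (ok, detail).

    ok is True only if the daemon answered with a successful HTTP status
    within `timeout` seconds.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(200).decode("utf-8", errors="replace")
            return True, "HTTP %d: %s" % (resp.status, body.strip())
    except urllib.error.HTTPError as exc:
        # The daemon answered, but with an error status.
        return False, "HTTP error %d" % exc.code
    except (urllib.error.URLError, socket.timeout, OSError,
            http.client.HTTPException) as exc:
        # Connection refused, or connected but no reply before the timeout.
        return False, "no response: %s" % exc


if __name__ == "__main__":
    # Port 7771 is the one that timed out in the comment above.
    ok, detail = probe_daemon("http://localhost:7771")
    print("poller reachable" if ok else "poller NOT reachable", "-", detail)
```

A daemon that accepts the connection but never replies, which is what the other services were seeing, comes back as `(False, "no response: ...")` once the timeout expires.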

Mentioned in SAL (#wikimedia-cloud) [2018-10-19T20:24:39Z] <gtirloni> temporarily increased project quota (T204562)

Change 468692 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/debs/shinken@master] Initial import of shinken-2.0.3

https://gerrit.wikimedia.org/r/468692

Change 468692 merged by GTirloni:
[operations/debs/shinken@master] Initial import of shinken-2.0.3

https://gerrit.wikimedia.org/r/468692

Change 468792 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] shinken: Adjustments necessary to upgrade 1.4->2.0 and Trusty->Jessie

https://gerrit.wikimedia.org/r/468792

I had given up on fixing Shinken 2.0.3 that ships with Jessie and decided to test 2.4.3 instead. It worked great from the very start but packaging it for Jessie became a can of worms and I abandoned that idea as I've no desire to become a maintainer of Shinken packages, or in UTF-8 terms: ✝️⚔️😈 (yes, that's the holy cross in a sword fight with the devil).

Digging a bit further into 2.0.3 and the alternatives for getting the pollers to answer properly, I stumbled upon the passive option, which inverts the flow of data between scheduler and poller. This seemed to work great and it's the main thing in the proposed change. The whole issue with the standard data flow seems to be that the pollers aren't ready quickly enough and skip "broks" (internal inter-daemon requests) received from the scheduler. On top of that, the daemons aren't very good at recovering from errors (and that's what 2.4.3 does better, from what I could see -- errors still happened from time to time but they recovered). There are other changes, but they mostly move things around to accommodate what changed in Shinken 2.0.
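
To give an idea of what the passive option looks like in a poller definition, here is a rough sketch in Python that renders a Nagios-style object block. It assumes the option is written as `passive 1` inside a `define poller` block, as in Shinken's stock poller configuration. The helper name `render_shinken_object` and the `poller_name` and `address` values are hypothetical; only port 7771 and the passive option come from this task. The actual change is the Gerrit patch referenced in this task, not this snippet.

```python
def render_shinken_object(kind, properties):
    """Render a dict as a 'define <kind> { ... }' configuration block."""
    width = max(len(key) for key in properties)
    lines = ["define %s {" % kind]
    for key, value in properties.items():
        # Pad the keys so the values line up in one column.
        lines.append("    %s  %s" % (key.ljust(width), value))
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical values, except the port and the passive flag.
poller = {
    "poller_name": "poller-master",
    "address": "localhost",
    "port": 7771,
    "passive": 1,  # assumed spelling of the option discussed above
}

if __name__ == "__main__":
    print(render_shinken_object("poller", poller))
```

Generating the block from a dict keeps the passive flag next to the other poller settings, which is roughly how a templated config would treat it.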

Overview of the kind of traffic that Shinken generated on the loopback interface for all the inter-daemon communication.

shinken-02.png (537×890 px, 51 KB)

Change 468792 merged by GTirloni:
[operations/puppet@production] shinken: Adjustments necessary to upgrade 1.4->2.0 and Trusty->Jessie

https://gerrit.wikimedia.org/r/468792

Change 469248 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] shinken: Fix webui.cfg template

https://gerrit.wikimedia.org/r/469248

Change 469248 merged by GTirloni:
[operations/puppet@production] shinken: Fix webui.cfg template

https://gerrit.wikimedia.org/r/469248

Change 469249 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] shinken: Fix typo in init.pp

https://gerrit.wikimedia.org/r/469249

Change 469249 merged by GTirloni:
[operations/puppet@production] shinken: Fix typo in init.pp

https://gerrit.wikimedia.org/r/469249

shinken has been migrated to Jessie and upgraded to Shinken 2.0.3 as well.

shinken.wmflabs.org is pointing to shinken-02 now.

shinken-01 (Trusty) has been turned off and will be deleted one week from now.

According to shinken SAL, @GTirloni deleted shinken-01 today. Thanks for sorting this out Giovanni.

Change 472119 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] wmcs: Remove unused shinken-01 definitions

https://gerrit.wikimedia.org/r/472119

Change 472119 merged by GTirloni:
[operations/puppet@production] wmcs: Remove unused shinken-01 definitions

https://gerrit.wikimedia.org/r/472119

I merged https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/461962/ after rebasing, since its dependencies had all been merged. It had been in my review queue for some time. Done for completeness. Thanks Alex Monk and GTirloni!