- - Provide FQDN of system.
- cloudvirt1042.eqiad.wmnet
- - If other than a hard drive issue, please depool the machine (and confirm that it’s been depooled) for us to work on it. If not, please provide time frame for us to take the machine down.
- Server is out of service and can be worked at any time.
- - Put system into a failed state in Netbox.
- - Provide urgency of request, along with justification (redundancy, dependencies, etc)
- No particular urgency.
- - Describe issue and/or attach hardware failure log. (Refer to https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook if you need help)
- Server fails to boot after a reimage.
- - Assign correct project tag and appropriate owner (based on above). Also, please ensure the service owners of the host(s) are added as subscribers to provide any additional input.
T364984 (cloudvirt1041: can't boot after reimage) was this exact same issue on another cloudvirt from the same batch. That one was fixed by upgrading the NIC firmware to 21.81.
I tried to upgrade the firmware myself, but the cookbook (sudo cookbook sre.hardware.upgrade-firmware -c nic --new cloudvirt1042.eqiad.wmnet) only offers an upgrade to 22.9, which AIUI has other issues and should be avoided for now. Is there a way I can upgrade the firmware myself? If not, could you please upgrade the firmware on this server? I'm happy to take care of the reimage once the firmware has been upgraded.