Address issues with host switcher removal before releasing Cockpit with this feature #20823

Open
garrett opened this issue Jul 31, 2024 · 12 comments

@garrett
Member

garrett commented Jul 31, 2024

I wasn't consulted about the design upfront, during the process, or even at the end of this PR... I was told that the technical details and what we can do were still being discussed, even as recently as yesterday, and that we hadn't decided what would be done. It was also not obvious that this would be merged when it was.

From a user point of view, this is also very problematic for several reasons:

  • Firstly, it mysteriously breaks things for anyone who has ever used Cockpit before, even if that was on another machine.
  • It doesn't change the UI properly for the top-level switcher component, so it looks broken on a machine without hosts.
  • On a machine with hosts, it has some very weird and clunky UI that is nonstandard and doesn't explain why this is now "broken".
  • There has been no warning whatsoever. The functionality is suddenly broken. There hasn't been a grace period.
  • It's not obvious within Cockpit how to get the functionality back in the meantime.
  • We were specifically told, many times, that people use this functionality and depend upon it. Even as recently as the call we had with a customer.
  • If this was meant to fly under the radar for a RHEL release, then the defaults for RHEL should be different from the defaults overall. This instantly breaks it for everyone.

We need to address all these points before making a release with this PR merged.

Originally posted by @garrett in #20749 (comment)

@garrett garrett added the release-blocker Targetted for next release label Jul 31, 2024
@garrett
Member Author

garrett commented Jul 31, 2024

Let's figure out what it would mean to fix this in such a way where it's not blocking release:

  1. Re-enable the default settings everywhere for now, except for RHEL10 beta and Rawhide.
  2. I'll try to polish up what we have for RHEL, I guess?
  3. Mention it in the release notes, but don't have it be the default yet.
  4. Discuss how to message this for future releases (both within Cockpit and with a blog post), with a "sunset" timeline. This should include alternatives.
  5. From what I understand, we're talking about having bastion hosts on the login screen. We should enhance that so it would remember the hosts, similar to what we do with Cockpit Client. Right?

Timeline: This should all be finished before mid-October?

Unblocking release would be steps 1 - 3.

@martinpitt
Member

> Re-enable the default settings everywhere for now, except for RHEL10 beta and Rawhide.

The delta is effectively "keep it on for Fedora 40". Arch is a rolling release like rawhide, and there it should be disabled first.

> I'll try to polish up what we have for RHEL, I guess?

Note that we were running out of time to disable the feature for RHEL 10. We can change the looks of it all the way through the next 4 years.

> Mention it in the release notes, but don't have it be the default yet.

Not sure what this means. The relnotes enumerate exactly which ones change (development series) and which don't (LTS/stable releases).

> Discuss how to message this for future releases (both within Cockpit and with a blog post), with a "sunset" timeline. This should include alternatives.

The release note in #20749 does list the alternatives that we have.

> From what I understand, we're talking about having bastion hosts on the login screen. We should enhance that so it would remember the hosts, similar to what we do with Cockpit Client. Right?

That seems fine, yes.

> Timeline: This should all be finished before mid-October?

There's nothing special about mid-October. The only time-critical thing is point 1 (but that's also trivial to do)

@garrett
Member Author

garrett commented Jul 31, 2024

@mvollmer and I had a meeting and took notes.


Adapting to the host switcher removal

Goals

  • Are we switching this off for everyone in the future or just RHEL (with warnings for others)?
    • Are we going to have a big warning (better than we do now?), if so, we can leave it on and just make people more aware of the problem
    • RHEL is different from everything else, it will probably need to be off

Planning

  • Immediate term (before release)

    • Changing configs for distros: Have the host switcher on by default for almost all distros, except RHEL10 beta (and perhaps Rawhide? Not sure). It should act like what we have already released (for now, at least).
      • (we can delay the release, if needed)
    • Mention it in release notes already?
      • Have a separate blog post about it too?
    • Remove the “mixed mode” where you can access, edit, and delete but not add. If you can use it, you should be able to fully use it. It should be either on or off, not in-between.
      • (...or hide, like keeping the code but not have it active… but I think we should probably outright remove it, as I don’t think we actually want it in the long run either)
  • Short term (after release, weeks)

    • Make the warning more obvious.
      • Perhaps on login (although this would allow the JS to execute, so probably not)?
      • Redesign modal, at least somewhat? With a more obvious warning?
        • It’s only in the host key dialog right now, which is not enough…
      • Warning modal dialog to prevent immediate connecting, with the option to proceed or not, that explains the problem succinctly and provides a choice
        • This should not be active on every connection, but how?
          • Once per session? So additional connections shouldn’t see it?
          • Set a flag in session storage? But we’d need a way to undo the
  • Longer term (before October)

    • UI design for shell without host switcher
      • In other words: Make the visuals of the shell more obvious (make it look a little different where there’s a complete top bar when the feature isn’t on, for example); don’t make it look the same “but broken”
      • This would probably only affect Cockpit when the host switcher is turned off (on RHEL or opted-in)
    • Enhance the login screen to remember the previous hosts that have been connected to
      • Cockpit Client already does this; we should replicate similar behavior for bastion logins
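The "once per session" idea from the notes above could be sketched roughly as follows. This is only an illustration under assumptions, not Cockpit code: the key name `host-switcher-warning-acknowledged` and all function names are hypothetical, and `MemoryStore` is an in-memory stand-in so the sketch runs outside a browser. In a browser, `window.sessionStorage` could be passed in instead, since it provides the same three methods. The reset function is one possible reading of the "way to undo" the flag that the notes mention.

```typescript
// Minimal subset of the Web Storage API that the sketch relies on.
interface FlagStore {
    getItem(key: string): string | null;
    setItem(key: string, value: string): void;
    removeItem(key: string): void;
}

// Hypothetical key name; not an existing Cockpit identifier.
const WARNING_KEY = "host-switcher-warning-acknowledged";

// True until the warning has been acknowledged in this session.
function shouldShowWarning(store: FlagStore): boolean {
    return store.getItem(WARNING_KEY) !== "1";
}

// Record that the user chose to proceed, so later connections skip the modal.
function acknowledgeWarning(store: FlagStore): void {
    store.setItem(WARNING_KEY, "1");
}

// Clear the flag again, so the warning is shown on the next connection.
function resetWarning(store: FlagStore): void {
    store.removeItem(WARNING_KEY);
}

// In-memory stand-in for window.sessionStorage (assumption: testing only).
class MemoryStore implements FlagStore {
    private data: Record<string, string> = {};

    getItem(key: string): string | null {
        return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
    }

    setItem(key: string, value: string): void {
        this.data[key] = value;
    }

    removeItem(key: string): void {
        delete this.data[key];
    }
}
```

Because session storage is scoped to a browser tab and cleared when the tab closes, a flag like this would bring the warning back in every new session without nagging on each additional connection.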

@garrett
Member Author

garrett commented Jul 31, 2024

The modal that prevents connecting should probably mention the alternatives in the top comment @ #20749 (comment):

  • The login page offers a "Connect to:" field which will directly connect to the given host with SSH. This is safe and will always be supported. However, this only supports user/password or Kerberos authentication, not SSH keys.
  • If you use a Linux desktop, consider using the Cockpit Client flatpak. That can use password and SSH key authentication to any SSH target, and gives you a Cockpit session even for machines which don't have any Cockpit related packages installed.
  • If you have a custom page which wants to use channels (commands, file operations, D-Bus calls, etc.) on a remote machine, then please use the cockpit-connect-ssh library to set up the SSH connection instead of relying on the host switcher.

So people using Cockpit are aware and can transition over, instead of just outright breaking their years-old workflows with no warning whatsoever.

@martinpitt
Member

> Are we switching this off for everyone in the future or just RHEL (with warnings for others)?

"Off" on the upcoming releases, "keep on" for existing stables.

> RHEL is different from everything else, it will probably need to be off

Nothing about this problem is RHEL specific. RHEL 9 is a place where we can't turn off the host switcher without (rightfully) risking the anger of customers, but we have no such obligation for Fedora (and not even for RHEL 10 beta for the next two weeks -- afterwards the feature goes public).

Also, if we have a solution that is good enough for RHEL, it's surely good enough for all the other OSes.

> Warning modal dialog to prevent immediate connecting, with the option to proceed or not, that explains the problem succinctly and provides a choice

That's what we've had since 2023.

> Are we going to have a big warning (better than we do now?), if so, we can leave it on and just make people more aware of the problem

The first version of the warning introduced in #19409 had a more scary appearance FTR. But you know full well user psychology -- that doesn't really work well. We've had a recent customer call and despite that warning they were absolutely shocked when we explained the potential impact.

Re-enabling this needs to take more effort than ignoring a little yellow warning line. Hence the config option: you have to read the documentation for it, have more information, and are in much less of a psychological hurry to get to your sweets than in the dialog.

> So people using Cockpit are aware and can transition over, instead of just outright breaking their years-old workflows with no warning whatsoever.

Yes, that's my concern as well, and it collides with "Remove the “mixed mode”" -- that is the much more painful part. Ripping it out right away gives people no transition period for existing setups.

> This should not be active on every connection, but how? Once per session? So additional connections shouldn’t see it? Set a flag in session storage?

This dialog is only shown for hosts unknown to SSH. Once you ack the fingerprint, it's marked as trusted.

FTR: I've announced the analysis in https://issues.redhat.com/browse/COCKPIT-870 multiple times in our weekly meetings since May, and begged for feedback. I also discussed it in Brno (granted, neither Marius nor Garrett was there), with just about zero interest. But oh well that's fine -- at least now the pressure is to justify introducing weaknesses again, instead of justifying to plug them.

@garrett
Member Author

garrett commented Aug 1, 2024

> > Warning modal dialog to prevent immediate connecting, with the option to proceed or not, that explains the problem succinctly and provides a choice
>
> That's what we've had since 2023.

No. Not at all

I thought we have a warning during the key exchange, not when connecting to existing hosts? Someone who already has added hosts in the past would never see the issue if we remove the only place where the warning is shown from being accessed.

> I've announced the analysis in https://issues.redhat.com/browse/COCKPIT-870 multiple times in our weekly meetings since May, and begged for feedback

You did not ask for design feedback specifically, and whenever I asked about it from a design standpoint, I was always told that you were still hashing out the implementation details and weren't sure which direction it would go... this was even as recently as this past Monday in the meeting and also Tuesday in Matrix. I have so many other things to work on, and if something is still described as being in an "exploratory" prototype stage to see what's "technically possible", and design consideration is dismissed each time I ask, then I really have to concentrate on other things.

> I also discussed it in Brno

Right, Marius and I weren't there in person, and this also wasn't mentioned on the video call at all, which I did attend as much as possible while you all had the meeting.

> Nothing about this problem is RHEL specific

Right, but the problem is also mainly theoretical, as a machine that someone connects to has to also be compromised. Obviously, an Enterprise environment is different from a hobby environment (home servers and such), and restricting it in RHEL10 like this makes sense. And businesses migrating from RHEL9 (and prior) to RHEL10 is a big deal and will have lots of changes anyway, so they will expect some things to be different from before.

But we shouldn't handle a removal of a feature in the community side of things by immediately breaking it halfway, especially in a non-obvious way, for non-obvious reasons. We need to provide a migration path and let people know ahead of time. And we need to let them know why. Not everyone reads the docs or our website (and, in fact, I'd argue most don't, as that's the way it generally works for nearly all projects and products)... most people will see the change in Cockpit, be confused as to why things are broken and then not know what to do, causing frustration. We can, and should, handle this better to avoid that.

@garrett
Member Author

garrett commented Aug 1, 2024

As a community project, we have good will from people using it in the community. Breaking a popular feature we've had for years without warning means we're burning some good will from people, and I want for us to avoid that.

(It's totally different compared to a big RHEL upgrade, where admins expect and plan for changes. And the changes are batched up over years across lots of projects.)

@martinpitt
Member

> > > Warning modal dialog to prevent immediate connecting, with the option to proceed or not, that explains the problem succinctly and provides a choice
> >
> > That's what we've had since 2023.
>
> No. Not at all
>
> I thought we have a warning during the key exchange, not when connecting to existing hosts? Someone who already has added hosts in the past would never see the issue if we remove the only place where the warning is shown from being accessed.

Right, because it'd be silly to warn about connecting to hosts which you already know and connected to in the past.

> You did not ask for design feedback specifically, and whenever I asked about it from a design standpoint

"Design" in the sense of "what do we do about this whole topic": that's all I ever talked about. I laid out an analysis and possible directions. "Design" in the sense of "pixels", no. There are many aspects of this (magic triangle choices and priorities, existing users, workflows, what are we able to support, providing alternatives for RHCERT and talking to them, etc.), but I never considered dropping that triangle as a "design" thing.

I sent PR #20829 to revert this and assigned https://issues.redhat.com/browse/COCKPIT-870 to @garrett and @mvollmer for continuing this.

With my almost-zero availability in the past and next few weeks, and the obvious turmoil I created, I think it's better if I reset to the previous state and we start from scratch. And as I've been unable to raise any interest about this in the past few months, I obviously did that wrong. Sorry about that, and I hope you have more luck with this.

Please talk your solution over with https://issues.redhat.com/secure/ViewProfile.jspa?name=rhn-engineering-thoger

Thanks!

@garrett
Member Author

garrett commented Aug 1, 2024

> Right, because it'd be silly to warn about connecting to hosts which you already know and connected to in the past.

It's just as theoretical that those machines could've been hacked as a new machine.

> With my almost-zero availability in the past and next few weeks

I hope you're feeling better soon! Sorry for the added drama!

@tek-aevl

tek-aevl commented Aug 11, 2024

Is this key feature, which I use for virtual machines, being removed? If so, I need an alternative. Am I now restricted to tunneling to each device that I originally used this for?

@martinpitt
Member

@tek-aevl https://cockpit-project.org/blog/cockpit-322.html lists three alternatives and how to turn it back on -- but you must know what you are doing.

@tek-aevl

tek-aevl commented Aug 13, 2024

Looks simple enough. My main issue is that the keyboard does not always launch when selecting a machine's VNC console, so this was the backup way into my virtual machines.

@martinpitt martinpitt removed the release-blocker Targetted for next release label Aug 19, 2024