InvalidImageName error could be moved to API Validation. #115736

Closed
wants to merge 1 commit into from

kannon92
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Previously, a pod could end up with an InvalidImageName error if its image name contained capital letters or otherwise failed the Docker reference regular-expression checks. This PR moves those checks into API validation at pod creation, so the invalid resource is never created.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add API Validation for container image names.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

kubernetes/enhancements#3816


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 13, 2023
@kannon92
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 13, 2023
Member

@thockin thockin left a comment

Can you also include the removal of the kubelet code in the same PR, so we can see it is 1-for-1 ?

@kannon92
Contributor Author

Can you also include the removal of the kubelet code in the same PR, so we can see it is 1-for-1 ?

I don't think we can remove it in kubelet. I'll paste the kubelet code here. I think there is a bit more in the Kubelet code than just doing validation:

	// If the image contains no tag or digest, a default tag should be applied.
	image, err := applyDefaultImageTag(container.Image)
	if err != nil {
		msg := fmt.Sprintf("Failed to apply default image tag %q: %v", container.Image, err)
		m.logIt(ref, v1.EventTypeWarning, events.FailedToInspectImage, logPrefix, msg, klog.Warning)
		return "", msg, ErrInvalidImageName
	}

// applyDefaultImageTag parses a docker image string, if it doesn't contain any tag or digest,
// a default tag will be applied.
func applyDefaultImageTag(image string) (string, error) {
	named, err := dockerref.ParseNormalizedNamed(image)
	if err != nil {
		return "", fmt.Errorf("couldn't parse image reference %q: %v", image, err)
	}
	_, isTagged := named.(dockerref.Tagged)
	_, isDigested := named.(dockerref.Digested)
	if !isTagged && !isDigested {
		// we just concatenate the image name with the default tag here instead
		// of using dockerref.WithTag(named, ...) because that would cause the
		// image to be fully qualified as docker.io/$name if it's a short name
		// (e.g. just busybox). We don't want that to happen to keep the CRI
		// agnostic wrt image names and default hostnames.
		image = image + ":latest"
	}
	return image, nil
}
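
To make the tag-defaulting step in the kubelet snippet above easier to follow in isolation, here is a minimal stdlib-only sketch. `withDefaultTag` is a hypothetical helper, not the kubelet function: it skips the `dockerref.ParseNormalizedNamed` validation entirely and only approximates the "is it tagged or digested?" decision with string checks.

```go
package main

import (
	"fmt"
	"strings"
)

// withDefaultTag is a hypothetical approximation of the kubelet's
// applyDefaultImageTag. It does NOT validate the reference; it only decides
// whether a tag or digest already appears to be present.
func withDefaultTag(image string) string {
	// A digest is introduced by '@' (for example name@sha256:...).
	if strings.Contains(image, "@") {
		return image
	}
	// A tag is a ':' in the last path segment. A ':' earlier in the string is
	// treated as a registry port (for example localhost:5000/busybox).
	lastSegment := image[strings.LastIndex(image, "/")+1:]
	if strings.Contains(lastSegment, ":") {
		return image
	}
	// Plain concatenation keeps short names short: no default registry
	// hostname is added.
	return image + ":latest"
}

func main() {
	for _, img := range []string{"busybox", "busybox:1.36", "localhost:5000/busybox"} {
		fmt.Printf("%s -> %s\n", img, withDefaultTag(img))
	}
}
```

As in the kubelet code, the default tag is appended by plain concatenation, so a short name such as busybox is not expanded into a fully qualified name.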

@kannon92 kannon92 force-pushed the validate-docker-image branch from 01f8a40 to e44fe3c Compare February 13, 2023 19:01
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 13, 2023
@kannon92 kannon92 force-pushed the validate-docker-image branch from e44fe3c to 8f21b97 Compare February 13, 2023 19:45
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 13, 2023
@kannon92 kannon92 force-pushed the validate-docker-image branch from 8f21b97 to 9a19a9f Compare February 13, 2023 19:49
@kannon92 kannon92 changed the title WIP: InvalidImageName error could be moved to API Validation. InvalidImageName error could be moved to API Validation. Feb 13, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 13, 2023
	allErrs = append(allErrs, field.Invalid(path.Child("image"), ctr.Image, "repository name must be all lowercase"))
} else if strings.Contains(err.Error(), "hexadecimal strings") {
	allErrs = append(allErrs, field.Invalid(path.Child("image"), ctr.Image, "repository name must not specify 64-byte hexadecimal strings"))
}
Contributor

Do we ignore other errors on purpose? If so, why?

Contributor Author

Yes, some of them, just so I can make sure all unit tests pass.

@derekwaynecarr had a comment in this code about padded whitespace being a case that we shouldn't check:

	// TODO: do not validate leading and trailing whitespace to preserve backward compatibility.
	// for example: https://github.com/openshift/origin/issues/14659 image = " " is special token in pod template
	// others may have done similar

I found that our unit tests allow empty image names so I don't check for that and I also avoid this check if we have padded whitespace.
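
A minimal sketch of that skip rule, assuming a hypothetical helper named `shouldValidateImage` (this is not code from the PR): empty names and names with leading or trailing whitespace bypass the stricter check, and everything else is handed to it.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldValidateImage is a hypothetical helper. It reports whether the
// stricter image-name check should run at all: empty names and names padded
// with whitespace are skipped to preserve backward compatibility.
func shouldValidateImage(image string) bool {
	if image == "" {
		return false
	}
	// TrimSpace changes the string only when padding is present.
	return strings.TrimSpace(image) == image
}

func main() {
	for _, img := range []string{"", " ", " busybox", "busybox"} {
		fmt.Printf("%q -> validate=%v\n", img, shouldValidateImage(img))
	}
}
```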

It is interesting that we actually check and fail this for ephemeral containers but not for normal containers.

I pushed a change that spells out more cases and reports user-friendly errors. I added test cases for the following:

  1. Capital letters in the image name.
  2. sha256 in the image name.
  3. http or https in the image name.
  4. A general error for anything else that fails the Docker check.

Contributor

@bart0sh bart0sh Feb 19, 2023

I'm not sure it makes sense to double check errors returned by ParseNormalizedNamed in general.
What would happen if ParseNormalizedNamed error messages are changed or new checks are added?

Contributor Author

I'm not sure of the best approach here.

#115736 (comment) was one suggestion to better align these validation messages with the current approach.

In general, I think we want more descriptive error messages when this validation fails. ParseNormalizedNamed returns "invalid reference format" for all of these cases, so we double-check the errors in order to give a more descriptive validation message.

As a user, I usually find descriptive errors much more helpful than a general one.

I do have unit tests around this function, so we can make sure each case is translated into the right error message.

Contributor

@bart0sh bart0sh Feb 20, 2023

ParseNormalizedNamed will return invalid reference format for all these cases.

It will return "invalid reference format: repository name must be lowercase" or "invalid repository name (%s), cannot specify 64-byte hexadecimal strings" or "reference %s has no name", which are quite descriptive from my point of view. If you find them less descriptive than needed I'd suggest to submit a PR for github.com/docker/distribution/digestset.

If new checks are added to the ParseNormalizedNamed API, we wouldn't notice them here and would report them as "image is an invalid reference type". This is not what we want, I believe.

#115736 (comment) was one suggestion to better align these validation messages with the current approach

Would it help to include error message returned by the ParseNormalizedNamed into the Kubernetes validation error message?
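
A rough sketch of that suggestion: embed the parser's own error text in the validation message instead of matching on substrings of it. Everything here is an assumption for illustration. `parseReference` is a hypothetical stand-in for the real parser, and the message layout only imitates the general shape of a field validation error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseReference is a hypothetical stand-in for a reference parser. Only the
// fact that it returns descriptive errors matters for this sketch.
func parseReference(image string) error {
	if strings.ToLower(image) != image {
		return errors.New("invalid reference format: repository name must be lowercase")
	}
	if strings.ContainsAny(image, " \t") {
		return errors.New("invalid reference format")
	}
	return nil
}

// validateImage passes the parser's error text through verbatim, so any check
// added to the parser later surfaces without changes on this side.
func validateImage(image string) string {
	if err := parseReference(image); err != nil {
		return fmt.Sprintf("spec.containers[0].image: Invalid value: %q: %v", image, err)
	}
	return ""
}

func main() {
	fmt.Println(validateImage("BusyBox"))
	fmt.Println(validateImage("busy box"))
}
```

With this shape there is no string matching that can go stale when the parser's messages change.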

Contributor Author

@thockin and you have very similar thoughts!

#115736 (comment)

I'm not as sure about this code. Is it appropriate to apply a default tag in the validation logic?

I could probably move this utility to validation.go and maybe remove it from kubelet image_manager but not sure what problems that would cause?

Contributor

I was actually thinking about this code: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_image.go#L35
Kubeadm may also have the code that pulls images. Other k/k components can also have something similar.

Contributor Author

I'm still pretty new to Kubernetes so I would need help on where to look. Where is the code for Kubeadm?

I found out that ParseImageName is used in most places, and we only use a special case in image_manager because we don't want the Docker URL. The author uses ParseNormalizedNamed there to avoid the full Docker repository name that ParseImageName returns.

I updated the PR.

Member

This raises an interesting point!

Kubelet can create Pods from static files on disk, so it needs to do its own validation, but I don't honestly know if it is better to create such a pod with an error condition (which is very visible to the user) or to fail the creation, which will be buried in the kubelet logs.

Perhaps this is why InvalidImageName is implemented as it is today? I don't know the history of that, and we should be careful not to break a UX that was, maybe, intentional.

@derekwaynecarr or @dchen1107 or @smarterclayton may recall the history?

Otherwise we should do some archaeology (git blame + git log) to see if there were discussions about this.

I hope I didn't send you on a wild goose chase, but non-zero chance...

Contributor Author

Maybe I should put this PR up as a topic of discussion in sig-node.

@bart0sh
Contributor

bart0sh commented Feb 17, 2023

/triage accepted
/priority important-longterm
/assign

@kannon92 kannon92 force-pushed the validate-docker-image branch from 04e27df to 7a90200 Compare February 21, 2023 14:32
@kannon92 kannon92 force-pushed the validate-docker-image branch from 7a90200 to 3e27cea Compare February 21, 2023 18:54
@kannon92 kannon92 force-pushed the validate-docker-image branch from 3e27cea to b388b8b Compare February 22, 2023 15:09
@kannon92
Contributor Author

/retest

@@ -209,7 +239,9 @@ func TestParallelPuller(t *testing.T) {
fakeRuntime.CalledFunctions = nil
fakeClock.Step(time.Second)
_, _, err := puller.EnsureImageExists(ctx, pod, container, nil, nil)
fakeRuntime.AssertCalls(expected.calls)
if len(expected.calls) > 0 {
Contributor

@kannon92 This looks like we're ignoring calls if we're not expecting them. Why?

Contributor Author

I really don't understand how we do mocking of functions here.

I found that all the calls use GetImageRef and I notice that is a function in the ImageService. I'm not sure if I have to refactor this and add functions to the TestRuntime in order to properly mock them.

So I found that if I don't put an expect call I can still assert against errors that aren't present in the FakeRuntime.

func (f *FakeRuntime) GetImageRef(_ context.Context, image kubecontainer.ImageSpec) (string, error) {
	f.Lock()
	defer f.Unlock()

	f.CalledFunctions = append(f.CalledFunctions, "GetImageRef")
	for _, i := range f.ImageList {
		if i.ID == image.Image {
			return i.ID, nil
		}
	}
	return "", f.InspectErr
}
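
To illustrate why guarding the assertion with `len(expected.calls) > 0` can hide problems, here is a trimmed-down sketch of a call-recording fake. The `fakeRuntime` type and its `assertCalls` method are hypothetical simplifications written for this example; they are not the real `FakeRuntime` API.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// fakeRuntime is a hypothetical, minimal fake that records which methods
// were called, in order.
type fakeRuntime struct {
	mu     sync.Mutex
	called []string
}

// GetImageRef records the call; the lookup itself is irrelevant here.
func (f *fakeRuntime) GetImageRef(image string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.called = append(f.called, "GetImageRef")
}

// assertCalls returns an error when the recorded calls differ from the
// expected ones. An empty expectation matches only an empty record.
func (f *fakeRuntime) assertCalls(expected []string) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if len(expected) == 0 && len(f.called) == 0 {
		return nil
	}
	if !reflect.DeepEqual(expected, f.called) {
		return fmt.Errorf("expected calls %v, got %v", expected, f.called)
	}
	return nil
}

func main() {
	f := &fakeRuntime{}
	f.GetImageRef("busybox:latest") // a call the test case did not list

	var expected []string // the test case expects no runtime calls

	// Guarded form: the assertion is skipped, so the stray call goes unnoticed.
	if len(expected) > 0 {
		fmt.Println("guarded:", f.assertCalls(expected))
	} else {
		fmt.Println("guarded: assertion skipped")
	}

	// Unguarded form: the mismatch is reported.
	fmt.Println("unguarded:", f.assertCalls(expected))
}
```

Asserting unconditionally treats "no calls expected" as a real expectation rather than as "do not check".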

Contributor

It looks like a bug in the FakeRuntime init code. You can try to apply this patch to your PR branch. It should fix panics and reveal test failures:

diff --git a/pkg/kubelet/images/image_manager_test.go b/pkg/kubelet/images/image_manager_test.go
index 64f3e1d9738..faf3062d3ec 100644
--- a/pkg/kubelet/images/image_manager_test.go
+++ b/pkg/kubelet/images/image_manager_test.go
@@ -196,7 +196,7 @@ func (m *mockPodPullingTimeRecorder) RecordImageStartedPulling(podUID types.UID)
 
 func (m *mockPodPullingTimeRecorder) RecordImageFinishedPulling(podUID types.UID) {}
 
-func pullerTestEnv(c pullerTestCase, serialized bool) (puller ImageManager, fakeClock *testingclock.FakeClock, fakeRuntime *ctest.FakeRuntime, container *v1.Container) {
+func pullerTestEnv(t *testing.T, c pullerTestCase, serialized bool) (puller ImageManager, fakeClock *testingclock.FakeClock, fakeRuntime *ctest.FakeRuntime, container *v1.Container) {
        container = &v1.Container{
                Name:            "container_name",
                Image:           c.containerImage,
@@ -207,7 +207,7 @@ func pullerTestEnv(c pullerTestCase, serialized bool) (puller ImageManager, fake
        fakeClock = testingclock.NewFakeClock(time.Now())
        backOff.Clock = fakeClock
 
-       fakeRuntime = &ctest.FakeRuntime{}
+       fakeRuntime = &ctest.FakeRuntime{T: t}
        fakeRecorder := &record.FakeRecorder{}
 
        fakeRuntime.ImageList = []Image{{ID: "present_image:latest"}}
@@ -231,17 +231,15 @@ func TestParallelPuller(t *testing.T) {
 
        useSerializedEnv := false
        for _, c := range cases {
-               puller, fakeClock, fakeRuntime, container := pullerTestEnv(c, useSerializedEnv)
+               puller, fakeClock, fakeRuntime, container := pullerTestEnv(t, c, useSerializedEnv)
 
                t.Run(c.testName, func(t *testing.T) {
                        ctx := context.Background()
                        for _, expected := range c.expected {
-                               fakeRuntime.CalledFunctions = nil
+                               fakeRuntime.ClearCalls()
                                fakeClock.Step(time.Second)
                                _, _, err := puller.EnsureImageExists(ctx, pod, container, nil, nil)
-                               if len(expected.calls) > 0 {
-                                       fakeRuntime.AssertCalls(expected.calls)
-                               }
+                               fakeRuntime.AssertCalls(expected.calls)
                                assert.Equal(t, expected.err, err)
                        }
                })
@@ -261,7 +259,7 @@ func TestSerializedPuller(t *testing.T) {
 
        useSerializedEnv := true
        for _, c := range cases {
-               puller, fakeClock, fakeRuntime, container := pullerTestEnv(c, useSerializedEnv)
+               puller, fakeClock, fakeRuntime, container := pullerTestEnv(t, c, useSerializedEnv)
 
                t.Run(c.testName, func(t *testing.T) {
                        ctx := context.Background()
@@ -322,7 +320,7 @@ func TestPullAndListImageWithPodAnnotations(t *testing.T) {
                }}
 
        useSerializedEnv := true
-       puller, fakeClock, fakeRuntime, container := pullerTestEnv(c, useSerializedEnv)
+       puller, fakeClock, fakeRuntime, container := pullerTestEnv(t, c, useSerializedEnv)
        fakeRuntime.CalledFunctions = nil
        fakeRuntime.ImageList = []Image{}
        fakeClock.Step(time.Second)

Contributor Author

Thank you! I’ll apply that and test.

Contributor Author

Applied.

Contributor Author

I decided to move some of this into a separate PR: #116231

This was mostly so we can discuss the feature separate from some test fixes and additions.

@kannon92 kannon92 force-pushed the validate-docker-image branch from b388b8b to 74033ca Compare February 28, 2023 13:50
@@ -3192,6 +3194,17 @@ func validateContainerCommon(ctr *core.Container, volumes map[string]core.Volume
allErrs = append(allErrs, field.Required(path.Child("image"), ""))
Member

Doesn't this function also run on the pod templates when creating Jobs, Deployments, etc?

Is it acceptable to also fail the creation of those, or just Pods themselves?

Member

If the pod was never going to be viable, it seems better to fail early? On the other hand, if someone was using this as a template (e.g. doing custom admission control as the pod is created to change the template value into a real value) then it would break them.

Do we have any evidence or anecdata?

@kannon92 kannon92 force-pushed the validate-docker-image branch from 74033ca to 6c68cef Compare February 28, 2023 20:16
@kannon92
Contributor Author

/hold I am open for people to review but I think maybe I should take this to sig-node and discuss.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 28, 2023
@kannon92 kannon92 force-pushed the validate-docker-image branch from 0afbe55 to 69f9aa7 Compare March 2, 2023 20:41
@kannon92
Contributor Author

kannon92 commented Mar 2, 2023

/retest

@dchen1107
Member

/assign @dchen1107

@kannon92
Contributor Author

kannon92 commented Mar 7, 2023

/retest

@kannon92
Contributor Author

kannon92 commented Mar 7, 2023

I decided to move a lot of the test increases and utility cleanup to #116231.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2023
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bart0sh
Contributor

bart0sh commented Apr 18, 2023

@kannon92 If you still need this PR then please rebase, if not, please close the PR

@smarterclayton
Contributor

smarterclayton commented Apr 18, 2023

From a backwards compatibility perspective, image name was intended to be passed through the system and interpreted by the CRI / container runtime directly, without being parsed. Very early on we chose not to parse it so that evolutions in the image pull spec (in Docker or OCI) could occur safely, and it was intended to be an opaque value from the Kube perspective.

I don't know if anyone has added parsing outside of the Kubelet, but I would probably be against tightening validation on the apiserver because we have no way of knowing the full use of it (people may use invalid image names as placeholders and use webhooks to resolve them at the last minute), we generally don't tighten validation on fields for forwards compatibility, and it would limit our ability to support more diverse runtime types in the future.

I do think the Kubelet should return a very clear error Reason via pod conditions to indicate that the image value was invalid, and we should define that in CRI at a minimum (CRI should return a well known error when the image spec is invalid).

The last time we discussed this was when we added image digest to the pod status,

#7203 (comment) is one of the earlier places we discussed it, #1697 describes the core issue.

@kannon92
Contributor Author

Thank you for your explanation. I will go ahead and close this PR, but I appreciate your feedback.

@kannon92 kannon92 closed this Apr 24, 2023
Labels
area/kubelet, cncf-cla: yes, do-not-merge/hold, kind/feature, needs-rebase, priority/important-longterm, release-note, sig/apps, sig/node, size/L, triage/accepted
9 participants