InvalidImageName error could be moved to API Validation. #115736
/sig node
Can you also include the removal of the kubelet code in the same PR, so we can see it is 1-for-1?
I don't think we can remove it in kubelet. I'll paste the kubelet code here. I think there is a bit more in the Kubelet code than just doing validation:
01f8a40
to
e44fe3c
Compare
e44fe3c
to
8f21b97
Compare
8f21b97
to
9a19a9f
Compare
	allErrs = append(allErrs, field.Invalid(path.Child("image"), ctr.Image, "repository name must be all lowercase"))
} else if strings.Contains(err.Error(), "hexadecimal strings") {
	allErrs = append(allErrs, field.Invalid(path.Child("image"), ctr.Image, "repository name must not specify 64-byte hexadecimal strings"))
}
Do we ignore other errors on purpose? If so, why?
Yes, some of it is just so I can make sure all unit tests pass.
@derekwaynecarr had a comment in this code about padded whitespace being a case that we shouldn't check:
// TODO: do not validate leading and trailing whitespace to preserve backward compatibility.
// for example: https://github.com/openshift/origin/issues/14659 image = " " is special token in pod template
// others may have done similar
I found that our unit tests allow empty image names so I don't check for that and I also avoid this check if we have padded whitespace.
It is interesting that we actually check and fail this for ephemeral containers but not for normal containers.
I pushed a change to spell out more cases and report user friendly errors:
The following cases are test cases I added:
- capital letters in the image name.
- sha256 in the image name
- http or https in the image name
- and general error that fails the docker check
I'm not sure it makes sense to double-check errors returned by ParseNormalizedNamed in general.
What would happen if the ParseNormalizedNamed error messages are changed or new checks are added?
I'm not sure of the best approach here.
#115736 (comment) was one suggestion to better align these validation messages with the current approach.
In general, I think we would want more descriptive error messages if this validation fails. ParseNormalizedNamed will return "invalid reference format" for all these cases. We double-check the errors so that we can give a more descriptive error in the validation message.
As a user, I usually find descriptive errors much more helpful than a general one.
I do have unit tests around this function so that we can make sure we translate each case to the right error message.
ParseNormalizedNamed will return invalid reference format for all these cases.
It will return "invalid reference format: repository name must be lowercase" or "invalid repository name (%s), cannot specify 64-byte hexadecimal strings" or "reference %s has no name", which are quite descriptive from my point of view. If you find them less descriptive than needed, I'd suggest submitting a PR to github.com/docker/distribution/digestset.
If new checks are added to the ParseNormalizedNamed API, we wouldn't notice them here and would report them as "image is an invalid reference type". This is not what we want, I believe.
#115736 (comment) was one suggestion to better align these validation messages with the current approach
Would it help to include the error message returned by ParseNormalizedNamed in the Kubernetes validation error message?
@thockin and you have very similar thoughts!
I'm not as sure about this code. Is it appropriate to apply a default tag in the validation logic?
I could probably move this utility to validation.go and maybe remove it from kubelet image_manager but not sure what problems that would cause?
I was actually thinking about this code: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_image.go#L35
Kubeadm may also have the code that pulls images. Other k/k components can also have something similar.
I'm still pretty new to Kubernetes so I would need help on where to look. Where is the code for Kubeadm?
I found out that ParseImageName is used in most places, and we only use a special case in image_manager because we don't want the Docker URL. So the author uses ParseNormalizedNamed so they can ignore the fact that ParseImageName returns the full Docker repo name.
I updated the PR.
This raises an interesting point!
Kubelet can create Pods from static files on disk, so it needs to do its own validation, but I don't honestly know if it is better to create such a pod with an error condition (which is very visible to the user) or to fail the creation, which will be buried in the kubelet logs.
Perhaps this is why InvalidImageName is implemented as it is today? I don't know the history of that, and we should be careful not to break a UX that was, maybe, intentional.
@derekwaynecarr or @dchen1107 or @smarterclayton may recall the history?
Otherwise we should do some archaeology (git blame + git log) to see if there were discussions about this.
I hope I didn't send you on a wild goose chase, but non-zero chance...
Maybe I should put this PR up as a topic of discussion in sig-node.
/triage accepted
04e27df
to
7a90200
Compare
7a90200
to
3e27cea
Compare
3e27cea
to
b388b8b
Compare
/retest
@@ -209,7 +239,9 @@ func TestParallelPuller(t *testing.T) {
fakeRuntime.CalledFunctions = nil
fakeClock.Step(time.Second)
_, _, err := puller.EnsureImageExists(ctx, pod, container, nil, nil)
fakeRuntime.AssertCalls(expected.calls)
if len(expected.calls) > 0 {
@kannon92 This looks like we're ignoring calls if we're not expecting them. Why?
I really don't understand how we do mocking of functions here.
I found that all the calls use GetImageRef, and I notice that it is a function in the ImageService. I'm not sure if I have to refactor this and add functions to the TestRuntime in order to properly mock them.
So I found that if I don't put an expect call I can still assert against errors that aren't present in the FakeRuntime.
func (f *FakeRuntime) GetImageRef(_ context.Context, image kubecontainer.ImageSpec) (string, error) {
	f.Lock()
	defer f.Unlock()
	f.CalledFunctions = append(f.CalledFunctions, "GetImageRef")
	for _, i := range f.ImageList {
		if i.ID == image.Image {
			return i.ID, nil
		}
	}
	return "", f.InspectErr
}
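To illustrate the reviewer's concern about skipping the assertion when no calls are expected, here is a small self-contained sketch of a call-recording fake. The type and method names (`fakeRuntime`, `CallsMatch`, `guardedMatch`) are hypothetical and much simpler than the real FakeRuntime; the point is only that an "expect nothing" case is itself an assertion and should not be bypassed.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeRuntime is a hypothetical, stripped-down call recorder.
type fakeRuntime struct {
	mu     sync.Mutex
	called []string
}

// GetImageRef records that it was invoked; the real lookup is omitted.
func (f *fakeRuntime) GetImageRef(image string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.called = append(f.called, "GetImageRef")
}

// ClearCalls resets the recorded calls between test steps.
func (f *fakeRuntime) ClearCalls() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.called = nil
}

// CallsMatch reports whether the recorded calls equal expected, in order.
// A nil or empty expected list matches only when nothing was called.
func (f *fakeRuntime) CallsMatch(expected []string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if len(expected) != len(f.called) {
		return false
	}
	for i := range expected {
		if expected[i] != f.called[i] {
			return false
		}
	}
	return true
}

// guardedMatch reproduces the questioned pattern: the check is skipped
// entirely when no calls are expected, so unexpected calls go unnoticed.
func guardedMatch(f *fakeRuntime, expected []string) bool {
	if len(expected) > 0 {
		return f.CallsMatch(expected)
	}
	return true
}

func main() {
	f := &fakeRuntime{}
	f.GetImageRef("busybox") // an unexpected call
	fmt.Println("guarded check passes:", guardedMatch(f, nil))
	fmt.Println("unconditional check passes:", f.CallsMatch(nil))
}
```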
It looks like a bug in the FakeRuntime init code. You can try to apply this patch to your PR branch. It should fix panics and reveal test failures:
diff --git a/pkg/kubelet/images/image_manager_test.go b/pkg/kubelet/images/image_manager_test.go
index 64f3e1d9738..faf3062d3ec 100644
--- a/pkg/kubelet/images/image_manager_test.go
+++ b/pkg/kubelet/images/image_manager_test.go
@@ -196,7 +196,7 @@ func (m *mockPodPullingTimeRecorder) RecordImageStartedPulling(podUID types.UID)
func (m *mockPodPullingTimeRecorder) RecordImageFinishedPulling(podUID types.UID) {}
-func pullerTestEnv(c pullerTestCase, serialized bool) (puller ImageManager, fakeClock *testingclock.FakeClock, fakeRuntime *ctest.FakeRuntime, container *v1.Container) {
+func pullerTestEnv(t *testing.T, c pullerTestCase, serialized bool) (puller ImageManager, fakeClock *testingclock.FakeClock, fakeRuntime *ctest.FakeRuntime, container *v1.Container) {
container = &v1.Container{
Name: "container_name",
Image: c.containerImage,
@@ -207,7 +207,7 @@ func pullerTestEnv(c pullerTestCase, serialized bool) (puller ImageManager, fake
fakeClock = testingclock.NewFakeClock(time.Now())
backOff.Clock = fakeClock
- fakeRuntime = &ctest.FakeRuntime{}
+ fakeRuntime = &ctest.FakeRuntime{T: t}
fakeRecorder := &record.FakeRecorder{}
fakeRuntime.ImageList = []Image{{ID: "present_image:latest"}}
@@ -231,17 +231,15 @@ func TestParallelPuller(t *testing.T) {
useSerializedEnv := false
for _, c := range cases {
- puller, fakeClock, fakeRuntime, container := pullerTestEnv(c, useSerializedEnv)
+ puller, fakeClock, fakeRuntime, container := pullerTestEnv(t, c, useSerializedEnv)
t.Run(c.testName, func(t *testing.T) {
ctx := context.Background()
for _, expected := range c.expected {
- fakeRuntime.CalledFunctions = nil
+ fakeRuntime.ClearCalls()
fakeClock.Step(time.Second)
_, _, err := puller.EnsureImageExists(ctx, pod, container, nil, nil)
- if len(expected.calls) > 0 {
- fakeRuntime.AssertCalls(expected.calls)
- }
+ fakeRuntime.AssertCalls(expected.calls)
assert.Equal(t, expected.err, err)
}
})
@@ -261,7 +259,7 @@ func TestSerializedPuller(t *testing.T) {
useSerializedEnv := true
for _, c := range cases {
- puller, fakeClock, fakeRuntime, container := pullerTestEnv(c, useSerializedEnv)
+ puller, fakeClock, fakeRuntime, container := pullerTestEnv(t, c, useSerializedEnv)
t.Run(c.testName, func(t *testing.T) {
ctx := context.Background()
@@ -322,7 +320,7 @@ func TestPullAndListImageWithPodAnnotations(t *testing.T) {
}}
useSerializedEnv := true
- puller, fakeClock, fakeRuntime, container := pullerTestEnv(c, useSerializedEnv)
+ puller, fakeClock, fakeRuntime, container := pullerTestEnv(t, c, useSerializedEnv)
fakeRuntime.CalledFunctions = nil
fakeRuntime.ImageList = []Image{}
fakeClock.Step(time.Second)
Thank you! I’ll apply that and test.
Applied.
I decided to move some of this into a separate PR: #116231
This was mostly so we can discuss the feature separately from some test fixes and additions.
b388b8b
to
74033ca
Compare
@@ -3192,6 +3194,17 @@ func validateContainerCommon(ctr *core.Container, volumes map[string]core.Volume
allErrs = append(allErrs, field.Required(path.Child("image"), ""))
Doesn't this function also run on the pod templates when creating Jobs, Deployments, etc?
Is it acceptable to also fail the creation of those, or just Pods themselves?
If the pod was never going to be viable, it seems better to fail early? On the other hand, if someone was using this as a template (e.g. doing custom admission control as the pod is created to change the template value into a real value) then it would break them.
Do we have any evidence or anecdata?
74033ca
to
6c68cef
Compare
/hold I am open to people reviewing this, but I think I should take it to sig-node and discuss.
0afbe55
to
69f9aa7
Compare
/retest
/assign @dchen1107
/retest
I decided to move a lot of the test increases and utility cleanup to #116231.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@kannon92 If you still need this PR, please rebase; if not, please close the PR.
From a backwards compatibility perspective, the image name was intended to be passed through the system and interpreted by the CRI / container runtime directly, without being parsed. Very early on we chose not to parse it so that evolutions in the image pull spec (in Docker or OCI) could occur safely, and it was intended to be an opaque value from the Kube perspective.
I don't know if anyone has added parsing outside of the Kubelet, but I would probably be against tightening validation on the apiserver, for three reasons: we have no way of knowing the full use of the field (people may use invalid image names as placeholders and use webhooks to resolve them at the last minute); we generally don't tighten validation on fields, for forwards compatibility; and it would limit our ability to support more diverse runtime types in the future.
I do think the Kubelet should return a very clear error Reason via pod conditions to indicate that the image value was invalid, and we should define that in CRI at a minimum (CRI should return a well-known error when the image spec is invalid).
The last time we discussed this was when we added image digest to the pod status; #7203 (comment) is one of the earlier places we discussed it, and #1697 describes the core issue.
Thank you for your explanation. I will go ahead and close this PR, but I appreciate your feedback.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Previously, a pod could end up with InvalidImageName if its image name contained capital letters or otherwise failed the Docker regular expression checks. This PR moves these checks into API validation at pod creation, which avoids creating the resource at all.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
kubernetes/enhancements#3816