
Regression when using ExternalName services #13081

Open
philpep opened this issue Mar 25, 2025 · 24 comments · May be fixed by #13154
Labels
kind/bug, needs-priority, needs-triage

Comments

@philpep

philpep commented Mar 25, 2025

After upgrading from chart 4.11.3 to 4.12.1, my ingresses using ExternalName services aren't working anymore (HTTP 503 Service Temporarily Unavailable).

The controller logs show:

2025/03/25 10:22:17 [error] 26#26: *33847 lua entry thread aborted: runtime error: /etc/nginx/lua/balancer.lua:78: bad argument #1 to 'ipairs' (table expected, got nil)
stack traceback:
coroutine 0:
        [C]: in function 'ipairs'
        /etc/nginx/lua/balancer.lua:78: in function 'resolve_external_names'
        /etc/nginx/lua/balancer.lua:114: in function 'sync_backend'
        /etc/nginx/lua/balancer.lua:148: in function </etc/nginx/lua/balancer.lua:146>, context: ngx.timer

Example:

apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  type: ExternalName
  externalName: internal.example.com
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example.com
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example
                port:
                  number: 443
  tls:
    - hosts:
      - example.com
philpep added the kind/bug label Mar 25, 2025
k8s-ci-robot added the needs-triage label Mar 25, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@philpep
Author

philpep commented Mar 25, 2025

It might be related to #13076 but it seems to be a different issue.

@philpep
Author

philpep commented Mar 25, 2025

It seems the bug was introduced in the 4.11.5 release (4.11.4 is OK).

@philpep
Author

philpep commented Mar 25, 2025

I think the issue comes from this commit: c6c5b48

@tolix1

tolix1 commented Mar 26, 2025

Same issue here, 4.11.5 and 4.12.1 are impacted.

@Gacko
Member

Gacko commented Mar 26, 2025

Please do not override the issue template and instead fill it as requested. This is important for reproducing your issue.

Also please add information about what internal.example.com is pointing at. Are these IP addresses? Is it a CNAME?

@philpep
Author

philpep commented Mar 26, 2025

Please do not override the issue template and instead fill it as requested. This is important for reproducing your issue.

Sorry will do better next time.

Also please add information about what internal.example.com is pointing at. Are these IP addresses? Is it a CNAME?

Yes it's a resolvable CNAME.

@sepich
Contributor

sepich commented Mar 26, 2025

Another possibly related issue is that nginx.ingress.kubernetes.io/default-backend stopped working for Ingresses backed by ExternalName services.

Here are example YAMLs:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - image: inanimate/echo-server
          name: echo
      enableServiceLinks: false
---
apiVersion: v1
kind: Service
metadata:
  name: echo
spec:
  selector:
    app: echo
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/upstream-vhost: google.com
    nginx.ingress.kubernetes.io/custom-http-errors: "301"
    nginx.ingress.kubernetes.io/default-backend: echo
    prometheus.io/probe: "false"
  name: test
spec:
  ingressClassName: nginx
  defaultBackend:
    service:
      name: ext
      port:
        number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: ext
spec:
  type: ExternalName
  externalName: google.com

So in this example we are trying to use custom errors:
https://kubernetes.github.io/ingress-nginx/examples/customization/custom-errors/
In this case we access google.com via HTTP/80 and get a redirect to HTTPS:

$ curl -si google.com | head -1
HTTP/1.1 301 Moved Permanently

Then we try to have this 301 served by our echo service.
It works in v1.11.4:

127.0.0.1 - - [26/Mar/2025:15:22:12 +0000] "GET / HTTP/1.1" 200 1774 "-" "curl/8.7.1" 67 0.033 [custom-default-backend-default-echo] [] 142.250.186.174:80 : 10.244.0.8:8080 0 : 1774 0.030 : 0.003 301 : 200 0cc21efb7ceba0b69745bc9afc359738

but broken in v1.11.5:

127.0.0.1 - - [26/Mar/2025:15:15:00 +0000] "GET / HTTP/1.1" 502 150 "-" "curl/8.7.1" 67 0.036 [custom-default-backend-default-echo] [] 216.58.206.78:80 : 0.0.0.1:80 0 : 0 0.035 : 0.001 301 : 502 6958aeaac151beea2b82922fd8da30fd
2025/03/26 15:15:00 [error] 38#38: *24724 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET / HTTP/1.1", upstream: "http://0.0.0.1:80/", host: "test"
2025/03/26 15:15:00 [warn] 38#38: *24724 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET / HTTP/1.1", upstream: "http://0.0.0.1:80/", host: "test"

@Gacko
Member

Gacko commented Mar 26, 2025

Yes it's a resolvable CNAME.

Out of curiosity: Can you make it an A / AAAA record? Just wanna see if it's related to the record type.

@philpep
Author

philpep commented Mar 27, 2025

Yes it's a resolvable CNAME.

Out of curiosity: Can you make it an A / AAAA record? Just wanna see if it's related to the record type.

In my case it's an A record (without AAAA) resolving to an RFC1918 reserved IP address, outside of the k8s cluster (e.g. 192.168.42.12).

@vasili439

Yes it's a resolvable CNAME.

Out of curiosity: Can you make it an A / AAAA record? Just wanna see if it's related to the record type.

In my case it's an A record (without AAAA) resolving to an RFC1918 reserved IP address, outside of the k8s cluster (e.g. 192.168.42.12).

In my case it's the same: as a quick fix I've tried replacing the RFC1918 IP with a temporary domain name (A record), but no luck. HTTP 503 both with a plain IP address and with a DNS A record.

@Gacko
Member

Gacko commented Mar 27, 2025

Maybe @neerfri can shed some light on this, as they implemented the change.

@Confushion

Confushion commented Mar 28, 2025

Same issue here (v4.11.5) using an IP address as externalName.

@Confushion

Update: changing externalName from an IP address to a (valid) DNS name (A record) seemed to fix my issue...

@wilmardo
Contributor

@strongjz any reason why this is closed? This for sure is an undocumented regression. It was previously supported to have an IP address as your ExternalName address and now it isn't working anymore.
If it is an intended change, it at least needs to be documented that this isn't supported anymore.

@tgraskemper

tgraskemper commented Mar 31, 2025

@wilmardo You seem to be referring to the other, closed ticket #13076, where the response stated, as the documentation also does, that IP addresses as ExternalName are not supported.

Your statement about this being an undocumented regression remains true, however. The fact that CNAMEs now behave such that 503s randomly occur and are sent to the user is clearly an issue, as we are facing the same. If an A record is supposed to fix this, it misses the fact that some of us are proxying to an external service where we don't control the DNS entry, and so have to come up with a hacky workaround, like resolving the name, which can potentially change, and uploading the result as a new record to be used as the ExternalName (assuming you don't have other TLS issues).

An alternative workaround, potentially impacting your existing ingress configuration and also quite ugly, is to capture the 503 and use the proxy_pass directive, which is not impacted by this issue, to proxy back to the original service.

    nginx.ingress.kubernetes.io/configuration-snippet: |
      error_page 503 = @fallback_pass;
    nginx.ingress.kubernetes.io/server-snippet: |
      location @fallback_pass {
        set $proxy_host mysubdomain.mysite.com;
        proxy_set_header Host $proxy_host;
        proxy_pass https://$proxy_host/;
      }

Truthfully, this solution might seem quite dumb to someone who knows the NGINX configuration options better than I do, so I would love for someone else to chime in with a better workaround.

@philpep
Author

philpep commented Mar 31, 2025

@strongjz @Confushion @wilmardo please note this ticket is about using a valid hostname/CNAME (not an IP address) as externalName. This is not the same as #13076.

I think this ticket should be re-opened since the issue still exists.

I still have this traceback on 4.12.1:

2025/03/25 10:22:17 [error] 26#26: *33847 lua entry thread aborted: runtime error: /etc/nginx/lua/balancer.lua:78: bad argument #1 to 'ipairs' (table expected, got nil)
stack traceback:
coroutine 0:
        [C]: in function 'ipairs'
        /etc/nginx/lua/balancer.lua:78: in function 'resolve_external_names'
        /etc/nginx/lua/balancer.lua:114: in function 'sync_backend'
        /etc/nginx/lua/balancer.lua:148: in function </etc/nginx/lua/balancer.lua:146>, context: ngx.timer

@strongjz
Member

strongjz commented Mar 31, 2025

Apologies, folks, I read @Confushion's response and thought the issue was resolved.

/reopen

k8s-ci-robot reopened this Mar 31, 2025
@k8s-ci-robot
Contributor

@strongjz: Reopened this issue.

In response to this:

Apologies, folks, I read @Confushion's response and thought the issue was resolved.

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@neerfri
Contributor

neerfri commented Apr 1, 2025

Hi all!
I'm the original author of the commit that caused the issue.
I'd love to help solve this.

@philpep - Your comment stating the difference between this issue and #13076 was important. Thank you.

After reading the code again, with the stack trace you provided in mind, it's clear that backend.endpoints in your case is nil.
It's possible to fix the code such that it handles this case. Before I issue such a fix, I think we need to understand why this is happening. This will allow us to add a test to ensure we don't have regressions again.

Here's what I know:

  • resolve_external_names(original_backend) is called from sync_backend(backend).
  • In this case sync_backend(backend) is called from sync_backends_with_external_name(). The call from sync_backends skips external name backends.
  • sync_backends_with_external_name() is called by a timer and operates on the backends_with_external_name variable.
  • The backends_with_external_name variable is updated in sync_backends(), which is also called by a timer.
  • sync_backends() pulls the backend data using configuration.get_backends_data(), which is a JSON object describing the backends.
  • To the best of my understanding, that JSON is set from the Go function func configureBackends(rawBackends []*ingress.Backend).
  • From what I can tell by reading these lines, the endpoints for a backend are always set in the Backend struct; an empty array is always initialized.
  • At the struct's definition, the Endpoints attribute is declared as Endpoints []Endpoint `json:"endpoints,omitempty"`, which means that zero-length arrays are omitted from the JSON. Hence the nil on the Lua side (see the sketch just below this list).
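
For illustration, here is roughly how that plays out on the Lua side (a hypothetical, simplified reproduction, not the actual payload or balancer.lua code):

-- Hypothetical, simplified reproduction of the failure mode: with the Go tag
-- json:"endpoints,omitempty", a zero-length Endpoints slice is dropped from the
-- JSON, so the decoded backend table has no "endpoints" key at all (nil).
local backend_without_endpoints = {
  name = "default-example-443",
  -- "endpoints" is absent, i.e. nil on the Lua side
}

-- Iterating over nil reproduces the error from the stack trace above.
local ok, err = pcall(function()
  for _, endpoint in ipairs(backend_without_endpoints.endpoints) do
    print(endpoint.address)
  end
end)
print(ok, err)
-- false   ...: bad argument #1 to 'ipairs' (table expected, got nil)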

To sum this knowledge up into a plan, we can either:

  • fix the Lua code to account for a nil value in backend.endpoints, or
  • change the JSON coming in by removing omitempty from the struct tag.

Personally I feel more comfortable with changing the Lua code, to reduce possible effects on other parts of the system.
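
For what it's worth, a minimal sketch of what the first option could look like, assuming the function shape implied by the stack trace (the real balancer.lua code, DNS resolution included, will of course differ):

-- Minimal sketch of the nil-guard (first option), not the actual balancer.lua code:
-- treat a missing backend.endpoints as an empty list instead of crashing.
local function resolve_external_names(original_backend)
  local resolved = {}
  for _, endpoint in ipairs(original_backend.endpoints or {}) do
    -- here the real code resolves endpoint.address via DNS; details omitted
    table.insert(resolved, endpoint)
  end

  -- shallow copy of the backend with the resolved endpoints swapped in
  local backend = {}
  for key, value in pairs(original_backend) do
    backend[key] = value
  end
  backend.endpoints = resolved
  return backend
end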

@strongjz @wilmardo @Gacko - As members of the repo, please share your opinion and who can shepherd this change in. It took a year to get the original PR merged. I want to make sure that, if I put in the hours to issue a PR and solve this, there is a member who wants to help us get it merged in a timely manner.

Thank you ✌️

@Gacko
Member

Gacko commented Apr 1, 2025

Hello,

first, @wilmardo is not a maintainer of this project. Second, I think we can get a possible fix in quite fast now, because there's a proper discussion and investigation going on. This is why I linked you here: you have way more context and knowledge around this than I do, so I'm pretty sure we can get this fixed soon.

Thank you!

@philpep
Author

philpep commented Apr 1, 2025

Hi @neerfri, thanks for the detailed analysis!

While I'm not sure I understand all the details, it appears not everyone has this issue, since people who had an IP address and then switched to a CNAME fixed their issue.

So I think it might be related to my environment (DNS server or k8s setup). One important thing I omitted to mention is that I'm running an EOL Kubernetes 1.28.15 cluster. Given what you said about "Endpoint", maybe there was a change in kubernetes/client-go where "nil" values are expected to be returned as an empty map, or something like that.

I plan to upgrade my clusters soon, I'll see if it fixes my issue.

What k8s cluster version are other people who have the same issue using?

@jmiller-ca

jmiller-ca commented Apr 1, 2025

I was going to keep quiet, but maybe some additional info will help.

EKS 1.32
Using the helm chart for install

The manifest below works fine on v4.11.4, but updating to 4.12.1 breaks it.

apiVersion: v1
kind: Service
metadata:
  name: files-cache-proxy
spec:
  type: ExternalName
  externalName: media.mycompany.com
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-mycompany-com
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/proxy-ssl-name: media.mycompany.com
    nginx.ingress.kubernetes.io/upstream-vhost: "media.mycompany.com"
    # # Add caching annotations
    # nginx.ingress.kubernetes.io/proxy-cache: "cache-media-my-company.com"

    # Add header control
    nginx.ingress.kubernetes.io/configuration-snippet: |
      expires 1M;
      add_header Cache-Control "public, max-age=2592000, immutable";
      proxy_ssl_name media.mycompany.com;
      proxy_ssl_server_name on;
      proxy_cache_valid 200 302 24h;
      proxy_cache_valid 301      1h;
      proxy_cache_valid any      1m;
      # Hide cache-related headers
      proxy_hide_header X-Powered-By;
      proxy_hide_header Vary;
      proxy_hide_header Pragma;
      proxy_hide_header Last-Modified;
      proxy_hide_header Set-Cookie;
  name: files-cache-proxy
spec:
  ingressClassName: external
  rules:
    - host: my-01.qa.mycompany.com
      http:
        paths:
          - backend:
              service:
                name: files-cache-proxy
                port:
                  number: 443
            path: /files/cache
            pathType: Prefix
  tls:
    - hosts:
        - my-01.qa.mycompany.com
      secretName: my-01.qa.mycompany-com-tls

When looking at the ingress list, files-cache-proxy did not have an ADDRESS.

With a revert to v4.11.4, everything started working again as expected.

Sorry for the multiple updates.

neerfri added a commit to neerfri/ingress-nginx that referenced this issue Apr 3, 2025
neerfri linked a pull request Apr 3, 2025 that will close this issue
@neerfri
Contributor

neerfri commented Apr 3, 2025

Hi All,

TL;DR:

A pull request was opened to fix this: #13154
Maintainers, please help approve it for tests.
Others, please subscribe if you are affected.

Updating here as I'm making progress on tracking down the source of the problem, in order to decide the best course of action.

It seems that the behavior regarding the endpoints of a Service with type ExternalName has changed, which is why different Kubernetes versions produce different outcomes here.
For those who wish to learn more, you can read:
The issue reported: kubernetes/kubernetes#105986 (comment)
The PR that made the change: kubernetes/kubernetes#114814

After an hour of digging and thinking about this, I'm not sure how we can fix the implementation for Kubernetes versions that do not send the endpoints, because the Lua code was reading the DNS entry from the endpoints. I might need to learn more about the exact payload we get from Kubernetes for this.

@Gacko I've created a PR at #13154
Currently the PR only contains tests to cover the scenarios discussed here.
Please approve it for testing so I can make progress on the tests, to guide us toward a possible implementation.
As you can see, I'm giving this priority since we have users being impacted; your quick response will be much appreciated.

Thank you ✌️
