[BUG] DNS resolution for external domain names (Internet) not working on pods #1516

Closed
gsfd2000 opened this issue Sep 20, 2024 · 2 comments
Labels
bug Something isn't working

gsfd2000 commented Sep 20, 2024

What did you do

I created a default k3d cluster and deployed pods on it. The pods cannot resolve external DNS names. I tried setting K3D_FIX_DNS to both 0 and 1; neither made a difference. This example uses the default.

  • How was the cluster created?

    • k3d cluster create dnstest
  • What did you do afterwards?

    • kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
vagrant@ubuntu-10032023:~/testcluster$ kubectl exec -it dnsutils -- nslookup google.com
Server:		10.43.0.10
Address:	10.43.0.10#53

** server can't find google.com: NXDOMAIN
command terminated with exit code 1
error message in coredns pod logs:
vagrant@ubuntu-10032023:~/testcluster$ kubectl logs coredns-76f668cf94-rxgvs -n kube-system
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
.:53
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] 127.0.0.1:38206 - 26358 "HINFO IN 2083597104883247646.1506387703770981152. udp 57 false 512" NXDOMAIN qr,aa,rd 132 0.000832127s
[INFO] plugin/reload: Running configuration SHA512 = fd586de816c2c35b2d8a1c5ceb51dda557a14f33879cb49b6c6a115bc61f862d3cdad6881dd13478e0a77a6f5267199b6f43500da4783b193415947413fb64e3
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3
[INFO] 10.42.0.8:48569 - 3595 "A IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000183376s
[INFO] 10.42.0.8:44383 - 41899 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000093384s
[INFO] 10.42.0.8:45401 - 20902 "A IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000095355s
[INFO] 10.42.0.8:41790 - 244 "A IN google.com. udp 28 false 512" NXDOMAIN qr,aa,rd 103 0.000072917s
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server 
cluster dns points to coredns service:
vagrant@ubuntu-10032023:~/testcluster$ kubectl exec -it dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5 
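
The four queries in the CoreDNS log above follow from this resolv.conf: with `ndots:5`, a name with fewer than five dots is tried against each search domain first and as a bare name last. The sketch below approximates that common resolver behaviour; `candidate_names` is a hypothetical helper written for illustration, not part of any tool, and real resolvers may differ in details.

```shell
#!/bin/sh
# Sketch: list the names a resolver would typically try for a lookup,
# given the ndots value and search list from /etc/resolv.conf.
# candidate_names is a hypothetical helper, not part of kubectl or k3d.
# Variables are global because POSIX sh has no "local".
candidate_names() {
  name=$1
  ndots=$2
  shift 2
  # A trailing dot marks the name as fully qualified: no search list.
  case $name in
    *.) printf '%s\n' "$name"; return ;;
  esac
  # Count the dots in the name.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  # With at least ndots dots, the bare name is tried first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done
  # With fewer than ndots dots, the bare name is tried last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# Values taken from the resolv.conf shown above.
candidate_names google.com 5 default.svc.cluster.local svc.cluster.local cluster.local
```

For `google.com` this lists the three search-domain variants followed by `google.com.`, the same order as in the CoreDNS log. The NXDOMAIN answers for the first three are expected; only the last one, for `google.com.` itself, indicates the actual failure.
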
k3d cluster running on one server node container with lb 
vagrant@ubuntu-10032023:~/testcluster$ docker container ls
CONTAINER ID   IMAGE                            COMMAND                  CREATED         STATUS         PORTS                             NAMES
bcee24ec17ff   ghcr.io/k3d-io/k3d-proxy:5.7.0   "/bin/sh -c nginx-pr…"   3 minutes ago   Up 3 minutes   80/tcp, 0.0.0.0:38263->6443/tcp   k3d-dnstest-serverlb
b79f010fe5e5   rancher/k3s:v1.29.6-k3s1         "/bin/k3d-entrypoint…"   3 minutes ago   Up 3 minutes                                     k3d-dnstest-server-0 
The coredns pod's /etc/resolv.conf correctly points to the underlying Docker network gateway IP, 172.19.0.1:
vagrant@ubuntu-10032023:~/testcluster$ kubectl debug -it coredns-66c56f4556-dgm46  -n kube-system --image registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 --target coredns -n kube-system
Targeting container "coredns". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-v99vb.
If you don't see a command prompt, try pressing enter.
root@coredns-66c56f4556-dgm46:/# 
root@coredns-66c56f4556-dgm46:/# cat /etc/resolv.conf
search eu.pg.com
nameserver 172.19.0.1
options ndots:0

this IP also resolves.
vagrant@ubuntu-10032023:~/testcluster$ kubectl get nodes -owide
NAME                    STATUS   ROLES                  AGE   VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE           KERNEL-VERSION      CONTAINER-RUNTIME
k3d-dnstest4-server-0   Ready    control-plane,master   20d   v1.29.6+k3s1   172.19.0.2    <none>        K3s v1.29.6+k3s1   5.15.0-69-generic   containerd://1.7.17-k3s1
the k3d docker node container resolves correctly
vagrant@ubuntu-10032023:~/testcluster$ docker exec -it k3d-dnstest-server-0 nslookup google.com
Server:		127.0.0.11
Address:	127.0.0.11:53
Non-authoritative answer:

Non-authoritative answer:
Name:	google.com
Address: 142.250.185.238 
The underlying VirtualBox machine (spawned by Vagrant) also resolves correctly; the machine is running on a company laptop:
vagrant@ubuntu-10032023:~/testcluster$ nslookup google.com
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	google.com
Address: 142.250.185.142
Name:	google.com
Address: 2a00:1450:4001:810::200e 
As a test, I changed the forward configuration in the coredns ConfigMap directly to the Docker gateway 172.19.0.1, or alternatively to 8.8.8.8, but that did not change anything either:
vagrant@ubuntu-10032023:~/testcluster$ kubectl get cm coredns -n kube-system -oyaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          fallthrough
        }
        prometheus :9153
        forward . 8.8.8.8
        cache 30
        loop
        reload
        loadbalance
        import /etc/coredns/custom/*.override
    }
    import /etc/coredns/custom/*.server
  Nodehosts: |
    172.19.0.2 k3d-dnstest4-server-0
kind: ConfigMap
metadata:
  annotations:
.....
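
One way to narrow down the forward test described above is to query an upstream server directly from the pod, bypassing CoreDNS, and compare that with a query through the cluster DNS service. The following is a minimal sketch; `query_via` is a hypothetical helper name, and the pod name and addresses in the usage comments are the ones from this report.

```shell
#!/bin/sh
# Sketch: run nslookup inside a pod against an explicit DNS server.
# query_via is a hypothetical helper; it only wraps "kubectl exec"
# and the "nslookup NAME SERVER" form.
query_via() {
  pod=$1
  name=$2
  server=$3
  kubectl exec "$pod" -- nslookup "$name" "$server"
}

# Usage (values from this report):
#   query_via dnsutils google.com 8.8.8.8      # bypasses CoreDNS
#   query_via dnsutils google.com 10.43.0.10   # goes through CoreDNS
```

If the direct query succeeds while the query through 10.43.0.10 returns NXDOMAIN, pod-to-Internet connectivity is probably fine and the CoreDNS configuration is the more likely cause.
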

What did you expect to happen

I expect nslookup to resolve external DNS names from every pod in the cluster.

Which OS & Architecture

vagrant@ubuntu-10032023:~/testcluster$ uname -a
Linux ubuntu-10032023 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
vagrant@ubuntu-10032023:~/testcluster$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.2 LTS
Release:	22.04
Codename: jammy 
  • output of k3d runtime-info
vagrant@ubuntu-10032023:~/testcluster$ k3d runtime-info

arch: x86_64
cgroupdriver: systemd
cgroupversion: "2"
endpoint: /var/run/docker.sock
filesystem: extfs
infoname: ubuntu-10032023
name: docker
os: Ubuntu 22.04.2 LTS
ostype: linux
version: 27.0.3 

Which version of k3d

vagrant@ubuntu-10032023:~/testcluster$ k3d version
k3d version v5.7.0
k3s version v1.29.6-k3s1 (default)

Which version of docker

vagrant@ubuntu-10032023:~/testcluster$ docker version
Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:33 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:33 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.7.18
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0 

potentially related

#1515

@gsfd2000 gsfd2000 added the bug Something isn't working label Sep 20, 2024
gsfd2000 commented

A change in k3d 5.7.0 switched CoreDNS to use a coredns-custom ConfigMap, and this seems to cause the problem. Clusters created with k3d >= 5.7.0 use the coredns-custom ConfigMap, which breaks external FQDN resolution through CoreDNS for cluster pods.
https://github.com/k3d-io/k3d/releases/tag/v5.7.0
v5.6.3...v5.7.0
71b5755
When you remove the import entry from the coredns ConfigMap, resolution works. The new coredns-custom configuration seems to have a problem somewhere; can someone please take a look?
When I downgrade k3d to 5.6.3 in the same environment and create the cluster the same way, the problem does not occur.
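
To check whether a cluster is affected, it may help to look at the import lines in the Corefile and at the contents of the coredns-custom ConfigMap mentioned in this comment. This is only a sketch: the two function names are hypothetical, and it assumes both ConfigMaps live in the kube-system namespace, as the coredns one does in this report.

```shell
#!/bin/sh
# Sketch: inspect the CoreDNS configuration of a k3d cluster.
# Both function names are hypothetical helpers for illustration.

# Print the import lines of the Corefile, with line numbers.
show_coredns_imports() {
  kubectl -n kube-system get configmap coredns \
    -o jsonpath='{.data.Corefile}' | grep -n 'import'
}

# Dump the coredns-custom ConfigMap (assumed to be in kube-system).
show_coredns_custom() {
  kubectl -n kube-system get configmap coredns-custom -o yaml
}

# Usage:
#   show_coredns_imports
#   show_coredns_custom
```

Comparing the output on a cluster created with 5.6.3 against one created with 5.7.0 should show what the newer version adds.
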

gsfd2000 commented

I have seen that the change has already been reverted in versions after 5.7.0, hence closing.
