DNS Debugging
Your application logs print could not resolve host web-svc. The Service exists, the Pods are running, but the name does not resolve. DNS failures are among the most frustrating issues in Kubernetes because the error message rarely tells you what is actually wrong. A systematic approach turns a guessing game into a short checklist.
The debug decision tree
Before running any commands, it helps to have a mental model of the DNS resolution path and where it can break.
Each node corresponds to a concrete command you can run right now. Work through them in order.
Step 1: verify CoreDNS is running
CoreDNS is the foundation. If it is not running, no name in the cluster resolves. Check it first:
kubectl get pods -n kube-systemYou should see Pods with coredns in their name in the Running state. In the simulated cluster, CoreDNS is started during bootstrap, so this step tells you whether the system services are healthy. A Pod in CrashLoopBackOff here means DNS is completely broken for all workloads.
CoreDNS is running and healthy but DNS still fails for your application. What is your next step?
- Restart the failing application Pod
- Check that the Service exists in the correct namespace
- Scale up the CoreDNS Deployment
Reveal answer
Check that the Service exists in the correct namespace. A healthy CoreDNS cannot resolve a Service that does not exist, or one that lives in a different namespace than expected.
Step 2: verify the Service exists and has Endpoints
A Service can exist in DNS without any working backend. This happens when the label selector in the Service does not match any running Pods. CoreDNS always registers the A record for the Service’s ClusterIP, but traffic has nowhere to go.
First confirm the Service exists and note its ClusterIP:
kubectl get service web-svcThen inspect its Endpoints:
kubectl describe service web-svcScroll to the Endpoints field. If it shows <none>, the selector matches no running Pods. The DNS name resolves correctly, but TCP connections to that IP will time out.
A Service with no Endpoints does not cause a DNS failure. The name resolves to the ClusterIP, but connecting to it will hang. If your application reports a connection timeout rather than "unknown host", check Endpoints first. DNS and routing are independent layers.
You run kubectl get service api and the Service exists. Then kubectl describe service api shows Endpoints: <none>. What is the most likely cause?
Reveal answer
The Service selector does not match any running Pod labels. Either the Pods are not running, or there is a label mismatch, for example the selector uses app: api but the Pods have app: api-server. Fix the selector or the Pod labels to make them match.
Step 3: test name resolution with nslookup
Once you know CoreDNS is up and the Service exists, test DNS resolution directly. Start with the short name:
nslookup web-svcIn the simulated cluster, nslookup queries the DNS simulation layer. If the Service exists in the current namespace, you get back its ClusterIP. If it fails, try the namespace-qualified name:
nslookup web-svc.defaultAnd if that also fails, try the full FQDN:
nslookup web-svc.default.svc.cluster.localMoving from short name to FQDN isolates namespace confusion. If the FQDN resolves but the short name does not, you are querying from a different namespace than where the Service lives.
nslookup web-svc fails, but nslookup web-svc.default succeeds. What does this tell you?
Reveal answer
The query is coming from a Pod in a namespace other than default. The short name web-svc is expanded using the current namespace’s search domain, which does not include default. Use the namespace-qualified form or the full FQDN when crossing namespace boundaries.
Step 4: reproduce a broken selector to understand the layers
The best way to understand the difference between a DNS failure and a connectivity failure is to create one intentionally. Create a Service whose selector will not match any Pod:
kubectl create service clusterip broken-svc --tcp=80:80This Service gets the selector app: broken-svc by default. Unless you have a Pod with exactly that label, the Endpoints list is empty. Verify:
kubectl describe service broken-svcThe Endpoints field shows <none>. Now resolve the name:
nslookup broken-svcResolution succeeds. CoreDNS returns the ClusterIP. The DNS layer works. But any TCP connection to that IP will time out because there is no backend. This is the most common source of confusion: developers see a successful DNS lookup and assume the Service is healthy, when the actual problem is a selector mismatch.
In the simulated cluster, nslookup queries the simulated DNS layer. If the Service does not exist, resolution fails exactly as it would on a real cluster. If the Service exists but has no Endpoints, resolution still succeeds. The failure surfaces at the connection layer, not the DNS layer.
Now fix the broken Service by deleting it and creating one with a selector that actually matches a running Pod. First check what Pods and labels are available:
kubectl get pods --show-labelsCan you construct a kubectl expose command that creates a Service targeting one of those Pods? Use the labels you see in the output to make the selector match.
You create a Service but accidentally omit the selector field entirely. nslookup returns an IP. Your application still cannot connect. What should you check?
Try it: kubectl describe service broken-svc
Reveal answer
Look at the Endpoints field. Without a selector, the Service has no Endpoints and no routing target. DNS resolves because the ClusterIP was assigned. Fix by adding a selector that matches your Pods, or by manually creating an Endpoints object pointing to the Pod IPs.
DNS debugging in Kubernetes always follows the same four steps: confirm CoreDNS is healthy, confirm the Service exists in the right namespace, test resolution from short name to FQDN, then inspect Endpoints to separate DNS failures from connectivity failures. Most issues that look like DNS problems are actually namespace confusion or selector mismatches.