You type example.com into the browser and a fraction of a second later you see the page. But computers can't connect by name — they need a numeric address like 93.184.216.34. Something has to turn a human-friendly name into that address on the fly. That's the job of DNS (Domain Name System) — a service that works like the phone book of the internet: you know the name, it gives you the number.

For a backend developer, DNS is more than "the browser opens a site". Your service reaches the database by the name db.internal, a neighbouring service by name, an external API by domain. Every such call starts with a trip to DNS. And when something breaks at this step, the symptoms can be sneaky — right up to "works on my machine, but not in production". Let's see how it's built.

DNS as the phone book of the internet

Picture an old paper phone book: names on the left, numbers on the right. Want to call someone — find the name, read the number. DNS does exactly the same thing, but for the network: name api.example.com goes in, IP address 203.0.113.10 comes out.

There's one difference: a single book doesn't exist. There are billions of names in the world, and keeping them in one file is impossible. So the directory is distributed — split into parts owned by different servers around the world. When you need an address, the system walks across several of these servers and assembles the answer piece by piece. From the outside it looks instant, but inside it's a small journey.
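The "name in, address out" idea is easy to try from code. Below is a minimal Python sketch; the function name resolve_addresses is made up for this illustration, and it simply asks the operating system's resolver through the standard socket module, so whatever caches and overrides exist on your machine apply to it too.

```python
import socket


def resolve_addresses(name: str) -> list[str]:
    """Return the IP addresses the OS resolver reports for `name`."""
    # getaddrinfo goes through the operating system's resolver, so it
    # honours the local hosts file and the configured DNS servers.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address as a string. Drop duplicates, keep order.
    return list(dict.fromkeys(info[4][0] for info in infos))
```

On a machine with network access, resolve_addresses("example.com") returns whatever the resolver reports at that moment; a name that can't be resolved raises socket.gaierror.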

The request path: from resolver to authoritative server

Say your code needs the address of api.example.com. Here's what happens, step by step:

  1. Recursive resolver. Your machine doesn't ask the whole internet — it asks one middleman server, the recursive resolver (usually your ISP's DNS or a public one like 8.8.8.8). Its job is to run after the answer on your behalf and return a ready IP.
  2. Root servers. The resolver doesn't know the address up front, so it goes down a chain from the top. First it asks a root server: "Who's responsible for the .com zone?" The root doesn't know the specific address of api.example.com, but it knows where to go next.
  3. TLD servers. The root sends the resolver to the servers of the .com zone (TLD — top-level domain). They don't know the final address either, but they know who's responsible for example.com.
  4. Authoritative server. Finally the resolver reaches the authoritative server for example.com — the server that holds the real records for this domain. It hands back the final answer: api.example.com → 203.0.113.10.

An analogy: you're looking for a person in a huge building. At the entrance (root) they tell you: "Those people are on floor 7." On the floor (TLD) — "Office 712." In the office (authoritative server) sits, at last, the one who knows the exact answer. The resolver walks this path for you and remembers the route, so it doesn't have to run it again next time.
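The four steps above can be sketched as a toy model. This is not how a real resolver is implemented: the three dictionaries stand in for the root, TLD, and authoritative servers, the server labels are invented, and the code assumes the registered domain is always the last two labels of the name (which is not true for zones like co.uk).

```python
# Toy delegation data: every "server" knows only the next hop.
# Server labels are invented for illustration; this is not real DNS data.
ROOT_SERVER = {"com": "com-tld-server"}
TLD_SERVERS = {"com-tld-server": {"example.com": "example-auth-server"}}
AUTH_SERVERS = {"example-auth-server": {"api.example.com": "203.0.113.10"}}


def resolve_iteratively(name: str) -> tuple[str, list[str]]:
    """Walk root -> TLD -> authoritative; return (address, servers asked)."""
    name = name.rstrip(".")
    labels = name.split(".")
    tld = labels[-1]                  # "com"
    domain = ".".join(labels[-2:])    # "example.com" (simplifying assumption)

    path = ["root"]
    # Step 2: the root doesn't know the address, only who serves the TLD.
    tld_server = ROOT_SERVER[tld]
    path.append(tld_server)
    # Step 3: the TLD server only knows who is authoritative for the domain.
    auth_server = TLD_SERVERS[tld_server][domain]
    path.append(auth_server)
    # Step 4: the authoritative server holds the actual record.
    address = AUTH_SERVERS[auth_server][name]
    return address, path
```

Calling resolve_iteratively("api.example.com") returns the address together with the list of servers that were asked, which makes the chain of referrals visible.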

Record types: A, AAAA, CNAME, MX, TXT

An authoritative server stores not a single address but a set of records of different types. Each answers its own question. The main ones:

  • A — maps a name to an IPv4 address. The most common record: example.com → 93.184.216.34.
  • AAAA — the same, but for an IPv6 address (2606:2800:220:1:248:1893:25c8:1946). Read as "quad-A".
  • CNAME — an alias: the name points not to an address but to another name. For example, www.example.com is a CNAME to example.com. Handy when several names should lead to one host: change the address in one place, and the rest follow.
  • MX — where to deliver mail for the domain (mail exchange). When you write to user@example.com, the mail server looks at the MX record to find which server accepts the letters.
  • TXT — arbitrary text. Used to verify domain ownership and mail settings (SPF, DKIM) — for example, to prove that mail sent under your name really comes from you.

Here's a fragment of a zone on an authoritative server:

example.com.      3600  IN  A      93.184.216.34
www.example.com.  3600  IN  CNAME  example.com.
example.com.      3600  IN  MX     10 mail.example.com.
example.com.      3600  IN  TXT    "v=spf1 include:_spf.example.com ~all"

The number 3600 in the second column is the TTL, covered below.
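To make the columns of the zone fragment concrete, here is a small Python sketch that reads those four lines and answers an A lookup, following a CNAME when it meets one. It handles only this simplified one-record-per-line layout, not the full zone file syntax, and the names Record, parse_zone, and lookup_a are invented for the example.

```python
from typing import NamedTuple

ZONE_TEXT = """\
example.com.      3600  IN  A      93.184.216.34
www.example.com.  3600  IN  CNAME  example.com.
example.com.      3600  IN  MX     10 mail.example.com.
example.com.      3600  IN  TXT    "v=spf1 include:_spf.example.com ~all"
"""


class Record(NamedTuple):
    name: str
    ttl: int     # seconds
    rtype: str   # A, AAAA, CNAME, MX, TXT, ...
    data: str


def parse_zone(text: str) -> list[Record]:
    """Parse lines of the form: name ttl class type data."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # At most 5 fields, so data containing spaces (MX priority and
        # host, quoted TXT strings) stays in one piece.
        name, ttl, _cls, rtype, data = line.split(None, 4)
        records.append(Record(name, int(ttl), rtype, data))
    return records


def lookup_a(records: list[Record], name: str, max_hops: int = 8) -> list[str]:
    """Return IPv4 addresses for `name`, following CNAME aliases."""
    for _ in range(max_hops):  # hop limit guards against alias loops
        addresses = [r.data for r in records if r.name == name and r.rtype == "A"]
        if addresses:
            return addresses
        aliases = [r.data for r in records if r.name == name and r.rtype == "CNAME"]
        if not aliases:
            return []
        name = aliases[0]  # the alias points to another name: look that up
    return []
```

With this data, looking up www.example.com. lands on the A record of example.com., which is the "change the address in one place" effect of a CNAME.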

You can look at the records yourself. The dig utility queries DNS directly, and the +short flag trims its detailed output down to just the answer:

$ dig api.example.com A +short
203.0.113.10

And nslookup is simpler, for a quick check:

$ nslookup example.com
Name:    example.com
Address: 93.184.216.34

TTL and caching: why changes aren't instant

Running the whole resolution chain for every request would be slow and expensive. So answers are cached — remembered for a while. And here the TTL (Time To Live) plays the key role — that same 3600 from the record. It says: "you may treat this answer as fresh for this many seconds". 3600 means one hour.

Until the TTL expires, resolvers and your machine hand back the remembered address without re-asking the authoritative server. That's fast and takes the load off. But there's a flip side: when you change a domain's address, the old answer can keep living in caches around the world for up to the TTL, and each cache expires on its own schedule, depending on when it last fetched the record. Hence the phrase "DNS changes don't propagate instantly". In reality nothing "propagates" anywhere; cached answers simply expire gradually across the planet.
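The caching behaviour described here fits in a few lines. The sketch below is an illustration, not the code of any real resolver: TtlCache is a made-up class, the lookup argument stands in for "ask the authoritative server" and is assumed to return an (address, ttl_seconds) pair, and the clock is injectable so the effect can be shown without waiting an hour.

```python
import time
from typing import Callable


class TtlCache:
    """Remember answers until their TTL runs out, then ask again."""

    def __init__(
        self,
        lookup: Callable[[str], tuple[str, int]],   # name -> (address, ttl)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # name -> (address, expires_at)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]              # still fresh: no upstream query
        address, ttl = self._lookup(name)  # expired or unknown: ask upstream
        self._entries[name] = (address, now + ttl)
        return address
```

If the upstream record changes while an entry is still fresh, resolve keeps returning the old address until the TTL has elapsed. That is the whole "propagation delay" in miniature.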

The practical takeaway: if you're planning to move a service to a new address, lower the record's TTL (say, to 60 seconds) well ahead of the move. It has to happen at least one old TTL beforehand, and a day before is a comfortable margin. By switch time every cache has then picked up the short TTL, so the old address goes stale within about a minute rather than an hour. After the move you can raise the TTL back.
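The timing of such a move is simple arithmetic, sketched below. The function plan_move and its dictionary keys are invented for this example, and it assumes every cache honours the TTL it was given, which real-world resolvers don't always do.

```python
from datetime import datetime, timedelta


def plan_move(switch_at: datetime, old_ttl: int, new_ttl: int) -> dict[str, datetime]:
    """Key moments for changing a record's address (TTLs in seconds)."""
    return {
        # A cache that fetched the record just before the TTL was lowered
        # keeps the old TTL for up to old_ttl seconds, so lower it at
        # least that long before the switch.
        "lower_ttl_no_later_than": switch_at - timedelta(seconds=old_ttl),
        "switch_address_at": switch_at,
        # After the switch the old address survives at most new_ttl seconds.
        "old_address_gone_by": switch_at + timedelta(seconds=new_ttl),
    }
```

For a switch at 12:00 with an old TTL of 3600 and a new TTL of 60, the TTL has to be lowered by 11:00 and the old address should be gone from well-behaved caches by 12:01.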

/etc/hosts: a local override

Before going to DNS, the operating system by default first peeks into a small local file: /etc/hosts (on Windows, C:\Windows\System32\drivers\etc\hosts). It's a plain list with one address per line, followed by the names that should map to it:

127.0.0.1    localhost
127.0.0.1    api.example.com

If the name is found here, the system takes the address from here and doesn't go to DNS at all. Handy for local development: you can make api.example.com point to your machine to test code against a local service under its "real" name.

But this same convenient thing is a source of classic confusion. A line added to hosts a couple of months ago and forgotten keeps silently redirecting the name. Hence situations like "works for me, not for a colleague": you have an override in hosts you don't remember. If resolution behaves oddly — check hosts first.
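Checking hosts can be scripted. The following Python sketch parses hosts-style text and reports whether a name is overridden; parse_hosts and find_override are names made up for the example, the sample text (including its comment line) is illustrative, and taking the first matching line reflects typical resolver behaviour rather than a guarantee.

```python
from typing import Optional

HOSTS_TEXT = """\
127.0.0.1    localhost
# forgotten override from a test a couple of months ago
127.0.0.1    api.example.com
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Map each name in hosts-style text to its address."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()          # address first, then names
        for name in names:
            # Keep the first entry seen for a name (assumed "first wins").
            mapping.setdefault(name.lower(), address)
    return mapping


def find_override(text: str, name: str) -> Optional[str]:
    """Return the overriding address for `name`, or None if there is none."""
    return parse_hosts(text).get(name.lower())
```

On Linux or macOS you could feed it the real file, for example find_override(open("/etc/hosts").read(), "api.example.com"), and get None when the name is not overridden.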

DNS inside Kubernetes

In Kubernetes, DNS is the foundation of how services find each other. Instead of hard-coding pod IP addresses (which keep changing on restarts), services reach each other by name. The cluster runs its own DNS (usually CoreDNS), and every Service automatically gets a name.

For example, a Service named orders in the default namespace is reachable from pods in the same namespace simply as orders, from other namespaces as orders.default, and from anywhere in the cluster by the full name orders.default.svc.cluster.local. Your code sends a request to http://orders/..., the cluster DNS turns the name into the Service's current address, and traffic sent there is balanced across the live pods. This is what's called service discovery: finding services by name. It's the same "name instead of address" principle as in the wider internet, only inside the cluster.
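Short names work because a pod's resolver appends search domains before asking the cluster DNS. The sketch below shows the order of names that would be tried. It assumes the usual pod resolver configuration (search domains for the pod's namespace, svc, and the cluster domain, plus ndots:5) and the default cluster domain cluster.local; candidate_names is an invented helper, and real clusters may add further search domains from the node.

```python
def candidate_names(
    name: str,
    namespace: str,
    cluster_domain: str = "cluster.local",
    ndots: int = 5,
) -> list[str]:
    """Names a pod's resolver would try for `name`, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified: no search list applied
    search = [
        f"{namespace}.svc.{cluster_domain}",  # the pod's own namespace first
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{name}.{suffix}." for suffix in search]
    as_is = f"{name}."
    # Names with fewer dots than ndots go through the search list first;
    # names with more dots are tried as-is first.
    if name.count(".") < ndots:
        return expanded + [as_is]
    return [as_is] + expanded
```

For a pod in the default namespace, candidate_names("orders", "default") starts with orders.default.svc.cluster.local., so the short name hits the Service on the first try. From a pod in another namespace the first candidate would carry that pod's namespace instead, which is why the short name only works within the same namespace.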

The applied angle: DNS cache in clients and applications

It's not only resolvers on the network that cache; applications do too. Different runtimes do it their own way: the JVM, for instance, keeps DNS resolution results in the process's memory by default. Go, Node.js, and Python mostly leave DNS caching to the operating system, but their HTTP clients keep connections open in a pool and reuse them, which pins the old address just as effectively. If an external address changes but your service has already cached it, it will keep knocking on the old address even when DNS has long been handing out the new one. Restarting the process clears this cache, hence the folk wisdom "restarting fixed it".

Each runtime has its own way to control how long this cache lives (in the JVM it's networkaddress.cache.ttl, for example, and in old versions the default was "cache forever", which hurts especially when external APIs move). HTTP clients, connection pools, and sidecar proxies have similar internal caches. So when after an address change "service A still talks to the old B", there are several suspects: the OS cache, the resolver cache, the application cache. The same mechanism produces the "works on my machine" effect: your machine has the fresh address cached, another still has the old one, or vice versa.
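The "service A still talks to the old B" effect is easy to reproduce in miniature. In this hypothetical sketch, PinnedClient and ReResolvingClient are invented classes, and the resolve argument stands in for whatever lookup the runtime performs; no real HTTP client is modelled here.

```python
from typing import Callable

Resolver = Callable[[str], str]  # name -> address


class PinnedClient:
    """Resolves the name once, at start-up, and keeps that address."""

    def __init__(self, name: str, resolve: Resolver) -> None:
        self._address = resolve(name)

    def target(self) -> str:
        return self._address  # stale after the record changes


class ReResolvingClient:
    """Asks the resolver every time it needs the address."""

    def __init__(self, name: str, resolve: Resolver) -> None:
        self._name = name
        self._resolve = resolve

    def target(self) -> str:
        return self._resolve(self._name)  # follows record changes
```

After the record changes, a PinnedClient created earlier keeps returning the old address until a new instance is created (the "restart"), while a ReResolvingClient picks up the new address on its next call. Real clients usually sit between these two extremes, with a cache lifetime you can tune.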

The moral is simple: DNS is not an instant switch but a layered system of caches with delays. When planning address changes, budget for these delays and remember the TTL at every level.

Where this applies

DNS is the first step of almost any network interaction, so it shows up in debugging constantly. A site won't open, a service can't see the database, an integration "broke after a move" — before blaming the code, it's worth checking whether the name resolves correctly (dig/nslookup) and whether an old address is cached somewhere. DNS sits at the application layer of the stack — for how it fits into the bigger picture, see the article on the OSI and TCP/IP models.

Where beginners stumble:

  • They forget about TTL during a move. They change the A record and expect an instant effect, while the old address can live in caches for up to another hour. TTL must be lowered in advance.
  • They don't remember the line in /etc/hosts. A local entry silently takes precedence over DNS and produces "works for me, not for others".
  • They forget about the application's DNS cache. Applications and clients keep their own cache (the JVM, for example); after an address change a service may keep hitting the old one until it's restarted or the cache TTL is tuned.
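  • They confuse CNAME and A. A CNAME points to a name, not an address. A name that has a CNAME can't carry any other records, which is also why you can't put a CNAME on the domain root: the root already needs its own records, such as NS and SOA.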
  • They think DNS returns "the site". DNS returns only the address. Then a connection is established to that address and HTTP flows — that's already a different step.

What to learn next

DNS returned the address — now the connection itself begins. It makes sense to go through IP addresses, ports, and NAT: how a packet finds a machine and a specific service from the address you got. Then the application layer on top: HTTP and its secure variant HTTPS and TLS, where, by the way, the name from DNS is checked once more against the certificate. And to understand how a service stays available when some nodes drop out, look at reliability and fault tolerance — DNS and caches play a role there too.