How to create healthchecks for Docker containers

How to create and manage healthchecks in your Docker Compose stack and restart your containers if they are unhealthy.

TheLazyFox

Nov 13, 2021

Docker

What is a healthcheck?

Docker natively provides a healthcheck system. When a container has a healthcheck configured, it has a health status in addition to its normal status. This status is initially starting. Whenever a health check passes, it becomes healthy (whatever state it was previously in). After a certain number of consecutive failures, it becomes unhealthy.

The HEALTHCHECK instruction tells Docker how to test a container to check that it is still working. This can detect cases such as a web server that is stuck in an infinite loop and unable to handle new connections, even though the server process is still running.

Nevertheless, many Docker image providers don't implement this feature.

No worries, there is a way to add a healthcheck in your Docker Compose file, and that is the purpose of this post! 🚀

Healthchecks with Docker Compose

This is what it looks like in your docker-compose.yml file.

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost"]
  interval: 1m30s
  timeout: 10s
  retries: 3
  start_period: 40s

healthcheck in docker-compose

interval, timeout and start_period are specified as durations (for example 10s or 1m30s). interval is how often the test is run, and timeout is how long the test may run before it is considered failed. start_period is a grace period that gives your container time to start: failures during this period don't count towards retries, but a passing test still turns the container healthy. retries is the number of consecutive failed attempts before the container is considered unhealthy.
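To show where this block sits, here is a minimal sketch of a complete service definition with each setting annotated. The service name web and the image my-web-image are hypothetical placeholders, and the example assumes curl is available inside the image.

```yaml
services:
  web:                        # hypothetical service name
    image: my-web-image       # hypothetical image; assumed to include curl
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 1m30s         # run the test every 90 seconds
      timeout: 10s            # a test running longer than this counts as a failure
      retries: 3              # consecutive failures before the status becomes unhealthy
      start_period: 40s       # failures in the first 40 seconds don't count towards retries
```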

Now let's talk about test. This is the actual command Docker executes to determine whether your container is healthy or unhealthy. It must be either a string or a list. If it's a string, just put the full command. If it's a list, the first item must be NONE, CMD or CMD-SHELL.

NONE: no healthcheck will run, including the one defined in the image if there is one
CMD: it must be followed by the command and all its arguments as separate items of the list
CMD-SHELL: it must be followed by a single string, which is run by the container's default shell; it's equivalent to just specifying that string
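For instance, if an image ships with a healthcheck you don't want, a sketch of how to turn it off could look like this. The service and image names are hypothetical; disable: true is the shorthand that the Compose file reference describes as equivalent to test: ["NONE"].

```yaml
services:
  myservice:                  # hypothetical service name
    image: my-image           # hypothetical image that defines its own HEALTHCHECK
    healthcheck:
      test: ["NONE"]          # disables any healthcheck, including the image's own
      # Equivalent shorthand (use one or the other):
      # disable: true
```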

Let's go through some examples that hit the local web server to see if it's running correctly:

# CMD option
test: ["CMD", "curl", "-f", "http://localhost"]

# CMD-SHELL option
test: ["CMD-SHELL", "curl -f http://localhost || exit 1"]

# String option
test: "curl -f http://localhost || exit 1"

Different test options

Healthcheck and dependencies

You probably already use depends_on to ensure some containers are started before launching another one.

depends_on:
  - mycontainer

# OR

depends_on:
  mycontainer:
    condition: service_started

Wait for mycontainer to be started

Now you can wait until the container turns healthy! ✌🏼

depends_on:
  mycontainer:
    condition: service_healthy

Wait for mycontainer to be healthy
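Put together, a minimal sketch with two services could look like the following. myapp and both image names are hypothetical, and the example assumes a Compose version that accepts the condition form of depends_on (file format 2.1 and later 2.x, or the current Compose Specification; the 3.x file format dropped it).

```yaml
services:
  mycontainer:
    image: my-web-image       # hypothetical image; assumed to include curl
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 10s
      retries: 3

  myapp:                      # hypothetical service that needs mycontainer
    image: my-app-image       # hypothetical image
    depends_on:
      mycontainer:
        condition: service_healthy   # start only once mycontainer reports healthy
```

Keep in mind that this only affects startup: if mycontainer never becomes healthy, Compose does not start myapp.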

And now, what if?

My container is not a webserver

When your container is not running a web server, you need to define which command to run to ensure it's healthy.

Some images ship with a handy command for this, like postgres, which comes with pg_isready to tell you whether the database server is up and running. Check the documentation if you want to customize it.

test: ["CMD-SHELL", "pg_isready -U user"]

Check if the PostgreSQL server is up and running
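As a fuller sketch, this is how the check could sit in a postgres service. The credentials, database name and image tag are placeholder assumptions. The doubled $$ keeps Compose from interpolating the variables itself, so the shell inside the container expands them instead.

```yaml
services:
  db:
    image: postgres:16                # tag chosen for illustration
    environment:
      POSTGRES_USER: user             # placeholder credentials
      POSTGRES_PASSWORD: change-me
      POSTGRES_DB: mydb
    healthcheck:
      # $$ escapes the dollar sign for Compose; the container's shell sees ${POSTGRES_USER}
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
```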

With influxdb (1.x, whose influx CLI accepts -execute), you can list all databases, which confirms that your container is up and running.

test: ["CMD-SHELL", "influx -execute 'SHOW DATABASES' || exit 1"]

Check if influxdb server is up and running

My container doesn't have curl

Some images running a web server don't include curl. No worries, you can still use wget.

test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider --no-check-certificate http://localhost/ || exit 1"]
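On Alpine and other BusyBox-based images, wget is usually the BusyBox applet, which may not accept every GNU long option used above. A shorter variant, assuming a BusyBox build that supports -q and --spider, could be:

```yaml
healthcheck:
  # -q silences output, --spider checks the URL without downloading the body
  test: ["CMD-SHELL", "wget -q --spider http://localhost/ || exit 1"]
```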

My container uses another container's network

And finally, what if you are routing a container's traffic through a VPN container?

In this case, it's important to hit the local web server through the VPN container in your healthcheck's test.

In the example below, I'm testing the internal port of the container through the VPN IP address. By doing this, you ensure your container will turn unhealthy if your VPN container is restarting or just down.

version: "2.3" 
services: 
  vpn:
    container_name: my-vpn
    [...]
    networks:
      default:
        ipv4_address: 172.21.0.2
    ports:
      - 8888:1111
    [...]

  mycontainer:
    container_name: my-container
    [...]
    depends_on:
     - vpn
    network_mode: "service:vpn"
    [...]
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://172.21.0.2:1111/ || exit 1"]
      start_period: 120s
      interval: 60s
      timeout: 10s
      retries: 3

networks:
  default:
    external:
      name: my-vpn-network

My container is unhealthy

If your container is unhealthy, you can use a container that automatically restarts unhealthy ones. I recommend willfarrell/docker-autoheal or qdm12/deunhealth. I personally use the second one: like Docker, it is written in Go, and it uses the Docker API directly.

The docker-compose.yml file is super simple to set up: the only volume to map is the Docker socket.

version: "3.7"
services:
  deunhealth:
    image: qmcgaw/deunhealth:latest
    container_name: deunhealth
    network_mode: "none"
    environment:
      - LOG_LEVEL=info
      - HEALTH_SERVER_ADDRESS=127.0.0.1:9999
      - TZ=Europe/Zurich
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
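deunhealth only acts on containers that opt in through a label. The sketch below shows what that could look like on a monitored service. The label name is an assumption based on the project's README and should be checked against the current documentation; mycontainer and my-image are hypothetical.

```yaml
services:
  mycontainer:
    image: my-image                   # hypothetical image; assumed to include curl
    labels:
      # assumed label name, verify it in the deunhealth README
      - deunhealth.restart.on.unhealthy=true
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 1m30s
      timeout: 10s
      retries: 3
```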

🎉 Now it's your turn to healthcheck all your containers!

Tags: Docker