Container health check¶

Container health check checks the health status of containers according to user requirements. After configuration, if the application in the container is abnormal, the container will automatically restart and recover. Kubernetes provides Liveness checks, Readiness checks, and Startup checks.

LivenessProbe can detect application deadlock (the application is running, but cannot continue to run the following steps). Restarting containers in this state can help improve the availability of applications, even if there are bugs in them.
ReadinessProbe can detect when a container is ready to accept request traffic. A Pod can only be considered ready when all containers in a Pod are ready. One use of this signal is to control which Pod is used as the backend of the Service. If the Pod is not ready, it will be removed from the Service's load balancer.
Startup check (StartupProbe) can know when the application container is started. After configuration, it can control the container to check the viability and readiness after it starts successfully, so as to ensure that these liveness and readiness probes will not affect the start of the application. Startup detection can be used to perform liveness checks on slow-starting containers, preventing them from being killed before they start running.

Liveness and readiness checks¶

The configuration of LivenessProbe is similar to that of ReadinessProbe, the only difference is to use readinessProbe field instead of livenessProbe field.

HTTP GET parameter description:

Parameter	Description
Path (Path)	The requested path for access. Such as: /healthz path in the example
Port (Port)	Service listening port. Such as: port 8080 in the example
protocol	access protocol, Http or Https
Delay time (initialDelaySeconds)	Delay check time, in seconds, this setting is related to the normal startup time of business programs. For example, if it is set to 30, it means that the health check will start 30 seconds after the container is started, which is the time reserved for business program startup.
Timeout (timeoutSeconds)	Timeout, in seconds. For example, if it is set to 10, it indicates that the timeout waiting period for executing the health check is 10 seconds. If this time is exceeded, the health check will be regarded as a failure. If set to 0 or not set, the default timeout waiting time is 1 second.
Timeout (timeoutSeconds)	Timeout, in seconds. For example, if it is set to 10, it indicates that the timeout waiting period for executing the health check is 10 seconds. If this time is exceeded, the health check will be regarded as a failure. If set to 0 or not set, the default timeout waiting time is 1 second.
SuccessThreshold (successThreshold)	The minimum number of consecutive successes that are considered successful after a probe fails. The default value is 1, and the minimum value is 1. This value must be 1 for liveness and startup probes.
Maximum number of failures (failureThreshold)	The number of retries when the probe fails. Giving up in case of a liveness probe means restarting the container. Pods that are abandoned due to readiness probes are marked as not ready. The default value is 3. The minimum value is 1.

Check with HTTP GET request¶

YAML example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness  # Container name
    image: k8s.gcr.io/liveness  # Container image
    args:
    - /server  # Arguments to pass to the container
    livenessProbe:
      httpGet:
        path: /healthz  # Access request path
        port: 8080  # Service listening port
        httpHeaders:
        - name: Custom-Header  # Custom header name
          value: Awesome  # Custom header value
      initialDelaySeconds: 3  # Wait 3 seconds before the first probe
      periodSeconds: 3  # Perform liveness detection every 3 seconds

According to the set rules, Kubelet sends an HTTP GET request to the service running in the container (the service is listening on port 8080) to perform the detection. The kubelet considers the container alive if the handler under the /healthz path on the server returns a success code. If the handler returns a failure code, the kubelet kills the container and restarts it. Any return code greater than or equal to 200 and less than 400 indicates success, and any other return code indicates failure. The /healthz handler returns a 200 status code for the first 10 seconds of the container's lifetime. The handler then returns a status code of 500.

Use TCP port check¶

TCP port parameter description:

Parameter	Description
Port (Port)	Service listening port. Such as: port 8080 in the example
Delay time (initialDelaySeconds)	Delay check time, in seconds, this setting is related to the normal startup time of business programs. For example, if it is set to 30, it means that the health check will start 30 seconds after the container is started, which is the time reserved for business program startup.
Timeout (timeoutSeconds)	Timeout, in seconds. For example, if it is set to 10, it indicates that the timeout waiting period for executing the health check is 10 seconds. If this time is exceeded, the health check will be regarded as a failure. If set to 0 or not set, the default timeout waiting time is 1 second.

For a container that provides TCP communication services, based on this configuration, the cluster establishes a TCP connection to the container according to the set rules. If the connection is successful, it proves that the detection is successful, otherwise the detection fails. If you choose the TCP port detection method, you must specify the port that the container listens to.

YAML example:

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

This example uses both readiness and liveness probes. The kubelet sends the first readiness probe 5 seconds after the container is started. Attempt to connect to port 8080 of the goproxy container. If the probe is successful, the Pod will be marked as ready and the kubelet will continue to run the check every 10 seconds.

In addition to the readiness probe, this configuration includes a liveness probe. The kubelet will perform the first liveness probe 15 seconds after the container is started. The readiness probe will attempt to connect to the goproxy container on port 8080. If the liveness probe fails, the container will be restarted.

Run command check¶

YAML example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness  # Container name
    image: k8s.gcr.io/busybox  # Container image
    args:
    - /bin/sh  # Command to run
    - -c  # Pass the following string as a command
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600  # Command to execute
    livenessProbe:
      exec:
        command:
        - cat  # Command to check liveness
        - /tmp/healthy  # File to check
      initialDelaySeconds: 5  # Wait 5 seconds before the first probe
      periodSeconds: 5  # Perform liveness detection every 5 seconds

The periodSeconds field specifies that the kubelet performs a liveness probe every 5 seconds, and the initialDelaySeconds field specifies that the kubelet waits for 5 seconds before performing the first probe. According to the set rules, the cluster periodically executes the command cat /tmp/healthy in the container through the kubelet to detect. If the command executes successfully and the return value is 0, the kubelet considers the container to be healthy and alive. If this command returns a non-zero value, the kubelet will kill the container and restart it.

Protect slow-starting containers with pre-start checks¶

Some applications require a long initialization time at startup. You need to use the same command to set startup detection. For HTTP or TCP detection, you can set the failureThreshold * periodSeconds parameter to a long enough time to cope with the long startup time scene.

YAML example:

ports:
- name: liveness-port
  containerPort: 8080
  hostPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 1
  periodSeconds: 10

startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10

With the above settings, the application will have up to 5 minutes (30 * 10 = 300s) to complete the startup process. Once the startup detection is successful, the survival detection task will take over the detection of the container and respond quickly to the container deadlock. If the start probe has been unsuccessful, the container is killed after 300 seconds and further disposition is performed according to the restartPolicy .