Backup and Restore Containers With Kubernetes Checkpointing API | by Martin Heinz | Nov, 2022

Kubernetes v1.25 introduces Container Checkpointing API – here’s how you can use it for container backup and restore or forensic analysis


Kubernetes v1.25 introduced the Container Checkpointing API as an alpha feature. It provides a way to back up and restore containers running in Pods, without stopping them.

This feature is primarily aimed at forensic analysis, but general backup and restore is something any Kubernetes user can take advantage of.

So, let’s take a look at this brand-new feature and see how we can enable it in our cluster and take advantage of it for backup and restore or forensic analysis.

Before we start checkpointing any containers, we need a playground where we can experiment with the kubelet and its workloads. We will need a v1.25+ Kubernetes cluster and a container runtime that supports container checkpointing.

We will create such a cluster using kubeadm inside VM(s) built with Vagrant. I made a repository with everything needed to spin up such a cluster with just vagrant up, so if you want to follow along, check it out.

If you want to create your own cluster, make sure it meets the following:

The cluster must have the ContainerCheckpoint feature gate enabled. With kubeadm, this is done in the cluster configuration, which passes the --feature-gates flag to each of the cluster components. For a complete list of available feature gates, see the docs.
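As a sketch of that configuration, the snippet below writes a kubeadm config file that enables the gate for the API server, controller manager, scheduler and kubelet. The file name kubeadm-config.yaml and the pinned v1.25.0 version are assumptions for illustration; adjust them to your setup.

```shell
# Write a kubeadm configuration that enables the ContainerCheckpoint gate.
# ASSUMPTION: file name and kubernetesVersion are illustrative only.
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.25.0
apiServer:
  extraArgs:
    feature-gates: "ContainerCheckpoint=true"
controllerManager:
  extraArgs:
    feature-gates: "ContainerCheckpoint=true"
scheduler:
  extraArgs:
    feature-gates: "ContainerCheckpoint=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ContainerCheckpoint: true
EOF
```

The second YAML document covers the kubelet, which takes its feature gates from KubeletConfiguration rather than from a command-line argument.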

We also need to use a container runtime that supports checkpointing. At the time of writing, only CRI-O supports it, with containerd support probably coming soon-ish.

To configure your cluster with CRI-O, install it using the instructions in the docs, or use the script in the above-mentioned repository (you should run this in a VM, not on your local machine).

In addition, we need to enable CRIU support in CRI-O. CRIU is the tool that does the actual checkpointing in the background.

To enable it, we need to set the --enable-criu-support=true flag. The script mentioned above does that for you.

Also, if you plan to restore the checkpoint back into a Pod, you will need --drop-infra-ctr set to false as well. Otherwise, the restored container will fail with a CreateContainerError.
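One possible way to make both settings persistent is a CRI-O configuration drop-in rather than command-line flags. The sketch below assumes that your CRI-O version exposes TOML equivalents of the two flags (enable_criu_support and drop_infra_ctr under [crio.runtime]) and reads drop-ins from /etc/crio/crio.conf.d; the file name 10-checkpoint.conf is hypothetical. Check crio.conf(5) for your version before relying on it.

```shell
# Generate a CRI-O drop-in mirroring the two flags discussed above.
# ASSUMPTION: these TOML keys are the config-file equivalents of
# --enable-criu-support and --drop-infra-ctr in your CRI-O version.
cat > 10-checkpoint.conf <<'EOF'
[crio.runtime]
enable_criu_support = true
drop_infra_ctr = false
EOF

# To apply it on the node (as root), something along these lines:
#   cp 10-checkpoint.conf /etc/crio/crio.conf.d/
#   systemctl restart crio
```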

With CRI-O installed, we also need to tell kubeadm to use its socket, which is set in the kubeadm configuration as well.
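A minimal sketch of the socket setting follows. It assumes CRI-O listens on its default socket path, and the file name kubeadm-init-config.yaml is hypothetical. The document can also live in the same file as any other kubeadm configuration documents, separated by a --- line.

```shell
# Point kubeadm at the CRI-O socket.
# ASSUMPTION: CRI-O uses its default socket location.
cat > kubeadm-init-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: "unix:///var/run/crio/crio.sock"
EOF
```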

With that in place, we can spin up the cluster.

This should give us a single-node cluster. When listing the nodes, note the container runtime version.
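Bringing the cluster up could look roughly like the function below. The function name init_cluster and the default config path are hypothetical; the admin kubeconfig path is the one kubeadm generates by default.

```shell
# Initialize the control plane and list the nodes (run as root on the VM).
# ASSUMPTION: $1 is the path to your kubeadm configuration file.
init_cluster() {
  config="${1:-kubeadm-config.yaml}"
  kubeadm init --config="$config"
  # Use the admin kubeconfig that kubeadm writes during init.
  export KUBECONFIG=/etc/kubernetes/admin.conf
  # -o wide adds the CONTAINER-RUNTIME column, which should show cri-o.
  kubectl get nodes -o wide
}

# Usage on the node: init_cluster kubeadm-config.yaml
```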

Note: In general, the best and simplest way to play with Kubernetes is to use KinD. KinD, however (as of the time of writing), does not support container checkpointing. Alternatively, you can also try the local-up-cluster.sh script in the Kubernetes repository.

With that out of the way, we can try creating a checkpoint. Usual operations on Kubernetes can be done with kubectl or by running curl commands against the cluster API server. However, this won’t work here, as the checkpointing API is only exposed by the kubelet on each cluster node.

So, we have to jump onto the node and talk to the kubelet directly.

To create a checkpoint, we also need a running Pod. Instead of using the system Pods in kube-system, let’s create a dummy Nginx webserver in the default namespace.

We also remove the taint from the node, which allows us to schedule workloads on it even though it is part of the control plane.
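The two steps above can be sketched as follows. The function name create_webserver is hypothetical, and the taint key assumes a kubeadm-built v1.25 control-plane node.

```shell
# Allow scheduling on the control-plane node and start a test Pod.
create_webserver() {
  # The trailing "-" removes the taint from all nodes that carry it.
  kubectl taint nodes --all node-role.kubernetes.io/control-plane-
  # Creates a Pod named "webserver" with a single container of the same name.
  kubectl run webserver --image=nginx -n default
  # Block until the Pod is ready so there is something to checkpoint.
  kubectl wait --for=condition=Ready pod/webserver -n default --timeout=120s
}

# Usage: create_webserver
```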

Next, let’s make a sample API request to the kubelet to see if we can get a valid response.

The kubelet runs on port 10250 by default, so we curl it and ask for all of its Pods. We also have to specify the CA certificate, client certificate and key for authentication.
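A sketch of such a request is shown below. The helper name kubelet_get is hypothetical, and the certificate paths assume the default kubeadm PKI layout; other installers may place them elsewhere.

```shell
# Query the kubelet API directly from the node.
# ASSUMPTION: certificates live in the default kubeadm PKI directory.
PKI=/etc/kubernetes/pki

kubelet_get() {
  # $1 is the kubelet API path, for example "pods".
  # -k skips server certificate verification, since the kubelet serving
  # certificate is commonly self-signed.
  curl -sk -X GET "https://localhost:10250/$1" \
    --key "$PKI/apiserver-kubelet-client.key" \
    --cacert "$PKI/ca.crt" \
    --cert "$PKI/apiserver-kubelet-client.crt"
}

# Usage on the node: kubelet_get pods
```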

Now it’s finally time to create a checkpoint.

The checkpointing API is available at .../checkpoint/$NAMESPACE/$POD/$CONTAINER. Here we use the webserver Pod created earlier. This request creates an archive in /var/lib/kubelet/checkpoints/checkpoint-<pod>_<namespace>-<container>-<timestamp>.tar.
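The request can be sketched as a POST to that endpoint. The helper name checkpoint_container is hypothetical, and the certificate paths again assume the default kubeadm PKI layout.

```shell
# Ask the kubelet to checkpoint one container of a Pod.
# ASSUMPTION: certificates live in the default kubeadm PKI directory.
checkpoint_container() {
  ns="$1"; pod="$2"; container="$3"
  pki=/etc/kubernetes/pki
  curl -sk -X POST "https://localhost:10250/checkpoint/$ns/$pod/$container" \
    --key "$pki/apiserver-kubelet-client.key" \
    --cacert "$pki/ca.crt" \
    --cert "$pki/apiserver-kubelet-client.crt"
}

# Usage on the node: checkpoint_container default webserver webserver
# Afterwards, look for the archive: ls -l /var/lib/kubelet/checkpoints/
```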

Depending on the setup you are using, the above curl may return an error instead of creating the archive.

This means that your container runtime does not (yet) support checkpointing, or that it is not enabled properly.

Now that we have a checkpointed container archive, let’s take a look at what’s inside.
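Since the checkpoint is a plain tar archive, standard tools are enough to look inside. The helper below is a small sketch; its name and the default extraction directory are hypothetical. With CRI-O, the archive typically contains the CRIU dump files alongside items such as a dump log, the container's config and spec dumps, and a diff of its root filesystem, though the exact contents may vary by runtime version.

```shell
# List a checkpoint archive, then unpack it for offline inspection.
inspect_checkpoint() {
  archive="$1"                          # path to the checkpoint .tar
  dest="${2:-./checkpoint-extracted}"   # hypothetical extraction directory
  mkdir -p "$dest"
  tar -tf "$archive"                    # print the archive's file listing
  tar -xf "$archive" -C "$dest"         # extract for closer reading
}

# Usage: inspect_checkpoint /var/lib/kubelet/checkpoints/<archive>.tar
```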

If you don’t need a running Pod/container for analysis, extracting and reading some of the files in the archive may give you the information you need.

Also, I’m no security expert, so I’m not going to give you questionable advice on how to analyze these files here.

As a starting point, you might want to look at tools like docker-explorer or this talk on container forensics in Kubernetes.

While the Checkpointing API is currently aimed more at forensic analysis, it can still be used to restore a Pod/container from the archive.

The easiest way is to create an image from the checkpoint archive.

Here we use an empty (scratch) image to which we add the archive. We need to use ADD because it automatically extracts the archive. Next, we build the image with docker or buildah.

During the build we also specify an annotation that describes the original human-readable name of the container, and then we push the image to a registry so Kubernetes can pull it.
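The build described above can be sketched with buildah as follows. The function name build_restore_image, the temporary build context and the image reference in the usage line are hypothetical; the annotation key is the one CRI-O uses to find the original container name.

```shell
# Build and push an image that wraps a checkpoint archive.
build_restore_image() {
  archive="$1"     # path to the checkpoint .tar
  container="$2"   # original container name, e.g. webserver
  image="$3"       # target image reference in a registry you can push to
  ctx="$(mktemp -d)"
  cp "$archive" "$ctx/checkpoint.tar"
  # ADD unpacks a local tar archive into the image automatically.
  printf 'FROM scratch\nADD checkpoint.tar /\n' > "$ctx/Dockerfile"
  buildah bud \
    --annotation="io.kubernetes.cri-o.annotations.checkpoint.name=$container" \
    -t "$image" "$ctx"
  buildah push "$image"
}

# Usage (hypothetical registry):
#   build_restore_image /var/lib/kubelet/checkpoints/<archive>.tar webserver \
#     registry.example.com/restore-webserver:latest
```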

Finally, we create a Pod that uses the image we pushed earlier.
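A minimal Pod manifest for this could look like the sketch below. The Pod name, label and image reference are hypothetical placeholders; the image must be the checkpoint image you pushed to your own registry.

```shell
# Write a Pod manifest that starts a container from the checkpoint image.
# ASSUMPTION: name, label and image reference are placeholders.
cat > restore-webserver.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: restore-webserver
  labels:
    app: restore-webserver
spec:
  containers:
  - name: webserver
    image: registry.example.com/restore-webserver:latest
EOF

# To create the Pod: kubectl apply -f restore-webserver.yaml
```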

To test whether it works, we can expose the Pod through a Service and curl its IP.

And it worked! We successfully backed up a running Pod without stopping it and recreated it in its original state.
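One way to run that check is sketched below. The function name verify_restore and the default Pod name are hypothetical; note that kubectl expose needs the Pod to carry at least one label, from which it derives the Service selector.

```shell
# Expose the restored Pod and fetch a page from it.
verify_restore() {
  pod="${1:-restore-webserver}"   # hypothetical Pod name
  # Creates a Service named after the Pod, selecting it by its labels.
  kubectl expose pod "$pod" --port=80 --type=NodePort
  # Read the Service's cluster IP and request the default page.
  ip="$(kubectl get service "$pod" -o jsonpath='{.spec.clusterIP}')"
  curl -s "http://$ip"
}

# Usage on the node: verify_restore restore-webserver
```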

Thanks to CRIU, general checkpoint and restore has been possible for containers for some time now, but this is still a big step forward for Kubernetes, and hopefully we’ll see this feature/API graduate to beta and eventually GA.

The previous sections demonstrated the use of the Checkpointing API. It’s very useful, but it also lacks some basic features, such as native restore functionality or support from all major container runtimes. So, be aware of its limitations if you enable it in a production (or even development) environment/cluster.

With that said, this feature is not only a great addition for forensic analysis. In the future, when better/native restore processes are available, it may become a reasonable backup and restore process for container workloads, which can be very useful for some types of long-running Kubernetes workloads.


This article was originally posted at martinheinz.dev
