Docker: Container Lifetime & Persistent Data
Updated: Jan 25, 2022
Until now, the lectures in this series have focused on running and managing containers, not on the data generated while a container is up and running. That makes sense, because containers are usually immutable and ephemeral (in simple language, unchanging and disposable). The idea is that we can throw away a container and create a new one from an image, a task that takes only a few seconds. We keep containers running with their current configuration, and if there is an update, we throw away the container and start a new one with the updated configuration instead.
This brings real benefits in reliability and consistency, which is priceless in the DevOps world. However, there is a trade-off here: what about databases or other containers that store unique data? Ideally, a container shouldn't mix unique data with its binaries; this is known as "separation of concerns." Docker lets us create a new container with an updated version of our application while the data used by the old container is preserved and available for the new container to use.
So now that we understand the problem of persistent data in the Docker world, what is the solution? Docker provides two options: "Volumes" and "Bind Mounts." Docker volumes are a configuration option for a container that creates a special location outside that container's file system to store unique data. This preserves the data across container removal and allows us to attach it to whatever container we want (the container sees it as just a local file path).
Then there are "Bind Mounts," which simply share or mount a host directory or file into a container.
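To make the difference concrete, here is a minimal sketch of both options using the more explicit --mount syntax of "docker container run". The function names and their arguments are hypothetical helpers made up for illustration; only the docker commands inside them are real.

```shell
# Sketch only: the function names and argument order are hypothetical.

# Start a container whose data lives in a Docker-managed volume.
# $1 = container name, $2 = volume name, $3 = path inside the container, $4 = image
run_with_volume() {
  docker container run -d --name "$1" \
    --mount "type=volume,source=$2,target=$3" "$4"
}

# Start a container that mounts a host directory or file directly (a bind mount).
# $1 = container name, $2 = absolute host path (must already exist),
# $3 = path inside the container, $4 = image
run_with_bind_mount() {
  docker container run -d --name "$1" \
    --mount "type=bind,source=$2,target=$3" "$4"
}
```

In both cases the container just sees a path. The difference is who owns the storage: Docker manages the volume's location on the host, while a bind mount points at a host path that you choose.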
Persistent Data: Data Volumes
I know that volumes seem pretty simple up front, but there's a lot to them. The first way to tell a container that it needs a specific volume is in a Dockerfile. Let's see an example with a MySQL container.
First, let's go to Docker Hub and search for the MySQL image at this address: https://hub.docker.com/search?q=mysql&type=image
Now, let's click on it to access the latest Dockerfile of this image:
Now, because MySQL is a database, we can expect that the Dockerfile will have a VOLUME instruction. And guess what? It has one, pointing at /var/lib/mysql:
This is the default location of the MySQL databases. The image is built in a way that tells Docker, whenever we start a new container from it, to create a new volume and assign it to this directory in the container. Any files that we put in there will outlive the container until we manually delete the volume.
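Note: Volumes are not cleaned up when you simply remove a container; they must be removed manually (or explicitly, for example with the -v flag of "docker container rm", which also removes the container's anonymous volumes). That's great for persistent data, as it provides an additional layer of protection against deleting essential data.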
Let's pull the MySQL image using the "docker pull mysql" command:
and inspect it using the "docker image inspect mysql" command. If you search the output, you will see that the Dockerfile itself does not appear in the image configuration. This is because the Dockerfile isn't part of the image metadata. However, you'll notice in the "Config" area that the volume is specified there:
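If you don't want to scroll through the full inspect output, the --format option can narrow it down to the volume declaration. This is a small sketch; the function name show_image_volumes is a hypothetical helper, not part of Docker.

```shell
# Sketch only: show_image_volumes is a hypothetical helper name.

# Print only the volumes declared in an image's configuration, as JSON.
# $1 = image name (for example: mysql)
show_image_volumes() {
  docker image inspect --format '{{json .Config.Volumes}}' "$1"
}
```

Calling "show_image_volumes mysql" should print a small JSON object whose keys are the container paths that the image declares as volumes.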
Now, let's continue and run a container from that image:
λ docker container run -d --name mysql_2 -e MYSQL_ALLOW_EMPTY_PASSWORD=true mysql
Now, let's inspect this container, and as you can see from the returned output, the volume configuration is set to a specific path:
and we can also see it under "Mounts":
You can see here that the running container gets its own unique location on the host to store that data, and in the background that location is mapped, or mounted, to the path in the container. The container thinks it's writing to /var/lib/mysql. However, if you take a closer look, you will see that the data actually lives in another location on the host: "Source": "/var/lib/docker/volumes/2d3c7a825e010f65e584aeab4ef7057def45cbd7811c08cbba8055e22c2457f0/_data",
We can check this by listing the volumes using the "docker volume ls" command:
and take it even further by examining the volume itself:
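The inspection steps above can be sketched as three small helpers. The function names are hypothetical and exist only for this illustration; the docker subcommands they wrap are the standard ones.

```shell
# Sketch only: the function names are hypothetical helpers.

# Print the mounts of a container as JSON (name, source on the host, destination).
# $1 = container name (for example: mysql_2)
show_container_mounts() {
  docker container inspect --format '{{json .Mounts}}' "$1"
}

# List every volume known to the Docker engine.
list_volumes() {
  docker volume ls
}

# Show the details of one volume, including its mount point on the host.
# $1 = volume name or ID
show_volume() {
  docker volume inspect "$1"
}
```

For the container started earlier, "show_container_mounts mysql_2" gives the volume name, which can then be passed to show_volume.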
By looking at this mount point, you probably can tell that it's not very user-friendly in terms of telling us what's in it or what this volume is assigned to. We can see the volume used by the container but still can't answer the question of what the volume is connected to, right? We can easily demonstrate it by creating other containers from that image:
Well, you can easily see the problem: I created three new containers from that image, and there's no easy way to tell one volume from the other. Now, to prove that the data is kept even if the containers are removed, I will remove all the containers and check the status of their volumes:
So how can we make working with Docker volumes more friendly? Well, that's where named volumes come to help, because we can specify a friendly name when running a new container using the -v option:
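A minimal sketch of that check follows. The function names are hypothetical, and the container names you pass in are whatever you called your own containers.

```shell
# Sketch only: the function names are hypothetical helpers.

# Force-remove the given containers, then list the volumes that remain.
# $@ = one or more container names
remove_containers_and_list_volumes() {
  docker container rm -f "$@"
  docker volume ls
}

# Delete a volume once its data is really no longer needed.
# Docker refuses to remove a volume that a container is still using.
# $1 = volume name or ID
remove_volume() {
  docker volume rm "$1"
}
```

After the containers are gone, their volumes should still appear in the list, which is exactly the persistence we want here.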
λ docker container run -d --name SQLSERVER -e MYSQL_ROOT_PASSWORD=password -v mysql-database:/var/lib/mysql mysql
Now, if we re-run the "docker volume ls" command, we will see that the volume is created with the name we just specified instead of an ID, as we saw until now:
And if we inspect that volume, this becomes even more readable:
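Named volumes also make the original goal easy to sketch: throw away a container and start a new one that picks up the same data. The function below is a hypothetical helper, and the environment variable simply mirrors the run command used above; whether a different MySQL image can read the existing data files depends on the versions involved.

```shell
# Sketch only: recreate_mysql_with_volume is a hypothetical helper.

# Replace a MySQL container with a fresh one that reuses the same named volume.
# $1 = container name, $2 = named volume, $3 = image (optionally with a tag)
recreate_mysql_with_volume() {
  # Remove the old container; the named volume is left untouched.
  docker container rm -f "$1"
  # Start a new container and attach the existing volume to the data directory.
  docker container run -d --name "$1" \
    -e MYSQL_ROOT_PASSWORD=password \
    -v "$2:/var/lib/mysql" "$3"
}
```

For example, "recreate_mysql_with_volume SQLSERVER mysql-database mysql" would replace the container from the earlier command while keeping the mysql-database volume attached.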