Mastering Docker Volumes
For those just starting with Docker, I've written some posts on how to think like containers, which I believe is an important mindset for using Docker, as well as on using Docker in development the right way.
However, if you want to go one step further in mastering Docker, it's important to understand how volumes and networking work in Docker.
In this post, I'll guide you through the Volume basics, leaving Networking for another post.
Containers are ephemeral
It's important to keep in mind that containers are ephemeral. Whenever the container finishes executing the command, it's completely "shut down".
For example, docker run node ls starts a new container using the image node and executes the command ls (list files and directories) inside the container.
The command prints its output and finishes. Once the command is finished, the container is shut down, and nothing it wrote inside the container is available to the next container you start.
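To see this in practice, here is a minimal sketch, assuming Docker is installed and the node image can be pulled. The path /tmp/scratch.txt is a made-up name for illustration, and --rm (not used elsewhere in this post) simply deletes each container once it stops.

```shell
# Hypothetical file path inside the container, for illustration only.
scratch_file="/tmp/scratch.txt"

if command -v docker >/dev/null 2>&1; then
  # First container: create a file, then exit.
  docker run --rm node touch "$scratch_file"
  # Second container: started fresh from the image, so the file is not there.
  docker run --rm node ls "$scratch_file" \
    || echo "$scratch_file was not found in the new container"
else
  echo "docker CLI not found; skipping"
fi
```

Each docker run starts from the image again, which is why the second container cannot see what the first one wrote.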
Executing a JavaScript program
Let's suppose we have a file called helloWorld.js containing a very basic JavaScript program.
How can we run this program using containers? One could try to run it directly, for example with docker run node node helloWorld.js, but node fails because it cannot find the file inside the container.
That's because containers are isolated and do not share the Host's filesystem.
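Here is a small sketch of that failed attempt. The body of helloWorld.js is not shown in this post, so the one-line console.log below is my assumption, and the throwaway directory is only there to keep the experiment tidy.

```shell
# Work in a throwaway directory so nothing else is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Assumed contents of helloWorld.js: a one-line program.
cat > helloWorld.js <<'EOF'
console.log('Hello World');
EOF

if command -v docker >/dev/null 2>&1; then
  # The file exists on the Host only, so node inside the container cannot load it.
  docker run --rm node node helloWorld.js \
    || echo "the run failed: helloWorld.js is not visible inside the container"
fi
```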
Sync data between Host and Container
We have to sync the file helloWorld.js with the container. Docker provides a way to mount volumes, which means:
I want to mount a directory or file from the Host to the Container, so every change made in the Host will be mirrored to the Container, and vice-versa.
You change the file/directory in the Host, the container will see the changes. You change the file/directory in the container, the Host will see the changes.
Let's sync helloWorld.js with the container by running docker run -v $(pwd):/app node node /app/helloWorld.js, where:
$(pwd) expands to the full pathname of the current directory
-v {hostPath}:{containerPath} mounts the file/directory to the container
node is the image
node /app/helloWorld.js executes the command using the path defined inside the container, /app/helloWorld.js
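Putting those pieces together, a minimal sketch could look like this. It assumes Docker is installed, and again the contents of helloWorld.js are my guess at a one-line program.

```shell
# Work in a throwaway directory holding only our program.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
echo "console.log('Hello World');" > helloWorld.js   # assumed file contents

# -v takes {hostPath}:{containerPath}; $(pwd) is the absolute path of this directory.
volume_flag="$(pwd):/app"
echo "mounting $volume_flag"

if command -v docker >/dev/null 2>&1; then
  # node now finds the file through the mount, at /app/helloWorld.js.
  docker run --rm -v "$volume_flag" node node /app/helloWorld.js
fi
```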
A real project
Suppose we have a project in a directory called docker-101, containing an index.js file and a src folder.
From inside the docker-101 directory, we want to run a container using the index.js file, but all the files in the src folder must be available in the container as well. We can do that with docker run -v $(pwd):/app node node /app/index.js.
All the files within the current directory, $(pwd), will be mounted to /app within the container.
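As a sketch of such a project, the script below builds a tiny docker-101 directory and runs it through a mount. The file src/greet.js and the body of index.js are hypothetical, invented only so there is something in src for index.js to load.

```shell
# Build a small docker-101 project in a throwaway location.
project_dir="$(mktemp -d)/docker-101"
mkdir -p "$project_dir/src"
cd "$project_dir" || exit 1

# Hypothetical helper living in src/.
cat > src/greet.js <<'EOF'
module.exports = function greet(name) { return 'Hello ' + name; };
EOF

# Hypothetical index.js that depends on the helper.
cat > index.js <<'EOF'
var greet = require('./src/greet');
console.log(greet('docker-101'));
EOF

if command -v docker >/dev/null 2>&1; then
  # The whole current directory, src included, shows up under /app,
  # so the relative require resolves and this prints: Hello docker-101
  docker run --rm -v "$(pwd)":/app node node /app/index.js
fi
```

Because the entire directory is mounted, index.js can load files from src inside the container exactly as it does on the Host.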
What if we want to enter the container and manipulate the files from there? We can run docker run -it -v $(pwd):/app node bash, where:
-it keeps STDIN open and allocates a terminal, so the container's shell stays open and interactive
bash opens a new bash/shell inside the container
Now, after exiting the container, if we perform ls from the host, we see whatever files we created or changed under /app from inside the container.
This type of volume is called a Path Volume. Docker's own documentation calls it a bind mount.
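The same round trip can be scripted without an interactive shell. This is only a sketch, assuming Docker is installed; the file name created-in-container.txt is made up for the example.

```shell
# Start from an empty throwaway directory on the Host.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
new_file="created-in-container.txt"   # hypothetical file name

if command -v docker >/dev/null 2>&1; then
  # Create the file from inside the container, under the mounted /app.
  docker run --rm -v "$(pwd)":/app node touch "/app/$new_file"
  # Back on the Host, the file is listed in the current directory.
  ls -l "$new_file"
fi
```

On Linux the new file is typically owned by root, since that is the user the node image runs as by default.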
Using mount option
Another way to mount volumes is by using the --mount option, which specifies the type of the mount, a source and a target.
The --mount syntax is more explicit, but in most cases using -v should be good enough.
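For comparison, here is a sketch of the --mount form of the same mount, assuming Docker is installed. The variable name mount_spec is mine; type=bind, source and target are the keys Docker expects for this kind of mount.

```shell
# Spell out the mount as key=value pairs: its type, the Host source, the container target.
mount_spec="type=bind,source=$(pwd),target=/app"
echo "--mount $mount_spec"

if command -v docker >/dev/null 2>&1; then
  # Equivalent to -v "$(pwd)":/app; list what the container sees under /app.
  docker run --rm --mount "$mount_spec" node ls /app
fi
```

One practical difference: with --mount the source path has to exist already, otherwise Docker reports an error, whereas -v creates a missing Host directory for you.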
Another real example
Imagine we want to print the timestamp for the current time. We can do this in a lot of ways, but for now we're going to use the underscore library.
index.js
var _ = require('underscore');
console.log(_.now());
We execute docker run -it -v $(pwd):/app node node /app/index.js and node fails with an error saying it cannot find the module underscore.
Obviously, we have to add the underscore dependency to our project. Let's do it using npm, by running docker run -it -v $(pwd):/app -w /app node npm add underscore.
Note the -w /app option, which sets the working directory: we are telling Docker to execute the command from within that directory. In our case, the command npm add underscore will be executed inside the /app directory in the container.
Now we can run our index.js with the same docker run command as before, and it prints the current timestamp.
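The whole sequence can be sketched as a short script, assuming Docker is installed and npm can reach its registry from inside the container. The index.js body is my reconstruction of a program that prints a timestamp using underscore's _.now().

```shell
# Throwaway project directory.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Assumed index.js: print the current timestamp using underscore.
cat > index.js <<'EOF'
var _ = require('underscore');
console.log(_.now());
EOF

workdir="/app"

if command -v docker >/dev/null 2>&1; then
  # -w makes npm run inside /app, so node_modules lands in the mounted project.
  docker run --rm -v "$(pwd)":"$workdir" -w "$workdir" node npm add underscore
  # With the dependency installed on the mount, index.js can now load it.
  docker run --rm -v "$(pwd)":"$workdir" node node "$workdir/index.js"
fi
```

Since node_modules is written through the mount, it stays on the Host and is there again for the second container.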
Want some NPM cache?
Sometimes we need to speed up the npm process using a cache. Inside the container, by default, npm stores its cache at /root/.npm. You can check that by running docker run node npm config get cache, which prints /root/.npm.
At this moment, we are only syncing the -v $(pwd):/app volume. When the container writes the npm cache to /root/.npm, we are not syncing back to the Host.
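We can mount multiple volumes to the same container by repeating the -v option, so let's do that with docker run -it -v $(pwd):/app -v $(pwd)/npm_cache:/root/.npm -w /app node npm add underscore.
When we perform ls from the Host, we now see an npm_cache folder next to our project files.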
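Here is a sketch of the two mounts side by side, assuming Docker is installed and npm can reach its registry. The variable name cache_mount is mine.

```shell
# Throwaway project directory.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Second mount: a Host folder inside the project for npm's cache directory.
cache_mount="$(pwd)/npm_cache:/root/.npm"

if command -v docker >/dev/null 2>&1; then
  # Ask npm inside the container where it keeps its cache.
  docker run --rm node npm config get cache
  # Repeat -v once per mount: the project itself, then the cache folder.
  docker run --rm -v "$(pwd)":/app -v "$cache_mount" -w /app node npm add underscore
  # The npm_cache folder now sits next to the project files on the Host.
  ls
fi
```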
However, we do not want to keep track of the npm_cache folder in our project. It's not our business.
What if we could store the cache somewhere else on the Host, without having to care where exactly it is stored?
Docker provides a special type of Volume, where the path in the Host is chosen by Docker, hence the only thing we need to know is its name. Docker knows where it is located, but we do not care.
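Yes, they are called Named Volumes. We can create one with docker volume create my-volume and look at it with docker volume inspect my-volume, which prints a JSON description of the volume, including its Name and its Mountpoint.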
The Mountpoint is the exact path in the Host. So, every container using this volume will sync their data directly to this mountpoint, leaving us to manage and know the volume by using its name elsewhere, "my-volume".
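A minimal sketch of creating and inspecting a named volume, assuming Docker is installed:

```shell
volume_name="my-volume"

if command -v docker >/dev/null 2>&1; then
  # Create the volume; Docker decides where it lives on the Host.
  docker volume create "$volume_name"
  # Full JSON description of the volume.
  docker volume inspect "$volume_name"
  # Only the Host path Docker picked for it.
  docker volume inspect --format '{{ .Mountpoint }}' "$volume_name"
fi
```

The --format option takes a Go template, which is handy when a script needs just one field from the inspect output.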
In our underscore example, we can start the container using a named volume too, by running docker run -it -v $(pwd):/app -v npm_cache:/root/.npm -w /app node npm add underscore.
This way, Docker will create a volume called npm_cache and mount it to the container at /root/.npm.
Don't believe me? Go check for yourself with docker volume inspect npm_cache, which shows the volume's Name and its Mountpoint on the Host.
Yay! Much cleaner, and we keep the speedup of our npm process.
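As a sketch of the full flow with a named volume, assuming Docker is installed and npm can reach its registry (the variable names are mine):

```shell
# Throwaway project directory.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

volume_name="npm_cache"
# A name on the left of the colon, instead of a Host path, selects a named volume.
cache_mount="${volume_name}:/root/.npm"

if command -v docker >/dev/null 2>&1; then
  # Docker creates the npm_cache volume on first use and mounts it at /root/.npm.
  docker run --rm -v "$(pwd)":/app -v "$cache_mount" -w /app node npm add underscore
  # The volume shows up by name; its Host path is Docker's concern.
  docker volume ls
  docker volume inspect "$volume_name"
  # No npm_cache folder was created inside the project this time.
  ls
fi
```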
Wrapping up
In this post we learned how to use Path Volumes and Named Volumes in Docker.
Path volumes are very helpful when we want to keep track of the path in the host, e.g. when mounting the entire project we are working on.
Named volumes are handy when we do not want to keep track of the path in the host, and instead let Docker choose the mountpoint for us, e.g. when caching for NPM, Yarn, Ruby bundler, Python packages and so on.