
# Docker for development, why, and how?

Docker is one of the most used tools in the DevOps world, and it is considered the de facto containerization application. However, newcomers may find it a little vague what exactly Docker is used for. Some junior DevOps engineers might not see a big difference between a Docker container and a virtual machine (perhaps a Vagrant one). And, more importantly, why not just use a configuration management application (another DevOps tool) to do the job? So, to get us all on the same page, read along.

# The problem

Any invention or technology must address a certain problem or pain point. Even in non-tech circles, you'll always learn that your product must solve a problem or a pain point. So, what does Docker solve here?

Let's say you're a developer working on a NodeJS web application. While coding the app, you need to install some dependencies. NodeJS uses a tool called `npm` to make dependency management easier. You keep a file called `package.json` in the root directory of the app, where you list all the libraries and modules that your software needs, and then run `npm install`. Now all is good. But what about NodeJS itself? It must be installed first, right? Here is where the problem starts to arise. Which version of NodeJS was this app built against? NodeJS is very generous when it comes to the platforms where it can run, but sometimes a specific version needs a specific OS and, more importantly, kernel version. What if you are using the cutting-edge edition? This may mean that you need to compile it first before you can use it! Compiling NodeJS (or any software) is not as easy as just installing it. You will need compilers like `gcc`, libraries like `glibc`, `libstdc++`, and many others.
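
For instance, declaring and installing dependencies with `npm` looks roughly like this (the package name below is just a placeholder, not taken from a real project):

```sh
npm init -y                  # creates a bare package.json in the app's root directory
npm install express --save   # records express (a placeholder dependency) in package.json
npm install                  # on any other machine: installs everything listed in package.json
```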

![Docker tries to solve the dependencies pain point](https://www.linuxschoolonline.com/wp-content/uploads/2018/08/docker-problem.png)

As you can see, the ops team needs to ask the dev team a lot of questions, and each question may lead to a totally different path.

Of course, a seasoned ops engineer or system administrator does not have a problem asking the right questions and deploying the needed infrastructure. But how could they possibly do this?

## Shell scripting

A well-written shell script can do the job (a rough sketch follows the list below), but it has its own shortcomings:

* Now the application needs new components. We need to ensure that they are installed in the environment.
* The devs are using a new tool called `gulp` to perform some tasks. We need this tool to be added to the deployment script.
* Running the script several times (to deploy the missing dependencies or to reset the environment) throws errors. We need to modify it so that it only deploys what's missing. This leads to a thicker script file and, thus, a more error-prone, hard-to-debug one.
* The devs want a new environment to test a cutting-edge feature. They can't just mess with the already running one, so they want the ops team to provision a new environment, possibly with some differences from the original one. This means a new version of the script needs to be written, taking into account the modifications demanded by the dev team.
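
As a rough, hypothetical sketch (package names, versions, and paths below are made up for illustration), such a deployment script might start like this, and every bullet above makes it grow:

```sh
#!/usr/bin/env bash
set -e

# Every new component the devs need means another package here.
apt-get update
apt-get install -y build-essential curl git

# Guarding each step so the script can be re-run without errors quickly bloats it.
if ! command -v node >/dev/null 2>&1; then
  curl -fsSL https://deb.nodesource.com/setup_16.x | bash -   # assumed NodeJS install route
  apt-get install -y nodejs
fi

command -v gulp >/dev/null 2>&1 || npm install -g gulp   # the new tool the devs asked for
cd /opt/myapp && npm install                              # hypothetical app path
```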

## A virtual machine or Vagrant

A virtual machine might solve the above problems, especially if you combine it with a configuration management tool like Ansible. You'd even save the installation and setup time when you use Vagrant.

But virtual machines and Vagrant suffer their own drawbacks.

![Vagrant problems](https://www.linuxschoolonline.com/wp-content/uploads/2018/08/vagrant-problems.png)

As you can see from the diagram, you can use a virtual machine and, using Ansible for example, automate the creation of identical environments. You can also make changes to the Ansible *playbook* to apply any required environment modifications without risking the introduction of bugs or wasting time. You can also use Vagrant to encapsulate the whole environment in a *box* and use that to spawn already-configured Vagrant machines as needed.
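
As a hedged sketch of that workflow with the Vagrant CLI (the box and file names are placeholders):

```sh
vagrant init ubuntu/bionic64                 # write a Vagrantfile for a stock Ubuntu box
vagrant up                                   # boot the VM; Ansible can provision it from here
vagrant package --output nodejs-env.box      # capture the configured machine as a reusable box
vagrant box add nodejs-env nodejs-env.box    # register it so identical VMs can be spawned later
```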

But did you notice that each machine is consuming a dedicated share of the host's CPU, memory, and disk space? Those resources are not used solely for running the NodeJS application; they are also used to power the OS itself. This means a lot of wasted computing power, especially when more environments are needed.

## What about a virtual machine that does NOT need a dedicated OS?

That is a rough definition of a Linux container. A container encapsulates whatever application you want to deploy with all the dependencies that it might need, even the OS-related ones.

![Docker containers](https://www.linuxschoolonline.com/wp-content/uploads/2018/08/docker-solution.png)

The trick here is that we will not need a complete OS to run it. Only the libraries and binaries the application needs are packaged, and the container shares the host's kernel. This results in:

* Much smaller *image* size. Some Linux images, like [Alpine](https://hub.docker.com/_/alpine/), are only a few megabytes! (The commands after this list show a quick way to check.)
* Much faster load time. A Docker *container* can start in literally milliseconds.
* Much lower resource usage. Since the container does not need a complete OS, the CPU cycles and memory that a dedicated OS would consume are spared.
* A container is just a process on your system. You can create hundreds (even thousands) of them on a single host. The same is not true for virtual machines unless you own a very powerful host. This results in easier, cheaper, and more robust scalability options.
* Docker is currently supported on Linux, Windows (version 10 Pro), and modern macOS. This means that you can build your image on an Ubuntu 16 machine and run it on the newer Ubuntu 18, or even on Windows or macOS. This means even easier deployments and more cost savings.
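
If you want to verify some of these claims on your own machine, a couple of standard Docker commands are enough (exact sizes and timings will vary by host):

```sh
docker pull alpine                         # the Alpine base image weighs in at just a few megabytes
docker images alpine                       # shows the image size
time docker run --rm alpine echo "hello"   # the container starts, runs, and exits in well under a second
```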

# Containers, images, Docker...I'm confused!

Newcomers to Docker often do not realize the difference between an image and a container. They also think that Docker is another name for containerization. Let's make things clearer.

## Image

An image is like the template that Docker uses to spawn containers, just like a class is used to create objects (if you are a developer), or a binary file is used to launch processes (if you are a sysadmin).

A typical image contains:

* **Files:** like application binaries, dependencies, libraries, kernel modules and so on
* **Metadata:** instructions for how the container will behave: for example, which process it will run, which network ports it will expose, and which *volumes* it will use for persistent storage, among other settings. (The inspection commands after this list show where this metadata lives.)
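
To see both parts for yourself, you can inspect any image you have locally; a hedged example (the `node:18-alpine` tag is just a convenient choice):

```sh
docker pull node:18-alpine
docker history node:18-alpine        # the stacked filesystem layers that carry the image's files
docker image inspect node:18-alpine  # JSON metadata: default command, exposed ports, volumes, env, ...
```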

## Container

A container is the manifestation of the image. It's just a process like any other process on your system, yet it is much richer: a container encapsulates a complete application, with everything it needs to operate correctly. Let's see what our NodeJS container might look like on your system:

![Dockerfile is used to spawn containers](https://www.linuxschoolonline.com/wp-content/uploads/2018/08/Dockerfile-and-image.png)

The `Dockerfile` contains the instructions that are used when building the image. Once the image is built, it can be used to start multiple containers, all sharing the same characteristics and behavior.
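
As a hedged sketch (the base image tag, port, and file names such as `server.js` are assumptions for illustration, not taken from a real project), a `Dockerfile` for our NodeJS app could look like this:

```dockerfile
# Hypothetical Dockerfile for the NodeJS app described above.
FROM node:18-alpine           # files: NodeJS plus its libraries on a tiny Alpine base
WORKDIR /app                  # metadata: working directory for the container's process
COPY package.json ./
RUN npm install               # bake the app's dependencies into the image
COPY . .
EXPOSE 3000                   # metadata: the port the app is assumed to listen on
CMD ["node", "server.js"]     # metadata: the process every container will run
```

Building the image once and spawning several identical containers from it would then look like:

```sh
docker build -t mynodeapp .            # build the image from the Dockerfile
docker run -d -p 3000:3000 mynodeapp   # each run starts another container from the same image
docker run -d -p 3001:3000 mynodeapp
```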

## Docker is not containerization

The last ambiguous point here is: what exactly is Docker? Docker is an open-source project that was first released in March 2013. It uses containerization technology to provide operating-system-level virtualization. Originally, it was based on [LXC](https://en.wikipedia.org/wiki/LXC), but since 2014 it has used its own containerization library, libcontainer, which is written in the Go programming language.

The point that I want to drive home here is that Docker is an implementation of a technology that already existed before; it is not the containerization method itself.

# When not to use Docker?

Docker is very robust when it comes to quickly deploying complete infrastructures with the minimum cost and fastest load times. However, sometimes it may not be your best option if:

* Your application runs only on Windows or macOS. At the time of this writing, Docker can only be used to containerize applications that run on the Linux kernel. This means that you cannot convert a native Windows or macOS application and run it elsewhere. You can, however, run a Linux container on Linux, Windows, or macOS.
* Your application is tightly coupled with the OS. Sometimes, you may need to make specific, low-level changes to the OS so that your application can run as expected. Think of security tools that must have direct access to the CPU or memory of the OS to scan for threats.
* Your application works through a GUI only. Although the trend is moving fast towards client-server applications, where you access your interface using a thin client (aka a web browser), there are still some legacy applications that need a GUI to work. Think of a word processor or a spreadsheet app that does not have a SaaS (Software as a Service) cloud version and you get the point.

# Other vendors in the market

Despite its popularity, Docker is not the only software that uses Linux containerization. There are other players in the market, like [Apache Mesos](https://en.wikipedia.org/wiki/Apache_Mesos).

# Technologies built on top of Docker

In addition, there are other software applications built on top of Docker to create and orchestrate clusters of containers. Examples of those are [Kubernetes](https://kubernetes.io/) and [Red Hat OpenShift](https://www.openshift.com/products/container-platform/).

# Conclusion

In this article, I outlined the challenges that containerization and Docker try to solve. I demonstrated a simple web application deployment example and walked you through the possible paths that you might follow if you use virtualization technologies vs. containerization. Then, I briefly stated some of the use cases where Docker is not the best choice. Finally, I showed you examples of other Docker-related technologies. I hope you liked this article; please drop me a comment if you want to ask or suggest anything.

# Comments

  1. As a developer who went from working on a desktop app that used our company’s API to getting thrown onto a project to containerize said API, thank you for this post. You really did a good job explaining things, in my opinion. This article helped clear up some things for me that have been a major sticking point. The only thing I would say is that a longer discussion of how Kubernetes (or Docker Swarm, or whatever orchestration you use), Docker, and Chef/Ansible/Puppet/etc. are used in tandem or relate would be helpful. My company uses Chef for our current system, but it seems like everything Chef does is replaced by Dockerfiles and Kubernetes? This is the area that I personally have struggled with understanding and haven’t really found a good article to read that makes me feel confident in what I am doing (especially when I have to try to explain things to experienced DevOps engineers who don’t have time to read about Docker). I, and I assume other engineers new to DevOps, would find an elaboration on that point incredibly helpful. Other than that, I loved the post. Well written, I followed along and learned some things. Even the parts I read where I already knew what you were talking about, I can see how valuable and well-written the sections were.

  2. >Your application runs only on Windows or macOS. At the time of this writing, Docker can only be used to containerize applications that run on the Linux kernel. This means that you cannot convert a native Windows or macOS application and run it elsewhere. You can, however, run a Linux container on Linux, Windows, or macOS.

    Incorrect. Microsoft provides a number of Windows based images (e.g. [https://hub.docker.com/r/microsoft/dotnet-framework/](https://hub.docker.com/r/microsoft/dotnet-framework/)). We are using them to containerize a number of .NET Framework and Core applications that only run on Windows.

  3. Full disclosure: I only skimmed your article; it looks great and will read it in full later.

    I’ve been a web dev and sysops guy for nearly 30 years. Docker has absolutely revolutionized local laptop development for me.

    It’s hard to explain how amazing Docker is to someone who has grown up in the virtualization era. But when you’ve been in IT for as long as I have, you remember what it was like to get your CEO and CFO to understand why you’re submitting a $25,000 proposal and a 3 month timeline as to why you need to invest in servers, hardware, RAM, routers, switches, networking cables, and software licenses, to be able to “try out a new technology.” A new idea that may, or may not, work out.

    Gone are the days when making a poor choice in hardware and software meant scrapping or repurposing those huge investments and month-long delays in your R&D initiative.

    Even just a few years back, getting a dozen services running on commodity hardware (my development PC or Mac was top of the line, too) was barely feasible, if at all.

    Fast forward to the Docker era, and you can spin up a different database platform by 9:15am and be running benchmarks during your coffee break at 10:30. You can develop on your Mac, using not an emulation, but the exact same collection of servers and OSes you’re going to be deploying on in production.

    Fast forward to today, and you can spin up whole stacks of virtual servers that work together, like a whole MEAN stack or a whole ELK stack, in under an hour. Just to see what it does. Just for fun. And trash it just as fast if you don’t like it. Or hibernate it if you’re going surfing. Or pause it in the terminal before waking it up when your seatbelt is fastened and you’ve reached cruising altitude.

    Fast forward to today, and you can commit all these interwoven hardware emulations to your source code repository as text files, to be shared and replicated identically to your dispersed team worldwide.

    Few technologies have truly increased what a single developer or system/platform/service architect is capable of, more than Docker.

    And I haven’t even learned Kubernetes yet.

  4. You did a good job of expressing *why* you might want to use Docker for development but you did not really tell us *how*. Where do we put our code? Should it be inside the container or outside the container on a shared volume? You didn’t tell us. Are you suggesting that people should use an Ubuntu container and manually install all of their software in that? Or should we start with a Python or Node.js container? We don’t know, you didn’t show us an example. Do we do our development work from inside the container or outside? This wasn’t specified. You started to say “*Let’s see how our NodeJS container might look like on your system:*” but then you didn’t show us. You should explain to the reader exactly how you do this and what the tradeoffs are.

    For example: I use **Vagrant + VirtualBox + Docker** for all of my development. I use Docker inside the Vagrant VM to supply all of the middleware that I need (e.g., Redis, PostgreSQL, Kafka, etc.) and do all of my development from within an Ubuntu VM that I spin up with Vagrant. Then I package up my application as a Docker image for deployment. I’ve given conference tutorials on the topic. You can see my slides here: [https://www.slideshare.net/JohnRofrano/making-developers-productive-with-vagrant-virtualbox-and-docker](https://www.slideshare.net/JohnRofrano/making-developers-productive-with-vagrant-virtualbox-and-docker)

    u/abohmeed I don’t want to discourage you from making the article better so please take my comments in the constructive spirit in which I am intending them. You did a great job. Try and add an example `Dockerfile` so that readers can cut and paste it and get started on their own. Explain the workflow to them. Here is one possible scenario but please elaborate on how you use Docker while developing.

    Here is an example for developing with Docker and Python 3:

    Create a `Dockerfile` with the following contents:

    FROM python:3.6-alpine
    EXPOSE 5000
    WORKDIR /app

    This will create a Docker image that has Python 3.6 installed, exposes port `5000` so that you can get to it from outside the container (I use Flask for my microservice development which defaults to port 5000) and creates an `/app` folder to hold your code.

    Next, build a Docker image called `myapp` from the `Dockerfile` using the `docker build` command:

    docker build -t myapp .

    Create a container from the `myapp` image using the `-v` parameter to map your current directory to the `/app` folder inside the container. This command will also start a shell prompt inside the container:

    docker run --rm -it -v $(pwd):/app myapp sh

    You are now running inside the Python container. Check what version of Python you have:

    /app # python --version
    Python 3.6.6

    Congratulations! You are all set to start coding. Any changes that you make to the contents of the `/app` folder inside the container will actually be saved to the current folder on your host computer because you have mapped it with the volume parameter: `-v $(pwd):/app`. This way you can delete and recreate the container any time you want and your code will always be there (and hopefully checked into GitHub).

    This is just an example of explaining “*how*” you would use Docker for doing development. Feel free to use it as a basis for your Node.js example in your article. Keep up the good work.

    \~jr

  5. Docker is the best-in-breed tourniquet for a self-inflicted wound of bad stack management.

    It’s a sign that ops are being far outnumbered by devs. But that’s kind of the story of computing since the 1950s.

