Docker is a virtualization tool that has emerged in recent years; like a virtual machine, it can isolate resources and the system environment. Based on a research report published by IBM, this article discusses the differences between Docker and traditional virtualization approaches, compares the performance of physical machines, Docker containers, and virtual machines, and explains the reasons behind the differences.
Comparison of the implementation principles of Docker and virtual machines
The following figures show the implementation frameworks of a virtual machine and of Docker, respectively.
Comparing the two figures, the Guest OS layer and the Hypervisor layer of the virtual machine (left figure) are replaced in Docker by the Docker Engine layer. The Guest OS of a virtual machine is the operating system installed inside the virtual machine; it is a complete operating system with its own kernel. The Hypervisor layer of a virtual machine can be roughly understood as a hardware virtualization platform, which exists as a kernel-level driver in the Host OS.
Virtual machines achieve resource isolation by running an independent Guest OS and by letting the Hypervisor virtualize the CPU, memory, I/O devices, and so on. To virtualize the CPU, for example, the Hypervisor creates a data structure for each virtual CPU, simulates the values of all of that CPU's registers, and tracks and modifies these values when necessary. Note that in most cases the code running in the virtual machine executes directly on the hardware without Hypervisor intervention; only when the Guest OS performs privileged operations in kernel mode that modify CPU register state does the Hypervisor step in to update and maintain the virtual CPU state.
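The per-vCPU bookkeeping can be pictured with a small data structure. The sketch below is purely illustrative (the vCPU struct and handlePrivilegedWrite are hypothetical names, not a real hypervisor API); it only shows the idea that the Hypervisor keeps a shadow copy of each virtual CPU's registers and updates it when the guest performs a privileged operation.

```go
package main

import "fmt"

// vCPU is an illustrative stand-in for the per-virtual-CPU data structure a
// Hypervisor maintains: a shadow copy of the registers the guest believes it owns.
type vCPU struct {
	ID   int
	CR3  uint64            // guest's idea of the page-table base register
	GPRs map[string]uint64 // general-purpose registers
}

// handlePrivilegedWrite models the Hypervisor intercepting a privileged register
// write: instead of touching real hardware, it records the value in the vCPU state.
func (c *vCPU) handlePrivilegedWrite(reg string, value uint64) {
	if reg == "CR3" {
		c.CR3 = value // a real Hypervisor would also rebuild shadow page tables here
		return
	}
	c.GPRs[reg] = value
}

func main() {
	cpu := &vCPU{ID: 0, GPRs: map[string]uint64{}}
	cpu.handlePrivilegedWrite("CR3", 0x1000) // guest kernel switching address space
	fmt.Printf("vCPU %d state after trap: CR3=%#x\n", cpu.ID, cpu.CR3)
}
```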
The Hypervisor virtualizes memory by maintaining a shadow page table. Normally, a page table translates virtual memory addresses into physical memory addresses. Under virtualization, what the guest regards as physical memory is itself still virtual, so the shadow page table performs the full translation: guest virtual memory -> guest "physical" memory -> real host physical memory.
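To make the two-level translation concrete, here is a minimal sketch (with made-up page numbers and map-based "page tables", not real MMU structures) showing how a shadow mapping collapses guest-virtual -> guest-physical -> host-physical into a single guest-virtual -> host-physical lookup.

```go
package main

import "fmt"

func main() {
	// Guest page table: guest virtual page -> guest "physical" page (still virtual).
	guestPT := map[uint64]uint64{0x1: 0x10, 0x2: 0x20}
	// Hypervisor's table: guest physical page -> real host physical page.
	hostPT := map[uint64]uint64{0x10: 0x100, 0x20: 0x200}

	// The shadow page table composes the two, so the hardware MMU can translate
	// guest virtual addresses to host physical addresses in one step.
	shadowPT := map[uint64]uint64{}
	for gva, gpa := range guestPT {
		if hpa, ok := hostPT[gpa]; ok {
			shadowPT[gva] = hpa
		}
	}

	fmt.Println(shadowPT) // e.g. map[1:256 2:512]
}
```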
For I/O device virtualization, when the Hypervisor receives a page fault and finds that the faulting guest physical address actually corresponds to an I/O device, it emulates the device's behavior in software and returns. For example, when the guest CPU wants to write to a disk, the Hypervisor writes the corresponding data to a file in the Host OS, which in effect emulates the virtual disk.
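As an illustration of this kind of device emulation, the sketch below (hypothetical function names; a real Hypervisor works at a much lower level) turns a "write sector N" request into a write at the corresponding offset of an ordinary backing file on the Host OS.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

const sectorSize = 512

// emulateDiskWrite models the Hypervisor turning a guest's "write sector N" request
// into a write at the corresponding offset of a backing file on the Host OS.
func emulateDiskWrite(backing *os.File, sector int64, data []byte) error {
	_, err := backing.WriteAt(data, sector*sectorSize)
	return err
}

func main() {
	backing, err := os.CreateTemp("", "virtual-disk-*.img")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(backing.Name())
	defer backing.Close()

	payload := make([]byte, sectorSize)
	copy(payload, "hello from the guest")
	if err := emulateDiskWrite(backing, 3, payload); err != nil {
		log.Fatal(err)
	}
	fmt.Println("sector 3 written to backing file", backing.Name())
}
```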
Compared with virtual machines, Docker achieves resource and environment isolation in a much simpler way. Docker Engine can be viewed as a thin encapsulation of Linux namespaces, cgroups, and image-management file system operations. Docker does not rely on a completely separate Guest OS for environment isolation the way a virtual machine does; instead it uses the container features supported by the Linux kernel itself. In short, Docker uses namespaces to isolate the system environment, cgroups to limit resource usage, and images to isolate the root filesystem environment.
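As a rough illustration of the namespace part (only a sketch, Linux-only, and far from what Docker Engine actually does: no cgroups, no image layers, no network setup), the Go program below starts a shell inside new UTS, PID, and mount namespaces, which is the kernel feature containers build on.

```go
// mini_ns.go - a minimal, Linux-only sketch of namespace-based isolation.
// It is NOT how Docker Engine is implemented; it only demonstrates the kernel
// feature (clone with namespace flags) that Docker builds on. Run as root.
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	// New UTS (hostname), PID, and mount namespaces for the child process.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```

Inside that shell, changing the hostname no longer affects the host, and the shell runs as PID 1 of its own PID namespace, without any Guest OS being booted.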
From this comparison of the implementation principles of Docker and virtual machines, we can draw some rough conclusions:
(1) Docker has fewer layers of abstraction than a virtual machine. Since Docker does not need a Hypervisor to virtualize hardware resources, programs running in a Docker container use the physical machine's hardware resources directly, so Docker has an efficiency advantage in CPU and memory utilization; concrete comparisons are given in the following sections. For I/O device virtualization, Docker offers several image-management solutions, such as the AUFS file system or Device Mapper, and the efficiency of each solution differs slightly.
(2) Docker uses the host's kernel and does not need a Guest OS. Therefore, when creating a new container, Docker does not have to load an operating system kernel the way a virtual machine does. Booting and loading an operating system kernel is a time-consuming and resource-intensive process: when a new virtual machine is created, the virtualization software has to boot the Guest OS, which takes on the order of minutes. Docker skips this step by using the host's operating system directly, so creating a new Docker container takes only seconds. In addition, a modern operating system is a complex system, and running an extra operating system on a physical machine carries a considerable resource cost, so Docker also has a clear advantage over virtual machines in resource consumption. In practice, we can easily run hundreds or even thousands of containers on a single physical machine, but only a handful of virtual machines.
Comparison of computational efficiency between Docker and virtual machine
In the previous section we argued, from a theoretical point of view, that Docker should be more efficient than a virtual machine in CPU and memory utilization. In this section we analyze the data given in the paper published by IBM. The following data were measured on an IBM X3650 M4 server whose main hardware parameters are:
(1) Two Intel Xeon E5-2655 processors with a main frequency of 2.4-3.0GHz. Each processor has eight cores, so there are 16 cores in total.
(2) 256 GB RAM.
In the test, the computing-power data were obtained by running the Linpack benchmark. The results are shown below:
From left to right are the computing-power results of the physical machine, Docker, and the virtual machine. Compared with the physical machine, Docker loses almost no computing power, whereas the virtual machine shows a very obvious loss: its computing power drops by around 50%.
Why is there such a loss of performance? On the one hand, the virtual machine adds a layer of virtual hardware: when the application in the virtual machine performs numerical computation, it runs on CPUs virtualized by the Hypervisor. On the other hand, the characteristics of the computation program itself play a role: the virtual CPU architecture seen inside the virtual machine differs from the real CPU architecture. Numerical programs are usually optimized for a specific CPU architecture, and virtualization can make these optimizations ineffective or even counterproductive. For the platform in this experiment, the real hardware consists of 2 physical CPUs in a NUMA configuration (16 physical cores, or 32 logical cores with hyper-threading), whereas the virtual machine presents them as a single CPU with 32 cores. As a result, the computation program cannot be optimized for the actual CPU architecture, which greatly reduces its computational efficiency.
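The exact Linpack configuration used by IBM is not reproduced here. As a rough stand-in, a simple benchmark like the one below (a naive matrix multiplication, far less optimized than Linpack) can be run unchanged on the physical machine, in a Docker container, and in a virtual machine to get a first impression of relative floating-point throughput.

```go
package main

import (
	"fmt"
	"time"
)

// naiveMatMul multiplies two n x n matrices; a crude stand-in for Linpack-style
// floating-point work that can be run identically on host, container, and VM.
func naiveMatMul(n int) float64 {
	a := make([]float64, n*n)
	b := make([]float64, n*n)
	c := make([]float64, n*n)
	for i := range a {
		a[i], b[i] = float64(i%7), float64(i%13)
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		for k := 0; k < n; k++ {
			aik := a[i*n+k]
			for j := 0; j < n; j++ {
				c[i*n+j] += aik * b[k*n+j]
			}
		}
	}
	elapsed := time.Since(start).Seconds()
	flops := 2 * float64(n) * float64(n) * float64(n) // one multiply + one add per inner step
	return flops / elapsed / 1e9                      // GFLOPS
}

func main() {
	fmt.Printf("%.2f GFLOPS\n", naiveMatMul(1024))
}
```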
Comparison of memory access efficiency between Docker and virtual machine
The comparison of memory access efficiency is relatively complex, mainly because there are many scenarios for memory access:
(1) Bulk reads and writes of large blocks of memory at contiguous addresses. In this scenario the measured quantity is memory bandwidth, and the performance bottleneck lies mainly in the performance of the memory chips.
(2) Random memory access. In this scenario the performance depends mainly on memory bandwidth, the cache hit rate, and the efficiency of translating virtual addresses into physical addresses.
The following sections focus on these two memory access scenarios. Before we start, let's outline the differences between the memory access models of Docker and of a virtual machine, shown below:
It can be seen that, for memory access, an application in a virtual machine must go through two mappings from virtual memory to physical memory, so its cost of reading and writing memory is higher than that of an application running in Docker.
The following figure shows the test data for scenario (1), namely the memory bandwidth data. The figure on the left shows the program running on one CPU (8 cores), and the figure on the right shows it running on two CPUs (16 cores). The units are GB/s.
As the data in the figure show, there is little difference between Docker and the virtual machine in memory bandwidth. This is because in the memory bandwidth test the reads and writes target large, contiguous address ranges, and the kernel optimizes this access pattern (data prefetching). Relatively few translations from virtual to physical memory are therefore needed, and the performance bottleneck lies mainly in the read/write speed of the physical memory, so Docker and the virtual machine perform about the same in this case.
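The bandwidth benchmark used by IBM is not reproduced here; as a hedged illustration of scenario (1), the sketch below simply copies one large contiguous slice into another and reports an approximate bandwidth figure, which exhibits the same sequential access pattern.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 1 << 26 // 64M float64 values, i.e. 512 MiB per slice
	src := make([]float64, n)
	dst := make([]float64, n)
	for i := range src {
		src[i] = float64(i)
	}

	start := time.Now()
	copy(dst, src) // one large, contiguous copy: the scenario (1) access pattern
	elapsed := time.Since(start).Seconds()

	bytesMoved := float64(2 * n * 8) // n reads + n writes of 8-byte values
	fmt.Printf("approx. bandwidth: %.1f GB/s\n", bytesMoved/elapsed/1e9)
}
```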
The memory bandwidth test shows no significant difference between Docker and the virtual machine precisely because it requires relatively few translations from virtual addresses to physical addresses. Based on this reasoning, we would expect the performance gap between the two to widen in a random memory access test, since the number of virtual-to-physical address translations increases. The results are shown in the figure below.
The left figure shows the program running on one CPU, and the right figure shows it running on two CPUs. As the left figure shows, the gap between the container and the virtual machine in random memory access performance indeed becomes obvious, as predicted, and the container's memory access performance is significantly better than the virtual machine's. Surprisingly, however, when the test program runs on two CPUs, the gap in random memory access performance between the container and the virtual machine becomes much less significant.
The IBM paper offers a plausible explanation for this phenomenon: when two CPUs access memory at the same time, controlling memory reads and writes becomes more complicated, because the two CPUs may read and write data at the same address simultaneously, which requires synchronizing the memory data and thus costs read/write performance. This loss exists even on the physical machine, as the memory access performance in the right figure is lower than in the left figure. The penalty that two CPUs pay on memory read/write performance is very large, and it far outweighs the difference between the virtual machine and Docker caused by their different memory access models. That is why, in the figure on the right, there is no obvious difference between the random memory access performance of Docker and that of the virtual machine.
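For scenario (2), a simple random-access microbenchmark (again only a sketch, not the benchmark used in the paper) can be written by visiting a large array in a pseudo-random order, which defeats prefetching and stresses the caches and address translation.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const n = 1 << 25 // 32M entries (256 MiB of int64)
	data := make([]int64, n)
	idx := rand.Perm(n) // pseudo-random visiting order defeats prefetching

	start := time.Now()
	var sum int64
	for _, i := range idx {
		sum += data[i] // each access is likely to miss the cache and the TLB
	}
	elapsed := time.Since(start)

	fmt.Printf("sum=%d, %.1f ns per random access\n",
		sum, float64(elapsed.Nanoseconds())/float64(n))
}
```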
Comparison of startup time and resource consumption between Docker and virtual machine
The above two sections compared the performance of applications running in Docker with those running in a virtual machine. In fact, another important reason Docker attracts so much attention from developers is that the system cost of starting a Docker container is much lower than that of starting a virtual machine, in terms of both startup time and startup resource consumption. Docker uses the host's kernel directly, avoiding the operating system boot time required to start a virtual machine and the resource overhead of running an additional operating system. Docker can launch a large number of containers in a matter of seconds, which is impossible with virtual machines. These advantages of fast startup and low resource consumption give Docker good prospects in elastic cloud platforms and automated operations and maintenance systems.
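A quick way to see the startup-time difference on any machine with Docker installed is to time the creation of a throwaway container. The sketch below shells out to the `docker` CLI (it assumes a local Docker daemon and that the `busybox` image is already pulled; otherwise the first run also measures the image download), so it times container creation only, for comparison against a VM boot.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"time"
)

func main() {
	// Create a container, run a trivial command, and remove it again (--rm).
	start := time.Now()
	cmd := exec.Command("docker", "run", "--rm", "busybox", "true")
	if out, err := cmd.CombinedOutput(); err != nil {
		log.Fatalf("docker run failed: %v\n%s", err, out)
	}
	fmt.Printf("container created, ran, and exited in %v\n", time.Since(start))
}
```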
Disadvantages of Docker
The discussion so far has focused on Docker's advantages over virtual machines, but Docker is not a perfect system either. Compared with virtual machines, Docker still has the following disadvantages:
1. Resource isolation is weaker than in a virtual machine. Docker uses cgroups to restrict resources, which can only cap a container's maximum resource consumption; it cannot prevent other programs from taking the resources the container needs (see the sketch after this list).
2. Security issues. At present, Docker does not distinguish between the users who execute its commands. As long as a user has permission to run Docker, he can perform any operation on any Docker container, regardless of whether that container was created by him. For example, suppose users A and B both have permission to run Docker. Because the Docker server side does not check which user a Docker client request came from, A can delete a container created by B, which poses a certain security risk.
3. Docker is still being updated rapidly, and the details of its functionality change considerably between versions. Some core modules depend on a relatively new kernel version, which leads to version-compatibility issues.
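To illustrate point 1, the sketch below writes a memory cap into a cgroup directly (a minimal, Linux-only sketch assuming cgroup v1 mounted at /sys/fs/cgroup/memory and root privileges; on cgroup v2 the corresponding file is memory.max, and Docker exposes the same mechanism through its --memory option). The cap only bounds this group's own usage; it does not reserve memory for it or stop other processes on the host from consuming the rest.

```go
// cgroup_limit.go - what a cgroup limit is: a ceiling on this group's own
// consumption, not a reservation that keeps other processes away from the memory.
package main

import (
	"log"
	"os"
	"path/filepath"
	"strconv"
)

func main() {
	group := "/sys/fs/cgroup/memory/demo"
	if err := os.MkdirAll(group, 0o755); err != nil {
		log.Fatal(err)
	}
	// Cap the group at 100 MiB. Processes in the group that exceed this are
	// reclaimed or OOM-killed, but nothing stops other host processes from
	// using all of the remaining memory.
	limit := []byte(strconv.Itoa(100 * 1024 * 1024))
	if err := os.WriteFile(filepath.Join(group, "memory.limit_in_bytes"), limit, 0o644); err != nil {
		log.Fatal(err)
	}
	// Move the current process into the group.
	pid := []byte(strconv.Itoa(os.Getpid()))
	if err := os.WriteFile(filepath.Join(group, "cgroup.procs"), pid, 0o644); err != nil {
		log.Fatal(err)
	}
	log.Println("current process now limited to 100 MiB of memory")
}
```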