---
title: "LXD: Containers for Human Beings"
subtitle: "Docker's great and all, but I prefer the workflow of interacting with VMs"
date: 2023-09-19T14:26:00-04:00
categories:
- Technology
tags:
- Sysadmin
- Containers
- VMs
- Docker
- LXD
draft: true
toc: true
rss_only: false
cover: ./cover.png
---
|
|
This is a blog post version of a talk I presented at both Ubuntu Summit 2022 and
SouthEast LinuxFest 2023. The first was not recorded, but the second was and is
available on [SELF's PeerTube instance.][selfpeertube] I apologise for the
terrible audio, but there's unfortunately nothing I can do about that. If you're
already intimately familiar with the core concepts of VMs or containers, I would
suggest skipping those respective sections. If you're only vaguely familiar with
either, I would recommend reading them because I do go a little bit in-depth.

[selfpeertube]: https://peertube.linuxrocks.online/w/hjiTPHVwGz4hy9n3cUL1mq?start=1m
|
|
{{< adm type="warn" >}}

**Note:** Canonical has decided to [pull LXD out][lxd] from under the Linux
Containers entity and instead continue development under the Canonical brand.
The majority of the LXD creators and developers have congregated around a fork
called [Incus.][inc] I'll be keeping a close eye on the project and intend to
migrate as soon as there's an installable release.

[lxd]: https://linuxcontainers.org/lxd/
[inc]: https://linuxcontainers.org/incus/

{{< /adm >}}
|
|
Questions, comments, and corrections are welcome! Feel free to use the
self-hosted comment system at the bottom, send me an email or an IM, reply to
the fediverse post, etc. Edits and corrections, if there are any, will be noted
just below this paragraph.
|
|
## The benefits of VMs and containers

- **Isolation:** you don't want to allow an attacker to infiltrate your email
  server through your web application; the two should be completely separate
  from each other, and VMs/containers provide strong isolation guarantees.
- **Flexibility:** <abbr title="Virtual Machines">VMs</abbr> and containers only
  use the resources they've been given. If you tell the VM it has 200 MB of RAM,
  it's going to make do with 200 MB of RAM and the kernel's
  <abbr title="Out Of Memory">OOM</abbr> killer is going to have a fun time 🤠
- **Portability:** once set up and configured, VMs and containers can mostly be
  treated as closed boxes; as long as the surrounding environment of the new
  host is similar to the previous one in terms of communication (proxies, web
  servers, etc.), they can just be picked up and dropped between various hosts
  as necessary.
- **Density:** applications are usually much lighter than the systems they're
  running on, so it makes sense to run many applications on one system. VMs and
  containers facilitate that without sacrificing security.
- **Cleanliness:** VMs and containers are applications in black boxes. When
  you're done with the box, you can just throw it away and most everything
  related to the application is gone.
|
|
## Virtual machines

As the name suggests, virtual machines are all virtual; a hypervisor creates
virtual disks for storage, virtual <abbr title="Central Processing
Units">CPUs</abbr>, virtual <abbr title="Network Interface Cards">NICs</abbr>,
virtual <abbr title="Random Access Memory">RAM</abbr>, etc. On top of the
virtualised hardware sits the guest kernel, which facilitates communication
between the guest operating system and the (virtual) hardware. Above that are
the operating system and all your applications.

At this point, the stack is quite large; VMs aren't exactly lightweight, and
this impacts how densely you can pack the host.
|
|
I mentioned a "hypervisor" a minute ago. I've explained what hypervisors in
general do, but there are actually two different kinds of hypervisor. They're
creatively named **Type 1** and **Type 2**.
|
|
### Type 1 hypervisors

These run directly in the host kernel without an intermediary OS. A good example
would be [KVM,][kvm] a **VM** hypervisor that runs in the **K**ernel. Type 1
hypervisors can communicate directly with the host's hardware to allocate RAM,
issue instructions to the CPU, etc.

[debian]: https://debian.org
[kvm]: https://www.linux-kvm.org
[vb]: https://www.virtualbox.org/
|
|
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.h: Type 1 hypervisor
hk.h.k1: Guest kernel
hk.h.k2: Guest kernel
hk.h.k3: Guest kernel
hk.h.k1.os1: Guest OS
hk.h.k2.os2: Guest OS
hk.h.k3.os3: Guest OS
hk.h.k1.os1.app1: Many apps
hk.h.k2.os2.app2: Many apps
hk.h.k3.os3.app3: Many apps
```
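
If you've never interacted with a hypervisor directly, here's a rough sketch of
what handing virtual hardware to a guest looks like with QEMU/KVM; the disk
image name, installer ISO, and resource sizes are all just example values.

```bash
# Create a 20 GiB copy-on-write disk image for the guest (example path)
qemu-img create -f qcow2 guest.qcow2 20G

# Boot a guest with KVM acceleration, 2 virtual CPUs, and 2 GiB of virtual RAM;
# debian.iso is a placeholder installer image
qemu-system-x86_64 \
  -enable-kvm \
  -cpu host \
  -smp 2 \
  -m 2048 \
  -drive file=guest.qcow2,format=qcow2 \
  -cdrom debian.iso \
  -boot d
```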
|
|
### Type 2 hypervisors

These run in userspace as an application, like [VirtualBox.][vb] Type 2
hypervisors have to first go through the host operating system, adding an
additional layer to the stack.
|
|
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.os: Host OS
hk.os.h: Type 2 hypervisor
hk.os.h.k1: Guest kernel
hk.os.h.k2: Guest kernel
hk.os.h.k3: Guest kernel
hk.os.h.k1.os1: Guest OS
hk.os.h.k2.os2: Guest OS
hk.os.h.k3.os3: Guest OS
hk.os.h.k1.os1.app1: Many apps
hk.os.h.k2.os2.app2: Many apps
hk.os.h.k3.os3.app3: Many apps
```
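
For comparison, a similarly hedged sketch of the same idea with VirtualBox's
CLI; the VM name, OS type, and sizes are example values, and attaching a disk
and installer ISO is omitted for brevity.

```bash
# Create and register a new VM, then give it 2 GiB of RAM and 2 virtual CPUs
VBoxManage createvm --name guest --ostype Debian_64 --register
VBoxManage modifyvm guest --memory 2048 --cpus 2

# Start it without a GUI window (you'd normally attach storage first)
VBoxManage startvm guest --type headless
```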
|
|
## Containers

VMs use virtualisation to achieve isolation. Containers use **namespaces** and
**cgroups**, technologies pioneered in the Linux kernel, though by now there are
[equivalents for Windows] and possibly for other platforms as well.

[equivalents for Windows]: https://learn.microsoft.com/en-us/virtualization/community/team-blog/2017/20170127-introducing-the-host-compute-service-hcs
|
|
**[Linux namespaces]** partition kernel resources like process IDs, hostnames,
user IDs, directory hierarchies, network access, etc. This prevents one
collection of processes from seeing or gaining access to data regarding another
collection of processes.
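
You can poke at namespaces without any container tooling at all; this is a
small sketch using `unshare` from util-linux, not anything LXD does for you.

```bash
# Start a shell in new PID and mount namespaces; --fork makes the shell PID 1
# inside the new namespace and --mount-proc remounts /proc so process-listing
# tools only see processes that live in that namespace
sudo unshare --pid --fork --mount-proc bash

# Inside the namespaced shell, the rest of the system is invisible
ps aux   # shows only this bash process and ps itself
```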
|
|
**[Cgroups]** limit, track, and isolate the hardware resource use of a
collection of processes. If you tell a cgroup that it's only allowed to spawn
500 child processes and someone executes a fork bomb, the fork bomb will expand
until it hits that limit. The kernel will prevent it from spawning further
children and you'll have to resolve the issue the same way you would with a VM:
delete and re-create the container, restore from a good backup, etc. You can
also limit CPU use, the number of CPU cores the cgroup can access, RAM, disk
use, and so on.
|
|
[Linux namespaces]: https://en.wikipedia.org/wiki/Linux_namespaces
[Cgroups]: https://en.wikipedia.org/wiki/Cgroups
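
As a rough illustration of cgroup limits, `systemd-run` can wrap a command in a
transient cgroup; the limits below are arbitrary example values.

```bash
# Start a shell in a transient scope limited to 200 MB of RAM, 500 tasks
# (processes/threads), and half of one CPU core; a fork bomb run inside this
# shell stalls at the task limit instead of taking down the host
sudo systemd-run --scope \
  -p MemoryMax=200M \
  -p TasksMax=500 \
  -p CPUQuota=50% \
  bash
```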
|
|
### Application containers

The most well-known example of application container tech is probably
[Docker.][docker] The goal here is to run a single application as minimally as
possible inside each container. In the case of a single, statically-linked Go
binary, a minimal Docker container might contain nothing more than the binary.
If it's a Python application, you're more likely to use an [Alpine Linux image]
and add your Python dependencies on top of that. If a database is required, that
goes in a separate container. If you've got a web server to handle TLS
termination and proxy your application, that's a third container. One cohesive
system might require many Docker containers to function as intended.

[docker]: https://docker.com/
[Alpine Linux image]: https://hub.docker.com/_/alpine
|
|
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
Host kernel.Container runtime.c1: Container
Host kernel.Container runtime.c2: Container
Host kernel.Container runtime.c3: Container

Host kernel.Container runtime.c1.One app
Host kernel.Container runtime.c2.Few apps
Host kernel.Container runtime.c3.Full OS.Many apps
```
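
To make the "one system, many containers" idea concrete, here's a hedged sketch
using the plain `docker` CLI; the image names, network name, ports, and password
are illustrative only, and a real deployment would more likely use Compose.

```bash
# A private network so the containers can reach each other by name
docker network create myapp-net

# The database in its own container
docker run -d --name myapp-db --network myapp-net \
  -e POSTGRES_PASSWORD=change-me postgres:16

# The application itself (placeholder image), talking to the database over the
# shared network
docker run -d --name myapp --network myapp-net my-python-app:latest

# A reverse proxy in a third container, handling TLS termination for the app
docker run -d --name myapp-proxy --network myapp-net -p 443:443 caddy:latest
```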
|
|
### System containers

One of the most well-known examples of system container tech is the subject of
this post: LXD! Rather than containing a single application or a very small set
of them, system containers are designed to house entire operating systems, like
[Debian] or [Rocky Linux,][rocky] along with everything required for your
application. Using our examples from above, a single statically-linked Go binary
might run in a full Debian container, just like the Python application might.
The database and web server might go in _that same_ container.

[Debian]: https://www.debian.org/
[rocky]: https://rockylinux.org/
|
|
You treat each container more like you would a VM, but you get the performance
benefit of _not_ virtualising everything. Containers tend to be _much_ lighter
than most VMs.[^1]
|
|
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.c1: Container
hk.c2: Container
hk.c3: Container
hk.c1.os1: Full OS
hk.c2.os2: Full OS
hk.c3.os3: Full OS
hk.c1.os1.app1: Many apps
hk.c2.os2.app2: Many apps
hk.c3.os3.app3: Many apps
```
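
Getting a feel for that takes about two commands; here's a hedged example using
LXD's `lxc` client, where the image alias and container name are placeholders.

```bash
# Launch a Debian 12 system container named "web", then get a shell inside it
lxc launch images:debian/12 web
lxc exec web -- bash
```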
|
|
## When to use which

These are personal opinions. Please evaluate each technology and determine for
yourself whether it's a suitable fit for your environment.
|
|
### VMs

As far as I'm aware, VMs are your only option when you want to work with
esoteric hardware or hardware you don't physically have on-hand. You can tell
your VM that it's running with RAM that's 20 years old, a still-in-development
RISC-V CPU, and a 420p monitor. That's not possible with containers. VMs are
also your only option when you want to work with foreign operating systems:
running Linux on Windows, Windows on Linux, or OpenBSD on a Mac all require
virtualisation. Another reason to stick with VMs is for compliance purposes.
Containers are still very new and some regulatory bodies require virtualisation
because it's a decades-old and battle-tested isolation technique.
|
|
{{< adm type="note" >}}
See Drew DeVault's blog post [_In praise of qemu_][qemu] for a great use of VMs.

[qemu]: https://drewdevault.com/2022/09/02/2022-09-02-In-praise-of-qemu.html

{{< /adm >}}
|
|
### Application containers

Application containers are particularly popular for [microservices] and
[reproducible builds,][repb] though I personally think [NixOS] is a better fit
for the latter. App containers are also your only option if you want to use
cloud platforms with extreme scaling capabilities like Google Cloud's App Engine
standard environment or AWS's Fargate.

[microservices]: https://en.wikipedia.org/wiki/Microservices
[repb]: https://en.wikipedia.org/wiki/Reproducible_builds
[NixOS]: https://nixos.org/
|
|
Application containers also tend to be necessary when the application you want
to self-host is _only_ distributed as a Docker image and the maintainers
adamantly refuse to support any other deployment method. This is a _massive_ pet
peeve of mine; yes, Docker can make running self-hosted applications easier for
inexperienced individuals,[^2] but an application orchestration system _does
not_ fit in every single environment. By refusing to provide proper "manual"
deployment instructions, maintainers of these projects alienate an entire class
of potential users and it pisses me off.

Just document your shit.
|
|
### System containers

Personally, I prefer the workflow of system containers and use them for
everything else. Because they contain entire operating systems, you're able to
interact with them in a similar way to VMs or even your PC: you shell into the
container, `apt install` whatever you need, set up the application, expose it
over the network (for example, on `0.0.0.0:8080`), proxy it on the container
host, and that's it! This process can be trivially automated with shell
scripts, Ansible roles, Chef, Puppet, whatever you like. Back the system up
using [tarsnap], [rsync.net], [Backblaze,][bb] Google Drive, and
[restic.][restic] If you use ZFS for your LXD storage pool, maybe go with
[syncoid and sanoid.][ss]
|
|
[tarsnap]: https://www.tarsnap.com/
[rsync.net]: https://rsync.net/
[bb]: https://www.backblaze.com/
[restic]: https://restic.net/
[ss]: https://github.com/jimsalterjrs/sanoid
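
As a concrete sketch of that workflow with LXD's `lxc` client (the container
name, limits, and port are all example values, and the container is assumed to
already exist, e.g. from an `lxc launch` like the one shown earlier):

```bash
# Optional: give the container some cgroup-backed resource limits
lxc config set app limits.memory 1GiB
lxc config set app limits.cpu 2

# Shell in and set things up exactly as you would on any Debian/Ubuntu server
lxc exec app -- bash
# (inside the container) apt install whatever you need, configure the app, etc.

# Forward port 8080 on the host to port 8080 inside the container
lxc config device add app web proxy \
  listen=tcp:0.0.0.0:8080 \
  connect=tcp:127.0.0.1:8080
```

Snapshots and exports (`lxc snapshot`, `lxc export`) can be scripted the same
way, which gives the backup tools above something consistent to work with.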
|
|
My point is that using system containers doesn't mean throwing out the last few
decades of systems knowledge and wisdom.
|
|
[^1]:
    There's a [technical
    publication](https://dl.acm.org/doi/10.1145/3132747.3132763) indicating that
    specialised VMs with unikernels can be far lighter and more secure than
    containers.

[^2]:
    Until they need to do _anything_ more complex than pull a newer image. Then
    it's twice as painful as the "manual" method might have been.