title subtitle date categories tags draft toc rss_only cover
LXD: Containers for Human Beings Docker's great and all, but I prefer the workflow of interacting with VMs 2023-08-11T16:30:00-04:00
Technology
Sysadmin
Containers
VMs
Docker
LXD
true true false ./cover.png

This is a blog post version of a talk I presented at both Ubuntu Summit 2022 and SouthEast LinuxFest 2023. The first was not recorded, but the second was and is on SELF's PeerTube instance. I apologise for the terrible audio, but there's unfortunately nothing I can do about that. If you're already intimately familiar with the core concepts of VMs or containers, I would suggest skipping those respective sections. If you're vaguely familiar with either, I would recommend reading them because I do go a little bit in-depth.

{{< adm type="warn" >}}

Note: Canonical has decided to pull LXD out from under the Linux Containers entity and instead continue development under the Canonical brand. The majority of the LXD creators and developers have congregated around a fork called Incus. I'll be keeping a close eye on the project and intend to migrate as soon as there's an installable release.

{{< /adm >}}

The benefits of VMs and containers

  • Isolation: you don't want to allow an attacker to infiltrate your email server through your web application; the two should be completely separate from each other and VMs/containers provide strong isolation guarantees.
  • Flexibility: VMs and containers only use the resources they've been given. If you tell the VM it has 200 MB of RAM, it's going to make do with 200 MB of RAM and the kernel's OOM killer is going to have a fun time 🤠
  • Portability: once set up and configured, VMs and containers can mostly be treated as closed boxes; as long as the surrounding environment of the new host is similar to the previous in terms of communication (proxies, web servers, etc.), they can just be picked up and dropped between various hosts as necessary.
  • Density: applications are usually much lighter than the systems they're running on, so it makes sense to run many applications on one system. VMs and containers facilitate that without sacrificing security.
  • Cleanliness: VMs and containers are applications in black boxes. When you're done with the box, you can just throw it away and most everything related to the application is gone.

Virtual machines

As the name suggests, Virtual Machines are all virtual; a hypervisor creates virtual disks for storage, virtual CPUs, virtual NICs, virtual RAM, etc. On top of the virtualised hardware, you have your kernel. This is what facilitates communication between the operating system and the (virtual) hardware. Above that is the operating system and all your applications.

At this point, the stack is quite large; VMs aren't exactly lightweight, and this impacts how densely you can pack the host.

I mentioned a "hypervisor" a minute ago. I've explained what hypervisors in general do, but there are actually two different kinds of hypervisor. They're creatively named Type 1 and Type 2.

Type 1 hypervisors

These run directly in the host kernel without an intermediary OS. A good example would be KVM, a VM hypervisor that runs in the kernel. Type 1 hypervisors can communicate directly with the host's hardware to allocate RAM, issue instructions to the CPU, etc.

hk: Host kernel
hk.h: Type 1 hypervisor
hk.h.k1: Guest kernel
hk.h.k2: Guest kernel
hk.h.k3: Guest kernel
hk.h.k1.os1: Guest OS
hk.h.k2.os2: Guest OS
hk.h.k3.os3: Guest OS
hk.h.k1.os1.app1: Many apps
hk.h.k2.os2.app2: Many apps
hk.h.k3.os3.app3: Many apps
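If you're curious whether your own Linux host can act as the Type 1 hypervisor in that diagram, here's a minimal sketch. It assumes the usual convention that a loaded KVM module exposes a /dev/kvm device node; kvm_status is a helper name I made up for illustration.

```shell
#!/bin/sh
# kvm_status is a hypothetical helper, not part of any tool: it reports
# whether the KVM device node exists. The path can be overridden, which
# mostly helps with testing.
kvm_status() {
	if [ -e "${1:-/dev/kvm}" ]; then
		echo "available"
	else
		echo "unavailable"
	fi
}

echo "KVM: $(kvm_status)"
```

An "unavailable" result usually means the module isn't loaded or hardware virtualisation is switched off in the firmware, but treat that as a hint rather than a diagnosis.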

Type 2 hypervisors

These run in userspace as an application, like VirtualBox. Type 2 hypervisors have to first go through the operating system, adding an additional layer to the stack.

hk: Host kernel
hk.os: Host OS
hk.os.h: Type 2 hypervisor
hk.os.h.k1: Guest kernel
hk.os.h.k2: Guest kernel
hk.os.h.k3: Guest kernel
hk.os.h.k1.os1: Guest OS
hk.os.h.k2.os2: Guest OS
hk.os.h.k3.os3: Guest OS
hk.os.h.k1.os1.app1: Many apps
hk.os.h.k2.os2.app2: Many apps
hk.os.h.k3.os3.app3: Many apps

Containers

VMs use virtualisation to achieve isolation. Containers use namespaces and cgroups, technologies pioneered in the Linux kernel. By now, though, there are equivalents for Windows and possibly other platforms.

Linux namespaces partition kernel resources like process IDs, hostnames, user IDs, directory hierarchies, network access, etc. This prevents one collection of processes from seeing or gaining access to data regarding another collection of processes.
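To make that a bit more concrete, here's a minimal sketch for peeking at namespaces on a Linux box. It assumes /proc is mounted, and ns_id is a helper name I invented for this example. Each namespace has an identifier, and two processes share a namespace exactly when their identifiers match.

```shell
#!/bin/sh
# ns_id is a hypothetical helper, not part of any tool: it prints the
# identifier of one namespace (pid, uts, net, mnt, user, ...) for a process.
# Usage: ns_id <pid> <namespace-type>
ns_id() {
	readlink "/proc/$1/ns/$2"
}

# $$ is the PID of the current shell.
echo "PID namespace: $(ns_id "$$" pid)"
echo "UTS namespace: $(ns_id "$$" uts)"
```

Run it once on the host and once inside a container; the identifiers should differ, which is the kernel's way of saying the two sets of processes live in separate worlds.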

Cgroups limit, track, and isolate the hardware resource use of a collection of processes. If you tell a cgroup that it's only allowed to spawn 500 child processes and someone executes a fork bomb, the fork bomb will expand until it hits that limit. The kernel will prevent it from spawning further children and you'll have to resolve the issue the same way you would with VMs: delete and re-create the container, restore from a good backup, etc. You can also limit CPU use, the number of CPU cores the group can access, RAM, disk use, and so on.
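If you want to see the process limit a cgroup is actually enforcing, something like the following sketch should work. It assumes cgroup v2 mounted at /sys/fs/cgroup, which is the default on most current distributions; cgroup_v2_path is a helper name I made up.

```shell
#!/bin/sh
# cgroup_v2_path is a hypothetical helper: it prints the cgroup v2 path of a
# process by reading a file in /proc/<pid>/cgroup format, where the unified
# (v2) hierarchy shows up as a line of the form "0::<path>".
cgroup_v2_path() {
	awk -F: '$1 == "0" { print $3 }' "${1:-/proc/self/cgroup}"
}

# The pids controller exposes its limit as pids.max ("max" means unlimited).
# The file may be absent, for example in the host's root cgroup or when the
# pids controller isn't enabled for this cgroup.
limit_file="/sys/fs/cgroup$(cgroup_v2_path)/pids.max"
if [ -r "$limit_file" ]; then
	echo "process limit for this cgroup: $(cat "$limit_file")"
else
	echo "no readable pids.max at $limit_file"
fi
```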

Application containers

The most well-known example of application container tech is probably Docker. The goal here is to run a single application as minimally as possible inside each container. In the case of a single, statically-linked Go binary, a minimal Docker container might contain nothing more than the binary. If it's a Python application, you're more likely to use an Alpine Linux image and add your Python dependencies on top of that. If a database is required, that goes in a separate container. If you've got a web server to handle TLS termination and proxy your application, that's a third container. One cohesive system might require many Docker containers to function as intended.

Host kernel.Container runtime.c1: Container
Host kernel.Container runtime.c2: Container
Host kernel.Container runtime.c3: Container

Host kernel.Container runtime.c1.One app
Host kernel.Container runtime.c2.Few apps
Host kernel.Container runtime.c3.Full OS.Many apps

System containers

One of the most well-known examples of system container tech is the subject of this post: LXD! Rather than containing a single application or a very small set of them, system containers are designed to house entire operating systems, like Debian or Rocky Linux, along with everything required for your application. Using our examples from above, a single statically-linked Go binary might run in a full Debian container, just like the Python application might. The database and webserver might go in that same container.

You treat each container more like you would a VM, but you get the performance benefit of not virtualising everything. Containers are much lighter than any virtual machine.

hk: Host kernel
hk.c1: Container
hk.c2: Container
hk.c3: Container
hk.c1.os1: Full OS
hk.c2.os2: Full OS
hk.c3.os3: Full OS
hk.c1.os1.app1: Many apps
hk.c2.os2.app2: Many apps
hk.c3.os3.app3: Many apps

When to use which

These are personal opinions. Please evaluate each technology and determine for yourself whether it's a suitable fit for your environment.

VMs

As far as I'm aware, VMs are your only option when you want to work with esoteric hardware or hardware you don't physically have on-hand. You can tell your VM that it's running with RAM that's 20 years old, a still-in-development RISC-V CPU, and a 420p monitor. That's not possible with containers. VMs are also your only option when you want to work with foreign operating systems: running Linux on Windows, Windows on Linux, or OpenBSD on a Mac all require virtualisation. Another reason to stick with VMs is for compliance purposes. Containers are still very new and some regulatory bodies require virtualisation because it's a decades-old and battle-tested isolation technique.

{{< adm type="note" >}}

See Drew DeVault's blog post "In praise of qemu" for a great use of VMs.

{{< /adm >}}

Application containers

Application containers are particularly popular for microservices and reproducible builds, though I personally think NixOS is a better fit for the latter. App containers are also your only option if you want to use cloud platforms with extreme scaling capabilities like Google Cloud's App Engine standard environment or AWS's Fargate.

Application containers also tend to be necessary when the application you want to self-host is only distributed as a Docker image and the maintainers adamantly refuse to support any other deployment method. This is a massive pet peeve of mine; yes, Docker can make running self-hosted applications easier for inexperienced individuals,[1] but an application orchestration system does not fit in every single environment. By refusing to provide proper "manual" deployment instructions, maintainers of these projects alienate an entire class of potential users and it pisses me off.

Just document your shit.

System containers

Personally, I prefer the workflow of system containers and use them for everything else. Because they contain entire operating systems, you're able to interact with them in a similar way to VMs or even your PC; you shell into one, apt install whatever you need, set up the application, expose it over the network (for example, on 0.0.0.0:8080), proxy it on the container host, and that's it! This process can be trivially automated with shell scripts, Ansible roles, Chef, Puppet, whatever you like.
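As a rough idea of what the shell script route could look like, here's a minimal sketch. The provision and run functions and the DRY_RUN variable are names I invented; only the lxc launch and lxc exec invocations are real LXD commands. It defaults to printing the commands rather than running them, so it's safe to try on a machine without LXD.

```shell
#!/bin/sh
# Sketch only: provision(), run() and DRY_RUN are made up for this example.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

# Usage: provision <container-name> <package>...
provision() {
	name="$1"
	shift
	run lxc launch images:debian/12 "$name"
	run lxc exec "$name" -- apt-get update
	run lxc exec "$name" -- apt-get install -y "$@"
}

provision demo curl vim
```

Set DRY_RUN=0 to run it for real. One caveat: a freshly launched container may need a few seconds before its network is up, so a real script would probably want a short wait or retry before the first apt-get call.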

Crash course to LXD

Quick instructions for installing LXD and setting up your first application.

Installation

{{< adm type="note" >}}

Note: the instructions below say to install LXD using Snap. I personally dislike Snap, but LXD is a Canonical product and they're doing their best to promote it as much as possible. One of the first things the Incus project did was rip out Snap support, so it will eventually be installable as a proper native package.

{{< /adm >}}

  1. Install snap following Canonical's tutorial
    • LXD is natively packaged for Arch and Alpine, but configuration can be a massive headache.
  2. sudo snap install lxd
  3. lxd init
    • Defaults are fine for the most part; you may want to increase the size of the storage pool.
  4. lxc launch images:debian/12 container-name
  5. lxc shell container-name

Usage

As an example of how to use LXD in a real situation, we'll set up my URL shortener. You'll need a VPS with LXD installed and a (sub)domain pointed to the VPS.

Run lxc launch images:debian/12 earl followed by lxc shell earl and apt install curl. Also apt install a text editor, like vim or nano depending on what you're comfortable with. Head to the Installation section of earl's SourceHut page and expand the List of latest binaries. Copy the link to the binary appropriate for your platform, head back to your terminal, type curl -LO, and paste the link you copied. This will download the binary to your system. Run mv <filename> earl to rename it, chmod +x earl to make it executable, then ./earl to execute it. It will create a file called config.yaml that you need to edit before proceeding. Change the accessToken to something else and replace the listen value, 127.0.0.1, with 0.0.0.0. This exposes the application to the host system so we can reverse proxy it.
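If you'd rather not open an editor for the listen change, the swap can be sketched with sed. I'm only assuming that the address appears literally as 127.0.0.1 somewhere in config.yaml; expose_listen is a helper name I made up, and you still need to change the accessToken yourself.

```shell
#!/bin/sh
# expose_listen is a hypothetical helper: it rewrites every 127.0.0.1 in the
# given file to 0.0.0.0, keeping a .bak copy of the original next to it.
expose_listen() {
	sed -i.bak 's/127\.0\.0\.1/0.0.0.0/g' "$1"
}

# Only touch the file if it exists, i.e. after earl's first run created it.
if [ -f config.yaml ]; then
	expose_listen config.yaml
	# Show the lines that now contain the new address.
	grep -n '0\.0\.0\.0' config.yaml
fi
```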

The next step is daemonising it so it runs as soon as the system boots. Edit the file located at /etc/systemd/system/earl.service and paste the following code snippet into it.

[Unit]
Description=personal link shortener
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/root/
ExecStart=/root/earl -c config.yaml

[Install]
WantedBy=multi-user.target

Save, then run systemctl daemon-reload followed by systemctl enable --now earl. You should be able to curl localhost:8275 and see some HTML.

Now we need a reverse proxy on the host. Exit the container with exit or Ctrl+D, and if you have a preferred webserver, install it. If you don't have a preferred webserver yet, I recommend installing Caddy. All that's left is running lxc list, making note of the earl container's IPv4 address, and reverse proxying it. If you're using Caddy, edit /etc/caddy/Caddyfile and replace everything that's there with the following.

<(sub)domain> {
	encode zstd gzip
	reverse_proxy <container IP address>:8275
}

Run systemctl restart caddy and head to whatever domain or subdomain you entered. You should see the home page with just the text earl on it. If you go to /login, you'll be able to enter whatever access token you set earlier and log in.
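If you'd rather not copy the container's address out of the lxc list table by hand, here's a small sketch that pulls it out for you. The --columns and --format flags are real lxc list options; first_ipv4 is a helper name I invented, and I'm assuming the IPv4 column looks like "10.0.0.5 (eth0)".

```shell
#!/bin/sh
# first_ipv4 is a hypothetical helper: it extracts the first IPv4 address
# from text on stdin, such as "10.0.0.5 (eth0)".
first_ipv4() {
	grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# Ask LXD for only the IPv4 column of the earl container, in CSV form.
# Skipped entirely on machines where lxc isn't installed.
if command -v lxc >/dev/null 2>&1; then
	lxc list earl --columns 4 --format csv | first_ipv4
fi
```

Keep in mind that the name argument to lxc list is a filter rather than an exact match, so if you also have a container called earl2, double-check which address you got.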

Executing a fork bomb

I've seen some people say that executing a fork bomb from inside a container is equivalent to executing it on the host. The fork bomb will blow up the whole system and render every application and container you're running inoperable.

That's partially true because LXD by default doesn't put a limit on how many processes a particular container can spawn. You can limit that number yourself by running:

lxc profile set default limits.processes <num-processes>

Any container you create under the default profile will have a total process limit of <num-processes>. I can't tell you what a good process limit is though; you'll need to do some testing and experimentation on your own.
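The same limits.processes key can also be set on a single container instead of the whole profile, which is handy while you're experimenting. Here's a minimal sketch; limit_processes and DRY_RUN are names I made up, and the 500 is just the number from the cgroups example earlier, not a recommendation. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: limit_processes() and DRY_RUN are made up for this example.
# With DRY_RUN=1 (the default here) the lxc command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Usage: limit_processes <container-name> <max-processes>
limit_processes() {
	# Build the command once so the printed and executed forms can't drift.
	set -- lxc config set "$1" limits.processes "$2"
	if [ "$DRY_RUN" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

limit_processes earl 500
```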

Note that this doesn't save you from fork bombs; all it does is prevent an affected container from affecting other containers. If someone executes a fork bomb in a container, it'll be the same as if they executed it in a virtual machine; assuming it's a one-off, you'll need to fix it by rebooting the container. If it was set to run at startup, you'll need to recreate the container, restore from a backup, revert to a snapshot, etc.


  1. Until they need to do anything more complex than pull a newer image. Then it's twice as painful as the "manual" method might have been. ↩︎