---
title: "LXD: Containers for Human Beings"
subtitle: "Docker's great and all, but I prefer the workflow of interacting with VMs"
date: 2023-09-17T18:04:00-04:00
categories:
- Technology
tags:
- Sysadmin
- Containers
- VMs
- Docker
- LXD
draft: true
toc: true
rss_only: false
cover: ./cover.png
---
This is a blog post version of a talk I presented at both Ubuntu Summit 2022 and
SouthEast LinuxFest 2023. The first was not recorded, but the second was and is
on [SELF's PeerTube instance.][selfpeertube] I apologise for the terrible audio,
but there's unfortunately nothing I can do about that. If you're already
intimately familiar with the core concepts of VMs or containers, I would suggest
skipping those respective sections. If you're vaguely familiar with either, I
would recommend reading them because I do go a little bit in-depth.

[selfpeertube]: https://peertube.linuxrocks.online/w/hjiTPHVwGz4hy9n3cUL1mq?start=1m
{{< adm type="warn" >}}
**Note:** Canonical has decided to [pull LXD out][lxd] from under the Linux
Containers entity and instead continue development under the Canonical brand.
The majority of the LXD creators and developers have congregated around a fork
called [Incus.][inc] I'll be keeping a close eye on the project and intend to
migrate as soon as there's an installable release.

[lxd]: https://linuxcontainers.org/lxd/
[inc]: https://linuxcontainers.org/incus/
{{< /adm >}}
Questions, comments, and corrections are welcome! Feel free to use the
self-hosted comment system at the bottom, send me an email, an IM, reply to the
fediverse post, etc. Edits and corrections, if there are any, will be noted just
below this paragraph.
## The benefits of VMs and containers
- **Isolation:** you don't want to allow an attacker to infiltrate your email
server through your web application; the two should be completely separate
from each other and VMs/containers provide strong isolation guarantees.
- **Flexibility:** <abbr title="Virtual Machines">VMs</abbr> and containers only
use the resources they've been given. If you tell the VM it has 200 MB of
RAM, it's going to make do with 200 MB of RAM and the kernel's <abbr
title="Out Of Memory">OOM</abbr> killer is going to have a fun time 🤠
- **Portability:** once set up and configured, VMs and containers can mostly be
treated as closed boxes; as long as the surrounding environment of the new
host is similar to the previous in terms of communication (proxies, web
servers, etc.), they can just be picked up and dropped between various hosts
as necessary.
- **Density:** applications are usually much lighter than the systems they're
running on, so it makes sense to run many applications on one system. VMs and
containers facilitate that without sacrificing security.
- **Cleanliness:** VMs and containers are applications in black boxes. When
you're done with the box, you can just throw it away and most everything
related to the application is gone.
## Virtual machines
As the name suggests, Virtual Machines are all virtual; a hypervisor creates
virtual disks for storage, virtual <abbr title="Central Processing
Units">CPUs</abbr>, virtual <abbr title="Network Interface Cards">NICs</abbr>,
virtual <abbr title="Random Access Memory">RAM</abbr>, etc. On top of the
virtualised hardware, you have your kernel. This is what facilitates
communication between the operating system and the (virtual) hardware. Above
that is the operating system and all your applications.

At this point, the stack is quite large; VMs aren't exactly lightweight, and
this impacts how densely you can pack the host.

I mentioned a "hypervisor" a minute ago. I've explained what hypervisors in
general do, but there are actually two different kinds of hypervisor. They're
creatively named **Type 1** and **Type 2**.
### Type 1 hypervisors
These run directly in the host kernel without an intermediary OS. A good example
would be [KVM,][kvm] a **VM** hypervisor that runs in the **K**ernel. Type 1
hypervisors can communicate directly with the host's hardware to allocate RAM,
issue instructions to the CPU, etc.

[debian]: https://debian.org
[kvm]: https://www.linux-kvm.org
[vb]: https://www.virtualbox.org/
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.h: Type 1 hypervisor
hk.h.k1: Guest kernel
hk.h.k2: Guest kernel
hk.h.k3: Guest kernel
hk.h.k1.os1: Guest OS
hk.h.k2.os2: Guest OS
hk.h.k3.os3: Guest OS
hk.h.k1.os1.app1: Many apps
hk.h.k2.os2.app2: Many apps
hk.h.k3.os3.app3: Many apps
```
### Type 2 hypervisors
These run in userspace as an application, like [VirtualBox.][vb] Type 2
hypervisors have to first go through the operating system, adding an additional
layer to the stack.
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.os: Host OS
hk.os.h: Type 2 hypervisor
hk.os.h.k1: Guest kernel
hk.os.h.k2: Guest kernel
hk.os.h.k3: Guest kernel
hk.os.h.k1.os1: Guest OS
hk.os.h.k2.os2: Guest OS
hk.os.h.k3.os3: Guest OS
hk.os.h.k1.os1.app1: Many apps
hk.os.h.k2.os2.app2: Many apps
hk.os.h.k3.os3.app3: Many apps
```
## Containers
VMs use virtualisation to achieve isolation. Containers use **namespaces** and
**cgroups**, technologies pioneered in the Linux kernel. By now, though, there
are [equivalents for Windows] and possibly other platforms.

[equivalents for Windows]: https://learn.microsoft.com/en-us/virtualization/community/team-blog/2017/20170127-introducing-the-host-compute-service-hcs

**[Linux namespaces]** partition kernel resources like process IDs, hostnames,
user IDs, directory hierarchies, network access, etc. This prevents one
collection of processes from seeing or gaining access to data regarding another
collection of processes.
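For a quick taste of what that looks like, here's a rough sketch using the
`unshare` tool from util-linux (not something covered in the talk, just an
illustration):

```bash
# Start a shell in new PID and mount namespaces; --mount-proc remounts /proc
# so process-listing tools only see what exists inside the new namespace.
sudo unshare --pid --fork --mount-proc bash

# Inside that shell, `ps aux` shows only bash and ps themselves,
# with bash running as PID 1.
ps aux
```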
**[Cgroups]** limit, track, and isolate the hardware resource use of a
collection of processes. If you tell a cgroup that it's only allowed to spawn
500 child processes and someone executes a fork bomb, the fork bomb will expand
until it hits that limit. The kernel will prevent it from spawning further
children and you'll have to resolve the issue the same way you would with VMs:
delete and re-create it, restore from a good backup, etc. You can also limit CPU
use, the number of CPU cores it can access, RAM, disk use, and so on.

[Linux namespaces]: https://en.wikipedia.org/wiki/Linux_namespaces
[Cgroups]: https://en.wikipedia.org/wiki/Cgroups
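To make the cgroup side concrete, here's a rough sketch of doing it by hand
with the pids controller on a cgroup v2 system. The cgroup name is made up, and
depending on your distribution you may first need to enable the controller with
`echo +pids > /sys/fs/cgroup/cgroup.subtree_control`:

```bash
# Create a cgroup, cap it at 500 processes, and move the current shell into it.
sudo mkdir /sys/fs/cgroup/demo
echo 500 | sudo tee /sys/fs/cgroup/demo/pids.max
echo $$  | sudo tee /sys/fs/cgroup/demo/cgroup.procs

# Everything spawned from this shell now counts against the 500-process cap,
# so a fork bomb run here stalls at the limit instead of taking the host down.
```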
### Application containers
The most well-known example of application container tech is probably
[Docker.][docker] The goal here is to run a single application as minimally as
possible inside each container. In the case of a single, statically-linked Go
binary, a minimal Docker container might contain nothing more than the binary.
If it's a Python application, you're more likely to use an [Alpine Linux image]
and add your Python dependencies on top of that. If a database is required, that
goes in a separate container. If you've got a web server to handle TLS
termination and proxy your application, that's a third container. One cohesive
system might require many Docker containers to function as intended.

[docker]: https://docker.com/
[Alpine Linux image]: https://hub.docker.com/_/alpine
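As a rough sketch of what that separation looks like in practice (the image
name `ghcr.io/example/myapp` and the network name are placeholders rather than
a real deployment):

```bash
# One cohesive "system", three single-purpose containers on a shared network.
docker network create myapp-net
docker run -d --name myapp-db    --network myapp-net postgres:16-alpine
docker run -d --name myapp       --network myapp-net ghcr.io/example/myapp:latest
docker run -d --name myapp-proxy --network myapp-net -p 80:80 -p 443:443 caddy:2-alpine
```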
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
Host kernel.Container runtime.c1: Container
Host kernel.Container runtime.c2: Container
Host kernel.Container runtime.c3: Container
Host kernel.Container runtime.c1.One app
Host kernel.Container runtime.c2.Few apps
Host kernel.Container runtime.c3.Full OS.Many apps
```
### System containers
One of the most well-known examples of system container tech is the subject of
this post: LXD! Rather than containing a single application or a very small set
of them, system containers are designed to house entire operating systems, like
[Debian] or [Rocky Linux,][rocky] along with everything required for your
application. Using our examples from above, a single statically-linked Go binary
might run in a full Debian container, just like the Python application might.
The database and webserver might go in _that same_ container.

[Debian]: https://www.debian.org/
[rocky]: https://rockylinux.org/

You treat each container more like you would a VM, but you get the performance
benefit of _not_ virtualising everything. Containers tend to be _much_ lighter
than most VMs.[^1]
```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.c1: Container
hk.c2: Container
hk.c3: Container
hk.c1.os1: Full OS
hk.c2.os2: Full OS
hk.c3.os3: Full OS
hk.c1.os1.app1: Many apps
hk.c2.os2.app2: Many apps
hk.c3.os3.app3: Many apps
```
## When to use which
These are personal opinions. Please evaluate each technology and determine for
yourself whether it's a suitable fit for your environment.
### VMs
As far as I'm aware, VMs are your only option when you want to work with
esoteric hardware or hardware you don't physically have on-hand. You can tell
your VM that it's running with RAM that's 20 years old, a still-in-development
RISC-V CPU, and a 420p monitor. That's not possible with containers. VMs are
also your only option when you want to work with foreign operating systems:
running Linux on Windows, Windows on Linux, or OpenBSD on a Mac all require
virtualisation. Another reason to stick with VMs is for compliance purposes.
Containers are still very new and some regulatory bodies require virtualisation
because it's a decades-old and battle-tested isolation technique.
{{< adm type="note" >}}
See Drew DeVault's blog post [_In praise of qemu_][qemu] for a great use of VMs.

[qemu]: https://drewdevault.com/2022/09/02/2022-09-02-In-praise-of-qemu.html
{{< /adm >}}
### Application containers
Application containers are particularly popular for [microservices] and
[reproducible builds,][repb] though I personally think [NixOS] is a better fit
for the latter. App containers are also your only option if you want to use
cloud platforms with extreme scaling capabilities like Google Cloud's App Engine
standard environment or AWS's Fargate.

[microservices]: https://en.wikipedia.org/wiki/Microservices
[repb]: https://en.wikipedia.org/wiki/Reproducible_builds
[NixOS]: https://nixos.org/
Application containers also tend to be necessary when the application you want
to self-host is _only_ distributed as a Docker image and the maintainers
adamantly refuse to support any other deployment method. This is a _massive_ pet
peeve of mine; yes, Docker can make running self-hosted applications easier for
inexperienced individuals,[^2] but an application orchestration system _does
not_ fit in every single environment. By refusing to provide proper "manual"
deployment instructions, maintainers of these projects alienate an entire class
of potential users and it pisses me off.

Just document your shit.
### System containers
Personally, I prefer the workflow of system containers and use them for
everything else. Because they contain entire operating systems, you're able to
interact with them in a similar way to VMs or even your PC; you shell into one,
`apt install` whatever you need, set up the application, expose it over the
network (for example, on `0.0.0.0:8080`), proxy it on the container host, and
that's it! This process can be trivially automated with shell scripts, Ansible
roles, Chef, Puppet, whatever you like. Back the system up using [tarsnap],
[rsync.net], [Backblaze,][bb] Google Drive, and [restic.][restic] If you use
ZFS for your LXD storage pool, maybe go with [syncoid and sanoid.][ss]

[tarsnap]: https://www.tarsnap.com/
[rsync.net]: https://rsync.net/
[bb]: https://www.backblaze.com/
[restic]: https://restic.net/
[ss]: https://github.com/jimsalterjrs/sanoid
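As a sketch of what that automation might look like with nothing more than a
shell script (the container name and package are placeholders):

```bash
# Create a Debian container and set an application up inside it from the host.
lxc launch images:debian/12 myapp
lxc exec myapp -- apt-get update
lxc exec myapp -- apt-get install -y nginx
# ...copy in configuration, enable the service, then reverse proxy it from the host.
lxc list myapp
```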
My point is that using system containers doesn't mean throwing out the last few
decades of systems knowledge and wisdom.
## Crash course to LXD
Quick instructions for installing LXD and setting up your first application.
### Installation
{{< adm type="note" >}}
**Note:** the instructions below say to install LXD using [Snap.][snap] I
personally dislike Snap, but LXD is a Canonical product and they're doing their
best to promote it as much as possible. One of the first things the Incus
project did was [rip out Snap support,][rsnap] so it will eventually be
installable as a proper native package.

[snap]: https://en.wikipedia.org/wiki/Snap_(software)
[rsnap]: https://github.com/lxc/incus/compare/9579f65cd0f215ecd847e8c1cea2ebe96c56be4a...3f64077a80e028bb92b491d42037124e9734d4c7
{{< /adm >}}
1. Install snap following [Canonical's tutorial](https://earl.run/ZvUK)
   - LXD is natively packaged for Arch and Alpine, but configuration can be a
     massive headache.
2. `sudo snap install lxd`
3. `lxd init`
   - Defaults are fine for the most part; you may want to increase the size of
     the storage pool. A non-interactive preseed sketch follows this list.
4. `lxc launch images:debian/12 container-name`
5. `lxc shell container-name`
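If you'd rather skip the interactive questions, `lxd init` also accepts a
preseed file on stdin. The sketch below is based on common defaults and is not
guaranteed to match your environment; adjust the storage driver, pool size, and
bridge settings to suit:

```bash
cat <<'EOF' | lxd init --preseed
networks:
- name: lxdbr0
  type: bridge
  config:
    ipv4.address: auto
    ipv6.address: auto
storage_pools:
- name: default
  driver: zfs
  config:
    size: 30GiB
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      network: lxdbr0
      type: nic
    root:
      path: /
      pool: default
      type: disk
EOF
```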
### Usage
As an example of how to use LXD in a real situation, we'll set up [my URL
shortener.][earl] You'll need a VPS with LXD installed and a (sub)domain pointed
to the VPS.
Run `lxc launch images:debian/12 earl` followed by `lxc shell earl` and `apt
install curl`. Also `apt install` a text editor, like `vim` or `nano` depending
on what you're comfortable with. Head to the **Installation** section of [earl's
SourceHut page][earl] and expand the **List of latest binaries**. Copy the link
to the binary appropriate for your platform, head back to your terminal, type
`curl -LO`, and paste the link you copied. This will download the binary to your
system. Run `mv <filename> earl` to rename it, `chmod +x earl` to make it
executable, then `./earl` to execute it. It will create a file called
`config.yaml` that you need to edit before proceeding. Change the `accessToken`
to something else and replace the `listen` value, `127.0.0.1`, with `0.0.0.0`.
This exposes the application to the host system so we can reverse proxy it.

[earl]: https://earl.run/source
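Put together, the session inside the container might look something like this
(the download URL and filename are placeholders; use the link you copied from
the SourceHut page):

```bash
# Inside the earl container.
apt install curl nano
curl -LO https://example.com/earl-linux-amd64   # paste the real binary link here
mv earl-linux-amd64 earl
chmod +x earl
./earl             # first run creates config.yaml
nano config.yaml   # change accessToken and set listen to 0.0.0.0
```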
The next step is daemonising it so it runs as soon as the system boots. Create a
file at `/etc/systemd/system/earl.service` and paste the following code snippet
into it.
```ini
[Unit]
Description=personal link shortener
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/root/
ExecStart=/root/earl -c config.yaml

[Install]
WantedBy=multi-user.target
```
Save, then run `systemctl daemon-reload` followed by `systemctl enable --now
earl`. You should be able to `curl localhost:8275` and see some HTML.
Now we need a reverse proxy on the host. Exit the container with `exit` or
`Ctrl+D`, and if you have a preferred webserver, install it. If you don't have a
preferred webserver yet, I recommend [installing Caddy.][caddy] All that's left
is running `lxc list`, making note of the `earl` container's `IPv4` address, and
reverse proxying it. If you're using Caddy, edit `/etc/caddy/Caddyfile` and
replace everything that's there with the following.

[caddy]: https://caddyserver.com/docs/install
```text
<(sub)domain> {
    encode zstd gzip
    reverse_proxy <container IP address>:8275
}
```
Run `systemctl restart caddy` and head to whatever domain or subdomain you
entered. You should see the home page with just the text `earl` on it. If you go
to `/login`, you'll be able to enter whatever access token you set earlier and
log in.
### Further tips
One of the things you might want to do post-installation is mess around with
profiles. There's a `default` profile in LXD that you can show with `lxc profile
show default`.
``` text
$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by: []
```
Not all config options are listed here though; you'll need to read [the
documentation] for a full enumeration.

[the documentation]: https://documentation.ubuntu.com/lxd/en/latest/config-options/
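As a sketch of how you might build on the default profile (the profile name and
limits below are arbitrary examples):

```bash
# Copy the default profile, add some resource limits to the copy, and use it.
lxc profile copy default limited
lxc profile set limited limits.memory 512MiB
lxc profile set limited limits.cpu 2
lxc launch images:debian/12 test --profile limited
lxc config show test --expanded   # confirm the limits were inherited
```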
I've seen some people say that executing a fork bomb from inside a container is
equivalent to executing it on the host. The fork bomb will blow up the whole
system and render every application and container you're running inoperable.
That's partially true because LXD _by default_ doesn't put a limit on how many
processes a particular container can spawn. You can limit that number yourself
by running
```text
lxc profile set default limits.processes <num-processes>
```
Any container you create under the `default` profile will have a total process
limit of `<num-processes>`. I can't tell you what a good process limit is
though; you'll need to do some testing and experimentation on your own.
As stated in [the containers section,](#containers) this doesn't _save_ you from
fork bombs. It just helps prevent a fork bomb from affecting the host OS or
other containers.
[^1]:
    There's a [technical
    publication](https://dl.acm.org/doi/10.1145/3132747.3132763) indicating that
    specialised VMs with unikernels can be far lighter and more secure than
    containers.
[^2]:
    Until they need to do _anything_ more complex than pull a newer image. Then
    it's twice as painful as the "manual" method might have been.