---
title: "LXD: Containers for Human Beings"
subtitle: "Docker's great and all, but I prefer the workflow of interacting with VMs"
date: 2023-08-11T16:30:00-04:00
categories:
- Technology
tags:
- Sysadmin
- Containers
- VMs
- Docker
- LXD
draft: true
rss_only: false
cover: ./cover.png
---

This is a blog post version of a talk I presented at both Ubuntu Summit 2022
and SouthEast LinuxFest 2023. The first was not recorded, but the second was,
and the recording is on [SELF's PeerTube instance.][selfpeertube] I apologise
for the terrible audio, but there's unfortunately nothing I can do about that.
If you're already intimately familiar with the core concepts of VMs and
containers, I would suggest skipping those respective sections. If you're only
vaguely familiar with either, I would recommend reading them because I do go a
little bit in-depth.

[selfpeertube]: https://peertube.linuxrocks.online/w/hjiTPHVwGz4hy9n3cUL1mq?start=1m

{{< adm type="warn" >}}

**Note:** Canonical has decided to [pull LXD out][lxd] from under the Linux
Containers entity and instead continue development under the Canonical brand.
The majority of the LXD creators and developers have congregated around a fork
called [Incus.][inc] I'll be keeping a close eye on the project and intend to
migrate as soon as there's an installable release.

[lxd]: https://linuxcontainers.org/lxd/
[inc]: https://linuxcontainers.org/incus/

{{< /adm >}}

## The benefits of VMs and containers

- **Isolation:** you don't want to allow an attacker to infiltrate your email
  server through your web application; the two should be completely separate
  from each other, and VMs/containers provide strong isolation guarantees.
- **Flexibility:** <abbr title="Virtual Machines">VMs</abbr> and containers only
  use the resources they've been given. If you tell the VM it has 200 MB of
  RAM, it's going to make do with 200 MB of RAM and the kernel's <abbr
  title="Out Of Memory">OOM</abbr> killer is going to have a fun time 🤠
- **Portability:** once set up and configured, VMs and containers can mostly be
  treated as closed boxes; as long as the surrounding environment of the new
  host is similar to the previous one in terms of communication (proxies, web
  servers, etc.), they can just be picked up and dropped between various hosts
  as necessary.
- **Density:** applications are usually much lighter than the systems they're
  running on, so it makes sense to run many applications on one system. VMs and
  containers facilitate that without sacrificing security.
- **Cleanliness:** VMs and containers are applications in black boxes. When
  you're done with the box, you can just throw it away and almost everything
  related to the application is gone.

## Virtual machines

As the name suggests, virtual machines are all virtual; a hypervisor creates
virtual disks for storage, virtual <abbr title="Central Processing
Units">CPUs</abbr>, virtual <abbr title="Network Interface Cards">NICs</abbr>,
virtual <abbr title="Random Access Memory">RAM</abbr>, etc. On top of the
virtualised hardware, you have your kernel; this is what facilitates
communication between the operating system and the (virtual) hardware. Above
that is the operating system and all of your applications.

At this point, the stack is quite large; VMs aren't exactly lightweight, and
this impacts how densely you can pack the host.

I mentioned a "hypervisor" a minute ago. I've explained what hypervisors in
general do, but there are actually two different kinds of hypervisor. They're
creatively named **Type 1** and **Type 2**.

### Type 1 hypervisors

These run directly in the host kernel without an intermediary OS. A good
example would be [KVM,][kvm] a **VM** hypervisor that runs in the **K**ernel.
Type 1 hypervisors can communicate directly with the host's hardware to
allocate RAM, issue instructions to the CPU, etc.

[kvm]: https://www.linux-kvm.org
[vb]: https://www.virtualbox.org/

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.h: Type 1 hypervisor
hk.h.k1: Guest kernel
hk.h.k2: Guest kernel
hk.h.k3: Guest kernel
hk.h.k1.os1: Guest OS
hk.h.k2.os2: Guest OS
hk.h.k3.os3: Guest OS
hk.h.k1.os1.app1: Many apps
hk.h.k2.os2.app2: Many apps
hk.h.k3.os3.app3: Many apps
```

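If you want to check whether your own Linux host exposes KVM, here's a quick
sketch (the module names are standard, though your distro may restrict who can
open `/dev/kvm`):

```sh
# See whether the KVM modules are loaded in the host kernel
lsmod | grep kvm

# If this device exists, hypervisors can hand guest instructions to
# the kernel instead of emulating an entire CPU in userspace
ls -l /dev/kvm
```
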
### Type 2 hypervisors

These run in userspace as an application, like [VirtualBox.][vb] Type 2
hypervisors have to first go through the host operating system, adding an
additional layer to the stack.

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.os: Host OS
hk.os.h: Type 2 hypervisor
hk.os.h.k1: Guest kernel
hk.os.h.k2: Guest kernel
hk.os.h.k3: Guest kernel
hk.os.h.k1.os1: Guest OS
hk.os.h.k2.os2: Guest OS
hk.os.h.k3.os3: Guest OS
hk.os.h.k1.os1.app1: Many apps
hk.os.h.k2.os2.app2: Many apps
hk.os.h.k3.os3.app3: Many apps
```

## Containers

VMs use virtualisation to achieve isolation. Containers use **namespaces** and
**cgroups**, technologies pioneered in the Linux kernel. By now, though, there
are [equivalents for Windows] and possibly other platforms.

[equivalents for Windows]: https://learn.microsoft.com/en-us/virtualization/community/team-blog/2017/20170127-introducing-the-host-compute-service-hcs

**[Linux namespaces]** partition kernel resources like process IDs, hostnames,
user IDs, directory hierarchies, network access, etc. This prevents one
collection of processes from seeing or gaining access to data regarding another
collection of processes.

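You can play with namespaces without any container runtime at all. Here's a
minimal sketch using the `unshare` tool from util-linux (purely an
illustration, not something from the talk):

```sh
# Start a shell in new PID and mount namespaces. --fork makes the
# shell PID 1 inside its namespace, and --mount-proc mounts a /proc
# that only shows this namespace's processes.
sudo unshare --pid --fork --mount-proc bash

# Inside, ps sees only this shell and ps itself; the host's other
# processes are invisible.
ps aux
```
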
**[Cgroups]** limit, track, and isolate the hardware resource use of a
collection of processes. If you tell a cgroup that it's only allowed to spawn
500 child processes and someone executes a fork bomb, the fork bomb will expand
until it hits that limit. The kernel will prevent it from spawning further
children and you'll have to resolve the issue the same way you would with a VM:
delete and re-create it, restore from a good backup, etc. You can also limit
CPU use, the number of CPU cores a cgroup can access, RAM, disk use, and so on.

[Linux namespaces]: https://en.wikipedia.org/wiki/Linux_namespaces
[Cgroups]: https://en.wikipedia.org/wiki/Cgroups

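The 500-process example maps directly onto the cgroup v2 `pids` controller. A
hedged sketch, assuming cgroup v2 is mounted at `/sys/fs/cgroup` and the
`pids` controller is enabled there (true on most modern distros):

```sh
# Create a cgroup and cap it at 500 processes
sudo mkdir /sys/fs/cgroup/demo
echo 500 | sudo tee /sys/fs/cgroup/demo/pids.max

# Move the current shell into the cgroup; every process it spawns
# from now on counts against the limit
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs

# A fork bomb run from this shell now stalls at 500 processes
# instead of taking down the whole host
```
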
### Application containers

The most well-known example of application container tech is probably
[Docker.][docker] The goal here is to run a single application as minimally as
possible inside each container. In the case of a single, statically-linked Go
binary, a minimal Docker container might contain nothing more than the binary.
If it's a Python application, you're more likely to use an [Alpine Linux image]
and add your Python dependencies on top of that. If a database is required, that
goes in a separate container. If you've got a web server to handle TLS
termination and proxy your application, that's a third container. One cohesive
system might require many Docker containers to function as intended.

[docker]: https://docker.com/
[Alpine Linux image]: https://hub.docker.com/_/alpine

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
Host kernel.Container runtime.c1: Container
Host kernel.Container runtime.c2: Container
Host kernel.Container runtime.c3: Container

Host kernel.Container runtime.c1.One app
Host kernel.Container runtime.c2.Few apps
Host kernel.Container runtime.c3.Full OS.Many apps
```

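As a sketch of what that composition might look like on the command line (the
`myapp` names and the `my-python-app` image are hypothetical stand-ins):

```sh
# Three single-purpose containers that together form one system
docker network create myapp
docker run --detach --network myapp --name db postgres
docker run --detach --network myapp --name app my-python-app
docker run --detach --network myapp --name proxy --publish 443:443 nginx
```
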
### System containers

One of the most well-known examples of system container tech is the subject of
this post: LXD! Rather than containing a single application or a very small set
of them, system containers are designed to house entire operating systems, like
[Debian] or [Rocky Linux,][rocky] along with everything required for your
application. Using our examples from above, a single statically-linked Go
binary might run in a full Debian container, just like the Python application
might. The database and web server might go in _that same_ container.

[Debian]: https://www.debian.org/
[rocky]: https://rockylinux.org/

You treat each container more like you would a VM, but you get the performance
benefit of _not_ virtualising everything. Containers are _much_ lighter than
any virtual machine.

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.c1: Container
hk.c2: Container
hk.c3: Container
hk.c1.os1: Full OS
hk.c2.os2: Full OS
hk.c3.os3: Full OS
hk.c1.os1.app1: Many apps
hk.c2.os2.app2: Many apps
hk.c3.os3.app3: Many apps
```

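For contrast with the Docker sketch above, the same three-component system
could live in a single container (the names are again hypothetical; the `lxc`
commands themselves are covered in the crash course below):

```sh
# One Debian container holding the app, its database, and the web server
lxc launch images:debian/11 myapp
lxc exec myapp -- apt install --yes postgresql nginx

# Push the hypothetical Go binary into the container and everything
# runs side by side under one init system
lxc file push ./my-go-app myapp/usr/local/bin/my-go-app
```
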
## When to use which

{{< adm type="warn" >}}
**Warning:** this is my personal opinion. Please evaluate each technology and
determine for yourself whether it's a suitable fit for your environment.
{{< /adm >}}

As far as I'm aware, VMs are your only option when you want to work with
esoteric hardware or with hardware you don't physically have on-hand. They're
also your only option when you want to work with foreign operating systems:
running Linux on Windows, Windows on Linux, or OpenBSD on a Mac all require
virtualisation.[^1] Another reason to stick with VMs is compliance: containers
are still very new, and some regulatory bodies require virtualisation because
it's a decades-old and battle-tested isolation technique.

{{< adm type="note" >}}
See Drew DeVault's blog post [_In praise of qemu_][qemu] for a great use of
VMs.

[qemu]: https://drewdevault.com/2022/09/02/2022-09-02-In-praise-of-qemu.html
{{< /adm >}}

Application containers are particularly popular for [microservices] and
[reproducible builds,][repb] though I personally think [NixOS] is a better fit
for the latter. App containers are also your only option if you want to use
cloud platforms with extreme scaling capabilities, like Google Cloud's App
Engine standard environment or AWS's Fargate. The same goes for apps that are
_only_ distributed as Docker containers, where the maintainers adamantly refuse
to support any other deployment method (though Docker does run in LXD 😉).

System containers are the right choice for pretty much anything not listed
above 👍

[microservices]: https://en.wikipedia.org/wiki/Microservices
[repb]: https://en.wikipedia.org/wiki/Reproducible_builds
[NixOS]: https://nixos.org/

## Crash course to LXD

### Installation

{{< adm type="note" >}}

**Note:** the instructions below install LXD using [Snap.][snap] I personally
dislike Snap, but LXD is a Canonical product and they're doing their best to
promote it as much as possible. One of the first things the Incus project did
was [rip out Snap support,][rsnap] so it will eventually be installable as a
proper native package.

[snap]: https://en.wikipedia.org/wiki/Snap_(software)
[rsnap]: https://github.com/lxc/incus/compare/9579f65cd0f215ecd847e8c1cea2ebe96c56be4a...3f64077a80e028bb92b491d42037124e9734d4c7

{{< /adm >}}

1. Install Snap by following [Canonical's tutorial](https://earl.run/ZvUK)
   - LXD is natively packaged for Arch and Alpine, but configuration can be a
     massive headache.
2. `sudo snap install lxd`
3. `lxd init`
4. `lxc image copy images:debian/11 local: --alias deb-11`
5. `lxc launch deb-11 container-name`
6. `lxc shell container-name`

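At this point you should have a root shell inside a running Debian container.
A quick sanity check from the host, plus a taste of the resource limits
discussed in the cgroups section (`container-name` is whatever you chose in
step 5):

```sh
# Confirm the container is running and has an address
lxc list

# Resource limits map onto the cgroup knobs discussed earlier
lxc config set container-name limits.memory 200MiB
lxc config set container-name limits.processes 500
```
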
### Usage

{install my URL shortener}

[^1]: Docker containers on Windows and macOS actually run in a Linux VM.