secluded/content/posts/replacing-youtube-invidious.md

284 lines
12 KiB
Markdown

---
title: "Replacing YouTube & Invidious"
description: "Simple script I created to download YouTube videos"
cover: /assets/pngs/download-video.png
date: 2020-08-03T05:43:37-04:00
toc: true
categories:
- Technology
tags:
- YouTube
- Invidious
- youtube-dl
- Media
- Automation
---
Omar Roth, the developer of Invidious, recently wrote a blog post about
*[Stepping away from open source](https://omar.yt/posts/stepping-away-from-open-source)*.
While I never used the official instance, I thought this was a good
opportunity to create a tool that downloads videos from YouTubers I'm
subscribed to so I can watch them offline in whatever manner I prefer.
To that end, [youtube-dl](https://github.com/ytdl-org/youtube-dl) is by
far the most reliable and versatile option. Having been around since
before 2008[^1], I don't think the project is going anywhere.
[MPV](https://mpv.io/) is my media player of choice and it relies on
youtube-dl for watching online content from Twitch to YouTube to
PornHub to [much more](https://ytdl-org.github.io/youtube-dl/supportedsites.html).
Conveniently, youtube-dl comes with all of the tools and flags needed
for exactly this purpose so scripting and automating it is incredibly
easy. Taking a look at `man youtube-dl` reveals a *plethora* of options
but only a few are necessary.
## Preventing duplicates
The main thing when downloading an entire channel is ensuring the same
video isn't downloaded more than once. `--download-archive` will save the
IDs of all the videos downloaded to prevent them from being downloaded
again. All that's required is a file path at which to store the list.
```bash
--download-archive .archives/<channel>.txt
```
## Selecting quality
By default, youtube-dl downloads the *single* file with the best quality
audio and video so it doesn't need to mux[^2] them together after.
However, I prefer to have the highest possibly quality and don't mind
waiting a few seconds longer for FFmpeg to combine audio and video
files. `--format` lets the user decide which format they prefer and
whether they want to focus on storage efficiency or quality. The basic
options are `best`, `worst`, `bestvideo`, and `bestaudio`. These will
download a *single* file that is the highest or lowest quality of both
video/audio or video-only/audio-only. There are a lot of other options
for more fine-grained control over what to look for but, as I mentioned,
I want the highest video and the highest quality audio so I use
`bestvideo+bestaudio`. After downloading both of those files, they will
be muxed together.
```bash
--format bestvideo+bestaudio
```
## Embedding subtitles
I enjoy subtitles but I know many people don't so this section can
certainly be ignored.
There are a few options for fetching and embedding subtitles. To get
"real" subtitles that someone transcribed manually, use `--write-sub`.
For YouTube's auto-generated subtitles that may or may not be horribly
inaccurate, use `--write-auto-sub`. For selecting the language,
`--sub-lang <language-code>`, for embedding them, `--embed-subs`, and
for selecting the format, `--sub-format <desired-format>`. I like using
SRT files so I have that first with `best` beside it like so:
`--sub-format srt/best`. SRT is preferred but, if unavailable, whatever
other highest-quality format will be used and embedded instead.
```bash
--write-sub --write-auto-sub --sub-format srt/best --sub-lang en --embed-subs
```
## Limiting downloads
When switching to this method, the initial download will pull
*all* of a channel's videos. I certainly don't want this so they should
be limited in some way. `--dateafter` and `--playlist-end` serve very
nicely. The former will only download videos published *after* a certain
date and the latter will only download X number of videos.
```bash
--dateafter 20200801 --playlist-end 5
```
**EDIT:** A reader sent me an email with this improvement to the archive
functionality. Rather than checking the last five days of videos to see if
they've already been downloaded, this snippet will check when the archive file
was last edited and use that as the `--dateafter` parameter, making the script a
bit more efficient.
``` bash
AFTER=$(date -r .archives/"$1".txt +%Y%m%d 2>/dev/null || date +%Y%m%d)
--dateafter $AFTER --playlist-end 5
```
## Naming format
The final parameter to look at is how to name the files once they're
downloaded. `--output` provides templating functionality and there are a
*lot* of options. For this use, an acceptable template might be
something like `Channel/Title.ext`. In youtube-dl's templating format,
that's `"%(uploader)s/%(title)s.%(ext)s"`.
```bash
--output "%(uploader)s/%(title)s.%(ext)s"
```
## Getting notifications
I don't yet have a good method for getting notifications when there are
*new* videos but there is a simple way to get notified when the script
is finished running. `notify-send` is one of the easiest and has pretty
simple syntax as well: the first string is the notification summary and
the second is a longer description. You can optionally pass an icon
name to make it look a little better.
```bash
notify-send -i video-x-generic "Downloads finished" "Check the YouTube folder for new videos"
```
For some reason, I don't get icons when the generic name is specified
but I know the command will work on most systems. On mine, I have to
pass the path to the icon file I want: `-i /usr/share/icons/Suru++-Dark/apps/64/video.svg`
## Writing the script
I want to store the videos in `~/Videos/YouTube` and I want the archive
records stored in `.archives` so the first line (after the shebang[^3])
creates those directories if they don't already exist and the second
enters the `YouTube` folder.
```bash
mkdir -p "$HOME/Videos/YouTube/.archives"
cd "$HOME/Videos/YouTube"
```
From here, a way to reuse the youtube-dl command is necessary so
parameters can be changed in one place and they'll apply to all
channels. Functions are intended for exactly this purpose and are
formatted like so:
```bash
functionName () {
# Code here
}
```
I've named the function `dl` so mine looks like this:
```bash
dl () {
AFTER=$(date -r .archives/"$1".txt +%Y%m%d 2>/dev/null || \
date +%Y%m%d)
youtube-dl --download-archive .archives/"$1".txt -f \
bestvideo+bestaudio --dateafter 20200801 --write-sub \
--write-auto-sub --sub-format srt/best --sub-lang en \
--embed-subs -o "%(uploader)s/%(title)s.%(ext)s" \
--playlist-end 5 "$2"
sleep 5
}
```
Because it's in a function that's called repeatedly, the `AFTER` variable will
be reevaluated each time using a different archive file to ensure no videos are
missed. The backslashes at the end (`\`) tell bash that it's a single command
spanning multiple lines. At the bottom, `sleep` just waits 5 seconds before
downloading the next channel. It's unlikely that YouTube will ratelimit a
residential address for this but it is still possible. Waiting a bit before
continuing reduces the likelihood further.
Note the use of `"$1"` and `"$2"` in the archive path and at the very
end of the youtube-dl command. This lets the user define what the archive file
should be named and what channel to download videos from. A line using
the function would be something like:
```bash
dl linustechtips https://www.youtube.com/user/LinusTechTips
```
The result would be this directory structure:
```
YouTube
├── .archives
│   └── linustechtips.txt
└── Linus Tech Tips
└── They still make MP3 players.mkv
```
The last line is the notification command:
```bash
notify-send -i video-x-generic "Downloads finished" "Check the YouTube folder for new videos"
```
## Finished script
This the script I have in use right now.
```bash
#!/bin/bash
mkdir -p "$HOME/Videos/YouTube/.archives"
cd "$HOME/Videos/YouTube"
dl () {
AFTER=$(date -r .archives/"$1".txt +%Y%m%d 2>/dev/null || \
date +%Y%m%d)
youtube-dl --download-archive .archives/"$1".txt -f \
bestvideo+bestaudio --dateafter $AFTER --write-sub \
--write-auto-sub --sub-format srt/best --sub-lang en \
--embed-subs -o "%(uploader)s/%(title)s.%(ext)s" \
--playlist-end 5 "$2"
sleep 5
}
dl ding https://www.youtube.com/channel/UClq42foiSgl7sSpLupnugGA
dl meute https://www.youtube.com/channel/UCY3cAFsquIk7VGMuk-V8S3g
dl vsauce https://www.youtube.com/user/Vsauce
dl ardour https://www.youtube.com/channel/UCqeg5vkTkH-DYxmOO9FJOHA
dl pewdiepie https://www.youtube.com/user/PewDiePie
dl techaltar https://www.youtube.com/channel/UCtZO3K2p8mqFwiKWb9k7fXA
dl avikaplan https://www.youtube.com/user/AviKaplanMusic
dl lukesmith https://www.youtube.com/channel/UC2eYFnH61tmytImy1mTYvhA
dl techlinked https://www.youtube.com/c/techlinked/
dl robscallon https://www.youtube.com/user/robs70986987
dl setheverman https://www.youtube.com/user/SethEverman
dl logosbynick https://www.youtube.com/channel/UCEQXp_fcqwPcqrzNtWJ1w9w
dl techquickie https://www.youtube.com/user/Techquickie
dl yanntiersen https://www.youtube.com/user/YannTiersenOfficial
dl andrewhuang https://www.youtube.com/user/songstowearpantsto
dl aurahandpan https://www.youtube.com/user/Jantzulu
dl jamesveitch https://www.youtube.com/user/james948
dl brandonacker https://www.youtube.com/user/brandonacker
dl unboxtherapy https://www.youtube.com/user/unboxtherapy
dl linustechtips https://www.youtube.com/user/LinusTechTips
dl michaelreeves https://www.youtube.com/channel/UCtHaxi4GTYDpJgMSGy7AeSw
dl countrysquire https://www.youtube.com/channel/UCdrw_DN_OmFIic0gTZJDVCQ
dl roomieofficial https://www.youtube.com/user/RoomieOfficial
dl fridaycheckout https://www.youtube.com/channel/UCRG_N2uO405WO4P3Ruef9NA
dl lastweektonight https://www.youtube.com/user/LastWeekTonight
dl bingingwithbabish https://www.youtube.com/user/bgfilms
notify-send --icon /usr/share/icons/Suru++-Dark/apps/64/video.svg "Downloads finished" "Check the YouTube folder for new videos"
```
## Automation
This is a very simple process.
1. Store your script wherever you want but take note of the directory.
2. Run `crontab -e`
* If you don't already have a cron utility installed, try `cronie`.
It should be in most repos.
4. Paste and edit: `0 */6 * * * /home/user/path/to/script.sh`
5. Save
6. Exit
7. ???
8. Profit
The pasted line runs the script every 6th hour of every day, every week,
every month, and every year. To change the frequency just run `crontab
-e`, edit the line, and save. [Crontab Generator](https://crontab-generator.org/)
or [Crontab Guru](https://crontab.guru/) might be useful if the syntax
is confusing.
Have fun!
## Edits
1. Synchronised `--dateafter` parameter with last modified timestamp of the
archive file — thank you Dominic!
[^1]: The [first commit](https://github.com/ytdl-org/youtube-dl/commit/4fa74b5252a23c2890ddee52b8ee5811b5bb2987) for the current youtube-dl repository was made on
21 July 2008 and it references "the new youtube-dl", which suggests
versions prior.
[^2]: In the digital media world, muxing is process of combining two or
more files into one. This might be a video track, multiple audio tracks,
and/or subtitle tracks.
[^3]: A *shebang* is a sequence of characters that tell your program
loader what interpreter to use. `#!/bin/bash` is what you would use if
it's a bash script. To use a Python interpreter, `#!/usr/bin/env python`
will use the program search path to find the appropriate executable.