---
title: "Replacing YouTube & Invidious"
description: "Simple script I created to download YouTube videos"
cover: /assets/pngs/download-video.png
date: 2020-08-03T05:43:37-04:00
toc: true
categories:
  - Technology
tags:
  - YouTube
  - Invidious
  - youtube-dl
  - Media
  - Automation
---

Omar Roth, the developer of Invidious, recently wrote a blog post about *[Stepping away from open source](https://omar.yt/posts/stepping-away-from-open-source)*. While I never used the official instance, I thought this was a good opportunity to create a tool that downloads videos from YouTubers I'm subscribed to so I can watch them offline in whatever manner I prefer.

To that end, [youtube-dl](https://github.com/ytdl-org/youtube-dl) is by far the most reliable and versatile option. It has been around since before 2008[^1], so I don't think the project is going anywhere. [MPV](https://mpv.io/) is my media player of choice and it relies on youtube-dl for watching online content from Twitch to YouTube to PornHub to [much more](https://ytdl-org.github.io/youtube-dl/supportedsites.html).

Conveniently, youtube-dl comes with all of the tools and flags needed for exactly this purpose so scripting and automating it is incredibly easy. Taking a look at `man youtube-dl` reveals a *plethora* of options but only a few are necessary.

## Preventing duplicates

The main thing when downloading an entire channel is ensuring the same video isn't downloaded more than once. `--download-archive` will save the IDs of all the videos downloaded to prevent them from being downloaded again. All that's required is a file path at which to store the list.

```bash
--download-archive .archives/.txt
```

## Selecting quality

By default, youtube-dl downloads the *single* file with the best quality audio and video so it doesn't need to mux[^2] them together after. However, I prefer to have the highest possible quality and don't mind waiting a few seconds longer for FFmpeg to combine audio and video files.

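
Merging separate streams only works when FFmpeg is installed. As a rough sketch, a script could check for it first and choose the value to hand to `--format` accordingly. The `pick_format` helper below is a made-up name for illustration, not something youtube-dl provides.

```bash
# Hypothetical helper: choose a youtube-dl format selector.
# Separate video and audio streams can only be merged if FFmpeg is available.
pick_format () {
    if command -v ffmpeg >/dev/null 2>&1; then
        # Best video plus best audio, falling back to the best single file
        # if separate streams are unavailable
        echo "bestvideo+bestaudio/best"
    else
        # No muxer available, so ask for one file with both tracks
        echo "best"
    fi
}

# Show which selector would be used on this machine
echo "Using format: $(pick_format)"
```

The result can then be passed along as `--format "$(pick_format)"`.
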
`--format` lets the user decide which format they prefer and whether they want to focus on storage efficiency or quality. The basic options are `best`, `worst`, `bestvideo`, and `bestaudio`. `best` and `worst` download a *single* file containing both video and audio at the highest or lowest quality, while `bestvideo` and `bestaudio` pick the best video-only or audio-only file. There are a lot of other options for more fine-grained control over what to look for but, as I mentioned, I want the highest quality video and the highest quality audio so I use `bestvideo+bestaudio`. After downloading both of those files, they will be muxed together.

```bash
--format bestvideo+bestaudio
```

## Embedding subtitles

I enjoy subtitles but I know many people don't so this section can certainly be ignored. There are a few options for fetching and embedding subtitles. To get "real" subtitles that someone transcribed manually, use `--write-sub`. For YouTube's auto-generated subtitles that may or may not be horribly inaccurate, use `--write-auto-sub`. To select the language, use `--sub-lang `; to embed them, `--embed-subs`; and to select the format, `--sub-format `. I like using SRT files so I have that first with `best` beside it like so: `--sub-format srt/best`. SRT is preferred but, if it's unavailable, the best available alternative will be used and embedded instead.

```bash
--write-sub --write-auto-sub --sub-format srt/best --sub-lang en --embed-subs
```

## Limiting downloads

When switching to this method, the initial download will pull *all* of a channel's videos. I certainly don't want this so the downloads should be limited in some way. `--dateafter` and `--playlist-end` serve very nicely. The former will only download videos published *after* a certain date and the latter will only download the first X videos in the channel's list.

```bash
--dateafter 20200801 --playlist-end 5
```

**EDIT:** A reader sent me an email with this improvement to the archive functionality.
Rather than checking the last five days of videos to see if they've already been downloaded, this snippet will check when the archive file was last edited and use that as the `--dateafter` parameter, making the script a bit more efficient.

```bash
AFTER=$(date -r .archives/"$1".txt +%Y%m%d 2>/dev/null || date +%Y%m%d)
--dateafter $AFTER --playlist-end 5
```

## Naming format

The final parameter to look at is how to name the files once they're downloaded. `--output` provides templating functionality and there are a *lot* of options. For this use, an acceptable template might be something like `Channel/Title.ext`. In youtube-dl's templating format, that's `"%(uploader)s/%(title)s.%(ext)s"`.

```bash
--output "%(uploader)s/%(title)s.%(ext)s"
```

## Getting notifications

I don't yet have a good method for getting notifications when there are *new* videos but there is a simple way to get notified when the script is finished running. `notify-send` is one of the easiest tools for this and has pretty simple syntax as well: the first string is the notification summary and the second is a longer description. You can optionally pass an icon name to make it look a little better.

```bash
notify-send -i video-x-generic "Downloads finished" "Check the YouTube folder for new videos"
```

For some reason, I don't get icons when the generic name is specified but I know the command will work on most systems. On mine, I have to pass the path to the icon file I want: `-i /usr/share/icons/Suru++-Dark/apps/64/video.svg`

## Writing the script

I want to store the videos in `~/Videos/YouTube` and I want the archive records stored in `.archives` so the first line (after the shebang[^3]) creates those directories if they don't already exist and the second enters the `YouTube` folder.

```bash
mkdir -p "$HOME/Videos/YouTube/.archives"
cd "$HOME/Videos/YouTube"
```

From here, a way to reuse the youtube-dl command is necessary so parameters can be changed in one place and they'll apply to all channels.
Functions are intended for exactly this purpose and are formatted like so:

```bash
functionName () {
    # Code here
}
```

I've named the function `dl` so mine looks like this:

```bash
dl () {
    # Use the archive's last-modified date, or today if it doesn't exist yet
    AFTER=$(date -r .archives/"$1".txt +%Y%m%d 2>/dev/null || \
        date +%Y%m%d)
    youtube-dl --download-archive .archives/"$1".txt -f \
        bestvideo+bestaudio --dateafter "$AFTER" --write-sub \
        --write-auto-sub --sub-format srt/best --sub-lang en \
        --embed-subs -o "%(uploader)s/%(title)s.%(ext)s" \
        --playlist-end 5 "$2"
    sleep 5
}
```

Because it's in a function that's called repeatedly, the `AFTER` variable will be reevaluated each time using a different archive file to ensure no videos are missed. The backslashes at the end of each line (`\`) tell bash that it's a single command spanning multiple lines. At the bottom, `sleep` just waits 5 seconds before downloading the next channel. It's unlikely that YouTube will rate-limit a residential address for this but it is still possible. Waiting a bit before continuing reduces the likelihood further.

Note the use of `"$1"` and `"$2"` in the archive path and at the very end of the youtube-dl command. This lets the user define what the archive file should be named and what channel to download videos from. A line using the function would be something like:

```bash
dl linustechtips https://www.youtube.com/user/LinusTechTips
```

The result would be this directory structure:

```
YouTube
├── .archives
│   └── linustechtips.txt
└── Linus Tech Tips
    └── They still make MP3 players.mkv
```

The last line is the notification command:

```bash
notify-send -i video-x-generic "Downloads finished" "Check the YouTube folder for new videos"
```

## Finished script

This is the script I have in use right now.
```bash
#!/bin/bash

# Create the video and archive directories, then work from the video folder
mkdir -p "$HOME/Videos/YouTube/.archives"
cd "$HOME/Videos/YouTube"

# dl ARCHIVE_NAME CHANNEL_URL
dl () {
    # Use the archive's last-modified date, or today if it doesn't exist yet
    AFTER=$(date -r .archives/"$1".txt +%Y%m%d 2>/dev/null || \
        date +%Y%m%d)
    youtube-dl --download-archive .archives/"$1".txt -f \
        bestvideo+bestaudio --dateafter "$AFTER" --write-sub \
        --write-auto-sub --sub-format srt/best --sub-lang en \
        --embed-subs -o "%(uploader)s/%(title)s.%(ext)s" \
        --playlist-end 5 "$2"
    sleep 5
}

dl ding              https://www.youtube.com/channel/UClq42foiSgl7sSpLupnugGA
dl meute             https://www.youtube.com/channel/UCY3cAFsquIk7VGMuk-V8S3g
dl vsauce            https://www.youtube.com/user/Vsauce
dl ardour            https://www.youtube.com/channel/UCqeg5vkTkH-DYxmOO9FJOHA
dl pewdiepie         https://www.youtube.com/user/PewDiePie
dl techaltar         https://www.youtube.com/channel/UCtZO3K2p8mqFwiKWb9k7fXA
dl avikaplan         https://www.youtube.com/user/AviKaplanMusic
dl lukesmith         https://www.youtube.com/channel/UC2eYFnH61tmytImy1mTYvhA
dl techlinked        https://www.youtube.com/c/techlinked/
dl robscallon        https://www.youtube.com/user/robs70986987
dl setheverman       https://www.youtube.com/user/SethEverman
dl logosbynick       https://www.youtube.com/channel/UCEQXp_fcqwPcqrzNtWJ1w9w
dl techquickie       https://www.youtube.com/user/Techquickie
dl yanntiersen       https://www.youtube.com/user/YannTiersenOfficial
dl andrewhuang       https://www.youtube.com/user/songstowearpantsto
dl aurahandpan       https://www.youtube.com/user/Jantzulu
dl jamesveitch       https://www.youtube.com/user/james948
dl brandonacker      https://www.youtube.com/user/brandonacker
dl unboxtherapy      https://www.youtube.com/user/unboxtherapy
dl linustechtips     https://www.youtube.com/user/LinusTechTips
dl michaelreeves     https://www.youtube.com/channel/UCtHaxi4GTYDpJgMSGy7AeSw
dl countrysquire     https://www.youtube.com/channel/UCdrw_DN_OmFIic0gTZJDVCQ
dl roomieofficial    https://www.youtube.com/user/RoomieOfficial
dl fridaycheckout    https://www.youtube.com/channel/UCRG_N2uO405WO4P3Ruef9NA
dl lastweektonight   https://www.youtube.com/user/LastWeekTonight
dl bingingwithbabish https://www.youtube.com/user/bgfilms

notify-send --icon /usr/share/icons/Suru++-Dark/apps/64/video.svg "Downloads finished" "Check the YouTube folder for new videos"
```

## Automation

This is a very simple process.

1. Store your script wherever you want, make it executable with `chmod +x`, and take note of the directory.
2. Run `crontab -e`
    * If you don't already have a cron utility installed, try `cronie`. It should be in most repos.
3. Paste and edit: `0 */6 * * * /home/user/path/to/script.sh`
4. Save
5. Exit
6. ???
7. Profit

The pasted line runs the script at minute 0 of every sixth hour (00:00, 06:00, 12:00, and 18:00), every day of every month. To change the frequency just run `crontab -e`, edit the line, and save. [Crontab Generator](https://crontab-generator.org/) or [Crontab Guru](https://crontab.guru/) might be useful if the syntax is confusing.

Have fun!

## Edits

1. Synchronised the `--dateafter` parameter with the last modified timestamp of the archive file. Thank you, Dominic!

[^1]: The [first commit](https://github.com/ytdl-org/youtube-dl/commit/4fa74b5252a23c2890ddee52b8ee5811b5bb2987) for the current youtube-dl repository was made on 21 July 2008 and it references "the new youtube-dl", which suggests versions prior.

[^2]: In the digital media world, muxing is the process of combining two or more files into one. This might be a video track, multiple audio tracks, and/or subtitle tracks.

[^3]: A *shebang* is a sequence of characters that tells your program loader what interpreter to use. `#!/bin/bash` is what you would use if it's a bash script. To use a Python interpreter, `#!/usr/bin/env python` will use the program search path to find the appropriate executable.