From 56e36c9e411eeb346b836ee6c802ec5fc2b37355 Mon Sep 17 00:00:00 2001
From: MDLeom <2809763-curben@users.noreply.gitlab.com>
Date: Thu, 14 May 2020 09:08:31 +0100
Subject: [PATCH] docs(faq): mention Tranco

- https://gitlab.com/curben/urlhaus-filter/-/commit/447826dd4b0d2fe667a22967edf136d1f34d82eb
- add `dos2unix` busybox alias
---
 faq.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/faq.md b/faq.md
index a65ecc8..52cf95d 100644
--- a/faq.md
+++ b/faq.md
@@ -1,13 +1,15 @@
 - How is the filter created?
   1. Grab the URLhaus **Database dump (CSV)** and save it to [URLhaus.csv](https://gitlab.com/curben/urlhaus-filter/blob/master/src/URLhaus.csv).
   2. Extract the domains.
-  3. Exclude popular domains ([Umbrella Popularity List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html)) and some well-known domains (if not listed by Umbrella, see [exclude.txt](https://gitlab.com/curben/urlhaus-filter/blob/master/src/exclude.txt)).
+  3. Exclude popular domains using [Umbrella Popularity List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html) (top 1M domains + subdomains), [Tranco List](https://tranco-list.eu/) (top 1M domains) and some well-known domains (if not listed by Umbrella & Tranco, see [exclude.txt](https://gitlab.com/curben/urlhaus-filter/blob/master/src/exclude.txt)).
   4. Extract the URLs (from step 1) that include popular domains (Umbrella and exclude.txt).
   5. Merge the files from step 3 and 4.
   6. Lite version only parses online urls from that database. Status of an URL (online or offline) is determined by URLhaus and can be found in the fourth column of the database.
 
-- Why there is an issue running the scripts locally?
+- Why there is an issue running the script locally?
   + Install **dos2unix** or use `busybox dos2unix` if BusyBox is already installed (like Ubuntu).
+  + If you have busybox but not `/usr/bin/dos2unix`, you can run `alias dos2unix="busybox dos2unix"` before executing the script.
+  + To set permanent alias, add it to your ".bashrc" or ".bash_profile".
 
 - Can you add this *very-bad-url.com* to the filter?
   + No, please report to the [upstream](https://urlhaus.abuse.ch/api/#submit).
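
Reviewer note on the `dos2unix` alias: bash expands aliases only in interactive shells by default, so a shell function defined inside a script is another way to get the same fallback. The sketch below is an illustration only, not part of urlhaus-filter. The function name `to_unix` is hypothetical, and the final `tr` fallback is an added assumption that the FAQ does not mention.

```sh
#!/bin/sh
# Hypothetical helper (not part of urlhaus-filter): convert CRLF line endings
# to LF in place, using whichever tool is available.
to_unix() {
  file=$1
  if command -v dos2unix >/dev/null 2>&1; then
    # Standalone dos2unix is installed.
    dos2unix "$file"
  elif command -v busybox >/dev/null 2>&1 && busybox dos2unix "$file" 2>/dev/null; then
    # BusyBox is installed and its dos2unix applet succeeded.
    return 0
  else
    # Assumed last resort: delete every carriage return with tr.
    # Unlike dos2unix, this also removes CRs that are not part of a CRLF pair.
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
  fi
}
```

After sourcing or pasting the function, a call such as `to_unix some-file.csv` converts that file in place. The file name here is a placeholder.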