From 38242f8d16ffb198a7022c15f1fcfc3bc27cd4ee Mon Sep 17 00:00:00 2001
From: curben
Date: Sat, 11 May 2019 20:14:37 +0930
Subject: [PATCH] docs(faq): describe filter creation process

ref: https://gitlab.com/curben/urlhaus-filter/commit/5beecca906596d336ec6159408559e0e8428d764
---
 FAQ.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/FAQ.md b/FAQ.md
index c5f6a0d..8185dc4 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -1,3 +1,13 @@
+- How is the filter created?
+  1. Grab the URLhaus database dump and save it to [URLhaus.csv](https://gitlab.com/curben/urlhaus-filter/blob/master/src/URLhaus.csv).
+  2. Extract the domains.
+  3. Remove offline domains, popular domains ([Umbrella Popularity List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html)) and other well-known domains (those not listed by Umbrella, see [exclude.txt](https://gitlab.com/curben/urlhaus-filter/blob/master/src/exclude.txt)).
+  4. Extract the URLs (from step 1) that are hosted on popular domains (Umbrella and exclude.txt).
+  5. Merge the files from steps 3 and 4.
+
+- Why is there an issue running the scripts locally?
+  + Install **dos2unix**, or use `busybox dos2unix` if BusyBox is already installed (as it is on Ubuntu).
+
 - Can you add this *very-bad-url.com* to the filter?
   + No, please report it to the [upstream](https://urlhaus.abuse.ch/api/#submit).
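
The five-step process in the FAQ entry above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's own scripts: it assumes the URL and its status sit in the third and fourth columns of the dump (`id,dateadded,url,url_status,...`; check the header of your copy), treats a domain as online if any of its URLs has status `online`, and matches popular domains by exact host name. The function names `parse_dump`, `host_of` and `build_filter` are hypothetical.

```python
import csv
from urllib.parse import urlsplit

# Assumed column positions in the URLhaus CSV dump; verify against its header.
URL_COL = 2
STATUS_COL = 3


def parse_dump(text):
    """Step 1: yield (url, status) pairs, skipping blank and '#' comment lines."""
    # splitlines() also handles CRLF line endings.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    for row in csv.reader(lines):
        yield row[URL_COL], row[STATUS_COL]


def host_of(url):
    """Step 2: extract the lowercase host name of a URL ('' if none)."""
    return (urlsplit(url).hostname or "").lower()


def build_filter(dump_text, popular):
    """Return sorted entries: unpopular online domains plus URLs on popular domains.

    `popular` is a set of host names, assumed to be the union of the
    Umbrella list and exclude.txt.
    """
    records = list(parse_dump(dump_text))

    # Step 3 (assumption): keep a domain if at least one of its URLs is online,
    # then drop popular / well-known domains.
    online = {host_of(url) for url, status in records if status == "online"}
    domains = {d for d in online if d and d not in popular}

    # Step 4: for popular domains, block individual URLs instead of the whole site.
    urls = {url for url, _ in records if host_of(url) in popular}

    # Step 5: merge both sets into one list.
    return sorted(domains | urls)
```

For example, with a dump listing an online URL on `bad.example`, an offline URL on `gone.example` and an online URL on a popular host such as `docs.google.com`, the sketch emits `bad.example` as a domain entry and only the specific `docs.google.com` URL, so the popular site itself is not blocked.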