docs(faq): describe filter creation process

ref: 5beecca906
curben 2019-05-11 20:14:37 +09:30
parent a1f2af2f65
commit 38242f8d16
1 changed files with 10 additions and 0 deletions

FAQ.md

@@ -1,3 +1,13 @@
- How is the filter created?
1. Grab the URLhaus database dump and save it to [URLhaus.csv](https://gitlab.com/curben/urlhaus-filter/blob/master/src/URLhaus.csv).
2. Extract the domains.
3. Remove offline domains, popular domains ([Umbrella Popularity List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html)) and other well-known domains that Umbrella does not list (see [exclude.txt](https://gitlab.com/curben/urlhaus-filter/blob/master/src/exclude.txt)).
4. Extract the URLs (from step 1) that include popular domains (Umbrella and exclude.txt).
5. Merge the files from steps 3 and 4.
- Why is there an issue running the scripts locally?
+ Install **dos2unix** or use `busybox dos2unix` if BusyBox is already installed (as it is on Ubuntu).
- Can you add this *very-bad-url.com* to the filter?
+ No, please report it [upstream](https://urlhaus.abuse.ch/api/#submit).
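
The filter-creation steps in the first answer can be sketched roughly as follows. This is only an illustration, not the project's actual scripts: the function names `extract_domain` and `build_filter`, the `(url, status)` row shape, the `"online"` status value, and exact-match comparison against the popular and excluded lists are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def build_filter(rows, popular, excluded):
    """Sketch of steps 2-5.

    rows     -- iterable of (url, status) pairs from the database dump
                (hypothetical shape; status is assumed to be "online"
                or "offline")
    popular  -- domains from the popularity list
    excluded -- additional well-known domains (exclude.txt)
    """
    well_known = set(popular) | set(excluded)
    domains = set()  # step 3: online domains that are not well known
    urls = set()     # step 4: full URLs hosted on well-known domains

    for url, status in rows:
        domain = extract_domain(url)  # step 2
        if not domain:
            continue
        if domain in well_known:
            # Block only the specific URL, not the whole domain.
            urls.add(url)
        elif status == "online":
            # A domain is kept if at least one of its URLs is online.
            domains.add(domain)

    # Step 5: merge both lists into one filter.
    return sorted(domains) + sorted(urls)
```

Calling `build_filter` with a few rows, a popular set and an excluded set returns the domain entries first, followed by the full-URL entries. Splitting the output this way means a popular host is never blocked wholesale; only the individual malicious URLs on it are listed.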