docs(faq): mention Tranco
- 447826dd4b
- add `dos2unix` busybox alias
parent 50c2a27b9e
commit 56e36c9e41
6
faq.md
@@ -1,13 +1,15 @@
 - How is the filter created?
 1. Grab the URLhaus **Database dump (CSV)** and save it to [URLhaus.csv](https://gitlab.com/curben/urlhaus-filter/blob/master/src/URLhaus.csv).
 2. Extract the domains.
-3. Exclude popular domains ([Umbrella Popularity List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html)) and some well-known domains (if not listed by Umbrella, see [exclude.txt](https://gitlab.com/curben/urlhaus-filter/blob/master/src/exclude.txt)).
+3. Exclude popular domains using [Umbrella Popularity List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html) (top 1M domains + subdomains), [Tranco List](https://tranco-list.eu/) (top 1M domains) and some well-known domains (if not listed by Umbrella & Tranco, see [exclude.txt](https://gitlab.com/curben/urlhaus-filter/blob/master/src/exclude.txt)).
 4. Extract the URLs (from step 1) that include popular domains (Umbrella and exclude.txt).
 5. Merge the files from step 3 and 4.
 6. Lite version only parses online urls from that database. Status of an URL (online or offline) is determined by URLhaus and can be found in the fourth column of the database.

-- Why there is an issue running the scripts locally?
+- Why there is an issue running the script locally?
 + Install **dos2unix** or use `busybox dos2unix` if BusyBox is already installed (like Ubuntu).
++ If you have busybox but not `/usr/bin/dos2unix`, you can run `alias dos2unix="busybox dos2unix"` before executing the script.
++ To set permanent alias, add it to your ".bashrc" or ".bash_profile".

 - Can you add this *very-bad-url.com* to the filter?
 + No, please report to the [upstream](https://urlhaus.abuse.ch/api/#submit).
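
The six steps described in the FAQ can be sketched in shell roughly as follows. This is an illustration on made-up data, not the repository's actual script: the sample CSV rows, `top-domains.txt` (standing in for the combined Umbrella, Tranco and exclude.txt lists), the `build_filter` helper and the output file names are all assumptions.

```shell
#!/bin/sh
# Sketch of the FAQ steps on hypothetical sample data.
export LC_ALL=C            # keep sort order deterministic
work=$(mktemp -d)
cd "$work" || exit 1

# Step 1: stand-in for the URLhaus CSV dump (id, date, url, status, threat).
cat > URLhaus.csv <<'EOF'
# comment line, as found at the top of the dump
"1","2019-06-01 00:00:00","http://bad.example/payload.exe","online","malware_download"
"2","2019-06-01 00:00:00","http://popular.example/files/evil.doc","online","malware_download"
"3","2019-06-01 00:00:00","http://old.example/x.bin","offline","malware_download"
EOF

# Stand-in for the popular-domain lists (Umbrella, Tranco, exclude.txt).
printf '%s\n' 'popular.example' > top-domains.txt

# Hypothetical helper: $1 = file with one URL per line, $2 = output filter.
build_filter() {
  # Drop the scheme so every line starts with the host name.
  sed 's|^[a-z]*://||' "$1" > stripped.tmp
  # Step 2: extract the domains.
  sed 's|[/:].*$||' stripped.tmp | sort -u > domains.tmp
  # Step 3: exclude popular domains (exact, whole-line match).
  grep -Fvx -f top-domains.txt domains.tmp > keep-domains.tmp || true
  # Step 4: keep full URLs whose host is a popular domain.
  awk -F'[/:]' 'NR == FNR { top[$0]; next } $1 in top' \
    top-domains.txt stripped.tmp > keep-urls.tmp
  # Step 5: merge the results of steps 3 and 4.
  sort -u keep-domains.tmp keep-urls.tmp > "$2"
}

# All URLs (third column), and online-only URLs (status in the fourth column).
grep -v '^#' URLhaus.csv | awk -F'","' '{ print $3 }' > urls.txt
grep -v '^#' URLhaus.csv | awk -F'","' '$4 == "online" { print $3 }' > urls-online.txt

build_filter urls.txt urlhaus-filter.txt
# Step 6: the lite version only considers online URLs.
build_filter urls-online.txt urlhaus-filter-online.txt
```

With this sample, the full filter blocks `bad.example` and `old.example` as whole domains and keeps only the specific URL on `popular.example`, while the lite filter leaves out the offline `old.example`.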
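
As a possible alternative to the alias from the FAQ entry, a small wrapper function can pick whichever `dos2unix` is available. This is a minimal sketch; the function name `to_unix` is hypothetical and does not come from the repository. Aliases are generally not expanded in non-interactive shells and are not inherited by child processes, so a function defined inside the script may be more dependable than an alias set beforehand.

```shell
#!/bin/sh
# Hypothetical helper: convert CRLF line endings to LF in place for the given files.
to_unix() {
  if command -v dos2unix >/dev/null 2>&1; then
    # A standalone dos2unix is installed; use it directly.
    dos2unix "$@"
  elif command -v busybox >/dev/null 2>&1; then
    # Fall back to the BusyBox applet, as suggested in the FAQ.
    busybox dos2unix "$@"
  else
    echo "dos2unix not found; install dos2unix or BusyBox" >&2
    return 1
  fi
}
```

A script would then call `to_unix URLhaus.csv` wherever it previously called `dos2unix URLhaus.csv`.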