diff --git a/faq.md b/faq.md
index 7119d3f..1c4bae6 100644
--- a/faq.md
+++ b/faq.md
@@ -7,22 +7,7 @@
   6. Lite version only parses online urls from that database. Status of an URL (online or offline) is determined by URLhaus and can be found in the fourth column of the database.
 
 - Why there is an issue running the script locally?
-  + Install **dos2unix** or use `busybox dos2unix` if BusyBox is already installed (like Ubuntu).
-  + If you have busybox but not `/usr/bin/dos2unix`, you can run `alias dos2unix="busybox dos2unix"` before executing the script.
-  + To set permanent alias, add it to your ".bashrc" or ".bash_profile".
-
-- Can you add this *very-bad-url.com* to the filter?
-  + No, please report to the [upstream](https://urlhaus.abuse.ch/api/#submit).
-
-- Why don't you `curl top-1m.csv.zip` and output to stdout?
-  + If curl fails, top-1m.txt will be empty. Output as file avoids that.
-
-- Why do you need to clone the repo again in your CI? I thought CI already fetch the repo by default?
-  + GitLab Runner clone/fetch the repo using HTTPS method by default ([log](https://gitlab.com/curben/urlhaus-filter/-/jobs/105979394)). This method requires deploy *token* which is *read-only* (cannot push).
-  + Deploy *key* has write access but cannot be used with the HTTPS method, hence, the workaround to clone using SSH.
-  + Another approach is [personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) which seems HTTPS-compatible.
-  + Deploy/SSH key is preferred because it can be [restricted](https://docs.gitlab.com/ce/ssh/README.html#per-repository-deploy-keys) to access one repo only, unlike personal access token which has [global](https://docs.gitlab.com/ce/ssh/README.html#global-shared-deploy-keys) access.
-  + See issue [#18106](https://gitlab.com/gitlab-org/gitlab-ce/issues/18106), [#20567](https://gitlab.com/gitlab-org/gitlab-ce/issues/20567) and [#20845](https://gitlab.com/gitlab-org/gitlab-ce/issues/20845).
+  + Install busybox (bundled in Alpine and Ubuntu) and run the script using `busybox sh src/script.sh`.
 
 - Script terminated halfway with "grep broken pipe" or "grep killed" error
   + This is an out-of-memory (OOM) issue due to large input file. Recommend 3GB+ RAM.