Improve GeoLite2 file downloads when using RoadRunner #2124

Closed
acelaya opened this issue May 10, 2024 · 4 comments

@acelaya
Member
acelaya commented May 10, 2024

An issue was recently reported which caused a GeoLite2 db file download attempt for every visit to Shlink (#2114). The root cause was never determined, and the problem eventually went away.

A similar issue with the same result, that time caused by a bug in Shlink, was fixed some time ago (#2021).

These issues highlight the fact that the current approach to automatically download/update the GeoLite2 db file is a bit brittle, and it would be good to revisit it.

Current approach and context

When Shlink started to use GeoLite2, it initially provided a command line tool that checks whether the database is up to date and tries to download it otherwise. It was up to users to schedule the execution of this command as they saw fit.

This is still the recommended approach for those serving Shlink with a classic web server (nginx/apache + php-fpm, or similar).
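In practice this usually just means wiring the command into cron. The entry below is only a sketch: it assumes the CLI entry point lives at /path/to/shlink/bin/cli and that the GeoLite update command is visit:download-db, so adjust both to your installation:

# Hypothetical cron entry (e.g. /etc/cron.d/shlink-geolite): check/update the GeoLite2 db once a week, as the web server user
0 4 * * 1 www-data /usr/bin/php /path/to/shlink/bin/cli visit:download-db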

For convenience, and because background jobs became available when Shlink started to support swoole/openswoole, and later RoadRunner, Shlink provides a mechanism that automatically checks whether the GeoLite2 db needs to be updated every time a visit happens, and updates it if the file's build date metadata says it's old enough.

This presents some problems though. If the existing db is too old and the download fails for whatever reason (a bug in Shlink, incorrect write permissions, a download timeout, an error while extracting the file, etc.), Shlink will try to download a new file for every visit, which can lead to a lot of download attempts.

This is even worse with the recent MaxMind API limit changes, which only allow 30 daily downloads per API key, leading to email notifications and logs getting flooded with errors once Shlink has reached that limit.

Ideal scenario

EDIT: The approach finally selected is described in this comment.

In an ideal world, Shlink would try to update the GeoLite2 db only every N days, based not on the file metadata but on a fixed schedule relative to when the last attempt happened. If an error occurs, Shlink should re-schedule another attempt a bit later, with a maximum number of attempts per day to avoid hitting API limits.

This is tricky though, as RoadRunner's jobs system doesn't immediately provide this capability, so it would require some custom implementation.

RoadRunner's job queues docs: https://docs.roadrunner.dev/queues-and-jobs/overview-queues


@acelaya acelaya moved this to Todo in Shlink Jul 3, 2024
@acelaya acelaya moved this from Todo to In Progress in Shlink Jul 21, 2024
@acelaya acelaya removed the status in Shlink Jul 23, 2024
@acelaya acelaya removed this from the 4.2.0 milestone Jul 23, 2024
@acelaya acelaya added this to the 4.3.0 milestone Aug 11, 2024
@acelaya acelaya moved this to Todo in Shlink Oct 8, 2024
@acelaya acelaya removed this from the 4.3.0 milestone Nov 24, 2024
@acelaya acelaya removed the status in Shlink Nov 24, 2024
@acelaya acelaya added this to the 4.4.0 milestone Nov 29, 2024
@shlinkio shlinkio deleted a comment from SoCuul Nov 30, 2024
@MattBlissett

If using the official Docker container, a workaround for this issue is to mount the GeoIP database from the host, and use the host's systemd timer / cron to update the database:

docker run --restart always --name shlink -p 8080:8080 [environment] -v /var/lib/GeoIP/GeoLite2-City.mmdb:/etc/shlink/data/GeoLite2-City.mmdb shlinkio/shlink:stable

As far as I can see, Shlink still geolocates visits without the licence key, so long as the database is present.
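For the host-side update itself, MaxMind's geoipupdate tool plus a cron entry (or an equivalent systemd timer) is enough. This is just a sketch, assuming geoipupdate is installed, /etc/GeoIP.conf contains your AccountID, LicenseKey and GeoLite2-City in EditionIDs, and the database directory is /var/lib/GeoIP (the exact path varies per distro):

# Hypothetical cron entry (e.g. /etc/cron.d/geoipupdate): refresh the GeoLite2 databases twice a week
0 4 * * 3,6 root /usr/bin/geoipupdate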

@acelaya acelaya moved this to Todo in Shlink Nov 30, 2024
@acelaya
Member Author
acelaya commented Dec 5, 2024

I'm thinking there's another possible approach to work around the issue with RoadRunner jobs described above, which also avoids coupling with RoadRunner in case a different runtime needs to be used in the future.

We could continue scheduling database updates after a visit is created, as happens now, but track GeoLite updates in the database, including when the attempt happened, what its status is/was (in progress, success, error, etc.), the reason in case of error, and other useful information.

Using this approach, we would be able to:

  1. Know when the last successful download happened, and attempt another one after a reasonable amount of time, not based on the file metadata.
  2. Know if the last N attempts have failed, and in that case not attempt again but wait for a specific amount of time instead, preventing hitting API limits and riding out other temporary issues.
  3. Use the database itself to "lock" the download process.
    If a job to download the database is running and another one is triggered before it finishes, the second one should always wait for the previous one to finish.
    This is currently done through an external locking mechanism, but since the database is involved anyway, we could use it for this too.
  4. Have useful historical information, as right now only the logs can tell you how many times Shlink has attempted a download.

The main challenge this approach presents is the fact that GeoLite downloads need to happen on a per-filesystem basis. If you are running a cluster of 4 Shlink docker instances, each of them needs its own copy of the GeoLite file.

Since the database is shared, we would need to track which instance is attempting the download, which means we need a unique ID per filesystem.

For instances sharing the same filesystem, this may not be a problem.

@acelaya
Member Author
acelaya commented Dec 16, 2024

The change in logic is now merged.

Now I need to do a bit of extra testing, especially focused on performance, and see if I can get rid of the external locking mechanism and use the database instead, but starting with Shlink 4.4, geolocation database downloads should be more reliable, predictable and easier to diagnose when they fail.

They should also avoid hitting GeoLite API limits, thanks to the new approach which prevents failing downloads from being retried indefinitely.

@acelaya acelaya closed this as completed Dec 17, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Shlink Dec 17, 2024