Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add healthcheck support for http and smtp #8

Merged

Conversation

ap-wtioit
Copy link
Contributor

this enables the user to configure a healthcheck for containers that can
be used to automatically restart proxies resolving to no longer active
ips due to PRE_RESOLVE

Issue description

on our longer running instances we discovered that after a while we couldn't reach proxied services anymore from odoo. we tracked it down to the docker-whitelist configs using PRE_RESOLVE: 1 for cloud services that are not always using the same ip for deployment (e.g. accounts.google.com) the ip changed after a few weeks while the running proxy still resolved to the old ip which was no longer serving the service.

New behaviour

with this change you can configure the required healthcheck for the proxied service type (http/smtp) that now enables docker to check the health of the proxy container. combined with https://github.com/willfarrell/docker-autoheal this enables us to automatically restart those containers no longer resolving to the correct ip.

We tested this for 4 months (and improving it in the meanwhile) on our prod system so it seems a worthy addition for us.

To me it looks like this also could be the foundation for fixing #5 (if the reporter confirms they are also using a target service with changing IP addresses) if we add an SQL healthcheck as well.

info @wt-io-it

@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch 2 times, most recently from 5328834 to d275223 Compare July 5, 2021 10:08
Copy link
Contributor

@joao-p-marques joao-p-marques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good addition! Thanks 👍

Can you please check the comments?

healthcheck.py Outdated Show resolved Hide resolved
tests/run_tests.sh Outdated Show resolved Hide resolved
@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch from d275223 to ba6893a Compare July 5, 2021 12:02
joao-p-marques added a commit that referenced this pull request Jul 7, 2021
yajo pushed a commit that referenced this pull request Jul 8, 2021
@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch 18 times, most recently from d89cd1c to 73d04f4 Compare July 9, 2021 09:16
Copy link
Contributor

@yajo yajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks much better! A couple of comments though.

tests/test_healtcheck.py Outdated Show resolved Hide resolved
Comment on lines 20 to 26
def _healthcheck(*args, **kwargs):
args = ("-f", "tests/healthcheck.yaml") + args
return docker_compose(*args, **kwargs)


def _get_container_id(service_name):
return _healthcheck("ps", "-q", service_name).strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you knew this, but plumbum supports these kind of constructs out of the box (it would be equivalent (except the strip() part)):

Suggested change
def _healthcheck(*args, **kwargs):
args = ("-f", "tests/healthcheck.yaml") + args
return docker_compose(*args, **kwargs)
def _get_container_id(service_name):
return _healthcheck("ps", "-q", service_name).strip()
_healthcheck = docker_compose["-f", "tests/healthcheck.yaml"]
_get_container_id = _healthcheck["ps", "-q", service_name]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_container_id = _healthcheck["ps", "-q", service_name]

does not work, because this syntax doesn't support args:

 NameError: name 'service_name' is not defined

Copy link
Contributor

@yajo yajo Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, sorry, my butter fingers. It should be _get_container_id = _healthcheck["ps", "-q"] instead, and then later called as _get_container_id(service_name).strip()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i changed it to

_healthcheck["ps", "-q"]

@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch 2 times, most recently from 2f3bc2f to 06bfdc2 Compare July 12, 2021 06:55
@yajo yajo requested a review from joao-p-marques July 12, 2021 07:00
@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch from 06bfdc2 to f4e232c Compare July 12, 2021 07:00
Copy link
Contributor

@joao-p-marques joao-p-marques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Just (another) couple of comments 😄

tests/test_healtcheck.py Outdated Show resolved Hide resolved
tests/healthcheck.yaml Show resolved Hide resolved
@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch 4 times, most recently from a54d0e4 to 52f13a3 Compare July 12, 2021 08:06
@pedrobaeza
Copy link
Member

I'm checking this now (after a while without seeing here). Is it everything up to date to check?

@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch from 52f13a3 to b04dc30 Compare February 6, 2024 09:27
@ap-wtioit
Copy link
Contributor Author

I rebased the previous 3 commits, and cherry-picked the improvements we made to our internal version in the last 3 years:

  • adding timeouts for checking availability of services (in case the port becomes firewalled, e.g. when the service gets a new ip and the old ip is now unused and protected by a firewall)
  • when using PRE_RESOLVE, check if the target still resolves to the same IP (so the container gets restarted when the service gets a new ip)
  • detect load balancing ip addresses (e.g. when we get a new ip with every request) and skip the PRE_RESOLVE healthcheck from then on, preventing constantly restarting the container in this cases (the http / smtp healthchecks still work for those cases)

@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch 5 times, most recently from bcdaacd to f1e7027 Compare February 6, 2024 10:12
this enables the user to configure a healthcheck for containers that can
be used to automatically restart proxies resolving to no longer active
ips due to PRE_RESOLVE
this should help if ports are/become firewalled for healthcheck to be
able to mark container as unhealthy
@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch 2 times, most recently from e72d53f to bc075b6 Compare February 6, 2024 10:28
when we detect a load balancing dns (responding with different ips on
almost every request) we deactivate the resolver healthcheck to not
restart those containers all the time
@ap-wtioit ap-wtioit force-pushed the master-enable_healthcheck_github branch from bc075b6 to 4627f26 Compare February 6, 2024 10:29
@ap-wtioit
Copy link
Contributor Author

@pedrobaeza i think i finally made it work. Not at all sure about the poetry stuff (it was pretty broken on Ubuntu 23.10 and python 3.11) in the end i disabled typed-ast in poetry lock ran poetry add --dev pytest-timeout (and manually moved it to [tool.poetry.dev-dependencies]) ran poetry lock and checked with poetry install

@pedrobaeza pedrobaeza merged commit 118a9ac into Tecnativa:master Feb 7, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants