certs stops reloading · Issue #2 · dyson/certman · GitHub
certs stops reloading #2
Open
@ts3ng

Description


I ran into an issue where certs stop reloading after we rotate them in our Kubernetes environment. We imported this small library to handle TLS secret rotations, but after the second rotation we got TLS errors in the new replica sets that came up. After the first rotation, the certman logs stop appearing.

These were the last log lines we saw after a rotation:

2022/02/03 03:57:53 certman: watch event: "/run/secrets/tls/tls.key": REMOVE
2022/02/03 03:57:53 certman: certificate and key loaded
2022/02/03 03:57:53 certman: watch event: "/run/secrets/tls/tls.crt": CHMOD
2022/02/03 03:57:53 certman: certificate and key loaded
2022/02/03 03:57:53 certman: watch event: "/run/secrets/tls/tls.crt": REMOVE
2022/02/03 03:57:53 certman: certificate and key loaded

The error was the following:

Error creating: Internal error occurred: failed calling webhook "janus.mutating.custom-admission-webhooks<redacted>": Post "https://janus.ns-team-janus.svc:443/janus/v1/sidecar?timeout=2s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "<redacted>")

We noticed that the cert mounted as a volume in our deployment didn't match the one actually being served by our running service.

We consume our secret via a Kubernetes secret volume:

      volumes:
        - name: janus-webhook-tls
          secret:
            secretName: janus-webhook-tls
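
For context on why a rotation can show up as CHMOD and REMOVE events on `tls.crt` and `tls.key`: files in a Kubernetes secret volume are symlinks that resolve through a `..data` symlink, and an update swaps `..data` to a freshly written directory before deleting the old one. A watch placed on an individual file ends up attached to the old file, which then disappears. The sketch below imitates that layout in a temp directory using only the Go standard library. It is a simplified model, not kubelet code; the directory names `v1` and `v2` and the helper names are made up for illustration (real volumes use timestamped directory names), and it assumes a platform with symlink support.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// publish writes data into a new versioned directory under dir, then
// repoints the "..data" symlink at it with a single rename.
func publish(dir, version, data string) error {
	vdir := filepath.Join(dir, version)
	if err := os.Mkdir(vdir, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(vdir, "tls.crt"), []byte(data), 0o644); err != nil {
		return err
	}
	tmp := filepath.Join(dir, "..data_tmp")
	if err := os.Symlink(version, tmp); err != nil {
		return err
	}
	// Renaming over the existing symlink replaces it in one step.
	return os.Rename(tmp, filepath.Join(dir, "..data"))
}

// resolvedVersion returns the name of the directory that path really lives in.
func resolvedVersion(path string) (string, error) {
	target, err := filepath.EvalSymlinks(path)
	if err != nil {
		return "", err
	}
	return filepath.Base(filepath.Dir(target)), nil
}

// simulateRotation reports which versioned directory tls.crt resolves to
// before and after a rotation.
func simulateRotation() (before, after string, err error) {
	dir, err := os.MkdirTemp("", "secret-volume")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	certPath := filepath.Join(dir, "tls.crt")
	if err = publish(dir, "v1", "old cert"); err != nil {
		return "", "", err
	}
	// The user-visible file is itself a symlink that goes through ..data.
	if err = os.Symlink(filepath.Join("..data", "tls.crt"), certPath); err != nil {
		return "", "", err
	}
	if before, err = resolvedVersion(certPath); err != nil {
		return "", "", err
	}

	// Rotation: publish the new version, then delete the old directory.
	// A watch attached to the old file would now be pointing at nothing.
	if err = publish(dir, "v2", "new cert"); err != nil {
		return "", "", err
	}
	if err = os.RemoveAll(filepath.Join(dir, "v1")); err != nil {
		return "", "", err
	}
	after, err = resolvedVersion(certPath)
	return before, after, err
}

func main() {
	before, after, err := simulateRotation()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("tls.crt resolved to %s before rotation and %s after\n", before, after)
}
```

The path `tls.crt` never changes, but the file behind it does, which is consistent with a per-file watch going quiet after the first rotation.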

We verified this by grabbing the served cert with the following command and noticed that its validity dates were older than those of the rotated cert on the volume mount.

/ # echo | openssl s_client -servername <service-endpoint> -connect <service-endpoint>:<port>
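
The same check can be scripted. Below is a minimal Go sketch that dials an endpoint over TLS and returns the validity window of the leaf certificate the server presents, which can then be compared with the cert on the volume mount. The function names are hypothetical, and the local `httptest` server in `demoValidity` only stands in for the real service endpoint so that the example runs on its own.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http/httptest"
	"time"
)

// servedCertValidity dials addr over TLS and returns the validity window of
// the leaf certificate presented by the server.
func servedCertValidity(addr, serverName string) (notBefore, notAfter time.Time, err error) {
	conn, err := tls.Dial("tcp", addr, &tls.Config{
		ServerName: serverName,
		// Verification is skipped on purpose: the goal is to inspect whatever
		// cert is being served, even if it is stale or untrusted.
		InsecureSkipVerify: true,
	})
	if err != nil {
		return notBefore, notAfter, err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return notBefore, notAfter, fmt.Errorf("no certificate presented by %s", addr)
	}
	return certs[0].NotBefore, certs[0].NotAfter, nil
}

// demoValidity runs the check against a throwaway local TLS server.
func demoValidity() (time.Time, time.Time, error) {
	srv := httptest.NewTLSServer(nil)
	defer srv.Close()
	return servedCertValidity(srv.Listener.Addr().String(), "")
}

func main() {
	notBefore, notAfter, err := demoValidity()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Not Before:", notBefore)
	fmt.Println("Not After :", notAfter)
}
```

Against a real service, `addr` would be `<service-endpoint>:<port>` and `serverName` the `<service-endpoint>` used with `-servername` above.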

The cert mounted via the volume was the new one:

        Validity
            Not Before: Feb  8 12:27:12 2022 GMT
            Not After : Aug  8 12:27:12 2022 GMT

The cert returned by the openssl command was the old one:

        Validity
            Not Before: Feb  2 03:56:21 2022 GMT
            Not After : Aug  2 03:56:20 2022 GMT

There is already a PR filed (#1) that seems to address the same issue.
The problem seemed to be the timing of the load relative to the event that triggered it. We also changed the fsnotify watch to monitor the directory instead of the individual files. The PR had to be tweaked for us to get it working in our application:
https://github.com/ts3ng/certman/tree/fix-reload
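
As a rough illustration of the directory-level idea (this is not the code from the PR or from the branch above), the sketch below keeps reloading separate from watching. A `reloader` re-reads the key pair whenever it is told to and hands the current certificate to the TLS stack through a `GetCertificate` callback. The watcher itself is left out; the assumption is that a watch on the parent directory calls `reload` on every event. All names (`reloader`, `writeSelfSigned`, `demoReload`) are hypothetical, and the self-signed certs exist only to make the example runnable.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// reloader holds the current certificate and re-reads it from disk on demand.
type reloader struct {
	certPath, keyPath string
	mu                sync.RWMutex
	cert              *tls.Certificate
}

// reload re-reads both files; on failure the previous certificate is kept.
func (r *reloader) reload() error {
	c, err := tls.LoadX509KeyPair(r.certPath, r.keyPath)
	if err != nil {
		return err
	}
	r.mu.Lock()
	r.cert = &c
	r.mu.Unlock()
	return nil
}

// GetCertificate has the signature expected by tls.Config.GetCertificate.
func (r *reloader) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.cert, nil
}

// writeSelfSigned writes a throwaway key pair with the given serial (demo data only).
func writeSelfSigned(dir string, serial int64) error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	if err := os.WriteFile(filepath.Join(dir, "tls.crt"), certPEM, 0o644); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "tls.key"), keyPEM, 0o600)
}

// demoReload rotates the pair three times and records the serial served after each reload.
func demoReload() (serials []int64, err error) {
	dir, err := os.MkdirTemp("", "certs")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	r := &reloader{certPath: filepath.Join(dir, "tls.crt"), keyPath: filepath.Join(dir, "tls.key")}
	for _, s := range []int64{1, 2, 3} {
		if err := writeSelfSigned(dir, s); err != nil {
			return nil, err
		}
		// Assumption: in a real service, any watch event on dir triggers this call.
		if err := r.reload(); err != nil {
			return nil, err
		}
		c, _ := r.GetCertificate(nil)
		leaf, err := x509.ParseCertificate(c.Certificate[0])
		if err != nil {
			return nil, err
		}
		serials = append(serials, leaf.SerialNumber.Int64())
	}
	return serials, nil
}

func main() {
	fmt.Println(demoReload())
}
```

Because `reload` keeps the previous certificate when loading fails, calling it on every directory event is cheap and tolerates the moment when only one of the two files has been replaced. In a server, the callback would be wired in as `tls.Config{GetCertificate: r.GetCertificate}`.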

LMK if you want me to open a PR, as this lib doesn't look like it's been maintained for a while.
