tholinka/home-ops

Home Operations driven by k8s cluster deployed with Talos Linux; automated via Flux, Renovate, and GitHub Actions πŸ€–


πŸš€ My Home Operations Repository 🚧

... managed with Flux, Renovate, and GitHub Actions πŸ€–

DiscordΒ Β  TalosΒ Β  KubernetesΒ Β  FluxΒ Β  Renovate

Home-InternetΒ Β  Status-PageΒ Β  Alertmanager

Age-DaysΒ Β  Uptime-DaysΒ Β  Node-CountΒ Β  Pod-CountΒ Β  CPU-UsageΒ Β  Memory-UsageΒ Β  Cluster Power-UsageΒ Β  Internet Power-UsageΒ Β  Alerts

πŸ’‘ Overview

This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Environment as Code (EaC), Infrastructure as Code (IaC) and GitOps practices using tools like Kubernetes, Flux, Renovate, and GitHub Actions.

🌱 Kubernetes

My Kubernetes cluster is deployed with Talos. It is a semi-hyper-converged cluster: workloads and block storage share the same resources on my nodes, while a separate server with BTRFS handles NFS/SMB shares, bulk file storage, and backups.

Core Components

  • actions-runner-controller: Self-hosted GitHub Actions runners.
  • cert-manager: Creates SSL certificates for services in my cluster.
  • cilium: eBPF-based networking.
  • cloudflared: Provides secure access to selected routes via Cloudflare Tunnel.
  • external-dns: Automatically syncs ingress DNS records to Cloudflare and UniFi.
  • external-secrets: Manages Kubernetes secrets using Bitwarden Secrets Manager.
  • rook: Distributed block storage for persistent storage.
  • spegel: Stateless cluster-local OCI registry mirror. No more image pull backoff.
  • volsync: Automatic backup and recovery of persistent volume claims to NFS and Cloudflare R2. Lose nothing when the cluster blows up!

GitOps

Flux watches the clusters in my kubernetes folder (see Directories below) and makes changes to my clusters based on the state of my Git repository.

The way Flux works here is that it recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml in each directory, then applies all the resources listed in it. That kustomization.yaml will generally contain only a namespace resource and one or more Flux kustomizations (ks.yaml). Each of those Flux kustomizations in turn applies a HelmRelease or other resources related to the application.
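A minimal sketch of that pattern (the app name and paths are illustrative, not this repo's actual files):

```yaml
# kubernetes/apps/default/echo/ks.yaml (hypothetical app)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: echo
  namespace: flux-system
spec:
  interval: 30m
  path: ./kubernetes/apps/default/echo/app # where the HelmRelease lives
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  targetNamespace: default
```

The top-level kustomization.yaml in that directory then simply lists the namespace manifest and this ks.yaml as resources.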

Renovate watches my entire repository for dependency updates; when one is found, a PR is automatically created. When a PR is merged, Flux applies the changes to my cluster.

Directories

This Git repository contains the following directories under Kubernetes.

πŸ“ kubernetes
β”œβ”€β”€ πŸ“ apps             # applications
β”œβ”€β”€ πŸ“ bootstrap        # bootstrap procedures
└── πŸ“ flux             # flux system configuration
  β”œβ”€β”€ πŸ“ components     # re-useable components used by apps
  └── πŸ“ meta
    └── πŸ“ repositories # sources for flux, e.g. helm charts

Flux Workflow

This is a high-level look at how Flux deploys my applications with dependencies. In most cases a HelmRelease will depend on other HelmReleases; in other cases a Kustomization will depend on other Kustomizations; and in rare situations an app can depend on both a HelmRelease and a Kustomization. The example below shows that unifi-mongo won't be deployed or upgraded until mongo and volsync are installed and in a healthy state.

```mermaid
graph TD
    A>Kustomization: unifi-mongo] -->|Depends on| B
    B>Kustomization: mongo] -->|Creates| B1
    B1>HelmRelease: mongo] -->|Depends on| B3
    B3>HelmRelease: rook-ceph-cluster] -->|Depends on| B4
    B4>HelmRelease: rook-ceph]
    B5>Kustomization: rook-ceph] -->|Creates| B3
    B5 -->|Creates| B4
    A -->|Depends on| D
    D>Kustomization: volsync] -->|Creates| D1
    D1>HelmRelease: volsync] -->|Depends on| E1
    E>Kustomization: snapshot-controller] -->|Creates| E1>HelmRelease: snapshot-controller]
    B3 -->|Depends on| E1
```
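The "Depends on" edges in the graph above map to Flux's spec.dependsOn field. A hedged sketch of how that looks in a manifest (details assumed, not copied from this repo):

```yaml
# Hypothetical Flux Kustomization: unifi-mongo waits for mongo and volsync
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: unifi-mongo
  namespace: flux-system
spec:
  dependsOn: # Flux holds reconciliation until these are Ready
    - name: mongo
    - name: volsync
  interval: 30m
  path: ./kubernetes/apps/default/unifi/mongo
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```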

🌐 Networking

```mermaid
graph TD
  A>AT&T Fiber]
  A --> |😭|A1([Only gives a /64 per request, so no IPv6 on VLANs on UniFi])
  A --> |1Gb/1Gb on a 2.5GbE link| R
  B>T-Mobile Home Internet - Wireless] --> |Failover, 300ish down| R
  B --> |😭|B1([No IPv6 at all on UniFi])
  R>UniFi Gateway Max <br> 2 WAN ports, 3 LAN ports <br> @ 2.5 GbE]
  R --> |2.5GbE| S1
  R --> |2.5GbE<br>VLAN servers| S2
  S1>UniFi Switch Flex <br> 8x 2.5G PoE ports, 1x 10GbE uplink]
  S2>USW Pro HD 24 <br> 22x 2.5GbE ports, 2x 10GbE ports, 4x 10G SFP+ ports]
  S1 --> |2.5GbE| W
  W>UniFi U7 Pro]
  W --> |WiFi|W1([ssid])
  W --> |IoT WiFi<br>VLAN iot|W2([ssid iot])
  S1 --> D([Devices])
  S2 -->|2.5GbE| K([7 Kubernetes nodes])
  S2 -->|1GbE| P([Raspberry Pi 4b with Z-Wave GPIO Hat])
  S2 -->|1GbE| KVM([PiKVM v3])
  S2 --> |2x 10G SFP+ Bonded| N([NAS])
```

🏘️ VLANs

| Name    | ID | CIDR            |
|---------|----|-----------------|
| Default | 0  | 192.168.1.0/24  |
| servers | 20 | 192.168.20.0/24 |
| iot     | 30 | 192.168.30.0/24 |
| guest   | 40 | 192.168.40.0/24 |

🌎 DNS

In my cluster there are three instances of ExternalDNS running. One syncs public DNS records to Cloudflare. The second syncs to Pi-hole, the primary internal DNS server. The third syncs to my UniFi Gateway Max via the ExternalDNS webhook provider for UniFi, as a fallback in case the cluster is down. This setup is managed by creating two gateways, internal and external: internal is only exposed internally, whereas external is exposed both internally and through Cloudflare.
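With the internal/external split, exposing an app is a matter of which gateway its route attaches to. A sketch using a Gateway API HTTPRoute (app, hostname, and namespaces are hypothetical, not this repo's actual config):

```yaml
# Hypothetical HTTPRoute attached to the internal gateway only
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: observability
spec:
  hostnames: ["grafana.example.com"]
  parentRefs:
    - name: internal # use `external` instead to also publish via Cloudflare
      namespace: kube-system
  rules:
    - backendRefs:
        - name: grafana
          port: 80
```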

🏠 Home DNS

```mermaid
graph TD
  A>internal-external-dns] -->|Updates|D
  C>Answers Request]
  D[PiHole] -->|Blocked, Cluster, or custom hosts|C
  D -->|Forwards other requests|E[DNSCrypt-Proxy]
  E -->|Forwards requests to DNSCrypt or DoH resolver|C
```

☁️ Cloud Dependencies

While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things: (1) chicken-and-egg scenarios, (2) services I critically need whether my cluster is online or not, and (3) the "hit by a bus" factor: what happens to critical apps (e.g. email, password manager, photos) that my family relies on when I am no longer around.

Alternative solutions to the first two of these problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault, Vaultwarden, ntfy, and Gatus; however, maintaining another cluster and monitoring another group of workloads would be more work and would likely cost the same as, or more than, the services listed below.

| Service                  | Use                                                           | Cost    |
|--------------------------|---------------------------------------------------------------|---------|
| Bitwarden Secrets Manager| Secrets with External Secrets                                 | Free    |
| Cloudflare               | Domain and S3                                                 | ~$50/yr |
| GCP                      | Voice interactions with Home Assistant over Google Assistant  | Free    |
| GitHub                   | Hosting this repository and continuous integration/deployments| Free    |
| Gmail                    | Email hosting                                                 | Free    |
| Pushover                 | Kubernetes alerts and application notifications               | $5 (one-time purchase) |
| Fastmail                 | Email                                                         | ~$60/yr |

Total: ~$110/yr

πŸ–₯️ Hardware

| Num | Device | CPU | RAM | OS Disk | Data Disks | OS | Function |
|---|---|---|---|---|---|---|---|
| 3 | Lenovo ThinkCentre M700 Tiny | i5-6400T | 16GB DDR4 | 512GB SSD | 512GB SATA NVMe | Talos | Kubernetes |
| 1 | HP EliteDesk 800 G5 Desktop Mini | i5-9500T | 64GB DDR4 | 256GB PCIe 3 NVMe | 512GB PCIe 3 NVMe | Talos | Kubernetes |
| 1 | HP ProDesk 400 G4 Desktop Mini | i5-8500T | 16GB DDR4 | 256GB SSD | 512GB PCIe 3 NVMe | Talos | Kubernetes |
| 2 | HP ProDesk 400 G5 Desktop Mini | i5-9500T | 32GB DDR4 | 256GB PCIe 3 NVMe | - | Talos | Kubernetes |
| 1 | Raspberry Pi 4b + ZAC93 GPIO module for Z-Wave 800 | - | 2GB | 256GB SD card | - | Talos | Kubernetes |
| 1 | Raspberry Pi 4b with PiKVM Hat | - | 2GB | 256GB SD card | - | Arch Linux (PiKVM) | PiKVM |
| 1 | Self-built NAS | i7-6700k | 32GB DDR4 | RAID 1: 250GB Samsung 840 EVO + 512GB PCIe 3 NVMe | BTRFS RAID 1: 3TB WD Black, 2x 4TB WD Red, 16TB WD Gold Enterprise, 18TB WD Ultrastar DC HC550; plus 512GB PCIe 3 NVMe (PCIe passthrough to QEMU) | Arch Linux | Large files and backups (+ Talos worker in QEMU) |

πŸ™ Thanks

Thanks to all the people in the Home Operations Discord.

Also thanks to the awesome kubesearch.dev; large parts of this repo are inspired by work found through that search.

Extra Special Thanks
