The operational intelligence system that catches configuration issues before they cause outages.
Kogaro transforms Kubernetes cluster hygiene from reactive debugging to proactive intelligence. While other tools generate compliance noise, Kogaro delivers actionable signals that production teams actually trust and act upon.
Production Kubernetes clusters suffer from silent configuration failures:
- Dangling references cause mysterious service outages
- Security misconfigurations slip through CI/CD
- Resource issues manifest as performance problems
- Network policies have gaps that compromise security
These issues are invisible until they cause incidents.
Kogaro provides operational vigilance through:
- 60+ validation types across Reference, Security, Resource, Image, and Networking categories
- CI/CD integration with scoped validation for pre-deployment checks
- Structured error codes (KOGARO-CCC-XXX, for example KOGARO-REF-001) for automated processing
- Real-time detection of configuration drift and dangerous changes
- Prometheus integration for monitoring and alerting
- Production-ready architecture with leader election and HA support
Result: Issues caught in minutes, not hours. Admins who trust alerts instead of ignoring noise.
| Category | Traditional Tools | Kogaro Advantage |
|---|---|---|
| Policy Engines | Complex rule languages | Simple, focused validations |
| Security Scanners | Point-in-time reports | Continuous operational monitoring |
| Monitoring Tools | Runtime metrics only | Configuration hygiene focus |
| Compliance Tools | Audit checklists | Actionable operational intelligence |
Unique Value: Kogaro is the only tool specifically designed for operational configuration hygiene - catching the silent failures that other tools miss.
Kogaro provides five comprehensive validation categories covering all critical aspects of Kubernetes cluster hygiene:
Detects dangling references to non-existent resources:
- Ingress References (`--enable-ingress-validation`)
  - `dangling_ingress_class`: Missing IngressClass references
  - `dangling_service_reference`: Missing Service references in ingress rules
  - `dangling_tls_secret`: Missing TLS Secrets in ingress
- ConfigMap References (`--enable-configmap-validation`)
  - `dangling_configmap_volume`: Missing ConfigMap volume references
  - `dangling_configmap_envfrom`: Missing ConfigMap envFrom references
- Secret References (`--enable-secret-validation`)
  - `dangling_secret_volume`: Missing Secret volume references
  - `dangling_secret_envfrom`: Missing Secret envFrom references
  - `dangling_secret_env`: Missing Secret env var references
- Storage References (`--enable-pvc-validation`)
  - `dangling_pvc_reference`: Missing PVC references
  - `dangling_storage_class`: Missing StorageClass references
- ServiceAccount References (`--enable-serviceaccount-validation`)
  - `dangling_service_account`: Missing ServiceAccount references
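The idea behind a dangling-reference check can be sketched in a few lines of Go. This is an illustration only, not Kogaro's implementation: the `Reference` type and `findDangling` function are hypothetical names, and namespaces are ignored to keep the example short.

```go
package main

import "fmt"

// Reference is a hypothetical, simplified record of one resource pointing at another.
type Reference struct {
	Source string // referencing object, e.g. "Pod/web"
	Kind   string // kind of the referenced object, e.g. "ConfigMap"
	Name   string // name of the referenced object
}

// findDangling returns the references whose target is absent from existing,
// a set keyed by "Kind/Name".
func findDangling(refs []Reference, existing map[string]bool) []Reference {
	var dangling []Reference
	for _, r := range refs {
		if !existing[r.Kind+"/"+r.Name] {
			dangling = append(dangling, r)
		}
	}
	return dangling
}

func main() {
	existing := map[string]bool{"ConfigMap/app-config": true}
	refs := []Reference{
		{Source: "Pod/web", Kind: "ConfigMap", Name: "app-config"},
		{Source: "Pod/worker", Kind: "Secret", Name: "db-credentials"},
	}
	for _, d := range findDangling(refs, existing) {
		fmt.Printf("%s references missing %s %q\n", d.Source, d.Kind, d.Name)
	}
}
```

A real validator would build the `existing` set from the cluster API and key it by namespace as well.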
Ensures proper resource management and QoS:
- Resource Constraints (`--enable-resource-limits-validation`)
  - `missing_resource_requests`: Containers without CPU/memory requests
  - `missing_resource_limits`: Containers without CPU/memory limits
  - `insufficient_cpu_request`: CPU requests below minimum thresholds
  - `insufficient_memory_request`: Memory requests below minimum thresholds
  - `qos_class_issue` (BestEffort): Containers with no resource constraints
  - `qos_class_issue` (Burstable): Containers where requests ≠ limits
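To make the QoS cases concrete, here is a minimal Go sketch of how a single container's settings map to a QoS class. The `Resources` type and `qosClass` function are hypothetical, quantities are compared as plain strings (so `"1"` and `"1000m"` would wrongly count as different), and Kubernetes itself derives the class from all containers in a pod rather than from one.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified view of one container's settings.
// An empty string means the field is unset.
type Resources struct {
	CPURequest, CPULimit       string
	MemoryRequest, MemoryLimit string
}

// qosClass classifies a single container. Kubernetes defaults an unset
// request to the limit, so a limit without a request still counts as equal.
func qosClass(r Resources) string {
	if r.CPURequest == "" && r.CPULimit == "" && r.MemoryRequest == "" && r.MemoryLimit == "" {
		return "BestEffort"
	}
	cpuEqual := r.CPULimit != "" && (r.CPURequest == "" || r.CPURequest == r.CPULimit)
	memEqual := r.MemoryLimit != "" && (r.MemoryRequest == "" || r.MemoryRequest == r.MemoryLimit)
	if cpuEqual && memEqual {
		return "Guaranteed"
	}
	return "Burstable"
}

func main() {
	fmt.Println(qosClass(Resources{}))
	fmt.Println(qosClass(Resources{CPURequest: "100m", CPULimit: "500m", MemoryRequest: "64Mi", MemoryLimit: "64Mi"}))
	fmt.Println(qosClass(Resources{CPURequest: "100m", CPULimit: "100m", MemoryRequest: "64Mi", MemoryLimit: "64Mi"}))
}
```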
Detects security misconfigurations and vulnerabilities:
- Pod & Container Security (`--enable-security-validation`)
  - `pod_running_as_root`: Pod SecurityContext specifies runAsUser: 0
  - `pod_allows_root_user`: Pod SecurityContext missing runAsNonRoot: true
  - `container_running_as_root`: Container SecurityContext specifies runAsUser: 0
  - `container_allows_privilege_escalation`: Container allows privilege escalation
  - `container_privileged_mode`: Container running in privileged mode
  - `container_writable_root_filesystem`: Container has writable root filesystem
  - `container_additional_capabilities`: Container adds Linux capabilities
  - `missing_pod_security_context`: Pod has no SecurityContext defined
  - `missing_container_security_context`: Container has no SecurityContext defined
- ServiceAccount & RBAC Security (`--enable-security-serviceaccount-validation`)
  - `serviceaccount_cluster_role_binding`: ServiceAccount with ClusterRoleBinding
  - `serviceaccount_excessive_permissions`: ServiceAccount with dangerous RoleBinding
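As a rough illustration of the container-level checks, the following Go sketch maps a simplified security context to the validation type names listed above. The `SecurityContext` struct, `checkContainer`, and `ptr` are hypothetical, and the handling of unset fields is an assumption: only explicit `true` values are flagged for privilege escalation and privileged mode, while an unset `readOnlyRootFilesystem` is treated as writable. Kogaro's actual rules may differ.

```go
package main

import "fmt"

// SecurityContext is a hypothetical subset of a container securityContext.
// Nil pointers mean the field was not set in the manifest.
type SecurityContext struct {
	RunAsUser                *int64
	AllowPrivilegeEscalation *bool
	Privileged               *bool
	ReadOnlyRootFilesystem   *bool
}

// ptr returns a pointer to v; handy for building literals.
func ptr[T any](v T) *T { return &v }

// checkContainer returns the validation types triggered by sc.
func checkContainer(sc *SecurityContext) []string {
	if sc == nil {
		return []string{"missing_container_security_context"}
	}
	var issues []string
	if sc.RunAsUser != nil && *sc.RunAsUser == 0 {
		issues = append(issues, "container_running_as_root")
	}
	if sc.AllowPrivilegeEscalation != nil && *sc.AllowPrivilegeEscalation {
		issues = append(issues, "container_allows_privilege_escalation")
	}
	if sc.Privileged != nil && *sc.Privileged {
		issues = append(issues, "container_privileged_mode")
	}
	if sc.ReadOnlyRootFilesystem == nil || !*sc.ReadOnlyRootFilesystem {
		issues = append(issues, "container_writable_root_filesystem")
	}
	return issues
}

func main() {
	risky := &SecurityContext{RunAsUser: ptr(int64(0)), AllowPrivilegeEscalation: ptr(true)}
	for _, issue := range checkContainer(risky) {
		fmt.Println(issue)
	}
}
```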
Validates container images and registry accessibility:
- Image Registry & Architecture (`--enable-image-validation`)
  - `invalid_image_reference`: Containers with malformed image references
  - `missing_image`: Images that don't exist in the registry
  - `missing_image_warning`: Missing images (when `--allow-missing-images` is enabled)
  - `architecture_mismatch`: Image architecture incompatible with cluster nodes
  - `architecture_mismatch_warning`: Architecture mismatches (when `--allow-architecture-mismatch` is enabled)
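The relationship between the error and warning variants can be sketched as a small decision function. This is an assumption-laden illustration, not Kogaro's code: `imageFindings` is a hypothetical name, the registry lookup is reduced to a boolean, and an image is treated as compatible when it supports at least one node architecture.

```go
package main

import "fmt"

// imageFindings returns the validation types for one image.
// exists says whether the registry lookup succeeded; imageArchs and nodeArchs
// list architectures such as "amd64" or "arm64". The allow* parameters mirror
// --allow-missing-images and --allow-architecture-mismatch.
func imageFindings(exists bool, imageArchs, nodeArchs []string, allowMissing, allowMismatch bool) []string {
	if !exists {
		if allowMissing {
			return []string{"missing_image_warning"}
		}
		return []string{"missing_image"}
	}
	nodes := make(map[string]bool, len(nodeArchs))
	for _, arch := range nodeArchs {
		nodes[arch] = true
	}
	for _, arch := range imageArchs {
		if nodes[arch] {
			return nil // at least one node can run this image
		}
	}
	if allowMismatch {
		return []string{"architecture_mismatch_warning"}
	}
	return []string{"architecture_mismatch"}
}

func main() {
	fmt.Println(imageFindings(false, nil, []string{"amd64"}, false, false))
	fmt.Println(imageFindings(true, []string{"arm64"}, []string{"amd64"}, false, false))
	fmt.Println(imageFindings(true, []string{"amd64", "arm64"}, []string{"amd64"}, false, false))
}
```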
Validates service connectivity and network policies:
- Service Connectivity (`--enable-networking-validation`)
  - `service_selector_mismatch`: Service selectors that don't match any pods
  - `service_no_endpoints`: Services with no ready endpoints despite matching pods
  - `service_port_mismatch`: Service ports that don't match container ports
  - `pod_no_service`: Pods not exposed by any Service (warning when enabled)
- NetworkPolicy Coverage (`--enable-networking-policy-validation`)
  - `network_policy_orphaned`: NetworkPolicy selectors that don't match any pods
  - `missing_network_policy_default_deny`: Namespaces with policies but no default deny
  - `missing_network_policy_required`: Required namespaces missing NetworkPolicies
- Ingress Connectivity (`--enable-networking-validation`)
  - `ingress_service_missing`: Ingress references to non-existent services
  - `ingress_service_port_mismatch`: Ingress references to non-existent service ports
  - `ingress_no_backend_pods`: Ingress services with no ready backend pods
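A selector mismatch check boils down to label matching. The Go sketch below shows the equality-based matching that Service selectors use; `selectorMatches` and `hasMatchingPod` are hypothetical helper names, and pods are reduced to their label maps.

```go
package main

import "fmt"

// selectorMatches reports whether every key/value pair in selector is present
// in labels. An empty selector is treated as matching nothing, because a
// Service without a selector does not select pods.
func selectorMatches(selector, labels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for key, want := range selector {
		got, ok := labels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// hasMatchingPod reports whether any pod label set satisfies the selector.
func hasMatchingPod(selector map[string]string, pods []map[string]string) bool {
	for _, labels := range pods {
		if selectorMatches(selector, labels) {
			return true
		}
	}
	return false
}

func main() {
	pods := []map[string]string{
		{"app": "web", "tier": "frontend"},
		{"app": "worker"},
	}
	fmt.Println(hasMatchingPod(map[string]string{"app": "web"}, pods))
	fmt.Println(hasMatchingPod(map[string]string{"app": "wbe"}, pods)) // typo in selector
}
```

A `false` result for a Service with a non-empty selector corresponds to the `service_selector_mismatch` case.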
- Prometheus Metrics: Exports validation error counts and run statistics
- Structured Logging: Detailed logs of all validation issues found
- Health Checks: Kubernetes-native health and readiness probes
Kogaro assigns structured error codes to all validation issues for easy categorization, filtering, and automated processing. Each error follows the format `KOGARO-CCC-XXX`:
- Reference Validation: `KOGARO-REF-001` through `KOGARO-REF-011`
- Resource Limits: `KOGARO-RES-001` through `KOGARO-RES-010`
- Security Validation: `KOGARO-SEC-001` through `KOGARO-SEC-012`
- Image Validation: `KOGARO-IMG-001` through `KOGARO-IMG-005`
- Networking Validation: `KOGARO-NET-001` through `KOGARO-NET-009`
Benefits:
- Automated Processing: Filter and process errors by type or category
- Metrics & Alerting: Create dashboards and alerts based on error patterns
- Tool Integration: External tools can understand and act on specific error types
- Trend Analysis: Track which issues are most common over time
See the complete Error Codes Reference for detailed mappings.
Example usage:
# Show only security issues
kubectl logs kogaro-pod | grep "KOGARO-SEC-"
# Count reference validation errors
kubectl logs kogaro-pod | grep "KOGARO-REF-" | wc -l
# Check for image-related issues
kubectl logs kogaro-pod | grep "KOGARO-IMG-"
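The same filtering can be done programmatically. Here is a minimal Go sketch that extracts codes from log lines and counts them per category; the regular expression assumes the three-letter category and three-digit number seen in the ranges above, and `countByCategory` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// codeRe captures the category (e.g. "SEC") and number (e.g. "003") of a code.
var codeRe = regexp.MustCompile(`KOGARO-([A-Z]{3})-(\d{3})`)

// countByCategory counts how many codes of each category appear in lines.
func countByCategory(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		for _, match := range codeRe.FindAllStringSubmatch(line, -1) {
			counts[match[1]]++
		}
	}
	return counts
}

func main() {
	logs := []string{
		"KOGARO-REF-001: IngressClass 'nginx' does not exist in namespace 'default'",
		"KOGARO-SEC-003: Container running as root user (UID 0)",
		"KOGARO-IMG-002: Container 'app' references non-existent image: myregistry/nonexistent:latest",
		"validation run finished",
	}
	counts := countByCategory(logs)
	for _, category := range []string{"REF", "SEC", "IMG", "NET"} {
		fmt.Printf("%s: %d\n", category, counts[category])
	}
}
```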
Deploy in 5 minutes, start catching silent failures immediately.
For detailed deployment instructions, see the Deployment Guide.
- Go 1.21 or later
- Kubernetes cluster access
- kubectl configured
# Add the Kogaro Helm repository
helm repo add kogaro https://topiaruss.github.io/kogaro
helm repo update
# Install Kogaro with default settings
helm install kogaro kogaro/kogaro \
--namespace kogaro-system \
--create-namespace
# Or install with custom configuration including image validation
helm install kogaro kogaro/kogaro \
--namespace kogaro-system \
--create-namespace \
--set validation.enableServiceAccountValidation=true \
--set validation.enableImageValidation=true \
--set validation.scanInterval=10m \
--set resourceLimits.minCPURequest=50m \
--set security.enableNetworkPolicyValidation=true
# Check deployment status
kubectl get pods -n kogaro-system
# Watch it immediately detect configuration issues
kubectl logs -n kogaro-system -l app.kubernetes.io/name=kogaro -f
# Clone and install directly
git clone https://github.com/topiaruss/kogaro.git
cd kogaro
helm install kogaro charts/kogaro --namespace kogaro-system --create-namespace
# Clone the repository
git clone https://github.com/topiaruss/kogaro.git
cd kogaro
# Install dependencies
go mod download
# Run locally against your current kubeconfig
go run main.go --scan-interval=30s
# Run with specific validations enabled (including image validation)
go run main.go --enable-secret-validation=false --enable-security-validation=true --enable-image-validation=true --min-cpu-request=100m
# Or build and run
make build
./bin/kogaro --help
NEW: Pre-deployment validation for your CI/CD pipelines
Kogaro now provides powerful CI/CD integration with scoped validation - validate your configuration files before deployment while benefiting from full cluster-aware validation.
# Validate your config file before deployment
# Shows only errors related to YOUR resources
helm template my-app ./chart | kogaro --mode=one-off --config=- --scope=file-only
# Or validate a specific YAML file
kogaro --mode=one-off --config=deployment.yaml --scope=file-only
- Focused Feedback: `--scope=file-only` shows only errors for resources in your config files
- Full Validation: Still performs cluster-aware validation (detects conflicts, missing references, etc.)
- Image Validation: Verifies container images exist in registries and match cluster architecture
- Fast Feedback: Catch issues before deployment, not after
- Automation Ready: Structured exit codes and error formats for CI/CD automation
# GitHub Actions example
- name: Validate Kubernetes manifests
  run: |
    helm template my-app ./chart | \
      kogaro --mode=one-off --config=- --scope=file-only \
      --enable-image-validation=true --output=ci
# GitLab CI example
validate-k8s:
  script:
    - helm template my-app ./chart > manifests.yaml
    - kogaro --mode=one-off --config=manifests.yaml --scope=file-only --enable-image-validation=true
- `--scope=all` (default): Show all validation errors in the cluster
- `--scope=file-only`: Show only errors for resources defined in your config file
Perfect for: Pre-deployment validation, CI/CD pipelines, developer workflows
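Conceptually, `file-only` scoping is a filter applied after the full cluster-aware validation has run. The Go sketch below illustrates that idea under stated assumptions: the `ValidationError` fields, `resourceKey`, and `filterFileOnly` are hypothetical and do not reflect Kogaro's internal types.

```go
package main

import "fmt"

// ValidationError is a hypothetical, simplified validation result.
type ValidationError struct {
	Code      string
	Kind      string
	Namespace string
	Name      string
}

// resourceKey builds the lookup key used to identify a resource.
func resourceKey(kind, namespace, name string) string {
	return kind + "/" + namespace + "/" + name
}

// filterFileOnly keeps only errors whose resource appears in inFile,
// the set of resources parsed from the supplied config file.
func filterFileOnly(errs []ValidationError, inFile map[string]bool) []ValidationError {
	var kept []ValidationError
	for _, e := range errs {
		if inFile[resourceKey(e.Kind, e.Namespace, e.Name)] {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	all := []ValidationError{
		{Code: "KOGARO-REF-001", Kind: "Ingress", Namespace: "default", Name: "my-app"},
		{Code: "KOGARO-SEC-003", Kind: "Pod", Namespace: "legacy", Name: "old-job"},
	}
	inFile := map[string]bool{resourceKey("Ingress", "default", "my-app"): true}
	for _, e := range filterFileOnly(all, inFile) {
		fmt.Printf("%s %s/%s\n", e.Code, e.Kind, e.Name)
	}
}
```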
- `--scan-interval`: Interval between cluster scans (default: 5m)
- `--metrics-bind-address`: Metrics server bind address (default: :8080)
- `--health-probe-bind-address`: Health probe bind address (default: :8081)
- `--leader-elect`: Enable leader election for HA deployments (default: false)
- `--scope`: Control which errors are displayed for one-off validations
  - `all`: Show all validation errors (default)
  - `file-only`: Show only errors for resources defined in the config file
- `--enable-ingress-validation`: Enable Ingress references validation (default: true)
- `--enable-configmap-validation`: Enable ConfigMap references validation (default: true)
- `--enable-secret-validation`: Enable Secret references validation (default: true)
- `--enable-pvc-validation`: Enable PVC/StorageClass validation (default: true)
- `--enable-reference-serviceaccount-validation`: Enable ServiceAccount reference validation (default: false)
- `--enable-resource-limits-validation`: Enable resource requests/limits validation (default: true)
- `--enable-missing-requests-validation`: Enable missing requests validation (default: true)
- `--enable-missing-limits-validation`: Enable missing limits validation (default: true)
- `--enable-qos-validation`: Enable QoS class analysis (default: true)
- `--min-cpu-request`: Minimum CPU request threshold (e.g., '10m')
- `--min-memory-request`: Minimum memory request threshold (e.g., '16Mi')
- `--enable-security-validation`: Enable security configuration validation (default: true)
- `--enable-root-user-validation`: Enable root user validation (default: true)
- `--enable-security-context-validation`: Enable SecurityContext validation (default: true)
- `--enable-security-serviceaccount-validation`: Enable ServiceAccount permissions validation (default: true)
- `--enable-network-policy-validation`: Enable NetworkPolicy validation (default: true)
- `--security-required-namespaces`: Namespaces requiring NetworkPolicies for security validation
- `--enable-image-validation`: Enable container image validation (default: false)
- `--allow-missing-images`: Allow deployment if images are not found in registry (default: false)
- `--allow-architecture-mismatch`: Allow deployment if image architecture doesn't match nodes (default: false)
- `--enable-networking-validation`: Enable networking connectivity validation (default: true)
- `--enable-networking-service-validation`: Enable Service validation (default: true)
- `--enable-networking-ingress-validation`: Enable Ingress connectivity validation (default: true)
- `--enable-networking-policy-validation`: Enable NetworkPolicy coverage validation (default: true)
- `--networking-required-namespaces`: Namespaces requiring NetworkPolicies for networking validation
- `--warn-unexposed-pods`: Warn about pods not exposed by Services (default: false)
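To illustrate what a threshold such as `--min-cpu-request` implies, the following Go sketch compares a CPU request against a minimum. It is a simplified stand-in: `parseMilliCPU` and `cpuRequestTooLow` are hypothetical helpers that handle only plain core counts (`"0.5"`, `"2"`) and millicore values (`"50m"`), whereas production code would more likely rely on `resource.ParseQuantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseMilliCPU converts a CPU quantity such as "50m" or "0.5" to millicores.
func parseMilliCPU(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	cores, err := strconv.ParseFloat(q, 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(cores * 1000)), nil
}

// cpuRequestTooLow reports whether request is below the configured minimum.
func cpuRequestTooLow(request, minimum string) (bool, error) {
	req, err := parseMilliCPU(request)
	if err != nil {
		return false, fmt.Errorf("request %q: %w", request, err)
	}
	floor, err := parseMilliCPU(minimum)
	if err != nil {
		return false, fmt.Errorf("minimum %q: %w", minimum, err)
	}
	return req < floor, nil
}

func main() {
	for _, request := range []string{"10m", "50m", "0.1"} {
		low, err := cpuRequestTooLow(request, "50m")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("request %s below 50m: %t\n", request, low)
	}
}
```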
Access metrics at `http://localhost:8080/metrics`:
# Total validation errors by type
kogaro_validation_errors_total{resource_type="Ingress",validation_type="dangling_ingress_class",namespace="default"}
# Total validation runs
kogaro_validation_runs_total
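For quick ad-hoc analysis without a Prometheus server, the scraped text can be summarized directly. The Go sketch below totals `kogaro_validation_errors_total` per `validation_type`; the sample values are invented for illustration, `sumByValidationType` is a hypothetical helper, and the parser assumes sample lines without a trailing timestamp.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var (
	// sampleRe matches one kogaro_validation_errors_total sample line.
	sampleRe = regexp.MustCompile(`^kogaro_validation_errors_total\{([^}]*)\}\s+(\S+)$`)
	// typeRe extracts the validation_type label from the label set.
	typeRe = regexp.MustCompile(`validation_type="([^"]*)"`)
)

// sumByValidationType totals the counter per validation_type label.
func sumByValidationType(exposition string) map[string]float64 {
	totals := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		sample := sampleRe.FindStringSubmatch(strings.TrimSpace(scanner.Text()))
		if sample == nil {
			continue // comments, blank lines, and other metrics
		}
		label := typeRe.FindStringSubmatch(sample[1])
		value, err := strconv.ParseFloat(sample[2], 64)
		if label == nil || err != nil {
			continue
		}
		totals[label[1]] += value
	}
	return totals
}

func main() {
	// Sample values are made up for this example.
	text := `# TYPE kogaro_validation_errors_total counter
kogaro_validation_errors_total{resource_type="Ingress",validation_type="dangling_ingress_class",namespace="default"} 2
kogaro_validation_errors_total{resource_type="Ingress",validation_type="dangling_ingress_class",namespace="staging"} 1
kogaro_validation_runs_total 12
`
	for validationType, total := range sumByValidationType(text) {
		fmt.Printf("%s: %g\n", validationType, total)
	}
}
```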
Built for Production Operations
Kogaro uses a modular validator architecture designed for enterprise Kubernetes environments:
- Validator Registry: Extensible system managing Reference, Security, Resource, and Networking validators
- Continuous Monitoring: Configurable scan intervals from seconds to hours
- Operational Intelligence:
  - Detects silent failures before they impact users
  - Structured error codes for automated response systems
  - Real-time configuration drift detection
  - Network connectivity and security posture validation
- Enterprise Features: Leader election, HA deployment, comprehensive RBAC
- Observability: Prometheus metrics, structured logging, health checks
- Zero-Downtime: Kubernetes-native with rolling updates and graceful shutdown
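The validator registry pattern supports easy extension. Add new validators by implementing the `Validator` interface:
// validateNewResourceType is a template for a new check on ReferenceValidator.
func (v *ReferenceValidator) validateNewResourceType(ctx context.Context) ([]ValidationError, error) {
	var validationErrors []ValidationError
	// Your validation logic here: append one ValidationError per issue found
	return validationErrors, nil
}
Then register it in the validator registry. See the Contributing Guide for details.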
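For readers unfamiliar with the registry pattern, here is a self-contained Go sketch of how such a design might look. Everything in it is an assumption made for illustration: the `Validator` method set, the `Registry` type, and the `staticValidator` stub are hypothetical and are not taken from Kogaro's source, so consult the Contributing Guide for the real interface.

```go
package main

import (
	"context"
	"fmt"
)

// ValidationError is a hypothetical, simplified validation result.
type ValidationError struct {
	Code     string
	Resource string
	Message  string
}

// Validator is an assumed interface; the real one may differ.
type Validator interface {
	Name() string
	Validate(ctx context.Context) ([]ValidationError, error)
}

// Registry holds validators and runs them in registration order.
type Registry struct {
	validators []Validator
}

// Register adds a validator to the registry.
func (r *Registry) Register(v Validator) {
	r.validators = append(r.validators, v)
}

// RunAll runs every validator and collects the results, stopping at the first failure.
func (r *Registry) RunAll(ctx context.Context) ([]ValidationError, error) {
	var all []ValidationError
	for _, v := range r.validators {
		found, err := v.Validate(ctx)
		if err != nil {
			return all, fmt.Errorf("validator %s: %w", v.Name(), err)
		}
		all = append(all, found...)
	}
	return all, nil
}

// staticValidator is a stub that always reports the same issues.
type staticValidator struct {
	name   string
	issues []ValidationError
}

func (s staticValidator) Name() string { return s.name }

func (s staticValidator) Validate(ctx context.Context) ([]ValidationError, error) {
	return s.issues, nil
}

// runExample registers one stub validator and runs the registry.
func runExample() ([]ValidationError, error) {
	registry := &Registry{}
	registry.Register(staticValidator{
		name: "example",
		issues: []ValidationError{
			{Code: "KOGARO-REF-001", Resource: "Ingress/my-app", Message: "IngressClass 'nginx' does not exist"},
		},
	})
	return registry.RunAll(context.Background())
}

func main() {
	issues, err := runExample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, issue := range issues {
		fmt.Printf("%s %s: %s\n", issue.Code, issue.Resource, issue.Message)
	}
}
```

Keeping validators behind one small interface lets the scan loop stay unchanged as new checks are added.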
The Problem: Your CI/CD pipeline deploys this Ingress successfully:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  ingressClassName: nginx  # Typo! Should be 'ingress-nginx'
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix  # required for networking.k8s.io/v1 Ingress paths
        backend:
          service:
            name: my-app-service
            port:
              number: 80
What happens: Deployment succeeds, but traffic fails silently. Users see 404 errors.
Kogaro catches it immediately:
KOGARO-REF-001: IngressClass 'nginx' does not exist in namespace 'default'
Resource: Ingress/my-app
Expected: 'ingress-nginx' (available IngressClass)
Impact: Traffic routing will fail
Fix: kubectl patch ingress my-app -p '{"spec":{"ingressClassName":"ingress-nginx"}}'
The Problem: This pod deploys without errors:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: myapp:latest
    securityContext:
      runAsUser: 0  # Running as root!
      allowPrivilegeEscalation: true  # Security risk!
What happens: Pod runs successfully, but creates security vulnerabilities.
Kogaro detects the risk:
KOGARO-SEC-003: Container running as root user (UID 0)
Resource: Pod/my-app
Security Risk: HIGH - Root access in container
Best Practice: Set runAsUser to non-zero value
Fix: Add securityContext with runAsUser: 1000 and runAsNonRoot: true
The Problem: A container image doesn't exist or has an architecture mismatch:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: myregistry/nonexistent:latest  # Image doesn't exist!
      - name: sidecar
        image: arm64/nginx:latest  # Wrong architecture for AMD64 cluster!
What happens: Deployment succeeds, but pods get ImagePullBackOff errors.
Kogaro prevents the issue:
KOGARO-IMG-002: Container 'app' references non-existent image: myregistry/nonexistent:latest
Resource: Deployment/my-app
Impact: Pod will fail to start with ImagePullBackOff
Fix: Verify image exists in registry or use --allow-missing-images for development
KOGARO-IMG-004: Container 'sidecar' image architecture (arm64) incompatible with cluster nodes (amd64)
Resource: Deployment/my-app
Impact: Pod will fail to start on amd64 nodes
Fix: Use multi-arch image or specify correct architecture
- Error Codes Reference - Complete mapping of structured error codes for all validation types
- Deployment Guide - Comprehensive deployment and configuration instructions
- Contributing Guide - Development setup and contribution guidelines
- Security Policy - Security considerations and vulnerability reporting
- Validation Mappings - Technical mapping of validation types to error codes, Kubernetes spec paths, and test files
"Kogaro caught a dangling IngressClass reference that would have caused a production outage. Our deployment pipeline passed all tests, but traffic would have failed silently."
– DevOps Engineer, Fortune 500 Company
"We use Kogaro's structured error codes to automatically create Jira tickets for configuration issues. Game changer for our automation."
– Platform Team Lead, Tech Startup
"Finally, a tool that catches the 'invisible' issues that cause 3 AM pages. Kogaro pays for itself in the first week."
– SRE Manager, SaaS Company
- Report Issues
- Feature Requests
- Contributing Guide
- Security Policy
- Temporal Intelligence: Distinguish NEW issues from stable patterns
- Custom Validations: Plugin system for organization-specific rules
- GitOps Integration: Pre-deployment validation in CI/CD pipelines
- Advanced Alerting: Slack, PagerDuty, and custom webhook integration
- Multi-cluster: Fleet-wide configuration consistency validation