# Validation Rules

How to define success criteria for challenges using the CLI-based validation system.

Validation rules define when a challenge is considered "solved". Kubeasy uses a CLI-based validation system: validations are defined in `challenge.yaml` and executed directly against the Kubernetes cluster.
## Philosophy: Validate Outcomes, Not Implementations

The key principle of Kubeasy validations is to check **that** the problem is fixed, not **how** it was fixed.

**Bad validation (reveals the solution):**

```yaml
- key: memory-limit-fixed
  title: "Memory Limit Set to 256Mi"
  description: "Container memory limit must be 256Mi"
```

**Good validation (checks outcome):**

```yaml
- key: stable-operation
  title: "Stable Operation"
  description: "Pod must run without crashing"
```

This allows multiple valid solutions and doesn't spoil the learning experience.
## Validation Types

| Type | Purpose | Example Use Case |
|---|---|---|
| `status` | Check resource conditions | Pod Ready, Deployment Available |
| `log` | Find strings in container logs | "Connected to database", "Server started" |
| `event` | Detect forbidden K8s events | No OOMKilled, no Evicted |
| `metrics` | Check pod/deployment metrics | Restart count < 3, replicas >= 2 |
| `connectivity` | HTTP connectivity tests | Service responds with 200 |
## Defining Validations

All validations are defined in the `validations` section of `challenge.yaml`:

```yaml
title: Pod Evicted
description: |
  A pod keeps crashing...
# ... other metadata

validations:
  - key: unique-identifier
    title: "User-Friendly Title"
    description: "What this checks (not how to fix it)"
    order: 1
    type: status
    spec:
      # Type-specific configuration
```

### Common Fields

| Field | Required | Description |
|---|---|---|
| `key` | Yes | Unique identifier for this validation |
| `title` | Yes | Short title shown in the UI |
| `description` | Yes | What this validation checks |
| `order` | No | Display order (lower = first) |
| `type` | Yes | One of: `status`, `log`, `event`, `metrics`, `connectivity` |
| `spec` | Yes | Type-specific configuration |
## Validation Examples

### Status Validation

Check whether a resource has specific conditions (Ready, Available, etc.):
```yaml
- key: pod-ready-check
  title: "Pod Ready"
  description: "The application pod must be running and healthy"
  order: 1
  type: status
  spec:
    target:
      kind: Pod
      labelSelector:
        app: my-app
    conditions:
      - type: Ready
        status: "True"
```

For Deployments:

```yaml
- key: deployment-available
  title: "Deployment Available"
  description: "All replicas must be available"
  order: 1
  type: status
  spec:
    target:
      kind: Deployment
      name: my-deployment
    conditions:
      - type: Available
        status: "True"
```

### Log Validation
Find expected strings in container logs.
```yaml
- key: database-connection
  title: "Database Connected"
  description: "The application must connect to the database"
  order: 2
  type: log
  spec:
    target:
      kind: Pod
      labelSelector:
        app: api-service
    expectedStrings:
      - "Connected to database successfully"
    sinceSeconds: 120
```

With a specific container:

```yaml
- key: sidecar-logs
  title: "Sidecar Running"
  description: "The sidecar container must be logging"
  type: log
  spec:
    target:
      kind: Pod
      labelSelector:
        app: my-app
    containerName: sidecar
    expectedStrings:
      - "Sidecar initialized"
    sinceSeconds: 60
```

### Event Validation
Detect forbidden Kubernetes events (useful for checking stability).
```yaml
- key: no-crashes
  title: "No Crash Events"
  description: "The pod should not experience crashes or evictions"
  order: 3
  type: event
  spec:
    target:
      kind: Pod
      labelSelector:
        app: data-processor
    forbiddenReasons:
      - "OOMKilled"
      - "Evicted"
      - "BackOff"
      - "FailedScheduling"
    sinceSeconds: 300
```

### Metrics Validation
Check pod or deployment metrics like restart count.
```yaml
- key: low-restarts
  title: "Low Restart Count"
  description: "Pod must be stable without excessive restarts"
  order: 4
  type: metrics
  spec:
    target:
      kind: Pod
      labelSelector:
        app: my-app
    metricChecks:
      - metric: restartCount
        operator: LessThan
        value: 3
```

Available operators: `LessThan`, `GreaterThan`, `Equals`.
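As an illustrative sketch of the `GreaterThan` operator, a check that a Deployment keeps at least two replicas available might look like the following. Note that the `availableReplicas` metric name is an assumption for illustration; only `restartCount` is confirmed by the examples on this page.

```yaml
# Illustrative only: assumes the metrics validator exposes a
# replica-count metric for Deployments (metric name hypothetical).
- key: scaled-out
  title: "Scaled Out"
  description: "The deployment must keep at least two replicas available"
  type: metrics
  spec:
    target:
      kind: Deployment
      name: my-deployment
    metricChecks:
      - metric: availableReplicas
        operator: GreaterThan
        value: 1
```

`GreaterThan` with `value: 1` passes once two or more replicas are available, matching the "replicas >= 2" use case from the table above.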
### Connectivity Validation
Test HTTP connectivity between pods.
```yaml
- key: service-reachable
  title: "Service Connectivity"
  description: "The backend must be reachable"
  order: 6
  type: connectivity
  spec:
    sourcePod:
      labelSelector:
        app: client
    targets:
      - url: "http://backend-service:8080/health"
        expectedStatusCode: 200
        timeoutSeconds: 5
```

With custom headers (useful for Ingress testing):

```yaml
- key: ingress-routing
  title: "Ingress Routing"
  description: "Traffic must route through ingress"
  type: connectivity
  spec:
    sourcePod:
      labelSelector:
        app: client
    targets:
      - url: "http://ingress-nginx-controller.ingress-nginx.svc.cluster.local"
        headers:
          Host: "api.local"
        expectedStatusCode: 200
        timeoutSeconds: 5
```

## Kyverno Policies
Kyverno policies prevent users from bypassing the challenge (e.g., replacing the broken app with a working one).
### What to Protect

- **Container images** - prevent replacing the application
- **Critical volume mounts** - prevent removing problematic configs
- **Essential labels** - ensure validations can find resources
### Example Policy

```yaml
# policies/protect.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-challenge-image
spec:
  validationFailureAction: Enforce
  rules:
    - name: preserve-image
      match:
        resources:
          kinds: ["Deployment"]
          names: ["my-app"]
          namespaces: ["challenge-*"]
      validate:
        message: "Cannot change the application image"
        pattern:
          spec:
            template:
              spec:
                containers:
                  - name: app
                    image: "kubeasy/broken-app:v1"
```

### What NOT to Protect
Users should be free to:
- Modify resource limits/requests
- Add environment variables
- Change probe configurations
- Add/modify labels and annotations
- Scale deployments
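For instance, raising a memory limit is exactly the kind of user-side fix a policy must leave alone. The fragment below is an illustrative sketch (resource and container names are placeholders): with the example policy above in place, the protected image stays fixed while the `resources` block remains freely editable.

```yaml
# Illustrative Deployment fragment - names are placeholders.
# A policy protecting the image should still allow edits like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: "kubeasy/broken-app:v1"   # unchanged (protected by policy)
          resources:
            requests:
              memory: "128Mi"
            limits:
              memory: "256Mi"              # user may freely adjust
```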
## Complete Example

Here's a complete `challenge.yaml` with multiple validations:

```yaml
title: Pod Evicted
description: |
  A data processing pod keeps crashing and getting evicted.
  It was working fine yesterday, but now Kubernetes keeps killing it.
theme: resources-scaling
difficulty: easy
estimated_time: 15

initial_situation: |
  A data processing application is deployed as a single pod.
  The pod starts successfully but after a few seconds gets killed.
  It enters a CrashLoopBackOff state and keeps restarting.

objective: |
  Fix the pod so it can run without being evicted.
  Understand why Kubernetes is killing the application.

validations:
  - key: pod-running
    title: "Pod Ready"
    description: "The data-processor pod must be running and healthy"
    order: 1
    type: status
    spec:
      target:
        kind: Pod
        labelSelector:
          app: data-processor
      conditions:
        - type: Ready
          status: "True"

  - key: no-eviction
    title: "No Crash Events"
    description: "The pod should run stably without being killed"
    order: 2
    type: event
    spec:
      target:
        kind: Pod
        labelSelector:
          app: data-processor
      forbiddenReasons:
        - "Evicted"
        - "OOMKilled"
      sinceSeconds: 300

  - key: low-restarts
    title: "Stable Operation"
    description: "The pod must not restart excessively"
    order: 3
    type: metrics
    spec:
      target:
        kind: Pod
        labelSelector:
          app: data-processor
      metricChecks:
        - metric: restartCount
          operator: LessThan
          value: 3
```

## Anti-Patterns
### Don't reveal the solution in validation titles

```yaml
# BAD
- key: memory-limit
  title: "Memory Limit Increased to 256Mi"

# GOOD
- key: stable-operation
  title: "Stable Operation"
```

### Don't be too specific about implementation

```yaml
# BAD
- key: probe-check
  title: "Liveness Probe Uses /healthz Endpoint"

# GOOD
- key: health-checks
  title: "Health Checks Pass"
```

### Don't check implementation details

```yaml
# BAD - forces a specific solution
- key: secret-volume
  title: "Secret Mounted at /etc/credentials"
  type: status
  spec:
    # Checks for a specific volume mount path

# GOOD - checks that the app works
- key: authentication
  title: "Application Authenticated"
  type: log
  spec:
    expectedStrings:
      - "Authentication successful"
```

## How Validation Works
1. **User starts challenge** → CLI deploys manifests via ArgoCD
2. **User works on the fix** → modifies resources with kubectl
3. **User submits** → CLI loads validations from `challenge.yaml`
4. **CLI executes validations** → runs each check against the cluster
5. **Results sent to backend** → backend verifies that all objectives are present
6. **XP awarded** → if all validations pass
## Next Steps
- See Challenge Structure for the complete challenge format
- Learn Creating Your First Challenge step by step
- Review Testing Challenges for verification