# Validation Rules

How to define success criteria for challenges using the CLI-based validation system.

Validation rules define when a challenge is considered "solved". Kubeasy uses a CLI-based validation system: validations are defined in `challenge.yaml` and executed directly against the Kubernetes cluster.
## Philosophy: Validate Outcomes, Not Implementations

The key principle of Kubeasy validations is to check **that** the problem is fixed, not **how** it was fixed.

**Bad validation (reveals the solution):**

```yaml
- key: memory-limit-fixed
  title: "Memory Limit Set to 256Mi"
  description: "Container memory limit must be 256Mi"
```

**Good validation (checks outcome):**

```yaml
- key: stable-operation
  title: "Stable Operation"
  description: "Pod must run without crashing"
```

This allows multiple valid solutions and doesn't spoil the learning experience.
## Validation Types

| Type | Purpose | Example Use Case |
|---|---|---|
| `status` | Check resource conditions | Pod Ready, Deployment Available |
| `log` | Find strings in container logs | "Connected to database", "Server started" |
| `event` | Detect forbidden K8s events | No OOMKilled, no Evicted |
| `metrics` | Check pod/deployment metrics | Restart count < 3, replicas >= 2 |
| `connectivity` | HTTP connectivity tests | Service responds with 200 |
## Defining Validations

All validations are defined in the `validations` section of `challenge.yaml`:

```yaml
title: Pod Evicted
description: |
  A pod keeps crashing...
# ... other metadata

validations:
  - key: unique-identifier
    title: "User-Friendly Title"
    description: "What this checks (not how to fix it)"
    order: 1
    type: status
    spec:
      # Type-specific configuration
```

### Common Fields

| Field | Required | Description |
|---|---|---|
| `key` | Yes | Unique identifier for this validation |
| `title` | Yes | Short title shown in the UI |
| `description` | Yes | What this validation checks |
| `order` | No | Display order (lower = first) |
| `type` | Yes | One of: `status`, `log`, `event`, `metrics`, `connectivity` |
| `spec` | Yes | Type-specific configuration |
## Validation Examples

### Status Validation

Check whether a resource has specific conditions (Ready, Available, etc.):
```yaml
- key: pod-ready-check
  title: "Pod Ready"
  description: "The application pod must be running and healthy"
  order: 1
  type: status
  spec:
    target:
      kind: Pod
      labelSelector:
        app: my-app
    conditions:
      - type: Ready
        status: "True"
```

For Deployments:

```yaml
- key: deployment-available
  title: "Deployment Available"
  description: "All replicas must be available"
  order: 1
  type: status
  spec:
    target:
      kind: Deployment
      name: my-deployment
    conditions:
      - type: Available
        status: "True"
```

### Log Validation
Find expected strings in container logs.
```yaml
- key: database-connection
  title: "Database Connected"
  description: "The application must connect to the database"
  order: 2
  type: log
  spec:
    target:
      kind: Pod
      labelSelector:
        app: api-service
    expectedStrings:
      - "Connected to database successfully"
    sinceSeconds: 120
```

With a specific container:

```yaml
- key: sidecar-logs
  title: "Sidecar Running"
  description: "The sidecar container must be logging"
  type: log
  spec:
    target:
      kind: Pod
      labelSelector:
        app: my-app
    containerName: sidecar
    expectedStrings:
      - "Sidecar initialized"
    sinceSeconds: 60
```

### Event Validation
Detect forbidden Kubernetes events (useful for checking stability).
```yaml
- key: no-crashes
  title: "No Crash Events"
  description: "The pod should not experience crashes or evictions"
  order: 3
  type: event
  spec:
    target:
      kind: Pod
      labelSelector:
        app: data-processor
    forbiddenReasons:
      - "OOMKilled"
      - "Evicted"
      - "BackOff"
      - "FailedScheduling"
    sinceSeconds: 300
```

### Metrics Validation
Check pod or deployment metrics like restart count.
```yaml
- key: low-restarts
  title: "Low Restart Count"
  description: "Pod must be stable without excessive restarts"
  order: 4
  type: metrics
  spec:
    target:
      kind: Pod
      labelSelector:
        app: my-app
    metricChecks:
      - metric: restartCount
        operator: LessThan
        value: 3
```

Available operators: `LessThan`, `GreaterThan`, `Equals`.
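As an illustrative sketch of the `GreaterThan` operator, a check that a Deployment keeps at least two replicas available might look like the following. Note that the `availableReplicas` metric name is an assumption for illustration; only `restartCount` is confirmed by the examples on this page.

```yaml
# Illustrative only: assumes the metrics validator exposes a
# replica-count metric for Deployments (metric name hypothetical).
- key: scaled-out
  title: "Scaled Out"
  description: "The deployment must keep at least two replicas available"
  type: metrics
  spec:
    target:
      kind: Deployment
      name: my-deployment
    metricChecks:
      - metric: availableReplicas
        operator: GreaterThan
        value: 1
```

`GreaterThan` with `value: 1` passes once two or more replicas are available, matching the "replicas >= 2" use case from the table above.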
### Connectivity Validation
Test HTTP connectivity between pods.
```yaml
- key: service-reachable
  title: "Service Connectivity"
  description: "The backend must be reachable"
  order: 6
  type: connectivity
  spec:
    sourcePod:
      labelSelector:
        app: client
    targets:
      - url: "http://backend-service:8080/health"
        expectedStatusCode: 200
        timeoutSeconds: 5
```

With custom headers (useful for Ingress testing):

```yaml
- key: ingress-routing
  title: "Ingress Routing"
  description: "Traffic must route through ingress"
  type: connectivity
  spec:
    sourcePod:
      labelSelector:
        app: client
    targets:
      - url: "http://ingress-nginx-controller.ingress-nginx.svc.cluster.local"
        headers:
          Host: "api.local"
        expectedStatusCode: 200
        timeoutSeconds: 5
```

## Kyverno Policies
Kyverno policies prevent users from bypassing the challenge (e.g., replacing the broken app with a working one).
### What to Protect

- **Container images** - prevent replacing the application
- **Critical volume mounts** - prevent removing problematic configs
- **Essential labels** - ensure validations can find resources
### Example Policy

```yaml
# policies/protect.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-challenge-image
spec:
  validationFailureAction: Enforce
  rules:
    - name: preserve-image
      match:
        resources:
          kinds: ["Deployment"]
          names: ["my-app"]
          namespaces: ["challenge-*"]
      validate:
        message: "Cannot change the application image"
        pattern:
          spec:
            template:
              spec:
                containers:
                  - name: app
                    image: "kubeasy/broken-app:v1"
```

### What NOT to Protect
Users should be free to:
- Modify resource limits/requests
- Add environment variables
- Change probe configurations
- Add/modify labels and annotations
- Scale deployments
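For instance, raising a memory limit is exactly the kind of user-side fix a policy must leave alone. The fragment below is an illustrative sketch (resource and container names are placeholders): with the example policy above in place, the protected image stays fixed while the `resources` block remains freely editable.

```yaml
# Illustrative Deployment fragment - names are placeholders.
# A policy protecting the image should still allow edits like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: "kubeasy/broken-app:v1"   # unchanged (protected by policy)
          resources:
            requests:
              memory: "128Mi"
            limits:
              memory: "256Mi"              # user may freely adjust
```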
## Complete Example

Here's a complete `challenge.yaml` with multiple validations:

```yaml
title: Pod Evicted
description: |
  A data processing pod keeps crashing and getting evicted.
  It was working fine yesterday, but now Kubernetes keeps killing it.
theme: resources-scaling
difficulty: easy
estimated_time: 15

initial_situation: |
  A data processing application is deployed as a single pod.
  The pod starts successfully but after a few seconds gets killed.
  It enters a CrashLoopBackOff state and keeps restarting.

objective: |
  Fix the pod so it can run without being evicted.
  Understand why Kubernetes is killing the application.

validations:
  - key: pod-running
    title: "Pod Ready"
    description: "The data-processor pod must be running and healthy"
    order: 1
    type: status
    spec:
      target:
        kind: Pod
        labelSelector:
          app: data-processor
      conditions:
        - type: Ready
          status: "True"

  - key: no-eviction
    title: "No Crash Events"
    description: "The pod should run stably without being killed"
    order: 2
    type: event
    spec:
      target:
        kind: Pod
        labelSelector:
          app: data-processor
      forbiddenReasons:
        - "Evicted"
        - "OOMKilled"
      sinceSeconds: 300

  - key: low-restarts
    title: "Stable Operation"
    description: "The pod must not restart excessively"
    order: 3
    type: metrics
    spec:
      target:
        kind: Pod
        labelSelector:
          app: data-processor
      metricChecks:
        - metric: restartCount
          operator: LessThan
          value: 3
```

## Anti-Patterns
### Don't reveal the solution in validation titles

```yaml
# BAD
- key: memory-limit
  title: "Memory Limit Increased to 256Mi"

# GOOD
- key: stable-operation
  title: "Stable Operation"
```

### Don't be too specific about implementation

```yaml
# BAD
- key: probe-check
  title: "Liveness Probe Uses /healthz Endpoint"

# GOOD
- key: health-checks
  title: "Health Checks Pass"
```

### Don't check implementation details

```yaml
# BAD - forces a specific solution
- key: secret-volume
  title: "Secret Mounted at /etc/credentials"
  type: status
  spec:
    # Checks for a specific volume mount path

# GOOD - checks that the app works
- key: authentication
  title: "Application Authenticated"
  type: log
  spec:
    expectedStrings:
      - "Authentication successful"
```

## How Validation Works
1. **User starts challenge** → CLI deploys manifests via ArgoCD
2. **User works on the fix** → modifies resources with kubectl
3. **User submits** → CLI loads validations from `challenge.yaml`
4. **CLI executes validations** → runs each check against the cluster
5. **Results sent to backend** → backend verifies that all objectives are present
6. **XP awarded** → if all validations pass
## Next Steps
- See Challenge Structure for the complete challenge format
- Learn Creating Your First Challenge step by step
- Review Testing Challenges for verification