KEDA¶
KEDA (Kubernetes Event-Driven Autoscaling) extends the Kubernetes HPA to scale workloads based on external event sources - Kafka consumer lag, queue depth, Prometheus metrics, HTTP request rate, cron schedules, and dozens more - including scaling all the way to zero.
The standard HPA only scales on CPU and memory. For event-driven workloads, those signals arrive too late or don't reflect the actual backlog. A Kafka consumer might be idle (low CPU) but sitting behind 10,000 unprocessed messages. KEDA surfaces that backlog as the scaling signal.
Architecture¶
flowchart TD
KEDA[KEDA Operator] --> |reads| SO[ScaledObject\nor ScaledJob]
KEDA --> |creates/manages| HPA[HorizontalPodAutoscaler]
KEDA --> |queries| Scaler[External scaler\nKafka / Redis / SQS / Prometheus...]
Scaler --> |metric value| KEDA
HPA --> |scales| Workload[Deployment / StatefulSet\nor Jobs]
KEDA doesn't replace the HPA - it generates and manages HPA objects on your behalf. The KEDA operator serves external metric values through the Kubernetes External Metrics API, where the HPA's scaling loop consumes them. Because an HPA cannot scale below one replica, KEDA itself handles activation from zero to one; the generated HPA handles scaling between one and maxReplicaCount. This means KEDA composes with standard HPA behavior (stabilization windows, scaling policies) rather than replacing it.
ScaledObject¶
A ScaledObject ties a workload to one or more scalers:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  pollingInterval: 15     # check the scaler every 15s
  cooldownPeriod: 60      # wait 60s after the last active trigger before scaling to zero
  minReplicaCount: 1      # floor while any trigger is active
  maxReplicaCount: 50
  idleReplicaCount: 0     # scale to zero when no trigger is active (must be < minReplicaCount)
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.platform:9092
      consumerGroup: order-processor
      topic: orders
      lagThreshold: "100"    # target: one replica per 100 messages of lag
      offsetResetPolicy: latest
    authenticationRef:
      name: kafka-trigger-auth
lagThreshold: "100" sets the per-replica target: desiredReplicas = ceil(currentLag / lagThreshold). With 1,000 messages queued and a threshold of 100, KEDA targets 10 replicas. With 0 messages and idleReplicaCount: 0, KEDA scales to zero.
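Under the hood, KEDA generates an HPA named keda-hpa-<scaledobject-name> for this ScaledObject. A rough sketch of what it creates - the external metric name format varies across KEDA versions:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-order-processor
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 1                 # KEDA, not the HPA, handles the 0 <-> 1 transition
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: s0-kafka-orders    # naming format varies by KEDA version
      target:
        type: AverageValue
        averageValue: "100"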
TriggerAuthentication¶
Credentials for scalers are separated from the ScaledObject using TriggerAuthentication:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth
  namespace: production
spec:
  secretTargetRef:
  - parameter: sasl
    name: kafka-credentials
    key: sasl-mechanism
  - parameter: username
    name: kafka-credentials
    key: username
  - parameter: password
    name: kafka-credentials
    key: password
  - parameter: tls
    name: kafka-credentials
    key: tls
Use ClusterTriggerAuthentication for credentials shared across namespaces (e.g., a single AWS IAM role or cloud credentials); see the AWS SQS example below.
Common scalers¶
Kafka¶
triggers:
- type: kafka
  metadata:
    bootstrapServers: kafka.platform:9092
    consumerGroup: my-consumer
    topic: my-topic
    lagThreshold: "50"
Redis Lists (queue depth)¶
triggers:
- type: redis
  metadata:
    address: redis.platform:6379
    listName: job-queue
    listLength: "20"    # target: one replica per 20 items
  authenticationRef:
    name: redis-auth
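The redis-auth TriggerAuthentication referenced above might look like this minimal sketch, assuming a Secret named redis-credentials holding the password:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: redis-auth
  namespace: production
spec:
  secretTargetRef:
  - parameter: password        # consumed by the redis scaler
    name: redis-credentials    # hypothetical Secret name
    key: password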
AWS SQS¶
triggers:
- type: aws-sqs-queue
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
    queueLength: "10"
    awsRegion: us-east-1
  authenticationRef:
    name: aws-keda-auth
Use IRSA (IAM Roles for Service Accounts) with ClusterTriggerAuthentication to avoid long-lived credentials.
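A minimal sketch of that pattern - it assumes IRSA is already configured so the workload's service account is bound to an IAM role with SQS permissions:
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: aws-keda-auth
spec:
  podIdentity:
    provider: aws-eks    # assume the IAM role via the service account (IRSA)
When referencing a ClusterTriggerAuthentication, set kind: ClusterTriggerAuthentication in the trigger's authenticationRef.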
Prometheus¶
Scale on any Prometheus metric:
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090
    metricName: active_transactions
    query: |
      sum(active_transactions{namespace="production"})
    threshold: "100"    # one replica per 100 active transactions
This is the most flexible scaler. Any signal you can express in PromQL - queue depth, active sessions, p99 latency above a threshold, error rate - can drive scaling.
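For example, a sketch of a latency-driven trigger - the metric and label names here are hypothetical:
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090
    query: |
      histogram_quantile(0.99,
        sum(rate(http_request_duration_seconds_bucket{namespace="production"}[5m])) by (le))
    threshold: "0.5"    # roughly: one extra replica per 0.5s of p99 latency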
HTTP (HTTP Add-on)¶
Scale HTTP workloads to zero and back when a request arrives. The HTTP Add-on ships its own resource, HTTPScaledObject, rather than a trigger type on a ScaledObject. A sketch - field names follow recent add-on versions and have changed over time:
kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
  name: api
  namespace: production
spec:
  hosts:
  - api.example.com
  scaleTargetRef:
    name: api
    kind: Deployment
    apiVersion: apps/v1
    service: api
    port: 8080
  replicas:
    min: 0      # scale to zero
    max: 10
  scalingMetric:
    requestRate:
      targetValue: 100    # target requests/sec per replica
The HTTP Add-on deploys an interceptor proxy. Traffic to scaled-to-zero deployments is held by the proxy while KEDA brings the first replica up, then forwarded.
Cron¶
Scale on a schedule (useful for batch windows or predictive pre-warming):
triggers:
- type: cron
  metadata:
    timezone: America/Chicago
    start: "0 7 * * 1-5"    # weekdays 7am
    end: "0 20 * * 1-5"     # weekdays 8pm
    desiredReplicas: "5"
Multiple cron triggers on a single ScaledObject create overlapping scale-up windows, as in the sketch below.
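For example, adding a hypothetical Saturday batch window alongside the weekday window:
triggers:
- type: cron
  metadata:
    timezone: America/Chicago
    start: "0 7 * * 1-5"    # weekdays 7am-8pm
    end: "0 20 * * 1-5"
    desiredReplicas: "5"
- type: cron
  metadata:
    timezone: America/Chicago
    start: "0 8 * * 6"      # Saturdays 8am-6pm
    end: "0 18 * * 6"
    desiredReplicas: "10"
Where windows overlap, the multi-trigger rule applies (see below): the highest desired count wins.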
ScaledJob¶
ScaledJob manages Jobs instead of Deployments. Each "unit of work" spawns its own Job, rather than scaling a long-running consumer.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
  namespace: production
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: processor
          image: my-org/image-processor:v1.2.0
          command: ["python", "process_one.py"]
  pollingInterval: 10
  maxReplicaCount: 30    # max concurrent Jobs
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  scalingStrategy:
    strategy: accurate    # create one Job per pending message
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: image-processing
      mode: QueueLength
      value: "1"    # one Job per message
    authenticationRef:
      name: rabbitmq-auth
Use ScaledJob when:
- each message is a discrete unit of work with its own completion state
- you need Job-level tracking (completion, failure history)
- processing runs long enough that a Deployment scale-down could kill work in flight (Jobs run to completion instead)
Use ScaledObject (Deployment) when:
- consumers are long-running, stateful, or maintain connections (Kafka consumers hold partition leases)
- rapid scale events would create too many Jobs
Scale-to-zero patterns¶
Scale-to-zero reduces cost for workloads with intermittent traffic. The tradeoff is cold-start latency.
Design considerations:
- Image pull time: pre-pull images to nodes or use lightweight base images. With large images, cold start can take 30-60 seconds.
- Application startup: readiness probes determine when the pod is ready to receive traffic. Optimize your startup path.
- The first message problem: for Kafka, the consumer must join the consumer group before it can read. With minReplicaCount: 0, the first message has no consumer. KEDA detects the lag and starts scaling, but there's a polling delay (pollingInterval). Set this low (5-10s) for latency-sensitive queues.
- Graceful shutdown: consumers should finish processing their current message before exiting. Use preStop hooks and terminationGracePeriodSeconds, as in the sketch below.
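A minimal sketch of those pieces on the Deployment's pod template (the drain command and timings are hypothetical):
spec:
  terminationGracePeriodSeconds: 120   # must cover the longest in-flight message
  containers:
  - name: consumer
    image: my-org/order-processor:v1.2.0
    lifecycle:
      preStop:
        exec:
          # Hypothetical drain hook: tell the consumer to stop taking new
          # messages, then give in-flight work time to complete.
          command: ["/bin/sh", "-c", "touch /tmp/draining && sleep 30"]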
Pause autoscaling¶
Temporarily disable autoscaling without removing the ScaledObject by setting the autoscaling.keda.sh/paused-replicas annotation:
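kubectl annotate scaledobject order-processor -n production \
  autoscaling.keda.sh/paused-replicas="2"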
This freezes the replica count at 2. Remove the annotation to resume:
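kubectl annotate scaledobject order-processor -n production \
  autoscaling.keda.sh/paused-replicas-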
Useful during deployments or when you want to drain a queue without scaling up.
Multi-trigger behavior¶
When a ScaledObject has multiple triggers, KEDA takes the maximum desired replica count across all triggers:
triggers:
- type: kafka
  metadata:
    lagThreshold: "100"    # with 500 lag → 5 replicas
- type: prometheus
  metadata:
    threshold: "50"        # with 200 active sessions → 4 replicas
Result: KEDA targets max(5, 4) = 5 replicas. This is the right behavior - you want to keep up with the most demanding signal.
Operational patterns¶
Monitor KEDA metrics: KEDA exposes its own Prometheus metrics at :2222/metrics - keda_scaler_metrics_value, keda_scaler_active, keda_scaled_object_errors. Alert on scaler errors.
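A sketch of an alert on scaler errors, assuming the Prometheus Operator's PrometheusRule CRD and that KEDA's metrics endpoint is already scraped:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-alerts
  namespace: monitoring
spec:
  groups:
  - name: keda
    rules:
    - alert: KedaScaledObjectErrors
      expr: sum by (scaledObject) (rate(keda_scaled_object_errors[5m])) > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "KEDA scaler errors on {{ $labels.scaledObject }}"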
Scaler connectivity: if a scaler can't reach its external system (Kafka is down, Redis is unreachable), KEDA logs an error and holds the last known scale. It does not scale to zero on scaler failure. Test this behavior before relying on it for availability.
HPA coexistence: don't run a separate CPU HPA against a workload KEDA scales - two controllers will fight over the replica count. Move the CPU target into the ScaledObject as a cpu trigger so KEDA owns the single generated HPA, and use advanced.horizontalPodAutoscalerConfig to set HPA behavior (stabilization windows, scale-up/down rates).
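For example, to slow scale-down (values illustrative):
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # wait 5 minutes before shrinking
          policies:
          - type: Percent
            value: 50                       # drop at most half the replicas per minute
            periodSeconds: 60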
Fallback: KEDA supports a fallback block on a ScaledObject - if a scaler fails failureThreshold consecutive times, KEDA pins the workload at a fixed replica count:
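spec:
  fallback:
    failureThreshold: 3   # consecutive scaler failures before fallback applies
    replicas: 6           # fixed replica count while the scaler keeps failing
Fallback applies to ScaledObjects only (not ScaledJobs); size the replica count for typical load, since the workload runs at exactly that count while the scaler is unreachable.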