KEDA¶
KEDA (Kubernetes Event-Driven Autoscaling) extends the Kubernetes HPA to scale workloads based on external event sources - Kafka consumer lag, queue depth, Prometheus metrics, HTTP request rate, cron schedules, and dozens more - including scaling all the way to zero.
The standard HPA only scales on CPU and memory. For event-driven workloads, those signals arrive too late or don't reflect the actual backlog. A Kafka consumer might be idle (low CPU) but sitting behind 10,000 unprocessed messages. KEDA surfaces that backlog as the scaling signal.
Architecture¶
flowchart TD
KEDA[KEDA Operator] --> |reads| SO[ScaledObject\nor ScaledJob]
KEDA --> |creates/manages| HPA[HorizontalPodAutoscaler]
KEDA --> |queries| Scaler[External scaler\nKafka / Redis / SQS / Prometheus...]
Scaler --> |metric value| KEDA
HPA --> |scales| Workload[Deployment / StatefulSet\nor Jobs]
KEDA doesn't replace the HPA - it generates and manages HPA objects on your behalf. The KEDA operator serves external metric values through the Kubernetes External Metrics API, where the HPA's scaling loop consumes them. Because an HPA cannot scale below one replica, KEDA itself handles activation from zero to one; the generated HPA handles scaling between one and maxReplicaCount. This means KEDA composes with standard HPA behavior (stabilization windows, scaling policies) rather than replacing it.
ScaledObject¶
A ScaledObject ties a workload to one or more scalers:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  pollingInterval: 15     # check the scaler every 15s
  cooldownPeriod: 60      # wait 60s after the last active trigger before scaling to zero
  minReplicaCount: 1      # floor while any trigger is active
  maxReplicaCount: 50
  idleReplicaCount: 0     # scale to zero when no trigger is active (must be < minReplicaCount)
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.platform:9092
      consumerGroup: order-processor
      topic: orders
      lagThreshold: "100"    # target: one replica per 100 messages of lag
      offsetResetPolicy: latest
    authenticationRef:
      name: kafka-trigger-auth
lagThreshold: "100" sets the per-replica target: desiredReplicas = ceil(currentLag / lagThreshold). With 1,000 messages queued and a threshold of 100, KEDA targets 10 replicas. With 0 messages and idleReplicaCount: 0, KEDA scales to zero.
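Under the hood, KEDA generates an HPA named keda-hpa-<scaledobject-name> for this ScaledObject. A rough sketch of what it creates - the external metric name format varies across KEDA versions:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-order-processor
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 1                 # KEDA, not the HPA, handles the 0 <-> 1 transition
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: s0-kafka-orders    # naming format varies by KEDA version
      target:
        type: AverageValue
        averageValue: "100"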
TriggerAuthentication¶
Credentials for scalers are separated from the ScaledObject using TriggerAuthentication:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth
  namespace: production
spec:
  secretTargetRef:
  - parameter: sasl
    name: kafka-credentials
    key: sasl-mechanism
  - parameter: username
    name: kafka-credentials
    key: username
  - parameter: password
    name: kafka-credentials
    key: password
  - parameter: tls
    name: kafka-credentials
    key: tls
Use ClusterTriggerAuthentication for credentials shared across namespaces (e.g., a single AWS IAM role or cloud credentials); see the AWS SQS example below.
Common scalers¶
Kafka¶
triggers:
- type: kafka
  metadata:
    bootstrapServers: kafka.platform:9092
    consumerGroup: my-consumer
    topic: my-topic
    lagThreshold: "50"
Redis Lists (queue depth)¶
triggers:
- type: redis
  metadata:
    address: redis.platform:6379
    listName: job-queue
    listLength: "20"    # target: one replica per 20 items
  authenticationRef:
    name: redis-auth
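The redis-auth TriggerAuthentication referenced above might look like this minimal sketch, assuming a Secret named redis-credentials holding the password:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: redis-auth
  namespace: production
spec:
  secretTargetRef:
  - parameter: password        # consumed by the redis scaler
    name: redis-credentials    # hypothetical Secret name
    key: password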
AWS SQS¶
triggers:
- type: aws-sqs-queue
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
    queueLength: "10"
    awsRegion: us-east-1
  authenticationRef:
    name: aws-keda-auth
Use IRSA (IAM Roles for Service Accounts) with ClusterTriggerAuthentication to avoid long-lived credentials.
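A minimal sketch of that pattern - it assumes IRSA is already configured so the workload's service account is bound to an IAM role with SQS permissions:
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: aws-keda-auth
spec:
  podIdentity:
    provider: aws-eks    # assume the IAM role via the service account (IRSA)
When referencing a ClusterTriggerAuthentication, set kind: ClusterTriggerAuthentication in the trigger's authenticationRef.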
Prometheus¶
Scale on any Prometheus metric:
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090
    metricName: active_transactions
    query: |
      sum(active_transactions{namespace="production"})
    threshold: "100"    # one replica per 100 active transactions
This is the most flexible scaler. Any signal you can express in PromQL - queue depth, active sessions, p99 latency above a threshold, error rate - can drive scaling.
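For example, a sketch of a latency-driven trigger - the metric and label names here are hypothetical:
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090
    query: |
      histogram_quantile(0.99,
        sum(rate(http_request_duration_seconds_bucket{namespace="production"}[5m])) by (le))
    threshold: "0.5"    # roughly: one extra replica per 0.5s of p99 latency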
HTTP (HTTP Add-on)¶
Scale HTTP workloads to zero and back when a request arrives. The HTTP Add-on ships its own resource, HTTPScaledObject, rather than a trigger type on a ScaledObject. A sketch - field names follow recent add-on versions and have changed over time:
kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
  name: api
  namespace: production
spec:
  hosts:
  - api.example.com
  scaleTargetRef:
    name: api
    kind: Deployment
    apiVersion: apps/v1
    service: api
    port: 8080
  replicas:
    min: 0      # scale to zero
    max: 10
  scalingMetric:
    requestRate:
      targetValue: 100    # target requests/sec per replica
The HTTP Add-on deploys an interceptor proxy. Traffic to scaled-to-zero deployments is held by the proxy while KEDA brings the first replica up, then forwarded.
Cron¶
Scale on a schedule (useful for batch windows or predictive pre-warming):
triggers:
- type: cron
  metadata:
    timezone: America/Chicago
    start: "0 7 * * 1-5"    # weekdays 7am
    end: "0 20 * * 1-5"     # weekdays 8pm
    desiredReplicas: "5"
Multiple cron triggers on a single ScaledObject create overlapping scale-up windows, as in the sketch below.
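For example, adding a hypothetical Saturday batch window alongside the weekday window:
triggers:
- type: cron
  metadata:
    timezone: America/Chicago
    start: "0 7 * * 1-5"    # weekdays 7am-8pm
    end: "0 20 * * 1-5"
    desiredReplicas: "5"
- type: cron
  metadata:
    timezone: America/Chicago
    start: "0 8 * * 6"      # Saturdays 8am-6pm
    end: "0 18 * * 6"
    desiredReplicas: "10"
Where windows overlap, the multi-trigger rule applies (see below): the highest desired count wins.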
ScaledJob¶
ScaledJob manages Jobs instead of Deployments. Each "unit of work" spawns its own Job, rather than scaling a long-running consumer.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
  namespace: production
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: processor
          image: my-org/image-processor:v1.2.0
          command: ["python", "process_one.py"]
  pollingInterval: 10
  maxReplicaCount: 30    # max concurrent Jobs
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  scalingStrategy:
    strategy: accurate    # create one Job per pending message
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: image-processing
      mode: QueueLength
      value: "1"    # one Job per message
    authenticationRef:
      name: rabbitmq-auth
Use ScaledJob when:
- each message is a discrete unit of work with its own completion state
- you need Job-level tracking (completion, failure history)
- processing runs long enough that a Deployment scale-down could kill work in flight (Jobs run to completion instead)
Use ScaledObject (Deployment) when:
- consumers are long-running, stateful, or maintain connections (Kafka consumers hold partition leases)
- rapid scale events would create too many Jobs
Scale-to-zero patterns¶
Scale-to-zero reduces cost for workloads with intermittent traffic. The tradeoff is cold-start latency.
Design considerations:
- Image pull time: pre-pull images to nodes or use lightweight base images. With large images, cold start can take 30-60 seconds.
- Application startup: readiness probes determine when the pod is ready to receive traffic. Optimize your startup path.
- The first message problem: for Kafka, the consumer must join the consumer group before it can read. With minReplicaCount: 0, the first message has no consumer. KEDA detects the lag and starts scaling, but there's a polling delay (pollingInterval). Set this low (5-10s) for latency-sensitive queues.
- Graceful shutdown: consumers should finish processing their current message before exiting. Use preStop hooks and terminationGracePeriodSeconds, as in the sketch below.
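A minimal sketch of those pieces on the Deployment's pod template (the drain command and timings are hypothetical):
spec:
  terminationGracePeriodSeconds: 120   # must cover the longest in-flight message
  containers:
  - name: consumer
    image: my-org/order-processor:v1.2.0
    lifecycle:
      preStop:
        exec:
          # Hypothetical drain hook: tell the consumer to stop taking new
          # messages, then give in-flight work time to complete.
          command: ["/bin/sh", "-c", "touch /tmp/draining && sleep 30"]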
Pause autoscaling¶
Temporarily disable autoscaling without removing the ScaledObject by setting the autoscaling.keda.sh/paused-replicas annotation:
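kubectl annotate scaledobject order-processor -n production \
  autoscaling.keda.sh/paused-replicas="2"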
This freezes the replica count at 2. Remove the annotation to resume:
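kubectl annotate scaledobject order-processor -n production \
  autoscaling.keda.sh/paused-replicas-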
Useful during deployments or when you want to drain a queue without scaling up.
Multi-trigger behavior¶
When a ScaledObject has multiple triggers, KEDA takes the maximum desired replica count across all triggers:
triggers:
- type: kafka
  metadata:
    lagThreshold: "100"    # with 500 lag → 5 replicas
- type: prometheus
  metadata:
    threshold: "50"        # with 200 active sessions → 4 replicas
Result: KEDA targets max(5, 4) = 5 replicas. This is the right behavior - you want to keep up with the most demanding signal.
Operational patterns¶
Monitor KEDA metrics: KEDA exposes its own Prometheus metrics at :2222/metrics - keda_scaler_metrics_value, keda_scaler_active, keda_scaled_object_errors. Alert on scaler errors.
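A sketch of an alert on scaler errors, assuming the Prometheus Operator's PrometheusRule CRD and that KEDA's metrics endpoint is already scraped:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-alerts
  namespace: monitoring
spec:
  groups:
  - name: keda
    rules:
    - alert: KedaScaledObjectErrors
      expr: sum by (scaledObject) (rate(keda_scaled_object_errors[5m])) > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "KEDA scaler errors on {{ $labels.scaledObject }}"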
Scaler connectivity: if a scaler can't reach its external system (Kafka is down, Redis is unreachable), KEDA logs an error and holds the last known scale. It does not scale to zero on scaler failure. Test this behavior before relying on it for availability.
HPA coexistence: don't run a separate CPU HPA against a workload KEDA scales - two controllers will fight over the replica count. Move the CPU target into the ScaledObject as a cpu trigger so KEDA owns the single generated HPA, and use advanced.horizontalPodAutoscalerConfig to set HPA behavior (stabilization windows, scale-up/down rates).
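For example, to slow scale-down (values illustrative):
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # wait 5 minutes before shrinking
          policies:
          - type: Percent
            value: 50                       # drop at most half the replicas per minute
            periodSeconds: 60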
Fallback: KEDA supports a fallback block on a ScaledObject - if a scaler fails failureThreshold consecutive times, KEDA pins the workload at a fixed replica count:
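spec:
  fallback:
    failureThreshold: 3   # consecutive scaler failures before fallback applies
    replicas: 6           # fixed replica count while the scaler keeps failing
Fallback applies to ScaledObjects only (not ScaledJobs); size the replica count for typical load, since the workload runs at exactly that count while the scaler is unreachable.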