Istio¶
Istio is a service mesh that adds traffic management, mutual TLS, and observability to Kubernetes workloads without requiring application code changes.
The core problem it solves: as you add more services, cross-service concerns like retries, circuit breaking, encryption, and access control accumulate in each service. A mesh moves those concerns to the infrastructure layer.
Architecture¶
Istio separates into a control plane and a data plane.
```mermaid
flowchart TD
    subgraph CP["Control Plane"]
        istiod["istiod\n(Pilot + Citadel + Galley)"]
    end
    subgraph PodA["Pod A"]
        AppA[App container] <--> ProxyA[Envoy sidecar]
    end
    subgraph PodB["Pod B"]
        AppB[App container] <--> ProxyB[Envoy sidecar]
    end
    istiod -- xDS config --> ProxyA
    istiod -- xDS config --> ProxyB
    istiod -- certificates --> ProxyA
    istiod -- certificates --> ProxyB
    ProxyA <-- mTLS --> ProxyB
```
istiod is a single binary that consolidates three former components:

- Pilot: distributes routing rules and service discovery to Envoy proxies via the xDS APIs
- Citadel: issues and rotates mTLS certificates (SPIFFE/X.509 format)
- Galley: validates and processes configuration
Envoy sidecars intercept all inbound and outbound traffic from every pod. The application has no awareness of them - iptables rules redirect traffic through the sidecar.
Sidecar injection¶
Namespaces with the istio-injection: enabled label get automatic injection:
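A minimal example of labeling a namespace at creation time (the namespace name my-app is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    istio-injection: enabled  # istiod's webhook injects sidecars into new pods here
```

Equivalently, label an existing namespace with kubectl label namespace my-app istio-injection=enabled. Injection only applies to pods created after the label is set; existing pods must be restarted.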
Or opt individual pods in/out:
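A per-pod opt-out sketch, using the sidecar.istio.io/inject label on the pod template (set it to "true" to opt in from an unlabeled namespace):

```yaml
# Fragment of a Deployment's pod template
template:
  metadata:
    labels:
      sidecar.istio.io/inject: "false"  # skip injection for this workload only
```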
Traffic management¶
Istio traffic management uses two primary CRDs: VirtualService and DestinationRule.
VirtualService¶
A VirtualService defines how requests to a host are routed. It replaces the coarse-grained behavior of a Kubernetes Service with fine-grained routing rules.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout
spec:
  hosts:
  - checkout
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: checkout
        subset: canary
  - route:
    - destination:
        host: checkout
        subset: stable
      weight: 100
```
DestinationRule¶
A DestinationRule defines subsets and per-subset policies (load balancing, connection pool, outlier detection):
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: checkout
spec:
  host: checkout
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary
```
outlierDetection is passive circuit breaking - Istio ejects hosts that return errors above the threshold. For active circuit breaking (fail fast when the pool is exhausted), configure connectionPool.
Retries and timeouts¶
```yaml
http:
- route:
  - destination:
      host: payment
  timeout: 3s
  retries:
    attempts: 3
    perTryTimeout: 1s
    retryOn: gateway-error,connect-failure,retriable-4xx
```
Keep timeouts and retries at the mesh layer, not scattered through application code, and budget them so inner calls finish within outer deadlines. Otherwise retries stack: if service A calls B, and B retries C three times with a 5s per-try timeout, A's request can wait up to 15s unless A enforces its own, shorter deadline.
Ingress Gateway¶
Istio's Gateway resource exposes services outside the cluster via the istio-ingressgateway pods:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: my-cert-tls
    hosts:
    - api.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-vs
spec:
  hosts:
  - api.example.com
  gateways:
  - public-gateway
  http:
  - route:
    - destination:
        host: api-service
        port:
          number: 8080
```
Mutual TLS¶
Istio issues SPIFFE-compliant X.509 certificates to every sidecar and rotates them automatically (default 24-hour lifetime). mTLS validates both sides of every service-to-service connection.
PeerAuthentication¶
Controls whether mTLS is enforced or optional for inbound traffic:
```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: STRICT  # reject any plaintext connections
```
Modes:
- STRICT: all inbound connections must use mTLS
- PERMISSIVE: accept both mTLS and plaintext (useful during migration)
- DISABLE: plaintext only
Apply STRICT at the mesh level (namespace: istio-system) to lock down the entire cluster, then use PERMISSIVE per namespace for services that receive external traffic.
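A mesh-wide default might look like the following; the resource must be named default and live in the mesh's root namespace (assumed here to be the usual istio-system):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # root namespace: applies mesh-wide
spec:
  mtls:
    mode: STRICT
```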
Authorization policies¶
AuthorizationPolicy controls which services or users can reach a workload:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-checkout
  namespace: my-app
spec:
  selector:
    matchLabels:
      app: checkout
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/my-app/sa/frontend
    to:
    - operation:
        methods: ["POST"]
        paths: ["/checkout/*"]
```
principals maps to the SPIFFE ID embedded in the mTLS certificate - cluster.local/ns/<namespace>/sa/<service-account>. This is identity-based authorization tied to real Kubernetes service accounts, not IP addresses.
Default-deny pattern: apply an empty AuthorizationPolicy to a namespace - it denies everything, then add explicit ALLOW policies per service.
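The deny-all policy is simply an AuthorizationPolicy with an empty spec (no selector, so it covers every workload in the namespace; no rules, so nothing matches and every request is denied once an ALLOW policy exists):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: my-app
spec: {}  # matches all workloads, allows nothing
```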
Observability¶
Istio automatically generates three telemetry signals without application instrumentation:
Metrics - Envoy exports Prometheus metrics for every request: istio_requests_total, istio_request_duration_milliseconds, istio_request_bytes. These are scraped from :15090/stats/prometheus on every sidecar.
Distributed traces - Istio propagates trace headers (B3 or W3C TraceContext). Applications must forward the headers (x-request-id, x-b3-traceid, etc.) between service calls - Envoy adds them on ingress but can't forward them internally without application cooperation.
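A minimal sketch of that forwarding responsibility, assuming an application that receives inbound headers as a dict and makes its own outbound HTTP calls (the function name is illustrative):

```python
# Headers Istio/Envoy uses for trace correlation (B3 plus W3C TraceContext).
# The application must copy these from each inbound request onto every
# outbound call so Envoy can stitch the spans into one trace.
TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
]


def propagated_headers(inbound_headers: dict) -> dict:
    """Return only the trace headers that should be forwarded downstream."""
    lowered = {k.lower(): v for k, v in inbound_headers.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}
```

An outbound call then merges propagated_headers(request.headers) into its own headers; everything else (Authorization, cookies, etc.) stays behind.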
Access logs - Envoy logs every request/response. Format is configurable via Telemetry CRD.
```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  accessLogging:
  - providers:
    - name: envoy
  tracing:
  - providers:
    - name: zipkin
    randomSamplingPercentage: 1.0
```
Ambient mesh (sidecarless)¶
Istio's ambient mode, which reached beta in Istio 1.22 and GA in 1.24, removes sidecars entirely. Traffic flows through a per-node ztunnel (zero-trust tunnel) for L4 mTLS, and an optional per-namespace waypoint proxy for L7 policies.
Benefits: no sidecar resource overhead, no injection, faster pod startup, easier upgrades.
```shell
istioctl install --set profile=ambient
kubectl label namespace my-app istio.io/dataplane-mode=ambient
```
Waypoint for L7 policies, retries, and traffic splitting:
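A waypoint is a Kubernetes Gateway API resource with the istio-waypoint class; istioctl waypoint apply -n my-app generates roughly the following (exact fields may vary by Istio version, so treat this as a sketch):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: my-app
  labels:
    istio.io/waypoint-for: service  # handle L7 for services in this namespace
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE  # Istio's HTTP-based overlay tunnel
```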
Operational patterns¶
Canary upgrades: install the new control-plane version as a separate revision (istioctl install --set revision=&lt;name&gt;) and migrate namespaces by relabeling, rather than upgrading in place with helm upgrade. Use istioctl analyze before and after to catch configuration issues.
Debug a sidecar: istioctl proxy-config cluster <pod>, istioctl proxy-config listener <pod>. These show what Pilot has pushed to Envoy. Mismatched config between what you expect and what Envoy has is the most common Istio issue.
Resource overhead: each Envoy sidecar uses roughly 50-100MB memory and a small but real CPU budget. For high-QPS services, tune concurrency to pin sidecar worker threads.
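Worker-thread count can be overridden per workload via the proxy.istio.io/config annotation on the pod template (a sketch; by default Envoy sizes concurrency from the sidecar's CPU allocation):

```yaml
# Fragment of a Deployment's pod template
template:
  metadata:
    annotations:
      proxy.istio.io/config: |
        concurrency: 2  # pin the sidecar to two Envoy worker threads
```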
Egress control: use ServiceEntry to allow workloads to reach external services, and an EgressGateway to route and log all outbound traffic centrally.
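A minimal ServiceEntry sketch registering an external HTTPS API with the mesh (the hostname is illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-payments-api
spec:
  hosts:
  - api.example-payments.com   # hypothetical external dependency
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 443
    name: tls
    protocol: TLS
```

Without a ServiceEntry, a mesh configured with outboundTrafficPolicy: REGISTRY_ONLY blocks the call entirely; with the default ALLOW_ANY it passes through but is invisible to Istio's routing and telemetry.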