Istio

Istio is a service mesh that adds traffic management, mutual TLS, and observability to Kubernetes workloads without requiring application code changes.

The core problem it solves: as you add more services, cross-service concerns like retries, circuit breaking, encryption, and access control accumulate in each service. A mesh moves those concerns to the infrastructure layer.

Architecture

Istio separates into a control plane and a data plane.

flowchart TD
    subgraph Control Plane
        istiod["istiod\n(Pilot + Citadel + Galley)"]
    end
    subgraph Pod A
        AppA[App container] <--> ProxyA[Envoy sidecar]
    end
    subgraph Pod B
        AppB[App container] <--> ProxyB[Envoy sidecar]
    end
    istiod -- xDS config --> ProxyA
    istiod -- xDS config --> ProxyB
    istiod -- certificates --> ProxyA
    istiod -- certificates --> ProxyB
    ProxyA <-- mTLS --> ProxyB

istiod is a single binary that consolidates three former components:

- Pilot: distributes routing rules and service discovery to Envoy proxies via the xDS APIs
- Citadel: issues and rotates mTLS certificates (SPIFFE/X.509 format)
- Galley: validates and processes configuration

Envoy sidecars intercept all inbound and outbound traffic for every pod in the mesh. The application has no awareness of them: iptables rules, installed by an init container, redirect outbound traffic to the sidecar on port 15001 and inbound traffic on port 15006.

Sidecar injection

Namespaces labeled istio-injection=enabled get automatic injection:

kubectl label namespace my-app istio-injection=enabled

Or opt individual pods in/out:

metadata:
  annotations:
    sidecar.istio.io/inject: "true"   # or "false"
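
After labeling a namespace, one quick way to confirm injection worked is to list a new pod's containers; an injected pod shows an istio-proxy container alongside the app. The pod name below is a placeholder:

```shell
# An injected pod lists "istio-proxy" next to the application container.
kubectl get pod <pod-name> -n my-app \
  -o jsonpath='{.spec.containers[*].name}'
```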

Traffic management

Istio traffic management uses two primary CRDs: VirtualService and DestinationRule.

VirtualService

A VirtualService defines how requests to a host are routed. It layers fine-grained routing rules (header matches, weighted splits, fault injection) on top of the coarse round-robin behavior of a plain Kubernetes Service.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout
spec:
  hosts:
    - checkout
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: checkout
            subset: canary
    - route:
        - destination:
            host: checkout
            subset: stable
          weight: 100
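
A weighted split uses the same mechanism. A sketch of a 90/10 canary rollout, assuming the stable and canary subsets are defined in a matching DestinationRule:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout-weighted
spec:
  hosts:
    - checkout
  http:
    - route:
        - destination:
            host: checkout
            subset: stable
          weight: 90
        - destination:
            host: checkout
            subset: canary
          weight: 10   # shift gradually: 10 -> 25 -> 50 -> 100
```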

DestinationRule

A DestinationRule defines subsets and per-subset policies (load balancing, connection pool, outlier detection):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: checkout
spec:
  host: checkout
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary

outlierDetection is passive circuit breaking: Istio ejects individual hosts that return errors above the threshold. For active circuit breaking (fail fast when the pool is exhausted), configure connectionPool limits; when they are exceeded, Envoy rejects requests immediately with a 503 instead of queueing them.

Retries and timeouts

http:
  - route:
      - destination:
          host: payment
    timeout: 3s
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: gateway-error,connect-failure,retriable-4xx

Set timeouts and retries at the mesh layer, not in application code, so the whole chain can be reasoned about in one place. Give deeper services shorter budgets than their callers: if service A calls B which calls C and every hop uses the same 5s timeout with retries, inner retries keep running after the outer caller has already given up, wasting work and amplifying load.
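
One way to apply the decreasing-budget rule is a VirtualService per hop, each with a smaller timeout than its caller. A sketch using a hypothetical payment -> fraud-check chain (service names are illustrative):

```yaml
# Caller-facing service: 3s total budget
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment
spec:
  hosts:
    - payment
  http:
    - route:
        - destination:
            host: payment
      timeout: 3s
---
# Downstream dependency gets a smaller budget than its caller
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fraud-check
spec:
  hosts:
    - fraud-check
  http:
    - route:
        - destination:
            host: fraud-check
      timeout: 1s
```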

Ingress Gateway

Istio's Gateway resource exposes services outside the cluster via the istio-ingressgateway pods:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: my-cert-tls
      hosts:
        - api.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-vs
spec:
  hosts:
    - api.example.com
  gateways:
    - public-gateway
  http:
    - route:
        - destination:
            host: api-service
            port:
              number: 8080
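
The credentialName above refers to a Kubernetes TLS secret that must live in the same namespace as the gateway pods (istio-system by default). Certificate file paths here are placeholders:

```shell
kubectl create -n istio-system secret tls my-cert-tls \
  --cert=path/to/cert.pem \
  --key=path/to/key.pem
```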

Mutual TLS

Istio issues SPIFFE-compliant X.509 certificates to every sidecar and rotates them automatically (default 24-hour lifetime). mTLS validates both sides of every service-to-service connection.

PeerAuthentication

Controls whether mTLS is enforced or optional for inbound traffic:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: STRICT   # reject any plaintext connections

Modes:

- STRICT: all inbound connections must use mTLS
- PERMISSIVE: accept both mTLS and plaintext (useful during migration)
- DISABLE: plaintext only

Apply STRICT at the mesh level (namespace: istio-system) to lock down the entire cluster, then use PERMISSIVE per namespace for services that receive external traffic.
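
The mesh-wide lockdown is a PeerAuthentication named default in the root namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace = mesh-wide scope
spec:
  mtls:
    mode: STRICT
```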

Authorization policies

AuthorizationPolicy controls which services or users can reach a workload:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-checkout
  namespace: my-app
spec:
  selector:
    matchLabels:
      app: checkout
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/my-app/sa/frontend
      to:
        - operation:
            methods: ["POST"]
            paths: ["/checkout/*"]

principals maps to the SPIFFE ID embedded in the mTLS certificate - cluster.local/ns/<namespace>/sa/<service-account>. This is identity-based authorization tied to real Kubernetes service accounts, not IP addresses.

Default-deny pattern: apply an AuthorizationPolicy with an empty spec to a namespace. Its rules match nothing, so all requests to workloads in that namespace are denied; then add explicit ALLOW policies per service.
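
The default-deny policy itself is minimal:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: my-app
spec: {}   # no rules: nothing matches, so all requests are denied
```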

Observability

Istio automatically generates three telemetry signals without application instrumentation:

Metrics - Envoy exports Prometheus metrics for every request: istio_requests_total, istio_request_duration_milliseconds, istio_request_bytes. These are scraped from :15090/stats/prometheus on every sidecar.

Distributed traces - Istio propagates trace headers (B3 or W3C TraceContext). Applications must forward the headers (x-request-id, x-b3-traceid, etc.) between service calls - Envoy adds them on ingress but can't forward them internally without application cooperation.

Access logs - Envoy logs every request/response. Format is configurable via Telemetry CRD.

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  accessLogging:
    - providers:
        - name: envoy
  tracing:
    - providers:
        - name: zipkin
      randomSamplingPercentage: 1.0

Ambient mesh (sidecarless)

Istio's ambient mode, beta as of Istio 1.22 and promoted to stable in 1.24, removes sidecars entirely. Traffic flows through a per-node ztunnel (zero-trust tunnel) for L4 mTLS, and an optional waypoint proxy (typically one per namespace) for L7 policies.

Benefits: no sidecar resource overhead, no injection, faster pod startup, easier upgrades.

istioctl install --set profile=ambient
kubectl label namespace my-app istio.io/dataplane-mode=ambient

Deploy a waypoint proxy to get L7 policies, retries, and traffic splitting:

istioctl waypoint apply --namespace my-app

Operational patterns

Canary upgrades: prefer revision-based upgrades (istioctl install --set revision=<rev>) over in-place helm upgrade; move namespaces to the new revision gradually by relabeling. Run istioctl analyze before and after to catch configuration issues.

Debug a sidecar: istioctl proxy-config cluster <pod>, istioctl proxy-config listener <pod>. These show what istiod has pushed to Envoy. A mismatch between the configuration you expect and what Envoy actually holds is the most common class of Istio issue.

Resource overhead: each Envoy sidecar uses roughly 50-100MB memory and a small but real CPU budget. For high-QPS services, tune concurrency to pin sidecar worker threads.
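
Concurrency can be set per pod through the proxy.istio.io/config annotation; the value 2 below is illustrative, not a recommendation:

```yaml
metadata:
  annotations:
    proxy.istio.io/config: |
      concurrency: 2   # pin Envoy to two worker threads
```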

Egress control: use ServiceEntry to allow workloads to reach external services, and an EgressGateway to route and log all outbound traffic centrally.
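
A minimal ServiceEntry sketch that lets mesh workloads reach an external API (the hostname is illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-payments-api
spec:
  hosts:
    - api.stripe.com   # illustrative external host
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
    - number: 443
      name: https
      protocol: TLS
```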