Audit & Logging
In Kubernetes, there are two stories being told at the same time:
- The Detective Story (Audit Logs): "Who changed the system configuration?"
- The Operator Story (App Logs): "Why is my application crashing?"
If you don't capture these stories immediately, they vanish. Container logs are ephemeral: when a Pod is deleted, its logs are deleted with it.
1. Audit Logs (The "Black Box" Recorder)
Audit logs can record every request sent to the Kubernetes API Server. Your audit policy decides which requests are kept and in how much detail. They answer the questions: Who, What, Where, and When.
- Who: User `alice`
- What: Tried to `delete`
- Where: The secret named `db-pass`
- When: At 12:05 PM
- Result: `403 Forbidden`
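Those five answers map directly onto fields of an audit event. The sketch below assumes the JSON-lines output of the API Server's log backend (one `audit.k8s.io/v1` `Event` per line). The sample event is trimmed, and its namespace and date are hypothetical. `summarize` is an illustrative helper, not part of any Kubernetes tool.

```python
import json

# A trimmed, hypothetical audit.k8s.io/v1 Event for the example above
SAMPLE_EVENT = json.dumps({
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "stage": "ResponseComplete",
    "verb": "delete",
    "user": {"username": "alice"},
    "objectRef": {"resource": "secrets", "namespace": "default",
                  "name": "db-pass", "apiVersion": "v1"},
    "responseStatus": {"code": 403},
    "requestReceivedTimestamp": "2023-10-01T12:05:00.000000Z",
})

def summarize(event_line: str) -> dict:
    """Map one audit event onto the Who/What/Where/When/Result questions."""
    event = json.loads(event_line)
    ref = event.get("objectRef", {})
    return {
        "who": event["user"]["username"],
        "what": event["verb"],
        # Cluster-scoped objects carry no namespace
        "where": f'{ref.get("resource")}/{ref.get("name")} in {ref.get("namespace", "<cluster>")}',
        "when": event["requestReceivedTimestamp"],
        "result": event.get("responseStatus", {}).get("code"),
    }

print(summarize(SAMPLE_EVENT))
```

For the sample event this prints `who` as `alice`, `what` as `delete`, `where` as `secrets/db-pass in default`, and `result` as `403`.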
The 4 Audit Levels
You must configure how much data you want. This is a trade-off between "Visibility" and "Disk Space."
| Level | Description | Use Case |
|---|---|---|
| `None` | Don't log anything. | Frequent, noisy events (like kube-proxy watching endpoints). |
| `Metadata` | Log the User, Timestamp, Resource, and Verb. No payloads. | Standard Production (Low Cost, High Value). |
| `Request` | Log metadata + the request body sent by the user. | Debugging "Why did this object change?" |
| `RequestResponse` | Log everything + the server's response body. | High Security / Debugging. (Generates massive data). |
Security Warning: Secrets in Logs
Be extremely careful using `Request` or `RequestResponse` levels on Secret or ConfigMap resources. Secret values in request and response bodies are only base64-encoded, not encrypted, so you might accidentally write your database passwords into your plain-text audit log files!
Configuration (The Policy File)
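You pass a policy file to the API Server with the `--audit-policy-file` flag and choose where events go with a backend flag such as `--audit-log-path`. The API Server checks the rules in order, and the first rule that matches a request sets its audit level.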
```yaml
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # 1. Don't log noisy system calls
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
  # 2. Log full request body for critical changes (Pod modifications)
  - level: Request
    resources:
      - group: ""
        resources: ["pods"]
  # 3. Default: Log metadata only for everything else
  - level: Metadata
```
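To make the ordering behavior concrete (rules are tried top to bottom, and the first match sets the level), here is a deliberately simplified Python model of the three rules above. It compares only users, verbs, and resources. The real API Server also matches user groups, namespaces, API groups, non-resource URLs, and more. `Rule` and `audit_level` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    level: str
    users: list = field(default_factory=list)      # empty = any user
    verbs: list = field(default_factory=list)      # empty = any verb
    resources: list = field(default_factory=list)  # empty = any resource

    def matches(self, user: str, verb: str, resource: str) -> bool:
        return ((not self.users or user in self.users)
                and (not self.verbs or verb in self.verbs)
                and (not self.resources or resource in self.resources))

# Simplified mirror of audit-policy.yaml (API groups ignored)
POLICY = [
    Rule("None", users=["system:kube-proxy"], verbs=["watch"]),
    Rule("Request", resources=["pods"]),
    Rule("Metadata"),  # catch-all default
]

def audit_level(user: str, verb: str, resource: str, policy=POLICY) -> str:
    """Return the level of the first matching rule (first match wins)."""
    for rule in policy:
        if rule.matches(user, verb, resource):
            return rule.level
    return "None"  # no rule matched: the event is not recorded

for request in [("system:kube-proxy", "watch", "endpoints"),
                ("alice", "delete", "pods"),
                ("alice", "get", "secrets")]:
    print(request, "->", audit_level(*request))
```

Under this policy the kube-proxy watch is dropped, the Pod deletion is logged at `Request`, and the read of `secrets` falls through to the `Metadata` default, which keeps Secret payloads out of the log.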
2. Application Logging (The "Stream")
Kubernetes does not provide a native storage solution for logs. It assumes your application writes to Standard Output (stdout) and Standard Error (stderr).
The container runtime (e.g., containerd) captures these streams and writes them to files on the Node (usually under `/var/log/pods/`, symlinked as `/var/log/containers/*.log`).
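On CRI runtimes such as containerd, each line in those files uses the CRI logging format: a timestamp, the stream name, a `P`/`F` tag marking a partial or full line, then the message. A log agent has to parse that prefix and stitch partial lines back together. The Python sketch below shows the idea. The regex and helper names are illustrative assumptions, and real agents such as Fluent Bit ship their own parsers.

```python
import re
from dataclasses import dataclass

# CRI log line: <RFC 3339 timestamp> <stdout|stderr> <P|F> <message>
# "P" = partial chunk of a long line split by the runtime, "F" = final/full chunk
CRI_LINE = re.compile(r"^(\S+) (stdout|stderr) ([PF]) (.*)$")

@dataclass
class LogLine:
    timestamp: str
    stream: str
    partial: bool
    message: str

def parse_cri_line(line: str) -> LogLine:
    match = CRI_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a CRI log line: {line!r}")
    timestamp, stream, tag, message = match.groups()
    return LogLine(timestamp, stream, tag == "P", message)

def join_partials(lines):
    """Reassemble split lines: buffer P chunks per stream until the F chunk arrives."""
    buffers = {"stdout": [], "stderr": []}
    for raw in lines:
        entry = parse_cri_line(raw)
        buffer = buffers[entry.stream]
        buffer.append(entry.message)
        if not entry.partial:
            yield LogLine(entry.timestamp, entry.stream, False, "".join(buffer))
            buffer.clear()
```

Buffering per stream matters because stdout and stderr chunks can interleave in the same file.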
The Logging Pipeline
Since logs on the Node are deleted when the Pod is deleted, you need a Cluster-Level Logging Stack to ship them to safety.
```mermaid
graph LR
    subgraph "Worker Node"
        Pod["App Pod"] -->|stdout| File["/var/log/containers/..."]
        Agent["Log Agent<br/>(Fluent Bit / Promtail)"] -.->|Reads| File
    end
    Agent -->|Push| Backend["Log Storage<br/>(Loki / Elasticsearch)"]
    Backend -->|Query| UI["Dashboard<br/>(Grafana / Kibana)"]
```
The "DaemonSet" Pattern
The most common architecture is running a Logging Agent (like Fluent Bit or Promtail) as a DaemonSet.
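- One agent runs on every Node.
- It mounts the Node's `/var/log` directory as a read-only `hostPath` volume. (The entries in `/var/log/containers` are symlinks into `/var/log/pods`, so mounting only `/var/log/containers` is not enough.)
- It tails every log file, adds metadata (Pod Name, Namespace), and pushes it to the backend.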
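Basic metadata can come from the file path itself. Files in `/var/log/containers` are named `<pod>_<namespace>_<container>-<container-id>.log`, so an agent can recover Pod, Namespace, and container from the name alone. Agents typically also query the API Server for labels and annotations. A minimal Python sketch, with hypothetical helper names and example file names:

```python
import re
from pathlib import PurePosixPath

# <pod>_<namespace>_<container>-<64-hex container id>.log
# Pod, namespace, and container names cannot contain "_", so "_" is a safe separator.
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)

def metadata_from_path(path: str) -> dict:
    """Recover pod/namespace/container from a /var/log/containers file name."""
    match = LOG_NAME.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unexpected log file name: {path}")
    return match.groupdict()

def enrich(record: dict, path: str) -> dict:
    """Attach Kubernetes metadata to a parsed log record before shipping it."""
    return {**record, "kubernetes": metadata_from_path(path)}
```

Because the container ID is a fixed-length hex suffix, container names that contain hyphens still split correctly.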
3. Best Practices
Security
- Alert on `403 Forbidden`: If your Audit Logs show a user trying to read `secrets` and getting denied 10 times in a minute, you may be under attack (or at best looking at a badly misconfigured client). Alert on this pattern.
- Separate Retention: Audit logs often serve as compliance and forensic evidence. Keep them in a separate bucket (e.g., S3 Glacier) for 1 year, even if you only keep App logs for 7 days.
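The 403 alert can be prototyped as a sliding-window count over the audit log. This Python sketch makes several assumptions:

- Audit events arrive as JSON lines (as written by the log backend), roughly in time order.
- Only `ResponseComplete` events are counted, so a request is not counted twice.
- The threshold is the text's example of 10 denials per user per minute.

In production you would normally express this as a query or alert rule in your log backend instead.

```python
import json
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)
THRESHOLD = 10  # denied Secret requests per user per WINDOW (example value)

def parse_ts(value: str) -> datetime:
    # Audit timestamps are RFC 3339 with microseconds, e.g. "2023-10-01T12:05:00.123456Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def detect_secret_probing(lines):
    """Yield (username, timestamp) when a user reaches THRESHOLD denied Secret requests within WINDOW."""
    recent = defaultdict(deque)  # username -> timestamps of recent denials
    for line in lines:
        event = json.loads(line)
        if event.get("stage") != "ResponseComplete":
            continue  # skip RequestReceived etc. so each request counts once
        if event.get("objectRef", {}).get("resource") != "secrets":
            continue
        if event.get("responseStatus", {}).get("code") != 403:
            continue
        user = event.get("user", {}).get("username", "<unknown>")
        now = parse_ts(event["stageTimestamp"])
        window = recent[user]
        window.append(now)
        # Drop denials that fell out of the sliding window
        while now - window[0] > WINDOW:
            window.popleft()
        if len(window) >= THRESHOLD:
            yield user, now
            window.clear()  # one alert per burst
```

Feed it an open audit log file (it iterates line by line) and forward each yielded pair to your alerting channel.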
Operations
- JSON Logging: Force your developers to log in JSON format.
  - Bad: `2023-10-01 Error: DB failed` (Hard to parse).
  - Good: `{"level": "error", "msg": "DB failed", "service": "payment"}` (Easy to filter).
- Don't Log to Files Inside Containers: If your legacy app writes to `/app/logs/server.log`, standard Kubernetes logging will not catch it. Either add a "sidecar" container that shares the log directory through a volume and runs `tail -f` on that file to stream it to stdout, or configure the app to write to stdout directly.
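A minimal sketch of the JSON Logging practice using only Python's standard `logging` module. A custom formatter emits one JSON object per line on stdout, which is exactly the stream the runtime captures and agents parse. The field names mirror the "Good" example. The `make_logger` helper and the `time` field are illustrative choices, not a standard.

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render every log record as a single-line JSON object."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "service": self.service,
        }
        if record.exc_info:
            # Keep the traceback inside the JSON object instead of spilling extra lines
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)

def make_logger(service: str) -> logging.Logger:
    handler = logging.StreamHandler(sys.stdout)  # stdout, so the container runtime captures it
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers[:] = [handler]  # avoid duplicate handlers if called twice
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also emit through the root logger
    return logger

if __name__ == "__main__":
    make_logger("payment").error("DB failed")
```

Running it prints one JSON line with `"level": "error"`, `"msg": "DB failed"`, and `"service": "payment"`, plus a UTC timestamp.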
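A sketch of the sidecar approach for the legacy app writing to `/app/logs/server.log`. The app container and a `busybox` sidecar share the log directory through an `emptyDir` volume, and the sidecar streams the file to its own stdout. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f -` accepts. The Pod name, app image, and container names are hypothetical.

```python
import json

def sidecar_pod(name: str, app_image: str, log_dir: str = "/app/logs",
                log_file: str = "server.log") -> dict:
    """Pod manifest: the app writes to a shared emptyDir; a sidecar streams the file to stdout."""
    shared = {"name": "app-logs", "mountPath": log_dir}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "volumes": [{"name": "app-logs", "emptyDir": {}}],
            "containers": [
                {"name": "app", "image": app_image, "volumeMounts": [shared]},
                {
                    "name": "log-tailer",
                    "image": "busybox",
                    # -n+1 starts from the first line; -F keeps following across log rotation
                    "args": ["/bin/sh", "-c", f"tail -n+1 -F {log_dir}/{log_file}"],
                    "volumeMounts": [{**shared, "readOnly": True}],
                },
            ],
        },
    }

if __name__ == "__main__":
    print(json.dumps(sidecar_pod("legacy-app", "example.com/legacy-app:1.0"), indent=2))
```

Pipe the printed JSON into `kubectl apply -f -`, then read the app's log with `kubectl logs legacy-app -c log-tailer`. The trade-off is an extra container per Pod and the log data briefly existing twice (on the volume and in the captured stdout).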
Summary
- Audit Logs track API access ("Who did it?"). They are configured via a Policy File on the control plane.
- App Logs track application health. They rely on the stdout/stderr stream.
- Persistence: Logs are ephemeral. You must use a collection agent (DaemonSet) to ship them to a central store (Loki/Elasticsearch) or they will be lost when the Pod is deleted or evicted.
- Golden Rule: Avoid `Request` and `RequestResponse` logging for Secrets to prevent credential leaks.