# Kubernetes Container Logs – ILM Tiering + SLM Snapshots (ECK on AKS)
This README documents what was configured for the **Fleet-managed Kubernetes container logs** data stream:
- **Data stream:** `logs-kubernetes.container_logs-pdng`
- **Goal:** keep hot/warm/cold tiering for performance & cost control, automatically delete old indices, and still keep a longer **backup** in Azure Blob via **SLM snapshots**.
- **Important constraint:** the cluster license is **not compliant for searchable snapshots**, so we **do not** use the ILM `searchable_snapshot` action. Instead we use:
  - **ILM** for hot → warm → cold → delete
  - **SLM** to snapshot indices to **Azure Blob** (`azure_repo`)
  - **Restore-on-demand** when you need to query data older than what ILM keeps online
---
## 1. Architecture
```mermaid
flowchart TB
subgraph Users["Users / Ops"]
U[Kibana Discover / Dashboards]
end
subgraph Ingress["Ingress / LB"]
IG["Ingress / LB"]
end
subgraph ElasticStack[Elastic Stack on ECK]
subgraph ControlPlane[Control Plane]
FS[Fleet Server]
KBN[Kibana]
end
subgraph HotTier[Hot Tier]
EH[Elasticsearch Hot Nodes]
end
subgraph WarmTier[Warm Tier]
EW[Elasticsearch Warm Nodes]
end
subgraph ColdTier[Cold Tier]
EC[Elasticsearch Cold Nodes]
end
end
subgraph Snapshot[Snapshot Storage]
AZ[(Azure Blob: azure_repo / es-snapshots)]
end
subgraph K8s[Kubernetes Cluster]
EA["Elastic Agent (DaemonSet)"]
KP["Kubelet / Metrics"]
LOGS[Container Logs]
end
%% User access
U --> IG --> KBN
%% Fleet control
KBN --> FS
FS --> EA
%% Data ingest
KP --> EA
LOGS --> EA
EA --> EH
%% ILM tiering (ages are counted from rollover)
EH -->|ILM migrate @ 7d| EW
EW -->|ILM migrate @ 30d| EC
EC -->|ILM delete @ 90d| X[(deleted)]
%% Snapshots
EH -.->|SLM snapshot schedule| AZ
EW -.->|SLM snapshot schedule| AZ
EC -.->|SLM snapshot schedule| AZ
%% Restore
AZ -->|Restore| R["restore-* indices (temporary)"]
KBN --> R
```
**Key point:** without searchable snapshots, **Azure snapshots are not directly queryable**.
To query historical data, **restore** a snapshot into temporary `restore-*` indices, query them, then delete them.

---
## 2. What we changed
### 2.1 ILM: hot 7 / warm 30 / cold 90 / delete 90 (no searchable snapshot)
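- New ILM policy: `pdng-logs-hot7-warm30-cold90-del90`
- Timeline: rollover after 7 days or at 50 GB per primary shard (whichever comes first), warm from 7 days, cold from 30 days, delete at 90 days. ILM counts each `min_age` from rollover, so the oldest documents in an index can be about 97 days old when the index is deleted.
- Applied only to `logs-kubernetes.container_logs-pdng` via a **high-priority override index template**
- Triggered a rollover so the next backing index uses the new ILM policy

Result observed: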
- `.ds-...-000001` stayed on the default `logs` policy (expected)
- `.ds-...-000002` uses `pdng-logs-hot7-warm30-cold90-del90` (correct)
### 2.2 SLM: snapshots to Azure Blob (backup / longer retention)
- Snapshot repository exists: `azure_repo` (Azure container `es-snapshots`)
- SLM policy snapshots `.ds-logs-kubernetes.container_logs-pdng-*`
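- Snapshots are retained (example: 120 days) so you can restore data after ILM deletes it. Each snapshot expires 120 days after it was taken, so an index that ILM deletes at 90 days stays restorable for roughly another 120 days.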
---
## 3. Step-by-step Setup
> Run commands in **Kibana → Dev Tools**, or with `curl` against Elasticsearch.
### 3.1 Confirm data stream and snapshot repo
```http
GET _data_stream/logs-kubernetes.container_logs-pdng
GET _snapshot/azure_repo
```
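If the repository lookup succeeds but you also want to confirm that the nodes can actually reach the Azure container, the repository verify API is one way to check. A minimal sketch, using the `azure_repo` name from above:

```http
POST _snapshot/azure_repo/_verify
```

A successful response lists the nodes that verified access to the repository.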
### 3.2 Create ILM policy (no searchable snapshot)
```http
PUT _ilm/policy/pdng-logs-hot7-warm30-cold90-del90
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
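To confirm that the policy was stored as intended and that ILM itself is running, something like the following can be used. `_ilm/status` should report an `operation_mode` of `RUNNING`.

```http
GET _ilm/policy/pdng-logs-hot7-warm30-cold90-del90
GET _ilm/status
```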
### 3.3 Create an override index template for the data stream
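The Fleet-managed templates and pipelines themselves are not modified. Note, however, that Elasticsearch applies only the highest-priority matching index template and does not merge it with lower-priority ones. A template that contains just the ILM setting therefore drops the mappings, settings, and default ingest pipeline that the Fleet-managed template supplies through `composed_of`. Before applying the request below, copy the `composed_of` list (and any other top-level fields) from the Fleet-managed index template into it.
```http
PUT _index_template/logs-kubernetes.container_logs-pdng-ilm
{
  "index_patterns": ["logs-kubernetes.container_logs-pdng"],
  "priority": 500,
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "pdng-logs-hot7-warm30-cold90-del90"
    }
  },
  "_meta": {
    "description": "Override ILM policy for logs-kubernetes.container_logs-pdng only (keep Fleet pipelines/mappings)"
  }
}
```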
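Because only the highest-priority matching index template is applied, the override usually needs the same `composed_of` list as the Fleet-managed template. A minimal sketch, assuming the Fleet-managed template is named `logs-kubernetes.container_logs` (Fleet usually names templates `<type>-<dataset>`; confirm the name in your cluster). The `composed_of` entry is a placeholder, not a real component template name.

```http
# 1. Inspect the Fleet-managed template and note its composed_of list
GET _index_template/logs-kubernetes.container_logs

# 2. Re-create the override with that list (placeholder shown; paste the real names)
PUT _index_template/logs-kubernetes.container_logs-pdng-ilm
{
  "index_patterns": ["logs-kubernetes.container_logs-pdng"],
  "priority": 500,
  "data_stream": {},
  "composed_of": ["<COMPONENT_TEMPLATES_FROM_STEP_1>"],
  "template": {
    "settings": {
      "index.lifecycle.name": "pdng-logs-hot7-warm30-cold90-del90"
    }
  }
}
```

Settings in the index template's own `template` block take precedence over those from component templates, so the ILM override still wins.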
### 3.4 Validate the template override (simulate)
```http
POST _index_template/_simulate_index/logs-kubernetes.container_logs-pdng
```
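Expected:
- `template.settings.index.lifecycle.name` is `pdng-logs-hot7-warm30-cold90-del90`
- The `overlapping` array lists the lower-priority templates that also match (normal); only our template is applied
- Fleet-supplied mappings and pipeline settings (for example `index.default_pipeline`) still appear in the simulated output; if they are missing, the override template lacks the Fleet `composed_of` list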
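
The simulate output is long. To look only at the parts that matter here, the generic `filter_path` query parameter can trim the response. A sketch; the selected paths are an assumption about which fields you care about:

```http
POST _index_template/_simulate_index/logs-kubernetes.container_logs-pdng?filter_path=template.settings.index.lifecycle,template.settings.index.default_pipeline,overlapping.name
```

If `default_pipeline` is absent from the filtered response, check the full output before rolling over.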
### 3.5 Rollover once to apply the new policy to a fresh backing index
```http
POST logs-kubernetes.container_logs-pdng/_rollover
```
### 3.6 Verify ILM applied to the new generation
```http
GET _data_stream/logs-kubernetes.container_logs-pdng
GET .ds-logs-kubernetes.container_logs-pdng-*/_ilm/explain
```
Expected:
- `...-000001` policy: `logs`
- `...-000002` policy: `pdng-logs-hot7-warm30-cold90-del90`
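
To see the policy, phase, and age of each backing index at a glance, or only the indices where ILM is stuck on an error, the explain API can be narrowed as in this sketch:

```http
GET .ds-logs-kubernetes.container_logs-pdng-*/_ilm/explain?filter_path=indices.*.policy,indices.*.phase,indices.*.action,indices.*.step,indices.*.age
GET .ds-logs-kubernetes.container_logs-pdng-*/_ilm/explain?only_errors=true
```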
---
## 4. SLM (Snapshots) Setup
### 4.1 Create an SLM policy (daily snapshots)
Example: daily at 01:30 UTC (SLM cron schedules are evaluated in UTC), keep snapshots for 120 days. SLM appends a unique suffix to each generated snapshot name.
```http
PUT _slm/policy/slm-pdng-container-logs-daily
{
  "schedule": "0 30 1 * * ?",
  "name": "<slm-pdng-container-logs-{now/d}>",
  "repository": "azure_repo",
  "config": {
    "indices": [
      ".ds-logs-kubernetes.container_logs-pdng-*"
    ],
    "ignore_unavailable": true,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "120d",
    "min_count": 7,
    "max_count": 200
  }
}
```
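Optionally, ILM can be told not to delete an index until this SLM policy has run, so that an index is not removed before a snapshot run has had the chance to capture it. A sketch, assuming the ILM policy from 3.2 is re-submitted in full (a `PUT` replaces the whole policy) with a `wait_for_snapshot` action added to the delete phase:

```http
PUT _ilm/policy/pdng-logs-hot7-warm30-cold90-del90
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": { "max_age": "7d", "max_primary_shard_size": "50gb" },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "wait_for_snapshot": { "policy": "slm-pdng-container-logs-daily" },
          "delete": {}
        }
      }
    }
  }
}
```

Whether this is worth adding depends on how much you rely on the snapshots; it was not part of the original configuration.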
### 4.2 Trigger a snapshot immediately (test)
```http
POST _slm/policy/slm-pdng-container-logs-daily/_execute
```
Check status:
```http
GET _slm/status
GET _slm/policy/slm-pdng-container-logs-daily
GET _snapshot/azure_repo/_all
```
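To check only the snapshots created by this policy rather than everything in the repository, a name wildcard can be used; each entry should show `"state": "SUCCESS"`. `_slm/stats` adds counts of snapshots taken, failed, and deleted. A minimal sketch:

```http
GET _snapshot/azure_repo/slm-pdng-container-logs-*
GET _slm/stats
```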
---
## 5. How to query data after ILM deletes it (Restore-on-demand)
### 5.1 List available snapshots
```http
GET _snapshot/azure_repo/_all
```
Pick the snapshot name you want. SLM appends a unique suffix, so names look like `slm-pdng-container-logs-2026.02.01-<suffix>`; copy the full name from the listing.
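
Before restoring, it can help to confirm which backing indices a snapshot actually contains and that it completed successfully. A sketch, with `<SNAPSHOT_NAME>` as a placeholder for the full snapshot name:

```http
GET _snapshot/azure_repo/<SNAPSHOT_NAME>?filter_path=snapshots.snapshot,snapshots.state,snapshots.indices
```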
### 5.2 Restore into temporary indices (`restore-*`) and allocate to cold tier
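Replace `<SNAPSHOT_NAME>`. The request drops `index.lifecycle.name` on restore: ILM measures age from the original rollover, not from the restore time, so a restored index that kept its policy would move straight to the delete phase.
```http
POST _snapshot/azure_repo/<SNAPSHOT_NAME>/_restore
{
  "indices": ".ds-logs-kubernetes.container_logs-pdng-*",
  "rename_pattern": ".ds-logs-kubernetes.container_logs-pdng-(.+)",
  "rename_replacement": "restore-logs-k8s-pdng-$1",
  "include_global_state": false,
  "ignore_index_settings": ["index.lifecycle.name"],
  "index_settings": {
    "index.routing.allocation.include._tier_preference": "data_cold"
  }
}
```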
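A restore runs in the background, so the indices are not searchable immediately. One way to follow progress, sketched with the `restore-logs-k8s-pdng-*` names produced by the rename above:

```http
GET _cat/recovery/restore-logs-k8s-pdng-*?v&h=index,shard,stage,type,bytes_percent
GET _cat/indices/restore-logs-k8s-pdng-*?v&h=index,health,status,docs.count,store.size
```

Shards being restored show `snapshot` in the `type` column; once every shard reaches stage `done` and health is `green` or `yellow`, the data can be queried.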
### 5.3 Query in Kibana
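In **Discover**:
- Create (or select) a data view, called an index pattern before Kibana 8.0, that matches `restore-*`. A `logs-*` data view does not include the restored indices, so filtering on `_index` alone is not enough.
- Queries work the same as live logs (e.g., `kubernetes.container.name : "pdng-cosmosrestapi"`)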
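
The same check can be run from Dev Tools without creating a data view. A minimal sketch, assuming `kubernetes.container.name` is mapped as a `keyword` field (as it normally is in the integration's mappings) so that a `term` filter matches it exactly:

```http
GET restore-logs-k8s-pdng-*/_search
{
  "size": 5,
  "sort": [{ "@timestamp": "desc" }],
  "query": {
    "bool": {
      "filter": [
        { "term": { "kubernetes.container.name": "pdng-cosmosrestapi" } }
      ]
    }
  }
}
```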
### 5.4 Cleanup after investigation
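From Elasticsearch 8.0 onward, wildcard deletes are rejected by default (`action.destructive_requires_name` is `true`), so list the restored indices and delete them by name. `<SUFFIX>` stands for the date and generation part of each restored index name.
```http
GET _cat/indices/restore-logs-k8s-pdng-*?v&h=index
DELETE restore-logs-k8s-pdng-<SUFFIX>
```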
---
## 6. Notes / Troubleshooting
### Searchable snapshot is blocked by license
If you see an error like:
- `non-compliant for [searchable-snapshots]`

then you **cannot** use ILM `searchable_snapshot`. Use SLM + restore instead (this README’s approach).
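
To see which license the cluster is actually running, the license API can be queried; the `type` and `status` fields in the response are the relevant ones.

```http
GET _license
```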
### ILM tier migration
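New indices are created on `data_hot` (expected). ILM migrates older indices to warm and cold once the phase `min_age`, counted from rollover, is reached, provided the cluster has nodes with the `data_warm` and `data_cold` roles. If a tier has no nodes, the tier preference falls back to the next warmer tier and the shards stay there.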
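
To check which tiers exist and where the backing indices currently live, the `_cat` APIs are usually enough. A sketch; in the `node.role` column, `h`, `w`, and `c` stand for the hot, warm, and cold data roles:

```http
GET _cat/nodes?v&h=name,node.role
GET _cat/shards/.ds-logs-kubernetes.container_logs-pdng-*?v&h=index,shard,prirep,state,node
```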
### Old backing indices
Templates only apply to newly created backing indices. If you want to change ILM policy for a previously created backing index, you can update its settings manually, but be cautious and validate with `_ilm/explain`.
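
A sketch of that manual change, with `<BACKING_INDEX>` as a placeholder for the full backing index name. It first removes the existing policy, which also clears the index's ILM state, then assigns the new one and checks the result:

```http
POST <BACKING_INDEX>/_ilm/remove

PUT <BACKING_INDEX>/_settings
{
  "index.lifecycle.name": "pdng-logs-hot7-warm30-cold90-del90"
}

GET <BACKING_INDEX>/_ilm/explain
```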
---
## 7. Quick verification commands
```http
POST _index_template/_simulate_index/logs-kubernetes.container_logs-pdng
GET _data_stream/logs-kubernetes.container_logs-pdng
GET .ds-logs-kubernetes.container_logs-pdng-*/_ilm/explain
GET _slm/status
GET _snapshot/azure_repo/_all
```
---
**End.**