---
# System prepended metadata

title: Kubernetes Container Logs – ILM Tiering + SLM Snapshots (ECK on AKS)
tags: [HP]

---

# Kubernetes Container Logs – ILM Tiering + SLM Snapshots (ECK on AKS)

This README documents what was configured for the **Fleet-managed Kubernetes container logs** data stream:

- **Data stream:** `logs-kubernetes.container_logs-pdng`
- **Goal:** keep hot/warm/cold tiering for performance & cost control, automatically delete old indices, and still keep a longer **backup** in Azure Blob via **SLM snapshots**.
- **Important constraint:** the cluster's license does **not** cover searchable snapshots (an Enterprise-level feature), so we **do not** use the ILM `searchable_snapshot` action. Instead we use:
  - **ILM** for hot → warm → cold → delete
  - **SLM** to snapshot indices to **Azure Blob** (`azure_repo`)
  - **Restore-on-demand** when you need to query data older than what ILM keeps online

---

## 1. Architecture

```mermaid
flowchart TB

subgraph Users["Users / Ops"]
  U[Kibana Discover / Dashboards]
end

subgraph Ingress["Ingress / LB"]
  IG["Ingress / LB"]
end

subgraph ElasticStack[Elastic Stack on ECK]

  subgraph ControlPlane[Control Plane]
    FS[Fleet Server]
    KBN[Kibana]
  end

  subgraph HotTier[Hot Tier]
    EH[Elasticsearch Hot Nodes]
  end

  subgraph WarmTier[Warm Tier]
    EW[Elasticsearch Warm Nodes]
  end

  subgraph ColdTier[Cold Tier]
    EC[Elasticsearch Cold Nodes]
  end

end

subgraph Snapshot[Snapshot Storage]
  AZ[(Azure Blob: azure_repo / es-snapshots)]
end

subgraph K8s[Kubernetes Cluster]
  EA["Elastic Agent (DaemonSet)"]
  KP["Kubelet / Metrics"]
  LOGS[Container Logs]
end

%% User access
U --> IG --> KBN

%% Fleet control
KBN --> FS
FS --> EA

%% Data ingest
KP --> EA
LOGS --> EA
EA --> EH

%% ILM tiering
EH -->|ILM migrate, min_age 7d| EW
EW -->|ILM migrate, min_age 30d| EC
EC -->|ILM delete @ 90d| X[(deleted)]

%% Snapshots
EH -.->|SLM snapshot schedule| AZ
EW -.->|SLM snapshot schedule| AZ
EC -.->|SLM snapshot schedule| AZ

%% Restore
AZ -->|Restore| R["restore-* indices (temporary)"]
KBN --> R
```

**Key point:** without searchable snapshots, **Azure snapshots are not directly queryable**.  
To query historical data, **restore** a snapshot into temporary `restore-*` indices, query, then delete them.

---

## 2. What we changed

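### 2.1 ILM: hot 0-7d / warm 7-30d / cold 30-90d / delete at 90d (no searchable snapshot)

- New ILM policy: `pdng-logs-hot7-warm30-cold90-del90`
- Applied only to `logs-kubernetes.container_logs-pdng` via a **high-priority override index template**
- Triggered a rollover so that the next backing index uses the new ILM policy
- Phase ages (`min_age`) count from rollover, not from creation: a backing index receives writes for up to 7 days, moves to warm 7 days after rollover, to cold 30 days after rollover, and is deleted 90 days after rollover (so up to about 97 days after it was created)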

Result observed:

- `.ds-...-000001` stayed on the default `logs` policy (expected)
- `.ds-...-000002` uses `pdng-logs-hot7-warm30-cold90-del90` (correct)

### 2.2 SLM: snapshots to Azure Blob (backup / longer retention)

- Snapshot repository exists: `azure_repo` (Azure container `es-snapshots`)
- SLM policy snapshots `.ds-logs-kubernetes.container_logs-pdng-*`
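- Each snapshot is kept for 120 days (example value). A backing index can be restored from any snapshot taken while it still existed, so with daily snapshots its data remains recoverable for roughly 120 days after ILM deletes it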

---

## 3. Step-by-step Setup

> Run commands in **Kibana → Dev Tools**, or with `curl` against Elasticsearch.

### 3.1 Confirm data stream and snapshot repo

```http
GET _data_stream/logs-kubernetes.container_logs-pdng
GET _snapshot/azure_repo
```
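
Two further pre-checks can save debugging later, sketched here as Dev Tools requests: whether the cluster actually has warm and cold nodes (ILM can only migrate to tiers that exist), and whether every node can reach the Azure repository. The role letters in the comment follow the `_cat/nodes` column legend.

```http
# Node roles: look for h (hot), w (warm) and c (cold) in the node.role column
GET _cat/nodes?v&h=name,node.role

# Ask every master/data node to check access to the Azure container
POST _snapshot/azure_repo/_verify
```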

### 3.2 Create ILM policy (no searchable snapshot)

```http
PUT _ilm/policy/pdng-logs-hot7-warm30-cold90-del90
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
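
A quick way to confirm the policy was stored, and later to see which templates, data streams and indices reference it, is the ILM get-policy API. `in_use_by` is returned on recent versions; the `filter_path` is only a convenience to trim the response.

```http
# Show the stored policy version and everything currently using it
GET _ilm/policy/pdng-logs-hot7-warm30-cold90-del90?filter_path=*.version,*.in_use_by
```

Once the override template exists and the rollover has run, `in_use_by` should list the template and the new backing index.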

### 3.3 Create an override index template for the data stream

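This does **not** modify the Fleet-managed templates or pipelines. However, Elasticsearch applies only the single highest-priority index template that matches a new index, so the override must repeat the Fleet template's `composed_of` list; otherwise new backing indices lose the integration's mappings and pipeline settings. The list below is an example from a recent 8.x Fleet install: copy the exact values from `GET _index_template/logs-kubernetes.container_logs` on your cluster, and re-check them after Fleet package upgrades.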
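
To see which component templates Fleet's own template composes (and therefore what the override has to list), the Fleet template can be fetched with a filter. This assumes Fleet named it `logs-kubernetes.container_logs`, following its `<type>-<dataset>` convention; adjust the name if `GET _index_template/logs-kubernetes*` shows otherwise.

```http
# Print only the fields the override template needs to mirror
GET _index_template/logs-kubernetes.container_logs?filter_path=index_templates.index_template.composed_of,index_templates.index_template.ignore_missing_component_templates,index_templates.index_template.priority
```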

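```http
PUT _index_template/logs-kubernetes.container_logs-pdng-ilm
{
  "index_patterns": ["logs-kubernetes.container_logs-pdng"],
  "priority": 500,
  "data_stream": {},
  "composed_of": [
    "logs@mappings",
    "logs@settings",
    "logs-kubernetes.container_logs@package",
    "logs@custom",
    "logs-kubernetes.container_logs@custom",
    "ecs@mappings",
    ".fleet_globals-1",
    ".fleet_agent_id_verification-1"
  ],
  "ignore_missing_component_templates": [
    "logs@custom",
    "logs-kubernetes.container_logs@custom"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "pdng-logs-hot7-warm30-cold90-del90"
    }
  },
  "_meta": {
    "description": "Override ILM policy for logs-kubernetes.container_logs-pdng only (keep Fleet pipelines/mappings)"
  }
}
```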

### 3.4 Validate the template override (simulate)

```http
POST _index_template/_simulate_index/logs-kubernetes.container_logs-pdng
```

Expected:

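- `index.lifecycle.name` is `pdng-logs-hot7-warm30-cold90-del90`
- The Fleet mappings and pipeline settings still appear in the simulated template (if they are missing, `composed_of` is incomplete)
- `overlapping` also lists the Fleet template (normal); only the highest-priority template, ours, is applied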
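
The full simulate response is long. A filtered variant (a convenience, not required) shows just the lifecycle policy, the pipeline settings and the overlapping templates. Which pipeline settings appear depends on the package version, but if none appear, the override is probably missing Fleet's component templates.

```http
# Effective lifecycle policy, ingest pipeline settings and overlapping templates only
POST _index_template/_simulate_index/logs-kubernetes.container_logs-pdng?filter_path=template.settings.index.lifecycle,template.settings.index.default_pipeline,template.settings.index.final_pipeline,overlapping
```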

### 3.5 Rollover once to apply the new policy to a fresh backing index

```http
POST logs-kubernetes.container_logs-pdng/_rollover
```

### 3.6 Verify ILM applied to the new generation

```http
GET _data_stream/logs-kubernetes.container_logs-pdng
GET .ds-logs-kubernetes.container_logs-pdng-*/_ilm/explain
```

Expected:

- `...-000001` policy: `logs`
- `...-000002` policy: `pdng-logs-hot7-warm30-cold90-del90`
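
For a compact view of the same check, `filter_path` can reduce the explain output to one policy, phase and age per backing index (the age is what ILM compares against each phase's `min_age`).

```http
# Policy, current phase and ILM age for every backing index
GET .ds-logs-kubernetes.container_logs-pdng-*/_ilm/explain?filter_path=indices.*.policy,indices.*.phase,indices.*.age
```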

---

## 4. SLM (Snapshots) Setup

### 4.1 Create an SLM policy (daily snapshots)

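Example: daily at 01:30 UTC (SLM cron schedules are evaluated in UTC). Snapshots expire after 120 days, but SLM always keeps at least 7 and never more than 200.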

```http
PUT _slm/policy/slm-pdng-container-logs-daily
{
  "schedule": "0 30 1 * * ?",
  "name": "<slm-pdng-container-logs-{now/d}>",
  "repository": "azure_repo",
  "config": {
    "indices": [
      ".ds-logs-kubernetes.container_logs-pdng-*"
    ],
    "ignore_unavailable": true,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "120d",
    "min_count": 7,
    "max_count": 200
  }
}
```

### 4.2 Trigger a snapshot immediately (test)

```http
POST _slm/policy/slm-pdng-container-logs-daily/_execute
```

Check status:

```http
GET _slm/status
GET _slm/policy/slm-pdng-container-logs-daily
GET _snapshot/azure_repo/_all
```
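
Expired snapshots are not removed when the policy runs; SLM retention runs on its own cluster-wide schedule (`slm.retention_schedule`, daily by default). The sketch below shows how to read that schedule and, for testing only, how to trigger a retention pass immediately.

```http
# Cluster-wide schedule on which SLM deletes expired snapshots
GET _cluster/settings?include_defaults=true&filter_path=*.slm.retention_schedule

# Run retention now instead of waiting for the schedule (test only)
POST _slm/_execute_retention
```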

---

## 5. How to query data after ILM deletes it (Restore-on-demand)

### 5.1 List available snapshots

```http
GET _snapshot/azure_repo/_all
```

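Pick the snapshot you want. SLM appends a unique suffix to every generated name, so names look like `slm-pdng-container-logs-2026.02.01-<suffix>`; copy the full name from the listing.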
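
With many daily snapshots, `_all` gets long. A sketch of a shorter listing sorted by completion time, plus a lookup of which backing indices one snapshot contains (`<SNAPSHOT_NAME>` is a placeholder as in the restore step):

```http
# One line per snapshot, most recent last
GET _cat/snapshots/azure_repo?v&h=id,status,end_epoch,indices&s=end_epoch

# Backing indices contained in a single snapshot
GET _snapshot/azure_repo/<SNAPSHOT_NAME>?filter_path=snapshots.snapshot,snapshots.state,snapshots.indices
```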

### 5.2 Restore into temporary indices (`restore-*`) and allocate to cold tier

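Replace `<SNAPSHOT_NAME>` with the full snapshot name. Restored indices keep their `index.lifecycle.name` setting, and ILM measures `min_age` from the original rollover rather than from the restore, so a restored copy older than 90 days would be deleted again right away. `ignore_index_settings` drops the policy so the restored indices are not managed by ILM.

```http
POST _snapshot/azure_repo/<SNAPSHOT_NAME>/_restore
{
  "indices": ".ds-logs-kubernetes.container_logs-pdng-*",
  "rename_pattern": ".ds-logs-kubernetes.container_logs-pdng-(.+)",
  "rename_replacement": "restore-logs-k8s-pdng-$1",
  "include_global_state": false,
  "ignore_index_settings": ["index.lifecycle.name"],
  "index_settings": {
    "index.routing.allocation.include._tier_preference": "data_cold"
  }
}
```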
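
After starting the restore, two follow-up checks are worth running: recovery progress from Azure, and whether ILM still manages the restored copies (any index reporting `managed: true` is subject to the original policy's delete phase). The last request is a fallback that detaches the policy; Elastic's ILM documentation describes a stricter variant that stops ILM (`POST _ilm/stop`) before restoring and starts it again (`POST _ilm/start`) afterwards.

```http
# Shards still being recovered from the snapshot
GET _cat/recovery/restore-logs-k8s-pdng-*?v&active_only=true

# Each restored index should report "managed": false
GET restore-logs-k8s-pdng-*/_ilm/explain?filter_path=indices.*.managed

# Fallback: detach the ILM policy from any restored index that is still managed
POST restore-logs-k8s-pdng-*/_ilm/remove
```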

### 5.3 Query in Kibana

In **Discover**:

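- Create a data view for `restore-logs-k8s-pdng-*` with `@timestamp` as the time field (a data view that already covers `restore-*` also works; narrow it with a filter on `_index`)
- Set the time picker to the period covered by the snapshot; queries then work the same as on live logs (e.g., `kubernetes.container.name : "pdng-cosmosrestapi"`)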
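
The same lookup can be run from Dev Tools without creating a data view. This sketch assumes `kubernetes.container.name` is mapped as `keyword`, as the Fleet package defines it, so an exact `term` query matches the KQL example above.

```http
# Latest 20 log lines for one container from the restored indices
GET restore-logs-k8s-pdng-*/_search
{
  "size": 20,
  "sort": [{ "@timestamp": "desc" }],
  "query": {
    "term": { "kubernetes.container.name": "pdng-cosmosrestapi" }
  }
}
```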

### 5.4 Cleanup after investigation

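On Elasticsearch 8.x, `action.destructive_requires_name` defaults to `true`, which rejects a wildcard `DELETE`. List the restored indices first, then delete them by name (replace the `<SUFFIX_n>` placeholders with the names from the listing):

```http
GET _cat/indices/restore-logs-k8s-pdng-*?v&h=index,docs.count,store.size
DELETE restore-logs-k8s-pdng-<SUFFIX_1>,restore-logs-k8s-pdng-<SUFFIX_2>
```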

---

## 6. Notes / Troubleshooting

### Searchable snapshot is blocked by license

If you see an error like:

- `non-compliant for [searchable-snapshots]`

Then you **cannot** use ILM `searchable_snapshot`. Use SLM + restore instead (this README’s approach).
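
To confirm which license level the cluster is running before deciding between the two approaches, the license API can be filtered down to its type and status:

```http
# "type" is the active license level (e.g. basic, platinum, enterprise, trial)
GET _license?filter_path=license.type,license.status
```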

### ILM tier migration

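New indices are created on `data_hot` (expected). In the warm and cold phases, ILM adds an implicit `migrate` action that, once `min_age` is reached, sets each backing index's `_tier_preference` (to `data_warm,data_hot`, then to `data_cold,data_warm,data_hot`). If the cluster has no warm or cold nodes, shards fall back to the next tier in that list and stay on hot.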
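
A sketch for checking migration on the live data stream: first the tier preference ILM has written on each backing index, then which nodes actually hold the shards.

```http
# Tier preference currently set on each backing index
GET .ds-logs-kubernetes.container_logs-pdng-*/_settings/index.routing.allocation.include._tier_preference

# Actual shard placement per backing index
GET _cat/shards/.ds-logs-kubernetes.container_logs-pdng-*?v&h=index,shard,prirep,state,node
```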

### Old backing indices

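Templates apply only to newly created backing indices, so `...-000001` stays on the default `logs` policy. To move an existing backing index to the new policy, update its `index.lifecycle.name` setting directly. ILM finishes the index's current phase using the old policy's cached definition before switching, so validate the result with `_ilm/explain`.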
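
A minimal sketch of that switch. `<OLD_BACKING_INDEX>` is a placeholder for the full `.ds-...-000001` name returned by `GET _data_stream/logs-kubernetes.container_logs-pdng`. Check the index's age first: an index already more than 90 days past rollover becomes eligible for deletion under the new policy.

```http
# Point the older backing index at the new policy
PUT <OLD_BACKING_INDEX>/_settings
{
  "index.lifecycle.name": "pdng-logs-hot7-warm30-cold90-del90"
}

# Confirm the policy, phase and age ILM now reports for it
GET <OLD_BACKING_INDEX>/_ilm/explain
```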

---

## 7. Quick verification commands

```http
POST _index_template/_simulate_index/logs-kubernetes.container_logs-pdng
GET _data_stream/logs-kubernetes.container_logs-pdng
GET .ds-logs-kubernetes.container_logs-pdng-*/_ilm/explain
GET _slm/status
GET _snapshot/azure_repo/_all
```

---

**End.**
