Upgrading Rancher Server from v2.4.3 to v2.12.1: A Bulletproof, Step-by-Step Guide

If you're still running Rancher v2.4.3, you're roughly eight minor versions and five years behind current. That gap matters more than it sounds like, because of one rule that Rancher enforces strictly:

The only tested and supported Rancher upgrade path between minor versions is: latest patch of your current minor → latest patch of the next minor. You cannot skip minor versions, and you cannot upgrade directly from v2.4.3 to v2.12.1 in a single Helm upgrade.

So this post is not "run one helm upgrade command." It's a chain of controlled hops, each one backed up, verified, and soaked before you move to the next. Skipping this discipline is the single most common cause of a Rancher server that comes up broken, or worse, one that comes up "fine" but silently corrupts downstream cluster state.

Read the whole post before you touch anything. Then follow it top to bottom, in order, without skipping steps.


1. The single most important thing to know before you start

If your v2.4.3 environment has any downstream clusters provisioned with RKE (RKE1) — the classic Rancher-launched Kubernetes clusters from that era — you have a hard blocker:

  • RKE1 reached end of life on July 31, 2025.
  • Rancher v2.12.0 and later no longer supports provisioning or managing downstream RKE1 clusters at all.

This means before you can safely land on v2.12.1, every RKE1 downstream cluster must be replatformed to RKE2 (or migrated to an imported/hosted cluster type). You cannot "deal with it later" — Rancher 2.12 has a pre-upgrade validation check that will find and list RKE1 resources and can block the upgrade if they're present.

Action: Inventory every downstream cluster right now and tag which ones are RKE1. Plan their replatform to RKE2 as a parallel workstream that finishes before you attempt the 2.11 → 2.12 hop. Don't proceed past this section until you know the answer.


2. Build your environment inventory

Write this down in a shared doc. You will reference it at every hop.

# Current Rancher version
kubectl get settings.management.cattle.io server-version -o jsonpath='{.value}'

# Helm release info
helm list -n cattle-system

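# Kubernetes version of the management cluster
# (recent kubectl releases removed the --short flag and print the short form by default;
#  add --short back only if your kubectl is old enough to need it)
kubectl version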

# cert-manager version
kubectl get pods -n cert-manager -o jsonpath='{.items[0].spec.containers[0].image}'

# Node OS / container runtime
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
OS:.status.nodeInfo.osImage,\
KERNEL:.status.nodeInfo.kernelVersion,\
CONTAINER_RUNTIME:.status.nodeInfo.containerRuntimeVersion

# Downstream clusters and their provisioning driver
kubectl get clusters.management.cattle.io -o custom-columns=\
NAME:.metadata.name,\
DRIVER:.status.driver,\
PROVIDER:.status.provider

Record for each downstream cluster: driver (RKE1/RKE2/K3s/EKS/etc.), Kubernetes version, node count. You'll cross-check this against the support matrix at every hop.
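To make the RKE1 tagging from Section 1 mechanical, you can filter the driver column instead of eyeballing it. This is a minimal sketch, assuming RKE1 clusters report the value rancherKubernetesEngine in .status.driver (check that against your own output before trusting it); the function name list_rke1_clusters is made up for this post.

```shell
# list_rke1_clusters: read "NAME DRIVER ..." rows on stdin and print the
# names of the clusters whose driver is the RKE1 one.
# ASSUMPTION: RKE1 shows up as "rancherKubernetesEngine" in .status.driver.
list_rke1_clusters() {
  awk '$2 == "rancherKubernetesEngine" { print $1 }'
}

# Usage (needs kubectl access to the management cluster):
#   kubectl get clusters.management.cattle.io --no-headers \
#     -o custom-columns=NAME:.metadata.name,DRIVER:.status.driver \
#     | list_rke1_clusters
```

An empty result is what you want to see before the 2.11 → 2.12 hop.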


3. Pre-flight fixes that must happen before hop #1

  • Helm version. Rancher v2.4.3-era installs sometimes still used Helm 2. Helm v2 support was deprecated starting in the Rancher v2.7 line and removed entirely in v2.9. If you're on Helm 2, migrate to Helm 3 now — don't wait for it to become a blocker mid-chain. To reach v2.12.x specifically, your Helm client must be v3.18 or newer (required for Kubernetes 1.33 support introduced in that release).
  • cert-manager. Note your current cert-manager version. You'll need to check compatibility and potentially upgrade it before several of the hops below — Rancher's release notes for each target minor version state the required cert-manager version.
  • Ingress controller. Note your nginx-ingress or Traefik version; some hops require a minimum version.
  • Disk space. Before any hop into v2.11+ territory, check available disk space on your nodes. Rancher v2.12+ added UI server-side pagination that increases disk usage, and insufficient space can cause pod eviction mid-upgrade.
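The Helm client requirement above is easy to get wrong with a plain string comparison, because "3.9" sorts after "3.18" alphabetically. Here is a small sketch of a version check, assuming GNU sort (for its -V version sort) and a Helm 3 client; version_ge is a hypothetical helper name, not part of Helm or Rancher.

```shell
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on sort -V (version sort) so that 3.9 is ordered before 3.18.
version_ge() {
  local have="${1#v}" want="${2#v}"   # tolerate a leading "v"
  # If the smaller of the two is the wanted version, we have enough.
  [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)" = "$want" ]
}

# Usage (assumes Helm 3, where `helm version --template` is available):
#   have=$(helm version --template '{{.Version}}')
#   version_ge "$have" 3.18.0 || echo "Helm $have is too old for v2.12.x"
```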

4. The real upgrade path

Here is the chain you actually need to walk. At the time you do this, check the Rancher stable Helm repo and the GitHub releases page for the actual latest patch release of each minor line — don't hardcode version numbers from a blog post, they go stale.

v2.4.3 → latest v2.4.x
       → latest v2.5.x
       → latest v2.6.x
       → latest v2.7.x
       → latest v2.8.x
       → latest v2.9.x
       → latest v2.10.x
       → latest v2.11.x
       → v2.12.1

To find the latest available patch for any minor line right before you hop to it:

helm repo update
helm search repo rancher-stable/rancher --versions | grep "2\.9\."

(swap 2.9. for whichever minor line you're checking.)
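If you'd rather not read the list by eye, the same lookup can be wrapped in a function that returns only the highest patch. This is a sketch, assuming the usual helm search column layout (chart name first, chart version second) and GNU sort -V; latest_patch is a name invented for this post.

```shell
# latest_patch MINOR: read `helm search repo ... --versions` output on stdin
# and print the highest chart version in the given minor line (e.g. 2.9).
latest_patch() {
  # index(...) == 1 means "chart version starts with MINOR.", so asking
  # for 2.1 does not accidentally match 2.10.x or 2.11.x.
  awk -v m="$1" '$1 ~ /\/rancher$/ && index($2, m ".") == 1 { print $2 }' \
    | sort -V | tail -n 1
}

# Usage:
#   helm repo update
#   helm search repo rancher-stable/rancher --versions | latest_patch 2.9
```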

Why the full chain and not fewer hops: Rancher's docs are explicit that skipping minor versions is unsupported and untested. Database migrations, CRD schema changes, and internal API versioning all assume you passed through each minor release in sequence. This is the same reason SUSE's own storage products (Longhorn) enforce identical one-hop-at-a-time rules — accumulated schema drift is the risk, not just "does it start up."
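A cheap guard against accidentally skipping a minor is to check each planned hop before you run it. The sketch below only encodes the "same minor or exactly one minor ahead" half of the rule; it assumes both versions share the same major version (2.x) and it does not know whether you are on the latest patch. The helper names minor_of and is_supported_hop are hypothetical.

```shell
# minor_of VERSION: print the minor number of a version such as v2.4.3.
minor_of() {
  local v="${1#v}"   # drop an optional leading "v"
  v="${v#*.}"        # drop the major component
  printf '%s\n' "${v%%.*}"
}

# is_supported_hop FROM TO: succeed only if TO is in the same minor line
# as FROM (a patch bump) or exactly one minor ahead.
# ASSUMPTION: FROM and TO have the same major version.
is_supported_hop() {
  local diff=$(( $(minor_of "$2") - $(minor_of "$1") ))
  [ "$diff" -eq 0 ] || [ "$diff" -eq 1 ]
}

# Usage:
#   is_supported_hop v2.4.3 v2.12.1 || echo "refusing: this skips minor versions"
```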


5. The repeatable hop procedure

Run this exact sequence for every single hop in the chain above. Do not shortcut it because "it's just a patch bump" — several of these hops are minor-version boundaries with real breaking changes.

5.1 Confirm you're on the latest patch of your current minor

kubectl get settings.management.cattle.io server-version -o jsonpath='{.value}'
# Replace X with your current minor number (for example 2\.4\.)
helm search repo rancher-stable/rancher --versions | grep -E "^rancher-stable/rancher[[:space:]]+2\.X\."

If you're not on the latest patch of your current minor, upgrade to that patch first, using this same procedure, before moving to the next minor.

5.2 Read the release notes for the target version

Check the GitHub release notes for the specific target version for:

  • Breaking changes
  • Required cert-manager version
  • Required Kubernetes version range for the management cluster
  • Any documented manual migration steps or known issues

Do not skip this even if it feels repetitive — this is where version-specific landmines live (deprecated APIs, CVE-driven behavior changes, changed default settings).

5.3 Take backups — both of them, every time

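Rancher-level backup (the officially supported rollback mechanism). This relies on the rancher-backup operator, which only exists from Rancher v2.5 onward, so while you are still on v2.4.x the etcd snapshot below is the backup you depend on: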

kubectl apply -f - <<EOF
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: pre-upgrade-backup-$(date +%Y%m%d-%H%M)
spec:
  retentionCount: 10
  resourceSetName: rancher-resource-set
EOF

Confirm it completed:

kubectl get backups.resources.cattle.io
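If you want a scriptable gate instead of reading the table yourself, something like the following can work. It is a sketch under one loud assumption: that the row for your backup contains the word Completed in one of its columns once it finishes. The exact column layout of that output is not guaranteed here, which is why the function scans every column rather than a fixed one. backup_completed is a name made up for this post.

```shell
# backup_completed NAME: read `kubectl get backups.resources.cattle.io
# --no-headers` output on stdin; succeed if the row for NAME reports
# Completed in any column.
# ASSUMPTION: a finished backup shows the literal word "Completed".
backup_completed() {
  awk -v name="$1" '
    $1 == name { for (i = 2; i <= NF; i++) if ($i == "Completed") found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage:
#   kubectl get backups.resources.cattle.io --no-headers \
#     | backup_completed "<your-backup-name>" || echo "backup not finished yet"
```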

Infrastructure-level etcd snapshot (belt and suspenders — if the management cluster itself gets corrupted, the Rancher backup alone won't save you):

# RKE1 management cluster
rke etcd snapshot-save --config cluster.yml --name pre-upgrade-snapshot

# RKE2 management cluster
rke2 etcd-snapshot save --name pre-upgrade-snapshot

# K3s management cluster
k3s etcd-snapshot save --name pre-upgrade-snapshot

Also export your current Helm values so you can diff or restore configuration:

helm get values rancher -n cattle-system -o yaml > pre-upgrade-values-$(date +%Y%m%d).yaml

Do not proceed to 5.4 until both backups exist and you've verified the Rancher Backup resource shows Completed.

5.4 Check cert-manager compatibility

Compare your current cert-manager version against what the target Rancher version requires (documented in that version's install/upgrade docs). If an upgrade is needed and your cert-manager is v1.5 or below, follow cert-manager's own in-place upgrade docs rather than uninstalling — an uninstall/reinstall of cert-manager underneath a live Rancher install can orphan certificates.
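The "v1.5 or below" test can be done on the image string you recorded in Section 2. This is a rough sketch: it assumes the image is referenced by tag (not pinned by digest) and that the tag is a plain version like v1.11.0. Both helper names are hypothetical, and the image name in the usage line is only an illustration.

```shell
# image_tag IMAGE: print the tag of an image reference.
# Not handled: digest-pinned references (repo@sha256:...) or untagged images.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# is_v1_5_or_below VERSION: succeed when the cert-manager version is in the
# v1.5 line or older, i.e. the case where the in-place upgrade docs apply.
is_v1_5_or_below() {
  local v="${1#v}"
  local major="${v%%.*}"
  local rest="${v#*.}"
  local minor="${rest%%.*}"
  [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -le 5 ]; }
}

# Usage:
#   tag=$(image_tag "quay.io/jetstack/cert-manager-controller:v1.11.0")
#   is_v1_5_or_below "$tag" && echo "follow cert-manager's in-place upgrade docs"
```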

5.5 Update the Helm repo and run the upgrade

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update

helm upgrade rancher rancher-stable/rancher \
  --namespace cattle-system \
  --values pre-upgrade-values-$(date +%Y%m%d).yaml \
  --version <TARGET_VERSION> \
  --wait \
  --timeout 10m

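Replace <TARGET_VERSION> with the exact patch version for this hop (e.g. 2.7.14), never with a bare minor number. The --values argument must point at the file you exported in 5.3; the $(date +%Y%m%d) in the command only resolves to that filename if you run the upgrade on the same day as the export, so spell the filename out otherwise. If your installation uses a different chart repo or release name (e.g. a chart deployed via a marketplace listing), keep using that same release/repo name; don't switch repos mid-chain unless you deliberately follow Rancher's documented repo-switch procedure.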
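Since a bare minor number is the easiest mistake to make here, it can be worth refusing it in whatever wrapper script you use. A minimal sketch, with a hypothetical function name; it accepts only three dot-separated numeric components and an optional leading "v".

```shell
# is_exact_patch_version VERSION: succeed for 2.7.14 or v2.7.14,
# fail for a bare minor such as 2.7 or anything non-numeric.
is_exact_patch_version() {
  printf '%s\n' "${1#v}" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Usage:
#   TARGET_VERSION=2.7.14
#   is_exact_patch_version "$TARGET_VERSION" || { echo "need X.Y.Z"; exit 1; }
```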

5.6 Watch the rollout, don't walk away

kubectl get pods -n cattle-system -w

Specifically watch for a rancher-pre-upgrade job pod. On some versions (notably around the 2.11 → 2.12 boundary), this job can hang in Pending if your Helm values set podAntiAffinity to required on a single-node or under-provisioned cluster — if it hangs, check node capacity and anti-affinity settings before assuming something is broken.

5.7 Verify before moving to the next hop

# Confirm the new version is live
kubectl get settings.management.cattle.io server-version -o jsonpath='{.value}'

# All cattle-system pods healthy (prints nothing when every pod is Running or Completed)
kubectl get pods -n cattle-system --no-headers | grep -Ev 'Running|Completed'

# Downstream clusters still Active (READY should be True for each)
kubectl get clusters.management.cattle.io -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status'

# UI/API health
curl -sk https://<your-rancher-hostname>/healthz

Log into the UI. Check that at least one downstream cluster shows healthy, that you can view its workloads, and that authentication (local, LDAP, SAML — whatever you use) still works.
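One gap in a status-only pod check is that a pod can be Running while its containers are not ready. The sketch below tightens that, assuming the default kubectl column order (NAME, READY, STATUS, ...) with --no-headers; unhealthy_pods is a hypothetical helper name.

```shell
# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print the pods that are neither Completed nor fully-ready Running pods.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")            # READY column looks like "1/1"
    if ($3 == "Completed") next      # finished job pods are fine
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods -n cattle-system --no-headers | unhealthy_pods
```

No output means every pod is either Completed or Running with all containers ready.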

Soak it. Don't chain-hop through all nine versions in one sitting. Let each hop run for at least a few hours (ideally a full day for the bigger minor jumps) before moving on, so you catch delayed issues — controller crash loops, webhook failures, agent reconnect problems — while you still know exactly which hop caused them.

5.8 Repeat 5.1–5.7 for the next minor version

Go back to step 5.1 with the next minor line in the chain. Do this all the way to v2.12.1.


6. Special notes for the final hop into v2.12.1

  • Re-confirm your Helm client is v3.18+ before this specific hop — earlier Helm clients aren't validated against v2.12.x's Kubernetes 1.33 support and may misbehave.
  • Rancher v2.12 runs a pre-upgrade validation check specifically for leftover RKE1 resources. If you missed replatforming something in Section 1, this hop is where it will fail loudly (which is much better than it failing silently).
  • If your system-image registry configuration uses a cluster-scoped registry or a global system-default-registry (common in air-gapped setups), be aware v2.12.1 has a known issue generating an incorrect docker.io path segment for the cattle-cluster-agent image in that configuration. The workaround is to explicitly set the CATTLE_AGENT_IMAGE environment variable to the bare repository and tag (no registry prefix), e.g. rancher/rancher-agent:v2.12.1 — Rancher will apply the correct registry prefix automatically.
  • v2.12 also changed audit log defaults: there's now a dedicated AUDIT_LOG_ENABLED setting, separate from AUDIT_LEVEL. If you rely on audit logging, review this setting explicitly after the upgrade rather than assuming your old AUDIT_LEVEL config behaves the same way.
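For the CATTLE_AGENT_IMAGE workaround above, the value has to be the repository and tag without any registry host in front. If your notes only contain the fully qualified image, a helper like this can derive the bare form. It is a sketch that follows the common container-reference convention (the first path component is treated as a registry when it contains a dot or a colon, or is "localhost"); strip_registry is a made-up name and the registry host in the usage line is a placeholder.

```shell
# strip_registry IMAGE: print IMAGE without a leading registry host.
strip_registry() {
  local ref="$1" first="${1%%/*}"
  case "$ref" in
    */*) ;;                                   # has a path, inspect it below
    *) printf '%s\n' "$ref"; return ;;        # no slash, nothing to strip
  esac
  case "$first" in
    *.*|*:*|localhost) printf '%s\n' "${ref#*/}" ;;   # looks like a registry
    *) printf '%s\n' "$ref" ;;                        # already bare
  esac
}

# Usage:
#   strip_registry "registry.example.com:5000/rancher/rancher-agent:v2.12.1"
```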

7. Rollback procedure (know this before you need it)

If a hop goes wrong and you can't forward-fix it quickly:

# Roll back the Helm release itself
helm rollback rancher -n cattle-system

# Restore Rancher's application state from the backup taken in step 5.3
kubectl apply -f restore.yaml   # a Restore CR referencing your pre-upgrade Backup

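# If the management cluster's etcd itself is damaged (rare, but possible):
# stop the server process first, then restore from the etcd snapshot.
# RKE2 typically appends the node name and a timestamp to the snapshot name,
# so check the real filename in the snapshots directory and use that path.
rke2 server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/pre-upgrade-snapshot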

Important nuance: restoring a Rancher backup returns Rancher to the exact state it was in when that backup was taken. Anything created or changed after that point — new clusters, new users, new RBAC — is gone. This is another reason to soak each hop before moving on: the closer your rollback point is to "now," the less you lose.
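For reference, the restore.yaml mentioned above can be generated rather than hand-written. This is a minimal sketch of a Restore resource for the rancher-backup operator, assuming the backup lives in the operator's default storage location (an S3-backed setup needs an extra storage location block that is not shown here). The metadata name is arbitrary, and the filename you pass in must be the real .tar.gz name the operator produced; it is left as a placeholder in the usage lines.

```shell
# restore_manifest BACKUP_FILENAME: print a Restore resource on stdout.
# ASSUMPTION: the backup sits in the operator's default storage location.
restore_manifest() {
  cat <<EOF
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-pre-upgrade
spec:
  backupFilename: $1
EOF
}

# Usage:
#   restore_manifest "<backup-file>.tar.gz" > restore.yaml
#   kubectl apply -f restore.yaml
```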


8. Final verification checklist before calling it done

  • [ ] server-version reports v2.12.1
  • [ ] All cattle-system pods Running, none crash-looping
  • [ ] Every downstream cluster shows Active
  • [ ] No RKE1 clusters remain (v2.12+ can't manage them)
  • [ ] Authentication provider(s) working (local, LDAP, SAML, etc.)
  • [ ] Monitoring/logging integrations reconnected and reporting
  • [ ] Fleet / GitOps pipelines (if used) syncing correctly — note that v2.12 moved Fleet workspace initialization through the provisioning controller path instead of webhook side effects, so custom Fleet RBAC assumptions are worth re-checking
  • [ ] Audit logging behaving as expected under the new AUDIT_LOG_ENABLED setting
  • [ ] A fresh Rancher Backup taken after landing on v2.12.1, as your new baseline

9. One more thing worth knowing

By the time you finish this chain, v2.12.1 will likely no longer be Rancher's actual latest release — development moves fast, and v2.13/v2.14 lines are already active. That's fine if v2.12.1 is your deliberate target (e.g. for support-matrix or compliance reasons). But if the real goal is just "get current," it's worth checking the latest stable release before you finish this project, since you'll be paying the same per-hop discipline either way — an extra hop or two while you're already in the rhythm of this process is cheap compared to doing it all again in six months.

Treat this whole chain as a project, not a maintenance window: inventory → replatform RKE1 → hop, verify, soak → repeat → final verification. That discipline is what turns "upgrading five years of Rancher versions" from a plausible outage into a boring, predictable afternoon-by-afternoon task.