Skip the Upgrade Chain: Fresh RKE2 + Rancher Install, and How to Migrate Your Apps Over Cleanly
The previous post walked through the painful, mandatory hop-by-hop path required to upgrade an existing Rancher server from v2.4.3 to v2.12.1. If your v2.4.3 environment is old enough that it still runs RKE1 downstream clusters, drags Helm 2 baggage, or has simply accumulated five years of config drift, the faster, safer, and honestly less stressful option is usually not to upgrade it at all.
Build a fresh RKE2 cluster, install the latest Rancher on it directly, and migrate your applications over deliberately. No nine-hop upgrade chain, no legacy RKE1 replatform-in-place surgery, no inherited cert-manager version conflicts. You get a clean, current, fully-supported starting point, and a migration you can automate and repeat.
This post covers both halves: standing up the new stack, and moving real applications — a Next.js/React frontend, a Go or Node.js API, and a MySQL database — from the old cluster to the new one with minimal downtime.
Why "fresh + migrate" beats "upgrade in place" here
- No RKE1 problem. Rancher v2.12+ can't manage RKE1 downstream clusters at all. A fresh RKE2 cluster sidesteps that entirely — RKE2 has been the supported path for years.
- No cert-manager/Helm archaeology. A brand-new install just uses current cert-manager and current Helm requirements, full stop.
- No accumulated CRD/schema drift. Nine upgrade hops means nine sets of migrations applied in sequence to years-old data. A fresh install has none of that history to carry.
- You control the cutover window precisely. The old cluster keeps serving traffic the entire time you build and test the new one. There's no "point of no return" moment like there is mid-upgrade-chain.
- It forces you to formalize deployment as automation — which, if you're already thinking GitOps and CI/CD, is a net win rather than a chore.
The trade-off: you need somewhere to run two clusters briefly, and you need a migration plan for stateful data. That's what the rest of this post is for.
Part 1 — Build the new cluster and install Rancher fresh
1.1 Provision the RKE2 cluster
For production, use three server nodes for HA. An odd number is recommended so etcd keeps quorum when a node fails. For a smaller setup a single server node works, but plan to grow into HA before this becomes your permanent home.
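The odd-number advice comes down to simple arithmetic. etcd accepts writes only while a strict majority of members is up, which is n/2 + 1 using integer division. The minimal sketch below just computes that for small cluster sizes. The function names are illustrative only.

```shell
#!/usr/bin/env bash
# etcd only accepts writes while a strict majority (quorum) of members is up.
etcd_quorum() {
  local members="$1"
  echo $(( members / 2 + 1 ))
}

# Number of server nodes that can fail while etcd still has quorum.
etcd_fault_tolerance() {
  local members="$1"
  echo $(( members - (members / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  printf '%d server(s): quorum %d, tolerates %d failure(s)\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
done
```

Three servers survive one failure and five survive two. An even count (two or four) adds a node without adding any failure tolerance.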
```shell
# On the first server node
mkdir -p /etc/rancher/rke2
cat <<EOF > /etc/rancher/rke2/config.yaml
tls-san:
  - rancher.yourdomain.com
EOF
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service

# Watch it come up
journalctl -u rke2-server -f
```
Once it's up, wire up kubectl:
```shell
mkdir -p ~/.kube
sudo cp /etc/rancher/rke2/rke2.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
export KUBECONFIG=~/.kube/config
export PATH=$PATH:/var/lib/rancher/rke2/bin
kubectl get nodes
```
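To join additional server nodes, give each one a /etc/rancher/rke2/config.yaml with two settings: a pointer to the first server (server: https://<first-server>:9345) and the token from /var/lib/rancher/rke2/server/node-token. Then install and start rke2-server as above. Agent nodes join the same way, except they are installed with INSTALL_RKE2_TYPE="agent" and started as rke2-agent.service.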
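As a minimal sketch, the join config for extra nodes can be written by a small helper like the one below. It assumes new nodes can reach the first server on the RKE2 supervisor port (9345). The function `write_rke2_join_config`, the 10.0.0.10 address, and the token value are placeholders for illustration, not part of RKE2 itself.

```shell
#!/usr/bin/env bash
# write_rke2_join_config ROLE FIRST_SERVER TOKEN [OUT]
#   ROLE          "server" or "agent"
#   FIRST_SERVER  hostname/IP of the first server node
#   TOKEN         contents of /var/lib/rancher/rke2/server/node-token on the first server
#   OUT           defaults to /etc/rancher/rke2/config.yaml
write_rke2_join_config() {
  local role="$1" first_server="$2" token="$3"
  local out="${4:-/etc/rancher/rke2/config.yaml}"
  mkdir -p "$(dirname "$out")"
  {
    # New nodes register against the RKE2 supervisor port, 9345
    echo "server: https://${first_server}:9345"
    echo "token: ${token}"
    # Servers carry the same tls-san as the first node; agents don't serve the API
    if [ "$role" = "server" ]; then
      echo "tls-san:"
      echo "  - rancher.yourdomain.com"
    fi
  } > "$out"
  chmod 600 "$out"  # the token is a credential
}

# Usage (as root, before installing RKE2 on the node; 10.0.0.10 is a placeholder):
#   write_rke2_join_config server 10.0.0.10 "$TOKEN"
#   curl -sfL https://get.rke2.io | sh - && systemctl enable --now rke2-server.service
#
#   write_rke2_join_config agent 10.0.0.10 "$TOKEN"
#   curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh - \
#     && systemctl enable --now rke2-agent.service
```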
Ingress note: ingress-nginx went end-of-life in March 2026, and starting with RKE2 v1.36, Traefik is the default ingress for new clusters instead of nginx. If you're pulling in a recent RKE2 version, plan your ingress annotations and TLS setup around Traefik rather than nginx-ingress annotations you may be used to from the old cluster — they don't map 1:1.
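Before porting manifests, it helps to list exactly which nginx-specific behavior you rely on. The rough sketch below uses plain `grep`. The function names are hypothetical. It assumes the old cluster's manifests are available as YAML files in a directory, either exported or taken from your repo.

```shell
#!/usr/bin/env bash
# find_nginx_annotations DIR
# Prints file:line:annotation for every nginx-ingress-specific annotation in the
# YAML manifests under DIR. Each one is behavior you need to re-create for Traefik.
find_nginx_annotations() {
  local dir="$1"
  grep -rnoE --include='*.yaml' --include='*.yml' \
    'nginx\.ingress\.kubernetes\.io/[A-Za-z0-9-]+' "$dir" || true
}

# find_nginx_ingress_class DIR
# Flags Ingresses that pin the old controller via ingressClassName or the legacy annotation.
find_nginx_ingress_class() {
  local dir="$1"
  grep -rnE --include='*.yaml' --include='*.yml' \
    '(ingressClassName:[[:space:]]*nginx|kubernetes\.io/ingress\.class:[[:space:]]*"?nginx)' \
    "$dir" || true
}

# Usage, e.g. against manifests exported from the old cluster into ./old-manifests:
#   find_nginx_annotations ./old-manifests
#   find_nginx_ingress_class ./old-manifests
```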
1.2 Install cert-manager
```shell
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true
```
1.3 Install Rancher — direct to latest, no chain required
This is the entire benefit of a fresh install: one command, no version chain.
```shell
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --create-namespace \
  --set hostname=rancher.yourdomain.com \
  --set bootstrapPassword=<set-a-strong-temporary-password> \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=you@yourdomain.com \
  --set letsEncrypt.ingress.class=traefik
```
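Leave off --version to get whatever is currently the latest stable release. This is the one place in the whole process where you actually want "latest," since there's no legacy state to preserve. The one constraint is Kubernetes compatibility. The Rancher chart declares a supported Kubernetes version range, and Helm will refuse to install it on a cluster newer than that range. Check the Rancher support matrix first and, if needed, pin RKE2 to a supported version with INSTALL_RKE2_CHANNEL or INSTALL_RKE2_VERSION when you provision the nodes.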
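Rancher's chart only installs on Kubernetes versions inside the range its release supports. A quick pre-flight check before the helm install above is therefore cheap insurance. This is a sketch under assumptions: `k8s_minor` and `check_rancher_supports` are hypothetical helpers, and they check only the upper bound. The maximum supported minor version must be read from the Rancher support matrix for the release you're installing. The value in the usage comment is only a placeholder.

```shell
#!/usr/bin/env bash
# k8s_minor VERSION: prints the minor version, e.g. v1.33.4+rke2r1 -> 33
k8s_minor() {
  local v="${1#v}"   # 1.33.4+rke2r1
  v="${v#*.}"        # 33.4+rke2r1
  echo "${v%%.*}"    # 33
}

# check_rancher_supports SERVER_VERSION MAX_SUPPORTED_MINOR
# MAX_SUPPORTED_MINOR must come from the Rancher support matrix for the release
# you're installing; this function doesn't know it.
check_rancher_supports() {
  local minor
  minor="$(k8s_minor "$1")"
  if [ "$minor" -gt "$2" ]; then
    echo "Kubernetes 1.${minor} is newer than this Rancher release supports (max 1.$2)" >&2
    return 1
  fi
  echo "Kubernetes 1.${minor} is within the supported range (max 1.$2)"
}

# Usage against the new cluster (33 is a placeholder, not a real support limit):
#   check_rancher_supports \
#     "$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')" 33
```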
Watch the rollout:
```shell
kubectl -n cattle-system rollout status deploy/rancher
```
Log in via the hostname you set, using the bootstrap password, then immediately set a permanent admin password when prompted.
1.4 Register the new cluster / verify local management
If you installed Rancher on the RKE2 cluster itself (the common "single cluster, local management" pattern), you're done — this cluster is both your Rancher server and available for workloads. If you're keeping Rancher's management plane separate from workload clusters, register this new RKE2 cluster as a downstream cluster instead.
At this point you have a clean Rancher v2.14.x (or whatever is current) instance on a clean RKE2 cluster, and nothing running on it yet. Time to move applications over.
Part 2 — Migration strategy: don't treat every workload the same way
The mistake teams make here is reaching for one tool (usually Velero) and pointing it at everything. Split your migration by workload type instead — it's both safer and a better fit for an automation-first setup.
| Workload type | Best migration method | Why |
|---|---|---|
| Stateless apps (Next.js frontend, Go/Node.js API) | Redeploy via CI/CD / GitOps, not backup-restore | You already have the source of truth (your Git repo + container images) — redeploying is cleaner than copying stale cluster state |
| Stateful data (MySQL) | Native DB export/import or replication, not raw PV copy | Guarantees data consistency; avoids storage-class/CSI mismatches between clusters |
| Managed backend (Supabase) | Nothing to migrate at the infra layer | It's already outside the cluster — just repoint app config |
| Secrets / TLS / configs | Re-issue and re-encrypt for the new cluster, don't copy raw | Old secrets may reference the old cluster's storage or drift from the source of truth |
| Anything you don't have IaC for | Velero backup/restore as a safety net | Better than losing config for that one workload nobody remembers building manually |
2.1 Stateless workloads: redeploy, don't copy
If your Next.js frontend and Go/Node.js backend are already built as container images via CI, the cleanest migration is to redeploy them fresh on the new cluster from your existing pipeline, pointed at a new kubeconfig/context. This sidesteps an entire category of migration risk (stale manifests, cluster-specific annotations, old ingress classes) and is exactly the kind of thing worth automating once and reusing forever.
A minimal GitHub Actions example, deploying to the new cluster via kubectl using a stored kubeconfig secret:
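```yaml
name: deploy-frontend
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes registry credentials are stored as REGISTRY_USERNAME / REGISTRY_PASSWORD secrets
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: registry.yourdomain.com
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - name: Build and push image
        run: |
          docker build -t registry.yourdomain.com/frontend:${{ github.sha }} .
          docker push registry.yourdomain.com/frontend:${{ github.sha }}
      - name: Deploy to new cluster
        env:
          # Base64-encoded kubeconfig for the NEW cluster, passed via env rather than inlined
          KUBECONFIG_B64: ${{ secrets.NEW_CLUSTER_KUBECONFIG }}
        run: |
          echo "$KUBECONFIG_B64" | base64 -d > kubeconfig
          export KUBECONFIG="$PWD/kubeconfig"
          # A fresh cluster has no namespace or Deployment yet, so create/apply them first
          # (assumes the frontend's manifests live in k8s/frontend/ in this repo)
          kubectl create namespace production --dry-run=client -o yaml | kubectl apply -f -
          kubectl apply -n production -f k8s/frontend/
          kubectl set image deployment/frontend \
            frontend=registry.yourdomain.com/frontend:${{ github.sha }} \
            -n production
          kubectl rollout status deployment/frontend -n production
```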
If you're already leaning toward GitOps rather than push-based CI, this is a good moment to stand up Fleet (bundled with Rancher) or ArgoCD pointed at the new cluster, so future deploys — and any future cluster migration — become "point the GitOps controller at the new cluster and let it reconcile," rather than a manual project every time.
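For the Fleet route, a GitRepo object is what points Fleet at a repository. The sketch below renders one as YAML from a shell function, ready to pipe into `kubectl apply -f -`. `render_fleet_gitrepo` is hypothetical, and the repo URL and paths are placeholders for your own layout. It targets the `fleet-local` namespace, which Fleet uses for the cluster Rancher itself runs on. Downstream clusters are targeted from `fleet-default` instead.

```shell
#!/usr/bin/env bash
# render_fleet_gitrepo NAME REPO_URL BRANCH PATH...
# Emits a Fleet GitRepo that deploys each PATH in the repo to the local cluster.
render_fleet_gitrepo() {
  local name="$1" repo="$2" branch="$3"
  shift 3
  cat <<EOF
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: ${name}
  # fleet-local = the cluster Rancher runs on; use fleet-default (plus
  # spec.targets) to deploy to downstream clusters instead
  namespace: fleet-local
spec:
  repo: ${repo}
  branch: ${branch}
  paths:
EOF
  local p
  for p in "$@"; do
    echo "    - ${p}"
  done
}

# Usage:
#   render_fleet_gitrepo apps https://github.com/yourorg/apps main frontend api \
#     | kubectl apply -f -
```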
Repeat this for the Go/Node.js backend. Both deployments should reference the new cluster's ingress (Traefik, per Part 1) rather than copying old nginx-ingress annotations verbatim.
2.2 Stateful data: MySQL
Don't try to move a MySQL PVC by copying the underlying volume across clusters — storage classes and CSI drivers frequently don't match between an old cluster and a new one, and you'll fight silent data corruption or mount failures instead of a clean migration. Move the data at the database layer instead.
For small-to-medium databases, a straightforward dump/restore during a short maintenance window is the least risky option:
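```shell
# old-cluster / new-cluster are placeholder kubeconfig context names.
# On the OLD cluster: dump the database. sh -c makes $MYSQL_ROOT_PASSWORD expand
# inside the pod (where the mysql image sets it) rather than in your local shell.
kubectl --context old-cluster exec -n production deploy/mysql -- \
  sh -c 'mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --single-transaction --routines --triggers app_db' \
  > app_db_dump.sql
# Copy the dump somewhere both clusters can reach (S3, local transfer, etc.)
# On the NEW cluster: create the schema (a single-database dump doesn't include
# CREATE DATABASE), then restore into the freshly deployed MySQL
kubectl --context new-cluster exec -n production deploy/mysql -- \
  sh -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD" -e "CREATE DATABASE IF NOT EXISTS app_db"'
kubectl --context new-cluster exec -i -n production deploy/mysql -- \
  sh -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD" app_db' < app_db_dump.sql
```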
--single-transaction gives you a consistent snapshot without locking the whole database, assuming InnoDB tables.
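The dump is just a stream redirected into a local file, so a sanity check before restoring is worthwhile. This minimal sketch assumes default mysqldump options. With those, the file ends with a `-- Dump completed on ...` comment, which `--skip-comments` or `--compact` would suppress. `dump_looks_complete` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# dump_looks_complete FILE
# With default options, mysqldump ends its output with a "-- Dump completed on ..."
# comment. If that trailer is missing, the dump was cut short and shouldn't be restored.
dump_looks_complete() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "empty or missing: $file" >&2
    return 1
  fi
  # Look at the last few lines only, so huge dumps aren't scanned end to end
  if tail -n 3 "$file" | grep -q '^-- Dump completed'; then
    return 0
  fi
  echo "no completion trailer in $file; the dump may be truncated" >&2
  return 1
}

# Usage:
#   dump_looks_complete app_db_dump.sql && echo "dump OK, safe to restore"
```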
For larger databases where a full dump/restore would mean too much downtime, set up MySQL replication from old to new instead. Bring up the new MySQL as a replica of the old one, let it catch up, then cut over once replication lag hits zero. This gets you down to seconds of write downtime instead of however long a full dump/restore takes. For very large datasets, Percona XtraBackup is the standard tool for seeding that replica from a hot physical backup instead of a logical mysqldump.
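For the replication route, the cutover condition can be made mechanical. This sketch assumes MySQL 8.0.22 or later, where `SHOW REPLICA STATUS` reports `Replica_IO_Running`, `Replica_SQL_Running` and `Seconds_Behind_Source`. Older versions use the `SLAVE`/`Master` names instead. Stop writes on the old primary first (for example with `SET GLOBAL super_read_only = ON`), so that zero lag really means nothing is left to copy. `replica_caught_up` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# replica_caught_up
# Reads `SHOW REPLICA STATUS\G` output on stdin and succeeds only when both
# replication threads are running and Seconds_Behind_Source is exactly 0.
replica_caught_up() {
  awk -F': ' '
    /^ *Replica_IO_Running:/    { io  = $2 }
    /^ *Replica_SQL_Running:/   { sql = $2 }
    /^ *Seconds_Behind_Source:/ { lag = $2 }
    END { exit !(io == "Yes" && sql == "Yes" && lag == "0") }
  '
}

# Usage: poll the NEW MySQL until it has caught up
# ("new-cluster" is a placeholder kubeconfig context name):
#   until kubectl --context new-cluster exec -n production deploy/mysql -- \
#       sh -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD" -e "SHOW REPLICA STATUS\G"' \
#       | replica_caught_up; do
#     sleep 5
#   done
```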
If you're using Supabase instead of self-hosted MySQL for this app (common for MVP-stage projects), there's nothing to migrate at the cluster level at all, because the database lives outside both clusters. You just need to repoint the new cluster's app config/secrets at the same Supabase project. Also double-check connection pooling settings and any IP allow-lists, since the new cluster's nodes will reach Supabase from different source IPs.
2.3 Secrets, TLS, and config
Don't `kubectl get secret -o yaml | kubectl apply` your way across clusters wholesale. Old secrets can carry stale references (old cert-manager issuer names, old storage class references baked into annotations) that silently break on the new cluster. Instead:
- Re-create application secrets from your actual source of truth (vault, `.env` files in a secrets manager, CI secrets) on the new cluster.
- Let cert-manager on the new cluster issue fresh TLS certificates rather than copying certificate secrets. It's one Ingress annotation away and guarantees valid, correctly scoped certs.
- If you want this automated going forward, this is a good trigger to adopt External Secrets Operator or Sealed Secrets so secrets live in Git/your secrets manager and get synced into whichever cluster is current, rather than being cluster-resident state you have to remember to migrate by hand.
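Here is roughly what "one Ingress annotation away" looks like, sketched as shell functions that emit manifests for `kubectl apply -f -`. The function names, the `letsencrypt-prod` issuer name, and the host and service values are assumptions for illustration. With an HTTP-01 solver, the certificate can only be issued once the hostname resolves to the new cluster. Expect it to stay pending until DNS cutover unless you use a DNS-01 solver instead.

```shell
#!/usr/bin/env bash
# render_letsencrypt_issuer EMAIL
# ClusterIssuer that solves ACME HTTP-01 challenges through the Traefik ingress class.
render_letsencrypt_issuer() {
  local email="$1"
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${email}
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: traefik
EOF
}

# render_tls_ingress NAME HOST SERVICE PORT ISSUER
# Traefik Ingress whose annotation tells cert-manager to issue NAME-tls for HOST.
render_tls_ingress() {
  local name="$1" host="$2" svc="$3" port="$4" issuer="$5"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${name}
  annotations:
    cert-manager.io/cluster-issuer: ${issuer}
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - ${host}
      secretName: ${name}-tls
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${svc}
                port:
                  number: ${port}
EOF
}

# Usage (placeholder values):
#   render_letsencrypt_issuer you@yourdomain.com | kubectl apply -f -
#   render_tls_ingress frontend app.yourdomain.com frontend 3000 letsencrypt-prod \
#     | kubectl apply -n production -f -
```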
2.4 The safety-net workload: Velero
For anything you don't have clean IaC for — that one internal tool someone kubectl apply'd manually in 2022 and never wrote down — use Velero as your catch-all rather than trying to reconstruct it from memory.
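```shell
# Install Velero on BOTH clusters, pointed at the same object storage bucket.
# --use-node-agent enables file-system backup, which copies volume data through the
# bucket. Native volume snapshots are provider-specific and generally can't be
# restored onto the new cluster's storage.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket k8s-migration-backup \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-east-1 \
  --use-node-agent

# On the OLD cluster: back up the namespace(s) in question.
# Leaving --include-cluster-resources unset pulls in only the cluster-scoped
# objects these namespaces reference (such as their PVs).
velero backup create legacy-app-backup \
  --include-namespaces legacy-app \
  --default-volumes-to-fs-backup \
  --wait

# On the NEW cluster: restore
velero restore create legacy-app-restore \
  --from-backup legacy-app-backup \
  --wait
```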
A couple of things worth knowing before you rely on this: Velero doesn't support restoring into a cluster running an older Kubernetes version than the one the backup was taken on — going from your old cluster to a newer RKE2 is the supported direction, not the reverse. And Velero can't back up hostPath volumes at all, so if the old cluster has anything using those instead of proper PVCs, that data needs a manual copy.
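Finding hostPath users ahead of time is easy to script. This sketch assumes `kubectl` is pointed at the old cluster, and `list_hostpath_pods` is a hypothetical helper. It only sees pods that currently exist, so workloads scaled to zero won't show up.

```shell
#!/usr/bin/env bash
# list_hostpath_pods NAMESPACE
# Prints "pod: path ..." for every pod in NAMESPACE that mounts a hostPath volume,
# i.e. data Velero will not carry over.
list_hostpath_pods() {
  local ns="$1"
  # One line per pod: name<TAB>space-separated hostPath paths (blank for other volume types)
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.hostPath.path}{" "}{end}{"\n"}{end}' \
  | awk -F'\t' '$2 ~ /[^ ]/ { print $1 ": " $2 }'
}

# Usage, with kubectl pointed at the OLD cluster:
#   list_hostpath_pods legacy-app
```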
Part 3 — Cutover
Once an application is deployed and verified on the new cluster:
- Smoke test on the new cluster's own hostname/IP before touching DNS — hit the app directly, verify the frontend renders, API responds, and it's reading/writing the migrated database correctly.
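- Lower the DNS TTL for the relevant records ahead of cutover, if you haven't already, so the eventual switch propagates fast. Do it at least one old-TTL period in advance. Resolvers keep caching the record under the old TTL until it expires, so an hour's notice is only enough if the existing TTL is an hour or less.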
- Cut over DNS (or your load balancer / ingress target) from old cluster to new cluster, one application at a time rather than all at once. This turns "big bang migration" into a series of small, individually-reversible steps.
- Watch error rates and logs on the new cluster closely for the first hour after each cutover.
- Keep the old cluster running, untouched, for a rollback window (a few days is reasonable) before you decommission anything. If something on the new cluster misbehaves, flipping DNS back is far faster than trying to reconstruct the old environment from scratch.
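For the smoke-test step, `curl --resolve` lets you hit the new cluster with the production hostname before DNS changes. That way the Host header and TLS SNI match what real users will send. This is a minimal sketch: `smoke_check`, the hostnames, the IP, and the `/healthz` path are placeholders. If the new certificate is still pending (the HTTP-01 case), add `-k` to the curl call for this pre-cutover check only.

```shell
#!/usr/bin/env bash
# smoke_check HOST IP [PATH]
# Requests https://HOST/PATH but forces HOST to resolve to IP (the new cluster's
# ingress) for this one request, so Host header and SNI match production traffic.
smoke_check() {
  local host="$1" ip="$2" path="${3:-/}"
  local code
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 \
    --resolve "${host}:443:${ip}" "https://${host}${path}")" || return 1
  case "$code" in
    2??|3??) echo "OK   ${host}${path} -> ${code}" ;;
    *)       echo "FAIL ${host}${path} -> ${code}" >&2; return 1 ;;
  esac
}

# Usage (placeholder hostnames and a documentation-range IP):
#   smoke_check app.yourdomain.com 203.0.113.10 /
#   smoke_check api.yourdomain.com 203.0.113.10 /healthz
```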
Part 4 — Decommission and lock in the automation
Once every application has been running clean on the new cluster through your rollback window:
- Tear down the old RKE2/RKE1 cluster and its Rancher instance.
- Keep the Velero backups from the migration for a while longer as an extra safety net, even after decommissioning.
- If this migration pushed you into setting up CI/CD-driven deploys or Fleet/ArgoCD GitOps for the first time, treat that as the actual deliverable of this project, not just a means to an end. The next cluster migration — whether it's a Kubernetes version bump, a cloud provider change, or another Rancher generational leap — should be "point the pipeline at a new kubeconfig and let it reconcile," not another multi-week manual project.
That's the real payoff of doing it this way instead of upgrading in place: you don't just get a current Rancher server, you get a deployment process that makes the next one boring.