Backup & Recovery
kubara is GitOps first:
- Use Git and kubara generate / kubara bootstrap to recreate desired platform state
- Use your secret backend as the source of truth for credentials and sensitive values
- Use Velero to recover runtime resources and persistent data that Git cannot recreate
This page covers the kubara-specific path. For Velero installation details, command usage, and troubleshooting, use the official Velero documentation.
Aim to keep as much of your desired state as possible in your GitOps repository and secret backend, and rely on Velero only for dynamic data that is generated at runtime, such as the contents of database PVCs.
Velero & kubara
kubara supports Velero as a built-in component for backup, restore, disaster recovery, and migration.
What kubara covers:
- Enabling Velero in cluster config
- Generated overlay files and GitOps rollout
- Documentation on how Velero fits into a recommended recovery model
What stays with you as a Platform Operator / Team:
- Provider-specific installation and configuration
- Backup and restore command tutorials and runbooks
- Scheduling, CSI, file-system backup internals, and troubleshooting
Before enabling Velero
Decide these three things first:
- Backup storage
  The most common way to use Velero is with an S3-compatible object storage target plus credentials from your secret backend. backupStorage.create: true lets kubara create the bucket and credentials where supported.
- Volume backup mode
  Choose the mode based on your provider's snapshot durability and your recovery goal:
  - fs-backup: Uses the Velero node-agent to back up volume contents to object storage. This is the default.
  - csi-snapshot: Uses CSI snapshots. Snapshot durability depends on the provider and CSI driver.
  - csi-data-mover: Creates CSI snapshots and moves their data to object storage through the Velero node-agent.

  More about File System Backups can be found in the official Velero docs.
- Recovery goal
  Be clear whether you are optimizing for namespace restore, cluster rebuild, disaster recovery, or migration.
File-system backup and CSI snapshots are mutually exclusive
File-system backup via Kopia and CSI volume snapshots are mutually exclusive — they cannot both be active for the same PVC at once. When backupMode: fs-backup is enabled, Velero uses file-system backup for all volumes and takes no CSI snapshots. Choose one approach before going to production; switching later may leave gaps in your backup history.
References: File-system backup · CSI snapshots
CSI snapshots are not full backups — know your provider
A CSI snapshot is generally not a full, standalone backup and is not necessarily independent of its source volume. Snapshot durability depends on your cloud provider and CSI driver.
Before relying on plain CSI snapshots, check your provider's block storage documentation. If snapshots are removed together with their source volume — or if you need true off-volume backups — use backupMode: fs-backup or backupMode: csi-data-mover, so the data is stored in object storage independently of volume state.
For example, the Open Telekom Cloud EVS documentation states: "When you delete an EVS disk, all the disk data including the snapshots created for this disk will be deleted."
References: Velero CSI support · CSI Snapshot Data Movement
3-2-1 Backups
On a more general note, your team should try to follow the 3-2-1 backup strategy:
- 3 copies of your data
- 2 different storage media, so not on the same disk as described above
- 1 offsite location, e.g. a different region or even a different provider
This makes your setup more resilient against possible disasters.
Enable Velero
Example config.yaml:
clusters:
  - name: my-cluster
    stage: prod
    services:
      velero:
        status: enabled
        config:
          backupStorage:
            s3Url: https://object.storage.eu01.onstackit.cloud
For STACKIT, backupStorage.create: true makes kubara generate the dedicated Object Storage bucket (bucket-velero-<name>-<stage>) and credentials group. The bucket region defaults to eu01 and can be changed through backupStorage.region. Set backupStorage.s3Url to the matching S3 endpoint for that region.
The generated Terraform writes the S3-compatible credentials into STACKIT Secrets Manager at path <cluster-name>/<stage>/velero/velero_s3_credentials, key cloud, in the form:
[default]
aws_access_key_id = <ACCESS_KEY_ID>
aws_secret_access_key = <SECRET_ACCESS_KEY>
So on STACKIT you only need status: enabled and backupStorage.s3Url, unless you intentionally change the backup mode or bucket region, or use an existing bucket.
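When looking for the generated bucket or secret, it can help to derive the expected names from the cluster name and stage. The following is a minimal sketch of the naming patterns described above, not part of kubara itself: both function names are hypothetical, and it assumes that <name> in the bucket pattern is the cluster name from config.yaml.

```shell
#!/bin/sh
# Sketch only: derive the names described above from cluster name and stage.
# Function names are hypothetical; <name> is ASSUMED to be the cluster name.

# Bucket pattern for STACKIT: bucket-velero-<name>-<stage>
velero_bucket_name() {
  printf 'bucket-velero-%s-%s\n' "$1" "$2"
}

# Secrets Manager path: <cluster-name>/<stage>/velero/velero_s3_credentials
velero_secret_path() {
  printf '%s/%s/velero/velero_s3_credentials\n' "$1" "$2"
}

# Example values taken from the config.yaml above.
velero_bucket_name my-cluster prod
velero_secret_path my-cluster prod
```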
For T Cloud Public CCE, configure the matching OBS endpoint and region:
config:
  backupMode: csi-data-mover
  backupStorage:
    create: true
    region: eu-de
    s3Url: https://obs.eu-de.otc.t-systems.com
With backupStorage.create: true, the generated infrastructure layer creates the OBS bucket and access credentials. Pass the velero_credential_access_key and velero_credential_secret_access_key Terraform outputs to the OpenBao layer as TF_VAR_velero_access_key_id and TF_VAR_velero_secret_access_key. Activate the velero_credentials block from customer-service-catalog/terraform/<cluster-name>/openbao/secrets.tf-example so OpenBao writes the credential file to secret/<cluster-name>/<stage>/velero/velero_s3_credentials.
To use an existing S3-compatible bucket instead, set backupStorage.create: false and provide the bucket connection details in the Velero config:
config:
  backupStorage:
    create: false
    bucketName: my-velero-backups
    region: eu01
    s3Url: https://s3.example.com
With backupStorage.create: false, kubara does not generate the Terraform bucket or credentials. Provide the S3-compatible credentials yourself in your provider's expected secret path, using key cloud and the same format shown above.
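If you provide the credentials yourself, the value stored under key cloud has to follow the credentials format shown earlier. A minimal sketch of rendering that payload is given below; render_velero_credentials is a hypothetical helper, the key values are placeholders, and how you write the result into your secret backend depends on your provider.

```shell
#!/bin/sh
# Sketch only: render the credentials payload stored under key "cloud".
# render_velero_credentials is a hypothetical helper; key values are placeholders.

render_velero_credentials() {
  access_key_id="$1"
  secret_access_key="$2"
  # Refuse to render an incomplete credentials file.
  if [ -z "$access_key_id" ] || [ -z "$secret_access_key" ]; then
    echo "both access key ID and secret access key are required" >&2
    return 1
  fi
  printf '[default]\n'
  printf 'aws_access_key_id = %s\n' "$access_key_id"
  printf 'aws_secret_access_key = %s\n' "$secret_access_key"
}

# Prints the three-line credentials file to stdout.
render_velero_credentials "EXAMPLE_ACCESS_KEY_ID" "EXAMPLE_SECRET_ACCESS_KEY"
```

In practice you would pipe the output into your secret backend's CLI or API rather than print it to a terminal.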
Then:
- Run kubara generate
- Review:
  - customer-service-catalog/helm/<cluster-name>/velero/values.yaml
  - customer-service-catalog/helm/<cluster-name>/velero/additional-values.yaml
- Commit and push so Argo CD can deploy Velero
- Test a full backup and restore cycle immediately after setup
  Do not consider Velero operational until you have verified that backups actually work end-to-end. A misconfigured node-agent, CSI driver integration, or S3 endpoint can silently produce incomplete or empty backups with no visible error during backup creation. Restore to a test namespace and confirm that data is intact. Repeat this test after major changes (Velero upgrades, CSI driver updates, storage migrations).
  References: Backup reference · Restore reference · Disaster recovery
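The backup and restore test from the last step could be scripted roughly as follows. This is a sketch under assumptions, not a kubara feature: the function names, the namespaces demo and demo-restore-test, and the backup name are placeholders. It runs in dry-run mode by default and only prints the commands; set DRY_RUN=0 to execute them against a cluster where the velero and kubectl CLIs are configured.

```shell
#!/bin/sh
# Sketch only: backup + restore smoke test into a separate namespace.
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

restore_smoke_test() {
  src_ns="$1"; test_ns="$2"; backup="$3"
  # 1. Back up only the source namespace and wait until Velero has finished.
  run velero backup create "$backup" --include-namespaces "$src_ns" --wait || return 1
  # 2. Show details so the phase and volume results can be checked by hand.
  run velero backup describe "$backup" --details || return 1
  # 3. Restore into a different namespace so the original stays untouched.
  run velero restore create "${backup}-smoke" --from-backup "$backup" \
    --namespace-mappings "${src_ns}:${test_ns}" --wait || return 1
  # 4. List restored pods and PVCs; verifying the data itself is still manual.
  run kubectl get pods,pvc --namespace "$test_ns"
}

restore_smoke_test demo demo-restore-test demo-backup
```

Restoring into a mapped namespace keeps the test from overwriting the running workload. Whether the restored data is actually intact still has to be checked at application level.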
With a healthy Velero setup, create, inspect, and restore a backup with:
velero backup create my-backup --wait
velero backup describe my-backup
velero restore create --from-backup my-backup --wait
These are the simplest backup and restore commands Velero offers. For production, we advise you to set up automated backups.
For more information, see the official Velero documentation.
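One way to automate backups is a Velero schedule created through the CLI. The sketch below assumes a nightly backup of a single namespace: the function names, the schedule name, the namespace demo, and the retention of 168h0m0s are illustrative placeholders. It prints the command by default and only executes it with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: create a recurring backup with the Velero CLI.
# Dry-run by default: the command is printed unquoted, for review only.
# Set DRY_RUN=0 to execute it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_backup_schedule() {
  name="$1"; cron="$2"; namespace="$3"; ttl="$4"
  # This sketch only accepts plain five-field cron expressions.
  fields=$(printf '%s\n' "$cron" | awk '{ print NF }')
  if [ "$fields" -ne 5 ]; then
    echo "expected a five-field cron expression, got: $cron" >&2
    return 1
  fi
  run velero schedule create "$name" --schedule="$cron" \
    --include-namespaces "$namespace" --ttl "$ttl"
}

# Every night at 02:00, keep each backup for seven days.
create_backup_schedule nightly-demo "0 2 * * *" demo 168h0m0s
```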
You can also create a cron job through the Helm values by adding the setting to the additional-values.yaml file. See here for an example of how this might look.
Use additional-values.yaml for environment-specific overrides you want to keep next to the generated baseline.
Custom VolumeSnapshotClass via additional-values.yaml
When you use backupMode: csi-snapshot or backupMode: csi-data-mover, Velero uses CSI snapshots instead of file-system backups.
kubara writes volumeSnapshotClass.k8sProvider into the generated Velero values based on terraform.provider.
If your environment is not covered by one of the built-in provider mappings, or if you need provider-specific fields that differ from the default, define your own VolumeSnapshotClass in customer-service-catalog/helm/<cluster-name>/velero/additional-values.yaml.
When terraform.provider is none, kubara does not select a built-in VolumeSnapshotClass provider.
volumeSnapshotClass.customDefinition takes precedence over the provider mapping, so this is the recommended way to supply a fully custom snapshot class.
Example:
volumeSnapshotClass:
  customDefinition:
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: velero-csi
      labels:
        velero.io/csi-volumesnapshot-class: "true"
    driver: ebs.csi.aws.com
    deletionPolicy: Retain
    parameters:
      tagSpecification_1: "Name=velero-snapshot"
Important notes:
- Keep name: velero-csi and the label velero.io/csi-volumesnapshot-class: "true" unless you intentionally want Velero to use a different class selection setup.
- Put this override into additional-values.yaml, not the generated values.yaml, because values.yaml is regenerated by kubara.
- If the built-in provider mapping already matches your environment, you usually do not need a custom definition.
Recovery model
Velero should complement GitOps, not replace it.
| Source | Typical content |
|---|---|
| Git + kubara + Argo CD | Cluster definitions, generated platform config, Helm values, ApplicationSets, managed manifests |
| Secret backend | External credentials, OAuth secrets, provider tokens, synced secret resources |
| Velero | Runtime Kubernetes resources and persistent volume data |
Recommended recovery flow
- Make sure you have access to Git, the secret backend, the Velero object storage, and a working target cluster
- Bootstrap the cluster again and let Argo CD restore the declared platform state
- Use Velero to restore runtime resources and persistent data
- Verify that Argo CD, External Secrets, ingress, certificates, DNS, and stateful workloads are healthy
For most teams, the best first test is restoring one non-critical namespace or workload.
Misc.
Other Storage Providers
If your storage provider does not support the S3 API, you can switch providers by replacing the plugin in managed-service-catalog/helm/velero/values.yaml. A list of available plugins can be found here.
Crash Consistency
File-based backup is not crash-consistent. If you would otherwise rely on it to back up a deployed database, use the database's own backup tooling instead, e.g. pg_dump or mysqldump.