Validating Your Kubernetes Environment

Before you add your cluster to Commvault and begin protecting it, use kubectl commands to validate that your environment is ready for Commvault backups.

Get the API Server URL

To add your cluster to Commvault, you need to know the kube-apiserver or control plane URL.

Command to run:
```
kubectl cluster-info
```

In the following example output, the URL is https://k8s-123-4.home.arpa:6443:

Kubernetes control plane is running at https://k8s-123-4.home.arpa:6443
CoreDNS is running at https://k8s-123-4.home.arpa:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
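
If you want to capture the URL in a script, the following is a minimal sketch and not a Commvault tool. The function name control_plane_url is hypothetical. The sketch assumes that the output contains a line with the phrase "control plane is running at", as in the example output, and it strips ANSI color codes first because kubectl cluster-info can colorize its output.

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# Reads `kubectl cluster-info` output on stdin and prints the control plane URL.
control_plane_url() {
  # Build a literal ESC character so that the sed expression stays POSIX-compatible.
  esc=$(printf '\033')
  # 1. Remove ANSI color sequences such as ESC[0;32m.
  # 2. Print the last field of the first "control plane" line, then stop.
  sed "s/${esc}\[[0-9;]*m//g" |
    awk '/control plane is running at/ { print $NF; exit }'
}

# Usage (requires access to a cluster):
#   kubectl cluster-info | control_plane_url
```

For the example output in this section, the function prints https://k8s-123-4.home.arpa:6443.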

Verify the Nodes Are Ready

Verify that all cluster nodes (control plane nodes and worker nodes) have a status of Ready, and that none of them reports a DiskPressure, MemoryPressure, PIDPressure, or NetworkUnavailable condition. For information about node condition statuses, see Conditions.

Also verify that the version of the nodes is supported by Commvault.

For on-premises clusters or infrastructure-based clusters, verify that multiple control plane nodes or master nodes are listed.

Command to run:
```
kubectl get nodes
```

Example output:

From an Azure Kubernetes Service (AKS) cluster:

NAME                                STATUS   ROLES   AGE     VERSION
aks-agentpool-26889666-vmss000000   Ready    agent   3h41m   v1.23.5
aks-agentpool-26889666-vmss000001   Ready    agent   3h41m   v1.23.5
aks-agentpool-26889666-vmss000002   Ready    agent   3h41m   v1.23.5

From a Google Kubernetes Engine (GKE) cluster:

NAME                                       STATUS   ROLES    AGE   VERSION
gke-cluster-1-default-pool-a709ed39-hcc6   Ready    <none>   9s    v1.23.6-gke.1700
gke-cluster-1-default-pool-a709ed39-q497   Ready    <none>   9s    v1.23.6-gke.1700
gke-cluster-1-default-pool-a709ed39-zkg8   Ready    <none>   9s    v1.23.6-gke.1700

From an on-premises Vanilla Kubernetes single-node dev/test cluster:

NAME        STATUS   ROLES                  AGE   VERSION
k8s-123-4   Ready    control-plane,master   77d   v1.23.4
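
On clusters with many nodes, it can help to list only the nodes that need attention. The following is a minimal sketch under stated assumptions: the function name not_ready_nodes is hypothetical, and the input is assumed to be the default column layout of kubectl get nodes --no-headers (name in the first column, status in the second).

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# Reads `kubectl get nodes --no-headers` output on stdin and prints the name
# and status of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk 'NF >= 2 && $2 != "Ready" { print $1 " " $2 }'
}

# Usage (requires access to a cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

A cordoned node reports a status such as Ready,SchedulingDisabled, so this sketch lists it as well. Empty output means that every node reports Ready.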

Verify the CSI Drivers Are Functioning, Registered, and Support Persistent Mode

For Container Storage Interface (CSI) drivers to perform provisioning, attach/detach, mount, and snapshot activities, the drivers must be installed, functioning, and registered, and they must support the Persistent mode.

Command to run:
```
kubectl get csidrivers
```
Example output:
- From a cluster that is running the Rook-Ceph CSI driver:
- From an Azure Kubernetes Service (AKS) with default CSI drivers installed and configured:
- From a Google Kubernetes Engine (GKE) cluster with default CSI drivers installed and configured:
- From a Vanilla Kubernetes cluster with Ceph and Hostpath CSI drivers installed and configured:

Verify the CSI Pods

Each of your CSI drivers has one or more pods that run to respond to provisioning, attach, detach, and mount requests. Verify that the pods for your CSI driver have a status of Running. Also verify that each CSI driver has 2 pods listed, with 1 pod labeled as the provisioner. Commvault uses the provisioner to create new CSI volumes during restores.

Verify Resource Limits

Check whether your Kubernetes administrator set ResourceQuotas on your cluster.

Command to run:

kubectl get resourcequotas -n namespace

If quotas are set, tune the Commvault worker Pod CPU and memory so that the quotas don't prevent Commvault from creating the necessary temporary worker Pods and PersistentVolumeClaims.

Verify the Nodes Have No Active Taints

Verify that there are no taints on your cluster that might prevent backups or restores.

Command to run:
```
kubectl describe node node_name | grep -i taints
```
Where node_name is the name of the node that you want to verify.
Example output from a Vanilla Kubernetes server that has no active taints:
```
Taints:   <none>
```
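
To check every node in one pass instead of describing each node separately, you can filter a custom-columns listing. This is a minimal sketch: the function name tainted_nodes is hypothetical, and it assumes that kubectl prints <none> in the TAINTS column for nodes that have no taints.

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# Reads a two-column listing (node name, taint keys) on stdin and prints only
# the nodes whose taint column is not "<none>".
tainted_nodes() {
  awk 'NF >= 2 && $2 != "<none>" { print $1 " " $2 }'
}

# Usage (requires access to a cluster). The quotes keep the shell from
# expanding the [*] in the column expression:
#   kubectl get nodes --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key' |
#     tainted_nodes
```

Whether a listed taint blocks backups or restores depends on the taint and on the tolerations of the Commvault worker Pods, so treat the output as a list of nodes to review.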

Verify Your vSphereVolume StorageClass Is Available

If you want to use Commvault for snapshot protection of VMware vSphere VMDK volumes and the vSphere CSI driver is unavailable, the legacy (deprecated) vsphereVolume volume plug-in must be available.

Caution

vsphereVolume is an in-tree volume plug-in and is deprecated from active Kubernetes releases. Upgrade your VMware cluster to support the Container Storage Interface (CSI) out-of-tree driver.

List Your StorageClasses

Identify the StorageClasses that have a Provisioner column value of kubernetes.io/vsphere-volume.

Command to run:
```
kubectl get sc
```

Example output:

NAME                 PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
thin                 kubernetes.io/vsphere-volume   Delete          Immediate              false                  10d
thin-csi (default)   csi.vsphere.vmware.com         Delete          WaitForFirstConsumer   true                   7d8h

Verify the ReclaimPolicy Is Delete

For StorageClasses that have a provisioner of kubernetes.io/vsphere-volume, verify that the ReclaimPolicy is correctly set to Delete.

Command to run:
```
kubectl describe sc storageclass_name
```
Example command:
```
kubectl describe sc thin
```

Example output:

 
Name:                  thin
IsDefaultClass:        No
Annotations:           <none>
Provisioner:           kubernetes.io/vsphere-volume
Parameters:            diskformat=thin
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
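
If the cluster has several vsphere-volume StorageClasses, the ReclaimPolicy check can be applied to all of them at once. The following is a minimal sketch: the function name vsphere_sc_bad_reclaim is hypothetical, and it assumes the default column layout of kubectl get sc --no-headers, including the (default) marker that kubectl prints after the name of the default StorageClass.

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# Reads `kubectl get sc --no-headers` output on stdin and prints the name and
# reclaim policy of each kubernetes.io/vsphere-volume StorageClass whose
# reclaim policy is not Delete.
vsphere_sc_bad_reclaim() {
  awk '
    NF >= 3 {
      # The default StorageClass is shown as "name (default)", which shifts
      # the remaining columns one position to the right.
      off = ($2 == "(default)") ? 1 : 0
      prov = $(2 + off)
      policy = $(3 + off)
      if (prov == "kubernetes.io/vsphere-volume" && policy != "Delete")
        print $1 " " policy
    }'
}

# Usage (requires access to a cluster):
#   kubectl get sc --no-headers | vsphere_sc_bad_reclaim
```

Empty output means that every vsphere-volume StorageClass already uses the Delete policy.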

Verify the StorageClasses Are CSI-Enabled

The PersistentVolumeClaims (PVCs) that you want to protect must be presented by a registered, CSI-enabled StorageClass. Verify that the StorageClasses backing those PVCs use the Container Storage Interface (CSI).

Command to run:
```
kubectl get storageclasses
```
Example output from a Vanilla Kubernetes cluster running Hostpath and Ceph Raw-Block Device (RBD) CSI drivers. This cluster also runs a non-CSI-based volume plug-in for provisioning object-based Rook-Ceph storage. Note that if the provisioner name does not contain CSI, the volume plug-in or driver is not supported by Commvault for backups and restores.
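
One way to cross-check this is to compare each StorageClass provisioner with the driver names that kubectl get csidrivers reports. The following is a minimal sketch under stated assumptions: the function name non_csi_storageclasses and the file names in the usage comment are hypothetical, and the sketch assumes that a CSI-backed StorageClass uses a provisioner whose name matches a registered CSIDriver object. A driver that does not register a CSIDriver object would be flagged too, so treat the result as a list to review manually.

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# $1: file containing `kubectl get csidrivers --no-headers` output
# $2: file containing `kubectl get storageclasses --no-headers` output
# Prints the name and provisioner of each StorageClass whose provisioner is
# not among the registered CSI driver names.
non_csi_storageclasses() {
  awk '
    # First file: remember every registered CSI driver name (column 1).
    FILENAME == ARGV[1] { if (NF) drivers[$1] = 1; next }
    # Second file: the provisioner is column 2, or column 3 when the name
    # is followed by the "(default)" marker.
    NF >= 2 {
      prov = ($2 == "(default)") ? $3 : $2
      if (!(prov in drivers)) print $1 " " prov
    }' "$1" "$2"
}

# Usage (requires access to a cluster):
#   kubectl get csidrivers --no-headers > drivers.txt
#   kubectl get storageclasses --no-headers > classes.txt
#   non_csi_storageclasses drivers.txt classes.txt
```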

Verify You Have a CSI Node That Can Handle Requests

After installing a CSI driver, you can verify that the installation was successful by listing the nodes that have CSI drivers installed on them.

Command to run:
```
kubectl get csinodes
```

Example output:

From an Azure Kubernetes Service (AKS) cluster that has a default CSI driver installation:

NAME                                DRIVERS   AGE
aks-agentpool-26889666-vmss000000   2         5h
aks-agentpool-26889666-vmss000001   2         5h
aks-agentpool-26889666-vmss000002   2         5h

From a Google Kubernetes Engine (GKE) cluster with default CSI driver installation:

NAME                                       DRIVERS   AGE
gke-cluster-1-default-pool-a709ed39-hcc6   1         73m
gke-cluster-1-default-pool-a709ed39-q497   1         73m
gke-cluster-1-default-pool-a709ed39-zkg8   1         73m
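
A node that reports 0 in the DRIVERS column has no CSI driver registered and cannot handle CSI requests. The following is a minimal sketch for spotting such nodes: the function name csinodes_without_drivers is hypothetical, and it assumes the three-column layout shown in the example output.

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# Reads `kubectl get csinodes --no-headers` output on stdin and prints the
# name of every node whose DRIVERS count is 0.
csinodes_without_drivers() {
  awk 'NF >= 2 && $2 == 0 { print $1 }'
}

# Usage (requires access to a cluster):
#   kubectl get csinodes --no-headers | csinodes_without_drivers
```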

If Necessary, Get Detailed Information About the CSI Drivers Installed on Each Node

You can get detailed information about the CSI drivers installed on each node.

Command to run:
```
kubectl describe csinodes csinode_name
```

Example commands:

For an Azure Kubernetes Service (AKS) cluster:

kubectl describe csinodes aks-agentpool-26889666-vmss000000

Example output:

Name:               aks-agentpool-26889666-vmss000000
Labels:             <none>
Annotations:        storage.alpha.kubernetes.io/migrated-plugins:
                      kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd
CreationTimestamp:  Mon, 30 May 2022 16:45:27 +0000
Spec:
  Drivers:
    disk.csi.azure.com:
      Node ID:  aks-agentpool-26889666-vmss000000
      Allocatables:
        Count:        4
      Topology Keys:  [topology.disk.csi.azure.com/zone]
    file.csi.azure.com:
      Node ID:  aks-agentpool-26889666-vmss000000
Events:         <none>

For a Google Kubernetes Engine (GKE) cluster:

kubectl describe csinodes gke-cluster-1-default-pool-a709ed39-hcc6

Example output:

Name:               gke-cluster-1-default-pool-a709ed39-hcc6
Labels:             <none>
Annotations:        storage.alpha.kubernetes.io/migrated-plugins:
                      kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/cinder,kubernetes.io/gce-pd
CreationTimestamp:  Mon, 30 May 2022 20:36:18 +0000
Spec:
  Drivers:
    pd.csi.storage.gke.io:
      Node ID:  projects/intense-vault-351721/zones/us-central1-c/instances/gke-cluster-1-default-pool-a709ed39-hcc6
      Allocatables:
        Count:        15
      Topology Keys:  [topology.gke.io/zone]
Events:         <none>

Verify the Pods Do Not Have an Error Status

Verify that the pods for your cluster and its hosted applications have a status of Running, Completed, or Terminated, and not a status of Pending, Failed, CrashLoopBackOff, Evicted, or Unknown. Although Commvault notifies you of failures for backups and restores, you still need to verify the stability of your cluster before beginning backups.

If you identify a pod that does not have a status of Running, Completed, or Terminated, see Troubleshooting Applications. For more information about states, see Container states.

Command to run:
```
kubectl get pods -A
```

Example output:

From a newly created Azure Kubernetes Service (AKS) cluster with all pods in the Running status (as expected):

NAMESPACE         NAME                                       READY   STATUS    RESTARTS        AGE
calico-system     calico-kube-controllers-7547b445f6-rgt7l   1/1     Running   1 (5h14m ago)   5h15m
calico-system     calico-node-6g6hm                          1/1     Running   0               5h15m
calico-system     calico-node-n5xvk                          1/1     Running   0               5h15m
calico-system     calico-node-r9kk9                          1/1     Running   0               5h15m
calico-system     calico-typha-ddbb67d7b-kzk6k               1/1     Running   0               5h15m
calico-system     calico-typha-ddbb67d7b-mxhhl               1/1     Running   0               5h15m
kube-system       cloud-node-manager-bdfsj                   1/1     Running   0               5h15m
kube-system       cloud-node-manager-bhkfp                   1/1     Running   0               5h15m
kube-system       cloud-node-manager-kqb5l                   1/1     Running   0               5h15m
kube-system       coredns-87688dc49-wb592                    1/1     Running   0               5h18m
kube-system       coredns-87688dc49-xbbpg                    1/1     Running   0               5h14m
kube-system       coredns-autoscaler-6fb889cdfc-vsm7j        1/1     Running   0               5h18m
kube-system       csi-azuredisk-node-6p4w8                   3/3     Running   0               5h15m
kube-system       csi-azuredisk-node-g5rct                   3/3     Running   0               5h15m
kube-system       csi-azuredisk-node-n2v72                   3/3     Running   0               5h15m
kube-system       csi-azurefile-node-6ss8s                   3/3     Running   0               5h15m
kube-system       csi-azurefile-node-r5682                   3/3     Running   0               5h15m
kube-system       csi-azurefile-node-rdrgn                   3/3     Running   0               5h15m
kube-system       kube-proxy-8f5fm                           1/1     Running   0               5h15m
kube-system       kube-proxy-sw4hs                           1/1     Running   0               5h15m
kube-system       kube-proxy-wqzlz                           1/1     Running   0               5h15m
kube-system       metrics-server-948cff58d-5d8qv             1/1     Running   1 (5h14m ago)   5h18m
kube-system       metrics-server-948cff58d-q4lsm             1/1     Running   1 (5h14m ago)   5h18m
kube-system       tunnelfront-5486fcf877-j8gvc               1/1     Running   0               4h56m
tigera-operator   tigera-operator-5755874764-vxckp           1/1     Running   0               5h18m

From a newly created Google Kubernetes Engine (GKE) cluster with all pods in the Running status (as expected):

NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE
kube-system   event-exporter-gke-5dc976447f-szrn9                   2/2     Running   0          86m
kube-system   fluentbit-gke-6896v                                   2/2     Running   0          85m
kube-system   fluentbit-gke-69ntr                                   2/2     Running   0          85m
kube-system   fluentbit-gke-wnwkd                                   2/2     Running   0          85m
kube-system   gke-metrics-agent-5cq66                               1/1     Running   0          85m
kube-system   gke-metrics-agent-cwnzn                               1/1     Running   0          85m
kube-system   gke-metrics-agent-tdvx2                               1/1     Running   0          85m
kube-system   konnectivity-agent-86cbc78d8-8sfzc                    1/1     Running   0          85m
kube-system   konnectivity-agent-86cbc78d8-gvvnn                    1/1     Running   0          85m
kube-system   konnectivity-agent-86cbc78d8-tbr2t                    1/1     Running   0          86m
kube-system   konnectivity-agent-autoscaler-84559799b7-njczq        1/1     Running   0          86m
kube-system   kube-dns-584f56f967-65758                             4/4     Running   0          86m
kube-system   kube-dns-584f56f967-jglv5                             4/4     Running   0          85m
kube-system   kube-dns-autoscaler-9f89698b6-rn54z                   1/1     Running   0          74m
kube-system   kube-proxy-gke-cluster-1-default-pool-a709ed39-hcc6   1/1     Running   0          84m
kube-system   kube-proxy-gke-cluster-1-default-pool-a709ed39-q497   1/1     Running   0          85m
kube-system   kube-proxy-gke-cluster-1-default-pool-a709ed39-zkg8   1/1     Running   0          85m
kube-system   l7-default-backend-5465dfc4ff-274zw                   1/1     Running   0          86m
kube-system   metrics-server-v0.5.2-6f6d597469-xn6sx                2/2     Running   0          85m
kube-system   pdcsi-node-2xr6p

From a Vanilla Kubernetes cluster that recently experienced a DiskFull event on its CSI driver:

NAMESPACE          NAME                                            READY   STATUS      RESTARTS      AGE
app-nginx          nginx                                           1/1     Running     0             76d
apps-helm          my-release-redis-master-0                       1/1     Running     0             56d
apps-helm          my-release-redis-replicas-0                     1/1     Running     0             56d
apps-helm          my-release-redis-replicas-1                     1/1     Running     0             56d
apps-helm          my-release-redis-replicas-2                     1/1     Running     0             56d
calico-apiserver   calico-apiserver-5444dfd6b4-nc9f6               1/1     Running     0             77d
calico-apiserver   calico-apiserver-5444dfd6b4-pkgwj               1/1     Running     0             77d
calico-system      calico-kube-controllers-67f85d7449-ctxdz        1/1     Running     1 (77d ago)   77d
calico-system      calico-node-tlqvh                               1/1     Running     1 (77d ago)   77d
calico-system      calico-typha-7bc4d5557f-rb7d2                   1/1     Running     2 (77d ago)   77d
default            csi-hostpath-socat-0                            1/1     Running     0             63m
default            csi-hostpathplugin-0                            8/8     Running     0             63m
default            csirbd-demo-pod                                 1/1     Running     0             76d
kube-system        coredns-64897985d-brsvc                         1/1     Running     1 (77d ago)   77d
kube-system        coredns-64897985d-s9rsq                         1/1     Running     1 (77d ago)   77d
kube-system        etcd-k8s-123-4                                  1/1     Running     1 (77d ago)   77d
kube-system        kube-apiserver-k8s-123-4                        1/1     Running     1 (77d ago)   77d
kube-system        kube-controller-manager-k8s-123-4               1/1     Running     1 (77d ago)   77d
kube-system        kube-proxy-8mmzm                                1/1     Running     1 (77d ago)   77d
kube-system        kube-scheduler-k8s-123-4                        1/1     Running     1 (77d ago)   77d
kube-system        snapshot-controller-7f5d798964-6vz5m            1/1     Running     0             76d
kube-system        snapshot-controller-7f5d798964-mc59t            1/1     Running     0             76d
rook-ceph          csi-cephfsplugin-provisioner-5dc9cbcc87-9hjvh   6/6     Running     0             76d
rook-ceph          csi-cephfsplugin-qjpbn                          3/3     Running     0             76d
rook-ceph          csi-rbdplugin-provisioner-58f584754c-gqw6b      6/6     Running     1 (43d ago)   76d
rook-ceph          csi-rbdplugin-x5vlg                             3/3     Running     0             76d
rook-ceph          rook-ceph-mgr-a-f74657b66-bs6b2                 0/1     Completed   1             76d
rook-ceph          rook-ceph-mgr-a-f74657b66-k8vz2                 1/1     Running     0             11d
rook-ceph          rook-ceph-mon-a-75d9f6df4-446xc                 0/1     Evicted     0             2d15h
rook-ceph          rook-ceph-mon-a-75d9f6df4-7j7pd                 0/1     Completed   0             25d
rook-ceph          rook-ceph-mon-a-75d9f6df4-bsbwg                 0/1     Evicted     0             2d15h
rook-ceph          rook-ceph-mon-a-75d9f6df4-c7rxf                 0/1     Evicted     0             2d15h
rook-ceph          rook-ceph-mon-a-75d9f6df4-fktm4                 0/1     Completed   0             42d
rook-ceph          rook-ceph-mon-a-75d9f6df4-fplth                 0/1     Evicted     0             2d15h
rook-ceph          rook-ceph-mon-a-75d9f6df4-kctct                 0/1     Completed   1             76d
rook-ceph          rook-ceph-mon-a-75d9f6df4-msx5n                 1/1     Running     0             2d15h
rook-ceph          rook-ceph-mon-a-75d9f6df4-phmrb                 0/1     Evicted     0             2d15h
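
In a long listing like the last example, the pods that need attention are easy to miss. The following is a minimal sketch that prints only those pods: the function name unhealthy_pods is hypothetical, and it assumes the default column layout of kubectl get pods -A --no-headers, with the status in the fourth column. It treats Running, Completed, and Terminated as acceptable, following the guidance in this section.

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "namespace/pod status" for every pod whose STATUS column is not
# Running, Completed, or Terminated.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" && $4 != "Terminated" {
         print $1 "/" $2 " " $4
       }'
}

# Usage (requires access to a cluster):
#   kubectl get pods -A --no-headers | unhealthy_pods
```

For the Vanilla Kubernetes example, this sketch would list only the five Evicted rook-ceph-mon pods.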

Verify You Have a VolumeSnapshotClass That Has a CSI Driver

A CSI-enabled VolumeSnapshotClass is required for Commvault to orchestrate the creation of storage snapshots. Verify that your environment includes a VolumeSnapshotClass that has a CSI driver. A VolumeSnapshot is a storage-level snapshot of the underlying storage sub-system. Volume snapshots can be application-consistent or infrastructure/storage-consistent (default).

Command to run:
```
kubectl get volumesnapshotclass
```

Example output from a Vanilla Kubernetes cluster with Ceph Raw Block Device (RBD) VolumeSnapshotClass installed and configured:

NAME                      DRIVER                       DELETIONPOLICY   AGE
csi-hostpath-snapclass    hostpath.csi.k8s.io          Delete           74m
csi-rbdplugin-snapclass   rook-ceph.rbd.csi.ceph.com   Delete           76d

If Necessary, Get Detailed Information About the VolumeSnapshotClass

Command to run:

kubectl describe volumesnapshotclass volumesnapshotclass_name

Example command:

kubectl describe volumesnapshotclass csi-rbdplugin-snapclass

Example output:

Name:             csi-rbdplugin-snapclass
Namespace:
Labels:           <none>
Annotations:      <none>
API Version:      snapshot.storage.k8s.io/v1
Deletion Policy:  Delete
Driver:           rook-ceph.rbd.csi.ceph.com
Kind:             VolumeSnapshotClass
Metadata:
  Creation Timestamp:  2022-03-14T23:05:36Z
  Generation:          1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:deletionPolicy:
      f:driver:
      f:parameters:
        .:
        f:clusterID:
        f:csi.storage.k8s.io/snapshotter-secret-name:
        f:csi.storage.k8s.io/snapshotter-secret-namespace:
    Manager:         kubectl-create
    Operation:       Update
    Time:            2022-03-14T23:05:36Z
  Resource Version:  67371
  UID:               153a1fac-783c-4b71-9d57-f0e161650100
Parameters:
  Cluster ID:                                       rook-ceph
  csi.storage.k8s.io/snapshotter-secret-name:       rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace:  rook-ceph
Events:                                             <none>

The example output shows that the underlying driver is a CSI driver (rook-ceph.rbd.csi.ceph.com) and that a VolumeSnapshotClass is registered.

Volume snapshot support is not installed by default in many cloud-managed Kubernetes services. For example, on a default installation of an Azure Kubernetes Service (AKS) or Google Kubernetes Engine (GKE) cluster, the command returns "No resources found", which indicates that no VolumeSnapshotClass is registered. In that case, snapshot backup is not possible until you install the volume snapshot controller and the associated StorageClass and Custom Resource Definitions. For instructions to install the CSI external-snapshotter, see kubernetes-csi / external-snapshotter.

Verify the API Version Includes snapshot.storage.k8s.io

The CSI external-snapshotter supports both v1 and v1beta1 snapshot APIs. Verify that your installed external-snapshotter supports the snapshot.storage.k8s.io/v1 API.

Command to run:

kubectl describe volumesnapshotclass volumesnapshotclass_name

Example command:

kubectl describe volumesnapshotclass csi-rbdplugin-snapclass

Example output:

Verify You Have a snapshot-controller Pod in the Running Status

Installing and configuring the CSI external-snapshotter involves installing the Kubernetes Volume Snapshot CRDs, the volume snapshot controller, and the snapshot validation webhook. Verify that the snapshot-controller is installed and has a status of Running, which is required for the external-snapshotter to handle requests to orchestrate snapshots.

Command to run:
```
kubectl get pods -A | grep -i snapshot
```

Example output from a Vanilla Kubernetes cluster with the external-snapshotter installed and running:

kube-system   snapshot-controller-7f5d798964-6vz5m   1/1   Running   0   76d
kube-system   snapshot-controller-7f5d798964-mc59t   1/1   Running   0   76d

Verify There Are No Orphan Resources Created by Commvault

Note

If a backup or restore operation is interrupted, Commvault might not have a chance to remove temporary volumesnapshots, volumes, and Commvault worker pods. For simple identification and manual removal, if required, Commvault attaches labels to these resources. For more information, see Restrictions and Known Issues for Kubernetes.

Command to run:

kubectl get pods,pvc,volumesnapshot -l cv-backup-admin= --all-namespaces

Example output from a cluster that has no orphaned objects:
```
No resources found
```

If Necessary, Delete Orphaned Resources

If orphaned resources are listed, delete them.

Command to run:

 
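kubectl delete resource_type -n namespace resource_name

Where resource_type is pod, pvc, or volumesnapshot, namespace is the namespace that contains the orphaned resource, and resource_name is the name of the resource.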
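
When several orphaned resources are listed, typing one delete command per resource is error-prone. The following is a minimal sketch that only prints the delete commands so that you can review them before running any of them. The function name orphan_delete_commands is hypothetical, and the sketch assumes that the listing was produced with --all-namespaces and --no-headers, so that the first column is the namespace and the second column has the form type/name.

```shell
# Hypothetical helper (not part of kubectl or Commvault).
# Reads the output of
#   kubectl get pods,pvc,volumesnapshot -l cv-backup-admin= --all-namespaces --no-headers
# on stdin and prints one `kubectl delete` command per listed resource.
# It does not delete anything itself.
orphan_delete_commands() {
  # Skip blank separator lines and any line whose second column is not in
  # the type/name form.
  awk 'NF >= 2 && index($2, "/") > 0 { print "kubectl delete -n " $1 " " $2 }'
}

# Usage (requires access to a cluster):
#   kubectl get pods,pvc,volumesnapshot -l cv-backup-admin= \
#     --all-namespaces --no-headers | orphan_delete_commands
```

If no orphaned resources exist, the function receives no input and prints nothing.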

Verify the centos:8 Image Can Be Pulled

Commvault spawns a temporary worker pod for each protected application, namespace, and PersistentVolumeClaim. The Commvault worker pod uses the centos:8 image from Docker Hub. For Commvault data management operations to function, the centos:8 image must be accessible to all nodes where the operations will occur. Verify that your cluster or nodes can pull the centos:8 image. For more information, see System Requirements for Kubernetes.

In air-gapped environments, verify that the centos:8 image can be pulled from all nodes that will have Commvault worker pods running on them. For more information, see Protecting an Air-Gapped Kubernetes Cluster.

Command to run:
```
docker pull centos:8
```

Example output from a cluster that is connected to hub.docker.com for initial image downloads and updated image downloads:

8: Pulling from library/centos
a1d0c7532777: Already exists
Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
Status: Downloaded newer image for centos:8
docker.io/library/centos:8