NetBackup10001 DeployGuide EKS
Release 10.0.0.1
NetBackup Deployment Guide for Amazon Elastic
Kubernetes Services (EKS) Cluster
Last updated: 2022-06-27
Legal Notice
Copyright © 2022 Veritas Technologies LLC. All rights reserved.
Veritas, the Veritas Logo, and NetBackup are trademarks or registered trademarks of Veritas
Technologies LLC or its affiliates in the U.S. and other countries. Other names may be
trademarks of their respective owners.
This product may contain third-party software for which Veritas is required to provide attribution
to the third party (“Third-party Programs”). Some of the Third-party Programs are available
under open source or free software licenses. The License Agreement accompanying the
Software does not alter any rights or obligations you may have under those open source or
free software licenses. Refer to the Third-party Legal Notices document accompanying this
Veritas product or available at:
https://www.veritas.com/about/legal/license-agreements
The product described in this document is distributed under licenses restricting its use, copying,
distribution, and decompilation/reverse engineering. No part of this document may be
reproduced in any form by any means without prior written authorization of Veritas Technologies
LLC and its licensors, if any.
The Licensed Software and Documentation are deemed to be commercial computer software
as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19
"Commercial Computer Software - Restricted Rights" and DFARS 227.7202, et seq.
"Commercial Computer Software and Commercial Computer Software Documentation," as
applicable, and any successor regulations, whether delivered by Veritas as on premises or
hosted services. Any use, modification, reproduction release, performance, display or disclosure
of the Licensed Software and Documentation by the U.S. Government shall be solely in
accordance with the terms of this Agreement.
Technical Support
Technical Support maintains support centers globally. All support services will be delivered
in accordance with your support agreement and the then-current enterprise technical support
policies. For information about our support offerings and how to contact Technical Support,
visit our website:
https://www.veritas.com/support
You can manage your Veritas account information at the following URL:
https://my.veritas.com
If you have questions regarding an existing support agreement, please email the support
agreement administration team for your region as follows:
Japan [email protected]
Documentation
Make sure that you have the current version of the documentation. Each document displays
the date of the last update on page 2. The latest documentation is available on the Veritas
website:
https://sort.veritas.com/documents
Documentation feedback
Your feedback is important to us. Suggest improvements or report errors or omissions to the
documentation. Include the document title, document version, chapter title, and section title
of the text on which you are reporting. Send feedback to:
You can also see documentation information or ask a question on the Veritas community site:
http://www.veritas.com/community/
https://sort.veritas.com/data/support/SORT_Data_Sheet.pdf
Contents
■ Required terminology
Note: NetBackup deployment for EKS offers only English language support and it
does not support OpsCenter.
Required terminology
The following table describes the important terms for NetBackup deployment on an EKS
cluster. For more information, see the Kubernetes documentation.
Term Description
Pod A Pod is a group of one or more containers, with shared storage and
network resources, and a specification for how to run the containers.
For more information on Pods, see Kubernetes Documentation.
Job Kubernetes jobs ensure that one or more pods execute their
commands and exit successfully. For more information on Jobs, see
Kubernetes Documentation.
Persistent Volume A PersistentVolume (PV) is a piece of storage in the cluster that has
been provisioned by an administrator or dynamically provisioned using
storage classes. For more information on Persistent Volumes, see
Kubernetes Documentation.
Introduction to NetBackup on EKS 11
User roles and permissions
Term Description
Custom Resource A Custom Resource (CR) is an extension of the Kubernetes API that
is not necessarily available in a default Kubernetes installation. For
more information on Custom Resources, see Kubernetes
Documentation.
Custom Resource Definition The CustomResourceDefinition (CRD) API resource lets you define
custom resources. For more information on
CustomResourceDefinitions, see Kubernetes Documentation.
ServiceAccount A service account provides an identity for processes that run in a Pod.
For more information on configuring the service accounts for Pods,
see Kubernetes Documentation.
■ For the custom user, you can change only the password after the deployment.
The changed password will be persisted. If the username is changed after the
deployment, an error message will be logged in the Operator pod.
■ You can delete the secret after the primary server deployment. In that case, if
you want to deploy or scale the media servers, you must create a new secret
with the same username which was used in the primary server CR. The password
can be the same or different. If you change the password, it is also changed in
the primary server pod, and gets persisted.
■ Do not create a local user in the pods (using the kubectl exec or useradd
commands) as this user may or may not be persisted.
■ The Amazon Web Services user is supported through Single Sign-on (SSO). For
the detailed user integration information, refer to the NetBackup Administrator’s
Guide Volume I.
■ An nbitanalyticsadmin user is available in primary server container. This user
is used as Master Server User ID while creating data collector policy for data
collection on NetBackup IT Analytics portal.
■ The service account that is used for this deployment is netbackup-account, and it
is defined in operator_deployment.yaml.
■ NetBackup runs most of the primary server services and daemons as non-root
user (nbsvcusr) and only root and nbsvcusr are supported as a service account
user.
■ A ClusterRole named netbackup-role is set in the NetBackup Operator to define
the cluster-wide permissions to the resources. This is defined in
operator_deployment.yaml.
■ Appropriate roles and EKS specific permissions are set to the cluster at the time
of cluster creation.
■ After successful deployment of the primary and media servers, the operator
creates a custom Kubernetes role with the name <resourceNamePrefix>-admin,
where resourceNamePrefix is the value given in the primary server or media server CR
specification.
The following permissions are provided in the respective namespaces:
This role can be assigned to the NetBackup Administrator to view the pods that
were created, and to execute into them. For more information on the access
control, see Kubernetes Access Control Documentation.
Note: A single role is created only if the primary and media servers are in the same
namespace and use the same resource name prefix.
Table 1-2
Resource Name API Group Allowed Operations
Prerequisites
Ensure that the following prerequisites are met before proceeding with the
deployment.
■ A Kubernetes cluster in Amazon Elastic Kubernetes Service in AWS with multiple
nodes. Using a separate node group is recommended for the NetBackup and
MSDP Scaleout deployments.
■ Define one or more storage classes in the Kubernetes cluster.
Deployment with environment operators 19
About deployment with the environment operator
■ Access to a container registry that the Kubernetes cluster can access, such as
Amazon Elastic Container Registry (ECR).
■ Install Cert-Manager. You can use the following command to install the
Cert-Manager:
$ kubectl apply -f
https://github.com/jetstack/cert-manager/releases/download/v1.6.0/cert-manager.yaml
For details, see https://cert-manager.io/docs/installation/
■ A workstation or VM running Linux with the following:
■ Configure kubectl to access the cluster.
■ Install AWS CLI to access AWS resources.
■ Configure docker to be able to push images to the container registry.
■ Free space of approximately 8.5GB on the location where you copy and
extract the product installation TAR package file. If using docker locally, there
should be approximately 8GB available on the /var/lib/docker location
so that the images can be loaded to the docker cache, before being pushed
to the container registry.
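The free-space prerequisite above can be checked before you copy the package. The following is a minimal sketch, not part of the product: the helper names free_gb and has_free_gb are hypothetical, the current directory stands in for the location where you extract the TAR package, and 8.5GB is rounded up to 9.

```shell
#!/bin/sh
# Print the free space, in whole GiB, of the file system that holds the given path.
free_gb() {
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if the path has at least the requested number of GiB free.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Assumed locations: "." for the extraction directory (about 8.5GB, rounded up to 9)
# and /var/lib/docker for the local docker cache (about 8GB).
for check in ".:9" "/var/lib/docker:8"; do
    path=${check%%:*}
    need=${check##*:}
    if [ ! -d "$path" ]; then
        echo "skip: $path does not exist"
    elif has_free_gb "$path" "$need"; then
        echo "ok: $path has at least ${need}GB free"
    else
        echo "warning: $path has less than ${need}GB free"
    fi
done
```

The check is advisory only; it prints a warning and does not stop anything.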
Item Description
OCI images in the /images directory These docker image files are loaded and then copied to
the container registry to run in Kubernetes. They include
NetBackup and MSDP Scaleout application images and the
operator images.
MSDP kubectl plug-in at /bin/kubectl-msdp Used to deploy and manage the MSDP Scaleout operator
tasks.
Configuration (.yaml) files at /operator directory You can edit these to suit your configuration requirements
before installation.
Sample product (.yaml) files at /samples directory You can use these as templates to define your NetBackup
environment.
Known limitations
Here are some known limitations.
■ Changes to the CorePattern which specifies the path used for storing core dump
files in case of a crash are not supported. CorePattern can only be set during
initial deployment.
■ Changes to the MSDP Scaleout credential autoDelete setting, which allows automatic
deletion of the credential after use, are not supported. The autoDelete value can only
be set during initial deployment.
Run the command docker image ls to confirm that the product images are
loaded properly to the docker cache.
Deployment with environment operators 21
Deploying the operators manually
3 Run the following commands to re-tag the images to associate them with your
container registry.
$ REGISTRY=<<SubscriptionID>.dkr.ecr.<zone>.amazonaws.com/<registryName>>
4 Run the following commands to push the images to the container registry.
$ docker push ${REGISTRY}/netbackup/main:10.0.0.1
6 Install the MSDP Scaleout operator in the created namespace, using this
command. To run this command you must define a full image name in step 3,
define a storage class for storing logs from the MSDP operator, and define
node selector labels (optional) for scheduling the MSDP operator pod on specific
nodes. See “Prerequisites” on page 18.
$ kubectl msdp init --image ${REGISTRY}/msdp-operator:16.0.1
--storageclass x --namespace netbackup-operator-system -l
key1=value1
images:
- name: netbackupoperator
  newName: example.com/netbackup/operator
  newTag: '10.0.0.1'
nodeSelector:
  nbu_node: 'true'
9 To install the NetBackup operator, run the following command from the installer's
root directory:
$ kubectl apply -k operator
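Steps 3 and 4 of this procedure re-tag each image for your registry and then push it. The sketch below only prints the docker commands so that you can review them first. The helper names are hypothetical, the account ID, region, and registry name are placeholder values, and the image list contains only the image names that appear in this guide; check the output of docker image ls for the full set.

```shell
#!/bin/sh
# Compose the registry prefix in the form used in step 3.
# Arguments: AWS account ID, AWS region, registry name (all placeholders here).
ecr_registry() {
    printf '%s.dkr.ecr.%s.amazonaws.com/%s\n' "$1" "$2" "$3"
}

# Print (do not run) the docker tag and push commands for one image.
print_retag_and_push() {
    registry=$1
    image=$2
    echo "docker tag $image $registry/$image"
    echo "docker push $registry/$image"
}

# Hypothetical values; replace them with your own.
REGISTRY=$(ecr_registry 123456789012 us-east-2 example-registry)

# Assumed image list, taken from names mentioned in this guide; it may be incomplete.
for image in netbackup/main:10.0.0.1 netbackup/operator:10.0.0.1 msdp-operator:16.0.1; do
    print_retag_and_push "$REGISTRY" "$image"
done
```

Printing the commands instead of running them keeps the sketch safe to try on a workstation that has no docker daemon or registry login yet.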
Where nb-example is the name of the namespace. The Primary, Media, and
MSDP Scaleout application namespace must be different from the one used
by the operators. It is recommended to use two namespaces: one for the
operators, and a second one for the applications.
2 Create a secret to hold the primary server credentials. Those credentials are
configured in the NetBackup primary server, and other resources in the
NetBackup environment use them to communicate with and configure the
primary server. The secret must include fields for `username` and `password`.
If you are creating the secret by YAML, the type should be opaque or basic-auth.
For example:
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials
  namespace: nb-example
type: kubernetes.io/basic-auth
stringData:
  username: nbuser
  password: p@ssw0rd
3 Create a KMS DB secret to hold Host Master Key ID (`HMKID`), Host Master
Key passphrase (`HMKpassphrase`), Key Protection Key ID (`KPKID`), and
Key Protection Key passphrase (`KPKpassphrase`) for NetBackup Key
Management Service. If creating the secret by YAML, the type should be
_opaque_. For example:
apiVersion: v1
kind: Secret
metadata:
  name: example-key-secret
  namespace: nb-example
type: Opaque
stringData:
  HMKID: HMKID
  HMKpassphrase: HMKpassphrase
  KPKID: KPKID
  KPKpassphrase: KPKpassphrase
You can also create a secret using kubectl from the command line:
$ kubectl create secret generic example-key-secret --namespace
nb-example --from-literal=HMKID="HMKID"
--from-literal=HMKpassphrase="HMKpassphrase"
--from-literal=KPKID="KPKID"
--from-literal=KPKpassphrase="KPKpassphrase"
4 Create a secret to hold the MSDP Scaleout credentials for the storage server.
The secret must include fields for `username` and `password` and must be
located in the same namespace as the Environment resource. If creating the
secret by YAML, the type should be _opaque_ or _basic-auth_. For example:
apiVersion: v1
kind: Secret
metadata:
  name: msdp-secret1
  namespace: nb-example
type: kubernetes.io/basic-auth
stringData:
  username: nbuser
  password: p@ssw0rd
You can also create a secret using kubectl from the command line:
$ kubectl create secret generic msdp-secret1 --namespace
nb-example --from-literal=username='nbuser'
--from-literal=password='p@ssw0rd'
Note: You can use the same secret for the primary server credentials (from
step 2) and the MSDP Scaleout credentials, so the following step is optional.
However, to use the primary server secret in an MSDP Scaleout, you must set
the `credential.autoDelete` property to false. The sample file includes an
example of setting the property. The default value is true, in which case the
secret may be deleted before all parts of the environment have finished using
it.
Deployment with environment operators 26
Deploying NetBackup and MSDP Scaleout manually
5 (Optional) Create a secret to hold the KMS key details. Specify the KMS Key only
if the KMS Key Group does not already exist and you need to create it.
Note: When reusing storage from previous deployment, the KMS Key Group
and KMS Key may already exist. In this case, provide KMS Key Group only.
If creating the secret by YAML, the type should be _opaque_. For example:
apiVersion: v1
kind: Secret
metadata:
  name: example-key-secret
  namespace: nb-example
type: Opaque
stringData:
  username: nbuser
  passphrase: 'test passphrase'
You can also create a secret using kubectl from the command line:
$ kubectl create secret generic example-key-secret --namespace
nb-example --from-literal=username="nbuser"
--from-literal=passphrase="test passphrase"
You may need this key for future data recovery. After you have successfully
deployed and saved the key details, it is recommended that you delete this
secret and the corresponding key info secret.
6 Configure the samples/environment.yaml file according to your requirements.
This file defines a primary server, media servers, and MSDP Scaleout
storage servers. See “Configuring the environment.yaml file” on page 27 for
details.
7 Apply the environment yaml file, using the same application namespace created
in step 1.
$ kubectl apply --namespace nb-example --filename environment.yaml
Use this command to verify the new environment resource in your cluster:
$ kubectl get --namespace nb-example environments
NAME AGE
environment-sample 2m
Deployment with environment operators 27
Configuring the environment.yaml file
After a few minutes, NetBackup finishes starting up on the primary server, and
then the media servers and MSDP Scaleout storage servers you configured
in the environment resource start appearing. Run:
$ kubectl get --namespace nb-example
all,environments,primaryservers,mediaservers,msdpscaleouts
NAME STATUS
environment.netbackup.veritas.com/environment-sample Success
8 To start using your newly deployed environment, sign in to the NetBackup web UI.
Open a web browser and navigate to the
https://<primaryserver>/webui/login URL.
Here, <primaryserver> is the host name or IP address of the NetBackup primary
server.
You can retrieve the primary server's host name by using the command:
$ kubectl describe primaryserver.netbackup.veritas.com/<primary
server CR name> --namespace <namespace_name>
Parameter Description
namespace: example-ns Specify the namespace where all the NetBackup resources are
managed. If not specified here, then it will be the current
namespace when you run the command kubectl apply -f
on this file.
tag: '10.0.0.1' This tag is used for all images in the environment. Specifying a
`tag` value on a sub-resource affects the images for that
sub-resource only. For example, if you apply an EEB that affects
only primary servers, you might set the `primary.tag` to the custom
tag of that EEB. The primary server runs with that image, but the
media servers and MSDP scaleouts continue to run images
tagged `10.0.0.1`. Note that values that look like numbers
are treated as numbers in YAML even though this field must
be a string; quote the value to avoid misinterpretation.
licenseKeys: List the license keys that are shared among all the sub-resources.
Licenses specified in a sub-resource are appended to this list
and applied only to the sub-resource.
corePattern: /corefiles/core.%e.%p.%t Specify the path to use for storing core files in case of a crash.
loadBalancerAnnotations: service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1 name
Specify the annotations to be added for the network load balancer.
The following configurations apply to the primary server. The values specified in
the following table can override the values specified in the table above.
Paragraph Description
labelValue: linux
Paragraph Description
capacity: 30Gi
storageClassName: gp2
The following section describes the media server configurations. If you do not have
a media server, either remove this section from the configuration file entirely or
define it as an empty list.
parameters Description
labelValue: linux
parameters Description
ipAddr: 4.3.2.3
fqdn: media1-2.example.com
The following section describes MSDP-related parameters. You may also deploy
without any MSDP scaleouts. In that case, remove the msdpScaleouts section
entirely from the configuration file.
Parameter Description
tag: '16.0.1' This tag overrides the one defined in
Table 1-3. It is necessary because the MSDP
Scaleout images are shipped with tags
different from the NetBackup primary and
media images.
Parameter Description
fqdn: dedupe1-2.example.com
ipAddr: 1.2.3.6
fqdn: dedupe1-3.example.com
ipAddr: 1.2.3.7
fqdn: dedupe1-4.example.com
Parameter Description
Parameter Description
labelValue: linux
Parameter Description
resourceNamePrefix Specifies the prefix name for the primary, media, and MSDP
Scaleout server resources.
Note: Replace the environment custom resource names as per your configuration
in the steps below.
2 Wait for all the pods, services and resources to be terminated. To confirm, run:
$ kubectl get --namespace <namespace_name>
all,environments,primaryservers,mediaservers,msdpscaleouts
You should get a message that no resources were found in the nb-example
namespace.
3 To identify and delete any outstanding persistent volume claims, run the
following:
$ kubectl get pvc --namespace <namespace_name>
4 To locate and delete any persistent volumes created by the deployment, run:
$ kubectl get pv
Note: Certain storage drivers may cause persistent volumes to get stuck in the
terminating state. To resolve this issue, remove the finalizer, using the
command: $ kubectl patch pv <pv-name> -p
'{"metadata":{"finalizers":null}}'
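The cleanup in steps 3 and 4 can be reviewed as a dry run before anything is deleted. The sketch below assumes kubectl is already configured for the cluster; the helper names are hypothetical, and both helpers only print commands instead of running them.

```shell
#!/bin/sh
# Print a delete command for every PVC that is still present in the namespace.
# Nothing is deleted; the output is meant to be reviewed and then run by hand.
print_pvc_deletes() {
    ns=$1
    kubectl get pvc --namespace "$ns" -o name | while read -r pvc; do
        echo "kubectl delete --namespace $ns $pvc"
    done
}

# Print the finalizer-removal patch from the note above for one persistent volume.
print_pv_unstick() {
    printf "kubectl patch pv %s -p '%s'\n" "$1" '{"metadata":{"finalizers":null}}'
}
```

For example, print_pvc_deletes nb-example lists one delete command per remaining claim, and print_pv_unstick <pv-name> prints the patch command for a volume that stays in the terminating state.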
Note: Do not remove the MSDP Scaleout operator first as it may corrupt the
NetBackup operator.
DD_TAG=16.0.1-update1 ./deploy.sh
5 When the script prompts to tag and push images, wait. Open another terminal
window and re-tag the MSDP Scaleout images as:
docker tag msdp-operator:16.0.1 msdp-operator:16.0.1-update1
6 Return to the deploy script and when prompted, enter yes to tag and push the
images. Wait for the images to be pushed, and then the script will pause to
ask another question. The remaining questions are not required, so press
Ctrl+c to exit the deploy script.
The command prints the name of the image and includes the SHA-256 hash
identifying the image. For example:
example.dkr.ecr.us-east-2.amazonaws.com/
2 To restart the NetBackup operator, run:
pod=$(kubectl get pod -n netbackup-operator-system -l
nb-control-plane=nb-controller-manager -o jsonpath --template
'{.items[*].metadata.name}')
3 Re-run the kubectl command from earlier to get the image ID of the NetBackup
operator. Confirm that it's different from what it was before the update.
Deployment with environment operators 39
Applying security patches
3 Re-run the kubectl command from earlier to get the image ID of the MSDP
Scaleout operator. Confirm that it's different from what it was before the update.
2 Get the image ID of the existing NetBackup container and record it for later.
Run:
kubectl get pods -n nb-example $pod -o jsonpath --template
"{.status.containerStatuses[*].imageID}{'\n'}"
3 Look at the list of StatefulSets in the application namespace and identify the
one that corresponds to the pod or pods to be updated. The name is typically
the same as the pod, but without the number at the end. For example, a pod
named nb-primary-0 is associated with statefulset nb-primary. Hereafter the
statefulset will be referred to as $set. Run:
kubectl get statefulsets -n nb-example
The pod or pods associated with the statefulset are terminated and
re-created. It may take several minutes to reach the "Running" state.
5 Once the pods are running, re-run the kubectl command from step 2 to get the
image ID of the new NetBackup container. Confirm that it's different from what
it was before the update.
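Two parts of the procedure above lend themselves to small shell helpers: deriving the StatefulSet name from a pod name (step 3) and confirming that the image ID changed (step 5). This is a sketch with hypothetical helper names. It relies on the naming pattern described in step 3, which the guide says is typical and not guaranteed.

```shell
#!/bin/sh
# Derive the StatefulSet name from a pod name by dropping a trailing "-<number>".
# Names without a numeric suffix are returned unchanged.
statefulset_for_pod() {
    case ${1##*-} in
        ''|*[!0-9]*) echo "$1" ;;      # last segment is not a number
        *)           echo "${1%-*}" ;; # strip "-<number>"
    esac
}

# Succeed only when both image IDs are non-empty and differ from each other.
image_updated() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example with the pod name used in step 3.
echo "statefulset for nb-primary-0: $(statefulset_for_pod nb-primary-0)"

# Usage outline (requires cluster access, so it is left as comments):
#   before=$(kubectl get pods -n nb-example "$pod" -o jsonpath --template "{.status.containerStatuses[*].imageID}")
#   ... update the image and wait for the pod to reach the Running state ...
#   after=$(kubectl get pods -n nb-example "$pod" -o jsonpath --template "{.status.containerStatuses[*].imageID}")
#   image_updated "$before" "$after" && echo "image was updated"
```

Treating an empty value as "not updated" guards against a pod that has not reported its container status yet.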
2 Get the image IDs of the existing MSDP Scaleout containers and record them
for later. All the MDS pods use the same image, and all the engine pods use
the same image, so it's only necessary to get three image IDs, one for each
type of pod.
kubectl get pods -n nb-example $engine $controller $mds -o
jsonpath --template "{range
.items[*]}{.status.containerStatuses[*].imageID}{'\n'}{end}"
...
spec:
  ...
  msdpScaleouts:
  - ...
    tag: "16.0.1-update1"
4 Save the file and close the editor. The MSDP Scaleout pods are terminated
and re-created. It may take several minutes for all the pods to reach the
"Running" state.
5 Run kubectl get pods to check the list of pods and note the new name of
the uss-controller pod. Then, once the pods are all ready, re-run the kubectl
command above to get the image IDs of the new MSDP Scaleout containers.
Confirm that they're different from what they were before the update.
Chapter 3
Assessing cluster
configuration before
deployment
This chapter includes the following topics:
■ Media server:
■ Data volume size: 50Gi
■ Log volume size: 30Gi
■ Following are the Config-Checker modes that can be specified in the Primary
and Media CR:
■ Default: This mode executes the Config-Checker. If the execution is
successful, the Primary and Media CRs deployment is started.
■ Dryrun: This mode only executes the Config-Checker to verify the
configuration requirements but does not start the CR deployment.
Assessing cluster configuration before deployment 43
Config-Checker execution and status details
■ Status of the Config-Checker can be retrieved from the primary server and media
server CRs by using the kubectl describe <PrimaryServer/MediaServer>
<CR name> -n <namespace> command.
For example, kubectl describe primaryservers environment-sample -n
test
■ Apply the CR again. Add the required data that was deleted earlier at the
correct location, save the file, and apply the YAML by using the kubectl apply -f
<environment.yaml> command.
Chapter 4
Deploying NetBackup
This chapter includes the following topics:
■ Create a node group in only one availability zone, with an instance type
of at least m5.4xlarge configuration.
The node group uses the AWS manual or autoscaling group feature, which allows
your node group to scale by provisioning and de-provisioning the nodes
automatically as required.
Note: All the nodes in node group must be running on the Linux operating
system.
2 Use an existing AWS Elastic Container Registry or create a new one and
ensure that the EKS has full access to pull images from the elastic container
registry.
3 Deploy the AWS Load Balancer Controller add-on in the cluster.
For more information on installing the add-on, see Installing the AWS Load
Balancer Controller add-on.
4 Install cert-manager by using the following command:
$ kubectl apply -f
https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml
6 Create a storage class with EBS storage type and retain reclaim policy. Provide
the name of the created EBS storage type in the storageclass name field
during the deployment of primary server/ media server CR.
For more information on creating the storage class with EBS provisioner, see
Kubernetes documentation.
7 If the NetBackup client is outside the VPC, or if you want to access the web UI from
outside the VPC, then the NetBackup client CIDR must be added with all NetBackup
ports in the security group inbound rule of the cluster. For more information on
NetBackup ports, see “About the Load Balancer service” on page 85.
■ To obtain the cluster security group, run the following command:
aws eks describe-cluster --name <my-cluster> --query
cluster.resourcesVpcConfig.clusterSecurityGroupId
■ The following link helps to add inbound rule to the security group:
Add rules to a security group
Host-specific requirements
1 Install AWS CLI.
For more information on installing the AWS CLI, see Installing or updating the
latest version of the AWS CLI.
2 Install the kubectl CLI.
For more information on installing the kubectl CLI, see Installing kubectl.
3 Configure docker to enable the push of the container images to the container
registry.
4 Create the OIDC provider for the AWS EKS cluster.
For more information on creating the OIDC provider, see Create an IAM OIDC
provider for your cluster.
5 Create an IAM service account for the AWS EKS cluster.
For more information on creating an IAM service account, see Amazon EFS
CSI driver.
6 If an IAM role needs access to the EKS cluster, run the following command
from the system that already has access to the EKS cluster:
kubectl edit -n kube-system configmap/aws-auth
For more information on creating an IAM role, see Enabling IAM user and role
access to your cluster.
Deploying NetBackup 47
Recommendations of NetBackup deployment on EKS
8 Free space of approximately 8.5GB on the location where you copy and extract
the product installation TAR package file. If using docker locally, there should
be approximately 8GB available on the /var/lib/docker location so that the
images can be loaded to the docker cache, before being pushed to the container
registry.
■ Deploy the primary server custom resource and media server custom resource in
the same namespace.
■ Ensure that you follow the symbolic link and edit the actual persisted version of
the file, if you want to edit a file having a symbolic link in the primary server or
media server.
■ A storage class that has the storage type as EBS is not supported. When the
Config-Checker runs the validation for checking the storage type, the
Config-Checker job fails if it detects the storage type as EBS. But if the
Config-Checker is skipped then this validation is not run, and there can be issues
in the deployment. There is no workaround available for this limitation. You must
clean up the PVCs and CRs and reapply the CRs.
■ Media server scale down is not supported. Certain workloads that require media
server affinity for the clients would not work.
■ External Certificate Authority (ECA) is not supported.
■ For the load balancer service, updating the CR from a dynamic IP address to a
static IP address, or vice versa, is not allowed.
■ Media server pods as NetBackup storage targets are not supported. For example,
NetBackup storage targets like AdvancedDisk and so on are not supported on
the media server pods.
■ Before the CRs can be deployed, the utility called Config-Checker is executed
that performs checks on the environment to ensure that it meets the basic
deployment requirements. The config-check is done according to the
configCheckMode and paused values provided in the custom resource YAML.
Deploying NetBackup 49
About primary server CR and media server CR
For more information, see “How does the Config-Checker utility work”
on page 41.
■ You can deploy the primary server and media server CRs in the same namespace.
■ Use the storage class that has the storage type as Amazon elastic files for
the catalog and log volumes in the primary server CR, and data and log volumes
in the media server CR.
■ During fresh installation of the NetBackup servers, the value for keep logs up
to under log retention configuration is set based on the log storage capacity
provided in the respective primary server or media server CR inputs. You may
change this value if required.
■ The NetBackup deployment sets the value as per the formula:
Size of logs PVC/PV * 0.8 = Keep logs up to value
By default, the value is set to 24GB.
For example: If the user configures the storage size in the CR as 40GB
(instead of the default 30GB), then the default value for that option becomes
32GB automatically based on the formula.
■ Deployment details of primary server and media server can be observed from
the operator pod logs using the following command:
kubectl logs <operator-pod-name> -c netbackup-operator -n
<operator-namespace>
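The log retention formula given earlier in this list (log volume size multiplied by 0.8) can be worked through for the two sizes the guide mentions. This is a minimal sketch; the function name keep_logs_gb is hypothetical, and the integer arithmetic rounds down for sizes that are not a multiple of 5GB, which is an assumption and not documented behavior.

```shell
#!/bin/sh
# Keep-logs-up-to value in GB for a given log volume size in GB (80% of capacity).
keep_logs_gb() {
    # Multiply before dividing so that integer arithmetic does not lose precision early.
    echo $(( $1 * 8 / 10 ))
}

# The default 30GB log volume and the 40GB example from the text.
for size in 30 40; do
    echo "log volume ${size}GB -> keep logs up to $(keep_logs_gb "$size")GB"
done
```

The results are 24GB for the default 30GB volume and 32GB for a 40GB volume, matching the values stated in the text.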
The following table describes the primary server CR and media server CR status fields:
Deploying NetBackup 51
Monitoring the status of the CRs
Table 4-1
Section Field / Value Description
The following tables describe the specs that can be edited for each CR.
Spec Description
Spec Description
If you edit any other fields, the deployment can go into an inconsistent state.
Notes:
■ Deleting a CR will delete all its child resources like pod, statefulset, services,
configmaps, config checker job, config checker pod.
■ Deleting the operator with kubectl delete -k <operator_folder_path> will
delete the CRs and their resources except the PVC.
■ Persistent volume claim (PVC) will not be deleted upon deleting a CR so that
the data is retained in the volumes. Then if you create a new CR with the same
name as the deleted one, the existing PVC with that same name will be
automatically linked to the newly created pods.
■ Do not delete "/mnt/nbdata" and "/mnt/nblogs" folders manually from primary
server and media pods. The NetBackup deployment will go into an inconsistent
state and will also result in data loss.
3 Create and copy NetBackup API key from NetBackup web UI.
Configuring the primary server with NetBackup IT Analytics tools is supported only
once from primary server CR.
For more information about IT Analytics data collector policy, see Add a Veritas
NetBackup Data Collector policy and for more information about adding NetBackup
Primary Servers within the Data Collector policy, see Add/Edit NetBackup Master
Servers within the Data Collector policy.
To change the already configured public key
1 Execute the following command in the primary server pod:
kubectl exec -it -n <namespace> <primaryServer-pod-name> --
/bin/bash
3 Restart the sshd service using the systemctl restart sshd command.
After adding the VxUpdate package to nbrepo, this package is persisted even
after pod restarts.
5 Edit the environment CR object, update the node selector with the same node selector
value, and set paused to false.
Pods are created on the new node group with the same persistent storage volumes.
All the pods will be in the ready and running state.
Chapter 5
Deploying MSDP Scaleout
This chapter includes the following topics:
■ Prerequisites
Step 1 Install the docker images and See “Installing the docker images and
binaries. binaries” on page 62.
Prerequisites
A working Amazon Elastic Kubernetes Service (EKS) cluster
■ AWS Kubernetes cluster
■ Your AWS Kubernetes cluster must be created with appropriate network and
configuration settings.
The supported AWS Kubernetes cluster version is 1.21.x and later.
■ The node group in EKS should not span multiple availability zones.
■ At least one storage class that is backed with Amazon EBS CSI storage
driver ebs.csi.aws.com or with the default provisioner
kubernetes.io/aws-ebs, and allows volume expansion. The built-in storage
class is gp2. It is recommended that the storage class has "Retain" reclaim
policy.
■ AWS Load Balancer controller must be installed on EKS.
■ A Kubernetes Secret that contains the MSDP credentials is required.
See “Secret” on page 138.
■ Node Group
You must create a dedicated node group for MSDP Scaleout. The node
group should not span multiple availability zones.
AWS Auto Scaling allows your node group to scale dynamically as required.
If AWS Auto Scaling is not enabled, ensure that the number of nodes is not less than
the MSDP Scaleout size.
It is recommended that you set the minimum number of nodes to 1 or more to bypass
some limitations in EKS.
■ Client machine to access EKS cluster
■ A separate computer that can access and manage your EKS cluster and
ECR.
■ It must run a Linux operating system.
■ It must have Docker daemon, the Kubernetes command-line tool (kubectl),
and AWS CLI installed.
The Docker storage size must be more than 6 GB. The version of kubectl
must be v1.19.x or later. The version of AWS CLI must meet the EKS cluster
requirements.
■ If EKS is a private cluster, see the Amazon EKS documentation on private
cluster requirements.
■ If internal IPs are used, reserve N internal IPs and make sure that they are not
in use. N matches the size of the MSDP-X cluster that is to be configured.
These IPs are used for network load balancer services. For the private IPs,
do not use the same subnet as the node group, to avoid IP conflicts with
the secondary private IPs used in the node group.
For the DNS name, you can use the private IP DNS name that Amazon provides,
or you can create DNS and reverse DNS entries under Route 53.
HOST_HAS_NAT_ENDPOINTS = YES
Installing the docker images and binaries
net.ipv4.tcp_keepalive_time=120
net.core.somaxconn = 1024
Tune the max open files to 1048576 if you run concurrent jobs.
3 Copy the MSDP kubectl plugin to a directory on the host from which you access EKS.
This directory can be added to the PATH environment variable so that
kubectl can load kubectl-msdp as a plugin automatically.
For example,
cp ./VRTSpddek-*/bin/kubectl-msdp /usr/local/bin/
■ Create a repository.
Initializing the MSDP operator
Option Description
Range: 1-365
Default value: 28
Range: 1-20
Default value: 20
Configuring MSDP Scaleout
Option Description
In the STATUS column, if the state of the controller, MDS, and engine pods
is Running for all of them, the configuration has completed
successfully.
In the READINESS GATES column for engines, 1/1 indicates that the engine
configuration has completed successfully.
8 If you specified spec.autoRegisterOST.enabled: true in the CR, when the
MSDP engines are configured, the MSDP operator automatically registers the
storage server, a default disk pool, and a default storage unit in the NetBackup
primary server.
A field ostAutoRegisterStatus in the Status section indicates the registration
status. If ostAutoRegisterStatus.registered is True, it means that the
registration has completed successfully.
You can run the following command to check the status:
kubectl get msdpscaleouts.msdp.veritas.com -n <sample-namespace>
You can find the storage server, the default disk pool, and storage unit on the
Web UI of the NetBackup primary server.
Using MSDP Scaleout as a single storage pool in NetBackup
■ Telemetry reporting
Table 6-1
Action Description Probe name Primary server (seconds) Media server (seconds)
Health probes are run using the nbu-health command. If you want to manually run
the nbu-health command, the following options are available:
■ Disable
This option disables the health check, which marks the pod as not ready (0/1).
■ Enable
This option enables a previously disabled health check in the pod. The pod is
marked as ready (1/1) again if all the NetBackup health checks pass.
■ Deactivate
This option deactivates the health probe functionality in the pod. The pod remains in
the ready state (1/1). This avoids pod restarts due to health probe failures, such as
liveness or readiness probe failures. This is a temporary step and is not recommended
for regular use.
■ Activate
This option activates the health probe functionality that has been deactivated
earlier using the deactivate option.
You can manually disable or enable the probes if required. For example, if for any
reason you need to exec into the pod and restart the NetBackup services, the health
probes should be disabled before restarting the services, and then they should be
enabled again after successfully restarting the NetBackup services. If you do not
disable the health probes during this process, the pod may restart due to the failed
health probes.
You can check pod events in case of probe failures to get more details using
the kubectl describe pod <primary/media-pod-name> -n <namespace>
command.
Telemetry reporting
Telemetry reporting entries for the NetBackup deployment on EKS are indicated
with the EKS based deployments text.
■ By default, the telemetry data is saved at the /var/veritas/nbtelemetry/
location. The default location is not persisted across pod restarts.
■ If you want to save telemetry data to a persisted location, execute the kubectl
exec -it -n <namespace> <primary/media_server_pod_name> -- /bin/bash
command to open a shell in the pod, and execute the telemetry command using
About NetBackup operator logs
■ NetBackup operator provides different log levels that can be changed before
deployment of NetBackup operator.
The following log levels are provided:
■ -1 - Debug
■ 0 - Info
■ 1 - Warn
■ 2 - Error
By default, the log level is 0.
It is recommended to use 0, 1, or 2 log level depending on your requirement.
Before you deploy NetBackup operator, you can change the log levels using
operator_patch.yaml.
If you change the operator log level after deployment, perform the following
steps for the change to take effect:
■ Apply the operator changes using the kubectl apply -k
<operator-folder> command.
■ Restart the operator pod. Delete the pod using the kubectl delete
pod/<netbackup-operator-pod-name> -n <namespace> command.
Kubernetes recreates the NetBackup operator pod after deletion.
■ The Config-Checker jobs that run before the deployment of the primary server and
media server create a pod. The logs for the config checker executions can be checked
using the kubectl logs <configchecker-pod-name> -n
<netbackup-operator-namespace> command.
■ Installation logs of NetBackup primary server and media server can be retrieved
using any of the following methods:
■ Run the kubectl logs <primaryServer/MediaServer-Pod-Name> -n
<primaryServer/mediaServer namespace> command.
■ Execute the following command in the primary server/media server pod and
check the /mnt/nblogs/setup-server.log file:
kubectl exec -it -n <PrimaryServer/MediaServer-namespace> <primaryServer/MediaServer-Pod-Name> -- bash
2 To pause the reconciler of the particular custom resource, change the paused
value to true in the primaryServer or mediaServer section and save the
changes. If there are multiple media server objects, change the paused value to
true for the respective media server object only.
3 Edit the StatefulSet of the primary server or the particular media server object using
the kubectl edit statefulset <statefulset-name> -n <namespace> command, change the
replica count to 0, and wait for all pods of that CR object to terminate.
4 Update every persistent volume claim that requires a capacity resize with the
kubectl edit pvc <pvcName> -n <namespace> command. For a
particular media server object, resize the respective PVCs to the expected storage
capacity for all its replicas.
5 Update the respective custom resource section using the kubectl edit
Environment <environmentCR_name> -n <namespace> command with the updated
storage capacity for the respective volume, and set paused to false. Save the
updated custom resource.
To update the storage details for the respective volume, add a storage section with
the specific volume and its capacity in the respective primaryServer or mediaServer
section in the environment CR.
The previously terminated pod and StatefulSet must be recreated and must run
successfully. The pod must be linked to the respective persistent volume claim, and
the data must be persisted.
6 Run the kubectl get pvc -n <namespace> command and check the capacity
column in the output to verify that the persistent volume claim storage capacity is
expanded.
7 (Optional) Update the log retention configuration for NetBackup depending on
the updated storage capacity.
For more information, refer to the NetBackup™ Administrator's Guide, Volume I.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-reclaim
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
  fsType: ext4
  type: gp2
Example 1
You want to deploy a primary server and a media server with replica count 3,
where resourceNamePrefix_of_primary is testprimary and
resourceNamePrefix_of_media is testmedia.
For this scenario, you must create a total of 8 disks, 8 PVs, and 8 PVCs:
2 disks, 2 PVs, and 2 PVCs for the primary server.
6 disks, 6 PVs, and 6 PVCs for the media server.
The names for the primary server volumes are as follows:
For catalog:
■ catalog-testprimary-primary-0
For logs:
■ logs-testprimary-primary-0
The names for the media server volumes are as follows:
For data:
■ data-testmedia-media-0
■ data-testmedia-media-1
■ data-testmedia-media-2
For logs:
■ logs-testmedia-media-0
■ logs-testmedia-media-1
■ logs-testmedia-media-2
Example 2
You want to deploy a primary server and a media server with replica count 5,
where resourceNamePrefix_of_primary is testprimary and
resourceNamePrefix_of_media is testmedia.
For this scenario, you must create a total of 12 disks, 12 PVs, and 12 PVCs:
2 disks, 2 PVs, and 2 PVCs for the primary server.
10 disks, 10 PVs, and 10 PVCs for the media server.
The names for the primary server volumes are as follows:
For catalog:
■ catalog-testprimary-primary-0
For logs:
■ logs-testprimary-primary-0
The names for the media server volumes are as follows:
For data:
■ data-testmedia-media-0
■ data-testmedia-media-1
■ data-testmedia-media-2
■ data-testmedia-media-3
■ data-testmedia-media-4
For logs:
■ logs-testmedia-media-0
■ logs-testmedia-media-1
■ logs-testmedia-media-2
■ logs-testmedia-media-3
■ logs-testmedia-media-4
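The naming pattern in the two examples can be sketched as a small shell function. This is only an illustration that assumes the pattern shown above (one catalog and one logs volume for the primary server, one data and one logs volume per media server replica); the function name pvc_names is hypothetical and is not part of NetBackup.

```shell
# Hypothetical helper: print the PVC names implied by the examples above.
# Usage: pvc_names <primary-prefix> <media-prefix> <media-replica-count>
pvc_names() {
    primary_prefix=$1
    media_prefix=$2
    replicas=$3

    # The primary server uses one catalog volume and one logs volume.
    echo "catalog-${primary_prefix}-primary-0"
    echo "logs-${primary_prefix}-primary-0"

    # Each media server replica uses one data volume and one logs volume.
    i=0
    while [ "$i" -lt "$replicas" ]; do
        echo "data-${media_prefix}-media-${i}"
        echo "logs-${media_prefix}-media-${i}"
        i=$((i + 1))
    done
}

# Example 1: replica count 3 prints 8 names (2 primary + 6 media).
pvc_names testprimary testmedia 3
```

The number of printed names is 2 + 2 x replica count, which matches the disk, PV, and PVC totals given for both examples.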
3 Create the required number of AWS EBS volumes and save the VolumeId of
newly created volumes.
For more information on creating EBS volumes, see EBS volumes.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: catalog
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: xfs
    volumeID: aws://us-east-2c/vol-xxxxxxxxxxxxxxxxx
  capacity:
    storage: 128Gi
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp2-retain
  volumeMode: Filesystem
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: catalog-testprimary-primary-0
    namespace: test
5 Create the PVC with the correct PVC name (from step 2), storage class, and storage size.
For example,
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: catalog-testprimary-primary-0
  namespace: test
spec:
  storageClassName: gp2-retain
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 128Gi
"name": "ip-x-x-x-x.ec2.internal",
"nodeName": "ip-x-x-x-x.ec2.internal",
"pvc": [
{
"pvcName": "ip-x-x-x-x.ec2.internal-catalog",
"stats": {
"availableBytes": "604539.68Mi",
"capacityBytes": "604629.16Mi",
"percentageUsed": "0.01%",
"usedBytes": "73.48Mi"
}
},
{
"pvcName": "ip-x-x-x-x.ec2.internal-data-0",
"stats": {
"availableBytes": "4160957.62Mi",
"capacityBytes": "4161107.91Mi",
"percentageUsed": "0.00%",
"usedBytes": "134.29Mi"
}
}
],
"ready": "True"
},
name: prometheus-cwagentconfig
namespace: amazon-cloudwatch
---
# create configmap for prometheus scrape config
apiVersion: v1
data:
  # prometheus config
  prometheus.yaml: |
    global:
      scrape_interval: 1m
      scrape_timeout: 10s
    scrape_configs:
    - job_name: 'msdpoperator-metrics'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: NameSpace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: PodName
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: amazon-cloudwatch
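To illustrate what the relabel rule on __address__ in the scrape configuration does, the following sketch reproduces the rewrite with sed. Prometheus joins the two source label values with a semicolon before matching; the sed expression is an equivalent in POSIX extended syntax, not the Prometheus implementation, and the function name rewrite_address and the sample address are hypothetical.

```shell
# Hypothetical helper: emulate the __address__ relabel rule.
# $1: current __address__ value (host or host:port)
# $2: value of the prometheus.io/port pod annotation
rewrite_address() {
    # Join the two values with ";" and keep the host plus the annotated port.
    printf '%s;%s\n' "$1" "$2" |
        sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

# The scrape target keeps its host but uses the annotated port.
rewrite_address 10.0.1.15:8080 9443
```

The original port, if any, is discarded, so the metrics are always scraped on the port named in the pod annotation.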
Table 7-1 lists the Prometheus metrics that MSDP Scaleout supports.
4 Apply the YAML file.
kubectl apply -f Prometheus-eks.yaml
If multiple MSDP scaleout clusters are deployed in the same EKS cluster, use
the filter to search the results. For example, search the MSDP engines with
the free space size lower than 1GB in the namespace sample-cr-namespace.
Log query:
■ Run the following command to find the Kubernetes cluster level resources that
belong to the CR:
kubectl api-resources --verbs=list --namespaced=false -o name |
xargs -I {} bash -c 'kubectl get --show-kind --show-labels --ignore-not-found {} | grep -E "msdp-operator|<cr-name>"'
Chapter 8
Managing the Load
Balancer service
This chapter includes the following topics:
networkLoadBalancer:
  type: Private
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-subnets: <subnet1 name>
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
  ipList:
  - "10.244.33.27: abc.vxindia.veritas.com"
networkLoadBalancer:
  type: Private
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-subnets: <subnet1 name>
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
  ipList:
  - "10.244.33.28: pqr.vxindia.veritas.com"
  - "10.244.33.29: xyz.vxindia.veritas.com"
Used as bidirectional port. Primary server to/from media servers and primary
server to/from client require this TCP port for communication.
■ 8443
Used for inbound traffic to java nbwmc on the primary server.
■ 443
Used for inbound traffic to the vnet proxy tunnel on the primary server. It is also used
for Nutanix workloads and for communication from the primary server to the
deduplication media server.
■ 13781
The MQBroker listens on TCP port 13781. NetBackup client hosts, which are
typically located behind a NAT gateway, must be able to connect to the message
queue broker (MQBroker) on the primary server.
■ 13782
Used by the primary server for the bpcd process.
■ 22
Used by the NetBackup IT Analytics data collector for data collection.
■ Media server:
■ 1556
Used as bidirectional port. Primary server to/from media servers and primary
server to/from client require this TCP port for communication.
■ 13782
Used by media server for bpcd process.
Note: Be cautious while performing this step because it may lead to data loss.
Opening the ports from the Load Balancer service
■ Before using the DNS and its respective IP address in CR yaml, you can verify
the IP address and its DNS resolution using nslookup.
■ In case of media server scaleout, ensure that the number of IP addresses
listed in ipList in the networkLoadBalancer section matches the replica count.
■ If nslookup is run for the load balancer IP inside the container, it returns the DNS name
in the form of <svc name>.<namespace_name>.svc.cluster.local. This is
Kubernetes behavior. Outside the pod, the load balancer service IP address
resolves to the configured DNS name. The nbbptestconnection command inside
the pods may report a mismatch in DNS names, which can be ignored.
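The rule that the number of ipList entries must match the replica count can be checked before the CR is applied. The following is a minimal sketch under the assumption that you pass the replica count and the ipList entries by hand; the function name iplist_matches_replicas is hypothetical, and the sample entries are the ones used earlier in this chapter.

```shell
# Hypothetical helper: succeed only if the number of ipList entries
# equals the media server replica count.
# Usage: iplist_matches_replicas <replica-count> <entry>...
iplist_matches_replicas() {
    replicas=$1
    shift
    # "$#" is now the number of ipList entries passed after the replica count.
    [ "$#" -eq "$replicas" ]
}

if iplist_matches_replicas 2 \
    "10.244.33.28: pqr.vxindia.veritas.com" \
    "10.244.33.29: xyz.vxindia.veritas.com"; then
    echo "ipList matches replica count"
else
    echo "ipList does not match replica count"
fi
```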
For example:
■ For primary server load balancer service:
■ Service name starts with resourcePrefixName of primary server like
<resourcePrefixName>-primary. Edit the service with the kubectl
edit service <resourcePrefixName>-primary -n <namespace>
command.
■ Each replica of media server has its own load balancer service with
name <resourcePrefixName>-media-<ordinal number>. For example,
replica 2 of media server has a load balancer service with name
<resourcePrefixName>-media-1.
■ You must modify the service for the specific replica with the kubectl edit
service <resourcePrefixName>-media-<replica-ordinal number>
-n <namespace> command.
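Because the ordinal in the service name is zero-based, replica N maps to the service <resourcePrefixName>-media-<N-1>. The following sketch shows that mapping; the function name media_service_name and the testmedia prefix are used only for illustration.

```shell
# Hypothetical helper: print the load balancer service name for the
# Nth media server replica (N is 1-based, the ordinal is 0-based).
# Usage: media_service_name <resourcePrefixName> <replica-number>
media_service_name() {
    prefix=$1
    replica=$2
    echo "${prefix}-media-$((replica - 1))"
}

# Replica 2 is served by the service with ordinal 1.
media_service_name testmedia 2
```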
3 Add an entry for the new port in the ports array in the specification field of the
service. For example, if you want to add port 111, add the following entry to the ports
array in the specification field.
name: custom-111
port: 111
protocol: TCP
targetPort: 111
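As an alternative to editing the service interactively, the same entry can be expressed as a JSON patch. This is a sketch, assuming that kubectl patch with --type=json is acceptable in your environment in place of kubectl edit; the function name port_patch is hypothetical, and the entry mirrors the custom-111 example above.

```shell
# Hypothetical helper: print a JSON patch that appends one TCP port entry
# to the ports array of a Service, mirroring the custom-111 entry above.
# Usage: port_patch <port>
port_patch() {
    port=$1
    printf '[{"op":"add","path":"/spec/ports/-","value":{"name":"custom-%s","port":%s,"protocol":"TCP","targetPort":%s}}]\n' \
        "$port" "$port" "$port"
}

# Print the patch for port 111.
port_patch 111

# To apply it to the primary server service (placeholders as in the text):
#   kubectl patch service <resourcePrefixName>-primary -n <namespace> \
#       --type=json -p "$(port_patch 111)"
```

Printing the patch first lets you review the change before it is sent to the cluster.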
■ Backing up a catalog
■ Restoring a catalog
Backing up a catalog
You can back up a catalog.
To back up a catalog
1 Exec into the primary server pod using the following command:
kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash
5 Exec into the primary server pod using the following command:
kubectl exec -it -n <namespace> <primaryserver pod name> -- bash
Restoring a catalog
You can restore a catalog.
To restore a catalog
1 Copy DRPackages files (packages) located at /mnt/nblogs/DRPackages/
from the pod to the host machine from where Amazon Elastic Kubernetes
Service cluster is accessed.
Run the kubectl cp
<primary-pod-namespace>/<primary-pod-name>:/mnt/nblogs/DRPackages
<Path_where_to_copy_on_host_machine> command.
6 Delete primary server PVC (catalog, log) using the kubectl delete pvc
<pvc-name> -n <namespace> command.
7 Delete the PV linked to primary server PVC using the kubectl delete pv
<pv-name> command.
9 After the primary server pod is in the ready state, change the paused value
to true in the CR spec of the primary server section in environment.yaml, and reapply
the YAML with the kubectl apply -f environment.yaml -n <namespace> command.
10 Execute the kubectl exec -it -n <namespace> <primary-pod-name> --
/bin/bash command in the primary server pod.
■ Change ownership of the DRPackages folder to service user using the chown
nbsvcusr:nbsvcusr /mnt/nblogs/DRPackages command.
15 Restart the NetBackup services in primary server pod and external media
server.
■ Execute the following command in the primary server pod:
kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash
16 Configure a storage unit on external media server that is used during catalog
backup.
17 Perform catalog recovery from NetBackup Administration Console.
For more information, refer to the NetBackup Troubleshooting Guide.
18 Execute the kubectl exec -it -n <namespace> <primary-pod-name> --
/bin/bash command in the primary server pod.
The MSDP Scaleout services are not interrupted when MSDP engines are added.
Note: Due to some Kubernetes restrictions, the MSDP operator restarts the engine
pods to attach the existing and new volumes, which can cause a short
downtime of the services.
Expanding existing data or catalog volumes
To expand the data or catalog volumes using the kubectl command directly
◆ Run the following command to increase the requested storage size in the
spec.dataVolumes field or in the spec.catalogVolume field.
kubectl -n <sample-namespace> edit msdpscaleout <your-cr-name>
[-o json | yaml]
Sometimes the Amazon EBS CSI driver may not respond to the volume expansion request
promptly. In this case, the operator retries the request by adding 1 byte to the
requested volume size to trigger the volume expansion again. If the retry is successful,
the actual volume capacity can be slightly larger than the requested size.
Due to a limitation of the Amazon EBS CSI driver, the engine pods must be restarted
to resize the existing volumes. This can cause a short downtime of the services.
MSDP Scaleout does not support the following:
■ Cannot shrink the volume size.
■ Cannot change the existing data volumes other than for storage expansion.
■ Cannot expand the log volume size. You can do it manually. See “Manual storage
expansion” on page 97.
■ Cannot expand the data volume size for MDS pods. You can do it manually.
See “Manual storage expansion” on page 97.
3 Apply new CR YAML to stop MSDP operator from reconciling and repairing
the pods automatically.
kubectl apply -f <your-cr-yaml>
Note: If you add new MSDP Engines later, the new Engines will respect the CR
specification only. Your manual changes would not be respected by the new Engines.
update the node-selector for the CR accordingly. If you create another node
group, the new node-selector does not take effect until you manually delete the
pods and deployments from the old node group, or delete the old node group
directly to have the pods re-scheduled to the new node group.
■ Ensure that each EKS node supports mounting a number of data disks equal to the
number of data volumes plus 5.
For example, if you have 16 data volumes for each engine, then each of your EKS
nodes should support mounting at least 21 data disks. The additional 5 data disks
are for the potential MDS pod, Controller pod, or MSDP operator pod that may run on
the same node as the MSDP engine.
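The sizing rule above is simple arithmetic and can be written as a one-line helper. The function name min_mountable_disks is hypothetical; the margin of 5 comes from the text.

```shell
# Hypothetical helper: minimum number of data disks that each EKS node
# must be able to mount, given the number of data volumes per engine.
# Usage: min_mountable_disks <data-volumes-per-engine>
min_mountable_disks() {
    # 5 extra disks are reserved for a possible MDS, Controller,
    # or MSDP operator pod on the same node.
    echo $(( $1 + 5 ))
}

# 16 data volumes per engine require at least 21 mountable data disks.
min_mountable_disks 16
```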
Credentials, bucket name, and sub bucket name must be the same as the
recovered Cloud LSU configuration in the previous MSDP Scaleout deployment.
Configuration file template:
If the LSU cloud alias does not exist, you can use the following command to
add it.
/usr/openv/netbackup/bin/admincmd/csconfig cldinstance -as -in
<instance-name> -sts <storage-server-name> -lsu_name <lsu-name>
3 On the first MSDP Engine of MSDP Scaleout, run the following command for
each cloud LSU:
sudo -E -u msdpsvc /usr/openv/pdde/pdcr/bin/cacontrol --catalog
clouddr <LSUNAME>
Option 2: Stop MSDP services in each MSDP engine pod. MSDP service starts
automatically.
kubectl exec <sample-engine-pod> -n <sample-cr-namespace> -c
uss-engine -- /usr/openv/pdde/pdconfigure/pdde stop
MSDP Cloud backup and disaster recovery
Scenario 2: MSDP Scaleout and its data is lost and the NetBackup primary
server was destroyed and is re-installed
1 Redeploy MSDP Scaleout on an EKS cluster by using the same CR parameters
and new NetBackup token.
2 When MSDP Scaleout is up and running, reuse the cloud LSU on NetBackup
primary server.
/usr/openv/netbackup/bin/admincmd/nbdevconfig -setconfig
-storage_server <STORAGESERVERNAME> -stype PureDisk -configlist
<configuration file>
Credentials, bucket name, and sub bucket name must be the same as the
recovered Cloud LSU configuration in previous MSDP Scaleout deployment.
Configuration file template:
If KMS is enabled, set up the KMS server and import the KMS keys.
If the LSU cloud alias does not exist, you can use the following command to
add it.
/usr/openv/netbackup/bin/admincmd/csconfig cldinstance -as -in
<instance-name> -sts <storage-server-name> -lsu_name <lsu-name>
3 On the first MSDP Engine of MSDP Scaleout, run the following command for
each cloud LSU:
sudo -E -u msdpsvc /usr/openv/pdde/pdcr/bin/cacontrol --catalog
clouddr <LSUNAME>
5 Get the token from the target domain NetBackup Web UI.
Navigate to Security > Token. In the Create token window, enter the token
name and other required details. Click Create.
For more information, see the NetBackup Web UI Administrator’s Guide.
6 Add replication targets for the disk pool in replication source domain.
In the Disk pools tab, click on the disk pool link.
Click Add to add the replication target.
7 In the Add replication targets window:
■ Select the replication target primary server.
■ Provide the target domain token.
■ Select the target volume.
■ Provide the target storage credentials.
Click Add.
Option Description
Available options:
Run MSDP commands with non-root user msdpsvc after logging in to an engine
pod.
For example, sudo -E -u msdpsvc <command>
The MSDP Scaleout services in the engine pods run as the non-root user
msdpsvc. If you run the MSDP Scaleout services or commands as the root user,
MSDP Scaleout may stop working due to file permission issues.
3 If the reclaim policy of the storage class is Retain, run the following command
to restart the existing MSDP Scaleout. MSDP Scaleout starts with the existing
data/metadata.
kubectl apply -f <your-cr-yaml>
Note: All affected pods or other Kubernetes workload objects must be restarted
for the change to take effect.
4 After the CR YAML file update, existing pods are terminated and restarted one
at a time, and the pods are re-scheduled for the new node group automatically.
Note: Controller pods are temporarily unavailable when the MDS pod restarts.
Do not delete pods manually.
5 Run the following command to change MSDP Scaleout operator to the new
node group:
kubectl msdp init -i <your-ecr-url>/msdp-operator:<version> -s
<storage-class-name> -l agentpool=<new-nodegroup-name>
6 If the node selector does not match any existing nodes at the time of the change, you
see a message on the console.
If auto scaling for nodes is enabled, the issue may resolve automatically as the new
nodes are made available to the cluster. If an invalid node selector is provided,
pods may go into the pending state after the update. In that case, run the
command above again.
Do not delete the pods manually.
Chapter 12
Uninstalling MSDP
Scaleout from EKS
This chapter includes the following topics:
When an MSDP Scaleout CR is deleted, the critical MSDP data and metadata
is not deleted. You must delete it manually. If you delete the CR without cleaning
up the data and metadata, you can re-apply the same CR YAML file to restart
MSDP Scaleout again by reusing the existing data.
2 If your storage class uses the Retain policy, note the PVs
that are associated with the CR PVCs so that you can delete them at the Kubernetes
cluster level.
kubectl get pod,svc,deploy,rs,ds,pvc,secrets,certificates,issuers,cm,sa,role,rolebinding -n <sample-namespace> -o wide
4 If your storage class uses the Retain policy, you must delete the EBS volumes
on the Amazon console or by using the AWS CLI.
aws ec2 delete-volume --volume-id <value>
■ -k: Delete all resources of MSDP Scaleout operator except the namespace.
3 If your storage class uses the Retain policy, you must delete the EBS volumes
on the Amazon console or by using the AWS CLI.
aws ec2 delete-volume --volume-id <value>
■ Resolving the issue where the NetBackup server pod is not scheduled for long
time
■ Resolving an issue where the primary server or media server deployment does
not proceed
Verify that both pods display Running in the Status column and both deployments
display 2/2 in the Ready column.
pod/x10-240-0-15.veritas.internal 2/2 Running 0 59m
1556:30248/TCP 54m
service/environment-sample-primary LoadBalancer 10.0.206.39 13781:30246/TCP,13782:30498/TCP,1556:31872/TCP,443:30049/TCP,8443:32032/TCP,22:31511/TCP 87m
service/x10-240-0-12-veritas-internal LoadBalancer 10.0.44.188 10082:31199/TCP 68m
service/x10-240-0-13-veritas-internal LoadBalancer 10.0.21.176 10082:32439/TCP,10102:30284/TCP 68m
service/x10-240-0-14-veritas-internal LoadBalancer 10.0.25.99 10082:31810/TCP,10102:31755/TCP 68m
service/x10-240-0-15-veritas-internal LoadBalancer 10.0.185.135 10082:31664/TCP,10102:31811/TCP 68m
Once in the primary server shell prompt, to see the list of logs, run:
ls /usr/openv/logs/
StorageServer=PureDisk:nbux-systest-media-1;
Report=PDDO Stats for (nbux-systest-media-1):
scanned: 4195521 KB, CR sent: 171002 KB, CR sent over FC: 0 KB,
dedup: 95.9%, cache disabled, where dedup space saving:6.6%,
compression space saving:89.3%
*** - Info bpbkar (pid=19109) done. status: 42: network read failed
To resolve this issue, update the sysctl.conf values for NetBackup servers
deployed on the EKS cluster.
NetBackup image sets following values in sysctl.conf during EKS deployment:
■ net.ipv4.tcp_keepalive_time = 180
■ net.ipv4.tcp_keepalive_intvl = 10
■ net.ipv4.tcp_keepalive_probes = 20
■ net.ipv4.ip_local_port_range = 14000 65535
These settings are persisted at the location /mnt/nbdata/etc/sysctl.conf.
There are two ways to modify these values:
■ Modify the value in both /etc/sysctl.conf and
/mnt/nbdata/etc/sysctl.conf and run the sysctl -p command to load the
modified values.
■ Modify the values in /mnt/nbdata/etc/sysctl.conf and restart the pod. The
new values are reflected after the pod restart.
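Before and after modifying the values, it can help to confirm what a given sysctl.conf file actually contains. The following is a minimal sketch that compares a file against the four values listed above; the function name check_sysctl_conf is hypothetical, and the expected values are copied from this section, so adjust them if you deliberately tune different values.

```shell
# Hypothetical helper: report keys in a sysctl.conf-style file whose values
# differ from the values listed above. Returns non-zero on any mismatch.
# Usage: check_sysctl_conf /mnt/nbdata/etc/sysctl.conf
check_sysctl_conf() {
    conf=$1
    status=0
    while read -r key expected; do
        # Take the last "key = value" assignment for this key, trimmed of spaces.
        actual=$(awk -F= -v k="$key" '
            { name = $1; gsub(/[[:space:]]/, "", name) }
            name == k { val = $2; gsub(/^[[:space:]]+|[[:space:]]+$/, "", val); found = val }
            END { print found }
        ' "$conf")
        if [ "$actual" != "$expected" ]; then
            echo "mismatch: $key (expected '$expected', found '${actual:-unset}')"
            status=1
        fi
    done <<EOF
net.ipv4.tcp_keepalive_time 180
net.ipv4.tcp_keepalive_intvl 10
net.ipv4.tcp_keepalive_probes 20
net.ipv4.ip_local_port_range 14000 65535
EOF
    return "$status"
}
```

The function only reads the file. It does not load the values, so the sysctl -p command or a pod restart is still required as described above.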
If external media servers are used, perform the steps in the following order:
1. Add the following in /usr/openv/netbackup/bp.conf:
HOST_HAS_NAT_ENDPOINTS = YES
2. Add the following sysctl configuration values in /etc/sysctl.conf on external
media servers to avoid any socket connection issues:
■ net.ipv4.tcp_keepalive_time = 180
■ net.ipv4.tcp_keepalive_intvl = 10
■ net.ipv4.tcp_keepalive_probes = 20
■ net.ipv4.ip_local_port_range = 14000 65535
■ net.core.somaxconn = 4096
When you deploy NetBackup for the first time, perform the steps for primary CR
and media CR.
To resolve an invalid license key issue for Primary/Media CR
1 Get the configmap name created for primary CR or media CR using the
following command:
kubectl get configmap -n <namespace>
2 Edit the license key stored in configmap using the following command:
kubectl edit configmap <primary/media-configmap-name> -n
<namespace>
3 Update value for ENV_NB_LICKEY key in the configmap with correct license
key and save.
4 Delete respective primary/media pod using the following command:
kubectl delete pod <primary/media-pod-name> -n <namespace>
Note: Ensure that you copy spec information of the media server CR. The spec
information is used to reapply the media server CR.
3 Delete respective persistent volume claim using the kubectl delete pvc
<pvc_name> -n <namespace> command. Any available persisted data is
deleted.
4 Add the mediaServer section, update the license key, and reapply the
environment.yaml using the kubectl apply -f <environment.yaml>
command.
3 Depending on the output of the command and the reason for the issue, perform
the required steps and update the environment CR to resolve the issue.
Error: ERROR Storage class with the <storageClassName> name does not exist.
After fixing this error, the primary server or media server CR does not require any
changes. In this case, the NetBackup operator reconciler loop is invoked every
10 hours. If you want the changes to take effect and invoke the NetBackup operator
reconciler loop immediately, delete and reapply the primary server or media server
CR.
Note: To reuse the mediaServer section information, you must save it and
apply the yaml again with the new changes using the kubectl apply -f
<environment.yaml> command.
2 Check the pod events for more details about the probe failure using the
following command:
kubectl describe pod/<podname> -n <namespace>
Kubernetes automatically tries to resolve the issue by restarting the pod after
the liveness probe times out.
3 Depending on the error in the pod logs, perform the required steps or contact
technical support.
The NetBackup media server and primary server were in the running state, and the
media server persistent volume claim or the media server pod was then deleted. In
this case, reinstalling the respective media server can cause the issue.
To resolve token issues
1 Open the NetBackup web UI using the primary server hostname given in the
primary server CR status.
2 Navigate to Security > Host Mappings.
3 Click Actions > Allow auto reissue certificate for the respective media server
name.
4 Delete the data and logs PVCs for the respective media server only, using the
kubectl delete pvc <pvc-name> -n <namespace> command.
A new media server pod and new PVCs for the same media server are created.
2 Run the kubectl get nodes command to ensure that the nodes from the
newly created nodepool are used in your cluster.
3 Delete the primary server CR using the kubectl delete -f
<environment.yaml> command.
2 For media server CR: Delete the media server CR by removing the
mediaServer section in the environment.yaml and save the changes.
Note: Ensure that you copy spec information of the media server CR. The spec
information is used to reapply the media server CR.
To resolve this issue, execute the following command in the primary server pod:
kubectl exec -it -n <namespace> <primary-server-pod-name> -- /bin/bash
Refer to the NetBackup Security and Encryption Guide to configure KMS manually.
For other troubleshooting issues related to KMS, refer to the NetBackup
Troubleshooting Guide.
pod/netbackup-operator-controller-manager-5df6f58b9b-6ftt9 1/2 ImagePullBackOff 0 13s
4 Run the kubectl get pv command and verify that the status of the PVs is Available.
5 For the PV to be claimed by a specific PVC, add the claimRef spec field with
the PVC name and namespace using the following command:
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": {"apiVersion": "v1", "kind": "PersistentVolumeClaim", "name": "<PVC name>", "namespace": "<namespace of PVC>"}}}'
For example,
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": {"apiVersion": "v1", "kind": "PersistentVolumeClaim", "name": "data-testmedia-media-0", "namespace": "test"}}}'
When you add the claimRef, use the correct PVC name and namespace for each
PV. The mapping must be the same as it was before the namespace or the PVC
was deleted.
6 Deploy environment CR that deploys the primary server and media server CR
internally.
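Because step 5 has to be repeated for every PV, the patch can be generated rather than typed by hand. The following is a minimal sketch under that assumption; claimref_patch and bind_pv are hypothetical helper names, and the KUBECTL variable is only there to allow a dry run.

```shell
#!/bin/sh
# Allow the kubectl binary to be overridden, for example KUBECTL=echo for a dry run.
KUBECTL=${KUBECTL:-kubectl}

# claimref_patch PVC_NAME PVC_NAMESPACE
# Prints the JSON patch that sets spec.claimRef to the given PVC.
claimref_patch() {
    printf '{"spec":{"claimRef":{"apiVersion":"v1","kind":"PersistentVolumeClaim","name":"%s","namespace":"%s"}}}' "$1" "$2"
}

# bind_pv PV_NAME PVC_NAME PVC_NAMESPACE
# Applies the claimRef patch to one PV.
bind_pv() {
    "$KUBECTL" patch pv "$1" -p "$(claimref_patch "$2" "$3")"
}
```

For example, bind_pv <pv-name> data-testmedia-media-0 test reserves the PV for the PVC used in the example above. Call it once per PV, with the same PV-to-PVC mapping that existed before the deletion.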
If the output shows STATUS as Failed as in the example above, check the primary
pod log for errors with the command:
$ kubectl logs pod/environment-sample-primary-0 -n <namespace>
pod/netbackup-operator-controller-manager-6c9dc8d87f-pq8mr 0/2 Pending 0 15s
1 Run:
$ docker load -i images/pdk8soptr-16.0.tar.gz
Sample output:
"sha256:353d2bd50105cbc3c61540e10cf32a152432d5173bb6318b8e"
The second copy is created in Azure Container Registry (ACR). To check this copy,
do the following:
1 Run:
$ docker image tag msdp-operator:16.0 testregistry.azurecr.io/msdp-operator:16.0
2 Run:
$ docker image ls | grep msdp-operator
Sample output:
16.0: digest: sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3 size: 2622
Sample output:
[
"testregistry.azurecr.io/msdp-operator@sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3"
]
Sample output:
[
"msdp-operator",
]
Sample output:
{
"changeableAttributes": {
"deleteEnabled": true,
"listEnabled": true,
"readEnabled": true,
"writeEnabled": true
},
"createdTime": "2022-02-01T13:43:26.6809388Z",
"digest": "sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3",
"lastUpdateTime": "2022-02-01T13:43:26.6809388Z",
"name": "16.0",
"signed": false
}
The third copy is located on a Kubernetes node running the container after it is
pulled from the registry. To check this copy, do the following:
1 Run:
$ kubectl get nodes -o wide
3 You can interact with the node session from the privileged container:
chroot /host
Sample output:
"sha256:353d2bd50105cbc3c61540e10cf32a152432d5173bb6318b8e"
null
Sample output:
[
"testregistry.azurecr.io/msdp-operator@sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3"
]
null
How to make sure that you are running the correct image
Use the steps given above to identify the image ID and digest, and compare them
with the values obtained from the registry and from the Kubernetes node that runs
the container.
Sample output:
EEB_MSDP_16.0_PET3980928_SET3992009_EEB2
EEB_MSDP_16.0_PET3980928_SET3992010_EEB2
EEB_MSDP_16.0_PET3980928_SET3992018_EEB1
Wed Feb 2 20:48:14 UTC 2022: End
Wed Feb 2 20:48:14 UTC 2022: Listing strings for EEBs installed in testregistry.azurecr.io/uss-mds:16.0-2.
EEB_MSDP_16.0_PET3980928_SET3992008_EEB1
EEB_MSDP_16.0_PET3992020_SET3992019_EEB2
EEB_MSDP_16.0_PET3980928_SET3992010_EEB2
Wed Feb 2 20:48:15 UTC 2022: End
Alternatively, if the nbbuilder script is not available, you can view the installed
EEBs by executing the following command:
$ docker run --rm <image_name>:<image_tag> cat /usr/openv/pack/pack.summary
Sample output:
EEB_NetBackup_10.0Beta6_PET3980928_SET3992004_EEB1
EEB_NetBackup_10.0Beta6_PET3980928_SET3992021_EEB1
EEB_NetBackup_10.0Beta6_PET3980928_SET3992022_EEB1
EEB_NetBackup_10.0Beta6_PET3980928_SET3992023_EEB1
EEB_NetBackup_10.0Beta6_PET3992020_SET3992019_EEB2
EEB_NetBackup_10.0Beta6_PET3980928_SET3992009_EEB2
EEB_NetBackup_10.0Beta6_PET3980928_SET3992016_EEB1
EEB_NetBackup_10.0Beta6_PET3980928_SET3992017_EEB1
Note: The pack directory may be in a different location in each of the uss-*
containers. For example: /uss-controller/pack, /uss-mds/pack,
/uss-proxy/pack.
To resolve this issue, restart the NetBackup operator by deleting the NetBackup
operator pod using the following command:
kubectl delete pod <NetBackup-operator-pod-name> -n <namespace>
■ Secret
■ MSDP Scaleout CR
Secret
The Secret is the Kubernetes security component that stores the MSDP credentials
that are required by the CR YAML.
stringData:
  # Please follow MSDP guide for the credential characters and length.
  # https://www.veritas.com/content/support/en_US/article.100048511
  # The pattern is "^[\\w!$+\\-,.:;=?@[\\]`{}\\|~]{1,62}$"
  username: xxxx
  password: xxxxxx
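Before creating the Secret, a candidate username or password can be checked locally against the pattern quoted in the comment above. The sketch below translates that pattern into a POSIX extended regular expression, assuming \w means ASCII letters, digits, and underscore and that the value is a single line. The function name valid_msdp_credential is hypothetical, and the MSDP guide may impose further rules that this check does not cover.

```shell
#!/bin/sh
# valid_msdp_credential VALUE
# Returns 0 when VALUE is 1 to 62 characters long and uses only the characters
# allowed by the pattern in the Secret template; returns non-zero otherwise.
valid_msdp_credential() {
    # In the bracket expression, "]" comes first and "-" comes last so that both
    # are treated as literal characters. LC_ALL=C keeps the ranges ASCII-only.
    printf '%s' "$1" |
        LC_ALL=C grep -Eq '^[]A-Za-z0-9_!$+,.:;=?@[`{}|~-]{1,62}$'
}
```

For example, valid_msdp_credential 'msdp_user1' succeeds, while a value that contains a space or is longer than 62 characters fails.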
MSDP Scaleout CR
■ The CR name must be less than 40 characters.
■ The MSDP credentials stored in the Secret must match MSDP credential rules.
See Deduplication Engine credentials for NetBackup
■ MSDP CR cannot be deployed in the namespace of MSDP operator. It must be
in a separate namespace.
■ You cannot reorder the IP/FQDN list. You can update the list by appending the
information.
■ You cannot change the storage class name. The storage class must be backed
with Amazon EBS CSI driver "ebs.csi.aws.com".
■ You cannot change the data volume list other than for storage expansion. It is
append-only and storage expansion only. Up to 16 data volumes are supported.
■ Like the data volumes, the catalog volume can be changed for storage expansion
only.
■ You cannot change or expand the size of the log volume by changing the MSDP
CR.
■ You cannot enable NBCA after the configuration.
■ Once KMS and the OST registration parameters are set, you cannot change them.
■ You cannot change the core pattern.
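The first rule in the list above is easy to check before applying the CR. The following sketch covers only the length limit; valid_cr_name is a hypothetical helper, and it does not check the general Kubernetes object naming rules.

```shell
#!/bin/sh
# valid_cr_name NAME
# Returns 0 when NAME is non-empty and shorter than 40 characters.
valid_cr_name() {
    [ -n "$1" ] && [ "${#1}" -lt 40 ]
}
```

For example, valid_cr_name "sample-cr" && echo "name length OK".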
MSDP Scaleout CR template:
fqdn: "sample-fqdn4"
#
# Optional annotations to be added in the LoadBalancer services for the Engine IPs.
# In case we run the Engines on private IPs, we need to add some customized annotations to the LoadBalancer services.
# loadBalancerAnnotations:
# # If it's an AKS environment, specify the following annotation to use the internal IPs.
# # see https://docs.microsoft.com/en-us/azure/aks/internal-lb
# service.beta.kubernetes.io/azure-load-balancer-internal: "true"
# # If the internal IPs are in a different subnet as the AKS cluster, the following annotation should be
# # specified as well. The subnet specified must be in the same virtual network as the AKS cluster.
# service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "apps-subnet"
#
# # If your cluster is EKS, the following annotation item is required.
# # The subnet specified must be in the same VPC as your EKS.
# service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-04c4728ec4d0ecb90"
#
# SecretName is the name of the secret which stores the MSDP credential.
# AutoDelete, when true, will automatically delete the secret specified by SecretName after the
# initial configuration. If unspecified, AutoDelete defaults to true.
# When true, SkipPrecheck will skip webhook validation of the MSDP credential. It is only used in data re-use
# scenario (delete CR and re-apply with pre-existing data) as the secret will not take effect in this scenario. It
# cannot be used in other scenarios. If unspecified, SkipPrecheck defaults to false.
credential:
  # The secret should be pre-created in the same namespace which has the MSDP credential stored.
  # The secret should have a "username" and a "password" key-pairs with the corresponding username and password values.
  # Please follow MSDP guide for the rules of the credential.
  # https://www.veritas.com/content/support/en_US/article.100048511
  # A secret can be created directly via kubectl command or with the equivalent YAML file:
  # kubectl create secret generic sample-secret --namespace sample-namespace \
  #   --from-literal=username=<username> --from-literal=password=<password>
  secretName: sample-secret
  # Optional
  # Default is true
  autoDelete: true
  # Optional
  # Default is false.
  # Should be specified only in data re-use scenario (aka delete and re-apply CR with pre-existing data)
  skipPrecheck: false
#
# Paused is used for maintenance only. In most cases you do not need to specify it.
#
# When it is specified, MSDP operator stops reconciling the corresponding MSDP-X cluster (aka the CR).
# Optional.
# Default is false
# paused: false
#
# The storage classes for logVolume, catalogVolume and dataVolumes should be:
# - Backed with Azure disk CSI driver "disk.csi.azure.com" with the managed disks, and allow volume
# expansion.
# - The Azure in-tree storage driver "kubernetes.io/azure-disk" is not supported. You need to explicitly
# enable the Azure disk CSI driver when configuring your AKS cluster, or use k8s version v1.21.x which
# has the Azure disk CSI driver built-in.
# - In LRS category.
# - At least Standard SSD for dev/test, and Premium SSD or Ultra Disk for production.
# - The same storage class can be used for all the volumes.
# -
#
# LogVolume is the volume specification which is used to provision a volume of an MDS or Controller
# Pod to store the log files and core dump files.
# It is not allowed to be changed.
# In most cases, 5-10 GiB capacity should be big enough for one MDS or
- storageClassName: sample-aws-disk-sc3
  resources:
    requests:
      storage: xxTi
#
# NodeSelector is used to schedule the MSDPScaleout Pods on the specified nodes.
# Optional.
# Default is empty (aka all available nodes)
nodeSelector:
  # e.g.
  # agentpool: nodegroup2
  sample-node-label1: sample-label-value1
  sample-node-label2: sample-label-value2
#
# NBCA is the specification for the MSDP-X cluster to enable NBCA SecComm for the Engines.
# Optional.
nbca:
  # The master server name
  # The allowed length is in range 1-255
  masterServer: sample-master-server-name
  # The CA SHA256 fingerprint
  # The allowed length is 95
  cafp: sample-ca-fp
  # The NBCA authentication/reissue token
  # The allowed length is 16
  # For security consideration, a token with maximum 1 user allowed and valid for 1 day should be sufficient.
  token: sample-auth-token
#
# KMS includes the parameters to enable KMS for the Engines.
# We support to enable KMS in init or post configuration.
# We do not support to change the parameters once they have been set.
# Optional.
kms:
  # As either the NetBackup KMS or external KMS (EKMS) is configured or registered on NetBackup master server, then used by
  # MSDP by calling the NetBackup API, kmsServer is the NetBackup master server name.
  kmsServer: sample-master-server-name
  keyGroup: sample-key-group-name
#
# The minimal allowed value is 4 and the maximum allowed value is 30.
# A default value 30 minutes is used if not specified. Set it to 0 to disable the option.
# It is not allowed to change unless in maintenance mode (paused=true), and the change will not apply until the Engine Pods and the LoadBalancer services get recreated.
# For AKS deployment in 10.0 release, please leave it unspecified or specify it with a value larger than 4.
# tcpIdleTimeout: 30