How does the entitlement key work for partners looking to run in their labs for educational purposes?
The way it’s currently set up, we require entitlement (a purchase) of Data Management Edition (DME) or Data Access Edition (DAE) in order to gain access to the container images. If you are entitled to one of these editions of Spectrum Scale, the container images are accessible via the entitlement key.
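As a rough sketch of how the entitlement key is typically used (the secret name, namespace, and image path here are placeholders, not part of the official procedure), the key is turned into an image pull secret for the IBM entitled registry:

```shell
# Hypothetical sketch: create a pull secret for the IBM entitled
# registry (cp.icr.io) from your entitlement key. The username for
# entitled registry access is "cp"; the secret name and namespace
# below are placeholders for illustration.
oc create secret docker-registry ibm-entitlement-key \
  --docker-server=cp.icr.io \
  --docker-username=cp \
  --docker-password="<your-entitlement-key>" \
  -n <your-namespace>
```

Pods that reference this secret (directly or via the namespace's default service account) can then pull the entitled container images.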
What about air gap installations?
The switch to IBM Cloud Registry (ICR) improves air gap installations, as you can replicate the container images the same way you do with your OCP images. For details, please see https://www.ibm.com/docs/en/scalecontainernative?topic=appendix-airgap-setup-network-restricted-red-hat-openshift-container-platform-clusters.
Is it mandatory to have access to IBM Cloud or GitHub site? What if we have a secure site?
There is an air gap install available. You can mirror the registry, pull the images yourself, and then serve them internally to your own OpenShift cluster. For GitHub, you would take the command sets, pre-download the YAMLs, and then use them internally. You would need all of these in place before starting an install on an air gap setup.
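A minimal sketch of the mirroring flow described above, assuming `skopeo` is available on a connected host; the registry hostnames, image path, and tag are placeholders, not the documented values:

```shell
# Hypothetical air-gap mirroring sketch: copy an image from the IBM
# entitled registry to an internal registry reachable by the
# restricted OpenShift cluster. All names and tags are placeholders.
skopeo copy \
  --src-creds "cp:<your-entitlement-key>" \
  --dest-creds "<internal-user>:<internal-password>" \
  docker://cp.icr.io/<image-path>:<tag> \
  docker://registry.internal.example.com/<image-path>:<tag>

# YAML manifests pre-downloaded from GitHub on a connected host can
# then be applied from inside the restricted network:
oc apply -f ./downloaded-manifests/
```

The same pattern repeats for each required image; the product documentation linked above lists the full image set.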
Currently, CNSA supports mounting only one remote filesystem. Is it foreseen to deploy NSD servers in OpenShift pods in the future? Moreover, one worker node can host only one Core pod, because cluster and node files are placed in directories mounted from the host in order to make them persistent. If you have a bare metal cluster with, say, static physical machine worker nodes, you are forced to deploy just one Core pod per node even if your physical node could provide the computing resources to host multiple Core pods.
We’re looking at ECE and shared-nothing local disks. You will first see some of this in the Spectrum Fusion product and then on the CNSA side of things. You’ll also see more filesystems available for remote mount.
Is it foreseen to make multiple Core pods deployable on the same worker node in future releases?
Not anytime soon. We will look into this once we manage to eliminate the kernel module and support a FUSE-based deployment. We are working on FUSE but have no timeline so far.
What is the support for other types of containers, such as Singularity?
Currently we’re sticking with Kubernetes-related environments, but we have heard requests for Singularity as well.
For CNSA, the integration of CSI is great, but rolling update support is a must.
Agreed. We’re working on rolling upgrades!
If you have a fixed number of bare metal worker nodes, and one node fails, it would be useful to start the lost Core pod on one of the remaining worker nodes.
Well, we are infrastructure, not an application. With current CNSA (remote mount), we are basically just an interface to the storage cluster, so one Core pod per node is fine. Down the line, when we add local storage, it would be nice to reschedule a failed pod to another node, but unfortunately that would not work either, because the disks cannot be transferred.
When will the Helm charts for CNSA 188.8.131.52 be available?
Meanwhile they are available here:
Note that in the future, we’re going to combine the CSI and CNSA installs in order to simplify things. At that point, Helm will be less necessary as well.
- Episode 8: Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
- Episode 6: Persistent Storage for Kubernetes and OpenShift environments
- Episode 3: Spectrum Scale Strategy Update