SSUG::Digital: 002 – Best Practices for building a stretched cluster

Talk 2 in the SSUG::Digital series looks at how to build a stretched cluster. What are the best practices? What pitfalls are there? Why would you consider a stretched cluster built with Spectrum Scale, as opposed to one of the alternative approaches to high availability? How do stretched clusters work, and what considerations go into planning a successful stretched cluster? We will examine the theory behind Spectrum Scale stretched clusters, review some best practices for designing stretched clusters, and talk about a few cases where stretched clusters have been successfully deployed.

Download slides here



Q: For a DR use case where ClusterA and ClusterB are the two separate data centres (DC A and DC B), do I need my tiebreaker quorum node installed in Data Centre C?
A: (This is covered in the presentation.) It is recommended to have the tiebreaker quorum node at a third site, but it could be in one of the two sites, with the caveat that if that site goes down, the other site will not be able to stay up.
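
The caveat in the answer above follows from majority-based node quorum (one plus half of the defined quorum nodes must be reachable). The following is a minimal sketch of that arithmetic only, not of any Spectrum Scale API; the site names and node counts are hypothetical.

```python
# Sketch of majority-based node quorum across sites.
# Site names and node counts are hypothetical, for illustration only.

def quorum_threshold(total_quorum_nodes):
    """Minimum reachable quorum nodes: one plus half of the total (rounded down)."""
    return total_quorum_nodes // 2 + 1


def cluster_survives(quorum_nodes_per_site, failed_sites):
    """True if the quorum nodes at the surviving sites still form a majority."""
    total = sum(quorum_nodes_per_site.values())
    alive = sum(
        count
        for site, count in quorum_nodes_per_site.items()
        if site not in failed_sites
    )
    return alive >= quorum_threshold(total)


# Tiebreaker at a third site: either data site can fail.
three_site = {"site_a": 1, "site_b": 1, "site_c_tiebreaker": 1}
# Tiebreaker co-located at site A: losing site A loses the majority.
two_site = {"site_a": 2, "site_b": 1}

for failed in ("site_a", "site_b"):
    print(
        failed,
        "three-site survives:", cluster_survives(three_site, {failed}),
        "two-site survives:", cluster_survives(two_site, {failed}),
    )
```

With the tiebreaker co-located at site A, only a failure of site B is survivable, which is why a third site is the recommended placement.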

Q: The documentation shows that high-speed shared storage is needed. Does that mean the SAN fabric should be merged over ISL for volume allocation across sites?
A: When using Spectrum Scale replication for stretched clusters, there is no need for the SAN to be extended across the sites. The stretched cluster architecture described in the presentation works even when the underlying storage does not replicate the data across sites.

Q: Will there be any performance difference between an extended SAN and accessing NSDs over the network via their owner?
A: Aside from the protocol difference (block vs file), it depends on the type of connectivity you have to the SAN vs the network. Recent Spectrum Scale releases have added more resiliency in handling network problems (e.g. the proactiveReconnect feature was recently added to Spectrum Scale).

Q: Is 10 ms latency required between SiteA, SiteB and also the tiebreaker quorum node? Can my tiebreaker quorum node have higher latency?
A: Yes, the third site can have a higher latency, but it should still be “within reason”, so maybe double that number, i.e. 20 ms. It is recommended to keep it under a second.

Q: Is a tiebreaker node hosted on AWS or any other cloud provider a supported configuration?
A: Yes, we have customers who are using a public cloud for their third site.

Q: What is the RPO and RTO?
A: Remember that this is synchronous replication, so as long as you don’t run out of space on your storage, the RPO is 0. The RTO depends on your workload and infrastructure: the rate of data change, your storage and the WAN.

Q: How do I check/measure the rate of data change?
A: This really depends on the application and the rate at which it changes data. If you have already implemented Spectrum Scale, you can use the historical data from performance monitoring within Spectrum Scale to estimate the rate of data change.
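
As a rough illustration of the estimate described above, the sketch below derives an average and a peak write rate from cumulative "bytes written" samples. The sample values are invented and the function name is hypothetical; how you export the historical counters from your monitoring data is left open.

```python
# Estimate the rate of data change from cumulative "bytes written" samples.
# The sample values are invented; real numbers would come from whatever
# historical performance monitoring data you have collected.

def change_rates(samples):
    """Return (average, peak) write rate in bytes per second.

    samples: list of (timestamp_seconds, cumulative_bytes_written),
    sorted by strictly increasing timestamp.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    interval_rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        interval_rates.append((b1 - b0) / (t1 - t0))
    total_time = samples[-1][0] - samples[0][0]
    average = (samples[-1][1] - samples[0][1]) / total_time
    return average, max(interval_rates)


samples = [(0, 0), (60, 600_000_000), (120, 900_000_000), (180, 2_100_000_000)]
average, peak = change_rates(samples)
print(f"average: {average / 1e6:.1f} MB/s, peak: {peak / 1e6:.1f} MB/s")
```

Because replication is synchronous, the peak rate is probably the more useful figure when sizing the link between sites, though that depends on your workload.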

Q: Do you have any general tips/recommendations regarding CES in a stretched cluster?
A: The CES nodes in your cluster need to be split between the two sites, as they are still part of a single cluster. SMB performs its own locking with the ctdb component, so the latency between the CES nodes needs to be fairly small. Also be aware that if you have different address spaces on the two sites, there may not be an automated failover of services, and you may need to perform the failover manually.


SSUG::Digital: 001 – What is new in Spectrum Scale 5.0.5?

At each of our user group events, we pretty much always start off with “What’s new in XXX release?” and with Spectrum Scale 5.0.5 having just been released, we’re doing the same with the new series of SSUG::Digital events.


Download slides here


Blog Post: What is new in Spectrum Scale 5.0.5?


Q: How would one go about obtaining an IBM contact?
A: Do you have an IBM sales rep already? That would be the first person to contact. If you purchased through a business partner, that is your first point of contact.

Q: Are EUS releases going to be consumed preferentially by the ESS code distributions? That might make it easier to coordinate Spectrum Scale and ESS code levels when we need to update Spectrum Scale.
A: That’s the intention. Of course, based on the exact timing of releases and ESS needs for new functions, it might not work out in all cases.

Q: Regarding thin provisioning support: are there any test cycles for other vendors like Hitachi already happening?
A: Contact IBM to discuss the specific vendor requirements. If you have a specific piece of hardware you want to see supported, file an RFE.

Q: Does the support of compression mean that you will also support the FCM Modules in an ESS 3000 or in storage systems like the IBM Flash Systems 9000/7000/5000?
A: Support for FCM Modules of IBM Flash Systems and future ESS models is under evaluation for future Spectrum Scale releases.

Q: Any estimate of the performance differences between Spectrum Scale 4.2.3 and Spectrum Scale 5.0.5?
A: There are incremental performance improvements in every Spectrum Scale release. There is a significant performance jump from Spectrum Scale 4.2.3 to Spectrum Scale 5.0.0, made to meet the performance commitments for CORAL. Some performance improvements have been covered at previous User Group Meetings and are available online. It is also planned to provide a performance update in a future Expert Talk.

Q: Are there plans to make all-to-all daemon connections the default?
A: Not as the default at the moment. See Expert Talk 004 “Performance Update” for more details.

Q: Is all-to-all connection establishment limited to the nodes inside a cluster, or does it include all nodes from remote clusters that are already connected to a FS?
A: Local and remote.

Q: Is the “cp --preserve=xattr” feature something that 5.0.5 will enable for copies to 5.0.5, i.e. migrating data from 4.x to 5.x, or only from 5.0.5 to future versions?
A: You can copy files from Spectrum Scale 5.0.5 to previous and later versions while preserving extended attributes. You cannot copy the extended attributes from previous versions of Spectrum Scale to Spectrum Scale 5.0.5.

Q: Is the “cp --preserve…” a function of the RHEL release, or is there a version of cp included with Spectrum Scale?
A: The system calls listxattr, getxattr, and setxattr were extended to retrieve ACLs as extended attributes.

Q: So then the version of the Spectrum Scale file system doesn’t matter, just the version of RHEL? (For the cp question.)
A: Those system calls are extended by Spectrum Scale at the VFS layer, so it would depend on the version of Spectrum Scale (and kernel extensions).
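
To see what the listxattr and getxattr system calls mentioned above return for a given file, a small sketch using Python's Linux-only wrappers (os.listxattr and os.getxattr) might look like the following. The helper name read_xattrs is hypothetical, and which attribute names a Spectrum Scale file system exposes is not stated in the answers, so none are assumed here.

```python
import errno
import os
import tempfile


def read_xattrs(path):
    """Return {name: value_bytes} for the extended attributes visible on path.

    Relies on os.listxattr/os.getxattr, which Python provides on Linux only.
    Returns an empty dict when the platform or the file system does not
    support extended attributes; other errors are re-raised.
    """
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = os.listxattr(path)
    except OSError as exc:
        if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return {}
        raise
    attrs = {}
    for name in names:
        try:
            attrs[name] = os.getxattr(path, name)
        except OSError:
            # Attribute disappeared or is not readable by this user; skip it.
            continue
    return attrs


# Example: inspect a scratch file. The output depends on the file system
# and security modules in use, and is often just an empty dict.
with tempfile.NamedTemporaryFile() as tmp:
    print(read_xattrs(tmp.name))
```

Running the helper against the same file before and after a copy is one way to check whether the attributes were preserved.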

Q: Are there any plans for Scale Protocols to support SMB transparent failover?
A: As functionality is introduced into the Samba code base, IBM looks into how it can pick up support for it. This type of failover is a topic being discussed in the Samba community. However, it’s known to be a hard problem.

Q: Any news on NFS4.1 support?
A: It is planned to be supported later this year (subject to IBM plan commitment disclaimer).

Q: On Spectrum Scale 4.2.3 and RHEL 7, we’ve had problems with the Ganesha daemon using steadily more memory over time, requiring us to failover / stop / start / failback periodically. Have Ganesha’s memory requirements been reduced in Spectrum Scale 5.0.5, or is there better visibility into what is driving memory usage?
A: This issue was traced back to a C library memory allocation fragmentation issue. A fix was put into the Ganesha code to force the release of this unused fragmented memory. This fix was made available last year in the 5.0.x release stream.

Q: Hello all, what is the current strategy for supporting object protocols like S3? It is lacking currency.
A: We have a renewed focus on the Object protocol. We plan to support the Train release in the fall release. Going forward, we will try to update the Swift/S3 version once a year to make sure it stays current (subject to IBM plan commitment disclaimer). If you have a specific interest in S3 applications, please contact us, as we would like to hear about your requirements and use cases.

Q: We are working on a deployment of CSI 1.1.0. When is snapshot support happening?
A: Snapshot support is planned to be available in a CSI driver update coming in late 3Q or early 4Q 2020 (subject to IBM plan commitment disclaimer).

Q: Is there any news about the restriction of GUI HA with CSI?
A: This is a high priority requirement for the fall release. It’s not officially committed yet, but we are definitely trying to squeeze it in (subject to IBM plan commitment disclaimer).

User group host: Simon Thompson


Speakers:

Chris Maestas
Chris is an Executive Architect for IBM File and Object Storage Solutions with over 20 years of experience deploying and designing IT systems for clients in various spaces. He has experience scaling performance and availability with a variety of file system technologies. He has developed benchmark frameworks to test systems for reliability and to validate research performance data. He has also led global enablement sessions, online and face to face, discussing how best to position mature technologies like Spectrum Scale with emerging technologies in the Cloud, Object, Container or AI spaces.
Twitter: @cdmaestas

Mathias Dietz
Mathias works in the Spectrum Scale development team in Kelsterbach (Germany) as a software architect responsible for Reliability, Availability and Serviceability (RAS). Part of his role is to drive reliability improvements into Spectrum Scale and to improve Health & Performance monitoring, Proactive Services and CES failover.

Ismael Solis Moreno
Ismael works in the Spectrum Scale development team in Guadalajara, Mexico, as a data scientist and performance analyst. He is responsible for evaluating the performance of new Spectrum Scale features and releases. Part of his role is to analyze datasets to identify points of performance improvement, providing insights to the development teams.

Introducing SSUG::Digital

We’ve gone digital for a series of Spectrum Scale user group talks! As we’ve had to cancel our in-person events, we’ve been working out how we can do digital content in a way that works for the user group. We’ll be running a series of digital webinars along the lines of the talks we’d normally have at SSUG events hosted by the Spectrum Scale user group people.

During the series we’ll be pulling in the types of talks we’d have at our events and we’re sure you’ll see a number of familiar names and faces in our speaker line-up – as well as some new faces too! We’ve been working hard with IBM to work out the format and content of the events and whilst it might take a few goes to get the format right, we’re hopeful it will work out well.

Each of the events will be hosted by a member of the SSUG committee and we’ll have digital Q&A live during the event. We plan to post the videos on our YouTube channel and post the questions and answers on the user group website along with the slide decks. Anything we can’t answer during the talk, we’ll take away and try to get a response to afterwards!

Anyone can join the SSUG::Digital series using the links on each event page. We know it won’t be the same as the in-person events, but we hope you’ll join in and participate in the on-line series. We hope to run a talk every couple of weeks, but the schedule will be driven a little by how well people engage and speaker availability.

You can find details of all our Expert Talks on the SSUG website.

Due to the tooling we’re using, only the presenter and Q&A panel hosts will be able to speak, but you can ask questions in the live Q&A chat.

We appreciate that, as a global group, this timing doesn’t work for everyone, but a lot of the speakers are based in the US and Europe. We will be posting the videos of the talks shortly after each session.