Event report: SSUG @SC17

We had well over 150 participants at the Spectrum Scale User Group meeting at this year’s Supercomputing Conference in Denver, USA, held on the afternoon of Sunday, November 12, 2017.

A highlight of the event was the set of user talks, including speakers from the Max Delbrück Center, University of Pennsylvania Medical, DESY / European XFEL, and the University of Birmingham. We also heard about improvements to restriping and mmfsck, multiple enhancements resulting from CORAL work, and other new features coming with Scale 5.0. We’re getting all the talks uploaded, so you can check out the details for yourself. We ran a little bit late as usual, but that was due to all the great questions from the audience. Thanks as always for making this an interactive event.

We’re looking for a site to host the next US event in the spring. If you’re interested in hosting or have ideas for a talk, please reach out to Kristy or Bob at the US user group committee.

Update: Slides are available online

Spectrum Scale Users Group at Supercomputing – November 2016

First off, your feedback, please! If you were at the event, please give us feedback here: https://www.surveymonkey.com/r/SSUGSC16 We’ve only heard from 9 attendees so far, and there were a lot more of you there :-).

The slides should be available soon here: http://www.spectrumscale.org/presentations/

Kicking Things Off

This was the second year for a Spectrum Scale Users Group meeting at SC. To start things off, Doris Conti welcomed the group and reminded us that the IBMers are here to present, but also to listen to users’ experiences: the good, the bad (the ugly? 🙂), how Spectrum Scale is being used in the field, and what should be coming in the product next from the users’ perspective. Doris complimented what she called the heroes of the data centers, the people managing PBs of data short-handed, if not single-handedly. The developers then introduced themselves so people could talk to them at the break or during the week.

Spectrum Scale Users Group Argonne National Lab – June 2016

MidWest Users Group Event

Many thanks to Argonne National Lab (ANL) for hosting our recent Spectrum Scale Users Group Event. It’s nice to have an event in the MidWest given the geographic spread in the US. Bob Oesterlin kicked things off with a social event Thursday night so some of us could share stories prior to the actual UG day. Friday morning was focused on IBM presentations and the majority of the afternoon went to user presentations.

To start, we received updates on both Spectrum Scale and ESS from Scott McFadden. Some notable priorities for 2016 include making sure that US customers have the opportunity and the channel to give feedback to development early in the process and to shape the result. Scott noted that there’s been great feedback in this respect from the UK users, so you heard it: let your voice be heard, US users! There should be some follow-up traffic on the mailing lists about that, so watch that space if you are interested. There will also be news about more open betas that are accessible in a downloadable VM.

Additionally, input from the PMR and field teams is being leveraged more effectively. IBM is big and there’s recognition that in the past PMR and field information may not have been getting back to the development team effectively.

Security is another focus, and an internal security audit is ongoing, as mentioned at SC. Easier configuration of key management with ISKLM is coming in 4.2.1.

During the Problem Determination session many comments were put on screen about how tricky it is to know whether a GPFS cluster is healthy, and how tricky problem determination is when it’s not. To that end, an mmhealth command is being developed to report on all the key components in a cluster. This should answer the questions of what components to monitor, what command to use to do so, and how to interpret the results. The tool takes into consideration all of the interdependencies to report a high-level status of healthy, degraded or unhealthy. mmhealth is being reviewed with user input as it is being developed.
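To make the roll-up idea concrete, here is a minimal sketch of how a high-level status could be derived from per-component states and their dependencies. It is not how mmhealth is implemented. The component names, the `Status` enum, the `rollup` function and the "worst status wins" rule are all assumptions made for illustration.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that a larger value means a worse state.
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


def rollup(component_states, depends_on):
    """Return (overall, per_component) using a "worst status wins" rule.

    component_states: dict mapping component name -> Status (hypothetical names)
    depends_on: dict mapping component name -> list of components it relies on
    """

    def effective(name, seen):
        # A component is only as healthy as the worst thing it depends on.
        if name in seen:  # guard against dependency cycles
            return Status.HEALTHY
        seen = seen | {name}
        own = component_states[name]
        deps = [effective(dep, seen) for dep in depends_on.get(name, [])]
        return max([own, *deps])

    per_component = {
        name: effective(name, frozenset()) for name in component_states
    }
    overall = max(per_component.values(), default=Status.HEALTHY)
    return overall, per_component


# Hypothetical cluster: the filesystem relies on the network and the disks.
states = {
    "network": Status.DEGRADED,
    "disk": Status.HEALTHY,
    "filesystem": Status.HEALTHY,
}
deps = {"filesystem": ["network", "disk"]}
overall, per_component = rollup(states, deps)
print(overall.name, {k: v.name for k, v in per_component.items()})
```

In this sketch a filesystem that is fine on its own is still reported as degraded because the network it depends on is degraded, which is the kind of interdependency the session described.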

For the GUI tools there were both screenshots and a live demo. The audience was asked, “Raise your hand if you monitor waiters.” Lots of hands shot up. The follow-up was, “Keep your hand up if you fully understand them.” I think Sven was the only one who kept his hand up. The GUI tool is building up a long waiters component to categorize and document waiters.

Sven Oehme gave us an overview of some features coming with 4.2.1. On Scale 4.2 there are more than 700 tuning parameters, many of them undocumented but still used in production, so in many cases customers have a lot to figure out. It’s difficult for IBM to come up with default settings when the range of hardware and networking capabilities varies so wildly from site to site. Still, to make it easier on the customer, some auto-tuning capabilities are being added. For example, there will be a new worker thread setting that will auto-tune about 20 other related settings. Care is being taken to make sure that those who want to retain manual settings can do so, and there will be information about this in the documentation. Long term, the goal is to let the admin describe the system and let that inform parameter choices automatically.

The user presentations were interesting and covered campus Active Directory integration, using GPFS on ZFS, GPFS-HPSS-Integration (GHI), and using AFM as a burst buffer.

All presentations are available here: http://www.spectrumscale.org/presentations/

Q&A:

Here are some of the questions from the audience. There were many more, as people weren’t shy about interjecting:

Q: While there is appreciation for quick development changes, there is a concern for the quality of the releases.
A: This is an area that’s being actively reviewed for improvement and better regression testing to make sure changes don’t negatively impact performance.

Q: How much is compression being used in the wild?
A: Not much production use, but people are interested for future implementation. Generally it takes a year for new features to be adopted in deployment.

Q: With mmbackup, how much data can you back up?
A: It depends largely upon your infrastructure and how much you can parallelize. Multiple TSM servers can be used for Spectrum Scale now, but a discussion of your architecture would be required to answer with a numeric value.

Q: In the monitoring tool, can detailed tracking be seen?
A: Yes, at the granularity of individual filesystem calls such as setattr, mkdir, vget, getxattr, etc.

Q: What is the retention of the data that is behind the monitoring tools?
A: It’s configurable. The default is something like 1s resolution for 24 hours; after that the data is aggregated and the resolution is reduced.
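As an illustration of that kind of retention policy, here is a minimal sketch that keeps recent samples at full resolution and averages older ones into coarser buckets. It does not reflect the monitoring tool's actual implementation. The `downsample` function, the 60-second bucket size and the use of a simple mean are all assumptions; only the 24-hour full-resolution window comes from the answer above.

```python
from statistics import mean


def downsample(samples, now, full_res_window=24 * 3600, bucket=60):
    """Keep recent samples as-is and average older ones into coarser buckets.

    samples: list of (timestamp_seconds, value) pairs
    now: current time in seconds
    full_res_window: how long samples stay at full resolution (24 hours here)
    bucket: width in seconds of each aggregated bucket (assumed value)
    """
    cutoff = now - full_res_window
    recent = [(t, v) for t, v in samples if t >= cutoff]

    # Group every older sample by the start time of its bucket.
    buckets = {}
    for t, v in samples:
        if t < cutoff:
            start = t - (t % bucket)
            buckets.setdefault(start, []).append(v)

    aggregated = [(start, mean(vals)) for start, vals in sorted(buckets.items())]
    return aggregated + sorted(recent)


# Three samples older than 24 hours collapse into two 60-second buckets;
# the two recent samples are kept unchanged.
now = 100_000
samples = [(0, 1.0), (30, 3.0), (60, 5.0), (13_600, 7.0), (99_999, 9.0)]
result = downsample(samples, now)
print(result)
```

A real system would likely apply several tiers of decreasing resolution and might keep minimum and maximum values alongside the average; the single tier here is only meant to show the basic trade of resolution for retention.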

The next in-person event in the US is still being planned. Stay tuned.

Inaugural USA Meet the Devs

Well, we did it: we held our first “Meet the Devs” in the US, and I’ll venture to say it was successful. Let me emphasize the “we” involved in this event. This was a great joint effort between users and IBM, with special thanks to Janet Ellsworth of IBM for all she did to make this event happen, and to Doris Conti for making sure lots of IBMers, including lots of developers, came down from Poughkeepsie. Thanks also to the US co-principal Bob Oesterlin and Pallavi Galgali for keeping things moving right along behind the scenes. We’ll see Bob and Pallavi in person at the SC15 meeting in Austin.

There’s a poll we’re running to get feedback from the attendees and we’ll see how my own feelings match up with the group’s. The discussion was lively, and everyone felt comfortable to ask direct questions and talk about it all —the good, the bad and the ugly.

From an attendee perspective there was reasonable diversity as well with representatives from financial, medical/genetics, and university sectors.

So, what was covered? I’m glad you asked.
