mmperfmon query

The new Spectrum Scale has been out for a few weeks now, so I think I should introduce a new function that’s been added for the purpose of performance monitoring.
Some of you may remember the performance monitor that comes with GSS. It has now been added to Spectrum Scale, and records various metrics for GPFS, Object, NFS, and SMB (depending on what you have installed). It starts automatically for some metrics (other, more complex ones can be enabled). So here are a few variations to introduce the new command for looking at the metrics being gathered.

mmperfmon query Metric[,Metric....]  |  Key[,Key...]  | NamedQuery  [StartTime EndTime | Duration] [ options ]

or

mmperfmon query compareNodes ComparisonMetric  [StartTime EndTime | Duration] [ options ]

If you just want to look at the basic metrics for your node, then the usage query is the best place to start.

mmperfmon query usage
Legend: 
 1:    swift-test-01|CPU|cpu_user
 2:    swift-test-01|CPU|cpu_system
 3:    swift-test-01|Memory|mem_total
 4:    swift-test-01|Memory|mem_free
 5:    swift-test-01|Network|lo|net_r
 6:    swift-test-01|Network|lo|net_s

Row Timestamp            cpu_user cpu_system mem_total  mem_free  net_r   net_s
1   2015-06-25-09:41:10  23.21    8.93       1018.2 MB  139.4 MB  7.1 kB  7.1 kB
2   2015-06-25-09:41:11  51.06    21.28      1018.2 MB  157.7 MB  4.0 kB  4.0 kB
3   2015-06-25-09:41:12  19.35    8.06       1018.2 MB  138.7 MB  6.8 kB  6.8 kB
4   2015-06-25-09:41:13  48.78    19.51      1018.2 MB  157.6 MB  3.5 kB  3.5 kB
5   2015-06-25-09:41:14  25       6.67       1018.2 MB  138.1 MB  5.6 kB  5.6 kB    
6   2015-06-25-09:41:15  59.46    32.43      1018.2 MB  133.4 MB  6.3 kB  6.3 kB
7   2015-06-25-09:41:16  36.73    14.29      1018.2 MB  138.3 MB  4.5 kB  4.5 kB
8   2015-06-25-09:41:17  57.14    40         1018.2 MB  130.2 MB  2.3 kB  2.3 kB 
9   2015-06-25-09:41:18  17.54    12.28      1018.2 MB  151.4 MB  2.4 kB  2.4 kB
10  2015-06-25-09:41:19  16.33    22.45      1018.2 MB  161.8 MB  4.8 kB  4.8 kB

The usage query is also a good place to introduce some of the other available options.
By default, the command looks at the node that it is run on. To look at another node, just use the -N option:

mmperfmon query usage -N swift-test-02

If you want to change the timescale or number of rows, there are several ways to do this. StartTime and EndTime both use the standard YYYY-MM-DD-hh:mm:ss format. Or just use a single number to give a duration in seconds.

So between two different times:

mmperfmon query usage 2015-06-23-13:21:20 2015-06-23-13:23:20

And to get metrics for the last 120 seconds:

mmperfmon query usage 120
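
If you script these queries, the StartTime and EndTime strings can be generated rather than typed. Here is a minimal sketch in Python, assuming only the YYYY-MM-DD-hh:mm:ss format described above. The helper name query_window is made up for illustration, and the script only prints a command line; it does not run mmperfmon.

```python
from datetime import datetime, timedelta

# Timestamp layout described above: YYYY-MM-DD-hh:mm:ss
TIME_FORMAT = "%Y-%m-%d-%H:%M:%S"


def query_window(end, seconds):
    """Return (StartTime, EndTime) strings covering `seconds` up to `end`.

    Hypothetical helper for illustration; not part of mmperfmon.
    """
    start = end - timedelta(seconds=seconds)
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)


# A two-minute window ending at 13:23:20 on 23 June 2015.
start_time, end_time = query_window(datetime(2015, 6, 23, 13, 23, 20), 120)
print("mmperfmon query usage", start_time, end_time)
```

With the values shown, this prints the same two-timestamp command as the example above.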

Or you can change the number of metric ‘buckets’ in use and the time period that they cover:

mmperfmon query usage --bucket_size 10 --number_buckets 20

will give you 20 rows, each looking at 10 seconds of time.

Another favorite option is retrieving the data in a machine-readable form. Try the --csv option:

[root@swift-test-01 ~]# mmperfmon query usage --csv

Row,Timestamp,cpu_user,cpu_system,mem_memtotal,mem_memfree,netdev_bytes_r,netdev_bytes_s,df_free,df_total,df_used 

1,1435237142,21.670000,11.670000,1018248,156872,1264,1264,1959936,2097152,137216
2,1435237143,14.290000,26.980000,1018248,162176,7147,7147,1959936,2097152,137216
3,1435237144,65.790000,18.420000,1018248,145496,5324,5324,1959936,2097152,137216
4,1435237145,35.420000,10.420000,1018248,162288,4553,4553,1959936,2097152,137216
5,1435237146,23.730000,8.470000,1018248,150816,6826,6826,1959936,2097152,137216
6,1435237147,58.970000,23.080000,1018248,168836,4037,4037,1959936,2097152,137216
7,1435237148,20.630000,7.940000,1018248,150248,6774,6774,1959936,2097152,137216
8,1435237149,50.000000,22.730000,1018248,169132,4037,4037,1959936,2097152,137216
9,1435237150,19.670000,8.200000,1018248,145744,6009,6009,1959936,2097152,137216
10,1435237151,68.970000,20.690000,1018248,132752,6468,6468,1959936,2097152,137216
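
Once the data is in CSV form it is easy to post-process. The following is a rough sketch in Python, not an official tool: the function name parse_usage_csv is hypothetical, the sample text is the first three rows of the output above, and it assumes the Timestamp column holds Unix epoch seconds (which is what the values look like). It tolerates blank lines and stray whitespace in the pasted output.

```python
import csv
from datetime import datetime, timezone


def parse_usage_csv(text):
    """Parse `--csv` style output into a list of dicts.

    Assumption: Timestamp is Unix epoch seconds; every other
    column except Row is numeric.
    """
    # Drop blank lines and surrounding whitespace before parsing.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = []
    for record in csv.DictReader(lines):
        row = {
            name: float(value)
            for name, value in record.items()
            if name not in ("Row", "Timestamp")
        }
        row["Timestamp"] = datetime.fromtimestamp(
            int(record["Timestamp"]), tz=timezone.utc
        )
        rows.append(row)
    return rows


sample = """Row,Timestamp,cpu_user,cpu_system,mem_memtotal,mem_memfree,netdev_bytes_r,netdev_bytes_s,df_free,df_total,df_used

1,1435237142,21.670000,11.670000,1018248,156872,1264,1264,1959936,2097152,137216
2,1435237143,14.290000,26.980000,1018248,162176,7147,7147,1959936,2097152,137216
3,1435237144,65.790000,18.420000,1018248,145496,5324,5324,1959936,2097152,137216
"""

rows = parse_usage_csv(sample)
mean_cpu_user = sum(row["cpu_user"] for row in rows) / len(rows)
print(f"{len(rows)} rows, mean cpu_user {mean_cpu_user:.2f}")
```

The same approach should work for any of the other queries, as long as the columns are numeric.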

There are various other options. Check out the IBM Knowledge Center or the performance paper for more (see the links below).

So what other queries are available? There are some specific to the base GPFS. For example:

mmperfmon query gpfsNumberOperations

will give the read and write operations, and the bytes read and written.

Then there are several protocol-specific queries that you can use depending on which protocols you are running with. Here are examples for Object, NFS and SMB respectively:

mmperfmon query objObjLatency
mmperfmon query nfsThroughput
mmperfmon query smb2IORate

If you are running all three protocols on the same system then you can see metrics on all three together. For example:

mmperfmon query protocolThroughput

should give the basic throughput values for all the protocols at the same time. Ideal if you’re trying to work out which protocol has the highest throughput, or just curious about the behaviour of your system.

If you are having problems with a particular node, then try the compareNodes query to compare a single metric across all the nodes in your system. It needs an additional parameter naming the metric that you want to compare. Seeing the same metric side by side for every node can help track down a bottleneck or a malfunctioning node. For example:

[root@swift-test-01 ~]# mmperfmon query compareNodes cpu_system 
Legend: 
 1:    swift-test-01|CPU|cpu_system 
 2:    swift-test-02|CPU|cpu_system 

Row Timestamp             swift-test-01   swift-test-02
1   2015-06-25-13:07:55   25.64           23.08
2   2015-06-25-13:07:56   10.91           16.00
3   2015-06-25-13:07:57   22.50           2.06
4   2015-06-25-13:07:58   19.05           9.88
5   2015-06-25-13:07:59   17.95           2.04
6   2015-06-25-13:08:00   31.58           2.06
7   2015-06-25-13:08:01   23.08           3.06
8   2015-06-25-13:08:02   39.39           0.00
9   2015-06-25-13:08:03   21.95           2.04
10  2015-06-25-13:08:04   44.44           5.95
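
To turn a table like this into a quick per-node summary, you could average each column. This is only a sketch in Python under stated assumptions: the function name column_means is hypothetical, the input is the header row plus the data rows (with the Legend lines already removed), and the sample is the first three rows of the output above.

```python
def column_means(table_text):
    """Average each node column of a compareNodes-style table.

    Assumes the first non-blank line is the header
    (Row, Timestamp, then one column per node).
    """
    lines = [line.split() for line in table_text.splitlines() if line.strip()]
    header, data = lines[0], lines[1:]
    nodes = header[2:]  # skip the "Row" and "Timestamp" columns
    totals = dict.fromkeys(nodes, 0.0)
    for fields in data:
        for node, value in zip(nodes, fields[2:]):
            totals[node] += float(value)
    return {node: total / len(data) for node, total in totals.items()}


sample = """
Row Timestamp             swift-test-01   swift-test-02
1   2015-06-25-13:07:55   25.64           23.08
2   2015-06-25-13:07:56   10.91           16.00
3   2015-06-25-13:07:57   22.50           2.06
"""

means = column_means(sample)
for node, mean in means.items():
    print(f"{node}: {mean:.2f}")
```

A node whose average stands well apart from the others is a reasonable place to start looking.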

There are various other queries and options available so that you can tailor the command to your needs. You can also query individual metrics of your own choice:

[root@swift-test-01 ~]# mmperfmon query cpu_system,cpu_user 
Legend: 
 1:    swift-test-01|CPU|cpu_system 
 2:    swift-test-01|CPU|cpu_user 

Row Timestamp             cpu_system  cpu_user
1   2015-06-25-13:08:46   11.86       22.03
2   2015-06-25-13:08:47   45.45       51.52
3   2015-06-25-13:08:48   17.24       25.86
4   2015-06-25-13:08:49   32.08       20.75
5   2015-06-25-13:08:50   23.26       62.79
6   2015-06-25-13:08:51   11.86       23.73
7   2015-06-25-13:08:52   8.47        15.42
8   2015-06-25-13:08:53   22.22       55.56
9   2015-06-25-13:08:54   4.76        23.81
10  2015-06-25-13:08:55   23.91       73.91
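
If you call the command from a script, it can help to assemble the arguments in one place. The sketch below is in Python and uses only the pieces shown in this post (a comma-separated metric list, a duration in seconds, -N and --csv). The helper name build_query is hypothetical, and the script only prints the command line rather than running it.

```python
import shlex


def build_query(metrics, duration=None, node=None, csv_output=False):
    """Build an argument list for `mmperfmon query`.

    Hypothetical helper; it only uses the options shown in this post.
    """
    args = ["mmperfmon", "query", ",".join(metrics)]
    if duration is not None:
        args.append(str(duration))  # duration in seconds
    if node is not None:
        args += ["-N", node]  # query another node
    if csv_output:
        args.append("--csv")  # machine-readable output
    return args


args = build_query(["cpu_system", "cpu_user"], duration=120, csv_output=True)
print(shlex.join(args))
```

This prints "mmperfmon query cpu_system,cpu_user 120 --csv". On a cluster node, the same list could be handed to subprocess.run to actually execute the query.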

I hope this will encourage you to go and see what options are available, and find uses for this command. Please let IBM know of any feedback – we would like to improve this command!

IBM Knowledge Center: http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adm.doc/bl1adm_mmperfmon.htm

Paper on Tuning and Analysis: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Performance%20-%20Tuning%20and%20Analysis