random walk through fields of musings

Tuesday, February 3, 2009

Netapp OnTap per-volume statistics

I'm a fan of Netapps, and I've grown used to using the OnTap SNMP agent to collect statistics on various filer functions, stuff them into RRDs and troubleshoot using trend graphs (I really like drraw for that). I can then make cool graphs like:

[Image: drraw graph]

However, those ops numbers are for the filer as a whole, which makes it hard to tell *what* is the target of all that activity. We need per-volume and even per-disk stats to find hot spots. The "stats" command in OnTap can expose that data, but only via the CLI and not via SNMP (maybe via the SDK, but that's a whole level of complication I'd like to avoid). The full list of supported counters is available from the command stats list counters. So, using ssh with public keys (and running stats stop/start every 5 minutes), I was able to plug the output of the stats command into the existing set of RRDs that drraw draws from. Here's the per-volume read/write/other ops for the same filer:

[Image: drraw graph]

and one showing total time spent serving along with amount of data served in bytes:

[Image: drraw graph]

and bandwidth by volume:

[Image: drraw graph]

The relevant code fragment is quite simple and follows the SNMP_util::snmpmaptable semantics: each "row" of instance data is sent to a callback function, which writes it into the appropriate RRD (an example invocation is included):


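# Counters to collect per object type; 'type' is used as the stats identifier (-I).
my $getStatCols = {};
$getStatCols->{'volume'}->{'type'} = "volStats";
@{$getStatCols->{'volume'}->{'cols'}} = ("read_data", "read_latency", "read_ops",  "write_data", "write_latency", "write_ops", "other_latency", "other_ops");

# $username and $netappHostname are assumed to be set elsewhere.
for my $stat (keys %{$getStatCols}){
  # Builds a selector such as "volume:*:read_data volume:*:read_latency ..."
  my($stattxt) = "${stat}:*:" . join(" ${stat}:*:", @{$getStatCols->{$stat}->{'cols'}});
  getStatTable($username, $netappHostname, $getStatCols->{$stat}->{'type'}, $stattxt,  \&printfun);
}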

sub getStatTable {
  my($user, $host, $id, $statcmd, $callback) = @_;

  # Stop the previous collection (which prints its results), then start the next one.
  my(@ret) = `ssh ${user}\@${host} "stats stop -I ${id} -O print_zero_values=off -c -d |"`;
  my(@start) = `ssh ${user}\@${host} "stats start -I ${id} ${statcmd}"`;

  # Any non-whitespace output from "stats start" is treated as an error.
  my($startcmd) = join(' ', @start);
  if ($startcmd){
    $startcmd =~ s/\s+//g;
    if ($startcmd ne ""){
      print STDERR "ERROR starting stats collection for ${user}\@${host} $statcmd: " . $startcmd . "\n";
    }
  }

  if ($#ret <= 2){
    print STDERR "ERROR retrieving stats from ${user}\@${host} ${statcmd}: " . join(" ", @ret) . "\n";
  } else {
    for (my $i=0;$i<=$#ret;$i++){
      my($l) = $ret[$i];
      $l =~ s/^\s+//g;
      $l =~ s/\s+$//g;
      # Make sure every row ends with the delimiter (the pipe must be escaped in the regex).
      if ($l !~ /\|$/){
        $l .= "|";
      }
      my(@cols) = ($id, $i, split(/\|/, $l));
      # Hand the row to the callback, snmpmaptable-style.
      $callback->(@cols);
    }
  }
}

sub printfun {
  my(@vals) = @_;
  # Replace missing or empty fields with 0 so every column has a value.
  for (my $i=0;$i <= $#vals; $i++){
    if (! $vals[$i] || $vals[$i] eq ''){
      $vals[$i] = 0;
    }
  }
  #next if (! @vals);
  #next if (! $vals[0] || ! $vals[1]);
  print join('|', @vals) . "|\n";
}
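
The printfun callback above only prints each row. A callback that feeds an RRD might look roughly like the sketch below. This is an illustration, not the code behind the graphs: the $rrd_dir path, the one-file-per-volume naming scheme, the assumption that the first split field is the instance (volume) name, and the assumption that each RRD's data sources are named after the counters are all hypothetical. Rows that are not data (header lines, if the output has any) would need to be filtered out before it is called. It only builds the argument list for rrdtool update and prints it, so it can be tried without a filer or an RRD.

```perl
use strict;
use warnings;

# Hypothetical layout: one RRD per stats id and instance,
# e.g. /var/rrd/volStats_vol0.rrd
my $rrd_dir = "/var/rrd";

# Counter names in the order they were requested; ASSUMED to match the
# data source (DS) names inside each RRD.
my @ds_names = ("read_data", "read_latency", "read_ops", "write_data",
                "write_latency", "write_ops", "other_latency", "other_ops");

# Build the argument list for "rrdtool update" from one callback row.
# Row layout follows the callback above: (id, row index, fields...), where
# the first field is ASSUMED to be the instance (volume) name.
sub rrd_update_args {
    my ($id, $row, $instance, @values) = @_;
    return () if (! defined $instance || $instance eq '');

    # Keep the file name safe: anything outside [A-Za-z0-9_.-] becomes "_".
    (my $safe = $instance) =~ s/[^A-Za-z0-9_.-]/_/g;

    # Pad missing or empty values with 0, as printfun does; ignore extras.
    my @vals = map { (defined $values[$_] && $values[$_] ne '') ? $values[$_] : 0 }
               0 .. $#ds_names;

    return ("update", "${rrd_dir}/${id}_${safe}.rrd",
            "--template", join(":", @ds_names),
            join(":", "N", @vals));
}

# Callback with the same calling convention as printfun. It prints the
# command instead of running it.
sub rrdfun {
    my @args = rrd_update_args(@_);
    print "rrdtool " . join(" ", @args) . "\n" if (@args);
    # To actually write the sample, something like:
    #   system("rrdtool", @args) == 0 or warn "rrdtool update failed\n";
}

# Example row: id, row index, volume name, then eight counter values
# (the seventh is empty and becomes 0).
rrdfun("volStats", 3, "vol0", 1024, 15, 7, 2048, 30, 4, "", 2);
```

Passing \&rrdfun instead of \&printfun to getStatTable would be the only change at the call site. Using "N" as the timestamp lets rrdtool stamp the sample with the current time, which fits a 5-minute polling cycle.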