How to Monitor the HP Smart Array configuration on a VMware ESXi host

Aim

This document describes how to check the status of an HP Smart Array and report disk failures by email from the ESXi host, using HP command-line utilities.

For this purpose, the HPSSACLI utility is used. It is part of the HP ESXi Utilities Offline Bundle and is also included in the customised HP ESXi images for ProLiant servers. The following instructions apply to an existing ESXi host.

Installation of the HP Smart Array drivers and utility

First, download the HP Smart Array drivers and the HPSSACLI utility in VIB format. Both can be used on ESXi 5.5 and 6.0 hosts.

cd /tmp
wget http://vibsdepot.hpe.com/hpq/latest/esxi-600-drv-vibs/hpvsa/scsi-hpvsa-5.5.0.100-1OEM.550.0.0.1331820.x86_64.vib
wget http://vibsdepot.hpe.com/hpq/latest/esxi-600-vibs/hpssacli/hpssacli-2.30.6.0-6.0.0.vib

Install both files on the ESXi host using these commands:

esxcli software vib install -f -v /tmp/scsi-hpvsa-5.5.0.100-1OEM.550.0.0.1331820.x86_64.vib
esxcli software vib install -f -v /tmp/hpssacli-2.30.6.0-6.0.0.vib

reboot
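
After the reboot it may be worth confirming that both packages are actually present. The following is a minimal sketch that filters the output of esxcli software vib list. The helper name vib_status is hypothetical, and the VIB names scsi-hpvsa and hpssacli are assumptions derived from the file names above.

```shell
# vib_status NAME: reads "esxcli software vib list" output on stdin and
# reports whether a VIB called NAME appears in the first column.
vib_status() {
    if awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'; then
        echo "$1: installed"
    else
        echo "$1: MISSING"
    fi
}

# Only query the host when esxcli is available (that is, on ESXi itself).
if command -v esxcli >/dev/null 2>&1; then
    for vib in scsi-hpvsa hpssacli; do
        esxcli software vib list | vib_status "$vib"
    done
fi
```

If one of the names is reported as MISSING, compare it with the names shown in the first column of the list, since they may differ from the file names.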

Save the HP Smart Array configuration and status

With the hpssacli utility, you can check the current disk status and compare it with a saved healthy state of the array. To do so, save the healthy state of the HP Smart Array configuration to a location that is not lost on reboot, such as a datastore.

Let us find out the controller configuration first:

/opt/hp/hpssacli/bin/hpssacli controller all show config

Then store this output in a newly created directory on datastore1:

mkdir /vmfs/volumes/datastore1/custom
/opt/hp/hpssacli/bin/hpssacli controller all show config > /vmfs/volumes/datastore1/custom/raid-good
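
Because every later comparison is made against this file, it should only be saved while the array really is healthy. The sketch below lists drive lines in the saved file that do not end in ", OK)". It assumes that logical and physical drives are printed on lines starting with logicaldrive or physicaldrive and that the status is the last item in the parentheses. The helper name unhealthy_lines is hypothetical.

```shell
# unhealthy_lines FILE: print logicaldrive/physicaldrive lines whose
# status (assumed to be the last field in the parentheses) is not OK.
unhealthy_lines() {
    awk '$1 == "logicaldrive" || $1 == "physicaldrive" {
        line = $0
        sub(/[ \t\r]+$/, "", line)          # ignore trailing whitespace
        if (line !~ /, OK\)$/) print line
    }' "$1"
}

baseline=/vmfs/volumes/datastore1/custom/raid-good
if [ -f "$baseline" ]; then
    bad=$(unhealthy_lines "$baseline")
    if [ -n "$bad" ]; then
        echo "Baseline contains drives that are not OK:"
        echo "$bad"
    fi
fi
```

No output means that every drive line in the baseline reports OK under these assumptions.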

For further commands and details on how to use the HPSSACLI utility, see: kallesplayground.wordpress.com/useful-stuff/hp-smart-array-cli-commands-under-esxi/

Automatically report a change of the disk status by email

An email is sent to the administrators if a disk failure is detected. Because there is no email software on an ESXi host, netcat is used to send the email.

Open outbound port 25 (SMTP) in the ESXi firewall.

Create a file smtp.xml with the following content:

<ConfigRoot>
  <service id='1000'>
    <id>SMTP_Outbound</id>
    <rule>
      <direction>outbound</direction>
      <protocol>tcp</protocol>
      <porttype>dst</porttype>
      <port>25</port>
    </rule>
    <enabled>true</enabled>
    <required>false</required>
  </service>
</ConfigRoot>

Move this file to /etc/vmware/firewall/:

mv smtp.xml /etc/vmware/firewall/

Refresh the network firewall rules on the ESXi host:

esxcli network firewall refresh
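
To check that the new rule was picked up, the ruleset list can be filtered for the id used in smtp.xml. This is a minimal sketch. The helper name ruleset_state is hypothetical, and it assumes that esxcli network firewall ruleset list prints the ruleset name in the first column and its enabled flag in the second.

```shell
# ruleset_state NAME: reads "esxcli network firewall ruleset list" output
# on stdin and prints the second column for NAME, or "absent".
ruleset_state() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1 }
        END { if (!found) print "absent" }'
}

# Only query the host when esxcli is available (that is, on ESXi itself).
if command -v esxcli >/dev/null 2>&1; then
    state=$(esxcli network firewall ruleset list | ruleset_state SMTP_Outbound)
    echo "SMTP_Outbound: $state"
fi
```

Custom ruleset files are commonly reported not to survive a reboot of the host, so it may be worth repeating this check after a restart.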

Check the disk status

Now create a shell script check-raid.sh that monitors the disk status and sends out emails. Adapt this script to your needs by replacing the parameters in angle brackets with your own values. Store the script in /vmfs/volumes/datastore1/custom.

#!/bin/sh
# declaration section
netcat="/bin/nc"
tmp="/tmp/raid-mail"
host="<hostname>"
hostname="$host.<domain>"
emailrcpt1="<admin1@domain.com>"
emailrcpt2="<admin2@domain.com>"
emailrcpt3="<admin3@domain.com>"
mailserver="<mailserver.domain.com>"
datetime=`date '+%a, %d %b %Y %H:%M:%S %z'`
# read the current configuration and compare it with the saved healthy state
/opt/hp/hpssacli/bin/hpssacli controller all show config > /tmp/raid-current
/bin/diff -u /vmfs/volumes/datastore1/custom/raid-good /tmp/raid-current > /tmp/raid-diff
# send an email alert if the diff is not empty
if [ -s /tmp/raid-diff ] ; then
	/bin/echo -e "HELO $hostname\r" > $tmp
	/bin/echo -e "MAIL FROM: root@$hostname\r" >> $tmp
	/bin/echo -e "RCPT TO: $emailrcpt1\r" >> $tmp
	/bin/echo -e "RCPT TO: $emailrcpt2\r" >> $tmp
	/bin/echo -e "RCPT TO: $emailrcpt3\r" >> $tmp
	/bin/echo -e "DATA\r" >> $tmp
	/bin/echo -e "From: root@$hostname\r" >> $tmp
	/bin/echo -e "To: $emailrcpt1, $emailrcpt2, $emailrcpt3\r" >> $tmp
	/bin/echo -e "Date: $datetime \r" >> $tmp
	/bin/echo -e "Subject: Raid may be broken on $host\r" >> $tmp
	/bin/echo -e "\r" >> $tmp
	/bin/echo -e "====> A diff between production and current is:\r" >> $tmp
	/bin/echo -e "\r" >> $tmp
	/bin/awk '{printf("%s\r\n", $0);}' < /tmp/raid-diff >> $tmp
	/bin/echo -e "\r" >> $tmp
	/bin/echo -e "====> Full Raid Current Info:\r" >> $tmp
	/bin/echo -e "\r" >> $tmp
	/bin/awk '{printf("%s\r\n", $0);}' < /tmp/raid-current >> $tmp
	/bin/echo -e "\r" >> $tmp
	/bin/echo -e ".\r" >> $tmp
	/bin/echo -e "quit\r" >> $tmp
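	# send the prepared SMTP dialogue to the mail server, one line per second
	$netcat -i 1 $mailserver 25 < $tmp
	/bin/rm $tmp
	# accept the current state as the new baseline, so that a change is reported only once
	/bin/cp /tmp/raid-current /vmfs/volumes/datastore1/custom/raid-good
fi
# clean up the temporary files
/bin/rm /tmp/raid-current
/bin/rm /tmp/raid-diff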
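
The SMTP part of the script is a fixed sequence of lines that must each end in CR LF. As an alternative to the repeated echo commands, the sequence could be generated by one function that accepts any number of recipients. The following is a sketch only. The function name smtp_dialogue is hypothetical, and it assumes that the shell on the host provides printf. In addition to what the script does, it doubles a leading dot in body lines, as the SMTP protocol expects, so that a line consisting of a single dot cannot end the message early.

```shell
# smtp_dialogue HELO SENDER SUBJECT BODYFILE RCPT...
# Writes a complete SMTP dialogue with CR LF line endings to stdout.
smtp_dialogue() {
    helo="$1"; sender="$2"; subject="$3"; body="$4"
    shift 4
    printf 'HELO %s\r\n' "$helo"
    printf 'MAIL FROM: %s\r\n' "$sender"
    for rcpt in "$@"; do
        printf 'RCPT TO: %s\r\n' "$rcpt"
    done
    printf 'DATA\r\n'
    printf 'From: %s\r\n' "$sender"
    printf 'Subject: %s\r\n' "$subject"
    printf '\r\n'
    # Convert the body to CR LF and double any leading dot.
    awk '{
        sub(/\r$/, "")
        if (substr($0, 1, 1) == ".") $0 = "." $0
        printf("%s\r\n", $0)
    }' "$body"
    printf '.\r\n'
    printf 'QUIT\r\n'
}
```

With the variables from the script, it could be used like this: smtp_dialogue "$hostname" "root@$hostname" "Raid may be broken on $host" /tmp/raid-diff "$emailrcpt1" "$emailrcpt2" | $netcat -i 1 $mailserver 25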

Make the script executable:

chmod 755 /vmfs/volumes/datastore1/custom/check-raid.sh

Add a line to the crontab

The crontab is located at /var/spool/cron/crontabs/root. Add a line to this file to run the above script every hour, for example:

35 * * * * /vmfs/volumes/datastore1/custom/check-raid.sh > /dev/null 2>&1

Please note that changes to the crontab file are not persistent. Therefore, the entry must be recreated each time the hypervisor boots. This can be done by adding the following line to /etc/rc.local.d/local.sh:

echo "35 * * * * /vmfs/volumes/datastore1/custom/check-raid.sh > /dev/null 2>&1" >> /var/spool/cron/crontabs/root
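
A plain echo appends the entry again each time it is executed, for example when local.sh is run by hand for testing. The following sketch appends the entry only if it is not already present. The function name add_cron_entry is hypothetical. The commented commands for restarting crond are an assumption about how the cron daemon is managed on ESXi 5.x and 6.x and should be verified on the host before use.

```shell
# add_cron_entry FILE LINE: append LINE to FILE unless an identical
# line is already present.
add_cron_entry() {
    grep -qxF -- "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

CRONTAB=/var/spool/cron/crontabs/root
ENTRY='35 * * * * /vmfs/volumes/datastore1/custom/check-raid.sh > /dev/null 2>&1'

# Only touch the crontab on the ESXi host itself.
if command -v esxcli >/dev/null 2>&1 && [ -f "$CRONTAB" ]; then
    add_cron_entry "$CRONTAB" "$ENTRY"
    # Assumption: crond must be restarted to pick up the change, e.g.
    # /bin/kill "$(cat /var/run/crond.pid)"
    # /usr/lib/vmware/busybox/bin/busybox crond
fi
```

If local.sh ends with an exit 0 line, these lines have to be placed before it.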

Test the script

Modify /vmfs/volumes/datastore1/custom/raid-good in order to generate a deviation from the current configuration. Then run /vmfs/volumes/datastore1/custom/check-raid.sh from the command line and watch for any error messages from the mail server. You should receive an email about the Smart Array configuration.
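
The test works because the script reacts to any difference between the two files, not only to a failed disk. The following sketch reproduces this comparison with two temporary files, so it can be tried without touching the real baseline. The function name config_changed and the sample drive line are hypothetical.

```shell
# config_changed GOOD CURRENT: succeeds only when diff reports a difference
# (exit status 1); identical files or a diff error count as "no change".
config_changed() {
    diff -u "$1" "$2" > /dev/null 2>&1
    [ $? -eq 1 ]
}

work=$(mktemp -d)
printf 'physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 1 TB, OK)\n' > "$work/raid-good"
cp "$work/raid-good" "$work/raid-current"

if config_changed "$work/raid-good" "$work/raid-current"; then
    echo "identical files: alert"
else
    echo "identical files: no alert"
fi

# Simulate the manual modification of the saved baseline.
echo "# test marker" >> "$work/raid-good"

if config_changed "$work/raid-good" "$work/raid-current"; then
    echo "modified baseline: alert"
else
    echo "modified baseline: no alert"
fi

rm -r "$work"
```

On the host, appending a comment line to raid-good has the same effect, and the script restores the file from the current configuration after the alert has been sent.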

Editor: Peter Sinn

Last modified: 23.07.2021