This is a description of how to check the status of an HP Smart Array and report disk failures by email from the ESXi host using HP command-line utilities.
For this purpose, the HPSSACLI utility is used. It is part of the HP ESXi Utilities Offline Bundle and is also included in the customised HP ESXi images for ProLiant servers. The following instructions apply to an already existing ESXi host.
First, download the HP Smart Array driver and the HPSSACLI utility in VIB format. Both can be used on ESXi 5.5 and 6.0 hosts.
cd /tmp
wget http://vibsdepot.hpe.com/hpq/latest/esxi-600-drv-vibs/hpvsa/scsi-hpvsa-5.5.0.100-1OEM.550.0.0.1331820.x86_64.vib
wget http://vibsdepot.hpe.com/hpq/latest/esxi-600-vibs/hpssacli/hpssacli-2.30.6.0-6.0.0.vib
Install both files on the ESXi host using these commands:
esxcli software vib install -f -v /tmp/scsi-hpvsa-5.5.0.100-1OEM.550.0.0.1331820.x86_64.vib
esxcli software vib install -f -v /tmp/hpssacli-2.30.6.0-6.0.0.vib
reboot
With the hpssacli utility, you can check the disk status and compare it to the saved healthy state of the array. For this purpose, the healthy state of the HP Smart Array configuration must be saved to a location that is not overwritten during reboot.
Let us find out the controller configuration first:
/opt/hp/hpssacli/bin/hpssacli controller all show config
Then, use this information to store the HP Smart Array configuration in a newly created directory on datastore1:
mkdir /vmfs/volumes/datastore1/custom
/opt/hp/hpssacli/bin/hpssacli controller all show config > /vmfs/volumes/datastore1/custom/raid-good
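Before trusting the saved file as the healthy baseline, it may be worth checking that it does not already contain a degraded drive. The following is a minimal sketch, assuming that each logicaldrive and physicaldrive line of the `show config` output ends with its status in parentheses (for example `..., OK)`). The function name `list_unhealthy_drives` is hypothetical, and the sample text is illustrative only, not captured from a real controller.

```shell
#!/bin/sh
# Hypothetical helper: print every logicaldrive/physicaldrive line whose
# status (assumed to be the last field before the closing parenthesis)
# is anything other than "OK".
list_unhealthy_drives() {
    grep -E '(logicaldrive|physicaldrive)' "$1" | grep -v -E ', OK\)[[:space:]]*$'
}

# Illustrative sample only; the exact layout on a real host may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
   array A (SAS, Unused Space: 0 MB)
      logicaldrive 1 (279.4 GB, RAID 1, Interim Recovery Mode)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, Failed)
EOF

# Prints the logicaldrive line and the second physicaldrive line.
list_unhealthy_drives "$sample"
rm -f "$sample"
```

On the host, the function would be pointed at /vmfs/volumes/datastore1/custom/raid-good. If it prints nothing, every listed drive reported OK when the baseline was taken.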
For further commands and details on using the hpssacli utility, see: kallesplayground.wordpress.com/useful-stuff/hp-smart-array-cli-commands-under-esxi/
An email shall be sent to the administrators if a disk failure is detected. netcat is used to send the email, because there is no email software on an ESXi host.
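Because netcat only forwards bytes, the SMTP commands have to be written out by hand, with every line terminated by CR LF. The following is a minimal sketch of such a session, assuming a single recipient and a mail server that accepts unauthenticated mail on port 25. The function name `build_smtp_session` and all example addresses are hypothetical, and the sketch omits the Date header for brevity.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal SMTP session to stdout.
# Arguments: HELO host, sender address, recipient address, subject, body file.
build_smtp_session() {
    helo_host=$1
    sender=$2
    rcpt=$3
    subject=$4
    body_file=$5
    printf 'HELO %s\r\n' "$helo_host"
    printf 'MAIL FROM:<%s>\r\n' "$sender"
    printf 'RCPT TO:<%s>\r\n' "$rcpt"
    printf 'DATA\r\n'
    printf 'From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n' "$sender" "$rcpt" "$subject"
    # Double a leading dot so that a body line cannot end the DATA phase,
    # then convert every line ending to CR LF.
    sed 's/^\./../' "$body_file" | awk '{ printf("%s\r\n", $0) }'
    printf '.\r\nQUIT\r\n'
}

# Demo with a temporary body file; the session is only printed, not sent.
body=$(mktemp)
printf 'physicaldrive 1I:1:2 changed state\n' > "$body"
build_smtp_session esx01.example.com root@esx01.example.com admin@example.com "Raid check" "$body"
rm -f "$body"

# To actually send it, the output would be piped into netcat, e.g.
#   build_smtp_session ... | /bin/nc -i 1 mail.example.com 25
```

The session is written blindly without waiting for the server's replies, which is why netcat is given a one-second delay between lines with `-i 1`.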
The ESXi firewall must allow outgoing SMTP traffic. Create a file smtp.xml and add the following content:
<ConfigRoot>
<service id='1000'>
<id>SMTP_Outbound</id>
<rule>
<direction>outbound</direction>
<protocol>tcp</protocol>
<porttype>dst</porttype>
<port>25</port>
</rule>
<enabled>true</enabled>
<required>false</required>
</service>
</ConfigRoot>
Move the file to /etc/vmware/firewall/ and refresh the firewall rules:
mv smtp.xml /etc/vmware/firewall/
esxcli network firewall refresh
Now create a shell script check-raid.sh which monitors the disk status and sends out emails. Adapt this script to your needs by replacing the parameters in angle brackets with your values. Store the script in /vmfs/volumes/datastore1/custom.
#!/bin/sh
# declaration section
netcat="/bin/nc"
tmp="/tmp/raid-mail"
host="<hostname>"
hostname="$host.<domain>"
emailrcpt1="<admin1@domain.com>"
emailrcpt2="<admin2@domain.com>"
emailrcpt3="<admin3@domain.com>"
mailserver="<mailserver.domain.com>"
datetime=`date '+%a, %d %b %Y %H:%M:%S %z'`
# read the configuration
/opt/hp/hpssacli/bin/hpssacli controller all show config > /tmp/raid-current
curdiff=`/bin/diff -u /vmfs/volumes/datastore1/custom/raid-good /tmp/raid-current`
/bin/diff -u /vmfs/volumes/datastore1/custom/raid-good /tmp/raid-current > /tmp/raid-diff
# send email alert
if [ "$curdiff" != "" ] ; then
/bin/echo -e "HELO $hostname\r" > $tmp
/bin/echo -e "MAIL FROM: root@$hostname\r" >> $tmp
/bin/echo -e "RCPT TO: $emailrcpt1\r" >> $tmp
/bin/echo -e "RCPT TO: $emailrcpt2\r" >> $tmp
/bin/echo -e "RCPT TO: $emailrcpt3\r" >> $tmp
/bin/echo -e "DATA\r" >> $tmp
/bin/echo -e "From: root@$hostname\r" >> $tmp
/bin/echo -e "To: $emailrcpt1, $emailrcpt2, $emailrcpt3\r" >> $tmp
/bin/echo -e "Date: $datetime \r" >> $tmp
/bin/echo -e "Subject: Raid may be broken on $host\r" >> $tmp
/bin/echo -e "\r" >> $tmp
/bin/echo -e "====> A diff between production and current is:\r" >> $tmp
/bin/echo -e "\r" >> $tmp
/bin/awk '{printf("%s\r\n", $0);}' < /tmp/raid-diff >> $tmp
/bin/echo -e "\r" >> $tmp
/bin/echo -e "====> Full Raid Current Info:\r" >> $tmp
/bin/echo -e "\r" >> $tmp
/bin/awk '{printf("%s\r\n", $0);}' < /tmp/raid-current >> $tmp
/bin/echo -e "\r" >> $tmp
/bin/echo -e ".\r" >> $tmp
/bin/echo -e "quit\r" >> $tmp
$netcat -i 1 $mailserver 25 < $tmp
/bin/rm $tmp
# accept the current state as the new baseline, so each change is reported only once
/bin/cp /tmp/raid-current /vmfs/volumes/datastore1/custom/raid-good
fi
/bin/rm /tmp/raid-current
/bin/rm /tmp/raid-diff
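The script above decides whether to send mail by testing whether the captured diff output is empty. An alternative is to use the exit status of diff, which is 0 for identical files, 1 for differing files and greater than 1 on errors such as a missing baseline. The following is a minimal sketch of that variant. The function name `raid_changed` is hypothetical, and the demo uses temporary files with illustrative content instead of the real paths.

```shell
#!/bin/sh
# Hypothetical helper: compare a baseline with a current snapshot and
# store the unified diff in a third file.
# Returns 0 when the files differ (alert needed), 1 when they are
# identical, and 2 when diff itself failed (e.g. missing baseline).
raid_changed() {
    diff -u "$1" "$2" > "$3" 2>/dev/null
    case $? in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

# Demo with illustrative content in a temporary directory.
work=$(mktemp -d)
printf 'physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)\n' > "$work/good"
printf 'physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, Failed)\n' > "$work/current"

if raid_changed "$work/good" "$work/current" "$work/diff"; then
    echo "configuration changed:"
    cat "$work/diff"
fi
rm -rf "$work"
```

Distinguishing the error case makes it possible to treat a missing raid-good file as a problem of its own rather than as a silent "no change".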
Make the script executable:
chmod 755 /vmfs/volumes/datastore1/custom/check-raid.sh
The crontab is located at /var/spool/cron/crontabs/root. Add a line to this file to run the above script every hour, for example:
35 * * * * /vmfs/volumes/datastore1/custom/check-raid.sh > /dev/null 2>&1
Please note that changes to the crontab file are not persistent. Therefore, this entry must be recreated after booting the hypervisor. This can be done by adding the following line to /etc/rc.local.d/local.sh:
echo "35 * * * * /vmfs/volumes/datastore1/custom/check-raid.sh > /dev/null 2>&1" >> /var/spool/cron/crontabs/root
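A plain `>>` appends the entry every time the line is executed, so running it by hand on an already configured host would duplicate the cron job. The following is a minimal sketch of an idempotent variant. The function name `add_cron_entry` is hypothetical, and the demo writes to a temporary file rather than to the real crontab.

```shell
#!/bin/sh
# Hypothetical helper: append a crontab line only if the exact same
# line is not already present in the file.
add_cron_entry() {
    cron_file=$1
    cron_line=$2
    grep -qxF "$cron_line" "$cron_file" 2>/dev/null || printf '%s\n' "$cron_line" >> "$cron_file"
}

# Demo against a temporary file instead of the real crontab.
demo=$(mktemp)
entry='35 * * * * /vmfs/volumes/datastore1/custom/check-raid.sh > /dev/null 2>&1'
add_cron_entry "$demo" "$entry"
add_cron_entry "$demo" "$entry"   # second call adds nothing
cat "$demo"
rm -f "$demo"

# On the host, the target file would be /var/spool/cron/crontabs/root.
# Assumption: crond has to be restarted to pick up the change, e.g.
#   kill "$(cat /var/run/crond.pid)"
#   /usr/lib/vmware/busybox/bin/busybox crond
```

The crond restart shown in the comments is an assumption about the usual ESXi layout and should be verified on the host before it is added to local.sh.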
To test the setup, modify /vmfs/volumes/datastore1/custom/raid-good in order to generate a deviation from the current configuration. Then run /vmfs/volumes/datastore1/custom/check-raid.sh from the command line and watch for any error messages from the mail server. You should receive an email about the Smart Array configuration.
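One way to produce such a deviation without editing the file by hand is to rewrite the status of a single drive. The following is a minimal sketch, assuming that healthy drives are listed with a status of `OK` before the closing parenthesis. The function name `fake_drive_failure` is hypothetical, and the sample lines are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: copy a baseline file and change the first
# ", OK)" status found in it to ", Failed)".
fake_drive_failure() {
    awk '!done && sub(/, OK\)/, ", Failed)") { done = 1 } { print }' "$1" > "$2"
}

# Demo with illustrative content in a temporary directory.
work=$(mktemp -d)
cat > "$work/raid-good.bak" <<'EOF'
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
EOF
fake_drive_failure "$work/raid-good.bak" "$work/raid-good"
cat "$work/raid-good"
rm -rf "$work"

# On the host, keep a backup first and then overwrite the baseline, e.g.
#   cd /vmfs/volumes/datastore1/custom
#   cp raid-good raid-good.bak
#   fake_drive_failure raid-good.bak raid-good
```

Since check-raid.sh copies the current state over raid-good after sending the alert, the baseline is corrected by the test run itself. The backup copy only serves as a safeguard.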