UNIX Tutorials, Tips, Tricks and Shell Scripts

ProcMonUX - a Simple Lightweight Linux Process Monitor Script with Alerts, Restart and Logging (UNIX compatible)

If you do not have the time or resources to learn, install and configure a more complex Linux process monitoring tool, this simple (and free) shell script may be the right option for you. ProcMonUX is a simple shell script that will monitor Linux process availability, send you an email when a process is not running, and automatically restart the process for you.

This process monitoring script can be used to monitor just one process, or multiple processes. Although it uses the BASH shell and runs on Linux, it also works with UNIX and has been tested using the Korn shell.

ProcMonUX consists of two parts: (1) the shell script that handles the Linux process monitoring, and (2) the automated running of the monitoring script. The first part we will look at is the script itself, stepping through it line by line.

IMPORTANT: BEFORE you deploy a script (or any new program, patch, application, database, upgrade, etc., for that matter) in a production environment, always test it in a non-production environment (e.g., development or QA) to ensure it behaves as expected. This will keep your constituents happy and will enable you to make it home in time for dinner.  =)

NOTE: The number at the start of each line is included for reference purposes.

  1 PS=/bin/ps
  2 GREP=/bin/grep
  3 AWK=/bin/awk
  4 LOG=/root/lfl/bin/logs/lfl_py_cb_chk.log
  5 MAIL_RECIP="your_email@your_domain.com"
  7 PROC_CNT=2
  9 PROC_NAME[1]="mysqlquotad"
 10 PROC_ACTION[1]="restart"
 11 PROC_CMD[1]="nohup /usr/bin/perl /usr/local/bin/mysqlquotad /etc/mysql_quota.conf &"
 13 PROC_NAME[2]="nginx"
 14 PROC_ACTION[2]="email"
 15 PROC_CMD[2]="/etc/init.d/nginx start"
 17 i=0
 18 while [ $i -ne $PROC_CNT ]
 19 do
 20   (( i=i+1 ))
 21   #echo "${PROC_NAME[$i]}"
 22   $PS -ef | $GREP ${PROC_NAME[$i]} | $GREP -vq grep
 23   FOUND=$?
 24   if [ $FOUND -ne 0 ]
 25   then
 26     DATE=$(date)
 27     echo "${DATE}: ${PROC_NAME[$i]} not found!" >> $LOG
 28     if [ ${PROC_ACTION[$i]} == "restart" ]
 29     then
 30       DATE=$(date)
 31       mail -s "LFL/PROC/MONITORING: process ${PROC_NAME[$i]} was not found! Attempting to restart..." $MAIL_RECIP < /dev/null
 32       echo "${DATE}: attempting to restart ${PROC_NAME[$i]} with ${PROC_CMD[$i]} ..." >> $LOG
 33       su - root -c "${PROC_CMD[$i]}"
 34     elif [ ${PROC_ACTION[$i]} == "email" ]
 35     then
 36       echo "${DATE}: sending email notification to ${MAIL_RECIP} ..." >> $LOG
 37       mail -s "LFL/PROC/MONITORING: process ${PROC_NAME[$i]} was not found! MANUAL INTERACTION REQ'D!!!" $MAIL_RECIP < /dev/null
 38     fi
 39   fi
 40 done
 42 exit

Lines 1-5: create five shell script variables. The first three point to system commands and utilities (AWK is defined but is not actually called anywhere in this version of the script), and you may or may not need to change these values for your system. You can use the "which" command to determine whether they need to be updated. For example, ...

-bash-3.2# which ps
...displays the path to the "ps" command (used to display information about active processes). The same approach can be used for the grep and awk utilities (script lines 2 and 3).
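
If you would rather not hard-code these paths, they can be looked up when the script starts. The following is a minimal sketch and is not part of the original script: find_tool is a hypothetical helper name, and the fallback paths are simply the ones used in lines 1-3.

```shell
#!/bin/bash
# find_tool (hypothetical helper): print the full path of a command,
# or report an error and return non-zero if it is not in PATH.
find_tool() {
  command -v "$1" 2>/dev/null || {
    echo "ERROR: $1 not found in PATH" >&2
    return 1
  }
}

# Fall back to the hard-coded paths from the script if the lookup fails.
PS=$(find_tool ps)     || PS=/bin/ps
GREP=$(find_tool grep) || GREP=/bin/grep
AWK=$(find_tool awk)   || AWK=/bin/awk

echo "ps:   $PS"
echo "grep: $GREP"
echo "awk:  $AWK"
```

Keep in mind that cron runs scripts with a minimal PATH, so the hard-coded values may still be the safer choice when a utility lives in an unusual directory.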

Line 4 defines a variable for a log file for the script to write to. It's not absolutely necessary to have this log file, but it may be useful for historical information or if you want to enhance the script to log additional details while monitoring processes.

Line 5 should be obvious: it is the address an email is sent to when a monitored process is not found. Insert your email here, or the email for your manager if you want to harass him/her. (Just a little joke.)

Lines 7-15: These lines work in concert with each other. The value stored in the PROC_CNT shell variable is the number of processes the script will monitor. In this sample script, only two processes are being monitored: "mysqlquotad" and "nginx".

For each unique process (name) you want to monitor, you will need a set of lines similar to lines 9-11 and lines 13-15. Since this sample script only monitors two processes, there are only two sets of these lines.

If you wanted to monitor five processes, you would have five sets of these lines. The number within the brackets for each variable name would increment by one, and PROC_CNT (line 7) would need to be set to the number of processes you are monitoring...or to put another way, the number of sets of these lines you have.

Each set of lines (e.g., lines 9-11) sets the values for three different shell arrays. The names of the shell arrays are PROC_NAME, PROC_ACTION and PROC_CMD.

PROC_NAME - is the name of the process you want to monitor. In this sample shell script, the two process names are "mysqlquotad" (line 9) and "nginx" (line 13). The process name is the name you will see in the CMD column of output from the "ps" command. Because line 22 greps the full "ps -ef" listing, any line containing this string counts as a match, so choose a name specific enough that it does not also match unrelated processes.

PROC_ACTION - is the action the script should take if the monitored process is not found. For this script, there are only two options: "restart" or "email". The "email" option will cause the script to create an entry in the log file (as defined in line 4) AND send an email (to the address defined in line 5). It will NOT automatically restart the process. The "restart" option will do everything the "email" option does, but will also attempt to restart the process using the command you specified for PROC_CMD (lines 11 and 15 in the sample script). *You can always define more options that are valid for PROC_ACTION, but you will need to update the script accordingly.

PROC_CMD - this is the command the script will run when attempting to restart the process, and is only relevant if PROC_ACTION is set to "restart".
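
To illustrate adding another set of lines, here is a sketch of the configuration section with a third monitored process. The third set ("crond" and its start command) is a hypothetical example and not part of the original script. The sketch also shows one possible way to avoid updating PROC_CNT by hand, by taking the count from the size of the PROC_NAME array; this assumes every set is filled in and the index numbers have no gaps.

```shell
#!/bin/bash
PROC_NAME[1]="mysqlquotad"
PROC_ACTION[1]="restart"
PROC_CMD[1]="nohup /usr/bin/perl /usr/local/bin/mysqlquotad /etc/mysql_quota.conf &"

PROC_NAME[2]="nginx"
PROC_ACTION[2]="email"
PROC_CMD[2]="/etc/init.d/nginx start"

# Third set: hypothetical example values, adjust for your system.
PROC_NAME[3]="crond"
PROC_ACTION[3]="email"
PROC_CMD[3]="/etc/init.d/crond start"

# Derive the count from the number of names defined instead of
# hard-coding it (assumes indexes 1..N are all set, with no gaps).
PROC_CNT=${#PROC_NAME[@]}
echo "Monitoring $PROC_CNT processes"
```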

Line 17: initializes the loop counter ("i") to 0

Line 18: begins the while loop. Since the loop counter (the value stored in the variable "i", currently 0) is not equal to the value stored in PROC_CNT (2), we enter the loop.

Line 20: increments the loop counter by 1

Line 21: this line is currently commented out with a pound/hash character ("#"), but you can uncomment it for troubleshooting if needed.
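
The loop skeleton from lines 17-21 and line 40 can be tried on its own, with the troubleshooting echo uncommented and the monitoring logic left out. This is only a sketch for experimenting: the function name list_procs is hypothetical and was added so the loop can be called (and its output captured) as a unit.

```shell
#!/bin/bash
PROC_CNT=2
PROC_NAME[1]="mysqlquotad"
PROC_NAME[2]="nginx"

# list_procs (hypothetical name): the same counter-driven while loop
# as the script, reduced to the debugging echo from line 21.
list_procs() {
  local i=0
  while [ "$i" -ne "$PROC_CNT" ]
  do
    (( i=i+1 ))
    echo "${PROC_NAME[$i]}"
  done
}

list_procs
```

Running it prints one process name per pass through the loop, which is a quick way to confirm that PROC_CNT matches the number of sets you defined.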

Line 22: runs the ps command (stored in the PS shell variable) and greps for the value stored in PROC_NAME[$i], or PROC_NAME[1] for the first time through the loop. Looking at line 9, we see that PROC_NAME[1] contains the value "mysqlquotad". The output from that grep command is piped (|) to another grep command: $GREP -vq grep. The "-v" option performs an inverted match, so the grep command that looks for the process name (which itself appears in the ps listing) is not counted, and the "-q" option prevents anything from being written to standard output. For example, ...

-bash-3.2# ps -ef | grep mysqlquotad
root 7269 3057 0 11:14 pts/3 00:00:00 grep mysqlquotad
root 12049 1 0 10:17 ? 00:00:00 /usr/bin/perl /usr/local/bin/mysqlquotad /etc/mysql_quota.conf

-bash-3.2# ps -ef | grep mysqlquotad | grep -v grep
root 12049 1 0 10:17 ? 00:00:00 /usr/bin/perl /usr/local/bin/mysqlquotad /etc/mysql_quota.conf

Line 23: captures the exit status of line 22 (the grep sequence) and stores it in the shell variable FOUND. The exit status of a pipeline is the exit status of its last command, so if the process name was found, the value will be zero. If it was not found, the value will be one.
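
To see the two possible values of FOUND without touching a live system, the grep sequence can be run against a canned copy of the ps output shown above. This is a sketch only: check_listing is a hypothetical helper name, and it reads the listing from standard input instead of calling ps.

```shell
#!/bin/bash
# Canned sample taken from the ps output in the example above.
SAMPLE='root 7269 3057 0 11:14 pts/3 00:00:00 grep mysqlquotad
root 12049 1 0 10:17 ? 00:00:00 /usr/bin/perl /usr/local/bin/mysqlquotad /etc/mysql_quota.conf'

# check_listing (hypothetical helper): $1 is the process name and
# stdin is a ps listing. Same grep sequence as line 22 of the script.
check_listing() {
  grep "$1" | grep -vq grep
}

printf '%s\n' "$SAMPLE" | check_listing mysqlquotad
FOUND=$?
echo "mysqlquotad: FOUND=$FOUND"    # prints "mysqlquotad: FOUND=0"

printf '%s\n' "$SAMPLE" | check_listing nginx
FOUND=$?
echo "nginx: FOUND=$FOUND"          # prints "nginx: FOUND=1"
```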

Lines 24-39: contain the main if statement block. If the process was found by line 22 (FOUND is zero), this block of code will not be run.

Lines 26-27: if the process was not found in the ps listing, these lines get today's date (and time!!) and use the echo command to write a single line entry in the log file (as defined in line 4 - LOG). Notice that two redirect characters (">>") are used. This causes the echo statement to append to an existing file (and also creates the file if it does not exist). If a single redirect character (">") were used, the file would be overwritten each time this line is run...we do not want to overwrite existing log file entries.
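
The difference between ">>" and ">" is easy to check with a throwaway file. The sketch below uses a temporary file created with mktemp in place of the real log file, and the variable names TMP_LOG, APPENDED and OVERWRITTEN are made up for this illustration.

```shell
#!/bin/bash
# Temporary stand-in for the log file defined in line 4.
TMP_LOG=$(mktemp)

# Two appends: both entries are kept.
echo "$(date): first entry" >> "$TMP_LOG"
echo "$(date): second entry" >> "$TMP_LOG"
APPENDED=$(wc -l < "$TMP_LOG" | tr -d ' ')
echo "lines after two >> writes: $APPENDED"

# A single > replaces whatever was in the file.
echo "$(date): third entry" > "$TMP_LOG"
OVERWRITTEN=$(wc -l < "$TMP_LOG" | tr -d ' ')
echo "lines after one > write: $OVERWRITTEN"

rm -f "$TMP_LOG"
```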

Line 28: checks whether the process action (PROC_ACTION) for this monitoring set (see lines 9-11) is equal to "restart" and, if it is, runs lines 30-33.

Lines 30-32: refresh the date and time, send an email to the address defined in line 5, and create an entry in the log file (as defined by LOG) indicating that the script will attempt to restart the process.

Line 33: runs a shell (as root, but it can be another user) and, using the "-c" option, passes the shell the command to restart this process (as defined in PROC_CMD[$i], which is PROC_CMD[1] for the first time through the loop).
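
The "-c" option hands the whole command string to a new shell, which then parses it itself. That is why a value such as the one in line 11, which contains "nohup" and a trailing "&", can be stored as a single string. The sketch below demonstrates the idea with "sh -c" in place of "su - root -c" so that it can be run without root access; RESTART_CMD is a made-up stand-in for PROC_CMD[$i].

```shell
#!/bin/bash
# Stand-in for PROC_CMD[$i]: two commands in one string, to show that
# the child shell parses the entire string on its own.
RESTART_CMD="echo 'starting service'; echo 'started'"

# sh -c is used here in place of: su - root -c "${PROC_CMD[$i]}"
OUT=$(sh -c "$RESTART_CMD")
echo "$OUT"
```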

Lines 34-37: if the value stored in PROC_ACTION for this monitoring set was not "restart" (per the check in line 28), these lines check whether the value is "email" and, if it is, create a log file entry and send an email.
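
If you decide to define more PROC_ACTION options, a case statement may be easier to extend than a chain of if/elif tests. The following is one possible sketch, not the original script's approach: handle_missing is a hypothetical function, and the echo statements stand in for the real mail and su commands so the example is safe to run anywhere.

```shell
#!/bin/bash
# handle_missing (hypothetical): $1 = process name, $2 = action,
# $3 = restart command. Echo statements replace mail and su here.
handle_missing() {
  case "$2" in
    restart)
      echo "$1: log + email + restart using: $3"
      ;;
    email)
      echo "$1: log + email"
      ;;
    *)
      # An action the script does not know about is reported, not ignored.
      echo "$1: unknown action '$2'" >&2
      return 1
      ;;
  esac
}

handle_missing "mysqlquotad" "restart" "/usr/local/bin/mysqlquotad /etc/mysql_quota.conf"
handle_missing "nginx" "email" "/etc/init.d/nginx start"
```

One difference from the if/elif version is the catch-all branch: in the original script a misspelled action is silently skipped, whereas here it produces an error message.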

Line 40: end of the while loop (started in line 18). Control loops back to line 18, which checks whether the value in shell variable "i" is still not equal to the value stored in PROC_CNT ("2" for the sample script). After the first pass "i" holds 1, so the loop runs again and line 20 increases the value in "i" to 2 as its first step. When that second iteration completes, the value in "i" (2) is equal to the value stored in PROC_CNT (2), and the loop ends.

Line 42: exit the script

Now that you understand how the process monitor script works, let's look at how to automate it so that it runs at a regular interval. This is accomplished by adding an entry in your crontab file, which is used by the cron daemon to run the script at the desired interval.

The "crontab -e" command can be used to add an entry similar to the following to your crontab (your frequency and the path to the script will likely be different):

*/1 * * * * /bin/bash /root/lfl/bin/lfl_py_proc_mon >/dev/null 2>&1

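The script is invoked using BASH (/bin/bash /root/lfl/bin/lfl_py_proc_mon) once every minute ("*/1" in the minutes field, which is equivalent to "*"). The ">/dev/null 2>&1" at the end discards anything the script writes to standard output or standard error.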

Although this Linux process monitoring script is quite simple and does not have all the bells and whistles included in more complex Linux process monitoring tools, my hope is that it may be useful to you as a quick and easy to deploy and configure process monitoring solution, and that it taught you a few shell scripting techniques and tricks along the way!

Comments, suggestions or questions about this process monitoring script can be sent to info@livefirelabs.com.
