Linux watchdog handler

Discussion in 'General Linux Discussion' started by adamjohnson, Nov 7, 2016.

  1. adamjohnson

    adamjohnson New Member

    Hi all,

    I'm interested in monitoring the processes running on a Linux system and quickly detecting when they are stuck or running endlessly.
    Once I detect this, I also want to take some action (dump debug info, restart the process, etc.).

    I know I can detect stuck processes using systemd, but unfortunately I couldn't find a way to take action: where can I specify a script to run when a process misses its heartbeats?

    Are you aware of other tools that act as watchdog monitors?
    (Processes register with them and start sending heartbeats; if some heartbeats are missed, the tool takes some action.)

    I am aware I can write my own tool; I just want to know whether something already offers this functionality.

    Thank you,
  2. Gizmo

    Gizmo Chief Site Administrator Staff Member

    Generally speaking, it is impossible to 'detect' whether some random program is 'stuck'. By definition, 'stuck' means 'not operating correctly'.

    If I give you a list of 100 executables, how are you going to determine whether they are operating correctly? Doing so implies some knowledge of the application, of what it should be doing when operating 'correctly'. Without that knowledge, determining whether an application is 'stuck' is impossible. The only way systemd does it is by having the application periodically send a 'heartbeat' (a special systemd notification, `WATCHDOG=1`, sent through the sd_notify mechanism). If that heartbeat doesn't arrive within the watchdog interval configured for the service (`WatchdogSec=`), systemd assumes the application is not working and kills it.

    Note that, to the best of my knowledge, this only works with daemons launched by systemd, and also (as mentioned above) requires the use of a special message; in other words, the application has to be specifically written to provide the information.
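To make the heartbeat requirement concrete, here is a minimal sketch of what the application side could look like in Python. It assumes the service is started by systemd with a watchdog interval configured, in which case systemd exports the `NOTIFY_SOCKET` and `WATCHDOG_USEC` environment variables. The helper names `sd_notify`, `watchdog_interval`, and `run` are made up for this illustration (the C library function of the same name is not being called here), and the "work" callback stands in for whatever the real daemon does.

```python
import os
import socket
import time


def sd_notify(message):
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was
    not started by systemd with notifications enabled.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    addr = b"\0" + path[1:].encode() if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def watchdog_interval(default=None):
    """Return a heartbeat period in seconds: half the configured timeout."""
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return default
    return int(usec) / 1_000_000 / 2


def run(work, iterations):
    """Hypothetical main loop: do one unit of work, then send a heartbeat."""
    period = watchdog_interval(default=1.0)
    for _ in range(iterations):
        work()                      # if this hangs, heartbeats stop
        sd_notify("WATCHDOG=1")     # tell systemd we are still alive
        time.sleep(period)
```

The heartbeat is sent only after `work()` returns, so a hang inside the work stops the heartbeats; sending them from an independent timer thread would defeat the purpose.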

    That being said, there ARE daemons that can monitor for SPECIFIC POTENTIAL indications of application issues, such as exceeding user-definable memory or CPU usage thresholds.

    Monit is one such daemon.
    There is also a script at superuser.com.

    I'm sure there are others, but these are a couple I found with a quick Google.
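For anyone who ends up writing their own tool, the register / heartbeat / act-on-miss scheme described in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `HeartbeatMonitor` class and its method names are invented here and do not correspond to any existing tool, and how clients actually deliver their heartbeats (socket, pipe, file touch) is left out.

```python
import time


class HeartbeatMonitor:
    """Hypothetical in-process registry of clients and their deadlines."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the logic can be tested
        self._clients = {}

    def register(self, name, timeout, on_miss):
        """Start tracking a client; on_miss(name) runs if it goes silent."""
        self._clients[name] = {
            "timeout": timeout,
            "last": self._clock(),
            "on_miss": on_miss,
            "tripped": False,
        }

    def beat(self, name):
        """Record a heartbeat from a registered client."""
        client = self._clients[name]
        client["last"] = self._clock()
        client["tripped"] = False

    def check(self):
        """Run the action for every client whose heartbeat is overdue.

        Each action fires once per silence period; returns the names hit.
        """
        now = self._clock()
        missed = []
        for name, client in self._clients.items():
            overdue = now - client["last"] > client["timeout"]
            if overdue and not client["tripped"]:
                client["tripped"] = True
                client["on_miss"](name)   # e.g. dump debug info, restart
                missed.append(name)
        return missed
```

A supervising loop would call `check()` periodically, and the `on_miss` callback is where a debug dump or a restart command would go.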