Perl script blocking itself on multiple icinga events #8
Comments
I'm having the same issue.
No edits to this repo in a couple of years, so I doubt this will get a fix. I personally never resolved this.
Thanks. I also opened a ticket with PD, let's wait for their thoughts. Did the cron job workaround work for you, or did you still get these locks? I find it strange that these two products don't have a better integration.
So I worked on this over a year ago, but if I remember correctly: using the cron job doesn't fix the lock issue (or the resulting errors); they still happen regardless. I hope that makes sense? I can elaborate further if needed.
PagerDuty are working on a new integration using the Nagios agent. It should be out shortly.
So has the new integration using the Nagios agent been released?
Looks like they have a Nagios integration now. I did a quick search and came up with:
I found that first one as well, but it references https://github.com/PagerDuty/pagerduty-nagios-pl which says:
That's even older than this repo. Looks like it might be https://github.com/PagerDuty/pdagent-integrations ... wonder how much of a pain it'll be to make that work with Icinga2.
Looks like there is already some icinga2 support built in: EDIT: added some links
I am also having this issue. Just like @ronindesign mentioned. Was anybody able to find a solution for this? UPDATE: Wrong permissions on folder /tmp/pagerduty_icinga.
When Icinga triggers multiple issues, the NotificationCommand "notify-service-by-pagerduty" fires multiple times.
One of the calls succeeds and takes the lock on the file /tmp/pagerduty/lockfile.
All of the other instances of notify-service-by-pagerduty fail, with their shell script exiting on the following error:
/var/log/icinga/icinga.log:
/var/log/syslog:
This happens because each Icinga event triggers an enqueue on pagerduty_icinga.pl, which internally calls (or tries to call) the method 'lock_and_flush_queue'. Only one instance acquires the lock; the others are blocked.
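To illustrate the contention described above, here is a minimal sketch in Python (not the Perl script's actual code). It assumes the lock is an exclusive, non-blocking `flock` on a lockfile, which I have not verified against pagerduty_icinga.pl; `try_lock` and `simulate_burst` are hypothetical names used only for this example.

```python
import fcntl


def try_lock(lock_path):
    """Return an open handle holding an exclusive lock, or None if it is taken."""
    handle = open(lock_path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def simulate_burst(lock_path, callers):
    """Simulate several notifications arriving while the first still holds the lock."""
    handles = [try_lock(lock_path) for _ in range(callers)]
    results = [h is not None for h in handles]
    for h in handles:
        if h is not None:
            h.close()  # closing the handle releases the lock
    return results
```

With three callers arriving at once, only the first gets the lock and the other two give up straight away, which matches the behaviour reported here.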
This is not a fatal issue. If I have my cron job set up correctly, the other entries will be sent 1 minute later, when 'pagerduty_icinga.pl flush' is called.
However, this is still not ideal. The pagerduty_icinga.pl enqueue process should either only enqueue (without attempting a flush, and thus blocking itself), or the Perl script should implement some kind of timeout / keepalive option in the 'lock_and_flush_queue' section.
These processes finish almost immediately, so a keepalive would only need to last a few seconds, after which the calls could still be allowed to fail. There would just be a small buffer / threshold within which multiple calls could be made in succession.
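The short-timeout idea could look roughly like the sketch below. It is written in Python rather than Perl and is not a patch for pagerduty_icinga.pl: `acquire_with_timeout`, `release`, and the default timeout and poll interval are all assumptions made for illustration. The caller keeps retrying a non-blocking lock for a few seconds and only gives up once the deadline passes.

```python
import fcntl
import time


def acquire_with_timeout(lock_path, timeout=3.0, poll_interval=0.05):
    """Retry an exclusive lock on lock_path until it succeeds or timeout expires.

    Returns the open handle holding the lock, or None if the lock
    could not be obtained in time.
    """
    deadline = time.monotonic() + timeout
    handle = open(lock_path, "a")
    while True:
        try:
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return handle
        except BlockingIOError:
            if time.monotonic() >= deadline:
                # Still allowed to fail, just later than before.
                handle.close()
                return None
            time.sleep(poll_interval)


def release(handle):
    """Release the lock and close the handle."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()
```

A caller that gets `None` back would leave its event in the queue for the cron-driven flush, so the existing fallback still applies.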