The problem
I have a MISP instance to share threat information with partners and various communities. MISP uses workers to perform certain tasks. Sometimes these workers die (there is at least one other person out there with the same problem). This is of course something you don’t want to happen, and we should fix the root cause. However, I also want to monitor this in my regular system monitoring tool.
The problem is thus: how do we monitor MISP’s workers? Below is my work log; any pointers are welcome, e.g. on Twitter.
The research
The first step in solving this problem is of course using Google to see if someone else has solved it before us. The first page of hits takes us to the MISP documentation regarding restarting workers. This gives us, at the very least, a CLI way of restarting the dead workers. Additionally, we find this blog post by Inspark about setting up MISP in Azure. Unfortunately this was a false positive: it does not contain anything about monitoring workers.
Revisiting the GitHub issue that I mentioned earlier, we see that the dying workers could have to do with Apache being restarted. As autoupdates are enabled, this could very well be happening.
So, our next step: we need to find a way to check from the CLI whether the workers are alive. ps faux comes to mind.
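As a rough sketch of that idea, the Python snippet below counts worker processes in the output of ps and turns the result into an exit code in the style of a Nagios plugin. Everything specific in it is an assumption on my part, not something taken from the MISP documentation: the "resque" marker used to recognise a worker, the function names, and the expected number of workers all need to be checked against what ps faux actually shows on your own server.

```python
import subprocess

# Assumption: worker processes can be recognised by this substring in their
# command line. Check the output of `ps faux` on your own server and adjust.
WORKER_MARKER = "resque"


def count_workers(ps_output, marker=WORKER_MARKER):
    """Count the lines of ps output whose command line contains the marker."""
    needle = marker.lower()
    return sum(1 for line in ps_output.splitlines() if needle in line.lower())


def check_workers(expected):
    """Return (exit_code, message) in Nagios plugin style: 0 = OK, 2 = CRITICAL."""
    # `args=` prints only the full command line of each process, without a header.
    ps_output = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    ).stdout
    alive = count_workers(ps_output)
    if alive >= expected:
        return 0, f"OK - {alive} worker processes found"
    return 2, f"CRITICAL - {alive} worker processes found, expected at least {expected}"
```

A monitoring tool could call check_workers with the number of workers it expects, print the message and exit with the returned code. Note that the count includes any process whose command line happens to contain the marker, so a marker that is too generic will produce false "OK" results.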
CakeResque
MISP uses CakeResque – a CakePHP plugin – for creating background jobs.
The solution
So... I didn’t look into MISP monitoring for a long time. Priorities shifted, MISP ran stable for a long time, etcetera. When I wanted to pick it back up (12 March 2021 – yeah... I know...) I found out that two excellent posts had been published on the MISP project website. They cover monitoring MISP using Cacti and using OpenNMS. While I don’t use either of these monitoring services, the posts do give pointers on how to monitor things like MISP workers, and as such they should be useful for anyone wanting to monitor their MISP installation.
Nice work, Koen and Sascha!