PoC alert/notification functionality with Elastic Stack
Closed, Declined (Public)

Description

As referenced in T123243 and T211700, there has been talk for some time of looking into https://github.com/Yelp/elastalert (or alternatives?) for alerting and correlation of logs (also mentioned in the logging design doc). One of the ideas here is that this would replace the work done in T208611 (which will make @Volans very happy).

I'm going to try to workshop this a bit in the logging cloud project and then possibly move the demo functionality to deployment-prep, depending on how things go.

Event Timeline

chasemp created this task.

Change 502773 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] WIP elastalert module

https://gerrit.wikimedia.org/r/502773

Change 503014 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] aptrepo: add component/elastalert

https://gerrit.wikimedia.org/r/503014

Reassigning to reflect the reality of Filippo's awesomeness.

Change 503014 merged by Filippo Giunchedi:
[operations/puppet@production] aptrepo: add component/elastalert

https://gerrit.wikimedia.org/r/503014

Change 505762 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] WIP elastalert: enable on logstash1007

https://gerrit.wikimedia.org/r/505762

Elastalert is running on deployment-logstash2 now. I had to fiddle with it a little because the instance is jessie (cf. T218729), but other than that it will work as in production (i.e. with https://gerrit.wikimedia.org/r/c/operations/puppet/+/505762 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/502773 merged in production, as opposed to cherry-picked on the deployment-prep puppetmaster).

Rules are only on the host itself for experimentation purposes. For the first iteration we'll have the rules in private.git, and possibly in the future in a separate git repository for rules that is private (in the sense of gerrit access), to enable self-service.

The service name is elastalert@security and config / rules live in /etc/elastalert/security. I left a badpass.yaml example file, feel free to change/tweak as needed! cc @Dsharpe and let me know how we can help!
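For illustration only, a frequency rule along the lines of badpass.yaml might look like the sketch below. The actual contents of that file are not shown in this task, so the rule name, index pattern, threshold, and query string here are all assumptions; only the option names (`name`, `type`, `index`, `num_events`, `timeframe`, `filter`, `alert`) are standard ElastAlert rule settings.

```yaml
# Hypothetical ElastAlert rule sketch; NOT the actual contents of badpass.yaml.
name: badpass-example        # assumed name; must be unique within the instance
type: frequency              # fire when num_events matches occur within timeframe
index: logstash-*            # assumed index pattern
num_events: 10               # assumed threshold
timeframe:
  minutes: 5                 # assumed window
filter:
- query:
    query_string:
      query: 'message: "Failed password"'   # hypothetical field and phrase
alert:
- debug                      # log the match instead of notifying anyone
```

Using the `debug` alerter keeps the rule safe to experiment with, since matches are only written to the ElastAlert log.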

sbassett changed the task status from Open to Stalled. Oct 4 2019, 5:05 PM

Hey @fgiunchedi - I don't believe anyone on the Security-Team currently has access to deployment-logstash2 / https://logstash-beta.wmflabs.org/, so this isn't really feasible for us to test until 1) @chasemp returns 2) more of us get access. Setting to stalled for now.

fgiunchedi subscribed.

Hi @sbassett, apologies for the delayed reply! I'm not sure whether deployment-prep access is all-or-nothing for services or shell access, in the sense that access to https://logstash-beta.wmflabs.org is via one shared user/password and the credentials are stored in a file on one of the deployment-prep hosts.

At any rate, I'm not sure when I'll have time / bandwidth to resume work on this and e.g. make sure elastalert on deployment-prep works as expected, so I'm unassigning for now.

Change 505762 abandoned by Filippo Giunchedi:

[operations/puppet@production] elastalert: enable on logstash1007

Reason:

Not relevant anymore

https://gerrit.wikimedia.org/r/505762

Change 502773 abandoned by Filippo Giunchedi:

[operations/puppet@production] elastalert: new module

Reason:

Not relevant anymore

https://gerrit.wikimedia.org/r/502773

Closing as declined; nowadays we can implement the same via alerts on logs.