Add Traffic's notion of "from public cloud" to Analytics webrequest data
Closed, Resolved (Public)

Description

Various bits of ratelimiting in Traffic's VCL have a notion of whether a request originates from a large public cloud (for example Amazon EC2, Google Compute Engine, etc.). This is maintained by Traffic/SRE in the private Puppet repo as a list of IP blocks named public_cloud_nets (created by combining lists published by cloud providers under various APIs).
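For readers unfamiliar with this kind of check, here is a minimal sketch of testing a client IP against a list of IP blocks. It is written in Python for illustration only; the real check lives in Traffic's VCL and is not shown in this task. The sample networks are documentation-reserved ranges standing in for the private public_cloud_nets list, and the function name is hypothetical.

```python
import ipaddress

# Hypothetical stand-in for public_cloud_nets: the real list is private.
# These are documentation-reserved ranges (RFC 5737 / RFC 3849), not cloud ranges.
PUBLIC_CLOUD_NETS = [
    ipaddress.ip_network(block)
    for block in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_from_public_cloud(client_ip, nets=PUBLIC_CLOUD_NETS):
    """Return True if client_ip falls inside any of the given IP blocks."""
    addr = ipaddress.ip_address(client_ip)
    # Compare only against networks of the same IP version (v4 vs v6).
    return any(addr.version == net.version and addr in net for net in nets)
```

A linear scan is fine for a sketch; a production lookup over many blocks would more likely use a prefix tree or a native ACL match.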

It would be useful to have this Boolean attached to the Analytics data (in the webrequest Hive table, in Turnilo, in the Kafka stream that winds up on centrallog1001, etc.) for cursory glances at traffic sources, verifying that ratelimiting logic is working, and so on.

In this case it seems to make the most sense for Traffic to provide this tag directly: we don't want to have to sync Analytics's view vs Traffic's view of a list of cloud nets or ASNs, etc, or to reason about either of those lists diverging in the time domain. What we really want to know is "did the cache categorize this request as public_cloud_nets at the time it processed it?" So we should just provide it directly.

So I suggest we add a new public_cloud key to the X-Analytics header. Following existing convention for Booleans, this can be set to a value of 1 iff true, and otherwise not present.
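As a rough sketch of how a consumer could read the proposed key, assuming X-Analytics is a semicolon-separated list of key=value pairs: the helper names below are hypothetical and are not part of any existing Analytics code.

```python
def parse_x_analytics(header):
    """Parse an X-Analytics value such as 'https=1;public_cloud=1' into a dict."""
    result = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments, e.g. a trailing ';'
        key, sep, value = pair.partition("=")
        if sep:  # ignore malformed segments that have no '='
            result[key] = value
    return result


def is_public_cloud(header):
    """True iff the header carries public_cloud=1; an absent key means False."""
    return parse_x_analytics(header).get("public_cloud") == "1"
```

For example, is_public_cloud("https=1;public_cloud=1") evaluates to True, while is_public_cloud("https=1") evaluates to False, matching the "1 iff true, otherwise not present" convention.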

[x] Add support for the field in Druid's data load of webrequest, so it becomes visible in Turnilo (nothing to be done in the webrequest table, since X-Analytics is a map)

Event Timeline

+1 on the approach (updating the task description for details)

fgiunchedi triaged this task as Medium priority. Apr 13 2021, 1:04 PM

Change 679341 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] Add a public_cloud bit to X-Analytics

https://gerrit.wikimedia.org/r/679341

Change 679341 merged by CDanis:

[operations/puppet@production] Add a public_cloud bit to X-Analytics

https://gerrit.wikimedia.org/r/679341

CDanis added a subscriber: fdans.

@fdans @JAllemandou New map entry should be ready for Analytics to set up in Turnilo :)

Change 692310 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Add public_cloud info to webrequest in druid

https://gerrit.wikimedia.org/r/692310

@CDanis the patch for Druid is there - sorry for not having acted quicker.

Change 692310 merged by Milimetric:

[analytics/refinery@master] Add public_cloud info to webrequest in druid

https://gerrit.wikimedia.org/r/692310

Change 692926 had a related patch set uploaded (by Joal; author: Joal):

[operations/puppet@production] Add is_from_public_cloud to webrequest turnilo config

https://gerrit.wikimedia.org/r/692926

Change 692926 merged by Razzi:

[operations/puppet@production] Add is_from_public_cloud to webrequest turnilo config

https://gerrit.wikimedia.org/r/692926

The new field is in Turnilo, with data starting from May 18th 2021.
https://w.wiki/3MJq
Resolving the task :)