
Should WMCS be getting CF protection?
Closed, ResolvedPublicSecurity

Description

Right now most WMCS things (specifically, public IPs associated with VMs and all that flows from that) are not covered by wikiland's Cloudflare protection. The only WMCS thing that /is/ covered is our DNS, and that's mostly an accident of history.

What would be involved in moving all the WMCS IPs behind Cloudflare? What are the upsides, and what are the downsides?

Event Timeline

Restricted Application added a subscriber: Aklapper.

What are the upsides

In-bound DDoS protection is the only upside as far as I know.

what are the downsides?

Mystery routing issues when the DDoS protection goes awry?

Another potential downside could be: it will cost money

Indeed! Only upside I can think of is inbound DDoS protection :)

Downsides are additional configuration and troubleshooting complexity. For example, thresholds are tuned for heavy inbound traffic (LVS), which is different from the WMCS type of traffic.

The visibility and user impact of a DDoS on WMCS are also much lower than for our main VIPs.

So far I'd recommend staying as is, and revisiting if it becomes a problem.

Right now the WMCS NSes ({ns0,ns1}.openstack.eqiad1.wikimediacloud.org) are behind CF, but nothing else from WMCS is.

This is unlike the rest of WMCS (which isn't behind CF at all), and also unlike the prod NSes (which are available from multiple sites, some of which are deliberately not behind CF).

This is what led to the impact on WMCS in the incident "2021-05-14 CF Magic Transit dropping all UDP traffic".

I suggest that we leave WMCS outside CF, and also migrate those WMCS NS IPs to the main WMCS subnet, rather than the prod eqiad one that is fronted by CF.
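As a rough illustration of the check implied above (which nameserver addresses sit inside a CF-fronted prefix), here is a minimal Python sketch. The prefix list is a hypothetical placeholder taken from the RFC 5737 documentation range, not the real prod eqiad or WMCS ranges, and the helper names `fronted_by` and `check_nameservers` are made up for this example.

```python
import ipaddress
import socket

# The two WMCS nameservers named in this task, expanded from {ns0,ns1}.
WMCS_NAMESERVERS = [
    f"{host}.openstack.eqiad1.wikimediacloud.org" for host in ("ns0", "ns1")
]

# ASSUMPTION: placeholder prefix from the RFC 5737 documentation range.
# Replace with the actual prefixes announced via CF before drawing conclusions.
EXAMPLE_FRONTED_PREFIXES = ["192.0.2.0/24"]


def fronted_by(address, fronted_prefixes):
    """Return True if `address` falls inside any of the given prefixes."""
    ip = ipaddress.ip_address(address)
    # An IPv6 address is never "in" an IPv4 network (and vice versa),
    # so mixed-version comparisons simply yield False.
    return any(ip in ipaddress.ip_network(prefix) for prefix in fronted_prefixes)


def check_nameservers(fronted_prefixes, names=WMCS_NAMESERVERS):
    """Resolve each nameserver and report whether each address is fronted.

    Performs live DNS lookups, so it needs network access.
    """
    results = {}
    for name in names:
        infos = socket.getaddrinfo(name, 53, proto=socket.IPPROTO_UDP)
        addresses = sorted({info[4][0] for info in infos})
        results[name] = {
            addr: fronted_by(addr, fronted_prefixes) for addr in addresses
        }
    return results
```

Calling `check_nameservers(EXAMPLE_FRONTED_PREFIXES)` returns a mapping from each nameserver to its resolved addresses and a True/False flag per address; with the placeholder prefix it says nothing about the real setup.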

Is there anything at all hosted in WMCS that is too risky (e.g. production depends on it to function) to leave exposed to being DDoSed?

Reedy subscribed.

Couple of quick questions...

If we needed to turn it on for WMCS (because of a DDoS or similar targeting?), how much effort/time would that take? I.e., could we do this "relatively quickly" if necessary?

Is there a cost implication? Does that matter? Is it significant?

SecTeam is happy to defer to the traffic/cloud teams on what to do, but it would be useful to at least know the answer to the first question.

Yep, we can turn it on relatively quickly in an emergency. No monetary cost.

sbassett subscribed.

Yep, we can turn it on relatively quickly in an emergency. No monetary cost.

OK, this sounds good, and reduces this to a low risk for the Security-Team, so no further action is required as far as we're concerned (unless SRE or the WMCS team would like to develop/update any related playbooks).

Andrew claimed this task.

Agreed, I don't think there are any immediate actions needed here. Thanks!

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)". Jun 7 2021, 4:49 PM
sbassett changed the edit policy from "Custom Policy" to "All Users".