
[Bug] Page content service is deployed with localhost links to the CSS and JS, breaking all pages that have been edited recently
Closed, Resolved (Public)

Description

Steps to Reproduce

  1. Open https://en.wikipedia.org/api/rest_v1/page/mobile-html/Politics
  2. Observe the page

Expected Results

  • CSS/JS loads properly
  • CSS/JS is properly linked
<link rel="stylesheet" href="https://meta.wikimedia.org/api/rest_v1/data/css/mobile/base">
[...]
<script src="https://meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pcs"></script>

Actual Results

  • CSP issues:
Refused to load the stylesheet 'http://localhost:6011/meta.wikimedia.org/v1/data/css/mobile/base' because it violates the following Content Security Policy directive: "style-src app://meta.wikimedia.org https://meta.wikimedia.org app://*.wikipedia.org https://*.wikipedia.org 'self' 'unsafe-inline'". Note that 'style-src-elem' was not explicitly set, so 'style-src' is used as a fallback.

Politics:1 Refused to load the stylesheet 'http://localhost:6011/meta.wikimedia.org/v1/data/css/mobile/pcs' because it violates the following Content Security Policy directive: "style-src app://meta.wikimedia.org https://meta.wikimedia.org app://*.wikipedia.org https://*.wikipedia.org 'self' 'unsafe-inline'". Note that 'style-src-elem' was not explicitly set, so 'style-src' is used as a fallback.

Politics:1 Refused to load the script 'http://localhost:6011/meta.wikimedia.org/v1/data/javascript/mobile/pcs' because it violates the following Content Security Policy directive: "script-src app://meta.wikimedia.org https://meta.wikimedia.org 'unsafe-inline'". Note that 'script-src-elem' was not explicitly set, so 'script-src' is used as a fallback.
  • CSS/JS is linked to localhost
<link rel="stylesheet" href="http://localhost:6011/meta.wikimedia.org/v1/data/css/mobile/base">
[...]
<script src="http://localhost:6011/meta.wikimedia.org/v1/data/javascript/mobile/pcs"></script>
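
A check along the following lines could flag this kind of rendering before or after a deploy. It is a minimal Python sketch, not part of the page content service: `ResourceLinkCollector`, `find_localhost_resources`, and `LOCAL_HOSTS` are hypothetical names, and the set of hosts treated as "local" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: these are the hostnames we want to treat as "local".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


class ResourceLinkCollector(HTMLParser):
    """Collects href values of <link> tags and src values of <script> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "link" and attr_map.get("href"):
            self.urls.append(attr_map["href"])
        elif tag == "script" and attr_map.get("src"):
            self.urls.append(attr_map["src"])


def find_localhost_resources(html):
    """Return the CSS/JS URLs in `html` whose host is a local address."""
    collector = ResourceLinkCollector()
    collector.feed(html)
    return [u for u in collector.urls if urlparse(u).hostname in LOCAL_HOSTS]
```

Feeding it the mobile-html output of a page and getting a non-empty list back would indicate the broken rendering described above.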

Environments Observed

Production

Additional notes

Varnish and RESTBase caches will need to be purged of the articles that were rendered incorrectly with the localhost links

Event Timeline

JoeWalsh triaged this task as Unbreak Now! priority. Sep 9 2020, 4:06 PM
JoeWalsh updated the task description.

So the broken configuration (that I deployed, sorry about that) has been fixed.

The remaining problem is purging the broken pages from RESTBase.

We basically need to purge all pages that RESTBase cached between 08:40 and 16:10 today and has not refreshed since.
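
The selection rule can be sketched in Python as follows. This only illustrates the time-window logic and is not a way to query RESTBase or Cassandra: the `render_times` mapping, the helper names, and the assumption that the window is in UTC on 2020-09-09 are all hypothetical.

```python
from datetime import datetime, timezone

# Assumption: "today" is 2020-09-09 and the window is expressed in UTC.
WINDOW_START = datetime(2020, 9, 9, 8, 40, tzinfo=timezone.utc)
WINDOW_END = datetime(2020, 9, 9, 16, 10, tzinfo=timezone.utc)


def needs_purge(last_rendered):
    """True if the most recent render happened inside the bad window."""
    return WINDOW_START <= last_rendered <= WINDOW_END


def select_titles_to_purge(render_times):
    """render_times maps page title -> datetime of its most recent render."""
    return sorted(
        title for title, rendered in render_times.items() if needs_purge(rendered)
    )
```

Because only the most recent render time is considered, a page that was re-rendered after 16:10 falls outside the window and is left alone.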

Sadly, there doesn't seem to be a way to do that in Cassandra, so @Pchelolo is looking for an alternative.

Change 626189 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[operations/deployment-charts@master] Update mobileapps to 2020-09-09-171242-production

https://gerrit.wikimedia.org/r/626189

Change 626189 merged by jenkins-bot:
[operations/deployment-charts@master] Update mobileapps to 2020-09-09-171242-production

https://gerrit.wikimedia.org/r/626189

Status update: we've decided to invalidate mobile-html content in RESTBase, so anything not cached at the edge will be re-rendered unless it has already been rendered since the new RESTBase version was deployed.
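
One way to picture this kind of invalidation is a minimum-version check on stored content. The Python sketch below assumes stored renders carry a dotted version string and that anything older than the required version is treated as stale. The `1.2.2` value comes from the deploy messages in this task, but the function names and the comparison logic are illustrative assumptions, not RESTBase's actual implementation.

```python
# Value taken from the "Require mobile-html 1.2.2" deploy messages.
REQUIRED_VERSION = "1.2.2"


def parse_version(version):
    """Turn a dotted version string such as '1.2.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_stale(stored_version, required_version=REQUIRED_VERSION):
    """True if stored content predates the required version and must be re-rendered."""
    return parse_version(stored_version) < parse_version(required_version)
```

Comparing integer tuples rather than raw strings avoids treating `1.10.0` as older than `1.2.2`.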

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:35:56Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:41:56Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437 (duration: 06m 00s)

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:42:57Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2

Joe claimed this task.

This bug should now be resolved. Please reopen if this behaviour persists.

This patch should be linked as the actual fix: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/626178/, reverting changes from https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/626102.

I guess nobody noticed the issue when this config was deployed to staging only.

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:52:35Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2 (duration: 09m 38s)

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:52:42Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:59:29Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 06m 47s)

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:59:56Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout

Mentioned in SAL (#wikimedia-operations) [2020-09-09T18:02:51Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 02m 55s)

Joe added a project: Traffic.

Is there a separate task for the mobile-html-offline-resources issue or are we combining that?

Sadly, we still have caching issues:

These latter URLs have a max-age of 1 day, so they all need to be purged (they're not computationally expensive, so it's OK to just ban them).

I tried to do what Wikitech suggests:

sudo cumin -b 1 A:cp-text "varnishadm -n frontend ban 'req.url ~ \"^/api/rest_v1/page/mobile-html-offline-resources/\"'"

While this did purge the Varnish frontends, it did nothing for ATS, and I can't find any documentation on how to purge content there. Any advice would be welcome.
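
To sanity-check which request URLs the ban expression above would match, here is a minimal Python sketch. It assumes Python's `re` behaves like Varnish's regex engine for this simple anchored pattern; `would_be_banned` is a hypothetical helper, not part of any Varnish or cumin tooling.

```python
import re

# Same pattern as in the varnishadm ban expression above.
BAN_PATTERN = re.compile(r"^/api/rest_v1/page/mobile-html-offline-resources/")


def would_be_banned(req_url):
    """True if a request URL matches the ban's req.url regex."""
    return BAN_PATTERN.search(req_url) is not None
```

Note that `/api/rest_v1/page/mobile-html/...` URLs do not match this pattern, so that prefix needs its own ban.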

Change 626210 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/puppet@production] trafficserver: Cache-ban pages with localhost links from page content service

https://gerrit.wikimedia.org/r/626210

Change 626210 merged by RLazarus:
[operations/puppet@production] trafficserver: Cache-ban pages with localhost links from page content service

https://gerrit.wikimedia.org/r/626210

RLazarus subscribed.

I purged /api/rest_v1/page/mobile-html/ and /api/rest_v1/page/mobile-html-offline-resources/ in ATS, then re-ran Joe's command from T262437#6448216 for both. This should now be fully expunged from cache, although it may persist in your browser cache for up to a day or until you refresh.

After waiting 24 hours, I'll revert the ATS patch.

Change 627328 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/puppet@production] Revert "trafficserver: Cache-ban pages with localhost links from page content service"

https://gerrit.wikimedia.org/r/627328

Change 627328 merged by RLazarus:
[operations/puppet@production] Revert "trafficserver: Cache-ban pages with localhost links from page content service"

https://gerrit.wikimedia.org/r/627328