Talk:Wikimediastatus.net

From Wikitech

Add reference numbers or images for what a "spike" might be defined as

Context: I'm looking at the graphs, and IIUC the scales at the sides are all dynamic. This makes it hard for a lay-person to instantly understand whether a spike is normal or a problem.

  • I.e. During times of stability the graphs will always show many small spikes for a huge variety of reasons. Whereas during outages the scale will change and small spikes will be flattened while 1+ large spikes will appear.
  • E.g. Today the "User-reported connectivity errors" graph (screenshot) shows a graph of "0–1.5 reports/second". As a lay-person looking at that page for the first time, I am uncertain whether the brief spike to ~1.02/s indicated a small problem or a big problem.

I suggest adding something to the Status-page to help explain what a spike might look like or be defined as. E.g.

  • Perhaps adding some numbers into the tooltips? (E.g. "A variation of 0–1/s is a normal baseline; major outages usually go over 9000/s.")
  • Perhaps linking/embedding a screenshot of an old major problem? (or a gallery, as some graphs might remain stable when others are spiking. E.g. this screenshot includes the outage from March 29 but only shows spikes in 2 of the 5 graphs.)

I can currently get a slightly better understanding by looking at the "week" and "month" tabs, but if there's a completely/relatively smooth month then I wouldn't even have that!

HTH! Quiddity (talk) 18:45, 31 March 2022 (UTC)

Great feedback, thanks @Quiddity.
I'll update the tooltip text with some ideas of a normal range. We'll just have to make sure to keep this up-to-date.
I also poked around in the management UI hoping to find a way to set a minimum max-y (instead of a hard max-y that then clips the graph) -- but it doesn't seem to exist. My thought is that this could give a nice visual hint of what the expected range is. I'll file a FR. ✍ CDanis 13:03, 5 April 2022 (UTC)
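
For illustration only, the "minimum max-y" idea above could be sketched roughly like this. It is not something Statuspage is known to offer; the function name `y_axis_upper_bound` and the floor of 5.0 reports/second are hypothetical, chosen just to show how a floor differs from a hard cap that clips the graph.

```python
def y_axis_upper_bound(values, floor):
    """Return the top of the y-axis: never below `floor`, but growing to fit larger peaks.

    `floor` is a hypothetical per-graph setting for the expected normal range.
    """
    peak = max(values, default=0.0)  # an empty series falls back to the floor
    return max(floor, peak)


# Hypothetical samples in reports/second.
quiet_day = [0.1, 0.3, 1.02, 0.2]
outage = [0.2, 40.0, 12.5]

print(y_axis_upper_bound(quiet_day, floor=5.0))  # axis stays at the floor
print(y_axis_upper_bound(outage, floor=5.0))     # axis grows, nothing is clipped
```

With a floor like this, a brief spike to ~1.02/s would visibly sit low on the axis, while a real outage would still be drawn in full.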

In the list of 5 graphs at https://www.wikimediastatus.net/#system-metrics, I almost didn't notice the "Day | Week | Month" UI elements because the links are colored grey. I suggest changing these links to be blue, so they are more noticeable and intuitively link-colored. (More context at mw:User:Quiddity/Blue link color). Cheers, Quiddity (talk) 19:13, 31 March 2022 (UTC)

Good idea -- done! ✍ CDanis 21:20, 4 April 2022 (UTC)

I suggest adding a few links to further useful resources for visitors to the service's page. E.g.

Thanks again! Quiddity (talk) 21:03, 31 March 2022 (UTC)

In general, I am wildly ambivalent about this.
I really want to provide backlinks somewhere, but I'm worried about what will happen when we're having a large outage and the status page is getting a lot of traffic:
  • either the links won't work because we're hard down for many users, or,
  • we potentially create a greater outage, or an outage against some of our not-provisioned-for-whole-Internet-load monitoring tools (e.g. Grafana).
I'm considering providing a mailing list for feedback (since email is async by nature). I'd still like to be able to link to a documentation page somewhere; it will probably be Wikitech as that's well-provisioned. But we probably should note on the page somehow that it might not be reachable in an outage? ✍ CDanis 13:06, 5 April 2022 (UTC)
Ah, right. Good points!
Hmmm. Maybe the links would be accessed significantly less, if they were placed in a collapsed-section? (cf. I've written an essay advising against using collapsed-sections as a UX (without due consideration) at mw:User:Quiddity/Collapsing and hiding, with 'decreased accessibility/readership' as one of the main reasons!) – Or located in a subpage like https://www.wikimediastatus.net/history with provisos/cautions/requests-for-hesitance highlighted at the top? (But I also grok the BEANS problem...)
I.e. In my mind, 2 of the core use-cases for the status page are:
  • (a) for Wikimedians who want to know if a problem is just affecting them (i.e. before asking at their local village-pump or realtime-chat platform), and those folks could be further helped by providing additional avenues for investigation/followup (I.e. "I am affected by a problem, and I see a spike in the status page, but nothing written about it (yet) in the Incident history. I want to make sure it has been reported. My next step is I should check [....].")
  • (b) for external folks, like Press, who might include a link to the status page in their article/tweet/etc. And yeah, we want to not slashdot the links that were provided for group (a)!
It's definitely difficult to balance the two... Quiddity (talk) 19:22, 5 April 2022 (UTC)
I think I've figured out what I want to do here.
This was actually prompted by something unrelated, which was trying out the 'publish postmortem' feature of Statuspage: https://www.wikimediastatus.net/incidents/jnqvz8gljzhy
I don't mean for that to replace incident docs on Wikitech -- in fact I think we should link to a full Wikitech postmortem doc when we have one. But also I'd like for SRE to publish a very abbreviated version on Statuspage (max of a few sentences, and ideally sentences that would be suitable for Simple English Wikipedia).
So now I think I've decided that linking to Wikitech is okay -- if it doesn't work for the user in an outage, it's not a great user experience, but it won't be any worse for us than what is already happening to the site. I don't think we should allow linking to Grafana, Phabricator, Gerrit, etc, directly from the status page -- those tools are much less well-provisioned than the main wiki cluster.
I think I'll make a more user-facing version of the documentation page for the status page and start off by linking to it from the footer there. ✍ CDanis 13:53, 8 April 2022 (UTC)
@CDanis That all sounds good & reasonable. :)
I wondered if anything at all like this already existed, and found that Reporting a connectivity issue is currently linked to from a few places (e.g. mw:How to report a bug#Reporting a connectivity issue and w:en:Wikipedia:Village pump (technical)/FAQ), so perhaps updating that existing page would be good?
HTH. Quiddity (talk) 17:49, 8 April 2022 (UTC)

Alternative to Atlassian Statuspage

Hello, what is the license of Atlassian Statuspage, and have you thought about generating the site statically? I really like static websites, and I think it would be great if that were possible for this website. I think it could be an interesting experiment to try at the Hackathon in May.--Hogü-456 (talk) 19:08, 3 April 2022 (UTC)

Thanks for your comment! Atlassian Statuspage is a commercial, closed-source product. We had considered static site generators but decided that the automated publishing of timeseries metrics was a feature we really wanted. ✍ CDanis 13:01, 5 April 2022 (UTC)
Will you participate in the Hackathon? Up to now I thought that the strategy of the Wikimedia Foundation was to use open-source or free software where possible. Maybe there will be an alternative to Atlassian Statuspage in the future. I will look for an alternative for the automated publishing of timeseries metrics.--Hogü-456 (talk) 18:33, 5 April 2022 (UTC)