Intake Standards
SRE Observability
SRE Observability - Monitoring and Logging (Prometheus/Grafana and Elasticsearch, plus some Kafka).
The Observability team, or "o11y" for short, works across SRE and Technology to provide teams with tools, platforms and insights into how systems and services are performing. It uses technologies such as Grafana, Kibana/Logstash, Prometheus, Alertmanager and more.
How we work.
The Observability team (often referred to as o11y) maintains several tools while curating and building a collaborative roadmap for the Wikimedia Foundation. Maintaining several of these work streams is challenging: some work is adjacent to the team, some is related to it, and some is directly assigned to the Observability team workboard, and these kinds of work are not easy to tell apart.
The purpose of this document is to describe:
How to connect (and why)
Reach out:
Office Hours
We also hold office hours every week. If you wish to join or participate, please reach out using one of the previous channels to receive the invitation. Office hours take place after the first 20 minutes of every team planning session. These happen every Monday at 3pm UTC on Google Hangouts.
How we Organize our Work
Work comes from multiple sources, but most requests should land in Phabricator. All incoming work is tagged with the #observability tag/project. Tasks will be in one of six major states: inbox, backlog, scheduled, in progress, radar, done/closed.
The Observability team grooms incoming tasks weekly, normally during planning meetings on Monday at 8:00 AM Pacific. Some requests may receive an out-of-band prioritization effort.
Grooming consists of reviewing the inbox of the #observability (component) workboard and the #sre-observability (group) workboard (Inbox for a quarter or Up Next).
From there, the task or request goes through a quick prioritization. If the task is actionable for the Observability team, it is either marked "done this quarter" (i.e. time sensitive) or placed in the backlog. Otherwise, the task goes to radar, or is marked as blocked in the backlog if it cannot move forward. Tasks that do not have enough information to groom will receive a follow-up comment and remain unprioritized in the general backlog until enough information has been collected to perform the task effectively.
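The grooming decision described above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the team's actual tooling. The `Task` fields, the `groom` function, and the note strings are all hypothetical names introduced for illustration, and mapping "done this quarter" onto the "scheduled" state is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class State(Enum):
    """The six major task states named in this document."""
    INBOX = "inbox"
    BACKLOG = "backlog"
    SCHEDULED = "scheduled"
    IN_PROGRESS = "in progress"
    RADAR = "radar"
    DONE = "done/closed"


@dataclass
class Task:
    # All fields are hypothetical; Phabricator does not expose them under these names.
    title: str
    has_enough_info: bool = True
    actionable_for_o11y: bool = True
    time_sensitive: bool = False
    can_move_forward: bool = True


def groom(task: Task) -> Tuple[State, str]:
    """Return the target state and a short note for a task taken from the inbox."""
    if not task.has_enough_info:
        # Stays in the general backlog until the requester supplies details.
        return State.BACKLOG, "unprioritized; follow-up comment requested"
    if task.actionable_for_o11y:
        if task.time_sensitive:
            # Assumption: "done this quarter" corresponds to the scheduled state.
            return State.SCHEDULED, "done this quarter"
        return State.BACKLOG, "prioritized"
    if not task.can_move_forward:
        return State.BACKLOG, "blocked"
    return State.RADAR, "not actionable for o11y"


if __name__ == "__main__":
    for t in (
        Task("Urgent alerting request", time_sensitive=True),
        Task("Dashboard owned by another team", actionable_for_o11y=False),
        Task("Vague request", has_enough_info=False),
    ):
        state, note = groom(t)
        print(f"{t.title}: {state.value} ({note})")
```

The missing-information check comes first because, per the text above, such tasks cannot be groomed at all and stay unprioritized regardless of their other properties.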
Phabricator Workflow
How we Plan our Roadmap
Roadmap planning produces a rolling 1+ year roadmap, with the goal of having a list of tasks pre-groomed and prioritized periodically (quarterly).
There are 6 major work categories that drive efforts of the o11y team:
The goal of this process is to:
This is both a scheduled and a continual effort to size up work and to weigh the importance and impact of specific work streams. The team uses a simple forced-rank list of priorities that is fed from the intake process and groomed by the team. This list is then taken to a spreadsheet, where the projects are scored for overall feel on value and capacity.
Order of precedence for prioritization:
Project Work (OKR)
All project work is prioritized and groomed beforehand. Overarching project tasks are created in Phabricator with subtasks, both of which are tagged with an FY/Quarter "milestone" to indicate scheduling for projects that span multiple quarters or years.
Maintenance (non-OKR)
Planned maintenance follows the same workflow as regular project work. Unplanned maintenance or requests are groomed and prioritized based on urgency and severity.
Work Cadence Summary
Intake Grooming + Prioritization | Weekly | o11y office hours
Planning (rolling roadmap) | Quarterly | OKR Meetings
Annual Planning | Yearly | TBD
Category: Monitoring
This page was last edited on 9 September 2021, at 13:48.