SRE: Difference between revisions
Revision as of 17:48, 20 May 2021
Site Reliability Engineering (SRE) is responsible for availability, performance, monitoring, emergency response, infrastructure security, and capacity planning, plus the maintenance of the software used for those purposes. In many other organizations, this work is handled by an Operations or System Administration team. SRE treats computer operations as a software problem and applies automation wherever possible. The foundation has a number of subteams within its SRE team. See SRE Team requests (https://wikitech.wikimedia.org/wiki/SRE_Team_requests) for how to get in touch with those teams, and Wikimedia Site Reliability Engineering (https://www.mediawiki.org/wiki/Wikimedia_Site_Reliability_Engineering) for a more detailed team structure.
- SRE Data Center Operations (https://wikitech.wikimedia.org/wiki/Dc-operations)
- SRE Data Persistence
- SRE Infrastructure Foundations
- SRE Observability
- SRE Service Operations
- SRE Traffic
References:
- How complex systems fail: this is where SRE works.
- Google's SRE books: Google formalized many of the concepts and coined the term SRE.