SRE/Infrastructure naming conventions
Revision as of 17:39, 25 April 2018
This page documents the naming conventions of servers, routers, data center sites, and other infrastructure relevant to Wikimedia Foundation clusters.
Our servers currently fall into two broad categories:
- Clustered servers: These use numeral sequences with a descriptive prefix (see #Networking and #Servers). For example: db1001.
- Miscellaneous servers: These use unique hostnames (see #Miscellaneous servers). For example: helium.
Name re-use
We never re-use names of past servers for new servers. For example, after db1001 is decommissioned, no other server will be named db1001.
The notable exception is networking gear, whose names are determined by rack location. For example, the access switch in Eqiad rack A8 is named asw-a8-eqiad. If it is replaced, the new switch will take the same name.
All previous servers are kept (even if decommissioned) in Racktables. Please check there for existing server names before deciding on a name for any new servers. (Note: Racktables is restricted by login.)
Datacenter sites
Clusters are named as vendor initials (at time of lease signing) followed by the IATA code for a nearby major airport.
For example, our Dallas site is named codfw: the vendor is CyrusOne, and DFW is the large nearby airport. (Technically, Love Field airport is closer but less well-known.)
Cluster | Vendor | Airport Code |
---|---|---|
codfw | CyrusOne | DFW |
eqdfw | Equinix | DFW |
eqiad | Equinix | IAD |
eqord | Equinix | ORD |
eqsin | Equinix | SIN |
esams | EvoSwitch | AMS |
knams | Kennisnet | AMS |
ulsfo | United Layer | SFO |
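The site naming rule can be sketched in a few lines of Python. This is only an illustration: the function name `split_site_name` is hypothetical, and the two-letter vendor prefix length is an assumption read off the table above rather than a stated rule.

```python
from typing import Tuple

# Vendor prefixes copied from the table above.
# Assumption: every vendor prefix is exactly two letters long.
VENDOR_PREFIXES = {
    "co": "CyrusOne",
    "eq": "Equinix",
    "es": "EvoSwitch",
    "kn": "Kennisnet",
    "ul": "United Layer",
}


def split_site_name(site: str) -> Tuple[str, str]:
    """Split a site name such as 'codfw' into (vendor, IATA airport code)."""
    site = site.lower()
    prefix, airport = site[:2], site[2:]
    if prefix not in VENDOR_PREFIXES or len(airport) != 3 or not airport.isalpha():
        raise ValueError(f"not a recognised site name: {site!r}")
    return VENDOR_PREFIXES[prefix], airport.upper()


print(split_site_name("codfw"))  # ('CyrusOne', 'DFW')
print(split_site_name("esams"))  # ('EvoSwitch', 'AMS')
```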
Networking
Naming for network equipment is based on role and location.
This also applies to: power distribution units, serial console servers, and other networking infrastructure.
Name prefix | Role | Example |
---|---|---|
asw | access switch | asw-a1-eqiad |
cr | core router | cr1-eqiad |
mr | management router | mr1-eqiad |
msw | management switch | msw1-eqiad & msw-b2-eqiad |
pfw | payments firewall | pfw1-eqiad |
ps1 / ps2 | power strips/distribution units | ps1-b3-eqiad |
psw | dedicated peering switch | psw1-eqiad |
scs | serial console server | scs-a8-eqiad |
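As a rough illustration of the role-and-location scheme, the examples in the table above can be parsed with one regular expression. The pattern is an assumption inferred from those examples (prefix, optional number, optional rack, site); `parse_network_name` and `NetworkName` are hypothetical names, not part of any existing tooling.

```python
import re
from typing import NamedTuple, Optional

# Role prefixes copied from the table above ("ps1 / ps2" is treated as
# the prefix "ps" followed by a number).
ROLES = {
    "asw": "access switch",
    "cr": "core router",
    "mr": "management router",
    "msw": "management switch",
    "pfw": "payments firewall",
    "ps": "power strips/distribution units",
    "psw": "dedicated peering switch",
    "scs": "serial console server",
}

# Assumed shape: <prefix>[number][-<rack>]-<site>, e.g. ps1-b3-eqiad.
NAME_RE = re.compile(
    r"(?P<prefix>[a-z]+)(?P<index>\d+)?(?:-(?P<rack>[a-z]\d+))?-(?P<site>[a-z]+)"
)


class NetworkName(NamedTuple):
    role: str
    index: Optional[int]
    rack: Optional[str]
    site: str


def parse_network_name(name: str) -> NetworkName:
    """Break a network device name into role, number, rack and site."""
    match = NAME_RE.fullmatch(name)
    if match is None or match.group("prefix") not in ROLES:
        raise ValueError(f"not a recognised network device name: {name!r}")
    index = match.group("index")
    return NetworkName(
        role=ROLES[match.group("prefix")],
        index=int(index) if index else None,
        rack=match.group("rack"),
        site=match.group("site"),
    )


print(parse_network_name("asw-a1-eqiad"))
print(parse_network_name("cr1-eqiad"))
```

Rack-specific devices such as asw-a1-eqiad come back with a rack and no number, while site-wide devices such as cr1-eqiad come back with a number and no rack.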
OpenStack deployments
OpenStack names follow this pattern: [datacenter site][numeric identifier](optional dev suffix, for deployments that are not external or customer-facing)-[r (if region)][letter for AZ]
- Current Eqiad/Codfw deployments will not fully meet these standards until rebuilt: [eqiad0 (deployment), eqiad (region), nova (AZ)]
Deployment | Region | Availability Zone |
---|---|---|
eqiad0 | eqiad0-r | eqiad0-rb |
eqiad1 | eqiad1-r | eqiad1-rb |
codfw0dev | codfw0dev-r | codfw0dev-rb |
codfw1dev | codfw1dev-r | codfw1dev-rb |
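The pattern can be made concrete with a small helper that builds all three names from their parts. This is a minimal sketch; the function name `openstack_names` and its parameters are hypothetical.

```python
from typing import Tuple


def openstack_names(
    site: str, number: int, az_letter: str, dev: bool = False
) -> Tuple[str, str, str]:
    """Return (deployment, region, availability zone) names.

    The deployment is the site plus a numeric identifier, with an optional
    'dev' suffix. The region appends '-r', and the availability zone
    appends a letter to the region name.
    """
    deployment = f"{site}{number}{'dev' if dev else ''}"
    region = f"{deployment}-r"
    zone = f"{region}{az_letter}"
    return deployment, region, zone


print(openstack_names("eqiad", 1, "b"))            # ('eqiad1', 'eqiad1-r', 'eqiad1-rb')
print(openstack_names("codfw", 1, "b", dev=True))  # ('codfw1dev', 'codfw1dev-r', 'codfw1dev-rb')
```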
Disks
Naming follows two conventions:
- Array is attached to a single host:
  - hostname_of_host_system-arrayN
  - Example: ms2001-array1, ms2001-array2
  - All arrays get a number, even if there is only a single array.
  - Example: dataset1001-array1
- Array is attached to multiple hosts:
  - Labs uses this for labstore, where each shelf connects to two different hosts, so the older single-host naming scheme fails.
  - servicehostgroup-arrayN-site
  - Example: labstore-array1-codfw, labstore-array2-codfw
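The two conventions can be sketched as a pair of helpers. The function names are hypothetical, and starting the array number at 1 is an assumption based on the examples.

```python
def single_host_array(hostname: str, n: int) -> str:
    """Name for array number n attached to a single host."""
    if n < 1:
        # Assumption: numbering starts at 1, as in the examples.
        raise ValueError("array numbers start at 1")
    return f"{hostname}-array{n}"


def multi_host_array(service_host_group: str, n: int, site: str) -> str:
    """Name for array number n shared by several hosts at one site."""
    if n < 1:
        raise ValueError("array numbers start at 1")
    return f"{service_host_group}-array{n}-{site}"


print(single_host_array("ms2001", 2))             # ms2001-array2
print(multi_host_array("labstore", 1, "codfw"))   # labstore-array1-codfw
```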
Servers
Any system that runs in a dedicated services cluster with other machines is named after its role/service task. As a rule, we attempt to name after the service, not just the software package. Servers within a group are also numbered according to the datacenter they are located in.
Datacenter | Numeral range | Example |
---|---|---|
pmtpa / sdtpa | 1-999 | cp7 |
eqiad | 1000-1999 | db1001 |
codfw | 2000-2999 | mw2187 |
esams / knams | 3000-3999 | cp3031 |
ulsfo | 4000-4999 | bast4001 |
eqsin | 5000-5999 | dns5001 |
When adding a new datacenter, make sure to update operations/puppet.git's /typos file, which checks hostnames.
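The numeral ranges make it possible to tell a clustered server's datacenter from its hostname alone. The sketch below shows one way to do that lookup. The function name `datacenter_for` is hypothetical, and this is not the check performed by the /typos file.

```python
import re
from typing import Tuple

# Numeral ranges copied from the datacenter table above.
DATACENTER_RANGES = [
    (range(1, 1000), "pmtpa / sdtpa"),
    (range(1000, 2000), "eqiad"),
    (range(2000, 3000), "codfw"),
    (range(3000, 4000), "esams / knams"),
    (range(4000, 5000), "ulsfo"),
    (range(5000, 6000), "eqsin"),
]

# Assumed shape of a clustered hostname: a lowercase prefix (which may
# contain hyphens, as in ms-be) followed by a number.
HOSTNAME_RE = re.compile(r"(?P<prefix>[a-z][a-z-]*)(?P<number>\d+)")


def datacenter_for(hostname: str) -> Tuple[str, str]:
    """Return (prefix, datacenter) for a clustered hostname such as db1001."""
    match = HOSTNAME_RE.fullmatch(hostname)
    if match is None:
        raise ValueError(f"not a clustered hostname: {hostname!r}")
    number = int(match.group("number"))
    for numerals, datacenter in DATACENTER_RANGES:
        if number in numerals:
            return match.group("prefix"), datacenter
    raise ValueError(f"no datacenter range contains {number}")


print(datacenter_for("db1001"))  # ('db', 'eqiad')
print(datacenter_for("mw2187"))  # ('mw', 'codfw')
```

Miscellaneous hostnames such as helium carry no number, so the lookup rejects them instead of guessing.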
Name prefix | Description | Status |
---|---|---|
amssq | esams caching server | No longer used (deprecated) |
amslvs | esams LVS | No longer used (deprecated) |
analytics | analytics nodes (Hadoop, Hive, Impala, and various other things) | In use |
auth | Authentication server | In use |
bast | bastion host | In use |
cloud*-dev | Any cloud role + '-dev' = internal deployment (PoC, Staging, etc) | In use (new) |
cloudcontrol | OpenStack deployment controller | In use (new) |
cloudservices | Misc OpenStack components (Designate) | In use (new) |
cloudvirt | OpenStack Hypervisor (libvirtd + KVM) | In use (new) |
cloudstore | Storage system for Cloud | In use (new) |
cloudnet | Network gateway for tenants (Neutron l3) | In use (new) |
conf | Configuration system host (etcd, zookeeper...) | In use |
cp | cache proxy (Varnish) | In use |
dataset | dataset dumps storage | In use (deprecated) |
db | database host | In use |
dbproxy | database proxy | In use |
dbstore | database backup | In use |
druid | Druid Cluster (Analytics) | In use |
dumpsdata | dataset generation fileset serving to snapshot hosts | In use |
elastic | elasticsearch servers | In use |
es | external storage database | In use |
etcd | Etcd server | In use |
etherpad | Etherpad server | In use (mistake) |
eventlog | Event logging host | In use |
fr* | Fundraising servers, e.g. frdb, frlog, frpm (puppetmaster) | In use |
ganeti | Ganeti Virtualization Cluster | In use |
gerrit | Gerrit code review (cobalt in eqiad is currently used) | In use (deprecated) |
graphite | Graphite server | In use |
install | Installation server | In use (rare) |
kafka | Kafka Brokers | In use |
kafkamon | Kafka Monitoring (VMs) | In use |
knsq | knams squid | No longer used (deprecated) |
lab | labs virtual node | No longer used (deprecated) |
labcontrol | lab controller | In use (deprecated) |
labnodepool | labs node pool server (CI) | In use (deprecated) |
labmon | labs monitoring server | In use (deprecated) |
labnet | labs network | In use (deprecated) |
labsdb | labs database | In use (deprecated) |
labservices | labs services | In use (deprecated) |
labstore | labs storage | In use |
labtest | labs test hosts | In use (deprecated) |
labvirt | labs virtualization node | In use (deprecated) |
logstash | elasticsearch/logstash/kibana node | In use |
lvs | lvs load balancer | In use |
maps-test | maps test cluster | In use |
mc | memcache server | In use |
ms | media storage | In use (deprecated) |
ms-be | media storage backend | In use |
ms-fe | media storage frontend | In use |
mw | MediaWiki node (MediaWiki PHP webservers, api, jobrunners, videoscalers) | In use |
mwdebug | mediawiki deployment testing nodes (run in ganeti) | In use |
mwlog | mediawiki logging server | In use |
mx | Mail relays | In use |
nas | NAS boxes (NetApp) | Unused |
netmon | Network Monitor (smokeping, torrus, librenms) | In use |
nfs | NFS server | Unused |
notebook | Jupyterhub experimental server | In use |
ocg | offline content generator (PDF) | In use |
oresrdb | ORES Redis systems | In use |
pc | Parser cache database | In use |
pdf | PDF Collections | No longer used (deprecated) |
phab | Phabricator host (currently iridium is eqiad phab host) | In use |
planet | Planet server | In use (mistake) |
pybal-test | PyBal testing and development | In use |
rbf | Redis Bloom Filter server | Unused |
rcs | Recent changes stream | In use |
rdb | Redis server | In use |
relforge | Discovery's Relevance Forge (see discovery/relevanceForge.git, T131184) | In use |
restbase | RESTBase server | In use |
sca | Service Cluster A - Includes various services | In use |
scb | Service Cluster B - Includes various services. It's effectively the next generation of the sca cluster above | In use |
snapshot | Data dump processing node | In use |
sq | squid server | No longer used (deprecated) |
srv | apache server | No longer used (deprecated) |
stat | statistics computation hosts (see Analytics/Data access) | In use |
storage | storage host | No longer used (deprecated) |
tmh | MediaWiki videoscaler (TimedMediaHandler). See T105009 and T115950. | No longer used (deprecated) |
virt | labs virtualization nodes | No longer used (deprecated) |
wdqs | wikidata query service | In use |
webperf | webperf metrics (performance team). See T179036. | In use |
wtp | wiki-text processor node (parsoid) | In use |
Miscellaneous servers
Any one-off or single-service host. This includes nearly all non-MediaWiki software on the cluster that isn't load balanced across multiple machines, as well as general task machines that can cluster (to an extent) but require ops work to do so. These hosts are named by location, since they tend to do more than one kind of thing or provide more than one service/site type.
Datacenter Site | Convention | Example | Notes |
---|---|---|---|
codfw | Star Names | acamar | Only use modern proper star names that are a single word long and contain no odd characters. Orion constellation is reserved for fundraising (Alnilam, Alnitak, Bellatrix, |
eqiad | Elements | helium | Next atomic # assignment (incremental by atomic #): 112 |
esams / knams | Notable Dutch | vandale |