wikitech.wikimedia.org
Wikitech
Help:CirrusSearch elasticsearch replicas
Cloud Elastic is a replica of the CirrusSearch Elasticsearch indices made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge). These servers are not generally accessible from the internet at large; they can only be reached by applications running inside Cloud Services. Applications can use the full power of the Elasticsearch search APIs to query the search indices in ways that CirrusSearch does not expose directly on the wikis themselves.
Accessing
There are three clusters, named chi, psi and omega. chi contains approximately the 200 largest wikis; psi and omega contain equal splits of the remaining smaller wikis. The assignment of wikis to clusters is fixed and is not expected to change.
Cluster Name | URL
chi | https://cloudelastic.wikimedia.org:8243/
psi | https://cloudelastic.wikimedia.org:8643/
omega | https://cloudelastic.wikimedia.org:8443/
Clusters can be accessed through each other using the Elasticsearch cross-cluster search syntax. For example, labswiki (wikitech's internal database name), which lives on the omega cluster, can be queried through the chi cluster with:
curl -XGET 'https://cloudelastic.wikimedia.org:8243/omega:labswiki/_search?q=example'
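The cluster ports and the cross-cluster syntax above can be sketched as a small URL builder. This is only a minimal sketch in Python: the names `CLUSTER_PORTS` and `search_url` are illustrative and not part of any Wikimedia library, and the function only builds a URL string without sending a request.

```python
from urllib.parse import quote, urlencode

BASE_HOST = "cloudelastic.wikimedia.org"
# Ports as listed for each cluster on this page.
CLUSTER_PORTS = {"chi": 8243, "psi": 8643, "omega": 8443}


def search_url(via_cluster, index, query, remote_cluster=None):
    """Build a _search URL, optionally targeting an index on another cluster.

    Hypothetical helper for illustration only.
    """
    port = CLUSTER_PORTS[via_cluster]
    # Cross-cluster search prefixes the index with "<cluster>:".
    target = f"{remote_cluster}:{index}" if remote_cluster else index
    # Keep ':', '*' and ',' literal so index patterns survive in the path.
    path = quote(target, safe=":*,")
    return f"https://{BASE_HOST}:{port}/{path}/_search?{urlencode({'q': query})}"


print(search_url("chi", "labswiki", "example", remote_cluster="omega"))
```

Keeping the port lookup in one dictionary means only one place needs to change if a different cluster is used as the entry point.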
A plausible method to programmatically connect to the right cluster is to fetch the /_aliases endpoint from each cluster. The cluster that contains indices for a wiki will have an alias matching the internal database name of the wiki. This alias points to all related indices, such as labswiki_content_<ts> and labswiki_general_<ts>. There are additional single-index aliases that map from a generic name, like labswiki_content, to the exact index used, such as labswiki_content_123456789. Applications should always access indices through aliases to ensure a clean switchover when indices are rebuilt for operational reasons.
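The alias lookup described above might be sketched as follows. This assumes the usual shape of an Elasticsearch /_aliases response, a mapping from concrete index name to an "aliases" object. The helper names and the sample data are made up for illustration, and fetching the responses (which only works from inside Cloud Services) is left out.

```python
def find_cluster(db_name, aliases_by_cluster):
    """Return the name of the cluster that has an alias named db_name, or None.

    aliases_by_cluster maps a cluster name to its parsed /_aliases response.
    """
    for cluster, aliases_response in aliases_by_cluster.items():
        for index_info in aliases_response.values():
            if db_name in index_info.get("aliases", {}):
                return cluster
    return None


def indices_for_alias(alias, aliases_response):
    """List the concrete indices that an alias currently points to."""
    return sorted(
        index
        for index, info in aliases_response.items()
        if alias in info.get("aliases", {})
    )


# Made-up sample data in the shape of GET /_aliases responses.
sample = {
    "chi": {
        "enwiki_content_111": {"aliases": {"enwiki": {}, "enwiki_content": {}}},
    },
    "omega": {
        "labswiki_content_123456789": {"aliases": {"labswiki": {}, "labswiki_content": {}}},
        "labswiki_general_123456789": {"aliases": {"labswiki": {}, "labswiki_general": {}}},
    },
}

print(find_cluster("labswiki", sample))
print(indices_for_alias("labswiki", sample["omega"]))
```

An application would normally use this only to pick the cluster, and then keep querying through the alias rather than the timestamped index names.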
Indices Available
All wikis have two indices, of the format <dbname>_content and <dbname>_general. The content index contains all of the content namespaces of the wiki; the general index contains everything else. For example, on Wikipedias, articles are found in the content index and talk pages are found in the general index. Querying both indices can be done through an alias by providing only the wiki db name.
The set of indices that exist in a cluster can be queried through the elasticsearch cat indices API.
curl -XGET https://cloudelastic.wikimedia.org:8243/_cat/indices
Schema
See mw:Extension:CirrusSearch/Schema.
Example Use Cases
Query all indices
curl -XGET 'https://cloudelastic.wikimedia.org:8243/*,*:*/_search?q=example'
Query all content indices
curl -XGET 'https://cloudelastic.wikimedia.org:8243/*_content,*:*_content/_search?q=example'
Fetch full document for single page by page id
curl -XGET https://cloudelastic.wikimedia.org:8243/enwiki_content/page/33179123
Fetch full document for single page by title
curl -XGET 'https://cloudelastic.wikimedia.org:8243/enwiki_content/_search?q=title.keyword:Elasticsearch'
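Titles that contain spaces or non-ASCII characters need quoting and percent-encoding before they can go into the q parameter. The following is a minimal sketch in Python, assuming the title is wrapped in double quotes so the query string parser treats it as a single phrase; the helper name `title_lookup_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def title_lookup_url(base_url, index, title):
    """Build a _search URL that looks up one page by its exact title.

    Hypothetical helper for illustration only.
    """
    # Escape backslashes and double quotes, then wrap the title in quotes.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    # quote_via=quote encodes spaces as %20 rather than '+'.
    params = urlencode({"q": f'title.keyword:"{escaped}"'}, quote_via=quote)
    return f"{base_url.rstrip('/')}/{index}/_search?{params}"


print(title_lookup_url(
    "https://cloudelastic.wikimedia.org:8243/", "enwiki_content", "Albert Einstein"
))
```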
Fetch full document for page by approximate page title
This is the underlying functionality that powers the 'go directly to page' feature of the wiki autocomplete box. Target title: Ñuñoa
curl -XGET 'https://cloudelastic.wikimedia.org:8243/enwiki_content/_search?q=title.nearmatch:nunoa'
Count words in a wiki
This demonstrates sending a full JSON query in the GET body, and extracting only part of the result using jq.
curl -s -XGET -H 'Content-Type: application/json' -d '{"query":{"bool":{"filter":[{"terms":{"namespace":[0]}}]}},"aggs":{"word_count":{"sum":{"field":"text.word_count"}}},"stats":["sum_word_count"]}' https://cloudelastic.wikimedia.org:8243/enwiki_content/_search | jq -r .aggregations.word_count.value
(Note that the namespace filter should potentially be adjusted for other wikis.)
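The same request body can be built programmatically so that the namespace filter is easy to adjust per wiki. This is a minimal sketch in Python: the function names are illustrative, the body mirrors the JSON in the curl example above, and the response passed to `extract_word_count` is a made-up stub rather than real output.

```python
import json


def word_count_body(namespaces=(0,)):
    """Build the search body that sums text.word_count over the given namespaces."""
    return {
        "query": {"bool": {"filter": [{"terms": {"namespace": list(namespaces)}}]}},
        "aggs": {"word_count": {"sum": {"field": "text.word_count"}}},
        "stats": ["sum_word_count"],
    }


def extract_word_count(response):
    """Pull the aggregated value out of a parsed search response (like the jq filter)."""
    return response["aggregations"]["word_count"]["value"]


# The namespace IDs here are only an example; adjust them for the wiki in question.
print(json.dumps(word_count_body(namespaces=[0, 4])))

# Stub response with a made-up value, showing only the part that is read.
print(extract_word_count({"aggregations": {"word_count": {"value": 1234.0}}}))
```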
See also
Help:Toolforge/Elasticsearch: read/write Elasticsearch service for Toolforge tools
Communication and support
We communicate and provide support through several primary channels. Please reach out with questions and to join the conversation.
Communicate with us
Connect | Best for
Phabricator Workboard: #Cloud-Services | Task tracking and bug reporting
IRC Channel: #wikimedia-cloud (connect, Telegram bridge, Mattermost bridge) | General discussion and support
Mailing List: cloud@ | Information about ongoing initiatives, general discussion and support
Announcement emails: cloud-announce@ | Information about critical changes (all messages mirrored to cloud@)
News wiki page: News | Information about major near-term plans
Cloud Services Blog: Clouds & Unicorns | Learning more details about some of our work
Wikimedia Technical Blog: techblog.wikimedia.org | News and stories from the Wikimedia technical movement
Last edited on 7 June 2021, at 21:31
Content is available under CC BY-SA 3.0 unless otherwise noted.