Portal:Data Services
Data Services includes services that allow for direct access to databases and dumps, as well as web interfaces for querying and programmatic access to data stores.
Data services currently include: Wiki Replicas, ToolsDB, Wikilabels Postgres, Wikimedia Dumps, Shared Storage, CirrusSearch Elasticsearch replicas, Quarry, PAWS, and the OSM Database.
Wiki Replicas
Wiki Replicas are MySQL/MariaDB databases that are replicated in near real time from the production MediaWiki databases of Wikimedia Foundation wikis. The database tables are sanitized for public use.
How to access
Access to the Wiki Replicas is automatically granted to all users of Toolforge. See Help:Toolforge/Database to learn how to access the Wiki Replicas.
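As a rough illustration, the Python sketch below shows the two things a Toolforge tool typically needs to reach a replica: credentials and a hostname. It assumes that the credentials are stored in `~/replica.my.cnf` under a `[client]` section, that replica hosts follow the pattern `<wiki>.analytics.db.svc.wikimedia.cloud` (with `web` as the alternative cluster), that replica database names carry a `_p` suffix, and that the third-party PyMySQL package is installed. Help:Toolforge/Database is the authoritative reference for all of these. The helper names are hypothetical.

```python
from __future__ import annotations

import configparser
import os


def replica_host(wiki: str, cluster: str = "analytics") -> str:
    """Assumed hostname pattern for a wiki's replica, e.g. 'enwiki'."""
    return f"{wiki}.{cluster}.db.svc.wikimedia.cloud"


def replica_database(wiki: str) -> str:
    """Assumed naming: replica databases carry a '_p' suffix."""
    return f"{wiki}_p"


def read_replica_credentials(path: str = "~/replica.my.cnf") -> tuple[str, str]:
    """Read user/password from a MySQL-style option file (assumed [client] section)."""
    # interpolation=None so that '%' characters in passwords are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(os.path.expanduser(path), encoding="utf-8") as handle:
        parser.read_file(handle)
    client = parser["client"]
    # Values may be wrapped in quotes in the file; strip them.
    return client["user"].strip("'\""), client["password"].strip("'\"")


def connect(wiki: str):
    """Open a replica connection; needs the third-party PyMySQL package."""
    import pymysql  # imported lazily so the helpers above work without it

    user, password = read_replica_credentials()
    return pymysql.connect(
        host=replica_host(wiki),
        user=user,
        password=password,
        database=replica_database(wiki),
        charset="utf8mb4",
    )
```

For example, `connect("enwiki")` would open a connection to the English Wikipedia replica from inside Toolforge, after which queries run through `cursor()` as with any PyMySQL connection.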
ToolsDB
ToolsDB is a service that allows a tool's shared user account to create and maintain tool-specific databases.
See Help:Toolforge/Database#User databases for help on ToolsDB.
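ToolsDB databases are ordinary MariaDB databases, so the main thing a tool has to get right is the database name. The sketch below assumes the convention that user databases are named `<credential user>__<suffix>` (two underscores), where the credential user is the `user` value from the tool's `replica.my.cnf`. The helper names, and the restriction of both parts to letters, digits, and underscores, are hypothetical, conservative choices. The sketch builds a name and a `CREATE DATABASE` statement but deliberately does not hard-code a ToolsDB host.

```python
import re

# Assumed naming convention: "<credential user>__<suffix>", e.g. "s51234__mydata".
MAX_MYSQL_IDENTIFIER = 64  # MariaDB/MySQL limit on database name length
_SAFE_PART = re.compile(r"[A-Za-z0-9_]+")  # hypothetical, conservative character set


def toolsdb_database_name(credential_user: str, suffix: str) -> str:
    """Build and validate a ToolsDB database name under the assumed convention."""
    for part in (credential_user, suffix):
        if not _SAFE_PART.fullmatch(part):
            raise ValueError(f"unsupported characters in {part!r}")
    name = f"{credential_user}__{suffix}"
    if len(name) > MAX_MYSQL_IDENTIFIER:
        raise ValueError(f"database name too long ({len(name)} > {MAX_MYSQL_IDENTIFIER})")
    return name


def create_database_sql(name: str) -> str:
    """Return a CREATE DATABASE statement; backticks quote the validated name."""
    return f"CREATE DATABASE IF NOT EXISTS `{name}` CHARACTER SET utf8mb4"
```

Validating the name before interpolating it into SQL keeps the statement safe even though identifiers cannot be passed as query parameters.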
How to access
ToolsDB is accessible at the following addresses:
ToolsDB previously ran on labsdb1005 and was migrated to a Cloud VPS VM called clouddb1001 in the clouddb-services project (see phab:T208754 and phab:T193264 for details about the migration).
You can verify the service status and the availability report in Icinga. Active checks are carried out by Toolschecker upon request by Icinga.
Wikilabels Postgres
The Wikilabels Postgres database, used by ORES, is on a replicated VM cluster: clouddb-wikilabels-01 is the primary with clouddb-wikilabels-02 as the usual replica.
Wikimedia Dumps
Wikimedia Dumps offers a range of data downloads, including full-text dumps and other datasets. More documentation about dumps can be found at Data dumps.
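The XML text dumps (for example the pages-articles files) are typically bzip2-compressed MediaWiki export files, far too large to load into memory at once. A common approach, sketched below under that assumption, is to stream the file with `xml.etree.ElementTree.iterparse` and clear each `<page>` element once it has been read. The namespace-stripping step is there because the export schema version in the XML namespace differs between dump runs. The function names are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix; the export schema version varies between dumps."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[Tuple[str, int]]:
    """Yield (title, namespace) for each <page> without loading the whole file."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if _local(elem.tag) != "page":
            continue  # wait until the whole <page> element has been parsed
        title, ns = "", 0
        for child in elem:
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "ns":
                ns = int(child.text or 0)
        yield title, ns
        elem.clear()  # free the page's subtree, including revision text


def count_articles(path: str) -> int:
    """Count namespace-0 pages in a bzip2-compressed XML dump file."""
    with bz2.open(path, "rb") as stream:
        return sum(1 for _title, ns in iter_pages(stream) if ns == 0)
```

For example, `count_articles("enwiki-latest-pages-articles.xml.bz2")` would count main-namespace pages in a local copy of the file; on a large wiki this still takes a long time, because every page has to be decompressed and parsed.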
How to access
Shared Storage
Shared Storage is offered via NFS. It includes shared directories offered to Cloud VPS and Toolforge users. The currently offered shares are described at Help:Shared storage. Wikimedia Dumps are also offered via the Shared Storage services, but are treated as a separate Data Service because of their wide use.
How to access
The Toolforge environment is set up for access by default. Other Cloud VPS projects can request access to the listed shares by filing a task on Phabricator under the Data-Services and VPS-Projects projects.
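Because the dumps are exposed through the shared NFS storage, a tool with access can read them from the filesystem instead of downloading them. The sketch below assumes the share is mounted at `/public/dumps/public`, with one directory per wiki and one `YYYYMMDD` subdirectory per dump run; Help:Shared storage documents the actual mount points. `latest_dump_dir` is a hypothetical helper that picks the newest run directory.

```python
import os
import re
from typing import Optional

# Assumed mount point of the dumps share; override via the root argument.
DEFAULT_DUMPS_ROOT = "/public/dumps/public"
_DATE_DIR = re.compile(r"\d{8}")  # assumed layout: one YYYYMMDD directory per run


def latest_dump_dir(wiki: str, root: str = DEFAULT_DUMPS_ROOT) -> Optional[str]:
    """Return the newest YYYYMMDD run directory for a wiki, or None if there is none."""
    wiki_dir = os.path.join(root, wiki)
    if not os.path.isdir(wiki_dir):
        return None
    runs = [
        name
        for name in os.listdir(wiki_dir)
        if _DATE_DIR.fullmatch(name) and os.path.isdir(os.path.join(wiki_dir, name))
    ]
    # Zero-padded YYYYMMDD strings sort chronologically as plain strings.
    return os.path.join(wiki_dir, max(runs)) if runs else None
```

A run directory may appear before every file in it is complete, so a careful tool would also check that the specific file it needs exists before using the newest run.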
CirrusSearch Elasticsearch replicas
The "Cloud Elastic" servers are a replica of the CirrusSearch Elasticsearch indices made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge). Applications can use the full power of the Elasticsearch search APIs to query the search indices in ways that CirrusSearch does not expose directly on the wikis themselves. See Help:CirrusSearch elasticsearch replicas for more details.
How to access
These servers are not accessible from the public internet; they can be reached only by applications running inside Wikimedia Cloud Services.
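Queries against the replicas use the standard Elasticsearch search API: a JSON query body posted to `/<index>/_search`. The sketch below uses only the Python standard library. The endpoint URL and port, the `title` field, and the `enwiki_content` index name used in the usage note are assumptions based on usual CirrusSearch naming rather than on this page, so check Help:CirrusSearch elasticsearch replicas before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; confirm the current host and port in the help page.
DEFAULT_ENDPOINT = "https://cloudelastic.wikimedia.org:8243"


def build_search_request(index: str, text: str, size: int = 10,
                         endpoint: str = DEFAULT_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) an Elasticsearch _search request matching page titles."""
    body = {
        "size": size,
        "_source": ["title"],  # return only the title field of each hit
        "query": {"match": {"title": text}},
    }
    return urllib.request.Request(
        url=f"{endpoint}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_titles(index: str, text: str, size: int = 10) -> list:
    """Send the request; this only works from inside Wikimedia Cloud Services."""
    request = build_search_request(index, text, size)
    with urllib.request.urlopen(request, timeout=30) as response:
        hits = json.load(response)["hits"]["hits"]
    return [hit["_source"]["title"] for hit in hits]
```

For example, `search_titles("enwiki_content", "Douglas Adams")` would return up to ten matching titles when run from a Toolforge tool or Cloud VPS instance. Building the request separately from sending it keeps the query easy to inspect and test from anywhere.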
Quarry
Quarry is a graphical web interface that allows users to query the Wiki Replicas with SQL. It is used extensively by analysts, researchers, and people of all experience levels as an easy way to access the databases. See m:Research:Quarry for help.
How to access
Logging in to Quarry requires a Wikimedia SUL account.
PAWS
PAWS is a Jupyter notebook installation hosted by Wikimedia Cloud Services that provides Python notebooks and a terminal through a web browser. You can access the Wiki Replicas, ToolsDB, and the Dumps from PAWS.
How to access
Logging in to PAWS requires a Wikimedia SUL account.
OSM Database
Wikimedia Cloud Services provides a clone of the OSM (OpenStreetMap) database for use inside Toolforge and Cloud VPS. See Help:Toolforge/Database#Connecting to OSM via the official CLI PostgreSQL and Openstreetmap Databases for more information.
Wikimedia Enterprise
Wikimedia Enterprise is a set of APIs aimed at large-scale users. For more information on the APIs, see the service's documentation on mediawiki.org.
How to access
Users of Toolforge, Cloud VPS, or PAWS have access to the Misc and Bulk APIs (Daily and Hourly Exports).
See also
Data Services administrative documentation
Category: Portals
This page was last edited on 22 April 2022, at 22:41.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. See Terms of Use for details.