These docs are for maintainers of the various dumps. Information for users of the dumps can be found on metawiki. Information for developers can be found on mediawiki.org.
Dumps maintainers should watch or check a few things every day.
We produce several types of dumps. For information about deployment of updates, architecture of the dumps, and troubleshooting each dump type, check the appropriate entry below.
- xml/sql dumps which contain revision metadata and content for public Wikimedia projects, along with contents of select sql tables
- adds/changes dumps which contain a daily xml dump of new pages or pages with new revisions since the previous run, for public Wikimedia projects
- Wikidata entity dumps which contain dumps of 'entities' (Qxxx) in various formats, and a dump of lexemes, run once a week.
- category dumps which contain weekly full and daily incremental category lists for public Wikimedia projects, in rdf format
- other miscellaneous dumps including content translation dumps, cirrus search dumps, and global block information.
Other datasets are also provided for download, such as page view counts; these datasets are managed by other folks and are not documented here.
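To illustrate what "new pages or pages with new revisions since the previous run" means for the adds/changes dumps listed above, here is a minimal sketch in Python. It is not taken from the actual dump scripts. The `Revision` record, the `pages_changed_since` helper, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    """Hypothetical revision record; not the real dumps data model."""
    page_id: int
    rev_id: int
    timestamp: datetime


def pages_changed_since(revisions, previous_run):
    """Return the sorted ids of pages with at least one revision made
    strictly after the previous run. A page created after the previous
    run qualifies too, because its first revision is newer than the cutoff."""
    return sorted({rev.page_id for rev in revisions if rev.timestamp > previous_run})


if __name__ == "__main__":
    previous_run = datetime(2021, 9, 19)
    revisions = [
        Revision(1, 100, datetime(2021, 9, 18, 12, 0)),  # before the cutoff, skipped
        Revision(2, 101, datetime(2021, 9, 19, 8, 30)),  # new revision of page 2
        Revision(3, 102, datetime(2021, 9, 19, 9, 0)),   # first revision of page 3
    ]
    print(pages_changed_since(revisions, previous_run))  # prints [2, 3]
```

The sketch only shows the selection idea. How the real adds/changes job determines the cutoff and reads revisions is covered in that dump type's own documentation.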
- Dumps snapshot hosts that run scripts to generate the dumps
- Dumps datastores where the snapshot hosts write intermediate and final dump output files, which are later published to our web servers
- Dumps servers that provide the dumps to the public, to our mirrors, and via nfs to Wikimedia Cloud Services and stats host users
If you are interested in adding a new dumpset, please check the guidelines (still in draft form).
Testing changes to the dumps or new scripts
Last edited on 20 September 2021, at 07:12