Nova Resource:Dumps/Archive.org

Revision as of 09:37, 14 November 2012

Archive.org refers to the Internet Archive, a digital library that hosts mainly scanned books but accepts almost anything that is freely licensed content.

We are currently working on moving the public datasets to the Internet Archive for preservation, although right now the work is mainly handled by volunteers (specifically Hydriz and Nemo).

Archiving from Labs

There is a project on Wikimedia Labs called "Dumps" (https://labsconsole.wikimedia.org/wiki/Nova_Resource:Dumps) that is dedicated to running the archiving processes by volunteers. Currently, the datasets being archived are:

  1. Adds/Changes dumps (source: https://github.com/Hydriz/incrdumps) - Runs daily via an indirect cron (see cron.py).
  2. Incremental media tarballs (source: https://github.com/Hydriz/MediaTarballs) - Runs 7 days (or any other number, as long as zuwiktionary is complete) after the tarballs first appear at http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/incrs/tarballs/.
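
The "indirect cron" mentioned above can be pictured as a single daily cron entry that runs a dispatcher, which in turn decides which archiving jobs are actually due. The sketch below illustrates that idea only; the function and job names are assumptions, not the project's actual cron.py code.

```python
# Hypothetical sketch of an indirect-cron dispatcher: cron runs this
# once a day, and it launches only the jobs whose interval has elapsed.
# All names and intervals here are illustrative assumptions.
import datetime

def is_due(last_run: datetime.date, interval_days: int,
           today: datetime.date) -> bool:
    """Return True when at least interval_days have passed since last_run."""
    return (today - last_run).days >= interval_days

def due_jobs(jobs: dict, today: datetime.date):
    """Yield the names of jobs that should be launched today."""
    for name, (last_run, interval) in jobs.items():
        if is_due(last_run, interval, today):
            yield name

jobs = {
    # job name: (date of last successful run, run interval in days)
    "adds-changes": (datetime.date(2012, 11, 13), 1),   # daily
    "media-tarballs": (datetime.date(2012, 11, 1), 7),  # weekly-ish
}
today = datetime.date(2012, 11, 14)
print(sorted(due_jobs(jobs, today)))  # → ['adds-changes', 'media-tarballs']
```

A real dispatcher would then spawn each due job (for example with subprocess) and record the new last-run date, so the schedule survives restarts.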

Currently not running but is being planned:

  1. Main database dumps
  2. Full media tarballs (at least for those <10GB)
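
For the upload side, archive.org items can be created programmatically with the third-party "internetarchive" Python library. The identifier scheme and metadata fields below are illustrative assumptions, not the project's actual conventions.

```python
# Minimal sketch of registering one dump on archive.org. The metadata
# layout and identifier are examples, not the project's real scheme.

def build_item_metadata(wiki: str, dump_date: str) -> dict:
    """Assemble archive.org item metadata for one dump (hypothetical scheme)."""
    return {
        "title": f"Wikimedia database dump of {wiki} ({dump_date})",
        "mediatype": "web",
        "subject": "wikimedia;dumps",
        "date": dump_date,
    }

print(build_item_metadata("zuwiktionary", "2012-11-14"))

# To actually push files (requires credentials set up via "ia configure"):
#   from internetarchive import upload
#   upload("wikimedia-dumps-zuwiktionary-20121114",   # example identifier
#          files=["zuwiktionary-20121114-pages-articles.xml.bz2"],
#          metadata=build_item_metadata("zuwiktionary", "2012-11-14"))
```

Keeping the metadata builder separate from the upload call makes it easy to dry-run the item layout before spending bandwidth on a multi-gigabyte tarball.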