Nova Resource:Dumps/Archive.org

From Wikitech
Revision as of 02:34, 15 March 2013

Archive.org refers to the Internet Archive, a digital library that hosts mainly scanned books but accepts almost any freely licensed content.

We are currently working on moving the public datasets to the Archive for preservation, although right now the work is mainly handled by volunteers (specifically Hydriz and Nemo).

Archiving from Labs

There is a project on Wikimedia Labs called "Dumps" (https://labsconsole.wikimedia.org/wiki/Nova_Resource:Dumps) that is dedicated to running the archiving processes by volunteers. Currently, the datasets that are being archived are:

  1. Adds/Changes dumps (source: https://github.com/Hydriz/incrdumps) - Runs daily via an indirect cron (see cron.py).
  2. Incremental media tarballs (source: https://github.com/Hydriz/MediaTarballs) - Runs 7 days (or any other number, as long as zuwiktionary is complete) after a tarball first appears at http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/incrs/tarballs/. (not currently running)
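The delay logic described above (wait a fixed number of days after a tarball first appears, so the run has time to reach zuwiktionary, i.e. to complete) can be sketched as follows. This is a minimal illustration only, not the actual cron.py: the function name `should_archive` and the 7-day default are assumptions.

```python
import datetime

def should_archive(first_seen: datetime.date,
                   today: datetime.date,
                   delay_days: int = 7) -> bool:
    """Return True once at least `delay_days` days have passed since the
    dataset first appeared on the mirror, giving the upstream run time
    to finish before we start archiving it."""
    return (today - first_seen).days >= delay_days

# Example: a tarball first seen on 1 March is eligible on 8 March.
first_seen = datetime.date(2013, 3, 1)
print(should_archive(first_seen, datetime.date(2013, 3, 8)))  # True
print(should_archive(first_seen, datetime.date(2013, 3, 5)))  # False
```

An "indirect cron" in this sense would run such a check every day and only launch the archiving job when the check passes, rather than scheduling the job itself at a fixed date.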

Currently not running but is being planned:

  1. Main database dumps
  2. Full media tarballs (at least for those under 10 GB)