Help:Shared storage

From Wikitech
Hydriz (talk | contribs)
→‎Public datasets: add add/change dumps

Revision as of 02:48, 17 May 2013

In the Wikimedia Labs cluster, there are a few shared storage directories that you can access from any instance. Each directory serves a different purpose and is mounted at its own path.

Project storage

Labs is meant to be a collaborative, community-maintained environment. Therefore, we strongly recommend against saving data in spaces that are accessible only to individuals. We also recommend running things as service users rather than as individual users, so that if a user leaves, their bot or tool can easily be transferred to another user.

The project storage is a storage space in which you and your project members can store files, and it is the recommended place to do so. It is available on every instance in your project, but is not shared across projects. The directory is available at /data/project.

There is a default quota of 300GB per project, which is sufficient for most projects. If you and your project members need more space than that, ask politely on IRC or email the Labs-l mailing list, stating why the quota should be raised for your project.

Running the command df -h while the directory is mounted on your instance shows how much space has been consumed on the project storage directory, how much is left, and what the quota is for your project.
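The check described above can be wrapped in a small helper. This is only a minimal sketch: show_usage is a hypothetical function name, and it assumes that passing a directory to df -h reports on the filesystem that holds it, which is standard df behaviour.

```shell
# show_usage is a hypothetical helper name, not something provided by Labs.
# It prints the df -h report for the filesystem holding the given directory,
# defaulting to the project storage path mentioned in the text.
show_usage() {
    dir="${1:-/data/project}"
    if [ ! -d "$dir" ]; then
        # The directory is missing, for example because it is not mounted yet.
        echo "not available: $dir" >&2
        return 1
    fi
    # Columns: filesystem, size, used, available, use%, mount point.
    df -h "$dir"
}
```

Calling show_usage with no argument reports on /data/project; passing another directory reports on that directory instead.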

Troubleshooting

The directory can't be mounted on my instance!

Try installing/upgrading the glusterfs package:

sudo apt-get install glusterfs

If the package is already installed, try restarting the autofs daemon, which handles the mount:

sudo /etc/init.d/autofs restart

If the above does not work, reboot the instance before trying the steps again. If you have tried everything you can and it still does not mount, ask a human.
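Before and after each of these steps, it can help to confirm whether the directory is actually mounted. The following is a rough sketch, not an official Labs tool: is_mounted is a hypothetical helper, and it assumes a Linux instance where /proc/mounts is readable and where an entry of type autofs is only the automounter's placeholder rather than the real filesystem.

```shell
# is_mounted is a hypothetical helper: it succeeds only if the given path
# appears in /proc/mounts with a filesystem type other than "autofs".
is_mounted() {
    # Touch the directory first so that an automounter, if any, can react.
    ls "$1" >/dev/null 2>&1 || true
    # Fields in /proc/mounts: device, mount point, filesystem type, options...
    awk -v target="$1" '
        $2 == target && $3 != "autofs" { found = 1 }
        END { exit !found }
    ' /proc/mounts
}
```

For example, `is_mounted /data/project && echo mounted || echo "not mounted"` prints which state the instance is in.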

The disk usage is showing wrong figures!

If running du -sh in /data/project reports far less than df -h does, and df -h shows an absurd figure for how much has been used, try this:

TODO

For now, you can just drop by our IRC channel and ping Ryan Lane for help (and put docs here!).

Home directory storage

You have a /home directory that is available on every instance in your project. We discourage the use of home directories for project files, though: they are personal spaces and therefore do not support collaboration. We encourage project storage to be used in a fairly open way (not per-user, but per-bot, per-tool, or per-subproject). The only thing that should go into a user's home directory is their environment settings.

Home directory storage is limited to 50GB per project. The limit can be increased upon request, although this is very unlikely to happen.

Note: New directories can be created inside the /home directory, which is helpful if you are trying to create a user within your own instance.

Public datasets

Wikimedia Labs has a directory for storing the public datasets of Wikimedia (i.e. the dumps generated by Wikimedia). These files can be accessed at /public/datasets. The directory is read-only, but you can copy files from it to your project storage directory and manipulate them in whatever way you like.
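Copying a file out of the read-only directory could look like the sketch below. copy_dataset is a hypothetical helper, and the default destination /data/project/datasets is an assumption made for illustration; pick whatever location suits your project.

```shell
# copy_dataset is a hypothetical helper, not part of Labs.
# Usage: copy_dataset SOURCE_FILE [DEST_DIR]
# DEST_DIR defaults to /data/project/datasets, which is an assumed location.
copy_dataset() {
    src="$1"
    dest="${2:-/data/project/datasets}"
    # Create the destination if needed, then copy the file into it.
    mkdir -p "$dest" && cp "$src" "$dest/"
}
```

The source argument would be the path of a dump file somewhere under /public/datasets; the copy in project storage is writable, unlike the original.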

These files are provided to avoid using too much bandwidth between the dumps.wikimedia.org server and Wikimedia Labs, and perhaps also to make access to them faster.

However, this directory only contains the last 5 good dumps generated by Wikimedia, and not dumps generated a few months back. For those dumps, you would have to manually download them from the Wikimedia Downloads server.

The add/change dumps are also available under /public/datasets/public/other/incr.

Public keys

This is just a directory that contains users' public keys. It is located at /public/keys and is accessible everywhere. It is of little direct use to you, since it is only used for authentication between the user and Wikimedia Labs.