Obsolete:Labs NFS
Revision as of 20:32, 8 September 2015

NFS is served to eqiad labs from one of two servers (labstore1001 and labstore1002), both of which are connected to a set of five MD1200 disk shelves.

Hardware setup

Each server is connected to all five shelves, with three shelves on one port of the controller and two on the other. Each shelf holds 12 1.8 TB SAS drives, and the controller is configured to expose each drive to the OS as a single-disk raid 0 (the H800 controller does not support a true JBOD configuration). In addition, each server independently has 12 more 1.8 TB SAS drives in its internal bays.

The internal disks are visible to the OS as /dev/sda to /dev/sdl, and the shelves' disks as /dev/sdm to /dev/sdbt. (A quick early diagnostic is available at the end of POST as the PERCs start up; in normal operation they should report 72 exported disks.)
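
As a quick sanity check from the OS side, the number of visible disks can be counted; this is only a sketch, assuming the usual sd* naming described above:

  # Count whole disks as seen by the kernel; a healthy system should show
  # 72 sd* devices (12 internal plus 60 across the five shelves).
  lsblk -d -o NAME,SIZE,TYPE | grep -c '^sd'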

Software RAID

The external shelves are configured as raid10 arrays of 12 drives each, built from six drives on one shelf and six on another (so that the failure of any single shelf does not take an entire array offline). MD numbering is not guaranteed to be stable between boots, but the current arrays are normally numbered md122-md125.
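
For reference, the arrays and their members can be inspected with mdadm. The creation command below is only a sketch of the intended layout; the member device names are placeholders, not the real shelf-to-device mapping:

  # Inspect the current arrays and their member disks
  cat /proc/mdstat
  mdadm --detail /dev/md122

  # Sketch: build a 12-drive raid10 from six drives on each of two shelves
  # (placeholder device names)
  mdadm --create /dev/md122 --level=10 --raid-devices=12 \
      /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr \
      /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad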

In addition, the first two drives of the internal bay are configured as a raid1 (md0) for the OS.

LVM

Each shelf array is configured as an LVM physical volume and pooled into the labstore volume group, from which all shared volumes are allocated.
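
The layout can be verified with the standard LVM tools; the last two commands are only a sketch of how an additional shelf array would be added to the pool (md126 stands for a hypothetical new array):

  # Show the physical volumes and the labstore volume group
  pvs
  vgs labstore

  # Sketch: initialise a new shelf array and add it to the pool
  pvcreate /dev/md126
  vgextend labstore /dev/md126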

There is still a backup volume group, made up of the internal drives of labstore1002 (not counting the OS-allocated drives), that holds old images, but that VG is no longer in active use.

The labstore volume group contains four primary logical volumes:

  • labstore/tools, shared storage for the tools project
  • labstore/maps, shared storage for the maps project
  • labstore/others, containing storage for all other labs projects
  • labstore/scratch, containing the labs-wide scratch storage

Conceptually, the volumes are mounted under /srv/{project,others}/$project: /srv/others is the mountpoint of the "others" volume (holding a subdirectory per project), while the project-specific volumes are each mounted under /srv/project/. This is configured in /etc/fstab and must be adjusted accordingly when new project-specific volumes are created.
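
As an illustration of what adding a new project-specific volume might involve (the volume name, size, filesystem and mount options below are assumptions, not the actual configuration):

  # Allocate a new volume from the labstore VG, create a filesystem and mountpoint
  lvcreate --size 1T --name newproject labstore
  mkfs.ext4 /dev/labstore/newproject
  mkdir -p /srv/project/newproject

The matching /etc/fstab line might then look like:

  /dev/labstore/newproject  /srv/project/newproject  ext4  defaults,noatime  0  2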

In addition to the shared storage volumes, the volume group also contains transient snapshots made during the backup process.
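
Such a snapshot might be created and discarded roughly as follows (the origin volume, snapshot name and size are illustrative):

  # Create a transient snapshot of the tools volume for the backup run
  lvcreate --snapshot --size 200G --name tools-snap labstore/tools

  # ... back up from the snapshot ...

  # Drop the snapshot once the backup has completed
  lvremove -f labstore/tools-snap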

NFS Exports

NFS version 4 exports from a single, unified tree (/exp/ in our setup). This tree is populated with bind mounts of the various subdirectories of /srv and is kept in sync with changes there by the /usr/local/sbin/sync-exports script. It is matched with the actual NFS exports in /etc/exports.d, one file per project.
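
For illustration, a bind mount and the matching per-project export entry might look like the following; the layout under /exp/, the file name, the client range and the export options are all assumptions rather than the real configuration:

  # Bind-mount a project's /srv subtree into the export tree
  # (normally handled by sync-exports)
  mount --bind /srv/project/tools /exp/project/tools

  # Example contents of a per-project file such as /etc/exports.d/tools.exports
  /exp/project/tools  10.68.16.0/21(rw,sync,no_subtree_check)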

One huge caveat: it is imperative that sync-exports be executed before NFS is started, as it sets up the actual filesystems to be exported (through the bind mounts). If NFS is started before that point, any NFS client will notice the changed root inode and will remain stuck with "stale NFS handle" errors until it is rebooted (whereas clients should otherwise be able to recover from any outage, since all NFS mounts are hard).
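
In practice the safe ordering is simply the following; the service name is the stock Debian/Ubuntu one and is an assumption about this particular setup:

  # Populate /exp/ with the bind mounts first, then start the NFS server
  /usr/local/sbin/sync-exports
  service nfs-kernel-server start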