md

MD(4)									MD(4)



NAME
       md - Multiple Device driver aka Linux Software RAID

SYNOPSIS
       /dev/mdn
       /dev/md/n

DESCRIPTION
       The  md	driver	provides virtual devices that are created from one or
       more independent underlying devices.  This array of devices often con-
       tains redundancy, and hence the acronym RAID which stands for a Redun-
       dant Array of Independent Devices.

       md supports RAID levels 1 (mirroring), 4 (striped array with parity
       device), 5 (striped array with distributed parity information) and 6
       (striped array with distributed dual redundancy information).  If some
       number  of  underlying  devices fails while using one of these levels,
       the array will continue to function; this number is one for RAID	 lev-
       els  4  and  5,	two  for RAID level 6, and all but one (N-1) for RAID
       level 1.

       md also supports a number of pseudo  RAID  (non-redundant)  configura-
       tions  including RAID0 (striped array), LINEAR (catenated array), MUL-
       TIPATH (a set of different interfaces to the same device), and  FAULTY
       (a layer over a single device that synthesises errors).


   MD SUPERBLOCK
       Though it is possible to create an array without using per-device
       superblocks (see below), each device in an MD array will normally have
       a superblock written towards the end of the device.  This superblock
       records information about the structure and state of the array so that
       the array can be reliably re-assembled after a shutdown.

       The superblock is 4K long and is written into a 64K aligned block that
       starts at least 64K and less than 128K from the end of the device
       (i.e. to get the address of the superblock, round the size of the
       device down to a multiple of 64K and then subtract 64K).  The
       available size of each device is the amount of space before the
       superblock, so between 64K and 128K is lost when a device is
       incorporated into an MD array.
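
       The address rule above can be sketched in a few lines of Python.
       This is an illustration only: the function name is hypothetical and
       sizes are assumed to be given in bytes.

```python
ALIGN = 64 * 1024  # the 64K alignment described above


def superblock_offset(device_size: int) -> int:
    """Return the byte offset of the superblock for a device of this size.

    Rounds the device size down to a multiple of 64K, then subtracts 64K.
    The result is also the space available to the array on this device.
    """
    if device_size < 2 * ALIGN:
        raise ValueError("device too small to hold data and a superblock")
    return (device_size // ALIGN) * ALIGN - ALIGN


size = 1_000_000
offset = superblock_offset(size)
lost = size - offset  # always at least 64K and less than 128K
```

       For a device whose size is an exact multiple of 64K, exactly 64K is
       lost; otherwise the loss is 64K plus the unaligned remainder.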

       The superblock contains, among other things:

       LEVEL  The  manner  in  which  the devices are arranged into the array
	      (linear, raid0, raid1, raid4, raid5, multipath).

       UUID   A 128-bit Universally Unique Identifier that identifies the
	      array that this device is part of.


   NO-SUPERBLOCK ARRAYS
       It  is possible for some md arrays to be created without a superblock.
       This allows the whole of each device to participate in the array,  but
       requires	 some  external mechanism to determine what devices should be
       arranged into which arrays.

       FAULTY arrays are an obvious candidate for not having a superblock  as
       there is nothing useful to go in the superblock.  MULTIPATH arrays can
       also be usefully made without superblocks as there are  likely  to  be
       other  ways to detect that two paths connect to the same real devices.

       Other array types that can work without superblocks are RAID1, RAID0,
       and LINEAR.  However, these should only be made without a superblock
       if you are sure that you know what you are doing.


   LINEAR
       A linear array simply catenates the  available  space  on  each	drive
       together to form one large virtual drive.

       One advantage of this arrangement over the more common RAID0
       arrangement is that the array may be reconfigured at a later time with
       an extra drive, making the array bigger without disturbing the data
       that is already on it.  However, this cannot yet be done on a live
       array.
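
       As a rough illustration of catenation, the sketch below maps an
       offset in the virtual drive to a component drive.  The function name
       and the use of plain integer sizes are assumptions made for the
       example, not part of the driver.

```python
from typing import Sequence, Tuple


def linear_map(offset: int, sizes: Sequence[int]) -> Tuple[int, int]:
    """Map an array offset to (drive index, offset within that drive)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for index, size in enumerate(sizes):
        if offset < size:
            return index, offset
        offset -= size  # skip past this drive and try the next one
    raise ValueError("offset beyond end of array")
```

       Appending a drive to the list leaves every existing mapping
       unchanged, which is why such an array can grow without disturbing
       its data.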



   RAID0
       A  RAID0	 array (which has zero redundancy) is also known as a striped
       array.  A RAID0 array is configured at  creation	 with  a  Chunk	 Size
       which must be a power of two, and at least 4 kibibytes.

       The  RAID0  driver  assigns  the first chunk of the array to the first
       device, the second chunk to the second device, and  so  on  until  all
       drives  have been assigned one chunk.  This collection of chunks forms
       a stripe.  Further chunks are gathered into stripes in the same way,
       and these stripes are assigned to the remaining space on the drives.

       If  devices  in	the  array  are	 not all the same size, then once the
       smallest device has been exhausted, the RAID0 driver starts collecting
       chunks into smaller stripes that only span the drives which still have
       remaining space.
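
       The chunk assignment described above, including the smaller stripes
       used once the smallest device is full, might be modelled as follows.
       This is a sketch under assumptions: the function name is
       hypothetical, device capacities are given in whole chunks, and
       devices are assumed to keep their original order within the smaller
       stripes.

```python
from typing import Sequence, Tuple


def raid0_map(chunk: int, chunks_per_device: Sequence[int]) -> Tuple[int, int]:
    """Map an array chunk number to (device index, chunk index on device)."""
    if chunk < 0:
        raise ValueError("chunk number must be non-negative")
    remaining = chunk
    first_row = 0  # first stripe row of the current zone
    for limit in sorted(set(chunks_per_device)):
        # Devices that still have space in rows first_row .. limit-1.
        members = [d for d, c in enumerate(chunks_per_device) if c >= limit]
        zone_chunks = (limit - first_row) * len(members)
        if remaining < zone_chunks:
            row, position = divmod(remaining, len(members))
            return members[position], first_row + row
        remaining -= zone_chunks
        first_row = limit
    raise ValueError("chunk number beyond end of array")
```

       With capacities of 2, 4 and 4 chunks, chunks 0 to 5 form two
       stripes across all three devices, and chunks 6 to 9 form two
       stripes across the last two devices only.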



   RAID1
       A RAID1 array is also known as a mirrored set (though mirrors tend  to
       provide reflected images, which RAID1 does not) or a plex.

       Once  initialised,  each	 device in a RAID1 array contains exactly the
       same data.  Changes are written to all devices in parallel.   Data  is
       read  from  any	one  device.   The driver attempts to distribute read
       requests across all devices to maximise performance.

       All devices in a RAID1 array should be the same	size.	If  they  are
       not, then only the amount of space available on the smallest device is
       used.  Any extra space on other devices is wasted.


   RAID4
       A RAID4 array is like a RAID0 array with an extra device	 for  storing
       parity.	This  device  is the last of the active devices in the array.
       Unlike RAID0, RAID4 also requires that all stripes span all drives, so
       extra space on devices that are larger than the smallest is wasted.

       When any block in a RAID4 array is modified, the parity block for that
       stripe (i.e. the block in the parity device at the same device offset
       as the stripe) is also modified so that the parity block always
       contains the "parity" for the whole stripe, i.e. its contents are
       equivalent to the result of performing an exclusive-or operation
       between all the data blocks in the stripe.

       This allows the array to continue to function  if  one  device  fails.
       The  data that was on that device can be calculated as needed from the
       parity block and the other data blocks.
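
       The exclusive-or relationship can be demonstrated with a short
       Python sketch.  The helper name and the tiny two-byte blocks are
       chosen for illustration only.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Return the byte-wise exclusive-or of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe and their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# If the device holding data[1] fails, its block is recomputed from the
# parity block and the surviving data blocks.
recovered = xor_blocks([data[0], data[2], parity])
```

       Because exclusive-or is its own inverse, the same helper serves both
       to compute parity and to recover a missing block.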


   RAID5
       RAID5 is very similar to RAID4.	The difference	is  that  the  parity
       blocks  for each stripe, instead of being on a single device, are dis-
       tributed across all devices.  This allows more parallelism when	writ-
       ing  as	two different block updates will quite possibly affect parity
       blocks on different devices so there is less contention.

       This also allows more parallelism when reading as  read	requests  are
       distributed  over all the devices in the array instead of all but one.
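
       The text above does not say how the parity block moves from device
       to device.  The sketch below assumes a simple rotation, starting on
       the last device and moving back by one device per stripe; it is one
       possible placement, shown only to make the idea concrete, and is
       not claimed to be the layout md uses.

```python
def parity_device(stripe: int, n_devices: int) -> int:
    """Return the device holding parity for a stripe (assumed rotation)."""
    if n_devices < 2 or stripe < 0:
        raise ValueError("need at least two devices and a non-negative stripe")
    return (n_devices - 1) - (stripe % n_devices)


# Over any n_devices consecutive stripes, every device holds parity once.
placement = [parity_device(s, 4) for s in range(4)]
```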


   RAID6
       RAID6 is similar to RAID5, but can handle the loss of any two  devices
       without	data  loss.   Accordingly,  it requires N+2 drives to store N
       drives worth of data.

       The performance for RAID6 is slightly lower but comparable to RAID5 in
       normal  mode  and  single  disk failure mode.  It is very slow in dual
       disk failure mode, however.
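
       The space rules given for the redundant levels can be collected into
       one small calculation.  This is a sketch: the function name is
       hypothetical, sizes are the available sizes of the devices, and
       RAID5 and RAID6 are assumed to use only the space of the smallest
       device on every member, as stated above for RAID1 and RAID4.

```python
from typing import Sequence


def usable_capacity(level: int, sizes: Sequence[int]) -> int:
    """Return the usable size of a redundant array built from the devices."""
    n = len(sizes)
    # Number of devices' worth of space spent on redundancy at each level.
    redundancy = {1: n - 1, 4: 1, 5: 1, 6: 2}
    if level not in redundancy:
        raise ValueError("only RAID levels 1, 4, 5 and 6 are handled")
    if n <= redundancy[level]:
        raise ValueError("too few devices for this level")
    return (n - redundancy[level]) * min(sizes)
```

       For example, six devices of 100 units each give 400 units at RAID6,
       matching the N+2 rule.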


   MULTIPATH
       MULTIPATH is not really a RAID at all as there is only one real device
       in  a  MULTIPATH	 md  array.  However there are multiple access points
       (paths) to this device, and one of these paths might  fail,  so	there
       are some similarities.

       A  MULTIPATH  array  is	composed  of  a number of logically different
       devices, often fibre channel interfaces, that all refer to the same
       real device. If one of these interfaces fails (e.g. due to cable prob-
       lems), the multipath driver  will  attempt  to  redirect	 requests  to
       another interface.


   FAULTY
       The FAULTY md module is provided for testing purposes.  A faulty array
       has exactly one component device and is normally assembled  without  a
       superblock,  so	the md array created provides direct access to all of
       the data in the component device.

       The FAULTY module may be requested to simulate faults to allow testing
       of other md levels or of filesystems.  Faults can be chosen to trigger
       on read requests or write requests, and can be transient (a subsequent
       read/write at the address will probably succeed) or persistent
       (subsequent read/write of the same address will fail).  Further, read faults
       can  be	"fixable"  meaning that they persist until a write request at
       the same address.

       Fault types can be requested with a period.  In this  case  the	fault
       will recur repeatedly after the given number of requests of the
       relevant type.  For example, if persistent read faults have a period of
       100,  then  every  100th	 read request would generate a fault, and the
       faulty sector would be recorded so that subsequent reads on that	 sec-
       tor would also fail.

       There  is a limit to the number of faulty sectors that are remembered.
       Faults generated after this limit is exhausted are  treated  as	tran-
       sient.
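
       The interaction between the period and the limit on remembered
       sectors can be modelled with a small simulation.  This is not the
       module's code: the class and its names are hypothetical, and it
       assumes that a read of an already recorded sector does not count
       towards the period.

```python
class PersistentReadFaults:
    """Toy model of persistent read faults with a period and a limit."""

    def __init__(self, period: int, max_remembered: int) -> None:
        self.period = period
        self.max_remembered = max_remembered
        self.count = 0            # reads seen so far (assumed counting rule)
        self.bad_sectors = set()  # sectors recorded as faulty

    def read_fails(self, sector: int) -> bool:
        """Return True if a read of this sector generates a fault."""
        if sector in self.bad_sectors:
            return True
        self.count += 1
        if self.count % self.period != 0:
            return False
        # A periodic fault: remember the sector if there is still room,
        # otherwise the fault is only transient.
        if len(self.bad_sectors) < self.max_remembered:
            self.bad_sectors.add(sector)
        return True
```

       With a period of 3 and room for one sector, the third read fails
       and keeps failing; the sixth read fails once but is not remembered.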

       The  list  of  faulty  sectors  can be flushed, and the active list of
       failure modes can be cleared.


   UNCLEAN SHUTDOWN
       When changes are made to a RAID1, RAID4, RAID5 or RAID6 array there is
       a possibility of inconsistency for short periods of time, as each
       update requires at least two blocks to be written to different
       devices, and these writes probably won’t happen at exactly the same
       time.  Thus if a system with one of these arrays is shut down in the
       middle of a write operation (e.g. due to power failure), the array may
       not be consistent.

       To handle this situation, the md driver	marks  an  array  as  "dirty"
       before  writing any data to it, and marks it as "clean" when the array
       is being disabled, e.g. at shutdown.  If the md driver finds an	array
       to be dirty at startup, it proceeds to correct any possible
       inconsistency.  For RAID1, this involves copying the contents of the first
       drive onto all other drives.  For RAID4, RAID5 and RAID6 this involves
       recalculating the parity for each stripe and making sure that the par-
       ity block has the correct data.	This process, known as "resynchronis-
       ing" or "resync" is performed in the background.	 The array can	still
       be used, though possibly with reduced performance.

       In 2.6 Linux kernels, an md array is marked clean after a short period
       (around 20 milliseconds) of no write activity, and then	marked	dirty
       before  any  subsequent	write  is attempted.  This means that unclean
       shutdowns are much less likely with a 2.6 kernel.

       If a RAID4, RAID5 or RAID6 array is degraded  (missing  at  least  one
       drive)  when  it	 is  restarted	after  an unclean shutdown, it cannot
       recalculate parity, and so it is possible that  data  might  be	unde-
       tectably	 corrupted.  The 2.4 md driver does not alert the operator to
       this condition.	The 2.5 md driver will fail to start an array in this
       condition without manual intervention.


   RECOVERY
       If  the	md  driver  detects  any error on a device in a RAID1, RAID4,
       RAID5 or RAID6 array, it immediately disables that device (marking  it
       as faulty) and continues operation on the remaining devices.  If there
       is a spare drive, the driver will start recreating, on one of the
       spare drives, the data that was on the failed drive, either by copying
       a working drive in a RAID1 configuration, or by doing calculations
       with the parity block on RAID4, RAID5 or RAID6.

       While  this  recovery process is happening, the md driver will monitor
       accesses to the array and will slow down the rate of recovery if other
       activity	 is happening, so that normal access to the array will not be
       unduly affected.	 When no other activity is  happening,	the  recovery
       process	proceeds at full speed.	 The actual speed targets for the two
       different situations can be  controlled	by  the	 speed_limit_min  and
       speed_limit_max control files mentioned below.


   KERNEL PARAMETERS
       The md driver recognises three different kernel parameters.

       raid=noautodetect
	      This  will  disable the normal detection of md arrays that hap-
	      pens at boot time.  If a drive is partitioned with MS-DOS style
	      partitions,  then	 if any of the 4 main partitions has a parti-
	      tion type	 of  0xFD,  then  that	partition  will	 normally  be
	      inspected	 to see if it is part of an MD array, and if any full
	      arrays are found, they are started.  This kernel parameter
	      disables this behaviour.


       raid=partitionable

       raid=part
	      These  are available in 2.6 and later kernels only.  They indi-
	      cate that autodetected MD arrays should be  created  as  parti-
	      tionable	arrays,	 with  a different major device number to the
	      original non-partitionable md arrays.   The  device  number  is
	      listed as mdp in /proc/devices.



       md=n,dev,dev,...

       md=dn,dev,dev,...
	      This tells the md driver to assemble /dev/mdn from the listed
	      devices.	It is only necessary to start the device holding  the
	      root  filesystem	this way.  Other arrays are best started once
	      the system is booted.

	      In 2.6 kernels, the d immediately after the = indicates that  a
	      partitionable  device  (e.g.   /dev/md/d0)  should  be  created
	      rather than the original non-partitionable device.


       md=n,l,c,i,dev...
	      This tells the md driver to assemble a legacy RAID0  or  LINEAR
	      array  without  a	 superblock.  n gives the md device number, l
	      gives the level, 0 for RAID0 or -1  for  LINEAR,	c  gives  the
	      chunk  size  as a base-2 logarithm offset by twelve, so 0 means
	      4K, 1 means 8K.  i is ignored (legacy support).
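
	      As an illustration of the field layout and the chunk size
	      encoding, the sketch below decodes a parameter of this form.
	      The function name and the device names in the test values
	      are hypothetical, and the parser handles only this legacy
	      form.

```python
from typing import Dict, List, Union


def parse_legacy_md(param: str) -> Dict[str, Union[str, int, List[str]]]:
    """Decode an md=n,l,c,i,dev... kernel parameter."""
    name, _, value = param.partition("=")
    fields = value.split(",")
    if name != "md" or len(fields) < 5:
        raise ValueError("expected md=n,l,c,i,dev...")
    n, level, c = int(fields[0]), int(fields[1]), int(fields[2])
    levels = {0: "raid0", -1: "linear"}
    if level not in levels:
        raise ValueError("level must be 0 (RAID0) or -1 (LINEAR)")
    return {
        "device": f"/dev/md{n}",
        "level": levels[level],
        # c is a base-2 logarithm offset by twelve: 0 is 4K, 1 is 8K.
        "chunk_bytes": 1 << (c + 12),
        # fields[3] is the ignored i value; the rest are component devices.
        "components": fields[4:],
    }
```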


FILES
       /proc/mdstat
	      Contains information about the status of the currently running
	      arrays.

       /proc/sys/dev/raid/speed_limit_min
	      A	 readable  and	writable  file that reflects the current goal
	      rebuild speed for times when non-rebuild activity is current on
	      an  array.  The speed is in Kibibytes per second, and is a per-
	      device rate, not a per-array rate (which means  that  an	array
	      with more discs will shuffle more data for a given speed).  The
	      default is 100.


       /proc/sys/dev/raid/speed_limit_max
	      A readable and writable file that	 reflects  the	current	 goal
	      rebuild speed for times when no non-rebuild activity is current
	      on an array.  The default is 100,000.
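
	      Both speed limit files hold a single number, so they can be
	      read with ordinary file operations.  The helper below is a
	      sketch only: its name and its base argument, which exists so
	      the helper can be pointed at another directory, are
	      assumptions made for the example.

```python
from pathlib import Path

RAID_SYSCTL_DIR = "/proc/sys/dev/raid"


def read_speed_limit(name: str, base: str = RAID_SYSCTL_DIR) -> int:
    """Return the current value of speed_limit_min or speed_limit_max."""
    if name not in ("speed_limit_min", "speed_limit_max"):
        raise ValueError("unknown control file: " + name)
    return int(Path(base, name).read_text().strip())
```

	      Since the files are also writable, a new limit can be set by
	      writing a number to the same path with sufficient privileges.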


SEE ALSO
       mdadm(8), mkraid(8).



									MD(4)