RAIDCTL(8)		  BSD System Manager's Manual		    RAIDCTL(8)

NAME
     raidctl --	configuration utility for the RAIDframe	disk driver

SYNOPSIS
     raidctl [-v] -a component dev
     raidctl [-v] -A [yes | no | root] dev
     raidctl [-v] -B dev
     raidctl [-v] -c config_file dev
     raidctl [-v] -C config_file dev
     raidctl [-v] -f component dev
     raidctl [-v] -F component dev
     raidctl [-v] -g component dev
     raidctl [-v] -G dev
     raidctl [-v] -i dev
     raidctl [-v] -I serial_number dev
     raidctl [-v] -m dev
     raidctl [-v] -M [yes | no | set params] dev
     raidctl [-v] -p dev
     raidctl [-v] -P dev
     raidctl [-v] -r component dev
     raidctl [-v] -R component dev
     raidctl [-v] -s dev
     raidctl [-v] -S dev
     raidctl [-v] -u dev

DESCRIPTION
     raidctl is	the user-land control program for raid(4), the RAIDframe disk
     device.  raidctl is primarily used	to dynamically configure and unconfig-
     ure RAIDframe disk	devices.  For more information about the RAIDframe
     disk device, see raid(4).

     This document assumes the reader has at least rudimentary knowledge of
     RAID and RAID concepts.

     The command-line options for raidctl are as follows:

     -a	component dev
	     Add component as a	hot spare for the device dev.  Component la-
	     bels (which identify the location of a given component within a
	     particular	RAID set) are automatically added to the hot spare af-
	     ter it has	been used and are not required for component before it
	     is	used.

     -A	yes dev
	     Make the RAID set auto-configurable.  The RAID set	will be	auto-
	     matically configured at boot before the root file system is
	     mounted.  Note that all components	of the set must	be of type
	     RAID in the disklabel.

     -A	no dev
	     Turn off auto-configuration for the RAID set.

     -A	root dev
	     Make the RAID set auto-configurable, and also mark	the set	as be-
	     ing eligible to be	the root partition.  A RAID set	configured
	     this way will override the	use of the boot	disk as	the root de-
	     vice.  All	components of the set must be of type RAID in the
	     disklabel.	 Note that only	certain	architectures (currently
	     alpha, i386, pmax,	sparc, sparc64,	and vax) support booting a
	     kernel directly from a RAID set.

     -B	dev  Initiate a	copyback of reconstructed data from a spare disk to
	     its original disk.	 This is performed after a component has
	     failed, and the failed drive has been reconstructed onto a	spare
	     drive.

     -c	config_file dev
	     Configure the RAIDframe device dev	according to the configuration
	     given in config_file.  A description of the contents of
	     config_file is given later.

     -C	config_file dev
	     As	for -c,	but forces the configuration to	take place.  Fatal er-
	     rors due to uninitialized components are ignored.	This is	re-
	     quired the	first time a RAID set is configured.

     -f	component dev
	     This marks	the specified component	as having failed, but does not
	     initiate a	reconstruction of that component.

     -F	component dev
	     Fails the specified component of the device, and immediately
	     begins a reconstruction of the failed disk onto an available hot
	     spare.  This is one of the mechanisms used to start the recon-
	     struction process if a component does have a hardware failure.

     -g	component dev
	     Get the component label for the specified component.

     -G	dev  Generate the configuration	of the RAIDframe device	in a format
	     suitable for use with the -c or -C	options.

     -i	dev  Initialize	the RAID device.  In particular, (re-)write the	parity
	     on	the selected device.  This MUST	be done	for all	RAID sets be-
	     fore the RAID device is labeled and before	file systems are cre-
	     ated on the RAID device.

     -I	serial_number dev
	     Initialize	the component labels on	each component of the device.
	     serial_number is used as one of the keys in determining whether a
	     particular	set of components belong to the	same RAID set.	While
	     not strictly enforced, different serial numbers should be used
	     for different RAID	sets.  This step MUST be performed when	a new
	     RAID set is created.

     -m	dev  Display status information	about the parity map on	the RAID set,
	     if	any.  If used with -v then the current contents	of the parity
	     map will be output	(in hexadecimal	format)	as well.

     -M	yes dev
	     Enable the	use of a parity	map on the RAID	set; this is the de-
	     fault, and	greatly	reduces	the time taken to check	parity after
	     unclean shutdowns at the cost of some very	slight overhead	during
	     normal operation.	Changes	to this	setting	will take effect the
	     next time the set is configured.  Note that RAID-0	sets, having
	     no	parity,	will not use a parity map in any case.

     -M	no dev
	     Disable the use of	a parity map on	the RAID set; doing this is
	     not recommended.  This will take effect the next time the set is
	     configured.

     -M	set cooldown tickms regions dev
	     Alter the parameters of the parity	map; parameters	to leave un-
	     changed can be given as 0,	and trailing zeroes may	be omitted.
	     The RAID set is divided into regions regions; each region is
	     marked dirty for at most cooldown intervals of tickms millisec-
	     onds each after a write to it, and at least cooldown - 1 such in-
	     tervals.  Changes to regions take effect the next time the set is
	     configured, while changes to the other parameters are applied
	     immediately.  The default parameters are expected to be reason-
	     able for most workloads.

     -p	dev  Check the status of the parity on the RAID	set.  Displays a sta-
	     tus message, and returns successfully if the parity is up-to-
	     date.

     -P	dev  Check the status of the parity on the RAID	set, and initialize
	     (re-write)	the parity if the parity is not	known to be up-to-
	     date.  This is normally used after	a system crash (and before a
	     fsck(8)) to ensure	the integrity of the parity.

     -r	component dev
	     Remove the	spare disk specified by	component from the set of
	     available spare components.

     -R	component dev
	     Fails the specified component, if necessary, and immediately be-
	     gins a reconstruction back	to component.  This is useful for re-
	     constructing back onto a component	after it has been replaced
	     following a failure.

     -s	dev  Display the status	of the RAIDframe device	for each of the	compo-
	     nents and spares.

     -S	dev  Check the status of parity	re-writing, component reconstruction,
	     and component copyback.  The output indicates the amount of
	     progress achieved in each of these	areas.

     -u	dev  Unconfigure the RAIDframe device.	This does not remove any com-
	     ponent labels or change any configuration settings	(e.g. auto-
	     configuration settings) for the RAID set.

     -v	     Be	more verbose.  For operations such as reconstructions, parity
	     re-writing, and copybacks,	provide	a progress indicator.

     The device used by raidctl is specified by dev.  dev may be either the
     full name of the device (e.g., /dev/rraid0d on the i386 architecture, or
     /dev/rraid0c on many others) or simply raid0 (for /dev/rraid0[cd]).  It
     is recommended that the partitions used to represent the RAID device are
     not used for file systems.
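
     The timing rule given for -M set can be checked with a little arith-
     metic.  The following is a minimal sketch in Python and is not part of
     raidctl; the function name dirty_window_ms and the sample values are
     assumptions chosen for illustration.  The values passed are the effec-
     tive parameters, not the 0 placeholder that means `leave unchanged'.

```python
def dirty_window_ms(cooldown, tickms):
    """Return (min_ms, max_ms) that a region stays marked dirty after a write.

    Follows the rule stated for -M set: at least cooldown - 1 and at most
    cooldown intervals of tickms milliseconds each.
    """
    if cooldown < 1 or tickms < 1:
        raise ValueError("cooldown and tickms must be positive")
    return (cooldown - 1) * tickms, cooldown * tickms


# Illustrative values only; these are not claimed to be the driver defaults.
low, high = dirty_window_ms(cooldown=8, tickms=125)
print(f"region stays dirty for between {low} and {high} ms")
```

     A larger cooldown or tickms keeps regions dirty for longer, which means
     more regions to check after an unclean shutdown but fewer updates to
     the map during normal operation.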

   Configuration file
     The format	of the configuration file is complex, and only an abbreviated
     treatment is given	here.  In the configuration files, a `#' indicates the
     beginning of a comment.

     There are 4 required sections of a	configuration file, and	2 optional
     sections.	Each section begins with a `START', followed by	the section
     name, and the configuration parameters associated with that section.  The
     first section is the `array' section, and it specifies the	number of
     rows, columns, and	spare disks in the RAID	set.  For example:

	   START array
	   1 3 0

     indicates an array	with 1 row, 3 columns, and 0 spare disks.  Note	that
     although multi-dimensional	arrays may be specified, they are NOT sup-
     ported in the driver.

     The second	section, the `disks' section, specifies	the actual components
     of	the device.  For example:

	   START disks
	   /dev/sd0e
	   /dev/sd1e
	   /dev/sd2e

     specifies the three component disks to be used in the RAID	device.	 If
     any of the	specified drives cannot	be found when the RAID device is con-
     figured, then they	will be	marked as `failed', and	the system will	oper-
     ate in degraded mode.  Note that it is imperative that the	order of the
     components	in the configuration file does not change between configura-
     tions of a	RAID device.  Changing the order of the	components will	result
     in	data loss if the set is	configured with	the -C option.	In normal cir-
     cumstances, the RAID set will not configure if only -c is specified, and
     the components are	out-of-order.

     The next section, which is the `spare' section, is optional, and, if
     present, specifies the devices to be used as `hot spares' -- devices
     which are on-line, but are not actively used by the RAID driver unless
     one of the main components fails.  A simple `spare' section might be:

	   START spare
	   /dev/sd3e

     for a configuration with a	single spare component.	 If no spare drives
     are to be used in the configuration, then the `spare' section may be
     omitted.

     The next section is the `layout' section.	This section describes the
     general layout parameters for the RAID device, and	provides such informa-
     tion as sectors per stripe	unit, stripe units per parity unit, stripe
     units per reconstruction unit, and	the parity configuration to use.  This
     section might look	like:

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level
	   32 1	1 5

     The sectors per stripe unit specifies, in blocks, the interleave factor;
     i.e., the number of contiguous sectors to be written to each component
     for a single stripe.  Appropriate selection of this value (32 in this ex-
     ample) is the subject of much research in RAID architectures.  The	stripe
     units per parity unit and stripe units per reconstruction unit are nor-
     mally each set to 1.  While certain values above 1 are permitted, a dis-
     cussion of valid values and the consequences of using anything other than
     1 is outside the scope of this document.  The last value in this section
     (5	in this	example) indicates the parity configuration desired.  Valid
     entries include:

     0	   RAID	level 0.  No parity, only simple striping.

     1	   RAID	level 1.  Mirroring.  The parity is the	mirror.

     4	   RAID	level 4.  Striping across components, with parity stored on
	   the last component.

     5	   RAID	level 5.  Striping across components, parity distributed
	   across all components.

     There are other valid entries here, including those for Even-Odd parity,
     RAID level	5 with rotated sparing,	Chained	declustering, and Interleaved
     declustering, but as of this writing the code for those parity operations
     has not been tested with NetBSD.

     The next required section is the `queue' section.	This is	most often
     specified as:

	   START queue
	   fifo	100

     where the queuing method is specified as fifo (first-in, first-out), and
     the size of the per-component queue is limited to 100 requests.  Other
     queuing methods may also be specified, but	a discussion of	them is	beyond
     the scope of this document.

     The final section,	the `debug' section, is	optional.  For more details on
     this the reader is	referred to the	RAIDframe documentation	discussed in
     the HISTORY section.

     See EXAMPLES for a	more complete configuration file example.
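
     The section structure described above can be illustrated with a small
     parser.  This is a sketch in Python under stated assumptions, not the
     parser raidctl itself uses: the names parse_raid_config and check_config
     are hypothetical, and the check that the `disks' section lists exactly
     numRow * numCol entries is an inference from the text rather than docu-
     mented behaviour.

```python
REQUIRED = ("array", "disks", "layout", "queue")


def parse_raid_config(text):
    """Split configuration text into {section_name: [list of word lists]}.

    '#' starts a comment; each section begins with 'START <name>'.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split()
        if words[0] == "START":
            if len(words) != 2:
                raise ValueError(f"malformed section header: {raw!r}")
            current = words[1]
            sections[current] = []
        elif current is None:
            raise ValueError(f"data before first START: {raw!r}")
        else:
            sections[current].append(words)
    return sections


def check_config(sections):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing section: {name}" for name in REQUIRED
                if name not in sections]
    if problems:
        return problems
    if len(sections["array"]) != 1 or len(sections["array"][0]) != 3:
        return ["array section must hold one line: numRow numCol numSpare"]
    rows, cols, spares = (int(v) for v in sections["array"][0])
    if len(sections["disks"]) != rows * cols:
        problems.append("disks section does not list numRow * numCol components")
    if len(sections.get("spare", [])) != spares:
        problems.append("spare section does not list numSpare components")
    return problems


SAMPLE = """\
START array
# numRow numCol numSpare
1 3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
"""

cfg = parse_raid_config(SAMPLE)
print(cfg["disks"], check_config(cfg))
```

     A check of this kind only catches structural slips before the file is
     handed to raidctl; it says nothing about whether the named devices
     exist or carry valid component labels.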

FILES
     /dev/{,r}raid*  raid device special files.

EXAMPLES
     It is highly recommended that, before using the RAID driver for real file
     systems, the system administrator(s) become quite familiar with the
     use of raidctl, and that they understand how the component reconstruction
     process works.  The examples in this section will focus on configuring a
     number of different RAID sets of varying degrees of redundancy.  By work-
     ing through these examples, administrators	should be able to develop a
     good feel for how to configure a RAID set,	and how	to initiate recon-
     struction of failed components.

     In	the following examples `raid0' will be used to denote the RAID device.
     Depending on the architecture, /dev/rraid0c or /dev/rraid0d may be	used
     in	place of raid0.

   Initialization and Configuration
     The initial step in configuring a RAID set	is to identify the components
     that will be used in the RAID set.	 All components	should be the same
     size.  Each component should have a disklabel type	of FS_RAID, and	a typ-
     ical disklabel entry for a	RAID component might look like:

	   f:  1800000	200495	   RAID		     # (Cyl.  405*- 4041*)

     While FS_BSDFFS will also work as the component type, the type FS_RAID is
     preferred for RAIDframe use, as it	is required for	features such as auto-
     configuration.  As	part of	the initial configuration of each RAID set,
     each component will be given a `component label'.	A `component label'
     contains important	information about the component, including a user-
     specified serial number, the row and column of that component in the RAID
     set, the redundancy level of the RAID set,	a `modification	counter', and
     whether the parity	information (if	any) on	that component is known	to be
     correct.  Component labels	are an integral	part of	the RAID set, since
     they are used to ensure that components are configured in the correct or-
     der, and used to keep track of other vital	information about the RAID
     set.  Component labels are	also required for the auto-detection and auto-
     configuration of RAID sets	at boot	time.  For a component label to	be
     considered	valid, that particular component label must be in agreement
     with the other component labels in	the set.  For example, the serial num-
     ber, `modification	counter', number of rows and number of columns must
     all be in agreement.  If any of these are different, then the component
     is	not considered to be part of the set.  See raid(4) for more informa-
     tion about	component labels.
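
     The agreement rule for component labels can be sketched as a simple
     comparison.  The Python below is illustrative only: the ComponentLabel
     class and the labels_agree function are hypothetical names, and only
     the four fields named in the paragraph above are modelled; the driver
     may compare more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentLabel:
    serial_number: int
    mod_counter: int
    num_rows: int
    num_columns: int


def labels_agree(a, b):
    """True when the fields the text lists as needing agreement all match."""
    return (a.serial_number, a.mod_counter, a.num_rows, a.num_columns) == \
           (b.serial_number, b.mod_counter, b.num_rows, b.num_columns)


good = ComponentLabel(13432, 65, 1, 3)
stale = ComponentLabel(13432, 64, 1, 3)   # older modification counter
print(labels_agree(good, good), labels_agree(good, stale))
```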

     Once the components have been identified, and the disks have appropriate
     labels, raidctl is	then used to configure the raid(4) device.  To config-
     ure the device, a configuration file which	looks something	like:

	   START array
	   # numRow numCol numSpare
	   1 3 1

	   START disks
	   /dev/sd1e
	   /dev/sd2e
	   /dev/sd3e

	   START spare
	   /dev/sd4e

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level_5
	   32 1	1 5

	   START queue
	   fifo	100

     is	created	in a file.  The	above configuration file specifies a RAID 5
     set consisting of the components /dev/sd1e, /dev/sd2e, and	/dev/sd3e,
     with /dev/sd4e available as a `hot	spare' in case one of the three	main
     drives should fail.  A RAID 0 set would be	specified in a similar way:

	   START array
	   # numRow numCol numSpare
	   1 4 0

	   START disks
	   /dev/sd10e
	   /dev/sd11e
	   /dev/sd12e
	   /dev/sd13e

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level_0
	   64 1	1 0

	   START queue
	   fifo	100

     In this case, devices /dev/sd10e, /dev/sd11e, /dev/sd12e, and /dev/sd13e
     are the components that make up this RAID set.  Note that there are no
     hot spares for a RAID 0 set, since there is no way to recover data if any
     of the components fails.

     For a RAID	1 (mirror) set,	the following configuration might be used:

	   START array
	   # numRow numCol numSpare
	   1 2 0

	   START disks
	   /dev/sd20e
	   /dev/sd21e

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level_1
	   128 1 1 1

	   START queue
	   fifo	100

     In	this case, /dev/sd20e and /dev/sd21e are the two components of the
     mirror set.  While	no hot spares have been	specified in this configura-
     tion, they	easily could be, just as they were specified in	the RAID 5
     case above.  Note as well that RAID 1 sets	are currently limited to only
     2 components.  At present,	n-way mirroring	is not possible.

     The first time a RAID set is configured, the -C option must be used:

	   raidctl -C raid0.conf raid0

     where raid0.conf is the name of the RAID configuration file.  The -C
     forces the	configuration to succeed, even if any of the component labels
     are incorrect.  The -C option should not be used lightly in situations
     other than initial configurations: if the system is refusing to configure
     a RAID set, there is probably a very good reason for it.  After
     the initial configuration is done (and appropriate	component labels are
     added with	the -I option) then raid0 can be configured normally with:

	   raidctl -c raid0.conf raid0

     When the RAID set is configured for the first time, it is necessary to
     initialize	the component labels, and to initialize	the parity on the RAID
     set.  Initializing	the component labels is	done with:

	   raidctl -I 112341 raid0

     where `112341' is a user-specified	serial number for the RAID set.	 This
     initialization step is required for all RAID sets.	 As well, using	dif-
     ferent serial numbers between RAID	sets is	strongly encouraged, as	using
     the same serial number for	all RAID sets will only	serve to decrease the
     usefulness	of the component label checking.

     Initializing the RAID set is done via the -i option.  This initialization
     MUST be done for all RAID sets, since among other things it verifies that
     the parity (if any) on the RAID set is correct.  Since this initialization
     may be quite time-consuming, the -v option may also be used in conjunction
     with -i:

	   raidctl -iv raid0

     This will give more verbose output	on the status of the initialization:

	   Initiating re-write of parity
	   Parity Re-write status:
	    10%	|****					| ETA:	  06:03	/

     The output	provides a `Percent Complete' in both a	numeric	and graphical
     format, as	well as	an estimated time to completion	of the operation.

     Since it is the parity that provides the `redundancy' part of RAID, it is
     critical that the parity be kept correct whenever possible.  If the parity
     is not correct, then there is no guarantee that data will not be lost if
     a component fails.

     Once the parity is	known to be correct, it	is then	safe to	perform
     disklabel(8), newfs(8), or	fsck(8)	on the device or its file systems, and
     then to mount the file systems for	use.
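
     The first-time sequence described above (forced configuration, label
     initialization, parity initialization) can be collected in one place.
     The following Python sketch only builds the command lines and runs noth-
     ing; the function name first_time_setup is hypothetical, and the file
     name and serial number are the ones used in the examples above.

```python
def first_time_setup(config_file, serial_number, dev="raid0"):
    """Return the raidctl command lines for first-time setup, in order.

    Nothing is executed; the caller decides whether and how to run them.
    """
    return [
        ["raidctl", "-C", config_file, dev],         # force initial configuration
        ["raidctl", "-I", str(serial_number), dev],  # write component labels
        ["raidctl", "-iv", dev],                     # initialize parity, verbosely
    ]


for argv in first_time_setup("raid0.conf", 112341):
    print(" ".join(argv))
```

     Keeping the commands as argument lists rather than a single string
     makes it straightforward to review them before running each step by
     hand.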

     Under certain circumstances (e.g.,	the additional component has not ar-
     rived, or data is being migrated off of a disk destined to	become a com-
     ponent) it	may be desirable to configure a	RAID 1 set with	only a single
     component.	 This can be achieved by using the word	"absent" to indicate
     that a particular component is not	present.  In the following:

	   START array
	   # numRow numCol numSpare
	   1 2 0

	   START disks
	   absent
	   /dev/sd0e

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level_1
	   128 1 1 1

	   START queue
	   fifo	100

     /dev/sd0e is the real component, and will be the second disk of a RAID 1
     set.  The first component is simply marked	as being absent.  Configura-
     tion (using -C and	-I 12345 as above) proceeds normally, but initializa-
     tion of the RAID set will have to wait until all physical components are
     present.  After configuration, this set can be used normally, but will be
     operating in degraded mode.  Once a second	physical component is ob-
     tained, it	can be hot-added, the existing data mirrored, and normal oper-
     ation resumed.

     The size of the resulting RAID set	will depend on the number of data com-
     ponents in	the set.  Space	is automatically reserved for the component
     labels, and the actual amount of space used for data on a component will
     be	rounded	down to	the largest possible multiple of the sectors per
     stripe unit (sectPerSU) value.  Thus, the amount of space provided	by the
     RAID set will be less than	the sum	of the size of the components.
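
     The rounding rule can be worked through numerically.  The Python below
     is an estimate under assumptions, not the driver's own calculation: the
     function names are hypothetical, and the default of 64 reserved sectors
     is inferred from the sample output in this page (a 1800000-sector
     partition reporting numBlocks 1799936) rather than stated anywhere in
     it.  The count of data components per RAID level follows the level de-
     scriptions given earlier.

```python
def usable_sectors(component_sectors, sect_per_su, reserved_sectors=64):
    """Sectors of one component available for data.

    reserved_sectors is an assumption inferred from the sample output in
    this page, not a documented constant.
    """
    available = component_sectors - reserved_sectors
    return (available // sect_per_su) * sect_per_su


def data_components(raid_level, num_columns):
    """Number of components that hold data rather than redundancy."""
    if raid_level == 0:
        return num_columns          # striping only, no parity
    if raid_level == 1:
        return 1                    # the second component is the mirror
    if raid_level in (4, 5):
        return num_columns - 1      # one component's worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


def raid_set_sectors(component_sectors, sect_per_su, raid_level, num_columns):
    """Estimated size of the whole RAID set, in sectors."""
    per_component = usable_sectors(component_sectors, sect_per_su)
    return per_component * data_components(raid_level, num_columns)


print(usable_sectors(1800000, 32))
print(raid_set_sectors(1800000, 32, raid_level=5, num_columns=3))
```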

   Maintenance of the RAID set
     After the parity has been initialized for the first time, the command:

	   raidctl -p raid0

     can be used to check the current status of the parity.  To check the
     parity and rebuild it if necessary (for example, after an unclean
     shutdown) the command:

	   raidctl -P raid0

     is used.  Note that re-writing the parity can be done while other opera-
     tions on the RAID set are taking place (e.g., while doing a fsck(8) on a
     file system on the RAID set).  However, for maximum effectiveness of the
     RAID set, the parity should be known to be correct before any data on the
     set is modified.

     To	see how	the RAID set is	doing, the following command can be used to
     show the RAID set's status:

	   raidctl -s raid0

     The output	will look something like:

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: optimal
		      /dev/sd3e: optimal
	   Spares:
		      /dev/sd4e: spare
	   Component label for /dev/sd1e:
	      Row: 0 Column: 0 Num Rows: 1 Num Columns:	3
	      Version: 2 Serial	Number:	13432 Mod Counter: 65
	      Clean: No	Status:	0
	      sectPerSU: 32 SUsPerPU: 1	SUsPerRU: 1
	      RAID Level: 5  blocksize:	512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0
	   Component label for /dev/sd2e:
	      Row: 0 Column: 1 Num Rows: 1 Num Columns:	3
	      Version: 2 Serial	Number:	13432 Mod Counter: 65
	      Clean: No	Status:	0
	      sectPerSU: 32 SUsPerPU: 1	SUsPerRU: 1
	      RAID Level: 5  blocksize:	512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0
	   Component label for /dev/sd3e:
	      Row: 0 Column: 2 Num Rows: 1 Num Columns:	3
	      Version: 2 Serial	Number:	13432 Mod Counter: 65
	      Clean: No	Status:	0
	      sectPerSU: 32 SUsPerPU: 1	SUsPerRU: 1
	      RAID Level: 5  blocksize:	512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0
	   Parity status: clean
	   Reconstruction is 100% complete.
	   Parity Re-write is 100% complete.
	   Copyback is 100% complete.

     This indicates that all is	well with the RAID set.	 Of importance here
     are the component lines which read	`optimal', and the `Parity status'
     line.  `Parity status: clean' indicates that the parity is	up-to-date for
     this RAID set, whether or not the RAID set	is in redundant	or degraded
     mode.  `Parity status: DIRTY' indicates that it is	not known if the par-
     ity information is	consistent with	the data, and that the parity informa-
     tion needs	to be checked.	Note that if there are file systems open on
     the RAID set, the individual components will not be `clean' but the set
     as	a whole	can still be clean.
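
     For monitoring, the two things singled out above (the per-component
     state and the `Parity status' line) can be pulled out of the status
     text.  This is a sketch in Python that assumes the output has the shape
     shown in this page; parse_status and is_healthy are hypothetical names,
     and the real output may differ between releases.

```python
import re

ENTRY_RE = re.compile(r"^\s*(\S+):\s+(\S+)\s*$")


def parse_status(text):
    """Extract component states, spare states and the parity status."""
    result = {"components": {}, "spares": {}, "parity": None}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Components:":
            section = "components"
        elif stripped == "Spares:":
            section = "spares"
        elif stripped == "No spares.":
            section = None
        elif stripped.startswith("Parity status:"):
            result["parity"] = stripped.split(":", 1)[1].strip()
            section = None
        elif stripped.startswith("Component label for"):
            section = None          # label details are not parsed here
        elif section:
            match = ENTRY_RE.match(line)
            if match:
                result[section][match.group(1)] = match.group(2)
    return result


def is_healthy(status):
    """True when every component is optimal and the parity is clean."""
    states = status["components"].values()
    return bool(states) and all(s == "optimal" for s in states) \
        and status["parity"] == "clean"


SAMPLE_STATUS = """\
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
Parity status: clean
"""

status = parse_status(SAMPLE_STATUS)
print(status["components"], is_healthy(status))
```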

     To	check the component label of /dev/sd1e,	the following is used:

	   raidctl -g /dev/sd1e	raid0

     The output	of this	command	will look something like:

	   Component label for /dev/sd1e:
	      Row: 0 Column: 0 Num Rows: 1 Num Columns:	3
	      Version: 2 Serial	Number:	13432 Mod Counter: 65
	      Clean: No	Status:	0
	      sectPerSU: 32 SUsPerPU: 1	SUsPerRU: 1
	      RAID Level: 5  blocksize:	512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0

   Dealing with	Component Failures
     If	for some reason	(perhaps to test reconstruction) it is necessary to
     pretend a drive has failed, the following will perform that function:

	   raidctl -f /dev/sd2e	raid0

     The system	will then be performing	all operations in degraded mode, where
     missing data is re-computed from existing data and	the parity.  In	this
     case, obtaining the status	of raid0 will return (in part):

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: failed
		      /dev/sd3e: optimal
	   Spares:
		      /dev/sd4e: spare

     Note that with the	use of -f a reconstruction has not been	started.  To
     both fail the disk	and start a reconstruction, the	-F option must be
     used:

	   raidctl -F /dev/sd2e	raid0

     The -f option may be used first, and then the -F option used later, on
     the same disk, if desired.	 Immediately after the reconstruction is
     started, the status will report:

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: reconstructing
		      /dev/sd3e: optimal
	   Spares:
		      /dev/sd4e: used_spare
	   [...]
	   Parity status: clean
	   Reconstruction is 10% complete.
	   Parity Re-write is 100% complete.
	   Copyback is 100% complete.

     This indicates that a reconstruction is in	progress.  To find out how the
     reconstruction is progressing the -S option may be	used.  This will indi-
     cate the progress in terms	of the percentage of the reconstruction	that
     is	completed.  When the reconstruction is finished	the -s option will
     show:

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: spared
		      /dev/sd3e: optimal
	   Spares:
		      /dev/sd4e: used_spare
	   [...]
	   Parity status: clean
	   Reconstruction is 100% complete.
	   Parity Re-write is 100% complete.
	   Copyback is 100% complete.

     At	this point there are at	least two options.  First, if /dev/sd2e	is
     known to be good (i.e., the failure was either caused by -f or -F,	or the
     failed disk was replaced),	then a copyback	of the data can	be initiated
     with the -B option.  In this example, this	would copy the entire contents
     of	/dev/sd4e to /dev/sd2e.	 Once the copyback procedure is	complete, the
     status of the device would	be (in part):

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: optimal
		      /dev/sd3e: optimal
	   Spares:
		      /dev/sd4e: spare

     and the system is back to normal operation.
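
     The states seen in the listings above form a small cycle.  The Python
     sketch below records only the transitions shown in this section, for
     one component and one hot spare; the function name after_option is hy-
     pothetical, and the transient `reconstructing' state that -F passes
     through is deliberately left out, so each entry gives the state once
     the operation has finished.

```python
# (option, component state, spare state) -> states after the operation ends
TRANSITIONS = {
    ("-f", "optimal", "spare"): ("failed", "spare"),
    ("-F", "optimal", "spare"): ("spared", "used_spare"),
    ("-F", "failed", "spare"): ("spared", "used_spare"),
    ("-B", "spared", "used_spare"): ("optimal", "spare"),
}


def after_option(option, component_state, spare_state):
    """Return (component_state, spare_state) after a raidctl option finishes."""
    try:
        return TRANSITIONS[(option, component_state, spare_state)]
    except KeyError:
        raise ValueError(
            f"transition not covered: {option} from "
            f"{component_state}/{spare_state}"
        ) from None


state = ("optimal", "spare")
for option in ("-f", "-F", "-B"):
    state = after_option(option, *state)
    print(option, state)
```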

     The second	option after the reconstruction	is to simply use /dev/sd4e in
     place of /dev/sd2e	in the configuration file.  For	example, the configu-
     ration file (in part) might now look like:

	   START array
	   1 3 0

	   START disks
	   /dev/sd1e
	   /dev/sd4e
	   /dev/sd3e

     This can be done as /dev/sd4e is completely interchangeable with
     /dev/sd2e at this point.  Note that extreme care must be taken when
     changing the order	of the drives in a configuration.  This	is one of the
     few instances where the devices and/or their orderings can	be changed
     without loss of data!  In general,	the ordering of	components in a	con-
     figuration	file should never be changed.

     If	a component fails and there are	no hot spares available	on-line, the
     status of the RAID	set might (in part) look like:

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: failed
		      /dev/sd3e: optimal
	   No spares.

     In	this case there	are a number of	options.  The first option is to add a
     hot spare using:

	   raidctl -a /dev/sd4e	raid0

     After the hot add,	the status would then be:

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: failed
		      /dev/sd3e: optimal
	   Spares:
		      /dev/sd4e: spare

     Reconstruction could then take place using -F as described above.

     A second option is	to rebuild directly onto /dev/sd2e.  Once the disk
     containing	/dev/sd2e has been replaced, one can simply use:

	   raidctl -R /dev/sd2e	raid0

     to	rebuild	the /dev/sd2e component.  As the rebuilding is in progress,
     the status	will be:

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: reconstructing
		      /dev/sd3e: optimal
	   No spares.

     and when completed, will be:

	   Components:
		      /dev/sd1e: optimal
		      /dev/sd2e: optimal
		      /dev/sd3e: optimal
	   No spares.

     In	circumstances where a particular component is completely unavailable
     after a reboot, a special component name will be used to indicate the
     missing component.	 For example:

	   Components:
		      /dev/sd2e: optimal
		     component1: failed
	   No spares.

     indicates that the	second component of this RAID set was not detected at
     all by the	auto-configuration code.  The name `component1'	can be used
     anywhere a	normal component name would be used.  For example, to add a
     hot spare to the above set, and rebuild to	that hot spare,	the following
     could be done:

	   raidctl -a /dev/sd3e	raid0
	   raidctl -F component1 raid0

     at	which point the	data missing from `component1' would be	reconstructed
     onto /dev/sd3e.

     When more than one	component is marked as `failed'	due to a non-component
     hardware failure (e.g., loss of power to two components, adapter prob-
     lems, termination problems, or cabling issues) it is quite	possible to
     recover the data on the RAID set.	The first thing	to be aware of is that
     the first disk to fail will almost	certainly be out-of-sync with the re-
     mainder of	the array.  If any IO was performed between the	time the first
     component is considered `failed' and when the second component is consid-
     ered `failed', then the first component to	fail will not contain correct
     data, and should be ignored.  When	the second component is	marked as
     failed, however, the RAID device will (currently) panic the system.  At
     this point the data on the RAID set (not including the first failed com-
     ponent) is still self-consistent, and will be in no worse state of repair
     than had the power	gone out in the	middle of a write to a file system on
     a non-RAID	device.	 The problem, however, is that the component labels
     may now have 3 different `modification counters' (one value on the	first
     component that failed, one	value on the second component that failed, and
     a third value on the remaining components).  In such a situation, the
     RAID set will not autoconfigure, and can only be forcibly re-configured
     with the -C option.  To recover the RAID set, one must first remedy what-
     ever physical problem caused the multiple-component failure.  After that
     is done, the RAID set can be restored by forcibly configuring the RAID
     set without the component that failed first.  For example, if /dev/sd1e
     and /dev/sd2e fail	(in that order)	in a RAID set of the following config-
     uration:

	   START array
	   1 4 0

	   START disks
	   /dev/sd1e
	   /dev/sd2e
	   /dev/sd3e
	   /dev/sd4e

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level_5
	   64 1	1 5

	   START queue
	   fifo	100

     then the following	configuration (say "recover_raid0.conf")

	   START array
	   1 4 0

	   START disks
	   absent
	   /dev/sd2e
	   /dev/sd3e
	   /dev/sd4e

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level_5
	   64 1	1 5

	   START queue
	   fifo	100

     can be used with

	   raidctl -C recover_raid0.conf raid0

     to	force the configuration	of raid0.  A

	   raidctl -I 12345 raid0

     will be required in order to synchronize the component labels.  At this
     point the file systems on the RAID set can then be checked and corrected.
     To complete the reconstruction of the RAID set, /dev/sd1e is simply hot-
     added back into the array, and reconstructed as described earlier.
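
     Deriving the recovery configuration from the original one is mechani-
     cal: the component that failed first is replaced by the word "absent"
     and nothing else moves.  A Python sketch of that edit follows; the
     function name mark_absent is hypothetical, and deciding which component
     failed first remains the administrator's job.

```python
def mark_absent(config_text, failed_first):
    """Return config_text with failed_first replaced by 'absent' in 'disks'.

    The order of the remaining components is left untouched.
    """
    out = []
    in_disks = False
    replaced = False
    for line in config_text.splitlines():
        words = line.split("#", 1)[0].split()
        if words and words[0] == "START":
            in_disks = len(words) == 2 and words[1] == "disks"
        elif in_disks and words == [failed_first]:
            line = line.replace(failed_first, "absent")
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError(f"{failed_first} not found in the disks section")
    return "\n".join(out) + "\n"


ORIGINAL = """\
START array
1 4 0

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
"""

recovered = mark_absent(ORIGINAL, "/dev/sd1e")
print(recovered)
```

     Raising an error when the component is not found guards against writ-
     ing out a recovery file that is identical to the original.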

   RAID	on RAID
     RAID sets can be layered to create	more complex and much larger RAID
     sets.  A RAID 0 set, for example, could be	constructed from four RAID 5
     sets.  The	following configuration	file shows such	a setup:

	   START array
	   # numRow numCol numSpare
	   1 4 0

	   START disks
	   /dev/raid1e
	   /dev/raid2e
	   /dev/raid3e
	   /dev/raid4e

	   START layout
	   # sectPerSU SUsPerParityUnit	SUsPerReconUnit	RAID_level_0
	   128 1 1 0

	   START queue
	   fifo	100

     A similar configuration file might	be used	for a RAID 0 set constructed
     from components on	RAID 1 sets.  In such a	configuration, the mirroring
     provides a	high degree of redundancy, while the striping provides addi-
     tional speed benefits.
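
     As an illustration only, the following shell function writes a RAID 0
     configuration in the format shown above for an arbitrary list of
     components, so the same layout can be reused when the components are
     partitions on RAID 1 sets.  The name make_stripe_conf is hypothetical,
     and sectPerSU is passed in because the best value depends on the system.

```shell
# Sketch only: print a RAID 0 configuration for the given components.
# make_stripe_conf is a hypothetical helper, not part of raidctl.
#   $1       = sectPerSU for the RAID 0 layout
#   $2 ...   = components (for example, partitions on RAID 1 sets)
make_stripe_conf() {
    sect_per_su=$1
    shift
    # numCol is the number of components; no rows beyond 1, no spares.
    printf 'START array\n# numRow numCol numSpare\n1 %d 0\n\n' "$#"
    printf 'START disks\n'
    printf '%s\n' "$@"
    printf '\nSTART layout\n'
    printf '# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0\n'
    printf '%d 1 1 0\n\n' "$sect_per_su"
    printf 'START queue\nfifo 100\n'
}
```

     For example, if raid1 and raid2 were RAID 1 sets, then
     "make_stripe_conf 128 /dev/raid1e /dev/raid2e" would print a two-column
     RAID 0 configuration built on top of them.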

   Auto-configuration and Root on RAID
     RAID sets can also be auto-configured at boot.  To make a set
     auto-configurable, prepare the RAID set as above, and then run:

	   raidctl -A yes raid0

     to turn on auto-configuration for that set.  To turn off
     auto-configuration, use:

	   raidctl -A no raid0

     RAID sets which are auto-configurable will	be configured before the root
     file system is mounted.  These RAID sets are thus available for use as a
     root file system, or for any other file system.  A primary advantage of
     using auto-configuration is that RAID components become more independent
     of the disks they reside on.  For example, SCSI IDs can change, but
     auto-configured sets will always be configured correctly, even if the
     SCSI IDs of the component disks have become scrambled.

     Having a system's root file system	(/) on a RAID set is also allowed,
     with the `a' partition of such a RAID set being used for /.  To use
     raid0a as the root	file system, simply use:

	   raidctl -A root raid0

     To return raid0 to being just an auto-configuring set, use the -A yes
     arguments.

     Note that kernels can only	be directly read from RAID 1 components	on ar-
     chitectures that support that (currently alpha, i386, pmax, sparc,
     sparc64, and vax).	 On those architectures, the FS_RAID file system is
     recognized	by the bootblocks, and will properly load the kernel directly
     from a RAID 1 component.  For other architectures,	or to support the root
     file system on other RAID sets, some other	mechanism must be used to get
     a kernel booting.	For example, a small partition containing only the
     secondary boot-blocks and an alternate kernel (or two) could be used.
     Once a kernel is booting, however, and an auto-configuring RAID set is
     found that is eligible to be root, that RAID set will be auto-configured
     and used as the root device.  If two or more RAID sets claim to be root
     devices, the user will be prompted to select the root device.  At this
     time, RAID 0, 1, 4, and 5 sets are all supported as root devices.

     A typical RAID 1 setup with root on RAID might be as follows:

     1.	  wd0a - a small partition, which contains a complete, bootable, basic
	  NetBSD installation.

     2.	  wd1a - also contains a complete, bootable, basic NetBSD installa-
	  tion.

     3.	  wd0e and wd1e	- a RAID 1 set,	raid0, used for	the root file system.

     4.	  wd0f and wd1f	- a RAID 1 set,	raid1, which will be used only for
	  swap space.

     5.	  wd0g and wd1g	- a RAID 1 set,	raid2, used for	/usr, /home, or	other
	  data,	if desired.

     6.	  wd0h and wd1h	- a RAID 1 set,	raid3, if desired.

     RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
     raid0 is marked as	being a	root file system.  When	new kernels are	in-
     stalled, the kernel is not	only copied to /, but also to wd0a and wd1a.
     The kernel	on wd0a	is required, since that	is the kernel the system boots
     from.  The	kernel on wd1a is also required, since that will be the	kernel
     used should wd0 fail.  The important point here is to have redundant
     copies of the kernel available, in the event that one of the drives fails.

     There is no requirement that the root file	system be on the same disk as
     the kernel.  For example, obtaining the kernel from wd0a, and using sd0e
     and sd1e for raid0, and the root file system, is fine.  It	is critical,
     however, that there be multiple kernels available,	in the event of	media
     failure.

     Multi-layered RAID	devices	(such as a RAID	0 set made up of RAID 1	sets)
     are not supported as root devices or auto-configurable devices at this
     point.  (Multi-layered RAID devices are supported in general, however, as
     mentioned earlier.)  Note that in order to	enable component auto-detec-
     tion and auto-configuration of RAID devices, the line:

	   options    RAID_AUTOCONFIG

     must be in	the kernel configuration file.	See raid(4) for	more details.

   Swapping on RAID
     A RAID device can be used as a swap device.  In order to ensure that a
     RAID device used as a swap device is correctly unconfigured when the
     system is shut down or rebooted, it is recommended that the line

	   swapoff=YES

     be	added to /etc/rc.conf.
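
     One way to add that line without duplicating it on repeated runs is
     sketched below.  The function name ensure_swapoff is hypothetical, and
     the sketch only looks for an uncommented line beginning with
     swapoff=YES.

```shell
# Sketch only: append "swapoff=YES" to an rc.conf-style file unless such a
# line is already present.  ensure_swapoff is a hypothetical helper.
#   $1 = file to update (defaults to /etc/rc.conf)
ensure_swapoff() {
    rc=${1:-/etc/rc.conf}
    # Nothing to do if an uncommented swapoff=YES line already exists.
    if grep -q '^swapoff=YES' "$rc" 2>/dev/null; then
        return 0
    fi
    printf 'swapoff=YES\n' >> "$rc"
}
```

     Running the function against a scratch copy of the file first makes it
     easy to inspect the result before touching /etc/rc.conf itself.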

   Unconfiguration
     The final operation performed by raidctl is to unconfigure	a raid(4) de-
     vice.  This is accomplished via a simple:

	   raidctl -u raid0

     at	which point the	device is ready	to be reconfigured.

   Performance Tuning
     Selection of the various parameter	values which result in the best	per-
     formance can be quite tricky, and often requires a	bit of trial-and-error
     to	get those values most appropriate for a	given system.  A whole range
     of	factors	come into play,	including:

     1.	  Types	of components (e.g., SCSI vs. IDE) and their bandwidth

     2.	  Types	of controller cards and	their bandwidth

     3.	  Distribution of components among controllers

     4.	  IO bandwidth

     5.   File system access patterns

     6.	  CPU speed

     As	with most performance tuning, benchmarking under real-life loads may
     be	the only way to	measure	expected performance.  Understanding some of
     the underlying technology is also useful in tuning.  The goal of this
     section is	to provide pointers to those parameters	which may make signif-
     icant differences in performance.

     For a RAID 1 set, a sectPerSU value of 64 or 128 is typically sufficient.
     Since data in a RAID 1 set is arranged in a linear fashion on each
     component, selecting an appropriate stripe size is somewhat less critical
     than it is for a RAID 5 set.  However, a stripe size that is too small
     will cause large IOs to be broken up into a number of smaller ones,
     hurting performance.  At the same time, a large stripe size may cause
     problems with concurrent accesses to stripes, which may also affect
     performance.  Thus values in the range of 32 to 128 are often the most
     effective.

     Tuning RAID 5 sets	is trickier.  In the best case,	IO is presented	to the
     RAID set one stripe at a time.  Since the entire stripe is	available at
     the beginning of the IO, the parity of that stripe	can be calculated be-
     fore the stripe is	written, and then the stripe data and parity can be
     written in parallel.  When the amount of data being written is less than
     a full stripe's worth, the `small write' problem occurs.  Since a `small
     write' means only a portion of the	stripe on the components is going to
     change, the data (and parity) on the components must be updated slightly
     differently.  First, the `old parity' and `old data' must be read from
     the components.  Then the new parity is constructed, using	the new	data
     to	be written, and	the old	data and old parity.  Finally, the new data
     and new parity are	written.  All this extra data shuffling	results	in a
     serious loss of performance, and is typically 2 to	4 times	slower than a
     full stripe write (or read).  To combat this problem in the real world,
     it may be useful to ensure that stripe sizes are small enough that a
     `large IO' from the system will use exactly one full stripe write.  As
     discussed below, there are some file system dependencies which may come
     into play here as well.
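
     The small-write update can be illustrated with plain integers standing
     in for stripe units.  The sketch below assumes the usual RAID 5
     arrangement in which parity is the bitwise XOR of the data units; the
     function names are hypothetical, and a real implementation works on
     whole buffers rather than single values.

```shell
# Sketch only: RAID 5 parity arithmetic on small integers.

# Parity for a full stripe write: XOR of every data unit in the stripe.
full_parity() {
    p=0
    for d in "$@"; do
        p=$(( p ^ d ))
    done
    echo "$p"
}

# Parity for a small write.  The old parity and old data must be read
# first, which is the extra IO described above.
#   $1 = old parity, $2 = old data, $3 = new data
small_write_parity() {
    echo $(( $1 ^ $2 ^ $3 ))
}
```

     With data units 15, 51, 85, and 170, full_parity yields 195.  Replacing
     the second unit by 60 through "small_write_parity 195 51 60" yields 204,
     the same value as recomputing the parity of 15, 60, 85, and 170 from
     scratch, but only after two reads and two writes.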

     Since the size of a `large IO' is often (currently) only 32K or 64K, on a
     5-drive RAID 5 set it may be desirable to select a sectPerSU value of 16
     blocks (8K) or 32 blocks (16K).  Since there are 4 data stripe units per
     stripe, the maximum data per stripe is 64 blocks (32K) or 128 blocks
     (64K).  Again, empirical measurement will provide the best indicators of
     which values will yield better performance.
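
     The arithmetic behind those figures can be written down as follows.
     This is a sketch using 512-byte blocks, as in the figures above, and a
     RAID 5 set in which one stripe unit per stripe holds parity; the
     function names are hypothetical.

```shell
# Sketch only: relate sectPerSU to the data held by one full RAID 5 stripe.
# Blocks are taken to be 512 bytes.

# Data per full stripe, in kilobytes.
#   $1 = sectPerSU, $2 = number of components in the RAID 5 set
stripe_data_kb() {
    echo $(( $1 * ($2 - 1) * 512 / 1024 ))
}

# sectPerSU for which one `large IO' fills exactly one stripe.
#   $1 = size of a large IO in kilobytes, $2 = number of components
sect_per_su_for_io() {
    echo $(( $1 * 1024 / 512 / ($2 - 1) ))
}
```

     For the 5-drive set discussed above, "stripe_data_kb 16 5" gives 32 and
     "stripe_data_kb 32 5" gives 64, while "sect_per_su_for_io 64 5" gives 32.
     These values are only a starting point for benchmarking.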

     The parameters used for the file system are also critical to good
     performance.  For newfs(8), for example, increasing the block size to 32K
     or 64K may improve performance dramatically.  In addition, changing the
     cylinders-per-group parameter from 16 to 32 or higher is often not only
     necessary for larger file systems, but may also have positive performance
     implications.

   Summary
     Despite the length of this man page, configuring a RAID set is a
     relatively straightforward process.  All that needs to be done is to
     follow these steps:

     1.	  Use disklabel(8) to create the components (of	type RAID).

     2.	  Construct a RAID configuration file: e.g., raid0.conf

     3.	  Configure the	RAID set with:

		raidctl	-C raid0.conf raid0

     4.	  Initialize the component labels with:

		raidctl	-I 123456 raid0

     5.	  Initialize other important parts of the set with:

		raidctl	-i raid0

     6.	  Get the default label	for the	RAID set:

		disklabel raid0	> /tmp/label

     7.	  Edit the label:

		vi /tmp/label

     8.	  Put the new label on the RAID	set:

		disklabel -R -r	raid0 /tmp/label

     9.	  Create the file system:

		newfs /dev/rraid0e

     10.  Mount	the file system:

		mount /dev/raid0e /mnt

     11.  Use:

		raidctl	-c raid0.conf raid0

	  to re-configure the RAID set the next time it is needed, or put
	  raid0.conf into /etc where it will automatically be started by the
	  /etc/rc.d scripts.

SEE ALSO
     ccd(4), raid(4), rc(8)

HISTORY
     RAIDframe is a framework for rapid	prototyping of RAID structures devel-
     oped by the folks at the Parallel Data Laboratory at Carnegie Mellon Uni-
     versity (CMU).  A more complete description of the	internals and func-
     tionality of RAIDframe is found in	the paper "RAIDframe: A	Rapid Proto-
     typing Tool for RAID Systems", by William V. Courtright II, Garth Gibson,
     Mark Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by	the
     Parallel Data Laboratory of Carnegie Mellon University.

     The raidctl command first appeared	as a program in	CMU's RAIDframe	v1.1
     distribution.  This version of raidctl is a complete re-write, and	first
     appeared in NetBSD	1.4.

COPYRIGHT
     The RAIDframe Copyright is	as follows:

     Copyright (c) 1994-1996 Carnegie-Mellon University.
     All rights	reserved.

     Permission	to use,	copy, modify and distribute this software and
     its documentation is hereby granted, provided that	both the copyright
     notice and	this permission	notice appear in all copies of the
     software, derivative works	or modified versions, and any portions
     thereof, and that both notices appear in supporting documentation.

     CARNEGIE MELLON ALLOWS FREE USE OF	THIS SOFTWARE IN ITS "AS IS"
     CONDITION.	 CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY	KIND
     FOR ANY DAMAGES WHATSOEVER	RESULTING FROM THE USE OF THIS SOFTWARE.

     Carnegie Mellon requests users of this software to	return to

      Software Distribution Coordinator	 or  Software.Distribution@CS.CMU.EDU
      School of	Computer Science
      Carnegie Mellon University
      Pittsburgh PA 15213-3890

     any improvements or extensions that they make and grant Carnegie the
     rights to redistribute these changes.

WARNINGS
     Certain RAID levels (1, 4, 5, 6, and others) can protect against some
     data loss due to component failure.  However, the loss of two components
     of a RAID 4 or 5 system, or the loss of a single component of a RAID 0
     system, will result in the entire file system being lost.  RAID is NOT a
     substitute for good backup practices.

     Recomputation of parity MUST be performed whenever	there is a chance that
     it	may have been compromised.  This includes after	system crashes,	or be-
     fore a RAID device	has been used for the first time.  Failure to keep
     parity correct will be catastrophic should	a component ever fail -- it is
     better to use RAID	0 and get the additional space and speed, than it is
     to	use parity, but	not keep the parity correct.  At least with RAID 0
     there is no perception of increased data security.

     When replacing a failed component of a RAID set, it is a good idea to
     zero out the first 64 blocks of the new component to ensure the RAIDframe
     driver doesn't erroneously detect a component label in the new component.
     This is particularly true on RAID 1 sets because there is at most one
     correct component label in a failed RAID 1 installation, and the
     RAIDframe driver treats the component label with the highest serial
     number and modification value as the authoritative source when
     configuring the failed RAID set.
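
     Zeroing the start of a new component can be done with dd(1).  The sketch
     below assumes 512-byte blocks; the function name zero_label_area is
     hypothetical.  The command is destructive, so the target path must be
     checked carefully, and it can be tried on an ordinary file first.

```shell
# Sketch only: overwrite the first 64 blocks of a new component with zeros.
# Assumes 512-byte blocks.  zero_label_area is a hypothetical helper.
#   $1 = new component (a device node, or a regular file for a dry run)
zero_label_area() {
    # conv=notrunc keeps a regular file at its original length; it makes
    # no difference when the target is a device.
    dd if=/dev/zero of="$1" bs=512 count=64 conv=notrunc 2>/dev/null
}
```

     Which device node to pass for a given component (for example, the raw
     device for the partition) depends on the system and is left to the
     administrator.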

BUGS
     Hot-spare removal is currently not	available.

BSD			       January 27, 2010				   BSD
