ZFS-MODULE-PARAMETERS(5)      File Formats Manual     ZFS-MODULE-PARAMETERS(5)

NAME
       zfs-module-parameters - ZFS module parameters

DESCRIPTION
       Description of the different parameters to the ZFS module.

   Module parameters
       dbuf_cache_max_bytes (ulong)
                   Maximum size in bytes of the dbuf cache.  The target size
                   is determined by the minimum of this value and
                   1/2^dbuf_cache_shift (1/32) of the target ARC size.  The
                   behavior of the dbuf cache and its associated settings
                   can be observed via the /proc/spl/kstat/zfs/dbufstats
                   kstat.

		   Default value: ULONG_MAX.
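
                   As a rough sketch (not the module source), the effective
                   cache target can be thought of as the minimum of this cap
                   and the shifted ARC target; the 16GB ARC target below is
                   an assumed example value:

                       # hypothetical illustration in Python
                       arc_target = 16 * 1024**3           # assumed ARC target
                       dbuf_cache_max_bytes = 2**64 - 1    # ULONG_MAX default
                       dbuf_cache_shift = 5                # default
                       target = min(dbuf_cache_max_bytes,
                                    arc_target >> dbuf_cache_shift)
                       print(target)                       # 512MB, 1/32 of the ARC target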

       dbuf_metadata_cache_max_bytes (ulong)
                   Maximum size in bytes of the metadata dbuf cache.  The
                   target size is determined by the minimum of this value
                   and 1/2^dbuf_metadata_cache_shift (1/64) of the target
                   ARC size.  The behavior of the metadata dbuf cache and
                   its associated settings can be observed via the
                   /proc/spl/kstat/zfs/dbufstats kstat.

		   Default value: ULONG_MAX.

       dbuf_cache_hiwater_pct (uint)
		   The percentage over dbuf_cache_max_bytes when dbufs must be
		   evicted directly.

		   Default value: 10%.

       dbuf_cache_lowater_pct (uint)
		   The	percentage  below  dbuf_cache_max_bytes	when the evict
		   thread stops	evicting dbufs.

		   Default value: 10%.
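
                   A minimal sketch of the resulting thresholds, assuming a
                   512MB dbuf cache target and the default 10% values for
                   both tunables:

                       target = 512 * 1024**2                # assumed dbuf cache target
                       hiwater = target * (100 + 10) // 100  # evict directly above ~563MB
                       lowater = target * (100 - 10) // 100  # evict thread stops below ~461MB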

       dbuf_cache_shift	(int)
		   Set the size	of the dbuf cache, dbuf_cache_max_bytes, to  a
		   log2	fraction of the	target ARC size.

		   Default value: 5.

       dbuf_metadata_cache_shift (int)
		   Set	the  size  of  the  dbuf  metadata  cache,  dbuf_meta-
		   data_cache_max_bytes, to a log2 fraction of the target  ARC
		   size.

		   Default value: 6.

       dmu_object_alloc_chunk_shift (int)
		   dnode  slots	 allocated in a	single operation as a power of
		   2. The default value	minimizes lock contention for the bulk
		   operation performed.

		   Default value: 7 (128).

       dmu_prefetch_max	(int)
		   Limit  the  amount  we  can	prefetch with one call to this
		   amount (in bytes).  This helps to limit the amount of  mem-
		   ory that can	be used	by prefetching.

		   Default value: 134,217,728 (128MB).

       ignore_hole_birth (int)
		   This	is an alias for	send_holes_without_birth_time.

       l2arc_feed_again	(int)
		   Turbo L2ARC warm-up.	When the L2ARC is cold the fill	inter-
		   val will be set as fast as possible.

		   Use 1 for yes (default) and 0 to disable.

       l2arc_feed_min_ms (ulong)
                   Minimum feed interval in milliseconds.  Requires
                   l2arc_feed_again=1 and only applies while the L2ARC is
                   being warmed up aggressively.

		   Default value: 200.

       l2arc_feed_secs (ulong)
		   Seconds between L2ARC writing

		   Default value: 1.

       l2arc_headroom (ulong)
		   How far through the ARC lists to search for L2ARC cacheable
		   content, expressed as a multiplier of l2arc_write_max.  ARC
                   persistence across reboots can be achieved with
                   persistent L2ARC by setting this parameter to 0, allowing
                   the full length of ARC lists to be searched for cacheable
                   content.

		   Default value: 2.
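
                   A hedged illustration of how deep the ARC lists are
                   scanned per feed interval with the defaults (the actual
                   scan is further scaled by l2arc_headroom_boost when
                   buffers compress well):

                       l2arc_write_max = 8 * 1024**2   # default 8MB per interval
                       l2arc_headroom = 2              # default multiplier
                       scan_bytes = l2arc_write_max * l2arc_headroom   # 16MB of list scanned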

       l2arc_headroom_boost (ulong)
		   Scales l2arc_headroom by this percentage  when  L2ARC  con-
		   tents  are  being successfully compressed before writing. A
		   value of 100	disables this feature.

		   Default value: 200%.

       l2arc_mfuonly (int)
		   Controls whether only MFU metadata and data are cached from
		   ARC into L2ARC.  This may be	desired	to avoid wasting space
		   on L2ARC when reading/writing large amounts	of  data  that
		   are not expected to be accessed more	than once. The default
		   is 0, meaning both  MRU  and	 MFU  data  and	 metadata  are
		   cached.  When turning off (0) this feature some MRU buffers
		   will	still be present  in  ARC  and	eventually  cached  on
		   L2ARC.  If  l2arc_noprefetch	 is  set to 0, some prefetched
		   buffers will	be cached to  L2ARC,  and  those  might	 later
		   transition  to  MRU,	in which case the l2arc_mru_asize arc-
		   stat	will not be 0. Regardless  of  l2arc_noprefetch,  some
		   MFU buffers might be	evicted	from ARC, accessed later on as
		   prefetches and transition to	MRU  as	 prefetches.   If  ac-
		   cessed   again   they   are	 counted   as	MRU   and  the
		   l2arc_mru_asize arcstat will	not be 0. The  ARC  status  of
		   L2ARC  buffers  when	they were first	cached in L2ARC	can be
		   seen	  in   the   l2arc_mru_asize,	l2arc_mfu_asize	   and
		   l2arc_prefetch_asize	 arcstats  when	 importing the pool or
		   onlining a cache device if persistent L2ARC is enabled. The
                   evicted_l2_eligible_mru arcstat does not take this option
                   into account, so the information provided by the
                   evicted_l2_eligible_* arcstats can be used to decide
                   whether toggling this option is appropriate for the
                   current workload.

		   Use 0 for no	(default) and 1	for yes.

       l2arc_meta_percent (int)
                   Percent of ARC size allowed for L2ARC-only headers.
                   Since L2ARC buffers are not evicted on memory pressure,
                   too many headers on a system with an irrationally large
                   L2ARC can render it slow or unusable.  This parameter
                   limits L2ARC writes and rebuilds in order to enforce the
                   limit.

		   Default value: 33%.

       l2arc_trim_ahead	(ulong)
		   Trims  ahead	of the current write size (l2arc_write_max) on
		   L2ARC devices by this percentage of write size if  we  have
		   filled  the	device.	 If set	to 100 we TRIM twice the space
		   required to accommodate upcoming writes. A minimum of  64MB
		   will	 be  trimmed.  It also enables TRIM of the whole L2ARC
		   device upon creation	or addition to an existing pool	or  if
		   the	header	of the device is invalid upon importing	a pool
		   or onlining a cache device. A value of 0 disables  TRIM  on
		   L2ARC  altogether and is the	default	as it can put signifi-
		   cant	stress on the underlying storage  devices.  This  will
                   vary depending on how well the specific device handles
		   these commands.

		   Default value: 0%.

       l2arc_noprefetch	(int)
		   Do not write	buffers	to L2ARC if they were  prefetched  but
		   not used by applications. In	case there are prefetched buf-
		   fers	in L2ARC and this option is later set to 1, we do  not
		   read	 the  prefetched buffers from L2ARC.  Setting this op-
                   tion to 0 is useful for caching sequential reads from the
                   disks to L2ARC and serving those reads from L2ARC later
                   on.
		   This	may be beneficial in case the L2ARC device is signifi-
		   cantly  faster  in  sequential  reads than the disks	of the
		   pool.

                   Use 1 to disable (default) and 0 to enable caching/reading
                   prefetches to/from L2ARC.

       l2arc_norw (int)
		   No reads during writes.

		   Use 1 for yes and 0 for no (default).

       l2arc_write_boost (ulong)
		   Cold	 L2ARC	devices	will have l2arc_write_max increased by
		   this	amount while they remain cold.

		   Default value: 8,388,608.

       l2arc_write_max (ulong)
		   Max write bytes per interval.

		   Default value: 8,388,608.

       l2arc_rebuild_enabled (int)
		   Rebuild the L2ARC when importing a pool (persistent L2ARC).
		   This	can be disabled	if there are problems importing	a pool
		   or attaching	an L2ARC device	(e.g. the L2ARC	device is slow
		   in  reading stored log metadata, or the metadata has	become
		   somehow fragmented/unusable).

		   Use 1 for yes (default) and 0 for no.

       l2arc_rebuild_blocks_min_l2size (ulong)
		   Min size (in	bytes) of an L2ARC device required in order to
		   write  log  blocks  in it. The log blocks are used upon im-
		   porting the pool to rebuild the L2ARC  (persistent  L2ARC).
		   Rationale:  for  L2ARC devices less than 1GB, the amount of
		   data	l2arc_evict() evicts is	significant  compared  to  the
		   amount  of  restored	 L2ARC data. In	this case do not write
		   log blocks in L2ARC in order	not to waste space.

		   Default value: 1,073,741,824	(1GB).

       metaslab_aliquot	(ulong)
		   Metaslab granularity, in bytes. This	is roughly similar  to
		   what	 would	be  referred to	as the "stripe size" in	tradi-
		   tional RAID arrays. In normal operation, ZFS	 will  try  to
		   write this amount of	data to	a top-level vdev before	moving
		   on to the next one.

		   Default value: 524,288.

       metaslab_bias_enabled (int)
		   Enable metaslab group biasing based on its vdev's over-  or
		   under-utilization relative to the pool.

		   Use 1 for yes (default) and 0 for no.

       metaslab_force_ganging (ulong)
		   Make	some blocks above a certain size be gang blocks.  This
		   option is used by the test suite to facilitate testing.

		   Default value: 16,777,217.

       zfs_history_output_max (int)
                   When attempting to log the output nvlist of an ioctl in
                   the on-disk history, the output will not be stored if it
                   is larger than this size (in bytes).  This must be less
                   than DMU_MAX_ACCESS (64MB).  This applies primarily to
                   zfs_ioc_channel_program().

		   Default value: 1MB.

       zfs_keep_log_spacemaps_at_export	(int)
		   Prevent log spacemaps from being destroyed during pool  ex-
		   ports and destroys.

		   Use 1 for yes and 0 for no (default).

       zfs_metaslab_segment_weight_enabled (int)
		   Enable/disable segment-based	metaslab selection.

		   Use 1 for yes (default) and 0 for no.

       zfs_metaslab_switch_threshold (int)
		   When	using segment-based metaslab selection,	continue allo-
		   cating     from     the     active	   metaslab	 until
		   zfs_metaslab_switch_threshold  worth	 of  buckets have been
		   exhausted.

		   Default value: 2.

       metaslab_debug_load (int)
		   Load	all metaslabs during pool import.

		   Use 1 for yes and 0 for no (default).

       metaslab_debug_unload (int)
		   Prevent metaslabs from being	unloaded.

		   Use 1 for yes and 0 for no (default).

       metaslab_fragmentation_factor_enabled (int)
		   Enable  use	of  the	 fragmentation	metric	in   computing
		   metaslab weights.

		   Use 1 for yes (default) and 0 for no.

       metaslab_df_max_search (int)
		   Maximum  distance  to  search forward from the last offset.
		   Without this	limit, fragmented pools	can see	>100,000 iter-
		   ations  and metaslab_block_picker() becomes the performance
		   limiting factor on high-performance storage.

		   With	the default setting of 16MB,  we  typically  see  less
		   than	 500  iterations,  even	with very fragmented, ashift=9
		   pools.  The	maximum	 number	 of  iterations	 possible  is:
		   metaslab_df_max_search  /  (2 * (1<<ashift)).  With the de-
		   fault setting of 16MB this is 16*1024  (with	 ashift=9)  or
		   2048	(with ashift=12).

		   Default value: 16,777,216 (16MB)
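
                   The iteration bound quoted above can be reproduced
                   directly from the formula in the text:

                       metaslab_df_max_search = 16 * 1024**2   # default 16MB
                       for ashift in (9, 12):
                           print(metaslab_df_max_search // (2 * (1 << ashift)))
                           # 16384 iterations with ashift=9, 2048 with ashift=12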

       metaslab_df_use_largest_segment (int)
		   If	 we    are    not    searching	  forward    (due   to
		   metaslab_df_max_search,	metaslab_df_free_pct,	    or
		   metaslab_df_alloc_threshold),  this	tunable	 controls what
		   segment is used.  If	it is set, we  will  use  the  largest
		   free	 segment.   If it is not set, we will use a segment of
		   exactly the requested size (or larger).

		   Use 1 for yes and 0 for no (default).

       zfs_metaslab_max_size_cache_sec (ulong)
		   When	we unload a metaslab, we cache the size	of the largest
		   free	chunk. We use that cached size to determine whether or
		   not to load a metaslab for  a  given	 allocation.  As  more
		   frees  accumulate in	that metaslab while it's unloaded, the
		   cached max size becomes less	and  less  accurate.  After  a
		   number  of seconds controlled by this tunable, we stop con-
		   sidering the	cached max size	and start considering only the
		   histogram instead.

		   Default value: 3600 seconds (one hour)

       zfs_metaslab_mem_limit (int)
		   When	 we are	loading	a new metaslab,	we check the amount of
		   memory being	used to	store metaslab range trees. If	it  is
		   over	 a  threshold, we attempt to unload the	least recently
		   used	metaslab to prevent the	system from  clogging  all  of
		   its memory with range trees.	This tunable sets the percent-
		   age of total	system memory that is the threshold.

		   Default value: 25 percent

       zfs_metaslab_try_hard_before_gang (int)
		   If not set (the default), we	will first try normal  alloca-
		   tion.  If that fails	then we	will do	a gang allocation.  If
		   that	fails then we will do a	"try  hard"  gang  allocation.
		   If that fails then we will have a multi-layer gang block.

		   If set, we will first try normal allocation.	 If that fails
		   then	we will	do a "try hard"	allocation.  If	that fails  we
		   will	do a gang allocation.  If that fails we	will do	a "try
		   hard" gang allocation.  If that fails then we will  have  a
		   multi-layer gang block.

		   Default value: 0 (false)

       zfs_metaslab_find_max_tries (int)
		   When	 not  trying hard, we only consider this number	of the
		   best	metaslabs.  This improves performance, especially when
		   there  are many metaslabs per vdev and the allocation can't
		   actually be satisfied (so we	would  otherwise  iterate  all
		   the metaslabs).

		   Default value: 100

       zfs_vdev_default_ms_count (int)
		   When	 a  vdev  is added target this number of metaslabs per
		   top-level vdev.

		   Default value: 200.

       zfs_vdev_default_ms_shift (int)
		   Default limit for metaslab size.

		   Default value: 29 [meaning (1 << 29)	= 512MB].

       zfs_vdev_max_auto_ashift	(ulong)
		   Maximum ashift used when optimizing for logical -> physical
		   sector size on new top-level	vdevs.

		   Default value: ASHIFT_MAX (16).

       zfs_vdev_min_auto_ashift	(ulong)
		   Minimum ashift used when creating new top-level vdevs.

		   Default value: ASHIFT_MIN (9).

       zfs_vdev_min_ms_count (int)
		   Minimum number of metaslabs to create in a top-level	vdev.

		   Default value: 16.

       vdev_validate_skip (int)
                   Skip label validation steps during pool import.  Changing
                   this is not recommended unless you know what you are
                   doing and are recovering a damaged label.

		   Default value: 0.

       zfs_vdev_ms_count_limit (int)
		   Practical  upper  limit  of	total  metaslabs per top-level
		   vdev.

		   Default value: 131,072.

       metaslab_preload_enabled	(int)
		   Enable metaslab group preloading.

		   Use 1 for yes (default) and 0 for no.

       metaslab_lba_weighting_enabled (int)
		   Give	more weight to metaslabs  with	lower  LBAs,  assuming
		   they	 have  greater bandwidth as is typically the case on a
		   modern constant angular velocity disk drive.

		   Use 1 for yes (default) and 0 for no.

       metaslab_unload_delay (int)
		   After a metaslab is used, we	keep it	loaded for  this  many
		   txgs, to attempt to reduce unnecessary reloading. Note that
		   both	this many txgs and metaslab_unload_delay_ms  millisec-
		   onds	must pass before unloading will	occur.

		   Default value: 32.

       metaslab_unload_delay_ms	(int)
		   After  a  metaslab is used, we keep it loaded for this many
		   milliseconds, to attempt to reduce  unnecessary  reloading.
		   Note	 that  both  this  many	 milliseconds and metaslab_un-
		   load_delay txgs must	pass before unloading will occur.

		   Default value: 600000 (ten minutes).

       reference_history (int)
		   Maximum  reference  holders	being  tracked	 when	refer-
		   ence_tracking_enable	is active.

		   Default value: 3.

       reference_tracking_enable (int)
		   Track reference holders to refcount_t objects (debug	builds
		   only).

		   Use 1 for yes and 0 for no (default).

       send_holes_without_birth_time (int)
		   When	set, the hole_birth optimization will not be used, and
		   all	holes will always be sent on zfs send.	This is	useful
		   if you suspect your datasets	 are  affected	by  a  bug  in
		   hole_birth.

		   Use 1 for on	(default) and 0	for off.

       spa_config_path (charp)
		   SPA config file

		   Default value: /etc/zfs/zpool.cache.

       spa_asize_inflation (int)
		   Multiplication factor used to estimate actual disk consump-
		   tion	from the size of data being written. The default value
		   is a	worst case estimate, but lower values may be valid for
		   a given pool	depending on its configuration.	 Pool adminis-
		   trators  who	 understand  the  factors involved may wish to
		   specify a more realistic inflation factor, particularly  if
		   they	operate	close to quota or capacity limits.

		   Default value: 24.

       spa_load_print_vdev_tree	(int)
		   Whether  to	print  the  vdev tree in the debugging message
		   buffer during pool import.  Use 0 to	disable	and 1  to  en-
		   able.

		   Default value: 0.

       spa_load_verify_data (int)
		   Whether  to traverse	data blocks during an "extreme rewind"
		   (-X)	import.	 Use 0 to disable and 1	to enable.

		   An extreme rewind import normally performs a	full traversal
		   of all blocks in the	pool for verification.	If this	param-
		   eter	is set to 0, the traversal skips non-metadata  blocks.
		   It  can  be	toggled	once the import	has started to stop or
		   start the traversal of non-metadata blocks.

		   Default value: 1.

       spa_load_verify_metadata	(int)
		   Whether to traverse blocks during an	"extreme rewind"  (-X)
		   pool	import.	 Use 0 to disable and 1	to enable.

		   An extreme rewind import normally performs a	full traversal
		   of all blocks in the	pool for verification.	If this	param-
		   eter	 is  set to 0, the traversal is	not performed.	It can
		   be toggled once the import has started to stop or start the
		   traversal.

		   Default value: 1.

       spa_load_verify_shift (int)
		   Sets	the maximum number of bytes to consume during pool im-
		   port	to the log2 fraction of	the target ARC size.

		   Default value: 4.

       spa_slop_shift (int)
		   Normally,	we    don't    allow	 the	 last	  3.2%
		   (1/(2^spa_slop_shift)) of space in the pool to be consumed.
		   This	ensures	that we	don't run the pool completely  out  of
		   space,  due	to  unaccounted	changes	(e.g. to the MOS).  It
		   also	limits the worst-case time to allocate space.	If  we
		   have	 less  than this amount	of free	space, most ZPL	opera-
		   tions (e.g. write, create) will return ENOSPC.

		   Default value: 5.
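
                   A sketch of the reserved "slop" space for an assumed 10TB
                   pool, ignoring any additional clamping the implementation
                   may apply:

                       pool_size = 10 * 1024**4              # assumed 10TB pool
                       spa_slop_shift = 5                    # default
                       slop = pool_size >> spa_slop_shift    # 320GB (~3.1%) kept free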

       vdev_removal_max_span (int)
		   During top-level vdev removal, chunks of  data  are	copied
		   from	 the  vdev  which  may	include	free space in order to
		   trade bandwidth for IOPS.  This  parameter  determines  the
		   maximum  span  of  free  space (in bytes) which will	be in-
		   cluded as "unnecessary" data	in a chunk of copied data.

		   The	default	 value	here  was   chosen   to	  align	  with
		   zfs_vdev_read_gap_limit,  which  is	a similar concept when
		   doing regular reads (but there's no reason it has to	be the
		   same).

		   Default value: 32,768.

       vdev_file_logical_ashift	(ulong)
		   Logical ashift for file-based devices.

		   Default value: 9.

       vdev_file_physical_ashift (ulong)
		   Physical ashift for file-based devices.

		   Default value: 9.

       zap_iterate_prefetch (int)
		   If  this is set, when we start iterating over a ZAP object,
		   zfs will prefetch the  entire  object  (all	leaf  blocks).
		   However, this is limited by dmu_prefetch_max.

		   Use 1 for on	(default) and 0	for off.

       zfetch_array_rd_sz (ulong)
		   If  prefetching  is	enabled, disable prefetching for reads
		   larger than this size.

		   Default value: 1,048,576.

       zfetch_max_distance (uint)
		   Max bytes to	prefetch per stream.

		   Default value: 8,388,608 (8MB).

       zfetch_max_idistance (uint)
                   Max bytes of indirect blocks to prefetch per stream.

                   Default value: 67,108,864 (64MB).

       zfetch_max_streams (uint)
		   Max number of streams  per  zfetch  (prefetch  streams  per
		   file).

		   Default value: 8.

       zfetch_min_sec_reap (uint)
		   Min time before an active prefetch stream can be reclaimed

		   Default value: 2.

       zfs_abd_scatter_enabled (int)
                   Enables the use of scatter/gather lists for ARC buffers;
                   when disabled, all allocations are forced to be linear in
                   kernel memory.  Disabling can improve performance in some
                   code paths at the expense of fragmented kernel memory.

		   Default value: 1.

       zfs_abd_scatter_max_order (uint)
		   Maximum number of consecutive memory	pages allocated	 in  a
		   single  block  for  scatter/gather  lists. Default value is
		   specified by	the kernel itself.

		   Default value: 10 at	the time of this writing.

       zfs_abd_scatter_min_size	(uint)
		   This	is the minimum allocation size that will  use  scatter
		   (page-based)	 ABD's.	  Smaller  allocations will use	linear
		   ABD's.

		   Default value: 1536 (512B and 1KB allocations will be  lin-
		   ear).

       zfs_arc_dnode_limit (ulong)
                   When the number of bytes consumed by dnodes in the ARC
                   exceeds this number of bytes, try to unpin some of it in
                   response to demand for non-metadata.  This value acts as
                   a ceiling to the amount of dnode metadata and defaults to
                   0, which indicates that the limit is instead derived from
                   zfs_arc_dnode_limit_percent of the ARC meta buffers that
                   may be used for dnodes.

		   See	also zfs_arc_meta_prune	which serves a similar purpose
		   but is used when the	amount of metadata in the ARC  exceeds
		   zfs_arc_meta_limit  rather  than in response	to overall de-
		   mand	for non-metadata.

		   Default value: 0.

       zfs_arc_dnode_limit_percent (ulong)
		   Percentage that can be consumed by dnodes of	ARC meta  buf-
		   fers.

		   See also zfs_arc_dnode_limit	which serves a similar purpose
		   but has a higher priority if	set to nonzero value.

		   Default value: 10%.

       zfs_arc_dnode_reduce_percent (ulong)
		   Percentage of ARC dnodes to try to scan in response to  de-
		   mand	 for non-metadata when the number of bytes consumed by
		   dnodes exceeds zfs_arc_dnode_limit.

		   Default value: 10% of the number of dnodes in the ARC.

       zfs_arc_average_blocksize (int)
		   The ARC's buffer hash table is sized	based on  the  assump-
		   tion	 of an average block size of zfs_arc_average_blocksize
		   (default 8K).  This works out to roughly 1MB	of hash	 table
		   per	1GB of physical	memory with 8-byte pointers.  For con-
		   figurations with a known larger  average  block  size  this
		   value can be	increased to reduce the	memory footprint.

		   Default value: 8192.
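
                   The "roughly 1MB of hash table per 1GB of physical
                   memory" figure follows from the assumed block size and
                   8-byte pointers (a simplified view of the sizing
                   heuristic):

                       physmem = 1 * 1024**3                       # 1GB of RAM
                       zfs_arc_average_blocksize = 8192            # default
                       buckets = physmem // zfs_arc_average_blocksize
                       table_bytes = buckets * 8                   # 131072 * 8 = 1MB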

       zfs_arc_eviction_pct (int)
		   When	 arc_is_overflowing(),	arc_get_data_impl()  waits for
		   this	percent	of the requested amount	of data	to be evicted.
		   For	example,  by default for every 2KB that's evicted, 1KB
		   of it may be	"reused" by a new allocation.  Since  this  is
		   above  100%,	 it ensures that progress is made towards get-
		   ting	arc_size under arc_c.  Since this is  finite,  it  en-
		   sures  that	allocations  can still happen, even during the
		   potentially long time that arc_size is more than arc_c.

		   Default value: 200.

       zfs_arc_evict_batch_limit (int)
		   Number ARC headers to evict per sub-list before  proceeding
		   to  another	sub-list.  This	batch-style operation prevents
		   entire sub-lists from being evicted at once but comes at  a
		   cost	of additional unlocking	and locking.

		   Default value: 10.

       zfs_arc_grow_retry (int)
                   If set to a non-zero value, it will replace the
                   arc_grow_retry value with this value.  The arc_grow_retry
		   value  (default  5)	is  the	number of seconds the ARC will
		   wait	before trying to resume	growth after a memory pressure
		   event.

		   Default value: 0.

       zfs_arc_lotsfree_percent	(int)
		   Throttle  I/O when free system memory drops below this per-
		   centage of total system memory.  Setting this  value	 to  0
		   will	disable	the throttle.

		   Default value: 10%.

       zfs_arc_max (ulong)
		   Max size of ARC in bytes.  If set to	0 then the max size of
		   ARC is determined by	the amount of system memory installed.
                   For Linux, 1/2 of system memory will be used as the
                   limit.  For FreeBSD, the larger of (all system memory
                   minus 1GB) or 5/8 of system memory will be used as the
                   limit.  This value must be at least 67108864 (64
                   megabytes).

                   This value can be changed dynamically with some caveats.
                   It cannot be set back to 0 while running, and reducing it
                   below the current ARC size will not cause the ARC to
                   shrink without memory pressure to induce shrinking.

		   Default value: 0.
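
                   A sketch of the default limit selection described above,
                   assuming a machine with 8GB of RAM:

                       physmem = 8 * 1024**3                    # assumed 8GB of RAM
                       linux_limit = physmem // 2               # 1/2 of memory = 4GB
                       freebsd_limit = max(physmem - 1024**3,   # memory minus 1GB, or
                                           physmem * 5 // 8)    # 5/8 of memory: here 7GB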

       zfs_arc_meta_adjust_restarts (ulong)
                   The number of restart passes to make while scanning the
                   ARC attempting to free buffers in order to stay below the
                   zfs_arc_meta_limit.  This value should not need to be tuned
		   but is available to facilitate performance analysis.

		   Default value: 4096.

       zfs_arc_meta_limit (ulong)
                   The maximum allowed size in bytes that meta data buffers
                   are allowed to consume in the ARC.  When this limit is
                   reached, meta data buffers will be reclaimed even if the
                   overall arc_c_max has not been reached.  This value
                   defaults to 0, which indicates that the limit is instead
                   derived from zfs_arc_meta_limit_percent of the ARC.

                   This value may be changed dynamically, except that it
                   cannot be set back to 0 for a specific percent of the
                   ARC; it must be set to an explicit value.

		   Default value: 0.

       zfs_arc_meta_limit_percent (ulong)
		   Percentage of ARC buffers that can be used for meta data.

		   See also zfs_arc_meta_limit which serves a similar  purpose
		   but has a higher priority if	set to nonzero value.

		   Default value: 75%.

       zfs_arc_meta_min	(ulong)
                   The minimum allowed size in bytes that meta data buffers
                   may consume in the ARC.  This value defaults to 0, which
                   disables a floor on the amount of the ARC devoted to meta
                   data.

		   Default value: 0.

       zfs_arc_meta_prune (int)
		   The number of dentries and inodes to	be scanned looking for
		   entries  which  can	be dropped.  This may be required when
		   the ARC reaches the zfs_arc_meta_limit because dentries and
		   inodes  can	pin buffers in the ARC.	 Increasing this value
                   will cause the dentry and inode caches to be pruned more
                   aggressively.  Setting this value to 0 will disable pruning
		   the inode and dentry	caches.

		   Default value: 10,000.

       zfs_arc_meta_strategy (int)
		   Define the strategy for ARC meta data buffer	eviction (meta
		   reclaim  strategy).	 A  value  of 0	(META_ONLY) will evict
		   only	the ARC	meta data buffers.  A value  of	 1  (BALANCED)
                   indicates that additional data buffers may be evicted if
                   required in order to evict the required number of meta
                   data buffers.

		   Default value: 1.

       zfs_arc_min (ulong)
		   Min	size  of ARC in	bytes. If set to 0 then	arc_c_min will
		   default to consuming	the larger of 32M  or  1/32  of	 total
		   system memory.

		   Default value: 0.

       zfs_arc_min_prefetch_ms (int)
		   Minimum time	prefetched blocks are locked in	the ARC, spec-
		   ified in ms.	 A value of 0 will default to 1000 ms.

		   Default value: 0.

       zfs_arc_min_prescient_prefetch_ms (int)
		   Minimum time	"prescient prefetched" blocks  are  locked  in
		   the	ARC,  specified	 in  ms.  These	blocks are meant to be
		   prefetched fairly aggressively ahead	of the code  that  may
		   use them. A value of	0 will default to 6000 ms.

		   Default value: 0.

       zfs_max_missing_tvds (int)
		   Number  of  missing	top-level  vdevs which will be allowed
		   during pool import (only in read-only mode).

		   Default value: 0

       zfs_max_nvlist_src_size (ulong)
		   Maximum  size  in   bytes   allowed	 to   be   passed   as
		   zc_nvlist_src_size  for ioctls on /dev/zfs. This prevents a
		   user	from causing  the  kernel  to  allocate	 an  excessive
		   amount  of  memory.	When  the limit	is exceeded, the ioctl
		   fails with EINVAL and a description of the error is sent to
		   the	zfs-dbgmsg  log.  This parameter should	not need to be
		   touched under normal	circumstances. On FreeBSD, the default
		   is  based  on  the  system  limit  on user wired memory. On
		   Linux, the default is 128MB.

		   Default value: 0 (kernel decides)

       zfs_multilist_num_sublists (int)
		   To allow more fine-grained locking, each ARC	state contains
		   a  series  of  lists	 for  both data	and meta data objects.
		   Locking is performed	at the	level  of  these  "sub-lists".
                   This parameter controls the number of sub-lists per ARC
		   state, and also applies to other uses of the	multilist data
		   structure.

		   Default value: 4 or the number of online CPUs, whichever is
		   greater

       zfs_arc_overflow_shift (int)
		   The ARC size	is considered to be overflowing	if it  exceeds
		   the	current	 ARC target size (arc_c) by a threshold	deter-
		   mined by this parameter.  The threshold is calculated as  a
		   fraction of arc_c using the formula "arc_c >> zfs_arc_over-
		   flow_shift".

                   The default value of 8 causes the ARC to be considered to
                   be overflowing if it exceeds the target size by 1/256th
                   (about 0.4%) of the target size.

		   When	the ARC	is overflowing,	 new  buffer  allocations  are
		   stalled  until  the reclaim thread catches up and the over-
		   flow	condition no longer exists.

		   Default value: 8.
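
                   The overflow threshold can be sketched as follows,
                   assuming a 16GB ARC target:

                       arc_c = 16 * 1024**3                          # assumed ARC target size
                       zfs_arc_overflow_shift = 8                    # default
                       threshold = arc_c >> zfs_arc_overflow_shift   # 64MB
                       def overflowing(arc_size):
                           return arc_size > arc_c + threshold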

       zfs_arc_p_min_shift (int)
                   If set to a non-zero value, this will update
                   arc_p_min_shift (default 4) with the new value.
                   arc_p_min_shift is used as the shift of arc_c for
                   calculating both the minimum and maximum arc_p.

		   Default value: 0.

       zfs_arc_p_dampener_disable (int)
		   Disable arc_p adapt dampener

		   Use 1 for yes (default) and 0 to disable.

       zfs_arc_shrink_shift (int)
                   If set to a non-zero value, this will update
                   arc_shrink_shift (default 7) with the new value.

		   Default value: 0.

       zfs_arc_pc_percent (uint)
		   Percent of pagecache	to reclaim arc to

		   This	tunable	allows ZFS arc to play more  nicely  with  the
		   kernel's  LRU pagecache. It can guarantee that the ARC size
		   won't collapse under	scanning pressure  on  the  pagecache,
		   yet still allows arc	to be reclaimed	down to	zfs_arc_min if
		   necessary. This value is specified as percent of  pagecache
		   size	 (as measured by NR_FILE_PAGES)	where that percent may
		   exceed 100. This only operates during  memory  pressure/re-
		   claim.

		   Default value: 0% (disabled).

       zfs_arc_shrinker_limit (int)
		   This	 is  a	limit on how many pages	the ARC	shrinker makes
		   available for eviction in response to one  page  allocation
		   attempt.   Note that	in practice, the kernel's shrinker can
		   ask us to evict up to about 4x this for one allocation  at-
		   tempt.

		   The default limit of	10,000 (in practice, 160MB per alloca-
		   tion	attempt	with 4K	pages) limits the amount of time spent
		   attempting to reclaim ARC memory to less than 100ms per al-
		   location attempt, even  with	 a  small  average  compressed
		   block size of ~8KB.

		   The parameter can be	set to 0 (zero)	to disable the limit.

		   This	parameter only applies on Linux.

		   Default value: 10,000.
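
                   The 160MB figure quoted above is simply the limit times
                   the page size, times the roughly 4x multiple the kernel
                   shrinker may request:

                       page_size = 4096                   # assumed 4K pages
                       zfs_arc_shrinker_limit = 10_000    # default
                       per_call = zfs_arc_shrinker_limit * page_size   # 40MB
                       worst_case = 4 * per_call                       # ~160MB per attempt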

       zfs_arc_sys_free	(ulong)
		   The	target	number	of  bytes the ARC should leave as free
		   memory on the system.  Defaults to the larger  of  1/64  of
		   physical memory or 512K.  Setting this option to a non-zero
		   value will override the default.

		   Default value: 0.

       zfs_autoimport_disable (int)
		   Disable pool	import at module load by  ignoring  the	 cache
		   file	(typically /etc/zfs/zpool.cache).

		   Use 1 for yes (default) and 0 for no.

       zfs_checksum_events_per_second (uint)
		   Rate	 limit	checksum events	to this	many per second.  Note
		   that	this should not	be set below the zed thresholds	 (cur-
		   rently  10 checksums	over 10	sec) or	else zed may not trig-
		   ger any action.

		   Default value: 20

       zfs_commit_timeout_pct (int)
		   This	controls the amount of time that  a  ZIL  block	 (lwb)
		   will	 remain	 "open"	 when  it  isn't  "full", and it has a
		   thread waiting for it to be committed  to  stable  storage.
		   The timeout is scaled based on a percentage of the last lwb
		   latency to avoid significantly  impacting  the  latency  of
		   each	individual transaction record (itx).

		   Default value: 5%.

       zfs_condense_indirect_commit_entry_delay_ms (int)
		   Vdev	indirection layer (used	for device removal) sleeps for
		   this	many milliseconds during mapping generation.  Intended
		   for use with	the test suite to throttle vdev	removal	speed.

		   Default value: 0 (no	throttle).

       zfs_condense_indirect_vdevs_enable (int)
		   Enable  condensing  indirect	 vdev mappings.	 When set to a
		   non-zero value, attempt to condense indirect	vdev  mappings
		   if	the   mapping  uses  more  than	 zfs_condense_min_map-
		   ping_bytes bytes of memory and if the  obsolete  space  map
		   object uses more than zfs_condense_max_obsolete_bytes bytes
		   on-disk.  The condensing process is an attempt to save mem-
		   ory by removing obsolete mappings.

		   Default value: 1.

       zfs_condense_max_obsolete_bytes (ulong)
		   Only	 attempt to condense indirect vdev mappings if the on-
		   disk	size of	the obsolete space map object is greater  than
                   this number of bytes (see
                   zfs_condense_indirect_vdevs_enable).

		   Default value: 1,073,741,824.

       zfs_condense_min_mapping_bytes (ulong)
		   Minimum size	vdev  mapping  to  attempt  to	condense  (see
		   zfs_condense_indirect_vdevs_enable).

		   Default value: 131,072.

       zfs_dbgmsg_enable (int)
		   Internally  ZFS  keeps a small log to facilitate debugging.
		   By default the log is disabled, to enable it	set  this  op-
		   tion	 to  1.	  The  contents	 of the	log can	be accessed by
		   reading the /proc/spl/kstat/zfs/dbgmsg file.	 Writing 0  to
		   this	proc file clears the log.

		   Default value: 0.

       zfs_dbgmsg_maxsize (int)
		   The maximum size in bytes of	the internal ZFS debug log.

		   Default value: 4M.

       zfs_dbuf_state_index (int)
		   This	 feature  is currently unused. It is normally used for
		   controlling	 what	 reporting    is    available	 under
		   /proc/spl/kstat/zfs.

		   Default value: 0.

       zfs_deadman_enabled (int)
		   When	 a  pool  sync	operation  takes longer	than zfs_dead-
		   man_synctime_ms milliseconds, or  when  an  individual  I/O
		   takes longer	than zfs_deadman_ziotime_ms milliseconds, then
		   the operation is considered to  be  "hung".	 If  zfs_dead-
		   man_enabled	is set then the	deadman	behavior is invoked as
		   described by	the zfs_deadman_failmode  module  option.   By
		   default the deadman is enabled and configured to wait which
		   results in "hung" I/Os only being logged.  The  deadman  is
		   automatically disabled when a pool gets suspended.

		   Default value: 1.

       zfs_deadman_failmode (charp)
		   Controls  the  failure  behavior when the deadman detects a
		   "hung" I/O.	Valid values are wait, continue, and panic.

		   wait	- Wait for a "hung" I/O	to complete.  For each	"hung"
		   I/O a "deadman" event will be posted	describing that	I/O.

		   continue  - Attempt to recover from a "hung"	I/O by re-dis-
		   patching it to the I/O pipeline if possible.

		   panic - Panic the system.  This can be used	to  facilitate
		   an  automatic  fail-over to a properly configured fail-over
		   partner.

		   Default value: wait.

       zfs_deadman_checktime_ms	(int)
		   Check time in milliseconds. This defines the	 frequency  at
		   which  we  check  for  hung	I/O and	potentially invoke the
		   zfs_deadman_failmode	behavior.

		   Default value: 60,000.

       zfs_deadman_synctime_ms (ulong)
		   Interval in milliseconds after which	the deadman  is	 trig-
		   gered  and also the interval	after which a pool sync	opera-
		   tion	is considered to be "hung".  Once this	limit  is  ex-
		   ceeded the deadman will be invoked every zfs_deadman_check-
		   time_ms milliseconds	until the pool sync completes.

		   Default value: 600,000.

       zfs_deadman_ziotime_ms (ulong)
		   Interval in milliseconds after which	the deadman  is	 trig-
		   gered  and  an individual I/O operation is considered to be
		   "hung".  As long as the I/O remains "hung" the deadman will
		   be  invoked every zfs_deadman_checktime_ms milliseconds un-
		   til the I/O completes.

		   Default value: 300,000.

       zfs_dedup_prefetch (int)
                   Enable prefetching of deduplicated blocks.

		   Use 1 for yes and 0 to disable (default).

       zfs_delay_min_dirty_percent (int)
		   Start to delay each transaction once	there is  this	amount
		   of	 dirty	  data,	  expressed   as   a   percentage   of
		   zfs_dirty_data_max.	   This	   value    should    be    >=
		   zfs_vdev_async_write_active_max_dirty_percent.    See   the
		   section "ZFS	TRANSACTION DELAY".

		   Default value: 60%.

       zfs_delay_scale (int)
		   This	controls how quickly the transaction delay  approaches
		   infinity.   Larger  values  cause longer delays for a given
		   amount of dirty data.

		   For the smoothest delay, this value should be about 1  bil-
		   lion	 divided  by the maximum number	of operations per sec-
		   ond.	 This will smoothly handle between 10x and 1/10th this
		   number.

		   See the section "ZFS	TRANSACTION DELAY".

		   Note: zfs_delay_scale * zfs_dirty_data_max must be <	2^64.

		   Default value: 500,000.
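
                   Applying the "1 billion divided by the maximum operations
                   per second" guidance, with an assumed backend capable of
                   2,000 operations per second:

                       max_ops_per_second = 2000                        # assumed capability
                       zfs_delay_scale = 10**9 // max_ops_per_second    # 500,000 (the default)
                       zfs_dirty_data_max = 4 * 1024**3                 # assumed 4GB
                       assert zfs_delay_scale * zfs_dirty_data_max < 2**64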

       zfs_disable_ivset_guid_check (int)
		   Disables  requirement  for  IVset  guids  to	be present and
		   match when doing a raw receive of encrypted	datasets.  In-
		   tended for users whose pools	were created with OpenZFS pre-
		   release versions and	now have compatibility issues.

		   Default value: 0.

       zfs_key_max_salt_uses (ulong)
		   Maximum number of uses of a single salt value before	gener-
		   ating  a  new one for encrypted datasets. The default value
		   is also the maximum that will be accepted.

		   Default value: 400,000,000.

       zfs_object_mutex_size (uint)
		   Size	of the znode hashtable used for	holds.

		   Due to the need to hold locks on objects that may not exist
		   yet,	 kernel	mutexes	are not	created	per-object and instead
		   a hashtable is used where collisions	will result in objects
		   waiting  when  there	is not actually	contention on the same
		   object.

		   Default value: 64.

       zfs_slow_io_events_per_second (int)
		   Rate	limit delay zevents (which report slow I/Os)  to  this
		   many	per second.

		   Default value: 20

       zfs_unflushed_max_mem_amt (ulong)
		   Upper-bound limit for unflushed metadata changes to be held
		   by the log spacemap in memory (in bytes).

		   Default value: 1,073,741,824	(1GB).

       zfs_unflushed_max_mem_ppm (ulong)
                   Portion of the overall system memory that ZFS allows to
                   be used for unflushed metadata changes by the log
                   spacemap, expressed in parts per million (the value is
                   divided by 1000000 for finer granularity than a
                   percentage).

                   Default value: 1000 (0.1% of memory).
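
                   A short sketch of the resulting byte limit, assuming 32GB
                   of system memory:

                       physmem = 32 * 1024**3                 # assumed 32GB of RAM
                       zfs_unflushed_max_mem_ppm = 1000       # default, i.e. 0.1%
                       limit = physmem * zfs_unflushed_max_mem_ppm // 1_000_000   # ~34MB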

       zfs_unflushed_log_block_max (ulong)
		   Describes the maximum number	of log spacemap	blocks allowed
		   for each pool.  The default value of	262144 means that  the
		   space  in  all the log spacemaps can	add up to no more than
		   262144 blocks (which	means 32GB  of	logical	 space	before
		   compression	and  ditto  blocks, assuming that blocksize is
		   128k).

		   This	tunable	is important because it	involves  a  trade-off
		   between  import  time  after	an unclean export and the fre-
		   quency of flushing metaslabs.  The higher this  number  is,
		   the	more log blocks	we allow when the pool is active which
		   means that we flush metaslabs less often and	thus  decrease
		   the	number	of  I/Os for spacemap updates per TXG.	At the
		   same	time though, that means	that in	the event  of  an  un-
		   clean export, there will be more log	spacemap blocks	for us
		   to read, inducing overhead in the import time of the	 pool.
                   The lower the number, the more flushing occurs and the
                   sooner log blocks are destroyed as they become obsolete,
                   which leaves fewer blocks to be read during import time
                   after a crash.

		   Each	log spacemap block existing during pool	 import	 leads
		   to approximately one	extra logical I/O issued.  This	is the
		   reason why this tunable  is	exposed	 in  terms  of	blocks
		   rather than space used.

		   Default value: 262144 (256K).

       zfs_unflushed_log_block_min (ulong)
		   If  the  number of metaslabs	is small and our incoming rate
		   is high, we could get into a	situation that we are flushing
		   all our metaslabs every TXG.	 Thus we always	allow at least
		   this	many log blocks.

		   Default value: 1000.

       zfs_unflushed_log_block_pct (ulong)
		   Tunable used	to determine the number	of blocks that can  be
		   used	for the	spacemap log, expressed	as a percentage	of the
		   total number	of metaslabs in	the pool.

                   Default value: 400 (read as 400%, meaning that the number
                   of log spacemap blocks is capped at 4 times the number of
                   metaslabs in the pool).
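
                   A hedged sketch of the percentage-based portion of the
                   block limit, assuming a pool with 1,000 metaslabs (the
                   min/max tunables above still bound the final value):

                       metaslab_count = 1000                   # assumed metaslabs in the pool
                       zfs_unflushed_log_block_pct = 400       # default
                       blocks = metaslab_count * zfs_unflushed_log_block_pct // 100   # 4000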

       zfs_unlink_suspend_progress (uint)
		   When	enabled, files will not	be asynchronously removed from
		   the list of pending unlinks and the space they consume will
		   be leaked. Once this	 option	 has  been  disabled  and  the
		   dataset is remounted, the pending unlinks will be processed
		   and the freed space returned	to the pool.  This  option  is
		   used	by the test suite to facilitate	testing.

                   Use 0 (default) to allow progress and 1 to pause progress.

       zfs_delete_blocks (ulong)
                   This is used to define a large file for the purposes of
                   delete.  Files containing more than zfs_delete_blocks will
		   be  deleted	asynchronously while smaller files are deleted
		   synchronously.  Decreasing this value will reduce the  time
		   spent  in  an  unlink(2)  system  call  at the expense of a
		   longer delay	before the freed space is available.

		   Default value: 20,480.

       zfs_dirty_data_max (int)
		   Determines the dirty	space limit in bytes.  Once this limit
		   is  exceeded,  new  writes are halted until space frees up.
		   This	     parameter	     takes	 precedence	  over
		   zfs_dirty_data_max_percent.	 See the section "ZFS TRANSAC-
		   TION	DELAY".

		   Default   value:   10%   of	 physical   RAM,   capped   at
		   zfs_dirty_data_max_max.

       zfs_dirty_data_max_max (int)
		   Maximum allowable value of zfs_dirty_data_max, expressed in
		   bytes.  This	limit is only enforced at  module  load	 time,
		   and will be ignored if zfs_dirty_data_max is	later changed.
		   This	     parameter	     takes	 precedence	  over
		   zfs_dirty_data_max_max_percent. See the section "ZFS	TRANS-
		   ACTION DELAY".

		   Default value: 25% of physical RAM.

       zfs_dirty_data_max_max_percent (int)
		   Maximum allowable value of zfs_dirty_data_max, expressed as
		   a  percentage of physical RAM.  This	limit is only enforced
		   at	module	 load	time,	and   will   be	  ignored   if
		   zfs_dirty_data_max	is   later   changed.	The  parameter
		   zfs_dirty_data_max_max takes	precedence over	this one.  See
		   the section "ZFS TRANSACTION	DELAY".

		   Default value: 25%.

       zfs_dirty_data_max_percent (int)
		   Determines the dirty	space limit, expressed as a percentage
		   of all memory.  Once	this limit is exceeded,	new writes are
		   halted    until    space    frees	up.    The   parameter
		   zfs_dirty_data_max takes precedence over this one.  See the
		   section "ZFS	TRANSACTION DELAY".

		   Default value: 10%, subject to zfs_dirty_data_max_max.

       zfs_dirty_data_sync_percent (int)
		   Start  syncing  out a transaction group if there's at least
		   this	much dirty data	as a percentage	of zfs_dirty_data_max.
		   This	  should   be	less   than   zfs_vdev_async_write_ac-
		   tive_min_dirty_percent.

		   Default value: 20% of zfs_dirty_data_max.

       zfs_fallocate_reserve_percent (uint)
		   Since ZFS is	a  copy-on-write  filesystem  with  snapshots,
		   blocks  cannot be preallocated for a	file in	order to guar-
		   antee that later writes will	not run	 out  of  space.   In-
		   stead,  fallocate()	space  preallocation  only checks that
		   sufficient space is currently available in the pool or  the
		   user's  project quota allocation, and then creates a	sparse
		   file	of the requested size. The requested space  is	multi-
		   plied  by zfs_fallocate_reserve_percent to allow additional
		   space for indirect  blocks  and  other  internal  metadata.
                   Setting this value to 0 disables support for fallocate(2)
                   and causes fallocate() space preallocation to return
                   EOPNOTSUPP.

		   Default value: 110%
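
                   A minimal sketch of the space check implied above for an
                   assumed 1GB fallocate() request:

                       requested = 1 * 1024**3               # assumed 1GB request
                       zfs_fallocate_reserve_percent = 110   # default
                       required = requested * zfs_fallocate_reserve_percent // 100   # ~1.1GB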

       zfs_fletcher_4_impl (string)
		   Select a fletcher 4 implementation.

		   Supported  selectors	 are:  fastest,	 scalar,  sse2,	ssse3,
		   avx2, avx512f, avx512bw, and	aarch64_neon.  All of the  se-
		   lectors  except  fastest and	scalar require instruction set
		   extensions to be available and will only appear if ZFS  de-
		   tects  that they are	present	at runtime. If multiple	imple-
		   mentations of fletcher 4 are	available, the fastest will be
		   chosen using	a micro	benchmark. Selecting scalar results in
                   the original CPU-based calculation being used.  Selecting
		   any	option other than fastest and scalar results in	vector
		   instructions	from the respective CPU	instruction set	 being
		   used.

		   Default value: fastest.

       zfs_free_bpobj_enabled (int)
		   Enable/disable the processing of the	free_bpobj object.

		   Default value: 1.

       zfs_async_block_max_blocks (ulong)
		   Maximum number of blocks freed in a single txg.

		   Default value: ULONG_MAX (unlimited).

       zfs_max_async_dedup_frees (ulong)
		   Maximum number of dedup blocks freed	in a single txg.

		   Default value: 100,000.

       zfs_override_estimate_recordsize	(ulong)
		   Record size calculation override for	zfs send estimates.

		   Default value: 0.

       zfs_vdev_async_read_max_active (int)
		   Maximum  asynchronous read I/Os active to each device.  See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 3.

       zfs_vdev_async_read_min_active (int)
		   Minimum asynchronous	read I/Os active to each device.   See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 1.

       zfs_vdev_async_write_active_max_dirty_percent (int)
		   When	  the  pool  has  more	than  zfs_vdev_async_write_ac-
		   tive_max_dirty_percent	 dirty	      data,	   use
		   zfs_vdev_async_write_max_active   to	  limit	 active	 async
		   writes.  If the dirty data is between min and max, the  ac-
		   tive	 I/O  limit  is	linearly interpolated. See the section
		   "ZFS	I/O SCHEDULER".

		   Default value: 60%.

       zfs_vdev_async_write_active_min_dirty_percent (int)
		   When	 the  pool  has	 less  than   zfs_vdev_async_write_ac-
		   tive_min_dirty_percent	 dirty	      data,	   use
		   zfs_vdev_async_write_min_active  to	limit	active	 async
		   writes.   If	the dirty data is between min and max, the ac-
		   tive	I/O limit is linearly interpolated.  See  the  section
		   "ZFS	I/O SCHEDULER".

		   Default value: 30%.
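
                   A hedged sketch of the linear interpolation described in
                   this entry and the preceding one, using the default dirty
                   percentages and the default zfs_vdev_async_write_min_active
                   and zfs_vdev_async_write_max_active values:

                       min_pct, max_pct = 30, 60        # the two dirty-percent tunables
                       min_active, max_active = 2, 10   # async write min/max active defaults
                       def async_write_active(dirty_pct):
                           if dirty_pct <= min_pct:
                               return min_active
                           if dirty_pct >= max_pct:
                               return max_active
                           span = (dirty_pct - min_pct) / (max_pct - min_pct)
                           return int(min_active + span * (max_active - min_active))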

       zfs_vdev_async_write_max_active (int)
		   Maximum asynchronous	write I/Os active to each device.  See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 10.

       zfs_vdev_async_write_min_active (int)
		   Minimum asynchronous	write I/Os active to each device.  See
		   the section "ZFS I/O	SCHEDULER".

		   Lower  values  are  associated with better latency on rota-
		   tional media	but poorer resilver performance.  The  default
		   value  of  2	 was  chosen as	a compromise. A	value of 3 has
		   been	shown to improve resilver  performance	further	 at  a
		   cost	of further increasing latency.

		   Default value: 2.

       zfs_vdev_initializing_max_active	(int)
		   Maximum  initializing  I/Os active to each device.  See the
		   section "ZFS	I/O SCHEDULER".

		   Default value: 1.

       zfs_vdev_initializing_min_active	(int)
		   Minimum initializing	I/Os active to each device.   See  the
		   section "ZFS	I/O SCHEDULER".

		   Default value: 1.

       zfs_vdev_max_active (int)
		   The maximum number of I/Os active to	each device.  Ideally,
		   this	will be	>= the sum of each  queue's  max_active.   See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 1,000.

       zfs_vdev_rebuild_max_active (int)
		   Maximum  sequential	resilver  I/Os	active to each device.
		   See the section "ZFS	I/O SCHEDULER".

		   Default value: 3.

       zfs_vdev_rebuild_min_active (int)
		   Minimum sequential resilver I/Os  active  to	 each  device.
		   See the section "ZFS	I/O SCHEDULER".

		   Default value: 1.

       zfs_vdev_removal_max_active (int)
		   Maximum  removal  I/Os active to each device.  See the sec-
		   tion	"ZFS I/O SCHEDULER".

		   Default value: 2.

       zfs_vdev_removal_min_active (int)
		   Minimum removal I/Os	active to each device.	See  the  sec-
		   tion	"ZFS I/O SCHEDULER".

		   Default value: 1.

       zfs_vdev_scrub_max_active (int)
		   Maximum  scrub I/Os active to each device.  See the section
		   "ZFS	I/O SCHEDULER".

		   Default value: 2.

       zfs_vdev_scrub_min_active (int)
		   Minimum scrub I/Os active to	each device.  See the  section
		   "ZFS	I/O SCHEDULER".

		   Default value: 1.

       zfs_vdev_sync_read_max_active (int)
		   Maximum  synchronous	 read I/Os active to each device.  See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 10.

       zfs_vdev_sync_read_min_active (int)
		   Minimum synchronous read I/Os active	to each	 device.   See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 10.

       zfs_vdev_sync_write_max_active (int)
		   Maximum  synchronous	write I/Os active to each device.  See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 10.

       zfs_vdev_sync_write_min_active (int)
		   Minimum synchronous write I/Os active to each device.   See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 10.

       zfs_vdev_trim_max_active	(int)
		   Maximum  trim/discard  I/Os active to each device.  See the
		   section "ZFS	I/O SCHEDULER".

		   Default value: 2.

       zfs_vdev_trim_min_active	(int)
		   Minimum trim/discard	I/Os active to each device.   See  the
		   section "ZFS	I/O SCHEDULER".

		   Default value: 1.

       zfs_vdev_nia_delay (int)
		   For non-interactive I/O (scrub, resilver, removal, initial-
		   ize and rebuild), the number	of  concurrently-active	 I/O's
		   is  limited	to  *_min_active,  unless  the vdev is "idle".
		   When	there are no interactive I/Os active (sync or  async),
		   and	zfs_vdev_nia_delay  I/Os have completed	since the last
		   interactive I/O, then the vdev is considered	to be  "idle",
		   and the number of concurrently-active non-interactive I/O's
		   is increased	to *_max_active.  See  the  section  "ZFS  I/O
		   SCHEDULER".

		   Default value: 5.

       zfs_vdev_nia_credit (int)
		   Some	 HDDs  tend to prioritize sequential I/O so high, that
		   concurrent random I/O latency reaches several seconds.   On
		   some	 HDDs it happens even if sequential I/Os are submitted
		   one at a time, and so setting *_max_active to  1  does  not
                   help.  To prevent non-interactive I/Os, like scrub, from
                   monopolizing the device, no more than zfs_vdev_nia_credit
                   I/Os can be sent while there are outstanding incomplete in-
		   teractive I/Os.  This enforced wait ensures	the  HDD  ser-
		   vices  the  interactive  I/O	 within	a reasonable amount of
		   time.  See the section "ZFS I/O SCHEDULER".

		   Default value: 5.

       zfs_vdev_queue_depth_pct	(int)
		   Maximum number of queued allocations	per top-level vdev ex-
		   pressed  as a percentage of zfs_vdev_async_write_max_active
		   which allows	the system to detect devices that are more ca-
		   pable  of  handling allocations and to allocate more	blocks
		   to those devices.  It allows	for dynamic allocation distri-
		   bution  when	 devices are imbalanced	as fuller devices will
		   tend	to be slower than empty	devices.

		   See also zio_dva_throttle_enabled.

		   Default value: 1000%.

       zfs_expire_snapshot (int)
		   Seconds to expire .zfs/snapshot

		   Default value: 300.

       zfs_admin_snapshot (int)
		   Allow the creation, removal,	or renaming of entries in  the
		   .zfs/snapshot directory to cause the	creation, destruction,
		   or renaming of snapshots.  When enabled this	 functionality
		   works  both	locally	 and  over  NFS	exports	which have the
		   'no_root_squash' option set.	This functionality is disabled
		   by default.

		   Use 1 for yes and 0 for no (default).

       zfs_flags (int)
		   Set	additional debugging flags. The	following flags	may be
		   bitwise-or'd	together.

		   +-------------------------------------------------------------------------+
		   |Value   Symbolic Name						     |
		   |	    Description							     |
		   +-------------------------------------------------------------------------+
		   |	1   ZFS_DEBUG_DPRINTF						     |
		   |	    Enable dprintf entries in the debug	log.			     |
		   +-------------------------------------------------------------------------+
		   |	2   ZFS_DEBUG_DBUF_VERIFY *					     |
		   |	    Enable extra dbuf verifications.				     |
		   +-------------------------------------------------------------------------+
		   |	4   ZFS_DEBUG_DNODE_VERIFY *					     |
		   |	    Enable extra dnode verifications.				     |
		   +-------------------------------------------------------------------------+
		   |	8   ZFS_DEBUG_SNAPNAMES						     |
		   |	    Enable snapshot name verification.				     |
		   +-------------------------------------------------------------------------+
		   |   16   ZFS_DEBUG_MODIFY						     |
		   |	    Check for illegally	modified ARC buffers.			     |
		   +-------------------------------------------------------------------------+
		   |   64   ZFS_DEBUG_ZIO_FREE						     |
		   |	    Enable verification	of block frees.				     |
		   +-------------------------------------------------------------------------+
		   |  128   ZFS_DEBUG_HISTOGRAM_VERIFY					     |
		   |	    Enable extra spacemap histogram verifications.		     |
		   +-------------------------------------------------------------------------+
		   |  256   ZFS_DEBUG_METASLAB_VERIFY					     |
		   |	    Verify space accounting on disk matches in-core range_trees.     |
		   +-------------------------------------------------------------------------+
		   |  512   ZFS_DEBUG_SET_ERROR						     |
		   |	    Enable SET_ERROR and dprintf entries in the	debug log.	     |
		   +-------------------------------------------------------------------------+
		   | 1024   ZFS_DEBUG_INDIRECT_REMAP					     |
		   |	    Verify split blocks	created	by device removal.		     |
		   +-------------------------------------------------------------------------+
		   | 2048   ZFS_DEBUG_TRIM						     |
		   |	    Verify TRIM	ranges are always within the allocatable range tree. |
		   +-------------------------------------------------------------------------+
		   | 4096   ZFS_DEBUG_LOG_SPACEMAP					     |
		   |	    Verify that	the log	summary	is consistent with the spacemap	log  |
		   |	    and	enable zfs_dbgmsgs for metaslab	loading	and flushing.	     |
		   +-------------------------------------------------------------------------+
		   * Requires debug build.

		   Default value: 0.
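
                   For example, to enable both dprintf entries and SET_ERROR
                   logging, the two flag values from the table are combined
                   bitwise:

                       ZFS_DEBUG_DPRINTF = 1
                       ZFS_DEBUG_SET_ERROR = 512
                       zfs_flags = ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SET_ERROR   # 513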

       zfs_free_leak_on_eio (int)
		   If destroy encounters an EIO	while reading  metadata	 (e.g.
		   indirect  blocks), space referenced by the missing metadata
		   can not be freed.  Normally this causes the background  de-
		   stroy  to become "stalled", as it is	unable to make forward
		   progress.  While in this stalled state, all remaining space
		   to  free  from the error-encountering filesystem is "tempo-
		   rarily leaked".  Set	this flag to cause it  to  ignore  the
		   EIO,	 permanently  leak the space from indirect blocks that
		   can not be read, and	continue to free everything else  that
		   it can.

		   The	default,  "stalling" behavior is useful	if the storage
		   partially fails (i.e. some but not all i/os fail), and then
		   later  recovers.  In	this case, we will be able to continue
		   pool	operations while it is partially failed, and  when  it
		   recovers, we	can continue to	free the space,	with no	leaks.
		   However, note that this case	is actually fairly rare.

		   Typically pools either (a)  fail  completely	 (but  perhaps
		   temporarily,	 e.g.  a top-level vdev	going offline),	or (b)
                   have localized, permanent errors (e.g. a disk returns the
                   wrong data due to a bit flip or firmware bug).  In case (a),
		   this	setting	does not matter	because	the pool will be  sus-
		   pended and the sync thread will not be able to make forward
		   progress regardless.	 In case (b),  because	the  error  is
		   permanent, the best we can do is leak the minimum amount of
		   space, which	is what	setting	this flag will do.  Therefore,
		   it  is  reasonable for this flag to normally	be set,	but we
		   chose the more conservative approach	of not setting it,  so
		   that	 there is no possibility of leaking space in the "par-
		   tial	temporary" failure case.

		   Default value: 0.

       zfs_free_min_time_ms (int)
		   During a zfs	destroy	operation using	 feature@async_destroy
		   a  minimum of this much time	will be	spent working on free-
		   ing blocks per txg.

		   Default value: 1,000.

       zfs_obsolete_min_time_ms	(int)
		   Similar to zfs_free_min_time_ms but for cleanup of old  in-
		   direction records for removed vdevs.

		   Default value: 500.

       zfs_immediate_write_sz (long)
		   Largest  data  block	to write to zil. Larger	blocks will be
		   treated as if the dataset being written to had the property
		   setting logbias=throughput.

		   Default value: 32,768.

       zfs_initialize_value (ulong)
		   Pattern written to vdev free	space by zpool initialize.

                   Default value: 16,045,690,984,833,335,022
                   (0xdeadbeefdeadbeee).

       zfs_initialize_chunk_size (ulong)
		   Size	of writes used by zpool	initialize.   This  option  is
		   used	by the test suite to facilitate	testing.

		   Default value: 1,048,576

       zfs_livelist_max_entries	(ulong)
		   The threshold size (in block	pointers) at which we create a
		   new sub-livelist.  Larger sublists are more costly  from  a
		   memory  perspective	but  the fewer sublists	there are, the
		   lower the cost of insertion.

		   Default value: 500,000.

       zfs_livelist_min_percent_shared (int)
		   If the amount of shared space between a  snapshot  and  its
		   clone  drops	 below this threshold, the clone turns off the
		   livelist and	reverts	to the old deletion method. This is in
                   place because once a clone has been overwritten enough,
                   livelists no longer provide a benefit.

		   Default value: 75.

       zfs_livelist_condense_new_alloc (int)
		   Incremented each time an extra ALLOC	blkptr is added	 to  a
		   livelist entry while	it is being condensed.	This option is
		   used	by the test suite to track race	conditions.

		   Default value: 0.

       zfs_livelist_condense_sync_cancel (int)
		   Incremented each time livelist condensing is	canceled while
		   in  spa_livelist_condense_sync.  This option	is used	by the
		   test	suite to track race conditions.

		   Default value: 0.

       zfs_livelist_condense_sync_pause	(int)
		   When	set, the livelist condense process pauses indefinitely
		   before executing the	synctask - spa_livelist_condense_sync.
		   This	option is used by the test suite to trigger race  con-
		   ditions.

		   Default value: 0.

       zfs_livelist_condense_zthr_cancel (int)
		   Incremented each time livelist condensing is	canceled while
		   in spa_livelist_condense_cb.	 This option is	 used  by  the
		   test	suite to track race conditions.

		   Default value: 0.

       zfs_livelist_condense_zthr_pause	(int)
		   When	set, the livelist condense process pauses indefinitely
		   before  executing  the  open	 context  condensing  work  in
		   spa_livelist_condense_cb.   This option is used by the test
		   suite to trigger race conditions.

		   Default value: 0.

       zfs_lua_max_instrlimit (ulong)
		   The maximum execution time limit that can be	set for	a  ZFS
		   channel program, specified as a number of Lua instructions.

		   Default value: 100,000,000.

       zfs_lua_max_memlimit (ulong)
		   The	maximum	memory limit that can be set for a ZFS channel
		   program, specified in bytes.

		   Default value: 104,857,600.

       zfs_max_dataset_nesting (int)
		   The maximum depth of	nested datasets.  This	value  can  be
		   tuned  temporarily to fix existing datasets that exceed the
		   predefined limit.

		   Default value: 50.

       zfs_max_log_walking (ulong)
		   The number of past TXGs that	the flushing algorithm of  the
		   log spacemap	feature	uses to	estimate incoming log blocks.

		   Default value: 5.

       zfs_max_logsm_summary_length (ulong)
		   Maximum  number  of	rows  allowed  in  the	summary	of the
		   spacemap log.

		   Default value: 10.

       zfs_max_recordsize (int)
		   We currently	support	block sizes from 512  bytes  to	 16MB.
		   The benefits	of larger blocks, and thus larger I/O, need to
		   be weighed against the cost of COWing a giant block to mod-
		   ify	one byte.  Additionally, very large blocks can have an
		   impact on i/o latency, and also potentially on  the	memory
		   allocator.  Therefore, we do	not allow the recordsize to be
		   set larger than zfs_max_recordsize (default	1MB).	Larger
		   blocks  can	be created by changing this tunable, and pools
		   with	larger blocks can always be imported and used, regard-
		   less	of this	setting.

		   Default value: 1,048,576.

       zfs_allow_redacted_dataset_mount	(int)
		   Allow  datasets  received  with redacted send/receive to be
		   mounted. Normally disabled because these  datasets  may  be
		   missing key data.

		   Default value: 0.

       zfs_min_metaslabs_to_flush (ulong)
		   Minimum number of metaslabs to flush	per dirty TXG

		   Default value: 1.

       zfs_metaslab_fragmentation_threshold (int)
		   Allow metaslabs to keep their active	state as long as their
		   fragmentation percentage is less  than  or  equal  to  this
		   value.  An active metaslab that exceeds this	threshold will
		   no longer keep its active status allowing better  metaslabs
		   to be selected.

		   Default value: 70.

       zfs_mg_fragmentation_threshold (int)
		   Metaslab  groups are	considered eligible for	allocations if
		   their fragmentation metric (measured	as  a  percentage)  is
		   less	 than  or equal	to this	value. If a metaslab group ex-
		   ceeds this threshold	then it	will  be  skipped  unless  all
		   metaslab groups within the metaslab class have also crossed
		   this	threshold.

		   Default value: 95.

       zfs_mg_noalloc_threshold	(int)
		   Defines a threshold at which	metaslab groups	should be eli-
		   gible  for  allocations.   The value	is expressed as	a per-
		   centage of free space beyond	which a	metaslab group is  al-
		   ways	 eligible for allocations.  If a metaslab group's free
		   space is less than or equal to the threshold, the allocator
		   will	 avoid	allocating  to that group unless all groups in
		   the pool have reached the threshold.	 Once all groups  have
		   reached the threshold, all groups are allowed to accept al-
		   locations.  The default value of 0 disables the feature and
		   causes all metaslab groups to be eligible for allocations.

		   This	parameter allows one to	deal with pools	having heavily
		   imbalanced vdevs such as would be the case when a new  vdev
		   has	been  added.  Setting the threshold to a non-zero per-
		   centage will	stop allocations from being made to vdevs that
		   aren't  filled to the specified percentage and allow	lesser
		   filled vdevs	to acquire more	allocations than  they	other-
		   wise	would under the	old zfs_mg_alloc_failures facility.

		   Default value: 0.

       zfs_ddt_data_is_special (int)
		   If  enabled,	ZFS will place DDT data	into the special allo-
		   cation class.

		   Default value: 1.

       zfs_user_indirect_is_special (int)
		   If enabled, ZFS will	place user data	(both file  and	 zvol)
		   indirect blocks into	the special allocation class.

		   Default value: 1.

       zfs_multihost_history (int)
		   Historical statistics for the last N	multihost updates will
		   be available	in /proc/spl/kstat/zfs/<pool>/multihost

		   Default value: 0.

       zfs_multihost_interval (ulong)
		   Used	to control the frequency of multihost writes which are
		   performed  when the multihost pool property is on.  This is
		   one factor used to determine	the  length  of	 the  activity
		   check during	import.

		   The	multihost  write  period  is  zfs_multihost_interval /
		   leaf-vdevs milliseconds.  On	average	a multihost write will
		   be  issued  for each	leaf vdev every	zfs_multihost_interval
		   milliseconds.  In practice, the observed  period  can  vary
		   with	 the  I/O  load	 and  this observed value is the delay
		   which is stored in the uberblock.

		   Default value: 1000.
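
                   As a rough illustration (Python, not part of ZFS; the pool
                   width is an assumed example), the expected write timing
                   can be computed as follows.

                       zfs_multihost_interval = 1000   # ms (default)
                       leaf_vdevs = 8                  # assumed example pool

                       period_ms = zfs_multihost_interval / leaf_vdevs
                       print(f"one mmp write every {period_ms} ms")
                       print(f"each leaf written every "
                             f"{zfs_multihost_interval} ms on average")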

       zfs_multihost_import_intervals (uint)
		   Used	to control the duration	of the activity	 test  on  im-
		   port.   Smaller  values  of	zfs_multihost_import_intervals
		   will	reduce the import time but increase the	risk of	 fail-
		   ing	to  detect  an	active pool.  The total	activity check
		   time	is never allowed to drop below one second.

		   On import the activity check	waits a	minimum	amount of time
		   determined  by  zfs_multihost_interval  * zfs_multihost_im-
		   port_intervals, or the same product computed	 on  the  host
		   which  last	had  the pool imported (whichever is greater).
		   The activity	check time may	be  further  extended  if  the
		   value  of  mmp  delay found in the best uberblock indicates
		   actual multihost updates happened at	longer intervals  than
		   zfs_multihost_interval.   A	minimum	 value of 100ms	is en-
		   forced.

		   A value of 0	is ignored and treated as if it	was set	to 1.

		   Default value: 20.
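
                   The sketch below (Python, illustrative only; the remote
                   interval is an assumed example) shows how the minimum
                   activity-check wait described above is derived.

                       local  = 1000 * 20    # zfs_multihost_interval *
                                             # zfs_multihost_import_intervals
                       remote = 1500 * 20    # same product from the host
                                             # which last imported the pool
                       wait_ms = max(local, remote)
                       print(f"activity check waits at least {wait_ms} ms")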

       zfs_multihost_fail_intervals (uint)
		   Controls the	behavior of  the  pool	when  multihost	 write
		   failures or delays are detected.

		   When	 zfs_multihost_fail_intervals  =  0,  multihost	 write
		   failures or delays are ignored.  The	failures will still be
		   reported  to	 the  ZED which	depending on its configuration
		   may take action such	as suspending the pool or offlining  a
		   device.

		   When	 zfs_multihost_fail_intervals  >  0,  the pool will be
		   suspended  if  zfs_multihost_fail_intervals	*   zfs_multi-
		   host_interval  milliseconds	pass  without a	successful mmp
		   write.  This	guarantees the	activity  test	will  see  mmp
		   writes  if  the  pool is imported.  A value of 1 is ignored
		   and treated as if it	was set	to 2.  This  is	 necessary  to
		   prevent  the	pool from being	suspended due to normal, small
		   I/O latency variations.

		   Default value: 10.
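
                   A minimal sketch (Python, illustrative only) of the
                   suspension threshold described above:

                       zfs_multihost_fail_intervals = 10    # default
                       zfs_multihost_interval = 1000        # ms, default

                       if zfs_multihost_fail_intervals == 0:
                           print("mmp write failures ignored")
                       else:
                           limit = (zfs_multihost_fail_intervals *
                                    zfs_multihost_interval)
                           print(f"suspend after {limit} ms without a "
                                 "successful mmp write")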

       zfs_no_scrub_io (int)
		   Set for no scrub I/O. This results in scrubs	 not  actually
		   scrubbing  data  and	 simply	 doing a metadata crawl	of the
		   pool	instead.

		   Use 1 for yes and 0 for no (default).

       zfs_no_scrub_prefetch (int)
		   Set to disable block	prefetching for	scrubs.

		   Use 1 for yes and 0 for no (default).

       zfs_nocacheflush	(int)
		   Disable cache flush operations on disks when	writing.  Set-
		   ting	 this  will  cause  pool corruption on power loss if a
		   volatile out-of-order write cache is	enabled.

		   Use 1 for yes and 0 for no (default).

       zfs_nopwrite_enabled (int)
		   Enable NOP writes

		   Use 1 for yes (default) and 0 to disable.

       zfs_dmu_offset_next_sync	(int)
                   Enable forcing a txg sync to find holes.  When enabled, ZFS
                   acts like prior versions when SEEK_HOLE or SEEK_DATA flags
                   are used: if the dnode is dirty, txgs are synced so that
                   the data can be found.

		   Use 1 for yes and 0 to disable (default).

       zfs_pd_bytes_max	(int)
                   The number of bytes which should be prefetched during a
                   pool traversal (e.g. zfs send or other data crawling opera-
                   tions).

		   Default value: 52,428,800.

       zfs_per_txg_dirty_frees_percent	(ulong)
		   Tunable  to	control	 percentage of dirtied indirect	blocks
		   from	frees allowed into one TXG. After  this	 threshold  is
		   crossed,  additional	frees will wait	until the next TXG.  A
		   value of zero will disable this throttle.

		   Default value: 5, set to 0 to disable.

       zfs_prefetch_disable (int)
		   This	tunable	disables predictive prefetch.	Note  that  it
		   leaves  "prescient"	prefetch  (e.g.	prefetch for zfs send)
		   intact.  Unlike  predictive	prefetch,  prescient  prefetch
		   never issues	i/os that end up not being needed, so it can't
		   hurt	performance.

		   Use 1 for yes and 0 for no (default).

       zfs_qat_checksum_disable	(int)
		   This	tunable	disables qat hardware acceleration for	sha256
		   checksums.  It  may	be set after the zfs modules have been
		   loaded to initialize	the qat	hardware as long as support is
		   compiled in and the qat driver is present.

		   Use 1 for yes and 0 for no (default).

       zfs_qat_compress_disable	(int)
		   This	 tunable  disables  qat	hardware acceleration for gzip
		   compression.	It may be set after the	zfs modules have  been
		   loaded to initialize	the qat	hardware as long as support is
		   compiled in and the qat driver is present.

		   Use 1 for yes and 0 for no (default).

       zfs_qat_encrypt_disable (int)
		   This	tunable	disables qat hardware acceleration for AES-GCM
		   encryption.	It  may	be set after the zfs modules have been
		   loaded to initialize	the qat	hardware as long as support is
		   compiled in and the qat driver is present.

		   Use 1 for yes and 0 for no (default).

       zfs_read_chunk_size (long)
		   Bytes to read per chunk

		   Default value: 1,048,576.

       zfs_read_history	(int)
		   Historical  statistics  for the last	N reads	will be	avail-
		   able	in /proc/spl/kstat/zfs/<pool>/reads

		   Default value: 0 (no	data is	kept).

       zfs_read_history_hits (int)
		   Include cache hits in read history

		   Use 1 for yes and 0 for no (default).

       zfs_rebuild_max_segment (ulong)
		   Maximum read	segment	size to	issue when sequentially	resil-
		   vering a top-level vdev.

		   Default value: 1,048,576.

       zfs_rebuild_scrub_enabled (int)
		   Automatically  start	 a pool	scrub when the last active se-
		   quential resilver completes in order	to verify  the	check-
		   sums	 of all	blocks which have been resilvered. This	option
		   is enabled by default and is	strongly recommended.

		   Default value: 1.

       zfs_rebuild_vdev_limit (ulong)
		   Maximum amount of i/o that can be concurrently issued for a
		   sequential resilver per leaf	device,	given in bytes.

		   Default value: 33,554,432.

       zfs_reconstruct_indirect_combinations_max (int)
		   If  an  indirect  split  block contains more	than this many
		   possible unique combinations	when being reconstructed, con-
		   sider  it  too computationally expensive to check them all.
		   Instead,  try  at  most   zfs_reconstruct_indirect_combina-
		   tions_max  randomly-selected	 combinations  each  time  the
		   block is accessed.  This allows all segment copies to  par-
		   ticipate fairly in the reconstruction when all combinations
		   cannot be checked and prevents  repeated  use  of  one  bad
		   copy.

		   Default value: 4096.

       zfs_recover (int)
		   Set	to  attempt  to	recover	from fatal errors. This	should
		   only	be used	as a last resort, as it	typically  results  in
		   leaked space, or worse.

		   Use 1 for yes and 0 for no (default).

       zfs_removal_ignore_errors (int)

                   Ignore hard IO errors during device removal.  When set, if
                   a device encounters a hard IO error during the removal
                   process, the removal will not be cancelled.  This can result
		   in a	normally recoverable block becoming  permanently  dam-
		   aged	and is not recommended.	 This should only be used as a
		   last	resort when the	pool cannot be returned	to  a  healthy
		   state prior to removing the device.

		   Default value: 0.

       zfs_removal_suspend_progress (int)

		   This	 is  used by the test suite so that it can ensure that
		   certain actions happen while	in the middle of a removal.

		   Default value: 0.

       zfs_remove_max_segment (int)

		   The largest contiguous segment that we will attempt to  al-
		   locate  when	removing a device.  This can be	no larger than
		   16MB.  If there is a	performance problem with attempting to
		   allocate large blocks, consider decreasing this.

		   Default value: 16,777,216 (16MB).

       zfs_resilver_disable_defer (int)
		   Disables  the  resilver_defer feature, causing an operation
		   that	would start a resilver to restart one in progress  im-
		   mediately.

		   Default value: 0 (feature enabled).

       zfs_resilver_min_time_ms	(int)
		   Resilvers are processed by the sync thread. While resilver-
		   ing it will spend at	least this much	time working on	a  re-
		   silver between txg flushes.

		   Default value: 3,000.

       zfs_scan_ignore_errors (int)
		   If set to a nonzero value, remove the DTL (dirty time list)
		   upon	completion of a	pool scan (scrub) even if  there  were
		   unrepairable	errors.	 It is intended	to be used during pool
		   repair or recovery to stop resilvering  when	 the  pool  is
		   next	imported.

		   Default value: 0.

       zfs_scrub_min_time_ms (int)
		   Scrubs are processed	by the sync thread. While scrubbing it
		   will	spend at least this much time working on a  scrub  be-
		   tween txg flushes.

		   Default value: 1,000.

       zfs_scan_checkpoint_intval (int)
		   To preserve progress	across reboots the sequential scan al-
                   gorithm periodically needs to stop metadata scanning and
                   issue all the verification I/Os to disk.  The frequency of
		   this	flushing is determined by the zfs_scan_checkpoint_int-
		   val tunable.

		   Default value: 7200 seconds (every 2	hours).

       zfs_scan_fill_weight (int)
		   This	 tunable  affects  how scrub and resilver I/O segments
		   are ordered.	A higher number	indicates that	we  care  more
		   about  how filled in	a segment is, while a lower number in-
		   dicates we care more	about the size of the  extent  without
		   considering	the  gaps within a segment. This value is only
                   tunable upon module insertion. Changing the value after-
                   wards will have no effect on scrub or resilver performance.

		   Default value: 3.

       zfs_scan_issue_strategy (int)
		   Determines  the  order  that	 data  will  be	verified while
		   scrubbing or	resilvering.  If set to	1, data	will be	 veri-
		   fied	 as sequentially as possible, given the	amount of mem-
		   ory reserved	 for  scrubbing	 (see  zfs_scan_mem_lim_fact).
		   This	 may  improve  scrub performance if the	pool's data is
		   very	fragmented. If set to 2, the largest mostly-contiguous
		   chunk  of  found  data will be verified first. By deferring
		   scrubbing of	small segments,	we  may	 later	find  adjacent
		   data	 to  coalesce and increase the segment size. If	set to
		   0, zfs will use strategy 1 during normal  verification  and
		   strategy 2 while taking a checkpoint.

		   Default value: 0.

       zfs_scan_legacy (int)
		   A  value  of	 0  indicates  that  scrubs and	resilvers will
		   gather metadata in memory before issuing sequential I/O.  A
		   value of 1 indicates	that the legacy	algorithm will be used
		   where I/O is	initiated as soon as it	is discovered.	Chang-
		   ing	this  value  to	 0 will	not affect scrubs or resilvers
		   that	are already in progress.

		   Default value: 0.

       zfs_scan_max_ext_gap (int)
		   Indicates the largest gap in	bytes between scrub / resilver
		   I/Os	 that  will still be considered	sequential for sorting
		   purposes. Changing this value will not affect scrubs	or re-
		   silvers that	are already in progress.

		   Default value: 2097152 (2 MB).

       zfs_scan_mem_lim_fact (int)
		   Maximum  fraction of	RAM used for I/O sorting by sequential
		   scan	algorithm.  This tunable determines the	hard limit for
		   I/O	sorting	 memory	usage.	When the hard limit is reached
		   we stop scanning metadata and start issuing data  verifica-
		   tion	I/O. This is done until	we get below the soft limit.

		   Default value: 20 which is 5% of RAM	(1/20).

       zfs_scan_mem_lim_soft_fact (int)
                   The fraction of the hard limit used to determine the soft
		   limit for I/O sorting by  the  sequential  scan  algorithm.
		   When	 we  cross  this  limit	from below no action is	taken.
		   When	we cross this limit from above it is  because  we  are
		   issuing verification	I/O. In	this case (unless the metadata
		   scan	is done) we stop issuing verification  I/O  and	 start
		   scanning metadata again until we get	to the hard limit.

		   Default value: 20 which is 5% of the	hard limit (1/20).
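
                   For illustration (Python, not part of ZFS; the RAM size is
                   an assumed example), the hard and soft limits derived from
                   these two tunables can be computed as follows.

                       ram_bytes = 64 * 2**30           # assumed 64 GiB
                       zfs_scan_mem_lim_fact = 20       # default
                       zfs_scan_mem_lim_soft_fact = 20  # default

                       hard = ram_bytes // zfs_scan_mem_lim_fact
                       soft = hard // zfs_scan_mem_lim_soft_fact
                       print(f"hard {hard >> 20} MiB, soft {soft >> 20} MiB")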

       zfs_scan_strict_mem_lim (int)
		   Enforces  tight  memory limits on pool scans	when a sequen-
		   tial	scan is	in progress. When disabled  the	 memory	 limit
		   may be exceeded by fast disks.

		   Default value: 0.

       zfs_scan_suspend_progress (int)
		   Freezes a scrub/resilver in progress	without	actually paus-
		   ing it. Intended for	testing/debugging.

		   Default value: 0.

       zfs_scan_vdev_limit (int)
                   Maximum amount of data that can be concurrently issued for
                   scrubs and resilvers per leaf device, given in bytes.

		   Default value: 41943040.

       zfs_send_corrupt_data (int)
		   Allow sending of corrupt data (ignore read/checksum	errors
		   when	sending	data)

		   Use 1 for yes and 0 for no (default).

       zfs_send_unmodified_spill_blocks	(int)
		   Include  unmodified	spill blocks in	the send stream. Under
		   certain circumstances previous versions of ZFS could	incor-
		   rectly remove the spill block from an existing object.  In-
		   cluding unmodified copies of	the  spill  blocks  creates  a
		   backwards  compatible  stream  which	 will recreate a spill
		   block if it was incorrectly removed.

		   Use 1 for yes (default) and 0 for no.

       zfs_send_no_prefetch_queue_ff (int)
		   The fill fraction of	the zfs	send internal queues. The fill
		   fraction  controls  the  timing with	which internal threads
		   are woken up.

		   Default value: 20.

       zfs_send_no_prefetch_queue_length (int)
		   The maximum number of bytes allowed in zfs send's  internal
		   queues.

		   Default value: 1,048,576.

       zfs_send_queue_ff (int)
		   The	fill fraction of the zfs send prefetch queue. The fill
		   fraction controls the timing	with  which  internal  threads
		   are woken up.

		   Default value: 20.

       zfs_send_queue_length (int)
                   The maximum number of bytes that will be prefetched
		   by zfs send.	 This value must be at least twice the maximum
		   block size in use.

		   Default value: 16,777,216.

       zfs_recv_queue_ff (int)
		   The	fill fraction of the zfs receive queue.	The fill frac-
		   tion	controls the timing with which	internal  threads  are
		   woken up.

		   Default value: 20.

       zfs_recv_queue_length (int)
		   The	maximum	 number	 of  bytes  allowed in the zfs receive
		   queue. This value must be at	least twice the	maximum	 block
		   size	in use.

		   Default value: 16,777,216.

       zfs_recv_write_batch_size (int)
		   The maximum amount of data (in bytes) that zfs receive will
		   write in one	DMU transaction.   This	 is  the  uncompressed
		   size,  even	when receiving a compressed send stream.  This
                   setting will not reduce the write size below a single
                   block.  It is capped at a maximum of 32MB.

		   Default value: 1MB.

       zfs_override_estimate_recordsize	(ulong)
		   Setting this	variable overrides the default logic for esti-
		   mating block	sizes when  doing  a  zfs  send.  The  default
		   heuristic  is  that the average block size will be the cur-
		   rent	recordsize. Override this value	if most	data  in  your
		   dataset  is	not  of	that size and you require accurate zfs
		   send	size estimates.

		   Default value: 0.

       zfs_sync_pass_deferred_free (int)
		   Flushing of data to disk is done  in	 passes.  Defer	 frees
		   starting in this pass

		   Default value: 2.

       zfs_spa_discard_memory_limit (int)
		   Maximum  memory  used  for prefetching a checkpoint's space
		   map on each vdev while discarding the checkpoint.

		   Default value: 16,777,216.

       zfs_special_class_metadata_reserve_pct (int)
		   Only	allow small data blocks	to be allocated	on the special
		   and dedup vdev types	when the available free	space percent-
		   age on these	vdevs exceeds this  value.  This  ensures  re-
                   served space is available for pool metadata as the special
		   vdevs approach capacity.

		   Default value: 25.

       zfs_sync_pass_dont_compress (int)
		   Starting in this sync pass, we disable compression (includ-
		   ing	of  metadata).	With the default setting, in practice,
		   we don't have this many sync	passes,	so this	has no effect.

		   The original	intent was that	 disabling  compression	 would
		   help	the sync passes	to converge. However, in practice dis-
		   abling compression increases	the  average  number  of  sync
                   passes, because when we turn compression off, the size of
                   many blocks will change and thus we have to re-allocate
		   (not	overwrite) them. It also increases the number of 128KB
		   allocations (e.g. for indirect blocks  and  spacemaps)  be-
		   cause  these	 will  not be compressed. The 128K allocations
		   are especially detrimental to performance on	 highly	 frag-
		   mented  systems,  which  may	have very few free segments of
		   this	size, and may need to load new	metaslabs  to  satisfy
		   128K	allocations.

		   Default value: 8.

       zfs_sync_pass_rewrite (int)
		   Rewrite new block pointers starting in this pass

		   Default value: 2.

       zfs_sync_taskq_batch_pct	(int)
		   This	  controls   the   number   of	threads	 used  by  the
		   dp_sync_taskq.  The default value of	75% will create	a max-
		   imum	of one thread per cpu.

		   Default value: 75%.

       zfs_trim_extent_bytes_max (uint)
		   Maximum size	of TRIM	command.  Ranges larger	than this will
                   be split into chunks no larger than zfs_trim_ex-
		   tent_bytes_max bytes	before being issued to the device.

		   Default value: 134,217,728.

       zfs_trim_extent_bytes_min (uint)
		   Minimum  size  of  TRIM commands.  TRIM ranges smaller than
		   this	will be	skipped	unless they're part of a larger	 range
                   which was broken into chunks.  This is done because it's
		   common for these small TRIMs	to negatively  impact  overall
		   performance.	  This value can be set	to 0 to	TRIM all unal-
		   located space.

		   Default value: 32,768.

       zfs_trim_metaslab_skip (uint)
		   Skip	uninitialized metaslabs	during the TRIM	process.  This
		   option  is  useful for pools	constructed from large thinly-
                   provisioned devices where TRIM operations are slow.  As a
                   pool ages, an increasing fraction of the pool's metaslabs
                   will be initialized, progressively degrading the usefulness
                   of this option.  This setting is stored when starting a
		   manual TRIM and will	persist	for the	duration  of  the  re-
		   quested TRIM.

		   Default value: 0.

       zfs_trim_queue_limit (uint)
		   Maximum  number  of queued TRIMs outstanding	per leaf vdev.
		   The number of concurrent TRIM commands issued to the	device
		   is	controlled   by	  the	zfs_vdev_trim_min_active   and
		   zfs_vdev_trim_max_active module options.

		   Default value: 10.

       zfs_trim_txg_batch (uint)
		   The number of  transaction  groups  worth  of  frees	 which
		   should  be  aggregated before TRIM operations are issued to
		   the device.	This setting represents	 a  trade-off  between
		   issuing  larger, more efficient TRIM	operations and the de-
		   lay before the recently trimmed space is available for  use
		   by the device.

		   Increasing this value will allow frees to be	aggregated for
                   a longer time.  This will result in larger TRIM operations
		   and	potentially  increased	memory usage.  Decreasing this
		   value will have the opposite	effect.	 The default value  of
		   32 was determined to	be a reasonable	compromise.

		   Default value: 32.

       zfs_txg_history (int)
		   Historical statistics for the last N	txgs will be available
		   in /proc/spl/kstat/zfs/<pool>/txgs

		   Default value: 0.

       zfs_txg_timeout (int)
		   Flush dirty data to disk at least every N seconds  (maximum
		   txg duration)

		   Default value: 5.

       zfs_vdev_aggregate_trim (int)
		   Allow  TRIM	I/Os  to  be aggregated.  This is normally not
                   helpful because the extents to be trimmed will already have
                   been aggregated by the metaslab.  This option is
		   provided for	debugging and performance analysis.

		   Default value: 0.

       zfs_vdev_aggregation_limit (int)
		   Max vdev I/O	aggregation size

		   Default value: 1,048,576.

       zfs_vdev_aggregation_limit_non_rotating (int)
		   Max vdev I/O	aggregation size for non-rotating media

		   Default value: 131,072.

       zfs_vdev_cache_bshift (int)
                   Shift size to inflate reads to.

		   Default value: 16 (effectively 65536).

       zfs_vdev_cache_max (int)
		   Inflate  reads  smaller  than  this	value  to   meet   the
		   zfs_vdev_cache_bshift size (default 64k).

		   Default value: 16384.

       zfs_vdev_cache_size (int)
		   Total size of the per-disk cache in bytes.

		   Currently  this feature is disabled as it has been found to
		   not be helpful for performance and in some cases harmful.

		   Default value: 0.

       zfs_vdev_mirror_rotating_inc (int)
                   A number by which the balancing algorithm increments the
                   load calculation, for the purpose of selecting the least
                   busy mirror member, when an I/O immediately follows its
                   predecessor on rotational vdevs.

		   Default value: 0.

       zfs_vdev_mirror_rotating_seek_inc (int)
		   A number by which the balancing  algorithm  increments  the
		   load	 calculation  for  the	purpose	of selecting the least
                   busy mirror member when an I/O lacks locality as defined by
                   zfs_vdev_mirror_rotating_seek_offset.  I/Os within this
                   distance that do not immediately follow the previous I/O
                   have their load incremented by half this value.

		   Default value: 5.

       zfs_vdev_mirror_rotating_seek_offset (int)
		   The	maximum	 distance for the last queued I/O in which the
		   balancing algorithm considers an I/O	to have	locality.  See
		   the section "ZFS I/O	SCHEDULER".

		   Default value: 1048576.

       zfs_vdev_mirror_non_rotating_inc	(int)
		   A  number  by  which	the balancing algorithm	increments the
		   load	calculation for	the purpose  of	 selecting  the	 least
		   busy	mirror member on non-rotational	vdevs when I/Os	do not
		   immediately follow one another.

		   Default value: 0.

       zfs_vdev_mirror_non_rotating_seek_inc (int)
		   A number by which the balancing  algorithm  increments  the
		   load	 calculation  for  the	purpose	of selecting the least
                   busy mirror member when an I/O lacks locality as defined by
                   zfs_vdev_mirror_rotating_seek_offset.  I/Os within this
                   distance that do not immediately follow the previous I/O
                   have their load incremented by half this value.

		   Default value: 1.

       zfs_vdev_read_gap_limit (int)
		   Aggregate  read  I/O	 operations if the gap on-disk between
		   them	is within this threshold.

		   Default value: 32,768.

       zfs_vdev_write_gap_limit	(int)
                   Aggregate write I/O operations if the gap on-disk between
                   them is within this threshold.

		   Default value: 4,096.

       zfs_vdev_raidz_impl (string)
		   Parameter for selecting raidz parity	implementation to use.

		   Options marked (always) below may  be  selected  on	module
		   load	 as  they are supported	on all systems.	 The remaining
		   options may only be set after the module is loaded, as they
		   are	available  only	if the implementations are compiled in
		   and supported on the	running	system.

		   Once	 the  module  is  loaded,  the	content	 of  /sys/mod-
		   ule/zfs/parameters/zfs_vdev_raidz_impl  will	show available
		   options with	the currently selected	one  enclosed  in  [].
		   Possible options are:
                     fastest         - (always) implementation selected using
                                       built-in benchmark
                     original        - (always) original raidz implementation
                     scalar          - (always) scalar raidz implementation
                     sse2            - implementation using SSE2 instruction
                                       set (64-bit x86 only)
                     ssse3           - implementation using SSSE3 instruction
                                       set (64-bit x86 only)
                     avx2            - implementation using AVX2 instruction
                                       set (64-bit x86 only)
                     avx512f         - implementation using AVX512F
                                       instruction set (64-bit x86 only)
                     avx512bw        - implementation using AVX512F & AVX512BW
                                       instruction sets (64-bit x86 only)
                     aarch64_neon    - implementation using NEON (AArch64,
                                       64-bit ARMv8 only)
                     aarch64_neonx2  - implementation using NEON with more
                                       unrolling (AArch64, 64-bit ARMv8 only)
                     powerpc_altivec - implementation using Altivec (PowerPC
                                       only)

		   Default value: fastest.
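
                   As an illustration only (Python, not part of ZFS), the
                   sketch below parses the sysfs file described above to
                   report the selected implementation.

                       path = ("/sys/module/zfs/parameters/"
                               "zfs_vdev_raidz_impl")
                       with open(path) as f:
                           options = f.read().split()

                       # The selected option is enclosed in [].
                       chosen = [o.strip("[]") for o in options
                                 if o.startswith("[")]
                       print("available:",
                             " ".join(o.strip("[]") for o in options))
                       print("selected:",
                             chosen[0] if chosen else "unknown")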

       zfs_vdev_scheduler (charp)
		   DEPRECATED: This option exists for compatibility with older
		   user	configurations.	It does	nothing	except print a warning
		   to the kernel log if	set.

       zfs_zevent_cols (int)
                   When zevents are logged to the console, use this as the
                   word-wrap width.

		   Default value: 80.

       zfs_zevent_console (int)
		   Log events to the console

		   Use 1 for yes and 0 for no (default).

       zfs_zevent_len_max (int)
		   Max	event queue length. A value of 0 will result in	a cal-
		   culated value which increases with the number  of  CPUs  in
		   the	system (minimum	64 events). Events in the queue	can be
		   viewed with the zpool events	command.

		   Default value: 0.

       zfs_zevent_retain_max (int)
		   Maximum recent  zevent  records  to	retain	for  duplicate
		   checking.   Setting	this  value to zero disables duplicate
		   detection.

		   Default value: 2000.

       zfs_zevent_retain_expire_secs (int)
		   Lifespan for	a recent ereport that was retained for	dupli-
		   cate	checking.

		   Default value: 900.

       zfs_zil_clean_taskq_maxalloc (int)
		   The	maximum	number of taskq	entries	that are allowed to be
		   cached.  When this limit is	exceeded  transaction  records
		   (itxs) will be cleaned synchronously.

		   Default value: 1048576.

       zfs_zil_clean_taskq_minalloc (int)
		   The number of taskq entries that are	pre-populated when the
		   taskq is first created and are  immediately	available  for
		   use.

		   Default value: 1024.

       zfs_zil_clean_taskq_nthr_pct (int)
		   This	  controls   the   number   of	threads	 used  by  the
		   dp_zil_clean_taskq.	The default value of 100% will	create
		   a maximum of	one thread per cpu.

		   Default value: 100%.

       zil_maxblocksize	(int)
		   This	 sets the maximum block	size used by the ZIL.  On very
		   fragmented pools, lowering this (typically to 36KB) can im-
		   prove performance.

		   Default value: 131072 (128KB).

       zil_nocacheflush	(int)
		   Disable  the	cache flush commands that are normally sent to
		   the disk(s) by the ZIL after	an LWB	write  has  completed.
		   Setting  this  will cause ZIL corruption on power loss if a
		   volatile out-of-order write cache is	enabled.

		   Use 1 for yes and 0 for no (default).

       zil_replay_disable (int)
                   Disable intent logging replay.  This can be useful when re-
                   covering from a corrupted ZIL.

		   Use 1 for yes and 0 for no (default).

       zil_slog_bulk (ulong)
		   Limit  SLOG write size per commit executed with synchronous
		   priority.  Any writes above	that  will  be	executed  with
                   lower (asynchronous) priority to limit potential SLOG de-
                   vice abuse by a single active ZIL writer.

		   Default value: 786,432.

       zfs_embedded_slog_min_ms	(int)
		   Usually, one	metaslab from each (normal-class) vdev is ded-
		   icated  for	use  by	 the  ZIL (to log synchronous writes).
		   However, if there are fewer	than  zfs_embedded_slog_min_ms
		   metaslabs  in  the  vdev,  this  functionality is disabled.
		   This	ensures	that we	don't set aside	an unreasonable	amount
		   of space for	the ZIL.

		   Default value: 64.

       zio_deadman_log_all (int)
		   If  non-zero,  the  zio deadman will	produce	debugging mes-
		   sages (see zfs_dbgmsg_enable) for  all  zios,  rather  than
		   only	 for  leaf zios	possessing a vdev. This	is meant to be
		   used	by developers to gain diagnostic information for  hang
		   conditions  which  don't  involve  a	mutex or other locking
		   primitive; typically	conditions in which a  thread  in  the
		   zio pipeline	is looping indefinitely.

		   Default value: 0.

       zio_decompress_fail_fraction (int)
		   If  non-zero,  this value represents	the denominator	of the
		   probability that zfs	should induce a	decompression failure.
		   For	instance,  for	a  5% decompression failure rate, this
		   value should	be set to 20.

		   Default value: 0.
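
                   A one-line sketch (Python, illustrative only) of the
                   relationship between the desired failure rate and this
                   value:

                       rate = 0.05                       # 5%, as above
                       zio_decompress_fail_fraction = round(1 / rate)  # 20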

       zio_slow_io_ms (int)
                   When an I/O operation takes more than zio_slow_io_ms mil-
                   liseconds to complete, it is marked as a slow I/O.  Each
                   slow I/O causes a delay zevent.  Slow I/O counters can be
                   seen with "zpool status -s".

		   Default value: 30,000.

       zio_dva_throttle_enabled	(int)
		   Throttle block allocations in the I/O pipeline. This	allows
		   for dynamic allocation distribution when devices are	imbal-
		   anced.  When	enabled, the maximum number of pending alloca-
		   tions    per	   top-level	vdev	 is	limited	    by
		   zfs_vdev_queue_depth_pct.

		   Default value: 1.

       zio_requeue_io_start_cut_in_line	(int)
		   Prioritize requeued I/O

		   Default value: 0.

       zio_taskq_batch_pct (uint)
                   Percentage of online CPUs (or CPU cores, etc.) which will
                   run a worker thread for I/O.  These workers are responsible
                   for I/O work such as compression and checksum calculations.
                   A fractional number of CPUs will be rounded down.

		   The default value of	75 was chosen to avoid using all  CPUs
		   which  can result in	latency	issues and inconsistent	appli-
		   cation performance, especially when high compression	is en-
		   abled.

		   Default value: 75.

       zvol_inhibit_dev	(uint)
		   Do  not create zvol device nodes. This may slightly improve
		   startup time	on systems with	a very large number of zvols.

		   Use 1 for yes and 0 for no (default).

       zvol_major (uint)
		   Major number	for zvol block devices

		   Default value: 230.

       zvol_max_discard_blocks (ulong)
		   Discard (aka	TRIM) operations done on zvols will be done in
		   batches of this many	blocks,	where block size is determined
		   by the volblocksize property	of a zvol.

		   Default value: 16,384.

       zvol_prefetch_bytes (uint)
                   When adding a zvol to the system, prefetch
                   zvol_prefetch_bytes from the start and end of the volume.
		   Prefetching these regions of	the volume  is	desirable  be-
		   cause  they	are  likely  to	 be  accessed  immediately  by
		   blkid(8) or by the kernel scanning for a partition table.

		   Default value: 131,072.

       zvol_request_sync (uint)
		   When	processing I/O requests	for a zvol  submit  them  syn-
		   chronously.	 This  effectively limits the queue depth to 1
		   for each I/O	submitter.  When set to	0 requests are handled
		   asynchronously  by  a  thread pool.	The number of requests
                   which can be handled concurrently is controlled by
		   zvol_threads.

		   Default value: 0.

       zvol_threads (uint)
		   Max	number	of  threads which can handle zvol I/O requests
		   concurrently.

		   Default value: 32.

       zvol_volmode (uint)
                   Defines zvol block device behaviour when volmode is set to
		   default.  Valid values are 1	(full),	2 (dev)	and 3 (none).

		   Default value: 1.

ZFS I/O	SCHEDULER
       ZFS  issues  I/O	operations to leaf vdevs to satisfy and	complete I/Os.
       The I/O scheduler determines when and in	what  order  those  operations
       are issued.  The	I/O scheduler divides operations into five I/O classes
       prioritized in the following order: sync	read, sync write, async	 read,
       async  write,  and  scrub/resilver.  Each queue defines the minimum and
       maximum number of concurrent operations that may	be issued to  the  de-
       vice.	In   addition,	 the   device	has   an   aggregate  maximum,
       zfs_vdev_max_active. Note that the sum of the per-queue	minimums  must
       not exceed the aggregate	maximum.  If the sum of	the per-queue maximums
       exceeds the aggregate maximum, then the number of active	I/Os may reach
       zfs_vdev_max_active,  in	 which case no further I/Os will be issued re-
       gardless	of whether all per-queue minimums have been met.

       For many	physical devices, throughput increases with the	number of con-
       current	operations,  but  latency typically suffers. Further, physical
       devices typically have a	limit at which more concurrent operations have
       no effect on throughput or can actually cause it	to decrease.

       The  scheduler selects the next operation to issue by first looking for
       an I/O class whose minimum has not been satisfied. Once all are	satis-
       fied  and  the  aggregate maximum has not been hit, the scheduler looks
       for classes whose maximum has not been satisfied. Iteration through the
       I/O classes is done in the order	specified above. No further operations
       are issued if the aggregate maximum number of concurrent	operations has
       been hit	or if there are	no operations queued for an I/O	class that has
       not hit its maximum.  Every time	an I/O is queued or an operation  com-
       pletes, the I/O scheduler looks for new operations to issue.
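
       The following sketch (Python, illustrative only and deliberately
       simplified; it is not the kernel implementation) captures the
       selection order described above.

           CLASSES = ["sync_read", "sync_write", "async_read",
                      "async_write", "scrub"]

           def pick_class(active, queued, min_active, max_active,
                          aggregate_max):
               if sum(active.values()) >= aggregate_max:
                   return None               # aggregate limit reached
               # First satisfy each class's minimum, in priority order.
               for c in CLASSES:
                   if queued[c] and active[c] < min_active[c]:
                       return c
               # Then fill up to each class's maximum, in priority order.
               for c in CLASSES:
                   if queued[c] and active[c] < max_active[c]:
                       return c
               return None                   # nothing eligible to issue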

       In general, smaller max_active values will lead to lower latency of
       synchronous operations.  Larger max_active values may lead to higher
       overall throughput, depending on underlying storage.

       The  ratio of the queues' max_actives determines	the balance of perfor-
       mance  between	reads,	 writes,   and	 scrubs.    E.g.,   increasing
       zfs_vdev_scrub_max_active will cause the scrub or resilver to complete
       more quickly, but will cause reads and writes to have higher latency
       and lower throughput.

       All  I/O	 classes have a	fixed maximum number of	outstanding operations
       except for the async write class.  Asynchronous	writes	represent  the
       data  that  is committed	to stable storage during the syncing stage for
       transaction groups. Transaction groups enter the	syncing	state periodi-
       cally  so  the  number of queued	async writes will quickly burst	up and
       then bleed down to zero.	Rather than servicing them as quickly as  pos-
       sible,  the  I/O	 scheduler  changes the	maximum	number of active async
       write I/Os according to the amount of dirty data	in  the	 pool.	 Since
       both  throughput	and latency typically increase with the	number of con-
       current operations issued to physical devices, reducing the  burstiness
       in  the	number	of  concurrent operations also stabilizes the response
       time of operations from other  --  and  in  particular  synchronous  --
       queues.	In broad strokes, the I/O scheduler will issue more concurrent
       operations from the async write queue as	there's	more dirty data	in the
       pool.

       Async Writes

       The  number  of	concurrent  operations	issued for the async write I/O
       class follows a piece-wise linear function defined by a few  adjustable
       points.

	      |		     o---------| <-- zfs_vdev_async_write_max_active
	 ^    |		    /^	       |
	 |    |		   / |	       |
       active |		  /  |	       |
	I/O   |		 /   |	       |
       count  |		/    |	       |
	      |	       /     |	       |
	      |-------o	     |	       | <-- zfs_vdev_async_write_min_active
	     0|_______^______|_________|
	      0%      |	     |	     100% of zfs_dirty_data_max
		      |	     |
		      |	     `-- zfs_vdev_async_write_active_max_dirty_percent
		      `--------- zfs_vdev_async_write_active_min_dirty_percent

       Until  the  amount  of  dirty  data exceeds a minimum percentage	of the
       dirty data allowed in the pool, the I/O scheduler will limit the	number
       of  concurrent operations to the	minimum. As that threshold is crossed,
       the number of concurrent	operations issued increases  linearly  to  the
       maximum	at  the	specified maximum percentage of	the dirty data allowed
       in the pool.
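
       A minimal sketch (Python, illustrative only; the min/max active counts
       and dirty percentages are assumed example values, not necessarily the
       defaults) of this piece-wise linear function:

           def async_write_max_active(dirty, dirty_max,
                                      min_active=2, max_active=10,
                                      lo_pct=30, hi_pct=60):
               pct = 100.0 * dirty / dirty_max
               if pct <= lo_pct:
                   return min_active
               if pct >= hi_pct:
                   return max_active
               frac = (pct - lo_pct) / (hi_pct - lo_pct)
               return int(min_active + frac * (max_active - min_active))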

       Ideally,	the amount of dirty data on a  busy  pool  will	 stay  in  the
       sloped	part   of   the	  function   between  zfs_vdev_async_write_ac-
       tive_min_dirty_percent  and  zfs_vdev_async_write_active_max_dirty_per-
       cent.  If  it  exceeds  the maximum percentage, this indicates that the
       rate of incoming	data is	greater	than the rate that the backend storage
       can  handle. In this case, we must further throttle incoming writes, as
       described in the	next section.

ZFS TRANSACTION	DELAY
       We delay	transactions when we've	determined that	 the  backend  storage
       isn't able to accommodate the rate of incoming writes.

       If  there  is  already a	transaction waiting, we	delay relative to when
       that transaction	will finish waiting.  This way	the  calculated	 delay
       time  is	 independent  of  the number of	threads	concurrently executing
       transactions.

       If we are the only  waiter,  wait  relative  to	when  the  transaction
       started,	 rather	 than  the current time.  This credits the transaction
       for "time already served", e.g. reading indirect	blocks.

       The minimum time	for a transaction to take is calculated	as:
	   min_time = zfs_delay_scale *	(dirty - min) /	(max - dirty)
	   min_time is then capped at 100 milliseconds.
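
       A minimal sketch of this calculation (Python, illustrative only; the
       scale and minimum dirty percentage shown are assumptions, and the
       delay is expressed in nanoseconds here):

           def tx_delay_ns(dirty, dirty_max,
                           zfs_delay_scale=500_000,
                           zfs_delay_min_dirty_percent=60):
               lo = dirty_max * zfs_delay_min_dirty_percent // 100
               if dirty <= lo:
                   return 0
               delay = zfs_delay_scale * (dirty - lo) / (dirty_max - dirty)
               return min(delay, 100_000_000)      # cap at 100 ms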

       The delay has two degrees of freedom that can be	adjusted via tunables.
       The  percentage	of dirty data at which we start	to delay is defined by
       zfs_delay_min_dirty_percent. This  should  typically  be	 at  or	 above
       zfs_vdev_async_write_active_max_dirty_percent  so that we only start to
       delay after writing at full speed has failed to keep up with the	incom-
       ing  write  rate. The scale of the curve	is defined by zfs_delay_scale.
       Roughly speaking, this variable determines the amount of	delay  at  the
       midpoint	of the curve.

       delay
	10ms +-------------------------------------------------------------*+
	     |								   *|
	 9ms +								   *+
	     |								   *|
	 8ms +								   *+
	     |								  * |
	 7ms +								  * +
	     |								  * |
	 6ms +								  * +
	     |								  * |
	 5ms +								 *  +
	     |								 *  |
	 4ms +								 *  +
	     |								 *  |
	 3ms +								*   +
	     |								*   |
	 2ms +						    (midpoint) *    +
	     |							|    **	    |
	 1ms +							v ***	    +
	     |		   zfs_delay_scale ---------->	   ********	    |
	   0 +-------------------------------------*********----------------+
	     0%			   <- zfs_dirty_data_max ->		  100%

       Note that since the delay is added to the outstanding time remaining on
       the most	recent transaction, the	delay is effectively  the  inverse  of
       IOPS.  Here the midpoint	of 500us translates to 2000 IOPS. The shape of
       the curve was chosen such that small changes in the amount  of  accumu-
       lated  dirty  data in the first 3/4 of the curve	yield relatively small
       differences in the amount of delay.
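
       For example (Python, illustrative arithmetic only), the midpoint
       figure above follows directly from inverting the delay:

           midpoint_delay_s = 500e-6          # 500us, from the text above
           print(f"{1 / midpoint_delay_s:.0f} IOPS")   # 2000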

       The effects can be easier to understand when the	 amount	 of  delay  is
       represented on a	log scale:

       delay
       100ms +-------------------------------------------------------------++
	     +								    +
	     |								    |
	     +								   *+
	10ms +								   *+
	     +								 ** +
	     |						    (midpoint)	**  |
	     +							|     **    +
	 1ms +							v ****	    +
	     +		   zfs_delay_scale ---------->	      *****	    +
	     |						   ****		    |
	     +						****		    +
       100us +					      **		    +
	     +					     *			    +
	     |					    *			    |
	     +					   *			    +
	10us +					   *			    +
	     +								    +
	     |								    |
	     +								    +
	     +--------------------------------------------------------------+
	     0%			   <- zfs_dirty_data_max ->		  100%

       Note  here  that	 only as the amount of dirty data approaches its limit
       does the	delay start to increase	rapidly. The goal of a properly	 tuned
       system  should be to keep the amount of dirty data out of that range by
       first ensuring that the appropriate limits are set for the  I/O	sched-
       uler  to	 reach	optimal	throughput on the backend storage, and then by
       changing	the value of zfs_delay_scale to	increase the steepness of  the
       curve.

OpenZFS				 Aug 24, 2020	      ZFS-MODULE-PARAMETERS(5)
