19.6. Advanced Topics

19.6.1. Tuning

A number of tunables can be adjusted to make ZFS perform best for different workloads.

  • vfs.zfs.arc_max - Maximum size of the ARC. The default is all RAM less 1 GB, or one half of RAM, whichever is more. However, a lower value should be used if the system will be running any other daemons or processes that may require memory. This value can be adjusted at runtime with sysctl(8) and can be set in /boot/loader.conf or /etc/sysctl.conf.

  • vfs.zfs.arc_meta_limit - Limit the portion of the ARC that can be used to store metadata. The default is one fourth of vfs.zfs.arc_max. Increasing this value will improve performance if the workload involves operations on a large number of files and directories, or frequent metadata operations, at the cost of less file data fitting in the ARC. This value can be adjusted at runtime with sysctl(8) and can be set in /boot/loader.conf or /etc/sysctl.conf.

  • vfs.zfs.arc_min - Minimum size of the ARC. The default is one half of vfs.zfs.arc_meta_limit. Raise this value to keep memory pressure from other applications from evicting the entire ARC. This value can be adjusted at runtime with sysctl(8) and can be set in /boot/loader.conf or /etc/sysctl.conf.

  • vfs.zfs.vdev.cache.size - A preallocated amount of memory reserved as a cache for each device in the pool. The total amount of memory used will be this value multiplied by the number of devices. This value can only be adjusted at boot time, and is set in /boot/loader.conf.

  • vfs.zfs.min_auto_ashift - Minimum ashift (sector size) that will be used automatically at pool creation time. The value is the base-2 logarithm of the sector size in bytes. The default value of 9 represents 2^9 = 512, a sector size of 512 bytes. To avoid write amplification and get the best performance, set this value to match the largest sector size used by a device in the pool.

    Many drives have 4 KB sectors. Using the default ashift of 9 with these drives results in write amplification on these devices. Data that could be contained in a single 4 KB write must instead be written in eight 512-byte writes. ZFS tries to read the native sector size from all devices when creating a pool, but many drives with 4 KB sectors report that their sectors are 512 bytes for compatibility. Setting vfs.zfs.min_auto_ashift to 12 (2^12 = 4096) before creating a pool forces ZFS to use 4 KB blocks for best performance on these drives.

    Forcing 4 KB blocks is also useful on pools where disk upgrades are planned. Future disks are likely to use 4 KB sectors, and ashift values cannot be changed after a pool is created.

    In some specific cases, the smaller 512-byte block size might be preferable. When used with 512-byte disks for databases, or as storage for virtual machines, less data is transferred during small random reads. This can provide better performance, especially when using a smaller ZFS record size.

  • vfs.zfs.prefetch_disable - Disable prefetch. A value of 0 enables prefetch and a value of 1 disables it. The default is 0, unless the system has less than 4 GB of RAM. Prefetch works by reading larger blocks than were requested into the ARC in the hope that the data will be needed soon. If the workload has a large number of random reads, disabling prefetch may actually improve performance by reducing unnecessary reads. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.vdev.trim_on_init - Control whether new devices added to the pool have the TRIM command run on them. This ensures the best performance and longevity for SSDs, but takes extra time. If the device has already been secure erased, disabling this setting will make the addition of the new device faster. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.vdev.max_pending - Limit the number of pending I/O requests per device. A higher value will keep the device command queue full and may give higher throughput. A lower value will reduce latency. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.top_maxinflight - Maximum number of outstanding I/Os per top-level vdev. Limits the depth of the command queue to prevent high latency. The limit is per top-level vdev, meaning the limit applies to each mirror, RAID-Z, or other vdev independently. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.l2arc_write_max - Limit the amount of data written to the L2ARC per second. This tunable is designed to extend the longevity of SSDs by limiting the amount of data written to the device. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.l2arc_write_boost - The value of this tunable is added to vfs.zfs.l2arc_write_max and increases the write speed to the SSD until the first block is evicted from the L2ARC. This Turbo Warmup Phase is designed to reduce the performance loss from an empty L2ARC after a reboot. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.scrub_delay - Number of ticks to delay between each I/O during a scrub. To ensure that a scrub does not interfere with the normal operation of the pool, if any other I/O is happening the scrub will delay between each command. This value controls the limit on the total IOPS (I/Os Per Second) generated by the scrub. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 4, resulting in a limit of: 1000 ticks/sec / 4 = 250 IOPS. Using a value of 20 would give a limit of: 1000 ticks/sec / 20 = 50 IOPS. The speed of scrub is only limited when there has been recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.resilver_delay - Number of ticks of delay inserted between each I/O during a resilver. To ensure that a resilver does not interfere with the normal operation of the pool, if any other I/O is happening the resilver will delay between each command. This value controls the limit on the total IOPS (I/Os Per Second) generated by the resilver. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 2, resulting in a limit of: 1000 ticks/sec / 2 = 500 IOPS. Returning the pool to an Online state may be more important if another device failing could Fault the pool, causing data loss. A value of 0 will give the resilver operation the same priority as other operations, speeding the healing process. The speed of resilver is only limited when there has been other recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.scan_idle - Number of milliseconds since the last operation before the pool is considered idle. When the pool is idle, the rate limiting for scrub and resilver is disabled. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.txg.timeout - Maximum number of seconds between transaction groups. The current transaction group will be written to the pool and a fresh transaction group started if this amount of time has elapsed since the previous transaction group. A transaction group may be triggered earlier if enough data is written. The default value is 5 seconds. A larger value may improve read performance by delaying asynchronous writes, but this may cause uneven performance when the transaction group is written. This value can be adjusted at any time with sysctl(8).
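The arithmetic behind several of the tunables above can be checked with a short script. The following is a minimal Python sketch, not part of ZFS or FreeBSD: the function names are hypothetical, and the code only restates the formulas given in the list (sector size = 2^ashift, IOPS limit = kern.hz / delay, default vfs.zfs.arc_max = the larger of RAM minus 1 GB and half of RAM). It assumes sector sizes are powers of two.

```python
GIB = 1024 ** 3


def sector_size_bytes(ashift):
    """Sector size in bytes implied by an ashift value (2 ** ashift)."""
    return 1 << ashift


def min_ashift_for(sector_sizes):
    """Smallest ashift that covers the largest sector size in the pool.

    Assumes every entry is a power of two, for example 512 or 4096.
    """
    largest = max(sector_sizes)
    return (largest - 1).bit_length()


def iops_limit(delay_ticks, kern_hz=1000):
    """IOPS cap for a scrub or resilver; a delay of 0 means no throttling."""
    if delay_ticks == 0:
        return float("inf")
    return kern_hz / delay_ticks


def default_arc_max(ram_bytes):
    """Default ARC ceiling as described above: RAM - 1 GB or RAM / 2, whichever is more."""
    return max(ram_bytes - GIB, ram_bytes // 2)


print(sector_size_bytes(12))        # 4096
print(min_ashift_for([512, 4096]))  # 12
print(iops_limit(4))                # 250.0 (default scrub_delay)
print(iops_limit(2))                # 500.0 (default resilver_delay)
```

A pool mixing 512-byte and 4 KB drives therefore calls for vfs.zfs.min_auto_ashift=12, set before the pool is created.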

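19.6.2. ZFS on i386

Some of the features provided by ZFS are memory intensive, and may require tuning for maximum efficiency on systems with limited RAM.

19.6.2.1. Memory

As a bare minimum, total system memory should be at least 1 GB. The recommended amount of RAM depends on the size of the pool and which ZFS features are used. A general rule of thumb is 1 GB of RAM for every 1 TB of storage. If deduplication is enabled, a general rule of thumb is 5 GB of RAM per TB of storage to be deduplicated. While some users successfully run ZFS with less RAM, systems under heavy load may panic due to memory exhaustion. Systems running with less than the recommended amount of RAM may require further tuning.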
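The rules of thumb above can be turned into a rough estimate. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, and it assumes that deduplicated storage is counted at 5 GB per TB while the remaining storage is counted at 1 GB per TB. The text does not say how the two rules combine, so treat the result as a starting point only.

```python
def recommended_ram_gb(pool_tb, dedup_tb=0.0):
    """Rough RAM estimate in GB from the rules of thumb.

    pool_tb  -- total pool size in TB
    dedup_tb -- portion of the pool (in TB) that will be deduplicated

    Assumption: deduplicated TB cost 5 GB each, all other TB cost 1 GB each,
    and the result never drops below the 1 GB bare minimum.
    """
    if dedup_tb < 0 or dedup_tb > pool_tb:
        raise ValueError("dedup_tb must be between 0 and pool_tb")
    plain_tb = pool_tb - dedup_tb
    estimate = plain_tb * 1.0 + dedup_tb * 5.0
    return max(1.0, estimate)


print(recommended_ram_gb(4))     # 4.0
print(recommended_ram_gb(4, 2))  # 12.0
```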

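19.6.2.2. Kernel Configuration

Due to the address space limitations of the i386™ platform, ZFS users on the i386™ architecture must add this option to a custom kernel configuration file, rebuild the kernel, and reboot: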

options        KVA_PAGES=512

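This option expands the kernel address space, allowing the vm.kvm_size tunable to be raised beyond the current limit of 1 GB, or the limit of 2 GB for PAE. To find the most suitable value for this option, divide the desired address space in megabytes by four. In this example, 2 GB gives 2048 / 4 = 512.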
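The divide-by-four rule can be written as a one-line helper. This is a small Python sketch with a hypothetical function name; it only encodes the rule stated above and rejects sizes that are not a multiple of 4 MB.

```python
def kva_pages(address_space_mb):
    """KVA_PAGES value for a desired kernel address space given in MB."""
    if address_space_mb <= 0 or address_space_mb % 4 != 0:
        raise ValueError("address space must be a positive multiple of 4 MB")
    return address_space_mb // 4


print(kva_pages(2048))  # 512, the value used in the example above
```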

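19.6.2.3. Loader Tunables

The kmem address space can be increased on all FreeBSD architectures. On a test system with 1 GB of physical memory, success was achieved with these options added to /boot/loader.conf, followed by a reboot: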

vm.kmem_size="330M"
vm.kmem_size_max="330M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
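To see what these settings amount to in bytes, the values can be parsed with a short script. This is an illustrative Python sketch, not a FreeBSD tool: the helper names are hypothetical, it assumes the K, M, and G suffixes are multiples of 1024, and the final comparison is a sanity check of my own rather than a rule stated in this section.

```python
import re

UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

EXAMPLE = '''
vm.kmem_size="330M"
vm.kmem_size_max="330M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
'''


def parse_size(text):
    """Convert a value such as '330M' to bytes (suffix assumed to be 1024-based)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_loader_conf(text):
    """Return a dict of name -> raw value from loader.conf-style lines."""
    tunables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = value.strip().strip('"')
    return tunables


sizes = {name: parse_size(value)
         for name, value in parse_loader_conf(EXAMPLE).items()}

for name, size in sizes.items():
    print(name, size)

# Sanity check (an assumption, not from the text): the ARC ceiling should
# fit inside the kmem address space.
print(sizes["vfs.zfs.arc_max"] < sizes["vm.kmem_size"])
```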

For a more detailed list of recommendations for ZFS-related tuning, see https://wiki.freebsd.org/ZFSTuningGuide.

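This and other documents can be downloaded from ftp://ftp.FreeBSD.org/pub/FreeBSD/doc/

For questions about FreeBSD, read the FreeBSD documentation first; if that does not resolve the issue, contact <questions@FreeBSD.org>.

For questions about this documentation, contact <doc@FreeBSD.org>.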