21.2. The Z File System (ZFS)

The Z file system, originally developed by Sun™, is designed to future-proof the file system by removing many of the arbitrary limits imposed on previous file systems. ZFS allows continuous growth of the pooled storage by adding additional devices. ZFS allows you to create many file systems (in addition to block devices) out of a single shared pool of storage. Space is allocated as needed, so all remaining free space is available to each file system in the pool. It is also designed for maximum data integrity, supporting data snapshots, multiple copies, and cryptographic checksums. It uses a software data replication model known as RAID-Z. RAID-Z provides redundancy similar to hardware RAID, but is designed to prevent data write corruption and to overcome some of the limitations of hardware RAID.

21.2.1. ZFS Features and Terminology

ZFS is a fundamentally different file system because it is more than just a file system. ZFS combines the roles of file system and volume manager, enabling additional storage devices to be added to a live system and having the new space available on all of the existing file systems in that pool immediately. By combining the traditionally separate roles, ZFS is able to overcome previous limitations that prevented RAID groups from being able to grow. Each top level device in a zpool is called a vdev, which can be a simple disk or a RAID transformation such as a mirror or RAID-Z array. ZFS file systems (called datasets) each have access to the combined free space of the entire pool. As blocks are allocated, the free space in the pool available to each file system is decreased. This approach avoids the common pitfall with extensive partitioning where free space becomes fragmented across the partitions.

zpool - A storage pool is the most basic building block of ZFS. A pool is made up of one or more vdevs, the underlying devices that store the data. A pool is then used to create one or more file systems (datasets) or block devices (volumes). These datasets and volumes share the pool of remaining free space. Each pool is uniquely identified by a name and a GUID. The zpool also controls the version number and therefore the features available for use with ZFS.

Note:

FreeBSD 9.0 and 9.1 include support for ZFS version 28. Future versions use ZFS version 5000 with feature flags. This allows greater cross-compatibility with other implementations of ZFS.
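As a hedged example (mypool is a placeholder pool name), the version a pool is running and the versions or feature flags supported by the installed ZFS implementation can be inspected with:
# zpool get version mypool
# zpool upgrade -v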

vdev Types - A zpool is made up of one or more vdevs, which themselves can be a single disk or a group of disks, in the case of a RAID transform. When multiple vdevs are used, ZFS spreads data across the vdevs to increase performance and maximize usable space. A hedged example combining several of these vdev types follows this list.
  • Disk - The most basic type of vdev is a standard block device. This can be an entire disk (such as /dev/ada0 or /dev/da0) or a partition (/dev/ada0p3). Contrary to the Solaris documentation, on FreeBSD there is no performance penalty for using a partition rather than an entire disk.

  • File - In addition to disks, ZFS pools can be backed by regular files; this is especially useful for testing and experimentation. Use the full path to the file as the device path in the zpool create command. All vdevs must be at least 128 MB in size.

  • Mirror - When creating a mirror, specify the mirror keyword followed by the list of member devices for the mirror. A mirror consists of two or more devices; all data is written to all member devices. A mirror vdev will only hold as much data as its smallest member. A mirror vdev can withstand the failure of all but one of its members without losing any data.

    Note:

    A regular single disk vdev can be upgraded to a mirror vdev at any time using the zpool attach command.

  • RAID-Z - ZFS implements RAID-Z, a variation on standard RAID-5 that offers better distribution of parity and eliminates the "RAID-5 write hole" in which the data and parity information become inconsistent after an unexpected restart. ZFS supports three levels of RAID-Z which provide varying levels of redundancy in exchange for decreasing levels of usable storage. The types are named RAID-Z1 through RAID-Z3 based on the number of parity devices in the array, which is also the number of disks that can fail without losing data.

    In a RAID-Z1 configuration with 4 disks, each 1 TB, usable storage will be 3 TB and the pool will still be able to operate in degraded mode with one faulted disk. If an additional disk goes offline before the faulted disk is replaced and resilvered, all data in the pool can be lost.

    In a RAID-Z3 configuration with 8 disks of 1 TB, the volume would provide 5 TB of usable space and still be able to operate with three faulted disks. Sun recommends no more than 9 disks in a single vdev. If the configuration has more disks, it is recommended to divide them into separate vdevs and the pool data will be striped across them.

    A configuration of 2 RAID-Z2 vdevs consisting of 8 disks each would create something similar to a RAID 60 array. A RAID-Z group's storage capacity is approximately the size of the smallest disk, multiplied by the number of non-parity disks. Four 1 TB disks in RAID-Z1 have an effective size of approximately 3 TB, and an 8x 1 TB array in RAID-Z3 will yield 5 TB of usable space.

  • Spare - ZFS has a special pseudo-vdev type for keeping track of available hot spares. Note that installed hot spares are not deployed automatically; they must manually be configured to replace the failed device using the zpool replace command.

  • Log - ZFS Log Devices, also known as the ZFS Intent Log (ZIL), move the intent log from the regular pool devices to a dedicated device. The ZIL accelerates synchronous transactions by using storage devices (such as SSDs) that are faster than those used for the main pool. When data is being written and the application requests a guarantee that the data has been safely stored, the data is written to the faster ZIL storage, then later flushed out to the regular disks, greatly reducing the latency of synchronous writes. Log devices can be mirrored, but RAID-Z is not supported. When specifying multiple log devices, writes will be load-balanced across all devices.

  • Cache - Adding a cache vdev to a zpool will add the storage of the cache to the L2ARC. Cache devices cannot be mirrored. Since a cache device only stores additional copies of existing data, there is no risk of data loss.
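
As a hedged sketch of how these vdev types fit together (the pool names mypool and tank and all device names below are placeholders): the first command creates a mirrored pool, the next three add a dedicated log device, a cache device, and a hot spare to it, and the last command creates a separate RAID-Z2 pool.
# zpool create mypool mirror ada1 ada2
# zpool add mypool log ada3
# zpool add mypool cache ada4
# zpool add mypool spare ada5
# zpool create tank raidz2 da4 da5 da6 da7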

Adaptive Replacement Cache (ARC) - ZFS uses an Adaptive Replacement Cache (ARC), rather than a more traditional Least Recently Used (LRU) cache. An LRU cache is a simple list of items in the cache sorted by when each object was most recently used; new items are added to the top of the list and once the cache is full items from the bottom of the list are evicted to make room for more active objects. An ARC consists of four lists: the Most Recently Used (MRU) and Most Frequently Used (MFU) objects, plus a ghost list for each. These ghost lists track recently evicted objects to prevent them from being added back to the cache. This increases the cache hit ratio by avoiding objects that have a history of only being used occasionally. Another advantage of using both an MRU and MFU is that scanning an entire file system would normally evict all data from an MRU or LRU cache in favor of this freshly accessed content. With ZFS, since there is also an MFU that only tracks the most frequently used objects, the cache of the most commonly accessed blocks remains.
L2ARC - The L2ARC is the second level of the ZFS caching system. The primary ARC is stored in RAM, but since the amount of available RAM is often limited, ZFS can also make use of cache vdevs. Solid State Disks (SSDs) are often used as these cache devices due to their higher speed and lower latency compared to traditional spinning disks. An L2ARC is entirely optional, but having one will significantly increase read speeds for files that are cached on the SSD instead of having to be read from the regular spinning disks. The L2ARC can also speed up deduplication, since a DDT that does not fit in RAM but does fit in the L2ARC will be much faster than a DDT that has to be read from disk. The rate at which data is added to the cache devices is limited to prevent prematurely wearing out the SSD with too many writes. Until the cache is full (the first block has been evicted to make room), writes to the L2ARC are limited to the sum of the write limit and the boost limit, and after that to the write limit alone. A pair of sysctl values control these rate limits; vfs.zfs.l2arc_write_max controls how many bytes are written to the cache per second, while vfs.zfs.l2arc_write_boost adds to this limit during the "Turbo Warmup Phase" (Write Boost).
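For example, the current values of these rate limits can be inspected or adjusted with sysctl(8):
# sysctl vfs.zfs.l2arc_write_max
# sysctl vfs.zfs.l2arc_write_boost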
Copy-On-Write - Unlike a traditional file system, when data is overwritten on ZFS the new data is written to a different block rather than overwriting the old data in place. Only once this write is complete is the metadata then updated to point to the new location of the data. This means that in the event of a shorn write (a system crash or power loss in the middle of writing a file) the entire original contents of the file are still available and the incomplete write is discarded. This also means that ZFS does not require a fsck after an unexpected shutdown.
Dataset - Dataset is the generic term for a ZFS file system, volume, snapshot or clone. Each dataset has a unique name in the format poolname/path@snapshot. The root of the pool is technically a dataset as well. Child datasets are named hierarchically like directories; for example, in mypool/home, the home dataset is a child of mypool and inherits properties from it. This can be expanded further by creating mypool/home/user. This grandchild dataset will inherit properties from the parent and grandparent. It is also possible to set properties on a child to override the defaults inherited from the parents and grandparents. ZFS also allows administration of datasets and their children to be delegated.
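A hedged sketch of this hierarchy, assuming a pool named mypool already exists: the child and grandchild datasets are created, and zfs get then shows that the grandchild's compression setting is inherited from its parent.
# zfs create mypool/home
# zfs set compression=gzip mypool/home
# zfs create mypool/home/user
# zfs get compression mypool/home/user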
Volume - In addition to regular file system datasets, ZFS can also create volumes, which are block devices. Volumes have many of the same features, including copy-on-write, snapshots, clones and checksumming. Volumes can be useful for running other file system formats on top of ZFS, such as UFS, or for virtualization or exporting iSCSI extents.
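As a hedged example (the volume name and size are placeholders), a 4 GB volume can be created and formatted with UFS; on FreeBSD the corresponding block device appears under /dev/zvol:
# zfs create -V 4G mypool/ufsvol
# newfs /dev/zvol/mypool/ufsvol
# mount /dev/zvol/mypool/ufsvol /mnt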
Snapshot - The copy-on-write design of ZFS allows for nearly instantaneous consistent snapshots with arbitrary names. After taking a snapshot of a dataset (or a recursive snapshot of a parent dataset that will include all child datasets), new data is written to new blocks (as described above), but the old blocks are not reclaimed as free space. There are then two versions of the file system, the snapshot (what the file system looked like before) and the live file system; no additional space is used. As new data is written to the live file system, new blocks are allocated to store this data. The apparent size of the snapshot will grow as blocks are no longer used in the live file system, but only in the snapshot. These snapshots can be mounted (read only) to allow for the recovery of previous versions of files. It is also possible to roll back a live file system to a specific snapshot, undoing any changes that took place after the snapshot was taken. Each block in the zpool has a reference counter which indicates how many snapshots, clones, datasets or volumes make use of that block. As files and snapshots are deleted, the reference count is decremented; once a block is no longer referenced, it is reclaimed as free space. Snapshots can also be marked with a hold; once a snapshot is held, any attempt to destroy it will return an EBUSY error. Each snapshot can have multiple holds, each with a unique name. The release command removes the hold so the snapshot can then be deleted. Snapshots can be taken on volumes, however they can only be cloned or rolled back, not mounted independently.
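A hedged sketch of holds, with placeholder dataset and hold names: while the hold named keep is in place the first destroy fails with EBUSY; after the release it succeeds.
# zfs snapshot mypool/home/user@monday
# zfs hold keep mypool/home/user@monday
# zfs destroy mypool/home/user@monday
# zfs release keep mypool/home/user@monday
# zfs destroy mypool/home/user@monday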
Clone - Snapshots can also be cloned; a clone is a writable version of a snapshot, allowing the file system to be forked as a new dataset. As with a snapshot, a clone initially consumes no additional space; only as new data is written to the clone and new blocks are allocated does the apparent size of the clone grow. As blocks are overwritten in the cloned file system or volume, the reference count on the previous block is decremented. The snapshot upon which a clone is based cannot be deleted because the clone is dependent upon it (the snapshot is the parent, and the clone is the child). Clones can be promoted, reversing this dependency and making the clone the parent and the previous parent the child. This operation requires no additional space, but it will change the way the used space is accounted.
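A hedged example with placeholder names: a snapshot is cloned into a new dataset, which is later promoted so that it becomes the parent.
# zfs clone mypool/home/user@monday mypool/home/user-dev
# zfs promote mypool/home/user-dev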
Checksum - Every block that is allocated is also checksummed (the algorithm used is a per-dataset property; see zfs set). ZFS transparently validates the checksum of each block as it is read, allowing ZFS to detect silent corruption. If the data that is read does not match the expected checksum, ZFS will attempt to recover the data from any available redundancy (mirrors, RAID-Z). You can trigger the validation of all checksums using the scrub command. The available checksum algorithms include:
  • fletcher2

  • fletcher4

  • sha256

The fletcher algorithms are faster, but sha256 is a strong cryptographic hash and has a much lower chance of collisions, at the cost of some performance. Checksums can be disabled, but this is inadvisable.
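For example, to select a stronger checksum algorithm on a dataset and then verify every block in the pool (the dataset and pool names are placeholders):
# zfs set checksum=sha256 mypool/home
# zpool scrub mypool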
Compression - Each dataset in ZFS has a compression property, which defaults to off. This property can be set to one of a number of compression algorithms, which will cause all new data that is written to this dataset to be compressed as it is written. In addition to the reduction in disk usage, this can also increase read and write throughput, as only the smaller compressed version of the file needs to be read or written.

Note:

LZ4 compression is only available in FreeBSD 9.2 and later.
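As a hedged example (the dataset name is a placeholder), compression can be enabled on a dataset and the achieved ratio checked afterwards with the compressratio property:
# zfs set compression=lz4 mypool/home
# zfs get compressratio mypool/home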

Deduplication - ZFS has the ability to detect duplicate blocks of data as they are written (thanks to the checksumming feature). If deduplication is enabled, instead of writing the block a second time, the reference count of the existing block will be increased, saving storage space. In order to do this, ZFS keeps a deduplication table (DDT) in memory, containing the list of unique checksums, the location of each block and a reference count. When new data is written, the checksum is calculated and compared to the list. If a match is found, the data is considered to be a duplicate. When deduplication is enabled, the checksum algorithm is changed to SHA256 to provide a secure cryptographic hash. ZFS deduplication is tunable; if dedup is on, then a matching checksum is assumed to mean that the data is identical. If dedup is set to verify, then the data in the two blocks will be checked byte-for-byte to ensure it is actually identical; if it is not, the hash collision will be noted by ZFS and the two blocks will be stored separately. Because the DDT must store the hash of each unique block, it consumes a very large amount of memory (a general rule of thumb is 5-6 GB of RAM per 1 TB of deduplicated data). In situations where it is not practical to have enough RAM to keep the entire DDT in memory, performance will suffer greatly as the DDT will need to be read from disk before each new block is written. Deduplication can make use of the L2ARC to store the DDT, providing a middle ground between fast system memory and slower disks. It is advisable to consider using ZFS compression instead, which often provides nearly as much space savings without the additional memory requirement.
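A hedged sketch with placeholder names: deduplication is enabled (or switched to verify mode) on a dataset, and zpool list then reports the pool-wide DEDUP ratio.
# zfs set dedup=on mypool/data
# zfs set dedup=verify mypool/data
# zpool list mypool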
Scrub - In place of a consistency check like fsck, ZFS has the scrub command, which reads all data blocks stored on the pool and verifies their checksums against the known good checksums stored in the metadata. This periodic check of all the data stored on the pool ensures the recovery of any corrupted blocks before they are needed. A scrub is not required after an unclean shutdown, but it is recommended that you run a scrub at least once each quarter. ZFS compares the checksum for each block as it is read in the normal course of use, but a scrub operation makes sure even infrequently used blocks are checked for silent corruption.
Dataset Quota - ZFS provides very fast and accurate dataset, user and group space accounting in addition to quotas and space reservations. This gives the administrator fine-grained control over how space is allocated and allows critical file systems to reserve space to ensure other file systems do not take all of the free space.

ZFS supports different types of quotas: the dataset quota, the reference quota (refquota), the user quota, and the group quota.

Quotas limit the amount of space that a dataset and all of its descendants (snapshots of the dataset, child datasets and the snapshots of those datasets) can consume.

Note:

Quotas cannot be set on volumes, as the volsize property acts as an implicit quota.

Reference Quota - A reference quota limits the amount of space a dataset can consume by enforcing a hard limit on the space used. However, this hard limit includes only space that the dataset references and does not include space used by descendants, such as file systems or snapshots.
User Quota - User quotas are useful to limit the amount of space that can be used by the specified user.
Group Quota - The group quota limits the amount of space that a specified group can consume.
Dataset Reservation - The reservation property makes it possible to guarantee a minimum amount of space for a specific dataset and its descendants. This means that if a 10 GB reservation is set on storage/home/bob and another dataset tries to use all of the free space, at least 10 GB of space is reserved for this dataset. If a snapshot is taken of storage/home/bob, the space used by that snapshot is counted against the reservation. The refreservation property works in a similar way, except it excludes descendants, such as snapshots.

Reservations of any sort are useful in many situations, such as planning and testing the suitability of disk space allocation in a new system, or ensuring that enough space is available on file systems for audio logs or system recovery procedures and files.

Reference Reservation - The refreservation property makes it possible to guarantee a minimum amount of space for a specific dataset, excluding its descendants. This means that if a 10 GB reservation is set on storage/home/bob and another dataset tries to use all of the free space, at least 10 GB of space is reserved for this dataset. In contrast to a regular reservation, space used by snapshots and descendant datasets is not counted against the reservation. As an example, if a snapshot were taken of storage/home/bob, enough disk space would have to exist outside of the refreservation amount for the operation to succeed, because descendants of the main data set are not counted by the refreservation amount and so do not encroach on the space set aside.
Resilver - When a disk fails and must be replaced, the new disk must be filled with the data that was lost. This process of calculating and writing the missing data (using the parity information distributed across the remaining drives) to the new drive is called resilvering.

21.2.2. What Makes ZFS Different

ZFS is significantly different from any previous file system because it is more than just a file system. ZFS combines the traditionally separate roles of volume manager and file system, which provides unique advantages because the file system is now aware of the underlying structure of the disks. Traditional file systems could only be created on a single disk at a time; if there were two disks, then two separate file systems had to be created. In a traditional hardware RAID configuration, this problem was worked around by presenting the operating system with a single logical disk made up of the space provided by a number of disks, on top of which the operating system placed its file system. Even in the case of software RAID solutions like GEOM, the UFS file system living on top of the RAID transform believed that it was dealing with a single device. ZFS's combination of the volume manager and the file system solves this and allows the creation of many file systems all sharing a pool of available storage. One of the biggest advantages of ZFS's awareness of the physical layout of the disks is that ZFS can grow the existing file systems automatically when additional disks are added to the pool. This new space is then made available to all of the file systems. ZFS also has a number of different properties that can be applied to each file system, which makes it advantageous to create a number of different file systems and datasets rather than a single monolithic file system.
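As a hypothetical sketch of that growth (the pool and device names are placeholders): adding another disk to an existing pool immediately makes the new space available to every dataset in the pool.

# zpool add mypool ada3
# zfs list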

21.2.3. ZFS Quick Start Guide

There is a startup mechanism that allows FreeBSD to mount ZFS pools during system initialization. To enable it, issue the following commands:

# echo 'zfs_enable="YES"' >> /etc/rc.conf
# service zfs start

The examples in this section assume three SCSI disks with the device names da0, da1, and da2. Users of SATA hardware should instead use ada device names.

21.2.3.1. Single Disk Pool

To create a simple, non-redundant ZFS pool using a single disk device, use zpool:

# zpool create example /dev/da0

To view the new pool, review the output of df:

# df
Filesystem  1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a   2026030  235230  1628718    13%    /
devfs               1       1        0   100%    /dev
/dev/ad0s1d  54098308 1032846 48737598     2%    /usr
example      17547136       0 17547136     0%    /example

This output shows that the example pool has been created and mounted. It is now accessible as a file system. Files may be created on it and users can browse it, as seen in the following example:

# cd /example
# ls
# touch testfile
# ls -al
total 4
drwxr-xr-x   2 root  wheel    3 Aug 29 23:15 .
drwxr-xr-x  21 root  wheel  512 Aug 29 23:12 ..
-rw-r--r--   1 root  wheel    0 Aug 29 23:15 testfile

However, this pool is not taking advantage of any ZFS features. To create a dataset on this pool with compression enabled:

# zfs create example/compressed
# zfs set compression=gzip example/compressed

The example/compressed dataset is now a ZFS compressed file system. Try copying some large files to /example/compressed.

Compression can be disabled with:

# zfs set compression=off example/compressed

To unmount a file system, issue the following command and then verify by using df:

# zfs umount example/compressed
# df
Filesystem  1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a   2026030  235232  1628716    13%    /
devfs               1       1        0   100%    /dev
/dev/ad0s1d  54098308 1032864 48737580     2%    /usr
example      17547008       0 17547008     0%    /example

To re-mount the file system to make it accessible again, and verify with df:

# zfs mount example/compressed
# df
Filesystem          1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a           2026030  235234  1628714    13%    /
devfs                       1       1        0   100%    /dev
/dev/ad0s1d          54098308 1032864 48737580     2%    /usr
example              17547008       0 17547008     0%    /example
example/compressed   17547008       0 17547008     0%    /example/compressed

The pool and file system may also be observed by viewing the output from mount:

# mount
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad0s1d on /usr (ufs, local, soft-updates)
example on /example (zfs, local)
example/data on /example/data (zfs, local)
example/compressed on /example/compressed (zfs, local)

ZFS datasets, after creation, may be used like any file system. However, many other features are available which can be set on a per-dataset basis. In the following example, a new file system, data, is created. Important files will be stored here, so the file system is set to keep two copies of each data block:

# zfs create example/data
# zfs set copies=2 example/data

It is now possible to see the data and space utilization by issuing df:

# df
Filesystem          1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a           2026030  235234  1628714    13%    /
devfs                       1       1        0   100%    /dev
/dev/ad0s1d          54098308 1032864 48737580     2%    /usr
example              17547008       0 17547008     0%    /example
example/compressed   17547008       0 17547008     0%    /example/compressed
example/data         17547008       0 17547008     0%    /example/data

Notice that each file system on the pool has the same amount of available space. This is the reason for using df in these examples, to show that the file systems use only the amount of space they need and all draw from the same pool. The ZFS file system does away with concepts such as volumes and partitions, and allows for several file systems to occupy the same pool.

To destroy the file systems and then destroy the pool as they are no longer needed:

# zfs destroy example/compressed
# zfs destroy example/data
# zpool destroy example

21.2.3.2. ZFS RAID-Z

There is no way to prevent a disk from failing. One method of avoiding data loss due to a failed hard disk is to implement RAID. ZFS supports this feature in its pool design. RAID-Z pools require 3 or more disks but yield more usable space than mirrored pools.

To create a RAID-Z pool, issue the following command and specify the disks to add to the pool:

# zpool create storage raidz da0 da1 da2

Note:

Sun™ recommends that the number of devices used in a RAID-Z configuration is between three and nine. For environments requiring a single pool consisting of 10 disks or more, consider breaking it up into smaller RAID-Z groups. If only two disks are available and redundancy is a requirement, consider using a ZFS mirror. Refer to zpool(8) for more details.

This command creates the storage zpool. This may be verified using mount(8) and df(1). This command makes a new file system in the pool called home:

# zfs create storage/home

It is now possible to enable compression and keep extra copies of directories and files using the following commands:

# zfs set copies=2 storage/home
# zfs set compression=gzip storage/home

To make this the new home directory for users, copy the user data to this directory, and create the appropriate symbolic links:

# cp -rp /home/* /storage/home
# rm -rf /home /usr/home
# ln -s /storage/home /home
# ln -s /storage/home /usr/home

Users should now have their data stored on the freshly created /storage/home. Test by adding a new user and logging in as that user.

Try creating a snapshot which may be rolled back later:

# zfs snapshot storage/home@08-30-08

Note that a snapshot captures a whole file system, not a single home directory or file. The @ character is the delimiter between the file system or volume name and the snapshot name. When a user's home directory gets trashed, restore it with:

# zfs rollback storage/home@08-30-08

To get a list of all available snapshots, run ls in the file system's .zfs/snapshot directory. For example, to see the previously taken snapshot:

# ls /storage/home/.zfs/snapshot

It is possible to write a script to perform regular snapshots on user data. However, over time, snapshots may consume a great deal of disk space. The previous snapshot may be removed using the following command:

# zfs destroy storage/home@08-30-08

After testing, /storage/home can be made the real /home using this command:

# zfs set mountpoint=/home storage/home

Run df and mount to confirm that the system now treats the file system as the real /home:

# mount
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad0s1d on /usr (ufs, local, soft-updates)
storage on /storage (zfs, local)
storage/home on /home (zfs, local)
# df
Filesystem    1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a     2026030  235240  1628708    13%    /
devfs                 1       1        0   100%    /dev
/dev/ad0s1d    54098308 1032826 48737618     2%    /usr
storage        26320512       0 26320512     0%    /storage
storage/home   26320512       0 26320512     0%    /home

This completes the RAID-Z configuration. To get status updates about the file systems created during the nightly periodic(8) runs, issue the following command:

# echo 'daily_status_zfs_enable="YES"' >> /etc/periodic.conf

21.2.3.3. Recovering RAID-Z

Every software RAID has a method of monitoring its state. The status of RAID-Z devices may be viewed with the following command:

# zpool status -x

If all pools are healthy and everything is normal, the following message will be returned:

all pools are healthy

If there is an issue, perhaps a disk has gone offline, the pool state will look similar to:

  pool: storage
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     OFFLINE      0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors

This indicates that the device was previously taken offline by the administrator using the following command:

# zpool offline storage da1

It is now possible to replace da1 after the system has been powered down. When the system is back online, the following command may be issued to replace the disk:

# zpool replace storage da1

From here, the status may be checked again, this time without the -x flag to get state information:

# zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed with 0 errors on Sat Aug 30 19:44:11 2008
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors

As shown from this example, everything appears to be normal.

21.2.3.4. Data Verification

ZFS uses checksums to verify the integrity of stored data. These are enabled automatically upon creation of file systems and may be disabled using the following command:

# zfs set checksum=off storage/home

Doing so is not recommended, as checksums take very little storage space and are used to check data integrity in a process known as scrubbing. To verify the data integrity of the storage pool, issue this command:

# zpool scrub storage

This process may take considerable time depending on the amount of data stored. It is also very I/O intensive, so much so that only one scrub may be run at any given time. After the scrub has completed, the status is updated and may be viewed by issuing a status request:

# zpool status storage
  pool: storage
 state: ONLINE
 scrub: scrub completed with 0 errors on Sat Jan 26 19:57:37 2013
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors

The completion time is displayed and helps to ensure data integrity over a long period of time.

Refer to zfs(8) and zpool(8) for other ZFS options.

21.2.4. zpool Administration

21.2.4.1. Creating & Destroying Storage Pools

21.2.4.2. Adding & Removing Devices

21.2.4.3. Dealing with Failed Devices

21.2.4.4. Importing & Exporting Pools

21.2.4.5. Upgrading a Storage Pool

21.2.4.6. Checking the Status of a Pool

21.2.4.7. Performance Monitoring

21.2.4.8. Splitting a Storage Pool

21.2.5. zfs Administration

21.2.5.1. Creating & Destroying Datasets

21.2.5.2. Creating & Destroying Volumes

21.2.5.3. Renaming a Dataset

21.2.5.4. Setting Dataset Properties

21.2.5.5. Managing Snapshots

21.2.5.6. Managing Clones

21.2.5.7. ZFS Replication

21.2.5.8. Dataset, User and Group Quotas

To enforce a dataset quota of 10 GB for storage/home/bob, use the following:

# zfs set quota=10G storage/home/bob

To enforce a reference quota of 10 GB for storage/home/bob, use the following:

# zfs set refquota=10G storage/home/bob

The general format is userquota@user=size, and the user's name must be in one of the following formats:

  • POSIX compatible name such as joe.

  • POSIX numeric ID such as 789.

  • SID name such as joe.bloggs@example.com.

  • SID numeric ID such as S-1-123-456-789.

For example, to enforce a user quota of 50 GB for a user named joe on storage/home/bob, use the following:

# zfs set userquota@joe=50G storage/home/bob

To remove the quota or make sure that one is not set, instead use:

# zfs set userquota@joe=none storage/home/bob

Note:

User quota properties are not displayed by zfs get all. Non-root users can only see their own quotas unless they have been granted the userquota privilege. Users with this privilege are able to view and set everyone's quota.

The general format for setting a group quota is: groupquota@group=size.

To set the quota for the group firstgroup to 50 GB, use:

# zfs set groupquota@firstgroup=50G storage/home/bob

To remove the quota for the group firstgroup, or to make sure that one is not set, instead use:

# zfs set groupquota@firstgroup=none storage/home/bob

As with the user quota property, non-root users can only see the quotas associated with the groups that they belong to. However, root or a user with the groupquota privilege can view and set all quotas for all groups.

To display the amount of space consumed by each user on the specified file system or snapshot, along with any specified quotas, use zfs userspace. For group information, use zfs groupspace. For more information about supported options or how to display only specific options, refer to zfs(8).
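For example (the dataset name matches the one used above):

# zfs userspace storage/home/bob
# zfs groupspace storage/home/bob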

Users with sufficient privileges and root can list the quota for storage/home/bob using:

# zfs get quota storage/home/bob

21.2.5.9. Reservations

The general format of the reservation property is reservation=size, so to set a reservation of 10 GB on storage/home/bob, use:

# zfs set reservation=10G storage/home/bob

To make sure that no reservation is set, or to remove a reservation, use:

# zfs set reservation=none storage/home/bob

The same principle can be applied to the refreservation property for setting a refreservation, with the general format refreservation=size.

To check if any reservations or refreservations exist on storage/home/bob, execute one of the following commands:

# zfs get reservation storage/home/bob
# zfs get refreservation storage/home/bob

21.2.5.10. Compression

21.2.5.11. Deduplication

21.2.5.12. Delegated Administration

21.2.6. ZFS Advanced Topics

21.2.6.1. ZFS Tuning

21.2.6.2. Booting Root on ZFS

21.2.6.3. ZFS Boot Environments

21.2.6.4. Troubleshooting

21.2.6.5. ZFS on i386

Some of the features provided by ZFS are RAM-intensive, so some tuning may be required to provide maximum efficiency on systems with limited RAM.

21.2.6.5.1. Memory

At a bare minimum, the total system memory should be at least one gigabyte. The amount of recommended RAM depends upon the size of the pool and the ZFS features which are used. A general rule of thumb is 1 GB of RAM for every 1 TB of storage. If the deduplication feature is used, a general rule of thumb is 5 GB of RAM per TB of storage to be deduplicated. While some users successfully use ZFS with less RAM, it is possible that when the system is under heavy load, it may panic due to memory exhaustion. Further tuning may be required for systems with less than the recommended amount of RAM.

21.2.6.5.2. Kernel Configuration

Due to the RAM limitations of the i386™ platform, users using ZFS on the i386™ architecture should add the following option to a custom kernel configuration file, rebuild the kernel, and reboot:

options KVA_PAGES=512

This option expands the kernel address space, allowing the vm.kvm_size tunable to be pushed beyond the currently imposed limit of 1 GB, or the limit of 2 GB for PAE. To find the most suitable value for this option, divide the desired address space in megabytes by four (4). In this example, it is 512 for 2 GB.

21.2.6.5.3. Loader Tunables

The kmem address space can be increased on all FreeBSD architectures. On a test system with one gigabyte of physical memory, success was achieved with the following options added to /boot/loader.conf, and the system restarted:

vm.kmem_size="330M"
vm.kmem_size_max="330M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"

For a more detailed list of recommendations for ZFS-related tuning, see http://wiki.freebsd.org/ZFSTuningGuide.

21.2.7. Additional Resources
