20.4. zfs Administration

The zfs utility is responsible for creating, destroying, and managing all ZFS datasets that exist within a pool. The pool is managed using zpool.

20.4.1. Creating and Destroying Datasets

Unlike traditional disks and volume managers, space in ZFS is not preallocated. With traditional file systems, once all of the space was partitioned and assigned, there was no way to add an additional file system without adding a new disk. With ZFS, new file systems can be created at any time. Each dataset has properties including features like compression, deduplication, caching, and quotas, as well as other useful properties like readonly, case sensitivity, network file sharing, and a mount point. Each separate dataset can be administered, delegated, replicated, snapshotted, jailed, and destroyed as a unit. There are many advantages to creating a separate dataset for each different type or set of files. The only drawbacks to having an extremely large number of datasets are that some commands like zfs list will be slower, and the mounting of hundreds or even thousands of datasets can slow the FreeBSD boot process.

Destroying a dataset is much quicker than deleting all of the files that reside on the dataset, as it does not involve scanning all of the files and updating all of the corresponding metadata. In modern versions of ZFS, zfs destroy is asynchronous, and the free space may take several minutes to appear in the pool. The freeing property, accessible with zpool get freeing poolname, indicates how many datasets are having their blocks freed in the background. If there are child datasets, like snapshots or other datasets, then the parent cannot be destroyed. To destroy a dataset and all of its children, use -r to recursively destroy the dataset and all of its children. -n -v can be used to list the datasets and snapshots that would be destroyed and, in the case of snapshots, how much space would be reclaimed by the actual destruction.
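
For example, assuming a pool named mypool (the dataset name here is only illustrative), a dataset can be created with a property set at creation time, checked with a dry run, and then destroyed:

# zfs create -o compression=on mypool/logs
# zfs destroy -rnv mypool/logs
# zfs destroy -r mypool/logs

The first zfs destroy with -r -n -v only reports the datasets and snapshots that would be removed; the second performs the actual destruction.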

20.4.2. Creating and Destroying Volumes

A volume is a special type of ZFS dataset. Rather than being mounted as a file system, it is exposed as a block device under /dev/zvol/poolname/dataset. This allows the volume to be used for other file systems, to back the disks of a virtual machine, or to be exported using protocols like iSCSI or HAST.

A volume can be formatted with any file system. To the user, it will appear as if they are working with a regular disk using that specific filesystem and not ZFS. Putting ordinary file systems on ZFS volumes provides features those file systems would not normally have. For example, using the compression property on a 250 MB volume allows creation of a compressed FAT filesystem.

# zfs create -V 250m -o compression=on tank/fat32
# zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 258M  670M   31K /tank
# newfs_msdos -F32 /dev/zvol/tank/fat32
# mount -t msdosfs /dev/zvol/tank/fat32 /mnt
# df -h /mnt | grep fat32
Filesystem           Size Used Avail Capacity Mounted on
/dev/zvol/tank/fat32 249M  24k  249M     0%   /mnt
# mount | grep fat32
/dev/zvol/tank/fat32 on /mnt (msdosfs, local)

Destroying a volume is much the same as destroying a regular file system dataset. The operation is nearly instantaneous, but it may take several minutes for the free space to be reclaimed in the background.
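
Continuing the example above, the volume can be unmounted and destroyed once it is no longer needed:

# umount /mnt
# zfs destroy tank/fat32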

20.4.3. Renaming a Dataset

The name of a dataset can be changed with zfs rename. rename can also be used to change the parent of a dataset. Renaming a dataset to be under a different parent dataset will change the value of those properties that are inherited from the parent dataset. When a dataset is renamed, it is unmounted and then remounted in the new location (inherited from the new parent dataset). This behavior can be prevented with -u. Due to the nature of snapshots, they cannot be renamed outside of the parent dataset. To rename snapshots recursively, specify -r; all snapshots with the same name in child datasets will also be renamed.
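
For example, assuming that datasets mypool/usr/mydataset and mypool/var already exist (these names are purely illustrative), the dataset can be moved under a different parent, and a recursive snapshot can be renamed:

# zfs rename mypool/usr/mydataset mypool/var/mydataset
# zfs snapshot -r mypool/var@first
# zfs rename -r mypool/var@first @second

Adding -u to the first command would skip the unmount and remount step.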

20.4.4. Setting Dataset Properties

Each ZFS dataset has a number of properties that control its behavior. Most properties are automatically inherited from the parent dataset, but can be overridden locally. Set a property on a dataset with zfs set property=value dataset. Most properties have a limited set of valid values; zfs get will display each possible property and its valid values. Most properties can be reverted to their inherited values using zfs inherit.
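
For example, to disable access time updates on a dataset, check the result, and later revert to the inherited value (the dataset name is illustrative):

# zfs set atime=off mypool/usr/mydataset
# zfs get atime mypool/usr/mydataset
# zfs inherit atime mypool/usr/mydataset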

It is possible to set user-defined properties. They become part of the dataset configuration and can be used to provide additional information about the dataset or its contents. To distinguish these custom properties from the ones supplied as part of ZFS, a colon (:) is used to create a custom namespace for the property.

# zfs set custom:costcenter=1234 tank
# zfs get custom:costcenter tank
NAME PROPERTY           VALUE SOURCE
tank custom:costcenter  1234  local

To remove a custom property, use zfs inherit with -r. If the custom property is not defined in any of the parent datasets, it will be removed completely (although the changes are still recorded in the pool's history).

# zfs inherit -r custom:costcenter tank
# zfs get custom:costcenter tank
NAME    PROPERTY           VALUE              SOURCE
tank    custom:costcenter  -                  -
# zfs get all tank | grep custom:costcenter
#

20.4.5. Managing Snapshots

Snapshots are one of the most powerful features of ZFS. A snapshot provides a point-in-time copy of the dataset. The parent dataset can be easily rolled back to that snapshot state. Create a snapshot with zfs snapshot dataset@snapshotname. Adding -r creates a snapshot recursively, with the same name on all child datasets.

Snapshots are mounted in a hidden directory under the parent dataset: .zfs/snapshot/snapshotname. Individual files can easily be restored to a previous state by copying them from the snapshot back to the parent dataset. It is also possible to revert the entire dataset back to the point-in-time of the snapshot using zfs rollback.

Snapshots consume space based on how much the parent file system has changed since the time of the snapshot. The written property of a snapshot tracks how much space is being used by the snapshot.

Snapshots are destroyed and the space reclaimed with zfs destroy dataset@snapshot. Adding -r recursively removes all snapshots with the same name under the parent dataset. Adding -n -v to the command displays a list of the snapshots that would be deleted and an estimate of how much space would be reclaimed without performing the actual destroy operation.
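
A minimal sketch of this workflow, using illustrative dataset and file names (the dataset is assumed to be mounted at /mypool/home):

# zfs snapshot mypool/home@backup1
# cp /mypool/home/.zfs/snapshot/backup1/important.txt /mypool/home/
# zfs rollback mypool/home@backup1
# zfs destroy -nv mypool/home@backup1
# zfs destroy mypool/home@backup1

The cp restores a single file from the snapshot, zfs rollback reverts the entire dataset to the snapshot, the destroy with -n -v is a dry run, and the final destroy removes the snapshot and reclaims its space.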

20.4.6. Managing Clones

A clone is a copy of a snapshot that is treated more like a regular dataset. Unlike a snapshot, a clone is not read only, is mounted, and can have its own properties. Once a clone has been created, the snapshot it was created from cannot be destroyed. The child/parent relationship between the clone and the snapshot can be reversed using zfs promote. After a clone has been promoted, the snapshot becomes a child of the clone, rather than of the original parent dataset. This will change how the space is accounted, but not actually change the amount of space consumed.
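
As an illustrative sketch (the names are hypothetical), a clone can be created from a snapshot and later promoted:

# zfs snapshot mypool/projects@today
# zfs clone mypool/projects@today mypool/projects-clone
# zfs promote mypool/projects-clone

After the promotion, the snapshot is accounted against mypool/projects-clone rather than the original dataset, and mypool/projects becomes a dependent of the promoted clone.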

20.4.7. Replication

Keeping data on a single pool in one location exposes it to risks like theft and natural or human disasters. Making regular backups of the entire pool is vital when data needs to be restored. ZFS provides a built-in serialization feature that can send a stream representation of the data to standard output. Using this technique, it is possible to not only store the data on another pool connected to the local system, but also to send it over a network to another system that runs ZFS. To achieve this replication, ZFS uses filesystem snapshots (see the section on ZFS snapshots) to send them from one location to another. The commands for this operation are zfs send and zfs receive, respectively.

The following examples will demonstrate the functionality of ZFS replication using these two pools:

# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  960M    77K   896M     0%  1.00x  ONLINE  -
mypool  984M  43.7M   940M     4%  1.00x  ONLINE  -

The pool named mypool is the primary pool where data is written to and read from on a regular basis. A second pool, backup, is used as a standby in case the primary pool becomes unavailable. Note that this fail-over is not done automatically by ZFS, but rather must be done manually by a system administrator when needed. Replication requires a snapshot to provide a consistent version of the file system to be transmitted. Once a snapshot of mypool has been created, it can be copied to the backup pool. ZFS replicates only snapshots; changes made since the most recent snapshot will not be included.

# zfs snapshot mypool@backup1
# zfs list -t snapshot
NAME                    USED  AVAIL  REFER  MOUNTPOINT
mypool@backup1             0      -  43.6M  -

Now that a snapshot exists, zfs send can be used to create a stream representing the contents of the snapshot, which can be stored as a file or received by another pool. The stream will be written to standard output, which will need to be redirected to a file or pipe; otherwise, ZFS will produce an error:

# zfs send mypool@backup1
Error: Stream can not be written to a terminal.
You must redirect standard output.

To back up a dataset with zfs send, redirect to a file located on the mounted backup pool. First ensure that the pool has enough free space to accommodate the size of the snapshot being sent, which means all of the data contained in the snapshot, not just the changes from the previous snapshot.

# zfs send mypool@backup1 > /backup/backup1
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  960M  63.7M   896M     6%  1.00x  ONLINE  -
mypool  984M  43.7M   940M     4%  1.00x  ONLINE  -

The zfs send transferred all the data in the snapshot called backup1 to the pool named backup. Creating and sending these snapshots could be done automatically with a cron(8) job.
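
As a sketch of such automation (the script path and schedule are illustrative assumptions, not part of the example above), a small script could create a dated snapshot and store the stream on the backup pool:

#!/bin/sh
# Hypothetical nightly backup script: snapshot the pool and store the
# stream as a file on the backup pool.
DATE=$(date +%Y-%m-%d)
zfs snapshot mypool@${DATE}
zfs send mypool@${DATE} > /backup/mypool-${DATE}

A root crontab(5) entry such as 0 3 * * * /root/bin/zfs-backup.sh would then run the script every night at 03:00.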

Instead of storing the backups as archive files, ZFS can receive them as a live file system, allowing the backed up data to be accessed directly. To get to the actual data contained in those streams, the reverse operation of zfs send must be used to transform the streams back into files and directories. The command is zfs receive. The example below combines zfs send and zfs receive using a pipe to copy the data from one pool to another. This way, the data can be used directly on the receiving pool after the transfer is complete. A dataset can only be replicated to an empty dataset.

# zfs snapshot mypool@replica1
# zfs send -v mypool@replica1 | zfs receive backup/mypool
send from @ to mypool@replica1 estimated size is 50.1M
total estimated size is 50.1M
TIME        SENT   SNAPSHOT

# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  960M  63.7M   896M     6%  1.00x  ONLINE  -
mypool  984M  43.7M   940M     4%  1.00x  ONLINE  -

20.4.7.1. Incremental Backups

zfs send can also determine the difference between two snapshots and only send the changes between the two. This results in saving disk space and transfer time. For example:

# zfs snapshot mypool@replica2
# zfs list -t snapshot
NAME                    USED  AVAIL  REFER  MOUNTPOINT
mypool@replica1         5.72M      -  43.6M  -
mypool@replica2             0      -  44.1M  -
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  960M  61.7M   898M     6%  1.00x  ONLINE  -
mypool  960M  50.2M   910M     5%  1.00x  ONLINE  -

A second snapshot called replica2 was created. This second snapshot contains only the changes on the ZFS filesystem between now and the previous snapshot, replica1. Using -i with zfs send and indicating the pair of snapshots, an incremental replica stream can be generated, containing only the data that has changed. This can only succeed if the initial snapshot already exists on the receiving side.

# zfs send -v -i mypool@replica1 mypool@replica2 | zfs receive backup/mypool
send from @replica1 to mypool@replica2 estimated size is 5.02M
total estimated size is 5.02M
TIME        SENT   SNAPSHOT

# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  960M  80.8M   879M     8%  1.00x  ONLINE  -
mypool  960M  50.2M   910M     5%  1.00x  ONLINE  -

# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
backup                      55.4M   240G   152K  /backup
backup/mypool               55.3M   240G  55.2M  /backup/mypool
mypool                      55.6M  11.6G  55.0M  /mypool

# zfs list -t snapshot
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
backup/mypool@replica1                       104K      -  50.2M  -
backup/mypool@replica2                          0      -  55.2M  -
mypool@replica1                             29.9K      -  50.0M  -
mypool@replica2                                 0      -  55.0M  -

The incremental stream was successfully transferred, and only the data that had changed was replicated, rather than the entirety of replica1 and replica2, which both contain mostly the same data. The transmitted data contains only the differences, which took much less time to transfer and saved disk space by not copying the complete pool each time. This is useful when replicating over slow networks or when costs per transferred byte must be considered.

A new filesystem, backup/mypool, is available and has all of the files and data from the pool mypool. If -p is specified, the properties of the dataset will be copied, including compression settings, quotas, and mount points. If -R is specified, all child datasets of the indicated dataset will be copied, along with all of their properties. Sending and receiving can be automated so that regular backups are created on the second pool.
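
For instance, a recursive, property-preserving replication might look like this (the snapshot and target names are illustrative; the target dataset must not already exist):

# zfs snapshot -r mypool@replica3
# zfs send -R mypool@replica3 | zfs receive backup/mypool-full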

20.4.7.2. Sending Encrypted Backups over SSH

Although sending streams to another system over the network is a good way to keep a remote backup, it does come with a drawback. Data sent over the network link is not encrypted, allowing anyone to intercept and transform the streams back into data without the knowledge of the sending user. This is undesirable, especially when sending the streams over the internet to a remote host. SSH can be used to securely encrypt data sent over a network connection. Since ZFS only requires the stream to be redirected from standard output, it is relatively easy to pipe it through SSH. To keep the contents of the file system encrypted in transit and on the remote system, consider using PEFS.

A few settings and security precautions are required before this can be done. Since this chapter is about ZFS and not about configuring SSH, it only lists the things required to perform the zfs send operation. The following configuration is required:

  • Passwordless SSH access between sending and receiving host using SSH keys

  • Normally, the privileges of the root user are required to send and receive ZFS streams. This requires logging in to the receiving system as root. However, logging in as root is disabled by default for security reasons. The ZFS Delegation system can be used to allow a non-root user on each system to perform the respective send and receive operations.

  • On the sending system:

    # zfs allow -u someuser send,snapshot mypool
  • In order for the pool to be mounted, the unprivileged user must own the directory, and regular users must be allowed to mount file systems. On the receiving system:

    # sysctl vfs.usermount=1
    vfs.usermount: 0 -> 1
    # echo vfs.usermount=1 >> /etc/sysctl.conf
    # zfs create recvpool/backup
    # zfs allow -u someuser create,mount,receive recvpool/backup
    # chown someuser /recvpool/backup

The unprivileged user can now receive and mount the replicated stream. Then the pool can be replicated:

% zfs snapshot -r mypool/home@monday
% zfs send -R mypool/home@monday | ssh someuser@backuphost zfs recv -dvu recvpool/backup

The first command creates a recursive snapshot (option -r) called monday of the filesystem dataset home that resides on the pool mypool. The second command uses zfs send with -R, which makes sure that the dataset and all child datasets are included in the transmitted data stream. This also includes snapshots, clones, and settings on individual filesystems. The output is piped to the waiting zfs receive on the remote host backuphost via SSH. A fully qualified domain name or IP address should be used here. The receiving machine will write the data to the backup dataset on the recvpool pool. Using -d with zfs recv discards the pool name portion of the sent dataset name, using the remaining path under the receiving dataset instead. -u causes the filesystem(s) to not be mounted on the receiving side. When -v is included, more detail about the transfer is shown, including elapsed time and the amount of data transferred.

20.4.8. Dataset, User and Group Quotas

Dataset quotas are used to restrict the amount of space that can be consumed by a particular dataset. Reference Quotas work in very much the same way, but only count the space used by the dataset itself, excluding snapshots and child datasets. Similarly, user and group quotas can be used to prevent users or groups from using all of the space in the pool or dataset.

To enforce a dataset quota of 10 GB for storage/home/bob, use the following:

# zfs set quota=10G storage/home/bob

To enforce a reference quota of 10 GB for storage/home/bob, use the following:

# zfs set refquota=10G storage/home/bob

To set a user quota, the general format is userquota@user=size, and the user's name must be in one of the following formats:

  • POSIX compatible name such as joe.

  • POSIX numeric ID such as 789.

  • SID name such as joe.bloggs@example.com.

  • SID numeric ID such as S-1-123-456-789.

For example, to enforce a user quota of 50 GB for the user named joe:

# zfs set userquota@joe=50G storage/home/bob

To remove any quota:

# zfs set userquota@joe=none storage/home/bob

Note:

User quota properties are not displayed by zfs get all. Non-root users can only see their own quotas unless they have been granted the userquota privilege. Users with this privilege are able to view and set everyone's quota.

The general format for setting a group quota is: groupquota@group=size.

To set the quota for the group firstgroup to 50 GB, use:

# zfs set groupquota@firstgroup=50G storage/home/bob

To remove the quota for the group firstgroup, or to make sure that one is not set, instead use:

# zfs set groupquota@firstgroup=none storage/home/bob

As with the user quota property, non-root users can only see the quotas associated with the groups that they belong to. However, root or a user with the groupquota privilege can view and set all quotas for all groups.

To display the amount of space used by each user on a filesystem or snapshot, along with any specified quotas, use zfs userspace. For group information, use zfs groupspace. For more information about supported options or how to display only specific options, refer to zfs(8).
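
For example, continuing with the dataset from the earlier quota examples, the per-user and per-group accounting can be displayed like this:

# zfs userspace storage/home/bob
# zfs userspace -o name,used,quota storage/home/bob
# zfs groupspace storage/home/bob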

Users with sufficient privileges and root can list the quota for storage/home/bob using:

# zfs get quota storage/home/bob

20.4.9. Reservations

Reservations guarantee a minimum amount of space will always be available on a dataset. The reserved space will not be available to any other dataset. This feature can be especially useful to ensure that free space is available for an important dataset or log files.

The general format of the reservation property is reservation=size, so to set a reservation of 10 GB on storage/home/bob, use:

# zfs set reservation=10G storage/home/bob

To clear any reservation:

# zfs set reservation=none storage/home/bob

The same principle can be applied to the refreservation property for setting a Reference Reservation, with the general format refreservation=size.

This command shows any reservations or refreservations that exist on storage/home/bob:

# zfs get reservation storage/home/bob
# zfs get refreservation storage/home/bob

20.4.10. Deduplication

When enabled, Deduplication uses the checksum of each block to detect duplicate blocks. When a new block is a duplicate of an existing block, ZFS writes an additional reference to the existing data instead of the whole duplicate block. Tremendous space savings are possible if the data contains many duplicated files or repeated information. Be warned: deduplication requires an extremely large amount of memory, and most of the space savings can be had without the extra cost by enabling compression instead.

To activate deduplication, set the dedup property on the target pool:

# zfs set dedup=on pool

Only new data being written to the pool will be deduplicated. Data that has already been written to the pool will not be deduplicated merely by activating this option. As such, a pool with a freshly activated deduplication property will look something like this example:

# zpool list
NAME  SIZE ALLOC  FREE CAP DEDUP HEALTH ALTROOT
pool 2.84G 2.19M 2.83G  0% 1.00x ONLINE -

The DEDUP column shows the actual rate of deduplication for the pool. A value of 1.00x shows that data has not been deduplicated yet. In the next example, the ports tree is copied three times into different directories on the deduplicated pool created above.

# for d in dir1 dir2 dir3; do
for> mkdir $d && cp -R /usr/ports $d &
for> done

Redundant data is detected and deduplicated:

# zpool list
NAME SIZE  ALLOC FREE CAP DEDUP HEALTH ALTROOT
pool 2.84G 20.9M 2.82G 0% 3.00x ONLINE -

The DEDUP column now shows a factor of 3.00x. The multiple copies of the ports tree data were detected and deduplicated, taking only a third of the space. The potential for space savings can be enormous, but comes at the cost of having enough memory to keep track of the deduplicated blocks.

Deduplication is not always beneficial, especially when there is not much redundant data on a pool. ZFS can show potential space savings by simulating deduplication on an existing pool:

# zdb -S pool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.58M    289G    264G    264G    2.58M    289G    264G    264G
     2     206K   12.6G   10.4G   10.4G     430K   26.4G   21.6G   21.6G
     4    37.6K    692M    276M    276M     170K   3.04G   1.26G   1.26G
     8    2.18K   45.2M   19.4M   19.4M    20.0K    425M    176M    176M
    16      174   2.83M   1.20M   1.20M    3.33K   48.4M   20.4M   20.4M
    32       40   2.17M    222K    222K    1.70K   97.2M   9.91M   9.91M
    64        9     56K   10.5K   10.5K      865   4.96M    948K    948K
   128        2   9.50K      2K      2K      419   2.11M    438K    438K
   256        5   61.5K     12K     12K    1.90K   23.0M   4.47M   4.47M
    1K        2      1K      1K      1K    2.98K   1.49M   1.49M   1.49M
 Total    2.82M    303G    275G    275G    3.20M    319G    287G    287G

dedup = 1.05, compress = 1.11, copies = 1.00, dedup * compress / copies = 1.16

After zdb -S finishes analyzing the pool, it shows the space reduction ratio that would be achieved by activating deduplication. In this case, 1.16 is a very poor ratio that is mostly influenced by compression. Activating deduplication on this pool would not save any significant amount of space. Using the formula dedup * compress / copies = deduplication ratio, system administrators can plan storage allocation, deciding whether the workload contains enough duplicate blocks to justify the large memory requirements. As a rule of thumb, compression should be used before deduplication due to its much lower memory requirements.

20.4.11. Compression

ZFS provides transparent compression. Compressing data at the block level as it is written not only saves storage space, but can also result in higher disk throughput than would otherwise be possible. If data is compressed by 25%, then the compressed data can be written to the disk at the same rate as the uncompressed version, resulting in an effective write speed of 125% of what would normally be possible. Compression can also be a great alternative to Deduplication because it does not require additional memory to store a DDT.

ZFS offers a number of different compression algorithms to choose from, each with different trade-offs. With the introduction of LZ4 compression in ZFS v5000, it is possible to enable compression for the entire pool without the large performance trade-off of other algorithms. The biggest advantage to LZ4 is the early abort feature. If LZ4 does not achieve at least 12.5% compression in the first part of the data, the block is written uncompressed to avoid wasting CPU cycles trying to compress data that is either already compressed or uncompressible. For details about the different compression algorithms available in ZFS, see the Compression entry in the terminology section.
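
For example, LZ4 compression could be enabled on the root dataset of a pool so that all child datasets inherit it (the pool name is illustrative):

# zfs set compression=lz4 mypool
# zfs get -r compression mypool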

The administrator can monitor the effectiveness of ZFS compression using a number of dataset properties.

# zfs get used,compressratio,compression,logicalused mypool/compressed_dataset
NAME                        PROPERTY       VALUE  SOURCE
mypool/compressed_dataset   used           449G   -
mypool/compressed_dataset   compressratio  1.11x  -
mypool/compressed_dataset   compression    lz4    local
mypool/compressed_dataset   logicalused    496G   -

The dataset is currently using 449 GB of storage space (the used property). If this dataset were not compressed, it would have taken 496 GB of space (the logicalused property). This results in a compression ratio of 1.11:1.

Compression can have an unexpected side effect when combined with User Quotas. ZFS user quotas restrict how much space a user can consume on a dataset; however, the measurements are based on how much data is stored after compression. So if a user has a quota of 10 GB, and writes 10 GB of compressible data, they will still be able to store additional data. If they later update a file, say a database, with more or less compressible data, the amount of space available to them will change. This can result in the odd situation where a user did not increase the actual amount of data (the logicalused property), but the change in compression means they have now reached their quota.

Compression can have a similar unexpected interaction with backups. Quotas are often used to limit how much data can be stored to ensure there is sufficient backup space available. However since quotas do not consider compression, more data may be written than will fit in uncompressed backups.

20.4.12. ZFS and Jails

zfs jail and the corresponding jailed property are used to delegate a ZFS dataset to a Jail. zfs jail jailid dataset attaches a dataset to the specified jail, and zfs unjail jailid dataset detaches it. For the dataset to be administered from within a jail, the jailed property must be set. Once a dataset is jailed, it can no longer be mounted on the host because the jail administrator may have set unacceptable mount points.
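
A minimal sketch of this, assuming a jail with jail ID 1 and an illustrative dataset name:

# zfs create mypool/jaildata
# zfs set jailed=on mypool/jaildata
# zfs jail 1 mypool/jaildata

Inside the jail, the dataset can then be mounted and administered with zfs; zfs unjail 1 mypool/jaildata detaches it again.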
