ZFS - zfs
ZFS combines both physical volume management and a file system. A ZFS installation can span across a series of storage devices and is very scalable, allowing you to add disks to expand the available space in the storage pool immediately.
ZFS is a block-based file system that protects against data corruption by using checksums to verify, confirm and correct every operation. To run at a sufficient speed, this mechanism requires a powerful environment with a lot of RAM.
In addition, ZFS offers snapshots and replication, RAID management, copy-on-write clones, compression and other features.
To use ZFS, make sure you have zfsutils-linux installed on your machine.
Terminology
ZFS creates logical units based on physical storage devices. These logical units are called ZFS pools or zpools. Each zpool is then divided into a number of datasets. These datasets can be of different types:
- A ZFS filesystem can be seen as a partition or a mounted file system.
- A ZFS volume represents a block device.
- A ZFS snapshot captures a specific state of either a ZFS filesystem or a ZFS volume. ZFS snapshots are read-only.
- A ZFS clone is a writable copy of a ZFS snapshot.
zfs driver in LXD
The zfs driver in LXD uses ZFS filesystems and ZFS volumes for images and custom storage volumes, and ZFS snapshots and clones to create instances from images and for instance and custom volume snapshots. By default, LXD enables compression when creating a ZFS pool.
LXD assumes that it has full control over the ZFS pool and dataset. Therefore, you should never maintain any datasets or file system entities that are not owned by LXD in a ZFS pool or dataset, because LXD might delete them.
Due to the way copy-on-write works in ZFS, parent ZFS filesystems can't be removed until all children are gone. As a result, LXD automatically renames any objects that are removed but still referenced. Such objects are kept at a random deleted/ path until all references are gone and the object can safely be removed. Note that this method might have ramifications for restoring snapshots. See Limitations below.
LXD automatically enables trimming support on all newly created pools on ZFS 0.8 or later. This increases the lifetime of SSDs by allowing better block re-use by the controller, and it also makes it possible to free space on the root file system when using a loop-backed ZFS pool. If you are running a ZFS version earlier than 0.8 and want to enable trimming, upgrade to at least version 0.8. Then use the following commands to make sure that trimming is automatically enabled for the ZFS pool in the future and to trim all currently unused space:
zpool upgrade ZPOOL-NAME
zpool set autotrim=on ZPOOL-NAME
zpool trim ZPOOL-NAME
Limitations
The zfs driver has the following limitations:
Restoring from older snapshots
ZFS doesn't support restoring from snapshots other than the latest one. You can, however, create new instances from older snapshots. This method makes it possible to confirm whether a specific snapshot contains what you need. After determining the correct snapshot, you can remove the newer snapshots so that the snapshot you need is the latest one and you can restore it.
Alternatively, you can configure LXD to automatically discard the newer snapshots during restore. To do so, set the zfs.remove_snapshots configuration for the volume (or the corresponding volume.zfs.remove_snapshots configuration on the storage pool for all volumes in the pool).
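As a minimal sketch, the two ways of setting this option could look as follows. The function name and the pool and volume names (my-pool, my-volume) are hypothetical. The commands are only printed by default so that they can be reviewed first; set LXC=lxc in the environment to run them.

```shell
# Dry run by default: each command is printed instead of executed.
# Set LXC=lxc in the environment to run the commands for real.
LXC="${LXC:-echo lxc}"

# enable_remove_snapshots POOL VOLUME (hypothetical helper)
enable_remove_snapshots() {
  pool=$1 volume=$2
  # Discard newer snapshots on restore for one custom volume.
  $LXC storage volume set "$pool" "$volume" zfs.remove_snapshots true
  # Or make this the default for all volumes in the pool.
  $LXC storage set "$pool" volume.zfs.remove_snapshots true
}
```

For example, `enable_remove_snapshots my-pool my-volume` prints both commands. In practice you would normally pick only one of the two settings.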
Note, however, that if zfs.clone_copy is set to true, instance copies use ZFS snapshots too. In that case, you cannot restore an instance to a snapshot taken before the last copy without having to also delete all its descendants. If this is not an option, you can copy the wanted snapshot into a new instance and then delete the old instance. You will, however, lose any other snapshots the instance might have had.
Observing I/O quotas
I/O quotas are unlikely to affect ZFS filesystems very much. That's because ZFS is a port of a Solaris module (using SPL) and not a native Linux file system using the Linux VFS API, which is where I/O limits are applied.
Feature support in ZFS
Some features, like the use of idmaps or delegation of a ZFS dataset, require ZFS 2.2 or higher and are therefore not widely available yet.
Quotas
ZFS provides two different quota properties: quota and refquota. quota restricts the total size of a dataset, including its snapshots and clones. refquota restricts only the size of the data in the dataset, not its snapshots and clones.
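The difference between the two properties can be illustrated with a simplified accounting model. This is only a sketch of the rule described above, not of how ZFS enforces limits internally. The function name and the choice of MiB as the unit are assumptions made for illustration.

```shell
# exceeds_limit PROPERTY LIMIT DATA DESCENDANTS (hypothetical helper, sizes in MiB)
# DATA is the space used by the dataset itself.
# DESCENDANTS is the space used by its snapshots and clones.
exceeds_limit() {
  property=$1 limit=$2 data=$3 descendants=$4
  case $property in
    quota)    used=$((data + descendants)) ;;  # snapshots and clones count
    refquota) used=$data ;;                    # only the dataset's own data counts
    *) echo "unknown property: $property" >&2; return 2 ;;
  esac
  # Succeeds (exit status 0) if the counted usage is above the limit.
  [ "$used" -gt "$limit" ]
}
```

With a 4096 MiB limit, 3072 MiB of data and 2048 MiB of snapshots, `exceeds_limit quota 4096 3072 2048` succeeds because 5120 MiB is counted, while `exceeds_limit refquota 4096 3072 2048` fails because only 3072 MiB is counted.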
By default, LXD uses the quota property when you set up a size/quota for your storage volume. If you want to use the refquota property instead, set the zfs.use_refquota configuration for the volume (or the corresponding volume.zfs.use_refquota configuration on the storage pool for all volumes in the pool).
You can also set the zfs.reserve_space (or volume.zfs.reserve_space) configuration to use ZFS reservation or refreservation along with quota or refquota.
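A minimal sketch of creating a custom volume that uses refquota together with a reservation might look like this. The function name and the pool name, volume name and size are hypothetical. The command is only printed by default; set LXC=lxc in the environment to run it.

```shell
# Dry run by default: the command is printed instead of executed.
# Set LXC=lxc in the environment to run it for real.
LXC="${LXC:-echo lxc}"

# create_refquota_volume POOL VOLUME SIZE (hypothetical helper)
create_refquota_volume() {
  pool=$1 volume=$2 size=$3
  # Limit only the volume's own data and reserve the same amount of space.
  $LXC storage volume create "$pool" "$volume" \
    zfs.use_refquota=true zfs.reserve_space=true size="$size"
}
```

For example, `create_refquota_volume my-pool my-volume 10GiB` prints the create command with all three keys set.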
Configuration options
The following configuration options are available for storage pools that use the zfs driver and for storage volumes in these pools.
Storage pool configuration
size
Size of the storage pool (for loop-based pools)
Key: size
Type: string
Default: auto (20% of free disk space, >= 5 GiB and <= 30 GiB)
Scope: local
When creating loop-based pools, specify the size in bytes (suffixes are supported). You can increase the size to grow the storage pool.
The default (auto) creates a storage pool that uses 20% of the free disk space, with a minimum of 5 GiB and a maximum of 30 GiB.
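The sizing rule for auto can be sketched as a small calculation. This only mirrors the rule as stated above and is not taken from the LXD source code. The function name is hypothetical, and all sizes are in bytes.

```shell
GIB=$((1024 * 1024 * 1024))

# default_pool_size FREE_BYTES (hypothetical helper)
# Prints 20% of the free space, clamped to the range 5 GiB to 30 GiB.
default_pool_size() {
  free=$1
  size=$((free * 20 / 100))
  min=$((5 * GIB))
  max=$((30 * GIB))
  [ "$size" -lt "$min" ] && size=$min
  [ "$size" -gt "$max" ] && size=$max
  echo "$size"
}
```

With 100 GiB of free space, the result is 20 GiB. With 10 GiB free, the result is raised to the 5 GiB minimum, and with 1000 GiB free it is capped at 30 GiB.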
source
Path to an existing block device, loop file, or ZFS dataset/pool
Key: source
Type: string
Scope: local
source.wipe
Whether to wipe the block device before creating the pool
Key: source.wipe
Type: bool
Default: false
Scope: local
Set this option to true to wipe the block device specified in source prior to creating the storage pool.
zfs.clone_copy
Whether to use ZFS lightweight clones
Key: zfs.clone_copy
Type: string
Default: true
Scope: global
Set this option to true or false to enable or disable using ZFS lightweight clones rather than full dataset copies. Set the option to rebase to copy based on the initial image.
zfs.export
Whether to export the zpool when an unmount is being performed
Key: zfs.export
Type: bool
Default: true
Scope: global
zfs.pool_name
Name of the zpool
Key: zfs.pool_name
Type: string
Default: name of the pool
Scope: local
Tip: In addition to these configurations, you can also set default values for the storage volume configurations. See Configure default values for storage volumes.
Storage volume configuration
block.filesystem
File system of the storage volume
Key: block.filesystem
Type: string
Default: same as volume.block.filesystem
Condition: block-based volume with content type filesystem (zfs.block_mode enabled)
Scope: global
Valid options are: btrfs, ext4, xfs. If not set, ext4 is assumed.
block.mount_options
Mount options for block-backed file system volumes
Key: block.mount_options
Type: string
Default: same as volume.block.mount_options
Condition: block-based volume with content type filesystem (zfs.block_mode enabled)
Scope: global
security.shared
Enable volume sharing
Key: security.shared
Type: bool
Default: same as volume.security.shared or false
Condition: virtual-machine or custom block volume
Scope: global
Enabling this option allows sharing the volume across multiple instances despite the possibility of data loss.
security.shifted
Enable ID shifting overlay
Key: security.shifted
Type: bool
Default: same as volume.security.shifted or false
Condition: custom volume
Scope: global
Enabling this option allows attaching the volume to multiple isolated instances.
security.unmapped
Disable ID mapping for the volume
Key: security.unmapped
Type: bool
Default: same as volume.security.unmapped or false
Condition: custom volume
Scope: global
size
Size/quota of the storage volume
Key: size
Type: string
Default: same as volume.size
Condition: appropriate driver
Scope: global
snapshots.expiry
When snapshots are to be deleted
Key: snapshots.expiry
Type: string
Default: same as volume.snapshots.expiry
Condition: custom volume
Scope: global
Specify an expression like 1M 2H 3d 4w 5m 6y.
snapshots.pattern
Template for the snapshot name
Key: snapshots.pattern
Type: string
Default: same as volume.snapshots.pattern or snap%d
Condition: custom volume
Scope: global
You can specify a naming template that is used for scheduled snapshots and unnamed snapshots.
The snapshots.pattern option takes a Pongo2 template string to format the snapshot name.
To add a time stamp to the snapshot name, use the Pongo2 context variable creation_date. Make sure to format the date in your template string to avoid forbidden characters in the snapshot name. For example, set snapshots.pattern to {{ creation_date|date:'2006-01-02_15-04-05' }} to name the snapshots after their time of creation, down to the precision of a second.
Another way to avoid name collisions is to use the placeholder %d in the pattern. For the first snapshot, the placeholder is replaced with 0. For subsequent snapshots, the existing snapshot names are taken into account to find the highest number at the placeholder's position. This number is then incremented by one for the new name.
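The numbering rule for the %d placeholder can be sketched as follows. This is an illustration of the behavior described above, not LXD's actual implementation. The function name is hypothetical, and the sketch assumes that the pattern contains exactly one %d and that existing numbers have no leading zeros.

```shell
# next_snapshot_name PATTERN [EXISTING_NAME...] (hypothetical helper)
# Prints the name that the next snapshot would get.
next_snapshot_name() {
  pattern=$1; shift
  prefix=${pattern%%"%d"*}   # text before the placeholder
  suffix=${pattern#*"%d"}    # text after the placeholder
  next=0                     # the first snapshot gets number 0
  for name in "$@"; do
    # Skip names that do not fit the pattern.
    case $name in
      "$prefix"*"$suffix") ;;
      *) continue ;;
    esac
    number=${name#"$prefix"}
    number=${number%"$suffix"}
    # Skip names where the placeholder position is not a plain number.
    case $number in
      ''|*[!0-9]*) continue ;;
    esac
    # Keep one more than the highest number seen so far.
    [ "$number" -ge "$next" ] && next=$((number + 1))
  done
  printf '%s%d%s\n' "$prefix" "$next" "$suffix"
}
```

For example, `next_snapshot_name 'snap%d'` prints snap0, and `next_snapshot_name 'snap%d' snap0 snap1 snap7 backup3` prints snap8, because backup3 does not match the pattern.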
snapshots.schedule
Schedule for automatic volume snapshots
Key: snapshots.schedule
Type: string
Default: same as volume.snapshots.schedule
Condition: custom volume
Scope: global
Specify either a cron expression (<minute> <hour> <dom> <month> <dow>), a comma-separated list of schedule aliases (@hourly, @daily, @midnight, @weekly, @monthly, @annually, @yearly), or leave empty to disable automatic snapshots (the default).
volatile.idmap.last
JSON-serialized UID/GID map that has been applied to the volume
Key: volatile.idmap.last
Type: string
Condition: filesystem
volatile.idmap.next
JSON-serialized UID/GID map that has been applied to the volume
Key: volatile.idmap.next
Type: string
Condition: filesystem
volatile.uuid
The volume's UUID
Key: volatile.uuid
Type: string
Default: random UUID
Scope: global
zfs.block_mode
Whether to use a formatted zvol rather than a dataset
Key: zfs.block_mode
Type: bool
Default: same as volume.zfs.block_mode
Scope: global
zfs.block_mode can be set only for custom storage volumes. To enable ZFS block mode for all storage volumes in the pool, including instance volumes, use volume.zfs.block_mode.
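As a minimal sketch, enabling block mode for a single custom volume or for the whole pool could look like this. The function names and the pool and volume names are hypothetical. The commands are only printed by default; set LXC=lxc in the environment to run them.

```shell
# Dry run by default: each command is printed instead of executed.
# Set LXC=lxc in the environment to run the commands for real.
LXC="${LXC:-echo lxc}"

# create_block_mode_volume POOL VOLUME (hypothetical helper)
# Creates one custom volume backed by a formatted zvol.
create_block_mode_volume() {
  pool=$1 volume=$2
  $LXC storage volume create "$pool" "$volume" \
    zfs.block_mode=true block.filesystem=ext4
}

# enable_block_mode_for_pool POOL (hypothetical helper)
# Makes block mode the default for all volumes in the pool,
# including instance volumes.
enable_block_mode_for_pool() {
  pool=$1
  $LXC storage set "$pool" volume.zfs.block_mode true
}
```

The block.filesystem key is set explicitly here only to make the choice visible; ext4 is the documented fallback when it is not set.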
zfs.blocksize
Size of the ZFS block
Key: zfs.blocksize
Type: string
Default: same as volume.zfs.blocksize
Scope: global
The size must be between 512 bytes and 16 MiB and must be a power of 2. For a block volume, a maximum value of 128 KiB will be used even if a higher value is set.
Depending on the value of zfs.block_mode, the specified size is used to set either volblocksize or recordsize in ZFS.
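The constraints on zfs.blocksize can be sketched as a small validation function. It only restates the rules given above and is not LXD's actual code. The function name and its second argument (true for a block volume) are assumptions made for illustration, and the size is given in bytes.

```shell
# effective_blocksize SIZE_BYTES IS_BLOCK_VOLUME (hypothetical helper)
# Prints the block size that would be used, or fails if SIZE_BYTES is invalid.
effective_blocksize() {
  size=$1
  is_block_volume=$2
  min=512
  max=$((16 * 1024 * 1024))   # 16 MiB
  cap=$((128 * 1024))         # 128 KiB cap for block volumes

  if [ "$size" -lt "$min" ] || [ "$size" -gt "$max" ]; then
    echo "size must be between 512 bytes and 16 MiB" >&2
    return 1
  fi
  # A power of 2 has exactly one bit set, so size & (size - 1) is 0.
  if [ $((size & (size - 1))) -ne 0 ]; then
    echo "size must be a power of 2" >&2
    return 1
  fi
  if [ "$is_block_volume" = "true" ] && [ "$size" -gt "$cap" ]; then
    size=$cap
  fi
  echo "$size"
}
```

For example, `effective_blocksize 1048576 true` prints 131072, `effective_blocksize 1048576 false` prints 1048576, and `effective_blocksize 1000 false` fails because 1000 is not a power of 2.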
zfs.delegate
Whether to delegate the ZFS dataset
Key: zfs.delegate
Type: bool
Default: same as volume.zfs.delegate
Condition: ZFS 2.2 or higher
Scope: global
This option controls whether to delegate the ZFS dataset and anything underneath it to the container or containers that use it. When used in conjunction with security.nesting, this allows using the zfs command in the container.
zfs.remove_snapshots
Remove snapshots as needed
Key: zfs.remove_snapshots
Type: bool
Default: same as volume.zfs.remove_snapshots
or false
Scope: global
zfs.reserve_space
Use reservation/refreservation along with quota/refquota
Key: zfs.reserve_space
Type: bool
Default: same as volume.zfs.reserve_space
or false
Scope: global
zfs.use_refquota
Use refquota instead of quota for space
Key: zfs.use_refquota
Type: bool
Default: same as volume.zfs.use_refquota
or false
Scope: global
Storage bucket configuration
To enable storage buckets for local storage pool drivers and allow applications to access the buckets via the S3 protocol, you must configure the core.storage_buckets_address server setting.
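A minimal sketch of this setup might look as follows. The function name, the listen address (:8555) and the pool, bucket and size values are assumptions chosen for illustration. The commands are only printed by default; set LXC=lxc in the environment to run them.

```shell
# Dry run by default: each command is printed instead of executed.
# Set LXC=lxc in the environment to run the commands for real.
LXC="${LXC:-echo lxc}"

# setup_bucket ADDRESS POOL BUCKET SIZE (hypothetical helper)
setup_bucket() {
  address=$1 pool=$2 bucket=$3 size=$4
  # Make the S3 endpoint for storage buckets listen on ADDRESS.
  $LXC config set core.storage_buckets_address "$address"
  # Create a bucket with a size/quota in the given pool.
  $LXC storage bucket create "$pool" "$bucket" size="$size"
}
```

For example, `setup_bucket :8555 my-pool my-bucket 1GiB` prints the server setting followed by the bucket creation command.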
size
Size/quota of the storage bucket
Key: size
Type: string
Default: same as volume.size
Condition: appropriate driver
Scope: local