Ceph RBD - ceph
Ceph is an open-source storage platform that stores its data in a storage cluster based on RADOS. It is highly scalable and, as a distributed system without a single point of failure, very reliable.
Tip
If you want to quickly set up a basic Ceph cluster, check out MicroCeph.
Ceph provides different components for block storage and for file systems.
Ceph RBD is Ceph’s block storage component that distributes data and workload across the Ceph cluster. It uses thin provisioning, which means that it is possible to over-commit resources.
Terminology
Ceph uses the term object for the data that it stores. The daemon that is responsible for storing and managing data is the Ceph OSD. Ceph’s storage is divided into pools, which are logical partitions for storing objects. They are also referred to as data pools, storage pools or OSD pools.
Ceph block devices are also called RBD images, and you can create snapshots and clones of these RBD images.
ceph driver in LXD
Note
To use the Ceph RBD driver, you must specify it as ceph. This is slightly misleading, because it uses only Ceph RBD (block storage) functionality, not full Ceph functionality. For storage volumes with content type filesystem (images, containers and custom file-system volumes), the ceph driver uses Ceph RBD images with a file system on top (see block.filesystem).
Alternatively, you can use the CephFS driver to create storage volumes with content type filesystem.
Unlike other storage drivers, this driver does not set up the storage system but assumes that you already have a Ceph cluster installed.
This driver also behaves differently than other drivers in that it provides remote storage. As a result and depending on the internal network, storage access might be a bit slower than for local storage. On the other hand, using remote storage has big advantages in a cluster setup, because all cluster members have access to the same storage pools with the exact same contents, without the need to synchronize storage pools.
The ceph driver in LXD uses RBD images for images, and snapshots and clones to create instances and snapshots.
LXD assumes that it has full control over the OSD storage pool. Therefore, you should never maintain any file system entities that are not owned by LXD in a LXD OSD storage pool, because LXD might delete them.
Due to the way copy-on-write works in Ceph RBD, parent RBD images can’t be removed until all children are gone. As a result, LXD automatically renames any objects that are removed but still referenced. Such objects are kept with a zombie_ prefix until all references are gone and the object can safely be removed.
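For example, a storage pool backed by an existing Ceph cluster might be created like this (a minimal sketch; my-ceph-pool and my-osd-pool are placeholder names, and a reachable Ceph cluster with a valid keyring for the admin user is assumed):

```
# Create a LXD storage pool that stores its data in the Ceph cluster "ceph".
# "my-ceph-pool" and "my-osd-pool" are example names.
lxc storage create my-ceph-pool ceph \
    ceph.cluster_name=ceph \
    ceph.user.name=admin \
    ceph.osd.pool_name=my-osd-pool

# Launch an instance that uses the new pool to confirm that it works.
lxc launch ubuntu:24.04 c1 --storage my-ceph-pool
```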
Limitations
The ceph driver has the following limitations:
Sharing custom volumes between instances
Custom storage volumes with content type filesystem can usually be shared between multiple instances on different cluster members. However, because the Ceph RBD driver "simulates" volumes with content type filesystem by putting a file system on top of an RBD image, custom storage volumes can only be assigned to a single instance at a time. If you need to share a custom volume with content type filesystem, use the CephFS driver instead.
Sharing the OSD storage pool between installations
Sharing the same OSD storage pool between multiple LXD installations is not supported.
Using an OSD pool of type "erasure"
To use a Ceph OSD pool of type "erasure", you must create the OSD pool beforehand. You must also create a separate OSD pool of type "replicated" that will be used for storing metadata. This is required because Ceph RBD does not support omap. To specify which pool is "erasure coded", set the ceph.osd.data_pool_name configuration option to the erasure coded pool name and the source configuration option to the replicated pool name.
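The commands below sketch one way to prepare such a pair of pools and point LXD at them; the pool names lxd-ec-data, lxd-ec-meta and my-ec-pool are placeholders, and administrative access to the Ceph cluster is assumed:

```
# Create the erasure coded data pool and allow RBD to overwrite objects in it.
ceph osd pool create lxd-ec-data erasure
ceph osd pool set lxd-ec-data allow_ec_overwrites true

# Create the pool that will store the metadata (replicated is the default type).
ceph osd pool create lxd-ec-meta

# Use the replicated pool as the source and the erasure coded pool for the data.
lxc storage create my-ec-pool ceph \
    source=lxd-ec-meta \
    ceph.osd.data_pool_name=lxd-ec-data
```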
Configuration options
The following configuration options are available for storage pools that use the ceph driver and for storage volumes in these pools.
Storage pool configuration
ceph.cluster_name
Name of the Ceph cluster in which to create new storage pools
Key: ceph.cluster_name
Type: string
Default: ceph
Scope: global
ceph.osd.data_pool_name
Name of the OSD data pool
Key: ceph.osd.data_pool_name
Type: string
Scope: global
ceph.osd.pg_num
Number of placement groups for the OSD storage pool
Key: ceph.osd.pg_num
Type: string
Default: 32
Scope: global
ceph.osd.pool_name
Name of the OSD storage pool
Key: ceph.osd.pool_name
Type: string
Default: name of the pool
Scope: global
ceph.osd.pool_size
Number of RADOS object replicas. Set to 1 for no replication.
Key: ceph.osd.pool_size
Type: string
Default: 3
Scope: global
ceph.rbd.clone_copy
Whether to use RBD lightweight clones
Key: ceph.rbd.clone_copy
Type: bool
Default: true
Scope: global
Enable this option to use RBD lightweight clones rather than full dataset copies.
ceph.rbd.du
Whether to use RBD du
Key: ceph.rbd.du
Type: bool
Default: true
Scope: global
This option specifies whether to use RBD du to obtain disk usage data for stopped instances.
ceph.rbd.features
Comma-separated list of RBD features to enable on the volumes
Key: ceph.rbd.features
Type: string
Default: layering
Scope: global
ceph.user.name
The Ceph user to use when creating storage pools and volumes
Key: ceph.user.name
Type: string
Default: admin
Scope: global
source
Existing OSD storage pool to use
Key: source
Type: string
Scope: local
volatile.pool.pristine
Whether the pool was empty at creation time
Key: volatile.pool.pristine
Type: string
Default: true
Scope: global
Tip
In addition to these configurations, you can also set default values for the storage volume configurations. See Configure default values for storage volumes.
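For example, pool-level defaults for new volumes could be set like this (a sketch; my-ceph-pool is a placeholder name):

```
# New volumes in this pool default to a 10 GiB quota and an ext4 file system.
lxc storage set my-ceph-pool volume.size 10GiB
lxc storage set my-ceph-pool volume.block.filesystem ext4
```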
Storage volume configuration
block.filesystem
File system of the storage volume
Key: block.filesystem
Type: string
Default: same as volume.block.filesystem
Condition: block-based volume with content type filesystem
Scope: global
Valid options are: btrfs, ext4, xfs
If not set, ext4 is assumed.
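For example, a custom volume could be created with a different file system like this (a sketch; the pool and volume names are placeholders):

```
# Create a custom volume whose backing RBD image is formatted with Btrfs.
lxc storage volume create my-ceph-pool my-volume block.filesystem=btrfs
```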
block.mount_options
Mount options for block-backed file system volumes
Key: block.mount_options
Type: string
Default: same as volume.block.mount_options
Condition: block-based volume with content type filesystem
Scope: global
security.shared
Enable volume sharing
Key: security.shared
Type: bool
Default: same as volume.security.shared or false
Condition: virtual-machine or custom block volume
Scope: global
Enabling this option allows sharing the volume across multiple instances despite the possibility of data loss.
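For example (a sketch with placeholder names; the instances sharing the volume must coordinate access themselves, otherwise data loss can occur):

```
# Allow an existing custom block volume to be attached to several instances at once.
lxc storage volume set my-ceph-pool my-block-volume security.shared true
```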
security.shifted
Enable ID shifting overlay
Key: security.shifted
Type: bool
Default: same as volume.security.shifted or false
Condition: custom volume
Scope: global
Enabling this option allows attaching the volume to multiple isolated instances.
security.unmapped
Disable ID mapping for the volume
Key: security.unmapped
Type: bool
Default: same as volume.security.unmapped or false
Condition: custom volume
Scope: global
size
Size/quota of the storage volume
Key: size
Type: string
Default: same as volume.size
Condition: appropriate driver
Scope: global
snapshots.expiry
When snapshots are to be deleted
Key: snapshots.expiry
Type: string
Default: same as volume.snapshots.expiry
Condition: custom volume
Scope: global
Specify an expression like 1M 2H 3d 4w 5m 6y.
snapshots.pattern
Template for the snapshot name
Key: snapshots.pattern
Type: string
Default: same as volume.snapshots.pattern or snap%d
Condition: custom volume
Scope: global
You can specify a naming template that is used for scheduled snapshots and unnamed snapshots.
The snapshots.pattern option takes a Pongo2 template string to format the snapshot name.
To add a time stamp to the snapshot name, use the Pongo2 context variable creation_date. Make sure to format the date in your template string to avoid forbidden characters in the snapshot name. For example, set snapshots.pattern to {{ creation_date|date:'2006-01-02_15-04-05' }} to name the snapshots after their time of creation, down to the precision of a second.
Another way to avoid name collisions is to use the placeholder %d in the pattern. For the first snapshot, the placeholder is replaced with 0. For subsequent snapshots, the existing snapshot names are taken into account to find the highest number at the placeholder’s position. This number is then incremented by one for the new name.
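For example, a time-stamped naming pattern could be set like this (a sketch; the pool and volume names are placeholders):

```
# Name snapshots after their creation time, for example 2024-05-01_13-37-00.
lxc storage volume set my-ceph-pool my-volume \
    snapshots.pattern "{{ creation_date|date:'2006-01-02_15-04-05' }}"
```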
snapshots.schedule
Schedule for automatic volume snapshots
Key: snapshots.schedule
Type: string
Default: same as volume.snapshots.schedule
Condition: custom volume
Scope: global
Specify either a cron expression (<minute> <hour> <dom> <month> <dow>), a comma-separated list of schedule aliases (@hourly, @daily, @midnight, @weekly, @monthly, @annually, @yearly), or leave empty to disable automatic snapshots (the default).
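For example (a sketch; the pool and volume names are placeholders):

```
# Take a snapshot of the volume every day at 06:00.
lxc storage volume set my-ceph-pool my-volume snapshots.schedule "0 6 * * *"

# Or use one of the schedule aliases.
lxc storage volume set my-ceph-pool my-volume snapshots.schedule @daily
```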
volatile.idmap.last
JSON-serialized UID/GID map that has been applied to the volume
Key: volatile.idmap.last
Type: string
Condition: filesystem
volatile.idmap.next
JSON-serialized UID/GID map that will be applied to the volume
Key: volatile.idmap.next
Type: string
Condition: filesystem
volatile.uuid
The volume’s UUID
Key: volatile.uuid
Type: string
Default: random UUID
Scope: global