Administering ZFS

What is ZFS?

ZFS is an advanced, enterprise-grade copy-on-write (COW) filesystem and volume manager. It has many features that make it easier to administer than traditional UNIX filesystem and volume manager stacks.

Key Features

  1. Online filesystem/volume resizing.
  2. Realtime data deduplication and compression.
  3. Filesystem snapshots.
  4. Checksum all the things.
  5. Volume management and RAID.

Naming Conventions

ZFS refers to objects at the filesystem level as ZFS datasets.
ZFS refers to pools of storage devices as Zpools.

Online filesystem resizing

By default, every ZFS dataset has access to all of the free space in the Zpool it was created under, so you only need to set quotas when creating new datasets. This makes resizing a dataset later trivial.
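A quota can also be set at creation time with -o quota=. The sketch below only prints the command rather than running it, since it needs a live pool; the dataset name "projects" is hypothetical:

```shell
# Build the creation command for a dataset capped at 10G.
# Printed rather than executed, since it requires a live Zpool.
pool=datastor
echo "zfs create -o quota=10G ${pool}/projects"
```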

Resizing a Zpool is also fairly straightforward: replace the disks in the storage pool one at a time, and ZFS resizes the Zpool automatically.

In the example below we will increase the size of a mirrored (RAID 1) Zpool by replacing its disks:

root@sandbox.donthurt.us:~ # zpool status
  pool: datastor
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        datastor         ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            /root/zfs-1  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0

errors: No known data errors
root@sandbox.donthurt.us:~ # zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
datastor  1008M    65K  1008M         -     0%     0%  1.00x  ONLINE  -

Replace each disk individually, allowing the pool to resilver before moving on to the next disk:

root@sandbox.donthurt.us:~ # zpool replace datastor /root/zfs-1 /root/zfs-7
root@sandbox.donthurt.us:~ # zpool status
  pool: datastor
 state: ONLINE
  scan: resilvered 59.5K in 0h0m with 0 errors on Fri May  6 14:52:32 2016
config:

        NAME             STATE     READ WRITE CKSUM
        datastor         ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            /root/zfs-7  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0

errors: No known data errors
root@sandbox.donthurt.us:~ # zpool replace datastor /root/zfs-2 /root/zfs-8

If the autoexpand property had been turned on for this Zpool (zpool set autoexpand=on datastor), the pool size would have expanded automatically after the second disk was replaced. If not, you have to issue the following command to expand the Zpool:

root@sandbox.donthurt.us:~ # zpool online -e datastor /root/zfs-8 /root/zfs-7
root@sandbox.donthurt.us:~ # zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
datastor  1.98G   150K  1.98G         -     0%     0%  1.00x  ONLINE  -

Resizing a ZFS dataset is even easier, provided it wasn't created as a block device (zvol):

root@sandbox.donthurt.us:~ # zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
datastor          218K  1.92G  19.5K  none
datastor/backup    19K  1.92G    19K  /backup
datastor/codex     19K  1.92G    19K  none
datastor/test      19K  1.92G    19K  none
root@sandbox.donthurt.us:~ # zfs set quota=500M datastor/backup
root@sandbox.donthurt.us:~ # zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
datastor          220K  1.92G  19.5K  none
datastor/backup    19K   500M    19K  /backup
datastor/codex     19K  1.92G    19K  none
datastor/test      19K  1.92G    19K  none

Data Deduplication and Compression

Deduplication is turned off by default on all ZFS datasets because of the amount of RAM the feature consumes: for maximum performance, ZFS stores its deduplication table in memory. As a rule of thumb, ZFS wants about 5GB of RAM for each TB of deduplicated data. Keep in mind that this is the amount of data being deduplicated, not the size of the dataset.
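Applying that rule of thumb, sizing the dedup table for a hypothetical 4TB of deduplicated data looks like this:

```shell
# Rough RAM estimate for the dedup table, using the 5GB-per-TB
# rule of thumb above; the 4TB figure is a hypothetical example.
deduped_tb=4
ram_gb=$(( deduped_tb * 5 ))
echo "Estimated dedup table RAM: ${ram_gb} GB"
```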

Enabling deduplication is as simple as turning on the flag:

root@sandbox.donthurt.us:~ # zfs set dedup=on datastor/backup

ZFS can take advantage of a few different compression algorithms: LZ4, ZLE, LZJB and GZIP[1-9]. LZ4 provides the best speed-to-compression ratio of the algorithms available.

Enabling compression is done the same way:

root@sandbox.donthurt.us:~ # zfs set compression=lz4 datastor/backup

Deduplication and compression statistics can be gathered using the following commands:

root@sandbox.donthurt.us:~ # zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
datastor  1.98G  58.0M  1.93G         -     4%     2%  2.46x  ONLINE  -
root@sandbox.donthurt.us:~ # zfs get compressratio datastor/backup
NAME             PROPERTY       VALUE  SOURCE
datastor/backup  compressratio  1.77x  -
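The compressratio value is the logical (uncompressed) size divided by the physical size on disk. As a quick sanity check of the numbers above (treating the figures as approximate), roughly 103M of logical data at 1.77x works out to about the 58M of ALLOC that zpool list reports:

```shell
# compressratio = logical size / physical size, so
# physical ~= logical / ratio. Integer arithmetic sketch.
logical_mb=103
physical_mb=$(( logical_mb * 100 / 177 ))  # ratio of 1.77x
echo "~${physical_mb} MB actually stored on disk"
```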

Snapshots

Snapshots are just what they sound like: a read-only, point-in-time copy of a ZFS dataset. A new snapshot consumes almost no space, and grows only as the parent dataset diverges from it, since only the changed blocks are retained. Snapshots can be manipulated in many ways; for example, they can be sent from one dataset to another, or even from one server to another.

To take a snapshot of a dataset issue the following command:

root@sandbox.donthurt.us:~ # zfs snapshot datastor/backup@$(date +%s)
root@sandbox.donthurt.us:~ # zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
datastor/backup@1462563611      0      -   103M  -
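The $(date +%s) in the snapshot command above produces an epoch-timestamp name, which has the nice property of sorting chronologically. The naming scheme on its own looks like this:

```shell
# Build an epoch-timestamped snapshot name, as used above.
# Only the name is constructed here; no snapshot is taken.
snap="datastor/backup@$(date +%s)"
echo "$snap"
```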

Creating a new dataset from a snapshot:

root@sandbox.donthurt.us:~ # zfs send datastor/backup@1462563611 | zfs receive datastor/backup-snap
root@sandbox.donthurt.us:~ # zfs set mountpoint=/backup2 datastor/backup-snap
root@sandbox.donthurt.us:~ # ll /backup2
total 9.5K
drwxr-xr-x. 4 root root 4 May 6 15:21 ./
dr-xr-xr-x. 25 root root 4.0K May 6 15:54 ../
drwx--x--x. 24 donthurt donthurt 42 Mar 31 02:07 donthurt1/
drwx--x--x. 24 donthurt donthurt 42 Mar 31 02:07 donthurt2/

Sending and receiving snapshots between hosts:

root@sandbox.donthurt.us:~ # zfs send datastor/backup@1462563611 | ssh root@sandbox1.donthurt.us zfs receive datastor/backup-remote
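Once an initial full stream has been received on the remote side, subsequent transfers can be made incremental with zfs send -i, which ships only the blocks that changed between two snapshots. A sketch (the second snapshot name is hypothetical, and the pipeline is printed rather than executed since it needs live pools on both hosts):

```shell
# Construct an incremental send/receive pipeline between two snapshots.
old="datastor/backup@1462563611"
new="datastor/backup@1462650011"   # hypothetical later snapshot
echo "zfs send -i ${old} ${new} | ssh root@sandbox1.donthurt.us zfs receive datastor/backup-remote"
```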


Checksums and data integrity

Every block that is allocated is checksummed and the checksums are checksummed. The checksum algorithm can be set per ZFS dataset; available checksum algorithms are fletcher2, fletcher4 and sha256. The checksum of each block is transparently validated as it is read, allowing ZFS to detect silent corruption. If the data that is read does not match the expected checksum, ZFS will attempt to recover the data from any available redundancy, like mirrors or RAID-Z.
Validation of all checksums in a pool can be triggered with the zpool scrub command; progress and results are reported by zpool status.

To change the checksum algorithm on a ZFS dataset, issue the following command:

root@sandbox.donthurt.us:~ # zfs set checksum=sha256 datastor/backup
root@sandbox.donthurt.us:~ # zfs get checksum
NAME                             PROPERTY  VALUE   SOURCE
datastor                         checksum  on      default
datastor/backup                  checksum  sha256  local
datastor/backup@1462563611       checksum  -       -
datastor/backup-snap             checksum  on      default
datastor/backup-snap@1462563611  checksum  -       -
datastor/codex                   checksum  on      default
datastor/test                    checksum  on      default

Volume Management and RAID

ZFS supports many different RAID configurations.

RAID-0. Striped across all physical disks, no redundancy:

root@sandbox.donthurt.us:~ # zpool create zstripe /root/zfs-1 /root/zfs-2 /root/zfs-3 /root/zfs-4 /root/zfs-5
root@sandbox.donthurt.us:~ # zpool status
  pool: zstripe
 state: ONLINE
  scan: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        zstripe        ONLINE       0     0     0
          /root/zfs-1  ONLINE       0     0     0
          /root/zfs-2  ONLINE       0     0     0
          /root/zfs-3  ONLINE       0     0     0
          /root/zfs-4  ONLINE       0     0     0
          /root/zfs-5  ONLINE       0     0     0

RAID-1. Mirrored across all physical disks, 1:1 redundancy:

root@sandbox.donthurt.us:~ # zpool create zmirror mirror /root/zfs-1 /root/zfs-2
root@sandbox.donthurt.us:~ # zpool status
  pool: zmirror
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zmirror          ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            /root/zfs-1  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0

RAIDZ. Similar to RAID 5; with three disks, 2:1 data to parity:

root@sandbox.donthurt.us:~ # zpool create zraidz raidz /root/zfs-1 /root/zfs-2 /root/zfs-3
root@sandbox.donthurt.us:~ # zpool status
  pool: zraidz
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zraidz           ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            /root/zfs-1  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0
            /root/zfs-3  ONLINE       0     0     0

RAIDZ-2. Similar to RAID 6; with six disks, 2:1 data to parity:

root@sandbox.donthurt.us:~ # zpool create zraid2z raidz2 /root/zfs-1 /root/zfs-2 /root/zfs-3 /root/zfs-4 /root/zfs-5 /root/zfs-6
root@sandbox.donthurt.us:~ # zpool status
  pool: zraid2z
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zraid2z          ONLINE       0     0     0
          raidz2-0       ONLINE       0     0     0
            /root/zfs-1  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0
            /root/zfs-3  ONLINE       0     0     0
            /root/zfs-4  ONLINE       0     0     0
            /root/zfs-5  ONLINE       0     0     0
            /root/zfs-6  ONLINE       0     0     0
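Usable capacity of a raidz vdev is roughly (disks - parity) * disk size. For a six-disk raidz2 like the one above, assuming hypothetical 1TB disks:

```shell
# Usable-capacity rule of thumb for raidz vdevs.
# Disk count matches the six-disk raidz2 example; 1TB disks are hypothetical.
disks=6
parity=2
size_tb=1
usable_tb=$(( (disks - parity) * size_tb ))
echo "raidz2 across ${disks} disks: ${usable_tb} TB usable"
```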

RAID-10. Striped mirrors, offering 1:1 redundancy within each mirror pair plus the speed increase of striping:

root@sandbox.donthurt.us:~ # zpool create zraid10 mirror /root/zfs-1 /root/zfs-2 mirror /root/zfs-3 /root/zfs-4
root@sandbox.donthurt.us:~ # zpool status
  pool: zraid10
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zraid10          ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            /root/zfs-1  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0
          mirror-1       ONLINE       0     0     0
            /root/zfs-3  ONLINE       0     0     0
            /root/zfs-4  ONLINE       0     0     0

ZFS can also take advantage of an SSD for read and write caching. For reads, ZFS uses the ARC (Adaptive Replacement Cache), which keeps frequently used data in memory for faster access; the L2ARC is similar, except it uses a high-speed disk such as an SSD to hold frequently read data.

Conversely, the write cache, called the ZIL (ZFS Intent Log), logs synchronous writes temporarily and later flushes them to the pool as a transactional write. Only writes of 64K or smaller use the ZIL.

You can add a ZIL or L2ARC device to a Zpool at any time:

root@sandbox.donthurt.us:~ # zpool add zraid10 log /root/zfs-6
root@sandbox.donthurt.us:~ # zpool status
  pool: zraid10
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zraid10          ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            /root/zfs-1  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0
          mirror-1       ONLINE       0     0     0
            /root/zfs-3  ONLINE       0     0     0
            /root/zfs-4  ONLINE       0     0     0
        logs
          /root/zfs-6    ONLINE       0     0     0

root@sandbox.donthurt.us:~ # zpool add zraid10 cache /root/zfs-7
root@sandbox.donthurt.us:~ # zpool status
  pool: zraid10
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zraid10          ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            /root/zfs-1  ONLINE       0     0     0
            /root/zfs-2  ONLINE       0     0     0
          mirror-1       ONLINE       0     0     0
            /root/zfs-3  ONLINE       0     0     0
            /root/zfs-4  ONLINE       0     0     0
        logs
          /root/zfs-6    ONLINE       0     0     0
        cache
          /root/zfs-7    ONLINE       0     0     0

Creating and Manipulating ZFS Datasets

By default, if a mountpoint isn't specified during Zpool creation, the Zpool is mounted at /ZPOOLNAME. You can specify a mountpoint with -m at creation time like so:

root@sandbox.donthurt.us:~ # zpool create -m none zraid10 mirror /root/zfs-1 /root/zfs-2 mirror /root/zfs-3 /root/zfs-4 log /root/zfs-5

When creating ZFS datasets, always prefix the dataset name with the Zpool it should be created under. Datasets can also be nested, and any child dataset inherits the properties of its parent unless they are overridden at creation time:

root@sandbox.donthurt.us:~ # zfs create -o mountpoint=/backup -o dedup=off -o compression=lz4 -o checksum=fletcher4 zraid10/backup
root@sandbox.donthurt.us:~ # zfs create zraid10/backup/michaelb
root@sandbox.donthurt.us:~ # zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
zraid10                  128K  1.91G    19K  none
zraid10/backup            38K  1.91G    19K  /backup
zraid10/backup/michaelb   19K  1.91G    19K  /backup/michaelb
root@sandbox.donthurt.us:~ # zfs create -o compression=gzip-9 zraid10/backup/michaelb/monthly
root@sandbox.donthurt.us:~ # zfs get compression
NAME                             PROPERTY     VALUE   SOURCE
zraid10                          compression  off     default
zraid10/backup                   compression  lz4     local
zraid10/backup/michaelb          compression  lz4     inherited from zraid10/backup
zraid10/backup/michaelb/monthly  compression  gzip-9  local

You can set and retrieve properties on a ZFS dataset or Zpool using the set and get commands. To list every property, issue the following commands:

root@sandbox.donthurt.us:~ # zfs get all
root@sandbox.donthurt.us:~ # zpool get all

ZFS handles mounting and unmounting Zpools and datasets on its own; there is no need to add any datasets to /etc/fstab.

 

Installing ZFS

While ZFS is not included in the mainline Linux kernel due to licensing incompatibilities, the kernel module and userland utilities are easy to install (shown here on an RPM-based system):

First, ensure you are booted into the latest installed kernel, then:
root@sandbox.donthurt.us:~ # yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release$(rpm -E %dist).noarch.rpm
root@sandbox.donthurt.us:~ # yum install -y epel-release
root@sandbox.donthurt.us:~ # yum install -y kernel-headers zfs
root@sandbox.donthurt.us:~ # modprobe zfs

That should get you going with ZFS.
