Wednesday, March 28, 2007

Latest ZFS add-ons

ZFS boot support was just integrated (x86/x64 platform for now). It will be available in SXCE build 62.
Yes, we'll be able to boot directly from ZFS, which should definitely make life easier: no more hassle with separate partitions and their sizing, snapshots and clones for /, and a much easier Live Upgrade, to name just a few examples. In b62 the installer won't know about ZFS (yet), so some manual fiddling will be required to install the system on ZFS.

Also in b62, gzip compression was integrated into ZFS (in addition to lzjb) thanks to Adam Leventhal. Not only can it save you a lot of space transparently to applications, but in some workloads it can actually speed up disk access (if there's free CPU, disk IO is the bottleneck and the data are a good candidate for compression). We've been using ZFS built-in compression (lzjb) for quite some time on LDAP servers: the on-disk database size was halved and we've also gained some performance. It would be interesting to try ZFS/gzip.
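
A minimal sketch of switching a file system over to gzip follows. The dataset name tank/ldap is hypothetical, and the ZFS variable is only a dry-run guard added for this example: by default each command is printed rather than executed, and setting ZFS=zfs runs it for real. Changing the property affects only blocks written afterwards; existing data stays as it was.

```shell
# Dry run by default: every command is printed, not executed.
# Set ZFS=zfs in the environment to run against a real pool.
ZFS=${ZFS:-"echo zfs"}
FS=${FS:-tank/ldap}    # hypothetical dataset name

# Switch to gzip at its default level; gzip-1 .. gzip-9 pick a level explicitly.
$ZFS set compression=gzip "$FS"

# Check the setting and the compression ratio achieved so far.
$ZFS get compression,compressratio "$FS"
```

Whether gzip beats lzjb on a given workload depends on how much spare CPU there is, so it is worth comparing compressratio and throughput on a test file system first.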

Ditto block support for data blocks was integrated in b61. It means that we can set a new property on a per-file-system basis (zfs set copies=N fs, N=1 by default) to instruct ZFS to write N (1-3) copies of the data regardless of the pool's own protection. As with ditto blocks for metadata, if your pool has more than one vdev, each copy will be on a different disk.
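
As a sketch, enabling two copies on one file system could look like this. The dataset name tank/home is hypothetical, and the ZFS variable is a dry-run guard added for this example (commands are printed unless ZFS=zfs is set). The property applies to data written after it is set, and each extra copy consumes additional pool space.

```shell
# Dry run by default: every command is printed, not executed.
# Set ZFS=zfs in the environment to run against a real pool.
ZFS=${ZFS:-"echo zfs"}
FS=${FS:-tank/home}    # hypothetical dataset name

# Keep two copies of every data block written from now on.
$ZFS set copies=2 "$FS"

# Confirm the property value.
$ZFS get copies "$FS"
```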

ZFS support for iSCSI was integrated in b54. It greatly simplifies exposing ZVOLs via iSCSI, in the same way that sharenfs simplifies sharing file systems over NFS.
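
A rough sketch of what this looks like in builds of this era: create a ZVOL and turn on the shareiscsi property. The volume name tank/vol1 and the 10 GB size are hypothetical, and the ZFS variable is a dry-run guard added for this example (commands are printed unless ZFS=zfs is set).

```shell
# Dry run by default: every command is printed, not executed.
# Set ZFS=zfs in the environment to run against a real pool.
ZFS=${ZFS:-"echo zfs"}
VOL=${VOL:-tank/vol1}    # hypothetical volume name

# Create a 10 GB ZVOL and export it as an iSCSI target.
$ZFS create -V 10g "$VOL"
$ZFS set shareiscsi=on "$VOL"

# Confirm the property value.
$ZFS get shareiscsi "$VOL"
```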

In case you haven't noticed, the 'zpool history' feature was integrated in b51. It stores the history of zfs commands in the pool itself, so you can see what has been happening.

Of course, lots of bug fixes and performance improvements were also integrated recently.

Monday, March 19, 2007

ZFS online replication

Over last Christmas I was playing with the ZFS code again and figured out that online replication of ZFS file systems should be quite easy to implement. By online replication I mean a one-to-one relation between two file systems, potentially on different servers, where all modifications made to one file system are asynchronously replicated to the other with a small delay (a few seconds). Additionally, one should be able to snapshot the remote file system independently to get point-in-time copies, and to resume replication from snapshots created automatically on both ends at given intervals. The good thing is that as long as you're just a few seconds behind, you should get all transactions from memory, so you get a remote copy of your file system without generating any additional IOs on the one being backed up.

For various reasons I haven't done it myself; instead I asked one of my developers to actually implement such a tool, and here we are :)

bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
solaris 5.13G 11.6G 24.5K /solaris
solaris/testws 5.13G 11.6G 5.13G /export/testws/
bash-3.00# zfs create solaris/d100
bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
solaris 5.13G 11.6G 26.5K /solaris
solaris/d100 24.5K 11.6G 24.5K /solaris/d100
solaris/testws 5.13G 11.6G 5.13G /export/testws/
bash-3.00#


Now in another terminal:


bash-3.00# ./zreplicate send solaris/d100 | ./zreplicate receive solaris/d100-copy


Back to original terminal:


bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
solaris 5.13G 11.6G 26.5K /solaris
solaris/d100 24.5K 11.6G 24.5K /solaris/d100
solaris/d100-copy 24.5K 11.6G 24.5K /solaris/d100-copy
solaris/testws 5.13G 11.6G 5.13G /export/testws/
bash-3.00#
bash-3.00# cp /platform/i86pc/boot_archive /solaris/d100/
bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
solaris 5.15G 11.6G 26.5K /solaris
solaris/d100 12.0M 11.6G 12.0M /solaris/d100
solaris/d100-copy 12.0M 11.6G 12.0M /solaris/d100-copy
solaris/testws 5.13G 11.6G 5.13G /export/testws/
bash-3.00#
bash-3.00# rm /solaris/d100/boot_archive
bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
solaris 5.14G 11.6G 26.5K /solaris
solaris/d100 24.5K 11.6G 24.5K /solaris/d100
solaris/d100-copy 12.0M 11.6G 12.0M /solaris/d100-copy
solaris/testws 5.13G 11.6G 5.13G /export/testws/
bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
solaris 5.13G 11.6G 26.5K /solaris
solaris/d100 24.5K 11.6G 24.5K /solaris/d100
solaris/d100-copy 24.5K 11.6G 24.5K /solaris/d100-copy
solaris/testws 5.13G 11.6G 5.13G /export/testws/
bash-3.00#

bash-3.00# cp /platform/i86pc/boot_archive /solaris/d100/
[stop replication in another terminal]
bash-3.00# zfs mount -a
bash-3.00# digest -a md5 /solaris/d100/boot_archive
33e242158c6eb691d23ce2c522a7bf55
bash-3.00# digest -a md5 /solaris/d100-copy/boot_archive
33e242158c6eb691d23ce2c522a7bf55
bash-3.00#


Bingo! All modifications to solaris/d100 are automatically replicated to solaris/d100-copy. Of course, you can also replicate over the network to a remote server using ssh.

There are still some minor problems, but generally the tool works as expected.

Once the first phase is finished we will probably start a second one: a tool to manage replication between servers (automatic replication setup when a new file system is created, replication resume in case of a problem, etc.).


There are other approaches which create snapshots and then incrementally replicate them to the remote side at given intervals. While our approach is very similar, it's more elegant and gives you almost online replication. What do you think?
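
For comparison, the snapshot-based approach can be sketched with the standard zfs snapshot, send and receive commands. This is not how zreplicate works internally; it only illustrates the interval-based alternative. The snapshot names rep1 and rep2 and the DRY_RUN guard are assumptions made for this example, and the destination is assumed to already hold rep1 from an initial full send. By default the pipeline is only printed.

```shell
# Dry run by default: the pipeline is printed, not executed.
# Set DRY_RUN=0 in the environment to run it against real pools.
DRY_RUN=${DRY_RUN:-1}

# Take a new snapshot of SRC and send the changes since PREV into DST.
# Usage: incremental_send SRC DST PREV NEXT
incremental_send() {
    src=$1 dst=$2 prev=$3 next=$4
    cmd="zfs snapshot $src@$next && zfs send -i $src@$prev $src@$next | zfs receive $dst"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        sh -c "$cmd"
    fi
}

# Hypothetical snapshot names; the datasets are the ones from the demo above.
incremental_send solaris/d100 solaris/d100-copy rep1 rep2
```

Run from cron every few minutes with rotating snapshot names, this gives replication with a delay equal to the interval, whereas the tool above aims for a delay of seconds.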

Friday, March 16, 2007

IPMP - Next

If you're interested in what IPMP is going to look like in the near future, check this blog entry. I really like this.

ps. don't miss this document.

Tuesday, March 13, 2007

Open Solaris Starter Kit

Do you want to try Open Solaris? Go and get the free Starter Kit.

ps. Jim posted that most orders of the kit are from Russia and Poland!

Tuesday, March 06, 2007