I recently used ZFS snapshots and ZFS send/recv to move a non-global zone from one host to another, as I’ve done hundreds of times, and was quite surprised to see that some data was missing.
Spoiler alert: Solaris didn’t lose any data, but something non-obvious was happening.
For a non-global zone that I need to move from one host to another, I typically use ZFS snapshots and follow this methodology:
- Take an initial snapshot while the zone is running
- Send the snapshot to the new host
- Possibly take incremental snapshots depending upon the data change rate and the zone size, and send those.
- Shut down and detach the zone during a maintenance window
- Take a final incremental snapshot
- Send the final incremental snapshot
- Attach and boot the zone on the new destination
That process looks something like this:
# on source host
zonecfg -z myzone export | ssh root@target "zonecfg -z myzone"
zfs snapshot -r mypool/myzone@move
zfs send -r mypool/myzone@move | ssh root@target "zfs recv -d newpool"
zoneadm -z myzone halt
zoneadm -z myzone detach
zfs snapshot -r mypool/myzone@final
zfs send -r -i mypool/myzone@move mypool/myzone@final | ssh root@target "zfs recv -d newpool"

# on target
zoneadm -z myzone attach
zoneadm -z myzone boot
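The commands above skip the optional incremental rounds from the list. Here is a minimal sketch of one such round, written as a helper that only prints the commands so they can be reviewed before running. The function name incremental_round and the snapshot label inc1 are hypothetical; the dataset, pool, and ssh target are taken from the example above.

```shell
# Print (do not run) the commands for one extra incremental round.
# $1 = previous snapshot label, $2 = new snapshot label (hypothetical names).
incremental_round() {
    prev=$1
    next=$2
    dataset=mypool/myzone      # dataset name from the example above
    target=root@target         # ssh destination from the example above
    echo "zfs snapshot -r ${dataset}@${next}"
    echo "zfs send -r -i ${dataset}@${prev} ${dataset}@${next} | ssh ${target} \"zfs recv -d newpool\""
}

# One intermediate round between @move and the final snapshot.
incremental_round move inc1
```

If a round like this is used, the final send should use the most recent intermediate snapshot (inc1 here) rather than move as its incremental source, since the target must already hold the snapshot the increment starts from.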
This has always worked well. However, in preparation for a demonstration I was providing, I had moved a zone back and forth a few times, and then noticed that the /root directory in the zone did not contain the same contents on the target server that it had on the source server. Just like any other time when something seems incredibly wrong, I immediately started checking all of the basics, but it seemed like I had done everything correctly.
Here are the steps I had taken:
- Moved myzone from host1 to host2
- Moved myzone from host2 to host1
- Added a file (my_marker_file) to the /root directory in myzone
- Moved myzone from host1 to host2
- Noticed the file (my_marker_file) was not present on host2
After the zone was moved back to host1, I created the file my_marker_file:
# running on host1
zlogin myzone touch /root/my_marker_file
Next, let’s move the zone again from host1 to host2 (still using our ZFS snapshot/send/receive methodology), and then look in the /root directory:
# running on host2
zlogin myzone ls /root/my_marker_file
/root/my_marker_file: No such file or directory
Cutting to the chase a bit, boot environments are coming into play here. Let’s look at the zone’s boot environments after the first move from host1 to host2:
:~# beadm list
BE        Flags Mountpoint Space   Policy Created
--        ----- ---------- -----   ------ -------
solaris   !RO   -          247.50M static 2018-10-12 8:17
solaris-0 NR    /          1.02G   static 2018-10-12 08:28
We can see that the zoneadm attach process has created a new boot environment, solaris-0. Now, let’s run the same command after we move the zone back to host1:
~# beadm list
BE        Flags Mountpoint Space  Policy Created
--        ----- ---------- -----  ------ -------
solaris   NR    /          1.24G  static 2018-10-12 08:35
solaris-0 !RO   -          14.11M static 2018-10-12 08:35
Notice that the solaris BE is now active, rather than the solaris-0 BE. Our file my_marker_file isn't gone, but it's stuck in the wrong BE: it was created in the solaris BE on host1, and after the next move host2 activates solaris-0, which never contained it. Each host seems to select the zone boot environment associated with its own global zone.
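A quick way to spot this situation is to check which BE is active and which ones are orphaned right after an attach. The sketch below parses the human-readable beadm list format shown above. It assumes that N in the Flags column marks the BE active now and O marks an orphaned BE, which is my reading of the beadm documentation. The helper names active_be and orphan_bes are hypothetical.

```shell
# Sample `beadm list` output, copied from the first listing above.
beadm_output='BE        Flags Mountpoint Space   Policy Created
--        ----- ---------- -----   ------ -------
solaris   !RO   -          247.50M static 2018-10-12 8:17
solaris-0 NR    /          1.02G   static 2018-10-12 08:28'

# Print the BE that is active now (assumption: Flags contains N).
# Skips the two header lines of the listing.
active_be() {
    awk 'NR > 2 && $2 ~ /N/ { print $1 }'
}

# Print BEs flagged as orphaned (assumption: Flags contains O).
orphan_bes() {
    awk 'NR > 2 && $2 ~ /O/ { print $1 }'
}

printf '%s\n' "$beadm_output" | active_be
printf '%s\n' "$beadm_output" | orphan_bes
```

On a real system the same helpers could be fed from the zone itself, for example zlogin myzone beadm list | orphan_bes, provided the output starts with the two header lines.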
I tend to think most people aren't going to encounter this issue. We don't typically move our non-global zones around much, and we almost never move a non-global zone back to its original host right after we moved it away. If you find yourself needing to do this often, you might want to consider a kernel zone, which can be live migrated, or perhaps a non-global zone within a kernel zone (which would be live migrated as part of the kernel zone).
However, you can avoid this issue with non-global zones if you delete orphan boot environments along the way with the destroy-orphan-zbes option to the zoneadm attach command:
zoneadm -z myzone attach -x destroy-orphan-zbes
More information about the attach options that affect boot environments can be found here: https://docs.oracle.com/cd/E53394_01/html/E54752/gpoma.html.