Data Missing in Solaris Zone Snapshot
Date: October 12, 2018 · Categories: Enterprise Computing, Solaris / Illumos

I recently used ZFS snapshots and ZFS send/recv to move a non-global zone from one host to another, as I’ve done hundreds of times, and was quite surprised to see that some data was missing.  

Spoiler alert:  Solaris didn’t lose any data, but something non-obvious was happening.

For a non-global zone that I need to move from one host to another, I typically use ZFS snapshots and follow this methodology:

  • Take an initial snapshot while the zone is running
  • Send the snapshot to the new host
  • Possibly take incremental snapshots, depending on the data change rate and the zone size, and send those (sketched after the example below)
  • Shut down and detach the zone during a maintenance window
  • Take a final incremental snapshot
  • Send the final incremental snapshot
  • Attach and boot the zone on the destination host

That process looks something like this:

#on source host
#copy the zone configuration to the target
zonecfg -z myzone export | ssh root@target "zonecfg -z myzone"
#recursive snapshot and full send while the zone is still running
zfs snapshot -r mypool/myzone@move
zfs send -r mypool/myzone@move | ssh root@target "zfs recv -d newpool"
#maintenance window: stop and detach the zone
zoneadm -z myzone halt
zoneadm -z myzone detach
#final incremental snapshot and send
zfs snapshot -r mypool/myzone@final
zfs send -r -i mypool/myzone@move mypool/myzone@final | ssh root@target "zfs recv -d newpool"
#on target
zoneadm -z myzone attach
zoneadm -z myzone boot
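
The optional intermediate incremental sends mentioned in the step list would slot in between the initial send and the maintenance window. A minimal sketch, using a hypothetical @inc1 snapshot name; the final send would then use @inc1 as its -i base instead of @move:

#on source host, while the zone is still running
zfs snapshot -r mypool/myzone@inc1
zfs send -r -i mypool/myzone@move mypool/myzone@inc1 | ssh root@target "zfs recv -d newpool"
#the final incremental send later becomes:
#zfs send -r -i mypool/myzone@inc1 mypool/myzone@final | ssh root@target "zfs recv -d newpool"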

This has always worked well. However, while preparing for a demonstration, I had moved a zone back and forth a few times and noticed that the /root directory in the zone did not contain the same contents on the target server that it had on the source. As with any situation where something seems incredibly wrong, I immediately started checking the basics, but it appeared I had done everything correctly.

Here are the steps I had taken:

  • Moved myzone from host1 to host2
  • Moved myzone from host2 to host1
  • Added a file (my_marker_file) to the /root directory in myzone
  • Moved myzone from host1 to host2
  • Noticed the file (my_marker_file) was not present on host2

After the zone was moved back to host1, I created the file my_marker_file:

#running on host1
zlogin myzone
touch /root/my_marker_file

Next, let’s move the zone again from host1 to host2 (still using our ZFS snapshot/send/receive methodology), and then look in the /root directory:

#running on host2
zlogin myzone
ls /root/my_marker_file
/root/my_marker_file: No such file or directory

Cutting to the chase a bit: boot environments come into play here. Let's look at the zone's boot environments after the first move from host1 to host2:

~# beadm list
BE        Flags Mountpoint Space   Policy Created
--        ----- ---------- -----   ------ -------
solaris   !RO   -          247.50M static 2018-10-12 08:17
solaris-0 NR    /          1.02G   static 2018-10-12 08:28

We can see that the zoneadm attach process has created a new boot environment, solaris-0.  Now, let’s run the same command after we move the zone back to host1:

~# beadm list
BE        Flags Mountpoint Space  Policy Created
--        ----- ---------- -----  ------ -------
solaris   NR    /          1.24G  static 2018-10-12 08:35
solaris-0 !RO   -          14.11M static 2018-10-12 08:35

Notice that it is the solaris BE that is now active, rather than the solaris-0 BE. On attach, the host activates the zone boot environment (ZBE) associated with its own global zone's BE, which is why each move flips the active BE back. Our file my_marker_file was created on host1 while solaris was the active BE, so when the zone was attached on host2 again and solaris-0 was reactivated, the file wasn't gone; it was stuck in the now-inactive BE.
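
Since the file is only stranded, you can retrieve it without redoing the move. A sketch of one way to recover it from inside the zone on host2, assuming the BE names shown above and a hypothetical scratch mountpoint of /mnt:

#inside the zone on host2, where the inactive solaris BE holds the file
beadm mount solaris /mnt
cp /mnt/root/my_marker_file /root/
beadm unmount solaris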

I tend to think most people aren't going to encounter this issue. As I mentioned earlier, we don't typically move our non-global zones around much, and we almost never move a non-global zone back to its original host right after moving it away. If you find yourself needing to do this often, you might want to consider a kernel zone, which can be live migrated, or perhaps a non-global zone within a kernel zone (which would be live migrated as part of the kernel zone).

However, you can avoid this issue with non-global zones if you delete orphan boot environments along the way with the destroy-orphan-zbes option to the zoneadm attach command:

zoneadm -z myzone attach -x destroy-orphan-zbes
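
After an attach with this option, the orphaned zone BEs should be gone. A quick check from the global zone on the target, assuming the same zone name as above and that the zone has been booted:

#on target, after attach and boot
zlogin myzone beadm list

You should no longer see an inactive, orphaned BE entry alongside the active one.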

More information about the attach options as they relate to boot environments can be found here: https://docs.oracle.com/cd/E53394_01/html/E54752/gpoma.html.