Upgrading the disk space in my ZFS-based servers – pt 4

Part 1, Part 2, Part 3

Back on Deneb

I followed a similar procedure with the pools on deneb. The only change was that instead of running the second snapshot plus send/recv pass with the system running normally, I ran it with the system booted into the no-install/recovery environment, so that no services or zones were running.
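
For reference, that second pass is essentially the same as the one shown for eridani in part 2: another recursive snapshot followed by an incremental send/recv. Something along these lines, assuming the snapshot and pool names used in part 3 (the exact flags may have differed slightly):

zfs snapshot -r zones@txfr2
zfs send -R -i txfr1 zones@txfr2 | zfs recv -F newzones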

After I had completed renaming, exporting and importing the pools, I rebooted, as I had done with eridani. I immediately hit a problem: SmartOS crashed at some point during the boot process. Unfortunately, the crash message scrolled off the screen before I could read it.

I rebooted and videoed the boot sequence on my phone. There's a kernel panic that causes the crash, but it's impossible to tell from the video what the cause is.
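
In principle the panic ought to be recoverable from a crash dump after the next good boot, assuming the dump device is configured and savecore gets a chance to run. A rough sketch (the path and dump index are illustrative):

dumpadm                     # confirm a dump device and savecore directory are configured
cd /var/crash/volatile      # savecore directory as reported by dumpadm
savecore -vf vmdump.0       # expand the compressed dump into unix.0 / vmcore.0
mdb unix.0 vmcore.0         # then ::status and ::msgbuf show the panic string and messages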

On the basis that I could only really make progress with a running system, I decided to:

  • reboot into recovery mode
  • destroy the new pool
  • import the old pool as zzbackup
  • install SmartOS on to a newly created pool
  • try to debug from there (roughly as sketched below).
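
In outline, the pool juggling for that plan looks something like the following, all run from the recovery shell. The pool names and GUID placeholders are illustrative only:

zpool import                                  # list importable pools; note names and GUIDs
zpool import -N -f <new-pool-guid> zones      # import the suspect new pool without mounting anything...
zpool destroy zones                           # ...and destroy it
zpool import -N -f <old-pool-guid> zzbackup   # bring the old data back under a new name
zpool export zzbackup
reboot                                        # then run the SmartOS installer against the freed disks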

I removed dsk3 (containing the zzbackup pool) and then reinstalled SmartOS on to a newly created raidz1 pool.

When I rebooted without dsk3 the system was stable. When I then rebooted with dsk3 installed, the system panicked again!

I rebooted into recovery mode, imported zzbackup and destroyed it.

Now it reboots OK, and I can import the destroyed zzbackup pool on to an alternate mount point.

[root@deneb ~]# zpool status
  pool: zones
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
        logs
          c1t4d0    ONLINE       0     0     0

errors: No known data errors
[root@deneb ~]# zpool import -D
   pool: zzbackup
     id: 11000531473529046782
  state: ONLINE (DESTROYED)
 action: The pool can be imported using its name or numeric identifier.
 config:

        zzbackup    ONLINE
          c1t3d0    ONLINE
[root@deneb ~]# mkdir /alt
[root@deneb ~]# zpool import -D -R /alt zzbackup
[root@deneb ~]# zpool status
  pool: zones
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
        logs
          c1t4d0    ONLINE       0     0     0

errors: No known data errors

  pool: zzbackup
 state: ONLINE
  scan: scrub repaired 0 in 4h47m with 0 errors on Tue Feb  6 22:32:20 2018
config:

        NAME        STATE     READ WRITE CKSUM
        zzbackup    ONLINE       0     0     0
          c1t3d0    ONLINE       0     0     0

errors: No known data errors
[root@deneb ~]# zfs mount
zones                           /zones
zones/archive                   /zones/archive
zones/cores/global              /zones/global/cores
zones/var                       /var
zones/config                    /etc/zones
zones/opt                       /opt
zones/usbkey                    /usbkey
zzbackup/opt/data               /alt/data
zzbackup/opt/data/backups       /alt/data/backups
zzbackup/opt/data/cfg-backups   /alt/data/cfg-backups
zzbackup/opt/data/dev_backups   /alt/data/dev_backups
zzbackup/opt/data/home          /alt/data/home
zzbackup/opt/data/home/git      /alt/data/home/git
zzbackup/opt/data/media         /alt/data/media
zzbackup/opt/data/public        /alt/data/public
zzbackup/opt/data/software      /alt/data/software
...
zzbackup/archive                /alt/zones/archive
...
zzbackup/cores/global           /alt/zones/global/cores
zzbackup                        /alt/zzbackup

Now I can rebuild deneb from the old system. A bit tedious, though.

  1. Copied the usbkey contents over and rebooted (though I had to destroy zzbackup again first).
  2. Copied /opt over so that the custom services start up (see the sketch below).
  3. Rebooted to be sure.
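
The copying itself is nothing clever. Since the old pool's opt and usbkey datasets are presumably still legacy-mounted like their originals, they need mounting by hand first; the /mnt paths below are purely illustrative:

# mount the old legacy datasets somewhere temporary
mkdir -p /mnt/oldusbkey /mnt/oldopt
mount -F zfs zzbackup/usbkey /mnt/oldusbkey
mount -F zfs zzbackup/opt /mnt/oldopt

# then copy the contents back into place
rsync -a /mnt/oldusbkey/ /usbkey/
rsync -a /mnt/oldopt/ /opt/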

Before laboriously rebuilding, I decided to try booting with dsk3 as zones and the new pool as zznew.
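
The renaming is the usual export/import dance from the recovery environment; something along these lines (names and GUIDs are illustrative, and -D is needed because the old pool had been destroyed):

zpool import -N -f <new-pool-guid> zznew       # the freshly installed pool becomes zznew
zpool import -N -D -f <old-pool-guid> zones    # the old, destroyed pool comes back as zones
zpool export zznew
zpool export zones
reboot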

It boots, but the mountpoints are screwed!

root@deneb ~ $ zfs list
NAME                                               USED  AVAIL  REFER  MOUNTPOINT
zones                                             2.67T   863G   588K  /zones
zones/0246b0fe-771c-60ba-cbe6-92ea5795117b        1.21G  8.79G  1.27G  /zones/0246b0fe-771c-60ba-cbe6-92ea5795117b
zones/088b97b0-e1a1-11e5-b895-9baa2086eb33         528M   863G   527M  /zones/088b97b0-e1a1-11e5-b895-9baa2086eb33
zones/147f4eca-1783-4b80-d7e4-9a1d4420567a         294M  9.71G   432M  /zones/147f4eca-1783-4b80-d7e4-9a1d4420567a
zones/163cd9fe-0c90-11e6-bd05-afd50e5961b6         257M   863G   257M  /zones/163cd9fe-0c90-11e6-bd05-afd50e5961b6
zones/1870884c-780a-cb0b-fdc0-8e740afa4173         320M  9.69G   459M  /zones/1870884c-780a-cb0b-fdc0-8e740afa4173
zones/1bd84670-055a-11e5-aaa2-0346bb21d5a1        52.2M   863G  51.9M  /zones/1bd84670-055a-11e5-aaa2-0346bb21d5a1
zones/1ed69a26-f60b-401c-bde6-793df2d0547b        2.12G   498G  2.01G  /zones/1ed69a26-f60b-401c-bde6-793df2d0547b
zones/2a9bfaf4-ddf1-e146-ab80-e2f8723ec714         313M  9.69G   453M  /zones/2a9bfaf4-ddf1-e146-ab80-e2f8723ec714
zones/46c77656-5d22-cdaf-8056-88aaa11c1e58         790M  9.23G   868M  /zones/46c77656-5d22-cdaf-8056-88aaa11c1e58
zones/4bc5b510-2d5d-e47e-c3bc-d492dfeae320         813M  9.21G   813M  /zones/4bc5b510-2d5d-e47e-c3bc-d492dfeae320
zones/4bc5b510-2d5d-e47e-c3bc-d492dfeae320-disk0  53.9G   903G  11.1G  -
zones/5c7d0d24-3475-11e5-8e67-27953a8b237e         256M   863G   256M  /zones/5c7d0d24-3475-11e5-8e67-27953a8b237e
zones/7b5981c4-1889-11e7-b4c5-3f3bdfc9b88b         241M   863G   240M  /zones/7b5981c4-1889-11e7-b4c5-3f3bdfc9b88b
zones/842e6fa6-6e9b-11e5-8402-1b490459e334         226M   863G   226M  /zones/842e6fa6-6e9b-11e5-8402-1b490459e334
zones/a21a64a0-0809-11e5-a64f-ff80e8e8086f         186M   863G   186M  /zones/a21a64a0-0809-11e5-a64f-ff80e8e8086f
zones/archive                                      152K   863G    88K  none
zones/b33d4dec-db27-4337-93b5-1f5e7c5b47ce         792M   863G   792M  -
zones/c8d68a9e-4682-11e5-9450-4f4fadd0936d         139M   863G   139M  /zones/c8d68a9e-4682-11e5-9450-4f4fadd0936d
zones/config                                       468K   863G   196K  legacy
zones/cores                                        250M   863G    88K  none
...
zones/cores/global                                 152K  10.0G    88K  /zones/global/cores
...
zones/dump                                         260K   863G   140K  -
...
zones/opt                                         2.50T   863G  1.20G  legacy
zones/opt/data                                    2.49T   863G   112K  /data
zones/opt/data/backups                             617G   863G   466G  /data/backups
zones/opt/data/cfg-backups                        57.2G   863G  47.8G  /data/cfg-backups
zones/opt/data/dev_backups                        2.61G   863G  2.61G  /data/dev_backups
zones/opt/data/home                                108G   863G   108G  /data/home
zones/opt/data/home/git                            152K   863G    88K  /data/home/git
zones/opt/data/media                              1.73T   863G  1.73T  /data/media
zones/opt/data/public                              172K   863G   108K  /data/public
zones/opt/data/software                            336K   863G   272K  /data/software
zones/swap                                        33.2G   896G   246M  -
zones/usbkey                                       196K   863G   132K  legacy
zones/var                                         1.05G   863G  1.03G  legacy
zznew                                             37.6G  3.47T  1018K  /zznew
zznew/archive                                      117K  3.47T   117K  /zznew/archive
zznew/config                                       139K  3.47T   139K  legacy
zznew/cores                                        234K  3.47T   117K  none
zznew/cores/global                                 117K  10.0G   117K  /zznew/global/cores
zznew/dump                                        1.84G  3.47T  1.84G  -
zznew/opt                                         2.88G  3.47T  2.88G  legacy
zznew/swap                                        32.9G  3.50T  74.6K  -
zznew/usbkey                                       261K  3.47T   261K  legacy
zznew/var                                         3.91M  3.47T  3.91M  /zznew/var

This may be the cause of the panic. The salient parts are:

root@deneb ~ $ zfs list
NAME                                               USED  AVAIL  REFER  MOUNTPOINT
zones                                             2.67T   863G   588K  /zones
zones/archive                                      152K   863G    88K  none
…
zones/config                                       468K   863G   196K  legacy
zones/cores                                        250M   863G    88K  none
…
zones/cores/global                                 152K  10.0G    88K  /zones/global/cores
…
zones/dump                                         260K   863G   140K  -
…
zones/opt                                         2.50T   863G  1.20G  legacy
…
zones/swap                                        33.2G   896G   246M  -
zones/usbkey                                       196K   863G   132K  legacy
zones/var                                         1.05G   863G  1.03G  legacy
zznew                                             37.6G  3.47T  1018K  /zznew
zznew/archive                                      117K  3.47T   117K  /zznew/archive
zznew/config                                       139K  3.47T   139K  legacy
zznew/cores                                        234K  3.47T   117K  none
zznew/cores/global                                 117K  10.0G   117K  /zznew/global/cores
zznew/dump                                        1.84G  3.47T  1.84G  -
zznew/opt                                         2.88G  3.47T  2.88G  legacy
zznew/swap                                        32.9G  3.50T  74.6K  -
zznew/usbkey                                       261K  3.47T   261K  legacy
zznew/var                                         3.91M  3.47T  3.91M  /zznew/var

root@deneb ~ $ zfs mount
zones                           /zones
…
zznew                           /zznew
zznew/archive                   /zznew/archive
zznew/cores/global              /zznew/global/cores
zznew/var                       /zznew/var
zznew/config                    /etc/zones
zznew/opt                       /opt
zznew/usbkey                    /usbkey

As you can see, some of the legacy datasets from zznew are being mounted instead of the equivalents from zones, i.e. the boot process seems to be mixing up the legacy mounts between the two pools.
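
A quick way to see which pool has actually claimed each of the contested mount points is to ask df (or filter zfs mount) directly, for example:

df -h /opt /etc/zones /usbkey /var                  # shows which dataset backs each mount
zfs mount | egrep ' /(opt|usbkey|var|etc/zones)$'   # or pick out just the interesting targets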

Yet more to follow.

Upgrading the disk space in my ZFS-based servers – pt 3

Part 1, Part 2
It was now time to try the same recipe on the main server: deneb.

On Deneb

root@deneb ~ $ zpool status
  pool: zones
 state: ONLINE
  scan: scrub repaired 0 in 7h21m with 0 errors on Fri May 19 18:22:48 2017
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
        logs
          c1t4d0    ONLINE       0     0     0

errors: No known data errors

The first step was to downgrade one of the mirrors to make room for a new 4TB disk that could be used as a temporary store for deneb’s data.

root@deneb ~ $ zpool detach zones c1t3d0
root@deneb ~ $ zpool status
  pool: zones
 state: ONLINE
  scan: scrub repaired 0 in 7h21m with 0 errors on Fri May 19 18:22:48 2017
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
          c1t2d0    ONLINE       0     0     0
        logs
          c1t4d0    ONLINE       0     0     0

errors: No known data errors
root@deneb ~ $ poweroff
poweroff: Halting 9 zones.

I removed disk 4 and installed the third new 4TB disk in its place.

root@deneb ~ $ zpool status
  pool: zones
 state: ONLINE
  scan: scrub repaired 0 in 7h21m with 0 errors on Fri May 19 18:22:48 2017
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
          c1t2d0    ONLINE       0     0     0
        logs
          c1t4d0    ONLINE       0     0     0

errors: No known data errors
root@deneb ~ $ zpool create newzones c1t3d0
root@deneb ~ $ zpool status
  pool: newzones
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        newzones    ONLINE       0     0     0
          c1t3d0    ONLINE       0     0     0

errors: No known data errors

  pool: zones
 state: ONLINE
  scan: scrub repaired 0 in 7h21m with 0 errors on Fri May 19 18:22:48 2017
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
          c1t2d0    ONLINE       0     0     0
        logs
          c1t4d0    ONLINE       0     0     0

errors: No known data errors

Now I can clone zones on to newzones:

root@deneb ~ $ zfs snapshot -r zones@txfr1
root@deneb ~ $ zfs send -R zones@txfr1 | zfs recv -F newzones

This took a long time!
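
For a transfer of this size it is worth piping the stream through pv, as on eridani in part 2, to get a throughput readout. Assuming pv is available on deneb too, the pipeline is otherwise identical:

zfs send -R zones@txfr1 | pv | zfs recv -F newzones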

Upgrading the disk space in my ZFS-based servers – pt 2

See here for part 1

On Eridani

Once the initial send/recv had completed, I did another snapshot and sent the incremental data just in case anything had changed.

root@eridani ~ $ zfs snapshot -r zones@txfr2
root@eridani ~ $ zfs send -R -i txfr zones@txfr2 | pv | zfs recv tempzone

Lastly, I rolled tempzone back to the new snapshot so that it became the current state of the pool:

root@eridani ~ $ zfs rollback tempzone@txfr2

As a final (paranoid) check, I ran a dry-run rsync to confirm that /zones and /tempzone were identical:

root@eridani ~ $ rsync -avn /zones/ /tempzone/ | less
sending incremental file list
global/cores/

sent 765011152 bytes  received 2764561 bytes  486087.82 bytes/sec
total size is 3454738169591  speedup is 4499.67 (DRY RUN)

Nothing had changed, so I could now swap the pools over.

root@eridani ~ $ zpool export tempzone

GOTCHA!

I couldn’t export the root zones pool at this point because it had mounted filesystems that were in use by the running system. To get further I had to reboot the system into restore/recovery mode.

In this mode, no pools are imported, so I could execute the following commands:

zpool status - to confirm that no pools were imported
zpool import - to see which pools were available
zpool import -NR /t1 tempzone zones - rename tempzone to zones; -N avoids mounting any datasets and -R /t1 sets an alternate root
zpool import -NR /t2 zones oldzones - rename the old zones pool to oldzones, again unmounted under an alternate root
zpool export oldzones
zpool export zones
reboot

I was still getting errors even though the pools had been renamed. The problem turned out to be that when SmartOS boots, it seems to mount the pools in alphabetical order.

(It's probably more likely that SmartOS scans the disks in alphabetical order.)

Thus, oldzones was mounted before zones and its datasets were grabbing the mount points.
Rather than laboriously change the mountpoint property on all the datasets, I simply disconnected the disk.
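
For the record, the laborious alternative would have been something along these lines: either re-pointing the old pool's datasets out of the way, or importing it by hand with an alternate root (both are sketches, not what I actually ran):

# re-point the top-level dataset; children inherit the new path unless they
# have an explicit or legacy mountpoint of their own
zfs set mountpoint=/oldzones oldzones

# or: import the old pool manually with an alternate root
zpool import -R /oldzones oldzones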

Once I had completed this, eridani booted using the new pool.

root@eridani ~ $ 
root@eridani ~ $ zpool status
  pool: zones
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zones       ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        c2d1    ONLINE       0     0     0
        c3d0    ONLINE       0     0     0
        c3d1    ONLINE       0     0     0

errors: No known data errors
root@eridani ~ $ zfs list
NAME                                                                  USED  AVAIL  REFER  MOUNTPOINT
zones                                                                2.52T  2.73T   318K  /zones
zones/archive                                                        29.3K  2.73T  29.3K  none
zones/backup                                                         2.50T  2.73T  29.3K  /zones/backup
zones/backup/deneb                                                   2.50T  2.73T  29.3K  /zones/backup/deneb
zones/backup/deneb/zones                                             2.50T  2.73T   324K  /zones/backup/deneb/zones
...
zones/config                                                         55.9K  2.73T  36.0K  legacy
zones/cores                                                          58.6K  2.73T  29.3K  none
zones/cores/global                                                   29.3K  10.0G  29.3K  /zones/global/cores
zones/dump                                                           1023M  2.73T  1023M  -
zones/opt                                                             423M  2.73T   422M  legacy
zones/swap                                                           17.9G  2.74T  1.44G  -
zones/usbkey                                                         38.6K  2.73T  38.6K  legacy
zones/var                                                            7.08M  2.73T  5.42M  legacy
root@eridani ~ $ zfs mount
zones                           /zones
zones/backup                    /zones/backup
zones/backup/deneb              /zones/backup/deneb
zones/backup/deneb/zones        /zones/backup/deneb/zones
...
zones/backup/deneb/zones/usbkey  /zones/backup/deneb/zones/usbkey
zones/backup/deneb/zones/var    /zones/backup/deneb/zones/var
zones/cores/global              /zones/global/cores
zones/var                       /var
zones/config                    /etc/zones
zones/opt                       /opt
zones/usbkey                    /usbkey
root@eridani ~ $