Unresponsive Supermicro X10SDV-TLN4F motherboard

Why does a previously working Supermicro X10SDV-TLN4F motherboard suddenly stop booting or responding to the keyboard?

Background

Back in September 2015 I replaced my home server with a Supermicro SuperServer 5028D-TN4T Xeon D-1540. This is an amazing little box:

  • tiny enclosure;
  • Supermicro X10SDV-TLN4F motherboard with a single Xeon D-1540 CPU and integrated IPMI and KVM;
  • room for four hot-swappable 3.5 inch drives, plus space for another internally.

I configured this server with 16GB of ECC RAM, four 3TB SAS drives and an internal SAS SSD, and I run SmartOS from a USB Flash drive. The hard drives are configured as two ZFS mirrors providing 6TB of storage and the SSD is used as a ZFS log device (slog). It’s blindingly fast and suits my needs perfectly.

This server has been running untouched for nn years and, apart from a couple of reboots when I upgraded SmartOS, has been running peacefully, and fairly quietly, in my garage-cum-workshop. However, I noticed that the front cover, which is also the air inlet, was collecting sawdust, so I decided to power down the server and give it a good internal clean.

It’s not coming back

I cleaned the server out, plugged in the power and waited, and waited, and waited for the server to come back online. Previously, a reboot took 2-3 minutes. This time, nothing.

I attached a screen and keyboard and could immediately see that the server had not booted: it had tried to boot across the network and finally given up. It was now waiting for a boot disk to be added and a key to be pressed.

To be sure of what was happening, I reset the server and watched. Sure enough, the server never tried to boot from the USB Flash device and went directly to PXE boot, which failed. It was also ignoring the USB-connected keyboard. One more reset confirmed that, although there was a brief flash of the keyboard LED, nothing worked after that. No NumLock, nothing.

Debugging

Stage one

I fired up the built-in KVM and, unexpectedly, found that this also ignored the keyboard: even the virtual keyboard. I could control the power, reset etc. but not interact with the console.
That meant I couldn’t see any POST messages. All I could do was a POST Snoop, and that came back with 00: i.e. all OK. No errors from the onboard USB controller, so why no USB connectivity? Strange.

Stage two

To try and get control back, I decided to start by resetting the onboard CMOS. This is a bit fiddly on this board when it’s installed in the case, but I managed it.

No change.

Stage three

Check all plugs and connections in case I jogged something when I cleaned the server out.
I had to take the server out of its cupboard for this and put it on the bench.

That seemed to work.
This time, the server booted cleanly and came online. Put the case back on, and the server back in the cupboard.

Once again, no boot action.

Stage four

That seems to point at a physical problem. Time to remove and reseat all the boards and memory.

No change.

Time to dig deeper.

Stage five

There’s nothing showing up in the IPMI. However, the BIOS is very old, only 1.0a, and I know from this page on tinkertry.com that the current BIOS is 1.1. But if I can’t boot, then I can’t update the BIOS.

Next step is to see what happens if I remove the USB Flash Media and pull all the drives.

That made a difference. I now have control via the KVM and can get into the BIOS. The finger points at the USB Flash drive initially.

Update later the same day

Well, changing the USB Flash drive seemed to do the trick, though I’m at a loss to understand why. I booted another box with the “faulty” USB drive with no problems; but as soon as I try booting this box, it fails. I guess it’s a tolerance issue or something.

I still don’t understand how a Flash disk failure can cause the iKVM to fail though. However, I don’t have the time to pursue this further right now. I’m just happy that the server is up again.

Here’s to another two years of uninterrupted service before it needs to be powered down.

New ZFS based NAS and VM Host – part 3

In part 1 of this series, I covered the requirements and hardware. In part 2 I covered the initial configuration of the new server. In this part I’ll cover setting the server up as a file server.

The main use case for this new server is to be the main file and media server for our home. To achieve this I needed NFS, SMB and AFP access to the imported datasets.

NFS access is available by default in ZFS, but SMB and AFP access requires software to be installed. As indicated in part 2, you are strongly discouraged from installing software in the global zone. The supported approach is to create a new zone and install the software in there.
Zones are sort-of virtual machines in that each thinks it has exclusive use of the hardware, and each is a separate security container. Where SmartOS differs from VMware is its hybrid approach to supporting virtual machines. It does this by supporting multiple “brands” of zones:

  • “joyent” branded zones appear to be running SmartOS itself. They have no kernel of their own, they re-use the global zone’s kernel and just provide resource and security isolation.
  • “lx” branded zones appear to be running a specific version of Linux. As with “joyent” zones, they re-use the global zone’s kernel but translate the brand’s system calls into those supported by SmartOS. This gets you the benefits of the software that normally runs on the brand, but without the overhead of having to run the brand’s kernel on top of the SmartOS kernel. The result is near bare-metal speeds. Currently (Sept 2015), SmartOS supports Ubuntu, CentOS, Debian, Fedora (maybe others).
  • “kvm” branded zones are more like any other KVM virtual machine: allowing just about any other operating system to be installed.

First attempt using an Ubuntu branded zone

This failed, so I’m not going into the detail.

You could install Ubuntu in a kvm branded zone, but using an Ubuntu version of a lx zone avoids running two kernels. Base images for many Ubuntu variants exist in Joyent’s public repository, so I simply followed the instructions in the SmartOS wiki to:

  1. Import the base Ubuntu 14.04 LTS server image.
  2. Create a JSON file that describes the new zone.
  3. Create the new zone using the JSON file.

At the end of this, I had an Ubuntu 14.04 virtual machine called capella, on the same IP address as the old server and with direct access to the ZFS datasets containing the files from the old server.
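
For reference, the import and create steps above boil down to two commands in the global zone. This is a sketch rather than a transcript of what I typed: the image UUID is a dummy value and capella.json is only an assumed file name for the JSON description. The run helper is my own; by default it just prints each command, and DRY_RUN=0 makes it execute them.

```shell
# Sketch of the image-import and zone-creation steps.
# IMAGE_UUID and ZONE_JSON are placeholders, not values from this post.
IMAGE_UUID="${IMAGE_UUID:-00000000-0000-0000-0000-000000000000}"
ZONE_JSON="${ZONE_JSON:-capella.json}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

import_and_create() {
    run imgadm import "$IMAGE_UUID"    # step 1: fetch the base image
    run vmadm create -f "$ZONE_JSON"   # step 3: create the zone from the JSON file
}

import_and_create
```

Step 2, writing the JSON file, is done by hand in an editor.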

I now followed the guide at outcoldman.com to install Samba and the guide at … to install Netatalk.
At the end, I had a functioning Samba server, but I had trouble with Netatalk. My MacBook Air running Mavericks couldn’t connect to capella using AFP. Investigation showed that the 14.04 version of lx-ubuntu was missing the uams_dhx2.so security module that was needed to support Mavericks.
Note: SmartOS freely admit that branded zones are still being developed.

Second attempt using a native SmartOS zone

Rather than spending too much time on this, I exploited one of the major advantages of using SmartOS. I simply deleted the zone, downloaded a basic joyent brand image, created a new JSON file and created a new joyent branded zone. It took 5 minutes! I used the following JSON:

{
  "hostname": "capella.agdon.net",
  "alias": "capella",
  "brand": "joyent",
  "max_physical_memory": 4096,
  "image_uuid": "5c7d0d24-3475-11e5-8e67-27953a8b237e",
  "resolvers": ["172.29.12.7", "8.8.4.4"],
  "nics": [
    {
      "nic_tag": "admin",
      "ip": "172.29.12.11",
      "netmask": "255.255.255.0",
      "gateway": "172.29.12.1",
      "primary": "1"
    }
  ],
  "filesystems": [
    {
      "type": "lofs",
      "source": "/data/media",
      "target": "/import/media"
    },
    {
      "type": "lofs",
      "source": "/data/home",
      "target": "/import/home"
    },
    {
      "type": "lofs",
      "source": "/data/home/git",
      "target": "/import/home/git"
    },
    {
      "type": "lofs",
      "source": "/data/public",
      "target": "/import/public"
    },
    {
      "type": "lofs",
      "source": "/data/software",
      "target": "/import/software"
    }
  ]
}
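
A file like this can be checked before it is used, because vmadm has a validate subcommand. The sketch below assumes the JSON was saved as capella.json (an illustrative file name) and only creates the zone if validation passes. The run helper is my own and prints the commands by default; set DRY_RUN=0 to execute them.

```shell
# Sketch: validate the zone description, then create the zone.
# ZONE_JSON is an assumed file name, not one given in this post.
ZONE_JSON="${ZONE_JSON:-capella.json}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# The && means the create step is skipped if validation fails.
validate_then_create() {
    run vmadm validate create -f "$ZONE_JSON" &&
        run vmadm create -f "$ZONE_JSON"
}

validate_then_create
```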

I then installed Samba and Netatalk as before. This time all was well and I now had a functioning NFS, SMB and AFP file server.

I reconfigured the clients to access the new server and I was back where I was before I changed hardware. Simples!

Next step: install Plex Media Server, SABnzbd, CouchPotato and Sick Beard to create a fully functioning media server.

New ZFS based NAS and VM Host – part 2

In part one I covered the motivation, requirements and hardware. In this post I will cover software installation and configuration.

Installing and configuring SmartOS

SmartOS differs from many other operating systems, though not FreeNAS, by not requiring installation. You simply copy a disk image to a USB thumb drive and boot from it. SmartOS then creates a RAMDisk, copies itself to the RAMDisk and runs from there. Alternatively you can boot across the network using PXE.

This exposes SmartOS’s primary use case as a data-centre operating system. By not requiring installation, upgrades are quickly deployed by copying a new image to the flash drive and rebooting.

This does have an important side effect however. SmartOS supports the notion of “zones”, first implemented in Solaris. When booted, SmartOS itself runs in the “global” zone. However, because the filesystem is on a RAMdisk, any changes you make do not persist across a reboot. There are ways to get around this so that (e.g.) you can ensure your SSH public key is an authorized_key and you can login without a password; but you are strongly discouraged from installing software in the global zone. More on that later.

Installation

I started by following the instructions in the SmartOS wiki to download the latest SmartOS image and copy it to a 2GB consumer grade USB thumb drive. It’s only 161MB so it didn’t take long.
I booted the server from the thumb drive and, because this was a clean system, I was presented with a wizard that asked for hostname, IP address (or DHCP) and the identities of the drives I wished to use for the “zones” pool; which is used to store all the datasets for the other zones.

Rather than risk getting it all wrong, I chose the first two HGST drives and put them in a mirror. After configuring the zpool, I was presented with the login prompt.

The default login is root/root, so I immediately changed the root password!

zpool status showed

# zpool status
  pool: zones
 state: ONLINE
  scan: resilvered 1.98G in 0h0m with 0 errors on Fri Sep 11 13:22:01 2015
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0

After this, I added the other two drives in as a second mirrored vdev using

zpool add zones mirror c1t2d0 c1t3d0

and then added the SSD as an slog

zpool add zones log c1t4d0

At the end of this, zpool status showed

# zpool status
  pool: zones
 state: ONLINE
  scan: resilvered 4.01G in 0h0m with 0 errors on Fri Sep 11 13:38:15 2015
config:

        NAME       STATE      READ WRITE CKSUM
        zones      ONLINE        0     0     0
          mirror-0 ONLINE        0     0     0
            c1t0d0 ONLINE        0     0     0
            c1t1d0 ONLINE        0     0     0
          mirror-1 ONLINE        0     0     0
            c1t2d0 ONLINE        0     0     0
            c1t3d0 ONLINE        0     0     0
        logs
          c1t4d0   ONLINE        0     0     0

errors: No known data errors

This whole process took about 30 minutes (including downloading and copying the SmartOS image).
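
Adding a data vdev to a pool is not easy to back out of, so it can be worth previewing the change first. zpool add accepts -n, which displays the configuration that would result without modifying the pool. A sketch using the pool and device names from the output above; the run helper is my own and only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch: preview the two "zpool add" steps with -n before doing them for real.
POOL="${POOL:-zones}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

preview_vdevs() {
    # -n shows the layout zpool would create without touching the pool.
    run zpool add -n "$POOL" mirror c1t2d0 c1t3d0
    run zpool add -n "$POOL" log c1t4d0
}

preview_vdevs
```

Once the previewed layout looks right, the same commands without -n make the change.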

Moving data from the old server

Now that I had the new server installed and ready to go, I needed to copy the data across from the old server. There are a number of ways to do this:

  1. Physically move the disks across
  2. Use zfs send / zfs receive to copy the data across the network
  3. Use rsync to send files.
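
For illustration, option 2 usually means piping zfs send through SSH into zfs recv on the other machine. The sketch below only prints the commands so they can be reviewed. The host name newserver is a placeholder, send_plan is a made-up helper, and the dataset names are borrowed from the commands later in this post.

```shell
# Sketch of option 2: stream a snapshot to another host over SSH.
# DEST_HOST is a placeholder; the dataset names are illustrative.
SRC_DATASET="${SRC_DATASET:-rdata/media}"
SNAPSHOT="${SNAPSHOT:-export}"
DEST_HOST="${DEST_HOST:-newserver}"
DEST_DATASET="${DEST_DATASET:-data/media}"

# Print the snapshot and send/receive commands rather than running them.
send_plan() {
    echo "zfs snapshot -r ${SRC_DATASET}@${SNAPSHOT}"
    echo "zfs send -R ${SRC_DATASET}@${SNAPSHOT} | ssh root@${DEST_HOST} zfs recv ${DEST_DATASET}"
}

send_plan
```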

As the old server was still running, I didn’t want to move the disks, but experiment showed it was going to take days to copy the data across my network. So I compromised.

I split the mirror on the old server and moved one disk to the new server. I then imported it:

zpool import rdata

It showed up as a degraded mirror. To get the data across, I did the following:

# zfs create -o mountpoint=/import zones/import
# zfs snapshot -r rdata@export
# for dset in media public software home; do zfs send -R rdata/${dset}@export | zfs recv data/${dset}; done
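
One detail in a loop like this is worth spelling out: if the list of dataset names is wrapped in quotes, the shell treats it as a single word and the loop body runs once with the whole string. A small sketch that prints the commands instead of running them (plan is a made-up helper for illustration):

```shell
# Print the send/receive command for each dataset name passed in.
plan() {
    for dset in "$@"; do
        echo "zfs send -R rdata/${dset}@export | zfs recv data/${dset}"
    done
}

# Unquoted list: four arguments, so four commands are printed.
plan media public software home

# Quoted list: one argument, so a single malformed command is printed.
plan "media public software home"
```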

After checking the data was there I cleaned up:

zpool export rdata
poweroff

I’ll keep this as an archive disk and re-use the one in the old server.

Next Steps

The initial use for this server is as a file/print server and as a media server. In the next post, I’ll cover how I did this.

New ZFS based NAS and VM Host – part 1

For some time I’ve been using a re-purposed Acer desktop PC as a NAS. It runs OpenIndiana with 2 x 3TB disks in a ZFS mirror. It has Napp-IT installed for administration.

It’s been OK, but it’s not very powerful and only has limited throughput on its single 1000BaseT Ethernet connection. I’ve been considering an upgrade for ages, but finally bit the bullet this week. I’ll cover the requirements and spec in this post and then the build in a followup.

Requirements

The current server was just a file server. I did try to run another zone on it but it couldn’t hack it. I wanted to get back to a position where I could run virtual machines as well. The main driver was to be able to use the NAS as a Crashplan host. There is a version of Crashplan for Solaris, but at the last major upgrade they dropped support for being the destination of a Crashplan backup. I still had the cloud backup but it was nice to have a local replica as well. So, being able to have the Linux version running in a VM would be good.

  • NFS access from a bunch of RaspberryPi devices in the house
  • CIFS access from PCs and the SONOS devices
  • AFP (Netatalk) access from my MacBook Air and the Apple TV
  • Support for running multiple VMs, including:
      • Windows Home Server V2 to back up the Windows PCs
      • Ubuntu to host Crashplan

Whilst noodling about these requirements I was also thinking about replacing the Thinkpad that runs my radio software in the shack. A lightbulb moment made me reconsider how the IT is structured here.

Currently, I have a CAT6 network throughout the house and down to the shack. The Acer sits in the garage attached to the house and there is a FreeNAS server in the shack whose main purpose is to be a backup for the Acer. It also has a jail running some scripts to keep backups of the RaspberryPi devices. I then have a Thinkpad T60 as my shack computer.

The idea is to move the FreeNAS device to the garage and put the new server in the shack. If I ensured I could do IO virtualisation (so a VM could make best use of a video card and get isochronous access to USB devices) and ensured it had a good video card, then I could also use the new server as my shack computer.

Solution

Hardware

To get the virtualisation features means an Intel Xeon class processor or the AMD equivalent. I’ve been a fan of Supermicro for some time and saw that the X10SDV-TLN4F looked to be perfect. I did some research and came across this post by Benjamin Bryan on a Supermicro Datacentre in a box using a close relative of this board. Perfect.

In the end I opted to buy the SYS-5028D-TN4T barebones server which includes this motherboard in the stunning CSE-721TQ-250B case. This has four front-access 3.5-inch drive slots and two internal positions for 2.5-inch drives. I also bought 32GB ECC memory from Crucial and four HGST 2TB drives from Hitachi.

This is an expensive build but I think it will be worth it. I haven’t bought the video card yet.

(Incidentally, years ago, back in the 90s, the rule of thumb was that the computer you really lusted after always cost £2000. For a while that hasn’t been true for desktops, but I reckon it’s still good for servers.)

Software

In Ben’s build he opted for the Napp-IT all-in-one approach of illumos/ESXi and then VMs. However, in the comments there were references to SmartOS. Having this would avoid the need for a PCI Host Bus Adapter and would be simpler.

SmartOS supports zones like Solaris and OpenIndiana but adds support for KVM virtual machines. SmartOS is run from a flash drive and builds ZFS zpools from the disks. Because you are starting from a flash drive, SmartOS runs from a RAM disk and isn’t persistent. This means the global zone should be kept simple: i.e. you don’t install software in it. Instead, create another zone and install software in there.

The beauty of SmartOS zones is that they use the same kernel as the global zone: i.e. you only need space for new software and any data. What happens is that the new zone is created from a ZFS snapshot of the global zone. Elegant!

More on the build itself in part 2.