Unresponsive Supermicro X10SDV-TLN4F motherboard

Why does a previously working Supermicro X10SDV-TLN4F motherboard suddenly stop booting or responding to the keyboard?

Background

Back in September 2015 I replaced my home server with a Supermicro SuperServer 5028D-TN4T Xeon D-1540. This is an amazing little box:

  • tiny enclosure;
  • Supermicro X10SDV-TLN4F motherboard with a single Xeon D-1540 CPU and integrated IPMI and KVM;
  • room for four hot-swappable 3.5 inch drives, plus space for another internally.

I configured this server with 16GB of ECC RAM, four 3TB SAS drives and an internal SAS SSD, and I run SmartOS from a USB Flash drive. The hard drives are configured into two ZFS mirrors providing 6TB of storage and the SSD is used as a ZFS cache. It’s blindingly fast and suits my needs perfectly.
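As a rough check on that capacity figure: a pool striped across mirrored pairs offers about one drive’s worth of space per mirror, because the other drive in each pair holds a copy. The sketch below is illustrative only. The function name is made up, the sizes are in whatever unit the drives are quoted in, and it ignores ZFS metadata overhead and the gap between marketed and formatted sizes.

```python
def usable_capacity(drive_size, drives_per_mirror, mirror_count):
    """Approximate usable capacity of a pool striped across identical mirrors.

    Each mirror contributes the capacity of a single drive, however many
    copies it holds. Filesystem overhead is ignored (an assumption).
    """
    if drives_per_mirror < 2:
        raise ValueError("a mirror needs at least two drives")
    return drive_size * mirror_count


# Four drives of size 3 arranged as two two-way mirrors.
raw = 3 * 4
usable = usable_capacity(drive_size=3, drives_per_mirror=2, mirror_count=2)
print(f"raw: {raw}, usable: {usable}")  # raw: 12, usable: 6
```

So half of the raw space goes on redundancy, which is the price of being able to lose one drive from each pair.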

This server has been running untouched for nn years and, apart from a couple of reboots when I upgraded SmartOS, has been running peacefully, and fairly quietly, in my garage-cum-workshop. However, I noticed that the front cover, which is also the air inlet, was collecting sawdust, so I decided to power down the server and give it a good internal clean.

It’s not coming back

I cleaned the server out, plugged in the power and waited, and waited, and waited for the server to come back on-line. Previously, the server took 2-3 minutes to come back on-line after a reboot. This time, nothing.

I attached a screen and keyboard and could immediately see that the server had not booted: it had tried to boot across the network and finally given up. It was now waiting for a boot disk to be added and a key to be pressed.

To be sure of what was happening, I reset the server and watched it start. Sure enough, the server never tried to boot from the USB Flash device and went directly to PXE boot, which failed. It was also ignoring the USB-connected keyboard. One more reset confirmed that although there was a brief flash of the keyboard LED, after that, nothing worked. No NumLock, nothing.

Debugging

Stage one

I fired up the built-in KVM and, unexpectedly, found that this also ignored the keyboard: even the virtual keyboard. I could control the power, reset etc. but not interact with the console.
That meant I couldn’t see any POST messages. All I could do was a POST Snoop, and that came back with 00: i.e. all OK. No errors from the onboard USB controller, so why no USB connectivity? Strange.

Stage two

To try and get control back, I decided to start by resetting the onboard CMOS. This is a bit fiddly on this board when it’s installed in the case, but I managed it.

No change.

Stage three

Check all plugs and connections in case I jogged something when I cleaned the server out.
I had to take the server out of its cupboard for this and put it on the bench.

That seemed to work.
This time, the server booted cleanly and came online. Put the case back on, and the server back in the cupboard.

Once again, no boot action.

Stage four

That seems to point at a physical problem. Time to remove and reseat all the boards and memory.

No change.

Time to dig deeper.

Stage five

There’s nothing showing up in the IPMI. However, the BIOS is very old, only 1.0a, and I know from this page on tinkertry.com that the current BIOS is 1.1. If I can’t boot, though, I can’t update the BIOS.

Next step is to see what happens if I remove the USB Flash Media and pull all the drives.

That made a difference. I now have control via the KVM and can get into the BIOS. The finger points at the USB Flash drive initially.

Update later the same day

Well, changing the USB Flash drive seemed to do the trick, though I’m at a loss to understand why. I booted another box from the “faulty” USB drive with no problems, but as soon as I try booting this box from it, it fails. I guess it’s a tolerance issue or something.

I still don’t understand how a Flash disk failure can cause the iKVM to fail though. However, I don’t have the time to pursue this further right now. I’m just happy that the server is up again.

Here’s to another two years of uninterrupted service before it needs to be powered down.

New ZFS based NAS and VM Host – part 1

For some time I’ve been using a re-purposed Acer desktop PC as a NAS. It runs OpenIndiana with 2 x 3TB disks in a ZFS mirror. It has Napp-IT installed for administration.

It’s been OK, but it’s not very powerful and only has limited throughput on its single 1000BaseT Ethernet connection. I’ve been considering an upgrade for ages, but finally bit the bullet this week. I’ll cover the requirements and spec in this post and then the build in a follow-up.
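To put a number on that network limit: a 1000BaseT link runs at 1000 Mbit/s, so before any protocol overhead the ceiling is about 125 MB/s, shared between every client talking to the NAS. A trivial sketch of the arithmetic (the function name is mine, and real-world NFS or CIFS transfers will come in lower):

```python
def link_ceiling_mb_per_s(link_mbit_per_s):
    """Convert a line rate in Mbit/s to a ceiling in MB/s (8 bits per byte)."""
    return link_mbit_per_s / 8


print(link_ceiling_mb_per_s(1000))  # 125.0
```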

Requirements

The current server was just a file server. I did try to run another zone on it but it couldn’t hack it. I wanted to get back to a position where I could run Virtual Machines as well. The main driver was to be able to use the NAS as a Crashplan host. There is a version of Crashplan for Solaris, but at the last major upgrade they dropped support for using it as a backup destination. I still had the cloud backup but it was nice to have a local replica as well. So, being able to have the Linux version running in a VM would be good.

  • NFS access from a bunch of RaspberryPi devices in the house
  • CIFS access from PCs and the SONOS devices
  • NETATALK access from my Macbook Air and the Apple TV
  • Support for running multiple VMs, including:
      • Windows Home Server V2 to back up the Windows PCs
      • Ubuntu to host Crashplan

Whilst noodling about these requirements I was also thinking about replacing the Thinkpad that runs my radio software in the shack. A lightbulb moment made me reconsider how the IT is structured here.

Currently, I have a CAT6 network throughout the house and down to the shack. The Acer sits in the garage attached to the house and there is a FreeNAS server in the shack whose main purpose is to be a backup for the Acer. It also has a jail running some scripts to keep backups of the RaspberryPi devices. I then have a Thinkpad T60 as my shack computer.

The idea is to move the FreeNAS device to the garage and put the new server in the shack. If I ensured I could do IO virtualisation (so a VM could make best use of a video card and get isochronous access to USB devices) and ensured it had a good video card, then I could also use the new server as my shack computer.

Solution

Hardware

To get the virtualisation features means an Intel Xeon class processor or the AMD equivalent. I’ve been a fan of Supermicro for some time and saw that the X10SDV-TLN4F looked to be perfect. I did some research and came across this post by Benjamin Bryan on a Supermicro Datacentre in a box using a close relative of this board. Perfect.

In the end I opted to buy the SYS-5028D-TN4T barebones server, which includes this motherboard in the stunning CSE-721TQ-250B case. This has four front-access 3.5 inch drive slots and two internal positions for 2.5 inch drives. I also bought 32GB ECC memory from Crucial and four HGST 2TB drives from Hitachi.

This is an expensive build but I think it will be worth it. I haven’t bought the video card yet.

(Incidentally, years ago (back in the 90s), the rule of thumb was that the computer you really lusted after always cost £2000. For a while that hasn’t been true for desktops, but I reckon it’s still good for servers.)

Software

In Ben’s build he opted for the Napp-IT all-in-one approach of illumos/ESXi and then VMs. However, in the comments there were references to SmartOS. Using SmartOS would avoid the need for a PCI Host Bus Adapter and would be simpler.

SmartOS supports zones like Solaris and OpenIndiana but adds support for KVM virtual machines. SmartOS is run from a flash drive and builds ZFS zpools from the disks. Because you boot from a flash drive, SmartOS runs from a RAM disk and changes to the global zone aren’t persistent. This means the global zone should be kept simple, i.e. you don’t install software in it. Instead, create another zone and install software in there.

The beauty of SmartOS zones is that they use the same kernel as the global zone: i.e. you only need space for new software and any data. What happens is that the new zone is created from a ZFS snapshot of the global zone. Elegant!
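To give a flavour of what creating a zone involves: SmartOS describes each zone or VM with a JSON manifest, which is handed to the vmadm tool. The sketch below builds a minimal manifest in Python. Every value in it is a made-up placeholder: the alias, memory, quota and network settings are hypothetical, and the image UUID is deliberately a dummy, since a real one depends on the images imported on the host. Treat it as a sketch under those assumptions, not as the configuration used on this server.

```python
import json

# All values below are hypothetical placeholders, not taken from this build.
manifest = {
    "brand": "joyent",            # native zone; KVM guests use brand "kvm"
    "alias": "example-zone",
    "image_uuid": "00000000-0000-0000-0000-000000000000",  # dummy UUID
    "max_physical_memory": 1024,  # MiB
    "quota": 20,                  # GiB of disk for the zone
    "nics": [
        {
            "nic_tag": "admin",
            "ip": "192.168.1.50",
            "netmask": "255.255.255.0",
            "gateway": "192.168.1.1",
        }
    ],
}

# Serialise the manifest so it can be written to a file.
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and passed to `vmadm create -f <file>` from the global zone, once a real image UUID has been substituted.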

More on the build itself in part 2.