The recent unpleasantness

Last Tuesday, my server went down. This is not unusual; when you run your own server, you come to expect service to be interrupted once in a while. Usually it’s the router, and usually all I need to do is power-cycle it. Until recently, if this happened while I was at work, I’d just dash home and fix it. My current commute makes this impractical, so the downtime can last multiple hours, until I get home.

This time, it was even worse. When I got home, before I even got in the door, I could hear the alarm from my battery back-up. Something had gone seriously wrong with the system, presumably with the PSU. The machine was unbootable.

In times past, when my server suffered catastrophic hardware failure, I’d fix it with a transplant from my Windows machine, and then go buy a replacement part as soon as the shops opened the next day. But this assumes a certain equivalence of hardware. Ever since I moved across the country, and left my old server behind to minimize downtime, my server has lived in a small-form-factor Shuttle box. This has a nice quiet PSU, but it’s a nonstandard size and shape, designed to fit snugly in the one case it was optimized for. I do like the system a lot, and I’ve been saying for some time that I should replace my big noisy Windows machine with another Shuttle box when I upgrade again, but since I started this blog, and have devoted most of my gaming hours to old stuff, I’ve had little motivation to upgrade.

There was another option for a transplant, though: move the hard drives into the Windows machine. I did this, and the machine failed to boot. Attempting an emergency repair of the OS, I discovered that the version of Linux I was using (Debian Sarge) didn’t recognize several of the system’s internal devices, including the specific ethernet adaptor and the SATA port. So it couldn’t access my data or the Internet, which kind of made it a failure as a server.

At this point, I was thinking that I’d need to get some replacement hardware before I could get the server up and running again. Which posed another problem: With my current commute, I can’t shop for hardware on weekdays. My job is in a location with nothing around it except cheap office space. My home is within walking distance of an excellent computer store, but it’s not open yet when I leave in the morning and it’s closed by the time I get home. In desperation, I placed a rush order with Newegg, telling them to deliver to the office, but this ran into validation problems because my credit card company didn’t have the office listed as an address of mine. By the time this got straightened out, the advantage of ordering online had been lost: the weekend was approaching, so I figured I might as well wait it out and buy the necessary components personally.

Which, ultimately, I didn’t do. When Saturday rolled around, I got the server up again by upgrading Debian Linux to the newest stable release, which recognizes the hardware on my working machine. Upgrading to a new release of Debian is always a pain — that’s why I was still running Sarge after all this time. Even now, after a day of tinkering, I don’t seem to have the mail server completely right. Nonetheless, it’s up, as you can tell by the fact that you’re reading this.

It’s not entirely happy with the new hardware, though. The load average keeps on getting into double digits. I’ve set up a cron job to restart mysql and apache every 15 minutes, which keeps it from getting entirely wedged, but clearly this is not an ideal solution. Also, it was periodically overheating, especially when doing something computationally intensive, like attempting to install upgraded Linux packages. The OS is smart enough to throttle down when this happens, but whenever it did, it would issue a warning to all consoles (messing up any text editor I had open) and beep. And then it would beep again a second or two later when the temperature came back to acceptable parameters. It’s as if the system had hiccups. I managed to turn off the warning and the beeping, but it’s just one more reason I need to get this system back into something it’s happy with, before it burns down the house or something. It all makes me wonder what was going on when the same hardware was running Windows. Was it running hot and simply not telling me?
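Going back to that cron job for a moment: a slightly less blunt version might restart the services only when the load is actually high, rather than unconditionally every 15 minutes. The following is a minimal sketch of that idea, not what is running on the server. The threshold of 10, the service names, and the init script paths are all assumptions made for illustration, and by default it only prints what it would do.

```python
import os
import subprocess

# Assumed values for illustration; neither comes from the real setup.
LOAD_THRESHOLD = 10.0            # "double digits" taken as 10 or more
SERVICES = ["mysql", "apache2"]  # hypothetical init script names


def should_restart(load_1min, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average has reached the threshold."""
    return load_1min >= threshold


def restart_commands(services=SERVICES):
    """Build one init-script restart command per service."""
    return [["/etc/init.d/" + name, "restart"] for name in services]


def main(dry_run=True):
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    load_1min, _, _ = os.getloadavg()
    if not should_restart(load_1min):
        return
    for cmd in restart_commands():
        if dry_run:
            # Default: report the command instead of running it.
            print("would run:", " ".join(cmd))
        else:
            subprocess.call(cmd)


if __name__ == "__main__":
    main()
```

Run from cron at the same 15-minute interval (and with dry_run turned off), something like this would leave the services alone while the machine is behaving. It is still a workaround for whatever is driving the load up, not a fix.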

Anyway, I have more hardware on order — not a rush order this time, because the crisis has passed. But in the meantime, I’m without a Windows machine. Which means it’s time to switch over to the PS2 for a while. As far as I’m concerned, the big lesson from this whole experience has been that it’s really inconvenient to not live near where you work and also work near where you shop. I suppose other people would derive a different lesson: that it’s not worth it to run your own server, not in the 21st century when there are plenty of reliable free alternatives. But that’s crazy talk.
