Travails of a Fledgling Sysadmin

Over the past couple of weeks, this server has been experiencing some major issues, and as the guy who's supposed to be running the thing, I've found them quite trying.

**What follows is a rather uninteresting account of some nerd stuff**

For instance, here I am on a Sunday night in Wisconsin, about to go to my grandmother's 80th birthday party. I've spent the past few weeks getting this fun server back into shape after it practically exploded with Plesk/PHP5/Apache/MySQL conflicts and incompatibilities.

The kind people at [insert our hosting provider's name here], where we host, went back and forth, fixing one problem and breaking something else in the process, until I resorted to issuing several ultimatums: give me a working server or I'll take my business elsewhere.

Well, a week or so ago, things finally settled down. PHP started working (which is important, because it runs the scripts that make up this application), MySQL stopped not working (which is also important, because it is the database server that holds all the information in this web application), and the little server ecosystem was more or less at peace. So I went back to something I actually know a little about, which is coding, glad that my ignorance of web server administration was no longer being put to the test.

But like I said, here I am on a Sunday night in Wisconsin, and all of a sudden I get an IM from Faith: "Uh, what happened to the server?" Whoops: all PHP scripts are now returning an error 52, which is a "no data" error. I take a look at Apache's error logs, and all I see is a bunch of "segmentation fault" errors. I have no idea what this means, since these aren't meaningful web-server errors; they just signify that some binary somewhere blew up somehow.

To make things really awesome, my whole team is actually at work on this Sunday night, because tomorrow is a Big Day. It's one of those Days when the work we're doing actually needs to work, behave, be bug-free, integrate well, and so forth. I can't say exactly why it's a Big Day, but it is. And now my simple little web application isn't working. That wouldn't be so bad, except that from one perspective it's really the glue that holds together a lot of the online functionality for Teleios courses. Whoops!

So we're at T-minus some small number of hours, and my server blows up. I'm not too excited about my chances of getting actual tech support anytime soon, since the last time I had a problem it took two weeks to get it worked out. So I call our hosting company and ask about getting a new server ASAP, one with all the pieces working from the factory, instead of one I've hacked on for a while, then cried for help over, then received "help" for that blew up even more stuff, and so on. They say that's a no-go, as is an OS reload and reinstall tonight, because "I don't even know if there are any guys that do OS reloads back there right now."

Needless to say, we're experiencing some major frustrations, as lots of testing needs to be done tonight and there's no server to test on. So I plead with the tech support guy I've got on the phone, and he promises to take a look at things himself and fix it if he can. Meanwhile, our backup plan is to hack together an in-house server for the Big Event and get a copy of this application installed there. The only thing is, I've never installed the application anywhere else, the databases aren't set up, the data isn't backed up, and… I can't even access the in-house server because it's… in-house! Meaning it's behind a firewall in Orlando, and I'm in the middle of Dairyland, getting on a plane in a few hours where I won't even have internet.

So I'm trying to remember exactly how to get this app running on a server, what all the parts are, and how to set up the database, all while kicking myself for not writing a nice little delivery script that would install it on command anywhere you wanted. I'm writing Joe a very detailed set of instructions on how to set all of this up, hoping they're detailed enough to get things right the first time, because if something goes wrong, well, I'm on a plane, and I can't help.

Then, when everything seems most desperate, I get word from the tech support guys that they've fixed the problem! They describe it to me in detail, and I realize it's something I should have known how to fix myself (just Apache misloading a configuration file), but couldn't troubleshoot because I'm not that smart at Linux-y stuff yet. I'd really like to be, so that I could handle some of these problems myself, but alas, I am not.

The story has a good ending, and it has taught me some very valuable lessons about server administration. Most importantly: don't let your servers get screwed up in the first place. Because if you do, and then try to fix them, you'll probably break things worse. And then it's all over!

By Jonathan Lipps

Jonathan worked as a programmer in tech startups for several decades, but is also passionate about all kinds of creative pursuits and academic discussion. Jonathan has master’s degrees in philosophy and linguistics, from Stanford and Oxford respectively, and is working on another in theology. An American-Canadian, he lives in Vancouver, BC and has way too many hobbies.
