Thursday, April 29, 2010

Serious Uptime

I came across this article (Humming away since 1993) today. It definitely caught my eye and brought back fond memories of the start of my career.

It's about a server shipped by Stratus Computer that has been up and running since 1993. I consider my career as having started at Stratus (first as a Co-op then part-time through the rest of college), and it was in fact 1993. So this computer shipped from the company I first started working for around the time I started working and has been running my entire career so far. Crazy!

The money quote: "Around Y2K we thought it might be time to update the hardware, but we just didn’t get around to it."

You usually think in terms of whether something you worked on might still have the code out there running somewhere (I'm sure I have code dating back to 1996 running on Nortel Contivity switches somewhere out there, and code dating back to 2000 running on CIENA switches). But to think about an actual instance of hardware up and running nonstop for that long just kicks it up to a whole new level.

This made me reflect on my time at Stratus. It was a great place to start, and it was another era. It was before any real open-source projects, and before you could go online and get answers to your programming problems almost instantly. Everything was in your head, in-house, or in a book on your shelf, and all the expertise needed to be inside the company.

From a technology point of view it was a great experience. It not only helped ingrain *how* to think about high availability and fault tolerance, but also that it *should* be thought about in the first place. The best lesson is probably that it forced you to think at a full-system level. Everything was redundant in the hardware - power supplies, memories, CPUs, backplane, boards. Anything could fail and be removed and replaced without the system missing a beat.

Now, this is super expensive of course. And around that time (1993-94) Stratus itself was moving away from mainframes with fault-tolerance to high availability clustering approaches. But still - it's cool as hell that there are kids out there driving cars around that were born after this thing first booted.

I worked in the HAL (High Availability LAN) group, mostly on the development of FDDI - itself a fault-tolerant networking technology.

I got to work at the application level, in kernel code, and, what I especially enjoyed, in an embedded environment - the firmware running on the FDDI board itself.

It was a great kick-start to my career because Stratus had layoffs followed by attrition, which left me as the sole software engineer running the FDDI project for a good chunk of time, right around when my Co-op stint ended (after which I was a part-time software engineer). Even better, the hardware engineers who had started the project had also left the company. There was great fun to be made of the fact that the project was being led forward by a Co-op and a couple of lab techs.

I suppose if it had been 2003, then the cool thing would have been to drop out of college and start my own company. But in 1993, having responsibility for a full project within a large company was good enough! I loved having a challenge to rise to, and then rising to it.

In the end, though, I can't believe the article didn't say whether this server is running VOS or FTX. I bet VOS.
