By paying attention to a few warning signs, you may be able to catch a server failure before it happens. Here are three signs that always make me shiver a bit when they start recurring. If you see them, dig into the system to see if it yields any clues. At a minimum, make sure you have good server backups in place.
Random Crashes
When I look back over years of dealing with systems, I find that truly spontaneous server failures are rare. Before a hardware failure, I often see a string of random crashes. Unlike ordinary crashes, these cannot be tied to high load, specific processes, cron jobs or other events. The system simply dies. This is a warning sign that should not be ignored.
If you have a server management provider, be sure that they log these events. I know many companies happily hit the reboot button and consider the problem resolved. That approach misses a key sign of possible server failure and prevents you from fixing the issue proactively.
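As a rough illustration of why logging matters, here is a minimal Python sketch that counts recent crashes with no recorded cause. The log entries, the 30-day window, and the threshold of three are all made-up assumptions for the example, not a recommendation from any particular tool.

```python
from datetime import datetime, timedelta


def count_unexplained_crashes(events, now, window_days=30):
    """Count crashes inside the window that have no recorded cause."""
    cutoff = now - timedelta(days=window_days)
    return sum(
        1
        for when, cause in events
        if when >= cutoff and cause is None
    )


# Hypothetical crash log: (time of crash, known cause or None).
crash_log = [
    (datetime(2024, 1, 3, 2, 15), "cron backup exhausted memory"),
    (datetime(2024, 1, 9, 14, 40), None),
    (datetime(2024, 1, 17, 6, 5), None),
    (datetime(2024, 1, 25, 22, 30), None),
]

now = datetime(2024, 1, 31)
unexplained = count_unexplained_crashes(crash_log, now)

# The threshold of 3 is an arbitrary assumption; tune it to your own comfort.
if unexplained >= 3:
    print(f"{unexplained} unexplained crashes in 30 days: investigate")
```

The point is simply that a reboot with no entry in a log like this leaves nothing to count, so the pattern never becomes visible.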
If you cannot find a cause for random crashes, consider migrating to another server.
File System Issues
If the file system constantly needs checks or frequently goes read-only, your server may be dying. I’ve seen many cases where file systems repeatedly go read-only for no clear reason. Even after extensive repairs, the file system quickly fails again. If you see this, then a full failure may be pending.
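One way to spot a file system that has quietly gone read-only is to scan the mount table. The sketch below assumes a Linux server and parses text in the `/proc/mounts` format; the sample mount table and the list of file system types worth checking are assumptions for illustration.

```python
# File system types that normally live on disk and are expected to be
# writable. This list is an assumption; adjust it for your servers.
DISK_FS_TYPES = {"ext2", "ext3", "ext4", "xfs", "btrfs"}


def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs for disk file systems mounted ro."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Split on commas so "errors=remount-ro" is not mistaken for "ro".
        if fs_type in DISK_FS_TYPES and "ro" in options.split(","):
            found.append((device, mount_point))
    return found


# Hypothetical mount table in /proc/mounts format.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/sdb1 /data ext4 ro,relatime 0 0
proc /proc proc rw,nosuid 0 0
/dev/loop0 /snap/core squashfs ro,nodev 0 0
"""

for device, mount_point in read_only_mounts(SAMPLE_MOUNTS):
    print(f"{device} on {mount_point} is mounted read-only")
```

On a live Linux system you could pass in the contents of `/proc/mounts` instead of the sample text. Run from cron, a check like this at least tells you how often the problem recurs.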
File system maintenance is very important to both normal operations and disaster recovery. If you back up your server using an imaging tool, like R1Soft’s CDP server, file system issues will also be present in your backups.
If you see frequent file system corruption, consider migrating to another server.
Command Line Freezes
Another thing I’ve noticed is unexplained freezes or hangs while working at the command line. On several servers that later failed, I noticed odd behavior beforehand. For example, you run system commands and they return promptly. You re-execute the same command and it hangs for a few seconds before running. This behavior is completely random.
If you see these freezes or hangs, you will want to rule out network or server load conditions. I’ve seen failing disks or corrupt memory cause these odd hangs. In either case, this may be another early warning sign of trouble to come.
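If you want something more concrete than a feeling that a command "hung", you can time repeated runs and flag the outliers. This is only a sketch: the helper names, the factor of 5, and the half-second floor are assumptions, and the sample durations are invented.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run the same command several times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


def find_hangs(durations, factor=5.0, floor=0.5):
    """Return indexes of runs that were far slower than the median run."""
    median = statistics.median(durations)
    threshold = max(median * factor, floor)
    return [i for i, d in enumerate(durations) if d > threshold]


# Hypothetical timings for five runs of the same quick command.
sample = [0.02, 0.03, 0.02, 3.1, 0.02]
for index in find_hangs(sample):
    print(f"run {index} took {sample[index]:.1f}s: possible hang")
```

To measure a real command, call something like `time_command(["ls", "/"])` and pass the result to `find_hangs`. A slow run by itself proves nothing, so check load and network conditions at the same moment before blaming the hardware.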
If you see unexplained freezes at the command line, you may want to consider replacing your server.
This is just advice based on my 10+ years of experience working with servers. I don’t have any hard facts, and the issues above can have many causes. However, when I’ve ruled out the obvious and still cannot find a cause for server crashes, file system corruption or system freezes, I get an uneasy feeling. When I get that knot in my gut, I go with it and typically recommend that you change your server.
Do you have any gut reactions about when a server will fail? With virtualization and other technologies making server migration easier, I am more and more inclined to migrate a suspect system than wait for disaster. The peace of mind is well worth the cost. I would like to hear about your early warning signs.