Troubleshooting server crashes
Fraser Campbell
fraser-Txk5XLRqZ6CsTnJN9+BGXg at public.gmane.org
Sat Oct 4 13:41:19 UTC 2003
On Friday 03 October 2003 15:18, Ilya Palagin wrote:
> > When a server crashes and absolutely nothing interesting is in the logs
> > what does a person do? I generally suspect hardware problems but when a
> > server
>
> Turn on the monitor to find out if there is a kernel panic. What
> actually happens when it crashes?
Nothing, it crashed ;-) Seriously, no life, could not wake up the video, fans
whirring but no other obvious signs that the computer is even on.
> > has been rock solid historically I don't put a lot of faith in that and
> > in any case it's just a guess.
>
> How old is it? Maybe it's time to clean contacts on SIMMs, run
> memtest86, replace a power supply (electrolytic capacitors get dry in
> 2-3 years), make sure fans are good, run badblocks?
What I realized after sending the first email is that even though this server
has historically been very stable (yes, 2-3 years), about 2 weeks ago it was
pushed into some extra services (many more websites, and it went from 2
databases to 44).
Although the server still doesn't break a sweat, there is significantly more
processing going on. I'm leaning towards bad RAM, since it's almost certainly
using more RAM now and bad bits that were previously unused might be getting
tickled.
> Crashes of stable Linux distributions don't happen on a regular basis, so
> there is no need for a troubleshooting tips website :-). Seriously - if one
> starts to experience problems without any recent software/config changes
> to the Linux system, hardware must be checked. I've listed the weakest
> parts above.
You are correct. I've rarely had Linux server crashes and 99% of the time
swapping out hardware has fixed them, or increasing resources in the event of
OOM type problems.
Still there are so many possible Linux error messages, some common and not
resulting in a crash, some more serious ... sometimes it's hard to find
definitive answers. For example:
hda: timeout waiting for DMA
Wouldn't it be nice to have a knowledgebase somewhere telling you whether
this error is usually nothing to worry about unless accompanied by other
errors, is a sign that you're using unsupported DMA modes, means that your
motherboard needs to be replaced, or ???
I just tried searching Red Hat's knowledgebase for "timeout waiting for DMA"
and the results are a joke. A total of 7 hits: the first hit was "nsupdate
not working" and the second was "sendmail hangs at boot" (dma matched because
of senDMAil).
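To illustrate the sendmail false hit: here is a minimal Python sketch of
whole-word phrase matching, assuming the search matched "dma" as a plain
substring (that is a guess about how the search behaved, not something known
about the actual engine). The function name phrase_matches and the third
title in the list are hypothetical, made up for the example.

```python
import re

def phrase_matches(phrase, text):
    """Return True if the words of `phrase` appear in order as whole words."""
    words = [re.escape(w) for w in phrase.split()]
    # \b anchors the phrase at word boundaries, so "dma" cannot match
    # in the middle of "sendmail".
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# The first two titles are the hits quoted above; the third is a
# hypothetical article that a useful knowledgebase would contain.
titles = [
    "nsupdate not working",
    "sendmail hangs at boot",
    "hda: timeout waiting for DMA",
]

# Naive substring search: "dma" is found inside "sendmail".
naive_hits = [t for t in titles if "dma" in t.lower()]

# Whole-word phrase search: only the relevant title matches.
hits = [t for t in titles if phrase_matches("timeout waiting for DMA", t)]

print("substring:", naive_hits)
print("phrase:   ", hits)
```

With the substring test, the sendmail title comes back alongside the real
match; with word boundaries, only the title containing the full error text
does.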
--
Fraser Campbell <fraser-Txk5XLRqZ6CsTnJN9+BGXg at public.gmane.org> http://www.wehave.net/
Halton Hills, Ontario, Canada Debian GNU/Linux
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml