Troubleshooting server crashes

Fraser Campbell fraser-Txk5XLRqZ6CsTnJN9+BGXg at public.gmane.org
Fri Oct 3 18:10:30 UTC 2003


Hi,

When a Linux server crashes there are often clear messages in the logs
indicating why: out of memory and processes dying, file descriptors being
exhausted, and so on.

When a server crashes and there is absolutely nothing interesting in the
logs, what do you do?  I generally suspect hardware problems, but when a
server has been rock solid historically I don't put a lot of faith in that,
and in any case it's just a guess.

The server is completely up to date; postfix, apache, courier (imap & pop)
and ssh are accessible from the Internet.  It also runs mysql (not Internet
accessible).  The server's 1 minute load average is normally less than 0.1.

What approaches do you guys take for tracking these things down?  For now
I've installed atsar to track resources; after a crash (if it happens again)
I can hopefully tell whether it's a resource issue.
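
To illustrate the kind of tracking I mean, here is a rough Python sketch.
It is not how atsar works, just a stand-in under my own assumptions: the
function names and log format are made up, and it reads only the standard
/proc/loadavg, /proc/meminfo and /proc/sys/fs/file-nr files.  Each sample
is fsync'ed so the last line written before a crash should survive on disk.

```python
import os
import time


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_file_nr(text):
    """file-nr holds: allocated handles, unused handles, system maximum."""
    values = [int(x) for x in text.split()[:3]]
    return values[0], values[2]


def format_sample(now, loadavg_text, meminfo_text, file_nr_text):
    """Build one log line from the raw contents of the three /proc files."""
    load1 = loadavg_text.split()[0]  # first field is the 1 minute average
    mem = parse_meminfo(meminfo_text)
    allocated, maximum = parse_file_nr(file_nr_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    return "%s load1=%s memfree_kb=%s swapfree_kb=%s fds=%d/%d" % (
        stamp, load1, mem.get("MemFree", "?"), mem.get("SwapFree", "?"),
        allocated, maximum)


def read_text(path):
    with open(path) as handle:
        return handle.read()


def record_sample(log_path):
    """Append one sample to log_path and force it out to disk."""
    line = format_sample(time.time(),
                         read_text("/proc/loadavg"),
                         read_text("/proc/meminfo"),
                         read_text("/proc/sys/fs/file-nr"))
    with open(log_path, "a") as log:
        log.write(line + "\n")
        log.flush()
        os.fsync(log.fileno())  # so the final sample survives a hard crash
    return line


def run_forever(log_path, interval_seconds=60):
    """Sample until the process is killed (or the machine goes down)."""
    while True:
        record_sample(log_path)
        time.sleep(interval_seconds)
```

Calling run_forever("/var/tmp/resource-samples.log") from a small wrapper
script (the path is only an example) would leave a trail that can be read
after a reboot to see whether memory, swap or file handles were running out.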

Although Google is a great resource for finding specific error messages, I
find that tracking down topics like this can sometimes be difficult.  A
website that collected and organized troubleshooting tips would be a great
idea, if only I had the time ;-)

Thanks,
-- 
Fraser Campbell <fraser-Txk5XLRqZ6CsTnJN9+BGXg at public.gmane.org>                 http://www.wehave.net/
Halton Hills, Ontario, Canada                       Debian GNU/Linux

--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




