Server Weirdness

Madison Kelly linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Sat Oct 10 19:15:51 UTC 2009


Mark Lane wrote:
> I have been having problems with a CentOS 5.3 fileserver (64-bit) lately
> that has started rebooting all of a sudden. It was up for over 200 days
> without an issue. I have run a burn-in on the server and it was fine.
> Checked the memory with memtest86+ and it was fine. Ran a CPU burn-in
> along with bonnie++ looping for 5 hours, and I couldn't get the CPU
> to overheat or the power supply to choke under heavy load. It seems to be
> a power management problem, yet even running the same kernel that it
> ran for 200 days hasn't made the system stable. I did have to replace
> the motherboard battery, but I have restored the BIOS settings, and the
> problems existed before the battery went. The system just stops working
> without warning and it's getting worse. It only seems to happen when
> it's somewhat idle for an extended period.
> 
> It's an Athlon 64 X2 3800+ running on an MSI K9N Platinum with Linux
> software RAID 5 across 4 WD 250 GB SATA drives. I have checked the
> drives and they seem fine. I am currently running fsck to see if it
> finds any problems.
> 
> Is anyone else experiencing problems with CentOS 5.3 lately? I am
> wondering if it's a package that might be causing the instability. And
> yes, I have checked to see if the system was compromised, but I haven't
> found anything.

What daemons/services are you running? I've run into bad openais and
cman RPMs that messed things up, but they didn't cause reboots (unless
you have fence devices, in which case they could be in a fence loop, but
that's not likely).

As for possible simple problems, check the fans. If they're
sleeve-bearing fans, they could have "spun out". They'll work sometimes
(sometimes with noise, other times quietly) and occasionally stop. If
they happened to be running during your burn-in, you would not have
reproduced the reboots. However, if they stop, particularly the CPU fan,
the CPU could overheat and trigger a thermal shutdown/reboot.
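One way to catch a fan that only stalls while the box sits idle is to log
fan and temperature readings to disk at intervals, forcing each line out
with fsync so the last entries survive a sudden reboot. This is only a
sketch: it assumes the kernel exposes the sensor chip through the hwmon
sysfs interface (fan*_input in RPM, temp*_input in millidegrees C) and a
modern Python 3, not the Python 2.4 that ships with CentOS 5. The function
names, the 70 C threshold and the log path are hypothetical.

```python
import glob
import os
import time

# Assumption: sensors are exposed through the Linux hwmon sysfs interface.
# Older kernels (e.g. 2.6.18 on CentOS 5) may put the files one level
# deeper, under hwmonN/device/, so both layouts are searched.
LAYOUTS = ("hwmon*", os.path.join("hwmon*", "device"))
KINDS = ("fan*_input", "temp*_input")


def read_sensors(root="/sys/class/hwmon"):
    """Return {path: value}: fans in RPM, temperatures in degrees C."""
    readings = {}
    for layout in LAYOUTS:
        for kind in KINDS:
            for path in glob.glob(os.path.join(root, layout, kind)):
                try:
                    with open(path) as f:
                        raw = int(f.read().strip())
                except (OSError, ValueError):
                    continue  # unreadable or non-numeric sensor: skip it
                if os.path.basename(path).startswith("temp"):
                    readings[path] = raw / 1000.0  # millidegrees -> deg C
                else:
                    readings[path] = raw  # fan inputs are already RPM
    return readings


def find_problems(readings, max_temp_c=70.0):
    """List stalled fans (0 RPM) and temperatures above max_temp_c.

    max_temp_c is a hypothetical threshold; pick one suited to the CPU.
    A 0 RPM reading can also mean nothing is plugged into that header.
    """
    problems = []
    for path, value in sorted(readings.items()):
        name = os.path.basename(path)
        if name.startswith("fan") and value == 0:
            problems.append("%s: fan stopped" % path)
        elif name.startswith("temp") and value > max_temp_c:
            problems.append("%s: %.1f C" % (path, value))
    return problems


def watch(log_path="/var/log/fanwatch.log", interval_s=60):
    """Append one status line per interval, fsync'd to survive a reboot.

    log_path is a hypothetical location; this loops until interrupted.
    """
    while True:
        problems = find_problems(read_sensors())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            log.write("%s %s\n" % (stamp, "; ".join(problems) or "ok"))
            log.flush()
            os.fsync(log.fileno())  # push the line to disk right away
        time.sleep(interval_s)
```

Leave watch() running as root (for example in screen) and, after the next
spontaneous reboot, look at the tail of the log to see whether a fan had
dropped to 0 RPM or the CPU was heating up just beforehand. Check a
known-good baseline first, since some boards report 0 RPM for empty fan
headers.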

A bad power supply could also do this, but that is less likely.

Madi
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/