[GTALUG] server questions - - help needed

D. Hugh Redelmeier hugh at mimosa.com
Sun Jun 3 20:18:58 EDT 2018


| From: o1bigtenor via talk <talk at gtalug.org>

| My server has been operational for about a year and I am working on a
| number of different projects on it. Twice now (this last Friday and 5
| weeks earlier) I came into the office to find that the server has somehow
| been taken down and has rebooted itself (process set up in the BIOS)
| but as it doesn't quite complete the boot process, I have to hit a key
| to tell it to continue and then finally to log in to read Debian
| (stable).
| 
| So I am trying to determine what may have caused the system to do a
| reboot,

Often a crash prevents logging.  Clearly logging would have to happen
after the crash, something that isn't easy when the system has
crashed.  But there is some hope.

Do you have a working UPS?  I don't, and I lose power a few times a
year.  That knocks out my computers (and clocks everywhere).

Aside: all device classes evolve to have enough intelligence to have
clocks that need setting, and then evolve to be networked to set their
own clocks.  The timing of these steps is not fixed.

Can you believe that I grew up with phones that had no clock?

The first small computers I used had no clocks.  The big ones did so
that IBM could charge for the time that they were used (e.g. one used
to rent machines and have to pay overtime if they worked more than one
shift).  CP/M's file system didn't have timestamps (they were added
long after I moved on).  MS-DOS stupidly used local time for
timestamps, even though UNIX got it right (used UTC) before MS-DOS.

| AIUI servers should be
| able to run happily for years without issues (barring hardware
| problems) so I want that kind of reliability. Where in /var/log will I
| be finding the most clues as to the events that lead up to this
| 'reboot'?

Not being a Debian user, I don't know which files are most useful.  If
you are using systemd you might find that journalctl is the command
you need.
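
For what it's worth, here is a rough sketch of how one might pull the
tail end of the previous boot's journal, which is where any last words
before the reboot would be.  It assumes systemd's journalctl is
installed and that the journal is persistent (i.e. /var/log/journal
exists); if the journal is only kept in memory, there will be nothing
from earlier boots to show.  The helper names journal_cmd and run are
made up for this example.

```python
import subprocess

def journal_cmd(boot_offset=-1, lines=200):
    """Build a journalctl command for the last `lines` entries of a boot.

    boot_offset=-1 means the boot before the current one, 0 the current boot.
    """
    return ["journalctl", "--boot=%d" % boot_offset,
            "-n", str(lines), "--no-pager"]

def run(cmd):
    """Run a command and return its combined output as text."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
    except FileNotFoundError:
        return "(%s not found; is this a systemd system?)" % cmd[0]
    return result.stdout

if __name__ == "__main__":
    # Which boots does the journal know about?
    print(run(["journalctl", "--list-boots", "--no-pager"]))
    # The last entries written before the most recent reboot.
    print(run(journal_cmd()))
```

If the log simply stops with no shutdown messages, that is at least
consistent with a power loss or a hard crash, though it doesn't prove
either.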

You could look at them all (you can skip the ones which haven't changed
recently).
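
A small sketch of that "skip the stale ones" idea: list the files
under /var/log that have been modified in the last week, newest first.
The seven-day window and the function name recently_changed are
arbitrary choices for illustration; some files will need root to read,
but listing them only needs permission to stat them.

```python
import os
import time

def recently_changed(log_dir="/var/log", days=7, now=None):
    """Return (mtime, path) pairs for files modified within `days`, newest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    found = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # unreadable, or rotated away while we were looking
            if mtime >= cutoff:
                found.append((mtime, path))
    return sorted(found, reverse=True)

if __name__ == "__main__":
    for mtime, path in recently_changed():
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, path)
```

Widening the window to cover the earlier incident (five weeks back)
may turn up rotated logs that still hold entries from that day.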


I don't know why your system stops at the POST page.  Could it be that
your HDD doesn't spin up quickly enough for the normal boot logic?

I have one server that hangs because the EFI System Partition's
filesystem gets corrupted during a crash (oops).  I think that the
problem is that the OS leaves /boot/efi mounted most of the time
(that's dumb) so the filesystem gets marked as "dirty" and the
firmware doesn't like that.
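
To see whether a given system keeps the ESP mounted, one could read
/proc/mounts.  This is only a sketch: it assumes a Linux system, it
assumes the ESP is mounted at /boot/efi (other distributions may put
it elsewhere), and the function name find_mount is made up.  It only
reports whether the partition is mounted and with what options; it
does not inspect the filesystem's dirty flag.

```python
def find_mount(mounts_text, mount_point="/boot/efi"):
    """Return (device, fstype, options) for mount_point, or None if not mounted.

    mounts_text is the contents of /proc/mounts: one mount per line, with
    whitespace-separated fields device, mount point, fstype, options, ...
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[0], fields[2], fields[3].split(",")
    return None

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            entry = find_mount(f.read())
    except OSError:
        entry = None
        print("could not read /proc/mounts (not Linux?)")
    if entry is None:
        print("/boot/efi is not mounted")
    else:
        device, fstype, options = entry
        mode = "read-write" if "rw" in options else "read-only"
        print("/boot/efi is %s (%s), mounted %s" % (device, fstype, mode))
```

If it shows up as mounted read-write, a crash or power loss at the
wrong moment could plausibly leave it in the state described above.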

