A keystroke away from Doom.

Jamon Camisso jamon.camisso-H217xnMUJC0sA/PxXw9srA at public.gmane.org
Mon Nov 4 13:31:47 UTC 2013


On 13-10-19 01:41 AM, Robert Brockway wrote:
> (2) One night I was up late working on a problem.  I stayed up late
> working on this because I was stuck trying to solve it.  The problem was
> that the database backups were not restoring properly, and as we all
> know a backup needs to be tested to be a good backup.  The developers
> were loading real data in preparation for launch so I had to get this
> working soon.
> 
> I was dumping the database to run another restore test and I put the
> redirect around the wrong way.  In my tired state I thought I had
> over-written the database. My stress levels went up rather suddenly.  I
> assessed the situation and confirmed that I had not in fact damaged the
> database.  This reminded me of another important moral I ostensibly
> already knew but wasn't following:
> 
> Moral: Don't do sysadmin when extremely tired.  It will only end in tears.

Here's one for the late night OMGWTF files:

With the LVM tools it is really easy to remove an entire volume group
when you meant to reduce it, just by tab-completing the wrong command.
On a production SAN. Running 20+ VM guests.

So imagine, if you will, an entire 4TB SAN with no volume groups defined
and the VMs still running on top of it, because I pebkac'ed a tab
completion. It is that easy. Tab completion is usually a great aid in my
.zshrc, but for root I've turned it off completely, and I make sure to
use bash so that I have to type everything out explicitly.

To save things I paused all VMs and walked away for 5 minutes. This is a
corollary to Robert's Moral 1): WALK AWAY when things go wrong like
this. As in, step away from the computer and walk/move/breathe. Then get
out the disaster recovery plan and read the overview.

After doing the above, with VMs paused and having got my bearings and
planned for a late night data centre visit, I restored the volume group
using a backup from /etc/lvm/backup. I unpaused and rebooted all the VMs.
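
For anyone who has not had to do this, here is a rough sketch of what
such a restore can look like with vgcfgrestore. It is not a record of
the exact commands typed that night. The volume group name "vg_san" and
the run/restore_vg helpers are made up for illustration, and the sketch
assumes the physical volumes themselves are intact. It only prints the
commands unless RUN=1 is set, which seemed fitting given the topic.

```shell
#!/bin/sh
# Sketch only: prints the LVM commands instead of running them unless RUN=1.
# "vg_san" (used in the example call below) is a hypothetical VG name.

# Echo a command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# restore_vg VGNAME [BACKUP_FILE]
# BACKUP_FILE defaults to /etc/lvm/backup/VGNAME.
restore_vg() {
    vg="$1"
    backup="${2:-/etc/lvm/backup/$vg}"

    # List the metadata backups and archives LVM has for this VG.
    run vgcfgrestore --list "$vg"
    # Dry run: check the restore without writing any metadata.
    run vgcfgrestore --test -f "$backup" "$vg"
    # Write the saved metadata back to the physical volumes.
    run vgcfgrestore -f "$backup" "$vg"
    # Activate the logical volumes in the restored VG.
    run vgchange -ay "$vg"
}
```

Calling `restore_vg vg_san` prints the four commands in order; running it
as root with RUN=1 would actually execute them. LVM also keeps older
copies of the metadata under /etc/lvm/archive, and one of those can be
passed as the second argument if the file in /etc/lvm/backup is not the
one you want.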

No one knew how close we'd come to complete failure.

Jamon
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




