A keystroke away from Doom.

Mauro Souza thoriumbr-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Mon Nov 4 14:17:39 UTC 2013


Wow, that would be really catastrophic!

I have a catastrophe story too. I was working as an intern, and called the
storage admin to add 30 mainframe disks to a Linux system running Oracle. I
explicitly asked him to give me 30 disks in the range starting at 0.0.4000
(explanation: this was Linux running on a mainframe, mainframe disks at
that shop had only 7.9GB each, and there is no WWPN or LUN, just the
device address, something like 0.0.abcd, in hex, and they map to
/dev/dasdxx, not /dev/sda or /dev/hda), so I could format them all,
partition them and add them to the VG using a script. I listed the current
disks, and found that /dev/dasdbt was the last disk.
The next morning the guy called me and said that the disks were ready. He
had given me half the disks in the desired range, and half scattered
everywhere. I changed my script to format everything from /dev/dasdbu
onwards. And Oracle died, and the database had no backup...

I didn't get fired, but it took us a lot of work to restore the system...
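
One way to avoid formatting by device-name range is to select targets by
device address instead. The following is only a minimal sketch under
assumptions: the function name pick_format_targets and the three inputs are
hypothetical, and the address-to-name mappings would have to be captured by
hand (for example from a disk listing) before and after the new disks were
attached. It does not format anything; it only decides which names are safe.

```python
def pick_format_targets(before, after, requested):
    """Return the device names for the requested addresses, or raise.

    before / after: dicts mapping device address -> device name, captured
    before and after the new disks were attached (hypothetical inputs).
    requested: the addresses the storage admin reported as newly added.
    """
    targets = []
    for address in requested:
        if address in before:
            # Already attached before the change, so it may hold live data.
            raise ValueError(
                f"{address} was already in use as {before[address]}"
            )
        if address not in after:
            raise ValueError(f"{address} is not visible to the system")
        targets.append(after[address])
    return sorted(targets)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the story.
    before = {"0.0.3000": "dasdbt"}
    after = {"0.0.3000": "dasdbt", "0.0.4000": "dasdbu", "0.0.1234": "dasdbv"}
    print(pick_format_targets(before, after, ["0.0.4000", "0.0.1234"]))
```

The point of the design is that the script never trusts "everything after
the last known name"; it only touches addresses that were explicitly handed
over and that were not present before.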

And a second story: my boss wanted to change the ownership of /var/firebird
to firebird:dba on our only production server, and issued "chown -R
firebird:dba .", thinking he was in /var/firebird... The command took
longer than it should have, and he saw he was still in /var: he had typed
"cd firebrid" (not firebird) and then issued the chown... He tried to hit
^C, but it was too late... It broke everything: our Apache died, our email
died, all our clients' data was on /var and became inaccessible, and (as
always in any disaster story) backups were never made...
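
The missing safeguard in a recursive chown run from the wrong directory is
a check of where you actually are. Here is a minimal sketch, assuming a
hypothetical helper named chown_tree; it is not a drop-in replacement for
"chown -R" (symlink handling differs, for one), just an illustration of
refusing to run when the working directory is not the expected one. By
default it is a dry run and only returns what it would touch.

```python
import os
import shutil


def chown_tree(expected_dir, user, group, dry_run=True):
    """Recursively chown the current directory, but only if it is expected_dir."""
    here = os.path.realpath(os.getcwd())
    want = os.path.realpath(expected_dir)
    if here != want:
        # This is the check that a bare "chown -R user:group ." lacks.
        raise RuntimeError(f"refusing: cwd is {here}, expected {want}")

    # Collect the directory itself plus everything below it.
    targets = [here]
    for root, dirs, files in os.walk(here):
        targets.extend(os.path.join(root, name) for name in dirs + files)

    if not dry_run:
        for path in targets:
            shutil.chown(path, user=user, group=group)
    return targets
```

Called as chown_tree("/var/firebird", "firebird", "dba") from /var, this
raises instead of touching anything, because the failed cd left the shell
in the wrong place.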

Mauro
http://mauro.limeiratem.com - registered Linux User: 294521
Scripture is both history, and a love letter from God.


2013/11/4 Jamon Camisso <jamon.camisso-H217xnMUJC0sA/PxXw9srA at public.gmane.org>

> On 13-10-19 01:41 AM, Robert Brockway wrote:
> > (2) One night I was up late working on a problem.  I stayed up late
> > working on this because I was stuck trying to solve it.  The problem was
> > that the database backups were not restoring properly, and as we all
> > know a backup needs to be tested to be a good backup.  The developers
> > were loading real data in preparation for launch so I had to get this
> > working soon.
> >
> > I was dumping the database to run another restore test and I put the
> > redirect around the wrong way.  In my tired state I thought I had
> > over-written the database. My stress levels went up rather suddenly.  I
> > assessed the situation and confirmed that I had not in fact damaged the
> > database.  This reminded me of another important moral I ostensibly
> > already knew but wasn't following:
> >
> > Moral: Don't do sysadmin when extremely tired.  It will only end in
> > tears.
>
> Here's one for the late night OMGWTF files:
>
> It is really easy to remove an entire volume group with the LVM tools,
> when you meant to reduce it, by tab-completing commands. On a production
> SAN. Running 20+ VM guests.
>
> So imagine if you will an entire 4TB SAN with no volume groups defined
> with VMs now running on top because I pebkac'ed a tab complete. It is
> that easy - tab completion is usually a great aid in my .zshrc, but for
> root I've turned it off completely and make sure to use bash so I have
> to type everything out explicitly.
>
> To save things I paused all VMs and walked away for 5 minutes. This is a
> corollary to Robert's Moral 1) - WALK AWAY when things mess up like
> this. Like step away from the computer and walk/move/breathe. Then get
> out the disaster recovery plan and read the overview.
>
> After doing the above, with VMs paused and having got my bearings and
> planned for a late night data centre visit, I restored the volume group
> using a backup from /etc/lvm/backup. I unpaused and rebooted all the VMs.
>
> No one knew how close we'd come to complete failure.
>
> Jamon
> --
> The Toronto Linux Users Group.      Meetings: http://gtalug.org/
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists