<div dir="ltr"><div><div><div><div>Wow, that would be really catastrophic!<br><br></div>I have a catastrophe story of my own. I was working as an intern, and called the storage admin to ask for 30 mainframe disks to be added to a Linux system running Oracle. I explicitly asked him to give me 30 disks in the range starting at 0.0.4000 (explanation: this was Linux running on a mainframe; mainframe disks at that shop had only 7.9GB each, and there is no WWPN or LUN, just the device address, something like 0.0.abcd, in hex, and the disks map to /dev/dasdxx, not /dev/sda or /dev/hda), so that I could format them all, partition them and add them to the VG using a script. I listed the current disks, and found that /dev/dasdbt was the last disk.<br>


</div>The next morning the guy called me and said that the disks were ready. He had given me half the disks in the desired range, and half scattered everywhere. I changed my script to format everything from /dev/dasdbu onwards. And Oracle died, and the database had no backup...<br>


<br></div>I didn't get fired, but it took us a lot of work to restore the system...<br><br></div>And a second story: my boss wanted to change the ownership of /var/firebird to firebird:dba on our only production server, and issued "chown -R firebird:dba .", thinking he was in /var/firebird... The command took more time than it should have, and he saw he was in /var: he had issued "cd firebrid" (not firebird), and then issued the chown... He tried to hit ^C, but it was too late... It broke everything: our Apache died, our email died, all our clients' data was on /var and became inaccessible, and (as always in any disaster story) backups had never been made...<br>


</div><div class="gmail_extra"><br clear="all"><div>Mauro<br><a href="http://mauro.limeiratem.com">http://mauro.limeiratem.com</a> - registered Linux User: 294521<br>Scripture is both history, and a love letter from God.</div>


<br><br><div class="gmail_quote">2013/11/4 Jamon Camisso <span dir="ltr"><<a href="mailto:jamon.camisso-H217xnMUJC0sA/PxXw9srA@public.gmane.org" target="_blank">jamon.camisso-H217xnMUJC0sA/PxXw9srA@public.gmane.org</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im">On 13-10-19 01:41 AM, Robert Brockway wrote:<br>

> (2) One night I was up late working on a problem.  I stayed up late<br>

> working on this because I was stuck trying to solve it.  The problem was<br>

> that the database backups were not restoring properly, and as we all<br>

> know a backup needs to be tested to be a good backup.  The developers<br>

> were loading real data in preparation for launch so I had to get this<br>

> working soon.<br>

><br>

> I was dumping the database to run another restore test and I put the<br>

> redirect around the wrong way.  In my tired state I thought I had<br>

> over-written the database. My stress levels went up rather suddenly.  I<br>

> assessed the situation and confirmed that I had not in fact damaged the<br>

> database.  This reminded me of another important moral I ostensibly<br>

> already knew but wasn't following:<br>

><br>

> Moral: Don't do sysadmin when extremely tired.  It will only end in tears.<br>

<br>

</div>Here's one for the late night OMGWTF files:<br>

<br>

It is really easy to remove an entire volume group with lvm tools versus<br>

reducing them by tab completing commands. On a production SAN. Running<br>

20+ VM guests.<br>

<br>

So imagine if you will an entire 4TB SAN with no volume groups defined<br>

with VMs now running on top because I pebkac'ed a tab complete. It is<br>

that easy - tab completion is usually a great aid in my .zshrc, but for<br>

root I've turned it off completely and make sure to use bash so I have<br>

to type everything out explicitly.<br>

<br>

To save things I paused all VMs and walked away for 5 minutes. This is a<br>

corollary to Robert's Moral 1) - WALK AWAY when things mess up like<br>

this. Like step away from the computer and walk/move/breathe. Then get<br>

out the disaster recovery plan and read the overview.<br>

<br>

After doing the above, with VMs paused and having got my bearings and<br>

planned for a late night data centre visit, I restored the volume group<br>

using a backup from /etc/lvm/backup. I unpaused and rebooted all the VMs.<br>

<br>

No one knew how close we'd come to complete failure.<br>

<span class="HOEnZb"><font color="#888888"><br>

Jamon<br>

</font></span><div class="HOEnZb"><div class="h5">--<br>

The Toronto Linux Users Group.      Meetings: <a href="http://gtalug.org/" target="_blank">http://gtalug.org/</a><br>

TLUG requests: Linux topics, No HTML, wrap text below 80 columns<br>

How to UNSUBSCRIBE: <a href="http://gtalug.org/wiki/Mailing_lists" target="_blank">http://gtalug.org/wiki/Mailing_lists</a><br>

</div></div></blockquote></div><br></div>