(Simple?) High availability question

Christopher Browne cbbrowne-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Mon Jun 4 18:28:06 UTC 2007


On 6/1/07, Madison Kelly <linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org> wrote:
> Lennart Sorensen wrote:
> > Well running primary/secondary bind is trivial.
>
>    Aye, this was the least of my concerns. :P
>
> > Running identical web servers is not too hard, although you have to
> > update both whenever you make page changes.  Doing round robin
> > connection distribution with a load balancer at the firewall isn't too
> > hard, and there are probably better load balancers that take system load
> > of each web server into account as well as checking that the web server
> > is working and such.
>
>    Since posting I found this:
> http://www.howtoforge.com/load_balancing_apache_mod_proxy_balancer
>
>    I'll be running 2.2, so this should help deal with the apache side of
> things pretty easily.
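
(As an aside: a minimal mod_proxy_balancer setup for Apache 2.2 looks
roughly like the stanza below.  The hostnames are placeholders for the
two web boxes, and mod_proxy, mod_proxy_http and mod_proxy_balancer
need to be loaded; treat it as a sketch, not a drop-in config.)

  <Proxy balancer://webcluster>
    # the two real web servers behind the balancer (placeholder names)
    BalancerMember http://web1.example.com
    BalancerMember http://web2.example.com
  </Proxy>
  # route all requests through the balancer pool
  ProxyPass        / balancer://webcluster/
  ProxyPassReverse / balancer://webcluster/
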
>
> > Running two mail servers is harder.  When a user reads or deletes a
> > message, how do you ensure the update occurs on both?  Redundant mail
> > reception isn't too hard since you just have one be the main mail server
> > and the other a backup MX which simply holds and forwards mail to the
> > primary when it comes back up (most of the time the primary receives the
> > mail directly).
>
>    The mail reception I wasn't worried about exactly because of the
> simplicity of using multiple MX records. As you pointed out, it's the
> directing users to their mailbox and keeping both in sync where the
> trouble starts. This might be a better candidate for a shared FS?
>
> > Redundant pgsql is VERY hard.  If all you want is static database data
> > then it wouldn't be a big deal and you could treat it like the web
> > server.  Of course this is almost never what anyone wants.  Last I
> > checked postgresql did not have live replication support, which is
> > basically what is needed.  This is one of those places where oracle and
> > db2 have a reason for existing.  I believe mysql has a replicating
> > server backend, although apparently that backend is much slower and has
> > fewer features than the regular one, so it is a major tradeoff there.
> > People are working on replication support for postgresql, but they have
> > been working on it for years and I don't think it is working yet.  It is
> > a very complicated thing to implement.  Keeping in sync when two servers
> > are both up and already in sync is no big deal.  Getting back in sync if
> > one has been down is very hard, especially while data is still changing
> > on the live server.
>
> Foo. I was under the impression that is exactly what clustering was
> about. Is there a way to use a distributed file system (like Coda) and
> have two servers talking to the same directory structure? I am going to
> go out on a limb and guess no.

No, the problem is that the term "clustering" is spectacularly
overloaded, having probably a dozen somewhat conflicting
interpretations.

Most modern database systems do not support having multiple servers
"talking" to the same directory structure.  In the case of PostgreSQL,
that is decidedly Right Out; if two servers happen to start writing to
the same directory structure, it is well known that this immediately
leads to the forcible DESTRUCTION of the database.

We've had cases where IBM's HACMP product got confused, and mounted a
DB directory on two servers; in EVERY case, this has destroyed the
database.

> I may have to give up the idea of having load balancing at this time and
> stick with having the second server keep a mirror of the main server
> with a heartbeat between the two to have the backup take over on a
> failure of the main. Seems like a sad waste though with the second
> server just sitting there. :(

The trouble is that distributed lock management is a painful problem
(including at the theoretical level).  It's sure to be expensive,
carries considerable risk of race-condition bugs, and tends to cut
performance pretty heavily.

> The main websites I care most about uptime on use PgSQL and have
> frequent writes. Have you (or anyone) played with how to handle
> mirroring the WAL of PostgreSQL? I can run a simple 'rsync' on the
> backup server, say, every five minutes, but that won't help if the
> master fails after an rsync (very likely), and without an up-to-date
> copy of the WAL, rebuilding the missing bits would be, if I
> understand it all right, impossible. I could have a script run on
> the master that dumps the databases very frequently and load the
> most recent dump on failure, but I'd still lose any changes between
> the last dump and the failure.
>
> I hope I can come up with something more robust. Perhaps I'll have to
> look into slony more.

Here's an overview document that is of some value...
http://www.postgresql.org/docs/8.2/static/high-availability.html

I'd say that there are two plausible answers:

1.  Have the secondary server be on more or less cold/cool standby using PITR.

http://www.postgresql.org/docs/8.2/static/continuous-archiving.html

Particularly if you have a bunch of databases and modify them a fair
bit (a "use case" of this sort is a single PG postmaster hosting a
whole bunch of databases, perhaps used by customers' web
applications), this is probably the easy way.
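
As a concrete sketch of what that looks like (the "standby" hostname
and the paths below are placeholders, and you'd want to read the docs
rather than trust this from memory):

  # master, postgresql.conf: copy each finished WAL segment to the
  # standby's archive directory
  archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'

  # one-time base backup of the master, taken while it stays up
  # (you'd normally exclude pg_xlog and postmaster.pid from the copy)
  psql -c "SELECT pg_start_backup('base');"
  rsync -a /var/lib/pgsql/data/ standby:/var/lib/pgsql/data/
  psql -c "SELECT pg_stop_backup();"

  # standby, recovery.conf in the data directory at failover time:
  # replay the archived WAL, then come up as the new master
  restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'

In 8.2 the standby can't answer queries while it is sitting on that
archived WAL, which is why option 2 exists.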

2.  If you really want to have the standby server be usable to service
queries, you might use a replication system such as Slony-I.
<http://slony.info/>
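
A Slony-I setup is driven by slonik scripts; from memory, the skeleton
looks something like the following (cluster, database and host names
are placeholders, and each node also needs a slon daemon running
against it):

  cluster name = my_cluster;
  node 1 admin conninfo = 'dbname=app host=master user=slony';
  node 2 admin conninfo = 'dbname=app host=standby user=slony';

  # define the origin node and the set of tables to replicate
  init cluster (id = 1, comment = 'master');
  create set (id = 1, origin = 1, comment = 'application tables');
  set add table (set id = 1, origin = 1, id = 1,
                 fully qualified name = 'public.accounts',
                 comment = 'accounts');

  # register the standby and the paths between the two nodes
  store node (id = 2, comment = 'standby');
  store path (server = 1, client = 2,
              conninfo = 'dbname=app host=master user=slony');
  store path (server = 2, client = 1,
              conninfo = 'dbname=app host=standby user=slony');

  # start replicating set 1 from node 1 to node 2
  subscribe set (id = 1, provider = 1, receiver = 2, forward = no);

Unlike the PITR approach, the standby's copy of those tables stays
readable the whole time, which is what makes it usable for queries.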

-- 
http://linuxfinances.info/info/linuxdistributions.html
"...  memory leaks  are  quite acceptable  in  many applications  ..."
(Bjarne Stroustrup, The Design and Evolution of C++, page 220)