Symantec Gateway Model 5420 issues

Thu Jan 17 02:00:48 UTC 2008

A friend is having some trouble at work with a Symantec Gateway Model
5420 firewall.  It's proxying outbound traffic for about 160 people
and inbound to their webserver which is in a separate DMZ on its own
network card.  The firewall has recently load-spiked and stalled
entirely a couple of times, and they have no idea why, since the device
has been working pretty well for about three years.  I was asked to look
at it because I have some knowledge of Linux, but I couldn't figure out
what the problem was.  What follows are semi-random notes and
observations in the hope that someone can suggest a plan of action.

The box is a 1U with a 2 GHz Celeron, 512 MB of RAM, a 40 GB HD,
and three network cards.  It appears to run a modified Red Hat 7.1
system with kernel 2.4.26.  The kernel build date is quite recent
(October or November of 2007, I think), so there have been updates.  I
was fascinated to find that the HD is using LVM, and the three
partitions are all reiserfs.

Over a couple of hours of observation, I saw the load fluctuate between
0.1 and 0.7.  That seemed a bit high to me as a baseline, given that
this is an application where the load can surge quite abruptly, but I'm
not sure.  There's a 2 GB swap partition, which was only slightly used.
I thought the cheapest possible test would be to add more memory, but
unfortunately this box apparently maxes out at the 512 MB already
installed.
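
Since the load and swap figures above came from watching by hand, here
is a minimal sketch of a logger that records the same numbers so they
can be lined up against the next stall.  It assumes a Linux /proc and
some machine with Python 3; I don't know what, if anything, the
appliance itself ships with, and the function names are made up for
illustration.

```python
"""Record load average and swap use at intervals (Linux /proc only)."""
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text):
    """Return {field: kB} for the 'Name:  123 kB' lines of /proc/meminfo text.

    Lines in any other shape (such as the summary table that 2.4 kernels
    print at the top) are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def sample():
    """Read the current load averages and swap in use (kB) from /proc."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    swap_used_kb = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    return load, swap_used_kb


def monitor(interval=60, count=None):
    """Print one timestamped line per interval; count=None runs until killed."""
    done = 0
    while count is None or done < count:
        load, swap_used_kb = sample()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print("%s load=%.2f/%.2f/%.2f swap_used=%d kB"
              % (stamp, load[0], load[1], load[2], swap_used_kb), flush=True)
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

Calling monitor() with its output redirected to a file on another
machine (over SSH, say) would keep the record off the firewall's own
disk.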

It's possible to SSH into the box, but apparently Symantec intended
you to access it primarily through the Java browser interface (which
runs Java on both the client and the server and, ironically, is
itself the source of quite a bit of processor load when you look at
the rather large logs).  /var/log/messages has very little in it, and
nothing out of the ordinary from when the spikes happened.  The
"interesting" logs are in /var/log/sg/ and are in a binary format
(with some text) that I don't know how to read, so I was forced back
to the web interface.  I did take the time to notice that data was
arriving in those logs at about 20 MB/hour, which would translate to
roughly 125 work days to fill the 20 GB /var/ partition.  The guy who
deals with the system the most says that sounds about right; he
rotates the logs off the box quite frequently.  One thing Symantec had
expressed an interest in was the number of httpd processes running:
apparently around 500.  The bulk of the traffic they proxy is web.
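
For anyone checking the fill-time estimate, the arithmetic can be
sketched as below.  The eight-hour work day and the decimal units
(1 GB = 1000 MB) are assumptions chosen because they reproduce the
125-day figure; if the logs grow at the same rate around the clock, the
partition fills roughly three times faster.

```python
def days_to_fill(capacity_gb, rate_mb_per_hour, hours_per_day):
    """Days until a partition fills at a constant logging rate."""
    capacity_mb = capacity_gb * 1000  # assumption: decimal units
    hours = capacity_mb / rate_mb_per_hour
    return hours / hours_per_day


print(days_to_fill(20, 20, 8))   # 125.0 eight-hour work days
print(days_to_fill(20, 20, 24))  # about 41.7 days if logging never stops
```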

I spent a lot of time combing through the log, looking both at
"normal" times and at the time when it apparently spiked and choked.
I couldn't see anything that looked suspicious - no significant
increase in volume, no unusual transactions.  But it's a very dense
log and I may have missed something.
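
One thing that might help with the binary logs in /var/log/sg/ is
pulling out just the text portions, much as the strings utility does,
so they can be searched with ordinary tools instead of the web
interface.  This is only a sketch: I don't know the actual log format,
so it simply assumes the text parts are runs of printable ASCII, and
the function names are invented for the example.

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def runs_in_file(path, min_len=4):
    """Read a whole file as bytes and return its printable runs.

    Reading everything into memory is fine for a sketch, but a log
    that grows by tens of MB an hour may need to be read in chunks.
    """
    with open(path, "rb") as f:
        return printable_runs(f.read(), min_len)
```

Run against a copy of a log from around the time of a stall, the
output could then be filtered for timestamps or for httpd entries.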

At this point the only things I can think of are flaky hardware (HD,
NIC, disk controller, memory ...) or high load.  Neither seems likely,
and both are hard to track down (this is a live device with no backup).
Any suggestions would be appreciated.

-- 
Giles
http://www.gilesorr.com/
gilesorr-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists