Troubleshooting erratic network performance

ted leslie tleslie-RBVUpeUoHUc at public.gmane.org
Sat Jun 6 02:57:07 UTC 2009


i ran into a similar issue, but with mine,
it was an iptables

 limit: avg ####/sec burst #  

type policy that got exceeded.

as the limit was hit, some people noticed the issue; once it was hit really hard, everyone got timeouts.

you're fine one minute, then the threshold gets hit, and boom! users start refreshing away,
which builds a queue, then the limit is almost always being exceeded, and it melts down.
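That meltdown follows from how a rate limit with a burst allowance
behaves: a bucket holds up to `burst` tokens, refilled at the average
rate, and each matching packet spends one. The Python below is a
minimal token-bucket sketch of that idea, not the kernel's own `limit`
match code; the class, function, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LimitRule:
    """Hypothetical model of a 'limit: avg RATE/sec burst BURST' rule.

    A bucket holds up to `burst` tokens and refills at `rate` tokens per
    second; each packet that matches spends one token.
    """
    rate: float          # average packets per second the rule allows
    burst: int           # bucket size: packets allowed back-to-back
    tokens: float = 0.0  # current fill level, set to `burst` on creation
    last: float = 0.0    # arrival time of the previous packet, in seconds

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # the bucket starts full

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at burst.
        self.tokens = min(float(self.burst),
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the packet does not match


def over_limit_fraction(rate: float, burst: int,
                        arrivals_per_sec: float, seconds: int) -> float:
    """Fraction of evenly spaced packets that exceed the limit."""
    rule = LimitRule(rate=rate, burst=burst)
    total = int(arrivals_per_sec * seconds)
    over = sum(not rule.allow(i / arrivals_per_sec) for i in range(total))
    return over / total


print(over_limit_fraction(100, 5, 50, 10))   # half the average rate
print(over_limit_fraction(100, 5, 200, 10))  # twice the average rate
```

With these made-up numbers, traffic at half the average rate never goes
over, while at twice the rate the burst is gone within a few packets and
roughly half of everything after that exceeds the limit. What happens to
those packets (dropped, rejected, or passed to later rules) depends on
the rule's target and the chain policy.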

as long as you're sure there are no errors or drops in ifconfig, you can rule out a duplex issue with the switch.
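A duplex mismatch usually shows up as climbing error, frame, or
collision counters. ifconfig reads the same per-interface counters that
Linux exposes in /proc/net/dev, so a small script can flag them. This
is a sketch; the function names are hypothetical, and the column order
is the standard /proc/net/dev layout (8 receive fields, then 8 transmit
fields).

```python
# Column names of /proc/net/dev, in order, after the "iface:" prefix.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed")
FIELDS = [f"rx_{n}" for n in RX_FIELDS] + [f"tx_{n}" for n in TX_FIELDS]

# Counters that tend to climb with a duplex mismatch or a bad cable/port.
SUSPECT = ("rx_errs", "rx_drop", "rx_frame", "tx_errs", "tx_drop",
           "tx_colls", "tx_carrier")


def parse_proc_net_dev(text: str) -> dict[str, dict[str, int]]:
    """Map each interface name to its named counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Older kernels may print "eth0:123" with no space; split on ":".
        iface, data = line.split(":", 1)
        stats[iface.strip()] = dict(zip(FIELDS, (int(v) for v in data.split())))
    return stats


def suspicious_counters(stats: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Keep only interfaces with non-zero error, drop, or collision counters."""
    report = {}
    for iface, counters in stats.items():
        bad = {k: counters[k] for k in SUSPECT if counters.get(k, 0) > 0}
        if bad:
            report[iface] = bad
    return report


def check_local_interfaces(path: str = "/proc/net/dev") -> dict[str, dict[str, int]]:
    """Read the live counters on a Linux box and report suspicious ones."""
    with open(path) as f:
        return suspicious_counters(parse_proc_net_dev(f.read()))
```

The counters are cumulative since boot, so comparing two readings of
`check_local_interfaces()` taken while the slowdown is happening says
more than a single snapshot.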

i assume you checked the apache error logs?

i assume that with no one hitting it, you get a perfect response?
to help rule the web server in or out (vs. the network), you could
use iptables to forward traffic to one of the other boxes.
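One way to do that forwarding is a DNAT rule plus MASQUERADE, so the
other box's replies come back through this one. The sketch below only
builds the iptables command lines (it runs nothing); the backend address
is a hypothetical documentation IP, and it assumes IP forwarding is on
(net.ipv4.ip_forward = 1) and the FORWARD chain lets the traffic
through.

```python
import shlex


def forward_rules(backend_ip: str, port: int = 80) -> list[list[str]]:
    """Build (but do not run) iptables commands that send TCP `port`
    arriving at this box on to `backend_ip`."""
    return [
        # Rewrite the destination of incoming connections to the other box.
        ["iptables", "-t", "nat", "-A", "PREROUTING", "-p", "tcp",
         "--dport", str(port), "-j", "DNAT",
         "--to-destination", f"{backend_ip}:{port}"],
        # Rewrite the source as well, so the other box replies via this
        # one instead of answering the client directly.
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-p", "tcp",
         "-d", backend_ip, "--dport", str(port), "-j", "MASQUERADE"],
    ]


for argv in forward_rules("192.0.2.15"):  # hypothetical healthy box
    print(shlex.join(argv))
```

The forwarded traffic still crosses this box's NIC and switch port, so
if pages come back fast that way, apache on this box is the likely
culprit; if they are still slow, look at the network side.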

as for the arp cache,
just configure the new ethernet card, or the new box, with the same MAC address to get around the 6 hour cache issue.
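On Linux the address can be set with iproute2's `ip link set dev IFACE
address MAC`. The sketch below just builds those commands after
normalising the old card's address; the function name and example
values are hypothetical. The change does not survive a reboot, so it
would also need to go into the distro's network configuration.

```python
import re

# Six colon-separated hex octets, e.g. 00:14:4f:aa:bb:cc.
MAC_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}")


def clone_mac_commands(iface: str, old_mac: str) -> list[list[str]]:
    """Build (but do not run) commands that give `iface` the MAC address
    of the card it replaces."""
    mac = old_mac.strip().lower().replace("-", ":")
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"not a MAC address: {old_mac!r}")
    return [
        # Many drivers refuse an address change while the link is up.
        ["ip", "link", "set", "dev", iface, "down"],
        ["ip", "link", "set", "dev", iface, "address", mac],
        ["ip", "link", "set", "dev", iface, "up"],
    ]
```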

-tl



On Fri, 05 Jun 2009 20:36:04 -0400
Jamon Camisso <jamon.camisso-H217xnMUJC0sA/PxXw9srA at public.gmane.org> wrote:

> So it had to happen sometime. Friday, 4:30pm, developers go live with a 
> new release. Server performs just fine until someone upgrades something 
> and damn, I'm stuck in the server room on a gorgeous Friday night.
> 
> Hardware:
> 1) Sun Fire X4600, 16-core Opteron w/ 32GB RAM. A beast.
> 2) Quad Intel 82546EB Gigabit ports
> 3) Old and new ethernet cables
> 4) 4 other identical machines that haven't shown any signs of trouble.
> 
> OS:
> 1) 2.6.29-gentoo-r5 kernel w/ e1000 ethernet driver loaded as a module.
> 2) 2.6.25-gentoo-r7 kernel w/ e1000 ethernet driver loaded as a module.
> 
> Symptoms:
> 1) At first it looked like apache was having a problem with mod_rails. 
> Disabled that, nope, still erratic timeouts on web pages. Disabled 
> mod_proxy_ajp, still odd slowdowns. MaxClients are fine etc. Server had 
> 60 days of uptime, during which time it performed admirably. The problem 
> seems limited to apache, but then again, apart from a bit of mail, there 
> isn't much network throughput apart from it, so it's kind of hard to 
> diagnose without installing a whole other webserver.
> 
> 2) ab run from another host on a gigabit link behind the same switch comes 
> in at a paltry 1.93 pages/second. Usually 50-100 is to be expected 
> depending on the vhost being served.
> 
> 3) vmstat looks absolutely normal, like it has for the last 60 days. No 
> blocking processes, cpu almost 100% idle, i/o is negligible.
> 
> 4) smartctl shows all 4 sas drives in the raid5 array are healthy, as 
> does /proc/mdstat. No disk problems that I can see.
> 
> 5) Outgoing network throughput is as expected; a full kernel from 
> kernel.org takes only seconds.
> 
> 6) netstat shows about 300 connections at any given time, no large 
> fluctuations at all.
> 
> 7) The only odd looking graph (using munin) is the fork rate. I think 
> the spike I see of 150 forks/second is apache starting up after 
> rebooting though. Probably a red herring.
> 
> 8) tcpdump shows dropped packets sometimes when the problem is 
> occurring. Only sometimes. iptables looks fine, and ifconfig doesn't 
> show any dropped packets. I'm not too familiar with tcpdump, but I made 
> sure to grab some packets so I can pore over them and try to glean 
> something. *Any tips as to what to look for would be appreciated.*
> 
> Haven't looked at sysctl settings at all, things have been fine up until 
> now, and the other admin rebooted so I'd expect that any odd problem 
> somewhere in the bowels of the system's tcp stack would have gone away.
> 
> Any thoughts or suggestions of where to look next? I consider myself a 
> capable admin for basic stuff like setting up apache, databases etc., 
> but this problem has me completely baffled. I'd swap the disks to 
> another chassis, but some may recall a problem I had with the same 
> servers a month ago where switching ip addresses and mac addresses takes 
> 6 hours or so for the switch to realize what's happened, so that's 
> probably a no go.
> 
> Thanks! Jamon
> --
> The Toronto Linux Users Group.      Meetings: http://gtalug.org/
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists
> 
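On what to look for in the capture: repeated SYNs and retransmitted
data segments are the usual sign that packets between clients and
apache are being lost. Note that the "packets dropped by kernel" figure
tcpdump prints on exit counts packets the capture itself could not keep
up with, not packets lost on the wire. The sketch below counts likely
retransmissions in a classic pcap file as written by `tcpdump -w`,
using only the standard library; it is a rough heuristic (same flow and
sequence number seen twice), not how Wireshark classifies them, and it
only handles Ethernet + IPv4 + TCP.

```python
import struct


def count_retransmissions(data: bytes) -> int:
    """Rough count of TCP retransmissions in a classic pcap capture.

    A segment carrying data (or a SYN) whose flow and sequence number
    were already seen counts as a retransmission.
    """
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian machine
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures (linktype 1) are handled")

    seen = set()
    retransmissions = 0
    offset = 24  # end of the global header
    while offset + 16 <= len(data):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if frame[12:14] != b"\x08\x00":
            continue  # not IPv4
        ip = frame[14:]
        if len(ip) < 20 or ip[9] != 6:
            continue  # truncated or not TCP
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack("!H", ip[2:4])
        tcp = ip[ihl:]
        if len(tcp) < 14:
            continue  # TCP header cut off by the capture snap length
        sport, dport, seq = struct.unpack("!HHI", tcp[:8])
        data_offset = (tcp[12] >> 4) * 4
        is_syn = bool(tcp[13] & 0x02)
        # Use the IP length, not the captured bytes, since tcpdump may truncate.
        payload_len = total_len - ihl - data_offset
        if payload_len <= 0 and not is_syn:
            continue  # pure ACKs legitimately repeat sequence numbers
        key = (ip[12:16], ip[16:20], sport, dport, seq)
        if key in seen:
            retransmissions += 1
        else:
            seen.add(key)
    return retransmissions
```

Usage would be along the lines of
`count_retransmissions(open("capture.pcap", "rb").read())`, where the
file name is hypothetical. Since it reads lengths from the IP header, a
capture that kept only packet headers is enough.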


-- 
ted leslie <tleslie-RBVUpeUoHUc at public.gmane.org>