Off to do battle

Posted: Sun Mar 02, 2014 10:45 pm
by BobbyK
So, I've had a fun weekend here.

Saturday morning, $CABLEMONOPOLY replaces the cable modem at Customer Site with a new unit, to allow them to upgrade from 18/2 to 100/10 service.

Our remote management and monitoring (RMM) system uses HTTP or HTTPS hits to check in and communicate. Servers are set to check in every 30 seconds, workstations every 5 minutes.

Starting about 3 hours after the installer leaves, we start getting server down notifications, each followed by a server up a few minutes later (specifically, between 4 and 6 minutes later). The servers impacted are all at this site, but there's no rhyme, reason, or pattern to which ones. It's a mix of physical and virtual machines. There are 12 servers at the site, and no more than 2 report down at the same time during all of this.
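For anyone who wants to pull the same down/up pattern out of a check-in log, here is a rough Python sketch. It assumes you can export each machine's check-in times as a list of timestamps. The function name `find_outages`, the "3 missed check-ins" threshold, and the sample timestamps are all made up for illustration and have nothing to do with how any particular RMM product decides a server is down.

```python
from datetime import datetime, timedelta

def find_outages(checkins, interval_s=30, missed=3):
    """Return (last_seen, next_seen, gap) for every gap between consecutive
    check-ins that is longer than `missed` intervals.

    interval_s=30 matches the server check-in interval described above;
    missed=3 is an arbitrary threshold chosen for this sketch.
    """
    threshold = timedelta(seconds=interval_s * missed)
    ordered = sorted(checkins)
    outages = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap > threshold:
            outages.append((prev, cur, gap))
    return outages

# Invented sample data: five normal check-ins, a 5-minute hole, five more.
start = datetime(2014, 3, 1, 12, 0, 0)
sample = [start + timedelta(seconds=30 * i) for i in range(5)]
sample += [start + timedelta(seconds=120 + 300 + 30 * i) for i in range(5)]

for last_seen, next_seen, gap in find_outages(sample):
    print(f"silent from {last_seen} to {next_seen} ({gap})")
```

Run per machine, this makes it easy to see whether the holes line up across servers or wander around at random.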

End users call to bitch about websites timing out. For testing purposes, I temporarily disable the web proxy on the perimeter UTM, with no change. So it's not that.

I finally manage to get a Wireshark capture from both our RMM server and an impacted machine, capturing normal check-ins, badness, and the resumption of normal check-ins. Guess what I see?

On our server, I see several minutes of no packets, followed by what Wireshark flags as a TCP Retransmission and a bunch of TCP Dup ACKs.

On the customer side, I see ACKs suddenly stop, and then a bunch of retransmits.
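The two captures can also be compared mechanically: anything the customer machine sent that never shows up on the server side was lost somewhere in between. A minimal Python sketch of that comparison follows. It assumes each capture has been exported as (raw sequence number, payload length) pairs for one direction of a single TCP flow, and that nothing in the path rewrites sequence numbers (some firewalls do). The helper name `missing_at_receiver` and the sample numbers are invented.

```python
def missing_at_receiver(sent, received):
    """Return segments seen in the sender-side capture but never in the
    receiver-side capture, in send order, each reported once.

    Both arguments are iterables of (seq, payload_len) tuples.
    """
    seen = set(received)
    reported = set()
    missing = []
    for seg in sent:
        if seg not in seen and seg not in reported:
            missing.append(seg)
            reported.add(seg)
    return missing

# Invented example: the segment at seq 1200 is sent twice (original plus
# retransmission) and neither copy reaches the far end.
sent = [(1000, 200), (1200, 200), (1200, 200), (1400, 100)]
received = [(1000, 200), (1400, 100)]
print(missing_at_receiver(sent, received))
```

A non-empty result means packets left one end and never arrived at the other, which points at the path between the two capture points and not at either endpoint.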

Pretty clear cut, right?

:evil: Nope. MOUTHBREATHER#1 at $CABLEMONOPOLY says "Signal is good, and I can ping the modem. It's not our problem."

I would like to point out that this is the same $CABLEMONOPOLY where we had to get the state's Public Service Commission involved before they would admit, and correct, that they had fucked up an ACL on a CMTS somewhere and blocked port 5060 TCP/UDP for about 1000 of their customers.

I'm either going to need a new liver later, or bail money. I know where their support call center is, and I've now been on hold for 15 minutes waiting for someone at the NOC, assuming that the aforementioned mouthbreather is in fact escalating me as requested.

FML

Re: Off to do battle

Posted: Sun Mar 02, 2014 11:34 pm
by Greg
Flap on, flap off, the flapper!

Re: Off to do battle

Posted: Sun Mar 02, 2014 11:41 pm
by BobbyK
I did in fact use the phrase "flapping like a flock of geese" with the dude at the NOC. Who then proceeded to demand that a tech be sent onsite in the morning.

<sigh>

Re: Off to do battle

Posted: Mon Mar 03, 2014 12:44 am
by 308Mike
I feel for ya' brother!!! I don't know how many times I did battle with our NOC, which sat in the middle between us and corporate. Our NOC would pass along my data and testing only to have corporate say it wasn't their problem, and we'd argue it wasn't us, it was THEM causing it. I spent more than an entire weekend troubleshooting a Microsoft Policy issue that someone upstream had created, which was screwing up EVERY NEW MACHINE attached to the AD tree but had no effect on stand-alone machines.

We were FINALLY able to get them to change the screwed-up policy, but not before it cost us and the company many HUNDREDS OF THOUSANDS of dollars in lost productivity and salaries, and only AFTER I was able to PROVE that one of THEIR F'ED up policies, pushed down to us, was causing the problem.

Talk about a head-scratcher!! I kept coming up with the same problems, the NOC kept saying they had no idea why it was happening, and Corp was saying they had nothing to do with it.

When they FINALLY got someone to review the MS policies modified by corporate (of course, they NEVER messed up our Linux/Unix machines used by our engineers), they FINALLY found the problem and meekly admitted they'd created it but never issued an apology for all the wasted time and resources tracking down (and PROVING) THEIR ERROR.

YES, I still get hot thinking about it.

I UNDERSTAND!!!

Re: Off to do battle

Posted: Mon Mar 03, 2014 2:01 am
by randy
If you need bail money after (allegedly) trashing said company's customer non-support center, I'm in.

Re: Off to do battle

Posted: Mon Mar 03, 2014 2:10 am
by First Shirt
I'm always willing to contribute to a worthy cause. I'm in!

Re: Off to do battle

Posted: Mon Mar 03, 2014 7:12 pm
by BobbyK
And to add insult to injury, three additional, widely dispersed sites on the same ISP have started flapping, as well.