Discussion:
Cogent Router dropping packets
Ryan Harden
2008-04-17 20:35:22 UTC
I just spoke to Cogent about another issue; they said they only know about
issues in Los Angeles. Nevertheless, try 877.726.4386 or ***@cogentco.com

/Ryan
Hi,
Some of our VoIP customers are experiencing issues using our service and it
only happens when routing through Cogent.
Does anyone have a contact for them?
Thanks
    Host                                            Loss%   Snt  Last   Avg  Best   Wrst  StDev
 1. adsl-63-194-xxx-xxx.dsl.lsan03.pacbell.net       0.0%  6049  13.7  24.2   8.4   72.2   11.1
 2. dist3-vlan60.irvnca.sbcglobal.net                2.0%  6049  19.0  23.5   8.6  217.4   13.5
 3. bb1-p6-7.emhril.ameritech.net                    0.0%  6049  31.7  43.0   8.6  317.3   46.4
 4. ex2-p14-0.eqlaca.sbcglobal.net                   0.0%  6049  19.8  44.3   9.4  487.8   49.5
 5. te8-1.mpd01.lax05.atlas.cogentco.com             0.0%  6049  15.0  41.1   9.5  675.9   53.4
 6. vl3491.ccr02.lax01.atlas.cogentco.com            3.3%  6049  34.3  29.4   9.9  337.5   25.4
 7. te3-4.ccr01.lax04.atlas.cogentco.com             5.2%  6049  19.3  30.1  10.4  275.1   23.6
 8. vl3805.na21.b002695-2.lax04.atlas.cogentco.com   5.1%  6049  35.0  27.7  10.2  227.3   12.2
 9. PAETEC_Communications_Inc.demarc.cogentco.com    5.3%  6049  18.3  35.8   9.8  252.4   33.6
10. gi-4-0-1-3.core01.lsajca01.paetec.net            5.5%  6049  17.3  37.9  13.3  1054.   43.5
11. po-5-0-0.core01.anhmca01.paetec.net              5.5%  6049  30.7  37.5  13.7  1042.   37.9
12. gi-3-0-0.edge03.anhmca01.paetec.net              5.3%  6049  19.7  33.3  12.9  385.1   20.7
13. 74.10.xxx.xxx                                    5.9%  6049  23.8  35.8  18.1   86.5   11.8
14. 74.10.xxx.xxx                                    5.2%  6049  40.4  36.4  18.0   91.1   11.8
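
For readers less used to mtr output: loss at a single middle hop that does not carry through to later hops (hop 2 above) is usually just that router rate-limiting its ICMP replies, while loss that starts at some hop and persists all the way to the destination points at a real problem. A minimal Python sketch of that reading, using the rows from the report above; the function names and the 1% threshold are assumptions made for illustration, not part of mtr.

```python
import re

# Rows copied from the report above (hop number, host, Loss% are what we use).
REPORT = """\
 1. adsl-63-194-xxx-xxx.dsl.lsan03.pacbell.net 0.0% 6049 13.7 24.2 8.4 72.2 11.1
 2. dist3-vlan60.irvnca.sbcglobal.net 2.0% 6049 19.0 23.5 8.6 217.4 13.5
 3. bb1-p6-7.emhril.ameritech.net 0.0% 6049 31.7 43.0 8.6 317.3 46.4
 4. ex2-p14-0.eqlaca.sbcglobal.net 0.0% 6049 19.8 44.3 9.4 487.8 49.5
 5. te8-1.mpd01.lax05.atlas.cogentco.com 0.0% 6049 15.0 41.1 9.5 675.9 53.4
 6. vl3491.ccr02.lax01.atlas.cogentco.com 3.3% 6049 34.3 29.4 9.9 337.5 25.4
 7. te3-4.ccr01.lax04.atlas.cogentco.com 5.2% 6049 19.3 30.1 10.4 275.1 23.6
 8. vl3805.na21.b002695-2.lax04.atlas.cogentco.com 5.1% 6049 35.0 27.7 10.2 227.3 12.2
 9. PAETEC_Communications_Inc.demarc.cogentco.com 5.3% 6049 18.3 35.8 9.8 252.4 33.6
10. gi-4-0-1-3.core01.lsajca01.paetec.net 5.5% 6049 17.3 37.9 13.3 1054. 43.5
11. po-5-0-0.core01.anhmca01.paetec.net 5.5% 6049 30.7 37.5 13.7 1042. 37.9
12. gi-3-0-0.edge03.anhmca01.paetec.net 5.3% 6049 19.7 33.3 12.9 385.1 20.7
13. 74.10.xxx.xxx 5.9% 6049 23.8 35.8 18.1 86.5 11.8
14. 74.10.xxx.xxx 5.2% 6049 40.4 36.4 18.0 91.1 11.8
"""

# "<hop>. <host> <loss>%" at the start of each row; later columns are ignored.
ROW = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%")


def parse_hops(report):
    """Return a list of (hop_number, host, loss_percent) tuples."""
    hops = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def first_persistent_loss(hops, threshold=1.0):
    """Return the first hop from which loss stays >= threshold through the last hop.

    The 1% threshold is an arbitrary assumption. Returns None when the final
    hops show no loss.
    """
    first = None
    for hop in hops:
        if hop[2] >= threshold:
            if first is None:
                first = hop
        else:
            first = None  # loss did not carry forward, so reset
    return first


if __name__ == "__main__":
    print(first_persistent_loss(parse_hops(REPORT)))
```

On this report the sketch skips the isolated 2.0% at hop 2 and reports hop 6 (vl3491.ccr02.lax01.atlas.cogentco.com), the first hop after which every later hop also shows loss.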
--
Ryan M. Harden, BS, KC9IHX Office: 217-265-5192
CITES - Network Engineering Cell: 630-363-0365
2130 Digital Computer Lab Fax: 217-244-7089
1304 W. Springfield email: ***@uiuc.edu
Urbana, IL 61801

University of Illinois at Urbana/Champaign
University of Illinois - ICCN
Mike Fedyk
2008-04-17 21:15:24 UTC
Thank you, the issue seems to be fixed now at Cogent.

Does anyone know how often issues like this crop up? I'm trying to gauge
how hard I should push here for routing around Cogent for the networks
our customers connect from.

Mike

David Coulson
2008-04-17 23:59:59 UTC
Cogent frequently have routing and packet loss issues. I can't imagine
VoIP over their network is all that appealing to most people. Last time
I used Cogent I had a problem approx. every month, and I purchased
transit from them.

Good luck :-)
Post by Mike Fedyk
Thank you, the issue seems to be fixed now at Cogent.
Does anyone know how often issues like this seem to crop up? I'm wondering
to see how hard I should push here for routing around cogent for networks
our customers connect from.
Paul Stewart
2008-04-18 00:33:34 UTC
Same here... frequent packet loss. We had Cogent GigE service for about
9 months if I recall - more than one major outage per month and packet
loss issues at least once a week.

You get what you pay for (within reason)....


---
Paul Stewart
Senior Network Administrator
Nexicom
5 King St. E., Millbrook, ON, LOA 1GO
Phone: 705-932-4127
Web: http://www.nexicom.net
Nexicom. Connected. Naturally.




Mike Fedyk
2008-04-19 00:05:58 UTC
(Crossed Fingers)

Cogent's network seems "OK", for now.

I've received several responses asking for details on how I would avoid
Cogent. It looks like getting a connection to the AT&T network will allow
us to serve our customers on their DSLs and use AT&T's direct peering to the
Time Warner network for our customers with cable Internet.

If anyone has any ideas on how this will work, please let me know. For
instance, do most networks prefer to keep packets on their own network until
they are closest to the end point, or might a network just send the traffic
through Cogent in another part of its network a few hops away?

-----Original Message-----
From: Mike Fedyk
Sent: Thursday, April 17, 2008 2:59 PM
To: ***@merit.edu
Subject: RE: Cogent Router dropping packets


I spoke too soon:

    Host                                            Loss%   Snt  Last   Avg  Best   Wrst  StDev
 1. adsl-63-194-XXX-XXX.dsl.lsan03.pacbell.net       0.0%   109   9.2  19.2   8.4   57.9   11.0
 2. dist3-vlan60.irvnca.sbcglobal.net                0.9%   109   8.4  16.7   8.3   45.6    9.6
 3. bb1-p6-7.emhril.ameritech.net                    0.0%   109   8.6  36.3   8.5  256.6   44.2
 4. ex2-p14-0.eqlaca.sbcglobal.net                   0.0%   109  10.3  39.4   9.3  209.3   46.2
 5. te8-1.mpd01.lax05.atlas.cogentco.com             0.0%   108  32.4  34.3   9.3  238.6   45.1
 6. vl3491.ccr02.lax01.atlas.cogentco.com            3.7%   108  17.0  23.4  12.9   98.9   13.4
 7. te3-4.ccr01.lax04.atlas.cogentco.com            17.6%   108  39.1  28.8  16.4  198.9   22.1
 8. vl3805.na21.b002695-2.lax04.atlas.cogentco.com  12.0%   108  34.1  27.6  17.0   68.7   11.2
 9. PAETEC_Communications_Inc.demarc.cogentco.com   10.2%   108  22.4  35.3  17.0  168.7   27.8
10. gi-4-0-1-3.core01.lsajca01.paetec.net           18.5%   108  21.2  34.2  21.0  188.6   20.6
11. po-5-0-0.core01.anhmca01.paetec.net             10.3%   108  35.7  33.9  20.5  232.7   23.9
12. gi-3-0-0.edge03.anhmca01.paetec.net             13.0%   108  21.0  31.6  20.2  157.9   16.6
13. 74.10.xxx.xxx                                   11.1%   108  25.7  33.9  25.2   55.2    8.9
14. 74.10.xxx.xxx                                   15.7%   108  26.7  35.7  25.0   70.8   11.7
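
A note on reading this shorter run next to the earlier 6049-probe report: with only 108 probes per hop, each percentage is much noisier. A rough Python sketch of the arithmetic, assuming (as a simplification) that probes are lost independently, which bursty real-world loss does not strictly satisfy; the function name is made up for illustration.

```python
import math


def lost_and_stderr(loss_pct, sent):
    """Return (approximate packets lost, standard error of Loss% in percentage points).

    Uses the binomial approximation sqrt(p * (1 - p) / n), which assumes
    independent probes. Treat the result as a rough guide only.
    """
    p = loss_pct / 100.0
    lost = round(p * sent)
    stderr_pct = 100.0 * math.sqrt(p * (1.0 - p) / sent)
    return lost, stderr_pct


if __name__ == "__main__":
    # Hop 7 (te3-4.ccr01.lax04) in the short run and in the long run.
    for label, loss_pct, sent in (("short run", 17.6, 108), ("long run", 5.2, 6049)):
        lost, err = lost_and_stderr(loss_pct, sent)
        print(f"{label}: about {lost} lost of {sent}, Loss% +/- {err:.2f} points")
```

Under that assumption, 17.6% of 108 probes is about 19 lost packets with an uncertainty of roughly 3.7 percentage points, while 5.2% of 6049 is about 315 lost packets with an uncertainty under 0.3 points. The hop-to-hop swings between 10% and 18% in the short run are therefore mostly within noise, but the jump from the long run's ~5% is not.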


Martin Hannigan
2008-04-19 23:18:03 UTC
It is Saturday after all. We are generally all aware of Cogent's
'status'. You're not having a unique experience.

Martin
manolo
2008-04-19 23:26:55 UTC
Some things just never change at Cogent. I fought them for months, way
back when, to get us off their infamous two-BGP-peer setup after many an
outage caused by it. They finally put us on a single BGP session,
but it took forever. Let's just say Cogent didn't last long at the
company I worked for.

You get what you pay for....


Manolo
Paul Wall
2008-04-21 04:57:24 UTC
Post by manolo
Some things just never change at cogent.. fought them for months way
back when to get me off their infamous 2 bgp peer setup after many an
outage due to this setup, they finally put us on a single bgp session
but it took forever. Let's just say cogent didn't last long at the
company I worked for.
Could you provide additional details on the failure mode experienced
resultant from this "two tiered" configuration? How did moving to a
"conventional" configuration with a single directly-connected neighbor
solve things?

What steps were taken during your postmortem and subsequent lab
simulations to verify that the outages were not with the customer-side
implementation, or perhaps a simple typographical error?

Here in H-town, we are deploying a metro/BLEC network comprised of
1000s of small L3 boxes not carrying full tables (Cisco 3560 and
similar), and would like very much to learn from these major
architectural mistakes, so that we can avoid similar outage scenarios.
Any information you could provide would be excellent.
Post by manolo
You get what you pay for....
Not passing any judgment on quality, Cogent is more towards the middle
of the road for price, these days, on larger commits.

Paul Wall
Joe Greco
2008-04-19 13:01:54 UTC
Post by Paul Stewart
Same here... frequent packet loss. We had Cogent GigE service for about
9 months if I recall - more than one major outage per month and packet
loss issues at least once a week.
You get what you pay for (within reason)....
Cogent tends towards being a content network, and has occasional peering
issues with other networks who wind up bearing some of the related costs.
Internally, their network earns better than average marks, but sometimes
there will be strange things that don't seem to make sense, and these may
or may not be explained or fixed if you contact Cogent. Externally, we've
seen their history of peering problems.

We've noticed that there tend to be specific routes where Cogent
experiences problems for long periods of time. In the last few years,
for example, if packets went through Paris or London to destination ISPs
in the associated countries, there tended to be some performance problems.
Some of these have cleared up. It would appear to be peering and/or
congestion issues. These would typically only affect specific
destinations, but often required analysis and workarounds to be
implemented.

We've not noticed problems with major outages, but that may be because of
our location (Ashburn). There seem to be some people who do not experience
outages, and others who experience frequent outages. This may be dependent
on where in the Cogent network you are.

In their original business model, providing business ethernet connections
in a metro area, they're a very attractive provider, but I'd guess their
overall reliability would feel on par with some commercial ISPs, due
to occasional peering problems, packet loss, etc.

From a service provider's point of view, in many places in the country,
their service is still sufficiently cheap that it could be worth
considering as part of a bandwidth mix. Given the availability of
automated tools to manage connections susceptible to brownouts (Cisco
OER, Avaya/RouteScience, etc), it may be quite attractive and viable to
use Cogent in certain environments, but you have to give it some thought.

To the original poster, Cogent to AT&T (DSL, etc) has historically been
what I'd consider to be "problematic." For years. Filtering out routes
that involve both AT&T and Cogent seems to work, though it may be
unnecessarily aggressive (and obviously you'd need an alternate carrier
with good connectivity to AT&T).
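
To make the "filter out routes that involve both AT&T and Cogent" idea concrete, here is a small Python sketch of the AS-path test such a filter would apply. It is not router configuration. AS174 for Cogent is stated elsewhere in this thread; the AT&T AS numbers (7018 and 7132), the example prefixes, and the other AS numbers are assumptions chosen for illustration only.

```python
COGENT_ASN = 174                      # stated elsewhere in this thread
ATT_ASNS = frozenset({7018, 7132})    # assumption: AT&T / SBC AS numbers

# Hypothetical routes: documentation prefixes and documentation AS numbers.
ROUTES = {
    "192.0.2.0/24": [174, 7018, 64500],       # via Cogent into AT&T
    "198.51.100.0/24": [64501, 7018, 64502],  # AT&T via some other carrier
    "203.0.113.0/24": [174, 64503],           # Cogent only
}


def involves_cogent_and_att(as_path):
    """True when the AS path contains Cogent and at least one AT&T AS."""
    hops = set(as_path)
    return COGENT_ASN in hops and not hops.isdisjoint(ATT_ASNS)


def accept_routes(routes):
    """Keep only routes whose AS path does not involve both networks."""
    return {
        prefix: path
        for prefix, path in routes.items()
        if not involves_cogent_and_att(path)
    }


if __name__ == "__main__":
    for prefix in sorted(accept_routes(ROUTES)):
        print(prefix)
```

As noted above, this is aggressive: it also rejects paths where the Cogent/AT&T handoff is not the problem, and it only helps if another carrier offers a path to the same AT&T prefixes.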

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
Joe Greco
2008-04-21 15:41:21 UTC
Post by Paul Wall
Post by manolo
Some things just never change at cogent.. fought them for months way
back when to get me off their infamous 2 bgp peer setup after many an
outage due to this setup, they finally put us on a single bgp session
but it took forever. Let's just say cogent didn't last long at the
company I worked for.
Could you provide additional details on the failure mode experienced
resultant from this "two tiered" configuration? How did moving to a
"conventional" configuration with a single directly-connected neighbor
solve things?
For those unfamiliar, Cogent has a system where you set up an eBGP peering
with the Cogent router you're connected to, for the purpose of announcing
your routes into Cogent. However, these are typically smaller, aggregation
class routers that do not handle full tables, so you don't receive your
routes from that router. To get a full table FROM Cogent, you need to set
up an eBGP multihop session with them, to their nearest full-table router. I
believe they actually do all their BGP connections in that manner.

This probably makes a lot of sense from an engineering point of view, and
could be construed as a BGP competence test. On the other hand, it does
have the potential to make things more complex in the event of a failure.

I'm not aware of any flaws with such a design that would cause "many an
outage," and connections that we've managed for customers with Cogent
suggest that it works well. However, if there are problems within the
local Cogent node, I could easily see situations where hard-to-identify
problems could result. That would seem to me to be an equipment, capacity,
or possibly a configuration issue, but not something which discredits the
overall strategy. Given that they're providing inexpensive bandwidth, it
isn't likely that they'll be sticking large routers everywhere for the
customers who want a full table and a simpler BGP configuration.

There are many things that you can realistically criticize Cogent for, but
I'm not sure the peerA/peerB thing should be one of them. It is certainly
more complex, but seems to serve a purpose.
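
As an illustration of the peerA/peerB arrangement described above, the two sessions can be modelled roughly as below. This is a sketch, not Cogent's actual configuration: the hop count to the full-table router and the TTL values are assumptions. The one general fact it relies on is that many BGP implementations send eBGP packets with a TTL of 1 by default, so a peer that is not directly connected needs the TTL raised (the "multihop" setting).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    name: str
    peer_hops_away: int              # router hops between customer and BGP peer
    ttl: int                         # TTL the customer sets on outgoing BGP packets
    announces_customer_routes: bool  # customer -> Cogent
    sends_full_table: bool           # Cogent -> customer

    def can_establish(self):
        """BGP packets reach the peer only if the TTL covers the hop count."""
        return self.ttl >= self.peer_hops_away


# peerA: the directly connected aggregation router (takes your announcements).
PEER_A = Session("peerA", peer_hops_away=1, ttl=1,
                 announces_customer_routes=True, sends_full_table=False)

# peerB: the nearest full-table router; 2 hops away is an assumption.
PEER_B = Session("peerB", peer_hops_away=2, ttl=2,
                 announces_customer_routes=False, sends_full_table=True)

# The same multihop peer with the default TTL of 1 left in place.
PEER_B_DEFAULT_TTL = Session("peerB", peer_hops_away=2, ttl=1,
                             announces_customer_routes=False, sends_full_table=True)

if __name__ == "__main__":
    for session in (PEER_A, PEER_B, PEER_B_DEFAULT_TTL):
        print(session.name, "ttl", session.ttl, "->", session.can_establish())
```

The point of the model is only that the customer ends up maintaining two sessions with different roles and different reachability requirements, which is where the extra complexity comes from.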
Post by Paul Wall
What steps were taken during your postmortem and subsequent lab
simulations to verify that the outages were not with the customer-side
implementation, or perhaps a simple typographical error?
Here in H-town, we are deploying a metro/BLEC network comprised of
1000s of small L3 boxes not carrying full tables (Cisco 3560 and
similar), and would like very much to learn from these major
architectural mistakes, so that we can avoid similar outage scenarios.
Any information you could provide would be excellent.
Interesting :-)
Post by Paul Wall
Post by manolo
You get what you pay for....
Not passing any judgment on quality, Cogent is more towards the middle
of the road for price, these days, on larger commits.
Or in places like Ashburn. I've been wondering what their future strategy
will be.

... JG
David Coulson
2008-04-21 16:02:49 UTC
Post by Joe Greco
For those unfamiliar, Cogent has a system where you set up an EBGP peering
with the Cogent router you're connected to, for the purposes of announcing
your routes into Cogent. However, these are typically smaller, aggregation
class routers, and do not handle full tables - so you don't get your routes
from that router. To get a full table FROM Cogent, you need to set up an
EBGP multihop session with them, to their nearest full-table router. I
believe they actually do all their BGP connections in that manner.
Depends on the service you purchase. Fast Ethernet seems to be delivered
as eBGP-multihop (the first hop is just a L3 switch), however DS-3 is
handled as a single BGP session. I'm not sure if GigE or SONET services
are handled as multihop or not.

Probably all depends what hardware they have at each POP....
Joe Greco
2008-04-21 19:54:40 UTC
Post by David Coulson
Depends on the service you purchase. Fast Ethernet seems to be delivered
as eBGP-multihop (the first hop is just a L3 switch), however DS-3 is
handled as a single BGP session. I'm not sure if GigE or SONET services
are handled as multihop or not.
GigE is, though perhaps not in all cases (we had a client buying x00Mbps
delivered over gigE, which was definitely multihop).
Post by David Coulson
Probably all depends what hardware they have at each POP....
In part, I'm sure. There is also a certain benefit to having consistency
throughout your network, and it sometimes struck me that many of the folks
working for Cogent had a bit more than average difficulty dealing with the
unusual situation. This is not meant harshly, btw. Generally I like the
Cogent folks, but they (and their products) have their faults, just as any
of the competition does.

It may also help to remember that there's "legacy" Cogent and then there's
PSI/etc. Perhaps there are some differences as a result.

The more things you can do using the same template, the less difficult it
is to support. On the flip side, the less flexible you are ...

... JG
manolo
2008-04-21 20:02:48 UTC
I do have to say that the PSINet side of Cogent is very good. We use
them in Europe without many issues. I stay far away from the legacy
Cogent network in the US.


Manolo
John van Oppen (list account)
2008-04-21 23:04:19 UTC
Not sure what you are talking about, cogent is all AS174... Other
than a few odd routers doing DS3 aggregation I don't think there is any
old PSInet network online (other than the AS number and IP addresses).
Cogent integrated acquisitions quite quickly (I was an aleron customer
and it only took two months from the purchase close for us to move from
AS4200 to 174).

As for the two BGP peer question, they do it anywhere they have
Ethernet distribution, at least as far as I can tell. That being said, we
don't use them anymore since we could not get them to play ball on
pricing at larger commits either (I won't buy Cogent if they don't at
least match the terms of our cheapest large-network transit provider).
:)


John van Oppen
Spectrum Networks LLC
206.973.8302 (Direct)
206.973.8300 (main office)

manolo
2008-04-22 12:43:07 UTC
Well, it had sounded like I was in the minority and should keep my mouth
shut. But here goes. On several occasions the peer that would advertise
our routes would drop, and with that the peer with the full BGP tables
would drop as well. This happened for months on end. They tried blaming
our 6500, our fiber provider, and our IOS version; no conclusive evidence
was ever found that it was our problem. After some testing at the
local Cogent office by both Cogent and myself, Cogent decided that they
could "make a product" that would allow us, one, to have only one peer
and, two, to connect directly to the GSR and not through a small Catalyst.
Lo and behold, things worked well for some time after that.

This all happened while we had 3 other providers on the same router
with no issues at all. We moved GBICs, ports, etc. around to make sure it
was not some odd ASIC or throughput issue with the 6500.

Hope this answers the question.

Manolo
Post by manolo
I do have to say that the PSI net side of cogent is very good. We use
them in Europe without many issues. I stay far away from the legacy
cogent network in US.
You still haven't explained the failure modes you've experienced as a
result of cogent's A/B peer configuration, only fronted.
Inquiring minds would like to know!
David Coulson
2008-04-22 12:52:21 UTC
Post by manolo
Well it had sounded like I was in the minority and should keep my mouth
shut. But here goes. On several occasions the peer that would advertise
our routes would drop and with that the peer with the full bgp tables
would drop as well.
That doesn't sound like the problem has anything to do with their
multihop-eBGP configuration - It just appears that whatever you were
directly connected to was flaking out. If they had moved you to a
directly connected BGP session and it all worked, that would be one
argument, but you also moved from a junky 3550 or something to the GSR
in the process. I'd argue that if the switch could handle full tables
and you just had a single session, you would probably have experienced
the same issue.

I've run with both direct and multihop with Cogent, and I honestly never
noticed any difference in stability. I hear what you're saying, and I
think you have a valid argument in some respects, but I just think the
BGP problem is a symptom, not a cause.
Joe Greco
2008-04-22 13:33:55 UTC
Post by manolo
Well it had sounded like I was in the minority and should keep my mouth
shut. But here goes. On several occasions the peer that would advertise
our routes would drop and with that the peer with the full bgp tables
would drop as well. This happened for months on end. They tried blaming
our 6500, our fiber provider, our IOS version, no conclusive findings
were ever found that it was our problem. After some testing at the
local Cogent office by both Cogent and myself, Cogent decided that they
could "make a product" that would allow us, one, to have only one peer
and, two, to connect directly to the GSR and not through a small catalyst.
Lo and behold things worked well for some time after that.
This all happened while we had 3 other providers on the same router
with no issues at all. We moved gbics, ports etc around to make sure it
was not some odd ASIC or throughput issue with the 6500.
Perhaps you haven't considered this, but did it ever occur to you that
Cogent probably had the same situation? They had a router with a bunch
of other customers on it, no reported problems, and you were the oddball
reporting significant issues?

Quite frankly, your own description does not support this as being a
problem inherent to the peerA/peerB setup.

You indicate that the peer advertising your routes would drop. The peer
with the full BGP tables would then drop as well. Well, quite frankly,
that makes complete sense. The peer advertising your routes also
advertises to you the route to get to the multihop peer, which you need
in order to be able to talk to that. Therefore, if the directly connected
BGP goes away for any reason, the multihop is likely to go away too.
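
To make that dependency concrete, here is a minimal sketch in Python. It is
only a toy model, not anyone's real configuration: the session names, the
documentation-range addresses, and the established() helper are all made up
for illustration. It assumes the multihop peer's address is reachable only
through a route learned from the directly connected session.

```python
# Toy model only: names and addresses are hypothetical, not a real config.
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    peer_ip: str
    learned: frozenset = frozenset()  # addresses this peer tells us how to reach
    healthy: bool = True              # False simulates a flap (bad optic, port, ...)


def established(sessions, connected):
    """Return the names of the sessions that can stay up.

    A session is up only if it is healthy and its peer address is reachable,
    either directly or via a route learned from another session that is up.
    """
    up = set()
    changed = True
    while changed:
        changed = False
        # Reachability as of the sessions known to be up so far.
        reachable = set(connected)
        for s in sessions:
            if s.name in up:
                reachable |= s.learned
        for s in sessions:
            if s.name not in up and s.healthy and s.peer_ip in reachable:
                up.add(s.name)
                changed = True
    return up


# peerA is directly connected and advertises the route to peerB's address.
peer_a = Session("peerA-direct", "192.0.2.1", learned=frozenset({"198.51.100.1"}))
# peerB is the multihop session that would carry the full table.
peer_b = Session("peerB-multihop", "198.51.100.1")

print(sorted(established([peer_a, peer_b], connected={"192.0.2.1"})))
peer_a.healthy = False  # the direct session flaps
print(sorted(established([peer_a, peer_b], connected={"192.0.2.1"})))
```

With both sessions healthy, both come up. Once the direct session is marked
unhealthy, the multihop session has no route to its peer and falls with it,
even though nothing is wrong with the multihop session itself.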

However, given the exact same hardware minus the multihop, your direct
BGP was still dropping. So had they been able to send you a full table
from the aggregation router, the same thing probably would have happened.

This sounds more like flaky hardware, dirty optics, or a bad cable (or
several of the above).

Given that, it actually seems quite reasonable to me to guess that it
could have been your 6500, your fiber provider, or your IOS version that
was introducing some problem. Anyone who has done any reasonable amount
of work in this business will have seen all three, and many of the people
here will say that the 6500 is a bit flaky and touchy when pushed into
service as a real router (while simultaneously using them in their
networks as such, heh, since nothing else really touches the price per
port), so Cogent's suggestion that it was a problem on your side may have
been based on bad experiences with other customer 6500's.

However, it is also likely that it was some other mundane problem, or a
problem with the same items on Cogent's side. I would consider it a
shame that Cogent didn't work more closely with you to track down the
specific issue, because most of the time, these things can be isolated
and eliminated, rather than being potentially left around to mess up
someone in the future (think: bad port).

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
manolo
2008-04-22 14:22:48 UTC
Permalink
Well, it was also the total arrogance on the part of Cogent engineering
and management, taking zero responsibility and pushing it back every time,
valid issue or not. You had to be there. But everyone has a different
opinion; my opinion is set regardless of what Cogent tries to sell me now.



Manolo
John van Oppen (list account)
2008-04-22 19:43:54 UTC
Permalink
I have experienced the engineering department there as well; the
best one was when they wanted paper documentation for every route I
asked to have in our filters... (and they were incapable of using
RADB). It was especially odd since we have > 80 of our own peers and
three other transit providers to whom we were announcing over 100 routes,
while they still wanted paper docs.

But filters seem to be an annoyance for most big providers... I have
been trying to get Level3 to fix our RADB-based filtering for a while
now (it just stopped pulling new updates for some reason). :)

John


Pete Templin
2008-04-22 20:15:14 UTC
Permalink
Post by John van Oppen (list account)
I know I have experienced the engineering department there as well, the
best one was when they wanted paper documentation for every route I
asked to have in our filters... (and they were incapable of using
RADB). It was especially odd since we have > 80 of our own peers and
three other transit providers to whom we were announcing over 100 routes
while they still wanted paper docs.
I've fixed this by throwing their own policies back at them. Point out
to them that the route is already appearing globally through your AS,
and remind them that their policy, section 3b, already allows that. :)

On the previous topic, I'd have to say that their two-peer system is
perhaps one of the better, if not best, multihop implementations I've
seen. Amongst other things, it tends to provide a rapid assessment of
"life in the POP". I just wish they'd use their network status messages
to reflect when they were having problems, instead of just problems
that are too large for the call center to handle. :(

pt
