Discussion:
Lies, Damned Lies, and Statistics
Paul Ferguson
2008-04-22 05:57:44 UTC
Here in Comcast land, HDTV is actually averaging around 12 megabits a
second. Still adds up to staggering numbers.. :)
Another disturbing fact inside this entire mess is that consumers are
noticing the degradation in quality caused by compressing HD content:

http://cbs5.com/local/hdtv.cable.compression.2.705405.html

So, we have a "Tragedy of The Commons" situation that is completely
created by the telcos themselves trying to force consumer decisions,
and then failing to deliver, but bemoaning the fact that
infrastructure is being over-utilized by file-sharers (or
"Exafloods" or whatever the apocalyptic issue of the day is for
telcos).

A real Charlie Foxtrot.

- ferg



--
"Fergie", a.k.a. Paul Ferguson
Engineering Architecture for the Internet
fergdawg(at)netzero.net
ferg's tech blog: http://fergdawg.blogspot.com/
Petri Helenius
2008-04-22 08:17:27 UTC
Time to push multicast as transport for bittorrent? If the downloads get
better performance that way, I think the clients would be around quicker
than multicast would be enabled for consumer DSL or cable.

Pete
m***@bt.com
2008-04-22 10:55:58 UTC
Post by Petri Helenius
Time to push multicast as transport for bittorrent?
Bittorrent clients are already multicast, only they do it in a crude way
that does not match network topology as well as it could. Moving to use
IP multicast raises a whole host of technical issues such as lack of
multicast peering. Solving those technical issues requires ISP
cooperation, i.e. to support global multicast.

But there is another way. That is for software developers to build a
modified client that depends on a topology guru for information on the
network topology. This topology guru would be some software that is run
by an ISP, and which communicates with all other topology gurus in
neighboring ASes. These gurus learn the topology using some kind of
protocol like a routing protocol. They also have some local intelligence
configured by the ISP such as allowed traffic rates at certain time
periods over certain paths. And they share all of that information in
order to optimize the overall downloading of all files to all clients
which share the same guru. Some ISPs have local DSL architectures in
which it makes better sense to download a file from a remote location
than from the guy next door. In that case, an ISP could configure a guru
to prefer circuits into their data centre, then operate clients in the
data center that effectively cache files. But the caching thing is
optional.

Then, a bittorrent client doesn't have to guess how to get files
quickly; it just has to follow the guru's instructions. Part of this
would involve cooperating with all other clients attached to the same
guru so that no client downloads distant blocks of data that have
already been downloaded by another local client. This is the part that
really starts to look like IP multicast except that it doesn't rely on
all clients functioning in real time. Also, it looks like NNTP news
servers except that the caching is all done on the clients. The gurus
never cache or download files.

For this to work, you need to start by getting several ISPs to buy in,
help with the design work, and then deploy the gurus. Once this proves
itself in terms of managing how and *WHEN* bandwidth is used, it should
catch on quite quickly with ISPs. Note that a key part of this
architecture is that it allows the ISP to open up the throttle on
downloads during off-peak hours so that most end users can get a
predictable service of all downloads completed overnight.
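
Something like the following rough Python sketch is what I have in mind
(all names, prefixes and numbers are made up, and a real protocol would
need far more than this):

    import ipaddress
    from datetime import datetime

    class TopologyGuru:
        def __init__(self, prefix_prefs, offpeak_hours=range(1, 6)):
            # prefix_prefs: {prefix -> preference}, higher = more preferred,
            # as configured by the ISP (e.g. prefer the data-centre ranges).
            self.prefix_prefs = {ipaddress.ip_network(p): w
                                 for p, w in prefix_prefs.items()}
            self.offpeak_hours = offpeak_hours

        def preference(self, peer_ip):
            addr = ipaddress.ip_address(peer_ip)
            return max((w for net, w in self.prefix_prefs.items()
                        if addr in net), default=0)

        def rank_peers(self, peer_ips):
            # Cooperating clients follow this ordering instead of guessing.
            return sorted(peer_ips, key=self.preference, reverse=True)

        def allowed_rate_kbps(self, now=None):
            # Open the throttle off-peak so downloads can finish overnight.
            hour = (now or datetime.now()).hour
            return 20000 if hour in self.offpeak_hours else 2000

    # Example: prefer the ISP's own data-centre caches, then on-net users.
    guru = TopologyGuru({"192.0.2.0/24": 100,      # hypothetical DC caches
                         "198.51.100.0/24": 50})   # hypothetical on-net pool
    print(guru.rank_peers(["198.51.100.7", "203.0.113.9", "192.0.2.10"]))
    print(guru.allowed_rate_kbps())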

--Michael Dillon
Mark Smith
2008-04-22 12:13:24 UTC
On Tue, 22 Apr 2008 11:55:58 +0100
Post by m***@bt.com
Post by Petri Helenius
Time to push multicast as transport for bittorrent?
Bittorrent clients are already multicast, only they do it in a crude way
that does not match network topology as well as it could. Moving to use
IP multicast raises a whole host of technical issues such as lack of
multicast peering. Solving those technical issues requires ISP
cooperation, i.e. to support global multicast.
But there is another way. That is for software developers to build a
modified client that depends on a topology guru for information on the
network topology.
<snip>

Isn't TCP already measuring throughput and latency of the network for
RTO etc.? Why not expose those per-peer parameters to the local P2P
software, and then have it select the closest peers with either the
lowest latency, the highest throughput, or a weighted combination of
both? I'd think that would create a lot of locality in the traffic.
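
Something like this rough sketch (Python, with made-up numbers and
arbitrary weights) is the sort of ranking I mean:

    def rank_peers(measurements, latency_weight=0.5, throughput_weight=0.5):
        """measurements: {peer: (rtt_ms, throughput_kbps)} -> peers, best first."""
        max_rtt = max(rtt for rtt, _ in measurements.values())
        max_thr = max(thr for _, thr in measurements.values())

        def score(peer):
            rtt, thr = measurements[peer]
            # Normalise so lower RTT and higher throughput both raise the score.
            return (latency_weight * (1 - rtt / max_rtt)
                    + throughput_weight * (thr / max_thr))

        return sorted(measurements, key=score, reverse=True)

    peers = {
        "peer-on-net":   (12.0, 900.0),   # hypothetical local peer
        "peer-national": (45.0, 600.0),
        "peer-overseas": (320.0, 750.0),
    }
    print(rank_peers(peers))               # locality-biased order
    print(rank_peers(peers, 0.2, 0.8))     # weighting shifted toward throughput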

Regards,
Mark.
--
"Sheep are slow and tasty, and therefore must remain constantly
alert."
- Bruce Schneier, "Beyond Fear"
Alexander Harrowell
2008-04-22 12:35:41 UTC
On Tue, Apr 22, 2008 at 1:13 PM, Mark Smith <
Post by Mark Smith
On Tue, 22 Apr 2008 11:55:58 +0100
Post by m***@bt.com
But there is another way. That is for software developers to build a
modified client that depends on a topology guru for information on the
network topology.
<snip>
Isn't TCP already measuring throughput and latency of the network for
RTO etc.? Why not expose those parameters for peers to the local P2P
software, and then have it select the closest peers with either the
lowest latency, the highest throughput, or a weighed combination of
both? I'd think that would create a lot of locality in the traffic.
Regards,
Mark
This is where you hit a serious problem. If you implemented that in a
client, it could be much worse than naive P2P for quite a lot of networks -
for example all the UK ISPs. If you have a bitstream/IPStream architecture,
your bits get hauled from local aggregation sites to your routers via L2TP
and you get billed by the telco for them; now, if you strictly localise P2P
traffic, all the localised bits will be transiting the bitstream sector
TWICE, drastically increasing your costs.

(Assumption: your upstream costs are made up of X amount of wholesale
transit+Y amount of peering, unlike your telco costs which in this case are
100% transit-like and paid for by the bit.)

Things also vary depending on the wholesale transit and peering market; for
example, someone like a customer of CityLink in Wellington, NZ would be
intensely relaxed about local traffic on the big optical ethernet pipes, but
very keen indeed to save on international transit due to the highly
constrained cable infrastructure. But if you were, say, a Dutch DSL operator
with incumbent backhaul, you might want to actively encourage P2Pers to
fetch from external peers because international peering at AMSIX is
abundant.

Basically, it's bringing traffic engineering inside the access network.

Alex
m***@bt.com
2008-04-22 13:02:21 UTC
Post by Mark Smith
Isn't TCP already measuring throughput and latency of the network for
RTO etc.? Why not expose those parameters to the local P2P software?
Post by Alexander Harrowell
This is where you hit a serious problem. If you implemented
that in a client, it could be much worse than naive P2P for
quite a lot of networks - for example all the UK ISPs. If you
have a bitstream/IPStream architecture, your bits get hauled
from local aggregation sites to your routers via L2TP and you
get billed by the telco for them; now, if you strictly
localise P2P traffic, all the localised bits will be
transiting the bitstream sector TWICE, drastically increasing
your costs.
This is where all the algorithmic tinkering of the P2P software
cannot solve the problem. You need a way to insert non-technical
information about the network into the decision-making process.
The only way for this to work is to allow the network operator
to have a role in every P2P transaction. And to do that you need
a middlebox that sits in the ISP network which they can configure.
In the scenario above, I would expect the network operator to ban
connections to their DSL address block. Instead, they would put
some P2P clients in the rack with the topology guru middlebox
and direct the transactions there. Or to peers/upstreams. And
the network operator would manage all the block retrieval requests
from the P2P clients in order to achieve both traffic shaping (rate
limiting) and to ensure that multiple local clients cooperate in
retrieving unique blocks from the file to reduce total traffic from
upstreams/peers.
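
As a very rough sketch (Python; the names are hypothetical and this is
nothing like a complete protocol), the block-coordination part could
look like:

    from collections import defaultdict

    class RetrievalCoordinator:
        def __init__(self):
            # torrent -> piece index -> local client already assigned to it
            self.assignments = defaultdict(dict)

        def request_pieces(self, torrent, client, wanted, batch=8):
            """Return up to `batch` pieces this client should fetch upstream."""
            assigned = self.assignments[torrent]
            grant = [p for p in wanted if p not in assigned][:batch]
            for p in grant:
                assigned[p] = client
            return grant

        def local_source(self, torrent, piece):
            """Which local client should already hold this piece, if any."""
            return self.assignments[torrent].get(piece)

    coord = RetrievalCoordinator()
    print(coord.request_pieces("some-torrent", "client-a", range(20)))  # 0..7
    print(coord.request_pieces("some-torrent", "client-b", range(20)))  # 8..15
    print(coord.local_source("some-torrent", 3))                        # client-a

That way each remote block crosses the peering/transit link once, and
everything after that is exchanged locally.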
Post by Alexander Harrowell
Basically, it's bringing traffic engineering inside the
access network.
Actually, it's bringing traffic engineering into the P2P service, which
is where the problem exists. Or bringing the network operator into
the P2P service rather than leaving the netop as a reactive outsider.

--Michael Dillon
Matthew Moyle-Croft
2008-04-22 13:12:57 UTC
Post by m***@bt.com
...
You need a way to insert non-technical
information about the network into the decision-making process.
The only way for this to work is to allow the network operator
to have a role in every P2P transaction. And to do that you need
a middlebox that sits in the ISP network which they can configure.
You could probably do this with a variant of DNS. Use an anycast
address common to everyone to solve the discovery problem. The client
sends a DNS request for a TXT record for, as an example,
148.165.32.217.p2ptopology.org. The topology box looks at the IP
address that the request came from, does some magic based on the
requested information, and returns a ranking score (maybe 0-255,
worst to best) that the client can then use to rank where it
downloads from. (It might have to run DNS on another port so that
normal resolvers don't capture this.)

The great thing is that you can use it for other things.
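
On the client side, the lookup could be as simple as this rough sketch
(it assumes the dnspython 2.x package and the hypothetical
p2ptopology.org zone from my example; the octet ordering and the scoring
are made up):

    import dns.exception
    import dns.resolver   # pip install dnspython

    def topology_score(peer_ip, zone="p2ptopology.org", default=128):
        qname = f"{peer_ip}.{zone}"   # octet order exactly as in the example
        try:
            answer = dns.resolver.resolve(qname, "TXT")
            for rdata in answer:
                return int(rdata.strings[0].decode())
        except (dns.exception.DNSException, ValueError):
            pass
        return default                # no guidance: treat the peer as neutral

    def rank_peers(peer_ips):
        return sorted(peer_ips, key=topology_score, reverse=True)

    print(rank_peers(["203.0.113.9", "198.51.100.7", "192.0.2.10"]))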

MMC
--
Matthew Moyle-Croft - Internode/Agile - Networks
Level 5, 150 Grenfell Street, Adelaide, SA 5000 Australia
Email: ***@internode.com.au Web: http://www.on.net
Direct: +61-8-8228-2909 Mobile: +61-419-900-366
Reception: +61-8-8228-2999 Fax: +61-8-8235-6909

"The difficulty lies, not in the new ideas,
but in escaping from the old ones" - John Maynard Keynes
Matthew Moyle-Croft
2008-04-22 13:24:34 UTC
(I know, replying to your own email is sad ...)
Post by Matthew Moyle-Croft
You could probably do this with a variant of DNS. Use an Anycast
address common to everyone to solve the discovery problem. Client
sends a DNS request for a TXT record for, as an example,
148.165.32.217.p2ptopology.org. The topology box looks at the IP
address that the request came from and does some magic based on the
requested information and returns a ranking score based on that (maybe
0-255 worse to best) that the client can then use to rank where it
downloads from. (might have to run DNS on another port so that normal
resolvers don't capture this).
The great thing is that you can use it for other things.
Since this could be dynamic (I'm guessing BGP and other things like SNMP
feeding the topology box) you could then use it to balance traffic flows
through your network to avoid congestion on certain links - that's a win
for everyone. You could get web browsers to look at it when you've got
multiple A records to choose which one is best for things like Flash
video etc.

MMC
--
Matthew Moyle-Croft - Internode/Agile - Networks
Level 5, 150 Grenfell Street, Adelaide, SA 5000 Australia
Email: ***@internode.com.au Web: http://www.on.net
Direct: +61-8-8228-2909 Mobile: +61-419-900-366
Reception: +61-8-8228-2999 Fax: +61-8-8235-6909

"The difficulty lies, not in the new ideas,
but in escaping from the old ones" - John Maynard Keynes
Alexander Harrowell
2008-04-22 13:28:58 UTC
NCAP - Network Capability (or Cost) Announcement Protocol.
Matthew Moyle-Croft
2008-04-22 13:32:12 UTC
SNSP = Simple Network Selection Protocol
Post by Alexander Harrowell
NCAP - Network Capability (or Cost) Announcement Protocol.
--
Matthew Moyle-Croft - Internode/Agile - Networks
Level 5, 150 Grenfell Street, Adelaide, SA 5000 Australia
Email: ***@internode.com.au Web: http://www.on.net
Direct: +61-8-8228-2909 Mobile: +61-419-900-366
Reception: +61-8-8228-2999 Fax: +61-8-8235-6909

"The difficulty lies, not in the new ideas,
but in escaping from the old ones" - John Maynard Keynes
Alexander Harrowell
2008-04-22 13:17:17 UTC
Post by m***@bt.com
In the scenario above, I would expect the network operator to ban
connections to their DSL address block. Instead, they would put
some P2P clients in the rack with the topology guru middlebox
and direct the transactions there. Or to peers/upstreams.
Don't know about the word "ban"; what we need is more like BGP than DRM.
Ideally, we want the clients to do sensible things because it works best,
not because they are being coerced. Further, once you start banning things
you get into all kinds of problems; not least that interests are no longer
aligned and trust is violated.

If DillTorrent is working well with a localpref metric of -1 (where 0 is
the free-running condition with neither local nor distant preference)
there shouldn't be any traffic within the DSL pool anyway, without
coercion.

There is obvious synergy with CDNs here.

Alex
Stephane Bortzmeyer
2008-04-22 13:54:16 UTC
On Tue, Apr 22, 2008 at 02:02:21PM +0100,
Post by m***@bt.com
This is where all the algorithmic tinkering of the P2P software
cannot solve the problem. You need a way to insert non-technical
information about the network into the decision-making process.
It's strange that no one in this thread has mentioned P4P yet. Isn't there
someone involved in P4P at NANOG?

http://www.dcia.info/activities/p4pwg/

IMHO, the biggest issue with P4P is the one mentioned by Alexander
Harrowell. After users have been s.....d up so many times by some
ISPs, will they trust this service?
Alexander Harrowell
2008-04-22 14:10:28 UTC
Personally I consider P4P a big step forward; it's good to see Big Verizon
engaging with these issues in a non-coercive fashion.

Just to braindump a moment, it strikes me that it would be very useful to be
able to announce preference metrics by netblock (for example, to deal with
networks with varied internal cost metrics or to pref-in the CDN servers)
but also risky. If that was done, client developers would be well advised to
implement a check that the announcing network actually owns the netblock
they are either preffing in (to send traffic via a suboptimal route/through
a spook box of some kind/onto someone else's pain-point) or out (to restrict
traffic from reaching somewhere); you wouldn't want a hijack, whether
malicious or clue-deficient.
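
A rough sketch of that sanity check (Python; the prefix data here is
invented and would really come from registry or BGP feeds):

    import ipaddress

    def announcement_is_plausible(announced_block, known_prefixes):
        """True if the preffed block sits inside a prefix the announcing
        network is actually known to originate."""
        block = ipaddress.ip_network(announced_block)
        return any(block.subnet_of(ipaddress.ip_network(p))
                   for p in known_prefixes
                   if block.version == ipaddress.ip_network(p).version)

    known = {"198.51.100.0/22", "192.0.2.0/24"}   # hypothetical AS prefixes
    print(announcement_is_plausible("198.51.100.128/25", known))  # True
    print(announcement_is_plausible("203.0.113.0/24", known))     # False: possible hijack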

There is every reason to encourage the use of dynamic preference.
Petri Helenius
2008-04-22 12:12:30 UTC
Post by m***@bt.com
But there is another way. That is for software developers to build a
modified client that depends on a topology guru for information on the
network topology. This topology guru would be some software that is run
While the current bittorrent implementation is suboptimal for large
swarms (where the number of adjacent peers is significantly less than the
number of total participants), I fail to figure out the necessary
mathematics by which topology information would bring superior results
compared to the usual greedy algorithms, where data is requested from the
peers where it seems to be flowing at the best rates. If local peers
with sufficient upstream bandwidth exist, the majority of the data blocks
are already retrieved from them.

In many locales, ISPs tend to limit the available upstream on their
consumer connections, usually causing more distant bits to be delivered
instead.

I think the most important metric to study is the number of times the
same piece of data is transmitted in a defined time period, and to try
to figure out how to optimize for that. For a new episode of BSG, there
are a few hundred thousand copies in the first hour and a million or so
in the first few days. With the headers and overhead, we might already
be hitting a petabyte per episode. RSS feeds seem to shorten the
distribution ramp-up from release.
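
Back-of-the-envelope, assuming very roughly a 1 GB HD copy and a few
percent of protocol overhead (both numbers are guesses):

    episode_bytes = 1.0e9          # assumed ~1 GB per HD copy
    overhead = 1.05                # assumed ~5% headers/protocol overhead
    copies_first_hour = 300_000
    copies_first_days = 1_000_000

    for label, copies in [("first hour", copies_first_hour),
                          ("first few days", copies_first_days)]:
        total_pb = copies * episode_bytes * overhead / 1e15
        print(f"{label}: ~{total_pb:.2f} PB moved")
    # first hour: ~0.32 PB, first few days: ~1.05 PB -- i.e. petabyte scale.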

The p2p world needs more high-upstream "proxies" to make it more
effective. I think locality with current torrent implementations would
happen automatically. However, there are quite a few parties who are
happy to have it as bad as they can make it :-)

Is there a problem that needs to be solved that is not already solved by
the Akamais of the world?

Pete
m***@bt.com
2008-04-22 12:52:22 UTC
Post by Petri Helenius
I fail to figure
out the necessary mathematics where topology information
would bring superior results compared to the usual greedy
algorithms where data is requested from the peers where it
seems to be flowing at the best rates. If local peers with
sufficient upstream bandwidth exist, majority of the data
blocks are already retrieved from them.
First, it's not a mathematical issue. It is a network operational
issue where ISPs have bandwidth caps and enforce them by traffic
shaping when thresholds are exceeded. And secondly, there are
cases where it is not in the ISP's best interest for P2P clients
to retrieve files from the client with the lowest RTT.
Post by Petri Helenius
In many locales ISP's tend to limit the available upstream on
their consumer connections, usually causing more distant bits
to be delivered instead.
Yep, it's a game of whack-a-mole.
Post by Petri Helenius
I think the most important metric to study is the number of
times the same piece of data is transmitted in a defined time
period and try to figure out how to optimize for that.
Or P2P developers could stop fighting ISPs and treating the Internet
as an amorphous cloud, and build something that will be optimal for
the ISPs, the end users, and the network infrastructure.
Post by Petri Helenius
The p2p world needs more high-upstream "proxies" to make it
more effective.
That is essentially a cache, just like NNTP news servers or
Squid web proxies. But rather than making a special P2P client that
caches and proxies and fiddles with stuff, why not take all the
network intelligence code out of the client and put it into
a topology guru that runs in your local ISP's high-upstream
infrastructure. Chances are that many ISPs will put a few P2P
caching clients in the same rack as this guru if it pays them
to take traffic off one direction of the last-mile, or if it
pays them to ensure that files hang around locally longer than
they do naturally, thus saving on their upstream/peering traffic.
Post by Petri Helenius
Is there a problem that needs to be solved that is not solved
by Akamai's of the world already?
Akamai is a commercial service that content senders can contract with to
achieve the same type of multicasting (called Content Delivery Network)
as a P2P network provides to end users. ISPs don't provide Akamai service
to their hosting customers, but they do provide those customers with web
service, mail service, FTP service, etc. I am suggesting that there is a
way for ISPs to provide a generic BitTorrent P2P service to any customer
who wants to send content (or receive content). It would allow heavy P2P
users to evade the crude traffic shaping which tends to be off on the 1st
day of the month, then gets turned on at a threshold and stays on until
the end of the month. Most ISPs can afford to let users take all they can
eat during non-peak hours without congesting the network. Even an
Australian ISP could use this type of system because they would only open
local peering connections during off-peak, not the expensive
trans-oceanic links. This all hinges on a cooperative P2P client that
only downloads from sites (or address ranges) which the local topology
guru directs them to. Presumably the crude traffic shaping systems that
cap bandwidth would still remain in place for non-cooperating P2P
clients.

--Michael Dillon
Alexander Harrowell
2008-04-22 13:00:49 UTC
The good news about a DillTorrent solution is that at least the user and ISP
interests are aligned; there's no reason for the ISP to have the guru lie to
the users (because you just know someone'll try it). However, it does
require considerable trust from the users that it will actually lead to a
better experience, rather than just cost-saving at their expense.

And as with any client-side solution, if you can write a client that listens
to it and behaves differently you can write one that pretends to listen:-)

Alex
Daniel Reed
2008-04-23 17:45:11 UTC
Post by Petri Helenius
Post by m***@bt.com
But there is another way. That is for software developers to build a
modified client that depends on a topology guru for information on the
network topology. This topology guru would be some software that is run
While the current bittorrent implementation is suboptimal for large
swarms (where number of adjacent peers is significantly less than the
number of total participants) I fail to figure out the necessary
mathematics where topology information would bring superior results
compared to the usual greedy algorithms where data is requested from the
peers where it seems to be flowing at the best rates. If local peers
with sufficient upstream bandwidth exist, majority of the data blocks
are already retrieved from them.
You can think of the scheduling process as two independent problems:
1. Given a list of all the chunks that all the peers you're connected
to have, select the chunks you think will help you complete the fastest.
2. Given a list of all peers in a cloud, select the peers you think will
help you complete the fastest.

Traditionally, peer scheduling (#2) has been to just connect to
everyone you see and let network bottlenecks drive you toward
efficiency, as you pointed out.

However, as your chunk scheduling becomes more effective, it usually
becomes more expensive. At some point, its increasing complexity will
reverse the trend and start slowing down copies, as real-world clients
begin to block making chunk requests waiting for CPU to make
scheduling decisions.

A more selective peer scheduler would allow you to reduce the inputs
into the chunk scheduler (allowing it to do more complex things with
the same cost). The idea is, doing more math on the best data will
yield better overall results than doing less math on the best + the
worse data, with the assumption that a good peer scheduler will help
you find the best data.
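
A toy sketch of that split (Python; the scores and have-maps are
invented, and rarest-first stands in for a real chunk strategy):

    from collections import Counter

    def peer_schedule(peer_scores, k=3):
        """peer_scores: {peer: goodput estimate}. Keep only the k best peers."""
        return sorted(peer_scores, key=peer_scores.get, reverse=True)[:k]

    def chunk_schedule(have_maps, wanted):
        """Pick wanted chunks rarest-first among the selected peers."""
        availability = Counter(c for chunks in have_maps.values() for c in chunks)
        return sorted((c for c in wanted if c in availability),
                      key=availability.get)

    peers = {"a": 950, "b": 120, "c": 640, "d": 80, "e": 700}
    have = {"a": {1, 2, 3}, "c": {2, 3, 4}, "e": {3, 4, 5}, "b": {1, 5}, "d": {1}}

    selected = peer_schedule(peers)                     # ['a', 'e', 'c']
    maps = {p: have[p] for p in selected}
    print(chunk_schedule(maps, wanted={1, 2, 3, 4, 5})) # rarest chunks first

The point being that the chunk scheduler now only ever sees the reduced
peer set, so it can afford to be cleverer per decision.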


As seems to be a trend, Michael appears to be fixated on a specific
implementation, and may end up driving many observers into thinking
this idea is annoying :) However, there is a mathematical basis for
including topology (and other nontraditional) information in
scheduling decisions.
Laird Popkin
2008-04-23 19:20:57 UTC
Post by Daniel Reed
Post by Petri Helenius
Post by m***@bt.com
But there is another way. That is for software developers to build a
modified client that depends on a topology guru for information on the
network topology. This topology guru would be some software that is run
While the current bittorrent implementation is suboptimal for large
swarms (where number of adjacent peers is significantly less than the
number of total participants) I fail to figure out the necessary
mathematics where topology information would bring superior results
compared to the usual greedy algorithms where data is requested from the
peers where it seems to be flowing at the best rates. If local peers
with sufficient upstream bandwidth exist, majority of the data blocks
are already retrieved from them.
It's true that in the long run p2p transfers can optimize data sources
by measuring actual throughput, but at any given moment this approach
can only optimize within the set of known peers. The problem is that
for large swarms, any given peer only knows about a very small subset
of available peers, so it may take a long time to discover the best
peers. This means (IMO) that starting with good peers instead of
random peers can make a big difference in p2p performance, as well as
reducing data delivery costs to the ISP.

For example, let's consider a downloader in a swarm of 100,000 peers,
using a BitTorrent announce once a minute that returns 40 peers. Of
course, this is a simple case, but it should be sufficient to make the
general point that the selection of which peers you connect to matters.

Let's look at the odds that you'll find out about the closest peer (in
network terms) over time.

With random peer assignment, the odds of any random peer being the
closest peer are 40/100,000, and if you do the math, the odds of
finding the closest peer on the first announce are 1.58%. Multiplying
that out, it means that you'll have a 38.1% chance of finding the
closest peer in the first half hour, and a 61.7% chance in the first
hour, and 85.3% chance in the first two hours, and so on out as a
geometric curve.
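
A quick sanity check of that curve (taking the 1.58% per-announce figure
at face value and compounding it once per minute):

    # probability of having found the closest peer after n announces
    # is 1 - (1 - p)^n
    p = 0.0158
    for minutes in (1, 30, 60, 120):
        cumulative = 1 - (1 - p) ** minutes
        print(f"after {minutes:3d} announces: {cumulative:.1%}")
    # Prints roughly 1.6%, 38%, 62%, 85% -- matching the figures above.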

In the real world there are factors that complicate the analysis (e.g.
most Trackers announce much less often than 1/minute, but some peers
have other discovery mechanisms such as Peer Exchange). But as far as
I can tell, the basic issue (that it takes a long time to find out
about and test data exchanges with all of the peers in a large swarm)
still holds.

With P4P, you find out about the closest peers on the first announce.

There's a second issue that I think is relevant, which is that
measured network throughput may not reflect ISP costs and business
policies. For example, a downloader might get data from a fast peer
through a trans-atlantic pipe, but the ISP would really rather have
that user get data from a fast peer on their local loop instead. This
won't happen unless the p2p network knows about (and makes decisions
based on) network topology.

What we found in our first field test was that random peer assignment
moved 98% of data between ISPs and only 2% within ISPs (and for
smaller ISPs, more like 0.1%), and that even simple network awareness
resulted in an average of 34% same-ISP data transfers (i.e. a drop of
32% in external transit). With ISP involvement, the numbers are even
better.
Post by Daniel Reed
1. Given a list of all the chunks that all the peers you're connected
to have, select the chunks you think will help you complete the
fastest. 2. Given a list of all peers in a cloud, select the peers you
think will help you complete the fastest.
Traditionally, peer scheduling (#2) has been to just connect to
everyone you see and let network bottlenecks drive you toward
efficiency, as you pointed out.
However, as your chunk scheduling becomes more effective, it usually
becomes more expensive. At some point, its increasing complexity will
reverse the trend and start slowing down copies, as real-world clients
begin to block making chunk requests waiting for CPU to make
scheduling decisions.
A more selective peer scheduler would allow you to reduce the inputs
into the chunk scheduler (allowing it to do more complex things with
the same cost). The idea is, doing more math on the best data will
yield better overall results than doing less math on the best + the
worse data, with the assumption that a good peer scheduler will help
you find the best data.
Interesting approach. IMO, given modern computers, CPU is highly
underutilized (PCs are 80% idle, and rarely CPU-bound when in use),
while bandwidth is relatively scarce, so using more CPU to optimize
bandwidth usage seems like a great tradeoff!
Laird Popkin
CTO, Pando Networks
520 Broadway, 10th floor
New York, NY 10012

***@pando.com
c) 646/465-0570
m***@bt.com
2008-04-23 22:40:11 UTC
Post by Daniel Reed
However, as your chunk scheduling becomes more effective, it
usually becomes more expensive. At some point, its increasing
complexity will reverse the trend and start slowing down
copies, as real-world clients begin to block making chunk
requests waiting for CPU to make scheduling decisions.
This is not a bad thing. The intent is to optimize the whole
system, not provide the fastest copies. Those who promote QoS
often talk of some kind of scavenger level of service that
sweeps up any available bandwidth after all the important users
have gotten their fill. I see this type of P2P system in a similar
light, i.e. it lets the ISP allow as much bandwidth use
as is economically feasible and block the rest. Since the end
user ultimately relies on the ISP having a stable network that
functions in the long term (and does not drive the ISP to bankruptcy),
this seems to be a reasonable tradeoff.
Post by Daniel Reed
As seems to be a trend, Michael appears to be fixated on a
specific implementation, and may end up driving many
observers into thinking this idea is annoying :) However,
there is a mathematical basis for including topology (and
other nontraditional) information in scheduling decisions.
There is also precedent for this in manufacturing scheduling,
where you optimize the total system by identifying the prime
bottleneck and carefully managing that single point in the
chain of operations. I'm not hung up on a specific implementation,
just trying to present a concrete example that could be a starting
point. And until today, I knew nothing about the P4P effort, which
seems to be working in the same direction.

--Michael Dillon
Brandon Butterworth
2008-04-22 12:27:01 UTC
Post by Petri Helenius
Is there a problem that needs to be solved that is not solved by
Akamai's of the world already?
Yes, the ones that aren't Akamai want to play too

brandon
Laird Popkin
2008-04-22 14:48:59 UTC
This raises an interesting issue - should optimization of p2p traffic (P4P) be based on "static" network information, or "dynamic" network information? It's certainly easier for ISPs to provide a simple network map than real-time network condition data, but the real-time data might be much more effective. Or even if it's not real-time, perhaps there could be "static" network maps reflecting conditions at different times of day?

Since P4P came up, I'd like to mention that the P4P Working Group is putting together another field test, where we can quantify issues like the tradeoff between static and dynamic network data, and we would love to hear from any ISPs that would be interested in participating in that test. If you'd like the details of what it would take to participate, and what data you would get out of it, please email me.

Of course, independently of the test, if you're interested in participating in the P4P Working Group, we'd love to hear from you!

- Laird Popkin, CTO, Pando Networks
email: ***@pando.com
mobile: 646/465-0570

Christopher Morrow
2008-04-23 14:47:18 UTC
Post by Laird Popkin
This raises an interesting issue - should optimization of p2p traffic (P4P) be based on "static" network information, or "dynamic" network information. It's certainly easier for ISP's to provide a simple network map that real-time network condition data, but the real-time data might be much more effective. Or even if it's not real-time, perhaps there could be "static" network maps reflecting conditions at different times of day?
100% solution + 100% more complexity vs 80% solution ?

It strikes me that often just doing a reverse lookup on the peer
address would be 'good enough' to keep things more 'local' in a
network sense. Something like:

1) prefer peers with PTRs like mine (perhaps get address from a
public-ish server - myipaddress.com/ipchicken.com/dshield.org)
2) prefer peers within my /24->/16 ?

This does depend on what you define as 'local' as well, 'stay off my
transit links' or 'stay off my last-mile' or 'stay off that godawful
expensive VZ link from CHI to NYC in my backhaul network...
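
Rough sketch of what I mean (Python; the addresses are made up, and the
scoring weights are arbitrary):

    import ipaddress
    import socket

    def ptr_domain(ip):
        try:
            name = socket.gethostbyaddr(ip)[0]
            return ".".join(name.split(".")[-2:])    # e.g. "example.net"
        except OSError:
            return None

    def locality_score(my_ip, peer_ip, my_domain=None):
        peer = ipaddress.ip_address(peer_ip)
        score = 0
        if my_domain and ptr_domain(peer_ip) == my_domain:
            score += 4                               # same provider, probably
        if peer in ipaddress.ip_network(f"{my_ip}/24", strict=False):
            score += 2
        elif peer in ipaddress.ip_network(f"{my_ip}/16", strict=False):
            score += 1
        return score

    my_ip = "198.51.100.7"                           # e.g. from ipchicken.com
    peers = ["198.51.100.99", "198.51.73.2", "203.0.113.9"]
    peers.sort(key=lambda p: locality_score(my_ip, p, ptr_domain(my_ip)),
               reverse=True)
    print(peers)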

P4P is an interesting move by Verizon; tin-hat-ness makes me think
it's a method to raise costs on the direct competitors to VZ (increase
usage on access-links where competitors mostly have shared
access-links), but I agree with Harrowell that it's sure nice to see VZ
participating in Internet things in a good way for the community.
(though, see tin-hat, perhaps it's short-term good and long-term
bad... /me puts away hat now)

-Chris
Alexander Harrowell
2008-04-23 15:39:56 UTC
On Wed, Apr 23, 2008 at 3:47 PM, Christopher Morrow <
Post by Christopher Morrow
It strikes me that often just doing a reverse lookup on the peer
address would be 'good enough' to keep things more 'local' in a
1) prefer peers with PTR's like mine (perhaps get address from a
public-ish server - myipaddress.com/ipchicken.com/dshield.org)
2) prefer peers within my /24->/16 ?
This does depend on what you define as 'local' as well, 'stay off my
transit links' or 'stay off my last-mile' or 'stay off that godawful
expensive VZ link from CHI to NYC in my backhaul network...
Well, here's your problem; depending on the architecture, the IP addressing
structure doesn't necessarily map to the network's cost structure. This is
why I prefer the P4P/DillTorrent announcement model.

Alex
Christopher Morrow
2008-04-23 18:17:25 UTC
On Wed, Apr 23, 2008 at 11:39 AM, Alexander Harrowell
Post by Alexander Harrowell
On Wed, Apr 23, 2008 at 3:47 PM, Christopher Morrow
Post by Christopher Morrow
It strikes me that often just doing a reverse lookup on the peer
address would be 'good enough' to keep things more 'local' in a
1) prefer peers with PTR's like mine (perhaps get address from a
public-ish server - myipaddress.com/ipchicken.com/dshield.org)
2) prefer peers within my /24->/16 ?
This does depend on what you define as 'local' as well, 'stay off my
transit links' or 'stay off my last-mile' or 'stay off that godawful
expensive VZ link from CHI to NYC in my backhaul network...
Well. here's your problem; depending on the architecture, the IP addressing
structure doesn't necessarily map to the network's cost structure. This is
why I prefer the P4P/DillTorrent announcement model.
Sure, 80/20 rule... less complexity in the clients and some benefit(s).
Perhaps short term something like the above, with longer term more
realtime info about locality.
Laird Popkin
2008-04-23 19:50:25 UTC
Post by Christopher Morrow
On Wed, Apr 23, 2008 at 11:39 AM, Alexander Harrowell
Post by Alexander Harrowell
On Wed, Apr 23, 2008 at 3:47 PM, Christopher Morrow
Post by Christopher Morrow
It strikes me that often just doing a reverse lookup on the peer
address would be 'good enough' to keep things more 'local' in a
1) prefer peers with PTR's like mine (perhaps get address from a
public-ish server - myipaddress.com/ipchicken.com/dshield.org)
2) prefer peers within my /24->/16 ?
This does depend on what you define as 'local' as well, 'stay off my
transit links' or 'stay off my last-mile' or 'stay off that godawful
expensive VZ link from CHI to NYC in my backhaul network...
Well. here's your problem; depending on the architecture, the IP addressing
structure doesn't necessarily map to the network's cost structure. This is
why I prefer the P4P/DillTorrent announcement model.
sure 80/20 rule... less complexity in the clients and some benefit(s).
perhaps short term something like the above with longer term more
realtime info about locality.
For the applications, it's a lot less work to use a clean network map
from ISPs than it is to in effect derive one from lookups to ASN, /24,
/16, pings, traceroutes, etc. The main reason to spend the effort
to implement those tactics is that it's better than not doing
anything. :-)

Laird Popkin
CTO, Pando Networks
520 Broadway, 10th floor
New York, NY 10012

***@pando.com
c) 646/465-0570
Christopher Morrow
2008-04-23 21:14:12 UTC
Post by Laird Popkin
Post by Christopher Morrow
Post by Alexander Harrowell
Well. here's your problem; depending on the architecture, the IP
addressing structure doesn't necessarily map to the network's cost
structure. This is why I prefer the P4P/DillTorrent announcement model.
sure 80/20 rule... less complexity in the clients and some benefit(s).
perhaps short term something like the above with longer term more
realtime info about locality.
For the applications, it's a lot less work to use a clean network map
from ISP's than it is to in effect derive one from lookups to ASN, /24,
/16, pings, traceroutes, etc. The main reason to spend the effort to
implement those tactics is that it's better than not doing anything. :-)
so.. 'not doing anything' may or may not be a good plan.. bittorrent
works fine today(tm). On the other hand, asking network folks to turn
over 'state secrets' will be a blocking factor (yes, some folks,
including Doug's company, believe that their network
diagrams/designs/paths are in some way 'secret' or a 'competitive
advantage'). Meanwhile, doing simple/easy things initially that get the
progress going seems like a grand plan (most bittorrent swarms I've seen
have <50 peers; certainly there are more in some cases, but on average
more or less than 100? so DNS lookups or bit-wise comparisons seem cheap
and easy).

Being blocked for the 100% solution and not making
progress/showing-benefit seems bad :(

-Chris
m***@bt.com
2008-04-23 22:26:16 UTC
Post by Alexander Harrowell
Well. here's your problem; depending on the architecture, the
IP addressing structure doesn't necessarily map to the
network's cost structure. This is why I prefer the
P4P/DillTorrent announcement model.
What's with these cute cryptic and ultimately meaningless names?

I used the term "topology guru" because I wanted something that
halfway describes what is going on. Coining a word with "torrent"
in it is wrong because this kind of topology guru can be used with
any P2P protocol. And P4P seems more like a brand name that tries
to leverage off the term P2P.

--Michael Dillon
Brandon Galbraith
2008-04-23 22:36:12 UTC
Post by m***@bt.com
Post by Alexander Harrowell
Well. here's your problem; depending on the architecture, the
IP addressing structure doesn't necessarily map to the
network's cost structure. This is why I prefer the
P4P/DillTorrent announcement model.
What's with these cute cryptic and ultimately meaningless names?
I used the term "topology guru" because I wanted something that
halfway describes what is going on. Coining a word with "torrent"
in it is wrong because this kind of topology guru can be used with
any P2P protocol. And P4P seems more like a brand name that tries
to leverage off the term P2P.
--Michael Dillon
Perhaps call it TopoMaster, and make it an open protocol that any app that
needs to move lots o' bits around can use.

-brandon
Laird Popkin
2008-04-23 22:30:46 UTC
I would certainly view the two strategies (reverse engineering network information and getting ISP-provided network information) as being complementary. As you point out, for any ISP that doesn't provide network data, we're better off figuring out what we can to be smarter than 'random'. So while I prefer getting better data from ISPs, that's not holding us back from doing what we can without that data.

ISPs have been very clear that they regard their network maps as being proprietary for many good reasons. The approach that P4P takes is to have an intermediate server (which we call an iTracker) that processes the network maps and provides abstracted guidance (lists of IP prefixes and percentages) to the p2p networks that allows them to figure out which peers are near each other. The iTracker can be run by the ISP or by a trusted third party, as the ISP prefers.

- Laird Popkin, CTO, Pando Networks
mobile: 646/465-0570

Christopher Morrow
2008-04-23 23:47:57 UTC
Post by Laird Popkin
I would certainly view the two strategies (reverse engineering network information and getting ISP-
provided network information) as being complimentary. As you point out, for any ISP that doesn't
provide network data, we're better off figuring out what we can to be smarter than 'random'. So while I
prefer getting better data from ISP's, that's not holding us back from doing what we can without that
data.
ok, sounds better :) or more reasonable, or not immediately doomed to
blockage :) 'more realistic' even.
Post by Laird Popkin
ISP's have been very clear that they regard their network maps as being proprietary for many good
reasons. The approach that P4P takes is to have an intermediate server (which we call an iTracker)
that processes the network maps and provides abstracted guidance (lists of IP prefixes and
percentages) to the p2p networks that allows them to figure out which peers are near each other. The iTracker can be run by the ISP or by a trusted third party, as the ISP prefers.
What's to keep the iTracker from being the new 'napster megaserver'? I
suppose if it just trades map info or lookups (a la DNS lookups) and
nothing about torrent/share content, things are less sensitive from a
privacy perspective, and from a single-point-of-failure-of-the-network
perspective.

Latency requirements seem to be interesting for this as well... at
least dependent upon the model for sharing of the mapping data. I'd
think that a lookup model would serve the client base better (instead of
downloading many large files of maps in order to determine the best
peers to use). There's also a sensitivity to which part of the network
graph, and which perspective, to use for the client -> peer locality
mapping.

It's interesting at least :)

Thanks!
-Chris

(also, as an aside, your mail client seems to be making each paragraph
one long unbroken line... which drives at least pine and gmail a bit
bonkers...and makes quoting messages a much more manual process than
it should be.)
Michael Holstein
2008-04-24 13:30:39 UTC
Post by Laird Popkin
ISP's have been very clear that they regard their network maps as being proprietary for many good reasons. The approach that P4P takes is to have an intermediate server (which we call an iTracker) that processes the network maps and provides abstracted guidance (lists of IP prefixes and percentages) to the p2p networks that allows them to figure out which peers are near each other. The iTracker can be run by the ISP or by a trusted third party, as the ISP prefers.
Won't this approach (using an ISP-managed intermediate) ultimately end up
being co-opted by the lawyers for the various industry "interest groups"
and thus be ignored by the p2p users?

Cheers,

Michael Holstein
Cleveland State University
Mike Gonnason
2008-04-24 13:38:11 UTC
On Thu, Apr 24, 2008 at 5:30 AM, Michael Holstein
Post by Michael Holstein
Post by Laird Popkin
ISP's have been very clear that they regard their network maps as being proprietary for many good reasons. The approach that P4P takes is to have an intermediate server (which we call an iTracker) that processes the network maps and provides abstracted guidance (lists of IP prefixes and percentages) to the p2p networks that allows them to figure out which peers are near each other. The iTracker can be run by the ISP or by a trusted third party, as the ISP prefers.
Won't this approach (using a ISP-managed intermediate) ultimately end up
being co-opted by the lawyers for the various industry "interest groups"
and thus be ignored by the p2p users?
Cheers,
Michael Holstein
Cleveland State University
This idea is what I am concerned about. Until the whole copyright mess
gets sorted out, wouldn't these iTracker supernodes be a goldmine of
logs for copyright lawyers? They would have a great deal of
information about what exactly is being transferred, by whom and for
how long.

-Mike Gonnason
Keith O'Neill
2008-04-24 13:48:26 UTC
The iTrackers just help the nodes talk to each other in a more
efficient way; all the iTracker does is talk to another p2p tracker and
is used for network topology. It has no caching, file information, or
user information.

Keith O'Neill
Pando Networks
Eric Osterweil
2008-04-24 15:59:23 UTC
Post by Keith O'Neill
The iTrackers just helps the nodes to talk to each other in a more
efficient way, all the iTracker does is talk to another p2p tracker and
is used for network topology, has no caching or file information or user
information..
After reading the P4P paper, it seems like the iTrackers have some
large implications. Off the top of my head:
- The paper says, "An iTracker provides... network status/topology..."
Doesn't it seem like you wouldn't want to send this to P2P clients? Is
the "PID" supposed to preserve privacy here? I have some doubts about
how well the PID helps after exposing ASN and LOC.
- As a P2P developer, wouldn't I be worried about giving the iTracker
the ability to tell my clients that their upload/download capacity is
0 (or just above)? It seems like iTrackers are allowed to control
P2P clients completely w/ this recommendation, right? That would be
very useful for an ISP, but a very dangerous DoS vector to clients.

These are just a couple of the thoughts that I had while reading.

Eric
Post by Keith O'Neill
Keith O'Neill
Pando Networks
Post by Mike Gonnason
On Thu, Apr 24, 2008 at 5:30 AM, Michael Holstein
Post by Michael Holstein
Post by Laird Popkin
ISP's have been very clear that they regard their network maps
as being proprietary for many good reasons. The approach that
P4P takes is to have an intermediate server (which we call an
iTracker) that processes the network maps and provides
abstracted guidance (lists of IP prefixes and percentages) to
the p2p networks that allows them to figure out which peers are
near each other. The iTracker can be run by the ISP or by a
trusted third party, as the ISP prefers.
Won't this approach (using an ISP-managed intermediate)
ultimately end up
being co-opted by the lawyers for the various industry "interest groups"
and thus be ignored by the p2p users?
Cheers,
Michael Holstein
Cleveland State University
This idea is what I am concerned about. Until the whole copyright mess
gets sorted out, wouldn't these iTracker supernodes be a goldmine of
logs for copyright lawyers? They would have a great deal of
information about what exactly is being transferred, by whom and for
how long.
-Mike Gonnason
_______________________________________________
NANOG mailing list
http://mailman.nanog.org/mailman/listinfo/nanog
Laird Popkin
2008-04-24 16:24:38 UTC
Permalink
Post by Eric Osterweil
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Post by Keith O'Neill
The iTracker just helps the nodes talk to each other in a more
efficient way. All the iTracker does is talk to another p2p tracker and
provide network topology; it has no caching, file information, or user
information.
After reading the P4P paper, it seems like the iTrackers have some
large implications. Off the top of my head:
- - The paper says, "An iTracker provides... network status/
topology..." doesn't it seem like you wouldn't want to send this to
P2P clients? Is the "PID" supposed to preserve privacy here? I have
some doubts about how well the PID helps after exposing ASN and LOC.
The PID is an identifier of a POP, which is really just a grouping
mechanism telling the P2P network that all of the nodes with IP
addresses that match a list of prefixes are in "the same place" in
network terms. The definition of "the same place" is up to the ISP -
it can be metro area, region, or even local loop or cable head end,
depending on the ISP's desire to localize traffic. The PID is an
arbitrary string sent by the ISP, so it could be numbers, name of a
city, etc., depending on how much the ISP wants to reveal. PID's are
tied to ASN, but of course all IP's can be mapped to ASN easily, so
that's not revealing new information.
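(To make the prefix-to-PID grouping concrete, here's a rough Python
sketch of how a p2p tracker might map a peer's IP onto a PID with a
longest-prefix match. The PID names and prefixes below are invented for
illustration - this isn't from the P4P spec, just one obvious way to
consume the prefix lists.)

import ipaddress

# Hypothetical iTracker data: each PID maps to the IP prefixes it covers.
# PIDs and prefixes are made up for this example.
PID_PREFIXES = {
    "NYC": ["192.0.2.0/24", "198.51.100.0/25"],
    "CHI": ["198.51.100.128/25", "203.0.113.0/24"],
}

def pid_for_peer(peer_ip):
    """Return the PID whose most specific prefix contains peer_ip, or None."""
    addr = ipaddress.ip_address(peer_ip)
    best_pid, best_len = None, -1
    for pid, prefixes in PID_PREFIXES.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if addr in net and net.prefixlen > best_len:
                best_pid, best_len = pid, net.prefixlen
    return best_pid

print(pid_for_peer("198.51.100.200"))  # -> "CHI"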

The information that the iTracker sends to the p2p network is:
- ASN (which is public)
- PID (e.g. "1234" or "New York")
- For each PID, a list of IP prefixes that identify users in the PID
- A weight matrix of how much the ISP wants peers to connect
between each pair of PID's. For example, if the PID's were cities, the
weights might be something like "NYC to Philadelphia 30%, NYC to
Chicago 25%, NYC to LA 2%", and so on. Or if the PID's are
'anonymized' then it could be something like "123 to 456 30%, 123 to
876 25%, 123 to 1432 2%" and so on.
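As a rough illustration (this is not the actual wire format - the
encoding is still being worked out), that guidance could be represented
as something as simple as:

# Illustrative only: field names, PIDs, prefixes and weights are made up.
itracker_guidance = {
    "asn": 64496,  # example ASN from the documentation range
    "pids": {
        "NYC": ["192.0.2.0/24"],
        "PHL": ["198.51.100.0/24"],
        "CHI": ["203.0.113.0/24"],
    },
    # weights[a][b] = how much the ISP would like peers in PID a to
    # connect to peers in PID b, as percentages (per the example above).
    "weights": {
        "NYC": {"NYC": 45, "PHL": 30, "CHI": 25},
        "PHL": {"PHL": 50, "NYC": 35, "CHI": 15},
        "CHI": {"CHI": 55, "NYC": 25, "PHL": 20},
    },
}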
Post by Eric Osterweil
- - As a P2P developer, wouldn't I be worried about giving the
iTracker
the ability to tell my clients that their upload/download capacity is
0 (or just above)? It seems like iTrackers are allowed to control
P2P clients completely w/ this recommendation, right? That would be
very useful for an ISP, but a very dangerous DoS vector to clients.
It's important to keep in mind that P4P doesn't control the P2P
network; it's just an additional source of data provided to the P2P
Trackers (for example) in addition to whatever else the P2P network
already does, helping the p2p network make smarter peer assignments.
But P4P doesn't tell p2p clients what to do, or give the ISP any
control over the P2P network. Specifically, if the P4P data from one
ISP is bad, the P2P network can (and presumably will) choose to ignore
it.
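To make the 'advisory, not controlling' point concrete, here's a toy
sketch of how a tracker might use the weight row for the requesting
peer's PID to bias peer selection, and simply fall back to its normal
random selection if the guidance is missing or nonsensical. Nothing
here is mandated by P4P - the function and defaults are invented for
illustration:

import random

def pick_peers(candidates, requester_pid, weights, count=50):
    """Bias peer selection toward PIDs the ISP prefers - advisory only.

    candidates: list of (ip, pid) pairs the tracker already knows about.
    weights: the weight rows from the iTracker, e.g. {"NYC": {"PHL": 30}}.
    """
    row = weights.get(requester_pid)
    if not row or sum(row.values()) <= 0:
        # No guidance, or guidance that makes no sense: ignore it.
        return random.sample(candidates, min(count, len(candidates)))
    # Score each candidate with a random draw scaled by the ISP's
    # preference for its PID (unknown PIDs get a small default weight),
    # then keep the highest-scoring peers.
    scored = [(random.random() * row.get(pid, 1), ip) for ip, pid in candidates]
    scored.sort(reverse=True)
    return [ip for _, ip in scored[:count]]

The weight matrix only shifts the odds; the p2p network keeps all of
its own selection logic and can drop the guidance at any time.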
Post by Eric Osterweil
These are just a couple of the thoughts that I had while reading.
I appreciate your taking the time. This is a good discussion.
Post by Eric Osterweil
Eric
Post by Keith O'Neill
Keith O'Neill
Pando Networks
Post by Mike Gonnason
On Thu, Apr 24, 2008 at 5:30 AM, Michael Holstein
Post by Michael Holstein
Post by Laird Popkin
ISP's have been very clear that they regard their network maps
as being proprietary for many good reasons. The approach that
P4P takes is to have an intermediate server (which we call an
iTracker) that processes the network maps and provides
abstracted guidance (lists of IP prefixes and percentages) to
the p2p networks that allows them to figure out which peers are
near each other. The iTracker can be run by the ISP or by a
trusted third party, as the ISP prefers.
Won't this approach (using an ISP-managed intermediate)
ultimately end up
being co-opted by the lawyers for the various industry "interest groups"
and thus be ignored by the p2p users?
Cheers,
Michael Holstein
Cleveland State University
This idea is what I am concerned about. Until the whole copyright mess
gets sorted out, wouldn't these iTracker supernodes be a goldmine of
logs for copyright lawyers? They would have a great deal of
information about what exactly is being transferred, by whom and for
how long.
The P2P network doesn't provide this kind of information to the
iTracker.

We're comparing two models, 'generic' and 'tuned per swarm'.

In the 'generic' model, the P2P network is given one weight matrix,
based purely on the ISP's network. In this model, the P2P network
doesn't provide any information to the iTracker at all - it just
requests an updated weight matrix periodically so that when the ISP
changes network structure or policies it's updated in the P2P network
automatically.

In the 'tuned per swarm' model, the P2P network provides information
about peer distribution of each swarm's peers (e.g. there are seeds in
NYC and downloaders in Chicago). With this information, the iTracker
can provide a 'tuned' weight matrix for each swarm, which should in
theory be better. This is something that we're going to test in the
next field test, so we can put some numbers around it. This model
requires more communications, and exposes more of the p2p network's
information to the ISP, so it's important to be able to quantify the
benefit to decide whether it's worth it.
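For concreteness, the per-swarm update can be as small as aggregate
counts per PID - again, just a sketch of the shape of the data, not the
actual protocol:

# Hypothetical per-swarm summary a p2p tracker might send to the iTracker
# in the 'tuned per swarm' model: aggregate counts per PID only - no IP
# addresses, and nothing about the content beyond an opaque swarm handle.
swarm_update = {
    "swarm_id": "opaque-handle-1234",  # meaningless to the ISP
    "seeds_per_pid": {"NYC": 450, "CHI": 10},
    "leechers_per_pid": {"NYC": 750, "CHI": 290},
}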

BTW, if this discussion is getting off topic for the NANOG mailing
list, we can continue the discussion offline. Does anyone think that
we should do so?
Post by Eric Osterweil
Post by Keith O'Neill
Post by Mike Gonnason
-Mike Gonnason
_______________________________________________
NANOG mailing list
http://mailman.nanog.org/mailman/listinfo/nanog
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
iD4DBQFIEK5hK/tq6CJjZQIRAgXqAJd8t3XkmYqo1WYaJP7qOF4W67tYAJ9C5hZ+
iwVc8ZU8AJ3f98KCFCq8Eg==
=LEPV
-----END PGP SIGNATURE-----
_______________________________________________
NANOG mailing list
http://mailman.nanog.org/mailman/listinfo/nanog
Laird Popkin
CTO, Pando Networks
520 Broadway, 10th floor
New York, NY 10012

***@pando.com
c) 646/465-0570

Alexander Harrowell
2008-04-24 14:24:26 UTC
Permalink
Post by Mike Gonnason
This idea is what I am concerned about. Until the whole copyright mess
gets sorted out, wouldn't these iTracker supernodes be a goldmine of
logs for copyright lawyers? They would have a great deal of
information about what exactly is being transferred, by whom and for
how long.
A good point about the approach of announcing a list of prefixes and
preference metrics, rather than doing lookups for each peer individually, is
that the supernode's logs will only tell you who used a p2p client at all;
nothing about what they did with it.

If you have to look up each peer, the log would be enough to start building a
social graph of the p2p network, which would be a good start towards knowing
who to send the nastygram to. Note the following description of the P4P approach:
Post by Laird Popkin
The approach that P4P takes is to have an intermediate server (which we
call an iTracker) that >processes the network maps and provides abstracted
guidance (lists of IP prefixes and >percentages) to the p2p networks that
allows them to figure out which peers are near each other.
m***@bt.com
2008-04-24 13:52:42 UTC
Permalink
Post by Michael Holstein
Won't this approach (using an ISP-managed
ultimately end up being co-opted by the lawyers for the
various industry "interest groups"
and thus be ignored by the p2p users?
To bring this back to network operations, it doesn't much
matter what lawyers and end users do. The bottom line is that
if P2P traffic takes up too much bandwidth at the wrong points
of the network or the wrong times of day, then ISPs will do things
like blocking it, disrupting connections (Comcast), and traffic
shaping (artificial congestion). The end users will get slower
downloads as a result.

Or, everybody can put their heads together, make something that
works for ISPs operationally, and give the end users faster
downloads. The whole question is how to multicast content over
the Internet in the most cost effective way.

--Michael Dillon
Michael Holstein
2008-04-24 15:50:24 UTC
Permalink
Post by m***@bt.com
Or, everybody can put their heads together, make something that
works for ISPs operationally, and give the end users faster
downloads. The whole question is how to multicast content over
the Internet in the most cost effective way.
This will work as long as the "optimization" strategy is content-agnostic.

p2p users want their content
netops want efficient utilization
lawyers want logfiles

You can have 2 out of 3.


Cheers,

Michael Holstein
Cleveland State University
Laird Popkin
2008-04-23 23:30:43 UTC
Permalink
In case anyone's curious, there's more info on P4P at http://cs-www.cs.yale.edu/homes/yong/p4p/index.html.

- Laird Popkin, CTO, Pando Networks
mobile: 646/465-0570

----- Original Message -----
From: "michael dillon" <***@bt.com>
To: ***@nanog.org
Sent: Wednesday, April 23, 2008 6:40:11 PM (GMT-0500) America/New_York
Subject: Re: [Nanog] Lies, Damned Lies, and Statistics [Was: Re: ATT VP: Internet to hit capacity by 2010]
Post by Daniel Reed
However, as your chunk scheduling becomes more effective, it
usually becomes more expensive. At some point, its increasing
complexity will reverse the trend and start slowing down
copies, as real-world clients begin to block on making chunk
requests while waiting for CPU to make scheduling decisions.
This is not a bad thing. The intent is to optimize the whole
system, not provide the fastest copies. Those who promote QoS
often talk of some kind of scavenger level of service that
sweeps up any available bandwidth after all the important users
have gotten their fill. I see this type of P2P system in a similar
light, i.e. it lets the ISP allow as much bandwidth use
as is economically feasible and block the rest. Since the end
user ultimately relies on the ISP having a stable network that
functions in the long term (and does not drive the ISP to bankruptcy),
this seems to be a reasonable tradeoff.
Post by Daniel Reed
As seems to be a trend, Michael appears to be fixated on a
specific implementation, and may end up driving many
observers into thinking this idea is annoying :) However,
there is a mathematical basis for including topology (and
other nontraditional) information in scheduling decisions.
There is also precedent for this in manufacturing scheduling
where you optimize the total system by identifying the prime
bottleneck and carefully managing that single point in the
chain of operations. I'm not hung up on a specific implementation,
just trying to present a concrete example that could be a starting
point. And until today, I knew nothing about the P4P effort which
seems to be working in the same direction.

--Michael Dillon
Laird Popkin
2008-04-24 15:40:19 UTC
Permalink
Replies below:

- Laird Popkin, CTO, Pando Networks
mobile: 646/465-0570

----- Original Message -----
From: "Christopher Morrow" <***@gmail.com>
To: "Laird Popkin" <***@pando.com>
Cc: "Alexander Harrowell" <***@gmail.com>, "Doug Pasko" <***@verizon.com>, ***@nanog.org
Sent: Wednesday, April 23, 2008 7:47:57 PM (GMT-0500) America/New_York
Subject: Re: P2P traffic optimization Was: [Nanog] Lies, Damned Lies, and Statistics [Was: Re: ATT VP: Internet to hit capacity by 2010]
Post by Laird Popkin
I would certainly view the two strategies (reverse engineering network information and getting ISP-
provided network information) as being complementary. As you point out, for any ISP that doesn't
provide network data, we're better off figuring out what we can to be smarter than 'random'. So while I
prefer getting better data from ISP's, that's not holding us back from doing what we can without that
data.
ok, sounds better :) or more reasonable, or not immediately doomed to
blockage :) 'more realistic' even.

- Thanks. Given that there are many thousands of ISP's, an incremental approach seemed best. :-)
Post by Laird Popkin
ISP's have been very clear that they regard their network maps as being proprietary for many good
reasons. The approach that P4P takes is to have an intermediate server (which we call an iTracker)
that processes the network maps and provides abstracted guidance (lists of IP prefixes and
percentages) to the p2p networks that allows them to figure out which peers are near each other. The iTracker can be run by the ISP or by a trusted third party, as the ISP prefers.
What's to keep the itracker from being the new 'napster megaserver'? I
suppose if it just trades map info or lookups (a la dns lookups) and
nothing about torrent/share content, things are less sensitive from a
privacy perspective - and from a single-point-of-failure-of-the-network
perspective.

- That's a good point. The iTracker never knows what's moving in the P2P
network. We are comparing two recommendation models, which expose different
levels of information. In the 'general' model, the iTracker knows nothing
about the p2p network, but provides a recommendation matrix based purely on
the ISP's network resources and policies. In the 'per torrent' model, the
iTracker receives information about peer distribution (e.g. there are many
seeds in NYC, and many downloaders in Chicago), in which case it can make
peering recommendations based on that knowledge. The latter approach seems
like it should be better able to 'tune' communications (to reduce maximum
link utilization, etc.), but it requires the p2p network to provide real-time
information about swarm distribution, which involves more communications, and
exposes more details of the network to the iTracker, raising some privacy
concerns. Admittedly the iTracker doesn't know what the swarm is delivering,
but it would know (in network terms) where the users in that swarm are, for
example.

Latency requirements seem to be interesting for this as well... at
least dependent upon the model for sharing of the mapping data. I'd
think that a lookup model served the client base better (instead of
downloading many large files of maps in order to determine the best
peers to use). There's also a sensitivity to the part of the network
graph and which perspective to use for the client -> peer locality
mapping.

- The network data is loaded into the p2p network's Tracker, and used
locally there, so there's no external communications during normal p2p
network operation. The communication pattern in P4P (current, at any rate -
it's still evolving) is that the P2P network's Tracker polls the P4P iTracker
periodically to receive updated map files. In the case of the 'general'
weight map, it could be one update every few minutes (or every day, etc.,
depending on how often the ISP cares to update network information). In the
case of 'per torrent' optimization, it's an update per swarm every few
minutes, which is much more messaging, so it might only make sense to do this
for a very small number of the most popular swarms.
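- As a sketch of that polling loop (the URL, interval and JSON encoding
are made up for illustration, not part of the spec):

import json
import time
import urllib.request

ITRACKER_URL = "http://itracker.example.net/guidance"  # hypothetical endpoint
REFRESH_SECONDS = 300  # how often to re-fetch the ISP's guidance

def poll_itracker(current):
    """Fetch the latest weight map; keep the previous one if the fetch fails."""
    try:
        with urllib.request.urlopen(ITRACKER_URL, timeout=10) as resp:
            return json.load(resp)
    except Exception:
        return current  # advisory data, so stale beats broken

weights = {}
while True:
    weights = poll_itracker(weights)
    # ... the Tracker keeps answering peer requests using `weights` ...
    time.sleep(REFRESH_SECONDS)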

It's interesting at least :)

Thanks!
-Chris

(also, as an aside, your mail client seems to be making each paragraph
one long unbroken line... which drives at least pine and gmail a bit
bonkers...and makes quoting messages a much more manual process than
it should be.)

- Sorry - I reconfigured to send 'plain text' email. Does it show up OK? I'm using Zimbra's web mail interface.