Opened 12 years ago

Closed 12 years ago

#2430 closed Enhancement (fixed)

Peer atom pool grows too large

Reported by: KyleK Owned by: charles
Priority: Normal Milestone: 1.80
Component: libtransmission Version: 1.75
Severity: Major Keywords:
Cc: jch@…

Description

Each PEX message may contain a list of dropped peers, which Transmission currently ignores. It should inspect these peers and drop them from the peer list if present.

Attachments (5)

pex-dropped-v1.patch (3.1 KB) - added by KyleK 12 years ago.
pex-dropped-v2.patch (4.5 KB) - added by KyleK 12 years ago.
0001-Maintain-atom-expiration-time-ignore-expired-atoms.patch (5.5 KB) - added by jch 12 years ago.
0002-Implement-tr_ptrArrayCull.patch (1.6 KB) - added by jch 12 years ago.
0003-Expire-atoms-periodically.patch (1.8 KB) - added by jch 12 years ago.


Change History (53)

Changed 12 years ago by KyleK

comment:1 Changed 12 years ago by KyleK

I've implemented the necessary code to read the "dropped" list from a PEX message and remove any potential peers from the peer list.

Unfortunately, this causes segfaults and aborts in the strangest places: I've seen gdb stop in malloc, realloc, and free. The code where Transmission crashes seems to have no direct relation to the new code introduced by this patch.

I can only imagine that Transmission uses a peer from the list, then this new code removes the peer from the list (and frees the memory), and after that some other code tries to access the peer data.

@charles, I think you'll have to take a look at this.

comment:2 Changed 12 years ago by jch

  • Cc jch@… added

It should inspect these peers and drop them from the peer list if included.

Why is this useful?

comment:3 Changed 12 years ago by KyleK

Transmission currently accumulates every peer address it receives via PEX. Over a longer run, this adds up to a lot (I've had torrents with 18,000 peer addresses). This slows down Transmission considerably on embedded machines. The PEX protocol specifically uses the "dropped" list to notify other clients of peers that no longer exist: they might have shut down, or dropped the torrent.

In order to use the PEX protocol efficiently, it is necessary to keep the internal list clean.

comment:4 Changed 12 years ago by jch

Does that mean that PEX entries never time out? In that case, the DHT is incorrect -- PEX data obtained from the DHT should time out after 32 minutes or so (slightly more than 30 minutes), since the DHT will refresh it after no more than 30 minutes.

--Juliusz

comment:5 follow-ups: Changed 12 years ago by KyleK

This doesn't have anything to do with DHT. PEX is a protocol with which peers that are connected to each other (because they're downloading the same torrent) can exchange other peers' addresses.

The specification says that each peer may send up to 50 peer addresses per minute to other peers. It can also send addresses of peers it has dropped. These should mainly be peers that A) left the peer cloud, B) behaved badly (e.g. lots of failed hash checks), or C) were dropped for some other reason that I can't think of right now :)

As stated in my previous comment, up until now Transmission was stockpiling peer addresses, storing each and every (valid) peer address it received. This slowed down T considerably on my machine.

As far as I can see in the code, peers received via PEX never time out. I do not know how DHT works.

comment:6 Changed 12 years ago by KyleK

Ok, here's v2 of the patch. I used the wrong unpack method for IPv6 peers. I also forgot to check whether an atom from the torrent->pool list was also present in the torrent->peers list. Now, if we receive a dropped peer address that belongs to an active peer, the peer gets closed and removed, as does the atom itself.

Transmission has been running for a while now, with no crashes whatsoever.

Changed 12 years ago by KyleK

comment:7 in reply to: ↑ 5 Changed 12 years ago by jch

Replying to KyleK:

This doesn't have anything to do with DHT. PEX is a protocol...

This was actually clarified on IRC, just summarising for the record.

While PEX and DHT are distinct (sub-)protocols, the DHT code in Transmission uses the PEX interfaces. Hence, anything that applies to PEX peers also applies to DHT peers.

Currently, the DHT code inserts peers into the transmission core using tr_peerMgrCompactToPex, and never removes them. Any peer inserted by the DHT should be timed out after 32 minutes, unless it is refreshed by another call to tr_peerMgrCompactToPex.

Charles, shall I open a new bug?

--Juliusz

comment:8 in reply to: ↑ 5 Changed 12 years ago by jch

Replying to KyleK:

[A peer] can also send addresses of peers it has dropped. These should mainly be peers that A) left the peer cloud, B) behaved badly (e.g. lots of failed hash checks), or C) were dropped for some other reason that I can't think of right now :)

(B) makes sense; but how do you do (A)? How do you know that a peer that has dropped you has actually left the cloud?

Random thoughts:

  1. Tread lightly -- a PEX atom should only be removed by the peer that originally sent us the atom, lest we suffer from an easy Denial of Service attack. I've only scanned over it, but this is not what your patch does.
  2. Tread lightly -- we should not remove a peer that we're currently happily communicating with. Your patch drops it in this case.

In short, I'm still not convinced that this feature is worth the trouble.

--Juliusz

comment:9 follow-up: Changed 12 years ago by KyleK

An explanation: I recently noticed high CPU usage in Transmission and tracked it down to the really large atom pool, which keeps growing for as long as a torrent is active (I believe it grows even after the download is finished). I implemented this to the best of my knowledge of how PEX works, in the hope that the atom list would maintain itself. Seeing now that DHT peers also use the PEX interface internally, that is clearly not the case.

For the life of me I cannot find any definitive specification or proposal regarding how PEX is supposed to work. If you know where the PEX protocol is written down in detail, please let me know; I want to know more about it :)

Regarding your points:

  1. I hadn't thought of that. I assume it will be difficult to implement this, though. I don't see that a single peer atom knows about its "creator", so to speak.
  2. I've talked with charles about this; it would probably be best to just ignore dropped addresses that we're still communicating with.

Although, regarding the last point: if another client dropped peer X because its data packets failed hash checks Y times, and we're still downloading from it, it might be reasonable to drop that peer as well. Unfortunately we do not know why a peer was dropped.

Regarding (A): I had assumed that a peer that closes its BitTorrent application will inform other peers that it's leaving the cloud. If that is not the case, and the other peers are not able to tell why the connection was dropped, then this is of course not a valid reason for being in a "dropped" PEX message.

After approx. 3 hours of running, a public torrent has gathered 5381 known peers, 3137 of them via DHT. PEX is currently disabled; the number would be even higher otherwise. This tells me that peer atom management is badly needed here. The way these "known peers" lists are implemented, a memmove is done for every insertion or removal. With thousands of peers, on a low-power embedded device, this burns a lot of cycles.
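
To illustrate what I mean, here's a made-up sketch of inserting into a sorted array (not the actual peer-mgr code; the struct and names are invented):

    /* illustration only: the binary search is cheap, but the memmove()
     * shifts half the pool on average for every single insertion.
     * (assumes the caller has already made room for one more element) */
    #include <string.h>

    struct atom { unsigned addr; };                  /* stand-in for a peer atom */

    static void insert_sorted( struct atom * pool, int * count, struct atom a )
    {
        int lo = 0, hi = *count;
        while( lo < hi ) {                           /* find the insertion point: O(log n) */
            int mid = ( lo + hi ) / 2;
            if( pool[mid].addr < a.addr ) lo = mid + 1; else hi = mid;
        }
        /* shift the tail: O(n) work on every insertion and removal */
        memmove( pool + lo + 1, pool + lo, ( *count - lo ) * sizeof( struct atom ) );
        pool[lo] = a;
        ++*count;
    }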

comment:10 in reply to: ↑ 9 Changed 12 years ago by jch

  • Summary changed from PEX: support "dropped" field to PEX and DHT: atom pool management badly needed
  • Type changed from Enhancement to Bug

Replying to KyleK:

For the life of me I cannot find any definitive specification or proposal regarding how PEX is supposed to work.

As far as I know, there's none. There are two different PEX specifications (ut_pex, and the Azureus PEX), and at least two different incompatible implementations of ut_pex.

Regarding (A): I had assumed that a peer that closes its Bittorrent application will inform other peers that it's leaving the cloud.

It won't -- there's no way to distinguish a peer shutting down from a peer that dropped you.

Regarding your points: 1. I hadn't thought of that. I assume it will be difficult to implement this, though.

I'm claiming that it's not worth the trouble. See below.

This tells me that peer atom management is badly needed here.

I agree. I suggest that the following needs to be done:

  1. Peer atoms should time out if they are not refreshed. For atoms learned from PEX, the timeout should depend on the total number of peers known for this torrent, but be rather large (a day or so for small swarms, an hour or so for large swarms); for atoms learned from the DHT, it should be on the order of 32 minutes. Jitter (on the order of a couple of minutes) should be applied to atom timeouts to avoid global synchronisation.

(An atom is refreshed whenever either we have direct evidence that it is alive -- e.g. we receive a BitTorrent message from that peer -- or we receive a PEX/DHT message that claims this peer is still alive.)

  2. The total number of peer atoms should be limited, and the atoms treated as an LRU list -- i.e. when the total number is exceeded, the least recently used atoms should be dropped.

--Juliusz

comment:11 Changed 12 years ago by charles

Well, this was interesting:

06:03 < charles> alus: what does utorrent do with the information it gets from a "dropped" field in pex?
06:05 < charles> I'm redoing the code in Transmission that decides how many peer addresses to remember over time, and when to age them out
06:05 < charles> one user submitted a patch that removes "dropped" peers from Transmission's internal tables
06:06 < charles> but it seems to me that could be exploited by a malicious peer
06:06 < alus> well, we maintain per-peer tables
06:06 < charles> so I got to wondering, if we can't trust what's in the "dropped" field, what good is it
06:07 < charles> what are the per-peer tables used for?
06:07 < alus> for maintaining the sets that they are updating
06:08 < charles> ...
06:09 < alus> so we maintain a per-peer set, which they update by adding and removing things
06:09 < charles> right, I understand that part
06:09 < alus> when a new address is added, we put it in our torrent-global set
06:09 < alus> that's all.
06:10 < alus> we don't do anything when they drop it except drop it from their per-peer set
06:10 < alus> but it's useful to keep a list of peers they claim to be connected to, for this hole-punching extension
06:11 < charles> ah, ok...
06:12 < charles> ...doesn't "dropped" predate the hole-punching extension by a fair amount?
06:12 < alus> sure
06:13 < charles> so are there other uses for knowing which peers know which peers?
06:13 < alus> not that I know of
06:13 < charles> what was "dropped" used for before the hole-punching extension?
06:13 < alus> not for anything interesting, no
06:14 < alus> I mean, we still maintained the set
06:14 < alus> but that set wasn't used for anything except eating memory
06:14 < charles> heh :)
06:20 < The_8472> we're using it
06:21 < The_8472> to keep track of how many peers a peer is connected to
06:21 < The_8472> and see if a peer is poorly connected
06:23 < charles> when you track if a peer is poorly connected, when do you use that information?
06:24 < The_8472> i think it's used in some connection heuristics or whatever. don't ask for details, i don't want to touch that code
06:25 < charles> thanks for the info

comment:12 follow-up: Changed 12 years ago by KyleK

charles:

Reading the code, I see that the pool where potential peer addresses are kept is an array, which is reallocated every time a new address gets added. Also, memmove() is called every time. I haven't done any benchmarks, but that sounds horribly slow to me. Do you think read and write access could be improved here by using a normal list of addresses? I realize that that would use up more memory, but in conjunction with pool management this wouldn't be that much of an issue.

comment:13 Changed 12 years ago by jch

Charles: okay, so it looks like the µTorrent people agree with me -- the dropped field is useless.

comment:14 Changed 12 years ago by jch

KyleK:

The simplest fix would be to resize the array exponentially; every time it overflows, you double the size of the array and perform the copy. In this way, the cost of copying is linear in the number of peers.

(In case that's not clear to you: you start with an array of size 1. At the second peer, you double it. At the third peer, you reach size 4. At the fifth peer, you reach size 8. In this way, you've performed 1+2+4=7 copies at the eighth peer, 1+2+4+8=15 copies at the sixteenth peer, and so on.)
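
In code, the growth step looks something like this (a sketch with invented names, not Transmission's actual array code; error checking omitted):

    /* amortised (exponential) growth -- illustration only */
    #include <stdlib.h>

    struct pool { void ** items; int count; int alloc; };

    static void pool_append( struct pool * p, void * item )
    {
        if( p->count >= p->alloc ) {
            /* double the capacity instead of growing by one element,
             * so the total amount of copying stays linear in the item count */
            p->alloc = p->alloc ? p->alloc * 2 : 1;
            p->items = realloc( p->items, p->alloc * sizeof( void* ) );
        }
        p->items[p->count++] = item;
    }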

If we want to perform strict LRU aging, then using an exponentially grown array treated as a circular list is optimum. If we want to age atoms in the middle of the list, then a linked list with a pointer to the end is the simplest, but not necessarily the most efficient data structure. In particular, linked lists have poor locality (unless you use a copying generational garbage collector, but that's somewhat off topic), which kills your cache.

In the DHT code, for example, I'm using exponentially grown arrays for bulk data, and linked lists for node lists. With hindsight, it would have been both more efficient and simpler to use arrays everywhere, but it's not worth changing that.

comment:15 Changed 12 years ago by KyleK

Remember that it is a sorted list (sorted by peer address). While there will be fewer realloc()s with an exponentially growing array, there'll still be a memmove (of varying sizes) with each insertion and deletion.

comment:16 in reply to: ↑ 12 ; follow-up: Changed 12 years ago by charles

Replying to KyleK:

charles:

Reading the code, I see that the pool where potential peer addresses are kept is an array, which is reallocated every time a new address gets added. Also, memmove() is called every time.

Where are you seeing it reallocate every time? I'm not sure what you're looking at there.

comment:17 in reply to: ↑ 16 Changed 12 years ago by KyleK

Replying to charles:

Replying to KyleK:

charles:

Reading the code, I see that the pool where potential peer addresses are kept is an array, which is reallocated every time a new address gets added. Also, memmove() is called every time.

Where are you seeing it reallocate every time? I'm not sure what you're looking at there.

I'm sorry. It seems I remembered that part wrong. It doesn't reallocate every time.

comment:18 follow-up: Changed 12 years ago by KyleK

  • Summary changed from PEX and DHT: atom pool management badly needed to Peer atom pool management badly needed

I just wanted to bump this issue again to spur a discussion.

I agree that the "dropped" PEX list is apparently useless, so the attached patches can be ignored. Nevertheless, with a very large number of known peers (whether they came via PEX, DHT, or the tracker), Transmission is slowed down severely on embedded machines.

Looking at the code, the atom list is only ever added to, nothing is ever removed. At least on low-power machines, this list should have an upper limit to keep the daemon running.

Any ideas how we can accomplish this?

comment:19 in reply to: ↑ 18 Changed 12 years ago by jch

Replying to KyleK:

Looking at the code, the atom list is only ever added to, nothing is ever removed. At least on low-power machines, this list should have an upper limit to keep the daemon running.

Any ideas how we can accomplish this?

Yes, there should be two mechanisms:

  1. atoms should have an expiry time, and should be purged when they reach their expiry time. This time should be set by ensureAtomExists, even when the atom already exists, in the following manner:
  • now + 8 hours for directly connected atoms;
  • now + 2 hours for PEX atoms;
  • now + 35 minutes for DHT atoms;
  • now + 10 minutes for "fast restart" atoms.

Note that it is okay to purge atoms lazily, i.e. only when you encounter an expired atom (e.g. in getExistingAtom).

Note the small value for DHT atoms -- that's because the DHT code should, in normal circumstances, refresh its atoms every 28 minutes at most.

  2. There should be a fallback mechanism that limits the total number of atoms (or the total number of atoms per torrent?) in case the mechanism above is not sufficient. We would simply need to purge an atom whenever we attempt to go over the limit; the simplest solution would be to choose a random atom, but smarter strategies are of course possible. (A rough sketch of the first mechanism follows below.)
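
Something like this (invented names and fields, not the real peer-mgr structures; just to show the idea, not a patch):

    /* illustration only -- not the actual peer-mgr code */
    #include <time.h>

    enum { FROM_INCOMING, FROM_PEX, FROM_DHT, FROM_RESUME };

    struct atom { int from; time_t expiry; };

    /* to be called from ensureAtomExists(), even when the atom already exists */
    static void refreshAtomExpiry( struct atom * a, time_t now )
    {
        switch( a->from ) {
            case FROM_INCOMING: a->expiry = now + 8 * 60 * 60; break;  /* directly connected */
            case FROM_PEX:      a->expiry = now + 2 * 60 * 60; break;
            case FROM_DHT:      a->expiry = now +     35 * 60; break;
            case FROM_RESUME:   a->expiry = now +     10 * 60; break;  /* "fast restart" */
        }
    }

    /* lazy purging: e.g. getExistingAtom() can simply skip atoms for which this is true */
    static int atomIsExpired( const struct atom * a, time_t now )
    {
        return a->expiry <= now;
    }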

--Juliusz

comment:20 follow-up: Changed 12 years ago by charles

Sounds like you're volunteering to make a patch... ;)

comment:21 in reply to: ↑ 20 Changed 12 years ago by jch

Replying to charles:

Sounds like you're volunteering to make a patch... ;)

I'd love to. Are you volunteering to correct 30 student projects for me?

--Juliusz (just kidding)

comment:22 follow-up: Changed 12 years ago by KyleK

Out of curiosity, where'd you get these intervals from? Also, atoms received via tracker seem to be "missing" from the list up there. What about those?

comment:23 in reply to: ↑ 22 Changed 12 years ago by jch

Replying to KyleK:

Out of curiosity, where'd you get these intervals from?

Sucked them out of my thumb. Listened to my little finger.

  • DHT: the DHT is consulted every 28 minutes, so 35 minutes should be fine;
  • PEX: PEX is untrusted, so the interval should be fairly small;
  • fast restart: this is old data, it should expire pretty soon;
  • directly connected: this is direct evidence, so it should be retained for a longer time.

An additional issue I didn't mention is that we should not advertise PEX atoms over PEX, unless we have direct evidence they are correct (by connecting to them); otherwise, an obsolete PEX atom might circulate indefinitely in the swarm (since PEX doesn't contain a hop count or other strictly increasing metric).

Also, atoms received via tracker seem to be "missing" from the list up there. What about those?

Good point. I suggest 1.2 times the tracker update interval, unless the atom is confirmed correct by direct evidence (a tracker atom should normally either be refreshed or replaced after that time).

--Juliusz (bored to death correcting student projects)

comment:24 Changed 12 years ago by jch

The following three patches implement my solution.

The first one maintains an expiry time in every atom, and ignores atoms that have expired. It shouldn't break anything, just make sure we don't attempt to use old atoms.

The second one just implements a helper function. It doesn't do anything on its own.

The third one is where dragons be. It actually expires atoms, i.e. removes them from the pool at most 10 minutes after they have expired. I believe it is correct, but I am a little nervous that somebody might be holding on to an atom somewhere.

--Juliusz

Changed 12 years ago by jch

Changed 12 years ago by jch

comment:25 Changed 12 years ago by jch

This crashes, as I expected. I'm pretty sure the issue is that we should copy the atom in reconnectTorrent before passing it to peerIoNewOutgoing; the latter should free it somehow.

The other issue is that we don't currently filter out expired atoms in reconnectTorrent.

Finally, the time constants need some tuning. I think the right thing is to use different time values for large and small swarms -- we might as well be very aggressive in a large swarm, but extremely permissive in smaller ones, where keeping obsolete information doesn't matter as much (it won't create connect storms, it won't cause a lot of memory usage, and it might actually be useful).

--Juliusz

comment:26 follow-up: Changed 12 years ago by livings124

As a result of being bored and randomly analying 0002, I have two small suggestions: 1. for true/false use tr_bool (TRUE/FALSE) and 2. when removing items from an array, it's more efficient to remove them in reverse order.

comment:27 Changed 12 years ago by livings124

analyzing, but I suppose it's analying as well

comment:28 Changed 12 years ago by livings124

Also for 0003, you check t->peer_expiry_time twice in a row.

comment:29 in reply to: ↑ 26 ; follow-up: Changed 12 years ago by jch

Replying to livings124:

  1. for true/false use tr_bool (TRUE/FALSE)

Over my dead body. If I wanted booleans, I'd be hacking Azureus.

(In real programming languages, of course, booleans are represented as NIL and non-NIL, but one doesn't always choose.)

  2. when removing items from an array, it's more efficient to remove them in reverse order.

You're right. Noted.

you check t->peer_expiry_time twice in a row

That's deliberate. In peerMgrGetPeers, now is already computed, so checking the time there saves a system call (remember, this check is made for every atom we see). In expireAtom, on the other hand, checking the time is just normal defensive programming -- somebody might call this function without checking in the future, and I'd rather we avoided spuriously flushing atoms in that case.

Call it paranoia.

--Juliusz

comment:30 in reply to: ↑ 29 ; follow-up: Changed 12 years ago by livings124

Replying to jch:

Replying to livings124:

  1. for true/false use tr_bool (TRUE/FALSE)

Over my dead body. If I wanted booleans, I'd be hacking Azureus.

(In real programming languages, of course, booleans are represented as NIL and non-NIL, but one doesn't always choose.)

There is no increase in overhead from using booleans, pretty much all languages have them (including C as of C99 - why we don't require C99 10 years later isn't my call), and it makes the code a lot easier to read and understand. It also keeps things consistent and follows the conventions already used in the code. And if someone were to use the functions you added, a tr_bool might avoid bugs: an int suggests the parameter accepts more than just 0 and 1.

What advantage does not using booleans have, other than not being leet?

comment:31 in reply to: ↑ 30 Changed 12 years ago by jch

What advantage does not using booleans have [...]?

It makes my code too easy to understand. I usually use 13 for true and 47 for false, and give all my variables Polish names that cannot be typed with an American keyboard, let alone pronounced by a Yank, but Charles refuses to commit my patches when I use these conventions.

(You're 100% right, of course, I'm just being deliberately annoying ;-)

--Juliusz

comment:32 Changed 12 years ago by charles

give all my variables Polish names that cannot be typed with an American keyboard

Still better than Hungarian notation, I suppose.

FWIW I agree with livings; tr_bool adds semantic clues that int doesn't.

comment:33 Changed 12 years ago by charles

  • Milestone changed from None Set to 1.80
  • Severity changed from Normal to Major
  • Status changed from new to assigned
  • Type changed from Bug to Enhancement

comment:34 Changed 12 years ago by charles

  • Summary changed from Peer atom pool management badly needed to Peer atom pool grows too large

comment:35 Changed 12 years ago by charles

(trunk libT) #2430 "Peer atom pool grows too large" -- add an atom expiration system along the lines of the suggestions in this ticket's comments. jch and KyleK, let me know if you think anything in this commit needs improvement.

comment:36 Changed 12 years ago by jch

I like the way you take piece_data_time into account in compareAtomPtrsByShelfDate, rather than explicitly extending an atom's shelf life as I did. That's really neat.

The shelf life of a tracker atom looks too long to me: after all, a tracker atom should normally be refreshed within the update interval. If the tracker becomes inaccessible, the other mechanism will kick in (not dropping atoms when there are few of them).

Minor point: the shelf life of a tracker atom should be a small multiple (say 2.2) of the tracker's update interval rather than being hard-wired to 4 hours. I'm not sure how difficult that is, though; it might require adding a new field to atoms, so it's probably not worth it.

--Juliusz

comment:37 follow-up: Changed 12 years ago by charles

The shelf life of a tracker atom looks too long to me: after all, a tracker atom should normally be refreshed within the update interval.

That's a good point. My rationale for the longer tracker shelf life is that reannouncing isn't a guaranteed refresh -- the tracker may give you a slightly (or wholly) different subset of the peers it knows about.

comment:38 in reply to: ↑ 37 Changed 12 years ago by jch

Replying to charles:

The shelf life of a tracker atom looks too long to me: after all, a tracker atom should normally be refreshed within the update interval.

That's a good point. My rationale for the longer tracker shelf life is that reannouncing isn't a guaranteed refresh -- the tracker may give you a slightly (or wholly) different subset of the peers it knows about.

The same is true of the DHT. I'd suggest at least making the two consistent.

However, you're not dropping atoms that have been productive recently. If an atom has not been productive in 2.2 times the tracker's announce interval and hasn't been refreshed by a tracker announce, I think you're fully justified in dropping it.

--Juliusz

comment:39 Changed 12 years ago by charles

I agree with most of that. Definitely the default shelf life could be tweaked some.

  • The effort/benefit of tying tracker atoms to their tracker's announce interval is too high. What happens if we get the same peer from two trackers? Do we go back and readjust that atom's shelf life? Do we take the highest value of the two trackers' announce intervals? And then what if the peer /isn't/ refreshed by the first tracker... and then what if there are multiple tiers... etc, etc.
  • Now that I think of it, IMO the resume file peers should have the shortest shelf life, since those addresses are potentially the oldest of all.
  • If DHT might give a different set of peers each time, to me that suggests we should extend the DHT atoms' shelf life.
  • I agree that the DHT and tracker shelf lives should be closer together. If I'm understanding DHT correctly, the tracker's shelf life should probably be longer -- those atoms are time-bounded secondhand knowledge (the peer said hello to the tracker mumble minutes ago, and the tracker's telling us about it) but with DHT I don't think we know the number of steps, or more importantly, the age. If that's correct I'd rate an untested DHT atom somewhat lower than an untested Tracker atom, but above an untested DHT atom.

How about:

        case TR_PEER_FROM_INCOMING : return 60 * 60 * 8;
        case TR_PEER_FROM_LTEP     : return 60 * 60 * 8;
        case TR_PEER_FROM_TRACKER  : return 60 * 60 * 4;
        case TR_PEER_FROM_DHT      : return 60 * 60 * 3;
        case TR_PEER_FROM_PEX      : return 60 * 60 * 2;
        case TR_PEER_FROM_RESUME   : return 60 * 60 * 1;
        default                    : return 60 * 60;

comment:40 follow-up: Changed 12 years ago by charles

that last should read, "but above an untested PEX atom".

comment:41 Changed 12 years ago by jch

Replying to charles:

>         case TR_PEER_FROM_INCOMING : return 60 * 60 * 8;
>         case TR_PEER_FROM_LTEP     : return 60 * 60 * 8;
>         case TR_PEER_FROM_TRACKER  : return 60 * 60 * 4;
>         case TR_PEER_FROM_DHT      : return 60 * 60 * 3;
>         case TR_PEER_FROM_PEX      : return 60 * 60 * 2;
>         case TR_PEER_FROM_RESUME   : return 60 * 60 * 1;
>         default                    : return 60 * 60;

Please add "+ 7 * 60" at the end of each line. Using round values is a bad idea, as it might easily cause an atom to be flushed and then immediately reinstated (for example if the tracker's announce interval is itself a round number that evenly divides 4 hours).

Other than that, my gut instinct would be to reduce both the DHT and tracker shelf life to slightly over 1 hour, but I'm not really sure of myself. The intuition is that in a small swarm you're going to receive the same set of peers each time, so the atoms will normally get refreshed, while in a large swarm, where the sets don't necessarily overlap, you've got plenty of atoms anyway, so who cares if you spuriously discard a fraction of them.

As I've said, I'm not really sure of myself, and furthermore I don't think it matters much.

with DHT I don't think we know the number of steps, or more importantly, the age

We do. The DHT is completely analogous to a set of 8 trackers with an update interval of 30 minutes. The only differences are how we find those 8 trackers -- we use the Kademlia algorithm rather than consulting the .torrent file -- and that the set of trackers may change over time (as nodes leave and join the Kademlia network).

In other words, the number of steps is exactly two, just like with a tracker, and the age is no more than 30 minutes.

According to the spec, each of those pseudo-trackers should return 50 randomly drawn peers, but I don't think everyone respects that, I've seen DHT nodes return up to 200 peers. The Transmission DHT follows the spec almost exactly: we announce to the DHT every 28 minutes, we expire peers that we're tracking after 35 minutes, and we return up to 50 peers, which are drawn with a good approximation to a random subset.

--Juliusz

comment:42 in reply to: ↑ 40 Changed 12 years ago by jch

Replying to charles:

that last should read, "but above an untested PEX atom".

The main reason to avoid PEX is not that PEX is particularly untrustworthy. The main issue is that over-reliance on PEX will cause the peer graph to degenerate into a set of fully-connected cliques.

Something like either a tracker or the DHT is needed to reestablish connections between poorly-connected subsets of the network.

--Juliusz

comment:43 Changed 12 years ago by charles

Please add "+ 7 * 60" at the end of each line. Using round values is a bad idea, as it might easily cause an atom to be flushed and then immediately reinstated (for example if the tracker's announce interval is itself a round number that evenly divides 4 hours).

We already add a jitter value to getDefaultShelfLife()... Also when we're talking about values in terms of hours, maybe a jitter range of 10 minutes makes more sense than the current 2 minutes.
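
For illustration only (this is not the actual Transmission code, just a sketch of the idea):

    /* sketch: add a random jitter to the base shelf life so that atoms learned
     * in the same batch don't all expire in the same pruning pass */
    #include <stdlib.h>

    static int getShelfLifeWithJitter( int baseSeconds )
    {
        const int jitterRange = 10 * 60;             /* e.g. up to ten minutes */
        return baseSeconds + rand( ) % jitterRange;
    }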

As I've said, I'm not really sure of myself, and furthermore I don't think it matters much.

I agree. Any weaknesses in the default shelf life values will be ameliorated by the expiration code weighing the piece_data_time.

The DHT is completely analogous to a set of 8 trackers with an update interval of 30 minutes... In other words, the number of steps is exactly two, just like with a tracker, and the age is no more than 30 minutes.

Thanks for that correction! In that case you're right about weighing trackers and DHT the same for consistency's sake.

comment:44 Changed 12 years ago by charles

r9592 libtransmission/peer-mgr.c: (trunk libT) #2430 "peer atom pool grows too large" -- tweak the default atom shelf lives based on discussion in the ticket's comments section

comment:45 Changed 12 years ago by charles

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:46 Changed 12 years ago by KyleK

  • Resolution fixed deleted
  • Status changed from closed to reopened

Out of curiosity I was checking out the tordbg() output on line 3026 for a single torrent. The output for this is:

656:[27 11:23:17.547] <torrent> max atom count is 200; have 711.. pruning
672:[27 11:24:17.914] <torrent> max atom count is 200; have 1265.. pruning
685:[27 11:25:17.346] <torrent> max atom count is 200; have 599.. pruning
698:[27 11:26:17.853] <torrent> max atom count is 200; have 962.. pruning
715:[27 11:27:17.314] <torrent> max atom count is 200; have 890.. pruning
727:[27 11:28:17.772] <torrent> max atom count is 200; have 999.. pruning
739:[27 11:29:17.411] <torrent> max atom count is 200; have 738.. pruning
749:[27 11:30:17.859] <torrent> max atom count is 200; have 803.. pruning
760:[27 11:31:17.326] <torrent> max atom count is 200; have 648.. pruning
771:[27 11:32:17.747] <torrent> max atom count is 200; have 839.. pruning
784:[27 11:33:18.135] <torrent> max atom count is 200; have 676.. pruning
794:[27 11:34:17.526] <torrent> max atom count is 200; have 824.. pruning
808:[27 11:35:18.182] <torrent> max atom count is 200; have 667.. pruning
818:[27 11:36:17.574] <torrent> max atom count is 200; have 767.. pruning
831:[27 11:37:18.250] <torrent> max atom count is 200; have 672.. pruning
841:[27 11:38:17.602] <torrent> max atom count is 200; have 804.. pruning
853:[27 11:39:18.059] <torrent> max atom count is 200; have 720.. pruning
864:[27 11:40:17.571] <torrent> max atom count is 200; have 719.. pruning
876:[27 11:41:17.977] <torrent> max atom count is 200; have 913.. pruning
889:[27 11:42:18.405] <torrent> max atom count is 200; have 791.. pruning
901:[27 11:43:17.856] <torrent> max atom count is 200; have 874.. pruning
913:[27 11:44:18.313] <torrent> max atom count is 200; have 900.. pruning
925:[27 11:45:17.755] <torrent> max atom count is 200; have 899.. pruning
936:[27 11:46:18.202] <torrent> max atom count is 200; have 928.. pruning
949:[27 11:47:18.778] <torrent> max atom count is 200; have 991.. pruning
965:[27 11:48:18.220] <torrent> max atom count is 200; have 928.. pruning
991:[27 11:49:18.697] <torrent> max atom count is 200; have 1491.. pruning

While the number of peers is kept in check somewhat (the tracker alone has ~5000 peers), it is still way over the proposed limit. Is this expected behavior? Performance is fine otherwise, so no complaints there.

comment:47 Changed 12 years ago by charles

KyleK: yeah I think it's fine. We don't limit the influx anywhere; we just prune them periodically. If anything it's showing that DHT and PEX really do work well.

comment:48 Changed 12 years ago by charles

  • Resolution set to fixed
  • Status changed from reopened to closed