Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#3656 closed Enhancement (fixed)

Endgame could be faster

Reported by: charles Owned by: jordan
Priority: Normal Milestone: 2.22
Component: libtransmission Version: 2.11
Severity: Normal Keywords:
Cc:

Description

The request manager was rewritten to eliminate downloading duplicate blocks, but it may be taking the idea too far in endgame -- several users have complained that endgame is too slow.

Attachments (16)

endgame.1.patch (983 bytes) - added by charles 11 years ago.
possible fix
endgame1.patch (2.2 KB) - added by harrydb 11 years ago.
my take at the endgame
endgame1-debug.patch (2.6 KB) - added by harrydb 11 years ago.
DEBUG.gz (18.5 KB) - added by gunzip 11 years ago.
endgame2-debug.patch (3.6 KB) - added by harrydb 11 years ago.
DEBUG-2.gz (11.8 KB) - added by gunzip 11 years ago.
endgame3.patch (6.6 KB) - added by harrydb 11 years ago.
endgame4.patch (11.6 KB) - added by harrydb 11 years ago.
endgame5.patch (11.4 KB) - added by harrydb 11 years ago.
endgame6.patch (11.8 KB) - added by harrydb 11 years ago.
-
endgame7.patch (11.9 KB) - added by ijuxda 11 years ago.
Patch to svn/trunk r11889
TheCrisisofCreditVisualized.torrent (3.4 KB) - added by jordan 11 years ago.
a webseed torrent
endgame8.patch (11.2 KB) - added by jordan 11 years ago.
patch against r11889 implementing the changes in comment:47
endgame9.patch (11.4 KB) - added by jordan 11 years ago.
patch against 11889 implementing the changes in comment:49, comment:50, and comment:51
endgame10.patch (12.5 KB) - added by jordan 11 years ago.
patch against r11889 implementing the changes in comment:53
endgame11.patch (12.4 KB) - added by jordan 11 years ago.
patch against r11889 implementing the changes in comment:55 and comment:59

Download all attachments as: .zip

Change History (83)

Changed 11 years ago by charles

possible fix

comment:1 Changed 11 years ago by gunzip

OS: Linux Debian, transmission-daemon 2.11+ (11345)

this patch looks very promising! i just tried it on two different torrents of the type that would usually hang at the final few pieces, that is, public torrents with very large swarms and tons of seeders and peers. but with this patch all the sputtering at the end no longer occurred .. at least on the 2 tests i've done so far.

could you include this patch in a new nightly build so more users could test it out? this has been quite an annoyance for me (and others) since TR 1.92, and this could be the final solution.

comment:2 Changed 11 years ago by gunzip

well maybe i spoke too soon. on a third similar torrent i tried (175 MiB, 5000+ peers, seed/leech ratio over 3) TR maxed out my connection up to 99.xx% in about 9 minutes, then spent about another 4 minutes sputtering to get the last few pieces.

very hard to figure why this occurs on some torrents and not others, unless it's deliberate poisoning of the swarm by anti-p2p organizations.

comment:3 Changed 11 years ago by kovalev

If I understand swarm "poisoning correctly", it could be differentiated from endgame slowdown by an accumulation of bad pieces of data (hash check failure) that are thrown in deliberately by malignant peer(s). This could be easily detected in Torrent Inspector (Properties - Information), though identifying the ip of the bad guy is not that easy within the present version of Transmission.

Imho such a peer deserves a temporary ban regardless of reason - I have come across one of a kind recently while downloading system update... May be this can be discussed further in a separate thread as an enhancement.

comment:4 Changed 11 years ago by Rolcol

kovalev: Bad peers are banned. It just takes 5 bad pieces to do so.

comment:5 Changed 11 years ago by gunzip

i don't think the slowdown is due to hash fails as these would show up in my log as "failed its checksum test", but they don't. something else is causing it to hang at the very end on some torrents.

my uninformed guess it's an unwanted side effect from this ticket introduced back in TR 1.92:

Local data can be corrupted by bad peers when Transmission accepts duplicate blocks during endgame https://trac.transmissionbt.com/ticket/1242

because that's when a flurry of forum posts appeared about torrents hanging at the very end.

comment:6 Changed 11 years ago by charles

Ticket #1242 was fixed by r10325, which appears to be unrelated to this.

gunzip, could you mail me an example of a torrent where you're seeing slow endgame?

comment:7 Changed 11 years ago by gunzip

@charles

if this behavior was reproducible i would have filed a bug report months ago -- unfortunately it isn't. i just re-ran the exact same torrent that hung yesterday and this time there was no sputtering at the end. it maxed out my connection right to the 100% mark without a hiccup. the only difference is the swarm makeup from 24 hours ago.

i know since the spring of this year there have been about 10 forum topics started regarding this problem, and it seems to affect both Mac and Linux users alike. the most active forum thread is:

https://forum.transmissionbt.com/viewtopic.php?f=4&t=9781

i will say the problem, from my experience, is not as bad now as it was in the TR 1.9* versions, where it would hang indefinitely at 99.xx% and the only way to get it to finish was a pause/resume. i would guess now less than 1 in 8 of my torrents exhibit this quirk, and when it does it will eventually finish within 5 minutes. unless you happen to be observing the endgame you might not even be aware of it, though on smaller torrents this can add 50% to the total download time.

i tried but failed to find some common denominator, though it seems more likely to occur on public trackers than private ones. i can recall only one case where the slowdown occurred on a private tracker.

comment:8 Changed 11 years ago by kovalev

As of my experience, the endgame slowdown occurs with very large swarms where the number of seeders connected is much higher than the number of leechers, and with large size of data piece: 2 to 4MB . The simplest (though rather resource-costly) way to reproduce is to download a disk image of any Linux distribution. At the very end a load of connections still remain active, but all the transfer actually proceeds from ca. 2-3 seeders of 50 connected.

The endgame behaviour has been improved since version 1.9x but is still clearly observable with r11345. It doesn't look really important though.

Last edited 11 years ago by kovalev (previous) (diff)

comment:9 follow-up: Changed 11 years ago by kovalev

In addition to my observations on the final approach slowdown. It definitely happens while downloading large data of many chunks of considerable size (more than 1Mb) in a large swarm. This means there is a great chance to have slow connections been accumulated at the end while all the transfer from faster peers has been complete. For some reason Transmission gets stuck with those slow connections to the very end. It looks rather as a feature of Transmission's connectivity method than a bug. Though I cannot test and confirm this kind of behaviour for other BitTorrent? clients.

comment:10 Changed 11 years ago by charles

Can anyone say whether things are better, worse, or no change with endgame.1.patch? I'm considering committing it for 2.12 but would like to get more user feedback on it.

comment:11 follow-up: Changed 11 years ago by kovalev

Well, it is not easy to obtain an accurate picture from a single case - all the issue is dependent on the torrent properties and swarm's conditions. As of my recent experience with downloading software upgrades - at least it didn't make things worse.

comment:12 in reply to: ↑ 11 Changed 11 years ago by gunzip

Replying to kovalev:

Well, it is not easy to obtain an accurate picture from a single case - all the issue is dependent on the torrent properties and swarm's conditions. As of my recent experience with downloading software upgrades - at least it didn't make things worse.

Yes I would agree with that -- i tried the patch with several torrents and saw no downside. But it didn't solve the problem on some torrents where it has difficultly completing at the very end even when there are a ton seeds connected.

If i can ever find a way of duplicating this quirk i will file a ticket. But the endgame of recent TR versions 2.10 and 2.11 is much better than in 1.92 and 1.93 IMO.

Last edited 11 years ago by gunzip (previous) (diff)

comment:13 Changed 11 years ago by charles

  • Milestone changed from None Set to 2.20

Added to trunk in r11528 for nightly build testing

comment:14 Changed 11 years ago by User294

Sure, for some files endgame became slower. I seen how T attempts to DL a file and at endgame it has sticked to a very slow peer (and that bastard sending at 0.01 ... 0.05 kb/sec! Ouch!) while ignoring other peers. So endgame could took as long as download itself for medium-sized files. I bet I should give it a try to see if this issue fixed.

comment:15 Changed 11 years ago by gunzip

Good, i'm going to test this build on some of the small, heavily seeded torrents on public trackers where i've seen this happen in the past .. but want to try it out on at least 10 torrents before reporting anything.

comment:16 Changed 11 years ago by gunzip

Unfortunately it only took 5 tests to see that r11528 didn't solve the problem of "finishing" at the end, at least for me. In fact i might consider it a bit of a regression. Two of the five tests hungup at the 99.x% point. All were small public torrents ~175MB and had very high seed/peer ratio. This is typical case:

Time to DL from 0 to 99% : 9 minutes (connection maxed out, all is good)

Time to DL from 99% to 100%: 7 minutes (sputtering with negligible d'load activity, but uploading is not effected)

Below is screenshot of peerlist at the time of the hangup, and you can see there are plenty of connected seeds but transmission just won't download from them.

The second screenshot show the 3 final pieces (total of only 1.5MB) where transmission can't seem to finish off cleanly, in effect almost doubling the DL time.

http://oi55.tinypic.com/mj3481.jpg

http://oi52.tinypic.com/241l5oi.jpg

comment:17 Changed 11 years ago by charles

r11533: undo r11528

comment:18 in reply to: ↑ 9 Changed 11 years ago by x190

Replying to kovalev:

In addition to my observations on the final approach slowdown. It definitely happens while downloading large data of many chunks of considerable size (more than 1Mb) in a large swarm. This means there is a great chance to have slow connections been accumulated at the end while all the transfer from faster peers has been complete. For some reason Transmission gets stuck with those slow connections to the very end. It looks rather as a feature of Transmission's connectivity method than a bug. Though I cannot test and confirm this kind of behaviour for other BitTorrent? clients.

Deluge apparently also exhibits this random behavior.

http://ubuntuforums.org/showthread.php?t=559829

comment:19 Changed 11 years ago by jordan

  • Milestone changed from 2.20 to Sometime

comment:20 follow-up: Changed 11 years ago by harrydb

Hi all, I have been thinking about this and I came up with the following: in the endgame let only a 'fast' peer download a block that is already being downloaded by another (possibly slow) peer. The problem is how to determine if a peer is fast? I noticed that the number of pending requests is higher for 'fast' peers so changed the code that in the endgame one additional peer can be assigned to a block, but only if the new peer has more than some threshold (10 works nice) pending requests.

Works nice but while playing with this I found out that sometimes the endgame is never reached. For one (not too small) torrent the endgame was even initiated when the torrent was about 50% complete (and causing a chain reaction of pending requests).

The latter is due to, I believe, the current code only looking at the pending requests/missing blocks of the piece at the top of the pieces list. If I understand correctly is perfectly possible (although unlikely) at any time for a piece to have X pending requests and X missing pieces which would falsely trigger the endgame.

The former problem seems to be caused by the number of pending requests per piece that seems to be inconsistent. I have seen (with the old code plus a println) pending/missing numbers of 8/2 for a piece and in another case n-1/n (n kept decreasing slowly) which seem weird to me (or, more probably, I don't understand transmission internal semantics properly). Also the sum of pending requests per piece is not equal to the total requestCount of the torrent (or does this include active transfers). Anyway, some things smelled fishy, but that may be me.

I ended up comparing the torrent's requestCount with the sum over the missing blocks per torrent (using tr_cpMissingBlocksInPiece) as these two seem consistent (subtracting the sum of tor->complition->completeBlocks from tor->blockCount did not work properly although I felt the result should be the same). Instead of summing it would be nice if there was some administration in tr_completion that keeps track of the total number of completed pieces.

Sorry for being a bit verbose but I wanted to motivate my changes and report the troubles I encountered (valid or not). I don't do a lot of C coding so if someone could take a look at my patch (attached below). In my tests troublesome torrents now go blazing right till the end.

Wow that was long.

Changed 11 years ago by harrydb

my take at the endgame

comment:22 in reply to: ↑ 20 Changed 11 years ago by gunzip

Replying to harrydb:

my take at the endgame endgame1.patch

hi, i ran your patch for 5 torrents but unfortunately on the third one i experienced the same "hang" at the 99.x% point pretty much identical to what i described in comment 16 above.

comment:23 follow-up: Changed 11 years ago by harrydb

Hmm, too bad. Thanks for testing though! Are these public (and legal) torrents I can test myself?

Last edited 11 years ago by harrydb (previous) (diff)

comment:24 in reply to: ↑ 23 Changed 11 years ago by gunzip

Replying to harrydb:

Hmm, too bad. Thanks for testing though! Are these public (and legal) torrents I can test myself?

you can try any torrents you want. i usually limit my tests to small well-seeded torrents as it makes the procedure easier and faster. also there doesn't seem to be a specific torrent that hangs .. rather it occurs randomly. i've run tests on the exact same torrent where it would hang the first time, and then just a few hours later complete with no issues.

of course it would be useful to get feedback from others experiencing this problem to see if your patch made any difference. kudos for trying a fix as this has been an annoyance going on 10 months now.

comment:25 follow-up: Changed 11 years ago by harrydb

I made a patch with the same changes but is also prints out some debug output (number of requests/missing blocks/endgame reached), could you try it and attach the output if the torrent fails to complete fast? Thanks!

In reply to myself: Torrent->pieces has only incomplete pieces, not all. This caused some of my weird numbers.

Changed 11 years ago by harrydb

comment:26 in reply to: ↑ 25 Changed 11 years ago by gunzip

Replying to harrydb:

I made a patch with the same changes but is also prints out some debug output (number of requests/missing blocks/endgame reached), could you try it and attach the output if the torrent fails to complete fast? Thanks!

OK, i found a small torrent (100MB) with a huge amount of seeds for the test. i actually had to run it four times before the hang issue occurred .. but sure enough it did. in this case it hung for about 4 minutes before finishing. (see attachment DEBUG.gz)

Changed 11 years ago by gunzip

comment:28 Changed 11 years ago by ijuxda

I can confirm that the 99% hang occurs even with the patch (endgame1.patch). In my particular case in peers tab I see a single peer with 2 download requests (i.e. 2 requests from us to download from that peer) and a periodically appearing value of 2 in the "we cancelled" column (I assume this refers to requests "we cancelled").

comment:29 Changed 11 years ago by harrydb

@gunzip: Thanks! I have an idea what _might_ be going wrong, and an idea how to solve it, I suspect (hope) the issue is the same for ijuxda.

@x190: thanks for the link, I already though about canceling additional requests; it is an issue in the current implementation as well but I have not explored that part of the codebase yet.

comment:30 Changed 11 years ago by harrydb

I am not entirely happy yet, but I think this one should work better. Could you give it a spin? Again if this one also fails could you attach the debug output? Thanks!

Changed 11 years ago by harrydb

comment:31 Changed 11 years ago by gunzip

@harrydb: Very good news to report! .. i'm now seeing a significant improvement in the endgame with your endgame2-debug.patch. i ran that test torrent 10 more times and there weren't any endgame hangs or massive slowdowns. furthermore i did 3 other torrents and they all finished cleanly. so far that's 13 consecutive runs without an issue. of course there's always a chance i just got lucky but i don't think so.

having others try your patch would help alot, especially all those users who have experienced this problem to various degrees. BTW, i attached debug output (DEBUG-2.gz) from a typical successful run if you want to compare it to the case where it hanged.

Changed 11 years ago by gunzip

comment:33 follow-up: Changed 11 years ago by harrydb

Nice to hear it works, let's hope it stays that way! I have done some cleanup and some other tweaks to make it slightly more efficient. I hope this will be the last revision (debug output removed and diff created in the root).

--- Patch summary ---

peer-mgr.c:

  • Detect endgame by looking at the total number of missing blocks and the total number of requests. When endgame is detected calculate the average number of requests per peer we are downloading from. This value is stored in the newly added Torrent->endgame (recalculating this in in the endgame results in 0 after a while, plus we avoid the calculation loop).
  • When in endgame allow an additional peer to download a block when it has a number of pending requests at least as high as the average (a measure of being 'fast').

completion.h/c

  • Add a variable that holds the total number of completed blocks so we do not need a sum loop each time we want to check for the endgame (might be useful elsewhere as well).

Lets test it some more and after a while I hope the devs are willing to merge this for the next version (2.30?).

Changed 11 years ago by harrydb

comment:34 in reply to: ↑ 33 Changed 11 years ago by jordan

Replying to harrydb:

Lets test it some more and after a while I hope the devs are willing to merge this for the next version (2.30?).

yes. :)

comment:36 in reply to: ↑ 35 Changed 11 years ago by harrydb

Thanks for looking at it ijuxda!

As before [...]

I don't know 'before' ;-)

The only really necessary change was with respect to the usage of short types (uint16_t). Perhaps you just confused Torrent.requestCount with weighted_piece.requestCount?

Not really, I was just not entirely sure why in some parts (unsigned) int was used and sometimes uintxx_t, so I just picked one (requestCount should not be that high in any near future). Changing that. On that note, I was wondering why a lot of int is used instead of unsigned int where appropriate, for example Torrent.requestCount. Is there a reason for that?

I did notice that once Torrent.endgame is non-zero, it is never recalculated again. So the first average value calculated by isInEndgame() will always be used even if the state of the peers or the client's requests changes during the endgame period.

Well, two (or three) things to consider here. First (if we do not take into account your comment below) all blocks already have a peer downloading so they will eventually finish. Even when adding additional peers there are of course a thousand and one things that can happen and there will always be a change that the torrent finishes slowly. The thing is I think if you want perfect that will also have a cost (complexity/speed). Of course there are ways to improve it so you are right to question me here.

More to the point the 'fast' peers will keep up their requestCount by requesting additional blocks and we don't care about the slow ones, so I did not want to do unnecessary calculations. Not that it is expensive at the moment, when it turns out we need to recalculate to avoid other problems I am perfectly happy with that.

Second, the average will drop to one near the end of the endgame which could be a good thing since we allow more peers to take a block, but it could also be a bad thing because we can now have two slow peers on a block.

Third, there is a bug in peer-mgr that it does not cancel additional block requests causing Torrent.requestCount to become too high in the endgame. This is also present without this patch and I have a feeling that there may also be some issues when peers reject or timeout. I will add a bug report about that after writing this.

Having said that I have two tweaks I want to implement: 1) Don't add a second peer if the first is already fast (reduce overhead) 2) In the endgame allow even a slow peer to take a block if it has no peers anymore due to rejects or timeouts (increase robustness)

Also, suppose that the user changes the download state of certain files (i.e. selects or deselects some for downloading). Wouldn't this require a change in the endgame state of the torrent? How would such partial torrents affect the usage of tr_torrent.blockCount in isInEndgame()?

That is actually a really good point, I will check this but it may take a while since I have exams coming up.

comment:38 in reply to: ↑ 37 Changed 11 years ago by harrydb

Replying to ijuxda:

Just to be clear, it is our client (the running transmission program) that does the requesting of additional blocks from other peers, and in so doing the requestCount of the pieces containing those blocks increases, unless I have misunderstood something. Also, for peers the value is called pendingReqsToPeer rather than requestCount.

Yup.

I admit I am not completely familiar with the peer-mgr code, but could it ever occur that a fast peer (in terms of bandwidth availability) has a low value of pendingReqsToPeer, or a slow peer a high value?

Well, transmission tunes the number of requests to how fast a peer sends us blocks (actually outside peer-mgr). But sure, as long as bittorrent doesn't become a protocol where peers give hard guarantees it might happen, just not very often.

There is also the fact that most peers we are connected to are probably choking us, so in the endgame couldn't we just send the block requests to all peers that are not currently choking us (I assume it would not be that many) and cancel the still active duplicate requests when the fastest ones complete? This would avoid having to rely on heuristics to determine where to send the requests, but on the other hand I am not sure how wasteful of bandwidth this might be, or how difficult it would be to implement.

That is how duplicate request part of the old code worked :) It actually requested each block three times in the endgame. We could leave that and just fix the endgame detection which is actually the biggest problem. However I think the request part is better with this patch than it was in the old code.

Last edited 11 years ago by harrydb (previous) (diff)

comment:39 Changed 11 years ago by harrydb

part 4:

  • make sure this works for multi-file torrents and when the download selection changes
  • cancel additional block requests when a block finishes
  • the first peer that wants to download a block also gets it in the endgame (ie when peers rejected a previous request or timeout)
  • only add an additional peer if the current one does not fit the requirement for a fast peer

Happy testing!

Changed 11 years ago by harrydb

comment:40 Changed 11 years ago by gunzip

@harrydb: i'm seeing a significant regression with endgame4.patch .. my first 2 tests with it had major hangs at the end.

in contrast, with your endgame2-debug.patch and then endgame3.patch i had 23 consecutive successful runs with no issues.

Linux Debian, transmission-daemon 2.20b4 (11829) with endgame4.patch

Last edited 11 years ago by gunzip (previous) (diff)

comment:41 Changed 11 years ago by harrydb

I noticed it as well, after some testing I determined that the 'only add an additional peer if the current one does not fit the requirement for a fast peer' thing is a bit too strict.

Changed 11 years ago by harrydb

comment:42 Changed 11 years ago by gunzip

@harrydb: thanks, endgame5.patch is testing great and haven't had any hangups in 6 trial runs .. it fixed the regression i was seeing previously.

@jordan: now that 2.20 is out the door, can you please get this patch in the trunk. it's a very promising solution to a long-standing issue and needs more testing by others.

Upate: After much more testing, endgame5.patch is showing occasional but not severe hangups at the end. so from what i can see endgame3.patch has given the best results to date.

Last edited 11 years ago by gunzip (previous) (diff)

comment:43 Changed 11 years ago by ijuxda

The patch endgame5.patch has a memory leak in getBlockRequestPeers: the memory allocated by the calls to tr_ptrArrayAppend is never released.

Also, there are a number of lines that have trailing whitespace. If you are not using git (git diff --color shows them clearly), you can still detect them with

grep -nC1 '[^[:space:]][[:space:]]\+$' endgame5.patch

for example. It's a matter of coding style whether trailing whitespace changes should be avoided. I prefer to do so, but as there is no official guide for this project I doubt anyone will hold it against you if you do not.

Changed 11 years ago by harrydb

comment:44 Changed 11 years ago by harrydb

  • destroy tr_ptrArray
  • fix trailing whitespace

comment:45 Changed 11 years ago by jordan

I'm liking these revisions :)

comment:46 Changed 11 years ago by harrydb

I think the patch in about ready, can we nominate this bug for 2.30?

I must say, the experience contributing here has been quite pleasant. It's nice to get good feedback (thanks gunzip and ijuxda!) and know the devs appreciate the work. Also I found the code quite readable and easy to understand once you get into it, good job!

Changed 11 years ago by ijuxda

Patch to svn/trunk r11889

comment:47 follow-up: Changed 11 years ago by jordan

A couple of thoughts on the latest iterations:

  • Is the return MIN( 1, t->requestCount / numDownloading ); meant to be MAX instead? In its current form I think isInEndgame() will always return either a 0 or a 1, which doesn't match up with the intent for t->endgame described in its declaration.
  • Patch 7 still leaks ptrarrays like mad due to all calls to "continue;" before reaching tr_ptrArrayDestruct(). We can plug this leak and save a bit of malloc/free overhead at the same time by declaring and destructing peerArr outside of the loop, and reusing the array inside the loop with a call to tr_ptrArrayClear(). :)
  • t->endgame is cached for the entire duration of the endgame, so our definition of "fast" will never change. Would it be better to keep updating the value, as patch 3 did? looping through calls to clientIsDownloadingFrom() seems pretty inexpensive to me.
  • Now that isInEndgame() refers to the t->endgame field, and given that the function is called this way: t->endgame = isInEndgame();, there's a dependency cycle there that should be made explicit. Let's rename isInEndgame() as updateEndgame() and have it update the field itself.

Attached is endgame8.patch, which implements these comments. completion.[ch] are unchanged from endgame7.patch.

comment:48 Changed 11 years ago by jordan

Also: I'd like to get some testing of the TR_PEER_CLIENT_GOT_BLOCK code added in patch (4?) wrt webseeds. Webseed tr_peers have a NULL msgs ptr and we shouldn't call tr_peerMsgsCancel() on them.

(There's not a good mechanism to cancel webseed requests, but IMO that's beyond the scope of this ticket and is an edge case anyway.)

Attached is a webseed .torrent that I used for testing 2.20's webseed implementation. It might be useful to test future patches in this ticket with it. It's small, so we should reach endgame quickly on it too...

Changed 11 years ago by jordan

a webseed torrent

Changed 11 years ago by jordan

patch against r11889 implementing the changes in comment:47

comment:49 in reply to: ↑ 47 Changed 11 years ago by harrydb

Replying to jordan:

A couple of thoughts on the latest iterations:

  • Is the return MIN( 1, t->requestCount / numDownloading ); meant to be MAX instead? In its current form I think isInEndgame() will always return either a 0 or a 1, which doesn't match up with the intent for t->endgame described in its declaration.

I am so stupid... I originally had '+ 1' (which is also fine) and performed most of my earlier tests with that but for some reason changed it. I was already wondering why some downloads did not finish as quickly as I expected.

  • Patch 7 still leaks ptrarrays like mad due to all calls to "continue;" before reaching tr_ptrArrayDestruct(). We can plug this leak and save a bit of malloc/free overhead at the same time by declaring and destructing peerArr outside of the loop, and reusing the array inside the loop with a call to tr_ptrArrayClear(). :)

Sounds good!

  • t->endgame is cached for the entire duration of the endgame, so our definition of "fast" will never change. Would it be better to keep updating the value, as patch 3 did? looping through calls to clientIsDownloadingFrom() seems pretty inexpensive to me.

Hmm, I know you wanted to do this before as well but I quite strongly disagree with this. The point is that the number of downloading peers you get from calling clientIsDownloadingFrom() in a loop is quite inaccurate, and gets more and more inaccurate as we go deeper in the endgame. This because it is not the number of peers we are actually downloading from but all the peers that did not choke us and have pieces we do not have yet (if I understand correctly?) but that last part is not updated too often and we may already have requests for all blocks in that piece. Also the peers may be unresponsive or something like that (the ones that caused hangs in the past?). As the number of requests keeps dropping in the endgame the average gets more and more skewed and goes to zero while per peer that we are actually 'getting bytes from' there is always one or more requests.

comment:50 Changed 11 years ago by harrydb

What would be an option is to use 'peers[i]->pendingReqsToPeer > 0' instead of 'clientIsDownloadingFrom()'. It may be better anyway.

comment:51 Changed 11 years ago by jordan

The point is that the number of downloading peers you get from calling clientIsDownloadingFrom() in a loop is quite inaccurate, and gets more and more inaccurate as we go deeper in the endgame.

Okay, I'll attach a new revision in a few minutes. I just noticed a bug in the new 'cancel' code anyway, we call 'cancel' on the same peer N times instead of 1 time for N peers ;)

Last edited 11 years ago by jordan (previous) (diff)

comment:52 Changed 11 years ago by harrydb

Hmm yes it should be

tr_peerMsgsCancel( peers[i]->msgs, block ) ;

instead of

tr_peerMsgsCancel( peer->msgs, block );

Changed 11 years ago by jordan

patch against 11889 implementing the changes in comment:49, comment:50, and comment:51

comment:53 Changed 11 years ago by harrydb

We need to count webseeds as well, otherwise it will cause a division by 0 on your webseed torrent posted above.

Changed 11 years ago by jordan

patch against r11889 implementing the changes in comment:53

comment:54 Changed 11 years ago by jordan

How's that? gunzip, ijuxda, does this one work for you?

comment:55 Changed 11 years ago by harrydb

Great! Two little things:

  • in updateEndgame(): we can drop the MAX now, we can be sure it is always >=1.
  • for TR_PEER_CLIENT_GOT_BLOCK : I don't think ( otherPeer != peer ) is needed, that peer is removed everywhere a few lines earlier by removeRequestFromTables() right?

comment:56 Changed 11 years ago by gunzip

just ran off 8 tests with endgame10.patch: 3 with the webseed torrent above and then 5 small heavily-seeded public torrents. it looks good so far as i didn't see any significant hangups at the end. will do more testing to see if it holds up.

comment:57 Changed 11 years ago by jordan

thanks gunzip!

comment:58 Changed 11 years ago by jordan

harrydb, do you want to do patch11 for those two points in comment:55?

comment:59 Changed 11 years ago by jordan

...actually if you haven't started on it already, I've got a few minutes free. I'll do it :)

I'm going to make two minor changes, if you're agreeable:

  • Instead of removing MAX altogether, I'll move it from MAX( 1, a/b ) to a / MAX(1,b) just as a paranoid safeguard against division by zero.
  • I think you're right about TR_PEER_CLIENT_GOT_BLOCK so I'll take out that clause, but replace it with an assert( peer != peers[i] ) just to make sure we're both right about that assumption.

Changed 11 years ago by jordan

patch against r11889 implementing the changes in comment:55 and comment:59

comment:60 Changed 11 years ago by x190

Can't get this patch to patch.

Checked out revision 11889.
J-B-iMac:-Old_dmgs sl$ cd /Users/snowyowl/Read\ Me/Trans_Weather/Transmission/-Old_dmgs/T_r11889 
J-B-iMac:T_r11889 sl$ patch -p1 < ./endgame11.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: libtransmission/peer-mgr.c
|===================================================================
|--- libtransmission/peer-mgr.c	(revision 11889)
|+++ libtransmission/peer-mgr.c	(working copy)
--------------------------
File to patch: libtransmission/peer-mgr.c
patching file libtransmission/peer-mgr.c
can't find file to patch at input line 266
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
Last edited 11 years ago by x190 (previous) (diff)

comment:61 Changed 11 years ago by jordan

x190: you're in the top-level transmission directory, so use -p0 rather than -p1.

I just tested endgame11.patch against r11889 to confirm, it does patch.

$ svn info | grep -e trunk -e Revision
URL: svn://svn.transmissionbt.com/Transmission/trunk
Revision: 11889
$ patch -p0 < /home/charles/Downloads/endgame11.patch 
patching file libtransmission/peer-mgr.c
patching file libtransmission/completion.c
patching file libtransmission/completion.h

comment:62 Changed 11 years ago by x190

Must be the way you format your patches. Anyway, I guess it was patching okay and I didn't interpret the output correctly. If anyone has a troublesome endgame torrent I can test, please PM me on the forum. Testing with Turbo Transmission, endgame, #117, +#532. :)

P.S. Webseed torrent above completes in 1 min 13 sec.

Last edited 11 years ago by x190 (previous) (diff)

comment:63 Changed 11 years ago by jordan

Sounds like patch 11 is working alright for at least four of us.

Does anyone have any final changes they want to make to the code before it goes in?

comment:64 Changed 11 years ago by harrydb

Hit it!

comment:65 Changed 11 years ago by jordan

  • Milestone changed from Sometime to 2.22
  • Owner set to jordan
  • Status changed from new to assigned

comment:66 Changed 11 years ago by jordan

  • Resolution set to fixed
  • Status changed from assigned to closed

jordan * r11892 libtransmission/ (completion.c completion.h peer-mgr.c): (trunk libT) #3656 "endgame could be faster" -- fixed. Patch by harrydb.

I'm going to close this ticket for now. If there are regressions or bugs in the patch, feel free to reopen.

Thanks to everyone who helped code & test on this, and especially to harrydb for taking the initiative. Working with friendly people is a big part of what makes free software fun.

I'll try to add some new bugs in a future release to give you another itch to scratch. ;)

A minor note on procedure, it's not certain whether or not a bugfix 2.2x release will be necessary. If not, this ticket's milestone will get bumped to 2.30.

comment:70 in reply to: ↑ 69 Changed 11 years ago by jordan

Replying to ijuxda:

There were some bugs I found during testing, the most severe being a memory leak (in tr_peerMgrGetNextRequests's ptrarray).

That leak was fixed early yesterday in endgame8.patch as per comment:47

Can you also confirm that

/* don't send the same request to the same peer twice */

if( peer == peers[0] )

is correct, rather than testing against all peers in the array peers?

It appears correct to me. By the time we reach that line, peerCount should always be 1.

comment:71 Changed 11 years ago by jordan

  • Milestone changed from 2.22 to 2.30

This patch is fairly straightforward, but it's also on the largish side for a point release. Let's bump this to 2.30.

comment:72 Changed 11 years ago by livings124

  • Component changed from Transmission to libtransmission

comment:73 Changed 11 years ago by jordan

  • Milestone changed from 2.30 to 2.22

Moving back to 2.22 :)

The patch is still kind of large for a point release, but IMO this is an important fix...

comment:74 Changed 11 years ago by jordan

r12035 /branches/2.2x/libtransmission/ (completion.c completion.h peer-mgr.c): (2.2x) backport r11892 for #3656 "endgame should be faster" -- patch by harrydb

Note: See TracTickets for help on using tickets.