Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#3329 closed Bug (fixed)

Connection problems when downloading

Reported by: cybex77 Owned by: Longinus00
Priority: High Milestone: 2.10
Component: libtransmission Version: 2.00
Severity: Normal Keywords:
Cc:

Description

I recently updated to 2.0 and ever since I have been having problems connecting to peers for downloading. Connections themselves are established quickly, but it takes forever before connected peers actually start sending data, and I have never had this problem before. Any suggestions?

Attachments (6)

fixSlowStart.patch (4.4 KB) - added by Longinus00 11 years ago.
fixSlowStart_v2.diff (6.3 KB) - added by charles 11 years ago.
fixSlowStart_v3.diff (13.1 KB) - added by charles 11 years ago.
fixSlowStart_v4.diff (12.4 KB) - added by charles 11 years ago.
fixSlowSpeed2.patch (6.1 KB) - added by Longinus00 11 years ago.
fixSlowSpeed3.patch (5.5 KB) - added by Longinus00 11 years ago.
combines the fast startup of the first patch with the more advanced stepping of the 2nd


Change History (37)

comment:1 follow-up: Changed 11 years ago by livings124

  • Resolution set to worksforme
  • Status changed from new to closed

This feels like a support issue better suited for the forums until more specifics are uncovered there.

comment:2 in reply to: ↑ 1 Changed 11 years ago by Rolcol

Replying to livings124:

This feels like a support issue better suited for the forums until more specifics are uncovered there.

I've done it: https://forum.transmissionbt.com/viewtopic.php?f=2&t=10218

comment:3 Changed 11 years ago by charles

  • Component changed from Transmission to libtransmission
  • Resolution worksforme deleted
  • Status changed from closed to reopened

Longinus00 had a very insightful suggestion about this issue in IRC this morning regarding the pattern in which Transmission attempts to connect to peers. It seems that rechokeDownloads() isn't randomizing the list of peers it downloads from, so 2.0 may not be giving untried peers enough attention.
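For illustration only, this is roughly the kind of shuffle that removes such a bias; the function below is a made-up sketch, not the actual change in r10820:

{{{
#include <stdlib.h>

/* Fisher-Yates shuffle over an array of peer pointers, so that the
 * rechoke pass walks its candidates in random order and untried peers
 * get attention. rand() stands in for libtransmission's own RNG. */
static void shuffle_candidates(void **candidates, int n)
{
    for (int i = n - 1; i > 0; --i) {
        const int j = rand() % (i + 1);
        void *tmp = candidates[i];
        candidates[i] = candidates[j];
        candidates[j] = tmp;
    }
}
}}}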

comment:4 Changed 11 years ago by charles

  • Summary changed from Connection problems to Connection problems when downloading

comment:5 follow-up: Changed 11 years ago by charles

the randomization issue described in comment:3 is fixed now in r10820.

cybex77, Rolcol, does this change the behavior you're seeing any?

comment:6 in reply to: ↑ 5 ; follow-up: Changed 11 years ago by Rolcol

Replying to charles:

the randomization issue described in comment:3 is fixed now in r10820.

cybex77, Rolcol, does this change the behavior you're seeing any?

I'm testing but I'm going to go through a lot more torrents before I say anything for certain.

comment:7 in reply to: ↑ 6 Changed 11 years ago by charles

Replying to Rolcol:

I'm testing but I'm going to go through a lot more torrents before I say anything for certain.

:)

comment:8 Changed 11 years ago by charles

Rolcol: ping

comment:9 Changed 11 years ago by charles

r10874 libtransmission/peer-mgr.c: (trunk libT) #3329 "connection problems when downloading" -- raise MAX_CONNECTIONS_PER_SECOND back up to the higher value used in 1.93

r10875 /branches/2.0x/libtransmission/peer-mgr.c: (2.0x libT) #3329 "connection problems when downloading" -- raise MAX_CONNECTIONS_PER_SECOND back up to the higher value used in 1.93

comment:10 Changed 11 years ago by charles

r10876 /branches/2.0x/libtransmission/peer-mgr.c: (2.0x libT) #3329 "connection problems when downloading" -- when deciding which peers to connect to, take download/seed status into account

r10877 libtransmission/peer-mgr.c: (trunk libT) #3329 "connection problems when downloading" -- when deciding which peer to connect to, take download/seed status into account

comment:11 Changed 11 years ago by charles

r10883 -- backported r10820 (the randomization fix) from trunk to 2.0x

comment:12 Changed 11 years ago by Rolcol

Until these recent commits it was still slow compared to 1.93. I wasn't comparing to the stable 2.00.

comment:13 Changed 11 years ago by sniper

Last edited 11 years ago by sniper

comment:14 Changed 11 years ago by Rolcol

I started a torrent in μTorrent and then migrated it over to r10903 to finish it up away from my dad's laptop. I got a good sample of the download speeds with μTorrent.

It took many minutes (roughly 10) to get up to speed. Compared to 1.93, r10903 wasn't requesting blocks as quickly and was slower to unchoke the peers we connected to. It did eventually catch up, but that's wasted time.

comment:15 Changed 11 years ago by Rolcol

I was monitoring BitTorrent traffic in Wireshark. I noticed that trunk was sending "Interested" and "Not Interested" messages in the same packet to the same peer; none of these were logged when I was running 1.93.

Conversely, 1.93 seemed to send "Interested" and "Unchoke" messages in the same packet to the same peer, and none of those were logged for trunk. http://imgur.com/p0zMo.png

Last edited 11 years ago by Rolcol

comment:16 Changed 11 years ago by Longinus00

I'm pretty sure Rolcol's observation is a red herring. The real problem is rechokeDownloads' peer decision algorithm.

  1. basing maxPeers on t->interestedCount

The number of peers marked interesting the first time rechokeDownloads is called for a given torrent is used as the basis for all subsequent calculations. Since the number of interesting peers only grows by 1 every 10 seconds, a low initial interestedCount (typically because transmission doesn't know many peers yet) really hurts performance. Restarting transmission, so that cached peers boost the initial interestedCount, should work around this problem.

  2. halving maxPeers if no blocks are received

If no blocks have been received by the second and subsequent runs of rechokeDownloads, it starts halving maxPeers down to the minimum value of 5. So if the initial set of known peers is slow to respond or is choking transmission, even fewer requests for data get sent out, the opposite of what should happen. Another way to hit this issue is restarting a torrent after stopping it for more than a minute.

Each of the two issues would reduce initial downloading performance by itself, but together they have a synergy that can cripple transmission in the first few minutes of leeching.
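For illustration, a rough paraphrase in code of the behavior described above; the struct and constant names are simplified stand-ins, not the actual 2.00 peer-mgr.c:

{{{
#define MIN_INTERESTING_PEERS 5

/* simplified stand-in for tr_torrent; illustrative only */
struct torrent_sketch {
    int maxPeers;        /* cap on how many peers we mark "interested" */
    int interestedCount; /* peers we are currently interested in */
};

/* paraphrase of the 2.00-era logic criticized above, not the real code */
static void rechoke_downloads_sketch(struct torrent_sketch *t,
                                     int is_first_run,
                                     int blocks_received)
{
    if (is_first_run) {
        /* problem 1: the first pass seeds maxPeers from however many peers
         * happened to look interesting at startup, which is often tiny */
        t->maxPeers = t->interestedCount;
    } else if (blocks_received == 0) {
        /* problem 2: no blocks yet, so the limit is halved toward 5, even
         * though slow or choking peers are exactly when we should be asking
         * more peers, not fewer */
        t->maxPeers /= 2;
        if (t->maxPeers < MIN_INTERESTING_PEERS)
            t->maxPeers = MIN_INTERESTING_PEERS;
    } else {
        /* growth is capped at 1 new interesting peer per 10-second pass */
        t->maxPeers += 1;
    }
}
}}}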

Last edited 11 years ago by Longinus00

Changed 11 years ago by Longinus00

comment:17 Changed 11 years ago by charles

Longinus00: this is an excellent summary of the problems I was seeing the other day when we discussed this in IRC. Thanks for writing it up.

The patch is interesting, too. I was trying to come up with an approach to address this that didn't cause more oscillation, and the history struct you added fits the bill.

Attached is an (untested, sorry) revision of it that tries to:

  1. change the "10" to a number that's less brittle if/when either RECHOKE_PERIOD_MSEC or CANCEL_HISTORY_MSEC changes (a rough sketch of this idea follows below).
  2. don't base the adjustment on t->interestedCount in the "else if (blocks)" code. It looks like using that variable there could make things worse if t->interestedCount is very different from t->maxPeers.
  3. rename t->maxPeers to t->maxUnchokedPeers
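To make item 1 concrete, here is a guess at what deriving that number could look like; the constant values are taken from the timings mentioned elsewhere in this ticket (10-second rechoke passes, cancels taking about 2 minutes to arrive) and are assumptions, not copied from the attached diffs:

{{{
/* assumed values, per the discussion in this ticket */
#define RECHOKE_PERIOD_MSEC   (10 * 1000)
#define CANCEL_HISTORY_MSEC   (120 * 1000)

/* one possible derivation: the number of rechoke passes inside one
 * cancel-history window. Whether the original hard-coded "10" meant
 * exactly this is a guess, but deriving it this way keeps it correct
 * if either constant changes. */
#define RECHOKE_PERIODS_PER_CANCEL_WINDOW \
    (CANCEL_HISTORY_MSEC / RECHOKE_PERIOD_MSEC)
}}}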

Changed 11 years ago by charles

comment:18 Changed 11 years ago by Longinus00

charles:

I originally used maxPeers but changed to interestedCount because interestedCount represents how many peers transmission actually requested blocks from. If cancels are coming in, then interestedCount is probably too high; but maxPeers can potentially be much higher than interestedCount, so using interestedCount helps the initial response. The difference between using maxPeers and interestedCount only matters the first several times through rechokeDownloads with cancels, so if you prefer using maxPeers the whole way through for clarity, it shouldn't be a big issue.

comment:19 Changed 11 years ago by charles

Yes, I think you're right that the difference could cause problems during the first few iterations. I think we're better off having one variable and adjusting it over time, but I think we can address both issues by having that one variable be interestedCount instead of maxPeers.

Attached is yet another untested patch that makes that change, and a couple of others:

  • extracts getInterestingPeerLimit() into its own function for clarity
  • renames interestedCount as interestingPeerCount for peer/client clarity
  • simplifies the code that sets up the calls to tr_peerMsgsSetInterested()

Changed 11 years ago by charles

comment:20 Changed 11 years ago by Longinus00

How does that patch fix problem 1?

Last edited 11 years ago by Longinus00

Changed 11 years ago by charles

comment:21 Changed 11 years ago by charles

I think the description of problem 1 is only half correct.

Keeping one field and adjusting it periodically is preferable to keeping two values, one for the current limit and one for the ideal limit. If those values diverge too much, we can run into trouble with slowing down too quickly or speeding up too slowly.

IMO the way trunk lowers the interesting count based on cancelRate is good. The real problem 1 is that the way it raises the interesting count is too inflexible -- specifically, the "1 peer every 10 seconds" clause.

I suggest we add a new heuristic that does for increases what cancelRate does for decreases. For example, if we measure a saturation level (how many peers we're interested in divided by how many are interesting), we could make the increases larger at lower saturation levels.
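A minimal sketch of that saturation idea, with made-up thresholds just to show the shape of it (not code from any of the attached patches):

{{{
/* increase the interested-peer limit faster when we're interested in only
 * a small fraction of the interesting peers; thresholds are illustrative */
static int interested_increment(int interestedCount, int interestingPeerCount)
{
    double saturation;

    if (interestingPeerCount <= 0)
        return 0;                    /* nothing to be interested in */

    saturation = (double)interestedCount / interestingPeerCount;

    if (saturation < 0.25) return 4; /* far below the ceiling: grow quickly */
    if (saturation < 0.50) return 3;
    if (saturation < 0.75) return 2;
    return 1;                        /* nearly saturated: grow slowly */
}
}}}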

comment:22 Changed 11 years ago by Longinus00

I still don't understand why initial request limits should be based on how many interesting peers are seen the first time through rechokeDownloads. A heuristic you might want to consider is:

Assuming bandwidth is not an issue (i.e. we will never get a cancel for being too slow), how long will it take to ask m peers for data if the first run through rechokeDownloads sees i interesting peers (i > 0) and all subsequent runs see MAX_PEERS?

Given m = 60 and i = 1, the code in trunk will take around 550 seconds while your new code will take around 300 seconds. Cancels take 2 minutes to come in so I'm not sure what the purpose of this delay is.
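(As a rough check on the trunk figure, assuming one new interesting peer per 10-second rechoke pass: growing from i = 1 to m = 60 takes on the order of 59 passes, i.e. roughly 550-600 seconds, so the numbers above are consistent with that model.)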

Using saturation as a metric for increasing requests is not meaningful. The limit on the number of requests sent out is cancelRate, which is independent of saturation. If bandwidth is limited, then cancelRate will increase, leading to a reduction in requests and a low saturation. As soon as cancelRate has dropped below the cutoff, you're likely to go more than one cycle at the lowest saturation rung, exacerbating the oscillating behavior of the proportional feedback.

If you want some sort of variable increase in requests then make it inversely proportional to the integral of cancelRate, i.e. the longer you've gone without errors then the more you increase your requests.
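A minimal sketch of that "longer without errors, bigger increase" idea, with made-up step sizes and cutoffs; this is not the code in the attached fixSlowSpeed patches:

{{{
#include <time.h>

/* grow the request limit faster the longer we've gone without a cancel;
 * the step sizes and cutoffs here are illustrative guesses */
static int increment_by_quiet_time(time_t now, time_t lastCancelTime)
{
    const double quiet = difftime(now, lastCancelTime);

    if (quiet < 60)  return 1;  /* a cancel within the last minute: be cautious */
    if (quiet < 180) return 2;
    if (quiet < 300) return 4;
    return 8;                   /* long error-free stretch: ramp up aggressively */
}
}}}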

comment:23 Changed 11 years ago by charles

Your 300 number is correct. The saturation rate is an improvement over trunk, but it isn't a solution.

Using a time-since-last-cancel heuristic is reasonable, but its added complexity outweighs any advantages it has over your first patch.

Thank you for providing me a wall to bang my head against.

comment:24 Changed 11 years ago by charles

  • Keywords backport-2.0x added
  • Milestone changed from None Set to 2.20
  • Owner set to Longinus00
  • Status changed from reopened to new

comment:25 Changed 11 years ago by charles

  • Resolution set to fixed
  • Status changed from new to closed

fixSlowStart.diff applied to trunk by r11283.

livings: IMO we should consider this for inclusion in 2.10

comment:26 Changed 11 years ago by livings124

  • Milestone changed from 2.20 to 2.10

comment:27 Changed 11 years ago by Longinus00

I don't think that using 'time since cancel' as a metric is any more complicated than what is currently implemented. Here is an example implementation I whipped up. Using the example I gave earlier, this will finish in under a minute.

Changed 11 years ago by Longinus00

comment:28 Changed 11 years ago by livings124

Reopening for further discussion.

Changed 11 years ago by Longinus00

combines the fast startup of the first patch with the more advanced stepping of the 2nd

comment:29 Changed 11 years ago by livings124

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for real this time.

comment:30 Changed 11 years ago by livings124

  • Resolution set to fixed
  • Status changed from reopened to closed

Re-closing. The new bits have been moved to #3600.

comment:31 Changed 10 years ago by charles

  • Keywords backport-2.0x removed