Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#1254 closed Bug (fixed)

reconnecting too frequently to the same peer

Reported by: charles Owned by: charles
Priority: Normal Milestone: 1.34
Component: libtransmission Version: 1.33
Severity: Major Keywords:
Cc:

Description

User Switeck has reported that Transmission connects too frequently to the same peer.

7:19 -!- Switeck [n=chatzill@c-68-59-65-173.hsd1.al.comcast.net] has joined #transmission
17:23 < Switeck> Does Transmission v1.33 have Encrypted connection bugs? It doesn't stay connected with me. :(
17:24 < Gimp_> Switeck: not that we know of. none have been reported.
17:24 < Switeck> I keep getting [Transmission/1.3.3.0 (0.0)]: Disconnect: Connection closed
17:24 < Switeck> from basically every Transmission client i'm connected to. :(
17:25 < Switeck> I've seen a single Transmission client do that 8 times in less than 20 seconds.
17:26 < Switeck> on just 1 torrent
17:27 < Switeck> I was told Transmission shouldn't even be attempting the same ip+port that fast on a single torrent, but seeing as I've seen it co
nstantly...I somehow think it's a bug.
17:28 < Switeck> TCP View is also confirming these short-lived connections are both real and incoming rather than just being misreported by my clie
nt.
17:29 -!- szsuj [n=ubuntu@218.17.78.91] has quit [Read error: 110 (Connection timed out)]
17:37 <@muks> Switeck: are you using utorrent?
17:37 < Switeck> yes
17:37 <@muks> it may be a utorrent bug rather than a transmission bug
17:37 <@muks> you have reported this in the past iirc
17:38 < Switeck> why would it be a uTorrent bug that Transmission makes 8 connections to me in <30 seconds?
17:38 <@muks> and you did a encrypted test transfer with me
17:38 <@muks> which worked fine
17:39 < Switeck> Just before I left the other day, a programmer for Transmission reported there *might* be an issue in encryption/connection
17:39 <@muks> huh
17:39 <@muks> who was this? ProNoob?
17:40 <@muks> Switeck: why would it be a uTorrent bug that Transmission makes 8 connections to me in <30 seconds?
17:40 < Switeck> I think it was ProNoob
17:40 -!- John_Clay [n=J@unaffiliated/johnclay] has joined #transmission
17:41 -!- mode/#transmission [+o John_Clay] by ChanServ
17:41 <@muks> that's not buggy behaviour. if transmission suspects that a connection got reset by the ISP, and that data transfer had taken place b
efore, it'll try again with this peer
17:41 <@muks> ProNoob is not a transmission developer
17:41 <@muks> he was just someone who was whining ehr
17:41 <@muks> here
17:41 < Switeck> muks, I can't see how Transmission attempting lots of connections to the same ip+port on the same torrent...as a remote client bug
17:42 <@muks> no, about that behaviour, i said that it's not a bug in transmission
17:42 <@muks> about the disconnects, it could be something in uTorrent or the ISP on either end. utorrent did have a similar bug fixed in its chang
elog
17:43 <@muks> Switeck: are you seeing this from multiple IP addresses?
17:43 < Switeck> yes
17:43 <@muks> please pastebin the log with multiple ip addresses.. i'll take a look
17:43 < Switeck> only Transmission ones or EXTREMELY old BitComet clients
17:44 <@muks> Switeck: iirc we did an encrypted transfer where i used transmission
17:44 <@muks> it worked fine, right?
17:45 < Switeck> yes, it ddn't go into connection flailing mode
17:45 <@muks> so it could be some ISP issues
17:45 <@muks> also you were constantly sending me TCP resets
17:45 <@muks> during that transfer
17:45 < Switeck> it seems to be a few ISPs
17:45 < Switeck> yes, that much is normal...but TCP resets are different than this
17:45 -!- brotig [n=chris@ip-62-143-88-149.hsi.ish.de] has quit [Remote closed the connection]
17:46 < Switeck> what's pastebin's link again?
17:46 <@muks> transmission.pastebin.com
17:46 <@muks> Switeck: we are willing to fix issues, but need more info if we are to figure out what the issue is
17:47 <@muks> if there was really a normally replicable problem with transmission and encryption, we'd have lots of users complaining here
17:47 <@muks> maybe it has to do with the network in this case
17:47 < Lss> or maybe we just dont check
17:47 < Lss> how do you see such stats anyway?
17:47 < Lss> im running daemon
17:47 <@muks> need more info
17:47 < Switeck> http://transmission.pastebin.com/m251c7f9b
17:48 <@muks> Lss: according to his last logs, no data transfer took place
17:48 <@muks> Lss: i have had plenty of data transfer so far in encrypted mode with other clients
17:49 < Lss> that means chances are its isolated?
17:49 < Switeck> I see Transmission clients stay connected to me for in excess of 15 minutes
17:49 < Switeck> then the SAME ips go into connection flailing mode for 5+ mins straight
17:50 < Switeck> ...then connect fine for 10+ mins
17:50 -!- ThomasShirley [n=ThomasSh@host81-158-249-72.range81-158.btcentralplus.com] has quit [Connection timed out]
17:50 <@muks> Lss: that's my point too, and i suspect the network more than the software
17:50 <@muks> Switeck: is utorrent open source?
17:50 < Switeck> no
17:51 < Lss> utorrent is coded in machine?
17:51 <@muks> Switeck: then i think it's better you get the utorrent devs to analyze this problem, as they have both ends of the software
17:51 <@muks> Switeck: and for what pronoob said, i have reviewed the encryption code in transmission and it's fine
17:51 <@muks> Switeck: pronoob was whining about the code quality of transmission, and that's just his opinion
17:52 < Lss> humm x-files is fun
17:52 < Lss> head transplants
17:52 <@muks> Switeck: i don't know what else can be done for this bug from our end. the log file has very limited info.
17:53 < Switeck> uTorrent doesn't give any reason for the connection closing probably because Transmission didn't give it one.
17:53 <@muks> Switeck: could be the connection was reset
17:53 < Switeck> no, absolutely not
17:53 <@muks> why not?
17:54 < Switeck> because then Transmission wouldn't be telling uTorrent to close the connection
17:54 < Switeck> the conneciton would already be broken
17:54 < Acrophosist> Switeck: You're on Comcast; are you certain this is not a factor? I presume you are aware they've used Sandvine to detect BT t
raffic and forge RST packets to both sides of the xfer?
17:55 < Switeck> yes, I am very familiar with the distinctive close methods ComCast uses
17:55 < Acrophosist> Okay, just checking
17:55 < Switeck> 10054 errors (conneciton reset by peer)
17:55 < Lss> wouldnt that explain alot?
17:55 < Switeck> I wish
17:56 < Switeck> it's the exact opposite of what I'd expect
17:56 < Switeck> Instead of seeing a broken connection by reset by peer (forged RST packets), I'm getting encrypted disconnect messages
17:56 < Switeck> seemingly from Transmission clients
17:57 < Lss> might there be a remote chance that comcast upgraded?
17:57 < Acrophosist> They *claimed* they were going to stop doing that
17:57 < Lss> im not getting good speeds on Transmission too because of 2 factors
17:57 < Lss> lack of DHT and my upload being 0.25mbit
17:58 < Lss> yet transmission still decides to open up 15 upload slots
18:00 < Switeck> does Transmission have an upload slots per torrent limit?
18:00 < Lss> it doesnt
18:00 < Lss> or at least you cant limit it
18:01 < Switeck> oh that's dumb...BitComet-level dumb :P
18:01 <@muks> Switeck: you can limit number of peers per torrent
18:01 < Switeck> yeah, but who wants to set theirs to <10 to prevent too many upload slots?
18:02 <@muks> Switeck: i really don't know what might be happening here
18:02 < Switeck> what about half open connection limit in Transmission...to prevent 8 connection attempts to the same ip in 16 seconds?
18:03 -!- brotig [n=chris@ip-62-143-88-149.hsi.ish.de] has joined #transmission
18:03 <@muks> it seems to be the connection is established, so it isn't just a half-open
18:03 < Lss> good pt im recommending a global max upload slots seeing how there is already a global max connections
18:03 <@muks> even the encryption handshake finishes
18:03 < Lss> it will greatly help ppl like me
18:04 < Switeck> muks, don't Trans. have to cycle through all other ips before retrying an ip again?
18:04 <@muks> no
18:05 <@muks> that doesn't make sense
18:05 < Switeck> actually, it does
18:05 <@muks> if my connection to a peer which is giving me data got reset, i want to try that peer again
18:05 < Switeck> sure, but with a cooldown at least!
18:05 < Lss> humm the new xfiles movie is fairly fail
18:06 < Lss> where are the real x-files!!!
18:06 < Lss> oh wait wrong channel
18:07 <@muks> Switeck: maybe if too many connections got reset, we can mark that peer as a faulty peer
18:07 <@muks> but the handshake got completed here according to your log, so something else may be happening
18:07 <@muks> Switeck: please take this to utorrent devs. they have access to the transmission code too.
18:07 < Switeck> I am
18:08 < Switeck> I'm referring to them as hostile Transmission clients ;P
18:08 < Switeck> since it obviously doesn't happen to every transmission client every time
18:14 <@muks> http://forum.utorrent.com/viewtopic.php?id=26991
18:14 <@muks> see some are getting that with azureus too
18:15 < Lss> i guess thats pretty much conclusive
18:16 < Lss> just wondering but how high/low is the priority for the web client?
18:16 < Switeck> muks, that link isn't the same issue
18:16 < Switeck> totally inconclusive
18:17 <@muks> [01:30:41]  75.68.175.64    : [Azureus/3016   ]: Encrypted handshake completed
18:17 <@muks> [01:30:41]  75.68.175.64    : [Azureus/3016   ]: Disconnect: Connection closed
18:18 < Switeck> He's on Road Runner ISP
18:18 <@muks> so
18:18 < Switeck> So that's probably not an ISP issue at least
18:18 < Switeck> and it was long enough ago that it isn't the same version of uTorrent unless the bug is very old
18:19 < Switeck> Disconnect: Is seed
18:19 < Switeck> is another possible cause
18:19 <@muks> that is seed is some other peer
18:19 < Switeck> in seed mode, other seeds MAY drop you silently without saying "is see"
18:19 -!- brotig [n=chris@ip-62-143-88-149.hsi.ish.de] has quit [Read error: 113 (No route to host)]
18:20 <@muks> look, i doubt we can do anymore about this bug. you're the one experiencing it with certain transmission clients behind certain ISPs,
 with utorrent. unless we have more info on the problem, we can't help.
18:21 < Switeck> what info do i need to dig for?
18:21 < Switeck> wireshark logs?
18:21 <@muks> we certainly don't have the source code for utorrent to take a look there, so please ask the utorrent devs to take a look at both end
s.
18:22 <@muks> Switeck: wireshark logs can show any possible out-of-the-ordinary packets like disconnects, but they won't help with encrypted peer t
raffic
18:22 < Switeck> good point :(
18:27 <@muks> Switeck: this could be pretty much harmless
18:27 -!- pea_ [n=pea@nunnu.0x00.ee] has quit ["No CPUs found. System halted."]
18:27 <@muks> as the peers could be doing PEX and disconnecting
18:28 < Switeck> PEX?
18:28 < Switeck> that reuses the same connection
18:29 <@muks> no i mean, if transmission detects that the other peer now has everything that we've got, and we haven't done a peer exchange recentl
y, then we disconnect
18:29 < Switeck> then retry the same ip a second later?
18:29 <@muks> that should not happen
18:30 -!- rob-to86 [n=rob-to86@p5B10E391.dip.t-dialin.net] has quit [Client Quit]
18:31 <@muks> but it might due to some reasons.. based on the peer limits, what the tracker gives us, etc.
18:32 < Switeck> then retry the same ip again 7 more times in <15 seconds?
18:32 <@muks> it does stop, right? or do you get connections non-stop?
18:33 < Switeck> it's intermittent
18:33 < Switeck> hits me 3-10 times, then nothing for >1 minute
18:33 < Switeck> then repeats
18:33 <@muks> see.. the "connect to the same ip" logic means that we keep the peer details around (that we tried last connecting to this peer recen
tly)
18:34 <@muks> but depending on the limits set on how many peers we keep info on, we may not (i haven't checked this code)
18:34 <@muks> we may kick this peer out of our list
18:35 <@muks> then, when the tracker or PEX gives the same peer again, it would be fresh
18:35 <@muks> i'll check that code, but in the above case, nothing can be done for it
18:36 <@muks> you should try to find out what torrent it's trying to get..
18:36 < Switeck> it's considered hostile behavior, I'm requesting uTorrent auto-ban such behavior.
18:36 <@muks> what traffic is exchanged, etc.
18:36 <@muks> Switeck: ok
18:36 < Switeck> Smoovious>     so, go back to those fools and tell them that you let uTorrent know about the handshaking, but that they need to fi
x THEIR hammering problem. :D
18:36 <@muks> like i said, utorrent devs should no better
18:36 < Switeck> and the Transmission devs should know better too!
18:36 <@muks> know
18:37 <@muks> no, about what exactly is going on in this case
18:37 <@muks> well calling us fools is not gonna help
18:37 <@BentMyWookie> how do you expect us to determine where the bug is working with code we don't have access to
18:37 <@muks> exactly
18:37 < Switeck> I'm quoting Smoovious
18:37 <@BentMyWookie> our code is open source, so feel free to debug it
18:38 <@charles_> muks: ping
18:38 <@muks> Switeck: if they want to help debug this issue, they can debug under what conditions this problem happens, perhaps with more verbose 
logging. we don't have access to utorrent source code to do this.
18:38 <@muks> charles_: boom
18:42 <@muks> Switeck: also, please do tell him that things like vice-versa bans, talk of foolishness, etc. won't help fix the issue or help in any
 future co-operation. if that person is a developer, he knows that problems happen in such large code bases. fixing implementations' interoperabili
ty can only be done in co-operation, not by mocking.
18:42 -!- mcdebugger [n=mc@93.157.16.67] has quit [Excess Flood]
18:42 < Switeck> muks, that's my suggestion of a ban
18:42 < Switeck> not his
18:42 -!- mcdebugger [n=mc@93.157.16.67] has joined #transmission
18:42 <@muks> that's not a very nice thing to suggest then
18:42 < Switeck> ban-by-hostile behavior
18:43 <@muks> it's better you press that they help in fixing this issue
18:43 <@charles_> Switeck: thanks for the pastebin
18:43 < Switeck> the main problem that concerns me is caused by Transmission
18:43 <@muks> charles_: do you think the behaviour in shouldPeerBeClosed() may be causing this?
18:44 <@charles_> muks: I'm still catching up on scrollback
18:44 -!- Lss [n=Lss@cm22.delta95.maxonline.com.sg] has quit [Read error: 110 (Connection timed out)]
18:44 <@charles_> Switeck: it looks like, in the latest version (1.33), it's not doing much hammering in that pastebin?  or am I reading it wrong?
18:44 < Switeck> charles, I am at a loss as to what causes this...but it is severe enough to even cause connection issues here.
18:45 < Switeck> I didn't catch the moment when a v1.33 was doing it...others were crowding it out
18:45 <@charles_> Switeck: I see 4 separate 1.33 clients, it looks like each of them are waiting 10+ seconds to reconnect
18:46 <@muks> charles_: Switeck and i did an encrypted transfer which passed btw
18:46 < Switeck> I'm seeing instances with v1.33 not waiting 3 seconds
18:47 <@charles_> muks: I'm wondering if the priority code in peer-mgr.c:1835 has something to do with this
18:48 <@muks> charles_: do you reckon that tr detects that the other end has all pieces in shouldPeerBeClosed() and close that connection, but agai
n connect back cause the priority code picks it up?
18:49 <@muks> charles_: these are not due to recent changes only btw..
18:49 <@muks> s/only//
18:50 <@muks> cause iirc in Switeck's old log, there were older clients than 1.33
18:50 <@muks> oh this is 1.22 in the new log.. nevermind
18:52 <@charles_> Switeck: can you make a log w/1.3 clients please?
18:53 < Switeck> what do you mean?
18:53 <@charles_> a log with 1.33 clients demonstrating the `hammering' behavior
18:53 <@muks> charles_: getPeerCandidates() does check the seed flag.. so it's not what i suggested ^
18:54 <@charles_> muks: well, that "< 10" is intended to do sort of what Switeck is saying, but only for a very short term
18:54 < Switeck> almost forgot...I even saw ~100 Time-Wait connections from the same Transmission client in TCP view
18:54 <@charles_> the idea is that if an ISP is causing a productive peer to disconnect, we try to reconnect to it again first.  but this priority 
boost only lasts for a few seconds
18:55 < Switeck> these HAD to occur very quickly, as Windows should close those automatically in 6 minutes.
18:55 <@charles_> Switeck: are you suggesting that 1.33 be banned for something that 1.06 is doing?
18:55 < Switeck> no, neither
18:55 < Switeck> ban by bahavior, not client tag
18:56 < Switeck> if v1.33 doesn't do it, it'd never get banned
18:56 <@charles_> muks: the uTorrent devs are pretty smart people, and alus lurks in-channel.  it doesn't do any good to get into a back-and-forth 
with uTorrent
18:57 <@muks> charles_: agree
18:57 <@charles_> well, what we could do, is put in some explicit code to say never connect to the same peer within N seconds of the previous attem
pt
18:58 <@charles_> we try to randomize the peers, iirc, but in a low-peer environment it could be that we visit the same well too many times
18:58 < Switeck> only reason it's as "slow"as it is...is because that's the fastest it can encrypted handshake and drop
18:58 <@charles_> heh :)
18:59 <@muks> charles_: we should try once after a disconnect, if a disconnect hadn't happened recently
19:00 <@charles_> right now that interval is set at http://trac.transmissionbt.com/browser/trunk/libtransmission/peer-mgr.c#L1777
19:00 <@charles_> where atom->numFails is the consecutive number of connection failures to a peer
19:01 -!- jodo [n=bryan@c-66-229-252-15.hsd1.fl.comcast.net] has joined #transmission
19:01 <@charles_> actually, those values look pretty reasonable to me, unless I'm missing something
19:01 <@charles_> for 0 & 1 failures there's a low threshold (0 and 5 seconds), but then it ramps up very quickly to two minutes, 15 minutes, 30 mi
nutes
19:02 < Switeck> I get lots of these from Transmission clients:  Disconnect: Invalid packet length
19:02 < Switeck> and transmission clients are trying to request >50 pieces at once from me
19:02 <@charles_> Switeck: logs would be helpful in those cases
19:03 < Switeck> http://transmission.pastebin.com/m1b9a8288
19:03 < Switeck> http://transmission.pastebin.com/m251c7f9b
19:03 < Switeck> there you go
19:03 <@charles_> part of the problem here is that Ubuntu H still has 1.06.  Once Ubuntu I comes out with 1.33, the behavior should improve 
19:03 <@charles_> Switeck: thanks :)
19:03 < Switeck> it was in the older logs, just not 'perfect' examples
19:06 < Switeck> I'm trying to "distill" the old logs down to specific instances, I can post that at pastebin when done
19:06 <@charles_> looks like the behavior in http://trac.transmissionbt.com/browser/trunk/libtransmission/peer-mgr.c#L1777 first appears in 1.33.  
From my reading, it's possible that 1.32 and older could have the behavior Switeck is describing, especially in low-peer situations
19:06 <@charles_> Switeck: I'd be very interested indeed if you see this behavior in *any* 1.33 peers
19:07 < Switeck> 72.42.186.237 and 70.249.219.118
19:07 < Switeck> do it LOTS...and they're v1.33
19:08 <@muks> charles_: when are peer atoms freed?
19:08 <@charles_> muks: never
19:08 <@muks> ah :)
19:09 <@muks> i thought when the peer struct was freed, the numFails and piece data time would also get destroyed with it
19:09 <@muks> :)
19:09 <@muks> i can't find any faults now!
19:09 <@charles_> muks: nope, never freed.  that's why the atom structures are intentionally small.
19:10 < jodo> im having a problem i cant seem to get around, im trying to add a web-seed to a torrent but transmission isnt recognizing it
19:10 <@charles_> Switeck: 72.42.186.237:33222 sure does seem to be misbehaving
19:10 < Switeck> 70.249.219.118 does it some as well
19:10 <@charles_> Switeck: is there any way to turn on more verbose logging?  I'd love to know what uTorrent is referring to wrt Invalid packet len
gth
19:11 < Switeck> you and me both!
19:11 <@charles_> alus: ^ :)
19:11 < Switeck> I can log about 25 things, but not exactly what the invalid packet length is...just that one occurred
19:12 < Switeck> 50+ piece requests at once...if there's no upper limit for that, i guess it could make for a very long packet
19:12 <@muks> the fact that this doesn't happen under normal conditions means that the uTorrent devs could help us a bit by adding more verbose log
ging on what happens after the handshake and before the disconnect. according to Switeck, the log is reporting that transmission is disconnecting
19:12 <@charles_> uTorrent's logging is pretty good.  that's where I got the inspiration for our `deep debug' logging messages
19:13 <@charles_> I suspect the ability is already there, if we knew the right config options to set
19:13 < jodo> ill wait
19:13 <@muks> charles_: i mean, specifically for this problem, unless more verbosity can be configured in utorrent right away to help us
19:13 <@muks> ah
19:14 <@muks> Switeck: another thing that you can try: if you can force unencrypted traffic only, do that, and use wireshark to do a packet capture
19:14 <@muks> see if the disconnects happens unencrypted.. then we have the traffic
19:14 < Switeck> for what it's worth, I just made another pastebin post...this instant log of other v1.33 Trans clients doing this at: http://trans
mission.pastebin.com/m586ce2ff
19:14 < Switeck> <- ComCast
19:15 < Switeck> ...I can try unencrypted
19:15 < Switeck> but will see >90% RST connections (including encrypted connections) once I allow unencrypted connections
19:16 <@charles_> muks: (on an unrelated note, sometime soon could you take a look at the prefs dialog changes in the gtk client and see what you t
hink... I'm trying to make things simpler/less cluttered and would appreciate the feedback and a second opinion)
19:16 <@charles_> Switeck: *nod* 
19:16 <@charles_> Switeck: if it's happening with encrypted connections, I don't think an unencrypted log would help any
19:16 < Switeck> I'm on an insanely busy torrent too ><
19:17 <@charles_> unfortunately I need to run.  Switeck, will you be around tomorrow?
19:18 < Switeck> I can be
19:18 < Switeck> when?

Change History (5)

comment:1 Changed 13 years ago by charles

  • Component changed from Transmission to libtransmission
  • Keywords backport-candidate added
  • Milestone changed from None Set to 1.40
  • Owner set to charles
  • Status changed from new to assigned

r6700 should fix this, but needs more testing before closing this ticket.

comment:2 Changed 13 years ago by charles

  • Severity changed from Normal to Major

comment:3 Changed 13 years ago by charles

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:4 Changed 13 years ago by charles

backported to 1.3x branch in r6757

comment:5 Changed 13 years ago by charles

  • Keywords backport-candidate removed
  • Milestone changed from 1.40 to 1.34
Note: See TracTickets for help on using tickets.