Opened 8 years ago

Closed 8 years ago

#5395 closed Bug (fixed)

Announce/scrape robustness

Reported by: pschrst Owned by:
Priority: Normal Milestone: 2.81
Component: Transmission Version: 2.77
Severity: Normal Keywords:
Cc:

Description

A private tracker is having some network trouble: if no TCP connection originating from a client IP address has been active for ~60 seconds, then establishing a new connection results in connection refused. If you negotiate another connection a second later from the same IP address, it just works. Transmission's announce feature becomes completely broken here ("tracker gave http response code 0"), resulting in missing/incorrect seed stats on the tracker side. I know this is the fault of the tracker/network; however, rtorrent and deluge tolerate this fault more flexibly, and those clients just work. Transmission should be improved to be more robust against this issue. With this patch, Transmission works flawlessly with this buggy tracker again:

http://pastebin.com/YucaFs5w

Please apply this or a similar patch to version control. Thank you.

Change History (10)

comment:1 Changed 8 years ago by jordan

Hurm, I don't know if I like this.

If Transmission gets a response code of 0, it'll wait 1 minute to retry, and then wait another 5 minutes before a third try. In the use case you described ("no tcp connection was active for like ~60 seconds"), this would delay Transmission by 1-6 minutes before a good announce went out. I don't see that as "completely broken here."

This patch suggests re-announcing 10 times in the first minute alone. Is this really necessary?
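
For reference, a minimal sketch of a backoff schedule consistent with the timings described above, keyed off a consecutiveFailures counter; the actual getRetryInterval() in libtransmission/announcer.c may differ in detail:

     /* Illustrative sketch only; the real getRetryInterval() in
      * libtransmission/announcer.c may differ in detail. */
     static int
     getRetryInterval( const tr_tracker * t )
     {
         int minutes;
         const unsigned int jitter_seconds = tr_cryptoWeakRandInt( 60 );
         switch( t->consecutiveFailures )
         {
             case 0:  minutes =   1; break; /* first retry after ~1 minute */
             case 1:  minutes =   5; break; /* second retry after ~5 more minutes */
             case 2:  minutes =  15; break;
             case 3:  minutes =  30; break;
             case 4:  minutes =  60; break;
             default: minutes = 120; break;
         }
         return ( minutes * 60 ) + jitter_seconds;
     }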

comment:2 Changed 8 years ago by pschrst

Transmission's current behaviour makes this tracker impossible to use (while rtorrent and deluge work fine!). People usually work around this issue by running a curl script in the background that connects to the tracker every 20 seconds, so the route remains active and Transmission's announcing just works (a sketch of such a script follows below).

A more aggressive retry really is necessary. If you don't like it, make it optional, but please apply a patch like this.
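
A minimal sketch of that keep-alive workaround using libcurl; the tracker URL is a placeholder and error handling is omitted:

     #include <curl/curl.h>
     #include <unistd.h>

     /* Keep-alive sketch: poke the tracker every 20 seconds so the
      * route stays active. The URL below is a placeholder. */
     int main( void )
     {
         curl_global_init( CURL_GLOBAL_DEFAULT );
         for( ;; )
         {
             CURL * curl = curl_easy_init( );
             if( curl != NULL )
             {
                 curl_easy_setopt( curl, CURLOPT_URL, "http://tracker.example.com/announce" );
                 curl_easy_setopt( curl, CURLOPT_NOBODY, 1L ); /* a HEAD request is enough */
                 curl_easy_perform( curl ); /* the result doesn't matter, only the connection */
                 curl_easy_cleanup( curl );
             }
             sleep( 20 );
         }
     }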

comment:3 Changed 8 years ago by jordan

Could you expand on this using informative arguments rather than subjective ones? Throwing around "completely broken" and "impossible to use" is not very useful.

In the use case of a one-minute downtime, Transmission's first retry interval is actually one minute, so the delay here would be about a minute. Let's say the tracker is down for a few more minutes; Transmission waits another five before the third try. So this does result in a delay of a few minutes, but that's not the same thing as "completely broken" or "impossible to use."

Also consider the case of a user with hundreds of torrents on that tracker. Why is it better to attempt thousands of reannounces in the first minute alone? Have you considered what effect that will have on other trackers' torrents in the announce queue?

comment:4 Changed 8 years ago by pschrst

Jordan, if you try to negotiate a TCP connection to this tracker every 60 seconds (or even less often), it will be connection refused every time. This is why I said "completely broken": the announce will _never_ be successful here.

If you do connect right after an unsuccessful attempt (let's say within 20 seconds), the connection will be established and the announce will succeed. This means the tracker won't be flooded by reannouncements; the first quick retry will work.

Also, not all announcements should be requeued after an unsuccessful one, only the one that just failed, and different torrents' announcements are scheduled randomly anyway.

But feel free to tweak my patch. Adding an optional feature that lets users retry the reannounce quickly, only once and only after connection refused (error code 0), would solve this issue.
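
A minimal sketch of that simpler variant as a helper that shortcuts the interval only on the first failure and only for the connection-refused error; the helper name is invented and this is not actual Transmission code:

     /* Sketch of the "retry quickly, only once" idea; assumes announcer.c's
      * tr_tracker type and that <string.h> is already included. */
     static int
     getQuickRetryInterval( const tr_tracker * t, const char * err, int normal_interval )
     {
         if( ( t != NULL ) && ( t->consecutiveFailures == 1 )
             && ( err != NULL )
             && ( err == strstr( err, "Tracker gave HTTP response code 0" ) ) )
             return 1; /* an immediate second connection succeeds on this tracker */

         return normal_interval;
     }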

comment:5 follow-up: Changed 8 years ago by cfpp2p

Patch to enable up to 10 quick reannouncements (scaled within 55 seconds); after 10 failures, reset and wait ~5 minutes to retry:

.........

     /* schedule a reannounce */
     interval = getRetryInterval( tier->currentTracker );
+
+      /* HACK BEGIN */
+      if( ( tier->currentTracker != NULL ) && ( err != NULL )
+          && ( err == strstr( err, "Tracker gave HTTP response code 0" ) ) )
+      {
+        dbgmsg( tier, "Forcing announce retry because of the connection-refused issue" );
+        interval = tier->currentTracker->consecutiveFailures;
+        if( tier->currentTracker->consecutiveFailures > 10 )
+        {
+          /* Failed 10 quick reannouncements (scaled within 55 seconds), so reset and wait ~5 minutes to retry */
+          tier->currentTracker->consecutiveFailures = 0;
+          const unsigned int jitter_seconds = tr_cryptoWeakRandInt( 60 );
+          interval = 300 + jitter_seconds;
+        }
+      }
+      /* HACK END */
+
     dbgmsg( tier, "Retrying announce in %d seconds.", interval );
     tr_torinf( tier->tor, "Retrying announce in %d seconds.", interval );
     tier_announce_event_push( tier, e, tr_time( ) + interval );

.........
 
     /* schedule a rescrape */
     interval = getRetryInterval( tier->currentTracker );
+
+      /* HACK BEGIN */
+      if( ( tier->currentTracker != NULL ) && ( errmsg != NULL )
+          && ( errmsg == strstr( errmsg, "Tracker gave HTTP response code 0" ) ) )
+      {
+        dbgmsg( tier, "Forcing scrape retry because of the connection-refused issue" );
+        interval = tier->currentTracker->consecutiveFailures;
+        if( tier->currentTracker->consecutiveFailures > 10 )
+        {
+          /* Failed 10 quick rescrapes (scaled within 55 seconds), so reset and wait ~5 minutes to retry */
+          tier->currentTracker->consecutiveFailures = 0;
+          const unsigned int jitter_seconds = tr_cryptoWeakRandInt( 60 );
+          interval = 300 + jitter_seconds;
+        }
+      }
+      /* HACK END */
+
     dbgmsg( tier, "Retrying scrape in %zu seconds.", (size_t)interval );
     tr_torinf( tier->tor, "Retrying scrape in %zu seconds.", (size_t)interval );
     tier->lastScrapeSucceeded = false;


.........
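
Since the announce and scrape branches duplicate the same logic, the two HACK blocks could be folded into one helper along these lines; the helper name is invented for illustration:

     /* Possible refactor of the two duplicated HACK blocks above;
      * falls back to the normal interval for any other error. */
     static int
     getConnectionRefusedRetryInterval( tr_tracker * t, const char * err, int fallback )
     {
         if( ( t == NULL ) || ( err == NULL )
             || ( err != strstr( err, "Tracker gave HTTP response code 0" ) ) )
             return fallback;

         if( t->consecutiveFailures > 10 )
         {
             /* 10 quick retries have failed; reset and back off ~5 minutes */
             t->consecutiveFailures = 0;
             return 300 + (int) tr_cryptoWeakRandInt( 60 );
         }

         return t->consecutiveFailures; /* 1, 2, ... 10 seconds */
     }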


comment:6 in reply to: ↑ 5 Changed 8 years ago by rb07

Replying to cfpp2p:

Patch to enable up to 10 quick reannouncements (scaled within 55 seconds); after 10 failures, reset and wait ~5 minutes to retry:

.........
+          /* Failed 10 quick reannouncements (scaled within 55 seconds), so reset and wait ~5 minutes to retry */
+          tier->currentTracker->consecutiveFailures = 0;
+          const unsigned int jitter_seconds = tr_cryptoWeakRandInt( 60 );

Shouldn't that "60" be "56"?

comment:7 Changed 8 years ago by cfpp2p

Replying to rb07

Shouldn't that "60" be "56"?

yes

++tier->currentTracker->consecutiveFailures increments from 0 to 10; the total of the timed interval seconds used for failures 0 through 10 is 55 seconds.
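
Worked out: 0 + 1 + 2 + … + 10 = 55, so tr_cryptoWeakRandInt( 56 ) would yield jitter in the matching range 0..55.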

Last edited 8 years ago by cfpp2p

comment:8 follow-up: Changed 8 years ago by jordan

Okay, after rereading the ticket I think I understand the problem statement a little better, but am now just confused in new & different ways. :)

Before, I parsed "no tcp connection was active for ~60 seconds" as saying that the tracker had some brief downtime. But after rereading the top post and comment:4, you're saying that /if/ Transmission hasn't connected (announced) in the last 60 seconds, then the first new announce will always fail but a second one sent immediately afterwards will succeed. Right?

So, two questions.

  1. This behavior seems fundamentally wrong. What's the root cause behind this problem? Does the tracker exhibit the same behavior with other clients wrt failing the first new TCP connection but succeeding on an immediate follow-up? (Note, I'm not asking about the difference in clients' reannounce intervals; I'm asking if the tracker *always* fails the first new TCP connection regardless of client.)
  2. In libtransmission/announcer.c's getRetryInterval(), if the first retry interval were something like 10 seconds, would that solve this problem such that we wouldn't need any other special-case code to handle it? (A sketch of that idea follows below.)
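
For illustration, the change floated in question 2 might look something like this; it's a sketch only, not necessarily the eventual fix:

     /* Sketch of lowering the first retry interval to ~10 seconds;
      * illustrative only, not the actual change that was committed. */
     static int
     getRetryInterval( const tr_tracker * t )
     {
         const int jitter_seconds = tr_cryptoWeakRandInt( 60 );
         switch( t->consecutiveFailures )
         {
             case 0:  return 10 + tr_cryptoWeakRandInt( 10 ); /* ~10-20 seconds */
             case 1:  return ( 1 * 60 ) + jitter_seconds;     /* ~1 minute */
             case 2:  return ( 5 * 60 ) + jitter_seconds;     /* ~5 minutes */
             default: return ( 30 * 60 ) + jitter_seconds;    /* ~30 minutes */
         }
     }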

comment:9 in reply to: ↑ 8 Changed 8 years ago by pschrst

Replying to jordan:

If Transmission hasn't connected (announced) in the last 60 seconds, then the first new announce will always fail but a second one sent immediately afterwards will succeed. Right?

Right. The first new TCP connection always fails, and the second connection immediately afterwards succeeds. The issue is independent of the client; it's at the TCP layer.

What's the root cause behind this problem?

According to the ops, it's a "network problem" and they don't know how to fix it.

if the first retry interval was something like 10 seconds, would that solve this problem?

I believe it would.

comment:10 Changed 8 years ago by jordan

  • Milestone changed from None Set to 2.81
  • Resolution set to fixed
  • Status changed from new to closed

I've lowered the initial retry interval in r14124 as described in comment:8.
