Opened 8 years ago

Last modified 8 years ago

#5300 new Bug

Scrape error: Tracker gave HTTP response code 502 (Bad Gateway)

Reported by: helloadam
Owned by:
Priority: Normal
Milestone: None Set
Component: Transmission
Version: 2.76
Severity: Normal
Keywords: scrap error, bad gateway, announcer.c
Cc:

Description

Me again. We are trying to track down some weirdness that is happening with Transmission. Currently the transmission-daemon is running roughly 4,000 torrents. It announces everything okay; however, when it comes to scraping (asking the tracker for seeder and leecher counts), it batches everything up, causing too many requests to hit the tracker at once.

Currently tested with Transmission 2.50 && 2.73 (from source) with curl 7.22 and libevent 2.0.21 on CentOS 6 x86_64. Our internal tracker is behind a reverse proxy to support SSL and help with load. It is set up correctly and only returns errors on scrape, with the following error message:

[01:22:20.085] Sun-Feb-17.001.RAW.CID-738193281039340REGH Scrape error: Tracker gave HTTP response code 502 (Bad Gateway) (announcer.c:1255)
[01:22:20.085] Sun-Feb-17.001.RAW.CID-738193281039340REGH Retrying scrape in 346 seconds. (announcer.c:1264)

(this goes on and on in the log for every torrent)

Now running an older version of transmission, 2.13 with libevent 1.4.14b with ~7,000 torrents does not fail nor give out any 502 bad gateways.

Questions:

  • Is libevent the issue here?
  • Do older versions of Transmission queue/batch up scrape requests better than newer ones?
  • Do I need to stop all torrents, batch up, let's say, 100 at a time, run transmission-remote --reannounce, and sleep for a few minutes before doing the next batch of 100?

Any insight / help much appreciated. Thanks!

Change History (7)

comment:1 follow-up: Changed 8 years ago by jordan

I've got no idea, I've never seen this. Or rather, I've seen 502 errors, but only as the result of a local problem. It doesn't seem like a local error would cause a 502.

Do you have a link to a .torrent that exhibits this behavior?

Last edited 8 years ago by jordan

comment:2 Changed 8 years ago by jordan

  • Summary changed from Scrap Error - Bad Gateway - 4,000 Torrents to Scrape error: Tracker gave HTTP response code 502 (Bad Gateway)

comment:3 in reply to: ↑ 1 Changed 8 years ago by helloadam

Replying to jordan:

I've got no idea, I've never seen this. Or rather, I've seen 502 errors, but only as the result of a local problem. It doesn't seem like a local error would cause a 502.

Do you have a link to a .torrent that exhibits this behavior?

Unfortunately not. This is on our internal network. I am running this command:

echo > bad.gateway.txt ; for i in $(transmission-remote --list | awk {'print $1'}); do transmission-remote -t $i -it | grep "Bad Gateway" | tee -a bad.gateway.txt; done; cat bad.gateway.txt | wc -l

which spits out how many torrents are getting scrape errors. At first it was spitting out a number like 3,500. Now it's down to 1,712 (hours later). Sample output here:

  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 5 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 37 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 2 minutes, 4 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 5 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 5 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 15 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 46 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 26 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 38 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 6 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 2 minutes, 5 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 16 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 46 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 46 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 38 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 46 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 38 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 6 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 6 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 38 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 6 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 2 minutes, 5 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 16 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 2 minutes, 5 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 46 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 16 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 2 minutes, 5 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 26 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 26 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 38 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 38 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 16 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 6 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 7 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 17 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 39 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 47 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 7 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 39 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 47 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 7 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 17 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 5 minutes ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 5 minutes ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 27 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 47 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 27 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 18 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 40 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 48 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 18 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 48 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 48 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 48 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 1 minute, 40 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 48 seconds ago
  Got a scrape error "Tracker gave HTTP response code 502 (Bad Gateway)" 48 seconds ago

As you can see, the scrape retry times are for the most part clustered together. If Transmission gets restarted, everything gets announced at once, and the reannounce times (45 seconds, 1 minute, 3 minutes, ... 10 minutes, etc.) all line up. When running a large number of torrents (4,000), it hammers away at the tracker like a DoS attack. In our case, the tracker returns 502 Bad Gateway.

Three things I am going to test:

  1. Run transmission-daemon with more verbose output. Does export TR_CURL_VERBOSE=1 still work, or should I just run the daemon with --log-debug?
  2. Stop all torrents via transmission-remote, then start and reannounce them (--start --reannounce) in batches of 100, with a sleep time of 1 minute between batches. This should hopefully fix the issue.
  3. Try an older version of libevent (1.x series).
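
For a rough sense of what the batching in step 2 costs in time, the batch count and total sleep time can be estimated up front. The following is only a minimal C sketch; struct batch_plan and plan_batches are hypothetical names for illustration and are not part of Transmission or transmission-remote.

```c
#include <assert.h>

/* Result of planning a batched reannounce (hypothetical helper type). */
struct batch_plan
{
    int batches;             /* number of batches needed */
    int total_sleep_seconds; /* time spent sleeping between batches */
};

/* Estimate how many batches are needed and how long the sleeps add up to.
 * A sleep happens between batches only, so there is one fewer sleep than
 * there are batches. */
static struct batch_plan
plan_batches (int torrent_count, int batch_size, int sleep_seconds)
{
    struct batch_plan plan;

    assert (torrent_count >= 0);
    assert (batch_size > 0);
    assert (sleep_seconds >= 0);

    /* ceiling division: a partial last batch still counts as a batch */
    plan.batches = (torrent_count + batch_size - 1) / batch_size;
    plan.total_sleep_seconds =
        plan.batches > 0 ? (plan.batches - 1) * sleep_seconds : 0;

    return plan;
}
```

With 4,000 torrents, batches of 100, and a 60-second sleep, this gives 40 batches and 2,340 seconds (39 minutes) of sleeping in total.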

comment:4 Changed 8 years ago by jordan

Okay, so the issue is the grouping? Or the 502 response?

The grouping is somewhat intentional because it increases the number of times that we can replace N scrapes with a single multiscrape:

static time_t
get_next_scrape_time (const tr_session * session, const tr_tier * tier, int interval)
{
    time_t ret;
    const time_t now = tr_time ();

    /* Maybe don't scrape paused torrents */
    if (!tier->isRunning && !session->scrapePausedTorrents)
        ret = 0;

    /* Add the interval, and then increment to the nearest 10th second.
     * The latter step is to increase the odds of several torrents coming
     * due at the same time to improve multiscrape. */
    else {
        ret = now + interval;
        while (ret % 10)
            ++ret;
    }

    return ret;
}

And when you've got 4000 torrents, batching scrapes together into a multiscrape strikes me as a very sound idea.

Again, a 502 response indicates some kind of error taking place between a proxy or gateway, and an external server. Unless you can give me a .torrent that reproduces this behavior, I'm not convinced that this is a Transmission issue.

However, if you think that the grouping is the cause of the problem, you should be able to test the hypothesis by removing the "while (ret % 10) ++ret;" clause from the code snippet I quoted from libtransmission/announcer.c...
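
To see why this rounding groups torrents together, the step can be looked at in isolation. This is a minimal sketch that assumes non-negative timestamps; round_up_to_step and next_scrape_slot are hypothetical helper names, not functions in libtransmission.

```c
#include <assert.h>
#include <time.h>

/* Round t up to the next multiple of step. For step == 10 and t >= 0 this
 * gives the same result as the "while (ret % 10) ++ret;" loop. */
static time_t
round_up_to_step (time_t t, int step)
{
    assert (step > 0);
    assert (t >= 0);

    const time_t rem = t % step;
    return rem ? t + (step - rem) : t;
}

/* Hypothetical helper: the 10-second slot that a scrape scheduled at
 * "now" with the given retry interval would land in. */
static time_t
next_scrape_slot (time_t now, int interval)
{
    return round_up_to_step (now + interval, 10);
}
```

For example, two torrents that fail at times 1001 and 1003 and both retry after 346 seconds land in the same slot, 1350, so their scrapes come due together. Removing the rounding would leave them at 1347 and 1349 instead.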

comment:5 Changed 8 years ago by cfpp2p

Currently tested with Transmission 2.50 && 2.73 (from source) with curl 7.22 and libevent 2.0.21...

Now running an older version of transmission, 2.13 with libevent 1.4.14b with ~7,000 torrents does not fail nor give out any 502 bad gateways.

Multiscrape was introduced in v2.30 (#4113), so this might be related to multiscrape somehow, whether or not it turns out to be a Transmission problem.

Another experiment that might help isolate where this problem is coming from: change TR_MULTISCRAPE_MAX in announcer-common.h to different values and see what the results are.

announcer-common.h

enum
{
  /* pick a number small enough for common tracker software:
   *  - ocelot has no upper bound
   *  - opentracker has an upper bound of 64
   *  - udp protocol has an upper bound of 74
   *  - xbtt has no upper bound */
  TR_MULTISCRAPE_MAX = 64
};
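
As a rough guide when experimenting with this value, the number of scrape requests per round can be estimated with a ceiling division. This is a sketch under a simplifying assumption: every torrent uses the same tracker and comes due in the same time slot. multiscrape_request_count is a hypothetical name, not a function in libtransmission.

```c
#include <assert.h>

/* Number of scrape requests needed to cover torrent_count torrents when a
 * single request may carry at most max_per_request info-hashes. */
static int
multiscrape_request_count (int torrent_count, int max_per_request)
{
    assert (torrent_count >= 0);
    assert (max_per_request > 0);

    /* ceiling division: a partly filled last request still counts */
    return (torrent_count + max_per_request - 1) / max_per_request;
}
```

Under that assumption, 4,000 torrents with a limit of 64 need 63 requests per round, while a limit of 1 (no multiscrape) needs 4,000. Lowering the limit makes each request smaller but increases how many are sent.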

comment:6 follow-up: Changed 8 years ago by jordan

helloadam, any news?

comment:7 in reply to: ↑ 6 Changed 8 years ago by helloadam

Replying to jordan:

helloadam, any news?

Sorry, it's been a busy week. I spent some time on this, and it's a combination of both Transmission and the tracker software that is being used. Here is some info:

Tracker software (ocelot, opentracker, xbtt, etc.) is limited in choice, and development work is sparse. A popular approach is to put a reverse proxy (nginx, for example) in front of the tracker to load balance, offer SSL, cache for some specific edge cases, etc. In our setup, the reverse proxy did not like huge scrape requests. When running 4,000+ torrents, after a client (Transmission) restart they all start up in sync, and when they fail to receive a 200 response from the tracker, they stay in sync at the next reannounce time.

A solution for this was to use transmission-remote to stop all the torrents, then take batches of, let's say, ~100 torrents, run --reannounce on them, wait a few seconds, and continue with the next batch. I also modified libtransmission/announcer.c and removed the while loop per your suggestion. I'm not sure how much impact that had.

So far, all is good. Just need to sort out the pesky open file limit which is stuck at 1024 ;-)

If anything, this ticket shows an edge case which is not noticeable unless you start running a lot of torrents.
