Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#3829 closed Bug (invalid)

transmission overwrites its network connections & gets lost w/many torrents

Reported by: Astara Owned by:
Priority: Normal Milestone: None Set
Component: Transmission Version: 2.13
Severity: Critical Keywords: resource usage problems
Cc:

Description

I've had a long-term problem with multiple symptoms that I've finally traced back to transmission. Over the past several months I've been collecting torrents that I register with 1 tracker. The trouble started somewhere around 500 torrents, I'd guess (I have over 600 torrents registered with this 1 tracker, alone, now, out of 1180 listed torrents). The problem is I've slowly been losing upload credits with this tracker, till in the past month or two, it's dropped to zero.

I was prompted to look at the problem in wireshark, and noticed that after a fresh restart there's an initial 3 minutes of communication with the tracker, but nothing after that -- no more 'scrapes'. This happens with the latest sources I downloaded and compiled, as well as with earlier versions.

What I do see, starting after about 20 seconds and spread over the 3 minutes, are several TCP errors where multiple TCP ports are re-used for communication to the tracker -- ports whose previous TCP sessions haven't been closed.

It appears transmission is simply reusing TCP handles without ensuring they are closed -- perhaps transmission is waiting for info back on those handles and hasn't gotten it yet, but it isn't closing the connections -- just writing over them.

Since 600 isn't even close to the number of possible ports or file descriptors, I can't see any reason why transmission would be re-using them -- other processes are able to open thousands of file descriptors with no problem, but transmission gets lost when trying to use more than ~500. Not sure the exact number, but 629 is definitely a fail.
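A quick way to rule out the per-process descriptor limit (a generic POSIX sketch, not anything from transmission's code) is to check RLIMIT_NOFILE, since a common soft default of 1024 isn't that far above 500-600 torrents' worth of sockets:

#include <stdio.h>
#include <sys/resource.h>

int main( void )
{
    struct rlimit rl;

    /* RLIMIT_NOFILE is the per-process fd ceiling; if the soft limit
       is ~1024, several hundred torrents could plausibly approach it */
    if( getrlimit( RLIMIT_NOFILE, &rl ) == 0 )
        printf( "fd limit: soft=%ld hard=%ld\n",
                (long) rl.rlim_cur, (long) rl.rlim_max );
    return 0;
}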

Ideas?

I've noticed transmission makes poor use of resources when it is given a LOT of resources. It's a 64-bit version (and machine) -- not a 32-bit machine -- but it acts more like it's running on a 16-bit machine. Perhaps there need to be several toggles? It can't even keep 1 processor busy on an 8-core machine when doing a verify of multiple torrents, and it lets file-system cache memory go to waste (wasting system memory that's been allocated for that purpose).

Perhaps it's trying to save file or network connections somewhere when it doesn't need to and is overwriting itself -- basically pissing in its own pointer pool?

This has cost months of lost time tracking this down, as well as gigabytes of tracking info.

Change History (54)

comment:1 Changed 11 years ago by charles

Verify and system memory aren't related to TCP handles... those belong in separate tickets. libcurl handles the announce sockets. Try adding this to libtransmission/web.c's createEasy():

curl_easy_setopt( e, CURLOPT_FORBID_REUSE, (long)1 );
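/* note: CURLOPT_FORBID_REUSE makes libcurl close each connection when its
   transfer finishes, instead of pooling it for re-use by the next request */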

Then retest with wireshark please.

Last edited 11 years ago by charles

comment:2 Changed 11 years ago by livings124

Can you please supply more info on the TCP errors you're seeing?

comment:3 Changed 11 years ago by Astara

re: livings -- not sure what you want exactly, but the ones I'm seeing are:

1) 6730 18:03:58.946 Ishtar.tlinx.org 49940 www.tracker.xxx sso-service TCP 66 [TCP Port numbers reused] 49940 > sso-service [SYN, ECN, CWR] Seq=0 Win=2920 Len=0 MSS=1460 WS=12

2) 8909 18:06:33.924 Ishtar.tlinx.org 45841 www.tracker.xxx sso-service TCP/XML 302 [TCP Retransmission] 45841 > sso-service [PSH, ACK] Seq=1 Ack=1 Win=4096 Len=248

3) 9345 18:07:12.076 Ishtar.tlinx.org 48067 www.tracker.xxx sso-service TCP 66 [TCP Dup ACK 9327#1] 48067 > sso-service [ACK] Seq=251 Ack=1 Win=4096 Len=0 SLE=0 SRE=1

2 & 3 are TCP errors, but not really problems for the most part. It's the 22 errors of type #1 -- in a capture lasting from 18:02-18:12 (~9k packets) before all conversation ceases -- that matter.

The reason it took me so long to figure out what was going on is that with trackers where I have smaller numbers of torrents, I wouldn't get hit with this very often. But with my main tracker, I've been losing all stats for well over a month now. Even with no stats, any clients picked up on the initial start would continue to upload/download until done.

The other symptom, which I started noticing around 500 (maybe above 450), was that newly added torrents wouldn't start downloading from this particular tracker -- they would from other trackers, but not that one. I thought it was the tracker that was at fault, but wasn't closely monitoring upload stats due to real-life issues and ratio stats not being that high a priority to monitor (duh!)...

It took restarting the daemon to get torrents to start -- and since that re-initialized all the TCP streams, that explains why they would get started.

I have a feeling that if I'm not uploading many active torrents to the main tracker, but then get about 10-12 'reused-port' errors on that tracker, that might overwrite whichever sessions were trying to report status -- so, quickly, communication dropped to zero.


I have yet to try charles' suggestion -- I'm making the changes and will report back...

comment:4 Changed 11 years ago by Astara

Just made that change:

--- web.c.orig  2010-11-12 05:13:07.000000000 -0800
+++ web.c       2010-12-15 18:07:04.232297090 -0800
@@ -153,6 +153,7 @@
     curl_easy_setopt( e, CURLOPT_MAXREDIRS, -1L );
     curl_easy_setopt( e, CURLOPT_NOSIGNAL, 1L );
     curl_easy_setopt( e, CURLOPT_PRIVATE, task );
+    curl_easy_setopt( e, CURLOPT_FORBID_REUSE, (long) 1);
 #ifdef USE_LIBCURL_SOCKOPT
     curl_easy_setopt( e, CURLOPT_SOCKOPTFUNCTION, sockoptfunction );
     curl_easy_setopt( e, CURLOPT_SOCKOPTDATA, task );


wireshark is still showing the same problem: 7678 18:29:13.199 Ishtar.tlinx.org 54074 www.tracker.xxx sso-service TCP 66 [TCP Port numbers reused] 54074 > sso-service [SYN, ECN, CWR] Seq=0 Win=2920 Len=0 MSS=1460 WS=12

:-(

It sounded hopeful!... but now it sounds more depressing... maybe a bug in libcurl?

comment:5 Changed 11 years ago by Astara

BTW -- a potential bug (not likely to be at fault here, but it should be fixed to protect against future problems). The curl docs say:

  If you did not already call curl_global_init(3), curl_easy_init(3) does it automatically. This may be lethal in multi-threaded cases, since curl_global_init(3) is not thread-safe, and it may result in resource problems because there is no corresponding cleanup.

  You are strongly advised to not allow this automatic behaviour, by calling curl_global_init(3) yourself properly. See the description in libcurl(3) of global environment requirements for details of how to use this function.


I don't see a call to curl_global_init before the call to curl_easy_init.

I don't know how multi-threaded you are yet, but this should probably be fixed now rather than waiting for it to bite you in the *ss when some part concerning this does go multi-threaded and causes a problem.

I notice code related to threading in this module (web.c), so it's remotely possible it's causing problems already. But AFAIK none of the trackers I am using use https (which is what triggers the curl_global_init call), so I really have no idea whether curl_easy_init might already be getting called in multiple threads -- which would likely cause *some* problem...
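For reference, the pattern the curl docs recommend looks roughly like this (a minimal sketch, not transmission's actual code):

#include <curl/curl.h>

int main( void )
{
    /* not thread-safe: must run exactly once, before any thread
       calls curl_easy_init() */
    if( curl_global_init( CURL_GLOBAL_ALL ) != 0 )
        return 1;

    CURL * e = curl_easy_init( );
    /* ... use the easy handle ... */
    curl_easy_cleanup( e );

    /* matching cleanup; also not thread-safe */
    curl_global_cleanup( );
    return 0;
}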

comment:6 Changed 11 years ago by charles

curl_global_init's in tr_webThreadFunc().

Try setting the environment variable TR_CURL_VERBOSE=1 before starting Transmission? That gives good debugging output...
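Roughly speaking (a sketch of the idea, not the exact web.c code), the variable just flips libcurl's verbose mode, and CURLOPT_VERBOSE output goes to stderr unless redirected with CURLOPT_STDERR:

#include <curl/curl.h>
#include <stdlib.h>

static void maybe_enable_verbose( CURL * e )
{
    /* sketch: CURLOPT_VERBOSE prints connection/debug info to stderr
       by default; CURLOPT_STDERR can point it at another FILE * */
    if( getenv( "TR_CURL_VERBOSE" ) != NULL )
        curl_easy_setopt( e, CURLOPT_VERBOSE, 1L );
}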

comment:7 Changed 11 years ago by charles

Anything to report?

comment:8 Changed 11 years ago by Astara

Primarily due to real-life health issues, I have a problem getting a lot done on any single project on any short schedule -- with delays sometimes measured in hours, though occasionally in the several-month time-frame. (I've been without carpeting in half my house due to a bath-tub flood for over a year, which doesn't help my health by itself, so it's not just limited to computer projects/issues.) So beyond the initial compiles, no progress --

I haven't tried the ENV var to see about its debug output -- next on my TODO list for this project. I keep thinking about it, but haven't quite found the time & energy. Sorry.

(This is one of the reasons why I try to optimize my 'computer work' environments for whatever I'm doing, since even what seem like little concessions for others are just enough to break the camel's back in my case. Obviously, this isn't something that's easy to explain without it sounding like an incredibly lame excuse or making me sound pathetic, so I don't usually mention it. But, unfortunately, I'm 100% disabled as far as being able to work, due to the direct and indirect limitations of my health issues.)

So, sorry but I don't know how long it will take me to track this down. ...

Linda W. (Astara)

comment:9 Changed 11 years ago by Astara

BTW -- where does the curl verbose output end up/go?

comment:10 Changed 11 years ago by Astara

Still no debug output when setting Verbose.

I looked at libcurl's website, and found it has an extensive list of known bugs @ http://curl.haxx.se/docs/knownbugs.html

It might be safer to think about moving away from libcurl, considering the large number of *known* bugs it brings into the project. Even if they aren't all exposed through use, their presence indicates pervasive algorithmic problems that would seem to make it unsuitable for a reliable real-time application like an event-driven BT client (as opposed to a user-driven one, where requests come in at a much less demanding pace, and certainly aren't as likely to have many overlapping sessions/requests going out in the same second).

comment:11 Changed 11 years ago by charles

what would we replace curl with?

comment:12 follow-up: Changed 11 years ago by Astara

What are the requirements? I.e., I'm unfamiliar with exactly which features of curl you are using. I'm not a web-I/O expert, so I'm not familiar with the specifics of all the general-purpose libs available (Google might turn up suggestions), but other, similar programs might have associated libraries or reusable code.

Depending on needs, though, you could be asking me what type of library you need to do the web equivalent of printing "hello world", in which case I'd say don't use a 3rd-party lib -- write your own.

Is it just to send a web request to a server, i.e. a 'GET'? Is 'POST' needed? I noted references to the use of 'SSL', but I've yet to encounter a tracker that supports it, let alone provides, allows, or requires it. I can imagine the need for such, but is it a common requirement?

Platform requirements might introduce real problems -- i.e., I've no clue what's available on the Mac. Open-source browsers or web-fetch utilities might have associated libs or code that can be duplicated, in part, to provide a customized solution containing only what's needed. Off the top of my head: projects like wget, lynx, links (sp?), firefox, konqueror, chrome (goog), and "libwww" (used by Apple's browser and the WWW Consortium's "Amaya"). Another project, Squid, an open-source web cache, might have lib or source-code modules that could be [re]usable.

'libwww' might be a good fit: *likely* fewer bugs, as it's backed by the WWW Consortium, likely has better-funded development, AND _may_ (I think it is likely) be in wider usage.

But not knowing exact requirements, I can't say.

NOTE: Maybe it goes w/o saying, but replacing the lib functionality may not fix the base problem of this incident. I.e., if, for example, transmission is stepping on some data structure used to hold the lib's state, or is not fulfilling some requirement of the lib (one that replacement options might, or might not, also have), then the problem might not be fixed, might create different symptoms, or might be hidden, depending on the replacement.

Last edited 11 years ago by Astara

comment:13 in reply to: ↑ 12 Changed 11 years ago by charles

Replying to Astara:

But not knowing exact requirements, I can't say.

GET, POST, and SSL...

comment:14 Changed 11 years ago by leena

Curl maintains a page listing alternatives: http://curl.haxx.se/libcurl/competitors.html. FreeBSD's libfetch might do the trick.

P.S. Astara's last response cracked me up.

comment:15 Changed 11 years ago by charles

libfetch doesn't seem to support post.

That page leena listed doesn't seem to think very highly of libwww. :)

Looking at the number of projects using libcurl (http://curl.haxx.se/libcurl/using/apps.html), I wonder if it really has more bugs, or if they're just better-reported because of the large userbase.

It might be better to go back to libcurl's verbose info.

comment:16 Changed 11 years ago by Astara

I didn't suggest libfetch; leena did -- who also suggested the page critical of libwww, a page of 'competitors to libcurl' published by the libcurl project. Are you seriously giving weight to what it says about its competitors?

As for exposure of libwww, see http://en.wikipedia.org/wiki/Libwww. It dates back to 1991 (1993 as libwww). I'm certain it's had its share of bugs over the years.

If you feel libcurl's verbose info will solve the problem for you, feel free to use it. I didn't get any output from it at all. (I.e. setting the env var produced no output anywhere).

comment:17 follow-up: Changed 11 years ago by Astara

BTW -- you mentioned you needed 'PUT', but I can't find a reference to that in the code, while I do find GET.

You also say you need https, but I'm not aware of any trackers that use it. Do you have any examples?

I.e. -- if it turns out that you only need HTTP GET, which seems to be the common case, then you probably only need 'proxy' support on top of that. (Though you didn't mention needing proxy support, programs like transmission-gui assume that users need it, so presumably transmission does as well?)

FWIW, libwww supports unix, mac and windows (or any platform that supports POSIX and C++). libcurl claims to support "more" platforms in its comparison to libwww, and mentions libwww's lack of multi-thread support. This appears to be true, but it's also the case that curl's SSL support is blocking and prevents multi-threaded operation. libwww's documentation dates back to the early 2000s and doesn't appear to have been updated much since. Changes in the change doc are listed up to about 2006, so it's either really stable OR not very well supported. Amaya is still using it, however, and was last released/updated in 2010. So the library is still being actively used in current projects even though no changes to it are listed after 2006.

I've no idea of the difference in complexity of usage. libwww allows 'plugins' and has a bunch of features that libcurl doesn't have (and that transmission likely wouldn't need). It may be that when it is built, it only pulls in what is needed (things like caching and HTML4 parsing I don't see as vital to transmission at this point). If it isn't build-time prunable, then it might be too 'heavyweight' for transmission's needs.

That brings us back to the question: right now, where are PUT and SSL used? It looks like both scrape and announce use 'GET', so I'm wondering why the requirement for added complexity at this time, when it's known that the current features don't work?
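For context, an announce or scrape over HTTP boils down to a single GET; in libcurl terms that's roughly the following (a sketch with a placeholder URL, not transmission's announcer code):

#include <curl/curl.h>
#include <stdio.h>

/* discard the response body; a real client would parse the bencoded reply */
static size_t discard_cb( void * ptr, size_t size, size_t nmemb, void * data )
{
    (void) ptr; (void) data;
    return size * nmemb;
}

int main( void )
{
    curl_global_init( CURL_GLOBAL_ALL );

    CURL * e = curl_easy_init( );
    curl_easy_setopt( e, CURLOPT_URL, "http://tracker.example/announce" );
    curl_easy_setopt( e, CURLOPT_WRITEFUNCTION, discard_cb );

    CURLcode res = curl_easy_perform( e );
    if( res != CURLE_OK )
        fprintf( stderr, "announce failed: %s\n", curl_easy_strerror( res ) );

    curl_easy_cleanup( e );
    curl_global_cleanup( );
    return 0;
}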

comment:18 in reply to: ↑ 17 Changed 11 years ago by charles

Replying to Astara:

BTW -- you mentioned you needed 'PUT'

I'm confused. Where?

You also say you need https, but I'm not aware of any trackers that use it. Do you have any examples?

transmission-remote needs https to communicate with encrypted transmission daemons.

comment:19 Changed 11 years ago by Astara

Sorry, I confused POST w/PUT... dang P's

So it's the daemon and remote functions that need the other functions. That conceptually simplifies things.

FWIW, I reduced my number of active torrents down to ~150 total (less than 1/4th what it was previously), and I'm seeing some progress on the tracker w/my client now.

I still notice the first 'symptom', though it could be a symptom of another problem. That symptom, BTW, is that when I add new torrents to the same tracker (the one that used to have >600, now >150), they go into download state but sit with all 0's for trackers/peers, 'forever' (or until I restart the daemon). Generally, for my main tracker, I run a restart script after adding torrents.

comment:20 Changed 11 years ago by charles

Tell me about your system setup. What version of curl is installed?

comment:21 Changed 11 years ago by Astara

I'm running on a Suse 11.2 base system (x64).

Curl packages:

curl-7.19.6-2.1.x86_64
libcurl4-32bit-7.19.6-2.1.x86_64
libcurl4-7.19.6-2.1.x86_64
libcurl-devel-7.19.6-2.1.x86_64

Any others?

BTW -- do you have a case for this in your test suites? I.e.

a case with ~5000 active torrents, all available to upload (even if only 1-2, or none actually are)?

Just wondering if this is something that works in your test suite but fails for me, or if there's a reason to believe transmission works with higher numbers of torrents...

Thanks

comment:22 Changed 11 years ago by jordan

I've never seen a session with 5000 torrents. You didn't mention that before!

I suspect that's more than 99% of T's users ever see.

comment:23 Changed 11 years ago by jordan

Did you mean 500?

500 Works For Me.

comment:24 Changed 11 years ago by Astara

I have over 1200. I had over 600 w/1 tracker.

I suggested 5000, as I thought it might be a limit that would be higher than what most users will have, but it would be likely to uncover resource usage problems.

Testing a product well above what users would normally be expected to encounter is a very common way of uncovering potential problems. You want to keep well ahead of where users will use things, since users will always do the 'darndest' things that make no sense to a developer. You need to program defensively and test extremes to see that they are handled gracefully. It's a form of 'stress' testing.

Stress testing often uncovers flaws in algorithms (allocation problems, timing problems, etc).

Example: @ SGI, they tested submitting 10,000 print jobs at one time.

The system took about 4-5 hours to clear the queue, but it didn't crash.

Or, I regularly have load averages over 500 on my linux machine. There was a time that such a load average was almost guaranteed to lock up most machines. Now -- it takes about 90 seconds to clear/finish (doing a kernel "make -j" to allow unlimited parallelism).

comment:25 Changed 11 years ago by livings124

So in other words you consider it an issue that the app doesn't scale to unrealistic and unreasonable limits? And you mark this as a critical issue?

comment:26 Changed 11 years ago by x190

Quoting from the original ticket description:

"I have over 600 torrents registered with this 1 tracker, alone, now, out of 1180 listed torrents" "transmission gets lost when trying to use more than ~500. Not sure the exact number, but 629 is definitely a fail."

I can recall a forum post about a user running 1500 torrents, so this is not completely unheard of.

BTW, do developers of software attempt to discover the limits for their applications or is that left to users to discover using various OS with varying resources? Do the Transmission developers have any recommendations in mind regarding upper limits of active torrents?

comment:27 Changed 11 years ago by Astara

livings124: So in other words you consider it an issue that the app doesn't scale to unrealistic and unreasonable limits? And you mark this as a critical issue?

Isn't usage by distro vendors, to distribute their wares in addition to (or in lieu of) HTTP/FTP solutions, an oft-made example of how BT can be used for legal purposes? I don't know about all vendor websites, but SuSE allows you to download via HTTP or FTP any of the RPMs for their distros, separately. They have binaries, sources and multiple architectures (ex. see mirror @ http://mirrors.kernel.org/opensuse/distribution/[xx.x]).

Just the 64-bit binary RPMs for 11.3 @ http://mirrors.kernel.org/opensuse/distribution/11.3/repo/oss/suse/x86_64/ tally 6586 available binary RPMs. Under the 11.3 release alone, for example (this mirror does NOT include the source RPMs, BTW), there are over 15,000 RPMs.

In addition to not including sources, this directory also doesn't include update RPMs (of which there are many -- at least 2 for each update/arch, since they are distributed as full binary RPMs as well as differential binary RPMs).

If one starts thinking about businesses being able to use BT for distribution, AND with the same reliability that they get from FTP/HTTP now, a test suite that tests a client serving 100's of thousands of torrents -- limited only by machine resources, and even then "failing" *gracefully* -- would *eventually* seem prudent.

I can't really see one of these outfits using 'utorrent' or the like, but a 'daemon'-based client, designed to be used from/with multiple clients but run on headless servers, would seem to be exactly what they would look for.

Shouldn't the client be *scalable*, limited only by the actual limitations of the machine it is running on?

comment:28 Changed 11 years ago by livings124

I don't understand that explanation. Why would a business have thousands of transfers in a single Transmission session on a regular-ole PC? Bittorrent distributes the load among the other peers - that's the whole point. Of course we should do everything to get the app running as ideally as possible, but a vague "critical" ticket is pointless.

comment:29 Changed 11 years ago by jordan

But there are people who run 1000s of torrents in transmission-daemon. I still suspect libcurl.

Astara: see TR_CURL_VERBOSE @ https://trac.transmissionbt.com/wiki/EnvironmentVariables

comment:30 Changed 11 years ago by Astara

livings: Are you asking why a business would have 1 "server" to serve all of its files? It's what they do. It's common practice. They often have multiple machines load-sharing the main server, but it's still 1 site.

I've pointed to a case where 1 distribution takes 15K files. With updates and sources it's easily double that. SuSE has multiple distros available for download -- all served from 1 server. So you could potentially be talking almost 100K files.

You seem to think 600 files is 'unreasonable'. Presumably you would consider something @ 500 files or less 'reasonable'. That would require about 2000 machines using your 'regular-ole PC' example. To get the redundancy offered by 3-4 machines each operating in round-robin fashion serving large websites, that'd be 6000-8000 machines.

That said, though, I don't see anything about anyone using a "regular ole PC" -- what is that? Are you referring to the 640K "no-one would ever need any more" regular ole PCs, or the 2-6 core, 2-12G, multi-terabyte regular ole PCs of today, or the 12-48 core, 256G, multi-petabyte "regular ole PCs" that businesses often use (over 10 years ago, Windows 2000 Server could use up to 64G of RAM)?

Are you trying to say that you'd consider it 'reasonable' for businesses to replace ~4 HTTP & FTP servers w/6000+ machines so they could stay within a 'reasonable limit' in a BT client?

Also, you are missing a big point here. You said "thousands of transfers". I never said anything about transfers. I'm saying they'd have 1 session with thousands (or 100's of K's) of torrents listed -- the majority of which, presumably, would be *idle* (but available for downloading from) at any given point.

I don't have 600 actively downloading (or uploading) torrents w/1 tracker (or 1200 total). The max I've seen 'transferring' at any given point in time has been about 40. The rest are 'idle' (not halted, but awaiting connections). There is a difference. Typical is 10-15 active connections at any given point in time.

The problem has nothing to do with the # of active connections, as far as I can tell, but with the total number of *idle* torrents listed with a single tracker.

That's the concept I'm seeing companies using: having 1 server (or virtual round-robin server) that makes 10's of thousands of RPMs available for download. That means 1 machine (with multiple backups) would list, say, 100,000 files in its "idle" state -- just as HTTP/FTP servers have similar numbers of files awaiting download. But only a small fraction of those files are actively being downloaded at any given point (though some servers handle loads of 100's of clients).

That's a huge difference. Having 100K RPMs in 'idle' state is functionally equivalent to an HTTP/FTP server having 100K RPMs available for transfer -- they are 'idle', and shouldn't create a problem for a modern PC -- even a user PC w/, say, 6 cores (a current processor) and 24G memory (~$100 of memory at today's prices).

comment:31 Changed 11 years ago by livings124

What is an acceptable number to support to solve this critical issue? It would seem to me that as machines and/or networks increase the app will scale as well. There is a ton of text above, but I don't see any S.M.A.R.T. goals.

comment:32 Changed 11 years ago by jordan

This is a distraction. People already run transmission-daemon with thousands of torrents. If Astara's system can't, we should learn how it's different.

Astara: see TR_CURL_VERBOSE @ https://trac.transmissionbt.com/wiki/EnvironmentVariables

comment:33 Changed 11 years ago by jordan

You said "thousands of transfers". I never said anything about transfers... I'm saying they'd have 1 session with thousands (or 100's of K's) of torrents listed.

I think this is a case of lost-in-translation from Linux to the Mac client, which calls torrents "transfers."

comment:34 Changed 11 years ago by jordan

Any news on TR_CURL_VERBOSE?

comment:35 Changed 11 years ago by Astara

I set it to 1, and set the FD to ERROUT (stderr). I'm seeing no output on stderr, though I am seeing output from the --logfile arg.

Where is it supposed to be going?

Is it only enabled under debug, maybe? I don't normally run the debug version -- it messes up the version string given to the tracker, which changes the behavior (to non-functional).

Last edited 11 years ago by Astara

comment:36 Changed 11 years ago by jordan

Try it this way:

$ TR_CURL_VERBOSE=2 transmission-gtk 2>log
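# libcurl's verbose output goes to stderr, so the 2>log redirection captures it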

comment:37 Changed 11 years ago by jordan

We'd like to figure out what's causing this bug for you, but we haven't heard back from you in a while. Could you please provide the requested information? Thanks!

comment:38 Changed 11 years ago by livings124

  • Resolution set to wontfix
  • Status changed from new to closed

Closing for lack of a specific and achievable goal.

comment:39 Changed 11 years ago by Astara

1) I don't run transmission-gtk, I run the daemon. Perhaps this is part of the problem with the expected debug output not coming out?

2) I was still waiting for an answer to my questions before proceeding. Having you toss out another test case to try, when I didn't even know where the output was supposed to go or whether it was supposed to work w/o debug being enabled, seemed pointless, so I wasn't sure how to respond.

3) My computer has been down for about 3-4 days due to construction work going on @ my house.

4) The same work hosed the disk and backup that the torrents were on, so it will be some time before I can reproduce the problem.

5) It's a sad reflection upon the project that such an immature and destructive personality as livings has any place on this project. He enjoys tearing other people down and goes out of his way to make comments that are not only false -- pointing to his laziness or inability to understand the problem -- but that are also designed to be inflammatory, inviting off-topic responses about his behavior.

Specific goals were given. It may be true that the goals are not achievable by him, but that doesn't mean they are not achievable. Setting 'wontfix' on a bug that causes data corruption and allows *CHEATING* -- unlimited downloading of files without being charged for them -- is a great way to get your client banned.

For all the lame excuses for not allowing user control of a torrent's verified state -- because you are afraid of something that wouldn't cause the client program to be banned but, at worst, would cause the user of that client to be 'ignored' on a particular torrent (being ignored on 1 torrent != 'banned') -- to then ignore a bug that allows unlimited downloading from a site without it affecting your ratio highlights the problem with having livings on the project.

Let's see: all we need is someone to read this bug who knows that network connections are used to update your 'stats' on a tracker site -- and then post that transmission stops updating tracker sites with its client transfer information at some random point (ostensibly based on # of connections, but with that '#' depending on unknown factors: client memory, torrent size, network conditions, versions and patch levels of linked libraries -- and who knows what else), but that is pretty reliably repeatable if you keep adding torrents.

Now, that would be bad enough to warrant a *ban*, not a temporary ignoring (a difference that highlighted livings's lack of knowledge about the BT protocol: what it provides in terms of internal data verification & how it is used), all by itself. But then livings goes and tells everyone that such a bug *won't be fixed* because he didn't like the test case that provided a specific and achievable goal as a start -- one that he didn't even attempt to reproduce. He even claimed that my usage, about 1/10th the test case's size, was unreasonable -- something that others on the project told him was NOT unreasonable and was already being done by others -- but this didn't change his opinion about it being 'non-specific, unreasonable and unachievable'.

Given that he was given a specific and achievable goal, closing the bug w/a 'wontfix' and claiming that he wasn't given such a goal demonstrates either a high level of stupidity (as measured by the inability to remember the specific test cases he commented on 4 weeks ago, or to read the history of the bug to remind himself of his own comments) or deliberate *maliciousness* on his part. I don't think it is the former, which is why he would not seem to be a good fit in any role where he deals with the public. Some people contribute 'best' when they are kept in the 'back room' (ex. technically adept, but socially clueless).

Unfortunately, his malicious attitude takes precedence over all other goals and affects his responses to everyone -- in their requests for features and reporting of bugs. As this case shows, this can have a very deleterious effect on the project: having it known that the lead developer(s) don't care about data corruption on the trackers, and that this is a fairly straightforward way to do 'stat cheating' -- probably the single most sensitive issue for determining a client's acceptability to a BT site -- is a great way to ensure some sites will ban this client.

While it would be helpful if I could debug it for you, I've been working around the problem myself by reducing the number of torrents I host -- since I do want my stats reported. But given that this bug also presents a way to block reporting of how much I download, it could be used for the opposite purpose.

Even if I had a way to reproduce this (which, now, I don't, as most of my torrents are gone), I'm not strongly motivated to spend a lot of my time on a bug that I can usually work around. However, given that this bug can also be used to turn transmission into a 'cheating' client, I'd think members of the dev team would be strongly motivated to reproduce a test case that creates the problem so they can try to fix it.

A 'responsible' thing for someone to do in this case might be to announce this stat exploit to BT sites so they can take appropriate measures to protect themselves -- especially given the 'wontfix'/'dontcare' closing. While this could be seen as "spiteful", and no doubt would be by many, it would also be a responsible action, so it's hard to argue it shouldn't be done. However, given that I'm not a spiteful person, and that I would prefer to continue being able to use the client (which I'm not using for exploitation), it wouldn't be in my best interests to shoot myself in the foot in order to spite my big toe, as it were...

Regardless, you really should consider investigating a stat-exploit bug more thoroughly rather than blowing it off with a clueless comment.

comment:40 Changed 11 years ago by livings124

Troll is trolling. Not worth the effort anymore - a history of your posts says enough about you. More text does not make any of your points valid. My only regret is marking this "wontfix" instead of "invalid." And yes, this is the opinion of all the devs. Please, no followup is necessary at this point.

comment:41 Changed 11 years ago by jordan

  • Resolution wontfix deleted
  • Status changed from closed to reopened

Reopening to change resolution from "wontfix" to "invalid"

comment:42 follow-up: Changed 11 years ago by jordan

  • Resolution set to invalid
  • Status changed from reopened to closed

Reclosing as invalid.

The 1,049 word flame in comment:39 misses the point: this ticket was closed because only the OP is seeing this behavior, and was not responding to requests for more information.

If -- if -- there is a valid issue here, a concise list of steps to reproduce the issue would be more helpful than 500 words about the definition of "scalability."

If you are still having issues with TR_CURL_VERBOSE, asking about that would be more productive than a thousand words about how destructive and immature the developers are.

Lastly, the suggestion that we "announce this stat-exploit to bt-sites so they can take appropriate measures to protect themselves" is laughable:

  • Nobody else is reporting this behavior.
  • Nobody on the development team has been able to reproduce this behavior.
  • The OP is no longer able to trigger the behavior.
  • I have seen, firsthand, someone running Transmission with over 2000 torrents without this behavior.

Painting this ticket as an "exploit" seems less about facts and more about trying to get the development team to jump to attention.

Astara, if this is a real issue, a concise list of steps to reproduce the issue would be helpful, as would the TR_CURL_VERBOSE log.

Last edited 11 years ago by jordan

comment:43 Changed 11 years ago by ijuxda

The problem is I've slowly been losing upload credits with this tracker, till in the past month or two, it's dropped to zero.

Could the problems in this ticket be related to other issues with the announcer code (e.g. as described in #3870 #3931)?

comment:44 Changed 11 years ago by jordan

The symptoms are different, but it's an interesting question.

Astara, were the issues preceded by 404 tracker responses?

comment:45 Changed 11 years ago by Astara

Not that I am aware of -- but the site can take a long time to respond -- sometimes timing out. I.e. it's possible that some of the query requests would fail with a timeout.

As for your request for the TR_CURL_VERBOSE log: you didn't respond to multiple requests for information on where the output should be expected, and whether the output would be generated in a non-debug version of transmission -- both items I was awaiting answers on when you claimed I hadn't answered you.

Also, something said earlier could cause confusion: if Mac users call all torrents, transferring or not, 'transfers', then what do they call torrents that are actively transferring (vs. not)? I.e., most of the time when I seed something, they aren't transferring.

Last edited 11 years ago by Astara

comment:46 Changed 11 years ago by Astara

BTW -- other users are seeing this bug. #3931 is a duplicate of this one in its early stages.

See the comment from 7 weeks ago where I said an early stage of this bug was:

"The other symptom which I started noticing around 500 (maybe above 450), was *that newly added torrents wouldn't start downloading* from this particular tracker -- they would other trackers but not that one. I thought it was the tracker that was at fault, but wasn't closely monitoring upload stats due to real life issues and ratio-stats not being that high a priority to monitor ... "

comment:47 follow-up: Changed 11 years ago by KyleK

Regarding the output of TR_CURL_VERBOSE, why not run the daemon in foreground mode?

$ TR_CURL_VERBOSE=1 transmission-daemon -f --log-debug -g /path/to/your/config/folder 2>&1 | tee transmission.log
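# 2>&1 folds stderr -- where libcurl's verbose output goes -- into the pipe so tee captures it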

This will effectively block the console, so if you want to run it for a while, start it inside a screen session.

comment:48 in reply to: ↑ 47 Changed 11 years ago by Astara

Replying to KyleK:

Regarding the output of TR_CURL_VERBOSE, why not run the daemon in foreground mode? ...

If I had the time and interest to work on this right now, that might be a good idea. It's becoming more obvious that other people are running into this bug in its early stages. I've described the conditions for setting up the bug, a test case (one some people didn't like, but turn up the number of simultaneous torrents (not transfers) and it will cause the problem), and a likely candidate (I'd go 90% certain it's a bug in the curl lib and in how it's being used in transmission). I tend to believe curl is a 'fail' for the same reasons I found it to be a 'fail' when I tried to use it in an application: it hides too many details and doesn't allow transparency into important events -- either because the lib itself doesn't detect them, or because it's programmed to ignore them.

There are several 'null' events that can be returned from a web server, in addition to the web server going into a half-open/half-closed state.

comment:49 in reply to: ↑ 42 ; follow-up: Changed 11 years ago by Astara

Replying to jordan:

Reclosing as invalid.

The 1,049 word flame in comment:39 misses the point:

Go look up the definition of 'flame' on wikipedia. That was not a flame so stop your grandstanding.

this ticket was closed because only the OP is seeing this behavior

Not true. Others are seeing the 1st symptoms I experienced.

, and was not responding to requests for more information.

Not true. Last response was me asking where the output was supposed to go and if the output would be displayed in a non-debug version. I never got a response.

If -- if -- there is a valid issue here, a concise list of steps to reproduce the issue would be more helpful than 500 words about the definition of "scalability."

A nice concise list would be nice -- why don't you ask one of your developers to get right on that? I only spent a few months getting to the data I had. It's not easy to find out what is going on or what is causing it. It starts out as vague, unreliably reproduced problems -- and grows. But it's definitely there when it comes on full force: with 500+ torrents all being reported to 1 tracker site, I would see TCP connections overwritten by transmission within the first 3 minutes. All traffic to or from the tracker site would stop within 10 minutes. Transmission doesn't recover -- it doesn't send more queries, because it has overwritten its connections and the library it is using is hosed. Over an hour of subsequent monitoring showed it sending (and receiving) NO traffic to the tracker site.

I gave you steps to reproduce this that were as concise as I had: create a test case w/100K torrents on the same tracker site. Very concise and simple. The torrents don't even have to use unique data -- they could all describe the same physical file. Make them torrents of about ~1000 pieces each (within the recommended torrent size of 500-1500 pieces). Note that a network connection over vmware likely won't work, since it may not drop traffic due to collisions the way a real network would. Chances are it will implode at a lot less than 100K. Have the tracker site get traffic updates every 30 minutes.

If you are still having issues with TR_CURL_VERBOSE, asking about that would be more productive than a thousand words about how destructive and immature the developers are.

I did; it was ignored. Just as you ignored it when writing this response.

Lastly, the suggestion that we "announce this stat-exploit to bt-sites so they can take appropriate measures to protect themselves" is laughable:

Where do you see a suggestion that you announce this to bt-sites?

I was talking about a hypothetical 'responsible' person... like someone who found the exploit...but then I mentioned something about shooting self in foot and that not being a great idea...

  • Nobody else is reporting this behavior.

Not true. The first symptom is reported in bug #3931

  • Nobody on the development team has been able to reproduce this behavior.

From what I've heard, no one has even tried.

  • The OP is no longer able to trigger the behavior.

--- The OP doesn't have the torrents that reproduced this, but given input from bug #3931, depending on the tracker site it can happen with a lot less data and a lot more frequently than my previous experience indicated.

  • I have seen, firsthand, someone running Transmission with over 2000 torrents without this behavior.

An anecdotal story of 1 person running that many torrents means nothing. All of their torrents could be very different -- using different tracker sites, or none at all (using PEX). The tracker site they talk to might be several servers, not 1 machine also hosting a website and forum... or any other number of combinations. You didn't even say what platform. A case of 1 client working 'somewhere' under 'some circumstance' is hardly what I would call testing, let alone 'exhaustive' or 'stress' testing.

Painting this ticket as an "exploit" seems less about facts and more about trying to get the development team to jump to attention.

It is totally about the facts -- in that I can download as much as I want and my transfer amounts are not recorded. That was a stated fact from the beginning, as well as in the private forum of a tracker site I asked on before looking into transmission. It's not something 'made up' just for the sake of argument, but a core fact of the bug. If it were 'new' information, you might argue that; but the fact that it was a central symptom from the beginning makes it hard to argue that it's been 'cooked up' just to draw attention to this bug -- without those core symptoms, the bug wouldn't exist!!! There'd have been no point in filing it or searching through a TCP dump for a cause.

Your logic is flawed.

Astara, if this is a real issue, a concise list of steps to reproduce the issue would be helpful, as would the TR_CURL_VERBOSE log.

Already answered above.

comment:50 in reply to: ↑ 49 Changed 11 years ago by sardok

Even though you seem to have some insightful comments, others seem intentionally like trolling. And that's from the viewpoint of a total outsider. If you want the devs to spend their valuable time fixing a problem that affects you, don't piss them off. It doesn't matter how this spat started; please don't continue it.

comment:51 Changed 11 years ago by Astara

I reported this 2 months ago. I've submitted information as I was able. It was closed out as 'wontfix', then changed to 'invalid', despite no malice on my part. The bug is still closed even though other people are encountering it. That indicates that even though I didn't start with malice, I am still receiving it from the other end. You don't want 'me' to piss them off?

Their actions indicate some pre-existing level of pissiness, so it seems their being pissy has less to do with me than with something that started in them. I can't very well be the one who ends such an outburst. I didn't close the bug due to my inability to reproduce it. I wasn't demanding that they fix it, nor threatening. I noted that it was livings' response that was deliberately inflammatory -- inviting a defensive response. If you see something I said to instigate that, please let me know, cuz I don't think so.

Last edited 11 years ago by Astara

comment:52 Changed 11 years ago by Astara

FWIW, I wrote scripts to check and repair the various problems caused by this bug, with one checking the tracker website. If it detected from the website that the tracker thought I was seeding '0 torrents', it would restart trmd, wait up to 5 minutes, then try again, continuing for about 3-4 cycles. It never needed to cycle through a 2nd time (except when the web was down on my end; then I got emails a couple of times an hour saying it had tried to restart and get status).

From the tracker forum, this only happens (the site showing me seeding '0 torrents') when it hasn't heard from me in over an hour. So it's obvious that at least *some* communication could usually continue, often for over a day, before dying completely.

If the connections are 'essentially' overwritten in a somewhat random manner, it would make sense that some might stay open for a long time...

Also, FWIW, this tracker, like other trackers, was at one point the target of a DoS that took it offline for over a week. As a result they had to implement protection mechanisms against 'flooding' (for example, I might wonder, what would 500 torrents all requesting scrapes from the same IP look like?). Such traffic may be dropped, time out, or take a very long time to answer. It may even be that the connections are held open on the other end -- preventing a TCP close from succeeding on the client end and requiring a very long, 30-40 minute port timeout to ensure the ports aren't illicitly re-used. And it didn't sound like 30-40 minutes would be enough in some cases, but the comments I read about such edge cases were vague. Behavior there seemed subject to a certain amount of unpredictability.

There are also cases documented in the squid documentation of servers 'half-closing' connections -- where one direction is closed but the other direction may still have a significant amount of data to drain. In some circumstances with pipes, this leads to the reading side getting a signal indicating a hangup on the other end -- forcing a close of the receiving end as well, even though all the data hasn't been read. I don't know if something like that could occur in the networking stack as well, but it's possible, given the conceptual similarities of a local and a network pipe/stream.
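For illustration, a "half-close" at the sockets level looks like this (a generic POSIX sketch, not squid's or transmission's code):

#include <sys/socket.h>
#include <unistd.h>

/* close our sending direction only, then drain whatever the peer
   still has in flight before fully closing the descriptor */
static void half_close_then_drain( int fd )
{
    char buf[4096];

    shutdown( fd, SHUT_WR );              /* no more sends from our side */
    while( read( fd, buf, sizeof buf ) > 0 )
        ;                                 /* drain the peer's remaining data */
    close( fd );
}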

If it was a problem with a simple formula for repetition, I would have posted it. But like most users, I'm first presented with 'symptoms'. I tracked those symptoms, in wireshark, to transmission invalidly overwriting its own TCP connections. I know that is wrong. Now, how to reproduce all the steps reliably to get to the point where transmission starts walking on itself, I don't know. But I'm pretty sure it's in the curl library or, secondly, in transmission's usage of the curl library -- though having looked at that, I don't see anything overtly wrong, and given my own problems with curl not reporting exception conditions 'well', I would suspect the former.

The problem is that curl is a lib useful for higher-level apps that don't need to know the lower-level details about failures, and mostly for user-driven applications/usages -- not automated usages like transmission. It is being shoe-horned into real-time apps, but the results have been less than spectacular.

If it was my own project, I'd write my own lib (more likely steal the routines from some existing open-source project, w/credit, of course) and butcher it into something custom-fit to my usage, or I'd look to some open-source project as a base for writing my own routines. If I were looking for such, I'd look to other existing applications to pattern them after or use as reference. An example of one such application would be squid.

Of course, I might look at the routines for these types of functions, find them 'inscrutable', and look for something easier to make use of. But something like squid has 1000's of requesters banging on it asynchronously, has to serve back all the responses that also come in asynchronously, and has to be efficient doing it.

At the low-performance end of the scale would be routines designed for a background batch-fetch utility. An in-between example would be interactive web browsers: they are usually driven by only 1 user, but there can be a large burst of requests all coming back from different websites -- which in turn may generate more asynchronous web requests.

However, I would only be inclined to talk about such things with someone interested in performance, and would certainly feel out of place raising them in a critical, competitive atmosphere pervaded by the negativity of those looking for opportunities to take pot-shots, since such factors are the antithesis of creativity.

comment:53 Changed 11 years ago by Astara

In case I didn't mention it before: randomly, often (but not all the time), I'll see 1-2 torrents with a state of 'tracker did not respond', but that will usually clear itself some time later.

Under what conditions would transmission keep uploading/downloading data to clients, yet not recognize that it is no longer talking to a tracker? I.e., from what someone posted in another bug, it looks like it should retry the tracker if there was no response. Yet when this is in serious failure mode w/over 500 torrents on a specific tracker site[*], it stops talking to the tracker site in the first few-to-several minutes, and I see no traffic after that (waiting over an hour in one test). Yet torrent transfer traffic continued. I don't know if tracker traffic to other sites was affected.

[*] Note: I believe the number of torrents required for various types of failures depends on the communication success of the client with a particular tracker site, but most of all on the tracker site itself: how fast it processes tracker requests, and what it does when tracker requests come in "too fast" (does it detect it? does it drop requests? does it delay them? do the connections stay open?... etc.). I put "too fast" in quotes since what's too fast for 1 server might be a drop in the bucket for another.

Anyway -- I was wondering if transmission operates in a loop -- and if it is no longer talking to a tracker site, wouldn't that mean something has hung? If it is in a loop, how are the torrents continuing to operate?

I sure wish I knew where the debug output was supposed to go, and whether it was enabled in a non-devel version of transmission.

Last edited 11 years ago by Astara

comment:54 Changed 11 years ago by jordan

Xref: The fun & games continue at #3997
