Opened 12 years ago

Closed 12 years ago

#2529 closed Bug (fixed)

New torrents load, but don't progress

Reported by: wfaulk
Owned by:
Priority: Normal
Milestone: 1.80
Component: Daemon
Version: 1.76
Severity: Major
Keywords: needinfo
Cc: 8nmenan02@…

Description

I have transmission-daemon running under Linux, watching a directory for new torrents. After it has been running for a while, on the order of a week, it will often enter a state where it finds and loads new torrents in that directory and marks them as started, but never finds or connects to any peers or downloads anything. (The web interface says "Downloading from 0 of 0 peers".) If I restart transmission-daemon, those torrents start transferring normally.

Attachments (1)

web.c (21.1 KB) - added by charles 12 years ago.
experimental replacement of libtransmission/web.c, rev 1

Change History (32)

comment:1 Changed 12 years ago by wfaulk

  • Resolution set to invalid
  • Status changed from new to closed

I just noticed that 1.76 has been released. I'm testing that one now, and I'll follow up if it has the same problem.

comment:2 Changed 12 years ago by wfaulk

  • Resolution invalid deleted
  • Status changed from closed to reopened
  • Version changed from 1.75 to 1.76

Still happening in 1.76. I'm not sure what to look for, so I'm going to leave it in its broken mode until someone can suggest what kind of info to gather.

comment:3 Changed 12 years ago by wfaulk

  • Summary changed from New torrents from watch directory load, but don't progress to New torrents load, but don't progress

I've verified that torrents sent straight to the daemon also exhibit the same symptoms.

The activity "tab" of the inspector frame in the web interface shows zero for every value.

comment:4 Changed 12 years ago by charles

  • Keywords needinfo added
  • Resolution set to invalid
  • Status changed from reopened to closed

comment:5 Changed 12 years ago by wfaulk

  • Resolution invalid deleted
  • Status changed from closed to reopened

If I stop and then immediately restart transmission-daemon, those same torrents immediately start transferring. So unless one of the items in the troubleshooting document always coincides with me restarting the daemon, I'm pretty sure it has something to do with Transmission.

comment:6 Changed 12 years ago by wfaulk

I've been running transmission-daemon in the foreground for a few days now. It has started to fail loading peers for new torrents again. The console output shows, for example:

[21:03:49.114] Found new .torrent file "ap_6231500k.wmv.torrent" in watchdir "/mnt/vg0/torrent/torrent/"
[21:03:49.153] ap_6231500k.wmv: Couldn't read resume file
[21:03:49.157] ap_6231500k.wmv: Queued for verification
[21:03:49.157] ap_6231500k.wmv: Verifying torrent

But it never says anything about getting peers from the tracker. In fact, it doesn't seem to be connecting to any tracker for any torrent. DHT is still printing activity logs.

Are there any further logs I can enable?

comment:7 Changed 12 years ago by wfaulk

I just noticed that transmission-daemon has 58 connections in the ESTABLISHED state and 129 in the CLOSE_WAIT state, all to the same tracker. I have a lot of active torrents associated with that tracker (more than 187), but that still seems odd. In addition, the connections never seem to change.

rasp% sudo /usr/sbin/lsof -n -p 20283 | grep 10.35.52.44 > /tmp/s1
rasp% sleep 300
rasp% sudo /usr/sbin/lsof -n -p 20283 | grep 10.35.52.44 > /tmp/s2
rasp% diff /tmp/s{1,2}
rasp% echo $?
0
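
(`diff` produced no output and exited with status 0, meaning the two snapshots taken five minutes apart were identical.)

For anyone wanting to tally socket states instead of eyeballing the lsof output, here is a minimal sketch. It assumes each TCP line ends with a NAME column of the form `local->remote:port (STATE)`, which is how lsof typically prints connected TCP sockets; the function name `count_tcp_states` is made up for this example and is not part of Transmission or lsof.

```python
import re
from collections import Counter

# Assumed shape of the tail of an lsof TCP line:
#   TCP <local addr:port>-><remote host>:<remote port> (<STATE>)
_TCP_RE = re.compile(r"TCP\s+(\S+)->(\S+):(\w+)\s+\((\w+)\)")


def count_tcp_states(lsof_text, remote_host=None):
    """Count TCP connection states found in saved lsof output.

    If remote_host is given, only connections to that host are counted.
    Lines that do not look like connected TCP sockets are skipped.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        match = _TCP_RE.search(line)
        if match is None:
            continue
        host, state = match.group(2), match.group(4)
        if remote_host is not None and host != remote_host:
            continue
        counts[state] += 1
    return counts
```

Feeding it the contents of a file such as /tmp/s1 along with the tracker's address would give per-state totals (ESTABLISHED, CLOSE_WAIT, and so on) for that tracker only.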

comment:8 Changed 12 years ago by wfaulk

  • Cc 8nmenan02@… added

It's been 12 hours, and it's still the exact same list of connections to that tracker: same states, same ports, same file descriptors.

comment:9 Changed 12 years ago by charles

wfaulk: and all of the CLOSE_WAIT sockets are to trackers -- rather than, say, peers -- is that correct?

comment:10 Changed 12 years ago by wfaulk

I was searching lsof output for that tracker in particular, so I'm not sure of the states of other connections. I can say for sure that those 129 were all to the same tracker.

I have since restarted the daemon, and there are currently no connections (to any computer) in CLOSE_WAIT, but there are 78 ESTABLISHED connections to that tracker.

comment:11 Changed 12 years ago by wfaulk

I now have 8 connections in CLOSE_WAIT. Seven are to that tracker. One is to a peer.

comment:12 Changed 12 years ago by wfaulk

Now it's 31 in CLOSE_WAIT, all to that tracker.

I'm now periodically recording the connections to that tracker. What I'm finding is that the connections are seldom closed, only added to. Last night, there were 99 connections to the tracker. This morning there are 131. A diff shows that 98 of the connections from last night are still there; only one went away. Should tracker connections really be staying open for 9 hours or more?
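
The snapshot comparison described above can be sketched as a small set difference over two saved lsof outputs. This is only an illustration: `compare_snapshots` is a hypothetical helper, and it treats a connection as "the same" when the whole lsof line (fd, addresses, ports, state) matches after whitespace is normalized.

```python
def compare_snapshots(old_text, new_text):
    """Compare two saved lsof outputs line by line.

    Returns a dict of three sets of normalized lines:
      persisted: present in both snapshots
      gone:      present only in the old snapshot
      added:     present only in the new snapshot
    """
    def normalize(text):
        # Collapse runs of whitespace so column alignment changes
        # between runs do not count as differences.
        return {" ".join(line.split()) for line in text.splitlines() if line.strip()}

    old_lines = normalize(old_text)
    new_lines = normalize(new_text)
    return {
        "persisted": old_lines & new_lines,
        "gone": old_lines - new_lines,
        "added": new_lines - old_lines,
    }
```

The size of the "gone" set corresponds to counting the `<` lines in `diff` output, and "persisted" is the number of connections that survived between the two snapshots.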

comment:13 Changed 12 years ago by charles

The odd thing is that libcurl should be handling those fds... not Transmission.

What version of libcurl do you have installed?

comment:14 Changed 12 years ago by wfaulk

curl-7.19.5. I'll try upgrading to 7.19.7....

comment:15 Changed 12 years ago by wfaulk

Before I restart to use the updated libcurl, here's the status of t-d's connections:

194 connections to the tracker, 92 in CLOSE_WAIT, the rest in ESTABLISHED. Same list of connections as before with new ones added, but another two have gone away, for three total. That is:

rasp% diff /tmp/u{1,3} | egrep -e '^<' | wc
      3      30     387

There are another six CLOSE_WAIT connections, all to another tracker.

comment:16 Changed 12 years ago by charles

I have a Fedora 11 box that has had the same Transmission session running for over a week. While I don't see any CLOSE_WAIT connections, I see over two dozen ESTABLISHED connections that have been unchanged over the last hour. There's definitely something fishy here.

comment:17 Changed 12 years ago by wfaulk

Oh, mine is an OpenFiler machine with a lot of additional software installed, if that makes any difference to you.

comment:18 Changed 12 years ago by charles

wfaulk: I'm using "lsof | grep transm" to look at the sockets. Is that better, worse, or no change from using the mechanism you listed in comment 15?

comment:19 Changed 12 years ago by wfaulk

I've been doing "lsof -n -p <t-d_pid>". Your way is slightly worse only in that it might match something that it shouldn't. "-n" just prevents a DNS lookup on the IP addresses, which is mostly useless in this context, and saves a lot of time when there are a lot of peers connected.

comment:20 Changed 12 years ago by wfaulk

The upgrade to libcurl 7.19.7 seems to have had some effect. There are 31 connections in CLOSE_WAIT, all to that one tracker, which is about on par with before, but the tracker only has 5 in ESTABLISHED.

comment:21 Changed 12 years ago by charles

I've made an experimental modification to libtransmission/web.c that does the following things:

  • Added a periodic timer that unconditionally pumps all the fds, just in case there's a bug somewhere either in libcurl, or libevent, or (more likely ;) web.c that causes socket events to not get propagated from libevent to libcurl.
  • Added code to cull out CURL* handles that have taken longer than we've allocated for them. By my reading of libcurl's docs, curl should be culling these itself, but maybe we're doing something wrong that prevents that.

Note this is experimental... actually it has a bug that will cause it to not shut down cleanly when you exit Transmission. That doesn't matter for these tests though.
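
To make the culling idea concrete, here is a rough sketch of the bookkeeping involved. It is not the code in the attached web.c: it is a Python stand-in, and `TransferRegistry` and its method names are invented for this example. In a real libcurl client, each culled handle would presumably be passed to `curl_multi_remove_handle()` and then `curl_easy_cleanup()`.

```python
import time


class TransferRegistry:
    """Tracks in-flight transfers and culls those past their deadline."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the logic can be exercised without sleeping.
        self._clock = clock
        self._deadlines = {}  # handle -> absolute deadline in clock units

    def add(self, handle, timeout_secs):
        """Register a transfer that is allowed to run for timeout_secs."""
        self._deadlines[handle] = self._clock() + timeout_secs

    def remove(self, handle):
        """Forget a transfer that completed normally."""
        self._deadlines.pop(handle, None)

    def cull_expired(self):
        """Drop and return every handle whose deadline has passed.

        Intended to be called from a periodic timer; the caller is
        responsible for actually closing the returned handles.
        """
        now = self._clock()
        expired = [h for h, deadline in self._deadlines.items() if deadline <= now]
        for handle in expired:
            del self._deadlines[handle]
        return expired
```

Calling `cull_expired()` from the same periodic timer that pumps the fds keeps the two safety nets on one schedule, which is a design choice of this sketch and not something stated about the attachment.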

Changed 12 years ago by charles

experimental replacement of libtransmission/web.c, rev 1

comment:22 Changed 12 years ago by wfaulk

I've been running with the test version of web.c in place for about eleven hours, and everything looks good. No extraneous connections to the tracker, though it's clear that it's the culler doing the work, as there are often connections in CLOSE_WAIT for longer than one would expect, but they do get removed.

comment:23 Changed 12 years ago by wfaulk

Still good.

comment:24 Changed 12 years ago by wfaulk

It's now not accepting RPC (transmission-remote, etc.) connections, and there are five of them in CLOSE_WAIT. It accepts the connection, but then doesn't do anything with it.

comment:25 Changed 12 years ago by wfaulk

It's also not accepting web console requests, which makes me wonder if it's accepting incoming torrent connections, since it's the same port. But I don't really have any way to check, since I can't get any administrative connection.

comment:26 Changed 12 years ago by charles

wfaulk: is this better, worse, or no change in 1.80 beta 1?

comment:27 Changed 12 years ago by mortennorby

Just as an FYI - a similar thing happens on my Mac.

Specifically, the number of connected peers diminishes over a couple of hours, and download often goes to 0, while upload continues at full speed.

Re-launching the app tends to get T to re-connect with a larger number of peers, which then decreases again over a few hours.

In one case (which may be a different problem) even after re-launching, connections would not go up, and the web interface did not respond.

Deleting all .resume files and dht.dat got the system going again (including the web interface), while still losing the connections slowly over a few hours.

As a verification, Vuze running on the same machine, and hence same network, manages to connect to more peers, to keep downloading from them, and even if not downloading, stay connected.

The network configuration is with a firewall that does not allow incoming connections.

comment:28 Changed 12 years ago by charles

mortennorby: what svn revision of Transmission are you seeing this behavior in?

comment:29 Changed 12 years ago by mortennorby

It was r9695.

comment:30 Changed 12 years ago by wfaulk

I've been running 1.80b3 for a day or so now with no problems. I'll update if I see any change.

Sorry for the slow response.

comment:31 Changed 12 years ago by charles

  • Milestone changed from None Set to 1.80
  • Resolution set to fixed
  • Status changed from reopened to closed