Opened 10 years ago

Closed 10 years ago

#3217 closed Bug (invalid)

Daemon running but doesn't respond on Synology NAS

Reported by: sarav
Owned by:
Priority: Normal
Milestone: None Set
Component: Daemon
Version: 1.93
Severity: Critical
Keywords:
Cc: therve@…

Description

I'm running the daemon on a Synology NAS. About once a day (that's how often I check), the transmission daemon just hangs and doesn't respond to any "messages". The processes are still running:

$ ps aux | grep transmission
 3417 blah      7596 S   transmission-daemon -f 
 3418 blah      7596 S   transmission-daemon -f 
 3419 blah      7596 R   transmission-daemon -f 
 3420 blah      7596 S   transmission-daemon -f 
<snip>

the port is still open:

$ netstat -tnl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:9091            0.0.0.0:*               LISTEN      
<snip>

but the daemon doesn't respond to the RPC calls or HTTP requests.

$ transmission-remote 9091 -si -b
posting:
--------
{"arguments":{},"method":"session-get","tag":0}

--------
* Couldn't find host localhost in the .netrc file, using defaults
* About to connect() to localhost port 9091 (#0)
*   Trying 127.0.0.1... * connected
* Connected to localhost (127.0.0.1) port 9091 (#0)
> POST /transmission/rpc HTTP/1.1
User-Agent: transmission-remote/1.93 (10621)
Host: localhost:9091
Accept: */*
Accept-Encoding: deflate, gzip
Content-Length: 48
Content-Type: application/x-www-form-urlencoded

$

I tried accessing the daemon from the webUI but the web UI doesn't load and it just keeps waiting. Tried listing session info using transmission-remote running on the same machine, and it would just keep waiting for a response. Also, tried accessing transmission using the Transdroid Android App and that wouldn't get a response either.
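The symptom above (port accepts connections, but no reply ever arrives) can be told apart from a dead daemon with a bounded probe. The following is only a sketch: the function name `rpc_probe` and the 10-second default are assumptions, and it uses plain curl against the same `/transmission/rpc` path and request body shown in the trace. Any HTTP status, even an error status, is treated as proof that the daemon is answering.

```shell
#!/bin/sh
# Hypothetical probe, not part of Transmission.
# rpc_probe HOST PORT [TIMEOUT]: print "responding", "timeout", or "unreachable".
rpc_probe() {
    host=$1
    port=$2
    timeout=${3:-10}
    rc=0
    # Without --fail, curl exits 0 for any HTTP reply, whatever its status code.
    curl --silent --output /dev/null --max-time "$timeout" \
         --data '{"arguments":{},"method":"session-get","tag":0}' \
         "http://$host:$port/transmission/rpc" || rc=$?
    case $rc in
        0)  echo responding ;;
        28) echo timeout ;;       # curl exit code 28: operation timed out
        *)  echo unreachable ;;   # e.g. connection refused, or curl missing
    esac
}
```

For example, `rpc_probe localhost 9091` would be expected to print `timeout` in the hung state described here, since the connection succeeds but no response follows.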

Other info that might be useful:

I'm seeding some torrents and downloading 1 or 2. Download is slow (poorly seeded torrent) and the upload is limited to 30 KB/s.

Transmission used to crash before, so the current setup is a wrapper script that restarts it immediately if it crashes more than 5 minutes after starting or restarting. It is not restarted if it crashes within 5 minutes of starting or restarting.
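The reporter's wrapper script is not included in the ticket. A minimal sketch of the restart policy it describes might look like the following; the names `should_restart`, `run_forever`, and `MIN_UPTIME` are illustrative assumptions, and `date +%s` is assumed to be available (it is in GNU and BusyBox `date`).

```shell
#!/bin/sh
# Hypothetical wrapper sketch: restart the daemon only if the previous run
# lasted at least MIN_UPTIME seconds, which avoids a tight crash loop.

MIN_UPTIME=300   # 5 minutes, as described in the report

# should_restart START END: succeed if the run lasted at least MIN_UPTIME seconds.
should_restart() {
    [ $(( $2 - $1 )) -ge "$MIN_UPTIME" ]
}

# run_forever CMD...: run CMD in the foreground and restart it per the policy above.
run_forever() {
    while :; do
        start=$(date +%s)
        "$@" || true              # the exit status is ignored; only uptime matters
        end=$(date +%s)
        should_restart "$start" "$end" || break
    done
}

# Example (not executed here): run_forever transmission-daemon -f
```

Note that a wrapper like this only reacts when the process exits, so it cannot help with the hang reported here, where the processes stay alive.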

Marking this as critical since it makes the daemon completely useless. I'm willing to try things out to help debug. Since it takes some time to hang, the tests are going to be slow.

Change History (11)

comment:1 Changed 10 years ago by charles

  • Summary changed from Daemon running but doesn't respond to Daemon running but doesn't respond on Synology NAS

comment:2 Changed 10 years ago by rb07

I have a very similar ticket here: https://trac.transmissionbt.com/ticket/3196

What I've been doing to make this version usable is run a cron job to restart the daemon when it becomes unresponsive. This is very easy, a one liner, since transmission-remote does return after a minute with a time-out.

$ cat transmission-watch.sh
#!/bin/bash 
#
# transmission-daemon watchdog (restart the daemon if it becomes unresponsive)
#
# If transmission-remote times out with a message like this:
#[10:23:01.513] transmission-remote: (localhost:9091) Timeout was reached
# it's time to restart the daemon.
# If the daemon is not running (perhaps it crashed) do the same.

OPTIONS="-n user:password"

transmission-remote $OPTIONS -si > /dev/null || \
/opt/etc/init.d/S99transmission restart

Of course it has to be adapted. A regular sh probably works just as well as bash. I do need the option shown (authentication). My init.d script is in a non-default place and its name follows the old SysV convention, and it does have a restart action (other scripts may not, but it's a simple stop then start).
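As a rough illustration of the adaptation described above, here is a plain sh variant for init scripts that lack a restart action. The function names and the `INIT_SCRIPT` and `REMOTE_OPTIONS` variables are assumptions introduced for this sketch; the path and credentials are placeholders carried over from the script above and need to be changed.

```shell
#!/bin/sh
# Sketch of the watchdog in plain sh, using stop + start instead of restart.
# INIT_SCRIPT and REMOTE_OPTIONS are placeholders to adapt to your system.

INIT_SCRIPT=${INIT_SCRIPT:-/opt/etc/init.d/S99transmission}
REMOTE_OPTIONS=${REMOTE_OPTIONS:-"-n user:password"}

# probe: succeed if the daemon answers a session-info request.
probe() {
    # $REMOTE_OPTIONS is left unquoted on purpose so it splits into arguments.
    transmission-remote $REMOTE_OPTIONS -si > /dev/null 2>&1
}

# restart_daemon: a plain stop followed by a start.
restart_daemon() {
    "$INIT_SCRIPT" stop
    "$INIT_SCRIPT" start
}

# watchdog: restart only when the probe fails (time-out or daemon not running).
watchdog() {
    probe || restart_daemon
}
```

A script would end with a call to `watchdog` and be scheduled from cron; the interval is a judgment call, since each probe against a hung daemon waits for the transmission-remote time-out.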

comment:3 Changed 10 years ago by rb07

An improvement on the script, in case anyone is interested: the daemon starts responding again if you send it a HUP signal, so there is no need to restart it, and you save all the file checking. But, as defined, HUP means reloading settings.json, which it does, so the daemon will use the last stored parameters, which may differ from the ones you last had running (e.g. different speeds or global seed ratio).

#!/bin/bash
#
# transmission-daemon watchdog (send HUP to the daemon if it becomes unresponsive)
#
# If transmission-remote times out with a message like this:
#[10:23:01.513] transmission-remote: (localhost:9091) Timeout was reached
# it's time to wake the daemon with a HUP signal.

OPTIONS="-n admin:password"

transmission-remote $OPTIONS -si > /dev/null || \
killall -HUP transmission-daemon

comment:4 follow-up: Changed 10 years ago by therve

I've hit a similar problem, a so-called "freeze" of transmission-daemon. I'm seeing lots of reports here and there, so I think it's a serious problem.

As in #3196, I've noticed that the thread is stuck while reading the threading pipe. I don't know if it's the problem, but looking at trevent.c I can definitely see some weirdness in pipe management:

  • The pipes are not set non-blocking. If that changes, the handling of EAGAIN should probably be changed too.
  • More importantly, EINTR is not handled when writing (the result is not handled at all), which could explain why the daemon is stuck waiting to wake up.

When my daemon freezes, the SIGHUP option above works, but you can also write directly to the pipe to wake it up. It's something like "echo 'r' > /proc/$id/fd/4" (if your process is stuck reading on 3, which seems to be repeatable).
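The pipe trick above can be wrapped in a small helper. This is only a sketch and assumes a Linux-style `/proc`; the function name `poke_pipe` is made up here, and the descriptor number must be checked per process rather than taken for granted.

```shell
#!/bin/sh
# Hypothetical helper: write one line to a descriptor of another process
# through /proc, as in the echo command quoted above.

# poke_pipe PID FD: succeed if the write was issued, fail if FD does not exist.
poke_pipe() {
    target="/proc/$1/fd/$2"
    if [ ! -e "$target" ]; then
        echo "no such descriptor: $target" >&2
        return 1
    fi
    echo 'r' > "$target"
}

# Example (not executed here): poke_pipe "$id" 4
```

Running `ls -l /proc/$id/fd` first shows which descriptors are pipes (they appear as `pipe:[inode]` links), which helps confirm the write end before poking it. You need to run this as the daemon's user or as root.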

It's particularly hard to test things on those ARM-based devices, but I'd be happy to try my best.

comment:5 Changed 10 years ago by therve

  • Cc therve@… added

comment:6 in reply to: ↑ 4 ; follow-up: Changed 10 years ago by rb07

Replying to therve:

The bug is in libevent. The workaround is in the ticket you mention, I closed it as solved, look at the solution: http://trac.transmissionbt.com/ticket/3196#comment:28

comment:7 in reply to: ↑ 6 ; follow-up: Changed 10 years ago by therve

Replying to rb07:

Replying to therve:

The bug is in libevent. The workaround is in the ticket you mention, I closed it as solved, look at the solution: http://trac.transmissionbt.com/ticket/3196#comment:28

It does indeed fix the problem, thanks (I was about to try it). Are you sure it's indeed a bug in libevent and not a misuse by transmission though? I think the remarks I wrote above are valid anyway.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 10 years ago by rb07

Replying to therve:

It does indeed fix the problem, thanks (I was about to try it).

How do you know it solves the problem? I tested the daemon for 3 days before closing my bug report. The problem shows up randomly, sometimes often, sometimes not.

Are you sure it's indeed a bug in libevent and not a misuse by transmission though? I think the remarks I wrote above are valid anyway.

I'm sure; just look at the debugger output I posted. The first time, I didn't know how transmission-daemon worked and just expected that a developer seeing the back trace could spot the problem.

When you analyze what the back trace says the problem is obvious: libevent called the peer-io call-back function which implies there is something to read, there wasn't so the function just blocks (and changing peer-io to non-blocking operation just makes things worse). There is another hint of what is happening on the last back trace: a thread is missing, it died, so probably libevent tried to raise an event which was not asked for, the SIGPIPE event, which the code sets to ignore not only on libevent but for the whole thread.

I agree with you that reading the transmission code shows a very confusing way to handle things, the loop that never should loop, the ignore signals which is just a quick and dirty way of not handling some errors.

comment:9 in reply to: ↑ 8 Changed 10 years ago by therve

Replying to rb07:

How do you know it solves the problem? I tested the daemon for 3 days before closing my bug report. The problem shows up randomly, sometimes often, sometimes not.

It was fairly consistent for me: the daemon never ran for more than an hour. It has now run for 15 hours straight, so it seems better at least.

When you analyze what the back trace says the problem is obvious: libevent called the peer-io call-back function which implies there is something to read, there wasn't so the function just blocks

So it might be a bug in epoll, but I would be surprised for it to be a bug in libevent.

(and changing peer-io to non-blocking operation just makes things worse).

If you don't change the way read and write behave, sure.

There is another hint of what is happening on the last back trace: a thread is missing, it died, so probably libevent tried to raise an event which was not asked for, the SIGPIPE event, which the code sets to ignore not only on libevent but for the whole thread.

I don't have the same behavior though, I still have the 4 threads running when it freezes.

comment:10 Changed 10 years ago by charles

Any news on this ticket?

comment:11 Changed 10 years ago by charles

  • Resolution set to invalid
  • Status changed from new to closed

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future.
