Opened 7 years ago

Closed 6 years ago

Last modified 5 years ago

#2842 closed Bug (fixed)

Transmission crashes randomly on ARM-based Synology NAS

Reported by: grzegorzdubicki
Owned by:
Priority: Normal
Milestone: 1.93
Component: Daemon
Version: 1.80
Severity: Critical
Keywords: crash, 1.80, random, NAS, ARM, MIPS, Headless, needinfo
Cc: grzegorz.dubicki@…, giovannibajo@…

Description

Transmission: 1.83, installed from the binary SPK (Synology NAS native) package obtained from http://forum.synology.com/enu/viewtopic.php?p=85166#p85166 .

(I apparently had the same problem with 1.80 installed from the binary ipk package.)

Platform: Linux 2.6.24, ARM architecture (Synology DS210j NAS).

After I added 19 torrents to the queue (I do not know whether the number matters), Transmission started to crash randomly.

The more I manipulate torrents (pause, resume, check individual file info, set priorities) in the web interface or via transmission-remote-dotnet, the sooner the crash comes, usually in the middle of those operations.

But even if I leave Transmission alone with those 19 torrents, it still crashes within a few hours at most.

I have switched the message level to 3, but that left no interesting clues in syslog: just some tracker and checksum errors, and nothing right before the crash.

Please let me know what info to include to help resolve this problem. I would really like to use Transmission, but I cannot for now. :(

Attachments (5)

settings.json (1.8 KB) - added by grzegorzdubicki 7 years ago.
my settings
transmission-mk.patch (660 bytes) - added by giovannibajo 6 years ago.
Patch to build old revisions of transmission in optware
peer-io.c (24.8 KB) - added by charles 6 years ago.
peer-io.h (11.7 KB) - added by charles 6 years ago.
transmission-native-arm-make.bz2 (24.9 KB) - added by elmer91 6 years ago.
Transmission make log (native ARM build)


Change History (135)

Changed 7 years ago by grzegorzdubicki

my settings

comment:1 Changed 7 years ago by charles

If you're experiencing crashes, libevent may be the cause.

  1. If your (package) system has libevent installed, update it. Older versions, such as v1.1, are known to contain bugs.
  2. Try setting, one by one, the following environment variables:
    • EVENT_NOEPOLL
    • EVENT_NOSELECT
    • EVENT_NOKQUEUE
    • EVENT_NOEVPORT
    • EVENT_NOPOLL

Libevent can use several different event mechanisms provided by the OS. Some OSes, however, seem to have a malfunctioning implementation of one of these mechanisms. Disabling the mechanisms one at a time lets us identify and turn off the one that is misbehaving, while still supporting the others.
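
The idea behind testing these variables one at a time can be sketched in C. This is only an illustration under assumptions: the backend order, the `backend_disabled` and `pick_backend` helpers, and the explicit environment array are all made up for the example and are not libevent's actual code, which consults the process environment itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Backends in a made-up preference order (assumption, not libevent's real table). */
static const char *const BACKENDS[] = { "EPOLL", "KQUEUE", "EVPORT", "POLL", "SELECT" };

/* Return 1 if env (a NULL-terminated array of "NAME=value" strings)
 * defines EVENT_NO<backend>, 0 otherwise. */
static int backend_disabled(const char *backend, const char *const env[])
{
    char key[32];
    int len;

    assert(backend != NULL && env != NULL);
    len = snprintf(key, sizeof key, "EVENT_NO%s=", backend);
    if (len < 0 || (size_t)len >= sizeof key)
        return 0; /* name too long for the buffer: treat as not disabled */
    for (size_t i = 0; env[i] != NULL; ++i) {
        if (strncmp(env[i], key, (size_t)len) == 0)
            return 1;
    }
    return 0;
}

/* Return the first backend that env does not disable, or NULL if all are disabled. */
static const char *pick_backend(const char *const env[])
{
    for (size_t i = 0; i < sizeof BACKENDS / sizeof BACKENDS[0]; ++i) {
        if (!backend_disabled(BACKENDS[i], env))
            return BACKENDS[i];
    }
    return NULL;
}
```

In this sketch, an environment containing only `EVENT_NOEPOLL=1` makes `pick_backend` skip the first entry and fall through to the next one in the list. If the crash disappears with exactly one mechanism disabled, that mechanism is the suspect.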

comment:2 Changed 7 years ago by livings124

  • Resolution set to invalid
  • Status changed from new to closed

Closing because we need more info, and you probably won't notice the previous post until you see it close.

comment:3 Changed 7 years ago by grzegorzdubicki

  • Resolution invalid deleted
  • Status changed from closed to reopened
  • Summary changed from Transmission crashes randomly to Transmission crashes randomly on ARM-based Synology NAS

I suppose my problem is the same as noted here http://forum.transmissionbt.com/viewtopic.php?f=2&t=9443&start=0 - maybe that helps.

I can not try your proposed solution as I don't have a working build environment to build Transmission for my NAS myself but I hope someone else will do that soon.

comment:4 follow-up: Changed 7 years ago by charles

You don't need to build Transmission for your NAS in order to test this: http://forum.transmissionbt.com/viewtopic.php?p=44444#p44444

comment:5 in reply to: ↑ 4 Changed 7 years ago by grzegorzdubicki

Replying to charles:

You don't need to build Transmission for your NAS in order to test this: http://forum.transmissionbt.com/viewtopic.php?p=44444#p44444

Thanks, I thought those were environment variables to set during libevent compilation.

I've got libevent 1.4.13, the latest stable ipkg binary from the Optware feeds, by the way.

I am now trying to run Transmission with EVENT_NOEPOLL=1. I'll let you know within a day or two whether it resolved the crashing problem.

comment:6 Changed 7 years ago by grzegorzdubicki

I've tried setting all those environment variables one by one and none of them helped: Transmission crashed every time after 5-15 minutes.

What now?

comment:7 Changed 7 years ago by grzegorzdubicki

  • Cc grzegorz.dubicki@… added

comment:8 Changed 7 years ago by bsgb

I'm having the same problem on my Synology DS209, but installed using the ipk package (from optware/ipkg).

comment:9 Changed 7 years ago by charles

grzegorzdubicki, bsgb: what do the crash messages say?

comment:10 Changed 7 years ago by bsgb

When I run transmission-daemon in the foreground (with a message-level of 3) I get: Segmentation fault (core dumped). Not very useful. But I have to say that it took a while for Transmission to crash: when I ran it in the background last night it mostly crashed within 5 minutes, now it took almost 15.

comment:11 follow-ups: Changed 7 years ago by einy

May I add to this too (Transmission v1.83-1 installed via ipkg on my Synology DS-409).

My Transmission works fine seeding >20 torrents. It crashes only when "checking" an already downloaded torrent (for reseeding), though not every time.

As I understand from other correspondence here on trac, the checking procedure is very resource-hungry. Maybe the whole "checking" code should be reviewed?

comment:12 in reply to: ↑ 11 Changed 7 years ago by grzegorzdubicki

Replying to einy:

May I add to this too (Transmission v1.83-1 installed via ipkg on my Synology DS-409).

My Transmission works fine seeding >20 torrents. It crashes only when "checking" an already downloaded torrent (for reseeding), though not every time.

As I understand from other correspondence here on trac, the checking procedure is very resource-hungry. Maybe the whole "checking" code should be reviewed?

I think it's possible that my Transmission also crashes only when "checking"!

Until now I thought that it was crashing all the time, randomly. But I have ~20 large (350MB-4GB) torrents active. That means that when I restart Transmission after it dies, it spends many minutes checking them on my DS210j's slow CPU, and it is probably still checking some torrent when it crashes.

Of course, that's just a guess on my part.

But I also think that it crashed less when I had fewer torrents, back when I started using Transmission.

Replying to charles:

grzegorzdubicki, bsgb: what do the crash messages say?

I haven't got any crash message yet; the Transmission process just dies silently every time.

But I'll try to do the same as bsgb in a day or two and post my results, hopefully with a core dump and its gdb analysis.

comment:13 in reply to: ↑ 11 ; follow-up: Changed 7 years ago by charles

Replying to einy:

As I understand from other correspondence here on trac, the checking procedure is very resource-hungry. Maybe the whole "checking" code should be reviewed?

If you can reduce the CPU and disk IO cost of reading one of grzegorzdubicki's 4GB torrents from disk and generating its SHA1 checksum, I'd be very happy to hear it. About six months ago I compared the speed and load of verifying a Linux ISO in Transmission, KTorrent, Deluge, rTorrent, etc and they were all about the same.

comment:14 in reply to: ↑ 13 Changed 7 years ago by einy

Replying to charles:

If you can reduce the CPU and disk IO cost of reading one of grzegorzdubicki's 4GB torrents from disk and generating its SHA1 checksum, I'd be very happy to hear it. About six months ago I compared the speed and load of verifying a Linux ISO in Transmission, KTorrent, Deluge, rTorrent, etc and they were all about the same.

I didn't say a word about the speed. For me checking a 4GB file for <1 hour is OK. I just would not like the daemon to crash.

comment:15 Changed 7 years ago by grzegorzdubicki

I've started Transmission with:

TR_DEBUG_FD=2 ./transmission-daemon -f ... 2>runlog

as advised here http://forum.transmissionbt.com/viewtopic.php?p=44564#p44564 and it crashed with this message:

*** glibc detected *** transmission-daemon: corrupted double-linked list: 0x002b2268 ***

and the log's tail is:

# tail /volume1/downloads/transmission-debug-log
[21:55:03.154] 82.176.248.156:34276 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.154] 24.137.75.33:17542 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.154] 94.96.181.241:3528 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.155] 89.32.124.6:19220 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.155] 81.83.49.76:15512 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.155] 94.96.71.106:42961 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.155] 204.188.164.155:20681 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.155] 59.189.12.196:55167 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
[21:55:03.155] 98.208.99.12:3972 bandwidth.c:289 is decrementing the IO's refcount from 2 to 1 (peer-io.c:575)
Aborted

..not very helpful I'm afraid. :(

comment:16 Changed 7 years ago by grzegorzdubicki

The whole log of Transmission running from 21:29 to 21:55 is ~835MB, so I've split it into 100MB parts and uploaded the 7-zipped beginning here:

http://uploading.com/files/c5d5a8cf/transmission-debug-log.001.7z/ (4,5 MB)

and the end here:

http://uploading.com/files/5c9d8m92/transmission-debug-log.009.7z/ (1,5 MB)

Let me know whether you managed to download those files correctly.

Disclaimer: the torrents that my log shows I was downloading were downloaded for the sole purpose of resolving this bug. I do not keep those files for more than 24 hours.

comment:17 Changed 7 years ago by spyz

I have the exact same problems as the original poster but with a QNAP TS-110

comment:18 Changed 7 years ago by charles

grzegorzdubicki is correct -- those log files are, unfortunately, not very helpful. :/

It's crazy that these platforms don't have any meaningful mechanism for debugging. Surely there's something available?

comment:19 Changed 7 years ago by charles

  • Keywords needinfo added; random? removed

comment:20 Changed 7 years ago by tonin

As posted on http://forum.transmissionbt.com/viewtopic.php?f=2&t=9443&start=30#p44833 here is the backtrace I could generate on a Droboshare. I hope it's useful, if not I can try to get more info if you provide me with instructions.

    /mnt/DroboShares/Drobo/DroboApps/core $ ../gdb/bin/gdb ../transmission-1.83/bin/transmission-daemon transmission-da-4094.core
    GNU gdb (GDB) 7.0.1
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "arm-none-linux-gnueabi".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /mnt/DroboShares/Drobo/DroboApps/transmission-1.83/bin/transmission-daemon...done.

    warning: core file may not match specified executable file.
    [New Thread 4102]
    [New Thread 4096]
    [New Thread 4094]
    Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libnsl.so.1
    Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib/libresolv.so.2
    Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/librt.so.1
    Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib/libdl.so.2
    Reading symbols from /lib/libz.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libz.so.1
    Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libm.so.6
    Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libgcc_s.so.1
    Reading symbols from /lib/libpthread.so.0...done.
    Loaded symbols for /lib/libpthread.so.0
    Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.3...(no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.3
    Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib/libnss_dns.so.2
    Core was generated by `/mnt/DroboShares/Drobo/DroboApps/transmission-1.83/bin/transmission-daemon -g /'.
    Program terminated with signal 11, Segmentation fault.
    #0  __tr_list_splice (head=0x3885e8) at list.c:184
    184   list.c: No such file or directory.
       in list.c
    (gdb) bt
    #0  __tr_list_splice (head=0x3885e8) at list.c:184
    #1  __tr_list_remove (head=0x3885e8) at list.c:191
    #2  0x00032408 in didWriteWrapper (io=0x214440, bytes_transferred=933) at peer-io.c:108
    #3  0x000334b0 in tr_peerIoTryWrite (io=0x214440, dir=TR_CLIENT_TO_PEER, limit=1024) at peer-io.c:941
    #4  tr_peerIoFlush (io=0x214440, dir=TR_CLIENT_TO_PEER, limit=1024) at peer-io.c:967
    #5  0x000298c4 in phaseOne (peerArray=<value optimized out>, dir=TR_CLIENT_TO_PEER) at bandwidth.c:221
    #6  0x0002b370 in tr_bandwidthAllocate (b=<value optimized out>, dir=TR_CLIENT_TO_PEER, period_msec=500) at bandwidth.c:278
    #7  0x00034dc0 in bandwidthPulse (foo=<value optimized out>, bar=<value optimized out>, vmgr=<value optimized out>) at peer-mgr.c:3095
    #8  0x0005ba64 in event_process_active ()
    #9  0x0005bea0 in event_base_loop ()
    #10 0x0005bcdc in event_loop ()
    #11 0x0005bac4 in event_dispatch ()
    #12 0x00021a2c in libeventThreadFunc (veh=<value optimized out>) at trevent.c:230
    #13 0x00012e8c in ThreadFunc (_t=0x1ae040) at platform.c:109
    #14 0x4015db60 in start_thread () from /lib/libpthread.so.0
    #15 0x402287e0 in clone () from /lib/libc.so.6
    #16 0x402287e0 in clone () from /lib/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

comment:21 Changed 7 years ago by tonin

I got another segfault with the following trace:

Core was generated by `/mnt/DroboShares/Drobo/DroboApps/transmission-1.83/bin/transmission-daemon -g /'.
Program terminated with signal 11, Segmentation fault.
#0  0x000337bc in tr_peerIoFlushOutgoingProtocolMsgs (io=0x8c4a78) at peer-io.c:985
985	peer-io.c: No such file or directory.
	in peer-io.c
(gdb) bt
#0  0x000337bc in tr_peerIoFlushOutgoingProtocolMsgs (io=0x8c4a78) at peer-io.c:985
#1  0x0002b320 in tr_bandwidthAllocate (b=<value optimized out>, dir=TR_CLIENT_TO_PEER, period_msec=500) at bandwidth.c:264
#2  0x00034dc0 in bandwidthPulse (foo=<value optimized out>, bar=<value optimized out>, vmgr=<value optimized out>) at peer-mgr.c:3095
#3  0x0005ba64 in event_process_active ()
#4  0x0005bea0 in event_base_loop ()
#5  0x0005bcdc in event_loop ()
#6  0x0005bac4 in event_dispatch ()
#7  0x00021a2c in libeventThreadFunc (veh=<value optimized out>) at trevent.c:230
#8  0x00012e8c in ThreadFunc (_t=0x1ae040) at platform.c:109
#9  0x4015db60 in start_thread () from /lib/libpthread.so.0
#10 0x402287e0 in clone () from /lib/libc.so.6
#11 0x402287e0 in clone () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

comment:22 Changed 7 years ago by charles

Does the problem persist in 1.90?

comment:23 Changed 7 years ago by tonin

I just compiled 1.90 for my Droboshare, I'll let you know how it works.

comment:24 Changed 7 years ago by tonin

  • Component changed from Transmission to Daemon

I just had a similar crash with 1.90 tonight. Here is the trace:

Core was generated by `/mnt/DroboShares/Drobo/DroboApps/transmission-1.90/bin/transmission-daemon -g /'.
Program terminated with signal 11, Segmentation fault.
#0  updateDesiredRequestCount (msgs=0x865690, now=1266870755766) at peer-msgs.c:1688
1688	peer-msgs.c: No such file or directory.
	in peer-msgs.c
(gdb) bt
#0  updateDesiredRequestCount (msgs=0x865690, now=1266870755766) at peer-msgs.c:1688
#1  0x0003b8b8 in peerPulse (vmsgs=<value optimized out>) at peer-msgs.c:1983
#2  0x00035558 in pumpAllPeers (foo=<value optimized out>, bar=<value optimized out>, vmgr=<value optimized out>) at peer-mgr.c:3101
#3  bandwidthPulse (foo=<value optimized out>, bar=<value optimized out>, vmgr=<value optimized out>) at peer-mgr.c:3114
#4  0x0005cb2c in event_process_active ()
#5  0x0005cf68 in event_base_loop ()
#6  0x0005cda4 in event_loop ()
#7  0x0005cb8c in event_dispatch ()
#8  0x00022248 in libeventThreadFunc (veh=<value optimized out>) at trevent.c:230
#9  0x00013db0 in ThreadFunc (_t=0x1af040) at platform.c:109
#10 0x4015db60 in start_thread () from /lib/libpthread.so.0
#11 0x402287e0 in clone () from /lib/libc.so.6
#12 0x402287e0 in clone () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

comment:25 follow-up: Changed 7 years ago by grzegorzdubicki

  • Version changed from 1.83 to 1.83+

comment:26 in reply to: ↑ 25 ; follow-up: Changed 7 years ago by spyz

Replying to grzegorzdubicki:

Maybe this http://forum.synology.com/enu/viewtopic.php?p=88087#p88087 and this http://forum.synology.com/enu/viewtopic.php?p=88089#p88089 is right and the problem was introduced after 1.74?

I'm using 1.76-1 without problems... Transmission on my Qnap TS-110 started having crash problems since 1.80

comment:27 Changed 7 years ago by charles

It's strange that this is showing up so much on these nice little platforms and not at all in my testing scenarios (and not on anyone else's PCs either, unfortunately for testing)...

I guess the next thing to do would be to find the revision number of 1.76 that reportedly works, and the revision number of 1.80 that doesn't, and then do a binary search of checking out revisions and testing them to find out where the crash occurred. This may involve four or five iterations of checkout+test, depending on how lucky we get.

tonin, since you're building your own copies of Transmission, do you have time to do this?

comment:28 Changed 7 years ago by tonin

Hi Charles and All,

I don't have much time, but I can probably give that a try with a bit of your help. I must also say that my Droboshare is not such a nice little platform and that it crashes quite often, so I'll have to deal with all that.

That being said, I first checked out and built the latest revision from log:branches/1.7x but I had to tweak a few things to make it compile. It was missing po/Makefile.in.in and third-party/libevent/test/regress.gen.*, not sure why, and the --disable-nls flag didn't work the same as on 1.8x series (disabled by default?).

I'm running this log:branches/1.7x@9888 now

For building some early 1.80, I guess I should take it from log:trunk, but do you have an idea which revisions I should start with, as upper and lower bounds?

comment:29 Changed 7 years ago by charles

Every checkout you do should be from trunk, not from the branches. Trunk is where all the changes were taking place, so that's where the bug will have been introduced.

1.76: r9395

1.80: r9984

...so the bad news is that 1.80's abnormally long development cycle means we've got a lot of ground to cover. The good news is that the first test will eliminate almost 300 revisions from the list of suspects :)

The midpoint between r9395 and r9984 is r9689, so that would be the first revision to test in trunk.
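
The halving described here can be written down as a small C sketch. The helper names `bisect_midpoint` and `bisect_max_steps` are hypothetical, and the sketch assumes a strict binary search in which every revision from the first bad one onward crashes reliably when tested.

```c
#include <assert.h>

/* Revision halfway between a known-good and a known-bad revision
 * (rounded down). This is the next revision to build and test. */
static int bisect_midpoint(int good, int bad)
{
    assert(good < bad);
    return good + (bad - good) / 2;
}

/* Worst-case number of build-and-test rounds until the known-good and
 * known-bad revisions are adjacent, i.e. the first bad revision is found. */
static int bisect_max_steps(int good, int bad)
{
    int span;
    int steps = 0;

    assert(good < bad);
    span = bad - good;           /* number of candidate "first bad" revisions */
    while (span > 1) {
        span = (span + 1) / 2;   /* worst case: the larger half remains */
        ++steps;
    }
    return steps;
}
```

With the bounds above, `bisect_midpoint(9395, 9984)` gives 9689, which matches the first revision to test, and `bisect_max_steps(9395, 9984)` comes out at ten rounds for a strict halving search. Fewer rounds can suffice if inspecting the remaining changesets points at a likely culprit early.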

comment:30 in reply to: ↑ 26 Changed 7 years ago by grzegorzdubicki

  • Version changed from 1.83+ to 1.80+

Replying to spyz:

Replying to grzegorzdubicki:

Maybe this http://forum.synology.com/enu/viewtopic.php?p=88087#p88087 and this http://forum.synology.com/enu/viewtopic.php?p=88089#p88089 is right and the problem was introduced after 1.74?

I'm using 1.76-1 without problems... Transmission on my Qnap TS-110 started having crash problems since 1.80

I have also switched to 1.76 (exactly this package: http://depositfiles.com/en/files/qdeuh7xun ) yesterday evening and it seems to be working fine after ~18h which never happened with >=1.80.

comment:31 Changed 7 years ago by deleter

I'm the one who built those 1.76 packages for ARM. I can build them again if needed, perhaps with some extra flags.

I want to help because I'm having similar troubles on my PowerPC-based Synology NAS.

comment:32 Changed 7 years ago by charles

deleter: please see comment #29.

I think the key now is to figure out where the issue was introduced.

Could you do a build of r9689 from svn trunk and see if the crash exists there?

comment:33 Changed 7 years ago by deleter

I've got this:

Fetching external item into 'transmission-/third-party/libevent'
svn: URL 'svn://svn.transmissionbt.com/libevent/branches/patches-1.4/libevent' doesn't exist

comment:34 follow-up: Changed 7 years ago by deleter

I've found the solution: I just copied the svn:externals value from the latest trunk commits:

third-party/libevent -r1558 svn://svn.transmissionbt.com/libevent/branches/patches-1.4/libevent

comment:35 Changed 7 years ago by deleter

I've got this at the end:
./configure: line 25254: syntax error near unexpected token `0.40.0,no-xml'
./configure: line 25254: ` IT_PROG_INTLTOOL(0.40.0,no-xml)'
make: * home/sergey/optware/builds/transmission/.configured Error 2

comment:36 Changed 7 years ago by charles

Just tear out the intltool stuff from configure.ac and then rerun autogen.sh. Look for the section that begins:

dnl This section is only used for internationalization.

dnl If you don't need translations and this section gives you trouble --

dnl such as if you're building for a headless system --

dnl it's okay to tear this section out and re-build the configure script.

comment:37 follow-up: Changed 6 years ago by giovannibajo

  • Cc giovannibajo@… added

I have the same problem with a MIPS-based build of transmission (optware ddwrt feed).

I have looked at the traceback posted above and also gave a quick look at the code. I have an idea about what might be going on, but it's just crystal ball watching at this point. This theory also explains why we see the segfault on embedded systems only (ARM/MIPS cpus).

Neither ARM nor MIPS CPUs allow unaligned accesses to memory. When you dereference a pointer, it must be aligned. Commit [9651] (which falls in the middle between 1.76 and 1.80) added a one-byte element to struct tr_peerIo (uint8_t isSeed). This change misaligned all the members below it, specifically the outbuf_datatype list, which is accessed through fancy pointer casts (list.h). My brain says that before [9651] outbuf_datatype was 4-byte aligned (but someone can verify that by printf-ing offsetof(struct tr_peerIo, outbuf_datatype)).

Charles, what do you think?
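
The offsetof check suggested in this comment can be sketched as follows. The two structs are hypothetical stand-ins, not the real struct tr_peerIo, and every field other than isSeed is chosen for illustration only. A conforming C compiler inserts padding so that each member stays aligned to its own type's requirement, so the sketch is expected to report an aligned offset in both layouts. It says nothing about what pointer casts elsewhere in the code may do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; stands in for whatever the list member really is. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Hypothetical layout before the one-byte member was added. */
struct io_before {
    int socket;
    struct list_node outbuf_datatype;
};

/* Hypothetical layout after adding the one-byte member. */
struct io_after {
    int socket;
    uint8_t isSeed;                   /* the new one-byte member */
    struct list_node outbuf_datatype; /* compiler pads before this member */
};

/* Compile-time check: the list member is still suitably aligned. */
_Static_assert(offsetof(struct io_after, outbuf_datatype) % _Alignof(struct list_node) == 0,
               "padding keeps the list member aligned");

static size_t list_offset_before(void) { return offsetof(struct io_before, outbuf_datatype); }
static size_t list_offset_after(void)  { return offsetof(struct io_after, outbuf_datatype); }

/* Return 1 if offset satisfies the alignment requirement of struct list_node. */
static int is_aligned(size_t offset)
{
    return offset % _Alignof(struct list_node) == 0;
}
```

Printing `list_offset_before()` and `list_offset_after()` on the target toolchain is the quickest way to test the theory against the real struct. If both offsets are aligned there as well, the extra byte alone would not explain the crash, unless the struct is packed or is reached through a pointer of a different type.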

comment:38 Changed 6 years ago by einy

BTW, for those who can't wait for the fix there is a temporary solution -

  • install process watcher/restarter called monit (ipkg install monit) and make it watch transmission-daemon process and restart it if necessary.

I have been using monit for two weeks (c. 1 transmission-daemon restart/day) and it works seamlessly.

You could also watch and restart the daemon using sbin/init, but I would not recommend playing with this file because you can make your system unbootable.

comment:39 in reply to: ↑ 37 ; follow-ups: Changed 6 years ago by charles

Replying to giovannibajo:

Neither ARM nor MIPS CPUs allow unaligned accesses to memory. When you dereference a pointer, it must be aligned. Commit [9651] (which falls in the middle between 1.76 and 1.80) added a one-byte element to struct tr_peerIo (uint8_t isSeed). This change misaligned all the members below it, specifically the outbuf_datatype list, which is accessed through fancy pointer casts (list.h). My brain says that before [9651] outbuf_datatype was 4-byte aligned (but someone can verify that by printf-ing offsetof(struct tr_peerIo, outbuf_datatype)).

Charles, what do you think?

It's an interesting theory, but it means that most software on the planet would crash on an ARM.

If this crash is affecting as many people as the # of comments in this ticket indicates, I don't understand why nobody has done a build of r9689 from trunk to say whether or not it crashes there. IMO doing a binary search of revisions to find the offending commit is the best way to solve this.

comment:40 in reply to: ↑ 39 Changed 6 years ago by giovannibajo

Replying to charles:

It's an interesting theory, but it means that most software on the planet would crash on an ARM.

I don't see why. The problem here is the casts being done by list.h, which is not exactly what most software on the planet does. This page gives a decent overview of the problem: http://lecs.cs.ucla.edu/wiki/index.php/XScale_alignment

If this crash is affecting as many people as the # of comments in this ticket indicates, I don't understand why nobody has done a build of r9689 from trunk to say whether or not it crashes there. IMO doing a binary search of revisions to find the offending commit is the best way to solve this.

I've been trying to compile a toolchain for my platform for the whole day, without success. I would surely help if I could recompile transmission. I will try to track down whoever packages transmission for ddwrt and see if he can help.

comment:41 Changed 6 years ago by giovannibajo

I eventually managed to build and test r9689. It's been running for the whole night which is much more than 1.91 usually does, so I would say it works. I'll try r9836 now (keep on bisecting).

comment:42 follow-up: Changed 6 years ago by charles

giovannibajo: great to hear. keep me posted; I'm eager to hear how it goes :)

comment:43 in reply to: ↑ 39 Changed 6 years ago by tonin

Replying to charles:

If this crash is affecting as many people as the # of comments in this ticket indicates, I don't understand why nobody has done a build of r9689 from trunk to say whether or not it crashes there. IMO doing a binary search of revisions to find the offending commit is the best way to solve this.

I'm sorry I'm not able to help more here, but my Droboshare is now crashing more often than transmission. The few times I've run transmission r9698 I didn't see a crash of it, it's about all I can say.

comment:44 in reply to: ↑ 42 Changed 6 years ago by giovannibajo

Replying to charles:

giovannibajo: great to hear. keep me posted; I'm eager to hear how it goes :)

r9836 seemed OK as well. It's hard to say because the crashes with 1.91 are really random, sometimes they take a few hours so it's a tough call to say "ok, this works, let's try next one".

r9910 has been running for 4 hours without problems now. I'll wait a little bit more, and then I have r9947 already compiled and ready for testing.

comment:45 Changed 6 years ago by charles

Bookkeeping: ticket #2994 may be a duplicate of this ticket.

comment:46 Changed 6 years ago by charles

giovannibajo: don't worry about running it overnight or for a day just to make sure. It'll save time in the long run if we don't have to backtrack.

comment:47 Changed 6 years ago by giovannibajo

r9947 has been running for 12 hours now. I am starting to wonder whether I can reproduce the problem with my own builds at all, and whether the random segfaults affecting my MIPS-based platform are the same as the ones seen on the ARM-based Synology, so that we can trust the report that says 1.80 is affected.

I'll compile and install 1.80 (r9984) now.

comment:48 Changed 6 years ago by giovannibajo

OK we're lucky: r9984 crashed after less than 30 minutes :)

I'm testing r9965 now.

comment:49 Changed 6 years ago by giovannibajo

Status update:

r9947 survived 12 hours. r9965 crashed in 3 hours.

I'm testing r9956 now.

comment:50 Changed 6 years ago by giovannibajo

Charles, I was looking at the diff here: http://trac.transmissionbt.com/changeset?new=9965@/&old=9947@/

I might be blind, but it looks to me that the only thing that could affect transmission-daemon is the libevent switch (it went from a committed version to an external one). Does it make sense?

comment:51 Changed 6 years ago by charles

I've got a sneaking suspicion it's going to be the change to peer-io.c, but I don't know how or why yet.

comment:52 Changed 6 years ago by charles

About my previous comment -- I'm referring to the changes to peer-io.c made in r9959 and r9960.

So my prediction is that r9956 will survive, but that r9960 will crash.

Some of the crash reports in ticket #2994 show the code being in this area, too.

However I don't yet see the cause of the bug.

comment:53 Changed 6 years ago by giovannibajo

One thing that isn't clear to me is why transmission dies with SIGABRT (at least, that is the only way I have seen it die within gdb). For a random corruption, I would have expected a SIGSEGV. SIGABRT really sounds like an assert() failing, but I didn't see any output in the gdb session (I should have seen it on stderr at least).

comment:54 in reply to: ↑ 34 Changed 6 years ago by grzegorzdubicki

Replying to deleter:

I've found the solution: I just copied the svn:externals value from the latest trunk commits:

third-party/libevent -r1558 svn://svn.transmissionbt.com/libevent/branches/patches-1.4/libevent

I would like to help. Transmission 1.80+ with my configuration crashes really quickly (in about 15 minutes, most of the time).

I have set up my build environment (cross building on Debian 5.04 i386 - Virtual Machine, as advised here: http://forum.synology.com/enu/viewtopic.php?f=143&t=16560), got the r9956 and configure-d (or rather autogen.sh-ed) it but I think I encountered the same problem during build as you did during 'make'. It stopped with:

make[2]: Entering directory '/root/transmission-r9956/Transmission/third-party/libevent'
make[2]: *** No rule to make target 'all'. Stop.

Is that the same as your problem? Even if so, I do not understand your terse note about how you resolved it. :( Can you help me?

Changed 6 years ago by giovannibajo

Patch to build old revisions of transmission in optware

comment:55 follow-up: Changed 6 years ago by giovannibajo

grzegorzdubicki: I've just attached a patch to transmission.mk that will fix building it for you. I suppose you're using an optware checkout in the first place... otherwise, just run:

$ svn ps svn:externals 'third-party/libevent -r1558 svn://svn.transmissionbt.com/libevent/branches/patches-1.4/libevent' .
$ svn up

I'm testing r9956. Why don't you test r9960 instead, as suggested by Charles?

comment:56 Changed 6 years ago by giovannibajo

r9956 survived 12 hours. I'm testing r9960 now, following Charles' intuition.

comment:57 Changed 6 years ago by charles

:)

comment:58 Changed 6 years ago by giovannibajo

r9960 is still running after 12 hours, so it doesn't have the bug.

I'll try r9962, which is immediately after the libevent change.

comment:59 follow-up: Changed 6 years ago by kylechen

  • Keywords 1.80 random NAS ARM MIPS Headless added

giovannibajo, I don't think you need to run each test for 12 hours. I have a MIPS NAS using the optware Transmission build, and I see exactly the same problem as described here for ARM (tested 1.75, 1.76, 1.80, 1.83, 1.90, 1.91). With any 1.80+ version, if I start a few very large downloads, let the speed go above 100KB/s, and then start hash-checking one of the torrents, Transmission never survives 30 seconds. Now I'm running 1.76 and everything works fine. I wish I could have a 1.76-based Transmission with magnet link support; that would be perfect for me. I haven't managed to compile Transmission for my machine yet, sadly, because of technical problems with the build environment. But I sincerely appreciate your work and hope you succeed!

If anyone can conveniently make MIPS builds, compile a series of suspicious revisions and, no matter how many there are, I can give a precise test result in less than an hour.

comment:60 in reply to: ↑ 59 Changed 6 years ago by giovannibajo

Replying to kylechen:

With any 1.80+ Transmission, if I start a few very large downloads, let the speed go over 100KB/s, and then start hash checking one of the torrents, Transmission never survives 30 seconds.

I will try, thanks!

I have not managed to compile Transmission on my machine yet, sadly, because of problems setting up the build environment. But I sincerely appreciate your work and hope you succeed!

If anyone can conveniently make MIPS builds, please compile a series of suspect revisions; no matter how many versions there are, I can give a precise test result in less than an hour.

Be my guest then: http://rasky.develer.com/transmission_mips

These are all the builds I made. The "1.91" in the filename is wrong, just ignore it. They are all builds from SVN trunk at different points in history.

Can you please try r9956, r9960, r9962 and r9965?

comment:61 Changed 6 years ago by kylechen

giovannibajo, I just got back home and am starting the tests now; give me a few minutes.

comment:62 in reply to: ↑ 55 Changed 6 years ago by grzegorzdubicki

Replying to giovannibajo:

grzegorzdubicki: I've just attached a patch to transmission.mk that will fix building it for you. I suppose you're using an optware checkout in the first place... otherwise, just run:

$ svn ps svn:externals 'third-party/libevent -r1558 svn://svn.transmissionbt.com/libevent/branches/patches-1.4/libevent'
$ svn up

Ok, thanks I will try that when I have some free time for this matter.

I'm testing r9956. Why don't you test r9960 instead, as suggested by Charles?

I thought it would be good to test the same version on a different platform before removing it from the to-test list.. Of course I will build and test the most appropriate version according to this issue's history when I get back to this.

comment:63 Changed 6 years ago by kylechen

Environment check:

r9689: OK.

1.91: fail (used to determine the typical time to crash; it ran 5 minutes before crashing).

Results:

r9965: fail.

r9962: fail.

r9960: fail.

r9956: still running, already past the average time to crash.

Charles said: "So my prediction is that r9956 will survive, but that r9960 will crash." I think he had a really good hunch, if that is the word.

comment:64 Changed 6 years ago by kylechen

Dear giovannibajo, r9956 is still running, so I believe I can give it a pass. So r9956 passes and r9960 crashes. Would you please build r9958 and r9959 for me? Then we have a good chance of identifying the offending revision for sure tonight, which should save the developers some time in fixing it.

comment:65 follow-up: Changed 6 years ago by giovannibajo

r9958 and r9959 built and available on the same URL. Meanwhile, I can't make transmission crash anymore, I'm not sure why. Kylechen, you're our only hope :)

comment:66 in reply to: ↑ 65 Changed 6 years ago by grzegorzdubicki

Replying to giovannibajo:

r9958 and r9959 built and available on the same URL. Meanwhile, I can't make transmission crash anymore, I'm not sure why. Kylechen, you're our only hope :)

I wish I could join testing on my ARM-based NAS but I can not build Trans. :(

I am trying it on i386 Debian 5.0.4 as instructed in http://forum.synology.com/enu/viewtopic.php?f=143&t=16560 getting the source from

-r 9958 svn://svn.m0k.org/Transmission/trunk

with libevent from:

-r 1558 svn://svn.transmissionbt.com/libevent/branches/patches-1.4

with successfully compiling libevent first with:

./configure --host=arm-none-linux-gnueabi --target=arm-none-linux-gnueabi --build=i686-pc-linux --prefix=/usr/local
make

and then configuring Trans. with:

CPPFLAGS="-O0 -g -I /root/transmission-r9956/Transmission/third-party/openssl/include/openssl" LDFLAGS="-L /root/transmission-r9956/Transmission/third-party/openssl/lib" ./autogen.sh --host=arm-none-linux-gnueabi --target=arm-none-linux-gnueabi --build=i686-pc-linux --prefix=/usr/local --disable-gtk --disable-cli --disable-mac --disable-nls

which results in:

(...)
config.status: creating third-party/Makefile
config.status: creating third-party/miniupnp/Makefile
config.status: creating third-party/libnatpmp/Makefile
config.status: creating third-party/dht/Makefile
config.status: creating macosx/Makefile
config.status: creating gtk/Makefile
config.status: creating gtk/icons/Makefile
config.status: creating web/Makefile
config.status: creating web/images/Makefile
config.status: creating web/images/buttons/Makefile
config.status: creating web/images/graphics/Makefile
config.status: creating web/images/progress/Makefile
config.status: creating web/javascript/Makefile
config.status: creating web/javascript/jquery/Makefile
config.status: creating web/stylesheets/Makefile
config.status: creating po/Makefile.in
config.status: executing depfiles commands
config.status: executing po/stamp-it commands
config.status: error: po/Makefile is not ready.

Now type 'make' to compile Transmission.

but then:

make

cd . && /bin/sh /root/transmission-r9956/Transmission/missing --run aclocal-1.10 -I m4
 cd . && /bin/sh /root/transmission-r9956/Transmission/missing --run automake-1.10 --gnu
cd . && /bin/sh /root/transmission-r9956/Transmission/missing --run autoconf
/bin/sh ./config.status --recheck
running CONFIG_SHELL=/bin/sh /bin/sh ./configure  --enable-maintainer-mode --host=arm-none-linux-gnueabi --target=arm-none-linux-gnueabi --build=i686-pc-linux --prefix=/usr/local --disable-gtk --disable-cli --disable-mac --disable-nls build_alias=i686-pc-linux host_alias=arm-none-linux-gnueabi target_alias=arm-none-linux-gnueabi LDFLAGS=-L /root/transmission-r9956/Transmission/third-party/openssl/lib CPPFLAGS=-O0 -g -I /root/transmission-r9956/Transmission/third-party/openssl/include/openssl --enable-static --disable-shared -q  --no-create --no-recursion
appending configuration tag "CXX" to libtool
appending configuration tag "F77" to libtool
configure: WARNING: In the future, Autoconf will not detect cross-tools
whose name does not start with the host triplet.  If you think this
configuration is useful to you, please write to autoconf@gnu.org.
configure: WARNING: using our own libevent from third-party/libevent/
configure: WARNING: if you are cross-compiling this is probably NOT what you want.


Configuration:

        Source code location:          .
        Compiler:                      arm-none-linux-gnueabi-g++
        System or bundled libevent:    bundled

        Build Mac client:              no
        Build GTK+ client:             no
          ... with canberra support:
          ... with gio support:        no
          ... with dbus-glib support:  no
          ... with libgconf support:
          ... with libnotify support:  no
        Build Command-Line client:     no
        Build Daemon:                  yes

 /bin/sh ./config.status
config.status: creating Makefile
config.status: creating transmission.spec
config.status: creating cli/Makefile
config.status: creating daemon/Makefile
config.status: creating doc/Makefile
config.status: creating libtransmission/Makefile
config.status: creating third-party/Makefile
config.status: creating third-party/miniupnp/Makefile
config.status: creating third-party/libnatpmp/Makefile
config.status: creating third-party/dht/Makefile
config.status: creating macosx/Makefile
config.status: creating gtk/Makefile
config.status: creating gtk/icons/Makefile
config.status: creating web/Makefile
config.status: creating web/images/Makefile
config.status: creating web/images/buttons/Makefile
config.status: creating web/images/graphics/Makefile
config.status: creating web/images/progress/Makefile
config.status: creating web/javascript/Makefile
config.status: creating web/javascript/jquery/Makefile
config.status: creating web/stylesheets/Makefile
config.status: creating po/Makefile.in
config.status: executing depfiles commands
config.status: executing po/stamp-it commands
config.status: error: po/Makefile is not ready.
make: *** [Makefile] Error 1

??

comment:67 follow-up: Changed 6 years ago by kylechen

Surprisingly, r9958 and r9959 are both running all right, even though only 3 lines differ between r9959 and r9960.
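The narrowing-down carried out in the comments above is a bisection over SVN revisions. As a minimal sketch of that procedure (not a tool used in this ticket), the C fragment below assumes a hypothetical callback that reports whether a build of a given revision crashes under load, and assumes that every revision after the first bad one also crashes. The stand-in `reported_mips_result` only encodes the MIPS results posted in this thread (r9956, r9958 and r9959 survive; r9960, r9962 and r9965 crash).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: true if the build of `rev` crashes under test. */
typedef bool (*crash_test_fn)(int rev);

/*
 * Returns the first crashing revision in the range (good, bad].
 * Assumes `good` is known to survive, `bad` is known to crash, and
 * every revision after the first bad one crashes as well.
 */
int first_bad_revision(int good, int bad, crash_test_fn crashes)
{
    assert(good < bad);

    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (crashes(mid))
            bad = mid;   /* the first bad revision is mid or earlier */
        else
            good = mid;  /* the first bad revision is after mid */
    }
    return bad;
}

/* Stand-in for real test runs, encoding the MIPS results reported here. */
bool reported_mips_result(int rev)
{
    return rev >= 9960;
}
```

With these results, `first_bad_revision(9956, 9965, reported_mips_result)` returns 9960. In practice each call to the callback is a manual test run, and a build that crashes only sometimes breaks the assumption the search relies on.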

comment:68 follow-up: Changed 6 years ago by elmer91

It seems that ongoing tests are performed on MIPS platform.

I'm also running Transmission (IPKG) on an ARM-based Synology NAS (DS-207+). I have had exactly the same problems since upgrading to 1.80+: Transmission was crashing 2 or 3 times per day (2 or 3 active torrents).

I managed to build transmission for my system: Linux NAS 2.6.15 #959 Fri Nov 13 02:49:27 CST 2009 armv5tejl GNU/Linux

I'm currently testing r9960 (built with the native compiler on the ARM system):

svn co -r9960 svn://svn.m0k.org/Transmission/trunk Transmission

Quite straightforward:

  • no need to use libevent from another branch
  • line 25802 of configure, IT_PROG_INTLTOOL(0.40.0,no-xml), has to be commented out

configure --prefix=/opt --disable-gtk --disable-nls --disable-mac

A few warnings are emitted (gcc 3.4.6):

peer-io.c: In function `didWriteWrapper':
peer-io.c:91: warning: cast increases required alignment of target type
peer-io.c: In function `trDatatypeFree':
peer-io.c:519: warning: cast increases required alignment of target type
peer-io.c: In function `tr_peerIoFlushOutgoingProtocolMsgs':
peer-io.c:984: warning: cast increases required alignment of target type
peer-msgs.c: In function `tr_generateAllowedSet':
peer-msgs.c:630: warning: cast increases required alignment of target type
./libtransmission.a(fdlimit.o): In function `tr_prefetch':
/volume1/public/make/Transmission/libtransmission/fdlimit.c:244: warning: warning: posix_fadvise64 is not implemented and will always fail

r9960 is currently under testing with low/moderate load (total: 3 torrents, 100 peers, 100KB down, 75 KB up)

I will report progress.
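The "cast increases required alignment of target type" warnings above are what GCC prints under -Wcast-align when a pointer is cast to a type with a stricter alignment requirement, for example from a byte pointer to a uint32_t pointer. Dereferencing such a pointer at a misaligned address is undefined behaviour in C and can fault or yield unexpected data on some ARM and MIPS cores, although the warning alone does not prove that these particular lines misbehave. As a minimal sketch of the usual safe alternatives (the function names `load_u32` and `load_be32` are hypothetical and are not taken from peer-io.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a host-order 32-bit value from any offset, aligned or not. */
uint32_t load_u32(const uint8_t *buf, size_t offset)
{
    uint32_t value;

    assert(buf != NULL);
    /* memcpy has no alignment requirement, unlike *(uint32_t *)(buf + offset). */
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Reads a big-endian (network order) 32-bit value one byte at a time. */
uint32_t load_be32(const uint8_t *buf, size_t offset)
{
    const uint8_t *p;

    assert(buf != NULL);
    p = buf + offset;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

For a buffer holding the bytes 00 12 34 56 78, `load_be32(buf, 1)` returns 0x12345678 regardless of the host's byte order or the buffer's alignment.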

comment:69 in reply to: ↑ 68 ; follow-up: Changed 6 years ago by grzegorzdubicki

Replying to elmer91:

It seems that ongoing tests are performed on MIPS platform.

I'm also running Transmission (IPKG) on an ARM-based Synology NAS (DS-207+). I have had exactly the same problems since upgrading to 1.80+: Transmission was crashing 2 or 3 times per day (2 or 3 active torrents).

I managed to build transmission for my system: Linux NAS 2.6.15 #959 Fri Nov 13 02:49:27 CST 2009 armv5tejl GNU/Linux

I'm currently testing r9960 (...)

Thank you for the tips, but I still can not compile it on either my ARM NAS or my i386 VM. I have already spent hours trying to do it and I do not want to spend any more (at least for now).

But I could test your build on my quickly-crashing config if you would share it.

comment:70 follow-ups: Changed 6 years ago by grzegorzdubicki

If you are curious: I get the same results as Pvt_Ryan here http://forum.transmissionbt.com/viewtopic.php?f=2&t=6339 when trying to compile on my NAS but there is no "libcurl3-openssl-dev" or similar package in my ipkg feed (http://ipkg.nslu2-linux.org/feeds/optware/cs08q1armel/cross/unstable/) to move forward..

comment:71 in reply to: ↑ 70 Changed 6 years ago by elmer91

Replying to grzegorzdubicki:

If you are curious: I get the same results as Pvt_Ryan here http://forum.transmissionbt.com/viewtopic.php?f=2&t=6339 when trying to compile on my NAS but there is no "libcurl3-openssl-dev" or similar package in my ipkg feed (http://ipkg.nslu2-linux.org/feeds/optware/cs08q1armel/cross/unstable/) to move forward..

In addition to the standard build tools already installed on my system (gcc, binutils, automake, ...), I had to install:

ipkg install openssl-dev
ipkg install libcurl-dev

I also had to patch /opt/lib/pkgconfig/libcurl.pc (commenting out the URL line). My system is nearly up to date (ipkg upgrade done last month).

My knowledge of building an IPKG package is limited, but I can share my freshly built transmission-daemon binary.

By the way, r9960 has run all night without any problem (still downloading). Usually Transmission 1.80+ crashes 2 or 3 times per day.

I will try to build a 1.80 release from SVN in order to be sure I can reproduce the problem with my custom build. I was previously using prebuilt packages.

comment:72 in reply to: ↑ 69 ; follow-up: Changed 6 years ago by elmer91

Replying to grzegorzdubicki:

But I could test your build on my quickly-crashing config if you would share it.

You don't need a complete IPKG package:

  • Just download the following file, save it to your NAS system
  • Change transmission-daemon path in your starting script

Transmission r9960 for ARM based IPKG systems: http://rapidshare.com/files/360974520/transmission-daemon-r9960.html

It should work ...

comment:73 in reply to: ↑ 67 Changed 6 years ago by giovannibajo

Replying to kylechen:

Surprisingly, r9958 and r9959 are both running all right, even though only 3 lines differ between r9959 and r9960.

Charles, any idea what might be going on? I think this is the end of our bisect tests, so now it's a matter of debugging.

comment:74 in reply to: ↑ 70 ; follow-up: Changed 6 years ago by elmer91

Replying to grzegorzdubicki:

If you are curious: I get the same results as Pvt_Ryan here http://forum.transmissionbt.com/viewtopic.php?f=2&t=6339 when trying to compile on my NAS but there is no "libcurl3-openssl-dev" or similar package in my ipkg feed (http://ipkg.nslu2-linux.org/feeds/optware/cs08q1armel/cross/unstable/) to move forward..

Try:

  • ipkg update
  • ipkg install openssl-dev
  • ipkg install libcurl-dev

Both exist in your IPKG feed! You may have to patch /opt/lib/pkgconfig/libcurl.pc (commenting out URL line).

Name: libcurl
#URL: http://curl.haxx.se/
Description: Library to transfer files with ftp, http, etc.

pkgconfig gave me an error regarding this line...

comment:75 in reply to: ↑ 72 ; follow-up: Changed 6 years ago by grzegorzdubicki

Replying to elmer91:

Replying to grzegorzdubicki:

But I could test your build on my quickly-crashing config if you would share it.

You don't need a complete IPKG package:

  • Just download the following file, save it to your NAS system
  • Change transmission-daemon path in your starting script

Transmission r9960 for ARM based IPKG systems: http://rapidshare.com/files/360974520/transmission-daemon-r9960.html

It should work ...

Can not run it. :(

# /opt/bin/transmission-daemon
/bin/ash: /opt/bin/transmission-daemon: not found

perhaps it is because it has been compiled for Linux 2.4.x and my NAS runs 2.6.x:

# file /opt/bin/transmission-daemon
/opt/bin/transmission-daemon: ELF 32-bit LSB executable, ARM, version 1, dynamically linked (uses shared libs), for GNU/Linux 2.4.3, not stripped
# file /opt/bin/transmission-daemon-1.76
/opt/bin/transmission-daemon-1.76: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.14, stripped
# uname -a
Linux Serwer 2.6.24 #959 Fri Nov 13 01:40:45 CST 2009 armv5tejl GNU/Linux

comment:76 in reply to: ↑ 74 Changed 6 years ago by grzegorzdubicki

Replying to elmer91:

Replying to grzegorzdubicki:

If you are curious: I get the same results as Pvt_Ryan here http://forum.transmissionbt.com/viewtopic.php?f=2&t=6339 when trying to compile on my NAS but there is no "libcurl3-openssl-dev" or similar package in my ipkg feed (http://ipkg.nslu2-linux.org/feeds/optware/cs08q1armel/cross/unstable/) to move forward..

Try:

  • ipkg update
  • ipkg install openssl-dev
  • ipkg install libcurl-dev

Both exist in your IPKG feed! You may have to patch /opt/lib/pkgconfig/libcurl.pc (commenting out URL line).

Name: libcurl
#URL: http://curl.haxx.se/
Description: Library to transfer files with ftp, http, etc.

pkgconfig gave me an error regarding this line...

Thanks, but it did not help. I already had both of these packages installed (openssl-dev 0.9.8m-2 and libcurl-dev 7.20.0-1), and that does not stop ./configure from failing to find the OpenSSL files.

Changed 6 years ago by charles

Changed 6 years ago by charles

comment:77 follow-ups: Changed 6 years ago by charles

What happens if you build from trunk, but drop in the attached versions of peer-io.c and peer-io.h into libtransmission/ before you build?

comment:78 in reply to: ↑ 75 Changed 6 years ago by elmer91

Replying to grzegorzdubicki:

perhaps it is because it has been compiled for Linux 2.4.x and my NAS runs 2.6.x:

Strange, my NAS is also running Linux 2.6 !

Yours:

# uname -a
Linux Serwer 2.6.24 #959 Fri Nov 13 01:40:45 CST 2009 armv5tejl GNU/Linux

Mine:

bash-3.2# uname -a
Linux NAS 2.6.15 #959 Fri Nov 13 02:49:27 CST 2009 armv5tejl GNU/Linux

comment:79 in reply to: ↑ 77 Changed 6 years ago by elmer91

Replying to charles:

What happens if you build from trunk, but drop in the attached versions of peer-io.c and peer-io.h into libtransmission/ before you build?

For the time being, I'm trying to reproduce the problem with custom build:

  • r9960 has survived 24h (I was expecting a failure)
  • r9968 is currently under testing (I expect it to fail as 1.80 from IPKG was failing)

Usually T. (1.80+) fails at least once a day on my system...

I'll wait a little longer and then try the trunk version with the modified peer-io files.

comment:80 in reply to: ↑ 77 Changed 6 years ago by kylechen

Replying to charles:

What happens if you build from trunk, but drop in the attached versions of peer-io.c and peer-io.h into libtransmission/ before you build?

I would appreciate it if anyone could make me a MIPS build with this; then I can try it out.

I am pretty sure about my test results above: revisions from r9960 onward crash within 5 minutes and revisions before r9960 run indefinitely. Anyway, this evening when I get home I'll redo the test on r9959 and r9960 to clear up any doubt.

comment:81 follow-up: Changed 6 years ago by charles

kylechen: I think we've already nailed down the revision range closely enough. However I would be very interested in whether or not the new versions of peer-io.h and peer-io.c fix the problem.... could you do a build + test with them?

comment:82 in reply to: ↑ 81 ; follow-up: Changed 6 years ago by elmer91

Replying to charles:

kylechen: I think we've already nailed down the revision range closely enough. However I would be very interested in whether or not the new versions of peer-io.h and peer-io.c fix the problem.... could you do a build + test with them?

I stop my tests:

I was expecting failures for both. It seems that my home made builds don't have the same crash problem as IPKG builds. I have compared both binaries: they are very similar. I don't really understand.

Nevertheless, as advised by Charles, I'm now trying a trunk version with the modified peer-io files. But I'm afraid this test will not be significant on my system, as my builds don't crash anymore!

comment:83 Changed 6 years ago by broadter

I have tested the 1.91 version with the peer-io.h and peer-io.c provided by charles, and it crashed again after about 15 minutes.

My system is MIPS-based.

I also tested the latest version (r10373) on my system. Because the download rate was very slow (about 5KB/s), it did not crash for 2 hours; usually it crashes once the download rate has been above 100KB/s for a while. I will test this version for the whole night, and tomorrow I will test the latest version (r10373) + peer-io.h and peer-io.c.

Any suggestions? I will do more tests!

comment:84 follow-up: Changed 6 years ago by broadter

r10373 also crashed. I am now testing r10373 + peer-io.h and peer-io.c!

comment:85 Changed 6 years ago by giovannibajo

It might even be a miscompilation (bug in GCC), which would explain why elmer91's own builds don't crash.

I will provide an updated build with modified peer-io files later.

comment:86 in reply to: ↑ 82 ; follow-up: Changed 6 years ago by broadter

Replying to elmer91:

Replying to charles:

kylechen: I think we've already nailed down the revision range closely enough. However I would be very interested in whether or not the new versions of peer-io.h and peer-io.c fix the problem.... could you do a build + test with them?

I stop my tests:

I was expecting failures for both. It seems that my home made builds don't have the same crash problem as IPKG builds. I have compared both binaries: they are very similar. I don't really understand.

Nevertheless, as advised by Charles, I'm now trying a trunk version with the modified peer-io files. But I'm afraid this test will not be significant on my system, as my builds don't crash anymore!

Would you provide the Makefile in libtransmission and the compile output (use V=99)? Thanks!

comment:87 in reply to: ↑ 86 ; follow-up: Changed 6 years ago by elmer91

Replying to broadter:

Would you provide the Makefile in libtransmission and the compile output (use V=99)? Thanks!

I know there are some differences between the IPKG builds and mine: IPKG packages are cross-built using gcc-3.4.3-glibc-2.3.2, while my builds are native builds using gcc-3.4.6.

I have compared the binaries, and there are differences in dependencies (found using strings, as ldd is not available).

IPKG builds:

  • /lib/ld-linux.so.2
  • libevent-1.4.so.2
  • libnsl.so.1
  • libresolv.so.2
  • librt.so.1
  • libcurl.so.4
  • libssl.so.0.9.7
  • libcrypto.so.0.9.7
  • libdl.so.2
  • libz.so.1
  • libm.so.6
  • libpthread.so.0
  • libc.so.6

My builds:

  • /lib/ld-linux.so.2
  • libevent-1.4.so.2
  • libcurl.so.4
  • libssl.so.0.9.7
  • libcrypto.so.0.9.7
  • libz.so.1
  • libpthread.so.0
  • libc.so.6

I have investigated a little bit, but have not yet found any reason for the differences in dependencies!

I have attached Makefile for both Transmission and libtransmission on my system (in addition to config.log and make.log with V=99)

Changed 6 years ago by elmer91

Transmission make log (native ARM build)

comment:88 in reply to: ↑ 87 Changed 6 years ago by broadter

Replying to elmer91:

Replying to broadter:

I have investigated a little bit, but have not yet found any reason for the differences in dependencies!

I have attached Makefile for both Transmission and libtransmission on my system (in addition to config.log and make.log with V=99)

Thank you very much! I will study the log and do some new test!

comment:89 in reply to: ↑ 84 Changed 6 years ago by broadter

Replying to broadter:

r10373 also crashed. I am now testing r10373 + peer-io.h and peer-io.c!

It crashed too for lasting more than 6 hours!

comment:90 follow-ups: Changed 6 years ago by charles

This is all very confusing. We narrow it down to a very small range of revisions, but then it doesn't crash anymore. Except it does! But with no details. And maybe for one person and not another. And maybe it's a dependency problem! Or the way it was compiled! And I don't have any idea what "It crashed too for lasting more than 6 hours!" means.

If there's no more useful information to be had on this ticket, maybe it should be closed as invalid.

comment:91 in reply to: ↑ 90 ; follow-up: Changed 6 years ago by elmer91

Replying to charles:

And maybe it's a dependency problem!

I don't think it is a dependency problem: I was just saying that ARM native builds are very different from IPKG cross builds !

  • not same GCC version
  • not same library dependencies

IPKG builds have been crashing 2 or 3 times per day (since 1.80). All my native builds (r9960, 1.80 and last week's trunk) have survived at least 24h.

My opinion is that a slight code change has pointed out a GCC bug on some platforms (MIPS and ARM). BTW, it would be interesting to compare GCC versions used on these different platforms:

  • IPKG builds: cross GCC 3.4.3
  • my builds: native GCC 3.4.6 (seems to work)
  • MIPS builds ?

comment:92 Changed 6 years ago by tonin

I couldn't test further with my Droboshare (ARMv5te), but the transmission builds that were crashing were cross built with:

debian:/# /usr/local/arm-none-linux-gnueabi/bin/gcc -v
Using built-in specs.
Target: arm-none-linux-gnueabi
Configured with: /scratch/paul/release/src/gcc-2006q1/configure --disable-nls --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=arm-none-linux-gnueabi --enable-languages=c,c++ --enable-shared --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-gnu-as --with-gnu-ld --prefix=/opt/codesourcery --enable-symvers=gnu --enable-__cxa_atexit --with-versuffix='CodeSourcery ARM 2006q1-6' --with-bugurl=mailto:arm-gnu@codesourcery.com --with-sysroot=/opt/codesourcery/arm-none-linux-gnueabi/libc --with-build-sysroot=/scratch/paul/release/linux-gnueabi/install/arm-none-linux-gnueabi/libc
Thread model: posix
gcc version 4.1.0 (CodeSourcery ARM 2006q1-6)

I also think those crashes are not just random, but probably triggered by a quite unusual condition (gcc, kernel, libc, libevent or other, that I don't know).

comment:93 in reply to: ↑ 90 Changed 6 years ago by grzegorzdubicki

Replying to charles:

(...)

If there's no more useful information to be had on this ticket, maybe it should be closed as invalid.

I would be happy to help, but so far I can neither build my own version nor test any of the versions published here.

About the IPKG version problem: maybe we should just contact the ARM IPKG version maintainer and ask him for help with building the versions we want to test? And I would gladly test it with my "crashful" environment. :)

comment:94 Changed 6 years ago by kylechen

I have to say: when testing different versions, don't forget to delete all "resume" files. I guess different versions use different resume formats, and that crashed my Transmission during testing.

comment:95 in reply to: ↑ 90 Changed 6 years ago by broadter

Replying to charles:

This is all very confusing. We narrow it down to a very small range of revisions, but then it doesn't crash anymore. Except it does! But with no details. And maybe for one person and not another. And maybe it's a dependency problem! Or the way it was compiled! And I don't have any idea what "It crashed too for lasting more than 6 hours!" means.

If there's no more useful information to be had on this ticket, maybe it should be closed as invalid.

It survived 6 hours and then crashed.

I also changed the compile option from O3 to O0 (the option used in elmer91's build) and it survived longer.

comment:96 in reply to: ↑ 91 Changed 6 years ago by broadter

Replying to elmer91:

Replying to charles:

And maybe it's a dependency problem!

I don't think it is a dependency problem: I was just saying that ARM native builds are very different from IPKG cross builds !

  • not same GCC version
  • not same library dependencies

IPKG builds have been crashing 2 or 3 times per day (since 1.80). All my native builds (r9960, 1.80 and last week's trunk) have survived at least 24h.

My opinion is that a slight code change has pointed out a GCC bug on some platforms (MIPS and ARM). BTW, it would be interesting to compare GCC versions used on these different platforms:

  • IPKG builds: cross GCC 3.4.3
  • my builds: native GCC 3.4.6 (seems to work)
  • MIPS builds ?

MIPS builds: cross GCC 4.2.3

comment:98 Changed 6 years ago by darwin

I've been running Transmission on my MBWE (white) since 1.36 or so, always installed from ipkg, and I have been facing the "transmission unresponsive" problem since then.

My observations:

  • the problem always appears when I end up seeding or downloading many files at once, or when those files are huge (or both)
  • when I check free, I see that my ARM machine has run out of memory (it has only 128MB of RAM)
  • when I restart the daemon, it works for a while and I see memory being eaten up; Transmission usually dies shortly after free memory reaches its low point
  • if I remember well, some releases had memory leaks; there the problem was even more obvious, and I needed to restart the daemon every few days even with only a few files tracked
  • I tried to tweak parameters in settings.json to lower memory consumption, but without visible success; it may have delayed the crash but didn't solve anything

This leads to the conclusion that Transmission simply dies on some memory allocation (probably not directly in your code but in some library). The libraries may not check that the memory was really allocated, so they may end up corrupting memory (my theory).
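The theory above, that some code does not check whether memory was really allocated, refers to a pattern like the following. This is a minimal sketch with a hypothetical helper `dup_bytes`, not code from Transmission or its libraries; it only illustrates checking the result of malloc before writing through it. On Linux a process that exhausts memory may also be killed by the kernel rather than seeing malloc fail, so a missing check is only one possible explanation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a heap copy of `len` bytes from `src`,
 * or NULL if the allocation fails. The caller must free() the result.
 */
void *dup_bytes(const void *src, size_t len)
{
    void *copy;

    assert(src != NULL);

    /* Ask for at least one byte so a NULL return always means failure. */
    copy = malloc(len > 0 ? len : 1);
    if (copy == NULL)
        return NULL;  /* report the failure instead of writing through NULL */

    memcpy(copy, src, len);
    return copy;
}
```

A caller is expected to test the return value for NULL as well, for example by dropping the request that needed the copy instead of continuing with a bad pointer.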

comment:99 Changed 6 years ago by charles

Okay so we're looking for a crash that:

  • might be related to which compiler is used
  • might be related to which compiler options are used
  • might depend on how much memory is available
  • might or might not depend on a particular revision of Transmission or range of revisions

...frankly I have no idea what to do with this ticket anymore

comment:100 follow-up: Changed 6 years ago by vasyaodmin

Asus WL-500gP, using IPKG builds. transmission_1.76-1_mipsel works OK, no crashes. All versions starting from 1.80 crash. Sorry, no backtraces or anything, just the fact: 1.76 and previous versions don't crash, 1.80+ crashes.

comment:101 in reply to: ↑ 100 Changed 6 years ago by charles

Replying to vasyaodmin:

Asus WL-500gP, using IPKG builds. transmission_1.76-1_mipsel works OK, no crashes. All versions starting from 1.80 crash. Sorry, no backtraces or anything, just the fact: 1.76 and previous versions don't crash, 1.80+ crashes.

Please see comment:29

comment:102 Changed 6 years ago by dskchk

Just to add more information/confusion: I have been successfully using IPKG builds on a WD My Book World Edition White Light (cs05q1armel feed) up to version 1.92 with no problems, just some occasional crashes spread over months of use.

comment:103 follow-up: Changed 6 years ago by kylechen

I keep reading reports on many sites and forums saying that all cross-compiled ARM/MIPS Transmission builds have the crash bug but natively compiled ones are fine, and that the crash comes from some GCC bug triggered by recent Transmission builds.

How do I compile a native ARM/MIPS Transmission on that little box?

comment:104 in reply to: ↑ 103 ; follow-up: Changed 6 years ago by broadter

Replying to kylechen:

I keep reading reports on many sites and forums saying that all cross-compiled ARM/MIPS Transmission builds have the crash bug but natively compiled ones are fine, and that the crash comes from some GCC bug triggered by recent Transmission builds.

How do I compile a native ARM/MIPS Transmission on that little box?

I also think all cross-compiled ARM/MIPS Transmission builds have the crash bug. Judging from the log below, an alignment problem must exist in Transmission, because the crash is usually seen on ARM/MIPS platforms. This log is from a MIPS-based device:

Unhandled kernel unaligned access#1:

Cpu 0

$ 0 : 00000000 00000001 81207564 d370ae47

$ 4 : 80c90fa0 80c6bbb4 00000000 81207510

$ 8 : 00b4de37 0000c238 94136f0a 3411242b

$12 : de655692 2ae42718 9a5f8286 04c88220

$16 : 80c90fa0 81207510 000001f8 000003a8

$20 : 00000208 000001f8 81207890 00000001

$24 : 00000000 00000000

$28 : 80d64000 80d65d40 00000000 801d1478

Hi : 00000000

Lo : 0000001c

epc : 801d1470 Tainted: P

ra : 801d1478 Status: 10008403 KERNEL EXL IE

Cause : 80000014

BadVA : d370ae4b

PrId? : 0002a010

Modules linked in: ipt_iprange nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_irc nf_conntrack_irc n f_nat_h323 nf_conntrack_h323 nf_nat_tftp nf_nat_ftp nf_conntrack_tftp nf_conntrack_ftp iptable_mangle xt_MARK xt_mark ipt_LOG xt_l imit xt_state ipt_REDIRECT ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nfnetlink xt_TCPMSS xt_tcpudp iptable_ filter ip_tables x_tables wl(P) bcm_enet(P) bcmprocfs(P) adsldd(P) bcmxtmcfg(P) Process transmission-da (pid: 1254, threadinfo=80d64000, task=81831810)

Stack : 00000000 81120050 80d65e50 811200e8 80b848e0 0000067a 00000000 00000001

00000000 81207564 812077ec 00000000 81207874 00000000 80d65e08 00000001 80d65e50 808a8b50 80d65f18 004c76d0 00000001 00470000 00470000 8018b480 80d65ea8 00000000 80d65e08 00136238 00000040 00000000 80d65dc0 819c6000 00000000 5ed60000 80d65e58 80189250 811200bc 80d65e58 81120050 80b848e0 ...

Call Trace:[<8018b480>][<80189250>][<800704e8>][<8018bf14>][<80043f68>][<8007d0bc>][<8007d3dc>][<80070e0c>][<80071310>][<80032070> ][<80032118>][<80018660>][<8007d40c>]

Code: ae000000 ae000004 ac430000 <0c063e71> ac620004 08074532 00000000 8ec20000 24420001

SIGSEGV
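For anyone trying to read the dump above, here is a minimal sketch in C of how the key fields can be decoded. It assumes the standard MIPS32 layout of the Cause register (BD in bit 31, ExcCode in bits 6..2) and the standard I-type instruction encoding. The helper names are hypothetical and are not part of Transmission or the kernel.

```c
#include <stdint.h>

/* Illustrative helpers only; the names are made up for this sketch. */

/* ExcCode occupies bits 6..2 of the MIPS32 Cause register.
 * 4 = AdEL (address error on load/fetch), 5 = AdES (address error on store). */
static unsigned cause_exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1fu;
}

/* BD (bit 31) is set when the faulting instruction is in a branch delay slot,
 * i.e. it is the instruction after the one EPC points at. */
static int cause_in_delay_slot(uint32_t cause)
{
    return (int)(cause >> 31);
}

/* A 32-bit load or store needs a 4-byte aligned address. */
static int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}

/* Fields of a MIPS I-type instruction such as sw (opcode 0x2b). */
struct mips_itype {
    unsigned opcode; /* bits 31..26 */
    unsigned rs;     /* base register, bits 25..21 */
    unsigned rt;     /* source/target register, bits 20..16 */
    int16_t  imm;    /* signed 16-bit offset, bits 15..0 */
};

static struct mips_itype decode_itype(uint32_t insn)
{
    struct mips_itype d;
    d.opcode = insn >> 26;
    d.rs     = (insn >> 21) & 0x1fu;
    d.rt     = (insn >> 16) & 0x1fu;
    d.imm    = (int16_t)(insn & 0xffffu);
    return d;
}
```

Fed with the values from the dump, `cause_exc_code(0x80000014)` gives 5 (AdES, an address error on a store) and the BD bit is set, so the faulting instruction is the word after the marked `<0c063e71>`, namely `ac620004`. `decode_itype(0xac620004)` yields opcode 0x2b, rs 3, rt 2, imm 4, which is `sw $2, 4($3)`. With `$3 = d370ae47`, the store address is `d370ae4b`, which matches BadVA and is not word-aligned.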

Last edited 6 years ago by broadter

comment:105 Changed 6 years ago by broadter

I tested two versions, 1.76 and 1.92; both produce this crash output.

comment:106 in reply to: ↑ 104 Changed 6 years ago by charles

Replying to broadter:

$ 0 : 00000000 00000001 81207564 d370ae47

$ 4 : 80c90fa0 80c6bbb4 00000000 81207510

$ 8 : 00b4de37 0000c238 94136f0a 3411242b

Unless you can translate this for me, this doesn't mean anything to me at all. Do you have a gdb backtrace?

comment:107 follow-ups: Changed 6 years ago by kylechen

I'm starting to feel hopeless about this tricky bug.

Is there someone who is really good with ARM/MIPS systems and can compile a native build before Optware/GCC finally produces a stable one? Please post it here. This may be an easy workaround for this bug for now.

comment:108 in reply to: ↑ 107 Changed 6 years ago by elmer91

Replying to kylechen:

Is there someone who is really good with ARM/MIPS systems and can compile a native build before Optware/GCC finally produces a stable one? Please post it here. This may be an easy workaround for this bug for now.

I have an uncrashable native build for ARM (Synology 207+ / IPKG).

I can publish my binary if anyone is interested.

comment:109 in reply to: ↑ 107 Changed 6 years ago by grzegorzdubicki

Replying to kylechen:

I'm starting to feel hopeless about this tricky bug.

Is there someone who is really good with ARM/MIPS systems and can compile a native build before Optware/GCC finally produces a stable one? Please post it here. This may be an easy workaround for this bug for now.

That would be a temporary solution of sorts.

I would need an uncrashable build, as new as possible and no older than my current 1.76, for a Synology DS210j with:

CPU: Marvell Kirkwood mv6281 ARM Processor

kernel: Linux Serwer 2.6.24 #959 Fri Nov 13 01:40:45 CST 2009 armv5tejl GNU/Linux

libc: GNU C Library stable release version 2.5, by Roland McGrath et al.
(...)
Compiled by GNU CC version 4.2.0 20070413 (prerelease).
Compiled on a Linux >>2.6.17-12-generic<< system on 2007-10-15.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        GNU libio by Per Bothner
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
        Native POSIX Threads Library by Ulrich Drepper et al
        Support for some architectures added on, not maintained in glibc core.
        BIND-8.2.3-T5B
Thread-local storage support included.

If anyone could upload it I would really appreciate it.

comment:110 Changed 6 years ago by charles

01:45:47 < CIA-45> charles * r10524 libtransmission/ (list.c list.h peer-io.c peer-io.h): (trunk libT) #2842 "Transmission crashes randomly on ARM-based Synology NAS" -- experimental commit based on giovannibajo's suggestion in comment:39 about the list struct's alignment.

anyone want to give r10524 or higher a test? :)
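For readers unfamiliar with why struct alignment matters on ARM/MIPS, here is a minimal sketch in C. It is not the actual change made in r10524, which is not reproduced in this ticket. The `struct node` type and the helper names are hypothetical, and `_Alignof` assumes a C11 compiler. The sketch shows how to check whether a pointer is suitably aligned for a struct, and how to move a 32-bit value to or from an arbitrary address with `memcpy` instead of a pointer cast.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical list node, for illustration only (not Transmission's list type). */
struct node {
    struct node *next;
    struct node *prev;
    void        *data;
};

/* Returns nonzero if p satisfies the alignment requirement of struct node. */
static int is_aligned_for_node(const void *p)
{
    return ((uintptr_t)p % _Alignof(struct node)) == 0;
}

/* Reads a 32-bit value from any address. memcpy lets the compiler emit
 * byte-wise accesses where needed, unlike *(const uint32_t *)p, which can
 * trap on ARM/MIPS when p is not 4-byte aligned. */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Writes a 32-bit value to any address, for the same reason. */
static void store_u32_unaligned(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

On x86 a misaligned pointer cast usually works anyway, which may be one reason a problem of this kind would only show up on ARM and MIPS devices.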

comment:111 follow-up: Changed 6 years ago by billyjeans

I have cross-built r10524 on CentOS for my Optware setup (MIPS CPU). However, the installation does not seem to be correct. Maybe something is wrong with my build environment. (This machine cross-builds TRANSMISSION_VERSION correctly, but I have never tried building TRANSMISSION_SVN_REV.) Here is the installation log: http://www.box.net/shared/j4izjarr4t Here is the build log: http://www.box.net/shared/kue0q0dsd1

Last edited 6 years ago by billyjeans

comment:112 in reply to: ↑ 111 ; follow-up: Changed 6 years ago by charles

Replying to billyjeans:

Here is the build log: http://www.box.net/shared/kue0q0dsd1

You appear to have created a tarfile of a single file, then bzipped it, removed the .bz2 suffix and manually added a .gz suffix, and uploaded that file to box.net. This is not the clearest way to convey information. May I suggest http://transmission.pastebay.com/ next time?

Anyway, it built fine. It's probably not installing into /opt because you're not using an account that has write permissions for /opt. You might consider installing it into another directory, or installing with sudo or as root.

comment:113 in reply to: ↑ 112 Changed 6 years ago by billyjeans

Replying to charles:

Anyway, it built fine. It's probably not installing into /opt because you're not using an account that has write permissions for /opt. You might consider installing it into another directory, or installing with sudo or as root.

Thanks for the hint. I realize I mistyped the bz2 suffix as gz... Well, this is my first post here anyway. I was installing as root, so something else must have caused the failure. I will try to find the difference between my SVN-built ipk file and the official ipk file. This may take a while since I am not an expert on Optware builds.

comment:114 Changed 6 years ago by billyjeans

I have manually copied all the files built from r10524 over the original 1.76 files. It is working, and I will report back on whether it crashes.

comment:115 Changed 6 years ago by billyjeans

Unfortunately all my PT sites refused this experimental client... I will try BT later.

Last edited 6 years ago by billyjeans

comment:116 Changed 6 years ago by charles

  • Keywords backport-1.9x added

Marking r10524 as backport-1.9x

comment:117 Changed 6 years ago by charles

grzegorzdubicki, giovannibajo, et al: does the above-mentioned change fix things?

comment:118 Changed 6 years ago by billyjeans

It seems to work: at least 8 hours without a crash for me (Optware, MSS NAS, MIPS CPU, ipkg cross-built on CentOS). Since almost all my PT sites refuse this experimental version, I tested with only 1 BT download (300 Kbps) and 10+ files seeding, with 2 GB of file verification. However, CPU usage looks much higher than with 1.76. I am not sure whether 1.92 behaves the same.

r10524: 30+ files seeding, CPU 90+%
1.76: 80+ files seeding, CPU 40+%

Last edited 6 years ago by billyjeans

comment:119 Changed 6 years ago by elmer91

I have tested r10524 on my system (native build): it seems to work fine.

This is not really significant, as my native builds have always worked fine (only the IPKG cross-builds crash). For comparison, I have also tested r10523. Performance seems nearly the same between r10523 and r10524 (one torrent, 10 seeders, 200kB download; the same torrent tested with both versions).

comment:120 Changed 6 years ago by billyjeans

Sorry, my mistake. Forget about my CPU problem: I just switched back to 1.76 and CPU usage is still high. Maybe something else is causing it.

comment:121 Changed 6 years ago by charles

  • Keywords backport-1.9x removed

r10524 has been backported to 1.9x/ by r10583

comment:122 follow-up: Changed 6 years ago by charles

Is there anyone else who can give a second report on whether this change fixes things? It's been two weeks since I asked and only billyjeans has responded. I was hoping that a few of the original bug reporters would also be able to weigh in.

comment:123 Changed 6 years ago by elmer91

So far, this change has been applied on my system and has been running fine for a week. But there has been no heavy use of T. (only 2 or 3 torrents downloaded).

It is a native build, and native builds have never crashed on my system, so this is not really significant.

comment:124 in reply to: ↑ 122 Changed 6 years ago by grzegorzdubicki

Replying to charles:

Is there anyone else who can give a second report on whether this change fixes things? It's been two weeks since I asked and only billyjeans has responded. I was hoping that a few of the original bug reporters would also be able to weigh in.

I would gladly test the change but, as I have written many times, I am not able to build Transmission for my system..

comment:125 follow-up: Changed 6 years ago by kylechen

Problem solved on the latest 1.93. Tested on MIPS, running smoothly. I guess ARM will be the same.

comment:126 Changed 6 years ago by charles

  • Milestone changed from None Set to 1.93
  • Resolution set to fixed
  • Status changed from reopened to closed

comment:127 in reply to: ↑ 125 Changed 6 years ago by grzegorzdubicki

Replying to kylechen:

Problem solved on the latest 1.93. Tested on MIPS, running smoothly. I guess ARM will be the same.

As the original reporter of this bug, I feel obliged to write that I have tested a binary 1.93 build (SPK from the Synology forum), and I think I can confirm that the problem seems to be fixed on ARM too.

Thank you all for your help! :)

comment:128 Changed 6 years ago by deleter

I can confirm that the bug is fixed, too. I built Transmission from source into an ipk and have been running it for three days with no crashes.

comment:129 Changed 6 years ago by billyjeans

It has been working for me for almost 4 days without a crash. Since the official oleg Optware feed has not been updated yet, I made a cross-compiled build and put it here for anyone who may need it: http://www.box.net/shared/jepe7gx5p9

comment:130 Changed 5 years ago by jordan

  • Version changed from 1.80+ to 1.80