Opened 7 years ago

Last modified 5 years ago

#5505 new Bug

Assertion failed: (!tr_peerIsSeed (&msgs->peer)), function tr_peerMsgsCalculateActive

Reported by: x190 Owned by: jordan
Priority: Normal Milestone: 3.00
Component: libtransmission Version: 2.82
Severity: Normal Keywords:
Cc: peer-msgs, crash

Description

Assertion failed: (!tr_peerIsSeed (&msgs->peer)), function tr_peerMsgsCalculateActive, file /Users/transmission/jenkins/workspace/trunk-mac/libtransmission/peer-msgs.c, line 706.

https://trac.transmissionbt.com/attachment/ticket/5493/Transmission%20r14203%20Crash.log

https://forum.transmissionbt.com/viewtopic.php?f=4&t=15004&p=66100#p66098

Thread 2 Crashed: 0 libsystem_kernel.dylib 0x00007fff8ee76212 pthread_kill + 10 1 libsystem_c.dylib 0x00007fff8ba35b24 pthread_kill + 90 2 libsystem_c.dylib 0x00007fff8ba79f61 abort + 143 3 libsystem_c.dylib 0x00007fff8ba7acb9 assert_rtn + 146 4 org.m0k.transmission 0x00000001000879c0 tr_peerMsgsCalculateActive + 121 5 org.m0k.transmission 0x000000010008a051 readBtMessage + 708 6 org.m0k.transmission 0x0000000100089a47 canRead + 1432 7 org.m0k.transmission 0x0000000100080325 canReadWrapper + 245 8 org.m0k.transmission 0x000000010007eea6 event_read_cb + 327 9 org.m0k.transmission 0x00000001000b12c7 event_base_loop + 1149 10 org.m0k.transmission 0x000000010007ac77 libeventThreadFunc + 145 11 org.m0k.transmission 0x000000010007311a ThreadFunc? + 15 12 libsystem_c.dylib 0x00007fff8ba34772 _pthread_start + 327 13 libsystem_c.dylib 0x00007fff8ba211a1 thread_start + 13

Mac Client r14203 and r14206

Attachments (2)

tr_peerMsgsCalculateActive.patch (458 bytes) - added by x190 6 years ago.
Please test if you get this crash. Apply to r14339.
ignore.spurious.BT_INTERESTED.messages.patch (621 bytes) - added by x190.303cal 6 years ago.
Apply to r14444.

Download all attachments as: .zip

Change History (47)

comment:1 Changed 7 years ago by x190

There's a good chance this crash from #5493 involves webseeds.

Last edited 7 years ago by x190 (previous) (diff)

comment:2 Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:3 Changed 7 years ago by x190

See also: #5515 and comment:1 of this ticket.

comment:4 Changed 7 years ago by x190

Also occurring on 10.9.

http://pastebin.com/vK9td0Hn

comment:5 follow-up: Changed 7 years ago by jordan

#5515, #5585, and #5585 appear to be duplicates of this.

comment:6 Changed 7 years ago by Robby

comment:7 Changed 7 years ago by x190

canRead->readBtMessage->tr_peerMsgsUpdateActive->tr_peerMsgsCalculateActive
->CRASH
Maybe tr_peerMsgsUpdateActive() needs a lock so that peer-mgr.c doesn't change the peers status in tr_peerUpdateProgress() until we're done calculating?

Last edited 7 years ago by x190 (previous) (diff)

comment:8 Changed 7 years ago by CrazyFrog

I think I got the same situation with the latest stable build, Am I right? The process is dying 2/3 times a day.

transmission-daemon: peer-msgs.c:703: tr_peerMsgsCalculateActive: Assertion `!tr_peerIsSeed (&msgs->peer)' failed.

comment:9 follow-up: Changed 7 years ago by CrazyFrog

Well, Just did a kick & dirty patch on the svn source to add the x190's suggestion. Just 5 minutes active but seens to be working fine (I was affraid of dead locks). I'll keep it running for a few days to see if it's more stable.

comment:10 Changed 7 years ago by CrazyFrog

The lock didn't solve the problem but, when building I noticed a warning about structure alignment...

peer-msgs.c:2473: warning: cast increases required alignment of target type

comment:11 in reply to: ↑ 5 Changed 7 years ago by x190

Replying to jordan:

#5515, #5585, and #5585 appear to be duplicates of this.

So does #5644.

comment:12 Changed 7 years ago by iSh0w

The issue /log I posted was wrt #5644, just want to post an update! I was trying to understand why Transmission was crashing and tried to retrace my steps!

Ok, so I had a Seagate GofLex? NET NAS ans used transmission-daemon on that which sat on a pogoplug firmware modded with optware. It used a JSON file for settings of transmission. I used a lil script I wrote to parse a .txt file and fetch all the 200+ magnets in it. All was well.

I then moved to Windows, where I used transmission for windows and copied all the preferences files from my Seagate GoFlex? NET NAS to the appropriate Transmission preferences folder in windows, basically migrated from the NAS to the windows version of Transmission.

Ultimately, migrated from the windows version of Transmission to the Mac OS X version and migrated the preference files!

Today, I had a look at my preferences files and they looked to be in bad shape, the JSON file was still there, I don't even know if transmission for Mac needed that, coz it stores preferences in another .plist file.

I decided to delete and start from scratch, i.e. copy all the magnet links to clipboard, save it in a .txt document, delete all my preferences and old transmission files from /Application Support/Transmission?!

I then added each magnet link manually via the remote browser gui!

The Transmission app does not crash anymore!

Furthermore, I will wait and watch as there might be a possibility that an app called "TV Shows 2" may be the cause of the problem as well, as, that app adds magnet links of my latest subscribed TV Shows to Transmission, at which point it may have been crashing.

I am writing this huge post only to help others who are facing similar problems and who might have been doing/done the same thing I did!

/iSh0w

comment:13 follow-up: Changed 6 years ago by CrazyFrog

Still hapenning here after a clean install from svn.

comment:14 in reply to: ↑ 13 Changed 6 years ago by cfpp2p

Replying to CrazyFrog:

Still hapenning here after a clean install from svn.

confirmed here too: ticket:5782#comment:4

comment:15 in reply to: ↑ 9 Changed 6 years ago by x190

x190:

canRead->readBtMessage->tr_peerMsgsUpdateActive->tr_peerMsgsCalculateActive
->CRASH
Maybe tr_peerMsgsUpdateActive() needs a lock so that peer-mgr.c doesn't change the peers status in tr_peerUpdateProgress() until we're done calculating?

Replying to CrazyFrog:

Well, Just did a kick & dirty patch on the svn source to add the x190's suggestion. Just 5 minutes active but seens to be working fine (I was affraid of dead locks). I'll keep it running for a few days to see if it's more stable.

@CrazyFrog?, how about providing your patch so we can debug it? ;-)

Changed 6 years ago by x190

Please test if you get this crash. Apply to r14339.

comment:16 Changed 6 years ago by x190

---

Last edited 6 years ago by x190 (previous) (diff)

comment:17 follow-up: Changed 6 years ago by spaam

with that patch i get:

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7ffff2ac5700 (LWP 10588)]
0x00007ffff64b38f0 in __libc_writev (fd=56, vector=0x7ffff2ac4060, count=1) at ../sysdeps/unix/sysv/linux/writev.c:54
54      ../sysdeps/unix/sysv/linux/writev.c: No such file or directory.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) n
Program not restarted.
(gdb) bt
#0  0x00007ffff64b38f0 in __libc_writev (fd=56, vector=0x7ffff2ac4060, count=1) at ../sysdeps/unix/sysv/linux/writev.c:54
#1  0x00007ffff77945b3 in ?? () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#2  0x00007ffff7797df3 in evbuffer_write_atmost () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#3  0x0000000000444936 in tr_evbuffer_write (io=0x7fffea7c3140, fd=56, howmuch=314) at peer-io.c:317
#4  0x0000000000447348 in tr_peerIoTryWrite (io=0x7fffea7c3140, howmuch=314) at peer-io.c:1286
#5  0x000000000044750e in tr_peerIoFlush (io=0x7fffea7c3140, dir=TR_CLIENT_TO_PEER, limit=3000) at peer-io.c:1320
#6  0x000000000043812b in phaseOne (peerArray=0x7ffff2ac4d20, dir=TR_CLIENT_TO_PEER) at bandwidth.c:220
#7  0x0000000000438359 in tr_bandwidthAllocate (b=0x6cae20, dir=TR_CLIENT_TO_PEER, period_msec=500) at bandwidth.c:274
#8  0x000000000044ff7a in bandwidthPulse (foo=-1, bar=1, vmgr=0x7fffec004f90) at peer-mgr.c:3618
#9  0x00007ffff77903cc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#10 0x0000000000421310 in libeventThreadFunc (veh=0x6cb4f0) at trevent.c:246
#11 0x0000000000408a91 in ThreadFunc (_t=0x6aee80) at platform.c:109
#12 0x00007ffff67850a4 in start_thread (arg=0x7ffff2ac5700) at pthread_create.c:309
#13 0x00007ffff64bac2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) bt full
#0  0x00007ffff64b38f0 in __libc_writev (fd=56, vector=0x7ffff2ac4060, count=1) at ../sysdeps/unix/sysv/linux/writev.c:54
        resultvar = 18446744073709551584
        oldtype = 0
        result = <optimized out>
#1  0x00007ffff77945b3 in ?? () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
No symbol table info available.
#2  0x00007ffff7797df3 in evbuffer_write_atmost () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
No symbol table info available.
#3  0x0000000000444936 in tr_evbuffer_write (io=0x7fffea7c3140, fd=56, howmuch=314) at peer-io.c:317
        e = 4425595
        n = 0
        errstr = "\000\000\000\000\320\a\000\000\260\257l\000\000\000\000\000\360\b\000\354\377\177\000\000\001\000\000\000\000\000\000\000\060I\254\362\377\177\000\000 \256l\000\001\000\000\000PI\254\362\377\177\000\000\020\210C\000\000\000\000\000PI\254\362\377\177\000\000U%B\000\001\000\000\000\240[r\037\001\000\000\000 \256l\000\000\000\000\000\240I\254\362\377\177\000\000\003\206C\000\000\000\000\000\240[r\037\001\000\000\000:\001\000\000\000\000\000\000\240[r\037I\001\000\000 \256l\000\000\000\000\000\060J\254\362\377\177", '\000' <repeats 23 times>, "H\205@\360I\254\362\377\177\000\000J\207C", '\000' <repeats 13 times>...
#4  0x0000000000447348 in tr_peerIoTryWrite (io=0x7fffea7c3140, howmuch=314) at peer-io.c:1286
        e = 0
        n = 0
        old_len = 314
#5  0x000000000044750e in tr_peerIoFlush (io=0x7fffea7c3140, dir=TR_CLIENT_TO_PEER, limit=3000) at peer-io.c:1320
        bytesUsed = 0
        __PRETTY_FUNCTION__ = "tr_peerIoFlush"
#6  0x000000000043812b in phaseOne (peerArray=0x7ffff2ac4d20, dir=TR_CLIENT_TO_PEER) at bandwidth.c:220
        i = 2
        increment = 3000
        bytesUsed = 32767
        n = 4
        peerCount = 4
        peers = 0x7fffec1bf530
#7  0x0000000000438359 in tr_bandwidthAllocate (b=0x6cae20, dir=TR_CLIENT_TO_PEER, period_msec=500) at bandwidth.c:274
        i = 4
        peerCount = 4
        tmp = {items = 0x7fffec1bf020, n_items = 4, n_alloc = 32}
        low = {items = 0x7fffec0c9dc0, n_items = 4, n_alloc = 32}
        high = {items = 0x0, n_items = 0, n_alloc = 0}
        normal = {items = 0x7fffec1bf530, n_items = 4, n_alloc = 32}
        peers = 0x7fffec1bf020
#8  0x000000000044ff7a in bandwidthPulse (foo=-1, bar=1, vmgr=0x7fffec004f90) at peer-mgr.c:3618
        tor = 0x7fffec005060
        mgr = 0x7fffec004f90
        session = 0x6cac10
#9  0x00007ffff77903cc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
No symbol table info available.
#10 0x0000000000421310 in libeventThreadFunc (veh=0x6cb4f0) at trevent.c:246
        base = 0x7fffec0008f0
        eh = 0x6cb4f0
#11 0x0000000000408a91 in ThreadFunc (_t=0x6aee80) at platform.c:109
        t = 0x6aee80
#12 0x00007ffff67850a4 in start_thread (arg=0x7ffff2ac5700) at pthread_create.c:309
        __res = <optimized out>
        pd = 0x7ffff2ac5700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737264768768, 5837302229518105092, 1, 140737354125408, 0, 140737264768768, -5837326834843928060, -5837316560835855868}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#13 0x00007ffff64bac2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

comment:18 in reply to: ↑ 17 Changed 6 years ago by mike.dld

Replying to spaam:

with that patch i get: ...

That's not really an issue. When debugging you need to ignore SIGPIPE and pass it to the program. Before you start, execute handle SIGPIPE pass nostop (for GDB) or process handle SIGPIPE -p yes -s no (for LLDB).

Last edited 6 years ago by mike.dld (previous) (diff)

comment:19 Changed 6 years ago by spaam

everything looked good with that patch and that ignore thing this morning i saw this:

transmission-daemon: peer-msgs.c:708: tr_peerMsgsCalculateActive: Assertion `!tr_peerIsSeed (&msgs->peer)' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff2ac5700 (LWP 25111)]
0x00007ffff640a077 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb)
(gdb) bt
#0  0x00007ffff640a077 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff640b458 in __GI_abort () at abort.c:89
#2  0x00007ffff6403196 in __assert_fail_base (fmt=0x7ffff65398c8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x48aced "!tr_peerIsSeed (&msgs->peer)", file=file@entry=0x48abc6 "peer-msgs.c", line=line@entry=708,
    function=function@entry=0x48b9b0 <__PRETTY_FUNCTION__.10118> "tr_peerMsgsCalculateActive") at assert.c:92
#3  0x00007ffff6403242 in __GI___assert_fail (assertion=0x48aced "!tr_peerIsSeed (&msgs->peer)", file=0x48abc6 "peer-msgs.c", line=708, function=0x48b9b0 <__PRETTY_FUNCTION__.10118> "tr_peerMsgsCalculateActive") at assert.c:101
#4  0x00000000004525ef in tr_peerMsgsCalculateActive (msgs=0x7fffe8e93100, direction=TR_CLIENT_TO_PEER) at peer-msgs.c:708
#5  0x00000000004527f1 in tr_peerMsgsUpdateActive (msgs=0x7fffe8e93100, direction=TR_CLIENT_TO_PEER) at peer-msgs.c:762
#6  0x000000000045485c in readBtMessage (msgs=0x7fffe8e93100, inbuf=0x7fffe828f620, inlen=0) at peer-msgs.c:1523
#7  0x0000000000453f3f in readBtId (msgs=0x7fffe8e93100, inbuf=0x7fffe828f620, inlen=1) at peer-msgs.c:1308
#8  0x0000000000455590 in canRead (io=0x7fffeabde780, vmsgs=0x7fffe8e93100, piece=0x7ffff2ac3a80) at peer-msgs.c:1774
#9  0x00000000004444e5 in canReadWrapper (io=0x7fffeabde780) at peer-io.c:203
#10 0x0000000000444df7 in utp_on_read (closure=0x7fffeabde780, buf=0x7ffff2ac3dd4 "\030\305\377\222\205", <incomplete sequence \367>, buflen=464) at peer-io.c:424
#11 0x00000000004793ae in UTP_ProcessIncoming (conn=0x7fffec0080d0, packet=0x7ffff2ac3dc0 "\001", len=484, syn=false) at utp.cpp:2158
#12 0x000000000047a5da in UTP_IsIncomingUTP (incoming_proc=0x42031b <incoming>, send_to_proc=0x420437 <tr_utpSendTo>, send_to_userdata=0x6cac10, buffer=0x7ffff2ac3dc0 "\001", len=484, to=0x7ffff2ac3d40, tolen=16) at utp.cpp:2587
#13 0x0000000000420637 in tr_utpPacket (buf=0x7ffff2ac3dc0 "\001", buflen=484, from=0x7ffff2ac3d40, fromlen=16, ss=0x6cac10) at tr-utp.c:180
#14 0x000000000041fe37 in event_callback (s=16, type=2, sv=0x6cac10) at tr-udp.c:226
#15 0x00007ffff77903cc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#16 0x0000000000421310 in libeventThreadFunc (veh=0x6cb4f0) at trevent.c:246
#17 0x0000000000408a91 in ThreadFunc (_t=0x6aee80) at platform.c:109
#18 0x00007ffff67850a4 in start_thread (arg=0x7ffff2ac5700) at pthread_create.c:309
#19 0x00007ffff64bac2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:10118

comment:20 Changed 6 years ago by x190

Look carefully at my patch.

https://trac.transmissionbt.com/attachment/ticket/5505/tr_peerMsgsCalculateActive.patch

There is no such assert at Line 708 (peer-msgs.c --- red means delete). Did you forget to remove it or comment it out?

Last edited 6 years ago by x190 (previous) (diff)

comment:21 follow-up: Changed 6 years ago by Robby

x190, are you still getting this crash? I got it when I tried the nightly a while back.

comment:22 Changed 6 years ago by cfpp2p

x190 has a fix that preliminary, but valid testing indicates a solution to the problem :) https://forum.transmissionbt.com/viewtopic.php?f=4&t=16639#p69969

Last edited 6 years ago by cfpp2p (previous) (diff)

comment:23 follow-up: Changed 6 years ago by mike.dld

While trying to reproduce the issue (and succeeding at it), I also had a couple of failed assertions on similar condition in the same function

if (is_active)
  assert (!tr_torrentIsSeed (msgs->torrent));

(for TR_PEER_TO_CLIENT case), and also in tr_peerMsgsIsActive on

assert (is_active == tr_peerMsgsCalculateActive (msgs, direction));

Those assertions are there for good, i.e. how can a peer be a seed if we've just received BT_INTERESTED from it (from what I've observed)? Or how could we be interested if we're seeding already? IMO those are conditions which shouldn't be met. Seeing the failures suggests that either Transmission (and/or another client) misbehaves and should be fixed or that our expectations are wrong to begin with and assertions should be transformed into simple conditions and handled normally. Proposed patch does the latter, but I'm wondering if it could still be the former. Any thoughts on this?

comment:24 Changed 6 years ago by mike.dld

Oh, worth noting that assertion in tr_peerMsgsIsActive failed after I changed tr_peerMsgsCalculateActive to return false instead of asserting on tr_peerIsSeed and tr_torrentIsSeed.

comment:25 in reply to: ↑ 23 Changed 6 years ago by cfpp2p

Replying to mike.dld:

...i.e. how can a peer be a seed if we've just received BT_INTERESTED from it (from what I've observed)? Or how could we be interested if we're seeding already? IMO those are conditions which shouldn't be met. Seeing the failures suggests that either Transmission (and/or another client) misbehaves...

My thoughts on this are conflicted. Transmission is misbehaving, and at the same time some or all of our expectations are wrong. The simple conditionals may just push the misunderstanding further into the code but x190's patch and test subject hasn't indicated a problem. However your report in comment:24 indicates otherwise.

I've looked at this section of code many times and have never liked it or been able to focus and really get my head around it. I always trace back to ticket:5372 Wrong peer states displayed by Transmission thinking that's where the problem manifested and was an indicator of a deeper problem somewhere else in the code r14108.

I'm glad that you've taken initiative and reproduced the problem. Looks like the real problem could possibly be exposed.

These are my thoughts.

comment:26 Changed 6 years ago by dirty

I've been seeing this recently. I am using the nightly build, revision 14388.

comment:27 in reply to: ↑ 21 Changed 6 years ago by x190.303cal

Replying to Robby:

x190, are you still getting this crash? I got it when I tried the nightly a while back.

You have to apply the patch from comment:15 yourself.

comment:28 Changed 6 years ago by cfpp2p

The crash I think is reported somewhat frequently on the forums too.

https://forum.transmissionbt.com/viewtopic.php?f=2&t=16662

comment:29 follow-up: Changed 6 years ago by mike.dld

I've tried debugging this and I see that sometimes (quite rarely actually) Vuze 5.5.0.0 sends "bitfield" message indicating that it has all the data and right after that sends "interested" message. This is what causes assertion to fail. Yet to talk to Vuze developers (to determine if it is a bug in Vuze), could as well be already fixed in 5.5.0.1 beta. If someone has an account on their forums I'd be grateful if you take the lead.

comment:30 in reply to: ↑ 29 Changed 6 years ago by cfpp2p

Replying to mike.dld:

I've tried debugging this and I see that sometimes (quite rarely actually) Vuze 5.5.0.0 sends "bitfield" message indicating that it has all the data and right after that sends "interested" message. This is what causes assertion to fail...

It looks promising that you might be finally getting to the bottom of this issue.

But even if it is a Vuze bug I still think it necessary to eliminate in transmission where a remote client could crash transmission in this fashion. There have been issues with Vuze before i.e. #5049 r13527

comment:31 Changed 6 years ago by mike.dld

HastaJun (on IRC) suggested that this could be connected to another long-standing issue where Transmission is unchoking uninterested peers. Quick patch seems to fix assertion failure for me, but still points out that Vuze doesn't behave well.

  • libtransmission/peer-mgr.c

     
    31163116  unchokedInterested = 0;
    31173117  for (i=0; i<size && unchokedInterested<session->uploadSlotsPerTorrent; ++i)
    31183118    {
     3119      if (!choke[i].isInterested)
     3120        continue;
    31193121      choke[i].isChoked = isMaxedOut ? choke[i].wasChoked : false;
    3120       if (choke[i].isInterested)
    3121         ++unchokedInterested;
     3122      ++unchokedInterested;
    31223123    }
    31233124
    31243125  /* optimistic unchoke */

comment:32 follow-up: Changed 6 years ago by mike.dld

The comment just above the code I patched indicates that unchoking uninterested clients is the intended behavior:

Peers which have a better upload rate (as compared to the downloaders) but aren't interested get unchoked. If they become interested, the downloader with the worst upload rate gets choked. If a client has a complete file, it uses its upload rate rather than its download rate to decide which peers to unchoke.

This leaves us with two consequences:

  1. Transmission deliberately unchokes uninterested peers and then asserts if they become interested while already having all the data. This could be fixed by eliminating the assertion and handling the case normally (after all, the bug is not in our code so assertion doesn't make much sense), which is done in x190's patch attached.
  2. Vuze 5.5.0.0 becomes interested when unchoked even if it already has all the data. This could be fixed by Vuze developers, but only in future versions. I've also observed rare cases when other clients (e.g. BitTorrent 7.1.0 and µTorrent 3.4.2) send 'interested' message at the time tr_peerIsSeed returns true, so it seems to be not at all uncommon.

Sorry for wasting everyone's time with my investigation, although it was interesting to get to know what's happening. I'm leaving actual commit to jordan if no one would mind. Note that release versions of Transmission don't crash (read: abort) as asserts are disabled, so workaround until fix would be to compile with -DNDEBUG.

Changed 6 years ago by x190.303cal

Apply to r14444.

comment:33 Changed 6 years ago by mike.dld

From ##bittorrent (on IRC):

<HastaJun> The_8472: Would you mind taking a quick look at this:
           https://trac.transmissionbt.com/ticket/5505#comment:29 IRC logg:
           http://pastebin.com/R68StLL8
<mikedld>  "quite rarely actually" should be removed there, as it happens to
           each one of Vuze 5.5.0.0 clients I think. sometimes assertion isn't
           being hit though, didn't dig into that; jordan_ will be more
           productive in that anyway
<mikedld>  what I meant to say is that Vuze always becomes interested when we
           unchoke it (and it has all the data), and only "quite rarely
           actually" the assertion is being hit
<The_8472> HastaJun, transmission unchoking uninterested peers sounds like a
           bug to me.
<The_8472> the BT spec states "Connections start out choked and not interested."
<The_8472> and it has no reason to unchoke according to the spec
<The_8472> of course vuze's behavior seems odd too
<HastaJun> Yea, I agree that Transmission have a bug there. Do you think Vuzes
           behavior is because it trying to get to a correct state in the state
           machine?!
<HastaJun> I don't know how much the code have changed in Vuze since you where
           involved, that part ought to be quite stable and untuched since
           several years
<The_8472> HastaJun, looking at the source i can't find a place where it would
           send an interested message unless it's really interested
<HastaJun> mikedld: ping
<The_8472> the whole thing seems conditional on there being unfinished pieces
<HastaJun> Yeah, it is strange and messy with bug in two ends...
<HastaJun> bugs
<mikedld>  The_8472: where's that source code you're looking at? I'd like to
           take a look myself as well
<The_8472> http://svn.vuze.com/public/client/
<The_8472> org.gudy.azureus2.core3.peer.impl.transport.PEPeerTransportProtocol.
           checkInterested()
<mikedld>  and azureus3 directory is something else, right?
<The_8472> the source is split over multiple subdirs, this particular file is
           in the azureus2/src folder
<HastaJun> Well, now i'm out because Java is not my thing =) Thanks for you
           tock the time The_8472
...
<The_8472> mikedld, that's basically the method making the decision:
           http://pastebin.com/a7Nfz4Py
<mikedld>  so interested_in_other_peer could only become false if
           is_download_disabled is true
<The_8472> hum...
<mikedld>  which in turn only happens in setDownloadRateLimitBytesPerSecond,
           right?
<The_8472> yeah, isdownloaddisabled = true can be considered uncommon
<The_8472> looks like the boolean logic was incorrectly modified to add the
           is_download_disabled logic
<mikedld>  and it happens so that particular code has changed from 5.4.0.0 :)
<The_8472> interested_in_other_peer used to be initialized to false
<mikedld>  would it be too much to ask you to convey this issue to Vuze
           developers?
<The_8472> can do

comment:34 in reply to: ↑ 32 Changed 6 years ago by cfpp2p

Replying to mike.dld:

...(after all, the bug is not in our code so assertion doesn't make much sense)...

Agreed.

Sorry for wasting everyone's time with my investigation...

didn't waste my time

<The_8472> HastaJun?, transmission unchoking uninterested peers sounds like a bug to me. <The_8472> the BT spec states "Connections start out choked and not interested." <The_8472> and it has no reason to unchoke according to the spec <The_8472> of course vuze's behavior seems odd too <HastaJun?> Yea, I agree that Transmission have a bug there. Do you think Vuzes behavior is because it trying to get to a correct state in the state machine?!

The above, stating transmission has a bug in the unchoking I don't agree with.

As mike.dld already stated:

The comment just above the code I patched indicates that unchoking uninterested clients is the intended behavior:

Peers which have a better upload rate (as compared to the downloaders) but aren't interested get unchoked. If they become interested, the downloader with the worst upload rate gets choked. If a client has a complete file, it uses its upload rate rather than its download rate to decide which peers to unchoke.

related references for libtransmission/peer-mgr.c :

but aren't interested get unchoked r4579

If our bandwidth is maxed out, don't unchoke any more peers r11229

support "reject request" and "have all/none" from BEP 6 -- support fast exensions #1549 r7234

comment:35 Changed 6 years ago by cfpp2p

A patch from the forums https://forum.transmissionbt.com/viewtopic.php?f=4&t=16639&p=70150#p70088 dictates an excellent method to fix without altering the original intended behavior. We'll patch peer-msgs.c

  • libtransmission/peer-msgs.c

     
    15181518
    15191519        case BT_INTERESTED:
    15201520            dbgmsg (msgs, "got Interested");
    1521             msgs->peer_is_interested = true;
    1522             tr_peerMsgsUpdateActive (msgs, TR_CLIENT_TO_PEER);
     1521            if (!tr_peerIsSeed (&msgs->peer)) /* ignore spurious BT_INTERESTED messages */
     1522            {
     1523              msgs->peer_is_interested = true;
     1524              tr_peerMsgsUpdateActive (msgs, TR_CLIENT_TO_PEER);
     1525            }
    15231526            break;
    15241527
    15251528        case BT_NOT_INTERESTED:

comment:36 follow-up: Changed 6 years ago by mike.dld

I tried making this kind of change myself before and I could tell you that

  • (see below)
  • IMHO it's not correct to ignore the message (and hence lose the information of client being interested) which could then lead to other illogical stuff like seeing that peer wasn't interested (because we ignored the 'interested' message before) upon receiving 'not interested' message.

EDIT: I stand corrected, assertion isn't being hit with this patch. I must have thought of something else. Second point is still there though.

Last edited 6 years ago by mike.dld (previous) (diff)

comment:37 Changed 6 years ago by cfpp2p

One thing we do know, the assertion brought to light a problem and made it obvious something needs to be done.

comment:38 in reply to: ↑ 36 Changed 6 years ago by cfpp2p

Replying to mike.dld:

...which could then lead to other illogical stuff like seeing that peer wasn't interested (because we ignored the 'interested' message before) upon receiving 'not interested' message.

If we get peers that send transmission illogical sequences of messages it's ultimately the responsibility of transmission to properly deal with this. If the release versions of "Transmission don't crash (read: abort) as asserts are disabled" then transmission must at times already be dealing with illogical sequences as the true/false status of is_active is not correctly decided as it stands now.

When jordan comes onto the scene his expertise and valued experience should solve these dilemmas. We've got three different patches now each with their own merits so I don't see that we won't have a solid solution soon.

Last edited 6 years ago by cfpp2p (previous) (diff)

comment:39 Changed 6 years ago by x190.303cal

Why don't we just deal with it, as peers will be using old client versions basically forever?

comment:40 follow-up: Changed 6 years ago by x190.303cal

Attached updated patch to handle spurious BT_INTERESTED messages.

comment:41 in reply to: ↑ 40 Changed 6 years ago by x190

Replying to x190.303cal:

Attached updated patch to handle spurious BT_INTERESTED messages.

https://trac.transmissionbt.com/ticket/5505#comment:35

comment:42 Changed 6 years ago by mike.dld

  • Milestone changed from None Set to 2.90

Sorry that it took so much time and so many broken nightly builds, but I've commented the assert out for now (r14473) and am changing milestone to 2.90 so that @jordan won't forget to look into it before release. I think we all agree that number of reports for this one is getting ridiculously high.

comment:43 Changed 6 years ago by cfpp2p

OK, good. Now lets see if we can coax jordan back on the scene. :)

comment:44 Changed 5 years ago by mike.dld

  • Milestone changed from 2.90 to 2.91

comment:45 Changed 5 years ago by mike.dld

  • Milestone changed from 2.91 to 3.00
Note: See TracTickets for help on using tickets.