Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#4090 closed Bug (duplicate)

Reproducible µTP assert

Reported by: m1b Owned by:
Priority: Normal Milestone:
Component: Transmission Version: 2.22+
Severity: Normal Keywords:
Cc: jch

Description

In builds <= 12104 on Mac OS X 10.5.8 PPC, Transmission aborts with the assert below after running for a relatively short time with µTP enabled. Disabling µTP works around the problem.


[0x0-0x11eb1ea].org.m0k.transmission[40354] Assertion failed: (min_rtt >= 0), function apply_ledbat_ccontrol, file /Users/sheila/transmission/trunk/third-party/libutp/utp.cpp, line 1627.

Thread 1 Crashed:
0   libSystem.B.dylib             	0x9010e2b0 __semwait_signal_nocancel + 8
1   libSystem.B.dylib             	0x9010dd7c nanosleep$NOCANCEL$UNIX2003 + 176
2   libSystem.B.dylib             	0x90106fe0 usleep$NOCANCEL$UNIX2003 + 68
3   libSystem.B.dylib             	0x90120c04 abort + 92
4   libSystem.B.dylib             	0x90113c0c __assert_rtn + 108
5   org.m0k.transmission          	0x000edeac UTPSocket::apply_ledbat_ccontrol(unsigned long, unsigned int, long long) + 1092
6   org.m0k.transmission          	0x000ee6f0 UTP_ProcessIncoming(UTPSocket*, unsigned char const*, unsigned long, bool) + 1984
7   org.m0k.transmission          	0x000ef274 UTP_IsIncomingUTP + 996
8   org.m0k.transmission          	0x000bd648 tr_utpPacket + 156
9   org.m0k.transmission          	0x000bcac0 event_callback + 284
10  org.m0k.transmission          	0x000d6188 event_base_loop + 2432
11  org.m0k.transmission          	0x000d6438 event_base_dispatch + 16
12  org.m0k.transmission          	0x00092500 libeventThreadFunc + 144
13  org.m0k.transmission          	0x000821f0 ThreadFunc + 32
14  libSystem.B.dylib             	0x900460d0 _pthread_start + 316

Attachments (1)

libutp-nonmonotonic-time.patch (954 bytes) - added by jch 11 years ago.


Change History (30)

comment:1 Changed 11 years ago by jordan

  • Milestone None Set deleted
  • Version changed from 2.22 to 2.22+

comment:2 Changed 11 years ago by jordan

m1b: I've pinged alus about this in IRC, but you might want to report this upstream to https://github.com/bittorrent/libutp/issues even if only for bookkeeping purposes.

comment:3 Changed 11 years ago by jordan

from ghazel:

This assert can go off when the function to get a value from the monotonic clock returns a value lower than one it returned before. Either the platform/CPU has a bug, or libutp is being compiled incorrectly for that platform.
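
To illustrate the point above: if a round-trip sample is computed as "time now" minus "time sent", a clock that steps backwards between the two readings yields a negative value, which is exactly what a check like `min_rtt >= 0` rejects. The following is only a sketch under that assumption. The helper name `rtt_sample_us` and the microsecond unit are hypothetical and are not taken from libutp's source.

```cpp
#include <cstdint>

// Hypothetical helper (not libutp's actual code): one round-trip sample,
// computed as the signed difference between two clock readings.
constexpr std::int64_t rtt_sample_us(std::uint64_t now_us, std::uint64_t sent_us) {
    return static_cast<std::int64_t>(now_us) - static_cast<std::int64_t>(sent_us);
}

// With a well-behaved clock, now_us >= sent_us and the sample is non-negative.
// If the clock reports an earlier value for "now" than it did for "sent",
// the sample goes negative and a check such as (min_rtt >= 0) would fail.
```

For example, `rtt_sample_us(1000, 900)` is 100, while `rtt_sample_us(900, 1000)` is -100.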

comment:4 Changed 11 years ago by jch

  • Cc jch added

comment:5 Changed 11 years ago by m1b

jordan: given ghazel's comment, I'm not sure what to do. I'm using the nightlies from xpjets rather than building my own, so that suggests something is wrong with the build settings there. (I'm feeling pretty confident about my platform/CPU not having a bug.) My primary concern is that since µTP is going to be on by default, having a known crasher within is probably not in anyone's interest. Please let me know how I can help further.

comment:6 Changed 11 years ago by jordan

Right, given Greg's comment I think this is likely to be a Transmission build issue rather than a libutp issue.

I'm not sure what the cause is right now. I'll have more time to look into this tonight.

comment:7 Changed 11 years ago by m1b

What bothers me about this is that the PPC7400 backend has been untouched for a while in Apple's toolchain. Deprecated architecture or not, it's mature and really shouldn't be producing bad code. Are the optimization settings too aggressive, e.g. -O3 rather than -Os?

comment:8 Changed 11 years ago by jch

Hmm, we still should not be crashing if "monotonic" time goes backwards. See the very end of http://www.pps.jussieu.fr/~jch/software/repos/ahcpd/monotonic.c for how it can be done.

--jch
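
One common way to tolerate a "monotonic" clock that steps backwards is to clamp each raw reading against the last value handed out. The sketch below shows that general technique only. It is not a copy of the linked monotonic.c or of the attached patch, and the name `clamp_monotonic` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: given the last value returned to callers and a fresh
// raw reading from the platform clock, return a value that never decreases.
constexpr std::uint64_t clamp_monotonic(std::uint64_t last_us, std::uint64_t raw_us) {
    return raw_us > last_us ? raw_us : last_us;
}
```

A caller would keep the returned value as the new `last_us` for the next reading, so a backwards step in the raw clock shows up as time standing still rather than as a negative interval.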

comment:9 Changed 11 years ago by jch

Please test the following. (Jordan, please don't apply -- I'll push it to Greg first.)

--jch

Changed 11 years ago by jch

comment:11 Changed 11 years ago by jch

  • Resolution set to fixed
  • Status changed from new to closed

Oh, what the hell. Committed in r12117.

comment:12 Changed 11 years ago by m1b

Confirming that 12117 fixes the problem I reported. It's been running for several hours and hasn't crashed/asserted yet. Thanks!

comment:13 Changed 11 years ago by m1b

  • Resolution fixed deleted
  • Status changed from closed to reopened

Regret to report that 12117 asserted, but after a longer time. Same error and line number reported, which seems a bit odd.

Assertion failed: (min_rtt >= 0), function apply_ledbat_ccontrol, file /Users/sheila/transmission/trunk/third-party/libutp/utp.cpp, line 1627.

Thread 1 Crashed:
0   libSystem.B.dylib             	0x9081d2b0 __semwait_signal_nocancel + 8
1   libSystem.B.dylib             	0x9081cd7c nanosleep$NOCANCEL$UNIX2003 + 176
2   libSystem.B.dylib             	0x90815fe0 usleep$NOCANCEL$UNIX2003 + 68
3   libSystem.B.dylib             	0x9082fc04 abort + 92
4   libSystem.B.dylib             	0x90822c0c __assert_rtn + 108
5   org.m0k.transmission          	0x000ecee0 UTPSocket::apply_ledbat_ccontrol(unsigned long, unsigned int, long long) + 1092
6   org.m0k.transmission          	0x000ed740 UTP_ProcessIncoming(UTPSocket*, unsigned char const*, unsigned long, bool) + 2012
7   org.m0k.transmission          	0x000ee2c4 UTP_IsIncomingUTP + 996
8   org.m0k.transmission          	0x000bc710 tr_utpPacket + 156
9   org.m0k.transmission          	0x000bbb88 event_callback + 284
10  org.m0k.transmission          	0x000d51bc event_base_loop + 2432
11  org.m0k.transmission          	0x000d546c event_base_dispatch + 16
12  org.m0k.transmission          	0x00091a80 libeventThreadFunc + 144
13  org.m0k.transmission          	0x00081770 ThreadFunc + 32

comment:14 Changed 11 years ago by jch

Hmm... I'm puzzled.

--jch

comment:15 Changed 11 years ago by jordan

I don't see the cause for this yet.

m1b, please excuse the stupid question, but are you positive that was >= 12117 in the crash report in comment:13?

comment:16 Changed 11 years ago by m1b

jordan: not a stupid question at all; I was skeptical myself. I now have two crash logs with that assert as the cause that are definitely from 12117. I haven't updated to a more recent build since I saw that PPC support was removed and my Transmission machine is PPC-based. Would love to help otherwise.

comment:17 Changed 11 years ago by jch

Reverted r12117 in r12140.

--jch

comment:18 Changed 11 years ago by m1b

For completeness' sake, a pastebin of the assert message and the detailed crash log from 12139: http://pastebin.com/7H5PkCS9

comment:19 Changed 11 years ago by jordan

  • Resolution set to invalid
  • Status changed from reopened to closed

Since the Mac build is dropping PPC support in 2.30 anyway, I'm not sure it makes sense to track this down further unless we get a report of this occurring on a non-Mac-PPC environment.

I have no opinion at all about the PPC decision, but that's being handled in a separate ticket #4104

comment:20 Changed 11 years ago by jordan

  • Resolution invalid deleted
  • Status changed from closed to reopened

After talking this over with m1b, I'm going to reopen this ticket for a while longer.

I don't have any new insights about what could be causing this assertion failure, but m1b reiterated that he's eager to help however he can and is available to test out future patches.

We could also leave the ticket open and defer more investigation until 2.30 gets a beta out -- by then we should be able to get more testing from a wider number of platforms.

jch, do you have any opinion on this?

comment:21 Changed 11 years ago by jch

Yes. Could the two of you please arrange to compile a version of Transmission with no debugging, and see whether you can reproduce the issue on that?

(I'm going to implement a proper piece of paranoia for non-monotonic monotonic time, but I suspect that is not what m1b is seeing.)

--jch

comment:22 Changed 11 years ago by livings124

m1b: Any update on this? Are you still getting it with a current nightly?

comment:23 Changed 11 years ago by m1b

I've run locally-built Release builds for a few days at a time without any visible issues. When I last used a Nightly from xpjets, the assert leading to a SIGABRT happened within a few hours.

This makes me wonder whether the check leading to the assert simply isn't in the Release build, so the underlying issue may still be there and causing invisible uTP problems.

Let me know if there's anything specific you'd like me to try.
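
The suspicion above matches how the standard `assert` macro behaves: when `NDEBUG` is defined at the point `<cassert>` is included, `assert(expr)` expands to nothing, so the condition is never evaluated. The sketch below assumes the Release configuration defines `NDEBUG`, which is conventional but not confirmed anywhere in this ticket. Both function names are hypothetical.

```cpp
#include <cstdint>

// Simulate a Release-style configuration for the first function.
#ifndef NDEBUG
#define NDEBUG
#endif
#include <cassert>

// Hypothetical stand-in for a function that checks its RTT argument.
// With NDEBUG defined, the assert below is compiled out entirely.
inline std::int64_t accept_rtt_release(std::int64_t min_rtt) {
    assert(min_rtt >= 0);  // no-op here: a negative value passes silently
    return min_rtt;
}

// Switch back to a Debug-style configuration. Re-including <cassert>
// redefines assert() according to the current state of NDEBUG.
#undef NDEBUG
#include <cassert>

// Same check, but now the assert is live and would abort on a negative value.
inline std::int64_t accept_rtt_debug(std::int64_t min_rtt) {
    assert(min_rtt >= 0);
    return min_rtt;
}
```

Under these assumptions, `accept_rtt_release(-5)` simply returns -5, while `accept_rtt_debug(-5)` would abort the process. That would be consistent with Release builds running without a visible crash while the negative value still reaches the code after the check.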

comment:24 Changed 11 years ago by livings124

Try building in debug instead of release, since we're trying to debug.

comment:25 Changed 11 years ago by m1b

livings124: The whole reason I was using a Release build was jch's comment:21.

The problem is consistently reproducible with debug builds (since that's what xpjets makes). There're no visible signs of issues with release builds, but I think that means that the problem is unnoticed by the code, not that it's fixed itself.

I'm happy to go back to the Nightly builds, but in the absence of someone (jch or jordan, perhaps?) digging into the root cause of the monotonic clock assert, all that I expect to see is the assert -> SIGABRT behavior again.

Or am I missing something obvious?

comment:26 Changed 11 years ago by m1b

I just took the opportunity to A/B test build 12203; I can't test builds newer than 12203 due to #4150. When using the xpjets debug build, I can still reproduce the assert leading to a SIGABRT. When using a locally-built release build, it carries on, though the error likely persists unreported.

Not sure what else I can do with this at the moment.

comment:27 Changed 11 years ago by jch

Superseded by #4153. Please reopen if this is still visible after #4153 gets fixed.

--jch

comment:28 Changed 11 years ago by jch

  • Resolution set to duplicate
  • Status changed from reopened to closed

comment:29 Changed 11 years ago by jch

#4153 has now been fixed, so hopefully this issue will go away too.

--jch
