#3563 closed Bug (worksforme)
bandwidthPulse may be blocking the main event loop for extended periods of time
Reported by: | wereHamster | Owned by: | |
---|---|---|---|
Priority: | Normal | Milestone: | None Set |
Component: | Transmission | Version: | 2.04 |
Severity: | Normal | Keywords: | |
Cc: |
Description
bandwidthPules is on a 500ms timer. However, there are situations where this function doesn't return for a much longer (I measured up to 8 seconds). That means that the libT event thread is blocked for that time and no other thread can access any data from the libT thread. That causes the UI to block when it is trying to access that data (visible as 'beachball' on macosx).
The problem arises when Transmission has enough data to read or write and the network is fast enough to actually send this data. bandwidthPulse reads/writes until the kernel tells it that no more can be read/written. If the network can send data as fast as T reads/writes then bandwidthPulse keeps processing data until all queues have been drained. Let's assume the situation where T has requested a couple dozens of blocks à 4MB and the peer is sending us those. T reads one such block, starts processing it, and during that time another block arrives. T reads that and processes it and during that time... . This repeats until the peer has served us all blocks we have requested. This whole process can take a couple seconds.
The bandwidthPulse function needs to make sure to not block for more than 500ms and give control back to the libevent main loop if that time is exceeded. Ideally, we'd do all IO in libevent callbacks (evented-IO). However, this may be difficult to implement because T also has to look out for peer priorities. If I remove all calls to phaseOne from tr_bandwidthAllocate and strip that function down to tr_peerIoFlushOutgoingProtocolMsgs + tr_peerIoSetEnabled (this effectively causes all IO to be done from libevent callbacks) then the 'beechball' disappears completely. This would be the ideal solution (if peer priorities wouldn't exist). A simple timer inside phaseOne is probably the better solution at this point.
Attachments (2)
Change History (10)
comment:1 Changed 13 years ago by wereHamster
- Summary changed from bandwidthPules may be blocking the main event loop for extended periods of time to bandwidthPulse may be blocking the main event loop for extended periods of time
comment:2 Changed 13 years ago by charles
comment:3 Changed 13 years ago by wereHamster
- Version changed from 2.04+ to 2.04
This problem has been there since pretty much forever.
comment:4 Changed 12 years ago by jusid
Probably this issue causes slow RPC response on NMT devices. Often, RPC becomes very unresponsible (10-15 seconds to receive a response), while the transmission daemon is not very busy. This happened for older versions of Transmission (I started to use Transmission from v1.50) and continues in v2.11. I hope it will be fixed. :)
comment:5 Changed 12 years ago by jordan
I wonder about the "may" (in the ticket description) and jusid's prefixing his comment with "probably."
Does anyone have any solid evidence that this is taking place, and isn't just a phantom bug report? Have either of you put a stopwatch in bandwidthPulse()?
comment:6 Changed 12 years ago by wereHamster
I once profiled it (or rather, just added printfs to see how much time is spent inside bandwidthPulse). I don't think I have the code anymore, but it's easy to add. Anyway, the result was that T spent a *lot* time inside it, often much longer than 500ms. And I don't think I was making that 8 seconds I mention in the description up, it came straight from the printf's I had added.
The code may have changed and/or improved since, so another benchmark would probably be appropriate at this time.
comment:7 Changed 12 years ago by jordan
- Resolution set to worksforme
- Status changed from new to closed
Closing until someone provides an updated benchmark as suggested in comment:6
comment:8 Changed 12 years ago by wereHamster
Each line has the format 'bandwidthPulse <msec since last call to bandwidthPulse> <msec spent in the bandwidthPulse function>'. The first number should be 500, the second as short as possible.
As you can see sometimes the time between two calls to bandwidthPulse is quite long (up to 40 seconds), and the function itself can also block for quite long (often 0.7 - 3 seconds).
Unless this is new behavior that's arisen only post-2.04, please change the Version from 2.04+ to 2.04. Thanks!