Opened 8 years ago

Closed 7 years ago

#3767 closed Bug (fixed)

Rarest first policy

Reported by: athy
Owned by: jordan
Priority: Normal
Milestone: 2.30
Component: libtransmission
Version: 2.12
Severity: Major
Keywords:
Cc:

Description

At the moment, transmission lacks support for the rarest-first policy, which is one of the most important components of BitTorrent's reliability.

It has been discussed on the forum: https://forum.transmissionbt.com/viewtopic.php?f=2&t=10883

The current scheduling is:
1) Pieces that are already partially downloaded
2) Pieces that are part of file with a higher priority
3) Randomly chosen pieces

To keep the swarm healthy and to stay interesting to the other peers, 3) should be "least replicated pieces" and random should become 4).
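For illustration, here is a minimal sketch of the proposed ordering; the struct and names are made up for this example, not Transmission's actual code:

    struct piece
    {
        tr_bool   isPartial;    /* already partially downloaded? */
        int       priority;     /* priority of the file it belongs to */
        size_t    replication;  /* how many connected peers have it */
        uint8_t   salt;         /* random tie-breaker, assigned once */
    };

    static int
    comparePieces( const struct piece * a, const struct piece * b )
    {
        /* primary key: partially downloaded pieces come first */
        if( a->isPartial != b->isPartial )
            return a->isPartial ? -1 : 1;

        /* secondary key: higher file priority comes first */
        if( a->priority != b->priority )
            return a->priority > b->priority ? -1 : 1;

        /* tertiary key (this ticket): less replicated pieces come first */
        if( a->replication != b->replication )
            return a->replication < b->replication ? -1 : 1;

        /* quaternary key: random */
        return (int)a->salt - (int)b->salt;
    }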

Here is a patch adding rarest-first support. It is still a work in progress, but it is basically working.

Currently, what it does is (a rough sketch follows the list):

  • adds two members to the tr_torrent_peers structure: an array counting the replication of each piece, and a tr_bool remembering whether the pieces are sorted or not
  • the replication count comes before randomness when comparing two pieces
  • increases the replication count when we receive a BITFIELD or a HAVE message (or a HAVE_ALL message from the fast extension)
  • decreases the replication count when a peer is disconnected/removed
  • if the replication count of a large number of pieces is modified (i.e. we received a bitfield), we mark the piece list as not sorted. This avoids a lot of sorting when everybody sends us bitfields at the beginning of a download.
  • if we receive a HAVE and the list is sorted, we move the piece to its new place. Otherwise we leave the list unsorted.
  • when making a request, we check whether the list is sorted. If not, we sort it.
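A rough sketch of the bookkeeping described above (the names are illustrative; the actual patch differs in its details):

    /* illustrative sketch, not the patch itself */
    typedef struct
    {
        size_t  * pieceReplication;  /* one counter per piece */
        size_t    pieceCount;
        tr_bool   piecesSortedByWeight;
    }
    SwarmSketch;

    /* on a HAVE message, only one counter changes; if the list is
     * sorted, the patch just moves that one piece to its new place */
    static void
    gotHave( SwarmSketch * s, size_t pieceIndex )
    {
        ++s->pieceReplication[pieceIndex];
    }

    /* on a BITFIELD (or HAVE_ALL), many counters change at once, so
     * mark the list unsorted rather than resorting immediately; the
     * next request will sort it once */
    static void
    gotBitfield( SwarmSketch * s, const tr_bitfield * bitfield )
    {
        size_t i;
        for( i = 0; i < s->pieceCount; ++i )
            if( tr_bitfieldHas( bitfield, i ) )
                ++s->pieceReplication[i];
        s->piecesSortedByWeight = FALSE;
    }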

Still to do:

  • random policy at torrent startup
  • review of the sorting algorithms
  • make sure we have the right bitset.haveAll and bitset.haveNone values (this could avoid quite a lot of work on the bitfields)
  • testing testing testing

Since the patch is pretty intrusive, some review would be really welcome.

Attachments (10)

rarest-first.patch (12.7 KB) - added by athy 8 years ago.
rarest-first2.patch (15.7 KB) - added by athy 8 years ago.
Improvements on the previous patch
rarest-first3.patch (15.6 KB) - added by athy 8 years ago.
Include the work on haveall/havenone correctness since this caused the previous patch to fail on asserts.
rarest-first4.patch (19.9 KB) - added by athy 8 years ago.
rarest-first5.patch (21.8 KB) - added by ijuxda 7 years ago.
Patch to svn/trunk r11855
rarest-first6.patch (32.8 KB) - added by ijuxda 7 years ago.
Incomplete patch to svn/trunk r11872
rarest-first7.patch (22.6 KB) - added by jordan 7 years ago.
patch against r11891 containing the changes described in comment:19
rarest-first8-jordan.patch (21.1 KB) - added by jordan 7 years ago.
patch against r11895 containing the changes described in comment:19 and comment:21
rarest-first9-jordan.patch (21.4 KB) - added by jordan 7 years ago.
patch against r11895. rarest-first8-jordan.patch + the changes described in comment:29
0001-Add-more-randomness-to-rarest-first.patch (1.1 KB) - added by jch 7 years ago.


Change History (50)

Changed 8 years ago by athy

comment:1 Changed 8 years ago by charles

I haven't tested this yet but the approach described in the ticket sounds reasonable. I'll try to review this over the weekend.

Changed 8 years ago by athy

Improvements on the previous patch

comment:2 Changed 8 years ago by athy

Here is a new patch improving the previous one.

No big changes in the way things work but some bug fixes to handle corner cases.

It introduces a function to assert that the replication count is correct (this is done by counting, for each piece, the connected peers that have it). Since this is quite an expensive operation, I guess it should be disabled by default (even in nightly builds). It is enabled in this patch, as is the assertWeightedPiecesAreSorted function.
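A sketch of what such a check might look like (hypothetical names, and haveAll/haveNone handling omitted): it recomputes every piece's replication from scratch and compares against the running counters, which is O(pieceCount × peerCount) and therefore too slow to leave enabled:

    static void
    assertReplicationIsExact( const Torrent * t,
                              const tr_peer ** peers, size_t peerCount )
    {
        size_t i, j;

        for( i = 0; i < t->tor->info.pieceCount; ++i )
        {
            size_t count = 0;

            for( j = 0; j < peerCount; ++j )
                if( tr_bitfieldHas( &peers[j]->have.bitfield, i ) )
                    ++count;

            assert( count == t->pieceReplication[i] );
        }
    }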

I have been developing it on the 2.04 release and it turns out to work quite well (it seems to improve speed a bit, but I didn't run any benchmarks since that is not the goal of this patch). While testing on the latest trunk, asserting that the replication count is correct failed at some point in a random way. It turned out to happen when removing a peer from our neighbors. I probably won't have time to fix that this weekend, but if somebody wants to review the patch (that would be great, I'm probably doing some things wrong), it's better for them to have the latest version.

As mentioned in the previous message, I'm also working on some haveall/havenone correctness, since this could reduce the overhead of this patch (which is not bad at all, by the way). Since this is independent, I will submit it in a separate patch when it's ready.


Changed 8 years ago by athy

Include the work on haveall/havenone correctness since this caused the previous patch to fail on asserts.

comment:3 Changed 8 years ago by athy

The above bug is now fixed and the patch should run smoothly.

I'm not sure yet whether it's a good thing to begin with a random policy (to quickly obtain some pieces to share). I will do some review and testing on the subject before coding anything.

comment:4 Changed 8 years ago by charles

athy: one suggestion: maybe we should add cases to peer-common's PeerEventType and fire messages back to peer-mgr when a peer says they have a piece or have all? It looks like we could then make the tr_incrReplication*() functions private to peer-mgr.

comment:5 Changed 8 years ago by athy

Right, it is much cleaner and less intrusive to do it this way than what I had.

Here is an updated patch which does that and, as a consequence, uses (Torrent *) instead of (tr_torrent *). It also adds some static/inline and const correctness where it was still needed.

No work on a random policy at the beginning yet. I am still not convinced that it really adds any value (even though it is suggested by Bram Cohen and implemented in various clients).

Changed 8 years ago by athy

comment:6 Changed 8 years ago by jch

Charles, I think this is a serious bug. I pointed this out to you a few months ago, by the way.

While I haven't reviewed this patch in detail, and have some comments below, I strongly suggest committing this patch straight away, and fixing any bugs there might be within the tree.

   /* tertiary key: rarest first */ 
   /* quaternary key: random */

No, that's a bad idea. The right thing is to have enough randomness to be able to perturb rarest first.

Consider a swarm with 50 peers with a single seed and perfect knowledge. Suppose now that the torrent has 50 pieces, and that every non-seed has a distinct piece; then all pieces are available twice, except for one piece which is available only once. Then all 49 leeches will jump on the single piece available once, which is not what you want. Of course, in practice things don't happen just like that, but this kind of effect should nonetheless be avoided.

I suggest having a tertiary key that is computed from something like

pieceReplication * 0x80 + salt % 0x100
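For illustration, with those constants the salt spans 0x100 while each replication step weighs 0x80, so two pieces whose counts differ by one can still swap places in the ordering. This is a sketch of the idea, not committed code:

    static unsigned
    perturbedWeight( unsigned pieceReplication, unsigned salt )
    {
        /* salt in [0, 0xff]; a piece can leapfrog another whose
         * replication count is one higher, but not much more */
        return pieceReplication * 0x80 + salt % 0x100;
    }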

--jch

comment:14 Changed 7 years ago by jordan

  • Milestone changed from None Set to 2.30

Changed 7 years ago by ijuxda

Patch to svn/trunk r11855

comment:17 follow-ups: Changed 7 years ago by athy

jch: what you're saying makes sense, but the impact will be really limited compared to what is stated in your comment.

In your example, if a piece is only available once, then the seeder will only upload it to 4 peers at a time (the BitTorrent protocol states that a client only keeps 4 peers unchoked at the same time). This is not optimal, but there will still be 45 peers interested in that piece, so it will be useful a bit later.

I feel that for a very unrealistic use case (full knowledge is never reached in such big torrents), you will make more basic use cases behave worse. With your computation, in a swarm with 2 leechers and one seeder, a leecher could request from the seeder a piece that the other leecher already has. He would then have a piece that nobody is interested in and might not be able to use his upload bandwidth.

In my opinion, the consequences of the problem you describe are already really limited by the fixed number of upload slots and the limited (and different) knowledge that each peer has of the swarm. I believe it would be better to keep it simple, and that a pure rarest first will, on average, behave better.



ijuxda: thanks for your changes, it looks a bit better now =)

but I believe that your code is not perturbing the rarest-first policy at all. The most significant bits still come from (and only from) the replication count, so your randomness only differentiates between two pieces of the same rarity (which is what my code was doing before).

Changed 7 years ago by ijuxda

Incomplete patch to svn/trunk r11872

comment:18 in reply to: ↑ 17 Changed 7 years ago by ijuxda

Replying to athy:

jch: what you're saying makes sense, but the impact will be really limited compared to what is stated in your comment.

In your example, if a piece is only available once, then the seeder will only upload it to 4 peers at a time (the BitTorrent protocol states that a client only keeps 4 peers unchoked at the same time). This is not optimal, but there will still be 45 peers interested in that piece, so it will be useful a bit later.

I feel that for a very unrealistic use case (full knowledge is never reached in such big torrents), you will make more basic use cases behave worse. With your computation, in a swarm with 2 leechers and one seeder, a leecher could request from the seeder a piece that the other leecher already has. He would then have a piece that nobody is interested in and might not be able to use his upload bandwidth.

In my opinion, the consequences of the problem you describe are already really limited by the fixed number of upload slots and the limited (and different) knowledge that each peer has of the swarm. I believe it would be better to keep it simple, and that a pure rarest first will, on average, behave better.

I agree.

ijuxda: thanks for your changes, it looks a bit better now =) but I believe that your code is not perturbing the rarest-first policy at all. The most significant bits still come from (and only from) the replication count, so your randomness only differentiates between two pieces of the same rarity (which is what my code was doing before).

You are right, I did not think through the changes very well.

Upon testing I found two issues with the previous implementation, namely that incomplete metadata was not taken into account (i.e. when using magnet links), nor was the case of a peer losing pieces (i.e. receiving a bitfield with fewer bits set than our current bitset for that peer).

The attached patch rarest-first6.patch tries to handle these cases, but in my tests I am still getting assertion failures in some rather difficult-to-reproduce situations. In all likelihood I have made some mistakes somewhere in the newly added functions, or made invalid assumptions about the client's or peers' states. I will try to rectify this with more testing as time permits and hopefully find the cause of the problems.

comment:19 Changed 7 years ago by jordan

Attached is a rarest-first7.patch that takes inspiration from

  • Athy's rarest-first4.patch
  • ijuxda's rarest-first6.patch

Most everything outside of peer-mgr.c is unchanged.

Inside peer-mgr, I've made some structural changes:

  • I felt the isSorted flag was missing some context, as it really meant isSortedByWeight. I've replaced it with a new field, pieceSortState, that uses the existing enum to specify unsorted, sorted-by-index, or sorted-by-weight (see the sketch after this list).
  • Using the new pieceSortState, we can avoid calling setSorted(FALSE) in tr_incrReplicationFromBitfield() and tr_decrReplicationFromBitset() if the pieces array is sorted-by-index.
  • Some of peer-mgr.c's functions have been reorganized to avoid the need for forward-declaring the new replication functions.
  • I've omitted the randomness suggestion from comment:6 in this draft because the counterargument in comment:17 also seems valid and I think the idea needs more discussion. Perhaps there is a middle ground? The behavior described in comment:6 will be worse in larger swarms than in small ones. In addition, the bad random choices described in comment:17 will be less harmful in larger swarms. Perhaps randomness could be added in situations where we have more than N peers?
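A sketch of the idea from the first two bullets; PIECES_SORTED_BY_WEIGHT and invalidatePieceSorting() appear in the patch itself, while the other names here are assumed for illustration:

    enum piece_sort_state
    {
        PIECES_UNSORTED,
        PIECES_SORTED_BY_INDEX,
        PIECES_SORTED_BY_WEIGHT
    };

    static void
    replicationChangedInBulk( Torrent * t )
    {
        /* a bulk replication change only invalidates a weight sort;
         * a list sorted by index stays valid */
        if( t->pieceSortState == PIECES_SORTED_BY_WEIGHT )
            invalidatePieceSorting( t );
    }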

Changed 7 years ago by jordan

patch against r11891 containing the changes described in comment:19

comment:21 Changed 7 years ago by jordan

Wow, okay, I certainly did have more adjustments to make without realizing it. Magnet links definitely caused a necessary refactoring. Better still, it made me look more closely at what was being patched in bitfield.[ch], bitset.h, and peer-msgs.c, something I should have done earlier.

Attached is a patch revised from my previous one in comment:19. Here is how it differs from the other patches submitted so far:

  • bitfield.[ch] and bitset.h are identical to their versions in trunk. There's no tr_bitfieldInverse(), tr_bitfieldXor(), tr_bitfieldSetAll(), tr_bitsetResize(), tr_bitsetDup(), tr_bitsetFree(), tr_bitsetFieldDifference(), tr_bitsetInverse() or tr_bitsetXor(). Those hundreds of lines of new functions seemed to serve no purpose.
  • The new PeerEventTypes, as compared to trunk, are TR_PEER_CLIENT_GOT_BITFIELD, TR_PEER_CLIENT_GOT_HAVE, TR_PEER_CLIENT_GOT_HAVE_ALL, and TR_PEER_CLIENT_GOT_HAVE_NONE, for times when a client gets one of those four BT messages from a peer.
  • peer-msgs.c is unchanged from trunk except for the addition of the code to fire the four new event types. As with the bitset/bitfield codes, there seems to be a lot of unnecessary new code in the other patches' versions of this file.
  • peer-mgr.c: pieceReplication is not created until a peer calls tr_peerMgrGetNextRequests(). For seeding torrents, this avoids the memory overhead of the pieceReplication array and the CPU overhead of keeping it up-to-date. It also means we don't have to worry about monitoring a magnet link's metadata state in peer-mgr.c (a sketch of this lazy allocation follows the list).
  • peer-mgr.c: More bitfield/bitset functions such as bitsetMatchesMetadata() and updateReplicationFromBitset() are not needed.
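A sketch of the lazy allocation described above; replicationExists() is from the patch, while ensureReplicationExists() and the exact field handling are assumptions for illustration:

    static tr_bool
    replicationExists( const Torrent * t )
    {
        return t->pieceReplication != NULL;
    }

    /* called from tr_peerMgrGetNextRequests(); seeds never get here,
     * so they never pay for the array or its upkeep */
    static void
    ensureReplicationExists( Torrent * t )
    {
        if( !replicationExists( t ) )
        {
            t->pieceReplicationSize = t->tor->info.pieceCount;
            t->pieceReplication = tr_new0( size_t, t->pieceReplicationSize );
            /* ...then seed the counters from the connected peers... */
        }
    }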

Changed 7 years ago by jordan

patch against r11895 containing the changes described in comment:19 and comment:21

comment:22 follow-up: Changed 7 years ago by athy

jordan: I only had a very quick look at your patch, but it seems to me that you removed some important parts.

For example: when receiving a bitfield, you increment the replicationCount of every piece. You should diff against the previous bitfield to figure out which pieces you were not already aware of.

Same thing for have_all/have_none.

A lot of the code you removed _was_ useful.

comment:23 in reply to: ↑ 22 Changed 7 years ago by jordan

Replying to athy:

jordan: I only had a very quick look at your patch, but it seems to me that you removed some important parts.

For example: when receiving a bitfield, you increment the replicationCount of every piece.

Hi Athy, thanks for looking over the patch. I don't understand what you mean by "increment the replicationCount of every piece", because I don't think the patch does that.

Here's how peer-msgs.c processes a BT_BITFIELD message:

        case BT_BITFIELD: {
            const size_t bitCount = tr_torrentHasMetadata( msgs->torrent )
                                  ? msgs->torrent->info.pieceCount
                                  : msglen * 8;
            dbgmsg( msgs, "got a bitfield" );
            tr_bitsetReserve( &msgs->peer->have, bitCount );
            tr_peerIoReadBytes( msgs->peer->io, inbuf,
                                msgs->peer->have.bitfield.bits, msglen );
            fireClientGotBitfield( msgs, &msgs->peer->have.bitfield );
            updatePeerProgress( msgs );
            break;
        }

That block is identical to trunk except for the new call to fireClientGotBitfield(), which is a small boilerplate function:

static void
fireClientGotBitfield( tr_peermsgs * msgs, tr_bitfield * bitfield )
{
    tr_peer_event e = blankEvent;
    e.eventType = TR_PEER_CLIENT_GOT_BITFIELD;
    e.bitfield = bitfield;
    publish( msgs, &e );
}

which gets picked up by peerCallbackFunc() in peer-mgr.c here:

        case TR_PEER_CLIENT_GOT_BITFIELD:
            assert( e->bitfield != NULL );
            if( replicationExists( t ) )
                tr_incrReplicationFromBitfield( t, e->bitfield );
            break;

...and then the heart of tr_incrReplicationFromBitfield() is exactly what you'd expect:

    size_t i;
    size_t * rep = t->pieceReplication;
    const size_t n = t->tor->info.pieceCount;

    for( i=0; i<n; ++i )
        if( tr_bitfieldHasFast( bitfield, i ) )
            ++rep[i];

So what I think I'm doing is incrementing only the pieces which pass tr_bitfieldHasFast(), which is the correct thing to do. What have I missed?

Same thing for have_all/have_none.

Okay, in peer-msgs.c, the additions to have_all/have_none are analogous to the bitfield handling above, so I won't quote them here. When we get to peerCallbackFunc():

        case TR_PEER_CLIENT_GOT_HAVE_ALL:
            if( replicationExists( t ) )
                tr_incrReplication( t );
            break;

        case TR_PEER_CLIENT_GOT_HAVE_NONE:
            /* noop */
            break;

...which again, seems correct to me.

A lot of the code you removed _was_ useful.

How? I'm apparently missing something that you're seeing, and would like to understand.

comment:24 Changed 7 years ago by jordan

Ahhhh, I missed an important phrase from your previous message. Disregard the previous walkthrough.

You should diff against the previous bitfield to figure out which pieces you were not already aware of.

Same thing for have_all/have_none.

I think this is where we got disconnected. There is no previous bitfield, have_all, or have_none.

This is from the spec's description of bitfield: "The bitfield message may only be sent immediately after the handshaking sequence is completed, and before any other messages are sent. It is optional, and need not be sent if a client has no pieces."

And BEP 006's description of have_all/have_none: "When present, Have All or Have None replace the Have Bitfield. Exactly one of Have All, Have None, or Have Bitfield MUST appear and only immediately after the handshake."

That's why I've left bitset.h, bitfield.[ch], and peer-msgs.c unchanged except for the four new fireEvent() functions.

comment:25 Changed 7 years ago by jordan

Also, the question of adding randomness still seems to be open. There seem to be good arguments both for and against it.

Earlier I guessed that randomness would be more useful in larger swarms than in small ones, so we might want to activate it based on how many peers are available. But this is just a wild-assed guess.

comment:26 follow-up: Changed 7 years ago by athy

Ok, then your changes make sense.

This is what I did first too but I did receive some duplicate bitfields (and spent a lot of time sorting out why the replication count was not exact anymore).

From what I remember, duplicate bitfields are received _very_ rarely (and have_none never), but it's quite common to send a HaveAll when you become a seeder.

Ignoring those cases makes the code a lot cleaner, so I would go for your approach (stick to the BEPs) except for HaveAll.

I made those observations a few months ago already, so please do not change anything based on what I said. I will try to confirm those observations tomorrow (Australian time) and hopefully file some bug reports against the faulty clients.

For the randomness, I really believe it is useful only when the number of peers in the torrent is close to the maximum number of peers (80 for almost any client out there); otherwise, each peer will have a different knowledge of the piece replication in the swarm.

It also needs a number of pieces that is not a lot higher than the number of peers (otherwise, and that is almost always true, there will be a lot of rare pieces and no need to fight over them).

I really believe we don't need that randomness part.

I think (but once again I read it a long time ago) this is an interesting read on both the advantages and problems of the rarest-first policy: http://conferences.sigcomm.org/imc/2006/papers/p20-legout.pdf

comment:27 in reply to: ↑ 26 Changed 7 years ago by jordan

Replying to athy:

Ok, then your changes make sense.

This is what I did first too but I did receive some duplicate bitfields (and spent a lot of time sorting out why the replication count was not exact anymore).

From what I remember, duplicate bitfields are received _very_ rarely (and have_none never), but it's quite common to send a HaveAll when you become a seeder.

That's interesting -- I've never heard of clients doing any of that. But then, I've never explicitly watched for out-of-order bitfield/all/none messages either. I'll add some code to my test box to watch for that.

Ignoring those cases makes the code a lot cleaner, so I would go for your approach (stick to the BEPs) except for HaveAll.

I made those observations a few months ago already, so please do not change anything based on what I said. I will try to confirm those observations tomorrow (Australian time) and hopefully file some bug reports against the faulty clients.

Sounds good.

comment:28 follow-up: Changed 7 years ago by jordan

I've added some simple code to my seedbox to report if it gets more than one message from the (bitfield,all,none) set. I seed a lot of smaller Linux distro CDs, which should be good conditions for the have-all scenario you described: they're well-seeded, and there are always new downloaders joining the swarm.

comment:29 Changed 7 years ago by jordan

Attached is a patch that changes pieceReplication's type from size_t to uint16_t. This cuts the memory overhead by 1/2 on a 32-bit machine and possibly by 3/4 on 64-bit machines (depending on sizeof(size_t)).
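For example, a torrent with 10,000 pieces needs roughly 80 KB of counters with an 8-byte size_t, 40 KB with a 4-byte size_t, and 20 KB with uint16_t.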

This makes the assumption that a torrent won't be connected to 65,536 peers at the same time, but that's probably a safe assumption. :)


Changed 7 years ago by jordan

patch against r11895. rarest-first8-jordan.patch + the changes described in comment:29

comment:30 in reply to: ↑ 28 Changed 7 years ago by jordan

Replying to jordan:

I've added some simple code to my seedbox to report if it gets more than one message from the (bitfield,all,none) set. I seed a lot of smaller Linux distro CDs, which should be good conditions for the have-all scenario you described: they're well-seeded, and there are always new downloaders joining the swarm.

Nine hours into this test, and things are looking good for the KISS implementation. I've observed Ares, BitComet, BitLord, BitTorrent, Deluge, Lphant, KTorrent, libtorrent (Rakshasa), Thunder, Transmission, Vuze, Xunlei, and uTorrent all reach 100% without sending a "have all" message. Everyone seems to be following the spec wrt when, and how many, all/none/bitfield messages are sent.

Unless there are changes you'd like to make first, let's put patch 9 into trunk for wider testing. It's working well for me. The patch6/patch7 bugs ijuxda alluded to seem to have been in the bitset/bitfield/peer-msgs/metainfo diffs that got dropped. For that matter, I don't think there's anything from patch5 through patch7 in this diff... it's always nice to see code getting shorter instead of longer.


comment:31 follow-up: Changed 7 years ago by athy

Replying to ijuxda:

Could you also walk me through what happens when a peer we are connected to loses some pieces it previously had (e.g. via a "delete" feature)? In particular, what BitTorrent messages are sent, and how the client (i.e. the transmission program with your patch) updates its bitsets and replication counts. If you already did this in a previous comment, just point me to it.

A delete feature doesn't really make sense; anyway, there is no way to advertise that to other peers. As pointed out by jordan, once you've sent your bitfield/haveall/havenone message, you're only allowed to advertise new pieces with have messages.

Replying to ijuxda:

Finally, the style is not consistent with your other patches/commits. In particular, the spacing of tokens in for statements and brace alignment for if is different. Has the convention for svn/trunk changed? It would be beneficial for all contributors if you created a style guide so that we could avoid needless formatting differences.

A style guide would be really welcome!


Two things from me:

In tr_decrReplicationFromBitset( ... ), we only need to call invalidatePieceSorting() in the 2nd case (there is no need to do it if we decrease the replication count of all pieces).

Add a comment in assertReplicationCountIsExact() saying "This assert might fail due to implementation errors in other clients. It happens when receiving duplicate bitfields/HaveAll/HaveNone from a client. If such behavior is noticed, a bug report should be filed against the faulty client." I'm sure I received duplicate bitfields, so if that happens to somebody else, it's better to give them a clue where to look so they can gather enough information to report a bug.

To fix the segfault in tr_decrReplicationFromBitset(), use for( it=0; it < bitset->bitfield.bitCount; it++ ). The bitfield size might be smaller than the number of pieces (it grows when you add pieces).

comment:32 in reply to: ↑ 31 ; follow-up: Changed 7 years ago by jordan

Replying to athy:

In tr_decrReplicationFromBitset( ... ), we only need to call invalidatePieceSorting() in the 2nd case (there is no need to do it if we decrease the replication count of all pieces).

Good point. Done.

Add a comment in assertReplicationCountIsExact() saying "This assert might fail due to implementation errors in other clients. It happens when receiving duplicate bitfields/HaveAll/HaveNone from a client. If such behavior is noticed, a bug report should be filed against the faulty client."

Okay, done.

To fix the segfault in tr_decrReplicationFromBitset(), use for( it=0; it < bitset->bitfield.bitCount; it++ ). The bitfield size might be smaller than the number of pieces (it grows when you add pieces).

That would work. I got the same crash earlier today and added if( tr_bitfieldTestFast( b, n-1 ) ) but didn't want to keep flooding the ticket with patches...

comment:33 Changed 7 years ago by jordan

jordan * r11897 libtransmission/ (peer-common.h peer-mgr.c peer-msgs.c webseed.c):

(trunk libT) #3767 "rarest first policy" -- fixed.

This commit, started by a patch from athy, implements a rarest first policy when deciding which pieces to request from peers. It keeps a count of how many peers have each piece, and updates the count when getting bitfields, have, have all, and have none messages, as well as decrementing the counts when peers disconnect.

This running total is generated only for downloading torrents. Seeds don't have this overhead.

comment:34 Changed 7 years ago by jordan

  • Resolution set to fixed
  • Status changed from new to closed

Thanks again Athy! Your patches made this happen.

comment:35 in reply to: ↑ 32 ; follow-up: Changed 7 years ago by athy

Replying to jordan:

Replying to athy:

To fix the segfault in tr_decrReplicationFromBitset(), use for( it=0; it < bitset->bitfield.bitCount; it++ ). The bitfield size might be smaller than the number of pieces (it grows when you add pieces).

That would work. I got the same crash earlier today and added if( tr_bitfieldTestFast( b, n-1 ) ) but didn't want to keep flooding the ticket with patches...

This won't work.

For example, suppose there are 10 pieces in a torrent and a peer has pieces 0, 1 and 4; the bitfield we store might be "11001", so bitfield.bitCount will be 5. If the peer sends a "HAVE 6", the bitfield will be extended to "1100101" and bitfield.bitCount will be 7.

If the index is outside the bitfield's limits, we consider that the peer doesn't have the piece. But you have to loop over the whole bitfield anyway; otherwise the replication count will be wrong when a leecher leaves the swarm.

It should look like this:

+static void
+tr_decrReplicationFromBitset( Torrent * t, const tr_bitset * bitset )
+{
+    int i;
+
+    assert( replicationExists( t ) );
+    assert( t->pieceReplicationSize == t->tor->info.pieceCount );
+
+    if( bitset->haveAll )
+    {
+        for( i=0; i<t->pieceReplicationSize; ++i )
+            --t->pieceReplication[i];
+    }
+    else if ( !bitset->haveNone )
+    {
+        const tr_bitfield * const b = &bitset->bitfield;
+
+        for( i=0; i<b->bitCount; ++i )
+            if( tr_bitfieldHasFast( b, i ) )
+                --t->pieceReplication[i];
+
+        if( t->pieceSortState == PIECES_SORTED_BY_WEIGHT )
+            invalidatePieceSorting( t );
+    }
+}

Replying to jordan:

Thanks again Athy! Your patches made this happen.

You're welcome :D


comment:36 in reply to: ↑ 35 ; follow-up: Changed 7 years ago by jordan

Replying to athy:

Replying to jordan:

Replying to athy:

To fix the segfault in tr_decrReplicationFromBitset(), use for( it=0; it < bitset->bitfield.bitCount; it++ ). The bitfield size might be smaller than the number of pieces (it grows when you add pieces).

That would work. I got the same crash earlier today and added if( tr_bitfieldTestFast( b, n-1 ) ) but didn't want to keep flooding the ticket with patches...

This won't work.

For example, suppose there are 10 pieces in a torrent and a peer has pieces 0, 1 and 4; the bitfield we store might be "11001", so bitfield.bitCount will be 5. If the peer sends a "HAVE 6", the bitfield will be extended to "1100101" and bitfield.bitCount will be 7.

I don't know of any BitTorrent app that does that. That behavior is explicitly forbidden by the spec:

Clients should drop the connection if they receive bitfields that are not of the correct size, or if the bitfield has any of the spare bits set.

The reason for the crash in patch 9 wasn't that the code walked off the end of half-a-bitfield... it was because it tried to walk the bitfield of a brand new peer whose bitfield message we hadn't received yet. :)


comment:37 in reply to: ↑ 36 Changed 7 years ago by athy

Replying to jordan:

I don't know of any BitTorrent app that does that. That behavior is explicitly forbidden by the spec:

Clients should drop the connection if they receive bitfields that are not of the correct size, or if the bitfield has any of the spare bits set.

The reason for the crash in patch 9 wasn't that the code walked off the end of half-a-bitfield... it was because it tried to walk the bitfield of a brand new peer whose bitfield message we hadn't received yet. :)

The peers send bitfields of the right length, but the bitfield _we_ store is not always the right length.

Example:

At first, your peer->have.bitfield is empty (you received a havenone or just connected to the peer):

bits = ""
bitcount = 0

you receive a "Have 7"

bits = "00000001"
bitcount = 8

you receive a "Have 9"

bits = "0000000101"
bitcount = 10
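A sketch of how such a growing bitfield might work (a hypothetical helper, not Transmission's actual bitfield code; BitTorrent bitfields are MSB-first within each byte):

    static void
    bitfieldGrowAndSet( tr_bitfield * b, size_t index )
    {
        if( index >= b->bitCount )
        {
            /* enough bytes for (index + 1) bits */
            const size_t newByteCount = ( index + 8 ) / 8;

            b->bits = tr_renew( uint8_t, b->bits, newByteCount );
            memset( b->bits + b->byteCount, 0, newByteCount - b->byteCount );
            b->byteCount = newByteCount;
            b->bitCount = index + 1;
        }

        b->bits[index >> 3] |= ( 0x80 >> ( index & 7 ) );
    }

With this, a "Have 7" on an empty bitfield yields bitCount 8, and a following "Have 9" extends it to bitCount 10, matching the example above.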

comment:38 Changed 7 years ago by jordan

Uggh, you're right. If we're a magnet link, and if the peer sent a `have none' on handshake, we *still* won't know what pieceCount is when we get that have message.

However that's not enough either, unfortunately. Imagine we get a bitfield from this peer while we're still a magnet link. We don't know exactly how many bits there are in the bitfield, only that its upper bounds is msglen * 8. So in that decr loop, it's possible for bitCount to be larger than pieceCount.

So to be safe we'd have to guard against both conditions:

for( i=0; i<b->bitCount && i<n; ++i )
    if( tr_bitfieldHasFast( b, i ) )
        --t->pieceReplication[i];

By the time we get there, it might be more readable to do it this way:

for( i=0; i<n; ++i )
    if( tr_bitfieldHas( b, i ) )
        --t->pieceReplication[i];

Especially since this is only called when a peer drops out, so the extra overhead of tr_bitfieldHas() vs. tr_bitfieldHasFast() is not going to be significant.

Do you have an opinion on which of these, or some third thing I haven't thought of, would be better?

Strictly speaking, tr_incrReplicationFromBitfield() doesn't need the same safeguards: it's only going to be called when we get a bitfield from a peer after we've already got our own metainfo, so the bitfield will be allocated to the proper size. However, what are the odds that any of us will remember that if/when we go dinking around in this code a year from now. To be on the safe side, we should add the safeguards there too.

comment:39 Changed 7 years ago by Rolcol

Assertion failed crash:

http://transmission.pastebin.com/PNNYbsmM

Only one torrent was running. It was originally added as a magnet link, but the metadata appears to have downloaded correctly. This is a big swarm with a poor seed-to-leech ratio.

About 1000 pieces @ 512KiB

comment:40 Changed 7 years ago by gunzip

r11897 crashed transmission-daemon twice, and caused a noticeable increase in CPU usage. Reverting to r11895 solved the issues.

OS: Debian Linux

comment:41 follow-up: Changed 7 years ago by athy

The crash is probably caused by the issue we are discussing (basically, the assert might fail from time to time when a leecher gets disconnected).

I think the second proposal is the way to go. The CPU overhead will be really minor, and it's really how it should be done to keep things clear (and to avoid dealing directly with the implementation of bitfields, which might change).

Also, assertWeightedPiecesAreSorted() should be deactivated even in nightly builds: it creates a huge CPU increase (basically looping through all pieces of all peers every time we make a request) and it relies on the assumption that other clients implement the BitTorrent protocol perfectly (which we cannot guarantee).

comment:42 in reply to: ↑ 41 Changed 7 years ago by jordan

Replying to athy:

I think the second proposal is the way to go. The CPU overhead will be really minor, and it's really how it should be done to keep things clear (and to avoid dealing directly with the implementation of bitfields, which might change).

I've gone with the second proposal in r11899.

Also, assertWeightedPiecesAreSorted() should be deactivated even in nightly builds: it creates a huge CPU increase (basically looping through all pieces of all peers every time we make a request) and it relies on the assumption that other clients implement the BitTorrent protocol perfectly (which we cannot guarantee).

Done, in r11900.

comment:43 in reply to: ↑ 17 Changed 7 years ago by jch

  • Resolution fixed deleted
  • Status changed from closed to reopened

Replying to athy:

jch: what you're saying makes sense, but the impact will be really limited compared to what is stated in your comment.

Hopefully, you're right. Still, it makes me uncomfortable. Swarm behaviour is notoriously difficult to predict, and avoiding swarm-wide synchronisation is always a good thing.

I believe it would be better to keep it simple, and that a pure rarest first will, on average, behave better.

I don't think that my scheme is any more complicated, and I wish to most respectfully disagree. Do I have your permission to commit the following patch?

--jch

comment:44 Changed 7 years ago by jordan

I share athy's concerns from comment:17 but there's no practical way to compare the two approaches without a testing environment far beyond our means. So all of us are going on intuition here and unfortunately that's what will decide this question.

IMO the patch is a net gain. Go ahead and commit it.

comment:45 Changed 7 years ago by jordan

(DeHackEd's nonanswer omitted)

22:05:31 < The_8472> <klapaucjusz> I.e. should we sometimes (with low probability) choose not the rarest, to avoid everyone jumping on the same piece? <- that would depend on how many rarest pieces there are

22:05:38 < The_8472> but for that you need an accurate view of the swarm

22:05:42 < The_8472> which is unlikely to happen

22:06:43 < The_8472> meh, he quit

22:07:05 < The_8472> jordan, some time you'll have to tell jch about idling

22:11:05 < jordan> the two viewpoints that sum up what he was asking are at https://trac.transmissionbt.com/ticket/3767#comment:6 and https://trac.transmissionbt.com/ticket/3767#comment:17

22:13:56 < jordan> one argument is that, if all peers in a scarce swarm have perfect knowledge, they'll all request the same rarest piece, which would be suboptimal

22:14:20 < jordan> the counterargument is that swarms don't have perfect knowledge, so adding randomness to the choice to avoid those situations is moot

22:14:25 < The_8472> that usually only happens when there's a slow seed and everyone already has everything except the newest piece

22:14:32 < The_8472> in which case rarest first makes no difference anyway

22:15:40 < The_8472> jch is correct in theory, but it only applies to edge cases that do not occur often in reality. and for non-ideal cases all he would do is adding noise to an already noisy signal

22:16:01 < alus> right. it would be terrible if there was a slow seed which was only going to be around long enough to upload two pieces and instead everyone requested the same piece the whole time

22:16:27 < alus> that is a tremendous edge case which is completely broken anyway if the seeder is available for just a little less time

22:16:34 < The_8472> nono

22:17:38 < The_8472> ah... nvm. i was thinking about a "rarest first among all available" pieces, not "globally-rarest first".

22:17:56 < The_8472> the latter does not apply in the superseed case

22:20:29 < The_8472> anyway, my counter-argument is the following: while adding noise in a theoretical case could yield a real benefit... adding the same noise in real situations could have a negative effect in theory (as in not always requesting the rarest pieces).

22:20:40 < alus> right

22:21:27 < alus> I think the way the problem he's solving would present is it peers had nothing to send to each other.

22:21:33 < alus> in practice I don't see this happen

22:21:48 < alus> swarm death happens, but I don't think this really prevents that much

22:22:44 < The_8472> random-among-rarest works perfectly fine if there's a bunch of rarest pieces to pick from, which usually is the case. as the system should normally develop away from the situation

22:22:54 < The_8472> i.e. if there is only 1 rarest piece then everyone should have it very soon

22:23:18 < The_8472> and thuse the availability number will be bumped by 1, because now there will be 0 rarest pieces of that particular rarity

22:23:29 < The_8472> err... i could have phrased that better

22:23:35 < The_8472> but you know what i mean, i hope

22:26:21 < alus> yes

22:26:36 < alus> and I believe most if not all clients do randomness among the same rarity level

comment:46 Changed 7 years ago by athy

I just remembered something ...

You don't download the rarest piece all the time. When you request a piece from a peer, you don't request the globally rarest piece; you request the rarest piece that he has (if there are no partially downloaded pieces already). After that, all peers will be asked for blocks of the same piece to complete the download.

Basically, the "next piece" will be chosen among the pieces of any peer who has been lucky enough to have an upload slot available when we finished downloading (requesting, in fact) the previous piece.

With that, partial knowledge of the swarm, and the fact that there are multiple "rarest" pieces at the same time, I guess we have enough (if not too much) noise without adding even more.

comment:47 Changed 7 years ago by jch

  • Resolution set to fixed
  • Status changed from reopened to closed

Hmm...

comment:48 Changed 7 years ago by jordan

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:49 Changed 7 years ago by jordan

  • Owner changed from charles to jordan
  • Status changed from reopened to new

comment:50 Changed 7 years ago by jordan

  • Resolution set to fixed
  • Status changed from new to closed