Opened 8 years ago
Last modified 8 years ago
#5755 new Enhancement
optimum block size
Reported by: | cfpp2p | Owned by: | |
---|---|---|---|
Priority: | Normal | Milestone: | None Set |
Component: | Transmission | Version: | 2.84 |
Severity: | Normal | Keywords: | |
Cc: |
Description
Currently, tr_getBlockSize() in torrent.c does not always find the maximum block size that could be used, resulting in a less than optimal choice. For better efficiency and less overhead in requests, why not use a block size that is as close to MAX_BLOCK_SIZE as we can? No block sizes beyond what Transmission previously allowed are introduced, because the same limits are still enforced.
The proposed code addresses this issue. It is simple and requires only a small change to the code. Very small blocks should not be allowed, see https://wiki.theory.org/BitTorrentSpecification#request:_.3Clen.3D0013.3E.3Cid.3D6.3E.3Cindex.3E.3Cbegin.3E.3Clength.3E
Strictly, the specification allows 2^15 (32KB) requests. The reality is near all clients will now use 2^14 (16KB) requests. Due to clients that enforce that size, it is recommended that implementations make requests of that size. Due to smaller requests resulting in higher overhead due to tracking a greater number of requests, implementers are advised against going below 2^14 (16KB).
As a side effect, we also allow more piece sizes, as described in ticket:4005.
/**
 * Decide on a block size. Constraints:
 * (1) most clients decline requests over 16 KiB
 * (2) pieceSize must be a multiple of block size
 */
uint32_t
tr_getBlockSize( uint32_t pieceSize )
{
    uint32_t b = MAX_BLOCK_SIZE;

    /* this was established in the previous revision's logic --
     * do we really want to allow ANY piece size less than MAX_BLOCK_SIZE? */
    if( pieceSize <= MAX_BLOCK_SIZE )
        return pieceSize;

    /**
     * Use a minimum block size of 8193 -- same as the previous revision.
     * Smaller block sizes create an undue number of requests.
     * Note: allowing a block size of 1 would guarantee any piece size,
     * but smaller requests result in higher overhead due to tracking
     * a greater number of requests, so there should be some constraint
     * on this -- we can't have everything.
     */

    /* try for the LARGEST block size we can find, not just any one that
     * is greater than 8192 as per the previous revision -- this way we
     * are more efficient with less overhead */
    while( ( b > 8193 ) && ( pieceSize % b ) )
        b--;

    if( pieceSize % b ) /* not cleanly divisible */
        return 0;

    return b;
}
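A minimal sketch of how the proposed function behaves for a few sample piece sizes (the harness and the sample values are illustrative only and are not part of the patch; it assumes the function above is compiled in alongside it):

#include <assert.h>
#include <stdint.h>

/* the proposed function above; assume it is compiled in with this sketch */
uint32_t tr_getBlockSize( uint32_t pieceSize );

int main( void )
{
    /* the common case: power-of-two piece sizes divide cleanly by 16384 */
    assert( tr_getBlockSize( 4 * 1024 * 1024 ) == 16384 );

    /* 49152 = 3 * 16384: the downward search stops immediately at 16384,
     * where the old halving logic would have settled for 12288 */
    assert( tr_getBlockSize( 49152 ) == 16384 );

    /* 21107 is prime: there is no divisor between 8193 and 16384,
     * so the torrent is rejected with a block size of 0 */
    assert( tr_getBlockSize( 21107 ) == 0 );

    return 0;
}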
Currently, for this to work we must also fix ticket:5754, which can be done in a couple of different ways; see ticket:5736#comment:2 and ticket:4005#comment:25.
Attachments (1)
Change History (15)
comment:1 follow-ups: ↓ 3 ↓ 5 Changed 8 years ago by x190
comment:2 Changed 8 years ago by suitessay
This modification halves the number of unhandled piece lengths in the sampling from ticket:4005#comment:14, leaving the following:
21107
26857
99476
270544
360286
450358
524281
579202
635790
703686
807504
873821
1179666
1194304
1233338
1523696
1717986
2048576
2097197
2097271
3342387
3801189
6357063
6357106
6488161
7077993
7274611
7274612
7471150
7471215
7852528
10132107
23216671
2147483647
The first value is prime, and the last is INT_MAX, so it may be possible to dismiss torrents like these as pathological.
comment:3 in reply to: ↑ 1 Changed 8 years ago by cfpp2p
Replying to x190:
The v2.84 original code is just plain wrong, ...
It originates from long ago at r2531 torrent.c changes. I've attached the old r2531 torrent.c patch for informational purposes.
I can see how it's easy to get mixed up about powers of 2, factors, and computational math. The pre-r2531 original code used either 16k ( 1 << 14 ) or the piece size if that was smaller. Finding the best factor is not as simple as dividing by two, and that approach also misses lots of possible factors. There must have been a problem with non-16k-divisible torrents starting to circulate. At any rate, this old code apparently never really got looked at after that. It has carried through and remained in its basic form for seven years now.
-    /* Block size: usually 16 kib, or less if we have to */
-    tor->blockSize = MIN( tor->info.pieceSize, 1 << 14 );
-    tor->blockCount = ( tor->info.totalSize + tor->blockSize - 1 ) /
-                      tor->blockSize;
+    /**
+     * Decide on a block size. constraints:
+     * (1) most clients decline requests over 16 KiB
+     * (2) pieceSize must be a multiple of block size
+     */
+    tor->blockSize = info->pieceSize;
+    while( tor->blockSize > (1024*16) )
+        tor->blockSize /= 2;
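For a concrete feel of that halving loop, here is a small standalone sketch of the r2531 logic (illustrative only; the sample piece sizes are mine, not from the ticket):

#include <stdint.h>
#include <stdio.h>

/* standalone sketch of the r2531 halving logic quoted above */
static uint32_t halvingBlockSize( uint32_t pieceSize )
{
    uint32_t blockSize = pieceSize;
    while( blockSize > (1024*16) )
        blockSize /= 2;
    return blockSize;
}

int main( void )
{
    /* 1 MiB piece: halving lands exactly on 16384 -- the common case */
    printf( "%u\n", (unsigned) halvingBlockSize( 1024 * 1024 ) );  /* 16384 */

    /* 49152 = 3 * 16384: halving overshoots down to 12288,
     * even though 16384 divides the piece cleanly */
    printf( "%u\n", (unsigned) halvingBlockSize( 49152 ) );        /* 12288 */

    /* 21107 (prime): halving gives 10553, which does not divide the
     * piece at all, so a later divisibility check has to reject it */
    printf( "%u\n", (unsigned) halvingBlockSize( 21107 ) );        /* 10553 */

    return 0;
}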
BTW, couldn't #5734 be closed now, as it seems basically a duplicate of this ticket?
done
Replying to suitessay:
This modification halves the unhandled piece lengths in the sampling from ticket:4005#comment:14 to the following:
The point of this ticket is that the block size is not determined to the optimal result. If it also solves the issue of torrents coming up "Invalid or corrupt", that's a bonus. You're right, a prime number only factors to 1 and itself.
comment:4 Changed 8 years ago by x190
IMO, when choosing a minimum block size, the deciding factor should be whether peers/clients will respond to these requests. We are talking about a very small, but not insignificant, number of torrents, so I feel the priority should be to always share the data, if possible. This would be balanced by the increase in efficiency we should get by optimizing block size elsewhere.
comment:5 in reply to: ↑ 1 ; follow-up: ↓ 6 Changed 8 years ago by jordan
Replying to x190:
if ((pieceSize % MAX_BLOCK_SIZE) == 0)
  {
    tr_logAddInfo ("Block size in tr_getBlockSize is %d", MAX_BLOCK_SIZE);
    return MAX_BLOCK_SIZE;
  }
This block seems to be redundant -- if you remove it, won't the following block come up with the same result?
comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed 8 years ago by x190
Replying to jordan:
Replying to x190:
if ((pieceSize % MAX_BLOCK_SIZE) == 0)
  {
    tr_logAddInfo ("Block size in tr_getBlockSize is %d", MAX_BLOCK_SIZE);
    return MAX_BLOCK_SIZE;
  }
This block seems to be redundant -- if you remove it, won't the following block come up with the same result?
Please see comment:14.
comment:7 in reply to: ↑ 6 ; follow-up: ↓ 8 Changed 8 years ago by cfpp2p
Replying to x190:
...Note that pieceSize < MAX_BLOCK_SIZE happens almost never according to mike's stats...
Yes, according to mike.dld's magic: 8 different sizes between the two trackers, each size occurring only once on each of the two trackers. That's 8 out of nearly 14 million total: 916, 1599, 2111, 2961, 4198, 5120, 7123, 13409.
...but is covered nevertheless...
It returns the same result as if( pieceSize <= MAX_BLOCK_SIZE ) return pieceSize; -- when b becomes equal to pieceSize. Rather than focusing on the efficiency of this section of code, I think the only real issue here is how small a block size we should allow when pieceSize is greater than MAX_BLOCK_SIZE. The coding here isn't that difficult.
oops-edit -> ...how small a block...
comment:8 in reply to: ↑ 7 Changed 8 years ago by x190
Replying to cfpp2p:
Replying to x190:
... how small a piece size we should allow when pieceSize is greater than MAX_BLOCK_SIZE.
Meaning "how small a block size we should allow when pieceSize is greater than MAX_BLOCK_SIZE"?
See comment:4
As a note on mike.dld's stats: his stats are for 2 public trackers, while many of these oddball torrents may be from private trackers, and I feel we should try to accommodate them, if possible.
comment:9 follow-up: ↓ 10 Changed 8 years ago by cfpp2p
The worst-case scenario I could find is when we allow piece sizes up to 2^32 and block sizes down to 1. It turns out a block size of 1 is not just a freak occurrence.
With a piece size of the prime number 4,294,967,291 http://www.bigprimes.net/cruncher/4294967291/ we get a block size of 1 and very inefficient messages. And there are lots of primes in the 4 billion range. I'm coming up with about 13,318,400 primes just between 4,000,000,000 and 4,294,967,291 alone.
http://www.bigprimes.net/archive/prime/2032803/ to http://www.bigprimes.net/archive/prime/1899619/
I could see malicious torrents utilizing excessive bandwidth with ideas like this.
comment:10 in reply to: ↑ 9 ; follow-ups: ↓ 11 ↓ 12 Changed 8 years ago by x190
Replying to cfpp2p:
...a block size of 1 and very inefficient messages.
Is it possible to do some real life testing?
comment:11 in reply to: ↑ 10 Changed 8 years ago by cfpp2p
Replying to x190:
Replying to cfpp2p:
...a block size of 1 and very inefficient messages.
Is it possible to do some real life testing?
A block size of 1 simply doesn't make sense to me. The raw overhead alone is 13 bytes, and then only 1 byte of piece data is sent. That's an efficiency of 0.07, which is ridiculous: we'd transfer 13 bytes of overhead for every byte of useful data. Whereas, for example, a block size of 8193 is better than 0.998 efficient. And this is only the raw overhead on the 'piece' message itself; it's equally bad with the 'request' messages, a large number of which are required in such cases.
references:
"The length prefix is a four byte big-endian value."
piece: <len=0009+X><id=7><index><begin><block>
https://wiki.theory.org/BitTorrentSpecification#piece:_.3Clen.3D0009.2BX.3E.3Cid.3D7.3E.3Cindex.3E.3Cbegin.3E.3Cblock.3E
const uint32_t msglen = 4 + 1 + 4 + 4 + req.length; https://trac.transmissionbt.com/browser/trunk/libtransmission/peer-msgs.c?rev=14327#L2020
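A minimal sketch of that arithmetic, assuming only the 13 bytes of 'piece' message framing shown above (it ignores the 'request' messages and TCP/IP overhead, so real-world efficiency is lower still):

#include <stdio.h>

/* 'piece' message framing: 4-byte length prefix + 1-byte id
 * + 4-byte index + 4-byte begin = 13 bytes before the block data,
 * matching msglen = 4 + 1 + 4 + 4 + req.length above */
enum { PIECE_MSG_OVERHEAD = 4 + 1 + 4 + 4 };

static double pieceEfficiency( unsigned blockSize )
{
    return (double) blockSize / ( blockSize + PIECE_MSG_OVERHEAD );
}

int main( void )
{
    printf( "%.3f\n", pieceEfficiency( 1 ) );      /* ~0.071: 13 bytes of framing per useful byte */
    printf( "%.3f\n", pieceEfficiency( 8193 ) );   /* ~0.998 */
    printf( "%.3f\n", pieceEfficiency( 16384 ) );  /* ~0.999 */
    return 0;
}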
comment:12 in reply to: ↑ 10 ; follow-up: ↓ 13 Changed 8 years ago by x190
Replying to x190:
Replying to cfpp2p:
...a block size of 1 and very inefficient messages.
Is it possible to do some real life testing?
Yes. I just used various values of MAX_BLOCK_SIZE and the code from comment:6. Things work fine right down to 256. Since only a very small number of torrents will need a block size this small, a little extra u/l overhead should be no big deal.
comment:13 in reply to: ↑ 12 Changed 8 years ago by cfpp2p
Replying to x190:
Yes. I just used various values of MAX_BLOCK_SIZE and the code from comment:6. Things work fine right down to 256. Since only a very small number of torrents will need a block size this small, a little extra u/l overhead should be no big deal.
I understand. You tested real torrents by changing the code for MAX_BLOCK_SIZE to see what happens when small block sizes were forced to be used. Nice work.
But when an actual block size is determined with tr_getBlockSize(), how did you come up with the idea of a "very small number of torrents"?
Point 1.) The first prime number greater than 16,384 is 16,411, and there are another 203,278,281 primes alone before we get to 2^32 (comment:9).
Point 2.) We are talking about when a torrent's piece size is greater than MAX_BLOCK_SIZE. If we choose, for instance, 256 as the minimum block size, how many numbers between 16,384 and 2^32 will not have a factor of 256 or greater (and no larger than MAX_BLOCK_SIZE)? I suspect it is quite a few, and we already know of at least 203,278,281 primes. We'd need an algorithm to determine, for every number between 16,384 and 2^32, whether it has any such factor greater than 256; add up how many couldn't be factored and we'd know. Then do the same for 8193, and an informed decision could be made, since we'd then know the difference in how many are factorisable between the two minimums. What we're really after is, for each candidate minimum from 256 to 8192, a count of the non-factorisable values between 16,384 and 2^32 (see the sketch at the end of this comment).
I really don't know where you came up with "Since only a very small number of torrents will need a block size this small". If you are basing this on mike.dld's statistics I might agree, but we'd first need to factor all of the piece sizes in the lists down to 256 and determine the numbers. suitessay tested with the 8193 limit in comment:2.
It's good to know that a small block size of 256 works OK, but I couldn't say whether I'd prefer it as a minimum over 8193 until I knew the difference in the number of non-factorisable piece sizes for each, both theoretically and in mike.dld's list.
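A brute-force sketch of that kind of tally (hypothetical; it mirrors tr_getBlockSize()'s downward search and only scans a small sample range, since walking all the way to 2^32 this way would take far too long):

#include <stdint.h>
#include <stdio.h>

/* count piece sizes in [lo, hi] that have no divisor in
 * [minBlock, 16384], i.e. that would be rejected for a
 * given minimum block size */
static uint64_t countUnfactorable( uint32_t lo, uint32_t hi, uint32_t minBlock )
{
    uint64_t count = 0;
    uint64_t n;

    for( n = lo; n <= hi; ++n )
    {
        uint32_t b = 16384;

        while( ( b > minBlock ) && ( n % b ) != 0 )
            --b;

        if( ( n % b ) != 0 )
            ++count;
    }

    return count;
}

int main( void )
{
    /* sample range only -- compare the two candidate minimums */
    printf( "min 8193: %llu\n", (unsigned long long) countUnfactorable( 16385, 131072, 8193 ) );
    printf( "min  256: %llu\n", (unsigned long long) countUnfactorable( 16385, 131072, 256 ) );
    return 0;
}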
comment:14 Changed 8 years ago by x190
Latest and greatest: :-) peer-common.h:
enum
{
    /* this is the maximum size of a block request. most bittorrent
     * clients will reject requests larger than this size. */
    MAX_BLOCK_SIZE = (1024 * 16),

    /* feel free to set this to a value of your choosing, however,
     * unacceptable behavior may result from using a lower value. */
    MIN_BLOCK_SIZE = 256
};
torrent.c:
/**
 * Decide on a block size. Constraints:
 * (1) most clients decline requests over 16 KiB
 * (2) pieceSize must be a multiple of block size
 */
uint32_t
tr_getBlockSize (uint32_t pieceSize)
{
    uint32_t b = MAX_BLOCK_SIZE;

    while ((pieceSize % b) != 0 && b > MIN_BLOCK_SIZE)
        b--;

    /* torrent is a magnet link or pieceSize is not cleanly divisible
     * down to MIN_BLOCK_SIZE. */
    if (!pieceSize || (pieceSize % b) != 0)
        return 0;

    return b;
}
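A minimal sketch of what this version returns for a few sample piece sizes (the harness and the values are illustrative only; 270544 is one of the piece lengths left unhandled by the 8193 minimum in comment:2):

#include <assert.h>
#include <stdint.h>

/* the function above; assume it is compiled in alongside this sketch */
uint32_t tr_getBlockSize (uint32_t pieceSize);

int main (void)
{
    /* the overwhelmingly common case: power-of-two piece sizes */
    assert (tr_getBlockSize (1024 * 1024) == 16384);

    /* 270544 = 2^4 * 37 * 457: its largest divisor <= 16384 is 7312,
     * so a 256 minimum handles it while an 8193 minimum returns 0 */
    assert (tr_getBlockSize (270544) == 7312);

    /* magnet link (pieceSize == 0) still yields 0 */
    assert (tr_getBlockSize (0) == 0);

    return 0;
}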
The v2.84 original code is just plain wrong, as it unnecessarily returns smaller block sizes in many cases, when MAX_BLOCK_SIZE could have been used. We can also accommodate that .001% while seldom seeing block sizes <2K.
See comment:14
99.999% of all torrents will use MAX_BLOCK_SIZE with this code, while we can still accommodate the remaining .001%.
BTW, couldn't #5734 be closed now, as it seems basically a duplicate of this ticket?