Opened 11 years ago

Closed 11 years ago

#3427 closed Enhancement (fixed)

PATCH Use Shortest-Job-First scheduling for verifying local data

Reported by: wateenellende Owned by: charles
Priority: Normal Milestone: 2.10
Component: libtransmission Version: 1.93
Severity: Normal Keywords:
Cc:

Description

Hi,

The problem: I just started transmission on a slow machine. It will first verify the local data on the torrents - there are about 10-16 open torrents. 1) Each torrent takes a fairly long time to verify as they're all large files. 2) A torrent will only continue downloading once it has been verified. 3) To complete files as soon a possible, they should start downloading as soon as possible.

Currently, transmission seems to select files at random to verify and start. This is not optimal.

The solution: Use Shortest-Job-First scheduling. This means that the first torrent to be verified out of all that await verification, should be the one that takes least time to verify. This is probably the torrent that has the least amount of data downloaded.

The attached patch implements this, and should allow transmission to start downloading much faster.

Details: This could be implemented in (at least) 2 ways, 1) sort the list by nr-of-bytes-completed in verifyThreadFunc( ... ) in verify.c; 2) Keep the list sorted by inserting in the right position tr_verifyAdd( ... )in verify.c

The second option is more elegant, it requires less calculation. This would need modifications to list.h/list.c, to allow the ordered lists. A function

void tr_list_insert_ordered( tr_list ** list,
                             void *     data,                   
                             TrListCompareFunc compare_func )

is added. Then, a compare function 'compareVerifyBySizeNow' for compares by downloaded data size. Last, the call 'tr_list_append( &verifyList, node );' in verify.c (315) is replaced by 'tr_list_insert_ordered( &verifyList, node, compareVerifyBySizeNow );

This patch compiles but it is NOT TESTED, I don't know whether I should use 'sizeNow' or 'tr_cpHaveValid()' to get the current number of bytes, and neither am i sure that these fields are valid /before/ the torrent is verified.

The description for tr_cpHaveValid() and tr_cpSizeWhenDone() is identical, maybe that should be fixed.

Note that the behaviour was observed with 1.93, but the patch is against 2.01.

Best,

Fokko Beekhof

Attachments (5)

patch_transmission_2.01.bz2 (1.1 KB) - added by wateenellende 11 years ago.
patch implementing shortest-job-first scheduling for verification on startup
verify-shortest-job-first.diff (3.0 KB) - added by charles 11 years ago.
same patch, from trunk
verify-shortest-job-first-bugfix.diff (3.0 KB) - added by wateenellende 11 years ago.
Previous patches had a bug in tr_list_insert_ordered(...) in list.c, handling of empty list. Fixed in this patch.
verify-shortest-job-first-recover.diff (2.7 KB) - added by wateenellende 11 years ago.
Recovered previous implementation of insert_sorted.
verify-shortest-job-first-recover-bugfix.diff (2.4 KB) - added by wateenellende 11 years ago.
Added missing 'node->prev = l->prev;' (comment by sadface)

Download all attachments as: .zip

Change History (15)

Changed 11 years ago by wateenellende

patch implementing shortest-job-first scheduling for verification on startup

Changed 11 years ago by charles

same patch, from trunk

comment:1 Changed 11 years ago by charles

  • Type changed from Bug to Enhancement

Changed 11 years ago by wateenellende

Previous patches had a bug in tr_list_insert_ordered(...) in list.c, handling of empty list. Fixed in this patch.

comment:2 Changed 11 years ago by sadface

tr_list_insert_ordered() is still bugged. You should have recovered a similar function removed some time ago: https://trac.transmissionbt.com/browser/trunk/libtransmission/list.c?rev=7404#L80.

Changed 11 years ago by wateenellende

Recovered previous implementation of insert_sorted.

comment:3 Changed 11 years ago by wateenellende

Here is a new patch, based on sadface's recommendation. I edited the patch by hand, hope it applies. If you spot another bug in it, please point it out so that I don't have to go hunting for it myself again, or provide a new patch yourself...

Additionally, could someone confirm that
1) 'sizeNow' is the correct field to use;
2) its value is meaningful before the data is verified ?

Thanks in advance,
wateenellende

comment:4 Changed 11 years ago by charles

I don't think sizeNow is the right thing to use until after verification's done.

tr_cpSizeWhenDone() isn't perfect either, but it's probably the right thing to use here, since it doesn't rely on the existing file size.

Alternately we could stat() the torrent's files on disk and take the total current file sizes, but that seems like overkill.

comment:5 Changed 11 years ago by wateenellende

I just tried the latest patch, and it's functioning correctly. I applied the patch by hand to Ubuntu's latest sources, and it's doing exactly what it is supposed to: it will verify a torrent with only a bit of data completed before a torrent with more data completed, even if the total size of the second torrent is smaller or if the other was added earlier to the list.

So I guess the patch is ok as it is now ...

comment:6 Changed 11 years ago by sadface

wateenellende, this time you forgot the following line: node->prev = l->prev;

Changed 11 years ago by wateenellende

Added missing 'node->prev = l->prev;' (comment by sadface)

comment:7 Changed 11 years ago by charles

I'm not sure that I understand why "assert(l->prev);" is valid in this diff. What if l points to the first node in the list?

Edit: ah, nvm, I see it's handled in the clause above. :)

Last edited 11 years ago by charles (previous) (diff)

comment:8 Changed 11 years ago by charles

  • Milestone changed from None Set to 2.10
  • Status changed from new to assigned
  • Summary changed from PATCH Consider Shortest-Job-First scheduling for verifying local data on startup to PATCH Use Shortest-Job-First scheduling for verifying local data

It occurs to me that we're already walking through the torrent's files, calling stat() on each one, in torrentHasAnyLocalData(). In fact, that function is just a special case version of the ideal for here. So, instead of being overkill as I suggested before, it's really no extra cost.

comment:9 Changed 11 years ago by charles

r11023 libtransmission/ (list.c list.h verify.c): (trunk libT) #3427 "use shortest-job-first scheduling for verifying local data" -- patch from wateenellende and sadface

This is committed to the nightlies to get testing from other users too. If I've messed anything up in the commit, please let me know and I'll fix it :)

comment:10 Changed 11 years ago by charles

  • Resolution set to fixed
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.